基于Markov随机游走的谱聚类相似图构造方法

南京大学学报(自然科学版) ›› 2015, Vol. 51 ›› Issue (4): 772–780.

基于Markov随机游走的谱聚类相似图构造方法

曹江中^1*，陈佩²，戴青云³，凌永权¹

出版日期:2015-07-08 发布日期:2015-07-08
作者简介:（1. 广东工业大学信息工程学院，广州，510006；2. 中山大学信息科学与技术学院，广州，510006；
3. 广东技术师范学院，广州，510665）
基金资助:
国家自然科学基金(61173083，61372173)，中组部国家青年千人计划，广东省自然科学基金（2014A030310346），501130144）

Similarity graph construction method based on Markov random walker for spectral clustering

Cao Jiang zhong¹, Chen Pei², Dai Qingyun³, Ling Wing-Kuen¹

Online:2015-07-08 Published:2015-07-08
About author:(1. School of Information Engineering, Guangdong University of Technology, Guangzhou, 510006, China; 2. School of Information Science and Technology, Sun Yat-sen University, Guangzhou, 510006, China 3. Guangdong Polytechnic Normal University, Guangzhou, 510665, China)

摘要/Abstract

摘要： 谱聚类是一种基于图谱理论的聚类方法由样本数据构成的相似图是谱聚类的基础，也是影响谱聚类性能的一个重要因素提出一种基于Markov随机游走模型的稀疏相似图构造方法提出的方法在常规的k最近邻图上定义一个Markov随机游走点，利用游走点的高阶转移概率来选择近邻点由于高阶转移概率反映的是数据间多层复杂的关联程度，因此通过高阶转移概率确定的近邻数据更可靠在人工仿真和实际数据集上的对比实验表明，提出的方法较常规的近邻图能更好地反映存在数据中的结构，提高谱聚类的效果

Abstract: Spectral clustering is a clustering method based on graph theory. Similarity graph constructed on the data is the base of spectral clustering and is one of the key factors that influence the performance of spectral clustering. The similarity graphs usually contain some error edges, which degrade the performance of spectral clustering. So, how to reduce the error edges in similarity graph is a way to improve the performance of spectral clustering. Currently, the model of Markov random walker has been widely used to measure the relations. In the paper, a similarity graph construction method is proposed based on Markov random walker for spectral clustering. In the proposed method, a Markov random walker (MRW) is defined on the common k-nearest-neighbors graph, and the high-order transition probabilities of the MRW are adopted to determine the k nearest neighbors. The neighbors in the proposed graph are more reliable because the high-order transition probabilities reflect the more complex relations among data. Experiments are performed on the synthetic and real datasets for performance evaluation and comparison. The results show that the graph obtained by our proposed method can reduce error edges effectively and better reflects the structure of the data compared to those of the state-of-the-art methods. It is also shown that the proposed graph can effectively improve the performance of spectral clustering

曹江中^1*，陈佩²，戴青云³，凌永权¹. 基于Markov随机游走的谱聚类相似图构造方法[J]. 南京大学学报(自然科学版), 2015, 51(4): 772–780.

Cao Jiang zhong¹, Chen Pei², Dai Qingyun³, Ling Wing-Kuen¹. Similarity graph construction method based on Markov random walker for spectral clustering[J]. Journal of Nanjing University(Natural Sciences), 2015, 51(4): 772–780.

参考文献

[1].Jain A K, Murty M, and Flynn P. Data clustering: A review. ACM Computing Surveys,1999, 31(3): 264 -323.
[2].Lu H, Fu Z, Shu X. Non-negative and sparse spectral clustering. Pattern Recognition, 2014, 47(1): 418-426.
[3].Huang H C, Chuang Y Y, Chen C S. Affinity aggregation for spectral clustering. In: IEEE Conference on Computer Vision and Pattern Recognition,
Washington D C , USA:IEEE Computer Society,2012:773-780.
[4].高尚兵周静波严云洋. 一种新的基于超像素的谱聚类图像分割算法. 南京大学学报(自然科学), 2013, 49(2): 169175.
[5]. 纳跃跃，于剑. 一种用于谱聚类图像分割的像素相似度计算方法. 南京大学学报(自然科学)，2013, 49(2): 199-168.
[6].Luxburg U. A tutorial on spectral clustering. Statistics and Computing, 2007, 17 (4): 395-416.
[7].Shi J, Malik J. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22: 888-905.
[8].Ng A, Jordan M, Weiss Y. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2002: 849-856.
[9].田铮，李小斌，句彦伟. 谱聚类的扰动分析. 中国科学(E辑:信息科学), 2007, 37(4): 527-543.
[10].Yang X, Latecki L J. Affinity learning on a tensor product graph with applications to shape and image retrieval. In: IEEE Conference on Computer Vision and Pattern Recognition, 2011:2369-2376.
[11].Pavan M, Pelillo M. Dominant sets and pairwise clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(1): 167-162.
[12].Zhang X, You Q. An improved spectral clustering algorithm based on random walk. Frontiers of Computer Science in China, 2012, (3):268-278.
[13]. Meila M, Shi J. A random walks view of spectral segmentation. In: The 8th International Workshop on Artificial Intelligence and Statistics, Florida US, 2001.
[14].Donath W E, Hoffman A J. Lower bounds for partitioning of graphs. IBM Journal of Research and Development, 1973, 17: 420-425.
[15].Fiedler M. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 1973, 23: 298-305.
[16].Kadim T. Vector quantization based approximate spectral clustering of large datasets. Pattern Recognition, 2012, 45(8): 3034-3044.
[17].王玲, 薄列峰, 焦李成. 密度敏感的半监督谱聚类. 软件学报, 2007, 18(10): 2412-2422.
[18].王娜, 李霞. 基于监督信息特性的主动半监督谱聚类算法. 电子学报, 2010, 38(1): 172-176.
[19].吴健, 崔志明,时玉杰. 基于局部密度构造相似矩阵的谱聚类算法，通信学报，2013, 34(3): 14-22.
[20].Cao J, Chen P, Dai Q, et al. Local information-based fast approximate spectral clustering. Pattern Recognition Letters, 2014, 28: 63-69.
[21].Zhou D, Bousquet O, Lal T N, et al. Learning with local and global consistency. Advances in Neural Information Processing Systems, MIT Press, 2004: 321-328 .
[22].The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 2015-03-01.
[23].汪荣鑫. 随机过程. 西安：西安交通大学出版社，1987.278.
[24]Tishby N，Slonim N. Data clustering by Markovian relaxation and the information Bottleneck method.Advances in Neural
Information Processing Systems. MIT Press, 2000，640一646.
[25]Szummer M, Jaakkola T. Partially labeled classification with Markov random walks.Advances in Neural Information Processing
Systems,MIT Press,2002，945一952.
[26]Jung J，Wang B, Tu Z. Unsupervised metric learning by sell-smoothing operator. In: IEEE Conference on Computer Vision (ICCV)
Washington D C, USA:IEEE Computer Society, 2011:794一801.
[27]Chang H，Yeung D Y. Robust path-based spectral clustering. Pattern Recognition, 2008，41(1): 191一203.
[28]Zhang X, Li J，Yu H. Local density adaptive similarity measurement for spectral clustering Pattern Recognition Letters, 2011，32(2): 352一358.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed