南京大学学报(自然科学版) ›› 2013, Vol. 49 ›› Issue (5): 619–627.


基于邻域粗糙集约减的谱聚类算法

贾洪杰1,2, 丁世飞1,2   

  • 出版日期:2014-02-08 发布日期:2014-02-08
  • 作者简介:(1. 中国矿业大学计算机科学与技术学院, 徐州 221116; 2. 中国科学院计算技术研究所智能信息处理重点实验室, 北京 100190)
  • Supported by:

    National Natural Science Foundation of China (61035003, 61175042, 61021062, 61305068), National 973 Program (2009CB320702), Jiangsu Province 973 Project (BK2011005), and the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-10-0476)

A spectral clustering algorithm based on neighborhood rough sets reduction

Jia Hong-Jie1,2, Ding Shi-Fei1,2   

  • Online:2014-02-08 Published:2014-02-08
  • About author:(1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China; 2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China)

摘要: 谱聚类算法是近年来机器学习领域的研究热点,它基于代数图论,可以有效地解决很多实际问题。但是传统的谱聚类算法无法很好地处理高维数据,容易受到噪声和不相关属性的干扰。为了降低计算复杂度,同时减弱噪声数据和冗余属性对聚类的负面影响,提出了一种基于邻域粗糙集约减的谱聚类算法(NRSR-SC)。该算法将信息熵引入到邻域粗糙集中,在保持样本区分能力的前提下,去除冗余的属性,保留对聚类贡献最大的属性;然后基于约简后的属性集合,计算样本点之间的相似度,构造相似性矩阵和拉普拉斯矩阵;最后利用谱方法得到最终的聚类结果。实验表明,NRSR-SC算法在处理高维数据时,具有较强的抗干扰能力,其运行效率和准确率都有明显改善。

Abstract: Spectral clustering has been a hot research topic in machine learning in recent years. Based on algebraic graph theory, it can effectively solve many practical problems. However, traditional spectral clustering algorithms do not work well on high-dimensional data, as they suffer from the interference of noise and irrelevant attributes. To reduce the computational complexity and weaken the negative impact of noisy data and redundant attributes on clustering, this paper proposes a spectral clustering algorithm based on neighborhood rough sets reduction (NRSR-SC). The algorithm introduces information entropy into neighborhood rough sets, so that redundant attributes are removed and the attributes contributing most to clustering are retained, while preserving the ability to distinguish samples of different classes. Then, based on the reduced attribute set, the similarities between sample points are computed to construct the affinity matrix and the Laplacian matrix. Finally, spectral methods are used to obtain the clustering result. Experiments show that, when dealing with high-dimensional data, the NRSR-SC algorithm has strong anti-interference ability, and both its efficiency and accuracy are significantly improved.
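The pipeline described in the abstract (similarity matrix → Laplacian → spectral embedding → clustering) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the neighborhood rough set reduction step is stood in for by a hypothetical pre-selected attribute subset `kept_attrs`, the affinity uses a Gaussian kernel with an assumed width `sigma`, and k-means uses a simple deterministic farthest-point initialization.

```python
import numpy as np

def _farthest_point_init(U, k):
    # Deterministic seeding: start from the first point, then greedily
    # add the point farthest from all chosen centers.
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min(((U[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(U[np.argmax(d)])
    return np.array(centers)

def spectral_clustering(X, n_clusters, sigma=1.0, n_iter=100):
    # Gaussian (RBF) affinity matrix W from pairwise squared distances
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Spectral embedding: rows of the k eigenvectors with smallest eigenvalues
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :n_clusters]
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Plain Lloyd's k-means on the embedded points
    centers = _farthest_point_init(U, n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == k].mean(0) if (labels == k).any()
                        else centers[k] for k in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Two well-separated Gaussian blobs plus one noisy, irrelevant attribute;
# dropping that attribute mimics the effect of the reduction step.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 3)), rng.normal(5, 0.3, (20, 3))])
X[:, 2] = rng.normal(0, 10, 40)   # irrelevant noisy attribute
kept_attrs = [0, 1]               # hypothetical reduced attribute set
labels = spectral_clustering(X[:, kept_attrs], n_clusters=2)
print(labels.tolist())
```

With the noisy attribute removed, the two blobs become two nearly disconnected components of the similarity graph, so the smallest Laplacian eigenvectors collapse each blob to a single point in the embedding and k-means separates them cleanly.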

