Journal of Nanjing University (Natural Sciences) ›› 2014, Vol. 50 ›› Issue (4): 505–.


Whole-granulation cluster algorithm

Li Feijiang1*, Cheng Honghong2, Qian Yuhua1

  • Online: 2014-08-23 Published: 2014-08-23
  • About author: 1. School of Computer and Information Technology, Shanxi University, Taiyuan, 030006, China; 2. School of Mathematics, Shanxi University, Taiyuan, 030006, China
  • Supported by:
     Specialized Research Fund for the Doctoral Program of Higher Education (20121401110013), Program for New Century Excellent Talents in University (NCET 12 1031)

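Summary: Cluster analysis is an important research direction in data mining and knowledge discovery. Similarity is a core concept in most clustering algorithms, in which the similarity between objects is computed either directly or indirectly. Traditional similarity measures mostly observe the two objects being compared at a single granularity. In human cognition, multiple granularities are usually used to solve problems more reasonably and effectively. Drawing on this multi-granulation cognitive mechanism, this paper proposes a new similarity learning method, called the whole-granulation similarity measure, and develops a whole-granulation cluster algorithm based on it. The whole-granulation similarity measure observes the objects being compared from every perspective and thereby obtains a more faithful similarity between two objects. Experiments are carried out on five data sets selected from the UCI repository, and a comparison with two traditional clustering methods verifies the rationality and effectiveness of the whole-granulation cluster algorithm.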

Abstract: In cluster analysis, especially when clustering is posed as an optimization process, one of the factors that most strongly affects the result is the similarity measure employed in the clustering criterion function. So far, all proposed clustering methods have had to assume some relation among the data objects to which they are applied. The similarity between every pair of objects must be computed, either explicitly or implicitly. Hence, whether the similarity measure describes the structure of the data correctly determines the effectiveness of a clustering algorithm. In addition, multi-granulation cognition, one of the important characteristics of human cognition, plays a key role in data modeling. Because it analyzes a problem from multiple perspectives and at multiple levels, multi-granulation analysis can obtain more reasonable and more satisfactory solutions. Drawing on this human multi-granulation cognitive ability, this paper introduces a novel similarity measure, the whole-granulation similarity measure, and applies it in the clustering criterion function to obtain the whole-granulation cluster algorithm, which serves to verify the rationality of the measure. Traditional dissimilarity/similarity measures use only a single viewpoint, usually the origin. Because the whole-granulation measure takes all viewpoints into consideration, it can give a more informative assessment of similarity. As a leading partitional clustering technique, k-means is one of the most widely used algorithms because it is fast and easy to combine with other methods. Many studies have extended k-means by improving its heuristic function or by combining it with other methods, and this remains an active line of clustering research. Following this approach, we introduce our measure into cluster analysis through the k-means algorithm as an initial test.
Experiments are conducted on five data sets selected from the UCI Machine Learning Repository. The whole-granulation cluster algorithm is compared with two traditional cluster algorithms to verify its validity and, at the same time, the rationality of the whole-granulation similarity measure. A convergence experiment further shows that the whole-granulation similarity measure performs strongly as a way to measure similarity.
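
The abstract states that the measure is plugged into the clustering criterion through k-means, but it does not define the whole-granulation similarity measure itself. The following is therefore only a minimal sketch of standard Lloyd-style k-means with a pluggable dissimilarity function, showing where a different measure could be substituted. The names `kmeans`, `squared_euclidean`, and `initial_centers` are hypothetical, and `squared_euclidean` is a placeholder baseline, not the paper's measure.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Placeholder single-viewpoint dissimilarity (NOT the paper's measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(
    data: List[Point],
    initial_centers: List[Point],
    dissimilarity: Callable[[Point, Point], float] = squared_euclidean,
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means with a pluggable dissimilarity.

    Returns (labels, centers). Initial centers are passed in explicitly
    (an assumption of this sketch) so that the run is deterministic.
    """
    centers = [list(c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its least dissimilar center.
        new_labels = [
            min(range(len(centers)), key=lambda j: dissimilarity(p, centers[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels, centers


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (10, 10), (10, 11)]
    labels, centers = kmeans(points, initial_centers=[(0, 0), (10, 10)])
    print(labels, centers)
```

A different measure would be passed as the `dissimilarity` argument. Note that the mean-based update step is the natural choice for squared Euclidean distance; with another measure, the update rule may need to change as well.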
