Journal of Nanjing University (Natural Sciences), 2014, Vol. 50, Issue (4): 505.
Li Feijiang (李飞江)1*, Cheng Honghong (成红红)2, Qian Yuhua (钱宇华)1
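Abstract: Cluster analysis is an important research direction in data mining and knowledge discovery. Similarity is a core concept in most clustering algorithms, which compute the similarity between objects either directly or indirectly. Traditional similarity measures mostly observe the two objects being compared at a single granularity. In human cognition, by contrast, problems are usually solved more reasonably and effectively by working at multiple granularities. Drawing on this multigranulation cognitive mechanism, this paper proposes a new similarity learning method, called the whole-granulation similarity measure, and develops a whole-granulation clustering algorithm based on it. The whole-granulation similarity measure observes the compared objects from every angle and therefore yields a similarity between two objects that is closer to their true similarity. Five data sets selected from the UCI repository are used in the experiments, and a comparison with two traditional clustering methods verifies the rationality and effectiveness of the whole-granulation clustering algorithm.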
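The abstract does not give the formula of the whole-granulation similarity measure, so the following is only an illustrative sketch of the general idea of observing two objects at every granularity, not the authors' definition. It assumes categorical attributes, treats each non-empty attribute subset as one granularity, and counts two objects as similar at that granularity only if they agree on every attribute in the subset. The function names `whole_granulation_similarity` and `single_granulation_similarity` are hypothetical.

```python
from itertools import combinations


def single_granulation_similarity(x, y):
    """Baseline for comparison: fraction of attributes on which x and y agree."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects must have the same non-zero number of attributes")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def whole_granulation_similarity(x, y):
    """Hypothetical illustration, NOT the paper's formula.

    Each non-empty subset of attribute indices is treated as one granularity.
    The objects count as similar at a granularity only if they agree on every
    attribute in that subset; the result is the fraction of granularities at
    which they are similar.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("objects must have the same non-zero number of attributes")
    m = len(x)
    agree = 0
    total = 0
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            total += 1
            if all(x[i] == y[i] for i in subset):
                agree += 1
    return agree / total


if __name__ == "__main__":
    a = ("red", "round", "small")
    b = ("red", "round", "large")
    print(single_granulation_similarity(a, b))
    print(whole_granulation_similarity(a, b))
```

Under these assumptions, two objects that agree on k of m attributes are similar at 2^k - 1 of the 2^m - 1 granularities, so the loop is only there to make the enumeration explicit; its cost grows exponentially with the number of attributes and would not suit high-dimensional data as written.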