Journal of Nanjing University (Natural Sciences) ›› 2014, Vol. 50 ›› Issue (4): 482.
Liu Bo1, Wang Hongjun1*, Cheng Cong2, Yang Yan1
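Abstract: Subspace clustering can effectively discover the relationship between each cluster and the subspace it belongs to, while reducing the interference that redundant data and irrelevant attributes cause in the clustering of high-dimensional data. Existing subspace clustering methods emphasize finding the clusters within each subspace and often neglect how the subspaces themselves are partitioned. This paper proposes subspace clustering based on the maximum margin between attributes. The main idea is to minimize the information lost when the attributes are partitioned into subspaces, so that the resulting subspace clustering is of higher quality. The main contributions are as follows. First, an objective function for subspace partitioning is established, namely minimizing the mutual dependence among the partitioned subspaces. Second, an attribute-maximum-margin algorithm, Maximum Margin Subspace Clustering (MMSC), is designed to perform subspace clustering ensemble. Finally, experiments on data from UCI and the NIPS 2003 competition show that on most datasets MMSC obtains better clustering results than other subspace clustering algorithms.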
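The partitioning objective described in the abstract, grouping attributes so that dependence between different subspaces is as small as possible, can be illustrated with a minimal sketch. This is not the MMSC algorithm itself: the abstract does not give the dependence measure or the optimization procedure, so the sketch assumes absolute Pearson correlation as a stand-in for dependence and a greedy average-linkage merge as the search strategy. The function names `dependence_matrix`, `partition_attributes`, `cross_dependence`, and `demo` are hypothetical.

```python
import numpy as np


def dependence_matrix(X):
    """Pairwise attribute dependence.

    Assumption: absolute Pearson correlation is used as a proxy;
    the paper's actual dependence measure may differ.
    """
    D = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(D, 0.0)  # ignore self-dependence
    return D


def partition_attributes(D, n_subspaces):
    """Greedy heuristic (an assumption, not the paper's optimizer).

    Start with one group per attribute and repeatedly merge the two
    groups with the highest mean dependence, so that strongly
    dependent attributes end up in the same subspace.
    """
    groups = [[j] for j in range(D.shape[0])]
    while len(groups) > n_subspaces:
        best_score, best_pair = -1.0, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                score = D[np.ix_(groups[a], groups[b])].mean()
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        a, b = best_pair
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return [sorted(g) for g in groups]


def cross_dependence(D, groups):
    """Total dependence between attributes in different subspaces."""
    total = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            total += D[np.ix_(groups[a], groups[b])].sum()
    return total


def demo():
    """Synthetic data: attributes 0-2 share one latent factor, 3-5 another."""
    rng = np.random.default_rng(0)
    u = rng.normal(size=(500, 1))
    v = rng.normal(size=(500, 1))
    X = np.hstack([
        u + 0.1 * rng.normal(size=(500, 3)),
        v + 0.1 * rng.normal(size=(500, 3)),
    ])
    D = dependence_matrix(X)
    groups = partition_attributes(D, n_subspaces=2)
    return D, groups
```

Calling `demo()` returns the dependence matrix and the attribute groups found for the synthetic data; `cross_dependence(D, groups)` then gives the quantity the partition tries to keep small. In a full pipeline, each attribute group would be clustered separately and the per-subspace clusterings combined by an ensemble step, which this sketch leaves out.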