Journal of Nanjing University (Natural Science) ›› 2014, Vol. 50 ›› Issue (4): 482–.


  • Supported by: National Natural Science Foundation of China (61003142, 61262058, 61175047, 61170111)

Maximum margin model for subspace clustering

 Liu Bo1, Wang Hongjun1, Cheng Cong2, Yang Yan1   

  • Online:2014-08-23 Published:2014-08-23
  • About author: 1. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, 610031, China;
    2. School of Computer Science, Sichuan University, Chengdu, 610054, China


Abstract: Subspace clustering can effectively discover the relationship between each cluster and the subspace it belongs to, while reducing the interference that data redundancy and irrelevant features cause in high-dimensional datasets. Existing subspace clustering algorithms focus on detecting clusters within subspaces, but tend to ignore how the feature space is divided into subspaces. This paper proposes a subspace clustering method based on the maximum margin of features. Its main idea is to minimize the information lost when dividing the feature space into subspaces, which in turn improves the clustering results. The paper makes two contributions. First, it formulates an objective function for subspace division that minimizes the dependence between the resulting subspaces. Second, it designs a subspace clustering algorithm, Maximum Margin Subspace Clustering (MMSC), which combines the per-subspace clusterings into an ensemble. Finally, experiments on UCI and NIPS 2003 competition datasets show that MMSC obtains better clustering results than other subspace clustering algorithms on most datasets.
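The abstract describes a two-step pipeline: divide the feature space into subspaces with low mutual dependence, then cluster each subspace and combine the partitions into an ensemble. The sketch below is purely illustrative and is not the authors' MMSC implementation: it substitutes absolute Pearson correlation for the paper's maximum-margin dependence criterion, uses a plain k-means per subspace, and combines partitions through a standard co-association matrix. All function names (`partition_features`, `ensemble`) are hypothetical.

```python
# Illustrative sketch of subspace clustering with ensemble combination.
# NOT the authors' MMSC algorithm: feature dependence is approximated
# here with absolute Pearson correlation, whereas the paper uses a
# maximum-margin criterion for the subspace division.
import numpy as np

def partition_features(X, n_subspaces):
    """Greedily group features so correlated features share a subspace,
    keeping dependence *between* subspaces comparatively low."""
    d = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-feature dependence
    np.fill_diagonal(corr, 0.0)
    groups = [[i] for i in range(n_subspaces)]   # seed groups with first features
    for j in range(n_subspaces, d):
        # assign feature j to the group it is most correlated with
        scores = [corr[j, g].mean() for g in groups]
        groups[int(np.argmax(scores))].append(j)
    return groups

def kmeans(X, k, n_iter=50, seed=0):
    """Tiny k-means, enough to cluster one subspace."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def ensemble(label_sets, k):
    """Combine per-subspace partitions via a co-association matrix:
    co[i, j] = fraction of partitions placing points i and j together."""
    n = len(label_sets[0])
    co = np.zeros((n, n))
    for labels in label_sets:
        co += (labels[:, None] == labels[None, :])
    co /= len(label_sets)
    return kmeans(co, k)  # cluster rows of the co-association matrix
```

A co-association matrix is a common consensus device for clustering ensembles; the paper's actual combination step may differ.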

 [1] Verleysen M. Learning high-dimensional data. In: Ablameyko S, Goras L, Gori M, et al., eds. Limitations and Future Trends in Neural Computation. Siena: IOS Press, 2003: 141-162.
[2] Parsons L, Haque E, Liu H. Subspace clustering for high dimensional data: A review. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 90-105.
[3] Agrawal R, Gehrke J, Gunopulos D, et al. Automatic subspace clustering of high dimensional data for data mining applications. In: Proceedings of the ACM International Conference on Management of Data (SIGMOD). New York: ACM Press, 1998, 27(2): 94-105.
[4] Cheng C H, Fu A W, Zhang Y. Entropy-based subspace clustering for mining numerical data. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 1999: 84-93.
[5] Goil S, Nagesh H, Choudhary A. MAFIA: Efficient and scalable subspace clustering for very large data sets. Technical Report CPDC-TR-9906-010, Northwestern University, 1999.
[6] Nagesh H S. High performance subspace clustering for massive data sets. Master's thesis, Northwestern University, 1999.
[7] Nagesh H S, Goil S, Choudhary A. A scalable parallel subspace clustering algorithm for massive data sets. In: Proceedings of the 2000 International Conference on Parallel Processing. Toronto: IEEE Press, 2000: 477-484.
[8] Chang J W, Jin D S. A new cell-based clustering method for large, high-dimensional data in data mining applications. In: Proceedings of the 2002 ACM Symposium on Applied Computing. New York: ACM Press, 2002: 503-507.
[9] Liu B, Xia Y, Yu P S. Clustering through decision tree construction. In: Proceedings of the 9th International Conference on Information and Knowledge Management. New York: ACM Press, 2000: 20-29.
[10] Procopiuc C M, Jones M, Agarwal P K, et al. A Monte Carlo algorithm for fast projective clustering. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2002: 418-427.
[11] Aggarwal C, Wolf J L, Yu P S, et al. Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 1999, 28(2): 61-72.
[12] Aggarwal C, Yu P S. Finding generalized projected clusters in high dimensional spaces. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2000, 29(2): 70-81.
[13] Woo K G, Lee J H. FINDIT: A fast and intelligent subspace clustering algorithm using dimension voting. Information and Software Technology, 2004, 46(4): 255-271.
[14] Yang J, Wang W, Wang H X, et al. δ-clusters: Capturing subspace correlation in a large data set. In: Proceedings of the 18th International Conference on Data Engineering. San Jose: IEEE Press, 2002: 517-528.
[15] Friedman J H, Meulman J J. Clustering objects on subsets of attributes. Journal of the Royal Statistical Society: Series B, 2004, 66(4): 815-849.
[16] Wang D, Ding C, Li T. K-subspace clustering. In: Machine Learning and Knowledge Discovery in Databases. Berlin: Springer, 2009: 506-521.
[17] Muller E, Assent I, Gunnemann S, et al. Relevant subspace clustering: Mining the most interesting non-redundant concepts in high dimensional data. In: Proceedings of the IEEE 9th International Conference on Data Mining. Miami: IEEE Press, 2009: 377-386.
[18] Kriegel H P, Kroger P, Renz M, et al. A generic framework for efficient subspace clustering of high-dimensional data. In: Proceedings of the IEEE 5th International Conference on Data Mining. Houston: IEEE Press, 2005: 8.
[19] Moise G, Sander J, Ester M. P3C: A robust projected clustering algorithm. In: Proceedings of the IEEE 6th International Conference on Data Mining. Hong Kong: IEEE Press, 2006: 414-425.
[20] Gullo F, Domeniconi C, Tagarelli A. Projective clustering ensembles. In: Proceedings of the IEEE 9th International Conference on Data Mining (ICDM). Miami: IEEE Press, 2009: 794-799.
[21] Gullo F, Domeniconi C, Tagarelli A. Enhancing single-objective projective clustering ensembles. In: Proceedings of the IEEE 10th International Conference on Data Mining (ICDM). Sydney: IEEE Press, 2010: 833-838.
[22] Gullo F, Domeniconi C, Tagarelli A. Advancing data clustering via projective clustering ensembles. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2011: 733-744.
[23] Reshef D N, Reshef Y A, Finucane H K, et al. Detecting novel associations in large data sets. Science, 2011, 334(6062): 1518-1524.
[24] Zhou Z H, Tang W. Clusterer ensemble. Knowledge-Based Systems, 2006, 19(1): 77-83.
[25] García S, Fernández A, Luengo J, et al. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences, 2010, 180: 2044-2064.