
Yan Liyu, Wei Wei*, Guo Xinyao, et al. Clustering ensemble algorithm based on random subspace with core[J]. Journal of Nanjing University (Natural Sciences), 2017, 53(6): 1033. [doi:10.13232/j.cnki.jnju.2017.06.005]





 Clustering ensemble algorithm based on random subspace with core
Yan Liyu (1), Wei Wei (1,2)*, Guo Xinyao (1), Cui Junbiao (1)
1. School of Computer & Information Technology, Shanxi University, Taiyuan, 030006, China;
2. Key Laboratory of Computation Intelligence & Chinese Information Processing, Ministry of Education, Taiyuan, 030006, China
Keywords: subspace clustering; clustering ensemble; rough set; complement mutual information
Clustering analysis, which has a wide range of applications, is an important task in data mining. Clustering algorithms now face large-scale, high-dimensional data, and traditional clustering algorithms do not cluster the sparse data found in high-dimensional spaces effectively. Subspace clustering, which aims to solve clustering problems on high-dimensional data, is an emerging and important branch of clustering analysis. On the one hand, as an extension of traditional clustering, subspace clustering plays a vital role in clustering high-dimensional data effectively. On the other hand, by integrating multiple clustering results of the original data set, clustering ensemble can produce a partition that better reflects the inherent structure of the data, which considerably improves clustering quality. A random subspace-based clustering ensemble algorithm generates subspaces by randomly sampling attributes and then combines the base clustering results obtained on these attribute subspaces into an ensemble clustering result. In this process, some random subspaces may contain few important attributes, which can ultimately lead to a poor ensemble clustering result. To address this problem, we propose a core-containing random subspace generation strategy. First, a set of important attributes is selected according to their complement mutual information values in rough set theory and used as the core of every attribute subspace; then the core is combined with attributes randomly sampled from the remaining attributes to construct a random subspace with core. This strategy not only preserves diversity among the subspaces but also strengthens each subspace's ability to represent the complete information of the data, which yields a better clustering ensemble result. Experiments on data sets from the UCI (University of California, Irvine) repository show that the clustering ensemble based on random subspaces with core outperforms the one based on completely random subspaces on the majority of the data sets.
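The core-containing subspace generation strategy described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical; complement entropy follows the standard definition E(P) = sum_i (|X_i|/|U|)(1 - |X_i|/|U|) over the equivalence classes X_i induced by an attribute set P; complement mutual information is taken as I(P;Q) = E(P) + E(Q) - E(P ∪ Q); and each attribute is scored by the sum of its complement mutual information with every other attribute, a scoring rule the abstract does not specify. The core size and the number of randomly sampled attributes are likewise assumed parameters.

```python
import random
from collections import Counter
from itertools import combinations


def complement_entropy(rows, attrs):
    """E(P) = sum_i (|X_i|/|U|) * (1 - |X_i|/|U|), where the X_i are the
    equivalence classes induced on the objects by the attribute subset `attrs`."""
    n = len(rows)
    blocks = Counter(tuple(row[a] for a in attrs) for row in rows)
    return sum((c / n) * (1 - c / n) for c in blocks.values())


def complement_mutual_information(rows, p, q):
    """I(P;Q) = E(P) + E(Q) - E(P u Q), equivalently E(Q) - E(Q|P)."""
    joint = sorted(set(p) | set(q))
    return (complement_entropy(rows, p) + complement_entropy(rows, q)
            - complement_entropy(rows, joint))


def select_core(rows, n_attrs, core_size):
    # ASSUMPTION: an attribute's importance is the sum of its pairwise
    # complement mutual information with every other attribute.
    scores = [0.0] * n_attrs
    for a, b in combinations(range(n_attrs), 2):
        mi = complement_mutual_information(rows, [a], [b])
        scores[a] += mi
        scores[b] += mi
    ranked = sorted(range(n_attrs), key=lambda a: scores[a], reverse=True)
    return sorted(ranked[:core_size])


def random_subspaces_with_core(rows, core_size, n_random, n_subspaces, seed=0):
    """Hypothetical helper: every subspace = fixed core + a random sample
    of `n_random` attributes drawn from the non-core attributes."""
    n_attrs = len(rows[0])
    core = select_core(rows, n_attrs, core_size)
    rest = [a for a in range(n_attrs) if a not in core]
    rng = random.Random(seed)
    subspaces = []
    for _ in range(n_subspaces):
        sampled = rng.sample(rest, min(n_random, len(rest)))
        subspaces.append(sorted(core + sampled))
    return core, subspaces
```

Every generated subspace contains the core, so each base clustering sees the attributes judged most informative, while the randomly sampled part keeps the subspaces diverse. Running a base clustering algorithm on each subspace and fusing the resulting partitions into the final ensemble result is not shown here.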






Last Update: 2017-11-26