Journal of Nanjing University (Natural Sciences) ›› 2014, Vol. 50 ›› Issue (4): 448–.


Semi-supervised competitive agglomeration algorithm based on dot density and application in image clustering

Yu Ping, Wang Shitong

  • Online: 2014-08-23 Published: 2014-08-23
  • About the authors: School of Digital Media, Jiangnan University, Wuxi 214122
  • Funding: National Natural Science Foundation of China (61170122), Natural Science Foundation of Jiangsu Province (BK2012552)

Abstract: Competitive agglomeration (CA) is a classic clustering algorithm that determines the number of clusters automatically. During iteration it identifies and discards spurious cluster centers until the remaining number of clusters suits the sample data. This avoids the effect of a poorly chosen cluster count on the clustering result, and no precise cluster number has to be set in advance. However, CA does not use the small amount of known information that is commonly present in sample data, even though such information can benefit the whole clustering process and improve the clustering result. Moreover, CA uses the Euclidean distance as its similarity measure. Although this distance is cheap to compute and widely used, it is suited only to spherical clusters and tends to partition data sets into equal parts. Given the diversity of sample data, both limitations restrict the algorithm's range of application. To address these problems, a semi-supervised term with learning ability is introduced to strengthen the partitioning capability of the membership matrix, so that the algorithm can exploit the known information in the sample data. In addition, the dot density of the sample data, which reflects the importance of a point in clustering, is used to build a distance adjustment factor that corrects the Euclidean distance and counteracts the tendency toward equal partition. The result is a semi-supervised CA algorithm based on dot density.
Four images, divided into two groups (an artificial image and real images), were used to examine segmentation, and three other algorithms were used for comparison. The segmentation results and the performance comparison show that the proposed algorithm obtains more accurate cluster centers and better clustering results.
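
The abstract gives no formulas, so the following Python sketch only illustrates two of the ideas it names: discarding spurious clusters during iteration, and measuring the dot density of each sample. Both function names, the cardinality threshold `min_cardinality`, and the neighbourhood `radius` are hypothetical. The density definition (fraction of other points within a radius) is an assumption made for illustration and may differ from the one used in the paper. The distance adjustment factor and the semi-supervised term are not reproduced.

```python
import numpy as np


def prune_clusters(memberships, centers, min_cardinality):
    """Discard clusters whose fuzzy cardinality is below a threshold.

    memberships: (n_points, n_clusters) array of fuzzy memberships u_ij.
    centers:     (n_clusters, n_features) array of cluster centers.
    min_cardinality: hypothetical threshold chosen by the user.
    """
    # Fuzzy cardinality of cluster j: N_j = sum over points i of u_ij.
    cardinality = memberships.sum(axis=0)
    keep = cardinality >= min_cardinality
    # Memberships of surviving clusters are returned unchanged; an iterative
    # algorithm would recompute them in its next update step.
    return memberships[:, keep], centers[keep]


def dot_density(points, radius):
    """Assumed density: fraction of the other points lying within `radius`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Count neighbours within the radius, excluding the point itself.
    neighbours = (dist <= radius).sum(axis=1) - 1
    return neighbours / max(len(points) - 1, 1)
```

In an iterative scheme, `prune_clusters` would be called once per iteration, so that an initial overestimate of the cluster count shrinks as low-cardinality clusters are removed.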
