南京大学学报(自然科学版) ›› 2012, Vol. 48 ›› Issue (2): 154–163.

• • 上一篇    下一篇

 基于形态学的单词一文档谱聚类方法*

 刘娜1.2,肖智博1,鲁明羽1**   

  • 出版日期:2015-05-23 发布日期:2015-05-23
  • 作者简介: (1.大连海事大学信息科学技术学院,大连,116026;
    2.大连工业大学信息科学与工程学院,大连,116034)
  • 基金资助:
     国家自然科学基金(6117503,61073133),中央高校基本科研业务费项日(2011ZD010)

 Method for spectral co-clustering documents and words based on morphology

 Liu Na1’2,Xiao Zhi Bo1 ,Lu Minh-Yu1
  

  • Online:2015-05-23 Published:2015-05-23
  • About author: (1 .Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026,China;
    2. Department of Information Science and Engineering, Dalian Polytechnic University, Dalian, 116034,China)

摘要:  木文利用形态学的方法确定聚类数目,并对单词一文档谱聚类方法进行改进.确定聚类数目主要分三个步骤:第一步将单词一文档谱聚类方法中产生的矩阵转换成可视化聚类趋势分析方法(visual assessment of tendency, VAT)灰度图,第二步利用灰度形态学、图像二值化、距离转换等图像处理技术过滤产生的V A’I}灰度图,第三步对过滤后的V AT 灰度图建立信号图,并进行平滑处理,通过平滑后的信号图的波峰波谷数目确定文档集的聚类数目.实验表明,该方法能够提高单词一文档谱聚类方
法的聚类效果.

Abstract:  One of the major problems in spectral co-clustering analysis is the determination of the number of clusters in datasets,which is a basic input for most spectral co-clustering algorithms, In this paper, we propose.new method for automatically estimating the number of clusters in datasets and modify spectral co-clustering
documents and words, which is based on an existing algorithm for visual assessment of tendency(VAT)of a data set,using several common image and signal processing techniques.The method determining the number of clusters includes three main steps. First,the input matrix generated by spectral co-clustering documents and words is created into reordered dissimilarity gray image, from the image it is better able to highlight the potential cluster structure in dataset. We generate gray image use the VAT algorithm.Then, sequential image processing operations are used to segment the regions of interest in the gray image and to convert the filtered image into a distance- transformed image.There processing operations consist of gray morphology, image binarization, distance
transform. Finally, we project the transformed image onto the diagonal axis,which yields a one-dimensional signal,from which we can extract the number of clusters by major peaks and valleys after smoothing signal. When the number of clusters is determined,we modify the typical spectral co-clustering algorithm.The primary difference between typical spectral co-clustering algorithm and our method for spectral co-clustering documents and words based on morphology is that the number of clusters is set by user before the algorithm in typical algorithm, but user may not know the true value of it,if that is the case, the value maybe set incorrectly, incorrect parameter settings may lead to inaccurate clustering. Automatic determination of the cluster number in our proposed algorithm also
helps to improve the clustering results. Our experimental results show that the proposed method works well in practice.

[1]Luxburg U V. A tutorial on spectral clustering. Statistics and Computing, 2007,17(4): 395一416.
[2]Ng A Y,Jordan M 1,Weiss Y. On spectral clustering; Analysis and an algorithm. Michael I J,Yann L C,Sara A S. Advances in Neural
information Processing Systems. Massachu- setts,Massachusetts; MlT Press, 2002,849一 856.
[3]Tian Z, Li X B, Ju Y W. Spectral clustering based on matrix perturbation theory. Science in China Scrics F; lnformation Sciences, 2007,50(1):63一81.
[4]Ng A Y,Joradan M 1,Weiss Y. On spectral clustering; Analysis and an algorithm.Thomas G D, Suzanna B, Zoubin G. Advances in Neural
Information Processing Systems 14. Massachu- sots:MlT Press, 2001,849一856.
[5]Pollito M,Poona P. Grouping and dimension ality reduction by locally linear embedding. Ad vances in Neural Information Processing Sys
tems 14. England:MIT Press,2002,1255~1262
[6]Zheng X, Lin X Y. Automatic determination of intrinsic cluster number family in spectral clus-tering using random walk on graph. 2004 lnter-
national Conference on Image Processing (ICIP). Singapore; IEEE,2004,3471一3474.
[7]Li K,Liu Y S. A spectral clustering algorithm based on self-adaption. 2007 International Con- ference on Machine Learning and Cybernetics
(ICMLC).Hong Kong; IEEE,2007,3965一3968.
[8]Cai X Y,Dai G Z, Yang L B, et al. A self- adaptive spectral clustering algorithm. Cheng D Z, Li C. Proceedings of the 27th Chinese Con-
trol Conference. Kunming; Beihang University Press. 2008,551一553.
[9]Piao D Y,Zhang D Q. Adaptive spectral cluster ring algorithm. Jorunal of Shandong University (Engineering Science),2009,39(5);22一26.(卜
德云,张道强.自适应谱聚类算法研究.山东大学学报(工学版),2009,39(5):22-26).
[10]Kong W Z, Sun Z H,Yang C,et al. Automatic spectral clustering based on cigengap and oi- thogonal cigcnvcctor. Acta Elcctronica Sinica,
2010,38(8) : 1980- 1985.(孔万增,孙志海,杨灿等.基于木征间隙与正交特征向量的自动谱聚类.电子学报,2010,38(8):1980-1985).
[11]Hu J,Huang H K,Gao F. A clustering algo- rithm for parallel coordinates-based measure model and its applications. Journal of Nanjing University(Natural Sciences),2009,45(5): 645-655.(胡俊,黄厚宽,高芳.一种基于平行坐标度量模型的聚类算法及其应用.南京大学学报(自然科学),2009,45(5):645-655).
[12]Inderjit D. Co-clustcring documents and words using bipartite spectral graph partitioning. Dohcon Lee. KDD’0l ACM SIUKDD lnterna-
tional Conference on Knowledge Discovery and Data Mining. San Francisco:ACM,2001,269一274.
[13]Bezdek J C,Hathaway R J. VAT:A tool for visual assessment of(cluster) tendency. Pro- cecdings of the International Joint Conference on
Neural Networks. Piscataway: IEEE Press, 2002,2225一2230.
[14]Soille P. Morphological image analysis; Princi- ples and applications. USA:Springer-Verlag,1999,391.
[15]Otsu N. A threshold selection method from gray-level histograms, IEEE Transactions on System Man and Cybernetic, 1979,9(1): 62一66.
[16]Strehl A,Ghosh J. Cluster ensembles-a knowl- edge rcnse framework for combining partition ings.
The Journal of Machine Learning Re- search, 2002,3:583一617.










No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!