南京大学学报(自然科学版) ›› 2015, Vol. 51 ›› Issue (1): 197–205.

• • 上一篇    下一篇

 面向不确定数据的近似骨架启发式聚类算法

 金 萍1,3,宗 瑜1,3,屈世超2,胡 燕2,田 园2
  

  • 出版日期:2015-01-04 发布日期:2015-01-04
  • 作者简介: (1. 皖西学院信息工程学院,六安,237012;2.大连理工大学软件学院,大连,116621;
    3.中国科学与技术大学计算机科学与技术学院,合肥,230027)
  • 基金资助:
     国家自然科学基金(61073110),安徽省自然科学基金面上项目(1208085MF95),安徽省教育厅自然科学基金重点项目(KJ2012A273, KJ2012A274),留学人员科研活动项目择优资助项目

 Approximate backbone guided heuristic clustering algorithm for uncertain data

 Jin Ping1,3, Zong Yu1,3, Qu Shichao2, Hu Yan2, Tian Yuan2
  

  • Online:2015-01-04 Published:2015-01-04
  • About author: (1. School of Information and Engineering, West Anhui University, Lu’an, 237012, China;
    2. School of Software, Dalian University of Technology, Dalian, 116621, China;
    3. Department of Computer Science and Technology,
    University of Science and Technology of China, Hefei, 230027, China)

摘要:  不确定数据聚类是传统数据挖掘的扩展,面对不确定数据聚类,研究者们经常把聚类问题描述成组合优化问题,并设计启发式聚类算法进行求解。现有的启发式聚类算法,如UK-means和UK-Medoids具有容易理解和实现简单等优点,但初始解敏感问题严重影响了聚类质量。本文在近似骨架理论的基础上,提出了一种近似骨架启发式聚类算法APPGCU (APProximate backbone guided heuristic clustering algorithm for uncertain data)。该算法首先对原数据集完成P次采样,在采样后的规模较小的P个数据集上分别执行UK-Medoids算法得到P个局部最优解;然后通过对P个局部最优解求交得到近似骨架,并从中提取初始簇心;最后从初始簇心开始,启发式搜索出聚类结果。在仿真和实际数据集中的实验结果表明,算法APPGCU的聚类结果明显高于实验对比的启发式聚类算法,提高了聚类质量。 

Abstract:  As an extension of traditional data mining, uncertain data clustering gets wide interest of researchers. Uncertain data clustering is equivalent to a combinatorial optimization problem, and researchers usually solve it by using heuristic algorithms. As we known, the existence heuristic clustering algorithms for uncertain data, such as UK-Means and UK-Medoids, are easy to understand and to implement, but these algorithm also have the initialization sensitivity problem which affecting the clustering quality severely. In this paper, we propose an approximate backbone based heuristic clustering algorithm APPGCU (APProximate backbone guided heuristic clustering algorithm for uncertain data). In this algorithm, we first do P times sampling on the original dataset, and run UK-Medoids on the P sampled datasets to get P sub-optimal solutions; then get the approximate backbone from the P sub-optimal solutions, and extract the initialization cluster center; eventually, re-run UK-Medoids with the initialization, get the solution by a heuristic search. The experiments on synthetic and standard UCI uncertain datasets demonstrate that APPGCU gets better clustering results than existing heuristic clustering algorithms, and improves the quality of clustering results.



 [1] Nikos P, Ioannis K, Evangelos E, et al. Clustering uncertain trajectories. Knowledge Infomation System, 2011, 28: 117~147.
[2] Steffen F, Wolfgang G, Michael K. Recurrent neural networks for fuzzy data. Integrated Computer-Aided Engineering, 2011, 18: 265~280.
[3] Alastair C, Jana E, Chris J, et al. A state space approach to extracting the signal from uncetain data. Journal of Business and Economic Statistics, 2012, 30(2): 173~180.
[4] Ding X F, Jin H. Efficient and progressive algorithms for distributed skyline queries over uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(8): 1448~1462.
[5] 汤克明, 戴彩艳, 陈 崚. 一种基于滑动窗口的不确定数据量Top-K查询算法. 南京大学学报(自然科学), 2012,48(3):351-359.
[6] Qin B, Xia Y N, Sunil P. Rule introduction for uncertain data. Knowledge Information Sytem, 2011, 29: 103~130.
[7] 徐健锋, 张远健, zhou D, et al. 基于粒计算的不确定时间序列建模及其聚类. 南京大学学报(自然科学),2014, 50(1):86~94.
[8] Drineas P, Frieze R, Vempala S, et al. Clustering large graphs via singular value decompostion. Machine Learning, 2004, 56(1-3): 9~33.
[9] Reeves C R. Landscapes, operators and heuristic search. Annals of Operation Research, 1999, 86(1): 473~490.
[10] Chau R, Cheng M, Kao B, et al. Uncertain data mining: An example in clustering location data. In: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Miing, Springer Verlag, 2006: 199~204.
[11] Gullo F, Ponti G, Tagarelli A, et al. Clustering uncertain data via k-medoids. In: Proceeings of the 2nd International Conference on Scalable Uncertainty Management, Springer Verlag, 2008: 229~242.
[12] 宗 瑜,江 贺,李明楚. 近似骨架导向的归约聚类算法. 电子与信息学报, 2009, 31(12): 2953~2957.
[13] 江 贺, 张宪超, 陈国良. 图的二分问题唯一全局最优解实例与骨架计算复杂性. 科学通报, 2007, 52(17): 2077~2081.
[14] 江 贺, 张宪超, 陈国良等. 二次分配问题的骨架分析与算法设计. 中国科学E辑, 2008, 38(2): 209~222 .
[15] Valnir F J. Backbone guided dynamic local search for propositional satisfiability. In: Prceeding of 9th International Symposium on Artificial Intelligence and Mathematics (AI & Math-06), Florida, America, Springer Verlag, 2006:100~108.
[16] Zhang W X. Configuration landscape analysis and backbone guided local search: Part I: Satisfiability and maximum satisfiability. Artificial Intelligence, 2004, 158(1):1~26.
[17] 江 贺,邱 铁,胡 燕等. 启发式算法设计中的骨架分析与应用. 自动化学报, 2011, 37(3): 257~269.
[18] Wang K, Ngai B, Kao C K, et al. Efficient clustering of uncertain data. In: Proceedings of the 6th International Conference on Data Mining, IEEE Computer Society, 2006: 436~445.
[19] Zong Y, Xu G D, Jin P, et al. HC_AB: A new heuristic clustering algorithm based on aproximate backbone. Information Processing Letters, 2011, 111: 857~863.
[20] Khan S S, Ahmad A. Cluster center initialization algorithm for K-means clustering. Journal of Pattern Recognition Letters, 2004, 25(11): 1293~1302.
[21] Kang P, Cho S Z. K-means clustering seeds initialization based on centrality, sparsely and isotropy. In: Proceedings of the 10th International Conference on Intelligent Data Engineering and Automated Learning, IEEE Computer Society, 2009: 109~117.
[22] Redmond S J, Heneghan C. A method for initializing the K-means clustering algorithm uing kd-tree. Journal of Pattern Recognition Letters, 2007, 28(8): 965~973.
[23] 陈永彬, 张 琢. 智能单粒子优化算法在聚类分析中的应用. 南京大学学报(自然科学), 2011, (5):578~584.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!