南京大学学报(自然科学版) ›› 2015, Vol. 51 ›› Issue (1): 197205.
金 萍1,3,宗 瑜1,3,屈世超2,胡 燕2,田 园2
Jin Ping1,3, Zong Yu1,3, Qu Shichao2, Hu Yan2, Tian Yuan2
摘要: 不确定数据聚类是传统数据挖掘的扩展,面对不确定数据聚类,研究者们经常把聚类问题描述成组合优化问题,并设计启发式聚类算法进行求解。现有的启发式聚类算法,如UK-means和UK-Medoids具有容易理解和实现简单等优点,但初始解敏感问题严重影响了聚类质量。本文在近似骨架理论的基础上,提出了一种近似骨架启发式聚类算法APPGCU (APProximate backbone guided heuristic clustering algorithm for uncertain data)。该算法首先对原数据集完成P次采样,在采样后的规模较小的P个数据集上分别执行UK-Medoids算法得到P个局部最优解;然后通过对P个局部最优解求交得到近似骨架,并从中提取初始簇心;最后从初始簇心开始,启发式搜索出聚类结果。在仿真和实际数据集中的实验结果表明,算法APPGCU的聚类结果明显高于实验对比的启发式聚类算法,提高了聚类质量。
[1] Nikos P, Ioannis K, Evangelos E, et al. Clustering uncertain trajectories. Knowledge Infomation System, 2011, 28: 117~147. [2] Steffen F, Wolfgang G, Michael K. Recurrent neural networks for fuzzy data. Integrated Computer-Aided Engineering, 2011, 18: 265~280. [3] Alastair C, Jana E, Chris J, et al. A state space approach to extracting the signal from uncetain data. Journal of Business and Economic Statistics, 2012, 30(2): 173~180. [4] Ding X F, Jin H. Efficient and progressive algorithms for distributed skyline queries over uncertain data. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(8): 1448~1462. [5] 汤克明, 戴彩艳, 陈 崚. 一种基于滑动窗口的不确定数据量Top-K查询算法. 南京大学学报(自然科学), 2012,48(3):351-359. [6] Qin B, Xia Y N, Sunil P. Rule introduction for uncertain data. Knowledge Information Sytem, 2011, 29: 103~130. [7] 徐健锋, 张远健, zhou D, et al. 基于粒计算的不确定时间序列建模及其聚类. 南京大学学报(自然科学),2014, 50(1):86~94. [8] Drineas P, Frieze R, Vempala S, et al. Clustering large graphs via singular value decompostion. Machine Learning, 2004, 56(1-3): 9~33. [9] Reeves C R. Landscapes, operators and heuristic search. Annals of Operation Research, 1999, 86(1): 473~490. [10] Chau R, Cheng M, Kao B, et al. Uncertain data mining: An example in clustering location data. In: Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Miing, Springer Verlag, 2006: 199~204. [11] Gullo F, Ponti G, Tagarelli A, et al. Clustering uncertain data via k-medoids. In: Proceeings of the 2nd International Conference on Scalable Uncertainty Management, Springer Verlag, 2008: 229~242. [12] 宗 瑜,江 贺,李明楚. 近似骨架导向的归约聚类算法. 电子与信息学报, 2009, 31(12): 2953~2957. [13] 江 贺, 张宪超, 陈国良. 图的二分问题唯一全局最优解实例与骨架计算复杂性. 科学通报, 2007, 52(17): 2077~2081. [14] 江 贺, 张宪超, 陈国良等. 二次分配问题的骨架分析与算法设计. 中国科学E辑, 2008, 38(2): 209~222 . [15] Valnir F J. Backbone guided dynamic local search for propositional satisfiability. In: Prceeding of 9th International Symposium on Artificial Intelligence and Mathematics (AI & Math-06), Florida, America, Springer Verlag, 2006:100~108. [16] Zhang W X. Configuration landscape analysis and backbone guided local search: Part I: Satisfiability and maximum satisfiability. Artificial Intelligence, 2004, 158(1):1~26. [17] 江 贺,邱 铁,胡 燕等. 启发式算法设计中的骨架分析与应用. 自动化学报, 2011, 37(3): 257~269. [18] Wang K, Ngai B, Kao C K, et al. Efficient clustering of uncertain data. In: Proceedings of the 6th International Conference on Data Mining, IEEE Computer Society, 2006: 436~445. [19] Zong Y, Xu G D, Jin P, et al. HC_AB: A new heuristic clustering algorithm based on aproximate backbone. Information Processing Letters, 2011, 111: 857~863. [20] Khan S S, Ahmad A. Cluster center initialization algorithm for K-means clustering. Journal of Pattern Recognition Letters, 2004, 25(11): 1293~1302. [21] Kang P, Cho S Z. K-means clustering seeds initialization based on centrality, sparsely and isotropy. In: Proceedings of the 10th International Conference on Intelligent Data Engineering and Automated Learning, IEEE Computer Society, 2009: 109~117. [22] Redmond S J, Heneghan C. A method for initializing the K-means clustering algorithm uing kd-tree. Journal of Pattern Recognition Letters, 2007, 28(8): 965~973. [23] 陈永彬, 张 琢. 智能单粒子优化算法在聚类分析中的应用. 南京大学学报(自然科学), 2011, (5):578~584. |
No related articles found! |
|