南京大学学报(自然科学版) ›› 2017, Vol. 53 ›› Issue (5): 963–.

• • 上一篇    下一篇

 基于粒度空间的最小生成树分类算法

 孙梦梦1,唐旭清1,2*   

  • 出版日期:2017-09-25 发布日期:2017-09-25
  • 作者简介: 1.江南大学理学院,无锡,214122;2.无锡市生物计算工程技术研究中心,无锡,214122
  • 基金资助:
     基金项目:国家自然科学基金(11371174),国际科技合作研究项目(2011DFR70500)
    收稿日期:2017-07-20
    *通讯联系人,E-mail:txq5139@jiangnan.edu.cn

 Minimum spanning tree classification algorithm based on the granular space

 Sun Mengmeng1,Tang Xuqing1,2*   

  • Online:2017-09-25 Published:2017-09-25
  • About author:1.School of Science,Jiangnan University,Wuxi,214122,China;
    2.Wuxi Engineering Research Center for Biocomputing,Jiangnan University,Wuxi,214122,China

摘要:  基于粒度空间理论,进行了基于归一化距离的最小生成树分类算法研究.首先根据类内偏差和类间偏差的性质,在已有的粒度空间生成算法的基础上,引入最小生成树以及新的最优聚类指标,给出了基于归一化距离的最小生成树分类算法,并建立了最优聚类模型.其次,将模型应用于研究从NCBI上下载的1902-2015年间的898条现在已经确认能够感染人的禽流感病毒蛋白质序列HA与NA蛋白,共有8种,包括H5N1,H5N2,H7N2,H7N3,H7N7,H9N2,H10N7,以及最近的H7N9.在距离中心最近的基础上,通过运行最小生成树分类算法,6个代表病毒序列被选出,并且得到了最优层次结构.最后,对实验结果进行分析,结果表明病毒爆发地域差异、病毒爆发时间等因素对禽流感病毒的变异产生了重要影响,这些结果与已有的研究结果一致,说明本文提出的最小生成树分类算法是有效的.在寻找基于粒度空间的最佳聚类问题上,最小生成树分类算法比原有的算法具有更低的复杂度.这些结论为基于大数据的信息处理提供了一种全新的处理方法.

Abstract:  According to the granular space theory,minimum spanning tree classification algorithm is proposed based on normalized metric.Firstly,based on the existing representation and generation algorithm of granular space,by introducing the minimum spanning tree and the new optimization clustering index based on the intra-class deviation and inter-class deviation,an optimal model was established.Furthermore,the 8 subgroups(H5N1,H5N2,H7N2,H7N3,H7N7,H9N2,H10N7 and H7N9)of 898 avian influenza viruses containing both HA and NA protein were used as an experimental database.These avian influenza viruses occurred from 1902 to 2015 around the world and could infect people.Based on the characteristics of avian influenza virus data sets,the 898 avian influenza viruses were divided into two classes by running the algorithm first time.Each class contains varying amounts of the close relationship between viral sequences,respectively,842 and 56.Considering the complexity of the evolutionary tree structure,a signature virus representative is selected for each class of optimal clustering for more effective research and discussion of new methods.In order to further study the nature of avian influenza virus,the two types of influenza viruses were analyzed separately by the algorithm again.Based on the nearest principle,6 representative viruses were selected and a phylogenetic tree was constructed.Finally,comparing the results with those in the literature,we found that the variation of human influenza virus is closely related to the region and the outbreak time.These results are consistent with the results of previous studies,indicating that the algorithm is effective.The minimum spanning tree classification algorithm has lower complexity than the original algorithm in finding the optimization clustering.These conclusions provide a new approach to information processing based on large data.

 

[1] Lin T Y.Granular computing on binary relations I:Data mining and neighborhood systems.In:Skowron A,Polkowski L.Rough sets in knowledge discovery.Heidelberg:Physica-Verlag,1998:107-121.
[2] Lin T Y.Granular computing:Structures,representations,and applications.In:Wang G,Liu Q,Yao Y,et al.Rough sets,fuzzy sets,data mining,and granular computing.Springer Berlin Heidelberg,2003:16-24.
[3] Skowron A,Stepaniuk J.Information granules:Towards foundations of granular computing.International Journal of Intelligent Systems,2001,16(1):57-85.
[4] Zhang Y Q,Fraser M D,Gagliano R A,et al.Granular neural networks for numerical-linguistic data fusion and knowledge discovery.IEEE Transactions on Neural Networks,2000,11(3):658-667.
[5] Bargiela A,Pedrycz W.Toward a theory of granular computing for human-centered information processing.IEEE Transactions on Fuzzy Systems,2008,16(2):320-330.
[6] Zadeh L A.Fuzzy logic = computing with words.IEEE Transactions on Fuzzy Systems,1996,4(2):103-111.
[7] Pawlak Z.Rough sets and intelligent data analysis.Information Sciences,2002,147(1-4):1-12.
[8] 张 铃,张 钹.模糊商空间理论(模糊粒度计算方法).软件学报,2003,14(4):770-776.(Zhang L,Zhang B.Theory of fuzzy quotient space(methods of fuzzy granular computing).Journal of Software,2003,14(4):770-776.)
[9] Yao Y Y.Interpreting concept learning in cognitive informatics and granular computing.IEEE Transactions on Systems,Man,and Cybernetics,Part B(Cybernetics),2009,39(4):855-866.
[10] 蒙祖强,周石泉.不一致决策系统中基于粒度计算的广义决策规则获取方法研究.计算机科学,2012,39(1):198-202.(Meng Z Q,Zhou S Q.Research on method of generalized decision rule acquisition based on GrC in inconsistent decision systems.Computer Science,2012,39(1):198-202.)
[11] Panoutsos G,Mahfouf M.A neural-fuzzy modelling framework based on granular computing:Concepts and applications.Fuzzy Sets and Systems,2010,161(21):2808-2830.
[12] Skowron A,Stepaniuk J,Swiniarski R.Modeling rough granular computing based on approximation spaces.Information Sciences,2012,184(1):20-43.
[13] Pedrycz W.The design of cognitive maps:A study in synergy of granular computing and evolutionary optimization.Expert Systems with Applications,2010,37(10):7288-7294.
[14] Pedrycz W,Loia V,Senatore S.Fuzzy clustering with viewpoints.IEEE Transactions on Fuzzy Systems,2010,18(2):274-284.
[15] 陶 华,唐旭清.基于模糊邻近关系的聚类结构分析.计算机科学,2013,40(01):257-261.(Tao H,Tang X Q.Clustering structural analysis on fuzzy proximity relation.Computer Science,2013,40(01):257-261.)
[16] Theodoridis S,Koutroumbas K.Pattern recognition.The 2nd Edition.Amsterdam,Netherlands:Elsevier,2003,689.
[17] 杨淑莹.模式识别与智能计算:Matlab技术实现.第2版.北京:电子工业出版社,2011,372.(Yang S Y.Pattern recognition and intelligent computing:Matlab technology realization.The 2nd Edition.Beijing:Publishing House of Electronics Industry,2011,372.)
[18] 徐宗本.计算智能中的仿生学:理论与算法.北京:科学出版社,2003,315.(Xu Z B.Bionics in intelligent computing:Theory and algorithms.Beijing:Science Press,2003,315.)
[19] Duda R O,Hart P E,Stork D G.Pattern classification.The 2nd Edition.New York:John Wiley & Sons,2004,816.
[20] 陈 新.基于最小生成树的聚类分析方法研究.硕士学位论文.重庆:重庆大学,2013.(Chen X.Research on clustering algorithm based on minimum spanning tree.Master Dissertation.Chongqing:Chongqing University,2013.)
[21] Tang X Q,Zhu P,Cheng J X.Cluster analysis based on fuzzy quotient space.Journal of Software,2008,19(4):861-868.
[22] 唐旭清,朱 平,程家兴.基于等腰归一化距离的模糊粒度空间研究.计算机科学,2008,35(4):142-145.(Tang X Q,Zhu P,Cheng J X.Study on fuzzy granular space based on normalized equicrural metric.Computer Science,2008,35(4):142-145.)
[23] 唐旭清,方雪松,朱 平.基于模糊邻近关系的结构聚类.系统工程与理论实践,2010,30(11):1986-1997.(Tang X Q,Fang X S,Zhu P.Structural clusters based on fuzzy proximity relations.Systems Engineering-Theory & Practice,2010,30(11):1986-1997.)
[24] Tang X Q,Zhu P,Cheng J X.The structural clustering and analysis of metric based on granular space.Pattern Recognition,2010,43(11):3768-3786.
[25] 赵向梅,王艳君,刘 林.聚类算法及聚类融合算法研究.电子设计工程,2011,19(15):4-5,12.(Zhao X M,Wang Y J,Liu L.Research on clustering algorithms and clustering ensemble algorithms.Electronic Design Engineering,2011,19(15):4-5,12.)
[26] 于 剑,程乾生.模糊聚类方法中的最佳聚类数的搜索范围.中国科学(E辑),2002,32(2):274-280.
[27] 李 骏,王国俊.Gdel n值命题逻辑中命题的α-真度理论.软件学报,2007,18(1):33-39.(Li J,Wang G J.Theory of α-truth degrees in n-valued Gdel propositional logic.Journal of Software,2007,18(1):33-39.)
[28] 唐旭清,朱 平,程家兴.基于归一化距离的结构聚类分析.模式识别与人工智能,2009,22(05):678-688.(Tang X Q,Zhu P,Cheng J X.Analysis of structural clustering based on normalized metric.Pattern Recognition and Artificial Intelligence,2009,22(05):678-688.)
[29] 李 阳.基于粒计算的生物复杂系统建模和网络结构分析.硕士学位论文.无锡:江南大学,2017.(Li Y.Modeling the complex biological system and exploring its network structure based on the granular computing.Master Dissertation.Wuxi:Jiangnan University,2017.)
[30] Liu D,Shi W F,Shi Y,et al.Origin and diversity of novel avian influenza A H7N9 viruses causing human infection:Phylogenetic,structural,and coalescent analyses.The Lancet,2013,381(9881):1926-1932. 第53卷 第5期,2017年9月南京大学学报(自然科学),JOURNAL OF NANJING UNIVERSITY,(NATURAL SCIENCE)Vol.53, No.5,Sept.,2017



No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!