南京大学学报(自然科学版) ›› 2018, Vol. 54 ›› Issue (1): 40–.

• • 上一篇    下一篇

 一种基于异域自适应的新型社团发现算法

 段明月1,2,黄 晶1,2*,陈贺昌1,2,金 弟3*   

  • 出版日期:2018-01-31 发布日期:2018-01-31
  • 作者简介:1.吉林大学计算机科学与技术学院,长春,130012;
    2.符号计算与知识工程教育部重点实验室(吉林大学),长春,130012;
    3.天津大学计算机科学与技术学院,天津,300350
  • 基金资助:
     基金项目:国家自然科学基金(61373053,61572226),吉林省科技发展计划重点科技研发项目(20180201044GX,20180201067GX)
    收稿日期:2017-12-15
    *通讯联系人,E-mail:huangjing@jlu.edu.cnjindi@tju.edu.cn

 A novel community detection algorithm based on heterogeneous domain adaptation

 Duan Mingyue1,2,Huang Jing1,2*,Chen Hechang1,2,Jin Di3*   

  • Online:2018-01-31 Published:2018-01-31
  • About author:1.College of Computer Science and Technology,Jilin University,Changchun,130012,China;
    2.Key Laboratory of Symbol Computation and Knowledge Engineering(Jilin University),Ministry of Education,Changchun,130012,China;
    3.School of Computer Science and Technology,Tianjin University,Tianjin,300350,China

摘要:  社团发现已被广泛应用于社会学、生物学、物理学和计算机科学等诸多领域.通过发现复杂网络中的社团结构,可以帮助人们理解和分析复杂网络的功能,发现复杂网络中隐藏的规律并预测复杂网络的行为.目前,已有的社团发现算法主张融合网络结构信息和内容信息,以更好地避免网络噪声和节点缺失等原因对算法有效性产生影响.然而,它们并没有考虑当网络结构信息和内容信息维度不同时如何进行信息融合.针对该问题,提出一种基于异域自适应理论的网络社团发现算法CDHDA.该算法能够将不同维度的网络结构信息和内容信息映射到同一维度的子空间中,以实现对不同维度的信息融合.此外,在信息融合过程中可以对主要信息特征进行加强,以降低网络噪声和节点缺失对算法性能的影响.通过真实的社交网络数据集与经典的社团发现算法进行对比,验证了该算法的有效性.

Abstract:  The investigation of community structures in networks is an important issue in many domains and disciplines,such as sociology,biology,physics,computer science,and so on.Community detection algorithms which aim to discover all natural network communities from given complex networks are fundamentally important for both theoretical researches and practical applications.It can be used to analyze the community structure,understand the functions,recognize the hidden patterns,and predict the behaviors of complex networks.Currently,the existing community detection algorithms propose to integrate network structure and content information to detect community structure,and can better avoid the influence on effectiveness of the algorithm caused by network noise,missing nodes,and so on.However,existing community detection algorithms did not consider how to carry out information fusion when network structure information and content information are different in dimensions.In view of this,this paper proposed a novel community detection algorithm based on the theory of heterogeneous domain adaptation.In the algorithm,the network structure information and content information are represented by heterogeneous features with different dimensions.We first transform the two kinds of information into a common subspace in order to integrate the information in different dimensions.Then,two feature mapping functions are used to augment the transformed data with their original features and zeros.In addition,it can strengthen the main characteristics of information in the process of information integration to reduce the influence on performance of the algorithm caused by network noise and missing nodes.Through experiments on multiple real-world datasets(Facebook and Twitter)with varying sizes and characteristics,we demonstrate the effectiveness and efficiency of the algorithm proposed in this paper over state-of-the-art community detection approaches.

 [1] 孙玺菁,司守奎.复杂网络算法与应用.北京:国防工业出版社,2015,293.(Sun X J,Si S K.Complex network algorithms and applications.Beijing:National Defence Industry Press,2015,293.)
[2] Fortunato S.Community detection in graphs.Physics Reports,2010,486(3-5):75-174.
[3] 杨 博,刘大有,Liu J M等.复杂网络聚类方法.软件学报,2009,20(1):54-66.(Yang B,Liu D Y,Liu J M,et al.Complex network clustering algorithms.Journal of Software,2009,20(1):54-66.).
[4] Girvan M,Newman M E J.Community structure in social and biological networks.Proceedings of the National Academy of Sciences of the United States of America,2002,99(12):7821-7826.
[5] Newman M E J.Coauthorship networks and patterns of scientific collaboration.Proceedings of the National Academy of Sciences of the United States of America,2004,101(1):5200-5205.
[6] Wang Z,Zhang J.In serach of the biological significance of modular structures in protein networks.PLOS Computational Biology,2007,3(6):e107.
[7] Sachan M,Contractor D,Faruquie T A,et al.Using content and interactions for discovering communities in social networks ∥ Proceedings of the 21st International Conference on World Wide Web.Rio Ode Karo,Brazil:ACM,2012,331-340.
[8] Rosvall M,Bergstrom C T.Maps of random walks on complex networks reveal community structure.Proceedings of the National Academy of Sciences of the United States of America,2007,105(4):1118-1123.
[9] Yang J,Leskovec J.Overlapping community detection at scale:A nonnegative matrix factorization approach ∥ Proceedings of the 6th ACM International Conference on Web Search and Data Mining.Rome,Italy:ACM,2013,587-596.
[10] Raghavan U N,Albert R,Kumara S.Near linear time algorithm to detect community structures in large-scale networks.Physical Review E,2007,76(3 Pt 2):36106.
[11] Blondel V D,Guillaume J L,Lambiotte R,et al.Fast unfolding of communities in large networks.Journal of Statistical Mechanics:Theory and Experiment,2008,2008:P10008.
[12] Ruan Y Y,Fuhry D,Parthasarathy S.Efficient community detection in large networks using content and links ∥ Proceedings of the 22nd international conference on World Wide Web.Rio Ode Karo,Brazil:ACM,2013,1089-1098.
[13] Yang J,McAuley J,Leskovec J.Community detection in networks with node attributes ∥ 2013 IEEE 13th International Conference on Data Mining.Shenzhen,China:IEEE,2013,1151-1156. 
[14] Pool S,Bonchi F,Leeuwen M V.Description-driven community detection.ACM Transactions on Intelligent Systems and Technology,2014,5(2):Article No.28.
[15] Duan L X,Xu D,Tsang I.Learning with augmented features for heterogeneous domain adaptation.Computer Science,2012,arXiv:1206.4660.
[16] Whang J J,Rai P,Dhillon I S.Stochastic blockmodel with cluster overlap,relevance selection,and similarity-based smoothing ∥ 2013 IEEE 13th International Conference on Data Mining.Shenzhen,China:IEEE,2013:817-826.
[17] Frank E,Trigg L,Geoffrey H,et al.Naive Bayes for regression.Machine Learning,2000,41(1):5-25.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!