南京大学学报(自然科学版) ›› 2014, Vol. 50 ›› Issue (4): 466–.

• • 上一篇    下一篇

大规模网络的三角形模体社区发现模型

柴变芳1,2*, 赵晓鹏3, 贾彩燕1, 于 剑1   

  • 出版日期:2014-08-23 发布日期:2014-08-23
  • 作者简介: 1. 交通数据分析与挖掘北京市重点实验室,北京交通大学,北京,100044;
    2. 石家庄经济学院 信息工程系,石家庄,050031;3. 河北省财政厅综合治税办公室,石家庄,050051
  • 基金资助:
     中央高校基本科研业务费专项资金(2014YJS039),河北省高等学校科学技术研究青年基金(2011143)

A model for community detection based on triangular motifs in large-scale networks

 Chai Bianfang1,2, Zhao Xiaopeng3, Jia Caiyan1, Yu Jian1   

  • Online:2014-08-23 Published:2014-08-23
  • About author: 1. Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University,
    Beijing 100044, China;
    2. Department of Information Engineering, Shijiazhuang University of Economics, Shijiazhuang, 050031, China;
    3. Hebei Financial Department, composite tax management office, Shijiazhuang, 050051, China

摘要: 研究表明将边表示的网络转换为三角形模体表示形式,可以有效解决基于模型社区发现方法由网络规模庞大带来的计算瓶颈问题。提出一个三角形模体社区发现模型MCDTM(a Model for Community Detection based on Triangular Motifs),其将网络表示为一系列三角形模体,利用categorical分布对各三角形模体的生成过程建模,用最大似然参数估计方法给出参数估计的推理过程,根据参数估计结果可得节点、边及三角形模体的社区隶属度。人工网络和实际网络上的实验证明MCDTM模型可快速准确地发现网络的潜在结构。

Abstract:  The Mixed Membership Stochastic Blockmodel (MMSB) is well known to be flexible to model latent structures of a network. But its complexity costs too much due to edge representations. Recently, two models are provided to improve the MMSB model by representing networks as triangular motifs, which demonstrated that representing networks as a bag of triangular motifs could resolve computational bottlenecks of model-based approaches incurred by representing networks as edges. The two models used all the triangular motifs transferred from1-edges and 0-edges as input and assumed there were link patterns between any two communities, which added the inference and computing complexity. In this paper, a model for community detection based on triangular motifs (MCDTM) is provided, which uses the triangular motifs transferred from1-edges as input and only considers the link patterns within a community. Before we run the model, we should transfer 1-edges to a bag of triangular motifs. Then these motifs are sampled from all the motifs transferred from 1-edges and only sampled motifs are used as input. In addition, the model assumes there are links within a community. These two assumptions make the model be able to deal with large networks. The generative process of each triangular motif obeys a categorical distribution, and the generative probability of a triangular is the sum of the probability generating all the triangular edges in all communities. Each motif is generated independently. A maximization likelihood estimation method is used to infer the model parameters, and the memberships of nodes, edges and triangular motifs are able to be captained in terms of the results of parameters. Tests on synthetic networks and real large-scale networks demonstrated the MCDTM model can detect latent structures of the network fast and accurately.

 [1] Airodi E M, Blei D M, Fienberg S E, et al. Mixed membership stochastic blockmodels. Journal of Machine Learning Research, 2008, 9: 1981~2014.
[2] Ho Q R, Yin J M, Eric X P. On triangular versus edge representations - Towards scalable modeling of networks. In: Advances in Neural Information Processing Systems, United States: Curran Associates Inc., 2012: 2141~2149.
[3] Yin J M, Ho Q R, Eric X P. A scalable approach to probabilistic latent space inference of large-scale networks. In: Advances in Neural Information Processing Systems, United States: Curran Associates Inc., 2013: 422~430.
[4] Gopalan P K, Mimno D, Gerrish S M, et al. Scalable inference of overlapping communities. In: Advances in Neural Information Processing Systems, United States: Curran Associates Inc., 2012: 2258~2266.
[5] Gopalan P K, Blei D M. Efficient discovery of overlapping communities in massive networks. Proceedings of the National Academy of Sciences, United States, 2013, 110(36): 14534~14539.
[6] Gopalan P K, Wang C, David B. Modeling overlapping communities with node popularities. In: Advances in Neural Information Processing Systems, United States: Curran Associates Inc., 2013: 2850~2858.
[7] Ren W, Yan G Y, Liao X P, et al. Simple probabilistic algorithm for detecting community structure. Physical Review E, 2009, 79: 036111.
[8] Ball B, Karrer B, Newman M E J. Efficient and principled method for detecting communities in network. Physical Review, 2011, 84:036103.
[9] Hofman J M, Wiggins C H. Bayesian approach to network modularity. Physical Review Letters, 2008, 100(25): 258701.
[10] Nowicki K, Snijders T. Estimation and prediction for stochastic block structures. Journal of American Statistical Association, 2001, 96(455):1077~1087.
[11] Daudin J, Picard F, Robin S. A mixture model for random graphs. Statistics and Computing, 2008, 18: 179~183.
[12] Zanghi H, Ambroise C, Miele V. Fast online graph clustering via Erdös-Rényi mixture. Pattern Recognition, 2008, 41(12): 3592~3599.
[13] Zanghi H, Picard F, Miele V, et al. Strategies for online inference of model-based clustering in large and growing networks. The Annals of Applied Statistics, 2010, 4(2): 687~714.
[14] Latouche P, Birmele E, Ambroise C. Variational Bayesian inference and complexity control for stochastic block models. Statistical Modeling, 2012, 12(1):93~115.
[15] Latouche P, Birmele E, Ambroise C. Overlapping stochastic block models with application to the French political blogosphere. The Annals of Applied Statistics, 2011, 5(1):309~336.
[16] Mark Newman. Network data. http://www-personal.umich.edu/~mejn/netdata/,2014-6-18.
[17] Jure Leskovec. Stanford network analysis project. http://snap.stanford.edu/data/index.html, 2014-6-18.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!