南京大学学报(自然科学版) ›› 2019, Vol. 55 ›› Issue (6): 1020–1029.doi: 10.13232/j.cnki.jnju.2019.06.014

• • 上一篇    下一篇

面向网络结构发现的批量主动学习算法

柴变芳1,魏春丽1,曹欣雨1,王建岭2()   

  1. 1. 河北地质大学信息工程学院,石家庄,050031
    2. 河北中医学院教务处,石家庄,050200
  • 收稿日期:2019-07-26 出版日期:2019-11-30 发布日期:2019-11-29
  • 通讯作者: 王建岭 E-mail:30673253@qq.com
  • 基金资助:
    国家自然基金(61503260);河北省自然科学基金(F2019403070);河北省人文社会科学项目(SD151087)

Batch Mode Active Learning for exploring structure of networks

Bianfang Chai1,Chunli Wei1,Xinyu Cao1,Jianling Wang2()   

  1. 1. School of Information Engineering,Hebei GEO University,Shijiazhuang,050031,China
    2. Office of Academic Affairs, Hebei University of Chinese Medicine, Shijiazhuang, 050200, China
  • Received:2019-07-26 Online:2019-11-30 Published:2019-11-29
  • Contact: Jianling Wang E-mail:30673253@qq.com

摘要:

网络结构发现可识别网络多类型聚类模式,但其准确率有待提升.批量主动学习选择质量高的节点集合构造先验,可提升无监督网络结构发现的性能.面向属性网络分类的主动学习BMAL(Batch Mode Active Learning)只考虑链接信息实现网络节点选择,但不能有效选择使模型性能提升至最优的节点集合,且依赖初始人工标注及参数.提出一个新的批量主动学习算法,利用目标函数的子模性迭代选择最优的节点集合.该方法基于未标记节点的不确定性和非冗余影响力选择最优节点集合,不确定性依据节点及其邻居的类隶属度,影响力依据节点的非重叠中心性,两个指标的权重依据熵权法自动确定.人工和真实网络上的实验结果表明,该方法能选择使结构发现性能提升最大的节点集合.

关键词: 批量主动学习, 节点集合选择, 网络结构发现, 半监督聚类

Abstract:

Exploring structure of networks can identify many types of clustering pattern,which helps people understand and utilize networks. However,its accuracy needs to be improved. Batch Mode Active Learning(BMAL) selects informative nodes to construct priors,which are used to improve the performance of unsupervised structure exploration. BMAL for classification on attributed networks can realize node selection for networks only based on links. But it cannot select optimal nodes which improve better performance of the model. In order to improve the performance of structure exploring,a new batch active learning algorithm is provided,which selects the optimal nodes iteratively on the submodular property of objective function. It selects a node set that maximizes the uncertainty and non?redundant impact of unlabeled nodes. The uncertainty is computed according to the memberships of itself and its neighborhoods. And the impact is based on the nonoverlapping centrality. Their weights are adjusted by entropy weighted method automatically. Experiments on synthetic and real networks illustrate that our proposed method is able to select optimal node set that improves the performance of structure exploring.

Key words: batch mode active learning, node set selection, structure exploring for networks, semi?supervised clustering

中图分类号: 

  • TP391

图1

Karate网络"

表1

真实网络数据集"

网络 网络结构类型 节点
Karate 社区结构 34 78 2
Dolphins 社区结构 62 159 2
Adjnoun 二分结构 112 425 2

表2

人工网络数据集"

网络 网络结构类型 节点
39_3 混合结构 39 53 3
15_2 二分结构 15 25 2
30_2 星型结构 30 33 2

图2

39_3网络"

图3

15_2网络"

图4

30_2网络"

表3

Karate上节点选择结果"

比例

BMAL_

NMS

BMAL_L BMAL_L_H

BMAL_

L_C

BALN
5% 34,1 1,34 34,1 1,32 33,1
10% 34,1,3,20 1,34,3,5 34,1,3,32 1,32,34,5 1,34,33,2
15% 34,1,3,20,32,9 1,34,3,5,11,29 34,1,3,32,6,2 1,32,34,5,6,29 34,2,1,33,4,22
20% 34,1,3,20,32,9,25 1,34,3,5,11,29,8 34,1,3,2,32,6,24

1,32,34,5,6,29,

7

1,34,2,33,7,4,32,30

表4

39_3上节点选择结果"

比例

BMAL_

NMS

BMAL_

L

BMAL_

L_H

BMAL_

L_C

BALN
5% 8,6 6,8 36,37 6,8 6,4
10%

8,6,4,

13

6,8,37,

7

36,37,35,38 6,8,35,7 6,8,17,5
15% 8,6,4,13,32,14 6,8,37,7,9,10 36,37,35,38,39,6 6,8,35,7,9,10 4,8,6,17,34,5
20% 8,6,4,13,32,14,15,16

6,8,37,

7,9,10,11,12

36,37,35,

38,39,6,

8,30

6,8,35,7,9,10,11,12 4,6,8,5,3,34,33,17

表5

15_2上节点选择结果"

比例

BMAL_

NMS

BMAL_

L

BMAL_

L_H

BMAL_

L_C

BALN
5% 12 12 12 1 12,9
10% 12,1 12,8 12,1 1,13 12,9
15% 12,1,5 12,8,2 12,1,8 1,13,4 12,1,8,9
20% 12,1,5,3 12,8,2,11 12,1,8,9 1,13,4,2

12,1,8,

9

表6

30_2上节点选择结果"

比例

BMAL_

NMS

BMAL_

L

BMAL_

L_H

BMAL_

L_C

BALN
5% 20,10 20,10 20,10 11,10 20,10
10% 20,10,6 20,10,23 20,10,24 11,10,13 20,10,30,23
15% 20,10,6,11,23 20,10,23,16,12 20,10,24,15,16 11,10,13,26,14 20,6,10,30,23,11
20% 20,10,6,11,23,30 20,10,23,16,12,18 20,10,24,15,16,17 11,10,13,26,14,15 20,6,10,30,23,11

图5

BMAL_NMS和四种对比算法在Adjnoun数据集上的NMI对比"

图6

BMAL_NMS和四种对比算法在Adjnoun数据集上的ACC对比"

图7

BMAL_NMS和四种对比算法在Dolphins数据集上的NMI对比"

图8

BMAL_NMS和四种对比算法在Dolphins数据集上的ACC对比"

图9

BMAL_NMS和四种对比算法在Karate数据集上的NMI对比"

图10

BMAL_NMS和四种对比算法在Karate数据集上的ACC对比"

1 Newman M E J , Leicht E A . Mixture models and exploratory analysis in networks. Proceedings of the National Academy of Sciences of the United States of America,2007,104(23):9564-9569.
2 柴变芳,贾彩燕,于剑 . 基于概率模型的大规模网络结构发现方法. 软件学报,2014,25(12):2753-2766.
Chai B F , Jia C Y , Yu J . . Approaches of structure exploratory based on probabilistic models in massive networks. Journal of Software,2014,25(12):2753-2766.
3 Cheng J J , Leng M W , Li L J ,et al . Active semi?supervised community detection based on must?link and cannot?link constraints. PLOS One,2014,doi:10.1371/journal.pone.0110088 .
doi: 10.1371/journal.pone.0110088
4 Yang L , Jin D , Wang X ,et al . Active link selection for efficient semi?supervised community detection. Scientific Reports,2015,doi:10.1038/srep09039 .
doi: 10.1038/srep09039
5 Liu Z Y , Huang S J . Active sampling for open?set classification without initial annotation. Proceedings of the AAAI Conference on Artificial Intelligence,2019,33(1):4416-4423.
6 Anand D B , Saravanan R , Suji R A . Adaptive batch mode active learning technique using an improved time adaptive support vector machine for classification of remote sensed image applications. Journal of Computational and Theoretical Nanoscience,2017,14(2):1108-1113.
7 Shi L X , Zhao Y H , Tang J . Batch mode active learning for networked data. ACM Transactions on Intelligent Systems and Technology,2012,3(2):1-25.
8 Nogueira B M , Tomas Y K B , Marcacini R M . Integrating distance metric learning and cluster?level constraints in semi?supervised clustering∥International Joint Conference on Neural Networks. Anchorage,AK,USA:IEEE,2017:4118-4125.
9 Goudjil M , Koudil M , Bedda M ,et al . A novel active learning method using SVM for text classification. International Journal of Automation and Computing,2018,15(3):290-298.
10 Chen X , Wang T . Combining active learning and semi?supervised learning by using selective label spreading∥2017 IEEE International Conference on Data Mining Workshops (ICDMW). New Orleans,LA,USA:IEEE,2017:850-857.
11 Huang S J , Chen J L , Mu X ,et al . Cost?effective active learning from diverse labelers∥26th International Joint Conference on Artificial Intelligence. Melbourne,Australia:IJCAI,2017:1879-1885.
12 陈嶷瑛,柴变芳,李文斌 等 . 基于迭代框架的主动链接选择半监督社区发现算法. 计算机应用,2017,37(11):3085-3089.
Chen Y Y , Chai B F , Li W B ,et al . Semi?supervised community detection algorithm using active link selection based on iterative framework. Journal of Computer Applications,2017,37(11):3085-3089.
13 Xiong S C , Azimi J , Fern X Z . Active learning of constraints for semi?supervised clustering. IEEE Transactions on Knowledge & Data Engineering,2014,26(1):43-54.
14 Leng M W , Huang L , Li L J ,et al . Active semisupervised community detection based on asymmetric similarity measure. International Journal of Modern Physics B,2015,29(13):1550078.
15 Moore C , Yan X R , Zhu Y J ,et al . active learning for node classification in assortative and disassortative networks∥ACM SIGKDD Interna?tional Conference on Knowledge Discovery and Data Mining. San Diego,CA,USA:ACM,2011:841-849.
16 Ping S Q , Liu D Y , Yang B ,et al . Batch mode active learning for node classification in assortative and disassortative networks. IEEE Access,2018,6:4750-4758.
17 Zhu X , Ghahramani Z , Lafferty J . Semi?supervised learning using Gaussian fields and harmonic functions∥Proceeding 20th International Conference on Machine Learning,2003:912-919.
18 Newman M E J , Leicht E A . Mixture models and exploratory analysis in networks. Proceedings of the National Academy of Sciences of the United States of America,2007,104(23):9564-9569.
19 Jia C Y , Li Y F , Carson M B ,et al . Node attribute?enhanced community detection in complex networks. Scientific Reports,2017,7(1):2626.
20 Fagbote E O , Olanipekun E O , Uyi H S . Water quality index of the ground water of bitumen deposit impacted farm settlements using entropy weighted method. International Journal of Environmental Science and Technology,2014,11(1):127-138.
21 赵学华,杨博,陈贺昌 . 一种高效的随机块模型学习算法. 软件学报,2016,27(9):2248-2264.
Zhao X H , Yang B , Chen H C . Fast learning algorithm for stochastic block model. Journal of Software,2016,27(9):2248-2264.
[1] 杨红鑫,杨绪兵,张福全,业巧林. 半监督平面聚类算法设计[J]. 南京大学学报(自然科学版), 2020, 56(1): 9-18.
[2]  常瑜1.2** ,梁吉业1, 2,高嘉伟1,2,杨静1·2
.  一种基于Seeds集和成对约束的半监督聚类算法*[J]. 南京大学学报(自然科学版), 2012, 48(4): 405-411.
[3]  申 彦**,宋顺林,朱玉全
.  一种基于半监督的大规模数据集聚类算法*
[J]. 南京大学学报(自然科学版), 2011, 47(4): 372-382.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 段新春,施 斌,孙梦雅,魏广庆,顾 凯,冯晨曦. FBG蒸发式湿度计研制及其响应特性研究[J]. 南京大学学报(自然科学版), 2018, 54(6): 1075 -1084 .
[2] 许 林,张 巍*,梁小龙,肖 瑞,曹剑秋. 岩土介质孔隙结构参数灰色关联度分析[J]. 南京大学学报(自然科学版), 2018, 54(6): 1105 -1113 .
[3] 胡 淼,王开军,李海超,陈黎飞. 模糊树节点的随机森林与异常点检测[J]. 南京大学学报(自然科学版), 2018, 54(6): 1141 -1151 .
[4] 魏 桐,童向荣. 基于加权启发式搜索的鲁棒性信任路径生成[J]. 南京大学学报(自然科学版), 2018, 54(6): 1161 -1170 .
[5] 陆慎涛, 葛洪伟, 周 竞. 自动确定聚类中心的移动时间势能聚类算法[J]. 南京大学学报(自然科学版), 2019, 55(1): 143 -153 .
[6] 尹学渊, 陈兴蜀, 陶术松, 陈 林. 一种无代理虚拟机进程监控方法[J]. 南京大学学报(自然科学版), 2019, 55(2): 221 -230 .
[7] 李嘉明, 邹 勇, 郑 浩, 魏钟波, 杨柳燕, 缪爱军. 养殖塘生态系统重金属污染状况与风险评价[J]. 南京大学学报(自然科学版), 2019, 55(2): 272 -281 .
[8] 巢 前, 蔡进功, 李艳丽, 王国力. 泥岩中的有机质对基于XRD的伊蒙混层结构计算的影响[J]. 南京大学学报(自然科学版), 2019, 55(2): 291 -300 .
[9] 王飞永, 彭建兵, 卢全中, 黄强兵, 孟振江, 乔建伟. 渭河盆地地裂缝同生机制研究[J]. 南京大学学报(自然科学版), 2019, 55(3): 339 -348 .
[10] 吕海敏,沈水龙,严学新,史玉金,许烨霜. 上海地面沉降对轨道交通安全运营风险评估[J]. 南京大学学报(自然科学版), 2019, 55(3): 392 -400 .