南京大学学报(自然科学版) ›› 2020, Vol. 56 ›› Issue (1): 1–8.doi: 10.13232/j.cnki.jnju.2020.01.001

• •    下一篇

一种基于用户结构和属性的无监督用户对齐方法

俞冬明,李苑,李智星,王国胤()   

  1. 计算智能重庆市重点实验室,重庆邮电大学,重庆,400065
  • 收稿日期:2019-07-17 出版日期:2020-01-30 发布日期:2020-01-10
  • 通讯作者: 王国胤 E-mail:wanggy.cq@hotmail.com
  • 基金资助:
    国家重点研发计划(2017YFB0802300)

An unsupervised user alignment method based on user structure and attribute

Dongming Yu,Yuan Li,Zhixing Li,Guoyin Wang()   

  1. Chongqing Key Laboratory of Computational Intelligence,Chongqing University of Posts and Telecommunications,Chongqing,400065,China
  • Received:2019-07-17 Online:2020-01-30 Published:2020-01-10
  • Contact: Guoyin Wang E-mail:wanggy.cq@hotmail.com

摘要:

随着互联网应用的蓬勃发展,一个人在不同的社交网络平台上都拥有账户是很常见的.如何在多个社交网络上找到同一个人的账户,对许多应用是很重要的问题,也被称为用户对齐问题.在用户对齐问题上,目前有两个主要的挑战:首先,收集手工对齐的用户对作为训练数据的代价非常大,但传统的有监督方法往往需要大量的标注数据才能获得较好的效果;其次,不同网络中的用户的结构和属性往往不太相同,进一步增加了用户对齐的难度.提出一种无监督用户对齐方法SPUAL(Soft Principle for User Alignment),设计了一种新颖的基于用户的属性与结构的软对齐一致性原则,通过无监督方法计算用户对是否服从此原则来推断用户对是否对齐.在几个公共数据集上的实验表明,该方法的性能比目前最先进的无监督方法都有明显提高.

关键词: 用户对齐, 社交网络, 无监督, 对齐原则

Abstract:

With the fast development of Internet applications,it's common for someone to have accounts on different social network platforms. How to find out which account on multiple social networks are of the same person is an important issue for many applications today,which is also known as the user alignment problem. There are two major challenges when it comes to user alignment. First,it's extremely expensive to collect manually aligned user pairs as training data,but traditional supervised methods often need a large amount of labeled data to achieve better results. Second,users on different networks often have different structures and attributes,which further increase the difficulty of user alignment. We propose an unsupervised user alignment method SPUAL (Soft Principle for User Alignment),design a novel soft alignment principle based on user attributes and structure,and then infer whether the user alignment is correct or not by calculating whether the users meet to the principles by unsupervised method. Experiments on several common datasets show that the performance of our method is much better than the most advanced unsupervised methods.

Key words: user alignment, social network, unsupervised, alignment principle

中图分类号: 

  • TP391

表1

符号和意义"

符号意义
G=A,N一个社交网络

A

N

网络的邻接矩阵

网络用户的属性矩阵

n1,n2网络Gs和网络Gt的用户数
K用户节点的属性个数
a,b网络Gs的用户索引
x,y网络Gt的用户索引
v,w向量化对齐的节点对索引
I,l分别是单位矩阵和值全为1的向量
H,S对齐前的偏好和对齐矩阵
α,β,m对齐原则的权重以及正则化参数

a=vec(A)

D=diag(d)

?

将矩阵A以列的顺序向量化

对角化向量d

克罗内克积

图1

对齐一致性的说明"

表2

四个数据集的信息"

Dataset

Flickr?

Lastfm

Flickr?MyspaceDoubanACM?DBLP
Number12974~154366714~107333906~11189872~9916
Attributes

Sex,

Pagerank

Sex,

Pagerank

LocationConference
HUsername SimilarityUsername SimilarityDegree SimilarityDegree Similarity

表3

不同数据集上不同算法对齐准确率的对比"

MethodsFlickr?LastfmFlickr?MyspaceDoubanACM?DBLP
Regal0.010.0100.003
IsoRank0.40.360.070.21
NetAlign0.430.450.010.03
UniAlign0.10.030.010.01

Klau's

Algorithm

0.380.40.070.12
FINAL0.6650.6400.2390.210
SPUAL0.6770.6630.2490.241

表4

SPAUL在两个数据集中使用不同参数进行对齐的准确率对比"

Parameter(α-β)Flickr?LastfmFlickr?Myspace
0.7-0.30.6770.663
0.8-0.20.6750.655
0.9-0.10.6700.659
1 Zhang Y T,Tang J,Yang Z L,et al. Cosnet:connecting heterogeneous social networks with local and global consistency∥Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Sydney,Australia:ACM,2015:1485-1494.
2 Chen Z,Yu X,Song B,et al. Community?based network alignment for large attributed network∥Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.Singapore:ACM,2017:587-596.
3 Manners H N,Elmsallati A,Guzzi P H,et al. Performing local network alignment by ensembling global aligners∥2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Kansas City,MO,USA:IEEE,2017:1316-1323.
4 Smalter A,Huan J,Lushington G. Gpm:a graph pattern matching kernel with diffusion for chemical compound classification∥2008 8th IEEE International Conference on BioInformatics and BioEngineering.Athens,Greece:IEEE,2008:1-6.
5 Bayati M,Gerritsen M,Gleich D F,et al. Algorithms for large,sparse network alignment problems∥2009 9th IEEE International Conference on Data Mining.Miami,FL,USA:IEEE,2009:705-710.
6 Koutra D,Tong H,Lubensky D. Big?Align:Fast bipartite graph alignment∥2013 IEEE 13th International Conference on Data Mining (ICDM). Dallas,TX,USA:IEEE,2013.
7 Zhang S,Tong H H. Final:Fast attributed network alignment∥Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.San Francisco,CA,USA:ACM,2016:1345-1354.
8 Zhang S,Tong H,Tang J,et al. iNEAT:Incomplete network alignment∥2017 IEEE International Conference on Data Mining.New Orleans,LA,USA:IEEE,2017:1189-1194.
9 Goga O,Perito D,Lei H,et al. Large?scale correlation of accounts across social networks. Technical Report. Berkeley:University of California at Berkeley,2013:TR?13?002.
10 Liu S Y,Wang S H,Zhu F D. HYDRA:Large?scale social identity linkage via heterogeneous behavior modeling∥Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data.Snowbird,UT,USA:ACM,2014.
11 Zhou X P,Liang X,Du X Y,et al. Structure based user identification across social networks. IEEE Transactions on Knowledge and Data Engineering,2017,30(6):1178-1191.
12 Liu L,Cheung W K,Li X,et al. Aligning users across social networks using network embedding∥Proceedings of the 25th International Joint Conference on Artificial Intelligence.New York,NY,USA:AAAI Press,2016.
13 Zhang J,Chen B,Wang X,et al. Mego2vec:embedding matched ego networks for user alignment across social networks∥Proceedings of the 27th ACM International Conference on Information and Knowledge Management.Torino,Italy:ACM,2018:327-336.
14 Page L,Brin S,Motwani R,et al. The PageRank citation ranking:bringing order to the web. Technical Report. Stanford InfoLab,1999.
15 Liao C S,Lu K,Baym M,et al. IsoRankN:Spectral methods for global alignment of multiple protein networks. Bioinformatics,2009,25(12):i253-i258.
16 Andersen R,Chung F R K,Lang K J. Local graph partitioning using pageRank vectors∥2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06). Berkeley,CA,USA:IEEE,2006.
17 Bayati M,Shah D,Sharma M. Maximum weight matching via max?product belief propagation∥Proceedings of International Symposium on Information Theory (ISIT 2005). Adelaide,Australia:IEEE,2005.
18 Zhang J W,Yu P S. Multiple anonymized social networks alignment∥2015 IEEE International Conference on Data Mining.Atlantic City,NJ,USA:IEEE,2015:599-608.
19 Zhong E H,Fan W,Wang J W,et al. ComSoc:adaptive transfer of user behaviors over composite social network∥ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Beijing,China:ACM,2012.
20 Heimann M,Shen H M,Safavi T,et al. Regal:representation learning?based graph alignment∥Proceedings of the 27th ACM International Conference on Information and Knowledge Management.Torino,Italy:ACM,2018:117-126.
21 Klau G W. A new graph?based method for pairwise global network alignment. BMC Bioinformatics,2009,10(S1):S59.
22 Kollias G,Mohammadi S,Grama A. Network similarity decomposition (NSD):a fast and scalable approach to network alignment. IEEE Transactions on Knowledge and Data Engineering,2012,24(12):2232-2243.
23 Zheng Z D,Zheng L,Yang Y. Pedestrian alignment network for large?scale person re?identification. IEEE Transactions on Circuits and Systems for Video Technology,2018,doi:10.1109/TCSVT.2018.
doi: 10.1109/TCSVT.2018
2873599.
doi: 10.1109/TCSVT.2018
[1] 陈俊芬,赵佳成,韩洁,翟俊海. 基于深度特征表示的Softmax聚类算法[J]. 南京大学学报(自然科学版), 2020, 56(4): 533-540.
[2] 吕国俊,曹建军,郑奇斌,常宸,翁年凤. 基于结构保持对抗网络的跨模态实体分辨[J]. 南京大学学报(自然科学版), 2020, 56(2): 197-205.
[3] 王伯伟, 聂秀山, 马林元, 尹义龙. 基于语义相似度的无监督图像哈希方法[J]. 南京大学学报(自然科学版), 2019, 55(1): 41-48.
[4] 靳义林1,2*,胡 峰1,2. 基于三支决策的中文文本分类算法研究[J]. 南京大学学报(自然科学版), 2018, 54(4): 794-.
[5]  董利梅,赵 红*,杨文元.  基于稀疏聚类的无监督特征选择[J]. 南京大学学报(自然科学版), 2018, 54(1): 107-.
[6] 李 凡1,2,赵 姝1,2*,陈 洁1,2,张燕平1,2. 基于加权中介中心性的结构洞占据者方法获取[J]. 南京大学学报(自然科学版), 2017, 53(4): 756-.
[7]  李 婵,杨文元*,赵 红.  联合依赖最大化与稀疏表示的无监督特征选择方法[J]. 南京大学学报(自然科学版), 2017, 53(4): 775-.
[8] 宾 晟*,孙更新. 基于多子网复合复杂网络模型的多关系社交网络重要节点发现算法[J]. 南京大学学报(自然科学版), 2017, 53(2): 378-.
[9] 张燕平1,2张顺1,2钱付兰1,2严远亭1,2. 一种局部和全局用户影响力相结合的社交推荐算法[J]. 南京大学学报(自然科学版), 2015, 51(4): 858-865.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 孙 玫,张 森,聂培尧,聂秀山. 基于朴素贝叶斯的网络查询日志session划分方法研究[J]. 南京大学学报(自然科学版), 2018, 54(6): 1132 -1140 .
[2] 周星星,张海平,吉根林. 具有时空特性的区域移动模式挖掘算法[J]. 南京大学学报(自然科学版), 2018, 54(6): 1171 -1182 .
[3] 韩明鸣, 郭虎升, 王文剑. 面向非平衡多分类问题的二次合成QSMOTE方法[J]. 南京大学学报(自然科学版), 2019, 55(1): 1 -13 .
[4] 刘 素, 刘惊雷. 基于特征选择的CP-nets结构学习[J]. 南京大学学报(自然科学版), 2019, 55(1): 14 -28 .
[5] 王伯伟, 聂秀山, 马林元, 尹义龙. 基于语义相似度的无监督图像哈希方法[J]. 南京大学学报(自然科学版), 2019, 55(1): 41 -48 .
[6] 孔 颉, 孙权森, 纪则轩, 刘亚洲. 基于仿射不变离散哈希的遥感图像快速目标检测新方法[J]. 南京大学学报(自然科学版), 2019, 55(1): 49 -60 .
[7] 贾海宁, 王士同. 面向重尾噪声的模糊规则模型[J]. 南京大学学报(自然科学版), 2019, 55(1): 61 -72 .
[8] 严云洋, 瞿学新, 朱全银, 李 翔, 赵 阳. 基于离群点检测的分类结果置信度的度量方法[J]. 南京大学学报(自然科学版), 2019, 55(1): 102 -109 .
[9] 阚 威, 李 云. 基于LSTM的脑电情绪识别模型[J]. 南京大学学报(自然科学版), 2019, 55(1): 110 -116 .
[10] 陆慎涛, 葛洪伟, 周 竞. 自动确定聚类中心的移动时间势能聚类算法[J]. 南京大学学报(自然科学版), 2019, 55(1): 143 -153 .