Journal of Nanjing University (Natural Science), 2020, Vol. 56, Issue 2: 197-205. doi: 10.13232/j.cnki.jnju.2020.02.005

Structure maintenance based adversarial network for cross-modal entity resolution

Lü Guojun1, Cao Jianjun2, Zheng Qibin1, Chang Chen1, Weng Nianfeng2

  1. Institute of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  2. The Sixty-third Research Institute, National University of Defense Technology, Nanjing 210007, China

  • Received: 2019-11-24  Online: 2020-03-30  Published: 2020-04-02
  • Corresponding author: Cao Jianjun, E-mail: jianjuncao@yeah.net
  • Funding: National Natural Science Foundation of China (61371196); China Postdoctoral Science Foundation (20090461425); National Science and Technology Major Project (2015ZX01040201-003)


Abstract:

Cross-modal entity resolution aims to find different objective descriptions of the same entity in data of different modalities. The common way to solve the problem is to map data of different modalities into a shared space in which their similarity can be measured. Most such methods establish the semantic connection between the original and mapped data by using category information, while ignoring the effective use of the information carried by cross-modal sample pairs. Moreover, in real data sources, annotating a large amount of data is time-consuming and laborious, so it is difficult to obtain enough labeled data for supervised learning. To address this, a Structure Maintenance based Adversarial Network (SMAN) is proposed for cross-modal entity resolution. Under an adversarial framework, a K-nearest-neighbor structure loss between modalities makes the learned representations more consistent by preserving the structure of cross-modal sample pairs before and after the nonlinear mapping, and a co-attention mechanism is designed to align the paired information between modalities. Experimental results on several datasets show that SMAN outperforms other unsupervised methods and some typical supervised methods.
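
For readers who want a concrete picture of the adversarial component, the sketch below pairs two modality-specific projection networks with a modality discriminator, trained GAN-style so that image and text embeddings become indistinguishable in the shared space. This is a minimal reconstruction from the abstract, not the authors' code; the layer sizes and the 4096-d image / 300-d text input dimensions are assumptions.

```python
import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps modality-specific features into the shared space."""
    def __init__(self, in_dim, shared_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, shared_dim))
    def forward(self, x):
        return self.net(x)

class ModalityDiscriminator(nn.Module):
    """Predicts which modality a shared-space vector came from."""
    def __init__(self, shared_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(shared_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))
    def forward(self, z):
        return self.net(z)  # logit: >0 read as "image", <0 as "text"

img_proj, txt_proj = Projector(4096), Projector(300)
disc = ModalityDiscriminator()
bce = nn.BCEWithLogitsLoss()

img_feat, txt_feat = torch.randn(32, 4096), torch.randn(32, 300)
z_img, z_txt = img_proj(img_feat), txt_proj(txt_feat)

# Discriminator step: learn to tell the two modalities apart.
d_loss = bce(disc(z_img.detach()), torch.ones(32, 1)) + \
         bce(disc(z_txt.detach()), torch.zeros(32, 1))
# Projector step: fool the discriminator (labels flipped).
g_loss = bce(disc(z_img), torch.zeros(32, 1)) + \
         bce(disc(z_txt), torch.ones(32, 1))
```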

Key words: data quality, cross-modal entity resolution, unsupervised learning, adversarial learning, K-nearest neighbor, co-attention mechanism

CLC number: TP311

Figure 1  The difference between supervised and unsupervised methods

Figure 2  The structure-maintenance-based adversarial network model for cross-modal entity resolution
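
The co-attention mechanism in the Figure 2 model is only named in the abstract, so the sketch below uses a generic co-attention formulation: an affinity matrix between image regions and words yields attention weights with which each modality summarizes the other, aligning paired information. The dimensions and the affinity parameterization are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoAttention(nn.Module):
    """Generic co-attention: each modality attends over the other's parts."""
    def __init__(self, d_img, d_txt):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_txt, d_img) * 0.01)

    def forward(self, V, T):
        # V: (batch, n_regions, d_img) image region features
        # T: (batch, n_words, d_txt) word features
        C = torch.tanh(T @ self.W @ V.transpose(1, 2))  # (batch, n_words, n_regions)
        a_v = F.softmax(C.max(dim=1).values, dim=-1)    # attention over image regions
        a_t = F.softmax(C.max(dim=2).values, dim=-1)    # attention over words
        v_att = (a_v.unsqueeze(-1) * V).sum(dim=1)      # attended image vector
        t_att = (a_t.unsqueeze(-1) * T).sum(dim=1)      # attended text vector
        return v_att, t_att

co = CoAttention(d_img=512, d_txt=300)
v, t = co(torch.randn(8, 49, 512), torch.randn(8, 20, 300))
```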

Figure 3  Illustration of cross-modal K-nearest-neighbor structure maintenance with K=3
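
Figure 3 suggests one natural reading of the structure-maintenance term: a sample's K nearest neighbors computed before the mapping should stay close to it, across modalities, after the mapping. The loss below implements that reading for paired data; the paper's exact formulation may differ.

```python
import torch

def knn_structure_loss(feat_orig, z_a, z_b, k=3):
    """One reading of the K-NN structure-maintenance loss (K=3 as in Fig. 3).
    feat_orig: (n, d) original features of one modality (samples paired across modalities).
    z_a, z_b:  (n, m) shared-space embeddings of the two paired modalities.
    For each sample i, find its k nearest neighbors j in the original space and
    pull z_a[i] toward z_b[j], so the pre-mapping neighborhood survives the
    cross-modal mapping."""
    dist = torch.cdist(feat_orig, feat_orig)             # (n, n) original-space distances
    dist.fill_diagonal_(float('inf'))                    # exclude self from neighbors
    nn_idx = dist.topk(k, largest=False).indices         # (n, k) neighbor indices
    # squared shared-space distance from each z_a[i] to its neighbors' z_b
    d_shared = (z_a.unsqueeze(1) - z_b[nn_idx]).pow(2).sum(-1)  # (n, k)
    return d_shared.mean()

loss = knn_structure_loss(torch.randn(16, 300),
                          torch.randn(16, 128), torch.randn(16, 128), k=3)
```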

Table 1  Experimental parameters

Dataset            K     μ1      μ2      α
Pascal Sentence    5     1e-3    1e-5    0.001
Wikipedia          20    1e-3    1e-4    0.01
XMedia             20    1e-4    1e-4    0.01
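
One plausible way to wire the Table 1 settings into training is sketched below; note that the roles assigned here (μ1 and μ2 as loss weights, α as the learning rate, Adam as the optimizer) are assumptions to be checked against the full paper.

```python
import torch
import torch.nn as nn

# Table 1 settings keyed by dataset; mu1/mu2 are assumed to weight loss terms
# and alpha is assumed to be the learning rate.
PARAMS = {
    'Pascal Sentence': dict(K=5,  mu1=1e-3, mu2=1e-5, alpha=0.001),
    'Wikipedia':       dict(K=20, mu1=1e-3, mu2=1e-4, alpha=0.01),
    'XMedia':          dict(K=20, mu1=1e-4, mu2=1e-4, alpha=0.01),
}

p = PARAMS['Wikipedia']
projector = nn.Linear(300, 128)  # stand-in for the real mapping networks
optimizer = torch.optim.Adam(projector.parameters(), lr=p['alpha'])  # Adam assumed
# total projector loss under the assumed weighting:
# total = adversarial_loss + p['mu1'] * structure_loss + p['mu2'] * attention_loss
```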

Table 2  Cross-modal entity resolution results of SMAN and compared algorithms on Pascal Sentence

Method    Image→Text    Text→Image    Mean MAP
CCA       0.2380        0.2169        0.2275
LCFS      0.3851        0.3741        0.3796
SCM       0.3934        0.3866        0.3901
CDLFM     0.3712        0.3504        0.3608
UCAL      0.2936        0.4392        0.3664
SMN       0.3101        0.4343        0.3722
SMAN      0.3467        0.4708        0.4088
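
The scores in Tables 2-4 are mean average precision (MAP) over retrieval queries: image→text ranks all texts for each image query and vice versa, and the last column averages the two directions. A standard MAP computation, for reference (not the authors' evaluation script):

```python
import numpy as np

def mean_average_precision(similarity, query_labels, gallery_labels):
    """similarity: (n_queries, n_gallery) score matrix; higher = more similar.
    A gallery item is relevant if it shares the query's label/entity."""
    aps = []
    for i in range(similarity.shape[0]):
        order = np.argsort(-similarity[i])                     # rank gallery by score
        relevant = (gallery_labels[order] == query_labels[i])
        if not relevant.any():
            continue
        hits = np.cumsum(relevant)
        # precision at each rank where a relevant item appears
        precision_at_hit = hits[relevant] / (np.flatnonzero(relevant) + 1)
        aps.append(precision_at_hit.mean())
    return float(np.mean(aps))

sim = np.random.rand(5, 100)
print(mean_average_precision(sim, np.arange(5) % 3, np.random.randint(0, 3, 100)))
```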

Table 3  Cross-modal entity resolution results of SMAN and compared algorithms on Wikipedia

Method    Image→Text    Text→Image    Mean MAP
CCA       0.3176        0.2848        0.3012
LCFS      0.2861        0.2208        0.2535
SCM       0.3276        0.2914        0.3095
CDLFM     0.2833        0.2547        0.2691
UCAL      0.2485        0.3641        0.3063
SMN       0.2611        0.3733        0.3172
SMAN      0.2678        0.4012        0.3345

Table 4  Cross-modal entity resolution results of SMAN and compared algorithms on XMedia

Method    Image→Text    Text→Image    Mean MAP
CCA       0.2003        0.2417        0.2211
LCFS      0.3043        0.2913        0.2978
SCM       0.2462        0.3264        0.2863
CDLFM     0.2947        0.2869        0.2908
UCAL      0.2798        0.3977        0.3388
SMN       0.2988        0.4179        0.3584
SMAN      0.2967        0.4586        0.3777

Figure 4  t-SNE visualization of the optimal subspace learned by SMAN on the Pascal Sentence dataset
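
A plot like Figure 4 can be reproduced with scikit-learn's t-SNE on the learned shared-space embeddings; the sketch below uses random stand-in data, and the perplexity and color map are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# stand-ins for the learned shared-space embeddings and their class labels
embeddings = np.random.randn(400, 128)
labels = np.random.randint(0, 20, size=400)   # Pascal Sentence has 20 categories

xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
plt.scatter(xy[:, 0], xy[:, 1], c=labels, cmap='tab20', s=8)
plt.title('t-SNE of the shared space')
plt.show()
```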

Figure 5  Selection of the value of K on the Pascal Sentence dataset

Figure 6  Experimental results of SMAN on the Pascal Sentence dataset under different learning rates
