Journal of Nanjing University (Natural Science) ›› 2022, Vol. 58 ›› Issue (3): 377–385. doi: 10.13232/j.cnki.jnju.2022.03.002


Cross-modal storage and retrieval system for multi-source heterogeneous data

Yaning Kong, Chunshan Li, Dianhui Chu

  1. School of Computer Science and Technology, Harbin Institute of Technology, Weihai, 264209, China
  • Received: 2022-04-19 Online: 2022-05-30 Published: 2022-06-07
  • Contact: Chunshan Li E-mail: lics@hit.edu.cn
  • Supported by:
    National Key R&D Program of China (2018YFB1700400); National Natural Science Foundation of China (61902090); Major Scientific and Technological Innovation Project of Shandong Province (2021ZLGX05)

Abstract:

Manufacturing generates massive multi-source heterogeneous data, such as text, images, audio, and video, across design, production, sales, and service. Managing and using these data efficiently so that they create value for further production is a major challenge for manufacturing enterprises. Traditional storage and retrieval systems classify multimodal data by form or modality and process each type separately, so data of different modalities lack semantic association (text, images, audio, and video cannot be used to retrieve one another), and business processes such as design and service cannot be made intelligent. This paper designs and implements a cross-modal storage and retrieval system for multi-source heterogeneous data such as text and images, enabling efficient management and retrieval of such data in intelligent manufacturing. Specifically, the system first projects the multi-source heterogeneous data produced during production and operation into a unified high-dimensional semantic space to generate semantic vectors, and stores the data in different modes according to different query requirements. Second, the system adopts an efficient retrieval method that combines a three-level structure with a hierarchical connected naive graph-construction algorithm, indexing the data by semantic vector to meet the semantic query needs of manufacturing users. Experiments on the flickr30k dataset show that (1) the system supports storage and retrieval of cross-modal data at the scale of millions of items; (2) at that scale, retrieval takes milliseconds; and (3) retrieval accuracy is higher than that of existing vector retrieval methods.

Key words: multi-source heterogeneous data, cross-modal retrieval, similarity search framework, hybrid retrieval
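
The retrieval idea in the abstract, ranking stored items by how similar their semantic vectors are to a query vector, can be illustrated with a minimal Python sketch. It is not the system's three-level index or its graph-based search: exhaustive cosine-similarity ranking is used as a stand-in, and the toy vectors, item names, and the functions `cosine` and `search` are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def search(query_vec, store, k=2):
    """Return the ids of the k stored items most similar to query_vec.

    `store` maps an item id to its semantic vector. Text and image items
    share one vector space, so a text query can return images and vice versa.
    """
    ranked = sorted(
        store,
        key=lambda item_id: cosine(query_vec, store[item_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional vectors; real embeddings would be much larger.
store = {
    "image:surfing.jpg": [0.9, 0.1, 0.0],
    "text:a man rides a wave": [0.8, 0.2, 0.1],
    "image:guitar.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]  # stand-in for an embedded text query about surfing
top = search(query, store, k=2)
```

Exhaustive ranking compares the query against every stored vector, which is why a system at the scale of millions of items would need an approximate index instead.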

CLC Number:

  • TP181

Figure 1. Schematic of cross-modal query scenarios in manufacturing

Figure 2. Architecture of MDSRS

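Table 1. Database definitions and their attributes

Database type | Definition | Attributes
Multimodal storage mode | Stores cross-modal information such as images and text | id, vector, name, text description, image URL
Event storage mode | Stores a group of cross-modal data linked by temporal and causal relations | id, vector, image-text URL, title, object, event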
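
The two storage modes in Table 1 can be pictured as record types. The sketch below is only an illustration in Python: the class names, field names, field types, and example values are assumptions derived from the attribute lists in the table, not the system's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MultimodalRecord:
    """One entry in the multimodal storage mode (fields follow Table 1)."""
    id: int
    vector: List[float]      # semantic vector in the shared space
    name: str
    text_description: str
    image_url: str

@dataclass
class EventRecord:
    """One entry in the event storage mode (fields follow Table 1)."""
    id: int
    vector: List[float]      # semantic vector in the shared space
    media_url: str           # URL of the associated image-text data
    title: str
    object_name: str         # the "object" attribute in Table 1
    event: str

# Hypothetical example values, for illustration only.
record = MultimodalRecord(
    id=1,
    vector=[0.9, 0.1, 0.0],
    name="surfing",
    text_description="A man rides a wave.",
    image_url="https://example.com/surfing.jpg",
)
```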

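Figure 3. Retrieval architecture of MDSRS

Figure 4. Forward index table

Figure 5. Inverted index table

Figure 6. Principle of the HNSW algorithm

Figure 7. Schematic of the clustering-based HNSW algorithm

Figure 8. Demonstration of image-modality retrieval

Figure 9. Demonstration of event-mode retrieval

Figure 10. Detail view of the event retrieval demonstration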

Table 2. Performance comparison of the proposed MDSRS and MatConvNet

System | Precision | Recall | Retrieval time (s)
MDSRS | 92.32% | 90.75% | 0.025
MatConvNet | 90.16% | 86.78% | 0.53

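Table 3. Performance comparison of MDSRS and MatConvNet on the flickr30k dataset alone

Number of tests | System | Recall@1 | Recall@3 | Recall@5 | AVE | MR
100 | MDSRS | 90.06% | 90.59% | 90.91% | 90.52% | 0
100 | MatConvNet | 80.02% | 80.23% | 80.78% | 80.13% | 0
500 | MDSRS | 89.01% | 90.22% | 90.42% | 89.88% | 0
500 | MatConvNet | 80.21% | 80.31% | 80.72% | 80.41% | 0
1000 | MDSRS | 90.34% | 90.59% | 91.19% | 90.71% | 0
1000 | MatConvNet | 80.24% | 80.41% | 80.42% | 80.36% | 0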
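
The Recall@k columns can be read with the help of a small Python sketch. It assumes the common definition, the fraction of queries whose correct item appears among the top k results, and assumes one correct item per query; the page does not state the exact evaluation protocol, and the function name and toy data are hypothetical.

```python
def recall_at_k(ranked_results, ground_truth, k):
    """Fraction of queries whose correct item is in the top k results.

    ranked_results: query id -> list of item ids, best match first
    ground_truth:   query id -> the single correct item id (an assumption)
    """
    if not ranked_results:
        return 0.0
    hits = sum(
        1
        for query_id, ranked in ranked_results.items()
        if ground_truth[query_id] in ranked[:k]
    )
    return hits / len(ranked_results)

# Toy rankings for four queries over three items.
ranked = {
    "q1": ["a", "b", "c"],
    "q2": ["b", "a", "c"],
    "q3": ["c", "b", "a"],
    "q4": ["a", "c", "b"],
}
truth = {"q1": "a", "q2": "a", "q3": "a", "q4": "b"}

r1 = recall_at_k(ranked, truth, 1)
r2 = recall_at_k(ranked, truth, 2)
r3 = recall_at_k(ranked, truth, 3)
```

Under this definition Recall@k can only stay the same or grow as k increases, which matches the left-to-right trend in most rows of the tables.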

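Table 4. Types of test cases

Noun category | Verb category
Tennis | Playing guitar
Band | Eating
Surfing
Basketball | Rock climbing
Rugby | Playing soccer
Children | Toasting
Lawn | Mountain biking
Beach | Bullfighting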

Table 5. Performance comparison of MDSRS and MatConvNet on new data

System | Recall@1 | Recall@2 | Recall@3
MDSRS | 87.47% | 89.18% | 89.98%
MatConvNet | 88.90% | 89.12% | 89.18%

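Table 6. Recall@k of MDSRS on partial text data

Test sentence / original sentence | Recall@1 | Recall@2 | Recall@3
50% | 49.51% | 57.33% | 60.32%
60% | 65.50% | 70.12% | 73.24%
70% | 79.26% | 80.05% | 80.12%
80% | 86.23% | 87.43% | 88.31%
90% | 90.03% | 90.12% | 90.13%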

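Table 7. Recall@k of MatConvNet on partial text data

Test sentence / original sentence | Recall@1 | Recall@2 | Recall@3
50% | 27.32% | 30.69% | 31.45%
60% | 40.12% | 41.42% | 43.48%
70% | 54.12% | 59.32% | 57.47%
80% | 65.32% | 67.25% | 68.15%
90% | 79.21% | 80.12% | 80.21%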