Cross modal storage and retrieval system for multi-source heterogeneous data

Journal of Nanjing University (Natural Sciences), 2022, Vol. 58, Issue 3: 377–385. doi: 10.13232/j.cnki.jnju.2022.03.002

Yaning Kong, Chunshan Li, Dianhui Chu

School of Computer Science and Technology, Harbin Institute of Technology, Weihai, 264209, China

Received: 2022-04-19 Online: 2022-05-30 Published: 2022-06-07
Corresponding author: Chunshan Li E-mail: lics@hit.edu.cn
Funding:
National Key Research and Development Program of China (2018YFB1700400); National Natural Science Foundation of China (61902090); Major Scientific and Technological Innovation Project of Shandong Province (2021ZLGX05)

Abstract

The manufacturing industry generates massive multi-source heterogeneous data, such as text, images, audio and video, during design, production, sales and service. Managing and using these data efficiently to create value for manufacturing reproduction is a major challenge for manufacturing enterprises. Traditional data storage and retrieval systems classify multimodal data by form or modality and process each modality separately. As a result, data of different modalities lack semantic association (text, image, audio and video data cannot be used to retrieve one another), and such systems cannot support intelligent business processes such as design and service. We design and implement a cross-modal storage and retrieval system for multi-source heterogeneous data such as text and images, enabling efficient management and retrieval of such data in intelligent manufacturing. Specifically, the system first projects the multi-source heterogeneous data produced during production and operation into a unified high-dimensional semantic space to obtain semantic vectors, and stores the data in different schemas according to different query requirements. Second, the system uses an efficient retrieval method that combines a three-level structure with a hierarchical connected naive graph-construction algorithm, indexing the data by semantic vector to meet the semantic query needs of manufacturing users. Experiments on the flickr30k dataset show that: (1) the system supports storage and retrieval of cross-modal data at the million scale; (2) at the million scale, retrieval takes milliseconds; (3) retrieval accuracy is higher than that of existing vector retrieval methods.

Key words: multi-source heterogeneous data, cross-modal retrieval, similarity search framework, hybrid retrieval
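
The retrieval idea in the abstract, representing items from different modalities in one semantic space and ranking them by vector similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the vectors below are hand-made toy values rather than learned embeddings, the function name `top_k_cosine` is hypothetical, and exhaustive search stands in for the graph-based index that the abstract describes.

```python
import numpy as np

def top_k_cosine(query, items, k=3):
    """Return (indices, scores) of the k rows of `items` closest to `query`
    by cosine similarity. Exhaustive search; assumed stand-in for a real index."""
    q = query / np.linalg.norm(query)
    m = items / np.linalg.norm(items, axis=1, keepdims=True)
    scores = m @ q                       # cosine similarity of every item to the query
    order = np.argsort(-scores)[:k]      # highest similarity first
    return order.tolist(), scores[order].tolist()

# Toy "semantic vectors" (assumed values): rows may come from any modality.
items = np.array([
    [1.0, 0.0, 0.0],   # 0: an image
    [0.9, 0.1, 0.0],   # 1: a caption
    [0.0, 1.0, 0.0],   # 2: another image
    [0.0, 0.0, 1.0],   # 3: another caption
])
query = np.array([1.0, 0.05, 0.0])       # e.g. an embedded text query

idx, sims = top_k_cosine(query, items, k=2)
print(idx)
```

Because every modality shares one vector space, the same function serves text-to-image and image-to-text queries; a production system would replace the exhaustive scan with an approximate nearest-neighbor index to reach millisecond latency at the million scale.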

CLC number: TP181

Yaning Kong, Chunshan Li, Dianhui Chu. Cross modal storage and retrieval system for multi-source heterogeneous data[J]. Journal of Nanjing University (Natural Sciences), 2022, 58(3): 377–385.




Method	Precision	Recall	Retrieval time (s)
MDSRS	92.32%	90.75%	0.025
MatConvNet	90.16%	86.78%	0.53

Number of tests	Method	Recall@1	Recall@3	Recall@5	AVE
100	MDSRS	90.06%	90.59%	90.91%	90.52%
100	MatConvNet	80.02%	80.23%	80.78%	80.13%
500	MDSRS	89.01%	90.22%	90.42%	89.88%
500	MatConvNet	80.21%	80.31%	80.72%	80.41%
1000	MDSRS	90.34%	90.59%	91.19%	90.71%
1000	MatConvNet	80.24%	80.41%	80.42%	80.36%
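
The page does not define Recall@K or AVE, so the following is a sketch under stated assumptions: each query has exactly one correct item, Recall@K is the fraction of queries whose correct item appears among the top K results, and AVE is the arithmetic mean of the three Recall columns (the first MDSRS row is consistent with this reading). The function names and the toy rankings are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the single relevant item is within the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mean_recall_at_k(all_ranked, all_relevant, k):
    """Average Recall@k over a set of queries."""
    hits = [recall_at_k(r, rel, k) for r, rel in zip(all_ranked, all_relevant)]
    return sum(hits) / len(hits)

# Toy retrieval results (assumed): one ranked list per query.
rankings = [[3, 1, 2], [5, 4, 6], [9, 7, 8], [0, 2, 1]]
relevant = [3, 4, 8, 7]   # the correct item for each query

r1 = mean_recall_at_k(rankings, relevant, 1)
r2 = mean_recall_at_k(rankings, relevant, 2)
r3 = mean_recall_at_k(rankings, relevant, 3)

# AVE for the first MDSRS row, assuming it is the mean of the three columns.
ave = round((90.06 + 90.59 + 90.91) / 3, 2)
print(r1, r2, r3, ave)
```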

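Noun class	Verb class
Tennis	Playing guitar
Band	Eating
Dog	Surfing
Basketball	Rock climbing
Rugby	Playing soccer
Children	Toasting
Lawn	Mountain biking
Beach	Bullfighting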

Method	Recall@1	Recall@2	Recall@3
MDSRS	87.47%	89.18%	89.98%
MatConvNet	88.90%	89.12%	89.18%

Test sentence / original sentence	Recall@1	Recall@2	Recall@3
50%	49.51%	57.33%	60.32%
60%	65.50%	70.12%	73.24%
70%	79.26%	80.05%	80.12%
80%	86.23%	87.43%	88.31%
90%	90.03%	90.12%	90.13%


Test sentence / original sentence	Recall@1	Recall@2	Recall@3
50%	27.32%	30.69%	31.45%
60%	40.12%	41.42%	43.48%
70%	54.12%	59.32%	57.47%
80%	65.32%	67.25%	68.15%
90%	79.21%	80.12%	80.21%