Journal of Nanjing University (Natural Science) ›› 2022, Vol. 58 ›› Issue (1): 94–102. doi: 10.13232/j.cnki.jnju.2022.01.010


Two-stage instance selection and adaptive bag mapping algorithm for multi-instance learning

Mei Yang1, Wenxi Zeng1, Yu Fang1, Fan Min1,2

  1. School of Computer Science, Southwest Petroleum University, Chengdu, 610500, China
  2. Institute for Artificial Intelligence, Southwest Petroleum University, Chengdu, 610500, China
  • Received: 2021-06-28 Online: 2022-01-30 Published: 2022-02-22
  • Contact: Fan Min E-mail: minfan@swpu.edu.cn
  • Funding:
    National Natural Science Foundation of China (62006200); Natural Science Foundation of Sichuan Province (2019YJ0314); Sichuan Province Youth Science and Technology Innovation Team (2019JDTD0017); Southwest Petroleum University Graduate All-English Course Construction Project (2020QY04)

Abstract:

Compared with single-instance learning, the objects studied in multi-instance learning (MIL) have a more complex internal structure. Most existing MIL methods map bags based on instances in the original space, but they usually ignore the internal structure of the bags and can hardly guarantee the affinity between the selected instances and the bags in the new feature space. To address this issue, this paper proposes a two-stage instance selection and adaptive bag mapping algorithm for multi-instance learning (TAMI). First, the first-stage instance selection technique mines the structural features within each bag and selects instance prototypes based on the density and affinity of the instances in the bag. Second, the second-stage instance selection technique chooses the instance prototypes with peak density as representative instances. Finally, the adaptive bag mapping technique defines a new mapping function that converts each bag into a single vector for learning. Experiments use significance tests to verify, from a statistical point of view, the effectiveness of TAMI on benchmark datasets for tasks such as image retrieval and text classification. The results show that TAMI achieves better results than other MIL algorithms on image retrieval and medical image datasets, and performs well on text classification datasets.

Key words: adaptive mapping, affinity, density, instance selection, multi-instance learning
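
The three steps named in the abstract (per-bag prototype selection, peak-density representative selection, bag-to-vector mapping) can be illustrated with a rough sketch. This is not the TAMI algorithm itself: the abstract gives no formulas, so the Gaussian density, the cutoff rule `cutoff_ratio`, the `rho * delta` score borrowed from density peaks clustering, and the minimum-distance mapping in `map_bag` are all assumptions chosen only to make the pipeline concrete. All function names are hypothetical.

```python
import math
import random


def density_peak_scores(points, cutoff_ratio=0.2):
    """Return (rho, delta) per point, in the style of density peaks clustering.

    rho[i]   : Gaussian-kernel local density of point i (assumed form).
    delta[i] : distance to the nearest point with higher density, or the
               largest distance from i if no point is denser.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    d_max = max(max(row) for row in d)
    d_c = cutoff_ratio * d_max if d_max > 0 else 1.0  # hypothetical cutoff rule
    rho = [
        sum(math.exp(-((d[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    return rho, delta


def select_peaks(points, k):
    """Keep the k points with the largest rho * delta (assumed score)."""
    rho, delta = density_peak_scores(points)
    order = sorted(range(len(points)), key=lambda i: rho[i] * delta[i], reverse=True)
    return [points[i] for i in order[:k]]


def map_bag(bag, representatives):
    """Embed a bag as its minimum distance to each representative (assumed mapping)."""
    return [min(math.dist(x, r) for x in bag) for r in representatives]


# Toy data: four bags of eight 2-D instances drawn around two centres.
rng = random.Random(0)
bags = [
    [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for _ in range(8)]
    for c in (0.0, 0.0, 5.0, 5.0)
]

# Stage 1: two prototypes per bag. Stage 2: three representatives overall.
prototypes = [p for bag in bags for p in select_peaks(bag, 2)]
reps = select_peaks(prototypes, 3)

# Each bag becomes one fixed-length vector usable by a single-instance learner.
vectors = [map_bag(bag, reps) for bag in bags]
print(len(vectors), len(vectors[0]))  # 4 3
```

Whatever the real selection and mapping functions are, the point of the design is the same: once every bag is a vector of fixed length, any standard classifier can be trained on the mapped bags.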

CLC number:

  • TP181

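Figure 1: Running example of the TAMI algorithm

Figure 2: Instance density plot

Figure 3: Instance affinity graph

Figure 4: Decision graph

Figure 5: Schematic of the adaptive bag mapping technique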

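Table 1: Datasets used in the experiments

| Name | Bags | Instances | Attributes | Classes |
|---|---|---|---|---|
| Elephant | 200 | 1391 | 230 | 2 |
| Fox | 200 | 1320 | 230 | 2 |
| Tiger | 200 | 1220 | 230 | 2 |
| Messidor | 1200 | 12352 | 687 | 2 |
| Ucsb_breast | 58 | 2002 | 708 | 2 |
| Newsgroups | 2000 | 80137 | 200 | 20 |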

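Table 2: Average accuracy (%) of different algorithms on the image retrieval and medical image datasets

| Dataset | Simple-MI | BAMIC | miFV | miVLAD | MILDM | TAMI |
|---|---|---|---|---|---|---|
| Elephant | 82.4±0.90 | 82.9±1.57 | 84.3±1.08 | 84.7±1.18 | 84.2±1.25 | 89.6±1.30 |
| Fox | 54.0±1.50 | 59.8±2.08 | 60.4±1.20 | 63.3±2.21 | 62.9±2.30 | 64.1±2.10 |
| Tiger | 80.5±0.88 | 80.6±2.23 | 76.5±1.21 | 84.9±0.52 | 80.1±2.09 | 86.0±1.04 |
| Messidor | 61.5±0.48 | 62.7±0.69 | 69.4±0.55 | 67.9±0.24 | 57.97±1.71 | 80.7±0.79 |
| Ucsb_breast | 76.1±1.79 | 76.2±3.16 | 87.0±2.57 | 81.2±2.56 | 74.80±4.24 | 83.6±2.65 |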

Table 3: Average accuracy (%) of different algorithms on the text classification datasets

| Dataset | Simple-MI | BAMIC | miFV | miVLAD | MILDM | TAMI |
|---|---|---|---|---|---|---|
| alt.atheism | 59.6±0.80 | 67.3±1.27 | 82.4±0.80 | 85.1±2.17 | 55.5±3.03 | 88.2±0.87 |
| comp.graphics | 52.1±1.30 | 80.6±0.66 | 80.4±1.02 | 79.2±1.66 | 51.1±1.85 | 82.9±0.94 |
| comp.os.ms | 50.2±1.25 | 59.5±1.02 | 73.5±1.69 | 68.6±2.20 | 48.3±3.95 | 71.4±1.28 |
| comp.sys.ibm | 55.3±1.19 | 74.6±0.66 | 78.8±1.89 | 80.7±1.62 | 50.4±1.78 | 79.6±1.02 |
| comp.sys.mac | 52.1±2.02 | 75.6±0.66 | 78.1±1.22 | 78.2±2.56 | 44.3±2.58 | 79.8±0.87 |
| comp.window.x | 61.2±1.08 | 74.1±0.83 | 84.8±1.66 | 81.4±1.50 | 55.0±2.94 | 84.3±1.49 |
| misc.forsale | 54.7±2.49 | 63.5±1.75 | 73.7±1.85 | 72.1±2.59 | 47.0±4.64 | 68.9±3.18 |
| rec.autos | 54.4±0.92 | 72.7±0.64 | 78.8±1.17 | 81.3±2.57 | 46.4±4.74 | 78.2±1.66 |
| rec.motorcycles | 54.4±1.11 | 52.0±3.79 | 86.6±1.28 | 81.4±1.20 | 57.3±4.27 | 83.4±1.69 |
| rec.sport.baseball | 56.5±1.02 | 78.1±0.54 | 84.7±1.19 | 82.8±1.33 | 51.1±1.66 | 80.0±1.00 |
| rec.sport.hockey | 61.2±0.75 | 82.4±0.80 | 87.4±1.50 | 89.8±1.40 | 45.7±3.06 | 90.2±1.72 |
| sci.crypt | 60.7±0.78 | 68.0±1.73 | 76.1±1.37 | 82.2±2.32 | 51.2±5.45 | 83.2±2.18 |
| sci.electronics | 53.0±0.00 | 92.0±0.00 | 92.7±0.78 | 92.3±0.78 | 52.6±1.26 | 93.9±0.70 |
| sci.med | 59.9±1.22 | 79.6±0.80 | 84.3±1.42 | 82.6±2.54 | 50.5±2.68 | 83.4±1.50 |
| sci.religion | 60.6±1.56 | 75.8±0.98 | 80.5±1.12 | 79.6±1.36 | 52.7±4.14 | 79.2±2.36 |
| sci.space | 52.8±0.40 | 77.4±0.49 | 87.3±1.00 | 85.0±1.10 | 49.3±3.71 | 86.9±2.07 |
| talk.politics.guns | 52.8±0.40 | 75.3±1.73 | 77.9±1.14 | 81.4±1.28 | 48.1±2.33 | 78.3±2.15 |
| talk.politics.mideast | 64.7±0.90 | 74.3±1.10 | 79.1±1.22 | 83.9±1.04 | 46.0±3.27 | 82.5±0.67 |
| talk.politics.misc | 65.5±2.54 | 58.4±2.73 | 73.7±1.79 | 76.5±2.29 | 57.4±3.34 | 82.1±1.76 |
| talk.religion.misc | 59.5±0.67 | 68.0±0.77 | 75.3±1.73 | 79.6±2.20 | 51.5±2.84 | 72.8±3.03 |

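Table 4: Average ranks of the six algorithms on the three types of datasets

| Datasets | Simple-MI | BAMIC | miFV | miVLAD | MILDM | TAMI |
|---|---|---|---|---|---|---|
| Average rank | 5.1 | 4.13 | 2.64 | 2.38 | 5.3 | 1.45 |
| Image retrieval | 5.3 | 4.3 | 4.3 | 2 | 4 | 1 |
| Medical image | 5 | 4 | 1.5 | 3 | 6 | 1.5 |
| Text classification | 4.95 | 4.05 | 2.1 | 2.15 | 5.9 | 1.85 |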
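
As a check on how the ranks in Table 4 relate to the accuracies in Table 2, the sketch below ranks the six algorithms on each image retrieval dataset (rank 1 is the highest mean accuracy) and averages the ranks per algorithm. The accuracy values are copied from Table 2. The tie-handling rule (tied scores share the mean of their ranks) is an assumption, and it is not exercised by this data.

```python
ALGORITHMS = ["Simple-MI", "BAMIC", "miFV", "miVLAD", "MILDM", "TAMI"]

# Mean accuracies (%) from Table 2, image retrieval datasets only.
ACCURACY = {
    "Elephant": [82.4, 82.9, 84.3, 84.7, 84.2, 89.6],
    "Fox": [54.0, 59.8, 60.4, 63.3, 62.9, 64.1],
    "Tiger": [80.5, 80.6, 76.5, 84.9, 80.1, 86.0],
}


def rank_descending(scores):
    """Rank 1 = highest score; tied scores share the average of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def average_ranks(table):
    """Average each algorithm's rank over all datasets in the table."""
    rows = [rank_descending(scores) for scores in table.values()]
    return [sum(col) / len(rows) for col in zip(*rows)]


avg = average_ranks(ACCURACY)
for name, r in zip(ALGORITHMS, avg):
    print(f"{name}: {r:.2f}")
# Simple-MI: 5.33
# BAMIC: 4.33
# miFV: 4.33
# miVLAD: 2.00
# MILDM: 4.00
# TAMI: 1.00
```

Rounded to one decimal place, these values agree with the image retrieval row of Table 4 (5.3, 4.3, 4.3, 2, 4, 1).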

Figure 6: Friedman test plot for the six algorithms
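
For readers who want to see what a Friedman test on average ranks involves, the sketch below applies the standard Friedman statistic to the text classification row of Table 4, taking N = 20 datasets (the rows of Table 3) and k = 6 algorithms. It is an illustration only: the page does not state which datasets, tie correction, or post-hoc test the authors used. The Nemenyi critical difference and its critical value `Q_ALPHA` (2.850 for six algorithms at the 0.05 level, as commonly tabulated) are assumptions that should be checked against a statistics reference.

```python
import math

# Average ranks from the text classification row of Table 4.
AVG_RANKS = {
    "Simple-MI": 4.95,
    "BAMIC": 4.05,
    "miFV": 2.1,
    "miVLAD": 2.15,
    "MILDM": 5.9,
    "TAMI": 1.85,
}
N_DATASETS = 20  # number of text classification datasets in Table 3
Q_ALPHA = 2.850  # assumed Nemenyi critical value for k = 6, alpha = 0.05


def friedman_chi2(avg_ranks, n):
    """Friedman statistic computed from average ranks (no tie correction)."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12 * n / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


def iman_davenport(chi2, n, k):
    """Less conservative F-distributed variant of the Friedman statistic."""
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


def nemenyi_cd(q_alpha, k, n):
    """Two algorithms differ significantly if their average ranks differ by more than this."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n))


ranks = list(AVG_RANKS.values())
k = len(ranks)
chi2 = friedman_chi2(ranks, N_DATASETS)
f_stat = iman_davenport(chi2, N_DATASETS, k)
cd = nemenyi_cd(Q_ALPHA, k, N_DATASETS)
print(f"chi2 = {chi2:.2f}, F = {f_stat:.2f}, CD = {cd:.3f}")
```

Under these assumptions the rank gap between TAMI (1.85) and miFV (2.1) is far smaller than the critical difference, while the gap to Simple-MI (4.95) and MILDM (5.9) is larger, which is one way to read the abstract's statement that TAMI "performs well" on text classification rather than strictly best.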
