Journal of Nanjing University (Natural Sciences), 2018, Vol. 54, Issue (2): 422–.


Improved algorithm for multi-instance multi-label learning based on mean shift

Wang Yibin1,2, Cheng Yusheng1,2*, Pei Gensheng1

  • Online: 2018-03-31; Published: 2018-03-31
  • About author: 1. School of Computer and Information, Anqing Normal University, Anqing, 246011, China;
    2. The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing, 246011, China
  • Funding: Key Scientific Research Project of Anhui Provincial Universities (KJ2017A352); Key Laboratory Fund of Anhui Provincial Universities (ACAIM160102)
  • Received: 2017-12-07
  • *Corresponding author, E-mail: chengyshaq@163.com



Abstract: In dealing with multi-semantic objects, multi-instance multi-label (MIML) learning overcomes the shortcomings of both multi-instance learning and multi-label learning, and has been applied to text classification, image recognition, genetic data analysis and other tasks. There are two main degeneration strategies, distinguished by the stage at which degeneration takes place. One strategy uses a multi-label learning algorithm as a bridge: with K-Medoids clustering, the multi-instance multi-label problem is transformed into a single-instance multi-label problem, and state-of-the-art results have been achieved in this way. At the same time, this approach oversimplifies multi-semantic and complex semantic objects: because the correlation between instances is not considered, important information is weakened or even lost during degeneration. To overcome this drawback and replace K-Medoids, an improved multi-instance multi-label learning algorithm based on mean shift clustering (MIMLMS) is proposed. MIMLMS degenerates the problem with the mean shift algorithm, into which a Gaussian kernel function and instance weights are added. The instance weights preserve the dependency between instances, and introducing the multi-instance bags into the Gaussian kernel allows kernel density estimation and the gradient descent method to solve for the optimum of the degeneration process. Finally, the multi-instance multi-label classification model is established with the sum of squared errors (SSE) as the objective function. Experimental results on benchmark multi-instance multi-label data sets show that the algorithm delivers good classification performance, effectiveness and reliability.
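The abstract does not spell out the weighted mean-shift formulas or the paper's instance-weighting scheme, so the sketch below is only a minimal illustration of the degeneration step it describes, assuming a standard Gaussian-kernel weighted mean shift and uniform instance weights; the names weighted_mean_shift and degenerate_bag, and the averaging of converged modes into one vector per bag, are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def weighted_mean_shift(points, weights, bandwidth=1.0, tol=1e-5, max_iter=300):
    """Weighted mean shift with a Gaussian kernel.

    Each point is shifted toward a mode of the weighted kernel density:
        m(x) = sum_i w_i * K(x - x_i) * x_i / sum_i w_i * K(x - x_i) - x,
    where K(u) = exp(-||u||^2 / (2 * h^2)) and h is the bandwidth.
    """
    modes = points.copy().astype(float)
    for _ in range(max_iter):
        shifted = np.empty_like(modes)
        for j, x in enumerate(modes):
            # Squared distances from the current mode estimate to all instances.
            d2 = np.sum((points - x) ** 2, axis=1)
            # Weighted Gaussian kernel responses.
            k = weights * np.exp(-d2 / (2.0 * bandwidth ** 2))
            # Weighted mean of the instances, i.e. x plus the mean-shift vector.
            shifted[j] = k @ points / k.sum()
        if np.max(np.linalg.norm(shifted - modes, axis=1)) < tol:
            modes = shifted
            break
        modes = shifted
    return modes

def degenerate_bag(bag, bandwidth=1.0):
    """Collapse one multi-instance bag into a single feature vector.

    Uniform instance weights are a placeholder assumption; the converged
    modes are averaged to obtain a single-instance representation.
    """
    bag = np.asarray(bag, dtype=float)
    weights = np.full(len(bag), 1.0 / len(bag))  # hypothetical uniform weights
    modes = weighted_mean_shift(bag, weights, bandwidth)
    return modes.mean(axis=0)

# Toy usage: two bags of 2-D instances reduced to single-instance vectors.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bags = [rng.normal(size=(5, 2)), rng.normal(loc=3.0, size=(8, 2))]
    X_single = np.vstack([degenerate_bag(b, bandwidth=0.8) for b in bags])
    print(X_single.shape)  # (2, 2): one vector per bag
```

After this degeneration each bag is a single feature vector, so any multi-label learner (e.g. ML-kNN) can be trained on the result, which is the bridge role the abstract attributes to the degeneration strategy.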
