Journal of Nanjing University (Natural Science) ›› 2017, Vol. 53 ›› Issue (6): 1023–.


 A method of hierarchical multilabel classification based on variable minimum Bayes risk

 Xu Zhikang1,Li Yang1,Li Deyu1,2*

  • Online:2017-11-26 Published:2017-11-26
  • About author:1.School of Computer & Information Technology,Shanxi University,Taiyuan,030006,China;
    2.Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education,Shanxi University,Taiyuan,030006,China
  • Supported by:
     National Natural Science Foundation of China (61632011,61272095,61432011,U1435212,61573231,61672331)
    Received:2017-07-30
    *Corresponding author,E-mail:lidy@sxu.edu.cn

 A method of hierarchical multilabel classification based on variable minimum Bayes risk

 Xu Zhikang1,Li Yang1,Li Deyu1,2*   

  • Online:2017-11-26 Published:2017-11-26
  • About author:1.School of Computer & Information Technology,Shanxi University,Taiyuan,030006,China;
    2.Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education,Shanxi University,Taiyuan,030006,China

Abstract: Hierarchical multilabel classification methods organize labels into a hierarchical structure according to the correlations among them and use this hierarchy as a form of supervised information, thereby better solving the multilabel classification problem. Two kinds of methods are commonly used for hierarchical multilabel classification: loss-independent methods and loss-sensitive methods. For loss-sensitive methods, a commonly used loss function is HMC-loss, which assigns different weights to false-positive and false-negative errors and incorporates hierarchy information into the loss function. When HMC-loss is used for prediction, although the resulting loss value is ideal, the number of predicted labels is far larger than the number of true labels. Moreover, introducing hierarchy information adversely affects the decision order of the label nodes. To address these problems, this paper first proposes an improved loss function, IMH-loss, and then, using Bayesian decision theory, proposes a hierarchical multilabel classification method in which the Bayes risk varies with the decision process. Experimental results on real data sets show that the proposed method improves label prediction precision while maintaining the recall rate.
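For orientation, the HMC-loss mentioned above (see reference [10]) is commonly written in a form similar to the following sketch; the exact notation is an assumption here, with α and β taken as the weights of false negatives and false positives and c_i as a hierarchy-dependent node weight:

\[
\ell_{\mathrm{HMC}}(\hat{\mathbf{y}},\mathbf{y}) \;=\; \alpha \sum_{i} c_i\,\mathbb{I}\!\left[\hat{y}_i = 0 \wedge y_i = 1\right] \;+\; \beta \sum_{i} c_i\,\mathbb{I}\!\left[\hat{y}_i = 1 \wedge y_i = 0\right]
\]

In this reading, c_i encodes the position of label node i in the hierarchy, and the IMH-loss described above removes these hierarchy-dependent weights (in effect treating every node equally) so that the decision order of the label nodes is not distorted.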

Abstract: The hierarchical multilabel classification (HMC) method organizes labels into a hierarchical structure based on the correlation among the labels, which can serve as a kind of supervised information, so as to better solve the multilabel classification problem. There are two kinds of commonly used methods for the hierarchical multilabel classification problem. One can be called the loss-independent method, which does not use any loss function in the training model or the prediction process. The other is called the loss-sensitive method. For the loss-sensitive method, a frequently used loss function in HMC is HMC-loss, which assigns different weights to the two kinds of errors, false positives and false negatives. At the same time, hierarchical information is added to the loss function according to each node's location in the hierarchy. When predicting with HMC-loss, although the loss value is ideal, the number of predicted positive labels is far larger than the actual number of labels. In addition, introducing hierarchy information into HMC-loss may have a negative effect on the decision order of label nodes. To solve these problems, we first propose an improved loss function, IMH-loss (Improved Hierarchical loss), which removes the hierarchical information so that the decision order of the nodes is preserved. Using Bayesian decision theory, we then propose a hierarchical multilabel classification method in which the Bayes risk can change along with the decision process. Experimental results on several real-world data sets show that the presented method can improve the prediction precision of labels while ensuring the recall rate, and the prediction results are closer to the real results.
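The decision stage of such a Bayes-risk-based approach can be pictured with the minimal Python sketch below. It is not the authors' algorithm: the function name predict_hierarchy, the fixed costs c_fp and c_fn, and the parent-before-child traversal are illustrative assumptions, and the sketch keeps the costs constant, whereas the proposed method lets the Bayes risk change as the decision process unfolds.

# Hypothetical sketch: top-down, per-node Bayes-risk decision over a label tree.
# This is not the paper's algorithm; the risk here is computed with fixed costs.

def depth(node, parent):
    """Number of edges from node up to its root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d

def predict_hierarchy(probs, parent, c_fp=1.0, c_fn=1.0):
    """Decide each label node top-down from posterior marginals.

    probs  : dict mapping node -> estimated P(y_i = 1 | x)
    parent : dict mapping node -> parent node (None for a root)
    c_fp   : assumed cost of a false positive
    c_fn   : assumed cost of a false negative
    """
    decision = {}
    # Parents are decided before their children (roots first).
    for node in sorted(probs, key=lambda n: depth(n, parent)):
        p = probs[node]
        par = parent[node]
        if par is not None and decision.get(par, 0) == 0:
            # Hierarchy constraint: a child cannot be positive
            # when its parent has been decided negative.
            decision[node] = 0
            continue
        # Predict 1 when the expected risk of doing so is smaller.
        risk_positive = c_fp * (1.0 - p)   # expected loss if we output 1
        risk_negative = c_fn * p           # expected loss if we output 0
        decision[node] = 1 if risk_positive < risk_negative else 0
    return decision

if __name__ == "__main__":
    # Toy hierarchy: root -> {a, b}, a -> {a1}
    parent = {"root": None, "a": "root", "b": "root", "a1": "a"}
    probs = {"root": 0.9, "a": 0.6, "b": 0.2, "a1": 0.7}
    print(predict_hierarchy(probs, parent))  # e.g. {'root': 1, 'a': 1, 'b': 0, 'a1': 1}

In the paper's setting the admissible risk is re-evaluated as labels are decided, which would replace the fixed c_fp and c_fn above with quantities that depend on the decisions already made.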

 [1] Huang S,Peng W,Li J X,et al.Sentiment and topic analysis on social media:A multi-task multi-label classification approach.In:Proceedings of the 5th Annual ACM Web Science Conference.Paris,France:ACM,2013:172-181.
[2] Liu S M,Chen J H.A multi-label classification based approach for sentiment classification.Expert Systems with Applications,2015,42(3):1083-1093.
[3] Zhang B,Wang Y,Chen F.Multilabel image classification via high-order label correlation driven active learning.IEEE Transactions on Image Processing,2014,23(3):1430-1441.
[4] Cesa-Bianchi N,Valentini G.Hierarchical cost-sensitive algorithms for genome-wide gene function prediction.In:Proceedings of the 3rd International Workshop on Machine Learning in Systems Biology.Ljubljana,Slovenia:PMLR,2009:14-29.
[5] Silla C N Jr,Freitas A.A survey of hierarchical classification across different application domains.Data Mining and Knowledge Discovery,2011,22(1-2):31-72.
[6] Punera K,Rajan S,Ghosh J.Automatically learning document taxonomies for hierarchical classification.In:Special Interest Tracks and Posters of the 14th International Conference on World Wide Web.Chiba,Japan:ACM,2005:1010-1011.
[7] Wu Q Y,Ye Y M,Zhang H J,et al.ML-TREE:A tree-structure-based approach to multilabel learning.IEEE Transactions on Neural Networks and Learning Systems,2015,26(3):430-443.
[8] Bi W,Kwok J T.Multi-label classification on tree- and DAG-structured hierarchies.In:Proceedings of the 28th International Conference on Machine Learning.Bellevue,WA,USA:Omnipress,2011:17-24.
[9] Cesa-Bianchi N,Gentile C,Zaniboni L.Hierarchical classification:Combining Bayes with SVM.In:Proceedings of the 23rd International Conference on Machine Learning.Pittsburgh,PA,USA:ACM,2006:177-184.
[10] Bi W,Kwok J T.Hierarchical multilabel classification with minimum Bayes risk.In:Proceedings of the 12th IEEE International Conference on Data Mining.Brussels,Belgium:IEEE,2012:101-110.
[11] Bi W,Kwok J T.Bayes-optimal hierarchical multilabel classification.IEEE Transactions on Knowledge and Data Engineering,2015,27(11):2907-2918.
[12] Hariharan R,Zelnik-Manor L,Vishwanathan S V N,et al.Large scale max-margin multi-label classification with priors.In:Proceedings of the 27th International Conference on Machine Learning.Haifa,Israel:ACM,2010:423-430.
[13] Zhang M L,Zhou Z H.A review on multi-label learning algorithms.IEEE Transactions on Knowledge and Data Engineering,2014,26(8):1819-1831.
[14] Cesa-Bianchi N,Gentile C,Zaniboni L.Incremental algorithms for hierarchical classification.The Journal of Machine Learning Research,2006,7:31-54.
[15] Zaragoza J,Sucar L,Morales E.Bayesian chain classifiers for multidimensional classification.In:Proceedings of the 22nd International Joint Conference on Artificial Intelligence.Barcelona,Spain:AAAI Press,2011:2192-2197.
[16] Platt J C.Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods.In:Advances in Large Margin Classifiers.Cambridge,MA,USA:MIT Press,1999:61-74.