Journal of Nanjing University (Natural Sciences) ›› 2016, Vol. 52 ›› Issue (2): 270–.


Local rough sets for multi-label learning

Liang Xinyan1,2, Qian Yuhua1,2*, Guo Qian2, Cheng Honghong1,2

  • Online: 2016-03-26 Published: 2016-03-26
  • About authors: 1. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006, China; 2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  • Supported by:
    National Science Fund for Excellent Young Scholars (61322211), the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-12-1031), the Specialized Research Fund for the Doctoral Program of Higher Education (20121401110013), and the Program for Young Academic Leaders of Shanxi Province (20120301)
    Received: 2015-09-17
    *Corresponding author. E-mail: jinchengqyh@sxu.edu.cn

Local rough sets for multi-label learning

Liang Xinyan1,2, Qian Yuhua1,2*, Guo Qian2, Cheng Honghong1,2

  • Online: 2016-03-26 Published: 2016-03-26
  • About authors: 1. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Taiyuan 030006, China; 2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China

Abstract: Multi-label learning studies a class of complex problems in which one object is associated with multiple labels at the same time. Text annotation, video content annotation, image recognition and the discovery of protein functions all belong to this kind of task. Like single-label learning, multi-label learning also faces the challenge of high data dimensionality. Some reduction algorithms have already been designed for multi-label data, but compared with single-label reduction algorithms they are few in number and rather limited. With the arrival of the big data era, it is becoming easier and easier to collect large numbers of samples, yet labeling all of the collected samples is impractical. This poses three challenges for researchers who want to solve multi-label learning problems with rough set models: higher data dimensionality, the limitations of existing rough sets, and the appearance of partially labeled decision tables. To address these three challenges, a local rough set model for multi-label learning is proposed and some interesting properties are obtained. Finally, using the local rough set model, a multi-label heuristic reduction algorithm is designed, and its effectiveness is verified on three public multi-label datasets.
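
As background for the rough set machinery the abstract builds on, the classical Pawlak definitions are recalled below; the paper's local, multi-label variant modifies these notions but is not reproduced here. For a decision table with universe $U$, a condition attribute subset $B$ and a target concept $X \subseteq U$:

$\underline{B}X = \{\, x \in U : [x]_B \subseteq X \,\}, \qquad \overline{B}X = \{\, x \in U : [x]_B \cap X \neq \emptyset \,\},$

$\mathrm{POS}_B(D) = \bigcup_{X \in U/D} \underline{B}X, \qquad \gamma_B(D) = \frac{|\mathrm{POS}_B(D)|}{|U|},$

where $[x]_B$ is the equivalence class of $x$ under the indiscernibility relation induced by $B$, and $U/D$ is the partition induced by the decision (label) attributes $D$. Heuristic reduction algorithms of the kind mentioned in the abstract typically add attributes greedily so as to increase the dependency $\gamma_B(D)$.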

Abstract: Compared with single-label learning, multi-label learning is a learning task in which each object is associated with a set of concept labels at the same time. It has attracted increasing attention because such tasks widely exist in the real world. In text labeling, each document may be annotated with more than a single label; for example, a web page on economy may belong to several predefined topics such as Buffett and stock simultaneously. In automatic scene annotation, each scene may be annotated with topical words; for instance, an image showing a sea bear in the Arctic may be associated with several annotation words such as bear and ice simultaneously. In the research of functional proteomics, each protein may exhibit multiple functions at the same time. All these cases are multi-label learning tasks. Like single-label learning, multi-label learning also suffers from the curse of dimensionality. Attribute reduction, which improves the performance of multi-label classifiers, is an effective means of decreasing the dimensionality of the data. There are a large number of attribute reduction methods for single-label learning, but only a few have been designed for multi-label learning, and the existing attribute reduction methods have high computational complexity. In particular, in the context of big data, collecting large amounts of data is becoming easier and easier, yet labeling all of the data is unrealistic. If we analyze multi-label problems on such data sets with existing rough set models, we need to take three challenges into consideration: higher dimensionality, the limitations of existing rough sets, and the appearance of partially labeled decision tables. Meanwhile, semi-supervised multi-label learning is a new research direction. To address these challenges and further exploit the information in unlabeled samples, a local rough set model for multi-label learning is introduced and some interesting properties are obtained. Finally, a heuristic reduction method is designed by applying local rough sets for multi-label learning. Its effectiveness is verified on three publicly available multi-label datasets.
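
To make the idea of dependency-based heuristic reduction concrete, the following Python sketch performs a greedy forward attribute selection on categorical multi-label data, averaging the per-label positive-region dependency. It is only a minimal illustration of the general technique, not the authors' local rough set algorithm; the function names (equivalence_classes, multi_label_dependency, greedy_reduct) and the averaging scheme are assumptions made for this example.

from collections import defaultdict

def equivalence_classes(X, attrs):
    """Group sample indices by their values on the selected attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(X):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def multi_label_dependency(X, Y, attrs):
    """Average, over labels, the fraction of samples whose equivalence
    class is consistent on that label (positive region size / |U|)."""
    if not attrs:
        return 0.0
    blocks = equivalence_classes(X, attrs)
    n, n_labels = len(X), len(Y[0])
    total = 0.0
    for j in range(n_labels):
        pos = sum(len(b) for b in blocks if len({Y[i][j] for i in b}) == 1)
        total += pos / n
    return total / n_labels

def greedy_reduct(X, Y, eps=1e-6):
    """Forward greedy selection: repeatedly add the attribute that most
    increases the multi-label dependency, until the dependency of the
    full attribute set is reached or no attribute improves it."""
    all_attrs = list(range(len(X[0])))
    target = multi_label_dependency(X, Y, all_attrs)
    reduct, current = [], 0.0
    while current + eps < target:
        best_gain, best_a = 0.0, None
        for a in all_attrs:
            if a in reduct:
                continue
            gain = multi_label_dependency(X, Y, reduct + [a]) - current
            if gain > best_gain:
                best_gain, best_a = gain, a
        if best_a is None:
            break
        reduct.append(best_a)
        current += best_gain
    return reduct

if __name__ == "__main__":
    # Tiny categorical example: 3 attributes, 2 labels per sample.
    X = [[0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
    Y = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1]]
    print(greedy_reduct(X, Y))  # a small reduct, e.g. [0, 1] on this toy data

Note that this sketch assumes fully labeled, categorical data; the paper's contribution concerns the harder setting of partially labeled decision tables and a local (rather than global) rough set model.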

[1] Tsoumakas G, Katakis I. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 2007, 3(3): 1-13.
[2] Schapire R E, Singer Y. BoosTexter: A boosting-based system for text categorization. Machine Learning, 2000, 39(2): 135-168.
[3] Boutell M R, Luo J, Shen X, et al. Learning multi-label scene classification. Pattern Recognition, 2004, 37(9): 1757-1771.
[4] Bao B K, Ni B, Mu Y, et al. Efficient region-aware large graph construction towards scalable multi-label propagation. Pattern Recognition, 2011, 44(3): 598-606.
[5] Read J, Pfahringer B, Holmes G, et al. Classifier chains for multi-label classification. Machine Learning, 2011, 85(3): 333-359.
[6] Tsoumakas G, Katakis I, Vlahavas I. Random k-labelsets for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(7): 1079-1089.
[7] Read J. A pruned problem transformation method for multi-label classification. In: Proceedings of the 2008 New Zealand Computer Science Research Student Conference. Berlin: Springer, 2008, 143-150.
[8] Zhang M, Zhou Z. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 2007, 40(7): 2038-2048.
[9] Clare A, King R D. Knowledge discovery in multi-label phenotype data. In: Principles of Data Mining and Knowledge Discovery. Berlin: Springer, 2001, 42-53.
[10] Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997, 55(1): 119-139.
[11] De Comité F, Gilleron R, Tommasi M. Learning multi-label alternating decision trees from texts and data. In: Machine Learning and Data Mining in Pattern Recognition. Berlin: Springer, 2003, 35-49.
[12] Zhang M, Zhou Z. Multi-label neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(10): 1338-1351.
[13] Pawlak Z. Rough sets. International Journal of Computer and Information Sciences, 1982, 11: 341-356.
[14] Qian Y, Liang J, Pedrycz W, et al. Positive approximation: An accelerator for attribute reduction in rough set theory. Artificial Intelligence, 2010, 174(9): 597-618.
[15] Qian Y, Liang J, Pedrycz W, et al. An efficient accelerator for attribute reduction from incomplete data in rough set framework. Pattern Recognition, 2011, 44(8): 1658-1670.
[16] Lin T, Huang K, Liu Q, et al. Rough sets, neighborhood systems and approximation. In: Proceedings of the Fourth International Symposium on Methodologies of Intelligent Systems. New York: North-Holland, 1990, 130-141.
[17] Hu Q, Yu D, Liu J, et al. Neighborhood rough set based heterogeneous feature subset selection. Information Sciences, 2008, 178(18): 3577-3594.
[18] Ziarko W. Variable precision rough set model. Journal of Computer and System Sciences, 1993, 46(1): 39-59.
[19] Zhang Y, Zhou Z. Multilabel dimensionality reduction via dependence maximization. ACM Transactions on Knowledge Discovery from Data, 2010, 4(3): 1-21.
[20] Ge L, Li G Z, You M Y. Embedded feature selection for multi-label learning. Journal of Nanjing University (Natural Sciences), 2009, 45(5): 671-676. (in Chinese)
[21] Zhang Z H, Li S N, Li Z G, et al. Multi-label feature selection algorithm based on information entropy. Journal of Computer Research and Development, 2013, 50(6): 1177-1184. (in Chinese)
[22] Zhang W X, Xu Z B, Liang Y, et al. Inclusion degree theory. Fuzzy Systems and Mathematics, 1996, 10(4): 1-9. (in Chinese)