南京大学学报(自然科学版) ›› 2019, Vol. 55 ›› Issue (1): 14–28. doi: 10.13232/j.cnki.jnju.2019.01.002


基于特征选择的CP-nets结构学习

刘 素,刘惊雷*   

  1. 烟台大学计算机与控制工程学院,烟台,264005
  • 接受日期:2018-08-14 出版日期:2019-02-01 发布日期:2019-01-26
  • 通讯作者: 刘惊雷,E-mail:jinglei_liu@sina.com
  • 基金资助:
    国家自然科学基金(61572419,61773331,61703360),山东省高等学校科技计划(J17KA091)

CP-nets structure learning based on feature selection

Liu Su,Liu Jinglei*   

  1. School of Computer and Control Engineering,Yantai University,Yantai,264005,China
  • Accepted:2018-08-14 Online:2019-02-01 Published:2019-01-26
  • Contact: Liu Jinglei, E-mail:jinglei_liu@sina.com

摘要: 作为描述多属性之间定性条件偏好的一种图模型,条件偏好网(Conditional Preference networks,CP-nets)的结构学习问题在CP-nets的研究中起着重要的作用. 不同于传统的CP-nets学习方法,提出基于信息论和特征选择的方法来研究偏好数据库上的CP-nets的结构学习问题. 首先建立了偏好数据库上的互信息和条件互信息的求解方法,并将互信息看作一个属性和它的可行父亲之间的相关性,条件互信息看作可行父亲集中属性之间的冗余性,从而构造出极大相关极小冗余(Maximal Relevance Minimal Redundancy,mRMR)的目标函数,同时指出,一个属性的父亲集是由属性之间冗余度小,但对孩子属性的偏好却影响极大的属性子集组成的. 随后基于特征选择中的mRMR方法来实现CP-nets的结构学习,并设计相应的算法来完成从偏好数据中学习CP-nets的结构. 最后在电影推荐数据集上验证了算法的有效性. 研究结果表明,基于mRMR的特征选择方法可有效获取变量之间的因果关系,从而求取出每个属性的父亲集合,进而获得CP-nets的结构.

关键词: CP-nets结构学习, 极大相关极小冗余, 可行父亲集, 偏好数据库上的互信息, 特征选择

Abstract: As a graphical model for describing qualitative conditional preferences among multiple attributes, the structure learning problem of Conditional Preference networks (CP-nets) plays an important role in the study of CP-nets. Different from traditional CP-nets learning methods, this paper proposes a method based on information theory and feature selection to study the structure learning problem of CP-nets on a preference database. Firstly, a method for computing mutual information and conditional mutual information on the preference database is established: the mutual information is regarded as the relevance between an attribute and its feasible fathers, and the conditional mutual information as the redundancy among the attributes within the feasible father set, from which the objective function of Maximal Relevance Minimal Redundancy (mRMR) is constructed. It is further pointed out that the father set of an attribute consists of attributes that have little redundancy among themselves but a great influence on the preference over the child attribute. Then, the structure learning of CP-nets is realized based on the mRMR method from feature selection, and a corresponding algorithm is designed to learn the structure of CP-nets from preference data. Finally, the effectiveness of the algorithm is verified on a movie recommendation data set. The results show that the mRMR-based feature selection method can effectively capture the causal relationships between variables, derive the father set of each attribute, and thus obtain the structure of CP-nets.
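To make the mRMR objective concrete, the criterion below is the standard maximal-relevance minimal-redundancy form of Peng et al., written for a candidate father set S of a child attribute c. This is an illustrative restatement only: the paper's construction replaces the plain mutual information I(·;·) with the mutual information and conditional mutual information defined on the preference database.

D(S,c) = \frac{1}{|S|} \sum_{X_i \in S} I(X_i; c), \qquad R(S) = \frac{1}{|S|^2} \sum_{X_i, X_j \in S} I(X_i; X_j), \qquad \max_{S} \; \Phi(S) = D(S,c) - R(S).

Under this criterion the selected father set is exactly the kind of subset the abstract describes: attributes that are highly relevant to the child attribute's preference while being minimally redundant with one another.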

Key words: CP-nets structure learning, maximal relevance minimal redundancy, feasible father set, mutual information on preference database, feature selection

中图分类号: TP181
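For readers who want to experiment with the idea, the following is a minimal sketch of greedy father-set selection under an mRMR-style score. It is not the paper's algorithm: it assumes, hypothetically, that the preference database has already been flattened into a table of discrete attribute values, and it uses plain empirical mutual information in place of the preference-database mutual information defined in the paper. The names mutual_information, mrmr_parents, and the toy attributes genre, director, and rating_pref are illustrative only.

from collections import Counter
from math import log2

def mutual_information(samples, a, b):
    # Empirical mutual information I(a; b) estimated from a list of records,
    # where each record is a dict mapping attribute names to discrete values.
    n = len(samples)
    pa = Counter(s[a] for s in samples)
    pb = Counter(s[b] for s in samples)
    pab = Counter((s[a], s[b]) for s in samples)
    mi = 0.0
    for (x, y), c in pab.items():
        pxy = c / n
        mi += pxy * log2(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi

def mrmr_parents(samples, child, candidates, k):
    # Greedy mRMR-style selection of at most k fathers for `child`:
    # at each step pick the candidate maximising relevance I(X; child)
    # minus the average redundancy with the already selected fathers,
    # and stop when no remaining candidate improves the score.
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(x):
            relevance = mutual_information(samples, x, child)
            redundancy = (sum(mutual_information(samples, x, y) for y in selected) / len(selected)
                          if selected else 0.0)
            return relevance - redundancy
        best = max(remaining, key=score)
        if score(best) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage on hypothetical movie records: `genre` determines the preferred
# rating while `director` is irrelevant, so only `genre` should be selected.
data = [
    {"genre": "action", "director": "A", "rating_pref": "high"},
    {"genre": "action", "director": "B", "rating_pref": "high"},
    {"genre": "drama",  "director": "A", "rating_pref": "low"},
    {"genre": "drama",  "director": "B", "rating_pref": "low"},
]
print(mrmr_parents(data, "rating_pref", ["genre", "director"], k=2))  # -> ['genre']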