南京大学学报(自然科学版) ›› 2020, Vol. 56 ›› Issue (2): 175–185.doi: 10.13232/j.cnki.jnju.2020.02.003

• • 上一篇    下一篇

基于时间敏感滑动窗口的CP⁃nets结构学习

王卫星1,刘兆伟1(),石敬华2   

  1. 1.烟台大学计算机与控制工程学院,烟台,264005
    2.山东省生态环境监测中心,济南,250101
  • 收稿日期:2019-11-24 出版日期:2020-03-30 发布日期:2020-04-02
  • 通讯作者: 刘兆伟 E-mail:lzw@ytu.edu.cn
  • 基金资助:
    国家自然科学基金(61403328);山东省重点研发计划(2015GSF115009)

Learning of CP⁃nets structure based on a time⁃sensitive sliding window

Weixing Wang1,Zhaowei Liu1(),Jinghua Shi2   

  1. 1.School of Computer and Control Engineering, Yantai University, Yantai, 264005, China
    2.Shandong Eco?environmental Monitoring Center, Ji'nan, 250101, China
  • Received:2019-11-24 Online:2020-03-30 Published:2020-04-02
  • Contact: Zhaowei Liu E-mail:lzw@ytu.edu.cn

摘要:

随着人工智能的发展,条件偏好网(Conditional Preference networks,CP?nets)的学习和表示被广泛研究.此前的研究工作主要集中于从静态数据库中挖掘用户的条件偏好,而在许多新兴应用中,数据通过互联网或传感器网络流动,偏好也会随之发生变化.将挖掘偏好的方法扩展到动态环境是一个挑战,遇到的问题主要包括对连续数据进行的快速处理、庞大的数据量以及有限的内存资源等.针对偏好数据流,提出一种基于时间敏感的滑动窗口模型来挖掘条件偏好关系和学习CP?nets结构的方法,该方法包括一个用来获取所有可能偏好关系的存储结构以及一个对偏好关系进行累积计数的数据结构,并提出基于时间敏感滑动窗口的条件偏好关系挖掘算法,根据输入的偏好数据流比较基本块与滑动窗口的大小对条件偏好关系进行插入和更新.实验结果表明,与其他学习CP?nets结构的方法相比,该方法所需的运行时间少,得到的CP?nets的结构更准确.

关键词: CP?nets, 滑动窗口, 数据流, 频繁项集

Abstract:

With the development of artificial intelligence,the learning and representation of preference model CP?nets(Conditional Preference networks) has been widely studied. Previous researches mainly focused on mining users' conditional preferences from static databases. In many emerging applications,data is updated through the internet or sensor networks,and then preferences change. Extending the method of mining preferences to dynamic environments is a challenge,including fast processing of continuous data,huge amount of data and limited memory resources. For preference data streams,this paper proposes a time?sensitive sliding window model to mine conditional preference relations and learn the structure of CP?nets. This method includes a storage structure for obtaining all possible preference relationships and a data structure for cumulative counting of preference relationships. An algorithm for mining conditional preference relations based on time?sensitive sliding windows is proposed,comparing the size of the basic block and the sliding window according to the input preference data stream to insert and update the conditional preference relationship. The experimental results show that compared with other methods of learning CP?nets structure,this method needs less running time and gets more accurate structure of CP?nets.

Key words: CP?nets, sliding window, data stream, frequent itemsets

中图分类号: 

  • TP181

图1

一个面向寿司推荐的CP?net"

表1

偏好事务流"

基本块时间段

事务

数量

偏好关系

(计数)

B109:00-09:5927o12(11),o43(11),o13(20),o15(2)
B210:00-10:5920o12(20),o43(20),o15(13),o37(13)
B311:00-11:5927o12(19),o43(19),o13(8),o15(7),o37(7)
B412:00-12:5923o12(10),o43(10),o15(3),o65(10),o78(10)

表2

扫描基本块B1之后TA,DT和PCP的更新结果"

TA10.8000
DTB_IDIDBcount
1111
1211
1320
PCP(1,o12,11,0) (2,o43,11,0) (3,o13,20,0)

表3

扫描基本块B2之后TA,DT和PCP的更新结果"

TA810.800
DTB_IDIDBcount
1111
1211
1320
2120
2220
230
2413
2513
PCP

(1,o12,31,0) (2,o43,31,0) (3,o13,20,0)

(4,o15,13,10) (5,o37,13,10)

表4

扫描基本块B3之后TA,DT和PCP的更新结果"

TA10.8810.80
DTB_IDIDBcount
1111
1211
1320
2120
2220
230
2413
2513
3119
3219
337
347
1111
PCP

(1,o12,50,0) (2,o43,50,0)

(3,o15,20,10) (4,o37,20,10)

表5

扫描基本块B4之后TA,DT和PCP的更新结果"

TA9.210.8810.8
DTB_IDIDBcount
2120
2220
230
2413
2513
3119
3219
337
347
4110
4210
4310
4410
PCP

(1,o12,49,0) (1,o43,49,0)

(3,o78,10,29) (4,o65,10,29)

图2

最终得到的CP?net"

图3

不同样本数量下学习CP?nets的相似度"

图4

不同样本数量下学习CP?nets的相容度"

图5

不同样本数量下学习CP?nets的运行时间"

图6

CP?nets动态学习过程"

图7

算法运行过程中的相容度"

表6

本文算法在Sushi数据集上与其他CP?nets学习方法相容度的比较"

样本

数量

卡方检验G方检验精确P值计算本文算法
2000.68560.76820.77910.7687
4000.76750.80560.82030.8094
8000.86920.89610.88920.9027
16000.89380.92470.93460.9637

表7

本文算法在MovieLens数据集上与其他CP?nets学习方法相容度的比较"

样本

数量

卡方检验G方检验精确P值计算本文算法
2000.68130.73460.74640.7492
4000.74690.81340.82010.8086
8000.82710.89680.88890.8993
16000.85680.94670.93340.9789

表8

本文算法在MovieLens数据集上限定时间内与其他多种CP?nets学习方法相容度的比较"

限定时间(s)卡方检验G方检验精确P值计算本文算法
0.50.58550.62830.64010.8432
1.50.71430.75680.74560.8978
30.79640.88490.89320.9454
60.85760.92130.93500.9350
1 朱蔚林,木伟民,金宗泽等.基于MR的高可靠分布式数据流统计模型.计算机技术与发展,2018,28(1):6-10,16.
Zhu W L,Mu W M,Jin Z Z, et al. Statistical model of distrubuted data strem with high reliability based on MR. Computer Technology and Development,2018,28(1):6-10,16.
2 徐石磊,魏星,江红等.分布式数据库系统中的并行分组聚合实现.华东师范大学学报(自然科学版),2018(5):56-66.
Xu S L,Wei X,Jiang H,et al.Implementation of the parallel groupby and aggregation functions in a distributed database system. Journal of East China Normal University (Natural Science),2018(5):56-66.
3 崔凌云,刘春玲,魏荣华.大数据技术在船舶监控系统数据管理中的应用.舰船科学技术,2018,41(2A):142-144.
Cui L Y,Liu C L,Wei R H.Application of large data technology in data management of ship monitoring system. Ship Science and Technology,2018,41(2A):142-144.
4 Lughofer E,Pratama M.Online active learning in data stream regression using uncertainty sampling based on evolving generalized fuzzy models.IEEE Transactions on Fuzzy Systems,2018,26(1):292-309.
5 Wu K H,Zhu Y Y,Quan L,et al.A distributed real?time data prediction framework for large?scale time?series data using stream processing.International Journal of Intelligent Computing and Cybernetics,2017,10(2):145-165.
6 茹蓓,贺新征.高效的数据流完全频繁项集挖掘算法.计算机工程与设计,2017,38(10):2759-2766.
Ru B,He X Z.Efficient complete frequent itemsets mining algorithm of data stream. Computer Engineering and Design,2017,38(10):2759-2766.
7 王玲,李树林,徐培培等.基于频繁项集树的时态关联规则挖掘算法.控制与决策,2018,33(4):591-599. Wang L,Li S L,Xu P P,et al.Temporal association rules mining algorithm based on frequent item sets tree. Control and Decision,2018,33(4):591-599.
8 Oulad?Naoui S,Cherroun H,Ziadi D.A formal series?based unification of the frequent itemset mining approaches.Knowledge and Information Systems,2017,53(2):439-477.
9 Liu Z W,Zhong Z L,Zhang C H,et al.Learning CP?nets structure from preference data streams.IEEE Access,2018,6:56716-56726.
10 辛冠琳,刘惊雷.基于精确P值计算学习无环CP?nets.南京大学学报(自然科学),2017,53(3):450-461.Xin G L,Liu J L. Learning acyclic CP?nets based on exact P?value computation. Journal of Nanjing University (Natural Science),2017,53(3):450-461.
11 刘惊雷,廖士中.CP?nets学习的复杂度.计算机科学,2018,45(6):211-215.
Liu J L,Liao S Z.Complexity of CP?nets learning. Computer Science,2018,45(6):211-215.
12 郑金芳,张继栋,陈烽.基于模糊频繁模式的数据流关联规则挖掘方法.湘潭大学自科学报,2017,39(3):122-126.
Zheng J F,Zhang J D,Chen F. Mining method of data stream association rules based on fuzzy frequent mode. Natural Science Journal of Xiangtan University,2017,39(3):122-126.
13 Demaine E D,López?Ortiz A,Munro J I.Frequency estimation of internet packet streams with limited space∥European Symposium on Algorithms.Springer Berlin Heidelberg,2002:348-360.
14 Karp R M,Shenker S,Papadimitriou C H.A simple algorithm for finding frequent elements in streams and bags.ACM Transactions on Database Systems,2003,28(1):51-55.
15 Manku G S,Motwani R.Approximate frequency counts over data streams∥Proceedings of the 28th International Conference on Very Large Data Bases.Hong Kong,China:ACM,2002:346-357.
16 Cohen E,Strauss M J.Maintaining time?decaying stream aggregates.Journal of Algorithms,2006,59(1):19-36.
17 Deypir M,Sadreddini M H.A dynamic layout of sliding window for frequent itemset mining over data streams.Journal of Systems and Software,2012,85(3):746-759.
18 Li H F,Zhang N,Zhu J M,et al.Frequent itemset mining over time?sensitive streams.Chinese Journal of Computers,2012,35(11):2283.
19 Boutilier C,Brafman R I,Domshlak C,et al.CP?nets:a tool for representing and reasoning with conditional ceteris paribus preference statements.Journal of Artificial Intelligence Research,2004,21:135-191.
20 刘惊雷,刘兆伟,孙雪姣等.CP?nets的代数表示及其模型求取算法.模式识别与人工智能,2011,24(6):725-732.
Liu J L,Liu Z W,Sun X J,et al. Algebraic representation and model solving algorithm for CP?nets. Pattern Recognition and Artificial Intelligence,2011,24(6):725-732.)
21 辛冠琳,刘惊雷.从偏好数据库中挖掘Ceteris Paribus偏好.计算机应用,2016,36(8):2092-2098,2108.
Xin G L,Liu J L.Mining ceteris paribus preference from preference database. Journal of Computer Applications,2016,36(8):2092-2098,2108.
22 陈寅.基于FP?Growth算法的关联规则挖掘算法研究.无线互联科技,2017(19):118-121,141.
Chen Y. Study on association rule mining algorithm based on FP?Growth algorithm. Wireless Internet Technology,2017(19):118-121,141.
23 Liu J T,Xiong Y,Wu C H,et al.Learning conditional preference networks from inconsistent examples.IEEE Transactions on Knowledge and Data Engineering,2014,26(2):376-390.
24 Liu J T,Yao Z J,Xiong Y,et al.Learning conditional preference network from noisy samples using hypothesis testing.Knowledge?Based Systems,2013,40:7-16.
[1] 信统昌,刘兆伟. 基于贝叶斯⁃遗传算法的多值无环CP⁃nets学习[J]. 南京大学学报(自然科学版), 2020, 56(1): 74-84.
[2] 张银芳,于洪,王国胤,谢永芳. 一种用于数据流自适应分类的主动学习方法[J]. 南京大学学报(自然科学版), 2020, 56(1): 67-73.
[3] 宋威,刘明渊,李晋宏. 基于事务型滑动窗口的数据流中高效用项集挖掘算法[J]. 南京大学学报(自然科学版), 2014, 50(4): 494-.
[4]  汤克明1,2,戴彩艳1,陈峻3.4**
.  一种基于滑动窗口的不确定数据流Top一K 查询算法*
[J]. 南京大学学报(自然科学版), 2012, 48(3): 351-359.
[5]  据春华,帅朝谦**,封毅
.  基于粒计算的商业数据流概念漂移特征选择*[J]. 南京大学学报(自然科学版), 2011, 47(4): 391-397.
[6]  赵飞, 刘奇志** , 张剡, 柏文阳 .  一种大域数据流中缺失值的填充方法*

[J]. 南京大学学报(自然科学版), 2011, 47(1): 32-39.
[7]  卞磊 * , 刘超, 金茂忠 .  一种面向审查的过程内数据流异常自动检测方法

[J]. 南京大学学报(自然科学版), 2010, 46(1): 71-76.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 李嘉明, 邹 勇, 郑 浩, 魏钟波, 杨柳燕, 缪爱军. 养殖塘生态系统重金属污染状况与风险评价[J]. 南京大学学报(自然科学版), 2019, 55(2): 272 -281 .
[2] 齐小刚, 强敏, 刘立芳. RSboFMC:提高数据可用性和负载均衡性的碎片矩阵缓存策略[J]. 南京大学学报(自然科学版), 2019, 55(4): 667 -677 .
[3] 柴变芳,魏春丽,曹欣雨,王建岭. 面向网络结构发现的批量主动学习算法[J]. 南京大学学报(自然科学版), 2019, 55(6): 1020 -1029 .
[4] 郑文萍,刘韶倩,穆俊芳. 一种基于相对熵的随机游走相似性度量模型[J]. 南京大学学报(自然科学版), 2019, 55(6): 984 -999 .
[5] 吕国俊,曹建军,郑奇斌,常宸,翁年凤. 基于结构保持对抗网络的跨模态实体分辨[J]. 南京大学学报(自然科学版), 2020, 56(2): 197 -205 .