Journal of Nanjing University (Natural Science), 2020, Vol. 56, Issue (1): 67-73. doi: 10.13232/j.cnki.jnju.2020.01.008
Yinfang Zhang1, Hong Yu1, Guoyin Wang1, Yongfang Xie2
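Abstract:
Concept drift causes the accuracy of a data stream classifier to degrade over time, so the classifier must be able to adapt. Most existing data stream classifiers that adapt to concept drift assume that the true label of an instance becomes available as soon as the model has predicted its label. This assumption is unreasonable in some cases, because labeling data is often costly and time-consuming. To address the scarcity of labels in data streams, and taking into account the sampling bias that active learning may introduce, this paper combines an uncertainty-based active learning strategy with a Boundary and Outlier Detection (BOD) method and proposes a new active learning method, ALBOD (Active Learning Based on Boundary and Outlier Detection). Comparative experiments show that, when concept drift occurs, ALBOD needs only about 20% of the labels on average to keep the classifier at the same classification accuracy as the fully labeled (100%) algorithms OzaBagAdwin (OBA) and HoeffdingAdaptiveTree (HAT), which indicates that ALBOD has good active learning ability.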
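To make the uncertainty-based part of this idea concrete, the following is a minimal sketch of label querying on a stream under a labeling budget. It is not the ALBOD algorithm: the boundary and outlier detection step is omitted, and the names `margin` and `select_queries`, the `threshold` and `budget` parameters, and the margin-based uncertainty measure are all assumptions made for illustration.

```python
def margin(probs):
    """Gap between the two largest class probabilities (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_queries(prob_stream, threshold=0.2, budget=0.2):
    """Return indices of stream instances whose true label would be requested.

    prob_stream: iterable of predicted class-probability lists, one per instance.
    threshold:   (assumed) margin below which a prediction counts as uncertain.
    budget:      (assumed) maximum fraction of instances that may be labeled.
    """
    queried = []
    for seen, probs in enumerate(prob_stream, start=1):
        spent = len(queried) / seen  # fraction of labels used so far
        if spent < budget and margin(probs) < threshold:
            queried.append(seen - 1)  # request the true label for this instance
    return queried


if __name__ == "__main__":
    # Hypothetical classifier outputs for a two-class stream.
    stream = [
        [0.55, 0.45], [0.90, 0.10], [0.50, 0.50], [0.52, 0.48], [0.95, 0.05],
        [0.70, 0.30], [0.51, 0.49], [0.80, 0.20], [0.50, 0.50], [0.90, 0.10],
    ]
    print(select_queries(stream, threshold=0.2, budget=0.3))
```

A strategy of this kind labels only instances near the current decision boundary, which is one way the sampling bias mentioned in the abstract can arise: regions where the classifier is confident but wrong after a drift are never queried.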