Journal of Nanjing University (Natural Sciences) ›› 2018, Vol. 54 ›› Issue (6): 1141-1151. doi: 10.13232/j.cnki.jnju.2018.06.010
Hu Miao (胡淼)1,2, Wang Kaijun (王开军)1,2*, Li Haichao (李海超)1,2, Chen Lifei (陈黎飞)1,2
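Abstract: A random forest algorithm with fuzzy tree nodes is proposed for outlier detection. While the classification decision trees of the random forest are being built, a fuzzy method is introduced into the nodes of each binary decision tree: a fuzzy region concerning the class partition is designed at each node, and normal and abnormal membership functions are defined on that region. When a sample passes through the fuzzy region of a tree node, it is classified as abnormal if its abnormal membership is greater than its normal membership; otherwise it moves on to the next lower node of the tree, and if no lower node exists it is classified as normal. The final class of the sample is decided by the voting step of the random forest algorithm. Experimental results on four UCI datasets show that, in overall outlier-detection performance (recall, precision and accuracy), the new method achieves higher and more stable performance than the random-forest-based outlier detection algorithms RFV and RFP. Its performance is also comparable to that of the one-class support vector machine, and some of its results are better.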
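The decision rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the membership functions or the shape of the fuzzy region, so the linear `ramp` membership, the complement used for the normal membership, the `FuzzyNode` fields, and the strict-majority vote are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Membership = Callable[[float], float]


def ramp(lo: float, hi: float) -> Membership:
    """Hypothetical abnormal membership: 0 at or below lo, 1 at or above hi, linear between."""
    def mu(v: float) -> float:
        return min(1.0, max(0.0, (v - lo) / (hi - lo)))
    return mu


@dataclass
class FuzzyNode:
    feature: int                       # index of the feature tested at this node
    threshold: float                   # crisp split used to choose the lower node
    abnormal_mu: Membership            # abnormal membership on the fuzzy region
    left: Optional["FuzzyNode"] = None
    right: Optional["FuzzyNode"] = None

    def normal_mu(self, v: float) -> float:
        # Assumption: normal membership is the complement of the abnormal one.
        return 1.0 - self.abnormal_mu(v)


def classify_tree(root: Optional[FuzzyNode], x: Sequence[float]) -> int:
    """Return 1 (abnormal) or 0 (normal) for one tree."""
    node = root
    while node is not None:
        v = x[node.feature]
        # Abnormal as soon as the abnormal membership exceeds the normal one.
        if node.abnormal_mu(v) > node.normal_mu(v):
            return 1
        # Otherwise descend to the lower node on the matching side of the split.
        node = node.left if v <= node.threshold else node.right
    # No lower node left: the sample is classified as normal.
    return 0


def classify_forest(trees: Sequence[FuzzyNode], x: Sequence[float]) -> int:
    """Voting step: abnormal only on a strict majority of trees (ties count as normal)."""
    votes = sum(classify_tree(t, x) for t in trees)
    return 1 if 2 * votes > len(trees) else 0


# Toy forest with made-up parameters, for illustration only.
tree_a = FuzzyNode(feature=0, threshold=5.0, abnormal_mu=ramp(8.0, 10.0))
tree_b = FuzzyNode(feature=1, threshold=0.0, abnormal_mu=ramp(2.0, 4.0))
tree_c = FuzzyNode(feature=0, threshold=5.0, abnormal_mu=ramp(20.0, 30.0),
                   right=FuzzyNode(feature=1, threshold=0.0, abnormal_mu=ramp(2.0, 4.0)))
forest = [tree_a, tree_b, tree_c]

print(classify_forest(forest, [9.5, 3.5]))  # 1 (abnormal)
print(classify_forest(forest, [3.0, 1.0]))  # 0 (normal)
```

In this sketch a sample can be flagged as abnormal at any depth, while a normal label is only assigned after the sample has run out of lower nodes, which mirrors the asymmetry stated in the abstract.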