Journal of Nanjing University (Natural Science) ›› 2018, Vol. 54 ›› Issue (6): 1141–1151. doi: 10.13232/j.cnki.jnju.2018.06.010


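A random forest algorithm based on fuzzy tree node for anomaly detection

Hu Miao1,2, Wang Kaijun1,2*, Li Haichao1,2, Chen Lifei1,2

  1. College of Mathematics and Informatics, Fujian Normal University, Fuzhou, 350117, China; 2. Digital Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou, 350117, China
  • Accepted: 2018-08-22 Online: 2018-12-01 Published: 2018-12-01
  • Corresponding author: Wang Kaijun, E-mail: wkjwang@qq.com
  • Supported by:
    National Natural Science Foundation of China (61672157), Natural Science Foundation of Fujian Province (2018J01778)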



Abstract: This paper proposes a random forest algorithm with fuzzy tree nodes for anomaly detection. While the classification trees of the random forest are being built, a fuzzy method is introduced into the nodes of each binary decision tree: a fuzzy region for the class division is defined in the nodes, and normal and anomaly membership functions are defined on that region. When a sample passes through the fuzzy region of a tree node, it is classified as an anomaly if its anomaly membership degree is greater than its normal membership degree. Otherwise, the sample moves to the next lower node of the tree, and it is classified as normal if there is no lower node. The final class of the sample is determined by the voting step of the random forest algorithm. Experimental results on four UCI datasets show that, in overall anomaly detection performance (recall, precision and accuracy), the new method is both better and more stable than RFV and RFP, two anomaly detection algorithms based on random forests, which perform worse in most cases. The new method performs comparably to the One-class Support Vector Machine, and some of its experimental results are better.

Key words: anomaly detection, ensemble learning, random forest, fuzzy membership function, fuzzy tree node
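
The decision rule summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the shape of the membership functions, the extent of the fuzzy region, or the details of the voting step, so everything concrete here is an assumption made for illustration: the names `FuzzyNode`, `memberships`, `classify_tree` and `classify_forest` are hypothetical, the anomaly membership is taken to be a linear ramp over the region `[lo, hi]`, samples outside the region are simply routed by the split threshold, and the forest uses a plain majority vote with ties counted as normal. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

NORMAL, ANOMALY = 0, 1


@dataclass
class FuzzyNode:
    """One node of a binary decision tree with an attached fuzzy region."""
    feature: int      # index of the feature tested at this node
    threshold: float  # ordinary split point used to route the sample
    lo: float         # lower edge of the fuzzy region (assumed; needs lo < hi)
    hi: float         # upper edge of the fuzzy region (assumed)
    left: Optional["FuzzyNode"] = None
    right: Optional["FuzzyNode"] = None


def memberships(value: float, lo: float, hi: float) -> Tuple[float, float]:
    """Return (normal, anomaly) degrees for a value inside [lo, hi].

    Assumption: the anomaly degree rises linearly from 0 at lo to 1 at hi,
    and the normal degree is its complement.
    """
    anomaly = (value - lo) / (hi - lo)
    return 1.0 - anomaly, anomaly


def classify_tree(root: Optional[FuzzyNode], x: Sequence[float]) -> int:
    """Walk one tree and apply the fuzzy-node rule at each node."""
    node = root
    while node is not None:
        value = x[node.feature]
        if node.lo <= value <= node.hi:
            normal, anomaly = memberships(value, node.lo, node.hi)
            if anomaly > normal:
                return ANOMALY  # flagged inside the fuzzy region
        # Not flagged here: descend to the lower node on the matching side.
        node = node.left if value <= node.threshold else node.right
    return NORMAL  # no lower node left, so the sample is treated as normal


def classify_forest(trees: List[FuzzyNode], x: Sequence[float]) -> int:
    """Majority vote over all trees (assumed; ties count as normal)."""
    votes = [classify_tree(tree, x) for tree in trees]
    return ANOMALY if 2 * sum(votes) > len(votes) else NORMAL
```

In this sketch each tree is hand-built from `FuzzyNode` objects; a real forest would learn the split points and fuzzy regions from bootstrap samples of the training data, which the abstract does not describe in enough detail to reproduce here.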

CLC Number:

  • TP311