Journal of Nanjing University (Natural Science), 2021, Vol. 57, Issue 5: 785–792. doi: 10.13232/j.cnki.jnju.2021.05.008


Attention based log-level anomaly detection algorithm for large-scale system logs

Xiaoyu Fang1, Chenhan Cao1, Bin Xia1,2

  1. School of Computer Science, Software and Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
    2. Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
  • Received: 2021-06-26 Online: 2021-09-29 Published: 2021-09-29
  • Corresponding author: Bin Xia E-mail: bxia@njupt.edu.cn
  • Funding: National Natural Science Foundation of China (61802205)

Abstract:

Existing anomaly detection algorithms mainly perform coarse-grained detection at the session level and cannot perform fine-grained, log-level anomaly detection. In this paper, we propose a log-level anomaly detection algorithm based on the attention mechanism. First, we use a template-based method to convert raw logs into events (i.e., to identify the event type of each log), and split the event sequence into samples with a sliding window. Then, the split event sequences (i.e., patterns) are fed into an attention-based generative adversarial network. The generator produces the distribution of the normal events expected to follow a pattern, and the discriminator tries to distinguish whether a distribution of upcoming events was produced by the generator or extracted from the real dataset; through this adversarial game, the two networks improve each other. Finally, we compare the normal events generated by the generator with the events that actually occur to determine whether a log event is abnormal. Experiments on the real-world open-source dataset BGL show that the proposed method outperforms the baseline approaches, improving accuracy by 15% over traditional algorithms.

Key words: anomaly detection, attention mechanism, Long Short-Term Memory (LSTM), Generative Adversarial Network (GAN)
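The adversarial game described in the abstract can be written, as a hedged sketch, in the standard conditional-GAN form. The abstract does not state the training objective, so the notation is an assumption: $x$ is a windowed event pattern, $y$ is the real distribution (for example, one-hot) of the upcoming event, $G(x)$ is the generated distribution, and $D(x,\cdot)$ is the discriminator's probability that a distribution is real.

```latex
\min_{G}\max_{D}\;
  \mathbb{E}_{(x,y)\sim p_{\mathrm{data}}}\!\left[\log D(x,y)\right]
  + \mathbb{E}_{x\sim p_{\mathrm{data}}}\!\left[\log\bigl(1 - D(x, G(x))\bigr)\right]
```

Under this objective, $D$ is trained to push $D(x,y)$ toward 1 and $D(x,G(x))$ toward 0, while $G$ is trained to make $D(x,G(x))$ large; this is the mutual improvement the abstract refers to. The paper's actual loss may differ.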

Chinese Library Classification (CLC) number: TP183

Fig. 1 Example of the log format

Fig. 2 Framework of the algorithm

Fig. 3 Log parsing process
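The abstract's first step maps each raw log message to an event type with a template-based method (Fig. 3). The specific parser is not given here, so the sketch below swaps in a deliberately simple heuristic: any token containing a digit is treated as a variable and replaced by `<*>`, and each distinct template receives a sequential ID. The names `is_variable`, `to_template` and `TemplateIndex`, the `E1, E2, ...` ID scheme, and the sample messages are all hypothetical.

```python
from typing import Dict, List


def is_variable(token: str) -> bool:
    """Hypothetical heuristic: a token that contains a digit is a variable parameter."""
    return any(ch.isdigit() for ch in token)


def to_template(content: str) -> str:
    """Replace variable tokens in a log message body with the wildcard <*>."""
    return " ".join("<*>" if is_variable(tok) else tok for tok in content.split())


class TemplateIndex:
    """Assigns a stable event ID (E1, E2, ...) to every distinct template."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def event_id(self, content: str) -> str:
        template = to_template(content)
        if template not in self._ids:
            self._ids[template] = f"E{len(self._ids) + 1}"
        return self._ids[template]

    def templates(self) -> Dict[str, str]:
        return dict(self._ids)


if __name__ == "__main__":
    # Illustrative message bodies; the last two differ only in a parameter.
    messages: List[str] = [
        "instruction cache parity error corrected",
        "generating core.2275",
        "generating core.862",
    ]
    index = TemplateIndex()
    print([index.event_id(m) for m in messages])  # ['E1', 'E2', 'E2']
    print(index.templates())
```

Real log parsers use more robust rules than "contains a digit"; the point here is only that messages differing in parameters collapse to one event type.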

Fig. 4 Structure of the generator (discriminator)

Fig. 5 Attention mechanism
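Fig. 5 shows the attention mechanism, but its exact scoring function is not given in this excerpt. A common choice, assumed here, is scaled dot-product attention (Vaswani et al., "Attention is all you need"): score each hidden state in the window against a query, normalize the scores with softmax, and take the weighted sum of the states as a context vector. Using the last hidden state as the query, the helper name `attend`, and the toy 2-dimensional states are all assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, states: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention over the hidden states of one window.

    Returns (weights, context): weights[t] is the share of attention given to
    states[t], and context is the weighted sum of the states.
    """
    d = len(query)
    scores = [sum(q * h for q, h in zip(query, state)) / math.sqrt(d) for state in states]
    weights = softmax(scores)
    context = [sum(w * state[k] for w, state in zip(weights, states)) for k in range(d)]
    return weights, context


if __name__ == "__main__":
    # Hypothetical 2-dimensional hidden states for a window of three events.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, context = attend(query=states[-1], states=states)
    print([round(w, 3) for w in weights], [round(v, 3) for v in context])
```

The weights always sum to 1, so the context vector stays on the scale of the hidden states regardless of window length.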

Fig. 6 Effect of the window size on the experimental results
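Fig. 6 varies the window size used in the sliding-window split described in the abstract. As a minimal sketch, assuming a stride of 1 and that each window (pattern) is paired with the single event that immediately follows it, the split looks like this; `sliding_window` and the event IDs are hypothetical.

```python
from typing import List, Tuple


def sliding_window(events: List[str], window: int) -> List[Tuple[List[str], str]]:
    """Split an event-ID sequence into (pattern, next_event) samples.

    Each sample pairs `window` consecutive events (the pattern) with the event
    that follows it, i.e. the event whose distribution the generator predicts.
    Stride 1 is an assumption.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    samples = []
    for start in range(len(events) - window):
        pattern = events[start:start + window]
        samples.append((pattern, events[start + window]))
    return samples


if __name__ == "__main__":
    sequence = ["E1", "E2", "E3", "E1", "E4"]
    for pattern, upcoming in sliding_window(sequence, window=3):
        print(pattern, "->", upcoming)
```

A larger window gives the model more context per sample but yields fewer samples from the same sequence.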

Fig. 7 Effect of the threshold setting on the experimental results
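The abstract decides whether a log event is abnormal by comparing the generator's predicted upcoming events with the event that actually occurs, and Fig. 7 studies a threshold. One plausible reading, assumed here and not confirmed by the source, is a probability cut-off: the observed event counts as normal if the generator assigns it at least `threshold` probability. The function `is_anomalous` and the probabilities below are hypothetical.

```python
from typing import Dict


def is_anomalous(predicted: Dict[str, float], actual_event: str, threshold: float) -> bool:
    """Hypothetical rule: flag the observed event as anomalous if the
    generator's predicted probability for it is below `threshold`.
    Events the generator never predicted are treated as probability 0.0."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return predicted.get(actual_event, 0.0) < threshold


if __name__ == "__main__":
    # Hypothetical generator output for one window: event ID -> probability.
    predicted = {"E1": 0.70, "E4": 0.25, "E7": 0.05}
    for event in ("E1", "E7", "E9"):
        print(event, is_anomalous(predicted, event, threshold=0.1))
```

Raising the threshold flags more events as anomalous, which tends to raise recall at the cost of precision; this is why the threshold setting matters.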

Fig. 8 Effect of the number of LSTM layers on the experimental results
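The keywords and Fig. 8 indicate that the networks are built on LSTM layers whose number is a tunable setting. The sketch below shows what "number of layers" means in a stacked LSTM: each layer's hidden states become the next layer's inputs. It is untrained, uses small random weights and one-hot inputs over a hypothetical vocabulary of four event types, and the helpers `LSTMCell` and `stacked_lstm` are illustrative, not the paper's network.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def _matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """A single LSTM cell with small random weights (illustration only, untrained)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random) -> None:
        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: W acts on the input, U on the previous hidden state.
        self.params = {
            gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in self.GATES
        }

    def _preact(self, gate: str, x: Vector, h: Vector) -> Vector:
        w, u, b = self.params[gate]
        return [wx + uh + bias for wx, uh, bias in zip(_matvec(w, x), _matvec(u, h), b)]

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        i = [_sigmoid(z) for z in self._preact("input", x, h)]
        f = [_sigmoid(z) for z in self._preact("forget", x, h)]
        o = [_sigmoid(z) for z in self._preact("output", x, h)]
        g = [math.tanh(z) for z in self._preact("candidate", x, h)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]  # new cell state
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]            # new hidden state
        return h_new, c_new


def stacked_lstm(inputs: List[Vector], num_layers: int, hidden_size: int, seed: int = 0) -> List[Vector]:
    """Run a sequence through `num_layers` stacked LSTM layers; return the top layer's hidden states."""
    if num_layers < 1 or not inputs:
        raise ValueError("need at least one layer and one input vector")
    rng = random.Random(seed)
    sequence = inputs
    for _ in range(num_layers):
        cell = LSTMCell(len(sequence[0]), hidden_size, rng)
        h, c = [0.0] * hidden_size, [0.0] * hidden_size
        outputs = []
        for x in sequence:
            h, c = cell.step(x, h, c)
            outputs.append(h)
        sequence = outputs  # this layer's hidden states are the next layer's inputs
    return sequence


if __name__ == "__main__":
    # A window of three events, one-hot encoded over four hypothetical event types.
    window = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    for layers in (1, 2, 3):
        top = stacked_lstm(window, num_layers=layers, hidden_size=5)
        print(layers, [round(v, 4) for v in top[-1]])
```

Deeper stacks add capacity but also parameters to train, which is the trade-off a layer-count experiment probes.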

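Table 1 Comparison of the experimental results of DeepLog, LogGAN and AttLog (obs. = observed, unobs. = unobserved)

| Metric | Window size | DeepLog (obs.) | DeepLog (unobs.) | DeepLog+PEM (obs.) | DeepLog+PEM (unobs.) | LogGAN-PEM (obs.) | LogGAN-PEM (unobs.) | LogGAN (obs.) | LogGAN (unobs.) | AttLog-PEM (obs.) | AttLog-PEM (unobs.) | AttLog (obs.) | AttLog (unobs.) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Precision | 2 | 0.089 | 0.255 | 0.135 | 0.260 | 0.923 | 0.295 | 0.986 | 0.414 | 0.796 | 0.267 | 0.999 | 0.248 |
| Precision | 3 | 0.180 | 0.266 | 0.196 | 0.267 | 0.983 | 0.300 | 0.989 | 0.300 | 0.817 | 0.334 | 0.941 | 0.347 |
| Precision | 4 | 0.109 | 0.258 | 0.068 | 0.263 | 0.668 | 0.449 | 0.948 | 0.460 | 0.810 | 0.272 | 0.862 | 0.270 |
| Recall | 2 | 0.999 | 1.000 | 0.997 | 0.999 | 0.986 | 0.950 | 0.982 | 0.986 | 1.000 | 1.000 | 0.999 | 0.999 |
| Recall | 3 | 0.975 | 0.995 | 0.998 | 1.000 | 0.987 | 0.986 | 0.987 | 0.957 | 1.000 | 1.000 | 1.000 | 0.998 |
| Recall | 4 | 1.000 | 1.000 | 0.317 | 0.994 | 0.992 | 0.969 | 0.992 | 0.969 | 1.000 | 0.999 | 1.000 | 0.999 |
| F1-score | 2 | 0.162 | 0.406 | 0.233 | 0.413 | 0.953 | 0.450 | 0.984 | 0.583 | 0.886 | 0.421 | 0.999 | 0.432 |
| F1-score | 3 | 0.304 | 0.491 | 0.382 | 0.413 | 0.953 | 0.450 | 0.988 | 0.457 | 0.899 | 0.501 | 0.970 | 0.467 |
| F1-score | 4 | 0.196 | 0.411 | 0.112 | 0.416 | 0.785 | 0.614 | 0.969 | 0.624 | 0.895 | 0.428 | 0.911 | 0.426 |
| TPN | 2 | 1.000 | 0.994 | 1.000 | 0.992 | 0.999 | 0.923 | 0.999 | 0.991 | 1.000 | 0.999 | 0.999 | 0.998 |
| TPN | 3 | 0.997 | 0.976 | 1.000 | 0.995 | 0.999 | 0.979 | 0.989 | 0.944 | 1.000 | 1.000 | 1.000 | 0.846 |
| TPN | 4 | 1.000 | 1.000 | 0.926 | 0.968 | 0.999 | 0.984 | 0.999 | 0.985 | 1.000 | 0.999 | 1.000 | 0.998 |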
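The F1-score in Table 1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R). The short check below recomputes two window-size-2 cells from the tabulated precision and recall; `f1_score` is a helper written for this illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Window size 2, precision and recall taken from Table 1.
    print(round(f1_score(0.255, 1.000), 3))  # DeepLog, unobserved: 0.406
    print(round(f1_score(0.923, 0.986), 3))  # LogGAN-PEM, observed: 0.953
```

Because the harmonic mean is dominated by the smaller value, a method with near-perfect recall but low precision (as in the DeepLog columns) still receives a low F1-score.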

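Table 2 Effect of different technique combinations (OM, PEM, NS) on the experimental results

| OM | PEM | NS | Precision | Recall | F1-score | TPN |
|---|---|---|---|---|---|---|
|  |  |  | 0.404 | 0.998 | 0.575 | 0.999 |
|  |  |  | 0.283 | 0.568 | 0.380 | 0.912 |
|  |  |  | 0.439 | 0.799 | 0.581 | 0.961 |
|  |  |  | 0.457 | 0.369 | 0.446 | 0.899 |
|  |  |  | 0.334 | 1.0 | 0.501 | 1.0 |
|  |  |  | 0.482 | 0.999 | 0.498 | 0.999 |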