Attention based log-level anomaly detection algorithm for large-scale system logs

doi: 10.13232/j.cnki.jnju.2021.05.008

Journal of Nanjing University (Natural Sciences), 2021, Vol. 57, Issue (5): 785–792.

Xiaoyu Fang¹, Chenhan Cao¹, Bin Xia¹,²

1. School of Computer Science, Software and Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
2. Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China

Received: 2021-06-26 Online: 2021-09-29 Published: 2021-09-29
Contact: Bin Xia E-mail: bxia@njupt.edu.cn
Supported by: National Natural Science Foundation of China (61802205)

Abstract

Existing detection algorithms mainly target coarse-grained, session-level log anomaly detection and cannot perform fine-grained, log-level detection. To address this, we propose a log-level anomaly detection algorithm based on the attention mechanism. First, a template-based method extracts the event type of each raw log, and a sliding window splits the resulting event stream into log sequences. Each sequence (i.e., pattern) is then fed into an attention-based generative adversarial network: the generator produces the distribution of normal events expected to follow the sequence, and the discriminator tries to tell whether a given distribution of upcoming events was produced by the generator or drawn from the real data. The two networks improve each other through this adversarial game. Finally, a log event is judged abnormal or not by checking whether the event that actually occurs is consistent with the normal upcoming events produced by the generator. Experiments on the open-source, real-world dataset BGL show that the proposed method outperforms the baseline approaches, improving accuracy by 15% over traditional algorithms.

Key words: anomaly detection, attention mechanism, Long Short-Term Memory (LSTM), Generative Adversarial Network (GAN)
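
The windowing and decision steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `sliding_windows` and `is_anomalous`, the window size, the probability threshold, and the toy event IDs are all hypothetical, and the trained generator is replaced by a hand-written probability table.

```python
from typing import Dict, List, Sequence, Tuple


def sliding_windows(events: Sequence[int], window: int) -> List[Tuple[Tuple[int, ...], int]]:
    """Split an event-ID stream into (pattern, next_event) samples."""
    return [
        (tuple(events[i:i + window]), events[i + window])
        for i in range(len(events) - window)
    ]


def is_anomalous(predicted: Dict[int, float], actual: int, threshold: float = 0.1) -> bool:
    """Flag the actual event when the predicted normal-event distribution
    assigns it a probability below the (assumed) threshold."""
    return predicted.get(actual, 0.0) < threshold


# Toy event-ID stream; real IDs would come from template extraction.
stream = [1, 2, 3, 1, 2, 9]
samples = sliding_windows(stream, window=2)

# Stand-in for the generator: for each pattern, a distribution over
# the normal events expected to come next (values are made up).
toy_generator = {
    (1, 2): {3: 0.9, 1: 0.1},
    (2, 3): {1: 1.0},
    (3, 1): {2: 1.0},
}

# One flag per sample: True means the observed next event is not
# among the events the stand-in generator considers normal.
flags = [is_anomalous(toy_generator.get(pattern, {}), nxt) for pattern, nxt in samples]
```

In this toy stream only the last sample, pattern `(1, 2)` followed by event `9`, is flagged, because the stand-in generator gives event `9` no probability mass after that pattern.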

CLC number: TP183

Cite this article: Xiaoyu Fang, Chenhan Cao, Bin Xia. Attention based log-level anomaly detection algorithm for large-scale system logs[J]. Journal of Nanjing University (Natural Sciences), 2021, 57(5): 785–792.

Figures and tables (10): Figures 1–8, Tables 1–2.



Table 1

Metric	Window size	DeepLog (observed)	DeepLog (unobserved)	DeepLog+PEM (observed)	DeepLog+PEM (unobserved)	LogGAN-PEM (observed)	LogGAN-PEM (unobserved)	LogGAN (observed)	LogGAN (unobserved)	AttLog-PEM (observed)	AttLog-PEM (unobserved)	AttLog (observed)	AttLog (unobserved)
Precision	2	0.089	0.255	0.135	0.260	0.923	0.295	0.986	0.414	0.796	0.267	0.999	0.248
Precision	3	0.180	0.266	0.196	0.267	0.983	0.300	0.989	0.300	0.817	0.334	0.941	0.347
Precision	4	0.109	0.258	0.068	0.263	0.668	0.449	0.948	0.460	0.810	0.272	0.862	0.270
Recall	2	0.999	1.000	0.997	0.999	0.986	0.950	0.982	0.986	1.000	1.000	0.999	0.999
Recall	3	0.975	0.995	0.998	1.000	0.987	0.986	0.987	0.957	1.000	1.000	1.000	0.998
Recall	4	1.000	1.000	0.317	0.994	0.992	0.969	0.992	0.969	1.000	0.999	1.000	0.999
F1-score	2	0.162	0.406	0.233	0.413	0.953	0.450	0.984	0.583	0.886	0.421	0.999	0.432
F1-score	3	0.304	0.491	0.382	0.413	0.953	0.450	0.988	0.457	0.899	0.501	0.970	0.467
F1-score	4	0.196	0.411	0.112	0.416	0.785	0.614	0.969	0.624	0.895	0.428	0.911	0.426
TPN	2	1.000	0.994	1.000	0.992	0.999	0.923	0.999	0.991	1.000	0.999	0.999	0.998
TPN	3	0.997	0.976	1.000	0.995	0.999	0.979	0.989	0.944	1.000	1.000	1.000	0.846
TPN	4	1.000	1.000	0.926	0.968	0.999	0.984	0.999	0.985	1.000	0.999	1.000	0.998
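
Assuming the standard definition, the F1-score is the harmonic mean of precision and recall; the page itself does not state the formula. The sketch below recomputes a single cell of the table above (LogGAN, window size 2, observed) from the precision and recall reported in the same column. The helper name `f1_score` is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LogGAN, window size 2, observed: Precision 0.986, Recall 0.982.
value = round(f1_score(0.986, 0.982), 3)
```

For this cell the result rounds to 0.984, the F1-score the table reports.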

Table 2

Techniques: OM, PEM, NS (√ = used, - = not used)

OM	PEM	NS	Precision	Recall	F1-score	TPN
-	-	-	0.404	0.998	0.575	0.999
√	-	-	0.283	0.568	0.380	0.912
-	√	-	0.439	0.799	0.581	0.961
√	√	-	0.457	0.369	0.446	0.899
√	-	√	0.334	1.0	0.501	1.0
√	√	√	0.482	0.999	0.498	0.999

