Research on the method of network security entity recognition based on deep neural network

Qin Ya^1,2, Shen Guowei^1,2*, Zhao Wenbo^1, Chen Yanping^1,2

1. College of Computer Science and Technology, Guizhou University, Guiyang, 550025, China; 2. Guizhou Provincial Key Laboratory of Public Big Data, Guiyang, 550025, China

Journal of Nanjing University (Natural Sciences), 2019, Vol. 55, Issue (1): 29–40. doi: 10.13232/j.cnki.jnju.2019.01.003

Accepted: 2018-09-01 Online: 2019-02-01 Published: 2019-01-26
Contact: Shen Guowei, E-mail: gwshen@gzu.edu.cn
Funding: National Natural Science Foundation of China (61802081), Natural Science Foundation of Guizhou Province (20161052), Open Project of the Guizhou Provincial Key Laboratory of Public Big Data (2017BDKFJJ024), Guizhou University Doctoral Fund (201526)

Abstract

Abstract: Network security threat intelligence analysis based on a security knowledge graph (SKG) can analyze multi-source threat intelligence data in a fine-grained manner and has therefore received extensive attention. Traditional named entity recognition (NER) methods struggle to identify security entities that are new or that mix Chinese and English, and the features they extract are insufficient, so they cannot recognize network security entities accurately. Building on a deep neural network model, we propose a CNN-BiLSTM-CRF network security entity recognition method combined with a feature template (FT-CNN-BiLSTM-CRF). The hand-crafted feature template extracts local context features, and the neural network model automatically extracts character features and global text features. First, each character of the input sequence is converted into a corresponding character vector, and a convolutional neural network (CNN) extracts character-level features. Second, the character-level feature vectors are fed into a BiLSTM (bidirectional long short-term memory) network together with the local context vectors extracted by the feature template, and the BiLSTM automatically extracts the global features of the security entities. Finally, a CRF (conditional random field) layer labels the network security entities to produce the recognition result. Experimental results show that the proposed method reaches an F-score of 86% on a large-scale network security dataset and outperforms the other methods compared.

Key words: network security entity recognition, feature template, CNN, BiLSTM, CRF
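
The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the template (a window of two characters on each side, two bigrams, and an ASCII flag), the `PAD` symbol, the function names, and the hand-set emission and transition scores are all assumptions. In the paper's method the emission scores would come from the CNN and BiLSTM layers, which are omitted here; only the feature-template step and the CRF-style decoding step are shown.

```python
from typing import Dict, List, Sequence, Tuple

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def template_features(chars: Sequence[str], i: int, window: int = 2) -> Dict[str, str]:
    """Local context features for character i under an assumed template."""
    def at(k: int) -> str:
        return chars[k] if 0 <= k < len(chars) else PAD

    # Unigram features: the characters in a window around position i.
    feats = {f"C[{d}]": at(i + d) for d in range(-window, window + 1)}
    # Bigram features: current character joined with each neighbour.
    feats["C[-1]C[0]"] = at(i - 1) + at(i)
    feats["C[0]C[1]"] = at(i) + at(i + 1)
    # Flag that helps separate English tokens from Chinese text.
    feats["is_ascii"] = str(chars[i].isascii())
    return feats


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag path for additive (log-space) scores."""
    best = [{t: emissions[0].get(t, 0.0) for t in tags}]
    back: List[Dict[str, str]] = []
    for em in emissions[1:]:
        scores, pointers = {}, {}
        for cur in tags:
            # Best previous tag for reaching `cur` at this position.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + em.get(cur, 0.0)
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Mixed Chinese-English input, split into characters.
chars = list("利用CVE漏洞")
feats = template_features(chars, 2)  # features for the character "C"

tags = ["B", "I", "O"]
# Assumed constraint: in BIO tagging, "I" should not follow "O".
transitions = {("O", "I"): -100.0}
# Hand-set scores standing in for the network output; at "C" the locally
# best tag is "I", which the transition penalty corrects to "B".
emissions = [
    {"O": 2.0}, {"O": 2.0},
    {"I": 1.0, "B": 0.8}, {"I": 2.0}, {"I": 2.0},
    {"O": 2.0}, {"O": 2.0},
]
path = viterbi(emissions, transitions, tags)
print(path)  # ['O', 'O', 'B', 'I', 'I', 'O', 'O']
```

The example is chosen so that the character-level scores alone would start the entity with an invalid `I` tag; decoding over the whole sequence with transition scores yields a well-formed `B I I` span for "CVE", which is the role the CRF layer plays on top of the BiLSTM.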

CLC Number: TP391

Qin Ya, Shen Guowei, Zhao Wenbo, Chen Yanping. Research on the method of network security entity recognition based on deep neural network[J]. Journal of Nanjing University (Natural Sciences), 2019, 55(1): 29–40.
