Journal of Nanjing University (Natural Science), 2019, Vol. 55, Issue (1): 29-40. doi: 10.13232/j.cnki.jnju.2019.01.003
Qin Ya (秦娅)1,2, Shen Guowei (申国伟)1,2*, Zhao Wenbo (赵文波)1, Chen Yanping (陈艳平)1,2
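Abstract: Cybersecurity threat intelligence analysis based on security knowledge graphs can analyze multi-source threat intelligence data at a fine granularity, and has therefore attracted wide attention. Traditional named entity recognition methods struggle to recognize new or mixed Chinese-English security entities in the cybersecurity domain, and the features they extract are insufficient, so cybersecurity entities are hard to recognize accurately. Building on deep neural network models, this paper proposes a CNN-BiLSTM-CRF cybersecurity entity recognition method combined with feature templates: hand-crafted feature templates extract local context features, and the neural network model then automatically extracts character features and global text features. Experimental results show that, on a large-scale cybersecurity dataset, the proposed method outperforms other algorithms on the relevant evaluation metrics, reaching an F-score of 86%.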
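The abstract does not list the feature templates themselves, so the following is only a minimal sketch of what a hand-crafted local-context template could look like for mixed Chinese-English text. It assumes a character-level window of two positions on each side, unigram and adjacent-bigram templates, and a coarse character-type feature; the names `char_type`, `template_features`, and `PAD`, and the window size, are hypothetical and not taken from the paper.

```python
PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def char_type(ch):
    """Return a coarse character class (assumed feature, not from the paper)."""
    if "\u4e00" <= ch <= "\u9fff":       # CJK Unified Ideographs block
        return "HAN"
    if ch.isascii() and ch.isalpha():    # ASCII Latin letters
        return "LATIN"
    if ch.isdigit():
        return "DIGIT"
    return "OTHER"


def template_features(chars, i, window=2):
    """Build local-context features for position i from a +/-window template."""

    def at(j):
        # Character at position j, or PAD when j falls outside the sentence.
        return chars[j] if 0 <= j < len(chars) else PAD

    feats = {}
    # Unigram templates: the character and its type at each offset.
    for off in range(-window, window + 1):
        c = at(i + off)
        feats[f"C[{off}]"] = c
        feats[f"T[{off}]"] = PAD if c == PAD else char_type(c)
    # Bigram templates: each pair of adjacent characters inside the window.
    for off in range(-window, window):
        feats[f"C[{off}]C[{off + 1}]"] = at(i + off) + "|" + at(i + off + 1)
    return feats


# Usage: one feature dict per character of a mixed Chinese-English sentence.
sentence = list("攻击者利用CVE-2017-0144漏洞")
features = [template_features(sentence, i) for i in range(len(sentence))]
```

In a pipeline like the one the abstract describes, such per-character feature dicts would be encoded and combined with the character and sentence-level representations learned by the CNN and BiLSTM layers before CRF decoding; how the paper actually combines them is not stated here.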