Journal of Nanjing University (Natural Science), 2020, Vol. 56, Issue (1): 125–131. doi: 10.13232/j.cnki.jnju.2020.01.014

Vietnamese news event detection based on converge dependent information and convolutional neural networks

Jidi Wang1,2, Junjun Guo1,2, Yuxin Huang1,2, Shengxiang Gao1,2, Zhengtao Yu1,2, Yafei Zhang1,2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China
  2. Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, 650500, China
  • Received: 2019-08-20 Online: 2020-01-30 Published: 2020-01-10
  • Contact: Shengxiang Gao E-mail: gaoshengxiang.yn@foxmail.com
  • Funding:
    National Natural Science Foundation of China (61762056); National Key Research and Development Program of China (2018YFC0830105); Yunnan Province High-Tech Program Special Project (201606); Natural Science Foundation of Yunnan Province (2018FB104)

Abstract:

News event detection automatically identifies the events mentioned in news text. Existing approaches require considerable manual effort to design templates, struggle to capture the implicit semantic information in a sentence, and often run into ambiguity when identifying trigger words. To address these problems, we use a convolutional neural network that incorporates dependency parsing information, Dependency Parsing Convolutional Neural Networks (DPCNN), to detect sentence-level Vietnamese news events. During encoding, the model combines word semantics, position information, part-of-speech information, and named-entity information. A conventional convolution encodes features between consecutive words, while a convolution that incorporates dependency parsing information encodes features between non-consecutive words. The two sets of features are then fused into the event encoding used for event detection. Experimental results show that the method achieves good results in detecting Vietnamese news events about leaders' travel, with DPCNN reaching an F value of 71.45%, above the RNN, CNN, and GCNs baselines (Table 5).

Key words: news event detection, dependency parsing information, convolutional neural network, Vietnamese
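The encoding pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the random filter weights, the tanh activation, the max pooling, and the choice to build each dependency window from a word and its nearest ancestors in the parse tree are all assumptions made for the example. It only shows how a convolution over consecutive words and a convolution over dependency-linked (possibly non-consecutive) words can each be pooled and then concatenated into one event encoding.

```python
import math
import random


def concat(vectors):
    """Flatten a sequence of vectors into one vector."""
    return [x for v in vectors for x in v]


def conv_max_pool(windows, filters):
    """Apply every filter to every window, then max-pool over windows."""
    pooled = []
    for weights in filters:
        scores = [
            math.tanh(sum(w * x for w, x in zip(weights, concat(win))))
            for win in windows
        ]
        pooled.append(max(scores))
    return pooled


def sequential_windows(tokens, k):
    """Windows of k consecutive tokens (assumes len(tokens) >= k)."""
    return [tokens[i:i + k] for i in range(len(tokens) - k + 1)]


def dependency_windows(tokens, heads, k):
    """Hypothetical dependency window: a token plus its k-1 nearest ancestors.

    heads[i] is the index of token i's head, or -1 for the root.
    Missing ancestors are zero-padded.
    """
    pad = [0.0] * len(tokens[0])
    windows = []
    for i in range(len(tokens)):
        win, node = [], i
        for _ in range(k):
            win.append(tokens[node] if node >= 0 else pad)
            node = heads[node] if node >= 0 else -1
        windows.append(win)
    return windows


def random_filters(n, size, rng):
    """Untrained stand-in filters; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(size)] for _ in range(n)]


def encode_event(word_vecs, position_vecs, pos_tag_vecs, entity_vecs,
                 heads, k=3, n_filters=4, seed=0):
    """Fuse sequential and dependency convolution features into one vector."""
    rng = random.Random(seed)
    # Per-token input: word, position, part-of-speech and entity embeddings.
    tokens = [concat(parts) for parts in
              zip(word_vecs, position_vecs, pos_tag_vecs, entity_vecs)]
    dim = len(tokens[0])
    seq = conv_max_pool(sequential_windows(tokens, k),
                        random_filters(n_filters, k * dim, rng))
    dep = conv_max_pool(dependency_windows(tokens, heads, k),
                        random_filters(n_filters, k * dim, rng))
    return seq + dep  # concatenation of the two feature sets


# Toy five-token sentence with made-up embeddings and a made-up parse.
_rng = random.Random(1)
_emb = lambda d: [[_rng.uniform(-1, 1) for _ in range(d)] for _ in range(5)]
heads = [1, -1, 1, 4, 1]
features = encode_event(_emb(4), _emb(2), _emb(2), _emb(2), heads)
```

In a trained model the filters would be learned and the fused vector passed to a classifier over event types. Here `features` is simply an 8-dimensional vector: four pooled sequential features followed by four pooled dependency features.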

CLC number: TP391

Figure 1. Structure of the DPCNN model

Figure 2. Dependency parsing result for S1

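Table 1. Trigger word table

| Event type | Trigger words (Vietnamese) | Trigger words (English gloss of the Chinese) |
| --- | --- | --- |
| Ra ngoài truy cập (outbound visit) | Đi thăm, Ghé thăm, Truy cập, Nhiệm vụ, Kiểm tra | call on, visit, visit abroad, make the rounds, inquire after, inspect, look in on… |
| gặp lãnh đạo (high-level meeting) | Họp, Gặp gỡ, Nói chuyện, Tiếp kiến, Gặp nhau, Gặp mặt, Hội đàm | meet with, receive, meet, confer, hold talks… |
| Tham dự sự kiện (attending an event) | Tham dự, Đi ra, Tham gia, Sắp xếp, Đến rồi, T?i cuộc họp | attend, appear, take part, sit in, be present, arrive at a meeting… |
| Phát biểu (giving a speech) | Bài giảng, xuất bản, nói, trình bày, phát biểu, đề nghị, nói chuyện | lecture, publish, speak, put forward, give a speech, talk… |
| Tổng tuyển cử (leadership-transition election) | Đề nghị, bỏ phiếu, Giới thiệu, bầu cử | nominate, elect, select, vote and campaign… |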
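As a small illustration of why trigger words alone can be ambiguous, the sketch below loads the Table 1 triggers into a dictionary and lists those that appear under more than one event type. It is only a lookup over the table, not the paper's detection method. The Vietnamese diacritics are a best-effort restoration of the garbled source, the entry whose first word could not be recovered is omitted, and the names `TRIGGERS`, `build_index`, and `ambiguous_triggers` are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Trigger words from Table 1 (diacritics restored by assumption; the one
# entry with an unrecoverable first word is left out).
TRIGGERS = {
    "Ra ngoài truy cập": ["Đi thăm", "Ghé thăm", "Truy cập", "Nhiệm vụ", "Kiểm tra"],
    "Gặp lãnh đạo": ["Họp", "Gặp gỡ", "Nói chuyện", "Tiếp kiến",
                     "Gặp nhau", "Gặp mặt", "Hội đàm"],
    "Tham dự sự kiện": ["Tham dự", "Đi ra", "Tham gia", "Sắp xếp", "Đến rồi"],
    "Phát biểu": ["Bài giảng", "xuất bản", "nói", "trình bày",
                  "phát biểu", "đề nghị", "nói chuyện"],
    "Tổng tuyển cử": ["Đề nghị", "bỏ phiếu", "Giới thiệu", "bầu cử"],
}


def norm(text):
    """Lowercase and NFC-normalize so equal-looking strings compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def build_index(table):
    """Map each normalized trigger word to the set of event types listing it."""
    index = defaultdict(set)
    for event_type, words in table.items():
        for word in words:
            index[norm(word)].add(event_type)
    return index


def ambiguous_triggers(index):
    """Triggers that belong to more than one event type."""
    return {w: sorted(types) for w, types in index.items() if len(types) > 1}


INDEX = build_index(TRIGGERS)
AMBIGUOUS = ambiguous_triggers(INDEX)
```

With the table as transcribed here, two triggers are shared between event types: "nói chuyện" (high-level meeting and giving a speech) and "đề nghị" (giving a speech and election). A pure dictionary lookup cannot choose between them, which is the kind of case where sentence context is needed.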

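Table 2. Effect of the number of convolutional layers on the experimental results

| Convolutional layers | P | R | F |
| --- | --- | --- | --- |
| 1 | 74.04% | 62.63% | 70.08% |
| 2 | 76.78% | 64.25% | 71.45% |
| 3 | 75.53% | 59.01% | 68.23% |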

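Table 3. Effect of the encoded features on the experimental results

| Encoded features | P | R | F |
| --- | --- | --- | --- |
| Word, position, part-of-speech, and entity vectors | 76.78% | 64.25% | 71.45% |
| Word, part-of-speech, and entity vectors | 74.23% | 62.3% | 69.2% |
| Word, position, and entity vectors | 71.88% | 63.4% | 69.3% |
| Word, position, and part-of-speech vectors | 73.46% | 64.02% | 69.97% |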

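Table 4. Effect of the convolution kernel sizes on the experimental results

| Kernel sizes | P | R | F |
| --- | --- | --- | --- |
| 2, 3, 4 | 73.21% | 63.59% | 66.73% |
| 3, 4, 5 | 76.78% | 64.25% | 71.45% |
| 4, 5, 6 | 75.07% | 61.22% | 70.88% |
| 5, 6, 7 | 73.12% | 62.25% | 67.54% |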

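Table 5. Performance comparison of different models

| Model | P | R | F |
| --- | --- | --- | --- |
| RNN | 70.23% | 65.89% | 67.23% |
| CNN | 73.23% | 63.14% | 69.23% |
| GCNs | 75.00% | 63.92% | 70.24% |
| DPCNN | 76.78% | 64.25% | 71.45% |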