Journal of Nanjing University (Natural Science), 2017, Vol. 53, Issue (2): 357.
Li Ying1,2, Guo Jianyi1,2*, Yu Zhengtao1,2*, Xian Yantuan1,2, Chen Wei1,2
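Abstract: Phrase-structure treebanks are an important resource for natural language processing research and applications. For Vietnamese, such treebank resources are currently lacking, which hinders Chinese-Vietnamese bilingual information processing. This paper proposes a method for building a Vietnamese phrase treebank that combines Vietnamese grammatical features with an improved PCFG (probabilistic context-free grammar). The method automatically derives phrase-structure trees for Vietnamese and thereby addresses the problem of constructing a Vietnamese phrase treebank automatically. First, a set of Vietnamese linguistic features is formulated by analyzing the characteristics of the language. Next, the Inside-Outside algorithm is used to obtain the grammar rule set of the PCFG model from a small number of manually annotated Vietnamese phrase trees. Finally, the grammatical feature set is integrated into the PCFG model as a supplement to the grammar rule set, and the resulting model is used to build the Vietnamese phrase treebank. Experimental results show that the new PCFG model reaches an accuracy of 81.14% on Vietnamese phrase treebank construction, 2% to 3% higher than both the conventional PCFG model and a maximum-entropy-based treebank construction method.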
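As a rough illustration of the PCFG machinery the abstract refers to, the sketch below estimates rule probabilities from a couple of hand-annotated trees and then computes the inside probability of a sentence, which is the quantity that Inside-Outside re-estimation builds on. It is not the paper's implementation: rule probabilities are estimated here by simple relative frequency rather than by Inside-Outside, the grammar is assumed to be in binary (Chomsky normal) form, and the toy trees, tags, and function names (`TREES`, `estimate_pcfg`, `inside_probability`) are hypothetical.

```python
from collections import defaultdict

# Toy annotated trees as nested tuples: (label, child, child) or (tag, word).
# The tags and example words are illustrative, not taken from the paper.
TREES = [
    ("S", ("NP", "tôi"), ("VP", ("V", "đọc"), ("NP", "sách"))),
    ("S", ("NP", "tôi"), ("VP", ("V", "mua"), ("NP", "sách"))),
]

def count_rules(tree, counts):
    """Count every rule (lhs, rhs) used in one annotated tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        counts[(label, (children[0],))] += 1  # lexical rule: tag -> word
        return
    counts[(label, tuple(child[0] for child in children))] += 1
    for child in children:
        count_rules(child, counts)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(int)
    for tree in trees:
        count_rules(tree, counts)
    totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        totals[lhs] += c
    return {rule: c / totals[rule[0]] for rule, c in counts.items()}

def inside_probability(words, pcfg, root="S"):
    """Sum the probabilities of all parses of `words` rooted at `root`."""
    n = len(words)
    chart = defaultdict(float)  # (i, j, label) -> inside prob of words[i:j]
    for i, word in enumerate(words):
        for (lhs, rhs), p in pcfg.items():
            if rhs == (word,):
                chart[(i, i + 1, lhs)] += p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for (lhs, rhs), p in pcfg.items():
                    if len(rhs) != 2:
                        continue
                    left = chart.get((i, k, rhs[0]), 0.0)
                    right = chart.get((k, j, rhs[1]), 0.0)
                    if left and right:
                        chart[(i, j, lhs)] += p * left * right
    return chart.get((0, n, root), 0.0)

pcfg = estimate_pcfg(TREES)
sentence_prob = inside_probability(["tôi", "đọc", "sách"], pcfg)
```

In this sketch, supplementing the grammar with extra feature-derived rules would amount to adding entries to the `pcfg` dictionary and renormalizing per left-hand side; how the paper actually integrates its feature set is not specified in the abstract.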