Journal of Nanjing University (Natural Sciences) ›› 2015, Vol. 51 ›› Issue (1): 181–186.


Chinese domain entity relation extraction using a phrase tree kernel fused with domain knowledge

陈鹏1, 郭剑毅1,2, 余正涛1,2, 严馨1,2, 张志坤1,2, 高盛祥1,2

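  • Publication date: 2015-01-05 Release date: 2015-01-05
  • About the authors: (1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650504; 2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming, 650504)
  • Supported by:
    National Natural Science Foundation of China (61175068), Major Special Project of the Yunnan Provincial Department of Education Fund (KKJI201203001), Key Project of the Yunnan Provincial Department of Science and Technology (KKSD201303007)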

Chinese domain entity relation extraction based on domain knowledge phrasal tree

Chen Peng1, Guo Jianyi1,2*, Yu Zhengtao1,2, Yan Xin1,2, Zhang Zhikun1,2, Gao Shengxiang1,2
  

  • Online: 2015-01-05 Published: 2015-01-05
  • About the authors: (1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650504, China;
    2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming, 650504, China)

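Abstract (Chinese version): In kernel-based methods, the traditional phrase tree contains only general-domain information, which makes it difficult to train a relation extraction model adapted to a specific domain. To address this problem, this paper proposes a Chinese domain entity relation extraction method that incorporates a domain knowledge phrase tree. Based on the information structure of Chinese domain-specific websites on the Web, a domain knowledge tree that reflects the semantic relations between domain-specific entities is constructed and fused into the syntactic tree of each instance sentence, yielding a domain-specific entity semantic tree. A support vector machine is then trained to obtain a classification model of entity relations, which is used to extract domain-specific entity relations. Relation extraction experiments on a collected corpus of 600 tourism-domain documents show that the proposed method outperforms the method that does not incorporate domain information, with the F value improving by 3.4.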

Abstract: To solve the problem that the traditional tree kernel method cannot train a model suited to extracting entity relations in a given domain, this paper proposes a method for Chinese domain entity relation extraction based on a domain knowledge phrasal tree. Based on the features of web pages on Chinese domain-specific websites, the paper constructs a domain knowledge tree that reflects the semantic information between domain entities and fuses this information into the traditional phrasal tree. Finally, a classification model of entity relations is obtained with a support vector machine and used to extract entity relations in the given domain. In relation extraction experiments on a collected corpus of 600 tourism-domain documents, the proposed method outperforms the traditional tree method, and the F value improves by 3.4 %.
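The abstract does not give the kernel or the fusion procedure in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a convolution (subset) tree kernel in the style of Collins and Duffy, trees encoded as nested tuples, and a hypothetical `fuse` helper that attaches a domain knowledge subtree under the root of a parse tree. The labels `DK`, `ATTRACTION`, and `CITY` and the example sentences are invented for illustration.

```python
from functools import lru_cache

# A tree is a nested tuple: (label, child, child, ...); a leaf is a plain string.

def production(node):
    """Return the production at a node, keeping words and nonterminals distinct."""
    children = tuple(("w", c) if isinstance(c, str) else ("n", c[0]) for c in node[1:])
    return (node[0],) + children

def internal_nodes(tree):
    """Yield every non-leaf node of a tuple-encoded tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def tree_kernel(t1, t2, decay=0.5):
    """Subset tree kernel: decayed count of tree fragments shared by t1 and t2."""
    @lru_cache(maxsize=None)
    def delta(n1, n2):
        if production(n1) != production(n2):
            return 0.0
        score = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            # Equal productions guarantee c1 and c2 are both words or both subtrees.
            if not isinstance(c1, str):
                score *= 1.0 + delta(c1, c2)
        return score
    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

def fuse(parse_tree, domain_subtree):
    """Hypothetical fusion step: hang the domain knowledge subtree under the root."""
    return parse_tree + (domain_subtree,)

# Invented tourism-domain instances ("X is located in Kunming").
s1 = ("S", ("NP", ("NR", "Stone_Forest")),
      ("VP", ("VV", "located_in"), ("NP", ("NR", "Kunming"))))
s2 = ("S", ("NP", ("NR", "Dian_Lake")),
      ("VP", ("VV", "located_in"), ("NP", ("NR", "Kunming"))))

# Invented domain knowledge subtree giving the entity types of the pair.
domain = ("DK", ("ATTRACTION", "E1"), ("CITY", "E2"))

plain_similarity = tree_kernel(s1, s2)
fused_similarity = tree_kernel(fuse(s1, domain), fuse(s2, domain))
```

In this toy setup, two instances that share the same domain subtree have more fragments in common, so `fused_similarity` is larger than `plain_similarity`. To train a classifier, the pairwise kernel values could be collected into a Gram matrix and passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.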
