Journal of Nanjing University (Natural Sciences) ›› 2017, Vol. 53 ›› Issue (4): 738–.


 The affiliation relations extraction between entities in sentences of Thai language based on maximum entropy model

 Wang Hongbin, Li Jinhui, Shen Qiang*, Xian Yantuan, Mao Cunli

  • Online: 2017-08-02 Published: 2017-08-02
  • Author affiliation: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650504, China
  • Funding: National Natural Science Foundation of China (61462054, 61363044); General Program of the Yunnan Provincial Department of Science and Technology (2015FB135); Key Project of the Scientific Research Fund of the Yunnan Provincial Department of Education (2015Z022); Kunming University of Science and Technology provincial talent training project (KKSY201403028)
  • Received: 2017-06-23
  • *Corresponding author, E-mail: shen275171387@163.com

Abstract:  This paper studies the extraction of sentence-level affiliation relations between entities in Thai, using an approach based on the maximum entropy model. Because annotated corpora for Thai entity relation extraction are scarce, Chinese-Thai parallel sentence pairs are first used as an intermediate bridge: relatively mature Chinese results in word segmentation, POS tagging, and entity recognition are mapped, with the help of a Chinese-Thai bilingual dictionary, onto the Thai sentences aligned with the Chinese sentences. The Thai sentences then undergo the necessary data processing, followed by a certain amount of manual correction and manual entity relation annotation, which yields a basic training corpus for Thai entity relation extraction. On the basis of this corpus, entity relation extraction is treated as a classification problem. Taking the characteristics of the Thai language into account, suitable context feature templates are selected, and a maximum entropy model is trained on the corpus to build a classifier that recognizes candidate entity relation triples in Thai sentences, thereby extracting affiliation relations between entities automatically. Experiments show that the approach improves the F-measure by about 8% compared with existing methods for Thai entity relation extraction.
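The classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual feature templates, label set, or toolkit, so everything below is an assumption for illustration only: the feature names, the `PER`/`ORG`/`O` tags, the `AFFILIATION`/`NONE` labels, and the romanized placeholder tokens (which stand in for segmented Thai words) are all hypothetical. The model is a plain maximum entropy (multinomial logistic) classifier trained by stochastic gradient ascent with L2 regularization.

```python
import math
from collections import defaultdict


def extract_features(tokens, tags, i, j):
    """Hypothetical context features for a candidate entity pair (i, j)."""
    feats = {
        "e1_type=" + tags[i]: 1.0,
        "e2_type=" + tags[j]: 1.0,
        "type_pair=" + tags[i] + "_" + tags[j]: 1.0,
        "distance=" + str(min(abs(j - i), 5)): 1.0,
    }
    lo, hi = min(i, j), max(i, j)
    for word in tokens[lo + 1:hi]:  # words between the two entities
        feats["between=" + word] = 1.0
    return feats


class MaxEnt:
    """Maximum entropy classifier: p(y|x) is proportional to exp(sum_f w[f, y] * x[f])."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def probs(self, feats):
        scores = {y: sum(self.w[(f, y)] * v for f, v in feats.items())
                  for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, data, epochs=300, lr=0.1, l2=0.01):
        for _ in range(epochs):
            for feats, gold in data:
                p = self.probs(feats)
                for y in self.labels:
                    # gradient of the log-likelihood: observed minus expected
                    err = (1.0 if y == gold else 0.0) - p[y]
                    for f, v in feats.items():
                        self.w[(f, y)] += lr * (err * v - l2 * self.w[(f, y)])

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)


# Toy training triples (entity 1, entity 2, relation); placeholder tokens only.
train = [
    (["Somchai", "works_at", "BankX"], ["PER", "O", "ORG"], 0, 2, "AFFILIATION"),
    (["Somchai", "visited", "BankX"], ["PER", "O", "ORG"], 0, 2, "NONE"),
    (["Malee", "works_at", "UniY"], ["PER", "O", "ORG"], 0, 2, "AFFILIATION"),
    (["Malee", "saw", "Somchai"], ["PER", "O", "PER"], 0, 2, "NONE"),
]
data = [(extract_features(t, g, i, j), y) for t, g, i, j, y in train]

model = MaxEnt(["AFFILIATION", "NONE"])
model.fit(data)

candidate = extract_features(["Anan", "works_at", "ShopZ"], ["PER", "O", "ORG"], 0, 2)
print(model.predict(candidate), model.probs(candidate))
```

In this sketch each candidate triple is reduced to a sparse feature dictionary before classification, so richer Thai-specific feature templates could be swapped in by changing only `extract_features`.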
