南京大学学报(自然科学版) ›› 2012, Vol. 48 ›› Issue (4): 383–389.


 领域本体概念实例、属性和属性值的抽取及关系预测*

 郭剑毅1,2**,李真1,2,余正涛1,2,张志坤1,2
  

  • Publication date: 2015-06-19 Release date: 2015-06-19
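  • About the authors: (1. School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500;
    2. Key Laboratory of Intelligent Information Processing, Kunming University of Science and Technology, Kunming, 650500)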
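  • Funding:
     National Natural Science Foundation of China (60863011), Natural Science Foundation of Yunnan Province (2008CC023), Yunnan Province Reserve Talent Project for Young and Middle-aged Academic and Technical Leaders
    (2007PY01-11), Yunnan Provincial Department of Education Fund (07Z11139)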

 Extraction and relation prediction of domain ontology concept instances,
attributes and attribute values

 Guo Jian-Yi1,2, Li Zhen1,2, Yu Zheng-Tao1,2, Zhang Zhi-Kun1,2
  

  • Online:2015-06-19 Published:2015-06-19
  • About author: (1. The School of Information Engineering and Automation, Kunming University of Science and Technology,
    Kunming, 650500, China; 2. Key Laboratory of Intelligent Information Processing,
    Kunming University of Science and Technology, Kunming, 650500, China)

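Abstract (translated from the Chinese):  This paper studies how to use a collaborative classifier, which combines conditional random fields (CRFs) and a support vector machine (SVM), to extract domain concept instances, attributes and attribute values and to predict the correspondence among the three. First, concept instances, attributes and attribute values are treated as three classes of entities, so that extracting them becomes a named entity recognition problem, which is modeled with conditional random fields. On this basis, the correspondence between entities is defined and then predicted for concept instances, attributes and attribute values: a vector in which the concept instance, attribute and attribute value are related is labeled 1, and otherwise 0, and a support vector machine is trained to predict the relation. Six groups of related experiments were carried out on concept instances, attributes and attribute values of Yunnan tourist attractions. The experiments show that, in the open test, the collaborative classifier reaches a precision of 84.4%, a recall of 82.7% and an F-score of 83.6%, an F-score 20 percentage points higher than that of word co-occurrence.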

Abstract:  This paper studies how to use a collaborative classifier, combining Conditional Random Fields (CRFs) and a
Support Vector Machine (SVM), to extract ontology concept instances, attributes and attribute values and to predict
the relations among them. First, concept instances, attributes and attribute values are treated as three types of
entities, so that extracting them becomes a named entity recognition problem, and a CRF model is used to recognize
the entities. Next, the relations among concept instances, attributes and attribute values are defined and, once the
entities have been identified, predicted: a vector is marked 1 if a relation holds among the concept instance,
attribute and attribute value, and 0 otherwise, and an SVM model is used to predict the corresponding relation.
Six groups of experiments were carried out on concept instances, attributes and attribute values of Yunnan tourist
attractions. In the open test, the collaborative classifier achieves a precision of 84.4%, a recall of 82.7% and an
F-score of 83.6%; compared with the word co-occurrence model, its F-score is higher by 20 percentage points.
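
The relation-prediction step described in the abstract can be illustrated with a minimal sketch. It assumes the entities have already been recognized, enumerates every (concept instance, attribute, attribute value) combination as a candidate, labels each candidate 1 or 0, and scores a set of predictions with precision, recall and F-score. The function names, the toy entities and the hand-written predictions are all hypothetical; the sketch stands in for neither the CRF recognizer nor the SVM classifier used in the paper.

```python
from itertools import product


def candidate_triples(instances, attributes, values):
    """Enumerate every (instance, attribute, value) combination (hypothetical helper)."""
    return list(product(instances, attributes, values))


def label_triples(candidates, related):
    """Label a candidate 1 if it is a known related triple, otherwise 0."""
    return [1 if triple in related else 0 for triple in candidates]


def precision_recall_f(predicted, actual):
    """Compute precision, recall and F-score for binary labels (1 = related)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy data, invented for illustration only (not taken from the paper's corpus).
instances = ["Stone Forest", "Erhai Lake"]
attributes = ["location"]
values = ["Shilin", "Dali"]
related = {
    ("Stone Forest", "location", "Shilin"),
    ("Erhai Lake", "location", "Dali"),
}

candidates = candidate_triples(instances, attributes, values)
gold_labels = label_triples(candidates, related)

# Hand-written predictions standing in for a trained classifier's output.
predicted_labels = [1, 1, 0, 1]
precision, recall, f_score = precision_recall_f(predicted_labels, gold_labels)
print(gold_labels, round(precision, 3), round(recall, 3), round(f_score, 3))
```

In a full pipeline, each candidate triple would be turned into a feature vector and the 0/1 labels would serve as training targets for the relation classifier; the scoring function is the standard definition of precision, recall and their harmonic mean.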




