A study of relation extraction of undefined relation type
based on semi-supervised learning framework

 Cheng Xian-Yi1, Zhu Qian2

Journal of Nanjing University(Natural Sciences) ›› 2012, Vol. 48 ›› Issue (4) : 466-474.


Abstract

 This study aims to design a relation extraction system for undefined relation types. Without a specific domain and machine-readable knowledge as a guide, however, it is difficult to achieve the expected precision and recall in relation extraction from natural language texts. This paper describes a framework for extracting entity-attribute-value relationships based on semi-supervised machine learning. In the semi-supervised learning task, seeds are obtained from Wikipedia information tables. We first identify some strong counter-examples with a linear classifier, then restrain the classifier with the existing counter-example data, and finally find more counter-examples in the remaining unannotated data. After semi-supervised learning, we obtain a set of candidate relationship instances. We then discuss the verification of relationship categories. For noisy patterns, we propose a criterion for evaluating the confidence level of a relationship pattern; if patterns conflict, an algorithm that controls the matching order is applied (i.e., the principle that high-confidence patterns are matched first). After these two algorithms, the relation types may still be diverse, so an agglomerative hierarchical clustering algorithm is presented, which represents Wikipedia as a vector and gives a method for computing relation similarity and completing relation type clustering. Experiments are conducted on the Wikipedia XML data set, and the results show that, based on Wikipedia, we can dynamically determine relation types, reduce the dependence on predefined types, and improve the portability of the relation recognition system.
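
The principle of high-confidence priority matching mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the confidence formula, so the ratio of correct matches to all matches used here is an assumption, and the names `Pattern`, `extract`, and the two example patterns are hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Pattern:
    """A hypothetical extraction pattern with match counts against seed data."""
    relation: str
    regex: str
    positive: int  # matches that agree with seed instances
    negative: int  # matches that contradict seed instances

    @property
    def confidence(self) -> float:
        # Assumed confidence measure: share of matches that were correct.
        total = self.positive + self.negative
        return self.positive / total if total else 0.0


def extract(sentence, patterns):
    """Try patterns in descending confidence order; return the first match."""
    for p in sorted(patterns, key=lambda p: p.confidence, reverse=True):
        m = re.search(p.regex, sentence)
        if m:
            return (m.group("entity"), p.relation, m.group("value"), p.confidence)
    return None


# Two conflicting patterns that both match the same sentence.
patterns = [
    Pattern("residence", r"(?P<entity>[A-Z]\w+) was .* in (?P<value>[A-Z]\w+)", 12, 18),
    Pattern("birthplace", r"(?P<entity>[A-Z]\w+) was born in (?P<value>[A-Z]\w+)", 45, 5),
]

result = extract("Einstein was born in Ulm", patterns)
print(result)
```

Both patterns match the example sentence, and the conflict is resolved in favor of the pattern with the higher assumed confidence, regardless of the order in which the patterns are listed.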

Cite this article

Cheng Xian-Yi, Zhu Qian. A study of relation extraction of undefined relation type based on semi-supervised learning framework[J]. Journal of Nanjing University(Natural Sciences), 2012, 48(4): 466-474



