Multi-documents summarization utilizing semantics in Wikipedia*

Journal of Nanjing University (Natural Sciences), 2011, Vol. 47, Issue 4: 398–406.

Gong Shu**, Qu Youli, Tian Shengfeng

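Published: 2015-04-21  Online: 2015-04-21
Authors' affiliation: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
Funding: Key Project of Science and Technology Research of the Ministry of Education (108126); National Natural Science Foundation of China (1087109 / a0107)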

Abstract

As an important natural language processing technique, multi-document summarization can help users in the information retrieval process. Because the documents in a collection are usually gathered from different sources, a document collection contains abundant and complex semantic relations, and the widely used word-based text representation can hardly provide sufficient and accurate information for semantic analysis during summarization. We therefore use Wikipedia, which has extensive concept coverage, to extract a concept-based representation of documents. We assess the importance of concepts using both global and local information: the global relatedness of concepts is based on Wikipedia's link structure, while the local relatedness is calculated from the co-occurrence of concepts in sentences. Three wiki-based features are proposed. The first is the widely used sentence salience feature based on a Markov chain. The other two are both based on the similarity between a sentence and the first paragraphs of concept articles in Wikipedia; one uses all concepts occurring in the collection, while the other uses only the concepts contained in the sentence itself. Finally, we linearly combine these features to select important sentences, which are then concatenated to form the summary. Experiments comparing these features show that the first paragraphs of the Wikipedia articles of related concepts improve summary quality.
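The abstract does not give the formulas behind the two concept-relatedness signals. As a minimal sketch under stated assumptions, the Python below computes global relatedness with the link-based measure of Milne and Witten, which compares the sets of Wikipedia articles that link to two concepts, and local relatedness as the overlap of the sentences in which two concepts co-occur. The Jaccard form of the local measure, the mixing weight `alpha`, and `concept_importance` (a concept's summed relatedness to all other concepts) are hypothetical choices for illustration, not the authors' method.

```python
import math


def global_relatedness(inlinks_a: set, inlinks_b: set, n_articles: int) -> float:
    """Link-based relatedness in the style of Milne and Witten.

    inlinks_a, inlinks_b: ids of the Wikipedia articles that link to each concept.
    n_articles: total number of articles in the Wikipedia snapshot.
    """
    common = len(inlinks_a & inlinks_b)
    if common == 0:
        return 0.0
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    # Normalized-Google-distance form; max() guards against a zero
    # denominator when a link set covers every article.
    denominator = max(math.log(n_articles) - math.log(small), 1e-12)
    distance = (math.log(big) - math.log(common)) / denominator
    return max(0.0, 1.0 - distance)


def local_relatedness(a: str, b: str, sentence_concepts: list) -> float:
    """Hypothetical local measure: Jaccard overlap of the sets of sentences
    in the collection that mention concept a and concept b."""
    with_a = {i for i, concepts in enumerate(sentence_concepts) if a in concepts}
    with_b = {i for i, concepts in enumerate(sentence_concepts) if b in concepts}
    union = with_a | with_b
    return len(with_a & with_b) / len(union) if union else 0.0


def concept_importance(inlinks: dict, n_articles: int, sentence_concepts: list,
                       alpha: float = 0.5) -> dict:
    """Hypothetical importance score: each concept's summed relatedness to all
    other concepts, mixing global and local relatedness with weight alpha."""
    importance = {}
    for a in inlinks:
        importance[a] = sum(
            alpha * global_relatedness(inlinks[a], inlinks[b], n_articles)
            + (1 - alpha) * local_relatedness(a, b, sentence_concepts)
            for b in inlinks
            if b != a
        )
    return importance


if __name__ == "__main__":
    # Toy data: incoming-link sets per concept and the concepts found in each sentence.
    inlinks = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}, "C": {7, 8}}
    sentence_concepts = [{"A", "B"}, {"A"}, {"B", "C"}]
    scores = concept_importance(inlinks, n_articles=1000, sentence_concepts=sentence_concepts)
    print(sorted(scores, key=scores.get, reverse=True))  # prints ['B', 'A', 'C']
```

Because the global measure depends only on Wikipedia's link sets, it can be precomputed once from a dump, whereas the local measure has to be recomputed for every document collection.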
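The sentence-scoring stage described in the abstract can be sketched as follows. This Python sketch is illustrative only, not the authors' implementation: it uses whitespace-tokenized bag-of-words vectors instead of the paper's concept-based representation, takes a LexRank-style damped random walk (after Erkan and Radev) as the Markov-chain salience feature, scores the two Wikipedia features as the mean cosine similarity between a sentence and the first paragraphs of either all collection concepts or only the sentence's own concepts, and uses hypothetical equal `weights` and a fixed summary length `k`.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse bag-of-words vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def markov_salience(vectors: list, damping: float = 0.85, iters: int = 50) -> list:
    """LexRank-style salience: stationary distribution of a damped random walk
    over the sentence-similarity graph, approximated by power iteration."""
    n = len(vectors)
    if n == 0:
        return []
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    row_sums = [sum(row) for row in sim]
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                # A sentence similar to no other sentence spreads its mass uniformly.
                share = sim[i][j] / row_sums[i] if row_sums[i] else 1.0 / n
                new[j] += damping * score[i] * share
        score = new
    return score


def first_paragraph_similarity(vector: Counter, concepts, first_paragraphs: dict) -> float:
    """Mean cosine similarity between a sentence and the first paragraphs of the
    Wikipedia articles of `concepts` (averaging is an assumption)."""
    sims = [cosine(vector, first_paragraphs[c]) for c in concepts if c in first_paragraphs]
    return sum(sims) / len(sims) if sims else 0.0


def summarize(sentences: list, sentence_concepts: list, first_paragraphs: dict,
              weights=(1.0, 1.0, 1.0), k: int = 2) -> str:
    """Score sentences by a linear combination of the three features and
    concatenate the k best in their original order (weights and k are hypothetical)."""
    # Assumption: word vectors stand in for the paper's concept-based representation.
    vectors = [Counter(s.lower().split()) for s in sentences]
    all_concepts = set().union(*sentence_concepts)
    salience = markov_salience(vectors)
    scores = []
    for i, vec in enumerate(vectors):
        f_all = first_paragraph_similarity(vec, all_concepts, first_paragraphs)
        f_own = first_paragraph_similarity(vec, sentence_concepts[i], first_paragraphs)
        scores.append(weights[0] * salience[i] + weights[1] * f_all + weights[2] * f_own)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(best))


if __name__ == "__main__":
    paragraphs = {
        "Cat": Counter("the cat is a small domesticated animal".split()),
        "Stock": Counter("a stock is a share in a company".split()),
    }
    sentences = ["The cat sat.", "The cat ran away.", "Stocks fell today."]
    concepts = [{"Cat"}, {"Cat"}, {"Stock"}]
    print(summarize(sentences, concepts, paragraphs, k=2))  # prints: The cat sat. The cat ran away.
```

Setting one weight to zero removes the corresponding feature, which is a simple way to compare the features individually.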

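Gong Shu, Qu Youli, Tian Shengfeng. Multi-documents summarization utilizing semantics in Wikipedia[J]. Journal of Nanjing University (Natural Sciences), 2011, 47(4): 398–406.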

