Journal of Nanjing University (Natural Sciences) ›› 2014, Vol. 50 ›› Issue (4): 526–.


A local query expansion method based on a semantic dictionary

Wu Qin (吴秦), Liang Jiuzhen (梁久祯), Bai Yuzhao (白玉昭)

  • Online: 2014-08-23 Published: 2014-08-23
  • About author: School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, China
  • Funding:
     National Natural Science Foundation of China (61202312, 61170121); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education

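Summary:  To address the problems of search engines based on keyword matching, a local query expansion method based on a semantic dictionary is proposed. Expansion words are first selected using co-occurrence analysis and semantic similarity; the original query words and the expansion words are then weighted; finally, document similarity is computed to obtain the ranked results of the expanded query. The method overcomes the problem, found in other local expansion methods, of adding a large number of irrelevant words to the query. Experiments show that the method effectively improves the precision of the query results.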

Abstract:  Most traditional search engines rely on keyword matching. Because natural language contains many synonyms and polysemous words, the results they return often differ from what the user intended, especially when the query is short. To address this problem, this paper proposes a query method that combines local query expansion with a semantic dictionary. First, an initial document set is retrieved using the original keywords, and the documents most relevant to those keywords are chosen as the source documents for selecting expansion keywords. Co-occurrence analysis is then applied to these documents, and the words with the largest weights become expansion keyword candidates. Tongyici Cilin (Extended Edition) serves as the semantic dictionary. Based on the characteristics of its encoding scheme, a new word similarity measure is defined and used to select the expansion keywords from the candidates. The original keywords and the selected expansion keywords together form the final query word set. To improve retrieval quality, each word in this set is assigned a weight based on its importance in the query and its similarity to the original keywords. The similarity between the final query word set and each initial document is computed from these weights, and the final retrieval results are ranked by this similarity. Compared with other local query expansion methods, the proposed method avoids adding unrelated words to the query. To test its effectiveness, the method is applied to food information retrieval and compared with two baselines: retrieval using only the original keywords, and retrieval using expansion words obtained by co-occurrence analysis alone. The results show that the proposed method effectively improves the precision of the retrieval results relative to both baselines.
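The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the similarity formula, the weighting scheme, or the thresholds, so everything below is an assumption made for illustration. In particular, the dictionary entries in `CILIN` are invented codes that only imitate the five-level, eight-character layout of Tongyici Cilin (Extended Edition); `cilin_similarity` simply uses the fraction of leading code levels two words share; and `BETA`, the candidate weighting by document co-occurrence counts, and the length-normalised scoring in `rank` are hypothetical choices.

```python
from collections import Counter

# Hypothetical toy dictionary: word -> Cilin-style code. These codes are
# invented for illustration and are NOT real Tongyici Cilin entries.
CILIN = {
    "apple": "Bh07A01=",
    "pear": "Bh07A02=",
    "phone": "Bo25A01=",
}

BETA = 0.5  # assumed damping factor for expansion words

DOCS = [
    ["apple", "phone", "apple", "store"],
    ["apple", "pear", "apple", "juice"],
    ["pear", "apple", "banana", "apple"],
    ["pear", "banana", "harvest", "season"],
]
QUERY = ["apple"]


def code_levels(code):
    """Split a code into its five hierarchical levels, e.g. B | h | 07 | A | 01."""
    return [code[0], code[1], code[2:4], code[4], code[5:7]]


def cilin_similarity(w1, w2):
    """Assumed measure: fraction of leading code levels the two words share."""
    if w1 not in CILIN or w2 not in CILIN:
        return 0.0
    shared = 0
    for a, b in zip(code_levels(CILIN[w1]), code_levels(CILIN[w2])):
        if a != b:
            break
        shared += 1
    return shared / 5


def initial_retrieval(query, docs):
    """Return ids of documents containing a query word, most occurrences first."""
    scores = {i: sum(doc.count(q) for q in query) for i, doc in enumerate(docs)}
    hits = [i for i, s in scores.items() if s > 0]
    return sorted(hits, key=lambda i: (-scores[i], i))


def cooccurrence_candidates(query, feedback_docs, n):
    """Weight each non-query word by the number of feedback documents in
    which it co-occurs with a query word; return the n heaviest words."""
    counts = Counter()
    for doc in feedback_docs:
        if any(q in doc for q in query):
            counts.update(set(doc) - set(query))
    return sorted(counts, key=lambda w: (-counts[w], w))[:n]


def expand_query(query, candidates, threshold):
    """Keep candidates semantically close to an original keyword.
    Original words get weight 1.0, expansion words BETA * similarity."""
    weights = {q: 1.0 for q in query}
    for cand in candidates:
        sim = max(cilin_similarity(cand, q) for q in query)
        if sim >= threshold:
            weights[cand] = BETA * sim
    return weights


def rank(weights, docs, doc_ids):
    """Re-rank the initial documents by weighted, length-normalised term counts."""
    def score(i):
        tf = Counter(docs[i])
        return sum(w * tf[t] for t, w in weights.items()) / len(docs[i])
    return sorted(doc_ids, key=lambda i: (-score(i), i))


if __name__ == "__main__":
    initial = initial_retrieval(QUERY, DOCS)
    feedback = [DOCS[i] for i in initial[:2]]  # top-2 documents as feedback set
    candidates = cooccurrence_candidates(QUERY, feedback, n=4)
    weights = expand_query(QUERY, candidates, threshold=0.7)
    print(candidates, weights, rank(weights, DOCS, initial))
```

In this toy run the feedback set contains a document about phones, so co-occurrence analysis alone proposes "phone" and "store" as candidates. The dictionary filter is what discards them, which is the behaviour the abstract attributes to the method when it says it avoids adding unrelated words to the query.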
