Journal of Nanjing University (Natural Sciences) ›› 2017, Vol. 53 ›› Issue (6): 1004–.


Microblog query expansion algorithm based on ontology and local query feedback

Gong Hao1, Du Junping1*, Lai Jincai1, Liang Meiyu1, Wang Wei2, Luo Ang2

  • Published: 2017-11-26
  • About author: 1. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876, China;
    2. SINA Corporation, Beijing, 100193, China
  • Supported by:
    National Natural Science Foundation of China (61532006, 61320106006, 61502042), Beijing municipal finance project (PXM2017_178214_000005)
    Received: 2017-09-15
    *Corresponding author, E-mail: junpingdu@126.com

Abstract:  Traditional keyword-matching retrieval works poorly on microblogs: queries contain few terms and posts are short, so ambiguity arises easily and retrieval effectiveness suffers. This work proposes a microblog query expansion algorithm based on ontology and local query feedback (OFQE) and evaluates its effect on information retrieval (IR) in microblogs. First, an ontology knowledge base for the security domain is built from security-domain documents. The ontology is then extended with security-domain terminology extracted from microblog documents, so that the extended ontology comprises two broad categories, six subclasses and more than fifty concepts. Second, the query terms are expanded using the semantic knowledge provided by the extended ontology, and the Lucene search engine is used for initial retrieval. Local microblog documents are selected by calculating microblog heat and time correlation, and are used to filter the candidate expansion words. Finally, a filter function that combines each candidate expansion word's weight from ontology query expansion with its weight from local query feedback co-occurrence analysis selects the final expansion words. The final results are obtained through secondary retrieval and iteration. To assess OFQE, it is compared with a keyword query expansion algorithm (KQE), an ontology query expansion algorithm (OQE) and a pseudo-relevance feedback query expansion algorithm (PRFQE) on microblog retrieval. Multiple query words and their combinations are used for retrieval. The reported results are average scores over the top N results of repeated searches, and they show that OFQE achieves better recall and precision than KQE, OQE and PRFQE.

Key words: ontology, microblog, co-occurrence analysis, query expansion
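
The expansion-and-filtering steps described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the toy ontology, the document weight (log-damped heat times an exponential time decay), the co-occurrence score, and the linear filter function with parameters `alpha` and `threshold`. The initial retrieval step (Lucene in the paper) is replaced by a fixed list of feedback documents, and the iteration and secondary retrieval are omitted.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology: term -> {related concept: semantic weight in [0, 1]}.
ONTOLOGY = {
    "earthquake": {"aftershock": 0.9, "disaster": 0.7, "rescue": 0.6, "tsunami": 0.5},
}

# Stand-in for the top results of an initial retrieval:
# (tokens, heat such as reposts plus comments, age in days). Values are made up.
FEEDBACK_DOCS = [
    (["earthquake", "aftershock", "rescue"], 100, 0),
    (["earthquake", "rescue"], 50, 1),
    (["weather", "tsunami"], 10, 30),
]


def doc_weight(heat, age_days, half_life_days=7.0):
    """Assumed document score: log-damped heat times exponential time decay."""
    return math.log1p(heat) * 0.5 ** (age_days / half_life_days)


def cooccurrence_scores(query_terms, candidates, feedback_docs):
    """Weighted share of feedback documents where a candidate co-occurs with a query term."""
    query_set = set(query_terms)
    totals = defaultdict(float)
    norm = 0.0
    for tokens, heat, age_days in feedback_docs:
        weight = doc_weight(heat, age_days)
        norm += weight
        token_set = set(tokens)
        if not token_set & query_set:
            continue  # no query term in this document, so no co-occurrence
        for cand in candidates:
            if cand in token_set:
                totals[cand] += weight
    return {c: (totals[c] / norm if norm else 0.0) for c in candidates}


def expand_query(query_terms, feedback_docs, alpha=0.5, threshold=0.4):
    """Return the query terms followed by the expansion words that pass the filter."""
    # Step 1: collect candidate expansion words and their ontology weights.
    candidates = {}
    for term in query_terms:
        for cand, weight in ONTOLOGY.get(term, {}).items():
            if cand not in query_terms:
                candidates[cand] = max(weight, candidates.get(cand, 0.0))
    # Step 2: score candidates by co-occurrence in the local feedback documents.
    cooc = cooccurrence_scores(query_terms, candidates, feedback_docs)
    # Step 3: assumed filter function, a linear mix of the two weights.
    scored = {c: alpha * candidates[c] + (1 - alpha) * cooc[c] for c in candidates}
    kept = sorted((c for c, s in scored.items() if s >= threshold),
                  key=lambda c: -scored[c])
    return list(query_terms) + kept


expanded = expand_query(["earthquake"], FEEDBACK_DOCS)
print(expanded)
```

In this sketch a candidate survives only if the ontology and the feedback documents together give it enough support, so a word that is semantically related but absent from recent, popular posts is filtered out. In a fuller version, the expanded query would be sent back to the search engine and the process repeated.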

 

