Journal of Nanjing University (Natural Sciences) ›› 2017, Vol. 53 ›› Issue (3): 549–.

Sentimental analysis of massive micro-blog based on topic extraction

 Wang Canwei1,2*

  • Online: 2017-05-30 Published: 2017-05-30
  • About author: 1. Department of Information and Engineering, Shandong Management University, Jinan, 250357, China;
    2. Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
  • Supported by:
     National Natural Science Foundation of China, Young Scientists Fund (71301086); Shandong Province E-Government Project (2150511); Spark Program of the Shandong Provincial Department of Science and Technology (2013XH17003); Science and Technology Program of the Department of Education (J14LN62)
    Received: 2017-03-24
    *E-mail: wangcanwei@sina.com

Abstract:  Analyzing public sentiment toward a social event from massive microblog data is of considerable research value, but microblog text is sparse and very large in scale, so traditional methods face many challenges on this task. This paper proposes a sentiment analysis method for massive microblog data based on topic clustering. First, frequent itemsets are mined from high-quality microblog data; a semantic correlation threshold is set, and the significant frequent itemsets that pass it are grouped by spectral clustering to obtain topic keywords. The massive microblog data are then assigned to topics according to their semantic relatedness to these keywords. Finally, for each microblog in a topic group, a sentiment lexicon is used to find the sentiment words and negation words within a specified modification distance before or after the topic keywords, and these are combined with emoticons to compute the microblog's sentiment value. Experiments on Chinese microblogs at the million scale show that the method assigns microblogs to topics accurately and classifies sentiment on each topic effectively.
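The final scoring step in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the default window size, the toy lexicon values, the rule that a negation word flips polarity only when it directly precedes a sentiment word, and the simple additive treatment of emoticons are all assumptions made for illustration. The example uses pre-split English tokens; real Chinese microblogs would first need a word segmenter.

```python
from typing import Dict, List, Set


def sentiment_score(
    tokens: List[str],
    topic_keywords: Set[str],
    lexicon: Dict[str, float],    # hypothetical sentiment lexicon: word -> polarity
    negations: Set[str],          # hypothetical negation word list
    emoticons: Dict[str, float],  # hypothetical emoticon -> polarity table
    window: int = 3,              # assumed modification distance, in tokens
) -> float:
    """Score one tokenized microblog around its topic keywords."""
    score = 0.0
    counted = set()  # positions already scored, so overlapping windows do not double count
    for i, tok in enumerate(tokens):
        if tok not in topic_keywords:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            word = tokens[j]
            if j in counted or word not in lexicon:
                continue
            counted.add(j)
            polarity = lexicon[word]
            # Assumed rule: a negation word directly before the sentiment word flips it.
            if j > 0 and tokens[j - 1] in negations:
                polarity = -polarity
            score += polarity
    # Assumed rule: emoticons contribute wherever they occur in the microblog.
    score += sum(emoticons.get(tok, 0.0) for tok in tokens)
    return score


def classify(score: float) -> str:
    """Map a sentiment value to a coarse label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    tokens = ["the", "new", "policy", "is", "not", "good", ":("]
    value = sentiment_score(
        tokens,
        topic_keywords={"policy"},
        lexicon={"good": 1.0, "bad": -1.0},
        negations={"not"},
        emoticons={":(": -1.0, ":)": 1.0},
    )
    print(value, classify(value))
```

Restricting the search to a window around the topic keyword keeps sentiment words about unrelated subjects in the same post from affecting the score for that topic, which is the motivation the abstract gives for the distance limit.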
