南京大学学报(自然科学版) ›› 2023, Vol. 59 ›› Issue (3): 483–493.doi: 10.13232/j.cnki.jnju.2023.03.011


基于子事件的对话长文本情感分析

杨京虎1,2, 段亮1,2, 岳昆1,2, 李忠斌1,2

  1. 云南大学信息学院,昆明,650500
    2. 云南大学云南省智能系统与计算重点实验室,昆明,650500
  • 收稿日期:2023-02-15 出版日期:2023-05-31 发布日期:2023-06-09
  • 通讯作者: 段亮 E-mail:duanl@ynu.edu.cn
  • 基金资助:
    云南省重大科技专项(202202AD080001);云南省重点实验室专项(202205AG070003);国家自然科学基金青年项目(62002311);云南省教育厅科学研究基金(2022Y010)

Sentiment analysis based on subevents for long dialogue texts

Jinghu Yang1,2, Liang Duan1,2, Kun Yue1,2, Zhongbin Li1,2

  1. School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
    2. Key Laboratory of Intelligent Systems and Computing of Yunnan Province, Yunnan University, Kunming, 650500, China
  • Received:2023-02-15 Online:2023-05-31 Published:2023-06-09
  • Contact: Liang Duan E-mail:duanl@ynu.edu.cn

摘要:

传统的情感分析方法主要针对句子、微博等形式的短文本,而对话长文本具有篇幅长、对话双方情感不同且情感易随对话发生变化等特点,使对话长文本中用户多重情感集成困难、情感分析任务精度低.为此,提出子事件交互模型TSI (Topic Subevents Interaction)、预训练模型ERNIE (Enhanced Language Representation with Informative Entities)和循环卷积神经网络(Recurrent Convolutional Neural Networks,RCNN)相结合的对话长文本情感分析模型(TSI with ERNIE-RCNN,TER).该模型通过动态滑动窗口抽取子事件,保留文本关键特征,降低文本冗余度,基于抽取的子事件分析对话双方的情感来识别情感主体,并集成各子事件的情感特征来解决对话双方情感不一致的问题.在真实数据上的实验结果表明,TER的精确率、召回率与F1均优于现有模型.

关键词: 对话长文本, 情感分析, 子事件抽取, 预训练模型, 循环卷积神经网络

Abstract:

Previous studies on sentiment analysis mainly focus on short texts such as sentences and microblog posts. Long dialogue texts, however, are lengthy and redundant, and the sentiments of the two speakers differ and change as the dialogue unfolds, which makes integrating a user's multiple sentiments difficult and lowers the precision of the sentiment analysis task. To overcome these problems, a sentiment analysis model for long dialogue texts, TER (Topic Subevents Interaction with ERNIE-RCNN), is proposed. Firstly, TSI (Topic Subevents Interaction) segments the long dialogue text with a dynamic sliding window, retaining the key features of the text and reducing its redundancy. Secondly, ERNIE-RCNN analyzes the sentiment polarity of each speaker within the subevents. Finally, the model identifies the sentiment agent and integrates the sentiment of each subevent, resolving the inconsistency between the two speakers' sentiments. Experimental results on real data show that TER outperforms the baseline models in terms of precision, recall and F1-score.
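As a toy illustration of the final integration step described in the abstract (combining each subevent's per-speaker sentiment into one overall label per speaker), the sketch below uses a simple majority vote. The function name, data layout, and the voting rule are illustrative assumptions; the paper integrates learned sentiment features rather than discrete labels.

```python
from collections import Counter

def integrate_sentiments(subevent_labels):
    """subevent_labels: list of {speaker: sentiment} dicts, one per subevent.

    Returns one overall sentiment per speaker by majority vote
    (an assumed stand-in for TER's feature-level integration).
    """
    per_speaker = {}
    for labels in subevent_labels:
        for speaker, sentiment in labels.items():
            per_speaker.setdefault(speaker, []).append(sentiment)
    # Counter.most_common(1) picks the most frequent label;
    # ties are broken by first appearance.
    return {s: Counter(v).most_common(1)[0][0] for s, v in per_speaker.items()}
```

For the dialogue in Table 1, for example, the customer's subevent labels (negative, negative, positive) would integrate to an overall negative label even though the final subevent is positive.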

Key words: long dialogue text, sentiment analysis, subevent extraction, pre-trained model, recurrent convolutional neural network

中图分类号: 

  • TP391

Table 1

An example of a dialogue in the telecommunications business

Speaker | Utterance
Agent | Hello! Happy to serve you. (Positive)
Customer | I just received a message saying I already owe fifty yuan, but I topped up one hundred yuan only yesterday. What is going on? (Negative)
Agent | I am very sorry, sir. Your plan charges... (Positive)
Customer | I cannot afford this fee. Your staff called and persuaded me to sign up for this plan while I was busy. Please switch me back to my previous plan. (Negative)
Customer | I see, this fee also covers broadband, right? Sorry, I have been too busy this month and forgot. Thank you. (Positive)
Agent | You are welcome, sir. Have a nice day. Goodbye! (Positive)

Figure 1

Overall architecture of the TER model

Table 2

Symbols and their meanings

Symbol | Meaning
n | number of sentences in one dialogue text
u | total number of loop iterations when partitioning windows
loc | current loop position
Wa | start position of the sliding window
Wb | end position of the sliding window
Wt | start position of the current sliding window
Z | topic-word distribution of the sliding window
C | correlation between topic words
S | set of subevents
Simcos | similarity between topic words
Mp | maximum sliding-window length
Mq | minimum sliding-window length
δ | number of topic-similarity matches in the sliding window
θ | topic-similarity threshold
 | threshold for accepting a subevent
ρ | threshold for judging a subevent's position
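A minimal sketch of how a dynamic sliding window of the kind summarized in Table 2 might segment a dialogue into subevents: the window grows while adjacent sentences stay topically similar (threshold θ), is bounded below and above by the minimum and maximum lengths Mq and Mp, and closes when the topic drifts. The word-overlap similarity function and all default values are assumptions for illustration, not the paper's actual TSI procedure.

```python
def topic_similarity(a, b):
    """Cosine-style overlap between the word sets of two sentences (assumed)."""
    wa, wb = set(a.split()), set(b.split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / (len(wa) ** 0.5 * len(wb) ** 0.5)

def extract_subevents(sentences, theta=0.2, Mq=2, Mp=6):
    """Greedily grow a window while the next sentence stays on topic.

    A window is closed (emitting one subevent) when similarity drops
    below theta and the window already holds at least Mq sentences,
    or when it reaches the maximum length Mp.
    """
    subevents, window = [], []
    for sent in sentences:
        if not window:
            window.append(sent)
            continue
        sim = topic_similarity(window[-1], sent)
        if (sim < theta and len(window) >= Mq) or len(window) >= Mp:
            subevents.append(window)
            window = [sent]
        else:
            window.append(sent)
    if window:
        subevents.append(window)
    return subevents
```

On a toy dialogue whose first two sentences discuss billing and whose last two discuss an unrelated topic, this sketch yields two subevents, which is the qualitative behavior the dynamic window is meant to achieve.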

Table 3

Description of the datasets used in the experiments

Dataset | Neutral | Positive | Negative | Total samples
mc1 | 1321 | 1422 | 1257 | 4000
mc3 | 2721 | 2759 | 2520 | 8000

Table 4

Performance of each model on the sentiment analysis task for long dialogue texts

Model | Precision | Recall | F1
TER | 76.38% | 74.69% | 75.53%
TextCNN | 50.99% | 44.00% | 46.30%
TextRNN | 49.07% | 39.00% | 41.54%
FastText | 50.98% | 40.00% | 41.43%
DPCNN | 52.16% | 33.00% | 35.37%
TextRCNN | 57.88% | 43.00% | 44.76%
Transformer | 60.93% | 33.00% | 31.55%
TodKat | 59.28% | 52.68% | 55.79%
BERT | 37.21% | 61.00% | 46.22%
ERNIE | 39.15% | 63.00% | 48.29%
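The precision, recall and F1 figures reported here are aggregates over the three sentiment classes. A minimal sketch of macro-averaged metrics (the averaging scheme is our assumption; the paper does not state it in this abstract) looks like:

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    ps, rs, fs = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(labels)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n
```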

Table 5

Comparison of subevent evaluation metrics between TSI and the compared methods on the mc2 dataset

Method | Informativeness | Accuracy | Comprehensibility
TSI | 1.9 | 1.04 | 1.83
Feature scoring | 2.41 | 2.9 | 2.42
TextRank | 3.38 | 3.31 | 3.19
Sequence labeling | 2.28 | 2.74 | 2.55

Table 6

Performance comparison of TER and the nine baseline models on the subevent sentiment analysis task on the mc3 dataset

Model | Precision | Recall | F1
TER | 89.15% | 89.05% | 89.21%
TextCNN | 86.08% | 86.23% | 86.09%
TextRNN | 81.35% | 79.95% | 80.20%
FastText | 84.23% | 83.90% | 83.95%
DPCNN | 86.39% | 86.33% | 86.30%
TextRCNN | 87.53% | 87.17% | 87.24%
Transformer | 81.51% | 81.33% | 81.39%
TodKat | 82.96% | 80.68% | 81.80%
BERT | 87.31% | 87.26% | 87.29%
ERNIE | 87.39% | 87.36% | 87.37%

Table 7

Comparison of experimental results with different pre-trained model embeddings

Model | Precision | Recall | F1
ERNIE_TextRCNN | 89.15% | 89.05% | 89.21%
TextCNN | 86.08% | 86.23% | 86.09%
BERT_TextCNN | 87.48% | 87.51% | 87.48%
ERNIE_TextCNN | 88.41% | 88.30% | 88.33%
DPCNN | 86.39% | 86.33% | 86.30%
BERT_DPCNN | 87.99% | 87.17% | 87.26%
ERNIE_DPCNN | 88.57% | 88.08% | 88.21%
TextRCNN | 87.53% | 87.17% | 87.24%
BERT_TextRCNN | 88.69% | 88.65% | 88.64%

Figure 2

Effect of the learning rate on TER performance

Table 8

Accuracy comparison of each model on texts of different lengths

Model | 500~1000 chars | 1000~2000 chars | 2000~3000 chars | 3000~4000 chars | 4000~5000 chars
TER | 77.42% | 75.58% | 73.47% | 76.00% | 73.39%
TextCNN | 50.00% | 41.33% | 40.00% | 43.33% | 45.00%
TextRNN | 43.33% | 41.30% | 30.37% | 33.33% | 40.00%
FastText | 50.00% | 40.62% | 43.75% | 41.25% | 43.75%
DPCNN | 30.00% | 36.96% | 30.00% | 38.89% | 38.00%
TextRCNN | 56.67% | 49.13% | 43.33% | 42.22% | 49.17%
Transformer | 53.33% | 38.16% | 43.33% | 43.33% | 45.89%
TodKat | 56.11% | 54.17% | 52.47% | 54.00% | 55.16%
BERT | 60.00% | 58.00% | 50.45% | 50.11% | 50.25%
ERNIE | 60.33% | 58.70% | 50.65% | 50.49% | 50.67%

Figure 3

A visualization example of subevent extraction

1 Zhang W X, Li X, Deng Y,et al. A survey on aspect-based sentiment analysis:Tasks,methods,and challenges. 2022,arXiv preprint.
2 Wang J C, Wang J J, Sun C,et al. Sentiment classification in customer service dialogue with topic-aware multi-task learning∥Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York,NY,USA:AAAI Press,2020:9177-9184.
3 宋双永,王超,陈成龙,等. 面向智能客服系统的情感分析技术. 中文信息学报,2020,34(2):80-95.
Song S Y, Wang C, Chen C L,et al. Sentiment analysis for intelligent customer service chatbots. Journal of Chinese Information Processing,2020,34(2):80-95.
4 赵天资,段亮,岳昆,等. 基于Biterm主题模型的新闻线索生成方法. 数据分析与知识发现,2021,5(2):1-13.
Zhao T Z, Duan L, Yue K,et al. Generating news clues with Biterm topic model. Data Analysis and Knowledge Discovery,2021,5(2):1-13.
5 Zhang Z Y, Han X, Liu Z Y,et al. ERNIE:Enhanced language representation with informative entities∥Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence,Italy:ACL,2019:1441-1451.
6 Devlin J, Chang M W, Lee K,et al. BERT:Pre-training of deep bidirectional transformers for language understanding∥Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. Volume 1. Long and Short Papers. Minneapolis,MN,USA:Association for Computational Linguistics,2019:4171-4186.
7 Lai S W, Xu L H, Liu K,et al. Recurrent convolutional neural networks for text classification∥Proceedings of the 29th AAAI Conference on Artificial Intelligence. Austin,TX,USA:AAAI Press,2015:2267-2273.
8 Pappagari R, Zelasko P, Villalba J,et al. Hierarchical transformers for long document classification∥2019 IEEE Automatic Speech Recognition and Understanding Workshop. Sentosa,Singapore:IEEE,2019:838-844.
9 Xu J C, Chen D L, Qiu X P,et al. Cached long short-term memory neural networks for document-level sentiment classification∥Proceedings of 2016 Conference on Empirical Methods in Natural Language Processing. Austin,TX,USA:ACL,2016:1660-1669.
10 Sheng D M, Yuan J L. An efficient long Chinese text sentiment analysis method using BERT-based models with BiGRU∥Proceedings of 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design. Dalian,China:IEEE,2021:192-197.
11 Cheng N C, He Y Y, Zhong P X,et al. Chinese long text sentiment analysis based on the combination of title and topic sentences∥Proceedings of 2019 6th International Conference on Dependable Systems and Their Applications. Harbin,China:IEEE,2020:347-352.
12 Hazarika D, Poria S, Mihalcea R,et al. ICON:Interactive conversational memory network for multimodal emotion detection∥Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. Brussels,Belgium:ACL,2018:2594-2604.
13 Shen C L, Sun C L, Wang J J,et al. Sentiment classification towards question-answering with hierarchical matching network∥Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. Brussels,Belgium:ACL,2018:3654-3663.
14 Wang J J, Sun C L, Li S S,et al. Aspect sentiment classification towards question-answering with reinforced bidirectional attention network∥Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence,Italy:ACL,2019:3548-3557.
15 Hu D, Wei L W, Huai X Y. DialogueCRN:Contextual reasoning networks for emotion recognition in conversations∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Bangkok,Thailand:ACL,2021:7042-7052.
16 Zhu L X, Pergola G, Gui L,et al. Topic-driven and knowledge-aware transformer for dialogue emotion detection∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1:Long Papers). Bangkok,Thailand:ACL,2021:1571-1582.
17 周楠,杜攀,靳小龙,等. 面向舆情事件的子话题标签生成模型ET-TAG. 计算机学报,2018,41(7):1490-1503.
Zhou N, Du P, Jin X L,et al. ET-TAG:A tag generation model for the sub-topics of public opinion events. Chinese Journal of Computers,2018,41(7):1490-1503.
18 Memon M Q, Lu Y, Chen P H,et al. An ensemble clustering approach for topic discovery using implicit text segmentation. Journal of Information Science,2020,47(4):1-27.
19 李金鹏,张闯,陈小军,等. 自动文本摘要研究综述. 计算机研究与发展,2021,58(1):1-21.
Li J P, Zhang C, Chen X J,et al. Survey on automatic text summarization. Journal of Computer Research and Development,2021,58(1):1-21.
20 Kim Y. Convolutional neural networks for sentence classification∥Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. Doha,Qatar:ACL,2014:1746-1751.
21 Liu P F, Qiu X P, Huang X J. Recurrent neural network for text classification with multi-task learning∥Proceedings of the 25th International Joint Conference on Artificial Intelligence. New York,NY,USA:AAAI Press,2016:2873-2879.
22 Joulin A, Grave E, Bojanowski P,et al. Bag of tricks for efficient text classification∥Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia,Spain:ACL,2017:427-431.
23 Johnson R, Zhang T. Deep pyramid convolutional neural networks for text categorization∥Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver,Canada:ACL,2017:562-570.
24 Vaswani A, Shazeer N, Parmar N,et al. Attention is all you need∥Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach,CA,USA:MIT Press,2017:6000-6010.
25 李莹莹,马帅,蒋浩谊,等. 一种基于社交事件关联的故事脉络生成方法. 计算机研究与发展,2018,55(9):1972-1986.
Li Y Y, Ma S, Jiang H Y,et al. An approach for storytelling by correlating events from social networks. Journal of Computer Research and Development,2018,55(9):1972-1986.