南京大学学报(自然科学版) ›› 2023, Vol. 59 ›› Issue (3): 483–493.doi: 10.13232/j.cnki.jnju.2023.03.011


基于子事件的对话长文本情感分析

杨京虎1,2, 段亮1,2, 岳昆1,2, 李忠斌1,2

  1. 云南大学信息学院,昆明,650500
    2. 云南大学云南省智能系统与计算重点实验室,昆明,650500
  • 收稿日期:2023-02-15 出版日期:2023-05-31 发布日期:2023-06-09
  • 通讯作者: 段亮 E-mail:duanl@ynu.edu.cn
  • 基金资助:
    云南省重大科技专项(202202AD080001);云南省重点实验室专项(202205AG070003);国家自然科学基金青年项目(62002311);云南省教育厅科学研究基金(2022Y010)

Sentiment analysis based on subevents for long dialogue texts

Jinghu Yang1,2, Liang Duan1,2, Kun Yue1,2, Zhongbin Li1,2

  1. School of Information Science and Engineering, Yunnan University, Kunming, 650500, China
    2. Key Laboratory of Intelligent Systems and Computing of Yunnan Province, Yunnan University, Kunming, 650500, China
  • Received:2023-02-15 Online:2023-05-31 Published:2023-06-09
  • Contact: Liang Duan E-mail:duanl@ynu.edu.cn

摘要:

传统的情感分析方法主要针对句子、微博等形式的短文本,而对话长文本具有篇幅长、对话双方情感不同且情感易随对话发生变化等特点,使对话长文本中用户多重情感集成困难、情感分析任务精度低.为此,提出子事件交互模型TSI (Topic Subevents Interaction)、预训练模型ERNIE (Enhanced Language Representation with Informative Entities)和循环卷积神经网络(Recurrent Convolutional Neural Networks,RCNN)相结合的对话长文本情感分析模型(TSI with ERNIE-RCNN,TER).该模型通过动态滑动窗口抽取子事件,保留文本关键特征,降低文本冗余度,基于抽取的子事件分析对话双方的情感来识别情感主体,并集成各子事件的情感特征来解决对话双方情感不一致的问题.在真实数据上的实验结果表明,TER的精确率、召回率与F1均优于现有模型.

关键词: 对话长文本, 情感分析, 子事件抽取, 预训练模型, 循环卷积神经网络

Abstract:

Previous studies on sentiment analysis mainly focus on short texts such as sentences and microblog posts. Long dialogue texts, however, are lengthy and redundant, and the sentiments of the two speakers differ and change as the dialogue unfolds, which makes integrating a user's multiple sentiments difficult and lowers the precision of the sentiment analysis task. To overcome these problems, a sentiment analysis model for long dialogue texts, TER (Topic Subevents Interaction with ERNIE-RCNN), is proposed. Firstly, TSI (Topic Subevents Interaction) segments the long dialogue text with a dynamic sliding window, retaining the key features of the text and reducing its redundancy. Secondly, ERNIE-RCNN analyzes the sentiment polarity of each speaker within the subevents. Finally, the model identifies the sentiment agent and integrates the sentiment of each subevent, resolving the inconsistency between the two speakers' sentiments. Experimental results on real data show that TER outperforms the baseline models in terms of precision, recall and F1-score.
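As a toy illustration of the final integration step described in the abstract (combining each subevent's per-speaker sentiment into one overall label per speaker), the sketch below uses a simple majority vote. The function name, data layout, and the voting rule are illustrative assumptions; the paper integrates learned sentiment features rather than discrete labels.

```python
from collections import Counter

def integrate_sentiments(subevent_labels):
    """subevent_labels: list of {speaker: sentiment} dicts, one per subevent.

    Returns one overall sentiment per speaker by majority vote
    (an assumed stand-in for TER's feature-level integration).
    """
    per_speaker = {}
    for labels in subevent_labels:
        for speaker, sentiment in labels.items():
            per_speaker.setdefault(speaker, []).append(sentiment)
    # Counter.most_common(1) picks the most frequent label;
    # ties are broken by first appearance.
    return {s: Counter(v).most_common(1)[0][0] for s, v in per_speaker.items()}
```

For the dialogue in Table 1, for example, the customer's subevent labels (negative, negative, positive) would integrate to an overall negative label even though the final subevent is positive.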

Key words: long dialogue text, sentiment analysis, subevent extraction, pre-trained model, recurrent convolutional neural network

中图分类号: 

  • TP391

Table 1

An example of a dialogue in the telecommunications business

Speaker | Utterance
Agent | Hello! Happy to serve you. (Positive)
Customer | I just received a message saying I already owe fifty yuan, but I topped up one hundred yuan only yesterday. What is going on? (Negative)
Agent | I am very sorry, sir. Your plan charges... (Positive)
Customer | I cannot afford this fee. Your staff called and persuaded me to sign up for this plan while I was busy. Please switch me back to my previous plan. (Negative)
Customer | I see, this fee also covers broadband, right? Sorry, I have been too busy this month and forgot. Thank you. (Positive)
Agent | You are welcome, sir. Have a nice day. Goodbye! (Positive)

Figure 1

Overall architecture of the TER model

Table 2

Symbols and their meanings

Symbol | Meaning
n | number of sentences in one dialogue text
u | total number of loop iterations when partitioning windows
loc | current loop position
Wa | start position of the sliding window
Wb | end position of the sliding window
Wt | start position of the current sliding window
Z | topic-word distribution of the sliding window
C | correlation between topic words
S | set of subevents
Simcos | similarity between topic words
Mp | maximum sliding-window length
Mq | minimum sliding-window length
δ | number of topic-similarity matches in the sliding window
θ | topic-similarity threshold
 | threshold for accepting a subevent
ρ | threshold for judging a subevent's position
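A minimal sketch of how a dynamic sliding window of the kind summarized in Table 2 might segment a dialogue into subevents: the window grows while adjacent sentences stay topically similar (threshold θ), is bounded below and above by the minimum and maximum lengths Mq and Mp, and closes when the topic drifts. The word-overlap similarity function and all default values are assumptions for illustration, not the paper's actual TSI procedure.

```python
def topic_similarity(a, b):
    """Cosine-style overlap between the word sets of two sentences (assumed)."""
    wa, wb = set(a.split()), set(b.split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / (len(wa) ** 0.5 * len(wb) ** 0.5)

def extract_subevents(sentences, theta=0.2, Mq=2, Mp=6):
    """Greedily grow a window while the next sentence stays on topic.

    A window is closed (emitting one subevent) when similarity drops
    below theta and the window already holds at least Mq sentences,
    or when it reaches the maximum length Mp.
    """
    subevents, window = [], []
    for sent in sentences:
        if not window:
            window.append(sent)
            continue
        sim = topic_similarity(window[-1], sent)
        if (sim < theta and len(window) >= Mq) or len(window) >= Mp:
            subevents.append(window)
            window = [sent]
        else:
            window.append(sent)
    if window:
        subevents.append(window)
    return subevents
```

On a toy dialogue whose first two sentences discuss billing and whose last two discuss an unrelated topic, this sketch yields two subevents, which is the qualitative behavior the dynamic window is meant to achieve.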

Table 3

Description of the datasets used in the experiments

Dataset | Neutral | Positive | Negative | Total samples
mc1 | 1321 | 1422 | 1257 | 4000
mc3 | 2721 | 2759 | 2520 | 8000

Table 4

Performance of each model on the sentiment analysis task for long dialogue texts

Model | Precision | Recall | F1
TER | 76.38% | 74.69% | 75.53%
TextCNN | 50.99% | 44.00% | 46.30%
TextRNN | 49.07% | 39.00% | 41.54%
FastText | 50.98% | 40.00% | 41.43%
DPCNN | 52.16% | 33.00% | 35.37%
TextRCNN | 57.88% | 43.00% | 44.76%
Transformer | 60.93% | 33.00% | 31.55%
TodKat | 59.28% | 52.68% | 55.79%
BERT | 37.21% | 61.00% | 46.22%
ERNIE | 39.15% | 63.00% | 48.29%
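The precision, recall and F1 figures reported here are aggregates over the three sentiment classes. A minimal sketch of macro-averaged metrics (the averaging scheme is our assumption; the paper does not state it in this abstract) looks like:

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1 over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    ps, rs, fs = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        ps.append(prec); rs.append(rec); fs.append(f1)
    n = len(labels)
    return sum(ps) / n, sum(rs) / n, sum(fs) / n
```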

Table 5

Comparison of subevent evaluation metrics between TSI and the compared methods on the mc2 dataset

Method | Informativeness | Accuracy | Comprehensibility
TSI | 1.9 | 1.04 | 1.83
Feature scoring | 2.41 | 2.9 | 2.42
TextRank | 3.38 | 3.31 | 3.19
Sequence labeling | 2.28 | 2.74 | 2.55

Table 6

Performance comparison of TER and the nine baseline models on the subevent sentiment analysis task on the mc3 dataset

Model | Precision | Recall | F1
TER | 89.15% | 89.05% | 89.21%
TextCNN | 86.08% | 86.23% | 86.09%
TextRNN | 81.35% | 79.95% | 80.20%
FastText | 84.23% | 83.90% | 83.95%
DPCNN | 86.39% | 86.33% | 86.30%
TextRCNN | 87.53% | 87.17% | 87.24%
Transformer | 81.51% | 81.33% | 81.39%
TodKat | 82.96% | 80.68% | 81.80%
BERT | 87.31% | 87.26% | 87.29%
ERNIE | 87.39% | 87.36% | 87.37%

Table 7

Comparison of experimental results with different pre-trained model embeddings

Model | Precision | Recall | F1
ERNIE_TextRCNN | 89.15% | 89.05% | 89.21%
TextCNN | 86.08% | 86.23% | 86.09%
BERT_TextCNN | 87.48% | 87.51% | 87.48%
ERNIE_TextCNN | 88.41% | 88.30% | 88.33%
DPCNN | 86.39% | 86.33% | 86.30%
BERT_DPCNN | 87.99% | 87.17% | 87.26%
ERNIE_DPCNN | 88.57% | 88.08% | 88.21%
TextRCNN | 87.53% | 87.17% | 87.24%
BERT_TextRCNN | 88.69% | 88.65% | 88.64%

Figure 2

Effect of the learning rate on TER performance

Table 8

Accuracy comparison of each model on texts of different lengths

Model | 500~1000 chars | 1000~2000 chars | 2000~3000 chars | 3000~4000 chars | 4000~5000 chars
TER | 77.42% | 75.58% | 73.47% | 76.00% | 73.39%
TextCNN | 50.00% | 41.33% | 40.00% | 43.33% | 45.00%
TextRNN | 43.33% | 41.30% | 30.37% | 33.33% | 40.00%
FastText | 50.00% | 40.62% | 43.75% | 41.25% | 43.75%
DPCNN | 30.00% | 36.96% | 30.00% | 38.89% | 38.00%
TextRCNN | 56.67% | 49.13% | 43.33% | 42.22% | 49.17%
Transformer | 53.33% | 38.16% | 43.33% | 43.33% | 45.89%
TodKat | 56.11% | 54.17% | 52.47% | 54.00% | 55.16%
BERT | 60.00% | 58.00% | 50.45% | 50.11% | 50.25%
ERNIE | 60.33% | 58.70% | 50.65% | 50.49% | 50.67%

Figure 3

A visualization example of subevent extraction

1 Zhang W X, Li X, Deng Y,et al. A survey on aspect-based sentiment analysis:Tasks,methods,and challenges. 2022,arXiv preprint.
2 Wang J C, Wang J J, Sun C,et al. Sentiment classification in customer service dialogue with topic-aware multi-task learning∥Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York,NY,USA:AAAI Press,2020:9177-9184.
3 宋双永,王超,陈成龙,等. 面向智能客服系统的情感分析技术. 中文信息学报,2020,34(2):80-95.
Song S Y, Wang C, Chen C L,et al. Sentiment analysis for intelligent customer service chatbots. Journal of Chinese Information Processing,2020,34(2):80-95.
4 赵天资,段亮,岳昆,等. 基于Biterm主题模型的新闻线索生成方法. 数据分析与知识发现,2021,5(2):1-13.
Zhao T Z, Duan L, Yue K,et al. Generating news clues with Biterm topic model. Data Analysis and Knowledge Discovery,2021,5(2):1-13.
5 Zhang Z Y, Han X, Liu Z Y,et al. ERNIE:Enhanced language representation with informative entities∥Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence,Italy:ACL,2019:1441-1451.
6 Devlin J, Chang M W, Lee K,et al. BERT:Pre-training of deep bidirectional transformers for language understanding∥Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies. Volume 1. Long and Short Papers. Minneapolis,MN,USA:Association for Computational Linguistics,2019:4171-4186.
7 Lai S W, Xu L H, Liu K,et al. Recurrent convolutional neural networks for text classification∥Proceedings of the 29th AAAI Conference on Artificial Intelligence. Austin,TX,USA:AAAI Press,2015:2267-2273.
8 Pappagari R, Zelasko P, Villalba J,et al. Hierarchical transformers for long document classification∥2019 IEEE Automatic Speech Recognition and Understanding Workshop. Sentosa,Singapore:IEEE,2019:838-844.
9 Xu J C, Chen D L, Qiu X P,et al. Cached long short-term memory neural networks for document-level sentiment classification∥Proceedings of 2016 Conference on Empirical Methods in Natural Language Processing. Austin,TX,USA:ACL,2016:1660-1669.
10 Sheng D M, Yuan J L. An efficient long Chinese text sentiment analysis method using BERT-based models with BiGRU∥Proceedings of 2021 IEEE 24th International Conference on Computer Supported Cooperative Work in Design. Dalian,China:IEEE,2021:192-197.
11 Cheng N C, He Y Y, Zhong P X,et al. Chinese long text sentiment analysis based on the combination of title and topic sentences∥Proceedings of 2019 6th International Conference on Dependable Systems and Their Applications. Harbin,China:IEEE,2020:347-352.
12 Hazarika D, Poria S, Mihalcea R,et al. ICON:Interactive conversational memory network for multimodal emotion detection∥Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. Brussels,Belgium:ACL,2018:2594-2604.
13 Shen C L, Sun C L, Wang J J,et al. Sentiment classification towards question-answering with hierarchical matching network∥Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing. Brussels,Belgium:ACL,2018:3654-3663.
14 Wang J J, Sun C L, Li S S,et al. Aspect sentiment classification towards question-answering with reinforced bidirectional attention network∥Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence,Italy:ACL,2019:3548-3557.
15 Hu D, Wei L W, Huai X Y. DialogueCRN:Contextual reasoning networks for emotion recognition in conversations∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. Bangkok,Thailand:ACL,2021:7042-7052.
16 Zhu L X, Pergola G, Gui L,et al. Topic-driven and knowledge-aware transformer for dialogue emotion detection∥Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1:Long Papers). Bangkok,Thailand:ACL,2021:1571-1582.
17 周楠,杜攀,靳小龙,等. 面向舆情事件的子话题标签生成模型ET-TAG. 计算机学报,2018,41(7):1490-1503.
Zhou N, Du P, Jin X L,et al. ET-TAG:A tag generation model for the sub-topics of public opinion events. Chinese Journal of Computers,2018,41(7):1490-1503.
18 Memon M Q, Lu Y, Chen P H,et al. An ensemble clustering approach for topic discovery using implicit text segmentation. Journal of Information Science,2020,47(4):1-27.
19 李金鹏,张闯,陈小军,等. 自动文本摘要研究综述. 计算机研究与发展,2021,58(1):1-21.
Li J P, Zhang C, Chen X J,et al. Survey on automatic text summarization. Journal of Computer Research and Development,2021,58(1):1-21.
20 Kim Y. Convolutional neural networks for sentence classification∥Proceedings of 2014 Conference on Empirical Methods in Natural Language Processing. Doha,Qatar:ACL,2014:1746-1751.
21 Liu P F, Qiu X P, Huang X J. Recurrent neural network for text classification with multi-task learning∥Proceedings of the 25th International Joint Conference on Artificial Intelligence. New York,NY,USA:AAAI Press,2016:2873-2879.
22 Joulin A, Grave E, Bojanowski P,et al. Bag of tricks for efficient text classification∥Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia,Spain:ACL,2017:427-431.
23 Johnson R, Zhang T. Deep pyramid convolutional neural networks for text categorization∥Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Vancouver,Canada:ACL,2017:562-570.
24 Vaswani A, Shazeer N, Parmar N,et al. Attention is all you need∥Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach,CA,USA:MIT Press,2017:6000-6010.
25 李莹莹,马帅,蒋浩谊,等. 一种基于社交事件关联的故事脉络生成方法. 计算机研究与发展,2018,55(9):1972-1986.
Li Y Y, Ma S, Jiang H Y,et al. An approach for storytelling by correlating events from social networks. Journal of Computer Research and Development,2018,55(9):1972-1986.