Journal of Nanjing University (Natural Science), 2019, Vol. 55, Issue (4): 660–666. doi: 10.13232/j.cnki.jnju.2019.04.016


Speech emotion recognition in nature and scripted state based on deep learning

Wei Wang, Tingting Hu, Yaqin Feng

  1. MLC Lab, Department of Educational Technology, School of Educational Science, Nanjing Normal University, Nanjing, 210097, China
  • Received: 2019-03-05 Online: 2019-07-30 Published: 2019-07-23
  • Contact: Wei Wang E-mail: wangwei5@njnu.edu.cn
  • Supported by:
    National Philosophy and Social Science Foundation of China (BCA150054)

Abstract:

Speech is an important channel of emotional expression, and the emotional information carried by speech in the natural state is not identical to that in the scripted (acted) state. To explore how speech emotion recognition differs between the two states, deep learning is applied to the public IEMOCAP dataset, using speech from four emotion classes (neutral, anger, happy and sad) in each state. First, acoustic features are extracted, comparing the emobase2010 and eGeMAPs feature sets. Then a convolutional neural network (CNN) is used to recognize emotion in natural and scripted speech, and the recognition rates of the two states are compared. Finally, confusion matrices are used to analyze the misclassification rates and the similarity between emotions in each state. The results show that emotion recognition accuracy is clearly higher in the natural state than in the scripted state, and that the misclassification rates for anger and sadness differ markedly between the two states. These findings may help in understanding the mechanism of emotional expression.

Key words: emotion categorization, speech emotion recognition, deep learning, deceptive speech

CLC Number:

  • H107

Table 1

Sample distribution in the natural and scripted states

Sample type            Neutral  Anger  Happy  Sad  Total
Natural (improvised)   1099     289    947    608  2943
Scripted               609      814    689    476  2588
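The counts in Table 1 show that the two subsets are imbalanced in different ways, which is one reason to look at class-balanced metrics alongside plain accuracy. The short Python sketch below recomputes the class shares and a majority-class baseline from the Table 1 counts. The dictionary and function names are illustrative and do not come from the paper.

```python
# Sample counts copied from Table 1 (IEMOCAP, four emotion classes).
COUNTS = {
    "natural": {"neutral": 1099, "anger": 289, "happy": 947, "sad": 608},
    "scripted": {"neutral": 609, "anger": 814, "happy": 689, "sad": 476},
}


def class_shares(counts):
    """Return each class's fraction of the subset total."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Return (label, accuracy) for always predicting the largest class."""
    label = max(counts, key=counts.get)
    return label, counts[label] / sum(counts.values())


for state, counts in COUNTS.items():
    label, acc = majority_baseline(counts)
    print(state, sum(counts.values()), label, round(acc, 3))
```

With these counts the largest class is neutral for natural speech and anger for scripted speech, so a trivial classifier would behave differently on the two subsets.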

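Table 2

Overall recognition accuracy for the four emotions in the natural and scripted states

Data             Feature set   CNN UAR  CNN ACC  SVM UAR  SVM ACC
Scripted speech  emobase2010   0.599    0.609    0.530    0.562
Natural speech   emobase2010   0.659    0.669    0.622    0.628
Scripted speech  eGeMAPs       0.544    0.556    0.504    0.518
Natural speech   eGeMAPs       0.622    0.663    0.588    0.631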
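Table 2 reports two metrics. ACC is read here as overall accuracy and UAR as unweighted average recall (the mean of the per-class recalls), which is the usual meaning of these abbreviations in speech emotion recognition; this page does not spell them out, so that reading is an assumption. The sketch below computes both from a confusion matrix. The matrix values are made up for illustration and are not results from the paper.

```python
def accuracy(cm):
    """Overall accuracy: correct predictions divided by all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def unweighted_average_recall(cm):
    """Mean of per-class recalls; rows are true classes, columns are predictions."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm)]
    return sum(recalls) / len(recalls)


# Hypothetical 4-class confusion matrix (neutral, anger, happy, sad).
toy_cm = [
    [80, 5, 10, 5],
    [10, 20, 8, 2],
    [15, 5, 70, 10],
    [10, 0, 10, 40],
]

print(round(accuracy(toy_cm), 3), round(unweighted_average_recall(toy_cm), 3))
```

Because UAR weights every class equally, it falls below ACC when a small class (anger in this toy matrix) is recognized poorly, which is why the two columns in Table 2 need not agree.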

Figure 1

Comparison of recognition accuracy for each emotion in natural and scripted speech

Figure 2

Confusion matrices for emotion recognition in the natural and scripted states
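A confusion matrix of this kind can be turned into per-emotion misclassification rates by normalizing each row. The sketch below assumes rows are true labels and columns are predicted labels, and reports, for each true emotion, the emotion it is most often mistaken for. The counts are invented for illustration and do not reproduce the figure.

```python
LABELS = ["neutral", "anger", "happy", "sad"]


def row_normalise(cm):
    """Convert counts to per-true-class rates so each row sums to 1."""
    return [[n / sum(row) for n in row] for row in cm]


def most_confused_with(cm, labels):
    """For each true class, return (predicted label, rate) of its largest off-diagonal entry."""
    rates = row_normalise(cm)
    result = {}
    for i, row in enumerate(rates):
        j = max((k for k in range(len(row)) if k != i), key=lambda k: row[k])
        result[labels[i]] = (labels[j], row[j])
    return result


# Hypothetical counts: rows = true emotion, columns = predicted emotion.
toy_cm = [
    [80, 5, 10, 5],
    [10, 20, 8, 2],
    [15, 5, 70, 10],
    [12, 0, 8, 40],
]

for true_label, (pred_label, rate) in most_confused_with(toy_cm, LABELS).items():
    print(f"{true_label} -> {pred_label}: {rate:.2f}")
```

Comparing these row-normalized rates between the natural and scripted subsets is one way to examine differences such as the anger and sadness misclassification noted in the abstract.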

Figure 3

Misrecognition rate of each emotion in the natural and scripted states
