Journal of Nanjing University (Natural Science) ›› 2019, Vol. 55 ›› Issue (5): 765–773. doi: 10.13232/j.cnki.jnju.2019.05.008

Comparison of speech emotion recognition in cross language corpus

Qi Zhong, Yaqin Feng, Wei Wang

  1. Machine Learning and Cognition (MLC) Lab, School of Educational Science, Nanjing Normal University, Nanjing, 210097, China
  • Received: 2019-06-14  Online: 2019-09-30  Published: 2019-11-01
  • Contact: Wei Wang  E-mail: wangwei5@njnu.edu.cn
  • Supported by:
    National Social Science Fund of China (BCA150054)

Abstract:

Emotion perception is both universal and variable: emotions expressed in different languages have distinct emotional characteristics, but they also share similar ones. This paper uses the IEMOCAP English emotion database, the CASIA Chinese emotion database and the EMO-BD German emotion database, and takes four emotions (neutral, angry, happy and sad) as research objects, to examine speech emotion recognition on single-language corpora, mixed-language corpora and across corpora. Support Vector Machine (SVM), Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) networks are trained as classifiers to recognize the emotions. The results show that the speech emotion recognition patterns of the different corpora are similar, while also reflecting cultural characteristics. It is also found that the English neutral emotion and the Chinese sad emotion generalize well as models, while the English sad emotion and the Chinese neutral emotion show better adaptability.

Key words: cross-corpus, speech emotion, deep learning, classifier, transfer learning

CLC Number:

  • TP391
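
The three evaluation settings named in the abstract (single-language, mixed-language, cross-corpus) can be enumerated with a short sketch. The function names are illustrative, and the sketch only lists which corpora feed training and testing in each setting. How the paper split the data inside each setting is not stated on this page.

```python
from itertools import combinations, permutations

# Corpus labels as used in the paper's tables (English, Chinese, German).
CORPORA = ["IEMOCAP", "CASIA", "EMO-BD"]

def single_language(corpora):
    """Train and test within the same corpus: one condition per corpus."""
    return [(c, c) for c in corpora]

def mixed_language(corpora):
    """Pool two or more corpora into one data set before training and testing."""
    return [combo
            for size in range(2, len(corpora) + 1)
            for combo in combinations(corpora, size)]

def cross_corpus(corpora):
    """Train on one corpus (base) and test on a different one (transfer)."""
    return list(permutations(corpora, 2))
```

With three corpora this yields 3 single-language conditions, 4 mixed-language pools (matching the four column groups of Tables 3 and 4) and 6 ordered base/transfer pairs (the off-diagonal cells of Table 5).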

Table 1

Overall emotion recognition rate on single-language corpora

Classifier   IEMOCAP   CASIA   EMO-BD
Average      0.563     0.59    0.756
SVM          0.58      0.74    0.76
CNN          0.55      0.52    0.69
LSTM         0.56      0.51    0.82
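
The "Average" row appears to be the per-corpus mean over the three classifiers. A minimal sketch that recomputes it from the rounded values printed in the table (the variable and function names are illustrative):

```python
from statistics import mean

# Overall recognition rates copied from Table 1: table1[classifier][corpus].
table1 = {
    "SVM":  {"IEMOCAP": 0.58, "CASIA": 0.74, "EMO-BD": 0.76},
    "CNN":  {"IEMOCAP": 0.55, "CASIA": 0.52, "EMO-BD": 0.69},
    "LSTM": {"IEMOCAP": 0.56, "CASIA": 0.51, "EMO-BD": 0.82},
}

def column_means(table):
    """Average each corpus column over all classifier rows."""
    corpora = list(next(iter(table.values())))
    return {c: mean(row[c] for row in table.values()) for c in corpora}

def best_classifier(table, corpus):
    """Name of the classifier with the highest rate on the given corpus."""
    return max(table, key=lambda clf: table[clf][corpus])

averages = column_means(table1)
```

The recomputed means are about 0.563 for IEMOCAP and 0.59 for CASIA, matching the table. For EMO-BD the mean comes out near 0.757 rather than the printed 0.756, a gap that may stem from rounding in the published figures.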

Table 2

Recognition rate for individual emotions on single-language corpora

             IEMOCAP              CASIA                EMO-BD
Emotion      SVM    CNN    LSTM   SVM    CNN    LSTM   SVM    CNN    LSTM
neutral      0.6    0.61   0.52   0.71   0.56   0.56   0.89   0.71   0.9
angry        0.64   0.55   0.63   0.79   0.47   0.62   0.85   0.69   0.85
happiness    0.43   0.43   0.5    0.69   0.36   0.28   0.44   0.46   0.55
sad          0.69   0.64   0.67   0.79   0.65   0.59   0.94   0.95   0.98

Table 3

Overall emotion recognition rate on mixed-language corpora

Classifier   CASIA+IEMOCAP   EMO-BD+IEMOCAP   CASIA+EMO-BD   CASIA+IEMOCAP+EMO-BD
Average      0.4469          0.4001           0.4355         0.4327
SVM          0.6719          0.7618           0.7344         0.6956
CNN          0.3453          0.2215           0.305          0.3146
LSTM         0.3234          0.217            0.267          0.288

Table 4

Recognition rate for individual emotions on mixed-language corpora

             CASIA+IEMOCAP        EMO-BD+IEMOCAP       CASIA+EMO-BD         CASIA+IEMOCAP+EMO-BD
Emotion      SVM    CNN    LSTM   SVM    CNN    LSTM   SVM    CNN    LSTM   SVM    CNN    LSTM
neutral      0.62   0.41   0.36   0.77   0.34   0.43   0.79   0.31   0.25   0.67   0.32   0.25
angry        0.72   0.45   0.44   0.76   0.17   0.13   0.78   0.43   0.36   0.77   0.44   0.42
happiness    0.6    0.34   0.25   0.68   0.19   0.21   0.52   0.23   0.21   0.56   0.25   0.18
sad          0.74   0.18   0.24   0.85   0.19   0.11   0.84   0.2    0.21   0.77   0.21   0.27

Table 5

Overall emotion recognition rate across corpora

                            Transfer corpus
Classifier   Base corpus    IEMOCAP   CASIA     EMO-BD
SVM          IEMOCAP        0.5802    0.495     0.4749
             CASIA          0.4321    0.7425    0.53392
             EMO-BD         0.46103   0.49375   0.75811
CNN          IEMOCAP        0.5466    0.51      0.59
             CASIA          0.4319    0.51875   0.46018
             EMO-BD         0.4234    0.37875   0.6932
LSTM         IEMOCAP        0.5637    0.4975    0.5516
             CASIA          0.4408    0.51375   0.45723
             EMO-BD         0.4366    0.44125   0.823
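
One way to read Table 5 is to compare the diagonal cells (base and transfer corpus identical, which match the single-language rates in Table 1) with the off-diagonal cells. The sketch below computes that gap per classifier. The nested-dictionary layout and the `transfer_gap` helper are illustrative choices, not part of the paper.

```python
from statistics import mean

# Rates copied from Table 5: table5[classifier][base corpus][transfer corpus].
table5 = {
    "SVM": {
        "IEMOCAP": {"IEMOCAP": 0.5802, "CASIA": 0.495, "EMO-BD": 0.4749},
        "CASIA": {"IEMOCAP": 0.4321, "CASIA": 0.7425, "EMO-BD": 0.53392},
        "EMO-BD": {"IEMOCAP": 0.46103, "CASIA": 0.49375, "EMO-BD": 0.75811},
    },
    "CNN": {
        "IEMOCAP": {"IEMOCAP": 0.5466, "CASIA": 0.51, "EMO-BD": 0.59},
        "CASIA": {"IEMOCAP": 0.4319, "CASIA": 0.51875, "EMO-BD": 0.46018},
        "EMO-BD": {"IEMOCAP": 0.4234, "CASIA": 0.37875, "EMO-BD": 0.6932},
    },
    "LSTM": {
        "IEMOCAP": {"IEMOCAP": 0.5637, "CASIA": 0.4975, "EMO-BD": 0.5516},
        "CASIA": {"IEMOCAP": 0.4408, "CASIA": 0.51375, "EMO-BD": 0.45723},
        "EMO-BD": {"IEMOCAP": 0.4366, "CASIA": 0.44125, "EMO-BD": 0.823},
    },
}

def transfer_gap(matrix):
    """Mean within-corpus rate minus mean cross-corpus rate for one classifier."""
    within = [matrix[c][c] for c in matrix]
    cross = [rate
             for base, row in matrix.items()
             for target, rate in row.items()
             if base != target]
    return mean(within) - mean(cross)

gaps = {clf: transfer_gap(matrix) for clf, matrix in table5.items()}
```

On these numbers the gap is roughly 0.21 for SVM, 0.12 for CNN and 0.16 for LSTM, so every classifier loses accuracy when the test corpus differs from the training corpus, with SVM losing the most in absolute terms while still holding the highest mean cross-corpus rate.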

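Fig. 1

Cross-corpus recognition rate for individual emotions, with the model trained on IEMOCAP

Fig. 2

Cross-corpus recognition rate for individual emotions, with the model trained on CASIA

Fig. 3

Cross-corpus recognition rate for individual emotions, with the model trained on EMO-BD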
