南京大学学报(自然科学版) (Journal of Nanjing University (Natural Science)), 2019, Vol. 55, Issue 5: 758–764. doi: 10.13232/j.cnki.jnju.2019.05.007


A study on gender differences in speech emotion recognition based on corpus

Xinyi Cao, He Li, Wei Wang

  1. Machine Learning and Cognition Lab, Department of Educational Technology, School of Educational Science, Nanjing Normal University, Nanjing 210097, China
  • Received: 2019-08-14  Online: 2019-09-30  Published: 2019-11-01
  • Contact: Wei Wang, E-mail: wangwei5@njnu.edu.cn
  • Funding: National Philosophy and Social Science Foundation of China (BCA150054)



Abstract:

Gender is one of the important factors in speech emotion recognition. This study explores gender differences in speech emotion recognition using machine learning methods and emotional speech databases, and further examines these differences from the perspective of acoustic features. Experiments were conducted on two English emotional datasets and on their fusion. Three kinds of classifiers were used to recognize emotion in male and female speech, and an attention mechanism was used to select the features most important to each gender and to compare their differences. The results show that the emotion recognition rate for female speech is higher than that for male speech, and that the importance of spectral features such as Mel-frequency cepstral coefficients, shimmer, and spectral slope differs considerably between male and female speech.

Key words: machine learning, gender, emotion recognition, speech emotion, attention mechanism

CLC number: H107
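The attention-based feature selection described in the abstract can be sketched as follows: raw per-feature attention scores are normalized with a softmax, and features are ranked by their resulting weights. The feature names and scores below are illustrative assumptions, not values from the study.

```python
import numpy as np

# Hypothetical per-feature attention scores (e.g., produced by an attention
# layer over an acoustic feature vector); the values are illustrative only.
features = ["MFCC4", "Shimmer", "SpectralSlope", "Jitter"]
scores = np.array([2.0, 0.5, 1.0, -0.5])

# Softmax turns raw scores into attention weights that sum to 1.
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Rank features by attention weight, largest first.
ranking = [features[i] for i in np.argsort(-weights)]
print(ranking)  # ['MFCC4', 'SpectralSlope', 'Shimmer', 'Jitter']
```

Running such a ranking separately on male and female speech yields the per-gender importance orderings compared in Table 5.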

Table 1  Recall on the IEMOCAP dataset for different feature sets and classifier models

Gender | Feature set  | SVM    | CNN    | LSTM
Male   | eGeMAPS      | 0.601  | 0.565  | 0.6145
Male   | Emobase 2010 | 0.5445 | 0.6365 | 0.6475
Female | eGeMAPS      | 0.6575 | 0.5785 | 0.6455
Female | Emobase 2010 | 0.584  | 0.661  | 0.6775

Table 2  Recall on the eNTERFACE'05 dataset for different feature sets and classifier models

Gender | Feature set  | SVM    | CNN    | LSTM
Male   | eGeMAPS      | 0.6533 | 0.4867 | 0.7067
Male   | Emobase 2010 | 0.8667 | 0.6733 | 0.8
Female | eGeMAPS      | 0.7267 | 0.5467 | 0.7933
Female | Emobase 2010 | 0.88   | 0.6867 | 0.8267

Table 3  Significance (sig) values of T-tests on male vs. female emotion recognition rates for each classifier and feature set on the two datasets

Dataset      | Feature set  | SVM    | CNN   | LSTM
IEMOCAP      | eGeMAPS      | 0.000  | 0.000 | 0.000
IEMOCAP      | Emobase 2010 | 0.000  | 0.000 | 0.000
eNTERFACE'05 | eGeMAPS      | 0.000  | 0.000 | 0.000
eNTERFACE'05 | Emobase 2010 | 0.0139 | 0.000 | 0.000
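The sig values above come from independent-samples T-tests comparing male and female recognition rates. A minimal sketch of such a test, with made-up recognition-rate samples rather than the study's actual per-run results:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Hypothetical per-run recognition rates for female and male speech;
# the real values would come from the classifiers in Tables 1 and 2.
female = rng.normal(0.65, 0.02, 30)
male = rng.normal(0.60, 0.02, 30)

# Independent-samples T-test; a small p-value (e.g. < 0.05) indicates
# the gender difference in recognition rate is statistically significant.
t_stat, p_value = ttest_ind(female, male)
print(round(p_value, 4))
```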

Table 4  Recall on the fusion dataset for different feature sets and classifier models

Gender | Feature set  | SVM    | CNN    | LSTM
Male   | eGeMAPS      | 0.5871 | 0.5426 | 0.5992
Male   | Emobase 2010 | 0.5308 | 0.6292 | 0.6654
Female | eGeMAPS      | 0.6388 | 0.5770 | 0.6483
Female | Emobase 2010 | 0.5846 | 0.6833 | 0.6829

Table 5  Top 15 features with the largest gender difference in importance for speech emotion recognition across the three datasets

Feature                                        | Importance rank (female/male)
MFCC4 (Mel-frequency cepstral coefficient 4)   | 9/24
Shimmer                                        | 20/6
Spectral slope 0–500 Hz                        | 13/26
Harmonic difference H1–A3                      | 29/19
F3 relative energy                             | 22/32
F1 bandwidth                                   | 32/22
F2 bandwidth                                   | 31/23
F1 frequency                                   | 16/8
Continuously voiced regions per second         | 10/18
Hammarberg index                               | 8/15
Mean length of voiced regions                  | 11/4
Mean length of unvoiced regions                | 21/14
Spectral flux                                  | 7/1
Jitter                                         | 23/17
Equivalent sound level (equivalentSoundLevel_dBp) | 4/10
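The ordering in Table 5 can be reproduced by ranking each feature's importance separately for female and male speech and sorting by the absolute rank difference. A small sketch using three of the ranks from the table:

```python
import numpy as np

# Importance ranks (1 = most important) of three features from Table 5,
# for female and male speech emotion recognition respectively.
features = ["MFCC4", "Shimmer", "SpectralFlux"]
female_rank = np.array([9, 20, 7])
male_rank = np.array([24, 6, 1])

# Features whose importance differs most between genders come first.
diff = np.abs(female_rank - male_rank)
order = np.argsort(-diff)
for i in order:
    print(features[i], f"{female_rank[i]}/{male_rank[i]}", diff[i])
```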