Journal of Nanjing University (Natural Science), 2021, Vol. 57, Issue 4: 709–714. doi: 10.13232/j.cnki.jnju.2021.04.021


Speech fatigue detection combining prosodic and dynamic cepstral features

Lifu Wu1,2, Hang Xu1

  1. School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing, 210044, China
  2. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing, 210044, China
  • Received: 2021-01-27 Online: 2021-07-30 Published: 2021-07-30
  • Contact: Lifu Wu E-mail: wulifu@nuist.edu.cn
  • Funding: National Natural Science Foundation of China (12074192)

Abstract:

Speech-based fatigue detection is simple to operate, non-invasive, and works in real time. To improve detection performance, this paper fuses prosodic features with dynamic cepstral features and uses a Gaussian mixture model (GMM) as the classifier. The detection performance of Mel frequency cepstral coefficients (MFCC), shifted delta cepstral (SDC) features, and prosodic features is examined separately. Experimental results show that, among single features, MFCC outperforms both SDC and prosodic features. Every fused feature set outperforms the single features, and fusing all three features yields a detection accuracy of 91%.

Key words: fatigue, Mel frequency cepstral coefficient, shifted delta cepstral, prosody, Gaussian mixture model, fusion

CLC number: TP391
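The approach summarized in the abstract (fused features scored by a GMM classifier) can be illustrated with a minimal sketch. It assumes frame-level fusion by simple concatenation and one diagonal-covariance GMM per fatigue state, with the class chosen by maximum log-likelihood; this page does not state how the authors fuse features or configure their models. The function names, the toy model parameters, and the two-class setup are hypothetical, and GMM training (for example by EM) is omitted.

```python
import math


def fuse(*streams):
    """Concatenate the per-frame vectors of several feature streams."""
    return [[x for vec in group for x in vec] for group in zip(*streams)]


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        # log-sum-exp keeps the mixture sum numerically stable
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def classify(frames, models):
    """Return the label whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


# Hypothetical toy models: two states, two mixture components each,
# over a 2-D fused vector (one cepstral value + one prosodic value).
models = {
    "A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "B": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}
cepstral = [[0.2], [0.9], [0.4]]   # stand-in for MFCC/SDC frames
prosodic = [[0.1], [1.1], [0.5]]   # stand-in for prosodic frames
fused = fuse(cepstral, prosodic)
print(classify(fused, models))     # -> A
```

Concatenation keeps the classifier unchanged as features are added: only the dimensionality of each frame grows, which is one plausible reading of how the single-feature and fused-feature systems could share the same GMM back end.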

Fig. 1  Extraction flow for Mel frequency cepstral coefficients (MFCC)
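The flowchart itself is not reproduced on this page, so the following is a minimal sketch of the conventional MFCC computation for a single frame (Hamming window, power spectrum, mel filterbank, logarithm, DCT), which may differ from the exact flow in Fig. 1. The filter count, coefficient count, sample rate, and all function names are illustrative assumptions; pre-emphasis and framing are omitted for brevity, and a naive DFT stands in for an FFT.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame, naive DFT."""
    n = len(frame)
    windowed = [
        s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            windowed[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)
        )
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for b in range(lo, mid):          # rising edge
            filt[b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):          # falling edge
            filt[b] = (hi - b) / (hi - mid)
        bank.append(filt)
    return bank


def mfcc(frame, sample_rate, n_filters=20, n_ceps=12):
    """MFCCs of one frame: log mel-filterbank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_e = [
        math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10))
        for filt in bank
    ]
    return [
        sum(e * math.cos(math.pi * q * (m + 0.5) / n_filters)
            for m, e in enumerate(log_e))
        for q in range(1, n_ceps + 1)
    ]


# Usage on a synthetic 440 Hz tone sampled at 8 kHz (256-sample frame).
frame = [math.sin(2.0 * math.pi * 440.0 * i / 8000.0) for i in range(256)]
coeffs = mfcc(frame, 8000)
print(len(coeffs))  # -> 12
```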

Fig. 2  Extraction flow for shifted delta cepstral (SDC) features
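As a rough illustration of what an SDC extraction step computes, the sketch below follows the commonly used N-d-P-k definition: for each frame t, k delta vectors c(t + iP + d) - c(t + iP - d), i = 0..k-1, are stacked into one vector. The defaults d=1, P=3, k=7 are a configuration often seen in language identification and are assumed here; this page does not state the parameters used in the paper, and the function name is hypothetical.

```python
def sdc(cepstra, d=1, p=3, k=7):
    """Shifted delta cepstra for a sequence of cepstral frames.

    cepstra: list of frames, each a list of N cepstral coefficients.
    Returns one vector of length N * k for every frame t whose
    required neighbours (t - d up to t + (k - 1) * p + d) all exist.
    """
    n_frames = len(cepstra)
    features = []
    for t in range(d, n_frames - (k - 1) * p - d):
        stacked = []
        for i in range(k):
            ahead = cepstra[t + i * p + d]
            behind = cepstra[t + i * p - d]
            # Delta of block i: difference across a window of +/- d frames
            stacked.extend(a - b for a, b in zip(ahead, behind))
        features.append(stacked)
    return features


# Usage: 30 frames of 2 coefficients that grow linearly with time,
# so every delta is constant (2.0 for the first coefficient, 4.0 for the second).
frames = [[float(t), 2.0 * t] for t in range(30)]
vectors = sdc(frames)
print(len(vectors), len(vectors[0]))  # -> 10 14
```

Because each SDC vector spans several shifted blocks, it captures longer-term temporal change than a single delta, which is why it is described as a dynamic cepstral feature.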

Fig. 3  Block diagram of model training

Fig. 4  Recognition results for the four fatigue states

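Table 1  Comparison of recognition results with different feature parameters

Feature parameters    A      B      C      D      Recognition rate
MFCC                  111    88     94     93     80.2%
SDC                   60     47     101    90     62.5%
I+F                   79     62     68     54     54.8%
MFCC+SDC              106    93     94     99     81.7%
MFCC+I+F              109    91     95     95     81.2%
MFCC+SDC+I+F          111    117    110    99     91.0%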

Fig. 5  Speech spectra for different fatigue states
