Journal of Nanjing University (Natural Science) ›› 2021, Vol. 57 ›› Issue (5): 875–880. doi: 10.13232/j.cnki.jnju.2021.05.019

Speaker counting in strong reverberant environments based on K-medoids clustering coherence features

Lifu Wu1,2, Guangshen Ji1, Qiucen Hu1

  1. School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing, 210044, China
  2. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing, 210044, China
  • Received: 2021-01-27 Online: 2021-09-29 Published: 2021-09-29
  • Contact: Lifu Wu E-mail: wulifu@nuist.edu.cn
  • Funding: National Natural Science Foundation of China (12074192)

Abstract:

The number of speakers is key information for speech processing applications in strongly reverberant environments. This paper counts speakers using the frequency-domain magnitude squared coherence (MSC) between the speech of different speakers as the feature. Short-time frequency-domain MSC features are first extracted from the speech and then clustered with the K-medoids algorithm to obtain the number of speakers. The method requires no prior knowledge of the microphone spacing or of the relative distance between the speakers and the microphones. Experiments under different reverberation conditions, signal-to-noise ratios, and microphone spacings show that the frequency-domain MSC feature is coherent with the speakers. Compared with a method that clusters time difference of arrival (TDOA) estimates obtained from the generalized cross-correlation phase transform (GCC-PHAT), the proposed method achieves higher counting accuracy, lower sensitivity to microphone spacing, and better robustness.

Key words: speaker counting, Magnitude Squared Coherence (MSC), K-medoids clustering, Generalized Cross-Correlation Phase Transform (GCC-PHAT), Time Difference of Arrival (TDOA)
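As a rough illustration of the feature named in the abstract, the magnitude squared coherence between two microphone signals can be estimated by averaging auto- and cross-spectra over short frames: MSC(f) = |P12(f)|^2 / (P11(f) P22(f)). The sketch below is not the authors' implementation. The function names, the Hann window, the toy frame length and hop, and the synthetic noise signals are all assumptions chosen for illustration.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def msc(x1, x2, frame_len=64, hop=32):
    """Frame-averaged MSC estimate for bins 0 .. frame_len/2 (hypothetical helper)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    bins = frame_len // 2 + 1
    p11 = [0.0] * bins
    p22 = [0.0] * bins
    p12 = [0j] * bins
    for start in range(0, len(x1) - frame_len + 1, hop):
        f1 = fft([w * v for w, v in zip(window, x1[start:start + frame_len])])
        f2 = fft([w * v for w, v in zip(window, x2[start:start + frame_len])])
        for k in range(bins):
            p11[k] += abs(f1[k]) ** 2               # auto-spectrum of channel 1
            p22[k] += abs(f2[k]) ** 2               # auto-spectrum of channel 2
            p12[k] += f1[k] * f2[k].conjugate()     # cross-spectrum
    eps = 1e-12  # avoids division by zero in empty bins
    return [abs(p12[k]) ** 2 / (p11[k] * p22[k] + eps) for k in range(bins)]


# Toy signals (assumed): white noise stands in for speech.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(2048)]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(2048)]
coherent = msc(source, [0.5 * v for v in source])  # scaled copy of one source
incoherent = msc(source, unrelated)                # independent signals
print(round(min(coherent), 3), round(sum(incoherent) / len(incoherent), 3))
```

A scaled copy of the same source yields coherence close to 1 in every bin, while independent signals yield values near 0, which is the contrast a clustering stage can exploit.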

CLC number:

  • TP391

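Figure 1. Modeling scenario for two-element microphone signals with M = 6 speakers

Figure 2. Block diagram of the speaker counting system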
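The clustering stage of the speaker counting system can be sketched as follows. The abstract states only that K-medoids clustering of the MSC features yields the number of speakers. How the cluster count is selected is not described on this page, so the mean silhouette score used here is an assumption, as are the farthest-point initialisation, the function names, and the synthetic 2-D points that stand in for MSC feature vectors. Note that a silhouette criterion cannot return a count of one.

```python
import math
import random


def assign(d, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]]) for i in range(len(d))]


def k_medoids(d, k, max_iter=100):
    """Alternating K-medoids on a precomputed distance matrix d."""
    n = len(d)
    # Deterministic farthest-point initialisation (assumption, not from the paper).
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    labels = assign(d, medoids)
    for _ in range(max_iter):
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster ends up empty
                updated.append(medoids[c])
                continue
            updated.append(min(members, key=lambda i: sum(d[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
        labels = assign(d, medoids)
    return medoids, labels


def mean_silhouette(d, labels, k):
    """Mean silhouette score; singleton clusters contribute 0."""
    clusters = [[j for j, lab in enumerate(labels) if lab == c] for c in range(k)]
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        others = [c for c in range(k) if c != lab and clusters[c]]
        if not own or not others:
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(sum(d[i][j] for j in clusters[c]) / len(clusters[c]) for c in others)
        total += (b - a) / max(a, b, 1e-12)
    return total / len(labels)


def count_speakers(features, m_max=6):
    """Pick the k in 2..m_max with the best silhouette (selection rule is assumed)."""
    d = [[math.dist(p, q) for q in features] for p in features]
    best_k, best_score = 2, -math.inf
    for k in range(2, m_max + 1):
        _, labels = k_medoids(d, k)
        score = mean_silhouette(d, labels, k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Synthetic stand-in features: three well-separated groups of 20 points each.
rng = random.Random(1)
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
features = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)) for cx, cy in centres for _ in range(20)]
estimated = count_speakers(features, m_max=6)
print(estimated)
```

Medoids are always actual data points, which is why K-medoids tends to be less affected by outlying feature frames than centroid-based K-means.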

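Table 1. Simulation environment parameter settings

Parameter                                      Value
Room dimensions                                7 m × 4 m × 3 m
Sampling frequency (fs)                        32 kHz
Frame length                                   512 samples
Frame shift                                    216 samples
Segment length (L/fs)                          1 s
Reverberation time (RT60)                      0.2 to 0.8 s
Signal-to-noise ratio (SNR)                    0 to 40 dB
Microphone spacing (d)                         0.1 to 1 m
Total number of tests (Tt)                     100
Maximum number of meeting participants (Mmax)  6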
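To make the framing parameters concrete: a 1 s segment at 32 kHz holds L = 32000 samples, and with 512-sample frames shifted by 216 samples the number of complete frames is floor((32000 - 512) / 216) + 1 = 146. The short sketch below reproduces that arithmetic. Dropping the trailing partial frame is an assumption, since the page does not say how it is handled, and the helper name is hypothetical.

```python
def frames_per_segment(fs_hz, segment_s, frame_len, frame_shift):
    """Number of complete frames in one segment (trailing partial frame dropped)."""
    total = int(round(fs_hz * segment_s))  # samples in the segment
    if total < frame_len:
        return 0
    return (total - frame_len) // frame_shift + 1


n_frames = frames_per_segment(32_000, 1.0, 512, 216)
frame_ms = 512 / 32_000 * 1000   # frame duration in milliseconds
shift_ms = 216 / 32_000 * 1000   # frame shift in milliseconds
print(n_frames, frame_ms, shift_ms)  # 146 16.0 6.75
```

Under these settings each 1 s segment contributes on the order of 146 short-time feature frames to the clustering stage.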

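Figure 3. Effect of reverberation time on speaker counting

Figure 4. Effect of signal-to-noise ratio on speaker counting

Figure 5. Effect of microphone spacing on speaker counting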
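For context on the baseline named in the abstract, the microphone spacing d bounds the delay a two-microphone GCC-PHAT estimator can observe, since |TDOA| cannot exceed d/c. The sketch below shows a toy GCC-PHAT delay estimate. It is not the authors' or the baseline's code: the speed of sound c = 343 m/s, the circular (wrap-around) delay model, the naive DFT, the white-noise test signal, and all function names are assumptions made to keep the example short and self-contained.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate for a short toy signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(x1, x2, max_lag):
    """Lag in samples by which x2 trails x1, from the PHAT-weighted cross-correlation."""
    n = len(x1)
    f1 = dft(x1)
    f2 = dft(x2)
    cross = [b * a.conjugate() for a, b in zip(f1, f2)]
    phat = [c / (abs(c) + 1e-12) for c in cross]   # keep phase, discard magnitude
    corr = [v.real for v in dft(phat, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: corr[lag % n])


fs = 32_000          # sampling frequency in Hz
d = 0.1              # microphone spacing in metres (smallest spacing studied)
c = 343.0            # assumed speed of sound in m/s
max_lag = math.ceil(d * fs / c)   # largest physically possible delay in samples

rng = random.Random(0)
n = 128
true_delay = 5
mic1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
mic2 = [mic1[(t - true_delay) % n] for t in range(n)]  # circularly delayed copy

delay = gcc_phat_delay(mic1, mic2, max_lag)
print(max_lag, delay, delay / fs)
```

In a TDOA-based counter, per-frame delay estimates like this one would be clustered, with each cluster taken as one speaker. A real implementation would zero-pad the frames to avoid the wrap-around assumed here.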
