Speaker counting in strong reverberant environments based on K-medoids clustering coherence features

doi:10.13232/j.cnki.jnju.2021.05.019

Journal of Nanjing University (Natural Sciences), 2021, Vol. 57, Issue 5: 875–880.


Lifu Wu¹,², Guangshen Ji¹, Qiucen Hu¹

1. School of Electronic & Information Engineering, Nanjing University of Information Science & Technology, Nanjing, 210044, China
2. Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing, 210044, China

Received: 2021-01-27 Online: 2021-09-29 Published: 2021-09-29
Corresponding author: Lifu Wu, E-mail: wulifu@nuist.edu.cn
Funding: National Natural Science Foundation of China (12074192)

Abstract

The number of speakers is key information for speech processing applications in strongly reverberant environments. This paper counts speakers using the frequency-domain magnitude squared coherence (MSC) between the speech of different speakers as the feature. Short-time frequency-domain MSC features are first extracted from the speech and then clustered with the K-medoids algorithm to obtain the number of speakers. The method requires no prior knowledge of the microphone spacing or of the relative distance between the speakers and the microphones. Experiments under different reverberation conditions, signal-to-noise ratios, and microphone spacings show that the frequency-domain MSC features are coherent with the speakers. Compared with the time difference of arrival (TDOA) method based on the generalized cross-correlation phase transform (GCC-PHAT), the proposed method achieves higher speaker counting accuracy, lower sensitivity to microphone spacing, and better robustness.

Key words: speaker counting, magnitude squared coherence (MSC), K-medoids clustering, generalized cross-correlation phase transform (GCC-PHAT), time difference of arrival (TDOA)
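The clustering step named in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not say how the number of clusters is chosen, so the silhouette criterion used here, the deterministic maximin seeding, the synthetic 2-D features, and every function name (`k_medoids`, `estimate_count`, `make_toy_features`) are assumptions made for illustration only.

```python
import math
import random


def pairwise_distances(points):
    """Full Euclidean distance matrix (fine for a few hundred feature vectors)."""
    return [[math.dist(p, q) for q in points] for p in points]


def init_medoids(dist, k):
    """Assumed seeding: most central point first, then repeatedly the point
    farthest from all medoids chosen so far (deterministic maximin)."""
    n = len(dist)
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))
    return medoids


def assign(dist, medoids):
    """Label every point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternate between assignment and medoid update until the medoids are stable."""
    medoids = init_medoids(dist, k)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members))
                           if members else medoids[c])
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


def silhouette(dist, labels, k):
    """Mean silhouette width; higher means tighter, better separated clusters."""
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) < 2:
            continue  # singleton clusters contribute 0 by convention
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min((sum(dist[i][j] for j in other) / len(other)
                 for c, other in enumerate(clusters) if c != lab and other),
                default=0.0)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


def estimate_count(features, k_max=6):
    """Return the k in 2..k_max with the best silhouette (k = 1 is not scored)."""
    dist = pairwise_distances(features)
    scores = {}
    for k in range(2, min(k_max, len(features) - 1) + 1):
        _, labels = k_medoids(dist, k)
        scores[k] = silhouette(dist, labels, k)
    return max(scores, key=scores.get)


def make_toy_features(n_groups, per_group=20, seed=0):
    """Synthetic 2-D stand-ins for MSC features: one tight blob per speaker."""
    rng = random.Random(seed)
    return [(10.0 * g + rng.gauss(0, 0.5), rng.gauss(0, 0.5))
            for g in range(n_groups) for _ in range(per_group)]
```

Calling `estimate_count(make_toy_features(3))` clusters three well-separated synthetic groups and returns the best-scoring k. Medoids are actual data points, which is what makes K-medoids less sensitive to outlying features than K-means. The silhouette cannot score k = 1, so a single-speaker case would need a separate test in any real system.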

CLC number: TP391

Lifu Wu, Guangshen Ji, Qiucen Hu. Speaker counting in strong reverberant environments based on K-medoids clustering coherence features[J]. Journal of Nanjing University (Natural Sciences), 2021, 57(5): 875–880.

Figures/Tables (6): Figure 1, Figure 2, Table 1, Figure 3, Figure 4, Figure 5



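Parameter	Value
Room dimensions	7 m × 4 m × 3 m
Sampling frequency (f_s)	32 kHz
Frame length	512 samples
Frame shift	216 samples
Segment duration (L/f_s)	1 s
Reverberation time (RT₆₀)	0.2–0.8 s
Signal-to-noise ratio (SNR)	0–40 dB
Microphone spacing (d)	0.1–1 m
Total number of trials (T_t)	100
Maximum number of meeting participants (M_max)	6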
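To show how the framing parameters in the table fit together, the sketch below computes an MSC value per frequency bin for one 1 s segment. It rests on assumptions that this page does not confirm: two microphone signals `x1` and `x2`, a Hann window, and Welch-style averaging of the spectra over all frames in the segment. The function names `frame_signal` and `segment_msc` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

FS = 32_000        # sampling frequency in Hz (from the table)
FRAME_LEN = 512    # samples per frame (from the table)
FRAME_SHIFT = 216  # samples between frame starts (from the table)
SEGMENT_LEN = FS   # a 1 s segment, i.e. L = f_s samples


def frame_signal(x, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Slice a 1-D signal into overlapping frames (one per row), dropping the tail."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx]


def segment_msc(x1, x2, eps=1e-12):
    """MSC per frequency bin for one segment of a two-microphone recording.

    Assumed definition: |S12|^2 / (S11 * S22), with the auto- and
    cross-spectra averaged over the frames of the segment.
    """
    window = np.hanning(FRAME_LEN)
    X1 = np.fft.rfft(frame_signal(x1) * window, axis=1)
    X2 = np.fft.rfft(frame_signal(x2) * window, axis=1)
    s12 = np.mean(X1 * np.conj(X2), axis=0)   # averaged cross-spectrum
    s11 = np.mean(np.abs(X1) ** 2, axis=0)    # averaged auto-spectrum, mic 1
    s22 = np.mean(np.abs(X2) ** 2, axis=0)    # averaged auto-spectrum, mic 2
    return np.abs(s12) ** 2 / (s11 * s22 + eps)
```

With these settings a 1 s segment gives 1 + (32000 − 512) // 216 = 146 frames and 512 / 2 + 1 = 257 frequency bins spaced 62.5 Hz apart, so each segment would contribute one 257-dimensional feature vector under this assumed definition. Identical inputs give an MSC close to 1 in every bin, while independent noise gives values near 0.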
