Journal of Nanjing University (Natural Science), 2021, Vol. 57, Issue 5: 793-800. doi: 10.13232/j.cnki.jnju.2021.05.009
Qiang Zhang1, Jibin Yang2, Xiongwei Zhang2, Tieyong Cao2, Pengcheng Mei1
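Abstract:
Audio adversarial examples can be used to improve the reliability of audio object classification systems. However, existing audio adversarial examples have low perceptual quality, and their generation quality is unsatisfactory. To improve the quality of audio adversarial examples, a generative adversarial network (GAN) is used for the first time to generate adversarial examples for audio object classification. A general GAN framework for generating adversarial examples against audio object classification is proposed, in which the classification model under attack is incorporated into the GAN. On this basis, a GAN-based Segmented-perturbation Overall-attack (SOGAN) method is proposed. Through adversarial training, SOGAN learns effective perturbations on short-time audio segments, generates the overall perturbation according to the correspondence between the segments and the original audio, and forms adversarial examples of variable duration. This method narrows the search space of audio adversarial examples and reduces the complexity of generating them. Experiments on audio object classification datasets such as UrbanSound8k and ESC50 show that, compared with existing methods for designing audio adversarial examples, the adversarial examples generated by the proposed method are less perceptible and achieve a high attack success rate and attack efficiency against typical audio object classification systems.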
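The segment-then-assemble idea described in the abstract can be illustrated with a minimal sketch. It assumes fixed-length, non-overlapping segments and zero-padding of the final segment; these choices, the function names, and `toy_perturbation` (a hypothetical stand-in for a trained GAN generator) are assumptions for illustration only, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Segment = List[float]


def split_into_segments(audio: Sequence[float], seg_len: int) -> List[Segment]:
    """Cut audio into consecutive seg_len-sample segments, zero-padding the last."""
    segments = []
    for start in range(0, len(audio), seg_len):
        seg = list(audio[start:start + seg_len])
        seg.extend([0.0] * (seg_len - len(seg)))  # pad only the final short segment
        segments.append(seg)
    return segments


def toy_perturbation(segment: Segment, eps: float = 0.01) -> Segment:
    """Hypothetical placeholder for a trained generator.

    Returns a per-sample perturbation bounded by eps in absolute value.
    """
    return [eps * math.tanh(10.0 * x) for x in segment]


def segmented_overall_attack(
    audio: Sequence[float],
    seg_len: int,
    perturb: Callable[[Segment], Segment],
    clip: float = 1.0,
) -> List[float]:
    """Perturb each segment, then assemble an overall perturbation.

    The per-segment perturbations are concatenated in their original order,
    truncated to the input length, and added to the audio, so the result has
    the same (variable) duration as the input.
    """
    overall: List[float] = []
    for seg in split_into_segments(audio, seg_len):
        overall.extend(perturb(seg))
    overall = overall[:len(audio)]  # drop the part that covered the padding
    return [max(-clip, min(clip, x + d)) for x, d in zip(audio, overall)]


# Usage: a 2500-sample tone at 16 kHz, processed in 1000-sample segments.
audio = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(2500)]
adv = segmented_overall_attack(audio, seg_len=1000, perturb=toy_perturbation)
print(len(adv), max(abs(a - x) for a, x in zip(adv, audio)))
```

Because the perturbation function only ever sees inputs of `seg_len` samples, its input size is independent of the clip duration, which is one way to read the abstract's claim that segmenting narrows the search space.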