Journal of Nanjing University (Natural Science) ›› 2021, Vol. 57 ›› Issue (5): 793–800. doi: 10.13232/j.cnki.jnju.2021.05.009


Generating adversarial examples in audio object classification using a generative adversarial network

Qiang Zhang1, Jibin Yang2 (corresponding author), Xiongwei Zhang2, Tieyong Cao2, Pengcheng Mei1

  1. Graduate School, Army Engineering University, Nanjing 210007, China
    2. School of Command and Control Engineering, Army Engineering University, Nanjing 210007, China
  • Received: 2021-06-23; Online: 2021-09-29; Published: 2021-09-29
  • Corresponding author: Jibin Yang, E-mail: yjbice@sina.com
  • Funding:
    National Natural Science Foundation of China (62071484)

Generating adversarial examples in audio object classification using generative adversarial network

Qiang Zhang1, Jibin Yang2(), Xiongwei Zhang2, Tieyong Cao2, Pengcheng Mei1   

  1. Graduate School, Army Engineering University, Nanjing, 210007, China
    2. School of Command and Control Engineering, Army Engineering University, Nanjing, 210007, China
  • Received:2021-06-23 Online:2021-09-29 Published:2021-09-29
  • Contact: Jibin Yang E-mail:yjbice@sina.com

Abstract:

Audio adversarial examples can be used to improve the reliability of audio object classification systems. However, current audio adversarial examples suffer from low perceptual quality, and their generation quality is unsatisfactory. To improve the quality of audio adversarial examples, this work is the first to employ a Generative Adversarial Network (GAN) to generate adversarial examples for audio object classification. A general GAN framework for generating audio object classification adversarial examples is proposed, in which the classification model under attack is incorporated into the GAN. On this basis, a GAN-based Segmented-perturbation Overall-attack (SOGAN) method is proposed. Through adversarial training, SOGAN learns effective perturbations on short-time audio segments, assembles them into an overall perturbation according to their correspondence with the original audio, and thereby forms adversarial examples of variable duration. This method narrows the search space of audio adversarial examples and reduces the complexity of adversarial example generation. Experiments on audio object classification datasets such as UrbanSound8k and ESC50 show that, compared with existing audio adversarial example design methods, the adversarial examples generated by the proposed method are less perceptible and achieve higher attack success rates and higher attack efficiency against typical audio object classification systems.

Key words: audio signal processing, adversarial example, audio object classification, generative adversarial network

Abstract:

Audio adversarial examples can be applied to improve the robustness of audio object classification systems. However, the quality of the adversarial examples produced by current systems is still not satisfactory. To improve the performance of adversarial examples, a Generative Adversarial Network (GAN) is adopted, for the first time, to generate adversarial examples for audio object classification. A general GAN framework that integrates the attacked audio classification model is proposed to optimize the attack effect. Based on this framework, a GAN-based Segmented-perturbation Overall-attack (SOGAN) method is proposed to narrow the search space of audio adversarial example generation. SOGAN learns effective perturbations on short audio segments through adversarial training, and then synthesizes an overall perturbation to generate adversarial examples of variable length. This not only reduces the complexity of audio adversarial example generation, but also improves the generality and performance of GAN-based audio adversarial example generation. Experiments are carried out on typical audio object classification datasets such as UrbanSound8k and ESC50. The results show that, compared with existing audio adversarial example design methods, the proposed method generates adversarial examples with a higher attack success rate, lower perceptibility, and higher attack efficiency against state-of-the-art audio classification systems.
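The segmented-perturbation / overall-attack idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the real generator G is a trained convolutional encoder-decoder, whereas `generate_perturbation` below is a hypothetical stand-in, and the segment length is an assumed value.

```python
import numpy as np

SEG_LEN = 16000  # assumed fixed segment length in samples (illustrative)

def generate_perturbation(segment: np.ndarray) -> np.ndarray:
    """Stand-in for the trained generator G: maps a fixed-length
    audio segment to a small additive perturbation."""
    rng = np.random.default_rng(0)
    return 0.001 * rng.standard_normal(segment.shape)

def sogan_attack(x: np.ndarray) -> np.ndarray:
    """Build a variable-length adversarial example by perturbing the
    input segment by segment and reassembling the pieces in order."""
    out = np.empty_like(x)
    for start in range(0, len(x), SEG_LEN):
        seg = x[start:start + SEG_LEN]
        pad = SEG_LEN - len(seg)
        seg_p = np.pad(seg, (0, pad))         # pad a trailing short segment
        delta = generate_perturbation(seg_p)  # per-segment perturbation
        out[start:start + len(seg)] = seg + delta[:len(seg)]
    return out

x = np.zeros(40000)            # 2.5 segments of "audio"
x_adv = sogan_attack(x)
assert x_adv.shape == x.shape  # the adversarial example keeps the original length
```

Because G only ever sees fixed-length segments, the same trained generator can attack inputs of any duration, which is the source of the reduced search space described above.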

Key words: audio signal processing, adversarial example, audio object classification, Generative Adversarial Network (GAN)

CLC number:

  • TP391

Fig. 1  General GAN framework for generating adversarial examples for audio object classification

Fig. 2  SOGAN: the GAN-based segmented-perturbation / overall-attack method

Table 1  Details of the datasets

Dataset       Classes  Samples  Training samples  Test samples  Duration (s)  Channels
ESC50         50       2000     1800              200           5             1
UrbanSound8k  10       8732     7858              874           ≤4            2

Table 2  Network structures of G and D

G  encoder     conv2d (16,9,1,0), BN, LRelu
               conv2d (32,10,2,0), BN, LRelu
               conv2d (64,10,2,0), BN, LRelu
               conv2d (128,10,2,0), BN, LRelu
   bottleneck  Resnet block × 4
   decoder     deconv2d (128,10,2,0), BN, LRelu
               deconv2d (64,10,2,0), BN, LRelu
               deconv2d (32,10,2,0), BN, LRelu
               deconv2d (16,9,1,0), BN, LRelu
D  MFE         conv1d (40,8,1,4), BN, LRelu
               conv1d (40,8,1,4), BN, LRelu
               maxpool (160,160)
   EL          conv2d (50,(8,13),1,0), BN, LRelu
               maxpool (3,3)
               conv2d (50,(1,5),1,0), BN, LRelu
               maxpool ((1,3),(1,3))
               conv2d (1,(2,5),(3,3),0), LRelu
   classifier  linear (16,1)
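Reading each conv2d row above as (output channels, kernel, stride, padding), the spatial size at each encoder stage follows the standard convolution output formula. A small sketch of that shape arithmetic (the input length of 400 is a hypothetical value, not taken from the paper):

```python
def conv_out(n: int, k: int, s: int, p: int) -> int:
    """Standard convolution output length: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# Encoder stack from Table 2, read as (channels, kernel, stride, padding).
encoder = [(16, 9, 1, 0), (32, 10, 2, 0), (64, 10, 2, 0), (128, 10, 2, 0)]

n = 400  # hypothetical input extent along one spatial axis
sizes = [n]
for _, k, s, p in encoder:
    n = conv_out(n, k, s, p)
    sizes.append(n)

print(sizes)  # [400, 392, 192, 92, 42]
```

The three stride-2 layers roughly halve the extent each time, and the mirrored deconv2d stack in the decoder reverses this so that the generated perturbation matches the input segment's size.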

Fig. 3  Non-targeted attack performance of SOGAN with different segment lengths on the ESC50 dataset

Table 3  Targeted attack performance of different methods on UrbanSound8k
(target classes, left to right: Air Conditioner, Car Horn, Children Playing, Dog Bark, Drilling, Engine Idling, Gun Shot, Jackhammer, Siren, Street Music)

Iterative[18]  ASR (train)   0.939   0.928   0.921   0.925   0.882   0.923   0.902   0.929   0.901   0.906
               ASR (test)    0.797   0.835   0.697   0.767   0.764   0.763   0.804   0.744   0.747   0.754
               MSNR (test)   22.852  22.932  22.371  23.183  22.702  23.387  20.868  23.355  23.049  22.637
Penalty[18]    ASR (train)   0.926   0.935   0.916   0.928   0.923   0.904   0.929   0.904   0.936   0.919
               ASR (test)    0.873   0.902   0.860   0.873   0.867   0.888   0.895   0.879   0.866   0.863
               MSNR (test)   22.143  21.208  22.241  21.601  22.251  21.917  20.798  22.367  21.971  21.818
SOGAN          ASR (train)   0.951   0.956   0.946   0.964   0.947   0.959   0.948   0.945   0.954   0.934
               ASR (test)    0.949   0.944   0.935   0.953   0.934   0.945   0.932   0.934   0.938   0.928
               MSNR (test)   28.387  28.156  28.503  28.617  28.737  28.637  28.756  28.556  27.387  27.365
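The MSNR values in Table 3 are signal-to-noise-type measures in dB, where higher means a less perceptible perturbation. Assuming the usual SNR definition between the clean signal and the additive perturbation (the exact MSNR formula is not reproduced on this page), a toy computation looks like:

```python
import math

def snr_db(signal, perturbation):
    """SNR in dB between a signal and an additive perturbation."""
    ps = sum(v * v for v in signal)        # signal power (unnormalized)
    pn = sum(v * v for v in perturbation)  # perturbation power
    return 10.0 * math.log10(ps / pn)

x = [1.0, -1.0, 1.0, -1.0]          # toy "audio"
delta = [0.01, -0.01, 0.01, -0.01]  # toy perturbation, 100x smaller in amplitude
print(round(snr_db(x, delta), 1))   # 40.0
```

On this reading, SOGAN's MSNR of roughly 28 dB versus roughly 21-23 dB for the baselines corresponds to a noticeably smaller perturbation relative to the signal.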

Table 4  Non-targeted attack performance of different methods on UrbanSound8k

               Training set                Test set
Method         ACC     ASR     MSNR       ACC     ASR     MSNR     Time (s)
Iterative[18]  N/A     91.0%   N/A        N/A     66.9%   24.960   N/A
Penalty[18]    N/A     90.0%   N/A        N/A     83.1%   18.727   N/A
SOGAN          99.5%   94.8%   28.531     95.7%   93.6%   28.310   0.01
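Here ACC is classification accuracy, ASR the attack success rate, and Time the per-example generation time. One common way to compute non-targeted ASR, shown here only as an illustration since the paper's exact counting convention is not given on this page, is the fraction of originally correct predictions that the attack flips:

```python
def attack_success_rate(clean_preds, adv_preds, labels):
    """Non-targeted ASR: fraction of examples the model classified
    correctly before the attack but incorrectly after it."""
    hits = total = 0
    for c, a, y in zip(clean_preds, adv_preds, labels):
        if c == y:        # only count examples the model got right
            total += 1
            if a != y:    # attack succeeded: the prediction changed
                hits += 1
    return hits / total

print(attack_success_rate([0, 1, 2, 3], [5, 1, 4, 9], [0, 1, 2, 3]))  # 0.75
```

A targeted ASR would instead count how often the adversarial prediction equals a chosen target class, as in Table 3.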

Fig. 4  Comparison of the original sample x and its corresponding adversarial example x_adv

1 Khamparia A, Gupta D, Nguyen N G, et al. Sound classification using convolutional neural network and tensor deep stacking network. IEEE Access, 2019, 7: 7717-7727.
2 Hannun A, Case C, Casper J, et al. Deep speech: Scaling up end-to-end speech recognition. 2014, arXiv preprint.
3 Hoshen Y, Weiss R J, Wilson K W. Speech acoustic modeling from raw multichannel waveforms∥2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, Australia: IEEE, 2015: 4624-4628.
4 Van Den Oord A, Dieleman S, Zen H, et al. WaveNet: A generative model for raw audio. 2016, arXiv preprint.
5 Sainath T N, Weiss R J, Senior A, et al. Learning the speech front-end with raw waveform CLDNNs∥The 16th Annual Conference of the International Speech Communication Association. Dresden, Germany: IEEE, 2015: 1-5.
6 Szegedy C, Zaremba W, Sutskever I, et al. Intriguing properties of neural networks. 2013, arXiv:1312.6199v4.
7 Goodfellow I J, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. 2014, arXiv:1412.6572v2.
8 Kurakin A, Goodfellow I J, Bengio S. Adversarial examples in the physical world. 2016, arXiv:1607.02533v1.
9 Carlini N, Wagner D. Towards evaluating the robustness of neural networks∥2017 IEEE Symposium on Security and Privacy. San Jose, CA, USA: IEEE, 2017: 39-57.
10 Papernot N, McDaniel P, Jha S, et al. The limitations of deep learning in adversarial settings∥2016 IEEE European Symposium on Security and Privacy. Saarbruecken, Germany: IEEE, 2016: 372-387.
11 Moosavi-Dezfooli S M, Fawzi A, Frossard P. DeepFool: A simple and accurate method to fool deep neural networks∥2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA: IEEE, 2016: 2574-2582.
12 Hu W W, Tan Y. Generating adversarial malware examples for black-box attacks based on GAN. 2017, arXiv preprint.
13 Moosavi-Dezfooli S M, Fawzi A, Fawzi O, et al. Universal adversarial perturbations∥2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 86-94.
14 Xiao C W, Li B, Zhu J Y, et al. Generating adversarial examples with adversarial networks∥Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm, Sweden: AAAI Press, 2018: 3905-3911.
15 Carlini N, Wagner D. Audio adversarial examples: Targeted attacks on speech-to-text∥2018 IEEE Security and Privacy Workshops. San Francisco, CA, USA: IEEE, 2018: 1-7.
16 Alzantot M, Balaji B, Srivastava M. Did you hear that? Adversarial examples against automatic speech recognition. 2018, arXiv preprint.
17 Du T Y, Ji S L, Li J F, et al. SirenAttack: Generating adversarial audio for end-to-end acoustic systems∥Proceedings of the 15th ACM Asia Conference on Computer and Communications Security. New York, NY, USA: ACM, 2020: 357-369.
18 Abdoli S, Hafemann L G, Rony J, et al. Universal adversarial audio perturbations. 2019, arXiv:1908.03173.
19 Wang D H, Dong L, Wang R D, et al. Targeted speech adversarial example generation with generative adversarial network. IEEE Access, 2020, 8: 124503-124513.
20 Goodfellow I J, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks. 2014, arXiv:1406.2661.
21 Salamon J, Jacoby C, Bello J P. A dataset and taxonomy for urban sound research∥Proceedings of the 22nd ACM International Conference on Multimedia. New York, NY, USA: ACM, 2014: 1041-1044.
22 Piczak K J. ESC: Dataset for environmental sound classification∥Proceedings of the 23rd ACM International Conference on Multimedia. New York, NY, USA: ACM, 2015: 1015-1018.
23 Johnson J, Alahi A, Li F F. Perceptual losses for real-time style transfer and super-resolution∥European Conference on Computer Vision. Berlin, Heidelberg: Springer, 2016: 694-711.
24 Tokozume Y, Harada T. Learning environmental sounds with end-to-end convolutional neural network∥2017 IEEE International Conference on Acoustics, Speech and Signal Processing. New Orleans, LA, USA: IEEE, 2017: 2721-2725.
25 Paszke A, Gross S, Massa F, et al. PyTorch: An imperative style, high-performance deep learning library. 2019, arXiv preprint.
26 Tokozume Y, Ushiku Y, Harada T. Learning from between-class examples for deep sound recognition. 2017, arXiv preprint.