Audio adversarial examples, imperceptible to humans, have been constructed to attack automatic speech recognition (ASR) systems. However, the adversarial examples generated by existing approaches usually contain noticeable noise, especially during periods of silence and pauses. Moreover, the added noise often breaks the temporal dependency property of the original audio, so the examples can be easily detected by state-of-the-art defense mechanisms. In this paper, we propose a new Iterative Proportional Clipping (IPC) algorithm that preserves the temporal dependency in audio to generate more robust adversarial examples. We are motivated by the observation that temporal dependency in audio has a significant effect on human perception. Following this observation, we use a proportional clipping strategy to reduce noise during low-intensity periods. Experimental results and a user study both suggest that the generated adversarial examples significantly reduce human-perceptible noise and resist defenses based on temporal structure.
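The abstract does not spell out the clipping rule, so the following is only a minimal sketch of one plausible reading: each perturbation sample is bounded by a fixed fraction of the original sample's magnitude, so silent or quiet stretches receive little or no added noise. The names `proportional_clip`, `ipc_step`, `ratio`, and `step`, and the signed-gradient update, are assumptions made for illustration and are not taken from the paper. The gradient is supplied by the caller; no ASR model is involved here.

```python
from typing import List


def proportional_clip(audio: List[float], delta: List[float], ratio: float) -> List[float]:
    """Bound each perturbation sample by ratio * |original sample|.

    Assumption: 'proportional' means the allowed noise scales with the
    local amplitude of the original audio.
    """
    clipped = []
    for x, d in zip(audio, delta):
        bound = ratio * abs(x)  # zero during exact silence
        clipped.append(max(-bound, min(bound, d)))
    return clipped


def sign(value: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of value."""
    if value > 0:
        return 1.0
    if value < 0:
        return -1.0
    return 0.0


def ipc_step(audio: List[float], delta: List[float], grad: List[float],
             step: float, ratio: float) -> List[float]:
    """One hypothetical iteration: signed-gradient descent on the
    perturbation, followed by proportional clipping.

    grad is the gradient of an attack loss with respect to delta,
    computed elsewhere (assumed, not shown).
    """
    updated = [d - step * sign(g) for d, g in zip(delta, grad)]
    return proportional_clip(audio, updated, ratio)
```

Under these assumptions, calling `ipc_step` repeatedly with fresh gradients keeps the perturbation's envelope tied to the envelope of the original waveform, which is the intuition the abstract gives for reducing noise in low-intensity periods.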
Zhang, H., Yan, Q., Zhou, P., & Liu, X. Y. (2020). Generating robust audio adversarial examples with temporal dependency. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 3167–3173). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/438