Analyzing and visualizing deep neural networks for speech recognition with saliency-adjusted neuron activation profiles

Abstract

Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs are a flexible framework to analyze and visualize Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to utilize SNAPs for understanding fully-convolutional ASR models. This includes visualizing acoustic concepts learned by the model and the comparative analysis of their representations in the model layers.

Cite

Krug, A., Ebrahimzadeh, M., Alemann, J., Johannsmeier, J., & Stober, S. (2021). Analyzing and visualizing deep neural networks for speech recognition with saliency-adjusted neuron activation profiles. Electronics (Switzerland), 10(11). https://doi.org/10.3390/electronics10111350
