Abstract
Deep Learning-based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, several introspection methods have been proposed. However, established introspection techniques are mostly designed for computer vision tasks and rely on the data being visually interpretable, which limits their usefulness for understanding speech recognition models. To overcome this limitation, we developed a novel neuroscience-inspired technique for visualizing and understanding ANNs, called Saliency-Adjusted Neuron Activation Profiles (SNAPs). SNAPs provide a flexible framework for analyzing and visualizing Deep Neural Networks that does not depend on visually interpretable data. In this work, we demonstrate how to use SNAPs to understand fully-convolutional ASR models. This includes visualizing the acoustic concepts learned by the model and comparing their representations across the model's layers.
Cite
Krug, A., Ebrahimzadeh, M., Alemann, J., Johannsmeier, J., & Stober, S. (2021). Analyzing and visualizing deep neural networks for speech recognition with saliency-adjusted neuron activation profiles. Electronics (Switzerland), 10(11). https://doi.org/10.3390/electronics10111350