The astonishing and cryptic effectiveness of Deep Neural Networks comes with a critical vulnerability to adversarial inputs: samples maliciously crafted to confuse and hinder machine learning models. Insights into the internal representations learned by deep models can help explain their decisions and estimate their confidence, which in turn lets us trace, characterise, and filter out adversarial attacks.
Carrara, F., Falchi, F., Amato, G., Becarelli, R., & Caldelli, R. (2019). Detecting Adversarial Inputs by Looking in the Black Box. ERCIM News, (116). Retrieved from http://www.nmis.isti.cnr.it/falchi/Pub/2019-ERCIMNews116_Adv.pdf