Trojaning Attack on Neural Networks

Yingqi Liu; Shiqing Ma; Yousra Aafer; Wen Chuan Lee; Juan Zhai; Weihang Wang; Xiangyu Zhang

Conference Proceedings, Open Access

25th Annual Network and Distributed System Security Symposium, NDSS 2018 (2018)

DOI: 10.14722/ndss.2018.23291

723 Citations

400 Readers

Abstract

With the fast spread of machine learning techniques, sharing and adopting public machine learning models has become very popular. This gives attackers many new opportunities. In this paper, we propose a trojaning attack on neural networks. Because the models are not intuitive for humans to understand, the attack is stealthy. Deploying trojaned models can have severe consequences, including endangering human lives (in applications like autonomous driving). We first invert the neural network to generate a general trojan trigger, and then retrain the model with reverse-engineered training data to inject malicious behaviors into the model. The malicious behaviors are activated only by inputs stamped with the trojan trigger. Our attack does not need to tamper with the original training process, which usually takes weeks to months; instead, applying the attack takes minutes to hours. Nor do we require the datasets that were used to train the model, which in practice are usually not shared due to privacy or copyright concerns. We use five different applications to demonstrate the power of our attack, and perform a deep analysis of the factors that may affect it. The results show that our attack is highly effective and efficient. The trojaned behaviors can be triggered successfully (with nearly 100% probability) without affecting the model's test accuracy on normal inputs, and even with better accuracy on public datasets. Also, it takes only a small amount of time to attack a complex neural network model. Finally, we discuss possible defenses against such attacks.
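The abstract does not spell out how the trigger is generated, so the following is only a toy sketch of the general idea of inverting a network: adjusting the masked part of an input by gradient ascent so that one chosen neuron fires strongly. The single sigmoid neuron, the function names, the weights, the mask, and the hyperparameters are all assumptions made for illustration and are not taken from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_activation(weights, bias, x):
    """Activation of one hypothetical neuron: sigmoid(w . x + b)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def generate_trigger(weights, bias, x, mask, target=0.99, lr=0.5, steps=500):
    """Adjust only the masked entries of x so the neuron moves toward `target`.

    Each step follows the negative gradient of 0.5 * (target - a)^2 with
    respect to the masked inputs, then clips values to the range [0, 1].
    """
    x = list(x)  # work on a copy; unmasked entries are never modified
    for _ in range(steps):
        a = neuron_activation(weights, bias, x)
        if a >= target:
            break
        # Chain rule: d(a)/d(x_i) = a * (1 - a) * w_i for a sigmoid neuron.
        grad_scale = (target - a) * a * (1.0 - a)
        for i, m in enumerate(mask):
            if m:
                x[i] = min(1.0, max(0.0, x[i] + lr * grad_scale * weights[i]))
    return x

# Hypothetical toy values: a 4-"pixel" input where the last two pixels
# form the region reserved for the trigger.
weights, bias = [0.5, -1.0, 2.0, 3.0], -2.0
benign = [0.2, 0.2, 0.2, 0.2]
mask = [0, 0, 1, 1]
stamped = generate_trigger(weights, bias, benign, mask)
```

In this sketch the unmasked entries stay identical to the benign input, which loosely mirrors the abstract's point that the malicious behavior is tied to a stamped trigger rather than to the rest of the input. The retraining step described in the abstract is not shown.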

Cite

Liu, Y., Ma, S., Aafer, Y., Lee, W. C., Zhai, J., Wang, W., & Zhang, X. (2018). Trojaning Attack on Neural Networks. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018. The Internet Society. https://doi.org/10.14722/ndss.2018.23291
