Improving Deep Learning Based Password Guessing Models Using Pre-processing

Abstract

Passwords are the most widely used authentication method and play an important role in users’ digital lives. Password guessing models are generally used to understand password security, yet statistics-based password models (such as the Markov model and probabilistic context-free grammars (PCFG)) are subject to the inherent limitations of overfitting and sparsity. As computing power grows, deep learning based models with higher crack rates are emerging. Since neural networks are generally used as black boxes for learning password features, a key challenge for deep learning based password guessing models is choosing appropriate preprocessing methods to learn more effective features. To fill this gap, this paper explores three new preprocessing methods and applies them to two promising deep learning networks, namely Long Short-Term Memory (LSTM) neural networks and Generative Adversarial Networks (GANs). First, we propose a character-feature based encoding method to replace the canonical one-hot encoding. Second, we add the most comprehensive recognition rules to date (covering words, keyboard patterns, years, and website names) to the basic PCFG, and find that the frequency distribution of the extracted segments follows Zipf’s law. Third, we adopt Xu et al.’s PCFG improvement with chunk segmentation at CCS’21, and study the performance of this Chunk+PCFG preprocessing method when applied to LSTM and GAN. Extensive experiments on six large real-world password datasets show the effectiveness of our preprocessing methods. Within 50 million guesses: 1) when we apply the PCFG preprocessing method to PassGAN (a GAN-based password model proposed by Hitaj et al. at ACNS’19), 13.83%–38.81% (26.79% on average) more passwords can be cracked; 2) our LSTM-based model using PCFG for preprocessing (PL for short) outperforms Wang et al.’s original PL model by 0.35%–3.94% (1.36% on average). Overall, our preprocessing methods improve the attack rates in four out of seven tested cases. We believe this work provides new feasible directions for guessing optimization and contributes to a better understanding of deep learning based models.
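
The abstract does not spell out the paper’s character-feature encoding, but the general idea of trading a sparse one-hot vector for a small dense vector of hand-crafted character features can be sketched as follows. Every concrete choice below (character class, case, keyboard row, within-class index) is an illustrative assumption, not the authors’ actual feature set.

```python
import string

# Rows of a QWERTY keyboard; used as one illustrative feature.
KEYBOARD_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def char_features(ch: str) -> list[float]:
    """Encode one character as a small dense feature vector instead of a
    |vocabulary|-sized one-hot vector."""
    is_lower = float(ch.islower())
    is_upper = float(ch.isupper())
    is_digit = float(ch.isdigit())
    is_symbol = float(not ch.isalnum())
    # Keyboard-row feature: 0 for non-letters, else (row index + 1) / 3.
    row = next((i for i, r in enumerate(KEYBOARD_ROWS) if ch.lower() in r), -1)
    row_feat = (row + 1) / len(KEYBOARD_ROWS)
    # Within-class index, scaled to [0, 1].
    if ch.isalpha():
        idx = (ord(ch.lower()) - ord("a")) / 25.0
    elif ch.isdigit():
        idx = int(ch) / 9.0
    elif ch in string.punctuation:
        idx = string.punctuation.index(ch) / (len(string.punctuation) - 1)
    else:
        idx = 0.0
    return [is_lower, is_upper, is_digit, is_symbol, row_feat, idx]

def encode_password(pw: str) -> list[list[float]]:
    """6 features per character instead of ~95 one-hot dimensions."""
    return [char_features(c) for c in pw]

print(encode_password("P@ss1"))
```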
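Likewise, the basic PCFG preprocessing the paper extends (Weir et al.-style segmentation into letter/digit/symbol runs) and a quick Zipf’s-law check on segment frequencies can be sketched as below. The paper’s added recognition rules for words, keyboard patterns, years, and website names are not reproduced here, and the tiny password sample is illustrative only.

```python
import math
import re
from collections import Counter

# Maximal runs of letters (L), digits (D), or symbols (S).
SEGMENT_RE = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+")

def pcfg_segments(password: str):
    """Return (structure, segments), e.g. 'wang1987!' -> ('L4D4S1', ...)."""
    segments = SEGMENT_RE.findall(password)
    structure = "".join(
        ("L" if s[0].isalpha() else "D" if s[0].isdigit() else "S") + str(len(s))
        for s in segments
    )
    return structure, segments

# Toy corpus; real experiments would use millions of leaked passwords.
passwords = ["wang1987!", "wang123", "password123", "password!", "love123"]
counts = Counter(seg for pw in passwords for seg in pcfg_segments(pw)[1])

# Zipf's law predicts frequency ~ C / rank^s, i.e. an approximately straight
# line in log-log coordinates; regress log(freq) on log(rank) to estimate s.
ranked = counts.most_common()
xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
ys = [math.log(freq) for _, freq in ranked]
n = len(xs)
slope = ((n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys))
         / (n * sum(x * x for x in xs) - sum(xs) ** 2))
print(f"fitted Zipf exponent s ≈ {-slope:.2f}")
```

Under this kind of preprocessing, a model such as LSTM or PassGAN is trained over the extracted structures and segments rather than over raw character strings.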

Citation (APA)

Wu, Y., Wang, D., Zou, Y., & Huang, Z. (2022). Improving Deep Learning Based Password Guessing Models Using Pre-processing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13407 LNCS, pp. 163–183). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15777-6_10
