SMS Phishing Dataset for Machine Learning and Pattern Recognition

Sandhya Mishra; Devpriya Soni

Conference Proceedings

Lecture Notes in Networks and Systems (2023) 648 LNNS 597-604

DOI: 10.1007/978-3-031-27524-1_57

3 Citations

10 Readers

Abstract

Dataset reliability is essential for solving classification problems, because data is needed to train, test, and evaluate machine learning models. SMS phishing (smishing) is a fraudulent activity in which an attacker sends a malicious text message to a smartphone user, causing financial or personal loss to the victim. Detecting it is a binary classification problem in which messages are categorized as malicious (smishing) or legitimate (ham). Few research works have addressed the identification of smishing messages, and according to our literature survey, no smishing dataset is publicly available yet. Hence, we have compiled a smishing dataset from messages extracted from different internet sources. The dataset comprises 5971 text messages: 638 smishing messages, 489 spam messages, and 4844 ham messages. It can be used to extract smishing features and to classify text messages with machine learning algorithms. This paper also presents an experimental evaluation of the dataset for smishing message categorization using keyword classification. The dataset can serve as a baseline for future research on SMS phishing.
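To make the reported class distribution and the idea of keyword classification concrete, here is a minimal Python sketch. The class counts come from the abstract. Everything else is an assumption for illustration: the keyword list, the URL pattern, the `min_hits` threshold, and the function names `class_shares` and `keyword_flag` are hypothetical and are not taken from the paper, whose actual keyword method is not described here.

```python
import re

# Class counts as reported in the abstract (they sum to 5971).
CLASS_COUNTS = {"smishing": 638, "spam": 489, "ham": 4844}


def class_shares(counts):
    """Return each class's fraction of the total number of messages."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical keyword list, for illustration only (not the authors' list).
SMISHING_KEYWORDS = {"verify", "account", "password", "bank", "urgent", "click"}

# Rough pattern for links embedded in a message (an assumption).
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def keyword_flag(message, keywords=SMISHING_KEYWORDS, min_hits=2):
    """Label a message 'smishing' if it matches enough keywords, else 'ham'.

    A message is flagged when it contains at least `min_hits` distinct
    keywords, or at least one keyword together with a URL.
    """
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    hits = len(tokens & keywords)
    has_url = URL_PATTERN.search(message) is not None
    if hits >= min_hits or (has_url and hits >= 1):
        return "smishing"
    return "ham"


if __name__ == "__main__":
    for label, share in class_shares(CLASS_COUNTS).items():
        print(f"{label}: {share:.1%}")
    print(keyword_flag("Urgent: verify your bank account at http://example.test/login"))
    print(keyword_flag("See you at lunch tomorrow"))
```

The shares show how imbalanced the dataset is: ham makes up roughly four fifths of the messages, so accuracy alone would be a weak measure for any classifier evaluated on it. A rule like `keyword_flag` is only a baseline sketch; it also ignores the separate spam class.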

Cite

Mishra, S., & Soni, D. (2023). SMS Phishing Dataset for Machine Learning and Pattern Recognition. In Lecture Notes in Networks and Systems (Vol. 648 LNNS, pp. 597–604). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-27524-1_57
