SMS Phishing Dataset for Machine Learning and Pattern Recognition

Abstract

The reliability of a dataset is an essential factor in solving classification problems. Data is required for training, testing, classifying, and evaluating machine learning models. SMS phishing (smishing) is a binary classification problem in which messages are categorized as malicious (smishing) or legitimate (ham). Smishing is a fraudulent activity in which an attacker sends a malicious text message to a smartphone user, causing financial or personal loss to the victim. Few research works have addressed the identification of smishing messages, and according to our literature survey, no smishing dataset is publicly available yet. Hence, we have composed a smishing dataset from smishing messages collected from various internet sources. The dataset contains 5971 text messages: 638 smishing messages, 489 spam messages, and 4844 ham messages. It can be used to extract smishing features and to classify text messages with machine learning algorithms. This paper also presents an experimental evaluation of the dataset for smishing message categorization using keyword classification. The dataset can serve as a baseline for future research on SMS phishing.
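The abstract mentions keyword classification without giving its details. The following is a minimal sketch of how a keyword-based categorizer over the dataset's three labels (smishing, spam, ham) might look. It is an illustration under stated assumptions, not the authors' method: the keyword sets, the URL rule, the `threshold` parameter, and the helper names `classify` and `class_shares` are all hypothetical. `class_shares` only turns the reported class counts into percentages.

```python
import re

# HYPOTHETICAL keyword lists for illustration only; they are NOT taken from the paper.
SMISHING_KEYWORDS = {"verify", "account", "bank", "password", "login", "click", "suspended"}
SPAM_KEYWORDS = {"free", "win", "prize", "offer", "discount", "subscribe"}

# Assumption for this sketch: an embedded link counts as one extra smishing signal.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def tokenize(message: str) -> list[str]:
    """Lowercase the message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", message.lower())


def classify(message: str, threshold: int = 2) -> str:
    """Label a message 'smishing', 'spam', or 'ham' by counting keyword hits.

    Smishing is checked first, so a message that reaches the threshold for
    both keyword sets is labelled 'smishing'.
    """
    tokens = set(tokenize(message))
    smishing_hits = len(tokens & SMISHING_KEYWORDS)
    if URL_PATTERN.search(message):
        smishing_hits += 1
    spam_hits = len(tokens & SPAM_KEYWORDS)
    if smishing_hits >= threshold:
        return "smishing"
    if spam_hits >= threshold:
        return "spam"
    return "ham"


def class_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each label's share of the dataset as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


if __name__ == "__main__":
    # Class counts as reported in the abstract (638 + 489 + 4844 = 5971).
    print(class_shares({"smishing": 638, "spam": 489, "ham": 4844}))
    print(classify("Your bank account is suspended, verify at http://example.com"))
```

With the reported counts, ham makes up roughly 81% of the messages. That imbalance is one reason a fixed keyword threshold like the one above would need tuning, or replacement by a trained model, before it could be compared with any results in the paper.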

Citation (APA)

Mishra, S., & Soni, D. (2023). SMS Phishing Dataset for Machine Learning and Pattern Recognition. In Lecture Notes in Networks and Systems (Vol. 648 LNNS, pp. 597–604). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-27524-1_57
