Text message classification using supervised machine learning algorithms

Abstract

In recent years, as mobile phones have become more popular, the short message service (SMS) has grown into a multi-billion-dollar industry. At the same time, falling messaging costs have led to a growth in unsolicited messages, known as spam. Spam is a major problem: it causes financial damage to organizations and is a nuisance to the people who receive it.

Findings: The increasing volume of such unsolicited messages has created a need to classify and block them. Although humans can readily recognize a message as spam, doing so remains difficult for computers.

Objectives: Machine learning addresses this problem by offering a data-driven, statistical approach to designing algorithms that help computer systems identify an SMS as either a legitimate message (HAM) or junk (SPAM). However, the scarcity of real SMS spam datasets, the limited number of available features, and the informal language of message text are likely reasons why existing SMS filtering algorithms underperform when classifying text messages.

Methods/Statistical Analysis: In this paper, a corpus of real SMS texts from the University of California, Irvine (UCI) Machine Learning Repository is used. A weighting method based on how strongly each word in the corpus points towards a target class (HAM or SPAM) is applied to classify new messages as SPAM or HAM. In addition, several supervised machine learning algorithms, including support vector machines, k-nearest neighbours, and random forests, are compared on their performance in classifying SMS messages.

Applications/Improvements: The results of this comparison are presented at the end of the paper, together with a desktop application, developed and run in Python, that classifies messages as SPAM or HAM.
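
The abstract does not give the exact word-weighting formula. As a minimal sketch, assuming a word's weight is its Laplace-smoothed share of occurrences in SPAM messages and a message is labeled SPAM when the average weight of its known words exceeds 0.5, the idea can be illustrated in Python. The function names (`tokenize`, `word_spam_weights`, `classify`), the smoothing parameter `alpha`, the 0.5 threshold, and the toy messages are all hypothetical; none of them are taken from the paper or the UCI corpus.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_spam_weights(messages, labels, alpha=1.0):
    """Hypothetical weighting: smoothed share of each word's occurrences found in SPAM."""
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in zip(messages, labels):
        (spam_counts if label == "spam" else ham_counts).update(tokenize(text))
    vocab = set(spam_counts) | set(ham_counts)
    # alpha > 0 keeps a word seen in only one class from getting a weight of exactly 0 or 1.
    return {
        w: (spam_counts[w] + alpha) / (spam_counts[w] + ham_counts[w] + 2 * alpha)
        for w in vocab
    }


def classify(text, weights, threshold=0.5):
    """Label a message SPAM if the mean weight of its known words exceeds threshold."""
    scores = [weights[w] for w in tokenize(text) if w in weights]
    if not scores:
        return "ham"  # no known words: fall back to HAM (an assumption)
    return "spam" if sum(scores) / len(scores) > threshold else "ham"


# Toy training data (invented for illustration, not from the UCI corpus).
TRAIN = [
    ("WINNER! Claim your free prize now", "spam"),
    ("Free entry to win cash, text WIN now", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("Ok, I will call you later", "ham"),
]

weights = word_spam_weights([t for t, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    print(classify("Claim your free cash prize", weights))    # spam
    print(classify("Can we meet for lunch later?", weights))  # ham
```

In this sketch, a weight near 1 marks a word that appears mostly in SPAM, a weight near 0 a word that appears mostly in HAM, and 0.5 a word with no class preference. The paper's own scheme, and how it is combined with the support vector machine, k-nearest neighbours and random forest classifiers, may differ.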

Merugu, S., Reddy, M. C. S., Goyal, E., & Piplani, L. (2019). Text message classification using supervised machine learning algorithms. In Lecture Notes in Electrical Engineering (Vol. 500, pp. 141–150). Springer Verlag. https://doi.org/10.1007/978-981-13-0212-1_15