Efficient clustering of e-mails by applying supervised machine learning algorithms

Abstract

In today's digital age, effective detection of unwanted e-mails, commonly known as "spam", has become a priority for individuals and organizations. As e-mail inboxes fill up with unsolicited messages, it has become evident that the predefined rules and heuristics used by traditional spam filters have lost their effectiveness. This persistent problem poses challenges at both the personal and business level. Despite efforts to protect e-mail accounts with antivirus software, which in many cases comes at a cost, spam remains a growing concern. For businesses, implementing costly firewalls can be an unnecessary burden. The problem of spam persists, and its impact on the efficiency and security of e-mail communication is indisputable. The primary objective of this paper is to investigate and evaluate machine learning algorithms for automatic spam detection, using text classification techniques applied to mail servers and personal computers. Three algorithms are examined: Random Forest, Decision Tree, and Naive Bayes, with the intention of determining their applicability in both environments. The study relies on two research methodologies. First, feature selection identifies the most relevant variables for mail classification, including keywords and word frequencies. Second, performance evaluation uses metrics such as accuracy, recall, and F1-score to measure how well the machine learning models detect spam and legitimate e-mails. The results are presented as comparative tables showing the hit and miss rates of the three models evaluated. Notably, the Random Forest model, when applied in conjunction with tokenization techniques, is found to be more efficient than the other two models.
Choosing the right machine learning model is critical to efficient e-mail classification, and this study provides a basis for making informed decisions when implementing e-mail security systems in real-world business environments. Spam detection supported by machine learning algorithms remains an evolving field and offers a promising solution to a persistent problem in the digital world.
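
The workflow the abstract describes (tokenize the messages, use word frequencies as features, train a classifier, then score it with accuracy, recall, and F1-score) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, it implements a small multinomial Naive Bayes classifier (one of the three models examined, chosen here because it fits in a few lines, not the Random Forest the study found most efficient), and the example messages, the `tokenize` rule, and all function and class names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical multinomial Naive Bayes over word counts, add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every token seen in any training message.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, recall, and F1-score, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Invented toy messages, for illustration only (not data from the study).
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["claim your free prize", "monday project meeting"]
test_labels = ["spam", "ham"]

model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
predictions = model.predict(test_texts)
scores = evaluate(test_labels, predictions)
print(predictions, scores)
```

Comparing models as the study does would mean training each candidate on the same features and tabulating the `evaluate` output for each; recall is reported for the spam class here, which is an assumption about how the positive class is defined.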

Quirumbay Yagual, D., Soria Méndez, B., & Cruz Ruiz, V. (2024). Efficient clustering of e-mails by applying supervised machine learning algorithms. Journal of Applied Research and Technology, 22(4), 560–566. https://doi.org/10.22201/icat.24486736e.2024.22.4.2383
