Spam emails have recently become a concern on the Internet. Machine learning techniques such as Neural Networks, Naïve Bayes, and Decision Trees have frequently been used to combat these spam emails. Despite their efficiency, time complexity in high-dimensional datasets remains a significant challenge. Due to a large number of features in high-dimensional datasets, the intricacy of this problem grows exponentially. The existing approaches suffer from a computational burden when thousands of features are used (high-time complexity). To reduce time complexity and improve accuracy in high-dimensional datasets, extra steps of feature selection and parameter tuning are necessary. This work recommends the use of a hybrid logistic regression model with a feature selection approach and parameter tuning that could effectively handle a big dimensional dataset. The model employs the Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction method to mitigate the drawbacks of Term Frequency (TF) to obtain an equal feature weight. Using publicly available datasets (Enron and Lingspam), we compared the model’s performance to that of other contemporary models. The proposed model achieved a low level of time complexity while maintaining a high level of spam detection rate of 99.1%.
CITATION STYLE
Mrisho, Z. K., Sam, A. E., & Ndibwile, J. D. (2021). Low Time Complexity Model for Email Spam Detection using Logistic Regression. International Journal of Advanced Computer Science and Applications, 12(12), 112–118. https://doi.org/10.14569/IJACSA.2021.0121215
Mendeley helps you to discover research relevant for your work.