CS-BPSO: Hybrid feature selection based on chi-square and binary PSO algorithm for Arabic email authorship analysis


Abstract

Email authorship analysis is a challenging task that involves detecting an author's style to help determine their identity. Emails represent a widespread application of big data, and email authorship analysis is widely performed in the forensic linguistics field. However, the high-dimensional feature space encountered in authorship analysis degrades classification performance. Moreover, the Arabic language is highly inflected and has certain unique characteristics, which pose critical challenges in identifying the context. Therefore, selecting prominent features is a critical step in authorship analysis. Swarm intelligence (SI) algorithms are widely adopted to address such feature selection problems. In this study, an efficient hybrid feature selection algorithm, CS-BPSO, which combines chi-square with binary particle swarm optimization (BPSO), was developed to enhance the performance of Arabic email authorship analysis. Static and dynamic features were considered. Experiments were conducted on Arabic email messages collected from a sample population to test the algorithm's performance using three popular classifiers: support vector machine (SVM), K-nearest neighbour (KNN), and naïve Bayes (NB). Accuracy, precision, recall, and F1-score were used as performance measures. The results showed that the CS-BPSO method performed strongly when dynamic features were used. The method also coped satisfactorily with several difficulties, e.g., an imbalanced dataset, a small dataset, and short texts.
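
To make the general idea concrete, the following is a minimal sketch of a chi-square filter followed by a binary PSO search. It is not the authors' implementation. The abstract does not give the algorithm's details, so several things here are assumptions: the ordering (filter first, then BPSO over the surviving features), the parameter values, the leave-one-out 1-nearest-neighbour fitness used in place of the SVM, KNN, and NB classifiers, and every function name (`chi2_scores`, `bpso`, `cs_bpso`, `loo_1nn_accuracy`), all of which are hypothetical.

```python
import math
import random


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative count feature against the class."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * y.count(c) / len(y)
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def loo_1nn_accuracy(X, y, mask):
    """Stand-in fitness (assumption): leave-one-out 1-NN accuracy on masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, row in enumerate(X):
        best_dist, best_label = None, None
        for k, other in enumerate(X):
            if k == i:
                continue
            dist = sum((row[j] - other[j]) ** 2 for j in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def bpso(fitness, n_bits, n_particles=10, n_iter=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO: velocities are real-valued, positions are resampled via a sigmoid."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_bits):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def cs_bpso(X, y, top_k, **bpso_kwargs):
    """Assumed hybrid: keep the top_k chi-square features, then let BPSO pick a subset."""
    scores = chi2_scores(X, y)
    kept = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:top_k]
    X_kept = [[row[j] for j in kept] for row in X]
    mask, fit = bpso(lambda m: loo_1nn_accuracy(X_kept, y, m), len(kept), **bpso_kwargs)
    return sorted(kept[d] for d, bit in enumerate(mask) if bit), fit


# Toy data (made up for illustration): features 0 and 1 separate the two authors,
# features 2 and 3 carry no class information.
X = [[5, 0, 1, 2], [4, 1, 0, 2], [6, 0, 1, 2],
     [0, 5, 1, 2], [1, 4, 0, 2], [0, 6, 1, 2]]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy = cs_bpso(X, y, top_k=2)
print("selected feature indices:", selected, "fitness:", accuracy)
```

In this sketch the filter stage shrinks the search space cheaply, and the wrapper stage then spends its classifier evaluations only on features that already show some association with the author label. Swapping the fitness function for cross-validated accuracy of another classifier would not change the structure.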

Cite

BinSaeedan, W., & Alramlawi, S. (2021). CS-BPSO: Hybrid feature selection based on chi-square and binary PSO algorithm for Arabic email authorship analysis. Knowledge-Based Systems, 227. https://doi.org/10.1016/j.knosys.2021.107224
