The Feature Selection (FS) phase is crucial in an Event Detection (ED) model. Several studies have captured the most informative features using various filter and wrapper FS methods, and, more recently, FS methods based on swarm intelligence algorithms have been employed to identify relevant features. Nevertheless, ED remains challenging when the feature space is sparse and high-dimensional, as it is when built from a massive number of news documents of varying text lengths. Such a feature space contains redundant, irrelevant, and noisy data, which misguide the detection process and substantially affect the reliability of the ED model. Hence, this study proposes a novel FS method based on the Binary Bat Algorithm (BBA), combined with the Markov Clustering Algorithm (MCL) and termed BBA-MCL, to improve the performance of the ED model. To the best of our knowledge, this is the first application of BBA in the ED field. The proposed method is tested on 10 benchmark datasets and 2 primary Facebook news datasets, using the averages of several evaluation metrics such as F-measure (F), Precision (PR), Recall (R), and Selected Feature Ratio (SFR). Comparative experiments are conducted against basic MCL and against binary versions of the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The empirical results show that BBA-MCL outperforms the other methods on most datasets in terms of F and PR. Furthermore, statistical tests confirm that the BBA-MCL FS method significantly enhances MCL performance (p-value = 0.003) by generating the most informative features. Ultimately, this work concludes that BBA-MCL obtains significant features and effectively detects real-world events from heterogeneous news text documents.
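The abstract does not state the transfer function, fitness function, or parameter values used by BBA-MCL, so the following is only a minimal sketch of how a generic binary bat search over feature masks can work. It assumes a sigmoid transfer function (one common choice in binary bat variants) and replaces the paper's MCL-based evaluation with a hypothetical toy fitness. It also assumes SFR is the number of selected features divided by the total number of features. All names (`binary_bat_fs`, `toy_fitness`, `relevance`, `selected_feature_ratio`) and all parameter defaults are illustrative, not taken from the paper.

```python
import math
import random


def sigmoid(v: float) -> float:
    """Transfer function mapping a real-valued velocity to a bit probability."""
    v = max(-50.0, min(50.0, v))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-v))


def binary_bat_fs(fitness, n_features, n_bats=20, n_iter=50,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Sketch of a sigmoid-based binary bat search that MAXIMIZES `fitness`.

    `fitness` takes a 0/1 feature mask (list) and returns a float.
    Parameter defaults are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_bats)]
    vel = [[0.0] * n_features for _ in range(n_bats)]
    loud = [1.0] * n_bats                        # loudness A_i, decays on acceptance
    r0 = [rng.random() for _ in range(n_bats)]   # initial pulse rates r_i^0
    pulse = [0.0] * n_bats                       # pulse rate r_i, grows toward r_i^0
    fit = [fitness(p) for p in pos]
    b = max(range(n_bats), key=lambda i: fit[i])
    best, best_fit = pos[b][:], fit[b]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            cand = []
            for j in range(n_features):
                # Velocity pulls each bit relative to the global best mask.
                vel[i][j] += (pos[i][j] - best[j]) * freq
                cand.append(1 if rng.random() < sigmoid(vel[i][j]) else 0)
            # Local search: with probability 1 - r_i, flip one bit of the best mask.
            if rng.random() > pulse[i]:
                cand = best[:]
                k = rng.randrange(n_features)
                cand[k] = 1 - cand[k]
            cand_fit = fitness(cand)
            # Accept only improvements that also pass the loudness test.
            if cand_fit > fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                pulse[i] = r0[i] * (1.0 - math.exp(-gamma * t))
            if fit[i] > best_fit:
                best, best_fit = pos[i][:], fit[i]
    return best, best_fit


def selected_feature_ratio(mask):
    """SFR, assumed here to be selected features / total features."""
    return sum(mask) / len(mask)


# Hypothetical toy fitness: features 0-3 are "informative", the rest are noise,
# and every selected feature costs a small penalty (a stand-in for MCL quality).
relevance = [1.0, 1.0, 1.0, 1.0] + [0.0] * 6


def toy_fitness(mask):
    gain = sum(w for w, m in zip(relevance, mask) if m)
    return gain - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, score = binary_bat_fs(toy_fitness, n_features=len(relevance))
    print("selected:", [j for j, m in enumerate(mask) if m])
    print("SFR:", selected_feature_ratio(mask), "fitness:", score)
```

In a real ED pipeline, `toy_fitness` would be replaced by a score computed from clustering the documents with only the selected features (for example, an F-measure of the resulting clusters); the penalty term illustrates how a lower SFR can be rewarded alongside detection quality.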
Al-Dyani, W. Z., Ahmad, F. K., & Kamaruddin, S. S. (2022). Binary Bat Algorithm for text feature selection in news events detection model using Markov clustering. Cogent Engineering, 9(1). https://doi.org/10.1080/23311916.2021.2010923