A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification

Jamil Al-Sawwa; Mohammad Almseidin

Journal Article (Open Access)

Information (Switzerland) (2022) 13(11)

DOI: 10.3390/info13110530

Abstract

With the rapid development of internet technology, the amount of collected or generated data has increased exponentially. The sheer volume, complexity, and unbalanced nature of this data make it challenging for the scientific community to extract meaningful information within a reasonable time. In this paper, we implement a scalable design of the artificial bee colony algorithm for big data classification using Apache Spark. In addition, a new fitness function is proposed to handle unbalanced data. Two experiments were performed on real unbalanced datasets to assess the performance and scalability of the proposed algorithm. The performance results show that the proposed fitness function deals efficiently with unbalanced datasets and statistically outperforms the existing fitness function in terms of G-mean and F1-Score. The scalability results demonstrate that the proposed Spark-based design achieves speedup and scaleup results that are very close to optimal and that it scales efficiently with increasing data size.
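The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of G-mean (the geometric mean of sensitivity and specificity) and F1-Score (the harmonic mean of precision and recall); it is not the fitness function proposed in the paper, which the abstract does not specify. The function names and the toy labels are assumptions made for illustration only.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0

def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical toy data: 95 majority-class (0) and 5 minority-class (1) samples.
y_true = [0] * 95 + [1] * 5

# Classifier A ignores the minority class entirely.
pred_all_negative = [0] * 100

# Classifier B finds 4 of the 5 minority samples at the cost of 10 false alarms.
pred_balanced = [0] * 85 + [1] * 10 + [1] * 4 + [0] * 1

counts_a = confusion_counts(y_true, pred_all_negative)
counts_b = confusion_counts(y_true, pred_balanced)

for name, counts in (("A", counts_a), ("B", counts_b)):
    print(name, accuracy(*counts), g_mean(*counts), f1_score(*counts))
```

In this toy case, classifier A has the higher accuracy although it never detects the minority class, while its G-mean and F1-Score are both zero. This is why metrics of this kind, rather than plain accuracy, are the usual choice when evaluating classifiers on unbalanced data.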

Cite

Al-Sawwa, J., & Almseidin, M. (2022). A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification. Information (Switzerland), 13(11). https://doi.org/10.3390/info13110530
