A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification

Abstract

With the rapid development of internet technology, the amount of collected and generated data has grown exponentially. The sheer volume, complexity, and unbalanced nature of this data make it challenging to extract meaningful information within a reasonable time. In this paper, we implement a scalable design of the artificial bee colony algorithm for big data classification using Apache Spark. We also propose a new fitness function to handle unbalanced data. Two experiments were performed on real unbalanced datasets to assess the performance and scalability of the proposed algorithm. The performance results show that the proposed fitness function deals efficiently with unbalanced datasets and statistically outperforms the existing fitness function in terms of G-mean and F1-Score. The scalability results demonstrate that the proposed Spark-based design achieves speedup and scaleup results that are very close to optimal, and that it scales efficiently with increasing data size.
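To make the evaluation metrics concrete, the following minimal sketch computes G-mean and an F-score from binary confusion-matrix counts. It uses the standard textbook definitions and assumes the F1 form of the F-score, since the abstract elides the formula. It is not the paper's proposed fitness function, which the abstract does not specify. The function names and the example counts are hypothetical.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fn, fp):
    """Harmonic mean of precision and recall (assumed F1 form)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical unbalanced dataset: 10 minority (positive) samples,
# 990 majority (negative) samples.
# A classifier that always predicts the majority class:
majority_only = dict(tp=0, fn=10, tn=990, fp=0)
# A classifier that recovers most of the minority class:
balanced = dict(tp=8, fn=2, tn=900, fp=90)

for name, c in [("majority_only", majority_only), ("balanced", balanced)]:
    print(name,
          "accuracy=%.3f" % accuracy(**c),
          "g_mean=%.3f" % g_mean(**c),
          "f1=%.3f" % f1_score(c["tp"], c["fn"], c["fp"]))
```

On these hypothetical counts, the majority-only classifier reaches high accuracy while its G-mean and F1 are both zero, which illustrates why accuracy alone is a poor measure on unbalanced data.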

Cite

Al-Sawwa, J., & Almseidin, M. (2022). A Spark-Based Artificial Bee Colony Algorithm for Unbalanced Large Data Classification. Information (Switzerland), 13(11). https://doi.org/10.3390/info13110530