Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15

Abstract

This paper examines the impact of changing Spark's configuration parameters on machine learning performance using a large dataset, the UNSW-NB15 dataset. The environmental conditions that optimize the classification process are studied, since building smart intrusion detection systems requires a deep understanding of these environmental parameters. Specifically, the focus is on executor memory, the number of executors, the number of cores per executor, and execution time, as well as their impact on statistical measures. The objective was to optimize resource usage and minimize processing time for Decision Tree classification in Spark, and to determine whether additional resources increase performance, lower processing time, and make better use of computing resources. The UNSW-NB15 dataset, being large, provides enough data and complexity to observe the effect of changing computing resource configurations in Spark. Principal Component Analysis was used to preprocess the dataset. Results indicated that too few executors and cores result in wasted resources and long processing times, while excessive resource allocation did not improve processing time. Environmental tuning therefore has a noticeable impact.
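The configuration knobs and pipeline stages described above map directly onto Spark's APIs. The sketch below is a minimal, hypothetical PySpark illustration (the paper does not publish its code): the file path, column names, number of PCA components, and the resource values passed to spark.executor.memory, spark.executor.instances, and spark.executor.cores are assumptions chosen only to show where each parameter is set, not the authors' actual settings.

```python
# Minimal sketch of the workflow described in the abstract.
# Assumptions: CSV path, column names, k=10 PCA components, and all
# resource values are illustrative placeholders, not the paper's settings.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, PCA
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# The resource parameters studied in the paper are set at session start.
spark = (SparkSession.builder
         .appName("unsw-nb15-decision-tree")
         .config("spark.executor.memory", "4g")      # executor memory
         .config("spark.executor.instances", "4")    # number of executors
         .config("spark.executor.cores", "2")        # cores per executor
         .getOrCreate())

# Assumed layout: numeric feature columns plus a binary "label" column.
df = spark.read.csv("UNSW-NB15.csv", header=True, inferSchema=True)
feature_cols = [c for c in df.columns if c != "label"]

# PCA preprocessing followed by Decision Tree classification, as in the paper.
assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
pca = PCA(k=10, inputCol="features", outputCol="pca_features")
dt = DecisionTreeClassifier(featuresCol="pca_features", labelCol="label")

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = Pipeline(stages=[assembler, pca, dt]).fit(train)

evaluator = MulticlassClassificationEvaluator(labelCol="label",
                                              metricName="accuracy")
print("accuracy:", evaluator.evaluate(model.transform(test)))
```

Under this setup, varying only the three .config() values while holding the pipeline fixed is one way to reproduce the kind of resource-versus-runtime comparison the paper reports.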

Citation (APA)

Bagui, S., Walauskis, M., Derush, R., Praviset, H., & Boucugnani, S. (2022). Spark Configurations to Optimize Decision Tree Classification on UNSW-NB15. Big Data and Cognitive Computing, 6(2). https://doi.org/10.3390/bdcc6020038
