On the use of random discretization and dimensionality reduction in ensembles for big data


Abstract

Massive data growth in recent years has made data reduction techniques especially popular because of their ability to shrink this enormous amount of data, also called Big Data. Random Projection Random Discretization (RPRD) is an innovative ensemble method: it uses two data reduction techniques, the authors' proposed Random Discretization and Random Projection (RP), to create more informative data. However, RP has shortcomings that can be addressed by more powerful methods such as Principal Component Analysis (PCA). To tackle this problem, we propose a new ensemble method, named Random Discretization Dimensionality Reduction Ensemble, built on the Apache Spark framework and using PCA for dimensionality reduction. In experiments on five Big Data datasets, our proposal achieves better prediction performance than the original algorithm and Random Forest.
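As a rough illustration of the two dimensionality reduction techniques the abstract contrasts, the sketch below uses plain NumPy (not the paper's Spark implementation; the variable names and dimensions are illustrative assumptions). It projects the same data matrix with a random Gaussian matrix (Random Projection) and with the top principal components (PCA):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # toy data: 100 samples, 50 features
k = 10                          # target dimensionality

# Random Projection: multiply by a random Gaussian matrix,
# scaled so that pairwise distances are approximately preserved.
R = rng.normal(size=(50, k)) / np.sqrt(k)
X_rp = X @ R                    # shape (100, 10)

# PCA: center the data and project onto the top-k right
# singular vectors, which capture the most variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:k].T           # shape (100, 10)
```

Unlike the data-independent random matrix `R`, the PCA directions in `Vt` are learned from the data, which is the kind of "more powerful" reduction the proposal substitutes for RP.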

CITATION STYLE: APA

García-Gil, D., Ramírez-Gallego, S., García, S., & Herrera, F. (2018). On the use of random discretization and dimensionality reduction in ensembles for big data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10870 LNAI, pp. 15–26). Springer Verlag. https://doi.org/10.1007/978-3-319-92639-1_2
