Building efficient fuzzy regression trees for large scale and high dimensional problems


This article is free to access.

Abstract

Regression trees (RTs) are simple but powerful models that have been widely used over the last few decades in many application areas. Fuzzy RTs (FRTs) add fuzziness to RTs in order to deal with uncertain environments. Most FRT learning approaches proposed in the literature aim to improve accuracy, measured in terms of mean squared error, and often neglect computation time and memory requirements. In today’s application domains, which require managing huge amounts of data, this omission can strongly limit their use. In this paper, we propose a distributed FRT (DFRT) learning scheme, based on the MapReduce paradigm, for generating binary RTs from big datasets. We have designed and implemented the scheme on the Apache Spark framework. We used eight real-world and four synthetic datasets to evaluate its performance in terms of mean squared error, computation time and scalability. As a baseline, we compared the results with the distributed RT (DRT) and the distributed random forest (DRF) available in the Spark MLlib library. The results show that DFRT scales similarly to DRT and better than DRF. In terms of predictive performance, DFRT generalizes much better than DRT and similarly to DRF.
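To make the idea of "adding fuzziness" to a binary regression tree concrete, the following is a minimal single-split sketch. It is not the DFRT algorithm described in the paper: the linear ramp membership function, the `width` parameter, and all function names are assumptions chosen for illustration only. A sample near the threshold belongs partly to both children, and the prediction is the membership-weighted combination of the two leaf values. Setting `width` to 0 recovers an ordinary crisp split.

```python
from typing import Sequence, Tuple


def left_membership(x: float, threshold: float, width: float) -> float:
    """Degree (0..1) to which x belongs to the left child (hypothetical ramp)."""
    if width <= 0.0:
        # Crisp split: the sample goes entirely left or entirely right.
        return 1.0 if x <= threshold else 0.0
    # Linear ramp: 1 at threshold - width, 0 at threshold + width.
    degree = (threshold + width - x) / (2.0 * width)
    return min(1.0, max(0.0, degree))


def fit_stump(xs: Sequence[float], ys: Sequence[float],
              threshold: float, width: float) -> Tuple[float, float]:
    """Leaf values computed as membership-weighted means of the targets."""
    mu = [left_membership(x, threshold, width) for x in xs]
    left_w = sum(mu)
    right_w = len(xs) - left_w
    left = sum(m * y for m, y in zip(mu, ys)) / left_w if left_w > 0 else 0.0
    right = (sum((1.0 - m) * y for m, y in zip(mu, ys)) / right_w
             if right_w > 0 else 0.0)
    return left, right


def predict(x: float, threshold: float, width: float,
            leaves: Tuple[float, float]) -> float:
    """Blend both leaf values according to the membership degrees of x."""
    m = left_membership(x, threshold, width)
    return m * leaves[0] + (1.0 - m) * leaves[1]


def mse(xs: Sequence[float], ys: Sequence[float], threshold: float,
        width: float, leaves: Tuple[float, float]) -> float:
    """Mean squared error of the stump on the given samples."""
    errors = [(predict(x, threshold, width, leaves) - y) ** 2
              for x, y in zip(xs, ys)]
    return sum(errors) / len(xs)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 0.0, 10.0, 10.0]
    for width in (0.0, 1.0):
        leaves = fit_stump(xs, ys, threshold=1.5, width=width)
        print(width, leaves, mse(xs, ys, 1.5, width, leaves))
```

In this toy setting the fuzzy split produces a smooth transition between the two leaf values around the threshold instead of a step. How the paper defines its fuzzy partitions and selects splits is not stated in the abstract, so none of those details are represented here.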

Citation (APA)

Cózar, J., Marcelloni, F., Gámez, J. A., & de la Ossa, L. (2018). Building efficient fuzzy regression trees for large scale and high dimensional problems. Journal of Big Data, 5(1). https://doi.org/10.1186/s40537-018-0159-y
