Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation

Nidia Rodríguez-Mazahua; Lisbeth Rodríguez-Mazahua; Asdrúbal López-Chau; Giner Alor-Hernández; S. Gustavo Peláez-Camarena

Book Chapter

Springer Science and Business Media Deutschland GmbH, (2021), 337-363

DOI: 10.1007/978-3-030-71115-3_15

3 Citations

10 Readers

Abstract

Fragmentation is one of the main problems faced by Data Warehouse (DW) designers. Several studies have proposed data mining-based horizontal fragmentation methods, which focus on optimizing query response time and execution cost to make the DW more efficient. However, to the best of our knowledge, no horizontal fragmentation technique uses a decision tree to carry out the fragmentation. Decision trees are important in classification because they yield pure partitions (subsets of tuples) of a data set using measures such as Information Gain, Gain Ratio, and the Gini Index; the aim of this work is therefore to use decision trees for DW fragmentation. This chapter presents an analysis of different decision tree algorithms to select the best one for implementing the fragmentation method. The analysis was performed in Weka 3.9.4 using four evaluation metrics (Precision, ROC Area, Recall, and F-measure) on several data sets selected using the SSB (Star Schema Benchmark). Several experiments were carried out using two attribute selection methods, Best First and Greedy Stepwise. The data sets were pre-processed with the Class Conditional Probabilities filter, and two data sets (24 and 50 queries) were also analyzed with this filter to observe the behavior of the decision tree algorithms on each data set. The analysis shows that for the 24-query data set the best algorithm was RandomTree, since it won in two methods. For the 50-query data set, the best decision tree algorithms were LMT and RandomForest, because they obtained the best performance for all methods tested. Finally, J48 was the selected algorithm when neither an attribute selection method nor the Class Conditional Probabilities filter is used; if only the filter is applied to the data set, LMT gives the best performance.
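The three split measures named in the abstract can be illustrated with a small sketch. This is not the chapter's Weka setup; it is a minimal Python illustration using the textbook definitions, and the function names and the toy data (a hypothetical `region` attribute and fragment labels) are assumptions made only for the example.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gini(labels):
    """Gini index of a label list: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_by(values, labels):
    """Group the labels by the attribute value of each tuple."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after the split."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_by(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split
    return information_gain(values, labels) / split_info


def gini_split(values, labels):
    """Weighted Gini index of the partitions produced by the attribute."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in split_by(values, labels))


# Hypothetical toy data: the attribute separates the two classes perfectly,
# so every partition it produces is pure.
region = ["ASIA", "ASIA", "EUROPE", "EUROPE"]
fragment = ["f1", "f1", "f2", "f2"]

print(information_gain(region, fragment))
print(gain_ratio(region, fragment))
print(gini_split(region, fragment))
```

An attribute that yields pure partitions, as in the toy data, maximizes information gain and drives the weighted Gini index of the split to zero, which is the property the abstract relies on when it proposes decision trees for fragmentation.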

Citation (APA)

Rodríguez-Mazahua, N., Rodríguez-Mazahua, L., López-Chau, A., Alor-Hernández, G., & Peláez-Camarena, S. G. (2021). Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation. In Studies in Computational Intelligence (Vol. 966, pp. 337–363). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-71115-3_15
