Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation

Abstract

One of the main problems faced by Data Warehouse (DW) designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods that focus on optimizing query response time and execution cost to make the DW more efficient. However, to the best of our knowledge, no horizontal fragmentation technique uses a decision tree to carry out the fragmentation. Decision trees are important in classification because they yield pure partitions (subsets of tuples) of a data set using measures such as Information Gain, Gain Ratio, and the Gini Index; the aim of this work is therefore to use decision trees in DW fragmentation. This chapter presents an analysis of different decision tree algorithms to select the best one for implementing the fragmentation method. The analysis was performed in Weka version 3.9.4, considering four evaluation metrics (Precision, ROC Area, Recall, and F-measure) on different data sets selected using the SSB (Star Schema Benchmark). Several experiments were carried out using two attribute selection methods, Best First and Greedy Stepwise. The data sets were also pre-processed using the Class Conditional Probabilities filter, and two data sets (24 and 50 queries) were analyzed with this filter to observe the behavior of the decision tree algorithms on each. The analysis shows that for the 24-query data set the best algorithm was RandomTree, since it performed best under two of the methods. For the 50-query data set, the best decision tree algorithms were LMT and RandomForest, because they obtained the best performance for all methods tested. Finally, J48 was the selected algorithm when neither an attribute selection method nor the Class Conditional Probabilities filter is used; if only the latter is applied to the data set, the best performance is given by the LMT algorithm.
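
As an illustration of the split measures named in the abstract, the following minimal Python sketch computes Information Gain, Gain Ratio, and the Gini impurity reduction for one candidate split. It is not the chapter's implementation (the authors worked in Weka); the function names and the toy class labels are hypothetical and chosen only for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(labels, partitions):
    """Score a split of `labels` into `partitions` (a list of label lists).

    Returns Information Gain, Gain Ratio, and the reduction in Gini impurity.
    """
    n = len(labels)
    weights = [len(p) / n for p in partitions]

    # Information Gain: parent entropy minus weighted child entropy.
    info_gain = entropy(labels) - sum(
        w * entropy(p) for w, p in zip(weights, partitions)
    )

    # Gain Ratio: Information Gain normalized by the split information.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0

    # Gini reduction: parent impurity minus weighted child impurity.
    gini_reduction = gini(labels) - sum(
        w * gini(p) for w, p in zip(weights, partitions)
    )

    return {
        "info_gain": info_gain,
        "gain_ratio": gain_ratio,
        "gini_reduction": gini_reduction,
    }


# Toy example (hypothetical labels): a split that separates the two classes
# perfectly, so both resulting partitions are pure.
labels = ["A", "A", "B", "B"]
scores = split_scores(labels, [["A", "A"], ["B", "B"]])
print(scores)
```

A split that produces pure partitions, as in the toy example, maximizes all three scores for that parent node, which is the property the abstract points to as the motivation for using decision trees in fragmentation.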

Cite

Rodríguez-Mazahua, N., Rodríguez-Mazahua, L., López-Chau, A., Alor-Hernández, G., & Peláez-Camarena, S. G. (2021). Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation. In Studies in Computational Intelligence (Vol. 966, pp. 337–363). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-71115-3_15
