Performance improvement of open source based business intelligence system using database modeling and outlier detection

Abstract

With today's advanced technology, new data is generated every minute. For example, the average computer hard disk held about 10 gigabytes in 2000, whereas today Facebook alone takes in 500 terabytes of new data per day [1]. Data is growing rapidly, but volume alone does not make it valuable. Thus, it is important to extract information that will be useful in the future from large amounts of data. Business intelligence (BI) systems support business decisions by making predictions from collected data [2]. However, prediction accuracy depends on data quality. In practice, data is often of low quality and contains many incomplete and anomalous records. A further problem is that query response slows as data size grows. In previous work, we proposed a framework based on open-source technologies that enables BI systems to analyze big data efficiently, and we applied it to a supermarket's BI system. For that solution, we studied and successfully implemented the Hadoop data storage system, the Hive data warehouse software, the Sqoop data transfer tool, and related components. In this paper, we add an anomaly detection stage to the proposed framework; eliminating anomalies improves the information about related products that are purchased together. We also conduct an experimental study on speeding up time-dependent reports by applying a dimensional model to the Hive data warehouse. In the dimensional model, data is stored in the context of a single table (centralized context), whereas in the relational model the context is distributed over many tables. The experimental study shows that the dimensional model is more efficient: its query response time is at least two times faster than that of the data warehouse based on the relational model.
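
The contrast between the two data models can be illustrated with a minimal sketch. It is not the authors' Hive setup: it uses Python's built-in sqlite3 module as a stand-in, and the table names (product, receipt, receipt_line, sales_flat), the columns, and the sample rows are all hypothetical. It only shows that the same time-dependent report needs two joins in the relational layout and none in the single-table layout; it does not reproduce the reported speedup.

```python
import sqlite3


def build_demo(conn):
    """Create both layouts and load the same tiny, made-up sales data."""
    conn.executescript("""
        CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE receipt (receipt_id INTEGER PRIMARY KEY, sold_on TEXT);
        CREATE TABLE receipt_line (receipt_id INTEGER, product_id INTEGER, amount REAL);
        CREATE TABLE sales_flat (sale_month TEXT, product_name TEXT, amount REAL);
    """)
    conn.executemany("INSERT INTO product VALUES (?, ?)",
                     [(1, "milk"), (2, "bread")])
    conn.executemany("INSERT INTO receipt VALUES (?, ?)",
                     [(10, "2018-01-05"), (11, "2018-01-20"), (12, "2018-02-03")])
    conn.executemany("INSERT INTO receipt_line VALUES (?, ?, ?)",
                     [(10, 1, 2.0), (10, 2, 1.5), (11, 1, 2.0), (12, 2, 3.0)])
    # Single-table layout (hypothetical): resolve the joins once at load time,
    # so every row already carries its month and product name.
    conn.execute("""
        INSERT INTO sales_flat
        SELECT substr(r.sold_on, 1, 7), p.name, l.amount
        FROM receipt_line AS l
        JOIN receipt AS r ON r.receipt_id = l.receipt_id
        JOIN product AS p ON p.product_id = l.product_id
    """)
    conn.commit()


def monthly_report_relational(conn):
    """Monthly sales per product; context is gathered from three tables."""
    return conn.execute("""
        SELECT substr(r.sold_on, 1, 7), p.name, SUM(l.amount)
        FROM receipt_line AS l
        JOIN receipt AS r ON r.receipt_id = l.receipt_id
        JOIN product AS p ON p.product_id = l.product_id
        GROUP BY 1, 2
        ORDER BY 1, 2
    """).fetchall()


def monthly_report_flat(conn):
    """Same report; all context is already in one table, so no joins."""
    return conn.execute("""
        SELECT sale_month, product_name, SUM(amount)
        FROM sales_flat
        GROUP BY 1, 2
        ORDER BY 1, 2
    """).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_demo(connection)
    print(monthly_report_relational(connection))
    print(monthly_report_flat(connection))
```

Both functions return the same rows. The trade-off in this sketch is that the join cost is paid once when sales_flat is loaded rather than on every report query, at the price of storing redundant month and product values.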

Cite

Amarbayasgalan, T., Li, M., Namsrai, O. E., Jargalsaikhan, B., & Ryu, K. H. (2020). Performance improvement of open source based business intelligence system using database modeling and outlier detection. In Studies in Computational Intelligence (Vol. 830, pp. 373–386). Springer Verlag. https://doi.org/10.1007/978-3-030-14132-5_30
