Datawarehouse design for big data in academia

14Citations
Citations of this article
66Readers
Mendeley users who have this article in their library.

Abstract

This paper describes the process of design and construction of a datawarehouse ("DW") for an online learning platformusing three prominent technologies, Microsoft SQL Server, MongoDB and Apache Hive. The three systems are evaluated for corpus construction and descriptive analytics. The case also demonstrates the value of evidence-centered design principles for data warehouse design that is sustainable enough to adapt to the demands of handling big data in a variety of contexts. Additionally, the paper addresses maintainability-performance tradeoff, storage considerations and accessibility of big data corpora. In this NSF-sponsored work, the data were processed, transformed, and stored in the three versions of a data warehouse in search for a better performing and more suitable platform. The data warehouse engines-a relational database, a No-SQL database, and a big data technology for parallel computations-were subjected to principled analysis. Design, construction and evaluation of a data warehouse were scrutinized to find improved ways of storing, organizing and extracting information. The work also examines building corpora, performing ad-hoc extractions, and ensuring confidentiality. It was found that Apache Hive demonstrated the best processing time followed by SQL Server and MongoDB. In the aspect of analytical queries, the SQL Server was a top performer followed by MongoDB and Hive. This paper also discusses a novel process for render students anonymity complying with Family Educational Rights and Privacy Act regulations. Five phases for DW design are recommended: 1) Establishing goals at the outset based on Evidence-Centered Design principles; 2) Recognizing the unique demands of student data and use; 3) Adopting a model that integrates cost with technical considerations; 4) Designing a comparative database and 5) Planning for a DW design that is sustainable. Recommendations for future research include attempting DW design in contexts involving larger data sets, more refined operations, and ensuring attention is paid to sustainability of operations.

Cite

CITATION STYLE

APA

Rudniy, A. (2022). Datawarehouse design for big data in academia. Computers, Materials and Continua, 71(1), 979–992. https://doi.org/10.32604/cmc.2022.016676

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free