Performance analysis of queries with hive optimized data models

Meghna Sharma; Jagdeep Kaur

Conference Proceedings

Lecture Notes in Electrical Engineering (2020) 597 687-698

DOI: 10.1007/978-3-030-29407-6_49

Citations: 1

Readers: 3

Abstract

Hive, a data warehouse tool, handles the processing of structured data in Hadoop. It runs on top of Hadoop and helps to analyze, query, and review Big Data. The execution time of queries has been drastically reduced by using Hadoop MapReduce. This paper presents a detailed comparison of data model optimization techniques, such as partitioning and bucketing, for improving the processing time of Hive queries. The implementation uses data from the New York Police Portal, with AWS services for storage and the Hive tool in the Hadoop ecosystem for querying. Partitioning showed a remarkable improvement in execution time.
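The abstract names partitioning and bucketing without showing how they cut query time. The following is a minimal Python sketch of the general idea only, not the authors' Hive implementation: a partitioned layout lets a filter skip every partition that cannot match, and bucketing spreads rows over a fixed number of groups by key. The sample rows, the column names (`borough`, `offense`, `id`), and all function names are hypothetical, and the modulo rule is a simplification of hash-based bucketing.

```python
from collections import defaultdict

# Hypothetical sample rows; the fields are illustrative and are not
# taken from the dataset used in the paper.
ROWS = [
    {"id": 1, "borough": "BRONX", "offense": "THEFT"},
    {"id": 2, "borough": "QUEENS", "offense": "ASSAULT"},
    {"id": 3, "borough": "BRONX", "offense": "ASSAULT"},
    {"id": 4, "borough": "BROOKLYN", "offense": "THEFT"},
    {"id": 5, "borough": "QUEENS", "offense": "THEFT"},
]


def partition_by(rows, column):
    """Group rows by the value of one column, one group per distinct value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def bucket_by(rows, column, num_buckets):
    """Place rows into a fixed number of buckets using an integer key.

    Assumption: the key is an integer and the bucket is key modulo
    num_buckets, a simplified stand-in for hash-based bucketing.
    """
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[column] % num_buckets].append(row)
    return buckets


def full_scan(rows, borough):
    """Filter an unpartitioned table: every row has to be examined."""
    matches = [row for row in rows if row["borough"] == borough]
    return matches, len(rows)


def pruned_scan(partitions, borough):
    """Filter a partitioned table: only the matching partition is read."""
    candidates = partitions.get(borough, [])
    return list(candidates), len(candidates)


if __name__ == "__main__":
    partitions = partition_by(ROWS, "borough")
    _, scanned_full = full_scan(ROWS, "BRONX")
    _, scanned_pruned = pruned_scan(partitions, "BRONX")
    print("rows examined without partitioning:", scanned_full)
    print("rows examined with partitioning:", scanned_pruned)
```

Both scans return the same rows; the difference is how many rows are examined, which is the effect a partitioned table is meant to exploit when a query filters on the partition column.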

Citation (APA)

Sharma, M., & Kaur, J. (2020). Performance analysis of queries with hive optimized data models. In Lecture Notes in Electrical Engineering (Vol. 597, pp. 687–698). Springer. https://doi.org/10.1007/978-3-030-29407-6_49
