Indian Premier League Dataset Analytics using Hadoop-Hive

  • S. S
  • et al.
N/ACitations
Citations of this article
1Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Big Data is a term used to represent huge volume of both unstructured and structured data which cannot be processed by the traditional data processing techniques. This data is too huge, grows exponentially and doesn't fit into the structure of the traditional database systems. Analyzing Big Data is a very challenging task since it involves the processing of huge amount of data. As the industry or its business grows, the data related to the industries also tend to grow on a larger scale. Prominent data analysis tools are required to analyze the data in order to gain value out of it. Hadoop is a sought-after open source framework that uses MapReduce techniques to store and process huge datasets. However, the programs written using MapReduce techniques are not flexible and also require maintenance. This problem is overcome by making use of HiveQL. In order to execute queries in HiveQL, the platform required is Hive. It is an open-source data warehousing set-up built on Hadoop. HiveQL queries are compiled into MapReduce jobs that are executed utilizing Hadoop. In this paper we have analyzed the Indian Premier League dataset using HiveQL and compared its execution time with that of traditional SQL queries. It was found that the HiveQL provided better performance with larger dataset while SQL performed better with smaller datasets.

Cite

CITATION STYLE

APA

S., S., & S., S. (2019). Indian Premier League Dataset Analytics using Hadoop-Hive. International Journal of Engineering and Advanced Technology, 9(2), 3999–4004. https://doi.org/10.35940/ijeat.b4579.129219

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free