Implementation of ETL Process using Pig and Hadoop

  • Raj, A.
  • D’Souza, R.

Abstract

ETL stands for extraction, transformation and loading: extraction retrieves data from the source; transformation involves data cleansing, data filtering, data validation and the application of certain rules; and loading stores the data in the destination repository where it is to finally reside. Pig is one of the most important tools that can be applied in the Extract, Transform and Load (ETL) process, since it brings the ETL approach to very large data sets. Pig first loads the data and can then perform projections, iterations, conversions and further transformations; UDFs can be used to implement more complex algorithms during the transformation phase. The large volumes of data processed by Pig can then be stored back in HDFS. In this paper we demonstrate the ETL process using Pig on Hadoop: we show how files in HDFS are extracted, transformed and loaded back into HDFS using Pig, and we extend the functionality of Pig Latin with Python UDFs to perform the transformations.
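
As an illustration of the workflow the abstract describes, the sketch below pairs a small Python (Jython) UDF with a Pig Latin script that extracts records from HDFS, cleanses and validates them, and stores the result back into HDFS. The file names, paths, schema and field rules here are hypothetical examples, not taken from the paper.

# clean_udf.py -- hypothetical Jython UDF for the transformation phase.
# Pig's Jython script engine provides the outputSchema decorator, which
# declares the Pig schema of the value the function returns.

@outputSchema("name:chararray")
def normalize_name(name):
    # Trim whitespace and upper-case the field; return None for bad input
    # so a downstream FILTER ... IS NOT NULL can drop invalid records.
    if name is None:
        return None
    name = name.strip()
    return name.upper() if name else None

The UDF is then registered and invoked from Pig Latin, which expresses the extract, transform and load steps declaratively:

-- etl.pig -- illustrative ETL sketch; the HDFS paths, input schema and
-- validation rules are assumptions for this example.
REGISTER 'clean_udf.py' USING jython AS udfs;

-- Extract: read raw records from HDFS
raw = LOAD '/data/input/employees.csv' USING PigStorage(',')
      AS (id:int, name:chararray, salary:double);

-- Transform: cleanse with the Python UDF, then validate and filter
cleaned = FOREACH raw GENERATE id, udfs.normalize_name(name) AS name, salary;
valid   = FILTER cleaned BY name IS NOT NULL AND salary > 0.0;

-- Load: store the transformed data back into HDFS
STORE valid INTO '/data/output/employees_clean' USING PigStorage(',');

Running the script with "pig etl.pig" compiles it into MapReduce jobs that execute on the Hadoop cluster, so the same ETL logic scales with the data in HDFS.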

Citation (APA)

Raj, A., & D’Souza, R. (2020). Implementation of ETL Process using Pig and Hadoop. International Journal of Recent Technology and Engineering (IJRTE), 8(5), 4896–4899. https://doi.org/10.35940/ijrte.e4901.018520
