Implementation of ETL Process using Pig and Hadoop

  • Raj, A.
  • D’Souza, R.

Abstract

ETL stands for extraction, transformation and loading: extraction retrieves data from the source; transformation involves data cleansing, data filtering, data validation and the application of certain rules; and loading stores the data in the destination repository where it is to finally reside. Pig is one of the most important tools that can be applied in the Extract, Transform and Load (ETL) process, since it brings the ETL approach to very large data sets. Pig first loads the data and can then perform projections, iterations, conversions and further transformations; UDFs can be used to implement more complex algorithms during the transformation phase. The large volumes of data processed by Pig can then be stored back in HDFS. In this paper we demonstrate the ETL process using Pig on Hadoop: we show how files in HDFS are extracted, transformed and loaded back into HDFS using Pig, and we extend the functionality of Pig Latin with Python UDFs to perform the transformations.
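
As an illustration of the workflow the abstract describes, the sketch below pairs a small Python (Jython) UDF with a Pig Latin script that extracts records from HDFS, cleanses and validates them, and stores the result back into HDFS. The file names, paths, schema and field rules here are hypothetical examples, not taken from the paper.

# clean_udf.py -- hypothetical Jython UDF for the transformation phase.
# Pig's Jython script engine provides the outputSchema decorator, which
# declares the Pig schema of the value the function returns.

@outputSchema("name:chararray")
def normalize_name(name):
    # Trim whitespace and upper-case the field; return None for bad input
    # so a downstream FILTER ... IS NOT NULL can drop invalid records.
    if name is None:
        return None
    name = name.strip()
    return name.upper() if name else None

The UDF is then registered and invoked from Pig Latin, which expresses the extract, transform and load steps declaratively:

-- etl.pig -- illustrative ETL sketch; the HDFS paths, input schema and
-- validation rules are assumptions for this example.
REGISTER 'clean_udf.py' USING jython AS udfs;

-- Extract: read raw records from HDFS
raw = LOAD '/data/input/employees.csv' USING PigStorage(',')
      AS (id:int, name:chararray, salary:double);

-- Transform: cleanse with the Python UDF, then validate and filter
cleaned = FOREACH raw GENERATE id, udfs.normalize_name(name) AS name, salary;
valid   = FILTER cleaned BY name IS NOT NULL AND salary > 0.0;

-- Load: store the transformed data back into HDFS
STORE valid INTO '/data/output/employees_clean' USING PigStorage(',');

Running the script with "pig etl.pig" compiles it into MapReduce jobs that execute on the Hadoop cluster, so the same ETL logic scales with the data in HDFS.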

Citation (APA)

Raj, A., & D’Souza, R. (2020). Implementation of ETL Process using Pig and Hadoop. International Journal of Recent Technology and Engineering (IJRTE), 8(5), 4896–4899. https://doi.org/10.35940/ijrte.e4901.018520
