Objective: The main objective of this project is to explore and analyse a secondary dataset collected from the “Hospital Universitario de Caracas” in Caracas, Venezuela.
Methods: The dataset contains information on 858 patients, covering demographics and medical history. A large number of fields are left blank, possibly because patients intentionally withheld answers for privacy reasons. SAS Studio is used for data exploration and data pre-processing. Data cleaning and data transformation are then conducted based on the knowledge gathered during data exploration. Afterwards, the dataset is exported from SAS Studio and uploaded to the Hortonworks Hadoop platform for analysis. Lastly, five hypotheses are explored with the visualization tool Tableau.
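Profiling the blank fields per column is a natural first step of the exploration described above. The sketch below does this in Python. It is not the authors' SAS code. It assumes the data is a CSV file with a header row and that a missing answer is stored either as an empty cell or as "?". The function name `missing_counts`, the column names and the sample rows are hypothetical and were made up for illustration.

```python
import csv
import io

# Assumption: missing answers appear as empty cells or "?"; adjust if the
# export uses a different marker.
MISSING_MARKERS = {"", "?"}


def missing_counts(csv_text: str) -> dict:
    """Hypothetical helper: count blank/missing cells per column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    counts = {column: 0 for column in columns}
    for row in reader:
        for column in columns:
            value = row.get(column)
            # A short row yields None; whitespace-only cells also count as blank.
            if value is None or value.strip() in MISSING_MARKERS:
                counts[column] += 1
    return counts


if __name__ == "__main__":
    # Made-up toy rows, not records from the real dataset.
    sample = "Age,Smokes,STDs\n18,0,?\n25,,0\n30,1,?\n"
    print(missing_counts(sample))
```

On the toy rows, this reports no missing values for `Age`, one for `Smokes` and two for `STDs`. Per-column counts of this kind help decide whether a column should be imputed, dropped or kept as-is during cleaning.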
Xiaotian, C., Thiruchelvam, V., & Vistro, D. M. (2020). Exploratory data analysis and ETL with SAS on Hadoop eco-system with cervical cancer dataset. International Journal of Current Research and Review, 12(19), 88–104. https://doi.org/10.31782/IJCRR.2020.121924