The availability of big data has transformed how machine learning works and how data is used in it. In practice, data gathered from various sources may be unstructured, incomplete, unrealistic, or incorrect. Transforming such data and making it ready for analysis is a challenging task. Because the quality of the data has a direct impact on the efficiency of the trained model, data exploratory analysis (DEA) plays a major role in understanding the data and in forming a high-quality training dataset for machine learning algorithms. This paper emphasizes the importance of DEA in selecting significant attributes and filling in missing values to form such a dataset. The dataset considered for experimentation poses a binary classification problem, “Survival prediction of Titanic Passengers”. Experimental results show that training the model on the quality dataset improves accuracy compared with training it on raw data.
Chandrasekar, J. B., Murugesh, S., & Prasadula, V. R. (2021). Data exploratory analysis for classification in machine learning algorithms. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 53, pp. 113–125). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-15-5258-8_13