In the era of big data, vast amounts of data are being produced. This raises two main issues for knowledge discovery: much of the information is irrelevant to the problem at hand, and the data contain many imperfections and errors. Preprocessing is therefore a key step before applying any learning algorithm. Two of the main preprocessing techniques are feature selection, which reduces the number of features to a relevant subset, and discretisation, which reduces the possible values of continuous variables. This paper reviews different methods for these two steps, focusing on the big data context and giving examples of projects where they have been applied.
Lopez-Miguel, I. D. (2021). Survey on Preprocessing Techniques for Big Data Projects †. Engineering Proceedings, 7(1). https://doi.org/10.3390/engproc2021007014