Data preprocessing techniques generally refer to the addition, deletion, or transformation of the training set data. Preprocessing data is a crucial step prior to modeling since data preparation can make or break a model’s predictive ability. To illustrate general preprocessing techniques, we begin by introducing a cell segmentation data set (Section 3.1). This data set contains common predictor problems such as skewness, outliers, and missing values. Sections 3.2 and 3.3 review predictor transformations for single predictors and multiple predictors, respectively. In Section 3.4 we discuss several approaches for handling missing data. Other preprocessing steps may include removing (Section 3.5), adding (Section 3.6), or binning (Section 3.7) predictors, all of which must be done carefully so that predictive information is not lost and erroneous information is not introduced into the data. The computing section (Section 3.8) provides R syntax for the previously described preprocessing steps. Exercises are provided at the end of the chapter to solidify concepts.
Kuhn, M., & Johnson, K. (2013). Data Pre-processing. In Applied Predictive Modeling (pp. 27–59). Springer New York. https://doi.org/10.1007/978-1-4614-6849-3_3