Data Pre-processing

Max Kuhn; Kjell Johnson

Book Chapter


Springer New York, (2013), 27-59

DOI: 10.1007/978-1-4614-6849-3_3


Abstract

Data preprocessing techniques generally refer to the addition, deletion, or transformation of training set data. Preprocessing is a crucial step prior to modeling, since data preparation can make or break a model’s predictive ability. To illustrate general preprocessing techniques, we begin by introducing a cell segmentation data set (Section 3.1). This data set contains common predictor problems such as skewness, outliers, and missing values. Sections 3.2 and 3.3 review transformations for single predictors and multiple predictors, respectively. In Section 3.4 we discuss several approaches for handling missing data. Other preprocessing steps may include removing (Section 3.5), adding (Section 3.6), or binning (Section 3.7) predictors, all of which must be done carefully so that predictive information is not lost and erroneous information is not added to the data. The computing section (Section 3.8) provides R syntax for the previously described preprocessing steps. Exercises are provided at the end of the chapter to solidify concepts.

Cite

Kuhn, M., & Johnson, K. (2013). Data Pre-processing. In Applied Predictive Modeling (pp. 27–59). Springer New York. https://doi.org/10.1007/978-1-4614-6849-3_3
