Data Pre-processing

  • Kuhn M
  • Johnson K

Abstract

Data preprocessing techniques generally refer to the addition, deletion, or transformation of training set data. Preprocessing is a crucial step prior to modeling, since data preparation can make or break a model’s predictive ability. To illustrate general preprocessing techniques, we begin by introducing a cell segmentation data set (Section 3.1). This data set contains common predictor problems such as skewness, outliers, and missing values. Sections 3.2 and 3.3 review transformations for single predictors and multiple predictors, respectively. In Section 3.4 we discuss several approaches for handling missing data. Other preprocessing steps may include removing (Section 3.5), adding (Section 3.6), or binning (Section 3.7) predictors, all of which must be done carefully so that predictive information is not lost and erroneous information is not added to the data. The computing section (3.8) provides R syntax for the previously described preprocessing steps. Exercises are provided at the end of the chapter to solidify concepts.
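
As a rough illustration of the kinds of steps the abstract lists (handling missing values, transforming a skewed predictor, and centering and scaling), the following minimal Python sketch may help. It is not the chapter's code, which is written in R; the function names, the choice of median imputation and a log transform, the skewness definition (one common sample estimate), and the toy values are all assumptions made for illustration.

```python
import math
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def log_transform(values):
    """Natural-log transform to reduce right skew; requires positive values."""
    return [math.log(v) for v in values]


def center_scale(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def skewness(values):
    """One common sample skewness estimate (assumed definition for this sketch)."""
    n = len(values)
    mean = statistics.mean(values)
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    return sum((x - mean) ** 3 for x in values) / ((n - 1) * var ** 1.5)


# Hypothetical right-skewed predictor with one missing value.
raw = [1.0, 2.0, None, 4.0, 8.0, 64.0]

filled = impute_median(raw)           # missing value handled first
transformed = log_transform(filled)   # skewness reduced
processed = center_scale(transformed) # common scale for modeling
```

In a real workflow the imputation value, mean, and standard deviation would be estimated on the training set only and then reused on new data, so that no information leaks from the samples being predicted.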

Cite

Kuhn, M., & Johnson, K. (2013). Data Pre-processing. In Applied Predictive Modeling (pp. 27–59). Springer New York. https://doi.org/10.1007/978-1-4614-6849-3_3
