Evaluation of Missing Value Estimation for Microarray Data

Danh V. Nguyen; Naisyin Wang; Raymond J. Carroll

Journal Article (Open Access)

Journal of Data Science (2021) 2(4) 347-370

DOI: 10.6339/jds.2004.02(4).170

Abstract

Microarray gene expression data contains missing values (MVs). However, some methods for downstream analyses, including some prediction tools, require a complete expression data matrix. Current methods for estimating the MVs include sample mean and K-nearest neighbors (KNN). Whether the accuracy of estimation (imputation) methods depends on the actual gene expression has not been thoroughly investigated. Under this setting, we examine how the accuracy depends on the actual expression level and propose new methods that provide improvements in accuracy relative to the current methods in certain ranges of gene expression. In particular, we propose regression methods, namely multiple imputation via ordinary least squares (OLS) and missing value prediction using partial least squares (PLS). Mean estimation of MVs ignores the observed correlation structure of the genes and is highly inaccurate. Estimating MVs using KNN, a method which incorporates pairwise gene expression information, provides substantial improvement in accuracy on average. However, the accuracy of KNN across the wide range of observed gene expression is unlikely to be uniform and this is revealed by evaluating accuracy as a function of the expression level.
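
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the simulated data, the unweighted neighbor average, and the bin edges are all assumptions made for illustration, and the OLS and PLS methods proposed in the paper are not reproduced. The sketch fills missing entries with the gene (row) mean and with a simple KNN average, then reports the root mean squared error (RMSE) overall and grouped by the true expression level.

```python
import numpy as np

def mean_impute(X):
    """Fill each missing entry with the mean of the observed values of its gene (row)."""
    out = X.copy()
    rows, cols = np.where(np.isnan(X))
    out[rows, cols] = np.nanmean(X, axis=1)[rows]
    return out

def knn_impute(X, k=5):
    """Fill each missing entry with the unweighted average of the k nearest genes
    observed in that sample. Distance is the RMS difference over samples that are
    observed in both genes (an illustrative choice)."""
    out = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        both = ~missing[i] & ~missing              # samples observed in gene i and gene g
        diff = np.where(both, X - X[i], 0.0)       # zero out non-shared samples
        counts = both.sum(axis=1)
        dist = np.full(X.shape[0], np.inf)
        ok = counts > 0
        dist[ok] = np.sqrt((diff[ok] ** 2).sum(axis=1) / counts[ok])
        dist[i] = np.inf                           # a gene is not its own neighbor
        for j in np.where(missing[i])[0]:
            cand = np.where(~missing[:, j] & np.isfinite(dist))[0]
            if cand.size == 0:                     # fall back to the gene mean
                out[i, j] = np.nanmean(X[i])
                continue
            nearest = cand[np.argsort(dist[cand])[:k]]
            out[i, j] = X[nearest, j].mean()
    return out

def rmse_by_level(truth, estimate, mask, edges):
    """RMSE over the masked entries, grouped by bin of the true expression value."""
    t, e = truth[mask], estimate[mask]
    bins = np.digitize(t, edges)
    return {int(b): float(np.sqrt(np.mean((e[bins == b] - t[bins == b]) ** 2)))
            for b in np.unique(bins)}

def simulate_and_compare(seed=0, n_genes=200, n_samples=20, rate=0.05):
    """Simulated (hypothetical) data: three groups of correlated genes plus noise."""
    rng = np.random.default_rng(seed)
    patterns = rng.normal(size=(3, n_samples))
    cluster = rng.integers(0, 3, size=n_genes)
    baseline = rng.normal(size=(n_genes, 1))
    loading = rng.uniform(0.5, 2.0, size=(n_genes, 1))
    noise = rng.normal(scale=0.2, size=(n_genes, n_samples))
    truth = baseline + loading * patterns[cluster] + noise
    mask = rng.random((n_genes, n_samples)) < rate  # entries to hide
    observed = np.where(mask, np.nan, truth)
    results = {}
    for name, method in (("mean", mean_impute), ("knn", knn_impute)):
        est = method(observed)
        overall = float(np.sqrt(np.mean((est[mask] - truth[mask]) ** 2)))
        results[name] = (overall, rmse_by_level(truth, est, mask, edges=[-1.0, 1.0]))
    return results
```

Calling `simulate_and_compare()` returns, for each method, the overall RMSE and a dictionary of RMSE per expression-level bin (bin 0 is below -1, bin 1 is between -1 and 1, bin 2 is above 1). Comparing the per-bin values is one simple way to check whether a method's accuracy is uniform across expression levels.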

Cite

Nguyen, D. V., Wang, N., & Carroll, R. J. (2021). Evaluation of Missing Value Estimation for Microarray Data. Journal of Data Science, 2(4), 347–370. https://doi.org/10.6339/jds.2004.02(4).170
