Missing value estimation methods for DNA methylation data


This article is free to access.

Abstract

Motivation: DNA methylation is a stable epigenetic mark with major implications in both physiological (development, aging) and pathological conditions (cancers and numerous diseases). Recent research involving methylation focuses on the development of molecular age estimation methods based on DNA methylation levels (mAge). An increasing number of studies indicate that divergences between mAge and chronological age may be associated with age-related diseases. Current advances in high-throughput technologies have allowed the characterization of DNA methylation levels throughout the human genome. However, experimental methylation profiles often contain multiple missing values that can affect both the analysis of the data and mAge estimation. Although several imputation methods exist, a major deficiency is their inability to cope with large datasets, such as those produced by DNA methylation chips. Specific methods for imputing missing methylation data are therefore needed.

Results: We present a simple and computationally efficient imputation method, methyLImp, based on linear regression. The rationale of the approach lies in the observation that methylation levels show a high degree of inter-sample correlation. We performed a comparative study of our approach with other imputation methods on DNA methylation data of healthy and disease samples from different tissues. Performance was assessed both in terms of imputation accuracy and in terms of the impact imputed values have on mAge estimation. In comparison to existing methods, our linear regression model performs equally well or better, with good computational efficiency. The results of our analysis provide recommendations for accurate estimation of missing methylation values.

Availability and implementation: The R-package methyLImp is freely available at https://github.com/pdilena/methyLImp.

Supplementary information: Supplementary data are available at Bioinformatics online.
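To make the idea of regression-based imputation concrete, the following is a minimal sketch in Python with NumPy. It is not the methyLImp implementation and does not reproduce the package's algorithm or API; the function name `impute_linear`, the samples-by-sites matrix layout, the intercept term, and the clipping of predictions to [0, 1] are all assumptions made for illustration. Each site with missing values is regressed, by ordinary least squares, on the sites that are fully observed, and the fitted model fills the gaps.

```python
import numpy as np


def impute_linear(data: np.ndarray) -> np.ndarray:
    """Fill NaNs in a (samples x sites) matrix of methylation levels.

    Hypothetical helper for illustration only: each incomplete column is
    regressed on all fully observed columns using least squares.
    """
    filled = data.astype(float, copy=True)
    missing = np.isnan(filled)
    complete_cols = ~missing.any(axis=0)
    if not complete_cols.any():
        raise ValueError("at least one fully observed column is required")

    # Design matrix: intercept plus every fully observed site.
    predictors = np.column_stack(
        [np.ones(filled.shape[0]), filled[:, complete_cols]]
    )

    for j in np.flatnonzero(~complete_cols):
        rows_missing = missing[:, j]
        rows_observed = ~rows_missing
        if not rows_observed.any():
            continue  # nothing to learn from; leave the column untouched
        # Fit on samples where the site is observed (minimum-norm solution
        # if the system is underdetermined).
        coef, *_ = np.linalg.lstsq(
            predictors[rows_observed], filled[rows_observed, j], rcond=None
        )
        # Assumption: values are beta values, so keep predictions in [0, 1].
        filled[rows_missing, j] = np.clip(predictors[rows_missing] @ coef, 0.0, 1.0)
    return filled


# Toy example: the third site is an exact linear function of the first two.
rng = np.random.default_rng(0)
observed_sites = rng.uniform(0.2, 0.8, size=(6, 2))
target_site = 0.5 * observed_sites[:, 0] + 0.3 * observed_sites[:, 1] + 0.1
truth = np.column_stack([observed_sites, target_site])

with_gap = truth.copy()
with_gap[2, 2] = np.nan  # hide one value
imputed = impute_linear(with_gap)
```

Imputation accuracy can then be estimated in the usual way, by masking known values, imputing them, and comparing the result against the hidden truth (for example with a root mean square error). On real chip data the number of sites far exceeds the number of samples, so a practical method needs to handle the resulting underdetermined systems efficiently; the sketch above makes no attempt at that.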

Cite


Di Lena, P., Sala, C., Prodi, A., & Nardini, C. (2019). Missing value estimation methods for DNA methylation data. Bioinformatics, 35(19), 3786–3793. https://doi.org/10.1093/bioinformatics/btz134
