Precision-mapping and statistical validation of quantitative trait loci by machine learning

Justin Bedo; Peter Wenzl; Adam Kowalczyk; Andrzej Kilian

Journal Article (Open Access)

BMC Genetics (2008) 9

DOI: 10.1186/1471-2156-9-35

Abstract

Background: We introduce a QTL-mapping algorithm based on Statistical Machine Learning (SML) that is conceptually quite different to existing methods as there is a strong focus on generalisation ability. Our approach combines ridge regression, recursive feature elimination, and estimation of generalisation performance and marker effects using bootstrap resampling. Model performance and marker effects are determined using independent testing samples (individuals), thus providing better estimates. We compare the performance of SML against Composite Interval Mapping (CIM), Bayesian Interval Mapping (BIM) and single Marker Regression (MR) on synthetic datasets and a multi-trait and multi-environment dataset of the progeny for a cross between two barley cultivars.

Results: In an analysis of the synthetic datasets, SML accurately predicted the number of QTL underlying a trait while BIM tended to underestimate the number of QTL. The QTL identified by SML for the barley dataset broadly coincided with known QTL locations. SML reported approximately half of the QTL reported by either CIM or MR, which is not unexpected given that neither CIM nor MR incorporates independent testing. This lack of independent testing makes these two methods susceptible to producing overly optimistic estimates of QTL effects, as we demonstrate for MR. The QTL resolution (peak definition) afforded by SML was consistently superior to MR, CIM and BIM, with QTL detection power similar to BIM. The precision of SML was underscored by repeatedly identifying, at ≤ 1-cM precision, three QTL for four partially related traits (heading date, plant height, lodging and yield). The sets of QTL obtained using a 'raw' and a 'curated' version of the same genotypic dataset were more similar to each other for SML than for CIM or MR.

Conclusion: The SML algorithm produces better estimates of QTL effects because it eliminates the optimistic bias in the predictive performance of other QTL methods. It produces narrower peaks than other methods (except BIM) and hence identifies QTL with greater precision. It is more robust to genotyping and linkage mapping errors, and identifies markers linked to QTL in the absence of a genetic map.

© 2008 Bedo et al; licensee BioMed Central Ltd.
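
The combination named in the abstract (ridge regression, recursive feature elimination, and bootstrap resampling with evaluation on individuals left out of each resample) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed ridge penalty `lam`, the fixed number of retained markers `n_keep`, the ±1 marker coding, and the toy data with unlinked markers are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights; assumes X and y are already centred."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def rfe_ridge(X, y, lam, n_keep):
    """Recursive feature elimination: refit, drop the smallest-|weight| marker."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        w = ridge_fit(X[:, active], y, lam)
        active.pop(int(np.argmin(np.abs(w))))
    return active

def bootstrap_rfe(X, y, lam=1.0, n_keep=2, n_boot=50, seed=0):
    """Selection frequency per marker and mean out-of-bag R^2 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    oob_r2 = []
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)            # sample with replacement
        test = np.setdiff1d(np.arange(n), train)      # individuals never drawn
        if test.size < 2:
            continue
        # Centre with training statistics only, so test data stay independent.
        mu_x, mu_y = X[train].mean(axis=0), y[train].mean()
        Xtr, ytr = X[train] - mu_x, y[train] - mu_y
        keep = rfe_ridge(Xtr, ytr, lam, n_keep)
        w = ridge_fit(Xtr[:, keep], ytr, lam)
        pred = (X[test][:, keep] - mu_x[keep]) @ w + mu_y
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        oob_r2.append(1.0 - ss_res / ss_tot)
        counts[keep] += 1
    return counts / len(oob_r2), float(np.mean(oob_r2))

# Toy data (assumed): 200 individuals, 30 unlinked markers coded -1/+1,
# two simulated QTL at marker indices 3 and 17.
rng = np.random.default_rng(42)
X = rng.choice([-1.0, 1.0], size=(200, 30))
y = 1.5 * X[:, 3] - 1.0 * X[:, 17] + rng.normal(0.0, 0.5, size=200)

freq, r2 = bootstrap_rfe(X, y)
top = sorted(np.argsort(freq)[-2:].tolist())
print("most frequently selected markers:", top)
print("mean out-of-bag R^2:", round(r2, 3))
```

Because each model is scored only on individuals absent from its bootstrap sample, the reported R^2 is not inflated by reusing the training data, which is the kind of optimistic bias the abstract attributes to methods without independent testing.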

Citation (APA)

Bedo, J., Wenzl, P., Kowalczyk, A., & Kilian, A. (2008). Precision-mapping and statistical validation of quantitative trait loci by machine learning. BMC Genetics, 9. https://doi.org/10.1186/1471-2156-9-35
