Haplotype analysis has become an important tool for studying species traits and susceptibility to diseases. Several computational methods for determining haplotype information from genotype data have been developed, but none is perfect. Haplotype Inference (HI) approaches based on different strategies or biological principles tend to fail at different loci. In this work we apply Multiple Linear Regression to explore the relevance of several biologically meaningful properties of the genotype sequences to the occurrence of errors in the results of three HI methods based on different principles. We develop models for databases on different elements, using two error metrics. We assess the accuracy of our results through statistical analysis. Our models reveal genotype properties that are relevant in general and others that are relevant in particular scenarios. We also show that the Regression models perform statistically better than Neural Network models developed for the same databases and properties. © 2012 Springer-Verlag.
Rosa, R. S., Santos, R. H. S., & Guimarães, K. S. (2012). Associating genotype sequence properties to haplotype inference errors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7409 LNBI, pp. 132–143). https://doi.org/10.1007/978-3-642-31927-3_12