We present a fast and accurate algorithm for reducing large-scale genetic marker data to a smaller, less noisy, and more complete set of bins, each representing a uniquely identifiable location on a chromosome. Our experimental results on real and synthetic data show that the algorithm runs in near-linear time, allowing for the analysis of millions of markers. The algorithm reduces the problem scale while preserving accuracy, making it feasible to use existing genetic mapping tools without resorting to complex, time-intensive pre-processing methods to filter or sample the original data set. Our approach also decreases the uncertainty in genotype calls, improving the quality of the data. Preliminary results demonstrate that existing marker-ordering methods designed for small-scale settings perform with equivalent accuracy when given our reduced bin set as input.
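To make the idea of a bin concrete, the following is a minimal toy sketch and not the authors' algorithm. It assumes each marker is a string of genotype calls across individuals (`A`, `B`, or `-` for a missing call), greedily groups markers whose known calls disagree in at most `max_mismatch` positions, and takes a per-individual majority vote within each bin. The function names, the encoding, and the threshold are all hypothetical, and the greedy loop is quadratic in the worst case, unlike the near-linear method the abstract describes.

```python
from collections import Counter

MISSING = "-"  # assumed encoding for a missing genotype call


def mismatches(a, b):
    """Count positions where both calls are known and differ."""
    return sum(1 for x, y in zip(a, b) if x != MISSING and y != MISSING and x != y)


def consensus(markers):
    """Majority vote per individual, ignoring missing calls.

    Ties go to the call seen first; an all-missing column stays missing.
    """
    result = []
    for column in zip(*markers):
        votes = Counter(c for c in column if c != MISSING)
        result.append(votes.most_common(1)[0][0] if votes else MISSING)
    return "".join(result)


def bin_markers(markers, max_mismatch=1):
    """Greedily place each marker in the first bin whose consensus is close enough."""
    bins = []  # each bin is a list of marker strings
    for marker in markers:
        for members in bins:
            if mismatches(marker, consensus(members)) <= max_mismatch:
                members.append(marker)
                break
        else:
            # No existing bin matched, so this marker starts a new one.
            bins.append([marker])
    return [consensus(members) for members in bins]
```

For example, `bin_markers(["AABB", "AAB-", "A-BB", "BBAA", "BB-A"])` returns `["AABB", "BBAA"]`: five markers collapse to two bins, and the missing calls are filled in by the vote within each bin.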
Strnadová-Neeley, V., Buluç, A., Chapman, J., Gilbert, J. R., Gonzalez, J., & Oliker, L. (2015). Efficient data reduction for large-scale genetic mapping. In BCB 2015 - 6th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 126–135). Association for Computing Machinery, Inc. https://doi.org/10.1145/2808719.2808732