Estimating optimal window size for analysis of low-coverage next-generation sequence data

Arief Gusnanto; Charles C. Taylor; Ibrahim Nafisah; Henry M. Wood; Pamela Rabbitts; Stefano Berri

Journal article (open access)

Bioinformatics (2014) 30(13) 1823-1829

DOI: 10.1093/bioinformatics/btu123

Abstract

Motivation: Current high-throughput sequencing has greatly transformed genome sequence analysis. In very low-coverage sequencing (<0.1×), 'binning' or 'windowing' the mapped short sequences ('reads') is critical for extracting genomic information of interest for further evaluation, such as copy-number alteration analysis. If the window size is too small, many windows contain zero reads and almost no pattern can be observed. If it is too large, patterns and genomic features are 'smoothed out'. Our objective is to identify an optimal window size between these two extremes.

Results: We model the read density as a step function. Under this model, we propose data-based estimates of the optimal window size based on Akaike's information criterion (AIC) and the cross-validation (CV) log-likelihood. By plotting the AIC and the CV log-likelihood as functions of window size, we can estimate the optimal window size as the one that minimizes AIC or maximizes the CV log-likelihood. The proposed methods are general-purpose, and we illustrate their application using low-coverage next-generation sequence datasets from real tumour samples and simulated datasets. © The Author 2014.
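The idea of scanning window sizes and scoring each one can be illustrated with a minimal sketch. It assumes that read start positions (0-based, on a single sequence of length G) are independent draws from a step-function density with one constant level per equal-width window. Under that assumption the maximum-likelihood density in window j is n_j / (n h_j), where n_j is the read count and h_j the window width. The function names, the parameter count used in AIC, and the leave-one-out form of the CV log-likelihood are illustrative assumptions, not necessarily the exact likelihood or cross-validation scheme used in the paper.

```python
import math
import random


def window_counts(positions, genome_length, w):
    """Bin 0-based read positions into consecutive windows of width w.
    Returns (counts, widths); the last window may be shorter than w."""
    k = math.ceil(genome_length / w)
    counts = [0] * k
    for p in positions:
        counts[p // w] += 1
    widths = [w] * (k - 1) + [genome_length - (k - 1) * w]
    return counts, widths


def histogram_aic(positions, genome_length, w):
    """AIC of a step-function (histogram) density with one level per window.
    Assumption: K windows give K - 1 free parameters (probabilities sum to 1)."""
    n = len(positions)
    counts, widths = window_counts(positions, genome_length, w)
    # ML density in window j is n_j / (n * h_j); empty windows contribute nothing.
    loglik = sum(c * math.log(c / (n * h)) for c, h in zip(counts, widths) if c > 0)
    return -2.0 * loglik + 2.0 * (len(counts) - 1)


def histogram_loo_cv(positions, genome_length, w):
    """Leave-one-out CV log-likelihood of the same histogram density (an assumed
    CV scheme). Returns -inf if any window holds exactly one read, because that
    read's left-out density estimate is zero."""
    n = len(positions)
    counts, widths = window_counts(positions, genome_length, w)
    total = 0.0
    for c, h in zip(counts, widths):
        if c == 0:
            continue
        if c == 1:
            return float("-inf")
        # Each of the c reads is scored by the density fitted without it.
        total += c * math.log((c - 1) / ((n - 1) * h))
    return total


# Hypothetical simulated data: the first half of a 100 kb sequence has twice
# the read density of the second half (a single step).
random.seed(1)
G = 100_000
reads = [
    random.randrange(0, 50_000) if random.random() < 2 / 3 else random.randrange(50_000, G)
    for _ in range(2000)
]
candidates = [500, 1_000, 5_000, 10_000, 25_000, 50_000, 100_000]
aics = {w: histogram_aic(reads, G, w) for w in candidates}
cvs = {w: histogram_loo_cv(reads, G, w) for w in candidates}
best_w_aic = min(aics, key=aics.get)  # window size minimizing AIC
best_w_cv = max(cvs, key=cvs.get)     # window size maximizing CV log-likelihood
print(best_w_aic, best_w_cv)
```

In practice one would scan a much finer grid of window sizes and plot both curves, as the abstract describes, rather than reading off a single minimum. The leave-one-out criterion shown here becomes -inf whenever a window holds a single read, which penalizes very small windows on sparse data.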

Citation (APA)

Gusnanto, A., Taylor, C. C., Nafisah, I., Wood, H. M., Rabbitts, P., & Berri, S. (2014). Estimating optimal window size for analysis of low-coverage next-generation sequence data. Bioinformatics, 30(13), 1823–1829. https://doi.org/10.1093/bioinformatics/btu123