PeakPass: Automating ChIP-Seq Blacklist Creation

Charles E. Wimberley; Steffen Heber

Journal article, open access

Journal of Computational Biology (2020) 27(2) 259-268

DOI: 10.1089/cmb.2019.0295

Abstract

ChIP-Seq blacklists contain genomic regions that frequently produce artifacts and noise in ChIP-Seq experiments. To improve signal-to-noise ratio, ChIP-Seq pipelines often remove data points that map to blacklist regions. Existing blacklists have been compiled in a manual or semiautomated way. In this article we describe PeakPass, an efficient method to generate blacklists, and demonstrate that blacklists can increase ChIP-Seq data quality. PeakPass leverages machine learning and attempts to automate blacklist generation. PeakPass uses a random forest classifier in combination with genomic features such as sequence, annotated repeats, complexity, assembly gaps, and the ratio of multimapping to uniquely mapping reads to identify artifact regions. We have validated PeakPass on a large data set and tested it for the purpose of upgrading a blacklist to a new reference genome version. We trained PeakPass on the ENCODE blacklist for the hg19 human reference genome, and created an updated blacklist for hg38. To assess the performance of this blacklist, we tested 42 ChIP-Seq replicates from 24 experiments using 10 ChIP-Seq quality metrics including relative strand coefficient, standardized standard deviation, and enrichment of reads in promoter regions. Using the blacklist generated by PeakPass resulted in a statistically significant improvement for nine of these metrics.
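
The filtering step mentioned in the abstract, removing data points that map to blacklist regions, can be illustrated with a minimal sketch. This is not PeakPass itself or its authors' code: the function names (`build_index`, `overlaps_blacklist`, `filter_peaks`) are hypothetical, the coordinates are invented for illustration, and intervals are assumed to be half-open `(chrom, start, end)` tuples as in BED files. In practice a dedicated tool such as bedtools is usually used for this step.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(blacklist):
    """Group blacklist intervals by chromosome, then sort and merge them."""
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or touching: extend the previous interval.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def overlaps_blacklist(index, chrom, start, end):
    """Return True if half-open [start, end) overlaps any blacklist region."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    # Last merged interval whose start lies strictly before `end`.
    i = bisect_left(starts, end) - 1
    if i < 0:
        return False
    # Merged intervals are disjoint and sorted, so only this one can overlap.
    return merged[i][1] > start


def filter_peaks(peaks, blacklist):
    """Keep only the peaks that do not overlap a blacklist region."""
    index = build_index(blacklist)
    return [p for p in peaks if not overlaps_blacklist(index, *p)]


# Invented example coordinates, for illustration only.
example_blacklist = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
example_peaks = [
    ("chr1", 50, 100),
    ("chr1", 90, 110),
    ("chr1", 299, 400),
    ("chr1", 300, 400),
    ("chr2", 10, 20),
    ("chr3", 0, 10),
]
kept_peaks = filter_peaks(example_peaks, example_blacklist)
```

Merging the blacklist once up front means each peak needs only a single binary search, which matters when many peaks are checked against the same blacklist.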

Citation (APA)

Wimberley, C. E., & Heber, S. (2020). PeakPass: Automating ChIP-Seq Blacklist Creation. Journal of Computational Biology, 27(2), 259–268. https://doi.org/10.1089/cmb.2019.0295
