Motivation: Transposon insertion sequencing (Tn-seq) is an emerging technology that combines transposon mutagenesis with next-generation sequencing to identify genes related to bacterial survival. The data from Tn-seq experiments consist of sequence reads mapped to millions of potential transposon insertion sites, and a large proportion of insertion sites have zero mapped reads. Novel statistical methods for Tn-seq data analysis are needed to infer the functions of genes in bacterial growth.

Results: In this article, we propose a zero-inflated Poisson model for analyzing Tn-seq data, which are high-dimensional and contain an excess of zeros. Maximum likelihood estimates of model parameters are obtained using an expectation-maximization (EM) algorithm, and pseudogenes are utilized to construct appropriate statistical tests for the transposon insertion tolerance of normal genes of interest. We propose a multiple testing procedure that categorizes each gene into one of three states (hypo-tolerant, tolerant or hyper-tolerant) while controlling the false discovery rate. We evaluate the proposed method with simulation studies and apply it to a real Tn-seq dataset from an experiment that studied the bacterial pathogen Campylobacter jejuni.

Availability and implementation: We provide R code for implementing our proposed method at http://github.com/ffliu/TnSeq. A user's guide with example data analysis is also available there.

Supplementary information: Supplementary data are available at Bioinformatics online.
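To make the zero-inflated Poisson idea concrete, the following is a minimal Python sketch of an EM fit for a single set of read counts. It is not the authors' gene-level model, their pseudogene-based tests, or their R code. It assumes one shared zero-inflation probability `pi` and one Poisson mean `lam` for all insertion sites, and the function names `fit_zip_em` and `sample_zip` are hypothetical.

```python
import math
import random


def fit_zip_em(counts, max_iter=500, tol=1e-10):
    """Fit a plain zero-inflated Poisson to counts with EM.

    Assumed model (illustrative only): with probability pi a site is a
    structural zero, otherwise its count is Poisson(lam).
    Returns the estimates (pi, lam).
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")
    if n_zero == 0:
        # No zeros at all: the likelihood is maximized by an ordinary Poisson.
        return 0.0, total / n

    # Starting values: half of the observed zeros are treated as structural,
    # and lam starts at the mean of the positive counts.
    pi = 0.5 * n_zero / n
    lam = total / (n - n_zero)

    for _ in range(max_iter):
        # E-step: posterior probability that an observed zero is structural.
        # Positive counts can never be structural zeros, so their weight is 0.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))

        # M-step: closed-form updates given the expected structural zeros.
        new_pi = n_zero * z / n
        new_lam = total / (n - n_zero * z)

        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


def sample_zip(rng, n, pi, lam):
    """Draw n zero-inflated Poisson counts (Knuth's Poisson sampler)."""
    threshold = math.exp(-lam)
    counts = []
    for _ in range(n):
        if rng.random() < pi:
            counts.append(0)  # structural zero
            continue
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        counts.append(k)
    return counts


# Usage: simulate sites with 60% structural zeros and recover the parameters.
rng = random.Random(0)
simulated = sample_zip(rng, 20000, pi=0.6, lam=3.0)
pi_hat, lam_hat = fit_zip_em(simulated)
print(f"pi_hat={pi_hat:.3f}, lam_hat={lam_hat:.3f}")
```

Because all zeros share the same posterior weight in this simplified setting, the E-step reduces to a single scalar. A gene-specific model such as the one described in the abstract would carry separate parameters per gene and would need the testing and false discovery rate steps on top.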
Liu, F., Wang, C., Wu, Z., Zhang, Q., & Liu, P. (2016). A zero-inflated Poisson model for insertion tolerance analysis of genes based on Tn-seq data. Bioinformatics, 32(11), 1701–1708. https://doi.org/10.1093/bioinformatics/btw061