PATMAP: Polyadenylation site identification from next-generation sequencing data

Xiaohui Wu; Meishuang Tang; Junfeng Yao; Shuiyuan Lin; Zhe Xiang; Guoli Ji

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012), vol. 7208 LNAI (Part 1), pp. 485–496

DOI: 10.1007/978-3-642-28942-2_44


Abstract

Polyadenylation is an essential post-transcriptional processing step in the maturation of eukaryotic mRNA. The coming flood of next-generation sequencing (NGS) data creates new opportunities for intensive study of polyadenylation. We present an automated workflow called PATMAP that identifies polyadenylation sites (poly(A) sites) by integrating NGS data cleaning, processing, mapping, normalization, and clustering. First, an ambiguous region was introduced to parse the genome annotation. Then a series of Perl scripts was seamlessly integrated to iteratively map single-end or paired-end sequences to the reference genome. After mapping, the poly(A) tags (PATs) at the same coordinate were grouped into one cleavage site, and internal priming artifacts were removed. Finally, the cleavage sites from different samples were normalized by an MA-based method and clustered into poly(A) clusters (PACs) by an empirical Bayesian method. The effectiveness of PATMAP was demonstrated by identifying thousands of reliable PACs from millions of NGS sequences in Arabidopsis and yeast. © 2012 Springer-Verlag.
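
The post-mapping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Perl implementation, and every name and threshold in it is an assumption made for illustration: `group_pats`, `is_internal_priming`, and `cluster_sites` are hypothetical helpers; the internal priming rule (at least `min_a` adenines in a `window`-nt downstream stretch) is a commonly used heuristic, not necessarily the one PATMAP applies; and the simple distance-based merging with `max_gap` stands in for the empirical Bayesian clustering, which the abstract does not specify. MA-based normalization is omitted.

```python
from collections import Counter

def group_pats(pats):
    """Count poly(A) tags (PATs) sharing one (chrom, strand, coord) key."""
    return Counter(pats)

def is_internal_priming(genome, chrom, strand, coord, window=10, min_a=7):
    """Flag a site whose downstream genomic window is A-rich.

    window and min_a are illustrative thresholds, not values from the paper.
    Coordinates are 0-based positions on the forward reference strand.
    """
    seq = genome[chrom].upper()
    if strand == "+":
        downstream = seq[coord + 1 : coord + 1 + window]
        return downstream.count("A") >= min_a
    # Minus strand: downstream lies at lower coordinates, and an A-rich
    # transcript region appears as T on the forward reference.
    downstream = seq[max(0, coord - window) : coord]
    return downstream.count("T") >= min_a

def cluster_sites(sites, max_gap=24):
    """Merge nearby cleavage sites on the same chrom/strand into clusters.

    A distance-based stand-in for the empirical Bayesian clustering step;
    max_gap is an assumed value.
    """
    clusters = []
    for (chrom, strand, coord), count in sorted(sites.items()):
        last = clusters[-1] if clusters else None
        if (last and last["chrom"] == chrom and last["strand"] == strand
                and coord - last["end"] <= max_gap):
            last["end"] = coord
            last["count"] += count
        else:
            clusters.append({"chrom": chrom, "strand": strand,
                             "start": coord, "end": coord, "count": count})
    return clusters

# Toy reference and mapped PATs (made-up data for illustration only).
genome = {"chr1": "ACGTACGTAC" + "GTCCGGTCAG" + "AAAAAAAAAA" + "CGTCGTCGTC"}
pats = [("chr1", "+", 5)] * 3 + [("chr1", "+", 8)] + [("chr1", "+", 19)] * 2

sites = group_pats(pats)
kept = {key: n for key, n in sites.items()
        if not is_internal_priming(genome, *key)}
clusters = cluster_sites(kept)
print(clusters)
```

In this toy input, the site at coordinate 19 sits directly upstream of a genomic run of ten adenines and is discarded as a likely internal priming artifact, while the sites at coordinates 5 and 8 are merged into one cluster supported by four tags.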


Cite


Wu, X., Tang, M., Yao, J., Lin, S., Xiang, Z., & Ji, G. (2012). PATMAP: Polyadenylation site identification from next-generation sequencing data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7208 LNAI, pp. 485–496). https://doi.org/10.1007/978-3-642-28942-2_44
