ExUTR: A novel pipeline for large-scale prediction of 3'-UTR sequences from NGS data

This article is free to access.

Abstract

Background: The three prime untranslated region (3'-UTR) plays a pivotal role in modulating gene expression by determining the fate of mRNA. Many crucial developmental events, such as mammalian spermatogenesis, tissue patterning, sex determination and neurogenesis, rely heavily on post-transcriptional regulation by the 3'-UTR. However, 3'-UTR biology remains a relatively untapped field, with only limited tools and 3'-UTR resources available. To elucidate how the 3'-UTR regulates gene expression, the 3'-UTR sequences must first be identified. Current 3'-UTR mining tools, such as GETUTR, 3USS and UTRscan, all depend on a well-annotated reference genome or curated 3'-UTR sequences, which hinders their application to the many non-model organisms for which genomes are not available. To address this limitation, an automated NGS-based pipeline is urgently needed for genome-wide 3'-UTR prediction in the absence of reference genomes.

Results: Here, we propose ExUTR, a novel NGS-based pipeline to predict and retrieve 3'-UTR sequences from RNA-Seq experiments, designed particularly for non-model species lacking well-annotated genomes. The pipeline integrates cutting-edge bioinformatics tools, databases (UniProt and UTRdb) and novel in-house Perl scripts into a fully automated workflow. Taking transcriptome assemblies as input, it identifies 3'-UTR signals based primarily on the intrinsic features of transcripts, and outputs predicted 3'-UTR candidates together with associated annotations. In addition, ExUTR requires only minimal computational resources, so it can run on a standard desktop computer with a reasonable runtime, making it affordable for most laboratories. We also demonstrate the functionality and extensibility of this pipeline using publicly available RNA-Seq data from both model and non-model species, and further validate the accuracy of the predicted 3'-UTRs using both well-characterized 3'-UTR resources and 3P-Seq data.

Conclusions: ExUTR is a practical and powerful workflow that enables rapid genome-wide 3'-UTR discovery from NGS data. The candidates predicted through this pipeline will further advance the study of miRNA target prediction, cis elements in the 3'-UTR, and the evolution and biology of 3'-UTRs. Because it does not depend on a well-annotated reference genome, ExUTR can be applied to a much broader research area, encompassing all species for which RNA-Seq data are available.
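The abstract describes predicting 3'-UTRs from assembled transcripts using the transcripts' own features. The general idea can be illustrated with a minimal Python sketch that treats everything downstream of the longest forward-strand open reading frame as the 3'-UTR candidate. This is an illustration only, not ExUTR's implementation: the function names `longest_orf` and `three_prime_utr` are hypothetical, and the real pipeline relies on established tools, the UniProt and UTRdb databases, and in-house Perl scripts rather than this simplified ORF scan.

```python
# Toy illustration (assumption: NOT the ExUTR algorithm).
# Idea: in an assembled transcript, the 3'-UTR lies downstream of the
# stop codon that terminates the coding sequence.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward
    strand, where end is exclusive and includes the stop codon.
    Returns None if no complete ORF is found."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens an ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def three_prime_utr(seq):
    """Return the sequence downstream of the longest ORF, or None."""
    seq = seq.upper()
    orf = longest_orf(seq)
    if orf is None:
        return None
    return seq[orf[1]:]


if __name__ == "__main__":
    # 5' leader "GG", ORF "ATGAAACCCTAA", then the downstream region.
    transcript = "GGATGAAACCCTAAGCTTAATAAAGCA"
    print(three_prime_utr(transcript))
```

A real workflow would also need to handle strand orientation, incomplete ORFs and homology evidence, which this sketch deliberately leaves out.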

Cite

Huang, Z., & Teeling, E. C. (2017). ExUTR: A novel pipeline for large-scale prediction of 3’-UTR sequences from NGS data. BMC Genomics, 18(1). https://doi.org/10.1186/s12864-017-4241-1
