Identification of protein coding regions in RNA transcripts


Abstract

Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) generates critically important data for eukaryotic gene discovery. Gene finding in transcripts can be done by statistical (alignment-free) as well as by alignment-based methods. We describe a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. The algorithm parameters are estimated by unsupervised training, which makes manual curation of training sets unnecessary. We demonstrate that (i) the unsupervised training is robust with respect to the presence of transcript assembly errors and (ii) the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting translation initiation sites in modelled as well as in assembled transcripts compares favourably to other existing methods.
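
To make the task concrete, the simplest alignment-free baseline for locating a coding region in a transcript is a scan for the longest open reading frame. The sketch below is only that baseline and is not the GeneMarkS-T algorithm, which relies on statistical models trained without supervision. The function name `longest_orf`, the forward-strand-only scan, the ATG-only start rule and the standard stop codons are assumptions made for illustration.

```python
# Naive baseline, NOT GeneMarkS-T: longest ATG-to-stop ORF on the forward strand.
# Assumes the standard genetic code stop codons and ATG as the only start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest complete ORF, or None if there is none.

    Coordinates are 0-based and half-open; the stop codon is included.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or cDNA letters
    best = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in the same frame
    return best
```

A baseline of this kind always picks the first in-frame ATG and ignores incomplete ORFs at transcript ends, which is one reason a trained statistical model is useful for predicting translation initiation sites.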


Cite


Tang, S., Lomsadze, A., & Borodovsky, M. (2015). Identification of protein coding regions in RNA transcripts. Nucleic Acids Research, 43(12). https://doi.org/10.1093/nar/gkv227
