Sequence information for the splicing of human pre-mRNA identified by support vector machine classification

114 citations; 95 readers (Mendeley users who have this article in their library)

Abstract

Vertebrate pre-mRNA transcripts contain many sequences that resemble splice sites on the basis of agreement with the consensus, yet these more numerous false splice sites are usually completely ignored by the cellular splicing machinery. Even at the level of exon definition, pseudo exons defined by such false splice sites outnumber real exons by an order of magnitude. We used a support vector machine to discover sequence information that could be used to distinguish real exons from pseudo exons. This machine learning tool led to the definition of potential branch points, an extended polypyrimidine tract, and C-rich and TG-rich motifs in a region limited to 50 nt upstream of constitutively spliced exons. C-rich sequences, along with G-triplet motifs, were also found in a region extending to 80 nt downstream of exons. In addition, combinations of three bases within the splice donor consensus sequence were shown to be more effective than consensus values in distinguishing real from pseudo splice sites; two-way base combinations were optimal for distinguishing 3′ splice sites. These data also suggest that interactions between two or more of these elements may contribute to exon recognition, and they provide candidate sequences for assessment as intronic splicing enhancers.
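The abstract does not spell out how base combinations are encoded or how the classifier is trained. As a minimal sketch, assuming a 9-nt donor window (3 exonic plus 6 intronic positions) and binary indicator features, the idea of scoring two- or three-way position-specific base combinations instead of a single consensus match could look like the code below. The window length, the function names (`combo_features`, `score`, `train_pegasos`), and the use of a bias-free linear SVM trained with the Pegasos subgradient method are illustrative assumptions, not the authors' kernel, features, or training procedure. The example sequences are made up.

```python
"""Hypothetical sketch: k-way base-combination features for a splice donor
window, scored by a linear SVM trained with Pegasos subgradient updates.

Assumptions (not from the paper): a 9-nt donor window (3 exonic + 6 intronic
positions), binary indicator features, and a linear model without bias term.
"""
import random
from itertools import combinations

DONOR_LEN = 9  # assumed window: positions -3..-1 (exon) and +1..+6 (intron)


def combo_features(site, k):
    """Return the set of k-way position-specific base combinations in `site`.

    Each feature pairs a tuple of positions with the bases found there, e.g.
    ((3, 4), "GT") for the GT dinucleotide at intron positions +1 and +2.
    """
    if len(site) != DONOR_LEN:
        raise ValueError(f"expected a {DONOR_LEN}-nt donor window, got {len(site)}")
    site = site.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    return {
        (positions, "".join(site[p] for p in positions))
        for positions in combinations(range(DONOR_LEN), k)
    }


def score(weights, features):
    """Linear decision value: sum of the weights of the features present."""
    return sum(weights.get(f, 0.0) for f in features)


def train_pegasos(examples, lam=0.01, epochs=20, seed=0):
    """Train a linear SVM (hinge loss, L2 penalty) with Pegasos updates.

    `examples` is a list of (feature_set, label) pairs, label +1 for a real
    site and -1 for a pseudo site. Examples are visited in a reshuffled order
    each epoch rather than sampled independently (a common Pegasos variant).
    """
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(examples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            feats, y = examples[i]
            violated = y * score(weights, feats) < 1.0  # margin check on w_t
            shrink = 1.0 - eta * lam  # L2 regularisation step
            for f in weights:
                weights[f] *= shrink
            if violated:  # hinge-loss subgradient step
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + eta * y
    return weights


if __name__ == "__main__":
    # Toy, invented windows (all carry GT at +1/+2); not data from the study.
    real = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAGGT"]
    pseudo = ["TTGGTCCTA", "GCAGTATCC", "ACGGTTTTT"]
    data = [(combo_features(s, 3), +1) for s in real]
    data += [(combo_features(s, 3), -1) for s in pseudo]
    w = train_pegasos(data)
    for s in real + pseudo:
        print(s, round(score(w, combo_features(s, 3)), 3))
```

Setting `k=1` reduces the features to single position-specific bases, close to a consensus or weight-matrix view; `k=2` or `k=3` lets the classifier weight dependencies between positions, which is the contrast the abstract draws between consensus values and base combinations.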



Citation (APA)

Zhang, X. H. F., Heller, K. A., Hefter, I., Leslie, C. S., & Chasin, L. A. (2003). Sequence information for the splicing of human pre-mRNA identified by support vector machine classification. Genome Research, 13(12), 2637–2650. https://doi.org/10.1101/gr.1679003


