High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes

Abstract

We investigate the effect of aligner choice on inferences of positive selection using site-specific models of molecular evolution. We find that, independently of the choice of aligner, the rate of false positives is unacceptably high. Our study is a whole-genome analysis of all protein-coding genes in 12 Drosophila genomes that are annotated either in all 12 species (∼6690 genes) or in the six melanogaster group species. We compare six popular aligners (PRANK, T-Coffee, Clustal W, ProbCons, AMAP, and MUSCLE) and find that aligner choice strongly influences the estimates of positive selection. Differences persist when we use (1) different stringency cutoffs; (2) different selection inference models; (3) alignments with or without gaps and/or additional masking; (4) per-site versus per-gene statistics; and (5) the closely related melanogaster group species versus the more distant 12 Drosophila genomes. Furthermore, we find that these differences are consequential for downstream analyses, such as determining which GO terms are over- or under-represented among genes associated with positive selection. Visual analysis indicates that most sites inferred as positively selected are, in fact, misaligned at the codon level, resulting in false positive rates of 48%-82%. PRANK, which has been reported to outperform other aligners in simulations, performed best in our empirical study as well. Unfortunately, PRANK still had a false positive rate of 50%-55%, which is high and unacceptable for most applications. We identify misannotations and indels, many of which appear to be located in disordered protein regions, as the primary culprits for the high misalignment-related error levels, and we discuss possible workarounds for this apparently pervasive problem in genome-wide evolutionary analyses. © 2011 by Cold Spring Harbor Laboratory Press.
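As a rough illustration of the two quantities the abstract relies on, the sketch below computes pairwise agreement between the gene sets that different aligners flag as positively selected, and a false positive rate taken as the fraction of inferred sites judged to be misaligned. This is not the authors' pipeline: the function names, the use of Jaccard overlap as the agreement measure, and the toy gene IDs are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(calls):
    """Map each unordered aligner pair to the Jaccard overlap of its gene sets.

    `calls` maps an aligner name to the set of gene IDs inferred to be under
    positive selection when that aligner's alignments are used.
    """
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


def false_positive_rate(n_inferred, n_misaligned):
    """Fraction of inferred positively selected sites judged to be misaligned.

    Assumes every misaligned site counts as a false positive.
    """
    if n_inferred <= 0:
        raise ValueError("n_inferred must be positive")
    return n_misaligned / n_inferred


# Hypothetical toy input, not data from the study.
toy_calls = {
    "PRANK": {"g1", "g2", "g3"},
    "MUSCLE": {"g2", "g3", "g4", "g5"},
    "ClustalW": {"g3", "g5", "g6"},
}
agreement = pairwise_agreement(toy_calls)
```

Low overlap values in `agreement` would correspond to the aligner sensitivity the abstract describes; in practice the per-aligner gene sets would come from running a selection test on each aligner's output.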

Markova-Raina, P., & Petrov, D. (2011). High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes. Genome Research, 21(6), 863–874. https://doi.org/10.1101/gr.115949.110
