Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study
- ISSN: 00166731
- DOI: 10.1111/j.1471-8286.2007.01758.x
- PubMed: 12610526
- arXiv: arXiv:1011.1669v3
Frequencies of mutant sites are modeled as a Poisson random field in two species that share a sufficiently recent common ancestor. The selective effect of the new alleles can be favorable, neutral, or detrimental. The model is applied to the sample configurations of nucleotides in the alcohol dehydrogenase gene (Adh) in Drosophila simulans and Drosophila yakuba. Assuming a synonymous mutation rate of 1.5 × 10⁻⁸ per site per year and 10 generations per year, we obtain estimates for the effective population size (Nₑ = 6.5 × 10⁶), the species divergence time (t_div = 3.74 million years), and an average selection coefficient (σ = 1.53 × 10⁻⁶ per generation for advantageous or mildly detrimental replacements), although it is conceivable that only two of the amino acid replacements were selected and the rest were neutral. The analysis, which includes a sampling theory for the independent infinite sites model with selection, also suggests that the number of amino acids in the enzyme that are susceptible to favorable mutation is in the range 2-23 at any one time. The approach provides a theoretical basis for the use of a 2 × 2 contingency table to compare fixed differences and polymorphic sites with silent sites and amino acid replacements.
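
The 2 × 2 comparison mentioned at the end of the abstract can be illustrated with a minimal Python sketch. It is not the sampling theory of the paper: it only cross-classifies sites as replacement or silent (rows) and as fixed or polymorphic (columns), then applies a standard two-sided Fisher exact test, which is a swapped-in, generic test for such a table. The function names and the counts in the usage lines are hypothetical and are not taken from the Adh data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout (an illustration, not the paper's notation):
        rows    = replacement sites, silent sites
        columns = fixed differences, polymorphic sites
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is no more probable than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a * d) / (b * c); undefined when b * c == 0."""
    return (a * d) / (b * c)


# Hypothetical counts, chosen only to exercise the functions.
fixed_repl, poly_repl = 8, 3
fixed_silent, poly_silent = 10, 30

p_value = fisher_exact_two_sided(fixed_repl, poly_repl, fixed_silent, poly_silent)
ratio = odds_ratio(fixed_repl, poly_repl, fixed_silent, poly_silent)
print(f"odds ratio = {ratio:.2f}, two-sided p = {p_value:.4f}")
```

An odds ratio well above 1 in this layout would mean an excess of fixed replacement differences relative to silent sites. Interpreting such an excess in terms of selection relies on the model described in the abstract, not on the test itself.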