Accelerated Estimation of Frequency Classes in Site-Heterogeneous Profile Mixture Models

Edward Susko; Lea Lincker; Andrew J. Roger

Journal Article (Open Access)


Molecular Biology and Evolution (2018) 35(5) 1266-1283

DOI: 10.1093/molbev/msy026


Abstract

As a consequence of structural and functional constraints, proteins tend to have site-specific preferences for particular amino acids. Failing to account for heterogeneity of amino acid frequencies across sites can lead to artifacts in phylogenetic estimation. Site-heterogeneous mixture models have been developed to address this problem. However, because of prohibitive computation times, maximum likelihood implementations use fixed component frequency vectors inferred from sequence databases external to the alignment under analysis. Here, we propose a composite likelihood approach to estimating the component frequencies of a mixture model directly from the data in the alignment of interest. In the common case where the number of taxa under study is not large, several adjustments to the default composite likelihood are shown to be necessary. In simulations, the approach provides large improvements over hierarchical clustering. For empirical data, it yields substantially higher likelihoods than mixtures using fixed components.
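The abstract does not spell out the form of the composite likelihood. Purely as an illustration of what "estimating component frequencies directly from the alignment" can look like, the sketch below assumes the simplest such surrogate: each alignment column is summarized by its residue counts, the residues in a column are treated as independent draws from one of K class frequency vectors (the tree is ignored), and class weights and frequency vectors are fitted by EM. This multinomial-mixture simplification is not the authors' estimator, and it omits the small-taxon-number adjustments the abstract says are needed. The function names (`fit_profile_mixture`, `site_log_terms`), the pseudocount, and the initialization heuristic are hypothetical choices.

```python
import math


def site_log_terms(count, weights, freqs):
    """Per-class terms log w_k + sum_a n_a log pi_{k,a} for one column."""
    return [
        math.log(w) + sum(n * math.log(f[a]) for a, n in enumerate(count) if n)
        for w, f in zip(weights, freqs)
    ]


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def fit_profile_mixture(counts, n_classes, n_iter=100, pseudo=1e-3):
    """EM fit of class weights and frequency vectors to per-column counts.

    Hypothetical simplification: residues in a column are independent draws
    from that column's class frequency vector; the phylogeny is ignored.
    Returns (weights, freqs, log composite likelihood).
    """
    n_sites, n_states = len(counts), len(counts[0])
    # Heuristic initialization (assumption): smoothed empirical frequencies
    # of evenly spaced columns.
    freqs = []
    for k in range(n_classes):
        c = counts[k * n_sites // n_classes]
        total = sum(c) + n_states
        freqs.append([(x + 1) / total for x in c])
    weights = [1.0 / n_classes] * n_classes

    for _ in range(n_iter):
        # E-step: posterior probability that each column belongs to each class.
        resp = []
        for c in counts:
            terms = site_log_terms(c, weights, freqs)
            z = log_sum_exp(terms)
            resp.append([math.exp(t - z) for t in terms])
        # M-step: weights are mean responsibilities (tiny floor avoids log(0)).
        weights = [
            (sum(r[k] for r in resp) + 1e-9) / (n_sites + n_classes * 1e-9)
            for k in range(n_classes)
        ]
        # Frequencies: responsibility-weighted residue counts plus a pseudocount.
        freqs = []
        for k in range(n_classes):
            tot = [pseudo + sum(r[k] * c[a] for r, c in zip(resp, counts))
                   for a in range(n_states)]
            s = sum(tot)
            freqs.append([t / s for t in tot])

    loglik = sum(log_sum_exp(site_log_terms(c, weights, freqs)) for c in counts)
    return weights, freqs, loglik
```

For example, with a toy four-state alphabet, `fit_profile_mixture([[10, 0, 0, 0]] * 5 + [[0, 0, 0, 10]] * 5, n_classes=2)` should recover two classes of weight about 0.5, one concentrated on the first state and one on the last. A real profile mixture would use the 20 amino acid states and account for the tree, which this toy surrogate deliberately does not.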


Citation (APA)

Susko, E., Lincker, L., & Roger, A. J. (2018). Accelerated Estimation of Frequency Classes in Site-Heterogeneous Profile Mixture Models. Molecular Biology and Evolution, 35(5), 1266–1283. https://doi.org/10.1093/molbev/msy026
