Protein languages differ depending on microorganism lifestyle



Few quantitative measures of genome architecture or organization exist to support assumptions of differences between microorganisms that are broadly defined as being free-living or pathogenic. General principles about complete proteomes exist for codon usage, amino acid biases, and essential or core genes. Genome-wide shifts in amino acid usage between free-living and pathogenic microorganisms result in fundamental differences in the complexity of their respective proteomes that are independent of genome size and gene content. These differences are evident across broad phylogenetic groups, a result of environmental factors and population genetic forces rather than phylogenetic distance. A novel comparative analysis of amino acid usage, which applied linguistic analyses of word frequency in language and text, identified a global pattern of higher peptide word repetition in 376 free-living versus 421 pathogen genomes across broad ranges of genome size, G+C content, and phylogenetic ancestry. This imprint of repetitive word usage indicates that free-living microorganisms are biased toward repetitive sequence usage relative to pathogens. These findings quantify fundamental differences in microbial genomes relative to life-history function. © 2014 Grzymski, Marsh.
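The abstract does not state the word length or the repetition statistic used, so the following is only a minimal sketch of the general idea of counting "peptide words" in a proteome, not the authors' method. It assumes overlapping words of a hypothetical length `k` and a simple illustrative repetition score (one minus the ratio of distinct words to total words); the function names `peptide_words` and `word_repetition` are invented for this example.

```python
from collections import Counter


def peptide_words(sequence, k=3):
    """Return all overlapping length-k peptide words in one protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def word_repetition(proteins, k=3):
    """Illustrative repetition score for a proteome (assumed metric).

    Computes 1 - (distinct words / total words): 0.0 means every word
    occurs once, and values near 1.0 mean heavy reuse of the same words.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(peptide_words(seq, k))
    total = sum(counts.values())
    if total == 0:
        # No sequence was long enough to yield a word of length k.
        return 0.0
    return 1.0 - len(counts) / total
```

Under this assumed score, a proteome that reuses the same short peptide words more often scores higher, which is the kind of contrast the abstract reports between free-living and pathogen genomes.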




Grzymski, J. J., & Marsh, A. G. (2014). Protein languages differ depending on microorganism lifestyle. PLoS ONE, 9(5).
