Few quantitative measures of genome architecture or organization exist to support assumed differences between microorganisms that are broadly defined as free-living or pathogenic. General principles about complete proteomes exist for codon usage, amino acid biases and essential or core genes. Genome-wide shifts in amino acid usage between free-living and pathogenic microorganisms result in fundamental differences in the complexity of their respective proteomes that are independent of size and gene content. These differences are evident across broad phylogenetic groups, a result of environmental factors and population genetic forces rather than phylogenetic distance. A novel comparative analysis of amino acid usage, utilizing linguistic analyses of word frequency in language and text, identified a global pattern of higher peptide word repetition in 376 free-living versus 421 pathogen genomes across broad ranges of genome size, G+C content and phylogenetic ancestry. This imprint of repetitive word usage indicates that free-living microorganisms have a bias for repetitive sequence usage compared to pathogens. These findings quantify fundamental differences in microbial genomes relative to life-history function. © 2014 Grzymski, Marsh.
Grzymski, J. J., & Marsh, A. G. (2014). Protein languages differ depending on microorganism lifestyle. PLoS ONE, 9(5). https://doi.org/10.1371/journal.pone.0096910
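
The idea of "peptide word repetition" in the abstract can be illustrated with a minimal sketch. It is not the authors' published method: the word length `k = 3`, the function names `peptide_words` and `word_repetition`, and the repetition score (one minus the ratio of distinct words to total words) are all assumptions chosen for illustration.

```python
from collections import Counter


def peptide_words(sequence, k=3):
    """Yield overlapping k-letter peptide words from one protein sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def word_repetition(proteome, k=3):
    """Return a hypothetical repetition score for a list of protein sequences.

    The score is 1 - (distinct words / total words): 0.0 when every word
    occurs exactly once, approaching 1.0 as words are reused more often.
    This metric is an illustrative assumption, not the paper's statistic.
    """
    counts = Counter()
    for seq in proteome:
        counts.update(peptide_words(seq, k))
    total = sum(counts.values())
    if total == 0:
        # No sequence was long enough to contain a k-letter word.
        return 0.0
    return 1.0 - len(counts) / total
```

Under these assumptions, a proteome that reuses the same short peptide words scores higher than one of equal size in which every word is distinct, which is the direction of the contrast the abstract reports between free-living and pathogen genomes.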