Background: Increasingly, similarity networks are being used for evolutionary analyses of molecular datasets. These networks are very useful, in particular for the analysis of gene sharing, lateral gene transfer and for the detection of distant homologs. Currently, such analyses require some computer programming skills due to the limited availability of user-friendly freely distributed software. Consequently, although appealing, the construction and analyses of these networks remain less familiar to biologists than do phylogenetic approaches. Results: In order to ease the use of similarity networks in the community of evolutionary biologists, we introduce a software program, EGN, that runs under Linux or MacOSX. EGN automates the reconstruction of gene and genome networks from nucleic and proteic sequences. EGN also implements statistics describing genetic diversity in these samples, for various user-defined thresholds of similarities. In the interest of studying the complexity of evolutionary processes affecting microbial evolution, we applied EGN to a dataset of 571,044 proteic sequences from the three domains of life and from mobile elements. We observed that, in Borrelia, plasmids play a different role than in most other eubacteria. Rather than being genetic couriers involved in lateral gene transfer, Borrelia's plasmids and their genes act as private genetic goods, that contribute to the creation of genetic diversity within their parasitic hosts. Conclusion: EGN can be used for constructing, analyzing, and mining molecular datasets in evolutionary studies. The program can help increase our knowledge of the processes through which genes from distinct sources and/or from multiple genomes co-evolve in lineages of cellular organisms. © 2013 Halary et al.; licensee BioMed Central Ltd.
Halary, S., McInerney, J. O., Lopez, P., & Bapteste, E. (2013). EGN: A wizard for construction of gene and genome similarity networks. BMC Evolutionary Biology, 13(1). https://doi.org/10.1186/1471-2148-13-146