The genome of Bacillus coahuilensis reveals adaptations essential for survival in the relic of an ancient marine environment

by Luis David Alcaraz, Gabriela Olmedo, Germán Bonilla, René Cerritos, Gustavo Hernández, Alfredo Cruz, Enrique Ramírez, Catherine Putonti, Beatriz Jiménez, Eva Martínez, Varinia López, Jacqueline L Arvizu, Francisco Ayala, Francisco Razo, Juan Caballero, Janet Siefert, Luis Eguiarte, Jean-Philippe Vielle, Octavio Martínez, Valeria Souza, Alfredo Herrera-Estrella, Luis Herrera-Estrella
Proceedings of the National Academy of Sciences of the United States of America (2008)

Abstract

The Cuatro Ciénegas Basin (CCB) in the central part of the Chihuahuan desert (Coahuila, Mexico) hosts a wide diversity of microorganisms contained within springs thought to be geomorphological relics of an ancient sea. A major question remaining to be answered is whether bacteria from CCB are ancient marine bacteria that adapted to an oligotrophic system poor in NaCl, rich in sulfates, and with extremely low phosphorus levels (<0.3 μM). Here, we report the complete genome sequence of Bacillus coahuilensis, a sporulating bacterium isolated from the water column of a desiccation lagoon in CCB. At 3.35 Megabases this is the smallest genome sequenced to date of a Bacillus species and provides insights into the origin, evolution, and adaptation of B. coahuilensis to the CCB environment. We propose that the size and complexity of the B. coahuilensis genome reflect the adaptation of an ancient marine bacterium to a novel environment, providing support to a marine isolation origin hypothesis that is consistent with the geology of CCB. This genomic adaptation includes the acquisition through horizontal gene transfer of genes involved in phosphorus utilization efficiency and adaptation to high-light environments. The B. coahuilensis genome sequence also revealed important ecological features of the bacterial community in CCB and offers opportunities for a unique glimpse of a microbe-dominated world last seen in the Precambrian.

Luis David Alcaraz*, Gabriela Olmedo*, Germán Bonilla†, René Cerritos†, Gustavo Hernández‡, Alfredo Cruz‡,
Enrique Ramírez§, Catherine Putonti¶, Beatriz Jiménez*‡, Eva Martínez*, Varinia López*, Jacqueline L. Arvizu*,
Francisco Ayala*, Francisco Razo*, Juan Caballero*, Janet Siefert, Luis Eguiarte†, Jean-Philippe Vielle*‡,
Octavio Martínez*‡, Valeria Souza†, Alfredo Herrera-Estrella*‡, and Luis Herrera-Estrella*‡**
‡Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), *Departamento de Ingeniería Genética and §Departamento de Biotecnología,
Cinvestav, Campus Guanajuato, AP 629 Irapuato, Guanajuato CP36500, México; †Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad
Nacional Autónoma de México, CU, AP 70-275 Coyoacán 04510 México D.F., México; Department of Statistics, Rice University, P.O. Box 1892, MS-138, Houston, TX
77251-1892; and ¶Departments of Computer Science and Biology and Biochemistry, 4800 Cullen Boulevard, University of Houston, Houston, TX 77204-5001
Contributed by Luis Herrera-Estrella, January 31, 2008 (sent for review December 10, 2007)
evolution | genomic adaptation | horizontal gene transfer | phosphorus stress | sulfolipids
The Cuatro Ciénegas Basin (CCB) is located in a valley 740 m above sea level in the state of Coahuila, Mexico, that measures
30 km by 40 km and is surrounded by high mountains (3,000 m)
(Fig. 1). CCB is an enclosed evaporitic basin that receives 150 mm
of annual precipitation. Despite the dry climate of the valley, the
CCB harbors an extensive system of springs, streams, and pools (1).
The CCB ecosystem is characterized not only by a high endemism
of macroorganisms and biodiversity of microorganisms (1, 2), but
also by extremely oligotrophic waters that are unable to sustain algal
growth, making microbial mats the base of the food web (3). In
particular, phosphorus (P) levels in CCB appear to be very low:
they were below the level of detection of several methods
used (<0.3 μM), and extremely high biomass C:P and N:P ratios
(100 by moles) were previously reported for CCB stromatolites (3, 4).
Unlike the present sea, the Churince spring water is poor in NaCl
and carbonates, but it is rich in sulfates, magnesium, and calcium
(4). Characterization of the microbiological diversity by sequencing
16S rRNA genes revealed that nearly half of the phylotypes from
the CCB were closely related to bacteria from marine environments
(2). Bacillus coahuilensis is a free-living, spore-forming bacterium
isolated from the water column of a shallow desiccation lagoon in
the Churince system at CCB (4) (Fig. 1 A and B). A molecular
phylogenetic analysis of 16S rRNA sequences indicates that B.
coahuilensis is closely related to other marine Bacillus spp. (4), in
agreement with the theory of an ancient marine origin of these
ponds. We sequenced the genome of B. coahuilensis to gain insight
into the origin and genomic adaptation of this bacterium to the
extremely P-limited oligotrophic environment of the Churince pond
in CCB.
Results
General Genome Features. Sequencing of the B. coahuilensis genome
was accomplished by using a hybrid strategy of high-coverage
pyrosequencing (29×) and low-coverage Sanger sequencing (6×)
(supporting information (SI) Table S1). The assembled genome
size is 3.35 Megabases (Mb), making it the smallest genome
reported to date for Bacillus spp. (Fig. 1C). Its genome is composed
of a predicted 3,640 coding genes (5) and has a GC content of 37.5%
(Table S2), with an average of 1,100 orthologues shared with all 14
Bacillus genomes currently available (including Bacillus sp. NRRL
B-14911, a recently sequenced marine Bacillus from the Gulf of
Mexico) and 905 unique B. coahuilensis genes.
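The GC content quoted above is the share of unambiguous bases that are G or C. The following is a minimal sketch of that calculation; the function name and the toy sequence are illustrative assumptions and are not taken from the authors' annotation pipeline.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a DNA sequence, ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    # Only A, C, G, and T contribute to the denominator.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical 20-base fragment, used only to exercise the function.
toy_contig = "ATGCATTTAAGCGATATTAA"
print(f"{gc_content(toy_contig):.1f}% GC")
```

For a multi-contig assembly, the same ratio would be taken over the summed base counts of all contigs rather than averaged per contig.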
By using a phylogenomic approach (6), we reconstructed a
phylogeny through a concatenated alignment of 20 universally
distributed genes from the Clusters of Orthologous Groups (COGs)
that are considered largely unsusceptible to horizontal gene transfer
(6), and that are present in the 14 Bacillus spp. genomes (Table S3).
This analysis revealed that B. coahuilensis is basal to the other 14
Bacillus spp. genomes and more related to the marine strain NRRL
B-14911 than any other Bacillus spp. Additionally, the short
phylogenetic branch length suggests that the B. coahuilensis genome is
associated with a low substitution rate compared with the other
genomes, despite its small size (Fig. 1C). An extensive 16S analysis
shows that B. coahuilensis clearly forms a unique group, but also
supports a phylogenetic relation to marine isolates (Fig. S1).
However, at the genome level we found no signals that would
suggest the presence of bacteria closely related to B. coahuilensis in
the Global Ocean Sampling database (7) (data not shown). Taken
Author contributions: G.O., C.P., A.H.-E., and L.H.-E. designed research; L.D.A., G.H., A.C.,
E.R., B.J., E.M., V.L., J.L.A., F.A., and F.R. performed research; L.D.A., G.O., G.B., R.C., A.C.,
C.P., J.C., J.S., L.E., J.-P.V., V.S., O.M., and L.H.-E. analyzed data; and L.D.A., G.O., and L.H.-E.
wrote the paper.
The authors declare no conflict of interest.
Data deposition: The sequence from this Whole Genome Shotgun project has been depos-
ited in the DDBJ/EMBL/GenBank database (accession no. ABFU00000000). The version
described in this article is the first version (accession no. ABFU01000000).
**To whom correspondence should be addressed. E-mail: lherrera@ira.cinvestav.mx.
This article contains supporting information online at www.pnas.org/cgi/content/full/
0800981105/DCSupplemental.
© 2008 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0800981105 PNAS | April 15, 2008 | vol. 105 | no. 15 | 5803–5808
together, these results suggest that B. coahuilensis is ancestral to the
other sequenced Bacillus spp. and that the presence of this species
in the CCB did not originate from a recent migration through spore
dispersal from present marine habitats.
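The phylogenomic step described above joins per-gene alignments into one long alignment per taxon before tree building. The sketch below illustrates only that concatenation step, assuming each gene alignment is a dictionary from taxon name to an aligned sequence; the function name and toy data are hypothetical, and the tree inference itself (done with Tree-Puzzle according to the Fig. 1 legend) is not reproduced.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one supermatrix, keeping taxa present in every gene.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    """
    if not gene_alignments:
        raise ValueError("at least one gene alignment is required")

    # Keep only taxa that have a sequence in every gene alignment.
    shared = set(gene_alignments[0])
    for alignment in gene_alignments[1:]:
        shared &= set(alignment)

    supermatrix = {taxon: "" for taxon in sorted(shared)}
    for alignment in gene_alignments:
        # Within one gene, all aligned sequences must have the same length.
        lengths = {len(alignment[taxon]) for taxon in shared}
        if len(lengths) > 1:
            raise ValueError("sequences within a gene alignment differ in length")
        for taxon in supermatrix:
            supermatrix[taxon] += alignment[taxon]
    return supermatrix


# Toy data: two "genes" and three taxa, one of which lacks the second gene.
toy = [
    {"taxonA": "MK", "taxonB": "MR", "taxonC": "MK"},
    {"taxonA": "LL-", "taxonB": "LLV"},
]
print(concatenate_alignments(toy))
```

Dropping taxa that lack any gene mirrors the requirement that the selected genes be present in all compared genomes.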
Lipid Profile Adaptation to a Low-Phosphorus Environment. Membrane
phospholipids constitute 30% of the total phosphate in
most organisms. In plants and cyanobacteria subjected to phosphorus
deprivation, phospholipids can be replaced by non-P lipids
(such as sulfo- and galactolipids) to maintain membrane functionality
and integrity, and release P to sustain other P-requiring cellular
processes (8). Interestingly, genes encoding sulfoquinovose synthase
(sqd1) and glycosyltransferase (sqdX), the two key enzymes in
the synthesis of sulfolipids, are present in B. coahuilensis. Because
the sulfoquinovose synthesis operon is absent in all other known
Bacillus spp. genomes, this finding suggests that the adaptation of
B. coahuilensis to the extremely low P concentration of CCB
includes the acquisition of these genes through horizontal gene
transfer (HGT). The B. coahuilensis genes are closely related to
cyanobacterial sqd1 and sqdX (Fig. 2A and Fig. S2), and the operon
arrangement is identical to that in Synechococcus sp. PCC7942, where
these genes participate in the synthesis of sulfolipids (9) (Fig. 2B).
The structural prediction of the B. coahuilensis Sqd1 protein is
remarkably similar to the crystallized structure of Arabidopsis
thaliana SQD (1.2 Å; Fig. 2C); in particular, residues important for
the interactions with NAD and UDP-glucose are conserved in B.
coahuilensis. Reverse transcription PCR (RT-PCR) experiments
revealed that, in contrast to plants and cyanobacteria, sqd1 is
constitutively expressed and not induced by P limitation (Fig. 2D).
Thin layer chromatography revealed a spot in B. coahuilensis lipid
extracts with migration similar to that of pure sulfolipids, comparable
to that present in A. thaliana and in a cyanobacterium
(isolated from CCB) but absent in lipid preparations from Bacillus
sp. NRRL B-14911 (Fig. 2E). Mass spectrometry analysis confirmed
the presence of sulfolipids in B. coahuilensis (Fig. 2F and Fig.
S3). The acquisition of constitutively expressed genes
allowing B. coahuilensis to replace membrane phospholipids with
sulfolipids is consistent with genomic adaptation to extreme
phosphate limitation.
Light Sensing in a High-Radiation Environment. The presence of
genes encoding bacteriorhodopsin (BR) in B. coahuilensis is
reminiscent of the abundance of BR genes in marine environmental
samples (7), suggesting an additional adaptation of marine bacteria.
The phylogeny of B. coahuilensis sensory BR (BSR) showed that its
closest orthologue is the Anabaena sp. PCC7120 rhodopsin (ASR)
(Fig. 3A). ASR was the first eubacterial SR identified (10) and was
suggested to function as a photosensory receptor. The structural
prediction of BSR and its comparison with crystallized ASR and SR
of Natronomonas pharaonis (11, 12) indicate that all three proteins
contain seven conserved transmembrane helices and a Lys residue
that binds retinal (Fig. 3B). Alignment of ASR and BSR shows that
both have a Pro residue (BSR203 and ASR206) instead of the Asp
residue present in all other microbial rhodopsins (10). Evidence for
HGT of rhodopsins has recently been obtained from whole-genome
sequencing and metagenomic projects, and such transfer is now thought to be a
frequent event in marine bacteria from the photic zone and extreme
saline environments (13–15). The retinal chromophore of rhodopsin
is synthesized as a cleavage product of carotenoids; thus, the
combination of carotenoid synthesis and rhodopsin genes has been
suggested to be sufficient for rhodopsin function (14). The genome
of B. coahuilensis also contains genes encoding crtB (phytoene
synthase) and crtICA2 (phytoene dehydrogenases) that could be
involved in retinal biosynthesis (Fig. S4). RT-PCR experiments
showed that the expression of bsr is constitutive and not
light-dependent (Fig. 3C). The high-radiation exposure prevalent in
shallow waters of CCB could explain the selection pressure responsible
for the maintenance and constitutive expression of the bsr
gene.
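The residue statements above can be checked against the helix G alignment segments printed in Fig. 3B. The sketch below is illustrative only: it assumes the three segments are ungapped and that the number printed before each segment (189, 192, 187) is the position of its first residue; the helper name is hypothetical.

```python
# Helix G segments as printed in Fig. 3B: (position of first residue, sequence).
HELIX_G = {
    "BSR": (189, "LAQELTEVLVFIILPIFSKVGFSIVDLHGLRKL"),
    "ASR": (192, "WINQTIDTFLFCLLPFFSKVGFSFLDLHGLRNL"),
    "SRII": (187, "LLTPTVDVALIVYLDLVTKVGFGFIALDAAATR"),
}


def aligned_column(bsr_position):
    """Return the residue of each protein in the column holding the given BSR position."""
    bsr_start, bsr_seq = HELIX_G["BSR"]
    offset = bsr_position - bsr_start
    if not 0 <= offset < len(bsr_seq):
        raise IndexError("position lies outside the printed segment")
    # Assumes an ungapped alignment, so one offset applies to all three segments.
    return {name: seq[offset] for name, (_start, seq) in HELIX_G.items()}


print(aligned_column(203))  # {'BSR': 'P', 'ASR': 'P', 'SRII': 'D'}
print(aligned_column(207))  # {'BSR': 'K', 'ASR': 'K', 'SRII': 'K'}
```

Under these assumptions, the column at BSR203 holds Pro in BSR and ASR but Asp in the N. pharaonis SRII, and the column at BSR207 holds the Lys labeled in the figure.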
Nucleotide composition analysis (16) identified numerous
genomic islands containing genes likely to be acquired by HGT;
several have been annotated as HGT genes in B. subtilis and B.
halodurans (Fig. S5). Interestingly, our nucleotide composition
analysis did not identify bsr and sqd1 as HGT genes. We also
determined the codon adaptation index (CAI) (Fig. S6) for bsr and
sqd1, which is 0.719 and 0.735, respectively, indicating that these
genes have been present in B. coahuilensis long enough to undergo
amelioration to the average CAI (0.714).
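The CAI is conventionally defined as the geometric mean of per-codon weights, where each weight is a codon's frequency in a reference gene set divided by the frequency of the most used synonymous codon. The sketch below illustrates that definition on a toy subset of the genetic code. The three codon families, the reference sequence, and the 0.5 pseudocount for unobserved codons are assumptions made for illustration, and nothing here reproduces the reference set or software the authors used.

```python
import math
from collections import Counter

# Toy subset of the genetic code: synonymous codon families for Lys, Glu, and Phe only.
SYNONYMOUS_FAMILIES = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
}


def codons(gene):
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    usable = len(gene) - len(gene) % 3
    return [gene[i:i + 3].upper() for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes):
    """Weight of each codon: its count divided by the count of its family's top codon."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(codons(gene))
    weights = {}
    for family in SYNONYMOUS_FAMILIES.values():
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumed pseudocount of 0.5 so unobserved codons do not give a zero weight.
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(gene, weights):
    """Geometric mean of the weights of all scorable codons in the gene."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in gene")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set in which AAA, GAA, and TTT are the preferred codons.
reference = ["".join(["AAA", "AAA", "AAG", "GAA", "GAA", "GAG", "TTT", "TTT", "TTC"])]
w = relative_adaptiveness(reference)
print(round(cai("AAAGAATTT", w), 3), round(cai("AAGGAGTTC", w), 3))
```

On this scale, the reported values for bsr (0.719) and sqd1 (0.735) differ from the genome average (0.714) by 0.005 and 0.021, respectively.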
Nitrogen Cycle Strategies and Feeding Capabilities. To gain insight
into the nutritional requirements and metabolic capabilities of B.
coahuilensis within the CCB microbial community, we analyzed
genes involved in ABC transport systems, one of the largest
paralogous families present in the genome of this bacterium.
Hidden Markov model (HMM) profiles were built for each ABC
importer family (17) and searched against 11 Bacillus spp. genomes.
The analysis retrieved a total of 1,038 import systems. The two most
abundant families among all Bacillus spp. were metal (MET) and
osmoprotectant (OTCN) importers (Fig. S6), which seems to be a
[Figure 1. (A) Map of the CCB around Sierra de San Marcos. (B) Churince system. (C) Phylogenomic tree (scale bar 0.05) of B. anthracis Sterne, B. anthracis Ames, B. cereus E33L, B. thuringiensis konkukian, B. cereus ATCC 14579, B. halodurans, B. clausii, O. iheyensis, B. licheniformis 14580, B. licheniformis, B. subtilis, G. kaustophilus, B. sp. NRRL B-14911, B. coahuilensis, Listeria monocytogenes, and Listeria innocua, shown with genome size (Megabases, axis 0 to 6) and GC %.]
Fig. 1. Marine origin of B. coahuilensis, isolated from a pond in the Chihuahuan
desert. (A) Sierra de San Marcos is a prominent mountain system in the
middle of CCB where 400 ponds sustain most of the biodiversity. The
geomorphological origin of CCB has been recently reviewed (2). (B) Churince
system (shown with a red triangle in A) consists of a springhead that feeds a
2-km-long stream with an intermediate lagoon terminating at a large shallow
desiccation lagoon. (C) Phylogenomic reconstruction. Maximum likelihood
phylogenomic reconstruction by using Tree-Puzzle (29) was carried out with
20 universally conserved COGs from the sequenced Bacillus spp. and closely
related species (Table S3). Maximum likelihood bootstrap percentage support
values are only indicated for major nodes; numbers in red represent a tentative
timescale, in million years, calculated with the method proposed by Battistuzzi
et al. (30). Genome size (represented as bars) and GC content of each Bacillus
spp. genome are shown to the right.
characteristic of the group. B. coahuilensis returned 63 import
systems, making it one of the genomes with the lowest absolute number
of import systems (Table S4), below the expected values for the
group. Nonetheless, the best represented families in B. coahuilensis
are also MET and OTCN. B. coahuilensis possesses a very high
proportion of iron-siderophore (ISVH) importers relative to genome
size, a feature shared with the water-column marine bacilli
Oceanobacillus iheyensis and Bacillus sp. NRRL B-14911. In addition,
an operon coding for the ferric-enterobactin synthesis and
transport system fepBCD is shared with O. iheyensis. It also encodes
iron(III)-dicitrate and ferric enterobactin transporters, suggesting
that marine bacilli and B. coahuilensis actively scavenge for
iron.
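Comparisons "relative to genome size" amount to dividing a count by the genome length. The sketch below works through that normalization using only numbers stated in the text (1,038 import systems across 11 genomes; 63 systems and 3.35 Mb for B. coahuilensis). The function name is hypothetical, and treating the 1,038 total as including B. coahuilensis is an assumption.

```python
def importers_per_mb(n_importers, genome_size_mb):
    """Number of import systems per megabase of genome."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return n_importers / genome_size_mb


TOTAL_IMPORTERS = 1038   # all import systems retrieved by the HMM search
GENOMES_SEARCHED = 11    # Bacillus spp. genomes searched
BC_IMPORTERS = 63        # import systems found in B. coahuilensis
BC_GENOME_MB = 3.35      # assembled genome size

mean_per_genome = TOTAL_IMPORTERS / GENOMES_SEARCHED
print(f"mean per genome: {mean_per_genome:.1f}")
print(f"B. coahuilensis vs mean: {BC_IMPORTERS / mean_per_genome:.2f}")
print(f"B. coahuilensis density: {importers_per_mb(BC_IMPORTERS, BC_GENOME_MB):.1f} per Mb")
```

The same division applied per importer family (for example ISVH) would give the family-level proportions discussed above; those per-family counts are in Table S4 and are not reproduced here.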
The proportion of polar amino acid and opine (PAO), as well as
D-L-methionine (DLM), importers relative to the total number of
transporters and genome size is greater in B. coahuilensis than in
any other Bacillus. It also has one of the lowest ratios of oligopeptide
(OPN) importers, suggesting a specialization for the preferential
acquisition of single amino acids over oligopeptides. Experimental
results show that B. coahuilensis has an absolute requirement for 8
aa (four polar and three hydrophobic) and a partial requirement for
another 5 aa (three hydrophobic and two polar), confirming that
this bacterium depends on amino acid import (Table S4 and Fig.
S6). This overrepresentation of single-amino acid importers is
shared with the recently sequenced water-living beta-proteobacterium
Minibacterium massiliensis (18), suggesting that this feature
might be common in the reduced genomes of aquatic free-living
bacteria.
HMM profiles were built for 86 genes involved in the N2 cycle
(data not shown) and were searched against all sequenced Bacillus
spp. genomes. This revealed that B. coahuilensis has the lowest
number of N2 cycle enzymes of all sequenced Bacillus spp., lacking
an inorganic N pathway and most of the urea cycle and urea
degradation pathways (Fig. 4). It only possesses the canonical
transaminases, an ammonia transporter and a D-Ala transporter,
thus not being capable of performing the entire N cycle on its own.
[Figure 2. (A) SQD1 phylogeny (scale bar 0.1) with taxa grouped as Firmicutes, Cyanobacteria, Actinobacteria, Archaea, noncultivable, and plants and algae. (B) Operon maps of sqd1 and sqdX in Bacillus coahuilensis (with flanking ppiB and egsA) and Synechococcus sp. (C) B. coahuilensis SQD1 model and Arabidopsis thaliana SQD1 crystal structure with UDP-glucose, NAD, and SO4; labeled residues Tyr 200/182, His 201/183, Thr 163/145, Lys 204/186. (D) RT-PCR gel, lanes 1 to 6. (E) TLC plate, lanes 1 to 4, with MGDG, SQDG, and DGDG bands. (F) Mass spectrum (m/z 0 to 900) with labeled peaks at 270.98 (A), 329.01 (B), 569.19 (C2), and 597.23 (C1).]
Fig. 2. Acquisition of sulfoquinovose synthesis capabilities through HGT, an adaptation of B. coahuilensis to a phosphorus-limiting environment. (A)
Neighbor-joining phylogenetic reconstruction of SQD1 (31). (B) The operon structure of sqd1 and sqdX resembles that of the Synechococcus genes (9).
Flanking genes are ppi (peptidyl-prolyl cis-trans isomerase B) and egsA (glycerol-1-phosphate dehydrogenase) (not to scale). (C) Modeling (32) to the A. thaliana
SQD (PDB ID code 1I24) protein shows conservation of the NAD and UDP-glucose-binding residues. Diagrammatic representation was done by using PyMol
(http://www.pymol.org). (D) Expression of the sqd1 gene. RT-PCR was carried out from RNA of B. coahuilensis grown under different phosphate concentrations
(lanes 1 to 5, RT-PCR products obtained from cells cultured with 0.001, 0.005, 0.05, 0.5, and 5 mM phosphate, respectively; lane 6, control without reverse
transcriptase). (E) TLC analysis reveals the presence of a probable SQDG band in B. coahuilensis, A. thaliana, and Cyanobacteria sp. (lanes 1 to 3) that is absent in Bacillus
sp. NRRL B-14911 (lane 4). (F) Mass spectrometric analysis confirmed the identity of the sulfoquinovoside (Fig. S3).
B. coahuilensis codes for rocF, an arginase mainly found in Firmicutes,
but not for any other urea-cycle enzyme. All sequenced
Bacillus spp. code for a complete urea cycle (argFGH and rocF),
with the notable exception of O. iheyensis. Moreover, growth of B.
coahuilensis is impaired, but not completely arrested, when grown
without arginine, and this amino acid is one of those transported by
the overrepresented import family PAO. B. coahuilensis also lacks
the other gene that could provide a source of arginine from
citrulline, arcA. This suggests that arginine is an important amino
acid in this bacterium, and that it most likely synthesizes arginine
by employing an unknown pathway. It has been reported, for example,
that Pseudomonas spp. synthesize arginine by an arginine racemase
and a D-arginine dehydrogenase (19).
In summary, B. coahuilensis exhibits numerous auxotrophies and
high dependence on the N2 cycle, which is mainly carried out in the
CCB ponds by cyanobacteria, and is therefore also likely to be
highly dependent on the microbial community within this environment.
This is in contrast with the genome from the marine strain
NRRL B-14911, which exhibits partial requirement for only 2 aa
and appears well suited to perform most of its N2 cycle on its own
(Fig. 4 and Table S5).
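The urea-cycle comparison above is a presence/absence check of a fixed gene set against each genome's gene list. A minimal sketch follows; the gene set (argF, argG, argH, rocF) and the B. coahuilensis result come from the text, while the function name and the second, complete genome are hypothetical.

```python
# Urea-cycle genes named in the text (argFGH and rocF).
UREA_CYCLE = frozenset({"argF", "argG", "argH", "rocF"})


def missing_genes(genome_genes, pathway=UREA_CYCLE):
    """Return the pathway genes that are absent from a genome's gene list, sorted by name."""
    return sorted(pathway - set(genome_genes))


# Only the urea-cycle genes are listed; other genes are irrelevant to the check.
genomes = {
    "B. coahuilensis": {"rocF"},                                 # as stated in the text
    "hypothetical complete genome": {"argF", "argG", "argH", "rocF"},
}

for name, genes in genomes.items():
    absent = missing_genes(genes)
    status = "complete" if not absent else "missing " + ", ".join(absent)
    print(f"{name}: {status}")
```

The HMM searches reported in Fig. 4 supply the real per-genome gene lists; this check only summarizes such a list once it exists.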
Genome Size Evolution and Essential Genes. The basal position of B.
coahuilensis (Fig. 1C) suggests that it is ancestral to the other
sequenced Bacillus spp. and related species. Although most of the
essential genes (20) of B. subtilis are conserved in B. coahuilensis,
many of those with redundant functions are absent (Table S6, Table
S7, and Table S8). Genes for endospore formation are conserved,
but genes encoding components of the spore coat are
significantly underrepresented (Table S6), although this does not
compromise formation of heat-resistant spores. We also carried out
an analysis of the largest paralogous gene families within B.
coahuilensis compared with those in B. subtilis (21), as a means of
identifying important functions that are maintained in a small
genome. Although B. subtilis has 13 paralogous gene families of 5
members involved in secondary metabolite biosynthesis, transport,
and catabolism, these are almost completely absent in B. coahuilensis
(COG Q category, 1 gene family of 5 members). Signal
transduction, in contrast, seems to be overrepresented in B.
coahuilensis (COG T category, 57 gene families of 5 members)
compared with the same category in B. subtilis (1 gene family of 5
members), suggesting that environmental monitoring is key to
survival in the CCB environment (Fig. S7). Analysis of all known
B. subtilis genes involved in the synthesis of cell wall and membrane
components reveals that B. coahuilensis also lacks genes necessary
for the synthesis of polyphosphate-rich teichoic and polyanion
teichuronic acids, useful cell wall phosphorus storage compounds.
Because teichoic acid synthesis genes are present not only in several
Bacillus species, but also in Listeria spp., which are used as an
outgroup for comparing the Bacillus genus (Table S7), it is likely
that B. coahuilensis lost the capacity to produce teichoic acid
because of the P-limited environment of CCB. Of the 45 genes that
[Figure 3. (A) Rhodopsin phylogeny (scale bar 0.2) with clades labeled proton-pump rhodopsins, halorhodopsins, and sensory rhodopsins; taxa include Bacillus coahuilensis Sop and Nostoc sp. PCC 7120 Sop. (B) Alignment segments (below) and the B. coahuilensis SOP model with the Nostoc sp. PCC 7120 SOP crystal structure and retinal; helices A to G are labeled, with residues Phe 210/213, Leu 202/205, Lys 207/210, and Pro 203/206. (C) RT-PCR gel, lanes 1 to 4.]
Helix C
BSR  64  TTIYYARYIDWVISTPLLLAALALTAMFGGKKN
ASR  66  QIAHYARYIDWMVTTPLLLLSLSWTAMQFIKKD
SRII 66  RTVFVPRYIDWILTTPLIVYFLGLLAGLDSREF
Helix F
BSR  156 TRLARHYTRVAIYLSVLWVCYPTAWLLGPSGLG
ASR  159 SELANLYDKLVTYFTVLWIGYPIVWIIGPSGFG
SRII 154 SGIKSLYVRLRNLTVVLWAIYPFIWLLGPPGVA
Helix G
BSR  189 LAQELTEVLVFIILPIFSKVGFSIVDLHGLRKL
ASR  192 WINQTIDTFLFCLLPFFSKVGFSFLDLHGLRNL
SRII 187 LLTPTVDVALIVYLDLVTKVGFGFIALDAAATR
Fig. 3. B. coahuilensis contains a sensory bacteriorhodopsin possibly ac-
quired from cyanobacteria through an ancient HGT event. (A) Neighbor-
joining tree showing the phylogenetic diversity of rhodopsins (31). (B) Bsr
possesses all of the residues involved in retinal binding and was modeled (32)
to the predicted structures of Anabaena and Natronomonas pharaonis. Align-
ment of segments of SR from B. coahuilensis (BSR, BM4401574), Anabaena sp.
PCC7120 (ASR, PDB ID code 1XIO), and Natronomonas pharaonis DSM2160
(SRII, PDB ID code 1JGJ) show conservation of residues in the retinal-binding
pocket (marked with asterisks) except for a Pro residue (positions BSR203 and
ASR206), which is an Asp residue in all other microbial rhodopsins (blue
rectangle; alignment adapted from ref. 10). Diagrammatic representation
was done by using PyMol (http://www.pymol.org). (C) bsr is expressed in B.
coahuilensis grown either under white light or in the dark. RT-PCR was carried
out by using RNA obtained from bacteria grown under dark or white-light
conditions (lanes 1 and 2, respectively). Lanes 3 and 4 are controls without
reverse transcriptase.
[Figure 4. Presence/absence chart of genes for nitrogen-metabolism enzymes in B. anthracis str. Ames, B. cereus E33L, B. thuringiensis konkukian, B. licheniformis ATCC14580, B. subtilis str. 168, B. clausii KSM-16, B. halodurans C-125, O. iheyensis HTE831, G. kaustophilus HTA426, B. sp. NRRL B14911, and B. coahuilensis.]
Fig. 4. B. coahuilensis lacks many of the genes coding for enzymes involved
in nitrogen metabolism. Hidden Markov models for the bacterial enzymes
involved in N2 metabolism were constructed to detect these genes in all of the
sequenced Bacillus spp. Bars in different colors denote the presence of a gene
predicted to code for a given enzyme.
constitute the PhoP/PhoR regulon in B. subtilis, only 24 are present
in B. coahuilensis (Table S8). B. coahuilensis lacks the pit gene
encoding a low-affinity P transporter that is conserved in Listeria
spp. and many Bacillus spp. (except for B. halodurans, B. clausii, and
O. iheyensis), suggesting that pit was also lost in B. coahuilensis
because it provided no advantage in the CCB low-P environment.
Interestingly, the pit gene is also absent in the marine cyanobacterial
genomes (22). Overall, several genomic characteristics seem to be
the result of evolutionary mechanisms that caused a reduction in
genome size and selected against redundant genes and genes that
provided little or no advantage for survival in the oligotrophic
environment of the CCB pools. However, given the basal position
of B. coahuilensis in the phylogenomic analysis, it is also possible
that some of the genes it is lacking, but which are present in other
Bacillus spp., may represent gains by bacilli species that evolved
later.
Discussion
P limitation seems to be an important driving force for the
adaptations observed in the B. coahuilensis genome. This bacterium
has the capability of synthesizing membrane sulfolipids, apparently
acquired by HGT from a cyanobacterium. Sulfolipid SQDG is present
in all higher plants, mosses, ferns, and algae, but it has also been
reported in nonphotosynthetic organisms, predominantly in
cyanobacteria (Fig. 3A). Discovery of this capability in the bacilli is
especially interesting given the context of an environment with
extremely limiting P. HGT of sqd1 and sqdX seems to be frequent
in cyanobacteria and, because it involves transfer of only a couple
of genes, it seems relatively simple for a bacterium to appropriate this
mechanism to cope with P scarcity. Finally, genome-size reduction
in an ecosystem with very low P availability, such as the CCB
pools, could increase the fitness of a bacterium with a reduced P
demand for nucleotide synthesis.
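The link between genome size and P demand can be made concrete: every nucleotide in DNA carries one phosphate group, so a double-stranded chromosome holds two phosphate groups per base pair. The sketch below works through that arithmetic for the 3.35-Mb genome. The 4.2-Mb comparison genome is an assumed figure chosen for illustration (roughly the size of a typical B. subtilis genome) and is not stated in the text.

```python
def phosphate_groups_per_chromosome(genome_size_bp):
    """Phosphate groups in one double-stranded chromosome copy (two per base pair)."""
    if genome_size_bp < 0:
        raise ValueError("genome size cannot be negative")
    return 2 * genome_size_bp


B_COAHUILENSIS_BP = 3_350_000   # 3.35 Mb, from the text
COMPARISON_BP = 4_200_000       # assumed comparison genome, not from the text

small = phosphate_groups_per_chromosome(B_COAHUILENSIS_BP)
large = phosphate_groups_per_chromosome(COMPARISON_BP)
saving = 1 - small / large

print(f"{small:,} vs {large:,} phosphate groups per chromosome copy")
print(f"relative saving: {saving:.1%}")
```

This counts only chromosomal DNA; RNA and phospholipids, which the text treats separately, are outside the sketch.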
The predicted rhodopsin sequence of B. coahuilensis is
phylogenetically closer to sequences from cyanobacteria than to those
reported in other bacteria and archaebacteria. Because retinal,
which is derived from carotenoids, is probably readily available in
many of the pigmented CCB bacteria, the acquisition of rhodopsin
genes from cyanobacteria could easily lead to a new functional
adaptation. The B. coahuilensis genome encodes a couple of photolyases
that repair UV-induced DNA lesions and could explain, along with
the bsr gene, the adaptation of B. coahuilensis to the high light
fluence in the shallow and clear waters of the CCB ponds.
The incidence of both rhodopsin and sulfolipid biosynthesis
genes in environmental samples is spatially restricted. We searched
the CAMERA database (23) for homologues of the B.
coahuilensis bsr and sqd1 genes and found only four sample points within
the Global Ocean Sampling (GOS) having both genes represented:
GS027, GS031, GS031, and GS034, which are located in shallow
waters of the Galapagos Islands. Two important common features
between CCB ponds and the Galapagos Islands are a high-radiation
environment in shallow waters and the lack of P. The fact that sqd1
and bsr coexist in two distinct, geographically distant locations with
common environmental characteristics is suggestive of common
adaptation strategies. P starvation seems to be a general constraint
in marine environments, as shown in ref. 7.
Analysis of paralogous gene families revealed a dramatic limitation
in genes for secondary metabolism in B. coahuilensis, possibly
as a result of both genome-size reduction and adaptation to a
unique niche (24, 25). The large number of auxotrophies, the
limitation in nitrogen cycle genes (Fig. 4), and overrepresented signal
transduction genes suggest that this bacterium is a specialist within
the CCB water systems, dependent on and tightly bound to the primary
producers and prepared for sensing and responding to the extreme
seasonal environmental conditions (2). This is in contrast to pandemic
and generalist bacteria like B. subtilis with its arsenal of
secondary metabolism genes.
Our findings show that HGT played a key role in the adaptations
of B. coahuilensis. We are currently exploring whether transposons
(of which there are 20 in B. coahuilensis), phages [known to be
abundant in ponds of the CCB (26)], and natural competence could
provide the mechanisms driving changes in the B. coahuilensis
genome as well as in the other bacteria in the community. We have
isolated numerous cyanobacteria in the Churince pond (27).
The strategy of sequencing a single bacterial isolate to obtain
information on the adaptations of bacteria living in a highly
oligotrophic environment was highly fruitful because it led to a clear
identification of the phylogenetic affiliation of some ecological
functions. A metagenome approach might have obscured the
phylogenetic association of genes of cyanobacterial origin to firmi-
cutes and would not have allowed us to observe the important
nitrogen cycle and amino acid synthesis limitations that make this
bacterium dependent on the microbial community. The small
genome size, constraints in secondary metabolism, and overrepre-
sentation of signal transduction genes are also features that could
not have been deduced from a metagenomic approach. Efforts to
study the metagenomics of the CCB ponds should be made to help
us understand how these ancient water ecosystems in the middle of
the desert are self-sufficient in their biogeochemical cycles.
Our results provide evidence that, because of the specific genome
dynamics, its ancestry, and the local adaptive response, B.
coahuilensis is most likely the result of adaptation of an ancient
marine bacterium to a novel environment. These results are in
agreement with the geology/ontology model that predicts a marine
environment for this region in the early Jurassic followed by the
rising of the continent, the formation of the CCB valley, and its
isolation by the surrounding Sierras in the Cretaceous period 70
million years ago (2). B. coahuilensis is likely a primitive bacterial
component of a complex community that included Archaea and
Cyanobacteria that provided genomic fodder for gene transfer and
the implementation of innovative and necessary strategies for
survival in an evolving ecology.
Materials and Methods
Strains. B. coahuilensis was provided by V.S. (4) and NRRL B-14911 by J.S. The
media used for growth are described in SI Text.
Genome Sequencing and Annotation. B. coahuilensis was sequenced by a hybrid
Sanger/454 approach. The entire genome sequence was obtained from a com-
bination of 16,698 end sequences (providing 6-fold coverage) from a pUC18
genomic shotgun library (2–5 kb), by using dye terminator chemistry on auto-
mated DNA sequencers (ABI3700; Applied Biosystems) and 454 technology with
seven runs at a 29-fold coverage. Predicted protein-encoding genes were man-
ually refined (see SI Text) and automatically annotated by using the BASys
system (5).
Prediction of Horizontal Transfer. The Similarity Plot (S-plot) application was used
(16) to identify windows that contained regions of unusual compositional properties
(RUCPs) within the B. coahuilensis genome (see SI Text).
RT-PCR. Semiquantitative RT-PCRs were carried out by using SuperScript One Step
RT-PCR with Platinum Taq (Invitrogen Life Technologies) (see SI Text) from RNA
isolated from strains grown in modified marine medium supplemented with
phosphate. For light/dark experiments, the strain was grown on Petri dishes with
marine medium at 37°C either under white or blue light or in the dark.
Lipid Extraction and Analysis. Lipids from Arabidopsis, Cyanobacteria spp., and
B. coahuilensis were extracted (details are available on request) and analyzed by
using the TLC technique as described in ref. 28. For lipid footprint analysis,
individual lipids were isolated from TLC plates, and duplicates of each lipid spot
were analyzed by MALDI-TOF MS technology (see SI Text).
ACKNOWLEDGMENTS. We thank Laura Espinosa Azuar and Antonio Cruz for
technical assistance at Instituto de Ecología, Universidad Nacional Autónoma
de México; Dr. Michael Travisano (University of Minnesota) and June Simpson
(Cinvestav Campus Guanajuato) for their comments on the manuscript. This
work was supported in part by Secretaría de Agricultura, Ganadería, Desarrollo
Rural, Pesca y Alimentación (SAGARPA) Zea-2006 and Consejo Nacional
de Ciencia y Tecnología (CONACyT)-Secretaría de Educación Pública 43979
grants and Howard Hughes Medical Institute Grant 55005946 (to L.H.-E.);
CONACyT IN Grants 223105 and 44673Q (to V.S. and L.E.); CONACyT-Secretaría
de Medio Ambiente y Recursos Naturales Grant 2002-C01-0237 (to V.S.);
Exobiology Grant NG04GJ12G (to J.S.); and graduate scholarships from
CONACyT (to L.D.A., R.C., and J.C.).
Alcaraz et al. PNAS, April 15, 2008, vol. 105, no. 15, p. 5807
1. Minckley WL (1969) Environments of the Bolson of Cuatro Ciénegas, Coahuila, México,
with Special Reference to the Aquatic Biota. University of Texas: El Paso Science Series
(Texas Western Press, El Paso, TX).
2. Souza V, et al. (2006) An endangered oasis of aquatic microbial biodiversity in the
Chihuahuan desert. Proc Natl Acad Sci USA 103:6565–6570.
3. Elser JJ, et al. (2005) Effects of phosphorus enrichment and grazing snails on modern
stromatolitic microbial communities. Freshwater Biol 50:1808–1825.
4. Cerritos R, et al. (2008) Bacillus coahuilensis sp. nov. a new moderately halophilic
species from different pozas in the Cuatro Ciénegas Valley in Coahuila, México. Int J
Syst Evol Microbiol 58:919–923.
5. Van Domselaar GH, et al. (2005) BASys: A web server for automated bacterial genome
annotation. Nucleic Acids Res 33:W455–W459.
6. Ciccarelli FD, et al. (2006) Toward automatic reconstruction of a highly resolved tree of
life. Science 311:1283–1287.
7. Rusch DB, et al. (2007) The Sorcerer II Global Ocean Sampling Expedition: Northwest
Atlantic through Eastern Tropical Pacific. PLoS Biol 5:e77.
8. Dormann P, Benning C (2002) Galactolipids rule in seed plants. Trends Plant Sci
7:112–118.
9. Benning C (1998) Biosynthesis and function of the sulfolipid sulfoquinovosyl diacyl-
glycerol. Annu Rev Plant Physiol Plant Mol Biol 49:53–75.
10. Jung KH, Trivedi VD, Spudich JL (2003) Demonstration of a sensory rhodopsin in
eubacteria. Mol Microbiol 47:1513–1522.
11. Vogeley L, et al. (2004) Anabaena sensory rhodopsin: A photochromic color sensor at
2.0 Å. Science 306:1390–1393.
12. Luecke H, Schobert B, Lanyi JK, Spudich EN, Spudich JL (2001) Crystal structure of
sensory rhodopsin II at 2.4 angstroms: Insights into color tuning and transducer
interaction. Science 293:1499–1503.
13. Mongodin EF, et al. (2005) The genome of Salinibacter ruber: Convergence and gene
exchange among hyperhalophilic bacteria and archaea. Proc Natl Acad Sci USA
102:18147–18152.
14. Frigaard NU, Martinez A, Mincer TJ, DeLong EF (2006) Proteorhodopsin lateral gene
transfer between marine planktonic Bacteria and Archaea. Nature 439:847–850.
15. Venter JC, et al. (2004) Environmental genome shotgun sequencing of the Sargasso
Sea. Science 304:66–74.
16. Putonti C, et al. (2006) A computational tool for the genomic identification of regions
of unusual compositional properties and its utilization in the detection of horizontally
transferred sequences. Mol Biol Evol 23:1863–1868.
17. Bouige P, Laurent D, Piloyan L, Dassa E (2002) Phylogenetic and functional classification
of ATP-binding cassette (ABC) systems. Curr Protein Pept Sci 3:541–559.
18. Audic S, et al. (2007) Genome analysis of Minibacterium massiliensis highlights the
convergent evolution of water-living bacteria. PLoS Genet 3:e138.
19. Jann A, Matsumoto H, Haas D (1988) The fourth arginine catabolic pathway of
Pseudomonas aeruginosa. J Gen Microbiol 134:1043–1053.
20. Kobayashi K, et al. (2003) Essential Bacillus subtilis genes. Proc Natl Acad Sci USA
100:4678–4683.
21. Pushker R, Mira A, Rodriguez-Valera F (2004) Comparative genomics of gene-family
size in closely related bacteria. Genome Biol 5:R27.
22. Su Z, et al. (2003) Computational inference of regulatory pathways in microbes: An
application to phosphorus assimilation pathways in Synechococcus sp. WH8102. Ge-
nome Inform 14:3–13.
23. Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M (2007) CAMERA: A community
resource for metagenomics. PLoS Biol 5:e75.
24. Mira A, Pushker R (2005) The silencing of pseudogenes. Mol Biol Evol 22:2135–2138.
25. Mira A, Pushker R, Rodriguez-Valera F (2006) The Neolithic revolution of bacterial
genomes. Trends Microbiol 14:200–206.
26. Desnues C, et al. (2008) Biodiversity and biogeography of phages in modern stroma-
tolites and thrombolites. Nature 452:340–343.
27. Falcon LI, Cerritos R, Eguiarte LE, Souza V (2007) Nitrogen fixation in microbial mat and
stromatolite communities from Cuatro Cienegas, Mexico. Microb Ecol 54:363–373.
28. Hartel H, Dormann P, Benning C (2000) DGD1-independent biosynthesis of extraplas-
tidic galactolipids after phosphate deprivation in Arabidopsis. Proc Natl Acad Sci USA
97:10649–10654.
29. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: Maximum
likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics
18:502–504.
30. Battistuzzi FU, Feijao A, Hedges SB (2004) A genomic timescale of prokaryote evolution:
Insights into the origin of methanogenesis, phototrophy, and the colonization of
land. BMC Evol Biol 4:44.
31. Kumar S, Tamura K, Nei M (2004) MEGA3: Integrated software for molecular evolutionary
genetics analysis and sequence alignment. Brief Bioinform 5:150–163.
32. Schwede T, Kopp J, Guex N, Peitsch MC (2003) SWISS-MODEL: An automated protein
homology-modeling server. Nucleic Acids Res 31:3381–3385.
Supporting Information
Alcaraz et al. 10.1073/pnas.0800981105
Supporting Text
Strains and Media. Bacillus coahuilensis was isolated by the group
of V.S. from water column in the desiccation pond named
Laguna Grande of the Churince system. NRRL B-14911 was
isolated from water columns in the Gulf of Mexico by J.S. Marine
medium (US Biological) was used for general growth purposes
and a modified marine medium was used to determine amino
acid requirements and growth under different phosphate con-
centrations. This modified marine medium contained 0.5%
casein hydrolysate (Difco), 5.48 mM sorbitol, 0.41 mM Fe (III)
citrate, 40 mM MgSO4·7H2O, 0.33 M NaCl, 9 mM CaCl2, 0.67
mM KCl, 0.67 mM KBr, 0.47 mM NaF, 0.35 mM H3BO3, 0.19
mM (NH4)NO3, 4.7g/ml vitamin B complex. The concentration
of Na2HPO4 was varied as indicated. pH was adjusted to 7 with
10 M NaOH.
Genome Sequencing. The genomic DNA of M44 was isolated, by
using a standard technique, from pure cells cultured in marine
medium (US Biological). Shotgun libraries were prepared using
Escherichia coli DH12S, a host suited for the stable cloning of
methylated DNA (Invitrogen). The entire genome sequence was
obtained from a combination of 16,698 end sequences (providing
6-fold coverage) from a pUC18 genomic shotgun library (2–5 kb)
using dye terminator chemistry on automated DNA sequencers
(ABI3700, Applied Biosystems) and 454 technology with seven
runs and a 29-fold coverage. Synteny-guided gap closure for
some contigs was performed by PCR direct sequencing using
primers designed to anneal to each end of the neighboring
contigs. Tail PCR was carried out from ends of contigs for which
we had no synteny information. Fifteen scaffolds were assem-
bled. The ends of 43 contigs consisted of highly similar repeat
sequences and we did not attempt to determine their exact order
because these sequences did not provide new gene information.
A pseudogenome was assembled for the annotation process.
Genome Analysis. Two independent assemblies were performed,
one of the 454 reads by using the Newbler assembler
(http://www.454.com/enabling-technology/the-software.asp) and one
of the Sanger reads by using Phred-Phrap (1), with
default parameters in both cases. With the two sets of assemblies,
all-versus-all alignments were performed with the MUMmer
package (2) by using regions with a minimum overlap of 100 bp.
A new consensus assembly of the hybrid sequences was conducted
by using the CAP3 assembler (3) and manually curated.
To assess the orientation and synteny conservation of the newly
assembled fragments, we made Promer alignments (2) and plots
versus B. cereus E33L, B. cereus ATCC 10987, B. cereus ATCC
14579, B. anthracis str. ‘Ames,’ B. anthracis str. ‘Ames ancestor,’
B. anthracis str. Sterne, B. thuringiensis serovar konkukian, B.
licheniformis ATCC 14580, B. halodurans C-125, Geobacillus
kaustophilus HTA426, B. subtilis str. 168, Oceanobacillus iheyensis
HTE831, B. clausii KSM-K16, and Bacillus sp. NRRL B-14911.
Gene prediction was done by using Glimmer v3.02 (4) and
GeneMark.hmm (5) for prokaryotes, followed by the automatic
annotation pipeline using BASys system (6), which retrieves
sequence information from SWISS PROT, InterPro, Pfam,
PROSITE, COG, GO, KEGG, and NCBI NR databases, among
other databases. Transfer RNAs were detected by using
tRNAscan-SE (7). Annotations were checked manually and
frameshift corrections were performed by using FrameD (8)
along with the original reads. Extra annotation of small (<100
bp) ORFs of known function was done manually. Sequence
manipulation, parsing, and statistical features like genome size,
GC%, codon usage, and CAI were obtained through the EM-
BOSS 3.0 package (9) and Perl scripts.
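The Codon Adaptation Index mentioned above can be illustrated with a minimal sketch. This is not the EMBOSS implementation: the reference set, the 0.5 pseudocount for unobserved codons, and all function names are assumptions made for illustration. Each codon receives a relative adaptiveness (its count divided by the count of the most used synonymous codon in a reference set), and the CAI of an ORF is the geometric mean of those values.

```python
from collections import Counter, defaultdict
from math import exp, log

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(orf):
    """Split an ORF into consecutive codons, ignoring a trailing partial codon."""
    usable = len(orf) - len(orf) % 3
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_orfs, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon.

    The pseudocount for unobserved codons is an assumption of this sketch.
    """
    counts = Counter(c for orf in reference_orfs for c in codons(orf))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:  # skip stops, Met, and Trp
            continue
        top = max(counts[c] for c in family)
        if top == 0:  # amino acid absent from the reference set
            continue
        for c in family:
            weights[c] = max(counts[c], pseudocount) / top
    return weights


def cai(orf, weights):
    """Geometric mean of the weights of the scorable codons in an ORF."""
    logs = [log(weights[c]) for c in codons(orf) if c in weights]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set: Lys uses AAA twice and AAG once,
# Glu uses GAA three times and GAG once.
reference = ["AAA" * 2 + "AAG" + "GAA" * 3 + "GAG"]
w = relative_adaptiveness(reference)
preferred = cai("AAAGAA", w)  # only the most used codons
rare = cai("AAGGAG", w)       # only the less used codons
print(round(preferred, 3), round(rare, 3))
```

In this toy case the ORF built from preferred codons scores 1.0 and the one built from rarer codons scores lower, which is the sense in which a below-average CAI can flag a gene whose codon usage differs from the rest of the genome.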
Phylogenomics analyses were done according to Ciccarelli et
al. (10) after first searching all of the universally conserved
COGs across all of the sequenced Bacillus spp. resulting in a list
of 20 genes (see supporting information (SI) Table S3). Each
translated sequence was aligned individually first with MUSCLE
(11) and then concatenated. Phylogeny was reconstructed using
TREE-PUZZLE (12) with the following criteria: quartet puzzling,
approximate quartet likelihood, 1,000 puzzling steps, exact parameter
estimates, WAG (13) model of substitution, and gamma-distributed
rates estimated from the dataset.
Orthologues between the Bacillus spp. were obtained through
best bidirectional blastp (14) with an E-value cutoff of 1e-10 and at
least 70% of the length of the product. Paralogous analysis was
done as described by Pushker et al. (15). Metabolic pathway
reconstruction was done by using KEGG Automated Annota-
tion Server (http://www.genome.ad.jp/kegg/kaas/help.html).
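The best bidirectional hit criterion described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the `Hit` record, the gene identifiers, and the choice to measure the 70% coverage against the query length and to rank hits by bit score are assumptions.

```python
from collections import namedtuple

# Hypothetical record for one blastp hit (fields chosen for this sketch).
Hit = namedtuple("Hit", "query subject evalue align_len query_len bitscore")


def best_hits(hits, max_evalue=1e-10, min_coverage=0.70):
    """Map each query to its best-scoring subject among hits passing both filters."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.align_len / h.query_len < min_coverage:
            continue
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return {query: h.subject for query, h in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, **filters):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, **filters)
    reverse = best_hits(b_vs_a, **filters)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between genome A (bc_*) and genome B (bs_*).
a_vs_b = [
    Hit("bc_0001", "bs_0100", 1e-50, 290, 300, 410.0),
    Hit("bc_0001", "bs_0200", 1e-20, 250, 300, 150.0),
    Hit("bc_0002", "bs_0300", 1e-5, 280, 300, 45.0),    # fails the E-value cutoff
    Hit("bc_0003", "bs_0400", 1e-30, 100, 300, 120.0),  # fails the coverage cutoff
    Hit("bc_0004", "bs_0100", 1e-40, 280, 300, 300.0),  # passes, but is not reciprocal
]
b_vs_a = [
    Hit("bs_0100", "bc_0001", 1e-50, 290, 310, 410.0),
    Hit("bs_0200", "bc_0001", 1e-20, 250, 320, 150.0),
    Hit("bs_0400", "bc_0003", 1e-30, 100, 300, 120.0),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('bc_0001', 'bs_0100')]
```

Requiring reciprocity is what distinguishes a putative orthologue from a merely similar paralogue: `bc_0004` has a strong hit to `bs_0100`, but `bs_0100` prefers `bc_0001`, so only that pair is kept.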
HMM profiles (16) were built and calibrated for 51 translated
genes related to the nitrogen cycle comprising all of the series of
assimilatory and dissimilatory reactions involving both inorganic
and organic forms of nitrogen. The protein sequences used for
the HMM search included: AllC, Alr, ArcA, ArcC, ArgD, ArgF,
ArgG, ArgH, AspA, CarB, Chi, ChiA, ChiD, ChiHal, Chitin,
CreA, CycA, DagA, GdhA.arch, GdhA.bacsu, GdhA.ecol,
GlnA, GlnB, GltB, Hcp, Hmp, HutH, IlvE, LdcA, LytC, NagZ,
NapA, NarG, NarH, NarI, NarJ, NasAt, NirA, NorB, NorZ,
NtcA, NtrC, PbpC, RgI, RocF, TrpC, TrpF, TyrB, UreA, UreB,
and UreC.
ABC importers’ gene families were searched for by means of
building and calibrating HMM profiles for each of the import
families deposited in the ABCISSE (17) database (http://
www.pasteur.fr/recherche/unites/pmtg/abc/database.iphtml).
We also performed Wilcoxon’s Signed-Ranks Matched Pairs
Test between the ABC importers within the genomes of B.
coahuilensis, NRRL, B. subtilis, and O. iheyensis and found
significant differences between all of them (P < 0.05).
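For readers unfamiliar with the matched-pairs test named above, the following is a minimal sketch of a Wilcoxon signed-rank comparison between two genomes. The per-family gene counts are hypothetical, and the use of an exact two-sided P value obtained by enumerating sign assignments is an assumption; the text does not say how the P values were computed.

```python
from itertools import product


def signed_rank_test(x, y):
    """Wilcoxon signed-rank statistic and exact two-sided P value.

    Zero differences are dropped and tied absolute differences share
    their average rank. Enumeration is only practical for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            extreme += 1
    return w, extreme / 2 ** n


# Hypothetical numbers of ABC importer genes in 11 families for two genomes.
genome_a = [3, 2, 4, 1, 5, 2, 1, 3, 2, 1, 2]
genome_b = [6, 5, 9, 3, 11, 4, 2, 8, 6, 3, 5]
w_stat, p_value = signed_rank_test(genome_a, genome_b)
print(w_stat, p_value)
```

Because every family has fewer genes in the first hypothetical genome, the smaller rank sum is zero and only two of the 2^11 sign assignments are as extreme, giving P = 2/2048.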
To identify windows that contained regions with unusual
composition properties (RUCPs) within B. coahuilensis genome
the Similarity Plot (S-plot) (18) application was used. This
method was recently presented as an alternative means for
identifying potential horizontally transferred elements. To assess
the degree and pattern of similarity (or dissimilarity) between
two genomic sequences of size M1 and M2, the genomes are
divided into windows of length w that slide along each genome with
steps (the distance between the start of two neighboring windows)
of size s. Similarity is quantified by using the Pearson
correlation coefficient between the frequencies of n-mers (short
subsequences of length n). By first comparing the genome of
interest against itself, the degree of homogeneity of the genome
can be determined as the average correlation value across all
windows. Next, the degree of similarity of each window with
respect to its own genome is calculated as the average of the
correlation coefficients for that window against all other windows
in the genome in which it is located. Because it is our
intent to identify foreign DNA, windows that are unusually
dissimilar to the rest of their genome are of particular interest.
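The self-comparison step described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the S-plot program itself: the toy sequence, the small values of w, s, and n, and the function names are hypothetical (Fig. S5 reports n = 6 with 5,000-bp windows and steps).

```python
from itertools import product
from math import sqrt


def windows(seq, w, s):
    """Windows of length w whose start positions are s apart."""
    return [seq[i:i + w] for i in range(0, len(seq) - w + 1, s)]


def nmer_freqs(window, n):
    """Frequency vector over all 4**n possible n-mers, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=n)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(window) - n + 1):
        kmer = window[i:i + n]
        if kmer in counts:  # skips n-mers containing ambiguous bases
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in kmers]


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0


def window_similarity(seq, w, s, n):
    """Average correlation of each window against all other windows."""
    profiles = [nmer_freqs(x, n) for x in windows(seq, w, s)]
    scores = []
    for i, p in enumerate(profiles):
        others = [pearson(p, q) for j, q in enumerate(profiles) if j != i]
        scores.append(sum(others) / len(others))
    return scores


# Hypothetical toy genome: AT-rich background with one GC-rich insert.
background = "AATATTAAAT" * 30
insert = "GCGGCCGCGC" * 6
genome = background[:120] + insert + background[120:]
scores = window_similarity(genome, w=60, s=60, n=2)
candidate = scores.index(min(scores))  # least similar window
print(candidate, [round(x, 2) for x in scores])
```

In this toy sequence the inserted block occupies the third window, which has the lowest average correlation with the rest of the sequence. How far below the genome average a window must fall to count as an RUCP is not specified here, so the sketch only reports the least similar window.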
Each of the 115 RUCPs of B. coahuilensis was compared with
430 complete or partially sequenced microbial genomes
available from the NCBI database by using S-plot. Included in
these genomes are 10 B. anthracis (3 complete, 7 whole genome
shotguns), 3 B. cereus, 1 B. clausii, 1 B. halodurans, 2 B.
licheniformis, 1 B. subtilis, 1 B. thuringiensis, 1 Geobacillus
kaustophilus, and 1 Oceanobacillus iheyensis genomes. For each
RUCP the most similar window in all of these genomes was
identified. Seven RUCPs were identified as having a matching
window (correlation >0.7) in one of these other microbial genomes, indicating
a highly correlated/similar sequence. These windows are likely to
share the same functionality with the identified windows in the
other microbial genomes. Therefore, these windows are not
likely to have been introduced by HGT. The remaining 108
windows may have been introduced into the B. coahuilensis
genome as a result of HGT.
Additionally, a second search round for HGT elements was
performed by means of best bidirectional blastp (14) hit with an
E-value of 1e-10 by using as query all of the genes of B.
coahuilensis against a cyanobacteria and Archaea database
retrieved from KEGG database including the following species
(KEGG organism): syn, syw, syc, syf, syd, sye, syg, cya, cyb, tel,
gvi, ana, ava, pma, pmm, pmt, pmn, pmi, pmb, pmc, pmf, pmg,
pme, ter, mja, mmp, mmq, mac, mba, mma, mbu, mtp, mhu, mla,
mem, mth, mst, mka, afu, hal, hma, hwa, nph, tac, tvo, pto, pho,
pab, pfu, tko, ape, smr, hbu, sso, sto, sai, pai, pis, pcl, tpe, and
neq. Results from the bidirectional blastp were filtered to
exclude all genes shared between the Bacillus spp. to discrimi-
nate potential housekeeping genes and parse a list of unique
genes shared between B. coahuilensis, cyanobacteria, and Ar-
chaea.
Retrieval of B. coahuilensis rhodopsin, SQD, and SQDX
orthologous sequences was done through a blastp (E-value 1e-10,
length 70%, identity 30%) versus the NR and environmental databases
of NCBI. Multiple alignments were made by using ClustalW (19)
with the BLOSUM62 matrix. Phylogenetic analyses were done
by using MEGA 3.1 (20) and a neighbor-joining reconstruction
with the following parameters: 1,000 bootstrap replicates
(seed = 24,054), by using Poisson correction substitution model
for amino acids, gaps using complete deletion, assuming inde-
pendent evolution for each amino acid sequence, and pairwise
deletion was used when comparing translated sequences from
different organisms.
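The Poisson correction named above converts the observed proportion p of differing amino acid sites into a distance d = -ln(1 - p). A minimal sketch follows, with complete deletion implemented as removal of every alignment column that contains a gap. The toy alignment and function names are hypothetical, and this is not MEGA's code.

```python
from math import log


def complete_deletion(alignment):
    """Drop every column that has a gap in at least one sequence."""
    length = len(alignment[0])
    keep = [i for i in range(length) if all(seq[i] != "-" for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -log(1.0 - p)


# Hypothetical three-sequence protein alignment.
alignment = ["MKT-LLA", "MKSALLA", "MRTALIA"]
trimmed = complete_deletion(alignment)
d01 = poisson_distance(trimmed[0], trimmed[1])  # 1 of 6 sites differ
d02 = poisson_distance(trimmed[0], trimmed[2])  # 2 of 6 sites differ
print(trimmed, round(d01, 4), round(d02, 4))
```

A matrix of such pairwise distances is the input to the neighbor-joining step; the correction matters most for divergent pairs, where multiple substitutions at one site make the raw proportion p underestimate the true distance.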
Homology modeling of B. coahuilensis SQD and rhodopsin
with crystal structures from Arabidopsis and Anabaena sp. PCC
7120, respectively, was done by using the SWISS-MODEL web
server (http://swissmodel.expasy.org) (21) by Deep View (Swiss-
Pdb Viewer) (22). PDB templates used were 1I24 for SQD and
1XIO for rhodopsin. The WhatCheck summary from SWISS-MODEL
reported the following values for SQD and rhodopsin,
respectively: structure Z-scores: first generation packing
quality: 0.178, 0.242; second generation packing quality
1.094,0.127; Ramachandran plot appearance:0.075, 0.255;
chi-1/chi-2 rotamer normality: 1.221, 0.209; backbone confor-
mation: 0.464, 0.198; and root mean square (rms) Z-scores:
bond lengths: 0.690, 0.810; bond angles: 0.950, 1.214; omega
angle restraints: 0.874, 0.686; side chain planarity: 1.302, 1.318;
improper dihedral distribution: 1.541, 1.302; inside/outside dis-
tribution: 1.147, 1.302. Complete WhatCheck and coordinates
reports are available on request. Diagram images were produced
in Pymol Version 0.99rc6 (http://www.pymol.org).
RT-PCR. Semiquantitative RT-PCRs were carried out by using
SuperScript One Step RT-PCR with Platinum Taq (Invitrogen
Life Technologies) at 20, 25, and 30 cycles. Oligos for bacteriorhodopsin
bsr: 5′-TCGCTATGGTCATCCCGTTGTGG (forward) and
5′-AGAGGGACCTAATAGCCATGCAG (reverse).
Oligos for sqd1: 5′-TGCGCCTTACAGTATGATTGACC (forward)
and 5′-AAGCCCTTGTTTGTTCTCCTGAT (reverse).
RNA was obtained by using TRIzol (Invitrogen Life Technologies)
from strains grown in modified marine medium supplemented
with phosphate at 0.001, 0.005, 0.05, 0.5, and 5 mM. For
light/dark experiments, the strain was grown on Petri dishes with
marine medium at 37°C either under white or blue light
or in the dark.
Lipid Extraction and Analysis. Lipids from Arabidopsis, Cyanobac-
teria spp., and B. coahuilensis were isolated (details are available
on request). Lipid extracts were observed and isolated by using
the TLC technique as described (23). For lipid footprint analysis,
individual lipids were isolated from TLC plates, and duplicates of
each lipid spot were analyzed by MALDI-TOF MS technology.
Spots corresponding to SQDG were isolated and eluted with 4
volumes of CHCl3:CH3OH (2:1 v/v) and 1 volume of 0.9%
NaCl. The chloroform fraction was extracted, evaporated
under a constant N2 stream, resuspended in 100 μl of
CHCl3/CH3OH/CH3COONa (300:665:35 v/v), and analyzed
by electrospray ionization MS-MS.
1. Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II.
Error probabilities. Genome Res 8:186–194.
2. Kurtz S, et al. (2004) Versatile and open software for comparing large genomes.
Genome Biol 5:R12.
3. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res
9:868–877.
4. Delcher AL, Bratke KA, Powers EC, Salzberg SL (2007) Identifying bacterial genes and
endosymbiont DNA with Glimmer. Bioinformatics btm009.
5. Lukashin AV, Borodovsky M (1998) GeneMark.hmm: New solutions for gene finding.
Nucleic Acids Res 26:1107–1115.
6. Van Domselaar GH, et al. (2005) BASys: a web server for automated bacterial genome
annotation. Nucleic Acids Res 33:W455–W459.
7. Lowe TM, Eddy SR (1997) tRNAscan-SE: A program for improved detection of transfer
RNA genes in genomic sequence. Nucleic Acids Res 25:955–964.
8. Schiex T, Gouzy J, Moisan A, de Oliveira Y (2003) FrameD: A flexible program for quality
check and gene prediction in prokaryotic genomes and noisy matured eukaryotic
sequences. Nucleic Acids Res 31:3738–3741.
9. Rice P, Longden I, Bleasby A (2000) EMBOSS: The European Molecular Biology Open
Software Suite. Trends Genet 16:276–277.
10. Ciccarelli FD, et al. (2006) Toward automatic reconstruction of a highly resolved tree of
life. Science 311:1283–1287.
11. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Res 32:1792–1797.
12. Schmidt HA, Strimmer K, Vingron M, von Haeseler A (2002) TREE-PUZZLE: Maximum
likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics
18:502–504.
13. Goldman N, Whelan S (2000) Statistical tests of gamma-distributed rate heterogeneity
in models of sequence evolution in phylogenetics. Mol Biol Evol 17:975–978.
14. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein
database search programs. Nucleic Acids Res 25:3389–3402.
15. Pushker R, Mira A, Rodriguez-Valera F (2004) Comparative genomics of gene-family
size in closely related bacteria. Genome Biol 5:R27.
16. Bateman A, Haft DH (2002) HMM-based databases in InterPro. Brief Bioinform
3:236–245.
17. Bouige P, Laurent D, Piloyan L, Dassa E (2002) Phylogenetic and functional classification
of ATP-binding cassette (ABC) systems. Curr Protein Pept Sci 3:541–559.
18. Putonti C, et al. (2006) A computational tool for the genomic identification of regions
of unusual compositional properties and its utilization in the detection of horizontally
transferred sequences. Mol Biol Evol 23:1863–1868.
19. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: Improving the sensitivity of
progressive multiple sequence alignment through sequence weighting, position-
specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680.
20. Kumar S, Tamura K, Nei M (2004) MEGA3: Integrated software for Molecular Evolu-
tionary Genetics Analysis and sequence alignment. Brief Bioinform 5:150–163.
21. Schwede T, Kopp J, Guex N, Peitsch MC (2003) SWISS-MODEL: An automated protein
homology-modeling server. Nucleic Acids Res 31:3381–3385.
22. Guex N, Peitsch MC (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment
for comparative protein modeling. Electrophoresis 18:2714–2723.
23. Hartel H, Dormann P, Benning C (2000) DGD1-independent biosynthesis of extraplastidic
galactolipids after phosphate deprivation in Arabidopsis. Proc Natl Acad Sci USA
97:10649–10654.
24. Cerritos R, et al. (2008) Bacillus coahuilensis sp. nov. A new moderately halophilic
species from different pozas in the Cuatro Ciénegas Valley in Coahuila, México. Int J
Syst Evol Microbiol 58:919–923.
[Figure S1 image: neighbor-joining tree with bootstrap support values at the nodes and a 0.01 scale bar. Taxa shown: B. sp. JL-29 (listed twice), B. sp. JL1082, marine B. sp. NRRLB-14850, B. vietnamensis, B. sp. CNJ733, B. sp. MN-003, B. 8-gw2-7, B. sp. DWDY-2, B. aquimaris strain 10b, B. sp. 19500, B. aquimaris, B. marisflavi, B. coahuilensis m4-4, B. coahuilensis p1.1.43, B. coahuilensis m2-6, B. coahuilensis m2-9, B. licheniformis, B. subtilis, B. sp. NRRL B-14911, B. cereus HCY0116, B. anthracis SUF6, B. anthracis Ames, G. kaustophilus, O. iheyensis, B. halodurans, L. monocytogenes, and L. innocua.]
Fig. S1. Phylogenetic analysis of the 16S rRNA from B. coahuilensis and related species. Neighbor-joining tree of 16S rRNA using Kimura’s two-parameter
substitution model and 1,000 bootstrap replicates. Shown are the sequenced B. coahuilensis (m4–4) as well as other B. coahuilensis isolates. Most of the strains
shown in the phylogeny were isolated from marine environments, marine sediments, and estuaries. The exceptions are Bacillus sp. 19500 and Bacillus sp. 8-gw2-7
isolated from a mural paintings tomb in Seville, Spain, and from freshwater in Michigan, respectively. The accession numbers of the strains used in this analysis
are: 54303774, 50235228, 118577870, 5524657, 92091064, 15420442, 116266516, 75991537, 16973341, otherwise sequences were retrieved from the whole
genome sequence.
[Figure S2 image: phylogenetic tree with bootstrap support values at the nodes and a 0.1 scale bar. Sequences shown: unnamed environmental sequences (EAJ38772.1, EAH36596.1, EAG25577.1, EAJ23013.1, EAK17406.1, EAK17363.1, EAI83699.1, EAK34705.1), Prochlorococcus marinus strains (including MIT 9312, NATL2A, and MIT 9211), Synechococcus sp. WH 5701, RS9917, CC9605, and WH 8102, Synechococcus elongatus, Crocosphaera watsonii WH 8501, Synechocystis sp. PCC 6803, Trichodesmium erythraeum, Nostoc punctiforme, Anabaena variabilis, Nostoc sp. PCC 7120, Bacillus coahuilensis SqdX, Rubrobacter xylanophilus DSM 9941, Arabidopsis thaliana SQD2, Oryza sativa sulfolipid synthase, Bacillus cereus G9241 glycosyl transferase, and Bacillus thuringiensis glycosyl transferase. Clade labels: "Bacilli regular glycosyl transferases" and "SqdX, sulfolipid related glycosyl transferase". Legend groups: Firmicutes, Cyanobacteria, Actinobacteria, Plants and Algae.]
Fig. S2. Phylogenetic analysis of the glycosyltransferase coded in the B. coahuilensis sqd operon. Despite the presence in bacteria of different glycosyltrans-
ferases, we show that the glycosyltransferase SqdX in the B. coahuilensis SQD1 operon is phylogenetically closer to the plant and cyanobacterial proteins than
it is to the bacterial glycosyltransferases, giving support to the horizontal transfer of the SQD1 operon.
[Figure S4 image, panels A and B. Gene maps of the carotenoid synthesis loci are labeled with asnO, crtI, carA2, crtB, amtB, glnB, norM, GroupII intron maturase, Glycosyl transf. ycdQ, alkA, murG2, and fabG; legend: crtB, Phytoene synthase; crtI and carA2, Phytoene desaturase. A further gene map shows mcpB (M4400248), ISBma2 Transposase, mcpB (M4400263), and a gene labeled "Hypot."]
Methyl-accepting chemotaxis proteins in B. coahuilensis
M4400025 mcpC
M4400248 mcpB
M4400263 mcpB
M4400630 tlpA
M4400687 mcpB
M4400745 tlpB
M4401183 mcpB
M4401184 mcpC
M4401682 tlpB
M4402078 tlpB
M4402148 mcpB
M4402589 mcpB
M4402800 tlpA
M4402859 tlpB
M4403081 mcpB
M4403083 mcpA
M4403306 mcpC
Chemotaxis response proteins
M4401449 cheB Response regulator protein-glutamate methylesterase
M4401450 cheA Two-component sensor histidine kinase
M4401452 cheC Chemotaxis protein cheC
M4401861 cheR Chemotaxis protein methyltransferase
Fig. S4. Carotenoid synthesis and methyl-accepting chemotaxis protein (MCP) coding genes in B. coahuilensis. (A) Carotenoid synthesis genes are distributed
across the B. coahuilensis genome. One operon contains both synthase (crtB) and desaturase (crtI) genes. Other crtI genes are found at different locations. Two
of them are close to genes encoding transposases, suggesting that these were acquired through HGT. (B) Methyl-accepting proteins in the genome of B.
coahuilensis. Some sensory rhodopsins are known to transduce their signal through MCPs. B. coahuilensis has 17 MCP coding genes. We do not know, however,
whether these are involved in the phototransduction signaling.
[Figure S5 image. Panel A: S-plot heat map of B. coahuilensis versus itself, with n = 6 (di-codon), window = 5,000 bp, and step = 5,000 bp; both axes give genome position (600,000 to 3,000,000 bp) and the color scale spans Pearson correlation coefficients from -0.28 to 1.00. Panel B: CAI (axis range 0.45 to 0.90) plotted for each B. coahuilensis gene (gene index 1 to 3417).]
Fig. S5. Nucleotide composition analysis and Codon Adaptation Index analysis to detect horizontal gene transfer (HGT) events. (A) S-plot for B. coahuilensis versus
itself. Different Pearson correlation coefficients are represented on the plot by different colors. The plot leads to the identification of Regions of Unusual
Composition (RUCPs) (see SI Text) within the genome of B. coahuilensis. (B) Codon Adaptation Index (CAI) of each predicted ORF of the genome of B. coahuilensis.
The average CAI is 0.7147 ± 0.0537. A below-average CAI could reflect recent insertion into the genome or functional constraints. An above-average CAI probably
reflects adaptation to an efficient transcription/translation rate.
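As an illustration of the CAI in panel B, the following minimal sketch computes a codon adaptation index in the usual way: each codon is weighted by its frequency relative to the most used synonymous codon in a reference gene set, and the CAI of a gene is the geometric mean of those weights. The reference set, the three codon families, the 0.5 pseudocount, and all function names are assumptions made for this example; the SI does not state which reference set or software was used.

```python
import math
from collections import Counter

# Assumption: only three two-codon amino acid families are listed to keep the
# sketch short; a real analysis would cover the full genetic code.
SYNONYMOUS_CODONS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMOUS_CODONS.items() for codon in codons}


def split_codons(seq):
    """Split a coding sequence into complete codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs):
    """Weight of each codon = its count / count of the most used synonymous codon."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(c for c in split_codons(seq) if c in CODON_TO_AA)
    weights = {}
    for codons in SYNONYMOUS_CODONS.values():
        top = max(counts[c] for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set: not scored
        for c in codons:
            # A 0.5 pseudocount for unseen codons keeps log() defined.
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scorable codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence has no scorable codons")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set (illustrative only): AAA and GAA are preferred 2:1.
reference = ["AAA AAA AAG GAA GAA GAG".replace(" ", "")]
weights = relative_adaptiveness(reference)
print(round(cai("AAAGAA", weights), 4))  # gene using only preferred codons
print(round(cai("AAGGAG", weights), 4))  # gene using only non-preferred codons
```

Under this scheme a gene whose codon usage differs from that of the reference set scores low, which is why a below-average CAI can flag a candidate for recent acquisition.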
[Fig. S6 bar chart: number of ABC importer genes (axis scale 0 to 40) in each family for B. anthracis Ames, B. thuringiensis, B. cereus, B. licheniformis, B. subtilis, B. halodurans, B. clausii, O. iheyensis, G. kaustophilus, Bacillus sp. NRRL B-14911, and B. coahuilensis. Family abbreviations:]
MET Metals
OTCN Osmoprotectants, taurine, cyanate, and nitrate
OPN Oligopeptides and nickel
HAA Hydrophobic amino acids and amides
ISVH Iron-siderophores, vitamin B-12, and hemin
DLM D- and L-methionine and derivatives
PHN Phosphonates and phosphites
PAO Polar amino acids and opines
OSP Oligosaccharides and polyols
MOS Monosaccharides
MOI Mineral and organic ions
Fig. S6. Distribution of ABC importer families in the Bacillus spp. and closely related species. ABC importer gene families were searched by means of building
and calibrating HMM profiles for each of the import families deposited in the ABCISSE database
(http://www.pasteur.fr/recherche/unites/pmtg/abc/database.iphtml) to detect these genes in all of the sequenced Bacillus spp. Bars in different colors denote
the presence of a gene predicted to code for a given importer, with the height representing the number of genes present for any given category (see SI Text).
[Fig. S7 pie charts: number of genes per COG category for B. subtilis and B. coahuilensis, grouped by paralog family size (singleton, 2 to 5, more than 5). COG categories:]
B. Chromatin structure and dynamics
C. Energy production
D. Cell division and chromosome partitioning
E. Amino acid transport and metabolism
F. Nucleotide transport and metabolism
G. Carbohydrate transport and metabolism
H. Coenzyme metabolism
I. Lipid metabolism
J. Translation, ribosomal structure and biogenesis
K. Transcription
L. DNA replication, repair and recombination
M. Cell envelope biogenesis
N. Cell motility
O. Posttranslational modification, protein turnover, chaperones
P. Inorganic ion transport and metabolism
Q. Secondary metabolites biosynthesis, transport and catabolism
R. General function prediction
S. COG of unknown function
T. Signal transduction mechanisms
U. Intracellular trafficking, secretion and vesicular transport
V. Defense mechanisms
Y. Conserved hypothetical, not in COGs
X. Hypothetical, only prediction, no known orthologs
Z. Cytoskeleton
Fig. S7. Comparison of the distribution of paralogous genes in B. subtilis and B. coahuilensis. The analysis of paralogous genes in B. coahuilensis and B. subtilis
was done as in Pushker et al. (15).
Table S1. Genomic sequencing results
Runs Reads Bases Contigs Total length, Mb Average length, kb N50, kb
1 158,013 16,348,073 1390 1.361 1 1
2 316,756 32,921,912 853 3.15 3.7 6.8
3 471,089 48,929,738 242 3.277 13.5 29.5
4 630,975 65,562,837 133 3.287 24.7 47.2
5 792,282 82,130,316 112 3.29 29.4 53.5
6 965,789 99,627,638 107 3.289 30.7 56.3
7 1,294,112 136,203,848 107 3.321 31.54 62.98
Assembly statistics Sanger 454 454 + Sanger
Reads 16,698 1,294,112 -
Sequenced bases 20,709,240 136,203,848 -
Total Mb (assembly) 2.492 3.321 3.351
Number of contigs 876 107 73
Shortest contig, bp 46 498 1,379
Longest contig, bp 37,164 136,892 256,258
Non-ATCG bases (assembly) 1,803 0 284
We used a hybrid 454/Sanger sequencing strategy and in this table we provide a summary of the assembly data.
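The N50 column in Table S1 summarizes contig contiguity. As a reminder of what that statistic means, here is a minimal sketch of the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name and the contig lengths are hypothetical; they are not taken from the assembly reported here.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (same unit in, same unit out)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the assembly is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp (illustrative only, not from Table S1).
example_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_contigs))
```

A rising N50 across runs, as in the table, indicates that added coverage is merging contigs into longer ones even when the total assembled length barely changes.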
Table S2. Comparison of genomic features among sequenced Bacillus spp.
Strain Ref. sequence GenBank Length, Mbp GC content, % Proteins RNAs
Bacillus coahuilensis str. M44 ABFU00000000 ABFU00000000 3.358 37.5 3,640 87
Bacillus sp. NRRL B-14911 NZAAOX00000000 AAOX00000000 5.086 45 5,691 106
Oceanobacillus iheyensis HTE831 NC004193 BA000028 3.63 35 3,500 92
Geobacillus kaustophilus HTA426 NC006510 BA000043 3.545 52 3,498 114
Bacillus subtilis subsp. subtilis str. 168 NC000964 AL009126 4.215 43 4,105 119
Bacillus clausii KSM-K16 NC006582 AP006627 4.304 44 4,096 96
Bacillus licheniformis ATCC 14580* NC006270 CP000002 4.222 46 4,152 93
Bacillus licheniformis ATCC 14580† NC006322 AE017333 4.223 46 4,196 93
Bacillus halodurans C-125 NC002570 BA000004 4.202 43 4,066 105
Bacillus anthracis str. Ames NC003997 AE016879 5.227 35 5,311 128
Bacillus anthracis str. Sterne NC005945 AE017225 5.229 35 5,287 128
Bacillus cereus ATCC 10987 NC003909 AE017194 5.224 35 5,603 133
Bacillus cereus ATCC 14579 project at INRA NC004722 AE016877 5.412 35 5,234 142
Bacillus cereus E33L NC006274 CP000001 5.301 35 5,134 135
Bacillus thuringiensis serovar konkukian str. 97–27 NC005957 AE017355 5.238 35 5,117 144
*Novozymes Biotech.
†Project at Gottingen Genom. Lab.
Table S3. Twenty universally distributed Clusters of Orthologous Groups (COGs) used in the phylogenomic analysis
ID Av. length Annotation
COG0018 548 Arginyl-tRNA synthetase
COG0049 182 Ribosomal protein S7
COG0052 240 Ribosomal protein S2
COG0080 154 Ribosomal protein L11
COG0081 230 Ribosomal protein L1
COG0087 288 Ribosomal protein L3
COG0092 240 Ribosomal protein S3
COG0094 182 Ribosomal protein L5
COG0096 131 Ribosomal protein S8
COG0097 177 Ribosomal protein L6P/L9E
COG0098 220 Ribosomal protein S5
COG0100 145 Ribosomal protein S11
COG0172 442 Seryl-tRNA synthetase
COG0200 166 Ribosomal protein L15
COG0201 445 Preprotein translocase subunit SecY
COG0202 323 DNA-directed RNA polymerase, alpha subunit
COG0256 178 Ribosomal protein L18
COG0495 854 Leucyl-tRNA synthetase
COG0522 199 Ribosomal protein S4 and related proteins
COG0533 375 Metal-dependent proteases with chaperone activity
Table S4. ABC Importers proportion in a given category per genome and normalization relative to genome size
Bsu Bha Ban Oih Bth Bli Bce Gka Bcl Bco B14911 Total
MET 28 21 32 25 35 37 31 15 32 20 36 312
OTCN 15 18 19 13 26 13 25 3 22 10 23 187
OPN 6 14 10 11 12 12 9 2 19 4 10 109
HAA 4 13 14 13 12 5 14 19 9 5 24 132
ISVH 6 7 7 7 7 6 7 3 5 6 13 74
DLM 1 1 3 4 2 2 2 1 3 4 2 25
PHN 3 6 5 4 6 1 5 1 2 3 3 39
PAO 5 2 4 3 4 2 5 2 3 4 6 40
OSP 4 5 4 5 4 3 5 2 7 2 2 43
MOS 2 3 3 2 4 2 3 6 3 2 3 33
MOI 3 4 4 3 5 5 4 4 6 3 3 44
TOTAL 77 94 105 90 117 88 110 58 111 63 125 1038
Genome size 4.22 4.2 5.23 3.63 5.24 4.22 5.22 3.54 4.3 3.36 5.09
MET 6.63 5.00 6.12 6.89 6.68 8.76 5.93 4.23 7.43 5.95 7.08
% 36.36 22.34 30.48 27.8 29.91 42.04 28.20 25.86 28.83 31.74 28.80
OTCN 3.55 4.28 3.63 3.58 4.96 3.07 4.78 0.84 5.11 2.98 4.52
% 19.48 19.15 18.09 14.4 22.22 14.77 22.73 5.17 19.82 15.87 18.40
OPN 1.42 3.33 1.91 3.03 2.29 2.84 1.72 0.56 4.41 1.19 1.96
% 7.79 14.89 9.52 12.2 10.25 13.63 8.18 3.45 17.11 6.35 8.00
HAA 0.95 3.09 2.68 3.59 2.29 1.18 2.68 5.36 2.09 1.49 4.72
% 5.19 13.83 13.33 14.4 10.25 5.68 12.73 32.76 8.11 8.11 19.20
ISVH 1.42 1.66 1.34 1.93 1.33 1.42 1.34 0.84 1.16 1.79 2.55
% 7.79 7.44 6.66 7.77 5.98 6.82 6.36 5.17 4.50 9.52 10.40
DLM 0.24 0.24 0.57 1.10 0.38 0.47 0.38 0.28 0.70 1.19 0.39
% 1.30 1.06 2.86 4.44 1.71 2.27 1.82 1.72 2.70 6.35 1.60
PHN 0.71 1.43 0.96 1.10 1.14 0.24 0.96 0.28 0.46 0.89 0.59
% 3.89 6.39 4.76 4.44 5.12 1.13 4.54 1.72 1.80 4.76 2.40
PAO 1.18 0.47 0.76 0.82 0.76 0.47 0.96 0.56 0.70 1.19 1.18
% 6.49 2.13 3.81 3.33 3.42 2.27 4.54 3.45 2.70 6.35 4.80
OSP 0.95 1.19 0.76 1.38 0.76 0.71 0.96 0.56 1.62 0.59 0.39
% 5.19 5.32 3.81 5.55 3.42 3.41 4.54 3.45 6.30 3.17 1.60
MOS 0.47 0.71 0.57 0.55 0.76 0.47 0.57 1.69 0.70 0.59 0.59
% 2.59 3.19 2.86 2.22 3.42 2.27 2.73 10.34 2.70 3.17 2.40
MOI 0.71 0.95 0.76 0.82 0.95 1.18 0.76 1.13 1.39 0.89 0.59
% 3.89 4.25 3.81 3.33 4.27 5.68 3.63 6.90 5.40 4.76 2.40
Normalized data: number of transporters of a given class per Mb of genome. %, transporters of a given class as a percentage of the total number of transporters in that species.
Bsu, B. subtilis; Bha, B. halodurans; Ban, B. anthracis Ames; Oih, O. iheyensis; Bth, B. thuringiensis; Bli, B. licheniformis; Bce, B. cereus; Gka, G. kaustophilus; Bcl,
B. clausii; Bco, B. coahuilensis; B14911, Bacillus sp. NRRL B-14911.
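The two derived rows in Table S4 are simple ratios. The sketch below reproduces them for the MET family in three genomes, using counts, totals, and genome sizes copied from the table; the function and variable names are assumptions made for this example. The results agree with the tabulated values to within rounding in the second decimal place.

```python
# Values copied from Table S4 (MET row, TOTAL row, and genome size in Mb).
genome_size_mb = {"Bsu": 4.22, "Gka": 3.54, "Bco": 3.36}
met_counts = {"Bsu": 28, "Gka": 15, "Bco": 20}
total_importers = {"Bsu": 77, "Gka": 58, "Bco": 63}


def per_mb(count, size_mb):
    """Importers of one family per Mb of genome."""
    return count / size_mb


def percent_of_total(count, total):
    """Importers of one family as a percentage of all importers in that genome."""
    return 100.0 * count / total


for species in genome_size_mb:
    density = per_mb(met_counts[species], genome_size_mb[species])
    share = percent_of_total(met_counts[species], total_importers[species])
    print(f"{species}: {density:.2f} MET importers per Mb, {share:.2f}% of importers")
```

Normalizing by genome size matters here because B. coahuilensis has the smallest genome in the comparison, so raw counts alone would understate how much of its genome is devoted to a given importer family.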
Table S5. Growth requirements of B. coahuilensis and Bacillus sp. NRRL B-14911
Amino acid lacking B. coahuilensis NRRLB14911
L-Alanine  
L-Aspartic  
L-Glutamic / 
L-Asparagine  
L-Glutamine  
L-Arginine / 
L-Proline  /
L-Cysteine  
L-Glycine  
L-Serine  
L-Lysine  
L-Methionine / /
L-Threonine  
L-Isoleucine / 
L-Leucine / 
L-Valine  
L-Phenylalanine  
L-Tryptophane  
L-Tyrosine  
L-Histidine  
All amino acids added  
No amino acids added  /
Modified marine medium containing the stated amino acids was inoculated with B. coahuilensis or NRRLB14911 and cultured at 37°C with agitation
in nephelometric flasks. Absorbance was measured with a Klett-Summerson colorimeter at 24 and 48 h.
Table S6. Comparative analysis of the presence in B. coahuilensis of orthologs for selected sporulation, germination, and competence genes
Columns: B. subtilis locus; Gene name; Function; B. coahuilensis ortholog; Orthologs in other Bacillus (when absent in B. coahuilensis)
Sporulation initiation
BSU13990
BSU31450
BSU14490
BSU13660
BSU13530
kinA
kinB
kinC
kinD
kinE
two-component sensor histidine kinases M4401017 (kinA)
M4401019
(kinC)
M4401702 (kinA)
M4402663 (kinA)
M4402848 (kinA)
M4400900 (kinB)
M4400627 (kinC)
BSU16170 codY transcriptional repressor CodY M4401419
BSU00370 abrB transcriptional regulator M4403643
BSU12430 rapABC response regulator aspartate Absent Bs (11) Bl
BSU36690 DEFGHI phosphatase (24) Bcg ( )
BSU03770 Bh (5) Bcl (9)
BSU36380
BSU25830
BSU37460
BSU40300
BSU06830
BSU05010
BSU02820
BSU18910
Sporulation sigma factors
BSU23450 sigF sporulation sigma factor SigG M4401792
BSU15330 sigG sporulation sigma factor SigE M4401332
BSU15320 sigE RNA polymerase sporulation M4401330
(spoIIGB) specific sigma factor (sigma-K)
(C-terminal half)
BSU26390 sigK sporulation sigma factor SigK M4402263
(spoIIIC)
BSU00980 sigH Transition state sigma factor M4400132
SigH
spo0
BSU24220 spo0A two-component response regulator M4402096
BSU27930 spo0B sporulation initiation phosphotransferase M4402284
BSU13640 spo0E negative sporulation regulatory
phosphatase
Absent Bs Bl NR
BSU37130 spo0F two-component response regulator M4403280
BSU40960 spo0J site-specific DNA-binding protein M4400017
BSU11430 spo0K oligopeptide ABC transporter Similar to
(oppA) (binding protein) BM4400987 (dppE)
BSU08760 spo0M sporulation-control gene M4401200
spoII
BSU23470 spoIIAA anti-anti-sigma factor M4401790
BSU23460 spoIIAB anti-sigma F factor M4403657
BSU28060 spoIIB required for complete dissolution of the
asymmetric septum (stage II sporulation)
Absent Bs Bl Bcg Bh NR
BSU36750 spoIID serine phosphatase M4403330
BSU00640 spoIIE protease M4400073
BSU15310 spoIIGA required for dissolution of the septal cell
wall (stage II sporulation)
M4401329
BSU23530 spoIIM required for dissolution of the septal cell
wall (stage II sporulation)
M4401782
BSU25530 spoIIP required for completion of engulfment M4402279
BSU36550 spoIIQ required for processing of pro-sigma-E
(extracellular signal interacting with
SpoIIGA?) (stage II sporulation)
BM4403333 (yebA)
BSU36970 spoIIR required for processing of pro-sigma-E M4403297
BSU12830 spoIISA sporulation protein IISA Absent Bs Bl Bcl
BSU12820 spoIISB sporulation protein IISB Absent Bs Bl
spoIII
BSU24430 spoIIIAA mutants block sporulation after
engulfment
M4402070
BSU24420 spoIIIAB stage III sporulation protein SpoAB M4402071
BSU24400 spoIIIAD mutants block sporulation after
engulfment (stage III sporulation)
M4402072
BSU24390 spoIIIAE mutants block sporulation after
engulfment (stage III sporulation)
M4402073
BSU24380 spoIIIAF mutants block sporulation after
engulfment (stage III sporulation)
M4402074
BSU24370 spoIIIAG mutants block sporulation after
engulfment (stage III sporulation)
M4402076
BSU24360 spoIIIAH mutants block sporulation after
engulfment (stage III sporulation)
M4402077
BSU41040 spoIIIJ OxaA-like protein precursor M4400009 (oxaI)
BSU36420 spoIIID transcriptional regulator M4403334
BSU16800 spoIIIE DNA translocase M4401633 (ftsK)
BSU41030 jag SpoIIIJ-associated protein M4400010
spoIV
BSU22800 spoIVA required for proper spore cortex
formation and coat assembly (stage IV
sporulation)
M4401851
BSU24230 spoIVB serine peptidase of the SA class M4402095
BSU25770 spoIVCA site-specific DNA recombinase
spoIVFA stage IV sporulation protein FA M4402368
BSU27970 spoIVFB membrane metalloprotease M4402369
BSU00230 bofA inhibition of the pro-sigma-K processing
machinery
Similar to
BM4400117 and
BM4400031
Bs Bl Bcg Bcl Bh Gk Oi NR
spoV
BSU23440 spoVAA sporulation protein VAA M4401793
BSU23430 spoVAB sporulation protein VAB Absent Bs Bl Bcg Oi Gk NR
BSU23420 spoVAC sporulation protein VAC M4401923 M4403660
BSU23410 spoVAD stage V sporulation protein SpoVAD M4401924 M4403661
BSU23400 spoVAE sporulation protein VAE M4401795
BSU23390 spoVAF sporulation protein VAF M4401796 and
M4402028
BSU27670 spoVB involved in spore cortex synthesis (stage V
sporulation)
M4402312
BSU15170 spoVD penicillin-binding protein M4401140
BSU15210 spoVE required for spore cortex peptidoglycan
synthesis (stage V sporulation)
M4401145
BSU00490 spoVG required for spore cortex synthesis M4400089
BSU17420 spoVK sporulation protein VK M4403653
BSU15810 spoVM required for normal spore cortex and coat
synthesis (stage V sporulation)
M4403652
BSU09400 spoVR involved in spore cortex synthesis (stage V
sporulation)
M4400764
BSU16980 spoVS required for dehydration of the spore
core and assembly of the coat (stage V
sporulation)
M4401524
BSU00560 spoVT transcriptional regulator M4400082
BSU28110 spoVID required for assembly of the spore coat
(stage VI sporulation)
M4402335
BSU00430 yabG hypothetical protein M4400097
BSU04110 ycsK hypothetical protein M4400277
BSU31470 kapD sporulation inhibitor KapD M4402705
BSU23190 dacB D-alanyl-D-alanine carboxypeptidase M4401803
Spore
coat
BSU06300 cotA spore coat protein (outer) Absent Bs Bl Oi Bcl (Similar multicopper
oxidases in Bh Gk Ba)
BSU36050 cotB spore coat protein (outer) Absent Bs Gk Bcg
BSU17700 cotC spore coat protein (outer) Absent Bs
BSU22200 cotD spore coat protein (inner) M4403659
BSU17030 cotE morphogenic protein M4401542 (cotE) Bs Bl Bcg Bh Bcl Oi NR
BSU40530 cotF spore coat protein Similar to
BM4403540 (yhcQ)
Bs Bl Bce Bt NR
BSU36070 cotG morphogenetic protein Absent Bs
BSU36060 cotH spore coat protein (inner) Absent Bs Bl Bcg
BSU06890 cotJA polypeptide composition of the spore
coat; required for the assembly of CotJC
M4401829 (cotJA) Bs Bl Bcg NR
BSU06900 cotJB polypeptide composition of the spore coat M4403658 (cotJB) Bs Bl Bcg Gk NR
BSU06910 cotJC polypeptide composition of the spore coat M4401828 (cotJC) Bs Bl Bcg NR
BSU17970 cotM spore coat protein (outer) Similar to
BM4401693
Bs Bl NR
BSU24620 cotN (tasA) spore coat protein BM4402955 (cotN) Bs NR
BSU05550 cotP spore coat protein Absent Bs NR
BSU30900 cotS spore coat protein Similar to
M4403199
Bs Bl Bcg Bh NR
BSU30910 cotSA spore coat protein Absent Similar glycosyltransferases in Bs Bl Bcg
Gk Bh NR
BSU12090 cotT spore coat protein (inner) Absent Bs
BSU11780 cotV spore coat protein (insoluble fraction) Absent Bs Bl
BSU11770 cotW spore coat protein (insoluble fraction) Absent Bs Bl
BSU11760 cotX spore coat protein (insoluble fraction) M4403656 Bs Bl Bce Bh Bcl Oi NR
BSU11750
BSU11740
cotY cotZ spore coat protein (insoluble fraction) Absent Bs Bl Bcg Oi
BSU27830 coxA spore cortex protein Absent Bs Bl
BSU37910 spsA spore coat polysaccharide synthesis Absent Bs Bh
BSU37900 spsB spore coat polysaccharide synthesis Absent Bs
BSU37890 spsC spore coat polysaccharide synthesis Absent Bs Bl Bcg Gk NR
BSU37880 spsD spore coat polysaccharide synthesis Absent Bs
BSU37870 spsE spore coat polysaccharide synthesis Absent Bs Gk
BSU37860 spsF spore coat polysaccharide synthesis Absent Bs
BSU37850 spsG spore coat polysaccharide synthesis Absent Bs
BSU37840 spsI spore coat polysaccharide synthesis
(glucose-1 phosphate
thymidylyltransferase)
Absent Bs Bcg Bcl Bh Gk NR
BSU37830 spsJ spore coat polysaccharide synthesis (dTDP
glucose 4,6-dehydratase/epimerase)
Similar to
BM4403005
Bs Bcg Bcl Gk NR
BSU37820 spsK spore coat polysaccharide synthesis
(dTDP-4-dehydrorhamnose reductase)
Similar to
BM4400707 and
BM4403219 strL
Bs Bcg Bcl Bh Oi NR
BSU37810 spsL spore coat polysaccharide synthesis
(dTDP-4-dehydrorhamnose epimerase)
Absent Bs Bcg Bh Bcl Oi NR
BSU02070 csgA sporulation-specific SASP protein Absent Bs Bl NR
BSU22850 seaA involved in spore envelope assembly M4401847 (yphB) Bs Bl Oi NR
BSU29570
BSU09750
BSU19950
BSU13470
sspA sspB
sspC sspD
small acid-soluble spore protein
(alpha/beta-type SASP)
M4403662 (sspA) Bs (4) Bl (6) Bc (6) Bt (7) Ban (6) Bcl(3) Bh
(3) Gk (1) Oi (1) NR (5)
BSU08660 sspE small acid-soluble spore protein
(gamma-type SASP)
BM4400675
(hypothetical)
Bs Bl Bcg Bcl Bh Oi NR
BSU32640 sspG small acid-soluble spore protein Absent Bs
BSU00450 sspF small acid-soluble spore protein
(alpha/beta-type SASP)
M4400095 (sspF)
BSU28660 sspI small acid-soluble spore protein SspI M4402408 (sspI)
BSU33340 sspJ small acid-soluble spore protein Absent Bs Bl
BSU22000 sspL small acid-soluble spore protein Absent Bs Bl
BSU22290 sspM small acid-soluble spore protein Absent Bs Bl
BSU18020 sspN small acid-soluble spore protein M4403655
BSU17990 sspO acid-soluble spore protein O M4401690
BSU18030 tlp Tlp spore cortex-lytic enzyme M4403654 (tlp)
BSU22930 sleB spore cortex-lytic enzyme M4401837 (sleB)
BSU13820 ykvT hypothetical protein M4402316
BSU23170 spmB spore maturation protein M4401806 (spmB)
BSU23180 spmA spore maturation protein M4401805 (spmA)
BSU13930 splB spore photoproduct (thymine dimer) lyase M4400797 (splB)
Germination
(earliest stage) M4400597 (gerAB)
M4401291(gerAB),
M4401292,
M4401293 (gerIA)
BSU33070 gerAC germination response to L-alanine and
related amino acids (earliest stage)
BM4403467 (MLA: incomplete)
BSU35800 gerBA probable component of a germinant
receptor
Absent Bs
BSU35810 gerBB probable component of a germinant
receptor
Absent Bs
BSU35820 gerBC germination response to L-alanine and to
the combination of glucose, fructose,
L-asparagine, and KCl (early stage)
Absent Bs
BSU01550 gerD germination response to L-alanine and to
the combination of glucose, fructose,
L-asparagine, and KCl (early stage)
M4400170
BSU28410 gerE transcriptional regulator M4402392
gerIA germination response to the combination
of glucose, fructose, L-asparagine, and
KCl
M4400595,
M4401293
BSU03700 gerKA germination response to the combination
of glucose, fructose, L-asparagine, and
KCl
M4403644
BSU03720 gerKB germination response to the combination
of glucose, fructose, L-asparagine, and
KCl
Absent (in its place
there is a distant
ger gene,
M4403466)
BSU03710 gerKC germination (cortex hydrolysis) and
sporulation (stage II, multiple polar
septa)
M4403467
BSU28380 gerM spore germination protein M4402396
BSU10720 gerPA spore germination protein M4403651
BSU10710 gerPB spore germination protein M4403650
BSU10700 gerPC spore germination protein M4403649
BSU10690 gerPD spore germination protein M4403648
BSU10680 gerPE spore germination protein M4403647
BSU10670 gerPF spore germination protein (ywdL) M4403646
BSU15090 gerQ spore germination protein; Prespore
Specific Transcriptional Activator (ylbO)
BM4401120
BSU37620 gerR transcriptional regulatory protein BM4403368 (rsfA)
BSU25540 gpr Germination protease precursor M4402278
Competence
protein A
BSU31690 comP two-component sensor histidine kinase Absent Bs Bl Bc Bt
BSU16930 cinA competence damage-inducible protein A M4401517
BSU03430 nucA nuclease Absent Bs Bl Bcg
BSU03420 nin inhibition of the DNA degrading activity
of NucA
Absent Bs Bl Bcg
comC DNA-binding protein M4402351
BSU03500 comS regulation of genetic competence Absent Bs
BSU25590 comEA unspecific high-affinity DNA-binding
protein
M4402274
BSU25580 comEB required for DNA binding and uptake M4402275
BSU25570 comEC putative integral membrane protein M4402276
BSU25600 comER late competence protein M4402272
BSU35470 comFA late competence protein M4403036
BSU35450 comFC competence protein M4403037
BSU24730 comGA probably part of the DNA transport
machinery
M4402041
BSU24720 comGB probably part of the DNA transport
machinery
M4402042
BSU24710 comGC probably part of the DNA transport
machinery
M4402043
BSU24700 comGD probably part of the DNA transport
machinery
Absent Bs Bl Bcg
BSU24690 comGE probably part of the DNA transport
machinery
Absent Bs Bl
BSU24680 comGF probably part of the DNA transport
machinery
Absent Bs Bl
BSU24670 comGG competence transcription factor Absent Bs Bl
BSU10420 comK competence transcription factor (CTF) M4403645
BSU11300 med late competence gene M4400979
BSU31710 comQ transcriptional regulator Absent Bs Bl
BSU31700 comX pheromone Absent Bs
BSU11310 comZ M4400980
BSU00860 clpC class III stress response-related ATPase M4400120
BSU34540 clpP ATP-dependent Clp protease proteolytic
subunit
M4403116
BSU28220 clpX ATP-dependent protease ATP-binding
subunit
M4402322
BSU11520 mecA adaptor protein M4400995
spxA regulatory protein M4400994
BSU14990 ylbF regulatory protein M4403560
a When no ortholog was found in B. coahuilensis, we searched the genomes of other Bacillus spp. and closely related species to determine how common
this absence was. For genes coding for proteins of fewer than 100 residues, we looked for synteny and searched at the corresponding location. Bs, B. subtilis; Bl,
B. licheniformis; Bce, B. cereus; Ba, B. anthracis; Bt, B. thuringiensis; Bcg, Bacillus cereus group (comprises Bce, Ba, and Bt); Bcl, B. clausii; Bh, B. halodurans;
Oi, O. iheyensis; Gk, G. kaustophilus; NR, Bacillus sp. NRRL B-14911.
Table S7. Cell envelope biogenesis and cell division proteins in B. coahuilensis in comparison to B. subtilis and other Bacillus spp.
Columns: Pathway; B. subtilis* gene; Function; Possible orthologs in B. coahuilensis; Orthologs in other Bacillus† (when absent in B. coahuilensis)
Fatty acid synthesis
Initiation of fatty
acid synthesis
accA‡, B‡, C‡, D‡, acpA‡, fabD‡,
birA‡
M4402554 (accA), M4402080 (accB),
M4402081 and M4401121 (accC),
M4402553 (accD), M4401388
(acpA), M4401386 (fabD),
M4401885 (birA)
fabHA, fabHB Overlapping function M4400982 (fabHA)
Fatty acid chain
elongation
fabF‡,G‡ M4400983 (fabF) M4401387(fabG)
(seven more 3-oxoacyl-[acyl
carrier-protein] reductase)
fabI fabL Overlapping function M4401919 (fabI)
ywpB (fabZ), ycsD Overlapping function M4403340 (fabZ)
Phospholipid
synthesis
gpsA M4401849 (gpsA)
yhdO‡ Similar to 1-acylglycerol-3-
phosphate O-acyltransferase
Similar to BM4401843 LPAT1 and
BM4401172 LPAT1 (chloroplast)
cdsA‡ Phosphatidate cytidylyltransferase M4401481 (cdsA)
pgsA‡ CDP-diacylglycerol-glycerol-3-
phosphate 3-
phosphatidyltransferase
M4401516 (pgsA)
ywjE, ywnE, ywiE Cardiolipin synthetase (overlapping
function)
M4403386 (ywjE) M4400996 (cls),
M4401628 (cls)
yerQ‡, dgkA Diacylglycerol kinase, overlapping
function
Similar to M4400357 (bmrU),
M4400531 (bmrU) M4402174
(dgkA)
ugtP Glycosyltransferase Putative glycosyl transferase
BM4401162
(ypfP)
pssA Phosphatidylserine synthase Absent Bs, Bl, Bcg, Bh,
Gk, NR
psd Phosphatidylserine decarboxylase Absent Bs, Bl, Bcg, Bh,
Gk, NR
plsX‡ Fatty acid/phospholipid synthesis
protein
M4401385 (plsX)
Peptidoglycan
synthesis
Synthesis of amino
sugars
glmS‡ Glucosamine-fructose-6-phosphate
aminotransferase [isomerizing]
M4400183 (glmS)
ybbT (glmM)‡ Phosphoglucosamine mutase M4400182 (glmM)
gcaD‡ Bifunctional protein M4400088 (gcaD)
yvyH (mnaA)‡ UDP-N-acetylglucosamine-2-
epimerase
M4402980 (mnnA)
nagA N-Acetylglucosamine-6-phosphate
deacetylase
M4402950 and M4403240 (nagA)
nagB, gamA (ybfT) Glucosamine-6-phosphate
deaminase (overlapping
function)
M4403241 (nagB)
pgi‡ Glucose-6-phosphate isomerase A M4403532 (pgiA)
gtaB UTP-glucose-1-phosphate
uridylyltransferase
M4403211/M4403022 (gtaB)
tagE UDP-glucose:polyglycerol
phosphate glucosyltransferase
Absent Bs, Bcl
nagP Phosphotransferase system (PTS)
N-acetylglucosamine-specific
enzyme IICB component
Similar to M4403536 (nagE)
gamP Probable PTS glucosamine-specific
enzyme IICBA component
Similar to M4400583 (ptsG)
Diaminopimelate dapG, lysC, yclM Overlapping function M4401459 (dapG), M4402386 (lysC)
asd M4401460 (asd)
dapA, B M4401457 (dapA), M4401880
(dapB)
ykuQ (dapD)‡ M4401047 (dapD)
ykrV, ywfG Overlapping function M4400772 Transaminase (mtnE),
M4402696 Aspartate
aminotransferase (aspC)
ykuR‡ Peptidase Similar to M4401048 (ytnL) and
M4400898 (amaA)
dapF‡ M4401227 (dapF)
lysA M4400477 (lysA)
hom M4401975 (hom)
spoVFA, B M4401462, M4401461
Racemases racE‡, yrpC Overlapping function M4402395 (murI)
Alr‡ Alanine racemase M4400388 (alr); two more
hypothetical: M4400440 and
M4401269
Synthesis of
lipid-linked
disaccharide
pentapeptide
Ddl‡ D-alanine–D-alanine ligase M4400378 (ddlB)
murAA‡, AB Overlapping function M4403328 (murAA), M4403284
(murAB)
murB‡ M4401321 murB, M4402943 murB2
murC‡,D‡,E‡ M4402485 (murC), M4401144
(murD), M4400855 (murE)
murF‡, mraY‡, murG‡ M4400379 (murF), M4401143
(mraY), M4401217 (murG2)
Teichoic acid
biosynthesis
tagO§ Teichoic acid linkage unit synthesis M4403025 (tagO)
B. subtilis 168 poly
(glycerol
phosphate)
tagA Polyglycerol phosphate assembly
and export
Absent Bs, Bl, Bce, Bt,
Ban, Gk, NR
tagB Polyglycerol phosphate assembly
and export
Absent Bs, Bli, Bcl
tagD Glycerol-3-phosphate
cytidylyltransferase
Absent Bs, Bli, Bcl
tagF CDP-glycerol:polyglycerol
phosphate
glycerophosphotransferase
Absent Bs, Bli, Bcl
tagG Teichoic acid translocation
(permease)
Absent Bs, Bl, Bcl, Bh,
Ban
tagH Teichoic acid translocation
(ATP-binding protein)
Hit to other transporters
B. subtilis 23 poly
(ribitol phosphate)
tarA, D Teichoic acid linkage unit synthesis:
N-acetylmannosamine
transferase and
-glycerol-3-P-cytidyltransferase
tarB, F Glycerolphosphotransferases
tarK, L Ribitoltransferases
tarI, J 5-P-cytidyltransferase
5-P-dehydrogenase
Teichuronic acid
biosynthesis
tuaA (Lipid-carrier) sugar transferase Absent Bs, Bl, Bt, Bcl, Bh,
Gk
tuaB Polymer export M4403024 (tuaB)
tuaC Sugar transferase Absent Bs, Bl
tuaG Sugar transferase Absent Bs, Bl, Bh, Bcl, Bcg
tuaH Sugar transferase Absent Bs, Bl
tuaD UDP-G-dehydrogenase Similar to M4402984 (tuaD) Bs, Bl, Bt, Bcl, Bh,
Gk, NR
tuaE Repeating unit Absent Bs, Bl
tuaF Membrane bound, unknown
function
Absent Bs, Bl
Cell shape and
division
Septum formation ftsA‡ M4401327 (ftsA), M4402492 (ftsA)
ftsW‡ M4400657 (ftsW), M4401090 (ylaO)
ftsZ‡ M4401328 (ftsZ)
ftsL‡ Similar to M4402000 (yqgB)
divIB‡, C‡, pbpB‡ Penicillin-binding protein 2B M4401322 divIB, (divIC, similar to
M4400077), M4401139 (pbpB)
Cell shape rodA‡, mreB‡, C‡ M4400360 (ywcF), M4402361
(mreB), M4402362 (mreC)
Capsule¶ capA Poly-gamma-glutamic synthesis M441683 (capA) M441686 (capA)
(MLA: broken? About 100 aa are missing; the two genes are joined)
Bs (pga), Bcg, Oi
capD Capsular polysaccharide
biosynthesis protein capD
M4403209 (capD) Bs (yveM), Bl, Bce,
Bh, Oi, NR
capI NAD-dependent
epimerase/dehydratase
M44000268 (capI) Bt, Bh, Oi, NR
icaA Biofilm PIA synthesis
N-acetylglucosaminyltransferase
icaA
M4402203 (icaA) 2977 M4403023 Bh, Gk, NR
swrC M4401621 M4401624
*In bold, genes for which no ortholog is found in B. coahuilensis. In parentheses, alternative name given to a gene.
†Whenever no ortholog was found in B. coahuilensis, we searched the genomes of other Bacillus to determine how common this absence was in this genus.
Bs, B. subtilis; Bl, B. licheniformis; Bce, B. cereus; Ba, B. anthracis; Bt, B. thuringiensis; Bcg, Bacillus cereus group (comprises Bce, Ba, and Bt); Bcl, B. clausii; Oi,
O. iheyensis; Gk, Geobacillus kaustophilus HTA426; NR, Bacillus sp. NRRL B-14911.
‡Essential in B. subtilis [Kobayashi K, et al. (2003) Essential Bacillus subtilis genes. Proc Natl Acad Sci USA 100:4678–4683].
§Suggested to be involved in the synthesis of other cell wall polyanionic acids [Soldo B, Lazarevic V, Karamata D (2002) tagO is involved in the synthesis of all
anionic cell-wall polymers in Bacillus subtilis. Microbiology 148:2079–2087].
¶Genes that may be involved in capsule formation are not all present in B. subtilis. There are 137 genes annotated in this functional category of which we only
show a selected set.
Table S8. Analysis of the presence in B. coahuilensis of the B. subtilis genes constituting the phosphate regulon
Columns: Gene/operon; Function; B. coahuilensis ortholog*; Orthologs in other Bacillus† (when absent in B. coahuilensis)
Induced in Bs by limiting phosphate
phoP Alkaline phosphatase synthesis
transcriptional regulatory protein
M4402201 (phoP)
M4402563 (phoP)
phoR Alkaline phosphatase synthesis sensor
protein
M4402564 (phoR)
phoB Alkaline phosphatase III (secreted) M4403423 (phoB)
ydhF Unknown Absent Bs
pstS High-affinity phosphate ABC
transporter
M4402010 (sphX)
pstC Phosphate transport system permease BM4402011 (yqgH)
pstA Phosphate transport system permease M4402012 (pstA-1)
pstBA Phosphate ABC transporter
(ATP-binding protein)
M4402013 (pstB)
pstBB Phosphate ABC transporter
(ATP-binding protein)
Absent Bs Bl
phoD Phosphodiesterase possibly for teichoic
acid turnover
Absent Bs Bl
tatAD Similar to hypothetical proteins Absent Bs Bl Ba Bcl Gk
Oi NR
resA Thiol-disulfide oxidoreductase M4401809 (resA)
resB Required for cytochrome c synthesis M4401810 (resB)
resC Required for cytochrome c synthesis M4401811 (resC)
resD Transcriptional regulatory protein M4401812 (resD)
resE tuaABCDEFGH glpQ
phoA
Sensor protein teichuronic acid
biosynthesis glycerol phosphoryl
diester phosphodiesterase
(hydrolysis of deacetylated
phospholipids; (secreted) Alkaline
phosphatase
M4401813 and M4400914
(resE) Absent Similar to
BM4402098 (yqiK) Absent
See Table S6
A single AP
also in Bh Bcl
Gk Oi NR
tatCD ykoL Twin arginine transporter, unknown
function
Absent (has tatCy) Absent Bs Bl Bcg Bh Bcl
Gk Oi NR Bs Bl
yttP Probable HTH-type transcriptional
regulator, unknown function
M4402509 (yttP)
ydbD yurI Similar to manganese containing
catalase Extracellular RNase
M4401506 (ydbD)
M4400550 (bsn)
yjdB vpr Unknown extracellular serine protease Absent M4403315 (vpr) Bs
lytB rapA glcU cotP yfkN Modifier protein of LytC response
regulator
Aspartate phosphatase glucose
uptake
Spore coat protein similar to 2,3
cyclic nucleotide 2
phosphodiesterase
Absent
Absent
Absent
Absent
Absent
See Table S5
Bs Bl Bcg NR
Repressed tagAB tagDEF Polyglycerol teichoic acid Absent
Absent
See Table S6
Related genes
present in B.
coahuilensis
resD phoR cpdB phoU Two-component response regulator
Two-component sensor histidine
kinase 2,3-cyclicnucleotide
2-phosphodiesterase precursor
negative regulator of the Pi regulon
M4400235 (resD) M4400236
(phoR) M4400483 (cpdB)
M4402014 (phoU)
See Table S6
Bcg Bh Bcl NR
*In bold, genes for which no ortholog is found in B. coahuilensis. In parentheses, name given to the B. coahuilensis gene.
†When no ortholog was found in B. coahuilensis, we searched the genomes of other Bacillus spp. and closely related species to determine how common this
absence was. For genes coding for proteins of <100 residues, we looked for synteny and searched at the corresponding location. Bs, B. subtilis; Bl, B. licheniformis;
Bce, B. cereus; Ba, B. anthracis; Bt, B. thuringiensis; Bcg, Bacillus cereus group (comprises Bce, Ba, and Bt); Bcl, B. clausii; Oi, O. iheyensis; Gk, G. kaustophilus; NR,
Bacillus sp. NRRL11194.