Genome sequencing and analysis of the model grass Brachypodium distachyon.
- PubMed: 20148030
Abstract
Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.
The International Brachypodium Initiative*
Grasses provide the bulk of human nutrition, and highly productive
grasses are promising sources of sustainable energy1. The grass family
(Poaceae) comprises over 600 genera and more than 10,000 species
that dominate many ecological and agricultural systems2,3. So far,
genomic efforts have largely focused on two economically important
grass subfamilies, the Ehrhartoideae (rice) and the Panicoideae
(maize, sorghum, sugarcane and millets). The rice4 and sorghum5
genome sequences and a detailed physical map of maize6 showed
extensive conservation of gene order5,7 and both ancient and relatively
recent polyploidization.
Most cool season cereal, forage and turf grasses belong to the
Pooideae subfamily, which is also the largest grass subfamily. The
genomes of many pooids are characterized by daunting size and
complexity. For example, the bread wheat genome is approximately
17,000 megabases (Mb) and contains three independent genomes8.
This has prohibited genome-scale comparisons spanning the three
most economically important grass subfamilies.
Brachypodium, a member of the Pooideae subfamily, is a wild
annual grass endemic to the Mediterranean and Middle East9 that
has promise as a model system. This has led to the development of
highly efficient transformation10,11, germplasm collections12–14, genetic
markers14, a genetic linkage map15, bacterial artificial chromosome
(BAC) libraries16,17, physical maps18 (M.F., unpublished observations),
mutant collections (http://brachypodium.pw.usda.gov,
http://www.brachytag.org), microarrays and databases
(http://www.brachybase.org, http://www.phytozome.net, http://www.modelcrop.org,
http://mips.helmholtz-muenchen.de/plant/index.jsp) that are facilitating
the use of Brachypodium by the research community. The genome
sequence described here will allow Brachypodium to act as a powerful
functional genomics resource for the grasses. It is also an important
advance in grass structural genomics, permitting, for the first time,
whole-genome comparisons between members of the three most economically
important grass subfamilies.
Genome sequence assembly and annotation
The diploid inbred line Bd21 (ref. 19) was sequenced using whole-genome
shotgun sequencing (Supplementary Table 1). The ten largest
scaffolds contained 99.6% of all sequenced nucleotides (Supplementary
Table 2). Comparison of these ten scaffolds with a genetic map
(Supplementary Fig. 1) detected two false joins and created a further
seven joins to produce five pseudomolecules that spanned 272 Mb
(Supplementary Table 3), within the range measured by flow cytometry20,21.
The assembly was confirmed by cytogenetic analysis (Supplementary
Fig. 2) and alignment with two physical maps and
sequenced BACs (Supplementary Data). More than 98% of expressed
sequence tags (ESTs) mapped to the sequence assembly, consistent
with a near-complete genome (Supplementary Table 4 and Supplementary
Fig. 3). Compared to other grasses, the Brachypodium
genome is very compact, with retrotransposons concentrated at the
centromeres and syntenic breakpoints (Fig. 1). DNA transposons and
derivatives are broadly distributed and primarily associated with gene-rich
regions.
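As an illustration of how a summary figure such as the 99.6% above can be derived, the following minimal Python sketch sums the lengths of the largest scaffolds and divides by the total assembled length. The function name and the toy scaffold lengths are hypothetical and are not taken from the Brachypodium assembly.

```python
def fraction_in_largest(scaffold_lengths, n=10):
    """Return the fraction of all assembled bases held by the n largest scaffolds."""
    if not scaffold_lengths:
        raise ValueError("no scaffolds given")
    # Sort from longest to shortest so the first n entries are the largest.
    ordered = sorted(scaffold_lengths, reverse=True)
    return sum(ordered[:n]) / sum(ordered)


# Toy scaffold lengths in bases (hypothetical, for illustration only).
toy_lengths = [500, 300, 100, 50, 30, 20]
top3 = fraction_in_largest(toy_lengths, n=3)
```

With real data, the input would be the list of scaffold lengths from the assembly and n would be 10.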
We analysed small RNA populations from inflorescence tissues
with deep Illumina sequencing, and mapped them onto the genome
sequence (Fig. 2a, Supplementary Fig. 4 and Supplementary Table 5).
Small RNA reads were most dense in regions of high repeat density,
similar to the distribution reported in Arabidopsis22. We identified
413 21-nucleotide and 198 24-nucleotide phased short interfering RNA
(siRNA) loci. Using the same algorithm, the only phased
loci identified in Arabidopsis were five of the eight trans-acting siRNA
loci, and none was 24-nucleotide phased. The biological functions of
these clusters of Brachypodium phased siRNAs, which account for a
significant number of small RNAs that map outside repeat regions,
are not known at present.
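The idea of phasing can be sketched in a few lines of Python: reads from a phased locus tend to start at positions spaced a fixed number of nucleotides apart, so most start positions share the same remainder modulo the phase length. The sketch below is a deliberately simplified illustration and is not the algorithm used in this study, which is not specified here. It ignores strand and applies no statistical test, and the function name and toy positions are hypothetical.

```python
from collections import Counter


def best_phase_register(start_positions, phase_len=21):
    """Return (offset, fraction) for the register holding the most read starts."""
    if not start_positions:
        raise ValueError("no read start positions given")
    # Group read starts by their remainder modulo the phase length.
    counts = Counter(pos % phase_len for pos in start_positions)
    # Pick the fullest register; ties go to the smallest offset.
    offset, in_phase = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return offset, in_phase / len(start_positions)


# Toy 5' start positions within one locus (hypothetical).
toy_starts = [5, 26, 47, 68, 89, 12, 40]
offset, fraction = best_phase_register(toy_starts)
```

A locus in which a large fraction of reads falls into one register would be a candidate phased locus. Setting phase_len=24 gives the corresponding check for 24-nucleotide phasing.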
A total of 25,532 protein-coding gene loci was predicted in the v1.0
annotation (Supplementary Information and Supplementary Table 6).
This is in the same range as rice (RAP2, 28,236)23 and sorghum (v1.4,
27,640)5, suggesting similar gene numbers across a broad diversity of
grasses. Gene models were evaluated using ~10.2 gigabases (Gb) of
Illumina RNA-seq data (Supplementary Fig. 5)24. Overall, 92.7%
of predicted coding sequences (CDS) were supported by Illumina data
(Fig. 2b), demonstrating the high accuracy of the Brachypodium
gene predictions. These gene models are available from several databases
(such as http://www.brachybase.org, http://www.phytozome.net,
http://www.modelcrop.org and http://mips.org).
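A support figure of the kind reported above can be computed as the share of predicted coding sequences with RNA-seq evidence. The minimal Python sketch below assumes that evidence is summarized as a read count per gene model and that one read is enough to count as support. That threshold, the function name and the toy counts are all hypothetical, and the criterion actually used for the 92.7% figure is not stated here.

```python
def fraction_supported(read_counts, min_reads=1):
    """Return the fraction of gene models with at least min_reads mapped reads."""
    if not read_counts:
        raise ValueError("no gene models given")
    supported = sum(1 for count in read_counts.values() if count >= min_reads)
    return supported / len(read_counts)


# Toy per-gene read counts (hypothetical identifiers and values).
toy_counts = {"geneA": 12, "geneB": 0, "geneC": 3, "geneD": 1}
support = fraction_supported(toy_counts)
```

Raising min_reads makes the criterion stricter and can only lower or preserve the supported fraction.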
Between 77 and 84% of gene families (defined according to Supplementary
Fig. 6) are shared among the three grass subfamilies
represented by Brachypodium, rice and sorghum, reflecting a relatively
*A list of participants and their affiliations appears at the end of the paper.
Vol 463 | 11 February 2010 |doi:10.1038/nature08747
©2010 Macmillan Publishers Limited. All rights reserved.


