Parent genes of retrotransposition-generated gene duplicates in Drosophila melanogaster have distinct expression profiles.

by Morgan G I Langille, Denise V Clark
Genomics (2007)

Abstract

Genes arising by retrotransposition are always different from their parent genes from the outset. In addition, the cDNA must insert into a region that allows expression or it will become a processed pseudogene. We sought to determine whether this class of gene duplication differs from other gene duplications based on functional criteria. Using amino acid sequences from Drosophila melanogaster, we identified retroduplicated gene pairs at various levels of sequence identity. Analysis of gene ontology annotations showed some enrichment of retroduplications in the cellular physiological processes class. Retroduplications show a higher level of nucleotide substitution than other gene duplications, suggesting a higher rate of divergence. Remarkably, analysis of microarray data for gene expression during embryogenesis showed that parent genes are more highly expressed relative to their retroduplicated copies, tandem duplications, and all genes. Furthermore, an expressed sequence tag library representation shows a broader distribution for parent genes than for all other genes and, as found previously by others, retroduplicated gene transcripts are found most abundantly in testes. Therefore, in examining retroduplicated gene pairs, we have found that parent genes of retroduplications are also a distinctive class in terms of transcript expression levels and distribution.


Genomics 90 (2007) 334–343
www.elsevier.com/locate/ygeno

Morgan G.I. Langille 1, Denise V. Clark ⁎
Department of Biology, University of New Brunswick, Fredericton, Canada NB E3B 6E1
Received 20 November 2006; accepted 5 June 2007
Available online 12 July 2007

⁎ Corresponding author. Fax: +1 (506) 453 3583.
E-mail address: clarkd@unb.ca (D.V. Clark).
1 Current address: Department of Molecular Biology and Biochemistry,
Simon Fraser University, Burnaby, Canada BC V5A 1S6.
0888-7543/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2007.06.001

Keywords: Gene duplication; Retroelements; Drosophila melanogaster

Gene duplication is considered a major contributor to genome
evolution and consequent organismal diversification. The core
of the idea was presented by Ohno in 1970 [1]: once established,
a newly duplicated gene can be inactivated by mutation or
acquire a new function without reducing fitness. The most
common path is inactivation, but the rarer path of acquiring a
new function could then lead to diversification. Ohno's ideas
have since been refined to include other models of diversification.
More recent concepts include the idea that both copies can
change. For example, the subfunctionalization model predicts
that mutations in gene regulatory regions can occur in both genes
so that their expression patterns become complementary [2,3].
For each gene, these are partial loss-of-function mutations, so
that the complementarity of their expression patterns forces both
genes to be maintained. The idea of subfunctionalization has
also been applied to amino acid sequence, and computational
methods for measuring the distribution of divergence between
paralogs have been developed to test this model [4,5].
Gene duplication can occur on the whole-genome scale, on
blocks of genes, or on single genes. Single gene duplication
can occur by unequal crossing over to produce tandem
duplications. Tandem duplications may diverge, but they can
also maintain sequence similarity through gene conversion. If
the gene is duplicated in its entirety, then both copies are
initially identical and functional. Single gene duplication can
also occur through retrotransposition, whereby reverse transcription
of the mRNA from a parental gene converts it into a
cDNA, which is then inserted into chromosomal DNA, forming
an intronless paralog [6]. In contrast to tandem duplication, the
retrotransposed gene may not carry sequences sufficient for its
transcription. To be expressed, the cDNA precursor must be
inserted into a transcribed region, have an internal promoter
sequence, or acquire transcriptional activity through mutation.
Otherwise, the new gene duplication is destined to become a
pseudogene.
The availability of complete eukaryotic genome sequences
has generated opportunities for exploring gene duplication in a
systematic way. Lynch and Conery [7] reported the first genome-
wide analysis of gene duplications using three completely
sequenced and three partially sequenced genomes. Analysis of
nucleotide substitutions for 462 duplications in Drosophila
melanogaster showed that duplications arise at a rate of about 31
per million years and have a half-life of 2.9 million years. Rubin
et al. [8] calculated that 5536 of 13,601 genes arose by gene
duplication in D. melanogaster. In contrast to Lynch and Conery
[7], Rubin et al. identified a larger set of duplications because
they included multigene families in their dataset and clustered
sequences that matched with a higher BLAST E value (10^-6 vs
10^-10).
Other whole-genome studies of duplications showed that the
yeast Saccharomyces cerevisiae has undergone whole-genome
duplication with subsequent loss and diversification of
duplicates [9]. This latter mode of gene duplication seems to
also account for a portion of the gene duplications in the
genome of the nematode Caenorhabditis elegans, but was not
detectable for the D. melanogaster genome in which tandem
duplication of single genes was more often observed [10]. The
Drosophila genome has several types of retrotransposons [11]
and, since active elements associated with retrovirus-like
particles can exist [12,13], gene duplications can also arise
through retrotransposition.
The mechanism of gene duplication by retrotransposition
has been studied in the yeast S. cerevisiae [14]. Here,
retrotransposition is mediated by retrotransposon sequences
and reverse transcriptase, as evidenced by Ty1 element
sequences flanking the duplicated gene, tracts of poly(A)
sequences downstream from the coding sequence, and an
increase in duplication rate upon induction of a high level of
Ty1 reverse transcriptase expression. However, even in these
newly generated duplications, the poly(A) sequences are not
always found and the flanking Ty1 sequences are not arranged
in a way so that integration could occur as it does for wild-type
Ty1 elements. Thus, analysis of retrotransposition events in
yeast has not provided evidence for a simple, unifying model
for the mechanism of retrotransposition mediated by the Ty1
retrotransposon.
Gene duplications arising by retrotransposition were exam-
ined in humans with the initial release of the genome sequence,
in which 97 functional intronless paralogs were identified [15].
In this group of genes, there is an excess of translation and
nuclear regulation proteins and metabolic and regulatory
enzymes. In D. melanogaster, whole-genome analysis resulted
in the characterization of 24 gene duplications that appear to
have been generated by retrotransposition [16]. These gene pairs
fit the criteria that the two genes are on different chromosomes,
they have at least 70% amino acid sequence identity, one
member has no introns, and, in a few cases, there are signs of
retrotransposition, such as poly(A) tracts, for gene pairs in which both
are intronless. Analysis of expression data and chromosome
linkage showed that there was a significant tendency for genes
on the X chromosome to produce new copies on the autosomes,
and the new copies examined are mostly expressed in the testes.
This observation is consistent with the hypothesis that genes on
the X chromosome are escaping the X-chromosome inactivation
that is thought to occur during spermatogenesis [16,17].
With the continued expansion of the Drosophila genome
project, we now have the most comprehensive developmental
expression data to date from microarray analysis [18], expressed
sequence tag (EST) sequences from a wider range of libraries
[19], and systematic functional annotations in the form of gene
ontology (GO) descriptions [20]. We have combined these data
with an analysis of gene duplications, focusing on possible
retrotransposed duplications as a subset. Our analysis shows that
the parent genes of retrotransposed genes are distinct in having a
consistently higher level of expression.
Results
Identification of retrotransposed genes
Assembly of a set of duplicated gene pairs first involved an
all-against-all comparison of Drosophila protein sequences
using a global alignment algorithm. Cluster analysis was then
performed to identify gene pairs and gene families with more
than two members. Since it would be difficult to determine the
parent/child relationship in gene families with more than two
members with similar levels of amino acid identity, these
families were excluded from the gene duplication datasets we
used for further analysis. The cluster analysis was performed at
several levels of amino acid sequence identity to derive datasets
of gene duplications at the 50, 60, and 70% levels. The gene
duplication datasets were then subdivided into potentially
retrotransposed versus all others by two filters. These filters
were (1) a minimum intergenic distance of 100,000 bp if the two
genes were on the same chromosome arm and (2) one member of
the gene pair having no introns in the amino acid coding region.
After filtering, there were 67 gene pairs at the 50% amino acid
sequence identity cutoff, 39 pairs at 60% identity, and 20 pairs at
70% identity (see Supplemental File 1). These gene pairs include
the changes introduced after updating the sequence dataset with
FlyBase release 5.1. One pair was removed (CG32713 and
CG12725) as it now formed a cluster of three genes. Seven new
pairs were identified at the 50% cutoff, but only one of these
pairs met the filtering criteria (CG34132 and Tim13). This
update did not change any of the conclusions for the data
analyses in this paper.
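The clustering step described above can be sketched as follows. This is a minimal illustration only: it assumes single-linkage clustering (two genes are joined whenever their pairwise identity meets the cutoff), which the text does not state explicitly, and the gene names and identity values in `EXAMPLE_IDENTITIES` are hypothetical placeholders, not data from the paper.

```python
from collections import defaultdict


def cluster_by_identity(identities, cutoff):
    """Group genes by single linkage: join two genes if identity >= cutoff.

    `identities` maps (gene_a, gene_b) to percent amino acid identity.
    Returns a list of clusters, each a sorted list of gene names.
    Single linkage is an assumption made for this sketch.
    """
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for (a, b), identity in identities.items():
        if identity >= cutoff:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for gene in parent:
        clusters[find(gene)].append(gene)
    return [sorted(members) for members in clusters.values()]


def gene_pairs(identities, cutoff):
    """Keep only clusters of exactly two genes; larger families are excluded."""
    return sorted(c for c in cluster_by_identity(identities, cutoff) if len(c) == 2)


# Hypothetical identities: geneC/geneD/geneE form a family of three at 70%.
EXAMPLE_IDENTITIES = {
    ("geneA", "geneB"): 82.0,
    ("geneC", "geneD"): 75.0,
    ("geneD", "geneE"): 71.0,
    ("geneC", "geneE"): 68.0,
    ("geneF", "geneG"): 55.0,
}

if __name__ == "__main__":
    for cutoff in (50, 60, 70):
        print(cutoff, gene_pairs(EXAMPLE_IDENTITIES, cutoff))
```

Running the clustering separately at each cutoff mirrors the derivation of the 50, 60, and 70% datasets: a pair that is isolated at a high cutoff can be absorbed into a larger family at a lower one, and a family of three is dropped even if only two of its three pairwise identities exceed the cutoff.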
We found that a minimum intergenic distance of 100,000 bp
was a natural cutoff for deriving our subset of gene duplications
by retrotransposition after inspecting the distribution of inter-
genic distances for gene pairs with introns versus gene pairs in
which one gene has no introns. A plot of intergenic distances in
log base pairs between each pair of duplicated genes on the same
chromosome shows there is a bimodal distribution (Fig. 1A).
The majority of duplicated genes have an intergenic distance of
less than 100,000 bp. For those pairs in which both genes contain
introns, 74.9% of them have an intergenic distance of less than
100,000 bp (Fig. 1B). In contrast, only 36.4% of duplications
with one gene containing an intron and the other gene containing
no introns had an intergenic distance of less than 100,000 bp
(Fig. 1C). Therefore, a minimum distance of 100,000 bp was
used as a filter to eliminate the bulk of the tandem gene
duplications from the potentially retrotransposed gene duplica-
tion set. Supplemental File 2 summarizes all filtered potential
retrotransposed gene duplications with intronless paralogs with
>50% identity.
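The two filters can be written down compactly. The sketch below is illustrative and makes several assumptions: the `Gene` record and its field names are hypothetical, the intronless criterion is read as "exactly one member has no introns in its coding region" (so that a parent and child can be told apart), and intergenic distance is taken as the gap between the nearest gene ends.

```python
from dataclasses import dataclass

MIN_INTERGENIC_BP = 100_000  # cutoff stated in the text


@dataclass(frozen=True)
class Gene:
    # Hypothetical record layout, for illustration only.
    name: str
    arm: str          # chromosome arm, e.g. "X", "2R", "3L"
    start: int        # bp coordinate on the arm
    end: int
    cds_introns: int  # introns interrupting the coding region


def intergenic_distance(a, b):
    """Gap in bp between two genes on the same arm (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def is_candidate_retroduplication(a, b):
    """Apply both filters to a duplicated gene pair."""
    # Filter 2: exactly one member is intronless (assumed reading).
    if (a.cds_introns == 0) == (b.cds_introns == 0):
        return False
    # Filter 1: pairs on the same arm must be at least 100,000 bp apart.
    if a.arm == b.arm and intergenic_distance(a, b) < MIN_INTERGENIC_BP:
        return False
    return True


def parent_and_child(a, b):
    """Return (parent, child): the intron-containing gene is taken as parent."""
    return (a, b) if a.cds_introns > 0 else (b, a)
```

For example, a hypothetical intron-containing gene on X paired with an intronless copy on 3R passes both filters, whereas the same gene paired with an intronless neighbor 1 kb away on X is treated as a likely tandem duplicate and rejected.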
To obtain evidence that the 67 retroduplicated gene pairs
were functional, we looked for evidence of purifying selection
and gene expression. To test for purifying selection, a
likelihood ratio test of nonsynonymous (dN) to synonymous
(dS) nucleotide substitution ratios was done by applying a
conservative criterion of dN/dS=0.5 ([21] and as performed
by others [16]). We found that 63 of the 67 duplications had a
dN/dS ratio significantly less than 0.5 (p < 10^-5) and an
additional duplication had a ratio less than 0.5 with p < 0.05.
The 67 duplications were also assessed for evidence of
expression of mRNA, particularly representation in expressed
sequence tags derived from various cDNA libraries [19]. If no
cDNA was found, then evidence for expression was sought in
microarray datasets [18]. Only one gene, CG34132, lacked
evidence for expression, although it showed a dN/dS ratio with
its duplicate Tim13 of 0.1861, indicating purifying selection.
The results of this analysis are tabulated in Supplemental
File 3.

Fig. 1. Bimodal distribution of intergenic distances for gene duplications with
40% or greater identity. Dashed lines indicate an intergenic distance of
100,000 bp. (A) All gene duplications. (B) Gene duplications in which both
genes have introns interrupting their coding regions. (C) Gene duplications in
which one gene has no introns interrupting its coding region.
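The logic of the likelihood ratio test for purifying selection can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' implementation: it assumes the usual setup in which a model with dN/dS fixed at 0.5 is compared with a model in which dN/dS is free, and that twice the log-likelihood difference is referred to a chi-square distribution with one degree of freedom. The log-likelihood values used in the example are hypothetical; only the dN/dS value of 0.1861 comes from the text.

```python
import math


def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def lrt_fixed_vs_free(lnl_fixed, lnl_free):
    """Likelihood ratio test for two nested models differing by one parameter.

    lnl_fixed: log-likelihood with dN/dS fixed at the null value (0.5).
    lnl_free:  log-likelihood with dN/dS estimated freely.
    Returns (test statistic, p-value).
    """
    stat = max(0.0, 2.0 * (lnl_free - lnl_fixed))
    return stat, chi2_sf_1df(stat)


def shows_purifying_selection(lnl_fixed, lnl_free, omega_free,
                              omega_null=0.5, alpha=0.05):
    """True if the free dN/dS is below the null value and the LRT is significant."""
    _, p_value = lrt_fixed_vs_free(lnl_fixed, lnl_free)
    return omega_free < omega_null and p_value < alpha


if __name__ == "__main__":
    # Hypothetical log-likelihoods paired with the dN/dS reported for
    # CG34132 and Tim13 (0.1861).
    print(lrt_fixed_vs_free(-1250.0, -1240.0))
    print(shows_purifying_selection(-1250.0, -1240.0, omega_free=0.1861))
```

Using 0.5 rather than 1 as the null value makes the test conservative: a pair must show a ratio well below neutrality before it is counted as being under purifying selection.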
Evaluation of method
The retrotransposed gene duplication set with greater than
70% identity is shown in Table 1. After reviewing the results,
one gene pair, His4r and His4:CG31611, was omitted from the
set. The intron-containing histone H4 replacement gene, His4r,
produces a product identical to the repeated histone H4 gene
cluster on chromosome 2L [22] of which His4:CG31611 is a
member. Since His4:CG31611 was the lone copy of the repeats
present in the database, and the many other copies of histone
H4 genes were not present in the release 3.1 protein sequences,
this gene pair had passed through our selection criteria. A
similar case was found for His2Av, a histone 2A variant, at
55% identity with the repeated histone 2A genes, of which
His2A:CG31618 is a member. With the FlyBase release 5.1
update, these two pairs are not identified because they now
fall into clusters.
To determine if we might have missed some gene pairs by
clustering first, we instead applied the two filtering criteria prior
to clustering. Filtering first resulted in identification of the same
21 gene pairs identified by clustering first (release 3.1 data) and
one additional gene pair. Vha16 encodes a vacuolar ATPase
subunit [23], has introns, and is on chromosome 2R, whereas
two intronless copies, Vha16-2 and Vha16-3, are in close
tandem arrangement on chromosome 3L. Since two of the three
pairwise alignments resulted in greater than 70% identity (Vha16
by Vha16-3 and Vha16-2 by Vha16-3; Vha16 by Vha16-2 had
68% identity), this group formed a cluster and was excluded
from the original dataset by clustering first. However, by
filtering first, only the Vha16 + Vha16-3 pair was recovered. In
this case it appears that we can tentatively identify the parent
gene for Vha16-3 as Vha16, based on amino acid sequence
identities and the presence of introns only in Vha16. However,
because Vha16-2 and Vha16-3 are likely tandem duplicates, it is
difficult to determine which of these two genes is the direct
retrogene copy of Vha16 and which is a copy of its neighbor.
Thus, due to such complexities arising from filtering first,
clustering first, to separate gene pairs from multigene families,
seems to be a more conservative approach. Reversing the
procedure can identify more gene pairs, but a more detailed
analysis must be done when there are more than two family
members to determine if the direction of retroduplication can be
unambiguously established.
Betran et al. [16] analyzed a set of retrotransposed gene
duplications that comprised gene pairs only on different
chromosomes. They used the release 2 protein dataset and
performed pairwise local alignments using a FASTA program
[24]. The differences between the set of retroduplicated gene
pairs that have 70% or greater identity from Betran et al. [16] and
our set shown in Table 1 likely reflect different datasets, different
alignment methods, our use of clustering to exclude families
larger than two, and inclusion of intrachromosomal duplications.
Our Table 1 has 7 additional duplications and lacks 10 others
compared to those listed by Betran et al. [16]. The 7 additional
duplications include 2 X-to-autosome duplications (parental
genes Rpt3 and Cyp1) and 1 interchromosomal duplication
(parental gene Vha16). Four additional duplications are
intrachromosomal (CG1041 and CG5265, Rh4 and Rh3,
CG40045 and CG9602, Prat2 and Prat). The 10 retroduplications
we did not identify, in contrast to Betran et al. [16], fall into
three categories. Mgst1-Psi is an intronless copy of Mgst1 that is
now annotated as a pseudogene in FlyBase [25]. It was found to
have a stop codon and not to be expressed according to Toba and
Aigaki [26]. Six duplications were screened from our dataset by
clustering. These represented cases in which the direction of the
duplication event was ambiguous due to other well-conserved
family members. Three other gene pairs were excluded from
Table 1 because the Needleman–Wunsch global alignment
program reported identities just under the 70% cutoff. However,
these pairs all appeared in our >60% identity dataset. Thus, in
comparison to Betran et al. [16], our approach to identifying
retroduplications is more conservative with respect to establishing
the parent gene and the direction of retrotransposition, while
it is less conservative by allowing the more distant intrachromosomal
duplications.

Table 1
Twenty-one putative retrotransposed gene pairs at >70% amino acid sequence identity
fitting the criteria of more than 100 kb intergenic distance and one member with
no introns (parent gene)

Parent gene    Linkage  Child gene    Linkage  Molecular function
CG1041         3R       CG5265        3R       Carnitine O-acetyltransferase activity
CanB2          2R       CanB          X        Calcineurin
CG8331         2R       CG4960        3R       GTPase regulator activity
Rh4            3L       Rh3           3R       G-protein-coupled receptor
CG40045        3h       CG9602        3R       Ubiquitin-conjugating activity
Acon           2L       CG4706        3R       Aconitate hydratase
Vha16          2R       Vha16-3       3L       Hydrogen-exporting ATPase
Rpt3           X        Rpt3R         3R       Endopeptidase
Hsp60          X        Hsp60C        2L       Protein folding
Atg8a          X        Atg8b         3R       Microtubule binding
Ctp            X        Cdlc2         2L       Dynein light chain
CG8310         X        Vha36         2R       Hydrogen-exporting ATPase activity
Ntf-2          X        Ntf-2r        2L       Protein carrier activity
Cyp1           X        CG7768        3L       Peptidyl–prolyl cis–trans isomerase activity
CG3560         X        CG17856       3R       Ubiquinol–cytochrome c reductase activity
Pros28.1       X        Pros28.1A     3R       Proteasome 28-kDa subunit
RpL37a         X        RpL37b        2R       Ribosomal protein L37e
Ef1alpha100E   3R       Ef1alpha48D   2R       Translation elongation factor
Sep2           3R       Sep5          2R       Septin
CG17734        3R       CG11825       2R       Hypoxia-induced protein conserved domain
Prat2          3L       Prat          3L       Purine biosynthesis phosphoribosylamidotransferase

Retrotransposition to and from the X chromosome
One central finding of Betran et al. [16] was that there was a
significant excess of gene duplications going from the X
chromosome to the autosomes. In light of the above differences
between our datasets, we reconsidered this idea with our dataset,
which includes intrachromosomal retrotransposed gene duplications
greater than 100 kb apart. Using the genes from Table 1,
we find similarly that the distribution of jumps is nonrandom,
with an excess of retrotranspositions from the X chromosome to
the autosomes (Table 2).
Higher levels of expression of parent genes in comparison to
their intronless child genes
Tomancak et al. [18] performed a genome-wide analysis of
gene expression during Drosophila embryogenesis by hybridization
to microarrays. We analyzed their data to compare
expression levels of original retrotransposed genes (parents)
with their intronless paralogs (children). For each time point in
embryogenesis, there were more gene duplications that had a
higher expression level for the parent gene compared to its
intronless paralog. This trend was seen for datasets of gene pairs
with a percentage identity greater than 70% (Fig. 2A) and 60%
(not shown).
To put these findings into a larger context of all genes
examined, the distributions of expression levels during embryogenesis
were box-plotted for parent genes, intronless child
genes, and all genes (Fig. 2B). We also examined the expression
of a set of DNA-based duplications with >70% identity (see
Materials and methods) for comparison to retroduplicated gene
pairs. The medians and ranges of expression levels for the
parent genes show a broader distribution compared to child
genes, the other duplications, and all genes. In addition,
overall mean expression levels were significantly higher
(p < 0.01) for the parent genes compared to the intronless
child genes, other duplications, and all genes for all time
points during embryogenesis. There were no significant
differences (p > 0.4) between the mean expression levels of
the intronless child genes and all genes for all time points (data
not shown). Likewise, for the other duplications, there were no
significant differences (p > 0.1) between their mean expression
levels for all time points.

Table 2
Distribution of retrotransposition events among chromosomes X, 2, and 3 for the
duplications listed in Table 1

Direction of retrotransposition           % Expected a  Expected  Observed
X to X                                     2.6           0.55      0
X to autosome                             14.4           3.02     10
Autosome to X                             12.6           2.65      1
Autosome to autosome (interchromosomal)   34.9           7.33      6
Autosome to autosome (intrachromosomal)   35.5           7.46      4

χ2 = 19.55 and, for 4 degrees of freedom, p = 0.001.
a Expected values calculated following the method of Betran et al. [16], but
revised to include intrachromosomal duplications.
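The observed counts and the test statistic reported with Table 2 can be checked directly from the tabulated values. The sketch below is an illustration rather than the authors' code: the linkage pairs are transcribed from Table 1, the expected counts from Table 2, and the function names are hypothetical. It treats the heterochromatic location "3h" as chromosome 3, which is an assumption of this sketch. Because the expected values are rounded to two decimals, the statistic agrees with the reported 19.55 only to within rounding.

```python
import math
from collections import Counter

# (parent linkage, child linkage) for the 21 pairs, transcribed from Table 1.
LINKAGES = [
    ("3R", "3R"), ("2R", "X"), ("2R", "3R"), ("3L", "3R"), ("3h", "3R"),
    ("2L", "3R"), ("2R", "3L"), ("X", "3R"), ("X", "2L"), ("X", "3R"),
    ("X", "2L"), ("X", "2R"), ("X", "2L"), ("X", "3L"), ("X", "3R"),
    ("X", "3R"), ("X", "2R"), ("3R", "2R"), ("3R", "2R"), ("3R", "2R"),
    ("3L", "3L"),
]

# Expected counts transcribed from Table 2.
EXPECTED = {
    "X to X": 0.55,
    "X to autosome": 3.02,
    "Autosome to X": 2.65,
    "Autosome to autosome (interchromosomal)": 7.33,
    "Autosome to autosome (intrachromosomal)": 7.46,
}


def chromosome(arm):
    """Map an arm such as '3R' or '3h' to its chromosome ('3'); 'X' stays 'X'."""
    return "X" if arm == "X" else arm[0]


def direction(parent_arm, child_arm):
    """Classify a retrotransposition event into the Table 2 categories."""
    p, c = chromosome(parent_arm), chromosome(child_arm)
    if p == "X" and c == "X":
        return "X to X"
    if p == "X":
        return "X to autosome"
    if c == "X":
        return "Autosome to X"
    if p == c:
        return "Autosome to autosome (intrachromosomal)"
    return "Autosome to autosome (interchromosomal)"


def observed_counts(linkages):
    counts = Counter(direction(p, c) for p, c in linkages)
    return {category: counts.get(category, 0) for category in EXPECTED}


def chi_square(observed, expected):
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of df."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


if __name__ == "__main__":
    observed = observed_counts(LINKAGES)
    stat = chi_square(observed, EXPECTED)
    print(observed)
    print(round(stat, 2), chi2_sf_even_df(stat, df=4))
```

Most of the statistic comes from the X-to-autosome category, where 10 events are observed against about 3 expected.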
Consistent with the above observations, we found that the
representation of parent genes with at least one EST in a given
library was significantly higher (p < 0.01) compared to all genes
in 7 of the 10 EST libraries (Fig. 3). The two adult testes
libraries (AI and AT) and the tissue culture cell library (SD)
showed a significantly higher EST representation for the
intronless paralogs compared to all genes in that library (Fig.
3). Two libraries, RE and RH, were included for completeness,
although they were made using a normalization step so that
differences between EST representations are expected to be less
pronounced.

Fig. 2. Expression of duplicated genes during embryogenesis. (A) Comparison of expression levels for retrotransposed gene pairs during embryogenesis (percentage
identity >70%). All time points show that, for a majority of gene pairs, the parent gene has a significantly higher expression level compared to the intronless child gene
(p < 0.01). (B) Box-plot distributions of mean absolute measures of expression levels [18] for parent genes, their intronless child genes, other (DNA-based) duplications
(all at percentage identity >70%), and all genes during embryogenesis. Boxes represent the bounds of expression values falling within the 25th and 75th percentiles,
the midpoint in the box is the median, and vertical lines show the extent of expression values beyond this range.

Fig. 3. Representation of parent genes, their intronless paralogs (>60% identity),
and all genes in various BDGP EST libraries. ⁎ The parent gene has significantly
higher average representation compared to all genes for that EST library. ⁎⁎ The
intronless paralog gene has significantly higher average representation
compared to all genes for that EST library.
During the first 2 h of embryogenesis, timed from egg
deposition, the embryo is a syncytium where 13 rapid mitotic
cycles occur [27]. The earliest point when zygotic gene
expression can occur is at cycle 10, about 80 min after egg
deposition [28]. Thus, any transcripts detected in the first hour
of embryogenesis are maternally inherited and reflect gene
expression in the female germ line. Of the 21 gene pairs listed in
Table 1, data for both genes were available for 19 pairs from the
Tomancak dataset [18]. Of these, we found significantly higher
levels of expression for the first hour of embryogenesis for 15 of
the 19 parent genes (p value for t test ≤0.005).
Distribution of functions according to gene ontology terms
shows enrichment for cellular physiological processes
To determine if retrotransposed gene duplications have a
different distribution of functions, we used the GoMiner [29]
resource and FlyBase release 5.1 to compare gene annotations
for retrotransposed duplications to other gene duplications in
which both copies have introns (greater than 50% identity) and
to all genes. The distribution of functions among high-level
gene ontology biological processes is shown in Fig. 4. We found
some enrichment for genes involved in cellular physiological
processes in the retroduplication gene group. This observation
may reflect a greater tolerance for genes within the cellular
physiological class to vary in expression level and dosage more
than others.

Fig. 4. Distribution of retrotransposed gene duplications (123) versus other gene
duplications (both with introns; 338) at >50% amino acid identity and all genes (7658)
among gene ontology biological process terms.

Nucleotide substitution rates
Synonymous and nonsynonymous substitution rates were
calculated for 161 gene pairs with over 70% identity for the
following three groups: (1) pairs with only one intronless gene
and >100,000 bp intergenic distance; (2) pairs with both
intronless genes and >100,000 bp intergenic distance; (3) all
other pairs. We included group 2 as they could represent other
possible retrotranspositions; however, due to the absence of
introns in both genes the direction of the retroduplication could
not be determined. A plot of synonymous (KS) versus
nonsynonymous (KA) substitutions (Fig. 5) shows that the
first group has no gene pairs near the origin, in contrast to the
other two groups. This result suggests that the putative
retrotransposed gene duplicates have undergone a higher level
of substitution compared to the other gene duplicates. This
difference could reflect a higher rate of divergence for the
retroduplications, a different age distribution, or a lack of gene
conversion events due to intergenic distance and intron/exon
organization differences.
Discussion
Our analysis of duplicated gene pairs generated by retro-
transposition in D. melanogaster has identified a natural cutoff
Fig. 5. Synonymous (KS) and nonsynonymous (KA) substitutions between
paralogous genes at 70% amino acid identity where Group 1 contains pairs with
one intronless gene and >100,000 bp intergenic distance (n=21), Group 2
contains pairs with both intronless genes and >100,000 bp intergenic distance
(n=13), and Group 3 contains all other pairs (n=124).
determine if the free value is significantly different from 0.5 [21]. If the free
dN/dS value is found to be significantly less than 0.5, then there is good evidence for
purifying selection and that both genes are functional [16]. The likelihood ratio
test results are listed in Supplemental File 3.
Acknowledgments
We thank the Canadian Bioinformatics Resource at the
National Research Council in Halifax, Canada, for computing
support. This work was supported by a Natural Sciences and
Engineering Research Council of Canada grant to D.V.C.
Appendix A. Supplementary data
Supplementary data associated with this article can be found,
in the online version, at doi:10.1016/j.ygeno.2007.06.001.
References
[1] S. Ohno, Evolution by Gene Duplication, Springer-Verlag, New York,
1970.
[2] A. Force, M. Lynch, F.B. Pickett, A. Amores, Y.-L. Yan, J. Postlethwait,
Preservation of duplicate genes by complementary, degenerative mutations,
Genetics 151 (1999) 1531–1545.
[3] M. Lynch, A. Force, The probability of duplicate gene preservation by
subfunctionalization, Genetics 154 (2000) 459–473.
[4] E.T. Dermitzakis, A.G. Clark, Differential selection after duplication in
mammalian developmental genes, Mol. Biol. Evol. 18 (2001) 557–562.
[5] X. Gu, Maximum-likelihood approach for gene family evolution under
functional divergence, Mol. Biol. Evol. 18 (2001) 453–464.
[6] J. Brosius, RNAs from all categories generate retrosequences that may
be exapted as novel genes or regulatory elements, Gene 238 (1999)
115–134.
[7] M. Lynch, J.S. Conery, The evolutionary fate and consequences of
duplicate genes, Science 290 (2000) 1151–1155.
[8] G.M. Rubin, M.D. Yandell, J.R. Wortman, G.L. Gabor Miklos, C.R.
Nelson, I.K. Hariharan, M.E. Fortini, P.W. Li, R. Apweiler, W.
Fleischmann, J.M. Cherry, S. Henikoff, M.P. Skupski, S. Misra, M.
Ashburner, E. Birney, M.S. Boguski, T. Brody, P. Brokstein, S.E. Celniker,
S.A. Chervitz, D. Coates, A. Cravchik, A. Gabrielian, R.F. Galle, W.M.
Gelbart, R.A. George, L.S. Goldstein, F. Gong, P. Guan, N.L. Harris, B.A.
Hay, R.A. Hoskins, J. Li, Z. Li, R.O. Hynes, S.J. Jones, P.M. Kuehl, B.
Lemaitre, J.T. Littleton, D.K. Morrison, C. Mungall, P.H. O'Farrell, O.K.
Pickeral, C. Shue, L.B. Vosshall, J. Zhang, Q. Zhao, X.H. Zheng, S. Lewis,
Comparative genomics of the eukaryotes, Science 287 (2000) 2204–2215.
[9] M. Kellis, B.W. Birren, E.S. Lander, Proof and evolutionary analysis of
ancient genome duplication in the yeast Saccharomyces cerevisiae, Nature
428 (2004) 617–624.
[10] R. Friedman, A.L. Hughes, Gene duplication and the structure of
eukaryotic genomes, Genome Res. 11 (2001) 373–381.
[11] D.L. Lindsley, G.G. Zimm, The Genome of Drosophila melanogaster,
Academic Press, San Diego, 1992.
[12] D.J. Finnegan, Wandering retroviruses? Curr. Biol. 4 (1994) 641–643.
[13] P. Leblanc, S. Desset, F. Giorgi, A.R. Taddei, A.M. Fausto, M. Mazzini, B.
Dastugue, C. Vaury, Life cycle of an endogenous retrovirus, ZAM, in
Drosophila melanogaster, J. Virol. 74 (2000) 10658–10669.
[14] J. Schacherer, Y. Tourrette, J.L. Souciet, S. Potier, J. De Montigny,
Recovery of a function involving gene duplication by retroposition in
Saccharomyces cerevisiae, Genome Res. 14 (2004) 1291–1297.
[15] J.C. Venter, M.D. Adams, E.W. Myers, P.W. Li, R.J. Mural, G.G. Sutton,
The sequence of the human genome, Science 291 (2001) 1304–1351.
[16] E. Betran, K. Thornton, M. Long, Retroposed new genes out of the X in
Drosophila, Genome Res. 12 (2002) 1854–1859.
[17] J.R. McCarrey, Spermatogenesis as a model system for developmental
analysis of regulatory mechanisms associated with tissue-specific gene
expression, Semin. Cell Dev. Biol. 9 (1998) 459–466.
[18] P. Tomancak, A. Beaton, R. Weiszmann, E. Kwan, S. Shu, S.E. Lewis, S.
Richards, M. Ashburner, V. Hartenstein, S.E. Celniker, G.M. Rubin,
Systematic determination of patterns of gene expression during Drosophila
embryogenesis, Genome Biol. 3 (2002) RESEARCH0088.
[19] G.M. Rubin, L. Hong, P. Brokstein, M. Evans-Holm, E. Frise, M.
Stapleton, D.A. Harvey, A Drosophila complementary DNA resource,
Science 287 (2000) 2222–2224.
[20] M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry,
A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L.
Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M.
Ringwald, G.M. Rubin, G. Sherlock, Gene Ontology: tool for the
unification of biology. The Gene Ontology Consortium, Nat. Genet. 25
(2000) 25–29.
[21] Z. Yang, Likelihood ratio tests for detecting positive selection and
application to primate lysozyme evolution, Mol. Biol. Evol. 15 (1998)
568–573.
[22] A. Akhmanova, K. Miedema, W. Hennig, Identification and characterization
of the Drosophila histone H4 replacement gene, FEBS Lett. 388
(1996) 219–222.
[23] J.A. Dow, The multifunctional Drosophila melanogaster V-ATPase is
encoded by a multigene family, J. Bioenerg. Biomembr. 31 (1999) 75–83.
[24] W.R. Pearson, Rapid and sensitive sequence comparison with FASTP and
FASTA, Methods Enzymol. 183 (1990) 63–98.
[25] R.A. Drysdale, M.A. Crosby, FlyBase: genes and gene models, Nucleic
Acids Res. 33 (2005) D390–D395.
[26] G. Toba, T. Aigaki, Disruption of the microsomal glutathione S-transferase-
like gene reduces life span of Drosophila melanogaster, Gene 253 (2000)
179–187.
[27] V.E. Foe, G.M. Odell, B.A. Edgar, Mitosis and morphogenesis in the
Drosophila embryo: point and counterpoint, in: M. Bate, A. Martinez
Arias (Eds.), The Development of Drosophila melanogaster, Cold Spring
Harbor Laboratory Press, Cold Spring Harbor, NY, 1993, pp. 149–300.
[28] B.A. Edgar, G. Schubiger, Parameters controlling transcriptional activation
during early Drosophila development, Cell 44 (1986) 871–877.
[29] B.R. Zeeberg, W. Feng, G. Wang, M.D. Wang, A.T. Fojo, M. Sunshine, S.
Narasimhan, D.W. Kane, W.C. Reinhold, S. Lababidi, K.J. Bussey, J. Riss,
J.C. Barrett, J.N. Weinstein, GoMiner: a resource for biological interpretation
of genomic and proteomic data, Genome Biol. 4 (2003) R28.
[30] H. Dai, T.F. Yoshimatsu, M. Long, Retrogene movement within- and
between-chromosomes in the evolution of Drosophila genomes, Gene 385
(2006) 96–102.
[31] A.C. Marques, I. Dupanloup, N. Vinckenbosch, A. Reymond, H.
Kaessmann, Emergence of young human genes after a burst of retroposition
in primates, PLoS Biol. 3 (2005) e357.
[32] J. Maciejowski, J.H. Ahn, P.G. Cipriani, D.J. Killian, A.L. Chaudhary, J.I.
Lee, R. Voutev, R.C. Johnsen, D.L. Baillie, K.C. Gunsalus, D.H. Fitch, E.J.
Hubbard, Autosomal genes of autosomal/X-linked duplicated gene pairs
and germ-line proliferation in Caenorhabditis elegans, Genetics 169 (2005)
1997–2011.
[33] Z. Gu, D. Nicolae, H.H. Lu, W.H. Li, Rapid divergence in expression
between duplicate genes inferred from microarray data, Trends Genet. 18
(2002) 609–613.
[34] K.D. Makova, W.H. Li, Divergence in the spatial pattern of gene
expression between human duplicate genes, Genome Res. 13 (2003)
1638–1645.
[35] B.P. Cusack, K.H. Wolfe, Not born equal: increased rate asymmetry in
relocated and retrotransposed rodent gene duplicates, Mol. Biol. Evol. 24
(2007) 679–686.
[36] Y. Bai, C. Casola, C. Feschotte, E. Betran, Comparative genomics reveals a
constant rate of origination and convergent acquisition of functional
retrogenes in Drosophila, Genome Biol. 8 (2007) R11.
[37] D.G. Gilbert, DroSpeGe, a public database of Drosophila species genomes,
Nucleic Acids Res. 35 (2007) D480–D485.
[38] S.B. Needleman, C.D. Wunsch, A general method applicable to the search
for similarities in the amino acid sequence of two proteins, J. Mol. Biol. 48
(1970) 443–453.
[39] A. Heger, L. Holm, Towards a covering set of protein family profiles, Prog.
Biophys. Mol. Biol. 73 (2000) 321–337.
[40] P. Rice, I. Longden, A. Bleasby, EMBOSS: the European Molecular
Biology Open Software Suite, Trends Genet. 16 (2000) 276–277.
[41] J. Comeron, A method for estimating the numbers of synonymous
and nonsynonymous substitutions per site, J. Mol. Evol. 41 (1995)
1152–1159.
[42] J. Comeron, K-Estimator: calculation of the number of nucleotide substitutions
per site and the confidence intervals, Bioinformatics 15 (1999)
763–764.
[43] Z. Yang, PAML: a program package for phylogenetic analysis by
maximum likelihood, CABIOS 13 (1997) 555–556.
[44] N. Goldman, Z. Yang, A codon-based model of nucleotide substitution for
protein-coding DNA sequences, Mol. Biol. Evol. 11 (1994) 725–736.
M.G.I. Langille, D.V. Clark / Genomics 90 (2007) 334–343
