Gene regulatory network growth by duplication.

by Sarah A Teichmann, M Madan Babu
Nature Genetics (2004)

Abstract

We are beginning to elucidate transcriptional regulatory networks on a large scale and to understand some of the structural principles of these networks, but the evolutionary mechanisms that form these networks are still mostly unknown. Here we investigate the role of gene duplication in network evolution. Gene duplication is the driving force for creating new genes in genomes: at least 50% of prokaryotic genes and over 90% of eukaryotic genes are products of gene duplication. The transcriptional interactions in regulatory networks consist of multiple components, and duplication processes that generate new interactions would need to be more complex. We define possible duplication scenarios and show that they formed the regulatory networks of the prokaryote Escherichia coli and the eukaryote Saccharomyces cerevisiae. Gene duplication has had a key role in network evolution: more than one-third of known regulatory interactions were inherited from the ancestral transcription factor or target gene after duplication, and roughly one-half of the interactions were gained during divergence after duplication. In addition, we conclude that evolution has been incremental, rather than making entire regulatory circuits or motifs by duplication with inheritance of interactions.

Available from www.ncbi.nlm.nih.gov
MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, U.K. Correspondence should be addressed to S.A.T. (sat@mrc-lmb.cam.ac.uk) or M.M.B. (madanm@mrc-lmb.cam.ac.uk).

Published online 11 April 2004; doi:10.1038/ng1340

We are beginning to elucidate transcriptional regulatory networks on a large scale [1] and to understand some of the structural principles of these networks [2,3], but the evolutionary mechanisms that form these networks are still mostly unknown. Here we investigate the role of gene duplication in network evolution. Gene duplication is the driving force for creating new genes in genomes: at least 50% of prokaryotic genes [4,5] and over 90% of eukaryotic genes [6] are products of gene duplication. The transcriptional interactions in regulatory networks consist of multiple components, and duplication processes that generate new interactions would need to be more complex. We define possible duplication scenarios and show that they formed the regulatory networks of the prokaryote Escherichia coli and the eukaryote Saccharomyces cerevisiae. Gene duplication has had a key role in network evolution: more than one-third of known regulatory interactions were inherited from the ancestral transcription factor or target gene after duplication, and roughly one-half of the interactions were gained during divergence after duplication. In addition, we conclude that evolution has been incremental, rather than making entire regulatory circuits or motifs by duplication with inheritance of interactions.

The basic unit of gene regulation consists of a transcription factor, its DNA binding site and the target gene or transcription unit it regulates. This basic unit can be elaborated to form a complex network in two ways: some genes may be regulated by more than one transcription factor, and some transcription factors may control more than one gene. In E. coli and yeast, a considerable number of regulatory interactions have been determined and are available in the RegulonDB database [7] and in the data sets in refs. 2 and 3, which we used in this analysis.

We investigated how these networks evolved to form complex systems in which 100 transcription factors regulate several hundred genes. Gene duplication and subsequent divergence is the primary mechanism for the evolution of genomes and complexity [4,5]. The rate and mechanisms of duplication in eukaryotes have been investigated in detail [8]. When new genes evolve by duplication, regulatory interactions in networks can be either conserved or lost during the divergence process. Previous theoretical analyses have addressed this at an abstract level [9–12]. Here, we investigate the role of gene duplication and determine the extent to which duplicated genes inherit interactions from their ancestors in E. coli and yeast.

To find instances of gene duplication, we need to reliably detect homology among genes. We used structural domain assignments from the SUPERFAMILY database [13] to identify homology among the proteins (Supplementary Methods online), as this method can capture more distant relationships than sequence comparisons alone [14]. From the domain assignments by the SUPERFAMILY hidden Markov models to the transcription factors, we observed that the DNA-binding domains of E. coli and yeast largely come from different families, with only two families in common. Furthermore, comparison of the matches in terms of the domain architecture of the genes indicated that more than one-half of the genes with structural assignments in the E. coli and yeast networks are the results of gene duplication (Table 1; E. coli: (352 + 82) / (500 + 110) = 71%; yeast: (173 + 70) / (277 + 80) = 68%). In this analysis, we considered proteins with the same domain architecture to have arisen from a common ancestor (Supplementary Methods online).
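The homology criterion and the percentages quoted from Table 1 can be illustrated with a short Python sketch. This is not the authors' pipeline: the function name `duplicated_fraction`, the gene names and the architecture labels are hypothetical, and the only rule taken from the text is that proteins with the same domain architecture are treated as descendants of a common ancestor. The text does not say here which terms of the quoted ratios refer to target genes and which to transcription factors, so the sketch simply recomputes the ratios as written.

```python
from collections import defaultdict


def duplicated_fraction(architectures):
    """Return the fraction of genes that share a domain architecture with another gene.

    `architectures` maps gene name -> domain architecture label.
    Genes without a structural assignment are assumed to be absent from the mapping.
    """
    families = defaultdict(list)
    for gene, arch in architectures.items():
        families[arch].append(gene)
    # A gene counts as a product of duplication if its family has more than one member.
    duplicated = sum(len(members) for members in families.values() if len(members) > 1)
    return duplicated / len(architectures) if architectures else 0.0


# Toy input with made-up genes and architecture labels (assumption, not real data).
toy = {"geneA": "arch1", "geneB": "arch1", "geneC": "arch2",
       "geneD": "arch3", "geneE": "arch3"}
toy_fraction = duplicated_fraction(toy)  # 4 of the 5 toy genes have a homolog

# Ratios exactly as quoted in the text alongside Table 1.
ecoli_pct = 100 * (352 + 82) / (500 + 110)
yeast_pct = 100 * (173 + 70) / (277 + 80)
print(f"E. coli: {ecoli_pct:.0f}%  yeast: {yeast_pct:.0f}%")
```

Grouping by an exact architecture label is the simplest possible stand-in for the SUPERFAMILY-based assignment; a real analysis would depend on how the architectures are derived from the hidden Markov model matches.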
Many transcription factors and target genes arose by gene duplication. After the duplication event, the regulatory interaction may be inherited or may be lost. In either case, a new interaction may also be gained during divergence. Taking this into account, we describe the possible mechanisms by which duplications of transcription factor genes, target genes or both might lead to the formation of new interactions in the regulatory network. Then, by inspecting the data currently available, we determined the extent to which each mechanism has contributed to the formation of the regulatory networks of E. coli and yeast (Supplementary Note and Supplementary Methods online).

When duplication of a transcription factor occurs (Fig. 1a), the new transcription factor may initially recognize the same binding site and, hence, regulate the same target gene as the original transcription factor. During subsequent divergence, the duplicated transcription factor may continue to regulate the same target genes as its ancestor but respond to a different signal (Fig. 2a), or it may recognize a new binding site upstream of some other target gene(s). Investigation of the known network in both organisms [2,3,7] showed that duplication of transcription factor genes followed by inheritance of interaction has contributed considerably to the growth of the regulatory network: more than two-thirds of E. coli (77%) and yeast (69%) transcription factors have at least one interaction in common with their duplicates (Table 1). This accounts for 128 interactions (10%) in E. coli and 188 interactions (22%) in yeast (Fig. 3 and Table 1). This fraction is larger in yeast than in E. coli because many genes in yeast are regulated by two or more transcription factors, whereas many genes are regulated by only one or two transcription factors in E. coli (Supplementary Note online). As a rule, larger genomes have more transcription factors per gene [15].

In the second duplication scenario, duplication of the target gene and its upstream region can explain the evolution of new genes along with their regulatory regions (Fig. 1b). During divergence, the duplicated target gene may change its coding sequence to carry out a different function but conserve its upstream region, or both the coding sequence and the upstream region may diverge, resulting in recognition by a different transcription factor. The first possibility results in homologous genes being regulated by the same transcription factor [16,17] (Fig. 2b), and the latter results in homologous genes being regulated by different transcription factors, which is not uncommon in yeast [18]. Duplication of the target gene with inheritance of interaction contributed to 272 interactions (22%) and 166 interactions (20%) in the E. coli and yeast networks, respectively (Fig. 3 and Table 1). Yeast and E. coli show extensive duplication under both duplication scenarios discussed above, meaning that this phenomenon is not biased by prokaryotic horizontal transfer or the operon structure.

So far, we have considered duplications of transcription factors and target genes separately. But a transcription factor and its target gene could both duplicate around the same time (Fig. 1c), especially if they were adjacent on a chromosome. Divergence of both the transcription factor and the recognition sites in the DNA could then occur, such that the new transcription factor would regulate only the new target gene, and the old transcription factor would regulate only its original target gene. Though it might seem unlikely, this process can be traced convincingly in some cases (e.g., two sugar catabolism operons in E. coli [17]; Fig. 2c). There are 74 (6%) and 31 (4%) such interactions in the E. coli and yeast networks, respectively (Fig. 3 and Table 1).

Figure 3 and Tables 1 and 2 provide an overview of the contribution of the different types of regulatory interactions to the entire network. The largest fraction of interactions represents cases in which either the transcription factor or target gene was duplicated, and gained new interactions after duplication during divergence, with or without loss of the original interaction (Fig. 1). There are 637 such interactions in E. coli (52%) and 365 in yeast (43%; Fig. 3). The second largest group of interactions comprises those inherited by transcription factors or target genes after duplication (38% and 45% in E. coli and yeast, respectively), and the smallest group comprises

Figure 1 Duplication growth models and consequences for network evolution. The basic unit of gene regulation is shown in the center: the transcription factor (TF), the target gene (TG) and its binding site. The three panels describe the possible duplication events of this basic unit and the subsequent divergence resulting in new regulatory interactions. Duplication events are represented by light blue arrows and divergence events by orange arrows. Divergence may also result in the loss of the duplicated gene, but we consider only duplicated genes that are selected for. (a) Duplication of the transcription factor leads to both transcription factors regulating the same gene. Divergence can result in the duplicated transcription factor regulating the original target gene by competing for the same binding site (red arrow, duplication and inheritance of interaction) used by the ancestral transcription factor or regulating a different gene (gray arrow, duplication and gain of interaction). (b) Duplication of a target gene results in both genes being regulated by the same transcription factor. Divergence can lead to the duplicated gene remaining under the control of the same transcription factor (blue arrow, duplication and inheritance of interaction) or coming under the control of a different transcription factor (gray arrow, duplication and gain of interaction). (c) Duplication of transcription factor and its target genes gives rise to new regulatory interactions. Divergence can result in homologous transcription factors regulating homologous genes (green arrow, duplication and inheritance of interaction). Subsequent divergence of the transcription factor or the target gene can result in additional interactions (gray arrow, duplication and gain of interaction).

[Figure 1 panel labels: (a) Duplication of TF; (b) Duplication of TG; (c) Duplication of TF + TG. Outcome labels: Inheritance; Loss and gain; Loss and inheritance; Gain.]

Figure 2 Duplications in the E. coli and yeast networks. Transcription factors and target genes that have the same domain architecture are shown as circles and squares with the same color. (a) Duplication of transcription factors in a feed-forward motif (FFM) in yeast. The homologous transcription factors PDR1 and PDR3 are involved in drug responses and regulate multidrug transporters in yeast. This FFM could have evolved by duplication according to the scheme shown in Figure 1a. (b) Duplication of target genes in a single input module (SIM) in E. coli. The BioA and BioBFCD operons are regulated by the BirA transcription factor only, a topology that is a SIM. BioA and BioF are homologous enzymes in the biotin biosynthesis pathway, and so this SIM could have evolved by duplication of target genes, as shown in Figure 1b. (c) Duplication of both a transcription factor and its target genes in yeast. This is an example in which both the transcription factor and target genes were duplicated to produce additional regulatory interactions in the network according to the scheme shown in Figure 1c. The simultaneous duplication of a transcription factor and two target genes is facilitated by the fact that the transcription factor and target genes are adjacent to each other on the yeast chromosome.

[Figure 2 panel labels: (a) Duplication of transcription factor: PDR1, PDR3, FLR1. (b) Duplication of target gene: BirA, BioA, BioF. (c) Duplication of transcription factor and target gene: MAL33, MAL31, MAL32, MAL13, MAL11, MAL12, YGR290W, YGR291C; Chr VII, Chr II.]

Nature Genetics, Volume 36, Number 5, May 2004 (Letters). © 2004 Nature Publishing Group. http://www.nature.com/naturegenetics
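The three duplication scenarios of Figure 1 can be turned into a simple classification rule over a list of regulatory interactions. The sketch below is an assumption about how such a rule might look, not the procedure from the Supplementary Methods: the function `classify_interactions`, the label strings and the precedence among scenarios are all hypothetical. The PDR and Bio gene names come from Figure 2, but their edges are simplified, the architecture labels are placeholders, and `tfC1`, `tfC2`, `tgC1`, `tgC2` and `hypoTG` are invented genes added only to exercise every branch.

```python
from itertools import product


def classify_interactions(edges, arch):
    """Label each (tf, tg) edge with the duplication scenario that could explain it.

    edges: set of (transcription factor, target gene) pairs.
    arch:  gene -> domain architecture label; equal labels are treated as homologs.
    """
    def homologs(gene):
        return {g for g, a in arch.items() if a == arch[gene] and g != gene}

    labels = {}
    for tf, tg in edges:
        tf_homs, tg_homs = homologs(tf), homologs(tg)
        # Fig. 1a: a homologous TF regulates the same target gene.
        tf_dup = any((h, tg) in edges for h in tf_homs)
        # Fig. 1b: the same TF regulates a homologous target gene.
        tg_dup = any((tf, h) in edges for h in tg_homs)
        # Fig. 1c: a homologous TF regulates a homologous target gene.
        both_dup = any(pair in edges for pair in product(tf_homs, tg_homs))
        # The precedence below is an arbitrary choice made for this sketch.
        if tf_dup:
            labels[(tf, tg)] = "inherited: TF duplication"
        elif tg_dup:
            labels[(tf, tg)] = "inherited: TG duplication"
        elif both_dup:
            labels[(tf, tg)] = "inherited: TF + TG duplication"
        elif tf_homs or tg_homs:
            labels[(tf, tg)] = "gained after duplication"
        else:
            labels[(tf, tg)] = "no duplication detected"
    return labels


# Placeholder architecture labels; only the equalities between them matter.
arch = {"PDR1": "tf1", "PDR3": "tf1", "FLR1": "tg1",
        "BirA": "tf2", "BioA": "tg2", "BioF": "tg2",
        "tfC1": "tf3", "tfC2": "tf3", "tgC1": "tg3", "tgC2": "tg3",
        "hypoTG": "tg4"}
edges = {("PDR1", "FLR1"), ("PDR3", "FLR1"),
         ("BirA", "BioA"), ("BirA", "BioF"),
         ("tfC1", "tgC1"), ("tfC2", "tgC2"),
         ("PDR3", "hypoTG")}
labels = classify_interactions(edges, arch)
for edge in sorted(labels):
    print(edge, "->", labels[edge])
```

An interaction can satisfy more than one scenario at once, so any real tally would need an explicit rule for such cases; the fixed precedence used here is only one possibility.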
