
Automated design of synthetic ribosome binding sites to control protein expression.

by Howard M Salis, Ethan A Mirsky, Christopher A Voigt
Nature Biotechnology, volume 27, number 10 (October 2009)

Abstract

Microbial engineering often requires fine control over protein expression (for example, to connect genetic circuits [1–7] or control flux through a metabolic pathway [8–13]). To circumvent the need for trial-and-error optimization, we developed a predictive method for designing synthetic ribosome binding sites, enabling rational control over the protein expression level. Experimental validation of >100 predictions in Escherichia coli showed that the method is accurate to within a factor of 2.3 over a range of 100,000-fold. The design method also correctly predicted that reusing identical ribosome binding site sequences in different genetic contexts can result in different protein expression levels. We demonstrate the method's utility by rationally optimizing protein expression to connect a genetic sensor to a synthetic circuit. The proposed forward engineering approach should accelerate the construction and systematic optimization of large genetic systems.


Available from www.pubmedcentral.nih.gov

Nature Biotechnology, volume 27, number 10, October 2009, page 946 (Letters)

Trial-and-error mutation to optimize an engineered genetic circuit or metabolic pathway becomes prohibitively inefficient as the system's size and complexity grow. To address this problem, we developed a predictive design method that interconverts between the DNA sequence of a key genetic element, the ribosome binding site, and its function inside a genetic system (controlling the translation initiation rate and the protein expression level). The design method's capabilities enable the systematic optimization of genetic systems, which will be increasingly valuable as it becomes possible to synthesize larger pieces of DNA [14], including whole genomes [15]. In bacteria, ribosome binding sites (RBSs) and other regulatory RNA sequences are effective control elements for translation initiation [16–19], and thereby for protein expression.
Previous studies have generated libraries of RBS sequences with the goal of optimizing the function of a genetic system [1,7,18]. However, library size increases combinatorially with the number of proteins in the engineered system. For example, randomly mutating four nucleotides of an RBS generates a library of 256 sequences, thus requiring 256³, or 16.7 million, sequences for three proteins and 256⁶, or 2.8 × 10¹⁴, sequences for six proteins. In contrast to a library-based approach, we combined a biophysical model of translation initiation with an optimization algorithm to predict a synthetic RBS sequence that provides a target translation initiation rate on a proportional scale. The model builds on previous work that characterized the free energies of key molecular interactions involved in translation initiation [20,21] and on measurements of the sequence-dependent energetic changes that occur during RNA folding and hybridization [22–26].

Bacterial translation consists of four phases: initiation, elongation, termination and ribosome turnover (Fig. 1a) [27]. In most cases, translation initiation is the rate-limiting step. Its rate is determined by multiple molecular interactions, including the hybridization of the 16S rRNA to the RBS sequence, the binding of tRNAfMET to the start codon, the distance between the 16S rRNA binding site and the start codon (called spacing) and the presence of RNA secondary structures that occlude either the 16S rRNA binding site or the standby site [20,21,28–31]. Our equilibrium statistical thermodynamic model quantifies the strengths of the molecular interactions between an mRNA transcript and the 30S ribosome complex (which includes the 16S rRNA and the tRNAfMET) to predict the resulting translation initiation rate, r (equation (1), derived in Supplementary Methods):

$$ r \propto \exp\left(-\beta \, \Delta G_{\mathrm{tot}}\right) \qquad (1) $$

The model describes the system as having two states separated by a reversible transition (Fig. 1b).
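The combinatorial growth quoted above is simple arithmetic. A minimal Python sketch reproduces it, assuming each randomized position can take any of the four bases and that the RBSs of different proteins are varied independently; the function name `library_size` is hypothetical.

```python
def library_size(randomized_nt: int, n_proteins: int) -> int:
    """Number of sequences needed to cover all RBS combinations (illustrative sketch)."""
    per_rbs = 4 ** randomized_nt   # 4 possible bases at each randomized position
    return per_rbs ** n_proteins   # independent RBSs multiply combinatorially


for n in (1, 3, 6):
    print(n, library_size(4, n))
```

256³ is exactly 16,777,216 (quoted in the text as 16.7 million), and 256⁶ is about 2.81 × 10¹⁴.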
The initial state is the folded mRNA transcript and the free 30S complex. The final state is the assembled 30S pre-initiation complex bound on an mRNA transcript. The difference in Gibbs free energy between these two states (ΔGtot) depends on the mRNA sequence surrounding a specified start codon. ΔGtot is more negative when attractive interactions between the ribosome and mRNA are present, and more positive when mutually exclusive secondary structures are present. β is the apparent Boltzmann constant for the system, which converts thermodynamic free energies to temperature differences. Importantly, equation (1) describes the differences in translation initiation rate that result from differences in mRNA sequence. The amount of expressed protein is proportional to the translation initiation rate, where the proportionality factor accounts for any ribosome-mRNA molecular interactions that are independent of mRNA sequence and for any translation-independent parameters, such as the DNA copy number, the promoter's transcription rate, the mRNA stability and the protein dilution rate (Supplementary Fig. 1).

Howard M Salis¹, Ethan A Mirsky² & Christopher A Voigt¹. ¹Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, USA. ²Graduate Group in Biophysics, University of California San Francisco, San Francisco, California, USA. Correspondence should be addressed to C.A.V. (cavoigt@picasso.ucsf.edu). Received 30 July; accepted 8 September; published online 4 October 2009; doi:10.1038/nbt.1568. © 2009 Nature America, Inc. All rights reserved.
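Because equation (1) fixes r only up to a proportionality constant, only ratios of rates are meaningful. The sketch below is illustrative, not the authors' code: it assumes β = 0.45 mol/kcal (the slope reported in the Fig. 2 regressions) and uses hypothetical helper names to convert between free energy differences and fold changes in translation initiation rate.

```python
import math

BETA = 0.45  # mol/kcal; assumed value, taken from the fitted slopes in Fig. 2


def relative_rate(dg_tot: float, beta: float = BETA) -> float:
    """Translation initiation rate up to an unknown proportionality constant (equation (1))."""
    return math.exp(-beta * dg_tot)


def fold_change(dg_a: float, dg_b: float, beta: float = BETA) -> float:
    """Ratio r_a / r_b between two sequences; the proportionality constant cancels."""
    return math.exp(-beta * (dg_a - dg_b))


def dg_difference_for_fold(fold: float, beta: float = BETA) -> float:
    """Free energy difference (kcal/mol) corresponding to a given fold change in rate."""
    return math.log(fold) / beta
```

Under this assumption, a factor of 2.3 in rate corresponds to about 1.85 kcal/mol in ΔGtot, and a 100,000-fold range to about 25.6 kcal/mol.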
Given a specific mRNA sequence, called the sub-sequence, surrounding a start codon, ΔGtot is predicted according to an energy model (equation (2)), where the reference state is a fully unfolded sub-sequence with G = 0:

$$ \Delta G_{\mathrm{tot}} = \Delta G_{\mathrm{mRNA:rRNA}} + \Delta G_{\mathrm{start}} + \Delta G_{\mathrm{spacing}} - \Delta G_{\mathrm{standby}} - \Delta G_{\mathrm{mRNA}} \qquad (2) $$

ΔGmRNA:rRNA is the energy released when the last nine nucleotides (nt) of the E. coli 16S rRNA (3′-AUUCCUCCA-5′) hybridize and co-fold to the mRNA sub-sequence (ΔGmRNA:rRNA < 0). Intramolecular folding within the mRNA is allowed. All possible hybridizations between the mRNA and 16S rRNA are considered to find the highest-affinity 16S rRNA binding site, defined as the site that minimizes the sum of the hybridization free energy ΔGmRNA:rRNA and the penalty for nonoptimal spacing, ΔGspacing. Thus, the algorithm can identify the 16S rRNA binding site regardless of its similarity to the consensus Shine-Dalgarno sequence. ΔGstart is the energy released when the start codon hybridizes to the initiating tRNA anticodon loop (3′-UAC-5′). ΔGspacing is the free energy penalty caused by a nonoptimal physical distance between the 16S rRNA binding site and the start codon (ΔGspacing > 0). When this distance is increased or decreased from an optimum of 5 nt (or ~17 Å) [29], the 30S complex becomes distorted, resulting in a decreased translation initiation rate. ΔGmRNA is the work required to unfold the mRNA sub-sequence when it folds to its most stable secondary structure, called the minimum free energy structure (ΔGmRNA < 0). ΔGstandby is the work required to unfold any secondary structures sequestering the standby site after the 30S complex assembly (ΔGstandby < 0). We define the standby site as the four nucleotides upstream of the 16S rRNA binding site, which is its location in a previously studied mRNA [28].
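Equation (2) and the binding-site rule can be sketched as follows. This is a minimal illustration under stated assumptions: the free energies are supplied as plain numbers (in the method itself they come from NUPACK calculations with Mfold 3.0 parameters and from the empirically fitted spacing model), and the class, function names and example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """One possible 16S rRNA binding site; energies in kcal/mol (hypothetical inputs)."""
    position: int        # hypothetical index of the site within the sub-sequence
    dg_mrna_rrna: float  # hybridization and co-folding energy (< 0)
    dg_spacing: float    # spacing penalty (0 at the optimal 5-nt spacing)


def best_binding_site(candidates: list[CandidateSite]) -> CandidateSite:
    # Highest-affinity site: minimize dG_mRNA:rRNA + dG_spacing over all candidates
    return min(candidates, key=lambda c: c.dg_mrna_rrna + c.dg_spacing)


def dg_total(site: CandidateSite, dg_start: float, dg_standby: float, dg_mrna: float) -> float:
    # Equation (2): the standby and mRNA folding terms are negative, so subtracting them raises dG_tot
    return site.dg_mrna_rrna + dg_start + site.dg_spacing - dg_standby - dg_mrna


sites = [
    CandidateSite(10, -9.0, 3.0),  # strong pairing, poor spacing
    CandidateSite(14, -7.5, 0.0),  # weaker pairing, optimal spacing
    CandidateSite(3, -4.0, 1.0),
]
best = best_binding_site(sites)
print(best.position, dg_total(best, dg_start=-1.2, dg_standby=-0.5, dg_mrna=-6.0))
```

In this made-up example, the site at position 14 wins despite weaker hybridization than the site at position 10, because it carries no spacing penalty.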
To calculate ΔGmRNA:rRNA, ΔGstart, ΔGmRNA and ΔGstandby, we use the NUPACK suite of algorithms [32] with the Mfold 3.0 RNA energy parameters [22,23]. These free energy calculations do not have any additional fitting or training parameters and explicitly depend on the mRNA sequence. In addition, the free energy terms are not orthogonal: changing a single nucleotide can potentially affect multiple energy terms. The relationship between the spacing and ΔGspacing was empirically determined by measuring the protein expression level driven by synthetic RBSs of varying spacing and fitting a quantitative model to these data (Online Methods, Supplementary Table 1 and Supplementary Fig. 2). For an arbitrary mRNA transcript, the thermodynamic model (equation (2)) is evaluated for each AUG or GUG start codon. The algorithm considers only the sub-sequence of the mRNA transcript consisting of 35 nucleotides before and after the start codon. This mRNA

Figure 1. A thermodynamic model of bacterial translation initiation. (a) The ribosome translates an mRNA transcript and produces a protein in a multistep process: the assembly of the 30S complex (box), initiation, elongation, termination, and the turnover of ribosomal subunits and other factors. (b) The thermodynamic free energy change during 30S complex assembly is determined by five molecular interactions that participate in the initial and final states of the system. The Watson-Crick base pairs and G:U wobbles (red lines) are shown.

Figure 2. A ribosome binding site design method. (a) Reverse engineering. The method predicts the relative translation initiation rate (red) of an RBS upstream of a given protein coding sequence (blue). ΔGtot is the free energy change between the states before and after the 30S ribosomal complex assembles on the mRNA.
Equation (1) predicts a linear relationship between the log protein fluorescence and the predicted ΔGtot. (b) Red fluorescent protein (RFP) reporter expression driven by 28 natural or existing RBSs compared to predicted ΔGtot calculations. Error bars are s.d. of six measurements performed on two different days. Linear regression R² = 0.54 with slope β = 0.45 ± 0.05. (c) Histogram of the distribution of error in the predicted ΔGtot, denoted by |ΔΔG|, of the sequences in b. The average of this distribution is 2.11 kcal/mol. (d) Forward engineering. A simulated annealing optimization algorithm iteratively mutates an RNA sequence until a target ΔGtot is found. (e) RFP expression driven by 29 synthetic RBSs compared to the predicted ΔGtot calculations. Error bars are s.d. of at least five measurements performed on two different days. Linear regression R² = 0.84 with slope β = 0.45 ± 0.01. (f) Histogram of the distribution of the error, |ΔΔG|, from e. The average of the distribution is 1.82 kcal/mol and fits well to a one-sided Gaussian distribution (red line) with s.d. σ = 2.44 kcal/mol.

[Figure 2 graphic: (a) reverse engineering, model mapping predicted ΔGtot to predicted expression level; (d) forward engineering loop (mutate, model, accept or reject, target ΔGtot reached?); (b, e) fluorescence (a.u., log scale) versus predicted ΔGtot (kcal/mol); (c, f) frequency versus error |ΔΔG| (kcal/mol).]
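The windowing step and the forward-engineering loop of Fig. 2d can be sketched together in Python. Several assumptions are involved: the window keeps up to 35 nt on each side of the start codon, counted outward from the codon's first and last nucleotides; `toy_dg` is a hypothetical stand-in score (one unit per base matching AGGAGG in the best-aligned window), not the thermodynamic model of equation (2); and the move set, temperature schedule and tolerance are illustrative choices for a generic simulated annealing loop, not the authors' settings.

```python
import math
import random


def start_codon_windows(mrna: str, flank: int = 35):
    """Yield (index, sub-sequence) for each AUG or GUG, keeping up to `flank` nt on each side."""
    mrna = mrna.upper().replace("T", "U")
    for i in range(len(mrna) - 2):
        if mrna[i:i + 3] in ("AUG", "GUG"):
            lo = max(0, i - flank)
            hi = min(len(mrna), i + 3 + flank)
            yield i, mrna[lo:hi]


SD_CORE = "AGGAGG"  # hypothetical target motif for the toy score only


def toy_dg(seq: str) -> float:
    """Toy score (NOT equation (2)): -1 per base matching SD_CORE in the best-aligned window."""
    best = 0
    for i in range(len(seq) - len(SD_CORE) + 1):
        best = max(best, sum(a == b for a, b in zip(seq[i:i + len(SD_CORE)], SD_CORE)))
    return -float(best)


def anneal_rbs(target_dg: float, length: int = 20, tol: float = 0.5, max_iter: int = 5000,
               t0: float = 1.0, cooling: float = 0.999, seed: int = 0) -> tuple[str, float]:
    """Mutate a random sequence until toy_dg is within `tol` of the target (or iterations run out)."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGU") for _ in range(length)]
    cost = abs(toy_dg("".join(seq)) - target_dg)
    temp = t0
    for _ in range(max_iter):
        if cost <= tol:  # target reached
            break
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice([b for b in "ACGU" if b != old])  # single point mutation
        new_cost = abs(toy_dg("".join(seq)) - target_dg)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
        else:
            seq[pos] = old  # reject: undo the mutation
        temp *= cooling
    return "".join(seq), cost


designed, remaining_error = anneal_rbs(target_dg=-4.0)
print(designed, remaining_error)
```

Replacing `toy_dg` with a real ΔGtot evaluation would leave the accept/reject loop unchanged; only the cost function differs.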
