BioMed Central
BMC Bioinformatics
Open Access
Methodology article

Statistical significance of quantitative PCR

Yann Karlen1, Alan McNair1, Sébastien Perseguers2, Christian Mazza3 and Nicolas Mermod*1

Address: 1Institute of Biotechnology, University of Lausanne, 1015 Lausanne, Switzerland, 2Max-Planck-Institute für Quantenoptik, 85748 Garching, Germany and 3Department of Mathematics, University of Fribourg, CH-1700 Fribourg, Switzerland

Email: Yann Karlen - firstname.lastname@example.org; Alan McNair - email@example.com; Sébastien Perseguers - firstname.lastname@example.org; Christian Mazza - email@example.com; Nicolas Mermod* - firstname.lastname@example.org
* Corresponding author

Abstract

Background: PCR has the potential to detect and precisely quantify specific DNA sequences, but it is not yet often used as a fully quantitative method. A number of data collection and processing strategies have been described for the implementation of quantitative PCR. However, they can be experimentally cumbersome, their relative performances have not been evaluated systematically, and they often remain poorly validated statistically and/or experimentally. In this study, we evaluated the performance of known methods, and compared them with newly developed data processing strategies in terms of resolution, precision and robustness.

Results: Our results indicate that simple methods that do not rely on the estimation of the efficiency of the PCR amplification may provide reproducible and sensitive data, but that they do not quantify DNA with precision. Other evaluated methods based on sigmoidal or exponential curve fitting were generally of both poor resolution and precision. A statistical analysis of the parameters that influence efficiency indicated that it depends mostly on the selected amplicon and to a lesser extent on the particular biological sample analyzed.
Thus, we devised various strategies based on individual or averaged efficiency values, which were used to assess the regulated expression of several genes in response to a growth factor.

Conclusion: Overall, qPCR data analysis methods differ significantly in their performance, and this analysis identifies methods that provide DNA quantification estimates of high precision, robustness and reliability. These methods allow reliable estimations of relative expression ratio of two-fold or higher, and our analysis provides an estimation of the number of biological samples that have to be analyzed to achieve a given precision.

Published: 20 April 2007
BMC Bioinformatics 2007, 8:131 doi:10.1186/1471-2105-8-131
Received: 22 September 2006
Accepted: 20 April 2007
This article is available from: http://www.biomedcentral.com/1471-2105/8/131
© 2007 Karlen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Quantitative PCR is used widely to detect and quantify specific DNA sequences in scientific fields that range from fundamental biology to biotechnology and forensic sciences. For instance, microarray and other genomic approaches require fast and reliable validation of small differences in DNA amounts in biological samples with high throughput methods such as quantitative PCR. However, there is currently a gap between the analysis of the mathematical and statistical basis of quantitative PCR and
its actual implementation by experimental laboratory users. While qPCR has been the object of probabilistic mathematical modelling, these methods have not often been employed for the treatment of actual measurements. Therefore, the validity of the assumptions or simplifications on which these models are based is often unclear. At the other extreme, the treatment of laboratory measurements is often fairly empirical in nature, and the validity or reproducibility of the assay usually remains poorly characterized from an experimental and/or theoretical basis. Thus, practical qPCR methods usually allow neither mathematically validated measurements nor the determination of the statistical degree of confidence of the derived conclusions. As a result, qPCR results have been questioned [2,3], and semi-quantitative methods (e.g. end-point PCR) remain widely used.

Quantitative PCR amplifications performed in the presence of a DNA-binding fluorescent dye are typically represented in the form of a plot as shown in Figure 1A, where the measured fluorescence is represented as a function of the PCR cycle number. An assumption that is common to all qPCR methods is that the fluorescence is directly correlated to the amount of double stranded DNA present in the amplification reaction. The amplification curves are sigmoid shaped and can be split into three phases. Phase I (Figure 1A) represents the lag phase in which no amplification can be detected over the background fluorescence and statistical noise. This phase is used to evaluate the baseline fluorescent "noise". Phase II corresponds to the early cycles at which detectable fluorescence levels start to build up following an exponential behaviour described by the equation inserted in Figure 1A.
On a log scale graph, this corresponds to the linear phase, illustrating the exponential dynamics of the PCR amplification (Figure 1B). During the later phase of the reaction, or phase III, the DNA concentration no longer increases exponentially and it finally reaches a plateau. This is classically attributed to the fact that one or more of the reactants become limiting or to the inhibition of amplification by the accumulation of the PCR product itself.

In a perfectly efficient PCR reaction, the amount or copy number of DNA molecules would double at each cycle but, due to a number of factors, this is rarely the case in experimental conditions. Therefore the PCR efficiency can range from 2, corresponding to a doubling of the DNA concentration at each cycle, down to 1, if no amplification occurs (Eq. 1 in Methods). Furthermore, the efficiency of DNA amplification is not constant throughout the entire PCR reaction. The efficiency value cannot be measured during phase I, but it may be suboptimal during the first cycles because of the low concentration of the DNA template and/or sampling errors linked to the stochastic process by which the amplification enzymes may replicate only part of the available DNA molecules. Quantitative PCR is used under the assumption that these stochastic processes are the same for all amplifications, which may be statistically correct for N0 values that are large enough for sampling errors to become negligible. The efficiency reaches a more or less constant and maximal value that may approach 2 in the exponential amplification of phase II, and it finally drops to a value of 1 during phase III. This implies that any appropriate analytical method should focus on phase II of the amplification, where the amplification kinetics are exponential. Therefore, the first step in any qPCR analysis is the identification of phase II, which is more conveniently performed when data are represented on a log scale (Figure 1B).
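The efficiency trajectory described above (close to its maximum during phase II, then falling to 1 at the plateau) can be illustrated with a small simulation. This is only a sketch: the saturating growth rule, the function names and all parameter values are assumptions chosen for illustration and are not taken from the article. Baseline noise (phase I) is not modelled.

```python
def simulate_amplification(n0, efficiency, plateau, cycles):
    """Return the DNA amount after each cycle under an ASSUMED saturating model."""
    amounts = [n0]
    for _ in range(cycles):
        current = amounts[-1]
        # Hypothetical rule: the effective per-cycle efficiency shrinks from
        # `efficiency` toward 1 as the product approaches the plateau.
        effective = 1.0 + (efficiency - 1.0) * (1.0 - current / plateau)
        amounts.append(current * effective)
    return amounts


def per_cycle_efficiency(amounts):
    """Observed efficiency at each cycle: amount(n + 1) / amount(n)."""
    return [after / before for before, after in zip(amounts, amounts[1:])]


# Illustrative values only (not experimental data).
amounts = simulate_amplification(n0=1e3, efficiency=1.9, plateau=1e10, cycles=40)
ratios = per_cycle_efficiency(amounts)

for cycle in (0, 10, 20, 30, 39):
    print(cycle, round(ratios[cycle], 3))
```

In this toy model the per-cycle ratio stays near the chosen efficiency while the product is far below the plateau, which is the regime where a log-scale plot is linear, and it approaches 1 once the plateau is reached.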
Another assumption of qPCR is that the quantity of PCR product in the exponential phase is proportional to the initial amount of target DNA. This is exploited by arbitrarily choosing a fluorescence threshold, with the condition that it lies within the exponential phase of the reaction. The cycle at which the fluorescence crosses this value is termed the "Threshold cycle" (Ct) or "Crossing Point", and the higher the Ct, the smaller the initial amount of DNA. This is illustrated in Figure 1B, which displays qPCR amplifications performed on serial dilutions of a cDNA sample. One of the first and simplest methods to process qPCR data remains a set of calculations based solely on Ct values, currently known as the ΔCt method [8,9]. However, as such, this method assumes that all amplification efficiencies are equal to 2 or at least equal between all reactions. Therefore it does not take into consideration possible variations of amplification efficiencies from one sequence or sample to the other. Thus, the ΔCt method may not accurately estimate relative DNA amounts from one condition or one sequence to the other. Consequently, other methods of data processing have been developed to estimate the efficiency of individual PCR amplifications [10-13]. Alternatively, amplification curves can be directly fitted with sigmoid or exponential functions (Methods section, Eq. 6 and Eq. 8) in order to derive the original amount of template DNA (Eq. 7 and Eq. 9).

Methods to estimate amplification efficiency can be grouped in two approaches, both of which rely on the log-linearization of the amplification plot. The most commonly used method requires generating serial dilutions of a given sample and performing multiple PCR reactions on each dilution [10,12]. The Ct values are then plotted versus the log of the dilution (Figure 2A) and a linear regression is performed (Eq. 4) from which the mean efficiency can be derived (Eq. 5).
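The serial dilution approach can be sketched as follows. The sketch assumes the exponential model N_n = N_0 E^n with a fixed fluorescence threshold, under which Ct is linear in log10(dilution) with slope -1/log10(E), so that E = 10^(-1/slope). Eq. 4 and Eq. 5 are not reproduced in this section, so this relation is a standard derivation rather than a quotation of them; the function names and the synthetic Ct values are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency_from_standard_curve(dilutions, ct_values):
    """Mean efficiency from the slope of Ct versus log10(dilution)."""
    log_dilutions = [math.log10(d) for d in dilutions]
    slope, _intercept = fit_line(log_dilutions, ct_values)
    # Assumed exponential model: slope = -1 / log10(E).
    return 10 ** (-1.0 / slope)


# Synthetic, noise-free Ct values generated with an assumed efficiency of 1.9.
true_efficiency = 1.9
dilutions = [1.0, 0.1, 0.01, 0.001]
ct_values = [20.0 - math.log10(d) / math.log10(true_efficiency) for d in dilutions]

estimated_efficiency = efficiency_from_standard_curve(dilutions, ct_values)
print(round(estimated_efficiency, 3))
```

With real measurements the replicate Ct values at each dilution would all enter the regression, and the scatter around the fitted line gives a first indication of how well the equal-efficiency assumption holds.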
As stated above, this approach is only valid if the Ct values are measured from the exponential phase of the PCR reaction and if the efficiency is identical between amplifications.

The other method currently used to measure efficiency is based on Eq. 3, which associates an efficiency value with each PCR reaction. This approach has been automated in different programs, one of which, termed LinReg PCR, was used in this study. LinReg identifies the exponential phase of the reaction by plotting the fluorescence on a log scale (Figure 2B). A linear regression is then performed, leading to the estimation of the efficiency of each PCR reaction.

None of the current qPCR data treatment methods is in fact fully assumption-free, and their statistical reliability is often poorly characterized. In this study, we evaluated whether known mathematical treatment methods may estimate the amount of DNA in biological samples with precision and reliability. This led to the development of new mathematical data treatment methods, which were also evaluated. Finally, experimental measurements were subjected to a statistical analysis, in order to determine the size of the data set required to achieve significant conclusions. Overall, our results indicate that current qPCR data analysis methods are often unreliable and/or imprecise. This analysis identifies novel strategies that provide DNA quantification estimates of high precision, robustness and reliability.

Results

Quantitative PCR usually relies on the comparison of distinct samples, for instance the comparison of a biological sample with a standard curve of known initial concentration, when absolute quantification is required, or the comparison of the expression of a gene to an internal standard when relative expression is needed. The equation inserted in Figure 1B is used to calculate the ratio of initial target DNA of both samples (Eq. 2).
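As a minimal sketch of such a ratio calculation, assume that both reactions follow the exponential model N_n = N_0 E^n up to a common fluorescence threshold. The two reactions then satisfy N_0A E_A^CtA = N_0B E_B^CtB, so that N_0A / N_0B = E_B^CtB / E_A^CtA. Eq. 2 is not reproduced in this section, so this form is derived from the stated model rather than copied from it; the function name and the Ct values are hypothetical.

```python
def initial_amount_ratio(ct_a, efficiency_a, ct_b, efficiency_b):
    """Ratio N0_A / N0_B for two reactions read at the same threshold.

    Assumes exponential amplification (N_n = N_0 * E**n) up to the threshold.
    """
    return (efficiency_b ** ct_b) / (efficiency_a ** ct_a)


# Illustrative Ct values: sample A crosses the threshold 3 cycles before B.
ratio_perfect = initial_amount_ratio(20.0, 2.0, 23.0, 2.0)
ratio_realistic = initial_amount_ratio(20.0, 1.8, 23.0, 1.8)

print(ratio_perfect)
print(round(ratio_realistic, 3))
```

With the same 3-cycle difference, assuming an efficiency of 2 gives a ratio of 2^3, whereas an efficiency of 1.8 gives 1.8^3, which illustrates how strongly the estimated ratio depends on the efficiency value used.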
The error on the normalized ratio depends on the error on the Ct and the error on the efficiency, and it can be estimated from Eq. 11. However, the range and relative importance of the various components, and the origin of the error on practical measurements remain poorly characterized.

Figure 1. Representations of real-time PCR amplification curves. The three phases of the amplification reaction are shown either on a linear scale (panel A) or on a semi-log scale (panel B). Panel A represents a typical amplification curve, while panel B depicts amplification curves generated from serial dilutions of the same sample, either undiluted or diluted 10- or 1000-fold (indicated as 1, 0.1 or 0.001, respectively). During the lag phase (phase I), the fluorescence resulting from DNA amplification is undetectable above noise fluorescence in panel A, while in panel B, some data points take negative values and are not represented. This phase is used to evaluate the baseline "noise" of the PCR amplification. Exponential amplification of the DNA is detected in phase II (cycles 16 to 23, panel A). This phase of the amplification corresponds to the linear portion of the curve in panel B (closed circles). A threshold value is usually set by the user to cross the log linear portion of the curve, defining the threshold cycle value (Ct). Phase II is followed by a linear or plateau phase as reactants become exhausted (phase III). The inserted equations describe the dynamics of the amplification during phase II.

To evaluate the reproducibility of Ct measurements and their associated error, we generated a set of 144 PCR reaction conditions corresponding to various target DNA, cDNA samples and dilutions (see Additional file 1 for a description of targeted genes and amplicons). Each of these 144 reaction conditions was replicated by performing 4 or 5 independent PCR amplifications. This yielded a complete dataset of 704 amplification reactions which