Healing Length and Bubble Formation in DNA
- DOI: 10.1103/PhysRevE.73.051902
- PubMed: 16802962
- arXiv: cond-mat/0601304
Abstract
We have recently suggested that the probability for the formation of thermally activated DNA bubbles is, to a very good approximation, proportional to the number of soft AT pairs over a length L(n) that depend on the size n of the bubble and on the temperature of the DNA. Here we clarify the physical interpretation of this length by relating it to the (healing) length that is required for the effect of a base-pair defect to become neligible. This provides a simple criteria to calculate L(n) for bubbles of arbitrary size and for any temperature of the DNA. We verify our findings by exact calculations of the equilibrium statistical properties of the Peyrard-Bishop-Dauxois model. Our method permits calculations of equilibrium thermal openings with several order of magnitude less numerical expense as compared with direct evaluations.
Healing Length and Bubble Formation in DNA
X
iv
:c
on
d-
m
at
/0
60
13
04
v1
[
co
nd
-m
at.
so
ft]
1
3 J
an
20
06
Healing Length and Bubble Formation in DNA
Z. Rapti1, A. Smerzi2,3, K. Ø. Rasmussen2 and A. R. Bishop2
1 Center for Nonlinear Studies, Los Alamos National Laboratory,
Los Alamos, New Mexico 87545, and Department of Mathematics,
University of Illinois at Urbana-Champaign, 1409 W. Green Street, Urbana, IL 61801
2 Theoretical Division and Center for Nonlinear Studies,
Los Alamos National Laboratory, Los Alamos, New Mexico 87545 and
3 Istituto Nazionale di Fisica per la Materia BEC-CRS, Universita` di Trento, I-38050 Povo, Italy
C.H. Choi, A. Usheva
Endocrinology, Beth Israel Deaconess Medical Center and Harvard Medical School,
Department of Medicine, 99 Brookline Avenue, Boston, Massachusetts 02215
(Dated: February 6, 2008)
We have recently suggested that the probability for the formation of thermally activated DNA
bubbles is, to a very good approximation, proportional to the number of soft AT pairs over a length
L(n) that depend on the size n of the bubble and on the temperature of the DNA. Here we clarify
the physical interpretation of this length by relating it to the (healing) length that is required for
the effect of a base-pair defect to become neligible. This provides a simple criteria to calculate L(n)
for bubbles of arbitrary size and for any temperature of the DNA. We verify our findings by exact
calculations of the equilibrium statistical properties of the Peyrard-Bishop-Dauxois model. Our
method permits calculations of equilibrium thermal openings with several order of magnitude less
numerical expense as compared with direct evaluations.
I. INTRODUCTION.
Local separation of double-stranded DNA into single-
stranded DNA is fundamental to transcription and other
important intra-cellular processes in living organisms.
In equilibrium, DNA will locally denaturate when the
free energy of the separated single-stranded DNA is less
than that of the double-stranded DNA. Because of the
larger entropy of the flexible single-strand, the more rigid
double-strand can be thermally destabilized locally to
form temporary “bubbles” in the molecule already at
physiological temperatures [1]. Considering this entropic
effect together with the inherent energetic heterogene-
ity – GC base pairs are 25 % more strongly bound than
the AT bases – of a DNA sequence, it is plausible that
certain regions (subsequences) are more prone to such
thermal destabilization than others. In fact recent work
[2] demonstrates not only that such a phenomena exists
but more importantly that the location of these large
bubble openings in a variety of DNA sequences coincide
with sites active during transcription events. This dis-
covery represent a significant advance in the understand-
ing of the relationship between local conformation and
function in bio-molecules. While there is no guarantee
that this mechanism applies to all transcription initia-
tion events, the present agreement is very encouraging.
Similarly, other large bubbles identified may well have a
relationship to other biological functions. The agreement
is based on the Peyrard-Bishop-Dauxois (PBD) model,
[3], which evidently contains some essential basic ingredi-
ents –local constraints (nonlinearity), base-pair sequence
(colored disorder) and entropy (temperature). The equi-
librium thermodynamic properties of the model were nu-
merically calculated from the partition function using the
transfer integral operator method (TIO). (A complemen-
tary direct numerical evaluation of the partition function
has been reported in Ref. [5]). This allows the precise
evaluation of probabilities of bubbles as a function of
temperature, location in a given base-pair sequence, and
bubble size. In recent work [4], we reported that the
probabilities of finding bubbles extending over n sites
do not depend on a specific DNA subsequences. Rather,
such probabilities depend on the density of soft A/T base
pairs within a region of length L(n). Here we suggest that
this characteristic length is simply related to the charac-
teristic distance away from an AT base pair – consid-
ered as a defect placed in a homogeneous GC-sequence–
where the probability values of the base pairs return to
the GC bulk-value. Lastly, based on this concept of ef-
fective density approximation, we examine five different
human promoter sequences, and demonstrate the striking
agreement in the predictions from the two methods.
II. THE PBD MODEL AND THE TIO METHOD.
The potential energy of the PBD model is
E =
N
∑
n=1
[V (yn) +W (yn, yn−1)] =
N
∑
n=1
E(yn, yn−1)
. (1)
Here V (yn) = Dn(e−anyn − 1)2, represents the nonlin-
ear hydrogen bonds between the bases; W (yn, yn−1) =
k
2
(
1 + ρe−b(yn+yn−1)
)
(yn − yn−1)2 is the nearest-
neighbor coupling that represents the (nonlinear) stack-
ing interaction between adjacent base pairs: it is com-
prised of a harmonic coupling with a state-dependent
ness as the double strand is opened (i.e. entropic effects).
The sum in Eq.(1) is over all base-pairs of the molecule
and yn denotes the relative displacement from equilib-
rium bases at the nth base pair. The importance of the
heterogeneity of the sequence is incorporated by assign-
ing different values to the parameters of the Morse poten-
tial, depending on the the base-pair type. The parameter
values we have used are those in Refs. [6, 9] chosen to
reproduce a variety of experimentally observed thermo-
dynamic properties.
The equilibrium thermodynamic properties of the PBD
model can be calculated from the partition function
Z =
∫ N
∏
n=1
dyne−βE(yn,yn−1)
=
∫ s+k−1
∏
n=s
dynZk(s) e−βE(yn,yn−1), (2)
where we have introduced the notation
Zk(s) =
∫ N
∏
n6=s,...,s+k−1
dyn e−βE(yn,yn−1)
and β = (kBT )−1 is the Boltzmann factor. In order to
evaluate the partition function (2) using the TIO method,
we first symmetrize e−βE(x,y) by introducing [7]
S(x, y) = exp
(
−β
2
(V (x) + V (y) + 2W (x, y))
)
= S(y, x).
Here the second equality holds only when x and y corre-
spond to base-pairs of the same kind. Using Eq. (2) the
expression for Zk(s) is rewritten as
Zk(s) =
∫
N
∏
n6=s,...,s+k−1
dynS(yn, yn−1)
×dy0e−
β
2 V (y1)e−
β
2 V (yN), (3)
where open boundary conditions at n = 1, and n = N
have been used. To proceed, a Fredholm integral equa-
tions with a real symmetric kernel
∫
dyS(x, y)φ(y) = λφ(x) (4)
must be solved separately for the A/T and for the G/C
base-pairs.
Since the eigenvalues are orthonormal and the eigen-
functions form a complete basis, Eq.(4) can be used se-
quentially to replace all integrals by matrix multiplica-
tions in Eq. (3). Unlike in Ref. [8] where the kernels
S(x, y) were expanded in terms of orthonormal bases,
here we choose to use Eq. (4) iteratively. In this way
we reduce the number of integral equations that need to
be solved from 4 to 2, and at the same time the ma-
trices that need to be multiplied are lower dimensional.
Whenever the sequence heterogeneity results in a non-
symmetric S(x, y), Eq.(4) cannot be used and we resort
to a symmetrization technique, based on successive intro-
duction of auxiliary integration variables, as explained in
Ref. [10].
We evaluate the probabilities Pk(s), for a base-pair
opening spanning k base-pairs (our operational definition
of a bubble of size k), starting at base-pair s as
Pk(s) = Z−1
∫ ∞
t
s+k−1
∏
n=s
dynZk(s)e−βE(yn,yn−1) (5)
where t is the separation (which we have here taken as
1.5 A˚) of the double strand above which we define the
strand to be melted.
III. LENGTHSCALES AND EFFECTIVE
DENSITY APPROXIMATION.
In Ref.[4] we suggested that the probabilities of finding
bubbles extending over n sites localized around a given
bp, is, to a very good approximation, proportional to
the density of soft A/T base pairs within a region of
length L(n) centered around the same bp, an approach
we term here as effective density approximation (EDA).
The lengths L(n) were obtained from numerical transfer
integral calculations of the bubble probabilities of several
simple (but experimentally realizable) sequences. The
A/T density profiles were therefore compared with the
exact probabilities for thermal activation of bubbles of
sizes n = 1 and n = 5 of a wild and a mutant version
of the AAV P5 promoter. The agreement was excellent.
However, no physical explanation for the origin of these
characteristic lengths was provided nor were they con-
nected to any intrinsic length of the PBD model. But,
since they appear prominently in the formation of DNA
bubbles, it is important to investigate both of these ques-
tions.
In Figs.1-3 we consider a sequence composed of 150
G/C +1 A/T +150 G/C. In other words, we place a
defect (A/T instead of G/C) at the site l0 = 151. This
defect is 150 bp away from the two ends of the sequence
in order to eliminate boundary effects.
A/T base pairs have a smaller bonding energy than
GC bps. Therefore, the A/T defect softens a number
of GC-bps around it and increases the opening probabil-
ity. Clearly, sufficiently away from the defect the open-
ing probability regains the bulk value of a homogeneous
G/C-sequence at the given temperature, given threshold
and given bubble size. Our claim is that the characteris-
tic length L(n) is the distance necessary to be away from
the defect so that the G/C bps there are no longer af-
fected. This can be quantified by calculating the relative
Pn(l0 − L(n))− Pn(110)
Pn(110)
= α, (6)
where l0 − L(n) is the bp site obtained counting L(n)
downstream from the defect site, see Figs.1-3, and at site
110 we assume that the bulk value has been regained.
The remarkable finding is that with the choice of L(n)
considered in our previous work [4], obtained indepen-
dently by merely fitting the full numerical TIO calcula-
tions of the bubble formation probabilities of different
simple sequences, we obtain from Eq.(6) α ≃ 2.5%, inde-
pendently from the size of the bubble and the tempera-
ture of the DNA sequence. This can be seen in Figs.1-3:
the circle at bp = 151 = l0 is the A/T defect, while the
circles at bp = 141, 139, 135 are the positions of the bp
at l0 − L(n). We can therefore reverse the perspective
and define the characteristic length as the one given by
Eq.(6), with α ≃ 2.5%. This is important for pratical
applications, since it gives a simple criterion to estimate
bubbles probabilities for arbitrary bubble sizes and DNA
temperatures (and arbitrary PBD inter base-pairs inter-
action parameters), but it also immediately suggests a
simple physical explanation for L(n).
110 120 130 140 150 160 170 180 190
0.0002
0.00022
0.00024
0.00026
bp
P5
FIG. 1: The probability profile for the creation of a bubble
of size 5 bp. The isolated A/T bp embedded in a sequence
of G/C bps at bp = 151 is denoted by a solid black circle. A
second black circle is located at bp = 151 − L(5) = 141. The
relative error P5(141)−P5(110)P5(110) = 0.0232.
We parametrize the decay of the probability values as a
function of the downstream distance from the A/T defect
according to
Pn(l) = An +Bn exp[−
l0 − n+ 1− l
ξn
], (7)
where Pn(l) is the probability for finding a bubble of
size n located at the site l, An is the bulk value of the
homogeneous G/C sequence and An + Bn is the value
110 120 130 140 150 160 170 180 190
.000055
.00006
.000065
.00007
bp
P7
FIG. 2: The probability for the creation of a bubble of size 7
bp. The black circle at bp = 151 represents the defect. The
second black circle is located at bp = 151 − L(7) = 139. The
relative error P7(139)−P7(110)P7(110) = 0.0279.
110 120 130 140 150 160 170 180 190
.000009
.00001
.000011
.000012
bp
P10
FIG. 3: The probability for the creation of a bubble of size
10 bp. As before, the defect is represented by a solid black
circle, and a second one is located at bp = 151−L(10) = 135.
The relative error P10(135)−P10(110)P (110) = 0.0203.
of the probability at the site l0 − n + 1, which is the
same as the probability value of the defect site l0. ξn
is the healing length of the system, namely the char-
acteristic length for the perturbation to die out, which,
quite generally, depends on the size n of the bubble, tem-
perature of the DNA and the parameters of the PBD
model. Replacing Eq.(7) in Eq.(6), we obtain the re-
lation L(n) = n − 1 + ξn ln
(
Bn
αAn
)
, where Bn/An =
(Pn(l0)−Pn(bulk))/Pn(bulk) ≃ 0.34. We emphasize that
both Bn/An and α are independent of the size n of the
bubble. It follows that there is a simple linear relation
L(n) = n− 1 + 2.6 ξn. (8)
This is a very important result of this report. The heal-
ing length can be easily calculated as a function of the
bubble size and temperature with an homogeneous G/C
sequence plus a single defect, as shown above. From this,
we can calculate the value of L(n) and, therefore, esti-
mate the probability for the creation of bubbles for arbi-
trary DNA sequences at any temperature. For instance,
for bubbles of size n = 7, we obtain L(7) = 12, while for
bubbles of size n = 10 we have L(10) = 16 at T = 300 K
and PBD parameters as in [9].
In order to examine how the values of the parameters
of the PBD model affect those of L(n), we set ρ = 0 and
repeat the calculation of L(10). Since, when ρ decreases,
so does the ”cooperativity” of the base base pairs, one
would expect to observe a drop in the L(10) value. This is
indeed the case: L(10) = 14, while for ρ = 2 the value as
16. We will show in the next Section how this approach
110 120 130 140 150 160 170 180 190
.000055
.00006
.000065
bp
P10
FIG. 4: In this figure we show the probability for the creation
of a bubble of size 10 bp, when the coupling constant ρ =
0. As is indicated by the relative error P10(137)−P10(110)P (110) =
0.0216, now L(10) = 14.
compares with exact transfer integral operator calcula-
tions of the statistical properties of the PBD model.
We conclude this Section by noting that the Figs. 1-
3 exhibit symmetry, but not with respect to the defect.
While the defect is always at bp = 151, the symmetry
is with respect to bp 149 in the P5 case, 148 in the P7
case, and the axis that separates bps 146-147 in the P10
case. Another feature is the existence of a second local
maximum with the same value as Pn(151), and a slight
drop in the probability values in the middle of the peak.
We notice that the two maxima are located at sites l0
and l0−n+1. This suggests that a bubble with a defect
at its boundary has a higher probability to form: in the
Pn(l0−n+1) case the defect is at the end of the bubble,
while in the Pn(l0) case it is at the begining of the bubble.
Also, the probability drops in the middle of the peak
because the bubble there contains a defect that is trapped
within G/C bps, and it turns out that the probability of
formation of a bubble of this kind is smaller.
IV. COMPARISON OF THE EDA AND TIO
METHOD.
We now compare the probability profiles obtained from
the effective density approach with the characteristic
length L(n) calculated as in the previous Section, with
exact results obtained with the TIO method. We con-
sider five different human genome subsequences, and
compare the calculations for the probability of formation
of bubbles of sizes n = 7 and n = 10.
In the panels (a, b) of Figs.5-9 we plot (as a function of
the bp site) the number N7 and N10 of A/T bps calcu-
lated over a distance L(7) = 12 (panel a) and L(10) = 16
(panel b). These A/T density profiles can be compared
with the probability for the thermal creation of bubbles
of seven, P7, and ten, P10, sites, panels (c, d). In all cases
(and in several other not reported here) the resemblance
in the main features of the respective profiles is strik-
ing. In particular, EDA correctly predicts the locations
and relative weights of the probability peaks. The crucial
point is that, while the profiles obtained with the EDA
requires few seconds to be calculated, the full TIO meth-
ods is very time consuming (of the order of several hours
in the cases presented here). To fully appreciate this ad-
vantage, we note that with the EDA the entire human
genome can be sequenced for bubble formation probabil-
ities in few minutes, while a statistical approach based
on the calculation of the partition function is clearly im-
possible.
−100 0 100
2
4
6
8
bp
N7
a
−100 0 100
5
10
bp
N10
b
−100 0 100
0.0005
0.001
bp
P7
c
−100 0 100
0.0001
0.0003
bp
P10
d
FIG. 5: Effective density profiles for 7 and 10-site long bub-
bles (a,b) and probability profiles calculated with the transfer
integral approach, (c,d). The sequence is the cox 8 promoter.
2
4
6
8
bp
N7
a
−100 0 100
5
10
bp
N10
b
−100 0 100
0.0004
0.0008
bp
P7
c
−100 0 100
0.0001
0.0003
bp
P10
d
FIG. 6: Effective density profiles for 7 and 10-site long bub-
bles (a,b) and probability profiles calculated with the transfer
integral approach, (c,d). The sequence is the cox 11 promoter.
−100 0 100
5
10
bp
N7
a
−100 0 100
5
10
bp
N10
b
−100 0 100
0.0005
0.0015
bp
P7
c
−100 0 100
0.0002
0.0006
bp
P10
d
FIG. 7: Effective density profiles for 7 and 10-site long bub-
bles (a,b) and probability profiles calculated with the transfer
integral approach (c,d). The sequence is the gtf2f2 promoter.
V. CONCLUSIONS.
It has been suggested that the DNA transcription initi-
ation sites can coincide with the location of large bubble
openings. A thorough investigation of this hypothesis re-
quires the statistical analysis of many DNA promoters
within the PBD model. Such a task becomes quickly
prohibitive when studying bubble-promoter correlations
in a significantly large number of cases (namely, for large
sequences). This problem has motivated the develop-
ment of an alternative simplified approach to calculate
the bubble formation probabilities. We have found that
this probabilities are proportional to the density of soft
A/T base-pairs calculated over an effective length which
depends on the size of the bubble and the temperature of
−100 0 100
5
10
bp
N7
a
−100 0 100
5
10
bp
N10
b
−100 0 100
0.0005
0.0015
bp
P7
c
−100 0 100
0.0002
0.0006
bp
P10
d
FIG. 8: Effective density profiles for 7 and 10-site long bub-
bles (a,b) and probability profiles calculated with the transfer
integral approach (c,d). The sequence is the h33a promoter.
−100 0 100
5
10
bp
N7
a
−100 0 100
5
10
15
bp
N10
b
−100 0 100
0.0005
0.0015
0.0025
bp
P7
c
−100 0 100
0.0002
0.0006
bp
P10
d
FIG. 9: Effective density profiles for 7 and 10-site long bub-
bles (a,b) and probability profiles calculated with the transfer
integral approach (c,d). The sequence is the h3b promoter.
the DNA. We have clarified the physical origin of such a
length and suggested a simple procedure for its calucla-
tions. The results of our effective density approach are
in extremely good agreement with full exact calculations,
but with a numerical effort reduced by several order of
magnitudes. In this way, the full human genome can be
analyzed, opening a unique possibility to understand the
existence and nature of the correlations between ther-
mally activated bubbles and promoters.
VI. ACKNOWLEDGMENTS
Work at Los Alamos National Laboratory is supported
by the US Department of Energy under contract contract
[1] M. Gueron, M. Kochoyan, and J.L. Leroy Nature 328,
89 (1987); M. Frank-Kamenetskii Nature 328 17 (1987).
[2] C.H. Choi, G. Kalosakas, K.Ø. Rasmussen, M. Hiromura,
A. R. Bishop and A. Usheva, Nucleic Acids Res. 32, 1584,
(2004).
[3] M. Peyrard and A. R. Bishop, Phys. Rev. Lett, 62, 2755,
(1989); T. Dauxois, M. Peyrard and A. R. Bishop, Phys.
Rev. E 47 R44 (1993).
[4] Z. Rapti, A. Smerzi, K.Ø. Rasmussen, A.R Bishop, C.H.
Choi and A. Usheva, cond-mat/0511128
[5] T. S. van Erp, S. Cuesta-Lopez, J.-G. Hagmann, and M.
Peyrard Phys. Rev. Lett. 95, 218104 (2005).
[6] A. Campa and A. Giansanti, Phys. Rev. E, 58, 3585,
(1998).
[7] T. Dauxois and M. Peyrard, Phys. Rev. E 51 4027 (1995).
[8] Y. Zhang, W.-M. Zheng, J.-X. Liu, and Y. Z. Chen, Phys.
Rev. E, 56, 7100, (1997).
[9] The parameters were chosen in Ref. [6] to fit thermo-
dynamic properties of DNA: k = 0.025eV/A2, ρ = 2,
β = 0.35A−1 for the inter-site coupling; for the Morse
potential DGC = 0.075eV , aGC = 6.9A−1 for a G-C bp,
DAT = 0.05eV , aAT = 4.2A−1 for the A-T bp.
[10] M. B. Fogel, Nonlinear Order Parameter Fields: I. Soli-
ton Dynamics, II. Thermodynamics of a Model Impure
System, Ph.D. Thesis, Cornel University, (1977).
Sign up today - FREE
Mendeley saves you time finding and organizing research. Learn more
- All your research in one place
- Add and import papers easily
- Access it anywhere, anytime


