A review on the use of the adjoint method in four-dimensional atmospheric-chemistry data assimilation
- ISSN: 00359009
Abstract
In this paper we review a theoretical formulation of the adjoint method to be used in four-dimensional (4D) chemistry data assimilation. The goal of the chemistry data assimilation is to combine an atmospheric-chemistry model and actual observations to produce the best estimate of the chemistry of the atmosphere. The observational dataset collected during the past decades is an unprecedented expansion of our knowledge of the atmosphere. The exploitation of these data is the best way to advance our understanding of atmospheric chemistry, and to develop chemistry models for chemistry-climate prediction. The assimilation focuses on estimating the state of the chemistry in a chemically and dynamically consistent manner (if the model allows online interactions between chemistry and dynamics). In so doing, we can: produce simultaneous and chemically consistent estimates of all species (including model parameters), observed and unobserved; fill in data voids; test the photochemical theories used in the chemistry models. In this paper, the Hilbert space is first formulated from the geometric structure of the Banach space, followed by the development of the adjoint operator in Hilbert space. The principle of the adjoint method is described, followed by two examples which show the relationship of the gradient of the cost function with respect to the Output vector and the gradient of the cost function with respect to the input vector. Applications to chemistry data assimilation are presented for both continuous and discrete cases. The 4D data variational adjoint method is then tested in the assimilation of stratospheric chemistry using a simple catalytic ozone-destruction mechanism, and the test results indicate that the performance of the assimilation method is good.
A review on the use of the adjoint method in four-dimensional atmospheric-chemistry data assimilation
A review on the use of the adjoint method in four-dimensional
atmospheric-chemistry data assimilation
By K.-Y. WANdv2*, D. J. LARY’, D. E. SHALLCROSS3, S. M. HALL2 and J. A. PYLE2
‘National Central University, Chung-Li, Taiwan
University of Cambridge, UK
University of Bristol, UK
(Received 19 June 2000, revised 30 January 2001)
SUMMARY
In this paper we review a theoretical formulation of the adjoint method to be used in four-dimensional (4D)
chemistry data assimilation. The goal of the chemistry data assimilation is to combine an atmospheric-chemistry
model and actual observations to produce the best estimate of the chemistry of the atmosphere. The observational
dataset collected during the past decades is an unprecedented expansion of our knowledge of the atmosphere.
The exploitation of these data is the best way to advance our understanding of atmospheric chemistry, and to
develop chemistry models for chemistry-climate prediction. The assimilation focuses on estimating the state of
the chemistry in a chemically and dynamically consistent manner (if the model allows online interactions between
chemistry and dynamics). In so doing, we can: produce simultaneous and chemically consistent estimates of all
species (including model parameters), observed and unobserved; fill in data voids; test the photochemical theories
used in the chemistry models. In this paper, the Hilbert space is first formulated from the geometric structure
of the Banach space, followed by the development of the adjoint operator in Hilbert space. The principle of the
adjoint method is described, followed by two examples which show the relationship of the gradient of the cost
function with respect to the output vector and the gradient of the cost function with respect to the input vector.
Applications to chemistry data assimilation are presented for both continuous and discrete cases. The 4D data
variational adjoint method is then tested in the assimilation of stratospheric chemistry using a simple catalytic
ozone-destruction mechanism, and the test results indicate that the performance of the assimilation method is
good.
KEYWORDS: Adjoint method Data assimilation 4D-Var
1. INTRODUCTION
Over the past decades both (A) a huge number of observations have been accumu-
lated concerning the atmosphere; and (B) a significant improvement in the photochem-
ical theories and models for the studies of important issues such as stratospheric ozone
depletion, anthropogenic perturbation of atmospheric chemistry, and the interaction be-
tween atmospheric chemistry and terrestrial vegetation have been made.
An increasing amount of effort is being devoted to the studies of the following
questions:
(i) How can we bring (A) and (B) together in order to maximize our understanding
of atmospheric chemistry?
(ii) How can we use (A) to test and improve (B)? We shouldn’t simply use (A) as a
mean of verifying (B), because (B) is often too complicated.
Discrepancies between (B) and (A) are in general very difficult to interpret. For example,
could the discrepancies be due to the improper use of initialhoundary conditions,
or the incompleteness of the photochemistry (and heterogeneous chemistry) theories
embedded in the model? However, if we try to ‘best fit’ (B) to (A), then (i) the
best analysis, given the observations and model, can be obtained (i.e. bypassing the
uncertainties incurred by the initial conditions and focusing on the model itself); hence,
(ii) data voids can be filled (e.g. by applying the ‘best fit’ of the chemistry model
* Corresponding author: Department of Atmospheric Sciences, National Central University, Chung-Li, Taiwan.
e-mail: kuoying@mail.atm.ncu.edu.tw
@ Royal Meteorological Society, 2001.
2181
to sparse and irregular observations in space and time, data voids can be analysed
(filled) in a photochemically consistent manner at spatial and temporal points where
no observation is available); hence, (iii) photochemical theories can be tested (e.g. via
the comparison of best model analyses with measurements). Furthermore, (iv) our
understanding of atmospheric chemistry can be improved (e.g. discrepancies between
the best possible model assimilation (analyses) and measurements can be attributed to
the incompleteness of the model compared with the real world).
In other words, the grand picture of the real world hidden behind those seemingly
scattered and sparse observations can be reproduced (one of the ultimate aims of the
modelling), thanks to the use of the adjoint method and the framework of the photo-
chemistry theories (models). Only when this man-made picture (based on photochemi-
cal theories) is successfully reproduced can the comparison of model results (analyses,
simulations, predictions, etc.) with observations become meaningful and an insight into
atmospheric photochemical processes be derived. For example, Fisher and Lary (1995)
proposed the first Four-Dimensional Variational (4D-Var) chemistry data assimilation
to be used in stratospheric chemistry; Khattatov et al. (1999) applied the 4D data as-
similation method to the analysis of the Upper Atmosphere Research Satellite data; and
Elbern et al. (1997) used data assimilation for tropospheric chemistry modelling.
Variational data assimilation has long been used in meteorology and oceanography
(see e.g. Talagrand and Courtier 1987; Navon et al. 1992). However, the introduction
of the adjoint method into the study of atmospheric chemistry only began in the mid-
199Os, and the increased recognition of the importance of this assimilation method has
grown over the past few years. Although some papers had already briefly sketched the
adjoint method for chemistry data assimilation (e.g. Fisher and Lary 1995; Elbern et al.
1997), a complete overview of the adjoint method has not been developed and presented.
In this paper we present a review on the implementation of the adjoint method in 4D-
Var chemistry data assimilation, and its robustness when applied to an idealized case
of atmospheric-chemistry data analysis. We note that the Kalman filter is also being
developed and used as another alternative method in atmospheric data assimilation
(e.g. Daley 1991; Khattatov et al. 1999).
2. ADJOINT HEORY
(a) The Hilbert space
The power of adjoint theory and the elegance of its approach are constructed based
on the Hilbert space theory. In this section we describe the basis of Hilbert space. Most
of our discussions are developed from those of Simmons (1963) and Feintuch (1998).
(i) Linear space. A set is generally a collection of elements, without coherence or
form. When an algebraic or geometric structure is imposed on a set, so that its elements
are organized into a systematic whole, then it becomes a space. Let L be a non-empty
set, and assume that each pair of elements x and y in L can be combined by a process
called addition to yield an element z in L denoted by z = x + y. We now assume that
each element x in L can be combined by a process called scalar multiplication to yield an
element y in L denoted by y = ax. The algebraic system L defined by these operations
and axioms is called a linear space. A linear space is often called a vector space, and its
elements are spoken of as vectors. Notice that the system of real numbers or the system
of complex numbers are the scalars.
(ii) Nomzed linear space: Banach space. Let X be a non-empty set. A metric on X
is a real function p of ordered pairs of elements of X which satisfies the following
conditions:
p(x , y) 2 0 and p ( x , Y ) = 0 e x = Y , (la)
P ( X , Y ) = P(Y9 x) (symmetry), (1b)
p ( x , y) 6 p(x, z) + p(z, y) (the triangle inequality). (1c)
The function p assigns to each pair (x, y) of elements of X a non-negative real number
p (x, y ). Here p (x , y ) is called the distance between x and y . A metric space consists of
two objects: a non-empty set X and a metric p on X.
Let X be a metric space with metric p, and let
(&A = (x1, x2, * - * 9 &a, * - * 1 (2)
be a sequence of points in X. We say that x, is convergent if there exists a point x in X
such that for every E > 0 there exists a positive integer no such that
n 2 no + p(xn , x) < E. (3)
For every convergent sequence n, has the following property: for each E > 0, there exists
a positive integer no such that
(4)
A sequence with this property is called a Cauchy sequence. Every convergent sequence
is a Cauchy sequence; but a Cauchy sequence is not necessarily convergent. A complete
metric space is a metric space in which every Cauchy sequence is convergent.
A normed linear space is a linear space on which there is defined a norm, i.e. a
function which assigns to each element x in space a real number Ilx 11 in such a manner
that
m, n B no - p h , x,) -= 6 .
A normed linear space is a metric space with respect to the induced metric defined by
P ( X , Y ) = Ilx - YII. (6)
A Banach space is a normed linear space which is complete as a metric space.
(iii) Inner-product dejned nomzed space: Hilbert space. Let 3t' be a vector space.
A function ( , ) is called an inner product. Now define a new function 11 11: 3t' + R by
Ilx )I = (x, x) ' I2, where R is a real number. It follows that 11 11 defines a norm on 3t'. This
norm allows us to define a metric on 2 by means of the metric p,
(7)
Hence, for every two points x and y, the metric p gives the distance between them. A
sequence (x,) is said to converge to x E 3t' if
P ( X 9 Y ) = Ilx - Yll.
p(x,, x) = llx, - xll + 0 as n + oo. (8)
A Cauchy sequence is a sequence x1, x2, . . . such that the metric p ( X m , X n ) satisfies
(Weisstein 1998):
We say 3t' is complete if every Cauchy sequence in 3t' converges in 3. This means
that the limit of a convergent sequence of vectors in 3t' is also in 3t'. Since the Cauchy
sequence does converge in the real line (real numbers), the finite-dimensional vector
space where the inner product has been defined is always complete. A complete inner-
product space is called a Hilbert space.
Hence, a Hilbert space is a special Banach space whose norm arises from an inner
product, that is, in which there is defined a function (x, y) of vectors x and y with the
following properties:
( 1 Oa)
(x, y) = (y, x), ifx, y E C, (lob)
(x, x) = 11x112. (10c)
( a x + By, z) =a(% z) + B ( y , z),
(b) The adjoint operator
Following Talagrand (1991), let us consider Hilbert space 6 and Hilbert space F,
on which inner products have been defined. Given a continuous linear operator L of 6
into F, there exists a unique continuous linear operator L* of F into 6, such that for
any vector U E 6 and V E F, the following equality between inner products holds:
(1 1)
L* is called the adjoint of L. The property of the adjoint operator L* can be illustrated
as follows. Let U and V have finite dimension n and m, respectively,
(LU, V) = (U, L*V).
U =
V = [ ,
Vm
and L be a rn x n matrix. Then the inner product of the left-hand side of (1 1) is
/
The exchange in the order of summation indices has been applied to obtain the last line
of (14). While the order of the summation in the first line of (14) is first taken on the
input index j then on i, the summation in the second line is first performed on the input
index i then on j . The operations on L have been turned from row-by-row in the first
line into column-by-column in the second line. This shows that the matrix representing
L* is the transpose of the matrix representing L (Talagrand 1991). The exchange of the
summation operations is allowed given that the element of the matrix L is a number and
not, for instance, another matrix.
(c) The principle of the adjoint method
We follow Talagrand (199 1) and Talagrand and Courtier (1987) to derive the general
principle of the adjoint method in this section. Let U E 6 be the input vector, and V E F
be the output vector (6 and 9 are Hilbert spaces). The components of U are denoted in
(1 2), and the components of V are denoted in (1 3). The output vector V is a differentiable
function of the input vector U by (Fig. 1):
V = G(U). (15)
SV = G’SU, (16)
For a perturbation on the output vector V, the corresponding perturbation SV is
where the Jacobian matrix G’ is the first derivative of G with respect to the input
vector G:
8 6 1 aG12 - - ...
aul au2
Equation (16) is called the tangent linear equation to the nonlinear equation (15)
(Talagrand and Courtier 1987), and the equation says that, for a given U, the local
sensitivities of the output parameters (SV) with respect to the input parameters (SU) are
given by the Jacobian matrix G’(U). Now let J : V + J(V) be a scalar-valued function
of the output vector V. The variation of J with respect to output vector V is
SJ = (VvJ, SV). (18)
Substituting (16) into (18) and using the property of the adjoint operator (1 1). (18)
becomes
(19)
By definition of a gradient, this shows that the gradient G’*Vv J of J with respect to the
input vector U is equal to
(20)
Equation (20) is at the heart of the use of adjoint equations, and provides a very efficient
way to numerically determine Vu J . The principle of the adjoint method is to compute
the gradient VuJ, not through the direct explicit perturbation of the input parameters
(U), but through the use of (20). What is more, the adjoint method gives the instructive
SJ = (VvJ, G’SU) = (G’*VvJ, SU).
Vu J = G’*Vv J.
Figure 1. A schematic diagram which shows relations between input vector U and output vector V. V = CW).
The operator G’ is the first derivative of G with respect to the input vector U. The operator G’* is the adjoint of
G’. Here J is a differentiable scalar-valued function of the output vector V. See text for explanation.
insight obtained from the comparison between (16) and (20). While the direct Jacobian
G’ transforms a perturbation on the input vector SU into the first-order corresponding
perturbation on the output vector SV, the transpose Jacobian G‘* transforms the gradient
of a scalar-valued function J with respect to the output vector (Vv J ) into the gradient
of the function J with respect to the input vector (Vv J ) (Fig. 1).
(d ) An example of the finite-dimensional case
Let us now consider the finite dimensional vectors U and V ((12) and (13)). The
first-order perturbation Svi (i = 1, . . . , rn) with respect to the input vector SU is given
by
The partial derivative of J with respect to the input parameter u j ( j = 1, . . . , n ) is given
by the chain rule
J = J(V)
= J(G(U)) ,
a J aGij
, j = 1 , ..., n .
i=l
Here, a J/auj is the gradient of J with respect to the input vector U and is denoted
by VU J ; while a J/avi is the gradient of J with respect to the output vector V and is
denoted by Vv J . Notice that the summation in (21) is performed over index j for each
i of aGij/auj, i.e. over each row of the same index i, and the summation in (22) is
performed over the index i for each indexj. Hence, if we denoted the former summation
as the operation over a matrix G’, then the latter summation is exactly the operation over
the transpose of the matrix G’ and we denote it as G’*. Therefore the matrix form of (22)
can be written as that shown in (20).
(e) An example of catalytic ozone destruction
Consider the following important catalytic cycles for ozone destruction in polar
regions:
C10 + BrO -+ Br + C1+ 0 2 (kl)
C1 +O3 + C10 + 0 2 (k2)
Br +O3 +BrO + 0 2 (k3 1
Net: 203 -302
where kl, k2, and k3 are reaction rate coefficients. The gas-phase rate equations for the
change of
where 0 3 etc. represent gas concentrations, can be written as
dU
- = F(U), dt
where
-k2[C1"31 - k3[BrlC031
-k1 [C101[B~I + k2[c1lro31
kl [ClOI [BGI - k2 [Cll [03 1
kl [C10l[BrOl - k3[Br"31
(25)
Hence, the relationship between the input vector U and the output vector V during the
time interval At can be written as
V = G(U), (26)
where
G(U) =
OF) + At(-k2[Cl(")][OF)] - k3[Br(")][Oy)]) I = [$I. (27) ClO(") + At ( 4 1 [ClO(")][BrO(")] + k2[Cl(")][O~)]) BrO(") + At ( 4 1 [ClO(")][BrO(")] + k3[Br(")][Oy)])
Equation (27) shows that G is nonlinear in U. Here U is the concentrations (input vector)
Cl(") + At (kl [CIO(")][BrO(")] - k2[Cl(")][O(")]) G4
Gs Br(") + At (kl [ClO(")][BrO(")] - k3[Br(")][O))])
at time step n:
U =
and V is the concentrations (output vector) at time step n + 1
The tangent linear equation with respect to (24) for a perturbation of U is written as (see
also Elbern et al. 1997)
SdU
dt
- = F'SU,
or, equivalently from (26)
SV = G'SU,
where
Equation (30) (or (31)) is called the tangent linear equation. Equations (32) and (27)
show that G' is now linear in U, but G' is still time dependent. Let J be a scalar-valued
function, and J a function of output vector V:
J = J(V). (33)
For example, J can be the distance function (Talagrand and Courtier 1987) which
measures the difference between model and observations at time step n + 1. If we denote
3 and B^r("+') as observations, then J is %("+I), cTo("+l), B j j ( n + l ) @+l)
The gradient of J with respect to O r ) at time step n is
aJ aG(u) aJ
a@) - a@) aG(U)
-- -
aG(u) a J
a@) av
---
Notice here that J is a scalar-valued function. The gradients of J with respect to the
other four species are derived in a similar manner. Hence, at the end we can write the
gradient of J with respect to the input vector at time step n as
VuJ =
aJ
ao?)
aciocn)
aJ
aJ
It is clear from (36) that the adjoint matrix GI* is the transpose of the Jacobian matrix
G' of (32). Equation (36) is called the adjoint tangent linear equation (Talagrand and
Courtier 1987; Fisher and Lary 1995). If J is the scalar-valued distance function (33),
then the problem for atmospheric chemistry data assimilation is to control the input
vector U which will minimize J. As will be discussed in more detail in section 3(c),
an effective minimization algorithm requires the gradient of J with respect to its input
vector U to be known (Talagrand and Courtier 1985; Giering and Kaminski 1998).
(f) A remark on the adjoint method
In the case of atmospheric chemistry modelling, where V is always an explicit but
complicated function of U (26), it is generally impossible to find a usable numerical
expression for Vu J (Talagrand and Courtier 1987). Notice here that the scalar-valued
function J is a function of the output vector V, which itself is explicitly determined from
U. Now, (36) shows that if a computer code is available to calculate GI* for a given U,
Vu J can be easily calculated from Vv J . Here Vv J can be easily determined whenever
J is a simple function of V.
(g) The essence of the adjoint method
The essence of the adjoint method is simply to systematically perform computations
of (36) (Talagrand 1991). Since at each time step n, the adjoint computations proceed
from Vv J at later time step n+l to VuJ at earlier time step n, the overall integration
of the adjoint model is performed in the sense of backward integration into the past.
Let G(') be the model computation at time step i (i = 1, . . . , N), and U('-l) the input
vector; if we denote o as (Kaminski et al. 1999)
Hence, at any given i, an updated V(’) is set to a new U(i), and
By applying the chain rule to the partial derivative of G with respect to U(O), the Jacobian
of G‘ can then be written as
3. APPLICATION TO DATA ASSIMILATION
(a) Variational approach Z
There are generally two approaches to applying adjoint methods to variational
data assimilation (Daley 1991; Fisher and Lary 1995). The first approach, in terms
of Lagrange multipliers, is developed in this section following Daley (1991). The
second approach, using the theory of adjoint operators, is developed in the next section
following Talagrand and Courtier (1987) and Elbern et al. (1997). We note here that the
major difference between these two approaches is in the treatment and identification of
the gradient of J .
Let H(’) be a scalar-valued Qnction which measures the distance between model
prediction V(’) and observations V(’) available at time step i. Hence
where di) is a specific weighting function, which represents the analysis errors of the
observations. Following (26), the model forecasts at time step i is given by
V(i) = G(i)(U(i-l)). (50)
So the problem for atmospheric chemistry data assimilation can be succinctly described
as follows: consider the minimization of (49) subject to the strong constraint (50).
Equation (50) represents chemistry models which were constructed based on current
photochemistry theories. Equation (49) represents the misfits (cost) between model
forecasts (50) and observations. By imposing the minimization of (49) under the
strong constraint of the model (50), we obtain the best model analysis of atmospheric
chemistry.
From (49), (44), (46), and (41), and introducing the following notations as defined
from (38)
the variation on J can be written as
N
Hence the gradient of J with respect to U(O) is
- w ( N ) ( T ( N ) - V(N))G(N)’G(N-l)’G(N-2)’ . . . G(1)’
+ w(N-l)(v(N-l) - V(N-I))G(N-l)’G(N-2)’ . . . G(2)’
. * * + w (1) ( T(1) - V(l))G(l)’ + ,(O)($o) - V(0)).
J
Hence
= GN’*w(N)(v(N) - V(N>) + GN-l’*w(N-l)(T(N-l) - V(N-1))
. . . + G2’*~(2) (7(2) - V(2)) + Gl’*w(l)(P(l) - V“)) + w(o) (e(o) - V(0))
(62)
Notice that the property of GN’ has been used to derived (62). Therefore, with (55),
Vu(o) J = -A(’). (63)
We note that if A(’) is zero, then VU(o) J is zero. The backward integration of the
inhomogeneous adjoint tangent linear equation (60) from = 0 to A(0) gives
the desired gradient of J with respect to the initial condition U(O). The overall time
coordinates of the integrations can be summarized as I
0 At 2At . . . (N - 2)At (N - 1)At N A t = T
o + 1 + 2 + ’ . .
h
1 t 2 t 3 t.’.
(5) N - l 9 N - 2 (1) (2) (3) ( 5 2 )
(1) (2) (3) ( N y - ( N y
U
N !N, N + l
(64)
There are N forward integrations of (50), followed by N backward integrations of
(60) with imposed natural boundary conditions at both = 0 and A(O) = 0.
(b) Variational approach ZI
Let H be the scalar-valued distance function, and
J = L H(U)dt .
T
The variation of J with respect to U is written as
T
SJ = 1 (VuH, SU) d t .
Now define the new function I as
where A is the Lagrange multiplier, and the constraint model is
dU
dt
- = F(U).
The variation of I is
dU F(U)) + ( A , dt - F'SU)) dt. (69)
While the variation on A yields the constraint model (68), the variation on U, which
follows from
T d A 1 (P) dt = A(T)SU(T) - A(O)SU(O) -
- lT (F"A, SU) dt,
and the first term of the right-hand side of (69), yields the following inhomogeneous
adjoint tangent linear equation:
F'*A + VuH = 0, (71)
and the natural boundary conditions A(0) = A(T) = 0. Equation (71) can also be written
as
dA -- -
dt
-- - F'*A - VuH, dA
which is instructive when compared with the tangent linear equation
-- - F'SU. dSU
dt (73)
Both (72) and (73) are linear models. But while the later is integrated with respect to
the positive increment of time (forwards in time), the former is integrated with respect
to the negative increment of time (backwards in time).
Following (45) and (41), the solution to (73) is written as
SU(") = R(tn, to)SU'O', (74)
where
(n)’G(n-l)’G(n-2)’ . . . G(2)’G(1)’ - N N R(tn, to) = G IcI
= R(tn, tn-lIR(tn-1, tn-2) * . * R(t2, tl)R(tl, to). (75)
For every n, G(”)’ = &,, t,-l). Here R(t,, to) is a well-defined linear operator, and is
called by Talagrand and Courtier (1987) the resolvent of (73) between times to and t, .
If set n = 0, then (74) gives
SU(’) = R(t0, to)SU(O’, (76)
or, in general, for any t ,
From (74), we can write
R(t, t ) = I .
d d
-SU = -R(t, to)SU‘o’.
dt dt
From (73) and (74), we can also write
-- - F’SU = F’R(t, t0)SU‘O’. dSU
dt
Hence the comparison between (78) and (79) shows
Both (77) and (80) are two important properties for R(t, to).
Now let S(t,- 1 , t,) be the resolvent of the linear adjoint equation
(77)
between times t,, and tn-l. This means that the solution to (81) can be written as
N - u N l o o ) = S(t0, t N ) W N )
= so03 tl)S(tl, t2) * * * WtN-21 tN-l)S(tN-l, tN)A(tN). (82)
For a solution SU of (73) and a solution A of (81) at time t, their inner product
- d (SU, A) = (& A) + (6u, ;A) d
dt
= (F’SU, A ) + (SU, - F’*A)
= 0.
Hence, the inner product is constant with time. Let y be the solution of the forward
equation at time t, and y’ be the solution of the backward equation at time t’ (t’ > t),
observing the following relationship
Time t Time t’
Solution to dSU/dt = F’SU Y R(tf, t)y
Solution to -dA/dt = F’*A S(t, t’)y’ Y’
Inner product (y, S(t, t’ly’) (Rot, t ) Y , Y)
Equation (83) ensures that the inner product of (y, S ( t , t’)y’) at time t is equal to the
inner product of (R(t’, t ) y , y’) at time t‘. Hence
Hence, S = R*, and (66) can be rewritten as
r T
6 J = lo (VUH, SU) dt
1 = ( J d T R*(t, t0)VuH dt, SU(O)
= ( lT S(t0, t)VuH dt, SU(O) . 1
The exchange of the operation of integration and inner product is allowed given that
SU(O) is not dependent on t. Therefore, the gradient of J with respect to U(O) is
V ~ ( O ) J = R*(t, t0)VuH dt = S(t0, t)VuH dt = S(t0, t)VuH dt, (87) Ir LT c
where at the last term of (87) we have set time t = 0 = to and t = T = t N . The properties
of S ( t , t ~ ) are given by
S ( t , t ) = I , (88)
= -F’*h(t)
= -F’*S(t, t N ) h ( t N ) ,
a
- S ( t , t N ) , = -~ ’*s ( t , t N ) .
at
Now consider the solution to the inhomogeneous adjoint tangent linear equation (71),
which is written as
t
h(t) = S ( t , t ) V u H ( t ) dt. (90) 1
Before we set out to verify (go), let’s have a close look at equations (71), (81), and
(89). The solution to the inhomogeneous equation (71) is
= h h -k hp, (91)
where Ah is the solution to the homogeneous equation (81), and he i’s a particular solution
to the inhomogeneous equation (71). For every t’(t .c t’ < t ~ ) , if
hh(t) = s(t, t ’ ) vU( t r )H( t ’ ) , (92)
then
K.-Y. WANG et al.
d
dt
-Ah(?) = -F’*S(f , ?’)H(?’) = -F’*Ah(?).
Hence, (92) is the general solution to (8 1). Since S is a linear opera
following summation
(93)
or, let’s consider the
were Ah is from (92). Hence, Ap has a form relating to VuH(t). Now, we write (94) in a
continuous form as
S ( t , t ) V u ( , ) H ( t ) d t =A(?). (95)
Following Boas (1983), the differentiation of an integral, following Leibniz’s rule
Let
Then
Here (89) is used to derived the last line of (98). Hence,
(99)
dA
dt
- = VU(t)H(f) - F’*(?’)A(?).
Here we have demonstrated that, with the boundary condition h(tN) = 0, (90) is the
solution of the inhomogeneous adjoint tangent linear equation (7 1). Most importantly,
if we set t = to, then (90) is
1.5
1 0
0 5
00
-0.5
1 5 - 1 0 -0 5 0 0 0 5 1 0 1 5
x1
Figure 2. Contours of the two-dimensional Rosenbrock function with respect to two variables A ! and x2. Also
shown on the plot are two steepest descent minimization paths. For the path denoted by solid circles, the steepest
descent starts from (-0.8, 1.0); while for the path denoted by open squares, the steepest descent starts from
(-1.5, -0.5). Both pads converge to (1 , 1) of the minimum point.
Hence, when compared with (87), it follows that
V"(0) J = +to). (101)
We note that if h(t0) is zero, then Vu(o)J is zero. Equation (101) now shows that lhq
desired gradient of J with respect to U(O) is equal to the negative of the backward inte-
gration of the inhomogeneous adjoint tangent linear equation from boundary condition
h(tN) = 0 to h(t9). This result is consistent with the result described in the previous
section using Lagrange multipliers.
(c) Minimization procedures
Once the gradient of the cost function J is found, a steepest descent algorithm can be
used to find its minimum value (Daley 1992). Figure 2 shows contours of a Rosenbrock
function,
(102)
and two steepest descent paths. The paths are chlculated based on the descent algorithm
MlQN3 of Gilbert and Lemarechal (1989), which used special Broyden-Fletcher-
Goldfarb-Shanno method matrices (Nocedal 1980) in updating quasi-Newton matrices
for the calculation of the Hessian at each iteration. The advantage of this method is that
the large-scale problem (with many control variables) can be employed on the platform
f ( x 1 , x2) = (1 - x2)2 + IOO(x2 - x y ,
with limited storage. It is clear that both paths efficiently converge to ihe minimum point
of (1, 1) where f d n exists.
Since chemical concentrations can vary over many orders of mtgnitude, a proper
scaling of each control variable is crucial to the success of the minimization procedure
(Navon et al. 1992). From a geometrical point of view, if the constant-cost contours are
circular, then the negative gradient of the cost function will point rildially towards the
minimum, and only one descent iteration will be needed. In practicb, the cost function
with respect to the original coordinates of control variables is multiplied by a scaling
matrix, which is chosen so that the resulting vector points more nearly towards the
minimum (Thacker 1989).
( d ) Implementation of a chemical 4D-Var
Here we conclude the discussion of adjoint theory and outline s i t s implementation.
The nonlinear forecast model is
dU - = F(U).
dt
The corresponding forward tangent linear model is written as
-- - F'SU. dSU
dt
The backward inhomogeneous adjoint tangent linear model is then formulated as
with the boundary conditions
h(0) = h(T) = 0. (106)
Here H is the scalar-valued distance function (section 3(b)). The discretized form of the
adjoint model is written as
Finally, a steepest descent minimization algorithm (as described. in section 3(c)) is
used to find the value which minimizes J , the distance between model prediction and
observation.
The implementation of a chemical 4-D var is described as follows. The VJ(U,) can
be obtained, for a given Uto, by performing the following operations:
(i) Starting from U, at time to, integrate (103) forwards in time from t = to to t = T.
Stores the values thus computed for U(t), to < t < T.
(ii) Starting from A(t = T) = 0, integrate (107) backwards in time from t = T to
t = to. The adjoint operator GI*(#) and the gradient VuJ(t) are determined, at each t,
from the values U(t) computed iq the direct forward integration of equation.
(iii) The final value of h(t = to) is the negative of the required gradient VJu,,, . Notice
that if h(t = to) is zero, then VJub is zero.
(iv) Use a descent algorithm to determine the value Umin which minimizes J .
tion of (a) 0 3 , (b) C10, (c) BrO, (d) CI and (e) Br for
48 hours of assimilation. Solid circles are the model
integrations without data assimilation; open circles
are Ute observations; and the crosses are the model
integrations with data assimilation.
A REVIEW ON CHEMISTRY 4D-VAR 2201
4 . APPLICATION TO A CHEMISTRY 4 D - V A R
Following the formulation and implementation procedures described previously, a
che@cal data assimilation has been performed using the catalytic ozone destruction
mechanism described in section 2(e). A stiff ordinary differential equation integrator
(Brown et al. 1989) was used to integrate (24), with rate coefficients taken from
De More et al. (1997). The model was first run with initial conditions close to the typical
stratospheric concentrations found at a height of 25 km. A 48-hour model integration
was performed and the output stored as the observations. A second integration was then
performed where the initial conditions used were double those of the first integration,
and the output treated as the modelled results. Finally a third integration was performed
using the modelled output from integration 2 and the observations obtained from
integration 1. Here the integration used the iterative variational data assimilation method
to minimize the difference between initial modelled values and observations.
Figure 3 shows the results from integrations (1-3). The model-integration 2-
shows that the high ozone concentrations have been significantly reduced to less than
50% of their initial value after two days of catalytic ozone destruction. During this
period of time, the radicals C10 and BrO have remained relatively constant, whereas C1
and Br atoms have risen dramatically as the availability of ozone decreases.
For integration 1, where initial ozone levels were half those of integration 2, the
rate of ozone loss is lower, but still a considerable amount (30%) is destroyed during
the-integration. Comparing integrations 1 aid 2 confirms the earlier work of Crutzen
et al. (1995) that higher ozone levels leads to increased catalytic loss. It should be
noted of course that in these integrations gas-phase chemistry only has been considered.
If heterogeneous chemistry (i.e. polar stratospheric clouds) had been included these
would have led to increased levels of chlorine and bromine in the system and therefore
increased loss of ozone.' ,
Integration 3 uses data assimilation (crosses) to minimize the difference between
the model (solid circles) and the observations (open circles), and Fig. 3 shows that
this has been achieved remarkably well. The longer-lived species (03 and C10) are
in fact a better fit than the shorter lived species (BrO, C1 and Br). However, since the
concentration of the short-lived species is basically controlled by the change of the long-
lived species (e.g. see Fisher and Lary 1995; Khattatov et al. 1999), a good estimate of
the long-lived species will produce a good analysis for the short-lived species.
In the actual implementation of the technique to real data, the observation error
should be taken into account when comparing differences between the 4D-Var analyses
and the observations. In a separate paper we will discuss the practical application of
the chemical 4D-Var in analysing Atmospheric Trace Molecule Spectroscopy data and
for studies on NO, partitioning in the stratosphere. We will also explore the use of
the Kalman filter, which considers errors from both model and observations, in 4D
variational chemistry data assimilation.
5. CONCLUSION
In this review paper we described a detailed formulation of the adjoint method to
be used in 4D variational chemistry data assimilation. The implementation procedure is
outlined and the method is tested using a simple catalytic ozone destruction mechanism
Which is relevant to the ozone depletion during the early spring over the polar regions.
'ks shpwn above, the 4IS;Var pethod is very effective in fitting the model to the
b Isbservations. Hence the use o&t.hk ,method will enable better analysis of stratospheric
and tropospheric4mnktry to be made. In a following paper we will describe the
application of the 4D-Var method in fitting a complicated stratospheric chemistry model
to ATMOS data, and to the study of NOy partioning in the stratosphere.
ACKNOWLEDGEMENTS
We are very grateful to Olivier Talagrand, Hendrik Elbern, Boris Khattatov, and Eu-
ropean Centre for Medium-Range Weather Forecasts for sending us their publications,
and M. Fisher for his 4D-Var code. We acknowledge the use of data from the Met Office
and British Atmospheric Data Centre. David Lary like to thank the Natural Environment
Research Council (NERC) and the European Union for research support. We also thank
two anonymous reviewers for their insightful comments that greatly improved the clarity
of this paper. The Centre for Atmospheric Science is a joint initiative of the Department
of Chemistry and the Department of Applied Mathematics and Theoretical Physics. This
work forms part of the NERC United Kingdom Universities Global Atmospheric Mod-
elling Programme.
Boas, M. L.
Brown, P. N., Byme, G. D. and
Hindmarsh. A. C.
Crutzen, P. J., Groos, J.-U.,
BrUhl, C.. MUller, R. and
Russell III, J. M.
Daley, R.
DeMore, W. B., Sander, S. P.,
Golden, D. M.,
Hampson. R. F., Kurylo, M. J.,
Howard, C. J.,
Ravishadtam, A. R.,
Kolb, C. E. and Molina, M. J.
E l h , H., Schmidt, H. and
Ebel, A.
Feintuch, A.
Fisher, M. and Lary, D. J.
Giering, R. and Kaminski, T.
Gilbert, J. C. and Lemarechal, C.
Kaminski, T., Heimann, M. and
Khattatov, B. V., Gille. J. C.,
Giering, R.
Lyjak, L. V., Brassew, G. B.,
Dvomv, V. L., Roche, A. E.
and Waters. J. W.
Navon. I. M., Zou. X.. Derber, J.
and Sela, J.
Nocedal, J.
Simmons, G. F.
Talagrand, 0.
Talagrand, 0. and Courtier, P.
1983
1989
1995
1991
1997
1997
1998
1995
1998
1989
1999
1999
1992
1980
1963
1991
1985
REFERENCES
Mathematical methods in the physical sciences, second edition.
John Wiley & Sons, New Yo&
VODE A variable coefficient ODE solver. SLAM J. Sci. Stat.
Comput., 10,1038-1051
A reevaluation of the ozone budget with HALOE UARS data: No
evidence for the ozone deficit. Science, 268,705-708
Atmospheric data analysis. Cambridge University Press, Cam-
bridge
chemical kinetics and photochemical data for use in stratospheric
modeling-Evaluation 12. JPL Publication, 97-94
Variational data assimilation for tropospheric chemistry model-
Robust c o w l theory in Hilbert space. Springer-Verlag, New
Lagrangian four-dimensional variational data assimulation of
Recipes for adjoint code construction. ACM TOMS, 24,437-474
Some numerical experiments with variable-storage quasi-Newton
A coarse grid three-dimensional global inverse model of the
atmospheric transport. I: Adjoint model and Jacobian matrix.
J. Geophy. Res., 104,18535-18553
Assimilation of photochemically active species and a case analy-
sis of UARS data J. Geophys. Res., 104,187 15-1 8737
ing. J. Geophys. Res., 102,15967-15985
YO&
chemical species. Q. J. R. Meteoml. Soc.. 121,1681-1704
algorithms. Math. P m g m . , 45,407-435
Variational data assimilation with an adiabatic version of the
NMC spectral model, Mon. Weather Rev., 120,1433-1446
Updating quasi-Newton matrices with limited storage. Math.
Comput., 35,773-782
Introdrcction to topology and modern analysis. McGraw-Hill,
London
Adjoint models. Pp. 73-91 in Numerical Methods in Atnros-
pheric Models, Vol. IZ. European Centre for Medium-Range
Weather Forecasts, Reading, UK
‘Formalisation de la methode du modele adjoint. Applications
m6thologiques’. Note No. 117, Etablissement &Etudes et
de Recherches M6tcorOlogiques, Paris, France
Talagrand, 0. and Courtier, P.
Thacker, W. C.
Weisstein, E. W.
1987 Variational assimilation of meteorological observations with the
adjoint vorticity equation. I: Theory. Q. J. R. Meteoml. SOC.,
The role of the Hessian matrix in fitting models to measurements.
J. Geophys. Res., 9 4 , 6 1 7 7 4 9 6
CRC concise encyclopedia of mathematics. CRC Ress, Boca
Raton, Fla., USA
113,1311-1328
1989
1998
Sign up today - FREE
Mendeley saves you time finding and organizing research. Learn more
- All your research in one place
- Add and import papers easily
- Access it anywhere, anytime



