Inverse Problem Theory and Methods for Model Parameter Estimation
Albert Tarantola
Institut de Physique du Globe de Paris
Université de Paris 6
Paris, France

Society for Industrial and Applied Mathematics
Philadelphia
SIAM is a registered trademark.

Copyright © 2005 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data

Tarantola, Albert.
Inverse problem theory and methods for model parameter estimation / Albert Tarantola.
p. cm.
Includes bibliographical references and index.
ISBN 0-89871-572-5 (pbk.)
1. Inverse problems (Differential equations) I. Title.
QA371.T357 2005
515'.357--dc22
2004059038
To my parents, Joan and Fina
Contents

Preface

1 The General Discrete Inverse Problem
1.1 Model Space and Data Space
1.2 States of Information
1.3 Forward Problem
1.4 Measurements and A Priori Information
1.5 Defining the Solution of the Inverse Problem
1.6 Using the Solution of the Inverse Problem

2 Monte Carlo Methods
2.1 Introduction
2.2 The Movie Strategy for Inverse Problems
2.3 Sampling Methods
2.4 Monte Carlo Solution to Inverse Problems
2.5 Simulated Annealing

3 The Least-Squares Criterion
3.1 Preamble: The Mathematics of Linear Spaces
3.2 The Least-Squares Problem
3.3 Estimating Posterior Uncertainties
3.4 Least-Squares Gradient and Hessian

4 Least-Absolute-Values Criterion and Minimax Criterion
4.1 Introduction
4.2 Preamble: p-Norms
4.3 The p-Norm Problem
4.4 The 1-Norm Criterion for Inverse Problems
4.5 The ∞-Norm Criterion for Inverse Problems

5 Functional Inverse Problems
5.1 Random Functions
5.2 Solution of General Inverse Problems
5.3 Introduction to Functional Least Squares
5.4 Derivative and Transpose Operators in Functional Spaces
5.5 General Least-Squares Inversion
5.6 Example: X-Ray Tomography as an Inverse Problem
5.7 Example: Travel-Time Tomography
5.8 Example: Nonlinear Inversion of Elastic Waveforms

6 Appendices
6.1 Volumetric Probability and Probability Density
6.2 Homogeneous Probability Distributions
6.3 Homogeneous Distribution for Elastic Parameters
6.4 Homogeneous Distribution for Second-Rank Tensors
6.5 Central Estimators and Estimators of Dispersion
6.6 Generalized Gaussian
6.7 Log-Normal Probability Density
6.8 Chi-Squared Probability Density
6.9 Monte Carlo Method of Numerical Integration
6.10 Sequential Random Realization
6.11 Cascaded Metropolis Algorithm
6.12 Distance and Norm
6.13 The Different Meanings of the Word Kernel
6.14 Transpose and Adjoint of a Differential Operator
6.15 The Bayesian Viewpoint of Backus (1970)
6.16 The Method of Backus and Gilbert
6.17 Disjunction and Conjunction of Probabilities
6.18 Partition of Data into Subsets
6.19 Marginalizing in Linear Least Squares
6.20 Relative Information of Two Gaussians
6.21 Convolution of Two Gaussians
6.22 Gradient-Based Optimization Algorithms
6.23 Elements of Linear Programming
6.24 Spaces and Operators
6.25 Usual Functional Spaces
6.26 Maximum Entropy Probability Density
6.27 Two Properties of p-Norms
6.28 Discrete Derivative Operator
6.29 Lagrange Parameters
6.30 Matrix Identities
6.31 Inverse of a Partitioned Matrix
6.32 Norm of the Generalized Gaussian

7 Problems
7.1 Estimation of the Epicentral Coordinates of a Seismic Event
7.2 Measuring the Acceleration of Gravity
7.3 Elementary Approach to Tomography
7.4 Linear Regression with Rounding Errors
7.5 Usual Least-Squares Regression
7.6 Least-Squares Regression with Uncertainties in Both Axes
7.7 Linear Regression with an Outlier
7.8 Condition Number and A Posteriori Uncertainties
7.9 Conjunction of Two Probability Distributions
7.10 Adjoint of a Covariance Operator
7.11 Problem 7.1 Revisited
7.12 Problem 7.3 Revisited
7.13 An Example of Partial Derivatives
7.14 Shapes of the p-Norm Misfit Functions
7.15 Using the Simplex Method
7.16 Problem 7.7 Revisited
7.17 Geodetic Adjustment with Outliers
7.18 Inversion of Acoustic Waveforms
7.19 Using the Backus and Gilbert Method
7.20 The Coefficients in the Backus and Gilbert Method
7.21 The Norm Associated with the 1D Exponential Covariance
7.22 The Norm Associated with the 1D Random Walk
7.23 The Norm Associated with the 3D Exponential Covariance

References and References for General Reading

Index
Preface

Physical theories allow us to make predictions: given a complete description of a physical system, we can predict the outcome of some measurements. This problem of predicting the result of measurements is called the modelization problem, the simulation problem, or the forward problem. The inverse problem consists of using the actual result of some measurements to infer the values of the parameters that characterize the system.

While the forward problem has (in deterministic physics) a unique solution, the inverse problem does not. As an example, consider measurements of the gravity field around a planet: given the distribution of mass inside the planet, we can uniquely predict the values of the gravity field around the planet (forward problem), but there are different distributions of mass that give exactly the same gravity field in the space outside the planet. Therefore, the inverse problem, that of inferring the mass distribution from observations of the gravity field, has multiple solutions (in fact, an infinite number).

Because of this, in the inverse problem, one needs to make explicit any available a priori information on the model parameters. One also needs to be careful in the representation of the data uncertainties.

The most general (and simple) theory is obtained when using a probabilistic point of view, where the a priori information on the model parameters is represented by a probability distribution over the "model space." The theory developed here explains how this a priori probability distribution is transformed into the a posteriori probability distribution, by incorporating a physical theory (relating the model parameters to some observable parameters) and the actual result of the observations (with their uncertainties).
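The gravity example and the passage from an a priori to an a posteriori distribution can be illustrated with a toy computation. The sketch below is not taken from the text. It assumes a spherically symmetric planet described by two hypothetical parameters (a core mass and a mantle mass), a single gravity observation at a hypothetical radius, Gaussian uncertainties, and the simple rule "posterior proportional to prior times likelihood," which is only the most conventional special case of the more general combination of states of information. All names and numerical values are illustrative assumptions.

```python
import numpy as np

G_NEWTON = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def forward(m_core, m_mantle, r):
    """Forward problem: exterior gravity (m/s^2) at radius r.

    For a spherically symmetric body, the field outside the body depends
    only on the total mass, so the prediction is unique.
    """
    return G_NEWTON * (m_core + m_mantle) / r**2


r_obs = 7.0e6  # observation radius in metres (hypothetical value)

# Two different mass distributions with the same total mass.
g_a = forward(2.0e24, 4.0e24, r_obs)
g_b = forward(5.0e24, 1.0e24, r_obs)
same_data = bool(np.isclose(g_a, g_b))  # the data cannot tell them apart

# Grid over the two-parameter model space (kg).
masses = np.linspace(0.0, 8.0e24, 161)
M1, M2 = np.meshgrid(masses, masses, indexing="ij")

# A priori information: independent Gaussians (assumed, not from the text).
prior = np.exp(-0.5 * (((M1 - 3.0e24) / 1.0e24) ** 2
                       + ((M2 - 3.0e24) / 1.0e24) ** 2))

# Data information: one gravity value with 1% Gaussian uncertainty.
d_obs = g_a
sigma_d = 0.01 * d_obs
likelihood = np.exp(-0.5 * ((forward(M1, M2, r_obs) - d_obs) / sigma_d) ** 2)

# A posteriori distribution on the grid, normalized to unit sum.
posterior = prior * likelihood
posterior /= posterior.sum()


def weighted_std(values, weights):
    """Standard deviation of `values` under normalized `weights`."""
    mean = np.sum(weights * values)
    return np.sqrt(np.sum(weights * (values - mean) ** 2))


mean_total = np.sum(posterior * (M1 + M2))
std_total = weighted_std(M1 + M2, posterior)
std_core = weighted_std(M1, posterior)

print("same exterior gravity for both models:", same_data)
print(f"posterior mean of total mass: {mean_total:.3e} kg")
print(f"posterior std of total mass:  {std_total:.3e} kg")
print(f"posterior std of core mass:   {std_core:.3e} kg")
```

In this sketch the data constrain the total mass tightly, while the split between core and mantle remains almost as uncertain as the a priori information left it. This is the practical face of the non-uniqueness described above.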
To develop the theory, we shall need to examine the different types of parameters that appear in physics and to be able to understand what a total absence of a priori information on a given parameter may mean.

Although the notion of the inverse problem could be based on conditional probabilities and Bayes's theorem, I choose to introduce a more general notion, that of the "combination of states of information," that is, in principle, free from the special difficulties appearing in the use of conditional probability densities (like the well-known Borel paradox).

The general theory has a simple (probabilistic) formulation and applies to any kind of inverse problem, including linear as well as strongly nonlinear problems. Except for very simple examples, the probabilistic formulation of the inverse problem requires a resolution in terms of "samples" of the a posteriori probability distribution in the model space. This, in particular, means that the solution of an inverse problem is not a model but a collection of models (that are consistent with both the data and the a priori information). This is why Monte Carlo (i.e., random) techniques are examined in this text. With the increasing availability of computer power, Monte Carlo techniques are being increasingly used.

Some special problems, where nonlinearities are weak, can be solved using special, very efficient techniques that do not differ essentially from those used, for instance, by Laplace in 1799, who introduced the "least-absolute-values" and the "minimax" criteria for obtaining the best solution, or by Legendre in 1801 and Gauss in 1809, who introduced the "least-squares" criterion.

The first part of this book deals exclusively with discrete inverse problems with a finite number of parameters. Some real problems are naturally discrete, while others contain functions of a continuous variable and can be discretized if the functions under consideration are smooth enough compared to the sampling length, or if the functions can conveniently be described by their development on a truncated basis. The advantage of a discretized point of view for problems involving functions is that the mathematics is easier. The disadvantage is that some simplifications arising in a general approach can be hidden when using a discrete formulation. (Discretizing the forward problem and setting a discrete inverse problem is not always equivalent to setting a general inverse problem and discretizing for the practical computations.)

The second part of the book deals with general inverse problems, which may contain such functions as data or unknowns. As this general approach contains the discrete case in particular, the separation into two parts corresponds only to a didactical purpose.

Although this book contains a lot of mathematics, it is not a mathematical book. It tries to explain how a method of acquisition of information can be applied to the actual world, and many of the arguments are heuristic.
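To make concrete the earlier remark that the solution of an inverse problem is a collection of models rather than a single model, here is a minimal sketch using plain rejection sampling. It is not one of the algorithms developed in the text. The forward relation d = m**2, the observed value, the uncertainty, and the uniform a priori interval are all assumptions chosen so that two quite different models explain the data equally well.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def forward(m):
    """Toy nonlinear forward problem (illustrative assumption): d = m**2."""
    return m ** 2


d_obs = 4.0    # hypothetical observed value
sigma_d = 1.0  # hypothetical Gaussian data uncertainty


def likelihood(m):
    """Gaussian data misfit, scaled so that its maximum value is 1."""
    return np.exp(-0.5 * ((forward(m) - d_obs) / sigma_d) ** 2)


# Step 1: draw candidate models from the a priori distribution,
# here assumed uniform on [-5, 5].
n_proposals = 20_000
candidates = rng.uniform(-5.0, 5.0, size=n_proposals)

# Step 2: accept each candidate with probability equal to its likelihood.
# The accepted candidates are samples of the a posteriori distribution.
accepted = rng.uniform(0.0, 1.0, size=n_proposals) < likelihood(candidates)
samples = candidates[accepted]

frac_positive = float(np.mean(samples > 0))
mean_abs = float(np.mean(np.abs(samples)))

print("number of posterior samples:", samples.size)
print(f"fraction of samples with m > 0: {frac_positive:.2f}")
print(f"mean of |m| over the samples:   {mean_abs:.2f}")
```

The accepted samples cluster around both m = 2 and m = -2. Reporting a single "best" model, or the mean of the samples (which is close to zero and fits the data badly), would misrepresent what the data and the a priori information actually say. Rejection sampling is used here only because it is short; it becomes inefficient when the model space has many dimensions.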
This book is an entirely rewritten version of a book I published long ago (Tarantola, 1987). Developments in inverse theory in recent years suggest that a new text be proposed, but that it should be organized in essentially the same way as my previous book. In this new version, I have clarified some notions, have underplayed the role of optimization techniques, and have taken Monte Carlo methods much more seriously.

I am very indebted to my colleagues (Bartolomé Coll, Georges Jobert, Klaus Mosegaard, Miguel Bosch, Guillaume Évrard, John Scales, Christophe Barnes, Frédéric Parrenin, and Bernard Valette) for illuminating discussions. I am also grateful to my collaborators at what was the Tomography Group at the Institut de Physique du Globe de Paris.

Albert Tarantola
Paris, June 2004
Chapter 1

The General Discrete Inverse Problem

Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.
John W. Tukey, 1962

Central to this chapter is the concept of the "state of information" over a parameter set. It is postulated that the most general way to describe such a state of information is to define a probability density over the parameter space. It follows that the results of the measurements of the observable parameters (data), the a priori information on model parameters, and the information on the physical correlations between observable parameters and model parameters can all be described using probability densities. The general inverse problem can then be set as a problem of "combining" all of this information. Using the point of view developed here, the solution of inverse problems, and the analysis of uncertainty (sometimes called "error and resolution analysis"), can be performed in a fully nonlinear way (but perhaps with a large amount of computing time). In all usual cases, the results obtained with this method reduce to those obtained from more conventional approaches.

1.1 Model Space and Data Space

Let S be the physical system under study. For instance, S can be a galaxy for an astrophysicist, Earth for a geophysicist, or a quantum particle for a quantum physicist.

The scientific procedure for the study of a physical system can be (rather arbitrarily) divided into the following three steps.

i) Parameterization of the system: discovery of a minimal set of model parameters whose values completely characterize the system (from a given point of view).