Sign up & Download
Sign in

Karhunen-Loeve eigenvalue problems in cosmology: how should we tackle large data sets?

by Max Tegmark, Andy Taylor, Alan Heavens
(1996)

Abstract

Since cosmology is no longer "the data-starved science", the problem of how to best analyze large data sets has recently received considerable attention, and Karhunen-Loeve eigenvalue methods have been applied to both galaxy redshift surveys and Cosmic Microwave Background (CMB) maps. We present a comprehensive discussion of methods for estimating cosmological parameters from large data sets, which includes the previously published techniques as special cases. We show that both the problem of estimating several parameters jointly and the problem of not knowing the parameters a priori can be readily solved by adding an extra singular value decomposition step. It has recently been argued that the information content in a sky map from a next generation CMB satellite is sufficient to measure key cosmological parameters (h, Omega, Lambda, etc) to an accuracy of a few percent or better - in principle. In practice, the data set is so large that both a brute force likelihood analysis and a direct expansion in signal-to-noise eigenmodes will be computationally unfeasible. We argue that it is likely that a Karhunen-Loeve approach can nonetheless measure the parameters with close to maximal accuracy, if preceded by an appropriate form of quadratic "pre-compression". We also discuss practical issues regarding parameter estimation from present and future galaxy redshift surveys, and illustrate this with a generalized eigenmode analysis of the IRAS 1.2 Jy survey optimized for measuring beta=Omega 0.6/b using redshift space distortions.

Cite this document (BETA)

Available from arxiv.org
Page 1
hidden

Karhunen-Loeve eigenvalue problems in cosmology: how should we tackle large data sets?

ar
X
iv
:a
str
o-
ph
/9
60
30
21
v2
1
9
N
ov
1
99
6
ApJ, in press. Available from h t t p://www.mpa-garching.mpg.de/˜ max/karhunen.html (faster from Europe)
and from h t t p://www.sns.ias.edu/˜ max/karhunen.html (faster from the US).
Karhunen-Loe`ve Eigenvalue Problems in Cosmology:
How should we Tackle Large Data Sets?
Max Tegmark1,2,4, Andy N. Taylor3, & Alan F. Heavens3
1Institute for Advanced Study, Olden Lane, Princeton, NJ 08540
2Max-Planck-Institut fu¨r Astrophysik, Karl-Schwarzschild-Str. 1, D-85740 Garching
3Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh, U.K.
4Hubble Fellow
max@ias.edu, ant@roe.ac.uk, afh@roe.ac.uk
1 February 2008
ABSTRACT
Since cosmology is no longer “the data-starved science”, the problem of how to best
analyze large data sets has recently received considerable attention, and Karhunen-
Loe`ve eigenvalue methods have been applied to both galaxy redshift surveys and
Cosmic Microwave Background (CMB) maps. We present a comprehensive discussion
of methods for estimating cosmological parameters from large data sets, which includes
the previously published techniques as special cases. We show that both the problem of
estimating several parameters jointly and the problem of not knowing the parameters
a priori can be readily solved by adding an extra singular value decomposition step.
It has recently been argued that the information content in a sky map from a next
generation CMB satellite is sufficient to measure key cosmological parameters (h, Ω,
Λ, etc.) to an accuracy of a few percent or better — in principle. In practice, the
data set is so large that both a brute force likelihood analysis and a direct expansion
in signal-to-noise eigenmodes will be computationally unfeasible. We argue that it is
likely that a Karhunen-Loe`ve approach can nonetheless measure the parameters with
close to maximal accuracy, if preceded by an appropriate form of quadratic “pre-
compression”.
We also discuss practical issues regarding parameter estimation from present and
future galaxy redshift surveys, and illustrate this with a generalized eigenmode analysis
of the IRAS 1.2 Jy survey optimized for measuring β ≡ Ω0.6/b using redshift space
distortions.
Key words: Cosmology: theory, large scale structure of Universe, cosmic microwave
background – Astronomical methods: data analysis, statistical
1 INTRODUCTION
The problem of analysis of large data sets is one that, un-
til recently, has not been a major concern of cosmologists.
Indeed, in some areas no data existed to be analyzed. In
the last few years, this situation has rapidly changed. A
highlight in this transition has been the discovery of fluc-
tuations in the cosmic microwave background (CMB) by
the Cosmic Background Explorer (COBE) satellite (Smoot
et al. 1992). In its short lifetime, COBE produced such a
large data set that a number of sophisticated data-analysis
methods were developed specifically to tackle it. In addi-
tion, the advent of large galaxy redshift surveys has created
a field where the data sets increase by an order of magnitude
in size in each generation. For instance, the surveys where
the object selection was based on the Infrared Astronomy
Satellite (IRAS) contain several thousand galaxies: ∼ 2, 000
(QDOT; Lawrence et al. 1996) ∼ 5, 000 (Berkeley 1.2Jy;
Fisher et al. 1995) and ∼ 15, 000 (PSC-z; Saunders et al.
1996). The proposed next-generation surveys will have much
larger numbers of objects — around 250,000 in the Anglo-
Australian Telescope 2 degree Field galaxy redshift survey
(Taylor 1995) and ∼ 106 for the Sloan Digital Sky Survey
(Gunn & Weinberg 1995). Similarly plate measuring ma-
chines, such as the APM at Cambridge and SuperCOSMOS
at the Royal Observatory, Edinburgh, can produce very large
catalogues of objects, and numerical simulations of galaxy
clustering are even now capable of producing so much data
Page 2
hidden
2 M. Tegmark, A. N. Taylor, & A. F. Heavens
that the analysis and storage of the information is in itself
a challenge.
A standard technique for estimating parameters from
data is the brute force maximum likelihood method, which
illustrates why people have been driven towards developing
more sophisticated methods. For n data items (e.g., pixels
in a CMB map, or Fourier amplitudes from a transformed
galaxy distribution), the maximum likelihood method re-
quires inversion of an n×n matrix for each set of parameter
values considered — and this is for the simplest possible
case where the probability distribution is Gaussian. Since a
next-generation CMB satellite might produce a high resolu-
tion sky map with ∼ 107 pixels, and the CPU time required
for an inversion scales as n3, a brute force likelihood analysis
of this type of data set will hardly be feasible in the near
future.
Fortunately, it is often possible to greatly accelerate
a likelihood analysis by first performing some appropriate
form of data compression, by which the data set is substan-
tially reduced in size while nonetheless retaining virtually all
the relevant cosmological information. In this spirit, a large
number of data-compression methods have been applied in
the analysis of both CMB maps (e.g. Seljak & Bertschinger
1993; Go´rski 1994) and galaxy redshift surveys (e.g. Davis &
Peebles 1983; Feldman et al. 1994; Heavens & Taylor 1995).
A powerful prescription for how to do this optimally is the
Karhunen-Loe`ve eigenvalue method (Karhunen 1947), which
has recently been applied to both CMB maps (Bond 1994;
Bunn 1995; Bunn & Sugiyama 1995) and redshift surveys
(Vogeley 1995; Vogeley & Szalay 1996). The goal of this pa-
per is to review the more general framework in which these
treatments belong, and to present some important general-
izations that will facilitate the analysis of the next genera-
tion of cosmological data sets.
The rest of this paper is organized as follows. In Sec-
tion 2, we review some useful information-theoretical results
that tell us how well parameters can be estimated, and how
to determine whether a given analysis method is good or
bad. In Section 3, we review the Karhunen-Loe`ve data com-
pression method and present some useful generalizations. In
Section 4 we illustrate the theory with various cosmology ap-
plications, including the special case of the signal-to-noise
eigenmode method. In Section 5 we discuss limitations of
method and possible ways of extending it to make the analy-
sis feasible for huge data sets such as a 107 pixel future CMB
map. Finally, our results are summarized and discussed in
Section 6.
2 HOW WELL CAN YOU DO WITHOUT
DATA COMPRESSION?
How accurately can we estimate model parameters from a
given data set? This question was basically answered 60
years ago (Fisher 1935), and we will now summarize the
results, which are both simple and useful.
2.1 The classical results
Suppose for definiteness that our data set consists on n real
numbers x1, x2, ..., xn, which we arrange in an n-dimensional
vector x. These numbers could for instance denote the mea-
sured temperatures in the n pixels of a CMB sky map, the
counts-in-cells of a galaxy redshift survey, the first n coef-
ficients of a Fourier-Bessel expansion of an observed galaxy
density field, or the number of gamma-ray bursts observed
in n different flux bins. Before collecting the data, we think
of x as a random variable with some probability distribution
L(x;Θ), which depends in some known way on a vector of
model parameters
Θ = (θ1, θ2, ..., θm). (1)
Such model parameters might for instance be the spectral
index of density fluctuations, the Hubble constant h, the cos-
mic density parameter Ω or the mean redshift of gamma-ray
bursts. We will let Θ0 denote the true parameter values and
let Θ refer to our estimate of Θ. Since Θ is some function
of the data vector x, it too is a random variable. For it to be
a good estimate, we would of course like it to be unbiased,
i.e.,
〈Θ〉 =Θ0, (2)
and give as small error bars as possible, i.e., minimize the
standard deviations
∆θi ≡
(〈
θ2i

− 〈θi〉2
)1/2
(3)
In statistics jargon, we want the BUE θi, which stands for
the “Best Unbiased Estimator”.
A key quantity in this context is the so-called Fisher
information matrix, defined as
F ij ≡

∂2L
∂θi∂θj

, (4)
where L ≡ − lnL. Another key quantity is the maximum
likelihood estimator, or ML-estimator for brevity, defined as
the parameter vector ΘML that maximizes the likelihood
function L(x;Θ). (Above we thought of L(x;Θ) as a prob-
ability distribution, a function of x. However, when limiting
ourselves to a given data set x and thinking of L(x;Θ) as
a function of the parameters Θ, it is customarily called the
likelihood function.)
Using this notation, a number of powerful theorems
have been proven (see e.g. Kenney & Keeping 1951 and
Kendall & Stuart 1969 for details):
(i) For any unbiased estimator, ∆θi ≥ 1/

F ii.
(ii) If there is a BUE Θ, then it is the ML-estimator or a
function thereof.
(iii) The ML-estimator is asymptotically BUE.
The first of these theorems, known as the Crame´r-Rao in-
equality, thus places a firm lower limit on the error bars
that one can attain, regardless of which method one is us-
ing to estimate the parameters from the data. This is the
minimum error bar attainable on θi if all the other parame-
ters are known. If the other parameters are estimated from
the data as well, the minimum standard deviation rises to
∆θi ≥ (F−1)1/2ii .
The second theorem shows that maximum-likelihood
(ML) estimates have quite a special status: if there is a best
method, then the ML-method is the one. Finally, the third
result basically tells us that in the limit of a very large data
set, the ML-estimate for all practical purposes is the best
estimate, the one that for which the Crame´r-Rao inequality
Page 3
hidden
Eigenvalue Problems 3
becomes an equality. It is these nice properties that have
made ML-estimators so popular.
2.2 The Fisher information matrix
Although the proof of the Crame´r-Rao inequality is rather
lengthy, it is quite easy to acquire some intuition for why the
Fisher information matrix has the form that it does. This is
the purpose of the present section.
Let us Taylor expand L around the ML-estimate Θ.
By definition, all the first derivatives ∂L/∂θi will vanish at
the ML-point, since the likelihood function has its maxi-
mum there, so the local behavior will be dominated by the
quadratic terms. Since L = exp[−L], we thus see that the
likelihood function will be approximately Gaussian near the
ML-point. If the error bars are quite small, L usually drops
sharply before third order terms have become important, so
that this Gaussian is a good approximation to L everywhere.
Interpreting L as a Bayesian probability distribution in pa-
rameter space, the covariance matrix T is thus given simply
by the second derivatives at the ML-point, as the inverse of
the Hessian matrix:
(T−1)ij ≡ ∂
2L
∂θi∂θj
. (5)
Note that the Fisher information matrix F is simply the
expectation value of this quantity at the point Θ = Θ0
(which coincides with the ML-point on average if the ML-
estimate is unbiased). It is basically a measure of how fast
(on average) the likelihood function falls off around the ML-
point, i.e., a measure of the width and shape of the peak.
From this discussion, it is also clear that we can use its
inverse, F−1, as an estimate of the covariance matrix
T ≡ 〈ΘΘt〉 − 〈Θ〉〈Θ〉t (6)
of our parameter estimates when we use the ML-method.
2.3 The Gaussian case
Let us now explicitly compute the Fisher information ma-
trix for the case when the probability distribution is Gaus-
sian, i.e., where (dropping an irrelevant additive constant
n ln[2π])
2L = lndetC + (x−µ)C−1(x −µ)t, (7)
where in general both the mean vector µ and the covariance
matrix
C = 〈(x −µ)(x −µ)t〉 (8)
depend on the model parameters Θ. Although vastly sim-
pler than the most general situation, the Gaussian case is
nonetheless general enough to be applicable to a wide variety
of problems in cosmology. Defining the data matrix
D ≡ (x−µ)(x−µ)t (9)
and using the well-known matrix identity ln detC = Tr lnC ,
we can re-write equation (7) as
2L = Tr
[
lnC +C−1D
]
. (10)
We will use the standard comma notation for derivatives,
where for instance
C ,i ≡

∂θi
C . (11)
Since C is a symmetric matrix for all values of the parame-
ters, it is easy to see that all the derivatives C ,i, C ,ij , etc.,
will also be symmetric matrices. Using the matrix identities
(C−1),i = −C−1C ,iC−1 and (lnC),i = C−1C,i, we thus
obtain
2L,i = Tr
[
C−1C,i −C−1C ,i C−1D +C−1D,i
]
. (12)
When evaluating C and µ at the true parameter values, we
have 〈x〉 = µ and 〈xxt〉 = C +µµt, which gives





〈D〉 = C ,
〈D,i 〉 = 0,
〈D,ij 〉 = µ,iµ,tj +µ,j µ,ti .
(13)
Using this and equation (12), we obtain 〈L,i 〉 = 0. In other
words, the ML-estimate is correct on average in the sense
that the average slope of the likelihood function is zero at the
point corresponding to the true parameter values. Applying
the chain rule to equation (12), we obtain
2L,ij = Tr[ − C−1C ,iC−1C ,j +C−1C ,ij
+ C−1(C,i C−1C ,j +C ,j C−1C ,i )C−1D
− C−1(C,i C−1D,j +C ,j C−1D,i)
− C−1C ,ij C−1D +C−1D,ij ]. (14)
Substituting this and equation (13) into equation (4) and
using the trace identity Tr[AB] = Tr[BA], many terms drop
out and the Fisher information matrix reduces to simply
F ij = 〈L,ij 〉 =
1
2
Tr[AiAj +C
−1M ij ], (15)
where we have defined the matrices Ai ≡ C−1C ,i = (lnC),i
and M ij ≡ 〈D,ij 〉 = µ,i µ,tj +µ,j µ,ti. This old and well-
known result is also derived by Bunn (1995) and Vogeley &
Szalay (1996).
2.4 An example: parameter estimation with a
future CMB experiment
Let us illustrate the above results with a simple example
where µ = 0 and C is diagonal. Suppose a next generation
CMB satellite generates a high-resolution all-sky map of the
CMB fluctuations. Let us for the moment ignore the com-
plication of the galactic plane, and expand the temperature
distribution in spherical harmonics aℓm. We define our data
vector x as the set of these coefficients from ℓ = 2 up to some
cutoff ℓmax, i.e., n = (ℓmax+1)2 − 4 and xi = ai, where we
have combined ℓ and m into a single index i = 1, 2, 3... as
i = ℓ2 + ℓ+m+ 1. With the standard assumption that the
CMB fluctuations are isotropic, we have
{µ = 0,
C ij = δij
[
Cℓ + 4πσ
2
N e
θ2bℓ(ℓ+1)
]
.
(16)
Here Cℓ denotes the angular power spectrum of the CMB,
and the second term incorporates the effects of instrumen-
tal noise and beam smearing (Knox 1995, Tegmark & Efs-
tathiou 1996). N is the number of pixels in the sky map, σ
is the r.m.s. pixel noise,
θb ≡ FWHM/

8 ln 2 ≈ 0.425 FWHM, (17)
Page 4
hidden
4 M. Tegmark, A. N. Taylor, & A. F. Heavens
Figure 1. The derivatives of the CDM power spectrum with
respect to various parameters.
and FWHM is the full-width-half-max beam width. Assum-
ing that the CMB and noise fluctuations are Gaussian, equa-
tion (15) gives the Fisher information matrix
F ij =
ℓmax

ℓ=2
(2ℓ+ 1
2
)
[
Cℓ +
4πσ2
N e
θ2bℓ(ℓ+1)
]−2
(∂Cℓ
∂θi
)
(
∂Cℓ
∂θj
)
.(18)
This handy expression, also derived by Jungman et al.
(1996a), tells us that the crucial functions which determine
the attainable accuracy are the derivatives of the power spec-
trum with respect to the various parameters. Examples of
such derivatives are shown in Figure 1. For instance, the
derivative with respect to the reionization optical depth,
∂Cℓ/∂τ , is shaped as −Cℓ for ℓ ≫ 10, since earlier reion-
ization suppresses all these multipoles by the same factor
e−2τ . The derivative with respect to the quadrupole nor-
malization, ∂Cℓ/∂Q, of course has the same shape as the
power spectrum itself.
Equation (18) has a simple geometric interpretation. We
can think of the m functions ∂Cℓ/∂θi as vectors in a Hilbert
space of dimension (ℓmax−1), and think of the Fisher matrix
component F ij as simply the dot product of the vectors
∂Cℓ/∂θi and ∂Cℓ/∂θi. This dot product gives a rather small
weight to low ℓ-values, essentially because there are fewer m-
modes there and correspondingly larger cosmic variance. In
addition, the weights are seen to be exponentially suppressed
for large ℓ, where beam smearing causes the effect of pixel
noise to explode. If the parameter dependence of Cℓ was
such that all n vectors ∂Cℓ/∂θi were orthogonal under this
dot product, then F and the parameter covariance matrix
T = F−1 would be diagonal, and the errors in the estimates
of the different parameters would be uncorrelated. The more
similarly shaped two curves in Figure 1 are, the harder it
will be to break the degeneracy between the corresponding
two parameters. In the extreme case where two curves have
identical shape (are proportional to each other), the Fisher
matrix becomes singular and the resulting error bars on the
two parameters become infinite. It is therefore interesting to
diagonalize the matrix T (or equivalently, its inverse F ). The
eigenvectors will tell us which m parameter combinations
can be independently estimated from the CMB data, and
the corresponding eigenvalues tell us how accurately this
can be done.
It has been pointed out (Bond et al. 1994) that there
will be a considerable parameter degeneracy if data is only
available up to around the first Doppler peak. This is clearly
illustrated in Figure 1: most of the curves lack strong fea-
tures in this range, so some of them can be well approxi-
mated by linear combinations of the others. If a CMB ex-
periment has high enough resolution to measure the power
out to l ∼ 103, however, this degeneracy is broken. Jungman
et al. (1996b) compute the Fisher matrix for a model with
the eleven unknown parameters
Θ = (Ω,Ωbh2, h,Λ, nS , α, nT , T/S, τ, Q,Nν); (19)
the density parameter, the baryon density, the Hubble pa-
rameter, the cosmological constant, the spectral index of
scalar fluctuations, the “running” of this index (a measure
of the deviation from power law behavior), the spectral in-
dex of tensor fluctuations, the quadrupole tensor-to-scalar
ratio, the optical depth due to reionization, and the number
of light neutrino species, respectively. They find that even
when estimating all these parameters simultaneously, the re-
sulting error bars on some of the key parameters are of the
order of a few percent or better for experiments with a res-
olution good enough to probe the first few Doppler peaks.
Even the abundance of hot dark matter can be constrained
in this fashion (Dodelson et al. 1996). An up-to-date discus-
sion of which additional parameters can be constrained when
using large-scale-structure data as well is given by Liddle et
al. (1996).
In should be stressed that the formalism above only tells
us what the attainable accuracy is if the truth is somewhere
near the point in parameter space at which we compute the
power derivatives. Figure 1 corresponds to standard COBE-
normalized CDM, i.e., to
Θ0 = (1, 0.015, 0.5, 0, 1, 0, 0, 0, 0, 20µK, 3). (20)
The worst scenario imaginable would probably be extremely
early reionization, since τ ≫ 1 would eliminate almost all
small-scale power and introduce a severe degeneracy by mak-
ing all the power derivatives almost indistinguishable for
ℓ ≫ 10, the region where they differ the most in Figure 1.
Needless to say, accurate parameter estimation also re-
quires that we can compute the theoretical power spectrum
accurately. It has been argued (Hu et al. 1995) that this can
be done to better than 1%, by accurately modeling various
weak physical effects such as Helium recombination and po-
larization. We will return to other real-world issues, such
as foreground contamination and incomplete sky coverage,
further on.
Page 5
hidden
Eigenvalue Problems 5
3 OPTIMAL DATA COMPRESSION:
KARHUNEN-LOE`VE METHODS
Above we saw how small error bars one could obtain when
using all the information in the data set. We also saw that
for large data sets, these minimal error bars could be ap-
proximately attained by making a likelihood analysis using
the entire data set. However, there are of course many situ-
ations where this approach is not feasible even in the simple
Gaussian case.
3.1 The need for data compression
If we wish to estimate m parameters from n data points,
this would involve inverting an n×n matrix at a dense grid
of points in the m-dimensional parameter space. The CPU
time required to invert such a non-sparse matrix scales as
n3, and the number of grid points grows exponentially with
m, so there are obviously limits as to what can be done in
practice.
For instance, the “brute force” COBE analysis of
Tegmark & Bunn (1995) had n = 4038 and m = 2, and
involved inverting 4038× 4038 matrices at a couple of hun-
dred grid points. A future high-resolution all-sky CMB map
might contain of order n = 107 pixels, and direct numerical
inversion of matrices this large is clearly unfeasible at the
present time. Also, to numerically map out the likelihood
function in the 11-dimensional parameter space explored by
Jungman et al. (1996b) with reasonable resolution would
entail evaluating it at such a large number of grid points,
that it would be desirable to make the inversion at each
grid point as fast as possible.
Likewise, the spherical harmonic analysis of the 1.2Jy
redshift survey (Heavens & Taylor, 1995, Ballinger Heav-
ens & Taylor 1996), and the PSC-z survey (Tadros et al.,
in preparation), required a likelihood analysis of some 1208
modes to find the redshift distortion parameter and a non-
parametric stepwise estimate of the power spectrum.
Because of this, it is common to carry out some form of
data compression whereby the n data points are reduced to
some smaller set of n′ numbers before the likelihood analysis
is done. Since the time needed for a matrix inversion scales as
n3, even a modest compression factor such as n/n′ = 5 can
accelerate the parameter estimation by more than a factor
of 100.
3.2 The optimization problem
In this section, we will discuss the special case of linear data
compression, which has as an important special case the so-
called signal-to-noise eigenmode method. We will return to
non-linear data compression methods below, in Section 5.
The most general linear data compression can clearly
be written as
y = Bx, (21)
where the n′-dimensional vector y is the new compressed
data set and B is some arbitrary n′ × n matrix. Thus the
new data set has
{
〈y〉 = Bµ,
〈yyt〉 − 〈y〉〈y〉t = BCBt, (22)
and substituting this into equation (15) we find that the new
Fisher information matrix F˜ is given by
F˜ ij =
1
2
Tr[(BCBt)−1(BC ,iBt)(BCBt)−1(BC ,j Bt)
+ (BCBt)−1(BM ijB
t)]. (23)
If n = n′ and B is invertible, then the B-matrices cancel in
this expression, leaving us with simply
F˜ ij =
1
2
Tr[B−t(AiAj +C
−1M ij)B
t] = F ij . (24)
In other words, B just makes a similarity transform of the
matrix within the trace in equation (15), leaving the Fisher
information matrix F unchanged. This reflects the fact that
when B is invertible, no information is lost or gained, so
that the error bars on a measured parameter are unchanged.
For instance, replacing a galaxy-cut COBE map consisting
of say 3600 pixels by its spherical harmonic coefficients up
to ℓ = 59 (there are 3600 such coefficients) would be an
invertible transformation, thus making no difference what-
soever as to the attainable error bars. Likewise, expansion
in galaxy-cut spherical harmonics gives exactly the same re-
sult as expansion in an orthonormal basis for the galaxy-cut
spherical harmonics (as in Go´rski 1994), since the mapping
from the non-orthonormal basis to the orthonormal basis is
invertible. This result is obvious if we think in terms of in-
formation: if the mapping from x to y is invertible, then y
must clearly contain exactly the same information that x
does, since we can reconstruct x by knowing y.
Rather, the interesting question is what happens when
n′ < n and B is not invertible. Each row vector of B spec-
ifies one of the numbers in the new data set (as some lin-
ear combination the original data x), and we wish to know
which such row vectors capture the most information about
the parametersΘ. If B has merely a single row, then we can
write B = bt for some vector b, and the diagonal (i = j)
part of equation (23) reduces to
F˜ ii =
1
2
(
btC,i b
btCb
)2
+
(btµ,i )2
(btCb)
. (25)
Let us now focus on the problem of estimating a single pa-
rameter θi when all other parameters are known — we will
return to the multi-parameter case below, in Section 3.6.
Since the error bar ∆θi ≈ F˜−1/2ii in this case, we want to find
the b that maximizes the right-hand side of equation (25).
Although this is a non-linear problem in general, there are
two simple special cases which between them cover many of
the situations that appear in cosmology applications:
• The case where the mean is known: µ,i = 0
• The case where the covariance matrix is known: C ,i = 0
Below we show first how these two cases can be solved sepa-
rately, then how the most general case can be for all practical
purposes solved by combining these two solutions.
3.3 When the mean is known
When µ is independent of θi, the second term in equa-
tion (25) vanishes, and we simply wish to maximize the
quantity
(2F˜ ii)
1/2 =
|btC ,i b|
btCb
. (26)
Page 6
hidden
6 M. Tegmark, A. N. Taylor, & A. F. Heavens
Since the mean µ does not depend on the parameters (i.e.,
it is known), let us for simplicity redefine our data by
x 7→ x − µ, so that 〈x〉 = 0. Note that whereas the de-
nominator in equation (26) is positive (since C is a co-
variance matrix and thus positive definite), the expression
btC ,i b in the numerator might be negative, since C ,i is
not necessarily positive definite. We therefore want to make
(btC ,i b)/(btCb) either as large as possible or as small as
possible, depending on its sign. Regardless of the sign, we
thus seek the b for which this ratio takes an extremum, i.e.,
we want the derivatives with respect to the components of b
to vanish. Since this ratio is clearly unchanged if we multiply
b by a constant, we can without loss of generality choose to
normalize b so that the denominator equals unity. We thus
seek an extremum of btC ,i b subject to the constraint that
btCb = 1. (27)
This optimization problem is readily solved by introducing
a Lagrange multiplier λ and extremizing the Lagrangean
btC ,i b − λbtCb. (28)
Varying b and setting the result equal to zero, we obtain the
generalized eigenvalue problem
C ,i b = λCb. (29)
Since C is symmetric and positive definite, we can Cholesky
decompose it (e.g., Press et al. 1992) as C = LLt for some
invertible matrix L. Multiplying equation (29) by L−1 from
the left, we can rewrite it as
(L−1C ,i L−t)(Ltb) = λ(Ltb), (30)
which we recognize as an ordinary eigenvalue problem for
the symmetric matrix (L−1C ,i L−t), whose solution will be
n orthogonal eigenvectors (Ltbk) and corresponding eigen-
values λk, k = 1, 2, ..., n. Let us sort them so that
|λ1| ≥ |λ2| ≥ ... ≥ |λn|. (31)
Let us choose the kth row of B to be the row vector btk, so
that the compressed data set is given by yk = btkx. The or-
thogonality property (Ltbk) · (Ltbk′) ∝ δkk′ in combination
with our chosen normalization of equation (27) then tells us
that our compressed data satisfies
〈ykyk′〉 = 〈(btkx)(xtbk′)〉 = btkCbk′ = btkLLtbk′ = δkk′ ,(32)
i.e., 〈yyt〉 = I. The compressed data values y thus have the
nice property that they are what is known as statistically or-
thonormal, i.e., they are all uncorrelated and all have unit
variance. Since we are assuming that everything is Gaussian,
this also implies that they are statistically independent — in
fact, y is merely a vector of independent unit Gaussian ran-
dom variables. In other words, knowledge of yk gives us no
information at all about the other y-values. This means that
the entire information content of the initial data set x has
been portioned out in disjoint pieces in the new compressed
data set, where y1 contains the most information about θi,
y2 is the runner up (once the information content of y1 has
been removed, y2 contains the most information about θi),
and so on.
If we use all the n vectors bk as rows inB, thenB will be
invertible, and we clearly have not achieved any compression
at all. However, once this matrix B has been found, the
compression prescription is obvious: if we want a compressed
data set of size n′ < n, then we simply throw away all but
the first n′ rows of B. It is straightforward to show (see
e.g. Therrien 1992; Vogeley & Szalay 1996) that if we fix an
integer n′ and then attempt to minimize ∆θi, we will arrive
at exactly the same B as with our prescription above (or
this B multiplied by some invertible matrix, which as we
saw will leave the information content unchanged).
How does the error bar ∆θi depend on the choice of n′?
Equation (29) implies that
C ,iBt = CBtΛ, (33)
where Λij ≡ δijλi, i.e., a diagonal matrix containing all the
eigenvalues. Since equation (32) implies that
BCBt = I , (34)
it follows from equation (33) that
BC ,iBt = Λ. (35)
Therefore the diagonal part of equation (23) reduces to sim-
ply
2F˜ ii = Tr[{(BCBt)−1(BC ,iBt)}2] = TrΛ2 =
n′

k=1
λ2k. (36)
It is thus convenient to plot λ2k as a function of the mode
number k, as we have done in Figure 2 for the specific case of
measuring the redshift distortion parameter from the 1.2Jy
redshift survey, to see at what k the information content
starts petering out. Alternatively, one can plot ∆θi as a func-
tion of n′ as in Figure 3 and see when the curve flattens out,
i.e., when the computational cost of including more modes
begins to yield diminishing returns. If one feels uncomfort-
able about choosing n′ by “eyeballing”, one can obtain a
more quantitative criterion by noting that using only a sin-
gle mode bk would give ∆θi = 1/|λk|. Thus the quantity
θi|λk| can be thought of as the signal-to-noise ratio for the
kth mode, and one can choose to keep only those modes
where this ratio exceed unity (or some other constant such
as 0.2).
3.4 When the covariance matrix is known
When C is independent of θi, the first term in equation (25)
vanishes, and we simply wish to maximize the quantity
F˜ ii =
btM iib
btCb
. (37)
Introducing a Lagrange multiplier λ and proceeding exactly
as in the previous section, this is equivalent to the eigenvalue
problem
(L−1M iiL
−t)(Ltb) = λ(Ltb). (38)
However, since the matrix M ii = 2µ,i µ,ti is merely of rank
one, this problem is much simpler than that in the previous
section. Since the left hand side is
2(L−1µ,i )(µ,ti b) ∝ (L−1µ,i ), (39)
it points in the direction (L−1µ,i ) regardless what the vec-
tor b is, so the only non-trivial eigenvector of (L−1M iiL−t)
is
(Ltb1) = (L
−1µ,i ), (40)
Page 7
hidden
Eigenvalue Problems 7
with a corresponding eigenvalue
λ1 = 2|L−1µ,i |2 = Tr[CM ii]. (41)
All other eigenvalues vanish. In other words, b1 = C−1µ,i,
and the new compressed data set consists of just one single
number,
y1 = µ,ti C−1x (42)
which contains just as much information about the param-
eter θi as the entire data set x did. Another way to see this
is to compute the Fisher information before and after the
compression, and note that
F˜ ii = F ii = µ,ti C−1µ,i . (43)
In other words, all the information about θi in x is has
been distilled into the number y1. Thus the ML-estimate of
θi is just some function of y1. For the special case where
µ depends linearly on the parameter, i.e., µ = µ,i θi where
µ,i is a known constant vector, this function becomes trivial:
the expectation value of equation (42) reduces to
〈y1〉 = (µ,ti C−1µ,i ) θi, (44)
which means that on average, up to a known constant
(µ,ti C−1µ,i ), the compressed data is just the desired pa-
rameter itself.
3.5 When neither is known — the general case
In the most general case, both µ and C depend on the
parameters Θ. This happens for instance in the CMB case
when the components of x are chosen to be estimates of the
power spectrum Cℓ, as in Section 5 below.
Clearly, the more ways in which the probability distri-
bution for x depends on the parameters, the more accurately
we can estimate them. In other words, when both µ and C
depend on Θ, we expect to do even better than in the two
cases discussed above. Finding the vectors b that extremize
equation (25) is quite a difficult numerical problem. For-
tunately, this is completely unnecessary. Solving the eigen-
value problem in Section 3.3 gives us n′′ compression vectors
capturing the bulk of the cosmological information coming
from the first term in equation (25). Section 3.4 gives us
one additional vector µ,ti C−1 that captures all the cosmo-
logical information coming from the second term. In other
words, if we simply use all these n′′ +1 compression vectors
together, we have for all practical purposes solved our prob-
lem. Since a direct numerical attack on equation (25) could
never reduce these n′′ + 1 vectors to fewer than n′′ without
widening the resulting error bars, the time savings in the
ensuing likelihood analysis would at most be a completely
negligible factor [(n′′ + 1)/n′′]3 ∼ 1 + 3/n′′ compared the
simple prescription we have given here.
3.6 Estimating several parameters at once
The Karhunen-Loe`ve method described above was designed
to condense the bulk of the information about a single
parameter into as few numbers yk as possible. Although
this particular problem does occasionally arise, most cos-
mology applications are more complicated, involving mod-
els that contain more than one unknown parameter. The
previous applications of Karhunen-Loe`ve methods to CMB
maps (Bond 1994; Bunn & Sugiyama 1995; Bunn 1995) and
galaxy surveys (Vogeley 1995; Vogeley & Szalay 1996), have
involved taking a rather minimalistic approach to this issue,
by optimizing the eigenmodes for one particular parameter
(typically the overall normalization of the fluctuations) and
assuming that these eigenmodes will capture most of the
relevant information about the other parameters as well.
If we want to estimate several parameters from a data
set simultaneously, then how should we best compress the
data? As we saw in Section 2, the covariance matrix T of
our parameter estimates Θ is approximately F−1, so for the
multivariate case, we have
∆θi = (F−1)1/2ii (45)
instead of ∆θi = F−1/2ii . One approach to the problem would
thus be trying to minimize the diagonal elements of F−1
rather than, as above, trying to minimize the diagonal ele-
ments of F . This is, however, quite a cumbersome optimiza-
tion problem.
Fortunately, there is an alternative solution to the prob-
lem which is easy to implement in practice and in addition is
easy to understand intuitively, in terms of information con-
tent. Suppose that we repeat the standard KL procedure
described above m times, once for each parameter θi, and
let n′i denote the number of modes that we choose to use
when estimating the ith parameter. We then end up with
m different compressed data sets, such that the n′i numbers
in the ith set contain essentially all the information that
there was about θi. This means that the union y of all these
compressed data sets, consisting of
n′′ =
m

i=1
n′i (46)
numbers, retains basically all the information from x that is
relevant for our joint estimation of all the parameters. The
problem is of course that there might be plenty of redun-
dancy in y. Indeed, if we typically compressed by say a fac-
tor n/n′′i ∼ 10 and had 11 parameters as in the CMB model
of Jungman et al. (1996b), then we would have achieved a
counterproductive “anti-compression” with n′′ > n.
How can we remove the unnecessary “pork” from y?
The n′i eigenvectors selected via the KL compression method
span a certain subspace of the n-dimensional space in which
the data vector x lives, and we can think of the data com-
pression step as simply projecting the vector x onto this
subspace. For each parameter θi, we found such a subspace,
and the redundancy stems from the fact that many of these
subspaces overlap with one another, or point in very simi-
lar directions. In order to compress efficiently and yet retain
almost all the the information about all the parameters, we
want to find a small set of vectors that span as much as pos-
sible of the union of all the subspaces. As described in detail
in e.g. Press et al. (1992), this is readily accomplished by
making a Singular Value Decomposition (SVD) and throw-
ing away all vectors corresponding to very small singular
values. Let us define Bi = BΛ, where B is the KL com-
pression matrix optimized for estimating the ith parameter.
Here we multiply the row vectors by their corresponding
eigenvalues so that better vectors will receive more weight
in what follows. Combining all the row vectors of the trans-
Page 8
hidden
8 M. Tegmark, A. N. Taylor, & A. F. Heavens
posed compression matrices Bti into a single matrix, SVD
allows us to factor this matrix as
(
Bt1 . . . Btm
)
= UΛV t, (47)
where U tU = I , V tV = I and the matrix Λ is diagonal.
One customarily writes Λ = diag{λi}, and refers to the di-
agonal elements λi as the singular values. These are basically
a generalization of eigenvalues to non-square matrices. We
define our final compression matrix as
B ≡ U t. (48)
The columns of U with corresponding non-zero singular val-
ues λi form an orthonormal set of basis vectors that span the
same space as all the initial compression vectors together.
The column vectors of U with vanishing singular values form
an orthonormal basis for the null-space of UΛV t, repre-
senting redundant information. Similarly, the vectors corre-
sponding to singular values near zero capture information
that is almost entirely redundant. By making a plot similar
to Figure 2, showing λk as a function of k, one decides on
the final number n′ < n′′ of row vectors in B to keep. Once
n′ has been fixed, one can of course (if one prefers⋆) make
the n′ numbers in the new compressed data set statistically
orthogonal as before by Cholesky decomposing their covari-
ance matrix BCBt = LLt and replacing z = Bx by L−tz.
In summary, we have found that when we wish to estimate
more than one parameter from the data, we can obtain a
close to optimal data compression with the following steps:
(i) Compute the KL eigenmodes that contain the bulk of
the information about the first parameter.
(ii) Repeat this procedure separately for all other param-
eters.
(iii) Arrange all the resulting vectors, multiplied by their
eigenvalues, as the rows of B.
(iv) Make an SVD of B and throw away all rows corre-
sponding to very small singular values.
3.7 The problem of not knowing the parameters a
priori
The KL-approach has sometimes been criticized for not be-
ing model-independent, in the sense that one needs to make
an a priori guess as to what the true parameter values Θ0
are in order to be able to compute the compression matrix
B. Although there is no evidence that a bad guess as to Θ0
biases the final results (see e.g. Bunn 1995 for a detailed
discussion of these issues, including numerical examples), a
poor guess of Θ0 will of course lead to a data compression
that is no longer optimal. In other words, one would expect
to still get an unbiased answer, but perhaps with larger error
bars than necessary.
In practice, this loss of efficiency is likely to be marginal.
For instance, as a test, Bunn (1995) performed a KL-analysis
of a COBE CMB map with n ∼ 1 assuming a blatantly
erroneous spectral index n = 2 to compute B (n = 2 is
ruled out at about 3σ a posteriori), and compared this with
⋆ The only merit of the statistical orthogonality is that it will
make 〈yyt〉 more close to diagonal near the fiducial parameter
valuesΘ, which might conceivably speed up the matrix inversion
if an iterative method is used.
the results obtained when assuming n = 1. The error bars
where found to change only marginally.
If one nonetheless wishes to do something about this
efficiency problem, it is fortunately quite easy in practice.
The simplest approach is an iterative scheme, whereby the
KL-analysis is carried out twice, using the ML-estimate ofΘ
from the first run as the fiducial “true” value in the second
run. A more rigorous approach is to compute compression
vectors b for a number of different assumptions about Θ0
that include the extreme corners of parameter space, and
then combine all these vectors into a single compression ma-
trix B by singular-value decomposition exactly as described
in the previous section.
4 COSMOLOGY APPLICATIONS
In this section, we illustrate the theoretical discussion above
with a number of cosmology examples, and show how the
recently published work on CMB maps and galaxy surveys
fits into the general KL-scheme as special cases. We will see
that the typical compression factor is about 10 in both an
IRAS example and a COBE example.
4.1 A large-scale-structure example:
redshift-space distortions
Here we apply a KL-analysis to the problem of redshift-space
distortions in the 1.2Jy IRAS galaxy survey (Fisher et al.
1995). This problem has previously been analyzed in detail
by Heavens & Taylor (1995, hereafter HT95) who used a
spherical harmonic analysis. Here we repeat their analysis,
but include a KL data-compression step to investigate how
many modes are needed for an accurate determination of
the redshift distortion parameter
β ≡ Ω
0.6
b , (49)
where b denotes the conventional linear bias factor, the ratio
between the fluctuations in luminous matter and the total
matter fluctuations. As was first shown by Kaiser (1987), the
peculiar motions of galaxies induces a characteristic radial
distortion in the apparent clustering pattern that depends
only on this parameter β.
As the initial data vector x, we use the coefficients
obtained by expanding the observed galaxy distribution in
combinations of spherical harmonics and spherical Bessel
functions as described in HT95, with the known mean val-
ues subtracted off so that µ = 〈x〉 = 0. HT95 show that the
covariance matrix is given by
Cµν = Λsnµν +
1
2

α
(Φαµ + βV αµ)(Φαν + βV αν)P (kα),(50)
where the indices µ, ν and α run over the above-mentioned
modes, and P (k) is the power spectrum. The matrices Φ
and V encapsulate the effects of finite survey volume con-
volution and redshift distortions, respectively. The matrix
Λsn contains the shot-noise contribution. All three matrices
are independent of β. For our illustrative example, we as-
sume a parametrised form of the power spectrum suggested
by Peacock & Dodds (1994), which means that β is the only
Page 9
hidden
Eigenvalue Problems 9
Figure 2. KL-eigenvalues λ = 1/∆β.
unknown parameter. Hence m = 1 and Θ = θ1 = β. Differ-
entiating equation (50), we obtain
C ,1 = ∂C∂β =
1
2
(V PΦt +ΦPV t) + (V PV t)β, (51)
where P αβ ≡ P (kα)δαβ (i.e., P = diag{P (kα}). We assume
an a priori value β = 1 to evaluate this matrix. Using n =
1208 modes in the initial data set x as in HT95, we obtain
the eigenvalues λk shown in Figure 2. The resulting error
bar
∆β =
[
n′

k=1
λ2k
]−1/2
(52)
is plotted as a function of n′ (the number of modes used) in
Figure 3, and is seen to level out at around n′ = 100. This
means that although HT95 used ∼ 103 modes to estimate β,
almost all the information is actually contained in the best
∼ 102 linear combinations of these modes. In other words,
we can obtain basically identical results to those found in
HT95 by using a compressed data set only 10% of the orig-
inal size. Since the matrix inversion time scales as n′3, this
compression factor of 10 thus allows the inversion to be car-
ried out 1000 times faster.
4.2 A special case: the signal-to-noise eigenmode
method
The above-mentioned example was rather generic in the
sense that C depended on Θ in a nonlinear way (in this
case, C depended quadratically on β). However, the special
case of the KL-method where the parameter dependence is
linear (or rather affine) is so common and important that it
has acquired a special name: the signal-to-noise eigenmode
method. This is the special case where µ = 0, we have only
one unknown parameter θ, and the covariance matrix can
be written in the form
C = Sθ +N , (53)
where the known matrices S and N are independent of θ.
For reasons that will become clear from the examples below,
Figure 3. Error bar on beta β as a function of n′, the number
of eigenmodes used.
S and N are normally referred to as the signal and noise
matrices, respectively, and they are normally both positive
definite. Since dC/dθ = S, equation (29) gives the general-
ized eigenvalue problem
Sb = λ(S +N )b. (54)
By Cholesky decomposing the noise matrix as N = LLt,
this is readily rearranged into the ordinary eigenvalue prob-
lem
(L−1SL−t)(Ltb) = λ′(Ltb), (55)
where λ′ ≡ λ/(1 − λ). Since the matrix to be diagonal-
ized, (L−1SL−t), is loosely speaking (N−1/2SN−1/2), a
type of signal-to-noise ratio, its eigenvectors b are usually
referred to as the signal-to-noise eigenmodes. Historically,
this terminology probably arose because one was analyzing
one-dimensional time series with white noise, where N ∝ I ,
so that (L−1SL−t) = SN−1 was the signal-to-noise ratio
even in the strict sense. Vogeley & Szalay (1996) refer to
the change of variables x 7→ L−1x as prewhitening, since
it transforms S +N into (L−1SL−t) + I , i.e., transforms
the noise matrix into the white noise matrix, the identity
matrix.
Equation (32) showed that in the general KL-case, the
compressed data set was statistically orthonormal, 〈yyt〉 =
I. In the S/N-case, the compressed data has an additional
useful property: both the signal and noise contributions to
y are uncorrelated separately. It is easy to show that
btkSbk′ = λ′btkNbk′ ∝ δkk′ , (56)
which means that the covariance matrix 〈yyt〉 will remain
diagonal even if we have misestimated the true signal-to-
noise ratio. In other words, if the sole purpose of our likeli-
hood analysis is to estimate the overall normalization θ from
equation (53), we only need to invert diagonal matrices.
The signal-to-noise method arises in a wide variety of
contexts. For instance, it is a special case not only of the KL-
method, but also of the power spectrum estimation method
of Tegmark (1996), corresponding to the case were the width
of the window functions is ignored.
Page 10
hidden
10 M. Tegmark, A. N. Taylor, & A. F. Heavens
4.2.1 Signal-to-noise analysis of CMB maps
The signal-to-noise eigenmode method was introduced into
CMB analysis by Bond (1994), who applied it to sky maps
from both the COBE and FIRS experiments. It has also
been applied to the COBE data by Bunn†, and used it to
constrain a wide variety of cosmological models (Bunn 1995;
Bunn & Sugiyama 1995; Bunn, Scott & White 1995; Hu,
Bunn & Sugiyama 1995; White & Bunn 1995; Yamamoto &
Bunn 1996). Here the uncompressed data set consists of the
CMB temperatures in n pixels from a sky map (for instance,
n = 4038 or 4016 for a COBE map with a 20◦ galactic cut,
depending on the pixelization scheme used). If foreground
contamination is negligible (see e.g. Tegmark & Efstathiou
1996 for a recent review) and the pixel noise is uncorrelated
(as it is for COBE to an excellent approximation — see
Lineweaver et al. 1994) one has



µ = 0,
Sij =
∑∞
ℓ=2
(
2ℓ+1

)
Pℓ(nˆi · nˆj)W 2ℓ Cℓ,
N ij = δijσ2i ,
(57)
where nˆi is a unit vector pointing the direction of the ith
pixel, σi is the r.m.s. noise in this pixel, Pℓ are the Legen-
dre polynomials and Wℓ is the experimental beam function.
This expression for S should only be used if one marginal-
izes over the monopole and dipole in the ensuing likelihood
analysis — otherwise S should be corrected as described in
Tegmark & Bunn (1995). In all the above-mentioned appli-
cations, the compression was optimized with respect to the
overall normalization of the power spectrum (say θ = C2),
so the matrix S was independent of θ. It has been found
(Bunn & Sugiyama 1995) that a compression factor of about
10, down to n′ ∼ 400 modes, can be attained without signifi-
cant loss of information. This is about the same compression
factor that we obtained in our redshift distortion example
above.
Although these above-mentioned papers constrained a
wide variety of cosmological parameters, the compression
was in all cases optimized only for one parameter, the
overall power spectrum normalization. Although this pro-
cedure can be improved as was described in Section 3.6,
this does not appear to be causing substantial loss of ef-
ficiency in the COBE case. Support for this conclusion is
provided by Tegmark & Bunn (1995), where it is found that
KL-compression optimized for measuring the normalization
gives error bars on the spectral index n that are less than
10% away from the minimal value that is obtained without
any compression at all. It should also be noted that if one
optimized the KL-compression for a different parameter θ,
say θ = n, then C would no longer be of the simple form
of equation (53). Thus the signal-to-noise eigenmode treat-
ment no longer applies, and the more general equation (29)
must be used instead.
As has been pointed out by Go´rski (1994), the rapid
fall-off of the COBE window function Wℓ implies that vir-
tually all the cosmological information in the COBE maps is
contained in the multipoles ℓ ≤ 30. When implementing the
KL-compression in practice, it is useful to take advantage
† In fact, both Bond and Bunn independently reinvented the
entire KL-method.
of this fact by replacing the data set by the 957 spherical
harmonic coefficients aℓm for 2 ≤ ℓ ≤ 30, since this reduces
the size of the matrix to be diagonalized by about a factor
4. This is an example of what we will call pre-compression,
and we will return to this topic below, in Section 5.
4.2.2 Signal-to-noise analysis of galaxy surveys
The application of the signal-to-noise eigenmode method to
galaxy surveys was pioneered by Vogeley (1995) and has
since been further elaborated by Vogeley & Szalay (1996).
Although these are both method papers, deferring actual
parameter estimation to future work, the former explicitly
evaluates the eigenmodes for the CfA survey and shows that
there are about 103 modes with a signal-to-noise ratio ex-
ceeding unity.
In galaxy survey applications, the noise matrix N con-
tains the contribution from Poisson shot noise due to the
finite number of galaxies per unit volume, rather than on de-
tector noise as in the CMB case. With galaxies, the question
of what to use as the initial data vector x is not as simple as
in the CMB case, since there is no obviously superior way
to “pixelize” the observed density field. Vogeley & Szalay
(1996) divide space into a large number of disjoint volumes
(“cells”), and derive the resulting signal matrix S in the ap-
proximation of ignoring redshift distortions. One alternative
is using “fuzzy” cells as discussed by Tegmark (1995), for in-
stance averages of the galaxy distribution where the weight
functions are Gaussians centered at some grid of points.
Another alternative, which has the advantage of mak-
ing it easier to include the effect of redshift distortions, is
to use the coefficients from an expansion in spherical har-
monics and spherical Bessel functions as was done by HT95,
Ballinger et al. (1995; hereafter BHT95) and in Section 4.1
above. In BHT95 the real space power spectrum of density
perturbations was left as a free function, to be evaluated
in a stepwise fashion, along with the distortion parameter.
In the case of only estimating the power, by fixing β to
some constant value, the problem reduces to a signal-to-
noise eigenmode problem, but with m free parameters —
the power at m specified wavenumbers. Due to the mixing
of wavenumbers by the Φ and V matrices, these are gener-
ally not independent and so we must apply the SVD method
to construct orthogonal eigenmodes. This application will be
discussed elsewhere.
We conclude this section with a few additional com-
ments on the above-mentioned “pixelization” issue. Com-
pared to a CMB analysis, a galaxy survey analysis involves
one more step, which is reducing the point data to the “ini-
tial” data vector x. What is the relation between the first
step (weighting galaxies) and the second step (weighting
modes)? In Feldman, Kaiser & Peacock (1994) and Heavens
& Taylor (1995), schemes for optimal weighting of galaxy
data were presented. This weighting of the data (step 1)
is prior to and complementary to the mode-weighting dis-
cussed in this paper (step 2). The data-weighting is designed
to maximize the signal-to-noise of individual modes, and one
might intuitively expect thus this to be a good mechanism
for obtaining large eigenvalues λk. The techniques of the
present paper can then subsequently be used to trim the
data in an optimal way. The data-weighting scheme alone
does not guarantee that we get the best answer possible.
Page 11
hidden
Eigenvalue Problems 11
Figure 4. The three heavy lines show the smallest error bars
attainable for the three parameters Q, n and τ as a function of
the number of modes used, i.e., the error bars obtained when using
the separately optimized KL-modes for each parameter. The three
thin lines show the error bars obtained when using the 500 SVD
modes, illustrating that these contain essentially all the relevant
information about all three parameters.
This is because it maximizes the diagonal elements in the co-
variance matrix, ignoring correlations between modes. This
does therefore not guarantee that the complete Fisher infor-
mation matrix will be optimal. In general, it does not seem
tractable to devise a data-weighting scheme which optimizes
the Fisher matrix directly.
4.3 A multi-parameter CMB example
Here we apply a KL-analysis to the problem of estimating
the quadrupole normalization Q, the spectral index n and
the reionization optical depth τ from the 4-year COBE data
(Bennett et al. 1996). As the initial data vector x, we use
the temperatures in the N = 4016 pixels that lie more than
20◦ from the galactic plane. Just as in Section 4.2.1, µ = 0
and C = S + N is given by equation (57). Computation
of C ,i thus reduces to computing the derivatives of the an-
gular power spectrum Cℓ that occur in the sum, and we
do this in practice using the software described in Seljak
& Zaldarriaga (1996). The fiducial model has the standard
CDM power spectrum shown in Figure 1. The three heavy
lines in Figure 4 show the results of performing a separate
KL-analysis for each parameter. In good agreement with
the findings of Bunn & Sugiyama (1995), we see that vir-
tually all information about Q is contained in the first 400
modes. Similarly, we see that all essentially all the infor-
mation about n and τ is contained in the first 400 modes
optimized for these parameters. Since the KL-modes for a
parameter by construction give the smallest error bars, the
curves corresponding to any other choice of modes would lie
above these solid curves. For instance, if we used the KL-
modes optimized for τ to measure Q, the resulting curve
would lie above the bottom one if we were to plot it in Fig-
ure 4. Figure 1 provides a simple interpretation of this: since
dCℓ/dτ ≈ 0 for ℓ ≪ 10, the τ -modes do not contain infor-
mation about the lowest multipoles, since this information
Figure 5. The error bars on the power spectrum normaliza-
tion are shown for hypothetical COBE experiments with differ-
ent noise levels. From top to bottom, they correspond to a noise
enhancement factor 10, the real 4 year data, and noise reduction
factors of 10, 100 and 1000.
is useless for measuring τ (even though it is important for
measuring Q).
We implement the SVD technique described in Sec-
tion 3.6 by taking the 500 best modes for each parameter
and SVD-compressing these down to 500 modes. The thin
lines in Figure 4 show the resulting error bars. We see that
although they lie above the minimal curves for k ≪ 500,
they all “catch up” at k = 500. In other words, these 500
modes retain virtually all the relevant information about Q,
n and τ , since using all of them gives virtually identical er-
ror bars to those obtained when using the full N = 4016
uncompressed data set.
4.4 A low noise CMB example
How effective will KL-compression be for future CMB data
sets? Might it be the case that when noise levels are much
lower, then all N KL-modes will have S/N≫ 1, so that no
compression is possible? Figure 5 shows that the answer to
the second question is no. For this example, we have re-
peated the above-mentioned COBE analysis for a range of
different noise levels. With ten times less noise, the com-
pression factor is seen to drop to about 4016/700 ∼ 7 and
another order of magnitude of noise reduction lowers the
compression factor to about 4. The upcoming MAP experi-
ment forecasts a noise level w−1 ≡ 4πσ2/N ≈ 2.5×10−15 , as
compared to w−1 ≈ 10−12 for COBE and w−1 ≈ 5× 10−18
for the planned COBRAS/SAMBA experiment. These are
quadratic quantities, so taking square roots, we see that
MAP and COBRAS/SAMBA reduce the COBE noise by
factors around 20 and 450, respectively. Figure 5 shows that
even with 1000 times less noise, a compression factor of 2
is readily attainable. The explanation of this is oversam-
pling. To avoid aliasing problems, the mean pixel separation
must be smaller than the beam width by a substantial factor
(the Shannon oversampling factor ∼ 2.5). This redundancy
remains even when the data set itself has excellent signal-to-
Page 12
hidden
12 M. Tegmark, A. N. Taylor, & A. F. Heavens
noise, so the KL-compression can take advantage of the fact
that the pixel basis of the map is a sense overcomplete. Al-
though future CMB experiments will of course have a much
higher angular resolution than in our example, the oversam-
pling factor will remain similar, so our conclusion about the
usefulness of KL-compression will remain the same.
5 ON GIANT DATA SETS: THE NEED FOR
PRE-COMPRESSION
Above we have shown that the KL-compression technique
is in many cases very useful. In this section, we will discuss
some of its limitations, as well as outline nonlinear exten-
sions that may make the compression technique feasible even
for the next generation of CMB missions and galaxy surveys.
The number of pixels in a sky map from a next-
generation CMB mission is likely to be around 107. The
Sloan Digital Sky Survey (SDSS) will measure the redshifts
of 106 galaxies. Is it really feasible to apply KL-methods
and likelihood analysis to data sets of this gargantuan size?
We will argue that although an orthodox KL-analysis is not
feasible, it appears as though the answer to the question
may nonetheless be an encouraging yes if an extra nonlin-
ear compression step is added to make the analysis doable
in practice. Let us first note that despite the statistical or-
thonormality property of a KL-compressed data set, the ma-
trices that need to be inverted to find the ML-estimate are
in general not diagonal. BCBt is only diagonal at the one
point in parameter space where Θ = Θ0, i.e., the point
corresponding to the parameter values that we assumed a
priori. The ML-point generally lies elsewhere, and we need to
find it by a numerical search in parameter space, so BCBt
will generally not even be sparse. This forces us to resort to
Cholesky decomposition. For the n ∼ 4000 case of Tegmark
& Bunn (1995), this took about 10 minutes per matrix on
a workstation. Since the time scales as n3 and the storage
as n2, this would require about 30 years of CPU time for
n = 106, and about one terabyte of RAM. Even allowing
for the exponential rate at which computer performance is
improving, such a brute force likelihood analysis does not
appear feasible for a megapixel CMB map within the next
ten years.
The crucial question thus becomes how large a compres-
sion factor we can expect to achieve. We will argue that a
factor of 10 is just about all one can do with linear com-
pression, but that quadratic “pre-compression” may be able
to gain another factor of 1000 for a next generation CMB
experiment.
5.1 The limits of linear compression
Consider the following simple one-parameter example: we
are given n numbers xi drawn from a Gaussian distribution
with zero mean and an unknown variance θ that we wish to
estimate. Thus
{
µ = 0,
C = θI ,
(58)
so when we solve equation (29), we find that all the eigenval-
ues are identical: λk = 1/θ. Thus if we compress by keeping
only the first n′ modes, the resulting error bar will be
∆θ =

2
n′ θ. (59)
The fact that all eigenvalues are identical means that we
would be quite stupid to throw away any modes at all, since
they all contain the same amount of information. Simple as
it is, this example nonetheless illustrates a difficulty with
analyzing future CMB maps. Even in the best of all worlds,
where we had complete sky coverage, we would encounter a
problem of this type. To estimate a multipole Cℓ, we would
be faced with (2ℓ+1) coefficients aℓm with zero mean and un-
known variance, just as in the example above, which means
that the KL-method would be of no use at all in compressing
this data.
In Bunn (1995) and in one of the above examples, it was
found that the COBE data set can be compressed down to
about 400 numbers without losing much information. Where
does this magic number 400 come from? In the absence of
a galaxy cut, the number would probably be around 600,
since beam smearing has eliminated virtually all informa-
tion about multipoles ℓ ∼> 25, and there are about 600 aℓm-
coefficients with ℓ < 25. The fact that the galaxy cut re-
moves about one third of the sky then reduces the cosmolog-
ical information by about a third. There is also a more direct
way to see where the compression factor 10 came from. As
mentioned in Section 4.4, pixelized maps are generally over-
sampled by a factor 2.5 or more to avoid aliasing problems,
which means that the mean pixel separation is considerable
smaller than the beam width. Since the sky map is two-
dimensional, the number of pixels per beam area is roughly
the square of this number, of the order of 10. Future CMB
missions will probably use similar oversampling rates, which
means that a compression factor of 10 is probably the most
we can hope for with linear compression only.
5.2 Quadratic compression
Fortunately, we are not limited to linear data compression.
Let us compress the data set of our previous example into a
single number defined by
y ≡ 1n
n

i=1
x2i . (60)
Let us assume that n is very large, say n ∼> 100. Then y
will be very close to Gaussian by the central limit theorem.
This means that the mean is 〈y〉 = nθ and the variance is
〈y2〉−〈y〉2 = 2nθ2. Substituting this into Equation (15) now
gives
∆θ =

2
n + 4θ ≈

2
nθ, (61)
i.e., the theoretically minimal error bar of equation (59)
with n′ = n. (The extra 4 in equation (61), which might
seem to indicate that one can attain smaller error bars than
the theoretical minimum, should of course be ignored —
it merely reflects the fact that the Gaussian approximation
breaks down for small n.)
In summary, we have found that whereas linear com-
pression was powerless against our simple toy model,
quadratic compression made mincemeat of the problem, con-
densing all the information into a single number. This result
Page 13
hidden
Eigenvalue Problems 13
is hardly surprising, considering that the compressed data
set y that we have defined in equation (60) is in fact the
ML-estimate of the variance. It nonetheless has far-reaching
implications for the issue of how to compress future CMB
maps. If we have complete sky coverage, and define the com-
pressed data set as
yℓ ≡
1
2ℓ+ 1


m=−ℓ
|aℓm|2, (62)
then it is easy to show that the new Fisher information ma-
trix will be identical to that of equation (18), which involved
using the entire data set. For an experiment with a FWHM
beamwidth of 4 arcminutes, there is virtually no informa-
tion on multipoles above ℓmax = 3000, so this means that
in the absence of a galaxy cut, we could compress the entire
data set into 3000 numbers without losing any cosmological
information at all. Roughly speaking, this works because
the compression throws away all the phase information and
keeps all spherically averaged amplitude information, and
it is the latter that contains all the information about the
power spectrum. (For non-Gaussian models such as ones in-
volving topological defects, the power spectrum alone does
of course not contain all the relevant information.)
5.3 Real-world CMB maps: an outline of a
compression recipe
The discussion above already included the effects of pixel
noise and beam smearing. Barring systematic errors, there
are two additional complications that will inevitably arise
when analyzing future high-quality CMB maps:
• Foregrounds
• Incomplete sky coverage
Any attempts at foreground subtraction should of course be
made before throwing away phase information, so that avail-
able spatial templates of galactic dust, radio point sources
etc. can be utilized. For a recent discussion of such sub-
traction schemes, see e.g. Tegmark & Efstathiou (1996).
The final result of the foreground treatment will almost cer-
tainly be a map where some regions, notably near the galac-
tic plane, have been discarded altogether. The resulting in-
complete sky coverage degrades information in two different
ways:
• The sample variance increases.
• The spectral resolution decreases.
The former effect causes the variance in individual multipole
estimates to grow as the inverse of the remaining sky area
(Scott et al. 1994), and this increase in variance is automat-
ically reflected in the final Fisher information matrix. The
latter effect means that the yℓ defined by equation (62) is no
longer a good estimate of the multipole Cℓ. Rather, it is easy
to show that 〈yℓ〉 will be a weighted average of all the mul-
tipoles Cℓ. These weights, known as the window function,
generally form quite a broad distribution around ℓ, which
means that the compressed data yℓ are effectively probing a
smeared out version of the power spectrum. For a 20◦ galac-
tic cut, this smearing is found to be around ∆ℓ/ℓ ∼ 25%,
which clearly destroys information about features such as
the exact location of the Doppler peaks.
5.3.1 How to deal with incomplete sky coverage
Fortunately, the problem of incomplete sky coverage can be
for all practical purposes eliminated. As described in detail
by Tegmark (1996, hereafter T96), it is possible to obtain
much narrower window functions by simply replacing the
spherical harmonic coefficients in equation (62) by expan-
sion coefficients using a different set of functions, obtained
by solving a certain eigenvalue problem. If is found that
for a 20◦ galactic cut, the window functions widths can be
brought down to ∆ℓ ∼ 1 for all ℓ, corresponding to a relative
smearing of less than a percent at ℓ ∼ 200, the scale of the
first CDM Doppler peak. In other words, as long as none of
the models between which we are trying to discriminate have
any sharp spectral features causing the power spectrum to
jump discontinuously from one multipole to the next, then
virtually no information at all is lost in our quadratic com-
pression.
In general, smoothing only destroys information if there
are features on the smoothing scale or below it. If the true
power spectrum is similar to a CDM spectrum, it will typ-
ically vary on scales ∆ℓ ∼ Cℓ/(dCℓ/dℓ) ∼ 100, at least for
angular scales ℓ ∼> 50. In other words, even smoothing it with
window functions with ∆ℓ as large as 10 would hardly de-
stroy any information at all about this part of the power
spectrum. The quadratic compression of T96 produces a
spectral resolution of ∆ℓ ∼ 1/θ, where θ is the smallest
dimension of the patch of sky analyzed, in radians. In other
words, we can allow ourselves even more leeway than the
galaxy cut dictates without losing information. This can be
used to save CPU-time in practice. Since we can achieve the
spectral resolution ∆ℓ ∼ 10 with 6◦ × 6◦ patches of sky, we
can form compressed data sets y separately for a mosaic of
such patches covering all the available sky, thus radically re-
ducing the size of the matrices that we need to diagonalize
for the T96-method, and then simply average these different
estimates of the power spectrum.
5.3.2 The final squeeze: KL-method
Although the above-mentioned prescription will reduce the
size of the data set dramatically, from perhaps 107 numbers
to about 3000, there will still be considerable amounts of
redundancy, since power spectrum estimates yℓ for neigh-
boring ℓ-values will be correlated. Because of this, it is
worthwhile to subject the new data set y to a regular KL-
compression. We thus term the above-mentioned quadratic
step pre-compression: it does by no means need to be opti-
mal, and is simply done to reduce the data set down to a
small enough size that it can be fed into the KL-machinery
without practical difficulties.
The values yℓ are Gaussian to an excellent approxima-
tion for ℓ ∼> 50, by the central limit theorem, since they are
the sum of 2ℓ+1 ∼> 100 independent numbers. The remain-
der, however, are not. Since the Gaussian property makes
such a dramatic simplification both in the compression step
and in the likelihood analysis itself, we therefore recommend
discarding the yℓ-values with ℓ < 50 and replacing them by
the 2500 spherical harmonic coefficients aℓm with ℓ < 50 be-
fore proceeding to the KL-compression step. This way, the
entire data set y will be Gaussian. In addition, no infor-
mation whatsoever has been lost about the largest angular
Page 14
hidden
14 M. Tegmark, A. N. Taylor, & A. F. Heavens
scales, where some models in fact do predict rather sharp
spectral features which could render the quadratic compres-
sion step inappropriate.
In summary, we argue that the following prescription
for analyzing future 107 pixel CMB map will be fairly close
to optimal:
(i) Foregrounds are subtracted making maximal use of
multifrequency data and spatial templates obtained from
other experiments.
(ii) The most contaminated regions, for instance the
galactic plane, are discarded altogether.
(iii) The remaining sky is expanded in spherical harmon-
ics up to ℓ = 50 and the coefficients saved.
(iv) This remaining sky is covered by a mosaic of over-
lapping regions, whose diameter are of order 5◦ − 10◦.
(v) The angular power spectrum is estimated separately
from each of these patches with the method of T96, up to
ℓ = 3000.
(vi) All these power spectrum estimates are averaged.
(vii) The first 50 multipole estimates are discarded, and
replaced by the 2500 numbers from step (iii).
(viii) The resulting data set (about 5500 numbers) is com-
pressed with the KL-method.
(ix) All cosmological parameters of interest are estimated
jointly from this compressed data set with a likelihood anal-
ysis.
5.3.3 Open problems
The above prescription for quadratic compression was nec-
essarily rather sketchy, since a detailed treatment of these
issues would go well beyond the scope of this paper. In-
deed, the discussion above left several important questions
unanswered, which are clearly worthwhile topics for further
research. Here are two such examples:
• How can information loss during pre-compression be
minimized? Above we merely showed pre-compression to be
lossless in the absence of noise (or with identical noise lev-
els in all pixels). In the presence of noise, one would expect
lossless compression to involve some form of inverse-variance
pixel weighting, i.e., giving less weight to noisier pixels. On
the other hand, pushing such noise weighting too far could
broaden the window functions to the point where low spec-
tral resolution led to irreversible information loss.
• How is quadratic precompression best implemented nu-
merically? For instance, are there particularly choices of
shapes and sizes of the above-mentioned patches that sub-
stantially facilitate the calculation of the final mean vector
and correlation matrix? Is a direct analytic calculation of
these quantities feasible (C involves terms quadratic in the
power spectrum), or is it faster to resort to Monte Carlo
techniques for this step?
5.4 Real-world galaxy surveys
Also for large future galaxy surveys, some form of precom-
pression appears to be necessary before a KL-compression
can be done. If we wish to retain all the information on clus-
tering down to say 3h−1Mpc scales in a 3D volume of typical
dimension 300h−1Mpc, we clearly need to “pixelize” it into
at least (300/3)3 = 106 numbers. Since even the power spec-
trum in the deeply non-linear regime contains valuable cos-
mological information, it would not seem justified to simply
ignore these small scales.
Unfortunately, the galaxy survey problem is consider-
ably more difficult than the corresponding CMB problem in
that information about, for instance, redshift space distor-
tions lies hidden not merely in the overall power spectrum,
but also in the phases, in the differences between radial and
transverse clustering patterns.
As we have noted, transforming to weighted harmonic
amplitudes is one way of reducing the number of data
“points” without losing cosmological information For in-
stance, in the case of the analysis of the 1.2Jy survey by
HT95 and BHT95, only the first 1208 modes where used
from the 2000 galaxies, with the limit on modes used being
set by the survey size and the smallest scale still in the lin-
ear regime. However, if the full range of available modes is
desired, we might need higher order compression options.
One way to implement quadratic compression, while re-
taining the important phase information, is to transform to
some coordinate basis which is orthogonal in redshift–space.
We can then apply the quadratic compression to estimate
a “power spectrum” in the transformed space, having lost
none of the underlying phase information. Some progress
along the lines of finding an orthonormal basis function
for redshift space has been made by Hamilton & Culhane
(1995). However, these methods still fail to adequately deal
with shot–noise and the phase mixing introduced by a finite
angular mask function.
Thus the issue of whether one can do even better with
galaxy surveys, while staying within the realms of numerical
feasibility, remains a challenging open question.
6 CONCLUSIONS
We have given a comprehensive discussion of how to best
estimate a set of cosmological parameters from a large data
set, and concluded the following:
• Well-known information-theoretical results roughly
speaking state that a brute-force likelihood analysis using
the entire data set gives the most accurate parameter deter-
mination possible.
• For computational reasons, this will be unfeasible for
the next generation of CMB maps and galaxy surveys.
• The solution is to use a good data compression scheme.
• The optimal linear compression method is the
Karhunen-Loe`ve method, of which the so-called signal-to-
noise eigenmode method is a special case.
• Although the standard KL-method applies only when
estimating a single parameter, it can be generalized to the
multi-parameter case by simply adding a step consisting of
a singular value decomposition (SVD).
• This SVD step also provides a simple way out of the
Catch-22 situation that one needs to specify the parameter
values before one has measured them.
• The KL-method produces a compression factor ∼ 10
for typically sampled CMB maps, and also for the redshift
space distortion analysis of Heavens & Taylor (1995).
• However, this is not enough to handle a high-resolution
next generation CMB map.
Page 15
hidden
Eigenvalue Problems 15
• Fortunately, it appears as though this can be remedied
by adding a quadratic pre-compression step without sub-
stantial information loss.
Cosmology, which used to be a data-starved science, is now
experiencing a formidable explosion of data in the form of
both CMB maps and galaxy redshift surveys. Around the
turn of the millennium, we will probably be equipped with
data sets so rich in information that most of the classical
cosmological parameters can — in principle — be deter-
mined with accuracies of a few percent or better. Whether
this accuracy will be attainable also in practice depends cru-
cially on what data-analysis methods are available. We have
argued that the prospects of achieving this accuracy goal
are quite promising, especially on the CMB side (which is
slightly simpler), by using a multi-parameter generalization
of the Karhunen-Loe`ve method together with a quadratic
pre-compression scheme. However, much work remains to be
done on precompression issues to ensure that we can take
full advantage of the wealth of data that awaits us.
ACKNOWLEDGMENTS
The authors wish to thank Ted Bunn, Andrew Hamilton and
our referee, Uros Seljak, for useful comments and sugges-
tions. This work was supported by a PPARC research assis-
tantship (to ANT), European Union contract CHRX-CT93-
0120, Deutsche Forschungsgemeinschaft grant SFB-375 and
NASA through a Hubble Fellowship, #HF-01084.01-96A,
awarded by the Space Telescope Science Institute, which is
operated by AURA, Inc. under NASA contract NAS5-26555.
REFERENCES
Bennett, C. L. et al. 1996, ApJ, 464, L1
Bond, J. R. et al. 1994, Phys. Rev. Lett., 72, 13
Bond, J. R., 1995, Phys. Rev. Lett., 74, 4369
Bunn, E. F. 1995, Ph.D. Thesis, U.C. Berkeley, ftp
pac2.berkeley.edu/pub/bunn
Bunn, E. F., Sugiyama N. 1995, ApJ, 446, 49
Bunn, E. F., Scott, D. & White, M. 1995, ApJ, 441, L9
Davis M., Peebles P. J. E., 1983, APJ, 267, 465
Dodelson, S., Gates, E. & Stebbins, A. 1996, ApJ, 467, 10
Feldman H. A., Kaiser N., Peacock J. A., 1994, APJ, 426,
23
Fisher, R. A. 1935, J. Roy. Stat. Soc., 98, 39
Fisher K. B., Huchra J. P., Strauss M. A., Davis M., Yahil
A., Schlegel D., 1995, ApJS, 100, 69
Go´rski, K. M. 1994, ApJ, 430, L85
Gunn, J., Weinberg, D., 1995, in Wide-field spectroscopy and
the distant universe, proc. 35th Herstmonceux confer-
ence, eds S.J. Maddox & A. Arago´n-Salamanca, World
Scientific, p3
Hamilton, A. J. S, Culhane, M., MNRAS, 278, 73
Heavens, A. F., Taylor, A. N., 1995, MNRAS, 483, 497
Hu, W. & Bunn, E. F. & Sugiyama, N. 1995, ApJ, 447, L59
Hu, W., Scott, D., Sugiyama, N. & White, M. 1995, Phys.
Rev. D, 52, 5498
Jungman, G., Kamionkowski, M., Kosowsky, A. & Spergel,
D. N. 1996a, Phys. Rev. Lett., 76, 1007
Jungman, G., Kamionkowski, M., Kosowsky A. & Spergel
D. N. 1996b, Phys. Rev. D, 54, 1332
Kaiser, N. 1987, MNRAS, 227, 1
Karhunen, K., U¨ber lineare Methoden in der Wahrschein-
lichkeitsrechnung (Kirjapaino oy. sana, Helsinki, 1947).
Kendall M. G., Stuart, A., 1969. The Advanced Theory of
Statistics, Volume II, Griffin, London
Kenney, J. F. & Keeping, E. S. 1951, Mathematics of Statis-
tics, Part II, 2nd ed. (Van Nostrand, New York).
Knox, L. 1995, Phys. Rev. D, 52, 4307
Lawrence, A., Rowan-Robinson, M., Saunders, W., Parry,
I., Xia Xiaoyang, Ellis R. S., Frenk, C.S., Efstathiou,
G. P., Kaiser, N., Crawford, J., 1996, to be submitted
to MNRAS
Liddle, A. R. et al. 1996, MNRAS, 281, 531
Peacock, J. A. & Dodds 1994, MNRAS, 267, 1020
Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery,
B. P., 1992, Numerical Recipes. Cambridge University
Press, Cambridge
Saunders, W. et al. 1996, in preparation
Scott, D., Srednicki, M. & White, M. 1994, ApJ, 421, L5
Seljak, U. & Bertschinger, E. 1993, ApJ, 417, L9
Seljak, U. & Zaldarriaga, M. 1996, preprint astro-
ph/9603033
Smoot, G. F. et al. 1992, ApJ, 396, L1
Taylor, K., 1995, in Wide-field spectroscopy and the distant
universe, proc. 35th Herstmonceux conference, eds S.J.
Maddox & A. Arago´n-Salamanca, World Scientific, p15
Tegmark, M., Bunn, E. F. 1995, ApJ, 455, 1
Tegmark, M. 1995, ApJ, 455, 429
Tegmark, M. 1996, MNRAS, 280, 299
Tegmark, M. & Efstathiou, G. 1996, MNRAS, 281, 1297
Therrien, C. W. 1992, Discrete Random Signals and Statis-
tical Signal Processing (Englewood Cliffs: Prentice Hall)
Vogeley, M. S. 1995, in “Wide-Field Spectroscopy and the
Distant Universe”, eds. Maddox & Arago´n-Salamanca
(World Scientific, Singapore)
Vogeley, M. S. & Szalay, A. S., 1996, ApJ, 465, 34
White, M. & Bunn, E. F. 1995, ApJ, 450, 477
Yamamoto, K. & Bunn, E. F. 1996, ApJ, 464, 8

Sign up today - FREE

Mendeley saves you time finding and organizing research. Learn more

  • All your research in one place
  • Add and import papers easily
  • Access it anywhere, anytime

Start using Mendeley in seconds!

Already have an account? Sign in

Readership Statistics

Readership statistics are being calculated.