Combining experiments to discover linear cyclic models with latent variables
Abstract
We present an algorithm to infer causal relations between a set of measured variables on the basis of experiments on these variables. The algorithm assumes that the causal relations are linear, but is otherwise completely general: It provides consistent estimates when the true causal structure contains feedback loops and latent variables, while the experiments can involve surgical or soft interventions on one or multiple variables at a time. The algorithm is online in the sense that it combines the results from any set of available experiments, can incorporate background knowledge and resolves conflicts that arise from combining results from different experiments. In addition we provide a necessary and sufficient condition that (i) determines when the algorithm can uniquely return the true graph, and (ii) can be used to select the next best experiment until this condition is satisfied. We demonstrate the method by applying it to simulated data and the flow cytometry data of Sachs et al. (2005).
Frederick Eberhardt, Department of Philosophy, Washington University in St Louis
Patrik O. Hoyer, CSAIL, MIT & HIIT, Univ. of Helsinki
Richard Scheines, Department of Philosophy, Carnegie Mellon University
1 INTRODUCTION
Causal knowledge is key to supporting inferences about the effects of interventions on a system, hence much of applied science is devoted to identifying causal relationships between various measured quantities. For this purpose the randomized controlled experiment is generally the tool of choice whenever such experiments are feasible.
Appearing in Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010, Chia Laguna Resort, Sardinia, Italy. Volume 9 of JMLR: W&CP 9. Copyright 2010 by the authors.
While the standard theory of experimental design provides procedures and inference techniques for a given set of potential causes (treatment variables) and effects (outcomes), it does not provide guidance on how to determine the full set of causal relationships among the measured variables (i.e. the causal `interaction graph'). Randomized experiments directly provide the effects of the intervened variables in the experiment, but breaking these effects down into direct and indirect effects (with respect to the measured variables), or combining them to determine the total effect, is typically not straightforward. This problem arises in particular when correlations in the data may be partly attributable to unknown confounding variables, or when the system of interest exhibits feedback phenomena, as is common, for example, in bioinformatics and economics. Feedback phenomena in particular invalidate standard approaches based on directed acyclic graphs. Furthermore, experiments can involve different types of manipulations (e.g. randomizing a single vs. multiple variables simultaneously per experiment), and sometimes it is only possible to perform `soft' interventions, in which the variables influenced by the experimenter still (partly) depend on their natural causes.
Because of difficulties such as these, there is a definite need for algorithms and procedures that (i) combine the (possibly conflicting) results from different types of experiments, (ii) incorporate background knowledge where available, (iii) provide guidance on the selection of experiments, and (iv) given a set of stated assumptions, are able to identify consistently and efficiently the causal structure from experimental data, or do as well as possible (with regard to some measure) in cases of underdetermination or conflicting evidence.
As a simple and concrete example, consider the two alternative graphs of Figure 1 as explanations of the causal relationships between the variables $x_1$, $x_2$, and $x_3$. If in three separate experiments we randomize each of the variables one at a time while measuring the two others, we could deduce from the resulting dependencies that there exist directed paths from $x_1$ to $x_2$, from $x_1$ to $x_3$, and from $x_2$ to $x_3$. However, based on dependencies and independencies alone it is impossible to determine whether there is a direct effect $x_1 \to x_3$ in addition to the indirect effect through $x_2$, because the latents $l_1$ and $l_2$ preclude us from using statistical conditioning to identify this feature. Note that we can identify all edges among the observed variables if we can in an experiment simultaneously randomize both $x_1$ and $x_2$ while measuring $x_3$. This holds generally: By simultaneously randomizing all variables except a target variable, it is possible to determine all direct causal effects with respect to the randomized variables. By performing such a `leave-one-out' experiment for each variable, and combining the results, the complete graph can be determined. Unfortunately, it is seldom feasible or economical to perform such experiments.
[Figure 1 graphic omitted: two graphs over the observed variables $x_1$, $x_2$, $x_3$ and the latents $l_1$, $l_2$.]
Figure 1: Two distinct causal graphs which cannot be distinguished based on (marginal and conditional) independencies alone, when using only a combination of observational data and experiments randomizing at most one variable per experiment.
There already exists a substantial body of work in this area. For instance, Cooper & Yoo (1999) use a Bayesian score to combine experimental and observational data in the acyclic no-hidden-variable case. Tong & Koller (2001), Murphy (2001), Eberhardt (2008) and He & Geng (2008) provide strategies for this setting to optimally select the experiments to learn as much as possible about the underlying structure. Furthermore, Eaton & Murphy (2007) describe how to handle `uncertain' interventions where the targets of a given intervention are unknown, while Nyberg & Korb (2006) analyze the implications of soft interventions.
All of the above methods assume the restricted case of acyclic structures and no confounding latent variables. While Richardson (1996) discusses learning cyclic models based on observational data alone, there is little work that formally integrates experimental results into the search procedure. One exception to this is recent work by Schmidt and Murphy (2009), who adapt a formulation for undirected graphs to model cyclic directed graphs with no latents. In contrast, the standard literature on experimental design permits latent variables but only considers a very restricted set of possible causal structures.
In this paper we provide a procedure for discovery of linear causal models that uses experimental data and is completely general with regard to structural assumptions. Given a set of datasets over the variables of interest where in each dataset a different combination of variables has been subject to intervention, our procedure identifies the linear constraints on the causal effects among the variables. For an underdetermined system it returns a model that provides a minimal representation of the measured constraints. From here there are two ways to proceed: Either additional routines can be used to characterize the underdetermination, or, alternatively, a condition we provide identifies the next experiment that would add the most additional required constraints. Once the system is sufficiently constrained (background knowledge can be used where available), the procedure identifies the direct causal effects relative to the measured variables, and additionally infers the presence and locations of confounding latent variables. If the system is overconstrained it returns the model with the minimum sum of squared errors to the constraints. The technique works for both cyclic and acyclic systems, and easily incorporates soft interventions.
2 MODEL
The data generating model can be represented as a set of structural equations that assign to each of the observed variables a linear combination of the other variables and an additive `disturbance' term. Grouping the observed variables $x_i$, $i = 1, \ldots, n$ into the vector $x$, the corresponding disturbances $e_i$ into the vector $e$, and the linear coefficients $b_{ji}$ representing the direct effects $x_i \to x_j$ into the matrix $B$, we have:
$$x := Bx + e. \tag{1}$$
Here, latent variables are represented as non-zero off-diagonal entries in the covariance matrix $\Sigma_e = E\{ee^T\}$ of the disturbances. Without loss of generality, all the $x_i$ and the $e_i$ have zero mean.
A surgical (edge-breaking, i.e. fully randomizing) intervention on a variable $x_i$ is represented by a manipulated model where the row in $B$ corresponding to $x_i$ has been set to zero, indicating that $x_i$ is no longer influenced by any of the other observed variables. Similarly, we set the corresponding disturbance $e_i$ to zero, and instead assign an independent, unit variance experimental variable $c_i$ to $x_i$. If several of the observed variables are surgically intervened on simultaneously (in the same experiment), we perform these steps simultaneously for each of the intervened variables. Thus from the original model $(B, \Sigma_e)$ we obtain a manipulated model $(\tilde{B}, \tilde{\Sigma}_e)$ for which
$$x := \tilde{B}x + c + \tilde{e},$$
where the intervention vector $c$ is zero except for rows corresponding to variables that were subject to an intervention, while the manipulated disturbances $\tilde{e}$ equal $e$ except that elements for intervened variables are set to zero. This follows the standard view that interventions break all incoming arrows to any intervened node in a directed graph representing the direct effects between the observed variables (Pearl 2000).
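The manipulation described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function names `manipulate` and `experimental_covariance` and the 3-variable example model are assumptions made here, and the covariance is computed analytically rather than from sampled data.

```python
import numpy as np

def manipulate(B, Sigma_e, J):
    """Manipulated model for a surgical intervention on the index set J.
    (Hypothetical helper; names are not from the paper.)"""
    J = list(J)
    B_t = B.copy()
    S_t = Sigma_e.copy()
    B_t[J, :] = 0.0   # cut every edge into an intervened variable
    S_t[J, :] = 0.0   # drop the disturbances of intervened variables ...
    S_t[:, J] = 0.0   # ... including their covariances with other disturbances
    return B_t, S_t

def experimental_covariance(B, Sigma_e, J):
    """Covariance of x in the experiment, from x = (I - B~)^{-1} (c + e~)."""
    n = B.shape[0]
    B_t, S_t = manipulate(B, Sigma_e, J)
    A_t = np.linalg.inv(np.eye(n) - B_t)
    C = np.zeros((n, n))
    C[list(J), list(J)] = 1.0   # independent, unit-variance intervention variables
    return A_t @ (C + S_t) @ A_t.T, A_t

# Hypothetical cyclic model; B[j, i] is the direct effect x_i -> x_j.
B = np.array([[0.0, 0.0, 0.4],
              [0.5, 0.0, 0.0],
              [0.3, 0.6, 0.0]])
# The off-diagonal entry stands for a latent confounder of x_1 and x_2.
Sigma_e = np.array([[1.0, 0.2, 0.0],
                    [0.2, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])

cov, A_t = experimental_covariance(B, Sigma_e, J=[0])
```

In this sketch the intervened variable has unit variance, and its covariance with each passively observed variable coincides with the corresponding entry of $(I - \tilde{B})^{-1}$, regardless of the confounding encoded in `Sigma_e`.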
In some situations an intervention that breaks the influence of the set of causes on a given variable may not be possible, while a `soft' intervention, in which the intervention influences the variable but does not fully determine it, can be performed. In our framework such an intervention on a variable $x_i$ does not affect the corresponding row of $B$, nor the corresponding disturbance $e_i$, but it does add an independent experimental variable $c_i$. Assuming that $c_i$ is measured, its correlation with $x$ can be used to obtain an estimate of its causal effect, as in the standard instrumental variable setting.$^1$ In what follows, however, all interventions are surgical unless otherwise explicitly stated.
We assume that our data sets are generated from a sequence of experiments $(E_k)_{k=1,\ldots,m}$ where each experiment $E_k = (J_k, U_k)$ consists of a set $J_k$ of one or more variables that are subject to an intervention (simultaneously and independently) and a set $U_k$ denoting the other variables, which are passively observed.
In each experiment $E_k = (J_k, U_k)$ our measurements consist of the experimental effects of each $x_j \in J_k$ on each $x_u \in U_k$, denoted by $t(x_j \rightsquigarrow x_u \| J_k)$, which is equal to the covariance of $x_j$ and $x_u$ in this experimental setting (since $x_j$ has unit variance). The total effect $t(x_j \rightsquigarrow x_u)$ of $x_j$ on $x_u$ is standardly defined as the experimental effect in a (hypothetical if not actual) experiment in which only $x_j$ is subject to an intervention, i.e. we have $t(x_j \rightsquigarrow x_u) \equiv t(x_j \rightsquigarrow x_u \| \{x_j\})$. In experiments in which interventions are performed on more than one variable (i.e. $J_k$ is larger than the set $\{x_j\}$), the experimental effect may differ from the total effect, so $t(x_j \rightsquigarrow x_u)$ may in general not equal $t(x_j \rightsquigarrow x_u \| J_k)$ when $J_k \neq \{x_j\}$.
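To make the difference between total and experimental effects concrete, here is a small sketch on a hypothetical acyclic model with a chain $x_1 \to x_2 \to x_3$ plus a shortcut $x_1 \to x_3$. The helper `experimental_effects` and the coefficient values are illustrative assumptions; effects are computed exactly from the model rather than estimated from data.

```python
import numpy as np

def experimental_effects(B, J):
    """Entry [u, j] is t(x_j ~> x_u || J) for j in J and u not in J.
    (Hypothetical helper: exact effects from a known B, no sampling.)"""
    B_t = B.copy()
    B_t[list(J), :] = 0.0                     # surgical intervention on J
    return np.linalg.inv(np.eye(B.shape[0]) - B_t)

# Assumed coefficients: a for x_1 -> x_2, b for x_2 -> x_3, c for x_1 -> x_3.
a, b, c = 0.5, 0.6, 0.3
B = np.zeros((3, 3))
B[1, 0] = a
B[2, 1] = b
B[2, 0] = c

# Only x_1 intervened: the total effect of x_1 on x_3, i.e. c + a*b.
total_13 = experimental_effects(B, J=[0])[2, 0]
# x_1 and x_2 intervened: the path through x_2 is broken, leaving c.
exp_13 = experimental_effects(B, J=[0, 1])[2, 0]
```

With both $x_1$ and $x_2$ randomized, the measured effect of $x_1$ on $x_3$ is only the direct coefficient, which is why such an experiment distinguishes the two graphs of Figure 1.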
Additionally, if available, a set of passive observational data (for which $J_k = \emptyset$) can be used to estimate the (passive observational) covariance matrix of the observed variables: $\Sigma_x = E\{xx^T\}$.
In the special case when the variables can be ordered in such a way that the matrix $B$ is strictly lower-triangular (corresponding to a directed acyclic graph) the interpretation of the model is straightforward. This case is known as a `recursive SEM' in the literature and corresponds to a causal Bayes network with linear relationships over continuous variables.
$^1$ We here assume that the parametrization $(B, \Sigma_e)$ of the original model does not change as a result of the soft intervention. One could imagine other soft interventions where this is not the case.
The interpretation of models which contain directed cycles (i.e. are `non-recursive') requires more care. Such cases arise from feedback phenomena and represent systems that equilibrate.$^2$ To ensure that an equilibrium exists the absolute values of the eigenvalues of $B$ must all be less than 1. This condition must also be satisfied for the manipulated matrix $\tilde{B}$ in all possible experiments (both hypothetical and actual).$^3$ While self-loops ($[B]_{ii} \neq 0$ for any $i$) are permitted in the true model, the estimated model will not represent them explicitly; instead the effect of the self-loop will be included in the incoming edges to the variable in question.$^4$ A self-loop leaves undetermined only the path and the speed of convergence to the equilibrium, not the equilibrium point itself. Since all of our measurements are presumed to come from the equilibrium, this underdetermination is not only natural, but inevitable. Hence, we assume in the following that $B$ has been standardized to have a diagonal of zeros.
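The stability condition and the absorption of self-loops into incoming edges can be checked with a short sketch. The function names are hypothetical, and the rescaling of the disturbances by $1/(1 - b_{ii})$ is an algebraic consequence worked out here (solve row $i$ of $x = Bx + e$ for $x_i$), not a step spelled out in the text.

```python
import numpy as np

def is_stable(B):
    """True if every eigenvalue of B has absolute value below 1."""
    return bool(np.max(np.abs(np.linalg.eigvals(B))) < 1.0)

def remove_self_loops(B):
    """Standardize B to a zero diagonal (hypothetical helper).
    Row i is divided by (1 - b_ii); assumes b_ii != 1."""
    d = np.diag(B).copy()
    scale = 1.0 / (1.0 - d)
    B0 = B - np.diag(d)                 # off-diagonal part of B
    return scale[:, None] * B0, scale

# x_2 has an incoming edge a from x_1 and a self-loop b.
a, b = 0.8, 0.5
B = np.array([[0.0, 0.0],
              [a,   b]])
B_std, scale = remove_self_loops(B)

# Equilibrium map from disturbances to x, before and after standardizing.
I = np.eye(2)
A_full = np.linalg.inv(I - B)
A_std = np.linalg.inv(I - B_std) @ np.diag(scale)
```

Here the standardized incoming coefficient equals $a/(1-b)$, and the two equilibrium maps agree once the disturbance of $x_2$ is rescaled by the same factor.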
Finally, we note that our representation of latent variables is only implicit (in correlations among the components of $e$) rather than explicit. Thus, while the true model may contain some latent variables that simultaneously confound more than two variables, these will be rendered as a set of non-zero entries in the estimated covariance matrix $\Sigma_e$, connecting any pair of variables that are confounded by the hidden variables.
3 PROCEDURE
The experiments only directly supply measures of the experimental effects, but our main interest lies in the direct effects represented by the matrix $B$ in the model of Section 2. Our procedure for deriving the direct effects essentially consists of two steps: The experimental effects are combined to infer total effects, and the total effects are subsequently used to derive the direct effects. We first describe the latter step.
$^2$ Our procedure can handle both stochastic and deterministic equilibria for cyclic models, but the details are beyond the scope of this paper.
$^3$ In order to avoid special cases, in which identifiability depends on the experiments performed and the particular parameterization, we use this slightly stronger assumption than is needed, including all possible experiments.
$^4$ If variable $x_i$ has an incoming edge with edge-coefficient $a$ and a self-loop with edge-coefficient $b$ in the true model $B$, then the estimated model will not return the self-loop but instead return the edge coefficient on the incoming edge as $a/(1 - b)$.
3.1 TOTAL TO DIRECT EFFECTS
An experiment $E_k$ consisting of a single surgical intervention (i.e. $J_k = \{x_j\}$) directly supplies the total effect of that variable ($x_j$) on all the other measured variables. (Note that the presence of latent variables does not affect the total effect of a variable that is subject to a surgical intervention.$^5$) If we have such an experiment for each measured variable, then all total effects can be estimated and represented by a total effects matrix $T$, where $[T]_{ji} = t(x_i \rightsquigarrow x_j)$ and we define $t(x_i \rightsquigarrow x_i) = 1$ for all $i$.
By solving equation (1) for $x$ as a function of $e$, we obtain $x = Ae$ with $A = (I - B)^{-1}$, where $I$ refers to the $n \times n$ identity matrix.$^6$ It turns out that $T = AD$ where $D$ is a diagonal matrix that rescales $A$ to have a diagonal of ones. Thus, given an estimated total effects matrix $T$ we can infer the direct effects as
$$B = I - A^{-1} = I - DT^{-1}.$$
Because $B$ is standardized to have a diagonal of zeros, it follows that $D$ rescales $T^{-1}$ to have a diagonal of ones. Furthermore, given $A$ and the covariance matrix $\Sigma_e$ over the disturbances, $\Sigma_x$ is determined by
$$\Sigma_x = E\{xx^T\} = E\{Aee^TA^T\} = A\Sigma_eA^T.$$
Given passive observational data from which we can estimate $\Sigma_x$, we can now infer the sought covariance as $\Sigma_e = A^{-1}\Sigma_xA^{-T}$ to determine the latent variables.
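The step from total to direct effects can be sketched in a few lines of NumPy. This is an illustrative sketch under the formulas above, with hypothetical function names and an assumed 3-variable model; it works with exact quantities, so it says nothing about estimation error.

```python
import numpy as np

def total_effects(B):
    """T = A D: divide column i of A = (I - B)^{-1} by A_ii."""
    A = np.linalg.inv(np.eye(B.shape[0]) - B)
    return A / np.diag(A)

def direct_effects(T):
    """B = I - D T^{-1}, with D chosen so that D T^{-1} has unit diagonal."""
    T_inv = np.linalg.inv(T)
    D = np.diag(1.0 / np.diag(T_inv))
    return np.eye(T.shape[0]) - D @ T_inv

def disturbance_covariance(B, Sigma_x):
    """Sigma_e = A^{-1} Sigma_x A^{-T}, using A^{-1} = I - B."""
    A_inv = np.eye(B.shape[0]) - B
    return A_inv @ Sigma_x @ A_inv.T

# Hypothetical cyclic model and disturbance covariance.
B = np.array([[0.0, 0.0, 0.4],
              [0.5, 0.0, 0.0],
              [0.3, 0.6, 0.0]])
Sigma_e = np.array([[1.0, 0.2, 0.0],
                    [0.2, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])

T = total_effects(B)
B_hat = direct_effects(T)                       # round trip back to B

A = np.linalg.inv(np.eye(3) - B)
Sigma_x = A @ Sigma_e @ A.T                     # passive observational covariance
Sigma_e_hat = disturbance_covariance(B_hat, Sigma_x)
```

Non-zero off-diagonal entries of `Sigma_e_hat` then mark pairs of variables that are confounded by a latent, here $x_1$ and $x_2$.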
3.2 EXPERIMENTAL TO TOTAL EFFECTS
Experiments intervening on more than one variable enable search strategies that are more efficient (in the number of experiments and in total sample size) than single intervention experiments (see Section 5, Fig. 2). In such multi-intervention experiments the experimental effects do not necessarily correspond to the total effects: Any path of the form $x_j \to \ldots \to x_i \to \ldots \to x_u$ is broken if $x_i$ is surgically manipulated. Nevertheless, the experimental effects provide partial information on the total effects in the form of linear constraints: In an experiment $E_k = (J_k, U_k)$ with $x_j \in J_k$ and $x_u \in U_k$ the total effect $[T]_{uj} = t(x_j \rightsquigarrow x_u)$ from $x_j$ to $x_u$ can be represented as
$$[T]_{uj} = t(x_j \rightsquigarrow x_u) = \sum_{x_i \in J_k} t(x_j \rightsquigarrow x_i)\, t(x_i \rightsquigarrow x_u \| J_k) \tag{2}$$
where we define $t(x_j \rightsquigarrow x_j) = 1$. That is, the total effect of $x_j$ on $x_u$ can be redescribed as a decomposition of the total effects of $x_j$ on each variable $x_i$ in the intervention set and the experimental effects of the $x_i$ on the passively observed variable $x_u$. In the experiment only the experimental effects $t(x_i \rightsquigarrow x_u \| J_k)$ can be measured, but given the intervention set $J_k$ we can specify for each experiment $q_k = |J_k| \cdot |U_k|$ linear constraints of the form of equation (2) on the total effects. Since there are $n^2 - n$ total effects for $n$ measured variables, several experiments will be required to obtain a sufficient number of constraints.
$^5$ In the case of soft interventions, techniques used for instrumental variables can be applied with the same result.
$^6$ Recall that for stability of the linear system we have to require that the absolute values of all eigenvalues of $B$ are smaller than 1. Thus for each $\lambda \in \mathrm{eigenvalues}(B)$ we have $1 - \lambda \in \mathrm{eigenvalues}(I - B)$, so the stability condition ensures that no eigenvalue of $I - B$ is zero and the matrix is always invertible.
We can now combine the set of $q = \sum_k q_k$ linear constraints (from all available experiments) into
$$Ht = h \tag{3}$$
where the $(q \times (n^2 - n))$-matrix $H$ and the vector $h$ of $q$ scalars contain the measured experimental effects, and $t$ is the $(n^2 - n)$-vector of desired total effects that can be re-arranged into the matrix $T$.
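As a small worked instance of equations (2) and (3), assume $n = 3$ and a single experiment with $J_k = \{x_1, x_2\}$ and $U_k = \{x_3\}$. The shorthand $\varepsilon_{j3}$ for the measured experimental effects and the chosen ordering of the entries of $t$ are conventions introduced here for illustration only.

```latex
% Shorthand (introduced here): \varepsilon_{j3} = t(x_j \rightsquigarrow x_3 \| \{x_1, x_2\})
% Equation (2) for the pairs (x_1, x_3) and (x_2, x_3), with the unknowns moved to the left:
\begin{align*}
t(x_1 \rightsquigarrow x_3) - \varepsilon_{23}\, t(x_1 \rightsquigarrow x_2) &= \varepsilon_{13}, \\
t(x_2 \rightsquigarrow x_3) - \varepsilon_{13}\, t(x_2 \rightsquigarrow x_1) &= \varepsilon_{23}.
\end{align*}
% With t = (t_{12}, t_{13}, t_{21}, t_{23}, t_{31}, t_{32})^T and t_{ju} = t(x_j \rightsquigarrow x_u):
\[
H = \begin{pmatrix}
-\varepsilon_{23} & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -\varepsilon_{13} & 1 & 0 & 0
\end{pmatrix},
\qquad
h = \begin{pmatrix} \varepsilon_{13} \\ \varepsilon_{23} \end{pmatrix}.
\]
```

This experiment contributes $q_k = |J_k| \cdot |U_k| = 2$ rows, against $n^2 - n = 6$ unknown total effects, so further experiments are needed before $H$ can reach full rank.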
In principle, any number of constraints can be represented in this equation: constraints that are linearly dependent (by coincidence or because two sets of measurements, possibly from different experiments, constrain the same total effects and agree), constraints that conflict (because two sets of measurements constrain the same total effects, but do not agree), or added constraints that represent background knowledge (e.g. $x_j$ is not an ancestor of $x_i$: $t(x_j \rightsquigarrow x_i) = 0$, or more generally that the total effect of $x_j$ on $x_i$ is equal to some scalar $c$: $t(x_j \rightsquigarrow x_i) = c$).
In most cases $H$ is neither square nor full rank, and hence equation (3) is not straightforwardly solvable for $t$. When there are no conflicting constraints and the rank of $H$ is less than $n^2 - n$ then the system is underdetermined, so there are several directed graphs that satisfy the experimental constraints. If on the other hand the rank of $H$ is $n^2 - n$ then there is either a unique causal structure that satisfies all the constraints, or, if there are conflicting constraints, there is no such model.
In principle, there are several ways to proceed. One could select non-conflicting constraints until full rank is achieved and proceed to the unique solution ignoring the other constraints, thereby avoiding (denying) conflict and remaining agnostic in the underdetermined case. We proceed differently: We use the Moore-Penrose pseudoinverse $H^\dagger$ to invert $H$ using whatever constraints are available. In the underdetermined case the pseudoinverse returns the solution that minimizes the $\ell_2$-norm of the total effects; in the square invertible case we obtain the unique solution, and in the overdetermined case we obtain the solution that minimizes the sum of the squared errors of the constraints. We are thus able to provide an `online' algorithm (Program 1), which can integrate background knowledge and, given the constraints, does as well as possible with regard to a well-defined measure.
Program 1 Algorithm pseudocode. A full implementation of this procedure is available at:
http://www.cs.helsinki.fi/u/phoyer/code/LLC/
Experiments, LLC: (linear, latents, cyclic)
Given $m$ datasets $D_k$ from a sequence of experiments $(E_k)_{k=1,\ldots,m}$ and, if available, also a dataset $D_o$ of passive observational data over a set of measured variables:
    For each experiment $E_k = (J_k, U_k)$ with dataset $D_k$
        For each ordered pair of variables $(x_j, x_u)$ with $x_j \in J_k$ and $x_u \in U_k$
            Determine from $D_k$ the constraint $t(x_j \rightsquigarrow x_u) = \sum_{x_i \in J_k} t(x_j \rightsquigarrow x_i)\, t(x_i \rightsquigarrow x_u \| J_k)$.
            Re-order the constraint and concatenate it to the matrix equation $Ht = h$.
    Check $H$ and $h$ and report rank and the existence of conflicts, if any, to determine whether the system is underconstrained, conflicted or exactly determined.
    Compute $t = H^\dagger h$, and determine the total effects matrix $T$ by reordering $t$ into matrix form (note that the diagonal of $T$ consists of 1's).
    Determine the direct effects matrix $B = I - DT^{-1}$, where $I$ is the $n \times n$ identity matrix.
    If passive observational dataset $D_o$ is available then
        Estimate the passive observational covariance matrix $\Sigma_x = E\{xx^T\}$ from dataset $D_o$.
        Infer the error covariance matrix $\Sigma_e = A^{-1}\Sigma_xA^{-T}$.
    Return the estimated $B$ (and $\Sigma_e$ if available).
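The core of Program 1 can be sketched as follows. This is a minimal illustration and not the authors' released implementation: the function names, the input format (intervention index lists paired with matrices of experimental effects) and the test model are assumptions, the effects are computed exactly from a known model instead of being estimated from datasets, and conflict reporting and the $\Sigma_e$ step are omitted.

```python
import numpy as np

def experimental_effects(B, J):
    """Exact t(x_j ~> x_u || J) at entry [u, j], for j in J and u not in J."""
    B_t = B.copy()
    B_t[list(J), :] = 0.0                         # surgical intervention on J
    return np.linalg.inv(np.eye(B.shape[0]) - B_t)

def llc(n, experiments):
    """experiments: list of (J, E), J a list of intervened indices and
    E[u, i] the experimental effect of x_i on x_u in that experiment."""
    pairs = [(j, u) for j in range(n) for u in range(n) if j != u]
    col = {p: k for k, p in enumerate(pairs)}     # position of t(x_j ~> x_u) in t
    rows, rhs = [], []
    for J, E in experiments:
        U = [u for u in range(n) if u not in J]
        for j in J:
            for u in U:
                # Equation (2), unknown total effects moved to the left-hand side.
                row = np.zeros(len(pairs))
                row[col[(j, u)]] = 1.0
                for i in J:
                    if i != j:
                        row[col[(j, i)]] = -E[u, i]
                rows.append(row)
                rhs.append(E[u, j])
    H, h = np.array(rows), np.array(rhs)
    t = np.linalg.pinv(H) @ h                     # t = H^+ h
    T = np.eye(n)                                 # diagonal of T is all ones
    for (j, u), k in col.items():
        T[u, j] = t[k]
    T_inv = np.linalg.inv(T)
    B = np.eye(n) - T_inv / np.diag(T_inv)[:, None]   # B = I - D T^{-1}
    return B, T, np.linalg.matrix_rank(H)

# Hypothetical cyclic model; three experiments that each intervene on two variables.
B_true = np.array([[0.0, 0.0, 0.4],
                   [0.5, 0.0, 0.0],
                   [0.3, 0.6, 0.0]])
experiments = [(J, experimental_effects(B_true, J))
               for J in ([0, 1], [1, 2], [0, 2])]
B_hat, T_hat, rank = llc(3, experiments)
```

With noise-free effects and these three experiments, $H$ is square and of full rank $n^2 - n = 6$, so the pseudoinverse reduces to the ordinary inverse and the true direct effects are recovered. Background knowledge could be added in this sketch as extra rows of `H` and entries of `h`.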
4 IDENTIFIABILITY
Ideally, the constraints obtained from the experiments would determine the total effects matrix $T$ uniquely. However, this will almost never be the case, since most combinations of experiments provide either too few or too many constraints, and in the latter case conflicts will be common. Hence the crucial question for the scientist is how the identifiability of the causal structure is related to the experiments. Given a set of measured variables $V$, the following condition enables exactly this connection between identifiability and experimental set-up.
Definition 1 Given experiments $(E_k)_{k=1,\ldots,m}$ and an ordered pair of variables $(x_i, x_j) \in V \times V$ with $i \neq j$ we say that the pair condition is satisfied for this variable pair w.r.t. the experiments whenever there is a $k$ such that $x_i \in J_k$ and $x_j \in U_k$ for some $E_k = (J_k, U_k)$.
Theorem 1 Given the experimental effects from the experiments $(E_k)_{k=1,\ldots,m}$ over the variables in $V$, all total effects $t(x_i \rightsquigarrow x_j)$ are identified if and only if the pair condition is satisfied for all ordered pairs of variables w.r.t. these experiments.$^7$
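Checking the pair condition of Definition 1 is a purely combinatorial test on the intervention sets. A minimal sketch follows, with variables represented by the indices $0, \ldots, n-1$; the function name `unsatisfied_pairs` is an assumption made here.

```python
from itertools import permutations

def unsatisfied_pairs(n, intervention_sets):
    """Ordered pairs (i, j) for which no experiment has x_i intervened on
    and x_j passively observed. An empty result means the pair condition
    holds for all ordered pairs."""
    missing = set(permutations(range(n), 2))
    for J in intervention_sets:
        J = set(J)
        U = set(range(n)) - J
        missing -= {(i, j) for i in J for j in U}
    return missing

# Three single-intervention experiments on three variables.
singles = unsatisfied_pairs(3, [{0}, {1}, {2}])
# One experiment intervening on x_1 and x_2 only.
one_double = unsatisfied_pairs(3, [{0, 1}])
```

By Theorem 1, the first design identifies all total effects, while the second leaves four of the six ordered pairs unsatisfied.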
Sketch of proof of necessity: Suppose the pair condition is not satisfied for an ordered pair of variables $(x_i, x_j)$. Then for any experiment $E_k = (J_k, U_k)$, either (i) $x_i, x_j \in J_k$ or (ii) $x_i, x_j \in U_k$ or (iii) $x_i \in U_k$ and $x_j \in J_k$. Consider $t(x_i \rightsquigarrow x_j)$. Experiments of type (iii) place no constraint on $t(x_i \rightsquigarrow x_j)$. Experiments of type (i) provide for each $w \in U_k$ a constraint of the form $t(x_i \rightsquigarrow w) = t(x_i \rightsquigarrow x_j)\, t(x_j \rightsquigarrow w \| J_k) + \mathrm{const}$, but since $t(x_i \rightsquigarrow w)$ is unknown, these constraints are insufficient to determine $t(x_i \rightsquigarrow x_j)$. The only relevant constraints from experiments of type (ii) marginalize over $x_i$, making it impossible to separate the effect of some $w \in J_k$ via $x_i$ from the more direct effects:
$$t(w \rightsquigarrow x_j \| J_k) = t(w \rightsquigarrow x_i \| J_k)\, t(x_i \rightsquigarrow x_j \| J_k \cup \{x_i\}) + t(w \rightsquigarrow x_j \| J_k \cup \{x_i\})$$
Since there will be a constraint of this type for each $w$, $t(x_i \rightsquigarrow x_j \| J_k \cup \{x_i\})$ is undetermined. But it is needed to determine the total effect of $x_i$ on $x_j$ in terms of what is known, since:
$$t(x_i \rightsquigarrow x_j) = t(x_i \rightsquigarrow x_j \| J_k \cup \{x_i\}) + \sum_{x_k \in J_k} t(x_i \rightsquigarrow x_k)\, t(x_k \rightsquigarrow x_j \| J_k \cup \{x_i\})$$
Hence the total effect $t(x_i \rightsquigarrow x_j)$ is underdetermined, so the total effects matrix $T$ is not fully determined.
Sketch of proof of sufficiency: Since the pair condition is satisfied for each ordered pair of variables $(x_i, x_j)$, we can select for each total effect $t(x_i \rightsquigarrow x_j)$ one such constraint for our matrix equation $Ht = h$.
Let $\tilde{H}$ be the matrix of $n^2 - n$ constraints on the total effects obtained when for each variable there is an experiment intervening on all but that variable. Each constraint will have at most $n - 1$ non-zero entries in $\tilde{H}$. These entries will correspond to direct effects of an intervened variable on the target variable. Appropriately re-ordering the rows, $\tilde{H}$ can be organized into a block-diagonal matrix, where each block on the diagonal corresponds to a submatrix of $I - B$. One such block is shown in Table 1. Given the stability assumption on $B$ (see Section 2), any such block, and consequently $\tilde{H}$, is full rank. In cases where the pair condition is satisfied, but the experiments did not intervene on all but one variable, at least one of the off-diagonal elements in such a block of $H$ must be zero. Leaving a variable out of the intervention set corresponds in the measurement of the constraint to a marginalization over that variable. It can be shown that marginalization corresponds to elementary row operations on the corresponding block of $\tilde{H}$. Further, since satisfaction of the pair condition ensures that each row of $H$ consists of a constraint corresponding to a different marginalization, it can be shown that any such $H$ can be obtained by row operations from the $\tilde{H}$ matrix. Since $\tilde{H}$ is full rank, and since row operations preserve rank, any such $H$ is full rank, and hence invertible.
Table 1: One of the blocks in $\tilde{H}$. The $b(x_i \to x_j)$ are entries of the direct effects matrix $B$.
$$\begin{array}{l|ccc}
J & t(x_1 \rightsquigarrow x_2) & t(x_1 \rightsquigarrow x_3) & t(x_1 \rightsquigarrow x_4) \\ \hline
V \setminus \{x_2\} & 1 & -b(x_3 \to x_2) & -b(x_4 \to x_2) \\
V \setminus \{x_3\} & -b(x_2 \to x_3) & 1 & -b(x_4 \to x_3) \\
V \setminus \{x_4\} & -b(x_2 \to x_4) & -b(x_3 \to x_4) & 1
\end{array}$$
$^7$ Full proofs supplied upon request. Due to space constraints we here give a proof sketch only.
For an underdetermined system, the theorem implies that there are ordered pairs of variables that do not satisfy the pair condition. One way to proceed is to select the next intervention set to maximally reduce the number of such pairs. Using brute force, this would require checking $\sum_{i=1}^{k} \binom{n}{i}$ possible experiments for $n$ variables and maximum intervention set size $k \le n/2$. However, faster heuristics are available or can be adapted (Eberhardt 2008). If feasible, an experiment that intervenes on $n/2$ variables, none of which have yet been subject to intervention, supplies the maximum number of additional pair constraints. Selecting the next best experiment is obviously greedy, and if this procedure is used to select a whole sequence of experiments starting with no prior knowledge, it will select a sequence of $2\log_2(n)$ experiments intervening on a different set of $n/2$ variables in each experiment. This sequence of experiments is sufficient for identifiability, but is known to be suboptimal for most $n$.
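The brute-force greedy selection described above can be sketched directly. The function name `greedy_experiments` and the tie-breaking rule (the first candidate in enumeration order wins) are assumptions of this sketch; it is meant for small $n$ only, since it enumerates all intervention sets up to the size limit at every step.

```python
from itertools import combinations, permutations

def greedy_experiments(n, max_size=None):
    """Repeatedly pick the intervention set (size <= max_size, default n // 2)
    that satisfies the pair condition for the most still-unsatisfied pairs."""
    max_size = max_size or n // 2
    missing = set(permutations(range(n), 2))
    chosen = []
    while missing:
        best, best_gain = None, 0
        for size in range(1, max_size + 1):
            for J in combinations(range(n), size):
                U = [u for u in range(n) if u not in J]
                gain = sum((i, j) in missing for i in J for j in U)
                if gain > best_gain:          # strict: first best candidate wins
                    best, best_gain = J, gain
        chosen.append(best)
        missing -= {(i, j) for i in best for j in range(n) if j not in best}
    return chosen

chosen = greedy_experiments(4)
```

The loop always terminates, because a single intervention on $x_i$ satisfies any remaining pair $(x_i, x_j)$, so some candidate has positive gain as long as pairs are missing.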
Instead of attempts at further resolution one might be interested in a characterization of the underdetermination. Given that the system is linear, the solution space for the total effects is easily specified, so further numerical or analytical routines can be used to specify the implications of the underdetermination for the direct effects matrix $B$. Although analytical procedures could be devised, a naive numerical heuristic would sample total effects from the solution space, and invert $T$ repeatedly to explore the variation in $B$.
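One way to realize the naive numerical heuristic is sketched below. It is an assumption-laden illustration: the solution space is parametrized as the minimum-norm solution plus a combination of null-space vectors of $H$ obtained from an SVD, the Gaussian sampling distribution and its `scale` are arbitrary choices, and the example constraints use made-up experimental effects for a single experiment on three variables.

```python
import numpy as np

def sample_direct_effects(H, h, n, n_samples=100, scale=0.1, seed=0):
    """Sample total-effect vectors t with H t = h and map each to a B."""
    rng = np.random.default_rng(seed)
    t0 = np.linalg.pinv(H) @ h                    # minimum-norm solution
    _, s, Vt = np.linalg.svd(H)
    rank = int(np.sum(s > 1e-10))
    N = Vt[rank:].T                               # columns span the null space of H
    pairs = [(j, u) for j in range(n) for u in range(n) if j != u]
    ts, Bs = [], []
    for _ in range(n_samples):
        t = t0 + N @ (scale * rng.standard_normal(N.shape[1]))
        T = np.eye(n)
        for k, (j, u) in enumerate(pairs):
            T[u, j] = t[k]                        # [T]_{uj} = t(x_j ~> x_u)
        T_inv = np.linalg.inv(T)
        Bs.append(np.eye(n) - T_inv / np.diag(T_inv)[:, None])
        ts.append(t)
    return np.array(ts), np.array(Bs)

# Two constraints from one experiment with J = {x_1, x_2}, U = {x_3};
# the effect values 0.3 and 0.6 are hypothetical.
H = np.array([[-0.6, 1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, -0.3, 1.0, 0.0, 0.0]])
h = np.array([0.3, 0.6])
ts, Bs = sample_direct_effects(H, h, n=3, n_samples=5)
spread = Bs.std(axis=0)                           # entrywise variation in B
```

Entries of `spread` that stay near zero across samples would suggest direct effects that the available constraints already pin down, while large entries point to coefficients that remain underdetermined.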
5 SIMULATIONS
Several factors will affect the accuracy of our estimation procedure. The number of samples used in each experimental condition will naturally be an important consideration. Another is the number of constraints obtained from the set of experiments, which depends not only on the number of experiments but also on how many variables are intervened on per experiment. Finally, the relative efficiency of soft vs. surgical interventions is unclear. It is thus not obvious what the best experimental strategy is in any given practical situation.
To investigate the issue empirically, we generated 100 random cyclic models over 8 variables, each with 4 latents and possible self-loops, and sampled experimental data from these models for a variety of strategies.
First, we considered surgical interventions only. We compared the strategy of intervening on each variable separately (8 experiments to satisfy the pair condition) to intervening on pairs of variables (also requiring 8 experiments) to intervening on four variables at a time (requiring only $2\log_2 8 = 6$ experiments), with a variety of sample sizes. Figure 2 (top row) shows the accuracy measured as the mean correlation between true and estimated values across the 100 graphs as a function of the total number of samples (divided evenly over the number of experiments) used for each strategy. The main result is that experiments involving interventions on a number of variables are more effective than experiments of single interventions, reflecting the larger number of constraints obtained per experiment.
Next, we investigated the efficiency of soft interventions, performing identical simulations to those given above but replacing surgical interventions with soft interventions (bottom row in Figure 2). Here, we additionally test the strategy of performing a single experiment intervening on all variables simultaneously. (Surgically intervening on all variables simultaneously does not provide any information, but softly intervening does.) Results are comparable to the surgical interventions case, except that softly intervening on all variables is a superior strategy in terms of the accuracy as a function of the total number of samples used.
6 FLOW CYTOMETRY DATA
To demonstrate how our algorithm might be used in practice we show its application to the flow cytometry data of Sachs et al. (2005). These data consist of single cell recordings of the abundance of 11 different phosphoproteins and phospholipids under a number of experimental conditions in human T-cells. In each condition, measurements were obtained from some 700–900 single cells. These data have been previously an-
Table 2: Estimated direct effects from the Sachs et al. data, as estimated from (a) the experiments with a background condition of α-CD3/28; and (b) with a background condition of α-CD3/28 + ICAM-2. The tables are read from column to row so, for instance, the direct effect of PKC on PIP2 is -0.90 and -0.61 in the two settings. Errors given by a bootstrap analysis were small compared to the size of the coefficients.
(a)
          Akt      PKC      PIP2     Mek
Akt                -0.91    -0.22     0.22
PKC      -0.15              -0.09     0.48
PIP2     -0.43     -0.90              0.40
Mek      -0.27     -1.69    -0.22
(b)
          Akt      PKC      PIP2     Mek
Akt                -1.34    -0.24    -0.26
PKC       0.13              -0.10    -0.08
PIP2      0.09     -0.61             -0.01
Mek       0.10     -0.93    -0.11
Given the total effects among the four intervened-upon molecules, we derived the corresponding direct effects among them. Note that these direct effects are direct only relative to these four molecules; the effects are presumably mediated by the other (both measured and unmeasured) molecules in the signaling system. The result is shown in Table 2a. The most pronounced result is that PKC has strong negative connections to the other three variables. Similarly, PIP2 and Akt have weak negative effects, and Mek weak positive direct effects on the other variables.
While it is obvious that linearity is a very strong (and
indeed quite unreasonable) assumption for this data,
we nevertheless hope that we can use it as a first and
very rough approximation. Fortunately, the dataset
contains some additional data that can be used to at
least partly corroborate the results from our analysis:
In addition to the α-CD3/28 background condition,
there are similar experimental data in which a reagent
called ICAM-2 was added to the background, both
with and without the inhibitory targeted interventions.
Using this additional data, we were able to confirm the
strong inhibitory direct effect (with respect to these
four variables) of PKC on the other three molecules,
as shown in Table 2b. Similarly, PIP2 still has a weak
negative effect on the other variables. On the other
hand, in this additional data the effect of Akt and of
Mek on the other variables is weak and of opposite
sign to the results in (a), indicating that these connections
should be treated as unreliable at best. Thus, in
this respect, the model fits the `ground truth' graph
of Sachs et al. (2005) relatively well, as there were no
directed paths from these variables to the other
variables among the four in their graph.
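The comparison between the two background conditions can be reproduced directly from the numbers in Table 2. The sketch below is only an illustration of that check: the coefficients are copied from the table (rows are targets, columns are sources, diagonal left as NaN), while the function name and the criterion "all outgoing effects keep their sign" are assumptions made here for the example.

```python
import numpy as np

names = ["Akt", "PKC", "PIP2", "Mek"]
nan = np.nan

# Table 2a: background condition alpha-CD3/28.
table_a = np.array([
    [nan, -0.91, -0.22, 0.22],
    [-0.15, nan, -0.09, 0.48],
    [-0.43, -0.90, nan, 0.40],
    [-0.27, -1.69, -0.22, nan],
])

# Table 2b: background condition alpha-CD3/28 + ICAM-2.
table_b = np.array([
    [nan, -1.34, -0.24, -0.26],
    [0.13, nan, -0.10, -0.08],
    [0.09, -0.61, nan, -0.01],
    [0.10, -0.93, -0.11, nan],
])


def sign_agreement_by_source(a, b, labels):
    """For each source (column), report whether all its effects keep their sign."""
    result = {}
    for col, label in enumerate(labels):
        rows = [r for r in range(a.shape[0]) if r != col]  # skip the diagonal
        result[label] = bool(np.all(np.sign(a[rows, col]) == np.sign(b[rows, col])))
    return result


agreement = sign_agreement_by_source(table_a, table_b, names)
print(agreement)
```

The effects of PKC and PIP2 keep their sign across the two conditions, whereas those of Akt and Mek flip, which matches the reading given in the text.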
7 CONCLUSIONS
We have described (and provided a software package
for) a general algorithm for discovery of linear causal
models using experimental data. The algorithm can
handle cyclic structures and discover the presence and
location of latent variables. It is `online' in that it
works with any set of experimental data, and indicates
whether the system is underdetermined, overdetermined
or exactly determined. The method of integrating
data from different experiments resolves conflicts
in a way that is optimal with regard to a specified
measure. In an underdetermined system the algorithm
provides a minimally satisficing result, and we have
suggested how to proceed by selecting more experiments,
or by characterizing the underdetermination.
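To make the three cases concrete, the following sketch classifies a generic linear system K b = k, where each row stands for one constraint on the unknown coefficients b, and returns a least-squares solution. It is an illustration under stated assumptions, not the algorithm's actual implementation: the function name is hypothetical, squared error is assumed as the measure for resolving conflicts, and the minimum-norm solution returned by NumPy in the underdetermined case is just one possible choice.

```python
import numpy as np


def classify_system(K, k):
    """Classify K @ b = k by rank and return a least-squares solution for b."""
    n_equations, n_unknowns = K.shape
    rank = np.linalg.matrix_rank(K)
    if rank < n_unknowns:
        status = "underdetermined"      # some coefficients are not pinned down
    elif n_equations > n_unknowns:
        status = "overdetermined"       # more constraints than unknowns
    else:
        status = "exactly determined"
    # lstsq minimizes the squared error; if K is rank deficient it returns
    # the minimum-norm solution among all minimizers.
    b, *_ = np.linalg.lstsq(K, k, rcond=None)
    return status, b


# Two conflicting measurements of a single coefficient are averaged.
status, b = classify_system(np.array([[1.0], [1.0]]), np.array([1.0, 3.0]))
print(status, b)
```

In the underdetermined case the rank deficiency also indicates that further experiments are needed before all coefficients are identified.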
It should be noted that in addition to the generality of
structures we consider, we do not assume faithfulness.
This results in the (admittedly demanding) identifiability
condition we have shown to be necessary and
sufficient. Similarly, if constraints other than those on
the experimental effects are considered, model search
can be made more efficient.
The current version of the algorithm relies on the
assumption of linearity. While no corresponding solution
can be supplied for general discrete networks, we are
hopeful that a similar procedure for particular types of
discrete parameterizations (e.g. noisy-or) is possible.
References
Cooper, G. and C. Yoo (1999). Causal discovery from a mixture of experimental and observational data. In UAI '99, pp. 116–125.
Eaton, D. and K. Murphy (2007). Exact Bayesian structure learning from uncertain interventions. In AISTATS '07.
Eberhardt, F. (2008). Almost optimal intervention sets for causal discovery. In UAI '08.
Ellis, B. and W. Wong (2008). Learning causal Bayesian network structures from experimental data. J. Amer. Stat. Assoc. 103, 778–789.
He, Y. and Z. Geng (2008). Active learning of causal networks with intervention experiments and optimal designs. J. Mach. Learn. Res. 9, 2523–2547.
Murphy, K. P. (2001). Active learning of causal Bayes net structure. Technical report, U.C. Berkeley.
Nyberg, E. and K. Korb (2006). Informative interventions. In Causality and Probability in the Sciences. College Publications, London.
Pearl, J. (2000). Causality. Oxford University Press.
Richardson, T. (1996). Feedback Models: Interpretation and Discovery. Ph.D. thesis, Carnegie Mellon.
Sachs, K., O. Perez, D. Pe'er, D. Lauffenburger, and G. Nolan (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science 308 (5721), 523–529.
Schmidt, M. and K. Murphy (2009). Modeling discrete interventional data using directed cyclic graphical models. In UAI '09.
Tong, S. and D. Koller (2001). Active learning for structure in Bayesian networks. In UAI '01, pp. 863–869.


