Inferring Multiple Graphical Models
Available from ArXiv
Julien Chiquet, Yves Grandvalet, Christophe Ambroise.
Laboratoire Statistique et Génome
523, place des Terrasses de l’Agora
91000 Évry, FRANCE
e-mail:
[julien.chiquet,christophe.ambroise]@genopole.cnrs.fr; yves.grandvalet@utc.fr
url: http://stat.genopole.cnrs.fr
Abstract: Gaussian Graphical Models provide a convenient framework
for representing dependencies between variables. Recently, this tool has
attracted considerable interest for the discovery of biological networks. The
literature focuses on the case where a single network is inferred from a set
of measurements, but, as wetlab data is typically scarce, several assays,
where the experimental conditions affect interactions, are usually merged
to infer a single network. In this paper, we propose two approaches for
estimating multiple related graphs, by translating the closeness assumption
into an empirical prior or group penalties. We provide quantitative results
demonstrating the benefits of the proposed approaches.
Keywords and phrases: Network inference, Gaussian graphical model,
Multiple sample setup, Cooperative-LASSO, Intertwined-LASSO.
Contents
1 Motivations
2 Network Inference with GGM
3 Inferring Multiple GGMs
3.1 Intertwined Estimation
3.2 Graphical Cooperative-LASSO
4 Algorithms
4.1 Problem Decomposition
4.2 Solving the Sub-Problems
4.3 Implementation Details
5 Experiments
5.1 Synthetic data
5.2 Protein Signaling Network
6 Conclusion
A Proofs
A.1 Derivation of the pseudo-log-likelihood
A.2 Blockwise Optimization of the pseudo-log-likelihood
A.3 Derivation of the Subdifferential for the Cooperative-LASSO
References
arXiv:0912.4434v1 [stat.ME] 22 Dec 2009
1. Motivations
Systems biology provides a large number of data sets intended to help understand
the complex relationships between the molecular entities that drive any
biological process. Depending on the molecule of interest, various networks can
be inferred, e.g., gene-to-gene regulation networks or protein-protein interaction
networks. The basic idea is that if two molecules interact, a statistical
dependency between their expression should be observed.
A convenient model of multivariate dependence patterns is Gaussian Graphical
Modeling (GGM). In this framework, a multidimensional Gaussian variable
is characterized by the so-called concentration matrix, in which conditional
independence between a pair of variables corresponds to a zero entry. This
matrix may be represented by an undirected graph, where each vertex represents
a variable, and an edge connects two vertices if the corresponding pair of
random variables is dependent, conditional on the remaining variables.
Merging different experimental conditions from wetlab data is a common
practice in GGM-based inference methods (Toh and Horimoto 2002, Schäfer and
Strimmer 2005). This process enlarges the number of observations available for
inferring interactions. However, GGMs assume that the observed data form an
independent and identically distributed (i.i.d.) sample. In the aforementioned
paradigm, assuming that the merged data are drawn from a single Gaussian
component is clearly wrong, and is likely to have detrimental side effects on
the estimation process.
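One such side effect can be illustrated with a small numerical sketch. It is not taken from the paper: the two concentration matrices `K1` and `K2` below are hypothetical, and the two conditions are assumed to be zero-mean and pooled in equal proportions, so that the covariance targeted by a single Gaussian fit is the average of the two condition covariances. Under these assumptions, the pooled concentration matrix contains an edge that exists in neither condition.

```python
import numpy as np

def edges(K, tol=1e-8):
    """Edge set {(i, j), i < j} given by the non-zero off-diagonal entries of K."""
    p = K.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(K[i, j]) > tol}

# Hypothetical condition 1: only variables 0 and 1 are conditionally dependent.
K1 = np.array([[1.0, 0.5, 0.0],
               [0.5, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
# Hypothetical condition 2: only variables 1 and 2 are conditionally dependent.
K2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.5],
               [0.0, 0.5, 1.0]])

# Population covariance of an equal-weight pool of the two zero-mean conditions.
Sigma_pooled = 0.5 * (np.linalg.inv(K1) + np.linalg.inv(K2))
K_pooled = np.linalg.inv(Sigma_pooled)

print("condition 1:", edges(K1))
print("condition 2:", edges(K2))
print("pooled     :", edges(K_pooled))
```

The pooled graph links variables 0 and 2, although neither condition does: the inverse of an average covariance is not the average of the inverses.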
In this paper, we propose to remedy this problem by estimating multiple
GGMs, each of which matches a different modality of the same set of variables;
the modalities correspond here to the different experimental conditions. As the
distributions of these modalities have strong commonalities, we propose to estimate
these graphs jointly, in the multi-task framework (Caruana 1997). This line of attack
alleviates the difficulties arising from the scarcity of data in each experimental
condition by coupling the estimation problems. Our first proposal biases the
estimation of the concentration matrices towards a common value. Our second
proposal focuses on the similarities in the sparsity pattern, which are more
directly related to the graph itself. We propose the Cooperative-LASSO, which
builds on the Group-LASSO (Yuan and Lin 2006) to favor solutions with a
common sparsity pattern, but encodes a further preference towards solutions
with similar sign patterns, thus preserving the type of co-regulation (positive or
negative) across assays.
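The difference between a group penalty and a sign-aware one can be sketched on a single edge. The functions below are illustrative assumptions, not definitions quoted from this excerpt: `group_penalty` is the Euclidean norm of one edge's coefficients across conditions, and `sign_aware_penalty` is one possible penalty with the stated preference, obtained by penalizing the positive and negative parts separately.

```python
import numpy as np

def group_penalty(v):
    """Euclidean norm of one edge's coefficients across conditions."""
    return float(np.linalg.norm(v))

def sign_aware_penalty(v):
    """Sum of the norms of the positive and negative parts (illustrative assumption)."""
    pos = np.maximum(v, 0.0)   # positive coefficients, zero elsewhere
    neg = np.maximum(-v, 0.0)  # magnitudes of negative coefficients, zero elsewhere
    return float(np.linalg.norm(pos) + np.linalg.norm(neg))

# Coefficients of the same edge in two conditions (hypothetical values).
same_sign = np.array([3.0, 4.0])
mixed_sign = np.array([3.0, -4.0])

for name, v in [("same sign", same_sign), ("mixed sign", mixed_sign)]:
    print(name, group_penalty(v), sign_aware_penalty(v))
```

The group penalty cannot tell the two vectors apart, whereas the sign-aware penalty charges more when the signs disagree across conditions, since the norm of the whole vector never exceeds the sum of the norms of its positive and negative parts.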
To our knowledge, the present work is the first to exploit the multi-task
learning framework for learning GGMs. However, coupling the estimation of
several networks has recently been investigated for Markov random fields, in
the context of time-varying networks. Kolar et al. (to appear) propose two specific
constraints, one for smooth variations over time, the other for abrupt
changes. Their penalties are closer to the Fused-LASSO and total-variation
penalties than to the group penalties proposed here.
2. Network Inference with GGM
In the GGM framework, we aim to infer the graph of conditional dependencies
among the p variables of a vector X from independent observations
(X1, ..., Xn). We assume that X is a p-dimensional Gaussian random variable
$X \sim \mathcal{N}(0_p, \Sigma)$. Let $K = \Sigma^{-1}$ be the concentration matrix of the model; the
non-zero entries $K_{ij}$ of K indicate a conditional dependency between the variables
$X_i$ and $X_j$, and thus define the graph G of conditional dependencies of X.
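The correspondence between K and G can be sketched in a few lines. The example is a minimal illustration under stated assumptions: the matrix `K_true` is a hypothetical four-variable chain, the helper `graph_from_concentration` and its tolerance `tol` are names introduced here, and the true covariance is assumed known, so no estimation is involved.

```python
import numpy as np

def graph_from_concentration(K, tol=1e-8):
    """Edge set {(i, j), i < j} such that |K_ij| exceeds the tolerance."""
    p = K.shape[0]
    return {(i, j) for i in range(p) for j in range(i + 1, p) if abs(K[i, j]) > tol}

# Hypothetical concentration matrix encoding the chain 0 - 1 - 2 - 3.
K_true = np.array([
    [2.0, -1.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, -1.0, 2.0],
])

Sigma = np.linalg.inv(K_true)  # covariance: every pair is marginally correlated
K = np.linalg.inv(Sigma)       # concentration matrix recovered from Sigma

print("non-zero covariances:", int(np.sum(np.abs(Sigma) > 1e-8)))
print("edges of G:", sorted(graph_from_concentration(K)))
```

Although the covariance matrix is dense, the concentration matrix is sparse, and only its non-zero off-diagonal entries become edges of G.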
The GGM approach produces the graph G from an inferred K. The latter
cannot be obtained by maximum likelihood estimation that would typically


