
Weighted-LASSO for structured network inference from time course data.

by Camille Charbonnier, Julien Chiquet, Christophe Ambroise
Statistical Applications in Genetics and Molecular Biology (2010)

Abstract

We present a weighted-Lasso method to infer the parameters of a first-order vector auto-regressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to have a prior internal structure of connectivity which drives the inference method. This prior structure can be either derived from prior biological knowledge or inferred by the method itself. We illustrate the performance of this structure-based penalization both on synthetic data and on two canonical regulatory networks: first the yeast cell cycle regulation network, by analyzing Spellman et al.'s dataset, and second the E. coli S.O.S. DNA repair network, by analyzing U. Alon's lab data.

Available from arxiv.org
Laboratoire Statistique et Génome
523, place des Terrasses de l'Agora
91000 Évry, FRANCE
e-mail: julien.chiquet@genopole.cnrs.fr; camille.charbonnier; christophe.ambroise
url: http://stat.genopole.cnrs.fr
Keywords and phrases: Biological networks, Vector auto-regressive model,
Lasso.
1. Introduction
Over the past dozen years of statistical studies of microarrays for gene
expression profiling, conditional dependency has been recognized as an appropriate
statistical tool to model direct interactions between genes. Graph representations
suit such relationships between variables well. As a consequence, GGMs (Gaussian
Graphical Models) have been widely studied by statisticians, particularly
those looking for applications to the reconstruction of gene-to-gene regulation
networks (see e.g. Schäfer and Strimmer 2005, Meinshausen and Bühlmann
2006, Wille and Bühlmann 2006, Castelo and Roverato 2006, Drton and Perlman
2007, Shimamura et al. 2007). In the context of transcriptomic data, the main
statistical issue paradoxically lies in the scarcity of data: despite a shrinking
cost, microarrays still provide datasets that fall into the high-dimensional
setting. Namely, the number of variables (the p genes) remains greater than the
sample size n (the number of microarray slides).
In the Gaussian independent identically distributed (hereafter i.i.d.) setting,
each microarray experiment is considered as a realization of a Gaussian vector
whose dependency structure is fully determined by its covariance matrix. It can
be shown that non-null conditional dependencies between genes are described
by nonzero entries of the inverse of the covariance matrix (Dempster 1972).
Thus, inferring this matrix is equivalent to recovering the graph of interest,
which is not trivial when n is smaller than or of the order of p. To handle
the data scarcity, methods based upon the $\ell_1$-norm are very popular: they address
both regularization and variable selection by selecting the
most significant edges between genes in the network. In the i.i.d. setting,
$\ell_1$-penalized maximum likelihood Gaussian covariance estimation was first
investigated independently by Yuan and Lin (2007) and Banerjee et al. (2008).
These methods provide sparse graph estimates, sparsity being a characteristic
of gene-to-gene regulation networks.
arXiv:0910.1723v2 [stat.AP] 9 Dec 2009
Charbonnier, Chiquet, Ambroise/Structured Dynamic Regulation Network inference
Seeking to improve these methods with respect to the biological context,
we provided in Ambroise et al. (2009) a method that looks not only for
sparse solutions, but also for an internal structure of the network that drives
the inference. Indeed, biological networks, and particularly gene regulation
networks, are known to be not only sparse but also organized, so that nodes belong to
different classes of connectivity. Thus, we suggested a criterion that takes this
heterogeneity into account. This leads to better inference when networks are
highly structured. Note that Marlin et al. (2009) subsequently published an
independent paper providing a similar method in a Bayesian framework. In these
two papers, the internal structure considered relies on affiliation networks. That
is, genes are clustered into groups that share the same connectivity patterns.
This can be seen as the analogue of the group-Lasso (Yuan and Lin 2006)
applied to a graphical context.
Finally, some authors (e.g. Opgen-Rhein and Strimmer 2007, Lèbre 2009,
Shimamura et al. 2009) underlined that transcriptomic datasets are not i.i.d.
when considering time course expression data. Assuming a first-order vector
auto-regressive (VAR1) model for the time course data generation, they
provided inference methods handling high-dimensional settings: Opgen-Rhein and
Strimmer suggested a shrinkage estimate, while Lèbre performed statistical tests
on limited-order partial correlations to select significant edges. In a recent work,
Shimamura et al. (2009) proposed to deal with this VAR1 setup by combining
ideas from two major developments of the Lasso to define the Recursive
elastic-net. Like the elastic-net (Zou and Hastie 2005), this method adds an $\ell_2$ penalty to the
original $\ell_1$ regularization, thus encouraging the simultaneous selection of highly
correlated covariates on top of the automatic selection process due to the $\ell_1$
norm. As in the adaptive Lasso (Zou 2006), weights are corrected on the basis
of a former estimate so as to adapt the regularization parameter to the relative
importance of coefficients. Note that, in this context, we are no longer looking
for an estimate of the inverse of the covariance matrix but of the parameters of
the VAR1 model, which leads to a directed graph.
In this paper, we aim to couple the time course data modeling by the VAR1
model with an $\ell_1$-regularizing approach that takes the internal structure of the
network into account. This internal structure no longer relies on an affiliation
structure, since graphs inferred from time course data display a completely
asymmetrical pattern. The internal structure adopted here splits the
genes into two groups: a group of hubs that exhibit a high connection
probability to all other genes, and a group of leaves that only receive edges leaving
from the hub class. This information can either be inferred, as seen in this paper,
or recovered from biological expertise, since recovering hubs roughly consists in
exhibiting transcription factors in regulatory networks, a large number of which
have already been identified by biologists.
Another refinement of our method is to build on the adaptive-Lasso (Zou
2006, Zhou et al. 2009), which is known to reduce the false positive rate compared
to the classical Lasso. As such, our method belongs to the larger family of
weighted-Lasso methods. Shimamura et al. (2007) built upon Meinshausen
and Bühlmann's neighborhood selection and the adaptive Lasso to improve
inference of networks in an i.i.d. context. They chose separate penalties for each
node's neighborhood selection problem and adapted each individual penalty
coefficient to the information brought by an initial ridge estimate. Here, we suggest
lowering the bias of the Lasso by using not only information from an initial
statistical inference but also prior knowledge about the topology of the
network, which assumes the existence of genes with high connection probability
to other genes.
The rest of the paper is organized as follows: in the next Section, the VAR1
model and the associated likelihood function are briefly recalled; an $\ell_1$-penalized
criterion is proposed where each parameter of the VAR1 model, representing the
graph of interest, is weighted according to its membership in the hub group. The
weights can also depend on a previous estimate, just as in the adaptive-Lasso.
We also briefly recall available tools to guide the choice of the regularization
parameter. In Section 3, the inference procedure is detailed: we present how
the internal structure can be recovered; from that point, network inference
reduces to a convex optimization problem which we solve through an active-set
algorithm based upon the approach of Osborne et al. (2000). Finally, an
experimental Section investigates the performance of the method. First, simulated
data are considered; then, we try to recover edges involved in two different
regulation processes: first in the yeast cell cycle, by analyzing Spellman's dataset
and comparing the selected edges to the direct regulations collected from the
Yeastract database; second in E. coli, by analyzing U. Alon's precise kinetic
data on the S.O.S. DNA repair subnetwork.
Remark. The code will soon be embedded in the R package SIMoNe (Chiquet
et al. 2009).
2. Modeling Heterogeneous Regulation Networks from Time Course
Data
2.1. Auto-regressive Model and Sparse Networks
Let $\mathcal{P} = \{1, 2, \dots, p\}$ be the set of variables of interest, e.g., some p genes. Let us
denote by $(X_t)_{t \in \mathbb{N}}$ the $\mathbb{R}^p$-valued stochastic process that represents the
discrete-time evolution of the gene expression levels, written as a row vector. Also denote
by $X_t^i$ the expression level of gene i at time t and by $X_t^{\setminus i}$ the expression level of
all genes but i at time t. Herein, $X_t$ is assumed to be generated by a first-order
vector auto-regressive (VAR1) model
$$X_t = X_{t-1} A + b + \varepsilon_t, \qquad t \in \mathbb{N}^*,$$
where $A = (A_{ij})_{i,j \in \mathcal{P}}$ is a $p \times p$ matrix, b is a size-p row vector and $\varepsilon_t$ is a white
Gaussian process. Namely, $\varepsilon_t \sim \mathcal{N}(0, D)$ where D is a diagonal matrix such that
$D_{ii} = \sigma_i^2$, and $\mathrm{cov}(\varepsilon_t, \varepsilon_s) = \mathbf{1}_{\{s=t\}} D$ for all $s, t > 0$. Moreover, $X_0 \sim \mathcal{N}(\mu, \Sigma)$,
with $\mu$ a size-p vector of means and $\Sigma$ a covariance matrix. Also assume that
$\mathrm{cov}(X_t, \varepsilon_s) = 0$ for all $s > t$: hence, $X_t$ is a first-order Markov process.
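As an illustration of the VAR1 recursion, the following minimal Python sketch simulates one trajectory. The function name `simulate_var1`, the use of a fixed initial vector `x0` (rather than a draw from the Gaussian initial distribution) and the parameterization of the diagonal noise by standard deviations `sigma` are assumptions made for the example, not part of the original method.

```python
import random

def simulate_var1(A, b, sigma, x0, n, seed=0):
    """Simulate X_0, ..., X_n with X_t = X_{t-1} A + b + eps_t (row-vector convention).

    A     : p x p matrix as a list of rows; A[i][j] is the effect of gene i on gene j
    b     : length-p intercept
    sigma : length-p noise standard deviations (diagonal covariance D)
    x0    : length-p initial state (assumed given here)
    """
    rng = random.Random(seed)
    p = len(x0)
    X = [list(x0)]
    for _ in range(n):
        prev = X[-1]
        row = [
            sum(prev[i] * A[i][j] for i in range(p)) + b[j] + rng.gauss(0.0, sigma[j])
            for j in range(p)
        ]
        X.append(row)
    return X  # (n + 1) rows, one per time point
```

With zero noise the recursion is deterministic, which makes the direction of the edges easy to check: a nonzero `A[i][j]` carries the value of gene i at time t-1 to gene j at time t.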
Since the covariance matrix D is diagonal, each entry $A_{ij}$ is directly linked
to the partial correlation coefficient between variables $X_{t-1}^i$ and $X_t^j$. In fact,
$$A_{ij} = \frac{\mathrm{cov}\left(X_t^j, X_{t-1}^i \mid X_{t-1}^{\setminus i}\right)}{\mathrm{var}\left(X_{t-1}^i \mid X_{t-1}^{\setminus i}\right)},$$
thus nonzero entries of A code for the adjacency matrix of a directed graph
describing the conditional dependencies between the elements of $\mathcal{P}$. Inferring A
is equivalent to reconstructing this graph and is the main issue of this paper.
To this end, let us set up the estimation framework: assume that $X_t$ is
observed at times $t = 0, 1, \dots, n$. Denote by X the $(n+1) \times p$ matrix of
available data, centered and scaled to unit variance, whose t-th row contains the
information relative to the p variables at time t. The empirical variance-covariance
matrix S and the empirical temporal covariance matrix V are then given by
$$S = \frac{1}{n} X_{\setminus n}^\top X_{\setminus n}, \qquad V = \frac{1}{n} X_{\setminus n}^\top X_{\setminus 0},$$
where $X_{\setminus k}$ denotes matrix X deprived of its k-th row.
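The two empirical matrices can be computed directly from the data matrix. The sketch below uses plain Python lists; the helper names (`matmul`, `transpose`, `empirical_covariances`) are hypothetical, and the input X is assumed to be already centered and scaled.

```python
def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def empirical_covariances(X):
    """Return (S, V) for an (n + 1) x p data matrix X whose rows are times 0..n.

    S = past^T past / n     (X deprived of its last row)
    V = past^T future / n   (future = X deprived of its first row)
    """
    n = len(X) - 1
    past, future = X[:-1], X[1:]
    past_t = transpose(past)
    S = [[v / n for v in row] for row in matmul(past_t, past)]
    V = [[v / n for v in row] for row in matmul(past_t, future)]
    return S, V
```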
The well-known maximum likelihood estimator (MLE) of A is easily
recovered and recalled in the following proposition.
Proposition 1. Maximizing the log-likelihood of the VAR1 process is
equivalent to the following maximization problem
$$\max_A \left\{ \mathrm{Tr}(V^\top A) - \frac{1}{2} \mathrm{Tr}(A^\top S A) \right\},$$
whose solution is given by
$$\hat{A}^{\mathrm{mle}} = S^{-1} V. \qquad (1)$$
Remark. Thanks to the assumptions we made on $\varepsilon$, the VAR1 model can be
seen as a usual regression problem: denote by $X_p$ (respectively $X_f$) the first n
(respectively last n) rows of X. Then $\hat{A}^{\mathrm{ols}}$ is naturally given by
$(X_p^\top X_p)^{-1} X_p^\top X_f = S^{-1} V = \hat{A}^{\mathrm{mle}}$.
The MLE (1) is thus equivalent to the ordinary
least squares (OLS) estimate of A.
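For completeness, the first-order condition behind solution (1) can be sketched as follows, using only the symmetry of S and assuming it is invertible. The name J for the criterion is introduced here for convenience.

```latex
% Sketch of the first-order condition for Proposition 1 (S symmetric, invertible)
\begin{align*}
J(A) &= \mathrm{Tr}(V^\top A) - \tfrac{1}{2}\,\mathrm{Tr}(A^\top S A), \\
\nabla_A J(A) &= V - S A, \\
\nabla_A J(\hat{A}) = 0 \;&\Longleftrightarrow\; S \hat{A} = V
  \;\Longleftrightarrow\; \hat{A} = S^{-1} V .
\end{align*}
```

Since S is positive semi-definite, J is concave, so the stationary point is a maximizer.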
Solution (1) requires the covariance matrix S to be invertible, that is,
of full rank p. In real situations the number of observations is often
about or lower than the number of variables, thus the MLE needs to be regularized.
Regularization such as Moore-Penrose pseudo-inversion or $\ell_1$-regularization can
be applied to matrix S in order to make the inversion always achievable. A
sharper approach is investigated in Opgen-Rhein and Strimmer (2007), where
the OLS solution is regularized by shrinking both matrices S and V.
We suggest drawing inspiration from the $\ell_1$-penalized likelihood approach
developed by Banerjee et al. (2008) in the case of i.i.d. samples of a multivariate
Gaussian distribution: here, samples are no longer i.i.d. but linked through time
by the VAR1 model. Still, the sparsity can be controlled with a positive scalar
$\rho$ attached to an $\ell_1$-norm penalty on A by solving
$$\hat{A}^{\ell_1} = \arg\max_A \left\{ \mathrm{Tr}(V^\top A) - \frac{1}{2} \mathrm{Tr}(A^\top S A) - \rho \|A\|_{\ell_1} \right\}. \qquad (2)$$
Since MLE and OLS are equivalent in this framework, solving the
penalized-likelihood formulation (2) is equivalent to solving p independent Lasso problems,
one for each column of A, which is exactly Meinshausen and Bühlmann's approach.
The difference is that it does not require any post-symmetrization since there
is no symmetry constraint on A in the present context.
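The column-wise separation can be made explicit. Writing $A_k$ and $V_k$ for the k-th columns of A and V, $\rho$ for the penalty scalar of (2), and $X_p$, $X_f$ for the first and last n rows of X, a short sketch of the decomposition is:

```latex
\begin{align*}
\mathrm{Tr}(V^\top A) - \tfrac{1}{2}\,\mathrm{Tr}(A^\top S A) - \rho \|A\|_{\ell_1}
  &= \sum_{k=1}^{p} \Big( V_k^\top A_k - \tfrac{1}{2} A_k^\top S A_k
     - \rho \|A_k\|_{\ell_1} \Big), \\
\tfrac{1}{2n} \big\| (X_f)_k - X_p A_k \big\|_2^2
  &= \tfrac{1}{2} A_k^\top S A_k - V_k^\top A_k + \text{const}.
\end{align*}
```

Each summand involves a single column, and maximizing it amounts to minimizing a least-squares loss plus an $\ell_1$ penalty, i.e. a Lasso regression of the k-th future variable on all past variables.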
2.2. A Structured Modeling of the Network
To attempt a better fit of the data, we suggest that A has an internal structure
that describes classes of connectivity between the variables. Indeed, the $\ell_1$-norm
regularization imposes a first restriction on the topology of the network inferred
through criterion (2), by encouraging sparsity. Yet, it is well known that by
penalizing truly significant entries of A as much as truly zero entries, a single
$\ell_1$ penalization leads to biased estimates and a particularly large number of
false positives (Knight and Fu 2000, Zou 2006). Weighted-Lasso approaches
can lower this bias by adapting penalties to prior information about where the
true zero entries should be, relying on possibly data-driven as well as biological
information. An existing correction is given by the Adaptive-Lasso (Zou 2006,
Zhou et al. 2009): penalty coefficients are alleviated or increased using individual
weights inversely proportional to an initial estimate $A^{\mathrm{init}}$.
The main purpose of this paper is to show the interest of taking into account
information about the topology of the network: not only should we scale
coefficients individually, but we should also consider the underlying organization of $\mathcal{P}$.
Adaptation of weights is made by providing A with a well-chosen prior distribution,
relying on the organization of $\mathcal{P}$. We assume that genes are spread through a
partition of $\mathcal{P}$ into Q classes of connectivity. Both the existence and the weights of
edges, described by the elements of A, depend on the connectivity class each
vertex belongs to. Denote by $Z_{iq}$ the indicator that gene i belongs to
class q. Each entry $A_{ij}$ such that $Z_{iq} Z_{j\ell} = 1$ is provided with an independent prior
distribution $f_{ijq\ell}$. Following Ambroise et al. (2009), we choose Laplace distributions
for $f_{ijq\ell}$ since their log-density corresponds to the $\ell_1$ term in the
Lasso. Hence, by choosing
$$f_{ijq\ell}(x) = \frac{1}{2\lambda_{ijq\ell}} \exp\left( -\frac{|x|}{\lambda_{ijq\ell}} \right),$$
where $\lambda_{ijq\ell}$ are scaling parameters, we expect a model whose log-likelihood will
naturally make a specific $\ell_1$-penalization term appear.
Modeling hubs. Many configurations fit into this general model. In Ambroise
et al. (2009) we focused on an affiliation model. This structure opposes intra-
to inter-cluster connections, assuming the former to be far more likely than the
latter. In the present context, where dynamic regulatory networks are
represented by directed graphs, the affiliation model unnaturally assumes symmetric
probabilities for "incoming" and "outgoing" edges and should be discarded.
Indeed, adjacency matrices associated with directed gene regulatory networks are
asymmetrical: genes belong to two completely different groups. While a group
of hubs exhibits a high connection probability to all other genes, the remaining
set of genes only receives edges leaving from the first class. An illustration of this
phenomenon on the Saccharomyces cerevisiae dataset of Spellman et al. (1998) is
presented in Section 4. This setup can be summarized as follows:
$$f_{ijq\ell} = \begin{cases} f_{\mathrm{hub}}(\cdot\,; \lambda_{\mathrm{hub}}) & \text{if } q \text{ is the hub class,} \\ f_{\mathrm{leaf}}(\cdot\,; \lambda_{\mathrm{leaf}}) & \text{if } q \text{ is not the hub class.} \end{cases}$$
Note that this structure only differentiates edges on the basis of their origin, whether
they leave from a hub or not, whatever their arrival points. In this type of
structure built around hubs, the number of classes is fixed at 2.
Allowing for individual prior information about i and j, this model can be
generalized to
$$f_{ijq\ell} = \begin{cases} f_{\mathrm{hub}}(\cdot\,; \lambda_{ij} \lambda_{\mathrm{hub}}) & \text{if } q \text{ is the hub class,} \\ f_{\mathrm{leaf}}(\cdot\,; \lambda_{ij} \lambda_{\mathrm{leaf}}) & \text{if } q \text{ is not the hub class.} \end{cases}$$
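The likelihood. As the matrix A has been given a prior distribution, our
aim is to maximize the posterior probability of A, given the data X. For a fixed
structure Z, this is equivalent to maximizing the joint probability
$$\hat{A} = \arg\max_A \log \mathbb{P}(X, A; Z).$$
Now, the log-likelihood $\log \mathbb{P}(X, A; Z)$ is straightforwardly given by
$$\log \mathbb{P}(X, A; Z) = \mathrm{Tr}(V^\top A) - \frac{1}{2} \mathrm{Tr}(A^\top S A) - \|P_Z \star A\|_{\ell_1} + c, \qquad (3)$$
where c is a constant term, $\star$ denotes the term-by-term product, and the $p \times p$
penalty matrix is defined by
$$P_Z = (P^Z_{ij})_{i,j \in \mathcal{P}}, \qquad P^Z_{ij} = \sum_{q,\ell \in Q} \frac{Z_{iq} Z_{j\ell}}{\lambda_{ijq\ell}}.$$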
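In practice, we obtain the following penalty
$$P^Z_{ij} = \lambda_{ij}^{-1} \cdot \left( \lambda_{\mathrm{hub}}^{-1} Z_{i,\mathrm{hub}} + \lambda_{\mathrm{leaf}}^{-1} Z_{i,\mathrm{leaf}} \right) = \rho \cdot \rho_{ij} \cdot \left( \rho_{\mathrm{hub}} Z_{i,\mathrm{hub}} + \rho_{\mathrm{leaf}} Z_{i,\mathrm{leaf}} \right),$$
where $\rho > 0$ is a common factor to $\lambda_{\mathrm{hub}}^{-1}$ and $\lambda_{\mathrm{leaf}}^{-1}$, which can vary so as to
adapt the penalty while the ratio $\lambda_{\mathrm{hub}}^{-1} / \lambda_{\mathrm{leaf}}^{-1} = \rho_{\mathrm{hub}} / \rho_{\mathrm{leaf}} < 1$ remains constant
at a chosen level, so that edges leaving hubs are penalized less than edges leaving
leaves. Coefficient $\rho_{ij}$ can be held fixed at 1 when no individual
information is taken into account, or replaced by any well-chosen transformation
of an initial estimate of A in order to provide accurate information on where
true zeros might be.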
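To make the hub/leaf penalty concrete, here is a minimal Python sketch that builds the penalty matrix entry by entry. The function name `penalty_matrix`, its argument names and the default values are illustrative assumptions; only the product of a common factor, an individual coefficient and a class-dependent coefficient comes from the text.

```python
def penalty_matrix(is_hub, rho, rho_hub=1.0, rho_leaf=2.0, rho_ij=None):
    """Build P[i][j] = rho * rho_ij[i][j] * (rho_hub if gene i is a hub else rho_leaf).

    is_hub : list of booleans, one per gene (the classification Z)
    rho    : common penalty level (> 0)
    rho_ij : optional p x p matrix of individual coefficients (defaults to all ones)
    """
    p = len(is_hub)
    P = []
    for i in range(p):
        # The class weight depends only on the origin i of the edge i -> j.
        class_weight = rho_hub if is_hub[i] else rho_leaf
        row = []
        for j in range(p):
            individual = 1.0 if rho_ij is None else rho_ij[i][j]
            row.append(rho * individual * class_weight)
        P.append(row)
    return P
```

Each row of the result is constant when no individual coefficients are supplied, which reflects the fact that only the origin of an edge determines its penalty.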
2.3. Tuning the penalty parameter
We briefly recall here the different techniques available in the literature. The
asymptotic theory of the Lasso shows that a penalty parameter of order $\sqrt{n}$
guarantees both estimation consistency and selection consistency:
asymptotically, estimation is unbiased and all relevant covariates are included in the model
with strictly positive probability (Knight and Fu 2000). In practice, this does
not tell us which penalty to use for a fixed sample size n. To solve this problem,
Tibshirani (1996) suggests the use of cross-validation. However, it is well known
that the best penalty for prediction is not the best penalty for model selection
purposes. Cross-validation is therefore irrelevant here. Optimality in terms of
selection naturally draws attention towards penalties which would in some way
control the false discovery rate. Closest to that goal are penalty choices which
guarantee a control over the probability of connecting two nodes by a chain of
edges though no such path exists in the true graph. Such penalties have been
discussed in Banerjee et al. (2008), Meinshausen and Bühlmann (2006) or
Ambroise et al. (2009) for instance. However, as underlined in the latter, this kind of
penalty is often much too conservative to be used as anything else than an upper
bound on the set of interesting penalties. Relying on the Bayesian
interpretation of the Lasso, another option is to maximize the marginal probability of the
data over all possible tuning parameters. A specific approximation for graphs
is derived in Shimamura et al. (2007). Taking into account the fact that the
number of degrees of freedom of the Lasso equals the final number of nonzero
parameters (Zou et al. 2007), computations become much easier. In particular, the BIC
approximation of the marginal distribution as well as the AIC criterion, whose
good properties for model selection are well known, are trivial to compute. In
our case, we obtain the following expressions:
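$$\mathrm{BIC} = n \left( \mathrm{Tr}(V^\top \hat{A}) - \frac{1}{2} \mathrm{Tr}(\hat{A}^\top S \hat{A}) \right) - \frac{\log(n)}{2}\, \mathrm{df},$$
$$\mathrm{AIC} = n \left( \mathrm{Tr}(V^\top \hat{A}) - \frac{1}{2} \mathrm{Tr}(\hat{A}^\top S \hat{A}) \right) - \mathrm{df},$$
where $\hat{A}$ denotes the estimate of A associated to penalty $\rho$ and df the number
of nonzero entries in $\hat{A}$.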
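Both criteria only require two traces and a count of nonzero entries. The sketch below, with hypothetical helper names, evaluates them for a given estimate. It takes the penalized-fit form with minus signs before the df terms, so that larger values are better.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def trace_of_product(m, n):
    """Tr(m^T n), computed as the sum of entrywise products."""
    return sum(x * y for row_m, row_n in zip(m, n) for x, y in zip(row_m, row_n))

def information_criteria(S, V, A_hat, n):
    """Return (BIC, AIC, df) for an estimate A_hat obtained from n transitions."""
    fit = trace_of_product(V, A_hat) - 0.5 * trace_of_product(A_hat, matmul(S, A_hat))
    df = sum(1 for row in A_hat for a in row if a != 0.0)  # number of nonzero entries
    bic = n * fit - 0.5 * math.log(n) * df
    aic = n * fit - df
    return bic, aic, df
```

In a penalty search, one would call this function for each estimate along the grid of penalties and keep the estimate with the largest BIC or AIC.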
In practice, we observe that these last criteria have the advantages of both
being straightforward to compute and providing sensible results
in terms of both Recall and Precision rates. We therefore adopt these criteria
to select two best penalties to choose from in the remainder of the paper.
3. Inference Strategy
3.1. Structure Inference
In many application fields, the structure can be considered as known, learned
from expert knowledge. In genetics for instance, biologists can often extract the
list of transcription factors from the overall set of target genes.
Otherwise, the structure, or part of it, may remain latent: we suggest a basic
strategy that performs well in practice for biological networks. In this context,
recovering the structure comes down to identifying hubs. To this end we suggest
a very intuitive path. A first matrix $A_0$ is estimated using an adequate single
Lasso penalty. We rely on the AIC and BIC criteria to identify the best initial
penalty. Nodes are then classified into two groups, hubs and leaves, according to
the values of the $\ell_1$-norms of the corresponding rows in $A_0$. In order to account
for the particularly strong heterogeneity between the two groups (differences
in size and dispersion), a Gaussian mixture approach is used for clustering the
genes. This defines two submatrices $A_0^1$ and $A_0^2$ containing respectively the rows
corresponding to the first and second groups. Hubs are then characterized as
the class with the maximum mean absolute value of $A_0^k$.
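The hub detection step can be sketched in a few lines of Python. Note the substitution: the text relies on a Gaussian mixture, whereas this illustration uses a much simpler one-dimensional two-means split of the row norms as a stand-in. All function names are hypothetical.

```python
def row_l1_norms(A):
    """l1-norm of each row of A (total absolute weight of outgoing edges)."""
    return [sum(abs(a) for a in row) for row in A]

def two_means_1d(values, n_iter=50):
    """Split scalar values into a low group (label 0) and a high group (label 1).

    Simple Lloyd iterations started from the minimum and maximum; this is a
    stand-in for the Gaussian mixture clustering mentioned in the text.
    """
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels

def infer_hubs(A0):
    """Return one boolean per gene: True if its row of A0 falls in the high-norm group."""
    return [label == 1 for label in two_means_1d(row_l1_norms(A0))]
```

Because every row has the same number of entries, the group with the larger mean row norm is also the group with the larger mean absolute value, which matches the characterization of hubs given above.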
3.2. Active-Set algorithm for Network Inference
Once the internal structure has been recovered, inference of A amounts to
optimizing the penalized likelihood (3) where Z are fixed parameters. This can be
achieved by solving p independent Lasso-style problems since there is no
symmetry constraint on A: denoting by $M_k$ the k-th column of a given matrix
M, we wish to solve for each column of A the following minimization problem
$$\hat{A}_k = \arg\min_{\beta} L(\beta), \quad \text{where } L(\beta) = \frac{1}{2} \beta^\top S \beta - \beta^\top V_k + \|\eta \star \beta\|_{\ell_1}, \qquad (4)$$
where $\eta = (P_Z)_k$ for clarity.
Solving the penalized problem (4) can be achieved through various algorithms.
The elegant active-set approach suggested in Osborne et al. (2000) takes
advantage of the sparsity of $\beta$ to solve the equivalent constrained problem:
starting from $0_p$ as an initial guess, the set of active variables $\mathcal{A} = \{i : \beta_i \neq 0\}$ is
updated at various stages of the algorithm, so that we only solve linear systems of
limited size to determine the current nonzero coefficients, denoted by $\beta_{\mathcal{A}}$ herein.
The algorithm stops once the optimality conditions derived from the classical
Karush-Kuhn-Tucker conditions are satisfied. In the next paragraphs, we detail
an adaptation to the present context of the algorithm of Osborne et al., initially developed
for the Lasso for linear regression.
The objective function L in (4) is convex, yet not differentiable everywhere
due to the $\ell_1$-norm: from convex analysis, $\beta$ is a solution to (4) iff $0_p$ belongs to
the subdifferential of L, which forms the optimality conditions of the
problem. Here, the subdifferential is given by
$$\partial_{\beta} L(\beta) = S\beta - V_k + \eta \star \theta,$$
where $\theta \in \mathrm{sign}(\beta)$, that is, $\theta_i = \mathrm{sign}(\beta_i)$ if $i \in \mathcal{A}$, and $\theta_i \in [-1, 1]$ if $i \notin \mathcal{A}$.
Starting from $\beta = 0_p$, we select the component $\ell$ of $\beta$ whose subgradient
absolute value is maximal: indeed, a subgradient far
from zero indicates a strong violation of the optimality conditions. Such a choice
is expected to yield a large reduction of the objective function L during the optimization
procedure. Thus, this component is added to the active set: $\mathcal{A} \leftarrow \mathcal{A} \cup \{\ell\}$.
Then, optimization is only performed on the nonzero coefficients $\beta_{\mathcal{A}}$, whose
cardinality is small since the solution is likely to be sparse. This is done by minimizing
$L(\beta_{\mathcal{A}})$, which reduces to a classical optimization problem because the
subdifferential becomes a usual gradient $\nabla L$ on the active set $\mathcal{A}$.
While optimizing, the next update $\beta_{\mathcal{A}}^{+} = \beta_{\mathcal{A}} + h$ is obtained by solving
$\nabla_h L(\beta_{\mathcal{A}} + h) = 0_{|\mathcal{A}|}$, which leads to the following descent direction
$$h = -\beta_{\mathcal{A}} + S_{\mathcal{A},\mathcal{A}}^{-1} \left( V^k_{\mathcal{A}} - \eta_{\mathcal{A}} \star \mathrm{sign}(\beta_{\mathcal{A}} + h) \right),$$
where $V^k_{\mathcal{A}}$ denotes the entries of $V_k$ indexed by $\mathcal{A}$.
However, $\mathrm{sign}(\beta_{\mathcal{A}} + h)$ cannot be known while computing h and is consequently
approximated by the current sign of $\beta_{\mathcal{A}}$, equal to $\theta_{\mathcal{A}}$:
$$h \approx -\beta_{\mathcal{A}} + S_{\mathcal{A},\mathcal{A}}^{-1} \left( V^k_{\mathcal{A}} - \eta_{\mathcal{A}} \star \theta_{\mathcal{A}} \right).$$
Due to this approximation, we check for sign-consistency between the
candidate update $\beta_{\mathcal{A}} + h$ and $\theta_{\mathcal{A}}$. In case of inconsistency, the descent direction is
reduced so that $\beta_{\mathcal{A}} + \gamma h$ is sign consistent with $\theta_{\mathcal{A}}$. This ends the optimization
part of the algorithm.
Then, the active set $\mathcal{A}$ is updated since some $\beta_i$ could have been set to zero
during the optimization procedure: this is done by looking for vanished $\beta_i$'s
verifying $0 \in \partial_{\beta_i} L(\beta)$. Finally, optimality conditions are tested: if the maximal value
$\nu_\ell$ of the subdifferential corresponding to an unactivated component of $\beta$ is zero,
we have found a solution; otherwise, the active set is updated by adding $\ell$ to $\mathcal{A}$,
since it induces the highest reduction of L.
These three steps (optimization, deactivation and optimality testing) are
repeated until a solution has been found, which is guaranteed (see Osborne et al.
2000). The full algorithm is detailed in Algorithm 1. Note that it can either start from
$\beta^0 = 0_p$ or from a solution obtained from a more penalized problem with a larger
vector of penalties $\eta$, which speeds up the computation, hence having a behavior
that is similar to the homotopy/Lars algorithm (Efron et al. 2004).
The full matrix $\hat{A}$ is directly recovered by binding column-wise the solutions
to the p Lasso-style problems.
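As a rough illustration of the column-wise problem (4), the sketch below solves it by cyclic coordinate descent with soft-thresholding. This is a different and simpler technique than the active-set algorithm of the paper, used here only because it fits in a few lines. A companion function measures how far a candidate solution is from satisfying the subdifferential optimality conditions. All names are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_lasso_column(S, v, eta, n_sweeps=1000, tol=1e-10):
    """Minimize 0.5 * b'Sb - b'v + sum_i eta[i] * |b[i]| by coordinate descent."""
    p = len(v)
    beta = [0.0] * p
    for _ in range(n_sweeps):
        max_change = 0.0
        for i in range(p):
            if S[i][i] <= 0.0:
                continue  # degenerate coordinate: leave it at its current value
            r = v[i] - sum(S[i][j] * beta[j] for j in range(p) if j != i)
            new = soft_threshold(r, eta[i]) / S[i][i]
            max_change = max(max_change, abs(new - beta[i]))
            beta[i] = new
        if max_change < tol:
            break
    return beta

def kkt_violation(S, v, eta, beta):
    """Largest violation of the optimality conditions 0 in S b - v + eta * theta."""
    p = len(v)
    worst = 0.0
    for i in range(p):
        g = sum(S[i][j] * beta[j] for j in range(p)) - v[i]
        if beta[i] != 0.0:
            violation = abs(g + eta[i] * (1.0 if beta[i] > 0 else -1.0))
        else:
            violation = max(0.0, abs(g) - eta[i])
        worst = max(worst, violation)
    return worst
```

Calling `weighted_lasso_column(S, V_k, eta)` for each column k, with `eta` the k-th column of the penalty matrix, and binding the results column-wise gives an estimate of A. A violation close to zero indicates that the optimality conditions hold.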
Remark. With this method, the sparsity constraint only applies to each column
of A. This constraint implies that if we use n + 1 time points, S is of rank at most n
and thus at most n connections can be activated by the Lasso in
each column (assuming the penalty is low enough to accept the activation of all
possible edges). Consequently, the sparsity constraint only applies to incoming
edges and not to outgoing ones. In that sense, the sparsity assumptions implied by
$\ell_1$ penalization only assume that each node is regulated by a small set of nodes
and do not contradict the existence of hubs regulating a huge set of nodes.
Algorithm 1: Active-set algorithm
// INITIALIZATION
$\beta \leftarrow \beta^0$;  $\mathcal{A} \leftarrow \{i : \beta_i \neq 0\}$;  $\theta \leftarrow \mathrm{sign}(\beta)$
while $0_p \notin \partial_{\beta} L(\beta)$ do
    // 1. OPTIMIZATION OVER $\mathcal{A}$
    // 1.1 Compute the (approximate) direction h
    $h \leftarrow -\beta_{\mathcal{A}} + S_{\mathcal{A},\mathcal{A}}^{-1} (V^k_{\mathcal{A}} - \eta_{\mathcal{A}} \star \theta_{\mathcal{A}})$
    // 1.2 Check for sign consistency
    if $\mathrm{sign}(\beta_{\mathcal{A}} + h) \neq \theta_{\mathcal{A}}$ then
        // Find a solution which is sign-feasible
        $\gamma, k \leftarrow \arg\min_{0 < \gamma < 1} \{\gamma, k \in \mathcal{A} : \beta_k + \gamma h_k = 0\}$
        $\beta_{\mathcal{A}} \leftarrow \beta_{\mathcal{A}} + \gamma h$
    else
        $\beta_{\mathcal{A}} \leftarrow \beta_{\mathcal{A}} + h$
    // 2. LOOK FOR NEWLY ZEROED VARIABLES
    for $i$ : $\beta_i = 0$ and $\min_{\theta \in \mathrm{sign}(\beta_i)} |\partial_{\beta_i} L(\beta)| = 0$ do
        $\mathcal{A} \leftarrow \mathcal{A} \setminus \{i\}$
    // 3. OPTIMALITY TESTING
    // Select $\ell$ providing the highest reduction of L
    $\ell \leftarrow \arg\max_{i \notin \mathcal{A}} \nu_i$, where $\nu_i = \min_{\theta \in \mathrm{sign}(\beta_i)} |\partial_{\beta_i} L(\beta)|$
    if $\nu_\ell = 0$ then
        Stop and return $\beta$
    else
        Update the active set: $\mathcal{A} \leftarrow \mathcal{A} \cup \{\ell\}$
4. Experiments and Discussion
In this section we apply our algorithm to both synthetic and real data.
Comparison is first made within the family of weighted-Lasso methods. We observe the
performance of the Lasso when associated with a single Lasso penalty or an
adaptive penalty. For the adaptive-Lasso, a single Lasso penalty is used as
initial estimator. We then try two different hub penalties: one relying only on
the known hub structure and another one inferring the hub structure from the
initial Lasso estimator. We denote these estimators by Lasso, Adaptive, KnwCl,
and InfCl respectively. The corresponding penalties can be summarized as follows:
$$P^{\mathrm{Lasso}}_{ij} \propto 1,$$
$$P^{\mathrm{Adaptive}}_{ij} \propto \left( \frac{1}{|\hat{A}^{\mathrm{init}}_{ij}|} \vee 1 \right),$$
$$P^{\mathrm{KnwCl}}_{ij} \propto \left( \rho_{\mathrm{hub}} Z_{i,\mathrm{hub}} + \rho_{\mathrm{leaf}} Z_{i,\mathrm{leaf}} \right),$$
$$P^{\mathrm{InfCl}}_{ij} \propto \left( \rho_{\mathrm{hub}} \hat{Z}_{i,\mathrm{hub}} + \rho_{\mathrm{leaf}} \hat{Z}_{i,\mathrm{leaf}} \right),$$
where $x \vee y = \max\{x, y\}$ and $\hat{Z}$ denotes the inferred classification. In the
remainder of this section, we fix the ratio $\rho_{\mathrm{leaf}} / \rho_{\mathrm{hub}} = 2$, thus penalizing
nodes labeled as leaves twice as much as nodes labeled as hubs. Note also that we choose
to maintain the modification of adaptive weights adopted in Zhou et al. (2009),
which prevents the alleviation of penalty parameters. This trick ensures that the
adaptive Lasso will select a subnetwork of the network inferred by the
initial Lasso estimate: no edge can be included if it was already excluded by the
Lasso. In this way, the adaptive Lasso guarantees a decrease in false positives.
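The adaptive weights can be illustrated with a one-line rule. In the sketch below, the function name and the encoding of a zero initial estimate as an infinite weight (the limit of 1/|a| as a tends to 0) are assumptions made for the example. An infinite weight expresses that an edge excluded by the initial Lasso cannot re-enter.

```python
import math

def adaptive_weights(A_init):
    """Weights max(1 / |a|, 1) computed entrywise from an initial estimate.

    Taking the maximum with 1 prevents any penalty from being alleviated;
    a zero initial entry gets an infinite weight (edge permanently excluded).
    """
    return [
        [math.inf if a == 0 else max(1.0 / abs(a), 1.0) for a in row]
        for row in A_init
    ]
```

For instance, an initial coefficient of 0.25 yields a weight of 4, while any coefficient of magnitude at least 1 keeps the baseline weight of 1.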
Apart from our family of weighted-Lasso proposals, comparison will be made
with state-of-the-art network inference methods in a VAR1 setting: the
Shrinkage method suggested by Opgen-Rhein and Strimmer (2007), the Recursive
Elastic Net method (Renet-VAR) developed by Shimamura et al. (2009) and
the method based on dynamic Bayesian networks proposed by Lèbre (2009)
and available in R within the G1DBN package.
Here, the interest of the inference lies in the recovery of the true edges, in
other words in whether the entries of A are correctly identified as nonzero.
Our estimators are mainly used for discriminating nonzero entries from others.
Quantities such as True Positives (TP), False Positives (FP), True Negatives
(TN) and False Negatives (FN) summarize the performance of these classifiers.
Precision, TP/(TP + FP), is the ratio of the number of true nonzero elements
to the total number of nonzero elements in the estimated matrix $\hat{A}$. Recall,
TP/(TP + FN), denotes the proportion of nonzero elements in A which were
correctly recovered as nonzero in the estimation. Fallout, FP/(FP + TN), gives on
the contrary the proportion of zero elements in A which were falsely declared as
nonzero in the estimation. In statistical terms, the Recall (or Hit Rate) would
be the empirical equivalent of the power of our classification method considered
as a test, while the Fallout (or False Alarm Rate) would correspond to the
type I error. Note that, in the context of sparse network inference, the
number of total positives is small compared to the number of total negatives.
Thus, small variations of FP and TP will induce small variations in Fallout and
large variations in Recall. Hence, comparison between Precision and Recall is
generally more relevant than Fallout / Recall comparison in the present sparse
context. This is why we will generally choose to omit Fallout rates when we
need to lighten the presentation of results.
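These three rates follow directly from the four counts. A small Python sketch is given below, with hypothetical names; an entry is counted as an edge when it is nonzero, and a rate whose denominator is zero is returned as NaN.

```python
def edge_rates(A_true, A_hat):
    """Precision, recall and fallout of the nonzero pattern of A_hat against A_true."""
    tp = fp = tn = fn = 0
    for row_true, row_hat in zip(A_true, A_hat):
        for a_true, a_hat in zip(row_true, row_hat):
            is_edge, found = a_true != 0, a_hat != 0
            if is_edge and found:
                tp += 1
            elif found:
                fp += 1
            elif is_edge:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "fallout": ratio(fp, fp + tn),
    }
```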
These rates are easily obtained for the Lasso based methods since they
automatically produce null coefficients. By increasing the penalty parameter we
obtain sparser and sparser graphs. We start from a penalty large enough to
constrain all coefficients of Â to 0 and decrease the penalty until we include
as many variables as allowed by the ratio n/p. We then select the best penalty
from this list as the one maximizing either the BIC or the AIC criterion.
Like the Lasso, Renet-VAR directly implements variable selection, and penalty
choice is included in the algorithm. Concerning G1DBN, we follow the author's
advice to tune the parameters of the test procedure as described in the
additional material of Lèbre (2009). When applying the Shrinkage method
developed by Opgen-Rhein and Strimmer (2007), a supplementary step is required
to transform continuous results into a binary solution. We follow Opgen-Rhein
and Strimmer's advice and rely on local false discovery rates. This provides each
edge with an existence probability conditional on the corresponding entry in Â.
We declare as an inferred edge any edge whose posterior probability exceeds the
threshold of 80%, as the authors do.
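The penalty selection step described above for the Lasso based methods can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used here: the criteria are written in a "larger is better" form, the degrees of freedom are taken to be the number of nonzero coefficients, and the path values (penalty, log-likelihood, degrees of freedom) are hypothetical numbers.

```python
import math


def bic(log_likelihood, df, n):
    """BIC in a 'to be maximized' form (assumed convention)."""
    return log_likelihood - 0.5 * df * math.log(n)


def aic(log_likelihood, df):
    """AIC in a 'to be maximized' form (assumed convention)."""
    return log_likelihood - df


def select_penalty(path, n, criterion="bic"):
    """Pick the penalty maximizing the criterion.

    path: list of (penalty, log_likelihood, df) tuples, one per penalty value.
    """
    if criterion not in ("bic", "aic"):
        raise ValueError("criterion must be 'bic' or 'aic'")

    def score(entry):
        _, loglik, df = entry
        return bic(loglik, df, n) if criterion == "bic" else aic(loglik, df)

    return max(path, key=score)[0]


# Hypothetical regularization path, from the null model to a denser one.
path = [(1.0, -50.0, 0), (0.5, -44.0, 3), (0.1, -38.0, 8)]
best_bic = select_penalty(path, n=20, criterion="bic")
best_aic = select_penalty(path, n=20, criterion="aic")
print(best_bic, best_aic)
```

With these made-up values the BIC retains the sparser model (penalty 0.5) while the AIC retains the denser one (penalty 0.1), since the BIC charges log(n)/2 per coefficient instead of 1.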
4.1. Simulated Data
Simulation settings. To assess the performance of our approach, we apply
the previous model to a very favorable setup, where existing models already
perform quite well. We then decrease the ratio n/p in order to observe the
response of each method to this increasing lack of information. On top of that,
we consider graphs of different sizes: small graphs of 20 nodes, larger graphs of
100 nodes and a setup with 800 nodes. For smaller graphs, we consider three
different numbers of observations: 10, 20 and 40. For medium sized graphs,
we also consider the cases n = p/2 and n = p but omit the case n = 2p as
unrealistic. The setup p = 800, n = 20 is meant to mimic Spellman et al.'s
dataset.
Simulation of the VAR1 process is based upon the simulation strategy used
by Opgen-Rhein and Strimmer in order to ease the comparisons, but introduces
a structure based on hubs in order to better reflect the structure we could
expect from a real data set. A graph is first simulated, with fixed numbers of
nodes and edges. Like Opgen-Rhein and Strimmer we simulate sparse graphs,
with K = 2p edges. Nodes are split into two groups according to a multinomial
distribution with probabilities (0.1, 0.9), leading to 10% of hubs on average.
Edges are then positioned in the graph according to a multinomial distribution,
with 85% of edges from hubs to leaves, and the remaining set within hubs. An
exception is made for the very large graph, for which we base the number of edges
and their distribution on Spellman et al.'s data. The matrix A is synthesized on
the basis of this graph: we attribute a random partial correlation value uniformly
distributed on [-1, -0.2] ∪ [0.2, 1] to all nonzero coefficients (corresponding to
edges in the graph).
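A rough sketch of this graph and matrix simulation is given below. Several details are assumptions made for illustration only: the function name, the convention that a[i][j] != 0 encodes an edge from node i to node j, the treatment of self-loops as hub-to-hub edges, and the fallback used when fewer than K distinct hub-driven edges exist (which can happen for small p).

```python
import random


def simulate_hub_matrix(p, n_edges=None, hub_prob=0.1, hub_to_leaf=0.85, seed=0):
    """Return (a, hubs) for p >= 2 nodes; a[i][j] != 0 is an edge from i to j."""
    rng = random.Random(seed)
    n_edges = 2 * p if n_edges is None else n_edges

    # Split nodes into hubs and leaves; redraw in the rare degenerate case.
    while True:
        hubs = [i for i in range(p) if rng.random() < hub_prob]
        if 0 < len(hubs) < p:
            break
    hub_set = set(hubs)
    leaves = [i for i in range(p) if i not in hub_set]

    # Candidate edges of each type, shuffled so that pop() draws at random.
    hub_leaf = [(h, l) for h in hubs for l in leaves]
    hub_hub = [(h, g) for h in hubs for g in hubs]  # includes self-loops
    rng.shuffle(hub_leaf)
    rng.shuffle(hub_hub)

    a = [[0.0] * p for _ in range(p)]
    for _ in range(n_edges):
        if not hub_leaf and not hub_hub:
            break  # fewer than n_edges distinct hub-driven edges exist
        use_leaf = rng.random() < hub_to_leaf
        # Fall back to the other edge type when the drawn one is exhausted.
        pool = hub_leaf if (use_leaf and hub_leaf) or not hub_hub else hub_hub
        i, j = pool.pop()
        # Magnitude uniform on [0.2, 1] with a random sign.
        a[i][j] = rng.choice((-1.0, 1.0)) * rng.uniform(0.2, 1.0)
    return a, hubs


a, hubs = simulate_hub_matrix(p=20, seed=1)
n_nonzero = sum(1 for row in a for v in row if v != 0.0)
print(len(hubs), n_nonzero)
```

Drawing edges without replacement from explicit candidate lists keeps the edges distinct and guarantees termination, at the cost of enumerating all hub-driven pairs.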
From this matrix, a VAR1 observation is generated, using a centered Gaussian
starting value and a centered Gaussian noise, both with variance σ² = 0.1.
For computing time reasons, this is repeated 500 times for the small graphs, 200
times for medium sized graphs and 100 times for the large graph. Results are
averaged over all samples.
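The generation step can be sketched as below. The recursion x_t = x_{t-1} A + noise (row-vector convention, no intercept) is an assumption chosen to match the column-wise regressions discussed in this section; the function name and the small matrix are illustrative.

```python
import math
import random


def simulate_var1(a, n, sigma2=0.1, seed=0):
    """Simulate n time points of a VAR1 process driven by the p x p matrix a."""
    rng = random.Random(seed)
    p = len(a)
    sd = math.sqrt(sigma2)
    # Centered Gaussian starting value with variance sigma2.
    x = [rng.gauss(0.0, sd) for _ in range(p)]
    series = [x]
    for _ in range(n - 1):
        # Each gene j is a linear combination of all genes at the previous
        # time point, plus centered Gaussian noise with variance sigma2.
        x = [
            sum(x[i] * a[i][j] for i in range(p)) + rng.gauss(0.0, sd)
            for j in range(p)
        ]
        series.append(x)
    return series


# Hypothetical 2-gene example: gene 0 regulates itself and gene 1.
a_small = [[0.5, 0.3], [0.0, 0.0]]
series = simulate_var1(a_small, n=10, seed=42)
print(len(series), len(series[0]))
```

Repeating the call with different seeds gives the independent replicates over which results are averaged.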
To gain better insight into the difficulty of these synthesized data sets for
a Lasso estimator, we checked whether the irrepresentability condition (Zhao
and Yu 2006, Meinshausen and Yu 2008) was satisfied in all these very simple
simulations. First, note that the graphical context requires the irrepresentability
condition to be satisfied for each of the p genes at the same time, which makes
it much more difficult to hold than in the simple regression context where it
is already a strong hypothesis. In our context, since we solve p independent
Lasso problems, we can check the validity of the hypothesis in each of these
individual problems. For each gene, the irrepresentability condition is tested
using the true sign pattern extracted from the corresponding column of the
true adjacency matrix. Thus the sets of relevant and irrelevant covariates are
allowed to vary from one problem to another. Simulating 100 samples of each
simulation setting, we observed that even in a favorable setup with twice as many
observations as variables (p = 20 genes) the irrepresentability condition fails for
30% of genes on average. With p = 20 genes and only n = 10 observations this
assumption fails on average for 51% of the genes. In other words, for around
half of the genes we cannot expect the Lasso to recover the exact sign pattern.
See Table 1 for details. Admittedly, the irrepresentability condition is a really
strong assumption, necessary and sufficient for exact sign recovery, that is to
say not only the exact neighborhoods (no false positives, no false negatives) but
also the exact signs of the correlations. Yet since the simulated values are quite
well separated between true zeros and true nonzeros, we would have expected
this hypothesis to hold much more often. Information about
the validity of the restricted eigenvalue assumptions (Bickel et al. 2009) would
be greatly appreciated to compensate for such pessimistic results, but these are
computationally intractable. Adaptation of Juditsky and Nemirovsky (2008)'s
results to the present context could be of great benefit.
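For reference, the condition tested above can be written, for the regression associated with one gene, in the standard form of Zhao and Yu (2006). The notation is an assumption introduced here for illustration: S is the set of true regulators of the gene (the nonzero entries of the corresponding column of A), S^c its complement, β_S the vector of true nonzero coefficients, and C = XᵀX/n the empirical Gram matrix of the covariates.

```latex
\left\| \, C_{S^c S} \, C_{S S}^{-1} \, \operatorname{sign}(\beta_S) \, \right\|_\infty \le 1 - \eta
\qquad \text{for some } \eta > 0 .
```

Read this way, the condition fails for a gene as soon as one irrelevant covariate is too strongly predicted by the relevant ones. Because S changes from one gene to the next, the check has to be repeated for each of the p regressions.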
  n/p     p = 20         p = 100
  2       0.30 (0.23)    -
  1       0.41 (0.23)    0.37 (0.15)
  1/2     0.51 (0.18)    0.42 (0.12)
Table 1
Average proportion of genes for which the irrepresentability condition does not hold and
standard error in each simulation setting.
Discussion of simulation results. Results are presented in Figure 1 in
the form of bar charts. Figure 2 illustrates the case where p = 100 by giving
boxplots for the distributions of Precision, Recall and Fallout.
Compared methods differ with the type of setting. First of all, since the
Shrinkage method (particularly the local false discovery rate step) relies on the
hypothesis that p is large, we do not consider it fair to apply it to the small
network setting. Conversely, for computing time reasons we decided to restrict
the application of G1DBN to the graphs of size p = 20.
Penalties for the Lasso based methods were chosen on the basis of either the
BIC or AIC criterion. Although theory states that the BIC ought to outperform
the AIC in terms of model selection (Zou et al. 2007), we observed that in
practice the BIC criterion might be too conservative when n is small compared to
p. In that situation, it might be interesting to favor the less stringent AIC
criterion, which induces a higher recall rate without too large a loss in
precision. Note that the penalty choice based on the AIC or the BIC can lead to
choosing the null model as the best model. In that case, Precision cannot be
defined. We thus show the results for precision over all simulations where at
least one variable was included.
The first point worth noting in Figure 1 is that in all settings the Lasso is
outperformed by weighted-Lasso methods and others. This quick check confirms
the interest of compensating for the bias induced by ℓ1 regularization on
large coefficients. It is also possible that what we observe about the validity
of the irrepresentability condition jeopardizes the performance of the
single-penalty Lasso. In line with Table 1, the Lasso performs particularly badly
when the ratio n/p is not favorable, with recall and precision rates under 20%
when p = 20, n = 10. It even performs so poorly that it degrades the inference
based on adaptive weights. A priori information on where the true zeros lie
might compensate for this apparent lack of "neighborhood stability", using
Meinshausen and Bühlmann's vocabulary, and explain why the KnwCl penalty
is far more accurate (precision of 84% on average for a recall of nearly 50% on
average for the same simulation setting p = 20, n = 10).
As expected, in all settings (except when n is really too small compared to
p) the Adaptive penalty improves the precision but at the price of a smaller
recall rate. On the contrary, the inferred classification InfCl improves
the precision without undermining the recall rate. However, both methods are
highly dependent on the initial Lasso estimate. Therefore, the gain in precision
resulting from such methods decreases with the n/p ratio.
Benefiting from a certain amount of supplementary information, the KnwCl
penalty leads to a clear increase in both precision and recall. Particularly when
little information is available in terms of number of observations, taking a priori
[Figure 2 panels: "Precision comparison (BIC)" and "Recall comparison (BIC)", shown for n = 100 and n = 50; horizontal axis: Lasso, Adapt, KnwCl, InfCl, Renet, Shrinkage; vertical axis ticks at 0.0, 0.4, 0.8.]
Fig 2. Boxplots of Precision, Recall and Fallout statistics for all methods except Shrinkage
in a setup p = 100, for 200 simulated data sets. Best Lasso penalties chosen on the basis of
the BIC criterion.
this we let the number of nodes range from 5 to 185 and fixed the number of
observations at half the maximum number of nodes, i.e. n = 92. This leads to a
ratio p/n ranging from 0.05 to 2. Computing times for the weighted Lasso with
inference of the classification InfCl and selection of the best penalty,
Renet-VAR and G1DBN are presented on the log-log scale in Figure 3. We can see
that running times for Renet-VAR and G1DBN can become a handicap as soon
as p gets large while computing times for InfCl rarely exceed 2 minutes.
[Figure 3: "CPU time (log scale)"; horizontal axis: log of p/n; vertical axis: log of CPU time (sec), with reference levels at 1 sec, 10 sec, 1 min, 10 min, 1 hr and 4 hr; curves: G1DBN, RenetVAR, InfCl.]
Fig 3. Computing times on the log-log scale for Renet-VAR, G1DBN and InfCl (including
inference of classes). Intel Dual Core 3.40 GHz processor.
4.2. Yeast Data
We confronted our model with time measurements of Saccharomyces cerevisiae
gene expression data collected by Spellman et al. (1998). We focus on the subset
of genes they identified as periodic, i.e. genes whose transcription levels over time
show evidence that they are cell-cycle regulated.
Remarks on the data set. This dataset comes from one of the first microarray
experiments. It is thus bound to be rather noisy, contrary to the simulated data
sets. Besides, we had to face the problem of missing values, which appeared for
some of the most important genes. We imputed them as the mean of the two
closest known observations in time for the gene considered, before and after the
time point of interest.
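The imputation rule just described can be sketched as follows, assuming missing values are encoded as None. The function name is illustrative, and leaving boundary gaps untouched (when no known value exists on one side) is an assumption, since the text does not say how such cases were handled.

```python
def impute_missing(series):
    """Replace None by the mean of the nearest known values before and after."""
    imputed = list(series)
    for t, value in enumerate(series):
        if value is not None:
            continue
        # Nearest known observation strictly before time t.
        before = next(
            (series[s] for s in range(t - 1, -1, -1) if series[s] is not None),
            None,
        )
        # Nearest known observation strictly after time t.
        after = next(
            (series[s] for s in range(t + 1, len(series)) if series[s] is not None),
            None,
        )
        if before is not None and after is not None:
            imputed[t] = (before + after) / 2.0
        # Otherwise the gap sits at a boundary and is left as None.
    return imputed


# Hypothetical expression profile of one gene with four missing time points.
profile = [1.0, None, 3.0, None, None, 5.0, None]
filled = impute_missing(profile)
print(filled)  # [1.0, 2.0, 3.0, 4.0, 4.0, 5.0, None]
```

Note that two consecutive missing points receive the same value, because both are bracketed by the same pair of known observations.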
On top of its noisiness, Spellman et al.'s data set is particularly hard to tackle
from a statistical viewpoint. Information is provided on 786 genes for only 18
time points. This implies that using our algorithm we cannot activate more than
17 × 786 = 13362 edges out of 786 × 786 = 617796 possible ones, that is to say
2.2%.
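The count above can be checked in a few lines. The interpretation in the comments is an assumption: 18 time points give 17 consecutive pairs for each of the 786 regressions, and each Lasso regression is taken to select at most that many regulators.

```python
p = 786            # number of genes
time_points = 18   # number of measurements per gene

# Assumed reading: a VAR1 fit uses consecutive pairs of time points,
# and each of the p regressions activates at most that many coefficients.
transitions = time_points - 1
max_edges = transitions * p
possible_edges = p * p
percentage = 100.0 * max_edges / possible_edges

print(max_edges, possible_edges, round(percentage, 1))  # 13362 617796 2.2
```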
However, we can rely on experimental conclusions on yeast gene regulation
networks to collect target information about the true edges of the graph. We
compare our results to the adjacency matrix provided by the Yeastract database
(www.yeastract.com). We retain information on documented direct relationships,
that is to say direct regulations confirmed by published experimental
results.
Note however that this theoretical benchmark is biased in two ways. First,
some true edges might be missing because not all regulations have been
confirmed by experiments yet. Second, this graph gathers all reported
regulations, whatever the conditions of the experiment. Some might not actually
happen during the precise experiment we consider. We can suppose the effect of
the first bias to be low in a model organism such as Saccharomyces cerevisiae.
The effect of the second bias is much more likely, however, since measurements
are all made while cells are at the beginning of their growth, growing until ready
for DNA synthesis. We cannot expect the whole range of possible regulations to
happen in such a small portion of the cell cycle.
This dataset illustrates quite well the biological properties our model is based
upon. First, documented information reveals the existence of 1385 true edges
(among more than 600000 possible ones in theory). The theoretical graph is
thus extremely sparse. Secondly, the hub structure is quite clear: edges leave
from only 26 out of 786 genes. Hence knowledge of the hubs provides crucial
information on the position of edges. This phenomenon also clearly appears in
Figure 4. Incoming degrees never exceed 20 but only 1 is null. On the contrary,
outgoing degrees are null for the vast majority of genes. Significant degrees
appear as outliers in this distribution, reaching up to 150 for some of them.
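The degree statistics discussed above are simple row and column counts on the adjacency matrix. The sketch below assumes that adj[i][j] != 0 encodes a regulation of gene j by gene i; the function name and the toy matrix are illustrative and unrelated to the Yeastract data.

```python
def degrees(adj):
    """Return (in_degrees, out_degrees) of a directed graph given as a matrix."""
    p = len(adj)
    out_deg = [sum(1 for j in range(p) if adj[i][j] != 0) for i in range(p)]
    in_deg = [sum(1 for i in range(p) if adj[i][j] != 0) for j in range(p)]
    return in_deg, out_deg


# Toy graph: gene 0 regulates every gene (including itself), gene 3 regulates gene 1.
adj = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
in_deg, out_deg = degrees(adj)
hubs = [i for i, d in enumerate(out_deg) if d > 0]
print(in_deg, out_deg, hubs)  # [1, 2, 1, 1] [4, 0, 0, 1] [0, 3]
```

As in the yeast benchmark, the outgoing degrees are concentrated on a few nodes while the incoming degrees stay small and rarely vanish.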
[Figure 4: boxplots of indegrees and outdegrees; vertical axis from 0 to 150.]
Fig 4. Boxplots of incoming and outgoing degrees in Yeast theoretical adjacency matrix
Discussion of the results. The setting is much harder than in the first
simulated data sets, with a ratio n/p = 2.3%, as well as harder than the last
simulated dataset, with less separated correlations between existing and
non-existing edges. Results presented in Table 2 show quite well the difficulty
all methods encounter with this data set. Results for the Shrinkage approach
are not shown because the local false discovery rate step included in this method
was heavily flawed by the lack of separability between edges and non-edges.
Except for the KnwCl penalty, all Lasso based estimators are reduced to the
null model. Neither the BIC nor the AIC criterion finds the increase in likelihood
large enough to compensate for the complexity of any model with at least one
edge. Performances of the KnwCl penalty and Renet-VAR remain lower than
what we could expect from simulated results.
  Models      Lasso   Adaptive   KnownCl   InferCl   Renet
  Precision   -       -          0.082     -         0.004
  Recall      0       0          0.068     0         0.003
  Fallout     0       0          0.002     0         0.002
Table 2
Precision, Recall and Fallout performances for all Lasso based methods and Renet-VAR on
Spellman et al.'s data set. Best Lasso penalties chosen on the basis of the BIC criterion.
Many reasons for such bad performances could be thought of. We already
mentioned the noisiness of the data, which makes edges hard to differentiate
from non-edges. Second, homogeneity of the VAR(1) model might be too strong
an assumption. Last but not least, when looking more closely at how data were
collected we noticed that measurements were made every 7 minutes, which might
be long enough for dependencies to vanish. Also, since we measure values related
to the cell cycle, measurements were necessarily made on different cells each
time, thus measuring the expression levels on different individuals at each time
point. In brief, this apparently longitudinal data set might share more common
points with i.i.d. models than with VAR1 processes.
4.3. E. coli S.O.S. DNA repair network
In this section we leave the high dimensional setup and compare the performances
of all methods in a much easier framework. We focus on a sub-network of the
E. coli S.O.S. DNA repair network analyzed by Ronen et al.¹ Data provide
information on the main 8 genes of the S.O.S. network (uvrD, lexA, umuD, recA,
uvrA, uvrY, ruvA, polB) across 50 time points. Measurements rely on precise
expression kinetics which allow Ronen et al. to monitor mRNA expression levels
every 6 min after exposure of the DNA to UV light at time 0. We will not dwell
on the measurement technology here (see Ronen et al. (2002) for details). Note
however that the authors do not measure the actual mRNA quantity present in the
cell at time t but the instantaneous promoter activity of each gene. Equivalence
between the two measurements is guaranteed if the instantaneous quantity of mRNA
in the cell roughly equals its production rate, that is to say if there is no
accumulation of mRNA in the cell. Under this assumption, Ronen et al.'s data can
be used like any microarray dataset.
The E. coli S.O.S. DNA repair network provides a precise benchmark: specific
regulatory interactions in response to DNA damage have been characterized. In
other words, we can rely on a theoretical regulatory network which represents
the main direct transcriptional regulations actually taking place during the
experiment. According to the regularly updated EcoCyc database, lexA is the only
regulator in this subnetwork, regulating all genes including itself. Concretely, the
protein LexA is at the core of the regulation network, usually binding sites in the
promoter regions of S.O.S. genes to repress their expression. As soon as RecA
senses DNA damage (by binding to single-stranded DNA), it becomes activated
and induces LexA autocleavage. The decrease in LexA concentration alleviates
the repression of S.O.S. genes. When damage is repaired, the level of activated
RecA drops, LexA accumulates and again represses all S.O.S. genes.
¹ Data downloadable from Uri Alon's homepage, http://www.weizmann.ac.il/mcb/UriAlon/
Detailed results are presented in Figure 5. We can see that performances
differ a lot from one experiment to another. In particular, experiments 1 and 4
lead to significantly poor results although nothing should a priori distinguish
them from 2 or 3 (1 and 2, respectively 3 and 4, share the same U.V. exposure).
As on simulated data, the Lasso leads to poor results. G1DBN shows similarly
poor performances here. Quite surprisingly, Renet-VAR does not perform
as well as we could have expected from simulations. It reaches 50% of recall
at the expense of very low precision rates. Adaptive penalties improve the
quality of the estimation more than in the simulation studies: they now increase
the precision of the Lasso without really undermining the recall rate. Inference
of the classification outperforms these, with higher recall and precision rates.
This is quite interesting since, except in experiments 1 and 4 where the Lasso
provides almost no information, inference of the classes seems quite good although
the initial Lasso still shows mediocre results. Finally, the KnwCl penalty
benefits quite well here from its extra information since it outperforms all other
methods and manages to reach honest results even in datasets 1 and 4, which
disturbed all other methods.
Inferred graphs on experiment 2 are shown in Figure 6. The regulatory
activity of lexA is more or less recovered by all methods. What is interesting is
that a common structure recurrently shows up among false positives: regulations
due to uvrA. This regulation pattern is precisely what dominates experiment 4
and leads to such poor results. Strangely, we could not find any mention of this
regulatory activity in the literature. Either there is a need for further biological
research on this gene or there is an indirect regulation blurring the results.
Another unknown regulation dominates all inferred graphs: regulation of uvrY
by polB. It is all the more interesting as it survives the unfavorable prior that
the KnwCl penalty holds against it. Further biological investigation might
look at this pair of genes more closely.
In this respect, we could note that the regulatory effect of activated RecA
on LexA does not appear on these graphs, which we could see as a good point
since this is a post-transcriptional regulation. We would also like to
emphasize that we here check the selection consistency of all the methods
but not their sign consistency. We only check whether we identify the right
edges and not the activation/inhibition processes associated with them. Looking
more closely at the estimated matrices, we can see that the (shrunk) correlations
estimated between lexA and the remaining genes are all positive and not negative
[Figure 6 panels: Lasso, Adaptive Lasso, Known Classification, Inferred Classification, G1DBN, Renet-VAR; each graph has nodes uvrD, lexA, umuDC, recA, uvrA, uvrY, ruvA, polB.]
Fig 6. Graphs inferred by the different methods on experiment 2 data. Lasso penalties are
chosen so as to maximize the BIC criterion. True positives are drawn in black while false
positives are shown in dashed gray.
the proposed approach outperforms similar methods. Even when regulators and
regulatees cannot a priori be distinguished through analysis of the literature,
inference of the classification greatly improves the performances of the Lasso. It
therefore seems advisable that, whenever available, knowledge about potential
transcription factors be taken into account and that basic knowledge
on the topology of biological networks not be omitted in the modeling
process. We also want to emphasize that this method reaches great
results on networks of reasonable size within consistently reasonable computing
times.
References
C. Ambroise, J. Chiquet, and C. Matias. Inferring sparse Gaussian graphical
models with latent structure. Electronic Journal of Statistics, 3:205-238, 2009.
O. Banerjee, L. El Ghaoui, and A. d'Aspremont. Model selection through sparse
maximum likelihood estimation for multivariate Gaussian or binary data. J.
Mach. Learn. Res., 9:485-516, 2008.
P. Bickel, Y. Ritov, and A. Tsybakov. Simultaneous analysis of Lasso and Dantzig
selector. Annals of Statistics, 37:1705-1732, 2009.
R. Castelo and A. Roverato. A robust procedure for Gaussian graphical model
search from microarray data with p larger than n. J. Mach. Learn. Res.,
7:2621-2650, 2006.
J. Chiquet, A. Smith, G. Grasseau, C. Matias, and C. Ambroise. Simone:
Statistical inference for modular networks. Bioinformatics, 25(3):417-418, 2009.
A.P. Dempster. Covariance selection. Biometrics, Special Multivariate Issue,
28:157-175, 1972.
M. Drton and M.D. Perlman. Multiple testing and error control in Gaussian
graphical model selection. Statist. Sci., 22:430, 2007.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression.
Ann. Statist., 32(2):407-499, 2004.
A. Juditsky and A. Nemirovsky. On verifiable sufficient conditions for sparse
signal recovery via ℓ1 minimization. ArXiv, 2008. URL
http://arxiv.org/abs/0809.2650.
K. Knight and W. Fu. Asymptotics for lasso-type estimators. Ann. Statist.,
28(5):1356-1378, 2000.
S. Lèbre. Inferring dynamic genetic networks with low order independencies.
Statistical Applications in Genetics and Molecular Biology, 8(1), 2009.
B. Marlin, M. Schmidt, and K. Murphy. Group sparse priors for covariance
estimation. In Uncertainty in Artificial Intelligence, 2009.
N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable
selection with the lasso. Ann. Statist., 34(3):1436-1462, 2006.
N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for
high-dimensional data. Annals of Statistics, 2008.
R. Opgen-Rhein and K. Strimmer. Learning causal networks from systems
biology time course data: an effective model selection procedure for the vector
autoregressive model. BMC Bioinformatics, 8, 2007.
M.R. Osborne, B. Presnell, and B.A. Turlach. On the LASSO and its dual. J.
Comput. Graph. Statist., 9(2):319-337, 2000.
M. Ronen, R. Rosenberg, B. Shraiman, and U. Alon. Assigning numbers to the
arrows: parametrizing a gene regulation network by using accurate expression
kinetics. PNAS, 99(16):10555-10560, 2002.
J. Schäfer and K. Strimmer. A shrinkage approach to large-scale covariance
matrix estimation and implications for functional genomics. Statistical
Applications in Genetics and Molecular Biology, 4(1), 2005.
T. Shimamura, S. Imoto, R. Yamaguchi, and S. Miyano. Weighted lasso in
graphical Gaussian modeling for large gene network estimation based on
microarray data. Genome Informatics, 19:142-153, 2007.
T. Shimamura, S. Imoto, R. Yamaguchi, A. Fujita, M. Nagasaki, and S. Miyano.
Recursive regularization for inferring gene networks from time-course gene
expression profiles. BMC Systems Biology, 3(41), 2009.
P.T. Spellman, G. Sherlock, M.Q. Zhang, V.R. Iyer, M.B. Eisen, P. Brown,
D. Botstein, and B. Futcher. Comprehensive identification of cell
cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray
hybridization. Molecular Biology of the Cell, 9:3273-3297, 1998.
R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist.
Soc. Ser. B, 58(1):267-288, 1996.
A. Wille and P. Bühlmann. Low-order conditional independence graphs for
inferring genetic networks. Statistical Applications in Genetics and Molecular
Biology, 5(1), 2006.
M. Yuan and Y. Lin. Model selection and estimation in regression with grouped
variables. Journal of the Royal Statistical Society, Series B, 68(1):49-67, 2006.
M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical
model. Biometrika, 94(1):19-35, 2007.
P. Zhao and B. Yu. On model selection consistency of lasso. Journal of Machine
Learning Research, 7:2541-2563, 2006.
S. Zhou, S. van de Geer, and P. Bühlmann. Adaptive lasso for high dimensional
regression and Gaussian graphical modeling. ArXiv, 2009. URL
http://arxiv.org/abs/0903.2515v1.
H. Zou. The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc.,
101(476):1418-1429, 2006.
H. Zou and T. Hastie. Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society, 67:301-320, 2005.
H. Zou, T. Hastie, and R. Tibshirani. On the degrees of freedom of the lasso.
Ann. Statist., 34(5):2173-2192, 2007.
