Functional annotation and network reconstruction through cross-platform integration of microarray data.

by Xianghong Jasmine Zhou, Ming-Chih J Kao, Haiyan Huang, Angela Wong, Juan Nunez-Iglesias, Michael Primig, Oscar M Aparicio, Caleb E Finch, Todd E Morgan, Wing Hung Wong
Nature Biotechnology (2005)

Abstract

The rapid accumulation of microarray data translates into a need for methods to effectively integrate data generated with different platforms. Here we introduce an approach, 2nd-order expression analysis, that addresses this challenge by first extracting expression patterns as meta-information from each data set (1st-order expression analysis) and then analyzing them across multiple data sets. Using yeast as a model system, we demonstrate two distinct advantages of our approach: we can identify genes of the same function yet without coexpression patterns and we can elucidate the cooperativities between transcription factors for regulatory network reconstruction by overcoming a key obstacle, namely the quantification of activities of transcription factors. Experiments reported in the literature and performed in our lab support a significant number of our predictions.

Available from www.ncbi.nlm.nih.gov
Xianghong Jasmine Zhou (1,2), Ming-Chih J Kao (2,3), Haiyan Huang (2,4), Angela Wong (1,5), Juan Nunez-Iglesias (1),
Michael Primig (6), Oscar M Aparicio (1), Caleb E Finch (1,5), Todd E Morgan (1,5) & Wing Hung Wong (2,7)
Microarray gene expression profiling is now done in many laboratories,
resulting in the rapid accumulation of data in public repositories [1,2].
Despite recent advances in analysis techniques, several important
challenges remain. (i) There is an urgent need for methods to effectively
integrate multiple microarray data sets. Gene expression values generated
with different platforms (such as spotted cDNA or Affymetrix
high-density oligonucleotide arrays) are not directly comparable. Even
within the same technology, alternative experimental parameters result
in systematic variations among data sets often beyond the capability of
statistical normalization. (ii) There is a lack of algorithms that can
identify functionally related genes which do not have similar expression
patterns. Most methods for functional analysis of microarray data
make the implicit assumption that genes with similar expression
profiles have similar functions [3,4]. However, among genes involved in
the same pathway, many gene pairs do not show similar expression
profiles [5]. (iii) The reconstruction of transcriptional regulatory networks
remains the key challenge for microarray analysis. A major issue is the
measurement of transcription factor activities because changes in their
expression are often subtle and their activities are often controlled at
levels other than expression. This further leads to difficulties in the
elucidation of cooperativity between transcription factors. Recently,
several approaches have been proposed to address some of these
individual problems [5–7], yet there remains a lack of unified frameworks
that can simultaneously respond to these challenges.
Here we introduce an approach termed 2nd-order expression
analysis, which we will show to be useful in overcoming the three
aforementioned problems. We define 1st-order expression analysis as
the extraction of expression patterns from one microarray data set,
which contains a set of expression profiles measured under relevant
conditions. We propose 2nd-order expression analysis as a study of the
correlated occurrences of those expression patterns across multiple
data sets measured under different types of conditions (e.g., starvation,
heat shock). Because expression patterns are first extracted as
meta-information from each data set and only then analyzed comparatively,
the results are not affected by variations among data sets. This
allows integration of multiple microarray data sets in a
platform-independent manner. Here, we apply 2nd-order analysis to 618 yeast
expression profiles comprising 39 cDNA or Affymetrix array data sets
to group genes that have the same function but may not be
coexpressed, to annotate their functions, to quantify the activity
profiles of transcription factors and reconstruct regulatory networks.
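
The platform-independence argument can be illustrated with a small sketch. It assumes, purely for illustration, that the extracted pattern is a Pearson correlation between two genes within one data set, and that a platform difference acts as a positive rescaling plus an offset of all measured values. Real platform effects are more complex than this, and the names `pearson`, `gene_a`, `gene_b`, `scale` and `offset` are hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two 1-D expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(0)

# Hypothetical profiles of two genes over 10 conditions in one data set.
gene_a = rng.normal(size=10)
gene_b = 0.8 * gene_a + 0.2 * rng.normal(size=10)

# Assumed stand-in for a platform difference: every value is rescaled
# by a positive factor and shifted by a constant.
scale, offset = 3.5, 7.0

r_original = pearson(gene_a, gene_b)
r_rescaled = pearson(scale * gene_a + offset, scale * gene_b + offset)

# The raw values differ between the two "platforms", but the
# within-data-set correlation pattern is unchanged.
print(abs(r_original - r_rescaled) < 1e-9)
```

Under these assumptions the raw expression values cannot be pooled, whereas the per-data-set correlation can be compared across data sets directly.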
We illustrate 2nd-order expression analysis with a simple case, the
analysis of expression patterns of coexpressed gene pairs. If a pair of
genes is tightly coexpressed in multiple data sets, the genes are likely to
be functionally linked. We term such gene pairs doublets. Our first
objective is to find pairs of such doublets that simultaneously exhibit
either high or low expression correlations across multiple data sets,
that is, simultaneously turn on or off their functional links over
different types of conditions. Such a set of four genes, termed a
quadruplet, is likely to be functionally related, even though the global
expression profiles of those genes do not exhibit gross similarities (see
an example in Fig. 1). We identify quadruplets using a two-step
procedure: (i) calculate the expression correlations of the doublet in
each of the data sets and store them in a vector, termed 1st-order
expression correlation profile; (ii) calculate the correlation between
two 1st-order profiles to generate the 2nd-order expression correlation,
and define those pairs of doublets with high 2nd-order correlations as
quadruplets. Throughout the paper, an expression correlation or a
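
The two-step procedure for quadruplets can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: it assumes Pearson correlation at both steps, a fixed gene row order across data sets, and a hypothetical cutoff `THRESHOLD`. It also skips the prior selection of doublets that are tightly coexpressed in multiple data sets. The helper `make_dataset` and all data are invented for the example.

```python
import numpy as np

def first_order_profile(datasets, gene_i, gene_j):
    """Step (i): correlation of one gene pair within each data set.

    `datasets` is a list of 2-D arrays (genes x conditions). Row order is
    assumed identical across data sets; column counts may differ.
    """
    return np.array([np.corrcoef(d[gene_i], d[gene_j])[0, 1] for d in datasets])

def second_order_correlation(datasets, doublet_a, doublet_b):
    """Step (ii): correlation between two 1st-order correlation profiles."""
    profile_a = first_order_profile(datasets, *doublet_a)
    profile_b = first_order_profile(datasets, *doublet_b)
    return float(np.corrcoef(profile_a, profile_b)[0, 1])

def make_dataset(rng, link_on, n_conditions=30):
    """Synthetic data set with 4 genes; rows 0-1 and rows 2-3 are the doublets.

    When `link_on` is True both doublets are coexpressed; otherwise all
    four genes are independent. Genes 0 and 2 are always independent.
    """
    data = rng.normal(size=(4, n_conditions))
    if link_on:
        data[1] = data[0] + 0.1 * rng.normal(size=n_conditions)
        data[3] = data[2] + 0.1 * rng.normal(size=n_conditions)
    return data

rng = np.random.default_rng(0)
link_states = [k % 2 == 0 for k in range(20)]  # links on in every other data set
datasets = [make_dataset(rng, on) for on in link_states]

THRESHOLD = 0.5  # hypothetical cutoff, not taken from the paper
r2 = second_order_correlation(datasets, (0, 1), (2, 3))
is_quadruplet = r2 > THRESHOLD
print(f"2nd-order correlation: {r2:.2f}, quadruplet: {is_quadruplet}")
```

In this toy setup the two doublets switch their links on and off in the same data sets, so their 1st-order profiles are strongly correlated even though genes 0 and 2 are never coexpressed with each other.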
Published online 16 January 2005; doi:10.1038/nbt1058
(1) Program in Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089-1113, USA.
(2) Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts 02115, USA.
(3) School of Medicine, University of Michigan, Ann Arbor, Michigan 48109, USA.
(4) Department of Statistics, University of California, Berkeley, California 94720, USA.
(5) Andrus Gerontology Center, University of Southern California, Los Angeles, California 90089-0191, USA.
(6) Biozentrum & Swiss Institute of Bioinformatics, University of Basel, CH-4056 Basel, Switzerland.
(7) Department of Statistics, Harvard University, Boston, Massachusetts 02138-2901, USA.
Correspondence should be addressed to X.J.Z. (xjzhou@usc.edu) or W.H.W. (whwong@stanford.edu).
238 VOLUME 23 NUMBER 2 FEBRUARY 2005 NATURE BIOTECHNOLOGY
LETTERS
© 2005 Nature Publishing Group http://www.nature.com/naturebiotechnology
