Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era.
- ISSN: 02659247
- DOI: 10.1002/bies.10385
- PubMed: 14696046
Abstract
It is considered in some quarters that hypothesis-driven methods are the only valuable, reliable or significant means of scientific advance. Data-driven or 'inductive' advances in scientific knowledge are then seen as marginal, irrelevant, insecure or wrong-headed, while the development of technology, which is not of itself 'hypothesis-led' (beyond the recognition that such tools might be of value), must be seen as equally irrelevant to the hypothetico-deductive scientific agenda. We argue here that data- and technology-driven programmes are not alternatives to hypothesis-led studies in scientific knowledge discovery but are complementary and iterative partners with them. Many fields are data-rich but hypothesis-poor. Here, computational methods of data analysis, which may be automated, provide the means of generating novel hypotheses, especially in the post-genomic era.
Douglas B. Kell¹* and Stephen G. Oliver²
BioEssays 26:99–105, 2004. © 2003 Wiley Periodicals, Inc.
‘‘Simply gathering data without having any specific question
in mind is an approach to science that many people are
doubtful about. Modern science is supposed to be mostly
‘hypothesis-driven’. ...My first studies of the worm lineage
didn’t require me to ask a question (other than ‘What happens
next?’). They were pure observation, gathering data for the
sake of seeing the whole picture. ...This kind of project suits
me—it’s never bothered me that it doesn’t involve bold
theories or sudden leaps of understanding, or indeed that it
doesn’t usually attract the same level of recognition as they
do.’’ John Sulston. (1)
Introduction
The generation and testing of hypotheses is widely considered
to be the primary method by which Science progresses. So
much so that it is still common, in some circles, to find a
scientific proposal or an intellectual argument damned on the
grounds that ‘‘it has no hypothesis being tested’’, ‘‘it is merely a
fishing expedition’’, and so on. Extreme versions run ‘‘if there is
no hypothesis, it is not Science’’, the clear implication being
that hypothesis-driven programmes (as opposed to data-driven
studies or technology development) are the only contributor
to the scientific endeavour. In our view, such divisive or
exclusive views, possibly based on a misreading of Popper (2)
and/or more readable commentators such as Medawar (3),
misrepresent the complex intellectual and social intricacies
that more correctly characterise the generation of knowledge
and understanding from the study of natural phenomena and
laboratory experiments.
A discussion of some of these important issues (4) was
initiated in this journal by John Allen (5) and elicited some
further debate (6–10). However, the somewhat polemical starting
position (5) inevitably organised the combatants into an
either-or view that is too simplistic. The purpose of this essay is
to promote the view that the hypothesis-driven and inductive
modes of reasoning are not competitive but complementary
(see also Ref. 11). Our motivation, in part, is to understand the
failure of the prevailing scientific practices to have predicted
the existence of so many genes (many of them essential) that
were uncovered by the systematic genome sequencing
programs (12), and to rehearse the relative roles of inductive
expression profiling methods, technology development and
scientific hypothesis testing in post-genomic systems biology.
Abstractions and data
It is commonplace in philosophy to distinguish the world of
the mind, knowledge, ideas, thoughts, hypotheses, rules and
other mental constructs from physical and material reality as
perceived by our senses or measured by our instruments (data
or observations). To make things simple, we refer to these two
elements as Ideas and Data, respectively. This is the first
important distinction to make (Fig. 1), and recasts our questions
in terms of the nature of the form of the relationship
between Ideas and Data. It is (we hope) obvious that (i) the
logical means of going from one to the other depend on the
other (13), and (ii) the process is to be seen as an iterative cycle.

¹ Department of Chemistry, UMIST, Manchester, UK.
² School of Biological Sciences, University of Manchester, UK.
* Correspondence to: Douglas B. Kell, Faraday Building, Sackville
Street, PO Box 88, Manchester M60 1QD, UK.
E-mail: dbk@umist.ac.uk
Published online in Wiley InterScience (www.interscience.wiley.com).
Logical inference: deduction, induction and abduction
Inference is the derivation of new facts from existing facts or
premises by any acceptable form of reasoning. Three main
types of logical inference are deduction, induction and abduction.
The direction of (hypothetico-)deductive reasoning is
from Ideas to Data (14,15); an experimenter has an idea, designs
and performs a controlled experiment with a predicted
outcome that leads (for a well-designed experiment) to data
that are either consistent or inconsistent with the hypothesis (2).
The distinction between abduction and induction is not
settled (16) and, for our purposes, we combine the two under
one heading, induction, and note the key point that they go
from Data to Ideas. This is generalisation from cases, and can
also be seen as going from effect to cause, a process referred
to as ‘inverse entailment’. Thus, by deduction, we can say IF it
rained (cause), THEN the grass will be wet (effect). However,
we cannot with certainty invert the argument to read IF the
grass is wet (effect) THEN it has rained (cause), as the wetting
might have been done with a garden hose.
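The asymmetry between the rain argument and its inversion can be checked by enumerating every possible state of a toy world. The sketch below is only an illustration under stated assumptions: the function name `grass_is_wet` and the restriction to two possible causes (rain and a garden hose) are ours, chosen to mirror the example in the text.

```python
from itertools import product


def grass_is_wet(rained: bool, hose_used: bool) -> bool:
    """Toy causal model (an assumption): either cause is enough to wet the grass."""
    return rained or hose_used


# Enumerate all four combinations of the two possible causes.
worlds = [
    (rained, hose_used, grass_is_wet(rained, hose_used))
    for rained, hose_used in product([False, True], repeat=2)
]

# Deduction: IF it rained THEN the grass is wet. Check every world where it rained.
deduction_holds = all(wet for rained, _, wet in worlds if rained)

# Inverted argument: IF the grass is wet THEN it rained. Check every wet world.
inversion_holds = all(rained for rained, _, wet in worlds if wet)

# Worlds in which the grass is wet although it did not rain.
counterexamples = [
    (rained, hose_used) for rained, hose_used, wet in worlds if wet and not rained
]

print("deduction holds in every world:", deduction_holds)
print("inversion holds in every world:", inversion_holds)
print("counter-examples (rained, hose_used):", counterexamples)
```

In this toy world the forward rule holds everywhere, while the inverted rule fails in the single world where only the hose was used, which is why going from effect to cause yields a candidate hypothesis and not a certainty.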
The reason that Deduction appears to enjoy preferred
philosophical status then seems to be that if the axiom and the
observation are correct the logical inference must be correct
(all whales are blue; George is a whale; therefore George
is blue). By contrast, induction is seen as being insecure
philosophically as it falls to counter-examples. If George is a
whale and is blue, Anne is a whale and is blue, Percy is a whale
and is blue, and so on, we can induce the idea (hypothesis) that
all whales are blue. When Moby Dick comes along and is a
whale but white, the inductively generated hypothesis is found
to be false. The problem with this cosy distinction is that the
appearance of Moby Dick also falsifies the deductive version
(‘the great tragedy of Science: the slaying of a beautiful
hypothesis by an ugly fact’, T.H. Huxley). Of course, in the
real world, we know that preferred hypotheses survive any
number of inconvenient facts (17). We thus see that the great
philosophical preference for deduction has no genuinely
secure basis, but seems to be rooted in a qualitative logical
system that is based on a search for certainty or inevitability.
Neither of these is noticeably a property of the world of
complex, non-linear systems such as those that are the
hallmark of modern biology.
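The whale example can be written out as a small sketch to make the point concrete: the rule induced from cases and the prediction deduced from that rule are undone by the same observation. The names used here (`Animal`, `induce_colour_rule`, `deduce_colour`) are hypothetical and serve only this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Animal:
    name: str
    kind: str
    colour: str


def induce_colour_rule(observations, kind):
    """Induction (Data to Ideas): return the one colour shared by every
    observed animal of `kind`, or None if no single colour fits them all."""
    colours = {a.colour for a in observations if a.kind == kind}
    return next(iter(colours)) if len(colours) == 1 else None


def deduce_colour(rule_colour, animal, kind):
    """Deduction (Ideas to Data): given the axiom 'all <kind> are <rule_colour>',
    predict the colour of `animal`; return None if the axiom does not apply."""
    return rule_colour if animal.kind == kind else None


observed = [
    Animal("George", "whale", "blue"),
    Animal("Anne", "whale", "blue"),
    Animal("Percy", "whale", "blue"),
]

# Generalise from the cases seen so far.
rule = induce_colour_rule(observed, "whale")

# A new observation arrives.
moby = Animal("Moby Dick", "whale", "white")

# The deductive prediction made from the axiom...
predicted = deduce_colour(rule, moby, "whale")
# ...is contradicted by the observation, so the axiom itself is falsified.
deduction_falsified = predicted != moby.colour

# Re-running the induction with the new case no longer yields a universal rule.
revised_rule = induce_colour_rule(observed + [moby], "whale")

print("induced rule:", rule)
print("deductive prediction falsified:", deduction_falsified)
print("rule after Moby Dick:", revised_rule)
```

The deduction is only as secure as its axiom: once Moby Dick is observed, the premise ‘all whales are blue’ fails whether it was reached by induction or simply assumed.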
Cause and effect in post-genomic science and systems biology
Parameters and variables
In a dynamical system, the parameters are those parts of the
system whose values are either controlled by the
experimenter or are invariant during the experiment. In
metabolic biochemistry, these might be parameters such as
the pH, or the kcat of an enzyme. Variables are those things that
change during an experiment as a result of a change in the
parameters. In metabolic biochemistry, the variables include
metabolic fluxes and concentrations. Although it is often very
desirable to be able to estimate the parameters from the
variables (see, for example, Refs. 18,19 and see later), the
parameters are the causes and the variables the effects. (Note
however that the time elapsed during an experiment is often
regarded as an ‘honorary’ variable.)
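The distinction between parameters and variables can be illustrated with a minimal simulation. This is a sketch under assumptions that are not in the text: a single enzyme obeying Michaelis-Menten kinetics, arbitrary units, and a simple Euler integration step. The function name `simulate` and all numerical values are hypothetical.

```python
def simulate(kcat, km, enzyme, s0, dt=0.01, steps=1000):
    """Integrate dS/dt = -kcat * E * S / (Km + S) with the Euler method.

    Parameters (fixed for the whole run): kcat, km, enzyme.
    Variables (change during the run): substrate concentration and flux.
    The Michaelis-Menten rate law is an assumption made for illustration.
    """
    s = s0
    trajectory = [s]
    for _ in range(steps):
        flux = kcat * enzyme * s / (km + s)  # variable: depends on the current state
        s = max(s - flux * dt, 0.0)          # variable: updated at every step
        trajectory.append(s)
    return trajectory


# Same initial state, different value of one parameter (kcat).
slow = simulate(kcat=1.0, km=0.5, enzyme=0.1, s0=1.0)
fast = simulate(kcat=2.0, km=0.5, enzyme=0.1, s0=1.0)

print("substrate remaining with kcat = 1.0:", round(slow[-1], 3))
print("substrate remaining with kcat = 2.0:", round(fast[-1], 3))
```

Changing the parameter changes the time course of the variable, which is the cause-to-effect direction. Recovering kcat from a measured time course would be the inverse problem, in which the parameters are estimated from the variables.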
Pre- and post-genomics
The cause–effect relationship for genetics/genomics and
observable phenotypes is, of course, that the phenotype is
caused by the genotype, not vice versa, although it is possible
to infer the genotype from the phenotype. Similarly, our
perception of the relationship between gene and function
(however defined; Ref. 20) depends, as in Fig. 1, on the direction
involved. Pre-genomic molecular biology tended to be ‘function
first’ and sought genes that were involved in providing that
function. Post-genomics starts with, nominally, all the genes,
for many of which there is no corresponding biochemical
activity or function known, and is thus ‘gene first’ (21). Why,
then, did the hypothesis-driven mode of reasoning fail to find
the approximately 40% of the genes that were uncovered,
even in well-worked model organisms, after whole-genome
sequencing methods were applied? We think that the main
Figure 1. Scientific advance may be seen as an iterative
cycle linking knowledge and observations. The hypothetico-deductive
mode of reasoning uses background knowledge to
construct a hypothesis that is tested experimentally to produce
observations. This is only half the story, however, as the
inductive mode of reasoning is based purely on generalising
rules (or hypotheses) from examples, i.e. it is purely data-driven
(and the hypothesis is the end, not the beginning). Because of
the high dimensionality of typical data, computer-intensive
methods are required to turn the data into knowledge. We argue
here that scientific advance should exploit both deductive and
inductive modes of reasoning in an iterative cycle.



