Model selection in ecology and evolution.
- PubMed: 16701236
Abstract
Recently, researchers in several areas of ecology and evolution have begun to change the way in which they analyze data and make biological inferences. Rather than the traditional null hypothesis testing approach, they have adopted an approach called model selection, in which several competing hypotheses are simultaneously confronted with data. Model selection can be used to identify a single best model, thus lending support to one particular hypothesis, or it can be used to make inferences based on weighted support from a complete set of competing models. Model selection is widely accepted and well developed in certain fields, most notably in molecular systematics and mark-recapture analysis. However, it is now gaining support in several other areas, from molecular evolution to landscape ecology. Here, we outline the steps of model selection and highlight several ways that it is now being implemented. By adopting this approach, researchers in ecology and evolution will find a valuable alternative to traditional null hypothesis testing, especially when more than one hypothesis is plausible.
Jerald B. Johnson¹ and Kristian S. Omland²
¹ Conservation Biology Division, National Marine Fisheries Service, 2725 Montlake Boulevard East, Seattle, WA 98112, USA
² Vermont Cooperative Fish & Wildlife Research Unit, School of Natural Resources, University of Vermont, Burlington, VT 05405, USA
Science is a process for learning about nature in which
competing ideas about how the world works are evaluated
against observations [1]. These ideas are usually expressed
first as verbal hypotheses, and then as mathematical
equations, or models. Models depict biological processes
in simplified and general ways that provide insight into
factors that are responsible for observed patterns. Hence,
the degree to which observed data support a model also
reflects the relative support for the associated hypothesis.
Two basic approaches have been used to draw biological
inferences. The dominant paradigm is to generate a null
hypothesis (typically one with little biological meaning [2])
and ask whether the hypothesis can be rejected in light
of observed data. Rejection occurs when a test statistic
generated from observed data falls beyond an arbitrary
probability threshold (usually P < 0.05), which is interpreted
as tacit support for a biologically more meaningful
alternative hypothesis. Hence, the actual hypothesis of
interest (the alternative hypothesis) is accepted only in the
sense that the null hypothesis is rejected.
By contrast, model selection offers a way to draw
inferences from a set of multiple competing hypotheses.
Model selection is grounded in likelihood theory, a robust
framework that supports most modern statistical
approaches. Moreover, this approach is rapidly gaining
support across several fields in ecology and evolution as a
preferred alternative to null hypothesis testing [1,3,4].
Advocates of model selection argue that it has three
primary advantages. First, practitioners are not restricted
to evaluating a single model where significance is measured
against some arbitrary probability threshold. Instead,
competing models are compared to one another by evaluating
the relative support in the observed data for each
model. Second, models can be ranked and weighted, thereby
providing a quantitative measure of relative support for
each competing hypothesis. Third, in cases where models
have similar levels of support from the data, model averaging
can be used to make robust parameter estimates and
predictions. Here, we review the steps of model selection,
overview several fields where model selection is commonly
used, indicate how model selection could be more broadly
implemented and, finally, discuss caveats and areas of
future development in model selection (Box 1).
How model selection works
Generating biological hypotheses as candidate models
Model selection is underpinned by a philosophical view
that understanding can best be approached by simultaneously
weighing evidence for multiple working hypotheses
[1,3,5]. Consequently, the first step in model
selection lies in articulating a reasonable set of competing
hypotheses. Ideally, this set is chosen before data collection
and represents the best understanding of factors thought
to be involved in the process of interest. Hypotheses that
originate in verbal or graphical form must be translated
to mathematical equations (i.e. models) before being fit to
Box 1. The big picture
- Biologists rely on statistical approaches to draw inferences about biological processes.
- In many fields, the approach of null hypothesis testing is being replaced by model selection as a means of making inferences.
- Under the model selection approach, several models, each representing one hypothesis, are simultaneously evaluated in terms of support from observed data.
- Models can be ranked and assigned weights, providing a quantitative measure of relative support for each hypothesis.
- Where models have similar levels of support, model averaging can be used to make robust parameter estimates and predictions.
Corresponding author: Jerald B. Johnson (jerry.johnson@noaa.gov).
Review TRENDS in Ecology and Evolution Vol.19 No.2 February 2004
www.sciencedirect.com 0169-5347/$ - see front matter © 2004 Published by Elsevier Ltd. doi:10.1016/j.tree.2003.10.013
identifying variables and selecting mathematical functions
that depict the biological processes through which
those variables are related (Box 2).
Fitting models to data
Once a set of candidate models is specified, each model
must be fit to the observed data. At an early stage of the
analysis, one can examine the goodness-of-fit of the most
heavily parameterized (i.e. global) model in the candidate
set [3]. Such goodness-of-fit can be assessed using conventional
statistical tests (e.g. χ² tests or G-tests) [7] or a
PARAMETRIC BOOTSTRAP procedure (see Glossary). If the
global model provides a reasonable fit to the data, then
the analysis proceeds by fitting each of the models in the
candidate set to the observed data using the method of
MAXIMUM LIKELIHOOD or the method of LEAST SQUARES.
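The fitting step can be illustrated with a minimal sketch. It assumes independent, normally distributed errors with constant variance, in which case least squares and maximum likelihood give the same parameter estimates. The two candidate models (a constant and a straight line), the function names and the data values are hypothetical and chosen only for illustration.

```python
import math


def fit_constant(y):
    """Least-squares fit of y = b0. Returns (parameters, residual sum of squares)."""
    b0 = sum(y) / len(y)
    rss = sum((yi - b0) ** 2 for yi in y)
    return (b0,), rss


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x. Returns (parameters, RSS)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return (b0, b1), rss


def gaussian_log_likelihood(rss, n):
    """Maximized log-likelihood under normal errors, with variance estimated as RSS / n."""
    sigma2 = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


# Hypothetical observations (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

params_const, rss_const = fit_constant(y)
params_line, rss_line = fit_line(x, y)
loglik_const = gaussian_log_likelihood(rss_const, len(y))
loglik_line = gaussian_log_likelihood(rss_line, len(y))
```

The maximized log-likelihood of each model is the quantity that the selection methods in the next section compare.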
Selecting a best model or best set of models
Model selection is frequently employed as a way to identify
the model that is best supported by the data (referred to as
the ‘best model’) from among the candidate set. In other
words, it can be used to identify the hypothesis that is best
supported by observations. Two fundamentally different
approaches are frequently used to address this in ecology
and evolution (Box 3). One is to use a series of null
hypothesis tests, such as LIKELIHOOD RATIO TESTS in
phylogenetic analysis or F-tests in multiple regression
analysis, to compare pairs of models from among the
candidate set. However, this approach is typically
restricted to nested models (i.e. the simpler model is a
special case of the more complex model) and, in some cases,
leads to suboptimal models that are dependent upon the
hierarchical order in which models are compared [8].
Moreover, such tests cannot be used to quantify the
relative support for the various models.
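A single pairwise comparison of this kind can be sketched as follows. The sketch assumes two nested models that differ by exactly one parameter, so the test statistic (twice the difference in maximized log-likelihoods) is referred to a χ² distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by ONE parameter.

    Returns (statistic, p_value). The one-parameter restriction is an
    assumption of this sketch, not a property of the test in general.
    """
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximized log-likelihoods for a reduced and a fuller model.
stat, p = likelihood_ratio_test(loglik_reduced=-20.0, loglik_full=-15.0)
fuller_model_preferred = p < 0.05
```

The output is a binary decision for one pair of models, which is why a series of such tests yields neither a ranking nor a measure of relative support across the whole candidate set.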
By contrast, model selection criteria can be used to rank
competing models and to weigh the relative support for
each one. These techniques utilize maximum likelihood
scores as a measure of fit (more precisely, negative
Glossary
Akaike information criterion (AIC): an estimate of the expected Kullback–
Leibler information [3] lost by using a model to approximate the process that
generated observed data (full reality). AIC has two components: negative log-
likelihood, which measures lack of model fit to the observed data, and a bias
correction factor, which increases as a function of the number of model
parameters.
Akaike weight: the relative likelihood of the model given the data. Akaike
weights are normalized across the set of candidate models to sum to one, and
are interpreted as probabilities. A model whose Akaike weight approaches 1 is
unambiguously supported by the data, whereas models with approximately
equal weights have a similar level of support in the data. Akaike weights
provide a basis for model averaging (Box 4).
Least squares: a method of fitting a model to data by minimizing the squared
differences between observed and predicted values.
Likelihood ratio test: a test frequently used to determine whether data support
a fuller model over a reduced model (Box 3). The fuller model is accepted as
best when the likelihood ratio (reduced model negative log-likelihood: full
model negative log-likelihood) is sufficiently large that the difference is
unlikely to have occurred by chance (i.e. P < 0.05).
Maximum likelihood: a method of fitting a model to data by maximizing an
explicit likelihood function, which specifies the likelihood of the unknown
parameters of the model given the model form and the data. Parameter values
associated with the maximum of the likelihood function are termed the
maximum likelihood estimates of that model.
Model averaging: a procedure that accounts for model selection uncertainty
(defined below) in order to obtain robust estimates of model parameters ($\hat{\theta}$) or
model predictions ($\hat{y}$) (Box 4). A weighted average of the model-specific
estimates of $\hat{\theta}$ or $\hat{y}$ is calculated based on the Akaike weight [3] (or posterior
probabilities if estimated using a Bayesian approach [48]) of each model.
Where $\hat{\theta}$ does not appear in a model, the value of zero is entered.
Model selection bias: bias favoring models with parameters that are over-
estimated; such bias can be overcome during model averaging by entering the
value 0 for parameters when they are not already included in the particular
models to be averaged.
Model selection uncertainty: uncertainty about parameter estimates or model
predictions that arises from having selected the model based on observations
rather than actually knowing the best approximating model. Model selection
uncertainty can be accounted for using model averaging.
Parametric bootstrap: a statistical technique in which new data are generated
from Monte Carlo simulations of the fitted model. A measure of fit (typically the
deviance) is then computed, both for the model fit to the observed data, and for
the model fit to the simulated data. If the deviance of the model fit to the
observed data falls within the core of the distribution of the deviance of model
fit to the simulated data, then the model is said to fit the data adequately.
Parsimony: in statistics, a tradeoff between bias and variance. Too few
parameters results in high bias in parameter estimators and an underfit model
(relative to the best model) that fails to identify all factors of importance. Too
many parameters results in high variance in parameter estimators and an
overfit model that risks identifying spurious factors as important, and that
cannot be generalized beyond the observed sample data.
Schwarz criterion (SC) (also known as the Bayesian information criterion) [10]:
a model selection criterion designed to find the most probable model (from a
Bayesian perspective) given the data (Box 3). Superficially similar to AICc, SC
has two components: negative log-likelihood, which measures lack of fit, and
a penalty term that varies as a function of sample size and the number of
model parameters. SC is equivalent (under certain conditions) to the natural
logarithm of the Bayes factor [48].
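Several of the glossary terms can be tied together in a short sketch. The formulas used are the standard textbook forms, which this excerpt does not state explicitly: AIC = -2 ln L + 2k, AICc = AIC + 2k(k + 1)/(n - k - 1), SC = -2 ln L + k ln n, and Akaike weights proportional to exp(-Δ/2), where Δ is the difference from the lowest score. The model names, log-likelihoods, parameter counts and slope estimates are hypothetical.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def aicc(loglik, k, n):
    """Small-sample corrected AIC for n observations (requires n > k + 1)."""
    return aic(loglik, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def schwarz(loglik, k, n):
    """Schwarz criterion (Bayesian information criterion)."""
    return -2.0 * loglik + k * math.log(n)


def akaike_weights(scores):
    """Normalize relative likelihoods exp(-delta / 2) so that they sum to one."""
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weighted average of model-specific estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical candidate set: name -> (maximized log-likelihood, parameter count).
n_obs = 30
models = {"constant": (-52.0, 2), "linear": (-45.0, 3), "quadratic": (-44.6, 4)}

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in models.items()}
weights = dict(zip(scores, akaike_weights(list(scores.values()))))

# Hypothetical slope estimates; zero is entered where the model has no slope term.
slope = {"constant": 0.0, "linear": 1.9, "quadratic": 1.7}
averaged_slope = model_average([slope[m] for m in scores],
                               [weights[m] for m in scores])
```

Entering zero for the constant model follows the glossary's rule for parameters that do not appear in a model, and pulls the averaged estimate toward zero in proportion to that model's weight.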
Box 2. From multiple working hypotheses to a set of
candidate models
To use model selection, verbal hypotheses must be translated to
mathematical models. Ideally, the parameters of such models
have direct biological interpretation, but translating hypotheses to
meaningful models (as opposed to statistically arbitrary models,
e.g. ANOVA or linear regression) is not always intuitive. Hence,
we offer some guidance about how to get from multiple working
hypotheses to a set of candidate models [2,6].
The first step is to specify variables in the model. Variables should
correspond directly to causal factors outlined in the verbal hypotheses.
The second step is to decide on the functions that define the
relationship between independent variables and the response variable
in terms of mathematical operators and parameters. In fields
where model selection is commonly used (Box 5), appropriate
functions can be found in published literature or tailored software
[45,46]. In other fields, suitable models can be found in theoretical
literature or borrowed from other disciplines. The third step is to
define the error structure of the model.
Generating hypotheses and translating them to models is an
iterative process. For example, one hypothesis might seem to be
equally well depicted by two or more models, including different
error structures. In such cases, the verbal rendition of the hypothesis
must be refined so that there is a one-to-one mapping from hypothesis
to model. This can lead to an increase in the number of working
hypotheses; however, care should be taken not to include models
with functional relationships among variables that are not interpretable.
In this regard, model selection differs from data dredging,
where the analyst explores all possible models regardless of the
interpretability of their functions, or continues to develop models to
be tested after analysis is underway [3].
Ultimately, the number of candidate models should be small
(some argue, on philosophical grounds, that this should be fewer
than 20 [3]). The guiding principle at this step is to avoid generating
so many models that spurious findings become likely. Moreover, one
should avoid relying on computing power to fit all available models
in lieu of identifying a bona fide candidate set.
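One way to keep the mapping from hypothesis to model explicit is to record each candidate together with its verbal hypothesis, its function and its parameter count. The sketch below is only an illustration of that bookkeeping: the three hypotheses, the functional forms and the class name are hypothetical, and a normal error structure is assumed for every model, so the error variance is counted as one extra parameter.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class CandidateModel:
    hypothesis: str                                   # verbal hypothesis
    predict: Callable[[float, Sequence[float]], float]  # function of x and parameters
    n_params: int                                     # structural parameters + error variance


# Hypothetical candidate set: one model per working hypothesis.
candidates = {
    "constant": CandidateModel(
        "The response does not depend on x",
        lambda x, p: p[0],
        2,
    ),
    "linear": CandidateModel(
        "The response changes at a constant rate with x",
        lambda x, p: p[0] + p[1] * x,
        3,
    ),
    "saturating": CandidateModel(
        "The response rises with x toward an asymptote",
        lambda x, p: p[0] * x / (p[1] + x),
        3,
    ),
}

# Keep the set small and the hypothesis-to-model mapping one-to-one.
hypotheses = [m.hypothesis for m in candidates.values()]
one_to_one = len(set(hypotheses)) == len(candidates)
```

Writing the set down in this form before fitting makes it harder to add models after the analysis is underway.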