Evaluating Integrated Assessment Tools for Policy Support
In Practice (2010)
- ISBN: 9789048136193
- DOI: 10.1007/978-90-481-3619-3
Chapter 10
Evaluating integrated assessment tools for policy support
Jacques-Eric Bergez, Marijke Kuiper, Olivier Thérond, Marie Taverne, Hatem Belhouchette
and Jacques Wery
Abstract
Integrated Assessment Modelling tools are complex tools requiring specific evaluation
methodologies. Based on the example of the SEAMLESS-Integrated framework, we show
how the conceptual, technical and system evaluation steps of the different components
(procedures, quantitative models, graphic user interfaces) were performed by a
multidisciplinary team. To make the not-yet-available tool tangible, mock-ups and test cases were
used throughout the development process in order to involve final end-users in the
evaluation process. The main lessons from the project are that the evaluation required: i) the
use of prototypes to advance properly in design and testing (spiral methodology); ii) the
use of case studies to stay close to end-user requirements; iii) proper timing of development
and delivery in order to keep on schedule and leave time for the evaluation process; and iv) a
multidisciplinary team of evaluators, as the tools are of diverse types. A fifth lesson is that it is
difficult to maintain independence between testers, end-users and modellers, which is needed to
guarantee transparency in the development and evaluation process.
Key words:
Quantitative Evaluation, Qualitative Evaluation, Prototyping, Integrated Assessment
Introduction
During recent decades Integrated Assessment (IA) tools (van Ittersum et al., 2008) have been
developed and used widely in order to provide answers to complex questions arising from
policy-makers (Rotmans 1998; Harris 2002; Toth 2003). The success of such tools lies in the fact
that they provide a conceptual and operational framework to assess the effectiveness of, and
trade-offs among, different policy options, while integrating the knowledge from various
disciplines and/or stakeholders which is required for dealing with complex system issues and
associated interactions and feedbacks (Rotmans 1998; Tol and Vellinga 1998; Toth 2003).
Their policy-orientation makes evaluation of IA tools a critical issue. Aiming to support
policy decisions, results from the assessment studies need to be robust enough to be useful in
the political arena but also sensitive enough to distinguish and choose between different
policy proposals. IA tools are however by definition complex and encompass paradigms,
methods and tools from different disciplines. Developing an operational IA methodology that
provides useful results for policy decisions is already a tall order; evaluating such a system is
even more challenging (Lopez 2003). Individual components of the system need
to be tested, which can be done in accordance with the evaluation procedures of the
discipline from which each component originates. In addition, the system as a whole needs to be
evaluated to assess whether interactions between components originating from
different disciplines perform properly. Evaluation is then likely to be less dependent on
conventional peer review and history matching and more dependent upon
protocols and tests yet to be developed (Parker et al. 2002). In practice, development of an
operational IA methodology requires extensive resources, leaving limited time and funds for
evaluation.
This chapter reflects on the evaluation of IA tools, looking at different types of
components of an IA methodology as well as their interactions. The aim of this chapter is to
derive general lessons and ideas for evaluating IA tools from the practical experiences gained
from the SEAMLESS project. We first present the evaluation approach used in SEAMLESS. It
consists of three steps (conceptual evaluation, technical evaluation, system evaluation), each
of which is discussed in more detail in subsequent sections. In each of these sections we
discuss evaluation of three types of components of an IA methodology: procedures,
quantitative tools and graphical user interfaces. The different character of these components
gives rise to different evaluation approaches. The last section concludes by summarizing the
lessons learned from SEAMLESS which may be useful for other projects aiming at
evaluating IA tools.
Deriving a general approach to evaluate IA tools
Many methods and tools have been developed to deal with IA objectives and constraints (van
Ittersum et al. 2008). Broadly speaking, they can be split into two groups: analytical
(embracing models, scenarios and risk analysis) and participatory (including dialogue
methods, policy exercises and mutual learning methods) (Rotmans 1998). Among these
methods, Integrated Assessment and Modelling (IAM) includes a variety of quantitative
models as well as scenario-based approaches (Sharma and Norton 2005). Such tools aim to
support managers to control uncertainty when they are making decisions about future options.
When making policy decisions, scenarios enable policy makers to anticipate by exploring
possible futures and to assess different alternatives according to their potential consequences
(van Notten et al. 2003; Börjeson et al. 2006). Scenario-based approaches tell ‘highly detailed,
logically consistent stories about the future’ (Sharma and Norton 2005). Model-based
approaches aim at describing quantitatively as accurately as possible the causal relationships
and interactions between the various components of the system under study, in reaction to
external constraints and endogenous changes. Scenario-based approaches can use simulation
results from quantitative models.
Since IAM tools aim at addressing policy questions, their evaluation needs to look
beyond the classical evaluation processes of quantitative models (Oreskes et al. 1994; Rykiel
1996; Sinclair and Seligman 2000). The majority of literature on evaluation deals with
verification and validation of Decision Support Systems (DSS), which are decision-oriented
and model-based like IAM tools (Finlay and Wilson 1991; Mosqueira-Rey and Moret-Bonillo
2000; Brunner and Starkl 2004; Jakeman et al. 2006; Sojda 2007). Evaluation of DSS can be
separated into verification (‘building the system correctly’) and validation (‘building the right
system for a given purpose’) (Boehm (1981) cited by Mosqueira-Rey and Moret-Bonillo
2000). The verification step ensures that the DSS is internally complete, coherent, and logical
from a modelling and programming perspective (Sojda 2007). Validation is less concerned
with internal operation of the software and more concerned with its output and its usefulness
to the user. It analyzes whether the decision support system addresses the user’s problem
(e.g. making better decisions, avoiding bad ones, or helping the user to take these decisions
more quickly or with less data, information, and knowledge). This attention to suitability for
addressing the user’s problem sets the evaluation of DSS and IAM tools apart from the classical
evaluation of numerical models, such as ecological and agronomic models (Parker et al. 2002).
For DSS development, iterative development-evaluation processes, such as spiral
methodology (Boehm 1988 cited by Mosqueira-Rey and Moret-Bonillo 2000) are advocated.
These methods allow incremental development and fast prototyping that are fundamental in
the development of an intelligent system. In spiral methodology the final stage of each
development cycle is considered as an evaluation step of the quality of the developed product.
Successful implementation of DSS or IAM for decision-making relies on the use of
three evaluation dimensions (Adelman, 1992 cited by Sojda 2007): (1) examining the logical
consistency of the system’s algorithms (verification), (2) empirically testing the predictive
accuracy of the system (validation), and (3) documenting users’ satisfaction (validation).
Finlay and Wilson (1991) make a further distinction inside the testing of the predictive
accuracy of the system. They use ‘analytical validation’, for checking each part of the
modelling system, and ‘synoptic validation’ for checking that an acceptable output from the
whole modelling system is achieved for each set of inputs. Many authors add to this
validation one or several additional phases—commonly grouped together under the term
evaluation—assessing aspects of the tool beyond the validity of the final solutions. As a result
evaluation becomes an endeavour to analyse aspects such as utility, robustness, rapidity,
efficiency, extension possibilities, ease of use, credibility, etc. (Mosqueira-Rey and Moret-
Bonillo 2000).
Three phases to evaluate an integrated framework
In this chapter we focus on the case of the integrated framework (IF), SEAMLESS-IF.
SEAMLESS-IF has been developed for ex-ante assessment of alternative agricultural,
economic and environmental policy options and technical innovations for their impacts on the
sustainability of agricultural systems at European, national or regional levels (van Ittersum et
al. 2008). This framework is a model-based tool to support policy decisions by translating
policies to scenarios which can be analyzed through this tool. In terms of evaluation three
major components of SEAMLESS-IF can be distinguished:
- IA procedures;
- quantitative tools (databases, numerical models and indicators);
- the user interface of the computer system.
Two main procedures are part of SEAMLESS. The first deals with how SEAMLESS-IF may
be used in impact assessments. The second is more concerned with the assessment of the
institutional compatibility of proposed policies. The framework is built upon various types of
quantitative tools. SEAMLESS-IF includes several databases that not only provide inputs for
the model but also have stand-alone value in providing a single point of access to datasets at
different levels and in different domains. Reflecting the aim of IA, models from
different domains, ranging from field to global market level, are linked to each
other using modular software that keeps the framework open for further developments.
Indicators are the third group of quantitative tools; they aim to integrate results from the
various models in a set of variables relevant to assessing the sustainability impact of policies.
Because SEAMLESS-IF is designed as a computer-based tool, the user interface plays an important
role in bringing together the different components, while allowing for assessment of a wide range of
potential policy questions with SEAMLESS-IF.
In evaluating SEAMLESS-IF, we thus deal with a mixed bag of components requiring
an evaluation methodology that is both flexible and practical (i.e. aimed at getting the system
operational for the policy support for which it is designed). In order to influence the design of
SEAMLESS-IF, this evaluation took place throughout the project, with evaluation methods
reflecting the development of the framework over time (Figure 10.1).
Figure 10.1 Three types of evaluation corresponding to development phases of an
integrated framework
Following different authors (Boehm 1988; Mosqueira-Rey and Moret-Bonillo 2000;
Sojda 2007) the participatory development strategy of SEAMLESS-IF was based on iterative
cycles of design and evaluation of the framework’s prototypes. This involved potential users
for some aspects and, in all cases, feedback between ‘testers’ and developers. Prime users
(i.e. Directorate General of the European Commission) were involved in these cycles by way
of a User Forum (van Ittersum et al. 2008) and other potential users (regional and national
decision makers, stakeholder groups) were periodically consulted. Starting from
individual components, the first phase consists of designing the structure of the integrated
framework (for example by making mock-ups showing how the final system may look,
without any functionality). Conceptual evaluation checks that the design of components and
the integrated framework will lead to the desired functionalities for end-users. This phase
precedes the ‘verification phase’ as identified by Sojda (2007) and pays explicit attention to
end-user requirements. It aims at defining and clarifying the operational objectives of the tool
development (i.e. main expected capacities) and the future tool shape (e.g. main
functionalities and material form) (Dieste et al. 2003).
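[Figure 10.1 diagram. Development proceeds from individual components to the integrated framework in three phases, each with its own evaluation: conceptual evaluation (design structure of framework), technical evaluation (integrate components in framework) and system evaluation (link all components of framework).]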
The second phase consists of building the framework from the individual components.
Technical evaluation assesses whether the components function in technical terms while
keeping the objective of the integrated framework in mind (i.e. are the components designed
in such a way that they can be further integrated to perform the foreseen analyses). This
corresponds to the ‘verification phase’ as defined by Sojda (2007), which checks that the
system is complete, coherent and logical from a modelling and programming perspective.
However, it is better described by the ‘analytical validation’ (checking each part of the system)
of Finlay and Wilson (1991). As more and more components are integrated during tool
development, the evaluation shifts from testing individual components to testing combinations of
components, evolving into the system evaluation of whether the framework as a
whole gives results that are acceptable to scientists and useful for end-users. This phase
encompasses both tests of validity of solutions of the framework as a whole (“synoptic
validation” as defined by Finlay and Wilson (1991)) as well as analysis of solution time,
extension possibilities, ease of using the system etc. (as identified by Mosqueira-Rey and
Moret-Bonillo 2000).
Using case studies to guide evaluation and development of the integrated framework
Case studies played a crucial role in the evaluations of SEAMLESS-IF by providing
real-world applications of the integrated framework. Through the evaluations, test cases
became an important driver of the framework development.
In the case of SEAMLESS we opted for two types of test cases (Belhouchette et al.
2006a, 2006b, 2007). The first deals with trade liberalization consisting of lowering EU tariffs
on imports and abolishing EU export subsidies. This case study represents a macro (EU and
global level) policy in the economic domain for which the implications at all levels (from
field to global level) and in all domains (economic, environmental and social) are assessed.
The second group of case studies challenges the system in the opposite direction. It consists
of an assessment, at regional level, of environmental policies (Nitrate Directive) in EU regions
that contrast in terms of biophysical conditions and cropping systems (Louhichi et al., 2008).
These case studies represent regional and environmental policies where new production
technologies play an important role and which are again assessed at different scales (field to
region) and in all domains. The test cases used different combinations of models, which allowed
the core modelling chains of SEAMLESS-IF to be tested.
Apart from serving as different ways of evaluating the framework in a technical sense,
the policy relevance of the test cases (and the integrated framework in general) was assured
by presenting them to prime users (i.e. Directorate General of European Commission) and
other potential users (regional and national decision makers, stakeholder groups) at several
points during the project (van Ittersum et al. 2008).
These case studies proved to be useful for the development of the framework. The
multidisciplinary character of the test cases forced participants to reach consensus among
disciplines and arrive at a single and clear definition of the concepts and procedure (ontology)
needed for the framework. Hence, after a large number of iterations among SEAMLESS participants
with different backgrounds, a project assessment was defined as a set of realistic scenarios
based on economic, policy and technological changes. Such a project is first defined by the
spatial scales (resolution and extent) at which the scenario is assessed (e.g. field-region,
region-EU) which determines the appropriate modelling chain for the investigated IA
problem, as well as the possible economic, environmental and social indicators with which the
scenario should be assessed. After choosing scales and indicators, several experiments can be
developed by varying the external driving forces, the biophysical context and the policy
parameters (Thérond et al. 2009). In a nutshell, the case studies provided real time
applications involving different disciplines which allowed derivation of the general approach
to implement the IA problem within SEAMLESS-IF and accordingly the graphical user
interface (GUI) functionalities to develop.
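The project definition described above (scales first, then indicators, then experiments obtained by varying driving forces, biophysical context and policy parameters) can be pictured with a small data structure. The following Python sketch is purely illustrative: the class names, field names and example values are assumptions and do not reflect the actual SEAMLESS-IF ontology or software. It also assumes that experiments are the full cross product of the three factors, which the text does not state.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Experiment:
    """One experiment: a combination of the three factors varied in the text."""
    driving_forces: str       # external driving forces, e.g. a price outlook
    biophysical_context: str  # e.g. a soil and climate setting
    policy: str               # policy parameters, e.g. a policy option


@dataclass
class Project:
    """Hypothetical container for an IA project (not the SEAMLESS-IF ontology)."""
    resolution: str           # finest spatial scale, e.g. "field"
    extent: str               # coarsest spatial scale, e.g. "region"
    indicators: List[str]

    def experiments(self, driving_forces, contexts, policies):
        """Enumerate experiments as the cross product of the three factors
        (an assumption made for this sketch)."""
        return [Experiment(d, c, p)
                for d, c, p in product(driving_forces, contexts, policies)]


project = Project("field", "region", ["nitrate leaching", "farm income"])
runs = project.experiments(
    driving_forces=["baseline prices"],
    contexts=["context A", "context B"],
    policies=["baseline policy", "alternative policy"],
)
print(len(runs))  # 1 x 2 x 2 = 4 experiments
```

Fixing scales and indicators at project level while varying only the three factors per experiment mirrors the order of choices described in the text.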
Conceptual evaluation
The first type of evaluation consists of assessing whether the foreseen integrated framework is
internally consistent and suited for addressing the purpose for which it is designed. The
specificity of this evaluation is that it used the specifications of the framework’s components
as described in the available project reports, while neither the components nor the framework
were yet available for testing. In the case of SEAMLESS-IF the aim (a computer-based tool for ex-ante
assessments of alternative agricultural, economic and environmental policy options and
technical innovations) is too general to evaluate against. The test cases were therefore used to
create a reference point or yardstick for evaluating the framework. They provided the concrete
background for the evaluations of the prototypes.
Conceptual evaluation of the procedure for using the integrated framework
Integrated assessment implies an analysis of policies from different perspectives or
disciplines, requiring cooperation between actors with different backgrounds. Procedures for
implementing policy assessments with the computer-based tools aid this cooperation by
structuring the exchange of ideas in such a way that policies can be translated into scenarios
that can be assessed by the available modelling chains.
The development of a procedure for using the framework is a way to ensure that the
integrated framework will be able to address the issues of interest for end-users. It is thus
closely linked to the aim of a conceptual evaluation (checking that design of components and
the integrated framework will lead to the desired functionalities for end-users).
Evaluation criteria can be of very diverse abstraction levels. They can be a priori
defined or elaborated during the evaluation process (Darses et al. 2004; Darses 2002). For the
conceptual evaluation of the IA procedure only criteria of high abstraction level were pre-
defined, like relevance, understanding, etc. Commonly procedures are assessed by comparing
users’ organisational situations and working methods to spot possible incompatibilities
without involving the users and assuming sufficient knowledge of their working environment.
However, to ensure the best possible fit between the procedure, and more generally the
development of SEAMLESS-IF, and end-user requirements, potential end-users and
modellers from different disciplines were involved in the evaluation of the procedure.
Involving the end-users allowed inclusion of more concrete evaluation criteria during the
evaluation itself. Similar to the argument made with the test cases, care has to be taken to
ensure that the end-users involved in the evaluation are representative of the targeted end-users
of the framework. Otherwise there is a risk of gearing the development of the framework to
specific users in such a way that functionalities for other users become limited.
SEAMLESS-IF is foreseen to be used within a participatory process involving policy
experts, who bring the problem to study, and integrative modellers, who set up the IA within
SEAMLESS-IF. Policy experts are either in charge of policy design or interested in impacts of
policies designed by others. A specific procedure has been developed in which the use of the
framework is a component of the impact assessment procedures of the end-users (Figure 10.2). The
latter can include additional tools before or after the scenario analysis done through SEAMLESS-IF
(see for example SEC, 2005 for the EU procedure).
The SEAMLESS-IF procedure consists of three main steps: pre-modelling (describing
the problem and defining indicators at levels and for domains relevant to the policy question
at hand), modelling (selecting the appropriate models and defining scenarios needed to assess
the policy question at hand), and post-modelling (analyzing and presenting the model results)
(Thérond et al. 2009). This SEAMLESS-IF procedure had a strong influence on the design of
the GUI of SEAMLESS-IF, the evaluation of which will be discussed below.
For organizational reasons, evaluation of the procedure in conceptual, technical and
system terms was done during one session in order to facilitate the interactions with end-
users. In these sessions working situations, using graphical representations of the framework,
were simulated (the framework was still under development) to facilitate further design
(Béguin and Cerf 2004; Pastre 2005). Evaluation was done both through in-situ observations
of problem formulation and through users' opinions, highlighting problems met at each step.
Given that the framework was not yet operational, the evaluation focussed on the first steps
(pre-modelling – problem definition) of defining the policy question in a form suitable for
further analysis with SEAMLESS-IF.
Figure 10.2 The SEAMLESS-IF procedure in relation to the integrated assessment
procedures of end-users and the SEAMLESS-IF GUI
[Figure 10.2 diagram. Labels recoverable from the figure: Impact assessment; SEAMLESS-IF; Pre-modelling; Modelling; Post-modelling; Problem description; Indicator selection; Typologies; tools chain selection and run; Results presentation and analysis; Stakeholders; Other user criteria; SEAMLESS-IF scenarios; Articulation between procedures; Procedure; Graphical user interface.]
The conceptual evaluation of the SEAMLESS-IF procedures was based on a role-playing game
simulating the joint problem definition by policy experts (end-users) and modellers (involved
in SEAMLESS-IF). The modellers were provided with several guidelines for defining the
problem in a SEAMLESS-IF compatible form, from which they could choose the most appropriate
one. This allowed different versions of the procedure to be tested. During the problem formulation
exercise an observer followed the discussion between policy experts and modellers. At the end
of the meeting, policy experts were questioned about points that had not emerged
spontaneously but had been identified beforehand as relevant. Each of the modellers filled in a
questionnaire on these same aspects to ensure that all pre-defined evaluation criteria had
been covered.
The evaluations of the SEAMLESS-IF procedure were performed with eleven different
policy experts (or groups of experts) and eight modellers (or pairs of modellers). Policy
experts who were consulted are representative of different types of institutions (government,
resource management institution, agriculture advice services, local public institutions) acting
in different political fields such as water management, resource management, agriculture and
at different levels from local to regional and national. Modellers were SEAMLESS
researchers of different disciplines from biophysical (agro-ecology) to social sciences (e.g.
economics, geography). They had different levels of knowledge of the framework and its
components. This diversity of policy experts and integrative modellers ensured that the
procedure (and accordingly the relevance of the framework specifications) was evaluated in a
wide range of situations so that the outcomes of the evaluation exercises were not narrowed to
specific issues and working conditions.
Conceptual evaluation of quantitative tools
Integrated assessment tools combine quantitative tools in a single framework. In the case of
SEAMLESS three groups can be distinguished: databases (and ontology), numerical models
and indicators. These components are of course interrelated, within the modelling chains
including the database and the procedure to compute indicators for sustainability assessment.
The requirements on each component depend on its place in the overall framework. The
conceptual evaluation first determined the expected contribution of each component to the
overall objective of the integrated framework (i.e. does it deliver the information required by
the whole framework). Then each component was assessed individually (i.e. does it provide
the information properly). The two case studies were essential in these evaluations by
providing concrete requirements for the whole system from which requirements on
components could be derived. This illustrates also the need to carefully choose case studies
representative of the target range of studies, because by guiding the conceptual evaluation
they drive further development of the components.
Different types of quantitative tools require different evaluation procedures. Databases
are based on the development of an ontology that should encompass the different data,
procedures and concepts used in SEAMLESS-IF. Using object-oriented databases and
specific UML diagrams helped in analysing the links between these different entities and the
conceptual interplay between databases and other quantitative tools (models and
indicators). Conceptual evaluation of numerical models is easier because these models
describe, in their concepts and equations, a part of the reality the users have in mind for the
real systems under study. This principle has been used to communicate with users and detect
missing information (Dieste et al. 2003). Such conceptual evaluation was mainly based on
flowcharts and expert appraisal. Conceptual evaluation of indicators was based on a five-point
grid: presence of the indicator (“Is it requested by the end-users?”), pertinence (“Does it
express the requested impact or trend?”), robustness (“How do we know that this indicator
will be suitable for the different situations from field to EU?”), sensitivity (“Do we think that
the indicator will change with the expected policy?”) and computation (“Do we have
the knowledge, the data and the skill to compute it?”). This conceptual evaluation was
performed in close relation with the general evaluation procedure. Using the test cases as
concrete stories guided the selection of indicators.
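The indicator grid above lends itself to a simple record per indicator. The Python sketch below is a hypothetical illustration, not the instrument used in SEAMLESS: it assumes that each of the five questions is answered with a plain yes or no, whereas the original appraisal may well have been qualitative, and the field name `computability` is introduced here only as a label for the fifth question.

```python
from dataclasses import dataclass, fields


@dataclass
class IndicatorCheck:
    """Answers to the five grid questions for one indicator (True = satisfied)."""
    name: str
    presence: bool       # is it requested by the end-users?
    pertinence: bool     # does it express the requested impact or trend?
    robustness: bool     # is it suitable for situations from field to EU?
    sensitivity: bool    # is it expected to change with the policy?
    computability: bool  # are knowledge, data and skill available to compute it?

    def failed_criteria(self):
        """Return the names of the criteria that are not satisfied."""
        return [f.name for f in fields(self)
                if f.name != "name" and not getattr(self, f.name)]


# Illustrative values only, not results from the project.
check = IndicatorCheck("nitrate leaching", True, True, False, True, True)
print(check.failed_criteria())  # ['robustness']
```

Listing the unmet criteria, rather than computing a single score, keeps the reason for rejecting or reworking an indicator visible to both modellers and end-users.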
APES (Agricultural Production and Externalities Simulator) can illustrate the
conceptual evaluation of a quantitative model in the case of SEAMLESS-IF. APES (see also
Donatelli, Chapter 4 of this volume) is a field-scale crop model describing soil-plant
behaviour under a specific climate and technical management. From the test cases we derived
that APES should be able to simulate the yield and externalities of the main European crops, to be
further used as input by the farm model to which it is linked. It also needed to be flexible and
modular enough to allow for future additions of modules describing other processes (e.g.
carbon sequestration) and types of crops (e.g. combining several crops in an agroforestry system).
In order to generate proper yields and externalities, APES has to take into account water and
nitrogen stresses as well as soil and weather variability across Europe, and its parameters must
be estimable from available data. The integration of different modules within APES
should also be scientifically sound (i.e. without scale and precision discrepancies between modules).
Conceptual evaluation of user interfaces
When extending evaluation to issues like utility, robustness, rapidity, efficiency, extension
possibilities, ease of use, credibility, etc. (Mosqueira-Rey and Moret-Bonillo 2000) the
evaluation of user interfaces becomes important. The way in which integrated frameworks can
be accessed, i.e. through the graphical user interface, determines to a large extent the type of
study for which the integrated framework can be used. The evaluation of user interfaces is therefore
closely linked to the target uses and users of the framework.
In the case of SEAMLESS-IF, which is a computer-based tool, a GUI was developed to
set up IA within the framework. The GUI design was therefore derived from the SEAMLESS
procedure for integrated assessment (see Figure 10.2). The conceptual evaluation of the GUI
was based on mock-ups, i.e. screenshots of what the GUI would look like, made while it was
not yet operational to run the model chains. These mock-ups were then evaluated to see
whether they would allow the implementation of the full SEAMLESS procedure for
integrated assessment of each of the test cases. Apart from allowing a conceptual evaluation
of the GUI at an early stage of development, mock-ups also facilitated development of a
common understanding and representation between participants of different disciplines by
providing a graphical illustration of the proposed functionalities of the GUI.
Technical evaluation
The second type of evaluation consists of assessing whether components function in technical
terms and are integrated in such a way that the foreseen analyses can be performed. The case
studies played a central role in the technical evaluation by providing an operational
benchmark for the required properties of the individual components as well as their
integration in the framework.
Technical evaluation of procedures for using the integrated framework
As discussed before, the technical evaluation of the procedure was performed at the same time
as the conceptual evaluation but the technical evaluation focussed on the operational
feasibility of the procedure. It was assessed using criteria like the distribution of roles among
users, degree of interaction among policy experts and modellers and time needed to complete
an integrated assessment. These criteria were made operational using a set of questions that
were asked both to policy experts and modellers at the end of the evaluation meeting provided
these had not yet been discussed spontaneously:
- Does information required for problem formulation from policy experts match their
skills?
- Are the guidelines provided for problem formulation easy to use by the modellers?
- Does the translation of policy questions of the experts into experiments compatible with
SEAMLESS-IF match with the modellers’ skills?
- Does problem formulation with guidelines of the SEAMLESS-IF procedure produce
the necessary information to perform an assessment?
The evaluation revealed that a single meeting is not sufficient to define an IA project in a
computer-based tool such as SEAMLESS-IF. This was partly due to the framework not being
ready yet, hampering the assessment of whether specific questions could be addressed by the
framework, and partly due to its complexity. Another key concern was the necessary level of
knowledge of the modellers of different disciplines in order to frame the policy question in a
manner suitable for assessment within the modelling chains. Another key reason to have more
than one meeting is the time needed for sharing a minimum level of mutual understanding
between policy experts and modellers.
Technical evaluation of quantitative tools
In the case of quantitative tools the technical evaluation focussed first on assessing the
functioning of the individual components, after which the combination of components was
tested. The first step relies on the classical evaluation of
numerical models (like the well-developed methods for ecological and agronomic models)
to ensure that the outputs are valid. The second step checks that each component
delivers the right outputs in the right format to the other components of the modelling chain,
to ensure that the framework as a whole derives indicators as output from scenarios as input.
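The check that each component delivers the right outputs in the right format to the next component can be illustrated with a minimal sketch. It assumes that every component declares its variables with a name and a unit; the `Variable` class, the `check_link` function and the variable names are hypothetical and are not part of SEAMLESS-IF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """A variable exchanged between two components (hypothetical declaration)."""
    name: str
    unit: str


def check_link(upstream_outputs, downstream_inputs):
    """Return a list of problems found when linking two components."""
    produced = {v.name: v.unit for v in upstream_outputs}
    problems = []
    for needed in downstream_inputs:
        if needed.name not in produced:
            problems.append(f"missing variable: {needed.name}")
        elif produced[needed.name] != needed.unit:
            problems.append(
                f"unit mismatch for {needed.name}: "
                f"{produced[needed.name]} vs {needed.unit}"
            )
    return problems


# Illustrative interface declarations, invented for this example.
crop_model_outputs = [
    Variable("yield", "t/ha"),
    Variable("n_leaching", "kg N/ha"),
]
farm_model_inputs = [
    Variable("yield", "kg/ha"),
    Variable("n_leaching", "kg N/ha"),
    Variable("labour", "h/ha"),
]

problems = check_link(crop_model_outputs, farm_model_inputs)
for problem in problems:
    print(problem)
```

In this invented example the check reports one unit mismatch and one missing variable, the kind of incompatibility that would otherwise only surface when the whole chain is run.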
Evaluation of the individual components is case-specific since it depends on the
modelling concepts used by the discipline from which they originate. Continuing with the
example of APES, agronomic validation methods were used to assess the validity of the
results. In addition the programming of the model was tested (programming bugs, speed,
RAM and ROM requirements) as well as its compatibility with multiple operating systems.
The latter requirement was essential for the objective to integrate APES in the SEAMLESS-IF
framework. The testing itself was split into qualitative and quantitative tests.
The qualitative tests consisted of expert appraisals of results. Given the known reaction
of crops to modification of soil, weather or management (e.g. reduced yield with no N
fertilization or less rain), the tests analysed whether the model could reproduce these trends.
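Such a trend check can be written as a small automated test. The sketch below is only an illustration: `toy_yield` is an invented response function standing in for a crop model (it is not APES), and `reproduces_trend` is a hypothetical helper that compares the sign of the simulated change with the direction an expert expects.

```python
def toy_yield(n_fertiliser_kg_ha, rain_mm):
    """Invented yield response in t/ha; a stand-in for a real crop model."""
    n_effect = min(n_fertiliser_kg_ha, 150.0) / 150.0
    water_effect = min(rain_mm, 500.0) / 500.0
    return 2.0 + 6.0 * n_effect * water_effect


def reproduces_trend(model, baseline, changed, expected_direction):
    """True if the output moves in the expected direction.

    expected_direction is -1 if the output should fall, +1 if it should rise.
    """
    delta = model(**changed) - model(**baseline)
    return delta * expected_direction > 0


baseline = {"n_fertiliser_kg_ha": 120.0, "rain_mm": 450.0}
no_nitrogen = {"n_fertiliser_kg_ha": 0.0, "rain_mm": 450.0}
less_rain = {"n_fertiliser_kg_ha": 120.0, "rain_mm": 250.0}

checks = {
    "no N fertilisation lowers yield":
        reproduces_trend(toy_yield, baseline, no_nitrogen, -1),
    "less rain lowers yield":
        reproduces_trend(toy_yield, baseline, less_rain, -1),
}
for description, passed in checks.items():
    print(description, "->", "ok" if passed else "FAILED")
```

The test only asserts the direction of the response, not its size, which matches the qualitative nature of an expert appraisal.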
The quantitative tests then proceeded with a more in-depth study of the precision,
robustness and sensitivity of the APES model. These tests took place in workshops in
which mini-applications typical of the agricultural activities to be simulated by APES in
SEAMLESS-IF were run. These workshops involved three types of participants:
- mini-application experts with broad knowledge of the soil, crops and agro-management
and able to provide data for calibration and evaluation (both data from test cases regions
and field experiments were used);
- component developers of all components combined in the APES configuration required
for the mini-application (e.g. the soil water component, the arable crops component
etc.); and
- software experts able to change equations and parameters in the code version during the
workshop.
This combination of participants allowed several iterations between evaluation and model
development. The quantitative evaluation revealed that suitably detailed
data for quantitative testing are hard to obtain for the range of situations representative
of Europe. This lack of data limits the scope for a full evaluation in line with agronomic
standards, but seems inherent to the development of an integrated framework of the size and
scale of SEAMLESS-IF.
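As an illustration of the quantitative tests, precision is commonly summarised by the root mean squared error and the mean bias between observed and simulated values, and sensitivity by a one-at-a-time perturbation of a parameter. The sketch below assumes these common measures; the chapter does not state which statistics were used for APES, and the data, the `toy_model` function and its parameter names are invented.

```python
import math


def rmse(observed, simulated):
    """Root mean squared error between paired observations and simulations."""
    squared = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def mean_bias(observed, simulated):
    """Mean of simulated minus observed; positive means over-estimation."""
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)


def relative_sensitivity(model, params, name, step=0.1):
    """Relative output change per relative change of one parameter."""
    base = model(**params)
    perturbed = dict(params)
    perturbed[name] = params[name] * (1 + step)
    return ((model(**perturbed) - base) / base) / step


def toy_model(radiation_use_efficiency, intercepted_radiation):
    """Invented biomass model; a stand-in for a real crop component."""
    return radiation_use_efficiency * intercepted_radiation


observed_yield = [6.0, 7.0, 5.0]    # invented field data, t/ha
simulated_yield = [6.5, 6.5, 5.5]   # invented model output, t/ha

print("RMSE:", rmse(observed_yield, simulated_yield))
print("bias:", mean_bias(observed_yield, simulated_yield))
print("sensitivity:", relative_sensitivity(
    toy_model,
    {"radiation_use_efficiency": 3.0, "intercepted_radiation": 800.0},
    "radiation_use_efficiency",
))
```

A relative sensitivity close to 1 means the output changes in proportion to the parameter, which is expected here because the toy model is linear in each parameter.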
Technical evaluation of the user interface
The technical evaluation of the GUI addresses issues like ease of use, speed and ergonomics
related to the use of the system through the interface. In addition the technical evaluation
needs to assess whether the user interface is able to properly define and conduct the runs of
the underlying components. The requirements of a user interface depend strongly on the
background of the users, and the interface therefore needs to be tested by potential users.
In the case of the SEAMLESS-IF GUI, role-playing games were the main tool for testing. In
these games SEAMLESS researchers, in groups or individually, acted as
integrative modellers who interact with policy experts for an impact assessment with
SEAMLESS-IF. Varying with the level of development of the integrated framework these
tests evolved during the project from defining a policy question with the GUI (pre-modelling),
to quantifying scenarios and running models through the GUI (modelling) and to testing the
GUI functionalities for displaying and analyzing results (post-modelling). Outcomes from the
testing of the SEAMLESS-IF procedure on preferences of potential end-users on display of
results were used in this last step. Each round of testing resulted in a set of recommendations
to the GUI developers on screens and functionality, which was prioritized in the light of the
needs for implementing the test cases.
System evaluation
The third type of evaluation consists of assessing whether the framework as a whole gives
acceptable results that are useful for end-users. A key question in this analysis would also be
whether the integrated framework as a whole provides different insights from those gained
from an analysis based on the individual components. In the case of SEAMLESS-IF the
integrated framework was not yet available at the time of writing, preventing evaluations of
the system as a whole. Below we outline how such a system
evaluation could be done for the three types of components. Comparing the results obtained
for the test case with the framework and with the use of its individual components is the
ultimate step of this evaluation.
System evaluation of procedures for using the integrated framework
When performing a system evaluation of the procedures for integrated assessment the focus
will be on whether the procedure succeeds in translating a policy question of end-users into
assessed scenarios that provide satisfactory and useful information to support policy
decisions. This assessment would encompass not only the quality of the definition of the
problem and analysis of results in relation to the policy question, but also the time needed for
such an assessment in relation to the organization of integrated assessments by end-users.
System evaluation of quantitative tools
A system evaluation of quantitative tools in an integrated framework will focus on validation
of the outcomes of the linked set of components included in the framework. One of the big
issues here is the propagation of errors and uncertainty along the model chain (van Ittersum et al. 2006).
Another issue is whether problems with the compatibility of different components (models or
databases) have been solved through the ontology.
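How errors accumulate along a chain of linked models can be illustrated with a simple Monte Carlo experiment. The sketch below is an assumption-laden toy: the two models, their coefficients and the error distributions are invented for illustration and do not represent the SEAMLESS-IF components or the method of the cited report.

```python
import random
import statistics


def crop_model(rain_mm):
    """Invented crop model: yield in t/ha as a linear function of rainfall."""
    return 0.012 * rain_mm


def farm_model(yield_t_ha, price_eur_t):
    """Invented farm model: gross margin in EUR/ha with fixed costs of 400."""
    return yield_t_ha * price_eur_t - 400.0


def propagate(n_draws=5000, seed=1):
    """Sample uncertain inputs and push them through the two-model chain."""
    rng = random.Random(seed)
    yields, margins = [], []
    for _ in range(n_draws):
        rain = rng.gauss(500.0, 50.0)             # uncertain weather input
        model_error = rng.gauss(1.0, 0.10)        # multiplicative crop-model error
        price = rng.gauss(150.0, 15.0)            # uncertain economic input
        simulated_yield = crop_model(rain) * model_error
        yields.append(simulated_yield)
        margins.append(farm_model(simulated_yield, price))
    return yields, margins


def coefficient_of_variation(values):
    """Standard deviation relative to the absolute mean."""
    return statistics.stdev(values) / abs(statistics.mean(values))


yields, margins = propagate()
print("CV of yield :", round(coefficient_of_variation(yields), 3))
print("CV of margin:", round(coefficient_of_variation(margins), 3))
```

Under these assumptions the relative spread of the final indicator is larger than that of the intermediate yield, because the second model adds its own input uncertainty and the fixed costs reduce the mean margin.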
System evaluation of the user interface
The key question of the user interface is whether it provides access to results that are useful
and transparent for end-users. Added to this may be requirements on the way results are
displayed, which could vary by type of end-user. These requirements can to some extent be
determined beforehand (and are also part of the technical evaluation using the test cases). Use
of the operational framework by end-users will no doubt lead to additional requirements, just
as evaluation criteria were defined during testing in the evaluation of the integrated
assessment procedure.
Some more lessons to organize the evaluation of an integrated
framework
The type of testing done for SEAMLESS-IF is summarized in Table 10.1. The experiences
gained can be translated into several key points which can be of use for other projects aiming at
evaluating integrated assessment tools: use of prototypes, use of case studies, timing of
testing, independent testers, multidisciplinarity, separating end-users and modellers. Where
possible we suggest potential solutions to problems encountered during SEAMLESS-IF.
Table 10.1 Summary of evaluation levels used for the different types of components
Procedure (example: SEAMLESS-IF procedure for IA)
- Conceptual evaluation: consistency with overall objectives; relevance for users
- Technical evaluation: operational feasibility; reliability (quality of outcome)
- System evaluation: organisational feasibility
Quantitative tools (example: APES)
- Conceptual evaluation: consistency with overall objectives and with major characteristics
of the system it simulates (a crop)
- Technical evaluation: reliability (outputs quantitative analysis); programming problems;
operational feasibility
- System evaluation: uncertainty analysis; input/output analysis
User interfaces (example: GUI)
- Conceptual evaluation: consistency with overall objectives
- Technical evaluation: operational feasibility
Use of prototypes
As illustrated in Figure 10.1 the development of SEAMLESS-IF proceeded from a set of
individual components to an integrated framework. This process was made operational
through the use of prototypes and transition workshops as illustrated in Figure 10.3. The first
prototype ran in parallel with, and supported, the conceptual evaluation, i.e. assuring
that the integrated framework would be internally consistent and suited for addressing the
purpose for which it is designed. To assure alignment with end-user demands it was oriented
at end-users. It consisted of a series of mock-ups or screen shots showing what the integrated
framework would look like and what kind of functionalities it would have. These mock-ups
were found to be an important tool not only for the conceptual evaluation, but also to generate
a common understanding among the various disciplines involved in building the framework.
Whereas the first prototype did not have any functionality in terms of running analyses,
the subsequent prototypes did have increasing functionalities allowing the evaluation to
switch to technical testing of the framework and its components.
The switch from one prototype to the next was marked by so-called transition
workshops. These workshops brought together developers and testers to translate the
assessment of the prototype into (prioritized) requirements for the next prototype.
Figure 10.3 The role of prototype testing and transition workshops in the
development path of SEAMLESS-IF
Evaluations of the prototypes proved to be instrumental in directing the development and
functionality of the integrated framework. A drawback of the development of prototypes is
the required time and resources, which come at the expense of developing the components and
conducting research with them. Based on the experiences in SEAMLESS-IF, however, these
investments were considered essential for building a common understanding of the status
and direction of development of the integrated framework as a whole.
Use of case studies
Two sets of case studies formed the thread along which the evaluations were performed.
These case studies provided contrasting and realistic applications of the integrated
framework. Care was taken to make the applications as representative as possible of the target
range of questions (top-down versus bottom-up, economic versus environmental policies,
region versus EU scale) to avoid pulling the framework development in a single direction. This
made it possible to test the general applicability of the tool across different disciplines and scales and
to keep it generic enough to maintain the flexibility and broader use of the framework. The
case studies turned out to provide a crucial basis for assessing the development of the
framework and identifying differences between disciplines that affect the integrated
assessment. Furthermore, the case studies define operational minimum requirements to the
integrated framework which aid priority setting and allocation of resources in the project.
Timing of testing
The production of an integrated framework such as SEAMLESS-IF required development and
integration of different components. This simultaneous development and integration proved to
be more complex than expected and resulted in considerable delays in prototype delivery to
the testing group. These delays in turn hampered testing and made reliance on the conceptual
evaluation developed for this purpose (Thérond et al. 2009) an essential contribution to
further development of the framework. The delays in arriving at an operational framework
also meant that no tests of the system as a whole were possible during the project. The technical
tests were done on partially finished components and needed to strike a careful balance
between providing constructive comments to the developers already under pressure and
providing early indications that a tool may prove unsuitable to the whole system’s features
and validity domain.
In the case of SEAMLESS-IF, end-users were also involved in the testing to assure the
relevance of the integrated framework. Early involvement of users provides ideas on required
features and potential fields of application useful for development of the integrated
framework. At the same time, user involvement at an early stage can interfere with proper
testing, since testers may hold back to avoid deterring potential users, and it leads to pressure for quick delivery of the
integrated framework by fuelling users’ expectations. A balance thus needs to be found to
ensure that the integrated framework is relevant while not raising users’ expectations beyond
feasible levels.
Independent testers
Following Mosqueira-Rey and Moret-Bonillo (2000) and Sojda (2007) the testing of
SEAMLESS-IF started with evaluators independent of tool developers. This allowed a more
objective evaluation of the tools and their suitability for the integrated framework and it
ensured that an interdisciplinary group was identified in the project with the primary objective
of regularly testing prototypes in realistic applications to keep the development of the
framework on track (Figure 10.3). During the project, however, it proved difficult to enrol
scientists in evaluation work that is hard to publish in scientific journals, while requiring a
high level of involvement to understand and assess prototypes with limited features and a high
level of integration of computerized tools from different disciplines. In addition to this push
effect the evaluators were also pulled into other parts of the project, for example contributing
to the development of links between models, due to their extensive knowledge of the
framework and its components.
These push and pull factors led to a blurring of the distinction between evaluators and
developers at the end of the project. Given these factors at play it seems advisable that
evaluators start as external reviewers involved in the conceptualisation of the whole
framework based on potential applications, but not in tool development, and become involved
in integrated tool development once individual components become available for linking.
Multidisciplinarity
Evaluating a complex multidisciplinary tool like SEAMLESS-IF is a daunting task. As
noted by Langvad and Noe (2006) multidisciplinary involvement is a precondition for
developing relevant decision support systems. At the same time a different disciplinary
background compounded by a wide diversity of cultures and practices in an international
project like SEAMLESS poses a major challenge to both development and evaluation of the
framework. As an illustration, in the evaluation of the integrated assessment procedure the
aim was to allow a diversity of ways to frame problems to account for the diversity in
backgrounds of both the policy experts and modellers involved. Accounting for this diversity
implied testing in different countries. Because of language differences it was impossible to nominate a
single person to record observations (spontaneous comments from policy experts about
presentations and the way they interact with integrative modellers for problem formulation) and so
assure a uniform recording of observations. Instead a standardized way of recording had to be
developed that interfered with the aim of having diversity in problem representations.
Separating end-users and modellers
The aim of an integrated framework, to use models in managing complex situations rather than
in sharply defined areas of research, results in people with little modelling or quantitative
background relying on models while not being in a position to judge their quality or
appropriateness. Linking models from different disciplines in a single framework implies that
this danger not only applies to use of the framework by non-modellers as identified by
Jakeman et al. (2006), but also to modellers relying on inputs from models outside their
discipline. Being unaware of limitations, uncertainties, omissions and subjective choices may
lead to invalid conclusions by wrongly interpreting model results and/or applying models for
purposes for which they are not designed.
The only way to mitigate these risks is to generate wider awareness of the modelling
process, choices made, good practice for testing and applying models and the interpretation of
model results. Within SEAMLESS-IF these considerations are accounted for in the design of
the framework separating pre- and post-modelling from the modelling phase to which only
integrative modellers have access. Although this reduces the risks of having users with limited
quantitative background running models, it still allows modellers to use tools from other
disciplines without sufficient background information. This latter risk can probably only be
mitigated by having analyses done by a team of modellers from different disciplines, which
may be hard to realize in practice.
Acknowledgements
The authors wish to thank all the participants of SEAMLESS-IF involved in testing as well as
the students who have worked on testing specific components of SEAMLESS-IF.
References
Adelman, L. (1992). Evaluation decision support and expert systems. New York, NY: John
Wiley and Sons.
Béguin, P. & Cerf, M. (2004). Formes et enjeux de l'analyse de l'activité pour la conception
de systèmes de travail. @ctivités, 1, 54-71.
Belhouchette, H., Wery, J., Therond, O., Bergez, J.E. & van Ittersum, M. (2006a). Design of
scenarios for integrated impact assessment of interactions between EU policies and agro-
ecological technologies using SEAMLESS-IF, Proceedings of the IX ESA Congress (part III),
Warsaw (Poland), pp. 11-20.
Belhouchette, J., Wery, J., Therond, O., Duru, M., Bigot, G. et al. (2006b). The major
characteristics of environmental policies and agro-ecological technologies to be studied in
Test case 2, SEAMLESS Report No.13, SEAMLESS integrated project, EU 6th Framework
Programme, contract no. 010036-2, www.SEAMLESS-IP.org, 87 pp.
Belhouchette, H., Therond, O., Adenaeuer, M., Kuiper, M., Bigot, G., AlkanOlsson, J., Wery,
J. et al. (2007). Documentation of baseline and policy options to be assessed with Prototypes
2 and 3, D6.2.3.3, SEAMLESS integrated project, EU 6th Framework Programme, contract
no. 010036-2, www.SEAMLESS-IP.org, 112 pp.
Boehm, B.W. (1981). Software engineering economics. Englewood Cliffs, NJ: Prentice Hall.
Boehm, B.W. (1988). A spiral model of software development and enhancement. Computer,
May, 61-72.
Börjeson, L., Hojer, M., Dreborg, K.H., Ekvall, T. & Finnveden, G. (2006). Scenario types
and techniques: Towards a user's guide. Futures, 38, 723-739.
Brunner, N. & Starkl, M. (2004). Decision aid systems for evaluating sustainability: a critical
survey. Environmental Impact Assessment Review, 24, 441-469.
Darses, F., Détienne, F. & Visser, W. (2004). Les activités de conception et leur assistance. In
P. Falzon (Ed.), Ergonomie (pp. 545-563). Paris: PUF.
Darses, F. (2002). A cognitive analysis of collective decision-making in the participatory
design process. Participatory design conference. Malmö, Sweden.
Dieste, O., Genero, M., Juristo, N., Mate, J.L. & Moreno, A.M. (2003). A conceptual model
completely independent of the implementation paradigm. Journal of Systems and Software,
68, 183-198.
Finlay, P.N. & Wilson, J.M. (1991). Validation for decision support systems: Recent
developments and findings. Systems Practice, 4, 599-610.
Harris, G. (2002). Integrated assessment and modelling: an essential way of doing science.
Environmental Modelling & Software, 17, 201-207.
Jakeman, A.J., Letcher, R.A. & Norton, J.P. (2006). Ten iterative steps in development and
evaluation of environmental models. Environmental Modelling & Software, 21, 602-614.
Langvad, A.M. & Noe, E. (2006). (Re-)innovating tools for decision-support in the light of
farmers’ various strategies. In H. Langeveld & N. Röling (Eds.), Changing European farming
systems for a better future – new visions for rural areas (pp. 335–339). The Netherlands:
Wageningen.
Lopez, M. (2003). Application of an evaluation framework for analyzing the architecture
tradeoff analysis method℠. Journal of Systems and Software, 68, 233-241.
Louhichi, K., Belhouchette, H., Flichman, G., Therond, O. & Wery J. (2008). Application of
FSSIM in two Test Case regions to assess agro-environmental policies at farm and regional
level, PD6.3.2.2, SEAMLESS integrated project, EU 6th Framework Programme, contract no.
010036-2, www.SEAMLESS-IP.org, 67 pp.
Mosqueira-Rey, E. & Moret-Bonillo, V. (2000). Validation of intelligent systems: a critical
study and a tool. Expert Systems with Applications, 18, 1-16.
Oreskes, N., Shrader-Frechette, K. & Belitz, K. (1994). Verification, validation, and
confirmation of numerical models in the earth sciences. Science, 263, 641-646.
Parker, P., Letcher, R.A., Jakeman, A.J., Beck, M.B., Harris, G., Argent, R.M., Hare, M.,
Pahl-Wostl, C., Voinov, A. & Janssen, M. (2002). Progress in integrated assessment and
modelling. Environmental Modelling & Software, 17 (3), 209-217.
Pastre, P. (2005). Apprendre par la simulation: de l’analyse du travail aux apprentissages
professionnels. Octares, collection Formation, 363 p.
Rykiel, E. J. Jr. (1996). Testing ecological models: the meaning of validation. Ecological
Modelling, 90, 229-244.
Rotmans, J. (1998). Methods for IA: The challenges and opportunities ahead. Environmental
Modeling and Assessment, 3, 155-79.
SEC (2005). Impact Assessment Guidelines (791).
http://ec.europa.eu/governance/impact/docs/SEC2005_791_IA_guidelines_main.pdf.
Sharma, M. & Norton, B.G. (2005). A policy decision tool for integrated environmental
assessment. Environmental Science & Policy, 8, 356-366.
Sinclair, T.R. & Seligman, N.G. (2000). Criteria for publishing papers on crop modeling.
Field Crops Research, 68, 165-172.
Sojda, R.S. (2007). Empirical evaluation of decision support systems: Needs, definitions,
potential methods, and an example pertaining to waterfowl management. Environmental
Modelling & Software, 22, 269-277.
Thérond, O., Belhouchette, O., Janssen, S., Louhichi, K., Ewert, F., Bergez, J.E., Wery, J.,
Heckelei, T., Alkan Olsson, J., Leenhardt, D. & van Ittersum, M. (2009). Methodology to
translate policy assessment problems into scenarios: the example of the SEAMLESS
Integrated Framework. Environmental Science & Policy, in press.
Tol, R.S.J. & Vellinga, P. (1998). The European Forum on Integrated Environmental
Assessment. Environmental Modelling and Assessment, 3, 181-191.
Toth, F.L. (2003). State of the art and future challenges for integrated environmental
assessment. Integrated Assessment, 4, 250-264.
van Ittersum, M.K., Gabbert, S., Ewert, F., Norton, J.P., Jakeman, A. J., Pollino, C., Heckelei,
T. & Wallach, D. (2006). Uncertainty analysis in model chains for integrated assessment.
PD1.3.11.2, SEAMLESS integrated project, EU 6th Framework Programme, contract no.
010036-2, www.SEAMLESS-IP.org, 61 pp.
van Ittersum, M., Ewert, F., Heckelei, T., Wery, J., Alkan Olsson, J., Andersen, E.,
Bezlepkina, I., Brouwer, F., Donatelli, M., Flichman, G., Olsson, L., Rizzoli, A., van der Wal,
T., Wien, J.E. & Wolf, J. (2008). Integrated assessment of agricultural systems – a
component-based framework for the European Union (SEAMLESS). Agricultural Systems,
96, 150-165.
van Notten, P.W.F., Rotmans, J., van Asselt, M.B.A. & Rothman, D.S. (2003). An updated
scenario typology. Futures, 35, 423-443.