Evaluating Automatic Warning Cues for Visual Search in Vascular Images
Available from portal.acm.org
Boris W. van Schooten
Faculty of EEMCS
University of Twente
schooten@ewi.utwente.nl
Betsy M.A.G. van Dijk
Faculty of EEMCS
University of Twente
bvdijk@ewi.utwente.nl
Anton Nijholt
Faculty of EEMCS
University of Twente
anijholt@ewi.utwente.nl
Johan H.C. Reiber
Leiden University
Medical Center
j.h.c.reiber@lumc.nl
ABSTRACT
Visual search is a task that is performed in various appli-
cation areas. Search can be aided by an automatic warning
system, which highlights the sections that may contain tar-
gets and require the user’s attention. The effect of imperfect
automatic warnings on overall performance ultimately de-
pends on the interplay between the user and the automatic
warning system. While various user studies exist, they
differ in several experimental variables, including the
nature of the visualisation itself. Studies in the medical
area remain relatively rare, even though there is a growing
interest in medical screening systems. We describe an ex-
periment where users had to perform a visual search on a
vascular structure, traversing a particular vessel linearly in
search of possible errors made in an automatic segmentation.
We find that user time and error performance improve only
in the case where the warning system generates only false
positives. We discuss this finding in relation to the
findings of other studies.
Author Keywords
visual search, automatic warning system, magnetic reso-
nance angiography, image segmentation
ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: User Inter-
faces—Graphical user interfaces (GUI)
General Terms
Human Factors, Performance, Reliability
INTRODUCTION
Visual search tasks are performed in various areas: finding
weapons in x-rayed baggage [4], targets from a moving ve-
hicle [14] or on aerial photographs [10, 11], cancer areas
in mammograms [5, 9], polyps in colonoscopy [6], or low-
credibility areas in automatic medical image segmentations
[8, 7]. In many cases, automatic warning systems have been
devised that highlight potential targets. Such systems are
imperfect: failure may be either a false positive (false alarm)
or false negative (a missed item). A detection system may
be tuned to produce either more false positives or more false
negatives. Some medical systems can be tuned to produce near
zero false positives or negatives [6, 9]. The absence of
false negatives in particular is often seen as a prerequisite
for their medical applicability.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
IUI’10, February 7–10, 2010, Hong Kong, China.
Copyright 2010 ACM 978-1-60558-515-4/10/02...$10.00.
However, the presence of failures (especially false alarms)
in alarm systems (for both visual search tasks and other
tasks) is known to cause problems for users, such as over-
or under-reliance. While various studies have been conducted,
experimental variables vary widely among different appli-
cation areas: the presence or absence of a moving scene
or navigation, the prevalence of false positives or negatives,
whether the search is self-terminating or not (that is, whether
the search ends when the target is found), task difficulty (ex-
amined in [10]), target rarity [4], the level of information
about the system given to the users, and of course the task
itself, which varies widely in nature. While some of these
variables have been examined, most have not, and we can ex-
pect different applications to have quite different outcomes.
These are too many variables to examine all at once, and the
research coverage remains as yet spotty. Examining differ-
ent application areas is still a very meaningful exercise.
We examine a new application involving vascular image
analysis, more specifically, 3D magnetic resonance angiog-
raphy (MRA) segmentation, as performed routinely by ra-
diologists. Vascular segmentation involves determining the
thickness of the inside of the vessel (the lumen), which en-
ables analysis of possible pathological narrowings or widen-
ings. While a vessel is tortuous, it can basically be navigated
linearly (from one end to the other), as can for example the
colon in colonoscopy. So, the task can be characterised as
relatively easy, non-self-terminating, involving simple navi-
gation, with users given information about presence of false
positives or negatives. We examine in particular the effect of
the presence of false positives versus false negatives.
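To make the two failure types concrete, the following is a minimal sketch, not taken from the study itself. It assumes that a vessel is divided into numbered sections, that the warning system flags some of them, and that the ground truth lists the sections that actually contain errors. The function name and the section numbering are hypothetical.

```python
def warning_failures(flagged, erroneous):
    """Compare the sections flagged by a warning system with the
    sections that truly contain errors (both given as section ids)."""
    flagged, erroneous = set(flagged), set(erroneous)
    return {
        # Flagged, but actually correct: false alarms.
        "false_positives": sorted(flagged - erroneous),
        # Erroneous, but not flagged: missed items.
        "false_negatives": sorted(erroneous - flagged),
    }


# A generously tuned system: every true error is flagged, plus some extras.
generous = warning_failures(flagged=[1, 2, 3, 4], erroneous=[2, 3])
# A sparingly tuned system: everything flagged is a true error, but some are missed.
sparing = warning_failures(flagged=[2], erroneous=[2, 3])
```

In this sketch a system whose flagged set is a superset of the true errors produces only false positives, while one whose flagged set is a subset produces only false negatives.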
RELATED WORK
Studies of generic self-terminating target finding tasks with
target highlighting found that imperfect highlighting often
increased rather than decreased overall user response time,
due to suboptimal increase in response time for the cases
where the wrong target was highlighted [3, 12]. For some
non-self-terminating tasks, users were also found to spend
more time double-checking the data in case of false pos-
itives, resulting in increased overall response times in the
presence of warnings [1]. Overall, user performance is sub-
optimal, even when users have a good estimate of the sys-
tem’s reliability [2, 11].
Wickens et al. [13] found that distinction of visual elements
by highlighting helps focussed attention (attention to one tar-
get) but hinders integrative attention (where all targets need
to be interpreted in an integrated way). Another detrimental
effect is called attention tunneling, which means the high-
lights distract the user from seeing other elements in the
scene. Yeh et al. [14] found that, even if highlighting of one
target served to predict with 100% accuracy a target in the
vicinity rather than the highlighted target itself, performance
worsened.
The reliance (or trust) of users on (visual and non-visual)
automatic warnings, as related to the failure rate of the
warning system, has been studied fairly extensively. One
common finding is that false positives are more damaging to
trust and hence performance than false negatives [10]. Maltz
et al. [10] also find that target cueing works best if the
targets are otherwise very difficult to detect.
None of these studies were conducted in the medical do-
main. One of the rare medical studies in this area, done
by Freer et al. [5], seems to contradict some of these find-
ings. It indicates a positive effect on clinical outcome in
a mammogram-reading study with as many as 97.4% false
positives. Freer et al. use a double-reading scheme, taken
from medical practice but used by none of the other studies:
each mammogram is first examined as a plain image, before
the warning highlights are shown, reducing any possible
effect of attention tunneling. Additionally, Freer et al.’s task is
difficult (experts miss 50% or more of targets), unlike most
of the other experiments. This shows that studies in the med-
ical domain may have different outcomes due to differences
in experimental variables, which are implicitly assumed in
the other domains. This makes it worthwhile to study other
medical tasks more closely.
EXPERIMENTAL DESIGN
Our task consists of checking the correctness of automatic
segmentations of vessels in MRA scans. A typical segmen-
tation algorithm determines a vessel’s location by drawing
a line through the (density) center of the vessel, called the
centerline. Then, it determines the thickness of the inside of
the vessel (the lumen) based on the centerline.
We used a software phantom approach. The MRA data is
artificially generated, along with segmentations with artifi-
cially generated segmentation errors. This way it is easy to
generate dozens of cases with a clear distinction between
correct and erroneous, an unambiguous ground truth, and
similar difficulty levels. A vessel is constructed using a sum
of sine waves. Three distractor vessels were added in each
phantom. Thickness of the vessel was varied in a stylized
manner with thinner and thicker areas. When looking at a
cross-section, density in the center of the vessel was highest,
gradually lowering towards the boundaries of the vessel, and
zero outside of the vessel. No noise or other distractors were
added, nor were bifurcations present. See Figure 1.
Figure 1. Illustration of the visual stimuli used. Top left: real-life data.
Top right: typical software phantom as used in our experiment. Bottom:
stimuli as presented to the users. Bottom left: with thickness
error in the center and marked as potential error. Bottom right: with
veering error in the center but not marked.
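The phantom construction described above can be illustrated with a minimal sketch. It is not the generator used in the experiment: the sine amplitudes, frequencies and phases, the vessel length, the sinusoidal thickness variation, and the linear density falloff are all assumptions chosen for illustration, since the text only states that the centerline is a sum of sine waves and that density is highest in the center and zero outside the vessel.

```python
import math

# Hypothetical parameters: (amplitude, cycles along the vessel, phase) per sine term.
X_TERMS = [(10.0, 1.0, 0.0), (3.0, 3.0, 0.5)]
Y_TERMS = [(8.0, 2.0, 1.0), (2.0, 5.0, 0.0)]
LENGTH = 100.0  # assumed extent of the vessel along the z axis


def sine_sum(t, terms):
    """Sum of sine waves evaluated at position t in [0, 1] along the vessel."""
    return sum(a * math.sin(2.0 * math.pi * f * t + p) for a, f, p in terms)


def centerline_point(t):
    """Centerline position: x and y wander as sums of sines, z advances linearly."""
    return (sine_sum(t, X_TERMS), sine_sum(t, Y_TERMS), LENGTH * t)


def radius(t, base=3.0, variation=1.0, cycles=4.0):
    """Stylized thickness with thinner and thicker areas (assumed sinusoidal)."""
    return base + variation * math.sin(2.0 * math.pi * cycles * t)


def density(distance, r):
    """Density in a cross-section: 1 at the center, falling (here linearly,
    an assumption) to 0 at the vessel boundary, and 0 outside the vessel."""
    if distance >= r:
        return 0.0
    return 1.0 - distance / r
```

Sampling `centerline_point` and `radius` at regular values of `t`, and evaluating `density` for each voxel's distance to the nearest centerline sample, would give a volume of the kind described, under the stated assumptions.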
Errors are simply defined as a deviation between the seg-
mentation and the densest parts of the volume. Only three
error types exist: a veering away of the centerline and seg-
mentation from the vessel, the segmentation being thinner
than the vessel, and the segmentation being thicker.
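The three error types can be expressed as a simple check at one centerline sample. This is a sketch under assumptions: the tolerance values and the function name are hypothetical, and the text does not state how large a deviation must be to count as an error.

```python
import math


def classify_error(true_center, true_radius, seg_center, seg_radius,
                   center_tol=0.5, radius_tol=0.2):
    """Classify the deviation between the segmentation and the true vessel
    at one sample. Returns 'veering', 'too_thin', 'too_thick', or None.

    center_tol and radius_tol are assumed tolerances, not values from the study.
    """
    # Veering: the segmented centerline has moved away from the true one.
    if math.dist(true_center, seg_center) > center_tol:
        return "veering"
    # Thickness errors: the segmentation is thinner or thicker than the vessel.
    if seg_radius < true_radius - radius_tol:
        return "too_thin"
    if seg_radius > true_radius + radius_tol:
        return "too_thick"
    return None
```

Because the phantom's true centerline and thickness are known, a check of this kind yields an unambiguous ground truth for every section.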
We use direct volume rendering (DVR) to visualise the vol-
ume data, with a yellow line indicating the centerline, and a
brown mesh indicating the segmentation. The warning sys-
tem highlights parts of the centerline and mesh in red to in-
dicate possible errors.
We chose controls to be as simple as possible without sac-
rificing user control. Control is with the mouse only. One
major choice we made is to base navigation on the center-
line. The camera is always centered around a point on the
centerline, and rotates so that the vessel is viewed from the
side. The centerline is navigated by rolling the mouse wheel,
or by clicking on a centerline point with the middle mouse
button (MMB). The user can specify relative rotation using
a two-axis valuator scheme controlled with the right mouse
button (RMB). The camera is zoomed in close to the ves-
sel so details can be seen clearly. The user can simply click
on a section of the vessel with the left mouse button (LMB)
to indicate a segmentation error. The appropriate section is
highlighted in green.
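The centerline-based navigation can be sketched as a position index along the sampled centerline. The class below is hypothetical: it assumes the centerline is stored as a list of points, that one wheel step moves one point, and that movement is clamped at both ends of the vessel, none of which is specified in the text.

```python
class CenterlineNavigator:
    """Tracks which centerline point the camera is centered on."""

    def __init__(self, n_points):
        if n_points < 1:
            raise ValueError("centerline needs at least one point")
        self.n_points = n_points
        self.index = 0  # start at one end of the vessel

    def _clamp(self, index):
        # Keep the camera on the vessel (assumed behaviour at the ends).
        return max(0, min(self.n_points - 1, index))

    def wheel(self, steps):
        """Mouse wheel: move forward (positive) or backward (negative)."""
        self.index = self._clamp(self.index + steps)
        return self.index

    def jump_to(self, index):
        """Middle mouse button click on a centerline point."""
        self.index = self._clamp(index)
        return self.index
```

Restricting movement to a single index in this way reflects the design choice described above: the user never has to steer the camera freely through the volume.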
We compare user performance (time taken and error rate) for
the following four conditions:
1. NONE - no suspicious areas (baseline)
2. PAR (paranoid suspicious areas) yields only false posi-
tives - the user only has to search within the suspicious
areas
3. CON (conservative suspicious areas) yields only false