
Visualization of Search Results from the World Wide Web

by Thomas M Mann
World Wide Web Internet And Web Information Systems (2002)


Available from kops.ub.uni-konstanz.de


Visualization of Search Results
from the
World Wide Web
Dissertation
submitted in conformity with the requirements for the degree of
Doctor of Natural Sciences (Dr. rer. nat.)
at the University of Konstanz, Germany
Department of Computer and Information Science
by
Thomas M. Mann




Day of the oral examination: 2002-01-10
Referee: Prof. Dr. Harald Reiterer
Referee: Prof. Dr. Wolfgang Pree


Page 2 from 266 Thomas M. Mann
Version 1.07 - 2002-01-30 Visualization of Search Results from the World Wide Web




Visualization of Search Results from the World Wide Web
Thomas M. Mann
Dissertation, Freiburg im Breisgau 2002
University of Konstanz, Germany, Department of Computer and Information Science

Abstract (English)
This thesis explores special forms of presenting search results from the World Wide Web. The use
of Information Visualization methodologies is discussed as an alternative to the usual arrangement
in the form of a static HTML list. The thesis is structured into four main parts. The first part
deals with information seeking. It presents ideas from the literature on how to structure the
information-seeking process and some results from studies of how people search the Web. For the
second part, visualization ideas, metaphors, techniques, components, and systems have been
collected. The overview focuses on the visualization of queries or query attributes, document
attributes, and interdocument similarities. The reference model for visualization from [Card,
Mackinlay, Shneiderman 1999] is used to discuss differences between certain techniques.
Visualization components from a number of areas, usage scenarios, and authors are presented using
a consistent search example wherever possible. The part about Information Visualization also
includes a discussion of multiple coordinated views and some results from empirical evaluations of
visualizations by other authors. The third, empirical part of the thesis presents the results of an
evaluation of five different user interface conditions of a local meta search engine called
INSYDER. An overview covering the INSYDER project in general, the system architecture, and the
development of the implemented visualization ideas is included. In a test with 40 users,
effectiveness, efficiency, expected value, and user satisfaction were measured for twelve tasks.
The evaluated user interface conditions were HTML-List, ResultTable, ScatterPlot plus ResultTable,
BarGraph plus ResultTable, and SegmentView plus ResultTable. The SegmentView included TileBars and
StackedColumns variants. The traditional presentation in the form of an HTML-List performed best
in terms of effectiveness and efficiency. In contrast, the users preferred the ResultTable and the
SegmentView. The last part of the thesis consists of a summary and an outlook.
Abstract (translated from the German)
This dissertation examines special forms of presenting search results from the World Wide Web. It
discusses the use of Information Visualization methods as an alternative to the usual arrangement
in the form of a static HTML list. The work is structured into four main parts. The first part
deals with information seeking. It presents ideas from the literature on how the search process
can be structured, as well as some results from studies of how users search the Web. For the second
part, ideas, metaphors, techniques, components, and systems for visualization were reviewed. The
overview is oriented toward the visualization of queries or query attributes, of document
attributes, and of similarities between documents. The reference model for visualization from
[Card, Mackinlay, Shneiderman 1999] is used to discuss differences between certain techniques.
Visualization components from a number of areas, application scenarios, and authors are presented
using a consistent search example wherever possible. The section on Information Visualization also
includes a discussion of multiple coordinated views and some results from empirical studies of
visualizations by other authors. The third, empirical part of the dissertation presents the results
of a study of five different presentation scenarios of a local meta search engine named INSYDER.
It also contains an overview of the INSYDER project in general, the system architecture, and the
development of the implemented visualizations. In a test with 40 users, effectiveness, efficiency,
expected value, and user satisfaction were measured for twelve tasks. The presentation forms
examined were HTML-List, ResultTable, ScatterPlot plus ResultTable, BarGraph plus ResultTable, and
SegmentView plus ResultTable. The SegmentView consisted of TileBar and StackedColumn variants. The
traditional presentation in the form of an HTML-List showed the best results with regard to
effectiveness and efficiency. In contrast, the users preferred the ResultTable and the SegmentView.
The thesis closes with a summary and an outlook.

Two-Page Summary (translated from the German)
For a brief overview of the content of the thesis, see the preceding abstract. The following two
pages briefly describe the content of the individual chapters.
The Introduction outlines the task domain of searching the Web, with particular attention to
information overload and selection. The use of Information Visualization techniques is proposed as
a possible solution to these problems.
The main chapter on Information seeking is divided into two parts. After a short account of the
differences between search processes on the Web and classical Information Retrieval, ideas from
the literature on how the search process can be structured are presented. The second part presents
some results on how users search the Web. The discussion of possible approaches to structuring
search processes mainly presents models that were developed in the context of classical
Information Retrieval. Particularities of searching the World Wide Web are described. The
discussion of the possible structuring approaches is organized into three levels of granularity:
a) high-level goals, tasks, and strategies, b) functions, phases, and steps of the search process,
and c) low-level tasks, goals, and interface actions. Three approaches by Shneiderman are selected:
the task actions model, the four-phase framework of information seeking, and the TTT data type by
task taxonomy. Because the thesis mainly operates on levels a) and b), the TTT data type by task
taxonomy plays only a minor role in the remainder of the thesis; it was included to round out the
overall picture. The theoretical treatment of the search process is followed by a look at empirical
results on actual search behavior. The section on the question "How do users search in the Web?"
mainly presents the results of four studies in which log files of large search engines were
analyzed: the Excite study by [Jansen, Spink, Bateman et al. 1998], the AltaVista study by
[Silverstein, Henzinger, Marais et al. 1999], the 1998 Fireball study by [Hölscher 1998], and the
1999 Fireball study by [Röttgers 1999]. The most important findings: a query contains about two
search terms on average, and users rarely go beyond the first result page with ten hits. The
chapter closes with some results on differences in Web searching between user groups.
After a brief description of the purposes of Information Visualization, the chapter begins by
introducing a reference model from [Card, Mackinlay, Shneiderman 1999]. The authors structure the
process of mapping raw data via data tables and visual structures to the views that are finally
presented to the user on the screen. The model is used in the remainder of the thesis to structure
overviews of techniques, to classify particular aspects, or to explain the data mappings in the
INSYDER system. A large part of the thesis is devoted to presenting the possibilities of
Information Visualization. The overview focuses on the presentation of search results and examines
the topic from several angles. Metaphors were chosen as the starting point, since they are normally
also meant to ease the user's access to a system. This is followed by a section that describes, at
an abstract level, techniques used in Information Visualization. Then, using a consistent example
wherever possible, numerous ideas for visualizing search results are presented. This
component-oriented presentation is organized into the visualization of queries or query attributes,
the visualization of document attributes, and the visualization of interdocument similarities. For
the visualization of interdocument connections, the reader is referred to other work. The
examination from different angles concludes with a structured list of the systems mentioned. A
discussion of multiple coordinated views follows, including the question of when and how such
concepts should be used. The chapter on Information Visualization ends with some results from
empirical studies of the usefulness of selected visualization approaches and, under the heading
"5T-Environment", a summary of factors that influence the usefulness of visualizations.
The empirical part of the thesis begins with a description of the INSYDER project, in which the
software used for evaluating different forms of search result presentation was developed. It
describes the functions of the system in general, its software architecture, the functions of the
individual software modules, the prototype-based development process, and the first formative
evaluations during the project. A detailed description follows of the implemented visualizations
and of the concrete mapping process from raw data to views. This also covers problems that arose
during this process, as well as several visualizations that, for various reasons, were not
implemented in the final software version. The discussion of the evaluation begins with a
description of the hypotheses and variables and of the test procedure. Effectiveness, efficiency,
expected value, and user satisfaction were examined for the presentation forms HTML-List,
ResultTable, ScatterPlot plus ResultTable, BarGraph plus ResultTable, and SegmentView plus
ResultTable. The test was conducted with 40 users and twelve tasks each in spring 2000 at the
University of Konstanz. The independent variables were presentation form, user group (beginner /
expert), number of search terms (1 / 3 / 8), number of documents presented as results (30 / 500),
and type of task (specific fact finding / extended fact finding). The analysis of the
questionnaires showed that, although users had usability problems with the visualizations in
various places, they very much welcomed the possibilities offered by the ResultTable and the
visualizations in general. The differences in assessment between beginners and experts were small
and, where present at all, mostly concerned the ScatterPlot. When positive and negative ratings are
combined, the ResultTable and the SegmentView score better than the HTML-List, while the BarGraph
and especially the ScatterPlot score worse than the HTML-List. When comparing subjective
assessments with the measured success of the components, it must be kept in mind that the
questionnaire asked about the individual components, whereas in the test the ResultTable was always
additionally available for the three real visualizations and was used by most participants, by some
even more than the visualization itself. With regard to effectiveness, task time, and efficiency,
the traditional presentation in the form of an HTML-List generally showed the best values.
The thesis closes with a Summary and Outlook, which also discusses further evaluations of the
existing components and modified visualization approaches in the form of a SuperTable and an
improved ScatterPlot.

Acknowledgements
It must have been at the end of the 1980s, when I was studying Information Science at the
University of Konstanz (Germany), that I first had the idea of getting a doctor’s degree.
Information Science in Konstanz is unthinkable without Prof. Dr. Rainer Kuhlen, whom I have to
thank for my choice of Information Science. For a number of reasons it took several years until I
really started to take the next step in the direction of a doctor’s degree. Special thanks to
Dr. Wolf R. Dombrowsky from the Katastrophenforschungsstelle at the Christian-Albrechts-Universität
zu Kiel (Germany), who encouraged me in the mid-1990s to turn my vague plans into a real project.
From all the people who contributed in one or another way to me finishing the project, I have to
thank first of all my advisor Prof. Dr. Harald Reiterer, who gave me the chance to be the first
member in his team at the University of Konstanz in 1997. He motivated, stimulated and supported
me for more than four years in the finally successful project. Without him this thesis would never
have been possible. Also thanks to Prof. Dr. Wolfgang Pree from the University of Konstanz for
his interest in my work and for taking over the role of a second referee.
From my colleagues I have to thank especially my “doctor sister” Gabriela Mußler, who, together
with Harald, was the driving force behind the University of Konstanz getting the chance to
participate in the INSYDER project. This was the basis for my evaluation work. Congratulations to
her for finishing her thesis about the INSYDER project at her new home in Penistone, Sheffield
(England). Also special thanks to Siegfried Handschuh. As chief architect of the contributions from
the University of Konstanz to the INSYDER system development, he was one of the main factors in
turning my visualization ideas into concrete pieces of software. In this context I also have to
thank Georg Odenthal from the University of Konstanz and Laurent Dosdat from Arisem S.A., Paris
(France), for the time they spent developing the INSYDER system.
At the University of Konstanz a number of additional colleagues and students helped with the
INSYDER project, the evaluation performed, or my work in general. Just to list the most important
ones: Dagmar Michels, Dr. Marc Rittberger, Dr. Wolfgang Semar, Dr. Ulrik Brandes, Ersin Kurun, and
especially the evaluation team Dietmar Ohlmann, Edgar Fiederer, Edgar Sprethuber, Joachim Griesbau,
and Ludmilla Bernet. Thanks also to the forty participants of the study.
From the other members of the INSYDER team I want to especially thank Alain Garnier, Olivier
Spinelli and Jean Ferrè from Arisem, Rina Angeletti from Innova (Rome, Italy), Flavia D’Auria
from Promoroma (Rome, Italy), Carlo Revelli and Guillaume Lory from Cybion (Paris, France)
and last but not least the European Commission DG III (Brussels, Belgium) with Patrick Corsi
who funded the project.
Thank you to John V. Cugini, Dr. Christoph Hölscher, Heike Röttgers, and others mentioned above
for providing me with helpful material.
Special thanks to Dr. Bertrand Lisbach from Basel (Switzerland) for support in statistics, as well as
to Charlie Smith, Malcolm MacLaren, Beate Heckner, and a number of other people for reading
preliminary versions of the thesis and providing me with helpful advice.
Last but not least I want to thank Veronika for years of love, patience, and support.

Contents
1. Introduction
1.1. Problem
1.2. Solution
1.3. Structure of the Thesis
2. Information seeking
2.1. Information Retrieval
2.2. Structuring the information-seeking process
2.2.1. High-level goals, tasks, and strategies
2.2.2. Functions, phases, and steps of searching
2.2.3. Low-level tasks, goals, and interface actions
2.3. How do users search in the Web?
2.3.1. General trends
2.3.2. User group differences
2.4. Summary of the chapter about Information Seeking
3. Information Visualization
3.1. The ideas behind Information Visualization
3.2. The reference model for visualization
3.3. State of the Art: Visualization Ideas, Metaphors, Techniques, Components and Systems
3.3.1. Metaphors
3.3.2. Techniques
3.3.2.1. Brushing and linking
3.3.2.2. Panning and zooming
3.3.2.3. Focus-plus-context
3.3.2.4. Magic Lenses
3.3.2.5. Animation
3.3.2.6. Overview plus detail
3.3.3. Components
3.3.3.1. Visualization of queries or query attributes
3.3.3.2. Visualization of document attributes
3.3.3.3. Visualization of interdocument similarities
3.3.3.4. Visualization of interdocument connections
3.3.4. Systems
3.4. State of the Art: Multiple Coordinated Views
3.5. Empirical evaluation of visualizations
3.6. Influencing Factors: 5T-Environment
4. INSYDER
4.1. The INSYDER project
4.1.1. Functions of the INSYDER system
4.1.2. Architecture and Implementation
4.1.3. Software development and prototypes
4.1.4. Formative evaluation during the project
4.2. The INSYDER visualizations
4.2.1. Ideas behind the INSYDER visualization components
4.2.2. INSYDER and the reference model for visualization
4.2.3. The INSYDER visualization components
4.3. Evaluation of the visualizations
4.3.1. Hypotheses
4.3.2. Independent Variables
4.3.2.1. User Interface
4.3.2.2. Target User Group
4.3.2.3. Type and number of data
4.3.2.4. Task
4.3.3. Static Variables
4.3.3.1. Technical Environment
4.3.3.2. Training
4.3.4. Dependent Variables
4.3.4.1. Effectiveness
4.3.4.2. Task time
4.3.4.3. Temporal efficiency
4.3.4.4. Expected added value
4.3.4.5. Satisfaction
4.3.5. Procedure
4.3.5.1. Pre-test
4.3.5.2. Entry Questionnaire
4.3.5.3. ScreenCam introduction
4.3.5.4. Warm-up Phase
4.3.5.5. 12 Tasks
4.3.5.6. Questionnaire
4.3.6. Evaluation: results
4.3.6.1. Expected added value
4.3.6.2. User Satisfaction
4.3.6.3. Hard Facts
4.3.6.4. Summary of the hard facts results
5. Summary and Outlook
6. References
7. Index of figures and tables
7.1. Figures
7.2. Tables
8. Appendix
8.1. Tasks
8.2. Additional figures from the hard facts
8.3. Additional inferential statistics
8.4. INSYDER function Mindmap




1. Introduction

gine Excite [Jansen, Spink, Bateman et al. 1998]^5 showed that users normally do not look at
more than the first 20 or 30 results presented^6, ^7 in a session. Other studies report similar
measures^8 or even lower numbers of hit pages viewed by the users^9 when looking at the query level.
People seem to do what Zimmer demands: if the result set is too large, rejection is the reaction.
If the information-seeking process is regarded as a multi-step selection process (the user decides
to look for the needed information on the Internet, selects a search or meta search engine, chooses
the keywords and search options, and launches the search), then in the step of reviewing the result
set the next selections depend heavily on one dimension of the attributes of the results: the
ordering of the result set, which in most cases is the relevance measure calculated by the search
engine. Especially for large, unstructured result sets with non-transparent ranking criteria, the
distillation of relevant information in this step of the search will be more or less the result of
pure rejection rather than of a logic-based selection. Because they are all based on examinations
of the search engines' log files, the studies about Web searching cited above say nothing about
which of the documents on the first three result pages are really viewed by the users. So the
selection from this maximum of 10 to 30 documents could be based on a number of other dimensions
shown in the result pages, such as title, abstract, size or age of the document, or the server
where it resides, but in any case most users rejected all documents ranked 31st or lower in the
result set. The numbers regarding Web searching should not be overinterpreted because of a number
of limitations these studies have^10. But assuming that people do not examine all hits of large
result sets, and despite all efforts to improve the process of getting the result set and the
ranking of its items, the ranking could be a bottleneck for the selection or rejection decision of
the user. This is independent of how many criteria or dimensions are taken into account when
calculating the relevance value.
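As a worked illustration of the Excite figures in footnote 5 (a sketch added for clarity, not part
of the cited studies), the cumulative percentages can be converted into the share of users who
stopped at exactly one, two, or three result pages. The function name and the dictionary layout
are assumptions made for this example.

```python
# Cumulative share (in percent) of Excite users who viewed at most n result
# pages of 10 hits each, taken from footnote 5.
CUMULATIVE_PERCENT = {1: 58, 2: 77, 3: 86}


def page_depth_distribution(cumulative):
    """Convert 'at most n pages' percentages into 'exactly n pages' percentages.

    Hypothetical helper for illustration only.
    """
    distribution = {}
    previous = 0
    for pages in sorted(cumulative):
        # Users who stopped at exactly this depth.
        distribution[pages] = cumulative[pages] - previous
        previous = cumulative[pages]
    # Users not covered by the deepest cumulative value went further.
    distribution["more"] = 100 - previous
    return distribution


if __name__ == "__main__":
    for depth, share in page_depth_distribution(CUMULATIVE_PERCENT).items():
        print(f"pages viewed: {depth}, share of users: {share}%")
```

Read this way, only the remaining share of users beyond the third page ever saw a document ranked
below position 30, which is the rejection effect described above.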
In the INSYDER project, which is the basis for the work discussed here, a lot of effort has been
spent on supporting the user on his way from his information demand to the result set and the best
possible ranking of the documents in the result set (for details see [Mußler, Reiterer, Mann 2000],
[Mußler 2002]). But despite all the work in this area undertaken in this project and many others,
the question remains whether different presentations of the result set, which break up the
traditional sequential ordering mostly based on relevance ranking, will help the user to satisfy
his information demand faster, better, or in a more satisfying way.

5 86% of 18,113 users viewed not more than three result pages from Excite with 10 hits each, 77% not more than
two and 58% not more than one
6 Preliminary Version of [Jansen, Spink, Bateman et al. 1998a] cited by [Amento, Hill, Terveen et al. 1999]:
“showed that 86% of all users looked at no more than 30 pages”
7 [Jansen, Spink, Bateman et al. 1998] cited by [Heidorn, Cui 2000]: “study showed that 58% of users do not look
beyond the first 10 titles and 77% do not look beyond the first 20”
8 [Xu 1999] cited by [Spink, Xu 2000] from 1996 to 1999 over 70% of Excite users viewed not more than one re-
sult page with 10 hits each
9 [Silverstein, Henzinger, Marais et al. 1999] in 95.7% of nearly 1 billion requests the users viewed not more than
three result pages from AltaVista with 10 hits each, 92.7% not more than two and 85.2% not more than one
10 For example, the frequently cited study of Jansen et al. is based on data collected from one search engine during
a couple of hours on a single day, and Silverstein, Henzinger, Marais et al. mention that they could not distinguish
requests by robots from requests by humans.
ideas of how to use visualization for information handling purposes has exploded over the last few
years. On the other hand the number of experimental verifications of how helpful these ideas really
are is relatively low. Additionally the usage of information visualizations inherently carries or
requires some of the other possibilities like direct manipulation or short response times. This is
also evident when looking at the above listed principles for visual information-seeking by Ahlberg
and Shneiderman, which by the way were derived by taking the principles of direct manipulation
as a starting point. The above mentioned INSYDER project offered an ideal test bed to implement
some ideas out of the huge field of IV ideas, and really test their effects when used to support users
in handling result sets of Web searches. The theoretical background, the rationales behind the user
interface design choices and implementation, the design of the performed user study and its results
will be described in the remainder of this thesis.
1.3. Structure of the Thesis
The remainder of this thesis will start in Chapter 2 with a brief discussion of the information-
seeking process. The relation to classical information retrieval will be exposed. Different models
used to structure the information-seeking process in phases or tasks will be shown. Focusing on
the application domain information seeking in the Web, the chapter will close with some notes
regarding what is known about how users search in the Web.
Chapter 3 is dedicated to Information Visualization. An introduction of a reference model for
information visualization is followed by an overview of the state of the art of Information
Visualization, structured into metaphors, techniques, components, and systems. The chapter is
focused on visualizations of abstract data. The special case of multiple coordinated views will be
addressed in a separate sub-chapter. The main chapter about IV will close with a discussion of
empirical evaluations of visualization ideas and a compilation of crucial factors for the
usefulness of visualizations.
Chapter 4 begins with a description of the INSYDER project and software as a framework for the
evaluations which are the basis for the results presented in this thesis. The implemented visualiza-
tions are presented and discussed in detail. After a description of the ideas behind the evaluation,
the hypothesis, the variables, and the procedure, the findings will be thoroughly presented and
discussed.
The thesis will conclude with a summary and outlook, followed by a reference list, an index of the
figures and tables, and some additional information in the appendix.
Figure 1 shows the structure of the thesis with its main parts.
Introduction page 11 - 18
Information seeking page 18 - 46
    Structuring the information-seeking process page 19 - 29
    How do users search in the Web? page 29 - 46
Information Visualization page 46 - 129
    State of the Art: Visualization Ideas, Metaphors, Techniques, Components and Systems page 49 - 117
    State of the Art: Multiple Coordinated Views page 117 - 121
    Empirical evaluation of visualizations page 121 - 127
INSYDER page 129 - 223
    The INSYDER project page 129 - 138
    The INSYDER visualizations page 139 - 157
    Evaluation of the visualizations page 157 - 223
Summary and Outlook page 223 - 232
References page 232 - 251

Figure 1: Structure of the thesis (main parts)
2. Information seeking
2.1. Information Retrieval
Searching for information in the World Wide Web today has a number of elements in common
with classical Information Retrieval (IR). Basically, in both cases the user has an information need
that is to be satisfied by using an (online) search system. In Chapter 2.2 “Structuring the
information-seeking process”, we will see what the structural differences are when we try to model the
information-seeking process for classical Information Retrieval or Internet searching. In line
with the common elements of the search process, Internet search engines use a number of
principles and methods developed in the long history of IR. However, there are also a number of
important differences that have to be taken into account when working in this field. “Internet
searching is very different then IR searching as traditionally practiced and researched. Internet IR is a
different IR.” [Jansen, Spink, Bateman et al. 1998]. Especially when looking for research results
from Information Retrieval to draw conclusions for Internet searching, there are a number of
points which have to be considered14:
• Classical IR in the past often dealt with bibliographic citations. Internet searching is mainly
full text searching15.
• Many of the classical IR studies in the past were performed with systems using pure Boo-
lean logic. Internet search engines mainly use statistical ranking methods16.
• A near miss in classical IR was often a miss, due to absent hyperlink possibilities in the
document collection. When searching the Internet, a near miss can sometimes lead to a needed
document by following a hyperlink.
• Precision may play a different role in Web retrieval than in classical IR [Eastman 1999]17.
• Many of the classical IR studies focus on professional intermediaries like librarians. Inter-
net searching is mainly end user searching.
• IR systems used in earlier times in classical IR studies often had command line based inter-
faces. Internet searching nowadays means at least form fill-in or hyperlink-environments,
sometimes even direct manipulation interfaces. A number of studies in the classical IR-
environments during the last few years also used these types of interfaces. Here it is impor-
tant to assess under which conditions reported results and conclusions arose.

14 Most of the points taken from [Hearst 1999]. Complemented with my own considerations. The goal of Hearst’s
listing is a comparison between earlier IR interface studies and “modern information access”. Nevertheless many of
the points are true for a comparison of a large part of the IR-research described in the literature and “internet search-
ing”.
15 “Full text” does not mean the full text of the Internet, but the full text of the documents in the fraction of the
Internet covered by the used search engine(s).
16 Many of them have additional Boolean options, but the statistical ranking is nearly always present. Sometimes
these statistical ranking methods are not only concentrated on the query-document-relation itself, but also process
information like the number of references from other pages or sites.
17 [Eastman 1999] made her students perform exercises in Web search, to demonstrate a well-known effect from
classical IR: more precise and narrower searches lead to fewer hits and better results. This was not always reliable for
the searches the students performed using popular Web search engines. A reexamination of a number of searches
confirmed this observation.
model can be a “step” in another, and what one author classifies as a “goal” is a “task” for another.
When talking about specific terms like “tasks”, the granularity of a certain task can range from
“information retrieval” in general to “compare within entities” as a specific low-level task. The
same is true for goals, where the level can range from “monitoring a well known topic over time”
to “accurate value lookup”. The next three sub-chapters try to structure the field and distill out a
framework that can be used as a guideline for system design and evaluation.
2.2.1. High-level goals, tasks, and strategies
The common starting point of nearly all interaction-process- or phase-models of the information-
seeking process is that there is always a user information need at the beginning. This starting situa-
tion is often characterized in the IR literature as an anomalous state of knowledge (ASK) [Belkin
1980] / [Belkin, Oddy, Brooks 1982] / [Belkin, Oddy, Brooks 1982a]. Derived from the informa-
tion need, the user will have one or more goals explicitly formulated, or implicitly in mind behind
his actions. [Hearst 1999] lists “finding a plumber”, “keeping informed about a business
competitor”, “writing a publishable scholarly article”, and “investigating an allegation of fraud” as
examples of goals. From these goals Hearst derives the information access tasks that are used to
achieve them. These tasks can range from asking specific questions to exhaustively researching a
topic. A task example she cites from [O’Day, Jeffries 1993] is “monitoring a well-known topic
over time”. This task could, for example, be developed from the goal of keeping informed about a
business competitor. From the tasks Hearst arrives at a model of interaction, where the information
need is the starting point, followed by different steps like “select a system and collection
to search on” or “formulate a query”.
Whereas Hearst’s tasks are dependent on the user’s goals, [Goldstein, Roth 1994] developed a
model for data exploration where the goals are dependent on the user’s task. However, the authors
write: “… we classified the types of interactive data exploration tasks (goals) that users will
perform …”. Under data manipulation tasks they list, for example, goals such as “controlling scope” or
“choosing level of detail”. Goals at the same level of detail can also be found in other contexts,
for example “accurate value lookup” or “comparison of values” in [Roth, Mattis 1990]. This
type of goal will be classified here as a low-level task, and will be discussed later in Chapter 2.2.3
“Low-level tasks, goals, and interface actions”.
At the same granularity as the information access tasks listed by Hearst, [Shneiderman 1998]
differentiates four types of “task actions” listed in Table 2.
Task actions
Specific fact-finding (known-item search)
Extended fact-finding
Open-ended browsing
Exploration of availability.
Table 2: Task actions according to [Shneiderman 1998]
The two fact-finding tasks both produce clear and replicable outcomes. The main difference
between these two types is that in the first case there is a clear stopping criterion: the user finds a
document that answers the question. In the second case there is no such clear criterion for stopping
the examination of a result set or the overall search, and therefore the investigation of a
result set or the complete information-seeking process will be much broader in scope and possibly
of longer duration. Even more open and unstructured are the remaining two task actions,
open-ended browsing and exploration of availability. Trying to fit Hearst’s goal examples into this
classification, “finding a plumber” can lead to a specific fact-finding task. Shneiderman’s corresponding
example is “Find the telephone number of Bill Clinton”. Hearst’s “keeping informed about a
business competitor” could lead to an extended fact-finding task or to open-ended browsing. Here the
corresponding examples from Shneiderman are “What genres of music is Sony publishing?” for
extended fact-finding and “Is there new work on voice recognition being reported from Japan?”
for open-ended browsing. For the remaining example goals from Hearst, “writing a publishable
scholarly article” and “investigating an allegation of fraud”, the first task action will probably be
an exploration of availability, possibly followed later by more specific task actions. A
comparison of the information access tasks by [Hearst 1999] and the task actions by [Shneiderman 1998]
is shown in Figure 2.
Scale: from readily identifiable outcome to openness
[Hearst 1999]: Asking specific questions; Exhaustively researching a topic
[Shneiderman 1998]: Specific fact finding (known item search); Extended fact finding; Open-ended
browsing; Exploration of availability

Figure 2: High-level tasks by [Hearst 1999] and [Shneiderman 1998]
[Shneiderman 1998] points out that the task actions are broken down into browsing or searching.
In a next step browsing and searching are represented by interface actions like scrolling or zoom-
ing. But before we reach this level of detail two other points should be discussed in more depth:
information-seeking strategies and phases or steps of searching.
Using again the “finding a plumber” example, there are different possibilities to fulfill the
information need. [Baeza-Yates, Ribeiro-Neto 1999] emphasize, for the use of a retrieval system in
ASK situations, the distinction between two different types of strategies: information or data retrieval on
the one hand and browsing on the other. In fact, they categorize retrieval and browsing as two
different types of tasks. The general distinction between searching (sometimes also named direct
querying or retrieval by specification) and browsing (sometimes also named scanning or retrieval
by recognition) is very common in the literature. As shown above, Shneiderman makes the same
distinction, although without directly using the term “task” on this level. Because the term task is used
in such an inflationary way by many authors, it seems more appropriate to classify these
different types of behavior as strategies, as done for example by [Henninger, Belkin 1996]. Taking a
closer look at information-seeking strategies, [Belkin, Marchetti, Cool 1993] and [Belkin, Cool,
Stein et al. 1995] try to structure the field by defining a multi-dimensional space of
information-seeking strategies. For this purpose they use four dimensions: method of interaction (scanning vs.
searching), mode of retrieval (recognition vs. specification), goal of interaction (learning vs.
selecting), and resource considered (information vs. meta information). With these dimensions they
create a matrix that shows the possible combinations in the form of sixteen different
Information-Seeking Strategies (ISS). Table 3 shows a selection of the most interesting ISSs in the context of
this thesis.
ISS | Method of Interaction | Mode of Retrieval | Goal of Interaction | Resource Considered
ISS5 | Scan | Recognize | Select | Information
ISS7 | Scan | Specify | Select | Information
ISS13 | Search | Recognize | Select | Information
ISS15 | Search | Specify | Select | Information
Table 3: Examples of Information-Seeking Strategies ISS according to [Belkin, Marchetti, Cool 1993] and
[Belkin, Cool, Stein et al. 1995]
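The sixteen strategies follow mechanically from the four two-valued dimensions. The following
Python sketch is an illustration only and not taken from Belkin et al.: the numbering scheme
(method of interaction varying slowest, then goal, then mode, then resource) is an assumption,
chosen because it reproduces the four rows of Table 3, and the function name enumerate_iss is
hypothetical.

```python
from itertools import product

# Dimension values. The nesting order used below is an assumption that
# reproduces the four ISS numbers listed in Table 3.
METHODS = ("Scan", "Search")
GOALS = ("Learn", "Select")
MODES = ("Recognize", "Specify")
RESOURCES = ("Information", "Meta-information")


def enumerate_iss():
    """Return {iss_number: (method, mode, goal, resource)} for all 16 combinations."""
    strategies = {}
    combinations = product(METHODS, GOALS, MODES, RESOURCES)
    for number, (method, goal, mode, resource) in enumerate(combinations, start=1):
        # Store the values in the column order used by Table 3.
        strategies[number] = (method, mode, goal, resource)
    return strategies


iss = enumerate_iss()
for n in (5, 7, 13, 15):
    print(f"ISS{n}", *iss[n])
```

Under this assumed numbering, the four printed rows agree with the rows of Table 3.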
The goal of interaction as a dimension of the matrix created by Belkin et al. focuses on the
retrieval system used. The two modes are “learn” and “select”. For the resource considered, the
distinction between “information” and “meta information” is a classical IR category. The subtle
differentiation between method of interaction and mode of retrieval is particularly interesting. The
authors point out that scanning is typically associated with retrieval by recognition, and searching
with retrieval by specification, but they present examples where this typical connection is broken
up. Another important point Belkin et al. emphasize is that the ISS may change during an
information-seeking episode. Depending on previous knowledge, the user will start an
information-seeking process with a certain strategy. Getting the first results may cause him to change this
strategy. The next set of results may cause another change, and so on. The idea that information seeking
is not always a straightforward process with one best strategy can also be found in other models.
One of the most famous ones, which also emphasizes the diversity of strategies, is the berrypicking
model of [Bates 1989]. She also points out that it is not only the strategy that may change, but also
the information need itself. Another important message from Bates is that the information need
may not be satisfied by a single, final retrieved set of documents. All or part of the information
chunks found on the way may also contribute to satisfying the information need(s). Bates lists six
widely used information-seeking strategies: footnote chasing or backward chaining, citation
searching or forward chaining, journal run, area scanning, subject search in bibliographies and
abstracting and indexing services, and author searching. These strategies, as parts of the
berrypicking model, were observed when people used manual sources. At the end of the 1980s, Bates had
great expectations that hypertext approaches would be ideal for berrypicking. What was true for
hypertext will also be true for the Web as the largest hypertext formed so far.
The findings of Bates are supported by a number of authors like [O’Day, Jeffries 1993] or [Hearst
1999]. The former studied the use of information search results by fifteen regular clients of profes-
sional intermediaries. As shown above, Web searching is mainly end-user-searching. Nevertheless,
the patterns they found for mediated searches may also occur in Internet searching. They classified
three basic search modes: monitoring, planned, and exploratory. Or in more detail: monitoring a
well-known topic or set of variables over time, following an information-gathering plan suggested
by a typical approach to the task at hand, and exploring a topic in an undirected fashion. In
addition they identified patterns of interconnected searches. They established that the accumulation of
search results, not only the final result set, had value for the end-users, and this even for mediated
searches. It may be even more the case for end-user searching.
Focusing back on the Internet, [Baeza-Yates, Ribeiro-Neto 1999] expand their two tasks listed
above, retrieval and browsing, to three basic forms of searching for information in the Web:
the use of search engines, which index a portion of the Web documents as a full-text database, the
use of Web directories, which classify selected Web documents by subject, and the exploitation of
the hyperlink structure of the Web for search purposes. In fact we have three different strategies
Shneiderman’s task action model [Shneiderman 1998] shown in Table 2 on page 20 will be fo-
cused on as a concrete task model in the remainder of this thesis. The content area of this thesis is
the visualization of search results; therefore the next chapter, discussing lower levels of abstrac-
tion, will concentrate mainly on the aspects of searching as a strategy, despite the fact that there
are a number of other possibilities which can be used in fulfilling an information need.
2.2.2. Functions, phases, and steps of searching
When concentrating on searching, the information-seeking process can be broken down into a
number of finer-grained functions, phases or steps. A well-known model for doing this, especially
targeted at end-user information seeking, is proposed by [Marchionini 1992]. It consists of the
following five functions: Define the problem, Select the source, Articulate the problem, Examine
the results, and Extract information. Like many other authors20 Marchionini points out that the
overall process is iterative. To accentuate this, he represents the functions in the corresponding
figure in a nonlinear way as shown in Figure 3.
[Figure: the five functions Define Problem, Select Source, Articulate Problem, Examine Results, and
Extract Information, arranged in a nonlinear layout]

Figure 3: Information seeking functions according to [Marchionini 1992] p. 157 FIG. 1.
The representation is without doubt nonlinear, but it does not clearly show what
Marchionini himself explains as: “recognizing and defining an information problem initiates
information seeking” [Marchionini 1992]. This initiation as a starting point is better depicted by a
revision of this model undertaken in [Marchionini 1997], and shown in Figure 4. The fact that the
process starts at a certain point with an information need is also shown in a figure used by [Hearst
1999] to show a standard process as a sequence of steps. It is reproduced here in Figure 5. The
revised model by [Marchionini 1997] contains the following steps: Recognize and accept an
information problem => Define and understand the problem => Choose a search system =>
Formulate a query => Execute search => Examine results => Extract information => Reflect / Iterate /
Stop. Comparing the figures from Marchionini and Hearst, the main functions from Marchionini
can be found as steps in Hearst’s diagram, except “select source”. Interestingly enough, the step is
listed in her textual description: “(1) Start with information need. (2) Select a system and
collections to search on. (3) Formulate a query. (4) Send the query to the system. (5) Receive the
results in the form of information items. (6) Scan, evaluate, and interpret the results. (7) Either stop,
or, (8) Reformulate the query and go to step 4.” [Hearst 1999]. After introducing the “standard”
process, Hearst too emphasizes the non-linearity of the overall process and, furthermore, points out
that a number of points, like the role of scanning and navigation, are not represented in the
model. Supporting Bates, she also de-emphasizes the role of the final result set and states that
accumulated learning and acquisition of information occurring during the search process is the main
value of the search.
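The eight steps quoted from Hearst can be read as a simple loop. The sketch below is a toy
illustration and not part of [Hearst 1999]: the small in-memory collection, the term-overlap
matching, and the names search, seek, is_satisfied, and reformulate are all assumptions. It keeps
every item seen along the way, reflecting the point that the accumulated results, and not only
the final result set, carry value.

```python
# Toy collection standing in for step 2 ("select a system and collections").
COLLECTION = {
    "d1": "plumber services in konstanz",
    "d2": "history of plumbing",
    "d3": "emergency plumber phone number",
}


def search(query):
    """Steps 4-5: send the query and receive the ids of matching items."""
    terms = set(query.split())
    return [doc_id for doc_id, text in COLLECTION.items()
            if terms & set(text.split())]


def seek(initial_query, is_satisfied, reformulate, max_rounds=5):
    """Run the standard loop; return (queries issued, accumulated item ids)."""
    query = initial_query                     # step 3: formulate a query
    issued, accumulated = [], []
    for _ in range(max_rounds):
        issued.append(query)
        results = search(query)               # steps 4-5: send, receive
        accumulated += [r for r in results if r not in accumulated]
        if is_satisfied(results):             # steps 6-7: evaluate, possibly stop
            break
        query = reformulate(query)            # step 8: reformulate, back to step 4
    return issued, accumulated


queries, seen = seek(
    "plumbing",
    is_satisfied=lambda results: "d3" in results,
    reformulate=lambda q: "plumber phone",
)
print(queries, seen)
```

In this toy run the first query returns only a near miss (d2), the reformulated query finds the
wanted item (d3), and all three items remain available in the accumulated list.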

20 E.g. [Shneiderman 1998] or [Hearst 1999]
interesting when dealing with the metadata of the documents or facts drawn from the documents.
But even when we look at broader scope information searches, like those undertaken in the
previously cited study of [O’Day, Jeffries 1993], we can identify some comparable tasks or low-level
goals. [O’Day, Jeffries 1993] refer to these as analysis techniques, and state that about 80% of the 80
analysis examples they researched fell into one of the six categories listed in Table 9.
Analysis technique
Looking for trends or correlations
Making comparisons of the different pieces of the data set
Experimenting with different aggregates and/or scaling
Identifying a critical subset of relevant or unique items
Making assessments
Interpreting data to find meaning in terms of domain or problem concepts
Table 9: Analysis techniques according to [O’Day, Jeffries 1993]
Focusing on the data manipulation aspects of interactive data exploration, [Goldstein, Roth 1994]
categorized users’ goals by examining users’ tasks as shown in Table 10. Additionally they break
down the goals into operations as shown in the third column.
Task | Goal | Operation
Data manipulation | Controlling scope | Filter data using attribute(s)
Data manipulation | Controlling scope | Select multiple disjunctive subsets
Data manipulation | Selecting focus of attention | Select attribute(s) for viewing operations
Data manipulation | Selecting focus of attention | Select attribute(s) for level of detail operations
Data manipulation | Selecting focus of attention | Select attribute(s) from existing attributes
Data manipulation | Choosing level of detail | Predefined aggregation & decomposition
Data manipulation | Choosing level of detail | Flexible aggregation & decomposition
Data analysis | No detailed goals or operations defined by the authors. An example listed is obtaining statistics on portions of the data.
Data visualization | No detailed goals or operations defined by the authors. Examples listed are requirements and specifications for viewing the data through appropriate visualizations.
Table 10: User tasks and goals in interactive data exploration according to [Goldstein, Roth 1994]
Another model, especially for visual environments, comes from [Wehrend, Lewis 1990]. They
describe domain-independent operation classes that users might perform. The operation classes
they list are shown in Table 11.
Task
Identify
Locate
Distinguish
Categorize
Cluster
Distribution
Rank
Compare within relations
Compare between relations
Associate
Correlate
Table 11: Operation classes in a visual environment according to [Wehrend, Lewis 1990], Table 2
To complete the overview of the scope within which tasks can be structured, a last model
specially dedicated to document spaces will be introduced: Navigation tasks taken as an assumption
by [Spring, Morse, Heo 1996] to develop the CASCADE system are shown in Table 12.
Navigation tasks
Finding groups of objects of interest
Finding specific objects of interest
Following interesting paths
Tentative exploration of objects of given attributes
Table 12: Navigation tasks in a document space taken as an assumption by [Spring, Morse, Heo 1996] to de-
velop the CASCADE system.
What is the insight from this chapter? On higher levels of abstraction of the information-seeking
process it was, with a certain amount of effort, still possible to draw a common set of conclusions
from the different models and findings or to compare them. On the plane of low-level tasks, goals
and interface actions this is much more difficult. In the rest of this thesis Shneiderman’s task clas-
sification [Shneiderman 1998] will be used where necessary. On the one hand its level of granular-
ity is high enough to grasp a broad scope of problems. On the other hand it is seamlessly integrated
in the higher levels of abstraction by Shneiderman et al. already chosen above, like the four-phase
framework or the task actions model. In the remainder of this work, the discussion will concentrate
mainly on the area of high-level goals, tasks, and strategies or functions, phases, and steps of
searching. Low-level tasks, goals, and interface actions will play only a subordinate role, except
for a later introduction of visualization techniques. Nevertheless, this chapter has been included to
provide a summary of the overall view.
2.3. How do users search in the Web?
After the introduction of theoretical models for the information-seeking process it will be interest-
ing to know what people are really doing when they search the Web. A number of findings have
already been mentioned in the introduction in establishing the framework for this thesis, and in
Chapter 2.1 “Information Retrieval” to show the differences between classical Information Re-
trieval and Web searching. In the following chapter we will have a closer look at a number of stud-
ies about people searching the Web.
Research on Web searching is in its infancy [Jansen, Pooch 2000]. When investigating the search
behavior of World Wide Web users, a number of interesting questions arise: Who are the users
of the Web? What are they looking for? Where do they search? How do they search? As well as
being of scientific interest, some of the answers to these questions are of high economic value,
due to the high economic impact of the
World Wide Web. So it is, for example, very interesting for the advertising industry to know what
the characteristics of the user population of a certain search engine are or which keywords people
use the most. Due to this economic value, not all of the material collected is affordable for
someone who is writing a thesis. This is made more frustrating by the fact that things are changing fast.
This change does not only involve the size of the Web or the number of users. With a growth in
user population, a change in its composition follows. In summary, we are looking here at results from
a research discipline which is at an incipient stage, where not all of the already collected material
is available, and where we are trying to study a fast moving and evolving target. Therefore the
results discussed subsequently can only be a spotlight on the overall field of Web searching. Never-
In the literature, the usage of concepts like “query”, “request” or “activity” is not always 100%
consistent. A number of authors try to clarify their wording for their studies. Unfortunately this
clarity is not always really present. In what follows, the attempt will nevertheless be made to ho-
mogenize the usage of the terms, at least in the context of this thesis. “Request” will be used for
one transaction record in a log file (called “activity” by [He, Göker 2000]), being a query, a unique
query, a modified query, an identical query, a null query, or a request for additional result screens.
“Query” will be used in the way proposed by [He, Göker 2000] only for “forming and modifying a
search statement”. Sending it to the search engine as a request is included in this definition. So the
broader usage of “query”, like that of Jansen et al., is narrowed. One search with no change of the
search string and three result screens viewed will be one query but three requests. For cases where
it is not clear whether the authors are talking about queries or requests, the wording “queries / re-
quests” is used.
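The distinction between requests and queries can be made concrete with a small counting sketch.
It assumes a hypothetical, simplified log format (one search string per transaction record of a
single user, in log order) and counts a new query whenever the search string differs from the
previous record. This is only an illustration of the terminology defined above; real log files
such as the Excite data need more care, and null queries are not treated specially here.

```python
def count_requests_and_queries(records):
    """Count requests (all records) and queries (changes of the search string).

    `records` is a list of search strings in log order for one user; this
    flat format is an assumption made for illustration.
    """
    requests = len(records)
    queries = 0
    previous = None
    for search_string in records:
        if search_string != previous:   # a search statement was formed or modified
            queries += 1
        previous = search_string
    return requests, queries


# One unchanged search string with three result screens viewed.
log = ["web visualization", "web visualization", "web visualization"]
print(count_requests_and_queries(log))  # three requests, one query
```

As in the example given in the text, one search string with three result screens viewed counts
as three requests but only one query.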
[Jansen, Spink, Bateman et al. 1998] report an average of “2.84 queries per user”. Ignoring identi-
cal queries33 the average was 1.6 [Jansen, Spink, Saracevic 2000]. In terms of the previously men-
tioned homogenization, they reported 2.84 requests per user, or 1.6 queries per user. [Silverstein,
Henzinger, Marais et al. 1999] report an average of 2.02 queries / requests per session34. The pos-
sible distortion of results due to automatic search agents can be seen in the results of the AltaVista
study with a very high standard deviation of 123.4 and a maximum number of 172,325 queries in
one session35. [Hölscher 2000]36 and [Röttgers 1999] do not report the number of queries per user
or session. Ignoring the methodological problems involved in defining a session, and involved in
making a distinction between requests and queries, we can get the impression of the number of
queries per session shown in Figure 6. The log file hand-tagged by [Lau, Horvitz 1999] revealed in
this context that the users performed an average of 3.27 queries / requests37 per goal38, and 4.28
queries / requests per day39.

33 In the examined Excite log data no differentiation was possible between an identical query entered by the user
and a request for further result pages of an already displayed query, which had also been logged as an identical query.
34 It’s not clear if this 2.0 is with or without identical queries. It may be 2.0 queries per session, but could theoreti-
cally also be 2.0 requests per session. Interestingly enough [Silverstein, Henzinger, Marais et al. 1999] compare their
2.0 with the 2.8 from [Jansen, Spink, Bateman et al. 1998]. [Jansen, Pooch 2000] do the same comparison using the
2.0 and the 1.6.
35 This single session contains 3 times more queries than the whole Excite study, but only 0.017% of the number of
queries of the AltaVista study.
36 Here and in the remainder, “[Hölscher 2000] does not report” also stands for [Hölscher 1998], [Hölscher
1998a], [Hölscher, Strube 1999], and [Hölscher, Strube 2000].
37 They report the average number of queries, and it is not clear if this is done using Jansen et al.’s broader method,
or the narrower method used in this thesis.
38 Information goals were defined, and the researchers detected changing of goals by using an ontology, inspecting
the Excite log file, and interpreting the sequences of the query terms of the users.
39 The authors do not describe how they extracted the 200 kB from the 48 MB log file. So the basis upon which this
average of 4.28 queries or requests per day and user was discovered is not completely clear.

78.4% can be calculated64 for the query. [Silverstein, Henzinger, Marais et al. 1999] report 63.7%
for the session65 and 85.2% for the query. Detailed data on query level, shown in Figure 9, is only
available from [Hölscher 1998a], and [Silverstein, Henzinger, Marais et al. 1999].
Result pages viewed per query (chart data; vertical axis 0% to 100%)
Pages viewed per query   Fireball [Hölscher 1998a]   AltaVista [Silverstein, Henzinger, Marais et al. 1999]
1 page viewed            7,543,840                   85.2%
2 pages viewed           1,000,694                   7.5%
3 pages viewed           411,862                     3.0%
more than 3 pages        664,951                     4.3%

Figure 9: Result pages viewed per query according to [Hölscher 1998a]66, and [Silverstein, Henzinger, Marais et
al. 1999]
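The Fireball values in Figure 9 are not reported directly but derived from the requests per result screen, as described in footnote 64. The following sketch retraces that calculation under the same assumption that result screens are viewed strictly in sequence. The function name is chosen for illustration only. Under that assumption the shares come out at the percentages listed in footnote 66.

```python
# Requests per result screen from the table in [Hölscher 1998a], as quoted
# in footnote 64. Screens beyond the fourth are not listed there.
requests_for_screen = {1: 9_621_347, 2: 2_077_507, 3: 1_076_813, 4: 664_951}


def pages_viewed_per_query(req):
    """Derive queries by number of screens viewed.

    Assumes that screens are viewed strictly in sequence (1, 2, 3, ...),
    so every request for screen n + 1 belongs to a query that also
    requested screen n.
    """
    total_queries = req[1]  # every query starts with a first result screen
    counts = {
        "1 page": req[1] - req[2],
        "2 pages": req[2] - req[3],
        "3 pages": req[3] - req[4],
        "more than 3 pages": req[4],
    }
    shares = {k: round(100 * v / total_queries, 1) for k, v in counts.items()}
    return counts, shares


counts, shares = pages_viewed_per_query(requests_for_screen)
for label in counts:
    print(f"{label}: {counts[label]:,} ({shares[label]}%)")
```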
One of the differences between classical IR and Web searching is that a near miss in a Web search
can nevertheless lead to the requested information. This aspect is taken into account in Web search
models. Empirical evaluations studying users instead of analyzing log files of search engines, like
[Hölscher 2000] or [Körber 2000], show that browsing episodes are not only part of the models,
but really occur quite often in reality. The number of screens viewed from the search engine is an
important parameter when talking about the search engine part of an information-seeking episode,
but it is important to remember that this – in contrast to classical IR studies – is not correlated with
the number of pages or documents the user really viewed.
When studying the topics people look for when searching the Web by analyzing the most com-
monly occurring keywords in [Jansen, Spink, Bateman et al. 1998], [Silverstein, Henzinger,
Marais et al. 1999] or [Sullivan 2000a], the impression is that the topics come from all conceivable
areas, they definitely contain sexual topics, and they seem to be influenced by trends. Or in the
words of [Jansen, Spink, Bateman et al. 1998]: “There is a lot of searching about sex on the Web,
but all together it represents only a small proportion of all searches. … A great many other sub-
jects are searched, and the diversity of subjects searched is very high.” What they do not comment

64 [Hölscher 1998a] contains a table listing that 59.1% (9,621,347) of the 16 Million queries are requests for a first
result screen, 12.85% (2,077,507) are requests for a second result screen, 6.66% (1,076,813) for a third, 4.11%
(664,951) for a fourth, and so on. Summing up the requests from the table, there are 0.5% missing of the reported
16,252,902 queries. Assuming that in most cases people will have a look at the result pages in a sequence beginning
with the first screen, then the second screen, then the third screen, and so on, it can be calculated that having 9,621,347
requests for a first screen, and 2,077,507 requests for a second screen, there must have been 7,543,840 queries where
only one screen had been viewed. Like other engines, Fireball allows one to follow hyperlinks to later or earlier result
screens more than one step away from the current one directly. By assuming a strict sequence, jumps within the result
pages that skip single screens are ignored. [Jansen, Pooch 2000] list in their Table 1, for [Hölscher 1998] “Number
of Relevant Documents Viewed in a Session”, “10 or less: 59.51% (9,621,347)”. Hölscher did not try to reconstruct
sessions from the log file [personal mail communication with Christoph Hölscher 2000-01-15]. Probably [Jansen,
Pooch 2000] misinterpreted here the table from [Hölscher 1998a], and took the percentage of requests for a first screen
from all requests, as percentage of sessions with just one result screen.
65 [Jansen, Pooch 2000] cite [Silverstein, Henzinger, Marais et al. 1999] with 85.2% for the session, but 85.2% is
the figure for the query, for the session it is 63.7%.
66 Numbers for [Hölscher 1998a] are calculated as described in footnote 64. Percentages are: one screen per query 78.4%,
two screens 10.4%, three screens 4.3%, more than three screens 6.9%.
on is the dynamic changing process of the top topics over time, but that is clear when discussing a
dataset which covers only a portion of a single day. In addition to the influence of trends, it doesn’t
take much research to realize that the topics people are looking for will at least additionally be
dependent on the country where they are living.
2.3.2. User group differences
When considering that there are differences in Web usage or Web search behavior depending on
the characteristics of a user population, it will be interesting to know what the characteristics of the
users who are behind the results presented in the last chapter are. One factor influencing the behav-
ior we have already seen is the location where people live – or is it their Internet maturity, or their
mother language?
[Spink, Bateman, Jansen 1998] performed a theoretically appealing study trying to find out more
about the users of the Excite search engine by doing an interactive survey in April 1997, one
month after the log data for [Jansen, Spink, Bateman et al. 1998] had been collected. The results
from the 357 users who responded (footnote 67) are to a certain degree interesting (footnote 68), especially the fact that
single search sessions often appear to have been part of a longer search process (at least for
the participants of the survey). In general, the sample seems too small, and the participants come
from too specific a group of users, namely those who took the time to answer the questionnaire, to give a valid
impression of the Excite user population, their characteristics, their goals, or their behavior. A
much broader picture about the user population, not for a specific search engine, but for the Web
itself, comes from the series of the Georgia Institute of Technology’s Graphics, Visualization, and
Usability Center WWW User Surveys [GVU 1994 – 1998], or from surveys carried out by various
consulting companies. One important trend shown very clearly by [Pitkow, Kehoe 1996] is the
continuous change over time of the characteristics of the people using the Internet. Therefore, in
the case of a log file analysis from search engines, all data on the characteristics of a user population
must be collected contemporaneously with the time frame of the log file. This criterion is fulfilled by
[Spink, Bateman, Jansen 1998], but their work is heavily influenced by the problems described by
[Pitkow, Kehoe 1996] for all WWW-based surveys: self-selection and sampling. In general, when
studying search behavior and user characteristics in the Web there is always a problem: The
broader the statistical basis of data about the search behavior, the more difficulty one has in getting
detailed information about the characteristics of the user population and vice versa. The excursus
about the user population that is responsible for the trends reported in the last chapter will be
stopped here. The lesson learned is that drawing conclusions from the results should be done with
care when trying to transfer them to a special user population like people from small and medium
size enterprises looking for business information in the Web. We will now turn to the question of
whether there is any material available which focuses on such a special population or if there are
any more differences known between user groups other than the already reported differences between
UK/US and European users, and the differences between the users of the different

67 In the abstract [Spink, Bateman, Jansen 1998] write “Three hundred and fifty-seven (357) EXCITE users re-
sponded ...“, and in the result section “Only 316 of the 480 returned survey forms contained usable data.” The maxi-
mum number of reported answers for one of the 18 questions was 301 answers.
68 Some other results seem to be rather banal, like for example “Interestingly, the largest group of respondents were
searching EXCITE from home …” [Spink, Bateman, Jansen 1998]. Indeed very surprising for a dataset collected in a
five-day period from Friday to Tuesday, with the heaviest usage of the survey form on Saturday, that 36% of the re-
spondents used their computer at home.

search engines.
At this stage, four studies should be mentioned. The investigation of [Meyer, Sit, Spaulding et al.
1997] about age group and training differences in World Wide Web navigation, the studies of
[Hölscher, Strube 2000] and [Körber 2000] about differences in Web search behavior between
internet experts and newbies, and the exploration by [Wang, Hawk, Tenopir 2000] into the influ-
ence of search experience, affective states, and cognitive style on Web search process and success.
In an experiment involving 20 participants without significant WWW experience, [Meyer, Sit,
Spaulding et al. 1997] detected that the “older” participants (ages 64 to 81) took more steps to
reach a target in a set of 19 locally stored Web pages, than the “younger” ones (ages 19 to 36). The
average of the nine tasks for older adults was 9.7 steps, for younger adults 6.4. Unfortunately there
is no statistical validation of their results included69. The training effects were also interesting,
showing that the 11 users (7 old / 4 young) who got a “hands-on” navigation tutorial had an aver-
age of 7.8 steps compared to an average of 9.3 steps for the other 9 users (6 old / 3 young) who
just got a “hands-off” description of navigation methods.
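Whether the two independent variables (age and training) are balanced against each other can be checked by cross-tabulating the group sizes given above. The following sketch is only an illustration of that arithmetic; the variable and function names are chosen here and do not come from the study. In a fully counterbalanced design, both values of each comparison would be equal.

```python
# Group sizes as reported above for [Meyer, Sit, Spaulding et al. 1997].
group_sizes = {
    ("older", "hands-on"): 7,
    ("younger", "hands-on"): 4,
    ("older", "hands-off"): 6,
    ("younger", "hands-off"): 3,
}


def percentage(cell, level):
    """Share of `cell` among all participants that have the given level
    (an age group or a training type), in percent with one decimal."""
    total = sum(n for key, n in group_sizes.items() if level in key)
    return round(100 * group_sizes[cell] / total, 1)


# Share of each age group that received the hands-on tutorial.
older_with_hands_on = percentage(("older", "hands-on"), "older")
younger_with_hands_on = percentage(("younger", "hands-on"), "younger")

# Share of older participants within each training group.
older_in_hands_on = percentage(("older", "hands-on"), "hands-on")
older_in_hands_off = percentage(("older", "hands-off"), "hands-off")

print(older_with_hands_on, younger_with_hands_on)
print(older_in_hands_on, older_in_hands_off)
```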
In a first step, [Hölscher, Strube 2000] performed interviews with 12 established internet experts
and from this developed a process model of the information seeking process in the Web. Their
model is comparable to the ones introduced in Chapter 2.2. What is really interesting is the fact
that in a second step, the experts had to perform a number of real-world information tasks using
their own choice of strategy and search engine. Analyzing these information seeking episodes, the
authors calculated transition probabilities between the steps of the model. It was found that in 47%
of the cases, using a search engine led to a browsing episode of varying length or that the experts
often switched back and forth between browsing and querying. Also interesting was the fact that
the average query length was 3.64 words, instead of the considerably shorter averages found in the
large scope Web-searching studies. Another difference was the usage of the different types of
modifiers in the queries shown in Figure 10 as a relative distribution for all queries using modifi-
ers, and in Figure 11 as a percentage of all queries.
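How transition probabilities between the steps of a process model can be estimated from observed episodes is sketched below. This is a generic first-order estimate and not necessarily the procedure used by [Hölscher, Strube 2000]; the step names and the two example episodes are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(episodes):
    """Estimate P(next step | current step) from sequences of steps."""
    counts = defaultdict(Counter)
    for steps in episodes:
        # Count every pair of directly consecutive steps.
        for current, following in zip(steps, steps[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }


# Hypothetical information-seeking episodes (step names are assumptions).
episodes = [
    ["query", "result list", "browse", "result list", "query"],
    ["query", "result list", "document"],
]

probabilities = transition_probabilities(episodes)
for state, following in probabilities.items():
    print(state, following)
```

A statement such as "using a search engine led to a browsing episode in 47% of the cases" corresponds to one entry of such a table, namely the probability of moving from the result list to browsing.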
Distribution of modifier usage for queries with modifiers (chart data; vertical axis 0% to 100%)
Modifier     12 Experts   1998 Fireball   Excite
" "          40           1,401,738       3,282
- (minus)    0            77,531          1,766
+ (plus)     47           4,034,312       3,010
()           12           15,738          273
NOT          0            10,372          105
OR           13           13,817          177
AND          56           390,272         4,094
Sources: 12 Experts and 1998 Fireball from [Hölscher, Strube 2000] / [Hölscher 2000]; Excite from [Jansen, Spink, Bateman et al. 1998]

Figure 10: Distribution of modifier usage for queries with modifiers 12 Experts / Fireball / Excite

69 This is a particular problem because the two independent variables are not counterbalanced. This can be seen
from the age differentiation: 53.8% of the older got the “hands-on” training, compared to 57.1% of the younger, and in
the training differentiation 63.6% of the “hands-on” group were older, compared to 66.7% of the “hands-off” group.
Distribution of modifier usage for all queries (chart data; vertical axis 0.00% to 35.00%)
Modifier     12 Experts [Hölscher 2000]   1998 Fireball [Hölscher 2000]   Excite [Jansen, Spink, Bateman et al. 1998]
AND          34.57%                       2.40%                           7.95%
OR           8.02%                        0.09%                           0.34%
NOT          0.00%                        0.06%                           0.20%
()           7.41%                        0.10%                           0.53%
+ (plus)     29.01%                       24.82%                          5.85%
- (minus)    0.00%                        0.48%                           3.43%
""           24.69%                       8.62%                           6.38%

Figure 11: Distribution of modifier usage for all queries 12 Experts / Fireball / Excite70
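Figures 10 and 11 present the same modifier counts in two different ways. The following sketch shows one reading of both views for the 12 experts. The total of 162 expert queries is an assumption: it is not stated in the text but back-calculated from Figure 11 (56 uses of AND at 34.57% correspond to about 162 queries). The function names are illustrative.

```python
# Modifier counts for the 12 experts, as shown with Figure 10.
expert_counts = {
    "AND": 56, "OR": 13, "NOT": 0, "()": 12, "+": 47, "-": 0, '""': 40,
}

# ASSUMPTION: total number of expert queries, back-calculated from
# Figure 11 (56 / 0.3457 is roughly 162); not reported in the text.
ASSUMED_EXPERT_QUERIES = 162


def relative_distribution(counts):
    """Share of each modifier among all modifier uses (view of Figure 10)."""
    total = sum(counts.values())
    return {m: round(100 * n / total, 2) for m, n in counts.items()}


def share_of_all_queries(counts, total_queries):
    """Modifier uses as a percentage of all queries (view of Figure 11)."""
    return {m: round(100 * n / total_queries, 2) for m, n in counts.items()}


print(relative_distribution(expert_counts))
print(share_of_all_queries(expert_counts, ASSUMED_EXPERT_QUERIES))
```

The first view always sums to 100% and hides how rarely modifiers are used at all; the second view keeps that information, which is why the expert values differ so clearly from the Fireball and Excite values in Figure 11.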
In a second experiment, [Hölscher, Strube 2000] looked for the potential influence of Web expertise
and domain knowledge on Web search behavior. 24 participants in a 2x2 matrix experiment had
to solve a set of five information-search problems from an economic domain. Among the findings
are the following points:
• Double-novices (low Web expertise and low domain knowledge) had the highest propor-
tion of query reformulations, chose the smallest number of target documents for closer ex-
amination, and viewed the highest proportion of irrelevant documents.
• Double-experts were overall most successful in their search behavior.
• Double-experts showed the lowest percentage of backward oriented behavior like using the
back button or returning to previous search engine results.
• In some cases double-experts followed a strategy of directly accessing Web sites related to
economics. No other group displayed this behavior.
• Domain-experts spent significantly less time with domain-specific documents, than do-
main-novices.
• Web-experts used modifiers significantly more often (87% vs. 47%) and made far fewer
formatting errors (1.9% vs. 19.6%) than Web-novices.
• The average query length of the Web-experts was only marginally longer than that of the
Web-novices (2.61 vs. 2.32), but surprisingly the average query length of domain-experts
was significantly shorter than that of domain-novices (1.97 vs. 2.96).
[Hölscher, Strube 2000] differentiated between technical Web expertise and domain-specific
background knowledge. “Participants which could rely on both types of expertise were overall
most successful in their search behaviour.” [Hölscher, Strube 2000]. Web expertise alone did not
help to get higher effectiveness rates. On the other hand the authors report, discussing their first
experiment, that there are differences in the way Web experts search the Web compared to the
average user. For an overview covering studies dealing with the influence of expertise on retrieval
success see [Hölscher 2000]. His summary is that an information-seeking process is positively
influenced by all of the three expertise types defined by [Marchionini 1997]: domain knowledge,
general information-seeking expertise, and system expertise. The differences between search ex-
perts and novices are clearer for measures which focus on the search process [Hölscher 2000].
Page 48 from 266 Thomas M. Mann
3: Information Visualization Visualization of Search Results from the World Wide Web
nal (obey a < relation) and Quantitative (can do arithmetic on them). Additionally, they make a
distinction between important subtypes like spatial or geographical quantitative data, or quantita-
tive or ordinal time. These differentiations are vital because they determine the type of axis to be
used in a visual structure, or because the subtypes as important properties of the real world are
normally associated with special visual conventions. The data transformation from raw data into
data tables can lead to a loss or gain in information. It can range from a simple reduction of vari-
ables or cases, through statistical computations, to construction of derived values or derived struc-
ture. In Chapter 4.2.2, data transformations from raw data to data tables done in the INSYDER
system will be shown.
The next step in the model, the visual mapping from data tables to visual structures, is one of the
most critical ones in the whole process of visualization. “Good mappings are difficult, …” [Card,
Mackinlay, Shneiderman 1999]. A good mapping must preserve the data, it must be expressive,
and it must be effective. Examples of what can be done wrong can be found in [Tufte 1983] or
[Card, Mackinlay, Shneiderman 1999]. The route to the visualizations finally used in the
INSYDER system was also not free from errors. Rules, guidelines, or examples to follow can be
found in a large number of publications. Discussing even the most important ones would go beyond
the scope of this thesis. Interested readers are referred to publications
like [Bertin 1977], [Bertin 1982], [Tufte 1983], [Mackinlay 1986], or [Card, Mackinlay, Shnei-
derman 1999]. When discussing the decisions made for the INSYDER system, a number of rules
or guidelines will be mentioned which directly influenced the process.
The last step in the reference model for visualization is view transformations from visual structures
to views. View transformations allow the user to get more information from a visualization than
would be possible from a static presentation. The three most common view transformations listed
by [Card, Mackinlay, Shneiderman 1999] are: location probes, viewpoint controls, and distortions.
Location probes reveal additional data table information by using location in a visual structure.
Viewpoint controls change the point of view by zooming, panning, or clipping. Overview + detail
[Shneiderman 1996] is also a viewpoint control technique. By using distortion, overview + detail
are combined in a single view with focus + context.
The human interaction that is also part of the model can work on all transformation and mapping
steps described above. An example for a human interaction influencing the transformation from
raw data to data tables is a selection of cases or variables, and an example for influencing the map-
ping from data tables to visual structures is a change of the diagram-type in a spreadsheet program.
For influencing the transformation from visual structures to views, a good example would be a
zooming operation in a diagram displayed.
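The chain of steps in the reference model (raw data, data tables, visual structures, views) can be sketched as a small pipeline. This is only an illustration of the model's structure, not code from the INSYDER system or from [Card, Mackinlay, Shneiderman 1999]; all names and the example data are hypothetical. The function parameters stand for the points at which human interaction can influence each step.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """One mark of the visual structure: a bar in a simple bar chart."""
    x: int          # position on the horizontal axis (rank order)
    height: float   # bar height, mapped from a quantitative variable
    label: str


def data_transformation(raw_hits, variables):
    """Raw data -> data table: keep only the selected variables per case."""
    return [{name: hit[name] for name in variables} for hit in raw_hits]


def visual_mapping(table, quantity):
    """Data table -> visual structure: rank gives position, quantity height."""
    ordered = sorted(table, key=lambda row: row[quantity], reverse=True)
    return [Bar(x=i, height=row[quantity], label=row["title"])
            for i, row in enumerate(ordered)]


def view_transformation(bars, first, last):
    """Visual structure -> view: a viewpoint control that clips to a range."""
    return [bar for bar in bars if first <= bar.x <= last]


# Hypothetical raw search hits.
raw_hits = [
    {"title": "A", "relevance": 0.4, "size_kb": 12, "url": "http://example.org/a"},
    {"title": "B", "relevance": 0.9, "size_kb": 80, "url": "http://example.org/b"},
    {"title": "C", "relevance": 0.7, "size_kb": 33, "url": "http://example.org/c"},
]

table = data_transformation(raw_hits, variables=["title", "relevance"])
bars = visual_mapping(table, quantity="relevance")
view = view_transformation(bars, first=0, last=1)
print([bar.label for bar in view])  # prints: ['B', 'C']
```

Selecting other variables changes the data transformation, choosing another quantity changes the visual mapping, and moving the clipping range corresponds to panning in the view.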
The information visualization data state reference model of [Chi 2000] is very similar to the refer-
ence model for visualization by [Card, Mackinlay, Shneiderman 1999]. Chi presents a detailed
analysis of a large number of visualization techniques using his version of the model. The similari-
ties are not surprising because the data state model by [Chi, Riedl 1998] which is the basis for the
new taxonomy from Chi was influenced by Card.

3.3. State of the Art: Visualization Ideas, Metaphors, Techniques,
Components and Systems
The aim of this chapter is to give an impression of the great variety of ideas that have already been
developed to map data tables on visual structures. When designing the INSYDER system, a scan
of the available literature showed that it may be scientifically honorable to develop new ideas of
how to visualize search results from the World Wide Web, but that there are already a great number
of ideas available. Some of them are already used for the visualization of Web search results, some
are used for other IR-related systems, and others come from different application areas but could
potentially be useful. Some of them have been evaluated, others have not. Some of them proved
useful, others did not. The printouts of the figures found in the literature filled the walls of the
researchers’ office, and the question was, who was to structure the heap of visualization ideas?
[Shneiderman 1996] solved the problem by proposing a data type by task taxonomy (TTT) of
information visualizations. The tasks are the ones shown in Table 6 on page 27. The data types are
listed in Table 14. Shneiderman used the TTT in [Shneiderman 1998] to structure his overview of
visualization ideas and systems. In [North 1997] and the On-line Library of Information Visualiza-
tion Environments [OLIVE 1997], which also used the TTT, the data types were expanded and
include an additional eighth type “workspace”.
Data type           Examples
1-D Linear          Textual documents, program source code, alphabetical lists of names.
2-D Map             Planar or map data include geographic maps, floor plans, newspaper layouts.
3-D World           Real-world objects such as molecules, the human body, buildings.
Temporal            Timelines used in medical records, project management, historical presentations. Special form of 1-D Linear.
Multi-Dimensional   Relational- and statistical-database contents.
Tree                Hierarchies and tree structures, with each item having a link to one parent item (except root).
Network             Network structures with items linked to an arbitrary number of other items.
Table 14: Data types of the TTT data type by task taxonomy from [Shneiderman 1996], [Shneiderman 1998]
In other publications discussing a large number of visualization possibilities, the different tech-
niques are grouped by a number of principles. [Card, Mackinlay, Shneiderman 1999] divide their
overview into the chapters Space, Interaction, Focus + Context, Data Mapping: Document Visu-
alization, “Infosphere, Workspace, Tools, Objects”, Using Vision to Think. [Chi 2000] organized
his discussion of visualization techniques into the following groups: “Some example Scientific
Visualizations”, “Geographical-based Info Visualization”, “2D”, “Multi-dimensional Plots”, “In-
formation Landscapes and Spaces”, “Trees”, “Network”, “Text”, “Web Visualization”, and “Visu-
alization Spreadsheets”. In a general examination of the visualization of search results in docu-
ment retrieval systems, [Zamir 1998] used a classification shown in Figure 14. The classification
focuses on post-retrieval document visualization techniques.
Visualization Techniques
    Visualization of Document Attributes
        Query Terms Distribution
        User-Defined Attributes
        Predefined Attributes
    Visualization of Interdocument Similarities
        Document Networks
        Clustering
        Spring Embeddings
        Self Organizing Maps
Figure 14: Classification of post-retrieval document visualization techniques according to [Zamir 1998] Fig. 1
interesting discussion about the disadvantages of metaphors, also including aspects of design, see
[Bederson, Hollan 1994], [Bederson, Hollan, Perlin et al. 1996]. They propose using physics-based
design strategies instead of metaphors.
During the design phase of the INSYDER system and its visualizations, one part of the work was
to investigate which metaphors had been used for other systems with comparable functionality.
The usage of certain metaphors would also be a candidate for structuring an overview covering
visualization ideas. But metaphors can stand behind a component or behind a whole system. Their
match can be more or less complete. Composite metaphors could be used. Metaphors are some-
times easily comparable, on the other hand a certain metaphor can be used for completely different
target domains or tasks. During the design of the INSYDER system, the results from searching
metaphors served more as a pool of ideas, than for the classification of visualizations in general. In
the INSYDER system itself, metaphors are used in a number of ways, for example presenting
predefined or stored searches, watches and news in the form of a file-browser, or using visualizations
with similarities to business graphics, chosen after considering the target user group of the system and
their likely prior experience. This chapter about metaphors will be restricted to a brief overview
of metaphors used in systems with visualizations of queries, browsing or search results. Metaphors
found include: Book, Bookshelf, Newspaper, City, Landscape, Rooms, Building, Tower, Guided
Tour, Lens, Butterfly, Pile, Universe / Galaxy / Starfield, Magnet, Sculpture, Television, Wall,
Aquarium, and flowing Water.
A book metaphor has been used in several systems. Examples of these include SuperBook
(showing one document as one book), BOOK HOUSE (showing the metadata of a book as a
book), the WebBook (showing groups of Web pages as books), and the libViewer (showing n
documents or n Web pages as n books). SuperBook / MiteyBook [Egan, Remde, Gomez et al.
1989] is more a hypertext browsing-system than a retrieval system, but implemented a number of
good ideas (not really visualizations) using a book metaphor. An ASCII-text with heading markers
or in a standard text markup language is preprocessed and displayed in book format with table of
contents, word lookup, and text display. A number of features are available, such as string search,
and highlighting of query terms in text. [Pejtersen 1989] used an image of an “open book” in the
implementation of the BOOK HOUSE system to show descriptions of retrieved documents one at
a time. The BOOK HOUSE itself was an electronic DOS/GEM replica of a real library with a li-
brary building, rooms, or people. Retrieval was depicted using icons symbolizing the different
dimensions of the classification system. A globe represented the geographic setting of the book, a
clock the time dimension, or a theatre mask the emotional experience provided by a book. Icons
are also used for a number of other functions in the BOOK HOUSE. The WebBook [Card, Robert-
son, York 1996] allows users to group related Web pages into a higher aggregated entity, and to
manipulate them as a unit. WebBooks themselves are used in an information Workspace called
Web Forager. The whole system is implemented in the framework of the Information Visualizer
system [Robertson, Card, Mackinlay 1993]. The WebBook preloads a collection of Web pages and
shows them in a 3D simulation of a real book. A number of HTML-properties are adapted to the
usage of the pages in a collection. Links, for example, are color-coded depending on whether they point
to pages inside or outside the virtual book. The WebBook supports a number of features associated
with real-world books, like for example ruffling or insertion of bookmarks. Other features have
no counterparts in the real world, like the possibility to explode the book out so that all pages are
available simultaneously and can be viewed using a fisheye-technique called Document Lens
[Robertson, Mackinlay 1993]. Animation plays an important role in the implementation. In the
libViewer applet [Rauber, Bina 1999], which is part of the SOMLib project, search results from a
retrieval system are shown as 3D-books by mapping metadata of documents to attributes of real-
world books. Examples can be seen on page 87 and page 102. The applet has also been used to
show Web search results [Rauber, Bina 2000]. [Card, Robertson, York 1996] list a number of
other systems that also use the book metaphor.
Among other systems, the already introduced Web Forager system and libViewer applet
both use a bookshelf metaphor in addition to the book metaphor. The Web Forager allows the
user to place the WebBooks on a virtual bookshelf as a tertiary storage area, in addition to an im-
mediate storage area to work on and a virtual desk as intermediate storage [Robertson, Card,
Mackinlay 1993]. The libViewer uses a virtual bookshelf to display the book-representations of
documents in an ordered or grouped way. In a simple mode the “books” are ordered in the book-
shelf by a dimension of the available metadata, like size or relevance. In an advanced mode, the
authors use an unsupervised neural network in the form of a self-organizing map [Kohonen 1998]
to cluster documents dealing with similar topics. Every single cluster is then displayed as a single
shelf in the bookshelf labeled by using a so-called LabelSOM technique. [Baeza-Yates 1996] also
proposes the usage of a bookshelf metaphor in a way comparable to the “simple” mode of the lib-
Viewer. He calls it “library” or “bookpile” depending on the orientation. Document attributes like
relevance, size, or age can be mapped by the user to graphical properties like position, color,
width, or height. Like the libViewer, the “library” idea has been implemented in a Java applet
[Alonso, Baeza-Yates 1998]. There the library-view is also called horizontal bookpile.
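The idea of letting the user map document attributes to graphical properties of a "book" can be sketched as follows. This is a minimal illustration assuming a simple linear scaling; it does not reproduce the libViewer or the applet of [Alonso, Baeza-Yates 1998]. The attribute values, the output ranges, and all names are hypothetical.

```python
def rescale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    fraction = (value - lo) / (hi - lo)
    return out_lo + fraction * (out_hi - out_lo)


def map_to_books(documents, mapping, output_ranges):
    """Return one 'book' per document with the chosen graphical properties."""
    books = []
    for doc in documents:
        book = {"title": doc["title"]}
        for graphical_property, attribute in mapping.items():
            values = [d[attribute] for d in documents]
            book[graphical_property] = rescale(
                doc[attribute], min(values), max(values),
                *output_ranges[graphical_property])
        books.append(book)
    return books


# Hypothetical search results with three document attributes.
documents = [
    {"title": "A", "relevance": 90, "size": 1200, "age": 3},
    {"title": "B", "relevance": 50, "size": 400, "age": 40},
    {"title": "C", "relevance": 10, "size": 2000, "age": 11},
]
# User-chosen mapping: graphical property -> document attribute.
mapping = {"height": "relevance", "width": "size"}
# Assumed drawing ranges (for example in pixels) per graphical property.
output_ranges = {"height": (100, 300), "width": (10, 50)}

for book in map_to_books(documents, mapping, output_ranges):
    print(book)
```

Changing the `mapping` dictionary, for example to `{"height": "age"}`, corresponds to the user reassigning an attribute to another graphical property.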
In the VOIR (Visualization Of Information Retrieval) system, [Golovchinsky 1997]74 uses a
newspaper metaphor for the visualization of search results and for the navigation in a
query-mediated hypertext. Newspaper metaphors are quite frequent in the Web. Examples are electronic
newspapers or personalized electronic newspapers75. The special point of the VOIR system is the
usage of a newspaper metaphor for the visualization of texts that have in general nothing to do
with news as content. The idea is to use the metaphor of a newspaper for organizing loosely re-
lated units of internally coherent text, retrieved by a number of different mechanisms. Visual cues
from newspapers, like space used to display a certain document, are applied, for example, to mir-
ror the relevance of the text in a current situation76.
[Dieberger 1994] proposed the usage of a city metaphor as a conceptual spatial user inter-
face metaphor for large information spaces. [Dieberger, Frank 1998] contains an overview cover-
ing other use cases of the city metaphor. In their Information City approach the authors describe an
ontology of spaces and connections to be used when talking about systems of spatial metaphors
and how they interrelate. The ontology includes containers, landmarks, and paths in form of dis-
tricts, sub-districts, buildings, rooms, doors, taxis, subways, and others.

74 See also: [Golovchinsky 1997a], [Golovchinsky 1997b], and [Golovchinsky, Chignell 1997]
75 “The Krakatoa Chronicle” [Kamba, Bharat, Albers 1995] seems to be the first one to use a
newspaper-like layout in addition to news as content.
76 The usage of space to reflect relevance does not seem to be fully consistent. The author describes that every
page displayed shows a fixed number of eight articles. More than eight retrieved articles are displayed on subsequent
pages. Therefore the usage of space can only reflect the relative relevance of an article in a group of eight, but not the
overall relevance. The ninth article in an overall relevance-ranking list will get more space than the eighth one.
A landscape metaphor has been used in a number of systems including the Harmony Hyper-G
/ Hyper View browser, ThemeScapes in the SPIRE system, and Landscapes in Vineta or
Bead. [Andrews 1995] describes the Harmony VRweb 3D scene viewer with handcrafted three-
dimensional landscapes (e.g. a plan of the city center of Graz containing hyperlinks to sightseeing
information), or automatically created three-dimensional landscapes depending on user navigation
steps or searches in the hypertext environment. Providing an additional 2D-map overview helps to
keep orientation in the three-dimensional landscape. ThemeScapes [Wise, Thomas, Pennock et al.
1995] was one of the views developed in the MVAB (Multidimensional Visualization and Ad-
vanced Browsing project) / SPIRE (Spatial Paradigm for Information Retrieval and Exploration)
project. ThemeScapes are abstract, three-dimensional landscapes of information constructed by
automatically analyzing the thematic content expressed in the documents of a collection. The sec-
ond visualization in SPIRE is the Galaxies view. The visualizations of the German prototype Vi-
neta were described in an earlier paper [Krohn 1995] as spheres in 3D space. Later Vineta also
used a landscape and a galaxy view77 [Elzer, Krohn 1997]. Whereas the automatically constructed
landscapes in the Harmony VRweb 3D scene viewer looked like pedestals and boxes connected by
wires78, ThemeScapes provoke the impression of mountains or natural terrain. An example of a
ThemeScape can be found on page 99. [Chalmers 1993] also used a technique to present high di-
mensional data in low dimensional space in the Bead system. The system calculates similarities
between pairs of documents. In the visualization the documents are spread over a landscape like
trees or little pyramids. Documents with keyword-matches are displayed in another color. In later
versions the landscape looked more like cubes and wires [Chalmers 1995] and had additional colored
districts [Chalmers 1996]. There also seem to be more labels. As opposed to the Harmony
browser where wires symbolize links between documents, the wires in Bead seem to visualize
other connections. [Bekavac 1999] used a landscape to symbolize the geographical frame of an
electronic mall in the VR-emb79 prototype. The navigation in the electronic mall itself was done
inside a tower (See below). In front of the tower some road signs allowed navigation to cities and
institutions in the geographical area of the electronic mall. The landscape around the tower also
included cars and a helicopter in front of the tower for navigation to other malls or places (not
implemented).
[Henderson, Card 1986] used the rooms metaphor for a technique that virtually enlarged the
available screen space by allowing the user to organize, save, and recall window positions and
other features as working sets for later reuse. Their Rooms system included a lot of additional
ideas and metaphors like an overview to switch between rooms, “pockets” for carrying windows to
every room, or “baggage” to carry windows to another room. They also list a number of previous
usages of the room metaphor. Their own usage was purely for desktop organization. The logic of the original
2D-version was later extended to a 3D-version in the 3D/Rooms of the Information Visualizer,
whilst keeping the original controls like doors for “walking”80 from one room to another and add-

77 “In order to better test the usability and acceptance of different forms of presentation, two
models were implemented: the ‘Galaxy’ (Fig. 6 and 7) and the ‘Landscape’ (Fig. 5).” [Elzer, Krohn 1997] (translated from the German)
78 Comparable to the pedestals, boxes, wires-look of the FSN (pronounced fusion) 3D File System Navigator for
IRIX developed by [Tesler, Strasnick 1992]. The FSN also has a similar 2D-map overview window.
79 Virtual Reality – electronic mall bodensee (Lake Constance, Germany – Switzerland – Austria)
80 Walking is an additional metaphor used by [Robertson, Card, Mackinlay 1993]
newsgroups. The system is based on user ratings, and the comparison of ratings and profiles. The
term “lens” for the filtering mechanism is really used metaphorically.
A butterfly metaphor is used by [Mackinlay, Rao, Card 1995] in the Butterfly part of the
Information Visualizer project. The system, targeted in general to solve a Fast User Interface /
Slow Multiple Repository problem, is used to support asynchronous querying of three DIALOG
databases: the Science Citation Index, the Social-Science Citation Index, and the IEEE Inspec da-
tabase. The Butterfly visualization shows references of an article as “veins” of a stylized left wing
of a butterfly, and the article’s citers located in the citation databases as veins of the right wing.
A pile metaphor is used by a number of systems to visualize search results. [Rose, Mander,
Oren et al. 1993] used the metaphor “a pile of documents” presented in [Mander, Salomon, Wong
1992] for a prototype implementation of a tool to support casual organization of information on a
Macintosh. Besides possibilities for manual organization of documents in piles81, the system also
included mechanisms for automatic filing and indexing of documents. They used a variant of the
popular tf*idf algorithm to rank documents and additional mechanisms for extracting terms to
describe documents and piles. The prototype supported functions like flipping step by step through
the documents, ordering, or automatic subpiling of piles. The icons of the documents and piles
used the well-known icon-style of the Macintosh. [Brown, Shillner 1995] introduced DeckScape
as a Web browser based on a “deck” metaphor. Web documents are represented as stapled simple
rectangles containing the titles of the documents. The system supports mechanisms like inserting
documents into a deck when returning to a previously seen document, and following a new link from
there. Other features include “Expand One Level” as a command, which follows all links of a par-
ticular page, and returns all resulting pages in a new deck. As mentioned above, [Baeza-Yates
1996] / [Alonso, Baeza-Yates 1998] also called their “library” view “horizontal bookpile” when
oriented horizontally, or just “bookpile” when oriented vertically. The Butterfly [Mackinlay, Rao,
Card 1995] part of the Information Visualizer project also uses a pile metaphor in the form of a
stylized pile below the butterfly to stack articles the user has selected.
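The ranking step mentioned above can be illustrated with a minimal sketch of plain tf*idf scoring. It is not the variant used by [Rose, Mander, Oren et al. 1993], whose details are not given here; the function name `tfidf_rank`, the whitespace tokenization, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_rank(query, docs):
    """Rank document ids by the summed tf*idf weight of the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[term] * math.log(n_docs / df[term])
            for term in query.lower().split()
            if term in tf
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical mini-collection, not taken from the cited prototype.
docs = {
    "memo": "budget budget meeting notes",
    "paper": "visualization of search results",
    "todo": "meeting agenda search",
}
ranking = tfidf_rank("search visualization", docs)
```

Terms that occur in few documents receive a higher weight, so the document containing the rare term "visualization" is ranked ahead of the one that only contains "search".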
A galaxy or starfield or universe metaphor has been used in a number of systems including
Galaxies in the SPIRE system, and Vineta. [Wise, Thomas, Pennock et al. 1995] described the
Galaxies used in SPIRE as 2D scatterplots of ‘docupoints’ appearing in the way that stars do in the
night sky. They show cluster and document interrelatedness by reducing a high dimensional repre-
sentation into two dimensions. Clusters are annotated with key terms. The more similar two docu-
ments or clusters are, the nearer to each other they appear in the visualization. The component is
enriched by additional features like a “temporal-slicer” to divide the document collection into tem-
poral units. The galaxies in Vineta [Elzer, Krohn 1997] were implemented in 3D. The usage of the
metaphor here is more abstract than in the SPIRE Galaxies. The main concepts are the same.
A magnet metaphor is used by [Morse, Lewis 1997] in the WebVIBE to symbolize the refer-
ence points / Points of Interest (POIs) attracting documents in a virtual 2D-Document space.
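A minimal sketch of how such an attraction can be computed is given below. It assumes the commonly described rule that a document is placed at the centroid of the POI positions, weighted by the document's score for each POI; the function name `vibe_position`, the POI coordinates, and the scores are hypothetical.

```python
def vibe_position(poi_positions, scores):
    """Place a document at the score-weighted centroid of the POIs."""
    total = sum(scores.values())
    if total == 0:
        return None  # the document is attracted by no POI at all
    x = sum(poi_positions[p][0] * s for p, s in scores.items()) / total
    y = sum(poi_positions[p][1] * s for p, s in scores.items()) / total
    return (x, y)


# Hypothetical POIs ("magnets") placed in a 2D document space.
pois = {"visualization": (0.0, 0.0), "search": (1.0, 0.0), "results": (0.5, 1.0)}
only_search = vibe_position(pois, {"visualization": 0, "search": 3, "results": 0})
balanced = vibe_position(pois, {"visualization": 1, "search": 1, "results": 0})
```

A document that matches only one POI sits directly on that magnet, whereas a document equally attracted by two POIs ends up halfway between them.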

81 [Robertson, Czerwinski, Larson et al. 1998] also implemented a prototype where they allowed users to organize
documents in piles. The usage there was for bookmarks. They did not speak of a “pile” metaphor, but instead used the
term “Data Mountain”, because the users had a virtual mountain with a planar surface, in the form of a plane tilted at 65
degrees, on which to put down and organize the document thumbnails.

In the Information Visualizer [Robertson, Card, Mackinlay 1993] also used a sculpture meta-
phor for a visualization called Data Sculpture, visualizing in the example 65,000 sampling points
from a data set like a sculpture in a museum. The visualization is a 3-D surface plot offering the
possibility to fly around the object. The visualization shown in their Figure 6 has more similarities
to a landscape than a sculpture. Interestingly, in the system overview displayed in their Figure 1
the room with the Data Sculpture is labeled DataMap.
Influenced by the FRIEND2182 project [Nonogaki, Ueda 1991], a television metaphor has
been used in the WebStage prototype by [Yamaguchi, Hosomi, Miyashita 1997]. The aim of the
system is to reduce user operations necessary to access the Web by presenting Web page informa-
tion in a style comparable to television programs. This includes media transformations in a form
where, for example, titles and captions are displayed on the screen using a large font, whereas
other text strings are spoken by a text-to-speech-synthesizer. Images are also presented on the
screen. The presentation can be accompanied by background music or sound effects chosen by the
system to create an appropriate atmosphere for certain information types83. Retrieval or selection
of Web pages to be displayed is also implemented in a TV-like style by organizing, for example,
URLs by time slots over the day and automatically starting a currently scheduled presentation
when starting the system. Clusters of URLs to be displayed on a channel-panel can be retrieved by
using other Web search engines or directory services.
The wall metaphor is used in the form of the “Perspective Wall” in the Information Visualizer
environment by [Mackinlay, Robertson, Card 1991] to solve two principal problems of visualizations
of large amounts of linearly structured data: the large amount of information that must be
displayed, and the difficulty of accommodating the extreme aspect ratio of a linear structure on
the screen [Robertson, Card, Mackinlay 1993]. A detailed and a contextual view are integrated in
one visualization. In the implementation, the horizontal dimension of the wall is used for time, and
the vertical is used to visualize layering in an information space. Examples are visualizations of
files with the modification date in the horizontal axis and the file type in the vertical axis. The Per-
spective Wall is a variant of the one-dimensional Bifocal Display introduced by [Spence, Apperley
1982]. The Bifocal display does not use the wall metaphor, and has a constant demagnification rate
for the regions out of focus, whereas the Perspective Wall has an increasing rate for demagnifica-
tion. On page 109, figures of both techniques are shown. [Mackinlay, Robertson, Card 1991] use a
number of other metaphors to explain the functionality of the Perspective Wall, namely sheets in a
player piano to explain navigation on the wall, and a sheet of rubber to explain changes of the ratio
between detailed and contextual information. The metaphor of a “rubber sheet” is also used by
other authors to explain the functionality of their system. Examples are [Jog, Shneiderman 1995]
for the Filmfinder (“rubber mat”, “rubber carpet”) or [Bederson, Hollan, Perlin et al. 1996] for
Pad++ (“rubber sheet”). [Leung, Apperley 1994] use the “rubber sheet” metaphor to explain dis-
tortion-oriented presentation techniques in general, and list some additional papers using it.
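The difference between a constant and an increasing demagnification rate can be sketched with two one-dimensional mappings. Both functions are simplified illustrations and not the formulas of the cited papers: `bifocal` is a piecewise-linear mapping, and `perspective_panel` assumes a flat side panel folded back by a fixed angle and viewed from a fixed distance. All names and parameter values are assumptions.

```python
import math


def bifocal(x, focus, half_width, mag_focus, mag_context):
    """Map a data coordinate to a screen offset relative to the focus centre.

    Inside the focus region the scale is mag_focus; outside it the scale is
    the constant mag_context, independent of the distance to the focus.
    """
    d = x - focus
    if abs(d) <= half_width:
        return d * mag_focus
    edge = half_width * mag_focus  # screen offset of the focus-region border
    sign = 1 if d > 0 else -1
    return sign * (edge + (abs(d) - half_width) * mag_context)


def perspective_panel(s, angle, eye_distance):
    """Screen offset of a point s units along a side panel folded back by angle.

    Simple perspective projection: points further along the panel are also
    further away from the viewer, so equal steps shrink more and more.
    """
    return s * math.cos(angle) * eye_distance / (eye_distance + s * math.sin(angle))


# Screen width of successive unit steps in the context region.
bifocal_steps = [
    bifocal(x + 1, 5, 1, 4, 0.5) - bifocal(x, 5, 1, 4, 0.5) for x in range(6, 10)
]
wall_steps = [
    perspective_panel(s + 1, math.radians(60), 10)
    - perspective_panel(s, math.radians(60), 10)
    for s in range(4)
]
```

In the bifocal mapping every unit step in the context region has the same screen width, whereas on the folded panel each step is narrower than the previous one.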

82 FRIEND21 = Future Personalized Information Environment Development project, initiated in 1988 by the Japa-
nese Ministry of International Trade and Industry
83 [Bekavac 1999] also described the idea of using background music. In the case of the VR-emb different types of
background music should support orientation in an electronic mall.
This can be done without changing the zoom factor by moving sideways or by changing the zoom
factor. [Hearst 1999] uses the metaphor of a movie camera for explanation: “scan sideways across
a scene (panning) or move in for a closeup or back away to get a wider view (zooming)”. [Card,
Mackinlay, Shneiderman 1999] do not use the term “panning and zooming” in their listing of in-
teraction techniques. Their equivalent is “camera movement” on one side and “zoom” on the other
side. In contrast to simple panning, camera movement includes the third dimension, when dealing
with three-dimensional visualizations. In both papers, zooming includes possible changes of the
level of details displayed, when changing the zoom factor. Another interesting contribution to the
discussion of zooming is the “single-axis-at-a-time-zooming” discussed by [Jog, Shneiderman
1995]. Whereas normal zooming can be explained by using a camera metaphor87, this fails to work
when only the scale of one of the axes is changed. [Jog, Shneiderman 1995] call this single-axis-at-a-
time-zooming, as shown in Figure 20.
[Figure 20 panels: zooming out or in (camera metaphor); panning in different directions; single-axis-at-a-time-zooming out or in (according to [Jog, Shneiderman 1995])]
Figure 20: Panning and zooming, including different types of zooming
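The three operations of Figure 20 can be expressed as changes to a simple viewport transformation. The following is a minimal sketch; the class `Viewport` and its fields are hypothetical, and zooming is performed relative to the viewport origin rather than the screen centre to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Viewport:
    """Maps data to screen coordinates: screen = (data - origin) * scale."""

    origin_x: float = 0.0
    origin_y: float = 0.0
    scale_x: float = 1.0
    scale_y: float = 1.0

    def pan(self, dx, dy):
        # Move the visible window without touching the zoom factor.
        self.origin_x += dx
        self.origin_y += dy

    def zoom(self, factor, axis="both"):
        # axis="both" is camera-style zooming; "x" or "y" changes one scale only.
        if axis in ("both", "x"):
            self.scale_x *= factor
        if axis in ("both", "y"):
            self.scale_y *= factor

    def to_screen(self, x, y):
        return ((x - self.origin_x) * self.scale_x,
                (y - self.origin_y) * self.scale_y)


view = Viewport()
view.pan(10, 0)         # panning
view.zoom(2)            # zooming in on both axes (camera metaphor)
view.zoom(3, axis="x")  # single-axis-at-a-time zooming
point = view.to_screen(12, 5)
```

After the single-axis zoom the two scale factors differ, which is exactly the situation a camera metaphor can no longer explain.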
A classical example for a system implementing panning and zooming for the visualization of
browsing and searching is Pad++ [Bederson, Hollan, Perlin et al. 1996]. One of the central charac-
teristics of the system is the fact that scale is added as a first class parameter to all items displayed.
In addition to implementing simple panning and zooming, Pad++ goes far beyond this interface
technique. Besides other techniques it also offers focus-plus-context views as well as overview
plus detail, described later. The explanations of [Bederson, Hollan, Perlin et al. 1996] using space-
scale diagrams [Furnas, Bederson 1995] to explain basic concepts of panning and zooming, com-
binations of panning and zooming, and special problems when animating panning and zooming,
are particularly interesting. In general, at least simple forms of panning and zooming are today
implemented in a great many of the available visualization systems.
3.3.2.3. Focus-plus-context
An inherent problem of zooming leads to “focus-plus-context” as a solution. The problem is that
the higher the zoom factor is, the more details can be shown about particular items and the better
closely spaced items can be separated, but the less can be perceived of the surrounding items or the
overall structure. A solution for this problem is to present more details about the items in focus,
and fewer about the context, without hiding the context completely. [Card, Mackinlay, Shneiderman
1999] list the following three points as premises for focus-plus-context:
• The user needs both overview and detail information simultaneously.
• Information needed in the overview may be different than that needed in detail.
• These two types of information can be combined in a single (dynamic) display.

87 Not to be mixed up with the more complex camera movement metaphor used by [Card, Mackinlay, Shneiderman
1999]

As we will see in Chapter 3.3.2.6, overview plus detail is another method which can be used to
cope with the mentioned problem of zooming and with the first and the second of the above-listed
premises, but overview plus detail does not combine both types of information in a single display.
[Figure 21 diagram labels: Raw Data, Data Tables, Visual Structures, Views; Data Transformations, Visual Mappings, View Transformations; Task; Human Interaction; Focus-plus-context]
Figure 21: Reference model for visualization: Focus-plus-context
[Hearst 1999] describes a fisheye camera lens as a metaphor for focus-plus-context. The trailblaz-
ers for fisheye views were [Furnas 1981] / [Furnas 1986] with his theory about “Degree Of Inter-
est” (DOI) functions, and [Sarkar, Brown 1992] with their extensions for graphical fisheye views.
For a good overview of distortion-oriented presentation techniques see [Leung, Apperley 1994].
[Card, Mackinlay, Shneiderman 1999] list the following techniques for selective reduction of in-
formation for the contextual area: Filtering, Selective aggregation, Micro-macro readings, High-
lighting, and last but not least Distortion. Explanations can be found in Table 16. Interestingly they
interpret filtering in focus-plus-context as a data transformation, whereas for zooming, where a
sort of filtering can also occur, they categorized the complete technique as working on the view
transformation. The interpretation of [Card, Mackinlay, Shneiderman 1999], that focus-plus-
context has at least partially to do with data transformations, is indicated in Figure 21 as a dotted
line. Actually this should also be valid for panning and zooming in Figure 19, but has been omitted
there because of the above-mentioned classification of the authors.
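The two fisheye ideas mentioned above can be illustrated with a short sketch. It uses the generally cited forms, a Degree Of Interest computed as a priori importance minus distance from the focus, and the graphical distortion G(x) = (d + 1)x / (dx + 1) for a normalized distance x. The tree, the threshold, and all names are hypothetical.

```python
def degree_of_interest(a_priori_importance, distance_to_focus):
    """DOI of an item: its global importance minus its distance from the focus."""
    return a_priori_importance - distance_to_focus


def fisheye(x, d):
    """Distort a normalized distance x in [0, 1] from the focus.

    d is the distortion factor; d = 0 leaves the layout unchanged.
    """
    return (d + 1) * x / (d * x + 1)


# Hypothetical outline: importance is the negative depth in the tree,
# distance is the path length to the focused node "section".
nodes = {
    "root": (0, 2),
    "chapter": (-1, 1),
    "section": (-2, 0),
    "other chapter": (-1, 3),
    "other section": (-2, 4),
}
visible = sorted(
    name for name, (api, dist) in nodes.items()
    if degree_of_interest(api, dist) >= -2
)
```

With this threshold only the focused node and its ancestors remain visible, which is a form of filtering. The `fisheye` function, in contrast, keeps every item and only moves it: points near the focus are spread out and points near the border are compressed.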
Technique               Explanation
Filtering               Selection of cases in the Data Table
Selective aggregation   Creation of new cases in the Data Table by aggregating other cases
Micro-macro readings    Graphics in which detail cumulates into larger coherent structures88
Highlighting            An overall set of items provides a macro environment against which the micro reading of individual highlighted items can be interpreted
Distortion              Relative changes in the number of pixels devoted to objects in the space (more pixels for focus objects)
Table 16: Focus plus context: selective reduction of information for the context according to [Card, Mackinlay,
Shneiderman 1999]
Examples for systems using focus-plus-context for the visualization of search results or browsing
are the Document Lens, the Table Lens, or the Pad++ system. The Document Lens [Robertson,
Mackinlay 1993] is a component of the Information Visualizer system. It is a 3D tool for large
rectangular presentations of documents or Web page collections, like the WebBook. The pages of
a document or a collection are exploded out, so that all pages are available simultaneously and can
be viewed using a rectangular lens magnifying the page in focus, and therefore distorting all the
other pages. The principle is shown in Figure 41 on page 77. Another component, also using a lens
metaphor, is the Table Lens [Rao, Card 1994]. The Table Lens can be used for the viewing of re-

88 Their example is the illustration of a newborn infant’s sleep/wake cycles from [Winfree 1987], reproduced as
Figure 1.8 in [Card, Mackinlay, Shneiderman 1999]

played, it would be a good idea to help the user find the needle in the haystack by applying
adequate visualizations. The discussion of components for the result phase will be subdivided into
visualizations of document attributes, visualizations of interdocument similarities, and visualiza-
tions of interdocument connections. In terms of visualization, the refinement step has elements
from the formulation and the result phase. Therefore visualizations for the refinement phase are
discussed in the context of the formulation or the result phase.
3.3.3.1. Visualization of queries or query attributes
In the AI-STARS system, [Anick, Brennan, Flynn et al. 1990] used a component called “Query
Reformulation Workspace” to visualize Boolean queries automatically derived from natural language
queries. The ascertained citation forms are laid out as tiles in two-dimensional form, representing
the Boolean queries with “AND” and “OR” conditions. The system carries out automatic operations
on the query, like identification of noisewords or meaningful phrases. The results are also
visualized. Figure 25 shows the Boolean query “(’copy’ AND ‘BACKUP saveset’ AND ‘tape’
AND (‘v.5.0’ OR ‘version 5.0’))” automatically derived from the natural language query “copying
backup savesets from tape under v5.0”. The example of [Anick, Brennan, Flynn et al. 1990] is
based on a database with technical information for customer support specialists. Using the
WebViz-example the query could be “((’visualization’ OR ‘visualisation’) AND ‘search’ AND
‘results’ AND (‘www’ OR ‘internet’))” automatically derived from the natural language query
“Visualization of Search Results from the World Wide Web”. The black tiles represent the query.
The white tiles represent citation forms detected, but not automatically selected by the system. By
clicking on the tiles the selections can be toggled. Additionally there are a number of other functions
like changing Boolean operators by moving tiles to other columns or requesting a window with
related terms to expand or change the query. The related terms are grouped in phrases containing
the term, synonyms, conceptually related terms, and compound terms. The numbers in the lower
left corner of the tiles show the number of postings of each term.
[Figure 25 tiles with postings: query “Visualization of Search Results from the World Wide Web”: visualization 58, visualisation 7, search 151, results 114, www 142, internet 78; query “copying backup savesets from tape under v5.0”: copy 469, BACKUP saveset 15, tape 214, v5.0 344, version 5.0 840]
Figure 25: Principle of the Query Reformulation Workspace used in the AI-STARS system by [Anick, Brennan, Flynn et al. 1990]
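The structure of the tiled query, alternatives combined with “OR” and groups combined with “AND”, can be sketched as a list of term sets. This is only an illustration of the Boolean form of the WebViz-example query, not of the AI-STARS implementation; the function `matches` and the sample documents are hypothetical.

```python
def matches(document_terms, query):
    """True if every AND-group has at least one OR-alternative in the document."""
    return all(any(term in document_terms for term in group) for group in query)


# Each set is one group of alternatives (OR); the groups are combined with AND.
query = [
    {"visualization", "visualisation"},
    {"search"},
    {"results"},
    {"www", "internet"},
]

# Hypothetical documents, each reduced to its set of index terms.
docs = {
    "d1": {"visualization", "search", "results", "www"},
    "d2": {"visualisation", "search", "results", "internet", "metaphor"},
    "d3": {"visualization", "search", "www"},
    "d4": {"search", "results", "internet"},
}
hits = sorted(doc_id for doc_id, terms in docs.items() if matches(terms, query))
```

Toggling a tile corresponds to adding a term to or removing it from one of the sets, and moving a tile to another column corresponds to moving the term to another set.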
As described on page 58 discussing the water flow metaphor, [Shneiderman 1991] / [Young,
Shneiderman 1993] introduced a component called Filter/Flow to overcome known problems with
the formulation of Boolean queries. The filters let through only the appropriate documents and the
pipe layout determined if the relationship was an “AND” or an “OR”. The left part of Figure 26
shows the simplified example of a complex query according to Figure 5 from [Young, Shneider-
man 1993]. The example uses an employee database. The task is to find the accountants or engi-
neers from Georgia who are managed by Elisabeth, or clerks from Georgia who make more than
thirty thousand dollars per year. The right part of Figure 26 shows a transfer of the principle to the
visualization of Web search results. Assuming an already found result set for the WebViz-example,
the task is to filter English or German documents that are mixed linklists from academic
servers, or highly relevant English or German documents of all types except framesets.
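The pipe layout can be mimicked with two combinators: filters placed one after another act as “AND”, filters on parallel pipes act as “OR”. The sketch below applies this to the Web example; the attribute names and values such as `doc_type` or `"linklist"` are assumptions made for illustration and are not taken from [Young, Shneiderman 1993].

```python
def series(*filters):
    """Filters in a row on one pipe: a record must pass all of them (AND)."""
    return lambda record: all(f(record) for f in filters)


def parallel(*filters):
    """Filters on parallel pipes: passing any one of them is enough (OR)."""
    return lambda record: any(f(record) for f in filters)


def attribute(name, *allowed):
    """A single filter letting through records whose attribute has an allowed value."""
    return lambda record: record[name] in allowed


flow = series(
    attribute("language", "en", "de"),
    parallel(
        series(attribute("doc_type", "linklist"), attribute("server", "academic")),
        series(attribute("relevance", "high"),
               lambda record: record["doc_type"] != "frameset"),
    ),
)

# Hypothetical result set.
documents = [
    {"id": 1, "language": "en", "doc_type": "linklist", "server": "academic", "relevance": "low"},
    {"id": 2, "language": "de", "doc_type": "article", "server": "commercial", "relevance": "high"},
    {"id": 3, "language": "en", "doc_type": "frameset", "server": "commercial", "relevance": "high"},
    {"id": 4, "language": "fr", "doc_type": "linklist", "server": "academic", "relevance": "high"},
]
passed = [d["id"] for d in documents if flow(d)]
```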

Venn diagrams have been used in a number of cases to represent Boolean queries. One recent ex-
ample is the usage in the TeSS prototype by [Hertzum, Frøkjær 1996]. A good overview of Venn
diagrams can be found in [Jones 1998]. Simple Venn Diagrams are capable of dealing with two or
a maximum of three keywords. Figure 28 shows the principle of Venn diagrams for a part of the
result set for the WebViz-example. Starting in the upper left corner, the blue circle represents 18
documents retrieved by ‘(visualization OR visualisation) AND (NOT (search OR results))’. The
intersection of the two upper circles shows the 8 documents retrieved by ‘(visualization OR visu-
alisation) AND search AND (NOT results)’. The intersection of the three circles contains the 32
documents which are retrieved by ‘(visualization OR visualisation) AND search AND results’.

[Figure 28 diagram: three overlapping circles labeled (visualization OR visualisation), search, and results; region counts 59, 18, 24, 32, 8, 6, 52]
Figure 28: Venn diagram for the concepts (visualization OR visualisation), search, results.
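The seven numbers of such a three-circle diagram can be derived from the three result sets with plain set operations. The following minimal sketch uses small hypothetical sets of document ids rather than the result set of the WebViz-example; the function name `venn_regions` is an assumption.

```python
def venn_regions(a, b, c):
    """Return the document count of each region of a three-set Venn diagram."""
    return {
        "a only": len(a - b - c),
        "b only": len(b - a - c),
        "c only": len(c - a - b),
        "a&b only": len((a & b) - c),
        "a&c only": len((a & c) - b),
        "b&c only": len((b & c) - a),
        "a&b&c": len(a & b & c),
    }


# Hypothetical document ids retrieved for three concepts.
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
c = {4, 6, 7}
regions = venn_regions(a, b, c)
```

Each region corresponds to one Boolean expression, for example “a only” to ‘a AND (NOT (b OR c))’, and the seven counts add up to the size of the union of the three sets.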
[Jones 1998] integrated Venn diagrams in the VQuery interface in a query workspace to support
users in a more flexible way when working with this type of visualization. Figure 29 shows an
illustration using the WebViz-example. Six keywords are spread over the workspace. Currently the
active query, represented by the gray rectangle, includes three of them. The query is ‘(visualization
AND search) OR results’. Part of the workspace is a text field, where the system presents an English
language interpretation of the graphically constructed active query. Besides “AND” and “OR”,
the system also supports a NOT operator, but complex queries are impossible to construct.
[Figure 29 workspace: keywords with postings: results 114, visualization 58, search 151, visualisation 7, www 142, internet 78; active query with the text field “Search for any documents containing either visualization and search; or results”]
Figure 29: Principle of the Query workspace with Venn Diagrams in the VQuery system by [Jones 1998], [Jones 1998a]
[Spoerri 1993], [Spoerri 1993a] introduced with the InfoCrystal a query-visualization component
also derived from Venn diagrams. The InfoCrystal can be used as a visualization tool and as visual
query language. Spoerri describes the usage for Boolean or for vectorspace queries, and different
modes like simple queries or complex queries using a block building mode. The layout inside an
InfoCrystal can be done in rank layout or bull’s-eye layout. Information is coded in shape, prox-
imity, rank, orientation, and color or texture. In special cases size, or brightness and saturation
coding is used. Figure 30 shows an InfoCrystal for the WebViz-example. It is a simple query in
rank layout with color-coding. The number in an icon shows the number of documents satisfying
the conditions represented by it. Starting in the upper left corner, the blue circle represents 64
documents retrieved by ‘visualization OR visualisation’. The next blue circle shows one document
retrieved by ‘(visualization OR visualisation) AND (NOT (search OR results) OR (www OR
internet))’. The rectangle with a blue and a green end stands for 18 documents retrieved by ‘(visu-
corresponding keywords beneath them. The mapping could be changed by drag and drop. An
extra column is reserved for unused keywords. The lower part of Figure 34 shows the principle of
the interactive legend.
[Figure 33 labels: CONCEPT: columns; keywords visualization, visualisation, search, results, www, internet]
Figure 33: Principle of the Keyword-Concept Matrix or Concept Control used in the NIRVE system by [Cugini, Laskowski, Piatko 1998], [Cugini, Laskowski, Sebrechts 2000].
Starting with visualizations of interdocument similarities and document clusters at a later point of
the development of the NIRVE system, a so-called Concept Globe [Cugini, Laskowski, Sebrechts
2000] has been added, showing per default no single documents but only document clusters, the
concept distribution and average relevance in the cluster, the number of documents in the cluster,
and a number of other features. The primary design version was a 3D globe, but the authors also
experimented with 2.5D and 2D versions. The definition of a cluster is guided by previous user
experiences and is quite simple: all documents that have the same subset of concepts form a clus-
ter. The clusters are visualized starting at the North Pole of the globe, or the upper end in the 2D
version, starting with the cluster containing all keywords. In the next row are the clusters in which
one of the concepts is missing, in the next row two concepts are missing and so forth. At the South
Pole, or lower end in the 2D version, would be the cluster of documents where all concepts are
missing. So the number of concepts defines the „latitude“ of an icon representing a cluster. In the
3D version the thickness of the box of a cluster represents the number of documents in the cluster.
The height of a rectangle below the cluster icon indicates the same value in the 2D version. Presence
or absence of colored bars indicates the presence or absence of concepts. Colored lines between
the icons indicate concept differences between clusters. Neglecting the length of the bars,
which indicates the average relevance of a concept for the documents in the cluster, and some other
features not described here, the Concept Globe presents almost the same information as is visualized
in the InfoCrystal or the “Bracket”-visualization. Figure 34 shows the 2D principle using the
result set from the WebViz-example.
[Figure 34 legend: concept VISUALIZATION (visualization, visualisation); SEARCH (search); INTERNET (www, internet); RESULTS (results); UNUSED]
Figure 34: Principle of the 2D Global View used in the NIRVE system by [Cugini, Laskowski, Sebrechts 2000]
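The cluster definition and the arrangement by “latitude” described above can be sketched in a few lines: documents with the same subset of concepts form a cluster, and the row of a cluster is given by the number of missing concepts. The function name `concept_clusters` and the sample documents are hypothetical, and the sketch ignores relevance and all rendering aspects.

```python
from collections import defaultdict


def concept_clusters(doc_concepts, all_concepts):
    """Group documents by their concept subset, then by number of missing concepts."""
    clusters = defaultdict(list)
    for doc_id, concepts in doc_concepts.items():
        clusters[frozenset(concepts)].append(doc_id)
    rows = defaultdict(dict)
    for subset, members in clusters.items():
        missing = len(all_concepts) - len(subset)  # 0 is the top row ("North Pole")
        rows[missing][subset] = members
    return dict(sorted(rows.items()))


all_concepts = {"visualization", "search", "results", "internet"}
# Hypothetical documents with the concepts found in them.
doc_concepts = {
    "d1": {"visualization", "search", "results", "internet"},
    "d2": {"visualization", "search", "results", "internet"},
    "d3": {"visualization", "search"},
    "d4": {"search", "results"},
    "d5": set(),
}
rows = concept_clusters(doc_concepts, all_concepts)
```

The number of documents in a cluster, shown as box thickness or rectangle height in NIRVE, is simply the length of the member list of that cluster.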
size and color. The right side shows another possible mapping, where the relevance is mapped on
icon type and color. Other attributes such as author names or publication year can be mapped on
the placement along the x-axis and y-axis and will be discussed in Chapter 3.3.3.3 Visualization of
interdocument similarities.
[Figure 45 panels: left, Document Type mapped on Icon (Book, Journal Article, Proceedings Article), Relevance (least to most) mapped on Size and Color; right, All Document Types, Relevance (least to most) mapped on Icon and Color]
Figure 45: Possible mappings in the Matrix of Icons of the Graph View of the Envision system according to [Nowell, France, Hix et al. 1996]
[Church, Helfman 1993] introduced Dotplots to investigate different sorts of text by visualizing
self-similarity of tokens in the text or in the meta-information of text. The goal is to support the
discovery of large-scale structures. In a first step the text is split into lines, words, or characters. In
a second step a plot is generated, where a dot is placed in every position i, j where the ith input
token is the same as the jth. For Dotplots of text the token is usually a word. Besides simple plots,
Dotplots have a number of features such as reconstruction, weighting, approximation, and the us-
age of greyscale or colormaps to visualize results. Dotplots also support overview plus detail with
multiple views of a text as plots in two different scales and an additional text window. [Church,
Helfman 1993] experimented, besides other forms of text like source code, with four Associated
Press (AP) news stories about the same topic, the protocols from Canadian parliamentary debates
in English and French, and Microsoft manuals in seven languages. Patterns detected are reverse
diagonals, broken diagonals, light crosses, checkerboards, reordered diagonals, or density variations.
A practical application in the case of the four AP stories was, for example, support for the
detection of rewrites, or in other cases the detection of similarities or dissimilarities between the
same text in different languages. The principle of Dotplots is shown in Figure 46. Figure 47 shows
an example of Dotplots. Tokens in the examples are characters, not words. Before reading the ex-
planation of the figure, guess which of the three plots is a dada poem.
[Figure 46 panels: dotplot of “what a be what a be what a beauty” with words as tokens; dotplot of “what a be” with characters as tokens]
Figure 46: Principle of Dotplots according to [Church, Helfman 1993]
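The basic principle, a dot at every position i, j where the ith token equals the jth, can be reproduced with a minimal sketch. Weighting, approximation, and color mapping are left out; the function names `dotplot` and `render` are assumptions.

```python
def dotplot(tokens):
    """Boolean matrix with True at (i, j) where token i equals token j."""
    n = len(tokens)
    return [[tokens[i] == tokens[j] for j in range(n)] for i in range(n)]


def render(matrix):
    """Draw the matrix as text, '#' for a dot and '.' for an empty cell."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


# Words as tokens, using the example text of Figure 46.
tokens = "what a be what a be what a beauty".split()
plot = render(dotplot(tokens))
```

The repetition of “what a be” shows up as diagonals parallel to the main diagonal, while the final token “beauty” matches only itself.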

figures was repeated with the same result by using http://www1.bell-labs.com/user/gwills/ntts95/SEE.gif [2001-02-03]
and http://www.sims.berkeley.edu/~hearst/irbook/10/seesoft.gif [2001-02-03].


Figure 47: Examples of Dotplots102. From left to right: plots of the first 308 characters of the first paragraph of
the Declaration of Independence of the United States of America, of the first 308 characters of the abstract
from [Mann 1999], and of the 308-character dada poem “What a be what a be what a beauty” by Kurt Schwitters.
An idea comparable to the Dotplots by [Church, Helfman 1993] has been used by [Gershon, Le-
Vasseur, Winstead et al. 1995] for the visualization of single documents retrieved from the World
Wide Web. Instead of the self-similarity principle, word correlations calculated by the proximity
of any pair of words are used to produce a “Dotplot”-visualization of a document.
More famous than the already mentioned “Positive / Negative Feedback workspace” by
[Veerasamy, Navathe 1995] / [Veerasamy, Hudson, Navathe 1995] is their bar-graphs view of a
result set from ranked output systems. The component shows the distribution of query terms for up
to the 200 or 150 highest ranked documents in a result set (see footnote 103). It is used for two
main reasons: to gain specific information about individual documents and to gain aggregate
information about the query results in general [Veerasamy 1996] / [Veerasamy, Belkin 1996]. For each
document a group of vertically stacked bars shows the overall relevance of the document for the
query and the contribution of every keyword or concept. A concept can be a single keyword, or a
group of keywords that are synonyms or other forms of the keyword. Each concept is shown in one
row. The taller the bar of a concept, the higher the contribution of this concept to the retrieval of
the document. If the bar for a concept is absent, the concept is absent from the document. Figure 48
shows the principle of bar-graphs using the WebViz-example and a result set of 20 documents.
The original examples of Veerasamy et al. show 70 or 150 documents. By examining the bar-graph,
it can be seen that the highlighted document #6 has been ranked higher than #7, despite
the fact that #6 does not contain the concept “visualization”. Document #7 contains all four
concepts, including a weak contribution of “visualization”. In document #7, however, the contribution
of the concept “internet” is much weaker than in document #6. As aggregate information about the
whole result set, the bar-graph shows that nearly all of the documents deal with “search” and
“results”, and many with “visualization”. The concept “internet” is not well represented.

102 Figure produced by using http://www.research.att.com/~jon/dotplot/try.html [2001-01-29]. The first “line” in the
plot is each time the reproduction of the 308 characters string as a legend.
103 200 according to [Veerasamy, Hudson, Navathe 1995], 150 according to [Veerasamy 1996]. There are also
some other changes in the interfaces described in the 1995 and the 1996 papers.
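[Figure placeholder: bar-graph with one row each for the concepts “visualization”, “search”, “results” and
“internet” plus a “Total sum” row, drawn for documents 1 to 20 of the WebViz-example listed below.]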
1. Visualizing Search Results using SQWID.
2. Visualization of WWW-Search Results.
3. Visualizing World Wide Web Information Resources.
4. Evaluating a Visual Retrieval Interface: AspInquery at TREC-6.
5. Real Life Information Retrieval: a Study of User Queries on the Web.
6. Using A Data Fusion Agent for Searching the WWW.
7. Clarifying Search: A User-Interface Framework for Text Searches.
8. Evaluation of Text, Numeric and Graphical Presentations for Information Retrieval Interfaces.
9. Querying, Navigating and Visualizing a Digital Library Catalog.
10. TileBars: Visualization of Term Distribution Information in Full Text Information Access.
11. A New Paradigm for Browsing the Web.
12. IVEE: An Information Visualization and Exploration Environment.
13. Queries? Links? Is there a Difference?
14. The WebBook and the Web Forager: An Information Workspace in the World Wide Web.
15. Using Graphic History in Browsing the World Wide Web.
16. Interfaces for Information Exploration: Seeing the Forest.
17. Visual Exploration of Large Structured Data Sets.
18. Scatter/Gather Browsing Communicates the Topic Structure of a Very Large Text Collection.
19. Space-Scale Diagrams: Understand Multiscale Interfaces.
20. Enhanced Dynamic Queries via Movable Filters.
Figure 48: Principle of bar-graphs by [Veerasamy 1996] / [Veerasamy, Belkin 1996]
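
The bar-graph principle can be illustrated with a small Python sketch. It is not the implementation of Veerasamy et al.: the contribution values for documents #6 and #7 are hypothetical numbers chosen to mirror the description, and taking the overall relevance as the plain sum of the concept contributions is an assumption suggested only by the “Total sum” row of the figure.

```python
CONCEPTS = ["visualization", "search", "results", "internet"]

# Hypothetical contribution values per document (0.0 means the concept is absent).
docs = {
    6: {"visualization": 0.0, "search": 0.5, "results": 0.4, "internet": 0.6},
    7: {"visualization": 0.1, "search": 0.5, "results": 0.4, "internet": 0.1},
}


def total_score(contrib):
    """Overall relevance, assumed here to be the sum of the concept contributions."""
    return sum(contrib.get(c, 0.0) for c in CONCEPTS)


def bar_graph(docs, width=10):
    """Return (ranking, rows): one row of text bars per concept, documents in rank order."""
    ranking = sorted(docs, key=lambda d: total_score(docs[d]), reverse=True)
    rows = {
        c: ["#" * round(docs[d].get(c, 0.0) * width) for d in ranking]
        for c in CONCEPTS
    }
    return ranking, rows


ranking, rows = bar_graph(docs)
for concept in CONCEPTS:
    print(f"{concept:<14}", " | ".join(f"{b:<10}" for b in rows[concept]))
```

In this toy data document #6 is ranked ahead of #7 although its “visualization” bar is empty, because its “internet” bar is much taller.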
The stacked histograms used by [Shneiderman, Byrd, Croft 1997] in the WInquery system are a
related approach. The authors also proposed a solution to show the contribution of each query term
for the overall relevance of the document. Their solution, compared to the approach from
Veerasamy et al., focuses more on specific information about individual documents, than aggre-
gated information about the query results in general. The WInquery system is a redesign of the
XINQUERY User-Interface done by the authors based on their proposed four-phase framework
for search and eight design rules adapted from a previous edition of [Shneiderman 1998]104. The
idea of the stacked histograms has also been influenced by Hearst’s TileBars. In a later publication
by [Byrd 1999] the component was named “VQRa”, for Visualization of the Query in relation
to individual Retrieved documents. The “a” is added because Byrd also describes a solution “b”.
Figure 49 shows the principle using the same query, the same ranking and the same result set as
used above. Looking at documents #6 and #7, the same results can be observed as described in
the explanation of the bar-graph.
[Figure placeholder: ranked list with the columns Score, Rank and Title for the same 20 documents as in
Figure 48; the score bar of each document is subdivided by the concepts “visualization”, “search”,
“results” and “internet”.]

Figure 49: Principle of stacked histograms / VQRa of the WInquery system by [Shneiderman, Byrd, Croft
1997], [Byrd 1999]
[Cugini, Laskowski, Piatko 1998] also try to visualize the relevance of each term in a multiple-term
query. In the NIRVE system they used flat Iconic Representations of documents showing the

104 In fact they took the rules from the second edition 1992. The 1998-version is the third edition.

“concept profile” in form of the relevance of each concept. The Iconic Representation has been
used in 3D components of the NIRVE system showing interdocument similarities such as the
Document Spiral, the 3-D Axes view, or the Concept Globe [Cugini, Laskowski, Sebrechts 2000].
Being part of a visualization of interdocument similarities the Iconic Representation itself could
clearly be used to visualize document attributes. Figure 50 shows the principle using the document
set from the WebViz example. Looking again at documents #6 and #7 the same results can be ob-
served as described in the explanation of the bar-graph. In the 3D visualizations of the NIRVE
system the icons are on user request additionally decorated with small glyphs for attributes like
document length or overall document scores. In later versions there was also an additional glyph
indicating user judgments of the document (green = good, red = bad, yellow = undecided) [Cugini,
Laskowski, Sebrechts 2000].
[Figure placeholder: one Iconic Representation per document for documents 1 to 20 of the WebViz example.]
Figure 50: Principle of the Iconic Representation in the 3D Document Space of the NIRVE system by [Cugini,
Laskowski, Piatko 1998]
[Grewal, Jackson, Wallis et al. 1999], [Grewal, Burden, Jackson et al. 1999], [Grewal, Jackson,
Burden et al. 2000] also try to visualize the relevance of each term in a multiple-term query. In
their R-Wheel (Relevance Wheel) component, each term has its own circle segment and color as-
sociated. The segment is filled in proportion to the relevance of the term. The number of circle
segments corresponds to the number of keywords. Figure 51 shows the R-Wheels using the docu-
ment set from the WebViz example. Looking again at documents #6 and #7 the same results can
be observed as described in the explanation of the bar-graph.
[Figure placeholder: one R-Wheel per document for documents 1 to 20 of the WebViz example.]
Figure 51: Principle of R-Wheels (Result Wheels) by [Grewal, Burden, Jackson et al. 1999], [Grewal, Jackson,
Burden et al. 2000]
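
The geometry of such a wheel can be sketched as follows. This is a minimal sketch and not the implementation of Grewal et al.: the function name `r_wheel` is hypothetical, relevance values are assumed to lie between 0 and 1, and filling each segment radially (fill radius proportional to relevance) is an assumption, since the description above only states that the segment is filled in proportion to the relevance of the term.

```python
def r_wheel(relevance, radius=1.0):
    """Return one (keyword, start_deg, end_deg, fill_radius) tuple per keyword.

    relevance maps each keyword to a value between 0 and 1. The circle is
    divided into as many equal segments as there are keywords.
    """
    if not relevance:
        return []
    step = 360.0 / len(relevance)
    segments = []
    for i, (keyword, rel) in enumerate(relevance.items()):
        clipped = max(0.0, min(1.0, rel))  # keep the fill inside the wheel
        segments.append((keyword, i * step, (i + 1) * step, radius * clipped))
    return segments


# Hypothetical relevance values for a four-keyword query.
wheel = r_wheel({"visualization": 0.0, "search": 0.5, "results": 0.4, "internet": 0.6})
for segment in wheel:
    print(segment)
```

A keyword that does not occur in the document yields a segment with fill radius 0, which corresponds to an empty segment in the wheel.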
Both Veerasamy and Grewal et al. discuss a number of other ideas for the visualization of query
term contribution. For different reasons in the end they prefer the solutions shown in Figure 48 and
Figure 51. Figure 52 shows the principles of their additional ideas using documents #6 and #7 as
examples.
[Figure placeholder: documents #6 and #7 drawn with each of the additional ideas, grouped under
“Veerasamy” and “Grewal et al.”. Panel titles: Star plots; Glyphs; Vertically stacked bargraphs;
Horizontally stacked bargraphs aligned horizontally; Horizontally stacked bargraphs aligned vertically;
Bar-chart; Slider-bar.]
Figure 52: Additional ideas of [Veerasamy 1997] and [Grewal, Jackson, Burden et al. 2000]
In [Grewal, Jackson, Wallis et al. 1999] there is one more idea named “tepee”. Basic structure is a
transparent pyramid which has as many base sides as keywords displayed. Inside the tepee is a
pendulum. The length of the pendulum represents the overall relevance. The pendulum is attracted

the fourth row “internet” OR “www”. The terms “internet” OR “www” can only be found at the
beginning of the first document. In this document the term “Web” is used frequently but is not part
of the term set used here as a query, and therefore is not indicated in the TileBar. In addition, it can
be seen that all of the four term sets can be found in the first part of the first document. The second
document contains a frequent co-occurrence of the terms “search” and “results”. The term set
“visualization” OR “visualisation” is not as dominant as in the first document. The last of the three
documents is much shorter than the two previous ones.
Term Set 1: visualization visualisation
Term Set 2: search
Term Set 3: results
Term Set 4: internet www
[Veerasamy, Navathe 1995] Querying, Navigating and Visualizing a Digital Library Catalog.
http://www.csdl.tamu.edu/DL95/papers/veerasamy/veerasamy.html
[Hearst 1995] TileBars: Visualization of Term Distribution Information in Full Text Information Access.
http://www.acm.org/sigchi/chi95/Electronic/documnts/papers/mah_bdy.htm
[Mann 1999] Visualization of WWW-Search Results.
http://www.inf.uni-konstanz.de/~mann/papers/mann_webvis99.html

Figure 54: Principle of the TileBars by [Hearst 1995]105
[Heo, Morse, Willms et al. 1996] modified the TileBar idea in the CASCADE system for the usage
with a single document, and coupled it with a scrollbar. Also combined with the scrollbar is a
component called Mural. While the TileBar shows the distribution of query terms in the document,
the Mural shows the distribution of the hyperlinks. The CASCADE (Computer Augmented Sup-
port for Collaborative Authoring and Document Editing) system is a tool to support collaborative
authoring of documents. In the CASCADE system, Mural and TileBars are used as intra-document
tools to ease navigation through the usage of landmarks. Landmarks in the document are the links
and the matches of the query terms. Figure 55 demonstrates the principle of Mural plus TileBars
using an HTML version of [Mann 1999] and the WebViz-query reduced to three term sets, the maximum
number allowed in the CASCADE system.
[Figure placeholder: beginning of the HTML version of [Mann 1999] (title, author, affiliation and first
lines of the abstract) displayed next to a scrollbar that is coupled with Mural and TileBars.]
Figure 55: Principle of Mural and TileBars in the CASCADE system by [Spring, Morse, Heo 1996], [Heo,
Morse, Willms et al. 1996]
Another modification of the TileBar idea is the content-displaying scrollbar VQRb of the FancyV
prototype described by [Byrd 1999]. An ordinary scrollbar is modified and shows every query
term hit in the document in the scrollbar pane using 3-by-3 pixel squares colored according to the
color associated to the keyword. The slider of the scrollbar is white to ease recognition in the cur-
rently displayed portion of the document. Additionally the VQRb is combined with colored query
term highlighting in the document itself. Figure 56 shows the principle using the WebViz-example
and a one-column version of the document [Mann 1999].

105 The figures in [Hearst 1995] show no colors, but [Hearst 1999] Figure 10.15, which can also be retrieved from
the XEROX PARC Web server in a colored version, shows the usage of colors. Hearst uses unsaturated colors,
which look nicer than the color palette used for the examples in this thesis.


Figure 57: Example of the libViewer from the SOMLib system106 [Rauber, Bina 1999], [Rauber, Bina 2000]
The last idea introduced in this chapter leads to the next chapter about the visualization of
interdocument similarities. [Chase, D’Amore, Gershon et al. 1998] describe an Entity Relation
Visualization in the NetMap system, where individual entities found in documents and their relations are
encoded with color, shapes, and connecting lines. The main target of NetMap is interdocument
relationships, but the authors also describe that the Entity Relation Visualization can be used for
the discovery of interentity relationships. Figure 58 shows the principle according to [Chase,
D’Amore, Gershon et al. 1998]. Probably the component could also be used for entities found in a
single document.
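[Figure placeholder: entity nodes joined by connecting lines. Document: [Doc] 890929-0110.txt.
Organizations: [org] New York Stock Exchange, [org] Lazard Freres &, [org] Kyocera Corp., [org] AVX Corp.
Places: [pla] Europe, [pla] United States, [pla] Japan. Title: [tit] general partner.
Persons: [per] Kazuo Inamori, [per] Marshall D. Butler, [per] John O’Herron.]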

Figure 58: Principle of Entity Relation Visualization by [Chase, D’Amore, Gershon et al. 1998] Figure 3e.
This chapter introduced a number of ideas for the visualization of document attributes. Table 18
gives an overview covering the components discussed. Besides simple miniaturizations of docu-
ments or their first pages, with or without fisheye or distortion techniques, we have seen a number
of visualizations of document attributes, metadata or query related information. Query related in-
formation included overall relevance, relevance per keyword or concept, and query term or con-
cept distribution in the document. Metadata displayed included size of the document, document
type, bookmark status, or visitation information. Document attributes included self-similarity pat-
terns, hyperlink information, or the occurrence of query-independent items.

106 Figure produced using http://student.ifs.tuwien.ac.at/~andi/libViewer/ [2001-03-02]
Component | Literature | Used in System
Thumbnail views | [Ayers, Stasko 1995] | MosaicG
Thumbnail views | [Ginsburg, Marks, Shieber 1996] | DeckView
Thumbnail views | [Hightower, Ring, Helfman et al. 1998] | PadPrints
Thumbnail views | [Bederson, Hollan, Stewart et al. 1998] | Pad++ Web browser
Thumbnail views | [Robertson, Czerwinski, Larson et al. 1998], [Czerwinski, Dumais, Robertson et al. 1999] | Data Mountain
Thumbnail views | [Kaugars 1998] |
Thumbnail views | [Ogden, Davis, Rice 1998] | J24
Thumbnail views | [Amento, Hill, Terveen et al. 1999] | TopicShop
Thumbnail views | [Cockburn, Greenburg, McKenzie et al. 1999], [Cockburn, Greenberg 1999], [Cockburn, Greenberg 1999a], [Kaasten, Greenberg 2000], [Kaasten, Greenberg 2001] | webView and other unnamed systems
“semi open view” | [Kaugars 1998] |
Relevant Extracts plus Curve of Relevance | http://www.arisem.com [2001-02-11] | DigOut4U
SeeSoft “bar view” | [Eick, Steffen, Sumner 1992], [Eick 1994], [Wills 1995] | SeeSoft
Information Mural | [Jerding, Stasko 1995], [Jerding, Stasko 1997] |
Matrix of Icons | [Nowell, France, Hix et al. 1996] | Envision
Dotplots | [Church, Helfman 1993] |
bar-graph | [Veerasamy, Navathe 1995], [Veerasamy, Hudson, Navathe 1995], [Veerasamy 1996] | Tkinq
Iconic Representation | [Cugini, Piatko, Laskowski 1997], [Cugini, Laskowski, Piatko 1998], [Cugini, Laskowski, Sebrechts 2000] | NIRVE
R-Wheels or Result Wheel | [Grewal, Jackson, Wallis et al. 1999], [Grewal, Burden, Jackson et al. 1999], [Grewal, Jackson, Burden et al. 2000] |
Retrieval History Histogram | [Golovchinsky 1997] | VOIR
TileBars | [Hearst 1995] |
TileBars | [Spring, Morse, Heo 1996], [Heo, Morse, Willms et al. 1996] | CASCADE
TileBars | [Dieberger, Russell 2001] |
Mural | [Spring, Morse, Heo 1996], [Heo, Morse, Willms et al. 1996] | CASCADE
libViewer | [Rauber, Bina 1999], [Rauber, Bina 2000] | SOMLib
Entity Relation Visualization | [Chase, D’Amore, Gershon et al. 1998] | NetMap
Table 18: Components for the visualization of document attributes
3.3.3.3. Visualization of interdocument similarities
In the last two chapters, we have already seen a number of components capable of providing
information about not only the query or a single document, but also first overviews of parts of or
the whole result set. Examples are the InfoCrystal from Spoerri or the bar-graphs from Veerasamy
et al. We now turn the focus explicitly to visualizations of document groups and interdocument
similarities. On the set level, which means the representation of the whole set of results, it will be
interesting to get an overview. Are there any trends, clusters, or hot spots? Do the suggestions
made by the system in response to the query seem to satisfy the information needs at all? The
last two chapters were only spotlights on the field of ideas for mapping data tables to visual
structures. This chapter will be even less complete in relation to a full discussion of all
components that can be found in the literature. The number of visualization ideas for document
sets is considerably higher than the number of ideas for the two areas discussed so far. Therefore,

the discussion about visualization ideas for document sets is separated into two parts, and
additionally focuses mainly on 2D-visualizations. Whereas this chapter discusses the visualization of
interdocument similarities, the next chapter deals with the visualization of interdocument
connections. So far the majority of ideas discussed have been 2D-visualizations. An exception has been
the chapter about metaphors where 3D-ideas played an important role. In the area of visualizations
of document sets 3D-approaches are found quite frequently. Regardless, 3D-visualization ideas
were excluded relatively early in the process of identifying candidates to be included into the
INSYDER system. Today’s hardware no longer poses a restriction; however, navigation in 3D-space
with standard input devices such as keyboards and conventional mice still creates a barrier. In
addition, a number of authors like [Nielsen 1998] report problems with 3D-approaches. Since the
typical technical environment of the target user group consists of standard PCs and input devices,
3D-approaches were not included in the list of potential components for the INSYDER system.
Therefore, the following overview of the variety of approaches for the visualization of document sets
will focus on 2D-components.
The decision to classify a visualization component as suitable for the visualization of queries, or
document attributes, or interdocument similarities is not always easy, because a number of
components can be used for different purposes. Bar-graphs are a good example of this. The bar-graphs
from Veerasamy et al. reveal information about documents’ attributes in relation to the query but
also provide an overview of the whole result set or parts of it. The same is true for the stacked
histograms / VQRa of the WInquery system. A predecessor of them is the Bargraph view of the
XINQUERY system [Shneiderman, Byrd, Croft 1997]. One bar represents a document, and its length
shows the relevance score of the document. Figure 59 shows the principle using the
20 document result set of the WebViz-example. The currently displayed document (#1 in the example)
is marked using a different color. Besides showing the score and position of a document in the
result set, the overview of the result set and the visualization of interdocument similarities for
these two dimensions seem to be dominant, leading to the decision to categorize the XINQUERY
bargraph in this chapter. The dominant function, as subjectively classified by the author, is also
used to categorize all the other components discussed.
[Figure placeholder: one bar per document for documents 1 to 20, with tick labels at 1, 10 and 20.]
Figure 59: Principle of the Bargraph in the XINQUERY system according to [Shneiderman, Byrd, Croft 1997]
In the FISH (Forager for the Information Super Highway) component of the Starfish system,
[Mitchell, Day, Hirschman 1995] use rectangles to represent documents and their relevance. In
some configurations these rectangles look like bars. Inspired by the Tree-Map approach from
[Shneiderman 1992] attributes of documents from a result set of a multi-source WAIS query are
encoded by space, order, and color. Figure 60 shows the principle using the WebViz-example. In
the example on the left side the Tree-Map approach is omitted by showing all 20 documents in one
group. Relevance is mapped to size, position, and color of the rectangle representing the document.
The size of the rectangle is proportional to the relevance of the document. More relevant
documents use lighter colors, and the documents are ordered by relevance. Tree-maps were
designed to represent hierarchies. In the right part of Figure 60 a hierarchy is used by showing the
documents grouped by domain of the server from where they have been downloaded. The size of
the rectangle representing a domain is determined by the sum of the relevance of the documents in
the set coming from this domain. The mapping of the document relevance is done in the same way
as on the left side of the figure, except in regards to the hierarchy when ordering the documents
and using a variable width of the document representations.
[Figure placeholder: left, the 20 documents of the WebViz-example shown as labeled rectangles in one
group; right, the same documents below a “root” node, grouped by server domain (.com, .de, .edu,
.gov, .org, .se).]

Figure 60: Principle of the FISH component from the Starfish system of [Mitchell, Day, Hirschman 1995]
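
The two mappings described for Figure 60 can be sketched in Python. This is a minimal sketch under stated assumptions, not the FISH implementation: the function names, the one-dimensional strip layout (widths only), the lightness scale from 0 to 1, and the alphabetical order of the domains are assumptions made here, and the sample documents are hypothetical. What follows the description is that rectangle size is proportional to relevance, more relevant documents are lighter and come first, and the size of a domain is the sum of the relevance of its documents.

```python
from collections import defaultdict


def flat_layout(docs, total_width=100.0):
    """docs: list of (title, domain, relevance) tuples with relevance > 0.

    Returns one entry per document, ordered by descending relevance.
    """
    total = sum(rel for _, _, rel in docs)
    top = max(rel for _, _, rel in docs)
    ordered = sorted(docs, key=lambda d: d[2], reverse=True)
    return [
        {"title": title, "width": rel * total_width / total, "lightness": rel / top}
        for title, _, rel in ordered
    ]


def grouped_layout(docs, total_width=100.0):
    """Group documents by domain; a domain's width is the sum of its relevance."""
    total = sum(rel for _, _, rel in docs)
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[1]].append(doc)
    result = []
    for domain in sorted(groups):  # alphabetical domain order is an assumption
        members = sorted(groups[domain], key=lambda d: d[2], reverse=True)
        domain_rel = sum(rel for _, _, rel in members)
        result.append({
            "domain": domain,
            "width": domain_rel * total_width / total,
            "titles": [title for title, _, _ in members],
        })
    return result


# Hypothetical sample data: (title, domain, relevance).
sample = [("A", ".edu", 6), ("B", ".com", 3), ("C", ".edu", 1)]
print(flat_layout(sample))
print(grouped_layout(sample))
```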
Treemaps are used in a number of systems to represent multi-step hierarchies. Hierarchies are
normally visualizations of interdocument connections, and not of interdocument similarities such
as discussed in this chapter. In a number of cases, however, the usage of treemaps is more oriented
to the latter case. The Information Navigator [Au, Carey, Sewraz et al. 2000], [Carey 2000],
[Carey, Kriwaczek, Rüger 2000] is another example of using Treemaps to visualize interdocument
similarities from Web search results. The left part of Figure 61 shows the initial view of a
document set retrieved using the query from the WebViz-example in a database containing about
550,000 documents from the TREC CDs vol4 and vol5. The hierarchy is determined by clustering
the documents using statistical information about the terms in the documents. Instead of the
documents themselves only clusters and super clusters are shown. The term list on the right side of the
Treemap shows the statistically collected terms from the documents in this cluster in descending
order of occurrence. The right side of Figure 61 shows a re-clustered subset of the complete result
set after selecting one of the clusters.

Figure 61: Example of the Treemap View from the Information Navigator [Au, Carey, Sewraz et al. 2000]107
[Chimera 1992] works with Value Bars. Quantifiable attributes are mapped each to a separate bar
next to the scrollbar of a list. The idea is to show an attribute distribution overview for important

107 Figures produced using http://rowan.doc.ic.ac.uk:8000/InfoNavigator/provodnik.html [2001-03-04]

[Figure placeholder: scatterplot with origin 0 and the two axes “visualization” and “results”.]
Figure 65: Scatterplot with two axes
The Three-Keyword Axes Display used by [Cugini, Piatko, Laskowski 1997] in the NIRVE sys-
tem is a 3D-version of the same idea. The NIRVE (National Institute of Standards and Technology
Information Retrieval Visualization Engine) system is an advanced visual interface for the PRISE
statistical text retrieval system. In [Cugini, Piatko, Laskowski 1997] the authors talk about an
“advanced visual interface”; in later publications the name NIRVE is used. A number of components
also underwent name changes. The Three-Keyword Axes Display was later called 3-D Axes
[Cugini, Laskowski, Sebrechts 2000]. In the component document icons are positioned in a
three-dimensional scatterplot based on keyword strength statistics. The left side of Figure 66
shows an example of the early Three-Keyword Axes Display using the four-keyword query “retir”,
“commun”, “trens”, and “develop”. The first three keywords are mapped to the axes. If the query
contains more than three keywords, the user has the possibility to assign any subset of keywords to
each axis. Therefore, a separate keyword window is used with a column of checkboxes for the X-,
Y-, and Z-axes shown in the upper left corner. The principle of the keyword window is similar to
the principle of the Keyword-Concept Matrix or Concept Control from the same authors shown in
Figure 33 on page 72. In the document space of the Three-Keyword Axes Display / 3-D Axes the
documents themselves are shown with the Iconic Representation explained on page 83. The length
of an extra bar outside the square indicates the overall relevance of a document. The right side of
Figure 66 shows an example of the later 3-D Axes variant.

Figure 66: Three-Keyword Axes Display (left figure), 3-D Axes (right figure) from the NIRVE system. Courtesy
of NIST John V. Cugini111
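
The positioning of a document icon in such a three-axes display can be sketched as follows. This is a minimal sketch, not the NIRVE implementation: the function name `axis_position` and the strength values are hypothetical, and summing the strengths of all keywords assigned to one axis is an assumption, since the description above does not state how several keywords on the same axis are combined.

```python
def axis_position(strengths, axes):
    """Return the (x, y, z) position of a document icon.

    strengths maps keyword -> keyword strength for one document.
    axes maps 'x', 'y', 'z' -> list of keywords assigned to that axis.
    The strengths of all keywords on an axis are summed (an assumption).
    """
    return tuple(
        sum(strengths.get(keyword, 0.0) for keyword in axes.get(axis, []))
        for axis in ("x", "y", "z")
    )


# Hypothetical keyword strengths for one document of the four-keyword query.
doc = {"retir": 2.0, "commun": 1.0, "trens": 0.0, "develop": 3.0}

# Default assignment: the first three keywords are mapped to the axes.
print(axis_position(doc, {"x": ["retir"], "y": ["commun"], "z": ["trens"]}))

# User assignment: two keywords share the Z-axis.
print(axis_position(doc, {"x": ["retir"], "y": ["commun"], "z": ["trens", "develop"]}))
```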

111 Download from http://www.itl.nist.gov/iaui/vvrg/cugini/uicd/gallery/axes.gif [2001-02-26], and
http://www.itl.nist.gov/iaui/vvrg/cugini/uicd/gallery/ax3d-detail.gif [2001-02-26]
A special form of scatterplot is the Galaxy of News by [Rennison 1994]. A document space,
processed by a relationship construction engine, is visualized in different panes and at different
levels of detail. Starting with a scattered keyword overview, the system displays, depending on user
action, more and more details such as headlines or the body of articles. Galaxy of News relies
heavily on interaction and animation.
A third group of components for the visualization of interdocument similarities use reference
points to position document representations in virtual document spaces. In the VIBE system
[Korfhage 1991] document icons are displayed in a virtual 2D-document-space. Reference points
or Points of Interest (POIs) form a coordinate system for positioning document icons. The docu-
ments are attracted by the reference points according to the relevance for the individual reference
points. Figure 70 shows the principle using the 20 document result set of the WebViz-example.
The four concepts displayed as circles are used as reference points. The documents are displayed
as squares. The VIBE system originally used rectangles. In the example, the size of the squares is
determined by the overall relevance of the document. The explanations are taken in substance from
[Korfhage 1991]. The two documents with explanations on the dotted line between “visualization”
and “results” show some of the problems with the positioning of documents in a 2D space between
POIs. The position alone sometimes makes it hard to determine which POIs are concerned. Part of
the idea is therefore the possibility to add, delete, and move POIs around to see which documents
are influenced by which reference points. The basic idea to display document icons in space
between reference points has also been adapted by a number of other systems including WebVIBE
[Morse, Lewis 1997], or the Radial visualization of the Information Navigator [Au, Carey, Sewraz
et al. 2000] as 2D-implementations, and VR-VIBE [Benford, Snowdon, Greenhalgh et al. 1995] or
the Relevance Sphere of the LyberWorld system [Hemmje 1993a], [Hemmje, Kunkel, Willett
1994] as 3D implementations. WebVIBE114 is a simplified Java Version of VIBE using a magnet
metaphor for the reference points.
[Figure placeholder: document space with the four reference points “visualization”, “search”,
“internet” and “results” and the 20 documents placed between them. Annotations in the figure:
All documents in this triangle must be influenced by “internet”. This document is only influenced by
“visualization” and “results”. Midway between “search” and “results” implies equal influence from
“search” and “results”. Strong “search” influence, some “results” influence. This document is equally
influenced by “search” and “internet” and additionally by “results”.]

Figure 70: Principle of the reference points - documents display of the VIBE system and explanations according
to [Korfhage 1991] page 138
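
The attraction of documents by reference points can be sketched in Python. This is a minimal sketch, not the VIBE implementation: it assumes that a document is placed at the relevance-weighted average of the POI positions, and the function name `place`, the POI coordinates, and the relevance values are hypothetical.

```python
def place(doc_scores, pois):
    """Return the (x, y) position of a document between the reference points.

    doc_scores maps POI name -> relevance of the document for that POI.
    pois maps POI name -> (x, y) position of the POI.
    The position is the relevance-weighted average of the POI positions;
    None is returned if no POI influences the document.
    """
    total = sum(doc_scores.get(name, 0.0) for name in pois)
    if total == 0:
        return None
    x = sum(doc_scores.get(name, 0.0) * pos[0] for name, pos in pois.items()) / total
    y = sum(doc_scores.get(name, 0.0) * pos[1] for name, pos in pois.items()) / total
    return (x, y)


# Hypothetical POI positions at the corners of a square.
pois = {
    "visualization": (0.0, 0.0),
    "search": (4.0, 0.0),
    "results": (0.0, 4.0),
    "internet": (4.0, 4.0),
}

print(place({"visualization": 1.0, "results": 1.0}, pois))  # midway on one edge
print(place({"search": 1.0, "results": 1.0}, pois))         # center of the square
print(place({name: 1.0 for name in pois}, pois))            # also the center
```

The last two calls yield the same position for differently influenced documents, which illustrates why the position alone does not always reveal which POIs are concerned and why moving POIs around helps.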

113 Download from http://showcase.pnl.gov/showcase/medialib.nsf/by+id/APOO-4SH26G?opendocument [2001-
02-23]
114 Web-version available at http://www2.sis.pitt.edu/~webvibe/ [2001-02-18]. Document set display not working
during the preparation of this thesis. Tested with two different PCs and four different browsers / browser versions.

Figure 72: Example of the Radial visualization from the Information Navigator [Au, Carey, Sewraz et al. 2000],
[Carey, Kriwaczek, Rüger 2000]116
The difference between a scatterplot that maps values to axes and the use of reference points is
easy to see when the keywords whose relevance values were mapped to the axes in Figure 65 on
page 93 are used as Points of Interest instead. Figure 73 shows the result. All documents are
positioned on a line between the two POIs instead of being scattered across the 2D plane.
Additionally, comparing the four reference points of Figure 70 with the two used here shows how
the two documents marked on the line are repositioned.
[Figure: the two reference points "visualization" and "results" joined by a line. Annotations: the
document influenced only by "visualization" and "results" is still at the same position; the
document equally influenced by "search" and "internet" and additionally by "results" now lies at a
new position on the line.]

Figure 73: Document space with two reference points
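
The effect of reducing four reference points to two can be illustrated with a minimal sketch. It
assumes that a document is placed at the score-weighted average of the POI positions, which is
consistent with the annotations of Figure 70 (equal influence places a document midway between
two POIs) but is not taken from the VIBE implementation itself. The function name vibe_position,
the POI coordinates, and the relevance scores are hypothetical values chosen for illustration.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def vibe_position(pois: Dict[str, Point], scores: Dict[str, float]) -> Point:
    """Place a document at the score-weighted average of the POI positions.

    Assumption: simple weighted-centroid placement. Scores for keywords that
    are not among the displayed POIs are ignored.
    """
    total = sum(scores.get(name, 0.0) for name in pois)
    if total == 0:
        raise ValueError("document is not influenced by any displayed POI")
    x = sum(scores.get(name, 0.0) * px for name, (px, _) in pois.items()) / total
    y = sum(scores.get(name, 0.0) * py for name, (_, py) in pois.items()) / total
    return (x, y)


# Hypothetical screen positions of the reference points.
four_pois = {
    "visualization": (0.0, 1.0),
    "search": (1.0, 1.0),
    "internet": (0.0, 0.0),
    "results": (1.0, 0.0),
}
# Same positions, but only two of the POIs are displayed.
two_pois = {name: four_pois[name] for name in ("visualization", "results")}

# Hypothetical relevance scores of the two marked documents.
doc_a = {"visualization": 1.0, "results": 1.0}
doc_b = {"search": 1.0, "internet": 1.0, "results": 2.0}

for label, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(label, vibe_position(four_pois, doc), vibe_position(two_pois, doc))
```

Under these assumptions doc_a keeps its position because it depends only on the two remaining
POIs, while doc_b moves onto the line between them. With only two POIs every weighted average
necessarily falls on the connecting line, which is why the documents in Figure 73 are no longer
scattered across the plane.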
Besides scatterplots that populate maps based on real geographic attributes, a number of
approaches use the scatterplot plus landscape metaphor to create artificial 2D or 3D maps or
landscapes of document spaces. A number of systems using this metaphor have already been
mentioned on page 54. Examples of systems using this type of component are Bead [Chalmers,
Chitson 1992], Harmony [Andrews 1995], Vineta [Krohn 1995], and SPIRE [Wise, Thomas, Pennock
et al. 1995]. The difference between the visualization of interdocument similarities and
interdocument connections is very small in this type of component. Early versions of Bead
[Chalmers, Chitson 1992], [Chalmers 1993], for example, show only a landscape of documents. Later
versions [Chalmers 1995], [Chalmers 1996] also show connections between the documents. The same
can be said for a number of other starfield or scatterplot visualizations. The discussion of
dual-use components in this thesis focuses on the aspects of interdocument similarities.

116 Figures produced using http://rowan.doc.ic.ac.uk:8000/InfoNavigator/provodnik.html [2001-03-04]
[Figure: self-organizing semantic map whose grid is divided into labeled regions. Recoverable
region labels: "retrieval", "librar", "intelligent", "search", "knowledge", "online", "others",
"application", "network", "machine learning", "expert", "system", "citation", "database",
"language", "natural", "process", "research". The numbers and dots marking individual grid cells
cannot be recovered from the extraction.]

Figure 75: Principle of the self-organizing semantic map according to [Lin, Soergel, Marchionini 1991]
Figure 76 shows examples of SOMs using the 20-document result set of the WebViz-example. The
left side shows the documents mapped to a 5x5 grid using the most frequently used noun phrases,
extracted automatically from the document set. Dominant clusters are “user”, “users”, and
“introduction”. Users seem to be important to the authors, and scientific papers often seem to
have an introduction. In the system used to produce the maps, the occurrence of keywords is
influenced by a mechanism that automatically groups nouns into phrases like “information
visualization” or “query results”. Non-grouped terms like “user” or “introduction” may therefore
be more dominant than they would be without a grouping function. The right side shows a 5x5 map
produced by manually selecting a number of noun phrases. The system is very sensitive to the
selection of noun phrases. For example, adding the term “visualization” when generating the
second map causes the system to produce only a single cluster labeled “visualization”. This
sensitivity may be caused by the small result set of 20 documents.

Figure 76: SOMs of the 20 document result set of the WebViz-example119
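
How documents end up in the cells of such a 5x5 grid can be sketched with a generic Kohonen-style
self-organizing map. This is not the algorithm of the CI Spider, whose internals are not described
here. The function names train_som and best_cell, the learning-rate and radius schedules, and the
phrase-count vectors are all assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Grid = List[List[Vector]]


def best_cell(weights: Grid, vec: Vector) -> Tuple[int, int]:
    """Return (row, col) of the cell whose weight vector is closest to vec."""
    best, best_dist = (0, 0), math.inf
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(vec, w))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(vectors: List[Vector], rows: int = 5, cols: int = 5,
              epochs: int = 50, seed: int = 0) -> Grid:
    """Train a small SOM; schedules below are assumed, not from the thesis."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                       # decaying learning rate
        radius = 1.0 + (max(rows, cols) / 2.0) * (1.0 - frac)  # shrinking radius
        for vec in vectors:
            br, bc = best_cell(weights, vec)
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    # Pull the cell towards the document, more strongly near
                    # the best-matching cell.
                    for i in range(dim):
                        w[i] += rate * influence * (vec[i] - w[i])
    return weights


# Hypothetical phrase counts per document, in the order:
# "user", "introduction", "information visualization", "query results"
docs = [
    [3.0, 1.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 4.0, 0.0],
    [0.0, 0.0, 3.0, 2.0],
]
som = train_som(docs)
for doc in docs:
    print(doc, "->", best_cell(som, doc))
```

In a sketch like this the resulting map depends entirely on which phrases form the vector
dimensions, which matches the observation that the maps react strongly to the selection of noun
phrases, especially for a small document set.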
Figure 77 shows an example of a SOM produced with the WebViz-example query and the Adaptive SOM
Web service from the University of Arizona, which forwards entered queries to AltaVista and then
generates a SOM to display the returned result set.

119 Figures produced using a trial version of the Competitive Intelligence Spider (CI Spider) V 1.2.2 from the
University of Arizona and Knowledge Computing Corporation. Download from
http://ai.bpa.arizona.edu/go/download/cispider/CISpider.html [2001-03-02]. Stop word list was empty when
producing the SOM.
