Characterizing the Usability of Interactive Applications Through Query Log Analysis
Human Factors (2011)
- ISBN: 9781450302289
- DOI: 10.1145/1978942.1979205
Adam Fourney
afourney@cs.uwaterloo.ca
Richard Mann
mannr@uwaterloo.ca
Michael Terry
mterry@cs.uwaterloo.ca
David R. Cheriton School of Computer Science
University of Waterloo
ABSTRACT
People routinely rely on Internet search engines to support
their use of interactive systems: they issue queries to learn
how to accomplish tasks, troubleshoot problems, and otherwise
educate themselves on products. Given this common behavior,
we argue that search query logs can usefully augment
traditional usability methods by revealing the primary tasks
and needs of a product’s user population. We term this use of
search query logs CUTS (characterizing usability through
search). In this paper, we introduce CUTS and describe an
automated process for harvesting, ordering, labeling,
filtering, and grouping search queries related to a given
product. Importantly, this data set can be assembled in
minutes, is timely, has a high degree of ecological validity,
and is arguably less prone to self-selection bias than data
gathered via traditional usability methods. We demonstrate the
utility of this approach by applying it to a number of popular
software and hardware systems.
Author Keywords
Query log analysis, Usability
ACM Classification Keywords
H.5.2 Information Interfaces and Presentation: Miscellaneous
General Terms
Human Factors
INTRODUCTION
People rely on search engines such as Google
(http://www.google.com), Yahoo! (http://www.yahoo.com), and
Bing (http://www.bing.com) to support their use of interactive
systems [5, 6]. For example, users submit search queries to
locate tutorials, troubleshoot problems, or learn how to use
specific features of an application. Given this behavior,
search engine query logs serve as centralized repositories
cataloguing the day-to-day needs of the user base of any
publicly available interactive system.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
CHI 2011, May 7–12, 2011, Vancouver, BC, Canada.
Copyright 2011 ACM 978-1-4503-0267-8/11/05...$10.00.
Figure 1. An overview of CUTS. Steps 1-2 are easily performed with
access to raw query logs, but otherwise require approximation
techniques. Step 3 utilizes our query taxonomy specialized for
interactive systems.
In this paper, we argue that search engine query logs can be
filtered and transformed into forms that usefully complement
and augment data collected via traditional usability methods.
We demonstrate this potential by introducing an automated
process for harvesting, ordering, labeling, filtering, and
grouping search queries to understand the common tasks and
needs of a user base (Figure 1). We call this process CUTS:
characterizing usability through search. Importantly, the
labeled, ordered data produced by CUTS can be assembled in
minutes, is timely, has a high degree of ecological validity,
and is arguably much less prone to self-selection bias than
traditional means of collecting data from users.
As an example of the utility of this approach, an
approximation of this process can be illustrated using Google
Suggest, the service that provides query completion
suggestions for a given input. Given the phrase “firefox how
to”, Google Suggest produces a list of 10 suggested
completions (Figure 2). As we will show later, these
suggestions closely correspond to the 10 most popular queries
matching that input.
Figure 2. The top 10 suggestions provided by Google Suggest for the
phrase “firefox how to”.
From the list of top 10 Firefox “how to” suggestions (Figure
2), it is immediately clear that users have a number of
privacy and security concerns, as evidenced by their desire to
clear their cache, history, and cookies. However, the eighth
item (“get menu bar back”) is particularly interesting. An
inspection of the Firefox user interface (version 3.6 on
Windows) reveals that the top-level menu bar is easily hidden
by deactivating the “Menu bar” item in Firefox’s “View →
Toolbars” sub-menu. However, once this action is taken, it is
not easily reversed: the top-level menu system is now hidden,
removing the very means the user would employ to reinstate the
menu bar. What is noteworthy about this example is that we
quickly moved from data derived from query logs to a testable
hypothesis regarding the usability of the software.
The contributions in this paper lie in expanding this manual
process to the automated one shown in Figure 1. While
seemingly straightforward, automating this process requires
overcoming a number of challenges: raw query logs are not made
publicly available; there is a need to automatically determine
query intent for the purposes of labeling and filtering
queries (for example, to distinguish troubleshooting queries
from those seeking to download the application); and
differently phrased queries on the same topic should be
reduced to a common canonical form. Our specific
contributions, outlined below, address these challenges.
To address the problems of obtaining and ranking search
queries, we demonstrate how publicly available query
suggestion services (e.g., Google Suggest) and web-based tools
for advertisers can be employed to create reasonable
approximations of raw query logs.
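One way to picture harvesting from a suggestion service is the following minimal Python sketch. It rests on several assumptions: the `suggest` callable is a hypothetical stand-in for a service such as Google Suggest (no real endpoint is contacted), `make_fake_suggest` is a toy substitute backed by an in-memory list, and extending the seed phrase by one letter at a time is an illustrative strategy rather than the procedure this paper necessarily uses. Ordering the harvested queries by popularity is not covered here.

```python
import string
from typing import Callable, Dict, List

# Hypothetical stand-in for a query suggestion service: maps a prefix to
# its ranked completions. A real service would be queried over the network.
SuggestFn = Callable[[str], List[str]]


def harvest(seed: str, suggest: SuggestFn,
            alphabet: str = string.ascii_lowercase) -> List[str]:
    """Collect suggestions for the seed and for the seed plus one letter."""
    seen: Dict[str, None] = {}  # dict keys keep first-seen order
    prefixes = [seed] + [f"{seed} {ch}" for ch in alphabet]
    for prefix in prefixes:
        for query in suggest(prefix):
            seen.setdefault(query, None)
    return list(seen)


def make_fake_suggest(log: List[str], limit: int = 10) -> SuggestFn:
    """Toy service: up to `limit` logged queries that start with the prefix."""
    def suggest(prefix: str) -> List[str]:
        return [q for q in log if q.startswith(prefix)][:limit]
    return suggest
```

Because a suggestion service returns only a handful of completions per input, the seed phrase alone surfaces only the top few queries. Extending the prefix asks the service about narrower inputs, which can reveal queries that the cap on completions would otherwise hide.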
We also introduce two new query classification schemes to
address the need to label and filter queries. The first
classification scheme is a taxonomy that extends previous
search query taxonomies to include categories relevant to
interactive systems. For example, this new taxonomy
differentiates between queries issued to troubleshoot a
problem and those seeking a tutorial. The second
classification scheme considers how a query is phrased. We
show that how a query is phrased closely corresponds to the
categories of our specialized taxonomy. CUTS exploits the
relationship between these two classification schemes to
ascribe query intent from query phrasing.
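The idea of ascribing intent from phrasing can be sketched with a few pattern rules. This is a minimal illustration only: the patterns and the labels `instructional`, `troubleshooting`, and `download` are hypothetical choices made for this example, not the taxonomy or the phrasing categories defined in this paper.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical phrasing patterns and intent labels, chosen for illustration.
# The first matching rule wins, so rule order matters.
PHRASING_RULES: List[Tuple[str, str]] = [
    (r"\bhow (to|do i|can i)\b", "instructional"),
    (r"\b(not working|won't|wont|crash(es|ing)?|error)\b", "troubleshooting"),
    (r"\b(download|install)\b", "download"),
]


def label_query(query: str) -> Optional[str]:
    """Return the label of the first matching phrasing rule, or None."""
    lowered = query.lower()
    for pattern, label in PHRASING_RULES:
        if re.search(pattern, lowered):
            return label
    return None


def filter_by_label(queries: List[str], wanted: str) -> List[str]:
    """Keep only the queries whose ascribed label equals `wanted`."""
    return [q for q in queries if label_query(q) == wanted]
```

With these rules, `label_query("firefox how to clear cache")` yields `"instructional"`, while a query such as `"firefox themes"` matches no rule and is left unlabeled.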
Finally, common questions or issues are often expressed using
a number of different query phrasings. To cope with this
variability, we introduce a transformation that enables minor
differences between queries to be ignored.
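The transformation itself is not specified in this part of the paper. As a rough sketch of what such a canonical form might look like, the Python below lowercases a query, strips punctuation, drops a small hypothetical stopword list, and ignores word order. All of these choices, including the names `STOPWORDS`, `canonical_form`, and `group_queries`, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stopword list, chosen for illustration only.
STOPWORDS = frozenset(
    {"a", "an", "the", "to", "in", "on", "my", "i", "do", "how", "can"}
)


def canonical_form(query: str) -> Tuple[str, ...]:
    """Reduce a query to a sorted tuple of its distinct content words."""
    tokens = [t.strip("?.,!") for t in query.lower().split()]
    return tuple(sorted({t for t in tokens if t and t not in STOPWORDS}))


def group_queries(queries: List[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Group queries that share the same canonical form."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for query in queries:
        groups[canonical_form(query)].append(query)
    return dict(groups)
```

Under these assumptions, “firefox how to clear cache” and “How do I clear the Firefox cache?” both reduce to the same tuple and therefore fall into one group.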
The rest of this paper is structured as follows. We first
present related work, then describe our method for harvesting
and ranking search queries using publicly available services.
We then introduce our two classification schemes and show how
they can be used to label and filter search queries. We then
discuss the final step of the process, grouping queries, and
introduce a set of strategies to assist with it. Finally, we
present a series of examples illustrating the overall utility
of this approach, and conclude with a discussion of the
limitations of the technique.
BACKGROUND & RELATED WORK
In recent years, researchers have demonstrated the potential
for search engine query logs to model and predict real-world
phenomena and events. For example, Ginsberg et al. have
demonstrated how query logs can be employed to help track the
spread of influenza over time [10]. In this research,
“health-seeking behaviour” is automatically detected by
monitoring search terms associated with influenza (symptoms,
medications, etc.). This allows the Google Flu Trends
application (http://www.google.org/flutrends/) to estimate the
prevalence of influenza infections on a week-to-week basis.
The resultant models closely agree with data released by the
Centers for Disease Control and Prevention (CDC), though they
exhibit much less lag: models built using query logs show a
24-hour lag in tracking flu trends, compared to the week-long
lag of the CDC.
More generally, Richardson [24] argues that query log analysis
could quickly become an indispensable tool for researchers
working in such human-centric fields as anthropology,
sociology, psychology, medicine, economics, and political
science. He notes that query logs function as if “a survey
were sent to millions of people, asking them to, every day,
write down what they were interested in, thinking about,
planning, and doing.” Accordingly, he argues that “taken as a
whole, across millions of users, ... queries constitute a
measurement of the world and humanity through time” [24]. To
demonstrate his point, Richardson describes a common search
pattern that unfolds over the course of three to six months,
starting with a user’s search for “mortgage calculators”.
Within a week, these same users search for “realtors”. About
one month later, they search for legal services (e.g.,
“notary”), and three months later, their searches include
those for home furnishing (e.g., “pottery barn”). As with
Google Flu Trends, this example shows the potential for query
logs to describe real-world phenomena.
Within the realm of interactive systems, the research
literature contains many accounts of search query logs being
used to improve information interfaces, that is, interfaces in
which finding or accessing information is a user’s primary
task. For example, Dou et al. demonstrate how query logs can
be used to improve personalized search [8]. It is also
common for website designers to use query logs to help de-