Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search

by Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, Geri Gay
ACM Transactions on Information Systems (2007)

Abstract

This article examines the reliability of implicit feedback generated from clickthrough data and query reformulations in World Wide Web (WWW) search. Analyzing the users' decision process using eyetracking and comparing implicit feedback against manual relevance judgments, we conclude that clicks are informative but biased. While this makes the interpretation of clicks as absolute relevance judgments difficult, we show that relative preferences derived from clicks are reasonably accurate on average. We find that such relative preferences are accurate not only between results from an individual query, but across multiple sets of results within chains of query reformulations.

Available from portal.acm.org
Evaluating the Accuracy of Implicit Feedback
from Clicks and Query Reformulations in
Web Search
THORSTEN JOACHIMS
Cornell University
LAURA GRANKA
Google Inc.
BING PAN
College of Charleston
and
HELENE HEMBROOKE, FILIP RADLINSKI, and GERI GAY
Cornell University
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval
General Terms: Human Factors, Measurement, Reliability, Experimentation
Additional Key Words and Phrases: Clickthrough data, eye-tracking, implicit feedback, query
reformulations, user studies, WWW search
This work was funded in part through NSF CAREER Award IIS-0237381 and through a gift from
Google.
Authors’ addresses: T. Joachims and F. Radlinski, Department of Computer Science, Cornell
University, 4130 Upson Hall, Ithaca, NY 14853-7501; email: {tj,filip}@cs.cornell.edu; L. Granka, Google
Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043; email: granka@google.com; B. Pan,
Office of Tourism Analysis, School of Business and Economics, College of Charleston, 315 Beatty
Center, 5 Liberty Street, Charleston, SC 29424; email: PanB@cofc.edu; H. Hembrooke and G. Gay,
Department of Information Science, Cornell University, 301 College Avenue, Ithaca, NY
14850-4623; email: {hah4,gkg1}@cornell.edu.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is
granted without fee provided that copies are not made or distributed for profit or direct commercial
advantage and that copies show this notice on the first page or initial screen of a display along
with the full citation. Copyrights for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers,
to redistribute to lists, or to use any component of this work in other works requires prior specific
permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn
Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.

© 2007 ACM 1046-8188/2007/04-ART7 $5.00 DOI 10.1145/1229179.1229181
http://doi.acm.org/10.1145/1229179.1229181
ACM Transactions on Information Systems, Vol. 25, No. 2, Article 7, Publication date: April 2007.
ACM Reference Format:
Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., and Gay, G. 2007. Evaluating
the accuracy of implicit feedback from clicks and query reformulations in Web search.
ACM Trans. Inform. Syst. 25, 2, Article 7 (April 2007), 27 pages.
DOI = 10.1145/1229179.1229181 http://doi.acm.org/10.1145/1229179.1229181
1. INTRODUCTION
The idea of adapting a retrieval system to particular groups of users and
particular collections of documents promises further improvements in retrieval
quality for at least two reasons. First, a one-size-fits-all retrieval function is
necessarily a compromise in environments with heterogeneous users and is
therefore likely to act suboptimally for many users [Teevan et al. 2005]. Second, as
evident from the TREC evaluations, differences between document collections
make it necessary to tune retrieval functions with respect to the collection for
optimum retrieval performance. Since manually adapting a retrieval function
is time consuming or even impractical, research on automatic adaptation using
machine learning is receiving much attention (e.g., Fuhr [1989]; Bartell et al.
[1994]; Boyan et al. [1996]; Freund et al. [1998]; Cohen et al. [1999]; Herbrich
et al. [2000]; Crammer and Singer [2001]; Kemp and Ramamohanarao [2002];
Joachims [2002]; Holland et al. [2003]; Almeida and Almeida [2004]; Radlinski
and Joachims [2005]; Burges et al. [2005]). However, a great bottleneck in the
application of machine learning techniques is the availability of training data.
In this article we explore and evaluate strategies for automatically
generating training examples for learning retrieval functions from observed user
behavior. In contrast to explicit feedback, such implicit feedback has the
advantage that it can be collected at much lower cost, in much larger quantities, and
without burden on the user of the retrieval system. However, implicit feedback
is more difficult to interpret and potentially noisy. In this article we analyze
which types of implicit feedback can be reliably extracted from observed user
behavior, in particular clickthrough data in World Wide Web (WWW) search.
Following and extending prior work reported in Radlinski and Joachims [2005],
Joachims et al. [2005], and Granka et al. [2004], we analyze implicit feedback
from within individual queries as well as across multiple consecutive queries
about the same information need (i.e., query chains). The feedback strategies
across query chains exploit the fact that users typically reformulate their query
multiple times before their information need is satisfied. We elaborate on the query
chain strategies proposed in Radlinski and Joachims [2005], as well as propose
and explore additional strategies.
To evaluate the reliability of these implicit feedback signals, we conducted
a user study. The study was designed to analyze how users interact with the
list of ranked results (the “results page” for short) from the Google search
engine and how their behavior can be interpreted as relevance judgments. We
performed two types of analysis in this study. First, we used eye-tracking to
understand how users behave on Google’s results page. Do users scan the results
from top to bottom? How many abstracts do they read before clicking? How does
their behavior change if we artificially manipulate Google’s ranking? Answers