
Replaying live-user interactions in the off-line evaluation of critique-based mobile recommendations

by Quang Nhat Nguyen, Francesco Ricci
Proceedings of the 2007 ACM conference on Recommender systems RecSys 07 (2007)


Replaying Live-User Interactions in the Off-Line Evaluation
of Critique-based Mobile Recommendations
Quang Nhat Nguyen
Free University of Bozen-Bolzano
Piazza Domenicani 3, 39100 Bolzano (BZ), Italy
Quang.NhatNguyen@unibz.it
Francesco Ricci
Free University of Bozen-Bolzano
Piazza Domenicani 3, 39100 Bolzano (BZ), Italy
fricci@unibz.it


ABSTRACT
Supporting conversational approaches in mobile recommender
systems is challenging because of the inherent limitations of
mobile devices and the dependence of produced recommendations
on the context. In previous work, we proposed a critique-based
mobile recommendation approach and presented the results of a
live-user evaluation. Live-user evaluations are expensive, and
in that study we could not compare different system variants to check all
our research hypotheses. In this paper, we present an innovative
simulation methodology and its use in the comparison of different
user-query representation approaches. Our simulation test
procedure replays off-line, against different system variants,
interactions recorded in the live-user evaluation. The results of the
simulation tests show that the composite query representation,
which employs both logical and similarity queries, does improve
the recommendation performance over a representation using
either a logical or a similarity query.
Categories and Subject Descriptors
H.5.2 [Information Interfaces and Presentation]: User
Interfaces – Graphical user interfaces (GUI),
Evaluation/methodology
General Terms
Design, Experimentation, Human Factors
Keywords
Critiquing, mobile recommender systems, query representation,
simulation test
1. INTRODUCTION
When searching for products and services, users of e-commerce web
sites are often overwhelmed by the number of options to
consider. Hence they need system support to filter out
irrelevant products, compare candidates, and select the best
one(s). Recommender Systems (RSs) are decision support tools
that address this information overload problem by suggesting
products and services personalized to the user’s needs and
preferences in her particular context.
A research direction in the RS field that has received much
attention is conversational RSs. In these systems the user gets
her desired items through a structured human-computer dialogue
[13]. Critique-based RSs are conversational RSs which, at each
interaction cycle, interleave the system’s product proposals with
the user’s critiques to the proposed products [12, 2, 9, 5, 10, 6, 4,
7, 11, 3]. The user critiques a recommended item either when
a feature of the item does not satisfy her or when she
wants to emphasize that a feature is very important to her. A user's
critique, for instance, may specify an unsatisfied preference, such
as “I want a restaurant cheaper than this”, or confirm an important
preference, such as “I prefer to rent a room with private
bathroom”.
There has recently been an increasing number of mobile RSs
introduced in the literature [14, 5, 16, 15, 11]. In practice,
designing an effective and usable mobile RS requires the
recommendation methodology to overcome obstacles typically
present in the mobile usage environment (e.g., smaller screens and
limited input modalities) and to be suitable for mobile users'
behavior.
In our previous paper [11], we presented a critique-based
mobile RS that, to make user interaction simple and fast, supports
a very limited input of explicit user preferences through system
questions and is mostly based on critiques. When making
critiques, the user also assigns the strength (i.e., wish or must) of
the expressed preference, which helps the system correctly exploit
the user’s critique. To produce relevant recommendations, the
system integrates both long-term and session-specific user
preferences, employs a composite query representation, and
exploits many sources of user related information [8]. The
proposed recommendation methodology has been implemented in
MobyRek, a mobile phone on-tour RS that assists on-the-go users
in searching for travel products (restaurants). MobyRek was
evaluated with real users with respect to: usability (i.e.,
functionality, efficiency, and convenience), recommendation
quality, and overall user satisfaction [11]. The objective and
subjective results of the on-line evaluation showed that our
recommendation methodology is effective in supporting on-the-go
users in making product choice decisions.
However, in the on-line evaluation we could not test different
system variants that employ different user-query representation
approaches. In particular, we would like to understand whether or
not our composite user-query representation approach, which
employs both logical and similarity query components, results in
a better recommendation performance (interaction length and
percentage of successful sessions) than other approaches that

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
RecSys’07, October 19–20, 2007, Minneapolis, Minnesota, USA.
Copyright 2007 ACM 978-1-59593-730-8/07/0010...$5.00.

employ a single query representation based on either
logical filtering or similarity-based retrieval. Had this been tested in the
on-line evaluation, the experiment would have required
test users to use and evaluate different system variants
implementing the alternative query models. This would have
required them to spend much more time and effort, and would
have limited the number of variants that we could compare. This
is a general problem for empirical system evaluation. Therefore,
we decided to design a simulation methodology that could benefit
from the interaction logs collected in the live-user test and replay
such interactions with systems (slightly) different from that used
in the live-user evaluation.
In this paper, we describe the proposed simulation test procedure
and how it has been used to validate the hypothesized advantage
of a hybrid (logical and similarity-based) query model. The
simulation test procedure takes as input a dataset that comprises a
historical series of recommendation sessions, where each session
contains a historical sequence of user critiques. In other words,
the recommendation sessions, and the user critiques within each session,
are replayed in the simulation in their original order, rather than
being generated randomly. In summary, the paper makes the
following contributions.
• A simulation test procedure that can be used for testing
critique-based RSs in simulated environments, based on
the replay of real recommendation sessions and the
various ways of simulating user critiques.
• A number of simulation tests that show the advantage of
the composite query representation (i.e., using both
logical and similarity queries) over a single query
representation (i.e., using either logical or similarity
query) in terms of the rate of successful recommendation
sessions.
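The replay idea behind the first contribution can be illustrated with a minimal sketch. This is not the authors' implementation: the `LoggedSession` structure, the `Variant` callable, and the success criterion (the item the user finally selected appears in a recommendation list) are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class LoggedSession:
    """Hypothetical log record of one live-user session."""
    critiques: List[str]  # critiques in the order the user made them
    selected: str         # item the user finally chose (assumed to be logged)


# A system variant maps the critiques applied so far to a list of item ids.
Variant = Callable[[Sequence[str]], List[str]]


def replay(sessions: Sequence[LoggedSession], variant: Variant) -> float:
    """Replay logged sessions in their original order against one variant.

    Returns the fraction of sessions in which the logged selection shows up
    in some recommendation list (an assumed notion of "successful session").
    """
    successes = 0
    for session in sessions:                 # sessions in original order
        applied: List[str] = []
        hit = session.selected in variant(applied)   # initial recommendation
        for critique in session.critiques:   # critiques in original order
            if hit:
                break
            applied.append(critique)
            hit = session.selected in variant(applied)
        successes += int(hit)
    return successes / len(sessions) if sessions else 0.0


def toy_variant(critiques: Sequence[str]) -> List[str]:
    """Toy variant: recommends r1, and also r2 once 'cheaper' was critiqued."""
    return ["r1"] + (["r2"] if "cheaper" in critiques else [])


log = [LoggedSession(["cheaper"], "r2"), LoggedSession(["closer"], "r3")]
success_rate = replay(log, toy_variant)
```

Running the same log against several variants, each wrapped as a `Variant`, gives directly comparable success rates without asking users to repeat their sessions.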
The remainder of the paper is organized as follows. Section 2
discusses the product representation and the user preferences
model, followed by the description of the recommendation
process. The hypothesis on the user-query representation is posed
and discussed in Section 3. In Section 4, we describe the proposed
simulation procedure, apply it to test the posed hypothesis, and
discuss the observed results. In Section 5, we review the off-line
evaluation approaches introduced in previous research on
critique-based RSs. Finally, the conclusions are given in Section 6.
2. RECOMMENDATION METHODOLOGY
In this section we shall briefly recall our proposed critique-based
mobile recommendation methodology, first describing the product
representation and the user preferences model and then presenting
how the system produces personalized product recommendations
for on-the-go users [11].
2.1 Product Representation and User
Preferences Model
A product is represented as a feature vector $x = (x_1, x_2, \ldots, x_n)$,
where a feature value $x_i$ can be numeric, nominal, or a set of
nominal values. For instance, the representation of the restaurant
$x$ = (Trittico, 79, {pizzeria}, 10, {air-conditioned, parking}, {7, 1},
{credit-card}) means that the name ($x_1$) is Trittico, the
distance from the user's position ($x_2$) is 79 m, the restaurant type
($x_3$) is pizzeria, the average cost ($x_4$) is 10 euros, the
characteristics ($x_5$) are air conditioned and parking, the days open
($x_6$) are Saturday and Sunday, and the accepted method of
payment ($x_7$) is credit card.
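As an illustration, the restaurant above could be encoded as follows. This is a minimal sketch: the class name `Restaurant` and its field names are hypothetical and chosen only to mirror the seven features listed in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Restaurant:
    """Hypothetical container for the feature vector x = (x_1, ..., x_7)."""
    name: str                        # x_1: nominal
    distance_m: float                # x_2: numeric, metres from the user
    types: FrozenSet[str]            # x_3: set of nominal values
    avg_cost_eur: float              # x_4: numeric, euros
    characteristics: FrozenSet[str]  # x_5: set of nominal values
    days_open: FrozenSet[int]        # x_6: day codes; the text reads {7, 1}
                                     #      as Saturday and Sunday
    payment: FrozenSet[str]          # x_7: set of nominal values


trittico = Restaurant(
    name="Trittico",
    distance_m=79,
    types=frozenset({"pizzeria"}),
    avg_cost_eur=10,
    characteristics=frozenset({"air-conditioned", "parking"}),
    days_open=frozenset({7, 1}),
    payment=frozenset({"credit-card"}),
)
```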
To generate personalized recommendations, a recommender
system needs a representation of the user's preferences.
Preferences vary from user to user, and even from one situation
(i.e., context) to another for the same user. In our approach, the user
preferences model includes both contextual (e.g., space-time
constraints) and product-feature (e.g., air conditioned)
preferences, and incorporates both long-term (e.g., a preference
for a non-smoking room) and session-specific (e.g., a wish to eat a
pizza) user preferences [8, 11]. Although users may optionally
specify initial preferences at start-up,
session-specific preferences are acquired mainly through the
user’s critiques collected during the recommendation session.
In a recommendation session, the user's preferences are encoded
in the system’s user query representation which is used to
compute the recommendation list. In our approach, the user query
representation $q$ consists of three components, $q = (Q_L, p, w)$.
• The logical query, $Q_L = (c_1 \wedge c_2 \wedge \ldots \wedge c_m)$, models the
conditions that must be satisfied by every recommended
product. The logical query is a conjunction of constraints,
where each constraint ($c_j$) relates to a single feature.
• The favorite pattern, $p = (p_1, p_2, \ldots, p_n)$, models the
conditions that the recommended products should match
as closely as possible. The wish conditions ($p_i$, $i = 1..n$)
allow the system to make trade-offs.
• The feature importance weights vector, $w = (w_1, w_2, \ldots, w_n)$,
models how important a feature is to the user
with respect to the others, where $w_i \in [0, 1]$ is the
importance weight of feature $f_i$. The system refers to the
feature weights when it needs to make trade-offs or to find
relaxation solutions for unsatisfiable (i.e., empty-result)
logical queries.
For example, the query $\langle Q_L = (x_2 \le 1000) \wedge (x_6 \supseteq \{7, 1\});\ p = (?, ?, \{\text{pizzeria}\}, ?, ?, ?, ?);\ w = (0, 0, 0.4, 0.6, 0, 0, 0) \rangle$
models a user
who looks for restaurants within 1 km of her position that are
open on Saturday and Sunday and prefers pizzeria restaurants. For
this user the cost is most important, followed by the restaurant
type, and she is indifferent to the other features.
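The example query can be made concrete with a short sketch in which the logical query filters the catalog and the favorite pattern, weighted by w, ranks the survivors. The scoring rule used here (a weighted count of exactly matched wish conditions) is an assumption for illustration; the text above does not specify the similarity function. The second and third catalog entries are invented test data.

```python
from typing import Any, Callable, List, Optional, Sequence

Product = Sequence[Any]
Constraint = Callable[[Product], bool]


def satisfies(product: Product, logical_query: Sequence[Constraint]) -> bool:
    """Q_L is a conjunction: every constraint c_j must hold."""
    return all(c(product) for c in logical_query)


def score(product: Product, pattern: Sequence[Optional[Any]],
          weights: Sequence[float]) -> float:
    """Assumed similarity: sum of w_i over the wish conditions p_i that match.

    None plays the role of '?' (no wish expressed for that feature).
    """
    total = 0.0
    for value, wish, w in zip(product, pattern, weights):
        if wish is None:
            continue
        if isinstance(wish, (set, frozenset)):
            matched = wish <= set(value)   # wished values are all present
        else:
            matched = value == wish
        total += w if matched else 0.0
    return total


def recommend(catalog: Sequence[Product], logical_query: Sequence[Constraint],
              pattern: Sequence[Optional[Any]],
              weights: Sequence[float]) -> List[Product]:
    """Filter by Q_L, then rank by similarity to the favorite pattern."""
    candidates = [p for p in catalog if satisfies(p, logical_query)]
    return sorted(candidates, key=lambda p: score(p, pattern, weights),
                  reverse=True)


catalog = [
    ("Osteria Esempio", 300, {"trattoria"}, 15, set(), {6, 7, 1}, {"cash"}),
    ("Trittico", 79, {"pizzeria"}, 10, {"air-conditioned", "parking"},
     {7, 1}, {"credit-card"}),
    ("Pizzeria Lontana", 2500, {"pizzeria"}, 8, set(), {7, 1}, {"cash"}),
]
logical_query = [
    lambda x: x[1] <= 1000,     # x_2 <= 1000 (within 1 km)
    lambda x: {7, 1} <= x[5],   # x_6 is a superset of {7, 1}
]
pattern = (None, None, {"pizzeria"}, None, None, None, None)
weights = (0, 0, 0.4, 0.6, 0, 0, 0)

ranked = recommend(catalog, logical_query, pattern, weights)
```

Under this assumed scoring, the distant pizzeria is removed by the logical query, and Trittico ranks above the trattoria because it matches the wished restaurant type.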
2.2 The Recommendation Process
A recommendation session begins when a mobile user asks the
system for a product suggestion, and it ends when the user selects
a product or she quits the session with no selection. A
recommendation session evolves in cycles. At each recommendation
cycle, the system shows the recommended products (see Figure
1b) that the user can browse to see the product details and make
critiques (see Figure 1c,d). After the user has expressed a critique,
the critique is exploited by the system to compute a new
recommendation list that is shown to the user in the next cycle.
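One plausible way to exploit a critique is sketched below. The mapping used here (a "must" critique adds a constraint to the logical query, a "wish" critique updates the favorite pattern) is an assumption inferred from the roles of the two query components, not a statement of how MobyRek actually processes critiques. The `Query` and `Critique` classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class Query:
    """Hypothetical holder for the logical query and the favorite pattern."""
    logical: List[Callable[[Sequence[Any]], bool]] = field(default_factory=list)
    pattern: List[Optional[Any]] = field(default_factory=lambda: [None] * 7)


@dataclass
class Critique:
    """Hypothetical critique on one feature, with its strength."""
    feature: int                  # 0-based index of the critiqued feature
    strength: str                 # "must" or "wish"
    value: Any = None             # wished value (used when strength == "wish")
    test: Optional[Callable[[Sequence[Any]], bool]] = None  # used for "must"


def apply_critique(query: Query, critique: Critique) -> Query:
    """Update the query in place according to the critique's strength."""
    if critique.strength == "must":
        if critique.test is None:
            raise ValueError("a 'must' critique needs a constraint test")
        query.logical.append(critique.test)              # hard constraint
    elif critique.strength == "wish":
        query.pattern[critique.feature] = critique.value  # soft preference
    else:
        raise ValueError(f"unknown strength: {critique.strength!r}")
    return query


q = Query()
# "I want a restaurant cheaper than this" (current item costs 10 euros), as a must.
apply_critique(q, Critique(feature=3, strength="must", test=lambda x: x[3] < 10))
# A wish for pizzeria restaurants.
apply_critique(q, Critique(feature=2, strength="wish", value={"pizzeria"}))
```

The updated query would then be used to compute the recommendation list for the next cycle.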
Overall, a recommendation session is logically divided
into three phases: initialization, interaction and adaptation, and
retaining.
At start-up, the user is offered three options for the search
initialization (see Figure 1a).