What Can Quantum Theory Bring to Information Retrieval ?

by Benjamin Piwowarski, Keith Van Rijsbergen
Quantum (2010)

Abstract

The probabilistic formalism of quantum physics is said to provide a sound basis for building a principled information retrieval framework. Such a framework can be based on the notion of information need vector spaces where events, such as document relevance or observed user interactions, correspond to subspaces. As in quantum theory, a probability distribution over these subspaces is defined through weighted sets of state vectors (density operators), and used to represent the current view of the retrieval system on the user information need. Tensor spaces can be used to capture different aspects of information needs. Our evaluation shows that the framework can lead to acceptable performance in an ad-hoc retrieval task. Going beyond this, we discuss the potential of the framework for three active challenges in information retrieval, namely, interaction, novelty and diversity.

Available from ir.dcs.gla.ac.uk
Benjamin Piwowarski
University of Glasgow
benjamin@bpiwowar.net
Ingo Frommholz
University of Glasgow
ingo@dcs.gla.ac.uk
Mounia Lalmas
University of Glasgow
mounia@acm.org
Keith Van Rijsbergen
University of Glasgow
keith@dcs.gla.ac.uk
Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information
Search and Retrieval
General Terms
Theory
Keywords
Model, Quantum Theory
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
CIKM’10, October 25–30, 2010, Toronto, Ontario, Canada.
Copyright 2010 ACM 978-1-4503-0099-5/10/10 ...$5.00.
1. INTRODUCTION
Many successful models for information retrieval (IR) exist,
and nowadays increasing effort is put into user-oriented
models. But there is still a lack of a unified theoretical
framework able to address the various challenges identified
in IR. Quantum physics, on the other hand, offers a probabilistic,
logic and geometric formalism based on the mathematics
of Hilbert spaces, to describe the behaviour of matter
at (sub)atomic scales. As van Rijsbergen claims [16], this
“language” of quantum mechanics can also be used to ex-
press the different geometric, probabilistic and logic-based
IR models within a unified framework, while being able to
consider user-oriented aspects. Driven by this motivation,
research in the field of quantum-inspired IR is investigat-
ing how the mathematical framework and postulates behind
quantum theory can be applied to attack the diverse chal-
lenges in IR. These efforts led to the recent proposal of an
abstract quantum IR (QIR) framework that applies methods
known from quantum mechanics, in particular the notion of
Hilbert spaces, system states (expressed as density opera-
tors) and measurements, to interactive retrieval [14].
A first instantiation of this framework focused on the rep-
resentation of queries (as density operators) and documents
(as subspaces) [12]. Initial experiments addressed classical
ad-hoc retrieval tasks, which cover a potential first step of
user interaction, the submission of a query by a user. While
we do not regard ad-hoc tasks as the main target of the
framework, it is crucial to show that acceptable performance
is achieved with it. In these experiments (performed on one
test collection), various methodologies (and associated pa-
rameters) to construct the representations were explored,
but none of them led to good retrieval performance.
The focus of this paper is to move a step further by first de-
scribing some of the interesting properties of this framework
and refining the representation of queries and documents.
This leads us, from a theoretical point of view, to investi-
gate other concepts known from quantum physics, namely
tensor products, and from a practical point of view, to ex-
tensively experiment with different parameters. The exper-
iments carried out on several TREC collections show that
the framework is now mature enough to support current
IR challenges, namely, dealing with user interaction, nov-
elty (of documents) and diversity (of documents and topics)
in a principled manner. These challenges remain difficult or
impractical to solve with current approaches [17, 22].
The outline of this paper is as follows. We first discuss
related work (Section 2). We then introduce the quantum-
based IR framework (Section 3). In Section 4, we show how
to compute the document representation (Section 4.1) and
construct the query representations (Section 4.2). These are
evaluated in an ad-hoc retrieval scenario in Section 5. We
discuss how the framework addresses interactivity, novelty
and diversity in IR in Section 6. Finally, we conclude.
2. RELATED WORK
The quantum based IR framework relies on a multi-dimensional
representation of documents (subspaces) and queries (den-
sities). Multi-dimensional representations have been implic-
itly used in IR to handle negative feedback. [7] showed that
without such a representation, contradictory results were
obtained. Later, [17] found that negative feedback could
be handled by describing the information need as a set of
vectors. The QIR framework encompasses this approach in
the sense that a query is represented as a weighted set of
vectors, too.
A more explicit use of multi-dimensional representations
is the work from [4] who proposed to randomly split a doc-
ument into two parts, and to use a two-dimensional repre-
sentation of documents to obtain a “stereoscopic” view of a
document. Our approach can be thought of as a principled
extension of this work, where we do not limit ourselves to
two dimensions and, in addition, we rely on the probabilistic
framework of quantum physics to compute the relevance of
a document to a query.
Explicit multidimensional representations have also been
explored. [23] showed that the cluster hypothesis still holds
when representing documents as subspaces. Their method-
ology to build subspaces is close to ours since to represent
documents they compute the subspace spanned by a set of
vectors, albeit implicitly. In our work, we provide an ex-
plicit methodology to construct the subspaces. [9] also uses
a subspace for representing a user’s information need (the
subspace where relevant document vectors should lie), and
a vector representation for documents. The probability that
a document is relevant to a user’s information need is de-
termined by the projection of its vector representation onto
the corresponding (information need) subspace. Following
quantum physics, we interchanged the roles of the document and the
user’s information need.¹ This is motivated by the fact that
the user’s information need should be represented as a dynamic
component, as advocated in e.g. [8].
In IR, our work also bears some similarity with Latent
Semantic Indexing (LSI, [6]) since we use spectral analysis
to extract document and query representations. However,
we do not represent objects in a low-dimensional space as
in LSI, but use a spectral analysis to obtain a compact rep-
resentation of our document subspaces and query densities.
Hyperspace Analogous to Language (HAL) spaces [3] are
also closely related to our work. There, each word w is rep-
resented by a term vector where non null components corre-
spond to words co-occurring within a small window centred
around word w. Our representation of a term is inspired by
this approach, but, for each word w, we use spectral analysis
to summarise the information brought by the set of vectors
associated with it.
In quantum IR, besides van Rijsbergen’s seminal work [16],
[22] experimented with a quantum inspired principle for
ranking documents. Our work proposes another approach
to the problem of diversity, whereby the representation it-
self gives a principled way to rank documents.
Outside IR, in the face detection domain, subspaces are
commonly used to solve recognition problems. In [2], a face
is represented by a subspace (generated from different pic-
ture vectors of the same face) and recognition involves com-
puting the distance between a vector (representing the face
to be recognised) and the subspace. In our work, we use
a similar idea to generate document representations. These
are represented by a subspace generated from different aspects
of the document. In contrast to [2], a query also has
a multi-dimensional representation.
¹ [9] refers to the user “context” rather than the user’s information
need, but both concepts are related.
3. THE QUANTUM IR FRAMEWORK
We first give a brief introduction to the quantum proba-
bility formalism. We then discuss its application to IR.
3.1 Quantum Probabilities
The quantum probability formalism is a geometric gener-
alisation of standard probability theory that makes use of
Hilbert spaces, unit vectors and subspaces. We present the
components of the formalism used in this paper.
3.1.1 Systems and States
Quantum theory describes the behaviour of matter at
atomic and subatomic scales by means of physical systems,
each of which is in a certain state in a state space. The latter
is a Hilbert space H, a vector space with an inner product.
The state of the system is described by a unit vector in the
state space called the state vector. States determine statistically
the outcomes of measurements made on the system, for instance the
position of a particle: the state vector determines the
probability that the particle is at a given position.
A system state may be fully known, in which case the
system is described by exactly one state vector, but the
formalism also allows for uncertainty about the state, in which
case the system state can be represented as a weighted set
of possible state vectors with corresponding probabilities or,
equivalently, as a so-called density operator [16]. Depending
on the application we may consider single-part systems,
but very often we need to discuss multi-part systems. We
therefore introduce single-part systems, including the cases
where states are known or uncertain, and then proceed to
multi-part ones.
3.1.2 Single-part Systems
States and Probabilities.
Given a state space H and a state vector ϕ, a probabilistic
event is represented as a subspace S of H. A state vector
ϕ induces a probability distribution on events (i.e., the
subspaces). The probability of an event is given by the square
of the length of the projection of ϕ onto the corresponding
event subspace S, that is by computing the value
$\|\hat{S}\varphi\|^2$,
where $\hat{S}$ is the projector onto the subspace S. This value is
the probability of the event S with respect to the probability
distribution defined by ϕ.
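To make this definition concrete, the following minimal sketch computes the probability of an event in a three-dimensional real space. It is an illustration only, written in Python with NumPy; the helper names `projector` and `event_probability` are hypothetical, and building the projector through a QR decomposition assumes that the spanning vectors are linearly independent.

```python
import numpy as np

def projector(basis_vectors):
    """Return the orthogonal projector onto the span of the given vectors."""
    A = np.column_stack(basis_vectors).astype(float)
    Q, _ = np.linalg.qr(A)  # orthonormal basis (assumes independent vectors)
    return Q @ Q.T

def event_probability(S_hat, phi):
    """Squared length of the projection of the normalised state vector phi."""
    phi = phi / np.linalg.norm(phi)
    return float(np.linalg.norm(S_hat @ phi) ** 2)

# Event: the plane spanned by the first two axes of a 3-dimensional space.
S_hat = projector([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])])
phi = np.array([1.0, 1.0, 1.0])
p = event_probability(S_hat, phi)   # two of three equal components lie in S
```

Here two thirds of the squared length of the state vector lies in the event subspace, so the event has probability 2/3.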
Uncertain States and Weighted Sets.
In a physics experiment, there is often some uncertainty on
the preparation process of the system, which in turn induces
some uncertainty about the state the system is in at the
beginning of the experiment. To formalise this uncertainty,
we make use of a weighted set of possible state vectors (called
ensembles in quantum physics).
A weighted set is defined by a function V that associates
a weight V (ϕ) ∈ R to each of its elements (state vectors)
ϕ. The weight V (ϕ) corresponds to the probability that
the system is in state ϕ. We say that ϕ ∈ V if its weight is
greater than zero. For notation, we use standard operations
on function spaces: adding two weighted sets, $V_1 + V_2$, or
scaling one by a factor, $\alpha V$. We denote by $\varphi \mapsto w$ the weighted set
that associates w with ϕ and 0 with all other vectors.
As states are mutually exclusive (a system cannot be in
two different states at the same time), we require that a
weighted set representing a system has weights that sum up
to 1. Given the weighted set V of possible state vectors, we
can then define the probability of an event S in case the
system state is uncertain as
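$$\Pr(S \mid V) = \sum_{\varphi \in V} V(\varphi)\,\Pr(S \mid \varphi) = \sum_{\varphi \in V} V(\varphi)\,\|\hat{S}\varphi\|^2 = \mathrm{tr}\left(\rho_V \hat{S}\right), \quad \text{where } \rho_V = \sum_{\varphi \in V} V(\varphi)\,\varphi\varphi^{\top} \tag{1}$$
where tr is the trace operator, $\varphi^{\top}$ the transpose of ϕ and $\rho_V$
is the density operator, which corresponds to a (probabilistic)
mixture of states ϕ ∈ V (see [16, p. 83ff.]). Equation 1
reduces to $\|\hat{S}\varphi\|^2$ if V consists of only one vector ϕ, i.e. when
there is no uncertainty about the system state.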
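The two forms of Equation 1 can be checked numerically. The sketch below is a toy illustration in Python with NumPy, not taken from the experiments: the weighted set `V`, its weights and the projector `S_hat` are made-up values, and weighted sets are represented as lists of (vector, weight) pairs, which is an assumption of this sketch.

```python
import numpy as np

def unit(v):
    """Normalise a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Weighted set V: (state vector, weight) pairs whose weights sum to 1.
V = [(unit([1, 0, 0]), 0.5), (unit([1, 1, 0]), 0.3), (unit([0, 1, 1]), 0.2)]

# Projector onto the event subspace S spanned by the first two axes.
S_hat = np.diag([1.0, 1.0, 0.0])

# First form of Equation 1: weighted sum of squared projection lengths.
p_sum = sum(w * np.linalg.norm(S_hat @ phi) ** 2 for phi, w in V)

# Second form: trace of the density operator times the projector.
rho = sum(w * np.outer(phi, phi) for phi, w in V)
p_trace = np.trace(rho @ S_hat)
```

Both computations give the same value, and the density operator has unit trace because the weights sum to 1.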
3.1.3 Multi-part Systems
In physics, interesting systems are those composed of mul-
tiple particles. The corresponding systems are multi-part
systems made of single-part ones. Multi-part systems can be
represented in a tensor product of Hilbert spaces, denoted ⊗.
If H1 and H2 are two Hilbert spaces of respective dimensions
n and m, the tensor space H1 ⊗H2 is an n ·m-dimensional
Hilbert space. If ϕ1 is a vector in H1 and ϕ2 is a vector in
H2, then ϕ1 ⊗ ϕ2 is a vector in H1 ⊗ H2. Furthermore, if
A and B are subspaces (events) in H1 and H2, respectively,
then A⊗B is a subspace (event) in H1⊗H2. The projection
of ϕ1 ⊗ϕ2 onto A⊗B is the tensor product of the two pro-
jected vectors; if one of the projections is null, then the result
is the null vector in H1 ⊗H2. The norm of a vector ϕ1 ⊗ϕ2
is the product of the norms of the component vectors. From
Equation 1, the definition of the projection and the norm in
the tensor space allows us to compute the probability of any
event. For example, the probability of the composite event
A ⊗ B is Pr (A⊗B|ϕ1 ⊗ ϕ2) = Pr (A|ϕ1) × Pr (B|ϕ2) with
ϕ1 (ϕ2) being a state vector in the first (second) state space.
These operations can be extended to more than two spaces
and to weighted sets. In the latter case, with n spaces, the
weighted set $\bigotimes_i V_i = V_1 \otimes \ldots \otimes V_n$ is composed of all tensor
combinations of vectors from the weighted sets $V_i$:
$$\bigotimes_i V_i = \sum_{\varphi_1 \in V_1} \ldots \sum_{\varphi_n \in V_n} \left( \bigotimes_i \varphi_i \mapsto V_1(\varphi_1) \times \ldots \times V_n(\varphi_n) \right) \tag{2}$$
where the weight of an element $\bigotimes_i \varphi_i \in \bigotimes_i V_i$ is the product
of the weights of its component elements. From Equations 1
and 2, we can compute the probability associated with such
a tensor product:
$$\Pr\left( \bigotimes_i S_i \,\middle|\, \bigotimes_i V_i \right) = \prod_i \Pr(S_i \mid V_i) \tag{3}$$
An example of the use of tensor products in physics is to
describe n particles, for which we measure the probability
of each being in a given location in space. The above for-
mula expresses that if the particles are independent from
each other, then the probability is the product of the proba-
bilities of the individual particles. Transposed to IR, tensor
products are useful to express the constraints (particles) that
a relevant document (location in space) should fulfill.
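The product rule for composite events can be verified on a small example. The following sketch (Python with NumPy, illustrative values only) uses the Kronecker product `np.kron` as a concrete realisation of the tensor product for finite-dimensional real vectors and projectors; the states and events are assumptions chosen for the example.

```python
import numpy as np

def unit(v):
    """Normalise a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def prob(S_hat, phi):
    """Probability of the event with projector S_hat in state phi."""
    return float(np.linalg.norm(S_hat @ phi) ** 2)

phi1 = unit([1, 1])                  # state in H1 (dimension 2)
phi2 = unit([1, 2, 2])               # state in H2 (dimension 3)
A_hat = np.diag([1.0, 0.0])          # projector onto event A in H1
B_hat = np.diag([1.0, 1.0, 0.0])     # projector onto event B in H2

# np.kron realises the tensor product for vectors and for projectors.
joint_state = np.kron(phi1, phi2)    # vector in the 6-dimensional H1 ⊗ H2
joint_event = np.kron(A_hat, B_hat)  # projector onto A ⊗ B

p_joint = prob(joint_event, joint_state)
p_product = prob(A_hat, phi1) * prob(B_hat, phi2)
```

The probability computed in the six-dimensional tensor space equals the product of the two single-space probabilities.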
3.2 The Quantum IR (QIR) framework
We now describe our framework for IR, which applies the
formalisms known from quantum theory introduced above.
3.2.1 Information Need Spaces
The basic assumption of the QIR framework [12] is that
there exists a Hilbert space H of pure information needs,
called information need space. In this space, a state vector
ϕ or pure information need (IN) reflects a system view of
the current IN of the user that completely characterises this
user’s possible IN.
The concept of a “pure” IN² is central to our framework.
It can be compared to the notion of “nugget” [5], used in
summarisation and question-answering to assess the amount
of relevant information a summary or an answer contains.
As elaborated later, documents can be represented as sub-
spaces in the IN space H. An IR system that knows a user’s
pure IN would be able to determine with certainty how to
rank documents, i.e. would know how to compute their
probability of relevance. From a geometrical perspective,
we posit that a pure IN is fully answered by a document if
the vector representing the former is contained in the sub-
space representing the latter. It is partially answered if the
pure IN vector has a non-null projection onto the document
subspace. This is analogous to the view of states and
probabilities in a physical system. Consequently, if $S_d$ is a
subspace representing document d (and $\hat{S}_d$ the corresponding
projector), then the squared projection length $\|\hat{S}_d\varphi\|^2$
(Equation 1 with a single state vector) estimates the
probability of relevance of the document given a pure IN ϕ.
Typical of IR is the indeterminism that comes from the
fact that the representation of a query is only an approximation
of the user’s IN, and/or that the query may be ambiguous.
This is comparable to the uncertainty about a physical
system state. We therefore represent the system’s view of
the user’s IN by a weighted set V, which captures each of
the user’s possible pure INs. We then compute a ranking of
documents by applying Equation 1.
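As a toy illustration of such a ranking, the sketch below scores two hypothetical documents against a weighted set of pure INs using the trace form of Equation 1. It is written in Python with NumPy; the three-dimensional space, the document subspaces and the weights are invented for the example, and the helper names are not part of the framework.

```python
import numpy as np

def projector(vectors):
    """Orthogonal projector onto the span of linearly independent vectors."""
    Q, _ = np.linalg.qr(np.column_stack(vectors).astype(float))
    return Q @ Q.T

def density(weighted_set):
    """Density operator of a list of (unit vector, weight) pairs."""
    return sum(w * np.outer(phi, phi) for phi, w in weighted_set)

e = np.eye(3)                        # hypothetical 3-dimensional IN space
docs = {
    "d1": projector([e[0], e[1]]),   # subspace spanned by the first two axes
    "d2": projector([e[2]]),         # subspace spanned by the third axis
}
# System view of the user's IN: first pure IN with weight 0.7, third with 0.3.
V = [(e[0], 0.7), (e[2], 0.3)]
rho = density(V)

scores = {d: float(np.trace(rho @ S_hat)) for d, S_hat in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Document d1 receives probability 0.7 and d2 receives 0.3, so d1 is ranked first.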
3.2.2 Aspect Spaces and Multi-part Systems
User’s INs often consist of several “aspects” that relevant
documents should address. For example, in the TREC-8
topic 408, “What tropical storms (hurricanes and typhoons)
have caused significant property damage and loss of life?”,
we can identify two (topical) IN aspects, namely tropical
storms and significant damage/loss of life. Each IN aspect
can be defined within an IN aspect space, where the state
vectors are now called pure IN aspects. Examples of pure
IN aspects are the vectors representing “hurricane” and “ty-
phoon” for the first IN aspect (tropical storms). We use the
terminology pure IN aspect, since one pure IN aspect ad-
dresses one aspect of the IN (tropical storms) in the same
way that a pure IN addresses an IN.
An example of an IN aspect space is the topical space T ,
which in this work is equated with the standard term space
where each term is a dimension. A (simplified) example is
shown in Figure 1a, where the pure IN aspect “pop music”
is represented by the terms “music”, “chart” and “hit” of the
term space. Note that an IN aspect may be of non-topical
nature, but in this paper, since we are experimenting with
TREC ad-hoc tasks, we only use a topical space T .
² In this paper, we use “pure” information need (“pure IN”)
to distinguish it from information need (“IN”) in its more
usual sense in IR.
Our basic idea is to regard the whole user’s IN as a multi-part
system where each component system reflects one aspect
of the IN. Each IN aspect is represented in a given
Hilbert space $H_i$ (in this paper, we only consider topical
spaces, i.e. $H_i = T$). Our IN space is then expressed as a
tensor product $H = \bigotimes_i H_i$.
In the IN space H, a document is represented by $S_d^{\otimes} = \bigotimes_i S_d$,
where $S_d$, whose construction is given in Section 4.1,
is the representation of the document in the topical space T.
The actual IN is a tensor product of weighted sets $V = \bigotimes_i V_i$,
where each $V_i$ corresponds to one IN aspect of a user’s IN
in the term space T. We describe several constructions of V
based on a query q in Section 4.2. Given a document representation
$S_d^{\otimes}$ and a weighted set V, we then compute the
probability of relevance for the document d with Equation 3.
The above abstract construction of a multi-part space
comprises an infinite number of IN aspects. However, we
assume that all INs can be defined by a finite number of
aspects, and we introduce an artificial “don’t care” pure IN aspect,
represented by the state vector³ $\varphi_\top$. Any document
subspace $S_d$ contains the “don’t care” pure IN aspect, i.e.
$\Pr(S_d \mid \varphi_\top) = 1$. For example, the pure IN state
$\varphi = \varphi_1 \otimes \varphi_2 \otimes \varphi_\top \otimes \varphi_\top \otimes \ldots$ with $\varphi_i \in V_i$ corresponds to a pure IN
composed of two genuine pure IN aspects ($\varphi_1$ and $\varphi_2$) since
$\Pr(S_d^{\otimes} \mid \varphi) = \Pr(S_d \mid \varphi_1) \times \Pr(S_d \mid \varphi_2) \times 1 \times 1 \times \ldots$. We denote by
$V_\top = \varphi_\top \mapsto 1$ the IN aspect composed of only the don’t care
aspect. The don’t care aspect can also be used to introduce
weights for single aspects, as we will show in Section 4.2.2.
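The effect of the “don’t care” aspect can be illustrated with a small sketch (Python with NumPy). It assumes a two-term aspect space extended by one extra axis that plays the role of the “don’t care” vector, and a made-up document subspace that contains this axis; all names and values are hypothetical.

```python
import numpy as np

def prob(S_hat, phi):
    """Probability of the event with projector S_hat in state phi."""
    return float(np.linalg.norm(S_hat @ phi) ** 2)

n_terms = 2
dim = n_terms + 1                    # last axis is the extra "don't care" dimension
phi_top = np.eye(dim)[-1]            # the "don't care" state vector

# Toy document subspace: the first term axis plus the "don't care" axis.
S_d = np.diag([1.0, 0.0, 1.0])

phi1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)   # a genuine pure IN aspect

p_top = prob(S_d, phi_top)           # the "don't care" aspect is always satisfied
p_single = prob(S_d, phi1)

# Padding the pure IN with a "don't care" aspect leaves its probability unchanged.
padded_state = np.kron(phi1, phi_top)
padded_event = np.kron(S_d, S_d)
p_padded = prob(padded_event, padded_state)
```

Because the padded factor contributes a probability of 1, any finite number of genuine aspects can be embedded in a larger tensor product without changing the result.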
4. REPRESENTING DOCUMENTS AND
QUERIES
The previous section introduced the abstract QIR frame-
work. In this section, we make things concrete and describe
a methodology to instantiate this framework for the ad-hoc
task. We describe how to compute a subspace Sd represent-
ing the event that the document d is relevant to a topical
aspect (Section 4.1). We then discuss the various possibilities
to construct a representation $V_q$ of a query q (Section 4.2). The
probability that d is relevant to q is then given by $\Pr(S_d^{\otimes} \mid V_q)$,
computed according to Equation 3.
4.1 Representing Documents as Subspaces
We give a quick overview of the construction process. This
process was first introduced in [13] in a document filtering
scenario. [12] further showed how to construct a document
subspace representation by experimenting with a number of
strategies and associated parameters. Besides adopting the
most promising ones, we also propose new parameters.
The document representation is based on the assumption
that a typical document answers various (pure) IN aspects.
It also assumes that each document can be split into (pos-
sibly overlapping and non-contiguous) semantic fragments,
where each fragment addresses an IN aspect. This follows
from research in focused retrieval, which states that answers
to a query, and hence aspects of it, usually correspond to
document fragments (sentences or paragraphs) and not full
documents.
³ $\varphi_\top$ is an extra dimension of each IN aspect space.
As outlined in [12], a document subspace can be created
from the document’s fragments: the fragments of each document
are mapped to pure IN aspect vectors, which yields a weighted
set $V_d$ of pure IN aspects for that document.
Sd is then defined as the subspace spanned by the vectors
in Vd. Sd is the smallest subspace such that a document is
always a fully relevant answer to an IN aspect it contains,
or more formally such that Pr (Sd|ϕ) = 1 for any ϕ ∈ Vd.
The document will be partially relevant to a pure IN with
a probability that depends on the length of the projection
of the pure IN vector onto the subspace Sd, as discussed in
Section 3.1.2.
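The construction of a subspace from a set of fragment vectors can be sketched as follows (Python with NumPy). Using a singular value decomposition to obtain an orthonormal basis is an assumption of this sketch and is not claimed to be the procedure of [12]; the fragment vectors and the tolerance `tol` are illustrative.

```python
import numpy as np

def unit(v):
    """Normalise a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def subspace_projector(vectors, tol=1e-10):
    """Projector onto the span of the vectors (they may be linearly dependent)."""
    A = np.column_stack(vectors).astype(float)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    basis = U[:, s > tol]            # keep directions with non-negligible singular value
    return basis @ basis.T

# Hypothetical fragment vectors of one document in a 4-term space;
# the third vector is a combination of the first two.
V_d = [unit([1, 1, 0, 0]), unit([0, 1, 1, 0]), unit([1, 2, 1, 0])]
S_hat = subspace_projector(V_d)

rank = int(round(np.trace(S_hat)))   # dimension of the document subspace
probs = [float(np.linalg.norm(S_hat @ phi) ** 2) for phi in V_d]
outside = unit([0, 0, 0, 1])         # a pure IN aspect the document does not address
p_outside = float(np.linalg.norm(S_hat @ outside) ** 2)
```

Every vector of the weighted set has probability 1 under the resulting subspace, while a vector orthogonal to all fragments has probability 0.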
In [12] the weighted set Vd was constructed using non-
overlapping fragments (sentences, paragraphs, section or full
document). The best performing approach made use of sen-
tence fragments, but had shortcomings with some collections
when sentence detection did not work well. In this paper, we
used a methodology where fragments are extracted using a
sliding window over the text of the documents, making it in-
dependent from error-prone sentence detection algorithms.
Denoting $t_1, \ldots, t_M$ the sequence of words of the document,
and s the window size, each fragment corresponds to the set
of terms $t_{k \times s}, \ldots, t_{k \times (s+1)}$ for $1 \le k < \frac{M}{s+1}$.
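A generic sliding-window extraction can be sketched as below. This is an illustration in plain Python and does not reproduce the exact index scheme given above: the separate `step` parameter and the handling of the final, possibly shorter, window are assumptions of the sketch.

```python
def sliding_fragments(words, size, step):
    """Return windows of at most `size` words, advancing by `step` words."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    fragments = []
    for start in range(0, len(words), step):
        fragments.append(words[start:start + size])
        if start + size >= len(words):
            break                    # the window has reached the end of the text
    return fragments

words = "tropical storms caused significant property damage".split()
fragments = sliding_fragments(words, size=3, step=2)
```

With a step smaller than the window size, consecutive fragments overlap, and no sentence boundary detection is required.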
4.2 Representing Queries as Density Opera-
tors
We focus on the construction of a representation for a
given query q, which corresponds to calculating a tensor
product of weighted sets Vq = ⊗Vi. We show how to com-
pute this tensor for the queries made of a single term (Sec-
tion 4.2.1) and queries made of several terms (Section 4.2.2).
In contrast to document representation, query construction
is a non-trivial task, as there are many ways to compose
different terms, each one influencing retrieval performance.
4.2.1 Single-term query
As a query in its simplest form consists of a set of terms,
we are first interested in building the query representation
for a query composed of a single term, t. This representation
is later needed for constructing the representation of multi-
term queries.
We assume that a query term t can be represented as the
multiset Vt of pure IN aspect vectors that correspond to doc-
ument fragments centred on (containing) term t. That is, we
use the immediate surroundings of the term t occurrences in
documents to build that term representation. This method-
ology is similar to pseudo-relevance feedback using passages
from retrieved documents containing the query terms [1].
The difference is that we use all the passages to build the
query representation as we want to consider all possible pure
IN aspects associated with the term t. Vt is then the set of
all ambiguous INs represented by t.
As we have a priori no way to distinguish between the
different vectors in Vt, we assume that each vector is equally
likely to be a pure IN aspect with respect to the user’s actual
IN. Hence, to construct the weighted set Vt for a term t, for
any given pure IN aspect ϕ, we set
$$V_t = \sum_{\varphi} \left( \varphi \mapsto \frac{n_t(\varphi)}{N_t} \right) \tag{4}$$
where Nt is the number of vectors associated with term t
and nt (ϕ) is the number of occurrences of the pure IN as-
pect ϕ in the fragments containing term t. In practice, this
representation means that the more vectors ϕ from Vt lie
in the document subspace, the higher the relevance of the
document to the single-term query t.
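Equation 4 amounts to counting how often each vector occurs among the fragments that contain the term. The sketch below (plain Python) illustrates this; the mapping from a fragment to a vector (normalised term counts over a small vocabulary) is a hypothetical stand-in for the actual fragment representation, and the fragments are invented.

```python
from collections import Counter
import math

def fragment_vector(fragment, vocabulary):
    """Hypothetical fragment-to-vector mapping: normalised term counts."""
    counts = [fragment.count(term) for term in vocabulary]
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        raise ValueError("fragment contains no vocabulary term")
    return tuple(round(c / norm, 12) for c in counts)

def term_weighted_set(term, fragments, vocabulary):
    """Equation 4: the weight of phi is n_t(phi) / N_t."""
    vectors = [fragment_vector(f, vocabulary) for f in fragments if term in f]
    if not vectors:
        return {}                    # the term occurs in no fragment
    counts = Counter(vectors)
    return {phi: n / len(vectors) for phi, n in counts.items()}

vocabulary = ["pizza", "delivery", "cambridge"]
fragments = [
    ["pizza", "delivery"],
    ["pizza", "delivery"],
    ["pizza", "cambridge"],
    ["delivery", "cambridge"],       # does not contain "pizza"
]
V_pizza = term_weighted_set("pizza", fragments, vocabulary)
```

In this toy example the vector shared by the two identical fragments receives weight 2/3 and the remaining vector receives weight 1/3.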
4.2.2 Multi-term query
We now show three different methods to compute the
weighted set corresponding to a multi-term query:
mixture: The document should contain as many pure IN
aspects as possible of any of the query terms;
mixture of superpositions: The document should contain
as many pure IN aspects as possible which are “combinations” of
the pure IN aspects associated with each query term;
tensor product: The document should contain as many pure
IN aspects as possible for each of the query terms.
The difference between the mixture and mixture of su-
perpositions approaches and the tensor product one lies in
their ability to handle queries containing different aspects.
The former two provide no explicit means to distinguish be-
tween aspects; they operate in one aspect space and treat
each IN associated with a term equally, even if the terms
describe different aspects. In contrast, the latter provides
explicit means to distinguish between aspects by combining
different aspect spaces. As in our test collection we have no
indication about which query term addresses which aspect of
the IN, we make the simplifying assumption that each term
relates to a different IN aspect.
In the following, we explain each of these three approaches
and their rationales. We denote T = {t1, . . . , tn} the terms
forming the query q, and the indices i and j refer to terms
ti and tj . As different terms have a different importance, we
use a set of weights wi that sum to 1 to denote the relative
importance of the different terms in the query.
Mixture.
We assume that a relevant document should equally answer
all pure IN aspects associated with any of the query
terms. That is, a document d1 will have a higher probability
of being relevant than d2 if, when we pick at random a
term ti of the query, and then a pure IN aspect ϕ from the
associated set Vi, the probability that d1 is relevant to the
pure aspect ϕ is on average higher than that of d2.
More precisely, to compute the probability of relevance of
a document d, we first select the ith term of the query (with
a probability wi, used to reflect the importance of the term
in the query), and then one of the vectors of the weighted
set Vi corresponding to the term ti. With this vector, we
compute the probability of a document d to be relevant to
this aspect. We repeat the process and average over all the
possible selections. This defines the probability of relevance
of document d given the query. Formally, this corresponds
to a density defined as a mixture of all the pure IN aspect
vectors associated with the query terms. The weighted set
is built from the individual weighted set Vi (Section 4.2.1):
$$V_T^{(m)} = \sum_{i=1}^{n} w_i V_i \tag{5}$$
where (m) stands for mixture. In this weighted set, the
user’s pure IN aspect corresponds to ϕ with a probability
$\sum_i w_i V_i(\varphi)$. This representation, expressed within a tensor
product of IN aspect spaces, makes use of only one of them,
and is defined as:
$$V_q^{(m)} = V_T^{(m)} \otimes V_\top \otimes \ldots \tag{6}$$
Note that standard vectorial IR would be derived if $V_T^{(m)}$
was composed of one pure IN aspect ϕ and the document
subspace was unidimensional.
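Equation 5 can be illustrated with a few lines of plain Python. In this sketch weighted sets are represented as dictionaries from vectors (tuples) to weights, which is an assumption made for the example; the per-term sets and the term weights are invented.

```python
def mixture(weighted_sets, term_weights):
    """Equation 5: sum over terms of w_i times V_i, for {vector: weight} dicts."""
    mixed = {}
    for V_i, w_i in zip(weighted_sets, term_weights):
        for phi, weight in V_i.items():
            mixed[phi] = mixed.get(phi, 0.0) + w_i * weight
    return mixed

# Hypothetical per-term weighted sets over a 2-dimensional space.
V_1 = {(1.0, 0.0): 1.0}
V_2 = {(1.0, 0.0): 0.5, (0.0, 1.0): 0.5}
V_m = mixture([V_1, V_2], term_weights=[0.6, 0.4])
```

A vector shared by several terms accumulates weight from each of them, and the mixed weights still sum to 1 when the term weights do.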
Mixture of superpositions.
In vector-based IR, a query is represented by a vector that
corresponds to a linear combination of the vectors associated
with the query terms. In the simplest case, a term vector is
zero everywhere except for the component that corresponds
to the term, where the value is e.g. tf-idf. In quantum theory,
a normalised linear combination corresponds to the principle
of superposition, where the descriptions of system states
can be superposed to describe a new system state.
In our case, a system state is a user’s pure IN aspect,
and we use the superposition principle to build new pure IN
aspects from existing ones, as illustrated with the example
shown in Figures 1b and 1c. Let ϕp, ϕuk and ϕusa be three
vectors in a three-dimensional IN space which, respectively,
represent the pure IN aspects “I want a pizza”, “I want it
to be delivered in Cambridge (UK)” and “I want it to be
delivered in Cambridge (USA)”. The pure IN aspect vector
“I want a pizza to be delivered in Cambridge (UK)” would
be represented by a superposition ϕp/uk of ϕp and ϕuk, as
depicted in Figure 1b. We can similarly build the pure IN
aspect for Cambridge (USA). To represent the ambiguous
query “pizza delivery in Cambridge” where we do not know
whether Cambridge is in the USA or the UK, and assuming
there is no other source of ambiguity, we would use a mixture
of the two possible superposed pure IN aspects, as depicted
by the two vectors of the mixture in Figure 1c. This brings
us to another variant of query construction, the mixture of
superpositions.
To compute the probability of relevance of a document,
we randomly select a vector from the set Vi of each term ti of
the query. In our previous example, the set V1 would be just
one vector (ϕp), whereas V2 would contain two vectors (ϕuk
and ϕusa). We then superpose the selected vectors (one for
each term), where the weight in the linear combination is
$\sqrt{w_i}$, obtain a new vector ϕ, and compute the probability of
the document to be relevant to this new pure IN aspect ϕ.
The above process is repeated for all the possible selec-
tions of vectors. The associated weighted set is thus formally
defined as the following mixture of superpositions:
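$$V_T^{(ms)} = \sum_{\varphi_1 \in V_1} \ldots \sum_{\varphi_n \in V_n} \left( \frac{\sum_{i=1}^{n} \sqrt{w_i}\,\varphi_i}{\left\| \sum_{i=1}^{n} \sqrt{w_i}\,\varphi_i \right\|} \mapsto \prod_i V_i(\varphi_i) \right) \tag{7}$$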
where (ms) stands for “mixture of superpositions”. In the
weighted set, each pure IN aspect is a linear combination
$\sum_{i=1}^{n} \sqrt{w_i}\,\varphi_i$ of the IN aspects from each of the terms
composing the query, which is normalised to ensure it is a unit
vector. Each of these linear combinations is associated with
a weight which is the product of the weights⁴ $\prod_i V_i(\varphi_i)$.
[Figure 1: Operations in the IN/term space. (a) A pure IN in an IN/term space (IN “Pop Music”; terms “Hit”, “Chart”, “Music”). (b) A superposition of two pure IN aspects (Pizza, Cambridge (UK)). (c) A mixture of two pure IN aspects (ϕp/uk and ϕp/usa, built from ϕp, ϕuk and ϕusa).]
The mixture of superpositions differs from the mixture
in the way we combine the query terms5. As an example,
consider two terms T = {t1, t2} that always occur in a doc-
ument, but half of the time in the same sentences, the other
half in distinct sentences. A mixture V(m)T retrieves all the
documents containing the terms, but a mixture of superpo-
sitions V(ms)T gives a higher probability to the documents
that contain both in the same sentences.
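
The construction of Equation 7 can be sketched in a few lines of Python for the pizza/Cambridge example. This is only an illustration under stated assumptions: the three aspect vectors are taken to be orthonormal, both term weights are set to 0.5, and the probability of relevance is taken to be the sum over ϕ of V(ϕ) times the squared length of the projection of ϕ onto the document subspace. The function names are hypothetical and not part of the original experiments.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalise(v):
    norm = math.sqrt(dot(v, v))
    return tuple(a / norm for a in v)


def mixture_of_superpositions(term_sets, weights):
    """Enumerate Equation 7: pick one vector per term, superpose, normalise."""
    result = []
    for choice in itertools.product(*(s.items() for s in term_sets)):
        combo = [0.0] * len(choice[0][0])
        prob = 1.0
        for (phi, p), w in zip(choice, weights):
            combo = [c + math.sqrt(w) * a for c, a in zip(combo, phi)]
            prob *= p  # product of the weights V_i(phi_i)
        result.append((normalise(combo), prob))
    return result


def relevance(weighted_set, doc_basis):
    """Sum of V(phi) * ||projection of phi onto the document subspace||^2.

    doc_basis is assumed to be an orthonormal basis of the document subspace.
    """
    return sum(p * sum(dot(phi, b) ** 2 for b in doc_basis)
               for phi, p in weighted_set)


# Hypothetical orthonormal aspects: pizza, Cambridge (UK), Cambridge (USA).
PHI_P, PHI_UK, PHI_USA = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
V1 = {PHI_P: 1.0}
V2 = {PHI_UK: 0.5, PHI_USA: 0.5}

v_ms = mixture_of_superpositions([V1, V2], weights=[0.5, 0.5])

# A document whose subspace is the single direction "pizza in Cambridge (UK)".
doc_uk = [normalise((1.0, 1.0, 0.0))]
score = relevance(v_ms, doc_uk)
```

Here the weighted set contains the two superpositions ϕp/uk and ϕp/usa, each with weight 0.5, and the toy document scores 0.5 × 1 + 0.5 × 0.25 = 0.625.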
As for mixtures, we only make use of one aspect space
within a tensor product of IN spaces, and define the weighted
set for mixture of superpositions as:
$$V^{(ms)}_q = V^{(ms)}_T \otimes V_\top \otimes \dots \qquad (8)$$
Tensor product of term spaces.
We now suppose that, to be relevant to a query that comprises
several aspects of an IN, a document should ideally satisfy
all of its aspects. Furthermore, users might give different
importance to certain aspects, which motivates the introduction
of a weighting scheme for aspects. Neither of the methods
discussed above can handle aspects. To support aspects
explicitly, we discuss a quantum analogue of the “weighted
and” (#wand) operator proposed in [10]. Aspect spaces and
tensor products, introduced in the previous section, are the
core components of our approach. Our (simplified) assump-
tion is that the query is made of a set of IN aspects, one for
each term of T .
Since we suppose that a relevant document is one that
addresses each of the IN aspects associated with the terms
of the query, using the tensor product means that the query
becomes associated with the weighted set V1⊗. . .⊗Vn, where
each Vi now corresponds to the query term ti. However,
this representation gives the same importance to each query
term, which led to poor performance (not reported here).
This motivates the introduction of a “weighted and”. With
our notation, this gives a probability of relevance defined
by:
$$\Pr(d \text{ is relevant}) = \prod_i \Pr(S_d \mid V_i)^{w_i} \qquad (9)$$
where the case wi = 1 for any i corresponds to a tensor
product V1 ⊗ . . . ⊗ Vn. In general, if all the wi are integers,
then the above equation corresponds to a tensor product as
defined by Equation 3, where each Vi appears wi times in
the tensor product.
Footnote 4: In practice, we used a modified formula for computational
reasons (see [12]).
Footnote 5: For one-term sets, i.e. T = {t}, the two representations
coincide, i.e. V(m)T = V(ms)T.
However, for a set of arbitrary wi, the above notation does
not correspond to a probability distribution defined on a
tensor product. We now present two ways to overcome this.
The first one is by transforming Equation 9 so that the new
weights are integer values. More precisely, we observe that
raising Pr (d is relevant) to the power β does not change the
ranking of documents defined by Equation 9 when β > 0. The
value β is chosen so that each βwi can be approximated by an
integer value. For example, if a query is composed of two
terms t1 and t2 with respective weights w1 = 0.6 and w2 = 0.4,
then β = 5 gives βw1 = 3 and βw2 = 2, and the query is
represented by V1 ⊗ V1 ⊗ V1 ⊗ V2 ⊗ V2. The first tensor query
representation, referred to as V(T1)q, can be formally
defined as:
$$V^{(T1)}_q = \bigotimes_i \left( \bigotimes_{j=1}^{\beta w_i} V_i \right) \otimes V_\top \otimes \dots \qquad (10)$$
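
To make the relation between Equations 9 and 10 concrete, the following Python sketch compares the “weighted and” score with the score obtained by repeating each term βwi times. It assumes that the probability on a tensor product is the product of the per-factor probabilities and that the remaining V⊤ factors contribute a probability of 1. The per-term probabilities and the function names are hypothetical.

```python
import math


def weighted_and(probs, weights):
    """Equation 9: product over query terms of Pr(S_d | V_i) ** w_i."""
    return math.prod(p ** w for p, w in zip(probs, weights))


def t1_score(probs, weights, beta):
    """Product form of Equation 10: V_i is repeated round(beta * w_i) times."""
    return math.prod(p ** round(beta * w) for p, w in zip(probs, weights))


weights = [0.6, 0.4]
beta = 5  # beta * w_i gives 3 and 2 copies, as in the example above

# Hypothetical per-term probabilities Pr(S_d | V_i) for two documents.
doc_a = [0.9, 0.2]
doc_b = [0.5, 0.5]

eq9 = {name: weighted_and(p, weights) for name, p in (("a", doc_a), ("b", doc_b))}
t1 = {name: t1_score(p, weights, beta) for name, p in (("a", doc_a), ("b", doc_b))}
```

Because the T1 score equals the Equation 9 score raised to the power β, both dictionaries order the two documents identically.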
The above has the disadvantage that there are βwi aspect
spaces assigned to one single term ti. This leads us to our
second solution, a more elegant one, where a term ti
corresponds to only one aspect. The idea is to use the fake
pure IN aspect ϕ⊤ defined at the end of Section 3.2.2. To
include the state ϕ⊤ as one of the possible pure IN aspect
states associated with term ti, with a probability f(wi), we
modify Vi as follows:
$$V^{(T2)}_i = (1 - f(w_i))\, V_i + \left( \varphi_\top \mapsto f(w_i) \right) \qquad (11)$$
The second part of the sum assigns the weight f(wi) to
the “don’t care” pure IN aspect ϕ⊤, while the first part scales
the previous weights so that the sum of all weights is 1
again. Equations 11 and 1 imply that Pr (Sd|V(T2)i) = f (wi) +
(1 − f (wi)) Pr (Sd|Vi). Our second query representation using
tensor products is then defined as:
$$V^{(T2)}_q = \bigotimes_i V^{(T2)}_i \otimes V_\top \otimes \dots \qquad (12)$$
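
The effect of mixing in the “don’t care” aspect can be checked numerically with a small Python sketch. It relies on two assumptions that are not spelled out here: the “don’t care” aspect is represented by a sentinel that every document satisfies with probability 1 (which is what the identity stated after Equation 11 implies), and the probability of relevance is the weighted sum of squared projection lengths. The value of f is an arbitrary illustrative number, not one derived from a term weight.

```python
import math

TOP = "phi_top"  # hypothetical sentinel standing for the "don't care" aspect


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_t2(weighted_set, f_w):
    """Equation 11: scale the existing weights by (1 - f_w), give f_w to phi_top."""
    mixed = {phi: (1.0 - f_w) * p for phi, p in weighted_set.items()}
    mixed[TOP] = f_w
    return mixed


def relevance(weighted_set, doc_basis):
    """Probability of the document subspace (doc_basis assumed orthonormal)."""
    total = 0.0
    for phi, p in weighted_set.items():
        if phi == TOP:
            total += p  # assumption: phi_top is satisfied by every document
        else:
            total += p * sum(dot(phi, b) ** 2 for b in doc_basis)
    return total


s = 1.0 / math.sqrt(2.0)
V_i = {(1.0, 0.0): 0.7, (s, s): 0.3}   # hypothetical weighted set for one term
doc = [(1.0, 0.0)]                     # hypothetical one-dimensional document
f_w = 0.25                             # illustrative value, not derived from w_i

base = relevance(V_i, doc)
mixed = relevance(make_t2(V_i, f_w), doc)
```

With these toy values, `mixed` equals `f_w + (1 - f_w) * base`, which is the relation used to define the second tensor representation.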
We experimented with various heuristically based functions
f (not shown here), but none of them gave good results
compared to V(T1)q. This led us to consider a function f
that minimises the mean squared error between Pr (Sd|Vi)^wi
and f (wi) + (1 − f (wi)) Pr (Sd|Vi). It can be shown (see
footnote 6) that the optimal f is defined by
$$f(w_i) = \frac{3}{2} - \frac{3}{(w_i + 1)(w_i + 2)}$$
for wi ∈ [0, 1] (proof omitted). In the next section, we show that
both approaches (T1) and (T2) perform similarly, which implies
that the latter should be preferred.
5. EXPERIMENTS AND ANALYSIS
The focus of our experiments is to show that the QIR
framework achieves performance comparable to that of standard
IR models in an ad-hoc setting. We describe our experimental
setup, then present our results and their analysis.
5.1 Experimental Setup
Test collections. We used the TREC 1 to 8 collections (with
the exception of TREC-4 since it did not contain the “title”
field), and the TREC ROBUST 2004 collection.
Queries and weights. All topics were automatically converted
into a query using their “title” part. We experimented
with the following query representation approaches: (M)
mixture V(m)q, (MS) mixture of superpositions V(ms)q, (T1)
tensor product with V(T1)q and (T2) tensor product with
V(T2)q. To define the weight wi of query term ti, following
[12], we use normalised IDF.
Document representation. To create the document sub-
space, we segmented the document using a sliding window
approach. Each fragment is processed by stemming and re-
moving stop-words. Finally, to map the resulting sequence
of stems to a (unit) vector in the term space, we use the
binary weighting scheme, as it performed well in prelimi-
nary experiments, and in addition allows faster eigenvalue
decomposition computation.
To find an orthonormal basis and projector for the subspace
Sd given the multiset of vectors Vd, we performed an
eigenvalue decomposition of $\sum_{\varphi \in V_d} V_d(\varphi)\,\varphi\varphi^\top$
as $\sum_{i=1}^{D} \lambda_i x_i x_i^\top$,
where D is the number of eigenvectors with non-null eigenvalues
(D is also the dimension of the associated subspace) and
λi > 0 are the eigenvalues. It can be shown that the vectors
xi form an orthonormal basis of the subspace Sd (proof
omitted). Given the latter, the projector $\hat{S}_d$ onto the
subspace Sd associated with document d is then expressed as
$\sum_{i=1}^{D} x_i x_i^\top$. For computational reasons, we bounded D to
not exceed a value of 25, which was found in preliminary
experiments to perform well.
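
The document representation pipeline can be sketched in plain Python as follows. Note the substitution: instead of the eigenvalue decomposition used in the experiments, the sketch uses modified Gram-Schmidt orthogonalisation, which yields an orthonormal basis of the same subspace but provides no eigenvalues, so the bound on D cannot be applied by eigenvalue rank. The stride of one stem between windows, the toy stem sequence and all function names are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def window_vectors(stems, vocabulary, span):
    """One binary unit vector per sliding window of `span` stems (stride 1 assumed)."""
    if not stems:
        return []
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for start in range(max(1, len(stems) - span + 1)):
        v = [0.0] * len(vocabulary)
        for stem in stems[start:start + span]:
            v[index[stem]] = 1.0          # binary weighting
        norm = math.sqrt(sum(v))          # entries are 0 or 1
        vectors.append([x / norm for x in v])
    return vectors


def orthonormal_basis(vectors, tol=1e-9):
    """Modified Gram-Schmidt; spans the same subspace as the input vectors."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([x / norm for x in r])
    return basis


def projection_sq_norm(phi, basis):
    """Squared length of the projection of phi, using the projector sum_i x_i x_i^T."""
    return sum(dot(phi, x) ** 2 for x in basis)


stems = ["pizza", "deliveri", "cambridg", "pizza", "oven"]  # toy document
vocabulary = sorted(set(stems))
fragments = window_vectors(stems, vocabulary, span=3)
basis = orthonormal_basis(fragments)
```

In this toy case the first two windows contain the same set of stems, so the three fragments span a two-dimensional subspace, and every fragment has a projection of squared length 1 onto it.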
Term representation. We compute the single-term query
density as $\rho_t = \sum_{\varphi \in V_t} V_t(\varphi)\,\varphi\varphi^\top$, where Vt is the weighted
set defined by the ensemble of vectors corresponding to windows
of terms of span s centred on each occurrence of term
t in the documents. Using eigenvalue decomposition again,
we write ρt as $\sum_{i=1}^{D} \lambda_i x_i x_i^\top$. As some terms occur in a great
number of documents and the potential rank (dimension of
the subspace where ρt is defined) of ρt can be very high, we
approximated it by limiting the number of considered documents
to 10,000 and setting a maximum rank of 10 (D ≤ 10).
Footnote 6: We assumed a uniform distribution of the probability
Pr (Sd|Vi).
As the vectors constructed from the terms occurring in
the document sentences are only an approximation of the
underlying pure IN vectors, the vectors in Vt will contain
components that should not be assigned to the term t. To
reduce this noise, we used a held-out set of documents (20%),
denoted V∗t. For each pure IN aspect ϕ in V∗t, we compute
the probability Pr (Sϕ|V∗t) that the document composed of
only one sentence represented by ϕ is relevant with respect
to the density ρt (that is, $\hat{S}_\varphi = \varphi\varphi^\top$). Through an exhaustive
search, we selected the dimension K maximising the log
likelihood value over all the vectors in V∗t. In this case, ρt is
defined (see footnote 7) as $\sum_{i=1}^{K} \lambda_i x_i x_i^\top$.
5.2 Results
We report in Table 1 the results, using mean average pre-
cision (MAP). We compare the performance of BM25 (with
standard parameters, see [15]), TF-IDF (without document
normalisation) and, for the QIR framework, those instantia-
tions corresponding to each query construction process, i.e.
mixture, mixture of superpositions, and tensor product (T1
and T2). We used a window span of 5, the default setting
in [3].
Overall the results were consistent across all collections.
The MAP values are below those of BM25 for mixture and
mixture of superpositions, and comparable for both tensor
approaches. Given the novelty of our framework, and its
still unexplored parameters and their effect, we are satisfied
with its performance.
The performance of the QIR framework is well above that
of a simple TF-IDF model. This shows that, as will be
discussed in Section 6, the QIR framework includes some
document length normalisation, one based on the IN aspects
present in the document. Hence, it does not need to consider
explicitly document length.
The approach T1 performed the best, followed by T2, M
and MS. That MS performs worse than M may be because,
in general, terms denote different components
of the INs – in [12] it was observed that using mixture of
superpositions was better suited for phrase queries. For the
tensor approaches, T2 performs similarly to T1 but in three
collections it has a lower performance. Given that T2 is more
closely related to the QIR framework, the performance of T2
is high enough to consider using it as a basis for interaction.
We analysed the influence of the sliding window span s
on the performance of T1, the best performing query
representation. In Figure 2, we report the difference in average
precision between T1 and BM25 (positive values mean that T1 was
better), for different window spans ranging from 1 to 15. We
also report the results for the methodology used in [12, 13],
where sentences were used instead of sliding windows. We
observe that using a sliding window of 5 gives the best result
in terms of both median and variability, hence validating [3].
Footnote 7: This was not performed for documents, which have a much
smaller number of associated vectors. Hence the above process
might remove important facets of the document. This
was validated experimentally in [12].

Table 1: Mean average precision (MAP) for each test collection.
For the query construction, M stands for mixture, MS for mixture
of superpositions, T1 and T2 for tensor product. For completeness,
significance of the difference with BM25 is shown for the
0.05 level (∗) and the 0.01 level (†).

Model    TREC-1  TREC-2  TREC-3  TREC-5  TREC-6  TREC-7  TREC-8  RB-2004
BM25     0.230   0.209   0.282   0.148   0.224   0.182   0.236   0.242
TF-IDF   0.084†  0.041†  0.056†  0.035†  0.088†  0.056†  0.082†  0.074†
M        0.205†  0.184†  0.226†  0.115†  0.173†  0.142†  0.165†  0.180†
MS       0.209†  0.167†  0.206†  0.112∗  0.157†  0.117†  0.159†  0.165†
T1       0.232   0.195†  0.281   0.148   0.214   0.182   0.234   0.240
T2       0.222   0.200   0.259†  0.139   0.216   0.179   0.212†  0.228†

Finally, we were interested in the influence of the query
length (number of terms in T). In Figure 3, we plotted
the difference in average precision between the QIR framework
and BM25, depending on the query length and on the query
representation. We kept the span of the window constant (5).
We can first observe that in all cases, the performance
degrades with longer queries. This shows that we need to
improve the representation of multi-term queries. We also
note that T1 degrades only slightly with longer queries.
Summarising, using a larger IN space and sliding windows
brought a dramatic improvement over previous work. Applying
tensor products of aspect spaces to explicitly address
different aspects of the IN seems to be a good choice; this
finding will be subject to further investigation. The results
also show that the tensor product representation T2, with
a window span of 5, is a valid starting point to exploit fur-
ther user interactions, although some work is needed on the
automatic query representation construction.
6. POTENTIAL OF THE QUANTUM IR FRAMEWORK
In the previous section we showed that our framework
can compete with well-established approaches like BM25 in
an ad-hoc scenario. This is very promising, and the added
complexity of the quantum formalism brings exciting new
possibilities to address some of the current IR challenges.
In this section, we discuss the potential of the QIR frame-
work for three IR challenges, namely, interaction, novelty
and diversity. We show in this section how those three
facets are related in the framework. More precisely, handling
novelty and diversity in this framework is a consequence of
(1) handling interaction and (2) having a multi-dimensional
query and document representation. We discuss (1) and (2),
before addressing diversity and novelty.
6.1 Supporting Different Forms of Interaction and Events
If the hypothesis of the existence of an IN space is correct,
then any event of interest can be represented as a subspace.
This includes the document relevance (used for relevance
feedback), and other forms of interactions such as query re-
formulation or a user click [14].
While interacting with an IR system, users change their
point of view, and relevance, unlike topicality, is
expected to evolve within a search session in two ways [19]:
(P1) The IN becomes increasingly specific from a system
point of view, e.g. when a user types some keywords or
clicks on some documents; and (P2) The IN changes from a
user point of view. The IN becomes more specific as the user
reads some documents, or it can slightly drift as user inter-
ests do. We discuss how P1 and P2 are supported within
our framework by means of projection.
[Figure 2: Boxplot of the differences in AP between BM25
and the QIR model (with T1 query representation) with
respect to different types of window span and sentence level
fragments (S). Horizontal axis: window span (S, 1 to 10, and 15);
vertical axis: difference in AP.]
Updating Weighted Sets.
We detail how a weighted set, as defined in Section 3.1,
is updated when an event is realised. Given the subspace S
defining the event, we update the IN given by V by project-
ing each of the vectors in V onto the subspace defined by S,
which we write as follows (see [11] for a justification):
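$$V \triangleright S = K \sum_{\varphi \,/\, \hat{S}\varphi \neq 0} \left( \frac{\hat{S}\varphi}{\| \hat{S}\varphi \|} \mapsto \| \hat{S}\varphi \|^2 \times V(\varphi) \right) \qquad (13)$$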
Here, Ŝ is the projector onto the subspace S and K is a
normalising factor ensuring that the weights in V ▷ S sum
to 1.
The effect of applying Equation 13 on a weighted set V
is that a vector ϕ orthogonal to S is discarded (the length
of the projection is 0), and that the vectors in S are kept
as is since ‖Ŝϕ‖ = ‖ϕ‖ = 1. The remaining vectors, which are
neither orthogonal to S nor contained in it, are projected
onto S, and the final weight of each of these vectors depends
on how close it was to the subspace S defining the
interaction. Geometrically, this means that all the vectors
from V ▷ S now belong to the subspace S, i.e. the probability
of S when the weighted set is V ▷ S is 1.
[Figure 3: Boxplot of the differences in AP between BM25
and the QIR model (with the M, MS and T1 query representations)
with respect to query length (queries with length greater than 4
were grouped with queries of length 4). The width of each
boxplot is proportional to the number of topics of a given
length. Horizontal axis: query length (1 to 4) and representation
(M, MS, T1); vertical axis: difference in AP.]
We illustrate this using Figure 1c, with a weighted set
composed of ϕusa, ϕp/usa and ϕp/uk with respective weights
0.5, 0.2 and 0.3, and considering the subspace S spanned by
the vectors ϕp and ϕuk. The state ϕusa would be discarded,
and ϕp/uk kept as is. Since ϕp/usa can be written as
$\frac{1}{\sqrt{2}}\varphi_p + \frac{1}{\sqrt{2}}\varphi_{usa}$, ϕp/usa would be projected onto ϕp, with
$\| \hat{S}\varphi_{p/usa} \|^2 = \| \frac{1}{\sqrt{2}}\varphi_p \|^2 = \frac{1}{2}$. The new weighted set would be composed of
the vectors ϕp/uk and ϕp with respective weights 0.3K and
1/2 × 0.2K, where K is set to 2.5 so that the sum is 1 (the
final weights are respectively 0.75 and 0.25).
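
The worked example can be reproduced with a short Python sketch of the update rule in Equation 13. It assumes that ϕp, ϕuk and ϕusa are orthonormal and that ϕp/uk is the equal-weight superposition of ϕp and ϕuk, by analogy with the decomposition given for ϕp/usa. The function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(phi, basis):
    """Orthogonal projection of phi onto span(basis); basis must be orthonormal."""
    out = [0.0] * len(phi)
    for b in basis:
        c = dot(phi, b)
        out = [o + c * x for o, x in zip(out, b)]
    return out


def update(weighted_set, basis, tol=1e-12):
    """Equation 13: project each vector, reweight by squared length, renormalise."""
    raw = []
    for phi, weight in weighted_set:
        proj = project(phi, basis)
        sq_len = dot(proj, proj)
        if sq_len <= tol:
            continue                      # orthogonal to S: discarded
        unit = tuple(x / math.sqrt(sq_len) for x in proj)
        raw.append((unit, sq_len * weight))
    total = sum(w for _, w in raw)
    if total == 0.0:
        return []                         # the event has probability 0
    return [(phi, w / total) for phi, w in raw]   # K = 1 / total


s = 1.0 / math.sqrt(2.0)
PHI_P, PHI_UK, PHI_USA = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
PHI_P_USA, PHI_P_UK = (s, 0.0, s), (s, s, 0.0)

V = [(PHI_USA, 0.5), (PHI_P_USA, 0.2), (PHI_P_UK, 0.3)]
updated = update(V, basis=[PHI_P, PHI_UK])
```

Under these assumptions `updated` holds two entries: ϕp with weight 0.25 (the projection of ϕp/usa) and ϕp/uk with weight 0.75, in agreement with the figures above.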
Supporting IN Dynamics.
The QIR framework supports both P1 and P2, which we
illustrate with the example of Figure 1c. We can define two
one-dimensional subspaces Sp/uk and Sp/usa from the two
vectors ϕp/uk and ϕp/usa. If the user judges a document
relevant to “pizzas in the UK”, then the weighted set will
be projected onto Sp/uk and reduced to a weighted set com-
posed of only ϕp/uk. This corresponds to process (P1) where
the IN has become more specific. If the user then says that
a document about pizzas in the USA is relevant, the IN will drift
towards the pizza-USA direction (it becomes ϕp/usa), hence
supporting (P2), and the probability that a document about
pizza-UK is relevant will be less than 1 (since the length of
the projection of ϕp/usa onto Sp/uk is less than 1).
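
As an illustration, the resulting probability can be worked out explicitly. The computation below assumes that ϕp, ϕuk and ϕusa are orthonormal and that each superposition is the equal-weight combination of its two components (as stated above for ϕp/usa; the same form is assumed for ϕp/uk).

```latex
\Pr\left(S_{p/uk} \mid \{\varphi_{p/usa} \mapsto 1\}\right)
  = \left| \langle \varphi_{p/uk}, \varphi_{p/usa} \rangle \right|^2
  = \left| \tfrac{1}{2} \langle \varphi_p + \varphi_{uk},\, \varphi_p + \varphi_{usa} \rangle \right|^2
  = \left( \tfrac{1}{2} \right)^2
  = \tfrac{1}{4}
```

Under these assumptions, a document about pizza-UK therefore keeps a non-zero probability of relevance of 1/4 after the drift.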
Event Subspaces.
In practice, to define the subspaces associated with each
interaction, we can imagine representations similar to that of
queries (Section 4.2). For instance, the query reformulation
subspace would be defined by the space spanned by pure IN
aspects corresponding to the query terms used in the refor-
mulation, and a click on a document would be represented
as a subspace defined by the pure IN aspects corresponding
to the terms present in the document snippet.
Negative Feedback.
Even without sophisticated interactions, the QIR frame-
work still allows us to consider the effect of negative rele-
vance feedback, a challenging task in interactive IR. If Sd
is the subspace defining the relevance of a document d that
is considered to be not relevant by the user, then the event
defining the feedback corresponds to the subspace S⊥d or-
thogonal to Sd. This is the dual of the approach defined
in [17], where a topic is defined by a main vector and a
set of vectors corresponding to non relevant documents. In
our approach, we start with an initial set of IN vectors and
discard those lying in the “not relevant subspace”.
6.2 Multi-dimensional Representation of Documents
We now turn to the multi-dimensional representation of
documents. Standard IR models assume that a document
deals mainly with one topic. However, a document can cover
multiple topics [21], and there is yet no adequate represen-
tation of multi-topic documents. Some attempts to inte-
grate topic information have been made in language mod-
els (e.g., [18]), but the number of topics is usually small and
fixed for a collection, and the topic information is used to
smooth a mono-topical language model. In contrast, topics
are fine-grained (pure INs) in our framework and are not
restricted in number – the number of dimensions spanned by
a document subspace is a measure of the number of topics
discussed in the document.
The mono-topical assumption requires document length
normalisation factors such as those in the BM25 model [15]. We
have shown in our experiments that no length normalisation is
needed within our QIR framework. Indeed, when a document covers
more topics, it uses more dimensions in the IN space, but if
it is centred on a single topic (in the extreme case, only one
pure IN), then it should use fewer dimensions. A way to see
this is that if we just replace the text of the document by
several copies of this text, then the subspace of the document
would not change, no matter how many duplicates we create.
This means that we capture document length normalisation:
the number of dimensions does not change, at least in theory,
if the document goes on discussing the same topic.
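
The duplication argument can be checked with a few lines of Python. The sketch assumes that copying the text simply repeats the same fragment vectors (boundary effects between copies are ignored) and measures the dimension of the document subspace with modified Gram-Schmidt, a stand-in for the eigenvalue decomposition used in the experiments. The fragment vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-9):
    """Modified Gram-Schmidt; spans the same subspace as the input vectors."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([x / norm for x in r])
    return basis


s = 1.0 / math.sqrt(2.0)
fragments = [(1.0, 0.0, 0.0), (s, s, 0.0)]   # hypothetical fragment vectors

dim_once = len(orthonormal_basis(fragments))
dim_copies = len(orthonormal_basis(fragments * 5))   # five copies of the text
```

Both dimensions are equal to 2 here: the repeated fragments add no new direction to the subspace.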
6.3 Diversity and Novelty
Novelty tackles the task of estimating how novel a docu-
ment is with respect to previous documents presented to the
user. Diversity refers to the possible IN aspects associated
with a query. With the QIR framework, both novelty and
diversity can be dealt with in a unified manner. Indeed,
we can express the probability that a new document covers
parts of the IN that were not covered by previous ones.
Diversity in Queries.
Diversity in queries is supported because we use weighted
sets of INs to represent a query. Indeed, the more diverse
this weighted set, the more diverse the query. Recent work
using language models [20] has shown that diversity can be
improved by considering the different possible “aspects” of
a query. Whereas [20] rely on a cost-based formalism, our
QIR framework defines the query as a set of possible INs.
Handling Novelty and Diversity.
The framework offers a natural way to compute how relevant
and novel a document d is with respect to a diverse
user’s IN and a set of retrieved documents D. This is done
by computing the probability that a document d is relevant
to the IN given the user has judged all the documents in
D as non relevant. The subspace representing the negation
of the latter event is the union $\bigvee_{d' \in D} S_{d'}$ of the subspaces
of the documents in D, i.e. the subspaces define the region
where answered pure INs do lie. We need to consider the
negation of this event, which translates into the orthogonal
of the union of the subspaces. We can now define the
probability of interest as $\Pr\left( S_d \mid V \triangleright \left( \bigvee_{d' \in D} S_{d'} \right)^{\perp} \right)$. In practice,
this corresponds to negative relevance feedback (the documents
in D are not relevant anymore); thus that strategy
cannot be used with standard IR models as they do not
handle negative relevance feedback well [17].
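
A minimal Python sketch of this novelty score is given below. It assumes that the union of the document subspaces is the span of their basis vectors, that the update projects the weighted set onto the orthogonal complement of that span and renormalises the weights, and that the probability of relevance is the weighted sum of squared projection lengths. The aspect vectors, the toy IN and the function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-9):
    """Modified Gram-Schmidt; spans the same subspace as the input vectors."""
    basis = []
    for v in vectors:
        r = list(v)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:
            basis.append([x / norm for x in r])
    return basis


def novelty_relevance(weighted_set, seen_docs, candidate_basis, tol=1e-12):
    """Probability of the candidate after projecting the IN away from seen documents."""
    seen = orthonormal_basis([x for doc in seen_docs for x in doc])
    raw = []
    for phi, weight in weighted_set:
        resid = list(phi)
        for b in seen:                    # remove the part lying in the seen span
            c = dot(resid, b)
            resid = [x - c * y for x, y in zip(resid, b)]
        sq_len = dot(resid, resid)
        if sq_len > tol:
            unit = [x / math.sqrt(sq_len) for x in resid]
            raw.append((unit, sq_len * weight))
    total = sum(w for _, w in raw)
    if total == 0.0:
        return 0.0                        # the whole IN was already covered
    return sum((w / total) * sum(dot(phi, b) ** 2 for b in candidate_basis)
               for phi, w in raw)


s = 1.0 / math.sqrt(2.0)
PHI_P, PHI_UK, PHI_USA = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
user_in = [((s, s, 0.0), 0.5), ((s, 0.0, s), 0.5)]   # pizza/UK or pizza/USA

seen = [[PHI_UK]]                         # one retrieved document about Cambridge (UK)
novel_usa = novelty_relevance(user_in, seen, [PHI_USA])
repeat_uk = novelty_relevance(user_in, seen, [PHI_UK])
```

In this toy setting a second document about Cambridge (UK) receives a score of 0, while a document about Cambridge (USA) receives 1/3, which illustrates how already covered parts of the IN stop contributing to the score.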
7. CONCLUSION
This paper discusses how an interactive probabilistic frame-
work inspired by quantum theory and using the mathematics
of Hilbert spaces can be used to tackle contemporary chal-
lenges in IR. Documents and information needs are defined
within a so-called IN space, which is extended to aspect
spaces, enabling us to process different (topical) facets of
information needs.
We proposed to define the IN space as a tensor product of
smaller term spaces and described a new query representa-
tion exploiting this new space. A query is then a conjunction
of requirements on these term spaces. We also proposed a
new technique to represent documents and terms, based on
sliding windows. This technique has the advantage that it
does not rely on document markup or sentence detection,
in contrast to previous approaches. As a first step, this
representation of documents and queries is evaluated in an
ad-hoc retrieval scenario, where it is shown that the perfor-
mance is comparable to standard IR methods like BM25,
dramatically improving on the performance observed in
previous experiments.
Our framework is now mature enough to compete with
standard methods in classical IR tasks, but with the prospect
of being able to reach far beyond them. The framework in-
cludes within the document and information need represen-
tation the necessary properties and expressiveness required
to handle in a principled way three main IR challenges,
namely interaction, diversity and novelty. Exploiting and
evaluating further interaction steps, and dealing with nov-
elty and diversity, is part of our future work.
Acknowledgements. This research was supported by an
Engineering and Physical Sciences Research Council grant
(Grant Number EP/F015984/2). Mounia Lalmas is cur-
rently funded by Microsoft Research/Royal Academy of En-
gineering.
8. REFERENCES
[1] J. Allan. Relevance feedback with too much data. In
SIGIR. ACM, 1995.
[2] P. Belhumeur, J. Hespanha, and D. Kriegman.
Eigenfaces vs. Fisherfaces: recognition using class
specific linear projection. IEEE TPAMI, 19(7), 1997.
[3] C. Burgess, K. Livesay, and K. Lund. Explorations in
context space: Words, sentences, discourse. Discourse
Processes, 2-3, 1998.
[4] L. Che, J. Zen, and N. Tokud. A “stereo” document
representation for textual information retrieval.
JASIST, 5, 2006.
[5] C. L. Clarke, M. Kolla, G. V. Cormack,
O. Vechtomova, A. Ashkan, S. Büttcher, and
I. MacKinnon. Novelty and diversity in information
retrieval evaluation. In SIGIR. ACM, 2008.
[6] S. Deerwester, S. Dumais, G. Furnas, and
T. Landauer. Indexing by latent semantic analysis.
JASIST, 41(6), 1990.
[7] M. D. Dunlop. The effect of accessing nonmatching
documents on relevance feedback. ACM TOIS, 15(2),
1997.
[8] P. Ingwersen and K. Järvelin. The Turn: Integration
of Information Seeking and Retrieval in Context (The
Information Retrieval Series). Springer-Verlag, 2005.
[9] M. Melucci. A basis for information retrieval in
context. ACM TOIS, 26(3), 2008.
[10] D. Metzler and W. B. Croft. Combining the language
model and inference network approaches to retrieval.
IPM, 40(5), 2004.
[11] M. A. Nielsen and I. L. Chuang. Quantum
computation and quantum information. Cambridge
University Press, 2000.
[12] B. Piwowarski, I. Frommholz, M. Lalmas, and K. van
Rijsbergen. Exploring a multidimensional
representation of documents and queries. In RIAO,
2010.
[13] B. Piwowarski, I. Frommholz, Y. Moshfeghi,
M. Lalmas, and K. van Rijsbergen. Filtering
documents with subspaces. In ECIR, 2010.
[14] B. Piwowarski and M. Lalmas. A quantum-based
model for interactive information retrieval. In ICTIR,
2009.
[15] S. Robertson and H. Zaragoza. The probabilistic
relevance framework: BM25 and beyond. Foundations
and Trends in Information Retrieval, 3(4), 2009.
[16] C. J. van Rijsbergen. The Geometry of Information
Retrieval. Cambridge University Press, 2004.
[17] X. Wang, H. Fang, and C. Zhai. A study of methods
for negative relevance feedback. In SIGIR. ACM, 2008.
[18] X. Wei and W. B. Croft. LDA-based document models
for ad-hoc retrieval. In SIGIR. ACM, 2006.
[19] H. I. Xie. Shifts of interactive intentions and
information-seeking strategies in interactive
information retrieval. JASIS, 51(9), 2000.
[20] X. Yin, X. Huang, and Z. Li. Promoting Ranking
Diversity for Biomedical Information Retrieval Using
Wikipedia. In ECIR. Springer, 2010.
[21] C. X. Zhai, W. W. Cohen, and J. Lafferty. Beyond
independent relevance: methods and evaluation
metrics for subtopic retrieval. In SIGIR. ACM, 2003.
[22] G. Zuccon and L. Azzopardi. Using the Quantum
Probability Ranking Principle to Rank Interdependent
Documents. In ECIR. Springer, 2010.
[23] G. Zuccon, L. Azzopardi, and C. J. van Rijsbergen.
Semantic spaces: Measuring the distance between
different subspaces. In QI, 2009.
