Topic modeling for mediated access to very large document collections

  • Muresan G
  • Harper D

Abstract

Clear and precise queries are a necessity when searching very large document collections, especially when query-based retrieval is the only means of exploration. We propose system-mediated information access as a solution for users' well-documented inability to formulate good queries. Our approach is based on two main assumptions: first, on the ability of document clustering to reveal the topical, semantic structure of a problem domain represented by a specialized "source collection," and, second, on the capacity of statistical language models to convey content. Taking the role of the human mediator or intermediary searcher, a mediation system interacts with the user and supports her exploration of a relatively small source collection, chosen to be representative of the problem domain. Based on the user's selection of relevant "exemplary" documents and clusters from this source collection, the system builds a language model of her information need. This model is subsequently used to derive "mediated queries," which are expected to convey the user's information need precisely and comprehensively, and which the user can submit to search any large and heterogeneous "target collections." We present results of experiments that simulated various mediation strategies and compared the effect on mediation effectiveness of a variety of parameters, such as the similarity measure, the weighting scheme, and the clustering method. The results provide both upper bounds on the performance that real end users can potentially reach and a comparison of the effectiveness of these strategies. The experimental evidence suggests that information retrieval mediated through a clustered specialized collection has the potential to improve effectiveness significantly.
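The step from exemplary documents to a mediated query can be illustrated with a small sketch. The abstract does not say which language model or term-selection rule the authors used, so everything here is an assumption made for illustration: a maximum-likelihood unigram model, whitespace tokenization, and a term score of p(t|need) * log(p(t|need) / p(t|source)). The names `unigram_model` and `mediated_query` are hypothetical, and the exemplary documents are assumed to be drawn from the source collection.

```python
import math
from collections import Counter


def unigram_model(docs):
    """Maximum-likelihood unigram model over whitespace-tokenized documents."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def mediated_query(exemplary_docs, source_collection, k=5):
    """Return the k terms that best separate the need model from the source model.

    Assumption (not from the paper): each term is scored by its contribution
    to the KL divergence between the two models. Exemplary documents must be
    part of the source collection so that every term has a background
    probability.
    """
    need = unigram_model(exemplary_docs)
    background = unigram_model(source_collection)
    scores = {t: p * math.log(p / background[t]) for t, p in need.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy source collection; the user marks the second and third documents as exemplary.
source = [
    "cluster analysis of documents",
    "language model retrieval",
    "query language model",
    "football match results",
]
exemplary = source[1:3]
query_terms = mediated_query(exemplary, source, k=2)
```

The resulting term list could then be submitted as a query to a separate target collection. A real system would likely add smoothing and stop-word handling, both of which are omitted here.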


Authors

  • Gheorghe Muresan

  • David J. Harper
