Journal article

Similarity Measures for Short Segments of Text

Metzler, Donald; Dumais, Susan; Meek, Christopher

Proceedings of the 29th European Conference on IR Research (2007), pp. 16-27



Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
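
The sparseness problem described in the abstract can be illustrated with a minimal sketch of a purely lexical measure. The Jaccard coefficient is used here only as a generic example of term overlap and is not necessarily one of the measures evaluated in the paper. The tokenizer is deliberately naive, and the example queries are hypothetical rather than taken from the search log used in the study.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard coefficient between the term sets of two text segments."""
    terms_a, terms_b = tokenize(a), tokenize(b)
    if not terms_a and not terms_b:
        return 0.0  # convention chosen here for two empty segments
    return len(terms_a & terms_b) / len(terms_a | terms_b)


# Hypothetical queries: related in meaning, but with no terms in common,
# so a purely lexical measure scores them as completely dissimilar.
print(jaccard("ny hotels", "new york accommodation"))  # 0.0

# Queries that happen to share surface terms score well (2 of 3 distinct terms).
print(jaccard("cheap flights", "cheap airline flights"))
```

With only two or three terms per segment, a single vocabulary mismatch drives the score to zero, which is why short-text tasks tend to need richer representations than raw term overlap.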

Author-supplied keywords

  • image retrieval
  • query-image caption similarity
  • similarity
  • standard text similarity measures
