
Similarity Measures for Short Segments of Text

by Donald Metzler, Susan Dumais, Christopher Meek
Proceedings of the 29th European Conference on IR Research (ECIR 2007)


Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, a growing number of tasks require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
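
The sparseness problem the abstract describes can be illustrated with a minimal sketch. It assumes a plain Jaccard overlap on whitespace tokens as a stand-in for a "purely lexical" measure; this is not necessarily one of the measures evaluated in the paper, and the function name and example queries are hypothetical, chosen only for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the lowercase whitespace-token sets of a and b.

    Illustrative stand-in for a purely lexical measure; not taken from the paper.
    """
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # define the overlap of two empty texts as zero
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical query pairs (not from the paper's search log).
# Semantically related, but no shared terms: the lexical score is zero.
related = jaccard("cheap flights", "low cost airfare")

# Different intents, but one shared term: the lexical score is non-zero.
surface = jaccard("apple pie recipe", "apple store hours")

print(related, surface)
```

With only two or three terms per segment, a single shared or missing word dominates the score, which is one way to see why short segments need richer representations than raw term overlap.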

