Towards a “Universal Dictionary” for Multi-Language Information Retrieval Applications

  • Schultz J
  • Liberman M

Abstract

Multilingual information retrieval tasks such as Topic Tracking have yielded high-quality results simply using word-by-word translation approaches. However, the construction of translation dictionaries for new languages is expensive and time-consuming. We show that an appropriate metric for term selection in a monolingual English corpus allows us to define a fairly small list, containing about 10,000 inflected forms or about 7,500 lemmas, which works essentially as well (for a particular monolingual document classification evaluation) as an unlimited vocabulary of more than 300,000 word forms does. We suggest that such a list can be taken to form the English axis of a sort of "universal dictionary" for document classification tasks, providing a much more efficient path to the addition of new languages.
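
To make the idea of term selection concrete, here is a minimal Python sketch of choosing a small vocabulary from a monolingual corpus and restricting documents to it. The abstract does not name the metric the authors used, so the score below (corpus term frequency times log inverse document frequency) is only a stand-in assumption, and the function names select_terms and restrict are hypothetical.

```python
import math
from collections import Counter


def select_terms(docs, k):
    """Return the k highest-scoring terms in a list of document strings.

    The score is an ASSUMED stand-in metric (corpus term frequency times
    log inverse document frequency), not the metric from the paper.
    """
    n_docs = len(docs)
    tf = Counter()  # total occurrences of each term across the corpus
    df = Counter()  # number of documents containing each term
    for doc in docs:
        tokens = doc.lower().split()
        tf.update(tokens)
        df.update(set(tokens))
    scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def restrict(doc, vocabulary):
    """Keep only the tokens of doc that appear in the selected vocabulary."""
    vocab = set(vocabulary)
    return [t for t in doc.lower().split() if t in vocab]


docs = [
    "the election results",
    "the election campaign",
    "the storm damage",
    "the storm warning",
]
vocabulary = select_terms(docs, 6)
reduced = restrict("The storm delayed the election", vocabulary)
```

Under this stand-in score, a term that occurs in every document (here "the") scores zero and drops out of the list, so a classifier run on the reduced documents sees only the more discriminative terms. In the setting the abstract describes, only the selected list would need translations in each new language.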

Cite

Schultz, J. M., & Liberman, M. Y. (2002). Towards a “Universal Dictionary” for Multi-Language Information Retrieval Applications (pp. 225–241). https://doi.org/10.1007/978-1-4615-0933-2_11
