Towards a “Universal Dictionary” for Multi-Language Information Retrieval Applications

J. Michael Schultz; Mark Y. Liberman

Book Chapter

DOI: 10.1007/978-1-4615-0933-2_11

Abstract

Multilingual information retrieval tasks such as Topic Tracking have yielded high-quality results simply by using word-by-word translation. However, constructing translation dictionaries for new languages is expensive and time-consuming. We show that an appropriate term-selection metric, applied to a monolingual English corpus, allows us to define a fairly small list, containing about 10,000 inflected forms or about 7,500 lemmas, that works essentially as well (for a particular monolingual document classification evaluation) as an unlimited vocabulary of more than 300,000 word forms. We suggest that such a list can form the English axis of a sort of "universal dictionary" for document classification tasks, providing a much more efficient path to adding new languages.
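
The abstract does not name the term-selection metric, so the following is only a minimal sketch of the general idea: score every term in a monolingual corpus, keep the top-scoring terms as a restricted vocabulary, and discard everything else before classification. The scoring function (corpus frequency weighted by inverse document frequency), the function names `select_vocabulary` and `restrict`, and the toy documents are all assumptions made for illustration, not the chapter's method or data.

```python
import math
from collections import Counter


def select_vocabulary(docs, k):
    """Return the k highest-scoring terms under a stand-in metric.

    docs is a list of token lists. The metric is an assumption chosen
    for illustration; the chapter's actual metric is not given in the
    abstract.
    """
    n_docs = len(docs)
    term_freq = Counter()  # total occurrences of each term in the corpus
    doc_freq = Counter()   # number of documents containing each term
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    # Stand-in score: corpus frequency times inverse document frequency.
    # A term that occurs in every document scores exactly zero.
    scores = {
        term: term_freq[term] * math.log(n_docs / doc_freq[term])
        for term in term_freq
    }
    # Sort by descending score, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return set(ranked[:k])


def restrict(tokens, vocabulary):
    """Drop every token that is not in the selected vocabulary."""
    return [token for token in tokens if token in vocabulary]


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the election results were announced in the capital".split(),
    "the storm damaged the coastal town".split(),
    "the election campaign moved to the coastal region".split(),
]
vocab = select_vocabulary(docs, k=2)
reduced = [restrict(tokens, vocab) for tokens in docs]
```

In a setting like the one the abstract describes, `k` would be on the order of ten thousand word forms, and only the selected terms would need translations when a new language is added.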

Cite

Schultz, J. M., & Liberman, M. Y. (2002). Towards a “Universal Dictionary” for Multi-Language Information Retrieval Applications (pp. 225–241). https://doi.org/10.1007/978-1-4615-0933-2_11
