Back to our roots for retrieving very short passages

Nada Naji; Jacques Savoy

Conference Proceedings (Open Access)

Proceedings of the ASIST Annual Meeting (2013) 50(1)

DOI: 10.1002/meet.14505001035

Abstract

This article tackles the task of retrieving very short documents via even shorter queries. The problem at hand relates to the retrieval of tweets, image and table captions, short text messages (SMS), and sponsored retrieval, among others. In such cases, document and/or query expansion using thesauri and other external resources (e.g., Wikipedia), usually available on the World Wide Web (WWW), has proven to be an effective approach. However, the focus of this paper is on documents written in lesser-known languages, for which the WWW is of limited use. Our experiments are based on two main corpora extracted from historical manuscripts written in Latin and Middle High German. We found that, when retrieving very short documents of quite similar lengths via short queries with no external enrichment resources available, the classical tf-idf model performs as satisfactorily as the more complex models, and sometimes better.
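
For readers unfamiliar with the classical tf-idf model named in the abstract, the following is a minimal sketch of tf-idf ranking with cosine similarity. It is not the authors' implementation: the whitespace tokenizer, the raw-count tf, the `log(N/df)` idf variant, the function names, and the three toy Latin passages are all assumptions chosen for illustration, and the abstract does not say which weighting variant the paper uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; no stemming or stopword list.
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and an idf weight for each term."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each term counted once per document
    idf = {t: math.log(n_docs / df) for t, df in doc_freq.items()}
    return counts, idf


def tfidf_vector(term_counts, idf):
    # Terms unseen in the collection have no idf and are dropped.
    return {t: tf * idf[t] for t, tf in term_counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return indices of documents with a positive score, best first."""
    counts, idf = build_index(docs)
    q_vec = tfidf_vector(Counter(tokenize(query)), idf)
    scored = [(cosine(q_vec, tfidf_vector(c, idf)), i) for i, c in enumerate(counts)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored if score > 0.0]


# Toy passages (illustrative only, not taken from the paper's corpora).
passages = [
    "in principio erat verbum",
    "et verbum erat apud deum",
    "fiat lux et facta est lux",
]
print(rank("verbum deum", passages))
```

In this sketch, a passage matching the rarer query term ranks above one matching only the more common term, which is the behaviour the idf component is meant to provide when documents and queries are both very short.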

Citation (APA)

Naji, N., & Savoy, J. (2013). Back to our roots for retrieving very short passages. In Proceedings of the ASIST Annual Meeting (Vol. 50). John Wiley and Sons Inc. https://doi.org/10.1002/meet.14505001035
