This article tackles the task of retrieving very short documents with even shorter queries. The problem arises in the retrieval of tweets, image and table captions, short text messages (SMS), and sponsored retrieval, among others. In such cases, document and/or query expansion using thesauri and other external resources (e.g., Wikipedia), usually available on the World Wide Web (WWW), has proven to be an effective approach. However, the focus of this paper is on documents written in lesser-known languages, for which the WWW is of limited use. Our experiments are based on two main corpora extracted from historical manuscripts written in Latin and Middle High German. We found that, when very short documents of similar length are retrieved with short queries and no external enrichment resources are available, the classical tf-idf model performs as well as more complex models, and sometimes better.
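To make the classical tf-idf baseline concrete, the following is a minimal sketch of tf-idf ranking with cosine similarity over a handful of very short passages. It is not the authors' implementation: the whitespace tokenizer, the `log(1 + N/df)` idf variant, the function names, and the three toy Latin-like passages are all assumptions made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; no stemming or stop-word removal.
    return text.lower().split()


def weigh(tokens, idf):
    # Raw term frequency times idf; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {t: count * idf[t] for t, count in tf.items() if t in idf}


def build_index(docs):
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    # Assumption: one of many idf variants, chosen so weights stay positive.
    idf = {t: math.log(1 + n / d) for t, d in df.items()}
    vectors = [weigh(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors):
    # Returns (score, doc_index) pairs, best first; ties broken by index.
    q = weigh(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical toy passages, invented for this sketch (not from the paper's corpora).
docs = [
    "rex dedit terram ecclesie",
    "abbas dedit vineam monasterio",
    "rex confirmavit privilegium",
]
idf, vectors = build_index(docs)
results = search("rex terram", idf, vectors)
for score, i in results:
    print(f"{score:.3f}  {docs[i]}")
```

In this toy setting the passage that shares both query terms ranks first, the passage sharing one term ranks second, and the passage sharing none scores zero. With documents this short and of similar length, term frequency is almost always 1, so the ranking is driven mainly by idf and term overlap.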
CITATION STYLE
Naji, N., & Savoy, J. (2013). Back to our roots for retrieving very short passages. In Proceedings of the ASIST Annual Meeting (Vol. 50). John Wiley and Sons Inc. https://doi.org/10.1002/meet.14505001035