TopX: Efficient and versatile top-k query processing for semistructured data

68Citations
Citations of this article
32Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Recent IR extensions to XML query languages such as Xpath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistruetured data. TopX is a top-k retrieval engine for text and semistruetured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The main contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.

Cite

CITATION STYLE

APA

Theobald, M., Bast, H., Majumdar, D., Schenkel, R., & Weikum, G. (2008). TopX: Efficient and versatile top-k query processing for semistructured data. VLDB Journal, 17(1), 81–115. https://doi.org/10.1007/s00778-007-0072-z

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free