TopX: Efficient and versatile top-k query processing for semistructured data

Martin Theobald; Holger Bast; Debapriyo Majumdar; Ralf Schenkel; Gerhard Weikum

Journal Article (Open Access)

VLDB Journal (2008) 17(1) 81-115

DOI: 10.1007/s00778-007-0072-z

Abstract

Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval engine for text and semistructured data. It terminates query execution as soon as it can safely determine the k top-ranked result elements according to a monotonic score aggregation function with respect to a multidimensional query. It efficiently supports vague search on both content- and structure-oriented query conditions for dynamic query relaxation with controllable influence on the result ranking. The contributions of this paper unfold into four main points: (1) fully implemented models and algorithms for ranked XML retrieval with XPath Full-Text functionality, (2) efficient and effective top-k query processing for semistructured data, (3) support for integrating thesauri and ontologies with statistically quantified relationships among concepts, leveraged for word-sense disambiguation and query expansion, and (4) a comprehensive description of the TopX system, with performance experiments on large-scale corpora like TREC Terabyte and INEX Wikipedia.
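
The early-termination idea in the abstract can be illustrated with a small sketch. This is not the TopX algorithm itself; it is a textbook threshold-style top-k scan (in the spirit of Fagin's threshold algorithm), assuming one score-sorted index list per query condition, summation as the monotonic aggregation function, and cheap random access to every list. The function name `threshold_topk`, the document ids, and the integer scores are hypothetical and chosen only for illustration.

```python
import heapq


def threshold_topk(lists, k):
    """Illustrative threshold-style top-k scan (an assumption, not TopX's code).

    lists: one list per query condition, each holding (doc_id, score)
           pairs sorted by score in descending order.
    Returns (results, depth): the top-k (doc_id, total) pairs and the
    number of sorted-access rounds performed before stopping.
    """
    # Random-access tables: score of each document in each list.
    lookup = [dict(lst) for lst in lists]
    seen = set()
    top = []  # min-heap of (total_score, doc_id), at most k entries
    depth = 0
    max_depth = max((len(lst) for lst in lists), default=0)

    while depth < max_depth:
        # Upper bound on the total score of any document not yet seen:
        # the sum of the scores at the current scan positions.
        threshold = 0
        for lst in lists:
            if depth >= len(lst):
                continue  # an exhausted list contributes nothing
            doc_id, score = lst[depth]
            threshold += score
            if doc_id in seen:
                continue
            seen.add(doc_id)
            total = sum(table.get(doc_id, 0) for table in lookup)
            if len(top) < k:
                heapq.heappush(top, (total, doc_id))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, doc_id))
        depth += 1
        # Safe to stop: no unseen document can beat the current k-th score.
        if len(top) == k and top[0][0] >= threshold:
            break

    results = sorted(((d, s) for s, d in top), key=lambda p: (-p[1], p[0]))
    return results, depth


# Hypothetical index lists for two query conditions.
content_list = [("d1", 9), ("d2", 8), ("d3", 1), ("d4", 0)]
structure_list = [("d2", 7), ("d1", 6), ("d4", 2), ("d3", 1)]
results, depth = threshold_topk([content_list, structure_list], k=2)
```

With these inputs the scan stops after two of the four rows, because monotonicity guarantees that no unseen document can score above the sum of the scores at the current scan positions.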

Citation (APA)

Theobald, M., Bast, H., Majumdar, D., Schenkel, R., & Weikum, G. (2008). TopX: Efficient and versatile top-k query processing for semistructured data. VLDB Journal, 17(1), 81–115. https://doi.org/10.1007/s00778-007-0072-z
