Selective compound splitting of Swedish queries for boolean combinations of truncated terms

Rickard Cöster; Magnus Sahlgren; Jussi Karlgren

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3237 337-344

DOI: 10.1007/978-3-540-30222-3_32

Abstract

In languages such as Swedish that use compound words, it is often necessary to split compounds when indexing documents or queries. One of the problems is that it is difficult to find constituents that express a concept similar to that expressed by the compound. The approach taken here is to expand a query with the leading constituents of its compound words. Every query term is truncated so as to increase recall, in the hope of finding other compounds that have the leading constituent as a prefix. This approach increases recall in a rather uncontrolled way, so we use a Boolean quorum-level search method to rank documents both according to a tf-idf factor and according to the number of matching Boolean combinations. The Boolean combinations performed relatively well, considering that the queries were very short (a maximum of five search terms). Also included in this paper are the results of two other methods we are currently working on in our lab: one for re-ranking search results on the basis of stylistic analysis of documents, and one for dimensionality reduction using Random Indexing. © Springer-Verlag 2004.
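
The abstract does not specify the splitter, the lexicon, or the exact ranking formula, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical toy lexicon (LEXICON) and a two-part, longest-prefix splitter that allows the Swedish linking "s". Each query term and its leading constituent form one group of truncated (prefix-matched) terms. Documents are ranked first by quorum level, taken here as the number of query-term groups with at least one match, and then by a simple tf-idf sum. The function names leading_constituent, expand_query and quorum_rank are illustrative only.

```python
import math

# Hypothetical toy lexicon of Swedish constituents (assumption; a real
# system would use a full dictionary and a more careful splitter).
LEXICON = {"järn", "järnväg", "väg", "station", "sjuk", "sjukhus", "hus", "bygge"}


def leading_constituent(word, lexicon=LEXICON):
    """Return the leading constituent of a two-part compound, or None.

    Tries prefixes from longest to shortest and accepts a split when the
    remainder is in the lexicon, optionally after dropping the linking
    morpheme 's' (e.g. järnväg + s + station).
    """
    for i in range(len(word) - 1, 1, -1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        if rest in lexicon or (rest.startswith("s") and rest[1:] in lexicon):
            return head
    return None


def expand_query(terms):
    """Turn each query term into a group of truncation prefixes:
    the term itself plus its leading constituent, if one is found."""
    groups = []
    for term in terms:
        group = [term]
        head = leading_constituent(term)
        if head is not None:
            group.append(head)
        groups.append(group)
    return groups


def prefix_tf(tokens, prefix):
    # Frequency of a truncated term: number of tokens starting with the prefix.
    return sum(1 for tok in tokens if tok.startswith(prefix))


def quorum_rank(docs, terms):
    """Rank documents by quorum level first and tf-idf second.

    docs maps a document id to its list of lowercased tokens. Documents
    matching no group are dropped.
    """
    groups = expand_query(terms)
    n_docs = len(docs)
    prefixes = {p for g in groups for p in g}
    df = {p: sum(1 for toks in docs.values() if prefix_tf(toks, p) > 0) for p in prefixes}
    idf = {p: math.log(n_docs / df[p]) if df[p] else 0.0 for p in prefixes}

    ranked = []
    for doc_id, toks in docs.items():
        tf = {p: prefix_tf(toks, p) for p in prefixes}
        # Quorum level: how many query-term groups this document satisfies.
        level = sum(1 for g in groups if any(tf[p] > 0 for p in g))
        score = sum(tf[p] * idf[p] for p in prefixes)
        if level > 0:
            ranked.append((level, score, doc_id))
    # Higher quorum level first; within a level, higher tf-idf first.
    ranked.sort(key=lambda x: (-x[0], -x[1], x[2]))
    return [(doc_id, level, round(score, 3)) for level, score, doc_id in ranked]


if __name__ == "__main__":
    docs = {
        "d1": ["ny", "järnvägsstation", "vid", "sjukhus"],
        "d2": ["järnvägen", "byggs", "ut"],
        "d3": ["sjukhusbygge", "i", "stan"],
        "d4": ["väder", "och", "vind"],
    }
    print(quorum_rank(docs, ["järnvägsstation", "sjukhusbygge"]))
    # -> [('d1', 2, 2.773), ('d3', 1, 2.079), ('d2', 1, 0.693)]
```

Here d1 ranks first because it satisfies both query-term groups, while d3 and d2 satisfy one group each and are ordered by tf-idf; d2 is found only through the truncated leading constituent "järnväg". Summing over overlapping prefixes credits a token twice when it matches both a full term and its leading constituent; taking the maximum per group would be an equally plausible choice.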

Citation (APA)

Cöster, R., Sahlgren, M., & Karlgren, J. (2004). Selective compound splitting of Swedish queries for boolean combinations of truncated terms. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3237, 337–344. https://doi.org/10.1007/978-3-540-30222-3_32