Comparing representations in Chinese information retrieval

Abstract

Three representation methods are empirically investigated for Chinese information retrieval: 1-gram (single character), bigram (two contiguous overlapping characters), and short-word indexing based on a simple segmentation of the text. The retrieval collection is the approximately 170 MB TREC-5 Chinese corpus of news articles, searched with 28 long, richly worded queries. Evaluation shows that 1-gram indexing is good but not sufficiently competitive, while bigram indexing works surprisingly well. Bigram indexing leads to a large index term space, three times that of short-word indexing, but matches short-word indexing in precision and is about 5% better in relevant documents retrieved. The best average non-interpolated precision is about 0.45, 17% better than 1-gram indexing and quite high for a mainly statistical approach. Copyright 1997 ACM.
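
As an illustration of the first two representations named in the abstract, the following minimal Python sketch extracts single-character (1-gram) and overlapping two-character (bigram) index terms from a string. The function name `char_ngrams`, the sample string, and the choice to drop whitespace are assumptions made for this example; they are not taken from the paper, which may handle punctuation and document boundaries differently. Short-word indexing is not shown, since the abstract does not describe its segmentation procedure.

```python
def char_ngrams(text, n):
    """Return overlapping character n-grams of `text` as a list of strings.

    Assumption for this sketch: whitespace is dropped before n-grams are
    formed; punctuation is left in place.
    """
    chars = [c for c in text if not c.isspace()]
    # Slide a window of width n over the characters, one position at a time.
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Hypothetical sample string ("information retrieval"), not from the corpus.
sample = "信息检索"

unigrams = char_ngrams(sample, 1)  # ['信', '息', '检', '索']
bigrams = char_ngrams(sample, 2)   # ['信息', '息检', '检索']

print(unigrams)
print(bigrams)
```

A string of k characters yields k 1-grams but only k - 1 bigrams, while the number of distinct bigrams across a corpus is far larger than the number of distinct characters, which is consistent with the large index term space the abstract reports for bigram indexing.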

Cite

Kwok, K. L. (1997). Comparing representations in Chinese information retrieval. SIGIR Forum (ACM Special Interest Group on Information Retrieval), 31(1 SPEC. ISS.), 34–39. https://doi.org/10.1145/278459.258531
