On the interrelationship of dictionary size and completeness


Abstract

When dictionaries for specific applications or subject fields are derived from a text collection, the frequency distribution of the terms in the collection gives information about the expected completeness of the dictionary. If only a subset of the terms in the collection is to be included in the dictionary, the completeness of the dictionary can be optimized with respect to dictionary size. In this paper, formulas for the relationship between the frequency distribution of the terms in the collection and expected dictionary completeness are derived. First we regard one-dimensional dictionaries where the (non-trivial) terms occurring in the texts are to be included in the dictionary. Then we describe the case of two-dimensional dictionaries, which are needed for example for automatic indexing with a controlled vocabulary; here relationships between text terms and descriptors from the prescribed vocabulary have to be stored in the dictionary. For both cases, formulas for the interpolation and extrapolation with respect to different collection sizes are derived. We give experimental results for one-dimensional dictionaries and show how the completeness can be estimated and optimized.
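The abstract does not reproduce the paper's formulas, so the following is only a rough sketch of the general idea, not the author's method. It assumes the unsmoothed Good-Turing estimate as a stand-in: the probability mass of all terms seen exactly r times in a collection of n tokens is approximated by (r + 1) · n_{r+1} / n, where n_r is the number of distinct terms seen r times. The function name `size_and_completeness` and the cutoff parameter `min_freq` are hypothetical and chosen for illustration.

```python
from collections import Counter


def size_and_completeness(tokens, min_freq=1):
    """Return (dictionary size, estimated completeness) for a one-dimensional
    dictionary that keeps every term seen at least `min_freq` times.

    Completeness here means the estimated share of future token occurrences
    whose term is in the dictionary. The estimate is the unsmoothed
    Good-Turing one (an assumption of this sketch, not the paper's formula).
    """
    n = len(tokens)
    if n == 0:
        return 0, 0.0

    term_counts = Counter(tokens)
    # n_r: how many distinct terms occur exactly r times in the collection.
    freq_of_freq = Counter(term_counts.values())

    # Terms left out of the dictionary are those with count r < min_freq,
    # including unseen terms (r = 0). Good-Turing assigns the class of terms
    # seen r times a total probability mass of (r + 1) * n_{r+1} / n.
    excluded_mass = sum(
        (r + 1) * freq_of_freq.get(r + 1, 0) for r in range(min_freq)
    ) / n

    size = sum(1 for c in term_counts.values() if c >= min_freq)
    return size, 1.0 - excluded_mass


if __name__ == "__main__":
    sample = "a a a b b c".split()
    for cutoff in (1, 2):
        print(cutoff, size_and_completeness(sample, min_freq=cutoff))
```

Raising `min_freq` shrinks the dictionary and lowers the estimated completeness, which is the size-versus-completeness trade-off the abstract refers to. On small or sparse collections the unsmoothed estimate is noisy, so the numbers should be read as indicative only.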


Cite

Hüther, H. (1989). On the interrelationship of dictionary size and completeness. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1990 (pp. 313–325). Association for Computing Machinery, Inc. https://doi.org/10.1145/96749.98234
