When dictionaries for specific applications or subject fields are derived from a text collection, the frequency distribution of the terms in the collection gives information about the expected completeness of the dictionary. If only a subset of the terms in the collection is to be included in the dictionary, the completeness of the dictionary can be optimized with respect to dictionary size. In this paper, formulas for the relationship between the frequency distribution of the terms in the collection and expected dictionary completeness are derived. First we regard one-dimensional dictionaries, where the (non-trivial) terms occurring in the texts are to be included in the dictionary. Then we describe the case of two-dimensional dictionaries, which are needed, for example, for automatic indexing with a controlled vocabulary; here relationships between text terms and descriptors from the prescribed vocabulary have to be stored in the dictionary. For both cases, formulas for the interpolation and extrapolation with respect to different collection sizes are derived. We give experimental results for one-dimensional dictionaries and show how the completeness can be estimated and optimized.
CITATION STYLE
Hüther, H. (1989). On the interrelationship of dictionary size and completeness. In Proceedings of the 13th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1990 (pp. 313–325). Association for Computing Machinery, Inc. https://doi.org/10.1145/96749.98234