We use the hypergeometric distribution to extract relevant information from documents. The hypergeometric distribution gives the probability of observing a given term frequency with respect to a prior. The lower this probability, the more information the term carries. Given a subset of documents, the information items are weighted by a function inversely related to the hypergeometric probability. We provide an introductory example of topic-driven information extraction from a document collection based on the hypergeometric distribution.

© Springer-Verlag Berlin Heidelberg 2006.
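The weighting idea can be illustrated with a small sketch. Several assumptions are not taken from the abstract. Counts are at the token level: the collection has N tokens, K of which are the term, and the topic subset has n tokens, k of which are the term. The probability is the hypergeometric point probability P(X = k), not a tail probability. The "inversely related function" is taken to be -log2 P, the Shannon information content in bits. The function names and the toy counts are hypothetical. This is not presented as the paper's exact method.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma
    so that large collections do not overflow or underflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_log_pmf(k: int, N: int, K: int, n: int) -> float:
    """Log-probability of seeing k occurrences of a term in a sample of n tokens
    drawn without replacement from N tokens, K of which are that term."""
    if not max(0, n - (N - K)) <= k <= min(K, n):
        return float("-inf")  # outcome impossible under the prior
    return log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)


def information_bits(k: int, N: int, K: int, n: int) -> float:
    """Assumed weighting: -log2 P(X = k). The less probable the observed
    frequency, the more bits of information the term carries."""
    return -hypergeom_log_pmf(k, N, K, n) / math.log(2)


# Hypothetical toy numbers: a 10,000-token collection and a 500-token topic subset.
N, n = 10_000, 500
# term -> (occurrences in the whole collection K, occurrences in the subset k)
terms = {"the": (600, 30), "model": (100, 15), "entropy": (20, 12)}

ranked = sorted(
    terms,
    key=lambda t: information_bits(terms[t][1], N, terms[t][0], n),
    reverse=True,
)
for t in ranked:
    K, k = terms[t]
    print(f"{t:8s} expected={K * n / N:5.1f} observed={k:3d} "
          f"info={information_bits(k, N, K, n):6.2f} bits")
```

In this toy data, "the" occurs about as often as the prior predicts and gets a low weight. "entropy" is far more frequent in the subset than its collection frequency suggests and gets the highest weight. Because the sketch uses the point probability, a term that is much *rarer* than expected would also score high. A one-sided tail probability P(X ≥ k) is one possible alternative if only over-represented terms should count.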
Amati, G. (2006). Information theoretic approach to information extraction. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4027 LNAI, pp. 519–529). Springer Verlag. https://doi.org/10.1007/11766254_44