Estimating keyphrases popularity in sampling collections

Abstract

The structured representation of data has high practical value and is becoming more relevant as data volumes grow. Representations such as topic graphs and concept trees are convenient ways to present information retrieved from a collection of documents. In this paper, we study some aspects of using a sample of a collection to evaluate the popularity of concepts. Popularity can be used to visualize concept significance and to rank concepts in structured-representation tasks. We treat multi-word phrases as concepts and address the case in which these phrases are extracted automatically from the processed document collection. The popularity of a concept (which can be shown visually, e.g., as the size of a vertex in the topic graph) is measured by the number of documents containing the phrase. We examine the case in which a sample from the document collection is used to estimate concept popularity. For this case, we estimate how permissible such a representation is, that is, how well it reflects the proportions among the numbers of documents containing specific concepts. We describe a frequency-based criterion and the procedure for calculating it. The criterion helps to assess whether it is appropriate to represent the popularity of a concept relative to the popularity of other concepts. The main goals are to establish the criteria under which the relations between concept popularity values in a sample are the same as in the population, and to establish a criterion for selecting n high-frequency concepts that have the same rank and frequency distributions in the sample as in the population.
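
To make the setting concrete, the following minimal Python sketch counts how many documents contain each phrase and checks whether the top-n ranking in a random sample matches the ranking in the full collection. It is an illustration only, not the frequency-based criterion proposed in the paper: the function names (`document_frequency`, `top_n`, `sample_preserves_top_n`), the lowercase substring matching, and the alphabetical tie-breaking are all assumptions made for this example.

```python
import random
from collections import Counter


def document_frequency(docs, phrases):
    """Count, for each phrase, how many documents contain it.

    Assumes phrases are given in lowercase; matching is a plain
    substring test, which is a simplification.
    """
    counts = Counter({p: 0 for p in phrases})
    for doc in docs:
        text = doc.lower()
        for p in phrases:
            if p in text:
                counts[p] += 1
    return counts


def top_n(counts, n):
    """Return the n most frequent phrases; ties are broken alphabetically."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:n]]


def sample_preserves_top_n(docs, phrases, sample_size, n, seed=0):
    """Check whether a random sample yields the same top-n ranking
    as the full collection (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    sample = rng.sample(docs, sample_size)
    population_rank = top_n(document_frequency(docs, phrases), n)
    sample_rank = top_n(document_frequency(sample, phrases), n)
    return population_rank == sample_rank
```

Repeating `sample_preserves_top_n` over many seeds and sample sizes would give a rough empirical picture of how often a sample keeps the population ranking, which is the kind of question the paper's criterion is meant to answer analytically.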

Cite

Popova, S., Skitalinskaya, G., & Khodyrev, I. (2015). Estimating keyphrases popularity in sampling collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9416, pp. 481–491). Springer Verlag. https://doi.org/10.1007/978-3-319-26138-6_52