Automatic extraction of document topics

Abstract

A keyword or topic of a document is a word or multi-word (a sequence of two or more words) that by itself summarizes part of that document's content. In this paper we compare several statistics-based, language-independent methodologies for automatically extracting keywords. We rank words, multi-words, and fixed-length word prefixes (5 characters) using several similarity measures, some widely known and some newly coined, and we evaluate both the results obtained and the agreement between evaluators. The methods were tested on Portuguese, English and Czech. © 2011 IFIP International Federation for Information Processing.
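The abstract does not name the measures it compares, so the following is only a minimal sketch of the general idea: reduce each word to a fixed 5-character prefix and rank the resulting terms per document with a statistical score. Plain tf-idf is used here as an assumed stand-in; it is not claimed to be one of the paper's measures. The names `tokenize`, `prefixes`, `tfidf_rank` and `PREFIX_LEN` are hypothetical.

```python
import math
import re
from collections import Counter

PREFIX_LEN = 5  # fixed prefix length mentioned in the abstract


def tokenize(text):
    # Lowercase and keep runs of Unicode letters, so accented
    # Portuguese or Czech words are not split apart.
    return re.findall(r"[^\W\d_]+", text.lower())


def prefixes(tokens, n=PREFIX_LEN):
    # Truncate each word to its first n characters; shorter words stay whole.
    return [t[:n] for t in tokens]


def tfidf_rank(docs, top_k=5):
    """Rank prefix terms in each document by tf-idf.

    tf-idf is an assumed stand-in measure, not the paper's method.
    Returns one list of (term, score) pairs per document, best first.
    """
    doc_terms = [prefixes(tokenize(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each prefix occurs.
    df = Counter()
    for terms in doc_terms:
        df.update(set(terms))

    rankings = []
    for terms in doc_terms:
        tf = Counter(terms)
        total = len(terms)
        scores = {
            t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        rankings.append(ranked[:top_k])
    return rankings


if __name__ == "__main__":
    docs = [
        "Keyword extraction extracts keywords from documents.",
        "Documents about cooking recipes and cooking times.",
    ]
    for i, ranked in enumerate(tfidf_rank(docs)):
        print(i, [term for term, _ in ranked])
    # 0 ['extra', 'keywo', 'from', 'docum']
    # 1 ['cooki', 'about', 'and', 'recip', 'times']
```

Collapsing "extraction" and "extracts" into the single prefix "extra" (and "keyword"/"keywords" into "keywo") shows why fixed-length prefixes can act as a crude, language-independent substitute for stemming.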

Citation (APA)

Teixeira, L., Lopes, G., & Ribeiro, R. A. (2011). Automatic extraction of document topics. In IFIP Advances in Information and Communication Technology (Vol. 349 AICT, pp. 101–108). Springer New York LLC. https://doi.org/10.1007/978-3-642-19170-1_11