Automatic extraction of document topics

Luís Teixeira; Gabriel Lopes; Rita A. Ribeiro

Conference Proceedings (Open Access)

IFIP Advances in Information and Communication Technology (2011), Vol. 349 AICT, pp. 101–108

DOI: 10.1007/978-3-642-19170-1_11

Abstract

A keyword or topic for a document is a word or multi-word (a sequence of two or more words) that by itself summarizes part of that document's content. In this paper we compare several statistics-based, language-independent methodologies for automatically extracting keywords. We rank words, multi-words and word prefixes (of fixed length: 5 characters) using several similarity measures (some widely known, some newly coined), and we evaluate both the results obtained and the agreement between evaluators. Experiments were carried out on Portuguese, English and Czech. © 2011 IFIP International Federation for Information Processing.
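The abstract does not specify the ranking measures, so what follows is only a minimal Python sketch of the general pipeline it describes: generate candidate words, multi-words and fixed-length 5-character word prefixes from raw text without language-specific resources, then rank the candidates per document. It uses tf-idf as a stand-in ranking measure; this is an assumption and not one of the paper's measures. The helper names (`tokenize`, `candidates`, `rank_topics`), the cap of three words per multi-word, and the choice to skip prefixes of words shorter than five characters are likewise hypothetical.

```python
import math
import re
from collections import Counter

PREFIX_LEN = 5  # fixed prefix length mentioned in the abstract


def tokenize(text):
    # Language-independent tokenization: runs of Unicode word characters,
    # lowercased (Python's str regexes match accented letters with \w).
    return re.findall(r"\w+", text.lower())


def candidates(tokens, max_n=3):
    """Yield word, prefix and multi-word (2..max_n words) candidates.

    max_n=3 is a hypothetical cap; the abstract only says "2 or more words".
    """
    for tok in tokens:
        yield ("word", tok)
        # Assumption: words shorter than PREFIX_LEN produce no prefix.
        if len(tok) >= PREFIX_LEN:
            yield ("prefix", tok[:PREFIX_LEN])
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield ("multiword", " ".join(tokens[i:i + n]))


def rank_topics(docs, top_k=5):
    """Rank each document's candidates by tf-idf (a stand-in measure)."""
    counts = [Counter(candidates(tokenize(d))) for d in docs]
    # Document frequency: in how many documents each candidate occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n_docs = len(docs)
    ranked = []
    for c in counts:
        total = sum(c.values())
        scores = {
            cand: (tf / total) * math.log(n_docs / df[cand])
            for cand, tf in c.items()
        }
        best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        ranked.append(best[:top_k])
    return ranked


if __name__ == "__main__":
    docs = [
        "Keyword extraction ranks words and multi-words by statistical measures.",
        "Portuguese, English and Czech corpora were used in the experiments.",
        "Statistical measures need no language-specific resources.",
    ]
    for doc_topics in rank_topics(docs, top_k=3):
        print(doc_topics)
```

Because every candidate type is derived only from character and token statistics, the same code runs unchanged on Portuguese, English or Czech text; a candidate that occurs in every document gets an idf of zero and therefore sinks to the bottom of the ranking.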

Citation (APA)

Teixeira, L., Lopes, G., & Ribeiro, R. A. (2011). Automatic extraction of document topics. In IFIP Advances in Information and Communication Technology (Vol. 349 AICT, pp. 101–108). Springer New York LLC. https://doi.org/10.1007/978-3-642-19170-1_11