Language independent extraction of key terms: An extensive comparison of metrics

Abstract

In this paper, twenty language-independent, statistically based metrics for extracting key terms from any document collection are compared. Some of these metrics are widely used for this purpose; the others were created recently. Two document representations are considered in our experiments: one based on words and multi-words, and the other based on fixed-length word prefixes (5 characters in the experiments reported). Prefixes were used to study how morphologically rich languages, namely Portuguese and Czech, behave under this alternative representation. English is also studied, as a language that is not morphologically rich. Results are evaluated manually, and agreement between evaluators is assessed using kappa statistics. The metrics based on Tf-Idf and Phi-square achieved higher precision and recall than the other metrics. The prefix-based document representation yielded a significant improvement in precision for documents written in Portuguese; for Czech, recall also improved. © Springer-Verlag Berlin Heidelberg 2013.
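The abstract names two ingredients without spelling them out: a fixed-length prefix representation of words and statistical term-weighting metrics such as Tf-Idf and Phi-square. The following is a minimal sketch of both, under stated assumptions: the regex tokenizer, the plain Tf-Idf variant (relative term frequency times log inverse document frequency) and the 2×2 contingency-table form of Phi-square are common textbook choices, not necessarily the formulations used in the paper, and the function names `tokenize`, `to_prefixes`, `tf_idf` and `phi_square` are hypothetical.

```python
import math
import re
from collections import Counter

PREFIX_LEN = 5  # the abstract reports 5-character prefixes


def tokenize(text):
    # Assumption: a simple lowercase word tokenizer; the paper's tokenizer is not described.
    return re.findall(r"\w+", text.lower())


def to_prefixes(tokens, n=PREFIX_LEN):
    # Truncate each word to its first n characters; shorter words are kept whole.
    return [t[:n] for t in tokens]


def tf_idf(docs):
    """Score every term of every document with a plain Tf-Idf variant.

    docs: list of token lists. Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return scores


def phi_square(term, doc_index, docs):
    """Phi-square association between a term and one document (assumed 2x2 form)."""
    a = docs[doc_index].count(term)  # occurrences of the term in this document
    b = sum(doc.count(term) for i, doc in enumerate(docs) if i != doc_index)  # term elsewhere
    c = len(docs[doc_index]) - a  # other tokens in this document
    d = sum(len(doc) for i, doc in enumerate(docs) if i != doc_index) - b  # other tokens elsewhere
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a * d - b * c) ** 2 / denom
```

Ranking a document's terms by either score and keeping the top few yields candidate key terms. With 5-character prefixes, inflected Portuguese forms such as "computador" and "computadores" collapse into the single unit "compu", which plausibly explains why this representation can help for morphologically rich languages.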

Teixeira, L. F. S., Lopes, G. P., & Ribeiro, R. A. (2013). Language independent extraction of key terms: An extensive comparison of metrics. In Communications in Computer and Information Science (Vol. 358, pp. 69–82). Springer Verlag. https://doi.org/10.1007/978-3-642-36907-0_5
