Exploiting and evaluating a supervised, multilanguage keyphrase extraction pipeline for under-resourced languages

Marco Basaldella; Muhammad Helmy; Elisa Antolli; Mihai Horia Popescu; Giuseppe Serra; Carlo Tasso

Conference Proceedings (Open Access)

International Conference Recent Advances in Natural Language Processing, RANLP (2017) 2017-September 78-85

DOI: 10.26615/978-954-452-049-6_012

Abstract

This paper evaluates different techniques for building a supervised, multilanguage keyphrase extraction pipeline for languages that lack a gold standard. Starting from an unsupervised English keyphrase extraction pipeline, we implement pipelines for Arabic, Italian, Portuguese, and Romanian, and we build test collections for the languages that lack one. We then add a machine learning module trained on a well-known English-language corpus and evaluate its performance not only on English but on the other languages as well. Finally, we repeat the same evaluation after training the pipeline on an Arabic-language corpus to check whether a language-specific corpus brings a further improvement in performance. Across the five languages analyzed, the results show that using a machine learning algorithm improves performance, even when the algorithm is not trained and tested on the same language.

Basaldella, M., Helmy, M., Antolli, E., Popescu, M. H., Serra, G., & Tasso, C. (2017). Exploiting and evaluating a supervised, multilanguage keyphrase extraction pipeline for under-resourced languages. In International Conference Recent Advances in Natural Language Processing, RANLP (Vol. 2017-September, pp. 78–85). Incoma Ltd. https://doi.org/10.26615/978-954-452-049-6_012