Open-source tools for morphology, lemmatization, POS tagging and named entity recognition

Abstract

We present two recently released open-source taggers: NameTag is free software for named entity recognition (NER) which achieves state-of-the-art performance on Czech; MorphoDiTa (Morphological Dictionary and Tagger) performs morphological analysis (with lemmatization), morphological generation, tagging and tokenization with state-of-the-art results for Czech and a throughput of around 10-200K words per second. The taggers can be trained for any language for which annotated data exist, but they are specifically designed to be efficient for inflective languages. Both tools are free software under the LGPL license and are distributed along with trained linguistic models which are free for non-commercial use under the CC BY-NC-SA license. The releases include standalone tools, C++ libraries with Java, Python and Perl bindings, and web services.

Cite

Straková, J., Straka, M., & Hajič, J. (2014). Open-source tools for morphology, lemmatization, POS tagging and named entity recognition. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2014-June, pp. 13–18). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p14-5003
