This paper describes a baseline dictionary-based Lithuanian lemmatizer designed for an open online collaborative machine translation system. We evaluated the tool on a gold-standard corpus covering four domains (official documents, fiction, scientific texts, and periodicals) and containing ~1 million running words in total, and obtained an encouraging accuracy of ~85.7%. We then carried out an error analysis, which will guide further improvements to the lemmatizer.
CITATION
Kapočiūtė-Dzikienė, J., Berment, V., & Rimkutė, E. (2017). Towards creation of a Lithuanian lemmatizer for open online collaborative machine translation. In Communications in Computer and Information Science (Vol. 756, pp. 515–527). Springer Verlag. https://doi.org/10.1007/978-3-319-67642-5_43