Fine-grained POS tagging of German social media and web texts

Stefan Thater

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018), vol. 10713 LNAI, pp. 72–80

DOI: 10.1007/978-3-319-73706-5_7

Abstract

This paper presents work on part-of-speech tagging of German social media and web texts. We take a simple Hidden Markov Model-based tagger as a starting point and extend it with a distributional approach to estimating the lexical (emission) probabilities of out-of-vocabulary words. Such words occur frequently in social media and web texts and are a major reason for the low performance of off-the-shelf taggers on these types of text. We evaluate our approach on the recent EmpiriST 2015 shared task dataset and show that it improves accuracy on out-of-vocabulary tokens by up to 5.8%; overall, we improve on the state of the art by 0.4%, reaching 90.9% accuracy.
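The abstract does not specify how the distributional estimate is computed. The sketch below shows one plausible way to combine an HMM tagger with distributional information for out-of-vocabulary (OOV) words; it is an illustration under our own assumptions and is not the paper's method. For an OOV word, it estimates P(tag | word) as a cosine-weighted average of the tag distributions of the word's k nearest in-vocabulary neighbours in a word-vector space. It then converts that estimate into an emission score via Bayes' rule, P(word | tag) ∝ P(tag | word) / P(tag). Every name (`oov_tag_posterior`, `viterbi`, `k`, `floor`), tag, probability, and vector is hypothetical toy data.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]
Dist = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def oov_tag_posterior(word: str, vectors: Dict[str, Vector],
                      known_tag_dist: Dict[str, Dist], k: int = 2) -> Optional[Dist]:
    """Hypothetical estimate of P(tag | word) for an OOV word: a similarity-weighted
    average of the tag distributions of its k nearest in-vocabulary neighbours."""
    if word not in vectors:
        return None
    neighbours = sorted(
        ((cosine(vectors[word], vectors[v]), v) for v in known_tag_dist if v in vectors),
        reverse=True)[:k]
    posterior: Dist = {}
    total = 0.0
    for sim, v in neighbours:
        if sim <= 0.0:
            continue  # skip neighbours that are not similar at all
        for tag, p in known_tag_dist[v].items():
            posterior[tag] = posterior.get(tag, 0.0) + sim * p
        total += sim
    if total == 0.0:
        return None
    return {tag: p / total for tag, p in posterior.items()}


def viterbi(words: List[str], tags: List[str], start: Dist, trans: Dict[str, Dist],
            emit: Dict[str, Dist], tag_prior: Dist, vectors: Dict[str, Vector],
            known_tag_dist: Dict[str, Dist], floor: float = 1e-6) -> List[str]:
    """Bigram HMM decoding in log space; OOV emissions use oov_tag_posterior."""
    def log_emission(word: str, tag: str) -> float:
        if word in known_tag_dist:
            # In-vocabulary: trained emission P(word | tag), floored to avoid log(0).
            return math.log(emit[tag].get(word, floor))
        post = oov_tag_posterior(word, vectors, known_tag_dist)
        if post is None:
            return 0.0  # no vector either: every tag gets the same score
        # Bayes: P(w | t) is proportional to P(t | w) / P(t). P(w) is identical for
        # all tags at this position, so dropping it does not change the argmax.
        return math.log(max(post.get(tag, 0.0), floor) / tag_prior[tag])

    best = [{t: math.log(start[t]) + log_emission(words[0], t) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(((p, best[i - 1][p] + math.log(trans[p][t])) for p in tags),
                              key=lambda pair: pair[1])
            best[i][t] = score + log_emission(words[i], t)
            back[i][t] = prev
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])  # follow back-pointers to the start
    return path[::-1]


# Hypothetical toy model with STTS-style tags; all numbers are made up.
TAGS = ["ART", "NN", "VVFIN"]
START = {"ART": 0.8, "NN": 0.1, "VVFIN": 0.1}
TRANS = {"ART": {"ART": 0.05, "NN": 0.9, "VVFIN": 0.05},
         "NN": {"ART": 0.1, "NN": 0.1, "VVFIN": 0.8},
         "VVFIN": {"ART": 0.6, "NN": 0.2, "VVFIN": 0.2}}
EMIT = {"ART": {"der": 1.0}, "NN": {"hund": 0.5, "katze": 0.5},
        "VVFIN": {"bellt": 0.5, "schläft": 0.5}}
KNOWN = {"der": {"ART": 1.0}, "hund": {"NN": 1.0}, "katze": {"NN": 1.0},
         "bellt": {"VVFIN": 1.0}, "schläft": {"VVFIN": 1.0}}
PRIOR = {"ART": 0.3, "NN": 0.4, "VVFIN": 0.3}
VECTORS = {"der": [0.0, 0.0, 1.0], "hund": [1.0, 0.1, 0.0], "katze": [0.9, 0.2, 0.0],
           "köter": [0.95, 0.05, 0.0], "bellt": [0.0, 1.0, 0.1], "schläft": [0.1, 0.9, 0.0]}

if __name__ == "__main__":
    # "köter" is OOV here, so its emission is estimated from "hund" and "katze".
    print(viterbi(["der", "köter", "bellt"], TAGS, START, TRANS, EMIT, PRIOR, VECTORS, KNOWN))
    # prints ['ART', 'NN', 'VVFIN']
```

With these toy numbers, the OOV word "köter" is tagged NN because its two nearest neighbours in the vector space are nouns. In a real setting, the vectors would come from a large unlabelled corpus. The transition and emission tables would be estimated from tagged training data.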

Cite (APA)

Thater, S. (2018). Fine-grained POS tagging of German social media and web texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10713 LNAI, pp. 72–80). Springer Verlag. https://doi.org/10.1007/978-3-319-73706-5_7