Fine-grained POS tagging of German social media and web texts

Abstract

This paper presents work on part-of-speech tagging of German social media and web texts. We take a simple Hidden Markov Model based tagger as a starting point and extend it with a distributional approach to estimating the lexical (emission) probabilities of out-of-vocabulary words. Such words occur frequently in social media and web texts and are a major reason for the low performance of off-the-shelf taggers on these text types. We evaluate the approach on the recent EmpiriST 2015 shared task dataset and show that it improves accuracy on out-of-vocabulary tokens by up to 5.8%; overall, we improve on the state of the art by 0.4%, reaching 90.9% accuracy.
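
The abstract does not say how the distributional estimate is computed, so the following is only a minimal sketch of one plausible reading, not the paper's method. It assumes that each word has a distributional vector and that the tag distribution of an out-of-vocabulary word can be approximated by a similarity-weighted average over its nearest in-vocabulary neighbours. All names (`tag_given_word`, `oov_tag_distribution`, `TRAIN`, `VECTORS`) and all data are hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def tag_given_word(tagged_tokens):
    """Relative-frequency estimate of P(tag | word) from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word][tag] += 1
    return {
        word: {tag: c / sum(tags.values()) for tag, c in tags.items()}
        for word, tags in counts.items()
    }


def oov_tag_distribution(word, vectors, known, k=2):
    """Approximate P(tag | word) for an unseen word from its k most similar
    known words, weighting each neighbour by its cosine similarity."""
    neighbours = sorted(
        ((cosine(vectors[word], vectors[w]), w) for w in known if w in vectors),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, neighbour in neighbours:
        if sim <= 0:
            continue  # ignore dissimilar neighbours
        for tag, p in known[neighbour].items():
            scores[tag] += sim * p
    total = sum(scores.values())
    return {tag: s / total for tag, s in scores.items()} if total else {}


# Toy training data and 2-dimensional vectors, invented for illustration only.
TRAIN = [("gut", "ADJD"), ("gut", "ADJD"), ("super", "ADJD"),
         ("Haus", "NN"), ("Auto", "NN")]
VECTORS = {
    "gut": [1.0, 0.1], "super": [0.9, 0.2],
    "Haus": [0.1, 1.0], "Auto": [0.2, 0.9],
    "suuuper": [0.95, 0.15],  # unseen spelling variant typical of social media
}

known = tag_given_word(TRAIN)
print(oov_tag_distribution("suuuper", VECTORS, known))
```

To plug such an estimate into an HMM, which needs P(word | tag) rather than P(tag | word), one option is to apply Bayes' rule and divide by the tag prior, since P(word) is constant at a given position during decoding. Whether the paper does this is not stated in the abstract.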

Citation (APA)

Thater, S. (2018). Fine-grained POS tagging of German social media and web texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10713 LNAI, pp. 72–80). Springer Verlag. https://doi.org/10.1007/978-3-319-73706-5_7
