Word and sentence segmentation in German: overcoming idiosyncrasies in the use of punctuation in private communication

This article is free to access.

Abstract

In this paper, we present a segmentation system for German texts. We apply conditional random fields (CRF), a statistical sequential model, to a type of text used in private communication. We show that segmenting individual punctuation marks, taking freestanding lines into account, and using unsupervised word representations (i.e., Brown clustering, Word2Vec, and fastText) yields a label accuracy of 96% on a corpus of postcards used in private communication.
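The abstract names two preprocessing ideas: splitting punctuation into individual marks and recording freestanding lines. The following is a minimal sketch of how those could become per-token features for a sequence labeler such as a CRF. It is not the paper's feature set. The function names (`tokenize_lines`, `token_features`), the feature names, and the example postcard text are all hypothetical, and word-representation features (e.g., a Brown cluster ID per token) are omitted.

```python
import re

# Assumption: every punctuation character becomes its own token,
# so "!!" yields two tokens; runs of word characters stay together.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
PUNCT_RE = re.compile(r"[^\w\s]")


def tokenize_lines(text):
    """Return (token, is_line_final) pairs, preserving the line layout."""
    tokens = []
    for line in text.splitlines():
        parts = TOKEN_RE.findall(line)
        for i, tok in enumerate(parts):
            tokens.append((tok, i == len(parts) - 1))
    return tokens


def token_features(tokens, i):
    """Build a hypothetical feature dictionary for the token at index i."""
    tok, line_final = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_punct": PUNCT_RE.fullmatch(tok) is not None,
        "is_capitalized": tok[0].isupper(),
        # A line break may mark a boundary even without punctuation.
        "line_final": line_final,
    }
    if i + 1 < len(tokens):
        feats["next_is_capitalized"] = tokens[i + 1][0][0].isupper()
    else:
        feats["is_last"] = True
    return feats


if __name__ == "__main__":
    sample = "Liebe Oma!!\nViele Grüße\nAnna"
    toks = tokenize_lines(sample)
    for idx in range(len(toks)):
        print(toks[idx][0], token_features(toks, idx))
```

Feature dictionaries of this shape, one per token, are a common input format for CRF toolkits. The `line_final` flag gives the model a cue for lines such as greetings or signatures that end without any punctuation.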

Cite

Sugisaki, K. (2018). Word and sentence segmentation in German: overcoming idiosyncrasies in the use of punctuation in private communication. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10713 LNAI, pp. 62–71). Springer Verlag. https://doi.org/10.1007/978-3-319-73706-5_6
