Semantic splitting of German medical compounds

0Citations
Citations of this article
4Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Compounding is widespread in highly inflectional languages with a quarter of all nouns created by composition. In our field of study, the German medical language, the amount of compounds significantly outnumbers this figure with 64%. Thus, their correct splitting is a high-impact preprocessing step for any NLP-based application. In this work we address two challenges of medical decomposition: First, we introduce the consideration of unknown constituents in order to split compounds that were not recognized as such so far. Second, our approach builds on the corpus-based approach of Koehn and Knight and adds semantic knowledge from domain ontologies to increase the accuracy during disambiguation of the various split options. Using this first-of-a-kind semantic approach in a study on decomposition of German medical compounds, we outperform the existing approaches by far.

Cite

CITATION STYLE

APA

Bretschneider, C., & Zillner, S. (2015). Semantic splitting of German medical compounds. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9302, pp. 207–215). Springer Verlag. https://doi.org/10.1007/978-3-319-24033-6_24

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free