Tagging medical documents with high accuracy

Udo Hahn; Joachim Wermter

Conference Proceedings

Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (2004) 3157 852-861

DOI: 10.1007/978-3-540-28633-2_90

Abstract

We ran both Brill's rule-based tagger and TNT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with limited lexicon resources, TNT outperforms the Brill tagger and reaches state-of-the-art performance figures (close to 97% accuracy). We then trained TNT on a large annotated medical text corpus, using a slightly extended tagset that captures certain particularities of medical language, and achieved 98% tagging accuracy. Hence, off-the-shelf statistical POS taggers can not only be reused immediately for medical NLP but also achieve, when trained on medical corpora, a higher performance level than on the newspaper genre. © Springer-Verlag Berlin Heidelberg 2004.
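The headline numbers above are tagging accuracies. As a minimal sketch, assuming accuracy means the share of tokens whose predicted POS tag matches the gold-standard tag (the exact evaluation protocol is not stated in this abstract), the Python below computes that figure and the relative error reduction implied by the rounded 97% and 98% results. The function names `tagging_accuracy` and `relative_error_reduction` and the STTS-style example tags are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Token-level accuracy: fraction of tokens whose predicted tag equals the gold tag.

    Hypothetical helper; assumes both sequences are aligned token by token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted tag sequences must have equal length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    # Each matching (gold, predicted) pair counts as True (1) in the sum.
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Share of the remaining tagging errors removed when accuracy rises."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before


if __name__ == "__main__":
    # Hypothetical four-token example with STTS-style tags; one tag is wrong.
    gold = ["ART", "NN", "VAFIN", "ADJD"]
    pred = ["ART", "NN", "VAFIN", "ADV"]
    print(tagging_accuracy(gold, pred))  # 0.75
    # Rounded figures from the abstract, taken at face value.
    print(round(relative_error_reduction(0.97, 0.98), 3))  # 0.333
```

Taking the rounded figures at face value, moving from about 3% to 2% of tokens mis-tagged removes roughly a third of the remaining errors, which is why a one-point gain at this accuracy level is substantial. Note that the two figures come from different setups (a newspaper model versus a model trained on medical text).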

Citation (APA)

Hahn, U., & Wermter, J. (2004). Tagging medical documents with high accuracy. In Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 3157, pp. 852–861). Springer Verlag. https://doi.org/10.1007/978-3-540-28633-2_90
