Developing a robust part-of-speech tagger for biomedical text

Yoshimasa Tsuruoka; Yuka Tateishi; Jin Dong Kim; Tomoko Ohta; John McNaught; Sophia Ananiadou; Jun'ichi Tsujii

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3746 LNCS 382-392

DOI: 10.1007/11573036_36

379 Citations

129 Readers

Abstract

This paper presents a part-of-speech tagger that is specifically tuned for biomedical text. We have built the tagger with maximum entropy modeling and a state-of-the-art tagging algorithm. The tagger was trained on a corpus containing newspaper articles and biomedical documents so that it would work well on various types of biomedical text. Experimental results on the Wall Street Journal corpus, the GENIA corpus, and the PennBioIE corpus revealed that adding training data from a different domain does not hurt the performance of a tagger, and our tagger exhibits very good precision (97% to 98%) on all these corpora. We also evaluated the robustness of the tagger using recent MEDLINE articles. © Springer-Verlag Berlin Heidelberg 2005.

Cite

Tsuruoka, Y., Tateishi, Y., Kim, J. D., Ohta, T., McNaught, J., Ananiadou, S., & Tsujii, J. (2005). Developing a robust part-of-speech tagger for biomedical text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3746 LNCS, pp. 382–392). https://doi.org/10.1007/11573036_36
