Integrating text chunking with mixture Hidden Markov Models for effective biomedical information extraction

Min Song; Il Yeol Song; Xiaohua Hu; Robert B. Allen

Conference Proceedings, Open Access

Lecture Notes in Computer Science (2005) 3515(II) 976-984

DOI: 10.1007/11428848_124

Abstract

This paper presents a new information extraction (IE) technique, KXtractor, which integrates text chunking with Mixture Hidden Markov Models (MiHMM). KXtractor differs from other approaches in two ways. (a) It overcomes the difficulty that single Part-Of-Speech (POS) HMMs have in modeling rich representations of text, in which features overlap among state units such as word, line, sentence, and paragraph. By incorporating sentence structure into the learned models, KXtractor achieves better extraction accuracy than single POS HMMs. (b) It resolves a limitation of traditional HMMs for IE, which operate only on semi-structured data such as HTML documents and other text sources in which language grammar does not play a pivotal role. We compared KXtractor with three IE techniques: 1) RAPIER, an inductive learning-based machine learning system, 2) a dictionary-based extraction system, and 3) a single POS HMM. In our experiments on extracting protein-protein interactions, KXtractor outperformed all three systems: its F-measure was higher than those of RAPIER, the dictionary-based system, and the single POS HMM by 16.89%, 16.28%, and 8.58%, respectively. Both the precision and the recall of KXtractor were also higher than those of the three systems. © Springer-Verlag Berlin Heidelberg 2005.
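
The abstract compares systems by precision, recall, and F-measure without stating the formula. The sketch below shows how these metrics are conventionally computed from extraction counts. It assumes the balanced F-measure (beta = 1), which the abstract does not confirm; the function name `f_measure` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F) for an extraction run.

    tp: extracted items that are correct
    fp: extracted items that are wrong
    fn: correct items the system missed
    beta: weight of recall relative to precision (1.0 is an assumption;
          the abstract does not say which variant was used)
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts, for illustration only (not results from the paper).
p, r, f = f_measure(tp=6, fp=2, fn=4)
print(f"precision={p:.2f} recall={r:.2f} F={f:.4f}")
```

With beta = 1, F is the harmonic mean of precision and recall, so a system needs to improve both to raise it substantially.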

Cite

Song, M., Song, I. Y., Hu, X., & Allen, R. B. (2005). Integrating text chunking with mixture Hidden Markov Models for effective biomedical information extraction. In Lecture Notes in Computer Science (Vol. 3515, pp. 976–984). Springer Verlag. https://doi.org/10.1007/11428848_124
