Hidden Markov model using Dirichlet process for de-identification

Tao Chen; Richard M. Cullen; Marshall Godwin

Journal article (open access)

Journal of Biomedical Informatics (2015) 58 S60-S66

DOI: 10.1016/j.jbi.2015.09.004

Abstract

For the 2014 i2b2/UTHealth de-identification challenge, we introduced a new non-parametric Bayesian hidden Markov model using a Dirichlet process (HMM-DP). The model is intended to reduce task-specific feature engineering and to generalize well to new data. For the challenge, we developed a variational method to learn the model and an efficient approximation algorithm for prediction. To accommodate out-of-vocabulary words, we designed a number of feature functions to model such words. The results show that the model can capture local context cues to make correct predictions without manual feature engineering, and that it performs as accurately as state-of-the-art conditional random field models in a number of categories. To incorporate long-range and cross-document context cues, we developed a skip-chain conditional random field model to align the results produced by HMM-DP, which further improved performance.
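
The abstract names three ingredients: a Dirichlet process prior over hidden states, a prediction (decoding) step, and feature functions for out-of-vocabulary words. The Python sketch below illustrates each one generically. It is not the authors' implementation, and it rests on several assumptions. First, it approximates the Dirichlet process with truncated stick-breaking weights, a common choice for variational inference, although the abstract does not state how the model was truncated. Second, it uses standard Viterbi decoding on a finite HMM in place of the paper's own approximate prediction algorithm, which the abstract does not describe. Third, `word_shape` is a hypothetical feature function that stands in for the kind of out-of-vocabulary feature the abstract mentions. All function names, labels, and probabilities are illustrative.

```python
import math
import random


def stick_breaking(alpha, truncation, rng):
    """Truncated stick-breaking (GEM) weights approximating a Dirichlet process.

    beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod_{j<k} (1 - beta_j).
    The last weight takes the leftover stick, so the weights sum to 1.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        beta = rng.betavariate(1.0, alpha)
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    weights.append(remaining)
    return weights


def word_shape(token):
    """Hypothetical out-of-vocabulary feature: map a token to its character shape.

    Uppercase -> 'X', lowercase -> 'x', digit -> 'd'; other characters are kept.
    Consecutive repeats collapse, e.g. 'Boston' -> 'Xx', '02115' -> 'd'.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def viterbi(observations, log_start, log_trans, log_emit):
    """Most likely hidden-state path of a finite HMM, computed in log space.

    log_start[s] and log_trans[s][t] are log-probabilities;
    log_emit[s] is a function returning the log emission score of an observation.
    """
    n_states = len(log_start)
    scores = [log_start[s] + log_emit[s](observations[0]) for s in range(n_states)]
    backptrs = []
    for obs in observations[1:]:
        new_scores, ptrs = [], []
        for t in range(n_states):
            # Best previous state for landing in state t at this position.
            best_s = max(range(n_states), key=lambda s: scores[s] + log_trans[s][t])
            new_scores.append(scores[best_s] + log_trans[best_s][t] + log_emit[t](obs))
            ptrs.append(best_s)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace the back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Toy two-state tagger: state 0 = "O" (not PHI), state 1 = "NAME".
    # All probabilities and emission scores are made-up illustration values.
    tokens = ["seen", "by", "Dr.", "Smith", "today"]
    shapes = [word_shape(tok) for tok in tokens]  # ['x', 'x', 'Xx.', 'Xx', 'x']
    log = math.log
    log_start = [log(0.9), log(0.1)]
    log_trans = [[log(0.8), log(0.2)], [log(0.5), log(0.5)]]
    log_emit = [
        lambda shp: log(0.8) if shp == "x" else log(0.1),
        lambda shp: log(0.9) if shp == "Xx" else log(0.1),
    ]
    labels = ["O", "NAME"]
    path = viterbi(shapes, log_start, log_trans, log_emit)
    print([labels[s] for s in path])  # ['O', 'O', 'O', 'NAME', 'O']

    weights = stick_breaking(alpha=1.0, truncation=10, rng=random.Random(0))
    print(len(weights), round(sum(weights), 6))  # 10 1.0
```

In this toy run, the shape feature lets the tagger label the unseen capitalized token "Smith" as a name without a vocabulary entry for it. A full HMM-DP would instead learn the number of states and their parameters under the Dirichlet process prior, as the abstract describes.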

Citation (APA)

Chen, T., Cullen, R. M., & Godwin, M. (2015). Hidden Markov model using Dirichlet process for de-identification. Journal of Biomedical Informatics, 58, S60–S66. https://doi.org/10.1016/j.jbi.2015.09.004