Unsupervised joint monolingual character alignment and word segmentation

1Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We propose a novel Bayesian model for fully unsupervised word segmentation based on monolingual character alignment. Adapted bilingual word alignment models and a Bayesian language model are combined through product of experts to estimate the joint posterior distribution of a monolingual character alignment and the corresponding segmentation. Our approach enhances the performance of conventional hierarchical Pitman-Yor language models with richer character-level features. In the conducted experiments, our model achieves an 88.6% word token f-score on the standard Brent version of the Bemstein-Ratner corpora. Moreover, on standard Chinese segmentation datasets, our method outperforms a baseline model by 1.9-2.9 f-score points.

Cite

CITATION STYLE

APA

Teng, Z., Xiong, H., & Liu, Q. (2014). Unsupervised joint monolingual character alignment and word segmentation. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8801, 1–12. https://doi.org/10.1007/978-3-319-12277-9_1

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free