Unsupervised joint monolingual character alignment and word segmentation

Zhiyang Teng; Hao Xiong; Qun Liu

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8801 1-12

DOI: 10.1007/978-3-319-12277-9_1

1 citation

8 readers

Abstract

We propose a novel Bayesian model for fully unsupervised word segmentation based on monolingual character alignment. Adapted bilingual word alignment models and a Bayesian language model are combined through a product of experts to estimate the joint posterior distribution of a monolingual character alignment and the corresponding segmentation. Our approach enhances the performance of conventional hierarchical Pitman-Yor language models with richer character-level features. In our experiments, the model achieves an 88.6% word token F-score on the standard Brent version of the Bernstein-Ratner corpus. Moreover, on standard Chinese segmentation datasets, our method outperforms a baseline model by 1.9 to 2.9 F-score points.
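The word token F-score reported above is commonly computed by comparing predicted and gold word spans over the same character sequence. The following is a minimal sketch of that common definition, not the authors' evaluation script; the function names `word_spans` and `token_f_score` are hypothetical, and it assumes each utterance is given as a list of non-empty words.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented utterance to a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def token_f_score(gold: List[List[str]], predicted: List[List[str]]) -> float:
    """Harmonic mean of token precision and recall over all utterances.

    A predicted word counts as correct only if both of its boundaries
    match a gold word (assumed definition of the token-level metric).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of utterances")
    correct = n_gold = n_pred = 0
    for gold_words, pred_words in zip(gold, predicted):
        if "".join(gold_words) != "".join(pred_words):
            raise ValueError("segmentations must cover the same characters")
        gold_spans = word_spans(gold_words)
        pred_spans = word_spans(pred_words)
        correct += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / n_pred
    recall = correct / n_gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["you", "want", "to"]]
    predicted = [["you", "wantto"]]
    # One of two predicted words and one of three gold words match.
    print(token_f_score(gold, predicted))
```

In the toy example, precision is 1/2 and recall is 1/3, so under-segmentation is penalized through recall even though no boundary was placed incorrectly.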

Citation (APA)

Teng, Z., Xiong, H., & Liu, Q. (2014). Unsupervised joint monolingual character alignment and word segmentation. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8801, 1–12. https://doi.org/10.1007/978-3-319-12277-9_1
