A Simple and Effective Unsupervised Word Segmentation Approach

Songjian Chen; Yabo Xu; Huiyou Chang

Conference Proceedings, Open Access

Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI 2011 (2011) 866-871

DOI: 10.1609/aaai.v25i1.7970

Abstract

In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our approach is a novel word induction criterion called WordRank, which estimates the goodness of word hypotheses (character or phoneme sequences). We devise a method to derive exterior word boundary information from the link structures of adjacent word hypotheses and incorporate interior word boundary information to complete the model. In light of WordRank, word segmentation can be modeled as an optimization problem. A Viterbi-style algorithm is developed to search for the optimal segmentation. Extensive experiments conducted on phonetic transcripts as well as standard Chinese and Japanese data sets demonstrate the effectiveness of our approach. On the standard Brent version of the Bernstein-Ratner corpus, our approach outperforms the state-of-the-art Bayesian models by more than 3%. In addition, our approach is simpler and more efficient than the Bayesian methods. Consequently, our approach is more suitable for real-world applications.
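
To illustrate the idea of segmentation as an optimization problem solved by a Viterbi-style search, the following is a minimal sketch, not the authors' implementation. It assumes the objective decomposes into a sum of per-word scores; the abstract does not state the exact objective. The names `segment`, `toy_score`, `TOY_SCORES`, and `max_len` are hypothetical, and the toy scores merely stand in for WordRank values, which are not computed here.

```python
import math
from typing import Callable, List

def segment(text: str, score: Callable[[str], float], max_len: int = 4) -> List[str]:
    """Return the split of `text` that maximizes the sum of per-word scores.

    `score` is any function mapping a candidate word to a float; it is a
    placeholder for a word-goodness measure (an assumption, not WordRank).
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        # Consider every candidate last word text[start:end] of length <= max_len.
        for start in range(max(0, end - max_len), end):
            cand = best[start] + score(text[start:end])
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    if n > 0 and best[n] == -math.inf:
        raise ValueError("no segmentation with a finite score")
    # Follow the back-pointers from the end of the text to recover the words.
    words: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    words.reverse()
    return words

# Hypothetical toy scores: known words are rewarded, unknown strings are
# penalized in proportion to their length.
TOY_SCORES = {"the": 2.0, "dog": 2.0, "he": 0.5, "do": 0.5}

def toy_score(word: str) -> float:
    return TOY_SCORES.get(word, -float(len(word)))

print(segment("thedog", toy_score, max_len=3))
```

The dynamic program considers each position once and at most `max_len` candidate words ending there, so the search is linear in the input length for a fixed `max_len`.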

Cite (APA)

Chen, S., Xu, Y., & Chang, H. (2011). A Simple and Effective Unsupervised Word Segmentation Approach. In Proceedings of the 25th AAAI Conference on Artificial Intelligence, AAAI 2011 (pp. 866–871). AAAI Press. https://doi.org/10.1609/aaai.v25i1.7970
