Simultaneous character-cluster-based word segmentation and named entity recognition in Thai language

Nattapong Tongtep; Thanaruk Theeramunkong

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 6746 LNAI 216-225

DOI: 10.1007/978-3-642-24788-0_20

Abstract

Named entity recognition in inherent-vowel alphabetic languages such as Burmese, Khmer, Lao, Tamil, Telugu, Balinese, and Thai is difficult because these languages have no explicit boundaries between words or sentences. This paper presents a novel method that exploits the concept of character clusters, where a character cluster is a sequence of inseparable characters. The method groups characters into clusters and uses statistics over characters and their clusters to extract Thai words and recognize named entities simultaneously. It integrates two phases, a word-segmentation model and a named-entity-recognition model. Context features are exploited to learn the parameters of these two discriminative probabilistic models, both conditional random fields (CRFs), which rank the generated sets of word and named-entity candidates. The experimental results show that our method significantly improves word segmentation and named entity recognition, with F-measures of 96.14% and 83.68%, respectively. © 2011 Springer-Verlag.
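As a rough illustration of what grouping characters into inseparable clusters can look like, the following Python sketch applies two simplified rules. These rules are assumptions made for this example and are not the rule set used in the paper: a Thai combining mark (an above or below vowel, or a tone mark) stays with the preceding character, and a leading vowel stays with the character that follows it. The function name `char_clusters` is hypothetical.

```python
import unicodedata

# Thai vowels written before the consonant they belong to (U+0E40..U+0E44).
LEADING_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def char_clusters(text):
    """Group characters into clusters using two simplified rules.

    Assumption: this is an illustrative approximation, not the
    character-cluster rules from the paper.
    """
    clusters = []
    attach_next = False  # True right after a leading vowel
    for ch in text:
        # Nonspacing marks (category Mn) cover Thai above/below vowels
        # and tone marks; they cannot stand apart from their base.
        is_mark = unicodedata.category(ch) == "Mn"
        if clusters and (is_mark or attach_next):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        attach_next = ch in LEADING_VOWELS
    return clusters


# U+0E40 U+0E1B U+0E47 U+0E19: leading vowel, consonant, mark, consonant.
word = "\u0e40\u0e1b\u0e47\u0e19"
print(char_clusters(word))  # two clusters: the first three characters, then the last
```

In a pipeline like the one the abstract describes, clusters of this kind, rather than single characters, would be the units passed to the sequence models, which reduces the number of candidate word boundaries.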

Cite

APA

Tongtep, N., & Theeramunkong, T. (2011). Simultaneous character-cluster-based word segmentation and named entity recognition in Thai language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6746 LNAI, pp. 216–225). https://doi.org/10.1007/978-3-642-24788-0_20
