Building a Thai part-of-speech tagged corpus (ORCHID)


Abstract

ORCHID (Open linguistic Resources CHanelled toward InterDisciplinary research) is an initiative aimed at building linguistic resources to support research in fields including, but not limited to, natural language processing. Following an open architecture design, the resources must be fully compatible with similar resources, and software tools must also be made available. This paper presents one result of the project: the construction of a Thai part-of-speech (POS) tagged corpus, which is a preliminary stage in the construction of a Thai speech corpus. The POS-tagged corpus is the result of collaborative research between the Communications Research Laboratory (CRL) in Japan and the National Electronics and Computer Technology Center (NECTEC) in Thailand, with technical support from the Electrotechnical Laboratory (ETL) in Japan. In this paper, we propose a new tagset based on the results of a prior multilingual machine translation project. The corpus is annotated at three levels: paragraph, sentence, and word. Text information is stored in text information lines and number lines, both of which are used in data retrieval. Word segmentation and POS tagging were both carried out with a probabilistic trigram model. Rules for syllable demarcation were additionally used to reduce the number of candidates when computing tagging probabilities. Some typical problems in POS assignment are also formalized to resolve ambiguity.
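
The abstract does not spell out the trigram model, so the following is only a minimal sketch of how a probabilistic trigram tagger can choose among candidate tag sequences. It is not the authors' implementation. The toy English training data, the two-tag set (`N`, `V`), the add-one smoothing, the brute-force search, and all function names are assumptions made for illustration. Word segmentation and the syllable demarcation rules are not modelled.

```python
from collections import Counter
from itertools import product
import math

# Toy hand-made training data, invented for illustration (not taken from ORCHID).
TRAIN = [
    [("dogs", "N"), ("bark", "V")],
    [("dogs", "N"), ("chase", "V"), ("cats", "N")],
    [("cats", "N"), ("fear", "V"), ("bark", "N")],
]

START = "<s>"  # padding symbol for the first two tag positions


def train(sentences):
    """Count tag trigrams, their bigram histories, and word/tag emissions."""
    tri, bi, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    for sent in sentences:
        tags = [START, START] + [t for _, t in sent]
        for i in range(2, len(tags)):
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
            bi[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sent:
            emit[(word, tag)] += 1
            tag_count[tag] += 1
    return tri, bi, emit, tag_count


def log_score(words, tags, model):
    """Log of prod_i P(t_i | t_{i-2}, t_{i-1}) * P(w_i | t_i), add-one smoothed."""
    tri, bi, emit, tag_count = model
    n_tags = len(tag_count)
    vocab = len({w for w, _ in emit})
    hist = [START, START]
    total = 0.0
    for word, tag in zip(words, tags):
        p_trans = (tri[(hist[-2], hist[-1], tag)] + 1) / (bi[(hist[-2], hist[-1])] + n_tags)
        p_emit = (emit[(word, tag)] + 1) / (tag_count[tag] + vocab)
        total += math.log(p_trans) + math.log(p_emit)
        hist.append(tag)
    return total


def best_tags(words, model):
    """Enumerate every tag sequence and keep the highest-scoring one.

    Brute force is exponential in sentence length; it is used here only
    because the example is tiny. A practical tagger would use Viterbi search.
    """
    tagset = sorted(model[3])
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: log_score(words, tags, model))


MODEL = train(TRAIN)
print(best_tags(["dogs", "bark"], MODEL))
```

In this toy data "bark" is seen as both a noun and a verb, so the tag chosen for it depends on the two preceding tags. Pruning the candidate set before scoring, as the abstract describes for the syllable rules, would correspond to passing fewer sequences into the `max` call.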

Cite

Sornlertlamvanich, V., Takahashi, N., & Isahara, H. (1999). Building a Thai part-of-speech tagged corpus (ORCHID). Journal of the Acoustical Society of Japan (E) (English Translation of Nippon Onkyo Gakkaishi), 20(3), 189–198. https://doi.org/10.1250/ast.20.189
