CRF models for Tamil part of speech tagging and chunking

S. Lakshmana Pandian; T. V. Geetha

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009) 5459 LNAI 11-22

DOI: 10.1007/978-3-642-00831-3_2

19 Citations

13 Readers

Abstract

Conditional random fields (CRFs) are a framework for building probabilistic models to segment and label sequence data. CRFs offer several advantages over hidden Markov models (HMMs) and stochastic grammars for such tasks, including the ability to relax the strong independence assumptions made in those models. CRFs also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. In this paper we propose language models for part-of-speech (POS) tagging and chunking of Tamil using CRFs. The language models are designed around morphological information. The CRF-based POS tagger achieves an accuracy of about 89.18% for Tamil, and the chunker achieves an accuracy of 84.25% for the same language. © 2009 Springer Berlin Heidelberg.
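
As a rough illustration of the whole-sequence normalization that lets a linear-chain CRF avoid the bias towards states with few successors, the sketch below scores tag sequences and normalizes them with the forward algorithm. The two-tag set, the suffix feature, the weights, and the romanized tokens are all hypothetical placeholders chosen for the example; they are not the tag set, features, or parameters used in the paper.

```python
import math
from itertools import product

# Hypothetical tag set and hand-set weights, for illustration only.
TAGS = ["N", "V"]
# Emission weights keyed by (feature, tag); the feature is a toy word suffix.
EMIT = {("suf=kal", "N"): 1.5, ("suf=aan", "V"): 1.2}
# Transition weights keyed by (previous tag, current tag).
TRANS = {("N", "V"): 0.8, ("N", "N"): 0.3}


def features(word):
    """Toy morphological feature: the last three characters of the word."""
    return ["suf=" + word[-3:]]


def emit_score(word, tag):
    """Sum of the weights of the features that fire for (word, tag)."""
    return sum(EMIT.get((f, tag), 0.0) for f in features(word))


def seq_score(words, tags):
    """Unnormalized log-score of one complete tag sequence."""
    score = emit_score(words[0], tags[0])
    for i in range(1, len(words)):
        score += TRANS.get((tags[i - 1], tags[i]), 0.0)
        score += emit_score(words[i], tags[i])
    return score


def log_partition(words):
    """Forward algorithm: log of the summed exp(score) over all tag sequences."""
    alpha = {t: emit_score(words[0], t) for t in TAGS}
    for word in words[1:]:
        alpha = {
            t: math.log(sum(math.exp(alpha[p] + TRANS.get((p, t), 0.0)) for p in TAGS))
            + emit_score(word, t)
            for t in TAGS
        }
    return math.log(sum(math.exp(a) for a in alpha.values()))


def prob(words, tags):
    """p(tags | words), normalized once over whole sequences, not per state."""
    return math.exp(seq_score(words, tags) - log_partition(words))


def all_probs(words):
    """Brute-force table of every tag sequence's probability (tiny inputs only)."""
    return {tags: prob(words, tags) for tags in product(TAGS, repeat=len(words))}
```

Calling `all_probs(["marangkal", "vanthaan"])` on these toy tokens returns one probability per tag sequence, and the values sum to 1 because the normalizer is computed over entire sequences. An MEMM would instead normalize the outgoing scores at each state separately.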

Cite

Pandian, S. L., & Geetha, T. V. (2009). CRF models for Tamil part of speech tagging and chunking. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5459 LNAI, pp. 11–22). https://doi.org/10.1007/978-3-642-00831-3_2
