Using Wiktionary to improve lexical disambiguation in multiple languages

Abstract

This paper proposes using linguistic knowledge from Wiktionary to improve lexical disambiguation in multiple languages, focusing on part-of-speech tagging in languages with differing characteristics: English, Vietnamese, and Korean. Dictionaries and subsumption networks are first extracted automatically from Wiktionary. These linguistic resources are then used to enrich the feature set of the training examples. A first-order discriminative model is learned from the training data using Hidden Markov-Support Vector Machines. The proposed method is competitive with contemporary related work in all three languages. In English, our tagger achieves 96.37% token accuracy on the Brown corpus, with an error reduction of 2.74% over the baseline. © 2012 Springer-Verlag.
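As a rough illustration of what "enriching the feature set" with dictionary knowledge could look like, here is a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the tag names, the feature names, and the `token_features` function are all assumptions made for this example, and the abstract does not specify the actual feature templates or how the subsumption networks are used.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon standing in for a dictionary extracted from
# Wiktionary: each word maps to the set of parts of speech it is listed under.
TOY_LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "run": {"NOUN", "VERB"},
    "fast": {"ADJ", "ADV"},
}


def token_features(
    tokens: List[str], i: int, lexicon: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Build a sparse feature dict for the token at position i.

    Baseline features come from the surface form and the previous word;
    the 'lex=' features are the assumed dictionary enrichment.
    """
    word = tokens[i]
    lower = word.lower()
    prev = tokens[i - 1].lower() if i > 0 else "<s>"

    feats: Dict[str, float] = {
        "bias": 1.0,
        f"word={lower}": 1.0,
        f"suffix3={lower[-3:]}": 1.0,
        f"prev={prev}": 1.0,
        "is_capitalized": float(word[:1].isupper()),
    }

    # Dictionary enrichment: one indicator per part of speech the lexicon
    # allows for this word, or a single marker when the word is not listed.
    tags = lexicon.get(lower)
    if tags is None:
        feats["lex=UNKNOWN"] = 1.0
    else:
        for tag in sorted(tags):
            feats[f"lex={tag}"] = 1.0
    return feats


if __name__ == "__main__":
    sentence = ["The", "run", "was", "fast"]
    for idx in range(len(sentence)):
        print(sentence[idx], token_features(sentence, idx, TOY_LEXICON))
```

Feature dictionaries of this kind could then be passed to any sequence learner. The paper itself trains a first-order model with Hidden Markov-Support Vector Machines, which this sketch does not attempt to reproduce.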

Cite

Nguyen, K. H., & Ock, C. Y. (2012). Using Wiktionary to improve lexical disambiguation in multiple languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7181 LNCS, pp. 238–248). https://doi.org/10.1007/978-3-642-28604-9_20
