Building English-Vietnamese Named Entity Corpus with Aligned Bilingual News Articles

Abstract

Named entity recognition aims to classify words in a document into predefined target entity classes. It is now considered fundamental to many natural language processing tasks, such as information retrieval, machine translation, information extraction and question answering. This paper presents a workflow for building an English-Vietnamese named entity corpus from an aligned bilingual corpus. The workflow uses a state-of-the-art named entity recognition tool to identify English named entities and then maps them onto the aligned Vietnamese text. The paper also discusses in detail several mapping errors, as well as differences between English and Vietnamese sentences that affect this task.
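The abstract does not spell out how English entities are mapped into Vietnamese. For illustration only, here is a minimal sketch of one common approach, annotation projection through a word alignment. It assumes that token-level alignments are already available as (English index, Vietnamese index) pairs. The functions `project_entities` and `to_bio` and the span format are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical span format: (start, end_exclusive, label) over token indices.
Span = Tuple[int, int, str]


def project_entities(
    en_entities: List[Span],
    alignment: List[Tuple[int, int]],
) -> List[Span]:
    """Project English entity spans onto Vietnamese tokens (illustrative sketch).

    Each English span becomes the smallest contiguous Vietnamese span that
    covers every Vietnamese token aligned to any of its English tokens.
    Spans with no aligned Vietnamese token are dropped.
    """
    # Index the alignment: English token -> set of aligned Vietnamese tokens.
    en_to_vi: Dict[int, Set[int]] = {}
    for en_i, vi_j in alignment:
        en_to_vi.setdefault(en_i, set()).add(vi_j)

    projected: List[Span] = []
    for start, end, label in en_entities:
        # Collect all Vietnamese tokens aligned to this English span.
        targets: Set[int] = set()
        for i in range(start, end):
            targets |= en_to_vi.get(i, set())
        if targets:
            projected.append((min(targets), max(targets) + 1, label))
    return projected


def to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert spans to BIO tags; a later overlapping span overwrites earlier tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for k in range(start + 1, end):
            tags[k] = "I-" + label
    return tags
```

Consider the English tokens `Barack Obama visited Hanoi` aligned to the Vietnamese syllables `Barack Obama đã thăm Hà Nội`. Here the alignment maps "Hanoi" to both "Hà" and "Nội", and the covering span turns that into a single two-token LOC entity. The covering-span rule is simple, but a noisy or missing alignment link carries straight through to a wrong or dropped Vietnamese entity. That is the kind of mapping error a workflow like this has to account for.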

Citation (APA)

Ngo, Q. H., Dien, D., & Winiwarter, W. (2014). Building English-Vietnamese Named Entity Corpus with Aligned Bilingual News Articles. In Proceedings of the Conference - 5th Workshop on South and Southeast Asian NLP, WSSANLP 2014 - co-located with the 25th International Conference on Computational Linguistics, COLING 2014 (pp. 85–93). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-5512