Indonesia Language Sphere: An ecosystem for dictionary development for low-resource languages



Abstract

There are more than 7,000 languages in the world. However, 95% of the world's population speaks only 5% of them, at most 400 languages. More than half of all languages have fewer than 10,000 speakers. In 2010, UNESCO released a list of 2,464 endangered languages, 144 of which are spoken in Indonesia. To preserve these languages and increase their use, we started the Indonesia Language Sphere project. The purpose of this project is to develop comprehensive sets of bilingual dictionaries for Indonesian ethnic languages. To this end, we propose a generalized bilingual lexicon induction method that combines pairs of existing dictionaries. Furthermore, to reduce the total cost of bilingual dictionary creation, we combine machine and manual creation processes and construct a planner that optimizes the order in which dictionaries are created. This paper introduces the proposed methods and reports preliminary experimental results for Indonesian, Malay, Javanese, Sundanese, and Minangkabau.
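To make the idea of combining pairs of existing dictionaries concrete, the following is a minimal sketch of plain pivot-based composition: a Javanese-Indonesian dictionary and an Indonesian-Malay dictionary are chained through the shared language to propose Javanese-Malay candidates. This is an illustration under stated assumptions, not the paper's generalized induction method; the function name `induce_via_pivot` and the two toy word lists are hypothetical, and real use would need a filtering step to discard wrong candidates produced by ambiguous pivot words.

```python
from collections import defaultdict


def induce_via_pivot(src_to_pivot, pivot_to_tgt):
    """Propose source->target translation candidates through a pivot language.

    Hypothetical helper for illustration only; it performs naive composition
    and does not rank or filter the candidates it produces.
    """
    candidates = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            # Every target translation of the pivot word becomes a candidate.
            for tgt_word in pivot_to_tgt.get(pivot_word, ()):
                candidates[src_word].add(tgt_word)
    # Sort for deterministic output.
    return {src: sorted(tgts) for src, tgts in candidates.items()}


# Toy entries chosen for illustration (not data from the paper).
jv_to_id = {"banyu": ["air"], "omah": ["rumah"]}   # Javanese -> Indonesian
id_to_ms = {"air": ["air"], "rumah": ["rumah"]}    # Indonesian -> Malay

jv_to_ms = induce_via_pivot(jv_to_id, id_to_ms)
print(jv_to_ms)
```

Candidates produced this way would typically be passed to a human reviewer, which matches the abstract's point about combining machine and manual creation steps.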

Cite

Murakami, Y. (2019). Indonesia Language Sphere: An ecosystem for dictionary development for low-resource languages. In Journal of Physics: Conference Series (Vol. 1192). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1192/1/012001
