Abstract
The current release of the ODIN (Online Database of Interlinear Text) database contains over 150,000 linguistic examples, from nearly 1,500 languages, extracted from PDFs found on the web, representing a significant source of data for language research, particularly for low-resource languages. Errors introduced during PDF-to-text conversion or poorly formatted examples can make the task of automatically analyzing the data more difficult, so we aim to clean and normalize the examples in order to maximize accuracy during analysis. In this paper, we describe a system that allows users to automatically and manually correct errors in the source data in order to get the best possible analysis of the data. We also describe a RESTful service for managing collections of linguistic examples on the web. All software is distributed under an open-source license.
Cite
Georgi, R., Goodman, M. W., & Xia, F. (2016). A web-framework for ODIN annotation. In 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - System Demonstrations (pp. 31–36). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-4006