Construction of an English-Uyghur WordNet Dataset

Kahaerjiang Abiderexiti; Maosong Sun

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019), vol. 11856 LNAI, pp. 382–393

DOI: 10.1007/978-3-030-32381-3_31

Abstract

Automatically building semantic resources is essential for low-resource languages like Uyghur. However, Uyghur lacks a publicly available evaluation dataset for automatically building semantic resources such as WordNet. To address this problem, we first build the largest Uyghur-English and English-Uyghur dictionaries by exploiting many available online and offline resources. Then, using Princeton WordNet (PWN) 3.0 and the Contemporary Uyghur Detailed Dictionary (CUDD), we construct an English-Uyghur WordNet evaluation dataset, which is publicly available (https://github.com/kaharjan/uywordnet). In this dataset, more than 73,000 English synsets are mapped to Uyghur automatically, of which over 20,000 are annotated manually. The corresponding Uyghur words include definitions and examples in Uyghur language context. We also propose a Synset Mapping based on Word Embeddings (SMWE) method. The experimental results on the dataset are promising.
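
The abstract does not say how SMWE works internally, so the following is not the authors' method. It is only a minimal sketch of the general idea the name suggests: take the dictionary translations of a synset's English lemmas as candidates and rank them by embedding similarity. Every name here (`rank_candidates`, the `uy_word_*` placeholder tokens, the toy vectors) is hypothetical, and the sketch assumes that English and Uyghur vectors live in one shared cross-lingual space, which the abstract does not state.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rank_candidates(synset_lemmas, dictionary, src_vecs, tgt_vecs):
    """Rank dictionary translations of a synset's lemmas by similarity
    to the centroid of the lemmas' embeddings (illustrative assumption)."""
    # Candidate pool: every translation of every lemma in the synset.
    candidates = {t for lemma in synset_lemmas for t in dictionary.get(lemma, [])}
    known = [src_vecs[lemma] for lemma in synset_lemmas if lemma in src_vecs]
    if not known:
        return []
    centroid = mean_vector(known)
    scored = [
        (t, cosine(centroid, tgt_vecs[t])) for t in candidates if t in tgt_vecs
    ]
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: placeholder tokens stand in for real Uyghur dictionary entries.
toy_dictionary = {"dog": ["uy_word_a", "uy_word_b"], "hound": ["uy_word_a"]}
toy_src_vecs = {"dog": [1.0, 0.0], "hound": [0.8, 0.2]}
toy_tgt_vecs = {"uy_word_a": [0.9, 0.1], "uy_word_b": [0.0, 1.0]}

ranked = rank_candidates(["dog", "hound"], toy_dictionary, toy_src_vecs, toy_tgt_vecs)
print(ranked)
```

In a real setting, the synsets would come from PWN 3.0 and the candidates from the bilingual dictionaries described above. The manually annotated portion of the dataset could then serve to check whether the top-ranked candidate is correct.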

Cite

Abiderexiti, K., & Sun, M. (2019). Construction of an English-Uyghur WordNet Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11856 LNAI, pp. 382–393). Springer. https://doi.org/10.1007/978-3-030-32381-3_31
