Mandarin Relata: A Dataset of Word Relations and Their Semantic Types

Abstract

Training and evaluating distributional semantic models requires language datasets that are both rich in word-level descriptors and readily intuitive to human judgment. This paper introduces a dataset for Mandarin Chinese built by combining word relation pairs from two distinct sources: corpus extraction and human elicitation. Our results show that although corpus extraction yielded more word relation pairs, human-elicited semantic neighbors were almost twice as likely to agree with human raters. Together, these methods produced 4091 word relation pairs spanning hypernymy, hyponymy, synonymy, antonymy, and meronymy, accompanied by semantic type information. To date, this is the largest collection of human-rated word relation pairs in Mandarin Chinese.

Cite

Liu, H., Huang, C. R., & Hou, R. K. (2018). Mandarin Relata: A Dataset of Word Relations and Their Semantic Types. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10709 LNAI, pp. 336–340). Springer Verlag. https://doi.org/10.1007/978-3-319-73573-3_30
