For both the training and evaluation of semantic distributional models, language datasets are needed that are both elaborate in their word level descriptors and readily intuitive to human judgment. The current paper introduces a dataset for Mandarin Chinese constructed through the combination of word relation pairs from two distinct sources: corpus extraction, and human elicitation. Our results show that while more word relation pairs were gained through the corpus extraction process, human elicited semantic neighbors were almost twice as likely to show agreement with human raters. The current methods created 4091 word relation pairs that span hypernymy, hyponymy, synonymy, antonymy, and meronymy alongside semantic type information. To date, this is the largest collection of human-rated word relation pairs in Mandarin Chinese.
CITATION STYLE
Liu, H., Huang, C. R., & Hou, R. K. (2018). Mandarin Relata: A Dataset of Word Relations and Their Semantic Types. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10709 LNAI, pp. 336–340). Springer Verlag. https://doi.org/10.1007/978-3-319-73573-3_30
Mendeley helps you to discover research relevant for your work.