Mandarin Relata: A Dataset of Word Relations and Their Semantic Types

Hongchao Liu; Chu Ren Huang; Ren Kui Hou

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 10709 LNAI, pp. 336–340 (2018)

DOI: 10.1007/978-3-319-73573-3_30

Abstract

For both the training and evaluation of semantic distributional models, language datasets are needed that are both elaborate in their word-level descriptors and readily intuitive to human judgment. The current paper introduces a dataset for Mandarin Chinese constructed by combining word relation pairs from two distinct sources: corpus extraction and human elicitation. Our results show that while the corpus extraction process yielded more word relation pairs, human-elicited semantic neighbors were almost twice as likely to show agreement with human raters. The current methods created 4091 word relation pairs that span hypernymy, hyponymy, synonymy, antonymy, and meronymy, alongside semantic type information. To date, this is the largest collection of human-rated word relation pairs in Mandarin Chinese.
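To make the shape of such a dataset concrete, the following is a minimal Python sketch of how a word relation pair and the per-source rater agreement might be represented. It is an illustration under stated assumptions, not the authors' format or method: the record type `RelationPair`, its field names, the source labels `"corpus"` and `"elicitation"`, and the function `agreement_by_source` are all hypothetical, and the rows are placeholders rather than entries from Mandarin Relata. Only the five relation types and the two sources come from the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five relation types named in the abstract.
RELATIONS = {"hypernymy", "hyponymy", "synonymy", "antonymy", "meronymy"}
# The two sources named in the abstract; the label strings are assumptions.
SOURCES = {"corpus", "elicitation"}


@dataclass(frozen=True)
class RelationPair:
    # All field names are hypothetical; the published file format may differ.
    cue: str        # the target word
    relatum: str    # the word related to the cue
    relation: str   # one of RELATIONS
    source: str     # one of SOURCES
    accepted: bool  # True if human raters agreed with the proposed relation

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")


def agreement_by_source(pairs):
    """Return the fraction of rater-accepted pairs for each source."""
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for pair in pairs:
        totals[pair.source] += 1
        accepted[pair.source] += int(pair.accepted)
    return {source: accepted[source] / totals[source] for source in totals}


# Placeholder rows for illustration only; not taken from the dataset.
toy_pairs = [
    RelationPair("w1", "w2", "synonymy", "corpus", True),
    RelationPair("w1", "w3", "antonymy", "corpus", False),
    RelationPair("w4", "w5", "hypernymy", "elicitation", True),
    RelationPair("w4", "w6", "meronymy", "elicitation", True),
]
rates = agreement_by_source(toy_pairs)
```

Comparing the two entries of `rates` is the kind of check that underlies the abstract's statement about corpus-extracted versus human-elicited pairs; the toy values here say nothing about the reported figures.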

Citation (APA)

Liu, H., Huang, C. R., & Hou, R. K. (2018). Mandarin Relata: A Dataset of Word Relations and Their Semantic Types. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10709 LNAI, pp. 336–340). Springer Verlag. https://doi.org/10.1007/978-3-319-73573-3_30
