We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity. ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.
Nguyen, K. A., Schulte im Walde, S., & Vu, N. T. (2018). Introducing two Vietnamese datasets for evaluating semantic models of (dis-)similarity and relatedness. In NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference (Vol. 2, pp. 199–205). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/n18-2032