Data Collection vs. Knowledge Graph Completion: What is Needed to Improve Coverage?

Abstract

This survey/position paper discusses ways to improve the coverage of resources such as WordNet. Rapp estimated correlations, ρ, between corpus statistics and psycholinguistic norms. ρ improves with quantity (corpus size) and quality (balance). 1M words are enough for simple estimates (unigram frequencies), but at least 100M words are required for pairs of words (word associations, edges). Knowledge Graph Completion (KGC) attempts to learn missing links in WN18. Unfortunately, WN18 is flawed: information leaks from the training set to the test set. More seriously, WN18 is based on SemCor (just 200k words) and is dated (collected in the 1960s). KGC cannot learn anything that happened since the 1960s, or associations that require 100M words.
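
The abstract does not say what form the train/test leakage takes. One commonly discussed form in WN18 is a test triple whose entity pair already appears in the training set in the opposite direction under an inverse relation (for example hypernym vs. hyponym). The following is a minimal sketch of how such a check might look, assuming triples are stored as (head, relation, tail) tuples. The function name `reversed_pair_leaks` and the toy triples are hypothetical and are not taken from the paper or from the WN18 files.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def reversed_pair_leaks(train: Iterable[Triple], test: Iterable[Triple]) -> List[Triple]:
    """Return test triples whose entity pair occurs in train in either direction."""
    seen: Set[Tuple[str, str]] = set()
    for head, _relation, tail in train:
        # Record the pair in both orders so inverse edges are caught too.
        seen.add((head, tail))
        seen.add((tail, head))
    return [t for t in test if (t[0], t[2]) in seen]


# Toy data for illustration only (not real WN18 triples).
train = [("dog", "_hypernym", "canine"), ("cat", "_hypernym", "feline")]
test = [("canine", "_hyponym", "dog"), ("feline", "_hypernym", "carnivore")]

leaks = reversed_pair_leaks(train, test)
leak_rate = len(leaks) / len(test)
print(leaks, leak_rate)
```

A high leak rate under a check like this would suggest that a model can score well on the test set by memorizing reversed training edges rather than by inferring genuinely missing links.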

Church, K., & Bian, Y. (2021). Data Collection vs. Knowledge Graph Completion: What is Needed to Improve Coverage? In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 6210–6215). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.501
