This survey and position paper discusses ways to improve the coverage of resources such as WordNet. Rapp estimated correlations, ρ, between corpus statistics and psycholinguistic norms. ρ improves with both quantity (corpus size) and quality (balance). 1M words are enough for simple estimates (unigram frequencies), but at least 100M words are required for statistics over pairs of words (word associations, edges). Knowledge Graph Completion (KGC) attempts to learn missing links in WN18. Unfortunately, WN18 is flawed: information leaks from the training set to the test set. More seriously, WN18 is based on SemCor, which is small (just 200k words) and dated (collected in the 1960s). KGC therefore cannot learn anything that has happened since the 1960s, or associations that require 100M words.
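The abstract does not say what form the train-to-test leakage takes. One commonly reported form in WN18 is inverse relations, where a test triple is simply a training triple with head and tail swapped. The following is a minimal sketch of how such leakage might be measured, assuming triples are given as (head, relation, tail) tuples; the function name `inverse_leak_rate` and the toy triples are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def inverse_leak_rate(train: Iterable[Triple], test: Iterable[Triple]) -> float:
    """Fraction of test triples whose entity pair occurs reversed in train.

    This is an illustrative check, not the paper's method: a test triple
    (h, r, t) counts as leaked if some training triple links t to h,
    under any relation.
    """
    # Store every training pair in reversed order for O(1) lookup.
    reversed_pairs: Set[Tuple[str, str]] = {(t, h) for h, _, t in train}
    test_list = list(test)
    if not test_list:
        return 0.0
    leaked = sum(1 for h, _, t in test_list if (h, t) in reversed_pairs)
    return leaked / len(test_list)


# Toy data (hypothetical, for illustration only).
train = [
    ("dog", "_hypernym", "canine"),
    ("cat", "_hypernym", "feline"),
]
test = [
    ("canine", "_hyponym", "dog"),   # inverse of a training triple
    ("feline", "_hyponym", "lion"),  # no reversed pair in train
]

rate = inverse_leak_rate(train, test)
print(f"inverse leak rate: {rate:.2f}")
```

A high rate would suggest that a model can score well on the test set by memorizing reversed training pairs rather than by learning new associations.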
CITATION STYLE
Church, K., & Bian, Y. (2021). Data Collection vs. Knowledge Graph Completion: What is Needed to Improve Coverage? In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 6210–6215). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.501