Data Collection vs. Knowledge Graph Completion: What is Needed to Improve Coverage?

Kenneth Church; Yuchen Bian

Conference Proceedings, Open Access

EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (2021) 6210-6215

DOI: 10.18653/v1/2021.emnlp-main.501

6 Citations

67 Readers

Abstract

This survey/position paper discusses ways to improve coverage of resources such as WordNet. Rapp estimated correlations, ρ, between corpus statistics and psycholinguistic norms. ρ improves with quantity (corpus size) and quality (balance). 1M words are enough for simple estimates (unigram frequencies), but at least 100M are required for pairs of words (word associations, edges). Knowledge Graph Completion (KGC) attempts to learn missing links in WN18. Unfortunately, WN18 is flawed, with information leaking from train to test. More seriously, WN18 is based on SemCor (just 200k words) and is dated (collected in the 1960s). KGC cannot learn anything that happened since the 1960s, or associations requiring 100M words.
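
The abstract does not say how information leaks from train to test. One commonly reported form of leakage in link-prediction benchmarks is a test edge whose entity pair already occurs in the training set, possibly reversed under an inverse relation. The sketch below, which assumes that form of leakage, counts such test triples. The function name `reversed_pair_leaks`, the relation labels, and the toy triples are hypothetical and are not taken from the paper or from WN18 itself.

```python
from typing import Iterable, List, Set, Tuple

# A triple is (head, relation, tail). Illustrative layout, not the WN18 file format.
Triple = Tuple[str, str, str]


def reversed_pair_leaks(train: Iterable[Triple], test: Iterable[Triple]) -> List[Triple]:
    """Return test triples whose entity pair already appears in train,
    in either direction, regardless of the relation label."""
    seen: Set[Tuple[str, str]] = set()
    for head, _relation, tail in train:
        seen.add((head, tail))
        seen.add((tail, head))  # also catch the reversed (inverse-relation) pair
    return [t for t in test if (t[0], t[2]) in seen]


# Toy data (hypothetical): the first test triple is the inverse of a training triple.
train = [("dog", "_hypernym", "canine"), ("cat", "_hypernym", "feline")]
test = [("canine", "_hyponym", "dog"), ("feline", "_hypernym", "carnivore")]

leaks = reversed_pair_leaks(train, test)
leak_rate = len(leaks) / len(test)
```

A high leak rate under this check would suggest that a model can score well on the test set by memorizing reversed training edges rather than by inferring new links.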

Cite

APA

Church, K., & Bian, Y. (2021). Data Collection vs. Knowledge Graph Completion: What is Needed to Improve Coverage? In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 6210–6215). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.501
