KnowledgeNet: A benchmark dataset for knowledge base population

Abstract

KnowledgeNet is a benchmark dataset for the task of automatically populating a knowledge base (Wikidata) with facts expressed in natural language text on the web. KnowledgeNet provides text exhaustively annotated with facts, thus enabling the end-to-end evaluation of knowledge base population systems, unlike previous benchmarks that are better suited to evaluating individual subcomponents (e.g., entity linking, relation extraction). We discuss five baseline approaches; the best achieves an F1 score of 0.50, significantly outperforming a traditional approach (0.28), a relative improvement of 79%. However, our best baseline is far from reaching human performance (0.82), indicating that our dataset is challenging. The KnowledgeNet dataset and baselines are available at https://github.com/diffbot/knowledge-net.
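
The arithmetic behind the reported scores can be sketched as follows. This is a minimal illustration, not the benchmark's official scorer: it assumes facts are compared as exact-match (subject, property, object) triples, and the identifiers in the toy example are hypothetical. Only the scores 0.50, 0.28, and 0.82 come from the abstract.

```python
def f1_score(predicted: set, gold: set) -> float:
    """F1 over sets of facts: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Hypothetical toy facts (identifiers are made up for illustration).
gold = {
    ("E1", "founded_by", "E2"),
    ("E1", "headquartered_in", "E3"),
    ("E2", "born_in", "E4"),
}
predicted = {
    ("E1", "founded_by", "E2"),  # correct
    ("E2", "born_in", "E5"),     # wrong object, counts as a miss
}

toy_f1 = f1_score(predicted, gold)

# Scores as rounded in the abstract.
best_f1, traditional_f1, human_f1 = 0.50, 0.28, 0.82
gain = relative_improvement(best_f1, traditional_f1)
gap_to_human = human_f1 - best_f1

print(f"toy F1: {toy_f1:.2f}")
print(f"relative improvement: {gain:.0%}")
print(f"gap to human performance: {gap_to_human:.2f} F1")
```

With the rounded figures, (0.50 - 0.28) / 0.28 is roughly 0.79, which matches the 79% stated in the abstract.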

Cite

Mesquita, F., Cannaviccio, M., Schmidek, J., Mirza, P., & Barbosa, D. (2019). KnowledgeNet: A benchmark dataset for knowledge base population. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 749–758). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1069
