Language-agnostic relation extraction from Wikipedia abstracts

Nicolas Heist; Heiko Paulheim

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10587 LNCS 383-399

DOI: 10.1007/978-3-319-68288-4_23

14 Citations

18 Readers

Abstract

Large-scale knowledge graphs, such as DBpedia, Wikidata, or YAGO, can be enhanced by relation extraction from text, using the data in the knowledge graph as training data, i.e., using distant supervision. While most existing approaches use language-specific methods (usually for English), we present a language-agnostic approach that exploits background knowledge from the graph instead of language-specific techniques and builds machine learning models only from language-independent features. We demonstrate the extraction of relations from Wikipedia abstracts, using the twelve largest language editions of Wikipedia. From those, we can extract 1.6M new relations in DBpedia at a level of precision of 95%, using a RandomForest classifier trained only on language-independent features. Furthermore, we show an exemplary geographical breakdown of the information extracted.
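The distant supervision idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `label_candidates`, the three features, and the toy data are all hypothetical, and treating every linked entity that is not backed by a triple as a negative example is a simplifying assumption. The sketch only shows how triples already in the graph can label entity links found in an abstract, and how each candidate can be described without any language-specific processing.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    subject: str
    obj: str
    features: dict
    label: bool


def label_candidates(subject, linked_entities, relation, kg_triples, entity_types):
    """Label entity links in one abstract using triples already in the graph.

    linked_entities: list of (entity, sentence_index) in order of appearance.
    kg_triples: set of (subject, relation, object) tuples (hypothetical toy graph).
    entity_types: mapping from entity to a type name taken from the graph.
    """
    candidates = []
    for position, (entity, sentence_index) in enumerate(linked_entities):
        # Illustrative language-independent features: positions and graph types,
        # no tokenization, parsing, or vocabulary of any particular language.
        features = {
            "position_in_abstract": position,
            "sentence_index": sentence_index,
            "entity_type": entity_types.get(entity, "Unknown"),
        }
        # Distant supervision: the graph, not a human annotator, supplies the label.
        # Assumption: anything not in the graph counts as a negative example.
        label = (subject, relation, entity) in kg_triples
        candidates.append(Candidate(subject, entity, features, label))
    return candidates


# Toy data, invented for illustration only.
kg = {("Mannheim", "country", "Germany")}
types = {"Germany": "Country", "Rhine": "River"}
links = [("Germany", 0), ("Rhine", 1)]
examples = label_candidates("Mannheim", links, "country", kg, types)
```

Feature dictionaries of this kind could then be vectorized and passed to a classifier such as a random forest, with one model trained per relation.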

Cite

Heist, N., & Paulheim, H. (2017). Language-agnostic relation extraction from Wikipedia abstracts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10587 LNCS, pp. 383–399). Springer Verlag. https://doi.org/10.1007/978-3-319-68288-4_23
