Entity Extraction from Wikipedia List Pages

Abstract

When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia and connecting them through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. In particular, since Wikipedia’s policies permit pages about subjects only if they reach a certain level of popularity, such graphs tend to lack information about less well-known entities. Information about these entities is often available within the encyclopedia, but is not represented by an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia’s list pages, which have proven to be a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages, which we use in the second phase to train a classification model. With this approach we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
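
To make the distant-supervision idea in the abstract more concrete, here is a minimal Python sketch of how entries of a list page could be labelled using an existing knowledge graph: entries already known to carry the type expected for the list page become positive examples, entries with conflicting types become negatives, and unknown entries are candidates for newly extracted entities. All names and data structures (ListEntry, label_entries, the dbpedia_types mapping) are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ListEntry:
    text: str                      # surface form of the entry, e.g. "Jimi Hendrix"
    linked_entity: Optional[str]   # linked DBpedia entity if the entry is linked, else None

def label_entries(entries, expected_type, dbpedia_types):
    """Assign distant-supervision labels to the entries of one list page.

    expected_type -- type inferred for the list page from the taxonomy,
                     e.g. "dbo:MusicalArtist" for "List of guitarists"
    dbpedia_types -- mapping from entity identifier to its set of DBpedia types
    Returns (positives, negatives, unlabeled).
    """
    positives, negatives, unlabeled = [], [], []
    for entry in entries:
        types = dbpedia_types.get(entry.linked_entity, set()) if entry.linked_entity else set()
        if expected_type in types:
            positives.append(entry)    # known entity of the expected type
        elif types:
            negatives.append(entry)    # known entity with only conflicting types
        else:
            unlabeled.append(entry)    # candidate for a newly extracted entity
    return positives, negatives, unlabeled

if __name__ == "__main__":
    entries = [
        ListEntry("Jimi Hendrix", "dbr:Jimi_Hendrix"),
        ListEntry("Fender Stratocaster", "dbr:Fender_Stratocaster"),
        ListEntry("An unknown session guitarist", None),
    ]
    types = {
        "dbr:Jimi_Hendrix": {"dbo:MusicalArtist", "dbo:Person"},
        "dbr:Fender_Stratocaster": {"dbo:Instrument"},
    }
    pos, neg, unk = label_entries(entries, "dbo:MusicalArtist", types)
    print(len(pos), len(neg), len(unk))  # 1 1 1
```

In the paper's second phase, labelled examples like these would feed a classification model that decides which unlabeled entries denote genuine new entities of the expected type; the sketch only illustrates the labelling step.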

Cite

APA

Heist, N., & Paulheim, H. (2020). Entity Extraction from Wikipedia List Pages. In Lecture Notes in Computer Science (Vol. 12123, pp. 327–342). Springer. https://doi.org/10.1007/978-3-030-49461-2_19
