While large Knowledge Graphs (KGs) already cover a broad range of domains to an extent sufficient for general use, they typically lack emerging entities that are just starting to attract the public interest. This disqualifies such KGs for tasks like entity-based media monitoring, since a large portion of news inherently covers entities that have not been noted by the public before. Such entities are unlinkable, which ultimately means, they cannot be monitored in media streams. This is the first paper that thoroughly investigates all types of challenges that arise from out-of-KG entities for entity linking tasks. By large-scale analytics of news streams we quantify the importance of each challenge for realworld applications. We then propose a machine learning approach which tackles the most frequent but least investigated challenge, i.e., when entities are missing in the KG and cannot be considered by entity linking systems. We construct a publicly available benchmark data set based on English news articles and editing behavior on Wikipedia. Our experiments show that predicting whether an entity will be added toWikipedia is challenging. However, we can reliably identify emerging entities that could be added to the KG according to Wikipedia’s own notability criteria.
CITATION STYLE
Färber, M., Rettinger, A., & El Asmar, B. (2016). On emerging entity detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10024 LNAI, pp. 223–238). Springer Verlag. https://doi.org/10.1007/978-3-319-49004-5_15
Mendeley helps you to discover research relevant for your work.