Existing evaluations of entity linking systems often say little about how a system will perform for a particular application. There are two fundamental reasons for this. One is that many evaluations only report aggregate measures (such as precision, recall, and F1 score), without a detailed error analysis or a closer look at the results. The other is that all of the widely used benchmarks have strong biases and artifacts, in particular: a strong focus on named entities, an unclear or missing specification of what else counts as an entity mention, poor handling of ambiguities, and an over- or underrepresentation of certain kinds of entities. We provide a more meaningful and fair in-depth evaluation of a variety of existing end-to-end entity linkers. We characterize their strengths and weaknesses and also report on reproducibility aspects. The detailed results of our evaluation can be inspected at https://elevant.cs.uni-freiburg.de/emnlp2023. Our evaluation is based on several widely used benchmarks, which exhibit the problems mentioned above to various degrees, as well as on two new benchmarks that address these problems. The new benchmarks are available at https://github.com/ad-freiburg/fair-entitylinking-benchmarks.
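To make concrete what "aggregate measures" means here, the following is a minimal illustrative sketch (not taken from the paper) of how micro-averaged precision, recall, and F1 are typically computed in entity linking evaluations, by comparing predicted (mention span, entity) pairs against gold annotations. The document identifiers, spans, and entity ids below are hypothetical example data.

```python
def precision_recall_f1(gold, predicted):
    """gold, predicted: sets of (doc_id, start, end, entity_id) tuples."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Hypothetical gold and predicted annotations for two documents.
gold = {
    ("doc1", 0, 6, "Q64"),      # "Berlin" -> Berlin
    ("doc1", 24, 31, "Q183"),   # "Germany" -> Germany
    ("doc2", 5, 11, "Q1726"),   # "Munich" -> Munich
}
predicted = {
    ("doc1", 0, 6, "Q64"),      # correct mention span and entity
    ("doc1", 24, 31, "Q42"),    # correct span, wrong entity -> counts as an error
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")  # P=0.50 R=0.33 F1=0.40
```

A single number like the F1 above does not distinguish, for example, mention detection errors from disambiguation errors, which is why the paper argues that aggregate measures alone say little about a system's suitability for a given application.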
Bast, H., Hertel, M., & Prange, N. (2023). A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 6659–6672). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.411