Benchmarking scalable methods for streaming cross document entity coreference

Robert L. Logan; Andrew McCallum; Sameer Singh; Daniel Bikel

Conference Proceedings (Open Access)

ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (2021) 4717-4731

DOI: 10.18653/v1/2021.acl-long.364

Abstract

Streaming cross document entity coreference (CDC) systems disambiguate mentions of named entities in a scalable manner via incremental clustering. Unlike other approaches for named entity disambiguation (e.g., entity linking), streaming CDC allows for the disambiguation of entities that are unknown at inference time. Thus, it is well-suited for processing streams of data where new entities are frequently introduced. Despite these benefits, this task is currently difficult to study, as existing approaches are either evaluated on datasets that are no longer available, or omit other crucial details needed to ensure fair comparison. In this work, we address this issue by compiling a large benchmark adapted from existing free datasets, and performing a comprehensive evaluation of a number of novel and existing baseline models. We investigate: how to best encode mentions, which clustering algorithms are most effective for grouping mentions, how models transfer to different domains, and how bounding the number of mentions tracked during inference impacts performance. Our results show that the relative performance of neural and feature-based mention encoders varies across different domains, and in most cases the best performance is achieved using a combination of both approaches. We also find that performance is minimally impacted by limiting the number of tracked mentions.
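To make the idea of incremental clustering with a bounded number of tracked mentions concrete, here is a minimal sketch. It is not the paper's implementation: the class name `StreamingClusterer`, the cosine-similarity threshold, the greedy nearest-neighbor assignment rule, and the oldest-first eviction policy are all assumptions chosen for illustration, and mentions are assumed to arrive already encoded as plain numeric vectors.

```python
from collections import deque
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class StreamingClusterer:
    """Greedy nearest-neighbor incremental clustering (illustrative only)."""

    def __init__(self, threshold=0.8, max_tracked=1000):
        # Hypothetical parameters: similarity cutoff and memory bound.
        self.threshold = threshold
        # Stores (vector, cluster_id); deque drops the oldest entry when full.
        self.memory = deque(maxlen=max_tracked)
        self.next_id = 0

    def add(self, vector):
        """Assign one mention vector to a cluster and return its cluster id."""
        best_id, best_sim = None, self.threshold
        for stored, cluster_id in self.memory:
            sim = cosine(vector, stored)
            if sim >= best_sim:
                best_id, best_sim = cluster_id, sim
        if best_id is None:
            # No tracked mention is similar enough: start a new entity.
            best_id = self.next_id
            self.next_id += 1
        self.memory.append((vector, best_id))
        return best_id


clusterer = StreamingClusterer(threshold=0.9, max_tracked=2)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [clusterer.add(v) for v in stream]
print(labels)  # [0, 0, 1, 1]
```

Each incoming mention is compared only against the mentions currently held in memory, so the cost per mention is bounded by `max_tracked`. The trade-off is that an entity whose earlier mentions have all been evicted will be assigned a fresh cluster id if it reappears.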

Cite (APA)

Logan, R. L., McCallum, A., Singh, S., & Bikel, D. (2021). Benchmarking scalable methods for streaming cross document entity coreference. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 4717–4731). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-long.364
