The task of coreference resolution requires people or systems to decide when two referring expressions refer to the ‘same’ entity or event. In real text, this is often a difficult decision because identity is never adequately defined, leading to contradictory treatment of cases in previous work. This paper introduces the concept of ‘near-identity’, a middle-ground category between identity and non-identity, to handle such cases systematically. We present a typology of Near-Identity Relations (NIDENT) that includes fifteen types, grouped under four main families, that capture a wide range of ways in which (near-)coreference relations hold between discourse entities. We validate the theoretical model by annotating a small sample of real data and showing that inter-annotator agreement is high enough for stability (K = 0.58, rising to K = 0.65 and K = 0.84 when leaving out one and two outliers, respectively). This work enables the subsequent creation of the first internally consistent language resource of this type through larger annotation efforts.