ID10M: Idiom Identification in 10 Languages

Simone Tedeschi; Federico Martelli; Roberto Navigli

Conference Proceedings, Open Access

Findings of the Association for Computational Linguistics: NAACL 2022 (2022), pp. 2715–2726

DOI: 10.18653/v1/2022.findings-naacl.208

Abstract

Idioms are phrases whose figurative meaning cannot be (completely) derived from the meanings of their individual components. Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely under-investigated area, especially in multilingual settings. In this paper, we address these limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality, automatically created training dataset in 10 languages, along with a novel manually curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M.

Citation (APA)

Tedeschi, S., Martelli, F., & Navigli, R. (2022). ID10M: Idiom Identification in 10 Languages. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 2715–2726). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-naacl.208