ID10M: Idiom Identification in 10 Languages

Abstract

Idioms are phrases which present a figurative meaning that cannot be (completely) derived by looking at the meaning of their individual components. Identifying and understanding idioms in context is a crucial goal and a key challenge in a wide range of Natural Language Understanding tasks. Although efforts have been undertaken in this direction, the automatic identification and understanding of idioms is still a largely underinvestigated area, especially when operating in a multilingual scenario. In this paper, we address such limitations and put forward several new contributions: we propose a novel multilingual Transformer-based system for the identification of idioms; we produce a high-quality automatically-created training dataset in 10 languages, along with a novel manually-curated evaluation benchmark; finally, we carry out a thorough performance analysis and release our evaluation suite at https://github.com/Babelscape/ID10M.

Cite

Tedeschi, S., Martelli, F., & Navigli, R. (2022). ID10M: Idiom Identification in 10 Languages. In Findings of the Association for Computational Linguistics: NAACL 2022 (pp. 2715–2726). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.findings-naacl.208
