The lexical substitution task aims at finding suitable replacements for words in context. It has proved to be useful in several areas, such as word sense induction and text simplification, as well as in more practical applications such as writing-assistant tools. However, the paucity of annotated data has forced researchers to rely mainly on unsupervised approaches, limiting the applicability of large pre-trained models and thus hampering the potential benefits of supervised approaches to the task. In this paper, we mitigate this issue by proposing ALaSca, a novel approach to automatically creating large-scale datasets for English lexical substitution. ALaSca allows examples to be produced for potentially any word in a language vocabulary and to cover most of the meanings it lists. Thanks to this, we can unleash the full potential of neural architectures and fine-tune them on the lexical substitution task. Indeed, when using our data, a transformer-based model performs substantially better than when using manually annotated data only. We release ALaSca at https://sapienzanlp.github.io/alasca/.
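To make the task concrete, the following is a minimal sketch of lexical substitution with a pre-trained masked language model: the target word is masked and the model's predictions for the slot are taken as candidate substitutes. This is only an illustration of the task setting, not the ALaSca pipeline or the fine-tuned model described in the paper; the choice of bert-base-uncased, the HuggingFace transformers API, and the simple top-k ranking are assumptions made for the example.

```python
# Illustrative sketch of in-context substitute generation with a masked LM.
# NOT the ALaSca data-generation method; model and scoring are placeholder choices.
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def substitutes(sentence: str, target: str, top_k: int = 10):
    # Mask the first occurrence of the target word and let the model
    # rank vocabulary items for that slot.
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
    scores = logits[0, mask_pos[0]].softmax(dim=-1)
    top = scores.topk(top_k)
    candidates = [tokenizer.decode([i]).strip() for i in top.indices.tolist()]
    # Drop the target itself so only true substitutes remain.
    return [c for c in candidates if c.lower() != target.lower()]

print(substitutes("The bright student solved the problem quickly.", "bright"))
```

A supervised lexical substitution model, by contrast, would be fine-tuned on annotated (sentence, target, substitutes) examples, which is exactly where large-scale data such as that produced by ALaSca becomes useful.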
Citation: Lacerra, C., Pasini, T., Tripodi, R., & Navigli, R. (2021). ALaSca: An Automated Approach for Large-Scale Lexical Substitution. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021) (pp. 3836–3842). https://doi.org/10.24963/ijcai.2021/528