Effective methods of the detection of multiword expressions are important for many technologies related to Natural Language Processing. Most contemporary methods are based on the sequence labeling scheme, while traditional methods use statistical measures. In our approach, we want to integrate the concepts of those two approaches. In this paper, we present a novel weakly supervised multiword expressions extraction method which focuses on their behaviour in various contexts. Our method uses a lexicon of Polish multiword units as the reference knowledge base and leverages neural language modelling with deep learning architectures. In our approach, we do not need a corpus annotated specifically for the task. The only required components are: a lexicon of multiword units, a large corpus, and a general contextual embeddings model. Compared to the method based on non-contextual embeddings, we obtain gains of 15% points of the macro F1-score for both classes and 30% points of the F1-score for the incorrect multiword expressions. The proposed method can be quite easily applied to other languages.
CITATION STYLE
Piasecki, M., & Kanclerz, K. (2022). Is Context All You Need? Non-contextual vs Contextual Multiword Expressions Detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13350 LNCS, pp. 248–261). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-08751-6_18
Mendeley helps you to discover research relevant for your work.