The Impact of Text Normalization on Multiword Expressions Discovery in Persian

Abstract

This paper evaluates normalization procedures for Persian text with respect to a downstream NLP task: multiword expression (MWE) discovery. We discuss the challenges that Persian poses for NLP and evaluate open-source tools that try to address them. The best-performing tool is then used in the main task, MWE discovery. To discover MWEs, we use association measures and a subset of the MirasText corpus. The results show that the F-score is 26% higher when the input data is normalized.
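
The pipeline described in the abstract (normalize first, then score candidate word pairs with an association measure) can be illustrated with a minimal Python sketch. The abstract does not say which association measures or normalization rules the paper uses, so everything here is an assumption: pointwise mutual information (PMI) stands in as one common association measure, and the hypothetical `CHAR_MAP` covers only two well-known Arabic/Persian character variants. The names `normalize` and `pmi_bigrams` are illustrative and are not taken from the paper or from any of the tools it evaluates.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny character map: Arabic code points that
# often appear in Persian text, mapped to their Persian counterparts.
CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian kaf
})


def normalize(text):
    """Unify the character variants above and collapse whitespace runs."""
    return " ".join(text.translate(CHAR_MAP).split())


def pmi_bigrams(tokens, min_count=2):
    """Return {bigram: PMI} for adjacent pairs seen at least min_count times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Usage: normalize the raw text, tokenize on whitespace, then score bigrams.
raw_text = "a b  a b c"  # placeholder input, not corpus data
scores = pmi_bigrams(normalize(raw_text).split())
```

Without the normalization step, two spellings of the same word that differ only in the yeh or kaf code point would be counted as different tokens, splitting the bigram counts that an association measure relies on.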

Citation (APA)

Marszalek-Kowalewska, K. (2021). The Impact of Text Normalization on Multiword Expressions Discovery in Persian. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 929–939). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_106
