Abstract
This paper evaluates normalization procedures for Persian text for a downstream NLP task: multiword expression (MWE) discovery. We discuss the challenges that the Persian language poses for NLP and evaluate open-source tools that try to address them. The best-performing tool is then used in the main task, MWE discovery. To discover MWEs, we use association measures and a subset of the MirasText corpus. The results show that the F-score is 26% higher when the input data is normalized.
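As a rough illustration of what discovering MWEs with an association measure can look like, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI). The abstract does not say which measures the paper uses, so PMI is an assumption here, and the function name `pmi_bigrams` and the `min_count` threshold are hypothetical. In this setting, normalization would be applied to the tokens before counting, so that spelling variants of the same word are counted together.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative only: PMI is one common association measure, assumed
    here rather than taken from the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs give unreliable PMI estimates, so skip them.
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest-scoring pairs are the strongest MWE candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    sample = "new york is big new york is far is it".split()
    for pair, score in pmi_bigrams(sample):
        print(pair, round(score, 3))
```

Candidates ranked this way would then be compared against a gold list of MWEs to compute precision, recall, and F-score.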
Cite
Marszalek-Kowalewska, K. (2021). The Impact of Text Normalization on Multiword Expressions Discovery in Persian. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 929–939). Incoma Ltd. https://doi.org/10.26615/978-954-452-072-4_106