Abstract
Text is the most common way of representing information, so text processing and manipulation rely on a recurring set of routines and functions. Massive amounts of text are processed every day, and advances in artificial intelligence, machine learning, and deep learning have made natural language processing a critical domain. PyArabic is a collection of modules that provide basic functionality for manipulating Arabic texts, phrases, words, numbers, and letters. It primarily provides preprocessing tools, including normalization, tokenization, diacritics removal, number conversion, and transliteration. For years, researchers and developers working on machine learning algorithms for natural language processing have used the library to preprocess and clean Arabic text, and it has become increasingly important for machine learning.
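To make the preprocessing steps named in the abstract concrete, the following is a minimal sketch of diacritics removal, tatweel removal, alef normalization, and tokenization written with only the Python standard library. It is not PyArabic's implementation or API: the function names (`remove_diacritics`, `remove_tatweel`, `normalize_alef`, `simple_tokenize`) are hypothetical, and the Unicode ranges are an assumption about which characters a basic cleaner would target.

```python
import re

# Assumption: "diacritics" here means the tashkeel marks U+064B (fathatan)
# through U+0652 (sukun). This is an illustrative choice, not PyArabic's.
DIACRITICS = re.compile("[\u064B-\u0652]")

# Tatweel (kashida) is an elongation character with no lexical meaning.
TATWEEL = "\u0640"

# Alef with madda, hamza above, and hamza below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")

# A token is a run of Arabic letters and marks, or a run of digits.
TOKEN = re.compile("[\u0621-\u0652]+|\\d+")


def remove_diacritics(text: str) -> str:
    """Delete tashkeel marks, leaving the base letters."""
    return DIACRITICS.sub("", text)


def remove_tatweel(text: str) -> str:
    """Delete the tatweel elongation character."""
    return text.replace(TATWEEL, "")


def normalize_alef(text: str) -> str:
    """Map common alef variants to the bare alef U+0627."""
    return ALEF_VARIANTS.sub("\u0627", text)


def simple_tokenize(text: str) -> list[str]:
    """Split text into Arabic-word and number tokens, dropping punctuation."""
    return TOKEN.findall(text)


def preprocess(text: str) -> list[str]:
    """Hypothetical pipeline: clean the text, then tokenize it."""
    cleaned = normalize_alef(remove_tatweel(remove_diacritics(text)))
    return simple_tokenize(cleaned)
```

Calling `preprocess` on a vocalized sentence would return a list of undiacritized tokens. A real pipeline would more likely call the library's own functions, which cover many more cases than this sketch.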
Cite
Zerrouki, T. (2023). PyArabic: A Python package for Arabic text. Journal of Open Source Software, 8(84), 4886. https://doi.org/10.21105/joss.04886