The Generative Power of Arabic Morphology and Implications: A case for pattern orientation in Arabic corpus annotation and a proposed pattern ontology

Mohammed A. El-Affendi

Conference Proceedings

Advances in Intelligent Systems and Computing (2018) 753 36-45

DOI: 10.1007/978-3-319-78753-4_4

Abstract

Most current Arabic morphological analyzers use complex rules to handle the idiosyncrasies of certain Arabic word classes and special cases. The question that arises is whether it is feasible to design a pattern-oriented morphological analyzer that streamlines the process and avoids the use of complex rules. To answer this question, a detailed study was conducted using a small representative Arabic corpus. The study revealed that most of the words in the language can be generated using a limited number of patterns, morphemes and particles. Inflected and derived words can be generated through combinations of roots and patterns. The total number of roots is around 10,000, while the total number of morphological patterns is below 1,000. The total number of particles is around 325. Around 70% of the words in the experimental corpus are templatic (based on morphological patterns). Although the number of identified patterns reached 943, only a small subset of these is active. For example, the top 12 patterns in the identified list accounted for more than 50% of the generated templatic words. Similarly, although the total number of roots is around 10,000, the number of active roots is 3,461. Particles and similar morphemes account for around 30% of the text in the experimental corpus. These features greatly simplify the development of NLP applications such as spelling correctors, normalizers, lemmatizers and higher-level applications.
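The idea that words are generated by combining roots with patterns can be illustrated with a minimal sketch. It is not the analyzer or pattern ontology described in the paper. The function names, the digit-based slot notation (`1`, `2`, `3` marking the root consonants) and the Latin transliteration are all assumptions made for illustration, and the sketch ignores the phonological adjustments that weak or doubled roots would need.

```python
from itertools import product


def apply_pattern(root, pattern):
    """Fill a pattern's numbered slots with the consonants of a triliteral root.

    `root` is a sequence of three consonants, e.g. ("k", "t", "b").
    `pattern` is a string in which the digits 1, 2 and 3 stand for the first,
    second and third root consonant; every other character is copied as-is.
    This slot notation is a hypothetical convention, not taken from the paper.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


def generate(roots, patterns):
    """Return every root/pattern combination as a {(root, pattern): word} dict."""
    return {(r, p): apply_pattern(r, p) for r, p in product(roots, patterns)}


if __name__ == "__main__":
    roots = [("k", "t", "b"), ("d", "r", "s")]
    # Three common templatic shapes in transliteration (illustrative only).
    patterns = ["1a2a3a", "1aa2i3", "ma12uu3"]
    for (root, pattern), word in generate(roots, patterns).items():
        print("-".join(root), pattern, word)
```

With two roots and three patterns the sketch yields six surface forms, which hints at why a small set of active patterns can cover a large share of templatic words. Not every root/pattern combination is an attested word, so a practical generator would presumably filter the output against a lexicon or corpus.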

Cite

El-Affendi, M. A. (2018). The Generative Power of Arabic Morphology and Implications: A case for pattern orientation in Arabic corpus annotation and a proposed pattern ontology. In Advances in Intelligent Systems and Computing (Vol. 753, pp. 36–45). Springer Verlag. https://doi.org/10.1007/978-3-319-78753-4_4
