We present a clustering algorithm for Arabic words sharing the same root. Root-based clusters can substitute for dictionaries in indexing for IR. Modifying Adamson and Boreham (1974), our two-stage algorithm applies light stemming before calculating word-pair similarity coefficients using techniques sensitive to Arabic morphology. Tests show successful treatment of infixes and clustering accuracy of up to 94.06% on unedited Arabic text samples, without the use of dictionaries.
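The two stages described above can be illustrated with a minimal sketch. It assumes the baseline Adamson and Boreham measure, a Dice coefficient over the unique character digrams of each word, and it does not reproduce the morphology-sensitive modifications, which the abstract does not specify. The affix lists, the `min_len` threshold, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical affix lists for illustration only; not the lists used in the paper.
PREFIXES = ("ال", "و")
SUFFIXES = ("ات", "ون", "ة")


def light_stem(word: str, min_len: int = 3) -> str:
    """Stage 1: strip at most one listed prefix and one listed suffix,
    keeping at least min_len characters (an assumed threshold)."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[:-len(s)]
            break
    return word


def bigrams(word: str) -> set[str]:
    """Return the set of unique adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over unique digrams: 2 * shared / (count_a + count_b)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def similarity(a: str, b: str) -> float:
    """Stage 2: compare the light-stemmed forms of two words."""
    return dice(light_stem(a), light_stem(b))


if __name__ == "__main__":
    print(similarity("الكتاب", "كتاب"))  # definite article stripped before comparison
    print(similarity("كتاب", "كاتب"))    # same root k-t-b, differing by an infix
```

Under these assumptions the first pair scores 1.0, because light stemming removes the prefix, while the second pair scores 0.0: the two words share a root but no digram. This suggests why plain digram matching is not enough for infixes and why the abstract stresses morphology-sensitive similarity techniques.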
De Roeck, A. N., & Al-Fares, W. (2000). A morphologically sensitive clustering algorithm for identifying Arabic roots. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2000-October). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1075218.1075244