Topic detection and multi-word terms extraction for Arabic Unvowelized documents

Rim Koulali; Abdelouafi Meziane

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 7097 LNCS 614-623

DOI: 10.1007/978-3-642-25631-8_56

Abstract

This paper focuses on Topic Detection (TD) for Arabic Unvowelized documents. Our topic detection system was implemented using two different metrics: an adapted TF-IDF and the Jaccard indicator. The experiments studied the impact of working with stems or roots of words, and with all words or nouns only. To enhance the TD system, we developed a multi-word term (MWT) extraction prototype to generate MWT vocabularies. To the best of our knowledge, MWT vocabularies have never been used in topic detection for Arabic documents. In this paper we investigate the impact of such use on the quality of topic detection. We used the standard measures of Recall, Precision and F-measure to evaluate the performance of the resulting systems on Wattan, an Arabic newspaper corpus. © 2011 Springer-Verlag Berlin Heidelberg.
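
As a rough illustration of two ingredients the abstract names, the sketch below computes a Jaccard score between a document's term set and per-topic vocabularies, then scores the predictions with Precision, Recall and F-measure. It is a minimal sketch under stated assumptions, not the authors' system: the abstract does not say how the Jaccard indicator or the adapted TF-IDF are applied, so the assignment rule (pick the topic with the highest overlap), the function names, and the English placeholder terms are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def detect_topic(doc_terms: Set[str], vocabularies: Dict[str, Set[str]]) -> str:
    """Assumed rule: assign the topic whose vocabulary overlaps most with the document."""
    return max(vocabularies, key=lambda topic: jaccard(doc_terms, vocabularies[topic]))


def precision_recall_f(predicted: List[str], gold: List[str], topic: str) -> Tuple[float, float, float]:
    """Per-topic Precision, Recall and F-measure (harmonic mean of the two)."""
    tp = sum(1 for p, g in zip(predicted, gold) if p == topic and g == topic)
    fp = sum(1 for p, g in zip(predicted, gold) if p == topic and g != topic)
    fn = sum(1 for p, g in zip(predicted, gold) if p != topic and g == topic)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Placeholder vocabularies; in practice these would hold Arabic stems, roots, or MWTs.
vocabularies = {
    "sport": {"goal", "match", "league"},
    "economy": {"bank", "market"},
}
topic = detect_topic({"goal", "match", "team"}, vocabularies)
```

In a setup like this, the vocabulary entries could be single stems or roots, or multi-word terms treated as single tokens, which is one way to compare the configurations the abstract describes.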

Cite

Koulali, R., & Meziane, A. (2011). Topic detection and multi-word terms extraction for Arabic Unvowelized documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7097 LNCS, pp. 614–623). https://doi.org/10.1007/978-3-642-25631-8_56
