Multi-word expressions annotations effect in document classification task

Abstract

Document classification is a necessary task for most Natural Language Processing tools, since it organizes document content in a helpful and meaningful way. This paper investigates the impact of using multi-word expressions for text representation on the performance of text classification. Two text classification strategies are proposed and compared for robustness. First, we review existing linguistic resources for the Arabic language. Second, we present a classification method based on domain candidate simple terms, which are automatically extracted from multiple specialized corpora according to their frequency of occurrence. Third, we give a detailed description of a classification method based on a multi-word expressions dictionary: CompounDic, an Arabic multi-word expressions dictionary, is used to automatically annotate multi-word expressions and their variations in text. Finally, we carry out a series of experiments on classifying specialized texts using simple words and multi-word expressions for comparison. Our experiments show that the use of multi-word expression annotations improves text classification results.
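
As a rough illustration of the difference between the two text representations, the sketch below builds bag-of-words features with and without dictionary-based multi-word expression annotation. It is not the authors' implementation: the lexicon is a hypothetical English toy stand-in for CompounDic (whose entries are not shown here), and the names `MWE_LEXICON`, `annotate_mwes`, and `features` are assumptions introduced for this example, as is the greedy longest-match strategy.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a multi-word expressions dictionary.
# Keys are token sequences; values are the single annotated token.
MWE_LEXICON = {
    ("stock", "market"): "stock_market",
    ("interest", "rate"): "interest_rate",
}
MAX_MWE_LEN = max(len(key) for key in MWE_LEXICON)


def annotate_mwes(tokens):
    """Greedy longest-match: replace each known expression with one token."""
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(MAX_MWE_LEN, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in MWE_LEXICON:
                out.append(MWE_LEXICON[key])
                i += n
                break
        else:
            # No expression starts here; keep the simple word.
            out.append(tokens[i])
            i += 1
    return out


def features(text, use_mwes):
    """Bag-of-words counts, optionally after multi-word expression annotation."""
    tokens = text.lower().split()
    if use_mwes:
        tokens = annotate_mwes(tokens)
    return Counter(tokens)
```

With annotation enabled, "stock market" contributes a single `stock_market` feature instead of two unrelated simple-word features, which is the kind of representational change whose effect on classification the paper evaluates.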

Cite

Najar, D., Mesfar, S., & Ghezela, H. B. (2018). Multi-word expressions annotations effect in document classification task. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10859 LNCS, pp. 238–246). Springer Verlag. https://doi.org/10.1007/978-3-319-91947-8_23
