Indexing unstructured documents aims to build a list of words, or concepts, that will simplify their later exploration. The most widely used model for representing text is the Vector Space Model. Although this model is simple to implement and widely used in text mining and information retrieval research, it has an important limitation: it treats textual units as independent and therefore ignores the semantic relations between them. Data mining offers a more suitable technique for highlighting the semantic links between text units, namely graph-based representation. A graph adapts easily to textual data by representing words as vertices and the relations between them as edges. In this work, we introduce graph-based modeling of textual documents and study how the choice of semantic relation between text units affects document indexing. We validated our results through document classification.
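The graph representation described above can be illustrated with a minimal sketch. The abstract does not specify which semantic relations the authors use, so this example assumes a simple stand-in: two words are linked when they occur within a small sliding window of each other, and the edge weight counts how often that happens. The function name `build_cooccurrence_graph` and the `window` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph as {word: {neighbor: count}}.

    Assumption: the relation between words is co-occurrence within the
    next `window` tokens. This is an illustrative stand-in, not the
    semantic relation studied in the paper.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        graph[word]  # touch the key so every word becomes a vertex
        # Look at up to `window` tokens that follow the current one.
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            other = tokens[j]
            if other == word:
                continue  # skip self-loops
            # Undirected edge: update both directions.
            graph[word][other] += 1
            graph[other][word] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {w: dict(neighbors) for w, neighbors in graph.items()}


tokens = "graph models link words and graph models link concepts".split()
graph = build_cooccurrence_graph(tokens, window=2)
```

Unlike a bag-of-words vector, the resulting structure keeps which words appear near each other, so `graph["graph"]` lists the neighbors of the word "graph" together with their co-occurrence counts.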
CITATION STYLE
El Bazzi, M. S., Mammass, D., Zaki, T., & Ennaji, A. (2018). Graph-based text modeling: Considering mathematical semantic linking to improve the indexation of Arabic documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10884 LNCS, pp. 138–144). Springer Verlag. https://doi.org/10.1007/978-3-319-94211-7_16