Automatic text summarization is an essential tool for overcoming the problem of information overload. So far, this field has not been studied enough for the Arabic language, and only a few related works are currently available. Arabic text summarization faces two main issues: how to extract semantic relationships between textual units and how to deal with redundancy. To overcome these problems, we propose in this paper a hybrid method to generate an extractive summary of Arabic documents. Our approach is based on a two-dimensional undirected and weighted graph in which sentences are nodes and each pair of sentences is connected by two edges, one representing a statistical similarity measure and the other a semantic similarity measure. The statistical similarity measure builds on the content overlap between two sentences, while the semantic one is based on semantic information extracted from the Arabic WordNet (AWN) ontology. The score of each sentence is first computed by running the PageRank ranking algorithm on the generated graph. This score is then refined by adding other statistical features of the text, such as TF.ISF and sentence position. The final summary is built by selecting the top-ranking sentences. Finally, we deal with redundancy and information diversity issues by using an adapted maximal marginal relevance (MMR) method. Experimental results on the EASC dataset show that our proposed approach outperforms some existing Arabic summarization systems.
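The pipeline described above (similarity graph, PageRank scoring, MMR selection) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the two edges between each pair of sentences are collapsed into one weight by a hypothetical mixing parameter `alpha`, the AWN-based semantic measure is replaced by an optional user-supplied callable `semantic_sim` (falling back to the statistical measure when absent), tokenization is plain whitespace splitting, and the TF.ISF and sentence-position features are omitted. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def cosine_overlap(a, b):
    """Statistical similarity: cosine over term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration on a square weight matrix."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight j -> i.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new.append((1 - damping) / n + damping * incoming)
        scores = new
    return scores


def mmr_select(scores, sim, k, lam=0.7):
    """Greedy MMR: balance a sentence's score against its similarity to chosen ones."""
    chosen = []
    candidates = set(range(len(scores)))
    while candidates and len(chosen) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in chosen), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(sorted(candidates), key=mmr)  # sorted() makes ties deterministic
        chosen.append(best)
        candidates.remove(best)
    return chosen


def summarize(sentences, k=2, semantic_sim=None, alpha=0.5):
    """Return k sentences in document order (assumed combination of the two edge weights)."""
    if not sentences:
        return []
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                stat = cosine_overlap(tokens[i], tokens[j])
                # Placeholder: a real system would query Arabic WordNet here.
                sem = semantic_sim(tokens[i], tokens[j]) if semantic_sim else stat
                sim[i][j] = alpha * stat + (1 - alpha) * sem
    scores = pagerank(sim)
    top = max(scores)
    scores = [s / top for s in scores]  # rescale so scores and similarities are comparable
    picked = mmr_select(scores, sim, k)
    return [sentences[i] for i in sorted(picked)]
```

Calling `summarize(sentences, k=3)` on a list of sentence strings returns three of them in their original order. The `lam` parameter in `mmr_select` controls the trade-off: values near 1 favor the graph score, while lower values penalize sentences that resemble ones already selected.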
Alami, N., El Adlouni, Y., En-Nahnahi, N., & Meknassi, M. (2018). Using statistical and semantic analysis for Arabic text summarization. In Advances in Intelligent Systems and Computing (Vol. 640, pp. 35–50). Springer Verlag. https://doi.org/10.1007/978-3-319-64719-7_4