ARARSS: A System for Constructing and Updating Arabic Textual Resources

Abstract

The growth of electronically readable Arabic content on the web has made it a rich source from which to build new corpora or update existing ones. Such corpora benefit Arabic corpus linguistics, computational linguistics, and natural language processing. In this paper, we present ARARSS, a tool that automatically constructs and updates textual corpora from Rich Site Summary (RSS) feeds. ARARSS collects texts, categorized according to user needs, together with their metadata (for example, location, time, and topic) as provided by the RSS sources. We used ARARSS to construct a Modern Standard Arabic corpus comprising 117,819 texts and more than 28 million words. ARARSS is an open-source tool, freely available for download (http://corpus.kacst.edu.sa/more_info.jsp) along with the constructed corpus.
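As a rough illustration of the general idea described in the abstract (collecting RSS items with their metadata, grouping them by topic, and skipping items already collected when the corpus is updated), a minimal Python sketch follows. It is not the ARARSS implementation: the sample feed, the field names, and the functions `parse_feed` and `group_by_topic` are hypothetical, and the sketch assumes a plain RSS 2.0 feed whose items carry `category` and `pubDate` elements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real run would download XML from a site's RSS URL.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>عنوان أول</title>
      <link>http://example.com/a1</link>
      <category>economy</category>
      <pubDate>Mon, 01 Jan 2018 10:00:00 +0000</pubDate>
      <description>نص الخبر الأول</description>
    </item>
    <item>
      <title>عنوان ثان</title>
      <link>http://example.com/a2</link>
      <category>sports</category>
      <pubDate>Tue, 02 Jan 2018 12:30:00 +0000</pubDate>
      <description>نص الخبر الثاني</description>
    </item>
    <item>
      <title>عنوان ثالث</title>
      <link>http://example.com/a3</link>
      <category>economy</category>
      <pubDate>Wed, 03 Jan 2018 08:15:00 +0000</pubDate>
      <description>نص الخبر الثالث</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Turn every <item> of an RSS 2.0 document into a metadata record."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        records.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            # Assumption: the feed's <category> element serves as the topic label.
            "topic": (item.findtext("category") or "uncategorized").strip(),
            "published": parsedate_to_datetime(pub) if pub else None,
            "text": (item.findtext("description") or "").strip(),
        })
    return records


def group_by_topic(records, seen_links=None):
    """Group records by topic, skipping links that were already collected."""
    seen = set() if seen_links is None else seen_links
    corpus = defaultdict(list)
    for rec in records:
        if rec["link"] in seen:
            continue  # already in the corpus from an earlier run
        seen.add(rec["link"])
        corpus[rec["topic"]].append(rec)
    return dict(corpus)


if __name__ == "__main__":
    seen = set()
    corpus = group_by_topic(parse_feed(SAMPLE_FEED), seen)
    for topic, recs in corpus.items():
        words = sum(len(r["text"].split()) for r in recs)
        print(topic, len(recs), "texts,", words, "words")
```

Passing the same `seen` set to a later call is what makes an update incremental in this sketch: items whose links were collected earlier are skipped, so only new texts are added.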

Cite

Al-Thubaity, A., & Alhoshan, M. (2019). ARARSS: A System for Constructing and Updating Arabic Textual Resources. In Advances in Intelligent Systems and Computing (Vol. 845, pp. 261–269). Springer Verlag. https://doi.org/10.1007/978-3-319-99010-1_24
