The growing volume of machine-readable Arabic content on the web is a rich source from which to build new corpora or update existing ones. Such corpora benefit Arabic corpus linguistics, computational linguistics, and natural language processing. In this paper, we present ARARSS, a tool that automatically constructs and updates textual corpora from Rich Site Summary (RSS) feeds. ARARSS collects texts categorized according to user needs, together with their metadata (for example, location, time, and topic) as provided by the RSS sources. We used ARARSS to construct a Modern Standard Arabic corpus comprising 117,819 texts and more than 28 million words. ARARSS is open source and freely available for download (http://corpus.kacst.edu.sa/more_info.jsp), along with the constructed corpus.
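To illustrate the general idea of collecting categorized texts and their metadata from RSS, the following is a minimal Python sketch. It is not the ARARSS implementation: the function names (`parse_items`, `update_corpus`), the dictionary layout, and the embedded sample feed are all assumptions made for illustration, and it handles only plain RSS 2.0 elements (`title`, `link`, `description`, `category`, `pubDate`).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real collector would download this from a news site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>عنوان الخبر الأول</title>
      <link>http://example.com/1</link>
      <description>نص الخبر الأول</description>
      <category>اقتصاد</category>
      <pubDate>Mon, 01 Jan 2018 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>عنوان الخبر الثاني</title>
      <link>http://example.com/2</link>
      <description>نص الخبر الثاني</description>
      <category>رياضة</category>
      <pubDate>Tue, 02 Jan 2018 12:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml):
    """Extract the text and metadata of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "text": item.findtext("description", default="").strip(),
            # Items without a <category> element fall back to a default topic.
            "topic": item.findtext("category", default="uncategorized").strip(),
            "time": item.findtext("pubDate", default="").strip(),
        })
    return items


def update_corpus(corpus, items):
    """Add unseen items to a {topic: [item, ...]} corpus; return how many were added."""
    seen = {it["link"] for texts in corpus.values() for it in texts}
    added = 0
    for it in items:
        if it["link"] in seen:
            continue  # already collected on an earlier run
        corpus.setdefault(it["topic"], []).append(it)
        seen.add(it["link"])
        added += 1
    return added


if __name__ == "__main__":
    corpus = {}
    print(update_corpus(corpus, parse_items(SAMPLE_FEED)), "new texts")
    for topic, texts in corpus.items():
        print(topic, len(texts))
```

Using the item link as the deduplication key is one simple way to make repeated runs over the same feed update the corpus rather than duplicate it; a real collector would also need to fetch the full article text, since many feeds carry only a summary in `description`.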
CITATION STYLE
Al-Thubaity, A., & Alhoshan, M. (2019). ARARSS: A System for Constructing and Updating Arabic Textual Resources. In Advances in Intelligent Systems and Computing (Vol. 845, pp. 261–269). Springer Verlag. https://doi.org/10.1007/978-3-319-99010-1_24