Improving Accessibility to Arabic ETDs Using Automatic Classification

Abstract

Electronic Theses and Dissertations (ETDs) are documents rich in research information that benefit students and future generations of scholars in many disciplines. Research is therefore under way to extract data from ETDs and make them more accessible. However, most of this research has involved English-language ETDs, while Arabic ETDs remain an untapped source of data even though the number available digitally is growing. The need to make them more browsable and accessible is growing accordingly. Ways to meet this need include data annotation, indexing, translation, and classification. As the volume of data increases, manual subject classification becomes less feasible, so automatic subject classification becomes essential for searching and managing the data. There are two main roadblocks to automatic subject classification of Arabic ETDs. The first is the lack of a large public corpus of Arabic ETDs for training; the second is the linguistic complexity of Arabic, especially in academic documents. This research aims to collect key metadata of Arabic ETDs and to apply different automatic subject classification methodologies. The first goal is pursued by scraping data from the AskZad Digital Library. The second is pursued by exploring different machine learning and deep learning techniques. The experimental results show that deep learning with pretrained language models yielded the highest accuracy, approximately 0.83, while classical machine learning techniques yielded approximately 0.41 and 0.70 for multiclass and one-vs-all classification, respectively. This indicates that pretrained language models help capture the language, which is essential for text classification.
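To make the one-vs-all setting mentioned in the abstract concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It trains one binary perceptron per subject label and predicts the label whose classifier scores highest. The perceptron learner, the character-trigram features, the function names, and the four toy titles are all assumptions chosen for illustration; the abstract does not state which classical models or features were used.

```python
from collections import defaultdict

# Hypothetical toy data: (title, subject label). Not taken from the paper's corpus.
TOY_SAMPLES = [
    ("القانون الدولي", "law"),          # "international law"
    ("القانون المدني", "law"),          # "civil law"
    ("الكيمياء العضوية", "chemistry"),  # "organic chemistry"
    ("الكيمياء الحيوية", "chemistry"),  # "biochemistry"
]

def features(text):
    """Return the character trigrams of the space-padded text (assumed feature set)."""
    padded = f" {text} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train_one_vs_all(samples, epochs=10):
    """Train one binary perceptron per label: that label vs. all other labels."""
    labels = sorted({label for _, label in samples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for text, gold in samples:
            feats = features(text)
            for label in labels:
                target = 1 if label == gold else -1
                score = sum(weights[label][f] for f in feats)
                # Perceptron rule: update only when the sample is misclassified.
                if target * score <= 0:
                    for f in feats:
                        weights[label][f] += target
    return weights

def predict(weights, text):
    """Pick the label whose binary classifier gives the highest score."""
    feats = features(text)
    return max(
        sorted(weights),
        key=lambda label: sum(weights[label].get(f, 0.0) for f in feats),
    )

weights = train_one_vs_all(TOY_SAMPLES)
print(predict(weights, "القانون التجاري"))     # unseen title: "commercial law"
print(predict(weights, "الكيمياء التحليلية"))  # unseen title: "analytical chemistry"
```

A single multiclass model would instead learn one decision over all labels at once. The one-vs-all decomposition shown here keeps each binary problem simple, at the cost of training one classifier per subject.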

Cite

Abdelrahman, E., & Fox, E. (2022). Improving Accessibility to Arabic ETDs Using Automatic Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13541 LNCS, pp. 230–242). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16802-4_18
