ArabSis: Arabic Corpus Sentiment Analysis

4Citations
Citations of this article
27Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Despite rapid progress in natural language processing (NLP), the development of specialized resources for niche domains—critical for specialized applications like affective computing and emotionally intelligent AI—remains a persistent challenge. While benchmark datasets abound for general tasks, languages like Arabic and fields like multi-dimensional sentiment analysis beyond binary classification as positive or negative suffer from resource scarcity, limiting progress in human-centric applications. To address this gap, we present ArabSis: a novel Arabic corpus for multi-dimensional sentiment analysis across five categorical emotions (Joy, Sadness, Fear, Liking, Hatred). Our work introduces a reproducible framework for creating specialized corpora in low-resource languages, enabling future research in regressive dimensional sentiment analysis and other specialized NLP applications. The ArabSis corpus, developed through systematic data augmentation and human labelling, facilitates advanced analysis using traditional NLP techniques (TF-IDF, Bag of Words) and modern deep learning approaches. It also targets the universal Arabic language whereas previous research focuses on Arabic regardless of the dialect which make small nuances and inconsistencies among dialects unnoticeable and unfixable. We evaluate machine learning (ML) and deep learning (DL) models in one-vs-all classification tasks, demonstrating that ML models (e.g., SVMs, Random Forests) outperform DL counterparts on smaller datasets. An ensemble method combining top-performing models achieves 98.6% accuracy through score averaging and majority voting systems, though revealing inherent biases in ensemble voting mechanisms. The study provides a comprehensive pipeline encompassing data preprocessing, exploratory analysis, and model training, validated through 5-fold cross-validation, establishing a blueprint for developing specialized NLP resources, particularly for under-resourced languages.

Cite

CITATION STYLE

APA

Doughan, Z., Itani, S., & Itani, S. (2025). ArabSis: Arabic Corpus Sentiment Analysis. IEEE Access, 13, 81083–81095. https://doi.org/10.1109/ACCESS.2025.3567755

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free