Multimodal Detection of Emotional and Cognitive States in E-Learning Through Deep Fusion of Visual and Textual Data with NLP

Qamar El Maazouzi; Asmaa Retbi

Journal ArticleOPEN ACCESS

Multimodal Detection of Emotional and Cognitive States in E-Learning Through Deep Fusion of Visual and Textual Data with NLP

Computers (2025) 14(8)

DOI: 10.3390/computers14080314

4Citations

52Readers

Abstract

In distance learning environments, learner engagement directly impacts attention, motivation, and academic performance. Signs of fatigue, negative affect, or critical remarks can warn of growing disengagement and potential dropout. However, most existing approaches rely on a single modality, visual or text-based, without providing a general view of learners’ cognitive and affective states. We propose a multimodal system that integrates three complementary analyzes: (1) a CNN-LSTM model augmented with warning signs such as PERCLOS and yawning frequency for fatigue detection, (2) facial emotion recognition by EmoNet and an LSTM to handle temporal dynamics, and (3) sentiment analysis of feedback by a fine-tuned BERT model. It was evaluated on three public benchmarks: DAiSEE for fatigue, AffectNet for emotion, and MOOC Review (Coursera) for sentiment analysis. The results show a precision of 88.5% for fatigue detection, 70% for emotion detection, and 91.5% for sentiment analysis. Aggregating these cues enables an accurate identification of disengagement periods and triggers individualized pedagogical interventions. These results, although based on independently sourced datasets, demonstrate the feasibility of an integrated approach to detecting disengagement and open the door to emotionally intelligent learning systems with potential for future work in real-time content personalization and adaptive learning assistance.

Author supplied keywords

Cite

CITATION STYLE

APA

El Maazouzi, Q., & Retbi, A. (2025). Multimodal Detection of Emotional and Cognitive States in E-Learning Through Deep Fusion of Visual and Textual Data with NLP. Computers, 14(8). https://doi.org/10.3390/computers14080314

Multimodal Detection of Emotional and Cognitive States in E-Learning Through Deep Fusion of Visual and Textual Data with NLP

Abstract

Author supplied keywords

Cite

Register to see more suggestions