Comparison of CNN and CNN-LSTM Performance in Facial Expression Classification Based on FER2013 Dataset

  • Savitri P
  • Permana A
  • Puspa Dewi N

Abstract

Although facial expression recognition (FER) with deep learning has received increasing attention, few studies have examined how effective sequential modeling is on static image data. This study compares a pure Convolutional Neural Network (CNN) with a hybrid CNN–Long Short-Term Memory (CNN-LSTM) model in classifying seven basic facial expressions on the static FER2013 dataset. A quantitative, comparative experimental design was used, with the publicly available FER2013 dataset and two custom deep learning architectures. Model performance was evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. The pure CNN outperformed the CNN-LSTM by a wide margin, reaching a testing accuracy of 63.25% against 46.82% for the hybrid model, a gap of 16.43 percentage points; the CNN discriminated visually distinct classes well but continued to struggle with visually similar expressions. These results inform the selection of deep learning architectures and clarify the limits of applying sequence models to static data. The study concludes that data characteristics (static versus temporal) play a crucial role in model effectiveness, and that for static datasets such as FER2013 a pure CNN is the more appropriate choice. The research contributes to the literature on deep learning-based FER and offers a practical recommendation for developers to prioritize CNN architectures for non-temporal image classification tasks. It also highlights opportunities for future research on transfer learning and attention mechanisms to better capture subtle expression nuances.
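The abstract lists accuracy, precision, recall, and F1-score among the evaluation metrics but does not say how they were averaged across the seven classes. The following is a minimal sketch of how such per-class metrics can be computed from predicted labels, assuming macro averaging. The function names, the class label order, and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter

# The seven basic expression classes (this label order is an assumption).
CLASSES = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} from two equal-length label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but the true class differs
            fn[t] += 1  # true class t was missed
    out = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        out[c] = (precision, recall, f1)
    return out


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(metrics):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    return sum(f1 for _, _, f1 in metrics.values()) / len(metrics)


# Toy labels for illustration only; these are NOT results from the study.
y_true = ["happy", "happy", "sad", "fear", "sad", "surprise"]
y_pred = ["happy", "happy", "fear", "fear", "sad", "fear"]

metrics = per_class_metrics(y_true, y_pred, CLASSES)
print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(metrics))
for name in CLASSES:
    print(name, metrics[name])
```

Reporting the per-class values next to overall accuracy is what makes it visible when a model separates distinct expressions well but confuses similar ones, since a single accuracy figure hides that pattern.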

Cite

Savitri, P. A. A., Permana, A. A. J., & Puspa Dewi, N. P. N. (2026). Comparison of CNN and CNN-LSTM Performance in Facial Expression Classification Based on FER2013 Dataset. Asian Journal of Science, Technology, Engineering, and Art, 4(1), 1–18. https://doi.org/10.58578/ajstea.v4i1.8252
