Abstract
Gender classification from facial images remains a performance challenge for classifier models because many factors visually alter the collected face images. This study proposes a Vision Transformer (ViT)-based technique for identifying a person's gender from face images and investigates how well a facial image-based model can distinguish between male and female genders. It also examines the rarely discussed impact of data variation and complexity caused by differences in racial and age groups. We trained on the AFAD dataset and then carried out same-dataset and cross-dataset evaluations, the latter using the UTKFace dataset. In the same-dataset evaluation, the highest validation accuracy of 0.9676 is achieved with 160 × 160-pixel images and eight patches, while the highest testing accuracy of 0.9843 occurs with 224 × 224-pixel images and 28 patches. In the cross-dataset evaluation, the model works best with 224 × 224-pixel images and 14 patches, reaching an accuracy, precision, recall, and F1-score of 0.8174, 0.8188, 0.8189, and 0.8189, respectively. Furthermore, the misclassification analysis shows that the model classifies gender most reliably for people aged 21 to 70. The findings of this study can serve as a baseline for further analysis of the effectiveness of gender classifier models under various physical factors.
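For readers who want a concrete starting point, the sketch below shows a minimal ViT binary gender classifier in the spirit of the abstract; it is not the authors' implementation. It assumes that "14 patches" means 14 patches per image side, so a 224 × 224-pixel input uses 16 × 16-pixel patches, which matches the standard torchvision ViT-B/16 backbone. The model, weight, and layer names (`vit_b_16`, `ViT_B_16_Weights`, `model.heads.head`) are real torchvision APIs; the dummy batch and two-class head are illustrative placeholders.

```python
# Minimal sketch (not the paper's code) of a ViT-based binary gender
# classifier, assuming 224x224 inputs with a 14x14 patch grid
# (16x16-pixel patches), i.e. the standard ViT-B/16 configuration.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

# Load a ViT-B/16 backbone pretrained on ImageNet.
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the 1000-class ImageNet head with a 2-class head (male/female).
model.heads.head = nn.Linear(model.heads.head.in_features, 2)

# Forward pass on a dummy batch to confirm the output shape.
x = torch.randn(4, 3, 224, 224)   # batch of 4 RGB face crops
logits = model(x)                 # shape: (4, 2)
print(logits.shape)
```

In this configuration, each 224 × 224 image is split into 14 × 14 = 196 non-overlapping patches that are linearly embedded and fed to the transformer encoder; other patch counts reported in the abstract (e.g. 28 per side for 224 × 224, or 8 per side for 160 × 160) would correspond to different patch sizes and require a differently configured patch embedding.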
Citation
Tahyudin, G. G., Sulistiyo, M. D., Arzaki, M., & Rachmawati, E. (2024). Classifying Gender Based on Face Images Using Vision Transformer. International Journal on Informatics Visualization, 8(1), 18–25. https://doi.org/10.62527/joiv.8.1.1923