A Quantitative Analysis of Labeling Issues in the CelebA Dataset

Abstract

Facial attribute prediction is a facial analysis task that describes images using natural language features. While many works have attempted to optimize prediction accuracy on CelebA, the largest and most widely used facial attribute dataset, few works have analyzed the accuracy of the dataset’s attribute labels. In this paper, we seek to do just that. Despite the popularity of CelebA, we find through quantitative analysis that there are widespread inconsistencies and inaccuracies in its attribute labeling. We estimate that at least one third of all images have one or more incorrect labels, and reliable predictions are impossible for several attributes due to inconsistent labeling. Our results demonstrate that classifiers struggle with many CelebA attributes not because they are difficult to predict, but because they are poorly labeled. This indicates that the CelebA dataset is flawed as a facial analysis tool and may not be suitable as a generic evaluation benchmark for imbalanced classification.

Cite

Lingenfelter, B., Davis, S. R., & Hand, E. M. (2022). A Quantitative Analysis of Labeling Issues in the CelebA Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13598 LNCS, pp. 129–141). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20713-6_10
