A Quantitative Analysis of Labeling Issues in the CelebA Dataset

Bryson Lingenfelter; Sara R. Davis; Emily M. Hand

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13598 LNCS 129-141

DOI: 10.1007/978-3-031-20713-6_10

Abstract

Facial attribute prediction is a facial analysis task that describes images using natural language features. While many works have attempted to optimize prediction accuracy on CelebA, the largest and most widely used facial attribute dataset, few works have analyzed the accuracy of the dataset’s attribute labels. In this paper, we seek to do just that. Despite the popularity of CelebA, we find through quantitative analysis that there are widespread inconsistencies and inaccuracies in its attribute labeling. We estimate that at least one third of all images have one or more incorrect labels, and reliable predictions are impossible for several attributes due to inconsistent labeling. Our results demonstrate that classifiers struggle with many CelebA attributes not because they are difficult to predict, but because they are poorly labeled. This indicates that the CelebA dataset is flawed as a facial analysis tool and may not be suitable as a generic evaluation benchmark for imbalanced classification.

Citation (APA)

Lingenfelter, B., Davis, S. R., & Hand, E. M. (2022). A Quantitative Analysis of Labeling Issues in the CelebA Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13598 LNCS, pp. 129–141). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20713-6_10
