A perceptual metric for photo retouching
- ISSN: 00278424
- DOI: 10.1073/pnas.1110747108
- PubMed: 22123980
Abstract
In recent years, advertisers and magazine editors have been widely criticized for taking digital photo retouching to an extreme. Impossibly thin, tall, and wrinkle- and blemish-free models are routinely splashed onto billboards, advertisements, and magazine covers. The ubiquity of these unrealistic and highly idealized images has been linked to eating disorders and body image dissatisfaction in men, women, and children. In response, several countries have considered legislating the labeling of retouched photos. We describe a quantitative and perceptually meaningful metric of photo retouching. Photographs are rated on the degree to which they have been digitally altered by explicitly modeling and estimating geometric and photometric changes. This metric correlates well with perceptual judgments of photo retouching and can be used to objectively judge by how much a retouched photo has strayed from reality.
Eric Kee and Hany Farid1
Department of Computer Science, Dartmouth College, Hanover, NH 03755
Edited by Brian A. Wandell, Stanford University, Stanford, CA, and approved October 19, 2011 (received for review July 5, 2011)
photo manipulation ∣ photo analysis
Advertisers and fashion and fitness magazines have always been in the business of creating a fantasy of sorts for their
readers. Magazine covers and advertisements routinely depict
impossibly beautiful and flawless models with perfect physiques.
These photos, however, are often the result of digital photo
retouching. Shown in Fig. 1 are three recent examples of photo
retouching in which the models were digitally altered*, in some
cases almost beyond recognition.
Retouched photos are ubiquitous and have created an idea-
lized and unrealistic representation of physical beauty. A signifi-
cant literature has established a link between these images and
men’s and women’s satisfaction with their physical appearance
(1–8). Such concerns for public health have led the American Medical
Association (AMA) to recently adopt a policy to “discourage
the altering of photographs in a manner that could promote unrea-
listic expectations of appropriate body image.Ӡ Concern for public
health and for the general issue of truth in advertising has also led
the United Kingdom to consider legislation that would require
digitally altered photos to be labeled.‡ Perhaps not surprisingly,
advertisers and publishers have resisted any such legislation.
A rating system that simply labels an image as digitally altered
or not would have limited efficacy because it would not distin-
guish between common modifications such as cropping and color
adjustment and modifications that dramatically alter a person’s
appearance. We propose that the interests of advertisers, publish-
ers, and consumers may be protected by providing a perceptually
meaningful rating of the amount by which a person’s appearance
has been digitally altered. When published alongside a photo, such
a rating can inform consumers of how much a photo has strayed
from reality, and can also inform photo editors of exaggerated and
perhaps unintended alterations to a person’s appearance.
Popular photo-editing software, such as Adobe Photoshop,
allows photo editors to easily alter the appearance of a person.
These alterations may affect the geometry of the subject and may
include slimming of legs, hips, and arms, elongating the neck,
improving posture, enlarging the eyes, or making faces more sym-
metric. Other photometric alterations affect skin tone and texture.
These changes may include smoothing, sharpening, or other opera-
tions that remove or reduce wrinkles, cellulite, blemishes, freckles,
and dark circles under the eyes. A combination of geometric and
photometric manipulations allows photo retouchers to subtly or
dramatically alter a person’s appearance.
We have developed a metric that quantifies the perceptual im-
pact of geometric and photometric modifications by modeling
common photo retouching techniques. Geometric changes are
modeled with a dense locally-linear, but globally smooth, motion
field. Photometric changes are modeled with a locally-linear filter
and a generic measure of local image similarity [SSIM (9)]. These
model parameters are automatically estimated from the original
and retouched photos as described in Materials and Methods.
Shown in Fig. 2, from left to right, are an original and a retouched
photo and a visualization of the measured geometric and photo-
metric modifications.
The extent of photo manipulation is quantified with eight
summary statistics extracted from these models. The amount
of geometric modification is quantified with four statistics: the
mean and standard deviation of the motion magnitude computed
separately over the subject’s face and body. The amount of photo-
metric modification is quantified with four statistics. The first two
statistics are the mean and standard deviation of the spatial extent
of local smoothing or sharpening filters. The second two statistics
are the mean and standard deviation of the similarity metric SSIM.
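As a minimal sketch of how the eight summary statistics could be assembled, assume the geometric and photometric models have already been estimated and are available as flat per-pixel lists. The function names, the argument layout, the ordering of the statistics, and the use of the population standard deviation are all assumptions made for illustration; they are not specified in the text above.

```python
import math
from statistics import mean, pstdev


def motion_magnitudes(field):
    """Magnitude of each (vx, vy) displacement in a flat list of vectors."""
    return [math.hypot(vx, vy) for vx, vy in field]


def summary_statistics(face_field, body_field, filter_extent, ssim):
    """Return eight statistics: a (mean, std) pair for each of four quantities.

    Hypothetical inputs, one entry per pixel:
      face_field    - (vx, vy) motion vectors over the subject's face
      body_field    - (vx, vy) motion vectors over the subject's body
      filter_extent - spatial extent of the local smoothing/sharpening filter
      ssim          - local SSIM values
    """
    groups = (
        motion_magnitudes(face_field),
        motion_magnitudes(body_field),
        filter_extent,
        ssim,
    )
    stats = []
    for values in groups:
        stats.append(mean(values))
        # Population standard deviation; which estimator is used is an assumption.
        stats.append(pstdev(values))
    return stats
```

The resulting list of eight numbers is the kind of feature vector that would then be passed to a regressor.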
We show that these summary statistics combine to yield a me-
tric that correlates well with perceptual ratings of photo altera-
tion. This metric can be used to automatically rate the amount by
which a photo was retouched.
Results
A diverse set of 468 original and retouched photos was collected
from a variety of on-line sources. Human observers were asked to
rate the amount of photo alteration on a scale of 1 (very similar)
to 5 (very different). Given an original and retouched photo, we
estimate the geometric and photometric modifications and
extract eight summary statistics that embody the extent of photo
retouching. Observer ratings were correlated against the sum-
mary statistics using nonlinear support vector regression (SVR).
See Materials and Methods for complete details.
Shown in Fig. 3 is the correlation between the mean of 50
observer ratings per image and our metric. Each data point cor-
responds to one of 468 images rated on a scale of 1 to 5. The
predicted rating for each image was determined by training an
SVR on 467 images using a leave-one-out cross-validation methodology.
The R-value is 0.80; the mean/median absolute prediction error is
0.30/0.24, with a standard deviation of 0.24 and a
max absolute error of 1.19. The absolute prediction error is below
Author contributions: E.K. and H.F. designed research; E.K. and H.F. performed research;
E.K. and H.F. analyzed data; and E.K. and H.F. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
*“July Redbook Wins Website’s ‘Most Photoshopped’ Contest,” Huffington Post, Jul. 2007.
“Twiggy’s Olay Ad Banned Over Airbrushing,” The Guardian, Dec. 2009. “Model in
Altered Ralph Lauren Ad Speaks Out”, Boston Globe, Oct. 2009.
†“AMA Adopts New Policies at Annual Meeting”, AMA Press Release, Jun. 21, 2011.
‡“Airbrush Alert: UK wants to keep fashion ads real,” AP, Sep. 2010.
1To whom correspondence should be addressed. E-mail: farid@cs.dartmouth.edu.
This article contains supporting information on-line at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110747108/-/DCSupplemental.
99.1% of the images, respectively.
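The leave-one-out procedure and the reported error summaries can be sketched as follows. This is an illustrative harness only: the regressor is passed in as a function, and the nearest-neighbor predictor used here is a simple stand-in, not the nonlinear SVR used in the paper. All function names are hypothetical.

```python
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def leave_one_out(features, ratings, fit):
    """Predict each image's rating with a model trained on all other images."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = ratings[:i] + ratings[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(model(features[i]))
    return predictions


def fit_nearest_neighbor(train_x, train_y):
    """Stand-in regressor (NOT SVR): rating of the closest training image."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
        return train_y[dists.index(min(dists))]
    return predict


def evaluate(features, ratings, fit):
    """Correlation and absolute-error summaries of leave-one-out predictions."""
    preds = leave_one_out(features, ratings, fit)
    errors = [abs(p - r) for p, r in zip(preds, ratings)]
    return {
        "r": pearson_r(preds, ratings),
        "mean_abs_err": mean(errors),
        "median_abs_err": median(errors),
        "max_abs_err": max(errors),
    }
```

With 468 images, each call to `fit` inside `leave_one_out` would see 467 training examples, matching the methodology described above. Swapping in a different `fit` function would allow linear and nonlinear regressors to be compared under the same protocol.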
Each observer rated 70 pairs of before/after images. The intra-
class reliability is 0.97, showing that the mean observer rating is
consistent.§ Each observer rated a random set of five images three
separate times, the presentations of which were uniformly distributed
throughout the duration of the experiment. The mean/median
within-observer standard deviation is 0.34/0.31, showing that
observers are relatively consistent in their individual ratings.
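The intraclass reliability defined in the footnote can be sketched numerically. The version below assumes that the between-image and within-image variances are estimated as the mean squares of a one-way analysis of variance, which is one common reading of the definition but is not stated explicitly here. Under that assumption the ratio σ²_t / (σ²_t + σ²_w / n) simplifies to (MS_between − MS_within) / MS_between. The function name and input layout are hypothetical.

```python
from statistics import mean


def intraclass_reliability(ratings):
    """Reliability of the mean rating per image.

    ratings: list with one entry per image; each entry is a list of the
    n ratings given to that image (same n for every image, n >= 2).
    """
    k = len(ratings)       # number of images
    n = len(ratings[0])    # ratings per image
    image_means = [mean(r) for r in ratings]
    grand_mean = mean(image_means)

    # Between-image mean square: estimates n * sigma_t^2 + sigma_w^2.
    ms_between = n * sum((m - grand_mean) ** 2 for m in image_means) / (k - 1)

    # Within-image mean square: estimates sigma_w^2.
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, image_means) for x in r
    ) / (k * (n - 1))

    # sigma_t^2 / (sigma_t^2 + sigma_w^2 / n) == (MSB - MSW) / MSB
    return (ms_between - ms_within) / ms_between
```

Values close to 1 indicate that almost all of the variation in mean ratings is due to differences between images rather than disagreement among observers.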
To determine which of our eight summary statistics were most
critical for predicting observer ratings, we trained and tested
255 SVRs, one for each possible subset of size 1 to 8. The best
performing SVR with one statistic consisted of the mean of the
geometric facial distortion (statistic 1 as described in subsection
Perceptual Distortion), which yielded an R-value of 0.58. The best
performing SVR with two statistics consisted of the standard de-
viation of the geometric body distortion and the standard devia-
tion of the photometric SSIM (statistics 4 and 6), which yielded
an R-value of 0.69. The best performing SVR with three
statistics consisted of adding the standard deviation of the geo-
metric facial distortion to the previous SVR (statistics 4, 5, and 6),
which yielded an R-value of 0.76. The best performing SVR of
size 6 had an R-value of 0.80, equal to that of the full set of size
8. This subset of size 6 consisted of the statistics 1, 2, 4, 6, 7, and 8
as described in subsection Perceptual Distortion. Although six
statistics are sufficiently powerful, they are extracted from each
component of the geometric and photometric models. Therefore,
there is little cost in using all eight statistics in terms of computa-
tional complexity or in terms of training the SVR.
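The count of 255 follows from enumerating every nonempty subset of eight statistics (2^8 − 1 = 255). A minimal sketch of that search is given below; the `score` callback is a hypothetical placeholder for training and cross-validating a regressor on the chosen statistics and returning its R-value.

```python
from itertools import combinations


def nonempty_subsets(n_statistics=8):
    """Yield every nonempty subset of statistic indices 1..n_statistics."""
    indices = range(1, n_statistics + 1)
    for size in range(1, n_statistics + 1):
        yield from combinations(indices, size)


def best_subset_per_size(score, n_statistics=8):
    """Map each subset size to its (best_subset, best_score) pair.

    score: hypothetical callable taking a tuple of statistic indices and
    returning a number where larger is better (e.g., a cross-validated R).
    """
    best = {}
    for subset in nonempty_subsets(n_statistics):
        value = score(subset)
        size = len(subset)
        if size not in best or value > best[size][1]:
            best[size] = (subset, value)
    return best
```

An exhaustive search is feasible here only because the number of statistics is small; the number of subsets doubles with each added statistic.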
The results presented above employed a nonlinear regression
technique (SVR) to predict observer ratings. To check whether the
nonlinear model is warranted, we also tested a simpler
linear SVR. The R-value for the linear SVR is 0.72, as compared
to 0.80 for the nonlinear SVR. The mean absolute prediction error
is 0.34 with a standard deviation of 0.27 as compared to 0.30 and
0.24 for the nonlinear SVR. The max absolute error jumps from
1.19 to 1.93. Overall, the nonlinear SVR affords a considerably
better prediction of observer ratings as compared to a linear SVR.
We also compared our metric against two standard image
similarity metrics. A metric based only on the mean and standard
deviation of a standard application of SSIM yields an R-value
of 0.52 as compared to our approach that had an R-value of
0.80. A metric based on only the mean squared error between
the before and after images performed much worse, with an R-value
of only 0.30. Standard image similarity metrics perform poorly
because they do not compensate for, or measure, large-scale geo-
metric distortions.
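A toy example may help illustrate why a pixelwise measure such as the mean squared error has no notion of geometric change. The images and the function below are hypothetical and far simpler than real photographs: a single bright stripe is shifted by one pixel, and the MSE registers a large difference even though the content is otherwise unchanged.

```python
def mean_squared_error(before, after):
    """MSE between two equally sized grayscale images given as lists of rows."""
    total = 0.0
    count = 0
    for row_before, row_after in zip(before, after):
        for b, a in zip(row_before, row_after):
            total += (b - a) ** 2
            count += 1
    return total / count


# A vertical stripe, and the same stripe moved one pixel to the right.
stripe = [[0, 0, 255, 0, 0] for _ in range(3)]
shifted = [[0, 0, 0, 255, 0] for _ in range(3)]

identical_error = mean_squared_error(stripe, stripe)
shifted_error = mean_squared_error(stripe, shifted)
```

Here `identical_error` is zero while `shifted_error` is large, although the only change is a small displacement. A metric that first estimates the motion field can separate such displacement from changes in appearance.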
Shown in Fig. 4 are representative images with minimal (top)
and maximal (bottom) prediction error. The over- and
underestimations illustrate some of the limitations of our model.
The perceptual distortion in the first two images (bottom) is
overestimated because there is a large photometric difference for
the young boy (removal of blemishes) and a large geometric
difference for the young woman (change in shape and position of
the head), but neither of these differences corresponds to a large
perceptual difference in appearance. On the other hand, the per-
ceptual distortion in the next three images is underestimated. The
change to the symmetry of the young man’s face, the addition of
make-up to the woman, and the addition of teeth to the man are
each relatively small from a photometric and geometric perspec-
tive but yield a large perceptual difference in appearance. Even
with these limitations, we can reasonably measure perceptual dis-
tortion over a diverse range of photo alterations and content.
Fig. 1. A magazine cover featuring Faith Hill and the unretouched photo (top). An advertisement for an Olay beauty product featuring the model Twiggy and a contemporaneous photo of Twiggy (middle). This ad was banned in the United Kingdom by the Advertising Standards Authority because it considered the postproduction retouching of the original ad misleading to consumers. An ad featuring the model Filippa Hamilton and a contemporaneous ad of Hamilton (bottom). Images from: (Top) http://jezebel.com/278919/heres-our-winner-redbook-shatters-our-faith-in-well-not-publishing-but-maybe-god; (Middle) http://www.guardian.co.uk/media/2009/dec/16/twiggys-olay-ad-banned-airbrushing; (Bottom) http://www.mdig.com.br/index.php?itemid=8244. Sites accessed November 8, 2011.
§The intraclass reliability (10) is computed as $\sigma_t^2 / (\sigma_t^2 + \sigma_w^2 / n)$, where the between-image variance is $n\sigma_t^2 + \sigma_w^2$, the within-image variance is $\sigma_w^2$, and $n$ is the number of ratings per image.



