Galaxy Zoo Morphology and Photometric Redshifts in the Sloan Digital Sky Survey

by M J Way
Astrophysical Journal Letters (2011)

Abstract

It has recently been demonstrated that one can accurately derive galaxy morphology from particular primary and secondary isophotal shape estimates in the Sloan Digital Sky Survey imaging catalog. This was accomplished by applying Machine Learning techniques to the Galaxy Zoo morphology catalog. Using the broad bandpass photometry of the Sloan Digital Sky Survey in combination with precise knowledge of galaxy morphology should help in estimating more accurate photometric redshifts for galaxies. Using the Galaxy Zoo separation for spirals and ellipticals in combination with Sloan Digital Sky Survey photometry we attempt to calculate photometric redshifts. In the best case we find that the root mean square error for Luminous Red Galaxies classified as ellipticals is as low as 0.0118. Given these promising results we believe better photometric redshift estimates for all galaxies in the Sloan Digital Sky Survey (∼350 million) will be feasible if researchers can also leverage their derived morphologies via Machine Learning. These initial results look to be promising for those interested in estimating Weak-Lensing, Baryonic Acoustic Oscillation, and other fields dependent upon accurate photometric redshifts.

Available from arxiv.org
arXiv:1104.3758v1 [astro-ph.CO] 19 Apr 2011
Draft version April 20, 2011
Preprint typeset using LATEX style emulateapj v. 5/14/03
GALAXY ZOO MORPHOLOGY AND PHOTOMETRIC REDSHIFTS IN THE SLOAN DIGITAL SKY SURVEY
M. J. Way1,2
NASA Goddard Institute for Space Studies, 2880 Broadway, New York, NY 10029, USA
ABSTRACT
It has recently been demonstrated that one can accurately derive galaxy morphology from particular
primary and secondary isophotal shape estimates in the Sloan Digital Sky Survey imaging catalog.
This was accomplished by applying Machine Learning techniques to the Galaxy Zoo morphology
catalog. Using the broad bandpass photometry of the Sloan Digital Sky Survey in combination with
precise knowledge of galaxy morphology should help in estimating more accurate photometric
redshifts for galaxies. Using the Galaxy Zoo separation for spirals and ellipticals in combination with
Sloan Digital Sky Survey photometry we attempt to calculate photometric redshifts. In the best
case we find that the root mean square error for Luminous Red Galaxies classified as ellipticals is as
low as 0.0118. Given these promising results we believe better photometric redshift estimates for all
galaxies in the Sloan Digital Sky Survey (∼350 million) will be feasible if researchers can also leverage
their derived morphologies via Machine Learning. These initial results look to be promising for those
interested in estimating Weak-Lensing, Baryonic Acoustic Oscillation, and other fields dependent upon
accurate photometric redshifts.
Subject headings: galaxies: distances and redshifts — methods: statistical
1. INTRODUCTION
It is commonly believed that adding information about
the morphology of galaxies may help in the estimation
of Photometric Redshifts (Photo-Zs) when using training
set methods. Most of this work in recent years has
utilized the Sloan Digital Sky Survey (SDSS, York et al.
2000). For example, as discussed in Way et al. (2009,
hereafter Paper II), many groups have attempted to use
a number of derived primary and secondary isophotal
shape estimates in the SDSS imaging catalog to help in
estimating Photo-Zs. Examples include the radius
containing 50% and/or 90% of the Petrosian (1976) flux
in the SDSS r band (denoted petroR50_r and petroR90_r
in the SDSS catalog), the concentration index
(CI = petroR90_r/petroR50_r), surface brightness, axial
ratios and radial profile (e.g. Collister & Lahav 2004;
Ball et al. 2004; Wadadekar 2005; Kurtz et al. 2007;
Wray & Gunn 2008).
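As a small illustration of the concentration index defined above, the following Python sketch computes CI from a pair of Petrosian radii. The function name and the radii are made up for illustration; they are not taken from the SDSS catalog or from this paper.

```python
def concentration_index(petro_r90: float, petro_r50: float) -> float:
    """Return CI = petroR90 / petroR50 for a single band."""
    if petro_r50 <= 0.0 or petro_r90 <= 0.0:
        raise ValueError("Petrosian radii must be positive")
    return petro_r90 / petro_r50


# Hypothetical radii in arcseconds, for illustration only.
example_radii = {
    "example_a": (6.0, 2.0),
    "example_b": (9.2, 4.0),
}

ci_values = {
    name: concentration_index(r90, r50)
    for name, (r90, r50) in example_radii.items()
}

for name, ci in ci_values.items():
    print(f"{name}: CI = {ci:.2f}")
```

Rejecting non-positive radii is a design choice of this sketch, so that sentinel values for missing measurements do not silently produce a meaningless ratio.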
More recently Singal et al. (2011) have attempted to
use Galaxy Shape parameters derived from Hubble Space
Telescope/Advanced Camera for Surveys imaging data
using a principal components approach and then feeding
this information into their Neural Network code to
predict Photo-Zs, but for samples much deeper than the
SDSS. Unfortunately they find only marginal improvement
when using their morphology estimators.
Another promising approach focuses on the reddening
and inclination of galaxies. Yip et al. (2011) have at-
tempted to quantify these effects on a galaxy’s spectral
energy distribution (SED). The idea is to use this infor-
mation to correct the over-estimation of Photo-Zs of disk
galaxies.
1 NASA Ames Research Center, Space Sciences Division, MS
245-6, Moffett Field, CA 94035, USA
2 Department of Astronomy and Space Physics, Uppsala, Sweden
On the other hand, attempts to morphologically classify
large numbers of galaxies in the universe have gained in
accuracy over the past 15 years as better and larger
training samples from eye classification have become
available. For example, Lahav et al. (1995) were among the
first to use an Artificial Neural Network, trained on 830
galaxies classified by the eyes of six different
professional astronomers. In more recent years Ball et al.
(2004) have attempted to classify galaxies by
morphological type using a Neural Network approach based
on a sample of 1399 galaxies (from the catalog of
Nakamura et al. (2011)). Cheng et al. (2011) have used a
sample of 984 non-star forming SDSS early-type galaxies to
distinguish between E, S0 and Sa galaxies. In the past
year two new attempts at morphological classification
using Machine Learning techniques on a Galaxy Zoo (Lintott
et al. 2008, 2011) training sample have been published
(Banerji et al. 2010; Huertas-Company et al. 2011). The
Banerji et al. (2010) results were impressive in that they
claim to obtain classification to better than 90% for
three different morphological classes (spiral, elliptical
and point-sources/artifacts).
These works are in contrast to previous work like that
of Bernardi et al. (2003) who used a classification scheme
based on SDSS spectra. However, this classification cer-
tainly missed some early-type galaxies from their desired
sample due to the presence of star formation.
In this paper we will continue our use of Gaussian Pro-
cess Regression to calculate Photo-Zs, using a variety of
inputs. This method has been discussed extensively in
two previous papers (Way & Srivastava 2006; Way et al.
2009).
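For readers unfamiliar with Gaussian Process Regression, the following pure-Python sketch shows the basic posterior-mean calculation on toy data. It is not the implementation used in this paper or in the two earlier papers: the squared-exponential kernel, the fixed hyperparameters, the single synthetic colour feature and the redshift values are all assumptions made for illustration.

```python
import math


def sq_exp_kernel(x1, x2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return signal_var * math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict_mean(train_x, train_z, test_x, noise_var=1e-4, length_scale=1.0):
    """Posterior mean of a zero-mean GP evaluated at each point in test_x."""
    n = len(train_x)
    # Training covariance with observation noise added on the diagonal.
    k = [
        [
            sq_exp_kernel(train_x[i], train_x[j], length_scale)
            + (noise_var if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    alpha = solve_cholesky(cholesky(k), train_z)
    return [
        sum(sq_exp_kernel(xs, train_x[i], length_scale) * alpha[i] for i in range(n))
        for xs in test_x
    ]


# Toy training set: one synthetic colour per galaxy and a made-up redshift.
train_colour = [[0.6], [0.9], [1.2], [1.5], [1.8]]
train_z = [0.05, 0.10, 0.16, 0.23, 0.31]

predicted = gp_predict_mean(train_colour, train_z, [[1.2], [1.35]])
print(predicted)
```

In a real application the inputs would be the five magnitudes plus shape parameters, and the kernel hyperparameters would be fitted to the training data rather than fixed by hand.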
We utilize the SDSS Main Galaxy Sample (MGS,
Strauss et al. 2002) and the Luminous Red Galaxy Sam-
ple (LRG, Eisenstein et al. 2001) from the SDSS Data
Release Seven (DR7, Abazajian et al. 2009). We also
utilize the Galaxy Zoo 1 survey results (GZ1, Lintott et
al. 2011). The Galaxy Zoo project3 (Lintott et al. 2008)
contains a total of 900,000 SDSS galaxies with
morphological classifications (Lintott et al. 2011).
3 http://www.galaxyzoo.org
While this study does not focus exclusively on the LRG
sample, it should be noted that if it is possible to im-
prove the Photo-Z estimates for these objects as shown
herein it could also improve the estimation of cosmologi-
cal parameters (e.g. Blake & Bridle 2005; Padmanabhan
et al. 2007; Percival et al. 2010; Reid et al. 2010; Zunckel,
Gott & Lunnan 2011) using the SDSS as well as upcoming
surveys such as BOSS4 (Cuesta-Vazquez et al. 2011;
Eisenstein et al. 2011), BigBOSS (Schlegel et al. 2009),
and possibly Euclid (Sorba & Sawicki 2011), not to
mention LSST5 (Ivezic et al. 2008). It could also contribute to
more reliable Photo-Z errors, as required for weak-lensing
surveys (Bernstein & Huterer 2010; Kitching, Heavens &
Miller 2011) and Baryonic Acoustic Oscillation measure-
ments, which are also dependent upon accurate Photo-Z
estimation of LRGs (Roig et al. 2008).
2. DATA
All of the data used herein have been obtained via the
SDSS casjobs server6. In order to obtain results consis-
tent with Paper II for both the MGS and LRG samples
we use the same photometric quality flags (!BRIGHT
and !BLENDED and !SATURATED) and redshift qual-
ity (zConf>0.95 and zWarning=0) but using the SDSS
DR7 instead of earlier SDSS releases. These data are
cross-matched in casjobs with columns 14–16 in Table 2
of Lintott et al. (2011) extracting the galaxies flagged as
‘spiral’, ‘elliptical’ or ‘uncertain’. The galaxies “flagged
as ‘elliptical’ or ‘spiral’ require 80 per cent of the vote in
that category after the debiasing procedure has been ap-
plied; all other galaxies are flagged ‘uncertain’” (Lintott
et al. 2011). Debiasing is the process of correcting for
small biases in spin direction and color. See Section 3.1
in Lintott et al. (2011) for more details on debiasing.
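The selection described in this section can be sketched in Python as follows. The field names are hypothetical stand-ins for the SDSS flags and the debiased Galaxy Zoo vote fractions, the records are invented, and treating "require 80 per cent of the vote" as a greater-than-or-equal test is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Galaxy:
    # Hypothetical field names, not the actual catalog column names.
    bright: bool
    blended: bool
    saturated: bool
    z_conf: float
    z_warning: int
    p_elliptical: float  # debiased vote fraction for "elliptical"
    p_spiral: float      # debiased vote fraction for "spiral"


def passes_quality_cuts(g: Galaxy) -> bool:
    """!BRIGHT and !BLENDED and !SATURATED, zConf > 0.95, zWarning = 0."""
    photometry_ok = not (g.bright or g.blended or g.saturated)
    redshift_ok = g.z_conf > 0.95 and g.z_warning == 0
    return photometry_ok and redshift_ok


def gz1_flag(g: Galaxy, threshold: float = 0.8) -> str:
    """Assign 'elliptical', 'spiral' or 'uncertain' from the vote fractions."""
    if g.p_elliptical >= threshold:
        return "elliptical"
    if g.p_spiral >= threshold:
        return "spiral"
    return "uncertain"


# Invented records, for illustration only.
sample = [
    Galaxy(False, False, False, 0.99, 0, 0.91, 0.03),
    Galaxy(False, False, False, 0.98, 0, 0.10, 0.85),
    Galaxy(False, False, False, 0.97, 0, 0.55, 0.40),
    Galaxy(False, True, False, 0.99, 0, 0.95, 0.01),   # blended: rejected
    Galaxy(False, False, False, 0.90, 0, 0.02, 0.93),  # zConf too low: rejected
]

flags = [gz1_flag(g) for g in sample if passes_quality_cuts(g)]
print(flags)
```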
Note that the GZ1 sample is based upon the MGS, but
the MGS contains LRGs as well. This is why we can
analyze both of these samples. However, the actual LRG
survey goes fainter than the MGS and so we do not find
LRG galaxies fainter than the MGS limit of
r_petrosian ≲ 17.77. See Strauss et al. (2002) and
Eisenstein et al. (2001) for details on the MGS and LRG
samples.
[Figure 1: four histogram panels. Top row: SDSS Main Galaxy Sample, number of galaxies versus redshift and versus r magnitude. Bottom row: SDSS Luminous Red Galaxies, number of galaxies versus redshift and versus r magnitude. Legend: SDSS DR7 MGS (or LRG), Galaxy Zoo spirals, Galaxy Zoo ellipticals, Galaxy Zoo unknown.]
4 Baryon Oscillation Spectroscopic Survey
5 Large Synoptic Survey Telescope
6 http://casjobs.sdss.org
Table 1. Results

Data^a     Inputs^b             σ_rmse^c (50%)   (10%)     (90%)
MGS-ELL    ugriz+Q+U            0.01561          0.01532   0.01620
-          ugriz+P50+CI         0.01407          0.01400   0.01475
-          ugriz+P50+CI+Q+U     0.01641          0.01560   0.01801
-          ugriz+B              0.01679          0.01668   0.01683
MGS-SP     ugriz+Q+U            0.01889          0.01864   0.01913
-          ugriz+P50+CI         0.01938          0.01927   0.01947
-          ugriz+P50+CI+Q+U     0.01751          0.01747   0.01777
-          ugriz+B              0.02092          0.02089   0.02101
LRG-ELL    ugriz+Q+U            0.01345          0.01291   0.01420
-          ugriz+P50+CI         0.01334          0.01278   0.01426
-          ugriz+P50+CI+Q+U     0.01584          0.01439   0.01693
-          ugriz+B              0.01180          0.01175   0.01184
LRG-SP     ugriz+Q+U            0.01520          0.01404   0.01910
-          ugriz+P50+CI         0.01514          0.01474   0.01679
-          ugriz+P50+CI+Q+U     0.01957          0.01870   0.02285
-          ugriz+B              0.01737          0.01728   0.01765

a MGS = Main Galaxy Sample (Strauss et al. 2002), LRG = Luminous Red Galaxies (Eisenstein et al. 2001), SP = classified as spiral by Galaxy Zoo, ELL = classified as elliptical by Galaxy Zoo.
b u-g-r-i-z = 5 SDSS dereddened magnitudes, P50 = Petrosian 50% light radius in SDSS i band, CI = Concentration Index (P90/P50), Q = Stokes Q value in i band, U = Stokes U value in i band, B = inputs from Table 2 of Banerji et al. (2010) = CI, mRrCc_i, aE_i, mCr4_i, texture_i.
c We quote the bootstrapped 50%, 10% and 90% confidence levels as in Paper II for the root mean square error (rmse).
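One common way to obtain bootstrapped 10%, 50% and 90% levels for an rmse is sketched below in Python. This is an assumption about the general technique, not the procedure of Paper II, which may differ (for example by resampling the training set and refitting). The redshift pairs are synthetic.

```python
import math
import random


def rmse(z_spec, z_phot):
    """Root mean square error between spectroscopic and photometric redshifts."""
    n = len(z_spec)
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(z_spec, z_phot)) / n)


def percentile(sorted_values, q):
    """Percentile by linear interpolation, with q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def bootstrap_rmse(z_spec, z_phot, n_resamples=200, seed=0):
    """Resample (z_spec, z_phot) pairs with replacement; return rmse percentiles."""
    rng = random.Random(seed)
    n = len(z_spec)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(rmse([z_spec[i] for i in idx], [z_phot[i] for i in idx]))
    scores.sort()
    return {q: percentile(scores, q) for q in (10, 50, 90)}


# Synthetic data: true redshifts plus Gaussian scatter of 0.015.
data_rng = random.Random(1)
z_true = [data_rng.uniform(0.02, 0.30) for _ in range(500)]
z_est = [z + data_rng.gauss(0.0, 0.015) for z in z_true]

levels = bootstrap_rmse(z_true, z_est)
print(levels)
```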
Fig. 1.— Redshift and r-band dereddened model magnitudes
for the Main Galaxy Sample (top two panels) and Luminous Red
Galaxies (bottom two panels).
A number of points from both the LRG and MGS samples
were eliminated, either because of bad values (e.g. -9999)
or because they were considered outliers from the main
distribution of points. The former included:
petroR90_i (13 points in the MGS sample, 1 point in
the LRG sample), mE1_i (43 points, 5 points),
petroR90Err_i (7177 points, 1262 points) and mRrCcErr_i
(22 points, 12 points). Bad mE1_i points are eliminated
because mE1_i is used to calculate aE_i from Table 2 of
Banerji et al. (2010). A small number of outliers,
totalling only 27 points, were also removed from the MGS
sample. No outlier points were removed from the LRG
sample. This leaves us with a total of 437,273 MGS and
68,996 LRG objects. Using the GZ1 classifications in
the MGS there are 45,249 ellipticals, 119,369 spirals and
272,655 uncertain (∼62%). For the LRG sample there
are 27,227 ellipticals and 13,495 spirals, leaving 28,274
uncertain (∼41%).
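The quoted totals and uncertain fractions can be checked with a few lines of Python. The counts are copied from the paragraph above; the variable and function names are illustrative.

```python
# GZ1 classification counts quoted in the text.
mgs = {"elliptical": 45_249, "spiral": 119_369, "uncertain": 272_655}
lrg = {"elliptical": 27_227, "spiral": 13_495, "uncertain": 28_274}


def uncertain_fraction(counts):
    """Fraction of objects flagged 'uncertain' in a sample."""
    return counts["uncertain"] / sum(counts.values())


mgs_total = sum(mgs.values())
lrg_total = sum(lrg.values())

print(f"MGS total: {mgs_total}, uncertain: {uncertain_fraction(mgs):.1%}")
print(f"LRG total: {lrg_total}, uncertain: {uncertain_fraction(lrg):.1%}")
```

The per-class counts sum to the stated sample sizes, and the uncertain fractions round to the quoted 62% and 41%.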
3. DISCUSSION
Using the morphological classifications from the
Galaxy Zoo project first data release (Lintott et al. 2011)
we attempt to calculate Photo-Zs for four different
samples and four combinations of primary and secondary
isophotal shape estimates from the SDSS, as seen in
Table 1. A larger variety of input combinations was
tried, including those in Table 1 of Banerji et al.
(2010). However, we only report those with the lowest
root mean square error (rmse) in Table 1 of this paper.
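To make the comparison in Table 1 easier to read, the following Python sketch picks the input combination with the lowest median (50%) rmse for each sample. The values are copied from Table 1; the dictionary layout and names are illustrative only.

```python
# Median (50%) bootstrapped rmse values copied from Table 1.
table1 = {
    "MGS-ELL": {
        "ugriz+Q+U": 0.01561,
        "ugriz+P50+CI": 0.01407,
        "ugriz+P50+CI+Q+U": 0.01641,
        "ugriz+B": 0.01679,
    },
    "MGS-SP": {
        "ugriz+Q+U": 0.01889,
        "ugriz+P50+CI": 0.01938,
        "ugriz+P50+CI+Q+U": 0.01751,
        "ugriz+B": 0.02092,
    },
    "LRG-ELL": {
        "ugriz+Q+U": 0.01345,
        "ugriz+P50+CI": 0.01334,
        "ugriz+P50+CI+Q+U": 0.01584,
        "ugriz+B": 0.01180,
    },
    "LRG-SP": {
        "ugriz+Q+U": 0.01520,
        "ugriz+P50+CI": 0.01514,
        "ugriz+P50+CI+Q+U": 0.01957,
        "ugriz+B": 0.01737,
    },
}

# For each sample, keep the (inputs, rmse) pair with the smallest rmse.
best = {
    sample: min(rows.items(), key=lambda item: item[1])
    for sample, rows in table1.items()
}

for sample, (inputs, value) in best.items():
    print(f"{sample}: {inputs} ({value:.5f})")
```

The LRG-ELL entry with the Banerji et al. (2010) inputs gives the 0.0118 value quoted in the abstract.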
The results using the Banerji et al. (2010) suggested
isophotal shape estimates as well as others tested in Pa-
