
Galaxy Zoo Supernovae

by A M Smith, S Lynn, M Sullivan, C J Lintott, P E Nugent, J Botyanszki, M Kasliwal, R Quimby, S P Bamford, L F Fortson, K Schawinski, I Hook, S Blake, P Podsiadlowski, J Joensson, A Gal-Yam, I Arcavi, D A Howell, J S Bloom, J Jacobsen, S R Kulkarni, N M Law, E O Ofek, R Walters
Physics (2010)

Abstract

This paper presents the first results from a new citizen science project: Galaxy Zoo Supernovae. This proof of concept project uses members of the public to identify supernova candidates from the latest generation of wide-field imaging transient surveys. We describe the Galaxy Zoo Supernovae operations and scoring model, and demonstrate the effectiveness of this novel method using imaging data and transients from the Palomar Transient Factory (PTF). We examine the results collected over the period April-July 2010, during which nearly 14,000 supernova candidates from PTF were classified by more than 2,500 individuals within a few hours of data collection. We compare the transients selected by the citizen scientists to those identified by experienced PTF scanners, and find the agreement to be remarkable - Galaxy Zoo Supernovae performs comparably to the PTF scanners, and identified as transients 93% of the ~130 spectroscopically confirmed SNe that PTF located during the trial period (with no false positive identifications). Further analysis shows that only a small fraction of the lowest signal-to-noise SN detections (r > 19.5) are given low scores: Galaxy Zoo Supernovae correctly identifies all SNe with > 8sigma detections in the PTF imaging data. The Galaxy Zoo Supernovae project has direct applicability to future transient searches such as the Large Synoptic Survey Telescope, by both rapidly identifying candidate transient events, and via the training and improvement of existing machine classifier algorithms.


Available from arxiv.org

Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 17 November 2010 (MN LaTeX style file v2.2)
Galaxy Zoo Supernovae
A. M. Smith1⋆†, S. Lynn1, M. Sullivan1‡, C. J. Lintott1, P. E. Nugent2,
J. Botyanszki2, M. Kasliwal3, R. Quimby3, S. P. Bamford14, L. F. Fortson15,
K. Schawinski4,5,6, I. Hook1,7, S. Blake1, P. Podsiadlowski1, J. Jonsson1, A. Gal-Yam8,
I. Arcavi8, D. A. Howell9,10, J. S. Bloom11, J. Jacobsen2, S. R. Kulkarni3,
N. M. Law12, E. O. Ofek3,4, R. Walters12
1Department of Physics (Astrophysics), University of Oxford, DWB, Keble Road, Oxford OX1 3RH, UK
2Computational Cosmology Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA.
3Cahill Center for Astrophysics, California Institute of Technology, Pasadena, CA, 91125, USA
4Einstein Fellow
5Department of Physics, Yale University, New Haven, CT 06511, USA
6Yale Center for Astronomy and Astrophysics, Yale University, P.O. Box 208121, New Haven, CT 06520, USA
7INAF-Osservatorio di Roma, via Frascati 33, I-00040 Monteporzio Catone (Roma), Italy
8Department of Particle Physics and Astrophysics, Faculty of Physics, The Weizmann Institute of Science, Rehovot 76100, Israel
9Las Cumbres Observatory Global Telescope Network, 6740 Cortona Dr, Suite 102, Goleta, CA 93117
10University of California, Santa Barbara, Broida Hall, Mail Code 9530, Santa Barbara, CA 93106-9530, USA
11Department of Astronomy, University of California, Berkeley, CA 94720-3411, USA
12Dunlap Institute for Astronomy and Astrophysics, University of Toronto, 50 St. George Street, Toronto M5S 3H4, Ontario, Canada
13Caltech Optical Observatories, California Institute of Technology, Pasadena, CA 91125, USA
14School of Physics and Astronomy, University of Nottingham, University Park, Nottingham, NG7 2RD
15School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA
17 November 2010
Key words: supernovae: general – surveys – methods: data analysis
⋆ This publication has been made possible by the participation
of more than 10,000 volunteers in the Galaxy Zoo Supernovae
project (http://supernova.galaxyzoo.org/authors).
† E-mail: arfon.smith@astro.ox.ac.uk
‡ E-mail: sullivan@astro.ox.ac.uk
© 0000 RAS
arXiv:1011.2199v2 [astro-ph.IM] 15 Nov 2010
1 INTRODUCTION
Supernovae (SNe) have a profound influence upon many
diverse areas of astrophysics. They are the key source of
heavy elements in the universe, driving cosmic chemical
evolution. Their energy input can initiate episodes of star
formation, and they are themselves the product of the complex
physics underlying the final stages of stellar evolution.
The homogeneous nature of the thermonuclear Type Ia SNe
provides the most mature and direct probe of dark energy.
Despite this importance in astrophysics, we understand surprisingly
little about the physics governing SN explosions.
Only the progenitors of the core collapse Type IIP SNe
have been directly identified: the physical nature of other
SN types remains uncertain (for reviews see Hillebrandt
& Niemeyer 2000; Smartt 2009). We remain ignorant
about many aspects of SN rates, light-curves, spectra,
demographics, and the dependence of these properties on
environment, progenitor composition, and explosion physics.
In part, this is due to the historical difficulty and
technical challenges associated with locating SNe in the
required numbers to create statistically meaningful samples,
particularly at low redshift where high quality follow-up
data can most easily be attained. This situation has
changed with the availability of large format CCD detectors.
Automated, wide-field transient searches on dedicated
1-2m class telescopes and facilities are underway, typically
observing thousands of square degrees every few days (e.g.
Keller et al. 2007; Law et al. 2009). These flux-limited
`rolling searches' select transient events without regard to
host galaxy properties or type.
This large amount of imaging data naturally generates
its own particular logistical challenges in dealing
with the data flow, and identifying transient astrophysical
objects of interest in the data (`candidates') for scientific
study and analysis. Of particular importance is the rapid
identification of new candidates once the imaging data has
been obtained and processed. Though many aspects of
survey operations, such as image processing, can be efficiently
pipelined, the identification of new transient sources
remains challenging, with human operators (`scanners')
invariably charged with wading through new detections on a
nightly basis. Though computer algorithms can assist with
identifying objects of interest in the data, this scanning
can still absorb a significant amount of researcher time. A
related issue is spectroscopic follow-up, a limited resource
that must be prioritised and allocated efficiently to the
detected candidates, with the absolute minimum of false
candidates observed.
Two high-redshift SN searches highlight these challenges.
The Supernova Legacy Survey (SNLS; e.g. Astier
et al. 2006) used the MegaCam instrument on the 3.6m
Canada–France–Hawaii Telescope to survey 4 deg^2 with
a cadence of a few days. Following automated cuts on
signal-to-noise and candidate shape, each square degree
would typically generate 200 candidates for each night of
observation (Perrett et al. 2010). Visual inspection would
decrease this number to 20 plausible real transients. The
Sloan Digital Sky Survey-II Supernova Survey (SDSS-SN;
e.g. Frieman et al. 2008) used the SDSS 2.5m telescope
to survey a larger area of 300 deg^2, though to a shallower
depth than SNLS (Sako et al. 2008). After the removal of
moving (solar system) objects, in the first season (3 month
period), human scanners viewed 3000–5000 objects each
night spread over six scanners (>100,000 over the whole
season). Although this number was radically reduced in
later seasons as more automated procedures were developed
(14,000 during season 2), the burden on human scanners
was still large (Sako et al. 2008). With new wide-field
transient surveys generating many more candidates than
these two surveys, advances in both automated techniques
and human scanning are clearly required.
This paper details a new method for sorting through
SN candidates, based upon the citizen science project
`Galaxy Zoo' (Lintott et al. 2008, 2010). New candidate
transient events are uploaded to the Galaxy Zoo Supernovae
website, and are visually examined and classified by
members of the public, guided by a tutorial and associated
decision tree. Each candidate is examined and classified by
multiple people and given an average score, with the candidates
ranked and made available for further investigation in
real-time. The advantages of this approach are considerable.
First, the burden of candidate scanning is largely removed
from the science team running the survey. Second, each
candidate is inspected multiple times (versus once by a
scanner in previous transient surveys), reducing the chances
that the candidate could be missed. Third, with a large
number of people scanning candidates, more candidates
can be examined in a shorter amount of time – and with
the global Zooniverse (the parent project of Galaxy Zoo)
user base this can be done around the clock, regardless
of the local time zone the science team happens to be
based in. This speed can even allow interesting candidates
to be followed up on the same night as that of the SN
discovery, of particular interest to quickly evolving SNe
or transient sources. Fourth, the large number of human
classifications collected can be used to improve machine
learning algorithms for automated SNe classification.
This paper reports the results from the early operations
(over a 3 month period) of this system. In section
2, we describe the Palomar Transient Factory, data from
which were used in the tests and running of Galaxy Zoo
Supernovae. Section 3 describes Galaxy Zoo Supernovae,
including the ranking system for candidates used by the
citizen science classifiers. Section 4 has details of the tests
and first results of the Galaxy Zoo Supernovae operation.
We discuss the future direction of this project in section 5.
2 THE PALOMAR TRANSIENT FACTORY
The Palomar Transient Factory (PTF) is a wide-field
survey exploring the optical transient sky. The survey is
built around the 48 inch Samuel Oschin telescope at the
Palomar Observatory, recently equipped with the CFH12k
mosaic camera (formerly at the Canada-France-Hawaii
Telescope) offering a 7.8 square degree field of view, and
robotised to allow remote and automated observations.
Observations are mainly conducted using the Mould-R filter.
A full description of the operations of the PTF experiment
can be found in Law et al. (2009). Of most relevance
for SN studies are the `5-day cadence' and `dynamical
cadence' experiments, each using ∼40 % of the observing
time. The dynamic cadence revisits survey fields on timescales
of 1 minute up to 5 days and is particularly sensitive
to rapid transient events (as well as longer duration SNe),
whereas the 5-day cadence is specifically targeted to
extra-galactic SN studies (Rau et al. 2009). Even in the 5
day cadence, images are typically taken in pairs separated
in time by about one hour. This is to help identify moving
objects (i.e., asteroids) in the imaging data, which might
otherwise masquerade as new transients.
2.1 PTF real-time operations
The PTF (near)-real-time search pipeline is hosted by the
National Energy Research Scientific Computing Center (NERSC) at
the Lawrence Berkeley National Laboratory (LBNL). After
data is taken and transferred from the Palomar observatory
to NERSC, the pipeline generates new subtraction images
within an hour (Nugent et al. 2010), subtracting an older,
deep `reference' image from the new observations. The two
images are photometrically matched using the hotpants
program1, an implementation of the Alard (2000) algorithm.
Candidate transient events are then identified as >5σ detections
in the subtraction images using SExtractor (Bertin
& Arnouts 1996). Fluxes and various other relevant parameters
are measured before storing all candidates in a database.
Each candidate is also `scored' (producing the PTF `real-bogus'
value) using a machine-learning algorithm (the `PTF
robot') based on the characteristics of the detection and previous
history of the candidate (Bloom et al., in prep.). The
vast majority (∼99.99%) of these candidates are not real
astrophysical transient events – the search algorithm is designed
to be as inclusive as possible, with most of the candidates
rejected via simple cuts. These include:
(i) The ratio of both semi-major and semi-minor axes of
the candidate shape to the seeing must be greater than 0.15
and less than 0.85, and the ratio of the FWHM of the candidate
to the seeing must be greater than 0.5 and less than
2.0. These ensure that the candidate has a reasonable spatial
extent when compared to the seeing.
(ii) In a 7 pixel by 7 pixel box placed on top of the candidate,
the number of pixels deviating by more than 2σ must
be less than 6, and the number deviating by more than 3σ
must be less than 2.
(iii) Each candidate must be seen in at least one image
taken in the previous 10 nights (including the night of detection),
a constraint designed to remove fast moving solar
system objects.
(iv) Candidates within 1″ of previously located objects
(excluding the previous 10 nights) are removed to avoid the
repeated detection of (e.g.) AGN or variable stars.
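The cuts listed above can be illustrated with a minimal Python sketch. It is not the PTF pipeline code: the `Detection` class, its field names and the `passes_cuts` function are hypothetical, cut (iii) is read here as requiring at least one additional detection within the 10 night window, and the matching radius for cut (iv) is an assumed parameter.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical per-candidate measurements (field names are illustrative)."""
    semi_major_over_seeing: float
    semi_minor_over_seeing: float
    fwhm_over_seeing: float
    n_pix_over_2sigma: int        # pixels in the 7x7 box deviating by > 2 sigma
    n_pix_over_3sigma: int        # pixels in the 7x7 box deviating by > 3 sigma
    n_other_detections_10_nights: int  # assumed reading of cut (iii)
    arcsec_to_known_object: float      # distance to nearest earlier object


def passes_cuts(d: Detection, match_radius_arcsec: float = 1.0) -> bool:
    """Return True if the detection survives cuts (i) to (iv)."""
    # (i) reasonable spatial extent compared to the seeing
    shape_ok = (0.15 < d.semi_major_over_seeing < 0.85
                and 0.15 < d.semi_minor_over_seeing < 0.85
                and 0.5 < d.fwhm_over_seeing < 2.0)
    # (ii) few strongly deviating pixels in the 7x7 box
    pixels_ok = d.n_pix_over_2sigma < 6 and d.n_pix_over_3sigma < 2
    # (iii) seen again within the window, to reject fast moving objects
    repeat_ok = d.n_other_detections_10_nights >= 1
    # (iv) not coincident with a previously located object
    new_ok = d.arcsec_to_known_object >= match_radius_arcsec
    return shape_ok and pixels_ok and repeat_ok and new_ok
```

Each cut is a simple threshold on a measured quantity, which is why the pipeline can apply them cheaply to every detection before any human or machine scoring takes place.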
The effectiveness of these cuts means that a typical full
night of PTF observing will yield 100–500 (average 200)
candidates that survive these culls, which can then be
further sorted using only a short decision tree in Galaxy
Zoo Supernovae.
1 http://www.astro.washington.edu/users/becker/hotpants.html
Though the ultimate aim is to make the human scanners
redundant with a fully automated machine-learning
classification pipeline, at the current time a substantial
amount of human scanning is still required to identify the
good candidates (in part, this scanning can be used to train
machine-based methods). Candidates are inspected visually
by human scanners in the PTF team, using a web interface
to reject false transient detections. The human scanner can
dynamically alter a set of cuts to control the candidates
that are shown for a given image, including the signal-to-noise,
shape parameters, the full-width half-maximum
(FWHM) of the candidate compared to the global image
value, and the output score from the machine classifier.
Based on the cuts chosen, the scanner is presented with a
series of detection `triplets' – each triplet contains three
images showing the current image of the field (containing
SN light together with all other objects), the historical or
reference image of the same field (with no SN light), and
the difference between the two (which should contain only
the SN light). Examples of triplets are shown in Fig. 1. The
human scanner then decides, based on his or her subjective
(but informed) judgement, whether each of the candidates
presented is a real transient event, and if so marks that
candidate as either a SN-like transient or a variable star.
The primary goal of Galaxy Zoo Supernovae in PTF
is to initially supplement, but perhaps ultimately replace,
the role of the PTF human scanners. By presenting a
transient candidate to a number of different classifiers not
only is the time of the PTF team freed to spend on tasks
not suitable for the general public, but the potential of
mis-classification of candidates due to individual human
error is significantly reduced. The 5-day and dynamical
cadence programs in PTF collect data on every night
from March to November (weather permitting) and on
each night 2–4 of the PTF team share the scanning tasks,
examining 500 candidates. This not only requires several
person-hours of work, but the large number of classifications
by a small number of PTF scanners is likely to contain
errors, and this is where the repeat classification by Galaxy
Zoo Supernovae volunteers can help.
The Galaxy Zoo Supernovae project also has other aims. A
longer-term goal is to provide sufficient classification data
for the training and improvement of the PTF machine-learning
classification algorithm. A final consideration is to
build expertise in the citizen science community for future
transient surveys, which of course generate many more
candidates than PTF, perhaps approaching thousands of
genuine candidates on a nightly basis.
Figure 1. Four example detection triplets from PTF, similar to those uploaded to Galaxy Zoo Supernovae. Each image is 100″ on a
side. In each triplet, the panels show (from left to right) the most recent image containing the candidate SN light (the science image),
the reference image generated from data from an earlier epoch with no SN candidate light, and the subtraction or difference image – the
science image minus the reference image – with the SN candidate at the centre of the crosshairs. The two triplets on the left are real
SNe, and were highly scored by the Zoo; the triplets on the right are not SNe and were given the lowest possible score.
3 GALAXY ZOO SUPERNOVAE
3.1 Description of a typical `Zoo'
The Galaxy Zoo Supernovae website2 is built using the
Zooniverse3 Application Programming Interface (API)
toolset. The Zooniverse API is the core software supporting
the activities of all Zooniverse citizen science projects.
Built originally for Galaxy Zoo 2, the software is currently
being used by six different projects. The Zooniverse API is
designed primarily as a tool for serving up a large collection
of `assets' (for example, images or video) to an interface, and
collecting back user-generated interactions with these assets.
So that the project website can retain a high performance
during spikes of activity, Galaxy Zoo Supernovae
is hosted on Amazon Web Services which provides a
virtualised machine environment that can auto-scale in size
based upon server load. The site uses the Elastic Compute
Cloud4 (EC2) for web/database servers and the Simple
Storage Service5 (S3) for image storage.
Image assets are presented to volunteers of the website
through custom user interfaces, designed to aid the
volunteer in classifying the object. For many projects this
interface takes the form of a decision tree which walks
the volunteer through a number of questions concerning
the current image. The interaction of the volunteer with
the website produces a set of `annotations' which together
constitute a `classification' of the asset. These are stored for
later analysis or in the case of Galaxy Zoo Supernovae are
scored in real-time to change the behaviour of the website.
2 http://supernova.galaxyzoo.org/
3 http://zooniverse.org
4 http://aws.amazon.com/ec2
5 http://aws.amazon.com/s3
3.2 Galaxy Zoo Supernovae website operations
Similar in nature to the original Galaxy Zoo 2 interface,
Galaxy Zoo Supernovae is a classic example of a `Zoo'.
When a new highly-scored candidate is located in the PTF
pipeline, an image triplet (Fig. 1) of the candidate is automatically
uploaded, together with a small amount of metadata,
to the Galaxy Zoo Supernovae API. Upon upload, the
image is saved to Amazon S3 (a file hosting service) and
registered with the website. Finding new SNe is time critical
and our method of automatically registering new assets
with the API means that classifiers are inspecting SN candidates
discovered just hours earlier. The interface for Galaxy
Zoo Supernovae presents these candidate detection triplets
(just as with the PTF human scanners, § 2.1) together with
a decision tree of questions and answers designed to help
classify each candidate (see Fig. 3). Fig. 2 displays the typical
flow in the system. Once a candidate has been classified
(see below) it is instantly available to the PTF team through
a private web interface.
3.3 Decision tree
The decision tree developed to assist volunteers in classifying
candidates is described in Fig. 3. This decision tree
is designed to remove as many false candidates as possible,
without losing real, scientifically interesting events. In this
respect the decision tree is conservative in the candidates
that are removed to minimise the number of false negatives.
The tree proceeds as follows:
(i) Is there a candidate centered in the crosshairs of the
right-hand image?
The PTF subtraction pipeline can occasionally undergo
a failure and report (and therefore upload to the site) a
`good' candidate that is actually an error in the processing.
This can be due to large (several pixel) mis-alignments
of the two images being analysed, often localised in a
particular part of the CCD where the astrometric solution
fails. Other sources of failure include saturated pixels or
bleed trails from bright stars, or problems with the pipeline
flat-fielding. The SExtractor detection algorithm can also
sometimes detect a noise peak rather than a real transient.
Though the basic cuts made by PTF remove most of these
errors, on occasion they are ranked highly and uploaded to
Galaxy Zoo Supernovae (emphasising the need for human
classifiers). Therefore, the first question in the decision tree
is designed to remove such objects. The right-hand images
in the triplets in Fig. 1 are the focus of this question.
[Figure 2 diagram labels: PTF pipeline; Zooniverse API; Admin/Results interface
(science team); Result; Classification (general public); Amazon Web Services.]
Figure 2. A schematic showing the data acquisition and analysis
in Galaxy Zoo Supernovae: Raw data is processed by the PTF
pipeline, automatically uploaded to the API, presented, analysed
and scored by the Zooniverse community and available for review
by the PTF science team.
(ii) Has the candidate itself subtracted correctly?
Small mis-alignments between the reference and science
image can result in image subtraction problems, usually
indicated by a dipole of positive and negative pixels in the
subtraction image. The cores of bright (but not saturated)
stars can also mis-subtract, and result in `bullseye' patterns
in the subtraction images. This question is designed to flag
such candidates.
(iii) Is the candidate star-like and approximately circular?
This question is designed to remove unidentified cosmic
rays, or diffuse/non-circular candidates which result from
image subtraction problems. The volunteer is asked if the
candidate looks like a round, symmetrical dot (star). Candidates
that are very small (1–2 pixels, i.e., not PSF-like),
elongated or otherwise distorted, or diffuse would trigger a
negative response to this question.
(iv) Is the candidate centered in a circular host galaxy?
The final question is more subjective, and is designed to
categorise real astrophysical transients into two broad categories.
Many of the transients which PTF detects are variable
stars lying within our own galaxy, which are of interest
to a different set of science users than extra-galactic transients.
Variable star transients will appear to lie in `hosts'
that are circular (as they are stars), and will also appear to
be located in the centre of these hosts. By contrast, SNe will
either have no host galaxy, or will lie (probably off-centre) in
a large diffuse host galaxy. This question therefore broadly
splits the real transients into variable stellar transients, and
SNe. Most SNe that do happen to lie in the centres of their
host galaxies will not be categorised as variable stars – the
question also requires the `host galaxy' to be circular.
[Figure 3 node labels:
Q: Is there a candidate centered in the crosshairs of the right-hand image? No (-1) / Yes
Q: Has the candidate itself subtracted correctly? No (-1) / Yes
Q: Is the candidate star-like and approximately circular? No (-1) / Yes (+1)
Q: Is the candidate centred in a circular host galaxy? No (+2) / Yes
Q: What is wrong with the subtraction? No candidate / Not all pixels positive / Poor subtraction
Q: What is wrong with the candidate? Not circular - too small / Not circular - elongated / Not circular - distorted / Not circular - diffuse]
Figure 3. The decision tree that a Galaxy Zoo Supernovae volunteer
is presented with when classifying a candidate (see § 3.3).
The decision tree can end at a number of points. The scoring
points in the decision tree are also shown. Both the path through
the decision tree, and the cumulative score is recorded for later
analysis.
A full tutorial is available to new volunteers of the website
to illustrate the different questions using real PTF data.
3.4 Asset scoring and priority
Once a volunteer has examined a candidate, their response
is converted into a score, S, as follows.
• The initial score is zero.
• If a classifier answers negatively any question up to and
including `Is the candidate star-like and approximately circular',
the candidate is given a score of -1.
• If a classifier instead answers positively up to that question,
then the candidate is given a score of +1.
• If the classifier then also marks the candidate as not
centred in a circular host, then the candidate gains an additional
score of 2.
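The scoring rules above can be written as a short Python sketch. The function name and argument names are hypothetical and chosen for illustration; only the score values come from the rules listed.

```python
def classification_score(candidate_present: bool,
                         subtracted_correctly: bool,
                         star_like: bool,
                         in_circular_host: bool) -> int:
    """Score S for one volunteer's path through the decision tree.

    The tree stops at the first "no" among the first three questions,
    so later answers are ignored in that case.
    """
    if not (candidate_present and subtracted_correctly and star_like):
        return -1
    score = 1                 # positive answers up to the star-like question
    if not in_circular_host:
        score += 2            # not centred in a circular host: SN-like
    return score


# A clean, star-like candidate with no circular host gets the top score.
print(classification_score(True, True, True, False))
```

Only three outcomes are possible from a single classification: -1, 1 or 3.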
The structure of the decision tree and scoring of the questions
means that candidates can only end up with a score of -1,
1 or 3 from each classification, with the most promising
SN candidates scored 3. As each new classification is
received, the arithmetic mean score (S_ave) of the candidate
is recalculated. Candidates which are not astrophysically
interesting tend to have S_ave < 0 (i.e., most volunteers
scored them a `-1'). Astrophysical transients typically
have S_ave > 0, and SNe tend to have S_ave > 1 (i.e., most
volunteers scored them a `3').
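As an illustration of how the mean score could be kept up to date as classifications arrive, the following Python sketch maintains a running total. The `CandidateScore` class is hypothetical and is not taken from the Zooniverse software.

```python
class CandidateScore:
    """Running arithmetic mean of per-classification scores (illustrative)."""

    ALLOWED = (-1, 1, 3)  # the only scores a single classification can give

    def __init__(self) -> None:
        self.n = 0       # number of classifications received
        self.total = 0   # sum of their scores

    def add(self, score: int) -> float:
        """Record one classification and return the updated mean."""
        if score not in self.ALLOWED:
            raise ValueError(f"unexpected score {score}")
        self.n += 1
        self.total += score
        return self.mean

    @property
    def mean(self) -> float:
        if self.n == 0:
            raise ValueError("no classifications yet")
        return self.total / self.n


tracker = CandidateScore()
for s in (3, 3, 1, -1, 3):
    tracker.add(s)
print(tracker.mean)  # 9 / 5 = 1.8
```

Storing the count and the sum, rather than the full list of scores, is enough to recompute the mean after each new classification.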
The asset prioritisation system is adjusted after each
classification is received, and operates to prioritise the best
SN candidates (i.e., the order in which the candidates are
shown to classifiers). When new candidates are uploaded to
the website, they are initially prioritised based upon i) a
score supplied by the PTF pipeline, and ii) the age of the
candidate (the newest uploads are shown first). The PTF
`real bogus' value (§ 2.1) is calculated by the PTF pipeline
for all candidates and gives an indication of the likelihood
that a candidate is a real transient. This value is only used
to determine the order in which candidates are shown and
is not used in the final ranking.
Studies of results from early (`beta') versions of Galaxy Zoo
Supernovae have allowed us to optimise the asset prioritisation
to reduce the time taken to identify candidates. We
divide candidates into four categories:
(i) Unseen – Candidates which have 3 or fewer classifications.
(ii) Bulk – Candidates which have been classified between
3 and 10 times.
(iii) Stragglers – Candidates which have been classified
more than 10 times, but which do not have a `definitive'
S_ave (i.e., those with 0.0 < S_ave < 1.7).
(iv) Done – Candidates which have been classified more
than 10 times and which have S_ave < 0.0 or S_ave > 1.7, and
candidates which have been classified more than 20 times.
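The four categories can be sketched as a small Python function. The handling of the boundaries is an assumption: the category definitions leave open what happens at exactly 10 or 20 classifications and at S_ave of exactly 0.0 or 1.7, so this sketch treats 10 as the first count at which a candidate can be `done', 20 as the cap, and a mean lying exactly on a threshold as not yet definitive. The function and constant names are illustrative.

```python
MIN_CLASSIFICATIONS = 10   # assumed: first point a candidate can be 'done'
MAX_CLASSIFICATIONS = 20   # assumed: cap after which it is always 'done'
LOW_THRESHOLD = 0.0
HIGH_THRESHOLD = 1.7


def category(n_classifications: int, s_ave: float) -> str:
    """Return 'unseen', 'bulk', 'stragglers' or 'done' for a candidate."""
    if n_classifications <= 3:
        return "unseen"
    if n_classifications < MIN_CLASSIFICATIONS:
        return "bulk"
    if n_classifications >= MAX_CLASSIFICATIONS:
        return "done"
    definitive = s_ave < LOW_THRESHOLD or s_ave > HIGH_THRESHOLD
    return "done" if definitive else "stragglers"


print(category(12, 1.0))   # ambiguous mean after 12 classifications
print(category(12, 2.4))   # clearly high mean after 12 classifications
```

Under these assumptions the first call returns `stragglers` and the second returns `done`.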
Candidates in the `unseen' category are given absolute
precedence over all others in an aim to get an initial
understanding of the quality of the candidate; they are
shown in order of upload time followed by the real-bogus
score. Once these are completed, the `bulk' and `straggler'
candidate classes have equal priority. We select randomly
between the two classes, choosing the newest candidate
with the highest score from each group – as a candidate
begins to receive `positive' classifications (i.e., S of 1 or 3)
then it is prioritised above any others thus allowing rapid
identification of the most interesting targets.
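One possible reading of this selection procedure is sketched below in Python. Several details are assumptions made for illustration: the `Item` class and its fields are hypothetical, `unseen' candidates are ordered newest first and then by real-bogus value, and within the `bulk' and `stragglers' groups the highest mean score is preferred with upload time as the tie-breaker.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical record for one candidate in the pool."""
    name: str
    category: str      # 'unseen', 'bulk', 'stragglers' or 'done'
    uploaded: float    # upload time; larger means newer
    real_bogus: float  # score supplied by the PTF pipeline
    s_ave: float = 0.0


def next_candidate(pool: List[Item], rng=random) -> Optional[Item]:
    """Pick the next candidate to show to a volunteer, or None if all done."""
    unseen = [c for c in pool if c.category == "unseen"]
    if unseen:
        # Unseen candidates take absolute precedence.
        return max(unseen, key=lambda c: (c.uploaded, c.real_bogus))
    groups = []
    for label in ("bulk", "stragglers"):
        members = [c for c in pool if c.category == label]
        if members:
            groups.append(members)
    if not groups:
        return None
    # Equal priority: choose one of the non-empty groups at random.
    group = rng.choice(groups)
    return max(group, key=lambda c: (c.s_ave, c.uploaded))
```

Passing the random source as an argument (`rng`) makes the choice between groups reproducible in tests, for example by supplying a seeded `random.Random` instance.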
The choice of 10 classifications as the first point at
which a candidate can be considered classified is a compromise
between the robustness of the classification and
speed. Clearly, the greater the number of classifications
required for each candidate the slower the classification
process proceeds; yet the process must be robust against
both user mistakes (i.e., clicking the wrong button) and
misunderstanding.
The aim is to both quickly classify the best, high-scoring
candidates (which will rapidly exceed the S_ave = 1.7
threshold after 10 classifications), and to remove the worst
candidates (which will remain below S_ave = 0.0). More
ambiguous candidates can then obtain up to 10 extra
classifications before completion. The process continues
until a target has received enough classification scores that
it is considered `done', at which point it is removed from
the pool of available candidates. Our simulations based on
the beta versions indicated that this scheme is 2–3 times
faster at classifying than just using a random order.
3.5 Communication of results
The science of Galaxy Zoo Supernovae relies on new candidates
being classified rapidly, and those classifications then
being easily accessible to the science team.
3.5.1 Science dashboard
A key part of the Galaxy Zoo Supernovae website is a
science `dashboard' for the PTF team. The science dashboard
provides basic statistics on the number of candidate
uploads, classifications and volunteers versus time as well
as a more in-depth breakdown of the classification history
for a candidate or individual.
Custom views have been created which break down a
score ranked list of candidates for each day and week allowing
observing teams to use these rankings to help in the
identification of good candidates for follow up observations.
Candidates already identified as PTF transients show the
PTF identifier on the science dashboard and a link is
also provided to allow the science team to easily mark a
highly-ranked candidate from the Zoo in the PTF database.
3.5.2 Candidate alerts
In order to improve the rate at which objects are classified,
an automated alert system that monitors the number
of candidates being uploaded to the website is used. Should
the number of unclassified candidates reach a threshold, the
website sends an automated `alert' to Galaxy Zoo Supernovae
subscribers. These (email) alerts are usually sent out
once per day, coinciding with the end of a night's candidates
being uploaded from NERSC, and usually result in
the full complement of candidates being classified within a
few hours.
3.5.3 `My Supernovae'
Providing feedback to the Galaxy Zoo Supernovae community
is a vital part of the overall website experience to
encourage volunteers to return to the website. This is partly
done using forums and blogs where scientists can comment
on individual events classified by the zoo. In addition, each
volunteer can view a history of the candidates that they
have classified on their `My Supernovae' page.
The `My Supernovae' (MySN) page displays the candidate
triplets. Those which have been observed are
overlaid with a small symbol identifying the candidate as
Figure 4. A montage of the 20 highest ranked PTF candidates from the October testing of the website. Each set of three images shows,
from left to right, the new image, the reference image, and the subtraction image. The position of the candidate is shown in each panel
by the crosshairs. The candidate name and the spectroscopic type from the WHT (where available) are also shown.
Panel labels (candidate, type): PTF09ecl, CV; PTF09ffg, unknown; PTF09epz, SN Ia; PTF09dxv, SN II; PTF09vx, SN Ia;
PTF09dnp, SN Ia; PTF09afw, unknown; PTF09si, SN Ia; PTF09bgr, unknown; PTF09dqt, SN Ia; PTF09dah, SN II;
PTF09dlc, SN Ia; PTF09csj, SN Ia; PTF09akb, SN Ia; PTF09alu, SN Ia; PTF09dnq, SN Ia; PTF09dic, SN Ia;
PTF09bdb, SN Ia; PTF09dhx, SN Ia; PTF09amr, unknown.
[Figure 8 plot. Left panel: S_ave versus candidate apparent magnitude (R). Right panel: S_ave versus candidate apparent
magnitude error, also labelled by sigma of detection. Legend: PTF transients, PTF SNe, average SN score.]
Figure 8. The Galaxy Zoo Supernova scores S_ave of PTF candidates of various types as a function of their apparent R detection
magnitude (left) and the error in that magnitude (right). The filled circles show PTF objects believed to be SN-like transients, filled
squares show the confirmed SNe, while the contours show the distribution of all ~14,000 PTF candidates. The open squares show
the average SN scores in bins of magnitude (or magnitude error). For these candidates, the trend of decreasing score with increasing
magnitude is significant at about 3σ, and with increasing magnitude error at ~6σ. Note that only detections of 5σ significance or greater
are uploaded to the zoo, hence the cut-off in the right-hand panel.
for each candidate. As an example, we show the 'trajectory'
of S_ave for PTF candidates as a function of the number
of classifications in Fig. 9. As expected, the variation in
S_ave when adding additional classifications is larger when
the total number of classifications is small compared to
when many classifications are available. It is also apparent
that once ~15 classifications have been received, very few
candidates change S_ave significantly.
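The trajectory of S_ave is simply the running mean of the individual scores. The following sketch, in which the function name `score_trajectory` and the example scores are hypothetical, shows one way to compute it.

```python
from itertools import accumulate


def score_trajectory(scores):
    """Return S_ave after each successive classification."""
    running_totals = accumulate(scores)
    return [total / n for n, total in enumerate(running_totals, start=1)]


# Hypothetical sequence of individual scores, each one of -1, 1 or 3.
trajectory = score_trajectory([3, 1, -1, 1])
```

Because every individual score lies between -1 and 3, one additional classification can shift an average built from n scores by at most 4/(n+1). After 15 classifications the next score therefore moves S_ave by no more than 0.25, which is consistent with the flattening of the trajectories described above.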
We also examine the dispersion in each of the Galaxy
Zoo Supernova scores, as calculated from the individual
classifications, as a function of the scores themselves.
Fig. 10 plots the mean absolute deviation in the score of
each classified candidate as a function of the final candidate
score. As the individual scores from which each S_ave is
calculated are highly quantised (each classification can only
result in a score of -1, 1 or 3), the resulting plot is highly
structured. In particular, objects with S_ave of -1 or 3 must
have a dispersion of zero, and a further dip in the dispersion
is also seen around the third scoring possibility, 1. While in
principle the dispersion in the score might be thought of as
a good measure of the classification confidence (measuring,
in essence, the agreement between individual classifiers),
the current simple decision tree is not refined enough to
allow this statistic to be useful.
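The structure in this statistic can be reproduced with a short sketch. It assumes that the mean absolute deviation is taken about the mean score S_ave, which the text does not state explicitly; the function name is hypothetical.

```python
from statistics import mean


def mean_absolute_deviation(scores):
    """Mean absolute deviation of individual scores about their average."""
    s_ave = mean(scores)
    return mean([abs(s - s_ave) for s in scores])


unanimous_high = mean_absolute_deviation([3, 3, 3, 3])  # S_ave = 3
unanimous_mid = mean_absolute_deviation([1, 1, 1, 1])   # S_ave = 1
split_vote = mean_absolute_deviation([-1, 3, -1, 3])    # S_ave = 1
```

An average of -1 or 3 is only possible when every classification agrees, so the dispersion is forced to zero. An average of 1 can come either from unanimous scores of 1 or from a mixture of -1 and 3, which is why a dip rather than a hard zero would be expected there.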
There therefore exists some room to improve the scoring
model used by Galaxy Zoo Supernovae (and hence the
efficiency of the project). A detailed analysis of the data
in Fig. 9 shows that reducing the number of classifications
needed before a candidate is considered classified (§3.4)
from > 20 to > 15 (for intermediate scoring events) and
from > 10 to > 8 (for low and high scoring events) would
reduce the total number of classifications recorded by
~20%, while only moving a handful of candidates (< 5%)
across the boundaries of S_ave = 1.7 and S_ave = 0.2.
[Figure 9 plot: S_ave versus classification count. Legend: SNe, S_ave > 1.7; SNe, 0.2 < S_ave < 1.7; SNe, S_ave < 0.2.]
Figure 9. The Galaxy Zoo Supernova scores S_ave of PTF candidates
of various types as a function of the number of classifications
they have received. Each line represents a spectroscopically
confirmed PTF SN. Those in red have a final S_ave > 1.7, those in
black a final S_ave < 0.2, and those in blue intermediate scores.
Only candidates scored with 0.2 < S_ave < 1.7 continue to be
classified beyond 10 classifications. The grey contours show the
trajectories of all PTF candidates that were classified by the zoo,
regardless of any spectroscopic typing. As the scores are highly
quantised (each classification can only result in a score of -1, 1 or
3), each line representing a PTF SN is offset slightly in S_ave for
clarity.
In principle, an analysis of which volunteers consistently
get the classifications correct (when compared to a
professional astronomer or a spectroscopic classification)
could be used to weight different volunteer responses. For
example, an experienced classifier with a consistent history
of correct responses could have a larger weight than a
novice volunteer - there is evidence from Fig. 9 that even
ACKNOWLEDGEMENTS
We acknowledge the valuable contributions of the Zooniverse
community without which this project would not
have been possible. AS acknowledges support from the
Leverhulme Trust. MS acknowledges support from the
Royal Society. MS and AG acknowledge support from a
Weizmann-UK "Making connections" grant. CJL acknowledges
support from the STFC Science in Society Program
and The Leverhulme Trust. PEN acknowledges support
from the US Department of Energy Scientific Discovery
through Advanced Computing program under contract
DE-FG02-06ER06-04. KS acknowledges support from a
NASA Einstein Postdoctoral Fellowship grant number
PF9-00069, issued by the Chandra X-ray Observatory
Center, which is operated by the Smithsonian Astrophysical
Observatory for and on behalf of NASA under contract
NAS8-03060. JSB acknowledges support of an NSF-CDI
grant "Real-time Classification of Massive Time-series
Data Streams" (Award #0941742). The National Energy
Research Scientific Computing Center, which is supported
by the Office of Science of the U.S. Department of Energy
under Contract No. DE-AC02-05CH11231, provided staff,
computational resources and data storage for this project.
The William Herschel Telescope is operated on the island
of La Palma by the Isaac Newton Group in the
Spanish Observatorio del Roque de los Muchachos of
the Instituto de Astrofísica de Canarias. Observations
obtained with the Samuel Oschin Telescope at the Palomar
Observatory as part of the Palomar Transient Factory
project: a scientific collaboration between the California
Institute of Technology, Columbia University, Las Cumbres
Observatory, the Lawrence Berkeley National Laboratory,
the National Energy Research Scientific Computing Center,
the University of Oxford, and the Weizmann Institute of
Science.