Caltech 20090903 Talk on T.C.P. for LSST/PTF workshop

Interesting near-galaxy sources

• Identified by TCP in the last 2 days
• (last epoch observed 1 week ago)
• Classification triggered by the latest epoch added to the source

Overview

• TCP Software & Data Architecture
• Classifiers & Cutting out “Junk”

• Continuing work...

PTF spectroscopically confirmed SN,
subsequently classified by TCP as SN

Transients Classification Pipeline (TCP)

Parallelized source correlation and classification

• Difference objects are retrieved from LBL

• Each difference-object is passed to an IPython client

• Each parallel IPython client performs:
  • Source creation or correlation with existing sources
  • “Feature” generation (or re-generation) for that source
  • Classification of that source

[Diagram stages: source generation, feature generation, source classification]
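A minimal sketch of the per-object work described above, not the TCP code itself. It uses Python's `concurrent.futures` as a stand-in for the IPython clients; the box-match radius, the toy features, and the amplitude rule are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def is_same_source(src, diff_obj, radius_deg=0.001):
    """Hypothetical box match in RA/Dec; a real pipeline would use a proper cone search."""
    return (abs(src["ra"] - diff_obj["ra"]) < radius_deg
            and abs(src["dec"] - diff_obj["dec"]) < radius_deg)


def process(diff_obj, known_sources):
    """Run the three steps for one difference object against a read-only source list."""
    # 1. Source creation or correlation with existing sources
    match = next((s for s in known_sources if is_same_source(s, diff_obj)), None)
    mags = (list(match["mags"]) if match else []) + [diff_obj["mag"]]
    # 2. Feature generation (or re-generation) for that source (toy features)
    features = {
        "n_epochs": len(mags),
        "mean_mag": mean(mags),
        "amplitude": max(mags) - min(mags),
    }
    # 3. Classification of that source (toy rule, assumed threshold)
    label = "variable" if features["amplitude"] > 0.5 else "constant"
    return {"ra": diff_obj["ra"], "dec": diff_obj["dec"],
            "mags": mags, "features": features, "label": label}


def run_pipeline(diff_objs, known_sources, workers=4):
    """Hand each difference object to a worker; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: process(d, known_sources), diff_objs))


known = [{"ra": 10.0, "dec": 20.0, "mags": [18.0, 18.1]}]
diffs = [{"ra": 10.0002, "dec": 20.0001, "mag": 19.0},
         {"ra": 50.0, "dec": -5.0, "mag": 17.0}]
results = run_pipeline(diffs, known)
```

Workers only read the known-source list and return new records, so merging results back into the source store is left to the caller.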

Parallelized source correlation and classification

• Realtime TCP runs on 22 dedicated cores
• LCOGT’s 96-core Beowulf cluster
  • Non-run-time tasks
  • Classifier generation
• Additional resources
  • To be used for future timeseries classification work
  • Yahoo’s 4000-core Hadoop academic cluster
  • Amazon EC2 cluster

Warehouse of light-curves

• Need representative light-curves for all science classes

• With these we can model each science class

• We’ve built a warehouse of example light-curves

TCP-TUTOR (internal interface) and DotAstro.org (public interface)

Confusion Matrix

Different ways of quantifying efficiencies:
- using the original good training set, and training/evaluating efficiencies via folding (cross-validation)
- using “noisified”, simulated sources matching the survey schedule, cadences, and limits
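A minimal sketch of the bookkeeping behind a confusion matrix and the folding approach, assuming labels are plain strings. The round-robin fold assignment and the function names are illustrative choices, not taken from TCP.

```python
from collections import defaultdict


def k_fold_splits(n_items, k):
    """Yield (train_idx, test_idx) pairs; item i is assigned to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


def confusion_matrix(true_labels, predicted_labels):
    """cm[true][predicted] = number of sources of class `true` classified as `predicted`."""
    cm = defaultdict(lambda: defaultdict(int))
    for t, p in zip(true_labels, predicted_labels):
        cm[t][p] += 1
    return cm


def efficiency(cm, cls):
    """Fraction of sources of class `cls` that were classified as `cls`."""
    total = sum(cm[cls].values())
    return cm[cls][cls] / total if total else 0.0


true = ["SN", "SN", "RR Lyrae", "RR Lyrae"]
pred = ["SN", "RR Lyrae", "RR Lyrae", "RR Lyrae"]
cm = confusion_matrix(true, pred)
splits = list(k_fold_splits(4, 2))
```

In a folded evaluation, the classifier would be trained on each `train` index set and the predictions on each `test` set accumulated into one matrix.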

“Noisification”
(resampling light-curves)

• For PTF, the Noisification code references:
  • 1000s of PTF pointings and survey observing plans
• This allows simulation of PTF-cadenced light-curves
• Occasionally PTF observes using a faster cadence:
  • 7.5 minutes between revisiting an RA, Dec
  • This requires a separate set of noisified light-curves and classifiers.
• Other pointing and observing plans could be used.
  • This means we can easily generate noisified light-curves for any survey.
  • Thus we can generate science classifiers for any survey.
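A minimal sketch of the resampling idea, assuming a well-sampled template light-curve and a list of survey observation times. Linear interpolation, Gaussian magnitude noise, and a single limiting magnitude are simplifying assumptions here, not the actual Noisification code.

```python
import random
from bisect import bisect_left


def interp_mag(times, mags, t):
    """Linearly interpolate a template light-curve (times strictly increasing) at time t."""
    i = bisect_left(times, t)
    if i == 0:
        return mags[0]
    if i == len(times):
        return mags[-1]
    t0, t1 = times[i - 1], times[i]
    w = (t - t0) / (t1 - t0)
    return mags[i - 1] + w * (mags[i] - mags[i - 1])


def noisify(times, mags, obs_times, limit_mag, mag_err, seed=0):
    """Resample a template at survey epochs, add noise, and drop points fainter than the limit."""
    rng = random.Random(seed)
    resampled = []
    for t in obs_times:
        m = interp_mag(times, mags, t) + rng.gauss(0.0, mag_err)
        if m <= limit_mag:  # larger magnitude means fainter
            resampled.append((t, m))
    return resampled


template_t = [0.0, 1.0, 2.0, 3.0]          # days
template_m = [15.0, 16.0, 15.0, 16.0]      # magnitudes
fast_cadence = [i * 7.5 / 1440.0 for i in range(4)]  # 7.5-minute revisits, in days
noiseless = noisify(template_t, template_m, [0.5, 1.0, 2.5], limit_mag=20.0, mag_err=0.0)
shallow = noisify(template_t, template_m, [0.5, 1.0, 2.5], limit_mag=15.7, mag_err=0.0)
```

Swapping in a different list of observation times and limits is all it takes to mimic another survey's cadence in this sketch.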

Constructing Light Curves
from subtractions ain’t easy

[Figure, built up over four slides: true mag and reference mag versus time (assumes template doesn’t update). Later builds add the 3σ limiting mag, whether the source is detected in pos_sub or neg_sub, and a 5σ exclusion band.]

Constructing Light Curves from subtractions ain’t easy

For some source at RA, Dec and time t_i, determine the best ref_mag at t = t_i, then:

• Detection in the positive sub?
  • yes: total mag = TM+ [detection]
  • no: detection in the negative sub?
    • yes: mag in negative sub < limit_mag - ref_mag?
      • yes: total mag = TM- [detection]
      • no: total mag = limit_mag [upper limit]
    • no: limit_mag fainter than ref_mag?
      • yes: total mag = ref_mag [detection]
      • no: total mag = limit_mag [upper limit]
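$$\text{TM+} = 2.5 \log_{10}\left( \text{f\_aper} \times 10^{-0.4(\text{sub\_zp} - \text{ref\_zp})} + \text{flux\_aper} \right) + \text{ub1\_ref\_zp}$$

$$\text{TM-} = 2.5 \log_{10}\left( -\text{f\_aper} \times 10^{-0.4(\text{sub\_zp} - \text{ref\_zp})} + \text{flux\_aper} \right) + \text{ub1\_ref\_zp}$$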
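A minimal sketch of the decision flow above as a single function. TM+ and TM- are assumed to be computed beforehand and passed in; the argument names are hypothetical, and the negative-subtraction test is transcribed literally from the slide.

```python
def total_magnitude(ref_mag, limit_mag,
                    detected_pos=False, tm_plus=None,
                    detected_neg=False, tm_minus=None, neg_sub_mag=None):
    """Return (total_mag, kind), where kind is 'detection' or 'upper limit'."""
    if detected_pos:
        # Brightening relative to the reference: use the positive-subtraction total.
        return tm_plus, "detection"
    if detected_neg:
        # Fading relative to the reference: condition copied as written on the slide.
        if neg_sub_mag < limit_mag - ref_mag:
            return tm_minus, "detection"
        return limit_mag, "upper limit"
    if limit_mag > ref_mag:
        # No change detected and the image goes deeper than the reference:
        # the source is taken to be at its reference magnitude.
        return ref_mag, "detection"
    return limit_mag, "upper limit"


brightened = total_magnitude(19.0, 21.0, detected_pos=True, tm_plus=18.2)
unchanged = total_magnitude(19.0, 21.0)
too_shallow = total_magnitude(19.0, 18.0)
faded = total_magnitude(19.0, 21.0, detected_neg=True, tm_minus=19.5, neg_sub_mag=1.0)
faded_limit = total_magnitude(19.0, 21.0, detected_neg=True, tm_minus=19.5, neg_sub_mag=3.0)
```

Returning the kind alongside the magnitude keeps upper limits distinguishable from detections when the light-curve is assembled.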

Classifiers
• General Classifier
  • Filter out: poorly subtracted sources
  • Filter out: minor planets / rocks
  • Filter out: long-time sampled (periodic & nonperiodic)
  • Identify interesting sources near known galaxies
  • Identify periodic variable science class when confidence is high
• Timeseries Classifier
  • Weighted combination of machine learning classifiers
  • Astronomer-crafted classifiers for specific science types
    • Microlens, Supernova

(Source)

General Classification
• Three general classification groups.
• Periodic variables are contained within the “uninteresting” group, although more specific sub-classifications are known.

[Diagram groups:]
• Interesting with nearby galaxy context: SN, AGN of various quality classes (general)
• Uninteresting: poor subtraction (JUNK class), Rock class, periodic variable class
• Interesting without context information: nicely subtracted, non-galaxy, non-periodic variable classes

(Source)

General Classification
• Applied to ~80 spectroscopically confirmed, user-classified (SN, AGN, galaxy) sources.
• SN lightcurve classifier is needed when galaxy context is not available, and to improve confidence in SN classification.

[Diagram groups:]
• Interesting with nearby galaxy context: SN, AGN, galaxy (58 SN)
• Uninteresting: faint, poorly subtracted (11 SN)
• Interesting without context information
General Classifier: components & cuts
• Crowd-source modeled “RealBogus” metric
  • Cut on: average RealBogus, derivatives of RB components
  • Cut on: % of epochs in source with good RealBogus
• PSF statistics
  • Cuts on: PSF symmetry, eccentricity (averages)
• Neighboring object comparisons
  • Cuts on significance of the above metrics when compared to neighboring pixels
• Minor Planet check (PyEphem, PyMPChecker)
  • Does an epoch intersect a Minor Planet? (PyMPChecker)
• Well sampled source
  • Cuts on: well sampled periodic & nonperiodic sources
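A minimal sketch of the two RealBogus cuts listed above (average score, and fraction of epochs with a good score). The thresholds and the function name are hypothetical; the slides do not give the actual cut values.

```python
def passes_realbogus_cuts(rb_scores, min_mean_rb=0.3, good_rb=0.5, min_good_fraction=0.5):
    """Apply two per-source cuts to a list of per-epoch RealBogus scores (assumed thresholds)."""
    if not rb_scores:
        return False
    mean_rb = sum(rb_scores) / len(rb_scores)
    good_fraction = sum(1 for s in rb_scores if s >= good_rb) / len(rb_scores)
    return mean_rb >= min_mean_rb and good_fraction >= min_good_fraction


clean_source = passes_realbogus_cuts([0.9, 0.8, 0.2])   # mostly good epochs
junk_source = passes_realbogus_cuts([0.1, 0.2, 0.6])    # only one good epoch
empty_source = passes_realbogus_cuts([])
```

The PSF, neighbor, and minor-planet checks would be further boolean cuts combined with this one in the same way.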

Evaluating and Combining Classifiers
The “Netflix Prize” was won using a combination of ~1000 different classifiers.

• Issues when using multiple classifiers:
  • How to combine classifiers using weights or a tree hierarchy
  • How to generate final classification “probabilities” when using:
    • Widely varying types of classifiers
    • Classifiers that may each contain sub-classifications with their own class probabilities
• Evaluate the final combination of classifiers
  • We classify PTF09xxx user-classified sources
  • We display success / failure cases for each general class
  • Update classifier weights & cuts, try again.
  • OR: Iteratively & algorithmically find best weights.
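A minimal sketch of one way to combine classifiers by weights, assuming each model reports a dictionary of class probabilities. The model names, weights, and renormalization step are illustrative assumptions, not the TCP weighting scheme.

```python
def combine(class_probs_by_model, weights):
    """Weighted sum of per-model class probabilities, renormalized to sum to 1."""
    combined = {}
    for model, probs in class_probs_by_model.items():
        w = weights.get(model, 0.0)  # models without a weight are ignored
        for cls, p in probs.items():
            combined[cls] = combined.get(cls, 0.0) + w * p
    total = sum(combined.values())
    if total == 0:
        return {}
    return {cls: value / total for cls, value in combined.items()}


# Hypothetical model names and weights
model_outputs = {
    "model_a": {"RR Lyrae": 0.8, "SN": 0.2},
    "model_b": {"RR Lyrae": 0.6, "SN": 0.4},
}
model_weights = {"model_a": 1.0, "model_b": 3.0}
final = combine(model_outputs, model_weights)
best_class = max(final, key=final.get)
```

Finding the best weights algorithmically would amount to searching over `model_weights` and scoring each choice against the user-classified sources.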

Periodic variable classifiers
• Currently, science classes are determined by combining the weighted probabilities generated by different classification models, for a source.
• Each machine-learned classification model is trained using “noisified” lightcurves which were generated using different parameters.

[Figure panels: RR Lyrae with ~0.4 day and ~0.14 day periods, using 10 epoch and 20 epoch noisification]

Clicking on a class for one of dozens of ML models...
...shows highest classification probability sources for that model::class

[Figure annotations: overplotting of period-folded model still needs work; period-fold plotting probably failed here]

[Figure panel: 0.1 - 0.17 day period RR Lyrae using 15 epoch noisification]

Continuing Work

• Test, improve general classifier cuts

• Push general classifications to Followup Marshal

• Push specific variable science class identified sources to Followup Marshal

• Explore other timeseries classifiers for periodic variable classification.
