LSST Data Management: Entering the Era of Software-Bound Astronomy
1. Large Sky Surveys: Entering the Era of Software-Bound Astronomy
Mario Juric
WRF Data Science Chair in Astronomy, University of Washington
LSST Data Management Project Scientist
ASTROINFORMATICS 2015
Dubrovnik, Croatia, October 7th, 2015
5. Hipparchus of Rhodes (180-125 BC)
Discovered the precession of the
equinoxes.
Measured the length of the year to within ~6 minutes.
In 129 BC, constructed one of the first
star catalogs, containing about 850
stars.
n.b.: also the one to blame for the magnitude system …
6. Galileo Galilei (1564-1642)
Researched a variety of topics in
physics, but called out here for the
introduction of the Galilean
telescope.
Galileo’s telescope allowed us, for the first time, to zoom in on the cosmos and study individual objects in great detail.
7. The Astrophysics Two-Step
• Surveys
– Construct catalogs and maps of objects in the sky. Focus on coarse classification and discovering targets for further follow-up.
• Large telescopes
– Acquire detailed observations of a few representative objects.
Understand the details of astrophysical processes that govern
them, and extrapolate that understanding to the entire class.
9. But it’s more than just a catalog of pointers – more and more, Google itself collects, processes, indexes, visualizes, and serves the actual information we need.
More and more often, our “research” begins and ends with Google!
10. Entering the Era of Massive Sky Surveys
• There’s a close parallel with large surveys in astronomy, in scale, quality, and
richness of the collected information
– Scale: We’re entering the era when we can image and catalog the entire sky
– Quality: Those catalogs will be as precise as the measurements taken with “pointed”
observations (used to be ~5-10x worse)
– Richness: Those catalogs contain not only positions and magnitudes, but also the shapes, profiles, and temporal behavior of the objects.
• Quite often, the research can begin and end with the survey.
• This is what makes large surveys of today not just bigger, but better; different.
They’re much more than just “finding charts”.
11. Sloan Digital Sky Survey
2.5m telescope >10000 deg2 0.1” astrometry r<22.5 flux limit
5 band, 2%, photometry for >50M stars
>300k R=2000 stellar spectra
10 years of ops: ~10 TB of imaging
12. SDSS DR6 Imaging Sky Coverage
(Adelman-McCarthy et al. 2008)
Panoramic Survey Telescope and Rapid Response System
1.8m telescope 30000 deg2 50mas astrometry r<23 flux limit
5 band, better than 1% photometry (goal)
~700 GB/night
13. LSST: A Deep, Wide, Fast, Optical Sky Survey
8.4m telescope 18000+ deg2 10mas astrom. r<24.5 (<27.5@10yr)
ugrizy 0.5-1% photometry
3.2Gpix camera 30sec exp/4sec rd 15TB/night 37 B objects
Imaging the visible sky, once every 3 days, for 10 years (825 revisits)
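The scale figures above invite a quick sanity check. Here is a back-of-envelope sketch in Python; the 2-byte raw pixel depth and the two-exposures-per-visit convention are assumptions for the arithmetic, not facts from the slide:

```python
# Back-of-envelope check of the LSST data-rate figures quoted above.
# ASSUMPTIONS (not stated on the slide): 2 bytes per raw pixel,
# 2 back-to-back exposures per 30-second visit.
GPIX = 3.2e9             # camera pixels
BYTES_PER_PIXEL = 2      # assumed 16-bit raw depth
EXPOSURES_PER_VISIT = 2  # assumed exposures per visit

bytes_per_visit = GPIX * BYTES_PER_PIXEL * EXPOSURES_PER_VISIT
print(f"raw data per visit: {bytes_per_visit / 1e9:.1f} GB")  # -> 12.8 GB

# Number of visits per night implied by ~15 TB/night:
visits_per_night = 15e12 / bytes_per_visit
print(f"implied visits per night: {visits_per_night:.0f}")    # -> 1172

# Passes over the visible sky in 10 years, at one pass every ~3 days:
sky_passes = 10 * 365 / 3
print(f"sky passes in 10 years: {sky_passes:.0f}")            # -> 1217
```

The ~1200 passes are split across six bands and lose nights to weather and engineering downtime, which is broadly consistent with the quoted 825 revisits.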
14. Turning the Sky into a Database
− A wide (half the sky), deep (24.5/27.5 mag), fast (image the sky once every 3 days)
survey telescope. Beginning in 2022, it will repeatedly image the sky for 10 years.
− The LSST will be an automated survey system. In nighttime, the observatory and the
data system will operate with minimum human intervention.
− The ultimate deliverable of LSST is not the telescope, nor the instruments; it is the
fully reduced data.
• All science will come from the survey catalogs and images
(Figure: Telescope → Images → Catalogs)
15. History
1996-2000 “Dark Matter Telescope”
This project began as a quest to
understand cosmology and the Solar
System.
2000 - … “LSST”
Emphasizes a broad range of science
from the same multi-wavelength
survey data, including unique time
domain exploration
We’ve reached the scale where a
single telescope, a single general
purpose data set, can serve to answer
a wide swath of science questions
(Figure: the evolution of the LSST design and purpose)
16. Frontiers of Survey Astronomy (1/4)
− Time domain science
• Novae, supernovae, GRBs
• Source characterization
• Instantaneous discovery
− Census of the Solar System
• NEOs, MBAs, Comets
• KBOs, Oort Cloud
− Mapping the Milky Way
• Tidal streams
• Galactic structure
− Dark energy and dark matter
• Strong Lensing
• Weak Lensing
• Constraining the nature of dark energy
17. Frontiers of Survey Astronomy (2/4)
(Figure: difference imaging: Exposure 1, Exposure 2, and their difference, Exposure 1 minus Exposure 2)
18. Frontiers of Survey Astronomy (3/4)
(Figure: map of the Milky Way halo, showing the RR Lyrae limit, the main-sequence star limit, and dwarf galaxies)
19. Frontiers of Survey Astronomy (4/4)
20. Location: Cerro Pachón, Chile
Leveling of El Peñón (the summit of Cerro Pachón)
24. LSST Camera
– Diameter: 1.65 m
– Length: 3.7 m
– Weight: 3000 kg
– Focal plane diameter: 634 mm
– 3.2 Gigapixels
– 0.2 arcsec pixels
– 9.6 square degree FOV
– 2 second readout
– 6 filters
25. LSST’s #1 Challenge: Processing the Data in a Way that Enables Science Effectively
26. From Data to Knowledge
(Figure: two paths from data to knowledge. Scientists may run model inference directly on the data (and metadata!); or the project first runs data processing on the data to produce a catalog, and scientists run model inference on the catalog.)
• Inference directly on the data: computationally (and cognitively) expensive, science-case specific
• Inference on the catalog: computationally cheaper and easier to understand, but still science-case specific
• Computationally expensive, general
• Reprojection; may or may not involve
compression
• Almost always introduces some
information loss
• Data Processing == Instrumental
Calibration + Measurement
27. Guiding Principles for LSST Data Products
− There are virtually infinitely many quantities (features) one could measure on images. But if catalog generation is understood as a (generalized) cost-reduction tool, the guiding principles become easier to define:
1. Maximize science enabled by the catalogs
- Working with images takes time and resources; a large fraction of LSST
science cases should be enabled by just the catalog.
- Be considerate to the user: provide even sub-optimal measurements if
they will enable leveraging of existing experience and tools
2. Minimize information loss
- Provide (as much as possible) estimates of likelihood surfaces, not just
single point estimators
3. Provide and document the transformation (the software)
- Measurements are becoming increasingly complex and systematics
limited; need to be maximally transparent about how they’re done
28. What LSST Will Deliver: A Data Stream, a Database, and a (small) Cloud
− A stream of ~10 million time-domain events per night, detected and
transmitted to event distribution networks within 60 seconds of
observation.
− A catalog of orbits for ~6 million bodies in the Solar System.
− A catalog of ~37 billion objects (20B galaxies, 17B stars), ~7 trillion
single-epoch detections (“sources”), and ~30 trillion forced sources,
produced annually, accessible through online databases.
− Deep co-added images.
− Services and computing resources at the Data Access Centers to
enable user-specified custom processing and analysis.
− Software and APIs enabling development of analysis codes.
29. Level 1: Transient Alerts
− LSST will generate ~10k time-domain events per observation (average)
• Includes anything that changed with respect to the deep template (variables, explosive transients, asteroids, etc.)
− Planning to measure and transmit with each alert:
• position
• PSF flux, aperture flux(es), trailed model fits
• shape (adaptive moments)
• light curves in all bands (up to a ~year; stretch: all)
• variability characterization (e.g., low-order light-curve moments, the probability that the object is variable; Richards et al. 2011)
• cut-outs centered on the object (template, difference image)
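As a concrete illustration, the alert contents listed above could be represented by a record like the following. All field names and types here are hypothetical; the actual alert schema is defined by the project, not by this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Alert:
    """Illustrative Level 1 alert packet, mirroring the list above.

    Every field name here is hypothetical, not the real LSST schema.
    """
    alert_id: int
    ra_deg: float                                       # position
    dec_deg: float
    psf_flux: float                                     # PSF flux
    aperture_fluxes: list[float]                        # aperture flux(es)
    trail_fit: Optional[dict]                           # trailed model fit, if any
    adaptive_moments: tuple[float, float, float]        # shape: Ixx, Iyy, Ixy
    light_curves: dict[str, list[tuple[float, float]]]  # band -> [(mjd, flux)]
    variability: dict[str, float]                       # e.g. low-order moments
    prob_variable: float                                # probability it varies
    cutout_template: bytes                              # postage stamps centered
    cutout_difference: bytes                            #   on the object

# A minimal example packet:
alert = Alert(1, 150.1, -2.3, 1200.0, [1150.0, 1250.0], None,
              (4.1, 3.9, 0.2), {"r": [(60310.25, 1200.0)]},
              {"std": 0.3}, 0.87, b"", b"")
```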
30. Level 2: Data Release Catalogs
− Object characterization (models):
• Moving Point Source model
• Double Sérsic model (bulge+disk)
- Maximum likelihood peak
- Samples of the posterior (hundreds)
− Object characterization (non-parametric):
• Centroid: (α, δ), per band
• Adaptive moments and ellipticity
measures (per band)
• Aperture fluxes and Petrosian and Kron
fluxes and radii (per band)
− Colors:
• Seeing-independent measure of object
color
− Variability statistics:
• Period, low-order light-curve moments,
etc.
LSST Science Book,
Fig. 9.3
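The low-order light-curve moments mentioned above are cheap to compute. A minimal sketch follows; this is an illustrative feature set, not the LSST definition:

```python
import numpy as np

def lightcurve_moments(flux):
    """Low-order moments of a light curve (illustrative, not LSST's set)."""
    f = np.asarray(flux, dtype=float)
    mu, sig = f.mean(), f.std()
    z = (f - mu) / sig                        # standardized fluxes
    return {"mean": mu,
            "std": sig,
            "skew": np.mean(z**3),            # asymmetry of the flux distribution
            "excess_kurtosis": np.mean(z**4) - 3.0}

# A sinusoidal variable vs. a constant star, both with the same noise:
t = np.linspace(0.0, 10.0, 500)
rng = np.random.default_rng(3)
variable = 10.0 + 2.0 * np.sin(2 * np.pi * t) + rng.normal(0, 0.1, t.size)
constant = 10.0 + rng.normal(0, 0.1, t.size)

print(lightcurve_moments(variable)["std"])   # ~1.4: dominated by the signal
print(lightcurve_moments(constant)["std"])   # ~0.1: just the noise
```

Even these crude statistics already separate the variable from the constant star; classifiers like Richards et al. (2011) build on richer versions of such features.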
31. The DPDD Captures the Details of This Thinking
LSST Data Products Definition Document: a document giving a high-level description of the LSST data products.
http://ls.st/dpdd
http://ls.st/dpdd
Level 1 Data Products: Section 4.
Level 2 Data Products: Section 5.
Level 3 Data Products: Section 6.
Special Programs DPs: Section 7.
Questions: http://community.lsst.org
Change Requests: https://github.com/lsst/data_products
32. Looking Ahead: Astronomical Data Processing in the 2020s
(or why software is important)
33. Astronomy was hardware-limited
− Astronomical data processing software is a lesson in (intelligently) cutting corners. So far, we generally haven’t needed particularly sophisticated codes.
− We were lucky because typically
• a) there was no data,
• b) the data was poor, or
• c) there wasn’t enough computing power available to process it right.
− Very frequently, all of the above.
− Simple algorithms were sufficient. The challenge was in the
hardware.
34. Astronomy is becoming software-bound
− Things are changing at a rapid pace.
− We can collect amazing quantities of data.
− Telescopes and cameras built today are extremely well characterized and
calibrated instruments.
− Computing has reached a level where more complex algorithms are
feasible. It’s also getting cheap.
• Also, lagging memory bandwidths are forcing us towards more complex
algorithms.
− The science requires it. The alternatives (“build a bigger telescope!”) are
incredibly costly.
− The time to develop the software is becoming comparable to, or greater than, the time to build the (astronomical) hardware
36. Example: Multi-Epoch Measurement via Coadds
(Figure: Exposures 1-3 are warped, convolved, and coadded; galaxy/star models are then fit to the coadd. The coaddition is hard, but we only have to do it once; the coadd measurement is easy, with relatively few data points.)
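For concreteness, the coaddition step can be sketched as an inverse-variance-weighted stack. This sketch assumes the exposures have already been warped onto a common pixel grid, and it ignores PSF matching, which a real pipeline must handle:

```python
import numpy as np

def coadd(exposures, variances):
    """Inverse-variance-weighted mean of already-warped, aligned exposures.

    exposures: list of 2-D arrays on a common pixel grid.
    variances: per-exposure noise variances (scalars here, for simplicity).
    """
    w = np.array([1.0 / v for v in variances])      # inverse-variance weights
    stack = np.array(exposures)
    co = np.tensordot(w, stack, axes=1) / w.sum()   # weighted mean image
    co_var = 1.0 / w.sum()                          # per-pixel coadd variance
    return co, co_var

# Toy example: three noisy exposures of a single static point source.
rng = np.random.default_rng(0)
truth = np.zeros((32, 32))
truth[16, 16] = 100.0
sigmas = [3.0, 5.0, 4.0]                            # per-exposure noise levels
exposures = [truth + rng.normal(0, s, truth.shape) for s in sigmas]

co, co_var = coadd(exposures, [s**2 for s in sigmas])
print(co_var)  # ~4.68: lower than any single exposure's variance (9, 25, 16)
```

The appeal is exactly what the slide says: the expensive stacking is done once, and all subsequent model fitting sees a single, deep image.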
37. Except that you should not do that…
− Impossible to combine multi-epoch data taken in different
seeings/exposure times without information loss
• Suboptimal S/N
− Warping and resampling correlates pixel values and noise; correlation
matrices are (practically) impossible to carry forward.
• Source of systematic error
− Detector effects have to be taken out at the pixel level
• Further correlates the noise
− The effective bandpass changes from exposure to exposure.
• Coadding different sorts of apples…
− Stars move!
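The noise-correlation point above is easy to demonstrate: resampling pure white noise by half a pixel with linear interpolation gives neighboring pixels a correlation of about 0.5.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, 200_000)   # white noise: neighbors are uncorrelated

# "Warp" by half a pixel using linear interpolation:
# y[i] = 0.5 * (x[i] + x[i+1])
y = 0.5 * (x[:-1] + x[1:])

def lag1_corr(a):
    """Correlation between adjacent samples."""
    a = a - a.mean()
    return float((a[:-1] * a[1:]).mean() / a.var())

print(f"before warping: {lag1_corr(x):+.3f}")  # ~ +0.00
print(f"after warping:  {lag1_corr(y):+.3f}")  # ~ +0.50
```

In 2-D, warping correlates a whole neighborhood of pixels, and each processing step compounds the effect; carrying the resulting covariance matrices forward is what becomes impractical.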
38. Multi-Epoch Measurement with Forward Modeling
(Figure: MultiFit, simultaneous multi-epoch fitting. The galaxy/star models are warped and convolved into Transformed Models 1-3, which are fit simultaneously to Exposures 1-3. Each fitting step is easier, depending on the model, but we have to do it on every iteration; the fit has the same number of parameters, but orders of magnitude more data points.)
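A 1-D toy version of this forward-modeling loop, with illustrative names throughout (nothing here is the actual LSST MultiFit code): a single static point source is fit simultaneously to three exposures, each with its own Gaussian PSF, by minimizing the summed chi-square.

```python
import numpy as np
from scipy.optimize import minimize

grid = np.arange(64, dtype=float)

def transformed_model(flux, x0, seeing):
    """Point source at x0, convolved with a Gaussian PSF of width `seeing`."""
    m = np.exp(-0.5 * ((grid - x0) / seeing) ** 2)
    return flux * m / m.sum()

# Three simulated exposures of the same source under different seeing:
rng = np.random.default_rng(1)
true_flux, true_x0 = 50.0, 30.3
seeings = [1.5, 2.5, 2.0]
noise = 1.0
exposures = [transformed_model(true_flux, true_x0, s)
             + rng.normal(0.0, noise, grid.size) for s in seeings]

def chi2(params):
    """Summed chi-square over ALL exposures: one model, re-transformed
    (re-convolved with that epoch's PSF) for each exposure."""
    flux, x0 = params
    return sum(((d - transformed_model(flux, x0, s)) ** 2).sum() / noise**2
               for d, s in zip(exposures, seeings))

fit = minimize(chi2, [30.0, 32.0], method="Nelder-Mead")
flux_hat, x0_hat = fit.x
print(f"flux = {flux_hat:.1f}, x0 = {x0_hat:.2f}")  # close to 50.0, 30.30
```

The model is never degraded to match a coadd; instead, it is pushed forward through each exposure's PSF, which is why no information is lost, at the price of re-evaluating the transform on every iteration.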
39. Detecting and Estimating Proper Motions Below the Single-Epoch Flux Limit
Optimal measurement of the properties of objects imaged in multiple epochs. Left: extraction of a moving point source (Lang 2009). In the individual exposures, the objects are undetected or marginally detected; on the coadd, moving point-source and galaxy models are indistinguishable.
40. Example: Sampling and Retaining the Likelihoods
Perform importance sampling from a proposal distribution determined on the coadd. Plan to
characterize (and keep!) the full posterior for each object. (Unexplored) possibilities for compression.
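A sketch of that importance-sampling scheme, with illustrative numbers throughout: the cheap coadd fit supplies a Gaussian proposal, the samples are reweighted by the exact likelihood, and the weighted samples are what would be retained per object.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_target(theta):
    """Stand-in for the exact (unnormalized) log-likelihood computed from
    the per-exposure data: a Gaussian in one parameter, centered at 10."""
    return -0.5 * ((theta - 10.0) / 1.2) ** 2

# Proposal: a broader Gaussian from a quick fit on the coadd.
mu_p, sigma_p = 9.5, 2.0
samples = rng.normal(mu_p, sigma_p, 20_000)

def log_proposal(theta):
    return -0.5 * ((theta - mu_p) / sigma_p) ** 2 - np.log(sigma_p)

# Importance weights (normalization constants cancel after normalizing):
log_w = log_target(samples) - log_proposal(samples)
w = np.exp(log_w - log_w.max())
w /= w.sum()

post_mean = np.sum(w * samples)
ess = 1.0 / np.sum(w**2)                # effective sample size
print(f"posterior mean: {post_mean:.2f}")            # ~10.0
print(f"effective samples: {ess:.0f} / {samples.size}")
```

The (sample, weight) pairs are the object's stored posterior; compressing them well is the unexplored possibility the slide alludes to.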
41. Looking Ahead: Astrophysical Inference in the 2020s
(or why software is even more important than you think)
42. Pushing the Boundaries of Optimal Inference
− As our measurements become more and more systematics-limited, what occurs in the “Data Processing” box becomes incredibly important
(Figure: the data / catalog / model-inference diagram from slide 26.)
− Sometimes, an assumption or an algorithmic choice that’s been made there
may introduce a systematic that drowns out the signal (or eliminates it).
− For optimal inference, one wants to design measurements that directly probe the relevant aspects of the original (imaging) data, and not the (lossy-compressed) catalog.
• Or derive more appropriate catalogs, feature sets, etc.
43. Pushing the Boundaries of Optimal Inference (cont.)
− Reasons we don’t do this today:
1. Computationally (and I/O) intensive
2. Conceptually difficult
- Expertise in statistics, applied math, and software engineering is often lacking
- Catalogs are too often taken as “God-given”, fundamental results of a survey
− Things are changing
• Big data problems are becoming increasingly computationally tractable
• The average astronomer in the 2020s will grow up expecting to be well versed in statistics, software engineering, and applied math.
• A concerted effort is under way, primarily driven by people in large survey and
telescope projects, to create the necessary software to make this possible.
44. Astronomy 2025: Personalized Medicine
− The LSST “Level 3” concept is our first step in that direction: enabling the community to create new products using LSST’s software, services, or computing resources. This means:
• Providing the software primitives to construct custom
measurement/inference codes
• Enabling the users to run those codes at the LSST data center,
leveraging the investment in I/O (piggyback onto LSST’s data trains).
− Looking ahead: right now, we see the data releases as the key product of a survey. By the end of LSST, I wouldn’t be surprised if we saw the software as the key product, with hundreds of specialized (and likely ephemeral) catalogs being generated by it.
− LSST “data releases” will just be some of those catalogs, designed to be
more broadly useful than others, and retained for a longer period of time.
− The LSST software and hardware are being engineered to make this possible.
45. Astronomy in the Age of Large Surveys
− Traditionally, astronomy was a data-starved science. Our methods and approach to
research were shaped by this environment. Surveys are altering it; data is
becoming abundant, well characterized, and rich.
− LSST is a poster child of this transformation: it will deliver the positions, magnitudes, and variability information for virtually everything in the southern sky to 24th-27th magnitude.
− In that environment, success in research will depend on the ability to mine knowledge from that existing data. A bigger instrument is no longer a cost-effective solution. A Stage V DETF experiment may be a software project.
− This will re-ignite the drive behind VO-like ideas, though emphasis will shift to
algorithms, tools, and reusable, collaborative, organically grown software
frameworks. Projects like AstroPy and LSST are already pushing on this.
− Necessity is the mother of invention, and she’s here.