Computational Training for Domain
Scientists & Data Literacy
Joshua Bloom
Moore Foundation Data-Driven Investigator
UC Berkeley, Astronomy
@profjsb
AAAS, San Jose
Sunday Feb 15, 2015
“knowledge and understanding of scientific concepts and processes required
for personal decision making, participation in civic and cultural affairs,
and economic productivity”
Scientific Literacy
National Science Education Standards Report, National
Academies of Science 1996
Data Literacy, as a broad new educational imperative
Data Literacy
“knowledge and understanding of analytical concepts and processes required
to draw fair conclusions from data that impact personal, professional, and
civic lives”
now @ AAAS
http://boingboing.net/2013/01/01/correlation-between-autism-dia.html
via reddit/r/skeptic
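A minimal sketch of why a chart like the one linked above can mislead, assuming two made-up series (`series_a`, `series_b`) that share nothing except an upward trend. The data, slopes, and noise levels are all hypothetical and are not taken from the linked post.

```python
import numpy as np

rng = np.random.default_rng(42)
t = np.arange(200)

# Two hypothetical, unrelated quantities that both happen to grow over time.
series_a = 1.0 * t + rng.normal(0, 5, t.size)
series_b = 0.5 * t + rng.normal(0, 5, t.size)

# The Pearson correlation of the raw series is dominated by the shared trend.
r_raw = np.corrcoef(series_a, series_b)[0, 1]

# Differencing removes the trend and leaves only the independent noise.
r_diff = np.corrcoef(np.diff(series_a), np.diff(series_b))[0, 1]

print(f"raw r = {r_raw:.2f}, differenced r = {r_diff:.2f}")
```

The raw correlation comes out close to 1 even though neither series influences the other. Once the common trend is removed, the correlation falls to near zero.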
Our ML framework found the Nearest Supernova in 3 Decades
‣ Built & Deployed robust, real-time machine learning framework, discovering >10,000 events in >10 TB of imaging → 50+ journal articles
‣ Built Probabilistic Event classification catalogs with innovative active learning
http://timedomain.org https://www.nsf.gov/news/news_summ.jsp?cntn_id=122537
What is the toolbox
of the modern
(data-driven) scientist?
domain
training
statistics
advanced
computing
database
GUI
parallel
visualization
Bayesian
machine learning
Physics
laboratory techniques
MCMC
MapReduce
And...How do we teach this with what
little time the students have?
What is the toolbox
of the modern
(data-driven) scientist?
Data-Centric Coursework, Bootcamps, Seminars, & Lecture
Series
BDAS: Berkeley Data Analytics
Stack
[Spark, Shark, ...]
parallel
programming
bootcamp
...and entire degree programs
Taught by CS/Stats
Aimed at Engineers & Programmers
Heading Toward Industry
Python Bootcamps at Berkeley
2010: 85 campers
2012a: 135 campers
2012b: 210 campers
2013a: 253 campers
‣ 3 days of live/archive streamed lectures
‣ all open material in GitHub
‣ widely disseminated (e.g., @ NASA)
http://pythonbootcamp.info
Part of the
Designated
Emphasis in
Computational
Science &
Engineering at
Berkeley
visualization
machine learning
database interaction
user interface & web frameworks
timeseries & numerical computing
interfacing to other languages
Bayesian inference & MCMC
hardware control
parallelism
Python for Data Science @ Berkeley [Sept 2013]
[Chart: gender split of 64% / 36% (female, male)]
[Chart: enrollment by department, with shares of 4% to 16% each: Psychology, Astronomy, Neuroscience, Biostatistics, Physics, Chemical Engineering, ISchool, Earth and Planetary Sciences, Industrial Engineering, Mechanical Engineering]
“Parallel Image Reconstruction
from Radio Interferometry Data”
“Graph Theory Analysis of Growing
Graphs”
http://mb3152.github.io/Graph-Growth/
“Realtime Prediction of Activity Behavior
from Smartphone”
“Bus Arrival Time
Prediction in Spain”
Time domain preprocessing
- Start with raw photometry
- Gaussian process detrending
- Calibration
- Petigura & Marcy 2012
Transit search
- Matched filter
- Similar to BLS algorithm (Kovács+ 2002)
- Leverages Fast-Folding Algorithm, O(N^2) → O(N log N) (Staelin+ 1968)
Data validation
- Significant peaks in periodogram, but inconsistent with exoplanet transit
TERRA – optimized for small planets
[Figure: TERRA detrended/calibrated photometry; panels show raw flux (ppt) and calibrated flux]
Erik Petigura
Berkeley Astro
Grad Student
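The transit-search step above can be illustrated with a toy example. This is a brute-force phase-folding sketch on synthetic data, not TERRA itself and not the Fast-Folding Algorithm. The function name `box_search`, the bin count, and the injected period, depth, and noise level are all assumptions chosen for illustration.

```python
import numpy as np

def box_search(time, flux, periods, n_bins=50):
    """Fold the light curve at each trial period and score the deepest phase bin."""
    flux = flux - np.median(flux)
    sigma = np.std(flux)
    best_period, best_snr = None, -np.inf
    for period in periods:
        phase = (time % period) / period
        bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
        sums = np.bincount(bins, weights=flux, minlength=n_bins)
        counts = np.bincount(bins, minlength=n_bins)
        valid = counts > 0
        # Signal-to-noise ratio of the mean dimming in each phase bin.
        snr = np.zeros(n_bins)
        snr[valid] = -sums[valid] / (sigma * np.sqrt(counts[valid]))
        if snr.max() > best_snr:
            best_period, best_snr = period, snr.max()
    return best_period, best_snr

# Synthetic light curve: white noise plus a periodic box-shaped dip.
rng = np.random.default_rng(1)
time = np.arange(0, 90, 0.02)                # days
flux = 1 + rng.normal(0, 1e-3, time.size)
flux[(time % 3.7) < 0.15] -= 3e-3            # hypothetical 3.7 d planet

periods = np.linspace(2, 6, 401)
best_period, best_snr = box_search(time, flux, periods)
print(f"best period = {best_period:.2f} d, SNR = {best_snr:.1f}")
```

Each trial period here costs a full pass over the data. A fast-folding scheme reuses partial sums across neighbouring periods, which is where the O(N^2) → O(N log N) saving quoted on the slide comes from.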
Prevalence of Earth-size planets orbiting Sun-like stars
Erik A. Petigura (a,b,1), Andrew W. Howard (b), and Geoffrey W. Marcy (a)
(a) Astronomy Department, University of California, Berkeley, CA 94720; and (b) Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822
Contributed by Geoffrey W. Marcy, October 22, 2013 (sent for review October 18, 2013)
Determining whether Earth-like planets are common or rare looms
as a touchstone in the question of life in the universe. We searched
for Earth-size planets that cross in front of their host stars by
examining the brightness measurements of 42,000 stars from
National Aeronautics and Space Administration’s Kepler mission.
We found 603 planets, including 10 that are Earth size (1 − 2 R⊕)
and receive comparable levels of stellar energy to that of Earth
(0.25 − 4 F⊕). We account for Kepler’s imperfect detectability of
such planets by injecting synthetic planet–caused dimmings into
the Kepler brightness measurements and recording the fraction
detected. We find that 11 ± 4% of Sun-like stars harbor an Earth-size
planet receiving between one and four times the stellar intensity
as Earth. We also find that the occurrence of Earth-size planets is
constant with increasing orbital period (P), within equal intervals of
log P up to ∼200 d. Extrapolating, one finds 5.7 (+1.7/−2.2)% of Sun-like stars
harbor an Earth-size planet with orbital periods of 200–400 d.
extrasolar planets | astrobiology
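The injection-and-recovery idea in the abstract can be sketched as follows. This is a simplified illustration, not the paper's pipeline: the detector scores the dip at the known injected location instead of running a blind search, and the period, duration, noise level, and depths are hypothetical values.

```python
import numpy as np

def inject_transit(time, flux, period, duration, depth):
    """Subtract a box-shaped dip of the given depth at every transit."""
    in_transit = (time % period) < duration
    injected = flux.copy()
    injected[in_transit] -= depth
    return injected, in_transit

def transit_snr(flux, in_transit):
    """Mean dimming divided by its expected uncertainty."""
    dimming = flux[~in_transit].mean() - flux[in_transit].mean()
    noise = flux[~in_transit].std() / np.sqrt(in_transit.sum())
    return dimming / noise

def recovery_fraction(depth, n_trials=200, threshold=12.0, seed=0):
    """Fraction of injected signals whose SNR exceeds the threshold."""
    rng = np.random.default_rng(seed)
    time = np.arange(0, 90, 0.02)            # days
    recovered = 0
    for _ in range(n_trials):
        flux = 1 + rng.normal(0, 1e-3, time.size)
        injected, mask = inject_transit(time, flux, 10.0, 0.2, depth)
        if transit_snr(injected, mask) > threshold:
            recovered += 1
    return recovered / n_trials

depths = [0.0005, 0.00125, 0.003]
fractions = [recovery_fraction(d) for d in depths]
for d, f in zip(depths, fractions):
    print(f"depth = {d:.5f}: recovered fraction = {f:.2f}")
```

Dividing the number of detected planets in a size and period bin by the recovered fraction for that bin gives a completeness-corrected count. That is the sense in which the abstract "accounts for imperfect detectability".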
The National Aeronautics and Space Administration’s (NASA’s)
Kepler mission was launched in 2009 to search for planets
that transit (cross in front of) their host stars (1–4). The resulting
dimming of the host stars is detectable by measuring their brightness,
and Kepler monitored the brightness of 150,000 stars every
30 min for 4 y. To date, this exoplanet survey has detected more
We searched for transiting planets in Kepler brightness measurements
using our custom-built TERRA software package
described in previous works (6, 9) and in SI Appendix. In brief,
TERRA conditions Kepler photometry in the time domain, removing
outliers, long timescale variability (>10 d), and systematic
errors common to a large number of stars. TERRA then searches
for transit signals by evaluating the signal-to-noise ratio (SNR) of
prospective transits over a finely spaced 3D grid of orbital period,
P, time of transit, t0, and transit duration, ΔT. This grid-based
search extends over the orbital period range of 0.5–400 d.
TERRA produced a list of “threshold crossing events” (TCEs)
that meet the key criterion of a photometric dimming SNR ratio
SNR > 12. Unfortunately, an unwieldy 16,227 TCEs met this criterion,
many of which are inconsistent with the periodic dimming
profile from a true transiting planet. Further vetting was performed
by automatically assessing which light curves were consistent with
theoretical models of transiting planets (10). We also visually
inspected each TCE light curve, retaining only those exhibiting a
consistent, periodic, box-shaped dimming, and rejecting those
caused by single epoch outliers, correlated noise, and other data
anomalies. The vetting process was applied homogeneously to all
TCEs and is described in further detail in SI Appendix.
To assess our vetting accuracy, we evaluated the 235 Kepler
objects of interest (KOIs) among Best42k stars having P > 50 d,
which had been found by the Kepler Project and identified as planet
candidates in the official Exoplanet Archive (exoplanetarchive.
Bootcamp/Seminar Alum
Python
DOE/NERSC computation
PNAS [2014]
Dangers of a Cursory Training/Teaching Curriculum
[Figure: “Fitting a straight line to data”, two panels of y versus x (x from 0 to 300, y from 0 to 700); fitted line y = 1.33 x + 164]
Data analysis recipes:
Fitting a model to data
David W. Hogg
Center for Cosmology and Particle Physics, Department of Physics, New York University
Max-Planck-Institut für Astronomie, Heidelberg
Jo Bovy
Center for Cosmology and Particle Physics, Department of Physics, New York University
Dustin Lang
Department of Computer Science, University of Toronto
Princeton University Observatory
Abstract
We go through the many considerations involved in fitting a model
to data, using as an example the fit of a straight line to a set of points
in a two-dimensional plane. Standard weighted least-squares fitting
is only appropriate when there is a dimension along which the data
points have negligible uncertainties, and another along which all the
uncertainties can be described by Gaussians of known variance; these
arXiv:1008.4686v1 [astro-ph.IM] 27 Aug 2010
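A minimal sketch of the standard weighted least-squares line fit that the abstract refers to, written with the usual normal equations. The function name `weighted_line_fit` and the synthetic data are assumptions for illustration. The example also shows how a single bad point, treated as if its uncertainty were Gaussian and known, drags the fitted slope.

```python
import numpy as np

def weighted_line_fit(x, y, sigma_y):
    """Fit y = m x + b, weighting each point by 1 / sigma_y**2."""
    A = np.vander(x, 2)                      # columns: x, 1
    w = 1.0 / sigma_y**2
    cov = np.linalg.inv(A.T @ (A * w[:, None]))
    m, b = cov @ (A.T @ (w * y))
    return m, b, cov

# Hypothetical noise-free data on the line y = 2 x + 5.
x = np.arange(10.0)
y = 2.0 * x + 5.0
sigma_y = np.ones_like(x)

m_clean, b_clean, _ = weighted_line_fit(x, y, sigma_y)

# Corrupt one point but keep its quoted uncertainty unchanged.
y_outlier = y.copy()
y_outlier[0] += 30.0
m_out, b_out, _ = weighted_line_fit(x, y_outlier, sigma_y)

print(f"clean fit:   m = {m_clean:.2f}, b = {b_clean:.2f}")
print(f"one outlier: m = {m_out:.2f}, b = {b_out:.2f}")
```

The fit has no way to know that the corrupted point violates its assumptions, so it returns a confident but wrong line. A student who has only learned the recipe, without its assumptions, has no reason to suspect the result.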
Statistical Inference
Data Literacy in the Undergraduate & Graduate Training Mission
Versioning & Reproducibility
“Recently, the scientific community was shaken by reports that a troubling proportion of peer-reviewed
preclinical studies are not reproducible.” McNutt, 2014
http://www.sciencemag.org/content/343/6168/229.summary
- Git has emerged as the de facto versioning tool
- Berkeley Common Environment (BCE) Software Stack
- “Reproducible and Collaborative Statistical Data Science” (Statistics 157: P. Stark)
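One small habit that complements version control is to record what an analysis actually ran on. The sketch below is a hypothetical helper (`provenance_record` is not part of Git or BCE) that uses only the Python standard library to fingerprint the input data and parameters. The resulting manifest could be committed alongside the code.

```python
import hashlib
import json
import platform

def provenance_record(data: bytes, params: dict) -> dict:
    """Bundle what is needed to tell whether two runs used the same inputs."""
    return {
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "params": params,
        "python_version": platform.python_version(),
    }

# Hypothetical input file contents and analysis parameters.
record = provenance_record(b"time,flux\n0.0,1.0\n", {"snr_threshold": 12})
manifest = json.dumps(record, sort_keys=True, indent=2)
print(manifest)
```

If the data or a parameter changes, the manifest changes with it, so a difference between two results can be traced to a difference in inputs.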
Data Literacy in the Undergraduate & Graduate Training Mission
Undergraduate Curriculum
• Data Science Task Force report generated at Chancellor/Provost behest
• Recommendations:
– A foundational lower-division course accessible to everyone (statistical & computational thinking), with broad applicability
– A suite of courses that advance the use of data sciences within the broad swath of disciplines available to Berkeley undergraduates
– A high-quality minor for students who wish to couple data sciences with their major discipline
– A data sciences major for those students who want this to be their primary area of study
pg. 12; Announced Feb 11
Chancellor’s Report for 2015
Grand challenges at the intersection of:
• Open science and reproducible data analysis
• Responses of coupled human-natural systems to rapid environmental change (water resources, land use planning, agriculture, etc.)
• Development of evidence-based solutions in public policy, resource management and environmental design
• Five schools and colleges: Letters & Science, Natural Resources, Engineering, Public Policy and Environmental Design
• Two year graduate training program; Master’s and PhD students
• Four cohorts of up to 20 students each, first cohort in fall 2015
Data Sciences for the 21st Century (DS421): Environment and Society
via Prof. David Ackerly (Integrative Biology)
http://bitly.com/bundles/fperezorg/1
“Bold new partnership launches to harness potential of data scientists and big data”
Founded in December 2013 as a result of a year+ long national selection process
$37.8M over 5 years, along with University of Washington & NYU
‣ An accelerator for data-driven discovery
‣ An agent of change in the modern university as Data Science takes hold
‣ An incubator for the next generation of Data Science technology and practice
Leadership from across the spectrum
Joshua Bloom, Professor, Astronomy; Director, Center for Time Domain Informatics
Henry Brady, Dean, Goldman School of Public Policy
Cathryn Carson, Associate Dean, Social Sciences; Acting Director of Social Sciences Data Laboratory “D-Lab”
David Culler, Chair, EECS
Michael Franklin, Professor, EECS; Co-Director, AMP Lab
Erik Mitchell, Associate University Librarian
Faculty Lead/PI: Saul Perlmutter, Physics, Berkeley Center for Cosmological Physics
Fernando Perez, Researcher, Henry H. Wheeler Jr. Brain Imaging Center
Jasjeet Sekhon, Professor, Political Science and Statistics; Center for Causal Inference and Program Evaluation
Jamie Sethian, Professor, Mathematics
Kimmen Sjölander, Professor, Bioengineering, Plant and Microbial Biology
Philip Stark, Chair, Statistics
Ion Stoica, Professor, EECS; Co-Director, AMP Lab
“I love working with astronomers, since their data is worthless.”
- Jim Gray, Microsoft
Created by Natalia Bilenko.
Data source: PubMed Central. http://sciencereview.berkeley.edu/bsr_design/Issue26/datascience/cover/
Established CS/Stats/Math in Service of novelty in domain science
vs.
Novelty in domain science driving & informing novelty in CS/Stats/Math
“novelty² problem”: an extra Burden for Forefront Scientists
https://medium.com/tech-talk/dd88857f662
ỉπ vs.
21st Century Educational Mission: Produce Gamma-shaped people
deep domain skill/knowledge/training
deep methodological knowledge/skill
deep domain or methodological skill/knowledge/training
strong methodological or domain knowledge/skill
Goal: empower teams of gamma’s to excel
Summary
Data science at Berkeley is thriving and BIDS is its intellectual home
   - incubating novel science & methodologies
   - teaching & training
   - innovating environments, interactions, & networks
Teaching imperatives emerging around data literacy
   - new campus-wide undergraduate curriculum (starting with a foundation for all 1st years)
   - new multi-year, multi-disciplinary graduate program
@profjsb
Thank you.
PyData, May 4, 2014

 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 

Computational Training for Domain Scientists & Data Literacy

  • 1. Computational Training for Domain Scientists & Data Literacy. Joshua Bloom, Moore Foundation Data-Driven Investigator, UC Berkeley, Astronomy. @profjsb. AAAS, San Jose, Sunday Feb 15, 2015
  • 2. Data Literacy, as a broad new educational imperative.
     Scientific Literacy: “knowledge and understanding of scientific concepts and processes required for personal decision making, participation in civic and cultural affairs, and economic productivity” (National Science Education Standards Report, National Academies of Science 1996).
     Data Literacy: “knowledge and understanding of analytical concepts and processes required to draw fair conclusions from data that impact personal, professional, and civic lives” (now @ AAAS).
  • 4. Our ML framework found the Nearest Supernova in 3 Decades.
     ‣ Built & deployed robust, real-time machine learning framework, discovering >10,000 events in >10 TB of imaging → 50+ journal articles
     ‣ Built probabilistic event classification catalogs with innovative active learning
     http://timedomain.org https://www.nsf.gov/news/news_summ.jsp?cntn_id=122537
  • 5. What is the toolbox of the modern (data-driven) scientist? domain training, statistics, advanced computing, database, GUI, parallel, visualization, Bayesian, machine learning, Physics, laboratory techniques, MCMC, MapReduce
  • 6. What is the toolbox of the modern (data-driven) scientist? And... how do we teach this with what little time the students have?
  • 7. Data-Centric Coursework, Bootcamps, Seminars, & Lecture Series: BDAS: Berkeley Data Analytics Stack [Spark, Shark, ...]; parallel programming bootcamp; ...and entire degree programs. Taught by CS/Stats. Aimed at Engineers & Programmers. Heading Toward Industry.
  • 8. Python Bootcamps at Berkeley. 2010: 85 campers; 2012a: 135 campers
  • 9. Python Bootcamps at Berkeley. 2012b: 210 campers; 2013a: 253 campers
  • 10. ‣ 3 days of live/archive streamed lectures ‣ all open material in GitHub ‣ widely disseminated (e.g., @ NASA) http://pythonbootcamp.info
  • 11. Part of the Designated Emphasis in Computational Science & Engineering at Berkeley. Topics: visualization; machine learning; database interaction; user interface & web frameworks; timeseries & numerical computing; interfacing to other languages; Bayesian inference & MCMC; hardware control; parallelism
  • 12. Python for Data Science @ Berkeley [Sept 2013]
  • 13. Pie charts: enrollment by gender (64% / 36%, labeled female / male) and by department (Psychology, Astronomy, Neuroscience, Biostatistics, Physics, Chemical Engineering, ISchool, Earth and Planetary Sciences, Industrial Engineering, Mechanical Engineering; shares between 4% and 16% each).
     Student projects: “Parallel Image Reconstruction from Radio Interferometry Data”; “Graph Theory Analysis of Growing Graphs” (http://mb3152.github.io/Graph-Growth/); “Realtime Prediction of Activity Behavior from Smartphone”; “Bus Arrival Time Prediction in Spain”
  • 14. TERRA – optimized for small planets. Erik Petigura, Berkeley Astro Grad Student; Bootcamp/Seminar Alum; Python; DOE/NERSC computation; PNAS [2014]
     Time domain preprocessing: start with raw photometry; Gaussian process detrending; calibration; Petigura & Marcy 2012.
     Transit search: matched filter; similar to BLS algorithm (Kovács+ 2002); leverages Fast-Folding Algorithm, O(N^2) → O(N log N) (Staelin+ 1968).
     Data validation: significant peaks in periodogram, but inconsistent with exoplanet transit.
     Figure: raw flux (ppt) → TERRA → detrended/calibrated photometry.
     Paper shown on slide: “Prevalence of Earth-size planets orbiting Sun-like stars.” Erik A. Petigura (a, b), Andrew W. Howard (b), and Geoffrey W. Marcy (a). (a) Astronomy Department, University of California, Berkeley, CA 94720; (b) Institute for Astronomy, University of Hawaii at Manoa, Honolulu, HI 96822. Contributed by Geoffrey W. Marcy, October 22, 2013 (sent for review October 18, 2013).
     Abstract: Determining whether Earth-like planets are common or rare looms as a touchstone in the question of life in the universe. We searched for Earth-size planets that cross in front of their host stars by examining the brightness measurements of 42,000 stars from National Aeronautics and Space Administration’s Kepler mission. We found 603 planets, including 10 that are Earth size (1 − 2 R⊕) and receive comparable levels of stellar energy to that of Earth (0.25 − 4 F⊕). We account for Kepler’s imperfect detectability of such planets by injecting synthetic planet-caused dimmings into the Kepler brightness measurements and recording the fraction detected. We find that 11 ± 4% of Sun-like stars harbor an Earth-size planet receiving between one and four times the stellar intensity as Earth. We also find that the occurrence of Earth-size planets is constant with increasing orbital period (P), within equal intervals of logP up to ∼200 d. Extrapolating, one finds 5.7 (+1.7/−2.2)% of Sun-like stars harbor an Earth-size planet with orbital periods of 200–400 d.
     Keywords: extrasolar planets | astrobiology
     Excerpt (introduction): The National Aeronautics and Space Administration’s (NASA’s) Kepler mission was launched in 2009 to search for planets that transit (cross in front of) their host stars (1–4). The resulting dimming of the host stars is detectable by measuring their brightness, and Kepler monitored the brightness of 150,000 stars every 30 min for 4 y. To date, this exoplanet survey has detected more ...
     Excerpt (method): We searched for transiting planets in Kepler brightness measurements using our custom-built TERRA software package described in previous works (6, 9) and in SI Appendix. In brief, TERRA conditions Kepler photometry in the time domain, removing outliers, long timescale variability (>10 d), and systematic errors common to a large number of stars. TERRA then searches for transit signals by evaluating the signal-to-noise ratio (SNR) of prospective transits over a finely spaced 3D grid of orbital period, P, time of transit, t0, and transit duration, ΔT. This grid-based search extends over the orbital period range of 0.5–400 d. TERRA produced a list of “threshold crossing events” (TCEs) that meet the key criterion of a photometric dimming SNR ratio SNR > 12. Unfortunately, an unwieldy 16,227 TCEs met this criterion, many of which are inconsistent with the periodic dimming profile from a true transiting planet. Further vetting was performed by automatically assessing which light curves were consistent with theoretical models of transiting planets (10). We also visually inspected each TCE light curve, retaining only those exhibiting a consistent, periodic, box-shaped dimming, and rejecting those caused by single epoch outliers, correlated noise, and other data anomalies. The vetting process was applied homogeneously to all TCEs and is described in further detail in SI Appendix.
     To assess our vetting accuracy, we evaluated the 235 Kepler objects of interest (KOIs) among Best42k stars having P > 50 d, which had been found by the Kepler Project and identified as planet candidates in the official Exoplanet Archive (exoplanetarchive. ...
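The grid-based transit search described on slide 14 (evaluating an SNR over trial values of period P, transit time t0, and duration ΔT, then thresholding at SNR > 12) can be illustrated with a toy example. The sketch below is not TERRA: it is a brute-force box search on a synthetic light curve, without detrending or the Fast-Folding speedup, and every name and number in it (`box_snr`, `grid_search`, the injected 7-day planet, the noise level) is a hypothetical choice made for illustration.

```python
import numpy as np

def box_snr(time, flux, period, t0, duration):
    """SNR of a box-shaped dip for one (period, t0, duration) trial."""
    # Time offset from the nearest trial mid-transit, wrapped to [-P/2, P/2).
    phase = (time - t0 + 0.5 * period) % period - 0.5 * period
    in_transit = np.abs(phase) < 0.5 * duration
    n_in = in_transit.sum()
    if n_in == 0 or n_in == time.size:
        return 0.0
    # Mean dimming inside the box, relative to the out-of-transit level.
    depth = flux[~in_transit].mean() - flux[in_transit].mean()
    noise = flux[~in_transit].std()
    return depth * np.sqrt(n_in) / noise

def grid_search(time, flux, periods, durations, n_t0=50):
    """Brute-force search over the 3D grid; returns the highest-SNR trial."""
    best = (-np.inf, None, None, None)
    for period in periods:
        for t0 in np.linspace(0.0, period, n_t0, endpoint=False):
            for duration in durations:
                snr = box_snr(time, flux, period, t0, duration)
                if snr > best[0]:
                    best = (snr, period, t0, duration)
    return best

# Synthetic light curve (assumed values): 90 days sampled every 0.02 d,
# white noise, plus an injected box-shaped transit with a 7-day period.
rng = np.random.default_rng(0)
time = np.arange(0.0, 90.0, 0.02)
flux = 1.0 + rng.normal(0.0, 3e-4, time.size)
true_period, true_t0, true_duration, true_depth = 7.0, 1.4, 0.2, 1e-3
phase = (time - true_t0 + 0.5 * true_period) % true_period - 0.5 * true_period
flux[np.abs(phase) < 0.5 * true_duration] -= true_depth

snr, period, t0, duration = grid_search(
    time, flux, periods=np.arange(5.0, 9.0, 0.05), durations=[0.1, 0.2, 0.3]
)
print(f"best trial: P={period:.2f} d, t0={t0:.2f} d, dT={duration:.2f} d, SNR={snr:.1f}")
```

A trial would count as a threshold crossing event in this toy setup if its SNR exceeded 12. The nested loops make the cost of a naive search obvious, which is the motivation for the faster folding approach the slide cites.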
  • 15. Dangers of a Cursory Training/Teaching Curriculum
  • 16. Statistical Inference. Data Literacy in the Undergraduate & Graduate Training Mission.
     Figure: “Fitting a straight line to data” (two scatter plots of y against x, one with the fitted line y = 1.33 x + 164).
     Source shown on slide: “Data analysis recipes: Fitting a model to data.” David W. Hogg (Center for Cosmology and Particle Physics, Department of Physics, New York University; Max-Planck-Institut für Astronomie, Heidelberg), Jo Bovy (Center for Cosmology and Particle Physics, Department of Physics, New York University), Dustin Lang (Department of Computer Science, University of Toronto; Princeton University Observatory). arXiv:1008.4686v1 [astro-ph.IM] 27 Aug 2010.
     Abstract (truncated on slide): We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these ...
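The standard weighted least-squares fit mentioned in the abstract on slide 16 can be written in a few lines. The sketch below is an illustration under exactly the assumptions the abstract names (negligible uncertainty in x, Gaussian uncertainties of known variance in y); it uses made-up synthetic data rather than the data set plotted on the slide, and the function name `weighted_line_fit` is hypothetical.

```python
import numpy as np

def weighted_line_fit(x, y, sigma_y):
    """Weighted least-squares fit of y = m*x + b.

    Returns the slope m, the intercept b, and the 2x2 covariance
    matrix of the parameters, ordered as (b, m).
    """
    A = np.column_stack([np.ones_like(x), x])    # design matrix, columns [1, x]
    w = 1.0 / sigma_y**2                         # inverse-variance weights
    cov = np.linalg.inv(A.T @ (A * w[:, None]))  # (A^T C^-1 A)^-1
    b, m = cov @ (A.T @ (w * y))                 # best-fit intercept and slope
    return m, b, cov

# Synthetic data (assumed values): a line with slope 2 and intercept 30,
# with a different known Gaussian uncertainty on each y value.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 300.0, 20)
sigma_y = rng.uniform(10.0, 40.0, x.size)
y = 2.0 * x + 30.0 + rng.normal(0.0, sigma_y)

m, b, cov = weighted_line_fit(x, y, sigma_y)
print(f"m = {m:.3f} +/- {np.sqrt(cov[1, 1]):.3f}, b = {b:.1f} +/- {np.sqrt(cov[0, 0]):.1f}")
```

The reported uncertainties are only meaningful when those assumptions hold. Outliers, uncertainties in x, or unknown variances call for a different model, which is the cautionary point of the slide.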
  • 17. Versioning & Reproducibility. Data Literacy in the Undergraduate & Graduate Training Mission.
     “Recently, the scientific community was shaken by reports that a troubling proportion of peer-reviewed preclinical studies are not reproducible.” McNutt, 2014 http://www.sciencemag.org/content/343/6168/229.summary
     - Git has emerged as the de facto versioning tool
     - Berkeley Common Environment (BCE) Software Stack
     - “Reproducible and Collaborative Statistical Data Science” (Statistics 157: P. Stark)
  • 18. Undergraduate Curriculum
     • Data Science Task Force report generated at Chancellor/Provost behest
     • Recommendations:
       – A foundational lower-division course accessible to everyone (statistical & computational thinking), with broad applicability
       – A suite of courses that advance the use of data sciences within the broad swath of disciplines available to Berkeley undergraduates
       – A high-quality minor for students who wish to couple data sciences with their major discipline
       – A data sciences major for those students who want this to be their primary area of study
  • 19. pg. 12; Announced Feb 11 Chancellor’s Report for 2015
  • 20. Data Sciences for the 21st Century (DS421): Environment and Society, via Prof. David Ackerly (Integrative Biology)
     Grand challenges at the intersection of:
     • Open science and reproducible data analysis
     • Responses of coupled human-natural systems to rapid environmental change (water resources, land use planning, agriculture, etc.)
     • Development of evidence-based solutions in public policy, resource management and environmental design
     Program:
     • Five schools and colleges: Letters & Science, Natural Resources, Engineering, Public Policy and Environmental Design
     • Two year graduate training program; Master’s and PhD students
     • Four cohorts of up to 20 students each, first cohort in fall 2015
  • 21. Data Sciences for the 21st Century (DS421): Environment and Society via Prof. David Ackerly (Integrative Biology)
  • 22. “Bold new partnership launches to harness potential of data scientists and big data” http://bitly.com/bundles/fperezorg/1
     Founded in December 2013 as a result of a year+ long national selection process. $37.8M over 5 years, along with University of Washington & NYU.
     ‣ An accelerator for data-driven discovery
     ‣ An agent of change in the modern university as Data Science takes hold
     ‣ An incubator for the next generation of Data Science technology and practice
  • 23. Leadership from across the spectrum
     Joshua Bloom, Professor, Astronomy; Director, Center for Time Domain Informatics
     Henry Brady, Dean, Goldman School of Public Policy
     Cathryn Carson, Associate Dean, Social Sciences; Acting Director of Social Sciences Data Laboratory “D-Lab”
     David Culler, Chair, EECS
     Michael Franklin, Professor, EECS; Co-Director, AMP Lab
     Erik Mitchell, Associate University Librarian
     Faculty Lead/PI: Saul Perlmutter, Physics, Berkeley Center for Cosmological Physics
     Fernando Perez, Researcher, Henry H. Wheeler Jr. Brain Imaging Center
     Jasjeet Sekhon, Professor, Political Science and Statistics; Center for Causal Inference and Program Evaluation
     Jamie Sethian, Professor, Mathematics
     Kimmen Sjölander, Professor, Bioengineering, Plant and Microbial Biology
     Philip Stark, Chair, Statistics
     Ion Stoica, Professor, EECS; Co-Director, AMP Lab
  • 24. “I love working with astronomers, since their data is worthless.” - Jim Gray, Microsoft
  • 25. Created by Natalia Bilenko. Data source: PubMed Central. http://sciencereview.berkeley.edu/bsr_design/Issue26/datascience/cover/
  • 26. Established CS/Stats/Math in service of novelty in domain science vs. novelty in domain science driving & informing novelty in CS/Stats/Math: the “novelty² problem,” an extra burden for forefront scientists. https://medium.com/tech-talk/dd88857f662
  • 27. 21st Century Educational Mission: Produce Gamma-shaped people. π vs. Γ. π: deep domain skill/knowledge/training and deep methodological knowledge/skill. Γ: deep domain or methodological skill/knowledge/training, with strong methodological or domain knowledge/skill. Goal: empower teams of gammas to excel
  • 28. Summary
     Data science at Berkeley is thriving and BIDS is its intellectual home
       - incubating novel science & methodologies
       - teaching & training
       - innovate environments, interactions, & networks
     Teaching imperatives emerging around data literacy
       - new campus-wide undergraduate curriculum (starting with a foundation for all 1st years)
       - new multi-year, multi-disciplinary graduate program