Deep learning is finding applications in science, such as predicting material properties. DLHub is being developed to facilitate sharing of deep learning models, data, and code for science. It will collect, publish, serve, and enable retraining of models on new data. This will help address challenges of applying deep learning to science, such as finding relevant models and data and integrating models into workflows. The goal is to deliver deep learning capabilities to tens of thousands of scientists through software for managing data, models, and workflows.
1. Learning Systems for Science
Ian Foster
Argonne National Laboratory and The University of Chicago
foster@anl.gov
Joint work with Rachana Ananthakrishnan, Ben Blaiszik, Kyle Chard, Ryan Chard,
Mike Papka, Jim Pruyne, Steve Tuecke, Rick Wagner, Logan Ward, and others
2. “Whatever you are studying right now if you are not getting up to speed on deep learning, neural networks, etc., you lose. We are going through the process where software will automate software, automation will automate automation.”
-- Mark Cuban
4. Deep learning is also finding applications in science.
Example: Predicting formation enthalpies of crystalline materials
Given the formation enthalpy (ΔHf) of known compounds, e.g., Cr2Ni3 and Al2O3, predict ΔHf for a new compound, e.g., TiO2
Best conventional machine learning method, Random Forest:
a) Only elemental compositions
(DFT-computed OQMD data)
Logan Ward et al., Phys Rev B, 2017
5. Best conventional machine learning method, Random Forest:
a) Only elemental compositions
b) Also physical attributes
(DFT-computed OQMD data)
Compute 145 physical properties:
• Stoichiometric attributes
• Elemental property statistics
• Electronic structure attributes
• Ionic compound attributes
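The physical attributes on this slide are all derived from the composition alone. As an illustration only (not the feature code used by Ward et al.), the sketch below parses a simple formula and computes one stoichiometric attribute, the Lp norm of the element-fraction vector, and one elemental property statistic, the fraction-weighted mean and mean absolute deviation of Pauling electronegativity. The five-element `PAULING_EN` table and all function names are assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny property table: Pauling electronegativities.
PAULING_EN = {"Al": 1.61, "O": 3.44, "Ti": 1.54, "Cr": 1.66, "Ni": 1.91}


def parse_formula(formula):
    """Return {element: fraction} for a simple formula such as 'Al2O3' (no parentheses)."""
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}


def stoichiometric_norm(fractions, p):
    """Lp norm of the element-fraction vector (a 'stoichiometric' attribute)."""
    return sum(x ** p for x in fractions.values()) ** (1.0 / p)


def property_stats(fractions, table):
    """Fraction-weighted mean and mean absolute deviation of one elemental property."""
    mean = sum(x * table[el] for el, x in fractions.items())
    mad = sum(x * abs(table[el] - mean) for el, x in fractions.items())
    return mean, mad


def featurize(formula):
    """Small illustrative feature vector; the real set has 145 attributes."""
    fr = parse_formula(formula)
    en_mean, en_mad = property_stats(fr, PAULING_EN)
    return {
        "L2_norm": stoichiometric_norm(fr, 2),
        "L3_norm": stoichiometric_norm(fr, 3),
        "en_mean": en_mean,
        "en_mad": en_mad,
    }


for f in ["Cr2Ni3", "Al2O3", "TiO2"]:
    print(f, {k: round(v, 3) for k, v in featurize(f).items()})
```

Features like these would then be fed, together with the known ΔHf values, to a regressor such as a random forest.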
6. Best conventional machine learning method, Random Forest:
a) Only elemental compositions
b) Also physical attributes
ElemNet (Dipendra Jha), 17-layer DNN:
Only elemental compositions
(Also runs 100x faster than RF.)
(DFT-computed OQMD data)
Jha, Ward, et al., 2018.
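The point of ElemNet is that the network sees only element fractions, with no hand-built attributes. A minimal sketch of that input pipeline, assuming a plain stack of fully connected ReLU layers: the short `ELEMENTS` list, the layer widths, and the random, untrained weights are illustrative assumptions, whereas the real model uses its own element list, 17 layers, and weights learned from OQMD data.

```python
import random
import re

# Hypothetical element ordering; a real model covers far more of the periodic table.
ELEMENTS = ["O", "Al", "Ti", "Cr", "Ni"]


def encode(formula):
    """Fixed-length vector of element fractions: the network's only input."""
    counts = dict.fromkeys(ELEMENTS, 0.0)
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return [counts[el] / total for el in ELEMENTS]


def dense(x, weights, biases, relu=True):
    """One fully connected layer: y_j = sum_i x_i * W[i][j] + b_j, optionally ReLU."""
    out = [sum(xi * row[j] for xi, row in zip(x, weights)) + biases[j] for j in range(len(biases))]
    return [max(0.0, v) for v in out] if relu else out


def init_mlp(sizes, seed=0):
    """Random (untrained) weights for layers of the given widths."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0.0, 0.5) for _ in range(n_out)] for _ in range(n_in)], [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def predict(formula, layers):
    h = encode(formula)
    for i, (w, b) in enumerate(layers):
        h = dense(h, w, b, relu=(i < len(layers) - 1))  # linear output layer
    return h[0]  # single regression output, e.g. a formation-enthalpy estimate


layers = init_mlp([len(ELEMENTS), 16, 16, 8, 1])  # illustrative widths, not ElemNet's
print(predict("TiO2", layers))
```

Because the weights are untrained, the printed number is meaningless; the sketch only shows the shape of the computation.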
7. Many other interesting applications are emerging
Deep learning
• Drug response prediction
• Scientific image classification
• Scientific text understanding
• Materials property design
• Gravitational lens detection
• Feature detection in 3D
• Street scene analysis
• Organism design
• State space prediction
• Persistent learning
• Hyperspectral patterns
Simulation
• Materials science
• Cosmology
• Molecular dynamics
• Nuclear reactor modeling
• Combustion
• Quantum computer simulation
• Climate modeling
• Power grid
• Discrete event simulation
• Fusion reactor simulation
• Brain simulation
• Transportation networks
Big data
• APS data analysis
• HEP data analysis
• LSST data analysis
• SKA data analysis
• Metagenome analysis
• Battery design search
• Graph analysis
• Virtual compound library
• Neuroscience data analysis
• Genome pipelines
Rick Stevens: Argonne applications for exascale
9. We face many research challenges
• Applications: AI applications across science and engineering. New approaches to simulation and experimental science.
• Learning systems: AI software. Software infrastructure for managing data, models, workflows, etc., and for delivering AI capabilities to 10,000s of scientists and engineers.
• Foundations: Mathematics, algorithms; general AI, reinforcement learning, uncertainty quantification, explainability, etc.
• Hardware: Advanced hardware to support AI. Evaluation of new architectures and systems. Neuromorphic and quantum as long-term AI accelerators?
10. DeepAI
We need a lot more computing
Exaflop/s-days used to train:
• AlexNet (2012): 0.000007
• AlphaGo Zero (2017): 2
An increase of x300,000 in 5.5 years
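The x300,000 figure follows directly from the two numbers on the slide. The short calculation below restates that arithmetic and, as an extra derived quantity not stated on the slide, the implied doubling time of training compute.

```python
import math

# Training compute in exaflop/s-days, as quoted on the slide.
alexnet_2012 = 0.000007
alphago_zero_2017 = 2.0
years = 5.5

growth = alphago_zero_2017 / alexnet_2012      # ~2.9e5, i.e. roughly x300,000
doublings = math.log2(growth)                  # ~18.1 doublings
doubling_time_months = years * 12 / doublings  # ~3.6 months per doubling

print(f"growth factor: {growth:,.0f}x")
print(f"doublings: {doublings:.1f}")
print(f"implied doubling time: {doubling_time_months:.1f} months")
```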
11. Opportunities for science automation: Research today
What scientists want to do: create knowledge, solve societal problems
Where most scientist time goes: configure apparatus/write code, run experiments, analyze and plan
13. Example: Accelerated discovery of metallic glasses
Metallic glasses offer unique properties, but discovering new, useful alloys is slow
• ML model predicts glass formation
• Validate with automated experimentation
• Active learning to optimize experiments
Ren et al., Sci. Adv. (2018) eaaq1566
14. Random forest to predict metallic glass formation
Batch active learning to choose experiments
Discovery of new ternary glass systems
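The loop on slides 13 and 14 has a model score candidate compositions and sends the most informative ones to the automated experiment. Below is a minimal sketch of uncertainty-based batch selection with a bagged ensemble. To stay dependency-free it swaps in bootstrapped k-nearest-neighbour classifiers for the random forest used by Ren et al.; the ternary grid, the made-up labels, the batch size, and all function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_glass_prob(train, x, k=3):
    """Fraction of the k nearest labelled compositions that formed a glass (label 1).
    Stand-in for the random forest used in the original work."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], x))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def bootstrap_ensemble(labelled, n_models=15, seed=0):
    """Bagging: each ensemble member is a resample of the data drawn with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(labelled) for _ in labelled] for _ in range(n_models)]


def choose_batch(labelled, candidates, batch_size=3):
    """Pick the candidates on which the ensemble disagrees most (highest variance)."""
    ensemble = bootstrap_ensemble(labelled)

    def disagreement(x):
        preds = [knn_glass_prob(member, x) for member in ensemble]
        mean = sum(preds) / len(preds)
        return sum((p - mean) ** 2 for p in preds) / len(preds)

    return sorted(candidates, key=disagreement, reverse=True)[:batch_size]


# Ternary compositions (x_A, x_B, x_C) on a coarse grid; the labels are invented.
grid = [(a / 10, b / 10, (10 - a - b) / 10) for a in range(11) for b in range(11 - a)]
labelled = [((0.6, 0.2, 0.2), 1), ((0.5, 0.3, 0.2), 1), ((0.2, 0.2, 0.6), 0),
            ((0.1, 0.8, 0.1), 0), ((0.4, 0.4, 0.2), 1), ((0.3, 0.1, 0.6), 0)]
known = [x for x, _ in labelled]
candidates = [c for c in grid if c not in known]
batch = choose_batch(labelled, candidates)
print(batch)  # next compositions to synthesize and measure
```

Each round, the measured batch would be added to `labelled` and the selection repeated; choosing a batch rather than a single point matches high-throughput experiments that test many compositions at once.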
15. Imagine when only the fun parts of science remain
https://twitter.com/worrydream/status/992546529217933312
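16. Developing a DL model remains an artisanal process
Figure: model development workflow. Training data and human expertise drive model selection, which yields a model architecture; model training on training data yields a trained model, which is then used for inference to answer a question (Q) with an answer (A).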
17. Many challenges. For example …
• Finding relevant models and methods (1000s of papers per year)
• Finding relevant data for training and validation
• Implementing, training, testing, and validating models
• Configuring and adapting models
• Scaling, accelerating, and optimizing models
• Leveraging new architectures
• Integrating models into scientific work processes
• Documenting, sharing, and explaining results
• Integrating and applying advanced methods: UQ, active learning,
reinforcement learning, …
• Engaging and educating the non-expert 99.99%
18. New “learning systems for science”
Learning systems: AI software. Software infrastructure for managing data, models, workflows, etc., and for delivering AI capabilities to 10,000s of scientists and engineers.
“Without deep understanding of the basic tools needed to build and train new algorithms … researchers creating AIs resort to hearsay, like medieval alchemists. People gravitate around cargo-cult practices, relying on folklore and magic spells.”
– Science, May 3 2018
19. Organizing relevant data: Materials Data Facility
Data publication: mint DOIs, associate metadata, persist datasets (distributed data storage)
Data discovery: query, browse, aggregate (across databases, datasets, APIs, LIMS, etc.)
materialsdatafacility.org
Ben Blaiszik, Logan Ward, Jonathan Gaff, and others
20. DLHub: A data and learning hub for science
• Collect, publish, and categorize models/code/weights/data from many sources
• Serve models via API to foster sharing, consumption, and access to data, training sets, and models
• Automate training of models (using HPC as needed) as new data become available
• Enable new science through reuse and synthesis of existing models
Figure: DLHub components (Collect, Serve, Train)
Ben Blaiszik, Ryan Chard, Logan Ward, and others
21. DLHub: Collect, serve, train community models
Say you want to use a deep neural network for online identification of problems when running diffraction experiments (e.g., tagging images as “beam misaligned”, “…”)
23. ▪ Where are the model and trained weights?
▪ How do I run the model on my data?
▪ Should I run the model on my data?
▪ How can I retrain the model on new data?
https://doi.org/10.1109/NYSDS.2017.8085045
24. DLHub
model/xray/batch_predict → [“beam off image”, …]
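The request on slide 24 boils down to "look up a published model by name and run it on a batch of inputs". The sketch below illustrates that idea with a hypothetical in-process registry; it is not the DLHub API. The placeholder classifier, its darkness threshold, the "ok" label, and the representation of each image as a flat list of intensities are all assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical in-process stand-in for a model-serving API (not the DLHub interface).
Registry = Dict[str, Callable[[List[List[float]]], List[str]]]


def xray_batch_predict(images):
    """Placeholder classifier: tags an image 'beam off image' if it is almost dark."""
    labels = []
    for img in images:
        mean_intensity = sum(img) / len(img)
        labels.append("beam off image" if mean_intensity < 0.05 else "ok")
    return labels


REGISTRY: Registry = {"model/xray/batch_predict": xray_batch_predict}


def serve(name, inputs, registry=REGISTRY):
    """Look up a published model by name and run it on a batch of inputs."""
    if name not in registry:
        raise KeyError(f"no model registered as {name!r}")
    return registry[name](inputs)


batch = [[0.0, 0.01, 0.02, 0.0], [0.4, 0.9, 0.7, 0.3]]
print(serve("model/xray/batch_predict", batch))  # ['beam off image', 'ok']
```

Putting the model behind a name is what lets a beamline script call it without knowing where the weights live or how the model was built, which addresses the first two questions on slide 23.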
26. DLHub: 1) Register a model
Figure: model registration workflow in DLHub's Collect stage (data, train, model, send to DLHub, model/transform containers, register, receive DOI)
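The registration step on slide 26 (send a trained model to DLHub, register it, receive a DOI) can be illustrated with a toy registry. Everything here, including `ModelRecord`, `ModelHub`, and the `name@vN` identifier that stands in for a DOI, is a hypothetical sketch rather than the DLHub SDK. It shows the general idea of storing provenance metadata plus a content hash of the weights, so that later users can tell exactly which model version they are running.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ModelRecord:
    """Hypothetical metadata captured when a model is registered."""
    name: str
    authors: List[str]
    training_data: str  # provenance of the training set
    input_type: str
    output_type: str
    weights_sha256: str = ""
    version: int = 1


@dataclass
class ModelHub:
    """Toy registry keyed by model name; re-registering a name bumps its version."""
    records: Dict[str, ModelRecord] = field(default_factory=dict)

    def register(self, record: ModelRecord, weights: bytes) -> str:
        record.weights_sha256 = hashlib.sha256(weights).hexdigest()
        previous = self.records.get(record.name)
        if previous is not None:
            record.version = previous.version + 1
        self.records[record.name] = record
        # A real service would mint a persistent identifier (e.g. a DOI) here.
        return f"{record.name}@v{record.version}"


hub = ModelHub()
rec = ModelRecord(
    name="xray-image-tagger",
    authors=["A. Researcher"],  # hypothetical
    training_data="labelled diffraction images",
    input_type="detector image",
    output_type="list of tags",
)
print(hub.register(rec, weights=b"serialized weights go here"))
print(json.dumps(asdict(hub.records["xray-image-tagger"]), indent=2))
```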
28. DLHub: Initial Use Cases
• X-ray diffraction (XRD) image tagging model
• Prediction of bulk metallic glass forming regions in ternary diagrams
• Predicting compound stability and bandgap by elemental composition
Coming Soon
• Deep learning to predict crystalline materials
• ML/DL applied to high-throughput catalyst synthesis, simulation, and characterization
• DL for chemical compound stability prediction
• High-throughput High-Energy Diffraction Microscopy (HEDM) analysis
Collaborators: Wang, Yager, et al.; SLAC, NIST, NU, USC, Citrine; CHiMaD/NU
33. Ben Blaiszik, Steve Tuecke, Kyle Chard, Jim Pruyne, Logan Ward, Rachana Ananthakrishnan, Ryan Chard, Mike Papka, Rick Wagner
I reported on the work of many talented people
Thanks also to:
• Jon Almer, Francesco de Carlo, Hemant Sharma, Brian Toby, Stefan Vogt, Stephen Streiffer,
Nicholas Schwarz, Doga Gursoy, and others, Advanced Photon Source
• Tekin Bicer, Jonathan Gaff, Raj Kettimuthu, Justin Wozniak, and others, Argonne Computing
We thank our sponsors
DLHub, Globus, IMaD, Petrel, Argonne Leadership Computing Facility
34.
“All the impressive achievements of deep learning
amount to just curve fitting.” – Judea Pearl