Applications of Machine Learning for Materials Discovery at NREL
Caleb Phillips, Ph.D.
Data Analysis and Visualization
Computational Sciences Center
National Renewable Energy Laboratory
3. NREL | 3
Skeptics Allowed
“I’ll admit it, there may be something to
this ‘big data’ and ‘machine learning’
thing everyone keeps talking about.”
- Anonymous cynic (2017)
What changed?
• Computational power
• Deep Neural Networks
• Cheap storage, big data
• Increasing adoption/investment
4. NREL | 4
Setting Realistic Expectations
Machine learning &
Deep Learning
Image source: Gartner.com, Aug. 2017
5. NREL | 5
Overview of the talk
Compelling examples of materials-oriented machine learning at
work at NREL:
• Improving the throughput of experimentation (do work faster, with more insight):
• Interpretation: accelerate the data → knowledge path
• Automation: replace onerous manual tasks
• Prediction: predict properties not measured
• Augmenting or replacing DFT simulations in candidate screening (focus work, avoid some altogether):
• Prediction: end-to-end deep learning on molecular and atomistic structures
Will cover applications at a high level – talk to or email me for more info
6. NREL | 6
But first, the Data
Databases spanning experimental data, theoretical data, and both:
• The Materials Project
• http://materials.nrel.gov
• http://organiceletronics.nrel.gov
• http://htem.nrel.gov
7. NREL | 7
Example: Experimental Materials Discovery
Taylor et al. Adv. Funct. Mater. 18, 3169 (2008)
[Figure: conductivity of annealed InZnO as a function of % In]
Goal: Make a PhD Thesis Amount of Analysis a Routine Activity
Composition – Structure – Property – Process
Slide credit: John Perkins
11. NREL | 11
Clustering by structure and composition
Apply machine learning to determine clusters in XRD patterns:
• Extracted data set: ~1000 XRD patterns from measured samples
• Spectral clustering groups the patterns
• NNLS decomposition against calculated XRD patterns
Fast: ~30 seconds on a laptop
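The two-step pipeline on this slide (spectral clustering of measured XRD patterns, then non-negative least squares against calculated reference patterns) might be sketched as follows. All data below is synthetic stand-in data, and the cluster counts and parameters are illustrative assumptions, not the values used at NREL:

```python
# Sketch: cluster XRD patterns, then decompose one pattern against
# calculated reference patterns with non-negative least squares (NNLS).
# Synthetic data; real patterns would come from the HTEM database.
import numpy as np
from scipy.optimize import nnls
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_patterns, n_angles = 100, 500                 # ~1000 patterns in the real data set
patterns = rng.random((n_patterns, n_angles))   # rows: measured XRD intensities

# Spectral clustering on pattern-to-pattern similarity
# (small gamma keeps the RBF similarity graph well connected)
labels = SpectralClustering(n_clusters=4, affinity="rbf", gamma=1e-3,
                            random_state=0).fit_predict(patterns)

# NNLS: express one measured pattern as a non-negative mixture of
# calculated reference patterns (phase fractions must be >= 0)
references = rng.random((n_angles, 5))          # columns: calculated XRD patterns
weights, residual = nnls(references, patterns[0])
print(labels[:10], weights)
```

NNLS is the natural decomposition here because phase fractions cannot be negative; an unconstrained least-squares fit could assign unphysical negative weights.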
12. NREL | 12
Automatic band gap calculation
Goal: replace a highly subjective manual process with something scalable, automated, and (more) accurate.
Combining experimental and theoretical data lets us compare properties across a wide landscape of materials systems and synthesis conditions.
Schwarting et al., Materials Discovery (2018)
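One common automated approach to band-gap extraction is Tauc-plot extrapolation: fit a line to the steep region of (αhν)² versus photon energy and take its x-intercept. The sketch below uses an idealized synthetic direct-gap absorption spectrum and is an assumption-laden illustration of the general idea, not the specific algorithm of Schwarting et al.:

```python
# Sketch of a Tauc-plot band-gap estimate for a direct-gap material.
# Fit a line to the steepest region of (alpha * h*nu)^2 vs photon
# energy and extrapolate to zero. Synthetic spectrum below.
import numpy as np

energy = np.linspace(1.0, 4.0, 300)                       # photon energy, eV
true_gap = 2.5
# Idealized direct-gap absorption so that (alpha*E)^2 is linear above the gap
alpha = np.sqrt(np.clip(energy - true_gap, 0.0, None)) / energy
tauc = (alpha * energy) ** 2

# Fit a line where the Tauc signal rises fastest, extrapolate to tauc = 0
steepest = np.argsort(np.gradient(tauc, energy))[-30:]    # steepest 30 points
m, b = np.polyfit(energy[steepest], tauc[steepest], 1)
band_gap = -b / m                                         # x-intercept of the fit
print(round(band_gap, 2))
```

Real spectra need the extra steps that make the manual version subjective, such as choosing the fitting window robustly in the presence of noise and sub-gap absorption, which is exactly what an automated algorithm must standardize.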
14. NREL | 14
High throughput screening using computational results
Workflow:
• A molecule generator, subject to constraints, proposes all candidates sequentially.
• Each candidate is scored either by a predictive (machine learning) model, trained on past theoretical and experimental results, or by simulation on a supercomputer ($$$).
• Results flow into a database for visualization & analysis, which selects the best candidates.
• Best candidates proceed to materials synthesis ($$$$$), then measurement and validation, yielding new materials.
Phillips et al. CoDA (2016)
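The screening loop on this slide can be sketched as surrogate-assisted filtering: a cheap model trained on past results decides which candidates are worth the expensive simulation. Everything below is synthetic; `expensive_simulation` is a hypothetical stand-in for a costly DFT run, and the top-5% cutoff is an illustrative choice:

```python
# Sketch of surrogate-assisted screening: a cheap model trained on past
# simulation results filters candidates so only the most promising ones
# reach the expensive supercomputer step. All data here is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

def expensive_simulation(x):
    """Hypothetical stand-in for a costly DFT calculation."""
    return x @ np.array([0.5, -1.2, 2.0]) + rng.normal(scale=0.05)

# Train the surrogate on past results
past_X = rng.random((200, 3))
past_y = np.array([expensive_simulation(x) for x in past_X])
surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(past_X, past_y)

# Generate all candidates, but simulate only the surrogate's top 5%
candidates = rng.random((2000, 3))
scores = surrogate.predict(candidates)
best = candidates[np.argsort(scores)[-100:]]        # top 5% by predicted value
validated = np.array([expensive_simulation(x) for x in best])
print(len(best), round(validated.mean(), 2))
```

The economics follow directly: if the surrogate is even moderately accurate, the $$$ simulation budget is spent only on candidates likely to survive, and the validated results feed back into the training set.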
15. NREL | 15
Predict opto-electric properties of molecules
Support Vector Regression (SVR) performance when predicting calculated band gap: residual error is linear and normally distributed, median error is effectively zero, and RMSE is 0.25 eV or less for most scenarios.
First try: learn using molecular descriptors (traditional feature engineering); 2 million candidates.
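The descriptor-based first try might look like the sketch below: hand-engineered features in, SVR out. The descriptors and band gaps are synthetic stand-ins for real computed values, and the kernel/regularization settings are illustrative assumptions:

```python
# Sketch: support vector regression on hand-engineered molecular
# descriptors, predicting a band-gap-like target. Synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(2)
descriptors = rng.random((500, 8))   # e.g. atom counts, sizes, electronegativities
band_gap = descriptors @ rng.random(8) + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(descriptors, band_gap, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_tr, y_tr)

residuals = y_te - model.predict(X_te)
rmse = np.sqrt(np.mean(residuals ** 2))
print(round(rmse, 3))
```

Checking that residuals are centered near zero and roughly normal, as the slide describes for the real model, is the standard sanity check before trusting such a model for screening.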
16. NREL | 16
End-to-end Learning: Skip the feature extraction
Image recognition: Convolutional Neural Networks (CNNs).
Molecular graphs: Message Passing Neural Networks (MPNNs), Gilmer et al., CoRR (2017).
Architecture: input graph (molecule) → node embedding layer → message passing blocks (node recurrent units) → graph output layer(s) → dense regression layers → predictions.
Key hypothesis: the model can learn which features are important directly from structure.
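A single message-passing step is simple to write down. The numpy sketch below shows one untrained block on a toy 5-atom chain; real MPNNs (per Gilmer et al.) stack several blocks, use learned edge-conditioned message functions and GRU updates rather than the plain tanh assumed here, and end with graph-level readout layers:

```python
# Minimal sketch of one message-passing step on a molecular graph.
# Each atom (node) aggregates its neighbors' hidden states through a
# message function, then updates its own state.
import numpy as np

rng = np.random.default_rng(3)
n_atoms, hidden = 5, 4
adjacency = np.array([[0, 1, 0, 0, 0],   # toy 5-atom chain molecule
                      [1, 0, 1, 0, 0],
                      [0, 1, 0, 1, 0],
                      [0, 0, 1, 0, 1],
                      [0, 0, 0, 1, 0]], dtype=float)
h = rng.random((n_atoms, hidden))        # initial node embeddings

W_msg = rng.random((hidden, hidden))     # message function weights (untrained)
W_upd = rng.random((hidden, hidden))     # update function weights (untrained)

messages = adjacency @ (h @ W_msg)       # sum messages from bonded neighbors
h_new = np.tanh(h @ W_upd + messages)    # real MPNNs use a GRU update here

graph_embedding = h_new.sum(axis=0)      # simple readout: sum over atoms
print(graph_embedding.shape)
```

Because the update only mixes information along bonds, stacking k blocks lets each atom see its k-hop chemical environment, which is how the model can discover relevant features directly from structure.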
17. NREL | 17
End-to-end Learning: Skip the feature extraction
[Figure: parity plot of machine learning prediction vs. a duplicate DFT calculation]
3-5x improvement over manually engineered features.
Accuracy approaching the repeated-measures accuracy of DFT.
Ratio of MAE(machine learning) to MAE(DFT duplicates):
Gap: 0.90
HOMO: 1.05
LUMO: 0.89
Spectral overlap: 1.28
Polymer HOMO: 1.24
Polymer LUMO: 1.03
Polymer gap: 1.19
Polymer optical LUMO: 1.02
St. John et al. https://arxiv.org/abs/1807.10363. (2018)
18. NREL | 18
Transfer learning and training set size
St. John et al. https://arxiv.org/abs/1807.10363. (2018)
19. NREL | 19
End-to-end learning for crystalline materials
Represent crystal structure as a graph
to allow end-to-end learning.
Kamdar. 2018. NREL/US DOE CSGF.
20. NREL | 20
Thanks to Many Collaborators
(and many funding sources)
Theory
Stephan Lany
Vladan Stevanovic
Aaron Holder
@ LBNL
Gerd Ceder
Kristin Persson
Data
Robert White
Kristin Munch
Peter Graf
@ NIST
Zachary Trautt
Robert Hanisch
Experiment
Andriy Zakutayev
John Perkins
Philip Parilla
David Ginley
Bill Tumas
Sebastian Siol
Lauren Garten
Elisabetta Arca
Matthew Taylor
@ NIST
Martin Green
Jae Hattrick-Simpers
Nam Nguyen
@ SLAC
Apurva Mehta
@ ANL
Debbie Myers
AI/ML
Jacob Hinkle
Marcus Schwarting
Peter St. John
@ Harvard
Harshil Kamdar Slide credit: John Perkins
21. NREL | 21
Selected Publications
Peter C. St. John, Caleb Phillips, Travis W. Kemper, A. Nolan Wilson,
Michael F. Crowley, Mark R. Nimlos, Ross E. Larsen.
Message-passing neural networks for high-throughput polymer screening.
In submission. ArXiv preprint: https://arxiv.org/abs/1807.10363
Marcus Schwarting, Sebastian Siol, Kevin Talley, Andriy Zakutayev, Caleb Phillips.
Automated algorithms for band gap analysis from optical absorption spectra.
Materials Discovery, April 18, 2018. https://doi.org/10.1016/j.md.2018.04.003
Andriy Zakutayev, Nick Wunder, Marcus Schwarting, John Perkins, Robert White,
Kristin Munch, William Tumas, and Caleb Phillips.
An open experimental database for exploring inorganic materials.
Nature. Scientific Data. April 3, 2018. https://www.nature.com/articles/sdata201853
Caleb Phillips, Ross E. Larsen, Kristin Munch, Nikos Kopidakis.
Guided Search for Organic Photovoltaic Materials Using Predictive Data Modeling.
Conference on Data Analysis (CoDA) 2016. March 2-4, 2016. Santa Fe, New Mexico.
22. www.nrel.gov
Thank you
This work was authored by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy,
LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by U.S.
Department of Energy Office of Energy Efficiency and Renewable Energy. The views expressed in the article do not
necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher,
by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up,
irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for
U.S. Government purposes.
caleb.phillips@nrel.gov
23. NREL | 23
Save work: predict not-measured properties
• Electrical conductivity prediction using random forest model
• Training variables: chemical composition, XRD peak count, deposition conditions
• Training process: 10-fold cross-validation, withholding 25% of sample libraries in each fold
• Training set: 16K data points varying by 9-10 orders of magnitude
[Figure: predicted vs. measured conductivity]
Prediction accuracy of 1-2 orders of magnitude, which is reasonable for semiconductors.
Zakutayev et al. Scientific Data 5 180053 (2018)
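The bullets above translate into a short pipeline: composition, XRD peak count, and deposition features go into a random forest, validated by withholding whole sample libraries (here via grouped cross-validation with 4 splits, so ~25% of libraries are held out per fold). All features, targets, and library IDs below are synthetic:

```python
# Sketch of the conductivity model described above: a random forest on
# composition / XRD / deposition features predicting log10(conductivity),
# validated by withholding whole sample libraries. Synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(4)
n = 1000
composition = rng.random((n, 3))             # chemical composition fractions
peak_count = rng.integers(1, 20, n)          # XRD peak count
deposition = rng.random((n, 2))              # deposition conditions (T, pressure)
features = np.column_stack([composition, peak_count, deposition])

# Synthetic target spanning several orders of magnitude in conductivity
log_conductivity = (4 * composition[:, 0] - 3 * composition[:, 1]
                    + 0.3 * peak_count + rng.normal(scale=0.5, size=n))

library_id = rng.integers(0, 40, n)          # which sample library each point is from
model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, features, log_conductivity,
                         groups=library_id, cv=GroupKFold(n_splits=4),
                         scoring="neg_mean_absolute_error")
mae = -scores.mean()
print(round(mae, 2))
```

Grouped cross-validation matters here: points from the same library share deposition conditions, so random point-level splits would leak information and overstate accuracy.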
24. NREL | 24
What’s in my database?
A t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction model can group 70K samples based on the similarity of their chemical compositions.
Zakutayev et al. Scientific Data 5 180053 (2018)
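A minimal version of this embedding is just t-SNE applied to composition vectors, so chemically similar samples land near each other in 2-D. The compositions below are synthetic Dirichlet draws (element fractions summing to 1) rather than the real ~70K HTEM samples, and the element count and perplexity are illustrative assumptions:

```python
# Sketch: embed sample composition vectors with t-SNE so that samples
# with similar chemistry land near each other in 2-D. Synthetic data.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(5)
# 300 samples, each a vector of 6 element fractions summing to 1
compositions = rng.dirichlet(np.ones(6), size=300)

embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(compositions)
print(embedding.shape)
```

The 2-D coordinates carry no physical units; only the neighborhood structure (which samples cluster together) is meaningful, which is exactly the "what's in my database?" question the slide asks.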