Presentation on machine learning and materials science at Computing in Engineering Forum 2018, Machine Ground Interaction Consortium (MaGIC) 2018, Wisconsin, Madison, December 4, 2018
1. Some Uses of Machine Learning in Materials Science
Dane Morgan
(University of Wisconsin – Madison, WI USA)
Computing in Engineering Forum 2018
Machine Ground Interaction Consortium
(MaGIC) 2018
Wisconsin, Madison
December 4, 2018
2. Collaborator Acknowledgements
Benjamin Afflerbach, Ryan Jacobs, Wei Li, Haijin Lu,
Tam Mayeshiba, Mingren Shen, Henry Wu
Cloris Feng, Nicholas Lawrence, Ruiqi Yin
(University of Wisconsin – Madison, WI USA)
Kevin Field
(Oak Ridge National Laboratory, TN USA)
3. Funding Acknowledgements
• Diffusion work
– Software Infrastructure for Sustained Innovation (SI2) award
No. 1148011.
– UW Center for High Throughput Computing (CHTC), XSEDE.
– China Scholarship Council.
• TEM Image work
– Department of Energy (DOE) Office of Nuclear Energy,
Advanced Fuel Campaign of the Nuclear Technology Research
and Development program (formerly the Fuel Cycle R&D
program).
– Oak Ridge National Laboratory’s High Flux Isotope Reactor user
facility, sponsored by the Scientific User Facilities Division,
Office of Basic Energy Sciences, DOE.
– Software Infrastructure for Sustained Innovation (SI2) award
No. 1148011.
4. Machine Learning Applications in MS&E
• Image processing tools for characterization data using both unsupervised and supervised methods (including microstructural analysis) (e.g., octahedral tilts and defects in electron microscopy, X-ray structural analysis)
• Property database development (e.g.,
diffusion coefficients, thermoelectrics, battery
electrolytes, amorphous alloys)
• Materials design (e.g., phosphors, polymer
dielectrics, piezoelectrics, and superconductors)
• Text mining of published papers (e.g., for
synthesis guidance)
• Accelerated modeling:
– Novel interatomic potentials (e.g., for complex alloy
surfaces, acceleration of ab initio molecular
dynamics),
– Improving ab initio functionals (e.g., corrections for
highly correlated systems)
– Fitting complex simulations (e.g., DFT, neutronics
simulations)
• Autonomous experiments (e.g., carbon nanotube synthesis)
Clustering ("by-hand" machine learning): Ashby maps
Kernel regression fitting: http://diffusiondata.materialshub.org/
• See the correlations tab; select Pb with GKRR and DFT.
6. Outline
Property Prediction: Solute Diffusion
Image Analysis: Defect Detection in TEM
Accelerating Simulation
Wu, et al., Scientific Data '16; H. Wu, et al., Comp. Mat. Sci. '17; H. Lu, et al., submitted '18
https://doi.org/10.6084/m9.figshare.1546772.v8
http://diffusiondata.materialshub.org/
7. What is Solute Diffusion and Why Does it Matter?
• The way an element X (solute) moves in a host M is governed by its diffusion coefficient.
• Diffusion controls many processes, from semiconductor performance to battery charging rates to nuclear steel degradation.
• Diffusion of X in M is often unknown or poorly known (10% of values for basic metals).
• $D = D_0 \exp(-E_A / k_B T)$; the key property is $E_A$.
• Can be calculated by ab initio methods (~10k CPU hours per M-X system). We will use a computed database.
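A minimal sketch of the Arrhenius relation on this slide, showing why the activation energy is the key property. The prefactor, barrier, and temperature below are hypothetical illustration values, not data from the talk.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def diffusion_coefficient(d0_m2_s, e_a_ev, temperature_k):
    """Arrhenius law: D = D0 * exp(-E_A / (k_B * T))."""
    return d0_m2_s * math.exp(-e_a_ev / (K_B_EV_PER_K * temperature_k))


# Hypothetical parameters chosen only for illustration.
d0 = 1.0e-4          # prefactor D0 in m^2/s
e_a = 2.5            # activation energy E_A in eV
temperature = 800.0  # temperature in K

d_reference = diffusion_coefficient(d0, e_a, temperature)
d_lower_barrier = diffusion_coefficient(d0, e_a - 0.1, temperature)

# Ratio of D when the barrier is lowered by 0.1 eV at fixed temperature.
ratio = d_lower_barrier / d_reference
print(f"D changes by a factor of {ratio:.2f} for a 0.1 eV change in E_A")
```

Under these assumed values, a 0.1 eV shift in the barrier changes D by roughly a factor of four, so modest errors in the activation energy dominate the uncertainty in D.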
Mamivand, et al. Submitted ‘18
[Figure: Cu in Fe. D (m^2/s), from 1E-30 to 1E-16, versus 1000/T (1/K), from 0.9 to 1.9. Data sets: Rothman (Tra), Lazarev (Tra), Salje (Chem), Anand (Tra), Toyama (Chem), Deschamps, Le, This Work, Marian (MD), Messina (DFT). Annotation: ×10^6 range.]
• Considering just FCC, HCP, BCC hosts and metallic elements: ≈40 hosts, ≈50 impurities => ≈6000 systems, ≈60M core-hours.
• Present coverage is ≈7%.
• How can we quickly (and cheaply) get to 100% coverage? Try machine learning!
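The cost estimate above can be checked with a few lines of arithmetic. This sketch uses only the approximate figures quoted on the slides (≈6000 systems, ~10k CPU hours per system, ≈7% coverage); the remaining-cost number is derived from them and is not stated in the talk.

```python
# Approximate figures taken from the slides.
n_systems = 6000                # host-solute (M-X) systems
cpu_hours_per_system = 10_000   # ~10k CPU hours per ab initio calculation
coverage = 0.07                 # fraction of systems already computed

total_hours = n_systems * cpu_hours_per_system

# Derived quantities (not quoted in the talk).
remaining_systems = round(n_systems * (1 - coverage))
remaining_hours = remaining_systems * cpu_hours_per_system

print(f"total cost:     {total_hours / 1e6:.1f}M core-hours")
print(f"remaining cost: {remaining_hours / 1e6:.1f}M core-hours")
```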
8. Machine Learning Approach
[1] Y. Zeng and K. Bai, Journal of Alloys and Compounds 624, p. 201-209 (2015).
[2] L. Ward, et al., Comp. Mat. '16; datasets expanded from L. Ward and C. Wolverton; https://bitbucket.org/wolverton/magpie, http://oqmd.org/static/analytics/magpie/doc/.
[3] H. Wu, et al., Comp. Mat. Sci. '17; H. Lu, et al., in prep '18.
• Assume activation energy (measured relative to host) = F(host descriptors, impurity descriptors). [1]
• Descriptors = elemental properties such as melting temperature, bulk modulus, electronegativity, … and their ratios, differences, etc. [2]
• F is determined using Gaussian process regression (GPR) with a Gaussian kernel, and also Gaussian kernel ridge regression (GKRR).
• Fit F with calculated data (15 hosts, 440 M-X pairs), test with cross-validation (k-fold and leave-out-host), then predict new M-X pairs. [3]
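The fit-and-validate workflow above can be sketched in plain Python as Gaussian kernel ridge regression with leave-one-host-out cross-validation. Everything here is an assumption for illustration: the two descriptors, the synthetic target function, and the hyperparameters `gamma` and `lam` are made up, and this is not the talk's actual model or data.

```python
import math


def gaussian_kernel(a, b, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_gkrr(X, y, gamma, lam):
    """Return dual weights alpha solving (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def predict_gkrr(X_train, alpha, x, gamma):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * gaussian_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))


def leave_one_host_out_rmse(hosts, X, y, gamma, lam):
    """Hold out all pairs of one host at a time, fit on the rest, pool errors."""
    sq_errors = []
    for held_out in sorted(set(hosts)):
        train = [i for i, h in enumerate(hosts) if h != held_out]
        test = [i for i, h in enumerate(hosts) if h == held_out]
        X_tr = [X[i] for i in train]
        alpha = fit_gkrr(X_tr, [y[i] for i in train], gamma, lam)
        for i in test:
            sq_errors.append((predict_gkrr(X_tr, alpha, X[i], gamma) - y[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Synthetic stand-in data: one made-up descriptor per host and per solute.
host_values = {"H0": 0.0, "H1": 0.25, "H2": 0.5, "H3": 0.75, "H4": 1.0}
solute_values = [i / 7 for i in range(8)]
hosts, X, y = [], [], []
for name, h in host_values.items():
    for s in solute_values:
        hosts.append(name)
        X.append([h, s])
        y.append(0.5 * h + 0.3 * math.sin(3.0 * s))  # hypothetical "barrier"

rmse = leave_one_host_out_rmse(hosts, X, y, gamma=2.0, lam=1e-3)
print(f"leave-one-host-out RMSE: {rmse:.3f} (synthetic units)")
```

Holding out a whole host, rather than random pairs, tests whether the model transfers to a host it has never seen, which is the situation when predicting new M-X pairs.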
[Figure: diffusion barrier (eV) versus solute element (Sc through Bi) for several hosts; labeled panels include Cu, Ni, Pd, and Pt.]
http://diffusiondata.materialshub.org/
10. Model Application: Prediction of New Data
New data on similar systems captures key trends and
appears accurate as far as we can tell.
[Figure: diffusion barrier (eV) versus solute element for Ni and Pt hosts, and solute diffusion barrier (eV) for Pb from GKRR, with solutes extending to Ca, Sr, Ba, K, Rb, and Cs. Legend: original database, new prediction.]
11. Model Application: Prediction of New Data
• Can predict a comprehensive database for almost the whole periodic table.
• Not all points are equally accurate, but error bar estimates give guidance.
Figure removed for public distribution
12. Summary for Property Prediction: Solute Diffusion
• The model gives good predictions even for hosts left out of the fit for validation.
• The model can be used to predict Em relative to host for almost the whole periodic table.
• We have reliably extended our diffusion data by ~5x with the machine learning model, saving years and ~$1M.
Figure removed for public distribution
13. Outline
Property Prediction: Solute Diffusion
Image Analysis: Defect Detection in TEM
Accelerating Simulation
W. Li, et al., Automated Defect Analysis in Electron Microscopy Images, NPJ Computational Materials 4, '18
14. Introduction to Defects in Electron Microscopy
• Electron microscopy techniques are widely used to identify defects in materials.
• An important example is for irradiated materials, where radiation produces voids, dislocations, and defect clusters.
• Key challenge is to determine the number density and size distribution of each defect type.
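A minimal sketch of the two quantities named above, computed from a list of detected defect sizes. It assumes a volumetric number density (count divided by imaged area times foil thickness) and a fixed-width histogram; the function names, units, and example numbers are hypothetical and not taken from the talk.

```python
import math


def defect_statistics(diameters_nm, image_area_nm2, foil_thickness_nm):
    """Return (number density in 1/nm^3, mean diameter, std of diameter)."""
    n = len(diameters_nm)
    volume_nm3 = image_area_nm2 * foil_thickness_nm  # assumed imaged volume
    density = n / volume_nm3
    mean_d = sum(diameters_nm) / n
    variance = sum((d - mean_d) ** 2 for d in diameters_nm) / n
    return density, mean_d, math.sqrt(variance)


def size_histogram(diameters_nm, bin_width_nm):
    """Count defects per size bin; keys are the lower edge of each bin."""
    counts = {}
    for d in diameters_nm:
        lower_edge = int(d // bin_width_nm) * bin_width_nm
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return counts


# Hypothetical detections for one defect type in one image.
diameters = [10.0, 20.0, 30.0, 20.0]   # nm
area = 500.0 * 500.0                    # nm^2 field of view
thickness = 100.0                       # nm foil thickness

density, mean_d, std_d = defect_statistics(diameters, area, thickness)
hist = size_histogram(diameters, 10.0)
print(density, mean_d, std_d, hist)
```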
15. Defect Analysis in Irradiated Materials
Number density and size distribution are generally found by human examination. Major issues:
1. Accuracy: humans make errors.
2. Consistency: different people give inconsistent results.
3. Efficiency: it takes time to train new people; human labeling is slow.
4. Scalability: impossible to handle thousands of images rapidly for scaling to new machines, movies, real-time analysis.
Can machine vision tools do better?
17. Data and Assessment
• Focus on just identifying (111) loops.
• Work with 270 training and 28 test images (8424 training loops, 1142 test loops).
• Training data augmentation to 1605 images (39,596 loops).
• No exact ground truth, so we take ground-truth labels from multiple rounds of iterative labeling by two people (Field, Li).
• Comparison group of human labelers (5 experts with > 5 years in the field) who labeled 6 of the 28 test images.
20. Summary: Defect Detection
• Machine vision tools can provide automated defect detection in electron micrographs.
• Accuracy appears to be comparable to human analysis in initial tests. Significant work is needed to generalize to more defects and conditions, extend to more advanced methods, and make a practical tool.
• Future materials image analysis may be much more automated, with humans only reviewing aggregate values and outliers.
21. Outline
Property Prediction: Solute Diffusion
Image Analysis: Defect Detection in TEM
Accelerating Simulation
• M. Yu, et al., Integrated Computational and Experimental Structure Refinement for Nanoparticles, ACS Nano 10, '16
• A. Combs, et al., Fast Approximate STEM Image Simulations from a Machine Learning Model, submitted to Advanced Structural and Chemical Imaging, '18
• Lawrence, et al., in preparation, '18
22. Accelerating Simulations
• Complex simulations are often
– Essentially a Y=F(X) relationship, where F is determined by the simulation.
– Fast enough to evaluate hundreds of times on carefully chosen grids of input parameters.
– Too slow to allow massive parameter search (wide range of X), optimization (max Y over all X), or inversion (what X yields a specified Y).
• Machine learning models can be fit to yield a very fast approximation to F.
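A minimal sketch of the surrogate idea: evaluate the slow F on a small grid, build a cheap approximation, then use the approximation for inversion. For brevity a piecewise-linear interpolant stands in for the machine learning model, and `expensive_simulation` is a made-up cheap function standing in for a real simulation; both are assumptions for illustration.

```python
import bisect
import math


def expensive_simulation(x):
    """Hypothetical stand-in for a slow simulation Y = F(X)."""
    return math.sin(x) + 0.5 * x


def fit_surrogate(xs, ys):
    """Return a fast piecewise-linear approximation to F from grid samples."""
    def surrogate(x):
        # Index of the grid interval containing x, clamped to the grid range.
        i = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
        x0, x1 = xs[i - 1], xs[i]
        t = (x - x0) / (x1 - x0)
        return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return surrogate


# A few dozen "expensive" evaluations on a chosen grid over [0, 3].
grid = [i * 0.1 for i in range(31)]
samples = [expensive_simulation(x) for x in grid]
fast_f = fit_surrogate(grid, samples)

# Inversion: which X yields a specified Y? Search densely with the cheap model.
target_y = 1.5
candidates = [i * 0.001 for i in range(3001)]
best_x = min(candidates, key=lambda x: abs(fast_f(x) - target_y))

print(f"X = {best_x:.3f} gives F(X) = {expensive_simulation(best_x):.4f}")
```

The dense search uses 3001 surrogate evaluations but only 31 calls to the slow function, which is the trade the slide describes.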
23. Accelerating Multislice Simulations
• Electron microscopy images of atoms can be simulated almost exactly using multislice simulations, a tool that enables quantitative image interpretation by matching experiment and simulation.
[Figure panels: Experiment, Simulated, Model]
• Multislice simulations can take weeks on a single CPU. Can we use machine learning to do them faster?
24. "Multifidelity" Machine Learning for Multislice Simulations
• Build a database of convolution-model and multislice-model simulations for a set of atomic structures (e.g., Pt-Co nanoparticles).
• Assume [Y = pixel intensity in multislice] = F[X = pixel intensity in convolution] and fit F.
• Use linear regression, neural networks.
[Figure: convolution simulation (low-fidelity, ~10^-2 s/CPU) mapped by machine learning to multislice simulation (high-fidelity, ~10^6 s/CPU).]
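A minimal sketch of the linear-regression variant of the mapping above, fitting high-fidelity pixel intensity as a linear function of low-fidelity pixel intensity. The pixel values are synthetic, generated from an assumed linear relation plus noise; no real convolution or multislice data is used.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)

# Synthetic stand-ins: X = low-fidelity pixel intensity, Y = high-fidelity one.
low_fidelity = [rng.random() for _ in range(200)]
high_fidelity = [0.8 * x + 0.1 + rng.gauss(0.0, 0.02) for x in low_fidelity]

slope, intercept = fit_linear(low_fidelity, high_fidelity)

# Apply the fitted F to cheap low-fidelity pixels to predict high-fidelity ones.
predicted = [slope * x + intercept for x in low_fidelity]
mean_abs_error = sum(abs(p - y) for p, y in zip(predicted, high_fidelity)) / len(predicted)

print(f"slope={slope:.3f} intercept={intercept:.3f} MAE={mean_abs_error:.4f}")
```

Once F is fitted, each new image needs only the fast low-fidelity simulation plus one multiply-add per pixel.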
25. Fitting the Convolution -> Multislice Mapping (Pt-Co Nanoparticles)
[Figure panels: Convolution, Predicted; linear fit]
Results: For Pt-Co nanoparticle images we can predict multislice pixel intensity to within about 9% error, about 10^6× faster.
27. Thank You for Your Attention!
Any Questions?
Contact: ddmorgan@wisc.edu
Open source machine learning tools for materials science (MAST-ML): https://github.com/uw-cmg/MAST-ML
Undergraduate informatics research teams looking for collaborators: https://skunkworks.engr.wisc.edu/