Materials Informatics Opportunities in Data Science
1. Opportunities in Materials Informatics
Dane Morgan
University of Wisconsin, Madison
ddmorgan@wisc.edu, W: 608-265-5879, C: 608-234-2906
Fisher Barton Technology Center
Watertown, WI
December 7, 2015
2. What is Materials Informatics?
Materials informatics is a field of study that
applies the tools and principles of information
extraction from data (informatics) to materials
science and engineering to better understand
the use, selection, development, and discovery
of materials.
– Mining for materials information in large data sets
– Applying new information technologies to enable
new materials science
3. What Are Materials Informatics Applications?
Related buzzwords: Data science, data analytics, data mining, knowledge
discovery, machine learning, artificial intelligence, deep learning, big data …
• Interpolation/Extrapolation/Correlation of Data – determine controlling
factors, fill in what is missing, optimize
• Design of Experiments – perform experiments in the optimal order to achieve your goal
• Clustering (Feature Extraction) – group like things together, either supervised or unsupervised
• Image Recognition – identify things in pictures and analyze them
• Optimization – find the optimal solution in complex spaces
• Text Mining – extract data from published documents and the web
Associated Infrastructure: Cloud computing, high-performance computing
clusters, high-throughput/combinatorial experiment+computation, …
9. Focus Area: Informatics for Knowledge Discovery in Large Data Sets
Use machine learning techniques to
• Organize your data by putting all relevant,
cleaned input and output into one place
• Understand your data by finding the most
important factors controlling output values
• Expand your data by interpolating and
extrapolating
• Optimize your data by finding correlations
between input and output data to optimize the
desired output
10. Example
I know impurities impact my device lifetime, so …
• Organize: Build a database of all the relevant factors (impurity concentrations, processing conditions, testing conditions, …) and output performance.
• Understand: Determine which impurities matter most and how large their effects are compared with other contributions.
• Expand: Interpolate/extrapolate to other impurity concentrations to assess performance under conditions we have not yet explored.
• Optimize: Determine the impurity concentrations that lead to optimal performance.
12. Example: Predicting Impurity Diffusion in FCC Alloys
UNPUBLISHED DATA – CONFIDENTIAL – DO NOT DISSEMINATE
Calculated activation energies with ab initio methods
[Figure: diffusion barrier (eV) for impurity elements from Sc through Bi, plus several alkali and alkaline-earth elements (Ca, Sr, Ba, K, Rb, Cs), in host panels including Mg, Al, Cu, and Ni.]
13. Example: Predicting Impurity Diffusion in FCC Alloys
• 15 FCC hosts × 100 impurities = 1500 systems, ~15 million core-hours (~$500k to produce, ~2 years).
• We have computed values for ~10%.
• How can we quickly (and cheaply) get to ~100% coverage?
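A quick check of the cost figures above, as a sketch assuming "~15m" means ~15 million core-hours and that cost scales linearly with the number of systems: each host-impurity system then costs about 10,000 core-hours (about $330), so computing the remaining ~90% directly would still cost roughly $450k. The constant names are illustrative only.

```python
# Assumptions: "~15m" means ~15 million core-hours, and cost scales linearly per system.
N_HOSTS = 15
N_IMPURITIES = 100
TOTAL_CORE_HOURS = 15e6   # assumed reading of "~15m core-hours"
TOTAL_COST_USD = 500e3    # "~$500k"
FRACTION_DONE = 0.10      # "~10%" already computed

n_systems = N_HOSTS * N_IMPURITIES                    # 1500 host-impurity pairs
core_hours_per_system = TOTAL_CORE_HOURS / n_systems  # ~10,000 core-hours
cost_per_system = TOTAL_COST_USD / n_systems          # ~$333
remaining = round(n_systems * (1 - FRACTION_DONE))    # systems not yet computed
remaining_cost = remaining * cost_per_system

print(f"{n_systems} systems: ~{core_hours_per_system:,.0f} core-hours and ~${cost_per_system:,.0f} each")
print(f"{remaining} systems left: ~${remaining_cost:,.0f} to finish by direct calculation")
```

This is the motivation for a surrogate model: at these rates, direct calculation of the missing pairs costs almost as much as the whole project.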
[Table: coverage map of host M (Al, Ca, Ni, Cu, Sr, Rh, Pd, Ag, Yb, Ir, Pt, Au, Pb, Ac, Th; atomic numbers 13 to 90) versus impurity X (H, Z = 1, through Pu, Z = 94); some host-impurity pairs are marked N/A.]
14. Materials Informatics Approach – Regression and Prediction
• Assume activation energy = F(elemental properties)
• Elemental properties = melting temperature, bulk modulus, electronegativity, …
• F is determined using one of many possible methods: linear regression, neural network, decision tree, kernel ridge regression, …
• Fit F with calculated data, test it with cross-validation, then predict new data.
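A minimal sketch of the fit / cross-validate / predict loop, using plain linear least squares (one of the methods listed above) and leave-one-out cross-validation. The three descriptor columns stand in for elemental properties such as melting temperature, bulk modulus, and electronegativity, but every number is a synthetic placeholder rather than data from this work, and the function names are hypothetical.

```python
import math
import statistics


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(X, y):
    """Least-squares coefficients [intercept, w1, w2, ...] via the normal equations."""
    A = [[1.0] + list(row) for row in X]  # prepend a column of ones for the intercept
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, x):
    return coef[0] + sum(w * v for w, v in zip(coef[1:], x))


def loo_rms(X, y):
    """Leave-one-out CV: refit without each system in turn, return RMS of held-out errors."""
    errs = []
    for k in range(len(y)):
        coef = fit_linear(X[:k] + X[k + 1:], y[:k] + y[k + 1:])
        errs.append(predict(coef, X[k]) - y[k])
    return math.sqrt(sum(e * e for e in errs) / len(errs))


# Synthetic placeholder data: 3 scaled descriptors per system, target in eV.
X = [[1.0, 0.2, 1.9], [1.4, 0.5, 1.6], [0.8, 0.9, 2.2], [1.9, 0.3, 1.3],
     [1.1, 0.7, 1.8], [1.6, 0.1, 2.0], [0.6, 0.4, 1.5], [1.3, 0.8, 2.4]]
noise = [0.02, -0.01, 0.03, -0.02, 0.0, 0.01, -0.03, 0.02]
y = [0.5 + 0.8 * a - 0.6 * b + 0.3 * c + e for (a, b, c), e in zip(X, noise)]

print(f"LOO RMS: {loo_rms(X, y):.3f} eV (std dev of data: {statistics.pstdev(y):.3f} eV)")
coef = fit_linear(X, y)  # final model trained on all systems
print(f"prediction for a new system: {predict(coef, [1.2, 0.6, 2.0]):.2f} eV")
```

The cross-validation error, not the training error, is the number to quote: it estimates how well F will do on host-impurity pairs that were never computed.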
[Diagram: host-impurity table of computed values → Train F(properties) → host-impurity table with predicted values.]
Y. Zeng and K. Bai, Journal of Alloys and Compounds 624, pp. 201-209 (2015).
15. Model Predictive Ability
• Leave-one-out cross-validation
• Predictive RMS error = 0.14 eV (vs. 0.24 eV for a linear fit), which predicts the diffusion coefficient of a new impurity to within a factor of <10 at 1000 K
• Time to predict a new system: < 1 s!
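The "<10x" statement follows from the Arrhenius form D = D0 exp(−Ea/kBT): an error δ in the activation energy alone multiplies D by exp(δ/kBT). A sketch, assuming the prefactor D0 is exact: at 1000 K, kBT ≈ 0.086 eV, so a 0.14 eV error gives a factor of about 5, while the 0.24 eV linear-fit error gives about 16. The function name is illustrative.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant (eV/K)


def rate_error_factor(barrier_error_ev: float, temperature_k: float) -> float:
    """Factor by which an Arrhenius rate D = D0 * exp(-Ea / (kB * T)) changes when Ea
    is off by barrier_error_ev, assuming the prefactor D0 is exact."""
    return math.exp(barrier_error_ev / (K_B_EV_PER_K * temperature_k))


for label, rms_ev in [("best model", 0.14), ("linear fit", 0.24)]:
    factor = rate_error_factor(rms_ev, 1000.0)
    print(f"{label}: {rms_ev} eV error -> D off by a factor of ~{factor:.1f} at 1000 K")
```

Because the error enters through an exponential in 1/T, the same 0.14 eV error becomes a larger multiplicative error at lower temperatures.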
[Figure: Leave One Out Cross Validation. Predicted vs. DFT activation energy (eV, 0 to 6) for hosts Al, Cu, Ni, Pd, Pt, Au, Ca, Ir, and Pb; fit y = 0.9909x, R² = 0.9312.]
16. Al-X Recrystallization Temperature (Tx)
• Data on Tx for 82 Al-X alloys with 11 alloying elements
• What controls Tx and how can we optimize it?
[Figure: recrystallization temperature Tx (°C, 0 to 400) and mole fraction of each alloying element (Fe, Y, Ni, La, Ti, Co, Cu, Sn, Ga, B, Ce; axis 0 to 12) versus alloy number.]
Courtesy of Izabela Szlufarska, John Perepezko, and Zach Jensen
17. Materials Informatics Approach – Regression and Prediction
• Assume Tx = F(elemental composition)
• Elemental composition = mole fraction of Fe, Cu, Y, …
• F is determined using one of many possible methods: linear regression, neural network, decision tree, kernel ridge regression, …
• Fit F with the existing Tx data, test it with cross-validation, then predict new data.
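Here the input to F is the composition itself. A minimal sketch of turning one alloy into a feature vector, assuming the 11 alloying elements shown in the Tx figure legend and mole fractions in whatever units the data set uses; the helper name is hypothetical.

```python
# Hypothetical encoder: a fixed-order vector of alloying-element mole fractions.
# Element order follows the legend of the Tx figure; Al is the balance and is left out.
ALLOYING_ELEMENTS = ("Fe", "Y", "Ni", "La", "Ti", "Co", "Cu", "Sn", "Ga", "B", "Ce")


def composition_features(composition: dict[str, float]) -> list[float]:
    """Map {'Fe': 0.02, 'Ce': 0.01, ...} to [x_Fe, x_Y, ..., x_Ce]; absent elements are 0."""
    unknown = set(composition) - set(ALLOYING_ELEMENTS) - {"Al"}
    if unknown:
        raise ValueError(f"unexpected elements: {sorted(unknown)}")
    return [composition.get(el, 0.0) for el in ALLOYING_ELEMENTS]


# Example with a made-up Al-Fe-Ce composition (mole fractions).
print(composition_features({"Al": 0.97, "Fe": 0.02, "Ce": 0.01}))
```

Leaving Al out is a deliberate choice: because all fractions sum to 1, including the balance element would make the features exactly collinear with a regression intercept.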
[Diagram: Tx and composition data for the Al-X alloys → Train F(properties) → Tx predictions.]
18. Linear Regression Prediction of Tx
Results of 1000 leave-out-20% cross-validation tests:
• Max RMS: 91°C (worst case)
• Min RMS: 13°C (best case)
• Avg RMS: 28°C ± 10.3°C
• Original data std dev: 48°C
[Figure: training and testing predictions for the worst and best cases.]
Courtesy of Izabela Szlufarska, John Perepezko, and Zach Jensen
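The 1000 leave-out-20% tests above are repeated random train/test splits. The sketch below runs that procedure with ordinary least squares and reports the same summary statistics. The data are synthetic (40 made-up alloys with two mole-fraction columns and a noisy linear Tx), and the function names are hypothetical. The key check is comparing the average test RMS with the standard deviation of the original data: a model only adds value if its RMS is well below that spread (28°C vs. 48°C on the slide).

```python
import math
import random
import statistics


def lstsq(X, y):
    """Least-squares coefficients [intercept, w1, ...]: normal equations solved by
    Gaussian elimination with partial pivoting."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # augmented normal-equation matrix [A^T A | A^T y]
    m = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p + 1):
                m[r][k] -= f * m[c][k]
    coef = [0.0] * p
    for r in range(p - 1, -1, -1):
        coef[r] = (m[r][p] - sum(m[r][k] * coef[k] for k in range(r + 1, p))) / m[r][r]
    return coef


def repeated_holdout_rms(X, y, n_trials=1000, test_frac=0.2, seed=0):
    """Hold out a random test_frac of the data n_trials times; return each trial's test RMS."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, round(test_frac * n))
    scores = []
    for _ in range(n_trials):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        coef = lstsq([X[i] for i in train], [y[i] for i in train])
        errs = [coef[0] + sum(w * v for w, v in zip(coef[1:], X[i])) - y[i] for i in test]
        scores.append(math.sqrt(sum(e * e for e in errs) / len(errs)))
    return scores


# Synthetic placeholder data: 40 alloys, 2 made-up mole-fraction columns, Tx in °C.
gen = random.Random(42)
X = [[gen.uniform(0, 0.05), gen.uniform(0, 0.05)] for _ in range(40)]
y = [200 + 3000 * a - 1500 * b + gen.gauss(0, 10) for a, b in X]

scores = repeated_holdout_rms(X, y)
print(f"Max RMS: {max(scores):.0f} °C, Min RMS: {min(scores):.0f} °C")
print(f"Avg RMS: {statistics.mean(scores):.0f} °C +/- {statistics.stdev(scores):.1f} °C")
print(f"Original data std dev: {statistics.pstdev(y):.0f} °C")
```

The spread between the best and worst splits shows how much a single train/test split can mislead, which is why many random splits are averaged.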
19. The Undergraduate “Materials Informatics Skunkworks”
We are establishing a group of ~10-20 undergraduates working together to provide materials informatics research for companies. The group will:
• Help researchers in academia and industry develop and utilize this new field
• Provide undergraduates with training in the rapidly growing field of informatics, enhancing their employment opportunities and supporting key workforce development
• Be supported financially and academically through course credits, internships, senior design/capstone projects, and funded projects from industry
• Be supported intellectually through a group culture of teamwork and knowledge continuity (more senior members train more junior members), with limited faculty involvement for advanced issues
20. What the Informatics Skunkworks Might Provide You
WORKFORCE
A team of talented students who are ready to work quickly with
your company to get the most out of your data
DATA ANALYTICS
Technical skills to help you organize, understand and expand data
sets and utilize data to optimize materials development
21. What You Might Provide the Informatics Skunkworks
FINANCIAL/COURSE CREDIT SUPPORT
Internships, co-ops, senior design/capstone projects, research projects, research funding, or course credits
SHARED DATA
Materials-related performance and property data sets that are large (more than ~50 data points), can be shared (ideally published), and are worth mining
24. Present Best Approach: Gaussian Kernel Ridge Regression
• We have M-X systems labeled by $i$ and, for each M-X system, descriptors labeled by $j$. Let $y_i$ be the outputs and $x_{i,j}$ the input descriptors, with $x_i$ the descriptor vector of system $i$.
• Regression: find $\{a_j\}$ that minimize
$$\sum_i \Big(y_i - \sum_j a_j x_{i,j}\Big)^2$$
• Ridge regression: find $\{a_j\}$ that minimize
$$\sum_i \Big(y_i - \sum_j a_j x_{i,j}\Big)^2 + \lambda \sum_j a_j^2$$
• Kernel ridge regression: find $\{a_i\}$ that minimize
$$\sum_i \Big(y_i - \sum_{i'} a_{i'} K(x_{i'}, x_i)\Big)^2 + \lambda \sum_{i,i'} a_i a_{i'} K(x_{i'}, x_i)$$
Kernel is
$$K(x_{i'}, x_i) = \exp\!\left(-\frac{\lVert x_{i'} - x_i \rVert^2}{2\sigma^2}\right)$$
New values are given by
$$y^* = \sum_i a_i K(x_i, x^*)$$
Must fit $\sigma$ and $\lambda$.
G. Montavon et al., New J. Phys. (2013).
A. Gretton, Introduction to RKHS, and some simple kernel algorithms, lecture notes, January 27, 2015.
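In matrix form, with kernel matrix entries K_{ii'} = K(x_i, x_{i'}), the kernel ridge objective is ||y − K a||² + λ aᵀ K a. Setting its gradient to zero gives K[(K + λI)a − y] = 0, which is satisfied by a = (K + λI)⁻¹ y. The sketch below implements exactly that solve and the prediction y* = Σ_i a_i K(x_i, x*) in plain Python. It is an illustration, not the authors' code: the class name, the toy sine data, and the choices σ = 1 and λ = 10⁻³ are all assumptions, and in practice σ and λ would be chosen by cross-validation.

```python
import math


def gaussian_kernel(u, v, sigma):
    """K(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class GaussianKRR:
    """Gaussian kernel ridge regression; sigma and lam must be chosen (e.g. by CV)."""

    def __init__(self, sigma, lam):
        self.sigma = sigma
        self.lam = lam

    def fit(self, X, y):
        n = len(y)
        # Solve (K + lam * I) a = y for the weights a_i
        k_reg = [[gaussian_kernel(X[i], X[j], self.sigma) + (self.lam if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        self.a = solve(k_reg, list(y))
        self.X = [list(x) for x in X]
        return self

    def predict(self, x_new):
        # y* = sum_i a_i K(x_i, x*)
        return sum(a_i * gaussian_kernel(x_i, x_new, self.sigma)
                   for a_i, x_i in zip(self.a, self.X))


# Toy usage on a smooth 1D function (synthetic, for illustration only).
X_train = [[0.5 * i] for i in range(13)]  # x = 0.0, 0.5, ..., 6.0
y_train = [math.sin(x[0]) for x in X_train]
model = GaussianKRR(sigma=1.0, lam=1e-3).fit(X_train, y_train)
print(f"predicted {model.predict([1.25]):.3f}, exact {math.sin(1.25):.3f}")
```

For real data sets, scikit-learn's `KernelRidge` with `kernel="rbf"` solves the same problem, with `alpha` playing the role of λ and `gamma` = 1/(2σ²). Since K + λI is symmetric positive definite, a Cholesky factorization is the usual faster replacement for the general elimination used here.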
25. Gaussian Kernel Ridge Regression
Source: A. Gretton, Introduction to RKHS, and some simple kernel algorithms, lecture notes, January 27, 2015.