Discovering advanced materials for energy applications (with high-throughput computing and by mining the scientific literature)
1. Discovering advanced materials for energy
applications
(with high-throughput computing and by mining the scientific literature)
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
ACM Meetup, Jan 2020
Slides (already) posted to hackingmaterials.lbl.gov
2.
Often, world-changing ideas are inhibited by the physical
properties of available materials at the time
Electric vehicles and solar power
are two technologies that had
been dreamed about for many
decades, yet are only seeing
wide adoption today
1910
1956
3. • Often, materials are known for several decades
before their functional applications are known
– MgB2 sitting on lab shelves for 50 years before its
identification as a superconductor in 2001
– LiFePO4 known since 1938, only identified as a Li-ion
battery cathode in 1997
• Even after discovery, optimization and
commercialization still take decades
• To get a sense for why this is so hard, let’s look at
the problem in more detail …
Typically, both new materials discovery and optimization
take decades
4.
A material is defined at multiple length scales –
stick to the fundamental scale for now
6.
Atoms in a box – the materials universe is huge!
• Bag of 30 atoms
• Each atom is one of 50
elements
• Arrange on 10x10x10 lattice
• Over 10^108 possibilities!
– more than grains of sand on all
beaches (10^21)
– more than number of atoms in
universe (10^80)
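The count above can be checked with exact integer arithmetic. This is a minimal sketch under one possible counting convention, which is an assumption and not stated on the slide: the 30 atoms occupy 30 distinct sites of the 1000-site lattice, and each occupied site independently holds one of 50 elements. Other conventions would shift the exponent somewhat.

```python
import math

N_SITES = 10 * 10 * 10   # 10x10x10 lattice
N_ATOMS = 30             # atoms in the "bag"
N_ELEMENTS = 50          # possible elements per atom

# Assumed convention: choose which sites are occupied,
# then assign an element to each occupied site.
site_choices = math.comb(N_SITES, N_ATOMS)
element_choices = N_ELEMENTS ** N_ATOMS
n_arrangements = site_choices * element_choices

# Order of magnitude: number of decimal digits minus one.
exponent = len(str(n_arrangements)) - 1
print(f"about 10^{exponent} arrangements")
```

Python integers have arbitrary precision, so no logarithms or floating-point approximations are needed here.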
8. What constrains traditional approaches to materials design?
“[The Chevrel] discovery resulted from a lot of
unsuccessful experiments of Mg ions insertion
into well-known hosts for Li+ ions insertion, as
well as from the thorough literature analysis
concerning the possibility of divalent ions
intercalation into inorganic materials.”
-Aurbach group, on discovery of Chevrel cathode
for multivalent (e.g., Mg2+) batteries
Levi, Levi, Chasid, Aurbach
J. Electroceramics (2009)
9. • Materials are:
– Important – constrain what’s possible in the physical
world
– Difficult to design – many, many possibilities
– Ripe for new ways of approaching the problem
Why do we need new ways of designing materials?
10.
Researchers are starting to fundamentally re-think how we
invent the materials that make up our devices
Next-
generation
materials
design
Computer-
aided
materials
design
Natural
language
processing
“Self-driving
laboratories”
11.
Today, computer aided design of products is ubiquitous –
but what are the governing equations to model materials?
12. Materials physics is determined by quantum mechanics
$$-\frac{\hbar^2}{2m}\nabla^2\Psi(r) + V(r)\Psi(r) = E\Psi(r)$$
Schrödinger equation describes all the properties
of a system through the wavefunction:
Time-independent, non-relativistic Schrödinger equation
13. • There aren’t too many real situations where we can
get a closed solution to the Schrödinger equation
• Let’s pretend we want to approach things
numerically for 1000 electrons
– There are ~500,000 electron-electron interactions to worry
about.
– Even storing the wavefunction would take ~10^8991 GB!
• Discretize the x, y, z position of each electron into a 1000-point
grid per axis = 1 billion positions per electron
• Need the wavefunction output (real + imaginary part) for each
combination of all electron positions, i.e. (1E9)^1000 * 2, or
2E9000 values
• even at 1 byte per wavefunction value (low resolution), you have
2E9000 bytes, or about 2E8991 GB, needed to store the wavefunction!
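The storage estimate can be reproduced with a few lines of arithmetic. This is a sketch under the slide's stated assumptions (1000 electrons, 1000 grid points per axis, two stored numbers per amplitude, 1 byte per number); taking 1 GB as 10^9 bytes is an extra assumption. The counts are far too large for floating-point numbers, so the calculation is done in log10. It yields about 2×10^9000 stored values, which at one byte each is roughly 2×10^8991 GB.

```python
import math

N_ELECTRONS = 1000
GRID_POINTS_PER_AXIS = 1000
NUMBERS_PER_AMPLITUDE = 2   # real + imaginary part
BYTES_PER_NUMBER = 1        # deliberately low resolution

# Distinct electron-electron pairs: n * (n - 1) / 2
n_pairs = N_ELECTRONS * (N_ELECTRONS - 1) // 2

# Positions available to a single electron on the 3D grid
positions_per_electron = GRID_POINTS_PER_AXIS ** 3

# log10 of (positions_per_electron ** N_ELECTRONS) * NUMBERS_PER_AMPLITUDE
log10_values = (N_ELECTRONS * math.log10(positions_per_electron)
                + math.log10(NUMBERS_PER_AMPLITUDE))
log10_bytes = log10_values + math.log10(BYTES_PER_NUMBER)
log10_gigabytes = log10_bytes - 9   # assumes 1 GB = 1e9 bytes

print(f"electron pairs: {n_pairs}")
print(f"stored values:  ~10^{log10_values:.1f}")
print(f"storage:        ~10^{log10_gigabytes:.1f} GB")
```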
The wave function is formidable
14. Maybe Dirac said it best …
“The underlying physical laws necessary
for the mathematical theory of a large part
of physics and the whole of chemistry are
thus completely known, and the difficulty
is only that the exact application of these
laws leads to equations much too
complicated to be soluble.”
“It therefore becomes desirable that
approximate practical methods of applying
quantum mechanics should be developed,
which can lead to an explanation of the
main features of complex atomic systems
without too much computation.”
15. What is density functional theory (DFT)?
DFT is a method to solve for the electronic structure and energetics of arbitrary
materials starting from first-principles. It replaces many-body interactions with
a mean field interaction that reproduces the same charge density.
In theory, it is exact for the ground state. In practice, accuracy depends on the
choice of (some) parameters, the type of material, the property to be studied,
and whether the simulated system (crystal) is a good approximation of reality.
DFT resulted in the 1998 Nobel Prize in Chemistry (W. Kohn). It is responsible
for 2 of the top 10 cited papers of all time, across all sciences.
16. How does one use DFT to design new materials?
A. Jain, Y. Shin, and K. A.
Persson, Nat. Rev. Mater.
1, 15004 (2016).
17. • System size is essentially limited to a few thousand atoms
– many important materials phenomena simply do not occur at this
length scale; other techniques available with reduced accuracy
• Certain materials, such as those with strong electron
correlation, remain difficult to model accurately
• Certain properties, including excited state properties
such as band gap, remain difficult to model accurately
• These are all active areas of research and improvement to
the theory, and the situation is improving on all fronts
Limitations of density functional theory
18. • Ok, so we have a computational model now that
allows us to assemble atoms in a computer and
predict their physical properties
• What next?
19. A big advantage of computational modeling is that it can be
automated – so we can screen many ideas in parallel
Automate the DFT
procedure
Supercomputing
Power
FireWorks
Software for programming
general computational
workflows that can be
scaled across large
supercomputers.
NERSC
Supercomputing center;
processor count is equivalent to
~100,000 desktop
machines. Other centers
are also viable.
High-throughput
materials screening
G. Ceder & K.A.
Persson, Scientific
American (2015)
S. Kirklin et al., Acta Mater. 102 (2016) 125-135
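The idea of screening many candidates in parallel can be sketched with the Python standard library. This is not the FireWorks API and not a real DFT calculation: `toy_property` is a hypothetical stand-in that returns a meaningless deterministic score, and `screen` and the threshold are illustrative names and values only.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_property(formula: str) -> float:
    """Hypothetical stand-in for an expensive DFT calculation."""
    # Meaningless placeholder score in [0, 1); a real workflow would
    # launch a DFT job here and parse its output.
    return (sum(ord(ch) for ch in formula) % 100) / 100.0


def screen(candidates, threshold, max_workers=4):
    """Evaluate all candidates in parallel; keep those scoring above threshold."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(toy_property, candidates))
    return [(c, s) for c, s in zip(candidates, scores) if s > threshold]


candidates = ["LiCoO2", "LiFePO4", "LiMn2O4", "LiMnBO3"]
hits = screen(candidates, threshold=0.5)
print(hits)
```

Because each candidate is evaluated independently, the same pattern extends from a few threads to many supercomputer nodes, which is the role a workflow manager plays in practice.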
20. • The answer is “it really varies a lot”
– how big / complicated are the materials you are modeling?
– how complex / expensive are the physical properties you
are trying to predict?
• Ballpark numbers:
– Low range: optimize structure of ~3-atom compounds
• time to do a million materials ~ 10 million core-hours
– Medium range: bulk modulus of ~50 atom compounds
• time to do a million materials ~ 2 billion core-hours
– The “high range” can go almost as high as you’d like …
• A “tiered” screening strategy is common
How much computer time is needed for
high-throughput DFT?
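The ballpark numbers above imply roughly 10 core-hours per material in the low range and 2,000 in the medium range. The sketch below shows why a tiered strategy helps. It makes two loud assumptions that are not from the slides: the two per-material costs are treated as a cheap and an expensive calculation on the same candidates, and only 5% of candidates pass the cheap tier.

```python
# Per-material costs implied by the ballpark numbers (core-hours per material)
LOW_COST = 10_000_000 / 1_000_000         # structure optimization, ~3 atoms
MEDIUM_COST = 2_000_000_000 / 1_000_000   # bulk modulus, ~50 atoms


def tiered_cost(n_materials, tiers):
    """Total core-hours when each tier runs only on survivors of the previous tier.

    tiers: list of (cost_per_material, fraction_passing_to_next_tier)
    """
    total = 0.0
    remaining = float(n_materials)
    for cost, pass_fraction in tiers:
        total += remaining * cost
        remaining *= pass_fraction
    return total


n = 1_000_000
brute_force = n * MEDIUM_COST
# ASSUMPTION: the 5% pass rate is purely illustrative.
tiered = tiered_cost(n, [(LOW_COST, 0.05), (MEDIUM_COST, 1.0)])

print(f"expensive calculation on everything: {brute_force:.2e} core-hours")
print(f"cheap filter first, then expensive:  {tiered:.2e} core-hours")
```

Under these assumptions, filtering first cuts the total by more than an order of magnitude; the actual savings depend entirely on how selective the cheap tier is.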
21. Example of high-throughput materials screening:
Li ion battery cathodes
[Figure: Li-ion cell schematic with arrows for Li+ and e- flow during discharge and charge]
anode: e.g. graphitic carbon
electrolyte: e.g. LiPF6 / (EC/DMC)
cathode: e.g. LiCoO2, LiFePO4
22. The cathode material is like a Li sponge (on the atomic scale)
The cathode material must quickly
absorb and release large
quantities of Li without
degrading
It must be cost-effective and safe
It should be light, compact, and
highly absorbent (high voltage)
23. Anatomy of a cathode composition
Li_a M_b (XY_c)_d
Li_a: Li ion source
M_b: electron donor / acceptor (examples: V4+/5+, Fe2+/3+)
(XY_c)_d: structural framework / charge neutrality (examples: O2-, (PO4)3-, (SiO4)4-)
common cathodes: LiCoO2, LiMn2O4, LiFePO4
24. Calculate average voltage by computing energy differences
in structures w/ or w/o Li
[Figure: GGA+U results]
$$V^{OC}_{avg} = -\frac{\Delta G}{\Delta x_{Li}\,F}$$
$$\Delta G \approx E(\mathrm{LiMnO_2}) - \left[ E(\mathrm{MnO_2}) + E(\mathrm{Li}) \right]$$
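The energy-difference recipe for the average voltage can be written as a short function. This is a minimal sketch, not the production workflow: the function name, the `n_li` generalization to more than one Li per formula unit, and the energies in the example are all assumptions made for illustration. With energies in eV per formula unit, dividing by the number of Li transferred gives volts directly, since each Li carries one elementary charge.

```python
def average_voltage(e_lithiated, e_delithiated, e_li_metal, n_li=1.0):
    """Average open-circuit voltage (V) vs. Li metal from total energies in eV.

    e_lithiated:   energy of the Li-containing structure, e.g. E(LiMnO2)
    e_delithiated: energy of the Li-free structure, e.g. E(MnO2)
    e_li_metal:    energy per atom of Li metal, E(Li)
    n_li:          Li transferred per formula unit (Delta x_Li)
    """
    # Delta G is approximated by the difference in computed total energies.
    delta_g = e_lithiated - (e_delithiated + n_li * e_li_metal)
    # eV per transferred electron is numerically equal to volts.
    return -delta_g / n_li


# Made-up energies (eV), chosen only to make the arithmetic easy to follow.
voltage = average_voltage(e_lithiated=-30.0, e_delithiated=-24.0, e_li_metal=-2.0)
print(f"average voltage: {voltage:.2f} V")
```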
25. Diffusion via Nudged Elastic Band
LiMnBO3 Li migration barriers:
Hexagonal phase: low Li 529 meV, high Li 723 meV
Monoclinic phase: low Li 395 meV, high Li 509 meV
• 525 meV means a micron-sized
particle can be charged in 2 hours
• Every 60 meV difference represents
a 10X difference in diffusion coefficient
Kim, Moore, Kang,
Hautier, Jain, Ceder
J ECS (2011)
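The "60 meV per factor of 10" rule of thumb follows from Arrhenius behavior. The sketch below assumes that the diffusion coefficient scales as exp(-Ea / kT) with an unchanged prefactor and that the temperature is 300 K; neither assumption is stated on the slide.

```python
import math

K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant in eV/K


def diffusivity_ratio(barrier_difference_mev, temperature_k=300.0):
    """Factor by which D changes for a given change in migration barrier.

    Assumes D is proportional to exp(-Ea / kT) with the same prefactor.
    """
    delta_ev = barrier_difference_mev / 1000.0
    return math.exp(delta_ev / (K_B_EV_PER_K * temperature_k))


ratio_60 = diffusivity_ratio(60.0)            # close to 10 at 300 K
ratio_phases = diffusivity_ratio(529 - 395)   # hexagonal vs. monoclinic, low Li

print(f"60 meV difference:  {ratio_60:.1f}x")
print(f"134 meV difference: {ratio_phases:.0f}x")
```

At 300 K, kT is about 26 meV, so kT·ln(10) is about 60 meV; that is where the rule comes from.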
27. New mixed phosphate-pyrophosphate
Chemistry: Li9V3(P2O7)3(PO4)2
Novelty: New
Energy density vs. LiFePO4: 20% greater
% of theoretical capacity already achieved in the lab: ~65%
Origin:
V to Fe substitution in Li9Fe3(P2O7)3(PO4)2*
Remarks:
• Structure has “layers” and “tunnels”
• Pyrophosphate-phosphate mixture
• Potential 2-electron material
Jain, Hautier, Moore, Kang, Lee,
Chen, Twu, and Ceder
Journal of The Electrochemical Society
159, A622–A633 (2012).
[Figure labels: C/35 at RT, 2.0 mg, 3.0 V – 4.7 V]
28. One can apply this template to many different applications
Sidorenkite-based Li-ion battery
cathodes
YCuTe2 thermoelectrics
Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.;
Tang, Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G.
Sidorenkite (Na3MnPO4CO3): A New Intercalation
Cathode Material for Na-Ion Batteries, Chem. Mater., 2013
Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs,
ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M;
Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric
Properties of Intrinsically Doped YCuTe2 with CuTe4-
based Layered Structure. J. Mat. Chem C, 2016
More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
Li-M-O CO2 capture compounds
Dunstan, M. T., Jain, A., Liu, W., Ong, S. P., Liu, T., Lee,
J., Persson, K. A., Scott, S. A., Dennis, J. S. & Grey, C. P.
Energy and Environmental Science (2016)
29.
Examples of experimentally-confirmed materials designed
with DFT (1)
Jain, A., Shin, Y., Persson, K.A., 2016. Computational predictions of energy materials using density functional theory.
Nature Reviews Materials 1, 15004.
30.
Examples of experimentally-confirmed materials designed
with DFT (2)
Jain, A., Shin, Y., Persson, K.A., 2016. Computational predictions of energy materials using density functional theory.
Nature Reviews Materials 1, 15004.
31. • This information is much harder to find, but:
– New alkaline battery from Duracell with assist from high-throughput
screening from Computational Modeling Consultants
• (based on personal communication)
– New alloys for watches and phones from Apple with assist from computational
alloy design by Questek
• https://www.americaninno.com/chicago/inside-the-small-evanston-company-whose-tech-was-acquired-by-apple-and-used-by-spacex/
– New alloys for 3D printing with guidance from ML-based models from
Citrine
• https://citrine.io/media-post/aluminum-alloy-designed-using-citrine-platform-becomes-first-ever-officially-registered-for-3d-printing/
– New phosphor materials from Lumenari with guidance from MaterialsQM
Consulting
• (own work)
How about commercial impact?
32.
Today, DFT is often used within a pipeline that includes
machine learning – but that is a separate talk …
Machine learning /
optimization
High-throughput DFT
Expensive calculation
Experiment
Training
data
Compounds to
screen
external databases
(DFT or expt)
34.
Can ML help us work through our backlog of information we
need to assimilate from text sources?
papers to read “someday”
NLP algorithms
35. Extracted ~2 million
abstracts of relevant
scientific articles
Use natural language
processing algorithms
to try to extract
knowledge from all this
data
Use computers to parse research abstracts on our behalf
36.
Algorithms to automatically identify keywords in the
abstracts based on word2vec and LSTM networks
Weston, L. et al Named Entity
Recognition and Normalization
Applied to Large-Scale
Information Extraction from
the Materials Science
Literature. J. Chem. Inf. Model.
(2019).
37.
Named entity recognition to detect materials, applications,
etc.
Named Entity Recognition
• Custom machine learning models to
extract the most valuable materials-related
information.
• Utilizes a long short-term memory (LSTM)
network trained on ~1000 hand-annotated
abstracts.
• f1 scores of ~0.9. f1 score for inorganic
materials extraction is >0.9.
Weston, L., et al. J. Chem. Inf. Model. (2019).
doi:10.1021/acs.jcim.9b00470
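For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes an entity-level F1 on a tiny hand-made example; the annotations and label codes are hypothetical and are not taken from the annotated corpus or the paper's evaluation code.

```python
def f1_score(predicted, gold):
    """Entity-level F1: harmonic mean of precision and recall."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (text span, label) annotations for one abstract
gold = {("LiFePO4", "MAT"), ("cathode", "APL"),
        ("sol-gel", "SMT"), ("olivine", "DSC")}
predicted = {("LiFePO4", "MAT"), ("cathode", "APL"),
             ("sol-gel", "SMT"), ("battery", "APL")}

score = f1_score(predicted, gold)
print(f"F1 = {score:.2f}")
```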
41.
Could these techniques also be used to predict which
materials we might want to screen for an application?
papers to read “someday”
NLP algorithms
42. • We use the word2vec
algorithm (Google) to turn
each unique word in our
corpus into a 200-
dimensional vector
• These vectors encode the
meaning of each word,
learned by trying to
predict context words
around the target
Key concept 1: the word2vec algorithm
Barazza, L. How does Word2Vec’s Skip-Gram work? Becominghuman.ai. 2017
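To make "predict context words around the target" concrete, the sketch below generates the (target, context) training pairs that a skip-gram model learns from. It covers only the pair generation, not the neural network training, and the sentence, function name, and window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for each token and its neighbors in the window."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


tokens = "LiFePO4 is a battery cathode".split()
pairs = skipgram_pairs(tokens, window=1)
for target, context in pairs:
    print(target, "->", context)
```

Words that appear in similar contexts generate similar training pairs, which is why they end up with similar vectors.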
“You shall know a word by
the company it keeps”
- John Rupert Firth (1957)
44. • The classic example is:
– “king” - “man” + “woman” = ? → “queen”
Word embeddings trained on “normal” text learn
relationships between words
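The analogy arithmetic can be illustrated with toy vectors. The 2-D embeddings below are invented by hand so that the two axes loosely mean "royalty" and "gender"; real word2vec vectors are learned from text and have hundreds of dimensions.

```python
import math

# Toy 2-D embeddings invented for illustration: (royalty, gender)
vectors = {
    "king":   (1.0,  1.0),
    "queen":  (1.0, -1.0),
    "man":    (0.0,  1.0),
    "woman":  (0.0, -1.0),
    "throne": (1.0,  0.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c, vectors):
    """Solve a - b + c = ? by nearest cosine neighbor, excluding the query words."""
    target = tuple(x - y + z
                   for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


answer = analogy("king", "man", "woman", vectors)
print(answer)
```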
45.
When trained on materials science abstracts,
word2vec learns scientific concepts
crystal structures and principal
oxides of the elements
“word
embedding”
periodic table
Tshitoyan, V. et al. Unsupervised word embeddings capture latent
knowledge from materials science literature. Nature 571, 95–98 (2019).
46. • Dot product of a composition word with
the word “thermoelectric” essentially
predicts how likely that word is to appear
in an abstract with the word
thermoelectric
• Compositions with high dot products are
typically known thermoelectrics
• Sometimes, compositions have a high dot
product with “thermoelectric” but have
never been studied as a thermoelectric
• These compositions usually have high
computed power factors!
(DFT+BoltzTraP)
Key concept 2: vector dot products can be used to predict
which words might co-occur in abstracts
Tshitoyan, V. et al. Unsupervised word embeddings capture latent knowledge from
materials science literature. Nature 571, 95–98 (2019).
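The ranking step described above can be sketched as follows. The embeddings are toy values invented for illustration, the material names are placeholders, and `already_studied` is a hypothetical set standing in for compositions that already co-occur with "thermoelectric" in the corpus.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy embeddings, invented for illustration only
embeddings = {
    "thermoelectric": (0.9, 0.1, 0.3),
    "material_A": (0.8, 0.2, 0.4),
    "material_B": (0.1, 0.9, 0.0),
    "material_C": (0.7, 0.0, 0.5),
    "material_D": (0.2, 0.3, 0.1),
}
# Hypothetical: compositions already reported alongside the keyword
already_studied = {"material_A"}


def rank_candidates(keyword, embeddings, exclude):
    """Rank words by dot product with the keyword, skipping excluded words."""
    query = embeddings[keyword]
    scores = {word: dot(vec, query)
              for word, vec in embeddings.items()
              if word != keyword and word not in exclude}
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_candidates("thermoelectric", embeddings, already_studied)
print(ranking)
```

The top of this list corresponds to compositions with a high dot product that have not yet been studied for the application, which are the candidates worth checking with further calculations.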
47. “Go back in time”
approach:
– For every year since
2001, see which
compounds we would
have predicted using only
literature data until that
point in time
– Make predictions of what
materials are the most
promising thermoelectrics
for data until that year
– See if those materials
were actually studied as
thermoelectrics in
subsequent years
Can we predict future thermoelectrics discoveries with this
method?
Tshitoyan, V. et al. Unsupervised word embeddings capture
latent knowledge from materials science literature. Nature
571, 95–98 (2019).
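The "go back in time" test can be sketched as a simple loop. This is an illustration of the bookkeeping only: `rank_fn` is a hypothetical callable standing in for a model trained on literature up to a cutoff year, and the rankings and years below are toy data, not results from the study.

```python
def backtest(rank_fn, first_reported, cutoff_years, top_k=2):
    """For each cutoff year, list top-k predictions that were later reported.

    rank_fn(year)   -> ranked candidate list built only from literature
                       up to `year` (hypothetical callable)
    first_reported  -> dict: material -> year first reported as a thermoelectric
    """
    results = {}
    for year in cutoff_years:
        predictions = rank_fn(year)[:top_k]
        confirmed = [m for m in predictions
                     if m in first_reported and first_reported[m] > year]
        results[year] = confirmed
    return results


# Toy data, invented for illustration only
toy_rankings = {
    2005: ["mat_X", "mat_Y", "mat_Z"],
    2010: ["mat_Y", "mat_W", "mat_Z"],
}
toy_first_reported = {"mat_X": 2008, "mat_W": 2015, "mat_Z": 2012}

results = backtest(lambda year: toy_rankings[year],
                   toy_first_reported, cutoff_years=[2005, 2010])
print(results)
```

The key design point is that the ranking for each cutoff year must not see any literature published after that year; otherwise the test leaks future information.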
48. • Thus far, 2 of our top 20 predictions made in
~August 2018 have already been reported in the
literature for the first time as thermoelectrics
– Li3Sb was the subject of a computational study
(predicted zT=2.42) in Oct 2018
– SnTe2 was experimentally found to be a moderately
good thermoelectric (expt zT=0.71) in Dec 2018
• We are working with an experimentalist on one
of the predictions (but it is a “spare time” project)
How about “forward” predictions?
[1] Yang et al. "Low lattice thermal conductivity and
excellent thermoelectric behavior in Li3Sb and Li3Bi."
Journal of Physics: Condensed Matter 30.42 (2018):
425401
[2] Wang et al. "Ultralow lattice thermal conductivity and
electronic properties of monolayer 1T phase semimetal
SiTe2 and SnTe2." Physica E: Low-dimensional Systems and
Nanostructures 108 (2019): 53-59
49.
How is this working?
“Context
words” link
together
information
from different
sources
50. • Developing new materials is of fundamental
importance to realizing new physical
technologies
• Today, it is possible to start designing phases of
matter in a computer (or supercomputer)
• New advancements in computation and machine
learning will bring us closer to being able to
design new substances from our desks
Conclusions
51.
Acknowledgements
• High-throughput DFT
– Gerbrand Ceder and “BURP” team
– Funding: Bosch / Umicore
• Natural language processing
– Gerbrand Ceder, Kristin Persson, and “Matscholar” team
– Funding: Toyota Research Institute
• Overall work funded by US Department of Energy