SlideShare une entreprise Scribd logo
1  sur  59
Télécharger pour lire hors ligne
Atomate: a tool for rapid high-throughput
computing and materials discovery
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Ceder group meeting, April 2020
Slides (already) posted to hackingmaterials.lbl.gov
2
“Civilization advances by extending the
number of important operations which we
can perform without thinking about them.”
- Alfred North Whitehead
Outline
3
① Vision of atomate
② History of atomate
③ Current implementation
④ Future of atomate
⑤ Practical things
A “black-box” view of performing a calculation
4
“something”
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
5
lots of tedious,
low-level work…
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Input file flags
SLURM format
how to fix ZPOTRF?
q set up the structure coordinates
q write input files, double-check all
the flags
q copy to supercomputer
q submit job to queue
q deal with supercomputer
headaches
q monitor job
q fix error jobs, resubmit to queue,
wait again
q repeat process for subsequent
calculations in workflow
q parse output files to obtain results
q copy and organize results, e.g., into
Excel
All the low-level steps can lead to errors!
Let’s take a look at two alternate universes:
It is too easy to end up in
the wrong timeline!
6
you have coffee
copy files from
previous simulation
edit 5 lines
run simulation,
analyze data
forget coffee
copy files from
previous simulation
edit 4 lines
but forget
LHFCALC=F
run simulation,
looks fine at first,
in a month you
discover it was wrong
1
2
you
Today, it is difficult to learn and apply several computational
procedures due to steep learning curve
Because of the multiple low-level steps and subtleties that all
need to be done correctly to apply computational methods,
there is often a single group “expert” for each technique
7
“Alice knows how to do charged defect calculations.”
“Bob is the one who can properly converge GW runs.”
“Olga has all the scripts for phonon calculations.”
What would be a better way?
8
“something”
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
What would be a better way?
9
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Workflows to run
q band structure
q surface energies
ü elastic tensor
q Raman spectrum
q QH thermal expansion
Ideally the method should scale to millions of calculations
10
Results!
researcher
Start with all binary
oxides, replace O->S,
run several different
properties
Workflows to run
ü band structure
ü surface energies
ü elastic tensor
q Raman spectrum
q QH thermal expansion
q spin-orbit coupling
Atomate’s vision is to make it easy, automatic, and flexible to
generate data with existing simulation packages
11
Results!
researcher
Run many different
properties of many
different materials!
Outline
12
① Vision of atomate
② History of atomate
③ Current implementation
④ Future of atomate
⑤ Practical things
13
The first attempt to create something like this in Ceder
group was early/mid 2000s, with the “JCMC” code
“relational
database hell”
DB design
bottleneck
• Java on its way out
• Poor source code
quality control
• 134,000 lines of code
• 511 classes
• 54 packages
14
Circa 2011, we solved the database problem by switching to
MongoDB.We also migrated everything to Python
Without going into too many details, it
turned out these were better tools for us to
be constructing code with.
We also tightened up code quality control
During this period, we created pymatgen and FireWorks which
are still alive today and indeed tend to be growing. These codes
were written to be general-purpose and for arbitrary end users.
We also created MPWorks and used it for ~5 years. But this
code was written mainly for ourselves to power the Materials
Project. It was a “heavyweight” infrastructure, like a battleship. It
could run a lot of calculations (good for solving big problems)
but had a ton of components and prioritized
comprehensiveness over ease–of–use (bad for solving small
problems).
• We began the “atomate” project, which
prioritized ease-of-use and ease-of-development
over feature list
• This turned out to be a very good idea, and
atomate is definitely better than MPWorks
• But, we still didn’t get everything right (more on
that later)
– Why is it taking so many iterations to get this code
right? Because developing complex frameworks is
more difficult than writing complex functions
15
Circa 2016, we decided that MPWorks needed to be
redesigned to make it easier to solve small problems
• The Materials Project is powered by atomate for the
last couple of years
– Thus, atomate users are using the same exact tool as
Materials Project
• Persson, Jain, and Ong research groups use atomate
for their work
• Some examples of outside community usage
• Members of ABINIT team (G.M. Rignanese, G.
Hautier) plan to migrate all their ABINIT workflows
to atomate
– Exciting thing is that one can then run all this stuff without
paying for anything apart from computing
16
Atomate is used in several different places and is quite well-
tested now
Outline
17
① Vision of atomate
② History of atomate
③ Current implementation
④ Future of atomate
⑤ Practical things
How can we design a simple yet powerful
tool for performing simulations?
• First, we’ll go through the four high-level steps in
doing a theory study
• In each step, we’ll talk about how software tools
can help make you more productive and help you
concentrate on things that truly require your
attention, not mindless labor (e.g., manually
resubmitting failed jobs)
18
Steps for theory &
simulation
19
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Software technologies for theory / calculation
20
(automatic materials
science workflows)
Custodian
(calculation error
recovery)
(materials analysis
framework)
Base packages Derived packages
(workflow definition &
execution)
These are all open-source:
(materials data mining)
Software stack for
theory & simulation
21
material
generation
simulation
procedure
calculation
&
execution
data
analysis
Custodian
Steps for theory &
simulation
22
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Step 1: materials
generation
23
material
generation
• Pymatgen can help generate
models for:
– crystal structures
– molecules
– systems (surfaces, interfaces, etc.)
• Tools include:
– order-disorder (shown at right) and
SQS
– interstitial finding
– surface / slab generation
– structure matching and analysis
– get structures from Materials Project
• Won’t cover materials generation
in this presentation!
Example: Order-disorder
resolve partial or mixed
occupancies into a fully
ordered crystal structure
(e.g., mixed oxide-fluoride site
into separate oxygen/fluorine)
Steps for theory &
simulation
24
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Step 2: simulation
procedure
25
simulation
procedure
• Simulation procedure requires knowing:
i. what sequence of calculations do I need to perform to compute my
desired materials property?
ii. for each calculation, what is a good set of parameters to use (e.g.,
functional, energy cutoff, k-point density, etc.)
• Item (i) is largely handled by atomate
• Item (ii) is largely handled by pymatgen, e.g. in the VaspInputSet
classes and won’t be covered here – although atomate also
contains some of this information
• Overall, we are trying to standardize and automate simulation
procedures as much as possible
Atomate contains a library of simulation procedures
26
VASP-based
• band structure
• spin-orbit coupling
• hybrid functional
calcs
• elastic tensor
• piezoelectric tensor
• Raman spectra
• NEB
• GIBBS method
• QH thermal
expansion
• AIMD
• ferroelectric
• surface adsorption
• work functions
• NMR spectra*
• Bader charges*
• Magnetic
orderings*
• SCAN functionals*
Other
• BoltzTraP
• FEFF method
• Q-Chem*
*=added / major
updates in past year
Mathew, K. et al Atomate: A high-level interface to generate, execute, and analyze
computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
Each simulation procedure in atomate is composed of
multiple levels of detail / abstraction
27
Workflow – complete set of calculations to get a materials property
Firework – one step in the Workflow (typically one DFT calculation)
Firetask – one step in a Firework
Starting with just a crystal
structure, this workflow performs
four calculations to get an
optimized structure, optimized
charge density, and band
structure on two types of grids
(uniform and line)
atomate allows you to leverage the prior efforts and
knowledge of many researchers
28
K. Mathew J. Montoya S. Dwaraknath A. Faghaninia
All past and present knowledge, from everyone in the group,
everyone previously in the group, and our collaborators,
about how to run calculations
M. Aykol
S.P. Ong
B. Bocklund T. Smidt
H. Tang I.H. Chu M. Horton J. Dagdalen B. Wood
Z.K. Liu J. Neaton K. Persson A. Jain
+
Workflow parameters can be customized at
multiple levels of detail
29
1. Workflows have
various high-level
options
2. Fireworks also
have options / flags
(not shown)
3. Firetasks have
most detailed
number of options /
flags (not shown)
Example 1: “VASP input set”
controls the rules that set DFT
parameters (pseudopotentials,
cutoffs, grid densities, etc) via
pymatgen
Example II: If “stability_check” is
enabled, the later parts of the
workflow are skipped if the
structure is determined unstable to
save computer time on
uninteresting structures
You can build workflows from scratch or reuse components
to assemble workflows
Multiple workflows are built with the same
components stacked together in different ways
30
These two workflows reuse almost
all the same code between the
two!
Atomate simulation procedures: the main ideas
• Atomate allows you to start with a material (e.g.,
crystal structure) and get a full workflow of
calculations needed to compute many types of
materials properties
• Atomate tries to guess sensible defaults in terms of
calculation parameters, but you can always
customize these parameters using:
– Workflow options
– Firework options
– Firetask options
– powerups
– going back to pymatgen library
31
Steps for theory &
simulation
32
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Step 3: calculation &
execution
33
calculation
&
execution
• Once you have the material and the simulation procedure
(Workflow), you need to actually execute the workflow on your
computing resource
• This includes tasks like:
– submission to calculation queues
– customization of any computing-specific parameters
• e.g., path to VASP executable, number of CPUs to parallelize over
– recovering from failures / job resubmission
– coordinating jobs across computing centers
– managing location of jobs
– tracking the progress of jobs
• Almost all of this is handled by FireWorks (custodian is used for
encoding job recover procedures)
Custodian
FireWorks allows you to write your workflow once and
execute (almost) anywhere
34
• Execute workflows
locally or at a
supercomputing
center
• Queue systems
supported
– PBS
– SGE
– SLURM
– IBM LoadLeveler
– NEWT (a REST-based
API at NERSC)
– Cobalt (Argonne LCF)
Dashboard with status of all jobs
35
Job provenance and automatic metadata storage
36
what machine
what time
what directory
what was the output
when was it queued
when did it start running
when was it completed
Detect and rerun failures
• All kinds of failures can be detected and rerun
– Soft failures (job quits with error code)
– hard failures (computing center goes down)
– human errors
37
Customize job priorities
• Within workflow, or between workflows
• Completely flexible and can be modified /
updated whenever you want
38
Track output files remotely
• Can bring up the last few lines of each of your
output files – and can be combined with queries
/ filters
39
Now seems like a
good time to bring
up the last few lines
of the OUTCAR of
all failed jobs...
FireWorks workflow software: the main ideas
• FireWorks is used to execute Workflow objects at
supercomputing centers
• It is very well-suited to high-throughput applications
and has been used to execute millions of jobs and is
well tested
• There are many features and advantages to using
FireWorks as your workflow manager, which has
made it one of the most popular scientific workflow
software in use today
40
An aside on “custodian”: instead of running VASP,
have custodian run VASP on your behalf
41
• Custodian can wrap around an
executable (e.g., VASP)
– i.e., run custodian instead of directly
running VASP
• During execution, custodian will
monitor output files and detect
errors / problems
– e.g., if ZPOTRF error detected, rerun
with ISYM=0
– ever-expanding library of fixes
(currently over 30 fixes for VASP!)
• There is more you can do with
custodian (e.g., simple workflows),
but it won’t be covered here
• Currently has fixes for: VASP, Q-
Chem, NWChem
Steps for theory &
simulation
42
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Step 4:
data analysis
43
data
analysis
• Data analysis can involve:
i. Creating databases of materials and their properties for easy
searching and data retrieval
ii. Conventional scientific analyses and scientific plotting (e.g.,
generating phase diagrams, plotting band structures)
iii. Data mining methods
• Item (i) is largely handled by atomate
• Item (ii) is largely handled by pymatgen
• Item (iii) is largely handled by matminer
Atomate – builders
framework
44
“Builders” start with base
collections in a database and
create higher-level collections
that summarize information or
add metadata
pymatgen – examples of analyses
45
phase diagrams
Pourbaix diagrams
diffusivity from MDband structure analysis
Data analysis: the main ideas
• Atomate generates databases that summarize your
calculation results. Additionally The builders
framework generates document collections that
summarize data on a particular material from
multiple calculations or from additional information.
• Pymatgen is a powerful software for doing
conventional scientific analyses with the
computational data
• Matminer, currently in pre-release, is software for
doing data mining style analyses of materials
46
Doing simulations programmatically can become second
nature – and very fast / automatic / scalable
47
material
generation
simulation
procedure
calculation
&
execution
data
analysis
Custodian
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
software infrastructure
Outline
48
① Vision of atomate
② History of atomate
③ Current implementation
④ Future of atomate
⑤ Practical things
• After using atomate for ~3 years, we realize there are still
things that are awkward to do
• For example
– Re-using components between workflows is harder than it should be
(e.g., components don’t always “stack” and “connect” easily)
– Workflow creator functions often use different signatures and
conventions (harder to learn)
– Changing certain parameters (e.g., switching to a vdW functional)
can be non-intuitive
• We are starting “atomate v2” to fix these things, but better to
start with original atomate for now
– Original atomate will still be around, and atomate v2 will be a
smooth transition from atomate v1
49
Atomate v2
• Most of “atomate” (and all the various software tools) are
largely programmatic, not interactive / GUI-driven
• The goal of atomate was always to work up to a visual
interface
– Stage 1: Exploring results using a web interface rather than by
database queries
– Stage 2: Submitting pre-defined workflow templates using a web
interface
– Stage 3: Designing and submitting custom workflows using a web
interface
• Currently working on stage 1
50
Atomate GUI
51
Atomate GUI prototype
Done by UCB undergrad
But really requires a dedicated, focused effort
Outline
52
① Vision of atomate
② History of atomate
③ Current implementation
④ Future of atomate
⑤ Practical things
• Not going to lie, there is a learning curve and the current tools are
really meant for programmers
– When (if?) we get to “Stage 2” atomate GUI, this will no longer be the case
• If you don’t know any programming, it will be a long road
53
How to actually get started
No programming
experience
Python
proficient
pymatgen
proficient
atomate
proficientMongoDB
experience
• Papers: good general overview / vision but of no
practical help
• Online MP Workshop videos and tutorials: good way to
get started if you have at least some knowledge, but
usually not very deep. A “zero to one” situation
• Code documentation: Most comprehensive way to
learn, although some experience probably necessary
• Help channels: Great if you already started and run
into problems
54
What resources are available to learn?
Papers
• Again, these are of little practical help and any
specifics are often outdated
55
Ong, S. P. et al. Python Materials
Genomics (pymatgen): A robust,
open-source python library for
materials analysis. Comput. Mater.
Sci. 68, 314–319 (2013).
Jain, A. et al. FireWorks: a dynamic
workflow system designed for high-
throughput applications. Concurr.
Comput. Pract. Exp. 22, 5037–5059
(2015).
Mathew, K. et al. Atomate: A high-
level interface to generate, execute,
and analyze computational
materials science workflows.
Comput. Mater. Sci. 139, 140–152
(2017).
• Resources on http://workshop.materialsproject.org
• Videos on MP channel of Youtube:
– https://www.youtube.com/channel/UC6pqY-__Nu-
mKkv0LMP8FJQ
56
Online MP workshop and videos
Online documentation
• Online documentation is the most comprehensive
writeup
– www.pymatgen.org
– https://materialsproject.github.io/fireworks/
– https://materialsproject.github.io/custodian/
– https://hackingmaterials.github.io/atomate/
• The online documentation includes installation, examples,
tutorials, and descriptions of how to use the code
• If you want to do “everything”, suggest starting with
atomate and going from there
57
Help lists
• A help forum for each code is at
https://discuss.matsci.org
58
Questions? Comments?
(a big thanks to everyone who supported and/or
contributed to this software!)
59Slides (already) posted to hackingmaterials.lbl.gov

Contenu connexe

Tendances

Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
Anubhav Jain
 

Tendances (20)

Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...Software Tools, Methods and Applications of Machine Learning in Functional Ma...
Software Tools, Methods and Applications of Machine Learning in Functional Ma...
 
Overview of DuraMat software tool development (poster version)
Overview of DuraMat software tool development(poster version)Overview of DuraMat software tool development(poster version)
Overview of DuraMat software tool development (poster version)
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
 
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV DataThe DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
 
Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methods
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and Analytics
 
Core Objective 1: Highlights from the Central Data Resource
Core Objective 1: Highlights from the Central Data ResourceCore Objective 1: Highlights from the Central Data Resource
Core Objective 1: Highlights from the Central Data Resource
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
 

Similaire à Atomate: a tool for rapid high-throughput computing and materials discovery

Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
Anubhav Jain
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous Archetypes
James Johnson
 

Similaire à Atomate: a tool for rapid high-throughput computing and materials discovery (20)

Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Open-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data setsOpen-source tools for generating and analyzing large materials data sets
Open-source tools for generating and analyzing large materials data sets
 
01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf
 
Ecss des
Ecss desEcss des
Ecss des
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
Constructing Operating Systems and E-Commerce
Constructing Operating Systems and E-CommerceConstructing Operating Systems and E-Commerce
Constructing Operating Systems and E-Commerce
 
Enabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous ArchetypesEnabling Congestion Control Using Homogeneous Archetypes
Enabling Congestion Control Using Homogeneous Archetypes
 
HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores HPC Cluster Computing from 64 to 156,000 Cores 
HPC Cluster Computing from 64 to 156,000 Cores 
 
Scientific
Scientific Scientific
Scientific
 
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
 
Wannier tutorial
Wannier tutorialWannier tutorial
Wannier tutorial
 
Computer Simulation of Nano-Structures
Computer Simulation of Nano-StructuresComputer Simulation of Nano-Structures
Computer Simulation of Nano-Structures
 
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
Fake gyarmathyvarasdi
Fake gyarmathyvarasdiFake gyarmathyvarasdi
Fake gyarmathyvarasdi
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
The Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with KubernetesThe Salmon Algorithm Spawning with Kubernetes
The Salmon Algorithm Spawning with Kubernetes
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
 

Plus de Anubhav Jain

Plus de Anubhav Jain (20)

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst Design
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 

Dernier

development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
seri bangash
 

Dernier (20)

development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 

Atomate: a tool for rapid high-throughput computing and materials discovery

  • 1. Atomate: a tool for rapid high-throughput computing and materials discovery Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA Ceder group meeting, April 2020 Slides (already) posted to hackingmaterials.lbl.gov
  • 2. 2 “Civilization advances by extending the number of important operations which we can perform without thinking about them.” - Alfred North Whitehead
  • 3. Outline 3 ① Vision of atomate ② History of atomate ③ Current implementation ④ Future of atomate ⑤ Practical things
  • 4. A “black-box” view of performing a calculation 4 “something” Results! researcher What is the GGA-PBE elastic tensor of GaAs?
  • 5. Unfortunately, the inside of the “black box” is usually tedious and “low-level” 5 lots of tedious, low-level work… Results! researcher What is the GGA-PBE elastic tensor of GaAs? Input file flags SLURM format how to fix ZPOTRF? q set up the structure coordinates q write input files, double-check all the flags q copy to supercomputer q submit job to queue q deal with supercomputer headaches q monitor job q fix error jobs, resubmit to queue, wait again q repeat process for subsequent calculations in workflow q parse output files to obtain results q copy and organize results, e.g., into Excel
  • 6. All the low-level steps can lead to errors! Let’s take a look at two alternate universes: It is too easy to end up in the wrong timeline! 6 you have coffee copy files from previous simulation edit 5 lines run simulation, analyze data forget coffee copy files from previous simulation edit 4 lines but forget LHFCALC=F run simulation, looks fine at first, in a month you discover it was wrong 1 2 you
  • 7. Today, it is difficult to learn and apply several computational procedures due to steep learning curve Because of the multiple low-level steps and subtleties that all need to be done correctly to apply computational methods, there is often a single group “expert” for each technique 7 “Alice knows how to do charged defect calculations.” “Bob is the one who can properly converge GW runs.” “Olga has all the scripts for phonon calculations.”
  • 8. What would be a better way? 8 “something” Results! researcher What is the GGA-PBE elastic tensor of GaAs?
  • 9. What would be a better way? 9 Results! researcher What is the GGA-PBE elastic tensor of GaAs? Workflows to run q band structure q surface energies ü elastic tensor q Raman spectrum q QH thermal expansion
  • 10. Ideally the method should scale to millions of calculations 10 Results! researcher Start with all binary oxides, replace O->S, run several different properties Workflows to run ü band structure ü surface energies ü elastic tensor q Raman spectrum q QH thermal expansion q spin-orbit coupling
  • 11. Atomate’s vision is to make it easy, automatic, and flexible to generate data with existing simulation packages 11 Results! researcher Run many different properties of many different materials!
  • 12. Outline 12 ① Vision of atomate ② History of atomate ③ Current implementation ④ Future of atomate ⑤ Practical things
  • 13. 13 The first attempt to create something like this in Ceder group was early/mid 2000s, with the “JCMC” code “relational database hell” DB design bottleneck • Java on its way out • Poor source code quality control • 134,000 lines of code • 511 classes • 54 packages
  • 14. 14 Circa 2011, we solved the database problem by switching to MongoDB.We also migrated everything to Python Without going into too many details, it turned out these were better tools for us to be constructing code with. We also tightened up code quality control During this period, we created pymatgen and FireWorks which are still alive today and indeed tend to be growing. These codes were written to be general-purpose and for arbitrary end users. We also created MPWorks and used it for ~5 years. But this code was written mainly for ourselves to power the Materials Project. It was a “heavyweight” infrastructure, like a battleship. It could run a lot of calculations (good for solving big problems) but had a ton of components and prioritized comprehensiveness over ease–of–use (bad for solving small problems).
  • 15. • We began the “atomate” project, which prioritized ease-of-use and ease-of-development over feature list • This turned out to be a very good idea, and atomate is definitely better than MPWorks • But, we still didn’t get everything right (more on that later) – Why is it taking so many iterations to get this code right? Because developing complex frameworks is more difficult than writing complex functions 15 Circa 2016, we decided that MPWorks needed to be redesigned to make it easier to solve small problems
  • 16. • The Materials Project is powered by atomate for the last couple of years – Thus, atomate users are using the same exact tool as Materials Project • Persson, Jain, and Ong research groups use atomate for their work • Some examples of outside community usage • Members of ABINIT team (G.M. Rignanese, G. Hautier) plan to migrate all their ABINIT workflows to atomate – Exciting thing is that one can then run all this stuff without paying for anything apart from computing 16 Atomate is used in several different places and is quite well- tested now
  • 17. Outline 17 ① Vision of atomate ② History of atomate ③ Current implementation ④ Future of atomate ⑤ Practical things
  • 18. How can we design a simple yet powerful tool for performing simulations? • First, we’ll go through the four high-level steps in doing a theory study • In each step, we’ll talk about how software tools can help make you more productive and help you concentrate on things that truly require your attention, not mindless labor (e.g., manually resubmitting failed jobs) 18
  • 19. Steps for theory & simulation 19 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 20. Software technologies for theory / calculation 20 (automatic materials science workflows) Custodian (calculation error recovery) (materials analysis framework) Base packages Derived packages (workflow definition & execution) These are all open-source: (materials data mining)
  • 21. Software stack for theory & simulation 21 material generation simulation procedure calculation & execution data analysis Custodian
  • 22. Steps for theory & simulation 22 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 23. Step 1: materials generation 23 material generation • Pymatgen can help generate models for: – crystal structures – molecules – systems (surfaces, interfaces, etc.) • Tools include: – order-disorder (shown at right) and SQS – interstitial finding – surface / slab generation – structure matching and analysis – get structures from Materials Project • Won’t cover materials generation in this presentation! Example: Order-disorder resolve partial or mixed occupancies into a fully ordered crystal structure (e.g., mixed oxide-fluoride site into separate oxygen/fluorine)
  • 24. Steps for theory & simulation 24 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 25. Step 2: simulation procedure 25 simulation procedure • Simulation procedure requires knowing: i. what sequence of calculations do I need to perform to compute my desired materials property? ii. for each calculation, what is a good set of parameters to use (e.g., functional, energy cutoff, k-point density, etc.) • Item (i) is largely handled by atomate • Item (ii) is largely handled by pymatgen, e.g. in the VaspInputSet classes and won’t be covered here – although atomate also contains some of this information • Overall, we are trying to standardize and automate simulation procedures as much as possible
  • 26. Atomate contains a library of simulation procedures 26 VASP-based • band structure • spin-orbit coupling • hybrid functional calcs • elastic tensor • piezoelectric tensor • Raman spectra • NEB • GIBBS method • QH thermal expansion • AIMD • ferroelectric • surface adsorption • work functions • NMR spectra* • Bader charges* • Magnetic orderings* • SCAN functionals* Other • BoltzTraP • FEFF method • Q-Chem* *=added / major updates in past year Mathew, K. et al Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
  • 27. Each simulation procedure in atomate is composed of multiple levels of detail / abstraction 27 Workflow – complete set of calculations to get a materials property Firework – one step in the Workflow (typically one DFT calculation) Firetask – one step in a Firework Starting with just a crystal structure, this workflow performs four calculations to get an optimized structure, optimized charge density, and band structure on two types of grids (uniform and line)
  • 28. atomate allows you to leverage the prior efforts and knowledge of many researchers 28 K. Mathew J. Montoya S. Dwaraknath A. Faghaninia All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations M. Aykol S.P. Ong B. Bocklund T. Smidt H. Tang I.H. Chu M. Horton J. Dagdalen B. Wood Z.K. Liu J. Neaton K. Persson A. Jain +
  • 29. Workflow parameters can be customized at multiple levels of detail 29 1. Workflows have various high-level options 2. Fireworks also have options / flags (not shown) 3. Firetasks have most detailed number of options / flags (not shown) Example 1: “VASP input set” controls the rules that set DFT parameters (pseudopotentials, cutoffs, grid densities, etc) via pymatgen Example II: If “stability_check” is enabled, the later parts of the workflow are skipped if the structure is determined unstable to save computer time on uninteresting structures
  • 30. You can build workflows from scratch or reuse components to assemble workflows Multiple workflows are built with the same components stacked together in different ways 30 These two workflows reuse almost all the same code between the two!
  • 31. Atomate simulation procedures: the main ideas • Atomate allows you to start with a material (e.g., crystal structure) and get a full workflow of calculations needed to compute many types of materials properties • Atomate tries to guess sensible defaults in terms of calculation parameters, but you can always customize these parameters using: – Workflow options – Firework options – Firetask options – powerups – going back to pymatgen library 31
  • 32. Steps for theory & simulation 32 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 33. Step 3: calculation & execution 33 calculation & execution • Once you have the material and the simulation procedure (Workflow), you need to actually execute the workflow on your computing resource • This includes tasks like: – submission to calculation queues – customization of any computing-specific parameters • e.g., path to VASP executable, number of CPUs to parallelize over – recovering from failures / job resubmission – coordinating jobs across computing centers – managing location of jobs – tracking the progress of jobs • Almost all of this is handled by FireWorks (custodian is used for encoding job recover procedures) Custodian
  • 34. FireWorks allows you to write your workflow once and execute (almost) anywhere 34 • Execute workflows locally or at a supercomputing center • Queue systems supported – PBS – SGE – SLURM – IBM LoadLeveler – NEWT (a REST-based API at NERSC) – Cobalt (Argonne LCF)
  • 35. Dashboard with status of all jobs 35
  • 36. Job provenance and automatic metadata storage 36 what machine what time what directory what was the output when was it queued when did it start running when was it completed
  • 37. Detect and rerun failures • All kinds of failures can be detected and rerun – Soft failures (job quits with error code) – hard failures (computing center goes down) – human errors 37
  • 38. Customize job priorities • Within workflow, or between workflows • Completely flexible and can be modified / updated whenever you want 38
  • 39. Track output files remotely • Can bring up the last few lines of each of your output files – and can be combined with queries / filters 39 Now seems like a good time to bring up the last few lines of the OUTCAR of all failed jobs...
  • 40. FireWorks workflow software: the main ideas • FireWorks is used to execute Workflow objects at supercomputing centers • It is very well-suited to high-throughput applications and has been used to execute millions of jobs and is well tested • There are many features and advantages to using FireWorks as your workflow manager, which has made it one of the most popular scientific workflow software in use today 40
  • 41. An aside on “custodian”: instead of running VASP, have custodian run VASP on your behalf 41 • Custodian can wrap around an executable (e.g., VASP) – i.e., run custodian instead of directly running VASP • During execution, custodian will monitor output files and detect errors / problems – e.g., if ZPOTRF error detected, rerun with ISYM=0 – ever-expanding library of fixes (currently over 30 fixes for VASP!) • There is more you can do with custodian (e.g., simple workflows), but it won’t be covered here • Currently has fixes for: VASP, Q- Chem, NWChem
  • 42. Steps for theory & simulation 42 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 43. Step 4: data analysis 43 data analysis • Data analysis can involve: i. Creating databases of materials and their properties for easy searching and data retrieval ii. Conventional scientific analyses and scientific plotting (e.g., generating phase diagrams, plotting band structures) iii. Data mining methods • Item (i) is largely handled by atomate • Item (ii) is largely handled by pymatgen • Item (iii) is largely handled by matminer
  • 44. Atomate – builders framework 44 “Builders” start with base collections in a database and create higher-level collections that summarize information or add metadata
  • 45. pymatgen – examples of analyses 45 phase diagrams Pourbaix diagrams diffusivity from MDband structure analysis
  • 46. Data analysis: the main ideas • Atomate generates databases that summarize your calculation results. Additionally The builders framework generates document collections that summarize data on a particular material from multiple calculations or from additional information. • Pymatgen is a powerful software for doing conventional scientific analyses with the computational data • Matminer, currently in pre-release, is software for doing data mining style analyses of materials 46
  • 47. Doing simulations programmatically can become second nature – and very fast / automatic / scalable 47 material generation simulation procedure calculation & execution data analysis Custodian Results! researcher What is the GGA-PBE elastic tensor of GaAs? software infrastructure
  • 48. Outline 48 ① Vision of atomate ② History of atomate ③ Current implementation ④ Future of atomate ⑤ Practical things
  • 49. • After using atomate for ~3 years, we realize there are still things that are awkward to do • For example – Re-using components between workflows is harder than it should be (e.g., components don’t always “stack” and “connect” easily) – Workflow creator functions often use different signatures and conventions (harder to learn) – Changing certain parameters (e.g., switching to a vdW functional) can be non-intuitive • We are starting “atomate v2” to fix these things, but better to start with original atomate for now – Original atomate will still be around, and atomate v2 will be a smooth transition from atomate v1 49 Atomate v2
  • 50. • Most of “atomate” (and all the various software tools) are largely programmatic, not interactive / GUI-driven • The goal of atomate was always to work up to a visual interface – Stage 1: Exploring results using a web interface rather than by database queries – Stage 2: Submitting pre-defined workflow templates using a web interface – Stage 3: Designing and submitting custom workflows using a web interface • Currently working on stage 1 50 Atomate GUI
  • 51. 51 Atomate GUI prototype Done by UCB undergrad But really requires a dedicated, focused effort
  • 52. Outline 52 ① Vision of atomate ② History of atomate ③ Current implementation ④ Future of atomate ⑤ Practical things
  • 53. • Not going to lie, there is a learning curve and the current tools are really meant for programmers – When (if?) we get to “Stage 2” atomate GUI, this will no longer be the case • If you don’t know any programming, it will be a long road 53 How to actually get started No programming experience Python proficient pymatgen proficient atomate proficientMongoDB experience
  • 54. • Papers: good general overview / vision but of no practical help • Online MP Workshop videos and tutorials: good way to get started if you have at least some knowledge, but usually not very deep. A “zero to one” situation • Code documentation: Most comprehensive way to learn, although some experience probably necessary • Help channels: Great if you already started and run into problems 54 What resources are available to learn?
  • 55. Papers • Again, these are of little practical help and any specifics are often outdated 55 Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013). Jain, A. et al. FireWorks: a dynamic workflow system designed for high- throughput applications. Concurr. Comput. Pract. Exp. 22, 5037–5059 (2015). Mathew, K. et al. Atomate: A high- level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017).
  • 56. • Resources on http://workshop.materialsproject.org • Videos on MP channel of Youtube: – https://www.youtube.com/channel/UC6pqY-__Nu- mKkv0LMP8FJQ 56 Online MP workshop and videos
  • 57. Online documentation • Online documentation is the most comprehensive writeup – www.pymatgen.org – https://materialsproject.github.io/fireworks/ – https://materialsproject.github.io/custodian/ – https://hackingmaterials.github.io/atomate/ • The online documentation includes installation, examples, tutorials, and descriptions of how to use the code • If you want to do “everything”, suggest starting with atomate and going from there 57
  • 58. Help lists • A help forum for each code is at https://discuss.matsci.org 58
  • 59. Questions? Comments? (a big thanks to everyone who supported and/or contributed to this software!) 59Slides (already) posted to hackingmaterials.lbl.gov