SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
Data dissemination and material informatics at LBNL
Anubhav Jain, Energy Technologies Area, Berkeley Lab
Contact: ajain@lbl.gov
Data Dissemination
Several “apps”, including the
PDApp (which helps assess
synthesis and stability)
provide domain-specific
views of the data. Plots and
analyses from such apps are
some of the most heavily
used and cited aspects of the
Materials Project.
Materials Informatics
Data on over 500,000 porous
materials was contributed by
the Nanoporous Materials
Genome Center. Users can
create custom plots based on
properties of interest as well as
obtain detailed information
about specific materials.
A Representational State Transfer
Application Programming
Interface (REST API) allows users
to download large quantities of
data using many popular
programming languages. This
method uses HTTP requests as a
mechanism of accessing data and is
employed by many of the top
internet companies, including
Google, Dropbox, and Twitter.
Unified View
Collabora'ng	PIs:	Kris&n	Persson,	Gerbrand	Ceder,	Dan	Gunter,	Shyue	Ping	Ong	(UCSD),	
Geoffroy	Hau&er	(U.	Catholique	de	Louvain),	Anthony	Gamst	(UCSD),	Karsten	Jacobsen	(DTU)	
Although the Materials Project is largely
an open database, a sandboxing system
allows for core data to be stored
separately from data generated for
proprietary or external projects. Access
management is controlled via an API.
The Materials Project
MPContribs
MIDAS: Materials Informatics & Data Analysis Software
Materials data mining
Automated materials design
A “science gateway” exposes data to the research community and facilitates knowledge
extraction. The Materials Project (www.materialsproject.org) shares simulation data on
hundreds of thousands of materials for a community of 22,000 researchers; millions of data
points have been downloaded and used in the research articles of its users. A recently
introduced feature, MPContribs, accepts data contributions from the user community.
Historically, all data in the Materials
Project was generated at LBNL via high-
throughput computing and density
functional theory (DFT) calculations.
MPContribs is a new feature that allows
users to contribute their own data sets and
disseminate them via Materials Project.
Online editors and plotting tools allow
users to control the look and presentation
of their data. In some instances, apps can
combine user data with Materials Project
core data to create a joint computation-
experiment analysis, as shown in the phase
diagram on the right.
We are building a materials data mining platform called MIDAS (Materials Informatics &
Data Analysis Software). MIDAS can retrieve data from several materials databases that have
REST APIs or from the JSON format, format that data into a Pandas DataFrame object,
automatically generate possible descriptors for the data, and run machine learning
algorithms through scikit-learn. Visualization tools are provided through adapters to the
plot.ly library. This system will be scaled for “big data analysis” in the future through Apache
Spark and similar toolkits such as the FireWorks workflow software developed at LBNL.
Examples of user-
contributed data
Left: data pipeline for MIDAS 
Right: example of an interactive plot
that can be generated through MIDAS
via the plot.ly toolkit.
Using data sets from the Materials Project,
we have derived structure-property
relationships that relate fundamental
descriptors such as composition, density,
and coordination to output properties such
as bulk modulus, shear modulus, and the
electronic character of the valence and
conduction bands. Crucial to this effort has
been the development of relevant
descriptor combinations as well as new
machine learning approaches.
In this machine learning predictor for the bulk and shear
moduli of a material, we use Hölder means to improve
the predictive power of descriptors and develop a local
linear regression method that fits the tails of distributions
(i.e., extreme values) accurately without overfitting.
This diagram shows the pairwise likelihood for the
electronic state on the y-axis to form a greater
contribution to the valence or conduction band character
versus the state on the x-axis. For example, d orbitals in
Cu1+ are the most likely to form the VB. For this study,
we repurposed algorithms used in ranking sports teams
to rank electronic orbitals.
Materials discovery today is largely performed through targeted experiments based on
researcher intuition. However, such intuition is difficult to obtain within the context of high-
throughput and combinatorial studies in which tens of thousands of data points may be
collected. We are developing automated optimization routines that couple forward models
(e.g. density functional theory calculations) with inverse optimizers (e.g., genetic algorithms
or Gaussian processes) to build a fully automated materials discovery system.
Left: schematic diagram showing
the integration of typical high-
throughput screening coupled with
an automatic optimization routine.

Right: performance of genetic
algorithms (GAs) in uncovering
promising perovskite water splitting
materials versus random search
(black) and chemical rules (orange).
Download	this	poster:	hHp://www.slideshare.net/anubhavster

Contenu connexe

Tendances

Tendances (20)

Atomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discoveryAtomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discovery
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
DuraMat Data Analytics
DuraMat Data AnalyticsDuraMat Data Analytics
DuraMat Data Analytics
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Methods, tools, and examples (Part II): High-throughput computation and machi...
Methods, tools, and examples (Part II): High-throughput computation and machi...Methods, tools, and examples (Part II): High-throughput computation and machi...
Methods, tools, and examples (Part II): High-throughput computation and machi...
 
DuraMat Data Management and Analytics
DuraMat Data Management and AnalyticsDuraMat Data Management and Analytics
DuraMat Data Management and Analytics
 
Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...
 
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV DataThe DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
The DuraMat Data Hub and Analytics Capability: A Resource for Solar PV Data
 
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
Prediction and Experimental Validation of New Bulk Thermoelectrics Compositio...
 
Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...
 

Similaire à Data dissemination and materials informatics at LBNL

IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 

Similaire à Data dissemination and materials informatics at LBNL (20)

IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
resume_MH
resume_MHresume_MH
resume_MH
 
G1803054653
G1803054653G1803054653
G1803054653
 
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
Interactive Analysis of Large-Scale Sequencing Genomics Data Sets using a Rea...
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning AlgorithmsSurvey on MapReduce in Big Data Clustering using Machine Learning Algorithms
Survey on MapReduce in Big Data Clustering using Machine Learning Algorithms
 
Linking Scientific Instruments and Computation
Linking Scientific Instruments and ComputationLinking Scientific Instruments and Computation
Linking Scientific Instruments and Computation
 
Time to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the CloudTime to Science/Time to Results: Transforming Research in the Cloud
Time to Science/Time to Results: Transforming Research in the Cloud
 
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy SciencesDiscovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
Discovery Engines for Big Data: Accelerating Discovery in Basic Energy Sciences
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORKMACHINE LEARNING ON MAPREDUCE FRAMEWORK
MACHINE LEARNING ON MAPREDUCE FRAMEWORK
 
Poster (1)
Poster (1)Poster (1)
Poster (1)
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
 
ME Synopsis
ME SynopsisME Synopsis
ME Synopsis
 
Resume
ResumeResume
Resume
 

Plus de Anubhav Jain

Plus de Anubhav Jain (20)

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst Design
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 

Dernier

(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
Scintica Instrumentation
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
NazaninKarimi6
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
Silpa
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
ANSARKHAN96
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
Silpa
 

Dernier (20)

Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 

Data dissemination and materials informatics at LBNL

  • 1. Data dissemination and material informatics at LBNL Anubhav Jain, Energy Technologies Area, Berkeley Lab Contact: ajain@lbl.gov Data Dissemination Several “apps”, including the PDApp (which helps assess synthesis and stability) provide domain-specific views of the data. Plots and analyses from such apps are some of the most heavily used and cited aspects of the Materials Project. Materials Informatics Data on over 500,000 porous materials was contributed by the Nanoporous Materials Genome Center. Users can create custom plots based on properties of interest as well as obtain detailed information about specific materials. A Representational State Transfer Application Programming Interface (REST API) allows users to download large quantities of data using many popular programming languages. This method uses HTTP requests as a mechanism of accessing data and is employed by many of the top internet companies, including Google, Dropbox, and Twitter. Unified View Collabora'ng PIs: Kris&n Persson, Gerbrand Ceder, Dan Gunter, Shyue Ping Ong (UCSD), Geoffroy Hau&er (U. Catholique de Louvain), Anthony Gamst (UCSD), Karsten Jacobsen (DTU) Although the Materials Project is largely an open database, a sandboxing system allows for core data to be stored separately from data generated for proprietary or external projects. Access management is controlled via an API. The Materials Project MPContribs MIDAS: Materials Informatics & Data Analysis Software Materials data mining Automated materials design A “science gateway” exposes data to the research community and facilitates knowledge extraction. The Materials Project (www.materialsproject.org) shares simulation data on hundreds of thousands of materials for a community of 22,000 researchers; millions of data points have been downloaded and used in the research articles of its users. A recently introduced feature, MPContribs, accepts data contributions from the user community. Historically, all data in the Materials Project was generated at LBNL via high- throughput computing and density functional theory (DFT) calculations. MPContribs is a new feature that allows users to contribute their own data sets and disseminate them via Materials Project. Online editors and plotting tools allow users to control the look and presentation of their data. In some instances, apps can combine user data with Materials Project core data to create a joint computation- experiment analysis, as shown in the phase diagram on the right. We are building a materials data mining platform called MIDAS (Materials Informatics & Data Analysis Software). MIDAS can retrieve data from several materials databases that have REST APIs or from the JSON format, format that data into a Pandas DataFrame object, automatically generate possible descriptors for the data, and run machine learning algorithms through scikit-learn. Visualization tools are provided through adapters to the plot.ly library. This system will be scaled for “big data analysis” in the future through Apache Spark and similar toolkits such as the FireWorks workflow software developed at LBNL. Examples of user- contributed data Left: data pipeline for MIDAS Right: example of an interactive plot that can be generated through MIDAS via the plot.ly toolkit. Using data sets from the Materials Project, we have derived structure-property relationships that relate fundamental descriptors such as composition, density, and coordination to output properties such as bulk modulus, shear modulus, and the electronic character of the valence and conduction bands. Crucial to this effort has been the development of relevant descriptor combinations as well as new machine learning approaches. In this machine learning predictor for the bulk and shear moduli of a material, we use Hölder means to improve the predictive power of descriptors and develop a local linear regression method that fits the tails of distributions (i.e., extreme values) accurately without overfitting. This diagram shows the pairwise likelihood for the electronic state on the y-axis to form a greater contribution to the valence or conduction band character versus the state on the x-axis. For example, d orbitals in Cu1+ are the most likely to form the VB. For this study, we repurposed algorithms used in ranking sports teams to rank electronic orbitals. Materials discovery today is largely performed through targeted experiments based on researcher intuition. However, such intuition is difficult to obtain within the context of high- throughput and combinatorial studies in which tens of thousands of data points may be collected. We are developing automated optimization routines that couple forward models (e.g. density functional theory calculations) with inverse optimizers (e.g., genetic algorithms or Gaussian processes) to build a fully automated materials discovery system. Left: schematic diagram showing the integration of typical high- throughput screening coupled with an automatic optimization routine. Right: performance of genetic algorithms (GAs) in uncovering promising perovskite water splitting materials versus random search (black) and chemical rules (orange). Download this poster: hHp://www.slideshare.net/anubhavster