The Materials Project
Validation, Provenance and Sandboxes
Goals
• Validation
– constantly guard against bugs in core data and imported data
• Provenance
– know how data came to be
• Sandboxes
– combine public and non-public data; "good fences make good neighbors"
Validation
[Figure labels: (Internal) Database ID; External ID; What we expected; What we got]
Validation runs all the time
• Rules with "constraints" for every database (and sandbox)
• Test constraints against entire DB every night → email reports
• Validation engine, etc. all open-source software in pymatgen-db
[Diagram labels: Remote server; Validation engine; Rules; MP Databases; Reports (email, web pages, ..)]
Rules have a simple syntax
_aliases:
  - snl_id = mps_id
  - energy = analysis.e_above_hull
materials:
  -
    filter:
    constraints:
      - final_energy_per_atom <= 0
      - initial_structure.lattice.volume > 0
      - initial_structure.lattice.a > 0
      - initial_structure.lattice.b > 0
      - initial_structure.lattice.c > 0
      - initial_structure.lattice.matrix size 3
      - formation_energy_per_atom <= 5
      - formation_energy_per_atom > -5
      - cpu_time > 5
      - e_above_hull > -0.000001
      - final_energy < 0
      - reduced_cell_formula size$ nelements
  # Check num. ICSD sources for selected compounds
  -
    filter:
      - task_id = "mp-540081"
    constraints:
      - icsd_id size> 10
  -
    filter:
      - task_id = "mp-20379"
    constraints:
      - icsd_id size 1
  -
    filter:
      - task_id = "mp-13634"
    constraints:
      - icsd_id size> 0
  -
    filter:
      - task_id = "mp-600022"
    constraints:
      - icsd_id size 0
  # NiO2 phases should never become stable
  -
    filter:
      - e_above_hull = 0
    constraints:
      - pretty_formula != 'NiO2'
tasks:
  -
    filter:
      - state = "successful"
    constraints:
      - output.final_energy_per_atom <= 0
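To make the rule syntax concrete, here is a minimal sketch of how a "field operator value" constraint could be tested against database documents. It is not the pymatgen-db implementation: the function names (`get_path`, `check`, `violations`) and the sample documents are hypothetical, and the `size` operators are deliberately left out.

```python
import operator
import re

# Comparison operators that appear in the rules, mapped to Python functions.
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "=": operator.eq,
}

# Two-character operators are listed first so "<=" is not read as "<".
_PATTERN = re.compile(r"^\s*([\w.]+)\s*(<=|>=|!=|<|>|=)\s*(.+?)\s*$")


def get_path(doc, dotted):
    """Follow a dotted key such as 'a.b.c' through nested dicts."""
    value = doc
    for key in dotted.split("."):
        value = value[key]
    return value


def parse_value(text):
    """Read a literal as a number if possible, otherwise as a quoted string."""
    try:
        return float(text)
    except ValueError:
        return text.strip("'\"")


def check(doc, constraint):
    """Return True if the document satisfies one 'field op value' constraint."""
    match = _PATTERN.match(constraint)
    if match is None:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    field, op, literal = match.groups()
    return OPS[op](get_path(doc, field), parse_value(literal))


def violations(docs, constraints):
    """Yield (task_id, constraint) for every failed check."""
    for doc in docs:
        for constraint in constraints:
            if not check(doc, constraint):
                yield doc.get("task_id"), constraint


# Hypothetical documents, shaped like the fields named in the rules.
docs = [
    {"task_id": "mp-1", "final_energy_per_atom": -3.2,
     "initial_structure": {"lattice": {"volume": 40.0}}},
    {"task_id": "mp-2", "final_energy_per_atom": 0.5,
     "initial_structure": {"lattice": {"volume": 12.0}}},
]
rules = ["final_energy_per_atom <= 0", "initial_structure.lattice.volume > 0"]
failed = list(violations(docs, rules))
```

A nightly job in this style would collect `failed` and format it into the emailed report; a real engine would more likely push the constraints into database queries than scan documents in Python.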
Validation summary
Easy-to-use, integrated, efficient tools to report errors
Next steps
– Record all check results in DB
– More sophisticated checks (Map/Reduce)
– Make it easier to add new checks internally
– Make it easier for anyone to add new checks
• per-sandbox or even per-user ("MP Alerts")
Provenance: How do I know that the data is correct?
Types of provenance in the system
1) Calculation workflows
– FireWorks records calculation inputs, .. results in great detail
2) External datasets
– Structure Notation Language standardizes the naming of data sources and publications
3) Post-calculation data transformations
– New "builders" provide a framework for tracking creation of final database products
Provenance is available for every material
Provenance in DB
Structure Notation Language
"snl_final": {
  "about": {
    "created_at": {
      "string": "2014-02-22 19:07:00.383869",
      "@class": "datetime",
      "@module": "datetime"
    },
    "_materialsproject": {
      "submission_id": 52621,
      "snl_id": 398676,
      "spacegroup": {
        "lattice_type": "tetragonal",
        "symbol": "P4_2/mmc",
        "number": 131,
        "point_group": "4/mmm",
        "crystal_system": "tetragonal",
        "hall": "-P 4c 2"
      }
    },
    "_cedergroup": {
      "BURP_sids": [409544, 409545, 409546],
      "icsd_ids": [],
      "e_above_hull": 0.075125350000000423734
    },
    "references": "",
    "authors": [
      {
        "name": "Geoffroy Hautier",
        "email": "geoffroy.hautier@uclouvain.be"
      },
      {
        "name": "Bo Xu",
        "email": "boxu14@mit.edu"
      }
    ],
    "remarks": [
      "supplementary compounds from MIT matgen database"
    ],
    "projects": [
      "MIT matgen"
    ],
    "history": [
      {
        "url": "http://www.fiz-karlsruhe.de/icsd_home.html",
        "name": "Inorganic Crystal Structure Database",
        "description": {
          "Collection code": 24692
        }
      },
      {
        "url": "",
        "name": "",
        "description": {
          "source": null,
          "orig_name": "Basic substitution code.",
          "formula": "O1 Pd1"
        }
      },
      {
        "url": "http://ceder.mit.edu/",
        "name": "MIT Ceder group research database",
        "description": {
          "source": 105986,
          "orig_name": "",
          "formula": "FeO"
        }
      },
      {
        "url": "http://www.materialsproject.org",
        "name": "Materials Project structure optimization",
        "description": {
          "fw_id": 820305,
          "task_type": "GGA optimize structure (2x)",
          "task_id": "mp-753682"
        }
      },
      {
        "url": "http://www.materialsproject.org",
        "name": "Materials Project structure optimization",
        "description": {
          "fw_id": 820308,
          "task_type": "GGA+U optimize structure (2x)",
          "task_id": "mp-776678"
        }
      }
    ]
  },
[Callouts on the record: Metadata; Crystal DB sources; References; History of structure optimizations]
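As an illustration of how the "history" list in a record like the one above could be read back, the sketch below walks the entries and pulls out the identifiers each one carries. The helper name `summarize_history` and the choice of which keys count as identifiers are assumptions made for this example; the embedded record is a trimmed copy containing only the fields used here.

```python
import json

# Trimmed copy of the "about" block: only "history" is kept, with three of its entries.
RECORD = json.loads("""
{
  "about": {
    "history": [
      {
        "url": "http://www.fiz-karlsruhe.de/icsd_home.html",
        "name": "Inorganic Crystal Structure Database",
        "description": {"Collection code": 24692}
      },
      {
        "url": "",
        "name": "",
        "description": {"source": null,
                        "orig_name": "Basic substitution code.",
                        "formula": "O1 Pd1"}
      },
      {
        "url": "http://www.materialsproject.org",
        "name": "Materials Project structure optimization",
        "description": {"fw_id": 820305,
                        "task_type": "GGA optimize structure (2x)",
                        "task_id": "mp-753682"}
      }
    ]
  }
}
""")

# Keys treated as identifiers in this sketch (an assumption, not part of SNL).
ID_KEYS = ("Collection code", "fw_id", "task_id")


def summarize_history(about):
    """Return (source name, identifiers) for each history entry, in order."""
    summary = []
    for step in about.get("history", []):
        name = step.get("name") or "(unnamed source)"
        description = step.get("description") or {}
        ids = {key: description[key] for key in ID_KEYS if key in description}
        summary.append((name, ids))
    return summary


history = summarize_history(RECORD["about"])
for name, ids in history:
    print(name, ids)
```

Read in order, the entries trace a structure from its crystal database source through intermediate transformations to the Materials Project optimization tasks.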
Future work: unified view of provenance
[Diagram labels: ICSD; VASP result (three times); Data import; Computation; Post-processing; Material properties; processing, e.g., Defects]
Sandbox example: Multivalent
[Diagram labels: JCESR users; Non-JCESR users; Multivalent app]
Sandboxes = Database + Apps
[Diagram labels: Core data; Core data + multivalent materials; Non-JCESR users; JCESR users]
Technical challenges
• Pre-process data for real-time search
• Interfaces for per-user access control
– https://materialsproject.org/materials/1234?sandbox=jcesr
– Web UI elements
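The URL form above suggests one way per-user access control could work: read the `sandbox` query parameter and add the sandbox data only when the requesting user belongs to it. The sketch below is a guess at that logic, not the Materials Project implementation; the membership table, the user names, and the function names are all hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical membership table: which users may read which sandbox.
SANDBOX_MEMBERS = {"jcesr": {"alice"}}


def requested_sandbox(url):
    """Return the value of the 'sandbox' query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("sandbox")
    return values[0] if values else None


def visible_collections(url, user):
    """Core data is always visible; sandbox data only to members of that sandbox."""
    collections = ["core"]
    sandbox = requested_sandbox(url)
    if sandbox is not None and user in SANDBOX_MEMBERS.get(sandbox, set()):
        collections.append(sandbox)
    return collections


url = "https://materialsproject.org/materials/1234?sandbox=jcesr"
member_view = visible_collections(url, "alice")
outsider_view = visible_collections(url, "bob")
```

Falling back to core data for non-members, rather than returning an error, keeps the same URL usable by everyone, which matches the idea of a sandbox as core data plus extra material.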
Future: dynamic sandbox creation
Current:
– Large & significant additional data / apps
• e.g., JCESR
– Longer-term connections to MP data
• e.g., porous materials
– Companies
• e.g., VW/Stanford
Future:
– small collab.
– per-user?
– CoD?
Summary
• Validation
– guard against bugs by checking all data daily and at data import/creation time
• Provenance
– universal standard for annotating data provenance
• Sandboxes
– unified view of distinct databases
– onramp for new collaborations and data

Materials Project Validation, Provenance, and Sandboxes by Dan Gunter


Editor's notes

  1. Picture of 1915 Heinrich Campendonk painting, "Landscape with horses". Steve Martin paid $850K for a forged version of the painting, from a reputable art house in Paris, in 2004. He sold it at a loss of $250K before discovering it was a forgery. The forgery was performed by Wolfgang Beltracchi.
  2. Sandboxes are a way to share preliminary data in the context of MP data and tools.