Open Force Field:
Scavenging pre-emptible
CPU hours* in the age of
COVID
Jeff Wagner, Open Force Field Scavenger King Technical Lead
*and panel spots
www.openforcefield.org
Automated
infrastructure enables
rapid experimentation
with minimum human
intervention
OPEN SOFTWARE
Access to large, high
quality experimental
and quantum chemical
data facilitates easy
curation of balanced
train / test sets
OPEN DATA
Exploring new force
field science:
hypothesis - build
software - train - test -
iterate
is now almost routine
OPEN Software, OPEN Data, and OPEN Science are rapidly
facilitating force field science!
OPEN SCIENCE
What is a force field?
$U = k_{OH}\,(r - r_{eq})^2$
Many more
terms…
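The harmonic bond term on this slide, U = k_OH (r - r_eq)^2, can be sketched in a few lines of Python. This is only an illustration of the functional form: the function names and the numerical values of `K_OH` and `R_EQ` are assumptions chosen for the example, not fitted OpenFF parameters.

```python
def bond_energy(r, k_oh, r_eq):
    """Harmonic bond energy U = k_OH * (r - r_eq)**2, as written on the slide."""
    return k_oh * (r - r_eq) ** 2


def bond_force(r, k_oh, r_eq):
    """Force along the bond, F = -dU/dr = -2 * k_OH * (r - r_eq)."""
    return -2.0 * k_oh * (r - r_eq)


# Illustrative placeholder values only (assumed, not fitted OpenFF parameters).
K_OH = 500.0  # force constant, kcal/mol/A^2
R_EQ = 0.96   # equilibrium O-H distance, A

if __name__ == "__main__":
    # A stretched bond is pulled back (negative force); a compressed one is pushed out.
    for r in (0.90, 0.96, 1.02):
        energy = bond_energy(r, K_OH, R_EQ)
        force = bond_force(r, K_OH, R_EQ)
        print(f"r = {r:.2f} A  U = {energy:.3f}  F = {force:+.1f}")
```

The two fitted parameters mentioned in the notes map onto `k_oh` (the spring constant) and `r_eq` (the equilibrium bond length); the force is what a simulation engine would use to move the atoms at each timestep.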
Training new force fields requires new data
Large molecule datasets (SMILES strings)
Quantum chemical calculation results (times thousands of molecules)
What Open Force Field does
PRP is capable of running enormous quantum chemistry
workloads
OpenFF-1.0.0 released OpenFF-2.0.0 released
OpenFF begins using Nautilus
OpenFF force fields are state-of-the-art
OpenFF 2.0.0 outperforms other public small molecule force fields
and OpenFF force fields continue to improve
OpenFF 2.0.0 outperforms OpenFF 1.2.1, GAFF 2.1, and CGENFF in free energy calculations. The
proprietary OPLS3e force field shows the best performance.
Dataset listing: https://qcarchive.molssi.org/apps/ml_datasets/
Python example notebooks for data access: https://qcarchive.molssi.org/examples/
OpenFF’s dataset lifecycle: https://github.com/openforcefield/qca-dataset-submission/projects/1
The datasets on QCArchive are fully open!


Editor's notes

  1. Consider a water molecule moving around in space. Think of it like a ball and stick model. It’s going to have a list of different inter- and intramolecular motions and interactions. Let’s just consider the H-O bonds. In liquid water, these bonds stretch a bit back and forth, more or less symmetrically around an equilibrium distance we’ll call r_eq. This is modelled with a harmonic oscillator, the same one from physics class. Here there are two fitted parameters, one value for the equilibrium bond length and another for the force constant of the spring. In this case we can reliably fit these values to a combination of experimental data and highly accurate quantum chemical calculations, so we can with pretty good accuracy model this particular detail of the motion of water. There are approximations that can be made for the other forces that atoms feel, which I won’t go into here. But when you add them all up, you get a physical model of a molecule that you can propagate forward through time, on the scale of femtoseconds per timestep. If these models are accurate enough, you can use them to guide research into chemicals with desired properties, for example finding drug candidates that bind to a protein or polymers that have a desired property.
  2. The basic task here is to take a molecule and run QM calculations on it to determine its optimized geometry, which serves as a sort of baseline reference from first-principles physics that can be used in fitting and, later, in benchmarking. It’s not such a fundamentally difficult thing to do with one of the many off-the-shelf tools, but we need to do this at massive scale. Depending on the direction that fitting experiments go, we might have data sets of thousands to tens of thousands of molecules that are used in individual fits, so generating these data sets must be automated and standardized. There are a lot of different methods and settings that scientists might want to use in the QM space, so there must be a way to coherently communicate with different programs. Each of these calculations takes on the order of a few hours to days per molecule, so we’re talking about a large amount of compute to leverage this data. And finally, the actual results must be stored in a way that’s rapidly accessible for future fitting experiments - ideally in a publicly accessible database so that the community can make use of our compute without needing to ask to create an account on one of our clusters or shipping hard drives around.
  3. What we do: we create new, comprehensive force fields and systematically improve their accuracy through scientific innovation and large, high-quality datasets. This schematic gives an overview of that process. We begin by generating, curating, and sharing the datasets necessary for producing and benchmarking high-accuracy force fields. We create and maintain open-source software for systematic, automated parameter optimization against our curated datasets. Finally, we benchmark the force field to evaluate whether it has been significantly improved. If force fields do not meet our standards, they return to the pipeline. All infrastructure, datasets, and force fields created during this process are released openly with permissive licenses so users can rapidly use, modify, and extend our work.
  4. PRP is used heavily in this force field creation workflow during the QC data generation stage.
  5. This is all a pretty tall task, but fortunately it’s mostly a solved problem. We use QCArchive, which is a project out of the Molecular Sciences Software Institute (MolSSI) at Virginia Tech. It’s basically a public archive of QM calculations but also includes a lot of infrastructure for generating new data, including talking to different QM engines via a unified interface (QCEngine). Two of the key contributors to the project (Daniel Smith and Lori Burns) gave a SciPy talk about it a few years ago. The scale of the project has grown and the backend has been partially rewritten since then, but the talk holds up today. QCArchive handles most of the hard stuff - storing results in a database, running QM calculations with Psi4 - and we built a tool that makes our communication with it a little easier, since there’s some pre-processing we need to do before sending stuff off to QM. This tool is called QCSubmit and pretty elegantly handles the tasks of “I have a bunch of molecules, please run QM calculations on these” and, later, “please go fetch for me QM calculations on these molecules.” Some of our calculations are run by grad students and postdocs running “QC managers” as cluster jobs at their respective universities. But something like half of our total compute is run on Nautilus. It’s a uniquely suitable compute backend to pair with the NSF-funded MolSSI QCArchive project, which was designed to take advantage of preemptible compute. Our datasets consist of hundreds to millions of jobs, each requiring tens to thousands of CPU-hours and 8-32 GB of RAM.
  6. And so, in combination with the efforts of modeling scientists and engineers, the enormous amount of training data generated by PRP has helped us release continuous improvements to our force fields, such that our models are now comparable to other academic models with decades of development, and we’re closing in on the accuracy of models developed by for-profit chemical modeling software vendors.
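
The notes above describe running QM jobs on preemptible compute, where a worker can be reclaimed at any moment. One common way to tolerate that is to put an interrupted task back on the queue so another worker picks it up. The toy simulation below sketches that idea under stated assumptions: `run_with_preemption`, its parameters, and the random preemption model are hypothetical and written for illustration. It is not how QCArchive or the "QC managers" are actually implemented.

```python
import random
from collections import deque


def run_with_preemption(task_ids, preempt_prob=0.3, seed=0):
    """Simulate running tasks on workers that may be preempted.

    A preempted task loses its partial work and is requeued. Returns a tuple
    (completed, attempts): the task ids in completion order, and a dict
    mapping each task id to the number of attempts it needed.
    """
    if not 0.0 <= preempt_prob < 1.0:
        raise ValueError("preempt_prob must be in [0, 1) so every task can finish")

    rng = random.Random(seed)  # seeded so the simulation is repeatable
    queue = deque(task_ids)
    attempts = {task_id: 0 for task_id in task_ids}
    completed = []

    while queue:
        task_id = queue.popleft()
        attempts[task_id] += 1
        if rng.random() < preempt_prob:
            # Worker reclaimed mid-calculation: send the task back to the queue.
            queue.append(task_id)
        else:
            completed.append(task_id)

    return completed, attempts


if __name__ == "__main__":
    molecules = [f"mol-{i}" for i in range(10)]  # hypothetical task names
    done, tries = run_with_preemption(molecules, preempt_prob=0.5, seed=1)
    print(f"{len(done)} tasks finished in {sum(tries.values())} attempts")
```

Because each task is independent and simply retried, losing a worker costs only the time spent on the interrupted job, which is why a workload made of many separate QM calculations fits preemptible resources well.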