Drug discovery at 2x speed. Faster, more comprehensive testing approval processes. Identifying gene targets in massive sequencing data sets. These goals are ambitious yet attainable, but not without increasing the computational capabilities of today's researchers. While everyone agrees that simply deploying more infrastructure is not the answer, running that work in the cloud is not without challenges. In this talk we will discuss and illustrate elements of those workloads that Cycle Computing's customers have run on AWS, generating vastly better results than would have been attained on traditional infrastructure. We will cover some common problems they encountered, and how they resolved them using Amazon EC2, S3, Glacier, and Cycle's software.
Presenters: Dougal Ballantyne, Business Development, AWS; Rob Futrick, CTO, Cycle Computing
2. CONFIDENTIAL
Why AWS for HPC?
Low cost with flexible pricing
Efficient clusters
Unlimited infrastructure
Faster time to results
Concurrent clusters on-demand
Increased collaboration
Schrödinger Materials Science Tools
Estimated $68M for a cluster purchase,
or 200 years on an on-premises machine
vs
150,000-core analytics job run on the AWS cloud,
completed in 18 hours
using 1.21 petaflops of computing capacity at peak…
…for a total of $33K
Novartis
Estimated 50,000 cores and $40M to experiment internally
vs
10,600 Spot Instances, ~87,000 compute cores
39 years of computational chemistry in 9 hours…
…for a total of $4,232
But cloud provides more than scale
• Compliance
• Data management
– Secure
– Integrated Lifecycle Management
• Collaboration
– Real-time desktop sharing
– Controlled sharing of data
Cloud helped…
• Arkema – Comp Chem
• Tute Genomics – NGS
• J&J – PK/PD, clinical trial simulation
• Novartis Institutes for Biomedical Research – Drug Discovery
• Large BioTech – Petabyte+ genome data archiving
• J&J – Statistical modeling, data archival, and computation
Patterns
For users, the focus should be on science, not IT.
Easy access to compute changes everything.
Accelerating compute accelerates people.
Data wants to be stored and processed.
Arkema: Computational Chemistry
The Problem in 2015:
• Need to run applications such as Gromacs, LAMMPS, & Quantum Espresso
• No internal option to procure or support a cluster
• Small amount of compute needed
Solution: AWS & Cycle
Create fully functional compute clusters - of a few nodes - on demand.
[Diagram: data workflow and cloud orchestration layers supporting analytics, modeling, and compute workflows]
Tute Genomics: NGS
The Problem in 2015:
• Need to run an in-house genome sequencing and analysis pipeline
• No internal option to procure or support a cluster
• Small initial compute needs
Solution: AWS & Cycle
Create fully functional compute clusters on demand.
Users are focused on science, not cluster management.
J&J: Clinical Trial Simulations
The Problem:
• Need to run multiple versions of apps like NONMEM in qualified and validated environments
• Environments must be maintained for years!
• Need to replace end-of-life (EOL) infrastructure
Solution: AWS & Cycle
Create qualified and validated compute environments on demand in AWS.
Expected Impact
CURRENT PROCESS (in hours):
Computing (720) → Analysis (720) → Computing (720) → Analysis (720)
= 2880 hours / 120 days to results
ANTICIPATED BENEFIT (in hours):
Computing (8) → Analysis (720) → Computing (8) → Analysis (720)
= 1456 hours / 60.6 days to results
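The arithmetic behind the two timelines is easy to check directly (a quick sketch assuming four sequential phases and 24-hour days, with phase lengths taken from the slide):

```python
# Hours per phase, taken from the two timelines above.
current = [720, 720, 720, 720]       # compute, analyze, compute, analyze
anticipated = [8, 720, 8, 720]       # cloud shrinks each compute phase to 8 h

total_current = sum(current)         # 2880 hours
total_anticipated = sum(anticipated) # 1456 hours
print(total_current / 24)            # 120.0 days to results
print(total_anticipated / 24)        # ~60.7 days to results
```

The analysis phases dominate the post-adoption timeline, which is why the later "agile" slide focuses on interleaving analysis with compute rather than shrinking compute further.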
Benefit: 2-3X faster time to results
CURRENT PROCESS (in hours):
Computing (720) → Analysis (720) → Computing (720) → Analysis (720)
= 2880 hours / 120 days to results
POST ADOPTION: AGILE DESIGN PROCESS
Interleaved computing (8-hour runs) and analysis
Higher quality output, iterative analysis, less context switching
Transform Healthcare/Life Sciences
The Problem in 2013:
• Cancer research needed 50,000 cores, not available in-house
The options they didn't choose:
• Buy infrastructure: spend $2M, wait 6 months
• Spend 9-12 months writing software for this one app
Solution:
• Created a 10,600-server cluster
• 39 years of computational chemistry in 9 hours
• Found 3 potential drug candidates!
• Total infrastructure bill: $4,232
San Diego BioTech
• 1+ petabytes of data
• AWS Direct Connect
• Uses DataMan to fully utilize bandwidth
– Encryption keys managed internally
– Scheduled and just-in-time transfers, easy for users
[Diagram: 1-petabyte internal file system, behind the firewall, transferring over HTTPS (command-line and scheduled transfers) to Amazon S3 and Amazon Glacier]
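The "encryption keys managed internally" pattern maps onto S3's SSE-C feature: the customer supplies the key with each request over HTTPS, S3 uses it to encrypt the object and then discards it, so keys never live in AWS. A minimal sketch of the raw S3 REST headers involved (this illustrates the S3 API, not DataMan's actual implementation):

```python
import base64
import hashlib

def ssec_headers(key_bytes: bytes) -> dict:
    """Build the S3 REST headers for SSE-C (server-side encryption with
    customer-provided keys). The 256-bit key travels base64-encoded over
    HTTPS with an MD5 checksum; S3 does not store it after the request."""
    if len(key_bytes) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) AES key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key_bytes).decode(),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key_bytes).digest()).decode(),
    }
```

The same key must be presented again on every GET, which is what keeps key custody entirely on the customer side of the firewall.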
Patterns
For users, the focus should be on science, not IT.
Easy access to compute changes everything.
Accelerating compute accelerates people.
Data wants to be stored and processed.
The Challenge for the Scientist
• Dr. Mark Thompson
• “Solar energy has the potential to replace some of our dependence on fossil fuels, but only if the solar panels can be made very inexpensively and have reasonable to high efficiencies. Organic solar cells have this potential.”
WELCOME
Everybody enjoying re:Invent?
Show of hands: 1st re:Invent, 2nd re:Invent, 3rd re:Invent, last night’s pub crawl?
Big welcome to those attending their 1st re:Invent.
Huge thank you to those attending their 2nd and 3rd.
Thank you for coming to this session.
My name is Dougal and I am a Solutions Architect with a focus on High Performance Computing. In this session, I am going to cover how customers run HPC workloads on AWS using Amazon EC2, S3, Glacier, and Cycle Computing’s software.
Unlimited infrastructure – increased scalability and elasticity – go from tens of instances to thousands.
Efficient clusters – our compute instances have high efficiency when comparing actual vs. theoretical performance (Rmax/Rpeak) – this means you need fewer nodes than in an inefficient cluster – HUGE potential cost savings. Tune your cluster, don’t tune to your cluster – with AWS you can pick the right EC2 instance type instead of being forced to optimize your workload for the fixed cluster you have in-house.
Low cost with flexible pricing – multiple pricing models, pay as you go, no CAPEX.
Increased collaboration – clusters and data can be accessed from anywhere with an internet connection.
Faster time to results – focus on your business/science, and increase the efficiency of your people by not burdening them with IT. While you can benchmark a cluster on the performance of a single job, it is more important and comprehensive to benchmark the total time it takes to provision and use the cluster end-to-end.
Concurrent clusters on demand – no more waiting in a queue; run multiple jobs simultaneously with an API call.
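The "API call" behind on-demand clusters can be sketched with boto3's `request_spot_instances` call. The instance type, AMI ID, and price cap below are illustrative placeholders, and Cycle's software automates this at much larger scale; this is only a minimal sketch of the underlying EC2 API:

```python
def spot_cluster_request(count: int, instance_type: str,
                         ami_id: str, max_price: float) -> dict:
    """Build the parameters for a one-time EC2 Spot request sized to a job.

    All values here (instance type, AMI ID, price cap) are hypothetical
    examples, not recommendations."""
    return {
        "SpotPrice": str(max_price),      # cap on the hourly Spot price
        "InstanceCount": count,           # one API call, many nodes
        "Type": "one-time",               # the cluster exists only for this job
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
        },
    }

# Usage sketch (requires AWS credentials; names are illustrative):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.request_spot_instances(
#     **spot_cluster_request(100, "c4.8xlarge", "ami-12345678", 0.50))
```

Because each request is independent, several clusters can run concurrently, which is exactly the "no more waiting in a queue" point above.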
When Schrödinger’s Materials Science tools were used to test 200,000 different organic compounds to see which ones could be a good fit for photovoltaic electricity generation, the amount of data to deal with was an inhibiting factor, to say the least. They estimated it would take $68 million and 200 years on an on-premises machine. Instead, a roughly 150,000-core analytics job ran on Amazon’s cloud in 18 hours for $33,000 and exceeded 1.21 petaflops of computing capacity at peak.
In 2013, Novartis ran a project that involved virtually screening 10 million compounds against a common cancer target in less than a week. They calculated that it would take 50,000 cores and close to a $40 million investment if they wanted to run the experiment internally. Partnering with Cycle Computing and Amazon Web Services (AWS), Novartis built a platform leveraging Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), and four Availability Zones. The project ran across 10,600 Spot Instances (approximately 87,000 compute cores) and allowed Novartis to conduct 39 years of computational chemistry in 9 hours for a cost of $4,232. Out of the 10 million compounds screened, three were successfully identified. Novartis Uses AWS to Conduct 39 Years of Computational Chemistry In 9 Hours
Now – that makes a great conversation!!
-----
A shout-out – so what are some of the use cases for Spot?
It is about the whole platform
Help people in a lot of industries
Here to talk about Life Sciences
What I like best about what we do is to enable researchers to think about what is required to solve their problem, rather than what is in their datacenter, if they can even get access to it.
Ever touched a server?
Ever touched a rack?
Who here already uses AWS? How many would consider themselves expert/advanced AWS users?
Too small when you need it most, too large every other time.
A fixed-size cluster used for variable workloads means:
Looooong wait times for users
The more successful the organization, the longer the queue times for users
A shared model – but how do you know you are getting your fair share?
Faster depreciation than a new sports car
Data wants to be:
stored properly
processed
Key Points:
Multi-billion dollar corps committed to getting better answers faster
(key on the hook)