Drug discovery at 2x speed. Faster, more comprehensive testing approval processes. Identifying gene targets in massive sequencing data sets. These goals are ambitious yet attainable, but not without increasing the computational capabilities of today's researchers. While everyone agrees that simply deploying more infrastructure is not the answer, running that work in the cloud is not without challenges. In this talk we will discuss and illustrate elements of those workloads that Cycle Computing's customers have run on AWS, generating vastly better results than would have been attained on traditional infrastructure. We will cover some common problems they encountered, and how they resolved them using Amazon EC2, S3, Glacier, and Cycle's software.
Presenters: Dougal Ballantyne, Business Development, AWS; Rob Futrick, CTO, Cycle Computing
2. CONFIDENTIAL
Why AWS for HPC?
Low cost with flexible pricing
Efficient clusters
Unlimited infrastructure
Faster time to results
Concurrent clusters on-demand
Increased collaboration
Schrödinger Materials Science Tools
Estimated $68M for a cluster purchase,
or 200 years on an on-premises machine
vs
150,000-core analytics job run on the AWS cloud,
completed in 18 hours
using 1.21 petaflops of computing capacity at peak…
…for a total of $33K
Novartis
Estimated 50,000 cores and $40M to experiment internally
vs
10,600 Spot Instances, ~87,000 compute cores
39 years of computational chemistry in 9 hours…
…for a total of $4,232
But cloud provides more than scale
• Compliance
• Data management
– Secure
– Integrated Lifecycle Management
• Collaboration
– Real-time desktop sharing
– Controlled sharing of data
Cloud helped…
• Arkema – Comp Chem
• Tute Genomics – NGS
• J&J – PK/PD, clinical trial simulation
• Novartis Institutes for Biomedical Research – Drug Discovery
• Large BioTech – Petabyte+ genome data archiving
• J&J – Statistical modeling, data archival, and computation
Patterns
For users, the focus should be on science, not IT.
Easy access to compute changes everything.
Accelerating compute accelerates people.
Data wants to be stored and processed.
Arkema: Computational Chemistry
The Problem in 2015:
• Need to run applications such as Gromacs, LAMMPS, & Quantum Espresso
• No internal option to procure or support a cluster
• Small amount of compute needed
Solution: AWS & Cycle
Create fully functional compute clusters - of a few nodes - on demand.
[Diagram: data workflow and cloud orchestration layers supporting analytics, modeling, and compute workflows]
Tute Genomics: NGS
The Problem in 2015:
• Need to run an in-house genome sequencing and analysis pipeline
• No internal option to procure or support a cluster
• Small initial compute needs
Solution: AWS & Cycle
Create fully functional compute clusters on demand.
Users are focused on science, not cluster management.
J&J: Clinical Trial Simulations
The Problem:
• Need to run multiple versions of apps like NONMEM in qualified and validated environments
• Environments must be maintained for years!
• Need to replace end-of-life (EOL) infrastructure
Solution: AWS & Cycle
Create qualified and validated compute environments on demand in AWS.
Expected Impact
CURRENT PROCESS (in hours):
Computing (720) → Analysis (720) → Computing (720) → Analysis (720)
= 2880 hours / 120 days to results
ANTICIPATED BENEFIT (in hours):
Computing (8) → Analysis (720) → Computing (8) → Analysis (720)
= 1456 hours / 60.6 days to results
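The arithmetic behind the two timelines is easy to check directly (a quick sketch assuming four sequential phases and 24-hour days, with phase lengths taken from the slide):

```python
# Hours per phase, taken from the two timelines above.
current = [720, 720, 720, 720]       # compute, analyze, compute, analyze
anticipated = [8, 720, 8, 720]       # cloud shrinks each compute phase to 8 h

total_current = sum(current)         # 2880 hours
total_anticipated = sum(anticipated) # 1456 hours
print(total_current / 24)            # 120.0 days to results
print(total_anticipated / 24)        # ~60.7 days to results
```

The analysis phases dominate the post-adoption timeline, which is why the later "agile" slide focuses on interleaving analysis with compute rather than shrinking compute further.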
Benefit: 2-3X faster time to results
CURRENT PROCESS (in hours):
Computing (720) → Analysis (720) → Computing (720) → Analysis (720)
= 2880 hours / 120 days to results
POST ADOPTION: AGILE DESIGN PROCESS
Interleaved computing (8-hour runs) and analysis
Higher quality output, iterative analysis, less context switching
Transform Healthcare/Life Sciences
The Problem in 2013:
• Cancer research needed 50,000 cores, not available in-house
The options they didn't choose:
• Buy infrastructure: spend $2M, wait 6 months
• Spend 9-12 months writing software for this one app
Solution:
• Created a 10,600-server cluster
• 39 years of computational chemistry in 9 hours
• Found 3 potential drug candidates!
• Total infrastructure bill: $4,232
San Diego BioTech
• 1+ petabytes of data
• AWS Direct Connect
• Uses DataMan to fully utilize bandwidth
– Encryption keys managed internally
– Scheduled and just-in-time transfers, easy for users
[Diagram: 1-petabyte internal file system, behind the firewall, transferring over HTTPS (command-line and scheduled transfers) to Amazon S3 and Amazon Glacier]
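The "encryption keys managed internally" pattern maps onto S3's SSE-C feature: the customer supplies the key with each request over HTTPS, S3 uses it to encrypt the object and then discards it, so keys never live in AWS. A minimal sketch of the raw S3 REST headers involved (this illustrates the S3 API, not DataMan's actual implementation):

```python
import base64
import hashlib

def ssec_headers(key_bytes: bytes) -> dict:
    """Build the S3 REST headers for SSE-C (server-side encryption with
    customer-provided keys). The 256-bit key travels base64-encoded over
    HTTPS with an MD5 checksum; S3 does not store it after the request."""
    if len(key_bytes) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) AES key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key_bytes).decode(),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key_bytes).digest()).decode(),
    }
```

The same key must be presented again on every GET, which is what keeps key custody entirely on the customer side of the firewall.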
Patterns
For users, the focus should be on science, not IT.
Easy access to compute changes everything.
Accelerating compute accelerates people.
Data wants to be stored and processed.
The Challenge for the Scientist
• Dr. Mark Thompson
• “Solar energy has the potential to replace some of our dependence on fossil fuels, but only if the solar panels can be made very inexpensively and have reasonable to high efficiencies. Organic solar cells have this potential.”
WELCOME
Everybody enjoying re:Invent?
Show of hands: 1st re:Invent, 2nd re:Invent, 3rd re:Invent, last night’s pub crawl?
Big welcome to those attending their 1st re:Invent.
Huge thank you to those attending their 2nd and 3rd.
Thank you for coming to this session.
My name is Dougal and I am a Solutions Architect with a focus on High Performance Computing. In this session, I am going to cover how customers run HPC workloads on AWS using Amazon EC2, S3, Glacier, and Cycle Computing’s software.
Unlimited infrastructure – increased scalability and elasticity – go from tens of instances to thousands.
Efficient clusters – our compute instances have high efficiency when comparing actual vs. theoretical performance (Rmax/Rpeak) – this means you need fewer nodes than in an inefficient cluster – HUGE potential cost savings. Tune your cluster, don’t tune to your cluster – with AWS you can pick the right EC2 instance type instead of being forced to optimize your workload for the fixed cluster you have in-house.
Low cost with flexible pricing – multiple pricing models, pay as you go, no CAPEX.
Increased collaboration – clusters and data can be accessed from anywhere with an internet connection.
Faster time to results – focus on your business/science, and increase the efficiency of your people by not burdening them with IT. While you can benchmark a cluster on the performance of a single job, it is more important and comprehensive to benchmark the total time it takes to provision and use the cluster end-to-end.
Concurrent clusters on demand – no more waiting in a queue; run multiple jobs simultaneously with an API call.
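The "API call" behind on-demand clusters can be sketched with boto3's `request_spot_instances` call. The instance type, AMI ID, and price cap below are illustrative placeholders, and Cycle's software automates this at much larger scale; this is only a minimal sketch of the underlying EC2 API:

```python
def spot_cluster_request(count: int, instance_type: str,
                         ami_id: str, max_price: float) -> dict:
    """Build the parameters for a one-time EC2 Spot request sized to a job.

    All values here (instance type, AMI ID, price cap) are hypothetical
    examples, not recommendations."""
    return {
        "SpotPrice": str(max_price),      # cap on the hourly Spot price
        "InstanceCount": count,           # one API call, many nodes
        "Type": "one-time",               # the cluster exists only for this job
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
        },
    }

# Usage sketch (requires AWS credentials; names are illustrative):
# import boto3
# ec2 = boto3.client("ec2")
# ec2.request_spot_instances(
#     **spot_cluster_request(100, "c4.8xlarge", "ami-12345678", 0.50))
```

Because each request is independent, several clusters can run concurrently, which is exactly the "no more waiting in a queue" point above.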
When Schrödinger’s Materials Science tools were used to test 200,000 different organic compounds to see which ones could be a good fit for photovoltaic electricity generation, the amount of data to deal with was an inhibiting factor, to say the least. They estimated it would take $68 million and 200 years on an on-premises machine. Instead, a roughly 150,000-core analytics job ran on Amazon’s cloud in 18 hours for $33,000 and exceeded 1.21 petaflops of computing capacity at peak.
In 2013, Novartis ran a project that involved virtually screening 10 million compounds against a common cancer target in less than a week. They calculated that it would take 50,000 cores and close to a $40 million investment if they wanted to run the experiment internally. Partnering with Cycle Computing and Amazon Web Services (AWS), Novartis built a platform leveraging Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), and four Availability Zones. The project ran across 10,600 Spot Instances (approximately 87,000 compute cores) and allowed Novartis to conduct 39 years of computational chemistry in 9 hours for a cost of $4,232. Out of the 10 million compounds screened, three were successfully identified. Novartis Uses AWS to Conduct 39 Years of Computational Chemistry In 9 Hours
Now – that makes a great conversation!!
-----
A shout-out – so what are some of the use cases for Spot?
It is about the whole platform
Help people in a lot of industries
Here to talk about Life Sciences
What I like best about what we do is to enable researchers to think about what is required to solve their problem, rather than what is in their datacenter, if they can even get access to it.
Ever touched a server?
Ever touched a rack?
Who here already uses AWS? How many would consider themselves expert/advanced AWS users?
Too small when you need it most, too large every other time.
A fixed-size cluster used for variable workloads means:
Looooong wait times for users
The more successful the organization, the longer the queue times for users
A shared model – but how do you know you are getting your fair share?
Faster depreciation than a new sports car
Data wants to be:
stored properly
processed
Key Points:
Multi-billion dollar corps committed to getting better answers faster
(key on the hook)