1
Tutorial & Best Practices:
Running EDA Workloads in the Cloud
Rob Lalonde, VP & GM Cloud
Bill Bryce, VP Products
2
About Univa
• Leader in HPC workload management
• 250 global customers
• Hybrid, dedicated, private clouds
• 3.3M+ cores under management
• EDA, Manufacturing, Life Sciences, Oil & Gas,
Government, Research & Edu, Transportation
• Trusted by leading manufacturers
3
Key focus area: Optimize cloud workloads
Advanced workload management and resource sharing
• Accelerate regression testing with high-throughput
workload scheduling
• Share resources optimally between diverse
workloads and different design efforts
• Maximize EDA license utilization with license
orchestration software
Cloud migration, automation, and spend management
• Easily extend on-prem environments to the
cloud to meet peak demand
• Deploy cloud resources optimally for each
simulation, place workloads correctly
• Maximize the efficiency of cloud resource
usage with automation and spend management
4
2019 Univa InsideHPC cloud survey results
• 92% are using or open to HPC cloud, up 50% from 2017
• 64% say cloud has proven value or high potential
• Respondents see value in cloud spend association, BUT 76% have no
automated solution: 27% need help, 27% manual, 22% other
• Dedicated or hybrid cloud? 34% dedicated, 20% hybrid, 47% both
• 31% in production
• SLURM and Grid Engine represent the majority of HPC cloud workloads
[Infographic: the remaining figures (84%, 27%, 50%, 54%, 77%, 8%, 75%) and
labels ("What we spend": < $10K, $10k to $100k, > $100k; "Spend monthly";
"Power Users"; "SLURM or Grid Engine") cannot be paired reliably from the
extracted text]
Univa sponsored survey – 2019 InsideHPC: Cloud Adoption for HPC: Trends and Opportunities
https://insidehpc.com/white-paper/cloud-adoption-for-hpc-trends-and-opportunities/
5
What customers tell us
• Increasing design complexity, higher gate counts
• Need for higher quality & reliability driving coverage requirements
(IoT, SoC embedded, medical devices, etc.)
• Shorter product cycles, time-to-market
• Many simulation types: analog, digital, functional, system-level,
multi-physics, ML
• Need to maximize EDA tool utilization
• Limited data center capacity and IT budgets
More than any other industry, EDA users are
continuously challenged to do more with less
6
A typical design environment
[Diagram components]
• Interactive users; Project A, Project B, Project C
• License Server(s): FlexNet Publisher, EDA software licenses,
license sharing policies, Univa License Orchestrator
• Workload Management
• On-premise infrastructure: general-purpose simulation,
high-throughput servers, place and route servers
(managed network, uniform DNS name-space)
• Cloud instances: instance provisioning through cloud APIs
(managed network, uniform DNS name-space)
[Workloads listed on the slide]
• Gate Level Simulations (GLS)
• Register Transfer Level Simulations
• Transistor Level Modeling (TLM)
• Physical Verification
• Dynamic IR analysis
• Placement and clock optimization
• Static Timing Analysis (STA)
• Circuit Simulation
• Routing
7
Use case #1: Cloud automation
Boost license utilization, reduce CapEx
• EDA environments frequently have “bursty
workloads” – overlapping projects, different
resource requirements at different phases
• For cloud to be practical, cloud provisioning
needs to be automated and transparent to users
• “Bring-your-own-image” functionality (BYOI) for
straightforward cloud migration
• Automate runtime decisions to avoid
administrator effort and potential human error
• Maximize EDA license utilization to improve
overall productivity
CHALLENGE:
• Bursty simulation & verification workloads
• Need to defer/reduce CapEx
• On-premise cluster right-sized for day-to-day workloads
• On-premise EDA licenses underutilized
SOLUTION:
• Hybrid Cloud – Navops Launch, Univa Grid Engine
• Auto-scale cloud capacity based on workloads
• Automated data migration to and from the cloud
• Analytics and license management
BUSINESS VALUE:
• Avoid bottlenecks during critical tapeout periods
• Reduce costs - pay for cloud when needed
• Maximize on-premise license usage by shifting non-licensed
work to the cloud, improving overall productivity
Details at: https://blogs.univa.com/2020/01/mission-is-possible-tips-on-building-a-million-core-cluster/
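The "auto-scale cloud capacity based on workloads" idea can be sketched as a simple sizing rule. This is a minimal illustration, not how Navops Launch or Univa Grid Engine actually decide; the function name, its parameters, and the example numbers are all hypothetical.

```python
import math


def instances_to_launch(pending_slots, free_slots, slots_per_instance,
                        running_cloud_instances=0, max_cloud_instances=100):
    """Return how many extra cloud instances to request (hypothetical rule).

    pending_slots: job slots requested by queued work
    free_slots: idle slots already available (on-premise or cloud)
    slots_per_instance: job slots one cloud instance provides
    """
    if slots_per_instance <= 0:
        raise ValueError("slots_per_instance must be positive")
    # Only demand that idle capacity cannot absorb should burst to the cloud.
    overflow = max(0, pending_slots - free_slots)
    needed = math.ceil(overflow / slots_per_instance)
    # Respect a hard cap so a burst cannot grow without limit.
    headroom = max(0, max_cloud_instances - running_cloud_instances)
    return min(needed, headroom)


# 500 queued slots, 120 idle, 16 slots per instance, 10 of 30 allowed running.
print(instances_to_launch(500, 120, 16,
                          running_cloud_instances=10,
                          max_cloud_instances=30))
```

A real policy would also scale down idle instances and account for instance start-up time; both are left out here to keep the rule readable.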
8
Use case #2: Cloud simulation at extreme scale
Deploying a 1M+ vCPU cluster
• EDA verification and regression tests can run for
days and account for approx. 80% of workloads
• Cloud capacity can dramatically reduce runtime
• Benefits: Reduced cycle time, more thorough
verification, higher quality, reduced schedule risk
• Many technical challenges solved: checkpointing,
reclaim rates, container registries, API calls etc.
CHALLENGE:
• Engineering design for next-gen hard disk drives
• Requires complex multi-physics simulations
• 2.5 million tasks require days on premise
• Need capacity for more complex designs
SOLUTION:
• Navops Launch – deployed 1M+ vCPU cluster in 90 mins
• 40,000 cloud instances, instances come and go
• Leveraged containerized workloads
• Lower costs with preemptible VMs, spot fleets
BUSINESS VALUE:
• ~60x reduction in runtime – 20 days to 8 hours
• Estimated 50% cost reduction vs on-prem resources
• Increased capacity for new product development
Details at: https://blogs.univa.com/2020/01/mission-is-possible-tips-on-building-a-million-core-cluster/
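The headline figures for this use case can be checked with a few lines of arithmetic. The inputs are the numbers quoted on the slide; "1M+ vCPUs" is treated as exactly one million, so the per-instance average is a lower bound.

```python
# Figures quoted on the slide.
on_prem_hours = 20 * 24   # 20 days on premise
cloud_hours = 8           # 8 hours in the cloud
vcpus = 1_000_000         # "1M+" taken as a lower bound
instances = 40_000
tasks = 2_500_000

speedup = on_prem_hours / cloud_hours
avg_vcpus_per_instance = vcpus / instances
tasks_per_hour = tasks / cloud_hours

print(f"speedup: {speedup:.0f}x")
print(f"average vCPUs per instance: {avg_vcpus_per_instance:.0f}")
print(f"tasks completed per hour: {tasks_per_hour:,.0f}")
```

The 480-hour to 8-hour reduction matches the "~60x" claim, and 40,000 instances averaging about 25 vCPUs each account for the million-vCPU total.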
9
Use case #3: Optimize cloud instance selection
• Different tools have different requirements
• For licensed tools, it can be more economical to
underutilize machine resources!
• Optimizing selection is a function of license and
instance costs, and tool performance
[Chart: time per simulation (s), axis from 40 to 200, versus simultaneous
simulations per cloud instance, 1 to 8, for Instance A and Instance B]
Where should we operate?
Running 2 sims on instance A provides 37%
better throughput but requires 4x the
number of machine instances
compared to running 8 sims on instance B
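The trade-off between license cost and instance cost can be made concrete with a small cost model. This is a sketch under stated assumptions: every price and runtime below is hypothetical (not read from the chart), each simulation is assumed to hold one license for its full runtime, and the instance cost is split evenly across the simulations sharing it.

```python
def cost_per_simulation(time_per_sim_s, sims_per_instance,
                        instance_cost_per_hour, license_cost_per_hour):
    """Cost of one simulation: license time plus its share of the instance."""
    hours = time_per_sim_s / 3600.0
    instance_share = instance_cost_per_hour / sims_per_instance
    return hours * (license_cost_per_hour + instance_share)


# Hypothetical operating points: A runs 2 sims at 73 s each,
# B runs 8 sims at 100 s each (about 37% lower throughput per sim).
INSTANCE_RATE = 1.50   # $/hour, assumed equal for both instance types
LICENSE_RATE = 10.00   # $/hour per license seat, assumed

licensed_a = cost_per_simulation(73, 2, INSTANCE_RATE, LICENSE_RATE)
licensed_b = cost_per_simulation(100, 8, INSTANCE_RATE, LICENSE_RATE)
free_a = cost_per_simulation(73, 2, INSTANCE_RATE, 0.0)
free_b = cost_per_simulation(100, 8, INSTANCE_RATE, 0.0)

print(f"licensed tool: A=${licensed_a:.3f}  B=${licensed_b:.3f} per sim")
print(f"unlicensed tool: A=${free_a:.4f}  B=${free_b:.4f} per sim")
```

With an expensive license, the lightly loaded instance A is cheaper per simulation even though it needs 4x the machines; with no license cost the densely packed instance B is cheaper. Where the crossover falls depends entirely on the actual rates and measured runtimes.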
• Topology-aware placement yields further gains
(reducing simulation time, improving efficiency)
• Place workloads for socket/core affinity,
maximize cache per sim, NUMA considerations,
distribute load across memory & I/O channels
[Diagram: processor topology, one "S" followed by eight "C T T" groups]
Example: AMD ROME EPYC 7Fx2 processor, Google Cloud N2D VMs
Closely controlling placement on VM drives greater efficiency
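Core affinity on a VM can be sketched with the Linux scheduler affinity call in the Python standard library. The example gives each simulation a disjoint group of logical CPUs. It assumes that consecutive CPU ids share a core or cache domain, which is not guaranteed; a real deployment would read the topology from a tool such as `lscpu` or hwloc first. The helper names are hypothetical.

```python
import os


def partition_cpus(cpus, n_sims):
    """Split CPU ids into n_sims disjoint, equally sized, contiguous groups.

    CPUs left over after an even split are not assigned to any group.
    """
    cpus = sorted(cpus)
    if n_sims <= 0 or n_sims > len(cpus):
        raise ValueError("n_sims must be between 1 and the number of CPUs")
    per_sim = len(cpus) // n_sims
    return [cpus[i * per_sim:(i + 1) * per_sim] for i in range(n_sims)]


def pin_current_process(cpu_group):
    """Pin the calling process to cpu_group; returns False if unsupported."""
    if hasattr(os, "sched_setaffinity"):   # available on Linux only
        os.sched_setaffinity(0, set(cpu_group))
        return True
    return False


# Eight logical CPUs shared by four simulations: two CPUs each.
groups = partition_cpus(range(8), 4)
print(groups)
```

Each simulation process would call `pin_current_process` with its own group at start-up, so simulations do not migrate across cores or compete for the same cache.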
COMMON CHALLENGES FOR EDA SITES:
• Need reporting and license analytics to optimize selection
• Need smart policy-based instance selection at runtime
• Need granular resource scheduling / job placement
[Slide panels: Instance selection; Workload placement]
10
Use case #4: Share resources, manage spending
Share infrastructure and licenses
• Multiple project teams, multiple clusters
• Limited EDA feature licenses
• Need to allocate on-prem/cloud resources and
license features based on configurable policies
Manage cloud spend
• Need to track actual cloud spending and license
consumption by cost center/project
• Automated mechanisms to throttle cloud
spending when budgets are exceeded
[Diagram: server-managed licenses from FLEXERA publishers, shared across
three clusters, each with its own users and LO connector (and additional
tools)]
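Tracking spend by cost center and throttling when a budget is exceeded can be sketched as follows. This is an illustrative model only; the class, its methods, the cost-center names, and the rates are hypothetical and do not represent a Univa product API.

```python
from collections import defaultdict


class SpendTracker:
    """Accumulates cloud spend per cost center and gates new launches."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)       # cost center -> budget in dollars
        self.spent = defaultdict(float)    # cost center -> dollars spent

    def record(self, cost_center, instance_hours, hourly_rate):
        """Charge usage to a cost center."""
        self.spent[cost_center] += instance_hours * hourly_rate

    def may_launch(self, cost_center):
        """Allow new instances only while spend is under budget.

        Cost centers without a configured budget are treated as having none.
        """
        budget = self.budgets.get(cost_center, 0.0)
        return self.spent[cost_center] < budget


tracker = SpendTracker({"project-a": 1000.0})
tracker.record("project-a", instance_hours=400, hourly_rate=2.0)
print(tracker.may_launch("project-a"))   # still under budget
tracker.record("project-a", instance_hours=100, hourly_rate=2.5)
print(tracker.may_launch("project-a"))   # budget exceeded, throttle
```

A production policy would more likely throttle gradually, for example by lowering a cluster's instance cap as spend approaches the budget, rather than cutting off launches at a single threshold.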
11
Summary
• Cloud can provide significant additional capacity to speed regression
tests and other EDA workloads
• The key to making cloud cost-efficient is automation, efficient
provisioning, and minimizing impact on existing applications
• Operating at scale requires specific software features for provisioning
and scheduling – it’s challenging to keep cloud-scale clusters busy!
• Placing workloads optimally is key to maximizing the use of EDA
licenses and improving overall throughput and efficiency
• Cloud spend association & management is critical – many
organizations lack automated mechanisms to track and control
spending
12
Cloud Bursting Demo
13
Discussion and Questions?

Univa Presentation at DAC 2020
