Univa Presentation at DAC 2020

1
Tutorial & Best Practices:
Running EDA Workloads in the Cloud
Rob Lalonde, VP & GM Cloud
Bill Bryce, VP Products

2
About Univa
• Leader in HPC workload management
• 250 global customers
• Hybrid, dedicated, private clouds
• 3.3M+ cores under management
• EDA, Manufacturing, Life Sciences, Oil & Gas,
Government, Research & Edu, Transportation
• Trusted by leading manufacturers

3
Key focus area: Optimize cloud workloads
Advanced workload management and resource sharing
• Accelerate regression testing with high-throughput workload scheduling
• Share resources optimally between diverse workloads and different design efforts
• Maximize EDA license utilization with license orchestration software
Cloud migration, automation, and spend management
• Easily extend on-prem environments to the cloud to meet peak demand
• Deploy cloud resources optimally for each simulation, place workloads correctly
• Maximize the efficiency of cloud resource usage with automation and spend management

4
2019 Univa InsideHPC cloud survey results
• 92% are using or open to HPC cloud, up 50% from 2017
• 64% say cloud has proven value or high potential
• 31% are in production
• SLURM and Grid Engine represent the majority of HPC cloud workloads
• Respondents see value in cloud spend association, BUT 76% have no automated solution (27% need help, 27% manual, 22% other)
[Charts: "What we spend" (monthly spend: < $10K, $10K to $100K, > $100K); "Dedicated or Hybrid Cloud?" (dedicated, hybrid, both); power users]
Univa sponsored survey – 2019 InsideHPC: Cloud Adoption for HPC: Trends and Opportunities
https://insidehpc.com/white-paper/cloud-adoption-for-hpc-trends-and-opportunities/

5
What customers tell us
• Increasing design complexity, higher gate counts
• Need for higher quality & reliability driving coverage requirements
IoT, SoC embedded, medical devices, etc.
• Shorter product cycles, time-to-market
• Many simulation types: analog, digital, functional, system-level,
multi-physics, ML
• Need to maximize EDA tool utilization
• Limited data center capacity and IT budgets
More than any other industry, EDA users are
continuously challenged to do more with less

6
A typical design environment
[Diagram: interactive users on Projects A, B, and C submit work through workload management to on-premise infrastructure (general-purpose simulation, high-throughput, and place and route servers) and to cloud instances, with instance provisioning through cloud APIs. Each side sits on a managed network with a uniform DNS namespace. EDA software licenses are served by FlexNet Publisher license server(s), with license sharing policies and Univa License Orchestrator.]
• Gate Level Simulations (GLS)
• Register Transfer Level Simulations
• Transistor Level Modeling (TLM)
• Physical Verification
• Dynamic IR analysis
• Placement and clock optimization
• Static Timing Analysis (STA)
• Circuit Simulation
• Routing

7
Use case #1: Cloud automation
Boost license utilization, reduce Capex
• EDA environments frequently have “bursty
workloads” – overlapping projects, different
resource requirements at different phases
• For cloud to be practical, cloud provisioning
needs to be automated and transparent to users
• “Bring-your-own-image” functionality (BYOI) for
straightforward cloud migration
• Automate runtime decisions to avoid
administrator effort and potential human error
• Maximize EDA license utilization to improve
overall productivity
CHALLENGE:
• Bursty simulation & verification workloads
• Need to defer/reduce CapEx
• On-premise cluster right sized for day-to-day workloads
• On-premise EDA licenses underutilized
SOLUTION:
• Hybrid Cloud – Navops Launch, Univa Grid Engine
• Auto-scale cloud capacity based on workloads
• Automated data migration to and from the cloud
• Analytics and license management
BUSINESS VALUE:
• Avoid bottlenecks during critical tapeout periods
• Reduce costs - pay for cloud when needed
• Maximize on-premise license usage by shifting non-licensed
work to the cloud, improving overall productivity
Details at: https://blogs.univa.com/2020/01/mission-is-possible-tips-on-building-a-million-core-cluster/
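The "auto-scale cloud capacity based on workloads" idea on this slide can be illustrated with a minimal sketch. This is not the Navops Launch or Univa Grid Engine API; the function name, its parameters, and the sample numbers are all hypothetical, and a real policy would also weigh license availability, data locality, and spend limits.

```python
import math


def instances_to_launch(pending_jobs, free_onprem_slots, idle_cloud_slots,
                        slots_per_instance, running_instances, max_instances):
    """Return how many cloud instances to add for the current backlog.

    All inputs are hypothetical scheduler metrics, one job per slot.
    """
    # Jobs that cannot start on capacity that already exists.
    overflow = max(0, pending_jobs - free_onprem_slots - idle_cloud_slots)
    # Instances needed to hold the overflow, rounded up.
    needed = math.ceil(overflow / slots_per_instance)
    # Room left under the configured instance cap.
    headroom = max(0, max_instances - running_instances)
    return min(needed, headroom)


# Example with made-up numbers: 400 jobs overflow, 16 slots per instance.
print(instances_to_launch(pending_jobs=500, free_onprem_slots=100,
                          idle_cloud_slots=0, slots_per_instance=16,
                          running_instances=5, max_instances=50))  # 25
```

Returning zero whenever the backlog fits on premise keeps cloud usage limited to genuine bursts, which matches the "pay for cloud when needed" point above.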

8
Use case #2: Cloud simulation at extreme scale
Deploying a 1M+ vCPU cluster
• EDA verification and regression tests can run for
days, accounting for approx. 80% of workloads
• Cloud capacity can dramatically reduce runtime
• Benefits: Reduced cycle time, more thorough
verification, higher quality, reduced schedule risk
• Many technical challenges solved: checkpointing,
reclaim rates, container registries, API calls etc.
CHALLENGE:
• Engineering design for next-gen hard disk drives
• Requires complex multi-physics simulations
• 2.5 million tasks require days on premise
• Need capacity for more complex designs
SOLUTION:
• Navops Launch – deployed 1M+ vCPU cluster in 90 mins
• 40,000 cloud instances, instances come and go
• Leveraged containerized workloads
• Lower costs with preemptible VMs, spot fleets
BUSINESS VALUE:
• ~60x reduction in runtime – 20 days to 8 hours
• Estimated 50% cost reduction vs on-prem resources
• Increased capacity for new product development
Details at: https://blogs.univa.com/2020/01/mission-is-possible-tips-on-building-a-million-core-cluster/
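The headline figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted on the slide; treating "1M+" as exactly 1,000,000 vCPUs and 40,000 as a steady instance count are simplifying assumptions, since the slide notes that instances come and go.

```python
def speedup(before_hours, after_hours):
    """Ratio of the old runtime to the new runtime."""
    return before_hours / after_hours


on_prem_hours = 20 * 24   # 20 days on premise
cloud_hours = 8           # 8 hours in the cloud

print(speedup(on_prem_hours, cloud_hours))  # 60.0

# Rough averages, assuming exactly 1,000,000 vCPUs and 40,000 instances.
vcpus = 1_000_000
instances = 40_000
tasks = 2_500_000
print(vcpus / instances)  # 25.0 vCPUs per instance on average
print(tasks / vcpus)      # 2.5 tasks per vCPU over the whole run
```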

9
Use case #3: Optimize cloud instance selection
• Different tools have different requirements
• For licensed tools, it can be more economical to
underutilize machine resources!
• Optimizing selection is a function of license and
instance costs, and tool performance
[Chart: time per simulation (s) versus simultaneous simulations per cloud instance (1 to 8), for Instance A and Instance B]
Where should we operate?
2 sims on instance A provide 37% better throughput but require 4x the
number of machine instances compared to 8 sims on instance B
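The trade-off between license cost, instance cost, and tool performance can be sketched as a cost-per-simulation model. This is an illustration under stated assumptions, not the presenters' method: it assumes each running simulation holds one license seat billed by time, and every price and runtime below is hypothetical (the runtimes are merely chosen so that the first configuration is about 37% faster per simulation).

```python
def cost_per_simulation(instance_price_per_hour, license_price_per_hour,
                        sims_per_instance, seconds_per_sim):
    """Cost of one simulation when each running sim holds one license seat."""
    # Each simulation pays for its share of the instance plus one seat.
    hourly = instance_price_per_hour / sims_per_instance + license_price_per_hour
    return hourly * seconds_per_sim / 3600.0


# Hypothetical operating points: few fast sims versus many slower sims.
config_a = dict(instance_price_per_hour=1.00, sims_per_instance=2, seconds_per_sim=73)
config_b = dict(instance_price_per_hour=1.00, sims_per_instance=8, seconds_per_sim=100)

for license_price in (0.0, 10.0):  # hypothetical seat price per hour
    cost_a = cost_per_simulation(license_price_per_hour=license_price, **config_a)
    cost_b = cost_per_simulation(license_price_per_hour=license_price, **config_b)
    cheaper = "A" if cost_a < cost_b else "B"
    print(f"license ${license_price:.2f}/h: A={cost_a:.4f} B={cost_b:.4f} cheaper={cheaper}")
```

With these made-up numbers, packing the instance wins when the tool is free, while the underutilized instance wins once the license seat dominates the hourly cost.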
• Topology-aware placement yields further gains
(reducing simulation time, improving efficiency)
• Place workloads for socket/core affinity,
maximize cache per sim, NUMA considerations,
distribute load across memory & I/O channels
[Diagram: processor topology with one S followed by repeated C T T groups (likely socket, core, and two threads per core)]
Example: AMD EPYC 7Fx2 (Rome) processor – Google Cloud N2D VMs
Closely controlling placement on the VM drives greater efficiency
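One way to picture core-affinity placement is to give each simulation its own non-overlapping set of CPUs. The sketch below is a simplified assumption: it treats consecutive CPU ids as neighbors, which is not guaranteed on real hardware, so a production setup would read the actual socket, core, and NUMA layout first. The helper names are hypothetical; `os.sched_setaffinity` is a real call but is only available on some platforms, such as Linux.

```python
import os


def partition_cpus(cpu_ids, n_sims):
    """Split CPU ids into n_sims equal, non-overlapping groups."""
    cpu_ids = sorted(cpu_ids)
    if n_sims < 1 or n_sims > len(cpu_ids):
        raise ValueError("n_sims must be between 1 and the number of CPUs")
    size = len(cpu_ids) // n_sims  # leftover CPUs, if any, stay unassigned
    return [set(cpu_ids[i * size:(i + 1) * size]) for i in range(n_sims)]


def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs where supported."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)  # pid 0 means the calling process


# Four simulations on a 16-CPU VM, four CPUs each.
groups = partition_cpus(range(16), 4)
print(groups[0])  # {0, 1, 2, 3}
```

Each simulation process would call `pin_current_process` with its own group before starting the tool.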
COMMON CHALLENGES FOR EDA SITES:
• Need reporting and license analytics to optimize selection
• Need smart policy-based instance selection at runtime
• Need granular resource scheduling / job placement
[Slide panels: Instance selection; Workload placement]

10
Use case #4: Share resources, manage spending
Share infrastructure and licenses
• Multiple project teams, multiple clusters
• Limited EDA feature licenses
• Need to allocate on-prem/cloud resources and
license features based on configurable policies
Manage cloud spend
• Need to track actual cloud spending and license
consumption by cost center/project
• Automated mechanisms to throttle cloud
spending when budgets are exceeded
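Tracking cloud spend per cost center and throttling when a budget is exceeded could look roughly like the sketch below. The class, its methods, and the sample budget are hypothetical and do not represent any Univa product interface; real spend data would come from the cloud provider's billing records.

```python
from collections import defaultdict


class SpendTracker:
    """Accumulate cloud spend per cost center and gate new launches."""

    def __init__(self, budgets):
        self.budgets = dict(budgets)     # cost center -> budget in dollars
        self.spent = defaultdict(float)  # cost center -> dollars spent so far

    def record(self, cost_center, instance_hours, price_per_hour):
        """Add the cost of finished or running instance time."""
        self.spent[cost_center] += instance_hours * price_per_hour

    def may_launch(self, cost_center):
        """Allow launches only while spend is under budget.

        Cost centers without a configured budget are treated as zero budget.
        """
        return self.spent[cost_center] < self.budgets.get(cost_center, 0.0)


tracker = SpendTracker({"project-a": 100.0})
tracker.record("project-a", instance_hours=40, price_per_hour=2.0)
print(tracker.may_launch("project-a"))  # True, 80 of 100 dollars spent
tracker.record("project-a", instance_hours=10, price_per_hour=2.0)
print(tracker.may_launch("project-a"))  # False, budget reached
```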
[Diagram: server-managed licenses (Flexera publishers and additional tools) shared by three clusters, each with its own users and LO connector]
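Allocating a limited pool of license features across projects "based on configurable policies" can be sketched as a weighted share that never grants more than a project is asking for. This is a generic proportional-share illustration, not how Univa License Orchestrator works; the function, the share weights, and the demands are hypothetical, and shares are assumed to be positive integers.

```python
def allocate_licenses(total, shares, demand):
    """Split `total` seats across projects by share weight, capped by demand."""
    alloc = {project: 0 for project in shares}
    remaining = total
    active = {p for p in shares if demand.get(p, 0) > 0}
    while remaining > 0 and active:
        weight = sum(shares[p] for p in active)
        granted = 0
        for p in sorted(active):
            # Proportional entitlement for this round, rounded down.
            entitlement = remaining * shares[p] // weight
            give = min(entitlement, demand[p] - alloc[p])
            alloc[p] += give
            granted += give
        if granted == 0:
            # Rounding left nothing to hand out: give one seat to the
            # active project with the largest share.
            top = max(sorted(active), key=lambda q: shares[q])
            alloc[top] += 1
            granted = 1
        remaining -= granted
        # Projects whose demand is met drop out; their unused share is
        # redistributed in the next round.
        active = {p for p in active if alloc[p] < demand[p]}
    return alloc


print(allocate_licenses(10, {"A": 2, "B": 1, "C": 1}, {"A": 10, "B": 1, "C": 5}))
# {'A': 7, 'B': 1, 'C': 2}
```

Capping by demand keeps seats from sitting idle with a project that cannot use them, in line with the goal of maximizing license utilization.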

11
Summary
• Cloud can provide significant additional capacity to speed regression
tests and other EDA workloads
• The key to making cloud cost-efficient is automation, efficient
provisioning, and minimizing impact on existing applications
• Operating at scale requires specific software features for provisioning
and scheduling – it’s challenging to keep cloud-scale clusters busy!
• Placing workloads optimally is key to maximizing the use of EDA
licenses and improving overall throughput and efficiency
• Cloud spend association & management is critical – many
organizations lack automated mechanisms to track and control
spending
