How Open Technology Helped DOE Labs Place Sixteen Supercomputers on the TOP500
Sid Mair – OCP 2019
© 2019 Penguin Computing
About Penguin Computing
▪ Specialized high-performance computing (HPC), bare-metal HPC in the cloud, artificial intelligence (AI), and storage technologies
▪ Coupled with leading-edge design, implementation, hosting, and managed services (including sys-admin and storage-as-a-service) and highly rated customer support
▪ More than 2,500 customers in 50 countries across nine major vertical markets
▪ Over 300 OCP racks based on the Tundra™ Extreme Scale design delivered to date
▪ 20 years of experience in AI, engineering, and computer science for startups, Fortune 500 companies, government, and academic organizations
OCP Clusters in the TOP500
Sixteen supercomputers designed and built by Penguin Computing have placed on the TOP500 list since 2016
▪ All are OCP-based and deployed in U.S. national labs as part of the U.S. Department of Energy (DOE)
▪ Part of the DOE’s Advanced Simulation and Computing (ASC) program
▪ Provide simulation-based confidence in the nuclear stockpile, an alternative to explosive test-based confidence
CTS-1: A Perfect Opportunity for OCP Design
▪ Supports the National Nuclear Security Administration (NNSA) in ensuring nuclear stockpile stewardship consistent with the Comprehensive Nuclear-Test-Ban Treaty (CTBT), a multilateral treaty opened for signature in 1996
▪ 30,000 Broadwell/Skylake dual-processor nodes delivered to date
▪ Commodity clusters have brought the cost of HPC systems down from approximately $100 million per teraFLOP in 1995 to less than $5,000 per teraFLOP today (a factor of roughly 20,000), with greater computing power and energy efficiency in each generation
▪ Exemplifies how OCP-based technology can give organizations both value and performance
▪ Provides flexibility in CPU architectures, accelerators, and interconnects
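The cost claim above is simple arithmetic, sketched here in Python. The sketch assumes "today" means 2019, the year of the talk, and uses $5,000 as given even though the slide says "less than", so the computed factor is a lower bound. The per-year rate is an illustrative derivation, not a figure from the slide.

```python
# Figures quoted on the slide; "today" is assumed to be 2019 (the talk's date).
COST_1995 = 100_000_000  # USD per teraFLOP, approximate
COST_TODAY = 5_000       # USD per teraFLOP, an upper bound ("less than")
YEARS = 2019 - 1995


def reduction_factor(old: float, new: float) -> float:
    """How many times cheaper one teraFLOP became."""
    return old / new


def annual_decline(factor: float, years: int) -> float:
    """Constant yearly fractional price drop that compounds to `factor` over `years`."""
    return 1 - factor ** (-1 / years)


factor = reduction_factor(COST_1995, COST_TODAY)
print(f"reduction factor: {factor:,.0f}")  # reduction factor: 20,000
print(f"implied yearly decline: {annual_decline(factor, YEARS):.1%}")
```

Under those assumptions, a 20,000× drop over 24 years corresponds to a compounded price decline of roughly 34% per year.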
The Key: A Uniquely Flexible, Dense Tundra Design
Baseline Tundra™ Extreme Scale
▪ The first design was a 1OU, three-node CPU compute server housed in a v1 rack
▪ Supports up to 102 nodes per rack, with switching
▪ May include GPGPU-accelerated servers
▪ High-speed, low-latency interconnects keep workloads synchronized between nodes to within microseconds
▪ Air or liquid cooling (direct-to-chip or rear doors)
▪ Latest-generation open bridge rack
▪ Accessible in the cloud through Penguin Computing On-Demand® (POD)
▪ Multiple storage options
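A quick density check using two numbers from these slides: three nodes per 1OU and up to 102 nodes per rack. This minimal sketch assumes every node sits in a three-node CPU sled; switches, power shelves, and GPU servers take additional rack space that it does not count.

```python
# Stated on the slides: three nodes per 1OU sled, up to 102 nodes per rack.
NODES_PER_OU = 3
MAX_NODES_PER_RACK = 102


def compute_ou_needed(nodes: int, nodes_per_ou: int = NODES_PER_OU) -> int:
    """OpenU slots consumed by compute sleds for `nodes` nodes, rounded up."""
    return -(-nodes // nodes_per_ou)  # ceiling division using integers only


print(compute_ou_needed(MAX_NODES_PER_RACK))  # 34
```

Ceiling division matters for partial sleds: 100 nodes still occupy 34 OU because the last sled is only partly populated.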
Vertiv HPC Power Shelf (3 × 12 V DC Bus Bar), updated Feb 2018
▪ Shelf dimensions: 3U (H) × 19 in (W) × 25.8 in (D)
▪ Many input voltage options available (176-305 VAC)
    Nine (9) slots for 3300 W rectifiers and BBUs (battery backup units)
    208 V single power system output (max) = 16.5 kW (N+1)
    277/480 V single power system (max) = 26.4 kW (N+1)
    277/480 V dual-zone configuration (max) = 52.8 kW (N+1)
▪ Redundant power options available
▪ Provides 3 pairs of DC output bus bar connections
▪ Operating temperature: -10 °C to +45 °C (+14 °F to +113 °F)
▪ 2 AC convenience outlets for switching (Gen II)
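The 277/480 V maximums on this slide follow from N+1 redundancy: one 3300 W rectifier is held in reserve. This sketch assumes all nine slots hold rectifiers for the single-zone maximum, that the dual-zone figure is two such zones, and that load splits evenly across the three 12 V bus bar pairs. The 208 V figure (16.5 kW) is not modeled, since the slide does not say how many rectifiers that configuration uses.

```python
RECTIFIER_W = 3300   # per-rectifier rating from the slide
SLOTS = 9            # slots shared by rectifiers and BBUs
BUS_VOLTAGE = 12.0   # DC bus bar voltage
BUS_BAR_PAIRS = 3    # DC output bus bar connection pairs


def n_plus_1_capacity_w(rectifiers: int, rating_w: int = RECTIFIER_W) -> int:
    """Usable output when one rectifier is held in reserve (N+1)."""
    if rectifiers < 2:
        raise ValueError("N+1 needs at least two rectifiers")
    return (rectifiers - 1) * rating_w


single_zone = n_plus_1_capacity_w(SLOTS)  # assumption: all nine slots are rectifiers
dual_zone = 2 * single_zone               # assumption: two identical zones
bus_current = single_zone / BUS_VOLTAGE   # total amps delivered on the 12 V bus
per_pair = bus_current / BUS_BAR_PAIRS    # assumption: even split across pairs

print(single_zone, dual_zone)  # 26400 52800
print(f"{bus_current:.0f} A total, {per_pair:.0f} A per pair")  # 2200 A total, 733 A per pair
```

The current figures show why low-voltage DC distribution uses heavy bus bars: 26.4 kW at 12 V is 2,200 A.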
Compute Node: Relion 1930e
CPU Processing
▪ Three nodes in a 1 OpenU form factor
▪ Dual-socket Intel® Xeon® E5-2600 v4 per node
▪ Up to 1 TB DDR4-2400 (8× DIMMs) per node
▪ Intel® C612 chipset
▪ 1× dedicated BMC
▪ 1× PCIe 3.0 x16
▪ 1× 2.5" fixed SATA SSD
▪ Dual 1GbE/RJ45 LOM
▪ Optional 1× FDR InfiniBand LOM
▪ Supports Asetek direct-to-chip cooling
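A node's theoretical double-precision peak is sockets × cores × clock × FLOPs per cycle. This sketch assumes the E5-2695 v4 (18 cores, 2.1 GHz) listed for the CTS-1 clusters and 16 double-precision FLOPs per core per cycle (two 256-bit AVX2 FMA units on Broadwell). Real AVX workloads often run below the nominal clock, so this is an upper bound.

```python
def peak_gflops(sockets: int, cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: sockets x cores per socket x clock (GHz) x FLOPs per cycle."""
    return sockets * cores * ghz * flops_per_cycle


# Assumptions: dual E5-2695 v4 at nominal 2.1 GHz; 2 FMA units x 4 doubles x 2 FLOPs = 16.
node = peak_gflops(sockets=2, cores=18, ghz=2.1, flops_per_cycle=16)
print(f"{node:.1f} GFLOPS per node")  # 1209.6 GFLOPS per node
```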
Compute Node: Relion XO1114GT
GPU Computing
▪ 1OU form factor
▪ Dual Intel® Xeon® Cascade Lake-SP / Skylake-SP with Intel Omni-Path
▪ Up to 2 TB DDR4-2933 (16× DIMMs)
▪ Intel® Lewisburg chipset
▪ 1× dedicated BMC
▪ Supports 4× GPGPUs
    NVIDIA Tesla Volta (PCIe)
▪ 2× PCIe 3.0 low-profile slots (speed depends on topology)
▪ 4× 2.5" SATA SSD
▪ Dual 1GbE/RJ45 LOM
▪ Supports Asetek direct-to-chip cooling (CPUs and GPUs)
▪ Flexible PCIe topology
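Both node designs reach their quoted memory maximum with the same module size. This sketch assumes 1 TB = 1024 GB and that every DIMM slot is populated with identical modules.

```python
def gb_per_dimm(total_tb: float, dimms: int) -> float:
    """Module size needed to reach `total_tb` with `dimms` identical DIMMs (1 TB = 1024 GB)."""
    return total_tb * 1024 / dimms


relion_1930e = gb_per_dimm(1, 8)     # CPU node: 1 TB across 8 DIMMs
relion_xo1114gt = gb_per_dimm(2, 16)  # GPU node: 2 TB across 16 DIMMs
print(relion_1930e)     # 128.0
print(relion_xo1114gt)  # 128.0
```

In both cases, the maximum capacity requires 128 GB modules; smaller DIMMs scale the ceiling down proportionally.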
CTS-1 Delivered Systems Through Spring 2019
Flexible Architecture Enables Scalable Configurations
Tundra Extreme Scale, Xeon E5-2695 v4 (18 cores, 2.1 GHz), Intel Omni-Path
▪ LLNL “Quartz”: originally 14 SU (scalable units), expanding by 2 SU in 2019 to a total of 16 SU
▪ LLNL “Jade”: 14 SU
▪ LANL “Grizzly” with “Snow”: 10 SU
▪ SNL “Serrano” and “Cayenne”: separate 6 SU systems
▪ SNL “Eclipse”: base 6 SU purchased in 2017, expanded by 2 SU in 2018
▪ LANL “Badger”: purchased in 2017, doubled in 2018
▪ LANL “Kodiak” (GPU cluster): purchased in 2017, expanded in 2018
▪ LANL “Fire”, “Ice”, and “Cyclone”: separate 6 SU systems
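Tallying the SU counts given above is a short sum. This sketch counts only the sizes the slide states; Badger and Kodiak are left out because only their growth is given, so the result is a floor for the listed systems, not a total for the CTS-1 fleet.

```python
# Scalable-unit (SU) counts as stated on the slide.
su_counts = {
    "Quartz": 14 + 2,  # original 14 SU plus the 2 SU expansion in 2019
    "Jade": 14,
    "Grizzly + Snow": 10,
    "Serrano": 6,
    "Cayenne": 6,
    "Eclipse": 6 + 2,  # 2017 base plus the 2018 expansion
    "Fire": 6,
    "Ice": 6,
    "Cyclone": 6,
}

total = sum(su_counts.values())
print(f"{total} SU across {len(su_counts)} entries")  # 78 SU across 9 entries
```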
OCP for AI
The new Tundra-based “Corona” AI cluster leverages the flexibility, density, and efficiency of OCP
▪ AMD EPYC™ processors and AMD Radeon™ Instinct™ GPU accelerators
▪ 383 teraFLOPS (trillion floating-point operations per second)
▪ 170 two-socket nodes, each with 24-core AMD EPYC™ 7401 processors and a 1.6 terabyte (TB) PCIe nonvolatile (solid-state) memory device
▪ Half of the compute nodes use 4 AMD Radeon Instinct™ MI25 GPUs each, delivering 4.2 petaFLOPS of FP32 peak performance
▪ Connected via a Mellanox HDR 200 Gb/s InfiniBand network
▪ The remaining compute nodes may be upgraded with future GPUs
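The 4.2 petaFLOPS FP32 figure can be cross-checked from the node counts above. This sketch assumes AMD's published MI25 figures (4096 stream processors, 1500 MHz peak engine clock, one fused multiply-add, i.e. 2 FLOPs, per stream processor per cycle); those numbers are not on the slide.

```python
# Assumed MI25 figures (AMD published specs, not from the slide).
STREAM_PROCESSORS = 4096
PEAK_CLOCK_GHZ = 1.5
FLOPS_PER_SP_PER_CYCLE = 2  # one FMA counts as two floating-point operations

# From the slide.
TOTAL_NODES = 170
GPUS_PER_NODE = 4


def mi25_fp32_tflops() -> float:
    """Peak FP32 TFLOPS for one MI25 under the assumed specs."""
    return STREAM_PROCESSORS * PEAK_CLOCK_GHZ * FLOPS_PER_SP_PER_CYCLE / 1000


gpu_nodes = TOTAL_NODES // 2        # half of the nodes carry GPUs
gpus = gpu_nodes * GPUS_PER_NODE
cluster_pflops = gpus * mi25_fp32_tflops() / 1000
print(f"{gpus} GPUs, {cluster_pflops:.2f} PFLOPS FP32 peak")  # 340 GPUs, 4.18 PFLOPS FP32 peak
```

Rounded to one decimal place, 4.18 PFLOPS matches the slide's 4.2 petaFLOPS.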
Questions?
www.penguincomputing.com
1-888-PENGUIN

More Related Content

What's hot

Energy Saving ARM Server Cluster Born for Distributed Storage & Computing
Energy Saving ARM Server Cluster Born for Distributed Storage & ComputingEnergy Saving ARM Server Cluster Born for Distributed Storage & Computing
Energy Saving ARM Server Cluster Born for Distributed Storage & ComputingAaron Joue
 
Software Defined Storage Appliance Power by ARM based Microserver
Software Defined Storage Appliance Power by ARM based MicroserverSoftware Defined Storage Appliance Power by ARM based Microserver
Software Defined Storage Appliance Power by ARM based MicroserverAaron Joue
 
ARM server, The Cy7 Introduction by Aaron Joue, Ambedded Technology
ARM server, The Cy7 Introduction by Aaron Joue, Ambedded TechnologyARM server, The Cy7 Introduction by Aaron Joue, Ambedded Technology
ARM server, The Cy7 Introduction by Aaron Joue, Ambedded TechnologyAaron Joue
 
Red Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftRed Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftJeremy Eder
 
NVIDIA GTC 2018: Enabling GPU-as-a-Service Providers with Red Hat OpenShift
NVIDIA GTC 2018:  Enabling GPU-as-a-Service Providers with Red Hat OpenShiftNVIDIA GTC 2018:  Enabling GPU-as-a-Service Providers with Red Hat OpenShift
NVIDIA GTC 2018: Enabling GPU-as-a-Service Providers with Red Hat OpenShiftJeremy Eder
 
Benchmarking your cloud performance with top 4 global public clouds
Benchmarking your cloud performance with top 4 global public cloudsBenchmarking your cloud performance with top 4 global public clouds
Benchmarking your cloud performance with top 4 global public cloudsdata://disrupted®
 
AMD Bridges the X86 and ARM Ecosystems for the Data Center
AMD Bridges the X86 and ARM Ecosystems for the Data Center AMD Bridges the X86 and ARM Ecosystems for the Data Center
AMD Bridges the X86 and ARM Ecosystems for the Data Center AMD
 
Propelling IoT Innovation with Predictive Analytics
Propelling IoT Innovation with Predictive AnalyticsPropelling IoT Innovation with Predictive Analytics
Propelling IoT Innovation with Predictive AnalyticsSingleStore
 
Red Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage MattersRed Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage MattersRed_Hat_Storage
 
PowerEdge Rack and Tower Server Masters AMD Processors.pptx
PowerEdge Rack and Tower Server Masters AMD Processors.pptxPowerEdge Rack and Tower Server Masters AMD Processors.pptx
PowerEdge Rack and Tower Server Masters AMD Processors.pptxNeoKenj
 
Triangle Kubernetes Meetup - Performance Sensitive Apps in OpenShift
Triangle Kubernetes Meetup - Performance Sensitive Apps in OpenShiftTriangle Kubernetes Meetup - Performance Sensitive Apps in OpenShift
Triangle Kubernetes Meetup - Performance Sensitive Apps in OpenShiftJeremy Eder
 

What's hot (17)

Energy Saving ARM Server Cluster Born for Distributed Storage & Computing
Energy Saving ARM Server Cluster Born for Distributed Storage & ComputingEnergy Saving ARM Server Cluster Born for Distributed Storage & Computing
Energy Saving ARM Server Cluster Born for Distributed Storage & Computing
 
Software Defined Storage Appliance Power by ARM based Microserver
Software Defined Storage Appliance Power by ARM based MicroserverSoftware Defined Storage Appliance Power by ARM based Microserver
Software Defined Storage Appliance Power by ARM based Microserver
 
ARM server, The Cy7 Introduction by Aaron Joue, Ambedded Technology
ARM server, The Cy7 Introduction by Aaron Joue, Ambedded TechnologyARM server, The Cy7 Introduction by Aaron Joue, Ambedded Technology
ARM server, The Cy7 Introduction by Aaron Joue, Ambedded Technology
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
Red Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftRed Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShift
 
@IBM Power roadmap 8
@IBM Power roadmap 8 @IBM Power roadmap 8
@IBM Power roadmap 8
 
NSCC Training Introductory Class
NSCC Training Introductory Class NSCC Training Introductory Class
NSCC Training Introductory Class
 
NSCC Training - Introductory Class
NSCC Training - Introductory ClassNSCC Training - Introductory Class
NSCC Training - Introductory Class
 
NSCC Training Introductory Class
NSCC Training  Introductory ClassNSCC Training  Introductory Class
NSCC Training Introductory Class
 
NVIDIA GTC 2018: Enabling GPU-as-a-Service Providers with Red Hat OpenShift
NVIDIA GTC 2018:  Enabling GPU-as-a-Service Providers with Red Hat OpenShiftNVIDIA GTC 2018:  Enabling GPU-as-a-Service Providers with Red Hat OpenShift
NVIDIA GTC 2018: Enabling GPU-as-a-Service Providers with Red Hat OpenShift
 
Benchmarking your cloud performance with top 4 global public clouds
Benchmarking your cloud performance with top 4 global public cloudsBenchmarking your cloud performance with top 4 global public clouds
Benchmarking your cloud performance with top 4 global public clouds
 
AMD Bridges the X86 and ARM Ecosystems for the Data Center
AMD Bridges the X86 and ARM Ecosystems for the Data Center AMD Bridges the X86 and ARM Ecosystems for the Data Center
AMD Bridges the X86 and ARM Ecosystems for the Data Center
 
Ceph's journey at SUSE
Ceph's journey at SUSECeph's journey at SUSE
Ceph's journey at SUSE
 
Propelling IoT Innovation with Predictive Analytics
Propelling IoT Innovation with Predictive AnalyticsPropelling IoT Innovation with Predictive Analytics
Propelling IoT Innovation with Predictive Analytics
 
Red Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage MattersRed Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage Matters
 
PowerEdge Rack and Tower Server Masters AMD Processors.pptx
PowerEdge Rack and Tower Server Masters AMD Processors.pptxPowerEdge Rack and Tower Server Masters AMD Processors.pptx
PowerEdge Rack and Tower Server Masters AMD Processors.pptx
 
Triangle Kubernetes Meetup - Performance Sensitive Apps in OpenShift
Triangle Kubernetes Meetup - Performance Sensitive Apps in OpenShiftTriangle Kubernetes Meetup - Performance Sensitive Apps in OpenShift
Triangle Kubernetes Meetup - Performance Sensitive Apps in OpenShift
 

Similar to How Open Technology Helped DOE Labs Place Sixteen Supercomputers on the Top500

Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 
PLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP MoonshotPLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP MoonshotPROIDEA
 
Emerging Cloud Storage Trends for Enterprises
Emerging Cloud Storage Trends for EnterprisesEmerging Cloud Storage Trends for Enterprises
Emerging Cloud Storage Trends for EnterprisesRebekah Rodriguez
 
Consumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a BoxConsumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a BoxRebekah Rodriguez
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCinside-BigData.com
 
Pro Tips: Building for Hyperscale
Pro Tips: Building for HyperscalePro Tips: Building for Hyperscale
Pro Tips: Building for HyperscalePenguin Computing
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)inside-BigData.com
 
TechWiseTV Workshop: Cisco UCS C4200
TechWiseTV Workshop: Cisco UCS C4200TechWiseTV Workshop: Cisco UCS C4200
TechWiseTV Workshop: Cisco UCS C4200Robb Boyd
 
IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specificationsinside-BigData.com
 
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...Embarcados
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1blewington
 
The Power of One: Supermicro’s High-Performance Single-Processor Blade Systems
The Power of One: Supermicro’s High-Performance Single-Processor Blade SystemsThe Power of One: Supermicro’s High-Performance Single-Processor Blade Systems
The Power of One: Supermicro’s High-Performance Single-Processor Blade SystemsRebekah Rodriguez
 
Blue line Supermicro Server Building Block Solutions
Blue line Supermicro Server Building Block SolutionsBlue line Supermicro Server Building Block Solutions
Blue line Supermicro Server Building Block SolutionsBlue Line
 
Fpga computing 14 03 2013
Fpga computing 14 03 2013Fpga computing 14 03 2013
Fpga computing 14 03 2013Eurotech Aurora
 
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Netronome
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PC Cluster Consortium
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchRyousei Takano
 

Similar to How Open Technology Helped DOE Labs Place Sixteen Supercomputers on the Top500 (20)

Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
PLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP MoonshotPLNOG 13: Maciej Grabowski: HP Moonshot
PLNOG 13: Maciej Grabowski: HP Moonshot
 
Emerging Cloud Storage Trends for Enterprises
Emerging Cloud Storage Trends for EnterprisesEmerging Cloud Storage Trends for Enterprises
Emerging Cloud Storage Trends for Enterprises
 
Consumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a BoxConsumption Based On-Demand Private Cloud in a Box
Consumption Based On-Demand Private Cloud in a Box
 
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPCExceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC
 
Pro Tips: Building for Hyperscale
Pro Tips: Building for HyperscalePro Tips: Building for Hyperscale
Pro Tips: Building for Hyperscale
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 
TechWiseTV Workshop: Cisco UCS C4200
TechWiseTV Workshop: Cisco UCS C4200TechWiseTV Workshop: Cisco UCS C4200
TechWiseTV Workshop: Cisco UCS C4200
 
IBM Power9 Features and Specifications
IBM Power9 Features and SpecificationsIBM Power9 Features and Specifications
IBM Power9 Features and Specifications
 
Lenovo HPC Strategy Update
Lenovo HPC Strategy UpdateLenovo HPC Strategy Update
Lenovo HPC Strategy Update
 
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
 
The Power of One: Supermicro’s High-Performance Single-Processor Blade Systems
The Power of One: Supermicro’s High-Performance Single-Processor Blade SystemsThe Power of One: Supermicro’s High-Performance Single-Processor Blade Systems
The Power of One: Supermicro’s High-Performance Single-Processor Blade Systems
 
Blue line Supermicro Server Building Block Solutions
Blue line Supermicro Server Building Block SolutionsBlue line Supermicro Server Building Block Solutions
Blue line Supermicro Server Building Block Solutions
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
 
Fpga computing 14 03 2013
Fpga computing 14 03 2013Fpga computing 14 03 2013
Fpga computing 14 03 2013
 
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
Disaggregation a Primer: Optimizing design for Edge Cloud & Bare Metal applic...
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
 
Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
Qnap nas tvs serie x63-catalogo
Qnap nas tvs serie x63-catalogoQnap nas tvs serie x63-catalogo
Qnap nas tvs serie x63-catalogo
 

More from Penguin Computing

Pro Tips: Designing and Deploying End-to-End HPC and AI Solutions
Pro Tips: Designing and Deploying End-to-End HPC and AI SolutionsPro Tips: Designing and Deploying End-to-End HPC and AI Solutions
Pro Tips: Designing and Deploying End-to-End HPC and AI SolutionsPenguin Computing
 
Pro Tips: When to Choose Private vs Public vs Hybrid Cloud
Pro Tips: When to Choose Private vs Public vs Hybrid CloudPro Tips: When to Choose Private vs Public vs Hybrid Cloud
Pro Tips: When to Choose Private vs Public vs Hybrid CloudPenguin Computing
 
Partner Perspectives: The OCP Community
Partner Perspectives: The OCP CommunityPartner Perspectives: The OCP Community
Partner Perspectives: The OCP CommunityPenguin Computing
 
Ocp recommended profiles for next generation OCP Racks
Ocp recommended profiles for next generation OCP RacksOcp recommended profiles for next generation OCP Racks
Ocp recommended profiles for next generation OCP RacksPenguin Computing
 
Penguin computing designing and deploying end to end HPC and AI Solutions
Penguin computing designing and deploying end to end HPC and AI SolutionsPenguin computing designing and deploying end to end HPC and AI Solutions
Penguin computing designing and deploying end to end HPC and AI SolutionsPenguin Computing
 
Ocp updating the ocp compute voltage step response specification
Ocp  updating the ocp compute voltage step response specificationOcp  updating the ocp compute voltage step response specification
Ocp updating the ocp compute voltage step response specificationPenguin Computing
 
Ocp recommended profiles for next generation ocp racks
Ocp recommended profiles for next generation ocp racksOcp recommended profiles for next generation ocp racks
Ocp recommended profiles for next generation ocp racksPenguin Computing
 

More from Penguin Computing (7)

Pro Tips: Designing and Deploying End-to-End HPC and AI Solutions
Pro Tips: Designing and Deploying End-to-End HPC and AI SolutionsPro Tips: Designing and Deploying End-to-End HPC and AI Solutions
Pro Tips: Designing and Deploying End-to-End HPC and AI Solutions
 
Pro Tips: When to Choose Private vs Public vs Hybrid Cloud
Pro Tips: When to Choose Private vs Public vs Hybrid CloudPro Tips: When to Choose Private vs Public vs Hybrid Cloud
Pro Tips: When to Choose Private vs Public vs Hybrid Cloud
 
Partner Perspectives: The OCP Community
Partner Perspectives: The OCP CommunityPartner Perspectives: The OCP Community
Partner Perspectives: The OCP Community
 
Ocp recommended profiles for next generation OCP Racks
Ocp recommended profiles for next generation OCP RacksOcp recommended profiles for next generation OCP Racks
Ocp recommended profiles for next generation OCP Racks
 
Penguin computing designing and deploying end to end HPC and AI Solutions
Penguin computing designing and deploying end to end HPC and AI SolutionsPenguin computing designing and deploying end to end HPC and AI Solutions
Penguin computing designing and deploying end to end HPC and AI Solutions
 
Ocp updating the ocp compute voltage step response specification
Ocp  updating the ocp compute voltage step response specificationOcp  updating the ocp compute voltage step response specification
Ocp updating the ocp compute voltage step response specification
 
Ocp recommended profiles for next generation ocp racks
Ocp recommended profiles for next generation ocp racksOcp recommended profiles for next generation ocp racks
Ocp recommended profiles for next generation ocp racks
 

Recently uploaded

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Recently uploaded (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

How Open Technology Helped DOE Labs Place Sixteen Supercomputers on the Top500

  • 1. How Open Technology Helped DOE Labs Place Sixteen Supercomputers on the Top500 (Sid Mair, OCP 2019)
  • 2. About Penguin Computing
      ▪ Specialized high-performance computing (HPC), bare-metal HPC in the cloud, artificial intelligence (AI), and storage technologies
      ▪ Coupled with leading-edge design, implementation, hosting, and managed services (including sys-admin and storage-as-a-service) and highly rated customer support
      ▪ More than 2,500 customers in 50 countries across nine major vertical markets
      ▪ Over 300 OCP racks delivered to date based on the Tundra™ Extreme Scale design
      ▪ 20 years of AI, engineering, and computer science work for startups, Fortune 500 companies, government, and academic organizations
  • 3. OCP Clusters in Top500
      Sixteen supercomputers designed and built by Penguin Computing have placed on the TOP500 list since 2016
      ▪ All are OCP-based and deployed at U.S. national labs for the U.S. Department of Energy (DOE)
      ▪ Part of the DOE’s Advanced Simulation and Computing (ASC) program
      ▪ Provide simulation-based confidence in the nuclear stockpile, an alternative to explosive test-based confidence
  • 4. CTS-1: A Perfect Opportunity for OCP Design
      ▪ Supports the National Nuclear Security Administration (NNSA) in nuclear stockpile stewardship consistent with the Comprehensive Nuclear-Test-Ban Treaty (CTBT), a multilateral treaty opened for signature in 1996
      ▪ 30,000 dual-processor Broadwell/Skylake nodes to date
      ▪ Commodity clusters have brought the cost of HPC systems down from approximately $100 million per teraFLOP in 1995 to less than $5,000 per teraFLOP today (a factor of 20,000), with more computing power and better energy efficiency in each generation
      ▪ Exemplifies how OCP-based technology can give organizations both value and performance
      ▪ Provides flexibility in CPU architectures, accelerators, and interconnects
  • 5. The Key: Uniquely Flexible, Dense Tundra Design
      Baseline Tundra™ Extreme Scale
      ▪ The first was a 1OU, three-node CPU compute design housed in a v1 rack
      ▪ Supports up to 102 nodes per rack, with switching
      ▪ May include GPGPU-accelerated servers
      ▪ High-speed, low-latency interconnects keep workloads synchronized between nodes within microseconds
      ▪ Air or liquid cooling (Direct-to-Chip or rear doors)
      ▪ Latest-generation open bridge rack
      ▪ Accessible via the cloud through Penguin Computing On-Demand® (POD)
      ▪ Multiple storage options
  • 6. Vertiv HPC Power Shelf (3 x 12V DC Bus Bar) [updated Feb. 2018]
      ▪ Shelf dimensions: 3U (H) x 19 in (W) x 25.8 in (D)
      ▪ Many input voltage options (176-305 VAC)
      ▪ Nine (9) slots for 3,300 W rectifiers and BBUs (battery backup units)
      ▪ Maximum output (N+1):
          ▪ 208 V single power system: 16.5 kW
          ▪ 277/480 V single power system: 26.4 kW
          ▪ 277/480 V dual-zone configuration: 52.8 kW
      ▪ Redundant power options available
      ▪ Three pairs of DC output bus bar connections
      ▪ Environmental temperature range: -10 °C to +45 °C (+14 °F to +113 °F)
      ▪ Two AC convenience outlets for switching (Gen II)
  • 7. Compute Node: Relion 1930e (CPU Processing)
      ▪ Three nodes in a 1 OpenU form factor
      ▪ Dual-socket Intel® Xeon® E5-2600 v4 per node
      ▪ Up to 1 TB DDR4-2400 (8x DIMMs) per node
      ▪ Intel® C612 chipset
      ▪ 1x dedicated BMC
      ▪ 1x PCIe 3.0 x16
      ▪ 1x 2.5" fixed SATA SSD
      ▪ Dual 1GbE/RJ45 LOM
      ▪ Optional 1x FDR LOM
      ▪ Supports Asetek Direct-to-Chip cooling
  • 8. Compute Node: Relion XO1114GT (GPU Computing)
      ▪ 1OU form factor
      ▪ Dual Intel Xeon Cascade Lake-SP/Skylake-SP with Intel Omni-Path
      ▪ Up to 2 TB DDR4-2933 (16x DIMMs)
      ▪ Intel® Lewisburg chipset
      ▪ 1x dedicated BMC
      ▪ Supports 4x GPGPUs (NVIDIA Tesla Volta, PCIe)
      ▪ 2x PCIe 3.0 low-profile slots (speed depends on topology)
      ▪ 4x 2.5" SATA SSD
      ▪ Dual 1GbE/RJ45 LOM
      ▪ Supports Asetek Direct-to-Chip cooling (CPUs and GPUs)
      ▪ Flexible PCIe topology
  • 9. CTS-1 Delivered Systems Through Spring 2019
  • 10. Flexible Architecture Enables Scalable Configurations
      Tundra Extreme Scale, Xeon E5-2695 v4 (18 cores, 2.1 GHz), Intel Omni-Path; sizes in scalable units (SU)
      ▪ LLNL “Quartz”: originally 14 SU, expanding by 2 SU in 2019 to a total of 16 SU
      ▪ LLNL “Jade”: 14 SU
      ▪ LANL “Grizzly” with “Snow”: 10 SU
      ▪ SNL “Serrano” and “Cayenne”: separate 6 SU systems
      ▪ SNL “Eclipse”: base of 6 SU purchased in 2017, expanded by 2 SU in 2018
      ▪ LANL “Badger”: purchased in 2017, doubled in 2018
      ▪ LANL “Kodiak” (GPU cluster): purchased in 2017, expanded in 2018
      ▪ LANL “Fire”, “Ice”, and “Cyclone”: all separate 6 SU systems
  • 11. OCP for AI
      NEW: the Tundra-based “Corona” AI cluster leverages the flexibility, density, and efficiency of OCP
      ▪ AMD EPYC™ processors and AMD Radeon™ Instinct™ GPU accelerators
      ▪ 383 teraFLOPS (trillion floating-point operations per second)
      ▪ 170 two-socket nodes, each with 24-core AMD EPYC™ 7401 processors and a 1.6 terabyte (TB) PCIe nonvolatile (solid-state) memory device
      ▪ Half of the compute nodes use four AMD Radeon Instinct™ MI25 GPUs each, delivering 4.2 petaFLOPS of peak FP32 performance
      ▪ Connected via a Mellanox HDR 200 Gigabit InfiniBand network
      ▪ Remaining compute nodes may be upgraded with future GPUs
  • 12. Questions?
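Slide 4's cost claim is a simple division, and the short sketch below reproduces it from the two figures quoted on the slide. Because the later figure is stated as an upper bound ("less than $5,000"), the computed factor is really a lower bound. The constant and function names are illustrative only and do not come from the deck.

```python
# Figures quoted on slide 4 of the deck, in U.S. dollars per teraFLOP.
COST_PER_TFLOP_1995 = 100_000_000  # "approximately $100 million" in 1995
COST_PER_TFLOP_2019 = 5_000        # "less than $5,000" at the time of the talk


def reduction_factor(old_cost: float, new_cost: float) -> float:
    """Return how many times smaller new_cost is than old_cost."""
    if new_cost <= 0:
        raise ValueError("new_cost must be positive")
    return old_cost / new_cost


factor = reduction_factor(COST_PER_TFLOP_1995, COST_PER_TFLOP_2019)
# Because $5,000 is an upper bound, this is a lower bound on the true factor.
print(f"Cost per teraFLOP fell by a factor of at least {factor:,.0f}")
```

Running it prints `Cost per teraFLOP fell by a factor of at least 20,000`, matching the slide.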
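The N+1 output figures on slide 6 can be checked the same way. This is a sketch under stated assumptions, not Vertiv's sizing method: it assumes that all nine shelf slots hold 3,300 W rectifiers (the slide says the slots are shared with BBUs), that N+1 means one installed rectifier is held in reserve, and that the dual-zone configuration is two independent zones at single-zone capacity. Under those assumptions it reproduces the 277/480 V values. The 208 V figure (16.5 kW) is not modeled, because the slide does not say how that configuration is populated.

```python
# Figures from slide 6 of the deck.
RECTIFIER_W = 3_300  # rating of each rectifier, in watts
SLOTS = 9            # shelf slots (shared between rectifiers and BBUs)


def n_plus_1_capacity_w(installed: int, rating_w: int) -> int:
    """Usable output with N+1 redundancy: one installed rectifier is held in reserve."""
    if installed < 2:
        raise ValueError("N+1 redundancy needs at least two rectifiers")
    return (installed - 1) * rating_w


# ASSUMPTION: all nine slots hold rectifiers (no BBUs installed).
single = n_plus_1_capacity_w(SLOTS, RECTIFIER_W)
# ASSUMPTION: a dual-zone configuration is two independent single-zone systems.
dual = 2 * single
print(f"277/480 V single zone: {single / 1000:.1f} kW, dual zone: {dual / 1000:.1f} kW")
```

This prints `277/480 V single zone: 26.4 kW, dual zone: 52.8 kW`. Working in integer watts avoids floating-point rounding in the capacity arithmetic itself.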
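The 4.2 petaFLOPS figure for Corona on slide 11 is consistent with a simple theoretical-peak calculation. The per-GPU numbers below (4,096 stream processors at a 1.5 GHz peak engine clock, two FP32 FLOPs per fused multiply-add) are an assumption taken from AMD's Radeon Instinct MI25 specifications, not from the deck. Peak figures say nothing about sustained application performance.

```python
# Figures from slide 11 of the deck.
NODES = 170          # two-socket Corona compute nodes
GPUS_PER_NODE = 4    # MI25 GPUs in each GPU-equipped node

# ASSUMPTION (not in the deck): Radeon Instinct MI25 peak FP32 rate
# = stream processors x peak clock x 2 FLOPs per fused multiply-add.
MI25_STREAM_PROCESSORS = 4096
MI25_PEAK_CLOCK_HZ = 1.5e9
FLOPS_PER_FMA = 2


def peak_fp32_flops(gpu_nodes: int, gpus_per_node: int, per_gpu_flops: float) -> float:
    """Theoretical peak across all GPUs: nodes x GPUs per node x per-GPU peak."""
    return gpu_nodes * gpus_per_node * per_gpu_flops


per_gpu = MI25_STREAM_PROCESSORS * MI25_PEAK_CLOCK_HZ * FLOPS_PER_FMA  # about 12.3 TFLOPS
gpu_nodes = NODES // 2  # "half of the compute nodes" -> 85 nodes
total = peak_fp32_flops(gpu_nodes, GPUS_PER_NODE, per_gpu)
print(f"{gpu_nodes * GPUS_PER_NODE} GPUs, peak FP32 ~ {total / 1e15:.1f} PFLOPS")
```

This prints `340 GPUs, peak FP32 ~ 4.2 PFLOPS`; the unrounded result (about 4.18 PFLOPS) rounds to the figure quoted on the slide.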