In this deck from the HPC User Forum in Santa Fe, Peter Hopton from Iceotope presents: European Exascale System Interconnect & Storage.
"A new Exascale computing architecture using ARM processors is being developed by a European consortium of hardware and software providers, research centers, and industry partners. Funded by the European Union’s Horizon2020 research program, a full prototype of the new system is expected to be ready by 2018."
The project, called ExaNeSt, is based on ARM processors, originally developed for mobile and embedded applications, similar to another EU project, Mont Blanc, which also aims to design a supercomputer architecture around ARM processors. Where ExaNeSt differs from Mont Blanc, however, is its focus on networking and on the design of applications. ExaNeSt is co-designing the hardware and software, enabling the prototype to run real-life evaluations and providing a stable, scalable platform intended to encourage the development of HPC applications for this ARM-based supercomputing architecture.
Learn more: http://www.iceotope.com/
and
http://hpcuserforum.com
2. Project Group Landscape
• “EuroExa” Group Of Projects
• Historic Input Effort (from 2010)
– EuroServer, Mont Blanc, Encore
• Currently Running Projects (2014-2018)
– ExaNeSt
– EcoScale
– ExaNode
• Follow Through Project (2017-2020)
– To Be Announced... ;-)
3. What ExaNeSt is about
• ARMv8, UNIMEM Partitioned Global Address Space (PGAS)
– low energy compute
– low overhead communicate
– FPGA-assisted acceleration
– working closely with Other Projects…
• Network: unified compute & storage, low latency
• Storage: distributed, in-node non-volatile memories
• Real Applications: Scientific, Engineering, Data Analytics
• Data Centre Infrastructure (Power and Cooling): Total Liquid Cooling
• Prototype: 1K cores, 16 GBytes DDR4 per FPGA
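The slides name UNIMEM PGAS but give no API. As a hedged illustration of the PGAS idea only (the class, method names, and partition sizes below are invented for the sketch and are not UNIMEM's actual interface), a global address can encode both the owning node and a local offset, so remote accesses are routed to the owner:

```python
class PGAS:
    """Toy partitioned global address space: each node owns one
    contiguous partition; a global address decodes to (node, offset)."""

    def __init__(self, nodes, partition_words):
        self.partition_words = partition_words
        self.memory = [[0] * partition_words for _ in range(nodes)]

    def owner(self, global_addr):
        # The high part of the address selects the owning node.
        return divmod(global_addr, self.partition_words)

    def put(self, global_addr, value):
        node, offset = self.owner(global_addr)
        self.memory[node][offset] = value  # remote write, routed to owner

    def get(self, global_addr):
        node, offset = self.owner(global_addr)
        return self.memory[node][offset]   # remote read, routed to owner


pgas = PGAS(nodes=4, partition_words=1024)
pgas.put(3 * 1024 + 7, 42)     # address owned by node 3
print(pgas.get(3 * 1024 + 7))  # -> 42
```

Any node can name any other node's memory directly, which is the property that lets the project combine low-overhead communication with a conventional load/store programming style.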
HPC User Forum, 1st March 2017, Stuttgart
4. The ExaNeSt Consortium
(Consortium map; partner countries grouped by role)
• Storage & data: UK, Italy, Greece (coordinator)
• Technology: UK, UPV (ES)
• Interconnects: Netherlands, France, Germany
• Applications: Italy (three partners)
www.exanest.eu
6. Interconnection Network
• Now: Simulations and Studies:
– at the packet/flit level, for protocol behavior and interactions (using
INSEE and OMNeT++);
– Traffic inputs: synthetic models, real application traces, or running applications.
• Later: Experiments on the real prototype running real applications
• Packaging & interconnect considered in tandem
– Hierarchical interconnect
• Design Goals:
– unified network for compute & storage
– flow prioritization: heavy / storage versus short / sync (compute)
– throttle congestive flows at network edges
– resiliency: error detect/correct, monitor links, multipath routing
– Zero-copy, user-level RDMA
– Global address space
– all-optical proof-of-concept TOR switch using 2×2/4×4 building blocks
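The slides list throttling congestive flows at the network edge as a design goal without specifying the mechanism. One common way to realize edge throttling is a per-flow token bucket, sketched below with illustrative rate and burst values (not taken from the project):

```python
class TokenBucket:
    """Per-flow edge throttle: packets are admitted only while tokens
    are available; tokens refill at the flow's allowed rate."""

    def __init__(self, rate_tokens_per_s, burst_tokens):
        self.rate = rate_tokens_per_s
        self.burst = burst_tokens
        self.tokens = burst_tokens
        self.last = 0.0

    def admit(self, now, cost=1):
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True   # inject the packet into the fabric
        return False      # hold at the edge: flow exceeds its allowance


bucket = TokenBucket(rate_tokens_per_s=10, burst_tokens=2)
admitted = [bucket.admit(t) for t in (0.0, 0.0, 0.0, 0.5)]
print(admitted)  # -> [True, True, False, True]
```

Holding excess packets at the edge, rather than letting them queue inside the fabric, is what keeps heavy storage flows from delaying the short synchronization traffic the design goals prioritize.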
8. Storage: current Design work
Global Storage Layer + per-job on-demand SSD/NVM Parallel Cache Layer
• Based on the BeeGFS parallel filesystem (open source),
with caching and replication extensions
• Low-latency memory-mapped storage access path in Linux
• Virtualization: RDMA from within VMs; MPI remoting
• Acceleration for Host-to-VM and VM-to-VM interactions
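The low-latency memory-mapped access path itself is not described in the slides. The general mmap idea it builds on, shown here with standard Python on a temporary file (not the project's actual kernel-level path), is that once a file is mapped, reads and writes become plain memory accesses rather than per-access system calls:

```python
import mmap
import os
import tempfile

# Create a small backing file, then map it into the address space.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)
with mmap.mmap(fd, 4096) as m:
    m[0:5] = b"hello"      # store via the mapping, no write() call
    data = bytes(m[0:5])   # load via the mapping, no read() call
os.close(fd)
os.remove(path)
print(data)  # -> b'hello'
```

With byte-addressable NVM in the node, this style of access avoids block-I/O overhead on the storage fast path, which is presumably the motivation for the Linux access path the slide mentions.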
12. Power
• 48 V DC to PoL Converter on Daughter Board
• 3-Phase 415 V AC to 48 V DC Liquid-Cooled Power Conversion
• 48 V DC Power Distribution
• Ambition to integrate on-site renewable energy generation and energy storage at the DC level, removing conversion cycles and giving "green power" (separate project)
Old: AC → DC (48 V UPS) → AC → DC (12 V) → PoL
New: AC → DC (48 V) → PoL
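Fewer conversion stages compound into higher end-to-end efficiency, since stage efficiencies multiply. A back-of-the-envelope sketch with assumed per-stage efficiencies (the numbers are illustrative, not from the slides):

```python
from functools import reduce


def chain_efficiency(stages):
    """Overall efficiency of cascaded power-conversion stages
    (product of the individual stage efficiencies)."""
    return reduce(lambda a, b: a * b, stages, 1.0)


# Assumed per-stage efficiencies, for illustration only
old = chain_efficiency([0.95, 0.94, 0.92, 0.93])  # rectifier, UPS, 12 V PSU, PoL
new = chain_efficiency([0.97, 0.95])              # 415 V AC -> 48 V DC, 48 V -> PoL

print(f"old chain: {old:.1%}, new chain: {new:.1%}")
```

Under these assumptions the four-stage chain loses roughly a quarter of the input power before it reaches the load, while the two-stage 48 V chain loses well under ten percent, which is the rationale for direct 48 V DC distribution.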
13. Cooling
Combination of 3 Different Patented Cooling Technologies
• Immersion
– Natural Convection “Cells”
– Forced Convection “Fountains”
• Conduction
• Total Liquid Cooling
• Zero Airflow
• Potential for High Temperatures and Heat Capture
18. The 1st Major ExaNeSt Prototype (2017)
• Using Xilinx Zynq UltraScale+
FPGAs:
– Quad-core 64-bit ARM A53 per FPGA
– Cache-coherent low-latency I/O port
• On 120 × 130 mm² Daughter Boards
– 4 FPGAs
– 1 TB SSD
– 10 × 16 Gb/s I/O links
• 1st Prototype
– Up to 8 DBs per Blade, 9 Blades
• In Early Test Now
– First deployment = within 2017
(Photos: electronics immersed in PFPE liquids; rack-level water circulation)
19. The 2nd Major ExaNeSt Prototype (2018)
• Same FPGAs, Same Quad-FPGA Daughter Board
• 16 DBs in 1 Blade, 3.2 kW in 1U, Small-Footprint Cabinets
• Our Ambition:
– Hot-Water Cooling (>50 °C), Heat-Capture Functions
– 100 kW Cabinets, PUE approaching 1.0, ERF Potential >0.9
– DC Power Distribution
– Potential for High-Density Data Centres or Modular Facilities:
(Floor-plan diagram: aisles of 600 × 600 mm cabinets, each with 6 chassis and 2 CDUs, fed by shared coolant and DC power distribution; batteries and rectifiers for 250 kW of load per bank; overall footprint approximately 10.6 m × 2.5 m between walls; 1 MW DC in a 40 ft ISO container or 10 m × 2.4 m)
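The PUE and ERF ambitions above follow the standard definitions: PUE is total facility power over IT power, and ERF is reused energy over total facility energy. A small worked sketch with invented example numbers (the 100 kW cabinet figure is from the slides; the overhead and reuse values are assumptions):

```python
def pue(it_kw, overhead_kw):
    """Power Usage Effectiveness: total facility power / IT power."""
    return (it_kw + overhead_kw) / it_kw


def erf(reused_kw, it_kw, overhead_kw):
    """Energy Reuse Factor: energy reused elsewhere / total facility energy."""
    return reused_kw / (it_kw + overhead_kw)


# Illustrative numbers for a 100 kW liquid-cooled cabinet:
# 3 kW of assumed overhead, 95 kW of heat assumed captured as hot water.
print(f"PUE = {pue(100, 3):.2f}")
print(f"ERF = {erf(95, 100, 3):.2f}")
```

Total liquid cooling with zero airflow removes most fan and chiller overhead (pushing PUE toward 1.0), and hot-water heat capture is what makes an ERF above 0.9 plausible.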
20. What ExaNeSt Offers to the Ecosystem
• Use of the Prototypes by ExaNoDe and EcoScale
• Use of the Prototype by the Ecosystem (2018 onwards)
• HPC technology components (2018 onwards):
– ARM/UNIMEM
– Fine-tuned Applications
– The Ultra-Dense Data Centre for Scaling HPC
– Packaging & Cooling
– Distributed NVM / Storage
– Interconnects
– DB compute-node prototype
21. European Exascale
System Interconnect & Storage
• Interconnection Network
• In-node Storage
• Data Centre Infrastructure
• Advanced Cooling & Power
• Real Applications
www.exanest.eu Project Coordinator:
Prof. Manolis Katevenis (kateveni@ics.forth.gr)