4. HPC division highlights
• The Eurotech HPC division focuses on designing, manufacturing, delivering and supporting high performance computing solutions
• More than 14 years of delivering supercomputing systems and solutions to industry and academia
• First company worldwide to bring hot-water-cooled high performance computers to market; the first hot-water-cooled HPC system on the market was delivered in 2009
• R&D capabilities nurtured in house and through collaboration with universities and research centres in Europe: INFN, Jülich, Regensburg, DESY…
• Founding member of the European Technology Platform for HPC (ETP4HPC)
5. Eurotech HPC project examples
• APEmille, 1999-2002
• apeNEXT, 2002-2005
• Janus, 2006-2008
• QPACE, 2007-2009
• Aurora Science, 2008-2010
• Selex Elsag (E-security), 2011-2012
• DEEP project, 2012
• Eurora, 2012
10. JANUS-SSUE: Architecture
• 16 computing nodes (SP) are interconnected through a 3D network (3D-Torus); each SP is connected to the external world through an I/O management board (IOP). A neighbor-addressing sketch follows below.
• I/O interfaces provided by the IOP module:
2 × 1-Gbit Ethernet links
1 UART
1 USB
1 LVDS channel for high-speed I/O
• the carrier board (PB) provides the 3D network, power generation/distribution and clock management
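To make the torus wiring concrete, here is a minimal Python sketch of neighbor addressing for the 16 SPs. The 4×2×2 decomposition is an assumption for illustration; the slide only states that the 16 nodes form a 3D torus.

```python
# Minimal sketch of 3D-torus neighbor addressing for the 16 SPs.
# The 4x2x2 decomposition is an assumption for illustration; the slide
# only states that 16 nodes are connected through a 3D torus.

DIMS = (4, 2, 2)  # assumed X, Y, Z extents; product = 16 nodes

def coords(rank):
    """Map a linear node id 0..15 to (x, y, z) torus coordinates."""
    x, rest = divmod(rank, DIMS[1] * DIMS[2])
    y, z = divmod(rest, DIMS[2])
    return x, y, z

def rank_of(x, y, z):
    """Inverse mapping, with wrap-around on every axis (torus)."""
    x %= DIMS[0]; y %= DIMS[1]; z %= DIMS[2]
    return (x * DIMS[1] + y) * DIMS[2] + z

def neighbors(rank):
    """The six torus neighbors: X+, X-, Y+, Y-, Z+, Z-."""
    x, y, z = coords(rank)
    return {
        "X+": rank_of(x + 1, y, z), "X-": rank_of(x - 1, y, z),
        "Y+": rank_of(x, y + 1, z), "Y-": rank_of(x, y - 1, z),
        "Z+": rank_of(x, y, z + 1), "Z-": rank_of(x, y, z - 1),
    }

print(neighbors(0))  # e.g. node 0 wraps around to node 12 in the X- direction
```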
17. From 1 node to large petascale systems
[Diagram: integration hierarchy from node to backplane, chassis, cooling system and rack]
18. The Aurora Tigon: Unleash the hybrid power
Key features:
• High performance density – 256 CPUs, 256 accelerators, up to 350 TFlops in just 1.5 m²
• Energy efficiency – Aurora direct liquid cooling targets a datacenter PUE of 1.05, needs no air conditioning and uses up to 50% less energy (a rough estimate follows below)
• Programmability and compatibility – based on a standard HPC cluster architecture; 100% compatible with existing applications
• Flexible liquid cooling – all components are cooled by water, with temperatures from 18°C to 52°C and variable flow rates
• Reliability – 3 independent sensor networks, soldered memory, no moving parts, uniform cooling, quality controls
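As a rough reading of the energy claim, the sketch below compares a PUE of 1.05 against an assumed conventional air-cooled baseline of 1.9; the baseline and the 100 kW load are illustrative values, not figures from the slides.

```python
# Back-of-the-envelope reading of the PUE claim. PUE = total facility
# power / IT power, so PUE 1.05 means only 5% overhead for cooling and
# power distribution. The 1.9 baseline for an air-cooled room is an
# assumed typical value, not a figure from the slides.

it_power_kw = 100.0                     # hypothetical IT load

aurora_total = it_power_kw * 1.05       # slide's target PUE
air_cooled_total = it_power_kw * 1.9    # assumed conventional baseline

saving = 1 - aurora_total / air_cooled_total
print(f"facility power: {aurora_total:.0f} kW vs {air_cooled_total:.0f} kW "
      f"({saving:.0%} less)")           # ~45%, in line with 'up to 50%'
```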
19. The Aurora Tigon node card
• The node card is the main processing unit
• An aluminum cold plate cools the board, smoothing temperature distributions and ensuring maximum heat-extraction efficiency
• A large, high-end FPGA allows implementation of a point-to-point 3D-Torus network
Specifications:
2 × Intel Xeon E5 series CPUs
2 × Nvidia Kepler K20 or 2 × Intel Xeon Phi 5120D accelerators
2730 GFlops peak (a rough breakdown follows below)
800 W power consumption
64 GB soldered memory
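As a hedged sanity check of the 2730 GFlops figure for the K20 variant, the per-device peaks below are taken from public spec sheets, and the 8-core, 3.1 GHz Xeon E5 SKU is an assumption rather than something the slide states.

```python
# Hedged sanity check of the 2730 GFlops figure for the K20 variant.
# Per-device peaks are assumed from public spec sheets, not from the slide:
# a K20 peaks at ~1170 GFlops double precision; an 8-core 3.1 GHz Xeon E5
# with AVX does 8 cores x 3.1 GHz x 8 DP flops/cycle ~= 198 GFlops.

gpu_peak = 2 * 1170                     # two Nvidia K20 accelerators
cpu_peak = 2 * (8 * 3.1e9 * 8) / 1e9    # two Xeon E5 CPUs (assumed SKU)

print(f"node peak ~= {gpu_peak + cpu_peak:.0f} GFlops")  # ~2737, close to 2730
```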
20. The Aurora Tigon: node card architecture
• 2 Intel Sandy Bridge Xeon E5 CPUs connected via QuickPath Interconnect (QPI) at 8.0 GT/s
• The system hub (I/O bridge) is an Intel Patsburg chipset and provides connectivity between the CPUs and the rest of the system
• One SATA disk or SSD provides fast, permanent local storage
• A Mellanox QDR/FDR adapter is connected to one of the CPUs via one x8 PCIe 2.0 link
• An Altera Stratix V FPGA is connected via two PCIe 2.0 x8 links, one to each CPU (a link-bandwidth estimate follows below)
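The sketch below turns the link list into a rough bandwidth budget using standard PCIe 2.0 and QPI per-link rates; these rates are general knowledge about the interfaces, not figures taken from the slide.

```python
# Rough bandwidth budget for the node-card links described above, using
# standard per-link rates (not figures from the slide).

# PCIe 2.0: 5 GT/s per lane, 8b/10b encoding -> 500 MB/s per lane per direction
pcie2_lane_mbs = 5e9 * 8 / 10 / 8 / 1e6
ib_link = 8 * pcie2_lane_mbs / 1e3        # Mellanox adapter: one x8 link
fpga_link = 2 * 8 * pcie2_lane_mbs / 1e3  # FPGA: two x8 links, one per CPU

# QPI at 8.0 GT/s moves 2 bytes per transfer per direction
qpi_gbs = 8.0e9 * 2 / 1e9

print(f"IB adapter: {ib_link:.1f} GB/s, FPGA: {fpga_link:.1f} GB/s, "
      f"QPI: {qpi_gbs:.1f} GB/s per direction")
```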
21. Aurora FPGA: 3D Torus network processor
• Each node card provides 6 full-duplex links (X+, X-, Y+, Y-, Z+, Z-).
• Each link is physically implemented by two lines (main and redundant) that can be selected in software to configure the machine partitioning (the full 3D Torus or one of the many available 3D sub-tori); a topology sketch follows below.
[Diagram label: 2 × PCIe 3.0 x8]
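For the topology sketch referenced above: this is one common way an application addresses such a 3D torus, using mpi4py's Cartesian topology. It is purely illustrative; nothing in the slides says Aurora software uses this particular API.

```python
# Illustrative only: addressing a periodic 3D grid (a torus) with mpi4py.
# Run under mpirun with any rank count; Compute_dims factors it into 3D.
from mpi4py import MPI

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 3)        # factor ranks into a 3D grid
cart = comm.Create_cart(dims, periods=[True] * 3)  # periods=True -> torus wrap

# One (source, destination) pair per axis, matching the X/Y/Z link pairs
for axis, name in enumerate("XYZ"):
    minus, plus = cart.Shift(axis, 1)
    print(f"rank {comm.Get_rank()}: {name}- -> {minus}, {name}+ -> {plus}")
```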
Aurora systems are scalable from a single working unit to many computing racks with no performance degradation.
They combine high density, power efficiency and noiseless operation, thanks to extensive liquid cooling and advanced packaging. A wide range of configurations is available.
The node card is the main processing unit of every Aurora system: a blade hosting two Intel Xeon 5600 series processors (six cores, up to 3.34 GHz, TDP < 130 W), each connected to up to 24 GB of 1333 MHz DDR3 memory. CPUs are linked to peripherals via an Intel chipset, using Intel QPI at 6.4 GT/s. One node has a computing power of 155 GFlops and a typical power consumption of 350 W.
A large, high-end FPGA allows implementation of a point-to-point 3D-Torus network for nearest-neighbor communications. The 3D-Torus interconnect offers low latency (about 1 µs), 60 Gbps bandwidth, and high robustness and reliability thanks to redundant lines. Nodes host one Mellanox ConnectX-2 device, with 40 Gbps bandwidth and <2 µs latency, used to implement a QDR InfiniBand switched network; a simple transfer-time comparison of the two fabrics follows below.
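The comparison below applies a simple latency-bandwidth model (T = latency + size/bandwidth) to the two fabrics, using only the figures quoted above; real transfers add protocol overheads that this model ignores.

```python
# Simple latency-bandwidth model (T = latency + size/bandwidth) comparing
# the two fabrics for a nearest-neighbor message, using the figures
# quoted above. Real performance also depends on protocol overheads.

def xfer_time_us(size_bytes, latency_us, bandwidth_gbps):
    """Transfer time in microseconds under the linear model."""
    return latency_us + size_bytes * 8 / (bandwidth_gbps * 1e3)

for size in (1_024, 65_536, 1_048_576):
    torus = xfer_time_us(size, 1.0, 60.0)   # 3D torus: ~1 us, 60 Gbps
    ib = xfer_time_us(size, 2.0, 40.0)      # QDR InfiniBand: ~2 us, 40 Gbps
    print(f"{size:>9} B: torus {torus:8.1f} us, IB {ib:8.1f} us")
```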
Reconfigurable computing functions, such as acceleration and GPU-like co-processing, are possible thanks to the available logic resources on the FPGA (up to 700 Gops per device).
Aurora nodes are hosted in chassis, each of which also features an IB switch with 20 QDR ports for QSFP cables. All chassis feature two monitoring and control networks for reliable operation. Maintenance is also possible using a touch-screen interface on a monitor showing diagnostic data.
Aurora racks can contain up to 16 chassis each, and they provide mechanical support for access and maintenance, power distribution, cable routing, and piping infrastructure for heat removal via a liquid cooling circuit.