State of ARM-based HPC

LTD20-106
24 March 2020

Welcome!
1. This is not our first rodeo…
a. Mont Blanc - https://www.montblanc-project.eu/wp-content/uploads/2017/12/UCHPC_Presentation_PDF_lw.pdf
b. Linaro Connect - http://connect.linaro.org.s3.amazonaws.com/sfo17/Presentations/SFO17-200K1.pdf
c. Linaro Connect - https://connect.linaro.org/resources/san19/san19-400k1/
d. Arm - https://developer.arm.com/solutions/hpc
2. Can AArch64/Arm64 do HPC? The answer is a resounding yes!

Typical components of an HPC system
1. Common components.
a. A configuration that is as near identical per node as possible.
b. A method of interconnecting nodes.
2. A job scheduler.
a. Slurm Workload Manager
b. Univa Grid Engine
c. ...and other schedulers or ways to parallelise across nodes.
3. CPU / RAM / Interconnect / Storage
Is that enough?
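Whatever scheduler is used, spreading one job across nodes usually starts with dividing the work. A minimal sketch of a block decomposition, assuming N independent work items and P ranks (for example one per node or per core); `block_range` is a hypothetical helper, not part of Slurm or Univa Grid Engine.

```python
def block_range(n_items: int, n_ranks: int, rank: int) -> range:
    """Return the contiguous slice of work items owned by `rank`.

    Items are split as evenly as possible: the first `n_items % n_ranks`
    ranks each receive one extra item.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("rank must be in [0, n_ranks)")
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


if __name__ == "__main__":
    # 10 items over 4 ranks gives slices of size 3, 3, 2 and 2.
    for r in range(4):
        print(r, list(block_range(10, 4, r)))
```

When tasks are launched with `srun`, each task can read its rank from the `SLURM_PROCID` environment variable and the task count from `SLURM_NTASKS`, then process only its own slice.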

Components
1. Core volume/density.
a. We used to count the number of simultaneous processes by the number of physical
CPUs.
i. In each node we look at the number of CPUs (sockets)
ii. The number of cores
iii. The number of hardware threads
1. Is simultaneous multithreading (SMT) intentionally disabled?
iv. Is NUMA supported?
v. Are those CPUs cache-coherent?
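2. Levels of Cache
L0 - Macro-op cache (on some cores)
L1 - per core
L2 - per core or per cluster of cores, depending on the design
L3 - shared across clusters (often the last-level cache)
L1 is usually split into separate Instruction and Data caches; L2 and L3 are typically unified.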
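On a Linux node, most of the questions above can be answered from the kernel's sysfs tree. A minimal sketch, assuming a Linux host that exposes `/sys/devices/system/node` and the per-CPU `cache/index*` directories (some Arm firmware omits cache information, in which case the cache list comes back empty); `describe_node` and `read_text` are hypothetical helpers.

```python
import os
from pathlib import Path


def read_text(path: Path):
    """Return the stripped contents of a sysfs file, or None if unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def describe_node(sysfs: Path = Path("/sys/devices/system")) -> dict:
    """Summarise CPUs, SMT, NUMA nodes and cpu0's caches on a Linux host."""
    info = {"logical_cpus": os.cpu_count() or 1}
    # CPUs this process may run on (respects affinity masks and cpusets).
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    else:
        info["usable_cpus"] = info["logical_cpus"]
    # "1" if SMT is active, "0" if disabled, None if the kernel does not say.
    info["smt_active"] = read_text(sysfs / "cpu" / "smt" / "active")
    # Each nodeN directory is one NUMA node; none found means a single node.
    info["numa_nodes"] = len(list((sysfs / "node").glob("node[0-9]*"))) or 1
    caches = []
    for idx in sorted((sysfs / "cpu" / "cpu0" / "cache").glob("index*")):
        level, ctype, size = (read_text(idx / f) for f in ("level", "type", "size"))
        if level is not None:
            caches.append((int(level), ctype, size))
    # List of (level, type, size); type is "Data", "Instruction" or "Unified".
    info["cpu0_caches"] = caches
    return info


if __name__ == "__main__":
    print(describe_node())
```

The `lscpu` command prints a similar summary (threads per core, NUMA nodes, cache sizes) from the command line.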

Chips
● Arm v8.0-A (Advanced SIMD/Neon, 32 x 128-bit registers)
○ Ampere eMAG 8180
○ Cavium ThunderX
○ Qualcomm Kryo
● Arm v8.1-A
○ Marvell ThunderX2 (28-core variant) - Astra Supercomputer (dual-socket)
○ Marvell ThunderX2 (32-core variant) - Isambard Supercomputer (dual-socket)
● Arm v8.2-A
○ Arm Neoverse N1
○ Fujitsu A64FX (+SVE) - Fugaku Supercomputer (single-socket)
○ Huawei Kunpeng 920
○ NVIDIA Carmel
○ Ampere Altra (v8.2+)
● Arm v8.3-A (SIMD complex-number rotation and nested virtualisation support)
○ Marvell ThunderX3 (v8.3+) 2020
○ Huawei Kunpeng 930 (almost v8.4 + SVE) 2021
https://en.wikipedia.org/wiki/ARM_architecture

Chips
● Arm v8.6-A (Neoverse N2 ‘Zeus’ to be used in the European Processor Initiative)
○ General Matrix Multiply (GEMM)
○ BFloat16 format support
○ SIMD matrix manipulation instructions: BFDOT, BFMMLA, BFMLAL and BFCVT
○ Enhancements for virtualization, system management and security
● Arm SVE2
○ Fine-grained data-level parallelism
Support for v8.6-A and SVE2 to be in GCC 10 and LLVM Clang 9
Announced April 2019
https://en.wikipedia.org/wiki/ARM_architecture
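BFloat16 is the top 16 bits of an IEEE 754 single-precision number: 1 sign bit, 8 exponent bits and 7 mantissa bits, so it keeps float32's range while giving up precision. The sketch below converts to bfloat16 with round-to-nearest-even and uses it in a naive reference GEMM. It only illustrates the number format and the "bfloat16 inputs, wider accumulator" pattern; it is not how BFCVT, BFDOT or BFMMLA are implemented, and all function names are hypothetical.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a float to bfloat16 (round-to-nearest-even); return its 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 encoding
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: force a quiet NaN, never infinity
    # Add just under half a bfloat16 ulp, plus 1 if the kept LSB is odd,
    # so exact ties round to the even neighbour.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 bits back to a float by zero-filling the low mantissa."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def bf16(x: float) -> float:
    """Value of x after a round trip through bfloat16."""
    return bf16_bits_to_float(float_to_bf16_bits(x))


def gemm_bf16(a, b, c, alpha=1.0, beta=1.0):
    """Reference GEMM, C' = alpha * A @ B + beta * C, with bfloat16 inputs.

    A is m x k, B is k x n and C is m x n (lists of lists). Only A and B are
    rounded to bfloat16; products are summed in Python floats (double
    precision), whereas the hardware instructions accumulate in float32.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(bf16(a[i][p]) * bf16(b[p][j]) for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        out.append(row)
    return out


if __name__ == "__main__":
    print(bf16(3.14159265))  # pi keeps only about 3 significant digits
    print(gemm_bf16([[1.0, 2.0]], [[3.0], [4.0]], [[1.0]], beta=0.5))
```

A reference like this is mostly useful as a correctness oracle when testing an optimised kernel; small differences in the last bits are expected because of the different accumulator precision.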

RISC, CISC, ACCELERATOR
● The Arm ISA is a RISC design
○ Do simple operations highly efficiently.
○ Most instructions are simple and fixed-length and can be issued at about one per clock cycle, which makes pipelining straightforward.
● A CISC design
○ Does simple instructions like RISC, but also has complex instructions that take more than one clock cycle. Pipelining is more cumbersome.
● Accelerators
○ Do bespoke actions as quickly as possible, even asynchronously.
● The challenge:
○ Can an Arm ISA extended with accelerator-style operations be as effective as a CISC + plug-in accelerator?
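Why simple, uniform instructions help can be made concrete with the textbook ideal-pipeline model. It assumes n instructions flowing through k stages of one cycle each with no stalls or hazards, which real cores only approach:

```latex
\begin{aligned}
T_{\text{unpipelined}} &= n\,k, \qquad T_{\text{pipelined}} = k + (n - 1), \\
S(n) &= \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}}
      = \frac{n\,k}{k + n - 1} \;\longrightarrow\; k \quad \text{as } n \to \infty
\end{aligned}
```

For example, with k = 5 and n = 100 the speedup is 500 / 104, about 4.8. Every multi-cycle instruction that holds up a stage adds stall cycles to the pipelined count, which is why an ISA made mostly of simple instructions pipelines more easily.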

Interconnects
● Within a single chip, up to 128 cores can be linked by the Arm CMN-600 Coherent Mesh Network
● Between chips and chassis there are:
○ PCIe
○ CCIX
○ CXL?
○ Aries
○ Tofu
● Network options
○ InfiniBand - Low latency
○ Ethernet
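A first-order way to compare these options is the latency-bandwidth (Hockney) model: sending an m-byte message costs a fixed latency plus m divided by the bandwidth. Low latency dominates for small messages and bandwidth for large ones. The sketch assumes you supply measured latency and bandwidth figures for each link and ignores contention, topology and protocol overheads; the function names are hypothetical.

```python
from typing import Optional


def transfer_time(msg_bytes: float, latency_s: float, bandwidth_Bps: float) -> float:
    """Hockney model: T(m) = latency + m / bandwidth (seconds)."""
    return latency_s + msg_bytes / bandwidth_Bps


def crossover_bytes(lat_a: float, bw_a: float,
                    lat_b: float, bw_b: float) -> Optional[float]:
    """Message size at which links A and B take equal time, or None.

    Solves lat_a + m / bw_a == lat_b + m / bw_b for m > 0.
    """
    slope_diff = 1.0 / bw_b - 1.0 / bw_a
    if slope_diff == 0:
        return None  # same bandwidth: one link is always faster (or they tie)
    m = (lat_a - lat_b) / slope_diff
    return m if m > 0 else None
```

Feeding in figures measured on your own fabric (for example from a ping-pong benchmark) shows the message size above which a higher-bandwidth but higher-latency link starts to win.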

Adaptive Compute Acceleration
https://www.xilinx.com/products/silicon-devices/acap/versal-premium.html

Resilience
● ECC Memory
● Dual power-supplies
● Core fault sensing
● ...Containers?
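ECC memory stores extra check bits alongside each data word so the memory controller can detect and correct bit flips. Server DIMMs typically use a SECDED code over 64-bit words (8 check bits per 64 data bits). The classic Hamming(7,4) code below shows the same single-error-correction idea on 4 data bits; it is an illustration only, not the code any particular memory controller uses, and the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4].

    Parity bits sit at 1-based positions 1, 2 and 4; each covers the
    positions whose index has that bit set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Correct up to one flipped bit; return (data_bits, error_position).

    error_position is the 1-based index of the corrected bit, or 0 if none.
    """
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # equals the position of a single error
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome
```

SECDED adds one overall parity bit on top of a Hamming code, so double-bit errors are detected and reported rather than silently miscorrected.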

Blending Containers
● Containers are packaged environments that make applications easy to run by bundling their dependencies.
● Multiple containers can work together as building blocks of a larger solution.
● Subject to operational requirements, containers can be built to run on a variety of platforms.
○ From SBC to HPC!
● With the right sort of scheduler system and orchestration tool, jobs become:
○ Auto-built/tested
○ Parallelised
○ Flexible
○ Scalable
○ On-demand
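Building one image per architecture and publishing them under a multi-platform manifest lets the same image name run on both x86-64 and AArch64 nodes. Container tooling names platforms as os/architecture[/variant] using Go's architecture names (`amd64`, `arm64`, `arm/v7`). The helper below is a hypothetical sketch that maps the host's `platform.machine()` string onto those names, for example to choose a build target; the mapping table covers only common cases.

```python
import platform
from typing import Optional

# Host machine strings -> (architecture, variant) as used in image platforms.
_ARCH_MAP = {
    "x86_64": ("amd64", None),
    "amd64": ("amd64", None),    # Windows reports "AMD64"
    "aarch64": ("arm64", None),  # Linux on 64-bit Arm
    "arm64": ("arm64", None),    # macOS and Windows on 64-bit Arm
    "armv7l": ("arm", "v7"),
    "ppc64le": ("ppc64le", None),
    "s390x": ("s390x", None),
    "riscv64": ("riscv64", None),
}


def oci_platform(machine: Optional[str] = None, os_name: str = "linux") -> str:
    """Return a platform string such as 'linux/arm64' for a machine type."""
    machine = (machine or platform.machine()).lower()
    try:
        arch, variant = _ARCH_MAP[machine]
    except KeyError:
        raise ValueError(f"unknown machine type: {machine!r}") from None
    return f"{os_name}/{arch}" + (f"/{variant}" if variant else "")
```

The result has the form accepted by, for example, `docker buildx build --platform linux/arm64`.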

Storage is still required...
● DRAM is volatile
● Virtual disks are ephemeral
● Nodes may be diskless
● Persistent storage is still needed:
○ File systems
■ ext4, LVM, XFS, ZFS
○ Parallel file systems
■ Lustre
○ Distributed storage
■ Ceph
○ Media
■ Conventional disks
■ SSD, NVMe

Applications
What does HPC enable...
● 292 libraries/applications tested for AArch64 -
https://gitlab.com/arm-hpc/packages/-/wikis/home
● Weather prediction
○ Although Scalable Probabilistic Approximation might be more efficient…
https://advances.sciencemag.org/content/6/5/eaaw0961
● Molecular Dynamics
○ GROMACS supports SIMD NEON operations
○ SIMD algorithms for Arm SVE are scheduled for 2021: https://redmine.gromacs.org/issues/2806
● AI

All things Cloud...
● IDC - Worldwide Server Market Revenue Declined 11.6% Year Over Year in the Second Quarter of 2019 https://www.idc.com/getdoc.jsp?containerId=prUS45482519
● The COVID-19 pandemic causes stock market falls of 20% (March 2020).
https://www.wired.com/story/covid-19-spreads-listen-stock-market/
● Working remotely is now the norm.
● Scalable on-demand services bring Serverless Computing.

The Linaro Datacenter & Cloud Group (LDCG)
● Common development center for the Arm Server & Infrastructure ecosystem
● Eliminates fragmentation, reduces cost and accelerates time to market
● Members can focus on innovation and differentiated value-add
● Working on core open-source software for Arm servers
○ Server architecture – UEFI/ACPI/ServerReady
○ ARMv8 enablement & optimization
○ Big Data, Bigtop, Hadoop and Spark
○ Cloud infrastructure such as Kubernetes, OpenStack and Ceph
Linaro Developer Cloud
Enterprise-class Arm Powered servers hosted in the UK are available for development, test, CI and cloud deployments for VMs and containers.
www.linaro.cloud

LDCG High Performance Computing (HPC) SIG
Collaborative project building on the work of the Linaro Datacenter & Cloud Group
Lower deployment & management barriers
Leverage the Linaro Developer Cloud and other services to develop cost-effective, cloud-integrated HPC development frameworks and generate reference implementations to accelerate
Member-driven with Advisory Board
Members determine the work completed by engineering resources, while the advisory board provides subject-matter expertise on HPC requirements, plus guidance and feedback on the ongoing HPC SIG strategic direction and roadmap
Driving datacenter-class, open-source HPC development on Arm
Identify and adopt standards to make HPC deployment on Arm a commercial imperative. Develop real-world use cases that reap the benefits of Arm while ensuring interoperability, modularization and orchestration

Functions-as-a-Service
● Linaro HPC hardware is being reconfigured towards a scalable environment.
○ A combination of OpenStack, Kubernetes (K8s) and OpenHPC.
○ A testbed to verify combinations of heterogeneous ingredients for the optimal recipes.
● Service Consumers
○ Send the service request and receive the service answer.
○ The service consumer will be CPU-, GPU-, ISA- and accelerator-agnostic!
If the equipment is billed as pay-per-use, then it's our challenge to ensure that AArch64 solutions match a significant number of requests.
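From the consumer's side, "ISA-agnostic" means a request names a function, not a machine. A minimal sketch, assuming a hypothetical registry in which each function can have implementations for several backends (say an AArch64 build and a GPU build); the dispatcher runs the first available backend in a preference order, so the consumer never sees which one served it. None of these names are part of OpenStack, Kubernetes or OpenHPC.

```python
from typing import Any, Callable, Dict, Iterable

Handler = Callable[[Any], Any]


class FunctionRegistry:
    """Hypothetical FaaS registry: function name -> {backend name: handler}."""

    def __init__(self, preference: Iterable[str]):
        self.preference = list(preference)  # e.g. cheapest backend first
        self._impls: Dict[str, Dict[str, Handler]] = {}

    def register(self, name: str, backend: str, handler: Handler) -> None:
        """Add (or replace) the implementation of `name` for one backend."""
        self._impls.setdefault(name, {})[backend] = handler

    def invoke(self, name: str, request: Any, available: Iterable[str]) -> Any:
        """Run `name` on the most preferred backend that is currently up."""
        impls = self._impls.get(name, {})
        up = set(available)
        for backend in self.preference:
            if backend in up and backend in impls:
                return impls[backend](request)
        raise LookupError(f"no available backend implements {name!r}")


if __name__ == "__main__":
    reg = FunctionRegistry(["aarch64", "x86_64", "gpu"])
    reg.register("square", "aarch64", lambda x: x * x)
    reg.register("square", "gpu", lambda x: x ** 2)
    print(reg.invoke("square", 7, available=["gpu", "aarch64"]))
```

Ordering the preference list by cost per request is one way to frame the pay-per-use challenge: the more functions that have an AArch64 implementation, the more requests can be served on Arm.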

Thank you
Continuing to accelerate deployment of your
Arm-based solutions through collaboration
hpc@linaro.org
