Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Santa Clara 2018
Bio: "Yutaka Ishikawa is the project leader for the development of the post-K
supercomputer. From 1987 to 2001, he was a member of AIST (formerly the
Electrotechnical Laboratory), METI. From 1993 to 2001, he was the
chief of the Parallel and Distributed System Software Laboratory at the Real
World Computing Partnership. He led the development of cluster system
software called SCore, which was used in several large PC cluster
systems around 2004. From 2002 to 2014, he was a professor at the
University of Tokyo. He led a project to design a commodity-based
supercomputer called the T2K open supercomputer. As a result, three
universities, Tsukuba, Tokyo, and Kyoto, each obtained a supercomputer
based on the specification in 2008. He was also involved in the
design of Oakforest-PACS, the successor to the T2K supercomputers at both
Tsukuba and Tokyo, whose peak performance is 25 PF."
Session Title: Post-K and Arm HPC Ecosystem
Session Description:
"Post-K, a flagship supercomputer in Japan, is being developed by RIKEN
and Fujitsu. It will be the first supercomputer with Armv8-A+SVE.
This talk will give an overview of Post-K and how RIKEN and Fujitsu
are currently working on the software stack for the Arm architecture."
1. Post-K and Arm HPC
ecosystem
Yutaka Ishikawa
RIKEN Center for Computational Science
10:00–10:30 26th of June, 2018
Arm Architecture HPC Workshop by
Linaro and HiSilicon, Santa Clara, USA
2. Project Overview
2018/7/26
[System diagram: Login Servers, Maintenance Servers, Portal Servers, I/O Network, Hierarchical Storage System]
Missions
• Building the Japanese national flagship supercomputer,
post K, and
• Developing a wide range of HPC applications, running on
post K, in order to solve social and scientific issues in
Japan
Project organization
• Post K Computer development
  • RIKEN AICS is in charge of development
  • Fujitsu is the vendor partner
  • International collaborations: DOE, CEA,
    JLESC (NCSA, ANL, UTK, JSC, BSC, INRIA, RIKEN)
• Applications
  • The government selected
    • 9 social & scientific priority issues
    • 4 exploratory issues
    and their R&D organizations.
Current Status
• The first prototype CPU chip has been
powered on at Fujitsu
• Fujitsu is now evaluating the chip
• The system software stack is being
implemented
• Target applications are being tuned
Courtesy of FUJITSU LIMITED
3. Project Overview
Target Applications
Program        Brief description
① GENESIS      MD for proteins
② Genomon      Genome processing (genome alignment)
③ GAMERA       Earthquake simulator (FEM in unstructured & structured grid)
④ NICAM+LETKF  Weather prediction system using big data (structured grid stencil & ensemble Kalman filter)
⑤ NTChem       Molecular electronic structure calculation
⑥ FFB          Large eddy simulation (unstructured grid)
⑦ RSDFT        Ab initio program (density functional theory)
⑧ Adventure    Computational mechanics system for large-scale analysis and design (unstructured grid)
⑨ CCS-QCD      Lattice QCD simulation (structured grid Monte Carlo)
4. CPU Architecture
Armv8-A + SVE (Scalable Vector Extension)
  SIMD length: 512 bit
  FP64/FP32/FP16
  INT 1-, 2-, 4-, 8-byte
# of cores: 48 + (2/4 for OS)
Byte/DP Flop: approx. 0.4
Fujitsu's extensions
  Inter-core barrier
  Sector cache
  Hardware prefetch assist
Two compute nodes are implemented on one board
Courtesy of FUJITSU LIMITED
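As a rough illustration of the CPU figures on this slide, the sketch below computes how many elements fit in one 512-bit vector for each floating-point width, and the arithmetic intensity at which a simple roofline-style model would switch from memory-bound to compute-bound at roughly 0.4 Byte/Flop. The function names, the roofline simplification, and the DAXPY example are illustrative assumptions and are not taken from the talk.

```python
# Figures taken from the slide.
VECTOR_BITS = 512          # SIMD length
BYTES_PER_FLOP = 0.4       # approximate Byte/DP Flop ratio

# Floating-point element widths listed on the slide, in bits.
ELEMENT_BITS = {"FP64": 64, "FP32": 32, "FP16": 16}


def lanes_per_vector(vector_bits, element_bits):
    """Number of elements of the given width that fit in one vector."""
    return vector_bits // element_bits


def ridge_point(bytes_per_flop):
    """Arithmetic intensity (flop/byte) above which a simple roofline
    model predicts the kernel is limited by compute, not memory."""
    return 1.0 / bytes_per_flop


def is_memory_bound(flops_per_byte, bytes_per_flop=BYTES_PER_FLOP):
    """True if the kernel's intensity is below the ridge point."""
    return flops_per_byte < ridge_point(bytes_per_flop)


lanes = {name: lanes_per_vector(VECTOR_BITS, bits)
         for name, bits in ELEMENT_BITS.items()}

# Hypothetical example kernel: DAXPY (y = a*x + y) in FP64 performs
# 2 flops per element and moves 24 bytes (read x, read y, write y).
DAXPY_FLOPS_PER_BYTE = 2 / 24

print(lanes)
print(ridge_point(BYTES_PER_FLOP))
print(is_memory_bound(DAXPY_FLOPS_PER_BYTE))
```

Under these assumptions, streaming kernels such as DAXPY fall well below the ridge point, which is one reason features like the sector cache and hardware prefetch assist matter for application tuning.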
5. An Overview of Post-K Hardware
Compute Node, Compute + I/O Node
connected by 6D mesh/torus Interconnect
3-level hierarchical storage system
1st Layer
Cache for global file system
Temporary file systems
- Local file system for compute node
- Shared file system for a job
2nd Layer
Lustre-based global file system
3rd Layer
Storage for archive
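To make the "6D mesh/torus" wording concrete, the following sketch lists the direct neighbors of a node in a six-dimensional grid where each dimension is either a torus (with wraparound) or a mesh (without). The dimension sizes and the choice of which dimensions wrap are hypothetical values picked for illustration; the slide does not give them.

```python
def neighbors(coord, shape, wrap):
    """Return the coordinates directly linked to `coord`.

    shape[d] is the number of nodes along dimension d.
    wrap[d] is True for a torus dimension (wraparound link) and
    False for a mesh dimension (no link past the edge).
    """
    coord = tuple(coord)
    result = []
    for d, (c, n, w) in enumerate(zip(coord, shape, wrap)):
        for step in (-1, 1):
            nc = c + step
            if w:
                nc %= n                 # torus: wrap around the edge
            elif not 0 <= nc < n:
                continue                # mesh: no neighbor past the edge
            if nc == c:
                continue                # dimension of size 1
            candidate = coord[:d] + (nc,) + coord[d + 1:]
            if candidate not in result:  # size-2 torus: both steps coincide
                result.append(candidate)
    return result


# Hypothetical 6D shape: three larger torus dimensions plus three small
# dimensions, two of which are meshes.
SHAPE = (4, 4, 4, 2, 3, 2)
WRAP = (True, True, True, False, True, False)

origin = (0, 0, 0, 0, 0, 0)
print(len(neighbors(origin, SHAPE, WRAP)))
for node in neighbors(origin, SHAPE, WRAP):
    print(node)
```

In a full torus every node has two neighbors per dimension; mesh dimensions reduce that count for nodes on an edge.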
6. An Overview of System Software Stack
Ease of use is one of our KPIs (Key Performance Indicators)
Providing a wide range of applications/tools/libraries/compilers
[Software stack diagram; components shown:]
  Linux Distribution Eco-System
  Parallel Programming Environments: XMP, FDPS, …
  Fortran, C/C++, OpenMP, Java, …
  Math libraries
  Tuning and Debugging Tools
  Batch Job System
  Hierarchical File System
  Parallel File System
  Communication: MPI, Low Level Communication
  File I/O: Application-oriented File I/O, File I/O for Hierarchical Storage (LLIO)
  Process/Thread: PiP
  Multi-Kernel System: Linux and light-weight kernel (McKernel)
  Armv8 + SVE
7. An Overview of System Software Stack
Balazs Gerofi, Rolf Riesen, Masamichi Takagi, Taisuke Boku, Yutaka Ishikawa, Robert
W. Wisniewski, “Performance and Scalability of Lightweight Multi-Kernel based
Operating Systems,” IPDPS 2018, 2018.
8. Post-K Programming Environment
Programming Languages and Compilers provided by Fujitsu
  Fortran 2008 & Fortran 2018 subset
  C11 & GNU and Clang extensions
  C++14 & C++17 subset and GNU and Clang extensions
  OpenMP 4.5 & OpenMP 5.0 subset
  Java
Parallel Programming Language & Domain Specific Library provided by RIKEN
  XcalableMP
  FDPS (Framework for Developing Particle Simulator)
Process/Thread Library provided by RIKEN
  PiP (Process in Process)
Script Languages provided by Linux distributor
  E.g., Python+NumPy, SciPy
Communication Libraries
  MPI 3.1 & MPI 4.0 subset
    Open MPI base (Fujitsu), MPICH (RIKEN)
  Low-level Communication Libraries
    uTofu (Fujitsu), LLC (RIKEN)
File I/O Libraries provided by RIKEN
  pnetCDF, DTF, FTAR
Math Libraries
  BLAS, LAPACK, ScaLAPACK, SSL II (Fujitsu)
  EigenEXA, Batched BLAS (RIKEN)
Programming Tools provided by Fujitsu
  Profiler, Debugger, GUI
GCC and LLVM will also be available
9. Support of Software Development/Porting
[Timeline chart, CY2017 to CY2021. Phases: Design and Implementation; Manufacturing, Installation, and Tuning; Operation. Rows: Specification (Armv8-A + SVE overview, then detailed hardware info.); Optimization Guidebook (publishing incrementally); RIKEN Performance Evaluation Environment (performance estimation tool using FX100, RIKEN Simulator); Early Access Program.]
• CY2018 Q2: Optimization guidebook is incrementally published
• CY2020 Q2: Early access program starts
• CY2021 Q1/Q2: General operation starts
Note: Fujitsu will reveal features of the Post-K CPU at Hot Chips 2018,
presenting the microarchitecture including core pipeline, cache,
memory, NUMA, performance and power management features.
• Takeo Yoshida, “Fujitsu's HPC processor for the Post-K computer,” IEEE Hot
Chips: A Symposium on High Performance Chips, San Jose, August 21, 2018.
Contribution to Arm HPC (Armv8-A+SVE) Ecosystem
13. Open Source Management Tools
EasyBuild
  Used at CEA
  RIKEN is now evaluating it. As an example, CAFFE, a deep
  learning tool, has been ported to an Arm machine using EasyBuild
  CAFFE consists of several open-source packages:
  - boost, blas, cmake, gflags, google (glog, googletest, snappy, leveldb, protobuf),
    lmdb, opencv
Spack
  Used in the ECP project
  RIKEN has also started evaluating Spack.
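Tools such as EasyBuild and Spack work out an order in which a package's dependencies must be built. The sketch below shows that idea for the CAFFE package list above, using Python's standard `graphlib` module. The dependency edges are illustrative assumptions made for this example; they are not taken from the talk or from either tool's actual package recipes.

```python
from graphlib import TopologicalSorter

# Package names come from the slide. The mapping says which packages
# must be built first; every edge here is an illustrative assumption.
DEPENDS_ON = {
    "cmake": set(),
    "boost": set(),
    "blas": set(),
    "lmdb": set(),
    "gflags": {"cmake"},
    "glog": {"cmake", "gflags"},
    "googletest": {"cmake"},
    "snappy": {"cmake"},
    "leveldb": {"cmake", "snappy"},
    "protobuf": {"cmake"},
    "opencv": {"cmake"},
}
# Assume the top-level package needs everything listed above.
DEPENDS_ON["caffe"] = set(DEPENDS_ON)


def build_order(depends_on):
    """Return one valid build order: each package appears after all of
    the packages it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


order = build_order(DEPENDS_ON)
for position, package in enumerate(order, start=1):
    print(position, package)
```

Several orders can satisfy the same constraints; the only guarantee is that each package comes after its dependencies.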