2. Self introduction
• Ryo Matsumiya
• Twitter: @mattn_
• https://sites.google.com/site/ryomatsumiya0101/
• Ph.D. student (D2)
• Oyama lab. (UEC, B4-M2)
• Endo lab. (Titech, D1-)
• Major topic: Distributed and parallel processing and its software
architecture considering memory (storage) hierarchy
• Memory Hierarchy, Memory-centric Computing,
Data-intensive Computing, Big Data, Task Parallelism,
Programming System, System Software, GPGPU, Storage System
3. About SC (1/3)
• ACM/IEEE International Conference for High Performance
Computing, Networking, Storage and Analysis
• DO NOT confuse it with similar conferences!
• International Conference on Supercomputing (ICS)
• International Supercomputing Conference (ISC)
• Top conf. in the field of HPC
• About 13,000 attendees
• Including 3,500 international (non-US) attendees in SC ’17
4. About SC (2/3)
• Technical session
• Doctoral forum
• Poster session
• Tutorial session
• Panel session
• Invited talks + Keynote talks
• Workshops
• 38 official workshops
• BoF session
• TOP500 is announced
• Exhibition
• 250+ organizations
5. About SC (3/3)
• SC ’17 was held at the Colorado Convention Center, Denver
• SC ’15: Austin, SC ’16: Salt Lake City
• SC ’18: Dallas, SC ’19: Denver
• Acceptance Rate: 61/327 = 19 %
• Best paper: Extreme Scale Multi-Physics Simulations of the
Tsunamigenic 2004 Sumatra Megathrust Earthquake
• Technical University of Munich + Ludwig-Maximilians-Universität München
• Best poster: AI with Super-Computed Data for Monte Carlo Earthquake
Hazard Classification
• RIKEN + UT
• Gordon Bell Prize: 18.9-Pflops Nonlinear Earthquake Simulation on
Sunway TaihuLight: Enabling Depiction of 18-Hz and 8-Meter
Scenarios
6. PapyrusKV: A High-Performance Parallel Key-
Value Store for Distributed NVM Architectures
• Distributed KVS developed by ORNL
• No system-level daemons or servers
• C++ library using Papyrus
• Design and Implementation of Papyrus: Parallel Aggregate Persistent
Storage (IPDPS ’17)
• Open source
• https://code.ornl.gov/eck/papyrus
• Considering memory hierarchy
• Private SSDs + Private DRAMs
• TSUBAME (Titech), Stampede (TACC)
• Shared SSDs (burst buffers) + Private DRAMs
• Oakforest-PACS (JCAHPC), Cori (LBNL)
9. Structure overview
• Each process has four memtables and an SSTable
• Memtable
• Used as caches
• Local memtable, Remote memtable, Local immutable memtable,
Remote immutable memtable
• Stored in DRAM
• SSTable
• Sorted String Table
• Stored in NVRAM
10. Data placement
• DBs are divided into files
• Each process has its own file
• In local SSD architectures, the file is stored in the SSD of its process
• In shared SSD architectures, all files are stored in the Burst
Buffer(s)
• Each KV-pair is assigned to a process
• The owner process is determined by hash(key) % (# of processes)
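The placement rule above can be sketched in a few lines. This is an illustrative model, not PapyrusKV's actual code: the slide only specifies `hash(key) % nprocs`, so `crc32` is an assumed stand-in for whatever hash function the library really uses, and the key string is made up.

```python
from zlib import crc32

def owner_rank(key: bytes, nprocs: int) -> int:
    """Hash-partition a key across processes, per the slide's rule:
    owner = hash(key) % (# of processes).
    crc32 is an illustrative hash; PapyrusKV's actual hash may differ."""
    return crc32(key) % nprocs

# Every process computes the same owner for the same key,
# so a put/get can be routed without any central directory.
rank = owner_rank(b"chr14:read:42", 8)
```

Because the mapping is deterministic and shared, no metadata server is needed to locate a pair, which matches the "no system-level daemons or servers" design point.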
11. Local cache policy
• LRU + FIFO
• A new cache entry is first pushed to the LRU queue
• Mutable memtable(s)
• When an entry is evicted from the LRU queue, it is pushed to the FIFO queue
• Immutable memtable(s)
• Entries evicted from the FIFO queue are written back to the SSDs
(Diagram: KV-pairs flow from the mutable memtable (LRU queue, DRAM) through the immutable memtable (FIFO queue, DRAM) to the SSTable (SSD))
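The eviction chain above can be modeled compactly. This is an assumption-laden sketch of the policy as described on the slide, not the real implementation; capacities, the `ssd` dict, and the class name are all invented for illustration.

```python
from collections import OrderedDict

class LruFifoCache:
    """Toy model of the local cache policy:
    - new entries enter an LRU queue (mutable memtable),
    - LRU evictions move to a FIFO queue (immutable memtable),
    - FIFO evictions are written back to the SSD (here: a plain dict)."""
    def __init__(self, lru_cap, fifo_cap):
        self.lru = OrderedDict()   # key -> value, most recently used last
        self.fifo = OrderedDict()  # insertion order only, never reordered
        self.lru_cap, self.fifo_cap = lru_cap, fifo_cap
        self.ssd = {}              # stand-in for the SSTable on SSD

    def put(self, key, value):
        self.lru[key] = value
        self.lru.move_to_end(key)                    # freshest entry
        if len(self.lru) > self.lru_cap:
            k, v = self.lru.popitem(last=False)      # evict least recent
            self.fifo[k] = v
            if len(self.fifo) > self.fifo_cap:
                k2, v2 = self.fifo.popitem(last=False)  # oldest first
                self.ssd[k2] = v2                    # write-back to SSD

    def get(self, key):
        if key in self.lru:
            self.lru.move_to_end(key)                # refresh recency
            return self.lru[key]
        if key in self.fifo:
            return self.fifo[key]                    # FIFO: no reordering
        return self.ssd.get(key)                     # fall through to SSD
```

Note the asymmetry: only the mutable (LRU) side tracks recency; once an entry becomes immutable it simply ages out in insertion order.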
12. Data structure of tables
• LSM-Tree
• Used by HBase, LevelDB, etc.
• In PapyrusKV, the memtables are red-black trees
• The trees in the SSDs are binary trees
O'Neil et al., The log-structured merge-tree (LSM-tree), Acta Informatica, Vol. 33, pp. 351–385
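The core LSM-tree write path on this slide can be sketched as follows. This is a toy model: the flush threshold and class name are invented, and a sorted list stands in for the red-black tree the slide says PapyrusKV uses for its memtables.

```python
import bisect

class MiniLSM:
    """Toy LSM-tree sketch: writes land in a sorted in-memory
    memtable; when it fills, it is frozen and flushed as one
    sorted run (an SSTable). Reads search newest-first."""
    def __init__(self, memtable_limit=3):
        self.memtable = []   # sorted list of (key, value) pairs
        self.sstables = []   # flushed sorted runs, newest first
        self.limit = memtable_limit

    def put(self, key, value):
        i = bisect.bisect_left(self.memtable, (key,))
        if i < len(self.memtable) and self.memtable[i][0] == key:
            self.memtable[i] = (key, value)          # in-place update
        else:
            self.memtable.insert(i, (key, value))    # keep run sorted
        if len(self.memtable) >= self.limit:
            self.sstables.insert(0, self.memtable)   # flush sorted run
            self.memtable = []

    def get(self, key):
        # Newest data shadows older data, so scan memtable first,
        # then SSTables from newest to oldest.
        for run in [self.memtable] + self.sstables:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

The key LSM property visible here: writes never touch old runs, so flushes are sequential, which is exactly what makes the structure a good fit for SSDs.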
13. Remote cache policy
• Can be changed with papyruskv_consistency()
• Two consistency modes
• Sequential consistency
• Relaxed consistency
• papyruskv_protect() under relaxed consistency can make remote caches available
• With PAPYRUSKV_RDONLY, remote read caches are enabled
• With PAPYRUSKV_WRONLY, asynchronous write-back is enabled
• Consistency can be guaranteed by calling papyruskv_barrier()
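The staleness/barrier behavior described above can be shown with a toy model. This is not the PapyrusKV API: the class, its methods, and the single-cache simplification are all illustrative assumptions; the point is only that relaxed-consistency read caches may serve stale values until a barrier invalidates them.

```python
class RemoteCacheModel:
    """Toy model of relaxed consistency with remote read caches:
    reads of remote KV-pairs may be cached locally (as under
    PAPYRUSKV_RDONLY), and a barrier() drops those caches so
    subsequent reads see the owner's latest value again."""
    def __init__(self):
        self.owner_store = {}   # authoritative copy on the owner rank
        self.remote_cache = {}  # reader-side cache of remote pairs

    def put(self, key, value):
        self.owner_store[key] = value        # only the owner is updated

    def get(self, key):
        if key in self.remote_cache:
            return self.remote_cache[key]    # may be stale!
        value = self.owner_store.get(key)
        self.remote_cache[key] = value       # fill the read cache
        return value

    def barrier(self):
        self.remote_cache.clear()  # consistency point: drop cached copies

db = RemoteCacheModel()
db.put("k", 1)
first = db.get("k")    # 1, and now cached on the reader side
db.put("k", 2)
stale = db.get("k")    # still 1: relaxed consistency allows staleness
db.barrier()
fresh = db.get("k")    # 2: the barrier restored consistency
```

This is why the slide pairs the RDONLY/WRONLY hints with papyruskv_barrier(): the hints buy performance by tolerating staleness, and the barrier is where consistency is re-established.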
14. Storage group (1/2)
• An extra memory copy occurs when a process gets a KV-pair owned by another process on the same node
(Diagram: a KV-pair of Process A is copied via both processes' DRAMs to Process B)
15. Storage group (2/2)
• Solution: copy directly when running under relaxed consistency
(Diagram: Process B copies the KV-pair of Process A directly from Process A's DRAM)
25. Real HPC application:
De-novo genome assembly
Evangelos Georganas, Scalable Parallel Algorithms for Genome Analysis, Ph.D. Thesis, UC Berkeley
26. Application benchmarking
• Compared with a Unified Parallel C (UPC) implementation
• SSDs are not used
• Dataset: human chr14
• Executed on Cori
27. Summary
• PapyrusKV is a KVS for HPC Clusters
• Provided as a C++ library
• PapyrusKV supports both private and shared SSD
architectures
• SSDs are used as persistent memory
• DRAMs are used as caches
• LSM-Tree based cache mechanism
• Users can specify consistency policies
28. Other notable papers in SC ’17
• Why Is MPI So Slow? Analyzing the Fundamental Limits in
Implementing MPI-3.1
• 28 authors! (including three Japanese)
• Analyzes overheads imposed by the MPI standard itself
• Gravel: Fine-Grain GPU-Initiated Network Messages
• UW-Madison + AMD Research
• Network interface for GPU kernel
• Related: GPUnet [OSDI ’14], GPUrdma [ROSS ’16]
• Reducing GPU overheads
• Topology-Aware GPU Scheduling for Learning Workloads in Cloud
Environments
• Barcelona Supercomputing Center + IBM
29. Call for Jobs
• Hire me!
• Interested in large-scale parallel and/or distributed software
• System software as well as applications
• Not only research; development and business roles are also welcome
• I have the best record of (LOC × # of parallel nodes ÷ # of developers) among active Japanese system-software students... maybe :-D
Speaker notes
Each rank performs 10K (Summitdev and Cori) or 1K (Stampede) put operations with 16B keys and 128KB values.
The first application performs 10K (Summitdev and Cori) or 1K (Stampede) put operations with 16B keys and 128KB values, and then it calls a checkpoint operation that generates a snapshot of the database in Lustre. The second application reverts the database from Lustre using a restart operation. The last application reverts the database from Lustre through the restart with redistribution technique. All three applications run with the same number of ranks. Even though the last application does not need a redistribution, we forced it for the evaluation.