Yusuke Doi, Ph.D
Corporate Officer, VP of Computing Infrastructure, Preferred Networks, Inc.
MN-3, MN-Core and HPL
SC21 Green500 BOF
Who are We?
Why We Need Computing Power?
Preferred Networks Inc.
Industry Domains: Transportation, Manufacturing, Life Sciences, Materials, Robots, Entertainment
Founded: March 2014
Directors: CEO Toru Nishikawa, COO Daisuke Okanohara, CTO Ryosuke Okuta
Located: Tokyo, Japan (HQ); Burlingame, CA, US (Preferred Networks America, Inc.)
Make the real world computable
How much information can you extract from a single image?
Our pixel-accurate object detection model extracts a large amount of rich features from a single image using:
● State-of-the-art algorithm
● Hyperparameter tuning and optimization using Optuna™
● Proprietary CG-based annotation-free data generation and augmentation combined with domain transfer to real images
● Distributed / large-batch training
Behavior model with neural network
Computational Chemistry with Deep Learning
Searching for new materials over computers
(Figure labels: Atom; Energy and Force; physical property; Molecular Dynamics)
Our Capabilities
Deep Learning: World-class researchers focusing on deep learning
Expertise: Wide range of deep expertise, from robotics to genomics to computational chemistry
Private Supercomputer: World-class computational resources designed for deep learning applications
Software: In-house development of OSS and a hyperparameter tuning library to accelerate software development
MN-3 and MN-Core: Deep Learning Supercomputer
MN-3 node specs
MN-Core: MN-Core Board x 4
CPU: Intel Xeon 8260M 2way (48 physical cores)
Memory: 384GB DDR4
Storage Class Memory: 3TB Intel Optane DC Persistent Memory
Network: MN-Core DirectConnect (112Gbps) x 2, Mellanox ConnectX-6 (100GbE) x 2, On board (10GbE) x 2
Deep learning processor MN-Core
For more information please visit: https://projects.preferred.jp/supercomputers/en/
MN-3 is the world’s most energy efficient supercomputer for deep learning.
We use HPL to understand how to run our computer efficiently.
Green500 / TOP500 history:
● 2021/11, 39.38GFlops/W (Green500 #1 / TOP500 #301)
● 2021/06, 29.70GFlops/W (Green500 #1 / TOP500 #335)
● 2020/11, 26.04GFlops/W (Green500 #2 / TOP500 #330)
● 2020/06, 21.11GFlops/W (Green500 #1 / TOP500 #393)
MN-3 and HPL
Giant SIMD Processor
● Single instruction stream
● 500W/Package @ 32.8TF(DP)
○ 65GF/W on chip (ideal case)
● Hierarchical structure with unique on-chip
network (broadcast, aggregation, etc)
● Deterministic and transparent to software
○ no cache; software manages data copies between layers
MN-Core
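The on-chip figure quoted above follows directly from the package numbers. The sketch below redoes that arithmetic and, purely for orientation, compares it with the 2021/11 Green500 result reported in this deck. The comparison is not like-for-like: the Green500 figure is measured on a running system, while the on-chip figure is the ideal case. The function and constant names are illustrative only.

```python
# Figures quoted on the slides.
PEAK_DP_TFLOPS = 32.8            # double-precision peak per package
PACKAGE_POWER_W = 500.0          # power per package
MEASURED_GFLOPS_PER_W = 39.38    # Green500 result, 2021/11


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert a TFlops figure and a power draw into GFlops/W."""
    return tflops * 1000.0 / watts


ideal = gflops_per_watt(PEAK_DP_TFLOPS, PACKAGE_POWER_W)
ratio = MEASURED_GFLOPS_PER_W / ideal

print(f"ideal on-chip efficiency: {ideal:.1f} GF/W")
print(f"measured / ideal:         {ratio:.2f}")
```

The computed ideal is 65.6 GF/W, which the slide quotes as 65 GF/W.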
Philosophy behind MN-Core Hardware
By providing only the functions necessary for
computation and controlling them completely with
software, we can achieve high execution
efficiency/power efficiency with minimal hardware.
This is difficult to achieve with cache-based parallel
processors whose behavior is hidden to software.
Prof. Makino
(Kobe Univ.)
Idea Behind MN-Core: Transparent Hardware for High Performance
Power Measurements
● Level 2 in our first Green500 (June 2020)
● Upgraded to meet Level 3 requirements
○ #1 system should be one of the best
examples in power measurement
● We have measured to Level 3 criteria since our second Green500 (Nov. 2020)
← upgrading power facility and measurement
devices to meet Level 3 requirements
Update to Level 3 power measurement
Level 3 Measurements: Power System of the MN-3
(Diagram labels)
Power feeds: 200V 600A 3P3W x 2; 200V 150A (3P3W)
MN-3A (Zone 0), 32 nodes: Smart PDU x 10, Smart PDU x 8, MN-3 nodes x 16, MN-3 nodes x 16
MN-3 Interconnect
MN-3A (Zone 1)
Revenue grade meter: ME110SSR x 4
Power analyzer: WT1800E (6 elements)
MN-3 power measurement system: TSDB
HPL program: Trigger, Feedback via Slack bot and Web (Grafana)
Everybody can see results on Slack and Grafana dashboards
Measurement system supporting continuous improvements
● The more iterations, the more improvements
● Key to rapid iterations: how quickly we share the results of experiments
● Automated reporting system
○ Issues a unique ID to each HPL run
○ Records timestamps of the core/full phases with the ID
○ Generates a summary and graph of power measurements
○ Shares the results in Slack
● It helps us to quickly understand effects of development
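The reporting steps listed above can be sketched roughly as follows. This is a minimal illustration, not the actual PFN tooling: the class `HplRun`, the functions `average_power`, `summarize` and `format_report`, and the sample data are all hypothetical. The sketch only builds the message text and does not post to Slack or read from a real TSDB.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class HplRun:
    """One benchmark run identified by a unique ID (hypothetical structure)."""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    phases: dict = field(default_factory=dict)  # phase name -> (start_s, end_s)

    def record_phase(self, name, start, end):
        if end <= start:
            raise ValueError("phase must end after it starts")
        self.phases[name] = (start, end)


def average_power(samples, start, end):
    """Time-weighted mean power (W) over [start, end] using the trapezoidal rule.

    samples: list of (timestamp_seconds, watts), sorted by timestamp.
    Only samples inside the window are used, so they should cover it.
    """
    window = [(t, w) for t, w in samples if start <= t <= end]
    if len(window) < 2:
        raise ValueError("need at least two samples inside the phase")
    energy = 0.0
    for (t0, w0), (t1, w1) in zip(window, window[1:]):
        energy += 0.5 * (w0 + w1) * (t1 - t0)
    return energy / (window[-1][0] - window[0][0])


def summarize(run, samples, rmax_gflops):
    """Summarize the core phase: average power and GFlops/W."""
    start, end = run.phases["core"]
    watts = average_power(samples, start, end)
    return {
        "run_id": run.run_id,
        "avg_power_w": watts,
        "gflops_per_w": rmax_gflops / watts,
    }


def format_report(summary):
    """Render the text that a chat bot could post (posting is out of scope)."""
    return (
        f"HPL run {summary['run_id']}: "
        f"{summary['avg_power_w']:.1f} W avg, "
        f"{summary['gflops_per_w']:.2f} GFlops/W"
    )


# Made-up example data: power samples as (seconds, watts).
run = HplRun()
run.record_phase("core", 10, 30)
samples = [(0, 100.0), (10, 200.0), (20, 200.0), (30, 200.0), (40, 100.0)]
summary = summarize(run, samples, rmax_gflops=4000.0)
print(format_report(summary))
```

Keying every record to the run ID is what lets power samples, phase timestamps and results be joined later without manual bookkeeping.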
Optimization
We targeted 40GF/W with our 12nm accelerator
● 2020/06, 21.11GFlops/W, efficiency 41% (#1)
○ initial challenge, made it work
● 2020/11, 26.04GFlops/W, efficiency 53% (#2)
○ optimization on scheduling, GEMM
● 2021/06, 29.70GFlops/W, efficiency 58% (#1)
○ even more optimization
● 2021/11, 39.38GFlops/W, efficiency 64% (#1)
○ interconnect improvement, aggressive software-level clock
gating, even even more optimization
MN-Core Challenge on HPL
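To put the four submissions above in perspective, the short sketch below tabulates each result relative to the first submission and to the 40 GF/W target. The data values are taken from the slides; the function `progress` and the field names are illustrative.

```python
TARGET_GFLOPS_PER_W = 40.0

# (date, GFlops/W, execution efficiency in %), as listed on the slide.
HISTORY = [
    ("2020/06", 21.11, 41),
    ("2020/11", 26.04, 53),
    ("2021/06", 29.70, 58),
    ("2021/11", 39.38, 64),
]


def progress(history, target):
    """Return one row per submission with gains relative to the first run and the target."""
    first = history[0][1]
    rows = []
    for date, gfw, eff in history:
        rows.append({
            "date": date,
            "gflops_per_w": gfw,
            "efficiency_pct": eff,
            "vs_first": gfw / first,
            "of_target": gfw / target,
        })
    return rows


rows = progress(HISTORY, TARGET_GFLOPS_PER_W)
for row in rows:
    print(
        f"{row['date']}: {row['gflops_per_w']:.2f} GF/W, "
        f"{row['efficiency_pct']}% efficiency, "
        f"x{row['vs_first']:.2f} vs first, "
        f"{row['of_target']:.1%} of target"
    )
```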
58% → 64% (6pt gain)
● +2pt (58→60): Optimized the DGEMM kernel and reorganized the overlap of DGEMM and communication (swap and panel broadcast)
● +3pt (60→63): Increased the bandwidth of the interconnect (MN-Core DirectConnect) and overlapped more computation and communication
● +1pt (63→64): Optimized other parts, including panel factorization and dynamic code generation
Execution Efficiency Improvement Breakdown
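Why overlapping computation and communication raises execution efficiency can be illustrated with a deliberately simple timing model. It is an idealization, not a description of the MN-3 HPL implementation: each step has a compute time and a communication time, and an assumed `overlap_fraction` says how much of the shorter one is hidden behind the longer one. All numbers are made up.

```python
def step_time(t_calc: float, t_comm: float, overlap_fraction: float) -> float:
    """Wall-clock time of one step when part of the shorter activity is hidden.

    overlap_fraction = 0.0 means fully serial, 1.0 means perfect overlap.
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be in [0, 1]")
    hidden = overlap_fraction * min(t_calc, t_comm)
    return t_calc + t_comm - hidden


def execution_efficiency(t_calc: float, t_comm: float, overlap_fraction: float) -> float:
    """Fraction of wall-clock time spent computing."""
    return t_calc / step_time(t_calc, t_comm, overlap_fraction)


# Hypothetical step: 8 time units of DGEMM, 2 units of communication.
for frac in (0.0, 0.5, 1.0):
    eff = execution_efficiency(8.0, 2.0, frac)
    print(f"overlap {frac:.0%}: efficiency {eff:.1%}")
```

In this model a faster interconnect shrinks the communication time and better scheduling raises the overlap fraction. Both move efficiency toward 100%, matching the direction of the gains listed above.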
29.70GF/W → 39.38GF/W (9.68GF/W gain)
● +3.4GF/W: Corresponding to the improvement of execution efficiency
● +4.4GF/W: Generating "energy-efficient instructions" by software:
stopping unused arithmetic units, using scratchpad FFs instead of
SRAMs, and reducing energy consumption of data copy, etc.
● +1.9GF/W: Other tuning including the core voltage and freq.
Power Efficiency Improvement Breakdown
Stop an ALU in PE
Reuse a matrix as much
as possible to reduce
data copy
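As a quick consistency check of the power efficiency breakdown above, the sketch below sums the three itemized contributions and compares them with the overall gain. The values come from the slide; the dictionary keys are shortened labels chosen here for illustration.

```python
BEFORE_GFLOPS_PER_W = 29.70   # 2021/06
AFTER_GFLOPS_PER_W = 39.38    # 2021/11

# Itemized contributions in GF/W, as listed on the slide.
CONTRIBUTIONS = {
    "execution efficiency": 3.4,
    "energy-efficient instructions": 4.4,
    "voltage/frequency and other tuning": 1.9,
}

total_gain = AFTER_GFLOPS_PER_W - BEFORE_GFLOPS_PER_W
itemized = sum(CONTRIBUTIONS.values())
shares = {name: value / itemized for name, value in CONTRIBUTIONS.items()}

print(f"overall gain: {total_gain:.2f} GF/W")
print(f"itemized sum: {itemized:.2f} GF/W")
for name, share in shares.items():
    print(f"  {name}: {share:.0%} of the itemized gain")
```

The itemized sum (9.7 GF/W) differs from the overall gain (9.68 GF/W) by 0.02 GF/W, which is consistent with each item being rounded to one decimal place.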
● Unique computation framework of MN-Core
○ Deterministic and transparent hardware fully controlled by software
○ Application-specific optimization
● HPL is a very useful benchmark for understanding the efficiency of a new style of computer in a real environment
○ Precise and integrated measurement is essential for continuous
improvement
Please visit us at booth #1521!
Summary
