AMD_Xilinx_AI_VCK5000_20220602R1.pdf

AIに世界最高レベルの効率をもたらす
ゼロダークシリコンソリューション
AMD-ザイリンクス (ザイリンクス株式会社)
Adaptive and Embedded Computing Group (AECG)
Data Center and Communications Group (DCCG)
堀江義弘
2022年 6月2日

2
AI アクセラレータは革新が必要になっている
DC AI Capex Growing at 36.7% CAGR, projected
to $65B TAM by 2026
Source: Markets and Markets (Data Center Accelerator Market Global Forecast to 2026)
AI Accelerator Peak TOPS Growing Exponentially
to Keep Up with the Model Innovation
*2022-2023 is projected based on history
** Only including AI cards with 150W or less power consumption
0
100
200
300
400
500
600
700
2016 2017 2018 2019 2020 2021 2022 2023
Peak
TOPS
AI Accelerator TOPS Growth (<150W)
nVidia Xilinx - AMD Intel
0
10000
20000
30000
40000
50000
60000
70000
2020 2021 2022 2023 2024 2025 2026
$MILLION
AI Accelerator TAM

3
現状、AI向けデバイスの3分の2の部分は ”ダークシリコン”
The market flagship barely achieves 40% in the most basic AI benchmark
Source: https://developer.Nvidia.com/deep-learning-performance-training-inference
34%
35%
38%
42%
66%
65%
62%
58%
0% 20% 40% 60% 80% 100% 120%
nVidia T4
nVidia A10
nVidia A30
nVidia A100
Actual TOPS achieved vs dark silicon
Efficiency Dark Silicon
ResNet-50
(img/s)
Peak
TOPS
Actual
TOPS
A100 32,204 624 264
A30 15,411 330 126
A10 10,676 250 88
T4 5,423 130 44

4
固定したハードウェアはデータ処理効率に課題
Fixed AI Processor
Fixed Data Mover
”Data Bubbles”

5
適応性のあるハードウェアにより “Data Bubble” を削減
Adaptable AI Engine
Adaptable Data Mover
”Data Bubbles”
Xilinx + AMD

6
世界最高レベルの ”ゼロダークシリコン” AI アクセラレータ
Near 100% efficiency: Achieving True Peak TOPS at Real AI Model Workloads
34%
35%
38%
42%
90%
66%
65%
62%
58%
10%
0% 20% 40% 60% 80% 100% 120%
nVidia T4
nVidia A10
nVidia A30
nVidia A100
Xilinx VCK5000
Actual TOPS Achieved vs Dark Silicon
Efficiency Dark Silicon

7
消費電力効率、コストパフォーマンスは主要な GPU の約2倍
0 20 40 60 80 100 120 140 160
nVidia T4
nVidia A10
nVidia A30
nVidia A100
Xilinx VCK5000
Img/s/watt (ResNet-50 v1.5)
Perf/w ResNet-50
(img/s)
Power SRP**
VCK5000 13,700 97W $2,745
A100 SXM 32,204 413W $12,235*
A30 15,411 165W $4,787
A10 10,676 150W $3,283
T4 5,423 75W $2,410
0.0 1.0 2.0 3.0 4.0 5.0 6.0
nVidia T4
nVidia A10
nVidia A30
nVidia A100
Xilinx VCK5000
Img/s/$ (ResNet-50 v1.5)
Perf/$
* A100 SXM pricing not available, using A100 PCIe 80GB pricing instead.
SXM price is typically more expensive than PCIe
** SRP captured from acmemicro.com as of Feb 22, 2022

8
VCK5000 + EPYC は NVIDIA T4 の2倍のTCOをもたらす
H.264 Decode + Yolov3 + 3x ResNet-18 H.264 Decode + tinyYolov3 + 3x ResNet-50
0 5 10 15 20 25 30 35
Xilinx VCK5000
nVidia T4
# of video streams
ML-heavy pipeline
0 10 20 30 40 50 60
Xilinx VCK5000
nVidia T4
# of video streams
Video-heavy pipeline

9
VCK5000 概要
Silicon 7nm Versal ACAP
Peak TOPS (INT8) 125*
SRP $2,745
Form Factor FH3/4L Dual
PCI Express Gen3 x16 / Gen4 x 8
TDP 75, 150, 225W
Off-chip Memory DDR4 16 GB
Internal SRAM 36.4 MB
FPGA Logic (LUTs) 900K
*AI Engines running at 1.25GHz

10
Adaptable
Hardware
AI Engines
ARM
Cortex-R5
2x ARM
Cortex-A72
PMC
Versal AI
AI Engine Core
Store Unit
Scalar Unit
Scalar
Register
File
Scalar ALU
Non-linear
Functions
Instruction Fetch
& Decode Unit
AGU
Vector Unit
Vector
Register
File
Fixed-Point
Vector Unit
Floating-Point
Vector Unit
Load Unit B
AGU
Load Unit A
AGU
Memory Interface Stream Interface
AI
Engine
Memory
AI
Engine
Memory
AI
Engine
Memory
AI Engine Array
AI
Engine
Memory
Interconnect
ISA-based
Vector Processor
Local
Memory
AI Vector
Extensions
5G Vector
Extensions
Data
Mover
AI Engine Tile
AI エンジンアーキテクチャ

11
core
L0
core
L0
core
L0
Block 0
L1
core
L0
core
L0
core
L0
Block 1
L1
L2
DRAM
D0
D0
D0
D0
固定した共有接続
• システム性能を制約
• レイテンシーの大幅なばらつき
データの複製
• レイテンシーの大幅な増大とばらつき
• 帯域不足による性能制約
• 消費電力の大幅な増大
従来のマルチコア
(キャッシュアーキテクチャ)
MEM
AI
Engine
MEM
AI
Engine
MEM
AI
Engine
AI
Engine
MEM
AI
Engine
AI
Engine
MEM
AI
Engine
MEM
MEM
AI エンジンアレイ
(インテリジェントエンジン)
専用の接続
• システム性能の制約
とならない
• レイテンシーは短く、
かつ確定的
密結合したメモリを分散
• キャッシュミスは無し
• レイテンシーは短く、かつ確定的
• システム性能の制約とならない高帯域
• 全体のメモリサイズを節約
• 消費電力を大幅に低減
AI
Engine
MEM
MEM
AI
Engine
AI エンジン = マルチコアに革新をもたらす

12
Signal
Processing
AI Inference
optimized optimized
AIE AIE-ML
AI Engine Architecture
1X
1X
1X
1X
1X
1X
1X
1X
1X
1X
Compute
Tiles
UltraRAM
LUTs
LUTs
 Optimized for signal processing AND ML
 Flexibility for high performance DSP applications
 Native support for INT8, INT16, FP32
INT4
INT8
INT16
BFLOAT16
INT32
FP32
AIE AIE-ML
OPS / Tile
1024
512
128
256
256
256
64
16
16
KB / Tile
64
Data
Memory
Program
Memory
16
16
32
16
42*
*Via software emulation
AIE-ML Architecture
2X 2X 2X 2X 2X
Compute
Tiles
LUTs
Mem
Tiles
 Optimized for ML Inference Applications
 Maximum AI/ML compute with reduced footprint
 Native support for INT4, INT8, INT16, bfloat16
 Fine grained sparsity HW optimization
 Enhanced FFT & complex math support
512KB 512KB
512KB
512KB
512KB
AI アプリケーションに最適化されたインテリジェントエンジン

13
Accelerator
Edge Aggregation &
Autonomous Systems
Intelligent Edge
Sensor & End Point
AI Compute (INT8x4)1
67 TOPS 479 TOPS
AI Compute (INT8)1
31 TOPS 228 TOPS
AIE-ML Tiles 34 304
Adaptable Engines 150K LUTs 521K LUTs
Processing Subsystem Dual-Core Arm® Cortex®-A72 Application Processing Unit / Dual-Core Arm Cortex-R5F Real-Time Processing Unit
Accelerator RAM (4MB) ✓ -
Total Memory 172Mb 575Mb
32G Transceivers 8 32
PCIe® ✓ ✓ (PCIe gen5 w/ DMA)
Video Decode Unit (VDU) - 4
Power 2
15-20W 75W
Engines
RAM
1: Total AI compute includes AI Engines, DSP Engines, and Adaptable Engines
2: Power Projections
VE2302 VE2802
AI/MLに向けた幅広い製品群

フルソリューションスタックを提供 – 構築のステップ (1)
1 モデルをインポート
FP32
INT8
Train/ Re-Train Quantization Inference
In-Framework Inference
Optimization
14

2 アプリケーション全体の流れをコンフィグレーション
フルソリューションスタックを提供 – 構築のステップ (2)
VCU
Decoder
Scaler
ML Job
Management
Detected
Box
Metadata
Find
Frames
Crop and
Scale
ML Job
Management
ResNet50
TinyYoloV3
1080p 1080p
Buffered
Frames
Decoded
Frames
416x416 224x224
Image Data
Metadata
VMSS
Server
On x86
VMSS
Client
FPGA Accelerated
Host Process
Alveo U30
Alveo
U50LV
VVAS ：Vitis ビデオ解析 SDK
(Vitis Video Analytics SDK)
VMSS：Video Machine
Learning Streaming Server
N x 1080p30
Video Streams
VVAS
Client (Alveo対応予定)
VCK5000
15

16
現在提供中のソリューション
VCK5000 on Xilinx.com
$2,745
Try TensorFlow direct inference
Docker by Mipsology
or Vitis AI by AMD-Xilinx
Try Video Analytics pipeline
Docker by Aupera

Aupera社インテリジェントビデオ解析ソリューション
17
https://japan.xilinx.com/products/boards-and-kits/vck5000.html
Aupera VMSS (Video Machine Learning Streaming Server)
ソリューションは複数のフル HD カメラからの高密度映像ソース
をサポートし、オブジェクトの識別と分類を実行します。
複数の推論モデルを同時に実行可能で、確定的かつ低レイテン
シで精度の高い結果を出力します。業界最小のコスト (TCO) を
実現できることが特徴です。

18
Mipsology社 Zebra AI 推論ソリューション
Zebra は、CNN の推論を高速化する理想的なエンジンです。CPU/GPU をシームレスに置き換え、
FPGA 上で様々なニューラルネットワークを高速化し、より低電力かつ低コストに計算を実行します。
Zebra は FPGA 技術やコンパイルの知識がなくてもプラグアンドプレイ方式ですばやく導入でき、
設計環境やアプリケーションへの変更も一切不要です。

19
Mipsology社 Zebra AI 推論ソリューション
https://japan.xilinx.com/products/boards-and-kits/vck5000.html#get-started-mipsology

いますぐにお試しいただけます
20
https://japan.xilinx.com/products/
boards-and-kits/vck5000.html
アダプティブコンピューティング研究推進体
(Adaptive Computing Research Initiative)
3時間単位で機材を無償で貸出し
リモートからアクセスして利用
Forum を通じてた技術支援
ACRi ルーム (クラウド)
動作が確認されたキット
をご購入いただけます
VCK5000を単体でご購入
Fully Customizable
FPGA Cloud Solutions
https://www.vmaccel.com/
クラウド
(近日公開予定)

サポート資料
21
https://docs.xilinx.com/r/ja-JP/ug1531-vck5000-install
https://japan.xilinx.com/applications/ai-inference/why-xilinx-ai.html
ザイリンクス AI の利点
VCK5000 Versal 開発カード
VCK5000 インストールガイド
Vitis AI 関連資料 https://docs.xilinx.com/v/u/ja-JP/ug1431-vitis-ai-documentation
AI エンジン関連ブログ

22
VCK5000
Versal AI コアを搭載した
初のアクセラレータカード
AI エンジンプロセッサにより
ML推論性能を大幅に向上
ML推論において
業界最高レベルの
ゼロダークシリコンを実現
Nvidia社 Ampere
の約2倍のTCO
VMSSに対応
VVAS (GStreamer) 対応予定
カスタムプラグイン
をサポート
TensorFlow / Pytorch
をスムーズに実装
Nvidia社 T4 の約2倍
のビデオ分析処理能力
まとめ

Disclaimer and Attribution
The information contained herein is for informational purposes only and is subject to change without notice. While every precaution has been taken in the
preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise
correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this
document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect
to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual
property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement
between the parties or in AMD's Standard Terms and Conditions of Sale. GD-18
© Copyright 2021 Advanced Micro Devices, Inc. All rights reserved. Xilinx, the Xilinx logo, AMD, the AMD Arrow logo, Alveo, Artix, Kintex, Kria, Spartan, Versal,
Vitis, Virtex, Vivado, Zynq, and other designated brands included herein are trademarks of Advanced Micro Devices, Inc. Other product names used in this
publication are for identification purposes only and may be trademarks of their respective companies.

AMD_Xilinx_AI_VCK5000_20220602R1.pdf

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to AMD_Xilinx_AI_VCK5000_20220602R1.pdf

Similar to AMD_Xilinx_AI_VCK5000_20220602R1.pdf (20)

More from 直久住川

More from 直久住川 (20)