A faster, more efficient, more intelligent cloud
Data explosion: 2013 4.4 ZB - 2020 44 ZB
ML, DNN, AI are driving requirements up faster
Autonomous decision making
Real-time insights into connected devices
Interactive user experiences
Cloud-scale services
Searches and recommendations (Indexing the Internet!)
The need for SCALE
The need for LOW-LATENCY
The need for THROUGHPUT
[Figure: projected data growth, 2013 vs. 2020]
Source: IDC 2014
[Figure: CPUs, GPUs, FPGAs, and ASICs placed along FLEXIBILITY and EFFICIENCY axes, with a CPU block diagram showing the Control Unit (CU), Registers, and Arithmetic Logic Unit (ALU)]
Evaluation: CPUs and FPGAs, ASICs under investigation
Training: CPUs and GPUs, limited FPGAs, ASICs under investigation
[Figure: CPU: temporal compute vs. FPGA: spatial compute. The CPU applies a stream of instructions to its data one after another over time; the FPGA lays the computation out in space across a fabric labeled with DRAM Controller, USB Controller, Ethernet Controller, DSP Slices, RAM, and CPU blocks.]
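To make the temporal/spatial distinction concrete, here is an idealized cycle-count model (a sketch, not a measurement of any Catapult hardware). It assumes every item needs the same fixed number of operations, that a CPU issues one operation per cycle, and that an FPGA maps each operation to its own pipeline stage. It ignores clock-frequency differences (FPGAs typically clock lower than CPUs), memory bandwidth, and multi-issue CPU cores. The function names are hypothetical.

```python
def temporal_cycles(n_items: int, ops_per_item: int) -> int:
    """Idealized CPU-style (temporal) execution: one operation per cycle,
    so every item pays for all of its operations back to back."""
    return n_items * ops_per_item


def spatial_cycles(n_items: int, ops_per_item: int) -> int:
    """Idealized FPGA-style (spatial) execution: each operation is its own
    pipeline stage, so after a fill latency of ops_per_item cycles a new
    result emerges every cycle."""
    if n_items == 0:
        return 0
    return ops_per_item + n_items - 1


if __name__ == "__main__":
    for n in (1, 10, 1_000_000):
        t, s = temporal_cycles(n, 8), spatial_cycles(n, 8)
        print(f"{n:>9} items: temporal={t:>9} cycles, spatial={s:>9} cycles")
```

Under these assumptions the pipeline approaches one result per cycle once it is full, which is the throughput advantage spatial compute buys in exchange for the FPGA's lower flexibility.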
Catapult timeline (2011, 2012, 2013, 2014, 2015, 2016, …):
Catapult v0 → Catapult v1 → Scale v1 → Catapult v2 → Ignite unveiling → Production
[Photos: WCS Gen4.1 blade with NIC and Catapult FPGA; Catapult v2 mezzanine card]
Azure Virtual Network
Virtual network: “Bring your own network”
• Segment with subnets and network security groups
• Control traffic flow with user-defined routes
Backend connectivity
• Point-to-site for dev/test
• VPN Gateways for secure site-to-site connectivity
• ExpressRoute for private, enterprise-grade connectivity
Front-end access (users over the Internet)
• Dynamic/reserved public IP addresses
• Direct VM access, ACLs for security
• Load balancing
• DNS services: hosting, traffic management
• DDoS protection
Management, control, and data planes
Proprietary appliance: all three planes in one box
SDN:
• Management plane (Azure Resource Manager): create a tenant
• Control plane (Controller): plumb tenant ACLs to switches
• Data plane (Switch (host)): apply ACLs to flows
Key to flexibility and scale is Host SDN
VFP (Virtual Filtering Platform)
• Acts as a virtual switch inside the Hyper-V VMSwitch
• Provides core SDN functionality for Azure networking services, including:
  • Address virtualization for VNET
  • VIP -> DIP translation for SLB
  • ACLs, metering, and security guards
• Uses programmable rule/flow tables to perform per-packet actions
• Available for private cloud in Microsoft Azure Stack
[Figure: VMs attached to the VM Switch, where VFP applies ACLs/Metering/Security, VNET, and SLB (NAT) layers]
VMSwitch exposes a typed Match-Action-Table API to the controller
• Controllers define policy
• One table per policy
Key insight: let the controller tell the switch exactly what to do with which packets (e.g. encap/decap), rather than trying to use existing abstractions (tunnels, …)

Controller policy: Tenant Description, VNet Description, VNet Routing Policy, ACLs, NAT Endpoints

VFP tables for VM 1 (10.1.1.2):

VNET
Flow             Action
TO: 10.2/16      Encap to GW
TO: 10.1.1.5     Encap to 10.5.1.7
TO: !10/8        NAT out of VNET

LB NAT
Flow             Action
TO: 79.3.1.2     DNAT to 10.1.1.2
TO: !10/8        SNAT to 79.3.1.2

ACLs
Flow             Action
TO: 10.1.1/24    Allow
10.4/16          Block
TO: !10/8        Allow
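The three policy tables can be modeled as independent match-action tables, one per controller. This minimal sketch uses Python's standard `ipaddress` module to evaluate the slide's VNET, LB NAT, and ACL rules against a destination address. Two assumptions are worth flagging: within each table the first matching rule wins (real VFP rule priorities are not shown on the slide), and only the destination address is matched, whereas real rules can match more header fields. The helper names `to`, `not_to`, `lookup`, and `classify` are hypothetical.

```python
import ipaddress
from typing import Callable, List, Optional, Tuple

Predicate = Callable[[ipaddress.IPv4Address], bool]


def to(prefix: str) -> Predicate:
    """Match packets whose destination lies inside `prefix`."""
    net = ipaddress.ip_network(prefix)
    return lambda dst: dst in net


def not_to(prefix: str) -> Predicate:
    """Match packets whose destination lies outside `prefix` (the slide's '!')."""
    net = ipaddress.ip_network(prefix)
    return lambda dst: dst not in net


# One match-action table per policy; entries copied from the slide.
# Assumption for this sketch: first match wins within a table.
Layer = List[Tuple[Predicate, str]]

VNET: Layer = [
    (to("10.2.0.0/16"), "Encap to GW"),
    (to("10.1.1.5/32"), "Encap to 10.5.1.7"),
    (not_to("10.0.0.0/8"), "NAT out of VNET"),
]
LB_NAT: Layer = [
    (to("79.3.1.2/32"), "DNAT to 10.1.1.2"),
    (not_to("10.0.0.0/8"), "SNAT to 79.3.1.2"),
]
ACLS: Layer = [
    (to("10.1.1.0/24"), "Allow"),
    (to("10.4.0.0/16"), "Block"),
    (not_to("10.0.0.0/8"), "Allow"),
]


def lookup(layer: Layer, dst: ipaddress.IPv4Address) -> Optional[str]:
    """Return the action of the first matching rule, or None if none match."""
    for predicate, action in layer:
        if predicate(dst):
            return action
    return None


def classify(dst: str) -> List[Tuple[str, Optional[str]]]:
    """Return the action each policy table selects for destination `dst`."""
    addr = ipaddress.ip_address(dst)
    tables = (("VNET", VNET), ("LB NAT", LB_NAT), ("ACLs", ACLS))
    return [(name, lookup(layer, addr)) for name, layer in tables]


if __name__ == "__main__":
    print(classify("10.1.1.5"))  # intra-VNET peer
    print(classify("8.8.8.8"))   # Internet destination
```

Keeping one table per controller means each controller can rewrite its own policy without touching the others, which is the point of the “one table per policy” design on the slide.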
Hosts are scaling up: 1G → 10G → 40G → 50G → 100G
Reduces COGS of VMs (more VMs per host) and enables new workloads
Need the performance of hardware to implement policy without spending host CPU
Need to support new scenarios: BYO IP, BYO Topology, BYO Appliance
We are always pushing richer semantics to virtual networks
Need the programmability of software to be agile and future-proof
“How do we get the performance of hardware with the programmability of software?”
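The pressure behind the need for hardware performance is easy to quantify. This sketch computes the worst-case packet rate a host must sustain at each link speed listed above, assuming minimum-size 64-byte Ethernet frames plus the 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap that every frame also occupies on the wire. Real traffic mixes in larger frames, so these figures are upper bounds rather than typical loads; the function names are hypothetical.

```python
# Minimum-size Ethernet frame, and the extra bytes each frame occupies on the wire.
MIN_FRAME = 64
WIRE_OVERHEAD = 8 + 12  # preamble + SFD (8 bytes) and inter-frame gap (12 bytes)


def line_rate_pps(gbps: float, frame_bytes: int = MIN_FRAME) -> float:
    """Packets per second needed to saturate a link with frames of `frame_bytes`."""
    bits_per_packet = (frame_bytes + WIRE_OVERHEAD) * 8
    return gbps * 1e9 / bits_per_packet


def ns_per_packet(gbps: float, frame_bytes: int = MIN_FRAME) -> float:
    """Processing-time budget per packet, in nanoseconds, at line rate."""
    return 1e9 / line_rate_pps(gbps, frame_bytes)


if __name__ == "__main__":
    for speed in (1, 10, 40, 50, 100):
        print(f"{speed:>3}G: {line_rate_pps(speed) / 1e6:7.2f} Mpps, "
              f"{ns_per_packet(speed):7.2f} ns/packet")
```

At 100G the budget is 6.72 ns per packet, roughly 20 cycles of a 3 GHz core, which is why applying per-packet policy at these speeds points toward offload.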
Use an FPGA for reconfigurable functions
• FPGAs are already used in Bing (Catapult)
• Roll out hardware as we roll out software
Programmed using Generic Flow Tables (GFT)
• A language for programming SDN to hardware
• Uses connections and structured actions as primitives
Deployed on all new Azure compute servers since late 2015
SmartNIC can also do crypto, QoS, storage acceleration, and more…
[Figure: host CPU connected to a SmartNIC (FPGA plus NIC ASIC), which connects to the ToR switch]
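[Figure: GFT offload architecture]
• Traffic from the VM passes through the VMSwitch, where VFP holds one layer per controller: SLB Decap, SLB NAT, VNET, ACL, and Metering. Controllers program their layers through the northbound API.
• Each layer's rule is a wildcard (*) match with one action: SLB Decap → Decap, SLB NAT → DNAT, VNET → Rewrite, ACL → Allow, Metering → Meter.
• The first packet of a flow is processed by VFP, which composes the per-layer actions into a single flow entry:
  Flow: 1.2.3.1->1.3.4.1, 62362->80    Action: Decap, DNAT, Rewrite, Meter
• The same entry is pushed through the southbound GFT Offload API (NDIS) into the GFT table on the 50G SmartNIC, whose GFT offload engine and transposition engine (Encap, Decap, DNAT, Rewrite, Allow, Meter) handle subsequent packets in hardware. The SmartNIC also provides QoS, Crypto, and RDMA.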
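The first-packet path can be sketched as a flow cache. In this toy model (hypothetical class and field names, not the GFT Offload API), a miss walks the five layers once, composes their actions into one list, and stores it under the flow's 5-tuple; later packets of the same flow need only a single exact-match lookup. Because every layer on the slide holds a single wildcard rule, the sketch skips per-layer header matching and ignores how one layer's rewrite changes what the next layer sees.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical 5-tuple key: (src IP, dst IP, src port, dst port, protocol).
FiveTuple = Tuple[str, str, int, int, str]


@dataclass(frozen=True)
class Layer:
    name: str
    action: str  # action of the layer's single wildcard ('*') rule on the slide


# Layers and their wildcard actions as drawn on the slide.
LAYERS: List[Layer] = [
    Layer("SLB Decap", "Decap"),
    Layer("SLB NAT", "DNAT"),
    Layer("VNET", "Rewrite"),
    Layer("ACL", "Allow"),
    Layer("Metering", "Meter"),
]


class FlowCache:
    """Toy exact-match flow table standing in for the GFT table.

    A miss (first packet of a flow) walks every layer once and caches the
    composed action list; later packets of that flow are a single lookup.
    """

    def __init__(self, layers: List[Layer]) -> None:
        self.layers = layers
        self.table: Dict[FiveTuple, List[str]] = {}
        self.slow_path_count = 0

    def _compose(self) -> List[str]:
        self.slow_path_count += 1
        # "Allow" passes the packet through unchanged, so it adds no
        # transformation to the composed action list.
        return [layer.action for layer in self.layers if layer.action != "Allow"]

    def process(self, flow: FiveTuple) -> List[str]:
        if flow not in self.table:  # first packet: slow path through all layers
            self.table[flow] = self._compose()
        return self.table[flow]


if __name__ == "__main__":
    cache = FlowCache(LAYERS)
    flow = ("1.2.3.1", "1.3.4.1", 62362, 80, "TCP")
    print(cache.process(flow))    # ['Decap', 'DNAT', 'Rewrite', 'Meter']
    cache.process(flow)
    print(cache.slow_path_count)  # 1
```

The composed list matches the slide's flow entry (Decap, DNAT, Rewrite, Meter): the cost of walking every layer is paid once per connection rather than once per packet.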
[Figure: two physical servers, each running VM 1 and VM 2 behind a virtual switch and joined through a physical switch into a virtual network. Left: SDN/networking policy applied in software in the host. Right: FPGA acceleration used to apply all policies.]
The fastest cloud network
Highest bandwidth VMs of any cloud
DS15v2 & D15v2 VMs get 25Gbps
Consistent low latency network performance
Provides SR-IOV to the VM
Up to 10x latency improvement
Increased packets per second (PPS)
Reduced jitter means more consistency in workloads
Enables workloads requiring native performance to run in cloud VMs
>2x improvement for many DB and OLTP applications
New 50GbE SmartNIC for Project Olympus
(Announced at OCP 2017)
Deep neural networks (DNNs) have led to breakthroughs in major AI problems:
• Computer vision
• Language translation
• Speech recognition
• And more…
But DNNs are challenging to serve in online services:
• Latency-, cost-, and power-constrained
• Size and complexity of DNNs are outpacing the growth of CPUs
Microsoft has the world’s largest cloud investment in FPGAs
• Multiple exa-ops of aggregate AI capacity
• We have built a powerful DNN serving platform on our FPGA fabric
FPGAs are ideal for adapting to rapidly evolving ML
• CNNs, LSTMs, MLPs, reinforcement learning, feature extraction, decision trees, etc.
Inference-optimized numerical precision
• Custom binarized, ternarized, and tiny-precision nets
• Sparsity and deep compression for larger, faster models
Tens to hundreds of TOPS of effective inference throughput at low batch sizes
Ultra-low-latency serving on modern DNNs
• >10x better than CPUs and GPUs
Scale to many FPGAs in a single DNN service
Performance, flexibility, and scale
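As an illustration of inference-optimized numerical precision, this sketch ternarizes a weight vector into {-1, 0, +1} codes plus one shared scale, then computes a dot product that needs only additions, subtractions, and a single multiply. The 0.7 × mean-magnitude threshold is a common heuristic from the ternary-network literature and is an assumption here; the slides do not say which quantization scheme Microsoft's FPGAs use. Function names are hypothetical.

```python
from typing import List, Tuple


def ternarize(weights: List[float], threshold_ratio: float = 0.7) -> Tuple[List[int], float]:
    """Map each weight to -1, 0, or +1 and return one shared scale factor.

    Weights with |w| <= delta become 0, where delta = threshold_ratio * mean(|w|);
    the scale is the mean magnitude of the weights that survive.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def ternary_dot(codes: List[int], scale: float, x: List[float]) -> float:
    """Dot product with ternary weights: adds/subtracts inputs, one final multiply."""
    acc = 0.0
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
    return scale * acc


if __name__ == "__main__":
    codes, scale = ternarize([0.9, -0.05, -1.1, 0.3])
    print(codes, scale)
    print(ternary_dot(codes, scale, [1.0, 2.0, 3.0, 4.0]))
```

Replacing multipliers with adders and storing 2-bit codes instead of 32-bit floats is the kind of trade an FPGA can exploit directly, at some cost in model accuracy.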
[Figure: HW vs. SW latency and load. 99.9% query latency versus queries/sec for software and FPGA, marking average software load, 99.9% software latency, 99.9% FPGA latency, and average FPGA query load.]
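The chart plots 99.9th-percentile (tail) latency rather than the mean. This sketch computes such a percentile from raw latency samples using the nearest-rank method; the slides do not state how their percentiles were measured, so the method is an assumption and the function name is hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid float error such as 99.9 / 100 * 1000 > 999.
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [float(i) for i in range(1, 1001)]
    print(percentile_nearest_rank(latencies_ms, 99.9))
```

With fewer than 1,000 samples the 99.9th percentile is simply the maximum, so tail-latency curves like the one in the chart need large sample counts at each load point.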
[Figure: configurable cloud servers. Each server has two CPUs linked by QPI and an FPGA; the FPGA connects through QSFP ports to the 40Gb/s ToR switch. The CPUs running web search ranking form the traditional software (CPU) server plane; the FPGAs, joined by a super-low-latency network and a management fabric, form the hardware plane.]
Hardware acceleration plane
• Interconnected FPGAs form a separate plane of computation
• Can be managed and used independently from the CPU
• Workloads include web search ranking, deep neural networks, SDN offload, and SQL
Flexibility: some services need a large number of FPGAs, while others underutilize theirs
• Deploy exactly as many instances as needed
• Many accelerators can handle the load of multiple software clients
• Consolidate underutilized FPGA accelerators into fewer shared instances
• Increases efficiency and makes room for more accelerators
• Many services need to access multiple types of accelerators
[Figure: FPGAs (F) grouped under network tiers L0 and L1]
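One way to picture consolidating underutilized accelerators into fewer shared instances is as bin packing. This sketch uses first-fit decreasing, a standard bin-packing heuristic; it is an illustration, not the placement policy Microsoft uses, which the slides do not describe. Each client's demand is a hypothetical fraction of one accelerator instance's capacity, and the service names are made up.

```python
from typing import Dict, List


def consolidate(demand: Dict[str, float], capacity: float = 1.0) -> List[Dict[str, float]]:
    """Pack per-client accelerator load into as few shared instances as possible
    using first-fit decreasing. `demand` maps a client to the fraction of one
    accelerator instance it needs."""
    instances: List[Dict[str, float]] = []
    # Place the largest loads first; each goes into the first instance with room.
    for client, load in sorted(demand.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"{client} needs more than one instance")
        for inst in instances:
            if sum(inst.values()) + load <= capacity:
                inst[client] = load
                break
        else:  # no existing instance has room: open a new one
            instances.append({client: load})
    return instances


if __name__ == "__main__":
    # Hypothetical per-service loads, as fractions of one accelerator.
    plan = consolidate({"ranking": 0.5, "sql": 0.25, "sdn": 0.25, "dnn": 0.75})
    print(len(plan), plan)  # 2 shared instances instead of 4 dedicated ones
```

Four clients that would each hold a dedicated FPGA fit on two shared instances here, freeing the other two for additional accelerators.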
[Figure: DNN serving stack]
• Customer DNN model (TF, CNTK, etc.) → Low-Level AI Representation (LLAIR) & Federated Runtime → hosted FPGA-powered service in Azure
• Pretrained DNN model → DNN hardware microservice, spread across CPU/FPGA server pairs
• DNN engine on each FPGA: instruction decoder & control, neural functional units (Neural FU)
[Figure: a layer on a 1000-dim vector partitioned across FPGA0 and FPGA1. Split nodes divide the vectors into 500-element halves; four MatMul500 nodes multiply them by 500x500 matrix blocks; Add500 nodes combine the partial results; two Sigmoid500 nodes apply the activation; and a Concat node reassembles the 1000-dim output vector.]
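The partitioned graph can be checked numerically. This plain-Python sketch computes y = sigmoid(Wx + b) once on a single device and once split two ways, where each output half comes from two block matrix-vector products, additions, and a sigmoid, followed by a concat. The row-wise assignment of blocks to FPGA0 and FPGA1 and the bias term (one reading of the extra Add500 nodes) are assumptions, the function names are hypothetical, and sizes are shrunk from 1000 for testing.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def block(m: Matrix, r: int, c: int, size: int) -> Matrix:
    """Extract the (r, c) sub-block of m, each block size x size."""
    return [row[c * size:(c + 1) * size] for row in m[r * size:(r + 1) * size]]


def layer_single(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Reference: y = sigmoid(W x + b) computed on one device."""
    return sigmoid(add(matvec(w, x), b))


def layer_two_devices(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Same layer split 2 x 2: device j produces output half j."""
    if len(x) % 2:
        raise ValueError("vector length must be even to split in two")
    h = len(x) // 2
    x0, x1 = x[:h], x[h:]                                     # Split
    halves = []
    for j in range(2):                                        # j=0 "FPGA0", j=1 "FPGA1"
        p0 = matvec(block(w, j, 0, h), x0)                    # MatMul on block (j, 0)
        p1 = matvec(block(w, j, 1, h), x1)                    # MatMul on block (j, 1)
        z = add(add(p0, p1), b[j * h:(j + 1) * h])            # Add nodes
        halves.append(sigmoid(z))                             # Sigmoid
    return halves[0] + halves[1]                              # Concat


if __name__ == "__main__":
    n = 6
    w = [[((i * 7 + j * 3) % 5 - 2) * 0.1 for j in range(n)] for i in range(n)]
    b = [0.05 * i for i in range(n)]
    x = [0.3 * (i - 2) for i in range(n)]
    print(max(abs(p - q) for p, q in zip(layer_single(w, b, x), layer_two_devices(w, b, x))))
```

Each output half still needs both input halves, so in a real two-FPGA deployment part of the split input must cross the network between devices; the sketch ignores that transfer.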
[Figure: ranking services on CPU hosts in the CPU compute layer call FPGAs in the reconfigurable compute layer (FE FPGAs, DNN FPGAs, and free FPGAs) over LTL (Lightweight Transport Layer), all on a converged network.]
We look forward to eventually making this available to you, a major step toward democratizing AI with the power of FPGAs.
Our technology will push the boundary of what is possible to deploy in the cloud:
• Deeper convolutional neural networks for more accurate computer vision
• Higher-dimensional recurrent neural networks toward human-like natural language processing
• State-of-the-art translation and speech recognition
• And much more…
This technology is already powering services within Microsoft.
Inside Microsoft's FPGA-Based Configurable Cloud


Inside Microsoft's FPGA-Based Configurable Cloud

  • 1.
  • 2.
  • 3. A faster, more efficient, more intelligent cloud Data explosion: 2013 4.4 ZB - 2020 44 ZB ML, DNN, AI are driving requirements up faster Autonomous decision making Real-time insights into connected devices Interactive user experiences Cloud-scale services Searches and recommendations (Indexing the Internet!) The need for SCALE The need for LOW-LATENCY The need for THROUGHPUT 1 0 0 1 1 0 1 0 1 0 2013 1 0 0 1 1 0 1 0 1 0 2020 4.4 ZB 44 ZB 0 1 0 0 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 1 0 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 1 0 1 1 1 0 1 0 1 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 1 0 1 1 Source: IDC 2014
  • 4. FPGAs EVALUATION CPUs and FPGAs, ASICs under investigation EFFICIENCY TRAINING CPUs and GPUs, limited FPGAs, ASICs under investigation Control Unit (CU) Registers Arithmetic Logic Unit (ALU) + + + + + + + FLEXIBILITY CPUs GPUs ASICs
  • 7. Catapult v0 Catapult v1 Scale v1 Catapult v2 2011 2012 2013 2014 2015 2016 … Ignite unveiling Production
  • 8.
  • 9. WCS Gen4.1 Blade with NIC and Catapult FPGA Catapult v2 Mezzanine card
  • 10.
  • 11. Azure Virtual Network Virtual network “Bring your own network” Segment with subnets and network security groups Control traffic flow with user defined routes Backend connectivity Point-to-site for dev/test VPN Gateways for secure site-to-site connectivity ExpressRoute for private enterprise grade connectivity Backend connectivity ExpressRoute VPN Gateways Users Internet Front-end access Dynamic/reserved public IP addresses Direct VM access, ACLs for security Load balancing DNS services: hosting, traffic management DDoS protection
  • 12. Management Control Data Proprietary appliance Management plane Create a tenant Control plane Plumb tenant ACLs to switches Data plane Apply ACLs to flows Azure Resource Manager Controller Switch (Host) Management plane Data plane SDN Control plane Key to flexibility and scale is Host SDN
  • 13. Acts as a virtual switch inside Hyper-V VMSwitch Provides core SDN functionality for Azure networking services, including: • Address Virtualization for VNET • VIP -> DIP Translation for SLB • ACLs, Metering, and Security Guards Uses programmable rule/flow tables to perform per-packet actions Available for Private Cloud in Microsoft Azure Stack VM Switch VFP VM VM ACLs, Metering, Security VNET SLB (NAT)
  • 14. VMSwitch exposes a typed Match-Action-Table API to the controller Controllers define policy One table per policy Key insight: Let controller tell switch exactly what to do with which packets e.g. encap/decap, rather than trying to use existing abstractions (tunnels, …) Tenant Description VNet Description VNet Routing Policy ACLs NAT Endpoints Flow Action TO: 10.2/16 Encap to GW TO: 10.1.1.5 Encap to 10.5.1.7 TO: !10/8 NAT out of VNET Flow Action TO: 79.3.1.2 DNAT to 10.1.1.2 TO: !10/8 SNAT to 79.3.1.2 Flow Action TO: 10.1.1/24 Allow 10.4/16 Block TO: !10/8 Allow VNET LB NAT ACLS VFP Controller VM 1 10.1.1.2
  • 15. Hosts are Scaling Up: 1G  10G  40G  50G  100G Reduces COGS of VMs (more VMs per host) and enables new workloads Need the performance of hardware to implement policy without CPU Need to support new scenarios: BYO IP, BYO Topology, BYO Appliance We are always pushing richer semantics to virtual networks Need the programmability of software to be agile and future-proof “How do we get the performance of hardware with programmability of software?
  • 16. Use an FPGA for reconfigurable functions. FPGAs are already used in Bing (Catapult), so we can roll out hardware as we do software. The FPGA is programmed using Generic Flow Tables (GFT), a language for programming SDN to hardware that uses connections and structured actions as primitives. Deployed on all new Azure compute servers since late 2015. The SmartNIC can also do crypto, QoS, storage acceleration, and more. (Diagram: host CPU, SmartNIC with NIC ASIC and FPGA, ToR.)
  • 17. GFT offload architecture. VFP sits in the VMSwitch between the VM and the NIC; controllers program VFP through its northbound API, and VFP programs the SmartNIC through the southbound GFT Offload API (NDIS). The SmartNIC (50G; QoS, crypto, RDMA) contains the GFT Offload Engine and the GFT table. The first packet of a flow goes to VFP, where it passes through layers, each with a wildcard rule (*) and an action: SLB Decap → Decap; SLB NAT → DNAT; VNET → Rewrite; ACL → Allow; Metering → Meter. The GFT Transposition Engine composes these into a single flow entry for the GFT table: flow 1.2.3.1->1.3.4.1, 62362->80 → Decap, DNAT, Rewrite, Meter.
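The first-packet path above can be sketched as a flow cache: the first packet of a connection walks every layer (the slow path), the per-layer actions are composed into one action list, and that list is installed under the flow's exact 5-tuple so later packets skip the layers. This is an illustrative model, not the GFT or VFP interface: `FlowCache`, `transpose`, and `LAYERS` are hypothetical, the wildcard layers simply mirror the slide, and the protocol field in the flow key is an assumption (the slide shows only addresses and ports).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical 5-tuple flow key: (src_ip, dst_ip, src_port, dst_port, proto).
FlowKey = Tuple[str, str, int, int, str]
# A layer maps a flow key to an action label, or None if no rule matches.
Layer = Callable[[FlowKey], Optional[str]]

# Wildcard ("*") rules per layer, mirroring the slide.
LAYERS: List[Tuple[str, Layer]] = [
    ("SLB Decap", lambda key: "Decap"),
    ("SLB NAT", lambda key: "DNAT"),
    ("VNET", lambda key: "Rewrite"),
    ("ACL", lambda key: "Allow"),
    ("Metering", lambda key: "Meter"),
]


class FlowCache:
    """Exact-match flow table standing in for the GFT table (hypothetical)."""

    def __init__(self, layers: List[Tuple[str, Layer]]):
        self.layers = layers
        self.table: Dict[FlowKey, List[str]] = {}
        self.slow_path_hits = 0

    def transpose(self, key: FlowKey) -> List[str]:
        """Slow path: run every layer once and compose a single action list."""
        self.slow_path_hits += 1
        actions: List[str] = []
        for _name, layer in self.layers:
            action = layer(key)
            if action == "Block":
                return ["Drop"]
            # "Allow" does not modify the packet, so it adds no step.
            if action is not None and action != "Allow":
                actions.append(action)
        return actions

    def process(self, key: FlowKey) -> List[str]:
        """Fast path on a hit; on a miss, transpose and install the flow."""
        if key not in self.table:
            self.table[key] = self.transpose(key)
        return self.table[key]
```

With `cache = FlowCache(LAYERS)` and the slide's flow `("1.2.3.1", "1.3.4.1", 62362, 80, "tcp")`, every call to `cache.process` returns Decap, DNAT, Rewrite, Meter, but only the first call walks the layers. Composing per-layer actions once per flow is what reduces the common case to a single exact-match lookup, the part a hardware table can perform at line rate.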
  • 18. Software path: SDN/networking policy applied in software in the host. Accelerated path: FPGA acceleration used to apply all policies. (Diagram: VM 1 and VM 2 on physical servers 1 and 2, each behind a virtual switch, connected through physical switches into one virtual network.)
  • 19. The fastest cloud network. Highest-bandwidth VMs of any cloud: DS15v2 and D15v2 VMs get 25Gbps. Consistent low-latency network performance: provides SR-IOV to the VM; up to 10x latency improvement; increased packets per second (PPS); reduced jitter means more consistency in workloads. Enables workloads requiring native performance to run in cloud VMs: >2x improvement for many DB and OLTP applications.
  • 22. New 50GbE SmartNIC for Project Olympus (Announced at OCP 2017)
  • 24. Deep neural networks (DNNs) have led to breakthroughs in major AI problems: computer vision, language translation, speech recognition, and more. But DNNs are challenging to serve in online services, which are latency-, cost-, and power-constrained, and the size and complexity of DNNs is outpacing the growth of CPUs.
  • 26. Microsoft has the world’s largest cloud investment in FPGAs, with multiple Exa-Ops of aggregate AI capacity, and we have built a powerful DNN serving platform on our FPGA fabric. Flexibility: FPGAs are ideal for adapting to rapidly evolving ML (CNNs, LSTMs, MLPs, reinforcement learning, feature extraction, decision trees, etc.), with inference-optimized numerical precision (custom binarized, ternarized, and tiny-precision nets) and sparsity and deep compression for larger, faster models. Performance: tens to hundreds of TOPS of effective inference throughput at low batch sizes, and ultra-low-latency serving on modern DNNs, >10X better than CPUs and GPUs. Scale: many FPGAs in a single DNN service.
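As an illustration of what a ternarized net means, the sketch below uses the threshold rule from Ternary Weight Networks (threshold at 0.7 times the mean absolute weight, with one shared scale). The slide does not say which quantization scheme Microsoft uses, so treat this as one published example; `ternarize` and `ternary_dot` are hypothetical helper names. With weights restricted to {-1, 0, +1} times a scale, a dot product needs only additions and subtractions plus one final multiply, which is much cheaper in FPGA logic than full-precision multipliers.

```python
from typing import List, Tuple


def ternarize(weights: List[float], delta_scale: float = 0.7) -> Tuple[List[int], float]:
    """Map weights to codes in {-1, 0, +1} plus one shared scale alpha."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_scale * mean_abs  # weights with |w| <= delta become 0
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0  # mean magnitude of kept weights
    return codes, alpha


def ternary_dot(codes: List[int], alpha: float, x: List[float]) -> float:
    """Dot product using only adds/subtracts, then a single multiply by alpha."""
    acc = 0.0
    for c, xi in zip(codes, x):
        if c == 1:
            acc += xi
        elif c == -1:
            acc -= xi
    return alpha * acc
```

For weights `[0.9, -0.8, 0.05, -0.02, 0.6, 0.0]`, the threshold is about 0.28, so the codes are `[1, -1, 0, 0, 1, 0]` and alpha is the mean of 0.9, 0.8, and 0.6.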
  • 27. (Figure: HW vs. SW latency and load, plotting 99.9% query latency versus queries/sec for software and FPGA; series: 99.9% software latency, 99.9% FPGA latency; markers: average software load, average FPGA query load.)
  • 31. Traditional software (CPU) server plane: two CPUs linked by QPI, connected through a QSFP port to the ToR at 40Gb/s; workload: web search ranking.
  • 32. Traditional software (CPU) server plane plus a hardware acceleration plane: each server adds an FPGA on its 40Gb/s QSFP path between the CPUs and the ToR. Interconnected FPGAs form a separate plane of computation that can be managed and used independently from the CPU. Workloads: web search ranking, deep neural networks, SDN offload, SQL.
  • 33. Flexibility: many services need a large number of FPGAs, while others underutilize theirs. Deploy exactly as many instances as needed: many accelerators can handle the load of multiple software clients, so underutilized FPGA accelerators can be consolidated into fewer shared instances. This increases efficiency and makes room for more accelerators. Many services also need to access multiple types of accelerators.
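The consolidation idea can be illustrated with a standard bin-packing heuristic. This is a hypothetical sketch, not Microsoft's scheduler: demand is assumed to be expressed as a fraction of one FPGA instance's capacity, services needing more than one instance get dedicated full instances, and the fractional remainders are packed first-fit decreasing into shared instances.

```python
import math
from typing import Dict, List


def consolidate(loads: Dict[str, float]) -> List[Dict[str, float]]:
    """Pack per-service demand (in units of one FPGA instance) into instances.

    Each returned dict maps service -> share of that instance it uses.
    """
    instances: List[Dict[str, float]] = []
    remainders = []
    for service, load in loads.items():
        full = math.floor(load)
        # Whole instances dedicated to heavy services.
        instances.extend({service: 1.0} for _ in range(full))
        rest = load - full
        if rest > 1e-9:
            remainders.append((service, rest))

    # First-fit decreasing: largest remainders first, into the first
    # shared instance with enough spare capacity.
    shared: List[Dict[str, float]] = []
    for service, rest in sorted(remainders, key=lambda p: p[1], reverse=True):
        for inst in shared:
            if sum(inst.values()) + rest <= 1.0 + 1e-9:
                inst[service] = rest
                break
        else:
            shared.append({service: rest})
    return instances + shared
```

For demands `{"ranking": 2.5, "dnn": 0.3, "sql": 0.2, "crypto": 0.4}`, this yields 4 instances instead of the 6 needed when every service rounds up to whole FPGAs.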
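  • 34. (Diagram: a pretrained DNN model deployed as a DNN hardware microservice across many FPGAs (F), with labels L0 and L1; each FPGA runs a DNN engine with an instruction decoder & control unit and a neural functional unit (FU).)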
  • 37. Low-Level AI Representation (LLAIR) & Federated Runtime: a customer DNN model (TF, CNTK, etc.) becomes a hosted FPGA-powered service in Azure. (Diagram: a dataflow graph partitioned across FPGA0 and FPGA1. A 1000-dim input vector is split into two 500-dim halves; four MatMul500 nodes multiply them by 500x500 matrices; Add500 and Sigmoid500 nodes combine and activate the results; a Concat produces the 1000-dim output vector.)
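The dataflow graph above corresponds to a block-partitioned matrix-vector product. Assuming the layer computes sigmoid(W x) with no bias (the Add500 nodes are read here as combining partial products), the sketch splits x into two halves and W into four square blocks; each half of the output, one per FPGA, needs two block products, an add, and a sigmoid before the final concat. The function names are hypothetical, and pure Python stands in for the hardware.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def block(m: Matrix, r: int, c: int, h: int) -> Matrix:
    """Extract the h x h block at block-row r, block-column c."""
    return [row[c * h:(c + 1) * h] for row in m[r * h:(r + 1) * h]]


def partitioned_layer(w: Matrix, x: Vector) -> Vector:
    """sigmoid(W x) computed as two independent output halves (one per device)."""
    h = len(x) // 2                      # assumes an even input length
    x0, x1 = x[:h], x[h:]                # Split
    halves = []
    for r in (0, 1):                     # r = 0 -> "FPGA0", r = 1 -> "FPGA1"
        partial = add(matvec(block(w, r, 0, h), x0),   # two MatMul nodes,
                      matvec(block(w, r, 1, h), x1))   # combined by Add
        halves.append(sigmoid(partial))  # Sigmoid
    return halves[0] + halves[1]         # Concat
```

With n = 1000 the blocks are 500x500, matching the figure; the result equals `sigmoid(matvec(w, x))` up to floating-point rounding, because only the order of the additions changes.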
  • 38. (Diagram: hosts running the ranking service alongside FPGAs assigned roles FE, DNN, or Free; hosts and FPGAs communicate over LTL (Lightweight Transport Layer).)
  • 39. CPU compute layer, reconfigurable compute layer, converged network.
  • 40. We look forward to eventually making this available to you, a major step toward democratizing AI with the power of FPGAs. Our technology will push the boundary of what is possible to deploy in the cloud: deeper convolutional neural networks for more accurate computer vision; higher-dimensional recurrent neural networks toward human-like natural language processing; state-of-the-art translation and speech recognition; and much more. This technology is already powering services within Microsoft.