At Microsoft’s annual developers conference, Microsoft Azure CTO Mark Russinovich disclosed major advances in Microsoft’s hyperscale deployment of Intel field programmable gate arrays (FPGAs). These advances have resulted in the industry’s fastest public cloud network, and new technology for acceleration of Deep Neural Networks (DNNs) that replicate “thinking” in a manner that’s conceptually similar to that of the human brain.
Watch the video: http://wp.me/p3RLHQ-gNu
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Inside Microsoft's FPGA-Based Configurable Cloud
2. A faster, more efficient, more intelligent cloud
Data explosion: 4.4 ZB in 2013, growing to 44 ZB by 2020 (source: IDC, 2014)
ML, DNNs, and AI are driving requirements up even faster:
Autonomous decision making
Real-time insights into connected devices
Interactive user experiences
Cloud-scale services
Search and recommendations (indexing the Internet!)
The need for SCALE, LOW LATENCY, and THROUGHPUT
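The IDC projection above implies roughly a tenfold increase in seven years; a quick sanity check of the implied growth rate (this calculation is mine, not from the slides):

```python
# IDC projection: 4.4 ZB in 2013 growing to 44 ZB in 2020.
start, end, years = 4.4, 44.0, 2020 - 2013

# Compound annual growth rate: (end/start)^(1/years) - 1
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # ~38.9% growth per year
```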
3. FPGAs
(Diagram: a flexibility-vs-efficiency spectrum running from CPUs and GPUs at the flexible end, through FPGAs, to ASICs at the efficient end)
EVALUATION: CPUs and FPGAs; ASICs under investigation
TRAINING: CPUs and GPUs; limited FPGAs; ASICs under investigation
8. WCS Gen4.1 Blade with NIC and Catapult FPGA
Catapult v2 Mezzanine card
10. Azure Virtual Network
Virtual network: "Bring your own network"; segment with subnets and network security groups; control traffic flow with user-defined routes
Backend connectivity: point-to-site for dev/test; VPN Gateways for secure site-to-site connectivity; ExpressRoute for private, enterprise-grade connectivity
Front-end access: dynamic/reserved public IP addresses; direct VM access, with ACLs for security; load balancing; DNS services (hosting, traffic management); DDoS protection
11. Management, control, and data planes
Management plane (Azure Resource Manager): create a tenant
Control plane (Controller): plumb tenant ACLs to switches
Data plane (switch, on the host): apply ACLs to flows
(Diagram contrasts a proprietary appliance with Azure's SDN, which splits the work across these planes)
Key to flexibility and scale is Host SDN
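The three-plane split above can be sketched as a toy pipeline; class and method names here are invented for illustration, and the real Azure components are of course far richer:

```python
class HostSwitch:
    """Data plane: apply installed ACLs to flows."""
    def __init__(self):
        self.acls = {}

    def install(self, tenant, acls):
        self.acls[tenant] = acls

    def allow(self, tenant, flow):
        # Default-deny: a flow is allowed only if an ACL permits it.
        return self.acls.get(tenant, {}).get(flow, False)

class ControlPlane:
    """Controller: plumb tenant ACLs down to the host switches."""
    def __init__(self, switches):
        self.switches = switches

    def plumb_acls(self, tenant, acls):
        for switch in self.switches:
            switch.install(tenant, acls)

class ManagementPlane:
    """Azure Resource Manager role: create tenants, hand policy downward."""
    def __init__(self, control):
        self.control = control

    def create_tenant(self, name, acls):
        self.control.plumb_acls(name, acls)

switch = HostSwitch()
mgmt = ManagementPlane(ControlPlane([switch]))
mgmt.create_tenant("contoso", {("10.1.1.2", "10.1.1.3"): True})
print(switch.allow("contoso", ("10.1.1.2", "10.1.1.3")))  # True
```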
12. VFP
Acts as a virtual switch inside the Hyper-V VMSwitch
Provides core SDN functionality for Azure networking services, including:
Address virtualization for VNET
VIP-to-DIP translation for SLB (NAT)
ACLs, metering, and security guards
Uses programmable rule/flow tables to perform per-packet actions
Available for private cloud in Microsoft Azure Stack
13. VMSwitch exposes a typed Match-Action Table API to the controller
Controllers define policy; one table per policy
Key insight: let the controller tell the switch exactly what to do with which packets (e.g. encap/decap), rather than trying to use existing abstractions (tunnels, …)
Tenant description: VNet description, VNet routing policy, ACLs, NAT endpoints

VNET routing table:
Flow           Action
TO: 10.2/16    Encap to GW
TO: 10.1.1.5   Encap to 10.5.1.7
TO: !10/8      NAT out of VNET

LB NAT table:
Flow           Action
TO: 79.3.1.2   DNAT to 10.1.1.2
TO: !10/8      SNAT to 79.3.1.2

ACLs table:
Flow           Action
TO: 10.1.1/24  Allow
TO: 10.4/16    Block
TO: !10/8      Allow
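The match-action tables above can be sketched as ordered rule lists, one table per policy, with each packet's destination checked against the rules in turn. This is an illustrative sketch only (the helper names are mine, and the slide's shorthand prefixes are written out in full CIDR form):

```python
import ipaddress

def match(rule, dst):
    """Match a destination IP against rules like 'TO: 10.2.0.0/16' or
    'TO: !10.0.0.0/8' (the slide abbreviates these as 10.2/16 and !10/8)."""
    prefix = rule.removeprefix("TO: ")
    negate = prefix.startswith("!")
    net = ipaddress.ip_network(prefix.lstrip("!"), strict=False)
    return (ipaddress.ip_address(dst) in net) != negate

def lookup(table, dst):
    """Return the action of the first matching rule in one table."""
    for rule, action in table:
        if match(rule, dst):
            return action
    return None

# One table per policy; this is the VNET routing table from the slide.
vnet_routing = [("TO: 10.2.0.0/16", "Encap to GW"),
                ("TO: 10.1.1.5",    "Encap to 10.5.1.7"),
                ("TO: !10.0.0.0/8", "NAT out of VNET")]

print(lookup(vnet_routing, "10.2.3.4"))  # Encap to GW
print(lookup(vnet_routing, "8.8.8.8"))   # NAT out of VNET
```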
14. Hosts are scaling up: 1G → 10G → 40G → 50G → 100G
Reduces COGS of VMs (more VMs per host) and enables new workloads
Need the performance of hardware to implement policy without burning CPU
Need to support new scenarios: BYO IP, BYO topology, BYO appliance
We are always pushing richer semantics to virtual networks
Need the programmability of software to be agile and future-proof
"How do we get the performance of hardware with the programmability of software?"
15. Use an FPGA for reconfigurable functions
FPGAs are already used in Bing (Catapult)
Roll out hardware the way we roll out software
Programmed using Generic Flow Tables (GFT): a language for programming SDN to hardware, using connections and structured actions as primitives
Deployed on all new Azure compute servers since late 2015
SmartNIC can also do crypto, QoS, storage acceleration, and more…
(Diagram: host with SmartNIC, combining an FPGA and a NIC ASIC, connected to the ToR switch)
16. GFT offload on the SmartNIC
(Diagram: a VM sits above the VMSwitch; VFP exposes a northbound API to the SDN controllers and a southbound GFT offload API (NDIS) down to the 50G SmartNIC, which also hosts crypto, QoS, and RDMA engines)
The first packet of a flow traverses the full set of VFP rule tables in software: SLB Decap (Decap), SLB NAT (DNAT), VNET (Rewrite/Encap), ACL (Allow), Metering (Meter)
The GFT transposition engine composes the matched actions into a single flow-table entry, for example:
Flow                          Action
1.2.3.1->1.3.4.1, 62362->80   Decap, DNAT, Rewrite, Meter
That entry is pushed to the SmartNIC's GFT offload engine, which applies it to every subsequent packet of the flow
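The first-packet behavior described on this slide, where software walks every layered table once and then caches the composed actions as one flow entry for hardware to apply, can be sketched roughly as follows. This is a toy model under my own assumptions; GFT's real encoding and offload path are far more involved:

```python
# Layered policy tables in the order the slide shows; each layer
# contributes one action to the flow. The ACL's Allow permits the
# packet but adds no transformation, so it is not in the action list.
layers = [("SLB Decap", "Decap"),
          ("SLB NAT",   "DNAT"),
          ("VNET",      "Rewrite"),
          ("ACL",       "Allow"),
          ("Metering",  "Meter")]

flow_cache = {}  # compiled GFT table: flow key -> composed action list

def process(packet_key):
    """First packet compiles the flow; later packets hit the cached entry."""
    if packet_key in flow_cache:
        return flow_cache[packet_key], "fast path (offloaded)"
    # Slow path: evaluate every layer once and compose the actions.
    actions = [action for _layer, action in layers if action != "Allow"]
    flow_cache[packet_key] = actions
    return actions, "slow path (VFP software)"

key = ("1.2.3.1->1.3.4.1", "62362->80")
print(process(key))  # slow path: compiles ['Decap', 'DNAT', 'Rewrite', 'Meter']
print(process(key))  # fast path: hits the cached flow entry
```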
17. SDN/networking policy is applied in software in the host; FPGA acceleration is used to apply all policies
(Diagram: two physical servers, each running VMs over a virtual switch, joined through physical switches into one virtual network)
18. The fastest cloud network
Highest-bandwidth VMs of any cloud: DS15v2 and D15v2 VMs get 25 Gbps
Consistent, low-latency network performance:
Provides SR-IOV to the VM
Up to 10x latency improvement
Increased packets per second (PPS)
Reduced jitter means more consistency in workloads
Enables workloads requiring native performance to run in cloud VMs: >2x improvement for many DB and OLTP applications
23. Deep neural networks (DNNs) have led to breakthroughs in major AI problems:
Computer vision
Language translation
Speech recognition
And more…
But DNNs are challenging to serve in online services:
Latency-, cost-, and power-constrained
Size and complexity of DNNs are outpacing the growth of CPUs
25. Microsoft has the world's largest cloud investment in FPGAs
Multiple exa-ops of aggregate AI capacity
We have built a powerful DNN serving platform on our FPGA fabric
Flexibility: FPGAs are ideal for adapting to rapidly evolving ML (CNNs, LSTMs, MLPs, reinforcement learning, feature extraction, decision trees, etc.), with inference-optimized numerical precision (custom binarized, ternarized, tiny-precision nets) and sparsity and deep compression for larger, faster models
Performance: tens to hundreds of TOPS of effective inference throughput at low batch sizes; ultra-low-latency serving on modern DNNs, >10x better than CPUs and GPUs
Scale: scale out to many FPGAs in a single DNN service
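To make "ternarized" concrete: weights are mapped to just {-1, 0, +1} plus a scale factor, so multiplications collapse to additions and sign flips that are cheap in FPGA logic. A minimal sketch using a magnitude-threshold scheme, which is one common choice and not necessarily the one Microsoft's nets use:

```python
def ternarize(weights, frac=0.7):
    """Map weights to {-1, 0, +1} using a magnitude threshold, plus one
    per-tensor scale factor so the quantized tensor approximates the
    original. `frac` controls how aggressively small weights are zeroed."""
    t = frac * sum(abs(w) for w in weights) / len(weights)  # threshold
    q = [0.0 if abs(w) <= t else (1.0 if w > 0 else -1.0) for w in weights]
    kept = [abs(w) for w, v in zip(weights, q) if v != 0.0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return q, scale

w = [0.9, -0.05, 0.4, -1.2, 0.01, -0.3]
q, s = ternarize(w)
print(q)  # [1.0, 0.0, 1.0, -1.0, 0.0, 0.0]
```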
26. Hardware vs. software latency under load
(Chart: 99.9th-percentile query latency versus queries/sec, comparing 99.9% software latency and average software load against 99.9% FPGA latency and average FPGA query load)
31. The hardware acceleration plane
The traditional software (CPU) server plane runs services such as web search ranking
Interconnected FPGAs form a separate plane of computation that can be managed and used independently from the CPU
The acceleration plane serves web search ranking, deep neural networks, SDN offload, SQL, and more
(Diagram: in each server, the FPGA sits between the CPU and the 40 Gb/s ToR switch, connected via QSFP)
32. Flexibility: many services need a large number of FPGAs, while others underutilize theirs
Deploy exactly as many instances as needed
Many accelerators can handle the load of multiple software clients
Consolidate underutilized FPGA accelerators into fewer shared instances; this increases efficiency and makes room for more accelerators
Many services need to access multiple types of accelerators
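Consolidating underutilized accelerators into fewer shared instances is essentially a bin-packing problem; a first-fit-decreasing sketch (the utilization numbers below are made up for illustration):

```python
def consolidate(loads, capacity=1.0):
    """First-fit-decreasing: pack fractional FPGA loads into as few
    shared accelerator instances as possible. Returns the instance count."""
    instances = []  # remaining capacity of each shared FPGA instance
    for load in sorted(loads, reverse=True):
        for i, free in enumerate(instances):
            if load <= free:            # fits in an existing instance
                instances[i] -= load
                break
        else:                           # no fit: provision a new instance
            instances.append(capacity - load)
    return len(instances)

# Six services, each using only a fraction of one FPGA:
loads = [0.6, 0.5, 0.3, 0.2, 0.2, 0.1]
print(consolidate(loads))  # 2 shared FPGAs instead of 6 dedicated ones
```

First-fit-decreasing is a heuristic, not optimal in general, but here the total load is 1.9, so two instances is the best possible.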
33. From pretrained DNN model to DNN hardware microservice
(Diagram: the layers of a pretrained DNN model (L0, L1, …) are mapped across multiple FPGAs, each running a DNN engine with an instruction decoder and control logic driving neural functional units)
39. We look forward to eventually making this available to you: a major step toward democratizing AI with the power of FPGAs
Our technology will push the boundary of what is possible to deploy in the cloud:
Deeper convolutional neural networks for more accurate computer vision
Higher-dimensional recurrent neural networks toward human-like natural language processing
State-of-the-art translation and speech recognition
And much more…
This technology is already powering services within Microsoft