At Microsoft’s annual developers conference, Microsoft Azure CTO Mark Russinovich disclosed major advances in Microsoft’s hyperscale deployment of Intel field programmable gate arrays (FPGAs). These advances have resulted in the industry’s fastest public cloud network, and new technology for acceleration of Deep Neural Networks (DNNs) that replicate “thinking” in a manner that’s conceptually similar to that of the human brain.
Watch the video: http://wp.me/p3RLHQ-gNu
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Inside Microsoft's FPGA-Based Configurable Cloud
2. A faster, more efficient, more intelligent cloud
Data explosion: 4.4 ZB in 2013, growing to 44 ZB by 2020 (source: IDC, 2014)
ML, DNNs, and AI are driving requirements up even faster:
Autonomous decision making
Real-time insights into connected devices
Interactive user experiences
Cloud-scale services
Search and recommendations (indexing the Internet!)
The need for SCALE, LOW LATENCY, and THROUGHPUT
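The IDC projection above implies roughly a tenfold increase in seven years; a quick sanity check of the implied growth rate (this calculation is mine, not from the slides):

```python
# IDC projection: 4.4 ZB in 2013 growing to 44 ZB in 2020.
start, end, years = 4.4, 44.0, 2020 - 2013

# Compound annual growth rate: (end/start)^(1/years) - 1
cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # ~38.9% growth per year
```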
3. FPGAs
(Diagram: a flexibility-vs-efficiency spectrum running from CPUs and GPUs at the flexible end, through FPGAs, to ASICs at the efficient end)
EVALUATION: CPUs and FPGAs; ASICs under investigation
TRAINING: CPUs and GPUs; limited FPGAs; ASICs under investigation
8. WCS Gen4.1 Blade with NIC and Catapult FPGA
Catapult v2 Mezzanine card
10. Azure Virtual Network
Virtual network: "Bring your own network"; segment with subnets and network security groups; control traffic flow with user-defined routes
Backend connectivity: point-to-site for dev/test; VPN Gateways for secure site-to-site connectivity; ExpressRoute for private, enterprise-grade connectivity
Front-end access: dynamic/reserved public IP addresses; direct VM access, with ACLs for security; load balancing; DNS services (hosting, traffic management); DDoS protection
11. Management, control, and data planes
Management plane (Azure Resource Manager): create a tenant
Control plane (Controller): plumb tenant ACLs to switches
Data plane (switch, on the host): apply ACLs to flows
(Diagram contrasts a proprietary appliance with Azure's SDN, which splits the work across these planes)
Key to flexibility and scale is Host SDN
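The three-plane split above can be sketched as a toy pipeline; class and method names here are invented for illustration, and the real Azure components are of course far richer:

```python
class HostSwitch:
    """Data plane: apply installed ACLs to flows."""
    def __init__(self):
        self.acls = {}

    def install(self, tenant, acls):
        self.acls[tenant] = acls

    def allow(self, tenant, flow):
        # Default-deny: a flow is allowed only if an ACL permits it.
        return self.acls.get(tenant, {}).get(flow, False)

class ControlPlane:
    """Controller: plumb tenant ACLs down to the host switches."""
    def __init__(self, switches):
        self.switches = switches

    def plumb_acls(self, tenant, acls):
        for switch in self.switches:
            switch.install(tenant, acls)

class ManagementPlane:
    """Azure Resource Manager role: create tenants, hand policy downward."""
    def __init__(self, control):
        self.control = control

    def create_tenant(self, name, acls):
        self.control.plumb_acls(name, acls)

switch = HostSwitch()
mgmt = ManagementPlane(ControlPlane([switch]))
mgmt.create_tenant("contoso", {("10.1.1.2", "10.1.1.3"): True})
print(switch.allow("contoso", ("10.1.1.2", "10.1.1.3")))  # True
```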
12. VFP
Acts as a virtual switch inside the Hyper-V VMSwitch
Provides core SDN functionality for Azure networking services, including:
Address virtualization for VNET
VIP-to-DIP translation for SLB (NAT)
ACLs, metering, and security guards
Uses programmable rule/flow tables to perform per-packet actions
Available for private cloud in Microsoft Azure Stack
13. VMSwitch exposes a typed Match-Action Table API to the controller
Controllers define policy; one table per policy
Key insight: let the controller tell the switch exactly what to do with which packets (e.g. encap/decap), rather than trying to use existing abstractions (tunnels, …)
Tenant description: VNet description, VNet routing policy, ACLs, NAT endpoints

VNET routing table:
Flow           Action
TO: 10.2/16    Encap to GW
TO: 10.1.1.5   Encap to 10.5.1.7
TO: !10/8      NAT out of VNET

LB NAT table:
Flow           Action
TO: 79.3.1.2   DNAT to 10.1.1.2
TO: !10/8      SNAT to 79.3.1.2

ACLs table:
Flow           Action
TO: 10.1.1/24  Allow
TO: 10.4/16    Block
TO: !10/8      Allow
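The match-action tables above can be sketched as ordered rule lists, one table per policy, with each packet's destination checked against the rules in turn. This is an illustrative sketch only (the helper names are mine, and the slide's shorthand prefixes are written out in full CIDR form):

```python
import ipaddress

def match(rule, dst):
    """Match a destination IP against rules like 'TO: 10.2.0.0/16' or
    'TO: !10.0.0.0/8' (the slide abbreviates these as 10.2/16 and !10/8)."""
    prefix = rule.removeprefix("TO: ")
    negate = prefix.startswith("!")
    net = ipaddress.ip_network(prefix.lstrip("!"), strict=False)
    return (ipaddress.ip_address(dst) in net) != negate

def lookup(table, dst):
    """Return the action of the first matching rule in one table."""
    for rule, action in table:
        if match(rule, dst):
            return action
    return None

# One table per policy; this is the VNET routing table from the slide.
vnet_routing = [("TO: 10.2.0.0/16", "Encap to GW"),
                ("TO: 10.1.1.5",    "Encap to 10.5.1.7"),
                ("TO: !10.0.0.0/8", "NAT out of VNET")]

print(lookup(vnet_routing, "10.2.3.4"))  # Encap to GW
print(lookup(vnet_routing, "8.8.8.8"))   # NAT out of VNET
```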
14. Hosts are scaling up: 1G → 10G → 40G → 50G → 100G
Reduces COGS of VMs (more VMs per host) and enables new workloads
Need the performance of hardware to implement policy without burning CPU
Need to support new scenarios: BYO IP, BYO topology, BYO appliance
We are always pushing richer semantics to virtual networks
Need the programmability of software to be agile and future-proof
"How do we get the performance of hardware with the programmability of software?"
15. Use an FPGA for reconfigurable functions
FPGAs are already used in Bing (Catapult)
Roll out hardware the way we roll out software
Programmed using Generic Flow Tables (GFT): a language for programming SDN to hardware, using connections and structured actions as primitives
Deployed on all new Azure compute servers since late 2015
SmartNIC can also do crypto, QoS, storage acceleration, and more…
(Diagram: host with SmartNIC, combining an FPGA and a NIC ASIC, connected to the ToR switch)
16. GFT offload on the SmartNIC
(Diagram: a VM sits above the VMSwitch; VFP exposes a northbound API to the SDN controllers and a southbound GFT offload API (NDIS) down to the 50G SmartNIC, which also hosts crypto, QoS, and RDMA engines)
The first packet of a flow traverses the full set of VFP rule tables in software: SLB Decap (Decap), SLB NAT (DNAT), VNET (Rewrite/Encap), ACL (Allow), Metering (Meter)
The GFT transposition engine composes the matched actions into a single flow-table entry, for example:
Flow                          Action
1.2.3.1->1.3.4.1, 62362->80   Decap, DNAT, Rewrite, Meter
That entry is pushed to the SmartNIC's GFT offload engine, which applies it to every subsequent packet of the flow
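The first-packet behavior described on this slide, where software walks every layered table once and then caches the composed actions as one flow entry for hardware to apply, can be sketched roughly as follows. This is a toy model under my own assumptions; GFT's real encoding and offload path are far more involved:

```python
# Layered policy tables in the order the slide shows; each layer
# contributes one action to the flow. The ACL's Allow permits the
# packet but adds no transformation, so it is not in the action list.
layers = [("SLB Decap", "Decap"),
          ("SLB NAT",   "DNAT"),
          ("VNET",      "Rewrite"),
          ("ACL",       "Allow"),
          ("Metering",  "Meter")]

flow_cache = {}  # compiled GFT table: flow key -> composed action list

def process(packet_key):
    """First packet compiles the flow; later packets hit the cached entry."""
    if packet_key in flow_cache:
        return flow_cache[packet_key], "fast path (offloaded)"
    # Slow path: evaluate every layer once and compose the actions.
    actions = [action for _layer, action in layers if action != "Allow"]
    flow_cache[packet_key] = actions
    return actions, "slow path (VFP software)"

key = ("1.2.3.1->1.3.4.1", "62362->80")
print(process(key))  # slow path: compiles ['Decap', 'DNAT', 'Rewrite', 'Meter']
print(process(key))  # fast path: hits the cached flow entry
```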
17. SDN/networking policy is applied in software in the host; FPGA acceleration is used to apply all policies
(Diagram: two physical servers, each running VMs over a virtual switch, joined through physical switches into one virtual network)
18. The fastest cloud network
Highest-bandwidth VMs of any cloud: DS15v2 and D15v2 VMs get 25 Gbps
Consistent, low-latency network performance:
Provides SR-IOV to the VM
Up to 10x latency improvement
Increased packets per second (PPS)
Reduced jitter means more consistency in workloads
Enables workloads requiring native performance to run in cloud VMs: >2x improvement for many DB and OLTP applications
23. Deep neural networks (DNNs) have led to breakthroughs in major AI problems:
Computer vision
Language translation
Speech recognition
And more…
But DNNs are challenging to serve in online services:
Latency-, cost-, and power-constrained
Size and complexity of DNNs are outpacing the growth of CPUs
25. Microsoft has the world's largest cloud investment in FPGAs
Multiple exa-ops of aggregate AI capacity
We have built a powerful DNN serving platform on our FPGA fabric
Flexibility: FPGAs are ideal for adapting to rapidly evolving ML (CNNs, LSTMs, MLPs, reinforcement learning, feature extraction, decision trees, etc.), with inference-optimized numerical precision (custom binarized, ternarized, tiny-precision nets) and sparsity and deep compression for larger, faster models
Performance: tens to hundreds of TOPS of effective inference throughput at low batch sizes; ultra-low-latency serving on modern DNNs, >10x better than CPUs and GPUs
Scale: scale out to many FPGAs in a single DNN service
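To make "ternarized" concrete: weights are mapped to just {-1, 0, +1} plus a scale factor, so multiplications collapse to additions and sign flips that are cheap in FPGA logic. A minimal sketch using a magnitude-threshold scheme, which is one common choice and not necessarily the one Microsoft's nets use:

```python
def ternarize(weights, frac=0.7):
    """Map weights to {-1, 0, +1} using a magnitude threshold, plus one
    per-tensor scale factor so the quantized tensor approximates the
    original. `frac` controls how aggressively small weights are zeroed."""
    t = frac * sum(abs(w) for w in weights) / len(weights)  # threshold
    q = [0.0 if abs(w) <= t else (1.0 if w > 0 else -1.0) for w in weights]
    kept = [abs(w) for w, v in zip(weights, q) if v != 0.0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return q, scale

w = [0.9, -0.05, 0.4, -1.2, 0.01, -0.3]
q, s = ternarize(w)
print(q)  # [1.0, 0.0, 1.0, -1.0, 0.0, 0.0]
```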
26. Hardware vs. software latency under load
(Chart: 99.9th-percentile query latency versus queries/sec, comparing 99.9% software latency and average software load against 99.9% FPGA latency and average FPGA query load)
31. The hardware acceleration plane
The traditional software (CPU) server plane runs services such as web search ranking
Interconnected FPGAs form a separate plane of computation that can be managed and used independently from the CPU
The acceleration plane serves web search ranking, deep neural networks, SDN offload, SQL, and more
(Diagram: in each server, the FPGA sits between the CPU and the 40 Gb/s ToR switch, connected via QSFP)
32. Flexibility: many services need a large number of FPGAs, while others underutilize theirs
Deploy exactly as many instances as needed
Many accelerators can handle the load of multiple software clients
Consolidate underutilized FPGA accelerators into fewer shared instances; this increases efficiency and makes room for more accelerators
Many services need to access multiple types of accelerators
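Consolidating underutilized accelerators into fewer shared instances is essentially a bin-packing problem; a first-fit-decreasing sketch (the utilization numbers below are made up for illustration):

```python
def consolidate(loads, capacity=1.0):
    """First-fit-decreasing: pack fractional FPGA loads into as few
    shared accelerator instances as possible. Returns the instance count."""
    instances = []  # remaining capacity of each shared FPGA instance
    for load in sorted(loads, reverse=True):
        for i, free in enumerate(instances):
            if load <= free:            # fits in an existing instance
                instances[i] -= load
                break
        else:                           # no fit: provision a new instance
            instances.append(capacity - load)
    return len(instances)

# Six services, each using only a fraction of one FPGA:
loads = [0.6, 0.5, 0.3, 0.2, 0.2, 0.1]
print(consolidate(loads))  # 2 shared FPGAs instead of 6 dedicated ones
```

First-fit-decreasing is a heuristic, not optimal in general, but here the total load is 1.9, so two instances is the best possible.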
33. From pretrained DNN model to DNN hardware microservice
(Diagram: the layers of a pretrained DNN model (L0, L1, …) are mapped across multiple FPGAs, each running a DNN engine with an instruction decoder and control logic driving neural functional units)
39. We look forward to eventually making this available to you: a major step toward democratizing AI with the power of FPGAs
Our technology will push the boundary of what is possible to deploy in the cloud:
Deeper convolutional neural networks for more accurate computer vision
Higher-dimensional recurrent neural networks toward human-like natural language processing
State-of-the-art translation and speech recognition
And much more…
This technology is already powering services within Microsoft