8. Scale Your Innovation
How Intel® FPGAs Enable Deep Learning
[Diagram: FPGA die with I/O blocks surrounding the reconfigurable fabric]
▪ Millions of reconfigurable logic elements & routing fabric
▪ Thousands of 20Kb memory blocks & MLABs
▪ Thousands of variable-precision digital signal processing (DSP) blocks
▪ Hundreds of configurable I/O & high-speed transceivers
▪ Programmable datapath
▪ Customized memory structure
▪ Configurable compute
9. Adapting to Innovation
Many techniques aim to improve inference efficiency:
▪ Batching
▪ Reduced bit width
▪ Sparse weights
▪ Sparse activations
▪ Weight sharing
▪ Compact networks
Representative work: Sparse CNN [CVPR'15], Spatially Sparse CNN [CIFAR-10 winner '14], Pruning [NIPS'15], TernaryConnect [ICLR'16], BinaryConnect [NIPS'15], Deep Compression [ICLR'16], HashedNets [ICML'15], XNOR-Net, SqueezeNet
[Diagram: sparse matrix multiply I × W = O with shared weights]
Network evolution: LeNet [IEEE], AlexNet [ILSVRC'12], VGG [ILSVRC'14], GoogLeNet [ILSVRC'14], ResNet [ILSVRC'15]
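Two of the efficiency techniques listed above, reduced bit width and weight sharing, can be sketched in a few lines. This is a minimal illustration with made-up weight values, not the implementation used by any of the cited papers:

```python
import numpy as np

def quantize_int8(w):
    """Reduced bit width: symmetric linear quantization of floats to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def share_weights(w, n_clusters=4, n_iter=10):
    """Weight sharing: tiny k-means that replaces each weight with its
    cluster centroid, so only n_clusters distinct values remain."""
    centroids = np.linspace(w.min(), w.max(), n_clusters)
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid.
        idx = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of its assigned weights.
        for k in range(n_clusters):
            if np.any(idx == k):
                centroids[k] = w[idx == k].mean()
    return centroids[idx]

w = np.array([0.10, 0.12, -0.50, 0.49, -0.48, 0.11])
q, scale = quantize_int8(w)
dequantized = q.astype(np.float32) * scale   # error bounded by scale/2
shared = share_weights(w, n_clusters=2)      # at most 2 distinct values
```

Both tricks shrink storage and bandwidth needs, which is exactly what the FPGA's customizable memory structure and variable-precision DSP blocks can exploit.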
10. Performance Improvement over Time

Model       Sept-17 (baseline)  Dec-17  Feb-18  Apr-18  Jun-18  Oct-18  Dec-18 (projected)
SqueezeNet  1x                  1.13x   1.75x   2.61x   3.89x   4.33x   4.51x
GoogLeNet   1x                  1.13x   1.22x   1.46x   3.55x   4.11x   4.50x

▪ Continually adapting the custom data flow, memory hierarchy, and compute enables improved performance within the same power footprint.

[Chart: SqueezeNet and GoogLeNet performance (img/s) over time, batch = 1, Jun-17 through Feb-19]
11. [Image-only slide]
12. Intel® FPGA Deep Learning Acceleration Suite

Pre-compiled graph architecture:
[Diagram: DDR banks, configuration engine, memory reader/writer, crossbar, convolution PE array, feature map cache, primitives (Conv, custom*)]

Example topologies: AlexNet, GoogLeNet, Tiny YOLO, SqueezeNet, VGG16, ResNet-18, ResNet-50, ResNet-101, MobileNet, ResNet SSD, SqueezeNet SSD

*Deeper customization options COMING SOON! More topologies added with every release.
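The diagram's convolution PE array computes output feature maps in parallel PEs backed by a feature map cache. A software analogy of that dataflow, with hypothetical names and no claim of matching the actual DLA suite hardware, might look like:

```python
import numpy as np

def conv2d_pe_array(fmap, weights, n_pes=4):
    """Hypothetical sketch: a valid (no-padding) 2-D convolution whose
    output channels are distributed round-robin across parallel
    processing elements (PEs), as in the PE-array diagram.

    fmap:    (H, W) input feature map
    weights: (C_out, kH, kW) one filter per output channel
    """
    c_out, kh, kw = weights.shape
    h_out = fmap.shape[0] - kh + 1
    w_out = fmap.shape[1] - kw + 1
    out = np.zeros((c_out, h_out, w_out))
    for pe in range(n_pes):
        # Each PE owns every n_pes-th output channel.
        for c in range(pe, c_out, n_pes):
            for i in range(h_out):
                for j in range(w_out):
                    # The kh x kw window plays the role of the cached
                    # feature-map tile each PE reads repeatedly.
                    out[c, i, j] = np.sum(fmap[i:i+kh, j:j+kw] * weights[c])
    return out

fmap = np.arange(9.0).reshape(3, 3)
weights = np.ones((2, 2, 2))
result = conv2d_pe_array(fmap, weights, n_pes=2)  # shape (2, 2, 2)
```

In hardware the PE loop runs concurrently and the feature map cache keeps reused windows on-chip; the software loop only illustrates how work is partitioned.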
13. OpenVINO™ Toolkit for Intel FPGAs

An all-in-one solution to easily harness the benefits of FPGAs
▪ Enables developers and data scientists to take their prototype application to production
▪ Utilize API-based & direct coding to maximize performance
▪ Deeper customization capabilities coming soon
[Diagram: OpenVINO™ Toolkit stack: Intel Deep Learning Deployment Toolkit (Model Optimizer, Inference Engine) plus the Intel FPGA DL Acceleration Suite, fed by today's Intel FPGA supported deep learning frameworks, with heterogeneous CPU/FPGA deployment across Intel Xeon® processors and Intel FPGAs]

Free download: software.intel.com/openvino-toolkit
14. Your Application Acceleration with FPGA-Powered Platforms

Software tools: OpenVINO™ toolkit (develop a NN model; deploy across Intel® CPU, GPU, VPU, and FPGA; leverage common algorithms)

Supported platforms for FPGA (interface: PCIe x8):
▪ Mustang-F100
▪ Intel Programmable Acceleration Card with Intel Arria® 10
▪ Intel® Arria® 10 Development Kit

*Please contact an Intel representative for the complete list of ODM manufacturers. Other names and brands may be claimed as the property of others.
15. Use Case 1: Search

Solution search: looking for a quick path to deploy and accelerate instant reverse-image searches of products for retail convenience.

Solution success: Intel® FPGAs offered real-time AI inferencing using the OpenVINO™ toolkit. This enabled engineers to map neural networks to the FPGA, accelerating image searches with increased throughput and lower latency, all without the need for FPGA programming experience.

Real-time AI optimized for performance, power, and cost:
▪ OpenVINO™ Toolkit: accelerating workloads and enabling deep learning capabilities for smarter, faster ways to transform data for a competitive edge
▪ Intel Programmable Acceleration Card with Intel Arria® 10 FPGA: a deployment-ready PCIe-based card with versatile built-in multifunction acceleration capabilities, low power dissipation, and a low-profile form factor
▪ Acceleration Stack for Intel® Xeon® CPU with FPGAs: abstracting programming complexity and maximizing ease of use by hot-swapping accelerators and enabling application portability for Intel FPGA based acceleration solutions
16. Use Case 2: Microsoft's AI for Earth

Microsoft leverages the multimode capabilities of Intel FPGAs to push through the memory wall and maximize performance.

Project Brainwave with Intel® Stratix® 10 delivers performance per dollar:
▪ 200M images, 20 TB of data
▪ Land cover mapping for the whole US
▪ 10+ minutes, for only $42 of compute*

*Source: Microsoft's blog
17. Summary

Intel FPGAs enable:
▪ Delivering AI+ for flexible system-level functionality
▪ First to market in accelerating evolving AI workloads

▪ The OpenVINO™ Toolkit is free to download and enables you to deploy on Intel FPGAs directly from TensorFlow or Caffe
▪ Intel's FPGA architecture enables a programmable datapath, custom memory structure, and configurable compute
18. Resources

Intel FPGA Training:
https://www.intel.com/content/www/us/en/programmable/support/training/overview.html

Get started quickly with:
▪ Find out more online at www.intel.com/ai and www.intel.com/fpga
▪ Intel Tech.Decoded online webinars, tool how-tos & quick tips
▪ Hands-on in-person events

Support:
▪ Connect with Intel engineers & AI experts via the public Community Forum

Download:
▪ Free OpenVINO™ toolkit