Copyright © 2015 Auviz Systems 1
Trade-offs in Implementing Deep Neural Networks on FPGAs
Nagesh Gupta
12 May 2015
Auviz Systems
• Startup that specializes in implementing & optimizing algorithms on FPGAs
• Offers libraries of different classes of algorithms
  • AuvizCV: optimized OpenCV algorithms
  • AuvizLA: optimized BLAS
  • AuvizDNN: optimized deep neural networks
• Also develops custom algorithms in computer vision, linear algebra, deep learning & machine learning
• Libraries are available as OpenCL function calls, which abstract the complexity of using an FPGA away from software users
• Visit our booth & see AlexNet running on Xilinx FPGA!
The Time for Artificial Intelligence & Machine Learning
• Sources: Cisco/Statista, Facebook research, IT Business Edge
Machine Learning Moving to the Data Center
• Performance/watt
• Programming model & use model
• Microsoft Azure ML: provides machine learning as a service on the cloud
• IBM Watson at Jeopardy: one of the best demonstrations of machine learning
• Amazon AWS ML & Google Predictive Analytics: other machine learning services on the cloud
Convolutional Neural Networks (CNNs)
• A form of deep neural network, used for various "recognition" tasks
• AlexNet [2] is a CNN configuration that was used to classify 1.2 million images
Components of AlexNet: Convolution layers
• A convolution layer has multiple stages
  • 3D convolutions
  • Activation: using the ReLU function, max(x, 0)
  • Max pooling: sub-sampling function that selects the max value within a neighborhood
[Figure: 3D convolutions, activation (ReLU), sub-sampling (max pooling)]
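The activation and sub-sampling stages can be sketched in a few lines of plain Python. This is only an illustration: the 2x2 window with stride 2 and the toy 4x4 feature map are assumptions chosen for readability, not the layer parameters used in the talk.

```python
def relu(feature_map):
    """Apply max(x, 0) to every element of a 2D feature map (list of rows)."""
    return [[max(x, 0.0) for x in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Sub-sample a 2D feature map by taking the max over each size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Toy 4x4 feature map (illustrative values only).
fmap = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.0, 0.0],
    [0.0, 1.5, -1.0, -0.5],
    [-2.0, 3.0, 2.5, 1.0],
]
activated = relu(fmap)
pooled = max_pool(activated)
print(pooled)
```

Both stages are element-wise or window-wise, so they need far fewer operations than the 3D convolutions that precede them.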
Dense Layers in AlexNet
• Dense layers are fully connected: each output node is a function of all the input nodes
• The first 2 dense layers can be represented as a matrix-vector multiplication operation
  • Layer 6 has 9216 inputs, which are multiplied with a weight matrix to create 4096 outputs
  • Layer 7 has 4096 inputs, which are multiplied with a different weight matrix to create 4096 outputs
• The output layer uses SoftMax to classify the input image into one of 1000 classes
[Figure: Layer 6, Layer 7, output layer]
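A dense layer as a matrix-vector product, followed by SoftMax, might look like the sketch below. The sizes are toy values (3 inputs, 2 outputs) rather than 9216 x 4096, and bias terms and the activation between dense layers are left out as a simplifying assumption.

```python
import math


def dense(weights, inputs):
    """Matrix-vector product: each output is a weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sizes: 3 inputs -> 2 outputs (illustrative weights only).
weights = [
    [0.5, -1.0, 2.0],
    [1.0, 0.0, -0.5],
]
inputs = [1.0, 2.0, 3.0]
scores = dense(weights, inputs)
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(scores, predicted_class)
```

Every weight is used exactly once per input vector, which is why the dense layers are dominated by data movement rather than arithmetic.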
Implementing 3D Convolutions
• Sequential implementation
  • Implementation follows the convolution equations
  • Resource utilization will be very low, but the latency at 200 MHz will be 22 s for the 2nd layer
• High-level synthesis (HLS) can be used to implement this, as shown in [3]
• Better performance can be obtained by parallelizing the implementation
[Figure: input feature maps, weight matrices, output feature maps]
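A sequential implementation that follows the convolution equations is essentially a six-deep loop nest performing one multiply-accumulate at a time. The sketch below is a plain Python illustration under assumed conventions (no padding, no kernel flip, nested lists as feature maps); it is not the HLS code from [3].

```python
def conv3d_sequential(inputs, weights, stride=1):
    """inputs: N maps of R x C; weights: M filters of N x K x K.

    Returns M output maps, computing one multiply-accumulate at a time.
    """
    n_in = len(inputs)
    rows, cols = len(inputs[0]), len(inputs[0][0])
    k = len(weights[0][0])
    out_rows = (rows - k) // stride + 1
    out_cols = (cols - k) // stride + 1
    outputs = []
    for m in range(len(weights)):            # output feature maps
        out_map = [[0.0] * out_cols for _ in range(out_rows)]
        for r in range(out_rows):            # output rows
            for c in range(out_cols):        # output columns
                acc = 0.0
                for n in range(n_in):        # input feature maps
                    for i in range(k):       # filter rows
                        for j in range(k):   # filter columns
                            acc += (weights[m][n][i][j]
                                    * inputs[n][r * stride + i][c * stride + j])
                out_map[r][c] = acc
        outputs.append(out_map)
    return outputs


# Two 3x3 input maps and one 2 x 2 x 2 filter with all weights equal to 1.
inputs = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
]
weights = [[[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]]
result = conv3d_sequential(inputs, weights)
print(result)
```

The three innermost loops are the ones a parallel design would unroll into hardware multipliers and adders.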
Computations vs. Data Transfers in AlexNet
[Chart: computations and data transfers in AlexNet, logarithmic axis from 1.E+00 to 1.E+09]
• Computation latency, 2nd convolution layer
  • With 512 single precision floating point operations in parallel, the 2nd convolution layer takes 2.2 ms to complete at 200 MHz
• Data transfer latency, 2nd convolution layer
  • With 64-bit DDR at 1.3 Gb/s, the fetch latency for single precision floating point data is around 0.5 ms
3D convolutions require more computations, while the data transfers are higher for the dense layers
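Latency estimates of this kind reduce to two divisions. The sketch below assumes that a fixed number of operations completes every clock cycle and that the memory link sustains a constant bandwidth. The operation and data counts are the 2nd convolution layer's entries from the backup table, while the 8e9 bytes/s bandwidth is a placeholder. The slide's own latency figures depend on counting conventions that are not spelled out, so these results need not match them.

```python
def compute_latency_s(operations, ops_per_cycle, clock_hz):
    """Seconds to finish `operations` when `ops_per_cycle` complete every clock."""
    return operations / (ops_per_cycle * clock_hz)


def transfer_latency_s(num_values, bytes_per_value, bandwidth_bytes_per_s):
    """Seconds to move `num_values` values over a link with the given bandwidth."""
    return num_values * bytes_per_value / bandwidth_bytes_per_s


ops = 448e6      # operations, 2nd convolution layer (backup table)
clock = 200e6    # 200 MHz
sequential = compute_latency_s(ops, 1, clock)     # one operation per cycle
parallel = compute_latency_s(ops, 512, clock)     # 512 operations per cycle
speedup = sequential / parallel

# 728e3 values of 4 bytes each; the bandwidth is a placeholder assumption.
fetch = transfer_latency_s(728e3, 4, 8e9)
print(sequential, parallel, fetch)
```

Comparing the two results for a layer shows whether it is limited by arithmetic or by memory traffic.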
3D Convolution: Parallel Implementation
[Figure: 11x11 weight matrix x 11x11 input feature map = 1 output value]
• An 11x11 weight matrix with 3 input feature maps requires 121*3 multiplications and 121*3 additions
• With 363 multiply units and 363 adders, this can be done in 1 cycle
• The FPGA resources required for each single precision floating point operation are 2-5 DSP blocks and 200-400 LUTs
• Implementing this in parallel will require ~1200 DSPs and ~75000 LUTs
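The resource figures can be bracketed with a quick calculation. The sketch below assumes that each multiply-add pair counts as one floating point unit (the slide does not say how units are counted) and simply scales the per-operation ranges quoted above.

```python
def resource_range(num_units, per_unit_low, per_unit_high):
    """Return the (low, high) total for a resource given a per-unit range."""
    return num_units * per_unit_low, num_units * per_unit_high


kernel_size = 11
input_maps = 3
units = kernel_size * kernel_size * input_maps   # 363 multiply-add units

dsp_low, dsp_high = resource_range(units, 2, 5)       # 2-5 DSP blocks each
lut_low, lut_high = resource_range(units, 200, 400)   # 200-400 LUTs each
print(units, (dsp_low, dsp_high), (lut_low, lut_high))
```

Under that counting assumption, the quoted ~1200 DSPs and ~75000 LUTs fall inside the computed ranges.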
Increasing Throughput With Pipelining
• Pipelining is a hardware concept to achieve higher throughput
  • Helpful with complex multi-cycle operations; works by registering intermediate results
• Pipeline 3D convolutions on one dimension & parallelize the other
  • For example, convolve the weight matrix with an input feature map in parallel, and pipeline for different feature maps
• Zhang, et al. [3] convolve a set of input feature maps with a set of weight matrices in parallel and pipeline for the size of the input feature map
[Figure: input feature maps (N x R x C), M weight filters of size N x K x K, and output feature maps (M x R' x C'), with tile sizes Tn, Tm, Tr and Tc]
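The throughput benefit of pipelining can be seen from a simple cycle count. The sketch below assumes every pipeline stage takes one clock cycle and that a new item can enter every `initiation_interval` cycles. The depth of 8 stages and the 96 items are hypothetical values chosen for illustration.

```python
def cycles_unpipelined(num_items, stage_count):
    """Each item must pass through every stage before the next item starts."""
    return num_items * stage_count


def cycles_pipelined(num_items, stage_count, initiation_interval=1):
    """The first result appears after `stage_count` cycles; later results
    follow every `initiation_interval` cycles."""
    if num_items == 0:
        return 0
    return stage_count + (num_items - 1) * initiation_interval


items = 96    # e.g. feature maps streamed through the pipeline (hypothetical)
stages = 8    # hypothetical pipeline depth
slow = cycles_unpipelined(items, stages)
fast = cycles_pipelined(items, stages)
print(slow, fast)
```

For long streams the pipelined count approaches one result per cycle, regardless of how many cycles a single operation takes.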
Mapping 3D Convolutions into Matrix Multiplications
• A simple way is to flatten the feature maps and create an array of feature maps; below is an illustration for the first layer of AlexNet
• The weight matrices are flattened, and the input feature maps are rearranged so that each column has the neighborhood required for convolutions
[Figure: Y = W x X, where
  Y, matrix of output feature maps, is 96 x 3025 (55 x 55 = 3025)
  W, matrix of weight coefficients, is 96 x 363 (3 x 11 x 11 = 363)
  X, matrix of input feature maps, is 363 x 3025]
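The rearrangement described here is often called im2col. The sketch below is a toy pure-Python version under assumed conventions (no padding, nested lists, neighborhoods ordered by input map, then row, then column). It uses two 3x3 input maps and one 2x2 filter per map instead of the AlexNet sizes.

```python
def im2col(inputs, k, stride=1):
    """Rearrange N input maps into a matrix with N*k*k rows and one column per
    output position, so each column holds one convolution neighborhood."""
    rows, cols = len(inputs[0]), len(inputs[0][0])
    out_rows = (rows - k) // stride + 1
    out_cols = (cols - k) // stride + 1
    columns = []
    for r in range(out_rows):
        for c in range(out_cols):
            columns.append([
                fmap[r * stride + i][c * stride + j]
                for fmap in inputs
                for i in range(k)
                for j in range(k)
            ])
    # Transpose so that neighborhoods become columns of X.
    return [list(row) for row in zip(*columns)]


def flatten_weights(weights):
    """Turn M filters of N x k x k into an M x (N*k*k) matrix W."""
    return [[w for fmap in filt for row in fmap for w in row] for filt in weights]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


inputs = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]],
]
weights = [[[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]]

X = im2col(inputs, k=2)         # 8 rows (2*2*2), 4 columns (2*2 outputs)
W = flatten_weights(weights)    # 1 row, 8 columns
Y = matmul(W, X)                # 1 row, 4 columns: the flattened output map
print(Y)
```

The cost of this mapping is that overlapping neighborhoods are duplicated in X, trading extra memory for a regular matrix multiplication.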
Implementation Challenges
• A larger number of compute units exhausts the FPGA resources
  • Each compute unit takes a few hundred LUTs and 3-5 DSPs
• Data must be organized so that the compute units perform at their maximum
  • Need to read a lot of data in parallel
  • Data has to be stored on-chip to enable parallel access
• Routing turns out to be a bigger challenge
  • Proper data organization, architecture & tools are the way to overcome it
[Chart: bits required per cycle (0 to 80000) vs. parallelism (256, 512, 768); series: bits per operation]
Using Alternate Data Representations
• Single precision floating point
  • Uses 32 bits to represent each data value
  • Requires more DSPs (3-5) to implement multiply/accumulate
• Fixed point
  • 16-bit fixed point representation would suffice for many applications [4]
  • Stochastic rounding techniques perform similarly to single precision floating point representation [5]
• Half precision
  • Uses 16 bits to represent data
  • Significant reduction in routing & overall FPGA resources
• Mixed representation
  • Use fixed point or half precision representation for some layers and single precision representation for others
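To make the fixed point option concrete, the sketch below quantizes a value to a signed 16-bit code with either round-to-nearest or stochastic rounding. The 8 fractional bits, the saturation behaviour and the function names are assumptions for illustration; they are not taken from [4] or [5].

```python
import math
import random


def to_fixed(value, frac_bits=8, total_bits=16, stochastic=False, rng=random):
    """Quantize a float to a signed fixed-point integer code.

    With stochastic=True the value is rounded up with probability equal to
    its fractional remainder, so the rounding error is zero on average.
    """
    scaled = value * (1 << frac_bits)
    floor = math.floor(scaled)
    if stochastic:
        code = floor + (1 if rng.random() < scaled - floor else 0)
    else:
        code = math.floor(scaled + 0.5)   # round to nearest
    # Saturate to the representable range of a signed total_bits integer.
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, code))


def from_fixed(code, frac_bits=8):
    """Convert a fixed-point integer code back to a float."""
    return code / (1 << frac_bits)


rng = random.Random(0)   # seeded so the run is repeatable
x = 0.3
nearest = from_fixed(to_fixed(x))
samples = [from_fixed(to_fixed(x, stochastic=True, rng=rng)) for _ in range(10000)]
mean_stochastic = sum(samples) / len(samples)
print(nearest, mean_stochastic)
```

Round-to-nearest always lands on the same neighbouring code, whereas the stochastic variant averages out to the original value over many roundings.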
A complete CNN on the FPGA using OpenCL
• OpenCL tools enable software programmers to use the FPGA accelerator without learning hardware methodologies
• Programmer calls OpenCL functions to accelerate on the FPGA
[Figure: configure & setup, 3D convolutions, dense layers, Softmax]
Performance of AlexNet on FPGAs
FPGAs can achieve an impressive 14 images/sec/Watt, compared to high-end GPUs such as the Tesla K40, which reach 4 images/sec/Watt
Summary
• 3D convolutions are a key part of a CNN, and are compute intensive
• In FPGAs, 3D convolutions can be implemented efficiently with a parallel & pipelined implementation
• FPGA resources (gates & routing) will be the critical factors in achieving a highly parallel implementation
• OpenCL implementation tools, such as Xilinx SDAccel, simplify the implementation task and provide a software flow
• Alternate data representations can be used to reduce the complexity
• Mixed data representations can simplify the computations without compromising on performance
• FPGAs are capable of delivering high performance at a suitable power profile for the data center
References
• [1] K. Ovtcharov, et al., "Accelerating Deep Convolutional Neural Networks Using Specialized Hardware," Microsoft Research, 2015.
• [2] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 2012.
• [3] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao and J. Cong, "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks," FPGA'2015, 2015.
• [4] C. Farabet, Y. LeCun, K. Kavukcuoglu, E. Culurciello, B. Martini, P. Akselrod and S. Talay, "Large-scale FPGA-based convolutional networks," in Machine Learning on Very Large Data Sets, 2011.
• [5] S. Gupta, A. Agrawal, K. Gopalakrishnan and P. Narayanan, "Deep Learning with Limited Numerical Precision," arXiv preprint arXiv:1502.02551, 2015.
Nagesh Gupta
12 May 2015
Deep Neural Networks in FPGAs
Computations vs. Data Transfers
Convolution layers
Input size   Input feature maps   Output feature maps   Filter size   Computations   Total data transfer
224 x 224    3                    96                    11x11         110 * 10^6     255 * 10^3
27 x 27      96                   256                   5x5           448 * 10^6     728 * 10^3
13 x 13      256                  384                   3x3           150 * 10^6     993 * 10^3
13 x 13      384                  384                   3x3           224 * 10^6     1457 * 10^3
13 x 13      384                  256                   3x3           150 * 10^6     959 * 10^3
Dense layers
Input data   Weight matrix   Computations   Data transfers
9216         9216 x 4096     38 * 10^6      38 * 10^6
4096         4096 x 4096     16 * 10^6      16 * 10^6
4096         4096 x 1000     4 * 10^6       4 * 10^6
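The computation counts can be roughly reproduced with a short calculation. The sketch below assumes that the "Computations" column counts multiply-accumulates and that each output map has the same size as the listed input size. Under those assumptions the 2nd and 3rd convolution layers and the first dense layer come out close to the tabulated values.

```python
def conv_macs(out_rows, out_cols, in_maps, out_maps, k):
    """Multiply-accumulates for a convolution layer with k x k filters."""
    return out_rows * out_cols * out_maps * in_maps * k * k


def dense_macs(num_inputs, num_outputs):
    """Multiply-accumulates for a fully connected layer."""
    return num_inputs * num_outputs


# 2nd convolution layer: 27 x 27 maps, 96 inputs, 256 outputs, 5 x 5 filters.
conv2 = conv_macs(27, 27, 96, 256, 5)
# 3rd convolution layer: 13 x 13 maps, 256 inputs, 384 outputs, 3 x 3 filters.
conv3 = conv_macs(13, 13, 256, 384, 3)
# First dense layer: 9216 inputs, 4096 outputs.
dense6 = dense_macs(9216, 4096)
print(conv2, conv3, dense6)
```

The dense layer needs one weight per multiply-accumulate, which is why its computation and data transfer columns are equal.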

Contenu connexe

Tendances

Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationDevansh16
 
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen..."Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...Edge AI and Vision Alliance
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesAnirudh Koul
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...Edge AI and Vision Alliance
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep LearningBrahim HAMADICHAREF
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Dmytro Mishkin
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntel Nervana
 
Deep learning on mobile
Deep learning on mobileDeep learning on mobile
Deep learning on mobileAnirudh Koul
 
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P..."Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...Edge AI and Vision Alliance
 
Devil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesDevil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesKen Chatfield
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural NetworksPyData
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetAmazon Web Services
 
Serving BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeServing BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeNidhin Pattaniyil
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentNVIDIA Taiwan
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 

Tendances (20)

CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen..."Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
"Using the OpenCL C Kernel Language for Embedded Vision Processors," a Presen...
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...Convolutional neural networks for image classification — evidence from Kaggle...
Convolutional neural networks for image classification — evidence from Kaggle...
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
On-Device AI
On-Device AIOn-Device AI
On-Device AI
 
Deep learning on mobile
Deep learning on mobileDeep learning on mobile
Deep learning on mobile
 
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P..."Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a P...
 
Devil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet FeaturesDevil in the Details: Analysing the Performance of ConvNet Features
Devil in the Details: Analysing the Performance of ConvNet Features
 
Transfer Learning and Fine-tuning Deep Neural Networks
 Transfer Learning and Fine-tuning Deep Neural Networks Transfer Learning and Fine-tuning Deep Neural Networks
Transfer Learning and Fine-tuning Deep Neural Networks
 
Android and Deep Learning
Android and Deep LearningAndroid and Deep Learning
Android and Deep Learning
 
Scalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNetScalable Deep Learning Using Apache MXNet
Scalable Deep Learning Using Apache MXNet
 
Serving BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeServing BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServe
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken Content
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 

En vedette

"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from IntelEdge AI and Vision Alliance
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerLuminary Labs
 
Introduction to CNN with Application to Object Recognition
Introduction to CNN with Application to Object RecognitionIntroduction to CNN with Application to Object Recognition
Introduction to CNN with Application to Object RecognitionArtifacia
 
1 구글의탄생
1 구글의탄생1 구글의탄생
1 구글의탄생Yongjin Yim
 
IoT: Autonomous and Smart- Paul Guermonprez
IoT: Autonomous and Smart- Paul GuermonprezIoT: Autonomous and Smart- Paul Guermonprez
IoT: Autonomous and Smart- Paul GuermonprezWithTheBest
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...GeeksLab Odessa
 
"A Practitioner’s Guide to Commercializing Applications of Computer Vision," ...
"A Practitioner’s Guide to Commercializing Applications of Computer Vision," ..."A Practitioner’s Guide to Commercializing Applications of Computer Vision," ...
"A Practitioner’s Guide to Commercializing Applications of Computer Vision," ...Edge AI and Vision Alliance
 
OPEN_POWER8_SESSION_20150316
OPEN_POWER8_SESSION_20150316OPEN_POWER8_SESSION_20150316
OPEN_POWER8_SESSION_20150316기한 김
 
Startup Bootcamp - Session 4 of 8 - How to get your Startup Going
Startup Bootcamp - Session 4 of 8 - How to get your Startup GoingStartup Bootcamp - Session 4 of 8 - How to get your Startup Going
Startup Bootcamp - Session 4 of 8 - How to get your Startup GoingAmit Seth
 
중국의 슈퍼컴퓨터 연구개발
중국의 슈퍼컴퓨터 연구개발중국의 슈퍼컴퓨터 연구개발
중국의 슈퍼컴퓨터 연구개발Lee Jysoo
 
Intel APJ Enterprise Day - Intel puts Automotive Innovation into High Gear
Intel APJ Enterprise Day - Intel puts Automotive Innovation into High GearIntel APJ Enterprise Day - Intel puts Automotive Innovation into High Gear
Intel APJ Enterprise Day - Intel puts Automotive Innovation into High GearIntelAPAC
 
"Challenges in Object Detection on Embedded Devices," a Presentation from CEVA
"Challenges in Object Detection on Embedded Devices," a Presentation from CEVA"Challenges in Object Detection on Embedded Devices," a Presentation from CEVA
"Challenges in Object Detection on Embedded Devices," a Presentation from CEVAEdge AI and Vision Alliance
 
IoT & Machine Learning
IoT & Machine LearningIoT & Machine Learning
IoT & Machine Learning신동 강
 
Introduction to Recurrent Neural Network with Application to Sentiment Analys...
Introduction to Recurrent Neural Network with Application to Sentiment Analys...Introduction to Recurrent Neural Network with Application to Sentiment Analys...
Introduction to Recurrent Neural Network with Application to Sentiment Analys...Artifacia
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardNVIDIA
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensOscar Law
 
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv..."Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...Edge AI and Vision Alliance
 

En vedette (19)

"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Hype vs. Reality: The AI Explainer
Hype vs. Reality: The AI ExplainerHype vs. Reality: The AI Explainer
Hype vs. Reality: The AI Explainer
 
Introduction to CNN with Application to Object Recognition
Introduction to CNN with Application to Object RecognitionIntroduction to CNN with Application to Object Recognition
Introduction to CNN with Application to Object Recognition
 
1 구글의탄생
1 구글의탄생1 구글의탄생
1 구글의탄생
 
IoT: Autonomous and Smart- Paul Guermonprez
IoT: Autonomous and Smart- Paul GuermonprezIoT: Autonomous and Smart- Paul Guermonprez
IoT: Autonomous and Smart- Paul Guermonprez
 
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...AI&BigData Lab. Артем Чернодуб  "Распознавание изображений методом Lazy Deep ...
AI&BigData Lab. Артем Чернодуб "Распознавание изображений методом Lazy Deep ...
 
"A Practitioner’s Guide to Commercializing Applications of Computer Vision," ...
"A Practitioner’s Guide to Commercializing Applications of Computer Vision," ..."A Practitioner’s Guide to Commercializing Applications of Computer Vision," ...
"A Practitioner’s Guide to Commercializing Applications of Computer Vision," ...
 
OPEN_POWER8_SESSION_20150316
OPEN_POWER8_SESSION_20150316OPEN_POWER8_SESSION_20150316
OPEN_POWER8_SESSION_20150316
 
Startup Bootcamp - Session 4 of 8 - How to get your Startup Going
Startup Bootcamp - Session 4 of 8 - How to get your Startup GoingStartup Bootcamp - Session 4 of 8 - How to get your Startup Going
Startup Bootcamp - Session 4 of 8 - How to get your Startup Going
 
중국의 슈퍼컴퓨터 연구개발
중국의 슈퍼컴퓨터 연구개발중국의 슈퍼컴퓨터 연구개발
중국의 슈퍼컴퓨터 연구개발
 
Intel APJ Enterprise Day - Intel puts Automotive Innovation into High Gear
Intel APJ Enterprise Day - Intel puts Automotive Innovation into High GearIntel APJ Enterprise Day - Intel puts Automotive Innovation into High Gear
Intel APJ Enterprise Day - Intel puts Automotive Innovation into High Gear
 
"Challenges in Object Detection on Embedded Devices," a Presentation from CEVA
"Challenges in Object Detection on Embedded Devices," a Presentation from CEVA"Challenges in Object Detection on Embedded Devices," a Presentation from CEVA
"Challenges in Object Detection on Embedded Devices," a Presentation from CEVA
 
IoT & Machine Learning
IoT & Machine LearningIoT & Machine Learning
IoT & Machine Learning
 
Introduction to Recurrent Neural Network with Application to Sentiment Analys...
Introduction to Recurrent Neural Network with Application to Sentiment Analys...Introduction to Recurrent Neural Network with Application to Sentiment Analys...
Introduction to Recurrent Neural Network with Application to Sentiment Analys...
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path Forward
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware Challegens
 
MaPU-HPCA2016
MaPU-HPCA2016MaPU-HPCA2016
MaPU-HPCA2016
 
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv..."Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
"Overcoming Barriers to Consumer Adoption of Vision-enabled Products and Serv...
 

Similaire à "Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation from Auviz Systems

Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15MLconf
 
Operationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database AnalyticsOperationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database AnalyticsKinetica
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...Edge AI and Vision Alliance
 
Optimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource ConfigurationOptimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource ConfigurationRECAP Project
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Elasticsearch
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
 
Challenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM MigrationChallenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM MigrationSarmad Makhdoom
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
The Convergence of HPC and Deep Learning
The Convergence of HPC and Deep LearningThe Convergence of HPC and Deep Learning
The Convergence of HPC and Deep Learninginside-BigData.com
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningCloudLightning
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computingpurplesea
 

Similaire à "Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation from Auviz Systems (20)

Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
Narayanan Sundaram, Research Scientist, Intel Labs at MLconf SF - 11/13/15
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Grid computiing
Grid computiingGrid computiing
Grid computiing
 
Operationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database AnalyticsOperationalizing Machine Learning Using GPU-accelerated, In-database Analytics
Operationalizing Machine Learning Using GPU-accelerated, In-database Analytics
 
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation..."Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
"Semantic Segmentation for Scene Understanding: Algorithms and Implementation...
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
 
Optimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource ConfigurationOptimising Service Deployment and Infrastructure Resource Configuration
Optimising Service Deployment and Infrastructure Resource Configuration
 
Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival Brad stack - Digital Health and Well-Being Festival
Brad stack - Digital Health and Well-Being Festival
 
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes Bandwidth: Use Cases for Elastic Cloud on Kubernetes
Bandwidth: Use Cases for Elastic Cloud on Kubernetes
 
NextGenML
NextGenML NextGenML
NextGenML
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
oracle.pptx
oracle.pptxoracle.pptx
oracle.pptx
 
Challenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM MigrationChallenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM Migration
 
01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
The Convergence of HPC and Deep Learning
The Convergence of HPC and Deep LearningThe Convergence of HPC and Deep Learning
The Convergence of HPC and Deep Learning
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightning
 
System models for distributed and cloud computing
System models for distributed and cloud computingSystem models for distributed and cloud computing
System models for distributed and cloud computing
 
HPC in higher education
HPC in higher educationHPC in higher education
HPC in higher education
 


"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation from Auviz Systems

  • 1. Trade-offs in Implementing Deep Neural Networks on FPGAs. Nagesh Gupta, 12 May 2015. Copyright © 2015 Auviz Systems.
  • 2. Auviz Systems
      • Startup specializing in implementing and optimizing algorithms on FPGAs
      • Offers libraries for different classes of algorithms:
          • AuvizCV: optimized OpenCV algorithms
          • AuvizLA: optimized BLAS
          • AuvizDNN: optimized deep neural networks
      • Also develops custom algorithms in computer vision, linear algebra, deep learning and machine learning
      • Libraries are available as OpenCL function calls, which hide the complexity of using an FPGA from software users
      • Visit our booth to see AlexNet running on a Xilinx FPGA!
  • 3. The Time for Artificial Intelligence & Machine Learning
      • Sources: Cisco/Statista, Facebook research, IT Business Edge
  • 4. Machine Learning Moving to the Data Center
      • Performance/watt
      • Programming model & use model
      • Microsoft Azure ML: provides machine learning as a service in the cloud
      • IBM Watson on Jeopardy: one of the best demonstrations of machine learning
      • Amazon AWS ML and Google Predictive Analytics: other machine learning services in the cloud
  • 5. Convolutional Neural Networks (CNNs)
      • A form of deep neural network used for various "recognition" tasks
      • AlexNet [2] is a CNN configuration that was used to classify 1.2 million images
  • 6. Components of AlexNet: Convolution Layers
      • A convolution layer has multiple stages:
          • 3D convolutions
          • Activation: uses the ReLU function, max(x, 0)
          • Max pooling: a sub-sampling function that selects the maximum value within a neighborhood
      • [Figure: 3D convolutions, followed by activation (ReLU), followed by sub-sampling (max pooling)]
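The activation and pooling stages on this slide can be sketched in a few lines of plain Python. This is only an illustration: the 2x2 window and stride of 2 are assumed values, since the slide does not give the pooling parameters, and the names `relu` and `max_pool` are chosen here.

```python
def relu(x):
    """ReLU activation: max(x, 0)."""
    return max(x, 0.0)


def max_pool(fmap, size=2, stride=2):
    """Max pooling over a 2D feature map given as a list of rows.

    size and stride are illustrative defaults, not values from the slides.
    """
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the neighbourhood and keep its largest value.
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(max(window))
        pooled.append(out_row)
    return pooled


fmap = [
    [-1.0, 2.0, 0.5, -3.0],
    [4.0, -2.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, -1.0],
    [3.0, 1.0, -5.0, 2.0],
]
activated = [[relu(v) for v in row] for row in fmap]
pooled = max_pool(activated)
print(pooled)
```

Both stages are element-wise or window-wise comparisons, which is why they are cheap next to the 3D convolutions that precede them.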
  • 7. Dense Layers in AlexNet
      • Dense layers are fully connected: each output node is a function of all the input nodes
      • The first 2 dense layers can be represented as matrix-vector multiplications
      • Layer 6 has 9216 inputs, which are multiplied by a weight matrix to produce 4096 outputs
      • Layer 7 has 4096 inputs, which are multiplied by a different weight matrix to produce 4096 outputs
      • The output layer uses SoftMax to classify the input image into one of 1000 classes
      • [Figure: Layer 6, Layer 7, output layer]
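A dense layer as a matrix-vector product, followed by SoftMax, might look like the sketch below. The toy sizes (3 inputs, 2 hidden outputs, 3 classes) stand in for 9216, 4096 and 1000, the weights are made-up values, and any activation between dense layers is left out because the slide does not mention one.

```python
import math


def dense(weights, inputs):
    """Matrix-vector product; weights holds one row per output node."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for the real layer sizes; all values are illustrative.
x = [1.0, 2.0, 3.0]
w_hidden = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.5]]      # 3 inputs -> 2 outputs
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # 2 inputs -> 3 classes

hidden = dense(w_hidden, x)
probs = softmax(dense(w_out, hidden))
predicted = probs.index(max(probs))
print(hidden, probs, predicted)
```

Every weight is used exactly once per input vector, so a 9216 x 4096 layer reads about 37.7 million weights to perform about 37.7 million multiply-accumulates.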
  • 8. Implementing 3D Convolutions
      • Sequential implementation
          • The implementation follows the convolution equations directly
          • Resource utilization will be very low, but the latency at 200 MHz will be 22 s for the 2nd layer
          • High-level synthesis (HLS) can be used for the implementation, as shown in [3]
      • Better performance comes from parallelizing the implementation
      • [Figure: weight matrices, input feature maps, output feature maps]
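A sequential implementation that "follows the convolution equations" is essentially a six-deep loop nest performing one multiply-accumulate at a time. The sketch below shows one plausible form in plain Python; the function name, argument layout and the absence of padding are assumptions made for illustration.

```python
def conv3d_sequential(inputs, weights, stride=1):
    """Direct 3D convolution, one multiply-accumulate per inner iteration.

    inputs:  N x R x C nested lists (N input feature maps)
    weights: M x N x K x K nested lists (M filters)
    returns: M x R' x C' nested lists (no padding is applied)
    """
    n_in, rows, cols = len(inputs), len(inputs[0]), len(inputs[0][0])
    k = len(weights[0][0])
    out_rows = (rows - k) // stride + 1
    out_cols = (cols - k) // stride + 1
    outputs = []
    for m in range(len(weights)):                  # each output feature map
        fmap = []
        for r in range(out_rows):
            row = []
            for c in range(out_cols):
                acc = 0.0
                for n in range(n_in):              # each input feature map
                    for i in range(k):
                        for j in range(k):
                            acc += (weights[m][n][i][j]
                                    * inputs[n][r * stride + i][c * stride + j])
                row.append(acc)
            fmap.append(row)
        outputs.append(fmap)
    return outputs


# Two 3x3 input maps of ones convolved with one 2x2 filter of ones.
ones_in = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
ones_w = [[[[1.0] * 2 for _ in range(2)] for _ in range(2)]]
result = conv3d_sequential(ones_in, ones_w)
print(result)
```

Taking the slide's figures at face value, 22 s at 200 MHz is 4.4 billion cycles, or roughly 10 cycles for each of the 2nd layer's approximately 448 million computations.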
  • 9. Computations vs. Data Transfers in AlexNet
      • [Chart: computations vs. data transfers, log scale from 1.E+00 to 1.E+09]
      • Computation latency, 2nd convolution layer: with 512 single-precision floating-point operations, the layer takes 2.2 ms to complete at 200 MHz
      • Data transfer latency, 2nd convolution layer: with 64-bit DDR at 1.3 Gb/s, the fetch latency for single-precision floating-point data is around 0.5 ms
      • 3D convolutions require more computations, while data transfers are higher for the dense layers
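The contrast between convolution and dense layers becomes concrete when the computation counts are divided by the data-transfer counts from the table on the deck's final slide. The sketch below does that; the layer labels are names chosen here, and the unit of "data transfer" is whatever the table uses, which the slides do not specify.

```python
# (label, computations, data transfers), copied from the final slide's table.
# The labels are illustrative names, not taken from the slides.
LAYERS = [
    ("conv1", 110e6, 255e3),
    ("conv2", 448e6, 728e3),
    ("conv3", 150e6, 993e3),
    ("conv4", 224e6, 1457e3),
    ("conv5", 150e6, 959e3),
    ("dense6", 38e6, 38e6),
    ("dense7", 16e6, 16e6),
    ("dense8", 4e6, 4e6),
]


def ops_per_transfer(computations, transfers):
    """Computations performed per unit of data transferred."""
    return computations / transfers


ratios = {name: ops_per_transfer(c, t) for name, c, t in LAYERS}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} computations per unit of data transferred")
```

The convolution layers reuse each fetched value hundreds of times, while the dense layers perform about one computation per value fetched, which is why the former are compute-bound and the latter are transfer-bound.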
  • 10. 3D Convolution: Parallel Implementation
      • An 11x11 weight matrix with 3 input feature maps requires 121*3 = 363 multiplications and 121*3 = 363 additions
      • With 363 multiply units and 363 adders, this can be done in 1 cycle
      • Each single-precision floating-point operation requires 2-5 DSP blocks and 200-400 LUTs
      • Implementing this in parallel will require ~1200 DSPs and ~75000 LUTs
      • [Figure: 11x11 weight matrix X 11x11 input feature map = 1 output value]
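One way to sanity-check the resource totals is to divide them by the number of multiply-add pairs. Reading the totals as a cost per multiply-add pair is an interpretation made here, not something the slide states, and the function name is hypothetical.

```python
KERNEL = 11          # filter width and height from the slide
INPUT_MAPS = 3       # input feature maps from the slide

# One multiplier plus one adder per weight: 11 * 11 * 3 = 363 pairs.
mac_units = KERNEL * KERNEL * INPUT_MAPS


def implied_cost_per_unit(total, units=mac_units):
    """Average resource cost per multiply-add pair (assumed accounting)."""
    return total / units


dsp_per_unit = implied_cost_per_unit(1200)     # slide's ~1200 DSPs
lut_per_unit = implied_cost_per_unit(75000)    # slide's ~75000 LUTs
print(mac_units, round(dsp_per_unit, 2), round(lut_per_unit, 1))
```

Under that reading, each pair costs a little over 3 DSPs and a little over 200 LUTs, which sits inside the per-operation ranges quoted on the slide.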
  • 11. Increasing Throughput With Pipelining
      • Pipelining is a hardware concept for achieving higher throughput
      • Helpful with complex multi-cycle operations; works by registering intermediate results
      • Pipeline 3D convolutions along one dimension and parallelize the other
          • For example, convolve the weight matrix with an input feature map in parallel, and pipeline across different feature maps
          • Zhang, et al. [3] convolve a set of input feature maps with a set of weight matrices in parallel, and pipeline over the size of the input feature map
      • [Figure: input feature maps (N x R x C), M weight filters of size N x K x K, and output feature maps (M x R' x C'), with tile sizes Tn, Tm, Tr and Tc]
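The tile sizes Tm, Tn, Tr and Tc in the figure suggest a tiled loop nest. The sketch below is a software model of such tiling under assumed loop ordering, stride 1 and no padding. Python runs every loop in order, whereas in hardware the innermost loops over output and input maps would be unrolled into parallel multipliers and the loops over tile positions would be pipelined. A direct version is included only to check that tiling leaves the result unchanged.

```python
def conv_direct(inputs, weights):
    """Reference 3D convolution (stride 1, no padding)."""
    k = len(weights[0][0])
    out_r = len(inputs[0]) - k + 1
    out_c = len(inputs[0][0]) - k + 1
    return [[[sum(weights[m][n][i][j] * inputs[n][r + i][c + j]
                  for n in range(len(inputs))
                  for i in range(k) for j in range(k))
              for c in range(out_c)]
             for r in range(out_r)]
            for m in range(len(weights))]


def conv_tiled(inputs, weights, tm, tn, tr, tc):
    """Same convolution, computed tile by tile (tm, tn, tr, tc are tile sizes)."""
    n_in = len(inputs)
    m_out, k = len(weights), len(weights[0][0])
    out_r = len(inputs[0]) - k + 1
    out_c = len(inputs[0][0]) - k + 1
    out = [[[0.0] * out_c for _ in range(out_r)] for _ in range(m_out)]
    for r0 in range(0, out_r, tr):                      # tile over output rows
        for c0 in range(0, out_c, tc):                  # tile over output columns
            for m0 in range(0, m_out, tm):              # tile over output maps
                for n0 in range(0, n_in, tn):           # tile over input maps
                    # Body of one tile: the part a hardware engine would execute.
                    for i in range(k):
                        for j in range(k):
                            for r in range(r0, min(r0 + tr, out_r)):
                                for c in range(c0, min(c0 + tc, out_c)):
                                    for m in range(m0, min(m0 + tm, m_out)):
                                        for n in range(n0, min(n0 + tn, n_in)):
                                            out[m][r][c] += (weights[m][n][i][j]
                                                             * inputs[n][r + i][c + j])
    return out


# Small integer-valued example: 3 input maps of 4x4, 2 filters of 2x2.
inputs = [[[float(n + r + c) for c in range(4)] for r in range(4)] for n in range(3)]
weights = [[[[float(m + 1)] * 2 for _ in range(2)] for _ in range(3)] for m in range(2)]
tiled = conv_tiled(inputs, weights, tm=2, tn=2, tr=2, tc=2)
direct = conv_direct(inputs, weights)
print(tiled == direct)
```

The tile sizes here deliberately do not divide the problem sizes evenly, so the `min` clamps on the tile bounds are exercised.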
  • 12. Mapping 3D Convolutions into Matrix Multiplications
      • A simple way is to flatten the feature maps and create an array of feature maps; below is an illustration for the first layer of AlexNet
      • The weight matrices are flattened, and the input feature maps are rearranged so that each column holds the neighborhood required for the convolution
      • [Figure: Y = W x X, where Y is the 96 x 3025 matrix of output feature maps (55 x 55 = 3025), W is the 96 x 363 matrix of weight coefficients (3 x 11 x 11 = 363), and X is the 363 x 3025 matrix of input feature maps]
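The flatten-and-rearrange step is commonly called im2col. A minimal sketch in plain Python follows, using a toy 3x3 input and a single 2x2 filter rather than the first-layer sizes; the function names are chosen here and no padding is applied.

```python
def im2col(inputs, k, stride=1):
    """Rearrange N x R x C inputs into an (N*k*k) x (R'*C') matrix X.

    Each column of X holds the neighbourhood needed for one output position.
    """
    n_in, rows, cols = len(inputs), len(inputs[0]), len(inputs[0][0])
    out_r = (rows - k) // stride + 1
    out_c = (cols - k) // stride + 1
    x = []
    for n in range(n_in):
        for i in range(k):
            for j in range(k):
                x.append([inputs[n][r * stride + i][c * stride + j]
                          for r in range(out_r) for c in range(out_c)])
    return x


def flatten_weights(weights):
    """Flatten M x N x k x k filters into an M x (N*k*k) matrix W."""
    return [[w for plane in filt for row in plane for w in row] for filt in weights]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(a_row, b_col)) for b_col in zip(*b)]
            for a_row in a]


inputs = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]   # 1 x 3 x 3
weights = [[[[1.0, 0.0], [0.0, 1.0]]]]                           # 1 x 1 x 2 x 2
x = im2col(inputs, k=2)
w = flatten_weights(weights)
y = matmul(w, x)    # one row per output feature map, one column per position
print(y)
```

The rows of X and the columns of W follow the same (input map, filter row, filter column) order, which is what makes the product equal to the convolution. The price is that overlapping neighbourhoods are duplicated in X.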
  • 13. Implementation Challenges
      • A larger number of compute units exhausts the FPGA resources
          • Each compute unit takes a few hundred LUTs and 3-5 DSPs
      • Data must be organized so that the compute units run at full utilization
          • A lot of data needs to be read in parallel
          • Data has to be stored on-chip to enable parallel access
      • Routing turns out to be a bigger challenge
      • Proper data organization, architecture and tools are the way to overcome these challenges
      • [Chart: bits required per cycle (0 to 80000) vs. parallelism (256, 512, 768); series: bits per operation]
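How quickly the on-chip bandwidth requirement grows with parallelism can be estimated with simple arithmetic. The sketch below assumes each operation consumes two operands per cycle. That accounting is a guess, since the chart's exact basis is not stated, so the numbers need not match the chart.

```python
def bits_per_cycle(parallelism, bits_per_value=32, values_per_op=2):
    """Bits that must be delivered each cycle to keep every unit busy.

    values_per_op=2 (two operands per operation) is an assumed figure.
    """
    return parallelism * bits_per_value * values_per_op


for p in (256, 512, 768):
    single = bits_per_cycle(p)                      # 32-bit single precision
    half = bits_per_cycle(p, bits_per_value=16)     # 16-bit representation
    print(f"parallelism {p}: {single} bits/cycle at 32 bits, {half} at 16 bits")
```

Whatever the exact accounting, the requirement scales linearly with both parallelism and word width, which is why data has to sit in on-chip memory and why narrower representations ease routing.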
  • 14. Using Alternate Data Representations
      • Single-precision floating point
          • Uses 32 bits to represent each value
          • Requires more DSPs (3-5) to implement multiply/accumulate
      • Fixed point
          • A 16-bit fixed-point representation suffices for many applications [4]
          • Stochastic rounding techniques perform similarly to single-precision floating point [5]
      • Half precision
          • Uses 16 bits to represent each value
          • Significant reduction in routing and overall FPGA resources
      • Mixed representation
          • Use fixed point or half precision for some layers and single precision for others
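Stochastic rounding to 16-bit fixed point can be sketched as follows. The split into 8 fractional bits, the saturation behaviour and the function name are assumptions made for illustration and are not taken from the slides or from [5].

```python
import math
import random


def quantize_stochastic(value, frac_bits=8, total_bits=16, rng=random):
    """Quantize to signed fixed point with stochastic rounding.

    The value is rounded up with probability equal to its fractional
    remainder, so the expected result equals the input. Out-of-range
    values saturate. frac_bits=8 is an illustrative choice.
    """
    scale = 1 << frac_bits
    scaled = value * scale
    lower = math.floor(scaled)
    if rng.random() < scaled - lower:
        lower += 1
    max_int = (1 << (total_bits - 1)) - 1
    min_int = -(1 << (total_bits - 1))
    return max(min_int, min(max_int, lower)) / scale


rng = random.Random(0)      # fixed seed so the run is repeatable
samples = [quantize_stochastic(0.3, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(sorted(set(samples)), mean)
```

Each sample lands on one of the two neighbouring grid points, but the average stays close to 0.3. Round-to-nearest would instead return the same grid point every time and carry a fixed bias.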
  • 15. A Complete CNN on the FPGA Using OpenCL
      • OpenCL tools enable software programmers to use the FPGA accelerator without learning hardware methodologies
      • The programmer calls OpenCL functions to accelerate work on the FPGA
      • [Figure: configure & setup, 3D convolutions, dense layers, softmax]
  • 16. Performance of AlexNet on FPGAs
      • FPGAs can achieve an impressive 14 images/sec/Watt, compared with 4 images/sec/Watt for high-end GPUs such as the Tesla K40
  • 17. Summary
      • 3D convolutions are a key part of a CNN and are compute-intensive
      • In FPGAs, 3D convolutions can be implemented efficiently with a parallel and pipelined implementation
      • FPGA resources (gates and routing) will be the critical factors in achieving a highly parallel implementation
      • OpenCL implementation tools, such as Xilinx SDAccel, simplify the implementation task and provide a software flow
      • Alternate data representations can be used to reduce complexity
      • Mixed data representations can simplify the computations without compromising performance
      • FPGAs are capable of delivering high performance at a power profile suitable for the data center
  • 18. References
      • [1] K. Ovtcharov, et al., "Accelerating Deep Convolutional Neural Networks Using Specialized Hardware," Microsoft Research, 2015.
      • [2] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, 2012.
      • [3] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao and J. Cong, "Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks," FPGA'2015, 2015.
      • [4] C. Farabet, Y. LeCun, K. Kavukcuoglu, E. Culurciello, B. Martini, P. Akselrod and S. Talay, "Large-scale FPGA-based convolutional networks," in Machine Learning on Very Large Data Sets, 2011.
      • [5] S. Gupta, A. Agrawal, K. Gopalakrishnan and P. Narayanan, "Deep Learning with Limited Numerical Precision," arXiv preprint arXiv:1502.02551, 2015.
  • 19. Deep Neural Networks in FPGAs. Nagesh Gupta, 12 May 2015.
  • 20. Computations vs. Data Transfers
      Convolution layers:
        Input size | Input feature maps | Output feature maps | Filter size | Computations | Total data transfer
        224 x 224  | 3                  | 96                  | 11x11       | 110 * 10^6   | 255 * 10^3
        27 x 27    | 96                 | 256                 | 5x5         | 448 * 10^6   | 728 * 10^3
        13 x 13    | 256                | 384                 | 3x3         | 150 * 10^6   | 993 * 10^3
        13 x 13    | 384                | 384                 | 3x3         | 224 * 10^6   | 1457 * 10^3
        13 x 13    | 384                | 256                 | 3x3         | 150 * 10^6   | 959 * 10^3
      Dense layers:
        Input data | Weight matrix | Computations | Data transfers
        9216       | 9216 x 4096   | 38 * 10^6    | 38 * 10^6
        4096       | 4096 x 4096   | 16 * 10^6    | 16 * 10^6
        4096       | 4096 x 1000   | 4 * 10^6     | 4 * 10^6
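The computation counts in the table can be approximately reproduced by counting multiply-accumulates. The sketch below assumes a 55 x 55 output for the first layer (the size shown on the matrix-multiplication slide) and, for the remaining convolution layers, an output the same size as the listed input. The slides do not state how their figures were derived, so this is a plausibility check rather than the author's method.

```python
# (output width/height, input maps, output maps, filter size, table value)
# Output sizes are assumptions; the other numbers come from the table.
CONV_LAYERS = [
    (55, 3, 96, 11, 110e6),
    (27, 96, 256, 5, 448e6),
    (13, 256, 384, 3, 150e6),
    (13, 384, 384, 3, 224e6),
    (13, 384, 256, 3, 150e6),
]
# (inputs, outputs, table value)
DENSE_LAYERS = [(9216, 4096, 38e6), (4096, 4096, 16e6), (4096, 1000, 4e6)]


def conv_macs(out_size, n_in, n_out, k):
    """Multiply-accumulates for one convolution layer."""
    return out_size * out_size * n_out * n_in * k * k


def dense_macs(n_in, n_out):
    """Multiply-accumulates for one dense layer (one per weight)."""
    return n_in * n_out


conv_counts = [conv_macs(o, i, m, k) for o, i, m, k, _ in CONV_LAYERS]
dense_counts = [dense_macs(i, o) for i, o, _ in DENSE_LAYERS]
for count, row in zip(conv_counts + dense_counts, CONV_LAYERS + DENSE_LAYERS):
    print(f"{count / 1e6:8.1f} million estimated vs {row[-1] / 1e6:6.1f} million in the table")
```

Under these assumptions every layer lands within 5% of the table. The first convolution layer comes out near 105 million rather than 110 million, and the dense layers differ mainly through rounding.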