1
CAOS: A CAD Framework
for FPGA-Based Systems
06/08/2017
Xilinx, San Jose, CA
Marco Rabozzi & al.
marco.rabozzi@polimi.it
NECSTLab, Politecnico di Milano
2
Problem
• Next-generation Cloud and HPC applications
require substantial computing power
– Bioinformatics
– Deep learning
– Virtual Reality
• CPUs and GPUs do not match the
applications closely
– Performance requirements not met
– Energy inefficient
[Figure: execution time and energy consumption]
3
Opportunity
• Specialized Hardware
+ Well suited for specific algorithms
– High risk investment
– Fixed architecture
• FPGAs
+ Performance/power/cost-efficient solutions
for a wide variety of applications
+ Flexible reconfigurable architecture
– Complex to program
4
Heterogeneous Complex Systems
[Figure: Amazon F1, Project Catapult]
5
Research Challenge
6
Contribution
7
The proposed CAOS framework
[Figure: overview of the CAOS flow. Inputs: application code (C, C++, OpenCL), a system description (<system> … </system>), and profiling datasets. The CAOS Flow Manager coordinates the Frontend, Functions Optimization, and Backend stages, which rely on architectural templates (e.g., SST, MaxCompiler). Output generation produces the FPGA bitstreams and the system runtime.]
8
Architectural template
• Defines the memory access model between
the host and the accelerator
– Streaming
– Block-based
• Defines the internal structure of the
accelerator
– Chain of a replicated base module
– NoC of heterogeneous modules
– Interconnected dataflow cores
– …
9
The case of the SST architectural template
Single block: Streaming Stencil Time-step (SST)
Whole accelerator: queue of SSTs
10
Architectural templates objectives
• Narrow the Design Space Exploration (DSE)
for the accelerator
– Well-defined set of potential optimizations
– Constrains the classes of supported algorithms
• Enable more accurate estimations
– Hardware resource requirements
– Operational intensity estimation
11
CAOS frontend
12
CAOS functions optimization
13
CAOS backend
14
Custom CAOS workflow
• CAOS allows users to reorder module
executions to create custom workflows
• The case of the Smith-Waterman algorithm:
[Figure: custom workflow for Smith-Waterman. Stages shown: application benchmark, CAOS frontend (call graph f1–f7, identified HW kernels, system description <system> … </system>), static code analysis, performance evaluation (Roofline model), code modification, and CAOS backend (implementation).]
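The performance evaluation step in this workflow uses the Roofline model, in which attainable throughput is the smaller of the peak compute rate and operational intensity times memory bandwidth. The sketch below assumes a simple platform description; the `Platform` class, its field names, and the GOP/s units are illustrative choices, not part of CAOS.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    peak_gops: float      # peak compute throughput of the accelerator (GOP/s)
    bandwidth_gbs: float  # peak memory bandwidth (GB/s)

    @property
    def ridge_point(self) -> float:
        """Operational intensity (ops/byte) at which the two roofs meet."""
        return self.peak_gops / self.bandwidth_gbs


def operational_intensity(ops: float, bytes_moved: float) -> float:
    """Operations performed per byte transferred to or from memory."""
    return ops / bytes_moved


def attainable_gops(p: Platform, intensity: float) -> float:
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(p.peak_gops, intensity * p.bandwidth_gbs)


def bottleneck(p: Platform, intensity: float) -> str:
    return "memory-bound" if intensity < p.ridge_point else "compute-bound"


p = Platform(peak_gops=100.0, bandwidth_gbs=10.0)  # made-up numbers
oi = operational_intensity(ops=20.0, bytes_moved=10.0)
print(attainable_gops(p, oi), bottleneck(p, oi))  # 20.0 memory-bound
```

A kernel left of the ridge point gains more from code modifications that raise its operational intensity (for example, better data reuse) than from adding compute resources.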
15
CAOS Infrastructure
CAOS Flow Manager
Module A Module B Module C Module D Module E
Frontend Flow Function Optimization
Flow
Backend
Flow
Web	
  Application
…
…
JSON	
  +	
  
File	
  archives
REST
interface
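A minimal sketch of how a flow manager could run modules in a user-chosen order while exchanging JSON-serializable state. Everything here is hypothetical: the `FlowManager` class, the module signature, and the two toy modules are illustrations, and the actual REST transport is replaced by a local JSON round trip.

```python
import json
from typing import Callable, Dict, List

# Hypothetical module signature: takes the project state, returns it updated.
Module = Callable[[dict], dict]


class FlowManager:
    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}

    def register(self, name: str, module: Module) -> None:
        self.modules[name] = module

    def run(self, order: List[str], state: dict) -> dict:
        """Execute the registered modules in a user-chosen order."""
        for name in order:
            # Serialize and deserialize the state, mimicking the JSON messages
            # that would travel over a REST interface between services.
            payload = json.loads(json.dumps(state))
            state = self.modules[name](payload)
            state.setdefault("history", []).append(name)
        return state


# Two toy modules standing in for real CAOS modules (hypothetical).
def ir_generation(state: dict) -> dict:
    state["call_graph"] = {"f1": ["f2", "f3"], "f2": [], "f3": []}
    return state


def profiling(state: dict) -> dict:
    state["profiled"] = "call_graph" in state  # profiling needs the IR first
    return state


fm = FlowManager()
fm.register("ir_generation", ir_generation)
fm.register("profiling", profiling)
print(fm.run(["ir_generation", "profiling"], {})["history"])
```

Keeping modules stateless and passing serialized state between them is what makes reordering cheap: a custom workflow is just a different `order` list.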
16
CAOS: Frontend
[Figure: CAOS infrastructure diagram, as on slide 15]
17
Frontend – IR generation
• Functions extraction and generation of the
application call graph
• Current implementation leverages Doxygen
[Figure: .c source files → application IR: call graph (f1–f7) + functions description]
18
Frontend – applicability check
• Verifies the applicability of an architectural template w.r.t.:
– Application
– System description
• Detects candidates for hardware acceleration
[Figure: the IR call graph (f1–f7) evaluated against architectural templates 1, 2, and 3, with HW candidates marked]
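One way to phrase the applicability check is as a predicate over the call graph: a function is a HW candidate for a template only if it and every function it calls have the properties that template requires. The feature labels (`"streaming"`, `"stencil"`) and the function `hw_candidates` below are hypothetical; the slides do not state how CAOS encodes template requirements.

```python
from typing import Dict, List, Set


def hw_candidates(call_graph: Dict[str, List[str]],
                  features: Dict[str, Set[str]],
                  required: Set[str]) -> Set[str]:
    """Functions whose whole call subtree has every feature the template requires."""
    memo: Dict[str, bool] = {}

    def applicable(f: str) -> bool:
        if f in memo:
            return memo[f]
        memo[f] = False  # provisional: recursive call chains are rejected
        result = required <= features.get(f, set()) and all(
            applicable(c) for c in call_graph.get(f, []))
        memo[f] = result
        return result

    return {f for f in call_graph if applicable(f)}


graph = {"f1": ["f2", "f3"], "f2": ["f4", "f5"], "f3": ["f6"],
         "f4": [], "f5": [], "f6": ["f7"], "f7": []}
features = {"f2": {"streaming"}, "f3": {"streaming", "stencil"},
            "f5": {"streaming"}, "f6": {"streaming", "stencil"},
            "f7": {"streaming", "stencil"}}
templates = {"template_1": {"streaming"}, "template_2": {"streaming", "stencil"}}
for name, req in templates.items():
    print(name, sorted(hw_candidates(graph, features, req)))
```

Rejecting cycles reflects a common restriction of hardware flows, which generally cannot synthesize unbounded recursion.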
19
Frontend - profiling
• Runs the application against multiple user-defined
datasets
• For each function, collects:
– Self execution time
– Total execution time
– Function calls
[Figure: IR call graph (f1–f7) + datasets → profiled IR, where each function is annotated with values such as Total = 100%, Self = 2%–4%, 7–9 calls, …]
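The profiled IR reports ranges across datasets, such as "Self = 2%–4%" and "7–9 calls". A minimal sketch of that aggregation, assuming each run yields per-function self time, total time, and call count plus the whole-program time; the record layout and the function `summarize` are hypothetical.

```python
from typing import Dict, List, Tuple

# One profiling run: function -> {"self": seconds, "total": seconds, "calls": count}
Run = Dict[str, Dict[str, float]]


def summarize(runs: List[Run],
              program_times: List[float]) -> Dict[str, Dict[str, Tuple[float, float]]]:
    """Per function, the (min, max) across datasets of self time %, total
    time % (both relative to whole-program time), and call count."""
    if len(runs) != len(program_times):
        raise ValueError("need one program time per run")
    samples: Dict[str, Dict[str, List[float]]] = {}
    for run, t in zip(runs, program_times):
        for fn, m in run.items():
            s = samples.setdefault(fn, {"self_pct": [], "total_pct": [], "calls": []})
            s["self_pct"].append(100.0 * m["self"] / t)
            s["total_pct"].append(100.0 * m["total"] / t)
            s["calls"].append(m["calls"])
    return {fn: {k: (min(v), max(v)) for k, v in s.items()}
            for fn, s in samples.items()}


runs = [
    {"f1": {"self": 5.0, "total": 100.0, "calls": 1},
     "f2": {"self": 2.0, "total": 30.0, "calls": 7}},
    {"f1": {"self": 6.0, "total": 200.0, "calls": 1},
     "f2": {"self": 8.0, "total": 50.0, "calls": 9}},
]
print(summarize(runs, [100.0, 200.0])["f2"])
```

Reporting ranges rather than a single dataset's numbers exposes functions whose cost depends strongly on the input, which matters when deciding what to accelerate.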
20
Frontend - HW/SW partitioning
• Identifies the subtree to accelerate for each
architectural template
• If needed, translates the identified code for
subsequent optimizations (e.g., C to MaxJ)
[Figure: IR call graph (f1–f7) annotated with self execution times (Self = 10%, 2%, 20%), shown before and after the subtree to accelerate is identified]
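A sketch of one plausible partitioning rule: among HW candidates, pick the root whose entire call subtree can move to hardware and which covers the largest share of self execution time. The rule, the function `select_subtree`, and the example percentages are assumptions; the slide only shows that a subtree is selected using profiled self times.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def reachable(graph: Dict[str, List[str]], root: str) -> Set[str]:
    """All functions in the call subtree rooted at `root`."""
    seen, queue = {root}, deque([root])
    while queue:
        for c in graph.get(queue.popleft(), []):
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return seen


def select_subtree(graph: Dict[str, List[str]],
                   self_pct: Dict[str, float],
                   candidates: Set[str]) -> Tuple[Optional[str], float]:
    """Root whose subtree lies entirely within `candidates` and covers the
    largest share of self execution time (hypothetical selection rule)."""
    best_root, best_share = None, 0.0
    for root in sorted(candidates):
        nodes = reachable(graph, root)
        if not nodes <= candidates:
            continue  # part of the subtree cannot go to hardware
        share = sum(self_pct.get(f, 0.0) for f in nodes)
        if share > best_share:
            best_root, best_share = root, share
    return best_root, best_share


graph = {"f1": ["f2", "f3"], "f2": ["f4", "f5"], "f3": ["f6"],
         "f4": [], "f5": [], "f6": ["f7"], "f7": []}
self_pct = {"f1": 5.0, "f2": 10.0, "f3": 2.0, "f4": 3.0,
            "f5": 1.0, "f6": 20.0, "f7": 4.0}
print(select_subtree(graph, self_pct, {"f2", "f3", "f4", "f5", "f6", "f7"}))
```

The selected share p also bounds the end-to-end speedup through Amdahl's law, 1 / (1 - p/100), however fast the accelerator is.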
21
CAOS: Function optimization
[Figure: CAOS infrastructure diagram, as on slide 15]
22
Function optimization - Static code analysis
• Retrieves metrics on the current implementation of the
candidate HW functions
• Metrics depend on the architectural template
– Produce / consume rate of kernels (Maxeler)
– Estimated module latency (SST)
– Computational intensity (OpenCL)
23
Function optimization - Resource estimation
• Estimates resource requirements for the entire set
of functions to accelerate in HW
• Multiple resource estimation modules:
– Vivado HLS
• May require long execution times
• Accurate estimation
– Operations count-based estimation
• Fast execution time
• Coarse-grained estimation
• MaxJ code support
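The operations count-based estimator trades accuracy for speed: it multiplies how often each operation type appears by a per-operation resource cost. A minimal sketch, where the cost table values and operation names are invented placeholders (real costs depend on data types, target device, and tool settings):

```python
from typing import Dict

Resources = Dict[str, int]  # e.g. {"LUT": ..., "FF": ..., "DSP": ...}

# Hypothetical per-operation costs, for illustration only.
OP_COST: Dict[str, Resources] = {
    "add32": {"LUT": 32, "FF": 32, "DSP": 0},
    "mul32": {"LUT": 0, "FF": 64, "DSP": 3},
    "cmp32": {"LUT": 16, "FF": 1, "DSP": 0},
}


def estimate(op_counts: Dict[str, int]) -> Resources:
    """Sum per-operation costs weighted by how often each operation occurs."""
    total: Resources = {"LUT": 0, "FF": 0, "DSP": 0}
    for op, n in op_counts.items():
        for res, cost in OP_COST[op].items():
            total[res] += n * cost
    return total


def fits(usage: Resources, budget: Resources) -> bool:
    """True if every estimated resource stays within the device budget."""
    return all(usage[r] <= budget.get(r, 0) for r in usage)


usage = estimate({"add32": 4, "mul32": 2})
print(usage, fits(usage, {"LUT": 1000, "FF": 1000, "DSP": 10}))
```

Ignoring interconnect, control logic, and tool optimizations is what makes this kind of estimate coarse, and also why it runs in microseconds rather than minutes.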
24
Function optimization - Performance estimation
• Estimates performance leveraging data from
previous modules
• Proposes template-specific optimizations
– Module replication factor
– Loop unrolling and pipelining
– Memory transfer optimizations
– Resource sharing
– Multi-FPGA data flow graph splitting
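As an example of a template-specific optimization, choosing a module replication factor combines resource and performance estimates: replicate as many copies as fit on the device, but no more than the memory interface can feed. The function names, resource numbers, and rate units below are hypothetical.

```python
import math
from typing import Dict, Tuple

Resources = Dict[str, int]


def max_replicas(per_module: Resources, budget: Resources) -> int:
    """Largest number of module copies that fits the device budget."""
    limits = [budget.get(r, 0) // need for r, need in per_module.items() if need > 0]
    if not limits:
        raise ValueError("module must require at least one resource")
    return min(limits)


def choose_replication(per_module: Resources, budget: Resources,
                       module_rate: float, memory_rate: float) -> Tuple[int, float]:
    """Return (replicas, estimated throughput); rates share the same units."""
    n_fit = max_replicas(per_module, budget)
    # Beyond this many copies the memory interface, not compute, is the limit.
    n_useful = math.ceil(memory_rate / module_rate)
    n = min(n_fit, n_useful)
    return n, min(n * module_rate, memory_rate)


print(choose_replication({"LUT": 1000, "DSP": 4}, {"LUT": 10000, "DSP": 20},
                         module_rate=2.0, memory_rate=7.0))  # (4, 7.0)
```

Capping replicas at the bandwidth limit leaves area free for other optimizations, such as loop unrolling inside each module.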
25
Function optimization - Code optimization
• Tightly coupled with the performance
estimation module
• Applies one of the proposed optimizations
(the optimization is chosen by the user)
• Regenerates the CAOS IR to enable further
optimizations or the final system implementation
26
CAOS: Backend
[Figure: CAOS infrastructure diagram, as on slide 15]
27
Backend
• Architectural template-specific implementation
• Generates the runtime for the target system
• Leverages vendor tools for bitstream generation
• Floorplanning, mapping, and scheduling modules used by
the Dyplo architectural template
[Figure: SST, MaxCompiler]
28
Conclusion & future work
• We presented
– A CAD flow that simplifies the acceleration of
Cloud and HPC applications
– A unifying framework to stimulate research on
FPGA-based systems
• Future work
– Integrate Amazon F1 within the CAOS backend
– Unify the high-level languages targeted by
the function optimization stage into a single
abstraction
29
Thank you for your
attention!
QUESTIONS?
