CAOS: A CAD Framework for FPGA-Based Systems
1. 1
06/08/2017
Xilinx, San Jose, CA
Marco Rabozzi & al.
marco.rabozzi@polimi.it
NECSTLab, Politecnico di Milano
2. 2
Problem
• Next-generation Cloud and HPC applications
require a great amount of computing power
– Bioinformatics
– Deep learning
– Virtual Reality
• CPUs and GPUs do not match the
applications closely
– Performance requirements not met
– Energy inefficient
[Figure: execution time and energy consumption]
3. 3
Opportunity
• Specialized Hardware
+ Well suited for specific algorithms
– High risk investment
– Fixed architecture
• FPGAs
+ Performance/power/cost efficient solutions
for a wide variety of applications
+ Flexible reconfigurable architecture
– Complex to program
8. 8
Architectural template
• Defines the memory access model between
the host and the accelerator
– Streaming
– Block-based
• Defines the internal structure of the
accelerator
– Chain of a replicated base module
– NoC of heterogeneous modules
– Interconnected dataflow cores
– …
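The two things a template fixes can be sketched as a small record. This is only an illustration: the class names, enum members, and the "example-template" instance are assumptions, not part of CAOS.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryAccess(Enum):
    """Memory access model between the host and the accelerator."""
    STREAMING = "streaming"
    BLOCK_BASED = "block-based"


class Structure(Enum):
    """Internal structure of the accelerator (options listed on the slide)."""
    MODULE_CHAIN = "chain of a replicated base module"
    HETEROGENEOUS_NOC = "NoC of heterogeneous modules"
    DATAFLOW_CORES = "interconnected dataflow cores"


@dataclass(frozen=True)
class ArchitecturalTemplate:
    # Hypothetical record type: one field per aspect the template defines.
    name: str
    memory_access: MemoryAccess
    structure: Structure


# Hypothetical instance used only for illustration.
example = ArchitecturalTemplate(
    name="example-template",
    memory_access=MemoryAccess.STREAMING,
    structure=Structure.MODULE_CHAIN,
)
```

Freezing the dataclass keeps a template description immutable once chosen, which matches its role as a fixed set of constraints for later steps.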
9. 9
The case of the SST Architectural template
Single Block: Streaming Stencil Time-step (SST)
Whole Accelerator: Queue of SSTs
10. 10
Architectural templates objectives
• Narrow the Design Space Exploration (DSE)
for the accelerator
– Well defined set of potential optimizations
– Constrains the classes of supported algorithms
• Enable more accurate estimations
– Hardware resource requirements
– Operational intensity estimation
14. 14
Custom CAOS workflow
• CAOS allows reordering module executions
to create custom workflows
• The case of the Smith-Waterman algorithm:
[Figure: custom workflow for Smith-Waterman. Labels: CAOS frontend, Application Benchmark, Static Code Analysis, Performance evaluation (Roofline Model), Code modification, CAOS Backend (Implementation); call graph f1 to f7 with identified HW kernels and a <system> … </system> description]
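A minimal sketch of module reordering, assuming each module is a function that takes and returns a shared state. The module names and the particular order below are illustrative assumptions; the slide does not give the exact Smith-Waterman sequence.

```python
# Hypothetical module names, loosely following the labels on the slide.
MODULE_NAMES = [
    "frontend",
    "benchmark",
    "static_analysis",
    "performance_evaluation",
    "code_modification",
    "backend",
]


def make_module(name):
    """Build a stand-in module that only records that it ran."""
    def run(state):
        return {**state, "trace": state["trace"] + [name]}
    return run


MODULES = {name: make_module(name) for name in MODULE_NAMES}


def run_workflow(order):
    """Execute the modules in the given order, threading the state through."""
    unknown = [name for name in order if name not in MODULES]
    if unknown:
        raise KeyError(f"unknown modules: {unknown}")
    state = {"trace": []}
    for name in order:
        state = MODULES[name](state)
    return state


# An assumed custom order: the performance evaluation runs twice,
# before and after the code modification step.
custom = [
    "frontend",
    "static_analysis",
    "performance_evaluation",
    "code_modification",
    "performance_evaluation",
    "backend",
]
result = run_workflow(custom)
```

Because a workflow is just a list of names, a module can be repeated or skipped without touching the modules themselves.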
15. 15
CAOS Infrastructure
[Figure: CAOS infrastructure. Labels: Web Application, REST interface, CAOS Flow Manager, Frontend Flow, Function Optimization Flow, Backend Flow, Module A to Module E, …, JSON + File archives]
16. 16
CAOS: Frontend
17. 17
Frontend – IR generation
• Functions extraction and generation of the
application call graph
• Current implementation leverages Doxygen
[Figure: .c source files are turned into the application IR: call graph (f1 to f7) + functions description]
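One way to picture an IR made of a call graph plus function descriptions is sketched below. The edges between f1 to f7, the file names, and the JSON layout are assumptions for illustration; the actual CAOS IR format is not shown in the slides.

```python
import json

# Hypothetical call edges: the figure names f1..f7, but the edges
# are not recoverable from the slide, so this tree is an assumption.
CALLS = {
    "f1": ["f2", "f3", "f6"],
    "f2": [],
    "f3": ["f4", "f5"],
    "f4": [],
    "f5": [],
    "f6": ["f7"],
    "f7": [],
}


def build_ir(calls, source_files):
    """Bundle the call graph with a per-function description."""
    functions = {
        name: {"file": source_files.get(name, "unknown"), "callees": list(callees)}
        for name, callees in calls.items()
    }
    return {
        "call_graph": {name: list(callees) for name, callees in calls.items()},
        "functions": functions,
    }


def subtree(calls, root):
    """Return every function reachable from root (root first), in DFS preorder."""
    seen = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Reverse so callees are visited in their listed order.
        stack.extend(reversed(calls[node]))
    return seen


# Hypothetical file names, used only to populate the description.
ir = build_ir(CALLS, {"f1": "main.c", "f3": "kernel.c"})
ir_json = json.dumps(ir, indent=2)
```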
18. 18
Frontend – applicability check
• Verifies the applicability of an architectural template w.r.t.:
– Application
– System description
• Detects candidates for hardware acceleration
[Figure: the IR call graph (f1 to f7) is checked against architectural templates 1, 2 and 3; for each template the HW candidate functions are marked]
19. 19
Frontend - profiling
• Runs the application against multiple user-defined
datasets
• For each function, collects:
– Self execution time
– Total execution time
– Function calls
[Figure: IR + Datasets produce the Profiled IR, where each function f1 to f7 is annotated with profiling data, e.g. Total = 100%, Self = 2% - 4%, 7-9 calls, …]
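The relation between self and total execution time can be sketched as follows, assuming a tree-shaped call graph (one caller per function, no recursion). The edges and the self-time percentages are made-up values, not measurements from the talk.

```python
# Hypothetical call tree and self-time percentages (they sum to 100).
CALLS = {
    "f1": ["f2", "f3", "f6"],
    "f2": [],
    "f3": ["f4", "f5"],
    "f4": [],
    "f5": [],
    "f6": ["f7"],
    "f7": [],
}
SELF_PCT = {"f1": 2, "f2": 10, "f3": 4, "f4": 20, "f5": 30, "f6": 4, "f7": 30}


def total_time(calls, self_time, root):
    """Total time of a function = its self time + total time of its callees."""
    return self_time[root] + sum(
        total_time(calls, self_time, callee) for callee in calls[root]
    )


# Annotate every function with both metrics, as a profiled IR might.
profiled_ir = {
    name: {"self": SELF_PCT[name], "total": total_time(CALLS, SELF_PCT, name)}
    for name in CALLS
}
```

With these numbers the root f1 has a small self time but accounts for the whole run, which is why both metrics are collected.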
20. 20
Frontend - HW/SW partitioning
• Identifies the subtree to accelerate for each
architectural template
• If needed, translates the identified code for
subsequent optimizations (e.g. C to MaxJ)
[Figure: profiled IR call graph (f1 to f7) annotated with self times (Self = 10%, Self = 2%, Self = 20%), before and after the subtree to accelerate is identified]
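A possible selection rule is sketched here: among subtrees made only of HW candidates, pick the one covering the largest share of execution time. This heuristic, the call edges, the candidate set, and the percentages are all assumptions; the slides do not state how CAOS chooses the subtree.

```python
# Hypothetical call tree, self-time percentages and HW candidates.
CALLS = {
    "f1": ["f2", "f3", "f6"],
    "f2": [],
    "f3": ["f4", "f5"],
    "f4": [],
    "f5": [],
    "f6": ["f7"],
    "f7": [],
}
SELF_PCT = {"f1": 2, "f2": 10, "f3": 4, "f4": 20, "f5": 30, "f6": 4, "f7": 30}
CANDIDATES = {"f3", "f4", "f5", "f7"}


def subtree(calls, root):
    """Set of functions in the subtree rooted at root (root included)."""
    members = {root}
    for callee in calls[root]:
        members |= subtree(calls, callee)
    return members


def pick_subtree(calls, self_time, candidates):
    """Return (root, covered time) of the best all-candidate subtree."""
    best_root, best_total = None, 0
    for root in sorted(calls):
        members = subtree(calls, root)
        if not members <= candidates:
            continue  # some function in the subtree cannot go to hardware
        covered = sum(self_time[name] for name in members)
        if covered > best_total:
            best_root, best_total = root, covered
    return best_root, best_total


best_root, best_total = pick_subtree(CALLS, SELF_PCT, CANDIDATES)
```

Here f6 is rejected as a root because it is not itself a candidate, even though its callee f7 is.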
21. 21
CAOS: Function optimization
22. 22
Function optimization - Static code analysis
• Retrieves metrics on the current implementation for the
candidate HW functions
• Metrics are architectural template dependent
– Produce / consume rate of kernels (Maxeler)
– Estimated module latency (SST)
– Computational intensity (OpenCL)
23. 23
Function optimization - Resource estimation
• Estimates resource requirements for the entire set
of functions to accelerate in HW
• Multiple resource estimation modules:
– Vivado HLS
• Might require a high execution time
• Accurate estimation
– Operations count-based estimation
• Fast execution time
• Coarse grain estimation
• MaxJ code support
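The operations count-based approach can be sketched as a weighted sum over operation counts. The per-operation costs below are placeholders chosen for the example, not vendor data or values used by CAOS, and the function names are hypothetical.

```python
# Placeholder cost table: resources consumed by one instance of each operation.
OP_COST = {
    "add": {"lut": 32, "dsp": 0},
    "mul": {"lut": 20, "dsp": 3},
    "cmp": {"lut": 16, "dsp": 0},
}


def estimate(op_counts, replication=1):
    """Coarse estimate: sum per-operation costs, scaled by the replication factor."""
    usage = {"lut": 0, "dsp": 0}
    for op, count in op_counts.items():
        for resource, cost in OP_COST[op].items():
            usage[resource] += cost * count * replication
    return usage


def fits(usage, budget):
    """True if every estimated resource stays within the device budget."""
    return all(usage[resource] <= budget[resource] for resource in usage)


# Hypothetical kernel and device budget.
kernel_ops = {"add": 4, "mul": 2, "cmp": 1}
budget = {"lut": 500, "dsp": 20}
single = estimate(kernel_ops)
```

Such an estimate ignores routing, control logic and tool optimizations, which is why it is fast but coarse compared with running Vivado HLS.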
24. 24
Function optimization - Performance estimation
• Estimates performance leveraging data from
previous modules
• Proposes template-specific optimizations
– Module replication factor
– Loop unrolling and pipelining
– Memory transfer optimizations
– Resource sharing
– Multi-FPGA data flow graph splitting
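As an illustration of how a performance bound can guide the replication factor, the sketch below uses the standard Roofline bound (attainable performance = min(peak compute, memory bandwidth × operational intensity)). The helper names and all numbers are assumptions; the slides do not detail the CAOS estimation models.

```python
def roofline(peak_gflops, bandwidth_gbs, intensity):
    """Attainable GFLOP/s for a kernel with the given operational intensity (FLOP/byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def smallest_useful_replication(per_module_gflops, max_replicas, bandwidth_gbs, intensity):
    """Smallest replication factor that reaches the best attainable performance.

    Assumes compute throughput scales linearly with the number of replicas,
    while memory bandwidth is shared and fixed.
    """
    best = roofline(per_module_gflops * max_replicas, bandwidth_gbs, intensity)
    for replicas in range(1, max_replicas + 1):
        reached = roofline(per_module_gflops * replicas, bandwidth_gbs, intensity)
        if reached >= best:
            return replicas, reached
    return max_replicas, best


# Hypothetical numbers: 8 GFLOP/s per module, at most 10 replicas,
# 10 GB/s of memory bandwidth, 2 FLOP per byte moved.
replicas, gflops = smallest_useful_replication(8.0, 10, 10.0, 2.0)
```

In this example the kernel becomes memory bound at three replicas, so further replication would only spend resources; a memory transfer optimization would be the more useful next step.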
25. 25
Function optimization - Code optimization
• Tightly coupled with the performance
estimation module
• Applies one of the proposed optimizations
(optimization choice made by the user)
• Regenerates the CAOS IR to enable further
optimizations or final system implementation
26. 26
CAOS: Backend
CAOS Flow Manager
Module A Module B Module C Module D Module E
Frontend Flow Function Optimization
Flow
Backend
Flow
Web
Application
…
…
JSON
+
File
archives
REST
interface
27. 27
Backend
• Architectural template-specific implementation
• Generates the runtime for the target system
• Leverages vendor tools for bitstream generation
• Floorplanning + mapping and scheduling modules used by
Dyplo architectural template
[Figure labels: SST, MaxCompiler]
28. 28
Conclusion & future work
• We presented
– A CAD flow that simplifies the acceleration of
Cloud and HPC applications
– A unifying framework to stimulate research on
FPGA-based systems
• Future work
– Integrate Amazon F1 within the CAOS backend
– Unify the high-level languages targeted by
the function optimization flow into a single
abstraction