SlideShare une entreprise Scribd logo
1  sur  46
Télécharger pour lire hors ligne
GPRM
Towards Automated Design Space Exploration
and Code Generation using
Type Transformations
S Waqar Nabi & Wim Vanderbauwhede
First International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'15)
Sunday, November 15, 2015
Austin, TX
www.tytra.org.uk
Using Safe Transformations and a
Cost-Model for HPC on FPGAs
 The TyTra project context
o Our approach, blue-sky target, down-to-earth target, where
we are now, how we are different
 Key contributions
o (1) Type transformations to create design-variants, (2) a new
Intermediate Language, and (3) an FPGA Cost model
 The cost model
o Performance and resource-usage estimates, some results
Using safe transformations and an associated light-weight cost-model opens the
route to a fully automated design-space exploration flow
THE CONTEXT
Our approach, blue-sky target, down-to-earth target, where we are now,
how we are different
Blue Sky Target
Blue Sky Target
Cost Model
Legacy
Scientific Code
Heterogeneous
HPC Target
Description
Optimized HPC
solution!
The goal that keeps us motivated!
( The pragmatic target is somewhat more modest…)
The Short-Term Target
Our focus is on FPGA targets, and we currently require design entry in a
Functional Language using High-Level Functions ( maps, folds) [ a kind of DSL]
The cunning plan…
1. Use the functional programming paradigm to (auto) generate
program-variants which translate to design-variants on the
FPGA.
2. Create an Intermediate Language that:
• Is able to capture points entire design-space
• Allows a light-weight cost-model to be built around it
• Is a convenient target for front-end compiler
3. Create a light-weight cost-model that can estimate the
performance and resource-utilization for each variant.
7
A performance portable code-base that builds on a purely software programming
paradigm.
And you may very well ask…
8
The jury is still out…
How our work is different
 Our observations on limitations of current tools and flows:
1. Design-entry in a custom high-level language which nevertheless has
hardware-specific semantics
2. Architecture of the FPGA-solution specified by programmer; compilers
cannot optimize it.
3. Solutions create soft-processors on the FPGA; not optimized for HPC
(orientation towards embedded applications)
4. Design-space exploration requires prohibitively long time
5. Compiler is application specific (e.g. DSP applications)
We are not there yet, but in principle, our approach entirely eliminates the first
four, and mitigates the fifth.
KEY CONTRIBUTIONS
(1) Type transformations for generating program variants, (2) a new
Intermediate Language, and (3) a light-weight Cost Model
1. Type Transformations to Generate
Program Variants
 Functional Programming
 Types
o More general than types in C
o Our focus is on types of functions that perform array
operations
o reshape, maps and folds
 Type transformations
o Can be derived automatically
o Provably correct
o Essentially reshape the arrays
A functional paradigm with high-level functions allows creation of design-variants
that are correct-by-construction.
Illustration of Variant Generation through
Type-Transformation
• typeA :Vect (im*jm*km) dataType --1D data
• Single execution thread
• typeB :Vect km (Vect im*jm dataType)--transformed 2D data
• (km concurrent execution threads)
• output = mappipe kernel_func input --original program
• inputTr = reshapeTo km input --reshaping data
• output = mappar (mappipe kernel_func) inputTr --new program
Simple and provably correct transformations in a high-level functional language
translates to design-variants on the FPGA.
• Manage-IR
• Deals with
• memory objects (arrays)
• streams (loops over arrays)
• offset streams
• loops over work-unit
• block-memory transfers
 Strongly and statically typed
 All computations expressed as SSA (Single-Static Assignments)
 Largely (and deliberately) based on the LLVM-IR
• Compute-IR
• Streaming model
• SSA instructions define
the datapath
2. A New Intermediate Language
2. A New Intermediate Language
Design Space
Estimation Space
The Cost Model
3. Cost Model
THE FPGA COST-MODEL
Performance Estimate, Resource-utilization estiamte, Experimental
Results
The Cost-Model Use-Case
17
A set of standardized experiments feed target-specific empirical data to the cost
model, and the rest comes from the IR descripition.
Two Types of Estimates
 Resource-Utilization Estimates
o ALUTs, REGs, DSPs
 Performance Estimates
o Estimating memory-access
bandwidth for specific data
patterns
o Estimating FPGA operating
frequency
18
Both estimates needed to allow compiler to choose the best design variant.
1. Resource Estimates
 Observation
o Regularity of FPGA fabric allows some very simple first or second order
expressions to be built up for most instructions based on a few
experiments.
 Key Determinants
o Primitive (SSA) instructions used in IR of the kernel functions
o Data-types
o Structure of various functions (par, comb, par, seq)
o Control logic over-head
19
A set of one-time simple synthesis experiments on the target device helps us
create a very accurate resource-utilization cost model
Resource Estimates - Example
20
Integer Division
Integer Multiplication
Light-weight cost expressions associated with every legal SSA instruction in the
TyTra-IR
2. Performance Estimate
 Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
 Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and design-
variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
21
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
2. Performance Estimate
 Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
 Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and design-
variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
22
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host

Device-DRAM
Device-DRAM

Device-Buffers
Device-Buffers

Offset-Buffers
Kernel Pipeline
Execution
Three Types of memory executions
A given design-variant can be categorized based on:
- Architectural description
- IR description
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host

Device-DRAM
Device-DRAM

Device-Buffers
Device-Buffers

Offset-Buffers
Kernel Pipeline
Execution
Three Types of memory executions
A given design-variant can be categorized based on:
- Architectural description
- IR description
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host

Device-DRAM
Device-DRAM

Device-Buffers
Device-Buffers

Offset-Buffers
Kernel Pipeline
Execution
Work-Unit Iterations
Type A
All iterations
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host

Device-DRAM
Device-DRAM

Device-Buffers
Device-Buffers

Offset-Buffers
Kernel Pipeline
Execution
First Iteration
only
Last Iteration
only
Work-Unit Iterations
Type B
All other
iterations
Performance Estimate
Dependence on Memory Execution Model
Time
Activity
Host

Device-DRAM
Device-DRAM

Device-Buffers
Device-Buffers

Offset-Buffers
Kernel Pipeline
Execution
First Iteration
only
Last Iteration
only
Work-Unit Iterations
Type C
All other
iterations
Once a design-variant is categorized, performance can be estimated accordingly
2. Performance Estimate
 Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
 Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and
design-variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
28
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
Performance Estimate
Dependence on Data Access Pattern
 We have defined a rho (ρ) factor defined as a scaling factor of the
peak memory bandwidth
 Varies from 0-1
 Based on data-access pattern
 Derived empirically through one-time standardized experiments on
target node
29
2. Performance Estimate
 Effective Work-Unit Throughput (EWUT)
o Work-Unit = Executing the kernel over the entire index-space
 Key Determinants
o Memory execution model
o Sustained memory bandwidth for the target architecture and design-
variant
• Data-access pattern
o Design configuration of the FPGA
o Operating frequency of the FPGA
o Compute-bound or IO-bound?
30
Performance model is trickier, especially calculating estimates of sustained
memory bandwidth and FPGA operating frequency.
Determined from the IR
description of design-variant
Performance Estimates
The Parameters and their Evaluation
31
Performance Estimates
Parameters from Architecture Description
32
Performance Estimates
Parameters Calculated Empirically
33
Performance Estimates
Parameters derived from IR description of Kernel
34
Performance Estimates
The Expressions
35
Performance Estimates
The Expressions
36
Performance Estimates
The Expressions
37
Performance Estimates
The Expressions
38
Performance Estimates
The Expressions
39
Performance Estimates
The Expressions
40
Performance Estimates
Experimental Results (Type C)
41
Estimated (E) vs actual (A) cost and throughput for C2 and
C1 configurations of a successive over-relaxation kernel.
[Note that the cycles/kernel are estimated very accurately, but the Effective Workgroup Throughput
(EWGT) is off because of inaccuracy of frequency estimate for the FPGA]
Does the TyTraApproach Work?
Design-Space Exploration?
CONCLUSION
The route to Automated Design Space
Exploration on FPGAs for HPC Applications
 The larger aim is to create a turn-key compiler for:
Legacy scientific code  Heterogeneous HPC Platform
o Current focus is on FPGAs, and on using a Functional
Language design entry
 Our main contributions are:
o Type transformations to create design-variants,
o New Intermediate Language, and
o FPGA Cost model
 Our FPGA Cost Model
o Works on the TyTra-UIR, is light-weight, accurate (enough),
and allows us to evaluate design-variants
Using safe transformations on a functional language paradigm and a light-weight
cost-model to brings us closer to a turn-key HPC compiler for legacy code
46
Acknowledgement
We wish to acknowledge support
by EPSRC through grant EP/L00058X/1.
The woods are lovely, dark and deep,
But I havepromises to keep,
And lines to code before I sleep,
And lines to code before I sleep.

Contenu connexe

Tendances

Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous Computing
Ruymán Reyes
 
Fpga implementation of multilayer feed forward neural network architecture us...
Fpga implementation of multilayer feed forward neural network architecture us...Fpga implementation of multilayer feed forward neural network architecture us...
Fpga implementation of multilayer feed forward neural network architecture us...
Ece Rljit
 
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
Jason Hearne-McGuiness
 
An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...
Videoguy
 
A Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGAA Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGA
NECST Lab @ Politecnico di Milano
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 

Tendances (19)

A Review of FPGA-based design methodologies for efficient hardware Area estim...
A Review of FPGA-based design methodologies for efficient hardware Area estim...A Review of FPGA-based design methodologies for efficient hardware Area estim...
A Review of FPGA-based design methodologies for efficient hardware Area estim...
 
Directive-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous ComputingDirective-based approach to Heterogeneous Computing
Directive-based approach to Heterogeneous Computing
 
Logic Synthesis
Logic SynthesisLogic Synthesis
Logic Synthesis
 
Fpga implementation of multilayer feed forward neural network architecture us...
Fpga implementation of multilayer feed forward neural network architecture us...Fpga implementation of multilayer feed forward neural network architecture us...
Fpga implementation of multilayer feed forward neural network architecture us...
 
Complex Programmable Logic Device (CPLD) Architecture and Its Applications
Complex Programmable Logic Device (CPLD) Architecture and Its ApplicationsComplex Programmable Logic Device (CPLD) Architecture and Its Applications
Complex Programmable Logic Device (CPLD) Architecture and Its Applications
 
HDL (hardware description language) presentation
HDL (hardware description language) presentationHDL (hardware description language) presentation
HDL (hardware description language) presentation
 
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
C++ Data-flow Parallelism sounds great! But how practical is it? Let’s see ho...
 
Design of LDPC Decoder Based On FPGA in Digital Image Watermarking Technology
Design of LDPC Decoder Based On FPGA in Digital Image Watermarking TechnologyDesign of LDPC Decoder Based On FPGA in Digital Image Watermarking Technology
Design of LDPC Decoder Based On FPGA in Digital Image Watermarking Technology
 
An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...An efficient transcoding algorithm for G.723.1 and G.729A ...
An efficient transcoding algorithm for G.723.1 and G.729A ...
 
A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.A Domain-Specific Embedded Language for Programming Parallel Architectures.
A Domain-Specific Embedded Language for Programming Parallel Architectures.
 
Introduction to fpga synthesis tools
Introduction to fpga synthesis toolsIntroduction to fpga synthesis tools
Introduction to fpga synthesis tools
 
Hdl
HdlHdl
Hdl
 
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesOptimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
 
A Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGAA Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGA
 
Issues in implementation of parallel
Issues in implementation of parallelIssues in implementation of parallel
Issues in implementation of parallel
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Chap7 slides
Chap7 slidesChap7 slides
Chap7 slides
 
Overview of digital design with Verilog HDL
Overview of digital design with Verilog HDLOverview of digital design with Verilog HDL
Overview of digital design with Verilog HDL
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 

En vedette

Thomas juli french
Thomas juli frenchThomas juli french
Thomas juli french
NASAPMC
 
Jody.singer
Jody.singerJody.singer
Jody.singer
NASAPMC
 
Saraiva, 2016 educação a distãncia no brasil lições da história
Saraiva, 2016 educação a distãncia no brasil lições da históriaSaraiva, 2016 educação a distãncia no brasil lições da história
Saraiva, 2016 educação a distãncia no brasil lições da história
UFRJ
 
Report - Distance Counter
Report - Distance CounterReport - Distance Counter
Report - Distance Counter
Charlie Weld
 

En vedette (19)

JAXA's Activities 2010
JAXA's Activities 2010JAXA's Activities 2010
JAXA's Activities 2010
 
China's Space Progress
China's Space ProgressChina's Space Progress
China's Space Progress
 
The Future of Solar System Exploration
The Future of Solar System ExplorationThe Future of Solar System Exploration
The Future of Solar System Exploration
 
Altair: Constellation Returns Humans to the Moon
Altair: Constellation Returns Humans to the MoonAltair: Constellation Returns Humans to the Moon
Altair: Constellation Returns Humans to the Moon
 
Goddard 2015: Mark Clampin, NASA
Goddard 2015: Mark Clampin, NASAGoddard 2015: Mark Clampin, NASA
Goddard 2015: Mark Clampin, NASA
 
Thomas juli french
Thomas juli frenchThomas juli french
Thomas juli french
 
Goddard 2015: Joe Cassady, Aerojet Rocketdyne
Goddard 2015: Joe Cassady, Aerojet RocketdyneGoddard 2015: Joe Cassady, Aerojet Rocketdyne
Goddard 2015: Joe Cassady, Aerojet Rocketdyne
 
Astronauts and Robots 2015: Steve Stich, NASA
Astronauts and Robots 2015: Steve Stich, NASAAstronauts and Robots 2015: Steve Stich, NASA
Astronauts and Robots 2015: Steve Stich, NASA
 
Astronauts and Robots 2015: Jim Crocker, Lockheed Martin
Astronauts and Robots 2015: Jim Crocker, Lockheed MartinAstronauts and Robots 2015: Jim Crocker, Lockheed Martin
Astronauts and Robots 2015: Jim Crocker, Lockheed Martin
 
Jody.singer
Jody.singerJody.singer
Jody.singer
 
NASA SPACE LAUNCH SYSTEM -A Complete Guide
NASA SPACE LAUNCH SYSTEM -A Complete GuideNASA SPACE LAUNCH SYSTEM -A Complete Guide
NASA SPACE LAUNCH SYSTEM -A Complete Guide
 
Space Launch System (SLS)
Space Launch System (SLS)Space Launch System (SLS)
Space Launch System (SLS)
 
Youtube
Youtube Youtube
Youtube
 
Purpose; corporate revolution or corporate bandwagon?
Purpose; corporate revolution or corporate bandwagon?Purpose; corporate revolution or corporate bandwagon?
Purpose; corporate revolution or corporate bandwagon?
 
Saraiva, 2016 educação a distãncia no brasil lições da história
Saraiva, 2016 educação a distãncia no brasil lições da históriaSaraiva, 2016 educação a distãncia no brasil lições da história
Saraiva, 2016 educação a distãncia no brasil lições da história
 
Report - Distance Counter
Report - Distance CounterReport - Distance Counter
Report - Distance Counter
 
Hoogbergen 28 jan 2010 2/3
Hoogbergen 28 jan 2010 2/3Hoogbergen 28 jan 2010 2/3
Hoogbergen 28 jan 2010 2/3
 
Steve Majors Teg
Steve Majors TegSteve Majors Teg
Steve Majors Teg
 
Mercado laboral y empleo
Mercado laboral y empleoMercado laboral y empleo
Mercado laboral y empleo
 

Similaire à Towards Automated Design Space Exploration and Code Generation using Type Transformations

Top 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisTop 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & Analysis
NomanSiddiqui41
 
Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environment
Ericsson
 

Similaire à Towards Automated Design Space Exploration and Code Generation using Type Transformations (20)

Summer training vhdl
Summer training vhdlSummer training vhdl
Summer training vhdl
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
 
Summer training vhdl
Summer training vhdlSummer training vhdl
Summer training vhdl
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
PEARC17: Interactive Code Adaptation Tool for Modernizing Applications for In...
PEARC17: Interactive Code Adaptation Tool for Modernizing Applications for In...PEARC17: Interactive Code Adaptation Tool for Modernizing Applications for In...
PEARC17: Interactive Code Adaptation Tool for Modernizing Applications for In...
 
Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...Performance Characterization and Optimization of In-Memory Data Analytics on ...
Performance Characterization and Optimization of In-Memory Data Analytics on ...
 
Lecture syn 024.cpld-fpga
Lecture syn 024.cpld-fpgaLecture syn 024.cpld-fpga
Lecture syn 024.cpld-fpga
 
Top 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisTop 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & Analysis
 
1570514051.pptx
1570514051.pptx1570514051.pptx
1570514051.pptx
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Reconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
Reconfigurable Coprocessors Synthesis in the MPEG-RVC DomainReconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
Reconfigurable Coprocessors Synthesis in the MPEG-RVC Domain
 
0507036
05070360507036
0507036
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Scolari's ICCD17 Talk
Scolari's ICCD17 TalkScolari's ICCD17 Talk
Scolari's ICCD17 Talk
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 
Parallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMPParallelization of Coupled Cluster Code with OpenMP
Parallelization of Coupled Cluster Code with OpenMP
 
Conference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environmentConference Paper: Universal Node: Towards a high-performance NFV environment
Conference Paper: Universal Node: Towards a high-performance NFV environment
 
Embedded systems-unit-1
Embedded systems-unit-1Embedded systems-unit-1
Embedded systems-unit-1
 

Dernier

Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Dr.Costas Sachpazis
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Vivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design SpainVivazz, Mieres Social Housing Design Spain
Vivazz, Mieres Social Housing Design Spain
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 

Towards Automated Design Space Exploration and Code Generation using Type Transformations

  • 1. GPRM Towards Automated Design Space Exploration and Code Generation using Type Transformations S Waqar Nabi & Wim Vanderbauwhede First International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'15) Sunday, November 15, 2015 Austin, TX www.tytra.org.uk
  • 2. Using Safe Transformations and a Cost-Model for HPC on FPGAs  The TyTra project context o Our approach, blue-sky target, down-to-earth target, where we are now, how we are different  Key contributions o (1) Type transformations to create design-variants, (2) a new Intermediate Language, and (3) an FPGA Cost model  The cost model o Performance and resource-usage estimates, some results Using safe transformations and an associated light-weight cost-model opens the route to a fully automated design-space exploration flow
  • 3. THE CONTEXT Our approach, blue-sky target, down-to-earth target, where we are now, how we are different
  • 5. Blue Sky Target Cost Model Legacy Scientific Code Heterogeneous HPC Target Description Optimized HPC solution! The goal that keeps us motivated! ( The pragmatic target is somewhat more modest…)
  • 6. The Short-Term Target Our focus is on FPGA targets, and we currently require design entry in a Functional Language using High-Level Functions ( maps, folds) [ a kind of DSL]
  • 7. The cunning plan… 1. Use the functional programming paradigm to (auto) generate program-variants which translate to design-variants on the FPGA. 2. Create an Intermediate Language that: • Is able to capture points entire design-space • Allows a light-weight cost-model to be built around it • Is a convenient target for front-end compiler 3. Create a light-weight cost-model that can estimate the performance and resource-utilization for each variant. 7 A performance portable code-base that builds on a purely software programming paradigm.
  • 8. And you may very well ask… 8 The jury is still out…
  • 9. How our work is different  Our observations on limitations of current tools and flows: 1. Design-entry in a custom high-level language which nevertheless has hardware-specific semantics 2. Architecture of the FPGA-solution specified by programmer; compilers cannot optimize it. 3. Solutions create soft-processors on the FPGA; not optimized for HPC (orientation towards embedded applications) 4. Design-space exploration requires prohibitively long time 5. Compiler is application specific (e.g. DSP applications) We are not there yet, but in principle, our approach entirely eliminates the first four, and mitigates the fifth.
  • 10. KEY CONTRIBUTIONS (1) Type transformations for generating program variants, (2) a new Intermediate Language, and (3) a light-weight Cost Model
  • 11. 1. Type Transformations to Generate Program Variants  Functional Programming  Types o More general than types in C o Our focus is on types of functions that perform array operations o reshape, maps and folds  Type transformations o Can be derived automatically o Provably correct o Essentially reshape the arrays A functional paradigm with high-level functions allows creation of design-variants that are correct-by-construction.
  • 12. Illustration of Variant Generation through Type-Transformation • typeA :Vect (im*jm*km) dataType --1D data • Single execution thread • typeB :Vect km (Vect im*jm dataType)--transformed 2D data • (km concurrent execution threads) • output = mappipe kernel_func input --original program • inputTr = reshapeTo km input --reshaping data • output = mappar (mappipe kernel_func) inputTr --new program Simple and provably correct transformations in a high-level functional language translates to design-variants on the FPGA.
  • 13. • Manage-IR • Deals with • memory objects (arrays) • streams (loops over arrays) • offset streams • loops over work-unit • block-memory transfers  Strongly and statically typed  All computations expressed as SSA (Single-Static Assignments)  Largely (and deliberately) based on the LLVM-IR • Compute-IR • Streaming model • SSA instructions define the datapath 2. A New Intermediate Language
  • 14. 2. A New Intermediate Language
  • 15. Design Space Estimation Space The Cost Model 3. Cost Model
  • 16. THE FPGA COST-MODEL Performance Estimate, Resource-utilization estiamte, Experimental Results
  • 17. The Cost-Model Use-Case 17 A set of standardized experiments feed target-specific empirical data to the cost model, and the rest comes from the IR descripition.
  • 18. Two Types of Estimates  Resource-Utilization Estimates o ALUTs, REGs, DSPs  Performance Estimates o Estimating memory-access bandwidth for specific data patterns o Estimating FPGA operating frequency 18 Both estimates needed to allow compiler to choose the best design variant.
  • 19. 1. Resource Estimates  Observation o Regularity of FPGA fabric allows some very simple first or second order expressions to be built up for most instructions based on a few experiments.  Key Determinants o Primitive (SSA) instructions used in IR of the kernel functions o Data-types o Structure of various functions (par, comb, par, seq) o Control logic over-head 19 A set of one-time simple synthesis experiments on the target device helps us create a very accurate resource-utilization cost model
  • 20. Resource Estimates - Example 20 Integer Division Integer Multiplication Light-weight cost expressions associated with every legal SSA instruction in the TyTra-IR
  • 21. 2. Performance Estimate  Effective Work-Unit Throughput (EWUT) o Work-Unit = Executing the kernel over the entire index-space  Key Determinants o Memory execution model o Sustained memory bandwidth for the target architecture and design- variant • Data-access pattern o Design configuration of the FPGA o Operating frequency of the FPGA o Compute-bound or IO-bound? 21 Performance model is trickier, especially calculating estimates of sustained memory bandwidth and FPGA operating frequency.
  • 22. 2. Performance Estimate  Effective Work-Unit Throughput (EWUT) o Work-Unit = Executing the kernel over the entire index-space  Key Determinants o Memory execution model o Sustained memory bandwidth for the target architecture and design- variant • Data-access pattern o Design configuration of the FPGA o Operating frequency of the FPGA o Compute-bound or IO-bound? 22 Performance model is trickier, especially calculating estimates of sustained memory bandwidth and FPGA operating frequency.
  • 23. Performance Estimate Dependence on Memory Execution Model Time Activity Host  Device-DRAM Device-DRAM  Device-Buffers Device-Buffers  Offset-Buffers Kernel Pipeline Execution Three Types of memory executions A given design-variant can be categorized based on: - Architectural description - IR description
  • 24. Performance Estimate Dependence on Memory Execution Model Time Activity Host  Device-DRAM Device-DRAM  Device-Buffers Device-Buffers  Offset-Buffers Kernel Pipeline Execution Three Types of memory executions A given design-variant can be categorized based on: - Architectural description - IR description
  • 25. Performance Estimate Dependence on Memory Execution Model Time Activity Host  Device-DRAM Device-DRAM  Device-Buffers Device-Buffers  Offset-Buffers Kernel Pipeline Execution Work-Unit Iterations Type A All iterations
  • 26. Performance Estimate Dependence on Memory Execution Model Time Activity Host  Device-DRAM Device-DRAM  Device-Buffers Device-Buffers  Offset-Buffers Kernel Pipeline Execution First Iteration only Last Iteration only Work-Unit Iterations Type B All other iterations
  • 27. Performance Estimate Dependence on Memory Execution Model Time Activity Host  Device-DRAM Device-DRAM  Device-Buffers Device-Buffers  Offset-Buffers Kernel Pipeline Execution First Iteration only Last Iteration only Work-Unit Iterations Type C All other iterations Once a design-variant is categorized, performance can be estimated accordingly
  • 28. 2. Performance Estimate  Effective Work-Unit Throughput (EWUT) o Work-Unit = Executing the kernel over the entire index-space  Key Determinants o Memory execution model o Sustained memory bandwidth for the target architecture and design-variant • Data-access pattern o Design configuration of the FPGA o Operating frequency of the FPGA o Compute-bound or IO-bound? 28 Performance model is trickier, especially calculating estimates of sustained memory bandwidth and FPGA operating frequency.
  • 29. Performance Estimate Dependence on Data Access Pattern  We have defined a rho (ρ) factor defined as a scaling factor of the peak memory bandwidth  Varies from 0-1  Based on data-access pattern  Derived empirically through one-time standardized experiments on target node 29
  • 30. 2. Performance Estimate  Effective Work-Unit Throughput (EWUT) o Work-Unit = Executing the kernel over the entire index-space  Key Determinants o Memory execution model o Sustained memory bandwidth for the target architecture and design- variant • Data-access pattern o Design configuration of the FPGA o Operating frequency of the FPGA o Compute-bound or IO-bound? 30 Performance model is trickier, especially calculating estimates of sustained memory bandwidth and FPGA operating frequency. Determined from the IR description of design-variant
  • 31. Performance Estimates The Parameters and their Evaluation 31
  • 32. Performance Estimates Parameters from Architecture Description 32
  • 34. Performance Estimates Parameters derived from IR description of Kernel 34
  • 41. Performance Estimates Experimental Results (Type C) 41 Estimated (E) vs actual (A) cost and throughput for C2 and C1 configurations of a successive over-relaxation kernel. [Note that the cycles/kernel are estimated very accurately, but the Effective Workgroup Throughput (EWGT) is off because of inaccuracy of frequency estimate for the FPGA]
  • 45. The route to Automated Design Space Exploration on FPGAs for HPC Applications  The larger aim is to create a turn-key compiler for: Legacy scientific code  Heterogeneous HPC Platform o Current focus is on FPGAs, and on using a Functional Language design entry  Our main contributions are: o Type transformations to create design-variants, o New Intermediate Language, and o FPGA Cost model  Our FPGA Cost Model o Works on the TyTra-UIR, is light-weight, accurate (enough), and allows us to evaluate design-variants Using safe transformations on a functional language paradigm and a light-weight cost-model to brings us closer to a turn-key HPC compiler for legacy code
  • 46. 46 Acknowledgement We wish to acknowledge support by EPSRC through grant EP/L00058X/1. The woods are lovely, dark and deep, But I havepromises to keep, And lines to code before I sleep, And lines to code before I sleep.