Design of Programmable Accelerators for SoCs
Gert Goossens, CEO, Target Compiler Technologies
Abstract
Agenda
SoC Design
How to Design ASIPs?
[Table: Design step / Benefits]
Tool Comparison
Approach: Retargetable ASIP design tools
  Example vendors: Target (IP Designer), CoWare (Processor Designer)
  Architectural style: Flexible, using processor description language
  Remaining columns (headers not recovered): Yes | High | Yes | EDA license
Approach: Configurable ASIP templates
  Example vendors: Tensilica, ARC, ASIP Solutions, SiliconHive
  Architectural style: Configurable ASIP template + extension instructions
  Remaining columns (headers not recovered): Yes | Low (within template boundaries) | Yes | Royalties
Approach: High-level synthesis from C
  Example vendors: Mentor (CatapultC), Forte, Synfora, Cadence (C2S)
  Architectural style: Hardwired datapath, no programmability
  Remaining columns (headers not recovered): No | High | Depends on tool | EDA license
(*) No strong focus for CoWare?
Programmable Datapath Examples
[Figure legend: Examples shown; Served by IP Designer]
What is a Programmable Datapath?
[Figure labels: SEQ, PM, DEC; s0, s1, s2]
d+=(a+b)*c;
g+=(e-f)*f;
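The two statements on this slide share one shape: an add or subtract, then a multiply, then an accumulate. As a rough illustration only, the Python sketch below models a three-step datapath whose first step is selected by the program. Mapping these steps onto the s0, s1, s2 labels of the figure is an assumption, and the name `datapath` is hypothetical.

```python
import operator

# Operations selectable in the first step (assumed from the two example statements).
STEP0_OPS = {"add": operator.add, "sub": operator.sub}


def datapath(step0_op, x, y, z, acc):
    """Compute acc + (x op y) * z, one step at a time."""
    t0 = STEP0_OPS[step0_op](x, y)  # step 0: add or subtract
    t1 = t0 * z                     # step 1: multiply
    return acc + t1                 # step 2: accumulate


# d += (a + b) * c   corresponds to   d = datapath("add", a, b, c, d)
# g += (e - f) * f   corresponds to   g = datapath("sub", e, f, f, g)
```

Both statements run on the same three steps; only the selected first-step operation and the operands differ.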
Prog. Datapath Example: WLAN
[Figure labels: Matrix inversion; Matrix inversion + Address computations; Address computations; Complex conjugate; Square modulus]
[1] Medea+ project “Uppermost”
Prog. Datapath Example: WLAN
[Figure: Channel Estimation ASIP with Common Program Control, four GMAC units (GMAC 0 to GMAC 3) and four Dual Port Memories]
Prog. Datapath Example: WLAN
Resources:
    reg  R[8] <vcmpl> read(tR0, tR1, tR2, tR3, tR4, tR5);
    reg  ACC <vcmpl>;
    pipe P0 <vcmpl>;
    pipe P1 <vcmpl>;
    trn  tC0 <vcmpl>;
    trn  tC1 <vcmpl>;
    trn  tM0 <vcmpl>;
    trn  tM1 <vcmpl>;
Instruction-set grammar:
    enum gmac_op {mpy_mpy_mac, mac, sq_sq_mac, minv, ...};
    opn gmac(g:gmac_op, r0:c3, r1:c3, r2:c3, r3:c3, r4:c3, r5:c3) {
      action {
        stage E1:
          switch (g) {
            case mpy_mpy_mac:
              tC0 = ccnj(tR2 = R[r2]);
              P0 = cmpy(tR1 = R[r1], tC0);
              tC1 = ccnj(tR3 = R[r3]);
              P1 = cmpy(tR4 = R[r4], tC1);
            case mac:
              P0 = tR0 = R[r0];
              P1 = tR5 = R[r5];
            case sq_sq_mac:
              P0 = cmpy(tR1 = R[r1], tR2 = R[r1]);
              P1 = cmpy(tR4 = R[r4], tR3 = R[r4]);
            case minv:
              P0 = tR0 = R[r0];
              tM0 = cmpy(tR1 = R[r1], tR2 = R[r2]);
              tM1 = cmpy(tR4 = R[r4], tR3 = R[r3]);
              P1 = csub(tM0, tM1);
            case ...
          }
        stage E2:
          tM = cmpy(P0, P1);
          ACC = cadd(tM, ACC);
      }
    }
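To make the data flow of the `gmac` operation easier to follow, here is a small behavioral sketch in Python. It assumes that `cmpy`, `ccnj`, `csub` and `cadd` denote complex multiply, conjugate, subtract and add, which their names suggest but the slide does not state. It ignores pipeline timing, the transitories, and whatever fixed-point or vector format `<vcmpl>` stands for. The function name `gmac_model` is hypothetical, and the elided `...` cases are not modeled.

```python
def gmac_model(op, R, acc, r0, r1, r2, r3, r4, r5):
    """Return the new accumulator value after one gmac operation.

    R is a list of 8 complex register values; r0..r5 are register indices.
    """
    # Stage E1: form the two pipeline values P0 and P1 according to the mode.
    if op == "mpy_mpy_mac":
        p0 = R[r1] * R[r2].conjugate()
        p1 = R[r4] * R[r3].conjugate()
    elif op == "mac":
        p0 = R[r0]
        p1 = R[r5]
    elif op == "sq_sq_mac":
        p0 = R[r1] * R[r1]
        p1 = R[r4] * R[r4]
    elif op == "minv":
        p0 = R[r0]
        p1 = R[r1] * R[r2] - R[r4] * R[r3]
    else:
        raise ValueError(f"mode not modeled: {op}")
    # Stage E2: multiply the two pipeline values and accumulate.
    return acc + p0 * p1


# Example register contents (arbitrary values chosen for illustration).
REGS = [1 + 2j, 3 - 1j, 2 + 0j, 1j, 1 + 1j, 2 - 2j, 0j, 0j]
```

In every mode the second stage is the same multiply-accumulate; the modes differ only in how the first stage prepares its two operands, which is what lets one datapath serve several kernels.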
Prog. Datapath Example: WLAN
[Figure: compilation flow. Inputs: Application (C), Processor model (nML). Blocks: C front-end, nML front-end, CDFG, ISG, and the compilation engine (phase coupling) with source-level transformations, code selection, register allocation, scheduling and code emission. Output: Machine code (Elf / Dwarf).]
Prog. Datapath Example: FFT
[Figure labels: Mdata, Mcoef, A[4], B[4], CMPY, BFLY; ld A/B, ld C, st A/B]
Prog. Datapath Example: FFT
    /* 0 */   DO cnt,LE
    /* 1 */   /* delay slot */
    /* 2 */   md=*pa(next_bfly) | *pb(+s)=b1 | mc=*pr(next_bfly_rdx4) | a2=md*mc | b3,b2=bfly(a2,a3)
    /* 3 */   md=*pa(+s) | *pb(+s)=b3 | mc=*pr(+s) | a3=md*mc | b1,a2=bfly(a1,a2)
    /* 4 */   md=*pa(+s) | *pb(+s)=b0 | mc=*pr(+s) | a1=md*mc | b0,a3=bfly(a0,a3)
    /* 5 */   md=*pa(+s) | *pb(next_bfly)=b2 | mc=*pr(+s) | a0=md*mc | b1,b0=bfly(b1,b0)
[Figure: operation schedule showing LDA, LDC, MPY, BFLY and STB]
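Each instruction word in the loop above combines a data load (`md`), a coefficient load (`mc`), a multiply and a `bfly` step. The Python sketch below shows that load, multiply, butterfly pattern in a plain recursive radix-2 FFT. It is a simplified illustration, not the kernel on the slide: the slide's addressing (`next_bfly_rdx4`) suggests a radix-4 structure with a hand-scheduled loop, and the assumption that `bfly` returns a sum and a difference is mine.

```python
import cmath


def bfly(a, b):
    """Butterfly: return the sum and the difference of the two inputs (assumed)."""
    return a + b, a - b


def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        mc = cmath.exp(-2j * cmath.pi * k / n)  # coefficient (twiddle factor)
        md = odd[k]                             # data operand
        a = md * mc                             # multiply step
        out[k], out[k + n // 2] = bfly(even[k], a)  # butterfly step
    return out
```

For example, `fft([1, 0, 0, 0])` yields four values equal to 1, and `fft([1, 1, 1, 1])` yields 4 followed by three values that are zero up to rounding.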
Prog. Datapath Example: FFT
[Figure: compilation flow. Inputs: Application (C), Processor model (nML). Blocks: C front-end, nML front-end, CDFG, ISG, and the compilation engine (phase coupling) with source-level transformations, code selection, register allocation, scheduling and code emission. Output: Machine code (Elf / Dwarf).]
Conclusion

Chip Ex2010 Gert Goossens