SlideShare une entreprise Scribd logo
1  sur  13
Łukasz Schodowski
Power Systems Client Technical Specialist
IBM Power Systems
FPGA and CAPI in OpenPower
© 2015 IBM Corporation 2
FPGA - Field Programmable Gate Array
• It’s not a microprocessor
• It’s not an ASIC chip
• It’s not a CPU
• But it can be all of them
© 2015 IBM Corporation
FPGA as an Accelerator
• FPGA: Field Programmable Gate Array
– It’s a re-programmable chip
– It can run fast (cycle times of 250 – 500 Mhz or more)
– It has Industry Standard Interfaces like PCI-E Gen3
– The Major FPGA Suppliers, Altera and Xilinx,
are OpenPOWER Foundation members
3
FPGA
gzip Encrypt
Monte
Carlo
PCIE
FPGA Library
Source code for FPGAs has traditionally
been written in RTL* (VHDL** or Verilog).
Now, we also have OpenCL, a more
programmer friendly language.
*RTL = Register Transfer Level
**VHDL = VHSIC*** Hardware Description Language
***VHSIC = Very High Speed Integrated Circuit
Why FPGAs?
© 2015 IBM Corporation
Why is an Accelerator Faster?
4
FPGAPCIE
Question: The POWER8 Processor runs at ~3-4 Ghz while our
FPGA runs at 250Mhz. So why would an accelerator
be better?
Answer: The FPGA is better for certain algorithms, such as
those that are numerical intensive or have parallelism.
The POWER8 processor has a finite set of instructions
to implement the algorithm in SW.
The FPGA is customized logic built for specific
processing of an algorithm.
Why FPGAs?
© 2015 IBM Corporation
Why is an Accelerator Faster?
5
FPGAPCIE
Example 1: Numerical Intensive Algorithm
FPGA
sin cos
x+
∑
+
∫
Integral ()
Sigma ()
Sin ()
Cos ()
Main
(n,a,v,w)
SW
𝑛 𝑛
𝑎 𝑎
𝑣 𝑣
𝑤 𝑤
Variables
Done!Done!
Why FPGAs?
ò P(x)dx = a0 + (an cos
npv
L
+an sin
npw
L
)
n=1
k
å
© 2015 IBM Corporation
Why is an Accelerator Faster?
6
FPGAPCIE
Example 2: Parallelism
FPGA
Monte Carlo Risk Analysis to determine
probability of financial success:
Given current finances, run 100 scenarios
Variable distributor
Engine1
Engine2
Engine3
Engine4
Engine5
Engine6
Engine7
Engine8
Engine9
Engine50
Results Accumulator
Monte
Main
(Vars)
SW
Variables Variables
50510 100
Why FPGAs?
© 2014 International Business Machines Corporation
Workload Acceleration
Services Delivery Model
Advanced Memories
Optimized System Design
Custom SOC’s
Some Example Use Cases
System stack innovations are
required to drive cost/performance
Processors
Semiconductor Technology
Applications and services
Firmware, Operating System
and Hypervisor
System Stack
Systems Management &
Cloud Deployment
Systems Acceleration &
HW/SW Optimization
Power8 Invents CAPI – Coherent Accelerator
Processor Interface
CAPP
PCIe
Power Processor
CAPI over
PCIe
Coherently Attached
Device
• Coherent Attached Processor Proxy (CAPP) in processor
– Unit on processor that extends coherency to an attached device
– On processor directory responds on behalf of off-chip device
(Filtering snoops)
• Coherency protocol tunneled over standard PCIe
– Eliminates the need for special I/Os and protocol logic
CAPI utilizes standard Posted Write and Non-posted Reads
– Reduces the complexity and bandwidth requirements of the
attached device
• Enables attached device to be a peer to the processor
– Simplifies programming model between application
– Enables device to use same effective address as application
running in processor
– Eliminates the cumbersome I/O Device Driver requirements
Pinned memory not required
9© 2015 IBM Corporation
Memory Subsystem
Virt Addr
What was done before CAPI?
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
App
FPGAPCIE
VariablesInput
Data
DD
Device Driver
Storage Area
Variables
Input
Data
Variables
Input
Data
Output
Data
Output
Data
Prior to CAPI, an application called a device driver to utilize an
FPGA Accelerator.
The device driver performed a memory mapping operation.
3 versions of the data (not coherent).
1000s of instructions in the device driver.
CAPI Coherency
Vs. IO Model
10© 2015 IBM Corporation
Memory Subsystem
Virt Addr
CAPI Coherency
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
App
FPGA
PCIE
With CAPI, the FPGA shares memory with the cores
PSL
Variables
Input
Data
Output
Data
1 coherent version of the data.
No device driver call/instructions.
CAPI Coherency
Vs. IO Model
11© 2015 IBM Corporation
Typical I/O Model Flow:
Flow with a Coherent Model:
Shared Mem.
Notify Accelerator
Acceleration
Shared Memory
Completion
DD Call
Copy or Pin
Source Data
MMIO Notify
Accelerator
Acceleration
Poll / Interrupt
Completion
Copy or Unpin
Result Data
Ret. From DD
Completion
Application
Dependent, but
Equal to below
Application
Dependent, but
Equal to above
300 Instructions 10,000 Instructions 3,000 Instructions
1,000 Instructions
1,000 Instructions
7.9µs 4.9µs
Total ~13µs for data prep
400 Instructions 100 Instructions
0.3µs 0.06µs
Total 0.36µs
CAPI vs. I/O Device Driver: Data Prep
CAPI Coherency
Vs. IO Model
12© 2015 IBM Corporation
Memory Subsystem
Redis asks for data through Block API.
Data source is either memory subsystem or
IBM Flash. POWER8 manages it.
Virt Addr
IBM Data Engine for NoSQL
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
POWER8
Core
PCIE
PSL
IBM
Data
Engine
Block
API
IBM Flash
Acceptable
latency
CAPI
Memory
Conventional PCIe I/O
network
network
network
13© 2015 IBM Corporation
POWER8 with CAPI Cards
POWER8 Modules
CAPI Dev Kit Cards
Front View
Side View
How CAPI
Works

Contenu connexe

Tendances

ScilabTEC 2015 - Xilinx
ScilabTEC 2015 - XilinxScilabTEC 2015 - Xilinx
ScilabTEC 2015 - XilinxScilab
 
Host Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsHost Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsNetronome
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2Yutaka Kawai
 
ODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Workgroup
 
Embedded Recipes 2018 - SoC+FPGA update in 2018 - Marek Vasut
Embedded Recipes 2018 - SoC+FPGA update in 2018 - Marek VasutEmbedded Recipes 2018 - SoC+FPGA update in 2018 - Marek Vasut
Embedded Recipes 2018 - SoC+FPGA update in 2018 - Marek VasutAnne Nicolas
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Verification Automation Using IPXACT
Verification Automation Using IPXACTVerification Automation Using IPXACT
Verification Automation Using IPXACTDVClub
 
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation SystemSynopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation SystemMostafa Khamis
 
Semi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V coresSemi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V coresRISC-V International
 
VLSID_2015_DSE_HMP_v3
VLSID_2015_DSE_HMP_v3VLSID_2015_DSE_HMP_v3
VLSID_2015_DSE_HMP_v3Santanu Sarma
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Aheadinside-BigData.com
 
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53KarthiSugumar
 
Introduction to fpga synthesis tools
Introduction to fpga synthesis toolsIntroduction to fpga synthesis tools
Introduction to fpga synthesis toolsHossam Hassan
 

Tendances (20)

OpenCAPI Technology Ecosystem
OpenCAPI Technology EcosystemOpenCAPI Technology Ecosystem
OpenCAPI Technology Ecosystem
 
ScilabTEC 2015 - Xilinx
ScilabTEC 2015 - XilinxScilabTEC 2015 - Xilinx
ScilabTEC 2015 - Xilinx
 
Host Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment ModelsHost Data Plane Acceleration: SmartNIC Deployment Models
Host Data Plane Acceleration: SmartNIC Deployment Models
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
00 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver200 opencapi acceleration framework yonglu_ver2
00 opencapi acceleration framework yonglu_ver2
 
ODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & FeedsODSA Proof of Concept SmartNIC Speeds & Feeds
ODSA Proof of Concept SmartNIC Speeds & Feeds
 
Embedded Recipes 2018 - SoC+FPGA update in 2018 - Marek Vasut
Embedded Recipes 2018 - SoC+FPGA update in 2018 - Marek VasutEmbedded Recipes 2018 - SoC+FPGA update in 2018 - Marek Vasut
Embedded Recipes 2018 - SoC+FPGA update in 2018 - Marek Vasut
 
@IBM Power roadmap 8
@IBM Power roadmap 8 @IBM Power roadmap 8
@IBM Power roadmap 8
 
An Update on Arm HPC
An Update on Arm HPCAn Update on Arm HPC
An Update on Arm HPC
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Fpga vs asic
Fpga vs asicFpga vs asic
Fpga vs asic
 
Verification Automation Using IPXACT
Verification Automation Using IPXACTVerification Automation Using IPXACT
Verification Automation Using IPXACT
 
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation SystemSynopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
Semi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V coresSemi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V cores
 
VLSID_2015_DSE_HMP_v3
VLSID_2015_DSE_HMP_v3VLSID_2015_DSE_HMP_v3
VLSID_2015_DSE_HMP_v3
 
EMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road AheadEMC in HPC – The Journey so far and the Road Ahead
EMC in HPC – The Journey so far and the Road Ahead
 
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
Architecture Exploration of RISC-V Processor and Comparison with ARM Cortex-A53
 
Introduction to fpga synthesis tools
Introduction to fpga synthesis toolsIntroduction to fpga synthesis tools
Introduction to fpga synthesis tools
 

En vedette

PORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTO
PORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTOPORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTO
PORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTORAPPER PIRATA
 
Kennedy Veteran Mentor Certificate
Kennedy Veteran Mentor CertificateKennedy Veteran Mentor Certificate
Kennedy Veteran Mentor CertificateAlan Kennedy
 
Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)
Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)
Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)Karina Lucio
 
Project Listing Rev.08.2016
Project Listing Rev.08.2016Project Listing Rev.08.2016
Project Listing Rev.08.2016Mark Selig
 
TEDxZapopan 2012 - Convocatoria Audiencia
TEDxZapopan 2012 - Convocatoria AudienciaTEDxZapopan 2012 - Convocatoria Audiencia
TEDxZapopan 2012 - Convocatoria AudienciaInterlubGroup
 
IBM CAPI:概要 (An overview of IBM CAPI)
IBM CAPI:概要 (An overview of IBM CAPI)IBM CAPI:概要 (An overview of IBM CAPI)
IBM CAPI:概要 (An overview of IBM CAPI)Mr. Vengineer
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
презентация «Использование игровых проемов для развития связной речи у дошкол...
презентация «Использование игровых проемов для развития связной речи у дошкол...презентация «Использование игровых проемов для развития связной речи у дошкол...
презентация «Использование игровых проемов для развития связной речи у дошкол...Валерия Кулеш
 

En vedette (15)

Daftar pustaka bhb
Daftar pustaka bhbDaftar pustaka bhb
Daftar pustaka bhb
 
Prd 2
Prd 2Prd 2
Prd 2
 
PORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTO
PORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTOPORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTO
PORQUE A PREFEITURA TEM QUE DESCONGELAR O ORÇAMENTO
 
Sem wizard Sole Case Study
Sem wizard Sole Case StudySem wizard Sole Case Study
Sem wizard Sole Case Study
 
Kennedy Veteran Mentor Certificate
Kennedy Veteran Mentor CertificateKennedy Veteran Mentor Certificate
Kennedy Veteran Mentor Certificate
 
Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)
Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)
Fundamentos de Sistemas de Base de Datos (Capítulo 7 y 8)
 
Project Listing Rev.08.2016
Project Listing Rev.08.2016Project Listing Rev.08.2016
Project Listing Rev.08.2016
 
TEDxZapopan 2012 - Convocatoria Audiencia
TEDxZapopan 2012 - Convocatoria AudienciaTEDxZapopan 2012 - Convocatoria Audiencia
TEDxZapopan 2012 - Convocatoria Audiencia
 
9 класс
9 класс9 класс
9 класс
 
Desastres naturales
Desastres naturalesDesastres naturales
Desastres naturales
 
Plan de marketing 2015 final
Plan de marketing 2015 finalPlan de marketing 2015 final
Plan de marketing 2015 final
 
IBM CAPI:概要 (An overview of IBM CAPI)
IBM CAPI:概要 (An overview of IBM CAPI)IBM CAPI:概要 (An overview of IBM CAPI)
IBM CAPI:概要 (An overview of IBM CAPI)
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
презентация «Использование игровых проемов для развития связной речи у дошкол...
презентация «Использование игровых проемов для развития связной речи у дошкол...презентация «Использование игровых проемов для развития связной речи у дошкол...
презентация «Использование игровых проемов для развития связной речи у дошкол...
 
PI Daimler Mobility Services.pdf
PI Daimler Mobility Services.pdfPI Daimler Mobility Services.pdf
PI Daimler Mobility Services.pdf
 

Similaire à FPGA MeetUp

Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialGanesan Narayanasamy
 
Announcing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAsAnnouncing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAsAmazon Web Services
 
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET Journal
 
FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...
FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...
FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...Amazon Web Services
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator Ganesan Narayanasamy
 
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...Amazon Web Services
 
[Café techno] - Ibm power7 - Les dernières annonces
[Café techno] - Ibm power7 - Les dernières annonces[Café techno] - Ibm power7 - Les dernières annonces
[Café techno] - Ibm power7 - Les dernières annoncesGroupe D.FI
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
Approaching hyperconvergedopenstack
Approaching hyperconvergedopenstackApproaching hyperconvergedopenstack
Approaching hyperconvergedopenstackIkuo Kumagai
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERAchronix
 
02 ai inference acceleration with components all in open hardware: opencapi a...
02 ai inference acceleration with components all in open hardware: opencapi a...02 ai inference acceleration with components all in open hardware: opencapi a...
02 ai inference acceleration with components all in open hardware: opencapi a...Yutaka Kawai
 
Open power topics20191023
Open power topics20191023Open power topics20191023
Open power topics20191023Yutaka Kawai
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Cesar Maciel
 
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistOWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistParis Open Source Summit
 
IBM Power Systems Announcement Update
IBM Power Systems Announcement UpdateIBM Power Systems Announcement Update
IBM Power Systems Announcement UpdateDavid Spurway
 
IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014Angel Villar Garea
 

Similaire à FPGA MeetUp (20)

Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER TutorialSCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
SCFE 2020 OpenCAPI presentation as part of OpenPWOER Tutorial
 
Announcing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAsAnnouncing Amazon EC2 F1 Instances with Custom FPGAs
Announcing Amazon EC2 F1 Instances with Custom FPGAs
 
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
IRJET- A Review- FPGA based Architectures for Image Capturing Consequently Pr...
 
FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...
FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...
FPGA Accelerated Computing Using Amazon EC2 F1 Instances - CMP308 - re:Invent...
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
Accelerate Your C/C++ Applications with Amazon EC2 F1 Instances (CMP405) - AW...
 
[Café techno] - Ibm power7 - Les dernières annonces
[Café techno] - Ibm power7 - Les dernières annonces[Café techno] - Ibm power7 - Les dernières annonces
[Café techno] - Ibm power7 - Les dernières annonces
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
Approaching hyperconvergedopenstack
Approaching hyperconvergedopenstackApproaching hyperconvergedopenstack
Approaching hyperconvergedopenstack
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWER
 
02 ai inference acceleration with components all in open hardware: opencapi a...
02 ai inference acceleration with components all in open hardware: opencapi a...02 ai inference acceleration with components all in open hardware: opencapi a...
02 ai inference acceleration with components all in open hardware: opencapi a...
 
OpenDataPlane Project
OpenDataPlane ProjectOpenDataPlane Project
OpenDataPlane Project
 
Open power topics20191023
Open power topics20191023Open power topics20191023
Open power topics20191023
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems SpecialistOWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
OWF14 - Plenary Session : Thibaud Besson, IBM POWER Systems Specialist
 
IBM Power Systems Announcement Update
IBM Power Systems Announcement UpdateIBM Power Systems Announcement Update
IBM Power Systems Announcement Update
 
IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014IBM System Networking Portfolio Update, June 2014
IBM System Networking Portfolio Update, June 2014
 
IBM PureSystems
IBM PureSystemsIBM PureSystems
IBM PureSystems
 
IBM POWER8 as an HPC platform
IBM POWER8 as an HPC platformIBM POWER8 as an HPC platform
IBM POWER8 as an HPC platform
 

Dernier

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Dernier (20)

Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

FPGA MeetUp

  • 1. Łukasz Schodowski Power Systems Client Technical Specialist IBM Power Systems FPGA and CAPI in OpenPower
  • 2. © 2015 IBM Corporation 2 FPGA - Field Programmable Gate Array • It’s not a microprocessor • It’s not an ASIC chip • It’s not a CPU • But it can be all of them
  • 3. © 2015 IBM Corporation FPGA as an Accelerator • FPGA: Field Programmable Gate Array – It’s a re-programmable chip – It can run fast (cycle times of 250 – 500 Mhz or more) – It has Industry Standard Interfaces like PCI-E Gen3 – The Major FPGA Suppliers, Altera and Xilinx, are OpenPOWER Foundation members 3 FPGA gzip Encrypt Monte Carlo PCIE FPGA Library Source code for FPGAs has traditionally been written in RTL* (VHDL** or Verilog). Now, we also have OpenCL, a more programmer friendly language. *RTL = Register Transfer Level **VHDL = VHSIC*** Hardware Description Language ***VHSIC = Very High Speed Integrated Circuit Why FPGAs?
  • 4. © 2015 IBM Corporation Why is an Accelerator Faster? 4 FPGAPCIE Question: The POWER8 Processor runs at ~3-4 Ghz while our FPGA runs at 250Mhz. So why would an accelerator be better? Answer: The FPGA is better for certain algorithms, such as those that are numerical intensive or have parallelism. The POWER8 processor has a finite set of instructions to implement the algorithm in SW. The FPGA is customized logic built for specific processing of an algorithm. Why FPGAs?
  • 5. © 2015 IBM Corporation Why is an Accelerator Faster? 5 FPGAPCIE Example 1: Numerical Intensive Algorithm FPGA sin cos x+ ∑ + ∫ Integral () Sigma () Sin () Cos () Main (n,a,v,w) SW 𝑛 𝑛 𝑎 𝑎 𝑣 𝑣 𝑤 𝑤 Variables Done!Done! Why FPGAs? ò P(x)dx = a0 + (an cos npv L +an sin npw L ) n=1 k å
  • 6. © 2015 IBM Corporation Why is an Accelerator Faster? 6 FPGAPCIE Example 2: Parallelism FPGA Monte Carlo Risk Analysis to determine probability of financial success: Given current finances, run 100 scenarios Variable distributor Engine1 Engine2 Engine3 Engine4 Engine5 Engine6 Engine7 Engine8 Engine9 Engine50 Results Accumulator Monte Main (Vars) SW Variables Variables 50510 100 Why FPGAs?
  • 7. © 2014 International Business Machines Corporation Workload Acceleration Services Delivery Model Advanced Memories Optimized System Design Custom SOC’s Some Example Use Cases System stack innovations are required to drive cost/performance Processors Semiconductor Technology Applications and services Firmware, Operating System and Hypervisor System Stack Systems Management & Cloud Deployment Systems Acceleration & HW/SW Optimization
  • 8. Power8 Invents CAPI – Coherent Accelerator Processor Interface CAPP PCIe Power Processor CAPI over PCIe Coherently Attached Device • Coherent Attached Processor Proxy (CAPP) in processor – Unit on processor that extends coherency to an attached device – On processor directory responds on behalf of off-chip device (Filtering snoops) • Coherency protocol tunneled over standard PCIe – Eliminates the need for special I/Os and protocol logic CAPI utilizes standard Posted Write and Non-posted Reads – Reduces the complexity and bandwidth requirements of the attached device • Enables attached device to be a peer to the processor – Simplifies programming model between application – Enables device to use same effective address as application running in processor – Eliminates the cumbersome I/O Device Driver requirements Pinned memory not required
  • 9. 9© 2015 IBM Corporation Memory Subsystem Virt Addr What was done before CAPI? POWER8 Core POWER8 Core POWER8 Core POWER8 Core POWER8 Core POWER8 Core App FPGAPCIE VariablesInput Data DD Device Driver Storage Area Variables Input Data Variables Input Data Output Data Output Data Prior to CAPI, an application called a device driver to utilize an FPGA Accelerator. The device driver performed a memory mapping operation. 3 versions of the data (not coherent). 1000s of instructions in the device driver. CAPI Coherency Vs. IO Model
  • 10. 10© 2015 IBM Corporation Memory Subsystem Virt Addr CAPI Coherency POWER8 Core POWER8 Core POWER8 Core POWER8 Core POWER8 Core POWER8 Core App FPGA PCIE With CAPI, the FPGA shares memory with the cores PSL Variables Input Data Output Data 1 coherent version of the data. No device driver call/instructions. CAPI Coherency Vs. IO Model
  • 11. 11© 2015 IBM Corporation Typical I/O Model Flow: Flow with a Coherent Model: Shared Mem. Notify Accelerator Acceleration Shared Memory Completion DD Call Copy or Pin Source Data MMIO Notify Accelerator Acceleration Poll / Interrupt Completion Copy or Unpin Result Data Ret. From DD Completion Application Dependent, but Equal to below Application Dependent, but Equal to above 300 Instructions 10,000 Instructions 3,000 Instructions 1,000 Instructions 1,000 Instructions 7.9µs 4.9µs Total ~13µs for data prep 400 Instructions 100 Instructions 0.3µs 0.06µs Total 0.36µs CAPI vs. I/O Device Driver: Data Prep CAPI Coherency Vs. IO Model
  • 12. 12© 2015 IBM Corporation Memory Subsystem Redis asks for data through Block API. Data source is either memory subsystem or IBM Flash. POWER8 manages it. Virt Addr IBM Data Engine for NoSQL POWER8 Core POWER8 Core POWER8 Core POWER8 Core POWER8 Core POWER8 Core PCIE PSL IBM Data Engine Block API IBM Flash Acceptable latency CAPI Memory Conventional PCIe I/O network network network
  • 13. 13© 2015 IBM Corporation POWER8 with CAPI Cards POWER8 Modules CAPI Dev Kit Cards Front View Side View How CAPI Works