Eugene Khvedchenia - Image processing using FPGAs

Eastern European Computer Vision Conference

This document discusses image processing using FPGAs. It begins with an overview of FPGAs and their components. It then discusses using high-level synthesis to convert C++ code to hardware designs for FPGAs. An example of implementing Sobel edge detection on an FPGA is provided. The implementation was optimized from 40 cycles per pixel to 1 cycle per pixel through pipelining, parallelism, and using block RAM for intermediate storage. Challenges discussed include limited debugging tools and steep learning curves for FPGA development.

Image processing on FPGA
Eugene Khvedchenya
https://ua.linkedin.com/in/cvtalks

Optimization pyramid
General implementation
OpenCL
Cache tuning
Multithreading
SIMD (SSE, NEON)
FPGA

What’s inside?
LUT
Flip-Flop
ALU
BRAM
IO pads
FPGA

High Level Synthesis
Converts C++ code to hardware design
HLS compiler optimizes your code for the FPGA
Automatically optimizes RTL and timing
Provides #pragmas for fine-tuning
C++ API for arbitrary-precision math
C++ API for stream data processing
Supports C++11

Things to remember
No dynamic memory allocation
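
To illustrate the point, here is a minimal sketch of a routine whose storage is sized entirely at compile time. The names `WIDTH`, `HEIGHT` and `histogram` and the chosen dimensions are hypothetical and not taken from the talk; the idea is only that fixed-size arrays replace `new`, `malloc` or `std::vector`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical image dimensions, fixed at compile time.
constexpr int WIDTH = 8;
constexpr int HEIGHT = 4;

// All buffers have a compile-time size, so a synthesis tool can map them
// to registers or BRAM. No new/delete, malloc or std::vector is used.
void histogram(const std::uint8_t image[WIDTH * HEIGHT],
               std::uint32_t bins[256]) {
    assert(image != nullptr && bins != nullptr);
    // Clear all 256 bins.
    for (int i = 0; i < 256; ++i) {
        bins[i] = 0;
    }
    // Count how often each 8-bit pixel value occurs.
    for (int i = 0; i < WIDTH * HEIGHT; ++i) {
        bins[image[i]] += 1;
    }
}
```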

Things to remember
Instantaneous BRAM access
Register-level bandwidth: 0.5 Mbit/s
BRAM bandwidth: 23 Tbit/s
Numbers above are for the Xilinx Kintex®-7 410T device

Things to remember
Single producer - single consumer

Things to remember
● No branching penalty
● No cache penalty
● No dynamic memory allocation
● Instantaneous BRAM access
● Single producer - single consumer
● Pipelining
● Task-centric approach

HLS Development cycle
1. Get baseline version
2. Write simulation test
3. Run HLS synthesis
4. Simulate
5. Validate
6. Measure
7. Optimize
8. Goto 3
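
Steps 2, 4 and 5 usually come down to comparing the design against a known-good reference. The sketch below assumes a deliberately tiny example function (absolute difference of two pixels); `abs_diff_reference`, `abs_diff_dut`, `count_mismatches` and `run_simulation_test` are hypothetical names, not part of any HLS tool.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Golden reference: the straightforward baseline version.
std::uint8_t abs_diff_reference(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(
        std::abs(static_cast<int>(a) - static_cast<int>(b)));
}

// Hypothetical design under test: the version handed to the HLS tool.
std::uint8_t abs_diff_dut(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a > b ? a - b : b - a);
}

// Exhaustive simulation test: returns the number of input pairs
// for which the design disagrees with the reference.
int count_mismatches() {
    int mismatches = 0;
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            const std::uint8_t pa = static_cast<std::uint8_t>(a);
            const std::uint8_t pb = static_cast<std::uint8_t>(b);
            if (abs_diff_dut(pa, pb) != abs_diff_reference(pa, pb)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

// Validation step: the design passes only if nothing differs.
void run_simulation_test() {
    assert(count_mismatches() == 0);
}
```

The same test can be rerun after every optimization pass, so a change that breaks the output is caught before it is measured.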

Sobel Edge Detection
Goal: process 1920x1080 images at 60 Hz

Sobel Edge Detection
Baseline implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
● Compute their absolute sum
● Write to corresponding output pixel
The FPGA clock frequency in this example is 150 MHz
To meet the 1920x1080 @ 60 Hz goal (about 124.4 million pixels per second) we must process data at a rate of 1 cycle/pixel or faster
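
The cycle budget behind that requirement can be checked with a few lines of arithmetic. This is a sketch that counts active pixels only and ignores blanking intervals; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pixels that must be produced per second for a given video mode.
std::uint64_t pixels_per_second(std::uint64_t width, std::uint64_t height,
                                std::uint64_t fps) {
    return width * height * fps;
}

// Clock cycles available per pixel at a given clock frequency.
double cycle_budget_per_pixel(std::uint64_t clock_hz, std::uint64_t width,
                              std::uint64_t height, std::uint64_t fps) {
    const std::uint64_t rate = pixels_per_second(width, height, fps);
    assert(rate > 0);
    return static_cast<double>(clock_hz) / static_cast<double>(rate);
}

// Frame rate reached by a design that needs `cycles_per_pixel` cycles per pixel.
double achievable_fps(std::uint64_t clock_hz, std::uint64_t width,
                      std::uint64_t height, double cycles_per_pixel) {
    assert(width > 0 && height > 0 && cycles_per_pixel > 0.0);
    return static_cast<double>(clock_hz) /
           (static_cast<double>(width * height) * cycles_per_pixel);
}
```

With the talk's numbers (150 MHz, 1920x1080, 60 Hz) the budget comes to roughly 1.2 cycles per pixel, and a design that needs 40 cycles per pixel would reach fewer than 2 frames per second.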

Sobel Edge Detection
Baseline implementation
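
The code shown on this slide is not preserved in the text. As a rough idea of what such a baseline might look like, here is a minimal sketch in plain C++; the function name, the zeroed border and the saturation to 255 are assumptions, not the author's code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Baseline Sobel: for every interior pixel, read the 3x3 neighbourhood
// straight from the input image and accumulate Gx and Gy.
// `src` and `dst` are row-major 8-bit grayscale images of width x height.
void sobel_baseline(const std::uint8_t* src, std::uint8_t* dst,
                    int width, int height) {
    assert(width >= 3 && height >= 3);
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Border pixels have no full neighbourhood; write 0 (assumption).
            if (y == 0 || y == height - 1 || x == 0 || x == width - 1) {
                dst[y * width + x] = 0;
                continue;
            }
            int gx = 0;
            int gy = 0;
            // Nine reads from the input image per output pixel.
            for (int i = 0; i < 3; ++i) {
                for (int j = 0; j < 3; ++j) {
                    const int p = src[(y + i - 1) * width + (x + j - 1)];
                    gx += GX[i][j] * p;
                    gy += GY[i][j] * p;
                }
            }
            // Absolute sum of both gradients, saturated to 8 bits.
            const int mag = std::abs(gx) + std::abs(gy);
            dst[y * width + x] = static_cast<std::uint8_t>(mag > 255 ? 255 : mag);
        }
    }
}
```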

Sobel Edge Detection
Baseline implementation
40 cycles/pixel on FPGA
Timing violation

Sobel Edge Detection
Tuning FPGA implementation
Iterate over image
● Convolve 3x3 window with Gx and Gy kernels
Pipeline: Compute one field in the 3x3 filter window per clock cycle.
● Compute Gx and Gy absolute sum
● Write to corresponding output pixel

Sobel Edge Detection
Tuning FPGA implementation

Sobel Edge Detection
Tuning FPGA implementation
10 cycles/pixel on FPGA
Timing violation

Sobel Edge Detection
Tuning FPGA implementation
Iterate over image
● Pipeline: Apply pipeline to the inner loop (columns)
● Convolve 3x3 window with Gx and Gy kernels
○ Loop is fully unrolled and computed in 1 cycle
● Compute Gx and Gy absolute sum
○ Also computed in parallel
● Write to corresponding output pixel

Sobel Edge Detection
Tuning FPGA implementation
1 cycle/pixel on FPGA
Memory-access violation

Sobel Edge Detection
Tuning FPGA implementation
Issues
● Nine concurrent memory accesses
● More hardware blocks required
● The HLS module can connect to only a single memory port, which handles one transaction per clock

Sobel Edge Detection
Tuning FPGA implementation
● Use BRAM to store intermediate line buffer
● Read data from external memory to line buffer
● Fill memory window (Flip-flop elements)
● Convolve 3x3 window with Gx and Gy kernels
○ Loop is fully unrolled and computed in 1 cycle
● Compute their absolute sum
○ Also computed in parallel
● Write to corresponding output pixel
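
The steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the code from the talk: `MAX_WIDTH`, the function name, the zeroed border and the saturation are hypothetical choices. The `#pragma HLS` lines use Vivado HLS directive names and are ignored by an ordinary C++ compiler, so the sketch also runs as plain software.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

constexpr int MAX_WIDTH = 1920;  // hypothetical upper bound on the line length

void sobel_line_buffer(const std::uint8_t* src, std::uint8_t* dst,
                       int width, int height) {
    assert(width >= 3 && width <= MAX_WIDTH && height >= 3);
    static const int GX[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int GY[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    // The two previous image rows; an HLS tool would typically use BRAM here.
    std::uint8_t line_buf[2][MAX_WIDTH] = {};
    // 3x3 sliding window; small enough to be held in flip-flops.
    std::uint8_t window[3][3] = {};
#pragma HLS ARRAY_PARTITION variable=window complete dim=0

    // One extra row and column flush the last outputs.
    for (int y = 0; y <= height; ++y) {
        for (int x = 0; x <= width; ++x) {
#pragma HLS PIPELINE II=1
            if (y < height && x < width) {
                // Single external read per iteration.
                const std::uint8_t pixel = src[y * width + x];
                // Shift the window one column to the left.
                for (int i = 0; i < 3; ++i) {
#pragma HLS UNROLL
                    window[i][0] = window[i][1];
                    window[i][1] = window[i][2];
                }
                // New right-hand column: two buffered rows plus the new pixel.
                window[0][2] = line_buf[0][x];
                window[1][2] = line_buf[1][x];
                window[2][2] = pixel;
                // Move the buffered column up by one row.
                line_buf[0][x] = line_buf[1][x];
                line_buf[1][x] = pixel;
            }
            if (y >= 1 && x >= 1) {
                // The window is now centred on pixel (y - 1, x - 1).
                const int cy = y - 1;
                const int cx = x - 1;
                const bool interior = cy >= 1 && cy <= height - 2 &&
                                      cx >= 1 && cx <= width - 2;
                int mag = 0;
                if (interior) {
                    int gx = 0;
                    int gy = 0;
                    for (int i = 0; i < 3; ++i) {
                        for (int j = 0; j < 3; ++j) {
#pragma HLS UNROLL
                            gx += GX[i][j] * window[i][j];
                            gy += GY[i][j] * window[i][j];
                        }
                    }
                    mag = std::abs(gx) + std::abs(gy);
                }
                // Single external write per iteration; border pixels get 0.
                dst[cy * width + cx] =
                    static_cast<std::uint8_t>(mag > 255 ? 255 : mag);
            }
        }
    }
}
```

Each loop iteration performs at most one read from `src` and one write to `dst`; the other eight window values come from the line buffer and the window registers, which is what removes the nine concurrent external accesses.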

Sobel Edge Detection
Tuning FPGA implementation
1 cycle/pixel on FPGA
Achievement unlocked

The dark side of FPGA development
● The tools aren’t great
● It works in the simulator!
● Learning curve
● Debugging timing violations

Quick start
● FPGA Development board: Altera, Xilinx
● IDE & Samples: Vivado
● OpenCV support
● HLS for OpenCL

Image processing on FPGA
Eugene Khvedchenya
Questions?
https://ua.linkedin.com/in/cvtalks
ekhvedchenya@gmail.com
@cvtalks

Eugene Khvedchenia - Image processing using FPGAs

1. Image processing on FPGA Eugene Khvedchenya https://ua.linkedin.com/in/cvtalks

2. What is an FPGA and who needs it?

3. General implementation OpenCL Cache tuning Multithreading SIMD (SSE, NEON) FPGA Optimization pyramid

4. What’s inside? LUT Flip-Flop ALU BRAM IO pads FPGA

5. Development efforts

6. CPU vs FPGA

7. CPU vs FPGA

8. CPU vs FPGA

9. Development efforts

10. High Level Synthesis Converts C++ code to hardware design HLS compiler optimizes your code for the FPGA Automatically optimizes RTL and timing Provides #pragmas for fine-tuning C++ API for arbitrary-precision math C++ API for stream data processing Supports C++11

11. Things to remember No branching penalty

12. Things to remember No dynamic memory allocation

13. Things to remember Instantaneous BRAM access Register-level bandwidth: 0.5 Mbit/s BRAM bandwidth: 23 Tbit/s Numbers above are for the Xilinx Kintex®-7 410T device

14. Things to remember Single producer - single consumer

15. Things to remember Pipelining

16. Things to remember ● No branching penalty ● No cache penalty ● No dynamic memory allocation ● Instantaneous BRAM access ● Single producer - single consumer ● Pipelining ● Task-centric approach

17. HLS Development cycle 1. Get baseline version 2. Write simulation test 3. Run HLS synthesis 4. Simulate 5. Validate 6. Measure 7. Optimize 8. Goto 3

18. Sobel Edge Detection Goal: process 1920x1080 images at 60 Hz

19. Sobel Edge Detection Baseline implementation Iterate over image ● Convolve 3x3 window with Gx and Gy kernels ● Compute their absolute sum ● Write to corresponding output pixel The FPGA clock frequency in this example is 150 MHz To meet the 1920x1080 @ 60 Hz goal (about 124.4 million pixels per second) we must process data at a rate of 1 cycle/pixel or faster

20. Sobel Edge Detection Baseline implementation

21. Sobel Edge Detection Baseline implementation 40 cycles/pixel on FPGA Timing violation

22. Sobel Edge Detection Tuning FPGA implementation Iterate over image ● Convolve 3x3 window with Gx and Gy kernels Pipeline: Compute one field in the 3x3 filter window per clock cycle. ● Compute Gx and Gy absolute sum ● Write to corresponding output pixel

23. Sobel Edge Detection Tuning FPGA implementation

24. Sobel Edge Detection Tuning FPGA implementation 10 cycles/pixel on FPGA Timing violation

25. Sobel Edge Detection Tuning FPGA implementation Iterate over image ● Pipeline: Apply pipeline to the inner loop (columns) ● Convolve 3x3 window with Gx and Gy kernels ○ Loop is fully unrolled and computed in 1 cycle ● Compute Gx and Gy absolute sum ○ Also computed in parallel ● Write to corresponding output pixel

26. Sobel Edge Detection Tuning FPGA implementation

27. Sobel Edge Detection Tuning FPGA implementation 1 cycle/pixel on FPGA Memory-access violation

28. Sobel Edge Detection Tuning FPGA implementation Issues ● Nine concurrent memory accesses ● More hardware blocks required ● The HLS module can connect to only a single memory port, which handles one transaction per clock

29. Sobel Edge Detection Tuning FPGA implementation ● Use BRAM to store intermediate line buffer ● Read data from external memory to line buffer ● Fill memory window (Flip-flop elements) ● Convolve 3x3 window with Gx and Gy kernels ○ Loop is fully unrolled and computed in 1 cycle ● Compute their absolute sum ○ Also computed in parallel ● Write to corresponding output pixel

30. Sobel Edge Detection Tuning FPGA implementation 1 cycle/pixel on FPGA Achievement unlocked

31. The dark side of FPGA development ● The tools aren’t great ● It works in the simulator! ● Learning curve ● Debugging timing violations

32. Quick start ● FPGA Development board: Altera, Xilinx ● IDE & Samples: Vivado ● OpenCV support ● HLS for OpenCL

33. Image processing on FPGA Eugene Khvedchenya Questions? https://ua.linkedin.com/in/cvtalks ekhvedchenya@gmail.com @cvtalks
