| © 2013 Aptina Imaging Corporation | Aptina Confidential1
© 2013 Aptina Imaging Corporation. All rights reserved. Products are warranted only to meet Aptina’s production data sheet specifications. Information, products, and/or
specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Dates are estimates only. Drawings not to
scale. Aptina and the Aptina logo are trademarks of Aptina Imaging Corporation. All other trademarks are the property of their respective owners.
Imaging using ARM GPU
Investigating flexible imaging pipelines using
embedded GPU
Mikaël Bourges-Sévenier (msevenier at aptina dot com)
Director, High-Performance Imaging
December 2, 2013
HPC & GPU Supercomputing Group
of Silicon Valley
Agenda
•  Toward more flexible imaging pipelines
•  Replacing the image processor with software & hardware
•  Video HDR using Aptina Interlaced HDR sensor
•  Q&A
Cameras are everywhere
Interactive: systems that respond to user actions (PC, Gaming, Mobile)
•  Motion/Gesture tracking and recognition
•  Body tracking
Environmental: imaging systems that are situationally aware (Camera, Mobile, PC)
•  Face Detection/Track
•  Gesture tracking
•  Object tracking
Ubiquitous: “Cameras Everywhere” distributed systems (Mobile, Camera, DIY-SOHO)
•  Point and shoot
•  HDR
•  Surveillance
Computational imaging evolution
Spatial
(Volumetric)
Gesture
AR
Face Detect
Face Track
Presence
Colorimetry
Brightness
Web Cam
Smart
Camera
True Color, Brightness
Compensation, Exposure control
User Identity
Access Control
Augmented Information
3D Imaging
Interactive
Services
How imaging pipelines work
How Imaging Sensors work
http://www.photoaxe.com
Bayer GRBG pattern
•  50% green
•  25% red, 25% blue
Bayer CFA is one type
of pattern
Bayer Demosaicing
•  More G samples than R or B, since the eye is more sensitive to luminance than to chrominance
•  Convert pixel colors from Bayer space to Full RGB color
•  Complex interpolation to avoid artifacts (e.g. on edges)
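As a rough illustration of the interpolation step, here is a minimal bilinear demosaic sketch in Python. Bilinear averaging is the simplest possible choice and is not the edge-aware interpolation the last bullet alludes to. The function names `bayer_color` and `demosaic_bilinear` are hypothetical, and the frame is assumed to be at least 2x2 pixels.

```python
def bayer_color(y, x, pattern="GRBG"):
    """Return 'R', 'G' or 'B' for the CFA site at row y, column x."""
    return pattern[(y % 2) * 2 + (x % 2)]


def demosaic_bilinear(raw, pattern="GRBG"):
    """Fill in the two missing colors of each pixel by averaging the
    same-color samples found in the surrounding 3x3 window."""
    h, w = len(raw), len(raw[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        c = bayer_color(yy, xx, pattern)
                        sums[c] += raw[yy][xx]
                        counts[c] += 1
            pixel = {c: sums[c] / counts[c] for c in "RGB"}
            # Keep the color that was actually measured at this site.
            pixel[bayer_color(y, x, pattern)] = float(raw[y][x])
            out[y][x] = (pixel["R"], pixel["G"], pixel["B"])
    return out


# A uniformly colored 4x4 patch seen through a GRBG filter.
values = {"R": 30, "G": 20, "B": 10}
raw = [[values[bayer_color(y, x)] for x in range(4)] for y in range(4)]
rgb = demosaic_bilinear(raw)
print(rgb[1][1])  # (30.0, 20.0, 10.0)
```

On a flat patch every pixel recovers the same color. Artifacts appear on edges, where neighbors of the same color straddle two regions, which is why production pipelines use more complex interpolation.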
[Figure: the four Bayer pattern orderings, indexed 0 to 3: 0 = GRBG, 1 = RGGB, 2 = GBRG, 3 = BGGR]
From RAW to RGB/YUV: the ISP
•  ISP = Image Signal Processor
‣  Transform sensor RAW images to YUV
‣  Very complex pipelines, dedicated, optimized for imaging
‣  Low power (200-400mW)
‣  Embedded in Application Processor or as a separate co-processor
Can I upgrade algorithms?
[Figure: Lens, Color Filter Array, and CMOS sensor produce RAW Bayer data; the Image Signal Processor (ISP) converts it to RGB/YUV and drives lens, sensor, and aperture control]
Image pipeline block diagram (typical)
•  Sensor (Bayer data)
•  Black level adjust
•  Lens shading correction
•  White balancing
•  Defect correction
•  Noise reduction (Bayer)
•  Green balance
•  Demosaic
•  Color correction
•  Sharpening
•  Tone mapping and gamma
•  Full color RGB data (to YUV for JPEG)
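A software pipeline can model this block diagram as an ordered list of stage functions, which is what makes stages easy to swap or upgrade. The sketch below is a simplified illustration in Python covering only three of the stages. The black level (64), the white-balance gains, the white point, and all function names are hypothetical values chosen for the example. For brevity it also applies gamma directly to Bayer data, whereas the diagram places gamma after demosaicing.

```python
def cfa_color(y, x, pattern="GRBG"):
    """Color of the Bayer site at row y, column x."""
    return pattern[(y % 2) * 2 + (x % 2)]


def black_level_adjust(raw, level):
    """Subtract the sensor's black offset, clamping at zero."""
    return [[max(v - level, 0) for v in row] for row in raw]


def white_balance(raw, gains, pattern="GRBG"):
    """Scale each Bayer sample by the gain of its own color channel."""
    return [[v * gains[cfa_color(y, x, pattern)] for x, v in enumerate(row)]
            for y, row in enumerate(raw)]


def gamma_encode(raw, white, gamma=2.2):
    """Normalize to [0, 1] and apply a power-law gamma curve."""
    return [[(min(v, white) / white) ** (1.0 / gamma) for v in row]
            for row in raw]


def run_pipeline(raw, stages):
    """Apply the stages in order; each stage maps an image to an image."""
    for stage in stages:
        raw = stage(raw)
    return raw


stages = [
    lambda img: black_level_adjust(img, level=64),
    lambda img: white_balance(img, gains={"R": 2.0, "G": 1.0, "B": 4.0}),
    lambda img: gamma_encode(img, white=100.0),
]

raw = [[164, 114], [89, 164]]  # one GRBG tile of a gray patch
out = run_pipeline(raw, stages)
print(out)  # [[1.0, 1.0], [1.0, 1.0]]
```

Because the pipeline is just a list, inserting a new stage or replacing an algorithm is a one-line change, in contrast with a fixed-function ISP.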
Problem Statement
•  Given a non-typical imaging pipeline, how do we
‣  Take advantage of resources in an embedded platform?
‣  Keep frame rate at 30fps?
‣  Preserve good image quality?
‣  Minimize power usage?
‣  Provide flexible pipelines?
Alternative Approaches to ISP-only
•  Hybrid: ISP + GPU + CPU + DSP
‣  Less power
‣  Bayer pattern
‣  Reuse existing ISP (may not be re-entrant)
‣  Require recent devices
‣  ISP may only output 8b precision
•  Full software: GPU + CPU + DSP
‣  More power
‣  Any pattern
‣  Require fast processors
‣  Require high-end devices
‣  8b-32b precision
[Figure: Lens, Color Filter Array, and CMOS sensor feed Bayer data through pre-processing, the Image Signal Processor (ISP), and post-processing to deliver RGB/YUV to the app; 3A drives lens, sensor, and aperture control]
MobileHDR on ARM Mali T604
Arndale Samsung Exynos 5 Dual board
•  Arndale Samsung Exynos 5 board
‣  CPU: ARM Cortex-A15 (2-core) 1.7 GHz 32nm
•  32KB L1 cache, 1MB L2 cache
‣  GPU: ARM Mali T604
•  64 concurrent threads
•  Vector ALUs
•  128b registers
•  OpenCL 1.1 Full Profile
‣  RAM: 2GB LP-DDR3 800 MHz (12.8 GB/s)
‣  Truly unified cached memory
•  CPU and GPU memory is shared – NO COPY!
•  128b wide L1 and L2 access
‣  2 independent job queues in T628 (in Samsung Exynos 5 Octa)
ARM Mali T604 GPUs
In Samsung Exynos 5 Dual
•  Type: Vector GPU
•  Process: 32nm
•  OpenCL: 1.1 Full Profile
•  Unified memory: Yes
•  Rendering: Tile
•  Work-items: 256
•  Clock: 533 MHz
•  L2 cache: 1MB
•  Register width: 128b
•  Global memory: 2GB LP-DDR3 800 MHz (12.8 GB/s)
•  Pipelines: 8 pipes (2 per core)
•  Throughput: 100 GFLOPS
•  Local memory: 32KB/core (global)
•  Constant memory: 64KB
•  Texture cache: Yes
•  Compute devices (shader cores): 4
•  Cacheline: 64 bytes
•  16/32/64b floats: No/Yes/Yes
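The 12.8 GB/s figure can be checked with a short calculation, and it gives a feel for the memory budget of a frame-based pipeline. This is a back-of-the-envelope sketch: the 64-bit bus width is an assumption (the slide does not state it), the function name is hypothetical, and the frame size used is the sensor's 3264 x 1836 video mode stored as 2 bytes per pixel.

```python
def ddr_peak_bytes_per_s(clock_hz, bus_bits, transfers_per_clock=2):
    """Theoretical peak bandwidth of a double-data-rate memory interface."""
    return clock_hz * transfers_per_clock * bus_bits // 8


# Assumption: 64-bit wide LP-DDR3 interface clocked at 800 MHz.
peak = ddr_peak_bytes_per_s(800_000_000, 64)
print(peak / 1e9)  # 12.8

# One full read or write of a 3264 x 1836 frame at 2 bytes per pixel, 30 times a second.
bytes_per_pass = 3264 * 1836 * 2 * 30
passes = peak / bytes_per_pass
print(int(passes))  # 35
```

Under these assumptions, roughly 35 full-frame reads or writes per frame period would saturate the theoretical peak. Sustained bandwidth is lower in practice, so the number of kernels that each read and write a whole frame has to stay well below that.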
Avoid buffer copy
•  Mali has unified memory
‣  Use CL_MEM_ALLOC_HOST_PTR to avoid copy between CPU and GPU
•  malloc()
‣  Buffers created by the user (malloc) are not mapped into the GPU memory space
•  clCreateBuffer(CL_MEM_USE_HOST_PTR)
‣  Creates a new buffer and copies the data over (but the copy operations are expensive)
•  clCreateBuffer(CL_MEM_ALLOC_HOST_PTR)
‣  The buffer created by clCreateBuffer() is accessible to both the CPU (host) and the GPU (compute device), with no copy
•  Where possible, don’t use CL_MEM_USE_HOST_PTR
‣  Create buffers at the start of your app
‣  Use CL_MEM_ALLOC_HOST_PTR instead of malloc()
‣  Then you can use the buffer on both the CPU and the GPU
Stream-based vs. Frame-based
•  Stream-based
‣  For low memory devices (e.g. ISP, DSP)
‣  Group of lines processed by kernels
‣  Delay: # of lines a kernel needs
•  Frame-based
‣  For fast data-parallel devices (e.g. GPU)
‣  Full image processed
‣  Delay: whole frame between devices
[Figure: stream-based: a continuous stream of pixels enters a kernel, a queue (Q) accumulates lines, and a second kernel produces the final image; frame-based: whole frames are passed from kernel to kernel]
Aptina Sensor with MobileHDR™ Turned off
Aptina Sensor with MobileHDR™ Turned on
AR0833 8MP Camera sensor
•  Frame is inscribed in a circle
‣  4:3 for images e.g. 8MP 3264 x 2448
‣  16:9 for video e.g. 6MP 3264 x 1836
•  10 bits per pixel (framed in 16 bits)
•  At 30fps, the 16:9 mode needs about 343 MiB/s for 180 MPix/s (3264 x 1836 pixels, 2 bytes per pixel)
•  Interface with ISP
‣  Data over MIPI CSI2 (serial)
‣  Control over I2C
[Figure: 4:3 (3264 x 2448) and 16:9 (3264 x 1836) frames inscribed in the 1/3.2" image circle]
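The bandwidth figures on this slide follow from the 16:9 video mode with each 10-bit sample stored in a 16-bit container. A small Python check, with a hypothetical helper name:

```python
def sensor_bandwidth(width, height, fps, container_bits):
    """Return (pixels per second, bytes per second) for a raw sensor stream."""
    pixels_per_s = width * height * fps
    bytes_per_s = pixels_per_s * container_bits // 8
    return pixels_per_s, bytes_per_s


pix, byt = sensor_bandwidth(3264, 1836, 30, 16)
print(round(pix / 1e6))    # 180  (MPix/s)
print(round(byt / 2**20))  # 343  (MiB/s)
```

The 343 figure matches when bytes are divided by 2^20. In decimal megabytes the same stream is about 360 MB/s.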
Feature: Interlaced HDR
•  1 frame contains 2 interlaced exposures
•  Exposure ratio between odd and even row pairs
‣  1x, 2x, 4x, 8x
Datasheet excerpt (AR0833_DS - Rev. F Pub. 4/13 EN), Interlaced HDR Readout:
The sensor enables HDR by outputting frames where even and odd row pairs within a single frame are captured at different integration times. This output is then matched with an algorithm designed to reconstruct this output into an HDR still image or video.
The sensor HDR is controlled by two shutter pointers (Shutter pointer 1, Shutter pointer 2) that control the integration of the odd (Shutter pointer 1) and even (Shutter pointer 2) row pairs.
[Figure 16: HDR Integration Time. Shutter pointers 1 and 2 and the sample pointer define integration times Tint 1 and Tint 2; the output frame from the sensor interleaves I-FRAME 1 (exposure 1) and I-FRAME 2 (exposure 2)]
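To make the row-pair layout concrete, here is a deliberately naive reconstruction sketch in Python. It is not Aptina's algorithm, which the slides do not describe. The assumptions are: the first row pair carries the long exposure, 10-bit samples saturate at 1023, and a saturated long-exposure sample is replaced by the scaled short-exposure sample two rows away, which sits on the same Bayer color. The function name `reconstruct_ihdr` is hypothetical.

```python
def reconstruct_ihdr(raw, ratio, long_first=True, sat=1023):
    """Bring an interlaced-HDR Bayer frame onto the long-exposure scale.

    Rows come in pairs: pairs 0, 2, 4... hold one exposure and pairs
    1, 3, 5... hold the other. `ratio` is long time / short time.
    """
    h = len(raw)
    out = []
    for y, row in enumerate(raw):
        is_long = ((y // 2) % 2 == 0) == long_first
        if not is_long:
            # Short-exposure rows are scaled up by the exposure ratio.
            out.append([float(v * ratio) for v in row])
            continue
        # Rows y-2 and y+2 belong to the other exposure and share this
        # row's Bayer phase, so they can stand in for clipped samples.
        donors = [raw[yy] for yy in (y - 2, y + 2) if 0 <= yy < h]
        new_row = []
        for x, v in enumerate(row):
            if v >= sat and donors:
                v = ratio * sum(d[x] for d in donors) / len(donors)
            new_row.append(float(v))
        out.append(new_row)
    return out


# Long rows clipped at 1023, short rows (4x shorter) read 300.
frame = [[1023, 1023], [1023, 1023], [300, 300], [300, 300]]
hdr = reconstruct_ihdr(frame, ratio=4)
print(hdr[0])  # [1200.0, 1200.0]
```

With a ratio of 8x, a 10-bit input can produce values up to 8184, so the reconstructed data needs more than 10 bits per sample.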
mobileHDR demo
•  Zero-copy between sensor/OpenCL and OpenCL/OpenGL
•  On Arndale board (Samsung Exynos 5 Dual with Mali T604 GPU)
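[Figure: OpenCL pipeline stages: Noise Reduction, iHDR Reconstruction, Bayer scaler, Tone Mapping, Color Correction. Data labels: 10b iHDR 3264x1836 input, 14b intermediate, RGB888 1080p output. The CL Image is shared with OpenGL ES as a GL Texture through an EGLImage]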
Summary
•  Using GPU for imaging
‣  Provide flexible solutions where traditional ISP is not usable
‣  Fast time to market
•  Today’s application processors provide enough processing power
for video HDR
•  Embedded GPUs tend to double their ALU count every year
‣  Early 2013: 4MP at 30fps; end 2013: 8MP at 30fps
‣  Early 2014: 13MP at 30fps
Questions & Answers
Thank you!
