© 2013 Aptina Imaging Corporation | Aptina Confidential
© 2013 Aptina Imaging Corporation. All rights reserved. Products are warranted only to meet Aptina’s production data sheet specifications. Information, products, and/or
specifications are subject to change without notice. All information is provided on an “AS IS” basis without warranties of any kind. Dates are estimates only. Drawings not to
scale. Aptina and the Aptina logo are trademarks of Aptina Imaging Corporation. All other trademarks are the property of their respective owners.
Imaging using ARM GPU
Investigating flexible imaging pipelines using an embedded GPU
Mikaël Bourges-Sévenier (msevenier at aptina dot com)
Director, High-Performance Imaging
December 2, 2013
HPC & GPU Supercomputing Group of Silicon Valley
Agenda
• Toward more flexible imaging pipelines
• Replacing the image signal processor with software and hardware
• Video HDR using the Aptina Interlaced HDR sensor
• Q&A
Cameras are everywhere
• Interactive: systems that respond to user actions (PC, Gaming, Mobile)
‣ Motion/gesture tracking and recognition
‣ Body tracking
• Environmental: imaging systems that are situationally aware (Camera, Mobile, PC)
‣ Face detection/tracking
‣ Gesture tracking
‣ Object tracking
• Ubiquitous: “Cameras Everywhere” distributed systems (Mobile, Camera, DIY-SOHO)
‣ Point and shoot
‣ HDR
‣ Surveillance
Computational imaging evolution
Figure: evolution from Web Cam to Smart Camera to Interactive Services. Capabilities shown: colorimetry and brightness; true color, brightness compensation, exposure control; presence; face detect and face track; gesture; AR; user identity and access control; augmented information; 3D imaging; spatial (volumetric) imaging.
How imaging pipelines work
How Imaging Sensors work
(Image source: http://www.photoaxe.com)
Bayer GRBG pattern
• 50% green
• 25% red and blue
• The Bayer CFA is only one type of pattern
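As a quick check of the 50/25/25 split, the sketch below maps pixel coordinates to GRBG filter colours and counts them over the 8MP frame size used later in this deck (3264 x 2448). The function and type names are hypothetical, chosen for illustration only.

```c
#include <assert.h>

/* Colour channels of a Bayer colour filter array. */
typedef enum { BAYER_R, BAYER_G, BAYER_B } BayerChannel;

/* Channel sampled at (x, y) for a GRBG Bayer pattern:
   even rows: G R G R ...
   odd rows:  B G B G ... */
static BayerChannel grbg_channel(int x, int y) {
    if ((y & 1) == 0)
        return (x & 1) ? BAYER_R : BAYER_G;
    return (x & 1) ? BAYER_G : BAYER_B;
}

/* Count how many pixels of a w x h sensor sit under each filter colour. */
static void grbg_histogram(int w, int h, long counts[3]) {
    assert(w > 0 && h > 0);
    counts[BAYER_R] = counts[BAYER_G] = counts[BAYER_B] = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            counts[grbg_channel(x, y)]++;
}
```

For 3264 x 2448 the counts are 3,995,136 green and 1,997,568 each of red and blue, exactly 50% and 25% because both dimensions are even.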
Bayer Demosaicing
• More G samples than R or B, since the eye is more sensitive to luminance than to chrominance
• Convert pixel colors from Bayer space to full RGB color
• Complex interpolation to avoid artifacts (e.g. on edges)
Figure: the four Bayer pattern orderings, numbered 0 to 3 (laid out as a 2x2 grid): 0 = GRBG, 1 = RGGB, 2 = GBRG, 3 = BGGR
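The simplest demosaic is bilinear interpolation. It does not include the edge-aware handling the slide calls for, but it shows the basic idea: every missing channel is rebuilt from nearby samples of that colour. This sketch assumes the usual naming convention, where the pattern name lists the top-left 2x2 tile row by row. The enum reuses the slide's 0 to 3 numbering. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { CH_R = 0, CH_G = 1, CH_B = 2 };
typedef enum { GRBG = 0, RGGB = 1, GBRG = 2, BGGR = 3 } BayerPattern;

/* 2x2 tile of each pattern, row-major: top-left, top-right,
   bottom-left, bottom-right (the order the pattern name is read in). */
static const int TILE[4][4] = {
    [GRBG] = { CH_G, CH_R, CH_B, CH_G },
    [RGGB] = { CH_R, CH_G, CH_G, CH_B },
    [GBRG] = { CH_G, CH_B, CH_R, CH_G },
    [BGGR] = { CH_B, CH_G, CH_G, CH_R },
};

/* Colour channel sampled at pixel (x, y). */
static int cfa_channel(BayerPattern p, int x, int y) {
    return TILE[p][((y & 1) << 1) | (x & 1)];
}

/* Bilinear demosaic: keep the native sample and fill each missing channel
   with the rounded mean of the same-coloured samples in the 3x3
   neighbourhood (pixels outside the image are skipped).
   raw: w*h samples; rgb: w*h*3 interleaved output. */
static void demosaic_bilinear(const uint16_t *raw, uint16_t *rgb,
                              int w, int h, BayerPattern p) {
    /* With w, h >= 2 every clipped 3x3 window holds a full 2x2 tile,
       so each channel has at least one sample (n > 0 below). */
    assert(w >= 2 && h >= 2);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int native = cfa_channel(p, x, y);
            for (int c = 0; c < 3; ++c) {
                uint32_t sum = 0, n = 0;
                if (c == native) {
                    sum = raw[y * w + x];
                    n = 1;
                } else {
                    for (int dy = -1; dy <= 1; ++dy)
                        for (int dx = -1; dx <= 1; ++dx) {
                            int nx = x + dx, ny = y + dy;
                            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                            if (cfa_channel(p, nx, ny) != c) continue;
                            sum += raw[ny * w + nx];
                            n++;
                        }
                }
                rgb[(y * w + x) * 3 + c] = (uint16_t)((sum + n / 2) / n);
            }
        }
}
```

In the image interior this reduces to the textbook bilinear stencils. G at an R or B site averages its 4 horizontal and vertical neighbours. R or B at a G site averages 2 neighbours. B at an R site, or R at a B site, averages the 4 diagonals. A production ISP would replace the plain mean with gradient-aware weights to avoid zipper artifacts on edges.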
From RAW to RGB/YUV: the ISP
• ISP = Image Signal Processor
‣ Transforms sensor RAW images to YUV
‣ Very complex, dedicated pipelines optimized for imaging
‣ Low power (200-400 mW)
‣ Embedded in the application processor or as a separate co-processor
Can I upgrade the algorithms?
Figure: Lens → Color Filter Array → CMOS sensor → RAW Bayer → Image Signal Processor (ISP) → RGB/YUV, with lens, sensor, and aperture control.
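The last ISP step converts RGB to "YUV". For JPEG this is YCbCr, using the full-range BT.601 coefficients from JFIF. The sketch below shows that conversion for one 8-bit pixel. The helper names are hypothetical, and a real ISP would use fixed-point arithmetic rather than floats.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t y, cb, cr; } YCbCr;

/* Clamp to [0, 255] and round to nearest. */
static uint8_t clamp_round_u8(float v) {
    if (v <= 0.0f) return 0;
    if (v >= 255.0f) return 255;
    return (uint8_t)(v + 0.5f);  /* v > 0, so truncation rounds half up */
}

/* Full-range BT.601 RGB -> YCbCr, the variant used by JFIF (JPEG). */
static YCbCr rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b) {
    YCbCr o;
    o.y  = clamp_round_u8(0.299f * r + 0.587f * g + 0.114f * b);
    o.cb = clamp_round_u8(128.0f - 0.168736f * r - 0.331264f * g + 0.5f * b);
    o.cr = clamp_round_u8(128.0f + 0.5f * r - 0.418688f * g - 0.081312f * b);
    return o;
}
```

Grey inputs (r = g = b) land on Cb = Cr = 128. Saturated colours can push Cr or Cb past 255, which is why the result is clamped.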
Image pipeline block diagram (typical)
Sensor (Bayer data) → Black level adjust → Lens shading correction → White balancing → Defect correction → Noise reduction (Bayer) → Green balance → Demosaic → Color correction → Sharpening → Tone mapping and gamma → Full color RGB data (to YUV for JPEG)
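The first Bayer-domain blocks are simple per-pixel operations. Black level adjust removes the sensor's pedestal, and white balancing applies one gain per colour channel. The sketch below chains the two on 10-bit GRBG data, in place and before demosaicing. The pedestal value and gains in the struct are hypothetical calibration inputs; a real ISP would obtain them from sensor characterisation and from its auto white balance.

```c
#include <assert.h>
#include <stdint.h>

#define MAX10 1023  /* 10-bit sensor full scale */

typedef struct {
    uint16_t black_level;          /* hypothetical pedestal, e.g. 64 codes */
    float gain_r, gain_g, gain_b;  /* hypothetical white-balance gains */
} FrontEndParams;

/* Black-level subtraction followed by per-channel white balance,
   applied in place on GRBG Bayer data (w x h), clipped to 10 bits. */
static void bayer_front_end(uint16_t *raw, int w, int h, const FrontEndParams *p) {
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            uint16_t *px = &raw[y * w + x];
            int v = (int)*px - p->black_level;
            if (v < 0) v = 0;  /* noise can dip below the pedestal */
            float gain;
            if ((y & 1) == 0) gain = (x & 1) ? p->gain_r : p->gain_g;  /* G R row */
            else              gain = (x & 1) ? p->gain_g : p->gain_b;  /* B G row */
            float out = v * gain + 0.5f;  /* round to nearest */
            *px = (uint16_t)(out > MAX10 ? MAX10 : out);
        }
}
```

Many pipelines also rescale after subtracting the pedestal so that full scale stays at 1023. That step is left out here to keep the sketch short.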
Problem Statement
• Given a non-typical imaging pipeline, how do we
‣ Take advantage of resources in an embedded platform?
‣ Keep frame rate at 30fps?
‣ Preserve good image quality?
‣ Minimize power usage?
‣ Provide flexible pipelines?
Alternative Approaches to ISP-only
Aspect | Hybrid | Full software
Processors | ISP + GPU + CPU + DSP | GPU + CPU + DSP
Power | Less power | More power
Sensor pattern | Bayer pattern | Any pattern
Requirements | Reuse existing ISP (may not be re-entrant) | Require fast processors
Devices | Require recent devices | Require high-end devices
Precision | ISP may only output 8b precision | 8b-32b precision
Figure (hybrid pipeline): Lens → Color Filter Array → CMOS sensor → Bayer → Pre-processing → Image Signal Processor (ISP) → RGB/YUV → Post-processing → App, with 3A driving lens, sensor, and aperture control.
MobileHDR on ARM Mali T604
Arndale Samsung Exynos 5 Dual board
• Arndale Samsung Exynos 5 board
‣ CPU: ARM Cortex-A15 (dual-core), 1.7 GHz, 32nm
• 32KB L1 cache, 1MB L2 cache
‣ GPU: ARM Mali T604
• 64 concurrent threads
• Vector ALUs
• 128b registers
• OpenCL 1.1 Full Profile
‣ RAM: 2GB LP-DDR3 800 MHz (12.8 GB/s)
‣ Truly unified cached memory
• CPU and GPU share the same memory: NO COPY!
• 128b wide L1 and L2 access
‣ The T628 (in the Samsung Exynos 5 Octa) has 2 independent job queues
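The 12.8 GB/s figure can be checked against the 800 MHz memory clock. The check assumes two 32-bit LPDDR3 channels (a 64-bit path in total) and two transfers per clock; the slide does not state the bus width.

```latex
\[
\text{BW} = \underbrace{800\,\text{MHz} \times 2}_{1600\ \text{MT/s (DDR)}}
\times \underbrace{\frac{2 \times 32\ \text{bit}}{8\ \text{bit/byte}}}_{8\ \text{B per transfer}}
= 12\,800\ \text{MB/s} = 12.8\ \text{GB/s}
\]
```

This is a peak value. Sustained bandwidth shared between CPU and GPU will be lower, which is one more reason to avoid buffer copies.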
ARM Mali T604 GPUs
In Samsung Exynos 5 Dual
Type: Vector GPU
Process: 32nm
OpenCL: 1.1 Full Profile
Unified memory: Yes
Rendering: Tile-based
Work-items: 256
Clock: 533 MHz
L2 cache: 1MB
Register width: 128b
Global memory: 2GB LP-DDR3 800 MHz (12.8 GB/s)
Pipelines: 8 (2 per core)
Throughput: 100 GFLOPS
Local memory: 32KB/core (backed by global memory)
Constant memory: 64KB
Texture cache: Yes
Compute devices (shader cores): 4
Cache line: 64 bytes
16/32/64-bit floats: No/Yes/Yes
Avoid buffer copy
• Mali has unified memory
‣ Use CL_MEM_ALLOC_HOST_PTR to avoid copies between CPU and GPU
Figure: three ways to share a buffer between the CPU (host) and the GPU (compute device) through global memory
‣ malloc(): buffers created by the user with malloc() are not mapped into the GPU memory space
‣ clCreateBuffer(CL_MEM_USE_HOST_PTR): creates a new buffer and copies the data over (but the copy operations are expensive)
‣ clCreateBuffer(CL_MEM_ALLOC_HOST_PTR): a single buffer in global memory, reached by the CPU through host data pointers and by the GPU directly, with no copy
‣ Recommendations:
• Create buffers at the start of your app
• Use CL_MEM_ALLOC_HOST_PTR instead of malloc()
• Then you can use the buffer on both CPU and GPU
Stream-based vs. Frame-based
• Stream-based
‣ For low memory devices (e.g. ISP, DSP)
‣ Group of lines processed by kernels
‣ Delay: the number of lines a kernel needs
• Frame-based
‣ For fast data-parallel devices (e.g. GPU)
‣ Full image processed at once
‣ Delay: a whole frame between devices
Figure: stream-based, a continuous stream of pixels flows through kernels connected by queues (Q) and the final image accumulates lines; frame-based, Frame → Kernel → Frame → Kernel → Frame → Kernel → Frame.
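To make the "delay = lines a kernel needs" point concrete, the sketch runs the same 3-tap vertical box filter in two ways. The frame-based version reads a whole frame from memory. The stream-based version keeps only the last 3 lines in a ring buffer and emits each output row one line after it could first be computed. The filter, line width, and all names are hypothetical; they stand in for any kernel with a vertical footprint.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define W      8                /* hypothetical line width in pixels */
#define RADIUS 1                /* vertical kernel radius */
#define TAPS   (2 * RADIUS + 1) /* lines the kernel needs at once */

static int clamp_row(int y, int h) { return y < 0 ? 0 : (y >= h ? h - 1 : y); }

/* Frame-based: the whole frame is in memory; 3-tap vertical box filter. */
static void vbox_frame(const uint16_t *in, uint16_t *out, int h) {
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < W; ++x) {
            uint32_t s = 0;
            for (int k = -RADIUS; k <= RADIUS; ++k)
                s += in[clamp_row(y + k, h) * W + x];
            out[y * W + x] = (uint16_t)(s / TAPS);
        }
}

/* Stream-based: only the last TAPS lines are kept in a ring buffer. */
typedef struct {
    uint16_t ring[TAPS][W];
    int received;               /* lines pushed so far */
} LineStream;

static void stream_emit(const LineStream *s, int y, int h, uint16_t *out) {
    for (int x = 0; x < W; ++x) {
        uint32_t sum = 0;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += s->ring[clamp_row(y + k, h) % TAPS][x];
        out[y * W + x] = (uint16_t)(sum / TAPS);
    }
}

/* Push one line; returns the output row it completes, or -1 while the
   kernel is still filling its RADIUS-line delay. */
static int stream_push(LineStream *s, const uint16_t *line, int h, uint16_t *out) {
    memcpy(s->ring[s->received % TAPS], line, sizeof s->ring[0]);
    s->received++;
    int y = s->received - 1 - RADIUS;
    if (y < 0) return -1;
    stream_emit(s, y, h, out);
    return y;
}

/* End of frame: emit the last RADIUS rows with the bottom edge clamped. */
static void stream_flush(LineStream *s, int h, uint16_t *out) {
    assert(s->received == h && h >= TAPS);
    for (int y = h - RADIUS; y < h; ++y)
        stream_emit(s, y, h, out);
}
```

Both paths produce identical output. The stream version needs only TAPS lines of storage instead of a full frame, at the cost of a RADIUS-line latency per kernel, which is the trade-off the slide describes for ISPs and DSPs.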
Aptina Sensor with MobileHDR™ Turned off
Aptina Sensor with MobileHDR™ Turned on
AR0833 8MP Camera sensor
• The frame is inscribed in the 1/3.2" image circle
‣ 4:3 for still images, e.g. 8MP at 3264 x 2448
‣ 16:9 for video, e.g. 6MP at 3264 x 1836
• 10 bits per pixel (stored in 16 bits)
• At 30fps, 16:9 video (about 180 MPix/s) needs about 343 MiB/s
• Interface with the ISP
‣ Data over MIPI CSI-2 (serial)
‣ Control over I2C
Figure: 4:3 (3264 x 2448) and 16:9 (3264 x 1836) frames inscribed in the 1/3.2" image circle
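The 343 figure can be reproduced from the 16:9 video mode if each 10-bit pixel is stored in 2 bytes and "MB" is read as 2^20 bytes. That is our reading; the slide does not state the mode or the unit.

```latex
\[
3264 \times 1836 = 5\,992\,704\ \text{px},\qquad
5\,992\,704\ \text{px} \times 30\ \text{fps} = 179\,781\,120\ \text{px/s} \approx 180\ \text{MPix/s}
\]
\[
179\,781\,120\ \text{px/s} \times 2\ \text{B/px} = 359\,562\,240\ \text{B/s}
\approx 342.9\ \text{MiB/s}\ (\approx 359.6\ \text{MB/s})
\]
```

The same arithmetic for the full 4:3 8MP mode at 30fps gives about 457 MiB/s.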
Feature: Interlaced HDR
• One frame contains two interlaced exposures
• Exposure ratio between odd and even row pairs:
‣ 1x, 2x, 4x, 8x
AR0833 datasheet excerpt (Features, Interlaced HDR Readout):
The sensor enables HDR by outputting frames where even and odd row pairs within a single frame are captured at different integration times. This output is then matched with an algorithm designed to reconstruct this output into an HDR still image or video.
The sensor HDR is controlled by two shutter pointers (Shutter pointer1, Shutter pointer2) that control the integration of the odd (Shutter pointer1) and even (Shutter pointer 2) row pairs.
Figure 16: HDR Integration Time. Labels: Tint 1, Tint 2, sample pointer, Shutter pointer 1, Shutter pointer 2; the output frame from the sensor contains I-FRAME 1 (exposure 1) and I-FRAME 2 (exposure 2).
Source: AR0833 datasheet, AR0833_DS Rev. F, Pub. 4/13, ©2011 Aptina Imaging Corporation.
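The sketch below shows why interlacing two exposures helps. It is a deliberately naive merge and not Aptina's reconstruction algorithm. It assumes even row pairs carry the long exposure and odd row pairs carry the short one. Short samples are scaled by the exposure ratio. A clipped long sample is replaced by the scaled short sample two rows away, which has the same Bayer colour and lies in the adjacent row pair. The layout, threshold, and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: row pairs 0, 2, 4, ... (rows 0-1, 4-5, 8-9, ...)
   carry the long exposure; odd row pairs carry the short one. */
static int is_long_row(int y) { return ((y >> 1) & 1) == 0; }

/* Naive iHDR merge into one linear frame in long-exposure units.
   raw: 10-bit interlaced Bayer frame (w x h); hdr: output, same size.
   ratio: Tlong / Tshort (1, 2, 4 or 8); sat: clipping threshold. */
static void ihdr_merge(const uint16_t *raw, uint16_t *hdr, int w, int h,
                       int ratio, uint16_t sat) {
    assert(h >= 4 && (ratio == 1 || ratio == 2 || ratio == 4 || ratio == 8));
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            int v = raw[y * w + x];
            int out;
            if (!is_long_row(y)) {
                out = v * ratio;              /* short row: rescale */
            } else if (v < sat) {
                out = v;                      /* long row, not clipped */
            } else {
                /* Clipped long sample: borrow the same-coloured sample two
                   rows away, which lies in an adjacent short row pair. */
                int ys = (y + 2 < h) ? y + 2 : y - 2;
                out = raw[ys * w + x] * ratio;
            }
            hdr[y * w + x] = (uint16_t)out;  /* at most 1023 * 8 = 8184 */
        }
    }
}
```

Rows from the short exposure keep their higher noise, and the hard switch at `sat` can leave visible seams. Practical reconstructions interpolate both exposures at every row and blend them smoothly, and the demo's pipeline also runs noise reduction.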
mobileHDR demo
• Zero-copy between sensor/OpenCL and OpenCL/OpenGL
• On Arndale board (Samsung Exynos 5 Dual with Mali T604 GPU)
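Figure: 10b iHDR input (3264x1836) → Noise Reduction → iHDR Reconstruction (14b) → Bayer scaler → Tone Mapping → Color Correction → RGB888 at 1080p, all in OpenCL; the result is shared through an EGLImage (a CL Image on the OpenCL side, a GL Texture on the OpenGL ES side) and displayed with OpenGL ES.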
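Tone mapping compresses the reconstructed 14-bit linear range into 8 bits per channel. A global curve can be applied as a 16384-entry lookup table, so each pixel costs one load. The sketch builds a Reinhard-style curve normalised to hit 0 and 255 at the ends. This is a substitute for illustration only; the deck does not say which tone mapper the demo uses. The mid-tone parameter `m` and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define HDR_MAX 16383  /* 14-bit linear input, as produced by iHDR reconstruction */

/* Build a 14-bit -> 8-bit global tone curve
     f(v) = 255 * v / (v + m) * (HDR_MAX + m) / HDR_MAX,
   a Reinhard-style curve normalised so that f(0) = 0 and f(HDR_MAX) = 255.
   Smaller m lifts shadows and mid-tones more. */
static void build_tone_lut(uint8_t lut[HDR_MAX + 1], double m) {
    assert(m > 0.0);
    const double norm = (HDR_MAX + m) / HDR_MAX;
    for (int v = 0; v <= HDR_MAX; ++v) {
        double out = 255.0 * v / (v + m) * norm;
        if (out > 255.0) out = 255.0;  /* guard against rounding overshoot */
        lut[v] = (uint8_t)(out + 0.5);
    }
}
```

Usage: build the table once per scene change, then map each channel with `out = lut[in]`. Because v / (v + m) increases with v, the table is monotonic and the ordering of intensities is preserved. With m = 1024, an input of 1024 maps to 135, where a linear scaling would give about 16.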
Summary
• Using the GPU for imaging
‣ Provides flexible solutions where a traditional ISP is not usable
‣ Fast time to market
• Today’s application processors provide enough processing power for video HDR
• Embedded GPUs tend to double their ALU count every year
‣ Early 2013: 4MP at 30fps; end of 2013: 8MP at 30fps
‣ Early 2014: 13MP at 30fps
Questions & Answers
Thank you!