Overview of Vivante's GC Nano GPU product line targeting the latest trends in wearables and IoT. This white paper covers GPU technologies and dives into the various graphics processor architectures in the market and the pros/cons when it comes to UI acceleration.
3. Vivante GC Nano UI Acceleration White Paper
Rev. 1.0 / August 2014
Page 3 of 20
Table of Contents
Legal Notices
Table of Contents
1 Background and Overview
2 Vivante GPU Product Overview
3 GC Nano Overview
4 Trends and Importance of 3D User Interface Rendering
5 GC Nano Bandwidth Calculation
6 Vivante Immediate Mode Rendering Advantage for UIs
7 Summary
Document Revision History
1 Background and Overview
A crisp, clear, and responsive human machine interface (HMI) has become as important
to the user experience as the content or the device form factor. A beautifully crafted smartphone that
uses a combination of brushed titanium and smudge-proof glass may look great in the hand, but the user
will quickly opt for another product if the user interface stutters or the screen is hard to read because of
aliased and inconsistent fonts. The same scenario also applies to HMI in wearables and IoT devices, which
is the focus of this white paper.
The goal of a well-designed wearable/IoT HMI is to make reading or glancing at the screen intuitive and
natural, yet engaging. In other words, it is about a consistent, seamless interaction between user and
device. Since device screens are smaller, information needs to be displayed in a simplified, uncluttered
way with only relevant data (text, images, icons, video, etc.) rendered and composed onscreen. Smaller
device screens do not directly translate into a device with less processing capability. The opposite can be
true since upcoming devices need to perform real-time processing (UI display composition,
communications, sensor processing, analytics, etc.) as a single IoT node or as part of a network of nodes. Some
wearables/IoTs are taking technologies found in low/mid-range smartphone application processors and
customizing parts of the IP specifically for wearables. One important IP that device OEMs need is the
graphics processing unit (GPU) to accelerate HMI screen composition at ultra-low power.
In addition, two new trends in these emerging markets are a personalized screen UI and a unified UI
that spans all devices, from cars and 4K TVs to smartphones, wearables and embedded IoT screens, to give
users a consistent, ubiquitous screen experience across a given operating system (OS) platform,
regardless of the underlying hardware (i.e. SoC/MCU). This will enable a cross vendor solution where
vendor A’s smartwatch will work correctly with vendor B’s TV and vendor C’s smartphone. Google and
Microsoft have recently announced support for these features in their Android Material Design and
Windows 9 releases, respectively. Support for this requires an OpenGL ES 2.0 capable GPU at the
minimum, with optional/advanced features using OpenGL ES 3.x. Google also offers a lightweight
wearables OS called Android Wear that requires a GPU to give the UI a look and feel similar to that of the
standard smartphone/tablet/TV Android OS.
Figure 1: Evolving Wearable and IoT Devices Requiring GPUs
2 Vivante GPU Product Overview
The underlying technology that accelerates HMI user experience is the graphics processing unit (GPU).
GPUs natively do screen/UI composition including multi-layer blending from multiple sources (ISP/Camera,
Video, etc.), image filtering, font rendering/acceleration, 3D effects (transition, perspective view, etc.) and
more. Vivante has a complete top-to-bottom product line of GPU technologies that includes the GC
Vega and GC Nano Series:
GC Vega Series targets SoCs that need the latest and greatest GPU hardware and features
like OpenGL ES 3.1, Full Android Extension Pack (AEP) Support including hardware
tessellation / geometry shaders (TS/GS), DirectX 12, close to the metal GPU programming,
hybrid ray tracing, zero driver overhead, sensor fusion, and GPU-Compute for vision
processing using OpenVX, OpenCV or OpenCL, bundled in the most aggressive PPA and
feature-complete design. Target markets range from high end wearables and low/mid-range
mobile devices up to 4K TVs and GPUs for server virtualization.
GC Nano Series falls on the other side of the spectrum and targets devices that are making a
revolutionary push into consumer products like wearables and IoT (smart homes / appliances,
information gadgets, etc.) with GPU rendered HMI / UI. This core is specifically designed to
work in resource constrained environments where CPU, memory (both on-chip and DDR),
battery, and bandwidth are very limited. GC Nano is also optimized to work with MCU
platforms for smaller form factors that require UI composition acceleration at 30/60+ FPS.
Figure 2: Vivante GPU Product Line and Target Markets
3 GC Nano Overview
GC Nano Series consists of three products: GC Nano Lite (entry), GC Nano
(mainstream) and GC Nano Ultra (mid/high).
Figure 3: GC Nano Product Line
GC Nano Series benefits include:
Silicon Area and Power Optimized: Tiny silicon footprint that maximizes performance-per-
area for silicon constrained SoCs means vendors can add enhanced graphics functionality to
their designs without exceeding silicon/power budgets and still maintain responsive and
smooth UI performance. GC Nano maximizes battery life with ultra-low power consumption
and thermals with minimal dynamic power and near zero leakage power.
Smart Composition: Vivante’s Immediate Mode Rendering (IMR) architecture reduces
composition bandwidth, latency, overhead and power by intelligently composing and
updating only screen regions that change. Composition works either with GC Nano
composing all screen layers (graphics, background, images, videos, text, etc.) or through a
tightly coupled design where the GC Nano and display controller/processor (3rd party or
Vivante DC core) work in tandem for UI composition. Data can also be compressed /
decompressed through Vivante’s DEC compression IP core to further reduce bandwidth.
Wearables and IoT Ready: Ultra-lightweight vector graphics (GC Nano Lite) and OpenGL ES
2.0 (GC Nano, GC Nano Ultra) drivers, SDK and tools to easily transition wearables and IoT
screens to consumer level graphical interfaces. The GC Nano package also includes tutorials,
sample code, and documentation to help developers optimize or port their code.
Designed for MCU/MPU Platforms: Efficient design to offload and significantly reduce
system resources including complete UI / composition and display controller integration,
minimal CPU overhead, DDR-less and flash memory only configurations, bandwidth
modulation, close-to-the-metal GPU drivers, and wearables / IoT-specific GPU features to
shrink silicon size. The tiny software code size puts fewer constraints on memory size, speeds
up GPU initialization/boot-up times and allows instant-on UI composition for screens that
need to display information at the push of a button.
Ecosystem and Software Support: Developers can take advantage of the lightweight NanoUI
or OpenGL ES API to further enhance or customize their solutions. Large industry support on
existing Vivante products includes the GC Nano / GC Nano Ultra product line on Android,
Android Wear and embedded UI solutions from key partners covering tools for font, artwork
and Qt development environments.
Compute Ready: As the number of wearable / IoT (processing) nodes grows by several tens
of billions of units in the next few years, bandwidth on data networks could be an issue with
an always-on, always-connected, always-processing node. GC Nano helps with this by
performing ultra-low power processing (GFLOP / GINT ops) at the node and transmitting only
useful compressed data as needed. Examples include sensor fusion calculations and
image/video bandwidth reduction.
Vivante’s software driver stack, SDK and toolkit will support its NanoUI API that brings close-to-the-metal
GPU acceleration for no-OS / no-DDR options on GC Nano Lite and the OpenGL ES 2.0 API (optional 3.x)
for more advanced solutions that include proprietary or high-level operating systems like embedded Linux,
Tizen™, Android™, Android™ Wear and other RTOS that require OpenGL ES 2.0+ in the smallest memory
footprint. These various OS / non-OS platforms will form the base of next generation wearables and IoT
that bring personalized, unique and optimized experiences to each person. The GC Nano drivers include
aggressive power savings, intelligent composition and rendering, and bandwidth modulation that allow
OEMs and developers to build rich visual experiences on wearables and IoT using an ultralight UI /
composition or 3D graphics driver.
Many of the GC Nano innovations create a complete “visual” wearables MCU/SoC platform that optimizes
PPA and software efficiency to improve overall device performance and BOM cost, with the most compact
UI graphics hardware and software footprint that does not diminish or restrict the onscreen user
experience. These new GPUs are making their way into some exciting products that will appear all around
you as wearables and IoT get integrated into our lives.
Figure 4: GC Nano Series Features and Specifications
Figure 5: Example GC Nano Series SoC/MCU Implementation
4 Trends and Importance of 3D User Interface Rendering
As the sample smart home UI in Figure 6 illustrates, next generation products will take some of the well
thought out UI design elements from smartphones, tablets and TVs and incorporate them into IoT devices
(and wearables) to keep a consistent interface between products. The similar UI look-and-feel will shorten
the learning curve and accelerate device adoption. As a side note, since different devices have different
levels of processing/performance capabilities, a minimum level will be used for smaller screens (baseline
performance) with additional features/higher performance added as device capabilities move up into a
higher tier segmented by the OS vendor.
Figure 6: Sample HMI user interface on a smart home device
A few examples of updated UIs include the following:
Animated icons – easily show the user which menu item is selected or where the input
cursor is pointing so the user does not need to spend time searching for cursor position
onscreen. Icons can rotate, wiggle, pop out, flash, etc. before being selected.
Live animations – dynamic content can turn a simple background (wall paper) into a
dynamic moving scene that can add a personal touch to your device. Background images
and designs can also be personalized to match décor, lighting, theme and mood. Some
white good appliance makers are testing these concept designs, hoping to put one (or two)
inside your kitchen in the near future.
3D effects – text, icons and images that go beyond simple shadows, where the GPU uses
powerful shader instructions to give 3-dimensional character to parts
of the UI (ex. carousel, parallax, depth blur, widget/icon rendering to 3D/2D shapes,
procedural/template animations for icon movements, physical simulations for particle
systems, perspective view, etc.). These effects can be implemented using the GC Nano’s
ultra-low power OpenGL ES 2.0/3.x pipeline.
GC Nano’s architecture excels at HMI UI composition by bringing out 3D UI effects, bandwidth
reduction and reduced latency, which will be discussed below.
5 GC Nano Bandwidth Calculation
In this section we will step through examples of various user interface scenarios and calculate system
bandwidth for both 30 and 60 FPS UI HMI rendering through the GC Nano GPU. All calculation
assumptions are stated in section 5.2.
5.1 Methods of Composition
We will evaluate two options for screen display composition: first, where the GPU
does the entire screen composition of all layers (or surfaces) including video and the display controller
simply outputs the already composited HMI UI onscreen, and second, where the display controller takes
composited layers from both GPU and video decoder (VPU) and does the final UI composition blend and
merge before displaying. The top level diagrams below do not show DDR memory transactions, but they
will be shown in section 5.2 when describing the UI steps.
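In both options the final step is a per-pixel blend of the UI layer with the video layer. As a rough illustration only, the sketch below applies the standard source-over blend to ARGB8888 pixels. It assumes the video has already been converted to ARGB and uses non-premultiplied alpha; the function names are hypothetical and do not come from any Vivante driver or display controller API.

```python
def blend_over(src_argb, dst_argb):
    """Blend one ARGB8888 source pixel over an opaque destination pixel.

    Standard source-over with non-premultiplied alpha, per 8-bit channel:
    out = src * a + dst * (1 - a).
    """
    alpha = (src_argb >> 24) & 0xFF
    out = 0xFF << 24  # the composited result is treated as fully opaque
    for shift in (16, 8, 0):  # R, G, B channels
        s = (src_argb >> shift) & 0xFF
        d = (dst_argb >> shift) & 0xFF
        c = (s * alpha + d * (255 - alpha) + 127) // 255  # rounded divide
        out |= c << shift
    return out


def compose_row(ui_row, video_row):
    """Blend a row of UI pixels over the matching row of video pixels."""
    return [blend_over(u, v) for u, v in zip(ui_row, video_row)]


# Opaque red UI pixel, transparent UI pixel, half-transparent white UI pixel
ui = [0xFFFF0000, 0x00FF0000, 0x80FFFFFF]
video = [0xFF0000FF, 0xFF0000FF, 0xFF000000]
print([hex(p) for p in compose_row(ui, video)])
```

Whether this blend runs in the GPU shaders or in the display controller is exactly the difference between the two composition options.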
Figure 7: GC Nano Full Composition: All UI layers are processed by GC Nano before sending the final
output frame to the display controller
Figure 8: Display Controller Composition: Final output frame is composited by the display controller using
input layers from GC Nano and the video processor
5.2 UI Bandwidth Calculations
Calculation assumptions:
GC Nano UI processing is in ARGB8 (32-bits per pixel) format. When GC Nano performs full
composition, the GPU automatically converts 16-bit YUV video format into 32-bit ARGB.
Video frame is in YUV422 (16-bits per pixel) and has the same resolution as the screen size
(GC Nano treats incoming video as video textures)
Final composited frame is in ARGB8 format (32-bits per pixel)
Reading video has a request burst size of 32-bytes
GC Nano UI request burst size is 64-bytes
The write burst size for the UI rendering and the final frame is 64-bytes
For these cases we assume 32-bit UI rendering. If the display format is 16-bits (applicable to
smaller screens) then the bandwidth calculations listed below will be much lower.
Bandwidth calculation examples will be given for WVGA (800x480) and 720p (1280x720)
The amount of UI pixels per frame that need to be refreshed/updated (in our example) will
include the following percentages:
o 15% (standard UI)
o 25%
o 50% (worst case UI)
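To make the pixel-format assumption concrete, the minimal sketch below computes the size of one full frame at 32 and 16 bits per pixel, and the number of burst requests needed to move it. The helper names are illustrative only; the 64-byte burst size is taken from the assumptions above, and the burst count ignores alignment and partial-line effects.

```python
import math


def frame_bytes(width, height, bits_per_pixel):
    """Size in bytes of one full frame at the given pixel depth."""
    return width * height * bits_per_pixel // 8


def burst_requests(num_bytes, burst_size=64):
    """Number of burst requests needed to transfer num_bytes (rounded up)."""
    return math.ceil(num_bytes / burst_size)


wvga_32 = frame_bytes(800, 480, 32)  # 32-bit ARGB8 frame
wvga_16 = frame_bytes(800, 480, 16)  # 16-bit display format
print(wvga_32, wvga_16, burst_requests(wvga_32))
```

A 16-bit frame is half the size of a 32-bit frame, so every term that reads or writes a full frame shrinks accordingly.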
5.2.1 GC Nano Full UI Composition
The following images describe the flow of data to/from DDR memory using GC Nano to perform the entire
UI composition. Major benefits of this method include using the GPU to perform pre- and
post-processing on images or videos, filtering, adding standard 3D effects to images/videos (video
carousel, warping/dewarping, etc.) and augmented reality where GC Nano overlays rendered 3D content
on top of a video stream. This method is the most flexible since the GC Nano can be programmed to
perform image/UI related tasks.
Figure 9: GC Nano Full Composition memory access and UI rendering steps (steps 1 – 4)
Bandwidth calculation is as follows:
| Resolution (WxH) | Total Screen Pixels (a) | UI% Updated per Frame | UI Pixels Updated per Frame (b) | UI Pixels Updated per Frame (Bytes) (c) | Composition – UI and Video (Bytes) (d) | Total Bandwidth per Frame (MB) (e) | Total Bandwidth @ 30FPS (MB/s) (f) | Total Bandwidth @ 60FPS (MB/s) (f) |
|---|---|---|---|---|---|---|---|---|
| 800x480 (480p, 32bpp) | 384000 | 15% (standard UI) | 57600 | 230400 | 2419200 | 2.65 | 79.49 | 158.98 |
| 800x480 (480p, 32bpp) | 384000 | 25% | 96000 | 384000 | 2496000 | 2.88 | 86.40 | 172.80 |
| 800x480 (480p, 32bpp) | 384000 | 50% (worst case UI) | 192000 | 768000 | 2688000 | 3.46 | 103.68 | 207.36 |
| 1280x720 (720p, 32bpp) | 921600 | 15% (standard UI) | 138240 | 552960 | 5806080 | 6.36 | 190.77 | 381.54 |
| 1280x720 (720p, 32bpp) | 921600 | 25% | 230400 | 921600 | 5990400 | 6.91 | 207.36 | 414.72 |
| 1280x720 (720p, 32bpp) | 921600 | 50% (worst case UI) | 460800 | 1843200 | 6451200 | 8.29 | 248.83 | 497.66 |
Notes:
a) Total screen pixels = resolution WxH
b) UI pixels updated per frame = [Total screen pixels] * [UI% updated per frame]
c) Total UI pixels updated per frame in bytes = [UI pixels updated per frame] * [4 Bytes]; 4 Bytes since
the UI format is 32bpp ARGB8888
d) Assumes video is in the background (worst case). Total composition Bandwidth (Bytes) = Video part
[(a – b) * (2 Bytes for 16-bit YUV)] + UI part [b * 4 Bytes for ARGB8, which equals c] + Final frame [a * 4 Bytes]
e) Total bandwidth per frame (MB) = [(c + d) / 10^6]
f) Total bandwidth = [e*30] for 30 FPS and [e*60] for 60 FPS
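The calculation in notes a) to f) can be sketched in a few lines of code. This is a minimal illustration, not Vivante tooling: the function name and return layout are assumptions. The video term uses the number of non-UI pixels, which is the reading that reproduces the composition and bandwidth columns of the table.

```python
def full_composition_bandwidth(width, height, ui_percent, fps):
    """Bandwidth for GC Nano full composition, following notes a) to f).

    ui_percent is an integer percentage (15, 25 or 50 in the table).
    Returns (composition_bytes, mb_per_frame, mb_per_second).
    """
    total_pixels = width * height                    # (a)
    ui_pixels = total_pixels * ui_percent // 100     # (b)
    ui_bytes = ui_pixels * 4                         # (c) 32bpp ARGB8888
    video_bytes = (total_pixels - ui_pixels) * 2     # 16-bit YUV422 video
    final_frame_bytes = total_pixels * 4             # final ARGB8 frame
    composition_bytes = video_bytes + ui_bytes + final_frame_bytes  # (d)
    bytes_per_frame = ui_bytes + composition_bytes
    mb_per_frame = bytes_per_frame / 1e6             # (e)
    mb_per_second = bytes_per_frame * fps / 1e6      # (f)
    return composition_bytes, mb_per_frame, mb_per_second


for w, h in ((800, 480), (1280, 720)):
    for pct in (15, 25, 50):
        d, e, f30 = full_composition_bandwidth(w, h, pct, 30)
        f60 = full_composition_bandwidth(w, h, pct, 60)[2]
        print(f"{w}x{h} {pct}%: d={d} e={e:.2f} MB 30FPS={f30:.2f} 60FPS={f60:.2f} MB/s")
```

Using an integer percentage keeps the pixel counts exact, so the only rounding happens when the results are printed to two decimals.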
5.2.2 Display Controller UI Composition
This section describes the flow of data to/from DDR memory using the display controller to do the final
merging/composition of layers from the GC Nano and video processor. This method partially reduces
bandwidth consumption because the GPU does not perform final frame composition and therefore does not
need to read in the video surface. The GPU only works on composing the UI part of the frame minus any
additional layers from other IP blocks inside the SoC/MCU. A benefit from this method is lower overall
system bandwidth, but at the cost of less flexibility in the UI. If the video (or image) stream only needs to be
merged with the rest of the UI then this is a good solution. If the incoming video (or image) stream needs
to be processed in any way – adding 3D effects, filtering, augmented reality, etc. – then this method has
limitations and it is better to use the GPU for full frame UI composition.
Figure 10: Display controller performing final frame composition from two incoming layers from GC Nano
and the video processor (VPU)
The display controller has a DMA engine that can read data from system memory directly. Data formats
supported are flexible and include various ARGB, RGB, YUV 444/422/420, and their swizzle formats.
Bandwidth calculation for UI composition only is straightforward and is only based on the screen
resolution size, as follows:
| Resolution (WxH) | Total Screen Pixels (a) | Total UI Pixels (Bytes) (b) | Total Bandwidth per Frame (MB) (c) | Total Bandwidth @ 30FPS (MB/s) (d) | Total Bandwidth @ 60FPS (MB/s) (d) |
|---|---|---|---|---|---|
| 800x480 (480p, 32bpp) | 384000 | 1536000 | 1.54 | 46.20 | 92.40 |
| 1280x720 (720p, 32bpp) | 921600 | 3686400 | 3.69 | 110.70 | 221.40 |
Notes:
a) Total screen pixels = resolution WxH
b) Total UI pixels per frame = [Total screen pixels] * 4 Bytes; 32-bit ARGB8 format
c) Total bandwidth per frame (MB) = [b / 10^6]; since GC Nano needs to perform full screen UI
minus additional layers from other sources
d) Total bandwidth = [c*30] for 30 FPS and [c*60] for 60 FPS
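The same notes can be expressed as a short sketch; the function name is hypothetical. Note that the table multiplies the per-frame figure after rounding it to two decimals (1.54 MB * 30 = 46.20 MB/s), whereas computing from the unrounded byte count gives a slightly lower value (46.08 MB/s for 800x480 at 30 FPS).

```python
def display_controller_bandwidth(width, height, fps, bytes_per_pixel=4):
    """GPU-side bandwidth when the display controller does the final blend.

    Returns (ui_bytes, mb_per_frame, mb_per_second).
    """
    total_pixels = width * height               # (a)
    ui_bytes = total_pixels * bytes_per_pixel   # (b) 32-bit ARGB8
    mb_per_frame = ui_bytes / 1e6               # (c)
    mb_per_second = ui_bytes * fps / 1e6        # (d), from unrounded bytes
    return ui_bytes, mb_per_frame, mb_per_second


for w, h in ((800, 480), (1280, 720)):
    for fps in (30, 60):
        b, c, d = display_controller_bandwidth(w, h, fps)
        print(f"{w}x{h} @ {fps} FPS: b={b} c={c:.2f} MB d={d:.2f} MB/s")
```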
5.2.3 Summary of Bandwidth Calculations
The table below summarizes the calculations above:
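| Resolution (WxH) | UI% Updated per Frame | GC Nano Full Frame Composition: Total Bandwidth @ 30FPS (MB/s) | GC Nano Full Frame Composition: Total Bandwidth @ 60FPS (MB/s) | Display Controller Composition: Total Bandwidth @ 30FPS (MB/s) | Display Controller Composition: Total Bandwidth @ 60FPS (MB/s) |
|---|---|---|---|---|---|
| 800x480 (480p, 32bpp) | 15% (standard UI) | 79.49 | 158.98 | 46.20 | 92.40 |
| 800x480 (480p, 32bpp) | 25% | 86.40 | 172.80 | 46.20 | 92.40 |
| 800x480 (480p, 32bpp) | 50% (worst case UI) | 103.68 | 207.36 | 46.20 | 92.40 |
| 1280x720 (720p, 32bpp) | 15% (standard UI) | 190.77 | 381.54 | 110.70 | 221.40 |
| 1280x720 (720p, 32bpp) | 25% | 207.36 | 414.72 | 110.70 | 221.40 |
| 1280x720 (720p, 32bpp) | 50% (worst case UI) | 248.83 | 497.66 | 110.70 | 221.40 |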
Adding Vivante’s DEC compression technology will also reduce bandwidth by about 2x – 3x from the
numbers above.
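As a rough way to read the summary, the sketch below compares the two composition methods and applies the quoted 2x to 3x compression range to a bandwidth figure. It assumes the ratio applies uniformly to all traffic, which is a simplification since real compression ratios depend on content; the function names are illustrative.

```python
def with_compression(bandwidth_mb_s, low_ratio=2.0, high_ratio=3.0):
    """Return (lowest, highest) bandwidth after an assumed compression range."""
    return bandwidth_mb_s / high_ratio, bandwidth_mb_s / low_ratio


def saving_percent(full_mb_s, display_controller_mb_s):
    """Bandwidth saved by display controller composition, in percent."""
    return 100.0 * (full_mb_s - display_controller_mb_s) / full_mb_s


# 720p, 50% UI update, 60 FPS, GC Nano full composition (from the table)
lowest, highest = with_compression(497.66)
print(f"with DEC: {lowest:.2f} to {highest:.2f} MB/s")

# 800x480, 15% UI update, 30 FPS: full composition vs display controller
print(f"saving: {saving_percent(79.49, 46.20):.1f}%")
```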
6 GC Nano Architecture Advantage for UIs
There are two main architectures for GPU rendering: tile based rendering (TBR) and immediate mode
rendering (IMR). TBR breaks a screen image into tiles and renders once all the relevant information is
available for a full frame. In IMR, graphics commands are issued directly to the GPU and executed
immediately. Techniques inside Vivante’s architecture allow culling of hidden or unseen parts of the
frame so execution, bandwidth, power, etc. are not wasted on rendering parts of the scene that will
eventually be removed. Vivante’s IMR also has significant advantages when rendering photorealistic 3D
images for the latest AAA rated games that take advantage of full hardware acceleration for fine
geometries and PC level graphics quality, including support of advanced geometry/tessellation (GS/TS)
shaders in its high end GC Vega cores (DirectX 11.x, OpenGL ES 3.1 and Android Extension Pack – AEP).
Note: some of the more advanced features like GS/TS are not applicable to the GC Nano Series.
6.1 Tile Based Rendering (TBR) Architecture for UIs
The following images explain the process TBR architectures use for rendering UIs.
6.1.1 Breaking a scene into tiles…
6.1.2 But…before rendering a frame, all UI surfaces need to go through a pre-tile pass before
proceeding…
6.1.3 Combining the pre-processing step and tiling step gives us the following…
6.1.4 If the UI is dynamic then parts of the frame need to be re-processed…
6.1.5 Here are the “dirty” blocks inside the UI
6.1.6 TBR UI Rendering Summary
From the steps shown above, TBR based GPUs have additional overhead that increases UI rendering
latency since the pre-processed UI triangles need to be stored in memory first and then read back when
used. This affects overall frame rate. TBR GPUs also require large amounts of on-chip L2 cache (L2$) to store
the entire frame (tile) database. As UI complexity grows, either the on-chip L2 cache size (die area)
has to grow with it or the TBR core has to continuously overflow to DDR memory, which increases
latency, bandwidth and power.
TBRs have mechanisms to identify and track which parts of the UI (tiles) and which surfaces have changed
to minimize pre-processing, but for newer UIs that have many moving parts, this continues to be a
limitation. In addition, as screen sizes/resolutions and content complexity increase, this latency becomes
even more apparent, especially on Google, Microsoft, and other operating system platforms that will use
unified UIs across all screens.
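To illustrate what tracking "dirty" tiles involves, the sketch below maps a changed screen rectangle to the set of tiles it touches. It is a generic illustration, not a description of any specific TBR implementation; the 16x16 tile size and the function name are assumptions.

```python
def dirty_tiles(rect, tile_size=16):
    """Return the set of (tile_x, tile_y) indices touched by a changed rect.

    rect is (x, y, width, height) in pixels; tile_size is hypothetical.
    """
    x, y, w, h = rect
    if w <= 0 or h <= 0:
        return set()
    first_tx, first_ty = x // tile_size, y // tile_size
    last_tx = (x + w - 1) // tile_size
    last_ty = (y + h - 1) // tile_size
    return {
        (tx, ty)
        for ty in range(first_ty, last_ty + 1)
        for tx in range(first_tx, last_tx + 1)
    }


# A 2x2 pixel change that straddles a tile corner dirties four tiles
print(sorted(dirty_tiles((15, 15, 2, 2))))
# A full-screen change at 800x480 dirties every tile
print(len(dirty_tiles((0, 0, 800, 480))))
```

Even a small change can touch several tiles, which is why UIs with many moving parts leave little for this kind of tracking to skip.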
6.2 Immediate Mode Rendering (IMR) Architecture for UIs
The most advanced GPUs use IMR technology, which is object-based rendering found in PC
(desktop/notebook) graphics cards all the way to Vivante’s GC Series product lines. IMR allows the GPU to
render photorealistic images and draw the latest complex, dynamic and interactive content onscreen. In
this architecture, graphics API calls are sent directly to the GPU and object rendering happens as soon as
commands and data are received. This significantly speeds up 3D rendering performance.
In the case of UIs, the pre-pass processing is not required and this eliminates the TBR-related latency seen
in section 6.1. In addition, there are many intelligent mechanisms that perform transaction elimination so
hidden (unseen) parts of the frame are not even sent through the GPU pipeline, or if the hidden portions
are already in-flight (ex. change in UI surface), those can be discarded immediately so the pipeline can
continue executing useful work.
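The idea of skipping hidden work can be sketched at the layer level. The example below drops any UI layer that is completely covered by a single opaque layer above it. This is a deliberately simplified software illustration with hypothetical names; it is not Vivante's hardware mechanism, which the text above does not describe in detail.

```python
def covers(outer, inner):
    """True if rectangle outer fully contains rectangle inner.

    Rectangles are (x, y, width, height) in pixels.
    """
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ox + ow >= ix + iw and oy + oh >= iy + ih


def visible_layers(layers):
    """Return names of layers not fully hidden by one opaque layer above.

    layers is a list of (name, rect, opaque) ordered back to front.
    """
    visible = []
    for i, (name, rect, _opaque) in enumerate(layers):
        hidden = any(
            upper_opaque and covers(upper_rect, rect)
            for _n, upper_rect, upper_opaque in layers[i + 1:]
        )
        if not hidden:
            visible.append(name)
    return visible


layers = [
    ("wallpaper", (0, 0, 800, 480), True),
    ("video", (0, 0, 800, 480), True),    # opaque, covers the wallpaper
    ("icons", (10, 10, 64, 64), False),   # translucent overlay
]
print(visible_layers(layers))
```

Here the wallpaper is never drawn because the full-screen opaque video layer hides it, so its pixels cost no bandwidth.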
Composition processing is performed in the shaders for flexibility and the Vivante GPU can automatically
add a rectangle primitive that takes the whole screen into account to achieve 100% efficiency (versus 50%
efficiency using two triangles). Memory bandwidth is equivalent to TBR architectures for simple UIs and
3D frames, but for more advanced UIs and 3D scenes, TBR designs need to access external memory much
more than IMR since TBRs cannot hold large amounts of complex scene data in their on-chip caches.
The following images explain the process Vivante’s IMR architecture uses for rendering dynamic UIs. The
process is significantly simpler compared to TBRs, and dynamic changes in UI or graphics are
straightforward.
6.2.1 IMR Object Based UI Rendering
6.2.2 Additional UI Content Is Considered New Objects
6.2.3 IMR GPUs are Ideal for Dynamic and Next Gen UIs
6.2.4 IMR UI Rendering Summary
For dynamic 3D UIs, complex 3D graphics, mapping applications, etc., IMRs are more efficient in terms of
latency, bandwidth and power. Memory consumption and memory I/O are other areas where IMR has
advantages. For upcoming dynamic real-time 3D UIs, IMR is the best choice; for standard UIs, IMRs
and TBRs are equivalent, but IMRs give the SoC/MCU flexibility and future-proofing. Note: historically,
TBRs were better for simpler UIs and simple 3D games (low triangle/polygon count, low complexity) since
TBRs could keep the full frame tile database on chip (L2$ cache), but advances in UI technologies brought
about by leading smartphones, tablets and TVs have tipped things in favor of IMR technology.
7 Summary
GC Nano provides flexibility and advanced graphics and UI composition capabilities to SoCs/MCUs
targeting IoT and wearables. With demand for high quality UIs that mirror other consumer devices from
mobile to home entertainment and cars, a consistent, configurable interface is possible across all screens
as the trend towards a unified platform is mandated by Google, Microsoft and others. GC Nano is also
architected for OEMs and developers to take advantage of IMR technologies to create clean, amazing UIs
that help product differentiation. The tiny core packs enough horsepower to take on the most demanding
UIs at 60+ FPS with the smallest die area and lowest power/thermal consumption. The GC Nano also reduces
system bandwidth, latency, memory footprint and system/CPU overhead so resource constrained
wearables and IoT SoCs/MCUs can use GPUs for next generation designs.
Document Revision History
Version Date Author Notes
0.1 2014-07-22 Benson W. Tao Preliminary Draft
0.2 2014-07-30 Benson W. Tao Updated Section 6 (architecture)
1.0 2014-08-01 Benson W. Tao Public Release