1) The document discusses strategies for maximizing mobile app performance while minimizing battery drain. It identifies inefficient use of the cellular radio and preventing the processor from sleeping as common causes of excessive power consumption.
2) Trepn Profiler is introduced as a tool that can accurately measure an app's power consumption and identify performance bottlenecks by tracking CPU, GPU, and other hardware component usage.
3) The document provides best practices for using Trepn Profiler to optimize apps, such as inserting markers in code, reducing overhead from unnecessary data collection, and performing automated testing to evaluate the effects of changes.
Agenda
1. The Challenge
2. Trepn Profiler Deep Dive
3. Qualcomm® Snapdragon™ Performance Visualizer
4. Graphics & Gaming
Qualcomm Snapdragon and Trepn are products of Qualcomm Technologies, Inc.
Mobile trends resulting in increased power consumption
High-end processor speeds increased from 1.5 GHz to 2.7 GHz over the past 3 years
The number of cores has increased in 3 of the past 4 years
Mobile displays are getting larger and higher in resolution
− HD (1280x720) → FHD (1920x1080) → QHD (2560x1440)
Some mobile devices are now designed to work 24/7
− Tracking location
− Listening for voice commands
− Counting steps to track movement
A single mobile processor can replace several discrete chips
Source: Qualcomm Technologies, Inc. data
Battery capacity hasn’t kept up
Processor specs for flagship Android devices by year
[Chart: battery capacity (mAh) and processor frequency (GHz) for flagship Android devices, 2012 to 2014: an 80% increase in processor speeds vs. a 33% increase in battery capacity]
2012 – Samsung Galaxy S III – Snapdragon S4 SoC featuring a dual-core 1.5 GHz CPU; 2100 mAh
2013 – Samsung Galaxy S4 – Qualcomm Snapdragon 600 APQ8064AB quad-core 1.9 GHz; 2600 mAh
2014 – Samsung Galaxy S5 – Qualcomm Snapdragon 801 (8974) quad-core 2.5 GHz; 2800 mAh
Source: Qualcomm Technologies, Inc. data
Battery life is very important to consumers
Consumers rank phone battery life as the most important factor in their smartphone buying decision. Q: For your next smartphone/mobile phone purchase, which of the following features would drive your decision to select one phone over others?
Importance of Smartphone Features Among Smartphone Buyers¹
− Battery life: 72%
− Wi-Fi speed / quality: 52%
− Touchscreen: 48%
− Screen / display size: 46%
− Screen / display type / quality: 46%
− 4G or LTE connectivity: 44%
− Memory / storage: 42%
− Display / screen resolution: 38%
¹ Source: Qualcomm Brand Tracker, Market Research Group. United States, January 2014
Top 10 Smartphone Purchase Drivers²
Android vs. iOS vs. Windows
Battery Life
Ease of Use
Operating System
Android, Symbian,
webOS, Windows Mobile
Touch Screen
Screen Size
56% 49% 53%
33% 39% 38%
37% 32% 40%
34% 34% 37%
37% 22% 34%
2 Source: IDC's ConsumerScape 360 by IDC Michael DeHart
“High-risk” apps can waste power and mobile data
Verizon’s list of “high-risk” apps highlights apps that drain the battery faster than normal:
− Over 70% of these apps are chart toppers in Google Play
− Some of these apps can cause the battery to drain 2 to 5 times faster than normal
− Some of these apps can cause unexpected data usage (up to 2.2 GB a month)
Source: Descriptions taken from Verizon’s High-risk App site. April 2014
Heavy battery usage is a top reason consumers uninstall apps
What causes users to delete an app?
#1 Freezes, #2 Slow responsiveness, #3 Crashes, #4 Heavy battery usage, #5 Too many ads
[Chart values: 76%, 59%, 71%, 55%, 53%]
Source: Fierce Developer Survey – Exploring the reasons users complain about apps (Nov. 2012)
How to: Accurately measure the power consumption of your app
1. Compare your app to the competition – Look at idle power and real-world use cases
2. If you can measure it, you can act on it – Does code A or code B consume more power?
Steps:
1. Launch Trepn, go to Settings, and select the Battery Power data point
2. Check “Acquire Wakelock while Profiling” to keep your processor awake
3. Select “Profile App” or “Profile System”
Results:
− Device idle (screen off after 2 min)
− App #1 consumed 3.2x more power than idle (screen off after 2 min)
− App #2 consumed 500% more power than idle and 53% more power than a similar app
Source: Trepn Profiler screenshots | Trepn is a product of Qualcomm Technologies, Inc.
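The comparisons on this slide (app vs. idle, app vs. a similar app, code A vs. code B) come down to averaging each session's power samples and expressing the difference as a percentage. A minimal Python sketch, assuming each session's battery-power samples (in mW) have already been exported from Trepn into plain lists; the function names and sample values are hypothetical, for illustration only.

```python
from statistics import mean


def average_power_mw(samples_mw):
    """Mean power of one profiling session, in milliwatts."""
    if not samples_mw:
        raise ValueError("session has no power samples")
    return mean(samples_mw)


def percent_more_power(test_samples_mw, baseline_samples_mw):
    """How much more power (in %) the test session drew than the baseline.

    A result of 500.0 means the test session drew 6x the baseline power,
    i.e. "500% more power than idle".
    """
    test = average_power_mw(test_samples_mw)
    baseline = average_power_mw(baseline_samples_mw)
    return (test - baseline) / baseline * 100.0


# Hypothetical samples (mW), not measurements.
idle = [200.0, 210.0, 190.0]
code_a = [600.0, 640.0, 560.0]
code_b = [1200.0, 1190.0, 1210.0]

print(f"code A: {percent_more_power(code_a, idle):.0f}% more than idle")    # 200%
print(f"code B vs. code A: {percent_more_power(code_b, code_a):.0f}% more")  # 100%
```

Averaging over a whole session smooths out short spikes; to attribute a spike to a code path, Application State markers are the better tool.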
What if your device doesn’t support battery power?
Here are some of the many options available
− Android 5.0 Battery app
− GSam Battery Monitor Pro
− Power Tutor
Software Power Measurement Best Practices
1. Remove your USB cable – Trepn cannot display accurate power readings when your mobile device is charging or connected to a computer. Tip: Use ADB over Wi-Fi
2. Make sure your device reports accurate battery power – Supported devices include: Nexus 4, Nexus 5, Nexus 7, HTC One (2013), Sony Xperia ZL, HTC Droid DNA and more
3. Make sure your CPU stays awake – Check “Acquire Wakelock while Profiling” in Settings
4. Minimize background processes – It’s not enough to close everything in Recent Apps. Go to the Task manager and clear memory; open the Apps manager, go to Running, close all remaining unneeded apps, and stop all unneeded services
− Pure Android devices are better for testing because they have fewer preinstalled apps and less running in the background
− Before closing unused apps: 1043 mW. After closing unused apps: 726 mW. A 30% difference!
Software Power Measurement Best Practices, cont.
5. Focus on what you’re measuring – Turn off everything that is not related to what you want to measure (e.g., the screen when possible, Wi-Fi, mobile networks, Location (GPS), Bluetooth, Google Now, etc.)
6. Reduce overhead – Don’t use more data points than you need, because each one increases system resource consumption. Make sure “Show Per-Application Statis..” is unchecked unless you need to see the mobile and Wi-Fi data transmitted. Turn off screen overlays
7. Minimize the impact of the screen – The screen is the biggest consumer of power. If you don’t need the screen to be easily readable, turn it off or set the screen timeout to a short value (e.g., 1 minute)
8. Compare apples to apples – Don’t compare on-target measurements with off-target (e.g., Monsoon) measurements
Demo: Using Trepn to profile your mobile processor
1. Go to developer.qualcomm.com and register. Log in and download Trepn Profiler under ‘Increase App Performance’
2. Install and launch Trepn and go to Settings
− Under ‘General,’ check ‘Acquire Wakelock while profiling’
− Under ‘Data Points,’ check ‘CPU2 Frequency’ and ‘CPU3 Frequency’ and uncheck ‘Application State’
− Go to ‘Overlays’ and add Graph overlays in the lower corners for ‘CPU2 Frequency’ and ‘CPU3 Frequency’
3. Go to Settings > General > Save Preferences, type ‘Default’ as the preference name and touch Save
4. Touch Back, then ‘Profile System’ and ‘Graph’
5. Touch the Back button, then ‘Overlays’
6. Go to the Notification bar and touch ‘Stop Profiling’
7. Save as .db so the session can be analyzed later
Is this app using all four of the CPU cores?
Source: Trepn screenshot of 3DMark Extreme benchmark
Trepn tracks the actual power consumed at a level of detail that
is unmatched by most other on-target profilers
Source: Screenshot of Trepn Profiler running GLBench
How much power is saved when playing a 4K video using the
hardware decode option?
Video playback using S/W decoder option: almost 3x more CPU load
Video playback using H/W decoder option: 43% less power consumed
Is this app using system resources efficiently?
Source: Trepn screenshot
Pros
− Measures voltage, current and power
− Very accurate
− Doesn’t consume CPU cycles (or other system resources)
like on-target solutions
Cons
− Must attach wires to the battery in your mobile device
− Can be expensive ($770)
− Can’t measure per-app or per-rail power
The pros and cons of off-target measurement
Qualcomm® Snapdragon™ MDP development hardware has sense resistors placed on the following hardware rails:
− CPU, Camera, Digital Core, Graphics, Internal Memory, LCD Backlight, SD Card and WLAN/BT
With a fixed, known sense resistance, measuring the voltage drop across the resistor gives the rail current (I = V_sense / R), and multiplying by the rail voltage gives the rail power (P = V_rail × I). An EPM PSoC converts these analog voltages to digital values so Trepn Profiler can display them
Measuring per-rail power consumption
[Screenshots: power delta before and after a photo is taken]
Source: Trepn Profiler screenshots
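The per-rail arithmetic is Ohm's law applied twice: the drop across the sense resistor gives the rail current, and the rail voltage times that current gives the rail power. A minimal Python sketch of that calculation; the resistor value and voltages in the usage lines are hypothetical, not MDP specifications.

```python
def rail_current_a(v_sense_v: float, r_sense_ohm: float) -> float:
    """Rail current from the voltage drop across the sense resistor (I = V / R)."""
    if r_sense_ohm <= 0:
        raise ValueError("sense resistance must be positive")
    return v_sense_v / r_sense_ohm


def rail_power_w(v_rail_v: float, v_sense_v: float, r_sense_ohm: float) -> float:
    """Power delivered on the rail (P = V_rail * I)."""
    return v_rail_v * rail_current_a(v_sense_v, r_sense_ohm)


# Hypothetical example: 10 mV across a 0.02-ohm sense resistor on a 1.1 V rail.
current = rail_current_a(0.010, 0.02)
power = rail_power_w(1.1, 0.010, 0.02)
print(f"{current * 1000:.0f} mA, {power * 1000:.0f} mW")  # 500 mA, 550 mW
```

Sense resistors are kept small so the drop across them barely disturbs the rail, which is why an analog-to-digital converter (the EPM PSoC here) is needed to read such small voltages.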
How to: Profile Your App Without Leaving Your IDE
Installing the Trepn plug-in for Eclipse
1. Launch Eclipse
2. Select Help > Install New Software
3. Click Add… and enter:
• Name – you may use any name, such as Trepn Plugin
• Location –
https://developer.qualcomm.com/docs/trepn/eclipse/
• Click OK
4. Select the Trepn Plugin checkbox and click Next
5. In Install Details, click Next
6. Review and accept the license agreement and click Finish:
• If you see a security warning regarding unsigned content,
click OK
• Installing dependencies may take a few minutes
7. After installation completes, restart Eclipse
8. Select Window > Open Perspective > Other
9. Select Trepn and click OK – Downloading additional files may
take a few minutes
Profiling in Eclipse
Starting and Stopping Profiling
1. Connect your device to the computer via USB
2. Confirm your device is listed as a connected device
3. In the Trepn Control pane (left side), click the green play
icon to start profiling:
• The Trepn APK will be installed on your device if
necessary
Using Trepn to See What’s Happening Behind the Scenes
The Trepn plug-in shows what your phone is doing when it’s idle
Source: A portion of the Trepn plug-in screen
− All 4 cores are active here
− GPU active here
− 250 mA to 350 mA of current is drawn here
− Even though the cellular radio is idle, it still draws up to 150 mA
− GPS active here
− Wi-Fi active here
Trepn shows how much data and CPU is used
Source: Trepn Profiler screenshots
Mobile data usage Wi-Fi data usage CPU usage by app
Playing Mobile Data Detective
Small changes can make a big difference
Source: Trepn plug-in screens
Mobile data transmitted before one setting is turned off
Mobile data transmitted after one setting is turned off
Trepn Shows Whether the Cellular Radio Is Being Used in an
Efficient Manner
Source: Trepn Profiler screenshots
Why this is a problem
− A dormant cellular radio draws less than 10 mA
− When data is sent or received, the radio comes up and goes into an active state, drawing 250-300 mA
− When not sending data, the radio drops down to idle, but still draws about 150 mA
− After a timeout of 8 to 15 seconds, the radio finally goes back to a dormant state
[Chart: radio current (mA) over time (seconds): dormant <10 mA; Connected (Active) 250-300 mA; Connected (Idle) ~150 mA for 8-15 seconds before returning to dormant]
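The idle tail is what makes scattered transfers expensive: every burst, however small, buys 8 to 15 seconds at ~150 mA. A simplified Python model of the three states on this slide, simulated at 1-second resolution. Assumptions: 275 mA active (midpoint of 250-300 mA), 10 mA dormant (upper bound of "<10 mA"), a 10-second tail, and no promotion delay or signaling cost; the function name is hypothetical.

```python
ACTIVE_MA = 275.0   # assumed midpoint of the 250-300 mA active range
IDLE_MA = 150.0     # connected but idle
DORMANT_MA = 10.0   # assumed upper bound of "<10 mA"
TAIL_S = 10         # assumed idle tail; the slide quotes 8-15 s


def radio_charge_mas(bursts, total_s, tail_s=TAIL_S):
    """Charge drawn by the radio, in mA*s, simulated at 1-second resolution.

    bursts: list of (start_s, duration_s) transfers, in whole seconds.
    """
    charge = 0.0
    tail_ends_at = None  # second at which the current idle tail expires
    for t in range(total_s):
        if any(start <= t < start + dur for start, dur in bursts):
            charge += ACTIVE_MA
            tail_ends_at = t + 1 + tail_s  # each active second restarts the tail
        elif tail_ends_at is not None and t < tail_ends_at:
            charge += IDLE_MA
        else:
            charge += DORMANT_MA
    return charge


# Three 1-second transfers 30 s apart vs. the same data sent as one 3-second burst.
print(radio_charge_mas([(0, 1), (30, 1), (60, 1)], 120))  # 6195.0
print(radio_charge_mas([(0, 3)], 120))                    # 3395.0
```

Under these assumptions the same amount of active time costs almost twice as much charge when spread out, because three tails are paid instead of one.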
How to efficiently use your cellular radio and Wi-Fi network
Connect less often – After you transmit data, the radio stays on for an additional 10 to 12 sec.¹
Push, don’t poll – Specify how often items are delivered¹
Use analytics wisely¹
− Capture data locally and group transmissions to your server. Extend the time between transmissions
Offer ad-free versions of apps – Apps without ads connect to the network much less often
Don’t continuously scan¹
− Create timeouts appropriately when scanning for Wi-Fi networks or GPS signals
Don’t continuously stream – Download streams in chunks¹
Offload to Wi-Fi – It uses significantly less battery than 3G or 4G¹
Use the new JobScheduler APIs that are part of Android 5.0²
Use Battery Historian to view wakelocks and mobile radio usage² (http://github.com/google/batteryhistorian)
¹ Source: AT&T “Tips to Increase Battery Life Handout” from AnDevCon 2013
² Google I/O 2014 – Introduction to Project Volta by Meghan Desai and Matthew Jay Williams
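"Capture data locally and group transmissions" can be as simple as a buffer that flushes only when it is full or old enough, so the radio wakes once per batch instead of once per event. A minimal Python sketch; `AnalyticsBatcher` is a hypothetical helper, `send` stands in for whatever uploads a batch to your server, and the thresholds are illustrative.

```python
from typing import Callable, List, Optional


class AnalyticsBatcher:
    """Buffers analytics events locally and sends them in groups (hypothetical helper)."""

    def __init__(self, send: Callable[[List[dict]], None],
                 max_events: int = 50, max_age_s: float = 15 * 60):
        self.send = send              # uploads one batch; one radio wake-up per call
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.buffer: List[dict] = []
        self.oldest_ts: Optional[float] = None

    def record(self, event: dict, now: float) -> None:
        """Store an event; flush only when the batch is full or old enough."""
        if self.oldest_ts is None:
            self.oldest_ts = now
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events or now - self.oldest_ts >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered in a single transmission."""
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
            self.oldest_ts = None


batches = []
batcher = AnalyticsBatcher(batches.append, max_events=10)
for i in range(25):
    batcher.record({"event": "tap", "n": i}, now=float(i))
batcher.flush()  # e.g. when the app moves to the background
print([len(b) for b in batches])  # [10, 10, 5]
```

The age check only runs when an event arrives; on Android, a scheduled job (for example via the JobScheduler APIs mentioned above) is the natural place to flush stale batches.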
Using the JobScheduler APIs coalesces mobile radio activity
As a result, there is less overhead per mobile radio transmission
Source: Battery Historian slide from Google I/O 2014 – Introduction to Project Volta by Meghan Desai and Matthew Jay Williams
Before
After
Bundling traffic can reduce overhead-to-data ratio
Source: The Smartphone Challenge: Signaling Congestion and Power Consumption – Gerardo Giaretta – Qualcomm Technologies Inc.
− Without bundling, the amount of overhead for each data burst is high: each burst keeps the radio connected for its own period (Ta, Tb, Tc)
− Bundling data results in a shorter total connection time and less overhead: Td < Ta + Tb + Tc
− Bundling or gating reduces the number of times the device connects to the network, which reduces signaling and power consumption
[Diagram legend: data from different apps; overhead (radio-on duration); overhead transmission continues (based on the dormancy timer) before the terminal goes back to idle]
The benefit of using your cellular radio more efficiently
Bundling data results in significant power savings
[Chart: current (mA, axis marks at 10, 150 and 300) over 3 minutes, unbundled vs. bundled traffic: ~9000 mA without bundling vs. ~5000 mA with bundling. Nearly half the power!]
Source: Qualcomm Technologies, Inc. data
How to: Insert Markers in Your Code to Identify the Causes of
CPU, GPU and Power Spikes
Trepn correlates system events with your code
Step 1: Insert Application State markers in your code. Step 2: Track your app activity in Trepn.
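Application State markers reach Trepn as Android broadcast intents, so an automated test running on a desktop can send the same marker with `adb shell am broadcast`. A minimal Python sketch; the intent action and extra names below are our recollection of Trepn's documentation and should be treated as assumptions to verify against your Trepn version, and the helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional

# Assumed Trepn intent names; verify against the Trepn Profiler documentation.
TREPN_ACTION = "com.quicinc.Trepn.UpdateAppState"
TREPN_VALUE_EXTRA = "com.quicinc.Trepn.UpdateAppState.Value"
TREPN_DESC_EXTRA = "com.quicinc.Trepn.UpdateAppState.Value.Desc"


def app_state_command(value: int, description: Optional[str] = None,
                      serial: Optional[str] = None) -> List[str]:
    """Build an adb command that broadcasts an Application State marker."""
    cmd = ["adb"]
    if serial:
        cmd += ["-s", serial]  # target one device when several are attached
    cmd += ["shell", "am", "broadcast", "-a", TREPN_ACTION,
            "--ei", TREPN_VALUE_EXTRA, str(value)]
    if description:
        # adb shell joins arguments into one device-side command line, so quote it.
        cmd += ["--es", TREPN_DESC_EXTRA, shlex.quote(description)]
    return cmd


def send_app_state(value: int, description: Optional[str] = None,
                   serial: Optional[str] = None) -> None:
    """Send the marker; requires adb on PATH and a connected device."""
    subprocess.run(app_state_command(value, description, serial), check=True)


print(" ".join(app_state_command(1, "Loading")))
```

Inside the app itself, the equivalent is building an `Intent` with the same action and extras and calling `sendBroadcast`.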
Identifying the cause of power spikes in your code
Inserting markers in your code can help identify the cause of power spikes
Source: Trepn Profiler screenshots
How to: Use Application States to Analyze Automated Data
Using application states in automated tests to get average data
[Chart: Effect of Frequency Scaling on Battery Power – battery power (mW, 0 to 6000) vs. CPU frequency (MHz, 300 to 2265)]
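Once every sample carries the Application State that was active when it was taken, the averages behind a chart like this are a group-by. A minimal Python sketch, assuming the session has been exported as (state, value) pairs, for example from the Application State and Battery Power columns of a CSV export; the column layout and the choice of encoding the CPU frequency (MHz) as the state value are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_by_state(samples: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average a metric (e.g. battery power in mW) per Application State."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for state, value in samples:
        totals[state] += value
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


# Hypothetical samples: state = CPU frequency step (MHz), value = battery power (mW).
session = [(300, 1000.0), (300, 1100.0), (1497, 3000.0), (1497, 3200.0)]
print(average_by_state(session))  # {300: 1050.0, 1497: 3100.0}
```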
How to: Perform Automated Testing With Trepn
Trepn measures the effect of frequency scaling on battery power
Common causes of excessive power consumption
− Preventing the processor (or other hardware subsystems) from going to sleep
− Inefficient use of the cellular radio and Wi-Fi network
− Keeping the display lit too long
− Taking too many GPS location fixes
Does your app prevent the processor from going to sleep?
Using Wakelock Detector (WLD) to identify processor sleep problems (Android 4.3 and earlier)
To test, unplug your device and run it for at least 2-3 hours
A large number of short wakelocks or wakeup triggers can have a negative impact on battery usage, because a phone takes time to wake up and go back to sleep
[Screenshots: CPU Wakelocks; Wakeup Triggers]
Why this is a big deal
A Nexus 5 in standby gets almost one month of battery life, yet most consumers get less than one day
The reason: you trade two minutes of standby power every hour for 1 second of activity. With 50 apps, 100 minutes of standby power is consumed every hour
Google studied the power consumed for three typical use cases
Notes from Google’s Android Performance Primer at Google I/O (Intro to Project Volta)
Source: Slides from Google I/O 2014 – Introduction to Project Volta by Meghan Desai and Matthew Jay Williams
How to avoid preventing your device from going to sleep
Expert advice on how to use wakelocks more efficiently
Do not acquire PowerManager.WakeLocks unless you really need them
− Device battery life will be significantly affected by the use of this API
Use the minimum levels needed
Be sure to release wakelocks as soon as possible
To keep the screen lit when your app is in the foreground, use FLAG_KEEP_SCREEN_ON instead
Consider using AlarmManager when you want your application code to run at a specific time. Beginning with KitKat, the OS shifts alarms to minimize wakeups and battery usage.
For normal timeouts, it’s easier and more efficient to use Handler
Source: http://developer.android.com/reference/android/os/PowerManager.html#PARTIAL_WAKE_LOCK
Using WifiLocks more efficiently
Here’s what Google says about the use of WifiLocks
The Wi-Fi radio will only turn off if no WifiLocks are held by any application
Before using a WifiLock, consider whether your app requires Wi-Fi access or could function on the mobile network
Large file downloads should hold a WifiLock to ensure the download will complete
Source: http://developer.android.com/reference/android/net/wifi/WifiManager.WifiLock.html
How to save power when using the GPS
Make sure you don’t take a GPS fix when it’s not needed – AT&T’s ARO and the Trepn plug-in show when GPS fixes occur
Use coarse location fixes when possible, because they require much less power
Let LocationManager find the best provider for you
Have a timeout if you can’t find a satellite
Make the user aware when location tracking is active
Make it easy for users to disable location tracking without crippling their device
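"Have a timeout if you can't find a satellite" means bounding how long the receiver stays on while searching. A minimal Python sketch of that control flow; `poll_fix` is a hypothetical stand-in for your platform's location callback, and the clock and sleep functions are injectable so the logic can be tested without real hardware.

```python
import time
from typing import Callable, Optional, Tuple

Fix = Tuple[float, float]  # (latitude, longitude)


def get_fix_with_timeout(poll_fix: Callable[[], Optional[Fix]],
                         timeout_s: float = 30.0,
                         poll_interval_s: float = 1.0,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> Optional[Fix]:
    """Poll for a location fix, but give up after timeout_s.

    Returning None tells the caller to stop the GPS instead of
    searching for satellites indefinitely.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        fix = poll_fix()
        if fix is not None:
            return fix
        sleep(poll_interval_s)
    return None
```

A caller that gets `None` back should release the location request and retry later (or fall back to a coarse fix) rather than leaving the receiver searching.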
Benefits of offloading to a DSP
Power savings from running FastCV™ facial detection on a DSP instead of the CPU
FastCV is a product of Qualcomm Technologies, Inc.
Android 4.4 adds platform support for audio tunneling to a digital signal processor (DSP) in
the device chipset, waking the application processor less often and using less battery.
Audio tunneling can dramatically improve battery life for use-cases such as listening to
music over a headset with the screen off.
− For example, Nexus 5 offers a total off-network audio playback time of up to 60 hours, an increase of over
50% over non-tunneled audio.
Media applications can take advantage of audio tunneling on supported devices without
needing to modify code. Audio tunneling requires support in the device hardware.
Offloading audio to the DSP
Enabling low power & increased concurrency
• Sleep tracking apps normally use the accelerometer in your phone to sense your movements during sleep
• By using the accelerometer on a wearable, you don’t need to keep your phone in bed, and you get more precise readings
• With sensor batching, the DSP collects sensor data samples in the background while the application processor (AP) is shut down; when the AP wakes up, the DSP hands it a “batch” of sensor data for processing and display
• Without sensor batching: 8 hours of sleep tracking uses 70-100% of battery life
• With sensor batching: 8 hours of sleep tracking uses 10-20% of battery life
Real World Example: Sleep Tracking
Source: https://sites.google.com/site/sleepasandroid/doc/integration
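The saving from sensor batching comes from waking the AP once per batch rather than once per sample. A simplified Python counting model; the sample rate, FIFO size and latency in the usage lines are hypothetical, and the model counts wake-ups only, so it does not attempt to reproduce the battery percentages above. On Android 4.4 and later, apps request batching by passing a maximum report latency to `SensorManager.registerListener`.

```python
def ap_wakeups(sample_rate_hz: float, duration_s: float,
               fifo_capacity: int = 0, max_latency_s: float = 0.0) -> int:
    """Count application-processor (AP) wake-ups needed to deliver sensor data.

    fifo_capacity == 0 models no batching: every sample wakes the AP.
    Otherwise the sensor hub buffers samples and wakes the AP only when the
    FIFO is full or the oldest buffered sample reaches max_latency_s.
    """
    n_samples = int(sample_rate_hz * duration_s)
    if fifo_capacity <= 0:
        return n_samples
    per_batch = fifo_capacity
    if max_latency_s > 0:
        # The latency limit can force a delivery before the FIFO fills up.
        per_batch = min(per_batch, max(1, int(sample_rate_hz * max_latency_s)))
    return -(-n_samples // per_batch)  # ceiling division


night = 8 * 3600  # 8 hours of sleep tracking
print(ap_wakeups(10, night))                                          # 288000
print(ap_wakeups(10, night, fifo_capacity=3000, max_latency_s=600))   # 96
```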
Recap of power saving tips
Use wakelocks only when necessary, use the minimum levels possible, and release them as soon as possible. To keep the screen on, use FLAG_KEEP_SCREEN_ON instead
Close TCP sockets when done. Otherwise, you unnecessarily bring up the network later just to tear the connection down. This simple fix can reduce network power by up to 20%
Group network activity when possible; be flexible in your ping times
Review best practices for detailed solutions to power- and performance-related problems
Turn off functions like GPS, camera, accelerometer and other sensors when they are no longer required
Summary
Long battery life is very important to consumers
There is no excuse for bad power management –
Free software is available that makes it easy to locate
and fix problems with excessive power consumption
Better battery life can give your app an advantage over
the competition, which could result in more positive reviews
Snapdragon Performance Visualizer (SPV) is a
comprehensive software tool suite that is designed
to enable Android developers using the Snapdragon
Mobile Development Platform (MDP) to visualize,
analyze and correlate the impact of detailed CPU
and system data on application performance,
making it easier to pinpoint and resolve
performance bottlenecks.
The GUI is web based
− Dojo
− JavaScript
− Scalable Vector Graphics (SVG)
− JSON
Command Line
− ssh access for advanced command line tools
− Access to “perf” command line
− Other familiar tools (top, strace, etc.)
− Scripting (bash, perl) for automation
Automation
− Web based automation APIs available for many of the tools
− http://server:7376/api/setSessionTimeout&sessionId=131074&timeout=3600
How Does SPV Work
Web based tool set and more
Monitor performance
Examine performance monitors (CPU, L2, GPU, DSP)
Visualize system traces
Statistical (time or event) profiling
Add custom data, custom markers
Thermal data
Integrate with power monitoring (QEPM) tools
Memory leak and allocation corruption detection
Kernel probes
B and E markers from Atrace
Runs on most Linux-based distributions (Android, Ubuntu, Debian, Tizen, Chrome OS, etc.)
What can SPV do?
Tools to monitor the system, both in post-processing and in real time
CPU Information
− Performance Monitors (cache hits)
− Utilization
− Frequency
− Temperature
− more
GPU Information
− Performance monitors
− Utilization
− Frequency
− more
DSP
− Performance Monitors
Live View
Real-time collection and visualization of time-correlated system data
Live View (2)
Live View is the bridge to ProfileView’s advanced visualization and correlations
Source: Snapdragon Performance Visualizer screen shots
File Line Field
− Reads /proc, /sys, /debugfs files
− Periodically plots the value in a specific file, at a specific line number and a specific field on that line
Named Pipes
− Marker
− Long
− Double
− Binary
Adding Custom Data to SPV
Mechanisms to extend SPV with your unique data
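The File Line Field mechanism amounts to "open a text file, take line N, take whitespace-separated field M, repeat periodically". A minimal Python sketch of that reading logic, not SPV's implementation; the 0-based indexing and the helper names are assumptions.

```python
import time
from typing import Callable, Iterable, List


def field_from_lines(lines: Iterable[str], line_no: int, field_no: int) -> float:
    """Return field `field_no` of line `line_no` (both 0-based) as a number."""
    for i, line in enumerate(lines):
        if i == line_no:
            return float(line.split()[field_no])
    raise IndexError(f"no line {line_no}")


def read_file_line_field(path: str, line_no: int, field_no: int) -> float:
    """Read one value from a /proc-, /sys- or debugfs-style text file."""
    with open(path) as f:
        return field_from_lines(f, line_no, field_no)


def sample(path: str, line_no: int, field_no: int, count: int,
           interval_s: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Poll the same file/line/field `count` times, like a File Line Field node."""
    values = []
    for _ in range(count):
        values.append(read_file_line_field(path, line_no, field_no))
        sleep(interval_s)
    return values


# On Linux, line 0 / field 0 of /proc/loadavg is the 1-minute load average:
# print(read_file_line_field("/proc/loadavg", 0, 0))
```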
File Line Field node
− Look at the run-queue depth on CPU0
− Stored in /sys/devices/system/cpu/cpu0/rq-stats/run_queue_avg
Pipe data format is:
− timestamp,data<cr> -or- ,data<cr>
− Timestamp is gtod (gettimeofday) time, in decimal seconds, e.g. 12345.34567
− Data can be written from shells using echo, e.g.: echo ",123.456" > /tmp/namedpipe
Adding Custom Data to SPV
Some simple code and nodes
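Anything that can write lines to the pipe can feed SPV, not only `echo`. A minimal Python sketch of a writer for the `timestamp,data` / `,data` format; it assumes `<cr>` means the ordinary line terminator that `echo` emits (a newline), uses `time.time()` for the gettimeofday-style timestamp, and the helper names are hypothetical.

```python
import time
from typing import Iterable, Optional


def format_sample(value: float, timestamp: Optional[float] = None) -> str:
    """Format one sample as 'timestamp,data' or ',data', newline-terminated."""
    if timestamp is None:
        return f",{value}\n"
    return f"{timestamp:.5f},{value}\n"  # decimal seconds, like 12345.34567


def write_samples(pipe_path: str, values: Iterable[float],
                  with_timestamps: bool = True) -> None:
    """Write samples to an existing named pipe (e.g. /tmp/namedpipe).

    Opening a FIFO for writing blocks until a reader (here, SPV) opens it.
    """
    with open(pipe_path, "w") as pipe:
        for v in values:
            ts = time.time() if with_timestamps else None
            pipe.write(format_sample(v, ts))
            pipe.flush()  # deliver each sample immediately


print(repr(format_sample(123.456)))  # ',123.456\n'
```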
Sample based profiling
Uses “perf record” under the covers
Not real-time
− Collect – Stop – Analyze
Multiple Visualizations
− Text Based (Oprofile)
− ProfileView GUI
Libs provided for instrumented code
Profiling
Statistical and instrumented
Source: Snapdragon Performance Visualizer screen shots
Correlate events
See when frequency changes
Which processes/threads are running
When CPU issues commands to GPU
Per-context Performance Monitors
Call stack analysis
Hot Spot analysis
Not real-time
− Collect – Stop – Analyze
Correlates with Live View data
Correlates with Profile data
Correlates with Custom data
Tracing
Instrumented kernel and code
Source: Snapdragon Performance Visualizer screen shots
Monitors current and voltage rails
Combined sample rates up to 50K
Simultaneously monitor multiple devices
Correlate data with SPV
− Frequency
− Thermal
− Performance
Web based
Automation interface
Data export capabilities
QEPM
Source: Snapdragon Performance Visualizer screen shots
Graphics & Gaming
Manish Sirdeshmukh
Product Manager, Staff
Qualcomm Technologies, Inc.
Dave Astle
Principal Engineer, Manager
Qualcomm Technologies, Inc.
Gaming on mobile today
“Epic now has brought Unreal Engine 4 to Android with the Snapdragon 800 and 805 chipsets from Qualcomm
Technologies,” said Niklas Smedberg, Senior Engine Programmer, Epic Games. “Recently we worked with Qualcomm
[QTI] to elevate graphics to the next level on the Qualcomm® Snapdragon™ Adreno GPU hardware, which delivers
some of the most power-efficient unified shader capabilities we’ve seen yet for Android smartphones and tablets.”
Comparison: PC Comparison: Mobile
Qualcomm Adreno is a product of Qualcomm Technologies, Inc.
What is involved in games?
Image: Modern Combat 5 by Gameloft
Gameplay execution (animation):
Animation for water movement and
anchored boat motion
Gameplay execution (AI):
Enemy helicopter controlled by AI
Gameplay execution (physics):
Particle physics makes
explosions look real
Console-quality graphics:
Lens effect on the sunlight breaking
through the clouds
Console-quality graphics:
Hi-res textures provide rich
details to the scene
Console-quality graphics:
Bloom glare from gunfire provides an immersive experience
Fast connectivity:
Play a mission in multi-player gaming
High-quality video:
After completing the level, watch a
cut scene transition
Responsive and accurate control:
Control the character movement
Multi-screen experience:
Mirror your screen to TV
Cinema-quality sound:
Hear gunfire, explosions, bullets
flying by, and the helicopter’s
rotor blades
How is SoC utilized by a game?
Heterogeneous hardware blocks and data flow
[Diagram: SoC blocks used by a game – quad-core CPU (CPU #1 to #4: physics, animation, game logic, artificial intelligence), GPU (graphics rendering), video decoder, video encoder, DSP (audio decoder), sensor engine, display engine, Wi-Fi engine and system memory. Data flows shown: graphics textures, shaders and geometry; video data; audio data; input signals; GPU reads and graphics pixel writes; video pixel writes; display reads; final frame to the display panel; encoded final frame to Wi-Fi; audio to the speakers]
Let’s focus on Samsung Galaxy Note 4 hardware
Processor: Snapdragon 805
CPU: Quad-core Krait 450 CPU at up to 2.7 GHz per core
GPU: Adreno 420 GPU
DSP: Qualcomm® Hexagon™ V50 DSP (up to 800 MHz)
Display: 4K Ultra HD on-device display concurrent with 4K Ultra HD output to HDTV; 1080p and 4K external displays
Memory: LPDDR3 800 MHz dual-channel 64-bit (25.6 GBps)
Qualcomm Hexagon is a product of Qualcomm Technologies, Inc.
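The 25.6 GBps memory figure follows from the clock, the bus width and the channel count. A minimal Python sketch of the peak-bandwidth arithmetic, assuming each of the two channels is 64 bits wide and that LPDDR3 transfers data on both clock edges; these assumptions reproduce the quoted figure.

```python
def peak_bandwidth_gbps(clock_mhz: float, bus_width_bits: int,
                        channels: int, transfers_per_clock: int = 2) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10**9 bytes).

    DDR memory moves data on both clock edges, hence transfers_per_clock=2.
    """
    bytes_per_transfer = bus_width_bits // 8
    return clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer * channels / 1e9


print(peak_bandwidth_gbps(800, 64, 2))  # 25.6
```

This is a theoretical ceiling; sustained bandwidth is lower, which is one reason the tiled-rendering and texture-compression features below matter.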
Desktop and console quality graphics on mobile
Complete DirectX11 FL 11_2 pipeline, supports
OpenGL ES 3.1
Support for dynamic
hardware tessellation
& geometry shaders
Samsung Galaxy Note 4 GPU highlights
Richer, visually immersive graphics
No Tessellation Tessellation
Samsung Galaxy Note 4 supports most advanced graphics APIs
Feature/APIs OpenGL ES 3.0 OpenGL ES 3.1 Android Extension Pack
Compute Shader No Yes Yes
Atomics No Yes Yes
Image Load/Store No Yes Yes
Draw Indirect No Yes Yes
Texture Gather No Yes Yes
Multisample Textures No Yes Yes
Stencil Textures No Yes Yes
Separate Shader Objects No Yes Yes
Advanced Blending Modes (Programmable Blending) No No Yes
Geometry Shaders No No Yes
Tessellation Shaders No No Yes
Samsung Galaxy Note 4 GPU highlights
Improved architecture for performance & efficiency
− Better performance
− Reduced power consumption
− Dynamic switching between direct rendering and tiled rendering
− ASTC compression: original 24 bpp vs. compressed 8 bpp, 3.56 bpp and 2 bpp
− Unified shaders: pixel | vertex | compute | tessellation | geometry
[Diagram: Adreno GPU rendering directly to system memory vs. Adreno GPU rendering through a tile buffer to system memory]
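The ASTC rates listed above fall out of one fact about the format: every ASTC block occupies 128 bits, whatever its footprint, so bits per pixel is 128 divided by the block's pixel count. The 8, 3.56 and 2 bpp figures are consistent with 4x4, 6x6 and 8x8 blocks. A minimal Python sketch of that arithmetic, with the 24 bpp original taken from the slide.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block is 128 bits, whatever its footprint


def astc_bits_per_pixel(block_w: int, block_h: int) -> float:
    """Bits per pixel for an ASTC block footprint such as 4x4 or 8x8."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def compression_ratio(block_w: int, block_h: int, source_bpp: float = 24.0) -> float:
    """Size reduction versus an uncompressed texture (24 bpp by default)."""
    return source_bpp / astc_bits_per_pixel(block_w, block_h)


for w, h in [(4, 4), (6, 6), (8, 8)]:
    print(f"{w}x{h}: {astc_bits_per_pixel(w, h):.2f} bpp, "
          f"{compression_ratio(w, h):.2f}x smaller than 24 bpp")
# 4x4: 8.00 bpp, 3.00x smaller than 24 bpp
# 6x6: 3.56 bpp, 6.75x smaller than 24 bpp
# 8x8: 2.00 bpp, 12.00x smaller than 24 bpp
```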
GPU architecture
Tiled Rendering
architecture
Advantages:
Designed to minimize unnecessary data traffic to host memory
Designed to minimize power consumption
Use of transparency/anti-aliasing is inexpensive
Early Z (Depth) Reject feature
[Diagram: objects in the foreground occluding objects in the background]
Advantages:
• Designed to prevent unnecessary use of GPU resources in drawing pixels for occluded objects
• Designed to increase overall graphics performance for larger scenes with opaque geometry
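Tiled rendering keeps traffic to system memory low by rendering one screen tile at a time in on-chip tile memory; before that, a binning pass works out which primitives touch which tile. A simplified Python sketch of binning by bounding box; real GPUs use exact coverage tests and hardware-specific tile sizes, so the tile size and helper name here are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[float, float]  # screen-space (x, y) in pixels


def bin_triangles(triangles: Sequence[Sequence[Vertex]], screen_w: int, screen_h: int,
                  tile_w: int, tile_h: int) -> Dict[Tuple[int, int], List[int]]:
    """Assign each triangle to every screen tile its bounding box overlaps."""
    bins: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    max_tx = (screen_w - 1) // tile_w
    max_ty = (screen_h - 1) // tile_h
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen, then convert to tile indices.
        x0 = max(0, int(min(xs)) // tile_w)
        x1 = min(max_tx, int(max(xs)) // tile_w)
        y0 = max(0, int(min(ys)) // tile_h)
        y1 = min(max_ty, int(max(ys)) // tile_h)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


tris = [[(1, 1), (10, 1), (1, 10)],       # fits in the top-left tile
        [(20, 20), (40, 20), (20, 40)]]   # straddles all four tiles
print(bin_triangles(tris, 64, 64, 32, 32))
# {(0, 0): [0, 1], (1, 0): [1], (0, 1): [1], (1, 1): [1]}
```

Each tile is then rendered with only its own list, which is also why work submitted between draws may have to be replayed per tile, as the VBO slides below describe.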
GPU architecture
Dynamic FlexRender technology
[Diagram: Adreno GPU with system memory using direct rendering; Adreno GPU with GMEM (tile buffer) and system memory using tiled rendering; FlexRender switches dynamically between the two]
Advantages:
− Better performance and power for a wider range of use cases
− More developer flexibility
Double Rate Half Precision (DRHP) design
1X speed for “highp” shaders; 2X speed for “mediump” shaders
Advantages:
• Use additional/complex shaders without compromising performance
• Better performance with power efficiency
Optimization: dynamic vertex buffer objects
Worst case pattern of VBO usage
Frame rendering: Update VBO0 → Draw → Update VBO0 → Draw → Update VBO0 → Draw
In the worst case, the complete sequence of VBO updates and draw calls may have to be repeated for each bin
Even when using glBufferSubData, multiple copies of the entire VBO may need to be maintained by the driver
Optimization: dynamic vertex buffer objects
Optimized dynamic VBO order
Frame rendering: Update VBO0 → Draw VBO0
Or if multiple dynamic VBOs are used
Frame rendering: Update VBO0 to VBOn first, then Draw VBO0 to VBOn
Optimization: sorting
Potential to reduce both the number of state changes and overdraw, both of which have a negative impact on GPU performance
Sort by material
− Reduces shader and texture state changes
Sort opaque draw calls front-to-back
− Reduces time spent shading fragments that will be overwritten later
− A >10 ms/frame performance increase has been observed in some fragment-bound content with this optimization alone
Draw the skybox last
− Typically the skybox is covered by foreground geometry in half or more of the screen
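Sorting by material and sorting front-to-back pull in different directions. One common compromise (our assumption, not stated on the slide) is to bucket draws by depth and group by material within each bucket, with the skybox after all other opaque geometry. A minimal Python sketch of such a sort key; `DrawCall` and the bucket size are hypothetical. Transparent draws are out of scope here; they are normally drawn after opaque geometry, back-to-front.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrawCall:
    name: str
    material: str          # shader + texture set; switching it costs state changes
    depth: float           # view-space distance from the camera
    skybox: bool = False


def sort_opaque_draws(draws: List[DrawCall], depth_bucket: float = 10.0) -> List[DrawCall]:
    """Order opaque draws roughly front-to-back, grouping by material within
    each depth bucket, and put the skybox after all other opaque geometry."""
    def key(d: DrawCall):
        if d.skybox:
            return (1, 0, "")                                   # skybox last
        return (0, int(d.depth // depth_bucket), d.material)    # near buckets first
    return sorted(draws, key=key)


frame = [
    DrawCall("sky", "sky", 1000.0, skybox=True),
    DrawCall("rock_far", "rock", 35.0),
    DrawCall("tree_near", "tree", 3.0),
    DrawCall("rock_near", "rock", 4.0),
]
print([d.name for d in sort_opaque_draws(frame)])
# ['rock_near', 'tree_near', 'rock_far', 'sky']
```

A smaller bucket approaches strict front-to-back order (less overdraw); a larger one approaches pure material order (fewer state changes).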
Optimization: shader performance
Precision
Operations on 16-bit floating-point (mediump) values are 2x faster than on 32-bit (highp) values
− We recommend setting the default precision to mediump and promoting only values that require higher precision, e.g.:
precision mediump float;        // Default float precision: fp16 (mediump)
out vec2 vSmallTexCoord;        // Uses mediump
out highp vec2 vLargeTexCoord;  // Promoted to highp
Scalar architecture
Adreno 3xx and 4xx GPUs utilize a scalar architecture
− Avoid using components that aren’t needed for the final result
− Wherever possible, re-order operations to execute on as few components as possible
High levels of tessellation can generate sub-pixel triangles, which cause poor rasterizer utilization
− It is very important to use distance, screen-space size or other adaptive metrics when computing tessellation factors, so that sub-pixel triangles are avoided
Optimization: tessellation
Tessellation allows for incredible levels of detail and can substantially reduce
memory bandwidth and CPU cycles by allowing other game sub-systems to
operate on low resolution representations of meshes, but …
[Figure: full rasterizer utilization vs. partial rasterizer utilization]
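One way to use screen-space size as the adaptive metric is to divide each patch edge's projected length by a target triangle size, so generated triangles stay a few pixels wide instead of sub-pixel. A minimal Python sketch of that arithmetic (in practice it would run in the TCS, in GLSL); `target_px` is a hypothetical tuning value, and 64 is used as the cap because it is the minimum `GL_MAX_TESS_GEN_LEVEL` an implementation must support.

```python
import math
from typing import Sequence


def edge_tess_factor(p0: Sequence[float], p1: Sequence[float],
                     target_px: float = 8.0, max_factor: float = 64.0) -> float:
    """Tessellation factor for one patch edge, from its projected length.

    p0, p1: edge endpoints already projected to screen space, in pixels.
    """
    length_px = math.dist(p0, p1)
    # Aim for roughly target_px-wide triangles; never below 1, never above the cap.
    return max(1.0, min(max_factor, length_px / target_px))


print(edge_tess_factor((0, 0), (80, 0)))      # 10.0: an 80-px edge gets 10 segments
print(edge_tess_factor((100, 100), (102, 100)))  # 1.0: tiny edge, no extra subdivision
```

Computing the factor per edge (rather than per patch) keeps shared edges between neighboring patches consistent, which avoids cracks.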
Optimization: tessellation
Culling
Hardware back-face culling occurs after the tessellation stage, which potentially wastes GPU resources tessellating back-facing primitives
Back-facing primitives can be identified in the TCS and culled by setting their edge tessellation factors to 0
− A slight “fudge” factor may be needed in this calculation if displacement mapping will be used in the TES, as this technique may change the visibility of primitives
General
Whenever possible, disable the TCS and TES stages if the tessellation factor for the mesh would be ~1
− Eliminates the use of unnecessary GPU stages
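The TCS-side culling test can be sketched in Python for readability (in a real shader it would be GLSL): if every control point's normal faces away from the camera by more than a fudge margin, write 0 to all outer tessellation factors. Testing only control-point normals, the view-space convention (camera at the origin) and the fudge value are assumptions of this sketch; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def patch_is_back_facing(positions: Sequence[Vec3], normals: Sequence[Vec3],
                         fudge: float = 0.1) -> bool:
    """True if every control point faces away from a camera at the origin.

    Works in view space with unit-length normals. `fudge` keeps patches that
    are only slightly back-facing, since TES displacement may make them visible.
    """
    for p, n in zip(positions, normals):
        to_camera = (-p[0], -p[1], -p[2])
        length = dot(to_camera, to_camera) ** 0.5
        if length == 0.0:
            return False  # camera sits on the patch; never cull
        if dot(n, to_camera) / length > -fudge:
            return False  # this point faces (or nearly faces) the camera
    return True


def outer_tess_factors(positions: Sequence[Vec3], normals: Sequence[Vec3],
                       factor: float, edges: int = 3) -> List[float]:
    """Outer factors the TCS would write; 0 on every edge culls the patch."""
    if patch_is_back_facing(positions, normals):
        return [0.0] * edges
    return [factor] * edges


tri = [(0.0, 0.0, -10.0), (1.0, 0.0, -10.0), (0.0, 1.0, -10.0)]
print(outer_tess_factors(tri, [(0.0, 0.0, -1.0)] * 3, 8.0))  # [0.0, 0.0, 0.0]
```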
Adreno tools
Adreno SDK
− Support for OpenGL ES 3.1, 3.0 & 2.0, DirectX, and OpenCL
− Supported on Windows, Mac OS X, and Linux
− Comprehensive collection of utilities
− Over 100 samples and tutorials
− Thorough documentation
Adreno Profiler
− Comprehensive profiling tool
− Supported on Windows, Mac OS X, and Linux
− Enables detailed analysis of GPU utilization
− Proven effective and easy to use
− Works with commercial devices & apps
Available on developer.qualcomm.com
Adreno SDK and Adreno Profiler are products of Qualcomm Technologies, Inc.