Presentation WT-4072, Rendering Web Content at 60fps, by Vangelis Kokkevis, Antoine Labour and Brian Salomon at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
1. Rendering Web Content @ 60FPS
Vangelis Kokkevis & Brian Salomon
vangelis@google.com bsalomon@Google.com
2. Google Chrome
● Recently celebrated Chrome’s fifth anniversary!
● Hundreds of millions of active users
● Cross platform:
   ○ Windows (XP +), Mac, Linux
   ○ Chrome OS (x86 and ARM), Android, iOS (*)
● Open source: Chromium and Blink
● Rapid release cycle, four channels (canary, dev, beta, stable)
● Core Principles: Speed, Security, Stability, Simplicity
| RENDERING WEB CONTENT AT 60FPS | NOVEMBER 12, 2013 | CONFIDENTIAL
4. Why use the GPU?
● Enable new platform features:
   ○ 3D CSS, WebGL
● Speed & Responsiveness
   ○ Less jank: Smoother scrolling, 60fps CSS animations
   ○ Page “sticks to your finger”
   ○ Faster <canvas>, <video>
5. Accelerated Compositing
Re-rasterizing is expensive and should be avoided if possible
Caching rasterized contents into textures is an effective way to reduce raster costs.
Split the page contents into layers, use the GPU to composite them
What gets a layer?
● Content that rasters on the GPU: WebGL, 2D Canvas, Video, Flash
● Content that is expected to change infrequently:
   ○ CSS transform and opacity animations
   ○ Overflow scroll
   ○ Fixed position elements
● Content that overlaps other composited content
7. The Rendering Pipeline
[Diagram: pipeline stages that together must fit in < 16ms: User Input or Timer Event -> Run Script -> Re-Layout Document -> Rasterize Invalidated Content -> Upload New Content to Textures -> Draw Textured Quads. Additional labels: “(if needed)”, “Compositor”.]
8. Tiling
Large content layers get tiled
● Layer split up into 256 x 256 or 512 x 512 pixel tiles
● Cache rasterized contents in manageable chunks to
   ○ Speed up scrolling
   ○ Conserve VRAM
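The tiling idea can be illustrated with a minimal sketch that finds which tiles of a layer overlap the viewport, so only those need cached raster content. The function name and the (x, y, width, height) viewport convention are assumptions made for this example, not Chrome's actual code.

```python
from typing import Iterator, Tuple

TILE_SIZE = 256  # the slide mentions 256 x 256 or 512 x 512 tiles


def tiles_intersecting(viewport: Tuple[int, int, int, int],
                       layer_size: Tuple[int, int],
                       tile_size: int = TILE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (column, row) indices of layer tiles that overlap the viewport.

    viewport is (x, y, width, height) in layer pixel coordinates
    (a hypothetical convention chosen for this sketch).
    """
    vx, vy, vw, vh = viewport
    layer_w, layer_h = layer_size
    # Clamp the viewport to the layer bounds.
    left, top = max(vx, 0), max(vy, 0)
    right, bottom = min(vx + vw, layer_w), min(vy + vh, layer_h)
    if left >= right or top >= bottom:
        return
    first_col, first_row = left // tile_size, top // tile_size
    # The right/bottom edges are exclusive, hence the "- 1".
    last_col, last_row = (right - 1) // tile_size, (bottom - 1) // tile_size
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            yield (col, row)
```

Scrolling changes only the viewport origin, so most of the tiles returned for one frame are returned again for the next and can be reused without re-rasterizing.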
10. GPU Architecture
[Diagram: GPU process architecture. Client side: the Browser and the Renderer (Blink for WebGL, Skia for Canvas, Compositor), each issuing GL through a GLES2 Client. Shared Memory: CMD ring buffers and transfer buffers (three of each shown). GPU Process: the GLES2 Service, which executes the commands via ANGLE (GL ES -> D3D) and presents to the Screen.]
13. Threaded Compositing
Solution: Move compositing to its own thread
[Diagram: timeline with 16ms frame markers. Main Thread: JS, Layout, Rasterize. Compositor Thread: Upload, Draw, repeated every frame.]
14. Good enough?
The devil’s in the details
● Need to aggressively pre-paint tiles to avoid running out of rasterized content in the compositor thread when scrolling.
● How many tiles to pre-paint?
   ○ Too many: VRAM pressure, possibly lots of unnecessary work
   ○ Too few: Checkerboarding
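A rough back-of-the-envelope sketch of the pre-paint trade-off: the function below estimates how many tiles, and how many bytes, are needed when the viewport is grown by a pre-paint margin above and below. The 4 bytes per pixel, the vertical-only margin, and the function name are assumptions for illustration; tile misalignment is ignored.

```python
import math

BYTES_PER_PIXEL = 4  # assumption: 32-bit RGBA tiles


def prepaint_cost(viewport_w, viewport_h, prepaint_px, tile_size=256):
    """Return (tile_count, bytes) for a viewport grown vertically by
    prepaint_px above and below (hypothetical helper, tile-aligned case)."""
    cols = math.ceil(viewport_w / tile_size)
    rows = math.ceil((viewport_h + 2 * prepaint_px) / tile_size)
    tiles = cols * rows
    return tiles, tiles * tile_size * tile_size * BYTES_PER_PIXEL


# A 1024 x 768 viewport alone versus one extra screen pre-painted each way.
visible_only = prepaint_cost(1024, 768, 0)
one_screen_margin = prepaint_cost(1024, 768, 768)
```

Under these assumptions, pre-painting one screen in each direction triples the tile memory for a single layer, which is the VRAM pressure the slide refers to.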
15. Deferred Rasterization
Less checkerboarding: Move raster out of main thread
[Diagram: timeline with 16ms frame markers. Main Thread: JS, Layout, Record Display List. Compositor Thread: Sort Tiles, Issue Raster Tasks, Draw, repeated every frame. Raster Thread(s): a stream of tasks labeled RT and UT (presumably raster tasks and upload tasks) running alongside.]
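The scheduling idea on this slide can be sketched as follows: the main thread only records a display list, and the compositor sorts tiles by urgency and hands raster tasks to worker threads. All names are hypothetical, the "rasterization" is a stand-in that tags each recorded command with its tile, and the priority function (distance in tile rows from the viewport) is an assumption rather than Chrome's actual heuristic.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_priority(tile, viewport_rows):
    """Lower is more urgent: 0 for visible rows, else distance in rows."""
    _, row = tile
    first, last = viewport_rows
    if first <= row <= last:
        return 0
    return first - row if row < first else row - last


def raster_tile(display_list, tile):
    """Stand-in for real rasterization: replay recorded commands for one tile."""
    return tile, [f"{cmd}@{tile}" for cmd in display_list]


def schedule_raster(display_list, tiles, viewport_rows, workers=2):
    """Sort tiles, issue one raster task per tile, and collect the results."""
    ordered = sorted(tiles, key=lambda t: tile_priority(t, viewport_rows))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(raster_tile, display_list, t) for t in ordered]
        return ordered, dict(f.result() for f in futures)


# Six tiles in one column; rows 2 and 3 are currently visible.
order, rastered = schedule_raster(["drawRect", "drawText"],
                                  [(0, r) for r in range(6)], (2, 3))
```

Visible tiles are issued first, then tiles one row away, and so on, so a fast scroll is less likely to reach a tile that has not been rasterized yet.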
16. Tooling
Lots of threads, lots of asynchronous tasks.
Good performance tools are a must for debugging and improving!
Tools we use when developing Chrome:
● Tracing (to monitor what each thread is doing in a timeline)
● FrameViewer (Inspect layers, tiles and rasterization)
● Telemetry (automated performance measurement framework)
20. Challenges
● Rasterization is a bottleneck
● The main thread is unpredictable (JS, layout, long records)
● There aren’t enough cores to go around (mobile)
● Bandwidth is at a premium
● GPU is a shared resource and can get oversubscribed
● Huge matrix of OS / GPU / CPU / Drivers
21. What does the future hold?
More performance gains:
● Hardware accelerated rasterization
● “Zero-copy” texture uploads
● Hardware accelerated image decode
● Smarter and more efficient layers
24. Pipeline Stages
SkPaint: Life of a Path
Programmable (via Subclassing)
● SkPathEffect
   ○ Path -> Path
   ○ e.g. Dashing
● SkRasterizer
   ○ Path -> Coverage Mask
   ○ e.g. ?? [considering deprecating]
● SkMaskFilter
   ○ Coverage Mask -> Coverage Mask
   ○ e.g. Blur
● SkShader
   ○ Source-Space Coordinate -> Color
   ○ e.g. Gradients, Bitmap Fill
● SkColorFilter
   ○ Color -> Color
   ○ e.g. Color Matrix, Blend with constant Color
● SkImageFilter
   ○ Src Image -> New Src Image
   ○ e.g. Color Blur, Morphology Filter
   ○ Subsume SkColorFilter?
● SkXfermode
   ○ AKA Blend
   ○ Src Color + Dst Color -> New Dst Color
   ○ e.g. Porter-Duff modes, Darken, …
Fixed Function
● Stroking (width, caps, joins)
● Text settings (typeface, pt size, …)
● AA enable/disable
● Image filtering quality level
● Alpha
● Default color if no SkShader
25. GPU Shaders
GPU Backend has an “effect” system for building shaders
● Effects arranged in linear order.
● Write a snippet of GLSL fragment code.
● Effect passes a vec4 “color” to the next effect.
   ○ Input to first effect is either constant or per-vertex value.
● Can insert uniforms, functions, textures.
● Internal effects can
   ○ Insert vertex shader code.
   ○ Require additional vertex attributes.
[Diagram: two chains. Initial Color -> Color Effect 1 -> Color Effect 2 -> final color. Initial Coverage -> Cov. Effect 1 -> Cov. Effect 2 -> Cov. Effect 3 -> final coverage. Effects take inputs such as a texture, a matrix or a uniform.]
Important to keep color and fractional coverage separate.
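To make the chaining concrete, here is a small sketch of a shader builder that strings GLSL snippets together in linear order, with separate color and coverage chains. It is not Skia's effect system: the function name, the `{inp}` placeholder convention, the `uColor` uniform and the final `color * coverage` combination are all assumptions made for this illustration.

```python
def build_fragment_shader(color_effects, coverage_effects):
    """Chain GLSL snippets; each effect reads one vec4 and feeds the next.

    Each snippet is a GLSL expression using "{inp}" for its input value
    (a hypothetical convention for this sketch).
    """
    lines = [
        "precision mediump float;",
        "uniform vec4 uColor;  // hypothetical uniform: the initial color",
        "void main() {",
        "    vec4 color0 = uColor;",
        "    vec4 coverage0 = vec4(1.0);",
    ]
    for kind, effects in (("color", color_effects),
                          ("coverage", coverage_effects)):
        for i, snippet in enumerate(effects):
            src, dst = f"{kind}{i}", f"{kind}{i + 1}"
            lines.append(f"    vec4 {dst} = {snippet.format(inp=src)};")
    final_color = f"color{len(color_effects)}"
    final_cov = f"coverage{len(coverage_effects)}"
    # Assumption: color and coverage are only multiplied at the very end.
    lines.append(f"    gl_FragColor = {final_color} * {final_cov};")
    lines.append("}")
    return "\n".join(lines)


shader = build_fragment_shader(["{inp} * 0.5", "{inp}.bgra"], ["{inp} * 0.25"])
```

Keeping the two chains in separate variables until the last line mirrors the slide's point that color and fractional coverage should not be merged early.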
26. Pipeline Stages and GPU Backend
● SkPathEffect
   ○ Perform on CPU
   ○ Call filterPath(), draw the resulting path
   ○ Special hooks for some dashing cases
   ○ Future: general mechanism to avoid creating intermediate path object on CPU
● SkRasterizer: ignored
   ○ No known clients use custom rasterizers.
   ○ Act as though no rasterizer installed
● SkMaskFilter:
   ○ Filter object is given a gpu “context object” and primitive’s mask
      ■ Can create intermediate textures
      ■ Performs draws using Effects
      ■ Returns new mask as a texture.
   ○ Special case for filters that can be performed inline with the draw to dst
   ○ In practice the only significant SkMaskFilter is blur
   ○ Future: Specialize blur code path for simple primitive types (e.g. rects)
27. Pipeline Stages and GPU Backend Continued
● SkShader
   ○ Produces an Effect object that is inserted into the draw
   ○ Implementations for bitmap shaders, various gradient types, noise shader.
● SkColorFilter:
   ○ Produces an effect that receives SkShader effect’s output.
   ○ Implementations for color matrix, color table, blend-against-const-color
● SkImageFilter:
   ○ Works the same way as SkMaskFilter but with color input/output
   ○ Implementations for
      ■ Color blur
      ■ Lighting effect
      ■ Any (color filter, shader, or xfermode) as an image filter
   ○ Graph implementation for chaining SkImageFilters together (CPU or GPU)
      ■ SVG image filter DAG
      ■ Future: Optimization pass to minimize intermediate draws.
   ○ Shortcuts for Image filters that can be done inline or are really just a matrix.
28. Pipeline Stages and GPU Backend Continued
● SkXfermode: Either as GL coefficients or Effect
   ○ The Porter-Duff blend modes (src-over, etc) are all expressible as GL blend coeffs
      ■ Big caveat here
   ○ Many others are not:
      ■ Luminance
      ■ Darken
      ■ Arithmetic
      ■ …
   ○ Xfermode can install an Effect
      ■ Access to the destination?
         ● Effect framework provides abstract interface for accessing the dst color
         ● GL_EXT_shader_framebuffer_fetch if available
         ● Future: GL_NV_texture_barrier
         ● Otherwise a dst-copy-to-texture is triggered
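As an illustration of what "expressible as GL blend coeffs" means, the sketch below writes several Porter-Duff modes as a pair of coefficients applied to premultiplied source and destination colors, the same shape as a fixed-function blend of the form src * srcCoeff + dst * dstCoeff. The table and function names are made up for this example; the coefficients are the standard Porter-Duff ones for premultiplied alpha.

```python
# mode -> (src_coeff, dst_coeff); result = src * src_coeff + dst * dst_coeff.
# Coefficients depend only on source alpha (sa) and destination alpha (da).
PORTER_DUFF = {
    "clear":    (lambda sa, da: 0.0,      lambda sa, da: 0.0),
    "src":      (lambda sa, da: 1.0,      lambda sa, da: 0.0),
    "src-over": (lambda sa, da: 1.0,      lambda sa, da: 1.0 - sa),
    "dst-over": (lambda sa, da: 1.0 - da, lambda sa, da: 1.0),
    "src-in":   (lambda sa, da: da,       lambda sa, da: 0.0),
    "dst-in":   (lambda sa, da: 0.0,      lambda sa, da: sa),
    "src-atop": (lambda sa, da: da,       lambda sa, da: 1.0 - sa),
    "xor":      (lambda sa, da: 1.0 - da, lambda sa, da: 1.0 - sa),
}


def blend(mode, src, dst):
    """Blend premultiplied RGBA tuples using the mode's two coefficients."""
    sa, da = src[3], dst[3]
    src_coeff, dst_coeff = PORTER_DUFF[mode]
    return tuple(s * src_coeff(sa, da) + d * dst_coeff(sa, da)
                 for s, d in zip(src, dst))


half_red = (0.5, 0.0, 0.0, 0.5)   # 50% opaque red, premultiplied
blue = (0.0, 0.0, 1.0, 1.0)       # opaque blue
result = blend("src-over", half_red, blue)
```

A mode such as Darken needs a per-channel comparison of source and destination, which does not fit this two-coefficient form; that is the case where the slide says an Effect with access to the dst color is required.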
29. Primitives: Text
● Skia sits on top of system font engine:
   ○ FreeType
   ○ CoreText
   ○ GDI
   ○ DirectWrite
● Large ALPHA8 texture used as glyph mask atlas (1024 x 2048)
   ○ Will use a second RGB(A) texture if there are “LCD” glyphs
   ○ Texture divided into 256x256 texel “plots”
● Strike: A unique combination of
   ○ Typeface
   ○ Size
   ○ Style (italic, bold, …)
● Strikes claim (multiple) plots
● Plots purged wholesale using LRU
[Diagram: the atlas drawn as a grid of plots, each labeled with its owning strike (Strike 0 to Strike 3) or marked (free).]
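The plot bookkeeping described above can be sketched with an ordered map: strikes claim whole plots, and when no plot is free the least recently used plot is purged wholesale. The class and method names are hypothetical; only the atlas and plot sizes come from the slide.

```python
from collections import OrderedDict

ATLAS_W, ATLAS_H, PLOT = 1024, 2048, 256  # sizes from the slide


class PlotAtlas:
    """Toy model of a glyph atlas split into plots with LRU replacement."""

    def __init__(self):
        cols, rows = ATLAS_W // PLOT, ATLAS_H // PLOT
        self.free = [(c, r) for r in range(rows) for c in range(cols)]
        # plot -> owning strike, ordered from least to most recently used
        self.in_use = OrderedDict()

    def claim(self, strike):
        """Give `strike` a plot; return (plot, purged_strike_or_None)."""
        purged = None
        if self.free:
            plot = self.free.pop(0)
        else:
            # No free plot: evict the least recently used one wholesale.
            plot, purged = self.in_use.popitem(last=False)
        self.in_use[plot] = strike
        return plot, purged

    def touch(self, plot):
        """Mark a plot as recently used, e.g. a glyph in it was drawn."""
        self.in_use.move_to_end(plot)


atlas = PlotAtlas()
plot_count = len(atlas.free)  # 4 columns x 8 rows of 256 x 256 plots
```

With these sizes the atlas holds 32 plots, so a page mixing many typefaces and sizes can exhaust it quickly, which is when the purge path runs.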
30. Primitives: Text Continued
● Glyphs packed in plots using Skyline algorithm [Jukka Jylänki http://clb.demon.fi/]
● Attempt to perform all uploads for a frame before draws
   ○ Queue GL draws
   ○ Uploads go through immediately
● Avoid flushing draws
   ○ Only flush draws to GL when a plot is purged that is referenced in currently queued draws
   ○ Matters a lot more on mobile, especially tiled architectures
● Works pretty well for scrolling
● Struggles with pinch-zoom
● Under development: distance field atlas
   ○ Same texture partitioning and replacement scheme
   ○ “Masks” are (mostly) resolution independent
31. Primitives: Rects
Not anti-aliased: Simple, draw a quad!
Two approaches for anti-aliasing (non-MSAA):
● Geometric
   ○ Create inner and outer offset geometry
   ○ Offset is 0.5 pixels
   ○ Use “coverage” vertex attribute
      ■ 0 at outer offset rect
      ■ 1 at inner offset rect
   ○ Handle degenerate cases
● Shader
   ○ Attributes:
      ■ W = rect.width() + 0.5, H = rect.height() + 0.5
      ■ Y = normalized y-axis of rect
      ■ C = center of rect
   ○ coverage in Y at pixel p is clamp(H - ((p - C) dot Y), 0, 1)
Geometry shaders could reduce VBO size and save CPU cycles
[Diagram: for the geometric approach, c=1 on the inner rect and c=0 on the outer rect; for the shader approach, a rotated rect annotated with W, H, C, Y and a pixel p.]
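The shader approach can be tried out on the CPU with a small sketch. It makes two assumptions that the slide's compact formula does not spell out: the extents are treated as half-width and half-height plus 0.5 pixels, and the projection onto each axis is taken as an absolute distance from the center. The function names are hypothetical.

```python
def clamp(x, lo, hi):
    return max(lo, min(x, hi))


def rect_coverage(p, center, y_axis, width, height):
    """Fractional coverage of pixel center p for an oriented rectangle.

    y_axis must be a unit vector. Assumption: coverage falls off linearly
    over one pixel centered on each rectangle edge.
    """
    half_w = width / 2 + 0.5   # assumption: half-extent plus half a pixel
    half_h = height / 2 + 0.5
    x_axis = (y_axis[1], -y_axis[0])  # perpendicular to the y axis
    dx, dy = p[0] - center[0], p[1] - center[1]
    dist_x = abs(dx * x_axis[0] + dy * x_axis[1])
    dist_y = abs(dx * y_axis[0] + dy * y_axis[1])
    cov_x = clamp(half_w - dist_x, 0.0, 1.0)
    cov_y = clamp(half_h - dist_y, 0.0, 1.0)
    return cov_x * cov_y


# Axis-aligned 4 x 2 rect centered at (10, 10): its top edge lies at y = 11.
inside = rect_coverage((10, 10), (10, 10), (0, 1), 4, 2)
on_edge = rect_coverage((10, 11), (10, 10), (0, 1), 4, 2)
outside = rect_coverage((10, 12), (10, 10), (0, 1), 4, 2)
```

A pixel whose center sits exactly on an edge gets half coverage, and coverage reaches 0 or 1 half a pixel to either side, matching the 0.5 pixel offset used by the geometric approach.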
32. Primitives: Misc
Adaptations for stroked rectangles
Similar shader techniques for:
● Ellipses
● Circles
● Rounded-Rectangles
33. Primitives: Paths
● Why are paths hard?
   ○ In the most general case, have to handle both the fill rule and anti-aliasing
   ○ After a blend, the coverage/alpha distinction is lost. Must only perform one blend in general.
[Diagram callouts: Can’t double blend in overlap! Can’t anti-alias interior edge! Multiple edges from different contours relevant to pixels in concavities!]
34. Primitives: Paths Continued
● MSAA solves the AA problem
● Use the stencil to solve the fill rule problem
● Tessellate contours into line segments
● Pass 1:
   ○ Draw the tessellated contours as triangle fan
   ○ Disable color writes
   ○ Stencil op: +1 for front face, -1 for back face
● Pass 2:
   ○ Draw bounding geometry
   ○ Enable color writes
   ○ Stencil func
      ■ Winding: Pass if stencil is non-zero
      ■ Even/Odd: Pass if LSB is 1
● Avoid tessellating quadratic and cubic beziers:
   ○ Discard in FS if outside the curve [Kokojima et al.]
   ○ Need per sample discard or sample coverage mask
   ○ No-go on ES3 :(
[Diagram: Pass 1 shows fan triangles contributing +1 and -1 to the stencil; Pass 2 shows the resulting fill.]
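The two passes can be simulated for a single sample point on the CPU. In this sketch, Pass 1 walks the triangle fan and adds +1 or -1 for every triangle covering the sample, and Pass 2 applies the fill-rule test to the accumulated value. Treating counter-clockwise triangles as front-facing is an assumption, and the function names are invented for the example.

```python
def signed_area2(a, b, c):
    """Twice the signed area of triangle abc (> 0 when counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def inside_triangle(p, a, b, c):
    d1 = signed_area2(a, b, p)
    d2 = signed_area2(b, c, p)
    d3 = signed_area2(c, a, p)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def stencil_value(p, contour):
    """Pass 1: accumulate +1 / -1 per covering fan triangle."""
    value = 0
    v0 = contour[0]
    for a, b in zip(contour[1:], contour[2:]):
        if inside_triangle(p, v0, a, b):
            # Assumption: counter-clockwise triangles are front-facing.
            value += 1 if signed_area2(v0, a, b) > 0 else -1
    return value


def covered(p, contour, rule):
    """Pass 2: the stencil test for the chosen fill rule."""
    s = stencil_value(p, contour)
    return s != 0 if rule == "winding" else (s & 1) == 1


# A five-pointed star drawn as one self-intersecting contour.
star = [(0, 10), (-6, -8), (10, 3), (-10, 3), (6, -8)]
center_value = stencil_value((0, 0), star)
```

The star's center is wrapped twice by the contour, so it is filled under the winding rule but left empty under even/odd, while a point inside one of the tips is filled under both.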
35. Primitives: Paths Continued
For AA paths without MSAA:
● Detect if path is one of the other primitive types (e.g. rounded rectangle)
● If very thin stroke draw as AA lines (and ignore double blend problem)
● If path is convex fill rule problem goes away
   ○ Fan the on-contour control points
   ○ Draw bounding hulls of curves
   ○ Compute coverage using implicit eq. approx distance to curve [LoopBlinn]
● Otherwise, SW rasterize mask and upload
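For the convex-path case, the implicit-equation idea can be sketched for a single quadratic Bezier. Following the Loop-Blinn formulation, the control points are assigned (u, v) = (0, 0), (0.5, 0), (1, 1), the curve is where f = u^2 - v is zero, and f divided by the length of its screen-space gradient approximates the distance to the curve. The mapping from distance to coverage, and the choice that the chord side (f < 0) is the inside, are assumptions for this example; a fragment shader would obtain the gradient from derivative instructions rather than computing it analytically.

```python
import math


def quad_coverage(p, p0, p1, p2):
    """Approximate AA coverage of pixel center p against a quadratic Bezier."""
    e1 = (p1[0] - p0[0], p1[1] - p0[1])
    e2 = (p2[0] - p0[0], p2[1] - p0[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]  # non-zero for a non-degenerate hull
    px, py = p[0] - p0[0], p[1] - p0[1]
    # Barycentric weights of p1 (s) and p2 (t), and their screen gradients.
    s = (px * e2[1] - py * e2[0]) / det
    t = (e1[0] * py - e1[1] * px) / det
    ds = (e2[1] / det, -e2[0] / det)
    dt = (-e1[1] / det, e1[0] / det)
    # Interpolated (u, v) from the per-vertex values (0,0), (0.5,0), (1,1).
    u, v = 0.5 * s + t, t
    du = (0.5 * ds[0] + dt[0], 0.5 * ds[1] + dt[1])
    f = u * u - v
    grad = (2 * u * du[0] - dt[0], 2 * u * du[1] - dt[1])
    length = math.hypot(grad[0], grad[1])
    if length == 0.0:
        return 1.0 if f < 0 else 0.0
    distance = f / length  # approximate signed distance in pixels
    # Assumption: f < 0 (the chord side) is inside the path.
    return max(0.0, min(0.5 - distance, 1.0))


# Curve from (0, 0) to (10, 0) with control point (5, 10); it passes (5, 5).
on_curve = quad_coverage((5, 5), (0, 0), (5, 10), (10, 0))
```

Pixels a couple of units toward the chord come out fully covered and pixels toward the control point come out empty, with a one-pixel ramp across the curve itself.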