Talk I gave at the October 2019 London Video Tech meetup covering a few of the many AV1 coding tools (old and new), a small rant on some AV1 tests, and some graphs.
Video: <upload pending>
Let's Be HAV1ng You - London Video Tech October 2019
1. Let’s Be HAV1ng You
derekb@vimeo.com / derek@videolan.org
@daemon404
Derek Buitenhuis
8 October 2019
London, UK, European Union
2. Who’s this guy?
• Senior Video Engineer @ Vimeo
• Open source developer (rav1e, FFmpeg, FFMS2, etc.)
• VideoLAN non-profit board member
• Professional Twitter Sh*tposter
4. Non-technical Bits (I promise this is only one slide)
• I am likely biased, given I work on open source stuff and for an AOM member
• I would still recommend $dayjob uses VVC if it ends up being better and makes financial sense
• Seems unlikely, given that the official stance of MPEG members on VVC patents
seems to be “we’ll wait and see lol” and MPEG has no up-front IPR disclosure rules
• Personal view: I don’t believe MPEG is capable yet of actual introspection on its
patent process (or lack thereof)
• AOM is far (far!) from perfect and open, but still a better bet (to me)
• libaom is … well, more on that later
6. Stuff which is not new to AV1, but may be new to you
• AV1 inherited tech and terminology from the VPx family of codecs
• Alt-ref: A reference frame, invisible (not displayed) and coded before any frames
that reference it.
• This allows b-frame-like behavior while technically having no frame reordering. Remember this
for later on!
• Can optionally be displayed later on with a show_existing_frame packet.
• Can contain anything if invisible, such as a screen tile selection for screen content.
• Golden Frame: A reference frame that is typically encoded as slightly higher quality and referenced
by multiple other frames. Purely an encoder concept.
• Superblock: The top level of a block quadtree (that is, it is bisected into smaller blocks) – 128x128 or 64x64
in AV1. Largest unit in a given tile.
8. Some quick terms / acronyms!
• OBU – “Open Bitstream Unit” – How everything is packetized in AV1. Basically its version of a NAL unit,
but slightly more generic.
• TU – “Temporal Unit” – A set of OBUs that represent a distinct point in time. e.g. A TU must contain
at least one visible frame (important due to alt-refs).
• “Temporal Group” – Basically a GOP structure.
• “Tile” – A slice in MPEG terms.
• “Decoder Model” – VBV
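To make the OBU idea concrete, here is a minimal sketch of reading an OBU header byte and the optional LEB128 size field that follows it. The bit layout is taken from my reading of the AV1 spec; the function names and the returned dict are made up for illustration, and nothing here validates the payload.

```python
def read_leb128(data, pos):
    """Decode an unsigned LEB128 value at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(8):  # assumption: at most 8 bytes, as in the AV1 spec
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # high bit clear means this was the last byte
            return value, pos + i + 1
    raise ValueError("leb128 value longer than 8 bytes")


def parse_obu_header(data, pos=0):
    """Parse one OBU header; return (header_fields, position_of_payload)."""
    first = data[pos]
    header = {
        "forbidden_bit": first >> 7,
        "obu_type": (first >> 3) & 0x0F,       # e.g. 2 = temporal delimiter
        "extension_flag": (first >> 2) & 1,
        "has_size_field": (first >> 1) & 1,
    }
    pos += 1
    if header["extension_flag"]:
        ext = data[pos]
        header["temporal_id"] = ext >> 5
        header["spatial_id"] = (ext >> 3) & 0x03
        pos += 1
    if header["has_size_field"]:
        header["obu_size"], pos = read_leb128(data, pos)
    return header, pos


# The two bytes 0x12 0x00 are a temporal delimiter OBU with an empty payload.
header, payload_pos = parse_obu_header(bytes([0x12, 0x00]))
```

A TU is then just the run of OBUs from one temporal delimiter up to the next.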
9. The Generic Picture, modulo new coding tools
• Bigger and more transforms (chosen for each direction):
• DCT (up to 64x64), ADST (up to 16x16), FlipADST (up to 16x16), and Identity (no-op, up to 32x32)
• No non-zero coefficients allowed outside the top-left 32x32 of a transformed block
• More splitting modes (4-way square split is recursive):
• More prediction modes and angles:
• 56 directions (8 main + delta) at largest block size
• Smooth interpolation between values in horizontal and vertical mode
• Context modelled palette (8 colors per plane) in WPP order
• Paeth (top, left, top left)
• Chroma-from-Luma (OK I lied, this is a new tool)
Image courtesy of Nathan Egge
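Of the prediction modes above, Paeth is the easiest to show in a few lines. This is a sketch of the classic Paeth predictor for a single sample; the tie-breaking order (left, then top, then top-left) is what I believe AV1 uses, but treat it as an assumption. The function name is mine.

```python
def paeth_predict(top, left, top_left):
    """Pick whichever neighbour is closest to the gradient estimate top + left - top_left."""
    base = top + left - top_left
    p_left = abs(base - left)
    p_top = abs(base - top)
    p_top_left = abs(base - top_left)
    if p_left <= p_top and p_left <= p_top_left:
        return left
    if p_top <= p_top_left:
        return top
    return top_left


# A vertical edge: the left neighbour is bright, top and top-left are dark.
print(paeth_predict(top=50, left=200, top_left=50))  # 200
```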
10. New Tool #1: Chroma-from-Luma
• Chroma planes tend to have similar structures (where the energy is, in DSP terms)
to the luma planes they’re associated with. Novel, right?
• So let’s predict them from reconstructed (and possibly subsampled) luma!
• Spatial only (in AV1).
• Big win for gaming, screen content, and any content with high motion.
• Yes, it’s really that simple aside from implementation details like how to code it in the bitstream.
Image courtesy of xiph.org
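A rough sketch of the idea, assuming 4:2:0 content: subsample the reconstructed luma to chroma resolution, remove its average, scale what is left by a signalled factor, and add it to the chroma DC prediction. The floating-point maths, the function names, and the single `alpha` parameter are simplifications of mine; the real thing is fixed-point and codes a scaling factor for each chroma plane.

```python
def subsample_420(luma):
    """Average each 2x2 block of reconstructed luma down to chroma resolution."""
    return [
        [(luma[y][x] + luma[y][x + 1] + luma[y + 1][x] + luma[y + 1][x + 1]) / 4
         for x in range(0, len(luma[0]), 2)]
        for y in range(0, len(luma), 2)
    ]


def cfl_predict(luma_sub, chroma_dc, alpha):
    """Chroma prediction = DC prediction + alpha * (luma with its mean removed)."""
    count = sum(len(row) for row in luma_sub)
    mean = sum(sum(row) for row in luma_sub) / count
    return [[chroma_dc + alpha * (v - mean) for v in row] for row in luma_sub]


luma = [
    [10, 10, 30, 30],
    [10, 10, 30, 30],
    [50, 50, 70, 70],
    [50, 50, 70, 70],
]
pred = cfl_predict(subsample_420(luma), chroma_dc=128, alpha=0.5)
# pred == [[113.0, 123.0], [133.0, 143.0]]
```

The encoder only has to signal `alpha` (and its sign); the decoder already has the reconstructed luma.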
11. New Tool #2: Constrained Directional Enhancement Filter (CDEF)
• The point of the transforms used by block-based codecs is that they concentrate energy which is spread
out in the sample domain into fewer values (coefficients) in the transform domain.
• There are cases where commonly used transforms don’t work well, or make it worse. Hard edges.
• CDEF is an in-loop filter.
• Extremely simplified:
• Search to pick the direction that gives least error.
• Run directional deringing filter in that direction.
• Run constrained lowpass (conditional replacement) filter
• Background for CDEF is long – would take like 10 talks
• See references at end for more info!
Images courtesy of xiph.org
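The “constrained” part is the bit that fits on a slide. Below is a sketch of the constraint function as I understand it from the CDEF paper: small differences between a pixel and its neighbour pass through (so ringing gets smoothed), while large differences are treated as real edges and ignored. The toy 1-D tap with equal weights is purely my own illustration and is not how the actual filter weights its taps.

```python
def constrain(diff, threshold, damping):
    """Pass small differences, taper off and then zero out large ones."""
    if threshold == 0:
        return 0
    # floor(log2(threshold)) via bit_length; larger damping widens the pass band
    shift = max(0, damping - (threshold.bit_length() - 1))
    mag = abs(diff)
    limited = min(mag, max(0, threshold - (mag >> shift)))
    return limited if diff >= 0 else -limited


def filter_sample(center, neighbours, threshold, damping):
    """Toy tap (assumption: equal weights): nudge center towards accepted neighbours."""
    total = sum(constrain(n - center, threshold, damping) for n in neighbours)
    return center + round(total / len(neighbours))


# Three neighbours are ringing-sized differences; the fourth (200) is a hard edge.
print(filter_sample(100, [102, 98, 103, 200], threshold=4, damping=5))  # 101
```

Note how the neighbour at 200 contributes nothing: that is the conditional replacement behaviour that stops the filter blurring across edges.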
12. New Tool #3: Switch Frames (S-Frames)
• A tool aimed at streaming usability rather than coding efficiency.
• In AV1, frames can reference frames that are a different resolution than they are.
• We can exploit this property with a little bit of signalling in the bitstream to allow
us to switch resolutions on inter frames.
• After an S-frame is decoded, only it and future frames can be used as a reference for
future decoded frames.
• You don’t want frames referencing multiple frames with different resolutions.
13. New Tool #4: Multi-Symbol Arithmetic Coding (MSAC)
• Arithmetic coders rely on probabilities of a given input symbol in a stream (e.g. 0: 0.25 1: 0.75) to write the
entire stream as single number between 0 and 1.
• MPEG codecs mostly all use binary arithmetic coders; that is, the input is either a 1 or a 0.
• MSAC allows symbols with alphabets up to size 16 (input is 0-15) (can choose to use fewer).
• Allows multiple symbols to be handled at once, which is important, as entropy coding is serial.
• Allows better tracking of probabilities of less common elements and faster adaptation of
said probabilities.
• AV1’s level map coding exploits this by splitting quantized transformed blocks into low/high sets
for coefficients level 0-2, and 3-15, with different context models; higher levels getting a reduced
model. (Anything above is coded with standard Golomb-Rice coding).
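A toy version of the “whole stream as a single number between 0 and 1” idea, using exact fractions and a fixed 4-symbol alphabet. This is not AV1’s coder: the real one uses fixed-point ranges, renormalization, and CDFs that adapt as symbols are coded. All names and the probability table are invented for the example.

```python
from fractions import Fraction


def narrow(low, width, cdf, symbol):
    """Shrink [low, low + width) to the sub-interval owned by symbol."""
    lo = cdf[symbol - 1] if symbol else Fraction(0)
    hi = cdf[symbol]
    return low + width * lo, width * (hi - lo)


def encode(symbols, cdf):
    """Return (low, width); any number in [low, low + width) identifies the stream."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low, width = narrow(low, width, cdf, s)
    return low, width


def decode(value, count, cdf):
    """Recover count symbols from a number produced by encode()."""
    out = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(count):
        target = (value - low) / width
        s = next(i for i, c in enumerate(cdf) if target < c)
        out.append(s)
        low, width = narrow(low, width, cdf, s)
    return out


# Cumulative probabilities for symbols 0..3 with P = 1/2, 1/4, 1/8, 1/8.
cdf = [Fraction(1, 2), Fraction(3, 4), Fraction(7, 8), Fraction(1)]
low, width = encode([0, 1, 3, 0], cdf)   # (23/64, 1/128)
print(decode(low, 4, cdf))               # [0, 1, 3, 0]
```

With a binary coder the same four symbols would each need to be split into several 0/1 decisions, every one of them a serial step; here each symbol is a single interval update.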
15. New Tool #5: Loop Restoration Filter (LRF)
• Actually two filters, which can be selected per LRU (Loop restoration unit – hooray more jargon!)
• LRUs can be 64x64, 128x128, or 256x256 (note how this is larger than a 128x128 superblock)
• Applied after CDEF, in-loop.
• 7x7 Wiener filter
• Separable (vertical/horizontal)
• Normalized (reduced taps need to be coded – only 3 per direction)
• Dual Self-guided (in itself is two filters):
• 3x3 and 5x5 self-guided (as in, input is also the guide) filters applied and outputs combined
with coded weights. Alone, they may not be very good, but good weight selection produces
a reconstruction that is closer to the source.
• Is applied after super-resolution scaling, at native playback resolution.
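Why only 3 taps per direction need coding: the 7-tap kernel is symmetric and normalized, so the outer three taps determine the rest. A sketch, assuming the taps sum to 128 in fixed point (my recollection of the spec; treat the scale as an assumption). The function names and the edge-replication padding are mine.

```python
def expand_wiener_taps(coded, unity=128):
    """Rebuild the 7-tap symmetric kernel from the 3 coded outer taps."""
    c0, c1, c2 = coded
    centre = unity - 2 * (c0 + c1 + c2)   # normalization fixes the centre tap
    return [c0, c1, c2, centre, c2, c1, c0]


def apply_taps(samples, taps, unity=128):
    """Filter one row (or column), replicating edge samples as padding."""
    half = len(taps) // 2
    last = len(samples) - 1
    out = []
    for i in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            j = min(max(i + k - half, 0), last)
            acc += tap * samples[j]
        out.append((acc + unity // 2) // unity)  # round back to sample scale
    return out


taps = expand_wiener_taps((3, -7, 15))
# taps == [3, -7, 15, 106, 15, -7, 3], which sums to 128
```

Since the filter is separable, the same routine is run once along rows and once along columns, each with its own three coded taps.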
16. SO MUCH MORE
• There are way too many new tools to cover in one talk, so I picked some I thought were most
interesting to me.
• Others include: Global motion, warped motion, segmentation maps, new deblocking filter, scalability,
film grain synthesis, compound prediction (wedge modes), super resolution.
18. First, a small rant
• You may have seen some blog posts, like the now infamous BBC article in which they compare HM
to libaom. These are wrong in many respects, and here’s why:
• They disabled alt-refs (you know, that thing like b-frames?)
• libaom has a stupid design flaw where alt-refs need 2 pass to be enabled
• They disabled all lookahead
• Only used PSNR (hey, psy!)
• No methodology given, at all – not reproducible in the slightest
• Comparing HM (a reference encoder, exhaustive) to libaom (not a reference encoder, non-exhaustive)
• I consider not having a real reference encoder one of AV1/AOM’s biggest failures, and am
raising this at the symposium later this year
• Please see Debargha Mukherjee’s ICIP 2019 presentation (in references)
• “Preliminary Comparison of AV1 with emergent VVC standard”
19. Errata Alert!
I used the wrong runs of libaom here, where it has
--limit=30 set – the libaom curve should actually
be higher than the SVT-AV1 curve!
20. Are We Compressed Yet?
• █ x264-placebo@2019-10-02
• █ x265-placebo@2019-08-23
• █ SVT-AV1-enc-mode-0@2019-09-29
• █ rav1e-speed-0@2019-09-26
• █ libaom-cpu-used-0@2019-08-21
• No VVC yet on AWCY (coming soon!)
• See ICIP 2019 slides for comparison (with full methodology) to VVC
• AWCY Link: https://beta.arewecompressedyet.com/?job=x265-placebo%402019-08-23&job=SVT-AV1-enc-mode-0%402019-09-29T14%3A25%3A21.141Z&job=x264-placebo-newvmaf%402019-10-02T19%3A58%3A58.409Z&job=ref_Aug21_cpu0%402019-08-21T09%3A29%3A55.347Z&job=master-c0ac9ea_s0%402019-09-26T21%3A20%3A58.225Z
30. References / Links
• AV1 Spec: https://aomediacodec.github.io/av1-spec/
• Chroma-from-Luma Demo: https://people.xiph.org/~xiphmont/demo/av1/demo1.shtml
• The history and reasoning of CDEF: https://people.xiph.org/~xiphmont/demo/av1/demo2.shtml
• CDEF Paper: https://arxiv.org/pdf/1602.05975.pdf
• Paper on AV1’s Level Map Coding: https://usercontent.irccloud-cdn.com/file/KUIqFjBi/level_map.pdf
• AV1 Coding tools: https://jmvalin.ca/papers/AV1_tools.pdf
• ICIP 2019 VVC Comparison: Slides have not been uploaded yet, but I’ve attached a few slides.