Let’s Be HAV1ng You
derekb@vimeo.com / derek@videolan.org
@daemon404
Derek Buitenhuis
8 October 2019
London, UK, European Union
Who’s this guy?
• Senior Video Engineer @ Vimeo
• Open source developer (rav1e, FFmpeg, FFMS2, etc.)
• VideoLAN non-profit board member
• Professional Twitter Sh*tposter
Who’s this guy?
• Mostly, I’m this guy:
Non-technical Bits (I promise this is only one slide)
• I am likely biased, given I work on open source stuff and for an AOM member
• I would still recommend $dayjob uses VVC if it ends up being better and makes financial sense
• Seems unlikely, given that the official stance of MPEG members on VVC patents
seems to be “we’ll wait and see lol” and that there are no up-front IPR disclosure rules
• Personal view: I don’t believe MPEG is capable yet of actual introspection on its
patent process (or lack thereof)
• AOM is far (far!) from perfect and open, but still a better bet (to me)
• libaom is … well, more on that later
The Old
Stuff which is not new to AV1, but may be new to you
• AV1 inherited tech and terminology from the VPx family of codecs
• Alt-ref: A reference frame, invisible (not displayed) and coded before any frames
that reference it.
• This allows b-frame-like behavior while technically having no frame reordering. Remember this
for later on!
• Can optionally be displayed later on with a show_existing_frame packet.
• Can contain anything if invisible, such as a screen tile selection for screen content.
• Golden Frame: A reference frame that is typically encoded as slightly higher quality and referenced
by multiple other frames. Purely an encoder concept.
• Superblock: The top level of a block quadtree (that is, it is recursively split into smaller blocks) – 128x128 or 64x64
in AV1. Largest unit in a given tile.
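A minimal sketch of the alt-ref behaviour described above, assuming a toy packet model (the `Packet` fields and frame names are hypothetical, not AV1 bitstream syntax). Frames are decoded in arrival order, a hidden frame is stored but not output, and a later show-existing packet outputs it without decoding anything new:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    name: str                            # label of the coded frame (hypothetical)
    show_frame: bool = True              # False models an invisible alt-ref
    show_existing: Optional[str] = None  # name of a stored frame to display


def display_order(packets):
    """Return frame names in the order they would be displayed."""
    stored = set()  # frames decoded so far (stand-in for reference slots)
    shown = []
    for p in packets:
        if p.show_existing is not None:
            # Nothing new is decoded; a previously stored frame is output.
            if p.show_existing not in stored:
                raise ValueError("frame must be decoded before it is shown")
            shown.append(p.show_existing)
            continue
        stored.add(p.name)
        if p.show_frame:
            shown.append(p.name)
    return shown


stream = [
    Packet("K0"),
    Packet("ALT4", show_frame=False),  # coded early, hidden, usable as a reference
    Packet("P1"),
    Packet("P2"),
    Packet("P3"),
    Packet("", show_existing="ALT4"),  # displayed last, no new decode
]
order = display_order(stream)
print(order)
```

The coded order and the display order differ only because of the hidden frame, which is how the b-frame-like behaviour appears without any reordering in the decoder.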
The New
Some quick terms / acronyms!
• OBU – “Open Bitstream Units” – How everything is packetized in AV1. Basically its version of a NAL,
but slightly more generic.
• TU – “Temporal Unit” – A set of OBUs that represent a distinct point in time. e.g. A TU must contain
at least one visible frame (important due to alt-refs).
• “Temporal Group” – Basically a GOP structure.
• “Tile” – A slice in MPEG terms.
• “Decoder Model” – VBV
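To make the OBU packetization a bit more concrete, here is a rough sketch of reading an OBU header. The bit layout follows my reading of the AV1 spec linked in the references and should be checked against it; the function names `read_leb128` and `parse_obu_header` and the returned dictionary are my own, not part of any library:

```python
OBU_TYPE_NAMES = {
    1: "OBU_SEQUENCE_HEADER",
    2: "OBU_TEMPORAL_DELIMITER",
    3: "OBU_FRAME_HEADER",
    4: "OBU_TILE_GROUP",
    5: "OBU_METADATA",
    6: "OBU_FRAME",
    7: "OBU_REDUNDANT_FRAME_HEADER",
    8: "OBU_TILE_LIST",
    15: "OBU_PADDING",
}


def read_leb128(data, pos):
    """Decode a little-endian base-128 integer; return (value, next_pos)."""
    value = 0
    for i in range(8):
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # top bit clear marks the last byte
            return value, pos + i + 1
    raise ValueError("leb128 value longer than 8 bytes")


def parse_obu_header(data, pos=0):
    """Parse one OBU header starting at data[pos]."""
    first = data[pos]
    if first & 0x80:
        raise ValueError("forbidden bit is set")
    obu_type = (first >> 3) & 0x0F
    has_extension = bool(first & 0x04)
    has_size = bool(first & 0x02)
    pos += 1

    temporal_id = spatial_id = 0
    if has_extension:
        ext = data[pos]
        temporal_id = ext >> 5
        spatial_id = (ext >> 3) & 0x03
        pos += 1

    payload_size = None  # None means the container must supply the size
    if has_size:
        payload_size, pos = read_leb128(data, pos)

    return {
        "type": OBU_TYPE_NAMES.get(obu_type, "reserved"),
        "temporal_id": temporal_id,
        "spatial_id": spatial_id,
        "payload_size": payload_size,
        "payload_start": pos,
    }


# The two bytes 0x12 0x00 form an empty temporal delimiter, which starts a TU.
header = parse_obu_header(bytes([0x12, 0x00]))
print(header)
```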
The Generic Picture, modulo new coding tools
• Bigger and more transforms (chosen for each direction):
• DCT (up to 64x64), ADST (up to 16x16), FlipADST (up to 16x16), and Identity (no-op, up to 32x32)
• No non-zero coefficients allowed outside of 32x32 transformed block
• More splitting modes (4-way square split is recursive):
• More prediction modes and angles:
• 56 directions (8 main + delta) at largest block size
• Smooth interpolation between values in horizontal and vertical mode
• Context modelled palette (8 colors per plane) in WPP order
• Paeth (top, left, top left)
• Chroma-from-Luma (OK I lied, this is a new tool)
Image courtesy of Nathan Egge
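As an illustration of the Paeth mode listed above, here is a small sketch. It assumes, as I understand the AV1 variant, that every pixel in the block is predicted from the edge pixels only (the row above, the column to the left, and the top-left corner); the function names are mine:

```python
def paeth_predict(left, top, top_left):
    """Pick whichever of left/top/top_left is closest to left + top - top_left."""
    base = top + left - top_left
    d_left = abs(base - left)
    d_top = abs(base - top)
    d_top_left = abs(base - top_left)
    if d_left <= d_top and d_left <= d_top_left:
        return left
    if d_top <= d_top_left:
        return top
    return top_left


def paeth_block(top_row, left_col, top_left):
    """Predict a len(left_col) x len(top_row) block from its edges."""
    return [[paeth_predict(left, top, top_left) for top in top_row]
            for left in left_col]


block = paeth_block(top_row=[20, 30], left_col=[10, 40], top_left=10)
print(block)
```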
New Tool #1: Chroma-from-Luma
• Chroma planes tend to have similar structures (where the energy is, in DSP terms)
to the luma planes they’re associated with. Novel, right?
• So let’s predict them from reconstructed (and possibly subsampled) luma!
• Spatial only (in AV1).
• Big win for gaming, screen content, and any content with high motion.
• Yes, it’s really that simple aside from implementation details like how to code it in the bitstream.
Image courtesy of xiph.org
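The idea can be sketched in a few lines. This is a floating-point simplification, assuming 4:2:0 subsampling by 2x2 averaging and a single scaling factor `alpha` per chroma plane; the real codec works in fixed point and signals `alpha` in the bitstream, and the function name is hypothetical:

```python
def cfl_predict(recon_luma, chroma_dc, alpha):
    """Predict a chroma block as DC + alpha * (zero-mean subsampled luma)."""
    height, width = len(recon_luma), len(recon_luma[0])
    # Subsample reconstructed luma to chroma resolution (2x2 average).
    sub = [
        [
            (recon_luma[y][x] + recon_luma[y][x + 1]
             + recon_luma[y + 1][x] + recon_luma[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]
    # Remove the average so only the "shape" (AC part) of luma remains.
    mean = sum(map(sum, sub)) / (len(sub) * len(sub[0]))
    return [[chroma_dc + alpha * (v - mean) for v in row] for row in sub]


luma = [
    [10, 10, 30, 30],
    [10, 10, 30, 30],
    [50, 50, 70, 70],
    [50, 50, 70, 70],
]
pred = cfl_predict(luma, chroma_dc=100, alpha=0.5)
print(pred)
```

Only the DC value and `alpha` need to come from the encoder in this sketch; the structure of the prediction is borrowed from luma that the decoder already has.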
New Tool #2: Constrained Directional Enhancement Filter (CDEF)
• The point of the transforms used by block-based codecs is that they concentrate energy which is spread
out in the sample domain into fewer values (coefficients) in the transform domain.
• There are cases where commonly used transforms don’t work well, or make it worse. Hard edges.
• CDEF is an in-loop filter.
• Extremely simplified:
• Search to pick the direction that gives least error.
• Run directional deringing filter in that direction.
• Run constrained lowpass (conditional replacement) filter
• Background for CDEF is long – would take like 10 talks
• See references at end for more info!
Images courtesy of xiph.org
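The "constrained" part can be sketched on its own. The `constrain` function below follows the form given in the CDEF paper linked in the references, as I understand it; `filter_pixel` is a hypothetical simplification with equal weights, whereas the real filter uses fixed per-tap weights along the chosen direction:

```python
def constrain(diff, threshold, damping):
    """Limit a neighbour's influence: small differences pass, large ones fade to 0."""
    if threshold == 0:
        return 0
    floor_log2 = threshold.bit_length() - 1
    shift = max(0, damping - floor_log2)
    mag = abs(diff)
    limited = min(mag, max(0, threshold - (mag >> shift)))
    return -limited if diff < 0 else limited


def filter_pixel(center, neighbours, threshold, damping):
    """Equal-weight toy filter built on constrain() (not the real tap weights)."""
    total = sum(constrain(n - center, threshold, damping) for n in neighbours)
    return center + round(total / len(neighbours))


# Three nearby values pull the pixel up slightly; the value across a hard
# edge (255) differs too much and contributes nothing.
smoothed = filter_pixel(100, [104, 103, 102, 255], threshold=8, damping=6)
print(smoothed)
```

Because pixels on the far side of an edge are ignored rather than averaged in, ringing near the edge is smoothed without blurring the edge itself.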
New Tool #3: Switch Frames (S-Frames)
• A tool aimed at streaming usability rather than coding efficiency.
• In AV1, frames can reference frames that are a different resolution than they are.
• We can exploit this property with a little bit of signalling in the bitstream to allow
us to switch resolutions on inter frames.
• After an S-frame is decoded, only it and future frames can be used as a reference for
future decoded frames.
• You don’t want frames referencing multiple frames with different resolutions.
New Tool #4: Multi-Symbol Arithmetic Coding (MSAC)
• Arithmetic coders rely on probabilities of a given input symbol in a stream (e.g. 0: 0.25 1: 0.75) to write the
entire stream as a single number between 0 and 1.
• MPEG codecs mostly use binary arithmetic coders; that is, each input symbol is either a 1 or a 0.
• MSAC allows symbols with alphabets up to size 16 (input is 0-15) (can choose to use fewer).
• Allows multiple symbols to be handled at once, which is important, as entropy coding is serial.
• Allows better tracking of probabilities of less common elements and faster adaptation of
said probabilities.
• AV1’s level map coding exploits this by splitting quantized transformed blocks into low/high sets
for coefficients level 0-2, and 3-15, with different context models; higher levels getting a reduced
model. (Anything above is coded with standard Golomb-Rice coding).
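A toy sketch of the interval narrowing behind arithmetic coding, using exact fractions. It assumes static probabilities given as a cumulative distribution (`cdf`); real coders, including AV1's, use integer ranges, renormalization, and adaptive CDFs, none of which is modelled here. The function name is mine:

```python
from fractions import Fraction


def narrow_interval(symbols, cdf):
    """Return the [low, high) interval that identifies the symbol sequence.

    cdf[s] and cdf[s + 1] bound the probability mass of symbol s.
    """
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        low, width = low + width * cdf[s], width * (cdf[s + 1] - cdf[s])
    return low, low + width


# Binary alphabet from the slide: P(0) = 0.25, P(1) = 0.75.
binary_cdf = [Fraction(0), Fraction(1, 4), Fraction(1)]
bin_low, bin_high = narrow_interval([1, 1, 0], binary_cdf)
print(bin_low, bin_high)

# A 16-symbol alphabet (uniform here for simplicity): one coding step
# narrows the interval as much as four equiprobable binary steps.
cdf16 = [Fraction(i, 16) for i in range(17)]
multi_low, multi_high = narrow_interval([5], cdf16)
print(multi_low, multi_high)
```

Any number inside the final interval identifies the whole sequence, and the same loop handles a 2-symbol or a 16-symbol alphabet; the larger alphabet simply does more work per serial step.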
Traditional Binary Arithmetic Coder
Image courtesy of Nathan Egge
New Tool #5: Loop Restoration Filter (LRF)
• Actually two filters, which can be selected per LRU (Loop restoration unit – hooray more jargon!)
• LRUs can be 64x64, 128x128, or 256x256 (note how this is larger than a 128x128 superblock)
• Applied after CDEF, in-loop.
• 7x7 Wiener filter
• Separable (vertical/horizontal)
• Normalized (reduced taps need to be coded – only 3 per direction)
• Dual Self-guided (in itself is two filters):
• 3x3 and 5x5 self-guided (as in, input is also the guide) filters applied and outputs combined
with coded weights. Alone, they may not be very good, but good weight selection produces
a reconstruction closer to the source.
• Is applied after super-resolution scaling, at native playback resolution.
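A sketch of why only three taps per direction need to be coded for the Wiener filter. It assumes the 7-tap kernel is symmetric and normalized to sum to 128 (7-bit fixed point), so the centre tap can be derived; the tap values, function names, and the edge clamping are illustrative, and the intermediate rounding and clipping of the real filter are left out:

```python
def wiener_taps(c0, c1, c2):
    """Expand three coded taps into a symmetric 7-tap kernel summing to 128."""
    centre = 128 - 2 * (c0 + c1 + c2)
    return [c0, c1, c2, centre, c2, c1, c0]


def filter_row(row, taps):
    """Apply the 7-tap kernel horizontally, repeating edge pixels."""
    n = len(row)
    out = []
    for x in range(n):
        acc = 0
        for k, tap in enumerate(taps):
            xi = min(max(x + k - 3, 0), n - 1)  # clamp to the row bounds
            acc += tap * row[xi]
        out.append((acc + 64) >> 7)  # divide by 128 with rounding
    return out


taps = wiener_taps(3, -7, 15)  # hypothetical coded values
print(taps)
flat = filter_row([100] * 8, taps)
print(flat)
```

Running the same one-dimensional pass vertically and then horizontally gives the separable 7x7 filter; because of the normalization, flat areas pass through unchanged.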
SO MUCH MORE
• There are way too many new tools to cover in one talk, so I picked some I thought were most
interesting to me.
• Others include: Global motion, warped motion, segmentation maps, new deblocking filter, scalability,
film grain synthesis, compound prediction (wedge modes), super resolution.
Where are we now?
First, a small rant
• You may have seen some blog posts, like the now infamous BBC article in which they compare HM
to libaom. These are wrong in many respects, and here’s why:
• They disabled alt-refs (you know, that thing like b-frames?)
• libaom has a stupid design flaw where alt-refs need 2 pass to be enabled
• They disabled all lookahead
• Only used PSNR (hey, psy!)
• No methodology given, at all – not reproducible in the slightest
• Comparing HM (a reference encoder, exhaustive) to libaom (not a reference encoder, non-exhaustive)
• I consider not having a real reference encoder one of AV1/AOM’s biggest failures, and am
raising this at the symposium later this year
• Please see Debargha Mukherjee’s ICIP 2019 presentation (in references)
• “Preliminary Comparison of AV1 with emergent VVC standard”
Errata Alert!
I used the wrong runs of libaom here, where it has
--limit=30 set – the libaom curve should actually
be higher than the SVT-AV1 curve!
Are We Compressed Yet?
• █ x264-placebo@2019-10-02
• █ x265-placebo@2019-08-23
• █ SVT-AV1-enc-mode-0@2019-09-29
• █ rav1e-speed-0@2019-09-26
• █ libaom-cpu-used-0@2019-08-21
• No VVC yet on AWCY (coming soon!)
• See ICIP 2019 slides for comparison (with full methodology) to VVC
• AWCY Link: https://beta.arewecompressedyet.com/?job=x265-placebo%402019-08-23&job=SVT-AV1-enc-mode-0%402019-09-29T14%3A25%3A21.141Z&job=x264-placebo-newvmaf%402019-10-02T19%3A58%3A58.409Z&job=ref_Aug21_cpu0%402019-08-21T09%3A29%3A55.347Z&job=master-c0ac9ea_s0%402019-09-26T21%3A20%3A58.225Z
Plots (not reproduced here): MS-SSIM, PSNR-HVS, CIEDE 2000, VMAF
Are We Compressed Yet? … Yet?
• █ x264-placebo@2019-10-02
• █ x265-placebo@2019-08-23
• █ SVT-AV1-enc-mode-4@2019-09-29
• █ rav1e-speed-3@2019-09-26
• █ libaom-cpu-used-2@2019-08-21
• AWCY Link: https://beta.arewecompressedyet.com/?job=x265-placebo%402019-08-23&job=SVT-AV1-enc-mode-4%402019-09-20T16%3A54%3A16.822Z&job=x264-placebo-newvmaf%402019-10-02T19%3A58%3A58.409Z&job=ref_Aug21_cpu2%402019-08-21T14%3A29%3A03.437Z&job=master-c0ac9ea_s3%402019-09-26T20%3A26%3A01.380Z
Plots (not reproduced here): MS-SSIM, PSNR-HVS, CIEDE 2000, VMAF
References / Links
• AV1 Spec: https://aomediacodec.github.io/av1-spec/
• Chroma-from-Luma Demo: https://people.xiph.org/~xiphmont/demo/av1/demo1.shtml
• The history and reasoning of CDEF: https://people.xiph.org/~xiphmont/demo/av1/demo2.shtml
• CDEF Paper: https://arxiv.org/pdf/1602.05975.pdf
• Paper on AV1’s Level Map Coding: https://usercontent.irccloud-cdn.com/file/KUIqFjBi/level_map.pdf
• AV1 Coding tools: https://jmvalin.ca/papers/AV1_tools.pdf
• ICIP 2019 VVC Comparison: Slides have not been uploaded yet, but I’ve attached a few slides.
Let's Be HAV1ng You - London Video Tech October 2019