Full immersion is achieved by simultaneously focusing on the broader dimensions of visual quality, sound quality, and intuitive interactions. This presentation discusses how:
- Technology improvements continue to drive more immersive experiences, especially for VR and AR
- High Dynamic Range (HDR) will enhance the visual quality on all our screens
- Scene-based audio is a new paradigm for 3D audio
- Natural user interfaces like voice, gestures, and eye tracking are making interactions more intuitive
4. Immersive Experiences
• Draw you in…
• Take you to another place…
• Keep you present in the moment…
The experiences worth having, remembering, and reliving
5. Immersion enhances everyday experiences
Experiences become more realistic, engaging, and satisfying
Spanning devices at home, work, and throughout life
• Life-like video conferencing
• Smooth, interactive, cognitive user interfaces
• Augmented reality experiences
• Virtual reality experiences
• Realistic gaming experiences
• Theater-quality movies and live sports
7. The next generation technologies driving immersion
Achieving full immersion at low power to enable a comfortable, sleek form factor
• Visual quality: high dynamic range (HDR), with increased contrast, expanded color gamut, and increased color depth
• Sound quality: scene-based audio, with 3D audio and positional audio through higher order ambisonics
• Intuitive interactions: natural user interfaces, with adaptive, multi-modal inputs like voice, gestures, and eye tracking
8. HDR for enhanced visual quality
Increased brightness and contrast, expanded color gamut, and increased color depth
9. HDR images and videos are visually stunning
Much more realistic and immersive
[Comparison images: HDR ON vs. HDR OFF]
10. HDR will enhance the visual quality on all our screens
Bringing our experiences closer to full immersion
Visuals so vibrant that they are eventually indistinguishable from the real world
11. Achieving realistic HDR is challenging
Real-life brightness has a wide dynamic range that is hard to capture and replicate
Real life
• Sun: ~10^9 nits¹
• Sunlit scene: ~10^5 nits
• Starlight: ~10^-3 nits
• Dynamic range: ~10^12:1
Human vision
• Eye’s dynamic range: ~10^4:1 (static), ~10^6:1 (dynamic)
• Eyes are sensitive to relative luminance
Camera and display technologies
• Camera sensors can’t capture the full dynamic range
• Display panels can’t replicate the full dynamic range
¹ Nit is the unit of luminance, also known as candela per square meter (cd/m²). Candela is the unit of luminous intensity.
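The dynamic-range figures above can be checked with a little arithmetic: the ratio of the brightest to the darkest luminance gives the range, and the base-2 logarithm of that ratio gives photographic stops. A minimal sketch, using only the nit values quoted on the slide:

```python
import math

# Luminance values from the slide, in nits (cd/m^2)
sun = 1e9
sunlit_scene = 1e5
starlight = 1e-3

# Real-life dynamic range: brightest to darkest source
ratio = sun / starlight
print(f"ratio: 10^{math.log10(ratio):.0f}:1")       # 10^12:1, matching the slide
print(f"stops: {math.log2(ratio):.1f}")             # ~39.9 stops

# The eye's static range of ~10^4:1 is far narrower
print(f"eye (static): {math.log2(1e4):.1f} stops")  # ~13.3 stops
```

The gap between ~40 stops of real-world range and ~13 stops of static eye range is why cameras and displays have historically gotten away with compressing the scene, and why HDR narrows that compromise rather than eliminating it.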
12. Three technology vectors are essential for HDR
Making every pixel count
• Contrast and brightness: brighter whites and darker blacks, closer to the brightness of real life
• Color gamut: the subset of visible colors that can be accurately captured and reproduced
• Color depth: the number of gradations in color that can be captured and displayed
13. HDR10 is the next step towards true-to-life visuals
A requirement for ULTRA HD PREMIUM certification
HDR10 (content spec)
• Contrast and brightness: EOTF up to 10,000 nits
• Color gamut: BT.2020 support
• Color depth: 10-bit per channel, over a billion colors!
• Codec: HEVC Main 10 profile
ULTRA HD PREMIUM (display spec)
• Contrast and brightness: display from 0.05 to 1,000 nits or 0.0005 to 540 nits
• Color gamut: min. 90% DCI-P3 color reproduction
EOTF is the electro-optical transfer function.
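The "EOTF up to 10,000 nits" entry refers to the PQ (perceptual quantizer) transfer function standardized as SMPTE ST 2084, which HDR10 uses, and the "over a billion colors" claim is simply (2^10)^3. A sketch of both, using the published ST 2084 constants:

```python
# PQ (SMPTE ST 2084) EOTF: maps a normalized code value in [0, 1]
# to absolute luminance in nits, up to 10,000 nits.
M1 = 2610 / 16384        # exponent m1
M2 = 2523 / 4096 * 128   # exponent m2 = 78.84375
C1 = 3424 / 4096         # 0.8359375
C2 = 2413 / 4096 * 32    # 18.8515625
C3 = 2392 / 4096 * 32    # 18.6875

def pq_eotf(n: float) -> float:
    """Luminance in nits for a normalized PQ signal value n in [0, 1]."""
    p = n ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

print(pq_eotf(1.0))    # 10000.0 nits: the top of the HDR10 range
print(pq_eotf(0.0))    # 0.0 nits

# 10-bit per channel: (2^10)^3 distinct RGB triplets
print((2 ** 10) ** 3)  # 1073741824, i.e. over a billion colors
```

PQ is perceptually spaced, so the 1,024 code values per channel are spent where the eye is most sensitive to relative luminance, which is exactly the property the human-vision slide earlier calls out.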
14. The time is right for HDR10
Technologies and ecosystem are now aligning
Ecosystem drivers
• Device availability
• Software support
• Content creation and deployment
Technology advancements
• Multimedia technologies
• Display and camera technologies
• Power and thermal efficiency
15. Qualcomm® Snapdragon™ 835 processor is ready for ULTRA HD PREMIUM certification
Enjoy vibrant HDR content on a variety of screens, from the SoC to the display
HDR10 content: movies and TV shows (Netflix, Amazon, etc.)
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
ULTRA HD PREMIUM certification is a device-level certification. Each Snapdragon device must be certified.
16. A history of multimedia technology leadership (2013–2017)
• Snapdragon 800: first with 4K (H.264) capture and playback
• Snapdragon 805: first with 4K playback with HEVC (H.265)
• Snapdragon 810: first with 4K capture and playback with HEVC
• Snapdragon 820: first with 4K playback @ 60fps
• Snapdragon 835: first with HDR10, ULTRA HD PREMIUM-ready
17. A heterogeneous computing approach is needed for HDR
Efficient processing by running the appropriate task on the appropriate engine
Adreno 540 visual processing
• HEVC Main 10 video profile support with metadata processing
• Accurate color gamut and tone mapping
• Efficient rendering of HDR effects for games with DX12 and Vulkan
• Precise blending of mixed-tone (HDR and SDR) layers
• Native 10-bit color BT.2020 support over HDMI, DP, and DSI displays
Qualcomm Spectra 180 ISP
• 14-bit processing pipeline to support the latest camera sensors
• Video and snapshot HDR processing with local tone mapping
Hexagon 682 DSP & Kryo 280 CPU
• Multicore CPU: camera, video, and graphics application processing
• DSP + HVX for accelerated multimedia post-processing
The Snapdragon 835 also integrates the Snapdragon X16 LTE modem, Wi-Fi, the Display Processing Unit (DPU), the Video Processing Unit (VPU), Qualcomm All-Ways Aware, Qualcomm Aqstic, Qualcomm IZat location, and Qualcomm Haven. (Block diagram not to scale.)
Snapdragon, Qualcomm Adreno, Qualcomm Hexagon, Qualcomm All-Ways Aware, Qualcomm Spectra, Qualcomm Aqstic, Qualcomm Kryo, Qualcomm IZat, and Qualcomm Haven are products of Qualcomm Technologies, Inc.
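Both the GPU and ISP bullets above mention tone mapping: compressing scene luminance into the range a given display can show. The slide does not say which operator the hardware uses; purely as an illustration, here is the classic global Reinhard operator, one of the simplest tone-mapping curves:

```python
def reinhard(l: float, l_white: float = 1e4) -> float:
    """Global Reinhard tone mapping: compresses a luminance l (in nits)
    into the displayable range [0, 1), with l_white as the nominal
    white point (10,000 nits assumed here, the HDR10 ceiling)."""
    l_n = l / l_white          # normalize to the white point
    return l_n / (1.0 + l_n)   # bright values roll off smoothly

# Dark, mid, and bright inputs all land in the displayable range
for nits in (1.0, 100.0, 1000.0, 10000.0):
    print(nits, "->", round(reinhard(nits), 4))
```

Production pipelines use far more sophisticated local operators (as the "local tone mapping" bullet implies), but the shape is the same idea: monotonic compression that preserves detail at both ends of the range.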
19. True-to-life sound is critical to immersive experiences
The sounds and visuals must match: our hearing perceives the depth, direction, and magnitude of sound sources
• Sound sources are all around us
• Sound waves merge and reflect
• There is a distinct sound pressure value at every point in the 3D scene
22. Scene-based audio captures the entire 3D audio scene
Higher Order Ambisonics (HOA) coefficients are the key
• Spherical harmonic-based transforms convert the 3D sound pressure field into a compact and comprehensive representation: the HOA coefficients
• The HOA format is conducive to compression; spatial encoding compresses the HOA coefficients
• Once calculated, the HOA coefficients are decoupled from the capture and playback
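To make the HOA coefficients concrete, here is a first-order encode of a single mono sample into the four classic B-format channels (W, X, Y, Z). This is the textbook FuMa formulation, shown for illustration, not the specific transform any product uses:

```python
import math

def encode_first_order(s: float, azimuth: float, elevation: float):
    """Encode a mono sample s arriving from (azimuth, elevation),
    both in radians, into first-order ambisonic channels
    (classic FuMa B-format convention)."""
    w = s / math.sqrt(2)                             # omnidirectional
    x = s * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = s * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = s * math.sin(elevation)                      # up-down
    return w, x, y, z

# A source directly in front (azimuth 0, elevation 0)
print(encode_first_order(1.0, 0.0, 0.0))  # W = 1/sqrt(2), X = 1, Y = 0, Z = 0
```

Applying this per sample turns any number of sources into the same four channels, which is why rendering complexity is independent of scene complexity: the mix lives in the coefficients, not in per-object streams.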
23. Object-based audio for the 3D audio scene
Faces scaling issues and requires post-processing at capture
• Audio is associated with each object in the scene
• The audio of each object and its corresponding position must be determined through post-processing
• Complexity and bandwidth requirements increase with the number of objects in the scene
• As a result, typical usage is a combination of object- and channel-based audio
24. Channel-based audio for the 3D audio scene
A legacy format with a number of issues
• Mics are placed subjectively, in different positions depending on the audio engineer
• In post-processing, the sound mix is created subjectively and may bear no resemblance to the original audio scene
• A variety of formats must be created, transmitted, and stored, such as 2.0, 5.1, 7.1.4, and 22.2
• Playback does not adjust for an incorrect speaker layout
25. Scene-based audio is a new paradigm for 3D audio
Providing key benefits and solving the major challenges of existing audio formats
Efficient
• Reduced bandwidth and file size
• Rendering complexity is independent of scene complexity
• A single format
• Scalable layering
• Power efficient: high quality per MIPS¹
High fidelity
• Higher order ambisonics
• The perfect representation of the 3D audio scene
• High resolution and an increased sweet spot
Comprehensive
• Simple, real-time capture
• Flexible rendering
• Seamless integration into audio workflows and applications
• Advanced effects for interactivity
¹ MIPS = millions of instructions per second
26. Simple real-time capture and flexible rendering
HOA coefficients are decoupled from the capture and playback
Simple real-time capture
• Spatially separated microphones are required
• Ideally, a spherical mic array with 32 mics for 4th order HOA coefficients
• Spot mics can be added
• A smartphone with 3 mics offers 1st order HOA coefficients
• Captures the entire 3D audio scene
• Generates a single, compact file
• Great for live content (sports, user-generated content, etc.) and post-production (movies, etc.)
Flexible rendering
• Audio is rendered at the playback location based on the number and location of the speakers
• Recreates the best possible reproduction of the original sound scene
• Supports any channel format: 2.0, 5.1, 7.1.4, 22.2, binaural, etc.
• Uniform experience across devices and playback locations (theater, home, mobile devices, etc.)
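The mic counts quoted above follow from how HOA scales: an order-N representation has (N + 1)² coefficient channels, so a capture rig needs at least that many spatially separated capsules. A quick check of the slide's numbers:

```python
def hoa_channels(order: int) -> int:
    """Number of HOA coefficient channels for a given ambisonic order."""
    return (order + 1) ** 2

print(hoa_channels(1))  # 4 channels at first order (a 3-mic smartphone
                        # captures a reduced, e.g. horizontal-only, set)
print(hoa_channels(4))  # 25 channels at 4th order; a 32-mic spherical
                        # array comfortably exceeds this
```

This quadratic growth is modest compared with object-based audio, where every additional source adds a full stream plus position metadata.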
27. 3D positional audio is essential for VR and AR
Accurate 3D surround sound based on your head’s position relative to various sound sources
• Sound arrives at each ear at the accurate time and with the correct intensity
• The HRTF (head-related transfer function):
◦ Takes into account typical human facial and body characteristics, like the location, shape, and size of the ears
◦ Is a function of frequency and three spatial variables
• Sound at the ears needs to be adjusted dynamically as your head and the sound sources move; this is the VR and AR experience
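One of the cues an HRTF encodes is the interaural time difference (ITD): sound reaches the far ear slightly later than the near one. A back-of-the-envelope sketch using the Woodworth spherical-head approximation; the head radius and speed of sound are assumed typical values, not figures from the slide:

```python
import math

def itd_woodworth(azimuth_rad: float, head_radius_m: float = 0.0875,
                  speed_of_sound: float = 343.0) -> float:
    """Interaural time difference in seconds for a source at the given
    azimuth, using the Woodworth spherical-head model."""
    return (head_radius_m / speed_of_sound) * (math.sin(azimuth_rad) + azimuth_rad)

print(itd_woodworth(0.0))          # 0.0 s: source dead ahead
print(itd_woodworth(math.pi / 2))  # ~0.66 ms: source directly to one side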
28. Scene-based audio is an ideal solution for VR and AR
A natural fit for capturing and playing back 3D positional audio
Capture
High fidelity
• Captures the entire 3D sound scene in high quality
• Video and audio captured on the same device
Real-time & simple
• Works on a variety of devices (action camera, smartphone, etc.)
• No post-production required, but scene-based effects are easy to apply
• Great for live events like sports and user-generated content
• Compact file
Playback
Immersive
• High-fidelity 3D surround sound adjusts based on head pose
• 3-DOF and 6-DOF support
• A natural way to guide a user’s attention
Efficient
• Accurate manipulation of the sound field
• HOA coefficients are computationally efficient to rotate, stretch, or compress the audio scene
Sounds so accurate that they are true to life
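The "efficient to rotate" point is concrete at first order: a head turn (yaw) only mixes the X and Y channels through a 2x2 rotation matrix, leaving W and Z untouched, and higher orders use similarly small per-order matrices. A minimal sketch, using FuMa-style W/X/Y/Z channels as an illustrative convention, not a product API:

```python
import math

def rotate_yaw(w: float, x: float, y: float, z: float, yaw_rad: float):
    """Rotate a first-order ambisonic sound field by yaw_rad around the
    vertical axis; only X and Y mix, W and Z are unchanged."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return w, c * x - s * y, s * x + c * y, z

# A source encoded straight ahead (X = 1, Y = 0), after a 90-degree
# head turn, appears at the side (X ~ 0, Y = 1)
print(rotate_yaw(0.707, 1.0, 0.0, 0.0, math.pi / 2))
```

The cost is a handful of multiply-adds per sample regardless of how many sources are in the scene, which is what makes head-tracked rendering cheap enough for mobile VR.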
29. Scene-based audio adoption is accelerating
The entire ecosystem needs to align
Advanced demonstrations
• End-to-end workflow solutions
• Broadcast (TV)
• VR
• Immersive audio
Standards adoption
• MPEG-H 3D Audio
• ATSC 3.0
• DVB is considering MPEG-H 3D Audio
• Device interoperability: DisplayPort, HDMI, etc.
Real deployments
• YouTube is using first order ambisonics for spatial audio
• The 2018 Winter Olympics in South Korea is using MPEG-H 3D Audio
• Various mics are available for purchase
MPEG = Moving Picture Experts Group. ATSC = Advanced Television Systems Committee.
Learn more about our contribution to scene-based audio: https://www.qualcomm.com/scene-based-audio
31. Natural user interfaces for intuitive interactions
Adaptive, multimodal user interfaces: speech recognition, eye tracking, and gesture recognition are becoming essential
• Speech recognition: uses natural language processing
• Motion & gesture recognition: uses computer vision, motion sensors, or touch
• Face recognition: uses computer vision to recognize facial expressions
• Eye tracking: uses computer vision to measure the point of gaze
• Personalized interfaces: learn and know user preferences through machine learning
• Bringing life to objects: efficient user interfaces for IoT
32. Voice is a natural way to interact with devices
A hands-free interface is necessary in certain situations
Designed to be
• Intuitive
• Conversational
• Convenient
• Productive
• Personalized
Underlying technology
• Voice activation
• Noise filtering, suppression, and cancellation
• Speech recognition
• Natural language processing
• Voice recognition / biometrics
• Deep learning
33. Eye tracking naturally detects our point of interest
Providing valuable information for interacting with our devices
Natural user interface
• Gaze tracking and estimation to navigate within next-gen applications
• Fast and secure authentication through iris scan
• Applicable to VR HMDs, AR glasses, and smartphones
Improved visuals
• Gaze tracking and estimation will be an input to new visual and auditory rendering techniques
• Foveated rendering of graphics and video enables a more immersive visual user experience
• Eye tracking, when combined with machine learning, will also personalize VR and AR experiences
Dynamic calibration
• Each human face has a different inter-pupillary distance (IPD)
• HMDs can also move around on the face during use
• Computer vision techniques will be used to dynamically and accurately account for IPD
Requirements: tracking camera, eye tracker, gaze estimation, latency reduction, system optimization, robust solution
34. Gesture recognition for natural hand interactions
Interact with the UI like you would in the real world
Benefits
• Intuitive interaction with a device without the need for accessories: grab, select, type, etc.
• A reconstructed hand with accurate movements increases the level of immersion for VR
• Increased productivity by using gestures where appropriate and having a predictive UI
Key technologies
• Wide field-of-view camera
• Computer vision
• Machine learning
Pipeline
• Detect: identify hands
• Track: follow key points on hands and fingers as they move
• Recognize: understand the meaning of the hand and finger gestures, even when occluded
• Act: take appropriate action based on the current and predicted gesture
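The detect → track → recognize → act stages above form a per-frame pipeline. Purely as an illustrative skeleton (every function, type, and gesture label here is hypothetical, not an actual SDK API), the stages might be wired together like this:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hand:
    keypoints: List[tuple]  # (x, y) positions of finger/palm landmarks

def detect(frame) -> List[Hand]:
    """Detect: identify hands in the frame (stubbed for illustration)."""
    return [Hand(keypoints=[(0.5, 0.5)])] if frame else []

def track(hands: List[Hand], prev: List[Hand]) -> List[Hand]:
    """Track: follow key points across frames (stub: pass-through)."""
    return hands

def recognize(hands: List[Hand]) -> Optional[str]:
    """Recognize: map tracked key points to a gesture label (stub)."""
    return "grab" if hands else None

def act(gesture: Optional[str]) -> str:
    """Act: trigger the UI action bound to the current gesture."""
    return {"grab": "pick_up_object"}.get(gesture, "no_op")

def process_frame(frame, prev_hands: List[Hand]):
    """Run one frame through detect -> track -> recognize -> act."""
    hands = track(detect(frame), prev_hands)
    return act(recognize(hands)), hands

action, hands = process_frame(frame=object(), prev_hands=[])
print(action)  # pick_up_object
```

In a real system, detect and recognize would be ML models fed by the wide field-of-view camera, and track would carry hand state between frames so recognition can survive brief occlusion, as the slide notes.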
35. QTI is uniquely positioned to support superior immersive experiences
Custom designed SoCs and investments in the core immersive technologies
36. QTI is uniquely positioned to support immersive experiences
Providing efficient, comprehensive solutions
Immersive experiences, via Snapdragon™ solutions
• Visual quality: consistent, accurate color; HDR video, photos, and playback; high resolution and frame rate
• Sound quality: positional audio; noise removal; true-to-life audio processing
• Intuitive interactions: multimodal natural UIs; intelligent, contextual interactions; responsive and smooth UIs
• Efficient heterogeneous computing architecture and custom designed processing engines
Commercialization, via ecosystem enablement
• Comprehensive solutions across tiers
• Snapdragon development platforms
• Ecosystem collaboration and app developer tools
Within device constraints: development time, sleek form factor, power and thermal efficiency, and cost