1. MPEG for Augmented Reality
ISMAR, September 9, 2014, Munich
AR Standards Community Meeting September 12, 2014
Marius Preda, MPEG 3DG Chair
Institut Mines TELECOM
http://www.slideshare.net/MariusPreda/mpeg-augmented-reality-tutorial
2. What you will learn today
• Who MPEG is and why it is doing AR
• MPEG ARAF design principles and the main features
• Create ARAF experiences: two exercises
8. Event LOOV
• Collecting virtual money in the real world to buy real services and products
Available on the AppStore, Android stores and MyMultimediaWorld.com
12. Answers to (some of) Christine’s (non-technical) questions
• Who is MPEG?
• What does MPEG do successfully?
• Who are the members?
• IPR policy
13. What is MPEG?
A suite of ~130 ISO/IEC standards for:
• Coding/compression of elementary media
– Audio (MPEG-1, 2 and 4), Video (MPEG-1, 2 and 4), 2D/3D graphics (MPEG-4)
• Transport
– MPEG-2 Transport, File Format, Dynamic Adaptive Streaming over HTTP (DASH)
• Hybrid (natural & synthetic) scene description, user interaction (MPEG-4)
• Metadata (MPEG-7)
• Media management and protection (MPEG-21)
• Sensors and actuators, Virtual Worlds (MPEG-V)
• Advanced User Interaction (MPEG-U)
• Media-oriented middleware (MPEG-M)
More ISO/IEC standards under development, including:
• Coding and Delivery in Heterogeneous Environments
– 3D Video
– …
14. What is MPEG?
• A standardization activity continuing for 25 years
– Supported by several hundred companies/organisations from ~25 countries
– ~500 experts participating in quarterly meetings
– More than 2300 active contributors
– Many thousands of experts working in companies
• A proven way of organizing the work to deliver useful and used standards
– Developing standards by integrating individual technologies
– Well-defined procedures
– Subgroups with clear objectives
– Ad hoc groups continuing coordinated work between meetings
• MPEG standards are widely referenced by industry
– 3GPP, ARIB, ATSC, DVB, DVD-Forum, BDA, ETSI, SCTE, TIA, DLNA, DECE, OIPF…
• Billions of software and hardware devices built on MPEG technologies
– MP3 players, cameras, mobile handsets, PCs, DVD/Blu-ray players, STBs, TVs, …
• Business-friendly IPR policy established at ISO level
15. MPEG technologies related to AR: 1st pillar
Timeline: 1992/4 MPEG-1/2 (AV content) → 1997 VRML → 1998 MPEG-4 v.1 → 1999 MPEG-4 v.2
• Part 11 – BIFS:
– binarisation of VRML
– extensions for streaming
– extensions for server commands
– extensions for 2D graphics
– real-time augmentation with audio & video
• Part 2 – Visual:
– 3D mesh compression
– face animation
– body animation (MPEG-4 v.2)
First form of broadcast signal augmentation
16. MPEG technologies related to AR: 1st pillar
• 2003 – MPEG-4 Part 16 (AFX): a rich set of 3D graphics tools; compression of geometry, appearance and animation
• 2005 – AFX 2nd Edition: animation by morphing, multi-texturing
• 2007 – AFX 3rd Edition: WSS for terrain and cities, frame-based animation
• 2011 – AFX 4th Edition: scalable complexity mesh coding
A rich set of scene and graphics representation and compression tools
17. MPEG technologies related to AR: 2nd pillar
Timeline 2011–201x:
• MPEG-V – Media Context and Control
– 1st Edition: sensors and actuators, interoperability between Virtual Worlds
– 2nd Edition: GPS, biosensors, 3D camera
• MPEG-U – Advanced User Interface
• MPEG-H – compression of video + depth (3D Video), 3D Audio
• CDVS (201x) – feature-point based descriptors for image recognition
A rich set of sensors and actuators
19. MPEG technologies related to AR: 2nd pillar
MPEG-V – Media Context and Control
• Actuators: light, flash, heating, cooling, wind, vibration, sprayer, scent, fog, color correction, initialize color correction parameter, rigid body motion, tactile, kinesthetic, global position command
• Sensors: light, ambient noise, temperature, humidity, distance, atmospheric pressure, position, velocity, acceleration, orientation, angular velocity, angular acceleration, force, torque, pressure, motion, intelligent camera type, multi interaction point, gaze tracking, wind, global position, altitude, bend, gas, dust, body height, body weight, body temperature, body fat, blood type, blood pressure, blood sugar, blood oxygen, heart rate, electrograph (EEG, ECG, EMG, EOG, GSR), weather, facial expression, facial morphology, facial expression characteristics, geomagnetic
20. Main features of MPEG AR technologies
• All AR-related data is available from MPEG standards
• Real time composition of synthetic and natural objects
• Access to
– Remotely/locally stored scene / compressed 2D/3D mesh objects
– Streamed real-time scene / compressed 2D/3D mesh objects
• Inherent object scalability (e.g. for streaming)
• User interaction & server generated scene changes
• Physical context
– Captured by a broad range of standard sensors
– Affected by a broad range of standard actuators
21. MPEG vision on AR
Diagram: an Authoring Tool produces ARAF content (MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V, with compression), which an MPEG Player downloads and plays.
22. MPEG vision on AR
Same diagram, with the MPEG Player replaced by an ARAF Browser: the Authoring Tool produces compressed ARAF content (MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V), which the ARAF Browser downloads.
23. End to end chain
Diagram: Authoring Tools produce MPEG ARAF content for the ARAF Browser; the Browser, driven by the User, talks to Media Servers and Service Servers, and to Local and Remote Sensors & Actuators, which sense and act on the Local and Remote Real World Environments.
24. MPEG-A Part 13 ARAF
Three main components: scene, sensors/actuators, media
• A set of scene graph nodes/PROTOs as defined in MPEG-4 Part 11
– Existing nodes: audio, image, video, graphics, programming, communication, user interactivity, animation
– New standard PROTOs: Map, MapMarker, Overlay, Local & Remote Recognition, Local & Remote Registration, CameraCalibration, AugmentedRegion, Point of Interest
• Connection to sensors and actuators as defined in MPEG-V (see the sketch after this list)
– Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude
– Local and/or remote camera sensor
– Flash, Heating, Cooling, Wind, Sprayer, Scent, Fog, RigidBodyMotion, Kinesthetic
• Compressed media
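To make the sensor-to-scene connection above concrete, here is a minimal Python sketch of the data flow it implies: an orientation value, of the kind an MPEG-V sensor delivers, routed into a scene overlay node. The class and field names (OrientationSensor, Overlay, rotation_deg, route_sensor_to_overlay) are hypothetical illustrations, not the normative ARAF/MPEG-V interfaces.

```python
from dataclasses import dataclass

@dataclass
class OrientationSensor:
    """Stand-in for an MPEG-V orientation sensor reading (degrees)."""
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0

@dataclass
class Overlay:
    """Stand-in for an ARAF scene node whose rotation we want to drive."""
    rotation_deg: tuple = (0.0, 0.0, 0.0)

def route_sensor_to_overlay(sensor: OrientationSensor, overlay: Overlay) -> None:
    """Copy the sensed orientation into the scene node, as a scene ROUTE would."""
    overlay.rotation_deg = (sensor.yaw, sensor.pitch, sensor.roll)

if __name__ == "__main__":
    s = OrientationSensor(yaw=35.0, pitch=-5.0)
    o = Overlay()
    route_sensor_to_overlay(s, o)
    print(o.rotation_deg)  # (35.0, -5.0, 0.0)
```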
25. MPEG-A Part 13 ARAF
Scene: 73 XML Elements
Documentation available online:
http://wg11.sc29.org/augmentedReality/
28. MPEG-A Part 13 ARAF
Exercises
• AR Quiz: http://youtu.be/la-Oez0aaHE
• Augmented Book: http://youtu.be/LXZUbAFPP-Y
29. MPEG-A Part 13 ARAF
AR Quiz setting: preparing the media
• images, videos, audio, 2D/3D assets
• GPS location
30. MPEG-A Part 13 ARAF
AR Quiz XML inspection
http://tiny.cc/MPEGARQuiz
31. MPEG-A Part 13 ARAF
AR Quiz Authoring Tool
www.MyMultimediaWorld.com, then go to Create / Augmented Reality
32. MPEG-A Part 13 ARAF
Augmented Book setting: preparing the media
• images, audio
33. MPEG-A Part 13 ARAF
Augmented Book XML inspection
http://tiny.cc/MPEGAugBook
34. MPEG-A Part 13 ARAF
Augmented Book Authoring Tool
www.MyMultimediaWorld.com, then go to Create / Augmented Books
35. Conclusions
• ARAF Browser is Open Source
– iOS, Android, WS, Linux
– distributed at www.MyMultimediaWorld.com
• ARAF V1 published early 2014
• ARAF V2 in progress
– Visual Search (client side and server side)
– 3D Video, 3D Audio
– Connection to Social Networks
– Connection to POI servers
38. MPEG 3DG Report
ARAF 2nd Edition, items under discussion
1. Local vs Remote recognition and tracking
2. Social Networks
3. 3D video
4. 3D audio
39. MPEG 3DG Report
Server side object recognition: a real system*
Diagram (client–server flow): the client detects key points in the query image and extracts binary descriptors, then sends both to the server in an HTTP POST; the server decodes them, matches the query descriptors against its DB of descriptors, images and information, and returns either the matched ID with the corresponding information or an error/no-match message, as string data in the HTTP response, which the client parses and displays.
* Wine recognizer: GooT and IMT
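A hedged sketch of the client half of this flow, assuming the server exposes an HTTP endpoint taking JSON. The endpoint URL, the request/response layout and the use of OpenCV's ORB are illustrative assumptions; the slide fixes only the overall steps (detect key points, extract binary descriptors, POST them, let the server match against its DB, parse and display the answer).

```python
import base64
import json

import cv2          # OpenCV, used here for ORB key points / descriptors
import requests

SERVER_URL = "http://example.com/recognize"   # hypothetical endpoint

def query_image(path: str) -> dict:
    """Detect key points, extract descriptors and POST them to the server."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    keypoints, descriptors = orb.detectAndCompute(img, None)   # detect + extract
    if descriptors is None:
        raise ValueError("no key points found in the query image")
    payload = {
        "keypoints": [kp.pt for kp in keypoints],               # (x, y) positions
        "descriptors": base64.b64encode(descriptors.tobytes()).decode("ascii"),
        "shape": descriptors.shape,                              # rows x 32 bytes for ORB
    }
    resp = requests.post(SERVER_URL, json=payload, timeout=5)   # HTTP POST
    resp.raise_for_status()
    return resp.json()       # e.g. {"id": ..., "info": ...} or an error message

if __name__ == "__main__":
    answer = query_image("wine_label.jpg")                      # e.g. a wine label
    print(json.dumps(answer, indent=2))                          # parse and display
```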
40. MPEG 3DG Report
Server side object recognition: ARAF version
Diagram: the MAR Experience Creator + Content Creator provide the scene and the Processing Server URLs, together with a large image DB on the server side. On the end-user device, the ARAF browser takes the video stream from the video source (video URL), optionally limited to a recognition region, and sends either the video stream or binary (base64) key points + descriptors (e.g. ORB) to the Processing Servers; these run detection/recognition libraries against the image DB and return the corresponding media data to the browser.
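A complementary sketch of the matching the Processing Servers would perform, assuming ORB-style binary descriptors (mentioned on the slide) and a brute-force Hamming matcher. The database layout, the distance cut-off and the minimum match count are illustrative assumptions, not part of ARAF.

```python
import base64

import cv2
import numpy as np

# Hypothetical DB: image id -> precomputed ORB descriptors (uint8, N x 32)
DESCRIPTOR_DB: dict = {}

def decode_descriptors(b64: str, shape) -> np.ndarray:
    """Undo the base64 encoding done by the client."""
    raw = base64.b64decode(b64)
    return np.frombuffer(raw, dtype=np.uint8).reshape(tuple(shape))

def recognize(query_desc: np.ndarray, min_matches: int = 25):
    """Return the id of the best-matching DB image, or None if nothing matches."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    best_id, best_count = None, 0
    for image_id, db_desc in DESCRIPTOR_DB.items():
        matches = matcher.match(query_desc, db_desc)
        good = [m for m in matches if m.distance < 50]   # ad-hoc distance cut-off
        if len(good) > best_count:
            best_id, best_count = image_id, len(good)
    return best_id if best_count >= min_matches else None
```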
41. MPEG 3DG Report
Server side object recognition: ARAF version
Discussions on:
- Does the content creator specify the form of the request (full image or descriptors), or does the browser take the best decision?
- Is the server's answer formalized in ARAF?
42. MPEG 3DG Report
ARAF – Social Network Data in ARAF scene
Scenario: display posts from an SN in a geo-localized manner.
ARAF can do this directly by programming the access to the SN service at the scene level, as the sketch below illustrates.
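A minimal sketch of that scene-level logic, in Python for readability: fetch posts from an SN service and keep only those near the user's GPS position, so they can be shown as geo-localized markers. The post fields ("lat", "lon", "text") and the 200 m radius are assumptions; in ARAF this would be expressed with the scene's scripting and communication nodes rather than Python.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS points, in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def nearby_posts(posts, user_lat, user_lon, radius_m=200):
    """Posts within radius_m of the user, closest first."""
    tagged = [(haversine_m(user_lat, user_lon, p["lat"], p["lon"]), p) for p in posts]
    return [p for d, p in sorted(tagged, key=lambda t: t[0]) if d <= radius_m]

if __name__ == "__main__":
    posts = [{"lat": 48.137, "lon": 11.575, "text": "Marienplatz!"},
             {"lat": 48.353, "lon": 11.786, "text": "At the airport"}]
    print(nearby_posts(posts, 48.1372, 11.5755))   # only the Marienplatz post
```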
43. MPEG 3DG Report
ARAF – Social Network Data in ARAF scene
At minimum, a user login to the SN; at maximum, the MPEG UD (User Description)
44. MPEG 3DG Report
ARAF – Social Network Data in ARAF scene
Connect to a UD server to get all the necessary data
45. MPEG 3DG Report
ARAF – Social Network scenario
Two categories of “SNS Data”
– Static data
• Name, photo, email, phone number, address, sex, interests, …
– Social network related activity
• Reported location, SNS post title, SNS text, SNS media
Obtained from the UD server (see the data-structure sketch below)
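A sketch of the two SNS data categories as plain data structures, following the bullets above. The exact fields a UD server returns are not fixed here; the point is the split between a static profile and a list of activity items.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StaticProfile:                 # "static data" category
    name: str
    photo_url: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[str] = None
    interests: List[str] = field(default_factory=list)

@dataclass
class ActivityItem:                  # "social network related activity" category
    reported_lat: float
    reported_lon: float
    post_title: str = ""
    post_text: str = ""
    media_urls: List[str] = field(default_factory=list)

@dataclass
class SnsData:                       # what a UD server query might assemble
    profile: StaticProfile
    activity: List[ActivityItem] = field(default_factory=list)
```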
46. MPEG 3DG Report
ARAF 2nd Edition – introducing 3D Video
Modeling of 3 AR classes for 3D video:
1. Pre-created 3D model of the environment, using visual search and other sensors to obtain the camera position and orientation; 3D video is used to handle occlusions.
2. No a priori 3D model of the scene; depth is captured in real time and used to handle occlusions at the rendering step (see the sketch after this list).
3. No a priori model of the scene, but one is created during the AR experience (SLAM – Simultaneous Localization and Mapping).
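For classes 1 and 2, the occlusion handling boils down to a per-pixel depth test: the virtual object is drawn only where it is closer to the camera than the captured depth. A minimal NumPy sketch under that assumption; the array names and the metric depth convention are illustrative.

```python
import numpy as np

def composite_with_occlusion(camera_rgb, virtual_rgb, captured_depth, virtual_depth):
    """Per-pixel composition: keep the virtual pixel only where it is nearer."""
    visible = virtual_depth < captured_depth            # boolean occlusion mask
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out

if __name__ == "__main__":
    h, w = 4, 4
    cam = np.zeros((h, w, 3), dtype=np.uint8)             # camera image
    virt = np.full((h, w, 3), 255, dtype=np.uint8)         # white virtual object
    real_d = np.full((h, w), 2.0)                           # real surface 2 m away
    virt_d = np.full((h, w), 3.0); virt_d[:, :2] = 1.0      # left half is in front
    print(composite_with_occlusion(cam, virt, real_d, virt_d)[:, :, 0])
```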
47. MPEG 3DG Report
ARAF – introducing 3D Audio
Two aspects:
• Spatialisation
• Recognition: use sounds from the real world to trigger events in an AR scene
48. MPEG 3DG Report
ARAF – 3D Audio: local spatialisation
Diagram: the MAR Experience Creator + Content Creator provide the ARAF file (scene with the user location & direction and the sound locations). On the mobile device, the ARAF browser reads the position & orientation sensor, performs the coordinate mapping, and passes the relative sound location (plus an optional acoustic scene description) and the audio source to a local 3D Audio Engine; the spatialized audio source is mixed with the camera's video/audio stream and the microphone signal into the synthesized audio stream.
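A sketch of the coordinate-mapping step in this diagram: turning the sound source's world position and the user's sensed position and heading into a source position relative to the listener, which is what the 3D Audio Engine spatialises. The flat local east/north frame and the compass heading in degrees are simplifying assumptions.

```python
from math import cos, radians, sin

def relative_source_position(user_xy, user_heading_deg, source_xy):
    """Source position in the listener frame: +x to the right, +y ahead."""
    dx = source_xy[0] - user_xy[0]           # world frame: x = east, y = north
    dy = source_xy[1] - user_xy[1]
    h = radians(user_heading_deg)             # 0 deg = facing north (+y)
    right = dx * cos(h) - dy * sin(h)
    ahead = dx * sin(h) + dy * cos(h)
    return right, ahead

if __name__ == "__main__":
    # Source 10 m north of the user, user facing east (90 deg):
    print(relative_source_position((0.0, 0.0), 90.0, (0.0, 10.0)))
    # -> roughly (-10, 0): the sound should come from the listener's left
```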
49. MPEG 3DG Report
ARAF – 3D Audio: remote spatialisation
Diagram: same as the local case, except that the relative sound location, audio source and optional acoustic scene are sent to a Proxy Server (its URL comes from the MAR Experience Creator + Content Creator as the Processing Server URL); the 3D Audio Engine and its detection libraries run on that server, which returns the spatialized audio source to the mobile device, where it is mixed with the microphone signal into the synthesized audio stream.
50. MPEG 3DG Report
ARAF – Audio recognition: local
Diagram: the MAR Experience Creator + Content Creator provide the scene together with the target resources (or their descriptors). On the mobile device, the ARAF browser feeds the audio source (microphone or audio URL), optionally constrained by a detection window, sampling rate and detection delay, to a local audio detection library, which compares the stream against the target resources and returns an ID mask telling the scene what was detected.
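A sketch of the detection loop this diagram implies: feed the audio stream to a detection function in windows and report the ID of any target resource it recognises. The coarse spectral fingerprint and the cosine-similarity threshold below stand in for a real audio detection library, which the slide leaves unspecified.

```python
import numpy as np

def fingerprint(samples: np.ndarray, bins: int = 32) -> np.ndarray:
    """Coarse magnitude-spectrum fingerprint of one audio window."""
    spectrum = np.abs(np.fft.rfft(samples))
    chunks = np.array_split(spectrum, bins)
    fp = np.array([c.mean() for c in chunks])
    return fp / (np.linalg.norm(fp) + 1e-9)

def detect(window: np.ndarray, targets: dict, threshold: float = 0.9):
    """Return the ID of the best-matching target fingerprint, or None."""
    fp = fingerprint(window)
    best_id, best_score = None, threshold
    for target_id, target_fp in targets.items():
        score = float(np.dot(fp, target_fp))            # cosine similarity
        if score > best_score:
            best_id, best_score = target_id, score
    return best_id

def run(stream: np.ndarray, targets: dict, window_samples: int = 44100):
    """Slide a one-second detection window over the audio stream."""
    for start in range(0, len(stream) - window_samples + 1, window_samples):
        hit = detect(stream[start:start + window_samples], targets)
        if hit is not None:
            yield start, hit                             # sample offset + matched ID
```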
51. MPEG 3DG Report
ARAF – Audio recognition: remote (proxy)
Diagram: the MAR Experience Creator + Content Creator provide the scene, the URL of the Processing Server, and the target resources or descriptors with their IDs (plus optional detection window, sampling rate and detection delay). The ARAF browser on the mobile device streams the audio source (microphone or audio URL) to the Proxy Server, which runs the audio detection libraries against the target resources and returns an ID mask to the browser.
52. MPEG 3DG Report
ARAF – Audio recognition: remote (descriptors extracted on the device)
Diagram: as above, but the ARAF browser performs the descriptor extraction on the microphone/audio stream itself and sends only those descriptors to the Processing Server; the server matches them against the target resources or descriptors (with their IDs and the optional detection window, sampling rate and detection delay) and returns an ID mask.
53. MPEG 3DG Report
ARAF – joint meeting with 3DAudio
Spatialisation:
• The 3D audio renderer needs an API to get the user position and orientation
• It may be more complex to update the position and orientation of all the acoustic objects in real time
Recognition:
• MPEG-7 has several tools for audio fingerprinting
• Investigate the ongoing work on “Audio synchronisation” and check whether it is suitable for AR
Editor's notes
Passing On, Treasure Hunt, Castle Quest, Arduinnae, Castle Crisis
Head tracking is needed to render the audio.
3D Audio can be used to modulate the audio perception with respect to the user's position and orientation. Currently a similar approach is used on the production side, but it can also be used on the user side (in real time).
The 3D position and orientation of the graphical objects (enriched with audio) is known and should be forwarded to the 3D audio engine. Relative positions between the sources and the user are preferred.
Draw a diagram showing the scene sending the relative positions of all the sources to the 3D audio engine and getting back the sound for the headphones.
A reference software implementation exists but works on files; the chain is: (1) 3D audio decoder (multi-channel), some of whose outputs are objects and higher-order ambisonics; (2) object renderer. The 3D coordinates are included as metadata in the bitstream, but an entry can be made in the Object Renderer taking its input from the scene.