Lecture 4 in the 2022 COMP 4010 lecture series on AR/VR. This lecture is about AR Interaction techniques. This was taught by Mark Billinghurst at the University of South Australia in 2022.
Mark Billinghurst, Director at HIT Lab NZ → University of South Australia
3. AR Requires Tracking and Registration
• Registration
• Positioning virtual objects with respect to the real world
• Fixing the virtual object to a real object when the view is fixed
• Calibration
• Offline measurements
• Measure camera relative to head mounted display
• Tracking
• Continually locating the user’s viewpoint as the view moves
• Position (x, y, z), Orientation (roll, pitch, yaw)
4. Sources of Registration Errors
• Static errors
• Optical distortions (in HMD)
• Mechanical misalignments
• Tracker errors
• Incorrect viewing parameters
• Dynamic errors
• System delays (largest source of error)
• 1 ms delay = 1/3 mm registration error (see the worked example below)
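A back-of-the-envelope check of that figure (a hedged sketch in Python; the 1/3 m/s relative motion speed is an assumption, not a number from the slide):

```python
# Hedged back-of-envelope check: registration error from system delay.
# Assumes the viewpoint moves at ~1/3 m/s relative to the scene (an
# assumed, plausible head-motion speed); error = speed * delay.
speed_m_per_s = 1.0 / 3.0   # assumed relative motion speed
delay_s = 0.001             # 1 ms of end-to-end system delay
error_mm = speed_m_per_s * delay_s * 1000.0
print(f"{error_mm:.2f} mm per ms of delay")  # ~0.33 mm, i.e. 1/3 mm
```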
5. Reducing Static Errors
• Distortion compensation
• For lens or display distortions
• Manual adjustments
• Have the user manually align AR and VR content
• View-based or direct measurements
• Have the user measure eye position
• Camera calibration (video AR)
• Measuring camera properties
6. Reducing Dynamic Errors (1)
• Reduce system lag
• Faster components/system modules
• Reduce apparent lag
• Image deflection
• Image warping (see the sketch below)
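A minimal sketch of the image-warping idea in Python with OpenCV, assuming a pinhole intrinsic matrix K and a rotation-only correction between render time and display time (translation is ignored; all names here are illustrative):

```python
import numpy as np
import cv2

def late_warp(frame, K, R_render, R_latest):
    """Re-project an already-rendered frame with the newest orientation."""
    R_delta = R_latest @ R_render.T      # camera rotation since render time
    H = K @ R_delta @ np.linalg.inv(K)   # homography induced by the rotation
    h, w = frame.shape[:2]
    return cv2.warpPerspective(frame, H, (w, h))
```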
8. Frames of Reference
• World-stabilized
• E.g., billboard or signpost
• Body-stabilized
• E.g., virtual tool-belt
• Screen-stabilized
• Heads-up display
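A minimal sketch of how the three frames above differ, assuming 4x4 homogeneous transforms from head and body tracking (the function and argument names are illustrative):

```python
import numpy as np

def anchor_in_world(frame, anchor, world_from_head, world_from_body):
    """Return the world-space position of content anchored in a given frame."""
    if frame == "world":                 # e.g. billboard or signpost
        return anchor                    # already expressed in world coords
    if frame == "body":                  # e.g. virtual tool-belt
        return world_from_body @ anchor  # moves with the user's body
    if frame == "screen":                # e.g. heads-up display
        return world_from_head @ anchor  # locked to the user's view

# anchor is a homogeneous 4-vector, e.g. np.array([0.0, -0.3, 0.5, 1.0])
```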
9. Tracking Requirements
• Augmented Reality Information Display
• World Stabilized
• Body Stabilized
• Head Stabilized
[Diagram: tracking requirements increase from head-stabilized to body-stabilized to world-stabilized displays]
12. Why Optical Tracking for AR?
• Many AR devices have cameras
• Mobile phone/tablet, Video see-through display
• Provides precise alignment between video and AR overlay
• Using features in the video to generate pixel-perfect alignment
• The real world has many visual features that can be tracked
• Computer vision is a well-established discipline
• Over 40 years of research to draw on
• Older non-real-time algorithms can now run in real time on today’s devices
13. Common AR Optical Tracking Types
• Marker Tracking
• Tracking known artificial markers/images
• e.g. ARToolKit square markers (see the sketch after this list)
• Markerless Tracking
• Tracking from known features in real world
• e.g. Vuforia image tracking
• Unprepared Tracking
• Tracking in unknown environment
• e.g. SLAM tracking
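As a concrete analogue of square-marker tracking, here is a hedged OpenCV ArUco sketch; ARToolKit itself has a different API, the camera matrix K is assumed calibrated, and the ArUco calls shown are from OpenCV 4.7+:

```python
import cv2
import numpy as np

detector = cv2.aruco.ArucoDetector(
    cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50),
    cv2.aruco.DetectorParameters())

def marker_pose(gray, K, marker_len=0.05):
    """Detect one square marker and recover the camera pose relative to it."""
    corners, ids, _ = detector.detectMarkers(gray)
    if ids is None:
        return None
    # 3D corners of a square marker of side marker_len, centred at the origin
    half = marker_len / 2.0
    obj = np.array([[-half, half, 0], [half, half, 0],
                    [half, -half, 0], [-half, -half, 0]], dtype=np.float32)
    ok, rvec, tvec = cv2.solvePnP(obj, corners[0].reshape(4, 2), K, None)
    return (rvec, tvec) if ok else None
```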
15. Natural Feature Tracking
• Use Natural Cues of Real Elements
• Edges
• Surface Texture
• Interest Points
• Model-based or model-free
• No visual pollution
[Figure: natural feature types – contours, feature points, surfaces]
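A hedged sketch of the interest-point idea using ORB features in OpenCV (Vuforia’s actual feature pipeline is proprietary; ORB stands in here):

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_target(target_gray, frame_gray):
    """Match interest points on a known image target against a live frame."""
    kp_t, des_t = orb.detectAndCompute(target_gray, None)
    kp_f, des_f = orb.detectAndCompute(frame_gray, None)
    if des_t is None or des_f is None:
        return []
    return sorted(matcher.match(des_t, des_f), key=lambda m: m.distance)
```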
16. Detection and Tracking
[State diagram: Start → Detection; tracking target detected → Incremental tracking; tracking target lost or not detected → back to Detection]
• Detection: recognizes the target type, detects the target, initializes the camera pose
• Incremental tracking: fast; robust to blur, lighting changes, and tilt
Tracking and detection are complementary approaches.
After successful detection, the target is tracked incrementally.
If the target is lost, detection is activated again (a minimal loop of this kind is sketched below).
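A minimal sketch of that loop in Python/OpenCV: detect, then track keypoints incrementally with Lucas-Kanade optical flow, and fall back to detection when too many points are lost (detect_target is a placeholder for any detector, such as the marker or ORB examples above):

```python
import cv2

def detect_and_track(video, detect_target, min_points=10):
    # detect_target returns float32 keypoints of shape (N, 1, 2), or None
    prev_gray, points = None, None
    while True:
        ok, frame = video.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if points is None:
            points = detect_target(gray)        # detection state
        else:                                   # incremental tracking state
            points, status, _ = cv2.calcOpticalFlowPyrLK(
                prev_gray, gray, points, None)
            points = points[status.flatten() == 1].reshape(-1, 1, 2)
            if len(points) < min_points:        # target lost: re-detect
                points = None
        prev_gray = gray
```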
17. Marker vs. Natural Feature Tracking
• Marker tracking
• Usually requires no database to be stored
• Markers can be an eye-catcher
• Tracking is less demanding
• The environment must be instrumented
• Markers usually work only when fully in view
• Natural feature tracking
• A database of keypoints must be stored/downloaded
• Natural feature targets might catch the attention less
• Natural feature targets are potentially everywhere
• Natural feature targets work also if partially in view
18. Model-Based Tracking
• Tracking from 3D object shape
• Example: OpenTL - www.opentl.org
• General purpose library for model based visual tracking
19. Tracking from an Unknown Environment
• What to do when you don’t know any features?
• Very important problem in mobile robotics - Where am I?
• SLAM
• Simultaneously Localize And Map the environment
• Goal: to recover both camera pose and map structure
while initially knowing neither.
• Mapping:
• Building a map of the environment which the robot is in
• Localisation:
• Navigating this environment using the map while keeping
track of the robot’s relative position and orientation
20. Parallel Tracking and Mapping
[Diagram: the tracking thread passes new keyframes to the mapping thread; the mapping thread returns map updates]
• Tracking: estimates the camera pose, for every frame
• Mapping: extends and improves the map, at a slow update rate
Parallel tracking and mapping uses two concurrent threads, one for tracking and one for mapping, which run at different speeds (see the sketch below).
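A minimal sketch of the two-thread split (the function names are placeholders, not the real PTAM API):

```python
import queue
import threading

keyframes = queue.Queue()   # tracking -> mapping hand-off

def tracking_loop(frames, track_pose, needs_keyframe):
    for frame in frames:                  # fast: runs at camera frame rate
        pose = track_pose(frame)          # estimate camera pose every frame
        if needs_keyframe(frame, pose):
            keyframes.put((frame, pose))  # occasionally feed the mapper

def mapping_loop(extend_and_refine_map):
    while True:                           # slow: wakes only on new keyframes
        frame, pose = keyframes.get()     # blocks until a keyframe arrives
        extend_and_refine_map(frame, pose)

# Mapping runs concurrently in the background at its own (slower) rate:
threading.Thread(target=mapping_loop, args=(lambda f, p: None,),
                 daemon=True).start()
```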
21. Parallel Tracking and Mapping
[Diagram: the video stream delivers new frames to the fast tracking thread, which outputs the tracked local pose; the slow mapping thread returns map updates]
Simultaneous localization and mapping (SLAM) in small workspaces (Klein/Drummond, U. Cambridge)
22. Visual SLAM
• Early SLAM systems (1986 - )
• Computer vision and sensors (e.g. IMU, laser, etc.)
• One of the most important algorithms in Robotics
• Visual SLAM
• Using cameras only, such as stereo views
• MonoSLAM (single camera) developed in 2007 (Davison)
23. Combining Sensors and Vision
• Sensors
• Produce noisy output (= jittering augmentations)
• Are not sufficiently accurate (= wrongly placed augmentations)
• Give us initial information on where we are in the world and what we are looking at
• Vision
• Is more accurate (= stable and correct augmentations)
• Requires choosing the correct keypoint database to track from
• Requires registering our local coordinate frame (online-generated model) to the global one (world)
24. ARKit – Visual Inertial Odometry
• Uses both computer vision + inertial sensing
• Tracking position twice
• Computer Vision – feature tracking, 2D plane tracking
• Inertial sensing – using the phone IMU
• Output combined via Kalman filter
• Determine which output is most accurate
• Pass pose to ARKit SDK
• Each system complements the other
• Computer vision – needs visual features
• IMU - drifts over time, doesn’t need features
25. ARKit – Visual Inertial Odometry
• Slow camera
• Fast IMU
• If camera drops out IMU takes over
• Camera corrects IMU errors (a minimal fusion sketch follows)
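A hedged 1-D sketch of the fast-IMU/slow-camera combination (ARKit’s real filter is proprietary; a simple complementary filter stands in for the Kalman filter here):

```python
def fuse_step(pos, vel, accel, dt, vision_pos=None, gain=0.1):
    """One fusion step: integrate fast IMU data, correct with slow vision."""
    vel += accel * dt           # IMU path: runs every step, drifts over time
    pos += vel * dt
    if vision_pos is not None:  # camera path: arrives less often, drift-free
        pos += gain * (vision_pos - pos)   # pull the estimate toward vision
    return pos, vel
```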
26. Conclusions
• Tracking and Registration are key problems
• Registration error
• Measures against static error
• Measures against dynamic error
• AR typically requires multiple tracking technologies
• Computer vision most popular
• Research Areas:
• SLAM systems, Deformable models, Mobile outdoor tracking
30. AR Interaction
• Designing AR Systems = Interface Design
• Using different input and output technologies
• The objective is a high-quality user experience
• Ease of use and learning
• Performance and satisfaction
31. Typical Interface Design Path
1/ Prototype Demonstration
2/ Adoption of interaction techniques from other interface metaphors
3/ Development of new interface metaphors appropriate to the medium
4/ Development of formal theoretical models for predicting and modeling user actions
[Examples: Desktop WIMP, Virtual Reality, Augmented Reality]
32. Interacting with AR Content
• You can see spatially registered AR... how can you interact with it?
33. Different Types of AR Interaction
• Browsing Interfaces
• simple (conceptually!), unobtrusive
• 3D AR Interfaces
• expressive, creative, require attention
• Tangible Interfaces
• Embedded into conventional environments
• Tangible AR
• Combines TUI input + AR display
34. AR Interfaces as Data Browsers
• 2D/3D virtual objects are registered in 3D
• “VR in Real World”
• Interaction
• 2D/3D virtual viewpoint control
• Applications
• Visualization, training
35. AR Information Browsers
• Information is registered to a real-world context
• Hand-held AR displays
• Interaction
• Manipulation of a window into information space
• Applications
• Context-aware information displays
Rekimoto, et al. 1997
38. Current AR Information Browsers
• Mobile AR
• GPS + compass
• Many Applications
• Wikitude
• Yelp
• Google Maps
• …
39. Example: Google Maps AR Mode
• AR Navigation Aid
• GPS + compass, 2D/3D object placement
41. Advantages and Disadvantages
• Important class of AR interfaces
• Wearable computers
• AR simulation, training
• Limited interactivity
• Modification of virtual content is difficult
Rekimoto, et al. 1997
42. 3D AR Interfaces
• Virtual objects displayed in 3D physical space and manipulated
• HMDs and 6DOF head-tracking
• 6DOF hand trackers for input
• Interaction
• Viewpoint control
• Traditional 3D user interface interaction: manipulation, selection, etc.
Kiyokawa, et al. 2000
46. Advantages and Disadvantages
• Important class of AR interfaces
• Entertainment, design, training
• Advantages
• User can interact with 3D virtual objects anywhere in space
• Natural, familiar interaction
• Disadvantages
• Usually no tactile feedback
• User has to use different devices for virtual and physical objects
Oshima, et al. 2000
47. 3. Augmented Surfaces and Tangible Interfaces
• Basic principles
• Virtual images are projected on a surface
• Physical objects are used as controls for virtual objects
• Support for collaboration
Wellner, P. (1993). Interacting with paper on the
DigitalDesk. Communications of the ACM, 36(7), 87-96.
48. Augmented Surfaces
• Rekimoto, et al. 1999
• Front projection
• Marker-based tracking
• Multiple projection surfaces
• Object interaction
Rekimoto, J., & Saitoh, M. (1999, May). Augmented
surfaces: a spatially continuous work space for hybrid
computing environments. In Proceedings of the SIGCHI
conference on Human Factors in Computing
Systems (pp. 378-385).
56. i/O Brush (Ryokai, Marti, Ishii) - 2004
Ryokai, K., Marti, S., & Ishii, H. (2004, April). I/O brush: drawing with everyday objects as ink.
In Proceedings of the SIGCHI conference on Human factors in computing systems (pp. 303-310).
58. Many Other Examples
• Triangles (Gorbet 1998)
• Triangle-based storytelling
• ActiveCube (Kitamura 2000-)
• Cubes with sensors
• Reactable (2007-)
• Cube-based music interface
59. Lessons from Tangible Interfaces
• Physical objects make us smart
• Norman’s “Things that Make Us Smart”
• encode affordances, constraints
• Objects aid collaboration
• establish shared meaning
• Objects increase understanding
• serve as cognitive artifacts
60. But There are TUI Limitations
• Difficult to change object properties
• can’t tell state of digital data
• Limited display capabilities
• projection screen = 2D
• dependent on physical display surface
• Separation between object and display
• ARgroove – Interact on table, look at screen
61. Advantages and Disadvantages
• Advantages
• Natural – the user’s hands are used for interacting with both virtual and real objects
• No need for special-purpose input devices
• Disadvantages
• Interaction is limited to the 2D surface
• Full 3D interaction and manipulation is difficult
62. Orthogonal Nature of Interfaces
• Spatial gap
• 3D AR interfaces: No – interaction is everywhere
• Tangible interfaces: Yes – interaction is only on 2D surfaces
• Interaction gap
• 3D AR interfaces: Yes – separate devices for physical and virtual objects
• Tangible interfaces: No – same devices for physical and virtual objects
64. 4. Tangible AR: Back to the Real World
• AR overcomes the display limitations of TUIs
• enhance display possibilities
• merge task/display space
• provide public and private views
• TUI + AR = Tangible AR
• Apply TUI methods to AR interface design
Billinghurst, M., Kato, H., & Poupyrev, I. (2008). Tangible augmented reality. ACM Siggraph Asia, 7(2), 1-10.
65. Space vs. Time - Multiplexed
• Space-multiplexed
• Many devices, each with one function
• Quicker to use, more intuitive, but causes clutter
• e.g. a real toolbox
• Time-multiplexed
• One device with many functions
• Space efficient
• e.g. a mouse
66. Tangible AR: Tiles (Space Multiplexed)
• Tiles semantics
• data tiles
• operation tiles
• Operation on tiles
• proximity
• spatial arrangements
• space-multiplexed
Poupyrev, I., Tan, D. S., Billinghurst, M., Kato, H., Regenbrecht, H., & Tetsutani, N. (2001,
July). Tiles: A Mixed Reality Authoring Interface. In Interact (Vol. 1, pp. 334-341).
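A minimal sketch of the proximity semantics, assuming each tile exposes a tracked position and operation tiles expose an apply() method (both are illustrative, not the Tiles API):

```python
import numpy as np

def proximity_operations(data_tiles, op_tiles, threshold=0.1):
    """Fire an operation when its tile is brought near a data tile."""
    for op in op_tiles:
        for data in data_tiles:
            if np.linalg.norm(op.pos - data.pos) < threshold:  # metres
                op.apply(data)   # e.g. copy, delete, or show the data
```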
70. Tangible AR: Time-multiplexed Interaction
• Use of natural physical object manipulations to control virtual objects
• VOMAR Demo
• Catalog book:
• Turn over the page
• Paddle operation:
• Push, shake, incline, hit, scoop
Kato, H., Billinghurst, M., Poupyrev, I., Imamoto, K., & Tachibana, K. (2000, October). Virtual object manipulation on a table-top AR
environment. In Proceedings IEEE and ACM International Symposium on Augmented Reality (ISAR 2000) (pp. 111-119). IEEE.
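A hedged sketch of recognizing two of those paddle operations from tracked paddle state (the thresholds and names are illustrative; assumes a y-up world and a unit paddle normal):

```python
import numpy as np

def paddle_events(paddle_up, recent_positions, tilt_deg=60):
    """Detect 'incline' (tilt past a threshold) and 'shake' (reversals)."""
    events = []
    tilt = np.degrees(np.arccos(np.clip(paddle_up[1], -1.0, 1.0)))
    if tilt > tilt_deg:
        events.append("incline")       # tilting the paddle dumps its object
    v = np.diff(np.asarray(recent_positions), axis=0)   # frame velocities
    if len(v) > 2 and np.any((v[:-1] * v[1:]).sum(axis=1) < 0):
        events.append("shake")         # rapid direction reversal
    return events
```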
73. Advantages and Disadvantages
• Advantages
• Natural interaction with virtual and physical tools
• No need for special-purpose input devices
• Spatial interaction with virtual objects
• 3D manipulation of virtual objects anywhere in space
• Disadvantages
• Requires a head-mounted display
74. 5. Natural AR Interfaces
• Goal:
• Interact with AR content the same way we interact in the real world
• Using natural user input
• Body motion
• Gesture
• Gaze
• Speech
• Input recognition
• Natural gestures, gaze
• Multimodal input
Examples: Tinmith (2001), FingARtips (2004)
75. External Fixed Cameras
• Overhead depth sensing camera
• Capture real time hand model
• Create point cloud model
• Overlay graphics on AR view
• Perform gesture interaction
Billinghurst, M., Piumsomboon, T., & Bai, H. (2014). Hands in space: Gesture interaction with
augmented-reality interfaces. IEEE computer graphics and applications, 34(1), 77-80.
77. Head Mounted Cameras
• Attach cameras/depth sensor to HMD
• Connect to high end PC
• Computer vision capture/processing on PC
• Perform tracking/gesture recognition on PC
• Use custom tracking hardware
• Leap Motion (Structured IR)
• Intel RealSense (Stereo depth)
Examples: Meta2 (2016), Project NorthStar (2018)
81. Speech Input
• Reliable speech recognition
• Windows speech, Watson, etc.
• Indirect input with AR content
• No need for gesture
• Match with gaze/head pointing
• Look to select target
• Good for Quantitative input
• Numbers, text, etc.
• Keyword trigger
• “Select”, “Hey Cortana”, etc.
https://www.youtube.com/watch?v=eHMkOpNUtR8
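A minimal sketch of keyword-triggered speech combined with gaze selection (recognize and get_gazed_object are placeholders for a real recognizer and eye/head tracker):

```python
TRIGGER = "hey cortana"

def speech_loop(recognize, get_gazed_object):
    """Act on spoken commands only after the trigger keyword is heard."""
    armed = False
    for utterance in recognize():        # stream of transcribed phrases
        text = utterance.lower().strip()
        if text == TRIGGER:
            armed = True                 # next phrase is treated as a command
        elif armed:
            if text == "select":
                print("selected", get_gazed_object())  # speech + gaze
            armed = False
```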
82. Eye Tracking Interfaces
• Use IR light to find gaze direction
• IR sources + cameras in HMD
• Support implicit input
• Users always look before they interact
• Natural pointing input
• Multimodal Input
• Combine with gesture/speech
[Figure: HoloLens 2 eye tracking – IR light sources and cameras, IR view, processed image]
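A minimal sketch of gaze-based pointing (the “look to select” idea above): pick the object whose bounding sphere the gaze ray hits first; the gaze origin and direction are assumed to come from the eye tracker:

```python
import numpy as np

def gaze_pick(origin, direction, objects):
    """objects: list of (name, center, radius) bounding spheres."""
    d = direction / np.linalg.norm(direction)
    best, best_t = None, np.inf
    for name, center, radius in objects:
        oc = center - origin
        t = oc @ d                                 # closest approach on ray
        if 0 < t < best_t and np.linalg.norm(oc - t * d) < radius:
            best, best_t = name, t
    return best                                    # nearest gazed object
```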
84. Evolution of AR Interfaces
(Interfaces ordered by increasing expressiveness and intuitiveness)
• Browsing: simple input, viewpoint control
• 3D AR: 3D UI, dedicated controllers, custom devices
• Tangible UI: augmented surfaces, object interaction, familiar controllers, indirect interaction
• Tangible AR: tangible input, AR overlay, direct interaction
• Natural AR: freehand gesture, speech, gaze
86. Interaction Design
“Designing interactive products to support
people in their everyday and working lives”
Preece, J. (2002). Interaction Design.
• Design of User Experience with Technology
87. Bill Verplank on Interaction Design
https://www.youtube.com/watch?v=Gk6XAmALOWI
88. Interaction Design involves answering three questions:
• What do you do? – How do you affect the world?
• What do you feel? – What do you sense of the world?
• What do you know? – What do you learn?
Bill Verplank
89. Typical Interaction Design Cycle
Develop alternative prototypes/concepts and compare them, and iterate, iterate, iterate...
101. Tom Chi’s Prototyping Rules
1. Find the quickest path to experience
2. Doing is the best kind of thinking
3. Use materials that move at the speed of thought to maximize your rate of learning
102. How can we quickly prototype XR experiences with little or no coding?