A lecture on evaluating AR interfaces, from the graduate course on Augmented Reality, taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury.
7. Why Evaluate AR Applications?
To test and compare interfaces, new technologies,
interaction techniques
Test Usability (learnability, efficiency, satisfaction,...)
Get user feedback
Refine interface design
Better understand your end users
...
8. Survey of AR Papers
Edward Swan (2005)
Surveyed major conferences/journals (1992-2004)
- Presence, ISMAR, ISWC, IEEE VR
Summary
1104 total papers
266 AR papers
38 AR HCI papers (Interaction)
21 AR user studies
Only 21 of 266 AR papers had a formal user study
Less than 8% of all AR papers
10. HIT Lab NZ Usability Survey
A Survey of Evaluation Techniques Used in
Augmented Reality Studies
Andreas Dünser, Raphaël Grasset, Mark Billinghurst
Reviewed publications from 1993 to 2007
Extracted 6071 papers which mentioned "Augmented Reality"
Searched to find 165 AR papers with User
Studies
11. [Chart: number of "Augmented Reality" papers per year, 1992-2007, broken down by source database: ACM Digital Library, SpringerLink, IEEE Xplore Journals, ScienceDirect, SPIE Digital Library, InformaWorld, MIT Press Journals, Highwire, Blackwell Synergy, Mary Ann Liebert, Wiley Interscience, Sage Journals Online, Emerald Insight, Oxford Journals, Cambridge Journals Online, ASCE Publications, JSTOR, Karger, WorldSciNet, BioMed Central, ASME, Annual Reviews, Nature Online, MathSciNet, NRC Research Press, AdisOnline, APS Journals (PROLA), Royal Society Publishing]
13. Types of User Studies
Types of AR user studies
Perception
User Performance
Collaboration
Usability of Complete Systems
15. Types of Experimental Measures Used
Types of Experimental Measures
Objective measures
Subjective measures
Qualitative analysis
Usability evaluation techniques
Informal evaluations
17. Summary
Over last 10 years
Most user studies focused on user performance
Fewest user studies on collaboration
Objective performance measures most used
Qualitative and usability measures least used
19. What is evaluation?
Evaluation is concerned with gathering
data about the usability of a design or
product by a specified group of users for a
particular activity within a specified
environment or work context
20. Evaluation
Goal: Measure goodness of the application design
Two types:
Formative evaluation performed at different stages of
development to check that the product meets users’ needs.
Summative evaluation assesses the quality of a finished
product.
Focusing on Formative Evaluation
21. When to evaluate?
Once the application has been developed
pros : rapid development, small evaluation cost
cons : rectifying problems requires redesign & reimplementation
(design → implementation → evaluation → redesign & reimplementation)
During design and development
pros : find and rectify problems early
cons : higher evaluation cost, longer development
(design and implementation interleaved with evaluation)
23. Quick and dirty
‘quick & dirty’ evaluation: informal feedback from
users or consultants to confirm that their ideas are in
line with users' needs and are liked.
Quick & dirty evaluations are done at any time.
Emphasis is on fast input to the design process rather
than carefully documented findings.
24. Usability Testing
Recording typical users’ performance on typical tasks in
controlled settings. Field observations may be used.
As the users perform these tasks they are watched & recorded
on video & their inputs are logged.
This data is used to calculate performance times, errors & help
explain why the users did what they did.
User satisfaction questionnaires & interviews are used to elicit
users’ opinions.
25. Laboratory-based Studies
Laboratory-based studies
can be used for evaluating the design or the
implemented system
are carried out in an interruption-free usability lab
can accurately record some work situations
some studies are only possible in a lab environment
some tasks can be adequately performed in a lab
are useful for comparing different designs in a
controlled context
27. Field Studies
Field studies are done in natural settings
The aim is to understand what users do naturally and
how technology impacts them.
In product design, field studies can be used to:
- identify opportunities for new technology
- determine design requirements
- decide how to introduce new technology
- evaluate technology in use.
28. Predictive Evaluation
Experts apply their knowledge of typical users,
guided by heuristics, to predict usability problems.
Can involve theoretically based models.
A key feature of predictive evaluation is that real
end users need not be present
Relatively quick and inexpensive
29. Characteristics of Approaches
                | Usability testing | Field studies | Predictive
Users           | do task           | natural       | not involved
Location        | controlled        | natural       | anywhere
When            | prototype         | early         | prototype
Data            | quantitative      | qualitative   | problems
Feedback        | measures & errors | descriptions  | problems
Type            | applied           | naturalistic  | expert
30. Evaluation Approaches and Methods
Method         | Usability testing | Field studies | Predictive
Observing      | x                 | x             |
Asking users   | x                 | x             |
Asking experts | x                 |               | x
Testing        | x                 |               |
Modeling       |                   |               | x
31. DECIDE:
A framework to guide evaluation
- Determine the goals the evaluation addresses.
- Explore the specific questions to be answered.
- Choose the evaluation paradigm and techniques.
- Identify the practical issues.
- Decide how to deal with the ethical issues.
- Evaluate, interpret and present the data.
32. DECIDE Framework
Determine Goals:
What are the high-level goals of the evaluation?
Who wants the evaluation and why?
Explore the Questions:
Create well-defined, relevant questions
Choose the Evaluation Paradigm
Influences the techniques used, how data is analyzed
Identify Practical Issues
How to select users, stay on budget & schedule
How to find evaluators, select equipment
33. DECIDE Framework
Decide on Ethical Issues
Informed consent form
Participants have a right to:
- know the goals of the study and what will happen to the findings
- privacy of personal information
Evaluate, Interpret and Present Data
- Reliability: can the study be replicated?
- Validity: is it measuring what you thought?
- Biases: is the process creating biases?
- Scope: can the findings be generalized?
- Ecological validity: is the environment influencing the results?
35. Pilot Studies
A small trial run of the main study.
Can identify majority of issues with interface design
Pilot studies check:
- that the evaluation plan is viable
- you can conduct the procedure
- that interview scripts, questionnaires, experiments, etc. work
appropriately
Iron out problems before doing the main study.
36. Controlled experiments
The designer of a controlled experiment should carefully consider:
proposed hypothesis
selected subjects
measured variables
experimental methods
data collection
data analysis
37. Variables
Experiments manipulate and measure variables under
controlled conditions
There are two types of variables
independent: variables that are manipulated to create different
experimental conditions
- e.g. number of items in menus, colour of the icons
dependent: variables that are measured to find out the effects of
changing the independent variables
- e.g. speed of menu selection, speed of locating icons
Test Conditions
The levels, values, or settings for an independent variable
Example
- test conditions: HMD, Handheld device 1, Handheld device 2
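A minimal Python sketch (illustrative names only, not from the lecture) that encodes this example design as data, making explicit what is manipulated versus what is measured:

```python
# Hypothetical example of an experiment design, mirroring the slide's example.
experiment = {
    # Independent variable: manipulated to create the test conditions
    "independent": {"display": ["HMD", "Handheld device 1", "Handheld device 2"]},
    # Dependent variables: measured to observe the effect of the manipulation
    "dependent": ["selection_time_s", "error_count"],
    # Control variables: held constant across all conditions
    "controls": {"room_lighting": "fixed", "task_set": "identical"},
}

for condition in experiment["independent"]["display"]:
    print(f"Condition {condition}: measure {', '.join(experiment['dependent'])}")
```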
38. “Other” Variables
Control variables
e.g. room light, noise…
if controlled => less external validity
Random variables (not controlled)
e.g. fatigue
more influence of random variable => less internal validity
Confounding variables
e.g. practice, previous experience
39. Hypothesis
A hypothesis is a prediction of the outcome
what will happen to the dependent variables when the
independent variables are changed
to show that the prediction is right:
- null hypothesis (H0 ): the dependent variables don't change when the
independent variables are changed
- rejecting the null hypothesis (H0 ) supports the prediction
40. Experimental methods
It is important to select the right experimental
method so that the results of the experiment
can be generalized
There are mainly two experimental methods
between-groups: each subject is assigned to one experimental
condition
within-groups: each subject performs under all
the different conditions
41. Experimental methods
[Diagram comparing the two methods:]
Between-groups: subjects are randomly assigned to one condition (Condition 1, 2 or 3), perform the experimental tasks only in that condition, and each group's data feeds the statistical data analysis.
Within-groups: subjects are randomly assigned an order of conditions, perform the experimental tasks in every condition, and the repeated-measures data feeds the statistical data analysis.
42. Within vs. Between Subjects
between subjects design
each participant is tested on only one level/condition
a separate group of participants is used for each condition
- one group uses HMD, the other group uses Handheld device
within subjects design
participant is tested on each level/condition
- e.g. participants use Handheld device and HMD
repeated measurement
43. Between Subjects
Sometimes a factor must be between subjects
e.g. gender, age, experience
Between subjects advantage:
avoids interference effects (e.g. practice / learning effect)
Between subjects disadvantage:
Increased variability = need more subjects
Important: randomised assignment to conditions
44. Within Subjects
Sometimes a factor must be within subjects
e.g. measuring learning effects
Within subjects advantages
fewer participants needed (all participants in all conditions)
differences (variability) between subjects are the same across
test conditions
Counterbalance order of presenting conditions
A => B => C B => C => A C => A => B
The order is best governed by a Latin Square
45. Latin Square Design
each condition occurs once in each row and column
Note: In a balanced Latin Square each condition both
precedes and follows each other condition an equal
number of times
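A small Python sketch (not from the lecture) of how such a balanced ordering of conditions can be generated; the condition names in the usage example are illustrative:

```python
def balanced_latin_square(n):
    """Condition orders (one row per participant offset) for n conditions.

    Each condition appears once per row and column; for even n each condition
    also immediately precedes and follows every other condition equally often.
    For odd n the reversed rows are appended (giving 2n rows).
    """
    rows = []
    for r in range(n):
        row, low, high = [], 0, 0
        for i in range(n):
            if i % 2 == 0:
                row.append((r + low) % n)        # next "low" condition
                low += 1
            else:
                high += 1
                row.append((r + n - high) % n)   # next "high" condition
        rows.append(row)
    if n % 2 == 1:                               # odd n: add mirrored orders
        rows += [list(reversed(row)) for row in rows]
    return rows

conditions = ["HMD", "Handheld 1", "Handheld 2", "Desktop"]  # illustrative
for p, order in enumerate(balanced_latin_square(len(conditions))):
    print(f"Participant {p + 1}: " + " -> ".join(conditions[i] for i in order))
```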
46. Subjects
The choice of subjects is critical to the validity of the
results of an experiment
The subject group should be representative of the
expected user population
In selecting the subjects it is important to consider
things such as their
age group, education, skills, culture
How does the sample influence the results?
Report the selection criteria and give relevant
demographic information in your publication
47. Subjects
How many participants?
How big is the effect you want to measure?
- large effects can be detected with smaller samples
- e.g. small n needed to discriminate speed between turtles and rabbits
The more participants the “smoother” the data
- Central Limit Theorem - as n increases (n>30) the sample mean
approaches a normal distribution
- extreme data has less influence (e.g. one sleepy participant does not
mess up the results that much)
for quantitative analysis: rule of thumb is a MINIMUM of
15-20 participants per group/cell
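As a sketch of how the effect-size/sample-size trade-off can be made concrete, an a-priori sample-size estimate for a two-group (between-subjects) t-test using statsmodels; the effect sizes and the 80% power target are illustrative assumptions, not lecture material:

```python
# Minimal power-analysis sketch: larger effects need fewer participants per group.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):                     # Cohen's d: small, medium, large
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d}: about {n_per_group:.0f} participants per group")
```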
48. Data Collection and Analysis
The choice of a method is dependent on the type of
data that needs to be collected
In order to test a hypothesis the data has to be
analysed using a statistical method
The choice of a statistical method depends on
the type of collected data
All the decisions about an experiment should be
made before it is carried out
49. Observe and Measure
Observations are gathered…
manually (human observers)
automatically (computers, software, cameras, sensors, etc.)
A measurement is a recorded observation
Objective metrics
Subjective metrics
50. Typical objective metrics
task completion time
errors (number, percent,…)
percent of task completed
ratio of successes to failures
number of repetitions
number of commands used
number of failed commands
physiological data (heart rate,…)
…
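A minimal sketch of logging two of these metrics automatically; the TaskLogger class and CSV layout are hypothetical, not part of the lecture material:

```python
# Records one row per trial: participant, condition, completion time, error count.
import csv
import time

class TaskLogger:
    def __init__(self, path="trials.csv"):
        self._file = open(path, "w", newline="")
        self._writer = csv.writer(self._file)
        self._writer.writerow(["participant", "condition", "time_s", "errors"])

    def run_trial(self, participant, condition, task_fn):
        start = time.perf_counter()
        errors = task_fn()                          # task_fn returns an error count
        elapsed = time.perf_counter() - start
        self._writer.writerow([participant, condition, round(elapsed, 3), errors])

    def close(self):
        self._file.close()
```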
52. Data Types
Subjective
Subjective survey
- Likert Scale, condition rankings
- e.g. "How easy was the task?"  1 2 3 4 5  (1 = Not very easy, 5 = Very easy)
Observations
- Think Aloud
Interview responses
Objective
Performance measures
- Time, accuracy, errors
Process measures
- Video/audio analysis
53. Experimental Measures
Measure | What does it tell us? | How is it measured?
Timings | Performance | Via a stopwatch, or automatically by the device.
Errors | Performance; particular sticking points in a task | By success in completing the task correctly; through experimenter observation, examining the route walked.
Perceived workload | Effort invested; user satisfaction | Through NASA TLX scales and other questionnaires.
Distance traveled and route taken | Depending on the application, these can be used to pinpoint errors and to indicate performance | Using a pedometer, GPS or other location-sensing system; by experimenter observation.
Percentage preferred walking speed | Performance | By finding average walking speed, which is compared with normal walking speed.
Comfort | User satisfaction; device acceptability | Comfort Rating Scale and other questionnaires.
User comments and preferences | User satisfaction and preferences; particular sticking points in a task | Through questionnaires, interviews and think-alouds.
Experimenter observations | Different aspects, depending on the experimenter and on the observations | Through observation and note-taking.
54. Statistical Analysis
Once data is collected statistics can be used for analysis
Typical Statistical Techniques
Comparing between two results
- Unpaired T-Test (for between subjects – assumes normal distribution, interval
scale, homogeneity of variances)
- Paired T-Test (for within subjects – assumes normal distribution, etc.)
- Mann–Whitney U-test (between subjects – if assumptions are not met)
Comparing between > two results
- Analysis of Variance – ANOVA
- Followed by post-hoc analysis – Bonferroni adjustment
- Kruskal–Wallis (does not assume normal distribution)
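A sketch (not from the lecture) of the tests named above, run with SciPy on made-up completion times (seconds) for three interface conditions:

```python
from scipy import stats

hmd       = [12.1, 10.4, 11.8, 13.0, 12.5, 11.1]
handheld1 = [14.2, 13.8, 15.1, 14.9, 13.5, 14.4]
handheld2 = [13.0, 12.2, 13.9, 14.1, 12.8, 13.3]

t, p = stats.ttest_ind(hmd, handheld1)        # unpaired t-test (between subjects)
t_r, p_r = stats.ttest_rel(hmd, handheld1)    # paired t-test (within subjects)
u, p_u = stats.mannwhitneyu(hmd, handheld1)   # assumptions not met, between subjects

f, p_a = stats.f_oneway(hmd, handheld1, handheld2)  # one-way ANOVA (> 2 conditions)
h, p_k = stats.kruskal(hmd, handheld1, handheld2)   # non-parametric alternative

# Post-hoc pairwise t-tests would then be Bonferroni-adjusted, e.g. by
# multiplying each pairwise p-value by the number of comparisons made.
print(f"ANOVA: F = {f:.2f}, p = {p_a:.4f}")
```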
55. Running the study
Offload your Brain!
Write down instructions
prepare checklists
create templates
print and pitch important information
Try and find an assistant
Print questionnaires and other
documents the day before
Rehearse procedures.
Bring your lunch – don't forget to eat (– 4 kg in 2 weeks)
56. Running the study
Treat the participants nicely
Prepare candy and drinks and make them feel good.
Take the role of a friendly waiter:
Always stay in background but offer assistance if needed.
Take notes, document oddities.
Nothing is as bad as lost data!!
AVOID AVOID AVOID
57. Running the study
Take many photos of your setup in action.
Prepare consent forms if you want to use pictures
for publications.
59. Field Studies
Field studies are done in natural settings.
“in the wild” is a term for prototypes being used
freely in natural settings.
Aim to understand what users do naturally and how
technology impacts them.
Field studies are used in product design to:
- identify opportunities for new technology;
- determine design requirements;
- decide how best to introduce new technology;
- evaluate technology in use.
Source: www.id-book.com
60. Observation
Direct observation in the field
Structuring frameworks
Degree of participation (insider or outsider)
Ethnography
Direct observation in controlled environments
Indirect observation: tracking users’ activities
Diaries
Interaction logging
61. Ethnography
• Ethnography is a philosophy with a set of techniques that
include participant observation and interviews
• Ethnographers immerse themselves in the culture studied
• Need cooperation of people being studied
• A researcher’s degree of participation can vary along a scale
from ‘outside’ to ‘inside’
• Analyzing video and data logs can be time-consuming
• Can use continuous data analysis
• Collections of comments, incidents and artifacts are made
62. Direct observation in a controlled setting
Think-aloud technique
Indirect observation
Diaries
Interaction logs
Cultural probes
63. Structuring frameworks to guide observation
- The person. Who?
- The place. Where?
- The thing. What?
The Goetz and LeCompte (1984) framework:
- Who is present?
- What is their role?
- What is happening?
- Where is it happening?
- Why is it happening?
- How is the activity organized?
65. Predictive Models
Provide a way of evaluating products or designs
without directly involving users.
Less expensive than user testing.
Usefulness limited to systems with predictable tasks
e.g., telephone answering systems, mobiles, etc.
Based on expert error-free behavior.
66. Fitts’ Law (Fitts, 1954)
Fitts’ Law predicts that the time to point at an object
using a device is a function of the distance from the target
object and the object’s size.
The further away and the smaller the object, the longer
the time to locate it and point to it.
67. GOMS Model
Goals - the state the user wants to achieve, e.g., find a
website.
Operators - the cognitive processes and physical actions
needed to attain the goals
E.g., moving mouse to select icon
Methods - the procedures for accomplishing the goals, e.g.,
drag mouse over icon, click on button.
Selection rules - decide which method to select when there is
more than one.
68. GOMS Response Times (Card et al., 1983)
Operator | Description | Time (sec)
K | Pressing a single key or button |
  |   Average skilled typist (55 wpm) | 0.22
  |   Average non-skilled typist (40 wpm) | 0.28
  |   Pressing shift or control key | 0.08
  |   Typist unfamiliar with the keyboard | 1.20
P | Pointing with a mouse or other device on a display to select an object. This value is derived from Fitts' Law, which is discussed below. | 0.40
P1 | Clicking the mouse or similar device | 0.20
H | Bring 'home' hands on the keyboard or other device | 0.40
M | Mentally prepare/respond | 1.35
R(t) | The response time, counted only if it causes the user to wait. | t
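A sketch of a keystroke-level (KLM) prediction built from the operator times above; the task breakdown is an illustrative example, not from the slides:

```python
# Sum the operator times above to predict expert, error-free task time.
OPERATOR_TIME = {
    "K": 0.28,   # press a key (average non-skilled typist)
    "P": 0.40,   # point with the mouse (from Fitts' Law)
    "P1": 0.20,  # click the mouse
    "H": 0.40,   # home hands on keyboard or mouse
    "M": 1.35,   # mentally prepare/respond
}

# Example: home hand on mouse, think, point at an icon, click it
task = ["H", "M", "P", "P1"]
predicted_time = sum(OPERATOR_TIME[op] for op in task)
print(f"Predicted expert, error-free task time: {predicted_time:.2f} s")  # 2.35 s
```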
69. Expert Inspections
Several kinds
Experts use their knowledge of users and technology to
review application usability.
Expert critiques can be formal or informal reports.
Heuristic evaluation is a review guided by a set of heuristics
E.g.: Visibility of system status
Jacob Nielsen's heuristics (1990s)
Walkthroughs involve stepping through a pre-planned
scenario noting potential problems
E.g. load AR model, scale it to twice the size, add new model, etc.
70. Nielsen’s heuristics
Visibility of system status.
Match between system and real world.
User control and freedom.
Consistency and standards.
Error prevention.
Recognition rather than recall.
Flexibility and efficiency of use.
Aesthetic and minimalist design.
Help users recognize, diagnose, and recover from errors.
Help and documentation.
71. Three Stages for Doing Heuristic Evaluation
1/ Briefing session to tell experts what to do.
2/ Evaluation period of 1-2 hours in which:
Each expert works separately;
Take one pass to get a feel for the product;
Take a second pass to focus on specific features.
3/ Debriefing session in which experts work together to
prioritize problems.
73. Advantages and Problems
Few ethical and practical issues to consider because users
not involved.
Can be difficult and expensive to find experts.
Best experts have knowledge of application domain and
users.
Biggest problems:
Important problems may get missed;
Many trivial problems are often identified;
Experts have biases.
75. Types of AR Experiments
Perception
How is virtual content perceived ?
What perceptual cues are most important ?
Interaction
How can users interact with virtual content ?
Which interaction techniques are most efficient ?
Collaboration
How is collaboration in an AR interface different ?
Which collaborative cues can be conveyed best ?
76. Perception
Central goal of AR systems is to fool the human perceptual
system
Display Modes
Direct View
Stereo Video
Stereo graphics
Multi-modal display
Different objects with different display modes
Potential for depth cue conflict
77. Perceptual User Studies
Depth / Distance Studies
Estimate distance to object
Judge relative proximity
Object localization
Match physical and virtual object positions
Difficulties
Precise alignment / calibration of displays
Lag in head tracking (use static images)
80. Possible solutions
Overview + Detail
spatial separation; two views
Focus + Context
merges both views into one view
Zooming
temporal separation
81. Zooming Views
TU Graz – HIT Lab NZ collaboration
Zooming panorama
Zooming Map
82. Zooming AR Interfaces
Context Compass, Zooming Panorama, Zooming Map
Interface Types
Compass (C)
Compass + Zooming Panorama (CP)
Compass + Zooming Map (CM)
Compass, Zooming Panorama, Zooming Map (CPM)
83. Experiment Evaluation
20 subjects (10 M/ 10 F)
Café finding task
Task 1: Find particular café named “Alpha”
Task 2: Find closest café
Experiment measures
Time to complete task
Angular distance panned around
Subjective survey feedback
86. Results
Compass good for search, but not comparison
Zooming (P or M) aids comparison
Information has significant effect
Compass requires more panning
User felt compass alone wasn’t useful
87. Interaction Studies
Stages of Interface Development
• Prototype Demonstration
• Adoption of Interaction techniques from other interface
metaphors
• Development of new interface metaphors appropriate to
the medium
• Development of formal theoretical models for predicting
and modeling user interactions
88. Fitts' Law (Fitts, 1954)
Relates Movement Time to Index of Difficulty
MT = a + b log2(2A/W)
where log2(2A/W) = ID
Robust under most circumstances
object tracking, tapping tasks, movement tasks
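A small sketch of the formula above; the constants a and b are illustrative values that would normally be fitted to measured movement-time data:

```python
import math

def fitts_mt(a, b, amplitude, width):
    """MT = a + b * log2(2A / W); amplitude A and target width W share the same unit."""
    index_of_difficulty = math.log2(2 * amplitude / width)   # ID, in bits
    return a + b * index_of_difficulty

# Example: a = 0.1 s, b = 0.15 s/bit, 20 mm wide target at 160 mm distance (ID = 4 bits)
print(f"MT = {fitts_mt(0.1, 0.15, amplitude=160, width=20):.2f} s")
```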
89. Interaction Study - Reaching
Mason, A. et. al. (2001). Reaching Movements to Augmented and Graphic
Objects in Virtual Environments. Proc. CHI 2001.
Does Fitts' Law hold in an acquisition task?
Does Fitts' Law hold when reaching for virtual objects?
Does Fitts' Law hold when you can't see your hand?
90. Experimental Setup
Enhanced Virtual Hand Lab
Half Silvered Mirror
Shutter Glasses
OPTOTRAK optical tracker
IREDs worn on wrist, object
Four target cubes
Conditions:
Cube size, arm visibility, real/virtual objects
96. AR Navigation Study
Users navigate between Points of Interest
Three conditions
AR: Using only an AR view
2D-map: Using only a top down 2D map view
AR+2D-map: Using both an AR and 2D map view
Experiment Measures
Quantitative
- Time taken, Distance travelled
Qualitative
- Experimenter observations, Navigation behavior, Interviews
- User surveys, workload (NASA TLX)
103. User Comments
AR
“you don't know exactly where you are all of the time.”
“using AR I found it difficult to see where I was going”
Map
“you were able to get a sense of where you were”
“you are actually able to see the physical objects around you”
AR+MAP
“I used the map at the beginning to understand where the
buildings were and the AR between each point”
“You can choose a direction with AR and find the shortest way
using the map.”
105. Lessons Learned
Users adapt navigation behaviour to guide type
AR interface shows shortcuts
Map interface good for planning
Include map view in AR interface
2D exocentric, and 3D egocentric
Allow people to easily change between views
May use Map far away, AR close
Difficult to accurately show depth
108. Pilot Study
How does AR conferencing differ ?
Task
discussing images
12 pairs of subjects
Conditions
audio only (AC)
video conferencing (VC)
mixed reality conferencing (MR)
111. Presence and Communication
[Charts: Presence rating (0-100) and number of times subjects could tell when their partner was concentrating, compared across the AC, VC and MR conditions]
112. Subjective Comments
Paid more attention to pictures
Remote video provided peripheral cues
In AR condition
Difficult to see everything
Remote user distracting
Communication asymmetries
113. Face to Face Collaboration
Compare two person collaboration in:
Face to Face, AR, Projection Display
Task
Urban design logic puzzle
- Arrange 9 buildings to satisfy 10 rules in 7 minutes
Subjects
Within subjects study (counter-balanced)
12 pairs of college students
118. Interface Conditions
               | FtF                          | AR                      | Projection
User Viewpoint | Independent                  | Private                 | Public
               | Easy to change               | Independent             | Common
               |                              | Easy to change          | Difficult to change
               |                              | Limited FOV             |
Interaction    | Two handed                   | Two handed              | Mouse-based
               | Natural object manipulation  | Tangible AR techniques  | One-handed
               | Space-multiplexed            | Space-multiplexed       | Time-multiplexed
119. Hypothesis
Collaboration with AR technology will produce
behaviors that are more like natural face-to-
face collaboration than from using a screen-
based interface.
120. Metrics
Subjective
Evaluative survey after each condition
Forced-choice survey after all conditions
Post experiment interview
Objective
Communication measures
- Video transcription
121. Measured Results
Performance
AR collaboration slower than FtF + Projection
Communication
Pointing/Picking gesture behaviors same in AR as FtF
Deictic speech patterns same in AR as FtF
- Both significantly different than Projection condition
Subjective
FtF easier to work together and understand
Interaction in AR easier than Proj. and same as FtF
122. Deictic Expressions
[Chart: percentage of deictic expressions (0-30%) in the FtF, Proj and AR conditions]
Significant difference – ANOVA, F(2,33) = 5.77, P < 0.01
No difference between FtF and AR
124. Interview Comments
“AR’s biggest limit was lack of peripheral vision. The interaction was natural, it was
just difficult to see. In the projection condition you could see everything but the
interaction was tough”
Face to Face
Subjects focused on task space
- gestures easy to see, gaze difficult
Projection display
Interaction difficult (8/14)
- not mouse-like, invasion of space
AR display – “working solo together”
Lack of peripheral cues = “tunnel vision” (10/14 people)
125. Face to Face Summary
Collaboration is partly a Perceptual task
AR reduces perceptual cues -> Impacts collaboration
Tangible AR metaphor enhances ease of interaction
Users felt that AR collaboration different from FtF
But:
measured speech and gesture behaviors in the AR condition are more similar to
the FtF condition than in the Projection display
Thus we need to design AR interfaces that don’t reduce perceptual
cues, while keeping ease of interaction
126. Case Study: A Wearable Information Space
Head Stabilized Body Stabilized
An AR interface provides spatial audio and visual cues
Does a spatial interface aid performance?
–Task time / accuracy
M. Billinghurst, J. Bowskill, N. Dyer, J. Morphett (1998). An Evaluation of Wearable Information
Spaces. Proc. Virtual Reality Annual International Symposium.
127. Task Performance
Task
find target icons on 8 pages
remember information space
Conditions
A - head-stabilized pages
B - cylindrical display with trackball
C - cylindrical display with head tracking
Subjects
Within subjects (need fewer subjects)
12 subjects used
128. Experimental Measures
Objective
spatial ability (pre-test)
time to perform task
information recall
workload (NASA TLX)
Subjective Measures
Post Experiment Survey
- rank conditions (forced choice)
- Likert Scale Questions
• “How intuitive was the interface to use?”
129. Post Experiment Survey
For each of these conditions please answer:
1) How easy was it to find the target?
1 2 3 4 5 6 7
1=not very easy 7=very easy
For the head stabilised condition (A):
For the cylindrical condition with mouse input (B):
For the head tracked condition (C):
Rank all the conditions in order on a scale of one to three
1) Which condition was easiest to find target (1 = easiest, 3 = hardest)
A: B: C:
130. Results
Body Stabilization Improved Performance
search times significantly faster (One factor ANOVA)
Head Tracking Improved Information recall
no difference between trackball and stack case
Head tracking involved more physical work
131. Subjective Impressions
[Chart: mean ratings (1-5) for "Find Target" and "Enjoyable" across conditions A, B and C]
Subjects Felt Spatialized Conditions (ANOVA):
More enjoyable
Easier to find target
132. Subjective Impressions
[Chart: mean rankings (1-3) for "Easiest", "Understanding" and "Intuitive" across conditions A, B and C]
Subject Rankings (Kruskal-Wallis)
Spatialized easier to use than head stabilized
Body stabilized gave better understanding
Head tracking most intuitive
134. Key Points
• There is a need for more user evaluation of AR
experiences
• There are several evaluation approaches that can be used
• ‘quick and dirty’
• usability testing (lab studies)
• field studies
• predictive evaluation
• Studies should use multiple qualitative and quantitative
experimental measures.
136. Online Resources
Meta-site for Statistical Analysis
http://home.ubalt.edu/ntsbarsh/stat-data/Topics.htm
Online Statistical Analysis
http://www.quantitativeskills.com/sisa/
Experiment Design
http://en.wikipedia.org/wiki/Design_of_experiments
http://www.curiouscat.net/library/designofexperiments.cfm
137. Books
J. Nielsen "Usability Engineering", Academic Press, 1993.
H. Sharp, Y. Rogers, J. Preece. “Interaction Design: Beyond
Human-Computer Interaction”, John Wiley & Sons, 2007.
J. Rubin, D. Chisnell. “Handbook of Usability Testing: How to
Plan, Design, and Conduct Effective Tests”, John Wiley & Sons, 2008.
T. Tullis, B. Albert. “Measuring the User Experience:
Collecting, Analyzing, and Presenting Usability Metrics”,
Morgan Kaufmann, 2008.
A. Field, G. Hole. “How to Design and Report Experiments”,
Sage Publications Ltd, 2003