HCI has long moved beyond the evaluation setting of a single user sitting in front of a single desktop computer, yet many of our fundamental viewpoints about evaluation continue to be ruled by outdated biases derived from this legacy. We need to engage with real users in 'Living Laboratories', in which researchers either adopt or create functioning systems that are used in real settings. These new experimental platforms will enable researchers to conduct evaluations that span many users, places, times, and social factors in ways that were unimaginable before.
2. As a field, early fundamental contributions from:
– Computer scientists interested in changes in ways we interact
with information systems
– Psychologists interested in the implications of these changes
Combustible, because:
– Computer scientists wanted to create great tools, but didn’t know
how to measure impact
– Psychologists wanted to go beyond classical research of the brain
and human cognition
Example:
– an enduring interest in ‘augmenting human intellect’: V. Bush,
Licklider, Engelbart, in turn inspiring Stu Card, Allen Newell,
Alan Kay, and many others.
2/13/09 HCIC "Living Lab" 2
3. The need to establish HCI as a science
– Adopt methods from psychology
– Convenient and fits well with problems at hand
Issues around personal computing (WIMP)
– Dual purpose: understand nature of human
behavior and build up a science of HCI
techniques.
– Good Examples: Fitts’ Law, Models of Human
Memory, Cognitive and Behavioral Modeling,
Information Foraging
– Stuart K. Card, William K. English, and Betty J. Burr (1978).
Evaluation of mouse, rate‐controlled isometric joystick,
step keys, and text keys for text selection on a
CRT. Ergonomics, 21(8):601–613, 1978.
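As a concrete instance of the Fitts' Law example above, here is a minimal sketch of how predicted movement time can be computed. It assumes the Shannon formulation, MT = a + b * log2(D / W + 1), which is one common variant and not necessarily the one used in the cited study. The function names and the constants a and b in the usage line are illustrative assumptions; in practice a and b are fitted per device from measured data.

```python
import math


def index_of_difficulty(distance, width):
    """Index of difficulty in bits, Shannon formulation: log2(D / W + 1).

    Assumption: this is one common variant of Fitts' Law, chosen here
    for illustration only.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    return math.log2(distance / width + 1)


def movement_time(distance, width, a, b):
    """Predicted movement time MT = a + b * ID.

    a (intercept) and b (slope) are empirical constants that have to be
    fitted from pointing data for each device; none are supplied here.
    """
    return a + b * index_of_difficulty(distance, width)


# Usage with made-up constants: a target 70 units away and 10 units wide
# has an index of difficulty of log2(70 / 10 + 1) = log2(8) bits.
difficulty_bits = index_of_difficulty(70, 10)
predicted_time = movement_time(70, 10, a=0.1, b=0.2)
```

Comparing the fitted slope b across devices is the kind of controlled, quantitative comparison that this style of laboratory evaluation supports.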
4. Beyond a user in front of computer
– Yet evaluation methods mostly stayed the same
– Perceived CHI paper template for acceptance
Many problems don’t fit the laboratory experimental
methods anymore
– Yesterday’s discussion about Large Data and HCI was largely driven by
how HCI evaluation methods need to change to fit the wild
The best examples: Trends in Social Computing and UbiComp force
us to think about new context of use
5. Old Assumptions                        New Considerations
Single display                            Multiple displays
Knowledge work                            Games, communication, social apps
Isolated worker                           Collaborative and social groups
Stationary location                       Mobile and stationary
Short task durations                      Short and long tasks, and tasks with no time boundaries
Controllable experimental conditions      Uncontrollable experimental conditions
6. Think about research that has been done on UI
animations or flashing icons.
From visual perception, we know motion in the
periphery is more noticeable than in the foveal region
[DaVinci].
7. Evaluations surrounding many HCI systems for
knowledge work focus on productivity increase, but
what about factors for adoption?
– Argument: if no productivity increase, then adoption is
irrelevant
– But the opposite argument is just as right: if no adoption, no
amount of productivity increase shown is relevant!
Academic research often focuses on productivity
improvements, increasing the perceived gulf between
the ivory tower and the trenches
An Example: Color copier studies
8. Artificial experimental setups are only capable of telling us
behaviors in constrained situations
– Ecological considerations
– Hard to generalize to new task contexts (with interruptions, other
tasks, other goals, unfocused attention, more displays)
– Hard to generalize to other tools, apps
– Impossible to answer questions about aggregate behaviors of groups
Example problems:
– Adoption of mobile technology
iPhones in Japan, single‐handed input [PARC]
Best-selling phones in Indonesia come with a compass [Bell]
– Aggregate behavior of Wikipedia or Delicious users
Big data analysis of edit logs
9. I was a computational molecular biologist
Analogy: biologists work on model plants and
genomes in the lab, which tells us how a plant behaves in
an isolated environment under controlled conditions,
but not how it will behave in the real world.
Biologists don’t just study models in the lab, but also in the
wild.
10. Observational Studies
– Ethnography
– Social Technical Design
– Iterative Design
– Diary Studies
– Longitudinal Studies with single outcome
Problems:
– Sampling from non‐normal distributions
– Treat variations in social contexts as ‘noise’
– What about adoption?
Mixed methods are now popular as a partial remedy
– Convergent measures tell us we’re getting closer to the truth
11. – similar to Venture Business mentioned by David Millen
Conduct research on real platforms and services
– Not to replace controlled lab studies
– Expand our arsenal to cover new situations
Some principles:
– Embedded in the real world
– Ecologically valid situations
– Embrace the complexity
– Rely on big‐data‐science to extract patterns
Not first to suggest this:
– S. Carter, J. Mankoff, S. Klemmer and T. Matthews. Exiting the cleanroom: On
ecological validity and ubiquitous computing. HCI Journal, 2008
– EClass [Abowd], PlaceLab [Intille], Plasma Poster [Churchill and Nelson], Digital
Family Portrait [Rowan, Mynatt]
12. Two dimensions
– 1. Whether the system is under the control of the researcher
– 2. Whether the study is conducted in the lab or in the wild
                     System in Control                                        System Not in Control
Laboratory           (1) Build a system, study in the Lab                     (2) Adopt a system, study in the Lab
Wild (Real World)    (4) Build a system, release it, study in the Wild        (3) Adopt a system, study in the Wild
13. Traditional Approach; Numerous examples
Favored by CHI reviewers
Typical situation is the study of some interaction technique
– Pen input, gestures, perception of some visualized data, reading tasks,
mobile text input
Typical measures are quantitative in nature
– performance in time, performance in accuracy, eyetracking, learning
measures, user preferences
Issues:
– Not always ecologically valid
– Hard to take all interactions into account
– Often time‐consuming; even though we thought we could do it fast.
14. Harder to find in the literature
Often comparing against an older system as baseline
Typical case is comparison of two systems
– (one website with another, one word processor vs. another)
– Which highlighting feature works better
– Two text input techniques on a cell phone
Typical measures are similar to (1)
Issues:
– Some similar issues to (1) because it’s in lab
– System feature not in control, so not able to compare fairly, or
isolate the feature
16. Real applications in ecologically valid situations
Real findings can be applied to a running system
Impact of research is more immediate, since system is already
running
Typical case is log analytics with large subject pools
– log studies of web sites, real mobile calling usages, web search logs,
studies of Wikipedia edits.
Typical measures are stickiness, amount of activity, clustering
analysis, correlational analysis
Issues:
– Factors not in control, findings not comparable
– Factors cannot be isolated
– Reasons for failure are often just guesswork
17. Hypothesis: Conflict is what drives Wikipedia forward.
How to study this?
– Tukey paradigm
– Get a large paper, and plot the damn data!
– Downloaded all of Wikipedia and all of the revisions
– Hadoop/MapReduce, MySQL, etc.
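The "plot the data" step above can be sketched as a small map/reduce-style aggregation. This is only an illustration in plain Python, not the Hadoop/MapReduce pipeline the talk refers to. The record fields (`year`, `kind`), the category names, and the toy records are all hypothetical stand-ins for whatever a real revision dump provides.

```python
from collections import Counter, defaultdict


def map_revision(revision):
    """Map step: emit ((period, kind), 1) for one revision record."""
    return (revision["year"], revision["kind"]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (period, kind) key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def share_by_period(counts):
    """Convert raw counts into a percentage of total edits per period."""
    totals = Counter()
    for (period, _kind), n in counts.items():
        totals[period] += n
    shares = defaultdict(dict)
    for (period, kind), n in counts.items():
        shares[period][kind] = 100.0 * n / totals[period]
    return dict(shares)


# Toy records, invented purely for illustration (not real Wikipedia data).
revisions = [
    {"year": 2005, "kind": "article"},
    {"year": 2005, "kind": "article"},
    {"year": 2005, "kind": "article"},
    {"year": 2005, "kind": "maintenance"},
    {"year": 2006, "kind": "article"},
    {"year": 2006, "kind": "maintenance"},
]

counts = reduce_counts(map_revision(r) for r in revisions)
shares = share_by_period(counts)
```

The resulting per-period percentages are the kind of series one would then plot to look for patterns before committing to a specific model.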
18. [Chart: percentage of total edits (y-axis, ticks at 90%, 95%, 100%); series label: Maintenance]
Editor's Notes
First, I’m humbled that so many luminaries are willing to listen to me speaking about HCI and HCI evaluation.
Looking back on the history of Human-Computer Interaction as a field, we see fundamental contributions mainly from two groups of researchers: (1) computing scientists interested in how technology would change the way we all interact with information, and (2) psychologists (especially cognitive psychologists) interested in the implications of those changes.
With this aim, during the formation of the field, the need to establish HCI as a science pushed us to adopt methods from psychology, both because it was convenient and because the methods fit the needs.
Of course, the world has changed.
Bastardization of HCI
Rational / task
Abandonment of the notion of task/goal
Being measurable is not the same as being able to control the conditions / experiment
Outdated Evaluative Assumptions
Of course, the world has changed.
One might argue that if using an application results in no productivity increase then the fact there is adoption of the application is irrelevant.
Re-thinking Evaluations
Both trends have required re-thinking our evaluation methodologies.
A proposal for evaluations using 'Living Laboratories'
The Augmented Social Cognition group has been a proponent of the idea of the 'Living Laboratory' within PARC.
Looking at two different dimensions in which HCI researchers could conduct evaluations, one dimension is whether the system is under the control of the researcher or not.
As for whether the glass is half-empty or half-full: I think we’re still filling the glass, as we have so far really just focused on (1) and (2).