Summer PIT 2019
Validation and Mechanism
exploring the limits of evaluation
Alan Dix
http://alandix.com/academic/talks/PIT-2019-validation-and-mechanism/
Tiree
Tiree Tech Wave
3-7 October
Computational Foundry
Swansea University
the foundry
building
mission
community
what is evaluation?
easy and hard questions
what does it mean anyway?
easy questions
How fast do people recognise a menu option?
Is product A easier to learn than product B?
even then …
individuals or average
better for whom?
WEIRD …
WEIRD people
Henrich J, Heine S, Norenzayan A. (2010). The weirdest people in the world?
Behav Brain Sci. 2010 Jun;33(2-3):61-83; discussion 83-135.
doi:10.1017/S0140525X0999152X. Epub 2010 Jun 15
Western, Educated, Industrialized,
Rich, and Democratic
Harder questions
Subjective experience (UX, fun)
Long-term interactions
e.g. meetings
Long-term effects
e.g. education, sustainability, behaviour change
What do we mean by evaluation?
often
post-hoc empirical study/experiment
… but why?
what is it for?
why are you doing it?
exploration vs. validation
process vs. product
research
exploration: finding questions (?)
validation: answering them (✓)
explanation: finding why and how
ethnography
in-depth interviews
detailed observation
big data
experiments
large-scale survey
quantitative data
qualitative data
theoretical models
mechanism
development: design, build, test
process / product
formative: make it better
summative: does it work
purpose
Three types of evaluation
                purpose               stage
formative       improve a design      development
summative       say “this is good”    contractual/sales
investigative   gain understanding    research
user research / big changes
exploration / formative
– find any interesting issues
– stats about deciding priorities
validation / summative
– exhaustive: find all problems/issues
– verifying: is hypothesis true, does system work
– mensuration: how good, how prevalent
explanation / investigative
– matching qualitative/quantitative, small/large samples
are five users enough?
original work
Nielsen & Landauer (1993) about iterative process
not summative – not for stats!
how many?
to find enough to do in next development cycle
depends on size of project and complexity
nowadays, with cheap development, maybe n=1
but always more in next cycle
N.B. later work
on saturation
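The "how many?" question can be made concrete with the problem-discovery model usually attributed to Nielsen & Landauer (1993): the expected share of problems seen by n users is 1 - (1 - p)^n, where p is the chance that a single user exposes a given problem. What follows is a minimal sketch, not the authors' own code. The function name is hypothetical, and p = 0.31 is the commonly quoted average, used here only as an illustrative assumption.

```python
def share_of_problems_found(n_users: int, p: float = 0.31) -> float:
    """Expected fraction of problems seen at least once by n_users.

    p is the ASSUMED probability that one user exposes a given problem;
    real projects vary widely, so treat the default as illustrative only.
    """
    if n_users < 0 or not 0.0 <= p <= 1.0:
        raise ValueError("need n_users >= 0 and 0 <= p <= 1")
    return 1.0 - (1.0 - p) ** n_users


if __name__ == "__main__":
    # Diminishing returns: each extra user adds less than the one before.
    for n in (1, 3, 5, 10, 15):
        print(f"n={n:2d}  found ~ {share_of_problems_found(n):.0%}")
```

Under this model even one user exposes a useful share of the problems, which is enough to feed the next development cycle. The model says nothing about statistical claims.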
from evaluation
to validation
dealing with
harder questions
validating work
• justification
– expert opinion
– previous research
– new experiments
• evaluation
– experiments
– user studies
– peer review
your work
evaluation
• experiments
• user studies
• peer review
generative artefacts
• justification
– expert opinion
– previous research
– new experiments
• evaluation
– experiments
– user studies
– peer review
artefact
evaluation
singularity
different people
different situations
different designers
• toolkits
• devices
• interfaces
• guidelines
• methodologies
(pure) evaluation of generative artefacts
is methodologically unsound
justification vs. validation
• different disciplines
– mathematics: proof = justification
– medicine: drug trials = evaluation
• combine them:
– look for weakness in justification
– focus evaluation there
evaluation / justification
mechanism
from what happens
to how and why
mechanism
quantitative and statistical
what is true end to end
phenomena
qualitative and theoretical
why and how
mechanism
generalisation
empirical data
at best interpolate
understanding mechanism allows:
extrapolation
application in new contexts
mechanism
• reduction reconstruction
– formal hypothesis testing
+ may be qualitative too
– more scientific precision
• wholistic analytic
– field studies, ethnographies
+ ‘end to end’ experiments
– more ecological validity
example: mobile font size
early paper on fonts in mobile menus:
well conducted experiment
statistically significant results
conclusion gives best font size
but … a menu selection task includes:
1. visual search (better big fonts)
2. if not found scroll/page display (better small fonts)
3. when found touch target (better big fonts)
no single best size – the balance depends on menu length, etc.
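The three-step breakdown above can be sketched as a toy cost model. Everything numeric here is an assumption made up for illustration (screen height, row height, the three cost constants, the range of sizes); none of it comes from the paper being discussed. The only point is that once search, scrolling and touch pull in different directions, the "best" size moves with menu length.

```python
import math

SCREEN_HEIGHT_PX = 600      # assumed usable screen height
LINE_HEIGHT_FACTOR = 2.0    # assumed: one menu row is 2 x font size tall
SEARCH_COST = 0.8           # assumed cost per item scanned, shrinks with font size
PAGE_COST = 1.5             # assumed seconds per scroll/page action
TOUCH_COST = 4.0            # assumed cost of hitting the target, shrinks with font size


def selection_time(font_size: int, menu_length: int) -> float:
    """Toy worst-case time: scan every item, page to the end, then touch."""
    rows_per_screen = max(1, int(SCREEN_HEIGHT_PX // (LINE_HEIGHT_FACTOR * font_size)))
    page_turns = math.ceil(menu_length / rows_per_screen) - 1
    search = menu_length * SEARCH_COST / font_size   # 1. better with big fonts
    scroll = page_turns * PAGE_COST                  # 2. better with small fonts
    touch = TOUCH_COST / font_size                   # 3. better with big fonts
    return search + scroll + touch


def best_font_size(menu_length: int, sizes=range(8, 25)) -> int:
    """Font size with the lowest modelled time for this menu length."""
    return min(sizes, key=lambda s: selection_time(s, menu_length))


if __name__ == "__main__":
    for length in (5, 20, 60):
        print(f"menu of {length:2d} items -> best size {best_font_size(length)}")
```

In this sketch a short menu that fits on one screen favours the largest size, while a long menu favours a smaller one, so a single reported optimum only describes the menu length that was tested.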
what have you
really shown?
stats are about the measure,
but what does it measure
what have you really shown
• think about the conditions
– are there other explanations for data?
• individual or population
– small # of groups/individuals, many measurements
– sig. statistics => effect reliable for each individual
– but are individuals representative of all?
• systems vs properties
a little story …
BIG ACM conference – ‘good’ empirical paper
looking at collaborative support for a task X
three pieces of software:
A – domain specific software, synchronous
B – generic software, synchronous
C – generic software, asynchronous
[diagram: A is domain specific; B and C are generic]
experiment
sensible quality measures
reasonable nos. subjects in each condition
significant results p<0.05
domain spec. > generic
asynchronous > synchronous
conclusion: really want asynchronous domain specific
[diagram: A, B and C placed on a grid of domain specific vs. generic and sync vs. async]
what’s wrong with that?
interaction effects
gap is interesting to study
not necessarily good to implement
more important …
if you blinked at the wrong moment …
NOT independent variables
three different pieces of software
like experiment on 3 people!
say system B was just bad
[diagram: the same grid of domain specific vs. generic and sync vs. async, with ? in the empty cell]
B < A B < C
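The "system B was just bad" reading can be checked with a tiny worked example. The quality scores are invented for illustration: A and C are given the same score, so in this toy world neither property has any effect at all, and B is simply a weaker piece of software.

```python
from statistics import mean

# Hypothetical scores: A and C are equally good, B is just a bad system.
systems = {
    "A": {"domain_specific": True, "synchronous": True, "quality": 0.8},
    "B": {"domain_specific": False, "synchronous": True, "quality": 0.5},
    "C": {"domain_specific": False, "synchronous": False, "quality": 0.8},
}


def mean_quality(prop: str, value: bool) -> float:
    """Average quality over the systems that have prop == value."""
    return mean(s["quality"] for s in systems.values() if s[prop] is value)


if __name__ == "__main__":
    print("domain specific:", mean_quality("domain_specific", True))
    print("generic:        ", mean_quality("domain_specific", False))
    print("asynchronous:   ", mean_quality("synchronous", False))
    print("synchronous:    ", mean_quality("synchronous", True))
```

Both headline results appear (domain specific > generic, asynchronous > synchronous), yet they are produced entirely by one weak system sitting in the generic, synchronous cell.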
what went wrong?
borrowed psych method
… but method embodies assumptions
single simple cause, controlled environment
interaction needs ecologically valid experiment
multiple causes, open situations
what to do?
understand assumptions and modify
diversity – individual/task
good for not just good
don’t just look at average!
e.g. overall system A lower error rate than system B
but … system B better for experts
… and tasks too
e.g. PieTree
(interactive circular treemap)
exploding
Pie chart
good for finding
large things
unfolding
hierarchical
text view
good for finding
small things
more important to know
who or what
something is good for
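The earlier example (system A has the lower error rate overall, but system B is better for experts) can be sketched with made-up numbers. The group sizes and error rates below are assumptions chosen only to show the pattern; they are not data from any study.

```python
# Hypothetical participants and error rates per system.
groups = {
    "novice": {"n": 80, "error_rate": {"A": 0.10, "B": 0.20}},
    "expert": {"n": 20, "error_rate": {"A": 0.08, "B": 0.04}},
}


def overall_error_rate(system: str) -> float:
    """Error rate averaged over everyone, weighted by group size."""
    total = sum(g["n"] for g in groups.values())
    return sum(g["n"] * g["error_rate"][system] for g in groups.values()) / total


def better_system(group: str) -> str:
    """System with the lower error rate within one group."""
    rates = groups[group]["error_rate"]
    return min(rates, key=rates.get)


if __name__ == "__main__":
    for system in ("A", "B"):
        print(f"overall error rate, system {system}: {overall_error_rate(system):.3f}")
    for group in groups:
        print(f"better for {group}s: system {better_system(group)}")
```

Reporting only the overall figure would recommend A for everyone; splitting by group shows who each system is actually good for.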
types of evaluation
you’ve designed it, but is it right?
points of comparison
• measures:
– average satisfaction 3.2 on a 5 point scale
– time to complete task in range 13.2–27.6 seconds
– good or bad?
• need a point of comparison
– but what?
– self, similar system, created or real??
– think purpose ...
• what constitutes a ‘control’
– think!!
types of knowledge
• descriptive
– explaining what happened
• predictive
– saying what will happen
cause → effect
– where science often ends
• synthetic
– working out what to do to make what you want happen
effect → cause
– design and engineering
different kinds of evaluation
endless arguments
– quantitative vs. qualitative
– in the lab vs. in the wild
– experts vs. real users (vs UG students!)
really
– combine methods
e.g. quantitative – what is true & qualitative – why
– what is appropriate and possible
when does it end?
in a world of perpetual beta ...
real use is the ultimate evaluation
• logging, bug reporting, etc.
• how do people really use the product?
• are some features never used?
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Validation and mechanism: exploring the limits of evaluation

  • 1. Summer PIT 2019 Validation and Mechanism exploring the limits of evaluation Alan Dix http://alandix.com/academic/talks/PIT-2019-validation-and-mechanism/
  • 2. Tiree Tiree Tech Wave 3-7 October Computational Foundry Swansea University
  • 6. what is evaluation? easy and hard questions what does it mean anyway?
  • 7. easy questions: How fast do people recognise a menu option? Is product A easier to learn than product B? even then … individuals or average? better for whom? WEIRD …
  • 8. WEIRD people Henrich J, Heine S, Norenzayan A. (2010). The weirdest people in the world? Behav Brain Sci. 2010 Jun;33(2-3):61-83; discussion 83-135. doi:10.1017/S0140525X0999152X. Epub 2010 Jun 15 Western, Educated, Industrialized, Rich, and Democratic
  • 9. Harder questions: Subjective experience (UX, fun); Long-term interactions, e.g. meetings; Long-term effects, e.g. education, sustainability, behaviour change
  • 10. What do we mean by evaluation? often post-hoc empirical study/experiment … but why? what is it for?
  • 11. why are you doing it? exploration vs. validation process vs. product
  • 12. research exploration ?finding questions ✓ validation answering them explanation finding why and how ethnography in-depth interviews detailed observation big data experiments large-scale survey quantitative data qualitative data theoretical models mechanism
  • 14. Three types of evaluation (type: purpose; stage): formative: improve a design; development. summative: say “this is good”; contractual/sales. investigative: gain understanding; research, user research / big changes
  • 15. exploration / formative – find any interesting issues – stats about deciding priorities validation / summative – exhaustive: find all problems/issues – verifying: is hypothesis true, does system work – mensuration: how good, how prevalent explanation / investigative – matching qualitative/quantitative, small/large samples
  • 16. are five users enough? original work: Nielsen & Landauer (1993), about iterative process, not summative – not for stats! how many? enough to find what to do in the next development cycle; depends on size of project and complexity; nowadays with cheap development maybe n=1, but always more in next cycle. N.B. later work on saturation
  • 18. from evaluation to validation dealing with harder questions
  • 19. validating work: your work • justification – expert opinion – previous research – new experiments • evaluation – experiments – user studies – peer review
  • 20. generative artefacts • toolkits • devices • interfaces • guidelines • methodologies. artefact evaluation: singularity; different people, different situations, different designers. • justification – expert opinion – previous research – new experiments • evaluation – experiments – user studies – peer review. (pure) evaluation of generative artefacts is methodologically unsound
  • 21. validating your work • justification – expert opinion – previous research – new experiments • evaluation – experiments – user studies – peer review
  • 22. justification vs. validation • different disciplines – mathematics: proof = justification – medicine: drug trials = evaluation • combine them: – look for weakness in justification – focus evaluation there
  • 25. mechanism. quantitative and statistical: what is true, end-to-end phenomena. qualitative and theoretical: why and how, mechanism
  • 26. generalisation. empirical data: at best interpolate. understanding mechanism allows: extrapolation, application in new contexts
  • 27. mechanism • reduction, reconstruction – formal hypothesis testing + may be qualitative too – more scientific precision • wholistic, analytic – field studies, ethnographies + ‘end to end’ experiments – more ecological validity
  • 28. example: mobile font size early paper on fonts in mobile menus: well conducted experiment statistically significant results conclusion gives best font size but … a menu selection task includes: 1. visual search (better big fonts) 2. if not found scroll/page display (better small fonts) 3. when found touch target (better big fonts) no single best size – the balance depends on menu length, etc.
  • 31. what have you really shown? stats are about the measure, but what does it measure
  • 32. what have you really shown • think about the conditions – are there other explanations for data? • individual or population – small # of groups/individuals, many measurements – sig. statistics => effect reliable for each individual – but are individuals representative of all? • systems vs properties
  • 33. a little story … BIG ACM conference – ‘good’ empirical paper looking at collaborative support for a task X. three pieces of software: A – domain specific software, synchronous; B – generic software, synchronous; C – generic software, asynchronous
  • 34. experiment: sensible quality measures; reasonable nos. subjects in each condition; significant results p<0.05: domain spec. > generic, asynchronous > synchronous. conclusion: really want asynchronous domain specific
  • 35. what’s wrong with that? interaction effects; gap is interesting to study, not necessarily good to implement. more important … if you blinked at the wrong moment … NOT independent variables: three different pieces of software, like experiment on 3 people! say system B was just bad: then B < A and B < C
  • 36. what went wrong? borrowed psych method … but method embodies assumptions single simple cause, controlled environment interaction needs ecologically valid experiment multiple causes, open situations what to do? understand assumptions and modify
  • 39. don’t just look at average! e.g. overall system A lower error rate than system B but … system B better for experts
  • 40. … and tasks too e.g. PieTree (interactive circular treemap) exploding Pie chart good for finding large things unfolding hierarchical text view good for finding small things
  • 41. more important to know who or what something is good for
  • 43. types of evaluation you’ve designed it, but is it right?
  • 44. points of comparison • measures: – average satisfaction 3.2 on a 5 point scale – time to complete task in range 13.2–27.6 seconds – good or bad? • need a point of comparison – but what? – self, similar system, created or real?? – think purpose ... • what constitutes a ‘control’ – think!!
  • 45. types of knowledge • descriptive – explaining what happened • predictive – saying what will happen: cause → effect – where science often ends • synthetic – working out what to do to make what you want happen: effect → cause – design and engineering
  • 46. different kinds of evaluation endless arguments – quantitative vs. qualitative – in the lab vs. in the wild – experts vs. real users (vs UG students!) really – combine methods e.g. quantitative – what is true & qualitative – why – what is appropriate and possible
  • 47. when does it end? in a world of perpetual beta ... real use is the ultimate evaluation • logging, bug reporting, etc. • how do people really use the product? • are some features never used?
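
The "are five users enough?" question on slide 16 can be made concrete with the problem-discovery model usually attributed to Nielsen & Landauer (1993): the expected proportion of problems found by n users is 1 - (1 - λ)^n, where λ is the chance that a single user exposes a given problem. The sketch below is only an illustration, not part of the talk; the function names are hypothetical, and the default λ = 0.31 is the commonly quoted average from that paper, which may differ a lot between projects. In line with the slide, this is about deciding how many users to run per iteration, not about statistics.

```python
def proportion_found(n_users: int, detect_rate: float = 0.31) -> float:
    """Expected share of problems seen at least once after n_users sessions.

    detect_rate (lambda) is an assumed per-user chance of hitting any
    given problem; 0.31 is an often-quoted average, not a constant.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 <= detect_rate <= 1.0:
        raise ValueError("detect_rate must be between 0 and 1")
    return 1.0 - (1.0 - detect_rate) ** n_users


def users_needed(target: float, detect_rate: float = 0.31) -> int:
    """Smallest number of users whose expected coverage reaches target."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    if not 0.0 < detect_rate <= 1.0:
        raise ValueError("detect_rate must be in (0, 1]")
    n = 0
    while proportion_found(n, detect_rate) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Coverage grows quickly at first, then flattens out.
    for n in (1, 3, 5, 10):
        print(n, round(proportion_found(n), 3))
    # A lower detection rate needs noticeably more users per cycle.
    print(users_needed(0.8), users_needed(0.8, detect_rate=0.1))
```

Under these assumptions five users are expected to expose most, but not all, problems, which is enough to fill the next development cycle and says nothing about prevalence or significance.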