1) The document discusses the importance of establishing construct validity in serious games by ensuring the game measures what it intends to measure both theoretically and conceptually.
2) It provides examples of methods that can be used to establish construct validity such as cognitive ethnography, multi-trait multi-method analysis, and nomological networks.
3) The key is to design the game around testable hypotheses and use mixed methods like surveys, observations and institutional data both before and after playing to analyze outcomes and provide evidence of the game's impact.
1. The importance of construct
validity in designing serious games
Brock Dubbels
2. Overview of Talk
• Great ideas – we want them to get better
– Why video games?
• You mean we have to show evidence?
– The Vegas Effect
• What can be done—
– Methods, validity, and return on investment
• Great ideas pt 2 – we know how to help
4. “More than eight in ten (83%) young people have a
video game console at home, and 56% have two or
more."
--Gen M: Media in the Lives of 8-18 Year-olds (Executive Summary, p. 36)
7. The market
• Actual market in 2009
– 52 Billion
– Growth actually closer to 25%
• 2014 Projection
– 86 Billion
– Projected growth of 9.4%
• Serious games projection
– 400 to 500 Million
Price Waterhouse Cooper, 2007
8. Get on the bandwagon?
From a consumer perspective, serious games should be judged on
how they deliver learning outcomes, not what they generate in
sales in the marketplace.
9. • Games, by their very
nature, assess, measure, and evaluate.
– Can be very much like tools used in psychological
assessments and evaluations
10. Informative Assessment
• An informative assessment stresses
meaningful, timely, and continuous feedback
about learning during learning.
• It is intended to provide the problems in contexts
where the learner can learn through trial and
error with feedback based upon the criteria for
competence.
• Research findings from over 4,000 studies indicate
that informative assessment has the most
significant impact on achievement (Wiliam, 2007).
25. The Vegas Effect
Should everything that happens in games, stay in games?
It is not enough to invoke games and play.
Serious games should provide evidence that they delivered.
This should be quantifiable in performance metrics
27. Surface (face) Validity
• Games are often built on this.
– It looks like it measures what it is supposed to
measure.
– It appears to be a good project
– Does game-like delivery cut costs?
28. Construct Validity
• When we claim construct validity, we are essentially claiming that our
observed pattern—how things operate in reality—corresponds with our
theoretical pattern—how we think the world works.
– There are four main ways for assessing construct validity:
• Internal validity is related to what actually happens in a study.
• Has the independent variable really had an effect on the dependent variable?
• Or was the effect on the dependent variable caused by some other confounding
variable?
– Convergent and discriminant validity
• External validity refers to whether the findings of a study really can be generalized
beyond the present study. We can break external validity down into two types.
– Can we use the same measures in the game, and use them in the work environment?
– Will the measures we use in the work environment be effectively modeled for in-game learning
that transfers?
• Population validity - which refers to the extent to which the findings can be generalized
to other populations of people.
• Ecological validity - which refers to the extent to which the findings can be generalized
beyond the present situation.
29. Pre-Design Methods
• Elicitation – do you really need a game?
• Cognitive Ethnography
• Identification of
– quality of life and cost variables
– theoretic processes, & documentation variables for
kiosk.
• Fidelity
• Assessment, measurement, and evaluation
methods.
30. Tension in workflow
• Software Design
– Typically based upon an economic consideration.
• How will this solve a problem?
• What are the first steps in production?
– The focus is on stages of production:
• Business Partner Relations, Function, Behavior, Structure, & Non-Function (qualities).
• Research Design
– Typically based upon answering a testable question.
• How will this solve a problem?
• How do I know this?
– The focus is on method and hypothesis testing:
• Construct validity, reliability, and probability.
31. Nomological Networks
• This is an attempt to provide better assurance
of construct validity. To do this, the researcher
should provide a theoretical framework for
what is being measured, an empirical
framework for how it is to be measured, and
specification of the linkage between these two
frameworks.
– (Cronbach & Meehl, 1955)
32. Multi Trait Multi Method
Construct    Method
ADL          Kiosk
CRB          Rubric
SDT          Inventories
Campbell and Fiske (1959)
Great explanation here:
http://luna.cas.usf.edu/~mbrannic/files/pmet/mtmm1a.htm
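The convergent and discriminant checks behind a multi-trait multi-method analysis can be sketched in a few lines. This is a minimal illustration, not the analysis used in the project: it assumes that two of the constructs above (ADL and CRB) were each scored with two methods (kiosk and rubric), and every number in `scores` is invented. The helper names `pearson` and `mtmm_summary` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mtmm_summary(scores):
    """Return (mean convergent r, mean discriminant r).

    Convergent: the same trait measured by different methods.
    Discriminant: different traits, whatever the method.
    """
    convergent, discriminant = [], []
    for key_a, key_b in combinations(sorted(scores), 2):
        r = pearson(scores[key_a], scores[key_b])
        same_trait = key_a[0] == key_b[0]
        (convergent if same_trait else discriminant).append(r)
    return (sum(convergent) / len(convergent),
            sum(discriminant) / len(discriminant))


# Invented scores for six learners, keyed by (trait, method).
# Assumption: each trait is measured by both methods.
scores = {
    ("ADL", "kiosk"): [1, 2, 3, 4, 5, 6],
    ("ADL", "rubric"): [1, 3, 2, 4, 6, 5],
    ("CRB", "kiosk"): [2, 6, 1, 5, 3, 4],
    ("CRB", "rubric"): [1, 6, 2, 5, 3, 4],
}

convergent_r, discriminant_r = mtmm_summary(scores)
print(f"mean convergent r = {convergent_r:.2f}")
print(f"mean discriminant r = {discriminant_r:.2f}")
```

With these invented scores the mean convergent correlation (about 0.91) sits well above the mean discriminant correlation (0.30), which is the pattern that would count as evidence for construct validity.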
35. The state of long term care
• 55% white, 35% Black, 10% Hispanic
• Most workers are economically disadvantaged
• Low levels of educational attainment.
• Physically and emotionally demanding work, but often among the lowest paid in
the service industry.
• Viewed as an unpleasant occupation: primarily a maid service taking care of
incontinent, cognitively unaware old people.
• 45% attrition rate in first 90 days. Some reports show 100% annual turnover.
• Great shortage, and with shortages come reductions in quality of care.
• Expected growth rate of 85% with Baby Boomers retiring.
• Regulation tends to emphasize entry training, with limited attention to continued
career growth or development.
• Supervisors with “good people skills” and promotion of worker autonomy are the most
important predictors of higher job satisfaction and lower turnover rates.
» From “Who Will Care for Us?” US Dept. Health & Human Services
36. The User Story
• Functionally capable, but
not skilled.
– Soft skills
– Documentation skills
• High school education +/-1
• Like popular culture.
• A bit irreverent about
the job, but this is a coffee-break
coping mechanism
• Limited care load while
training.
• Enjoy popular culture –
soaps, drama, etc.
37. Hypothesis
• Will improved people skills and increased
worker autonomy reduce attrition through
improving the perceived quality of life?
– Will perceived quality of life (PQoL) improve:
– Increase well-being in residents and nursing assistants?
– Reduce pain management?
– Reduce catastrophic care?
– Confidence and accuracy in information gathering and
reporting?
38. Theoretical perspective
• Improve soft skills and documentation.
• Quality of Life Measures
– How these are affected through: Presence, Constructive Conversation, Active
Listening
» These are used for game mechanics and coding dialog.
• Reducing attrition, improving Perceived Quality of Life
(PQoL), and improving documentation will reduce costs and
allow for more hires, better wages
– Reduce: pain meds, attendant care, catastrophic care.
• Documentation
– Reduce elicitation from medical staff
– Improve medical staff objective knowledge on daily living skills
39. Analysis Tools
• In order to measure whether the game does what it was
designed to do:
– Analysis criteria must exist inside and outside of the game for
evaluation.
– Same underlying measures from game
• Tools from ADL, SDT, Complex Relationship Building
• Inside:
– Scoring system weighted dialog
– Story content equated to kiosk input
– Play aloud
• Outside:
– Observational scoring tool for preceptors
– Survey for self-report
– Resident survey
– Kiosk
– Care plans
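A weighted dialog scoring system of the kind listed under "Inside" might look like the sketch below. It is only an assumption about how such scoring could work: the dialog option names and the weights are invented, and the three constructs are the soft skills named in this talk (Presence, Constructive Conversation, Active Listening). `DIALOG_OPTIONS` and `score_session` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each dialog option expresses a soft skill.
# Negative weights penalize choices that work against a construct.
DIALOG_OPTIONS = {
    "ask_about_family": {"Constructive Conversation": 2, "Presence": 1},
    "sit_and_listen": {"Active Listening": 3, "Presence": 2},
    "rush_to_next_task": {"Presence": -2},
}


def score_session(choices):
    """Sum the construct weights of every dialog option a player chose."""
    totals = defaultdict(int)
    for choice in choices:
        for construct, weight in DIALOG_OPTIONS[choice].items():
            totals[construct] += weight
    return dict(totals)


# One invented play session: three dialog choices in order.
session_scores = score_session(
    ["ask_about_family", "sit_and_listen", "rush_to_next_task"]
)
print(session_scores)
```

Keeping the construct labels identical to those on the preceptor's observational rubric and the surveys is what would let in-game scores be compared with the outside measures.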
40. Four questions
1. Can I take any credit for any changes that
have happened in an individual's learning?
2. Does this have a connection to my
instructional activities?
3. Do these instructional activities equate to a
return on investment?
4. How do I know this?
41. Theory of interaction
• The central cog in Figure
6, Psychological Needs, is
modeled from Self-Determination
Theory (Deci & Ryan, 2000).
• The base measure, or bottom
cog, came from the Activities of
Daily Living (Roper, Logan, &
Tierney, 1980; 2000) and is
hypothesized to be influenced
through interpersonal relations.
• The interpersonal relations were
modeled from operationalization
of Complex Relationship Building
(Bulechek, Butcher, & Dochterman,
2008)
42. SDT
• Self-Determination Theory
– SDT is a macro theory, and it is concerned with supporting our natural or intrinsic tendencies to behave in effective and healthy ways.
– The key game play element here was its use as a larger category for scoring criteria and for providing accessible terms for training.
– In the work environment, these inventories are used to provide an opportunity to create external, environmental, and population validity.
• Basic Psychological Needs Scale
– General
• 21 items
• (Baard, Deci, & Ryan, 2004)
– Relationships
• 9 items
• (La Guardia, Ryan, Couchman, & Deci, 2000)
– Work
• 21 items
• (Deci, Ryan, Gagne, Leone, Usunov, & Kornazheva, 2001)
43. ADL
• Activities of Daily Living
– The facility had already identified 8 items for identification in their kiosk software.
– The key game play element here was modeling the facility kiosks in the game and scoring the resident interaction scenarios with how the CNAs document their observations.
– In the work environment, the kiosks are already used to collect data, and this provides an opportunity to create external, environmental, and population validity and provide ROI analysis for care plans.
• The term “activities of daily living” refers to a set of common, everyday tasks, performance of which is required for personal self-care and independent living. The most often used measure of functional ability is the Katz Activities of Daily Living Scale (Katz et al., 1963; Katz, 1983).
– Wiener, Hanley, Clark, Van Nostrand (1990, pg. 1)
44. Complex Relationship Building
• Complex Relationship Building
– Identified from Nursing Interventions Classification (NIC) (Bulechek, Butcher, & Dochterman, 2008)
– The key game play element was modeling interactions with the residents and providing an optimal path for interaction.
– Although this care giving practice is supposed to take an hour, the CNA must choose how to apportion that hour.
– In the work environment, these activities are used to provide language and action for continuous improvement. They are part of rubrics for observation, and a further opportunity to create external, environmental, and population validity.
• A nursing intervention from the Nursing Interventions Classification (NIC) defined as establishing a therapeutic relationship with a patient to promote insight and behavioral change.
• NIC identifies a 1 hour intervention.
• There are 31 activities identified.
– These activities represent the focus of the game design and scoring process.
– 3 operationalized processes were taught to mediate this:
• Constructive Conversation
• Presence
• Active Listening
45. Design Decisions
• FPH – the first person healer
• Flash / with database
– Web-host as well as installed with data upload.
• Time serves as game element
– Functional task selection vs. Interpersonal Communication
• Dialog driven
– Dialog supported by video cut scenes with voice narration
• Mini games such as room clean up
• Reward system (blue stars)
• Preceptor / Optimal Path / Debrief
• Interface with task log, resident information, clock, pause.
• Use real people’s faces as avatars
• Increase engagement through subtle but tasteful irreverence.
46. Outcome Analysis
• General Linear Model
• Quality of Life Variables
– Operationalized in soft skills and PQoL construct
• Presence, Active Listening, and Constructive Conversation
• Longitudinal study
– Pre, Game Play, Post
• Compare performance in: surveys, objective observer data, game
play, non-game play controls, self-report.
• Game play – construct sub-level scoring, i.e. number of
residents, rewards, optimal path decisions.
– Institutional data pre / post
• Compare catastrophic care, pain meds, independence, attrition
– Use game play, survey and observational tools as covariates.
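One way to picture the pre/post comparison with covariates is an ANCOVA-style linear model, post = b0 + b1 * pre + b2 * played_game, fitted by ordinary least squares. The sketch below is an illustration under stated assumptions, not the study's analysis: the records are invented and noise-free, only one covariate (the pre score) is used, and `solve_linear_system` and `fit_ols` are hypothetical helpers written with the standard library only.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Invented records: (pre score, played the game? 1/0, post score).
# Constructed so that post = 1 + 0.5 * pre + 2 * played_game exactly.
records = [
    (10, 0, 6.0), (12, 0, 7.0), (14, 0, 8.0), (16, 0, 9.0),
    (10, 1, 8.0), (12, 1, 9.0), (14, 1, 10.0), (16, 1, 11.0),
]

design = [[1.0, float(pre), float(group)] for pre, group, _ in records]
post = [p for _, _, p in records]

intercept, pre_slope, group_effect = fit_ols(design, post)
print(f"game-play effect adjusted for pre score: {group_effect:.2f}")
```

Because the invented data contain no noise, the fit recovers the built-in group effect of 2 points. Real data would also need standard errors and significance tests, and further covariates such as the survey and observational scores.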
47. Take home
• Can you pose a testable question – hypothesis?
– Tension between design process and measurement
• Needs – behavior, function, non-function, structure.
• Construct validity – are you measuring what you think you are measuring? Theoretically?
Conceptually?
• Assessments, measures, & evaluations
• Mixed Methods approaches such as cognitive ethnography can provide an
opportunity to create a nomological network.
– MTMM provides an analysis tool that can be constructed to identify
convergent and discriminant validity.
• Spend time understanding the sample population
– Beliefs, likes, skills, & abilities.
– Irreverence increases engagement, but reduces happiness of the business partner.
• Usability testing should align with construct
• Again, emphasis on validity
– Without it, there is no capability for ROI analysis
Editor's notes
We gathered from the ethnographies that many watched reality television and soap-style programming; liked comedies, popular culture, and gossip; and were irreverent. They suggested that the game include wheelchair jousting. We chose to offer the game as a soap opera, with intrigues at the residency.
Cognitive ethnography assumes that cognition is distributed through rules, roles, language, relationships and coordinated activities, and can be embodied in artifacts and objects (Dubbels, 2008).
Conceptual Space Analysis, Physical Space Analysis, Social Space Analysis
Cognitive ethnography assumes that human cognition adapts to its natural surroundings. Therefore, the role of the cognitive ethnographer is to transform observational data and interpretation into meaningful representations so that cognitive properties of the system become visible.
Ethnography often involves the researcher living in the community of study, learning the language, doing what members of the community do, and learning to see the world as it is seen by the natives in their cultural context (Fetterman, 1998).
Cognitive ethnography follows the same protocol, but its purpose is to understand cognitive process and context, examining them together and thus eliminating the false dichotomy between psychology and anthropology.