This presentation discusses planning game design for transfer and assessment. It reviews the role of play in relation to game design: play can be part of the problem because of the lack of certainty in learning transfer. Serious games are developed to deliver learning outcomes, and when there are specific learning outcomes, the game must ensure that learning that happens in games does not stay in games; the failure to transfer is described here as the Vegas Effect. A simple methodological recommendation, with examples, is provided for improving validity and reliability in the independent variable (the game intervention): establishing inter-rater reliability.
4. Play and Function
• "Biologically, its function is to reinforce the
organism’s variability in the face of
rigidifications of successful adaptation”
– (Sutton-Smith, 1997, 231).
• Play allows for a reframing of reality, and
reconsideration of context and the realm of
the possibilities.
– (Dubbels, 2010)
5. Play and Cultural Role
• Play strengthens societies by uniting
individuals through ritual activity and helping
them achieve common goals.
– Huizinga (1950)
• Toys, jokes, and games often serve as symbols of play for facing collective fears about cultural issues that quickly overwhelm the individual: bigotry, racism, rejection, terrorism, addiction, and poverty.
• Toys, jokes, and games are things we can study as
distributed cognition by examining them as tools, rules,
roles, and context.
6. Play
• Natural State of Learning and Inquiry
• Play allows for imaginative exploration and risk-
taking with the freedom to make choices and
mistakes, and the potential for spontaneous shared
experience in constructed micro worlds without
consequence.
• Play allows for a reframing of reality, and
reconsideration of context and the realm of the
possibilities.
– Requires time
– How do we conduct an ROI analysis?
9. Memory and Learning
Retrieval
• Retrieval (testing, quizzing) involves a discrimination process in which a set of retrieval cues is established and the cues are used to determine the prior occurrence of a target event. The effectiveness or diagnostic value of retrieval cues for solving this discrimination problem will be a function of how well a cue specifies certain candidates to the exclusion of other competitors.
– (Tulving, 1974; Tulving & Thomson, 1973; see Raaijmakers & Shiffrin, 1980, 1981; Surprenant & Neath, 2009; Karpicke & Blunt, 2011; Karpicke & Smith, 2012; Karpicke & Zaromb, 2010)
Auto-Associative Memory
• Associating patterns which are
– similar,
– contrary,
– in close proximity (spatial),
– in close succession (temporal)
• Associative recall
– evoke associated patterns
– recall a pattern by part of it
– evoke/recall with incomplete/noisy patterns
• Two types of associations, for two patterns s and t:
– hetero-association (s != t): relating two different patterns
– auto-association (s = t): relating parts of a pattern with other parts
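The auto-associative recall described above (completing a stored pattern from a partial or noisy cue) can be sketched as a tiny Hopfield-style network. This is an illustrative sketch in plain Python; the pattern values and function names are invented for the example, not taken from the talk.

```python
# Minimal auto-associative memory sketch (Hopfield-style), assuming
# bipolar (+1/-1) patterns. Illustrative only.

def train(patterns):
    """Hebbian weights: w[i][j] accumulates co-activation of units i and j."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, cue, steps=5):
    """Iteratively complete a noisy or partial cue toward a stored pattern."""
    s = list(cue)
    n = len(s)
    for _ in range(steps):
        for i in range(n):
            total = sum(w[i][j] * s[j] for j in range(n))
            s[i] = 1 if total >= 0 else -1
    return s

stored = [1, 1, -1, -1, 1, -1]
w = train([stored])
noisy = [1, -1, -1, -1, 1, -1]   # one unit flipped
print(recall(w, noisy) == stored)  # True: the full pattern is recovered
```

Recalling a pattern "by part of it" works the same way: clamp the known units and let the update rule fill in the rest.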
10. Fun Evidence
• Name that tune
• Finish my sentence
• Crystallized knowledge
– Content Knowledge & Terminology
• Base
11. Why not Games?
• Games have been used for describing complex
dynamic systems with multiple variables for
many years for a variety of systems including
economics, business, social systems, political
science, biology, etc.
– (Borel, 1938; von Neumann & Morgenstern, 1945)
12. Concrete vs. Abstract
[Diagram: games, branching simulations, work models, and story & narrative plotted along three axes, ranging from externally imposed structure and consequence to self-structured, "do-over" abstract play]
• Y = Structure: Time, Rules, Roles, Tools, Criteria
• X = Consequence: Repetition, Status, Goods, Wealth
• Z = Representation: Abstract (fantasy), Concrete (data)
13. Ethos of Activity
• Play: risk
• Work: emphasis on learning outcomes from assessment, with evaluation as consequence
14. What if we made a game about going to Las Vegas?
18. The Vegas Effect
Should everything that happens in games, stay in games?
It is not enough to invoke games and play.
Serious games should provide evidence that they delivered.
This should be quantifiable in performance metrics.
19. Assessment Criteria & Mechanics
• Games assess, measure, and evaluate by their very nature.
• Outcomes from scoring criteria can provide evidence for
assessment and diagnosis.
• Evidence is only as good as the scoring criteria.
• Evidence should constitute measures that support transfer of
learning.
20. Games and Assessment
• Formative assessments – measurement tools used to measure growth and progress in learning; they can be used in games to alter subsequent learning experiences. Formative assessments are tools external to the learning activity and typically occur leading up to a summative evaluation.
• Summative assessments provide an evaluation or final summarization of learning. Summative assessment is characterized as assessment of learning and is contrasted with formative assessment, which is assessment for learning. Summative assessments are also tools external to the learning activity and typically occur at the end of the learning intervention.
• Informative assessment guides and facilitates learning as part of the assessment: the assessment is the intervention. Successful participation in the learning itself yields evidence that learning has taken place; the behaviors in the activity verify learning, and no external measures are added on for assessment.
21. Games & Informative Assessment
• Research findings from over 4,000 studies
indicate that informative assessment has the
most significant impact on achievement.
• (Wiliam, 2007).
22. Surface (face) & Content Validity
• Games are often built on these.
– Face validity: it looks like it measures what it is supposed to measure.
– Content validity: check how the game represents the content against the relevant content domain.
– This approach assumes that you have a good, detailed description of the content domain, something that is not always true.
23. Criterion Validity
• In criterion-related validity, you check the performance of your operationalization against some criterion.
– Predictive validity: assess the ability to predict something it should theoretically be able to predict (e.g., improvement in ADLs).
– Concurrent validity: the measure should be able to distinguish between people who can live independently and those in assisted living.
– Convergent validity: correlate scores on our test with scores on other tests that purport to measure ADLs; high correlations would be evidence of convergent validity.
– Discriminant validity: gather evidence that the assessment does not correlate with measures of constructs it should differ from.
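Convergent validity, as described above, can be checked by correlating game scores against an established measure. A minimal sketch with hypothetical data; the score lists below are invented for illustration only:

```python
# Sketch of a convergent-validity check: correlate serious-game scores
# with scores from an established ADL measure. All data are hypothetical.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

game_scores = [12, 15, 9, 20, 17, 11]   # hypothetical in-game performance
katz_scores = [4, 5, 3, 6, 6, 4]        # hypothetical Katz ADL ratings
r = pearson_r(game_scores, katz_scores)
print(r > 0.9)  # a high correlation is evidence of convergent validity
```

For discriminant validity, the same computation against a theoretically unrelated measure should yield a correlation near zero.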
24. Serious Game Development
• Games need to adopt methods from the field
of psychometrics in development for assuring
validity and transfer.
– Inter-rater (coder, judge) reliability should be a critical component of content analysis for serious games.
• However, it does not ensure validity,
– but without it, the data and interpretations of the data cannot be considered valid.
26. Methodology in Assessment
• ”Without the establishment of reliability, content analysis
measures are useless”
– (Neuendorf, 2002, p. 141)
• "interjudge reliability is often perceived as the standard
measure of research quality. High levels of disagreement
among judges suggest weaknesses in research methods,
including the possibility of poor operational definitions,
categories, and judge training"
– (Kolbe & Burnett, 1991, p. 248).
27. Increasing Reliability
• Select one or more appropriate indices (Cohen's kappa, Fleiss' kappa); it is best to select two,
– establishing a decision rule that takes into account the assumptions and/or weaknesses of each.
– Select a minimum level of reliability: coefficients of .90 or greater are nearly always acceptable, .80 or greater is acceptable in most situations, and .70 may be appropriate in some exploratory studies for some indices.
• Assess reliability formally during coding of the sample.
• Report interrater reliability in a careful, clear, and detailed
manner in all research reports.
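Cohen's kappa, one of the indices named above, corrects raw percent agreement for the agreement two raters would reach by chance. A minimal stdlib-only sketch; the rater labels are hypothetical:

```python
# Cohen's kappa for two raters over the same items (stdlib only).
# kappa = (observed agreement - chance agreement) / (1 - chance agreement)
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: product of each rater's marginal label proportions.
    expected = sum((ca[l] / n) * (cb[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical codings of eight game events by two judges.
a = ["fun", "fun", "work", "fun", "work", "work", "fun", "fun"]
b = ["fun", "fun", "work", "work", "work", "work", "fun", "fun"]
k = cohens_kappa(a, b)
print(round(k, 2))  # 0.75: raw agreement is 7/8, but chance agreement is 0.5
```

Note how the single disagreement drops kappa to 0.75 even though raw agreement is 87.5%, which is why kappa, not percent agreement, is reported against the thresholds above.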
28. Branching
• Choosing the number of patients is a matter of mathematics: observed power for the analysis.
– N = 19, power = .9
– Choice of parametric & non-parametric tests
– Within-subject and across-subject tests
29. Ratings
• Cohen’s Kappa
• Fleiss Kappa
• Agreement at .8
– A kappa below 0.2 is considered poor agreement, 0.21–0.4 fair, 0.41–0.6 moderate, 0.61–0.8 strong, and above 0.8 near-complete agreement.
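The agreement bands above can be encoded directly as a decision rule; this sketch assumes each band is inclusive of its upper bound:

```python
# Map a kappa value to the agreement bands from the slide above
# (assumption: each band includes its upper bound).

def interpret_kappa(k):
    if k <= 0.2:
        return "poor"
    if k <= 0.4:
        return "fair"
    if k <= 0.6:
        return "moderate"
    if k <= 0.8:
        return "strong"
    return "near complete"

print(interpret_kappa(0.75))  # strong
print(interpret_kappa(0.85))  # near complete
```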
30. ADL
• Activities of Daily Living: the term "activities of daily living" refers to a set of common, everyday tasks, performance of which is required for personal self-care and independent living. The most often used measure of functional ability is the Katz Activities of Daily Living Scale (Katz et al., 1963; Katz, 1983).
– Wiener, Hanley, Clark, & Van Nostrand (1990, p. 1)
• The facility had already identified 8 items for identification in their kiosk software.
• The key gameplay element here was modeling the facility kiosks in the game and scoring the resident interaction scenarios against how the CNAs document their observations.
• In the work environment, the kiosks are already used to collect data, and this provides an opportunity to create external, environmental, and population validity and provide ROI analysis for care plans.
31. Complex Relationship Building
• Establish therapeutic relationship with patient
to promote behavioral change.
1. Identify own attitude toward patient and
situation.
2. Determine ethical boundaries of the
relationship.
3. Deal with personal feelings evoked by the patient that may interfere with effectiveness
4. Provide for physical comfort before interaction
5. Discuss confidentiality of information shared
6. Create climate of warmth and acceptance
7. Reassure patient of your interest in them as a
person
8. Return at established time to demonstrate trust
9. Maintain open body posture
10. Monitor, seek clarification & respond to non-verbal messages.
– 10 of 31, NIC revised 5th edition 2008
32. Facial Action Coding
• The Facial Action Coding System (FACS)
– (Ekman & Friesen, 1978)
– 2 or more raters
• Coded affects:
1. Anger
2. Disgust
3. Fear
4. Happiness
5. Sadness
6. Surprise
7. Amusement
8. Contempt
9. Contentment
10. Embarrassment
11. Excitement
12. Guilt
13. Pride in achievement
14. Relief
15. Satisfaction
16. Sensory pleasure
17. Shame
35. Tension in Workflow
• Software Design
– Typically based upon an economic consideration.
• How will this solve a problem?
• What are the first steps in production?
– The focus is on stages of production:
• Business Partner Relations, Function, Behavior, Structure, & Non-Function (qualities).
• Research Design
– Typically based upon answering a testable question.
• How will this solve a problem?
• How do I know this?
– The focus is on method and hypothesis testing:
• Construct validity, reliability, and probability.
36. Training Development Process
[Flowchart: the training development pipeline (Charter, Assess, Design, Build, Review, Deploy, Maintenance, CPI) with a <<Design Review>> gate checking overall design, prototype build, stakeholder sign-off, edit OK, and standards followed; inputs include templates, risk, project plan, dashboard, and style guide; changes loop back to review; at least one required output must be classroom delivery.]
37. Take home
• Can you pose a testable question – a hypothesis?
– Tension between design process and measurement
• How will you assure game mechanics are measuring what
you think you are measuring? Theoretically? Conceptually?
– Assessments, measures, & evaluations
• Usability testing should align with construct
– Testing should happen during development.
• Again, emphasis on validity
– Without it, there is no capability for ROI analysis
Editor's notes
Games and play can be a very powerful form of learning. The work of the game designer is to find the happy medium. The key to this is the creation of game mechanics that scaffold the learner into success through repetition and encouraging feedback based upon criteria.
There are many types of play. This variation in the activity
Games and play have their own types and degrees of risk, but often the assessments do not come with the risks of failure and are not as focused on crystallized content. Games are not often constructed to provide evidence of transfer. These issues should be a priority in serious game development: there should be evidence that learning acquired in a game is applicable outside of the game.
Serious games are very much like the tools used in psychological assessments and evaluations. Three types of assessment are drawn from psychometric methods.
Face – you might observe a teenage pregnancy prevention program and conclude that, "Yep, this is indeed a teenage pregnancy prevention program." Of course, if this is all you do to assess face validity, it would clearly be weak evidence because it is essentially a subjective judgment call. Content – the domain is drawn from observations. Is this good enough for scientific results? ROI?
How is this different from content validity? In content validity, the criteria are the construct definition itself – it is a direct comparison. In criterion-related validity, we usually make a prediction about how the operationalization will perform based on our theory of the construct. The differences among the criterion-related validity types lie in the criteria they use as the standard for judgment.
In our own work we developed a number of criterion tools. This talk examines Activities of Daily Living explicitly.