Upcoming Caveon Events
• Caveon Webinar Series: Next session, October 16
The Good and Bad of Online Proctoring, Part 2
• EATP – September 25-27 in St. Julian’s, Malta.
– Caveon’s John Fremer and Steve Addicott presenting:
What are we Accountable For? Security Standards and Resources for High
Stakes Testing Programs
– Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with
International Test Candidates
• The 2nd Annual Statistical Detection of Potential Test Fraud Conference
– October 17-19, 2013, Madison, Wisconsin
– Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions
• Handbook of Test Security – Now Available. We will share a discount code at the
end of this session.
Caveon Online
• Caveon Security Insights Blog
– http://www.caveon.com/blog/
• twitter
– Follow @Caveon
• LinkedIn
– Caveon Company Page
– “Caveon Test Security” Group
• Please contribute!
• Facebook
– Will you be our “friend”?
– “Like” us!
www.caveon.com
Improving Testing with Key Strength Analysis
Dennis Maynes, Chief Scientist, Caveon Test Security
Dan Allen, Psychometrician, Western Governors University
Marcus Scott, Data Forensics Scientist, Caveon Test Security
Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology
Caveon Webinar Series, September 18, 2013
Agenda for Today
• Review classical item analysis
• Introduce Key Strength Analysis
• Derive Key Strength Analysis
• Observations by Dan Allen and Barbara Foster
• Conclusions and Q&A
Review Classical Item Analysis
• Statistics
– P-value
– Point-biserial correlation
• Typical rules
– Low p-values (hard items)
– High p-values (easy items)
– Low point-biserial correlations (low discriminations)
• Easy to understand and implement
• Good at flagging poor items
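The two classical statistics reviewed above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the response matrix `scored` and the function names are hypothetical, and the point-biserial is computed as an ordinary Pearson correlation between a 0/1 item score and the total test score.

```python
from math import sqrt

def p_value(item_scores):
    """Proportion of examinees who answered the item correctly (item difficulty)."""
    return sum(item_scores) / len(item_scores)

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def point_biserial(item_scores, total_scores):
    """Correlation of a 0/1 item score with the test score (item discrimination)."""
    return pearson(item_scores, total_scores)

# Hypothetical scored responses: rows are examinees, columns are items (1 = correct).
scored = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in scored]
for j in range(4):
    item = [row[j] for row in scored]
    print(j, round(p_value(item), 2), round(point_biserial(item, totals), 2))
```

Any cut-offs for "low" or "high" values are program-specific and are not given in the slides, so none are applied here.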
Introduce Key Strength Analysis
• Why Key Strength Analysis?
– Model uses information from all items
– Answer choices for same item are compared
– Provides possible reasons for poor performance
• High performing test takers (knowledgeable students)
– Typically report problems with the answer key
– Usually choose the correct answer
• Most frequently selected choice
– Is usually correct for easy items
– Is not necessarily correct for hard items
Capabilities of Key Strength Analysis
• Built upon classical item analysis
– Point-biserial correlations discriminate between high and low
performers
– P-values detect hard/easy items
• Typical problems with items
– Mis-keyed items
– Weakly keyed items
– Ambiguously keyed items
• Use probabilities to make inferences about item
performance
Modify Point-Biserial Correlation
1. Exclude the item score from the test score
• Places all answer choices on “the same playing field”
• Allows correct and incorrect answers to be compared using “what if”
2. Compute point-biserial correlations
• For correct answer and
• For distractors
3. Scale point-biserial appropriately
• We call this statistic z*
• Use z* to compute the probability of the choice (A, B, etc.) being a key; this is the “key strength”
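Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions: the data, the answer key, and the function names are hypothetical, and the sketch stops at the option-level point-biserial correlations. The scaling that turns a correlation into z* is not reproduced because its formula does not survive in this transcript.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def option_point_biserials(responses, key, item, options="ABCD"):
    """Correlate "chose this option" (0/1) with the rest score, for each option.

    The rest score counts correct answers on all OTHER items, so the item
    under study is excluded from the test score (step 1).
    """
    rest = [
        sum(1 for j, choice in enumerate(row) if j != item and choice == key[j])
        for row in responses
    ]
    result = {}
    for opt in options:
        chose = [1 if row[item] == opt else 0 for row in responses]
        result[opt] = pearson(chose, rest)  # step 2, for key and distractors alike
    return result

# Hypothetical data: 6 examinees, 4 items, answer key "ABCD".
key = "ABCD"
responses = ["ABCD", "ABCD", "ABCA", "ABDD", "BBDA", "CADA"]
r = option_point_biserials(responses, key, item=0)
for opt in "ABCD":
    print(opt, round(r[opt], 2))
```

Because the item's own score is left out of the criterion, the keyed answer gets no built-in advantage over the distractors, which is what allows the "what if this option were the key" comparison.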
Derive Key Strength Analysis
After Some Algebra
Why z* Depends on all the Right Quantities
Z* for all Items and Responses
[Figure: distribution of z* from -10 to 10, with separate series for right and wrong responses. 154 examinees, 100 items.]
Calculating p(choice is a key | data)
Approximation Theory
• Central Limit Theorem → z* is approximately normal.
• Probability function should be monotonic
increasing, which requires equal variances
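One way to read the two bullets above is as Bayes' rule applied to two normal densities for z*, one for right answers and one for wrong answers, sharing a single standard deviation. The sketch below assumes that reading; the means, standard deviation, and prior are hypothetical values chosen only for illustration, not estimates from the webinar.

```python
from math import exp, pi, sqrt

def normal_pdf(z, mean, sd):
    """Density of a normal distribution at z."""
    return exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def key_strength(z, mean_right, mean_wrong, sd, prior_right):
    """P(choice is a key | z*) from two normal densities with a common sd."""
    right = prior_right * normal_pdf(z, mean_right, sd)
    wrong = (1 - prior_right) * normal_pdf(z, mean_wrong, sd)
    return right / (right + wrong)

# Hypothetical parameters (assumptions, not values from the slides).
params = dict(mean_right=3.0, mean_wrong=-1.0, sd=1.5, prior_right=0.25)

grid = [x / 2 for x in range(-8, 11)]  # z* from -4.0 to 5.0 in steps of 0.5
probs = [key_strength(z, **params) for z in grid]
for z, p in zip(grid, probs):
    print(z, round(p, 3))
```

With a common variance the log-odds are linear in z*, so the probability rises steadily with z*. With unequal variances the log-odds become quadratic in z*, and the curve is no longer monotonic.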
[Figure: distributions of z* from -10 to 10 for right and wrong responses, each shown with a normal curve (series: Right, Right Normal, Wrong, Wrong Normal).]
P(choice is a key | z*)
[Figure: p(choice is a key | z*) plotted against z*, for z* from -4 to 5; the probability axis runs from 0 to 1.]
Analysis of Distractors
• Compute key strength (KS) for all responses
• Low KS – probability less than 50%
• High KS – probability 50% or more
Answer \ Distractors   Low KS         High KS
Low KS                 Weakly keyed   Potential mis-key
High KS                Normal         Ambiguously keyed
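The decision table above can be written as a small function. This is a sketch: the function name and the dictionary layout are hypothetical, while the 50% threshold and the four labels come from the slide. A distractor counts as "high" if any distractor reaches the threshold, which is one reasonable reading of the table.

```python
def classify_item(key_strengths, answer, threshold=0.5):
    """Classify an item from the key strength (KS) of each response option.

    key_strengths maps option label -> P(option is a key); answer is the keyed option.
    """
    answer_high = key_strengths[answer] >= threshold
    distractor_high = any(
        ks >= threshold for opt, ks in key_strengths.items() if opt != answer
    )
    if answer_high:
        return "Ambiguously keyed" if distractor_high else "Normal"
    return "Potential mis-key" if distractor_high else "Weakly keyed"

# Probabilities in the style of the worked examples, assuming A is the keyed answer.
print(classify_item({"A": 0.99, "B": 0.06, "C": 0.0, "D": 0.0}, "A"))  # Normal
print(classify_item({"A": 0.32, "B": 0.06, "C": 0.0, "D": 0.0}, "A"))  # Weakly keyed
print(classify_item({"A": 0.99, "B": 0.90, "C": 0.0, "D": 0.0}, "A"))  # Ambiguously keyed
# Hypothetical values for a mis-keyed item: the key is weak and a distractor is strong.
print(classify_item({"A": 0.05, "B": 0.97, "C": 0.0, "D": 0.0}, "A"))  # Potential mis-key
```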
Example I – Good Key
[Figure: p(choice is a key | z*) plotted against z*, with arrows marking responses A, B, C, and D. The answer key arrow is colored gold.]
Response   z*      Probability
A          3.25    0.99
B          0.25    0.06
C          -2.75   0
D          -2.4    0
Example II – Potential Mis-key
[Figure: p(choice is a key | z*) plotted against z*, with arrows marking responses A, B, C, and D. The answer key arrow is colored gold.]
Response   z*      Probability
A          3.25    0.99
B          0.25    0.06
C          -2.75   0
D          -2.4    0
Example III – Weak Key
[Figure: p(choice is a key | z*) plotted against z*, with arrows marking responses A, B, C, and D. The answer key arrow is colored gold.]
Response   z*      Probability
A          1.0     0.32
B          0.25    0.06
C          -3      0
D          -2.5    0
Example IV – Ambiguous Key
[Figure: p(choice is a key | z*) plotted against z*, with arrows marking responses A, B, C, and D. The answer key arrow is colored gold.]
Response   z*      Probability
A          3.75    0.99
B          2.25    0.9
C          -3      0
D          -2.5    0
Validation – Answer Key Estimation
• Assume the key is not known
• Check accuracy of estimated answer key
• Algorithm:
– Start with most frequent response as initial guess
– Revise key using probabilities until no more changes
• For 12 different exams
– Key estimation accuracy varied from 81% to 99%
– Cannot infer multiple keys
– Cannot guess key when there are no correct responses
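The algorithm above can be sketched as an iterative loop. This is not the presenters' implementation: as a stand-in for the key strength probability, the sketch picks, for each item, the option whose 0/1 indicator correlates most strongly with the rest score under the current key estimate. The data and all names are hypothetical.

```python
from collections import Counter
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def modal_key(responses):
    """Initial guess: the most frequent response to each item."""
    n_items = len(responses[0])
    return [Counter(row[j] for row in responses).most_common(1)[0][0]
            for j in range(n_items)]

def estimate_key(responses, options="ABCD", max_rounds=20):
    """Revise the key item by item until a full pass makes no change."""
    key = modal_key(responses)
    for _ in range(max_rounds):
        changed = False
        for j in range(len(key)):
            # Rest score under the current key estimate, excluding item j.
            rest = [sum(1 for i, c in enumerate(row) if i != j and c == key[i])
                    for row in responses]
            # Stand-in for key strength: option most associated with high rest scores.
            best = max(options, key=lambda opt: pearson(
                [1 if row[j] == opt else 0 for row in responses], rest))
            if best != key[j]:
                key[j], changed = best, True
        if not changed:
            break
    return "".join(key)

# Hypothetical data: the last item is hard, so its modal response (B) is not
# the option chosen by the strongest examinees (C).
responses = ["ABCDC", "ABCDC", "ABCDC", "ABCDB",
             "ABCDB", "ABDAB", "ACDAB", "BCDAB"]
print("".join(modal_key(responses)))  # ABCDB
print(estimate_key(responses))        # ABCDC
```

An option that nobody selects has a constant indicator and a correlation of 0.0 here, so it cannot be proposed as the key, and the loop returns exactly one option per item. Both properties match the limitations listed above.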
Summary of Validation Study
• Accuracy improves with item quality
• Accuracy affected by sample size & test length
Exam  N       Forms  Form length   Items  Non-scored items  Accuracy  Observations
A     2,966   2      180           307    0                 99.2%
B     337     2      107           214    0                 85.5%
C     337     1      230           230    0                 90.9%
D     1,815   1      204           204    7                 92.1%     Some association with "deleted" items
E     1,408   1      199           199    1                 96.0%
F     46,356  2      240           480    0                 96.0%
G     44,104  2      120           240    0                 95.8%
H     25,448  2      60            120    0                 93.3%
I     121     3      165           417    43                81.0%     Strong association with "field test" items
J     1,071   8      52 & 61       391    0                 80.5%     85.2% (English-only)
K     2,033   8      68, 76 & 77   510    0                 85.9%
L     6,473   21     250           1,050  850               85.7%     All errors except one were on non-scored items.
Reason for Answer Key Estimation
• If a group of test takers has stolen the test and worked
out their own answer key, it is likely some answers will
be wrong.
• Answer key estimation can find the errors committed by
test thieves.
Dan Allen
Psychometrician
Western Governors University
Example Item: Ambiguous Key
Which is a property of all X?
A. They contain Y.
B. They have property Z.
C. * They do not contain Y.
D. They have property W.
Looking at the item text, we see that the ambiguity is likely
caused by rival options A and C. SME feedback
suggests the item is too text-specific.
Example Item: Ambiguous Key
Which is a component of X?
A. * Real anticipated expense
B. Time spent
C. Liquid assets
D. Quality
In this case, students of high ability were often
selecting C instead of A. SME feedback suggests the
deleted word may have been turning students off to
that option.
Example Item: Weak Key
Select 3 possible causes of X
A. *Obesity
B. Contaminated drinking water
C. *Unhealthy diet
D. *Genetic factors
E. Lack of exercise
High performing students were picking C and D correctly, but
were as likely to pick E as they were to pick A. SME feedback
suggested that E may be a reasonable answer to the question.
The revision involved making A, C, and E all incorrect answers
so that D would remain the sole answer.
Example Item: Potential Mis-key
Which is a sound accounting principle?
A. X
B. Not X
C. *Y
D. Z
Nearly all students selected distractor B (Not X). This
item was not mis-keyed. It seems most likely that this
concept was not covered sufficiently in the text and/or
other learning resources—leaving students to use
guessing strategies rather than content knowledge.
Barbara Foster
Psychometrician
The American Board of Obstetrics
and Gynecology
The American Board of
Obstetrics and Gynecology
2013 Certifying Exam
• 180 scored items
• Five sets of 40 field test items
• Potential mis-keys from Caveon
– 8 identified among the scored items (4%)
– 22 identified among the field test items (11%)
The lower proportion in the scored items is not
surprising since those items have been field
tested and some may have been previously
used.
• Result of the SME review of the flagged scored
items:
– 4 of the 8 (50%) were found to have problems.
These problems were a combination of ambiguous
wording, new information published just prior to
the exam, recent changes in guidelines, or just a
very difficult item. These items were deleted from
the exam prior to scoring.
• Result of the SME review of the flagged field
test items:
– 15 of the 22 (68%) were found to have problems.
These problems were mostly a combination of
ambiguous wording, responses too closely related,
and changes in the field.
[Venn diagram: field test items flagged by each method]
• Our standard methods: 27 field test items flagged (13.5%)
• The z* method: 22 field test items flagged (11.0%)
• Flagged by both: 8 items (4%)
[Venn diagram: flagged field test items found to have problems]
• Our standard methods: 27 field test items flagged (13.5%); 13 had problems
• The z* method: 22 field test items flagged (11.0%); 15 had problems
• Flagged by both: 8 items (4%); 5 had problems
• Conclusion
The new method appears to detect differences that
our current methods do not. These differences do
not appear to be strictly keying errors but involve
other important problem areas as well.
Conclusions
• Item analysis helps ensure
– Unidimensionality
– Desired item performance
• Key Strength Analysis enhances classical item analysis
– Uses information from all items
– Compares answer choices for same item
• Can detect structural flaws in items
• Can suggest the actual key when the item is mis-keyed
– Suggests possible reasons for poor performance
• Future research
– Investigate thresholds for Key Strength Analysis
– Simulate item problems to measure ability to detect
– Evaluate performance when assumptions fail
Questions?
Please type questions for our presenters in the
GoToWebinar control panel on your screen.
HANDBOOK OF TEST SECURITY
• Editors - James Wollack & John Fremer
• Published March 2013
• Preventing, Detecting, and Investigating Cheating
• Testing in Many Domains
– Certification/Licensure
– Clinical
– Educational
– Industrial/Organizational
• Don’t forget to order your copy at www.routledge.com
– http://bit.ly/HandbookTS (Case Sensitive)
– Save 20% - Enter discount code: HYJ82
THANK YOU!
- Follow Caveon on twitter @caveon
- Check out our blog: www.caveon.com/blog
- LinkedIn Group – “Caveon Test Security”

Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 

Dernier (20)

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 

Caveon Webinar Series: Improving Testing with Key Strength Analysis

  • 1. Upcoming Caveon Events • Caveon Webinar Series: Next session, October 16 The Good and Bad of Online Proctoring, Part 2 • EATP – September 25-27 in St. Julian’s, Malta. – Caveon’s John Fremer and Steve Addicott presenting: What are we Accountable For? Security Standards and Resources for High Stakes Testing Programs – Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with International Test Candidates • The 2nd Annual Statistical Detection of Potential Test Fraud Conference – October 17-19, 2013, Madison, Wisconsin – Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions • Handbook of Test Security – Now Available. We will share a discount code at the end of this session.
  • 2. Caveon Online • Caveon Security Insights Blog – http://www.caveon.com/blog/ • Twitter – Follow @Caveon • LinkedIn – Caveon Company Page – “Caveon Test Security” Group • Please contribute! • Facebook – Will you be our “friend?” – “Like” us! www.caveon.com
  • 3. Caveon Webinar Series: Improving Testing with Key Strength Analysis • Dennis Maynes, Chief Scientist, Caveon Test Security • Dan Allen, Psychometrician, Western Governors University • Marcus Scott, Data Forensics Scientist, Caveon Test Security • Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology • September 18, 2013
  • 4. Agenda for Today • Review classical item analysis • Introduce Key Strength Analysis • Derive Key Strength Analysis • Observations by Dan Allen and Barbara Foster • Conclusions and Q&A
  • 5. Review Classical Item Analysis • Statistics – P-value – Point-biserial correlation • Typical rules – Low p-values (hard items) – High p-values (easy items) – Low point-biserial correlations (low discriminations) • Easy to understand and implement • Good at flagging poor items
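The two classical statistics on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' code: the function names and the flagging thresholds (`p_low`, `p_high`, `r_low`) are assumptions chosen for the example, and the point-biserial here is the plain Pearson correlation between the 0/1 item score and the total score.

```python
import numpy as np

def classical_item_stats(scores):
    """scores: 2-D array (examinees x items) of 0/1 item scores.

    Returns the p-value (proportion correct) and the point-biserial
    correlation with the total score for every item.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)
    p_values = scores.mean(axis=0)
    pbis = np.full(scores.shape[1], np.nan)
    for j in range(scores.shape[1]):
        item = scores[:, j]
        # The correlation is undefined when either variable is constant.
        if item.std() > 0 and total.std() > 0:
            pbis[j] = np.corrcoef(item, total)[0, 1]
    return p_values, pbis

def flag_items(p_values, pbis, p_low=0.20, p_high=0.95, r_low=0.10):
    """Apply the typical rules; the thresholds are illustrative assumptions."""
    flags = {}
    for j, (p, r) in enumerate(zip(p_values, pbis)):
        reasons = []
        if p < p_low:
            reasons.append("hard")
        if p > p_high:
            reasons.append("easy")
        if np.isnan(r) or r < r_low:
            reasons.append("low discrimination")
        if reasons:
            flags[j] = reasons
    return flags

# Toy data: 4 examinees, 3 items. Item 0 is answered correctly by everyone.
scores = np.array([[1, 1, 1],
                   [1, 1, 0],
                   [1, 0, 0],
                   [1, 0, 1]])
p_values, pbis = classical_item_stats(scores)
flags = flag_items(p_values, pbis)
print(p_values, pbis, flags)
```

In the toy data only item 0 is flagged, because everyone answers it correctly and its correlation with the total score is therefore undefined.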
  • 6. Introduce Key Strength Analysis • Why Key Strength Analysis? – Model uses information from all items – Answer choices for same item are compared – Provides possible reasons for poor performance • High performing test takers (knowledgeable students) – Typically report problems with the answer key – Usually choose the correct answer • Most frequently selected choice – Is usually correct for easy items – Is not necessarily correct for hard items
  • 7. Capabilities of Key Strength Analysis • Built upon classical item analysis – Point-biserial correlations discriminate between high and low performers – P-values detect hard/easy items • Typical problems with items – Mis-keyed items – Weakly keyed items – Ambiguously keyed items • Use probabilities to make inferences about item performance
  • 8. Modify Point-Biserial Correlation 1. Exclude the item score from the test score • Places all answer choices on “the same playing field” • Allows correct and incorrect answers to be compared using “what if” 2. Compute point-biserial correlations • For correct answer and • For distractors 3. Scale point-biserial appropriately • We call this statistic z* • Use z* to compute the probability of the choice (A, B, etc.) being a key; this is the “key strength”
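The three steps above can be sketched as follows. The slides that define the exact scaling of z* are not part of this transcript, so the scaling used here, r · sqrt(N − 1) (approximately standard normal when the choice and the rest score are unrelated), is an assumption and not necessarily the presenters' z*. The function name `choice_z_scores` is also hypothetical.

```python
import numpy as np

def choice_z_scores(responses, key):
    """responses: 2-D array (examinees x items) of chosen options, e.g. 'A'..'D'.
    key: the keyed option for each item.
    Returns {item index: {option: scaled point-biserial}}.
    """
    responses = np.asarray(responses)
    key = np.asarray(key)
    n = responses.shape[0]
    scored = (responses == key).astype(float)  # 0/1 item scores
    total = scored.sum(axis=1)
    out = {}
    for j in range(responses.shape[1]):
        # Step 1: exclude the item's own score from the test score.
        rest = total - scored[:, j]
        out[j] = {}
        for option in np.unique(responses[:, j]):
            chose = (responses[:, j] == option).astype(float)
            if chose.std() == 0 or rest.std() == 0:
                continue  # correlation undefined
            # Step 2: point-biserial for this choice (key or distractor).
            r = np.corrcoef(chose, rest)[0, 1]
            # Step 3: ASSUMED scaling, not taken from the slides.
            out[j][str(option)] = float(r * np.sqrt(n - 1))
    return out

key = ["A", "B", "C"]
responses = [["A", "B", "C"],
             ["A", "B", "C"],
             ["A", "B", "D"],
             ["B", "A", "D"],
             ["B", "C", "C"]]
z = choice_z_scores(responses, key)
print(z[0])
```

Because the item's own score is removed from the criterion, the keyed option and each distractor are correlated with the same rest score, which is what makes their values comparable.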
  • 11. Why z* Depends on all the Right Quantities
  • 12. z* for all Items and Responses [Figure: distributions of z* for right and wrong responses; x-axis z* from -10 to 10; 154 examinees, 100 items]
  • 13. Calculating p(choice is a key | data)
  • 14. Approximation Theory • Central Limit Theorem ⇒ z* is normal. • Probability function should be monotonic increasing, which requires equal variances [Figure: observed z* distributions for right and wrong responses with fitted normal curves; x-axis z* from -10 to 10]
  • 15. P(choice is a key | z*) [Figure: p(choice is a key | z*), from 0 to 1, plotted against z* from -4 to 5]
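A short sketch shows why equal variances give a monotonic curve. If z* is normal with mean `mu_right` for keys and `mu_wrong` for non-keys, with a common standard deviation `sigma`, Bayes' rule reduces to a logistic function of z*. The parameter names and the numeric values below are assumptions for illustration only; the fitted values used in the webinar are not in the transcript.

```python
import math

def key_probability(z, mu_right, mu_wrong, sigma, prior_right):
    """Posterior P(choice is a key | z) for two equal-variance normals.

    The log-likelihood ratio of N(mu_right, sigma^2) to N(mu_wrong, sigma^2)
    is linear in z, so the posterior is logistic and monotonic in z.
    """
    slope = (mu_right - mu_wrong) / sigma ** 2
    midpoint = (mu_right + mu_wrong) / 2.0
    log_odds = math.log(prior_right / (1.0 - prior_right)) + slope * (z - midpoint)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical parameters: one key among four options gives a prior of 0.25.
for z in (-2.0, 0.0, 1.0, 2.0, 4.0):
    print(z, round(key_probability(z, 3.0, -1.0, 1.5, 0.25), 3))
```

With unequal variances the log-likelihood ratio gains a quadratic term in z*, so the probability could turn back down at extreme values; this is the reason the slide asks for equal variances.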
  • 16. Analysis of Distractors • Compute key strength (KS) for all responses • Low KS – probability less than 50% • High KS – probability 50% or more
    Answer | Distractors: Low KS | Distractors: High KS
    Low KS | Weakly keyed | Potential mis-key
    High KS | Normal | Ambiguously keyed
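The two-by-two classification can be written as a small function. This is a sketch: the function name is hypothetical, and the sample probabilities come from Examples I, III and IV in the slides that follow, with option A assumed to be the keyed answer because the gold arrow that marks the key is not recoverable from the transcript.

```python
def classify_item(key_strengths, key, threshold=0.5):
    """key_strengths: {option: P(option is a key)}; key: the keyed option.

    High KS means a probability of 50% or more, as on the slide.
    """
    answer_high = key_strengths[key] >= threshold
    distractor_high = any(
        p >= threshold for option, p in key_strengths.items() if option != key
    )
    if answer_high:
        return "ambiguously keyed" if distractor_high else "normal"
    return "potential mis-key" if distractor_high else "weakly keyed"

# Probabilities from the example slides; keyed option A is an assumption.
print(classify_item({"A": 0.99, "B": 0.06, "C": 0.0, "D": 0.0}, "A"))  # normal
print(classify_item({"A": 0.32, "B": 0.06, "C": 0.0, "D": 0.0}, "A"))  # weakly keyed
print(classify_item({"A": 0.99, "B": 0.90, "C": 0.0, "D": 0.0}, "A"))  # ambiguously keyed
```

Applying the first set of probabilities with B as the keyed option instead returns "potential mis-key", since the key is weak while a distractor is strong.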
  • 17. Example I – Good Key [Figure: p(choice is a key | z*) against z*, with arrows marking responses A to D; the answer key arrow is colored gold] Response (z*, probability): A (3.25, 0.99); B (0.25, 0.06); C (-2.75, 0); D (-2.4, 0)
  • 18. Example II – Potential Mis-key [Figure: p(choice is a key | z*) against z*, with arrows marking responses A to D; the answer key arrow is colored gold] Response (z*, probability): A (3.25, 0.99); B (0.25, 0.06); C (-2.75, 0); D (-2.4, 0)
  • 19. Example III – Weak Key [Figure: p(choice is a key | z*) against z*, with arrows marking responses A to D; the answer key arrow is colored gold] Response (z*, probability): A (1.0, 0.32); B (0.25, 0.06); C (-3, 0); D (-2.5, 0)
  • 20. Example IV – Ambiguous Key [Figure: p(choice is a key | z*) against z*, with arrows marking responses A to D; the answer key arrow is colored gold] Response (z*, probability): A (3.75, 0.99); B (2.25, 0.9); C (-3, 0); D (-2.5, 0)
  • 21. Validation – Answer Key Estimation • Assume the key is not known • Check accuracy of estimated answer key • Algorithm: – Start with most frequent response as initial guess – Revise key using probabilities until no more changes • For 12 different exams – Key estimation accuracy varied from 81% to 99% – Cannot infer multiple keys – Cannot guess key when there are no correct responses
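The iteration on this slide can be sketched as below. This is a simplified stand-in, not the presenters' algorithm: instead of the key strength probabilities, each revision step picks the option whose rest-score point-biserial correlation is highest, and `max_iter` is an added safeguard against oscillation. The function name is hypothetical.

```python
import numpy as np

def estimate_key(responses, max_iter=50):
    """responses: 2-D array (examinees x items) of chosen options.
    Returns a list with one estimated keyed option per item.
    """
    responses = np.asarray(responses)
    n_items = responses.shape[1]

    # Initial guess: the most frequent response to each item.
    key = []
    for j in range(n_items):
        options, counts = np.unique(responses[:, j], return_counts=True)
        key.append(str(options[np.argmax(counts)]))

    for _ in range(max_iter):
        scored = (responses == np.array(key)).astype(float)
        total = scored.sum(axis=1)
        new_key = list(key)
        for j in range(n_items):
            rest = total - scored[:, j]  # test score without item j
            if rest.std() == 0:
                continue  # no information to revise this item
            best_option, best_r = key[j], -np.inf
            for option in np.unique(responses[:, j]):
                chose = (responses[:, j] == option).astype(float)
                if chose.std() == 0:
                    continue
                r = np.corrcoef(chose, rest)[0, 1]
                if r > best_r:
                    best_option, best_r = str(option), r
            new_key[j] = best_option
        if new_key == key:  # stop when a full pass changes nothing
            break
        key = new_key
    return key

# Degenerate check: with identical responses the initial guess is kept.
print(estimate_key([["A", "B"], ["A", "B"], ["A", "B"]]))
```

Like the method described on the slide, this sketch returns exactly one option per item, so it cannot infer multiple keys, and it can only choose among options that examinees actually selected.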
  • 22. Summary of Validation Study • Accuracy improves with item quality • Accuracy affected by sample size & test length
    Exam | N | Forms | Form Length | Items | Non-scored Items | Accuracy | Observations
    A | 2,966 | 2 | 180 | 307 | 0 | 99.2% |
    B | 337 | 2 | 107 | 214 | 0 | 85.5% |
    C | 337 | 1 | 230 | 230 | 0 | 90.9% |
    D | 1,815 | 1 | 204 | 204 | 7 | 92.1% | Some association with "deleted" items
    E | 1,408 | 1 | 199 | 199 | 1 | 96.0% |
    F | 46,356 | 2 | 240 | 480 | 0 | 96.0% |
    G | 44,104 | 2 | 120 | 240 | 0 | 95.8% |
    H | 25,448 | 2 | 60 | 120 | 0 | 93.3% |
    I | 121 | 3 | 165 | 417 | 43 | 81.0% | Strong association with "field test" items
    J | 1,071 | 8 | 52 & 61 | 391 | 0 | 80.5% | 85.2% (English-only)
    K | 2,033 | 8 | 68, 76 & 77 | 510 | 0 | 85.9% |
    L | 6,473 | 21 | 250 | 1,050 | 850 | 85.7% | All errors except one were on non-scored items.
  • 23. Reason for Answer Key Estimation • If a group of test takers has stolen the test and worked out their own answer key, it is likely some answers will be wrong. • Answer key estimation can find the errors committed by test thieves.
  • 25. Example Item: Ambiguous Key Which is a property of all X? A. They contain Y. B. They have property Z. C. * They do not contain Y. D. They have property W. Looking at the item text, the ambiguity is likely caused by rival options A and C. SME feedback suggests the item is too text-specific.
  • 26. Example Item: Ambiguous Key Which is a component of X? A. * Real anticipated expense B. Time spent C. Liquid assets D. Quality In this case, students of high ability were often selecting C instead of A. SME feedback suggests the deleted word may have been turning students off to that option.
  • 27. Example Item: Weak Key Select 3 possible causes of X A. *Obesity B. Contaminated drinking water C. *Unhealthy diet D. *Genetic factors E. Lack of exercise High performing students were picking C and D correctly, but were as likely to pick E as they were to pick A. SME feedback suggested that E may be a reasonable answer to the question. The revision involved making A, C, and E all incorrect answers so that D would remain the sole answer.
  • 28. Example Item: Potential Mis-key Which is a sound accounting principle? A. X B. Not X C. *Y D. Z Nearly all students selected distractor B (Not X). This item was not mis-keyed. It seems most likely that this concept was not covered sufficiently in the text and/or other learning resources—leaving students to use guessing strategies rather than content knowledge.
  • 29. Barbara Foster Psychometrician The American Board of Obstetrics and Gynecology
  • 30. The American Board of Obstetrics and Gynecology 2013 Certifying Exam • 180 scored items • Five sets of 40 field test items
  • 31. Potential mis-keys from Caveon – 8 identified among the scored items (4%) – 22 identified among the field test items (11%) The lower proportion in the scored items is not surprising since those items have been field tested and some may have been previously used.
  • 32. Result of the SME review of the flagged scored items: – 4 of the 8 (50%) were found to have problems. These problems were a combination of ambiguous wording, new information published just prior to the exam, recent changes in guidelines, or just a very difficult item. These items were deleted from the exam prior to scoring.
  • 33. Result of the SME review of the flagged field test items: – 15 of the 22 (68%) were found to have problems. These problems were mostly a combination of ambiguous wording, responses too closely related, and changes in the field.
  • 34. [Venn diagram] Our Standard Methods: 27 field test items flagged (13.5%) • The z* Method: 22 field test items flagged (11.0%) • Flagged by both: 8 items (4%)
  • 35. [Venn diagram] Our Standard Methods: 27 field test items flagged (13.5%), 13 had problems • The z* Method: 22 field test items flagged (11.0%), 15 had problems • Flagged by both: 8 items (4%), 5 had problems
  • 36. Conclusion: This new method appears to detect differences that our current methods do not detect. These differences do not appear to be strictly keying errors but involve other important problem areas as well.
  • 37. Conclusions • Item analysis helps ensure – Unidimensionality – Desired item performance • Key Strength Analysis enhances classical item analysis – Uses information from all items – Compares answer choices for same item • Can detect structural flaws in items • Can suggest the actual key when the item is mis-keyed – Suggests possible reasons for poor performance • Future research – Investigate thresholds for Key Strength Analysis – Simulate item problems to measure ability to detect – Evaluate performance when assumptions fail
  • 38. Questions? Please type questions for our presenters in the GoToWebinar control panel on your screen.
  • 39. HANDBOOK OF TEST SECURITY • Editors - James Wollack & John Fremer • Published March 2013 • Preventing, Detecting, and Investigating Cheating • Testing in Many Domains – Certification/Licensure – Clinical – Educational – Industrial/Organizational • Don’t forget to order your copy at www.routledge.com – http://bit.ly/HandbookTS (Case Sensitive) – Save 20% - Enter discount code: HYJ82
  • 40. THANK YOU! - Follow Caveon on Twitter @caveon - Check out our blog: www.caveon.com/blog - LinkedIn Group – “Caveon Test Security” • Dennis Maynes, Chief Scientist, Caveon Test Security • Dan Allen, Psychometrician, Western Governors University • Marcus Scott, Data Forensics Scientist, Caveon Test Security • Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology