This document provides an overview of psychometrics and how to interpret item analysis reports. It discusses common statistical measures like item difficulty, discrimination index, and point-biserial. Guidelines are provided for desired statistical ranges for these measures. Examples of item analysis reports are also shown and discussed. The goal is to help users understand what their assessment data is telling them about item and test performance.
Psychometrics 101: Know what your assessment data is telling you
1. An ExamSoft Client Webinar
Psychometrics 101: Know What Your Exam Data is Telling You
2. Psychometrics 101: Know what your assessment data is telling you
Eric Ermie – Director of Client Solutions, ExamSoft
(Formerly) Program Manager for Assessment and Evaluation,
The Ohio State University College of Medicine.
3. AGENDA
• Types of stats
• Interpreting the item analysis report
• General statistical guidelines
• Examples
4. TYPES OF STATS
Common Stats:
• Item Difficulty/p Value - decimal representation of difficulty using the percentage of students who got the item correct. The lower the decimal, the higher the difficulty.
• Upper 27% - what percentage of the top 27% of performers got the question correct.
• Lower 27% - what percentage of the bottom 27% of performers got the question correct.
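The three statistics above can be sketched in a few lines of Python. This is a minimal illustration, not ExamSoft's implementation: the function names, the rounding of the 27% group size, and the handling of tied total scores (ties keep their input order) are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def item_difficulty(item_correct: Sequence[int]) -> float:
    """p value: proportion of students who got the item correct (1 = correct, 0 = not)."""
    return sum(item_correct) / len(item_correct)


def group_proportions(
    item_correct: Sequence[int],
    total_scores: Sequence[float],
    fraction: float = 0.27,
) -> Tuple[float, float]:
    """Return (upper, lower): proportion correct among the top and bottom
    `fraction` of students ranked by total exam score.

    Assumption: group size is round(n * fraction), at least 1.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Student indices ranked from highest to lowest total score.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper_group = ranked[:k]
    lower_group = ranked[-k:]
    upper_p = sum(item_correct[i] for i in upper_group) / k
    lower_p = sum(item_correct[i] for i in lower_group) / k
    return upper_p, lower_p


# Hypothetical data: 10 students, one item, total exam scores.
example_item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
example_totals = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]

p_value = item_difficulty(example_item)
upper_p, lower_p = group_proportions(example_item, example_totals)
print(p_value, upper_p, lower_p)
```

With 10 students the 27% groups contain 3 students each, so a single response moves the group percentage by a third; small cohorts make these figures jumpy.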
5. Common Stats Cont’d:
• Discrimination index – the difference in performance between the Upper 27% and the Lower 27%.
• Point-Biserial - a discrimination statistic that indicates whether doing well on that specific item correlated with doing well on the exam overall; that is, whether the item was a good or bad predictor of overall performance on the exam.
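As a rough sketch of the two discrimination statistics: the discrimination index is a simple subtraction, and the point-biserial is the Pearson correlation between the 0/1 item score and the total exam score. The function names are illustrative, and two choices here are assumptions: the total score includes the item itself (some tools exclude it), and an undefined correlation (everyone right, everyone wrong, or identical totals) is returned as 0.0.

```python
import math
from typing import Sequence


def discrimination_index(upper_p: float, lower_p: float) -> float:
    """Upper 27% proportion correct minus Lower 27% proportion correct."""
    return upper_p - lower_p


def point_biserial(item_correct: Sequence[int], total_scores: Sequence[float]) -> float:
    """Pearson correlation between a 0/1 item score and the total exam score."""
    n = len(item_correct)
    mean_x = sum(item_correct) / n
    mean_y = sum(total_scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_correct, total_scores))
    var_x = sum((x - mean_x) ** 2 for x in item_correct)
    var_y = sum((y - mean_y) ** 2 for y in total_scores)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined without variation; 0.0 is a convention chosen here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: the two highest scorers got the item right.
rpb = point_biserial([1, 1, 0, 0], [4, 3, 2, 1])
print(round(rpb, 2))
```

A positive value means students who got the item right tended to score higher overall; a negative value means the opposite.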
6. But with any statistic it is important to
remember context matters!
7. ITEM ANALYSIS EXAMPLES
Item # 1 | Correct Answer: E | Diff(p): 0.98 | Upper: 100.00% | Lower: 96.15% | Disc. Index: 0.04 | Point Biserial: 0.10
Response Frequencies (*Indicates correct answer)
                           A        B        C        D        E
Count                      0        1        0        1     *178
% Selected              0.00     0.55     0.00     0.55    98.34
Point Biserial (rpb)    0.00     0.02     0.00    -0.10     0.10
Disc. Index             0.00     0.00     0.00    -0.02     0.02
Upper 27%               0.00     0.00     0.00     0.00     1.00
Lower 27%               0.00     0.00     0.00     0.02     0.98
8. ITEM ANALYSIS EXAMPLES
Item # 7 | Correct Answer: D | Diff(p): 0.66 | Upper: 82.00% | Lower: 46.15% | Disc. Index: 0.36 | Point Biserial: 0.28
Response Frequencies (*Indicates correct answer)
                           A        B        C        D        E
Count                      7       17       28     *120        9
% Selected              3.87     9.39    15.47    66.30     4.97
Point Biserial (rpb)   -0.11    -0.19    -0.12     0.28    -0.07
Disc. Index            -0.04    -0.19    -0.09     0.36    -0.04
Upper 27%               0.00     0.00     0.12     0.82     0.06
Lower 27%               0.04     0.19     0.21     0.46     0.10
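One way to read the per-option rows of this report is that each option's Disc. Index is its Upper 27% share minus its Lower 27% share. The sketch below checks that reading against the item 7 figures from this slide; the dictionary layout and function name are illustrative assumptions, not part of the report.

```python
# Per-option values for item 7 (correct answer D), taken from the slide.
upper_27 = {"A": 0.00, "B": 0.00, "C": 0.12, "D": 0.82, "E": 0.06}
lower_27 = {"A": 0.04, "B": 0.19, "C": 0.21, "D": 0.46, "E": 0.10}
reported_disc = {"A": -0.04, "B": -0.19, "C": -0.09, "D": 0.36, "E": -0.04}


def option_disc_index(upper: dict, lower: dict) -> dict:
    """Upper-group share minus lower-group share for every answer option."""
    return {opt: round(upper[opt] - lower[opt], 2) for opt in upper}


computed_disc = option_disc_index(upper_27, lower_27)
for opt in sorted(computed_disc):
    print(opt, computed_disc[opt], reported_disc[opt])
```

The correct option (D) has a positive value and every distractor a negative one, which is the pattern expected when the distractors mainly attract lower-scoring students.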
9. ITEM ANALYSIS EXAMPLES
Item # 22 | Correct Answer: D | Diff(p): 0.36 | Upper: 52.00% | Lower: 26.92% | Disc. Index: 0.25 | Point Biserial: 0.22
Response Frequencies (*Indicates correct answer)
                           A        B        C        D        E
Count                     35       34       21      *66       25
% Selected             19.34    18.78    11.60    36.46    13.81
Point Biserial (rpb)   -0.09     0.04    -0.20     0.22    -0.06
Disc. Index            -0.15     0.07    -0.15     0.25    -0.02
Upper 27%               0.10     0.24     0.04     0.52     0.10
Lower 27%               0.25     0.17     0.19     0.27     0.12
10. ITEM ANALYSIS EXAMPLES
Item # 82 | Correct Answer: D | Diff(p): 0.55 | Upper: 25.00% | Lower: 82.50% | Disc. Index: -0.57 | Point Biserial: -0.43
Response Frequencies (*Indicates correct answer)
                           A        B        C        D        E
Count                      7       17       28     *120        9
% Selected              3.87     9.39    37.54    55.00     7.46
Point Biserial (rpb)   -0.11    -0.19    -0.12    -0.43     0.00
Disc. Index            -0.04    -0.19    -0.09    -0.57     0.00
Upper 27%               0.00     0.00     0.75     0.25     0.00
Lower 27%               0.00     0.00     0.17     0.83     0.00
11. ITEM ANALYSIS EXAMPLES
Item # 24 | Correct Answer: C | Diff(p): 0.52 | Upper: 64.00% | Lower: 42.31% | Disc. Index: 0.22 | Point Biserial: 0.18
Response Frequencies (*Indicates correct answer)
                           A        B        C        D        E
Count                     61       21      *94        5        0
% Selected             33.70    11.60    51.93     2.76     0.00
Point Biserial (rpb)   -0.10    -0.19     0.18     0.12     0.00
Disc. Index            -0.12    -0.13     0.22     0.04     0.00
Upper 27%               0.26     0.04     0.64     0.06     0.00
Lower 27%               0.38     0.17     0.42     0.02     0.00
12. ITEM ANALYSIS EXAMPLES
Item # 34 | Correct Answer: B | Diff(p): 0.71 | Upper: 90.00% | Lower: 55.77% | Disc. Index: 0.34 | Point Biserial: 0.31
Response Frequencies (*Indicates correct answer)
                           A        B        C        D        E
Count                      0     *129        1       30       21
% Selected              0.00    71.27     0.55    16.57    11.60
Point Biserial (rpb)    0.00     0.31    -0.16    -0.25    -0.11
Disc. Index             0.00     0.34    -0.02    -0.23    -0.09
Upper 27%               0.00     0.90     0.00     0.06     0.04
Lower 27%               0.00     0.56     0.02     0.29     0.13
13. GENERAL GUIDELINES
Desired statistical ranges - opinions differ, but the most commonly used are:
• Item Difficulty/p Value - Acceptable item difficulty is not a set number; it depends on the intention of the question. If you intended the item to be a mastery item, you want the difficulty as close to 1.00 as possible. If you wanted a discriminating question, significantly lower levels are acceptable.
• Upper 27% - if less than 60% of your top performers are getting a question correct, further analysis is needed to see if there are issues with the question. Also, if fewer of your upper 27% than your lower 27% get a question correct, then there is also an issue.
• Lower 27% - generally you never want it to be higher than the upper 27%. As low as 0% can be acceptable, and as high as 100% can be acceptable if it is a mastery question.
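The two Upper/Lower 27% checks above can be written as a small review helper. This is only a sketch of the rules as stated on the slide; the function name, the flag wording, and the `upper_floor` parameter are assumptions made for illustration.

```python
from typing import List


def flag_group_performance(upper_p: float, lower_p: float, upper_floor: float = 0.60) -> List[str]:
    """Return review flags for an item, given the proportion correct in the
    upper and lower 27% groups (as decimals between 0 and 1)."""
    flags = []
    if upper_p < upper_floor:
        flags.append("fewer than 60% of top performers correct: review the question")
    if upper_p < lower_p:
        flags.append("lower 27% outperformed upper 27%: likely a problem with the item")
    return flags


# Upper/Lower figures from three of the example slides (items 7, 22 and 82).
for item_no, upper, lower in [(7, 0.82, 0.4615), (22, 0.52, 0.2692), (82, 0.25, 0.825)]:
    print(item_no, flag_group_performance(upper, lower))
```

A flag is a prompt to look at the question, not a verdict: a hard but sound discriminating item can fall under the 60% line.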
14. GENERAL GUIDELINES
Desired statistical ranges - opinions differ, but the most commonly used are:
• Discrimination index – some set specific numbers for acceptable and unacceptable values; I would argue the more accurate guide is that the lower the p value, the higher the discrimination index needs to be. Generally, at 0.2 or above the item is considered to have discriminated; less than that is considered no discrimination. 0.3 or greater is considered highly discriminating.
• Point-Biserial – as with the discrimination index, some set specific numbers for acceptable and unacceptable values. Generally, 0.2 and above is considered to show discrimination and a positive association with overall performance on the assessment; lower levels are acceptable for mastery questions, and 0.3+ would be desired for discriminating questions.
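The cut-offs quoted above translate into a simple lookup. The sketch below uses only the 0.2 and 0.3 thresholds from the slide; the function names and label wording are illustrative assumptions, and no cut-off is set for mastery items because the slide does not give one.

```python
def classify_discrimination(disc_index: float) -> str:
    """Label a discrimination index using the commonly quoted 0.2 / 0.3 cut-offs."""
    if disc_index >= 0.3:
        return "highly discriminating"
    if disc_index >= 0.2:
        return "discriminating"
    return "no discrimination"


def classify_point_biserial(rpb: float) -> str:
    """Label a point-biserial value using the same 0.2 / 0.3 cut-offs."""
    if rpb >= 0.3:
        return "at the level desired for discriminating questions"
    if rpb >= 0.2:
        return "positive association with overall performance"
    return "below 0.2: may still be acceptable for a mastery question"


# Disc. Index and Point Biserial values from the example slides.
for item_no, disc, rpb in [(1, 0.04, 0.10), (7, 0.36, 0.28), (22, 0.25, 0.22), (82, -0.57, -0.43)]:
    print(item_no, classify_discrimination(disc), "|", classify_point_biserial(rpb))
```

Item 1 lands in the lowest band on both measures, yet with a p value of 0.98 that is what a mastery item is expected to look like, which is why the labels should be read alongside the question's intent.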
15. GENERAL GUIDELINES
KR-20
Used as an overall measure of reliability for the assessment. Measured on a scale from 0.0 to 1.0, with 0.0 being very poor and 1.0 being excellent.
Quick notes:
• Heavily influenced by the number of questions in the assessment
• Heavily influenced by the number of students taking the assessment
• The combination can FREQUENTLY lead to false positive and false negative KR-20 values.
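For reference, KR-20 (Kuder-Richardson Formula 20) is usually written as k/(k-1) × (1 − Σ p·q / variance of total scores), where k is the number of items, p is each item's proportion correct and q = 1 − p. The sketch below is one minimal way to compute it; the function name is illustrative, and using the population variance of the totals (dividing by n) is an assumption, since conventions differ. With real data the formula can also fall below 0.

```python
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """KR-20 reliability for dichotomous (0/1) item scores.

    `responses` has one row per student and one 0/1 entry per item.
    """
    n = len(responses)      # number of students
    k = len(responses[0])   # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Assumption: population variance of the total scores.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if k < 2 or var_total == 0:
        raise ValueError("KR-20 needs at least 2 items and some variation in total scores")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 4 students, 3 items.
example_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(example_responses))
```

Both k and the spread of total scores appear directly in the formula, which is one way to see why the number of questions and the size and mix of the student group influence the result so strongly.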
16. EXTRANEOUS FACTORS
Stats alone do not tell the whole story:
• Student behavior
– Cheating
– Return on investment
• Conflicting content/faculty
• “six degrees from Sunday”
Ways to increase the accuracy/usefulness of your stats:
• Item review process
– Format
– Level of difficulty
– Alternative correct options
• Historical item analysis
– Across assessments
– Across versions
• Reuse/Recycle
18. EXAMSOFT FIT THE DATA YOU NEED
• Simplified and detailed versions of item analysis reports
• Historical item analysis data by version, assessment and in aggregate
• Ability to pull item analysis by discipline/question author/category
19. For More Information:
Call: 1.866.429.8889
Email: info@examsoft.com
Visit: learn.examsoft.com