Adaptive Testing
(Item Response Theory)
Timothy K. Shih
Item Response Theory
1. The Item Characteristic Curve
2. Item Characteristic Curve Models
3. Estimating Item Parameters
4. The Test Characteristic Curve
5. Estimating an Examinee’s Ability
6. The Information Function
7. Test Calibration
8. Specifying the Characteristics of a Test
Source: FRANK B. BAKER, University of Wisconsin
Item Characteristic Curve
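• What is an item characteristic curve?
– At each ability level, there is a certain probability that an examinee with that ability will give a correct answer to the item
– This probability is denoted by P(θ); the item characteristic curve (ICC) plots P(θ) against ability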
1. The Item Characteristic Curve
Figure: item characteristic curve under the one-parameter model
Higher ability → higher probability of a correct response
Figure: three item characteristic curves with the same discrimination
Higher difficulty → lower probability of a correct response
Figure: three item characteristic curves with the same difficulty
Higher discrimination → steeper curve: the probability of a correct response changes more rapidly as ability passes the difficulty level
Logistic Function
• The Logistic Function
– e is the constant 2.718
– b is the difficulty
• typical values lie between -3 and +3
– a is the discrimination
• typical values lie between -2.80 and +2.80
– L = a(Θ-b) is the logistic deviate
– Θ is an ability level
$$P(\theta) = \frac{1}{1 + e^{-L}} = \frac{1}{1 + e^{-a(\theta - b)}}$$
2. Item Characteristic Curve Models
Logistic Function
(two-parameter model)
• Example:
– b = 1.0 (difficulty); a = 0.5 (discrimination)
– Illustrative computation with ability level: -3 (Θ=-3)
1. L = a(Θ-b) = 0.5*(-3.0-1.0) = -2.0
2. EXP(-L) = EXP(2.0) = 2.718^2.0 = 7.389
3. 1 + EXP(-L) = 1 + 7.389 = 8.389
4. P(Θ) = 1/(1+EXP(-L)) = 1/8.389 = 0.12
Logistic Function
(two-parameter model)
Ability Logit EXP(-L) 1+EXP(-L) P
-3 -2 7.389 8.389 0.12
-2 -1.5 4.482 5.482 0.18
-1 -1 2.718 3.718 0.27
0 -0.5 1.649 2.649 0.38
1 0 1 2 0.5
2 0.5 0.607 1.607 0.62
3 1 0.368 1.368 0.73
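The table above can be regenerated with a few lines of code. This is a minimal Python sketch, assuming the two-parameter formula exactly as given on the previous slide; the function name `icc_2pl` is a hypothetical helper, not part of any IRT package.

```python
import math


def icc_2pl(theta: float, b: float, a: float) -> float:
    """Two-parameter logistic ICC: P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
    logit = a * (theta - b)  # L, the logistic deviate
    return 1.0 / (1.0 + math.exp(-logit))


# Rebuild the table for b = 1.0, a = 0.5
# (columns: ability, logit, EXP(-L), 1 + EXP(-L), P).
b, a = 1.0, 0.5
for theta in range(-3, 4):
    L = a * (theta - b)
    print(f"{theta:>3} {L:>5.1f} {math.exp(-L):>7.3f} {1 + math.exp(-L):>7.3f} "
          f"{icc_2pl(theta, b, a):.2f}")
```

Passing a = 1.0 gives the one-parameter (Rasch) curve, so the same helper also reproduces the one-parameter table.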
Figure: ICC for the two-parameter model with b = 1.0 (difficulty); a = 0.5 (discrimination)
Logistic Function
(one-parameter model)
• One Parameter Logistic Model (Rasch)
– The discrimination parameter of the two-
parameter logistic model is fixed at a value
of a = 1.0 for all items; only the difficulty
parameter can take on different values
$$P(\theta) = \frac{1}{1 + e^{-1(\theta - b)}} = \frac{1}{1 + e^{-(\theta - b)}}$$
b = difficulty
a = discrimination (fixed at 1.0)
Logistic Function
(one-parameter model)
• Example:
– b = 1.0 (difficulty)
– Illustrative computation with ability level: -3 (Θ=-3)
1. L = Θ-1.0 = -3.0-1.0 = -4.0
2. EXP(-L) = EXP(4.0) = 2.718^4.0 = 54.598
3. 1 + EXP(-L) = 1 + 54.598 = 55.598
4. P(Θ) = 1/(1+EXP(-L)) = 1/55.598 = 0.02
Logistic Function
(one-parameter model)
Ability Logit EXP(-L) 1+EXP(-L) P
-3 -4 54.598 55.598 0.02
-2 -3 20.086 21.086 0.05
-1 -2 7.389 8.389 0.12
0 -1 2.718 3.718 0.27
1 0 1 2 0.5
2 1 0.368 1.368 0.73
3 2 0.135 1.135 0.88
Figure: ICC for the one-parameter model with a = 1.0 (fixed), b = 1.0
Logistic Function
(three-parameter model)
• Three Parameter Model
– One of the facts of life in testing is that examinees
will get items correct by guessing. Thus, the
probability of correct response includes a small
component that is due to guessing.
– b is difficulty
– a is discrimination
– c is guessing
» Theoretical values lie between 0 and 1.0
» But values of c > 0.35 are not considered acceptable
» Hence, in practice, c lies between 0 and 0.35
– Θ is an ability level
$$P(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}$$
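This is one reason multiple-choice questions commonly have four answer choices: a blind guess then succeeds with probability 1/4 = 0.25, within the acceptable range for c.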
Logistic Function
(three-parameter model)
• Example:
– b = 1.5 (difficulty); a = 1.3 (discrimination); c = 0.2 (guessing)
– Illustrative computation with ability level: -3 (Θ=-3)
1. L = a(Θ-b) = 1.3*(-3.0-1.5) = -5.85
2. EXP(-L) = EXP(5.85) = 2.718^5.85 = 347.234
3. 1 + EXP(-L) = 1 + 347.234 = 348.234
4. 1/(1+EXP(-L)) = 1/348.234 = 0.0029
5. P(Θ) = c + (1 - c) * 0.0029 = 0.2 + (1 - 0.2) * 0.0029
= 0.2 + 0.8 * 0.0029
= 0.2 + 0.0023
= 0.2023
Logistic Function
(three-parameter model)
Ability Logit EXP(-L) 1+EXP(-L) P
-3 -5.85 347.234 348.234 0.2
-2 -4.55 94.632 95.632 0.21
-1 -3.25 25.79 26.79 0.23
0 -1.95 7.029 8.029 0.3
1 -0.65 1.916 2.916 0.47
2 0.65 0.522 1.522 0.73
3 1.95 0.142 1.142 0.9
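The P column above follows directly from the three-parameter formula. The sketch below is a minimal Python illustration of that formula; `icc_3pl` is a hypothetical helper name. Setting c = 0 reduces it to the two-parameter model.

```python
import math


def icc_3pl(theta: float, b: float, a: float, c: float) -> float:
    """Three-parameter logistic ICC: P = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Rebuild the P column for b = 1.5, a = 1.3, c = 0.2.
for theta in range(-3, 4):
    print(theta, round(icc_3pl(theta, b=1.5, a=1.3, c=0.2), 2))
```

At very low ability the curve flattens out at c = 0.2 instead of approaching zero, which is the guessing component described above.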
Figure: ICC for the three-parameter model with b = 1.5 (difficulty); a = 1.3 (discrimination); c = 0.2 (guessing)
Negative Discrimination
• Most test items discriminate in a positive manner:
– the probability of correct response increases
as the ability level increases
• Some items have negative
discrimination. In such items, the
probability of correct response decreases
as the ability level increases from low to
high
Figure: item characteristic curve with negative discrimination
Negative Discrimination
Items with negative discrimination occur in
two ways.
• the incorrect response to a two-choice item will always
have a negative discrimination parameter if the correct
response has a positive value.
• sometimes the correct response to an item will yield a
negative discrimination index.
• This tells you that something is wrong with the item:
– Either it is poorly written or there is some
misinformation prevalent among the high-ability
students.
• For most of the item response theory topics of
interest, the value of the discrimination parameter
will be positive.
Discussion
Figure: item characteristic curves labeled "Correct" and "Incorrect" for the two responses to the same item
1. The two item characteristic curves have
the same value for the difficulty
parameter (b = 1.0)
2. And the discrimination parameters have
the same absolute value. However, they
have opposite signs, with the correct
response being positive and the
incorrect response being negative.
Observed Proportion
• M examinees respond to the N items in the test
– The examinees are divided into J groups along the ability scale so that all the examinees within a given group have the same ability level θj
• There are mj examinees within group j, where j = 1, 2, ..., J
– Within a particular ability group, rj examinees answer the given item correctly
• At an ability level of θj, the observed proportion of correct response is p(θj) = rj/mj
• p(θj) is an estimate of the probability of correct response at ability level θj
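The grouping step can be sketched in a few lines. This minimal Python example assumes, as the slide does, that each examinee's ability is already known and that examinees in a group share exactly the same θ value; `observed_proportions` is a hypothetical helper and the sample data are made up.

```python
from collections import defaultdict


def observed_proportions(abilities, responses):
    """Group examinees by ability level; return {theta_j: (m_j, r_j, p_j)}.

    abilities: ability level of each examinee; responses: 1 (correct) or 0.
    """
    counts = defaultdict(lambda: [0, 0])  # theta_j -> [m_j, r_j]
    for theta, u in zip(abilities, responses):
        counts[theta][0] += 1  # m_j: examinees at this ability level
        counts[theta][1] += u  # r_j: correct responses at this level
    return {th: (m, r, r / m) for th, (m, r) in sorted(counts.items())}


print(observed_proportions([-1, -1, 0, 0, 0, 1], [0, 1, 0, 1, 1, 1]))
```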
3. Estimating Item Parameters
Observed Proportion
• The observed proportions of correct response in each ability group can be plotted against ability
Figure: observed proportions of correct response by ability group
Finding the ICC that best fits the observed
proportions of correct response
1. Select a model for the curve to be fitted
– the two-parameter model is used here
2. Choose initial values for the item parameters
– b = 0.0, a = 1.0
3. Using these estimates, the value of P(θj) is computed at each ability
level via the equation of the two-parameter model.
4. The agreement of the observed value of p(θj) and computed value P(θj)
is determined across all ability groups.
5. Adjustments to the estimated item parameters are found that result in
better agreement between the ICC defined by the estimated values of
the parameters and the observed proportions of correct response.
6. This process is continued until the adjustments get so small that little
improvement in the agreement is possible.
7. At this point, the estimation procedure is terminated and the current
values of b and a are the item parameter estimates.
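Steps 2 to 7 can be illustrated with a maximum likelihood fit for one item. The sketch below is one possible implementation, not necessarily the procedure used in the source: it applies Newton-Raphson to the equivalent intercept/slope form logit P = ζ + λθ (so a = λ and b = -ζ/λ), starting from b = 0.0, a = 1.0 and stopping when the adjustments become negligible. The name `fit_item_2pl` and the tolerance are assumptions.

```python
import math


def fit_item_2pl(thetas, m, r, max_iter=100, tol=1e-10):
    """Maximum likelihood fit of (b, a) for one item to grouped data.

    thetas: ability level of each group; m: examinees per group;
    r: correct responses per group. Works on logit P = zeta + lam * theta,
    so that a = lam and b = -zeta / lam.
    """
    zeta, lam = 0.0, 1.0  # same as the starting values b = 0.0, a = 1.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for th, m_j, r_j in zip(thetas, m, r):
            p = 1.0 / (1.0 + math.exp(-(zeta + lam * th)))
            resid = r_j - m_j * p  # observed minus expected correct responses
            w = m_j * p * (1.0 - p)
            g0 += resid
            g1 += resid * th
            h00 += w
            h01 += w * th
            h11 += w * th * th
        det = h00 * h11 - h01 * h01
        d_zeta = (h11 * g0 - h01 * g1) / det  # Newton-Raphson adjustments
        d_lam = (h00 * g1 - h01 * g0) / det
        zeta += d_zeta
        lam += d_lam
        if abs(d_zeta) < tol and abs(d_lam) < tol:  # adjustments negligible
            break
    return -zeta / lam, lam  # (b, a)
```

Fed with r_j = m_j P(θ_j) computed exactly from a known item, the fit returns that item's b and a, which is a convenient sanity check.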
The Chi-square goodness-of-fit index
$$\chi^2 = \sum_{j=1}^{J} m_j \, \frac{\left[p(\theta_j) - P(\theta_j)\right]^2}{P(\theta_j)\, Q(\theta_j)}$$
– J is the number of ability groups
– Θj is the ability level of group j
– mj is the number of examinees having ability Θj
– p(Θj) is the observed proportion of correct response for group j
– P(Θj) is the probability of correct response for group j computed from the ICC model using the parameter estimates
– Q(Θj) = 1 - P(Θj)
The Chi-square goodness-of-fit index
• If the value of the chi-square goodness-of-fit index is greater than a criterion value
– the item characteristic curve specified by the values of the item parameter estimates does not fit the data. Possible reasons:
• the wrong item characteristic curve model may have been employed;
• the observed proportions of correct response are so widely scattered that a good fit, regardless of model, cannot be obtained.
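The index and the criterion check can be sketched as follows. This is a minimal Python illustration of the formula above; the slides do not state a criterion value, so `criterion` is left as a caller-supplied assumption, and both function names are hypothetical.

```python
def chi_square_fit(m, p_obs, p_model):
    """Chi-square goodness-of-fit index over the J ability groups.

    m: examinees per group; p_obs: observed proportions p(theta_j);
    p_model: probabilities P(theta_j) from the fitted ICC.
    """
    total = 0.0
    for m_j, p_j, P_j in zip(m, p_obs, p_model):
        Q_j = 1.0 - P_j
        total += m_j * (p_j - P_j) ** 2 / (P_j * Q_j)
    return total


def icc_fits(m, p_obs, p_model, criterion):
    """True when the index does not exceed the caller-supplied criterion."""
    return chi_square_fit(m, p_obs, p_model) <= criterion


print(chi_square_fit([10], [0.6], [0.5]))
```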
The Group Invariance of
Item Parameters
• Assume two groups of examinees are
drawn from the same population of
examinees
• The first group has a range of ability scores from -3 to -1, with a mean of -2; the second group has a range of ability scores from +1 to +3, with a mean of +2
• the observed proportion of correct response
to a given item is computed from the item
response data for every ability level within
each of the two groups.
The Group Invariance of
Item Parameters
For the first group, the observed proportions of correct response are plotted against ability.
The maximum likelihood procedure is then used to fit an item characteristic curve to the data, yielding the item parameter estimates b(1) = -0.39 and a(1) = 1.27.
The Group Invariance of
Item Parameters
For the second group, the observed proportions of correct response are plotted against ability.
The maximum likelihood procedure is again used to fit an item characteristic curve to the data, yielding the item parameter estimates b(2) = -0.39 and a(2) = 1.27.
The Group Invariance of
Item Parameters
• b(1) = b(2) and a(1) = a(2)
• The item parameters are group invariant.
• The values of the item parameters are a property of the item, not of
the group that responded to the item.
• The value of the classical item difficulty index is not group invariant.
True score
$$TS_j = \sum_{i=1}^{N} P_i(\theta_j)$$
TSj is the true score for examinees with ability level θj.
i denotes an item
Pi(θj ) depends upon the particular ICC model employed (i.e.,
computed from the ICC model)
4. The Test Characteristic Curve
True score
• Example
– Two-parameter model, at an ability level of θ = 1.0
– Item 1:
P1(1.0) = 1/(1 + exp(-0.5(1.0 - (-1.0)))) = 0.73
– Item 2:
P2(1.0) = 1/(1 + exp(-1.2(1.0 - 0.75))) = 0.57
– Item 3:
P3(1.0) = 1/(1 + exp(-0.8(1.0 - 0))) = 0.69
– Item 4:
P4(1.0) = 1/(1 + exp(-1.0(1.0 - 0.5))) = 0.62
True score
$$TS = \sum_{i=1}^{4} P_i(1.0) = 0.73 + 0.57 + 0.69 + 0.62 = 2.61$$
Test Characteristic Curve
• Test Characteristic Curve (TCC)
– The vertical axis would be the true scores and
would range from zero to the number of items in the
test
– The horizontal axis would be the ability scale
Test Characteristic Curve
• The primary role of the TCC in IRT is to
provide a means of transforming ability
scores to true scores
• Given an examinee's ability, the TCC provides the corresponding true score
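The true-score example and the TCC can be computed together. This minimal Python sketch assumes the two-parameter model and the (b, a) values from the four-item example; `p_2pl` and `true_score` are hypothetical helper names.

```python
import math


def p_2pl(theta, b, a):
    """Two-parameter ICC value P_i(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def true_score(theta, items):
    """Test characteristic curve: TS(theta) = sum over items of P_i(theta)."""
    return sum(p_2pl(theta, b, a) for b, a in items)


# (b, a) for the four items in the true-score example.
items = [(-1.0, 0.5), (0.75, 1.2), (0.0, 0.8), (0.5, 1.0)]
print(f"TS(1.0) = {true_score(1.0, items):.3f}")

# Tracing the TCC: true score (between 0 and N) against ability.
for theta in range(-3, 4):
    print(theta, round(true_score(theta, items), 2))
```

It prints TS(1.0) = 2.618; the slide's 2.61 comes from summing probabilities that were already rounded to two decimals.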
Primary purpose for administering
a test to an examinee
• Under IRT, the primary purpose for
administering a test to an examinee is to
locate that person on the ability scale. If
such an ability measure can be obtained for
each person taking the test, two goals can
be achieved.
– The examinee can be evaluated in terms of how
much underlying ability he or she possesses.
– Comparisons among examinees can be made
for purposes of assigning grades, awarding
scholarships, etc.
5. Estimating an Examinee’s Ability
Estimating an Examinee’s Ability
• Ability Estimation Procedures
$$\hat{\theta}_{s+1} = \hat{\theta}_s + \frac{\sum_{i=1}^{N} a_i \left[u_i - P_i(\hat{\theta}_s)\right]}{\sum_{i=1}^{N} a_i^2 \, P_i(\hat{\theta}_s)\, Q_i(\hat{\theta}_s)}$$
θ̂s is the estimated ability of the examinee within iteration s
ai is the discrimination parameter of item i, i = 1, 2, ..., N
ui is the response made by the examinee to item i:
ui = 1 for a correct response
ui = 0 for an incorrect response
Pi(θ̂s) is the probability of correct response to item i, under the given ICC model, at ability level θ̂s within iteration s
Qi(θ̂s) = 1 - Pi(θ̂s) is the probability of incorrect response to item i, under the given ICC model, at ability level θ̂s within iteration s
Estimating an Examinee’s Ability
• Example
– 3-item test:
• Item_1: b = -1; a = 1.0
• Item_2: b = 0; a = 1.2
• Item_3: b = 1; a = 0.8
– Under the two-parameter ICC model
– The examinee's item responses were:
• Item_1: 1
• Item_2: 0
• Item_3: 1
1st iteration: the examinee's ability is initially set to θ̂s = 1.0
item u P(1.0) Q=(1-P) a(u-P) a*a(PQ)
1 1 0.88 0.12 0.119 0.105
2 0 0.77 0.23 -0.922 0.256
3 1 0.50 0.50 0.400 0.160
sum -0.403 0.521
Δθ̂s = -0.403/0.521 = -0.773, θ̂s+1 = 1.0 - 0.773 = 0.227
Estimating an Examinee's Ability
2nd iteration: θ̂s = 0.227
item u P(0.227) Q=(1-P) a(u-P) a*a(PQ)
1 1 0.77 0.23 0.227 0.175
2 0 0.57 0.43 -0.681 0.353
3 1 0.35 0.65 0.520 0.146
sum 0.066 0.674
Δθ̂s = 0.066/0.674 = 0.097 (from unrounded sums), θ̂s+1 = 0.227 + 0.097 = 0.324
3rd iteration: θ̂s = 0.324
item u P(0.324) Q=(1-P) a(u-P) a*a(PQ)
1 1 0.79 0.21 0.2102 0.1660
2 0 0.60 0.40 -0.7152 0.3467
3 1 0.37 0.63 0.5056 0.1488
sum 0.0006 0.6615
Δθ̂s = 0.0006/0.6615 = 0.0009, θ̂s+1 = 0.324 + 0.0009 = 0.3249
The iteration is terminated because
the value of the adjustment (0.0009)
is very small.
The examinee’s estimated ability is
0.3249
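The iterations above can be automated. This is a minimal Python sketch of the same update rule under the two-parameter model, stopping once the adjustment falls below a tolerance (0.001 here is an assumed threshold). It also refuses the all-correct and all-incorrect patterns, for which, as noted below, the estimate is infinite. `estimate_ability` and `p_2pl` are hypothetical names.

```python
import math


def p_2pl(theta, b, a):
    """Two-parameter ICC value P_i(theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(responses, items, theta=1.0, tol=0.001, max_iter=50):
    """Newton-Raphson maximum likelihood ability estimate under the 2PL model.

    responses: 0/1 item scores u_i; items: (b, a) per item; theta: starting value.
    """
    if all(u == 1 for u in responses) or all(u == 0 for u in responses):
        # Perfect or zero scores: the ML estimate is +/- infinity.
        raise ValueError("ability cannot be estimated for an all-correct "
                         "or all-incorrect pattern")
    for _ in range(max_iter):
        num = 0.0  # sum of a_i * (u_i - P_i)
        den = 0.0  # sum of a_i^2 * P_i * Q_i
        for u, (b, a) in zip(responses, items):
            p = p_2pl(theta, b, a)
            num += a * (u - p)
            den += a * a * p * (1.0 - p)
        delta = num / den  # adjustment for this iteration
        theta += delta
        if abs(delta) < tol:  # adjustment small enough: stop iterating
            break
    return theta


items = [(-1.0, 1.0), (0.0, 1.2), (1.0, 0.8)]  # (b, a) for Item_1..Item_3
theta_hat = estimate_ability([1, 0, 1], items, theta=1.0)
print(round(theta_hat, 3))
```

Starting from θ̂ = 1.0 as in the example, the estimate settles near the slides' 0.3249; small differences in the last digit come from the slides rounding intermediate values.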
Standard error
• The standard error is a measure of the
variability of the values of θ^ around the
examinee’s unknown parameter value θ.
$$SE(\hat{\theta}) = \frac{1}{\sqrt{\sum_{i=1}^{N} a_i^2 \, P_i(\hat{\theta})\, Q_i(\hat{\theta})}}$$
Standard error
Using the sum from the 3rd-iteration table (Σ a²PQ = 0.6615 at θ̂ = 0.324):
$$SE(\hat{\theta}) = \frac{1}{\sqrt{0.6615}} = 1.23$$
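The standard error uses the same denominator as the estimation step. This minimal Python sketch evaluates it under the two-parameter model for the three-item example; `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(theta, items):
    """SE(theta) = 1 / sqrt(sum of a_i^2 * P_i * Q_i) under the 2PL model.

    items: list of (b, a) pairs.
    """
    info = 0.0
    for b, a in items:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        info += a * a * p * (1.0 - p)
    return 1.0 / math.sqrt(info)


items = [(-1.0, 1.0), (0.0, 1.2), (1.0, 0.8)]  # (b, a) for the three items
print(round(standard_error(0.3249, items), 2))
```

The sum under the square root is the test information at θ̂, which is why the standard error shrinks as more informative items are added.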
Estimating an Examinee’s Ability
• The examinee’s ability (0.3249) was not
estimated very precisely because the
standard error is very large (1.23).
– This is primarily due to the fact that only
three items were used here and one would
not expect a very good estimate.
Estimating an Examinee's Ability
• There are two cases in which the estimation procedure fails
to yield an ability estimate
– When an examinee answers none of the items
correctly
• the corresponding ability estimate is negative infinity.
– When an examinee answers all the items in the
test correctly
• the corresponding ability estimate is positive infinity.
• The computer programs used to estimate
ability must protect themselves against
these two conditions
Item Invariance of an
Examinee’s Ability Estimate
• The examinee's ability is invariant with respect to the items used to determine it, provided that:
– all the items measure the same underlying latent trait
– the values of all the item parameters are in a common metric
Item Invariance of an
Examinee’s Ability Estimate
• A set of 10 items having an average difficulty of -2 was administered to this examinee
– the item responses could be used to estimate the examinee's ability, yielding θ̂1 for this test.
• Another set of 10 items having an average difficulty of +1 was also administered to this examinee
– these item responses could be used to estimate the examinee's ability, yielding θ̂2 for this second test.
• Under the item invariance principle
– θ̂1 = θ̂2
– i.e., the two sets of items should yield the same ability estimate, within sampling variation, for the examinee
The Information Function
• What’s “Information”
– having information => knowing something
about a particular object or topic
– In statistics and psychometrics, information is defined as the reciprocal of the precision with which a parameter can be estimated
6. The Information Function
The Information Function
• The measure of precision is the variance of the estimator, denoted by σ²
• The amount of information, denoted by I, is
$$I = \frac{1}{\sigma^2}$$
The Information Function
• If the amount of information is large, it
means that an examinee whose true ability
is at that level can be estimated with
precision;
– i.e., all the estimates will be reasonably close to
the true value
• If the amount of information is small, it
means that the ability cannot be estimated
with precision and the estimates will be
widely scattered about the true ability
The Information Function
Figure: example information function
The amount of information has a maximum at an ability level of -1.0 and is about 3 for the ability range -2 ≤ θ ≤ 0.
Within this range, ability is estimated with some precision.
Outside this range, the amount of information decreases rapidly, and the corresponding ability levels are not estimated very well.
• The information function does not
depend upon the distribution of
examinees over the ability scale.
• In a general purpose test, the ideal
information function would be a
horizontal line at some large value of
I and all ability levels would be
estimated with the same precision.
• Unfortunately, such an information
function is hard to achieve.
• Different ability levels are estimated
with differing degrees of precision.
Item Information Function
1. The amount of information, based upon a single item, can be
computed at any ability level and is denoted by Ii (θ ), where i
indexes the item.
2. Because only a single item is involved, the amount of information at
any point on the ability scale is going to be rather small.
3. The amount of item information decreases as the ability level
departs from the item difficulty and approaches zero at the extremes
of the ability scale.
Definition of Item Information
• Two-Parameter Item Characteristic
Curve Model
$$I_i(\theta) = a_i^2 \, P_i(\theta)\, Q_i(\theta)$$
ai is the discrimination parameter for item i
Pi(θ) = 1 / (1 + EXP(-ai(θ - bi)))
Qi(θ) = 1 - Pi(θ)
θ is the ability level of interest
Definition of Item Information
θ L EXP(-L) Pi(θ) Qi(θ) Pi(θ)Qi(θ) a2 Ii(θ)
-3 -6 403.43 0.00 1.00 0.00 2.25 0.00
-2 -4.5 90.02 0.01 0.99 0.01 2.25 0.02
-1 -3.0 20.09 0.05 0.95 0.05 2.25 0.11
0 -1.5 4.48 0.18 0.82 0.15 2.25 0.34
1 0.0 1.00 0.50 0.50 0.25 2.25 0.56
2 1.5 0.22 0.82 0.18 0.15 2.25 0.34
3 3.0 0.05 0.95 0.05 0.05 2.25 0.11
Calculation of item information under a two-parameter model
b = 1.0, a = 1.5
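The Ii(θ) column can be reproduced directly from the two-parameter item information formula. This is a minimal Python sketch; `item_info_2pl` is a hypothetical helper name, and passing a = 1.0 gives the one-parameter case on the next slide.

```python
import math


def item_info_2pl(theta, b, a):
    """Item information I_i(theta) = a^2 * P_i(theta) * Q_i(theta)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


# Rebuild the I_i(theta) column for b = 1.0, a = 1.5.
for theta in range(-3, 4):
    print(theta, round(item_info_2pl(theta, b=1.0, a=1.5), 2))
```

Because P·Q is largest (0.25) when P = 0.5, the information peaks at θ = b with value a²/4, which is 0.5625 for this item.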
Figure: item information under the two-parameter model (b = 1.0, a = 1.5)
Definition of Item Information
• One-Parameter Item Characteristic
Curve Model
$$I_i(\theta) = P_i(\theta)\, Q_i(\theta)$$
Pi(θ) = 1 / (1 + EXP(-(θ - bi)))
Qi(θ) =1 - Pi(θ)
θ is the ability level of interest
Definition of Item Information
θ L EXP(-L) Pi(θ) Qi(θ) Pi(θ)Qi(θ) a2 Ii(θ)
-3 -4.0 54.60 0.02 0.98 0.02 1 0.02
-2 -3.0 20.09 0.05 0.95 0.05 1 0.05
-1 -2.0 7.39 0.12 0.88 0.11 1 0.11
0 -1.0 2.72 0.27 0.73 0.20 1 0.20
1 0.0 1.00 0.50 0.50 0.25 1 0.25
2 1.0 0.37 0.73 0.27 0.20 1 0.20
3 2.0 0.14 0.88 0.12 0.11 1 0.11
Calculation of item information under a one-parameter model
b = 1.0
Figure: item information under the one-parameter model (b = 1.0)
Definition of Item Information
• Three-Parameter Item Characteristic
Curve Model
$$I_i(\theta) = a^2 \, \frac{Q_i(\theta)}{P_i(\theta)} \, \frac{\left[P_i(\theta) - c\right]^2}{(1 - c)^2}$$
Pi(θ) = c + (1 - c) (1/(1 + EXP (-L)))
L = a (θ - b)
Qi(θ) =1 - Pi(θ)
θ is the ability level of interest
Definition of Item Information
• Example
– b = 1.0;
a = 1.5;
c = 0.2
– ability level of θ = 0.0.
1. L = a(θ - b) = 1.5(0 - 1) = -1.5
EXP(-L) = EXP(1.5) = 4.482
1/(1 + EXP(-L)) = 1/(1 + 4.482) = 0.182
Pi(θ) = c + (1 - c)(1/(1 + EXP(-L)))
= 0.2 + 0.8(0.182)
= 0.346
2. Qi(θ) = 1 - 0.346 = 0.654
3. Qi(θ)/Pi(θ) = 0.654/0.346 = 1.890
4. (Pi(θ) - c)² = (0.346 - 0.2)²
= (0.146)²
= 0.021
5. (1 - c)² = (1 - 0.2)² = (0.8)² = 0.64
6. a² = (1.5)² = 2.25
7. Ii(θ) = (2.25)(1.890)(0.021)/(0.64)
= 0.142 (carrying unrounded intermediate values)
Definition of Item Information
θ L Pi(θ) Qi(θ) Qi(θ)/Pi(θ) (Pi(θ)-c)² Ii(θ)
-3 -6.0 0.20 0.80 3.950 0.000 0.000
-2 -4.5 0.21 0.79 3.785 0.000 0.001
-1 -3.0 0.24 0.76 3.202 0.001 0.016
0 -1.5 0.35 0.65 1.890 0.021 0.142
1 0.0 0.60 0.40 0.667 0.160 0.375
2 1.5 0.85 0.15 0.171 0.428 0.257
3 3.0 0.96 0.04 0.040 0.481 0.082
Calculation of item information under a three-parameter model
b = 1.0; a = 1.5; c = 0.2
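The three-parameter item information can be computed the same way. This minimal Python sketch implements the formula above; `item_info_3pl` is a hypothetical name. With c = 0 it reduces to the two-parameter result a²PQ.

```python
import math


def item_info_3pl(theta, b, a, c):
    """Three-parameter item information:
    I_i(theta) = a^2 * (Q / P) * (P - c)^2 / (1 - c)^2.
    """
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    q = 1.0 - p
    return a * a * (q / p) * (p - c) ** 2 / (1.0 - c) ** 2


# Rebuild the I_i(theta) column for b = 1.0, a = 1.5, c = 0.2.
for theta in range(-3, 4):
    print(theta, round(item_info_3pl(theta, b=1.0, a=1.5, c=0.2), 3))
```

Comparing with the two-parameter table for the same b and a, guessing lowers the information (0.375 instead of 0.56 at θ = 1), so ability is estimated less precisely.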
Figure: item information under the three-parameter model (b = 1.0; a = 1.5; c = 0.2)
Test Information Function
$$I(\theta) = \sum_{i=1}^{N} I_i(\theta)$$
I (θ) is the amount of test information
at an ability level of θ
Ii(θ) is the amount of information for
item i at ability level θ
N is the number of items in the test
Computing a Test
Information Function
• Example
– 5-item
– Under two-parameter model
item b a
1 -1.0 2.0
2 -0.5 1.5
3 0.0 1.5
4 0.5 1.5
5 1.0 2.0
Computing a Test
Information Function
θ I1(θ) I2(θ) I3(θ) I4(θ) I5(θ) Test Information
-3 0.071 0.051 0.024 0.012 0.001 0.159
-2 0.420 0.194 0.102 0.051 0.010 0.777
-1 1.000 0.490 0.336 0.194 0.071 2.091
0 0.420 0.490 0.563 0.490 0.420 2.383
1 0.071 0.194 0.336 0.490 1.000 2.091
2 0.010 0.051 0.102 0.194 0.420 0.777
3 0.001 0.012 0.024 0.051 0.071 0.159
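The test information column is just the row sum of the item information values. This minimal Python sketch assumes the two-parameter model and the five (b, a) pairs from the example; `total_information` is a hypothetical name. It also prints 1/√I, the standard error of the ability estimate at each θ.

```python
import math


def item_info_2pl(theta, b, a):
    """Two-parameter item information a^2 * P * Q."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Test information: the sum of the item information values at theta."""
    return sum(item_info_2pl(theta, b, a) for b, a in items)


# (b, a) for the five items in the example.
items = [(-1.0, 2.0), (-0.5, 1.5), (0.0, 1.5), (0.5, 1.5), (1.0, 2.0)]
for theta in range(-3, 4):
    info = total_information(theta, items)
    # 1 / sqrt(I) is the standard error of the ability estimate at theta.
    print(f"{theta:>3} {info:6.3f}  SE = {1 / math.sqrt(info):.2f}")
```

Because the items are placed symmetrically around 0, the information curve is symmetric, as in the table.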
The Test Calibration Process
• The Birnbaum paradigm is an iterative
procedure employing two stages of
maximum likelihood estimation.
– Stage 1: the parameters of the N items in the
test are estimated,
– Stage 2: the ability parameters of the M
examinees are estimated.
• The two stages are performed iteratively
until a stable set of parameter estimates is
obtained
• At that point the test has been calibrated and an ability-scale metric defined
7. Test Calibration
The Test Calibration Process
• Stage one:
– The estimated ability of each examinee is treated as
if it is expressed in the true metric of the latent trait.
– The parameters of each item in the test are
estimated via the maximum likelihood procedure
discussed in Estimating Item Parameters.
– This is done one item at a time, because an
underlying assumption is that the items are
independent of each other.
– The result is a set of values for the estimates of the
parameters of the items in the test.
The Test Calibration Process
• Stage two:
– The ability of each examinee is estimated
using the maximum likelihood procedure
presented in Estimating an Examinee’s
Ability
– It is assumed that the ability of each
examinee is independent of all other
examinees. Hence, the ability estimates are
obtained one examinee at a time
The Test Calibration Process
• The two-stage process is repeated until
some suitable convergence criterion is
met
• The overall effect is that the parameters
of the N test items and the ability levels
of the M examinees have been estimated
simultaneously, even though they were
done one at a time
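The two-stage cycle can be sketched for the one-parameter (Rasch) model. This is a minimal Python illustration under stated assumptions: joint maximum likelihood with a single Newton step per parameter in each stage, and the metric anchored by centering item difficulties at 0 (the slides do not say how the metric is fixed). The names are hypothetical, the response matrix is made up, and the sketch is not claimed to reproduce the calibration tables below.

```python
import math


def rasch_p(theta, b):
    """One-parameter (Rasch) ICC with a fixed at 1."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def calibrate_rasch(data, max_cycles=500, tol=1e-8):
    """Two-stage joint ML calibration sketch.

    data: one list of 0/1 responses per examinee. Examinees and items with
    all-correct or all-incorrect patterns must be removed beforehand.
    Returns (item difficulties, examinee abilities).
    """
    n_items = len(data[0])
    b = [0.0] * n_items
    theta = [0.0] * len(data)
    for _ in range(max_cycles):
        change = 0.0
        # Stage 1: abilities held fixed; one Newton step per item, one at a time.
        for i in range(n_items):
            probs = [rasch_p(t, b[i]) for t in theta]
            resid = sum(row[i] - p for row, p in zip(data, probs))
            info = sum(p * (1.0 - p) for p in probs)
            step = -resid / info  # more correct than expected -> easier item
            b[i] += step
            change = max(change, abs(step))
        # Fix the metric (assumption): mean item difficulty = 0.
        mean_b = sum(b) / n_items
        b = [bi - mean_b for bi in b]
        theta = [t - mean_b for t in theta]
        # Stage 2: difficulties held fixed; one Newton step per examinee.
        for j, row in enumerate(data):
            probs = [rasch_p(theta[j], bi) for bi in b]
            resid = sum(u - p for u, p in zip(row, probs))
            info = sum(p * (1.0 - p) for p in probs)
            step = resid / info
            theta[j] += step
            change = max(change, abs(step))
        if change < tol:  # a stable set of estimates has been obtained
            break
    return b, theta


# Hypothetical 6-examinee x 4-item response matrix.
data = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0],
        [1, 1, 1, 0], [0, 1, 0, 1], [1, 1, 0, 1]]
b, theta = calibrate_rasch(data)
print([round(x, 2) for x in b], [round(t, 2) for t in theta])
```

Under this model, examinees with the same raw score end up with the same ability estimate, matching the Rasch property noted on the next slides.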
Test Calibration Under the
one-parameter Model
1 2 3 4 5 6 7 8 9 10 RS
01 0 0 1 0 0 0 0 1 0 0 2
02 1 0 1 0 0 0 0 0 0 0 2
03 1 1 1 0 1 0 1 0 0 0 5
04 1 1 1 0 1 0 0 0 0 0 4
05 0 0 0 0 1 0 0 0 0 0 1
06 1 1 0 1 0 0 0 0 0 0 3
07 1 0 0 0 0 1 1 1 0 0 4
08 1 0 0 0 1 1 0 0 1 0 4
09 1 0 1 0 0 1 0 0 1 0 4
10 1 0 0 0 1 0 0 0 0 1 3
11 1 1 1 1 1 1 1 1 1 0 9
12 1 1 1 1 1 1 1 1 1 0 9
13 1 1 1 0 1 0 1 0 0 1 6
14 1 1 1 1 1 1 1 1 1 0 9
15 1 1 0 1 1 1 1 1 1 1 9
16 1 1 1 1 1 1 1 1 1 1 10
(Rows are examinees, columns are items, RS is the raw score; 1 for correct and 0 for incorrect.)
If an item is answered correctly by all of the examinees or by none of the examinees, its item difficulty parameter cannot be estimated.
Test calibration under the Rasch model: all examinees having the same number of items correct will obtain the same estimated ability.
Test Calibration Under the
one-parameter Model
1 2 3 4 5 6 7 8 9 10 ROW
Total
1 1 1
2 1 2 1 4
3 2 1 1 1 1 6
4 4 1 2 2 3 1 1 2 16
5 1 1 1 1 1 5
6 1 1 1 1 1 1 6
9 4 4 2 4 4 4 4 4 4 2 36
COL
Total
13 8 8 5 10 7 7 6 7 3 74
(Rows are raw scores; columns are items.)
Test Calibration Under the one-
parameter Model
item difficulty
1 -2.37
2 -0.27
3 -0.27
4 0.98
5 -1
6 0.11
7 0.11
8 0.52
9 0.11
10 2.06
Examinee Ability obtained Raw Score
1 -1.50 2
2 -1.50 2
3 +0.02 5
4 -0.42 4
5 -2.37 1
6 -0.91 3
7 -0.42 4
8 -0.42 4
9 -0.42 4
10 -0.91 3
11 +2.33 9
12 +2.33 9
13 +0.46 6
14 +2.33 9
15 +2.33 9
16 ***** 10 (ability cannot be estimated for a perfect score)
Test Calibration Under the
one-parameter Model
• Under the Rasch model, the value of the
discrimination parameter is fixed at 1 for
all of the items in the test. This aspect of
the Rasch model is appealing to
practitioners because they intuitively feel
that examinees obtaining the same raw
test score should receive the same ability
estimate.
Test Calibration Under the
2/3-parameter Model
• When the two- and three-parameter item
characteristic curve models are used, an
examinee’s ability estimate depends
upon the particular pattern of item
responses rather than the raw score.
Test Calibration Under the
2/3-parameter Model
• Under these models, examinees with the
same item response pattern will obtain
the same ability estimate. Thus,
examinees with the same raw score
could obtain different ability estimates if
they answered different items correctly.
The Framework of IRT
• In order to obtain the many advantages
of IRT, tests should be designed,
constructed, analyzed, and interpreted
within the framework of the theory.
• This chapter provides experience with the technical aspects of test construction within the framework of IRT.
8. Specifying the Characteristics of a Test
Item Banking
• The test construction process is usually based upon having a collection of items (an item pool) from which to select those to be included in a particular test.
• Items are selected from such pools on the
basis of both their content and their
technical characteristics,
i.e., their item parameter values
• Under IRT, a well-defined set of procedures
is used to establish and maintain such item
pools.
• The name "item banking" has been given to these procedures.
Item Banking
• Basic Goal
– have an item pool in which the values of the
item parameters are expressed in a known
ability-scale metric.
Developing a Test From a
Pre-calibrated Item Pool
• An ICC model is selected, the examinees' item
response data are analyzed via the
Birnbaum paradigm, and the test is
calibrated.
• The ability scale resulting from this
calibration is considered to be the baseline
metric of the item pool.
• From a test construction point of view, we
now have a set of items whose item
parameter values are known; in technical
terms, a “pre-calibrated item pool” exists.
Developing a Test From a
Pre-calibrated Item Pool
• The advantage of having a pre-calibrated
item pool is that the parameter values of
the items included in the test can be used
to compute the test characteristic curve
and the test information function before
the test is administered.
Some Typical Testing Goals
• Screening tests
– Tests used for screening purposes have the
capability to distinguish rather sharply
between examinees whose abilities are just
below a given ability level and those who are
at or above that level.
– Such tests are used to assign scholarships
and to assign students to specific
instructional programs such as remediation
or advanced placement.
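With a pre-calibrated pool, a screening test can be assembled by favoring items that are informative at the cut-off ability. The sketch below is a minimal Python illustration, assuming the two-parameter model and a simple "most information at the cut score" rule; the rule, the pool, and the function names are hypothetical, and a real selection would also weigh item content, as the item banking slides note.

```python
import math


def item_info_2pl(theta, b, a):
    """Two-parameter item information a^2 * P * Q."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def select_screening_items(pool, cut, k):
    """Return the k pool items that give the most information at the cut score.

    pool: list of (name, b, a) tuples with pre-calibrated parameters.
    """
    ranked = sorted(pool, key=lambda item: item_info_2pl(cut, item[1], item[2]),
                    reverse=True)
    return ranked[:k]


# Hypothetical pre-calibrated pool: (name, b, a).
pool = [("A", -2.0, 1.0), ("B", 0.9, 1.8), ("C", 1.1, 1.5),
        ("D", 2.5, 1.2), ("E", 1.0, 0.6)]
chosen = select_screening_items(pool, cut=1.0, k=2)
info_at_cut = sum(item_info_2pl(1.0, b, a) for _, b, a in chosen)
print([name for name, _, _ in chosen], round(info_at_cut, 3))
```

Items with difficulty near the cut and high discrimination win out, which is what lets a screening test separate examinees just below the cut from those at or above it.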
Some Typical Testing Goals
• Broad-ranged tests
– These tests are used to measure ability over
a wide range of the underlying ability scale. The
primary purpose is to be able to make a
statement about an examinee’s ability and to
make comparisons among examinees.
– Tests measuring reading or mathematics are
typically broad-range tests.
Some Typical Testing Goals
• Peaked tests
– Such tests are designed to measure ability
quite well in a region of the ability scale
where most of the examinees’ abilities will
be located, and less well outside this region.
– When one deliberately creates a peaked test,
it is to measure ability well in a range of
ability that is wider than that of a screening
test, but not as wide as that of a broad-range
test.
Summary
• Classical Test Theory
• IRT
– Item Characteristic Curve
– Test Characteristic Curve
– Estimating an Examinee’s Ability
– Test Calibration
– Item Banking
• Automatic Test Generation

Administering, analyzing, and improving the test or assessmentAdministering, analyzing, and improving the test or assessment
Administering, analyzing, and improving the test or assessmentNema Grace Medillo
 
Biodiversity Offsetting - Legislation and Wetlands - NPCA's Role
Biodiversity Offsetting - Legislation and Wetlands - NPCA's RoleBiodiversity Offsetting - Legislation and Wetlands - NPCA's Role
Biodiversity Offsetting - Legislation and Wetlands - NPCA's RoleMichael Reles
 
P ri nz conference - michael field (2)
P ri nz conference - michael field (2)P ri nz conference - michael field (2)
P ri nz conference - michael field (2)Michael Field
 
10 Social Theories You Need To Know
10 Social Theories You Need To Know10 Social Theories You Need To Know
10 Social Theories You Need To Knowbrightlemon
 
Utilizing Wetlands, Protecting Marine Resources Powerpoint
Utilizing Wetlands, Protecting Marine Resources PowerpointUtilizing Wetlands, Protecting Marine Resources Powerpoint
Utilizing Wetlands, Protecting Marine Resources PowerpointCharvari Watson
 
Chapter 6: Writing Objective Test Items
Chapter 6: Writing Objective Test ItemsChapter 6: Writing Objective Test Items
Chapter 6: Writing Objective Test ItemsSHELAMIE SANTILLAN
 
Subjective vs Objective test
Subjective vs Objective testSubjective vs Objective test
Subjective vs Objective testSùng A Tô
 
Lesson 4 analysis of test results
Lesson 4 analysis of test resultsLesson 4 analysis of test results
Lesson 4 analysis of test resultsCarlo Magno
 
Tania goel kuznets curve
Tania goel kuznets curveTania goel kuznets curve
Tania goel kuznets curveTania goel
 
Impacts of wetland degradation
Impacts of wetland degradationImpacts of wetland degradation
Impacts of wetland degradationManoshi Goswami
 

En vedette (20)

Subjective & Objective Writing Skills
Subjective & Objective Writing SkillsSubjective & Objective Writing Skills
Subjective & Objective Writing Skills
 
Item Analysis
Item AnalysisItem Analysis
Item Analysis
 
J4932 Resilient livelihoods framework AW-web
J4932 Resilient livelihoods framework AW-webJ4932 Resilient livelihoods framework AW-web
J4932 Resilient livelihoods framework AW-web
 
Examining wetland loss and potential restoration opportunities in the Sandusk...
Examining wetland loss and potential restoration opportunities in the Sandusk...Examining wetland loss and potential restoration opportunities in the Sandusk...
Examining wetland loss and potential restoration opportunities in the Sandusk...
 
International Economic Lecture 2
International Economic Lecture 2International Economic Lecture 2
International Economic Lecture 2
 
Communication for development in Climate Field School: the case of Livelihood...
Communication for development in Climate Field School: the case of Livelihood...Communication for development in Climate Field School: the case of Livelihood...
Communication for development in Climate Field School: the case of Livelihood...
 
Kastoria by Theano Manoli
Kastoria by Theano ManoliKastoria by Theano Manoli
Kastoria by Theano Manoli
 
PERI Holistic Assessment Seminar 2010 Presentation Slides
PERI Holistic Assessment Seminar 2010 Presentation SlidesPERI Holistic Assessment Seminar 2010 Presentation Slides
PERI Holistic Assessment Seminar 2010 Presentation Slides
 
Administering, analyzing, and improving the test or assessment
Administering, analyzing, and improving the test or assessmentAdministering, analyzing, and improving the test or assessment
Administering, analyzing, and improving the test or assessment
 
Biodiversity Offsetting - Legislation and Wetlands - NPCA's Role
Biodiversity Offsetting - Legislation and Wetlands - NPCA's RoleBiodiversity Offsetting - Legislation and Wetlands - NPCA's Role
Biodiversity Offsetting - Legislation and Wetlands - NPCA's Role
 
P ri nz conference - michael field (2)
P ri nz conference - michael field (2)P ri nz conference - michael field (2)
P ri nz conference - michael field (2)
 
10 Social Theories You Need To Know
10 Social Theories You Need To Know10 Social Theories You Need To Know
10 Social Theories You Need To Know
 
Utilizing Wetlands, Protecting Marine Resources Powerpoint
Utilizing Wetlands, Protecting Marine Resources PowerpointUtilizing Wetlands, Protecting Marine Resources Powerpoint
Utilizing Wetlands, Protecting Marine Resources Powerpoint
 
Chapter 6: Writing Objective Test Items
Chapter 6: Writing Objective Test ItemsChapter 6: Writing Objective Test Items
Chapter 6: Writing Objective Test Items
 
Subjective vs Objective test
Subjective vs Objective testSubjective vs Objective test
Subjective vs Objective test
 
Lesson 4 analysis of test results
Lesson 4 analysis of test resultsLesson 4 analysis of test results
Lesson 4 analysis of test results
 
Tania goel kuznets curve
Tania goel kuznets curveTania goel kuznets curve
Tania goel kuznets curve
 
Test validity
Test validityTest validity
Test validity
 
T est item analysis
T est item analysisT est item analysis
T est item analysis
 
Impacts of wetland degradation
Impacts of wetland degradationImpacts of wetland degradation
Impacts of wetland degradation
 

Similaire à 11 adaptive testing-irt

AQM Presentation by Kathleen Preston on Jan 9, 2009
AQM Presentation by Kathleen Preston on Jan 9, 2009AQM Presentation by Kathleen Preston on Jan 9, 2009
AQM Presentation by Kathleen Preston on Jan 9, 2009guestbeb22e
 
20060411ahp 0411-130118075335-phpapp01
20060411ahp 0411-130118075335-phpapp0120060411ahp 0411-130118075335-phpapp01
20060411ahp 0411-130118075335-phpapp01Mr Garg
 
HUDE 225Take Home Directions You are a psychologist working a.docx
HUDE 225Take Home Directions You are a psychologist working a.docxHUDE 225Take Home Directions You are a psychologist working a.docx
HUDE 225Take Home Directions You are a psychologist working a.docxwellesleyterresa
 
Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions
Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions
Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions Caveon Test Security
 
20060411 Analytic Hierarchy Process (AHP)
20060411 Analytic Hierarchy Process (AHP)20060411 Analytic Hierarchy Process (AHP)
20060411 Analytic Hierarchy Process (AHP)Will Shen
 
Irene Martelli - PhD presentation
Irene Martelli - PhD presentationIrene Martelli - PhD presentation
Irene Martelli - PhD presentationIrene Martelli
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...PyData
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnDataRobot
 
Analytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningAnalytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningcsandit
 
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...cscpconf
 
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGcsandit
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive ModellingRajiv Advani
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svmtaikhoan262
 

Similaire à 11 adaptive testing-irt (20)

AQM Presentation by Kathleen Preston on Jan 9, 2009
AQM Presentation by Kathleen Preston on Jan 9, 2009AQM Presentation by Kathleen Preston on Jan 9, 2009
AQM Presentation by Kathleen Preston on Jan 9, 2009
 
evaluation and credibility-Part 2
evaluation and credibility-Part 2evaluation and credibility-Part 2
evaluation and credibility-Part 2
 
20060411ahp 0411-130118075335-phpapp01
20060411ahp 0411-130118075335-phpapp0120060411ahp 0411-130118075335-phpapp01
20060411ahp 0411-130118075335-phpapp01
 
HUDE 225Take Home Directions You are a psychologist working a.docx
HUDE 225Take Home Directions You are a psychologist working a.docxHUDE 225Take Home Directions You are a psychologist working a.docx
HUDE 225Take Home Directions You are a psychologist working a.docx
 
Slide Psikologi.docx
Slide Psikologi.docxSlide Psikologi.docx
Slide Psikologi.docx
 
Ga
GaGa
Ga
 
Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions
Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions
Caveon Webinar Series: Using Decision Theory for Accurate Pass/Fail Decisions
 
20060411 Analytic Hierarchy Process (AHP)
20060411 Analytic Hierarchy Process (AHP)20060411 Analytic Hierarchy Process (AHP)
20060411 Analytic Hierarchy Process (AHP)
 
Irene Martelli - PhD presentation
Irene Martelli - PhD presentationIrene Martelli - PhD presentation
Irene Martelli - PhD presentation
 
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
Gradient Boosted Regression Trees in Scikit Learn by Gilles Louppe & Peter Pr...
 
Gradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learnGradient Boosted Regression Trees in scikit-learn
Gradient Boosted Regression Trees in scikit-learn
 
F5 add maths year plan 2013
F5 add maths year plan 2013F5 add maths year plan 2013
F5 add maths year plan 2013
 
Analytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion miningAnalytical study of feature extraction techniques in opinion mining
Analytical study of feature extraction techniques in opinion mining
 
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
Radial Basis Function Neural Network (RBFNN), Induction Motor, Vector control...
 
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MININGANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
ANALYTICAL STUDY OF FEATURE EXTRACTION TECHNIQUES IN OPINION MINING
 
Machine learning
Machine learningMachine learning
Machine learning
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive Modelling
 
Confirmatory Factor Analysis
Confirmatory Factor AnalysisConfirmatory Factor Analysis
Confirmatory Factor Analysis
 
Guide
GuideGuide
Guide
 
Huong dan cu the svm
Huong dan cu the svmHuong dan cu the svm
Huong dan cu the svm
 

Plus de 宥均 林

Y1 midterm presentation
Y1 midterm presentationY1 midterm presentation
Y1 midterm presentation宥均 林
 
A research directions for e-learning technologies
A research directions for e-learning technologiesA research directions for e-learning technologies
A research directions for e-learning technologies宥均 林
 
15 selected topics for e-learning technologies (dtv)
15 selected topics for e-learning technologies (dtv)15 selected topics for e-learning technologies (dtv)
15 selected topics for e-learning technologies (dtv)宥均 林
 
14 selected topics for e-learning technologies (gbl)
14 selected topics for e-learning technologies (gbl)14 selected topics for e-learning technologies (gbl)
14 selected topics for e-learning technologies (gbl)宥均 林
 
13 selected topics for e-learning technologies (ml).pptx
13 selected topics for e-learning technologies (ml).pptx13 selected topics for e-learning technologies (ml).pptx
13 selected topics for e-learning technologies (ml).pptx宥均 林
 
10 intelligent tutoring-spc
10 intelligent tutoring-spc10 intelligent tutoring-spc
10 intelligent tutoring-spc宥均 林
 
09 commercial distance learning software systems
09 commercial distance learning software systems09 commercial distance learning software systems
09 commercial distance learning software systems宥均 林
 
08 learning object repository with cordra
08 learning object repository with cordra08 learning object repository with cordra
08 learning object repository with cordra宥均 林
 
07 distance learning standards-common cartridge
07 distance learning standards-common cartridge07 distance learning standards-common cartridge
07 distance learning standards-common cartridge宥均 林
 
06 distance learning standards-qti
06 distance learning standards-qti06 distance learning standards-qti
06 distance learning standards-qti宥均 林
 
05 distance learning standards-scorm research
05 distance learning standards-scorm research05 distance learning standards-scorm research
05 distance learning standards-scorm research宥均 林
 
04 distance learning standards-scorm specification
04 distance learning standards-scorm specification04 distance learning standards-scorm specification
04 distance learning standards-scorm specification宥均 林
 
03 synchronized distance learning
03 synchronized distance learning03 synchronized distance learning
03 synchronized distance learning宥均 林
 
02 asynchronized distance learning
02 asynchronized distance learning02 asynchronized distance learning
02 asynchronized distance learning宥均 林
 
00 special events
00 special events00 special events
00 special events宥均 林
 
01 overview of distance learning technologies
01 overview of distance learning technologies01 overview of distance learning technologies
01 overview of distance learning technologies宥均 林
 
Y2 final presentation
Y2 final presentationY2 final presentation
Y2 final presentation宥均 林
 
01 overview of distance learning technologies
01 overview of distance learning technologies01 overview of distance learning technologies
01 overview of distance learning technologies宥均 林
 

Plus de 宥均 林 (20)

Y1 midterm presentation
Y1 midterm presentationY1 midterm presentation
Y1 midterm presentation
 
B references
B referencesB references
B references
 
A research directions for e-learning technologies
A research directions for e-learning technologiesA research directions for e-learning technologies
A research directions for e-learning technologies
 
15 selected topics for e-learning technologies (dtv)
15 selected topics for e-learning technologies (dtv)15 selected topics for e-learning technologies (dtv)
15 selected topics for e-learning technologies (dtv)
 
14 selected topics for e-learning technologies (gbl)
14 selected topics for e-learning technologies (gbl)14 selected topics for e-learning technologies (gbl)
14 selected topics for e-learning technologies (gbl)
 
13 selected topics for e-learning technologies (ml).pptx
13 selected topics for e-learning technologies (ml).pptx13 selected topics for e-learning technologies (ml).pptx
13 selected topics for e-learning technologies (ml).pptx
 
10 intelligent tutoring-spc
10 intelligent tutoring-spc10 intelligent tutoring-spc
10 intelligent tutoring-spc
 
09 commercial distance learning software systems
09 commercial distance learning software systems09 commercial distance learning software systems
09 commercial distance learning software systems
 
08 learning object repository with cordra
08 learning object repository with cordra08 learning object repository with cordra
08 learning object repository with cordra
 
07 distance learning standards-common cartridge
07 distance learning standards-common cartridge07 distance learning standards-common cartridge
07 distance learning standards-common cartridge
 
06 distance learning standards-qti
06 distance learning standards-qti06 distance learning standards-qti
06 distance learning standards-qti
 
05 distance learning standards-scorm research
05 distance learning standards-scorm research05 distance learning standards-scorm research
05 distance learning standards-scorm research
 
04 distance learning standards-scorm specification
04 distance learning standards-scorm specification04 distance learning standards-scorm specification
04 distance learning standards-scorm specification
 
03 synchronized distance learning
03 synchronized distance learning03 synchronized distance learning
03 synchronized distance learning
 
02 asynchronized distance learning
02 asynchronized distance learning02 asynchronized distance learning
02 asynchronized distance learning
 
00 syllabus
00 syllabus00 syllabus
00 syllabus
 
00 special events
00 special events00 special events
00 special events
 
01 overview of distance learning technologies
01 overview of distance learning technologies01 overview of distance learning technologies
01 overview of distance learning technologies
 
Y2 final presentation
Y2 final presentationY2 final presentation
Y2 final presentation
 
01 overview of distance learning technologies
01 overview of distance learning technologies01 overview of distance learning technologies
01 overview of distance learning technologies
 

11 adaptive testing-irt

  • 1. Adaptive Testing (Item Response Theory) Timothy K. Shih
  • 2. Item Response Theory 1. The Item Characteristic Curve 2. Item Characteristic Curve Models 3. Estimating Item Parameters 4. The Test Characteristic Curve 5. Estimating an Examinee’s Ability 6. The Information Function 7. Test Calibration 8. Specifying the Characteristics of a Test Source: FRANK B. BAKER, University of Wisconsin
  • 3. Item Characteristic Curve • What is an item characteristic curve (ICC)? – At each ability level, there is a certain probability that an examinee with that ability will give a correct answer to the item – This probability is denoted by P(θ); the ICC plots P(θ) against ability θ 1. The Item Characteristic Curve
  • 4. Item Characteristic Curve under the one-parameter model 1. The Item Characteristic Curve Higher ability → higher probability of correct response
  • 5. Three item characteristic curves with the same discrimination 1. The Item Characteristic Curve Higher difficulty → lower probability of correct response at any given ability level
  • 6. Three item characteristic curves with the same difficulty 1. The Item Characteristic Curve Higher discrimination → steeper curve: the probability of correct response changes more rapidly with ability near θ = b
  • 7. Logistic Function • The logistic function: $P(\theta) = \frac{1}{1+e^{-L}} = \frac{1}{1+e^{-a(\theta-b)}}$ – e is the constant 2.718 – b is the difficulty • typical values range from -3 to +3 – a is the discrimination • typical values range from -2.80 to +2.80 – L = a(θ - b) is the logistic deviate (logit) – θ is an ability level 2. Item Characteristic Curve Models
  • 8. Logistic Function (two-parameter model) • Example: – b = 1.0 (difficulty); a = 0.5 (discrimination) – Illustrative computation at ability level θ = -3: 1. L = a(θ - b) = 0.5 × (-3.0 - 1.0) = -2.0 2. EXP(-L) = EXP(2.0) = 2.718^2.0 = 7.389 3. 1 + EXP(-L) = 1 + 7.389 = 8.389 4. P(θ) = 1/(1 + EXP(-L)) = 1/8.389 = 0.12 2. Item Characteristic Curve Models
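  • 9. Logistic Function (two-parameter model), b = 1.0, a = 0.5
    | Ability θ | Logit L | EXP(-L) | 1 + EXP(-L) | P(θ) |
    |---|---|---|---|---|
    | -3 | -2.0 | 7.389 | 8.389 | 0.12 |
    | -2 | -1.5 | 4.482 | 5.482 | 0.18 |
    | -1 | -1.0 | 2.718 | 3.718 | 0.27 |
    | 0 | -0.5 | 1.649 | 2.649 | 0.38 |
    | 1 | 0.0 | 1.000 | 2.000 | 0.50 |
    | 2 | 0.5 | 0.607 | 1.607 | 0.62 |
    | 3 | 1.0 | 0.368 | 1.368 | 0.73 |
    2. Item Characteristic Curve Models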
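The two-parameter ICC can be evaluated directly from its formula. Below is a minimal Python sketch (the function name `icc_2pl` is our own, not from the slides) that regenerates the table for b = 1.0, a = 0.5; computed this way, the θ = 2 row gives P ≈ 0.62.

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC: P(theta) = 1 / (1 + exp(-a * (theta - b)))."""
    logit = a * (theta - b)              # L = a(theta - b), the logistic deviate
    return 1.0 / (1.0 + math.exp(-logit))

# Rebuild the table for the item with b = 1.0, a = 0.5
for theta in range(-3, 4):
    L = 0.5 * (theta - 1.0)
    print(f"{theta:>3} {L:>5.1f} {math.exp(-L):>7.3f} {1 + math.exp(-L):>7.3f} "
          f"{icc_2pl(theta, a=0.5, b=1.0):.2f}")
```

At θ = b the logit is zero, so P is exactly 0.5 for any discrimination a.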
  • 10. Logistic Function (two-parameter model) 2. Item Characteristic Curve Models b = 1.0 (difficulty); a = 0.5 (discrimination)
  • 11. Logistic Function (one-parameter model) • One-parameter logistic model (Rasch) – The discrimination parameter of the two-parameter logistic model is fixed at a = 1.0 for all items; only the difficulty parameter can take on different values: $P(\theta) = \frac{1}{1+e^{-1(\theta-b)}} = \frac{1}{1+e^{-(\theta-b)}}$ (b = difficulty, a = discrimination) 2. Item Characteristic Curve Models
  • 12. Logistic Function (one-parameter model) • Example: – b = 1.0 (difficulty) – Illustrative computation at ability level θ = -3: 1. L = θ - b = -3.0 - 1.0 = -4.0 2. EXP(-L) = EXP(4.0) = 2.718^4.0 = 54.598 3. 1 + EXP(-L) = 1 + 54.598 = 55.598 4. P(θ) = 1/(1 + EXP(-L)) = 1/55.598 = 0.02 2. Item Characteristic Curve Models
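  • 13. Logistic Function (one-parameter model), b = 1.0
    | Ability θ | Logit L | EXP(-L) | 1 + EXP(-L) | P(θ) |
    |---|---|---|---|---|
    | -3 | -4 | 54.598 | 55.598 | 0.02 |
    | -2 | -3 | 20.086 | 21.086 | 0.05 |
    | -1 | -2 | 7.389 | 8.389 | 0.12 |
    | 0 | -1 | 2.718 | 3.718 | 0.27 |
    | 1 | 0 | 1.000 | 2.000 | 0.50 |
    | 2 | 1 | 0.368 | 1.368 | 0.73 |
    | 3 | 2 | 0.135 | 1.135 | 0.88 |
    2. Item Characteristic Curve Models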
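Because the Rasch model is the two-parameter model with a fixed at 1.0, the same computation applies with the discrimination dropped. A minimal sketch (`icc_1pl` is an illustrative name) reproducing the P column for b = 1.0:

```python
import math

def icc_1pl(theta: float, b: float) -> float:
    """One-parameter (Rasch) ICC: the 2PL with the discrimination fixed at a = 1.0."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# P column of the table for an item with b = 1.0, abilities -3 .. 3
probs = [round(icc_1pl(theta, b=1.0), 2) for theta in range(-3, 4)]
print(probs)  # [0.02, 0.05, 0.12, 0.27, 0.5, 0.73, 0.88]
```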
  • 14. Logistic Function (one-parameter model) 2. Item Characteristic Curve Models a = 1.0 (fixed) b = 1.0
  • 15. Logistic Function (three-parameter model) • Three-parameter model – One of the facts of life in testing is that examinees will get some items correct by guessing. Thus, the probability of correct response includes a small component that is due to guessing: $P(\theta) = c + (1-c)\frac{1}{1+e^{-a(\theta-b)}}$ – b is the difficulty – a is the discrimination – c is the guessing parameter » Theoretically, c ranges from 0 to 1.0 » In practice, values of c > 0.35 are not considered acceptable » Hence c is typically between 0 and 0.35 – θ is an ability level 2. Item Characteristic Curve Models This is why multiple-choice questions typically have four answers: a blind guess then succeeds with probability of about 0.25.
  • 16. Logistic Function (three-parameter model) • Example: – b = 1.5 (difficulty); a = 1.3 (discrimination); c = 0.2 (guessing) – Illustrative computation at ability level θ = -3: 1. L = a(θ - b) = 1.3 × (-3.0 - 1.5) = -5.85 2. EXP(-L) = EXP(5.85) = 2.718^5.85 = 347.234 3. 1 + EXP(-L) = 1 + 347.234 = 348.234 4. 1/(1 + EXP(-L)) = 1/348.234 = 0.0029 5. P(θ) = c + (1 - c) × 0.0029 = 0.2 + (1 - 0.2) × 0.0029 = 0.2 + 0.8 × 0.0029 = 0.2 + 0.0023 = 0.2023 2. Item Characteristic Curve Models
  • 17. Logistic Function (three-parameter model), b = 1.5, a = 1.3, c = 0.2; here P(θ) = c + (1 - c)/(1 + EXP(-L))
    | Ability θ | Logit L | EXP(-L) | 1 + EXP(-L) | P(θ) |
    |---|---|---|---|---|
    | -3 | -5.85 | 347.234 | 348.234 | 0.20 |
    | -2 | -4.55 | 94.632 | 95.632 | 0.21 |
    | -1 | -3.25 | 25.790 | 26.790 | 0.23 |
    | 0 | -1.95 | 7.029 | 8.029 | 0.30 |
    | 1 | -0.65 | 1.916 | 2.916 | 0.47 |
    | 2 | 0.65 | 0.522 | 1.522 | 0.73 |
    | 3 | 1.95 | 0.142 | 1.142 | 0.90 |
    2. Item Characteristic Curve Models
  • 18. Logistic Function (three-parameter model) 2. Item Characteristic Curve Models – b = 1.5 (difficulty); a = 1.3 (discrimination); c = 0.2 (guessing)
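The three-parameter model only adds a lower asymptote c to the logistic curve. A minimal sketch (`icc_3pl` is an illustrative name) reproducing the worked example at θ = -3:

```python
import math

def icc_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter ICC: P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Item from the example: b = 1.5, a = 1.3, c = 0.2
print(round(icc_3pl(-3.0, a=1.3, b=1.5, c=0.2), 4))  # 0.2023
```

Note that at θ = b this model gives P = (1 + c)/2 rather than 0.5, because the guessing floor lifts the whole curve.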
  • 19. Negative Discrimination • Most test items discriminate in a positive manner: the probability of correct response increases as the ability level increases • Some items, however, have negative discrimination: the probability of correct response decreases as the ability level increases from low to high 2. Item Characteristic Curve Models
  • 20. Negative Discrimination 2. Item Characteristic Curve Models
  • 21. Negative Discrimination Items with negative discrimination occur in two ways: • The incorrect response to a two-choice item will always have a negative discrimination parameter if the correct response has a positive one. • Sometimes the correct response to an item yields a negative discrimination index. • This tells you that something is wrong with the item: – Either it is poorly written, or there is some misinformation prevalent among the high-ability students. • For most item response theory topics of interest, the value of the discrimination parameter will be positive. 2. Item Characteristic Curve Models
  • 23. Discussion 1. The two item characteristic curves have the same value for the difficulty parameter (b = 1.0). 2. The discrimination parameters have the same absolute value but opposite signs: positive for the correct response and negative for the incorrect response. 2. Item Characteristic Curve Models
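For a two-choice item the incorrect response's curve is 1 - P(θ), and algebraically 1 - 1/(1 + e^{-a(θ-b)}) = 1/(1 + e^{a(θ-b)}), which is the same ICC with the sign of a flipped. A quick numerical check, assuming b = 1.0 as on the slide and a hypothetical a = 1.0:

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

a, b = 1.0, 1.0   # a is an assumed value; b = 1.0 as in the discussion slide
for theta in (-2.0, 0.0, 1.0, 2.5):
    p_correct = icc_2pl(theta, a, b)
    p_incorrect = 1.0 - p_correct            # the other option of a two-choice item
    # The incorrect option's curve equals the 2PL with discrimination -a
    assert math.isclose(p_incorrect, icc_2pl(theta, -a, b))
```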
  • 24. Observed Proportion • M examinees respond to the N items in the test – These examinees are divided into J groups along the ability scale so that all the examinees within a given group have the same ability level θ_j • There are m_j examinees within group j, where j = 1, 2, 3, ..., J – Within a particular ability group, r_j examinees answer the given item correctly • At an ability level of θ_j, the observed proportion of correct response is p(θ_j) = r_j/m_j • p(θ_j) is an estimate of the probability of correct response at ability level θ_j 3. Estimating Item Parameters
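Computing p(θ_j) = r_j/m_j amounts to grouping responses by ability level. A minimal sketch, assuming the responses for one item arrive as hypothetical (θ, u) pairs with u = 1 for correct and 0 for incorrect; the data are made up for illustration:

```python
from collections import defaultdict

def observed_proportions(responses):
    """Group (theta, u) pairs by ability level and return {theta_j: r_j / m_j}."""
    m = defaultdict(int)   # m_j: number of examinees at ability level theta_j
    r = defaultdict(int)   # r_j: number of those examinees answering correctly
    for theta, u in responses:
        m[theta] += 1
        r[theta] += u
    return {theta: r[theta] / m[theta] for theta in sorted(m)}

# Toy data (hypothetical): three ability groups
data = [(-1.0, 0), (-1.0, 0), (-1.0, 1), (0.0, 1), (0.0, 0), (1.0, 1), (1.0, 1)]
print(observed_proportions(data))  # {-1.0: 0.3333333333333333, 0.0: 0.5, 1.0: 1.0}
```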
  • 25. Observed Proportion • If the observed proportions of correct response in each ability group are plotted, the result will look like this 3. Estimating Item Parameters
  • 26. Finding the ICC that best fits the observed proportions of correct response 1. Select a model for the curve to be fitted – the two-parameter model will be employed here 2. Choose initial values for the item parameters – b = 0.0, a = 1.0 3. Using these estimates, compute the value of P(θ_j) at each ability level via the equation of the two-parameter model. 4. Determine the agreement between the observed value p(θ_j) and the computed value P(θ_j) across all ability groups. 5. Find adjustments to the estimated item parameters that result in better agreement between the ICC defined by the estimated parameters and the observed proportions of correct response. 6. Continue this process until the adjustments become so small that little further improvement in the agreement is possible. 7. At this point, terminate the estimation procedure; the current values of b and a are the item parameter estimates. 3. Estimating Item Parameters
  • 27. The Chi-square goodness-of-fit index $\chi^2 = \sum_{j=1}^{J} m_j \frac{\left[p(\theta_j) - P(\theta_j)\right]^2}{P(\theta_j)\,Q(\theta_j)}$ – J is the number of ability groups – θ_j is the ability level of group j – m_j is the number of examinees having ability θ_j – p(θ_j) is the observed proportion of correct response for group j – P(θ_j) is the probability of correct response for group j computed from the ICC model using the parameter estimates – Q(θ_j) = 1 - P(θ_j) 3. Estimating Item Parameters
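The slides describe steps 3 to 6 only in words. One standard way to carry them out (an assumption here, not necessarily the program behind the slides) is Newton-Raphson maximum likelihood on grouped data, written in the intercept/slope form L = ζ + λθ so that a = λ and b = -ζ/λ. A minimal sketch (`fit_2pl_grouped` is a hypothetical name):

```python
import math

def fit_2pl_grouped(thetas, m, p, tol=1e-6, max_iter=50):
    """Fit a 2PL ICC to grouped data by Newton-Raphson maximum likelihood.

    thetas: ability level of each group; m: group sizes m_j;
    p: observed proportions correct p(theta_j). Returns (b, a).
    """
    zeta, lam = 0.0, 1.0                      # step 2: b = 0.0, a = 1.0 -> zeta = 0, lam = 1
    for _ in range(max_iter):
        # Gradient (g1, g2) and information matrix [[h11, h12], [h12, h22]]
        g1 = g2 = h11 = h12 = h22 = 0.0
        for th, mj, pj in zip(thetas, m, p):
            P = 1.0 / (1.0 + math.exp(-(zeta + lam * th)))   # step 3: model P(theta_j)
            w = mj * P * (1.0 - P)
            g1 += mj * (pj - P)                # step 4: disagreement between p and P
            g2 += mj * (pj - P) * th
            h11 += w
            h12 += w * th
            h22 += w * th * th
        det = h11 * h22 - h12 * h12
        d_zeta = (h22 * g1 - h12 * g2) / det   # step 5: Newton adjustments
        d_lam = (h11 * g2 - h12 * g1) / det
        zeta += d_zeta
        lam += d_lam
        if abs(d_zeta) < tol and abs(d_lam) < tol:   # step 6: stop when adjustments are tiny
            break
    return -zeta / lam, lam                    # step 7: (b, a)

# Usage: noise-free proportions generated from a = 1.2, b = 0.5 (illustrative data)
thetas = [-3 + 0.5 * j for j in range(13)]
m = [100] * len(thetas)
p = [1 / (1 + math.exp(-1.2 * (t - 0.5))) for t in thetas]
b_hat, a_hat = fit_2pl_grouped(thetas, m, p)
print(round(b_hat, 3), round(a_hat, 3))  # 0.5 1.2
```

Working with (ζ, λ) rather than (b, a) makes each step an ordinary logistic-regression Newton update; the item parameters are recovered at the end.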
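The index is a weighted sum over ability groups. A minimal sketch (`chi_square_fit` is a hypothetical name; the group sizes and proportions in the usage line are made up):

```python
def chi_square_fit(m, p_obs, p_model):
    """Chi-square goodness-of-fit index: sum_j m_j (p_j - P_j)^2 / (P_j Q_j)."""
    total = 0.0
    for mj, pj, Pj in zip(m, p_obs, p_model):
        Qj = 1.0 - Pj                     # Q(theta_j) = 1 - P(theta_j)
        total += mj * (pj - Pj) ** 2 / (Pj * Qj)
    return total

# Hypothetical data: three ability groups of 20 examinees each
print(chi_square_fit([20, 20, 20], [0.30, 0.55, 0.80], [0.25, 0.50, 0.75]))
```

Dividing by P·Q scales each squared residual by the binomial variance of a proportion, so groups where the model predicts near 0 or 1 are not under-weighted.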
  • 28. The Chi-square goodness-of-fit index • If the value of the chi-square goodness-of-fit index is greater than a criterion value, the item characteristic curve specified by the item parameter estimates does not fit the data. Possible reasons: • the wrong item characteristic curve model may have been employed; • the observed proportions of correct response are so widely scattered that a good fit cannot be obtained regardless of the model. 3. Estimating Item Parameters
  • 29. The Group Invariance of Item Parameters • Assume two groups of examinees are drawn from the same population of examinees • The first group has a range of ability scores from -3 to -1, with a mean of -2; the second group has a range of ability scores from +1 to +3, with a mean of +2 • The observed proportion of correct response to a given item is computed from the item response data for every ability level within each of the two groups. 3. Estimating Item Parameters
  • 30. The Group Invariance of Item Parameters For the first group, the proportions of correct response are plotted. The maximum likelihood procedure is then used to fit an item characteristic curve to the data, yielding the item parameter estimates b(1) = -0.39 and a(1) = 1.27. 3. Estimating Item Parameters
  • 31. The Group Invariance of Item Parameters For the second group, the proportions of correct response are plotted in the same way. The maximum likelihood procedure is again used to fit an item characteristic curve to the data, yielding the item parameter estimates b(2) = -0.39 and a(2) = 1.27. 3. Estimating Item Parameters
  • 32. The Group Invariance of Item Parameters 3. Estimating Item Parameters • b(1) = b(2) and a(1) = a(2) • The item parameters are group invariant. • The values of the item parameters are a property of the item, not of the group that responded to the item. • By contrast, the classical item difficulty index (the proportion of examinees answering correctly) is not group invariant: it depends on the ability of the group tested.
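The contrast with the classical difficulty index can be made concrete. The sketch below uses the item parameters from the slides (b = -0.39, a = 1.27) and assumes, hypothetically, five equally sized ability levels in each group's range. The ICC is identical for both groups, yet the expected proportion correct is very different:

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

a, b = 1.27, -0.39                               # item parameters from the slides
low_group = [-3.0, -2.5, -2.0, -1.5, -1.0]       # hypothetical ability levels, group 1
high_group = [1.0, 1.5, 2.0, 2.5, 3.0]           # hypothetical ability levels, group 2

def classical_difficulty(abilities):
    """Expected proportion correct in a group: the classical item difficulty."""
    return sum(icc_2pl(t, a, b) for t in abilities) / len(abilities)

print(round(classical_difficulty(low_group), 2), round(classical_difficulty(high_group), 2))
```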
  • 33. True score $TS_j = \sum_{i=1}^{N} P_i(\theta_j)$ – TS_j is the true score for examinees with ability level θ_j – i denotes an item – P_i(θ_j) depends upon the particular ICC model employed (i.e., it is computed from the ICC model) 4. The Test Characteristic Curve
  • 34. True score • Example – with the two-parameter model, at an ability level of 1.0: – Item 1: P1(1.0) = 1/(1 + exp(-0.5(1.0 - (-1.0)))) = 0.73 – Item 2: P2(1.0) = 1/(1 + exp(-1.2(1.0 - 0.75))) = 0.57 – Item 3: P3(1.0) = 1/(1 + exp(-0.8(1.0 - 0))) = 0.69 – Item 4: P4(1.0) = 1/(1 + exp(-1.0(1.0 - 0.5))) = 0.62 4. The Test Characteristic Curve
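The true score is just the sum of the item probabilities. A minimal sketch using the four (a, b) pairs from the example (`true_score` is an illustrative name):

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# (a, b) for items 1-4 in the example
items = [(0.5, -1.0), (1.2, 0.75), (0.8, 0.0), (1.0, 0.5)]

def true_score(theta, items):
    """TS = sum of P_i(theta) over the items in the test."""
    return sum(icc_2pl(theta, a, b) for a, b in items)

print(round(true_score(1.0, items), 2))  # 2.62
```

Summing the rounded values on the slide gives 0.73 + 0.57 + 0.69 + 0.62 = 2.61; the small difference is rounding.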
  • 35. True score 4. The Test Characteristic Curve
  • 37. Test Characteristic Curve • Test characteristic curve (TCC): a plot of the true score against ability – The vertical axis is the true score, ranging from zero to the number of items in the test – The horizontal axis is the ability scale 4. The Test Characteristic Curve
  • 38. Test Characteristic Curve • The primary role of the TCC in IRT is to provide a means of transforming ability scores to true scores • Given an examinee's ability, the TCC gives that examinee's true score 4. The Test Characteristic Curve
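Tabulating the true score over a grid of abilities traces out the TCC. A minimal sketch reusing the four (a, b) pairs from the true-score example on slide 34 (`tcc` is an illustrative name); with all discriminations positive, the curve rises monotonically and stays between 0 and N = 4:

```python
import math

def icc_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic ICC."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

items = [(0.5, -1.0), (1.2, 0.75), (0.8, 0.0), (1.0, 0.5)]  # (a, b) pairs from slide 34

def tcc(theta: float) -> float:
    """Test characteristic curve: expected number-correct (true) score at ability theta."""
    return sum(icc_2pl(theta, a, b) for a, b in items)

for theta in [-3, -2, -1, 0, 1, 2, 3]:
    print(f"theta = {theta:+d}  TS = {tcc(theta):.2f}")
```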
  • 39. Primary purpose for administering a test to an examinee • Under IRT, the primary purpose for administering a test to an examinee is to locate that person on the ability scale. If such an ability measure can be obtained for each person taking the test, two goals can be achieved. – The examinee can be evaluated in terms of how much underlying ability he or she possesses. – Comparisons among examinees can be made for purposes of assigning grades, awarding scholarships, etc. 5. Estimating an Examinee’s Ability
  • 40. Estimating an Examinee’s Ability • Ability Estimation Procedures N i SiSii N i Siii SS QPa Pua 1 ^^ 2 1 ^ ^ 1 ^ Θ^ s is the estimated ability of the examinee within iteration s ai is the discrimination parameter of item i, i = 1, 2, . . . .N ui is the response made by the examinee to item i: ui = 1 for a correct response ui = 0 for an incorrect response Pi(θ^ s ) is the probability of correct response to item i, under the given ICC model, at ability level θ^ within iteration s. Qi (θ^ s ) = 1 - Pi(θ^ s ) is the probability of incorrect response to item i, under the given ICC model, at ability level θ^ within iteration s. 5. Estimating an Examinee’s Ability
  • 41. Estimating an Examinee’s Ability • Example – A 3-item test under the two-parameter ICC model: • Item 1: b = -1, a = 1.0 • Item 2: b = 0, a = 1.2 • Item 3: b = 1, a = 0.8 – The examinee’s item responses were: • Item 1: 1 • Item 2: 0 • Item 3: 1
    1st iteration: the examinee’s starting ability is set to $\hat\theta_s = 1.0$.
    item  u  P(1.0)  Q = 1 - P  a(u - P)  a²PQ
    1     1  0.88    0.12       0.119     0.105
    2     0  0.77    0.23       -0.922    0.255
    3     1  0.50    0.50       0.400     0.160
    sum                         -0.403    0.520
    $\Delta\hat\theta_s = -0.403/0.520 = -0.773$, $\hat\theta_{s+1} = 1.0 - 0.773 = 0.227$ (the adjustments are computed from unrounded values, so they can differ slightly from the rounded table entries)
  • 42. Estimating an Examinee’s Ability
    2nd iteration:
    item  u  P(0.227)  Q = 1 - P  a(u - P)  a²PQ
    1     1  0.77      0.23       0.227     0.175
    2     0  0.57      0.43       -0.681    0.353
    3     1  0.35      0.65       0.520     0.146
    sum                           0.066     0.674
    $\Delta\hat\theta_s = 0.066/0.674 = 0.097$, $\hat\theta_{s+1} = 0.227 + 0.097 = 0.324$
    3rd iteration:
    item  u  P(0.324)  Q = 1 - P  a(u - P)  a²PQ
    1     1  0.79      0.21       0.2102    0.1660
    2     0  0.60      0.40       -0.7152   0.3467
    3     1  0.37      0.63       0.5056    0.1488
    sum                           0.0006    0.6615
    $\Delta\hat\theta_s = 0.0006/0.6615 = 0.0009$, $\hat\theta_{s+1} = 0.324 + 0.0009 = 0.3249$
    The iteration is terminated because the adjustment (0.0009) is very small. The examinee’s estimated ability is 0.3249.
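The three iterations can be reproduced with a short Python sketch of the update rule from slide 40. The function name, the starting value of 1.0 and the stopping rule (stop once the adjustment is below 0.001) are assumptions chosen to mirror the example. Because the code keeps full precision, it may take one more small step than the slides before stopping, but it settles at the same estimate, about 0.325.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(items, responses, theta=1.0, tol=0.001, max_iter=50):
    """Iterate theta += sum a(u - P) / sum a^2 P Q until the adjustment is small."""
    history = [theta]
    for _ in range(max_iter):
        num = den = 0.0
        for (a, b), u in zip(items, responses):
            p = p_2pl(theta, a, b)
            num += a * (u - p)            # numerator term a_i (u_i - P_i)
            den += a * a * p * (1.0 - p)  # denominator term a_i^2 P_i Q_i
        delta = num / den
        theta += delta
        history.append(theta)
        if abs(delta) < tol:
            break
    return theta, history

ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]  # (a, b) for items 1-3
RESPONSES = [1, 0, 1]
theta_hat, history = estimate_ability(ITEMS, RESPONSES)
print([round(t, 3) for t in history])
```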
  • 43. Standard error • The standard error is a measure of the variability of the values of $\hat\theta$ around the examinee’s unknown parameter value θ:
    $SE(\hat\theta) = \dfrac{1}{\sqrt{\sum_{i=1}^{N} a_i^2\, P_i(\hat\theta)\, Q_i(\hat\theta)}}$
  • 44. Standard error
    item  u  P(0.324)  Q = 1 - P  a(u - P)  a²PQ
    1     1  0.79      0.21       0.2102    0.1660
    2     0  0.60      0.40       -0.7152   0.3467
    3     1  0.37      0.63       0.5056    0.1488
    sum                           0.0006    0.6615
    $SE(\hat\theta) = 1/\sqrt{0.6615} = 1.23$
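The same standard error can be computed directly. This sketch reuses the slide 41 items and evaluates the sum of $a_i^2 P_i Q_i$ at the final estimate 0.3249. `standard_error` is a hypothetical helper name.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def standard_error(theta, items):
    """SE(theta) = 1 / sqrt(sum a^2 P Q), evaluated at the ability estimate."""
    info = sum(a * a * p_2pl(theta, a, b) * (1.0 - p_2pl(theta, a, b)) for a, b in items)
    return 1.0 / math.sqrt(info)

ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]  # (a, b) for items 1-3
se = standard_error(0.3249, ITEMS)
print(round(se, 2))  # → 1.23
```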
  • 45. Estimating an Examinee’s Ability • The examinee’s ability (0.3249) was not estimated very precisely, because the standard error is very large (1.23). – This is primarily because only three items were used; with so few items, one would not expect a precise estimate.
  • 46. Estimating an Examinee’s Ability • There are two cases in which the estimation procedure fails to yield a finite ability estimate: – When an examinee answers none of the items correctly, the corresponding ability estimate is negative infinity. – When an examinee answers all the items in the test correctly, the corresponding ability estimate is positive infinity. • Computer programs used to estimate ability must guard against these two conditions.
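The reason these patterns fail shows up in the numerator of the Newton-Raphson step from slide 40. When every answer is correct, each term a_i(1 - P_i) is positive (assuming positive discriminations). Every adjustment therefore moves θ upward and the iteration never settles; the all-incorrect case mirrors this. The sketch below checks that sign on a grid for the slide 41 items and adds one simple guard. `check_pattern` and its return convention (plus or minus infinity, or None when a finite estimate is possible) are hypothetical choices, not part of any particular program.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score(theta, items, responses):
    """Numerator of the Newton-Raphson step: sum of a (u - P)."""
    return sum(a * (u - p_2pl(theta, a, b)) for (a, b), u in zip(items, responses))

def check_pattern(responses):
    """Hypothetical guard: flag all-correct / all-incorrect patterns before iterating."""
    if all(u == 1 for u in responses):
        return math.inf
    if all(u == 0 for u in responses):
        return -math.inf
    return None  # a finite estimate can be sought

ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
grid = [t / 2 for t in range(-10, 11)]  # theta from -5.0 to 5.0
# All correct: every step pushes theta up; all incorrect: every step pushes it down.
print(all(score(t, ITEMS, [1, 1, 1]) > 0 for t in grid))  # → True
print(all(score(t, ITEMS, [0, 0, 0]) < 0 for t in grid))  # → True
print(check_pattern([1, 1, 1]), check_pattern([0, 0, 0]), check_pattern([1, 0, 1]))
```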
  • 47. Item Invariance of an Examinee’s Ability Estimate • The examinee’s ability is invariant with respect to the items used to determine it, provided that: – all the items measure the same underlying latent trait; – the values of all the item parameters are in a common metric.
  • 48. Item Invariance of an Examinee’s Ability Estimate • Suppose a set of 10 items with an average difficulty of -2 is administered to an examinee – the item responses could be used to estimate the examinee’s ability, yielding $\hat\theta_1$ for this test. • Another set of 10 items with an average difficulty of +1 is also administered to this examinee – these item responses could be used to estimate the examinee’s ability, yielding $\hat\theta_2$ for this second test. • Under the item invariance principle, $\hat\theta_1 = \hat\theta_2$; i.e., the two sets of items should yield the same ability estimate for the examinee, within sampling variation.
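A small simulation sketch of the invariance principle. Everything here is a hypothetical setup for illustration:
- a true ability of 0.5;
- all discriminations equal to 1.0;
- difficulties drawn around -2 and +1;
- 5,000 items per set rather than 10, so that sampling variation is small enough for the two estimates to visibly agree.

Responses are random draws from the two-parameter model, and `mle_ability` repeats the Newton-Raphson update from slide 40, with steps clamped for safety.

```python
import math
import random

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_ability(items, responses, theta=0.0, tol=1e-8, max_iter=100):
    """Newton-Raphson ML ability estimate (steps clamped to [-1, 1] for safety)."""
    for _ in range(max_iter):
        ps = [p_2pl(theta, a, b) for a, b in items]
        num = sum(a * (u - p) for (a, _), u, p in zip(items, responses, ps))
        den = sum(a * a * p * (1.0 - p) for (a, _), p in zip(items, ps))
        step = max(-1.0, min(1.0, num / den))
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = random.Random(12345)
TRUE_THETA = 0.5
easy_items = [(1.0, rng.gauss(-2.0, 0.5)) for _ in range(5000)]  # average difficulty -2
hard_items = [(1.0, rng.gauss(1.0, 0.5)) for _ in range(5000)]   # average difficulty +1

def simulate(items):
    """Draw 0/1 responses for an examinee whose ability is TRUE_THETA."""
    return [1 if rng.random() < p_2pl(TRUE_THETA, a, b) else 0 for a, b in items]

theta_1 = mle_ability(easy_items, simulate(easy_items))
theta_2 = mle_ability(hard_items, simulate(hard_items))
print(round(theta_1, 2), round(theta_2, 2))  # both close to 0.5
```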
  • 49. The Information Function • What is “information”? – In everyday usage, having information means knowing something about a particular object or topic. – In statistics and psychometrics, information is the reciprocal of the variance with which a parameter can be estimated: the more precisely a parameter can be estimated, the more information there is about it. 6. The Information Function
  • 50. The Information Function • The measure of precision is the variance of the estimator, denoted by $\sigma^2$ • The amount of information, denoted by I, is its reciprocal: $I = 1/\sigma^2$
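This definition can be combined with the standard-error formula of slide 43. Under the two-parameter model, each item contributes $a_i^2 P_i Q_i$, so the sum in that formula is the test information at $\hat\theta$. The standard error is therefore (in the large-sample sense) the reciprocal square root of the information. For the three-item example of slides 41-44:

$$SE(\hat\theta) = \sigma = \frac{1}{\sqrt{I(\hat\theta)}} = \frac{1}{\sqrt{\sum_{i=1}^{N} a_i^2\, P_i(\hat\theta)\, Q_i(\hat\theta)}} = \frac{1}{\sqrt{0.6615}} \approx 1.23$$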
  • 51. The Information Function • If the amount of information at an ability level is large, the ability of an examinee whose true ability is at that level can be estimated with precision; – i.e., all the estimates will be reasonably close to the true value. • If the amount of information is small, the ability cannot be estimated with precision, and the estimates will be widely scattered about the true ability.
  • 52. The Information Function • In the plotted example, the amount of information has a maximum at an ability level of -1.0 and is about 3 for the ability range -2 ≤ θ ≤ 0. Within this range, ability is estimated with some precision; outside it, the amount of information decreases rapidly, and the corresponding ability levels are not estimated very well. • The information function does not depend on the distribution of examinees over the ability scale. • In a general-purpose test, the ideal information function would be a horizontal line at some large value of I, so that all ability levels would be estimated with the same precision. • Unfortunately, such an information function is hard to achieve. • In practice, different ability levels are estimated with differing degrees of precision.
  • 53. Item Information Function 1. The amount of information based upon a single item can be computed at any ability level and is denoted by $I_i(\theta)$, where i indexes the item. 2. Because only a single item is involved, the amount of information at any point on the ability scale is going to be rather small. 3. The amount of item information decreases as the ability level departs from the item difficulty and approaches zero at the extremes of the ability scale.
  • 54. Definition of Item Information • Two-Parameter Item Characteristic Curve Model: $I_i(\theta) = a_i^2\, P_i(\theta)\, Q_i(\theta)$ – $a_i$ is the discrimination parameter for item i – $P_i(\theta) = 1/(1 + \exp(-a_i(\theta - b_i)))$ – $Q_i(\theta) = 1 - P_i(\theta)$ – θ is the ability level of interest
  • 55. Definition of Item Information • Calculation of item information under a two-parameter model, b = 1.0, a = 1.5
    θ   L     EXP(-L)  Pi(θ)  Qi(θ)  Pi(θ)Qi(θ)  a²    Ii(θ)
    -3  -6.0  403.43   0.00   1.00   0.00        2.25  0.00
    -2  -4.5  90.02    0.01   0.99   0.01        2.25  0.02
    -1  -3.0  20.09    0.05   0.95   0.05        2.25  0.11
    0   -1.5  4.48     0.18   0.82   0.15        2.25  0.34
    1   0.0   1.00     0.50   0.50   0.25        2.25  0.56
    2   1.5   0.22     0.82   0.18   0.15        2.25  0.34
    3   3.0   0.05     0.95   0.05   0.05        2.25  0.11
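A short Python sketch of the two-parameter item information formula, evaluated at the same ability levels as the table. It computes with full precision, so a few entries differ slightly from the table, which multiplies already-rounded values. For example, θ = -1 and θ = 3 come out at about 0.10 rather than 0.11. Setting a = 1 gives the one-parameter case. `item_info_2pl` is a hypothetical helper name.

```python
import math

def item_info_2pl(theta, a, b):
    """I_i(theta) = a^2 P Q for the two-parameter logistic model."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

# Same item as the table: b = 1.0, a = 1.5; information peaks at theta = b.
for theta in range(-3, 4):
    print(theta, round(item_info_2pl(theta, a=1.5, b=1.0), 3))
```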
  • 56. Definition of Item Information
  • 57. Definition of Item Information • One-Parameter Item Characteristic Curve Model: $I_i(\theta) = P_i(\theta)\, Q_i(\theta)$ – $P_i(\theta) = 1/(1 + \exp(-(\theta - b_i)))$ – $Q_i(\theta) = 1 - P_i(\theta)$ – θ is the ability level of interest
  • 58. Definition of Item Information • Calculation of item information under a one-parameter model, b = 1.0
    θ   L     EXP(-L)  Pi(θ)  Qi(θ)  Pi(θ)Qi(θ)  a²  Ii(θ)
    -3  -4.0  54.60    0.02   0.98   0.02        1   0.02
    -2  -3.0  20.09    0.05   0.95   0.05        1   0.05
    -1  -2.0  7.39     0.12   0.88   0.11        1   0.11
    0   -1.0  2.72     0.27   0.73   0.20        1   0.20
    1   0.0   1.00     0.50   0.50   0.25        1   0.25
    2   1.0   0.37     0.73   0.27   0.20        1   0.20
    3   2.0   0.14     0.88   0.12   0.11        1   0.11
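The one-parameter table can be regenerated the same way. With a = 1, the information is just P Q, which peaks at 0.25 where θ = b. At θ = -3, EXP(-L) is e^4 ≈ 54.60. `item_info_1pl` is a hypothetical helper name.

```python
import math

def item_info_1pl(theta, b):
    """One-parameter model: a = 1 for every item, so I_i(theta) = P Q."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

b = 1.0
for theta in range(-3, 4):
    L = theta - b  # logistic deviate with a = 1
    print(theta, L, round(math.exp(-L), 2), round(item_info_1pl(theta, b), 2))
```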
  • 59. Definition of Item Information
  • 60. Definition of Item Information • Three-Parameter Item Characteristic Curve Model: $I_i(\theta) = a^2\, \dfrac{Q_i(\theta)}{P_i(\theta)}\, \dfrac{[P_i(\theta) - c]^2}{(1 - c)^2}$ – $P_i(\theta) = c + (1 - c)\,(1/(1 + \exp(-L)))$, with $L = a(\theta - b)$ – $Q_i(\theta) = 1 - P_i(\theta)$ – θ is the ability level of interest
  • 61. Definition of Item Information • Example – b = 1.0, a = 1.5, c = 0.2 – ability level θ = 0.0
    1. L = a(θ - b) = 1.5(0 - 1) = -1.5; EXP(-L) = EXP(1.5) = 4.482; 1/(1 + EXP(-L)) = 1/(1 + 4.482) = 0.182; Pi(θ) = c + (1 - c)(1/(1 + EXP(-L))) = 0.2 + 0.8(0.182) = 0.346
    2. Qi(θ) = 1 - 0.346 = 0.654
    3. Qi(θ)/Pi(θ) = 0.654/0.346 = 1.890
    4. (Pi(θ) - c)² = (0.346 - 0.2)² = (0.146)² = 0.021
    5. (1 - c)² = (1 - 0.2)² = (0.8)² = 0.64
    6. a² = (1.5)² = 2.25
    7. Ii(θ) = (2.25)(1.890)(0.021)/(0.64) = 0.142 (computed from unrounded intermediate values)
    $I_i(\theta) = a^2\, \dfrac{Q_i(\theta)}{P_i(\theta)}\, \dfrac{[P_i(\theta) - c]^2}{(1 - c)^2}$
  • 62. Definition of Item Information • Calculation of item information under a three-parameter model, b = 1.0, a = 1.5, c = 0.2
    θ   L     Pi(θ)  Qi(θ)  Qi(θ)/Pi(θ)  (Pi(θ)-c)²  Ii(θ)
    -3  -6.0  0.20   0.80   3.950        0.000       0.000
    -2  -4.5  0.21   0.79   3.785        0.000       0.001
    -1  -3.0  0.24   0.76   3.202        0.001       0.016
    0   -1.5  0.35   0.65   1.890        0.021       0.142
    1   0.0   0.60   0.40   0.667        0.160       0.375
    2   1.5   0.85   0.15   0.171        0.428       0.257
    3   3.0   0.96   0.04   0.040        0.581       0.082
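A Python sketch of the three-parameter formula for the same item (b = 1.0, a = 1.5, c = 0.2). At θ = 0 it reproduces the worked example's 0.142. At θ = 3 it gives (P - c)² ≈ 0.581 and I ≈ 0.081; the table's 0.082 reflects rounded intermediate values. `item_info_3pl` is a hypothetical helper name.

```python
import math

def item_info_3pl(theta, a, b, c):
    """I_i(theta) = a^2 (Q/P) (P - c)^2 / (1 - c)^2 for the three-parameter model."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    q = 1.0 - p
    return a * a * (q / p) * (p - c) ** 2 / (1.0 - c) ** 2

for theta in range(-3, 4):
    print(theta, round(item_info_3pl(theta, a=1.5, b=1.0, c=0.2), 3))
```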
  • 63. Definition of Item Information
  • 64. Test Information Function • $I(\theta) = \sum_{i=1}^{N} I_i(\theta)$ – I(θ) is the amount of test information at an ability level of θ – $I_i(\theta)$ is the amount of information for item i at ability level θ – N is the number of items in the test
  • 65. Computing a Test Information Function • Example – a 5-item test under the two-parameter model
    item  b     a
    1     -1.0  2.0
    2     -0.5  1.5
    3     0.0   1.5
    4     0.5   1.5
    5     1.0   2.0
  • 66. Computing a Test Information Function • Item information for items 1-5 and the resulting test information
    θ   Item 1  Item 2  Item 3  Item 4  Item 5  Test information
    -3  0.071   0.051   0.024   0.012   0.001   0.159
    -2  0.420   0.194   0.102   0.051   0.010   0.777
    -1  1.000   0.490   0.336   0.194   0.071   2.091
    0   0.420   0.490   0.563   0.490   0.420   2.383
    1   0.071   0.194   0.336   0.490   1.000   2.091
    2   0.010   0.051   0.102   0.194   0.420   0.777
    3   0.001   0.012   0.024   0.051   0.071   0.159
  • 67. The Test Calibration Process • The Birnbaum paradigm is an iterative procedure employing two stages of maximum likelihood estimation. – Stage 1: the parameters of the N items in the test are estimated. – Stage 2: the ability parameters of the M examinees are estimated. • The two stages are performed iteratively until a stable set of parameter estimates is obtained. • At that point, the test has been calibrated and an ability-scale metric defined. 7. Test Calibration
  • 68. The Test Calibration Process • Stage one: – The estimated ability of each examinee is treated as if it is expressed in the true metric of the latent trait. – The parameters of each item in the test are estimated via the maximum likelihood procedure discussed in Estimating Item Parameters. – This is done one item at a time, because an underlying assumption is that the items are independent of each other. – The result is a set of values for the estimates of the parameters of the items in the test.
  • 69. The Test Calibration Process • Stage two: – The ability of each examinee is estimated using the maximum likelihood procedure presented in Estimating an Examinee’s Ability. – It is assumed that the ability of each examinee is independent of all other examinees. Hence, the ability estimates are obtained one examinee at a time.
  • 70. The Test Calibration Process • The two-stage process is repeated until some suitable convergence criterion is met. • The overall effect is that the parameters of the N test items and the ability levels of the M examinees have been estimated simultaneously, even though they were done one at a time.
  • 71. Test Calibration Under the One-Parameter Model
    Examinee   1  2  3  4  5  6  7  8  9 10   RS
    01         0  0  1  0  0  0  0  1  0  0    2
    02         1  0  1  0  0  0  0  0  0  0    2
    03         1  1  1  0  1  0  1  0  0  0    5
    04         1  1  1  0  1  0  0  0  0  0    4
    05         0  0  0  0  1  0  0  0  0  0    1
    06         1  1  0  1  0  0  0  0  0  0    3
    07         1  0  0  0  0  1  1  1  0  0    4
    08         1  0  0  0  1  1  0  0  1  0    4
    09         1  0  1  0  0  1  0  0  1  0    4
    10         1  0  0  0  1  0  0  0  0  1    3
    11         1  1  1  1  1  1  1  1  1  0    9
    12         1  1  1  1  1  1  1  1  1  0    9
    13         1  1  1  0  1  0  1  0  0  1    6
    14         1  1  1  1  1  1  1  1  1  0    9
    15         1  1  0  1  1  1  1  1  1  1    9
    16         1  1  1  1  1  1  1  1  1  1   10
    – Responses are scored 1 for correct and 0 for incorrect; rows are examinees, columns are items, and RS is the raw score.
    – If an item is answered correctly by all of the examinees or by none of them, its item difficulty parameter cannot be estimated.
    – Test calibration under the Rasch model: all examinees having the same number of items correct will obtain the same estimated ability.
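  • 72. Test Calibration Under the One-Parameter Model • Number of examinees in each raw-score group who answered each item correctly (rows: raw score; columns: items 1-10; nonzero cells only)
    Score 1: 1 | row total 1
    Score 2: 1 2 1 | row total 4
    Score 3: 2 1 1 1 1 | row total 6
    Score 4: 4 1 2 2 3 1 1 2 | row total 16
    Score 5: 1 1 1 1 1 | row total 5
    Score 6: 1 1 1 1 1 1 | row total 6
    Score 9: 4 4 2 4 4 4 4 4 4 2 | row total 36
    COL Total (items 1-10): 13 8 8 5 10 7 7 6 7 3 | 74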
  • 73. Test Calibration Under the One-Parameter Model
    Item  Difficulty
    1     -2.37
    2     -0.27
    3     -0.27
    4     0.98
    5     -1.00
    6     0.11
    7     0.11
    8     0.52
    9     0.11
    10    2.06
    Examinee  Ability obtained  Raw score
    01        -1.50             2
    02        -1.50             2
    03        +0.02             5
    04        -0.42             4
    05        -2.37             1
    06        -0.91             3
    07        -0.42             4
    08        -0.42             4
    09        -0.42             4
    10        -0.91             3
    11        +2.33             9
    12        +2.33             9
    13        +0.46             6
    14        +2.33             9
    15        +2.33             9
    16        *****             10 (all items correct: no finite estimate)
  • 74. Test Calibration Under the One-Parameter Model • Under the Rasch model, the value of the discrimination parameter is fixed at 1 for all of the items in the test, so an examinee’s ability estimate depends only on the raw score. This aspect of the Rasch model appeals to practitioners, who intuitively feel that examinees obtaining the same raw test score should receive the same ability estimate.
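A minimal sketch of the two-stage cycle from slides 67-70 (often called joint maximum likelihood), applied under the Rasch model to the slide 71 response matrix. The following choices are assumptions, not taken from the slides:
- log-odds starting values;
- mean item difficulty fixed at 0 to anchor the metric;
- no bias correction;
- Newton steps clamped to ±1.

The function names are hypothetical. The conventions of the original calibration program are not given, so the printed estimates are not expected to match slide 73 exactly. What the sketch does show is that equal raw scores receive equal ability estimates.

```python
import math

# Slide 71 responses, examinees 01-15 (items 1-10). Examinee 16 answered every
# item correctly, has no finite ability estimate, and is left out.
DATA = [
    [0, 0, 1, 0, 0, 0, 0, 1, 0, 0], [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 1, 1, 0, 0], [1, 0, 0, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
]

def rasch_p(theta, b):
    """Rasch (one-parameter) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def newton_1d(x, step_fn, tol=1e-8, max_iter=100):
    """Apply x += step_fn(x), each step clamped to [-1, 1], until the step is tiny."""
    for _ in range(max_iter):
        step = max(-1.0, min(1.0, step_fn(x)))
        x += step
        if abs(step) < tol:
            break
    return x

def solve_item(total, thetas, start):
    """ML difficulty for one item, abilities held fixed."""
    def step(b):
        ps = [rasch_p(t, b) for t in thetas]
        return (sum(ps) - total) / sum(p * (1.0 - p) for p in ps)
    return newton_1d(start, step)

def solve_person(raw, bs, start):
    """ML ability for one examinee, difficulties held fixed."""
    def step(theta):
        ps = [rasch_p(theta, b) for b in bs]
        return (raw - sum(ps)) / sum(p * (1.0 - p) for p in ps)
    return newton_1d(start, step)

def calibrate(data, max_rounds=500, tol=1e-6):
    m, n = len(data), len(data[0])
    raw = [sum(row) for row in data]                       # examinee raw scores
    tot = [sum(row[i] for row in data) for i in range(n)]  # item totals
    theta = [math.log(r / (n - r)) for r in raw]           # log-odds starting values
    b = [math.log((m - s) / s) for s in tot]
    for _ in range(max_rounds):
        new_b = [solve_item(tot[i], theta, b[i]) for i in range(n)]  # stage 1
        shift = sum(new_b) / n
        new_b = [x - shift for x in new_b]                 # anchor metric: mean b = 0
        new_theta = [solve_person(raw[j], new_b, theta[j]) for j in range(m)]  # stage 2
        change = max(abs(x - y) for x, y in zip(new_b + new_theta, b + theta))
        b, theta = new_b, new_theta
        if change < tol:
            break
    return b, theta

b_hat, theta_hat = calibrate(DATA)
print("difficulties:", [round(x, 2) for x in b_hat])
for j, (row, t) in enumerate(zip(DATA, theta_hat), start=1):
    print(f"examinee {j:02d}  raw score {sum(row)}  ability {t:+.2f}")
```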
  • 75. Test Calibration Under the 2/3-parameter Model • When the two- and three-parameter item characteristic curve models are used, an examinee’s ability estimate depends upon the particular pattern of item responses rather than the raw score.
  • 76. Test Calibration Under the 2/3-parameter Model • Under these models, examinees with the same item response pattern will obtain the same ability estimate. Thus, examinees with the same raw score could obtain different ability estimates if they answered different items correctly.
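A sketch contrasting the two cases with examinees 04 and 07 from slide 71, who both have a raw score of 4 but answered different items correctly. Difficulties are taken from the slide 73 table and held fixed. The two-parameter discriminations are hypothetical values chosen only for illustration, and so are the helper names. Under the Rasch model the two estimates coincide. Under the two-parameter model they differ, because the estimate depends on which items were answered correctly.

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_ability(items, responses, theta=0.0, tol=1e-10, max_iter=100):
    """Newton-Raphson ML ability estimate with item parameters held fixed."""
    for _ in range(max_iter):
        ps = [p_2pl(theta, a, b) for a, b in items]
        num = sum(a * (u - p) for (a, _), u, p in zip(items, responses, ps))
        den = sum(a * a * p * (1.0 - p) for (a, _), p in zip(items, ps))
        step = max(-1.0, min(1.0, num / den))  # clamp for safety
        theta += step
        if abs(step) < tol:
            break
    return theta

# Difficulties from the slide 73 calibration.
B = [-2.37, -0.27, -0.27, 0.98, -1.00, 0.11, 0.11, 0.52, 0.11, 2.06]
# Hypothetical discriminations, used only to illustrate the two-parameter case.
A = [0.5, 1.5, 1.0, 2.0, 0.8, 1.2, 0.7, 1.8, 1.0, 0.6]

rasch_items = [(1.0, b) for b in B]
two_pl_items = list(zip(A, B))
pattern_04 = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # examinee 04, raw score 4
pattern_07 = [1, 0, 0, 0, 0, 1, 1, 1, 0, 0]  # examinee 07, raw score 4

rasch_04 = mle_ability(rasch_items, pattern_04)
rasch_07 = mle_ability(rasch_items, pattern_07)
theta_04 = mle_ability(two_pl_items, pattern_04)
theta_07 = mle_ability(two_pl_items, pattern_07)
print(f"Rasch: {rasch_04:.3f} vs {rasch_07:.3f}")  # same raw score, same estimate
print(f"2PL:   {theta_04:.3f} vs {theta_07:.3f}")  # same raw score, different estimates
```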
  • 77. The Framework of IRT • To obtain the many advantages of IRT, tests should be designed, constructed, analyzed, and interpreted within the framework of the theory. • This chapter provides experience with the technical aspects of test construction within the framework of IRT. 8. Specifying the Characteristics of a Test
  • 78. Item Banking • The test construction process is usually based on a collection of items (an item pool) from which the items to be included in a particular test are selected. • Items are selected from such pools on the basis of both their content and their technical characteristics, i.e., their item parameter values. • Under IRT, a well-defined set of procedures is used to establish and maintain such item pools; these procedures have been given the name item banking.
  • 79. Item Banking • Basic goal – to have an item pool in which the values of the item parameters are expressed in a known ability-scale metric.
  • 80. Developing a Test From a Pre-calibrated Item Pool • An ICC model is selected, the examinees’ item response data are analyzed via the Birnbaum paradigm, and the test is calibrated. • The ability scale resulting from this calibration is considered to be the baseline metric of the item pool. • From a test construction point of view, we now have a set of items whose item parameter values are known; in technical terms, a “pre-calibrated item pool” exists.
  • 81. Developing a Test From a Pre-calibrated Item Pool • The advantage of having a pre-calibrated item pool is that the parameter values of the items included in the test can be used to compute the test characteristic curve and the test information function before the test is administered.
  • 82. Some Typical Testing Goals • Screening tests – Tests used for screening purposes have the capability to distinguish rather sharply between examinees whose abilities are just below a given ability level and those who are at or above that level. – Such tests are used to assign scholarships and to assign students to specific instructional programs such as remediation or advanced placement.
  • 83. Some Typical Testing Goals • Broad-range tests – These tests are used to measure ability over a wide range of the underlying ability scale. The primary purpose is to make a statement about an examinee’s ability and to make comparisons among examinees. – Tests measuring reading or mathematics are typically broad-range tests.
  • 84. Some Typical Testing Goals • Peaked tests – Such tests are designed to measure ability quite well in a region of the ability scale where most of the examinees’ abilities will be located, and less well outside this region. – When one deliberately creates a peaked test, it is to measure ability well in a range of ability that is wider than that of a screening test, but not as wide as that of a broad-range test.
  • 85. Summary • Classical Test Theory • IRT – Item Characteristic Curve – Test Characteristic Curve – Estimating an Examinee’s Ability – Test Calibration – Item Banking • Automatic Test Generation