1. Test Equating Using IRT
Presented by: Muhammad Munsif
Presented to: Dr. Nasir Mehmood
Course: Test Theories and Designs
M.Phil. Education (Evening 2019-2021)
munsifsail@gmail.com
2. TEST EQUATING
Equating is part of item analysis.
Used for large-scale standardized and criterion-
referenced tests such as the SAT, IELTS, and GAT.
3. TEST EQUATING
Equating is a statistical process in which scores from
different test forms are adjusted so that they can be used
interchangeably.
“Equating is a statistical process to adjust scores on
different test forms to make them comparable” (Kim &
Hanson, 2002).
4. TEST EQUATING
Each year a separate test is used to measure the abilities of
students. If the performance of the incoming students in a given
year is better than the performance of the students in the
preceding year, one can explain the difference in two ways.
1. Either the incoming students are more proficient or
2. The test they took was easier.
5. TEST EQUATING
In order to exclude the second explanation and make sure that the
observed trends are a valid reflection of changes in students’
abilities over time, the assessment mechanism needs to equate
scores from different test forms and adjust for variations in form
difficulty.
6. Test equating is a procedure “to provide comparable
scores on multiple forms of the same test, consequently
avoiding some of the possible inequities that could occur if
one examinee took a more difficult form of a test than that
taken by another examinee”
7. Conditions/Properties for Test Equating
Dorans (1990) describes four conditions necessary for
equating to be considered successful.
1. The test forms that are being equated must
measure the same construct and must be built to the
same content specifications.
8. 2. A second property of a good equating process is that of
equity.
3. The equating transformation, or adjustment to scores,
must be symmetric.
4. The group invariance property, which states that the
transformation used to equate test scores should work
equally well in all subpopulations of interest.
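The symmetry property above can be illustrated with a minimal sketch of linear equating (a standard equating method, though not one the slides name). The group means and standard deviations below are invented for illustration: symmetry means the form X to form Y transformation is the exact inverse of the form Y to form X transformation.

```python
# Illustrative sketch of the symmetry property using linear equating.
# The group statistics below are made-up numbers, not from the slides.
mu_x, sd_x = 50.0, 10.0   # mean and SD of scores on form X
mu_y, sd_y = 47.0, 8.0    # mean and SD of scores on form Y

def x_to_y(x):
    """Linear equating of a form-X score onto the form-Y scale."""
    return sd_y / sd_x * (x - mu_x) + mu_y

def y_to_x(y):
    """Linear equating of a form-Y score onto the form-X scale."""
    return sd_x / sd_y * (y - mu_y) + mu_x

# Symmetry: equating a score to the other form and back recovers it.
score = 62.0
assert abs(y_to_x(x_to_y(score)) - score) < 1e-9
```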
9. TEST EQUATING USING IRT
IRT models were developed in the context of dichotomously
scored (0, 1) data e.g. MCQs or other tests, scored as
right/wrong.
Later IRT models have been expanded to accommodate
polytomously scored items, including Likert-type items,
performance ratings, and essay questions.
10. The mathematical function underlying IRT models
relates an examinee’s probability of a correct response
to the underlying level of the construct, commonly
termed theta (θ ) in the IRT literature.
Although theta is often referred to as “ability,” it could
represent any construct capable of being modeled in
terms of response probabilities.
11. IRT MODELS
IRT models are not limited to educational applications.
They can be used with any type of latent construct.
Different IRT models are distinguished by the parameters
used to model the relation between theta and the response
probability.
12. IRT MODELS
The one-parameter model includes only a difficulty parameter, usually
symbolized as b.
The two-parameter model adds a discrimination parameter, a.
The three-parameter model adds a pseudo-guessing parameter, c.
This parameter adjusts the model to allow for the possibility that an
examinee might obtain the correct answer by guessing or other
construct-irrelevant behavior. Thus, even an examinee with very low
ability could have a probability greater than zero of obtaining the
correct answer.
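The three models above can be written as one response function. The sketch below uses the logistic form of the 3PL model; with a = 1 and c = 0 it reduces to the one-parameter model, and with c = 0 to the two-parameter model. The parameter values in the demonstration are arbitrary examples, not from the slides.

```python
import math

def irt_prob(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the logistic 3PL model.

    theta: examinee's latent trait level
    b: item difficulty; a: discrimination; c: pseudo-guessing.
    a=1, c=0 gives the one-parameter model; c=0 gives the two-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# With c = 0.2, even a very low-ability examinee keeps a
# response probability above 0.2 (the guessing floor).
print(round(irt_prob(theta=-4.0, b=0.0, a=1.5, c=0.2), 3))  # → 0.202
```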
13. IRT EQUATING DESIGNS
First determine that test forms are sufficiently similar in
content and difficulty.
Select an equating design.
An equating design is essentially a design for
administering the two (or more) test forms.
14. Single-Group Design
Administering both forms to the same sample.
Advantage
Score differences cannot be attributed to differences in the samples.
Disadvantage
Increased testing time
Fatigue effects
Practice effects
15. A safer way to get around this potential
problem
Splitting the group of test takers in two.
In one group, the new test form is taken first and the base
form second.
In the other group the order of the two forms is switched.
This design is known as the single-group counterbalanced
design.
16. Random-Groups Design
The test forms are typically spiraled by creating stacks of
tests in which the two (or more) forms are alternated (e.g., X,
Y, X, Y, X, Y).
Tests are then passed out to examinees in alternating order.
Advantage: Examinees do not have to take two test forms.
Differences in average scores on the two forms can be taken
as indications of differences in test difficulty.
This makes the equating process fairly straightforward.
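Because the randomly equivalent groups can be assumed equal in ability, the mean score difference between forms estimates the difficulty difference, and equating reduces to a simple shift. The score lists below are invented for illustration (this is mean equating, the simplest case).

```python
# Minimal sketch of mean equating under the random-groups design.
# The score lists are invented for illustration.
form_x = [55, 60, 62, 58, 65, 59]   # group that took form X
form_y = [52, 57, 59, 55, 62, 56]   # group that took form Y

mean_x = sum(form_x) / len(form_x)
mean_y = sum(form_y) / len(form_y)

# With randomly equivalent groups, the mean difference is attributed
# to form difficulty, so every form-Y score is shifted by it.
difficulty_shift = mean_x - mean_y   # here: 3.0

def equate_y_to_x(y_score):
    """Place a form-Y score on the form-X scale."""
    return y_score + difficulty_shift

print(equate_y_to_x(55))  # → 58.0
```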
17. The Disadvantage
It requires a large number of test takers because the
group is being split in half. According to Livingston
(2014), the random-groups design can require 5 to
15 times as many examinees as the single-group
counterbalanced design to yield the same level of
equating accuracy.
18. Common Item Nonequivalent Groups
Design
In many cases, new forms of a test are developed and
administered some time after the base form of the test
was administered.
In such cases, it is clearly not possible to randomly split
those taking the two forms into groups.
For nonequivalent groups, this is accomplished by
including a set of common items (anchor/linking items).
19. These common items are included on both test forms and
are used to provide information about group differences on
the construct being measured.
Although the groups may differ in ability, they should not be
too different; otherwise the equating process will not be
able to completely adjust for differences across forms.
This is one of the drawbacks of the common-item
nonequivalent groups design.
20. The ability to interpret score differences in the
common-item nonequivalent groups design depends
heavily on the common-item set.
The common items should be in the same or very similar
locations on both tests because the order in which items
are presented can affect their level of difficulty.
Common items should be dispersed throughout the test,
rather than included in a separate block.
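One common way to use the anchor items in an IRT framework is mean/sigma linking (an established linking method, though not named in the slides): the difficulty (b) estimates of the same anchor items, calibrated separately on each form, determine a linear transformation that places the new form's parameter scale onto the base scale. The b estimates below are hypothetical.

```python
import statistics

# Hypothetical difficulty (b) estimates for the same anchor items,
# calibrated separately on the base form and on the new form.
b_base = [-1.2, -0.4, 0.3, 0.9, 1.6]
b_new  = [-0.9, -0.1, 0.5, 1.2, 1.8]

# Mean/sigma linking: base_scale = A * new_scale + B, where
# A matches the spread and B matches the mean of the anchor b's.
A = statistics.pstdev(b_base) / statistics.pstdev(b_new)
B = statistics.fmean(b_base) - A * statistics.fmean(b_new)

def to_base_scale(value):
    """Place a new-form theta or b estimate on the base-form scale."""
    return A * value + B
```

After this transformation, scores estimated on the new form can be reported on the base form's scale even though the two groups differ in ability.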
21. Question
What is Test Equating?
Name the IRT models.
Name test equating designs.
22. Question
Could the following pairs of tests be equated? Why or
why not?
a. Two parallel versions of a reading comprehension test
b. Two personality tests that are based on different
theories of personality
c. Two tests of motor development, one for children up to
6 months of age and one for children aged 3 to 3½