The document discusses test equating, which is the process of establishing comparable scores on different forms of a test. It covers topics such as why scaled scores are reported instead of raw scores, considerations in choosing a score scale, limitations of equating, different equating methods like linear and equipercentile equating, and different equating designs like single-group and anchor designs. It provides explanations of key concepts in test equating and guidelines for effective equating.
1. Using AlisJK
Equating
http://www.negeripelangi.org/index.php/id/produk/alisjk
Wildan Maulana
wildan.m@openthinklabs.com
2. Objectives
● Explain why testing organizations report scaled scores instead of
raw scores.
● State two important considerations in choosing a score scale.
● Explain how equating differs from statistical prediction.
● Explain why equating for individual test-takers is impossible.
● State the linear and equipercentile definitions of comparable
scores and explain why they are meaningful only with reference to
a population of test-takers.
3. Objectives
● Explain why linear equating leads to out-of-range scores
and is heavily group-dependent and how equipercentile
equating avoids these problems.
● Explain why equipercentile equating requires “smoothing.”
● Explain how the precision of equating (by any method) is
limited by the discreteness of the score scale.
● Describe five data collection designs for equating and
state the main advantages and limitations of each.
4. Objectives
● Explain the problems of “scale drift” and “equating strains.”
● State at least six practical guidelines for selecting common
items for anchor equating.
● Explain the fundamental assumption of anchor equating and
explain how it differs for different equating methods.
● Explain the logic of chained equating methods in an anchor
equating design.
● Explain the logic of equating methods that condition on
anchor scores and the conditions under which these
methods are biased.
5. Contents
● Test Equating
● Test Equating Designs
● Equating Methods
● Forms of Test Equating
9. Limitations of Equating
● Equating cannot adjust scores correctly for
every individual test-taker.
● Equating cannot adjust scores correctly for
every possible group of test-takers.
10. Points to Keep in Mind
Lord's conditions (as cited in Hambleton & Swaminathan, 1985)
● Tests that measure different traits or abilities
cannot be equated.
● Raw scores on tests that differ in reliability
cannot be equated.
● Raw scores on tests that differ in difficulty
cannot be equated.
● Scores on tests X and Y cannot be equated
without evidence that the two tests are parallel.
● Scores from two tests that cover different
content cannot be equated.
12. A General Definition of Equating
A score on the new form and a score on the
reference form are equivalent in a
group of test-takers if they represent the same
relative position in the group.
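Under the linear definition of comparable scores (one of the two definitions listed in the objectives), "the same relative position" is read as the same z-score, so a new-form score x converts to y = μ_Y + (σ_Y/σ_X)(x - μ_X). The Python sketch below is a minimal illustration assuming both forms were taken by the same group (or by equivalent groups); the function name and the score lists are hypothetical, not AlisJK code.

```python
from statistics import mean, pstdev


def linear_equate(x, new_scores, ref_scores):
    """Hypothetical helper: map new-form score x to the reference-form score
    with the same z-score in the group (population SDs)."""
    mu_x, sd_x = mean(new_scores), pstdev(new_scores)
    mu_y, sd_y = mean(ref_scores), pstdev(ref_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical raw scores, assuming both forms have 25 items.
new_form = [10, 12, 14, 16, 18]
ref_form = [12, 15, 18, 21, 24]

print(linear_equate(14, new_form, ref_form))  # new-form mean maps to the reference-form mean (18)
print(linear_equate(25, new_form, ref_form))  # about 34.5: above the 25-point maximum
```

With these hypothetical data the slope is 1.5, so a perfect new-form score of 25 converts to 34.5, a score that cannot occur on a 25-item reference form. This is the out-of-range problem of linear equating mentioned in the objectives.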
17. Equipercentile Equating
● To equate scores on the new form to scores on
the reference form in a group of test-takers,
transform each score on the new form to the
score on the reference form that has the same
percentile rank in that group.
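The equipercentile definition can be sketched in Python as follows. This is a minimal illustration assuming integer raw scores and the common convention that a score's percentile rank is the percent of the group below it plus half the percent at it. To invert on the reference form, each reference-form score y is treated as spread uniformly over the interval from y - 0.5 to y + 0.5, following the continuization described by Kolen and Brennan (2004). No smoothing is applied, and the function names and data are hypothetical.

```python
from collections import Counter


def percentile_rank(x, scores):
    """Percent of the group below x plus half the percent at x."""
    below = sum(1 for s in scores if s < x)
    at = sum(1 for s in scores if s == x)
    return 100.0 * (below + 0.5 * at) / len(scores)


def equipercentile_equate(x, new_scores, ref_scores, max_ref):
    """Hypothetical helper: find the reference-form score with the same
    percentile rank as new-form score x (no smoothing)."""
    p = percentile_rank(x, new_scores) / 100.0
    counts = Counter(ref_scores)
    n = len(ref_scores)
    cum = 0.0  # proportion of the reference group scoring below y
    for y in range(max_ref + 1):
        f = counts[y] / n
        if cum + f > p:
            # interpolate within the interval [y - 0.5, y + 0.5)
            return (p - cum) / f + (y - 0.5)
        cum += f
    return max_ref + 0.5


new_form = [0, 1, 1, 2, 2, 2, 3, 3, 4]  # hypothetical: harder new form
ref_form = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # same shape, one point higher
for x in range(5):
    print(x, round(equipercentile_equate(x, new_form, ref_form, max_ref=5), 3))
```

Because the hypothetical reference-form distribution is the new-form distribution shifted up by one point, every new-form score converts to a score one point higher. With real, irregular score distributions, this unsmoothed version produces a bumpy conversion, which is why the objectives mention smoothing.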
28. Equating Designs
● The single-group design
● The counterbalanced design
● The equivalent-groups design
● The internal-anchor design
● The external-anchor design
33. Selecting “Common Items” for an Internal Anchor
● Include enough questions from the reference form
● Choose a set of questions that resembles the full test in
content and format
● Include questions that represent the full range of difficulty
● Don’t include any questions that have been changed.
● Try to avoid breaking up an “item set.”
34. Selecting “Common Items” for an Internal Anchor
● Don’t use questions at the end of the test as
anchor items, unless the time limit is very
generous
● Put each anchor item in approximately the
same position in the new form as it was in the
reference form
● Other things being equal, choose common
items that correlate well with the total score.
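The last guideline can be checked with item-total correlations. The Python sketch below is a minimal illustration assuming a hypothetical layout of one row per examinee with 0/1 item scores. It uses the uncorrected correlation with the total score (the item itself is included in the total), which is one common choice among several.

```python
from statistics import mean, pvariance


def pearson(a, b):
    """Pearson correlation using population (divide-by-n) moments.
    Items answered the same way by everyone have zero variance and
    raise ZeroDivisionError here."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (pvariance(a) * pvariance(b)) ** 0.5


def item_total_correlations(responses):
    """Correlation of each item's 0/1 scores with the total score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[i] for row in responses], totals) for i in range(n_items)]


# Hypothetical responses: 4 examinees x 3 candidate anchor items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for i, r in enumerate(item_total_correlations(responses)):
    print(f"item {i}: r = {r:.3f}")
```

Other things being equal, the candidate with the highest correlation (item 1 in this toy example) would be preferred for the anchor.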
48. Test: Anchor Equating
● A test developer is assembling a new form of a
test that will be equated to a previous form by
means of an internal anchor consisting of
repeated questions (“common items”). The
reference form included a set of four questions
based on a particular reading passage, and the
test developer wants to include those questions in
the anchor. However, one of those questions has
been changed. What should the statistician tell
the test developer to do?
49. Test: Anchor Equating
● In what part of the score distribution does the standard error
of equating tend to be smallest?
● In chained equipercentile equating, what statistical
relationship is assumed to generalize
from the equating sample to the target population?
● In Tucker equating, what statistical relationship is assumed
to generalize from the equating sample to the target
population?
● Name an anchor equating method that equates the new
form to the anchor in one group of test-takers and equates
the anchor to the reference form in another group of test-
takers.
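The procedure described in the last question can be sketched in Python for the linear case. First, link the new form to the anchor in the group that took the new form. Then, link the anchor to the reference form in the group that took the reference form, and compose the two links. This is a minimal illustration; the function names and data are hypothetical. An equipercentile version would replace each linear link with an equipercentile one.

```python
from statistics import mean, pstdev


def linear_link(from_scores, to_scores):
    """Return a function that maps the from-scale to the to-scale by
    matching z-scores within one group (same examinees in both lists)."""
    mf, sf = mean(from_scores), pstdev(from_scores)
    mt, st = mean(to_scores), pstdev(to_scores)
    return lambda s: mt + (st / sf) * (s - mf)


def chained_linear(x, new_x, new_anchor, ref_anchor, ref_y):
    x_to_v = linear_link(new_x, new_anchor)  # group that took the new form
    v_to_y = linear_link(ref_anchor, ref_y)  # group that took the reference form
    return v_to_y(x_to_v(x))


# Hypothetical data: forms of equal difficulty, new-form group more able.
new_x, new_anchor = [10, 20, 30], [5, 10, 15]
ref_anchor, ref_y = [4, 8, 12], [8, 16, 24]
print(chained_linear(30, new_x, new_anchor, ref_anchor, ref_y))
```

In this toy example, the raw means differ because the groups differ in ability, but the chain returns each score to itself (about 30 for x = 30), because the anchor absorbs the group difference.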
50. Test: Anchor Equating
● Name an anchor equating method that uses
data from the anchor test to estimate the mean
and standard deviation of the scores on each
form in the target population.
● Name an anchor equating method that tends to
give better results if the score distributions are
smoothed before the method is applied.
● Name an anchor equating method that requires
reliability estimates for the full test and the
anchor.
51. Test: Anchor Equating
● Name an anchor equating method that
produces an equating conversion that is correct
for every examinee in the new form equating
sample.
● Briefly describe the conditions under which the
Tucker equating method is heavily biased.
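For reference, Kolen and Brennan (2004) present Tucker linear equating as estimating the mean and variance of each form in a synthetic target population. This population is a weighted mix (weights w1 and w2 = 1 - w1) of group P, which took the new form X and anchor V, and group Q, which took the reference form Y and V. The method assumes that the linear regression of total score on anchor score is the same in both groups. The Python sketch below is a minimal illustration of those formulas; the names and data are hypothetical.

```python
from statistics import mean, pvariance


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def tucker_linear(x, p_x, p_v, q_y, q_v, w1=0.5):
    """Hypothetical helper: Tucker linear equating of new-form score x.
    p_x, p_v: new-form and anchor scores in group P.
    q_y, q_v: reference-form and anchor scores in group Q.
    w1: weight of group P in the synthetic population."""
    w2 = 1.0 - w1
    g1 = cov(p_x, p_v) / pvariance(p_v)  # slope of X on V in group P
    g2 = cov(q_y, q_v) / pvariance(q_v)  # slope of Y on V in group Q
    d_mu = mean(p_v) - mean(q_v)
    d_var = pvariance(p_v) - pvariance(q_v)
    mu_x = mean(p_x) - w2 * g1 * d_mu
    mu_y = mean(q_y) + w1 * g2 * d_mu
    var_x = pvariance(p_x) - w2 * g1**2 * d_var + w1 * w2 * g1**2 * d_mu**2
    var_y = pvariance(q_y) + w1 * g2**2 * d_var + w1 * w2 * g2**2 * d_mu**2
    return mu_y + (var_y / var_x) ** 0.5 * (x - mu_x)


# Hypothetical data: X and Y are equally difficult, but group P is more able.
p_v, p_x = [6, 10, 14], [12, 20, 28]
q_v, q_y = [4, 8, 12], [8, 16, 24]
print(tucker_linear(25, p_x, p_v, q_y, q_v))  # equal forms: 25 maps back to 25
```

Setting w1 = 0 or w1 = 1 makes the target population the reference-form group or the new-form group, respectively.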
52. Linking Between Tests
(Kolen and Brennan, 2004)
● Equating
● Concordance
● Prediction
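The following example shows one way that prediction differs from equating. A regression line that predicts reference-form scores from new-form scores is not the inverse of the line that predicts in the other direction, because both lines regress toward the mean. A linear equating conversion, by contrast, is symmetric. The Python sketch below is a minimal illustration using hypothetical scores of one group on both forms.

```python
from statistics import mean, pstdev, pvariance


def regression_line(pred, crit):
    """Least-squares line predicting crit from pred."""
    mp, mc = mean(pred), mean(crit)
    slope = sum((a - mp) * (b - mc) for a, b in zip(pred, crit)) / len(pred) / pvariance(pred)
    return lambda s: mc + slope * (s - mp)


def equating_line(from_scores, to_scores):
    """Linear equating: match means and standard deviations."""
    mf, mt = mean(from_scores), mean(to_scores)
    ratio = pstdev(to_scores) / pstdev(from_scores)
    return lambda s: mt + ratio * (s - mf)


# Hypothetical scores of one group on both forms (correlation 0.8).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

back_reg = regression_line(y, x)(regression_line(x, y)(5))
back_eq = equating_line(y, x)(equating_line(x, y)(5))
print(back_reg)  # regression toward the mean: does not return to 5
print(back_eq)   # equating round trip returns to 5
```

Here, predicting y from x = 5 gives 4.6, and predicting x back from 4.6 gives 4.28, not 5.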
53. ● Single-group design (Rancangan Kelompok Tunggal, RKT)
● Equivalent-groups design (Rancangan Kelompok Ekuivalen, RKE)
● Anchor-item design (Rancangan dengan Butir Jangkar, RBJ)
56. Test Equating Methods
(Angoff, 1982; Lord, 1980)
● Regression method
● Mean-sigma method
● Robust mean-sigma method
● Characteristic curve method
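In IRT-based equating, the mean-sigma method is usually described as finding linking constants A and B from the common items' difficulty estimates b on the two scales. The constants are A = σ(b_ref)/σ(b_new) and B = μ(b_ref) - A·μ(b_new). These give θ_ref = A·θ_new + B, b_ref = A·b_new + B, and a_ref = a_new/A. The Python sketch below is a minimal illustration of that description; the function names and parameter values are hypothetical. Because A is a ratio of standard deviations, using population or sample SDs gives the same result.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_ref):
    """Linking constants A, B from common-item difficulty estimates
    on the new-form scale (b_new) and the reference scale (b_ref)."""
    A = pstdev(b_ref) / pstdev(b_new)
    B = mean(b_ref) - A * mean(b_new)
    return A, B


def to_reference_scale(a, b, A, B):
    """Rescale one item's discrimination a and difficulty b."""
    return a / A, A * b + B


# Hypothetical common-item difficulties estimated separately on each form.
b_new = [-1.0, 0.0, 1.0]
b_ref = [-1.5, 0.5, 2.5]
A, B = mean_sigma(b_new, b_ref)
print(A, B)
print(to_reference_scale(1.2, 0.0, A, B))
```

The robust variant listed above down-weights outlying common items. The characteristic curve methods choose A and B by matching test characteristic curves instead of moments.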
61. Forms of Test Equating
● Vertical test equating
– Equating of tests used at different levels.
● Horizontal test equating
– Equating in which two or more test forms are
developed from the same content and item
specifications, although in practice each form
usually differs in difficulty.
62. Four Aspects of Equivalence to
Consider
● Inferences
● Constructs
● Populations
● Measurement characteristics and conditions
63. Test Equating Procedure
● Prerequisite (assumption) tests
● Post hoc test results
– Scheffé test
– Tukey test
– Bonferroni test
– LSD (Least Significant Difference) test
● Homogeneity-of-variance test results
● Normality test results for the scores of the three groups
● Estimation of item and ability parameters
● Estimation of the equating equation
64. References
● Livingston, S. A. (2004). Equating Test Scores
(Without IRT). Educational Testing Service (ETS).
● Sukirno DS. Penyetaraan Tes UAN: Mengapa dan
Bagaimana [Equating the UAN Test: Why and How].
FISE, Universitas Negeri Yogyakarta.
65. Thank You
wildan.m@openthinklabs.com http://www.openthinklabs.com
@wildanmaulana @openthinklabs
For discussion, please join the AlisJK mailing list:
http://groups.google.com/group/alisjk?hl=id