3. Not excessively expensive,
Stays within appropriate time constraints,
Relatively easy to administer,
Has a scoring/evaluation procedure that is
specific and time-efficient.
4. 1. Are administrative details clearly
established before the test?
2. Can students complete the test
reasonably within the set time frame?
3. Is the cost of the test within budget
limits?
5. Consistency of assessment results
(Linn & Gronlund).
A test is reliable if:
“If you give the same test to the same
student or matched students on two
different occasions, the test should yield
similar results.” (Brown, 2004)
6. Student-related reliability
Rater reliability
Test administration reliability
Test reliability
7. The most common learner-
related reliability issues are
caused by temporary
illness, fatigue, a “bad
day”, anxiety, and other
physical or psychological
factors.
8. Inter-rater reliability:
When two or more scorers yield
inconsistent scores on the same test.
Factors: lack of attention to scoring
criteria, inexperience, inattention, etc.
9. Intra-rater reliability:
Unclear scoring criteria, fatigue, bias
toward particular “good” and “bad”
students, or simple carelessness.
10. It can be caused by administration
factors,
e.g. noise from outside, photocopying
variations, room conditions, even the
condition of desks and chairs.
11. Factors causing unreliability:
If a test is too long, test-takers may
become fatigued by the time they reach
the later items and hastily respond
incorrectly.
Ambiguous items.
12. “Measuring what should be measured”
o Content-related evidence
o Criterion-related evidence
o Construct-related evidence
o Consequential validity
o Face validity
13. If a test samples the subject matter
about which conclusions are to be drawn.
If a test requires the test-taker to
perform the behavior that is being
measured.
14. Criterion-related evidence is used to
demonstrate the accuracy of a measure
or procedure by comparing it with
another measure or procedure that has
already been demonstrated to be valid.
15. Example
Imagine a hands-on driving test has been
shown to be an accurate test of driving
skills. The written driving test can then
be validated with a criterion-related
strategy: the scores on the written test
are compared with the scores from the
hands-on driving test.
16. 1. Concurrent validity / empirical validity
A test result is supported by other
concurrent performance beyond the
assessment itself.
e.g.
The validity of a high score on the final
exam of a foreign language course will be
substantiated by actual proficiency in the
language.
17. 2. Predictive validity
Assesses (and predicts) a test-
taker’s likelihood of future success.
e.g. SNMPTN (Indonesia’s national
university entrance selection)
18. How well performance on the
assessment can be interpreted as a
meaningful measure of some characteristic
or quality.
19. How well the use of assessment results
accomplishes intended purposes and avoids
unintended effects.
20. It refers to the degree to which a test
looks right and appears to measure the
knowledge or ability it claims to
measure, based on the subjective
judgment of the examinees who take
it, the administrative personnel who
decide on its use, and other
psychometrically unsophisticated
observers (Mousavi, in Brown, 2004).
22. The language is as natural as possible.
Items are contextualized rather than
isolated.
Topics are meaningful
(relevant, interesting) for the learner.
Some thematic organization to items is
provided, such as through a story line or
episode.
Tasks represent, or closely
approximate, real-world tasks.
23. Contextualized:
“Going to”
1. What _______ this weekend?
a. you are going to do
b. are you going to do
c. your gonna do

Decontextualized:
1. There are three countries I would like
to visit. One is Italy. _______
a. The other is New Zealand and other
is Nepal
b. The others are New Zealand and Nepal
c. Others are New Zealand and Nepal
24. Contextualized:
2. I’m not sure. _______ anything
special?
a. Are you going to do
b. You are going to do
c. Is going to do

Decontextualized:
2. When I was twelve years old, I used
______ every day.
a. swimming
b. to swimming
c. to swim
25. The effect of testing on teaching and
learning (Hughes, in Brown, 2004).
It generally refers to the effects tests have
on instruction in terms of how students
prepare for the test (Brown, 2004).