2. Validity
To assert that a test has construct validity, empirical
evidence is needed.
A valid test measures accurately what it is intended to
measure.
Subordinate forms of validity:
Content validity
Criterion-related validity
3. Content Validity
The content of the test constitutes a representative sample of
the skills it is supposed to measure = Content Validity.
Specification of the skills or structures that the test is meant to
cover.
The test must include a proper sample of the relevant
structures.
ATTENTION!
Content validation should be carried out while a test is being
developed.
4. Criterion-related validity
Concurrent validity
•The test and the criterion are administered at the same time.
•Example: Oral exam. Long vs. short version of the exam.
Random sampling.
•Levels of agreement = correlation coefficient.
•Perfect agreement = coefficient of 1. Lack of agreement =
coefficient of zero.
Predictive validity
•The degree to which a test can predict candidates’ future
performance.
•Example: Proficiency test to predict a student’s ability to cope
with a graduate course at a British university. Criterion
measure: the student’s English as perceived by his supervisor, or
the outcome of the course.
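The level of agreement described above can be computed as a Pearson correlation coefficient between the test scores and the criterion scores. A minimal Python sketch, with invented marks for a hypothetical long and short version of the same oral exam:

```python
import math

def pearson(xs, ys):
    """Pearson correlation: 1 = perfect agreement, 0 = no agreement."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented scores for six candidates on the two versions of the exam
short_version = [55, 62, 70, 48, 81, 66]
full_version  = [58, 60, 73, 50, 79, 69]
print(round(pearson(short_version, full_version), 2))
```

A coefficient close to 1 would support using the short version in place of the full exam; a coefficient near zero would show the two measures are unrelated.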
5. Validity in scoring
Items and the way in which they are scored must be valid.
Example: A reading test. (Should we consider grammar and
spelling mistakes in the responses?)
6. A test is said to have face validity if it looks as if it
measures what it is supposed to measure.
For example: a test meant to measure pronunciation ability
that did not require the candidate to speak would lack face
validity.
7. How to make tests more valid
Write explicit specifications for the test.
Whenever feasible, use direct testing.
The scoring must be related to what is being tested.
Reliability!!
8. Reliability
We have to construct, administer and score items in such a
way that we obtain similar results on different occasions.
9. The reliability coefficient
The reliability coefficient quantifies the reliability of a test.
An ideal reliability coefficient of 1 means the test would
always give the same results.
A reliability coefficient of zero means the sets of results are
unconnected with each other.
10. Estimating reliability requires two sets of scores to be
compared.
TEST-RETEST METHOD: a group of students take the same test
twice.
Problems: 1. Too soon (memorization of the answers). 2. Too
late (forgetting).
Solutions:
Alternate forms method.
Split half method = only one administration of one test.
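The split-half method is commonly carried out by correlating odd-item and even-item half scores and then stepping the result up with the Spearman-Brown correction, since each half is only half the test's length. A minimal sketch, with invented item scores and a hypothetical `split_half_reliability` helper:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """Split each candidate's items into odd/even halves,
    correlate the half scores, and apply Spearman-Brown."""
    odd  = [sum(items[0::2]) for items in item_scores]
    even = [sum(items[1::2]) for items in item_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown correction

# Invented item scores (1 = correct, 0 = wrong), one row per candidate
scores = [
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
]
print(round(split_half_reliability(scores), 2))
```

Only one administration of one test is needed, which avoids the too-soon/too-late problem of the test-retest method.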
11. Scorer reliability
If the scoring of a test is not reliable, then the test results cannot
be reliable either.
For example:
The scorer reliability coefficient on a composition writing
test = .92
The reliability coefficient for the test = .84
Variability in the performance of individual candidates
accounts for the difference between the two coefficients.
12. How to make tests more reliable
Take enough samples of behaviour.
Exclude items that do not discriminate between weaker and
stronger students.
Do not allow candidates too much freedom.
Write unambiguous items.
Provide clear and explicit instructions.
Ensure that tests are well laid out and perfectly legible.
Make candidates familiar with the format and testing
techniques.
Provide uniform conditions of administration.
1. More items = more reliability.
2. Too easy and too difficult items fail to discriminate.
3. Choice of questions makes candidates' scores less comparable.
4. Items with unclear meaning.
5. Do not suppose that all the students understand the instructions.
6. Institutional tests are often badly typed.
7. Unfamiliar aspects of the test.
8. Precautions must be taken.
14. Ways of obtaining scorer reliability
Use items that permit scoring which is as objective as
possible.
Make comparisons between candidates as direct as possible.
Provide a detailed scoring key.
Train scorers.
Agree acceptable responses and appropriate scores at the
outset of scoring.
Identify candidates by number, not name.
Employ multiple, independent scoring.
15. Reliability and validity
To be valid, a test must provide consistently accurate
measurements; it must therefore be reliable.
A reliable test, however, may not be valid at all.
Example: a writing test.
In making tests more reliable, we must be wary of reducing
their validity.