This short SlideShare presentation gives a basic overview of test reliability and test validity. Validity is the degree to which a test measures what it is supposed to measure. Reliability is the degree to which a test consistently measures whatever it measures. Examples are given, along with a slide on considerations for writing test questions that demand higher-order thinking.
2. Considerations for Test Makers
• We need to make sure that our test will
gather the appropriate data.
– Does it relate to content covered?
– Will it adequately measure evidence of
student learning?
• We need to make sure that our test will
consistently work.
– What assumptions are in place?
– Materials, content, conditions, subjects?
3. Validity
• Validity is the degree to which a test
measures what it is supposed to
measure
• A test is not valid per se; it is valid for a
particular purpose and for a particular
group
4. Face Validity
• If a test appears to be designed
correctly, it has good face validity.
• “Looks good.”
• Appears that it should do what it’s
supposed to do.
– Ex. Algebra questions on an algebra test.
• Uses the very scientific “Biocular
Rejection Test.”
7. Validity
• Construct Validity
– A construct is a non-observable trait, such
as intelligence, which explains behavior
– The degree to which a test measures an
intended hypothetical construct
• Content Validity
– The degree to which a test measures an
intended content area
8. Validity
• Concurrent Validity
– The degree to which the scores on a test are
related to the scores on another, already
established test administered at the same
time.
– Use a measure of correlation: Pearson r,
Spearman rho
• Predictive Validity
– The degree to which a test can predict how
well an individual will do in a future situation
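As a rough illustration of how concurrent validity might be checked, the sketch below correlates scores on a new test with scores on an established test taken at the same time. The score lists and the function names (pearson_r, ranks, spearman_rho) are made up for this example and do not come from the slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def ranks(values):
    """Rank scores from 1 to n, giving tied scores the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every score tied with the score at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical scores for the same seven students on both tests.
new_test = [55, 62, 70, 71, 80, 88, 93]
established_test = [50, 65, 68, 75, 78, 90, 91]

print("Pearson r:", round(pearson_r(new_test, established_test), 3))
print("Spearman rho:", round(spearman_rho(new_test, established_test), 3))
```

A coefficient close to 1 would suggest that the new test ranks students much as the established test does. Predictive validity can be examined the same way by replacing the second list with a later outcome measure.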
9. Reliability
• Reliability is the degree to which a test
consistently measures whatever it
measures.
• If you give the test over and over it will
consistently return the same results.
10. Reliability
• Test-Retest
– Degree to which scores are consistent over time
• Equivalent forms
– Two tests identical in every way except the actual items included
• Split-half
– Give the test once to one group. Split the items into two
halves (e.g., odd vs. even items), score each half for every
test-taker, and correlate the two sets of half scores
• Interscorer/interrater
– A measure of the agreements and disagreements
of 2 or more judges
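Split-half reliability is usually estimated by correlating each test-taker's score on one half of the items with their score on the other half, then applying the Spearman-Brown correction to estimate the reliability of the full-length test. The sketch below assumes an odd/even item split. The response data and the function name split_half_reliability are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate).

    item_scores holds one row per test-taker and one entry per item
    (1 = correct, 0 = incorrect in this example).
    """
    # Items 1, 3, 5, ... form one half; items 2, 4, 6, ... form the other.
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown: estimate reliability of a test twice as long as each half.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full

# Hypothetical responses: five test-takers, six items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]

r_half, r_full = split_half_reliability(responses)
print("Correlation between halves:", round(r_half, 3))
print("Spearman-Brown estimate:", round(r_full, 3))
```

The correction is needed because each half is only half as long as the real test, and shorter tests tend to be less reliable. Test-retest and equivalent-forms reliability can be estimated with the same pearson_r function applied to the two sets of total scores.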
11. Can a measure be both
valid and reliable?
• Does it measure what it is supposed to
measure?
– Did you find what you were looking for?
• Does it measure what it is supposed to
measure and does it consistently
measure it, time after time?
– If you administered the same measure
many times would you get similar results?
13. Reliable but not valid.
Arrows on target and grouped, indicating consistency, yet not in
the bullseye, which is assumed to be the goal of the archer.
14. Neither reliable nor valid.
Arrows not on target and not grouped, indicating inconsistency.
15. Both reliable and valid.
Arrows on target, and grouped in the bullseye, demonstrating both
accuracy (validity) and consistency (reliability).
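The target analogy can be put into numbers as a loose sketch. Treat the bullseye as the point (0, 0). The distance from the bullseye to the center of the arrow group stands in for validity (smaller is better), and the average distance of the arrows from their own center stands in for reliability (smaller is better). The coordinates and the function name describe_shots are invented for illustration.

```python
from math import hypot

def describe_shots(shots):
    """Return (offset, spread) for a list of (x, y) arrow positions.

    offset: distance from the bullseye at (0, 0) to the center of the group.
    spread: mean distance of the arrows from the center of their own group.
    """
    n = len(shots)
    cx = sum(x for x, _ in shots) / n
    cy = sum(y for _, y in shots) / n
    offset = hypot(cx, cy)
    spread = sum(hypot(x - cx, y - cy) for x, y in shots) / n
    return offset, spread

# Hypothetical arrow positions for the three pictured cases.
reliable_not_valid = [(4.0, 4.1), (4.2, 3.9), (3.9, 4.0), (4.1, 4.2)]
neither = [(-5.0, 3.0), (4.0, -6.0), (6.0, 5.0), (-2.0, -7.0)]
both = [(0.1, -0.1), (-0.2, 0.1), (0.0, 0.2), (0.1, 0.0)]

for label, shots in [("reliable, not valid", reliable_not_valid),
                     ("neither", neither),
                     ("reliable and valid", both)]:
    offset, spread = describe_shots(shots)
    print(f"{label}: offset={offset:.2f}, spread={spread:.2f}")
```

A tight group far from the bullseye gives a small spread but a large offset. Only the third case keeps both numbers small.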
16. Being both reliable and valid
• A test can be reliable, meaning test-
takers will get the same score no
matter when or where they take it.
– This does not mean that the test is valid or
measuring what it is supposed to
measure.
• A test can be reliable without being
valid; however, a test cannot be valid
unless it is reliable.
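One way to make the last point precise comes from classical test theory, which the slides do not cite and is brought in here as an outside assumption. Write $r_{xy}$ for the validity coefficient (the correlation between test $x$ and a criterion $y$), and $r_{xx'}$ and $r_{yy'}$ for the reliabilities of the test and the criterion. Under that model:

```latex
r_{xy} \le \sqrt{r_{xx'} \, r_{yy'}}
```

Because $r_{yy'} \le 1$, the validity coefficient cannot exceed $\sqrt{r_{xx'}}$. A test with reliability 0.49, for example, could have a validity coefficient of at most 0.70, however well its content matches the criterion.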
17. Consideration of the test format
• What will best test evidence of
learning?
– Multiple choice
– Matching
– Essay
– Open ended?
• Try to use questions that stimulate
higher-order thinking
19. Summary
• Validity is the degree to which a test
measures what it is supposed to measure
• Reliability is the degree to which a test
consistently measures whatever it measures.
• A test can be reliable without being valid;
however, a test cannot be valid unless it is
reliable.