Reliability

  CHAPTER 4
What is Reliability?

Defining Reliability

Test/Retest Reliability Estimates


A second problem with the Test/Retest method is the length
of time required to conduct the two test administrations.

A short delay between Time 1 and Time 2 increases the
potential for carry-over effects due to memory, fatigue,
practice, etc.

But a long delay between Time 1 and Time 2 increases the
potential for carry-over effects due to mood, developmental
change, etc.

Consequently, the Test/Retest method is most appropriate in
contexts wherein the test is not susceptible to carry-over
effects.
Parallel-Forms & Alternative-Forms Reliability Estimates

Internal Consistency Estimates of Reliability

We have seen that reliability estimates can be obtained by
administering the same test to the same examinees and by
correlating the results: Test/Retest

We have also seen that reliability estimates can be obtained by
administering two parallel or alternate forms of a test, and then
correlating those results: Parallel- & Alternate-Forms

In both cases, the researcher must administer two exams, sometimes
at different times, which makes the results susceptible to
carry-over effects.

Here, we will see that it is possible to obtain a reliability estimate
using only a single test.

The most common way to obtain a reliability estimate using a
single test is through the Split-half approach.
Split-Half approach to Reliability

When using the Split-Half approach, one gives a single test to
a group of examinees.

Later, the test is divided into two parts, which may be
considered to be alternate forms of one another.
• In fact, the split is not so arbitrary; an attempt is made to
   choose the two halves so that they are parallel or
   essentially τ-equivalent.
   • If the halves are considered parallel, then the reliability
       of the whole test is estimated using the Spearman-
       Brown formula.

   • If the halves are essentially τ-equivalent, then the
     coefficient α can be used to estimate reliability.
Split-Half approach to Reliability




Half-test reliability    Whole-test reliability (Spearman-Brown)
0.00                     0.00
0.20                     0.33
0.40                     0.57
0.60                     0.75
0.80                     0.89
1.00                     1.00
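
The paired values above are consistent with the split-half form of the Spearman-Brown formula, in which the whole-test reliability is 2r / (1 + r) and r is the correlation between the two halves. The following is a minimal sketch under that assumption; the function name is illustrative and not taken from the slides.

```python
def spearman_brown_split_half(r_halves: float) -> float:
    """Estimate whole-test reliability from the correlation between two halves.

    Assumes the two halves are parallel forms of one another.
    """
    return 2 * r_halves / (1 + r_halves)


# Print each half-test correlation next to the whole-test estimate.
for r in (0.00, 0.20, 0.40, 0.60, 0.80, 1.00):
    print(f"{r:.2f}  {spearman_brown_split_half(r):.2f}")
```

Because the estimate depends only on the correlation between halves, it is sensitive to how the halves are formed.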
Split-Half approach to Reliability

On the other hand, the two test halves may not be (and likely
are not) parallel forms.

Unequal variances in the two halves confirm that they are not
parallel.

In these situations, it is best to use a different approach to
estimating reliability.
• Cronbach’s coefficient α

α can be used to estimate the reliability of the entire test.
Split-Half approach to Reliability

If the test halves are not essentially τ-equivalent, then
coefficient α will give a lower bound for the test’s reliability.
• In other words, the test’s reliability must be greater than,
    or equal to, the value produced by Cronbach’s α.

   • If α is a high value, then you know that the test
     reliability is also high.

   • If α is a low value, then you may not know whether the
     test actually has low reliability or whether the halves of
     the test are simply not essentially τ-equivalent.
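
As a rough illustration, coefficient α for two halves can be computed from the variance of the total score and the variances of the two half scores. This sketch assumes the standard two-part form, α = 2(σ²_X − σ²_Y1 − σ²_Y2) / σ²_X. The function name and the half scores are hypothetical.

```python
from statistics import pvariance


def alpha_two_halves(half1, half2):
    """Coefficient alpha from each examinee's scores on two test halves."""
    totals = [a + b for a, b in zip(half1, half2)]
    var_total = pvariance(totals)
    return 2 * (var_total - pvariance(half1) - pvariance(half2)) / var_total


# Hypothetical half scores for five examinees.
half1 = [4, 6, 5, 8, 7]
half2 = [3, 7, 5, 9, 6]
alpha = alpha_two_halves(half1, half2)
print(f"alpha = {alpha:.3f}")
```

If the halves are not essentially τ-equivalent, the value printed here should be read as a lower bound rather than as the reliability itself.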
Split-Half approach to Reliability

If the variances of the two test halves are equal, then the
Spearman-Brown formula and Cronbach’s α will produce
identical results.

If the variances of the two test halves are equal, but the
halves are not Essentially τ-Equivalent, then both the
Spearman-Brown formula and Cronbach’s α will
underestimate the test’s reliability.
• Lower bound estimate

If the observed-score variances of the test halves are equal
and the tests are Essentially τ-Equivalent, then the Spearman-
Brown formula and Cronbach’s α will both equal the test’s
reliability.
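
The agreement between the two estimates when the half variances are equal can be checked numerically. The sketch below assumes the split-half Spearman-Brown form 2r / (1 + r) and the two-part form of α. All data are made up for illustration, and the second pair of halves is included only to show the estimates separating once the variances differ.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / sqrt(pvariance(x) * pvariance(y))


def spearman_brown_split_half(r):
    return 2 * r / (1 + r)


def alpha_two_halves(half1, half2):
    totals = [a + b for a, b in zip(half1, half2)]
    var_total = pvariance(totals)
    return 2 * (var_total - pvariance(half1) - pvariance(half2)) / var_total


# Hypothetical halves with equal variances (the second is a reordering of the first).
h1, h2 = [2, 4, 6, 8], [4, 2, 8, 6]
sb_equal = spearman_brown_split_half(pearson_r(h1, h2))
alpha_equal = alpha_two_halves(h1, h2)

# Same correlation, but the second half now has four times the variance.
h2_scaled = [2 * score for score in h2]
sb_unequal = spearman_brown_split_half(pearson_r(h1, h2_scaled))
alpha_unequal = alpha_two_halves(h1, h2_scaled)

print(sb_equal, alpha_equal)      # the two estimates coincide
print(sb_unequal, alpha_unequal)  # alpha falls below Spearman-Brown
```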
Split-Half approach to Reliability

The major advantage of internal-consistency reliability
estimates is that the test need only be given once to
obtain such an estimate.

This approach is limited to tests that can be divided into
two parts that are either parallel or essentially
τ-equivalent. It cannot be used when the test lacks
independent items that can be separated from one another.
• In these situations, one must use test/retest, parallel-, or
   alternate-forms reliability approaches.

Assuming one is able to use the Split-Half approach, however,
how does one go about forming two test halves?
Split-Half approach to Reliability

Forming Test Halves:

There are 3 commonly used methods for forming test halves:
1. The Odd/Even method

2. The Ordered method

3. The Matched Random Subsets method
Odd/Even approach to Test Halves

The Odd/Even method classifies items by their position on
the test, whether odd-numbered or even-numbered.
• In other words, all odd-numbered test items form the first
  half, and all even-numbered test items form the second
  half.

After the two halves are formed, a score for each half is
obtained for each examinee.

These scores are used to obtain an estimate of reliability.

This is a fairly simple and straightforward approach to
forming two test halves.
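
The Odd/Even split can be sketched in a few lines. The response matrix below is hypothetical (1 = item passed, 0 = item failed), and the function name is illustrative only.

```python
def odd_even_half_scores(item_scores):
    """Return (odd_half_scores, even_half_scores), one score per examinee.

    item_scores holds one list per examinee, with items in test order,
    so index 0 is item 1 (odd-numbered), index 1 is item 2, and so on.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return odd, even


# Hypothetical responses of four examinees to a six-item test.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
odd_scores, even_scores = odd_even_half_scores(responses)
print(odd_scores, even_scores)
```

The two lists of half scores can then be correlated, or their variances compared, to obtain the reliability estimate.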
Ordered approach to Test Halves

The Ordered method requires that a test be divided prior to its
administration.

From this point, there are multiple ways to
administer the Ordered method.
1. Every examinee can be given the same test, and one can then
    compare scores from the first half to scores from the second
    half.
   • Carry-over effects may be a concern.

2. The halves are labeled, say A and B, and are then given in
    different orders to different examinees.
   • In other words, half the examinees will be randomly
      assigned order A-B, and the other half will be assigned
      order B-A.

The Ordered method is generally considered to be less
satisfactory than the Odd/Even method because of the increased
potential for carry-over effects.
The Matched Random Subsets approach to Test Halves

The Matched Random Subsets method is much more
sophisticated than the two aforementioned methods.

This process involves several steps:
1. For each test item, two statistics are computed:
   • The proportion of examinees passing the item – a
       measure of the item’s “difficulty.”
   • The biserial or point-biserial correlation between the
       item score and the total test score.

2. Each item is plotted on a graph using the above two
    statistics.
   • Items that are close together on the graph are paired,
       and one item from each pair is randomly assigned to
       one half of the test.
   • The remaining items form the other half of the test.
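
The steps above can be sketched as follows. The slides describe pairing items by inspecting a graph; the greedy nearest-neighbour pairing used here is a stand-in chosen for illustration, not the procedure from the source. The function names and the item coordinates are likewise hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def item_statistics(responses):
    """Return (difficulty, point-biserial r) for each item of a 0/1 matrix."""
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        p = mean(item)  # proportion of examinees passing the item
        # Point-biserial r is the Pearson r between a 0/1 item and the total.
        cov = mean([i * t for i, t in zip(item, totals)]) - p * mean(totals)
        sd = pstdev(item) * pstdev(totals)
        stats.append((p, cov / sd if sd > 0 else 0.0))
    return stats


def matched_random_halves(stats, rng):
    """Pair nearby items, then randomly send one of each pair to each half.

    With an odd number of items, the last unpaired item is left unassigned.
    """
    unpaired = list(range(len(stats)))
    half_a, half_b = [], []
    while len(unpaired) >= 2:
        i = unpaired.pop(0)
        # Nearest remaining item in the (difficulty, correlation) plane.
        j = min(
            unpaired,
            key=lambda k: sqrt(
                (stats[i][0] - stats[k][0]) ** 2
                + (stats[i][1] - stats[k][1]) ** 2
            ),
        )
        unpaired.remove(j)
        pair = [i, j]
        rng.shuffle(pair)  # random assignment within the pair
        half_a.append(pair[0])
        half_b.append(pair[1])
    return half_a, half_b


# Hypothetical (difficulty, point-biserial) values for six items.
stats = [(0.30, 0.40), (0.32, 0.42), (0.60, 0.55),
         (0.62, 0.53), (0.85, 0.30), (0.83, 0.28)]
half_a, half_b = matched_random_halves(stats, random.Random(0))
print(half_a, half_b)
```

Passing a seeded `random.Random` keeps the assignment reproducible. In practice `item_statistics` would supply the coordinates from real response data.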
The Matched Random Subsets approach to Test Halves

[Figure: plot of test items A through F by item difficulty and item-total correlation]

For example, in the graphic above, we see the plot of test
items A, B, C, D, E & F.

Test items A & B are similar, and therefore grouped. Likewise,
so is C with D, and E with F.
Internal-Consistency Reliability – The General Case

In our previous examples, we divided a given test into two equal halves.

But, here we can examine dividing a given test into multiple equal components.

Even in these cases, we can apply the basic principles of each of the methods for dividing
a test.
   • For example, the odd/even method can be modified to divide a nine-item test into
       thirds by taking every third item in sequence to form a given component.

  • The Matched Random Subsets method would involve forming triplets rather than
    pairs, with the first item of each triplet randomly assigned to one component,
    the next to another, and so on.
Internal-Consistency Reliability – The General Case

Let us assume that a given test is divided into N components.

The variances of the scores on each component and the variance of the total test score
are used to estimate the reliability of the test.

If the components are essentially τ-equivalent, then formulas presented herein will
provide good estimates of the test’s reliability.

If, however, the components are not essentially τ-equivalent, then the formulas
presented herein will underestimate (i.e., provide a lower bound for) the test’s reliability.

Furthermore, it is important that any test divided into components measure only a single
trait (i.e., be homogeneous in content).
• Intelligence tests are a classic example of a heterogeneous test, because they measure
   a broad spectrum of traits.
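
A minimal sketch of the N-component estimate, assuming the standard form of coefficient α, α = N/(N − 1) · (1 − Σσ²_Yi / σ²_X), where σ²_Yi is the variance of component i and σ²_X is the variance of the total score. The function name and the component scores are hypothetical.

```python
from statistics import pvariance


def coefficient_alpha(components):
    """Coefficient alpha for a test divided into N components.

    components is a list of N lists; each inner list holds every
    examinee's score on one component, in the same examinee order.
    """
    n = len(components)
    totals = [sum(scores) for scores in zip(*components)]
    sum_component_var = sum(pvariance(c) for c in components)
    return (n / (n - 1)) * (1 - sum_component_var / pvariance(totals))


# Hypothetical scores of five examinees on three components.
components = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 6],
]
alpha_n = coefficient_alpha(components)
print(f"alpha = {alpha_n:.3f}")
```

With N = 2 this reduces to the two-halves form of α. As noted above, the result is only a lower bound when the components are not essentially τ-equivalent.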
The Spearman-Brown Formula: The General Case

If the component tests are not parallel, then the Spearman-
Brown formula will either over- or underestimate the
reliability of a longer test.

An example scenario of overestimation:
• Suppose one has a ten-item test with a reliability of 0.60.
• The Spearman-Brown formula predicts that adding a
   parallel ten-item test will raise the total reliability
   to 0.75.
• But suppose the test that is added is a faulty test that has
   no variance.
• Effectively, we’ve only added a constant to every
   examinee’s score, which does not contribute to the test’s
   reliability.
• In this case, the total test reliability would still be 0.60.
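
The 0.60 to 0.75 prediction in this example follows from the general Spearman-Brown formula, kρ / (1 + (k − 1)ρ), where k is the factor by which the test is lengthened with parallel components. This is a small sketch under that assumption; the function and parameter names are illustrative.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a test with parallel components.

    length_factor is the new length divided by the original length
    (2 means the test is doubled).
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Doubling a ten-item test whose reliability is 0.60.
doubled = spearman_brown(0.60, 2)
print(f"{doubled:.2f}")

# A test with zero reliability is predicted to stay at zero when doubled.
print(spearman_brown(0.00, 2))
```

The prediction holds only if the added component is parallel to the original, which is why a zero-variance component leaves the actual reliability at 0.60.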
The Spearman-Brown Formula: The General Case

If the component tests are not parallel, then the Spearman-
Brown formula will either over- or underestimate the
reliability of a longer test.

An example scenario of underestimation:
• Suppose a ten-item test has a reliability of 0.00.
• The Spearman-Brown formula predicts that doubling
   the test length with a parallel component would produce a
   reliability of 0.00.
• However, if a non-parallel test with a reliability of, say,
   0.70 is added instead, then the resultant reliability of the
   lengthened test will be greater than 0.00.
Comparison of Methods of Estimating Reliabilities

So far, we have learned several different ways to estimate the reliability of a
given test.

Here is a summary of the basic principles to weigh when deciding which
method is appropriate for estimating the reliability of one’s test:
1. When using Test/Retest methods, one should use Parallel- or Alternate-
    Forms reliability estimates because most internal-consistency measures
    would be inaccurate.

2.    Use of Cronbach’s α or the Kuder-Richardson methods produces a lower
      bound for a given test’s reliability.
     • If the components happen to be essentially τ-equivalent, then the estimated
         reliability is the test’s reliability.
     • But these methods should only be used for homogeneous tests.

3.    When using the Split-Half method, the Spearman-Brown formula can over-
      or underestimate a test’s reliability if the components are not parallel.
     • When the components are parallel, then the estimate provided is very
          good for judging the effects of changing test length.
Standard Errors of Measurement & Confidence Intervals for True Scores

The bottom chart depicts an approximately normal
distribution of observed scores obtained from many
independent testings of a single examinee.

Note how the scores vary, but tend to group around the
examinee’s true score.
Standard Errors of Measurement & Confidence Intervals for True Scores

The confidence intervals for true scores can be interpreted in
either of two ways:
1. The intervals can be expected to contain a given
    examinee’s true score a specified percentage of the time
    when the interval is constructed using observed scores
    that are the result of repeated independent testings of the
    examinee using the same test (or parallel tests).

2. The intervals can be expected to cover a specified
   percentage of the examinees’ true scores when many
   examinees are tested once with the same test (or parallel
   tests) and a confidence interval is calculated for each
   examinee.
Standard Errors of Measurement & Confidence Intervals for True Scores

Tests with a high degree of measurement error will produce
confidence intervals that are necessarily wider.

Less reliable tests tend to have a high degree of
measurement error.

Therefore, wide confidence intervals are an indication that
the observed scores are not very good estimates of true
scores.

If a test has good reliability, then the confidence intervals will
be narrow, indicating good estimates of true scores.
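
The link between reliability and interval width can be made concrete. The sketch assumes the standard definition of the standard error of measurement, SEM = σ_X · sqrt(1 − reliability), and a normal-theory interval of observed score ± z · SEM. The standard deviation, reliabilities, and observed score are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd_observed, reliability):
    """SEM from the observed-score standard deviation and the reliability."""
    return sd_observed * sqrt(1 - reliability)


def true_score_interval(observed, sd_observed, reliability, confidence=0.95):
    """Confidence interval for a true score, centred on the observed score."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * standard_error_of_measurement(sd_observed, reliability)
    return observed - half_width, observed + half_width


# Hypothetical test with an observed-score standard deviation of 15.
sem_high = standard_error_of_measurement(15, 0.91)  # more reliable test
sem_low = standard_error_of_measurement(15, 0.64)   # less reliable test

narrow = true_score_interval(100, 15, 0.91)
wide = true_score_interval(100, 15, 0.64)
print(sem_high, narrow)
print(sem_low, wide)
```

The less reliable test produces the larger SEM and therefore the wider interval around the same observed score.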
