Reliability in Testing

    Is the test or assessment tool
     consistent and dependable?
• Student-related reliability
• Rater reliability
• Test administration reliability
• Test reliability (Brown, 2004, pp. 20-22)
Student-related reliability

        • Give students consistent materials/time for
          test preparation
        • Plenty of time for studying
        • Consistent test time/conditions (always on
          Wednesdays)
Rater reliability

         • For subjective scoring in high stakes tests:
               Have more than one rater
               Use a rubric and hold training/norming sessions
               Have outside periodic oversight
               Keep tests anonymous
         • For low stakes tests (quizzes):
             Use rubric
             Read through all tests before rating
Test administration reliability

        • Consistent rules for all classes/teachers:
            Dictionary use
            Notes/books
            Strict time limits
How to measure test reliability?

One way is through test-retest method:

• The same group of students takes the same test
  twice. (Drawbacks: motivation and washback.)

What else could we do to test whether the
same students (or very similar students)
score the same on the same test twice?
Split half method

 • Have the test split into two halves, equal in tasks and
   difficulty.
 • Then measure the scores for each half.
 • This is a good way to look at reliability, but it requires
   that the split tests are really equal.
Split half method – Your task
 • What can you tell about the reliability for
   the following split sections on my test?

         Student      Half 1      Half 2
         Joanna       88          95
         Jenny        90          87
         Asuka        97          95
         David        92          88
         Annick       74          78
         Mercedes     84          89
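
One possible way to approach the task is to correlate the two columns of scores. The sketch below is not part of the original slides: it computes a Pearson correlation between the two halves and then applies the Spearman-Brown correction, a standard adjustment for estimating full-test reliability from a half-test correlation. The function names (`pearson`, `spearman_brown`) are illustrative, and the result assumes the two halves really are equal in tasks and difficulty.

```python
from math import sqrt

# Scores from the slide: (student, first half, second half)
scores = [
    ("Joanna", 88, 95),
    ("Jenny", 90, 87),
    ("Asuka", 97, 95),
    ("David", 92, 88),
    ("Annick", 74, 78),
    ("Mercedes", 84, 89),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Estimate full-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


half_1 = [first for _, first, _ in scores]
half_2 = [second for _, _, second in scores]

r_half = pearson(half_1, half_2)
r_full = spearman_brown(r_half)

print(f"Correlation between halves: {r_half:.2f}")
print(f"Spearman-Brown estimate:    {r_full:.2f}")
```

For these six students the correlation between halves comes out at roughly 0.80 and the corrected estimate at roughly 0.89. With so few students, either figure is only a rough indication.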
Checklist for Reliability (1)

         • Have many independent items
         • Delete items that do not discriminate between
             weaker/stronger students
         •   Limit choices – restrict student responses
         •   Keep items clear and unambiguous
         •   Provide clear instructions and examples
         •   Keep type large enough and cleanly copied
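
To make "items that do not discriminate" concrete, here is a minimal sketch of the upper-lower discrimination index: the proportion of stronger students answering an item correctly minus the proportion of weaker students doing so. It is not from the original presentation. The response data are hypothetical, and the function name `discrimination_index` and the 50/50 group split are assumptions made for illustration.

```python
# Hypothetical data: one row per student, one column per item (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
]


def discrimination_index(rows, item, group_fraction=0.5):
    """Upper-group minus lower-group proportion correct for one item.

    Students are ranked by total score; the top and bottom
    `group_fraction` of the ranking form the two groups.
    """
    ranked = sorted(rows, key=sum, reverse=True)
    k = max(1, int(len(ranked) * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower


for item in range(len(responses[0])):
    d = discrimination_index(responses, item)
    print(f"Item {item + 1}: D = {d:.2f}")
```

In this made-up data set, item 1 is answered correctly by everyone, so its index is 0 and it would be a candidate for deletion; item 2 separates the two groups completely and has an index of 1.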
Checklist for Reliability (2)
•   Provide practice with testing format
•   Keep administration uniform
•   Have clear answers for all items (as far as possible)
•   Provide detailed answer key
•   Train raters as necessary
•   Determine acceptable responses before scoring starts
•   Keep test-takers anonymous

             John Bunting (2004), presentation in the course "Testing, Assessment and Teaching:
             A Program for EFL Teachers at UABC". Facultad de Idiomas, UABC.
