Ministry of Education


           DRAFT Technical Report

                       of the

     Pre- and Post-Pilot Testing for the
    Continuous Assessment Programme
in Lusaka, Southern and Western Provinces



                  Coordinated by the
            Examinations Council of Zambia
       Research and Test Development Department

               Under the Direction of the
Continuous Assessment Steering and Technical Committees
                 Ministry of Education




                   Lusaka, Zambia
                    October 2007
Table of Contents

ACKNOWLEDGMENTS

CHAPTER ONE: BACKGROUND
  1.1   Introduction to Continuous Assessment
  1.2   Definition of Continuous Assessment
  1.3   Challenges in the Implementation of Continuous Assessment
  1.4   Guidelines for Implementation of Continuous Assessment
  1.5   Plan for Implementation of Continuous Assessment

CHAPTER TWO: EVALUATION METHODOLOGY
  2.1   Objectives
  2.2   Design
  2.3   Sample
  2.4   Instruments
  2.5   Administration
  2.6   Data Capture and Scoring
  2.7   Data Analysis

CHAPTER THREE: ASSESSMENT RESULTS
  3.1   Psychometric Characteristics
  3.2   Classical Test Theory
  3.3   Item Response Theory
  3.4   Scaled Scores
  3.5   Vertical Scaled Scores
  3.6   Comparison between Pilot and Comparison Groups
  3.7   Comparison across Regions
  3.8   Performance Categories

CHAPTER FOUR: SUMMARY AND CONCLUSIONS

APPENDIX 1: ITEM STATISTICS BY SUBJECT

APPENDIX 2: SCORES AND FREQUENCIES - GRADE 5 PRE-TESTS

APPENDIX 3: SCORES AND FREQUENCIES - GRADE 5 POST-TESTS

APPENDIX 4: HISTOGRAMS BY SUBJECT AND GROUP




ACKNOWLEDGMENTS
The Continuous Assessment Joint Steering and Technical Committees and the
Examinations Council of Zambia wish to express profound gratitude for the
professional and material support provided by the Provincial Education Offices,
District Education Boards, Educational Zone staff in the different districts, school
administrators, teachers and pupils. Without this support, the baseline and post-pilot
assessment exercises would not have succeeded.

Appreciation also goes to the management of the Directorate for Curriculum and
Assessment in the Ministry of Education for providing professional support for
the Continuous Assessment programme in general and the assessment exercises in
particular. We wish to specifically thank the Director for Standards and Curriculum,
the Director for the Examinations Council of Zambia, and the Chief Curriculum
Specialist for allowing their personnel to take part in the assessment exercise.

Finally, we wish to express our appreciation to USAID and the EQUIP2 Project
for providing the financial and technical support for the Continuous Assessment
programme in Zambia.

All of the participants and stakeholders listed above have played a crucial role
not only in developing and implementing the Continuous Assessment programme,
but also in supporting the quantitative evaluation of the programme presented
in this technical paper. It is because of their interest in improving student learning
outcomes that the Continuous Assessment programme has had the necessary
financial, administrative and technical support. Our hope is that the programme will
prove to be valuable for all of the pupils and teachers in Zambian schools.




Chapter One: Background
1.1   Introduction to Continuous Assessment
      Over the years in Zambia, the education system has not been able to provide
      enough places for all learners to proceed from Grade 7 to Grade 8, from
      Grade 9 to Grade 10, and from Grade 12 to higher learning institutions. The
      system has used examinations to select those who proceed to the next
      level and to certify candidates; however, this has been done without
      formally considering school-based assessment as a component of the final
      examinations, with the exception of some practical subjects.

      The 1977 Educational Reforms explicitly provided for the use of Continuous
      Assessment (CA). Later, national policy documents, particularly Educating
      Our Future (1996) and Ministry of Education’s Strategic Plan 2003-2007,
      stated the need for integrating school-based continuous assessment into the
      education system, including the development of strategies to combine CA
      results with the final examination results for purposes of pupil certification and
      selection.

      Furthermore, the national education policy, as stated in Educating Our Future,
      stipulated that the Ministry of Education will develop procedures that will
      enable teachers to standardise their assessment methods and tasks for use
      as an integral part of school-based CA. The education policy document also
      stated that the Directorate of Standards, in cooperation with the Examinations
      Council of Zambia (ECZ), will determine how school-based CA can be better
      conducted so that it can contribute to the final examination results for pupil
      certification and promotion to the subsequent levels. The policy also stated
      that the Directorate of Standards, with input from the ECZ, will determine
      when school-based CA can be introduced.

      In order to set in motion the implementation of school-based CA, the ECZ
      convened a preparatory workshop from 16th to 22nd November 2003 in
      Kafue. Ninety (90) participants from various stakeholders’ institutions took
      part. The objectives of the preparatory workshop were to:

      •   Recommend a plan for developing and implementing CA;
      •   Recommend a training plan for preparing teachers in implementing CA;
      •   Explore ways of ensuring transparency, reliability, validity and
          comparability in using CA results;
      •   Agree on common assessment tasks and learning outcomes to be
          identified in the syllabuses for CA;
      •   Discuss the development of a teacher’s manual on CA; and
      •   Discuss the nature of summary forms for recording marks that should be
          provided to schools.




1.2    Definition of Continuous Assessment
       Continuous assessment is defined as an on-going, diagnostic, classroom-
       based process that uses a variety of assessment tools to measure learner
       performance. CA is a formative evaluation tool conducted during the teaching
       and learning process with the aim of influencing and informing the overall
       instructional process. It is the assessment of the whole learner on an ongoing
       basis over a period of time, where cumulative judgments of the learner’s
       abilities in specific areas are made in order to facilitate further positive
       learning (Le Grange & Reddy, 1998). 1

      The data generated from CA should be useful in helping teachers plan for
      the learning of individual pupils. CA should also help teachers identify
      the unique understanding of each learner in a classroom by informing the
      pupil of the level of instructional attainment, helping to target opportunities that
      promote learning, and reducing anxiety and other problems associated with
      examinations. CA has been shown to have positive impacts on student learning
      outcomes in hundreds of educational settings (Black & Wiliam, 1998). 2

       CA is made up of a variety of assessment methods that can be formal or
       informal. It takes place during the learning process when it is most necessary,
       making use of criterion referencing rather than norm referencing and providing
       feedback on how learners are changing.

1.3    Challenges in the Implementation of Continuous Assessment
       There are several areas in which the implementation of CA in the classroom
       will present challenges. Some of these are listed below.
       • Large class sizes in most primary schools are a major problem. It is
           common to find classes of 60 and above in Zambian classrooms.
           Teachers are expected to mark and keep records of the progress of all of
           these learners.
      •   CA can demand a great deal of teachers' time. Many teachers are
          concerned that time spent on remediation and enrichment is excessive
          and do not believe that they would finish the syllabus with CA.
       • CA will not be successfully implemented if there are inadequate teaching
           resources / equipment in schools. Teachers need materials and equipment
           such as stationery, computers and photocopiers (and electricity).
       • There may be cases of resistance from school administrators and teachers
           if they feel left out in the process of developing the CA programme.
       • CA requires the cooperation of communities and parents. If they do not
           understand what is expected of them, they may resist and hence affect the
           success of the programme.



1 Le Grange, L.L. & Reddy, C. (1998). Continuous Assessment: An Introduction and Guidelines to
Implementation. Cape Town, South Africa: Juta.
2 Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1),
7-74.


1.4   Guidelines for Implementation of Continuous Assessment
      A teachers’ guide on the implementation of continuous assessment at the
      basic school level was developed with the involvement of Curriculum
      Specialists, Standards Officers, Examinations Specialists, Provincial Education
      Officials, District Education Officials, Zonal In-Service Training Providers,
      school administrators and teachers.

      The Teachers’ Guide on CA comprises the following:

      •   Sample record forms;
      •   Description of the CA schemes;
      •   Instructions for preparing and administering assessment materials;
      •   Marking and moderation of the CA marks;
      •   Recording and reporting assessment results; and
      •   Monitoring of the implementation of the CA.

      The Teachers’ Guide also specifies the roles of stakeholders as follows:

      Teachers

      •   Plan assessment tasks, projects and mark schedules;
      •   Teach, guide and supervise pupils in implementing given tasks;
      •   Conduct the assessment in line with given guidelines;
      •   Mark and record the results;
      •   Provide correction and remedial work to the pupils;
      •   Inform the head teacher and parents on the performance of the child;
      •   Advise and counsel the pupils on their performance in class tasks;
      •   Take part in internal moderation of pupils’ results.

      School Administrators

      •   Provide an enabling environment, such as the procurement of teaching
          and learning materials;
      •   Act as links between the school and other stakeholders like ECZ,
          traditional leaders, politicians and parents;
      •   Ensure validity, reliability and comparability through moderation of CA;
      •   Compile CA results and hand them to ECZ.

      Parents

      •   Provide professional, moral, financial and material support to pupils.
      •   Continuously monitor their children’s attendance and performance
      •   Take part in making and enforcing school rules.
      •   Attend open days and witness the giving of prizes (rewards) to outstanding
          pupils in terms of performance.




Standards Officers

•   Interpret Government of Zambia policy on education;
•   Monitor education policy implementation at various levels of the education
    system;
•   Advise and evaluate the extent to which the education objectives have
    been achieved;
•   Ensure that acceptable assessment practices are conducted;
•   Monitor the overall standards of education.

Guidance Teachers/School Counsellors

•   Prepare and store record cards for CA;
•   Counsel pupils, teachers and parents/ guardians on CA and feedback;
•   Take care of the pupils’ psycho-social needs;
•   Make referrals for pupils to access other specialized assistance/support.

Heads of Department/Senior Teachers/Section Heads

•   Monitor and advise teachers in the planning, setting, conducting, marking
    and recording of CA results;
•   Ensure validity, reliability and dependability of CA by conducting internal
    moderation of results;
•   Hold departmental meetings to analyze the assessment;
•   Provide or make available the teaching and learning materials;
•   Compile a final record of CA results and hand them over to Guidance
    Teachers for onward submission to the ECZ.

District Resource Centre Coordinators

•   Ensure adequate in service training for teachers in planning, conducting,
    marking, moderating and recording results at school level in the district;
•   Monitor the conduct of CA in the schools and district;
•   Professionally guide teachers to ensure provision of quality education at
    school level.

Provincial Resource Centre Coordinators

•   Ensure adequate in-service training for teachers for them to be effective in
    planning, conducting, marking, moderating and recording CA results;
•   Monitor the conduct of CA in the province;
•   Professionally guide teachers to ensure provision of quality education at
    provincial level.

Examinations Specialist

•   Analyse and moderate CA results;
•   Integrate CA results with terminal examination results;
•   Determine grade boundaries;
•   Certify the candidates;

•   Disseminate the results of candidates.

      Monitors

      As monitors of the CA programme, various officials and stakeholders will look
      out for the following documents and information:

      •   Progress chart;
      •   Record of CA results and analysis;
      •   Marked evidence of pupils’ CA work on remedial activities;
      •   Analysis of performance by gender;
      •   Pupil’s Record Cards;
      •   CA plans or schedules and schemes;
      •   Evidence of pupils’ work;
      •   CA administration;
      •   Evidence of remedial work;
      •   Availability of planned remedial work in the classroom;
      •   Availability of the teacher’s guide;
      •   Sample CA tasks;
      •   Evidence of a variety of CA tasks;
      •   Teacher’s record of pupils’ performance.

1.5   Plan for Implementation of Continuous Assessment
      CA in Zambia is planned to roll out over a period of several years. This will
      allow for proper stakeholder support and evaluation. The following list
      provides the brief timeline of important CA activities through 2008:

      •   Creation of CA Steering and Technical Committees (2005);
      •   Development of assessment schemes, teacher’s guides, model
          assessment tasks booklets and recordkeeping forms (2005);
      •   Design of quantitative evaluation methodology with focus on student
          learning outcomes (2005);
      •   Implementation of CA pilot in Phase 1 schools: Lusaka, Southern and
          Western regions (2006);
      •   Baseline report on student learning outcomes (2006);
      •   Implementation of CA pilot in Phase 2 schools: Central, Copperbelt and
          Eastern Regions (2007);
      •   Expansion of modified CA pilot to community schools (2007);
      •   Post-test report on student learning outcomes (2007);
      •   Implementation of CA pilot in Phase 3 schools: Luapula, Northern and
          Northwestern Regions (2008);
      •   Discussion of scaling up of CA pilot and systems-level planning for
          combining Grade 7 end-of-cycle summative test scores with CA scores for
          selection and certification purposes (2008).




Chapter Two: Evaluation Methodology
2.1    Objectives
       The main objective of the quantitative evaluation is to determine whether the
       CA programme has had positive effects on student learning outcomes. The
       evaluation allows for a determination of whether pupils’ academic
       performance has changed as a result of the CA intervention, as well as the
       extent of the change in performance.

2.2    Design
       The evaluation design is quasi-experimental, with pre-test and post-tests
       administered to intervention (pilot) and control (comparison) groups. It
       features a pre-test at the beginning of Grade 5 and post-tests at the end of
       Grades 5, 6, and 7. The pilot and comparison groups will be compared at
       each time point in 6 subject areas to see if there are differences in test scores
       from the baseline to the post-tests by group (see Figures 1 and 2 below). 3

       Figure 1: Pre-Test and Post-Test, Pilot and Control Group Design

                         Grade 5    Grade 5     Grade 6     Grade 7
                         Pre-test   Post-test   Post-test   Post-test
       Pilot Group          X           X           X           X
       Control Group        X           X           X           X


       Figure 2: Expected Results from the Evaluation

       [Line graph: scaled scores (y-axis, 200 to 650) at each of the four
       assessments (G5 pre-test, G5 post-test, G6 post-test, G7 post-test) for the
       pilot and control groups, with the pilot group expected to pull ahead of the
       control group after the baseline.]



3 For more information, refer to the Summary of the Continuous Assessment Program, August 2007,
by the Examinations Council of Zambia and the EQUIP2-Zambia project.

With the matched pairs random assignment design, it was expected that the
      two groups, pilot and control, would have similar mean scores on the pre-test.
      However, with a successful intervention, it was expected that the pilot group
      would score higher than the control group on the subsequent post-tests.

2.3   Sample
      The sample included all the 2006 (pre-test) and 2007 (post-test) Grade 5
      basic school pupils in Lusaka, Southern and Western Provinces in the 24 pilot
      (intervention) and 24 comparison (control) schools. The schools were chosen
      using matched pairs by geographic location, school size, and grade levels as
      matching variables, followed by random assignment to pilot and comparison
      status. CA activities were implemented in pilot schools but not in the
      comparison schools.
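
      To make the assignment procedure concrete, the sketch below illustrates
      matched-pairs random assignment in Python. This code is illustrative only;
      the school names and pairs are hypothetical, and the report does not
      publish the actual pairings.

          import random

          # Hypothetical matched pairs (schools already matched on geographic
          # location, school size and grade levels).
          pairs = [("School A", "School B"),
                   ("School C", "School D")]

          assignment = {}
          for first, second in pairs:
              # Randomly order the two schools within each matched pair.
              pilot, comparison = random.sample([first, second], 2)
              assignment[pilot] = "pilot"
              assignment[comparison] = "comparison"

          print(assignment)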

2.4   Instruments
      Student achievement for the Grade 5 baseline and post-pilot administrations
      was measured using multiple-choice tests with 30 items (30 points per test).
      The test development process included the following steps:

      •   Review of the curriculums for each subject area;
      •   Development of test specifications;
      •   Development of items;
      •   Piloting of items;
      •   Data reviews of item statistics;
      •   Forms pulling (selecting items for final test papers).

      The test instruments were developed by teams of Curriculum Specialists,
      Standards Officers, Examination Specialists and Teachers. The baseline tests
      (pre-tests) were developed based on the Grade 4 syllabus and the post-pilot
      tests (post-tests) were developed based on the Grade 5 syllabus.

2.5   Administration
      The ECZ organized the administration of both pre-test and post-test papers.
      Teams comprising an Examination Specialist, a Standards Officer and a
      Curriculum Specialist were sent to each region to supervise the
      administration. District Education officials, School Administrators and
      Teachers were involved in the actual administration of the tests. All of the
      Grade 5 pupils in the pilot and comparison schools sat for six tests, one in
      each of the six subject areas (English, Mathematics, Social and Development
      Studies, Integrated Science, Creative and Technology Studies and
      Community Studies). The baseline tests (Grade 4 syllabus) were administered
      to the students at the beginning of Grade 5, in February 2006. The post-pilot
      tests (Grade 5 syllabus) were administered in February 2007.

      Note that there will be two more administrations of post-tests for the cohort of
      students in the three provinces. These will take place in February 2008



(Grade 6 syllabus) and November 2008 (Grade 7 syllabus). This process will
      be repeated in Phases 2 and 3 schools (see Table 1 below).

      Table 1: Implementation Plan for CA Pilot

      Phase                                        2006      2007      2008      2009      2010
      Phase 1 (Lusaka, Southern, Western)          Grade 5   Grade 6   Grade 7
      Phase 2 (Central, Copperbelt, Eastern)                 Grade 5   Grade 6   Grade 7
      Phase 3 (Luapula, Northern, Northwestern)                        Grade 5   Grade 6   Grade 7


2.6   Data Capture and Scoring
      Data were captured using Optical Mark Readers (OMR) and scored using the
      Faim software at the ECZ. Through this process, item scores for all
      students were converted into electronic format and data files were produced
      for analysis.

2.7   Data Analysis
      Data were analysed using the Statistical Package for the Social Sciences
      (SPSS). Scores and frequencies by subject were generated, and analysed data
      were presented in tabular, chart and graphical forms. Additional analyses
      were conducted using WINSTEPS (item response theory Rasch modelling)
      software. SPSS was used for scaling the pupils' scores.




Chapter Three: Assessment Results
3.1     Psychometric Characteristics
        An initial step in determining the results from the assessments was to conduct
        analyses to determine the psychometric characteristics of the assessments.
        Both the Standards for Educational and Psychological Testing (1999) 4 and
        the Code of Fair Testing Practices in Education (2004) 5 include standards for
        identifying quality items. Items should assess only knowledge or skills that are
        identified as part of the domain being tested and should avoid assessing
        irrelevant factors (e.g., ambiguity, grammatical errors, or sensitive content
        or language).

        Both quantitative and qualitative analyses were conducted to ensure that
        items on both Grade 5 baseline and post-pilot tests met satisfactory
        psychometric guidelines. The statistical evaluations of the items are presented
        in two parts, using classical test theory (CTT) and item response theory (IRT),
        which is sometimes called modern test theory. 6 The two measurement
        models generally provide similar results, but IRT is particularly useful for test
        scaling and equating. CTT analyses included (1) the difficulty index (p-value),
        (2) the discrimination index (item-test correlations), and (3) test reliability
        (Cronbach's Alpha, an estimate of internal consistency reliability). IRT
        analyses included (1) calibration of items and (2) examination of the item
        difficulty index (i.e., the b-parameter).

3.2     Classical Test Theory
        Difficulty Indices (p)

        All multiple-choice items were evaluated in terms of item difficulty according to
        standard classical test theory practices. Difficulty was defined as the average
        proportion of points achieved on an item by the students. It was calculated by
        obtaining the average score on an item and dividing by the maximum possible
        score for the item. Multiple-choice items were scored dichotomously (1 point
        vs. no points, or correct vs. incorrect), so the difficulty index was simply the
        proportion of students who correctly answered the item. All items on Grade 5
        pre-tests and post-tests had four response options. Table 2 shows the
        average p-values for each test. Note that this may also be calculated by
        taking the average raw score of all students divided by the maximum points
        (30) per test.

4 American Educational Research Association, American Psychological Association, and National
Council on Measurement in Education (1999). Standards for Educational and Psychological Testing.
Washington, DC: American Educational Research Association.
5 Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in Education.
Washington, DC: American Psychological Association.
6 For more information, see Crocker, L. & Algina, J. (1986). Introduction to Classical and Modern
Test Theory. New York: Harcourt Brace.




Table 2: Overall Test Difficulty Estimates by Subject Area

                                      Grade 5 Pre-test         Grade 5 Post-test
 Subject Area                         # Items  Mean p-value    # Items  Mean p-value
 English                                30        0.40           30        0.37
 Social and Developmental Studies       30        0.34           30        0.42
 Mathematics                            30        0.41           30        0.40
 Integrated Science                     30        0.33           30        0.36
 Creative and Technology Studies        30        0.35           30        0.36
 Community Studies                      30        0.32           30        0.37

Items that are answered correctly by almost all students provide little
information about differences in student ability, but they do indicate
knowledge or skills that have been mastered by most students. Similarly,
items that are correctly answered by very few students may indicate
knowledge or skills that have not yet been mastered, but such items also
provide little information about differences in student ability. To provide
the best measurement, difficulty indices should generally range from near-
chance performance of about 0.20 (for four-option, multiple-choice items) to
0.90. The item difficulty indices for both Grade 5 pre-tests and post-tests
were within generally acceptable and expected ranges (see Appendix 1 for a
complete list of p-values for all items on each test).
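
As an illustration only (this code is not part of the original analysis), the
difficulty index described above can be computed directly from a matrix of
dichotomous item scores. The data below are hypothetical.

    import numpy as np

    def difficulty_indices(responses: np.ndarray) -> np.ndarray:
        # p-value per item: the item mean, which for dichotomous (0/1)
        # scoring is the proportion of students answering correctly.
        return responses.mean(axis=0)

    # Hypothetical data: 5 students x 3 items (1 = correct, 0 = incorrect)
    scores = np.array([[1, 0, 1],
                       [1, 1, 0],
                       [0, 0, 1],
                       [1, 0, 0],
                       [1, 1, 1]])
    print(difficulty_indices(scores))     # per-item p-values: [0.8 0.4 0.6]
    print(scores.sum(axis=1).mean() / 3)  # mean raw score / max points = mean p-value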

Item Discrimination (Item-Test or Point-Biserial Correlations)

One desirable feature of an item is that the higher performing students do
better on the item than lower performing students. The correlation between
student performance on a single item and total test score is a commonly used
measure of this characteristic of an item. Within classical test theory, the item-
test (or point-biserial) correlation is referred to as the item’s discrimination
because it indicates the extent to which successful performance on an item
discriminates between high and low scores on the test. The theoretical range
of these statistics is –1 to +1, with a typical range from 0.2 to 0.6.

Discrimination indices can be thought of as measures of how closely an item
assesses the same knowledge and skills assessed by other items contributing
to the total score. Discrimination indices for Grade 5 are presented in Table 3.

Table 3: Overall Test Discrimination Estimates by Subject Area

                                      Grade 5 Pre-test       Grade 5 Post-test
 Subject Area                         # Items  Mean Pt-bis   # Items  Mean Pt-bis
 English                                30       0.46          30       0.48
 Social and Developmental Studies       30       0.38          30       0.45
 Mathematics                            30       0.37          30       0.41
 Integrated Science                     30       0.35          30       0.43
 Creative and Technology Studies        30       0.38          30       0.44
 Community Studies                      30       0.29          30       0.43


On average, the discrimination indices were within acceptable and expected
        ranges (i.e., 0.20 to 0.60). The positive discrimination indices indicate that
        students who performed well on individual items tended to perform well
        overall on the test. There were no items on the instruments that had near-zero
        discrimination indices (see Appendix 1 for a complete list of the point-biserial
        correlations for all items on each pre-test and post-test per subject area).
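
      A minimal sketch of the item-test correlation computation, using the same
      hypothetical students-by-items score matrix as in the earlier example (this
      code is illustrative, not the software used for the report):

          import numpy as np

          def point_biserial(responses: np.ndarray) -> np.ndarray:
              # Item-test discrimination: correlation between each item score
              # and the total test score (the item is included in the total,
              # as in the definition given in the text).
              total = responses.sum(axis=1)
              return np.array([np.corrcoef(responses[:, i], total)[0, 1]
                               for i in range(responses.shape[1])])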

        Test Reliabilities

        Although an individual item's statistical properties are an important focus, a
        complete evaluation of an assessment must also address the way items
        function together and complement one another.

        There are a number of ways to estimate an assessment's reliability. One
        possible approach is to give the same test to the same students at two
        different points in time. If students receive the same scores on each test, then
        the extraneous factors affecting performance are small and the test is reliable.
        (This is referred to as test-retest reliability.) A potential problem with this
        approach is that students may remember items from the first administration or
        may have gained (or lost) knowledge or skills in the interim between the two
        administrations.

        A solution to the 'remembering items' problem is to give a different, but
        parallel, test at the second administration. If the student scores on each test
        correlate highly, the test is considered reliable. (This is known as alternate
        forms reliability, because an alternate form of the test is used in each
        administration.) This approach, however, does not address the problem that
        students may have gained (or lost) knowledge or skills in the interim between
        the two administrations. In addition, the practical challenges of developing
        and administering parallel forms generally preclude the use of parallel forms
        reliability indices.

        One way to address these problems is to split the test in half and then
        correlate students' scores on the two half-tests; this in effect treats each
        half-test as a complete test. By doing this, the problems associated with an
        intervening time interval, and of creating and administering two parallel
        forms of the test, are alleviated. This is known as a split-half estimate of
        reliability. If the two half-test scores correlate highly, items on the two
        half-tests must be measuring very similar knowledge or skills. This is
        evidence that the items complement one another and function well as a
        group. This also suggests that measurement error will be minimal.

        The split-half method requires a judgment regarding the selection of which
        items contribute to which half-test score. This decision may have an impact on
        the resulting correlation; different splits will give different estimates of
        reliability. Cronbach (1951) 7 provided a statistic, α (alpha), that avoids this
        concern about the split-half method. Cronbach’s α gives an estimate of the
        average of all possible splits for a given test. Cronbach’s α is often referred to
        as a measure of internal consistency because it provides a measure of how
        well all the items in the test measure one single underlying ability. Cronbach’s
        α is computed using the following formula:

7 Cronbach, L. J. (1951). Coefficient Alpha and the Internal Structure of Tests. Psychometrika, 16,
297-334.


                          α = [n / (n − 1)] × [1 − Σ σ²(Yi) / σ²x]

        where, i : item,
               n : total number of items,
               σ²(Yi) : individual item variance (summed over all n items), and
               σ²x : total test variance
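
        The formula translates directly into code. The following is a minimal
        sketch (not part of the original analysis), assuming the same hypothetical
        students-by-items score matrix as in the earlier examples:

            import numpy as np

            def cronbach_alpha(responses: np.ndarray) -> float:
                # alpha = [n/(n-1)] * [1 - (sum of item variances) / (total test variance)]
                n = responses.shape[1]
                item_variances = responses.var(axis=0, ddof=1).sum()
                total_variance = responses.sum(axis=1).var(ddof=1)
                return (n / (n - 1)) * (1 - item_variances / total_variance)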

        For standardized tests, reliability estimates should be approximately 0.80 or
        higher. According to Table 4, reliabilities for the pre-tests ranged from 0.63
        (Community Studies) to 0.87 (English). The reliability estimate for Community
        Studies was low due to the absence of a national curriculum for use in test
        construction. In contrast, the reliability estimates for the post-tests ranged
        from 0.83 (Mathematics) to 0.89 (English). The post-tests likely had higher
        reliability estimates because the test developers had gained experience since
        developing the baseline tests.

        Table 4: Test Reliability Estimates by Subject Area

                                              Grade 5 Pre-test          Grade 5 Post-test
         Subject Area                         # Items  Coeff. Alpha     # Items  Coeff. Alpha
         English                                30        0.87            30        0.89
         Social and Developmental Studies       30        0.80            30        0.87
         Mathematics                            30        0.79            30        0.83
         Integrated Science                     30        0.76            30        0.85
         Creative and Technology Studies        30        0.80            30        0.86
         Community Studies                      30        0.63            30        0.85


3.3     Item Response Theory
        Item Response Theory (IRT) uses mathematical models to define a
        relationship between an unobserved measure of student ability, usually
        referred to as theta (θ), and the probability (p) of getting a dichotomous item
        correct. In IRT, it is assumed that all items are independent measures of the
        same construct or ability (i.e., the same θ). The process of determining the
        specific mathematical relationship between θ and p is referred to as item
        calibration. Once items are calibrated, they are defined by a set of parameters
        which specify a non-linear relationship between θ and p. 8


8 For more information about item calibration, see: Lord, F.M. & Novick, M.R. (1968). Statistical
Theories of Mental Test Scores. Boston, MA: Addison-Wesley; Hambleton, R.K. & Swaminathan, H.
(1984). Item Response Theory: Principles and Applications. New York: Springer.

       For the CA programme, a 1-parameter or Rasch model was implemented.
       The Rasch model defines the probability that a student with ability level θ
       gives a correct response to item i as:

                      Pi(θ) = exp[D(θ − bi)] / (1 + exp[D(θ − bi)])

       where, i = item,
              bi = item difficulty, and
              D = a normalizing constant equal to 1.701.

       In IRT, item difficulty (bi) and student ability (θ) are measured on a scale
       from −∞ to +∞. A scale of −3.0 to +3.0 is used operationally in educational
       assessment programmes, with −3.0 representing low student ability or an easy
       item and +3.0 representing high student ability or a difficult item. The bi
       parameter for an item is the position on the ability scale where the probability
       of a correct response is 0.50.
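
       As a sketch of the model (not the WINSTEPS implementation itself), the
       response probability can be written as:

           import math

           def rasch_probability(theta: float, b: float, D: float = 1.701) -> float:
               # Probability of a correct response under the 1-parameter (Rasch) model.
               z = D * (theta - b)
               return math.exp(z) / (1.0 + math.exp(z))

           # An item of average difficulty (b = 0) attempted by a student of average
           # ability (theta = 0) yields a probability of 0.50, as stated above.
           print(rasch_probability(0.0, 0.0))  # 0.5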

       The WINSTEPS program was the software used to do the IRT analyses. The
       item parameter files resulting from the analyses are provided in Appendices 2
       and 3. This presentation is direct output from WINSTEPS. 9 Raw scores were
       then scaled using the item response theory model, with a range of 100-500
       (see Appendices 2 and 3 for the raw score to scale score conversion tables
       for each subject area).

3.4    Scaled Scores
       The Grade 5 pre-test and post-test scores in each subject area are reported
       on a scale that ranges from 100 to 500. Students' raw scores, or total number
       of points, on the pre-tests and post-tests are translated to scaled scores using
       a data analysis process called scaling. Scaling simply converts raw points
       from one scale to another. In the same way that distance can be expressed in
       miles or kilometres, or monetary value in terms of U.S. dollars or Zambian
       Kwacha, student scores on both pre- and post-tests can be expressed as raw
       scores (i.e., number of points) or scaled scores.

       Cut points were established on the raw score scale for both the pre-tests and
       post-tests (see Section 3.8, "Performance Categories," for an explanation of
       how these cut points were determined). Once the raw score cut points were
       determined via standard setting, the next step was to compute theta cuts
       using the test characteristic curve (TCC) mapping procedure and then to
       calculate the transformation coefficients used to place students' raw scores
       onto the theta scale and then onto the scaled score used for reporting.
       As previously stated, student scores on the assessments are reported in
       integer values from 100 to 500 with two scores representing cut scores on
       each assessment. Two cut points (Unsatisfactory/Satisfactory and
       Satisfactory/Advanced) were pre-set at 250 and 350, respectively.


9 See the WINSTEPS user's manual for additional details regarding this output
(http://www.winsteps.com).

Figure 3: Scaled Score Conversion Procedure

1. Raw score cut scores are obtained from standard setting.
2. The raw score cuts are converted into theta cuts θ1 and θ2 using TCC mapping.
3. The scaling constants (m and b) are calculated using the theta cuts (θ1, θ2)
   and the scaled score cuts (250 and 350).
4. Scaled scores are calculated using m·θ + b.

The scaled scores are obtained by a simple linear transformation of the theta
score using the values of 250 and 350 on the scaled score metric and the
associated theta cut points to define the transformation. The scaling
coefficients were calculated using the following formulae:

                                   b = 250 − m·θ1
                                   b = 350 − m·θ2
                                   m = (350 − 250) / (θ2 − θ1)

Where m is the slope of the line providing the relationship between the theta
and scaled scores, b is the intercept, θ1 is the cut score on the theta
metric for the Unsatisfactory/Satisfactory cut (i.e., corresponding to the raw
score cut for Unsatisfactory/Satisfactory), and θ2 is the cut score on the theta
metric for the Satisfactory/Advanced cut (i.e., corresponding to the raw
score cut for Satisfactory/Advanced). Scaled scores were then calculated
using the following linear transformation (see Figure 3):

                           Scaled Score = m·θ + b

Where θ represents a student's theta (or ability) score. The values obtained
using this formula were rounded to the nearest integer and then truncated
such that no student received a score below 100 or above 500. Table 5
presents the mean raw score for each grade/subject area combination on the
pre- and post-tests.
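
A minimal sketch of this transformation (not the operational implementation),
using hypothetical theta cut values since the report does not publish the
operational cuts:

    def scaling_constants(theta1: float, theta2: float,
                          cut1: float = 250.0, cut2: float = 350.0):
        # Slope and intercept mapping the theta cuts onto the reporting scale.
        m = (cut2 - cut1) / (theta2 - theta1)
        b = cut1 - m * theta1
        return m, b

    def scaled_score(theta: float, m: float, b: float) -> int:
        # Scaled score = m*theta + b, rounded to the nearest integer and
        # truncated to the 100-500 reporting range.
        return int(min(500.0, max(100.0, round(m * theta + b))))

    m, b = scaling_constants(theta1=-0.5, theta2=1.0)  # hypothetical cuts
    print(scaled_score(-0.5, m, b))  # 250, by construction
    print(scaled_score(1.0, m, b))   # 350, by construction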

It is important to note that converting from raw scores to scaled scores does
not change the students’ performance-level classifications. For the Zambia
CA programme, a score of 250 is the cut score between Unsatisfactory and
Satisfactory and a score of 350 is the cut score between Satisfactory and
Advanced. This is true regardless of which subject area, grade, or year one
may be concerned with.

Scaled scores supplement the pre-test and post-test results by providing
information about the position of a student’s results within a performance
level. For instance, if the range for a performance level is 200 to 250, a

student with a scaled score of 245 is near the top of the performance level,
and close to the next higher performance level.

School level scaled scores are calculated by computing the average of
student-level scaled scores. Table 5 provides the raw score averages for each
of the subject areas, while Table 6 provides the same information in scaled
scores.

Table 5: Grade 5 Mean Raw Scores by Subject Area

                                               Grade 5 Pre-test          Grade 5 Post-test
 Subject Area                       # Items    N      Mean   Std. Dev.   N      Mean   Std. Dev.
 English                              30       3798   12.2   6.5         4025   11.7   7.1
 Social and Developmental Studies     30       3962   10.1   5.3         4104   13.2   6.6
 Mathematics                          30       3883   12.3   5.3         4127   12.4   5.8
 Integrated Science                   30       4039    9.9   4.9         4135   11.1   6.3
 Creative and Technology Studies      30       4032   10.5   5.3         4097   11.7   6.2
 Community Studies                    30       4037    9.5   4.0         4141   11.2   6.4

According to Table 5, overall mean raw scores (with both pilot and
comparison groups taken together) across the subject areas on the pre-test
ranged from 9.5 (Community Studies) to 12.3 (Mathematics) out of a possible
30 points. In contrast, the overall mean raw scores for the post-tests
ranged from 11.1 (Integrated Science and Creative and Technology Studies)
to 13.2 (Social and Developmental Studies). From Table 6, the scaled score
averages for Grade 5 pre-tests ranged from 214 (Community Studies) to 239
(English) on the 100-500 scale. In contrast, the scaled score averages for
the post-tests ranged from 233 (English) to 262 (Mathematics).

Table 6: Grade 5 Mean Scaled Scores by Subject Area

                                               Grade 5 Pre-test           Grade 5 Post-test
 Subject Area                       # Items    N      Mean    Std. Dev.   N      Mean    Std. Dev.
 English                              30       3798   238.8   83.7        4025   233.4   88.1
 Social and Developmental Studies     30       3962   230.5   86.2        4104   241.2   83.9
 Mathematics                          30       3883   222.4   89.2        4127   261.9   72.6
 Integrated Science                   30       4039   226.5   80.2        4135   245.7   73.7
 Creative and Technology Studies      30       4032   224.1   85.3        4097   244.3   83.0
 Community Studies                    30       4037   214.0   83.7        4141   236.9   72.3



As stated earlier, the scaled score is a simple linear transformation of the
raw score, using the values of 250 and 350 on the scaled score metric. A
student's relative position in the raw score distribution does not change under
this scale transformation.

Note that the primary interest of this evaluation is not whether the raw scores
and/or scaled scores increase or decrease from pre-test to post-test. These
differences will occur mainly through variations in test difficulty. The main
analysis will compare the relative changes in the two groups, i.e., pilot and

comparison, across the two time points, i.e., pre-test to post-test. At a later
        point, post-tests will also be conducted when the cohort of students is in
        Grade 6 and Grade 7, followed by extended analyses for the two additional
        time points.

3.5     Vertical Scaled Scores
        In vertical scaling, tests that vary in difficulty level, but that are intended to
        measure similar constructs, are placed on the same scale. Placing different
        tests on the same scale can be accomplished in a number of ways, such as
        linking items across the tests or social moderation. For the CA programme, a
        social moderation (Linn, 1993) procedure was employed for vertical scaling. 10

        In social moderation, assessments are developed in reference to a common
        content framework. Performance of individual students, and schools, is
        measured against a single set of common standards. For Zambia, an analysis
        of the Grade 4 and 5 curriculums showed that the content was vertically
        aligned, i.e., students were expected to progress in their learning along the
        same constructs from one grade level to the next. This allowed the test
        developers to link the pre-tests and post-tests through common performance
        standards. The visual representation of the vertical scaling scheme for the CA
        programme is shown below.

        Figure 4: Vertical Scaling Scheme

        Grade 5 Pre-test:    250 ------ 350
        Grade 5 Post-test:              350 ------ 450
        Grade 6 Post-test:                         450 ------ 550
        Grade 7 Post-test:                                    550 ------ 650

        In other words, students who were classified as Advanced on the Grade 5 pre-
        test (i.e., end of Grade 4 syllabus) would be considered Satisfactory on the
        Grade 5 post-test (i.e., end of Grade 5 syllabus), students classified as
        Advanced on the Grade 5 post-test would be considered Satisfactory on the
        Grade 6 post-test, and so on through Grade 7.

10 Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education,
6(1), 83-102.

      In the vertical scaled score matrix, students who earned a grade level scaled
      score of 250 on the Grade 5 post-test would earn a vertical scaled score of
      350 (because 350 is the equivalent grade level scaled score on the Grade 5
      pre-test). Therefore, grade level scaled scores and vertical scaled scores
      differ by a constant value of 100 points. The mean vertical scaled scores for
      each subject are shown in Table 7.

      Table 7: Grade 5 Mean Vertical Scaled Scores by Subject Area

                                                     Grade 5 Pre-test           Grade 5 Post-test
       Subject Area                       # Items    N      Mean    Std. Dev.   N      Mean    Std. Dev.
       English                              30       3798   238.8   83.7        4025   333.4   88.1
       Social and Developmental Studies     30       3962   230.5   86.2        4104   341.2   83.9
       Mathematics                          30       3883   222.4   89.2        4127   361.9   72.6
       Integrated Science                   30       4039   226.5   80.2        4135   345.6   73.7
       Creative and Technology Studies      30       4032   224.1   85.3        4097   344.4   83.0
       Community Studies                    30       4037   214.0   83.7        4141   336.9   72.3
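
      As a worked example of the constant offset: the Mathematics post-test mean
      grade level scaled score of 261.9 (Table 6) corresponds to a mean vertical
      scaled score of 261.9 + 100 = 361.9 (Table 7). Following the scheme in
      Figure 4, the Grade 6 and Grade 7 post-tests would presumably be offset by
      200 and 300 points, respectively, once those administrations take place.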



      Figure 5 shows the mean vertical scaled scores on the pre- and post-tests
      across the subject areas. Vertical scaled scores for the pre-test are simply
      the grade level scaled scores. As expected, vertical scaled scores for the
      Grade 5 post-test are higher than the Grade 5 pre-test scaled scores.


      Figure 5: Vertical Scaled Mean Scores by Subject Area

      [Bar chart: mean vertical scaled scores (y-axis, 0 to 400) for the pre-test
      (PRE) and post-test (POST) in each subject area (Eng., SDS, Math., ISC,
      CTS, CS), with the post-test bar higher than the pre-test bar in every
      subject.]


3.6   Comparison between Pilot and Comparison Groups
      The comparisons between the pilot and comparison groups were made in raw
      scores and in vertical scaled scores. Raw scores on the pre- and post-tests
      are not on the same scale, since the tests are of varied difficulty; the raw
      score comparison is nonetheless presented for simplicity. Comparisons are
      more relevant, valid, and beneficial when made on the vertical scale, because
      vertical scaled scores for the pre- and post-tests are on the same scale.

Raw Scores

Table 8 shows that the raw score mean differences between the pilot and
comparison schools on the Grade 5 pre-tests were small for each subject
area. The mean differences, analyzed using t-tests, were statistically
significant only in English and Mathematics, with the pupils in the comparison
group performing better than those in the pilot group (p<.05). In the other four
subjects, the t-tests showed no significant differences between the two groups
on the baseline. In raw scores, the differences in English and Mathematics were
about half a point, while the differences for the other subjects were at most
two-tenths of a point. These results reflected the expectation of very small
differences on the pre-tests, since the schools were randomly assigned to one
of the two groups based on a matched pairs design.

Table 8: Mean Raw Scores by Subject Area and Group

                                              Grade 5 Pre-test          Grade 5 Post-test
 Subject Area          Group           N      Mean    Std. Dev.   N†     Mean    Std. Dev.
 English               Pilot           1785   11.9    6.4         1773   13.3*   1.6
                       Comparison      2013   12.4*   6.6         1967   12.2    1.6
                       Total           3798   12.2    6.5         3740   12.8    1.6
 Social and            Pilot           1907   10.0    5.2         1895   14.9*   1.3
 Developmental         Comparison      2055   10.2    5.5         2008   13.7    1.3
 Studies               Total           3962   10.1    5.3         3903   14.3    1.3
 Mathematics           Pilot           1861   12.0    5.3         1849   13.8*   1.4
                       Comparison      2022   12.6*   5.3         1975   13.2    1.4
                       Total           3883   12.3    5.3         3824   13.5    1.4
 Integrated Science    Pilot           1961    9.8    4.9         1949   13.2*   1.9
                       Comparison      2078    9.9    4.9         2031   11.2    1.8
                       Total           4039    9.9    4.9         3980   12.2    1.9
 Creative and          Pilot           1967   10.5    5.2         1955   12.9*   1.5
 Technology Studies    Comparison      2065   10.6    5.4         2018   11.7    1.5
                       Total           4032   10.5    5.3         3973   12.3    1.5
 Community Studies     Pilot           1979    9.5    4.0         1967   13.4*   1.6
                       Comparison      2058    9.5    3.9         2011   12.5    1.6
                       Total           4037    9.5    4.0         3978   13.0    1.6

* Significant at p<0.05; † adjusted weighted sample size.

The differences between the two groups for all subject areas on the Grade 5
post-test (also shown in Table 8) were evaluated using an Analysis of
Covariance (ANCOVA), with the pre-test scores as the covariates. In other
words, the pre-test scores were made statistically equivalent so that the
groups could be evaluated on an equal basis on the post-tests. Using the raw
scores, the results were statistically significant in each of the subject areas,
with the pilot group outperforming the comparison group (p<.05).

Note that all statistical comparisons were made at the school level, not at the
student level, because the student population at each school changed from
pre-test to post-test. The design was based on cohorts (student groups over
time) rather than panels (the same students over time). A panel design would
have been statistically possible, but student attrition would have skewed the
results.

Vertical Scaled Scores

As stated earlier, vertical scaled scores on the pre- and post-tests were
computed independently for the pilot and comparison groups but were
measured on the same scale (i.e., the vertical scale). This makes the
comparison more relevant and valid for assessing the impact of CA in the pilot
schools relative to the comparison schools.

Table 9: Mean Vertical Scaled Scores by Subject Area and Group

                                     Grade 5 Pre-tests          Grade 5 Post-tests
   Subject Area          Group     N      Mean   Std. Dev.    N†     Mean   Std. Dev.
                   Pilot          1785    236.1      82.4    1773   352.3*     20.3
English            Comparison     2013    241.2*     84.8    1967   339.9      20.3
                   Total          3798    238.8      83.7    3740   346.1      20.3
Social and         Pilot          1907    229.1      84.3    1895   362.4*     17.7
Developmental      Comparison     2055    231.8      87.9    2008   346.2      17.7
Studies            Total          3962    230.5      86.2    3903   354.3      17.7
                   Pilot          1861    217.8      89.3    1849   380.5*     17.1
Mathematics        Comparison     2022    226.7*     88.9    1975   373.1      17.1
                   Total          3883    222.4      89.2    3824   376.8      17.1
                   Pilot          1961    225.5      80.1    1949   369.5*     20.4
Integrated Science Comparison     2078    227.4      80.4    2031   348.0      20.4
                   Total          4039    226.5      80.2    3980   358.8      20.4
                   Pilot          1967    223.0      84.0    1955   357.1*     16.0
Creative and
                   Comparison     2065    225.1      86.5    2018   343.5      16.0
Technology Studies
                   Total          4032    224.1      85.3    3973   350.3      16.0
                   Pilot          1979    213.7      84.3    1967   365.8*     22.1
Community Studies Comparison      2058    214.2      83.1    2011   352.8      22.1
                   Total          4037    214.0      83.7    3978   359.3      22.1

* Significant at p<0.05; † represents adjusted weighted sample size.

Table 9 shows that the vertical scaled score mean differences between the
pilot and comparison schools on the Grade 5 pre-tests were small for each
subject area. The mean differences in all six subject areas, analyzed using t-
tests, were not statistically significant (p>.05). In contrast, when the
differences between the two groups on the Grade 5 post-tests (also in
Table 9) were evaluated using an ANCOVA (with the pre-test scores as the
covariates), the results were statistically significant in all subject areas,
with the pilot group outperforming the comparison group (p<.05).

Figures 6 through 11 show the differences in vertical scaled scores from the
Grade 5 pre-test to the Grade 5 post-test for each of the subject areas. The
graphs clearly show the greater score increases achieved by the pilot group in
all subject areas. The advantage is least evident in Mathematics, where the
pilot group started from a lower baseline and the two groups finished closest
together.
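As a simple numerical check on what the figures display, the following sketch
computes the pre-to-post gains implied by the English means in Table 9; the
other subjects follow the same pattern.

    # Pre-to-post gains on the common vertical scale, using the
    # English group means reported in Table 9.
    means = {
        "Pilot":      {"pre": 236.1, "post": 352.3},
        "Comparison": {"pre": 241.2, "post": 339.9},
    }
    for group, m in means.items():
        print(f"{group}: gain = {m['post'] - m['pre']:.1f} scaled-score points")
    # Pilot gain (116.2) exceeds Comparison gain (98.7), matching Figure 6.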

Figure 6: English Mean Vertical Scores by Group

[Line graph: Pilot vs. Comparison mean vertical scaled scores (scale 200-400)
at the Grade 5 pre-test and Grade 5 post-test.]


Figure 7: Social & Dev. Studies Mean Vertical Scores by Group

[Line graph: as in Figure 6, for Social and Developmental Studies.]


Figure 8: Mathematics Mean Vertical Scores by Group

[Line graph: as in Figure 6, for Mathematics.]


Figure 9: Integrated Science Mean Vertical Scores by Group

[Line graph: as in Figure 6, for Integrated Science.]


Figure 10: Creative & Tech. Studies Mean Vertical Scores by Group

[Line graph: as in Figure 6, for Creative and Technology Studies.]


Figure 11: Community Studies Mean Vertical Scores by Group

[Line graph: as in Figure 6, for Community Studies.]
3.7   Comparison across Regions
      While not the focus of the evaluation, the next two sections contain useful
      information on student performance. Tables 10 and 11 present the scores
      disaggregated by region. As with the overall analyses, the comparisons
      across the three regions were made in both raw scores and vertical scaled
      scores. Lusaka Region consistently had the highest mean scores (both raw
      and vertical scaled) in all subjects on the Grade 5 pre-tests, followed by
      Western and Southern. The same pattern was observed on the Grade 5
      post-tests.

      Table 10: Subject Area Mean Raw Scores by Region

                                      Grade 5 Pre-test             Grade 5 Post-test
        Subject Area   Region      N     Mean      Std. Dev.    N     Mean      Std. Dev.
                      Southern   1010     11.0        6.2     1157      10.4         6.6
                      Western    994      11.7        5.9     1103      11.9         6.7
      English
                      Lusaka     1794     13.1        6.9     1765      12.4         7.5
                      Total      3798     12.2        6.5     4025      11.7         7.1
                      Southern   1014      9.4        4.8     1214      11.7         6.0
      Social and      Western    1112      9.9        4.9     1125      13.2         6.1
      Developmental
      Studies         Lusaka     1836     10.7        5.8     1765      14.1         7.0
                      Total      3962     10.1        5.3     4104      13.2         6.6
                      Southern   1002     11.5        5.4     1226      11.1         5.2
                      Western    1086     12.2        5.2     1120      12.7         5.3
      Mathematics
                      Lusaka     1795     12.9        5.2     1781      13.0         6.3
                      Total      3883     12.3        5.3     4127      12.4         5.8
                      Southern   1025      9.2        4.4     1212       9.6         5.4
      Integrated      Western    1151      9.4        4.6     1154      11.7         6.4
      Science         Lusaka     1863     10.6        5.3     1769      11.8         6.7
                      Total      4039      9.9        4.9     4135      11.1         6.3
                      Southern   1016      9.6        4.8     1205       9.9         5.6
      Creative and    Western    1140     10.2        5.0     1146      11.3         6.0
      Technology
      Studies         Lusaka     1876     11.2        5.7     1790      11.9         6.9
                      Total      4032     10.5        5.3     4141      11.2         6.4
                      Southern   1015      9.0        3.5     1191      10.5         5.3
      Community       Western    1146      9.4        4.3     1122      11.5         6.0
      Studies         Lusaka     1876      9.8        4.0     1784      12.7         6.8
                      Total      4037      9.5        4.0     4097      11.7         6.2




Table 11: Subject Area Mean Vertical Scaled Scores by Region

                                            Grade 5 Pre-test               Grade 5 Post-test
           Subject Area    Region     N       Mean      Std. Dev.    N       Mean      Std. Dev.
                         Southern    1010      224.1       80.3     1157       317.3         82.8
                         Western     994       232.3       72.9     1103       335.0         81.0
        English
                         Lusaka      1794      250.7       89.3     1765       343.0         94.1
                         Total       3798      238.8       83.7     4025       333.4         88.1
                         Southern    1014      218.5       77.4     1214       321.7         76.7
        Social and       Western     1112      226.4       79.1     1125       341.1         78.1
        Developmental
        Studies          Lusaka      1836      239.6       93.6     1765       354.7         89.5
                         Total       3962      230.5       86.2     4104       341.2         84.0
                         Southern    1002      209.2       91.0     1226       346.6         66.1
                         Western     1086      219.9       86.2     1120       366.6         65.5
        Mathematics
                         Lusaka      1795      231.3       89.0     1781       369.5         79.3
                         Total       3883      222.4       89.2     4127       361.9         72.6
                         Southern    1025      215.7       72.1     1212       328.9         63.5
        Integrated       Western     1151      218.1       76.1     1154       353.0         74.2
        Science          Lusaka      1863      237.5       85.5     1769       352.4         78.0
                         Total       4039      226.5       80.2     4135       345.7         73.7
                         Southern    1016      209.8       77.9     1191       327.6         70.7
        Creative and     Western     1140      218.9       79.7     1122       340.7         79.5
        Technology
        Studies          Lusaka      1876      234.9       90.8     1784       357.7         90.3
                         Total       4032      224.1       85.3     4097       344.3         83.0
                         Southern    1015      204.2       74.8     1205       323.4         64.3
        Community        Western     1146      213.1       88.6     1146       338.7         66.8
        Studies          Lusaka      1876      219.8       84.6     1790       344.9         79.1
                         Total       4037      214.0       83.7     4141       336.9         72.3



3.8    Performance Categories
      Performance categories were established for each of the tests, taking into
      account test difficulty and score distributions, using a procedure called
      standard setting. An Angoff (1971) 11 standard-setting method was used to
      set the cut scores between Unsatisfactory and Satisfactory and between
      Satisfactory and Advanced for both the pre-tests and post-tests.

      The resultant cut scores are presented in Tables 12 and 13. In English, for
      example, students who scored 1-12 on the pre-test would be classified as
      Unsatisfactory, students who scored 13-21 as Satisfactory, and students
      who scored 22-30 as Advanced. For Mathematics, the corresponding
      pre-test ranges are 1-13 Unsatisfactory, 14-19 Satisfactory, and 20-30
      Advanced. The post-test ranges for each subject area differ from those on
      the pre-tests because the pre-tests and post-tests covered different content
      and had different levels of difficulty.

11 Angoff, W. H. (1971). Scales, Norms, and Equivalent Scores. In R. L. Thorndike (Ed.),
Educational Measurement (2nd ed., pp. 508-560). Washington, DC: American Council on Education.
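
In simplified form, the Angoff computation works as in the sketch below: each
judge estimates the probability that a borderline (minimally competent) pupil
answers each item correctly, and the recommended cut score is the judges'
average summed probability. The ratings here are hypothetical, the test is
shortened to five items, and the multi-round panel procedure is omitted.

    # Simplified Angoff standard setting with hypothetical ratings.
    # Each inner list holds one judge's probability estimates that a
    # borderline pupil answers each item correctly.
    judge_ratings = [
        [0.6, 0.5, 0.4, 0.7, 0.5],   # judge 1
        [0.5, 0.6, 0.5, 0.6, 0.4],   # judge 2
        [0.7, 0.5, 0.5, 0.6, 0.5],   # judge 3
    ]

    # Each judge's implied cut score is the sum of their item ratings;
    # the panel's cut score is the average across judges.
    per_judge_sums = [sum(ratings) for ratings in judge_ratings]
    cut_score = sum(per_judge_sums) / len(per_judge_sums)
    print(f"Recommended cut score: {cut_score:.1f} raw-score points")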


Table 12: Performance Categories for Pre-tests by Subject

                                            Grade 5 Pre-test
 Subject Area                    1                 2               3
                           Unsatisfactory     Satisfactory      Advanced
                              (Fail)            (Pass)           (Pass)
 English                       1-12              13-21           22-30
 Social and
                               1-10              11-17           18-30
 Developmental Studies
 Mathematics                   1-13              14-19           20-30

 Integrated Science            1-10              11-17           18-30
 Creative and Technology
                               1-11              12-18           19-30
 Studies
 Community Studies             1-10              11-15           16-30


Table 13: Performance Categories for Post-tests by Subject

                                            Grade 5 Post-test
 Subject Area                    1                 2               3
                           Unsatisfactory     Satisfactory      Advanced
                              (Fail)            (Pass)           (Pass)
 English                       1-12              13-21           22-30
 Social and
                               1-13              14-21           22-30
 Developmental Studies
 Mathematics                   1-10              11-19           20-30

 Integrated Science            1-10              11-20           21-30
 Creative and Technology
                               1-11              12-21           22-30
 Studies
 Community Studies             1-11              12-19           20-30



Tables 14 and 15 provide the percentages of students classified into the three
performance categories by subject. On the pre-test, the percentages in each
category were similar for the two groups in most subjects. For instance, in
Integrated Science, similar percentages of students were in the passing
categories (Satisfactory or above) for the pilot (34%) and comparison (33%)
groups. On the post-test, however, there were differences between the
groups, mostly in favour of the pilot group. In Integrated Science, 53% of
students in the pilot group were Satisfactory or above vs. 43% in the
comparison group. The passing percentages favoured the pilot group on the
post-test in every subject except Mathematics, where the rounded percentage
passing was the same in the pilot (65%) and comparison (65%) groups.
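
As an illustration of how the cut scores translate raw scores into these
categories, the sketch below applies the English pre-test ranges from
Table 12 (1-12, 13-21, 22-30) to a few hypothetical pupil scores.

    # Classify raw scores into the three performance categories using
    # the English pre-test cut points from Table 12.
    def classify(raw_score: int) -> str:
        if raw_score <= 12:
            return "Unsatisfactory (Fail)"
        elif raw_score <= 21:
            return "Satisfactory (Pass)"
        return "Advanced (Pass)"

    scores = [8, 13, 22, 19, 27]   # hypothetical pupil raw scores
    for s in scores:
        print(s, "->", classify(s))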




Table 14: Percentages of Students in Performance Categories for Pre-tests

                                               Grade 5 Pre-test
 Subject Area    Group              1                 2               3
                              Unsatisfactory     Satisfactory      Advanced
                                 (Fail)            (Pass)           (Pass)
                 Pilot            63.0               27.2            9.8
 English
                 Comparison       59.7               28.2            12.1
 Social and      Pilot            62.8               26.9            10.3
 Developmental
 Studies         Comparison       64.4               24.0            11.6
                 Pilot            64.3               26.2            9.5
 Mathematics
                 Comparison       60.1               29.4            10.5

 Integrated      Pilot            65.9               25.6            8.5
 Science         Comparison       67.3               22.9            9.8
 Creative and    Pilot            67.5               22.9            9.6
 Technology
 Studies         Comparison       68.4               20.1            11.5

 Community       Pilot            66.8               25.4            7.8
 Studies         Comparison       66.8               24.8            8.4



Table 15: Percentages of Students in Performance Categories for Post-tests

                                               Grade 5 Post-test
 Subject Area    Group              1                 2               3
                              Unsatisfactory     Satisfactory      Advanced
                                 (Fail)            (Pass)           (Pass)
                 Pilot            60.0               26.5            13.5
 English
                 Comparison       64.0               24.0            11.9
 Social and      Pilot            51.4               33.4            15.3
 Developmental
 Studies         Comparison       59.3               30.6            10.2
                 Pilot            35.2               53.9            10.9
 Mathematics
                 Comparison       34.8               56.3            8.9

 Integrated      Pilot            46.7               40.2            13.1
 Science         Comparison       57.3               36.0            6.7
 Creative and    Pilot            54.5               35.1            10.4
 Technology
 Studies         Comparison       62.3               31.0            6.7

 Community       Pilot            50.4               33.9            15.6
 Studies         Comparison       54.4               36.2            9.5




Chapter Four: Summary and Conclusions
The main objective of the evaluation was to determine whether the CA
programme is having positive effects on student learning outcomes in the first
year of implementation. This was accomplished by measuring and comparing
the levels of learning achievement of pupils in pilot (intervention) and
comparison (control) schools. A baseline (pre-test) assessment occurred
before implementation of the proposed interventions at the beginning of
Grade 5 in randomly selected pilot schools. This created a basis upon which
the impact of CA was measured at the end of the Grade 5 pilot year.

A sample of 48 schools was selected from Lusaka, Southern and Western
Provinces using a matched pairs design and random assignment, resulting in
24 pilot schools and 24 comparison schools. Student achievement for the
Grade 5 baseline and post-test administrations was measured using
multiple-choice tests in six subject areas, each with 30 items (30 points per
test). The Grade 5 baseline tests were based on the Grade 4 curriculum, while
the Grade 5 post-tests were based on the Grade 5 curriculum. Overall, the
psychometric characteristics of the tests were very satisfactory on both the
pre-tests and post-tests: items fell within acceptable ranges of difficulty
(p-value) and discrimination (point-biserial correlation), and the tests were
found to be reliable, using Cronbach's alpha as an estimate of
internal-consistency reliability.
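For reference, the sketch below shows how these item statistics and the
reliability estimate can be computed from a scored (0/1) response matrix. The
data are hypothetical, and the point-biserial shown is the uncorrected version
(the item is included in the total score).

    # Classical item statistics on a hypothetical scored response matrix
    # (rows = pupils, columns = items; 1 = correct, 0 = incorrect).
    import numpy as np

    responses = np.array([
        [1, 1, 0, 1, 0],
        [1, 0, 0, 1, 1],
        [0, 1, 1, 1, 0],
        [1, 1, 1, 0, 1],
        [0, 0, 1, 1, 0],
    ])

    p_values = responses.mean(axis=0)   # item difficulty (proportion correct)
    totals = responses.sum(axis=1)
    # Point-biserial: correlation of each item with the total score
    pt_biserials = [np.corrcoef(responses[:, j], totals)[0, 1]
                    for j in range(responses.shape[1])]

    # Cronbach's alpha as an internal-consistency estimate
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    total_var = totals.var(ddof=1)
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    print("p-values:", np.round(p_values, 2))
    print("point-biserials:", np.round(pt_biserials, 2))
    print(f"Cronbach's alpha: {alpha:.2f}")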

The performance of the schools on the baseline and post-tests was compared
using mean raw scores and mean vertical scaled scores. The vertical scaled
score comparison was considered more relevant, valid, and beneficial, since
the school mean scores on both the baseline and post-tests were evaluated on
the same measurement scale (i.e., the vertical scale). In addition, statisticians
generally prefer scaled scores for longitudinal comparisons since the scale is
equal-interval, making comparisons more accurate.

Overall, the pupils’ scores on the baseline pre-test were very similar in the
pilot and comparison schools. The comparison schools scored slightly higher
on the English and Mathematics tests, but the score differences for the two
groups on the other four tests were minimal. On the post-test, which was
administered after one year of the CA programme, the scores of the pilot
schools on all six tests were significantly higher than those in the comparison
schools. This provides strong initial evidence that the CA programme had a
significantly positive effect on pupil learning outcomes.

When the performance of the schools on the baseline and post-tests was
compared by region, Lusaka Region consistently had the highest mean
scores in all subjects on both the Grade 5 pre-tests and post-tests, followed
by Western and Southern. The number of schools per region was too small to
make statistically valid region-by-region comparisons of pre-test to post-test
scores for the pilot and comparison groups.

Students were also classified into three performance categories
(Unsatisfactory, Satisfactory, and Advanced) in each subject area based on
their performance on the baseline and post-tests. On the pre-tests, the
percentages in each category by group were similar for most of the subjects.
However, on the post-test, there were differences in favour of the pilot group
in virtually all subjects. For instance, in Integrated Science, 53% of students in
the pilot group were Satisfactory and above vs. 43% of students in the
comparison group. This provided strong evidence that a greater percentage of
students in the pilot group were achieving a passing score on the post-test
than those in the comparison group.

The next round of post-tests in the Phase 1 schools will be administered when
the same cohort of pupils completes Grade 6. This will be followed by a final
test administration (a third post-test) when the cohort of pupils completes
Grade 7. At that point, with four time points (a baseline and three post-tests),
more substantial conclusions will be drawn on the effectiveness of the CA
programme.

Note also that the evaluation process is being repeated in the Phase 2 and
Phase 3 schools, which will provide a complete national quantitative
evaluation of the programme at the end of Year 5 of implementation (2010).
Based on guidance from the CA Steering Committee, results from the
evaluation will be used at a selected point in the implementation period as a
criterion for scaling up the CA programme to other primary schools in Zambia.




Appendix 1: Item Statistics by Subject
Table A1: English Item Statistics

       P-value       Pt-Biserial               P-value     Pt-Biserial
Seq.                                   Seq.
       Pre-test       Pre-test                 Post-test   Post-test
1        .65            .47             1        .65          .55
2        .63            .53             2        .51          .58
3        .63            .52             3        .48          .44
4        .48            .56             4        .41          .54
5        .52            .55             5        .40          .48
6        .40            .53             6        .29          .36
7        .56            .58             7        .50          .45
8        .54            .55             8        .46          .46
9        .46            .56             9        .52          .61
10       .46            .41             10       .35          .61
11       .61            .52             11       .26          .46
12       .40            .52             12       .21          .35
13       .38            .47             13       .33          .58
14       .39            .50             14       .36          .56
15       .27            .46             15       .35          .55
16       .29            .42             16       .33          .40
17       .28            .40             17       .22          .24
18       .47            .55             18       .36          .59
19       .33            .40             19       .42          .54
20       .36            .46             20       .40          .51
21       .24            .46             21       .34          .53
22       .34            .30             22       .38          .47
23       .33            .36             23       .21          .35
24       .37            .47             24       .38          .56
25       .39            .46             25       .41          .49
26       .35            .42             26       .35          .46
27       .31            .38             27       .34          .50
28       .25            .28             28       .30          .40
29       .27            .32             29       .38          .52
30       .20            .29             30       .27          .40
Table A2: Social and Developmental Studies Item Statistics

          P-value    Pt-Biserial              P-value     Pt-Biserial
Seq.                                  Seq.
          Pre-test    Pre-test                Post-test   Post-test
1           .49         .52            1        .66          .57
2           .47         .39            2        .53          .60
3           .39         .49            3        .66          .60
4           .37         .32            4        .58          .50
5           .35         .47            5        .51          .57
6           .36         .35            6        .48          .61
7           .43         .51            7        .52          .61
8           .41         .41            8        .42          .31
9           .36         .21            9        .44          .56
10          .37         .43            10       .49          .50
11          .38         .49            11       .34          .42
12          .37         .48            12       .39          .43
13          .35         .42            13       .51          .49
14          .33         .34            14       .43          .54
15          .30         .46            15       .36          .58
16          .33         .41            16       .36          .44
17          .28         .30            17       .39          .40
18          .31         .26            18       .42          .42
19          .30         .46            19       .37          .55
20          .40         .45            20       .34          .51
21          .25         .44            21       .32          .38
22          .26         .43            22       .35          .36
23          .25         .41            23       .32          .44
24          .26         .29            24       .38          .26
25          .36         .31            25       .38          .25
26          .26         .32            26       .34          .39
27          .26         .19            27       .36          .31
28          .27         .37            28       .32          .24
29          .29         .19            29       .27          .22
30          .30         .25            30       .30          .39
Table A3: Mathematics Item Statistics


       P-value    Pt-Biserial              P-value     Pt-Biserial
Seq.                               Seq.
       Pre-test    Pre-test                Post-test   Post-test
1        .81         .43            1        .70          .56
2        .59         .51            2        .65          .55
3        .46         .34            3        .71          .57
4        .49         .48            4        .56          .55
5        .54         .55            5        .60          .54
6        .57         .51            6        .64          .52
7        .44         .42            7        .46          .48
8        .46         .25            8        .50          .50
9        .43         .29            9        .47          .32
10       .50         .51            10       .55          .34
11       .43         .51            11       .38          .44
12       .34         .26            12       .39          .44
13       .39         .42            13       .39          .45
14       .46         .42            14       .40          .45
15       .48         .45            15       .42          .28
16       .30         .25            16       .34          .32
17       .36         .30            17       .34          .46
18       .32         .23            18       .38          .48
19       .33         .36            19       .29          .34
20       .27         .28            20       .30          .35
21       .52         .40            21       .25          .37
22       .57         .48            22       .27          .40
23       .32         .33            23       .23          .34
24       .40         .46            24       .24          .33
25       .31         .43            25       .18          .23
26       .27         .32            26       .27          .33
27       .30         .26            27       .24          .28
28       .21         .17            28       .36          .48
29       .19         .15            29       .16          .18
30       .25         .32            30       .23          .30
Table A4: Integrated Science Item Statistics

       P-value    Pt-Biserial               P-value     Pt-Biserial
Seq.                               Seq.
       Pre-test    Pre-test                 Post-test   Post-test
1        .49         .42            1         .53          .56
2        .33         .17            2         .53          .56
3        .45         .41            3         .39          .57
4        .41         .44            4         .51          .49
5        .31         .20            5         .44          .52
6        .40         .39            6         .57          .48
7        .28         .43            7         .45          .49
8        .31         .26            8         .47          .53
9        .34         .45            9         .44          .48
10       .29         .26            10        .33          .51
11       .43         .29            11        .38          .34
12       .31         .40            12        .42          .49
13       .52         .28            13        .31          .44
14       .37         .45            14        .36          .51
15       .36         .42            15        .36          .40
16       .41         .43            16        .36          .49
17       .34         .29            17        .38          .55
18       .30         .50            18        .21          .21
19       .37         .50            19        .28          .42
20       .26         .25            20        .38          .48
21       .29         .37            21        .29          .47
22       .26         .38            22        .34          .49
23       .28         .34            23        .25          .29
24       .24         .39            24        .22          .16
25       .20         .35            25        .31          .38
26       .25         .25            26        .25          .29
27       .27         .33            27        .25          .36
28       .29         .21            28        .27          .40
29       .23         .45            29        .23          .27
30       .30         .27            30        .21          .33
Table A5: Creative & Technology Studies Item Statistics

           P-value    Pt-Biserial              P-value     Pt-Biserial
Seq.                                   Seq.
           Pre-test    Pre-test                Post-test   Post-test
1            .25          .55           1        .29          .34
2            .41          .50           2        .41          .50
3            .33          .34           3        .43          .55
4            .56          .45           4        .49          .64
5            .38          .16           5        .46          .54
6            .40          .34           6        .40          .55
7            .35          .46           7        .47          .45
8            .36          .34           8        .48          .52
9            .39          .54           9        .43          .37
10           .47          .48           10       .44          .53
11           .43          .48           11       .29          .46
12           .41          .31           12       .40          .52
13           .30          .40           13       .36          .55
14           .28          .41           14       .39          .56
15           .26          .39           15       .32          .46
16           .37          .52           16       .28          .37
17           .29          .27           17       .36          .37
18           .36          .35           18       .40          .52
19           .41          .40           19       .33          .51
20           .30          .41           20       .22          .25
21           .29          .54           21       .36          .35
22           .25          .25           22       .36          .28
23           .50          .40           23       .29          .25
24           .31          .34           24       .30          .36
25           .28          .39           25       .27          .42
26           .22          .14           26       .28          .44
27           .47          .37           27       .27          .32
28           .34          .32           28       .33          .24
29           .39          .35           29       .23          .52
30           .17          .08           30       .32          .44
Table A6: Community Studies Item Statistics


       P-value    Pt-Biserial             P-value     Pt-Biserial
Seq.                              Seq.
       Pre-test    Pre-test               Post-test   Post-test
1        .62         .41           1        .53          .52
2        .52         .35           2        .44          .60
3        .46         .42           3        .53          .61
4        .43         .48           4        .52          .57
5        .41         .33           5        .44          .49
6        .36         .32           6        .44          .40
7        .31         .21           7        .47          .51
8        .36         .33           8        .42          .57
9        .27         .20           9        .38          .56
10       .37         .21           10       .44          .50
11       .30         .35           11       .30          .41
12       .40         .38           12       .42          .52
13       .30         .19           13       .39          .51
14       .30         .45           14       .36          .43
15       .20         .18           15       .44          .41
16       .30         .36           16       .33          .49
17       .30         .25           17       .43          .50
18       .28         .38           18       .36          .42
19       .26         .21           19       .37          .29
20       .25         .19           20       .32          .31
21       .31         .34           21       .34          .44
22       .26         .21           22       .32          .39
23       .25         .26           23       .32          .29
24       .25         .24           24       .26          .31
25       .30         .31           25       .29          .37
26       .22         .28           26       .30          .28
27       .26         .28           27       .28          .41
28       .23         .21           28       .27          .24
29       .19         .16           29       .24          .21
30       .21         .16           30       .24          .23
Appendix 2: Scores and Frequencies – Grade 5 Pre-Tests
Table A7: English Scores and Frequencies

 Raw      Theta    Scale            Pilot Group                   Comparison Group
 Score             Score    Freq.       %         Cum. %   Freq.         %      Cum. %
   1      -3.59     100      24        1.3         1.3      30          1.5          1.5
   2      -2.84     100      28        1.6         2.9      31          1.5          3.0
   3      -2.38     102      43        2.4         5.3      61          3.0          6.1
   4      -2.04     126      54        3.0         8.3      45          2.2          8.3
   5      -1.76     146      66        3.7         12.0     76          3.8          12.1
   6      -1.52     163      112       6.3         18.3    112          5.6          17.6
   7      -1.31     178      138       7.7         26.1    152          7.6          25.2
   8      -1.11     192      145       8.1         34.2    137          6.8          32.0
   9      -0.93     205      151       8.5         42.6    146          7.3          39.2
  10      -0.76     217      140       7.8         50.5    142          7.1          46.3
  11      -0.60     228      118       6.6         57.1    158          7.8          54.1
  12      -0.44     239      105       5.9         63.0    111          5.5          59.7
  13      -0.29     250      68        3.8         66.8    109          5.4          65.1
  14      -0.14     261      83        4.6         71.4     85          4.2          69.3
  15      0.01      271      67        3.8         75.2     68          3.4          72.7
  16      0.16      282      55        3.1         78.3     68          3.4          76.1
  17      0.30      292      50        2.8         81.1     41          2.0          78.1
  18      0.46      303      41        2.3         83.4     45          2.2          80.3
  19      0.61      314      43        2.4         85.8     52          2.6          82.9
  20      0.77      325      44        2.5         88.2     50          2.5          85.4
  21      0.94      337      35        2.0         90.2     50          2.5          87.9
  22      1.12      350      24        1.3         91.5     27          1.3          89.2
  23      1.31      363      25        1.4         92.9     36          1.8          91.0
  24      1.52      378      19        1.1         94.0     37          1.8          92.8
  25      1.75      395      19        1.1         95.1     46          2.3          95.1
  26      2.03      415      26        1.5         96.5     28          1.4          96.5
  27      2.37      439      14         .8         97.3     18           .9          97.4
  28      2.82      471      19        1.1         98.4     28          1.4          98.8
  29      3.56      500      23        1.3         99.7     20          1.0          99.8
  30      4.80      500       6         .3        100.0     4            .2      100.0
 Total                      1785      100.0                2013        100.0
Table A8: Social and Developmental Studies Scores and Frequencies

 Raw      Theta    Scale            Pilot Group                   Comparison Group
 Score             Score    Freq.       %         Cum. %   Freq.         %      Cum. %
   1      -3.42     100      28        1.5         1.5      28          1.4          1.4
   2      -2.69     100      30        1.6         3.0      35          1.7          3.1
   3      -2.24     100      49        2.6         5.6      46          2.2          5.3
   4      -1.91     112      78        4.1         9.7      66          3.2          8.5
   5      -1.65     139      129       6.8         16.5    138          6.7          15.2
   6      -1.42     162      164       8.6         25.1    188          9.1          24.4
   7      -1.22     183      179       9.4         34.5    209          10.2         34.5
   8      -1.04     201      210       11.0        45.5    253          12.3         46.9
   9      -0.87     218      175       9.2         54.6    191          9.3          56.2
  10      -0.71     235      155       8.1         62.8    169          8.2          64.4
  11      -0.56     250      143       7.5         70.3    118          5.7          70.1
  12      -0.42     264      111       5.8         76.1     97          4.7          74.8
  13      -0.27     280      79        4.1         80.2     78          3.8          78.6
  14      -0.14     293      60        3.1         83.4     65          3.2          81.8
  15      0.00      307      39        2.0         85.4     46          2.2          84.0
  16      0.14      321      36        1.9         87.3     50          2.4          86.5
  17      0.28      336      45        2.4         89.7     39          1.9          88.4
  18      0.42      350      32        1.7         91.3     36          1.8          90.1
  19      0.56      364      28        1.5         92.8     30          1.5          91.6
  20      0.71      380      29        1.5         94.3     32          1.6          93.1
  21      0.87      396      27        1.4         95.8     24          1.2          94.3
  22      1.04      413      14         .7         96.5     28          1.4          95.7
  23      1.22      432      22        1.2         97.6     17           .8          96.5
  24      1.42      452      16         .8         98.5     19           .9          97.4
  25      1.65      476       6         .3         98.8     17           .8          98.2
  26      1.91      500      12         .6         99.4     14           .7          98.9
  27      2.24      500       7         .4         99.8     13           .6          99.6
  28      2.69      500       3         .2         99.9     7            .3          99.9
  29      3.42      500       1         .1        100.0     1            .0      100.0
  30      4.65      500       0         .0        100.0     1            .0      100.0
 Total                      1907      100.0                2055        100.0
CA Lessons for zambia from namibia, malawi and tanzania  2005CA Lessons for zambia from namibia, malawi and tanzania  2005
CA Lessons for zambia from namibia, malawi and tanzania 2005
 
An overview of the assessment tasks for all the six subject areas
An overview of the assessment tasks for all the six subject areasAn overview of the assessment tasks for all the six subject areas
An overview of the assessment tasks for all the six subject areas
 
The role of strategic planning in effecting change the realtionshiop between ...
The role of strategic planning in effecting change the realtionshiop between ...The role of strategic planning in effecting change the realtionshiop between ...
The role of strategic planning in effecting change the realtionshiop between ...
 
Characteristics of outcomes based assessment
Characteristics of outcomes based assessmentCharacteristics of outcomes based assessment
Characteristics of outcomes based assessment
 
What is philosophy presentation
What is philosophy presentationWhat is philosophy presentation
What is philosophy presentation
 
Continuous assessment as a relevant tool to quality products of learners in e...
Continuous assessment as a relevant tool to quality products of learners in e...Continuous assessment as a relevant tool to quality products of learners in e...
Continuous assessment as a relevant tool to quality products of learners in e...
 
What makes a leader
What makes a leaderWhat makes a leader
What makes a leader
 
Training session on talent management and development
Training session on talent management and developmentTraining session on talent management and development
Training session on talent management and development
 
Power point for the techniques for constructing exam items
Power point for the techniques for constructing exam itemsPower point for the techniques for constructing exam items
Power point for the techniques for constructing exam items
 
William paper presentation
William paper presentationWilliam paper presentation
William paper presentation
 

Dernier

Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 

Dernier (20)

Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 

Ca Baseline and Post test assessment report 2007 12 oct07

ACKNOWLEDGMENTS

The Continuous Assessment Joint Steering and Technical Committees and the Examinations Council of Zambia wish to express profound gratitude for the professional and material support provided by the Provincial Education Offices, District Education Boards, Educational Zone staff in the different districts, school administrators, teachers and pupils. Without this support, the baseline and post-pilot assessment exercises would not have succeeded.

We also wish to thank the management of the Directorate for Curriculum and Assessment in the Ministry of Education for providing professional support to the Continuous Assessment programme in general and the assessment exercises in particular. We specifically thank the Director for Standards and Curriculum, the Director for the Examinations Council of Zambia, and the Chief Curriculum Specialist for allowing their personnel to take part in the assessment exercise.

Finally, we express our appreciation to USAID and the EQUIP2 Project for providing the finances and technical support for the Continuous Assessment programme in Zambia.

All of the participants and stakeholders listed above have played a crucial role not only in developing and implementing the Continuous Assessment programme, but also in supporting the quantitative evaluation of the programme presented in this technical paper. It is because of their interest in improving student learning outcomes that the Continuous Assessment programme has had the necessary financial, administrative and technical support. Our hope is that the programme will prove valuable for all of the pupils and teachers in Zambian schools.
Chapter One: Background

1.1 Introduction to Continuous Assessment

Over the years in Zambia, the education system has not been able to provide enough places for all learners to proceed from Grade 7 to Grade 8, from Grade 9 to Grade 10, and from Grade 12 to higher learning institutions. The system has used examinations to select those who proceed to the next level and to certify candidates; however, this has been done without formal consideration of school-based assessment as a component of the final examinations, with the exception of some practical subjects.

The 1977 Educational Reforms explicitly provided for the use of Continuous Assessment (CA). Later national policy documents, particularly Educating Our Future (1996) and the Ministry of Education's Strategic Plan 2003-2007, stated the need to integrate school-based continuous assessment into the education system, including the development of strategies to combine CA results with final examination results for purposes of pupil certification and selection.

Furthermore, the national education policy, as stated in Educating Our Future, stipulated that the Ministry of Education will develop procedures that enable teachers to standardise their assessment methods and tasks for use as an integral part of school-based CA. The policy document also stated that the Directorate of Standards, in cooperation with the Examinations Council of Zambia (ECZ), will determine how school-based CA can best be conducted so that it can contribute to the final examination results for pupil certification and promotion to subsequent levels. The policy further stated that the Directorate of Standards, with input from the ECZ, will determine when school-based CA can be introduced.

In order to set in motion the implementation of school-based CA, the ECZ convened a preparatory workshop from 16th to 22nd November 2003 in Kafue, attended by ninety (90) participants from various stakeholder institutions. The objectives of the preparatory workshop were to:

• Recommend a plan for developing and implementing CA;
• Recommend a training plan for preparing teachers to implement CA;
• Explore ways of ensuring transparency, reliability, validity and comparability in using CA results;
• Agree on common assessment tasks and learning outcomes to be identified in the syllabuses for CA;
• Discuss the development of a teacher's manual on CA; and
• Discuss the nature of summary forms for recording marks that should be provided to schools.
1.2 Definition of Continuous Assessment

Continuous assessment is defined as an ongoing, diagnostic, classroom-based process that uses a variety of assessment tools to measure learner performance. CA is a formative evaluation tool conducted during the teaching and learning process with the aim of influencing and informing the overall instructional process. It is the assessment of the whole learner on an ongoing basis over a period of time, where cumulative judgments of the learner's abilities in specific areas are made in order to facilitate further positive learning (Le Grange & Reddy, 1998). [1]

The data generated from CA should be useful in helping teachers plan for the learning of individual pupils. It should also assist teachers in identifying the unique understanding of each learner in a classroom by informing the pupil of the level of instructional attainment, helping to target opportunities that promote learning, and reducing anxiety and other problems associated with examinations. CA has been shown to have positive impacts on student learning outcomes in hundreds of educational settings (Black & Wiliam, 1998). [2]

CA is made up of a variety of assessment methods that can be formal or informal. It takes place during the learning process, when it is most necessary, makes use of criterion referencing rather than norm referencing, and provides feedback on how learners are changing.

1.3 Challenges in the Implementation of Continuous Assessment

There are several areas in which the implementation of CA in the classroom will present challenges. Some of these are listed below.

• Large class sizes in most primary schools are a major problem. It is common to find classes of 60 and above in Zambian classrooms, and teachers are expected to mark and keep records of the progress of all of these learners.
• CA can take a lot of time for teachers. As a result, teachers worry that time spent on remediation and enrichment is excessive, and many do not believe that they would finish the syllabus with CA.
• CA will not be successfully implemented if there are inadequate teaching resources and equipment in schools. Teachers need materials and equipment such as stationery, computers and photocopiers (and electricity).
• There may be resistance from school administrators and teachers if they feel left out of the process of developing the CA programme.
• CA requires the cooperation of communities and parents. If they do not understand what is expected of them, they may resist and hence affect the success of the programme.

[1] Le Grange, L.L. & Reddy, C. (1998). Continuous Assessment: An Introduction and Guidelines to Implementation. Cape Town, South Africa: Juta.
[2] Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.
1.4 Guidelines for Implementation of Continuous Assessment

A teachers' guide on the implementation of continuous assessment at the basic school level was developed with the involvement of curriculum specialists, Standards Officers, Examinations Specialists, Provincial Education Officials, District Education Officials, Zonal in-service training providers, school administrators and teachers. The Teachers' Guide on CA comprises the following:

• Sample record forms;
• Description of the CA schemes;
• Instructions for preparing and administering assessment materials;
• Marking and moderation of the CA marks;
• Recording and reporting assessment results; and
• Monitoring of the implementation of the CA.

The Teachers' Guide also specifies the roles of stakeholders as follows:

Teachers
• Plan assessment tasks, projects and mark schedules;
• Teach, guide and supervise pupils in implementing given tasks;
• Conduct the assessment in line with given guidelines;
• Mark and record the results;
• Provide correction and remedial work to the pupils;
• Inform the head teacher and parents of the performance of the child;
• Advise and counsel the pupils on their performance in class tasks;
• Take part in internal moderation of pupils' results.

School Administrators
• Provide an enabling environment, such as the procurement of teaching and learning materials;
• Act as links between the school and other stakeholders such as the ECZ, traditional leaders, politicians and parents;
• Ensure validity, reliability and comparability through moderation of CA;
• Compile CA results and hand them over to the ECZ.

Parents
• Provide professional, moral, financial and material support to pupils;
• Continuously monitor their children's attendance and performance;
• Take part in making and enforcing school rules;
• Attend open days and witness the giving of prizes (rewards) to pupils with outstanding performance.
Standards Officers
• Interpret Government of Zambia policy on education;
• Monitor education policy implementation at various levels of the education system;
• Advise on and evaluate the extent to which the education objectives have been achieved;
• Ensure that acceptable assessment practices are conducted;
• Monitor the overall standards of education.

Guidance Teachers/School Counsellors
• Prepare and store record cards for CA;
• Counsel pupils, teachers and parents/guardians on CA and feedback;
• Take care of the pupils' psycho-social needs;
• Make referrals for pupils to access other specialised assistance/support.

Heads of Department/Senior Teachers/Section Heads
• Monitor and advise teachers in the planning, setting, conducting, marking and recording of CA results;
• Ensure validity, reliability and dependability of CA by conducting internal moderation of results;
• Hold departmental meetings to analyse the assessment;
• Provide or make available the teaching and learning materials;
• Compile a final record of CA results and hand it over to Guidance Teachers for onward submission to the ECZ.

District Resource Centre Coordinators
• Ensure adequate in-service training for teachers in planning, conducting, marking, moderating and recording results at school level in the district;
• Monitor the conduct of CA in the schools and the district;
• Professionally guide teachers to ensure provision of quality education at school level.

Provincial Resource Centre Coordinators
• Ensure adequate in-service training for teachers so that they are effective in planning, conducting, marking, moderating and recording CA results;
• Monitor the conduct of CA in the province;
• Professionally guide teachers to ensure provision of quality education at provincial level.

Examinations Specialists
• Analyse and moderate CA results;
• Integrate CA results with terminal examination results;
• Determine grade boundaries;
• Certify the candidates;
• Disseminate the results of candidates.

Monitors

As monitors of the CA programme, various officials and stakeholders will look out for the following documents and information:

• Progress chart;
• Record of CA results and analysis;
• Marked evidence of pupils' CA work on remedial activities;
• Evaluation of gender performance;
• Pupils' Record Cards;
• CA plans or schedules and schemes;
• Evidence of pupils' work;
• CA administration;
• Evidence of remedial work;
• Availability of planned remedial work in the classroom;
• Availability of the teacher's guide;
• Sample CA tasks;
• Evidence of a variety of CA tasks;
• Teacher's record of pupils' performance.

1.5 Plan for Implementation of Continuous Assessment

CA in Zambia is planned to roll out over a period of several years. This will allow for proper stakeholder support and evaluation. The following list provides a brief timeline of important CA activities through 2008:

• Creation of CA Steering and Technical Committees (2005);
• Development of assessment schemes, teachers' guides, model assessment task booklets and record-keeping forms (2005);
• Design of quantitative evaluation methodology with a focus on student learning outcomes (2005);
• Implementation of CA pilot in Phase 1 schools: Lusaka, Southern and Western regions (2006);
• Baseline report on student learning outcomes (2006);
• Implementation of CA pilot in Phase 2 schools: Central, Copperbelt and Eastern regions (2007);
• Expansion of modified CA pilot to community schools (2007);
• Post-test report on student learning outcomes (2007);
• Implementation of CA pilot in Phase 3 schools: Luapula, Northern and Northwestern regions (2008);
• Discussion of scaling up the CA pilot and systems-level planning for combining Grade 7 end-of-cycle summative test scores with CA scores for selection and certification purposes (2008).
Chapter Two: Evaluation Methodology

2.1 Objectives

The main objective of the quantitative evaluation is to determine whether the CA programme has had positive effects on student learning outcomes. The evaluation allows for a determination of whether pupils' academic performance has changed as a result of the CA intervention, as well as the extent of the change in performance.

2.2 Design

The evaluation design is quasi-experimental, with pre-tests and post-tests administered to intervention (pilot) and control (comparison) groups. It features a pre-test at the beginning of Grade 5 and post-tests at the end of Grades 5, 6 and 7. The pilot and comparison groups will be compared at each time point in six subject areas to see if there are differences in test scores from the baseline to the post-tests by group (see Figures 1 and 2 below). [3]

[Figure 1: Pre-Test and Post-Test, Pilot and Control Group Design. A grid showing the pilot group and the control group each assessed at four time points: Grade 5 pre-test, Grade 5 post-test, Grade 6 post-test and Grade 7 post-test.]

[Figure 2: Expected Results from the Evaluation. A line chart of scaled scores (200-650) across the four assessments, with the pilot group's line expected to rise above the control group's after the baseline.]

[3] For more information, refer to the Summary of the Continuous Assessment Program, August 2007, by the Examinations Council of Zambia and the EQUIP2-Zambia project.
With the matched-pairs random assignment design, it was expected that the two groups, pilot and control, would have similar mean scores on the pre-test. However, with a successful intervention, it was expected that the pilot group would score higher than the control group on the subsequent post-tests.

2.3 Sample

The sample included all of the 2006 (pre-test) and 2007 (post-test) Grade 5 basic school pupils in Lusaka, Southern and Western Provinces in the 24 pilot (intervention) and 24 comparison (control) schools. The schools were chosen using matched pairs, with geographic location, school size and grade levels as matching variables, followed by random assignment to pilot or comparison status. CA activities were implemented in the pilot schools but not in the comparison schools.

2.4 Instruments

Student achievement for the Grade 5 baseline and post-pilot administrations was measured using multiple-choice tests with 30 items (30 points per test). The test development process included the following steps:

• Review of the curricula for each subject area;
• Development of test specifications;
• Development of items;
• Piloting of items;
• Data reviews of item statistics;
• Forms pulling (selecting items for the final test papers).

The test instruments were developed by teams of Curriculum Specialists, Standards Officers, Examinations Specialists and Teachers. The baseline tests (pre-tests) were based on the Grade 4 syllabus and the post-pilot tests (post-tests) were based on the Grade 5 syllabus.

2.5 Administration

The ECZ organized the administration of both the pre-test and post-test papers. Teams comprising an Examinations Specialist, a Standards Officer and a Curriculum Specialist were sent to each region to supervise the administration. District Education officials, School Administrators and Teachers were involved in the actual administration of the tests.

All of the Grade 5 pupils in the pilot and comparison schools sat for six tests, one in each of the six subject areas (English, Mathematics, Social and Development Studies, Integrated Science, Creative and Technology Studies and Community Studies). The baseline tests (Grade 4 syllabus) were administered at the beginning of Grade 5, in February 2006. The post-pilot tests (Grade 5 syllabus) were administered in February 2007. Note that there will be two more administrations of post-tests for this cohort of students in the three provinces; these will take place in February 2008 (Grade 6 syllabus) and November 2008 (Grade 7 syllabus).
This process will be repeated in the Phase 2 and Phase 3 schools (see Table 1 below).

Table 1: Implementation Plan for CA Pilot

Phase                                        2006      2007      2008      2009      2010
Phase 1 (Lusaka, Southern, Western)          Grade 5   Grade 6   Grade 7
Phase 2 (Central, Copperbelt, Eastern)                 Grade 5   Grade 6   Grade 7
Phase 3 (Luapula, Northern, Northwestern)                        Grade 5   Grade 6   Grade 7

2.6 Data Capture and Scoring

Data were captured using Optical Mark Readers (OMR) and scored using the Faim software at the ECZ. Through this process, item scores for all students were converted into electronic format and data files were produced for analysis.

2.7 Data Analysis

Data were analysed using the Statistical Package for the Social Sciences (SPSS). Scores and frequencies by subject were generated, and the analysed data were presented in tabular, chart and graphical forms. Additional analyses were conducted using WINSTEPS (item response theory Rasch modelling) software. SPSS was also used for scaling the pupils' scores.
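To make the capture-and-score step concrete, the sketch below scores captured multiple-choice responses against an answer key. This is a minimal Python illustration under assumed inputs: the key and response values are hypothetical and do not reflect the actual OMR or Faim file formats.

    import numpy as np

    # Hypothetical answer key and captured responses (rows = pupils,
    # columns = items); not the actual OMR/Faim output format.
    key = np.array(["B", "D", "A", "C"])
    captured = np.array([["B", "C", "A", "C"],
                         ["B", "D", "D", "A"],
                         ["A", "D", "A", "C"]])

    # Dichotomous scoring: 1 = correct, 0 = incorrect.
    scores = (captured == key).astype(int)

    # Raw score per pupil (out of 4 items here; out of 30 on the actual tests).
    raw_scores = scores.sum(axis=1)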
Chapter Three: Assessment Results

3.1 Psychometric Characteristics

An initial step in determining the results from the assessments was to conduct analyses of the psychometric characteristics of the assessments. Both the Standards for Educational and Psychological Testing (1999) [4] and the Code of Fair Testing Practices in Education (2004) [5] include standards for identifying quality items. Items should assess only knowledge or skills that are identified as part of the domain being tested and should avoid assessing irrelevant factors (e.g., ambiguity, grammatical errors, or sensitive content or language). Both quantitative and qualitative analyses were conducted to ensure that items on both the Grade 5 baseline and post-pilot tests met satisfactory psychometric guidelines.

The statistical evaluations of the items are presented in two parts, using classical test theory (CTT) and item response theory (IRT), which is sometimes called modern test theory. [6] The two measurement models generally provide similar results, but IRT is particularly useful for test scaling and equating. CTT analyses included (1) the difficulty index (p-value), (2) the discrimination index (item-test correlations), and (3) test reliability (Cronbach's alpha as an estimate of internal consistency reliability). IRT analyses included (1) calibration of items and (2) examination of the item difficulty index (i.e., the b-parameter).

3.2 Classical Test Theory

Difficulty Indices (p)

All multiple-choice items were evaluated in terms of item difficulty according to standard classical test theory practices. Difficulty was defined as the average proportion of points achieved on an item by the students, calculated by obtaining the average score on an item and dividing by the maximum possible score for the item. Multiple-choice items were scored dichotomously (1 point vs. no points, or correct vs. incorrect), so the difficulty index is simply the proportion of students who correctly answered the item. All items on the Grade 5 pre-tests and post-tests had four response options. Table 2 shows the average p-values for each test. Note that the mean p-value may also be calculated by taking the average raw score of all students divided by the maximum points (30) per test.

[4] American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
[5] Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in Education. Washington, DC: American Psychological Association.
[6] For more information, see Crocker, L. and Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York: Harcourt Brace.
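As a worked illustration of the difficulty index, the sketch below computes p-values from a dichotomously scored response matrix. This is a minimal Python sketch with a hypothetical matrix, not the SPSS procedure used in the study.

    import numpy as np

    # Hypothetical 0/1 item-score matrix: rows = pupils, columns = items.
    scores = np.array([[1, 0, 1, 1],
                       [1, 1, 0, 0],
                       [0, 0, 1, 0],
                       [1, 1, 1, 0]])

    # Difficulty index: proportion of pupils answering each item correctly.
    p_values = scores.mean(axis=0)

    # Mean p-value for the test, which equals the mean raw score divided
    # by the maximum possible score (30 points on the actual tests).
    mean_p = scores.sum(axis=1).mean() / scores.shape[1]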
Table 2: Overall Test Difficulty Estimates by Subject Area

                                            Grade 5 Pre-test          Grade 5 Post-test
Subject Area                                # Items   Mean p-value    # Items   Mean p-value
English                                     30        0.40            30        0.37
Social and Developmental Studies            30        0.34            30        0.42
Mathematics                                 30        0.41            30        0.40
Integrated Science                          30        0.33            30        0.36
Creative and Technology Studies             30        0.35            30        0.36
Community Studies                           30        0.32            30        0.37

Items that are answered correctly by almost all students provide little information about differences in student ability, but they do indicate knowledge or skills that have been mastered by most students. Similarly, items that are correctly answered by very few students may indicate knowledge or skills that have not yet been mastered by most students, but such items also provide little information about differences in student ability. In general, to provide the best measurement, difficulty indices should range from near-chance performance of about 0.20 (for four-option multiple-choice items) to 0.90. The item difficulty indices for both the Grade 5 pre-tests and post-tests were within generally acceptable and expected ranges (see Appendix 1 for a complete list of p-values for all items on each test).

Item Discrimination (Item-Test or Point-Biserial Correlations)

One desirable feature of an item is that higher performing students do better on the item than lower performing students. The correlation between student performance on a single item and total test score is a commonly used measure of this characteristic. Within classical test theory, the item-test (or point-biserial) correlation is referred to as the item's discrimination because it indicates the extent to which successful performance on an item discriminates between high and low scores on the test. The theoretical range of these statistics is -1 to +1, with a typical range from 0.2 to 0.6. Discrimination indices can be thought of as measures of how closely an item assesses the same knowledge and skills assessed by the other items contributing to the total score. Discrimination indices for Grade 5 are presented in Table 3.

Table 3: Overall Test Discrimination Estimates by Subject Area

                                            Grade 5 Pre-test          Grade 5 Post-test
Subject Area                                # Items   Mean Pt-bis     # Items   Mean Pt-bis
English                                     30        0.46            30        0.48
Social and Developmental Studies            30        0.38            30        0.45
Mathematics                                 30        0.37            30        0.41
Integrated Science                          30        0.35            30        0.43
Creative and Technology Studies             30        0.38            30        0.44
Community Studies                           30        0.29            30        0.43
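The item-test correlations summarised in Table 3 can be computed as in the following sketch. This is illustrative only: the response matrix is assumed, and the function mirrors the report's definition of correlating each 0/1 item score with the total test score.

    import numpy as np
    from scipy.stats import pointbiserialr

    def discrimination_indices(scores):
        # Point-biserial correlation of each dichotomous item with the
        # total raw score; scores is a pupils x items matrix of 0/1 values.
        total = scores.sum(axis=1)
        indices = []
        for i in range(scores.shape[1]):
            r, _ = pointbiserialr(scores[:, i], total)
            indices.append(r)
        return indices

A common refinement, not used in the report, is the corrected item-total correlation, which excludes the item itself from the total so that the item does not inflate its own discrimination estimate.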
On average, the discrimination indices were within acceptable and expected ranges (i.e., 0.20 to 0.60). The positive discrimination indices indicate that students who performed well on individual items tended to perform well overall on the test. No items on the instruments had near-zero discrimination indices (see Appendix 1 for a complete list of the point-biserial correlations for all items on each pre-test and post-test per subject area).

Test Reliabilities

Although an individual item's statistical properties are an important focus, a complete evaluation of an assessment must also address the way items function together and complement one another. There are a number of ways to estimate an assessment's reliability. One possible approach is to give the same test to the same students at two different points in time. If students receive the same scores on each test, then the extraneous factors affecting performance are small and the test is reliable. (This is referred to as test-retest reliability.) A potential problem with this approach is that students may remember items from the first administration, or may have gained (or lost) knowledge or skills in the interim between the two administrations.

A solution to the 'remembering items' problem is to give a different but parallel test at the second administration. If the student scores on the two tests correlate highly, the test is considered reliable. (This is known as alternate forms reliability, because an alternate form of the test is used in each administration.) This approach, however, does not address the problem that students may have gained (or lost) knowledge or skills between the two administrations. In addition, the practical challenges of developing and administering parallel forms generally preclude the use of parallel forms reliability indices.

One way to address these problems is to split the test in half and then correlate students' scores on the two half-tests; this in effect treats each half-test as a complete test. By doing this, the problems associated with an intervening time interval, and with creating and administering two parallel forms, are alleviated. This is known as a split-half estimate of reliability. If the two half-test scores correlate highly, the items on the two half-tests must be measuring very similar knowledge or skills. This is evidence that the items complement one another and function well as a group, and it also suggests that measurement error will be minimal.

The split-half method requires a judgment regarding which items contribute to which half-test score. This decision may have an impact on the resulting correlation; different splits will give different estimates of reliability. Cronbach (1951) [7] provided a statistic, α (alpha), that avoids this concern. Cronbach's α gives an estimate of the average of all possible splits for a given test. It is often referred to as a measure of internal consistency because it indicates how well all the items in the test measure one single underlying ability.

[7] Cronbach, L. J. (1951). Coefficient Alpha and the Internal Structure of Tests. Psychometrika, 16, 297-334.

Cronbach's α is computed using the following formula:
\alpha = \frac{n}{n-1}\left[1 - \frac{\sum_{i=1}^{n} \sigma^2(Y_i)}{\sigma_x^2}\right]

where:
i = item,
n = total number of items,
\sigma^2(Y_i) = individual item variance, and
\sigma_x^2 = total test variance.

For standardized tests, reliability estimates should be approximately 0.80 or higher. According to Table 4, the reliabilities for the pre-tests ranged from 0.63 (Community Studies) to 0.87 (English). The reliability estimate for Community Studies was low due to the absence of a national curriculum for use in test construction. In contrast, the reliability estimates for the post-tests ranged from 0.83 (Mathematics) to 0.89 (English). It is likely that the post-tests had higher reliability estimates because the test developers had more experience than when they developed the baseline tests.

Table 4: Test Reliability Estimates by Subject Area

                                            Grade 5 Pre-test              Grade 5 Post-test
Subject Area                                # Items   Coefficient Alpha   # Items   Coefficient Alpha
English                                     30        0.87                30        0.89
Social and Developmental Studies            30        0.80                30        0.87
Mathematics                                 30        0.79                30        0.83
Integrated Science                          30        0.76                30        0.85
Creative and Technology Studies             30        0.80                30        0.86
Community Studies                           30        0.63                30        0.85
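A minimal sketch of the Cronbach's α computation from the formula above, assuming the same kind of hypothetical 0/1 item-score matrix as in the earlier sketches:

    import numpy as np

    def cronbach_alpha(scores):
        # alpha = n/(n-1) * (1 - sum of item variances / total score variance)
        n = scores.shape[1]                           # number of items
        item_vars = scores.var(axis=0, ddof=1)        # sigma^2(Y_i)
        total_var = scores.sum(axis=1).var(ddof=1)    # sigma_x^2
        return (n / (n - 1)) * (1 - item_vars.sum() / total_var)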
3.3 Item Response Theory

Item Response Theory (IRT) uses mathematical models to define a relationship between an unobserved measure of student ability, usually referred to as theta (θ), and the probability (p) of getting a dichotomous item correct. In IRT, it is assumed that all items are independent measures of the same construct or ability (i.e., the same θ). The process of determining the specific mathematical relationship between θ and p is referred to as item calibration. Once items are calibrated, they are defined by a set of parameters which specify a non-linear relationship between θ and p. [8]

For the CA programme, a 1-parameter (Rasch) model was implemented. Under the Rasch model, the probability of a correct response to item i by a student with ability level θ is:

P_i(\theta) = \frac{\exp[D(\theta - b_i)]}{1 + \exp[D(\theta - b_i)]}

where i = item, b_i = item difficulty, and D = a normalizing constant equal to 1.701.

In IRT, item difficulty (b_i) and student ability (θ) are measured on a scale from −∞ to +∞. A scale of −3.0 to +3.0 is used operationally in educational assessment programmes, with −3.0 representing low student ability or an easy item and +3.0 representing high student ability or a difficult item. The b_i parameter for an item is the position on the ability scale where the probability of a correct response is 0.50.

The WINSTEPS program was used for the IRT analyses. The item parameter files resulting from the analyses are provided in Appendices 2 and 3; this presentation is direct output from WINSTEPS. [9] Raw scores were then scaled using the item response theory model, with a range of 100-500 (see Appendices 2 and 3 for the raw score to scaled score conversion tables for each subject area).

[8] For more information about item calibration, see the following references: Lord, F.M. and Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Boston, MA: Addison-Wesley; Hambleton, R.K. and Swaminathan, H. (1984). Item Response Theory: Principles and Applications. New York: Springer.
[9] See the WINSTEPS user's manual for additional details regarding this output (at http://www.winsteps.com).
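A minimal sketch of the Rasch probability function defined above; the θ and b values are illustrative, not calibrated WINSTEPS output.

    import math

    D = 1.701  # normalizing constant from the report

    def rasch_probability(theta, b):
        # Probability of a correct response for ability theta and item
        # difficulty b, both on the -3.0 to +3.0 operational scale.
        return math.exp(D * (theta - b)) / (1 + math.exp(D * (theta - b)))

    # When ability equals item difficulty the probability is 0.50,
    # matching the definition of the b-parameter above.
    assert abs(rasch_probability(0.7, 0.7) - 0.5) < 1e-9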
3.4 Scaled Scores

The Grade 5 pre-test and post-test scores in each subject area are reported on a scale that ranges from 100 to 500. Students' raw scores, or total number of points, on the pre-tests and post-tests are translated to scaled scores using a data analysis process called scaling. Scaling simply converts raw points from one scale to another. In the same way that distance can be expressed in miles or kilometres, or monetary value in U.S. dollars or Zambian Kwacha, student scores on both the pre- and post-tests can be expressed as raw scores (i.e., number of points) or scaled scores.

Cut points were established on the raw score scale for both the pre-tests and post-tests (see Section 3.8, Performance Categories, for an explanation of how these cut points were determined). Once the raw score cut points were determined via standard setting, the next step was to compute theta cuts using the test characteristic curve (TCC) mapping procedure, and then to calculate the transformation coefficients used to place students' raw scores onto the theta scale and then onto the scaled score used for reporting. As previously stated, student scores on the assessments are reported in integer values from 100 to 500, with two scores representing cut scores on each assessment. The two cut points (Unsatisfactory/Satisfactory and Satisfactory/Advanced) were pre-set at 250 and 350, respectively.

[Figure 3: Scaled Score Conversion Procedure. Raw score cuts from standard setting are converted into theta cuts (θ1 and θ2) using TCC mapping; the theta cuts and the scaled score cuts (250 and 350) are used to calculate the transformation constants m and b; scaled scores are then calculated as m(θ) + b.]

The scaled scores are obtained by a simple linear transformation of the theta score, using the values of 250 and 350 on the scaled score metric and the associated theta cut points to define the transformation. The scaling coefficients were calculated using the following formulae:

b = 250 - m\,\theta_1
b = 350 - m\,\theta_2
m = \frac{350 - 250}{\theta_2 - \theta_1}

where m is the slope of the line providing the relationship between the theta and scaled scores, b is the intercept, θ1 is the cut score on the theta metric corresponding to the raw score cut for Unsatisfactory/Satisfactory, and θ2 is the cut score on the theta metric corresponding to the raw score cut for Satisfactory/Advanced. Scaled scores were then calculated using the following linear transformation (see Figure 3):

\text{Scaled Score} = m\,\theta + b

where θ represents a student's theta (or ability) score. The values obtained using this formula were rounded to the nearest integer and then truncated such that no student received a score below 100 or above 500.
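A minimal sketch of this theta-to-scaled-score transformation, including the rounding and truncation rules; the theta cut values here are hypothetical, not those obtained from the standard setting.

    # Hypothetical theta cuts from TCC mapping of the raw score cuts.
    theta_1, theta_2 = -0.8, 0.9

    m = (350 - 250) / (theta_2 - theta_1)   # slope
    b = 250 - m * theta_1                   # intercept (equals 350 - m * theta_2)

    def scaled_score(theta):
        # Linear transform m*theta + b, rounded to the nearest integer,
        # then truncated to the reporting range of 100-500.
        return min(500, max(100, round(m * theta + b)))

    scaled_score(theta_1)  # 250, the Unsatisfactory/Satisfactory cut
    scaled_score(theta_2)  # 350, the Satisfactory/Advanced cut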
It is important to note that converting from raw scores to scaled scores does not change the students' performance-level classifications. For the Zambia CA programme, a score of 250 is the cut score between Unsatisfactory and Satisfactory and a score of 350 is the cut score between Satisfactory and Advanced, regardless of the subject area, grade or year concerned. Scaled scores supplement the pre-test and post-test results by providing information about the position of a student's results within a performance level. For instance, if the range for a performance level is 200 to 250, a student with a scaled score of 245 is near the top of that performance level and close to the next higher performance level. School-level scaled scores are calculated by computing the average of the student-level scaled scores.

Table 5 provides the raw score averages for each of the subject areas, while Table 6 provides the same information in scaled scores.

Table 5: Grade 5 Mean Raw Scores by Subject Area

                                                      Grade 5 Pre-test             Grade 5 Post-test
Subject Area                                # Items   N      Mean   Std. Dev.      N      Mean   Std. Dev.
English                                     30        3798   12.2   6.5            4025   11.7   7.1
Social and Developmental Studies            30        3962   10.1   5.3            4104   13.2   6.6
Mathematics                                 30        3883   12.3   5.3            4127   12.4   5.8
Integrated Science                          30        4039   9.9    4.9            4135   11.1   6.3
Creative and Technology Studies             30        4032   10.5   5.3            4097   11.7   6.2
Community Studies                           30        4037   9.5    4.0            4141   11.2   6.4

According to Table 5, the overall mean raw scores (with both pilot and comparison groups taken together) on the pre-tests ranged from 9.5 (Community Studies) to 12.3 (Mathematics) out of a possible 30 points. The overall mean raw scores on the post-tests ranged from 11.1 (Integrated Science, and Creative and Technology Studies) to 13.2 (Social and Developmental Studies). From Table 6, the scaled score averages for the Grade 5 pre-tests ranged from 214 (Community Studies) to 239 (English) on the 100-500 scale, while the scaled score averages for the post-tests ranged from 233 (English) to 262 (Mathematics).

Table 6: Grade 5 Mean Scaled Scores by Subject Area

                                                      Grade 5 Pre-test             Grade 5 Post-test
Subject Area                                # Items   N      Mean    Std. Dev.     N      Mean    Std. Dev.
English                                     30        3798   238.8   83.7          4025   233.4   88.1
Social and Developmental Studies            30        3962   230.5   86.2          4104   241.2   83.9
Mathematics                                 30        3883   222.4   89.2          4127   261.9   72.6
Integrated Science                          30        4039   226.5   80.2          4135   245.7   73.7
Creative and Technology Studies             30        4032   224.1   85.3          4097   244.3   83.0
Community Studies                           30        4037   214.0   83.7          4141   236.9   72.3

As stated earlier, the scaled score is a simple linear transformation of the theta (ability) score, anchored at the values of 250 and 350 on the scaled score metric; a student's relative position in the score distribution does not change under this transformation.

Note that the primary interest of this evaluation is not whether the raw scores and/or scaled scores increase or decrease from pre-test to post-test. These differences will occur mainly through variations in test difficulty. The main analysis compares the relative changes in the two groups, i.e., pilot and comparison, across the two time points, i.e., pre-test to post-test. At a later point, post-tests will also be conducted when the cohort of students is in Grade 6 and Grade 7, followed by extended analyses for the two additional time points.
3.5 Vertical Scaled Scores

In vertical scaling, tests that vary in difficulty level, but that are intended to measure similar constructs, are placed on the same scale. Placing different tests on the same scale can be implemented in a number of ways, such as linking items across the tests or social moderation. For the CA programme, a social moderation procedure (Linn, 1993) was employed for vertical scaling. [10] In social moderation, assessments are developed in reference to a common content framework, and the performance of individual students and schools is measured against a single set of common standards.

For Zambia, an analysis of the Grade 4 and Grade 5 curricula showed that the content was vertically aligned, i.e., students were expected to progress in their learning along the same constructs from one grade level to the next. This allowed the test developers to link the pre-tests and post-tests through common performance standards. The vertical scaling scheme for the CA programme is shown below, with the two cut scores for each administration expressed on the vertical scale.

Figure 4: Vertical Scaling Scheme

Grade 5 Pre-test:    250   350
Grade 5 Post-test:   350   450
Grade 6 Post-test:   450   550
Grade 7 Post-test:   550   650

In other words, students classified as Advanced on the Grade 5 pre-test (i.e., the end of the Grade 4 syllabus) would be considered Satisfactory on the Grade 5 post-test (i.e., the end of the Grade 5 syllabus); students classified as Advanced on the Grade 5 post-test would be considered Satisfactory on the Grade 6 post-test; and so on through Grade 7.

[10] Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6(1), 83-102.
On the vertical scale, a student who earned a grade-level scaled score of 250 on the Grade 5 post-test would earn a vertical scaled score of 350, because the Unsatisfactory/Satisfactory cut on the Grade 5 post-test is aligned with the Satisfactory/Advanced cut (350) on the Grade 5 pre-test scale. Grade-level scaled scores and vertical scaled scores therefore differ by a constant value of 100 points. The mean vertical scaled scores for each subject are shown in Table 7.

Table 7: Grade 5 Mean Vertical Scaled Scores by Subject Area

                                                      Grade 5 Pre-test             Grade 5 Post-test
Subject Area                                # Items   N      Mean    Std. Dev.     N      Mean    Std. Dev.
English                                     30        3798   238.8   83.7          4025   333.4   88.1
Social and Developmental Studies            30        3962   230.5   86.2          4104   341.2   83.9
Mathematics                                 30        3883   222.4   89.2          4127   361.9   72.6
Integrated Science                          30        4039   226.5   80.2          4135   345.6   73.7
Creative and Technology Studies             30        4032   224.1   85.3          4097   344.4   83.0
Community Studies                           30        4037   214.0   83.7          4141   336.9   72.3

Figure 5 shows the mean vertical scaled scores on the pre- and post-tests across the subject areas. Vertical scaled scores for the pre-test are simply the grade-level scaled scores. As expected, the vertical scaled scores for the Grade 5 post-test are higher than the Grade 5 pre-test scaled scores.

[Figure 5: bar chart of mean vertical scaled scores (pre-test vs. post-test) by subject area: English, SDS, Mathematics, ISC, CTS and CS.]

3.6 Comparison between Pilot and Comparison Groups

The comparisons between the pilot and comparison groups were made using both raw scores and vertical scaled scores. Raw scores on the pre- and post-tests are not on the same scale, since the tests vary in difficulty, but the raw score comparison is presented for simplicity. The comparison is more relevant, valid and beneficial when made on vertical scaled scores, which are on the same scale for the pre- and post-tests.
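The constant 100-point offset between grade-level and vertical scaled scores can be expressed directly, as in the sketch below; the administration labels are hypothetical shorthand.

    # Offset of each administration's grade-level scale on the vertical
    # scale, per the scheme in Figure 4 (100 points per grade level).
    OFFSETS = {"grade5_pre": 0, "grade5_post": 100,
               "grade6_post": 200, "grade7_post": 300}

    def vertical_scaled_score(grade_level_score, administration):
        return grade_level_score + OFFSETS[administration]

    vertical_scaled_score(250, "grade5_post")  # 350, as in the example above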
Raw Scores

Table 8 shows that the raw score mean differences between the pilot and comparison schools on the Grade 5 pre-tests were small for each subject area. The mean differences, analyzed using t-tests, were statistically significant only in English and Mathematics, with the pupils in the comparison group performing better than those in the pilot group (p < .05). In the other four subjects, the t-tests showed no significant differences between the two groups at baseline. In raw scores, the differences in English and Mathematics were about half a point, while the differences for the other subjects were at most two-tenths of a point. These results reflected the expectation of very small differences on the pre-tests, since the schools were randomly assigned to one of the two groups based on a matched pairs design.

Table 8: Mean Raw Scores by Subject Area and Group

                                                          Grade 5 Pre-test           Grade 5 Post-test
Subject Area                           Group              N†     Mean    Std. Dev.   N†     Mean    Std. Dev.
English                                Pilot              1785   11.9    6.4         1773   13.3*   1.6
                                       Comparison         2013   12.4*   6.6         1967   12.2    1.6
                                       Total              3798   12.2    6.5         3740   12.8    1.6
Social and Developmental Studies       Pilot              1907   10.0    5.2         1895   14.9*   1.3
                                       Comparison         2055   10.2    5.5         2008   13.7    1.3
                                       Total              3962   10.1    5.3         3903   14.3    1.3
Mathematics                            Pilot              1861   12.0    5.3         1849   13.8*   1.4
                                       Comparison         2022   12.6*   5.3         1975   13.2    1.4
                                       Total              3883   12.3    5.3         3824   13.5    1.4
Integrated Science                     Pilot              1961   9.8     4.9         1949   13.2*   1.9
                                       Comparison         2078   9.9     4.9         2031   11.2    1.8
                                       Total              4039   9.9     4.9         3980   12.2    1.9
Creative and Technology Studies        Pilot              1967   10.5    5.2         1955   12.9*   1.5
                                       Comparison         2065   10.6    5.4         2018   11.7    1.5
                                       Total              4032   10.5    5.3         3973   12.3    1.5
Community Studies                      Pilot              1979   9.5     4.0         1967   13.4*   1.6
                                       Comparison         2058   9.5     3.9         2011   12.5    1.6
                                       Total              4037   9.5     4.0         3978   13.0    1.6

* Significant at p < 0.05; † adjusted weighted sample size.

The differences between the two groups for all subject areas on the Grade 5 post-tests (also in Table 8) were evaluated using an Analysis of Covariance (ANCOVA), with the pre-test scores as the covariates. In other words, the pre-test scores were made statistically equivalent so that the groups could be evaluated on an equal basis on the post-tests. Using the raw scores, the results were statistically significant in each of the subject areas, with the pilot group outperforming the comparison group (p < .05).

Note that all statistical comparisons were made at the school level, not at the student level, because of changes in the student population at each school from pre-test to post-test. The design was based on cohorts (student groups over time) and not on panels (the same students over time). A panel design would have been statistically possible, but it would have led to skewed results due to student attrition.
Vertical Scaled Scores

As stated, vertical scaled scores on the pre- and post-tests were computed independently for both the pilot and comparison groups and were measured on the same scale (i.e., the vertical scale). This makes the comparison more relevant and valid for assessing the impact of CA in the pilot schools relative to the comparison schools.

Table 9: Mean Vertical Scaled Scores by Subject Area and Group

                                              Grade 5 Pre-tests           Grade 5 Post-tests
Subject Area                    Group        N†     Mean    Std. Dev.    N†     Mean    Std. Dev.
English                         Pilot       1785   236.1      82.4      1773   352.3*     20.3
                                Comparison  2013   241.2*     84.8      1967   339.9      20.3
                                Total       3798   238.8      83.7      3740   346.1      20.3
Social and Developmental        Pilot       1907   229.1      84.3      1895   362.4*     17.7
Studies                         Comparison  2055   231.8      87.9      2008   346.2      17.7
                                Total       3962   230.5      86.2      3903   354.3      17.7
Mathematics                     Pilot       1861   217.8      89.3      1849   380.5*     17.1
                                Comparison  2022   226.7*     88.9      1975   373.1      17.1
                                Total       3883   222.4      89.2      3824   376.8      17.1
Integrated Science              Pilot       1961   225.5      80.1      1949   369.5*     20.4
                                Comparison  2078   227.4      80.4      2031   348.0      20.4
                                Total       4039   226.5      80.2      3980   358.8      20.4
Creative and Technology         Pilot       1967   223.0      84.0      1955   357.1*     16.0
Studies                         Comparison  2065   225.1      86.5      2018   343.5      16.0
                                Total       4032   224.1      85.3      3973   350.3      16.0
Community Studies               Pilot       1979   213.7      84.3      1967   365.8*     22.1
                                Comparison  2058   214.2      83.1      2011   352.8      22.1
                                Total       4037   214.0      83.7      3978   359.3      22.1

* Significant at p<0.05; † represents adjusted weighted sample size.

Table 9 shows that the vertical scaled score mean differences between the pilot and comparison schools on the Grade 5 pre-tests were small for each subject area. The mean differences in all six subject areas, analyzed using t-tests, were not statistically significant (p>.05). In contrast, when the differences between the two groups for all subject areas on the Grade 5 post-test (also in Table 9) were evaluated using an ANCOVA (with the pre-test scores as the covariates), the results were statistically significant in all subject areas, with the pilot group outperforming the comparison group (p<.05).

Figures 6 through 11 show the differences in vertical scaled scores from the Grade 5 pre-test to the Grade 5 post-test for each of the subject areas. The graphs clearly show the greater score increases by the pilot groups in all subject areas except Mathematics, where the increases were not as evident as in the other subjects, though the pilot group started off lower.
Figure 6: English Mean Vertical Scores by Group
[Line graph of mean vertical scaled scores (200-400) from the Grade 5 pre-test to the Grade 5 post-test for the Pilot and Comparison groups.]

Figure 7: Social & Dev. Studies Mean Vertical Scores by Group
[Line graph of mean vertical scaled scores (200-400) from the Grade 5 pre-test to the Grade 5 post-test for the Pilot and Comparison groups.]

Figure 8: Mathematics Mean Vertical Scores by Group
[Line graph of mean vertical scaled scores (200-400) from the Grade 5 pre-test to the Grade 5 post-test for the Pilot and Comparison groups.]
Figure 9: Integrated Science Mean Vertical Scores by Group
[Line graph of mean vertical scaled scores (200-400) from the Grade 5 pre-test to the Grade 5 post-test for the Pilot and Comparison groups.]

Figure 10: Creative & Tech. Studies Mean Vertical Scores by Group
[Line graph of mean vertical scaled scores (200-400) from the Grade 5 pre-test to the Grade 5 post-test for the Pilot and Comparison groups.]

Figure 11: Community Studies Mean Vertical Scores by Group
[Line graph of mean vertical scaled scores (200-400) from the Grade 5 pre-test to the Grade 5 post-test for the Pilot and Comparison groups.]
3.7 Comparison across Regions

While not the focus of the evaluation, the next two sections provide useful information on student performance. Tables 10 and 11 contain a brief analysis of the scores by region, providing information on the scores on a disaggregated basis. As with the overall analyses, the comparisons across the three regions were made in raw scores and vertical scaled scores. Lusaka Region consistently had the highest mean scores (both raw scores and vertical scaled scores) in all subjects on the Grade 5 pre-tests, followed by Western and Southern. The same pattern of results was also observed for the Grade 5 post-tests.

Table 10: Subject Area Mean Raw Scores by Region

                                             Grade 5 Pre-test          Grade 5 Post-test
Subject Area                  Region        N      Mean   Std. Dev.   N      Mean   Std. Dev.
English                       Southern     1010    11.0     6.2      1157    10.4     6.6
                              Western       994    11.7     5.9      1103    11.9     6.7
                              Lusaka       1794    13.1     6.9      1765    12.4     7.5
                              Total        3798    12.2     6.5      4025    11.7     7.1
Social and Developmental      Southern     1014     9.4     4.8      1214    11.7     6.0
Studies                       Western      1112     9.9     4.9      1125    13.2     6.1
                              Lusaka       1836    10.7     5.8      1765    14.1     7.0
                              Total        3962    10.1     5.3      4104    13.2     6.6
Mathematics                   Southern     1002    11.5     5.4      1226    11.1     5.2
                              Western      1086    12.2     5.2      1120    12.7     5.3
                              Lusaka       1795    12.9     5.2      1781    13.0     6.3
                              Total        3883    12.3     5.3      4127    12.4     5.8
Integrated Science            Southern     1025     9.2     4.4      1212     9.6     5.4
                              Western      1151     9.4     4.6      1154    11.7     6.4
                              Lusaka       1863    10.6     5.3      1769    11.8     6.7
                              Total        4039     9.9     4.9      4135    11.1     6.3
Creative and Technology       Southern     1016     9.6     4.8      1205     9.9     5.6
Studies                       Western      1140    10.2     5.0      1146    11.3     6.0
                              Lusaka       1876    11.2     5.7      1790    11.9     6.9
                              Total        4032    10.5     5.3      4141    11.2     6.4
Community Studies             Southern     1015     9.0     3.5      1191    10.5     5.3
                              Western      1146     9.4     4.3      1122    11.5     6.0
                              Lusaka       1876     9.8     4.0      1784    12.7     6.8
                              Total        4037     9.5     4.0      4097    11.7     6.2
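For reference, the regional summaries in Table 10 above and Table 11 below are straightforward group aggregations of the student-level scores. As an illustration only, a minimal sketch with hypothetical data and column names:

    import pandas as pd

    # Hypothetical student-level records: region and raw score on one subject test.
    students = pd.DataFrame({
        "region": ["Southern", "Western", "Lusaka", "Lusaka"],
        "raw_score": [11, 12, 14, 13],
    })

    # N, mean, and standard deviation by region, as reported in Tables 10 and 11.
    summary = students.groupby("region")["raw_score"].agg(["count", "mean", "std"])
    print(summary)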
Table 11: Subject Area Mean Vertical Scaled Scores by Region

                                             Grade 5 Pre-test           Grade 5 Post-test
Subject Area                  Region        N      Mean   Std. Dev.    N      Mean   Std. Dev.
English                       Southern     1010   224.1     80.3      1157   317.3     82.8
                              Western       994   232.3     72.9      1103   335.0     81.0
                              Lusaka       1794   250.7     89.3      1765   343.0     94.1
                              Total        3798   238.8     83.7      4025   333.4     88.1
Social and Developmental      Southern     1014   218.5     77.4      1214   321.7     76.7
Studies                       Western      1112   226.4     79.1      1125   341.1     78.1
                              Lusaka       1836   239.6     93.6      1765   354.7     89.5
                              Total        3962   230.5     86.2      4104   341.2     84.0
Mathematics                   Southern     1002   209.2     91.0      1226   346.6     66.1
                              Western      1086   219.9     86.2      1120   366.6     65.5
                              Lusaka       1795   231.3     89.0      1781   369.5     79.3
                              Total        3883   222.4     89.2      4127   361.9     72.6
Integrated Science            Southern     1025   215.7     72.1      1212   328.9     63.5
                              Western      1151   218.1     76.1      1154   353.0     74.2
                              Lusaka       1863   237.5     85.5      1769   352.4     78.0
                              Total        4039   226.5     80.2      4135   345.7     73.7
Creative and Technology       Southern     1016   209.8     77.9      1191   327.6     70.7
Studies                       Western      1140   218.9     79.7      1122   340.7     79.5
                              Lusaka       1876   234.9     90.8      1784   357.7     90.3
                              Total        4032   224.1     85.3      4097   344.3     83.0
Community Studies             Southern     1015   204.2     74.8      1205   323.4     64.3
                              Western      1146   213.1     88.6      1146   338.7     66.8
                              Lusaka       1876   219.8     84.6      1790   344.9     79.1
                              Total        4037   214.0     83.7      4141   336.9     72.3

3.8 Performance Categories

Depending on test difficulty and score distributions, performance categories were established for each of the tests using a procedure called standard setting. An Angoff (1971) 11 standard setting method was implemented to set the cut scores between Unsatisfactory and Satisfactory and between Satisfactory and Advanced, for both the pre-tests and the post-tests. The resultant cut scores are presented in Tables 12 and 13. In English, for example, students who got a score of 1-12 would be classified as Unsatisfactory, students who got a score of 13-21 would be classified as Satisfactory, and students who earned a score of 22-30 would be classified as Advanced on the pre-test. For Mathematics, the corresponding pre-test ranges are 1-13 Unsatisfactory, 14-19 Satisfactory, and 20-30 Advanced. The post-test ranges for each subject area differ from those on the pre-tests because the pre-tests and post-tests covered different content and had different levels of difficulty.

11 Angoff, W. H. (1971). Scales, Norms, and Equivalent Scores. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed., pp. 508-560). Washington, DC: American Council on Education.
Table 12: Performance Categories for Pre-tests by Subject

                                               Grade 5 Pre-test
Subject Area                        1. Unsatisfactory   2. Satisfactory   3. Advanced
                                         (Fail)             (Pass)           (Pass)
English                                   1-12              13-21            22-30
Social and Developmental Studies          1-10              11-17            18-30
Mathematics                               1-13              14-19            20-30
Integrated Science                        1-10              11-17            18-30
Creative and Technology Studies           1-11              12-18            19-30
Community Studies                         1-10              11-15            16-30

Table 13: Performance Categories for Post-tests by Subject

                                               Grade 5 Post-test
Subject Area                        1. Unsatisfactory   2. Satisfactory   3. Advanced
                                         (Fail)             (Pass)           (Pass)
English                                   1-12              13-21            22-30
Social and Developmental Studies          1-13              14-21            22-30
Mathematics                               1-10              11-19            20-30
Integrated Science                        1-10              11-20            21-30
Creative and Technology Studies           1-11              12-21            22-30
Community Studies                         1-11              12-19            20-30
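As an illustration only, a minimal sketch of the two steps described above. Under the standard Angoff (1971) procedure, each judge estimates, for every item, the probability that a minimally competent student would answer it correctly; the sum of the mean item estimates gives the recommended cut score. Classification then applies cut scores such as those in Table 12. The judge ratings below are hypothetical, and the report does not document the panel's actual ratings.

    # Hypothetical Angoff ratings: rows = judges, columns = items.
    judge_ratings = [
        [0.45, 0.60, 0.35, 0.50],   # judge 1: P(minimally competent student correct)
        [0.40, 0.55, 0.30, 0.45],   # judge 2
        [0.50, 0.65, 0.40, 0.55],   # judge 3
    ]

    n_judges = len(judge_ratings)
    n_items = len(judge_ratings[0])
    # Mean rating per item; the sum over items is the recommended cut score.
    item_means = [sum(judge[i] for judge in judge_ratings) / n_judges
                  for i in range(n_items)]
    cut_score = round(sum(item_means))

    # Classification against the English pre-test cut scores from Table 12.
    def classify(raw_score: int) -> str:
        if raw_score <= 12:
            return "Unsatisfactory (Fail)"
        if raw_score <= 21:
            return "Satisfactory (Pass)"
        return "Advanced (Pass)"

    print(cut_score, classify(15))  # a raw score of 15 -> Satisfactory (Pass)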
Tables 14 and 15 provide the percentages of students classified in the three performance categories by subject. On the pre-test, the percentages in each category by group were similar for most of the subjects. For instance, in Integrated Science, similar percentages of students were at Satisfactory or above for the pilot (34%) and comparison (33%) groups. However, on the post-test, there were some differences between the groups, mostly in favour of the pilot group. In Integrated Science, 53% of students in the pilot group were Satisfactory or above vs. 43% of students in the comparison group. The percentages favoured the pilot group on the post-test, with the exception of Mathematics, where the rounded percentage passing was the same in the pilot (65%) and comparison (65%) groups.

Table 14: Percentages of Students in Performance Categories for Pre-tests

                                                    Grade 5 Pre-test
Subject Area                  Group        1. Unsatisfactory  2. Satisfactory  3. Advanced
                                                (Fail)            (Pass)          (Pass)
English                       Pilot             63.0              27.2             9.8
                              Comparison        59.7              28.2            12.1
Social and Developmental      Pilot             62.8              26.9            10.3
Studies                       Comparison        64.4              24.0            11.6
Mathematics                   Pilot             64.3              26.2             9.5
                              Comparison        60.1              29.4            10.5
Integrated Science            Pilot             65.9              25.6             8.5
                              Comparison        67.3              22.9             9.8
Creative and Technology       Pilot             67.5              22.9             9.6
Studies                       Comparison        68.4              20.1            11.5
Community Studies             Pilot             66.8              25.4             7.8
                              Comparison        66.8              24.8             8.4

Table 15: Percentages of Students in Performance Categories for Post-tests

                                                    Grade 5 Post-test
Subject Area                  Group        1. Unsatisfactory  2. Satisfactory  3. Advanced
                                                (Fail)            (Pass)          (Pass)
English                       Pilot             60.0              26.5            13.5
                              Comparison        64.0              24.0            11.9
Social and Developmental      Pilot             51.4              33.4            15.3
Studies                       Comparison        59.3              30.6            10.2
Mathematics                   Pilot             35.2              53.9            10.9
                              Comparison        34.8              56.3             8.9
Integrated Science            Pilot             46.7              40.2            13.1
                              Comparison        57.3              36.0             6.7
Creative and Technology       Pilot             54.5              35.1            10.4
Studies                       Comparison        62.3              31.0             6.7
Community Studies             Pilot             50.4              33.9            15.6
                              Comparison        54.4              36.2             9.5
Chapter Four: Summary and Conclusions

The main objective of the evaluation was to determine whether the CA programme is having positive effects on student learning outcomes in the first year of implementation. This was accomplished by measuring and comparing the levels of learning achievement of pupils in pilot (intervention) and comparison (control) schools. A baseline (pre-test) assessment occurred before implementation of the proposed interventions at the beginning of Grade 5 in randomly selected pilot schools. This created a basis upon which the impact of CA was measured at the end of the Grade 5 pilot year.

A sample of 48 schools was selected from Lusaka, Southern and Western Provinces using a matched pairs design and random assignment, resulting in 24 pilot schools and 24 comparison schools. Student achievement for the Grade 5 baseline and post-test administrations was measured using multiple choice tests in six subject areas with 30 items each (30 points per test). The Grade 5 baseline tests were based on the Grade 4 curriculum, while the Grade 5 post-tests were based on the Grade 5 curriculum.

Overall, the psychometric characteristics of the tests were very satisfactory on both the pre-tests and post-tests. Items were within acceptable difficulty (p-value) ranges and discrimination (point-biserial correlation) levels. Overall, the tests were found to be reliable, using Cronbach's Alpha as an estimate of internal consistency reliability.

Performance of the schools on the baseline and post-tests was compared using mean raw scores and mean vertical scaled scores. The vertical scaled score comparison was found more relevant, valid, and beneficial, since the school mean scores on both the baseline and post-tests were evaluated on the same measurement scale (i.e., the vertical scale). In addition, statisticians generally prefer using scaled scores for longitudinal comparisons since the scale is equal interval, thus making comparisons more accurate.

Overall, the pupils' scores on the baseline pre-test were very similar in the pilot and comparison schools. The comparison schools scored slightly higher on the English and Mathematics tests, but the score differences for the two groups on the other four tests were minimal. On the post-test, which was administered after one year of the CA programme, the scores of the pilot schools on all six tests were significantly higher than those of the comparison schools. This provides strong initial evidence that the CA programme had a significantly positive effect on pupil learning outcomes.

When the performance of the schools on the baseline and post-tests was compared by region, Lusaka Region consistently had the highest mean scores in all subjects on the Grade 5 pre-tests and post-tests, followed by Western and Southern. The number of schools by region was too small to make statistically valid region-by-region comparisons of pre-test to post-test scores for the pilot and comparison groups.

Students were also classified into three performance level categories (Unsatisfactory, Satisfactory, and Advanced) in each subject area based on their performance on the baseline and post-tests. On the pre-tests, the percentages in each category by group were similar for most of the subjects.
However, on the post-test, there were differences in favour of the pilot group in virtually all subjects. For instance, in Integrated Science, 53% of students in the pilot group were Satisfactory and above vs. 43% of students in the comparison group. This provided strong evidence that a greater percentage of students in the pilot group achieved a passing score on the post-test than those in the comparison group.

The next round of post-tests in the Phase 1 schools will be administered when the same cohort of pupils completes Grade 6. This will be followed by a final test administration (a third post-test) when the cohort of pupils completes Grade 7. At that point, with four time points (a baseline and three post-tests), more substantial conclusions can be drawn on the effectiveness of the CA programme. Note also that the evaluation process is being repeated in the Phase 2 and Phase 3 schools, which will provide a complete national quantitative evaluation of the programme at the end of Year 5 of implementation (2010). Based on guidance from the CA Steering Committee, results from the evaluation will be used at a selected point in the implementation period as a criterion for scaling up the CA programme to other primary schools in Zambia.
Appendix 1: Item Statistics by Subject
Table A1: English Item Statistics

            Pre-test                   Post-test
Seq.   P-value   Pt-Biserial      P-value   Pt-Biserial
  1      .65        .47             .65        .55
  2      .63        .53             .51        .58
  3      .63        .52             .48        .44
  4      .48        .56             .41        .54
  5      .52        .55             .40        .48
  6      .40        .53             .29        .36
  7      .56        .58             .50        .45
  8      .54        .55             .46        .46
  9      .46        .56             .52        .61
 10      .46        .41             .35        .61
 11      .61        .52             .26        .46
 12      .40        .52             .21        .35
 13      .38        .47             .33        .58
 14      .39        .50             .36        .56
 15      .27        .46             .35        .55
 16      .29        .42             .33        .40
 17      .28        .40             .22        .24
 18      .47        .55             .36        .59
 19      .33        .40             .42        .54
 20      .36        .46             .40        .51
 21      .24        .46             .34        .53
 22      .34        .30             .38        .47
 23      .33        .36             .21        .35
 24      .37        .47             .38        .56
 25      .39        .46             .41        .49
 26      .35        .42             .35        .46
 27      .31        .38             .34        .50
 28      .25        .28             .30        .40
 29      .27        .32             .38        .52
 30      .20        .29             .27        .40
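The statistics in Tables A1 through A6 are standard classical test theory indices: the p-value is the proportion of students answering an item correctly, and the point-biserial is the correlation between the item score and the total score. As an illustration only, a minimal sketch of how such indices, together with the Cronbach's Alpha reliability cited in Chapter Four, can be computed from a 0/1 scored response matrix (the data here are simulated, and the report's actual computation may have differed, e.g., by using a corrected item-total correlation):

    import numpy as np

    # Hypothetical 0/1 scored responses: rows = students, columns = items.
    rng = np.random.default_rng(0)
    responses = (rng.random((500, 30)) < 0.4).astype(float)

    total = responses.sum(axis=1)

    # Item difficulty (p-value): proportion correct per item.
    p_values = responses.mean(axis=0)

    # Item discrimination: point-biserial correlation of each item with the total.
    # (Some analyses correlate with the total minus the item itself.)
    pt_biserials = np.array([
        np.corrcoef(responses[:, i], total)[0, 1]
        for i in range(responses.shape[1])
    ])

    # Cronbach's Alpha: internal consistency reliability.
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total.var(ddof=1))

    print(np.round(p_values, 2), np.round(pt_biserials, 2), round(alpha, 2))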
Table A2: Social and Developmental Studies Item Statistics

            Pre-test                   Post-test
Seq.   P-value   Pt-Biserial      P-value   Pt-Biserial
  1      .49        .52             .66        .57
  2      .47        .39             .53        .60
  3      .39        .49             .66        .60
  4      .37        .32             .58        .50
  5      .35        .47             .51        .57
  6      .36        .35             .48        .61
  7      .43        .51             .52        .61
  8      .41        .41             .42        .31
  9      .36        .21             .44        .56
 10      .37        .43             .49        .50
 11      .38        .49             .34        .42
 12      .37        .48             .39        .43
 13      .35        .42             .51        .49
 14      .33        .34             .43        .54
 15      .30        .46             .36        .58
 16      .33        .41             .36        .44
 17      .28        .30             .39        .40
 18      .31        .26             .42        .42
 19      .30        .46             .37        .55
 20      .40        .45             .34        .51
 21      .25        .44             .32        .38
 22      .26        .43             .35        .36
 23      .25        .41             .32        .44
 24      .26        .29             .38        .26
 25      .36        .31             .38        .25
 26      .26        .32             .34        .39
 27      .26        .19             .36        .31
 28      .27        .37             .32        .24
 29      .29        .19             .27        .22
 30      .30        .25             .30        .39
Table A3: Mathematics Item Statistics

            Pre-test                   Post-test
Seq.   P-value   Pt-Biserial      P-value   Pt-Biserial
  1      .81        .43             .70        .56
  2      .59        .51             .65        .55
  3      .46        .34             .71        .57
  4      .49        .48             .56        .55
  5      .54        .55             .60        .54
  6      .57        .51             .64        .52
  7      .44        .42             .46        .48
  8      .46        .25             .50        .50
  9      .43        .29             .47        .32
 10      .50        .51             .55        .34
 11      .43        .51             .38        .44
 12      .34        .26             .39        .44
 13      .39        .42             .39        .45
 14      .46        .42             .40        .45
 15      .48        .45             .42        .28
 16      .30        .25             .34        .32
 17      .36        .30             .34        .46
 18      .32        .23             .38        .48
 19      .33        .36             .29        .34
 20      .27        .28             .30        .35
 21      .52        .40             .25        .37
 22      .57        .48             .27        .40
 23      .32        .33             .23        .34
 24      .40        .46             .24        .33
 25      .31        .43             .18        .23
 26      .27        .32             .27        .33
 27      .30        .26             .24        .28
 28      .21        .17             .36        .48
 29      .19        .15             .16        .18
 30      .25        .32             .23        .30
Table A4: Integrated Science Item Statistics

            Pre-test                   Post-test
Seq.   P-value   Pt-Biserial      P-value   Pt-Biserial
  1      .49        .42             .53        .56
  2      .33        .17             .53        .56
  3      .45        .41             .39        .57
  4      .41        .44             .51        .49
  5      .31        .20             .44        .52
  6      .40        .39             .57        .48
  7      .28        .43             .45        .49
  8      .31        .26             .47        .53
  9      .34        .45             .44        .48
 10      .29        .26             .33        .51
 11      .43        .29             .38        .34
 12      .31        .40             .42        .49
 13      .52        .28             .31        .44
 14      .37        .45             .36        .51
 15      .36        .42             .36        .40
 16      .41        .43             .36        .49
 17      .34        .29             .38        .55
 18      .30        .50             .21        .21
 19      .37        .50             .28        .42
 20      .26        .25             .38        .48
 21      .29        .37             .29        .47
 22      .26        .38             .34        .49
 23      .28        .34             .25        .29
 24      .24        .39             .22        .16
 25      .20        .35             .31        .38
 26      .25        .25             .25        .29
 27      .27        .33             .25        .36
 28      .29        .21             .27        .40
 29      .23        .45             .23        .27
 30      .30        .27             .21        .33
Table A5: Creative & Technology Studies Item Statistics

            Pre-test                   Post-test
Seq.   P-value   Pt-Biserial      P-value   Pt-Biserial
  1      .25        .55             .29        .34
  2      .41        .50             .41        .50
  3      .33        .34             .43        .55
  4      .56        .45             .49        .64
  5      .38        .16             .46        .54
  6      .40        .34             .40        .55
  7      .35        .46             .47        .45
  8      .36        .34             .48        .52
  9      .39        .54             .43        .37
 10      .47        .48             .44        .53
 11      .43        .48             .29        .46
 12      .41        .31             .40        .52
 13      .30        .40             .36        .55
 14      .28        .41             .39        .56
 15      .26        .39             .32        .46
 16      .37        .52             .28        .37
 17      .29        .27             .36        .37
 18      .36        .35             .40        .52
 19      .41        .40             .33        .51
 20      .30        .41             .22        .25
 21      .29        .54             .36        .35
 22      .25        .25             .36        .28
 23      .50        .40             .29        .25
 24      .31        .34             .30        .36
 25      .28        .387            .27        .42
 26      .22        .14             .28        .44
 27      .47        .37             .27        .32
 28      .34        .32             .33        .24
 29      .39        .35             .23        .52
 30      .17        .08             .32        .44
Table A6: Community Studies Item Statistics

            Pre-test                   Post-test
Seq.   P-value   Pt-Biserial      P-value   Pt-Biserial
  1      .62        .41             .53        .52
  2      .52        .35             .44        .60
  3      .46        .42             .53        .61
  4      .43        .48             .52        .57
  5      .41        .33             .44        .49
  6      .36        .32             .44        .40
  7      .31        .21             .47        .51
  8      .36        .33             .42        .57
  9      .27        .20             .38        .56
 10      .37        .21             .44        .50
 11      .30        .35             .30        .41
 12      .40        .38             .42        .52
 13      .30        .19             .39        .51
 14      .30        .45             .36        .43
 15      .20        .18             .44        .41
 16      .30        .36             .33        .49
 17      .30        .25             .43        .50
 18      .28        .38             .36        .42
 19      .26        .21             .37        .29
 20      .25        .19             .32        .31
 21      .31        .34             .34        .44
 22      .26        .21             .32        .39
 23      .25        .26             .32        .29
 24      .25        .24             .26        .31
 25      .30        .31             .29        .37
 26      .22        .28             .30        .28
 27      .26        .28             .28        .41
 28      .23        .21             .27        .24
 29      .19        .16             .24        .21
 30      .21        .16             .24        .23
Appendix 2: Scores and Frequencies – Grade 5 Pre-Tests
Table A7: English Scores and Frequencies

 Raw             Scale         Pilot Group               Comparison Group
Score   Theta    Score    Freq.     %     Cum. %     Freq.     %     Cum. %
  1     -3.59     100       24     1.3      1.3        30     1.5      1.5
  2     -2.84     100       28     1.6      2.9        31     1.5      3.0
  3     -2.38     102       43     2.4      5.3        61     3.0      6.1
  4     -2.04     126       54     3.0      8.3        45     2.2      8.3
  5     -1.76     146       66     3.7     12.0        76     3.8     12.1
  6     -1.52     163      112     6.3     18.3       112     5.6     17.6
  7     -1.31     178      138     7.7     26.1       152     7.6     25.2
  8     -1.11     192      145     8.1     34.2       137     6.8     32.0
  9     -0.93     205      151     8.5     42.6       146     7.3     39.2
 10     -0.76     217      140     7.8     50.5       142     7.1     46.3
 11     -0.60     228      118     6.6     57.1       158     7.8     54.1
 12     -0.44     239      105     5.9     63.0       111     5.5     59.7
 13     -0.29     250       68     3.8     66.8       109     5.4     65.1
 14     -0.14     261       83     4.6     71.4        85     4.2     69.3
 15      0.01     271       67     3.8     75.2        68     3.4     72.7
 16      0.16     282       55     3.1     78.3        68     3.4     76.1
 17      0.30     292       50     2.8     81.1        41     2.0     78.1
 18      0.46     303       41     2.3     83.4        45     2.2     80.3
 19      0.61     314       43     2.4     85.8        52     2.6     82.9
 20      0.77     325       44     2.5     88.2        50     2.5     85.4
 21      0.94     337       35     2.0     90.2        50     2.5     87.9
 22      1.12     350       24     1.3     91.5        27     1.3     89.2
 23      1.31     363       25     1.4     92.9        36     1.8     91.0
 24      1.52     378       19     1.1     94.0        37     1.8     92.8
 25      1.75     395       19     1.1     95.1        46     2.3     95.1
 26      2.03     415       26     1.5     96.5        28     1.4     96.5
 27      2.37     439       14      .8     97.3        18      .9     97.4
 28      2.82     471       19     1.1     98.4        28     1.4     98.8
 29      3.56     500       23     1.3     99.7        20     1.0     99.8
 30      4.80     500        6      .3    100.0         4      .2    100.0
Total                     1785   100.0               2013   100.0
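The conversion tables in this appendix map each raw score to an IRT ability estimate (theta) and then to a scale score, with scale scores apparently truncated to the 100-500 range. As an illustration only, a linear transformation of the form scale = A x theta + B is consistent with Table A7; the constants below are rough fits inferred from the table for this sketch, and the report does not document the actual transformation.

    # Minimal sketch: converting an IRT ability estimate (theta) to a scale score.
    A = 70.7   # assumed slope (scale points per theta unit), fitted to Table A7
    B = 270.5  # assumed intercept, fitted to Table A7

    def theta_to_scale(theta: float) -> int:
        """Linearly transform theta and clip to the reported 100-500 range."""
        scale = A * theta + B
        return int(round(min(max(scale, 100), 500)))

    # Spot checks against Table A7: theta -0.29 -> ~250, theta 1.12 -> ~350,
    # and theta 4.80 clips to the 500 ceiling.
    print(theta_to_scale(-0.29), theta_to_scale(1.12), theta_to_scale(4.80))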
Table A8: Social and Developmental Studies Scores and Frequencies

 Raw             Scale         Pilot Group               Comparison Group
Score   Theta    Score    Freq.     %     Cum. %     Freq.     %     Cum. %
  1     -3.42     100       28     1.5      1.5        28     1.4      1.4
  2     -2.69     100       30     1.6      3.0        35     1.7      3.1
  3     -2.24     100       49     2.6      5.6        46     2.2      5.3
  4     -1.91     112       78     4.1      9.7        66     3.2      8.5
  5     -1.65     139      129     6.8     16.5       138     6.7     15.2
  6     -1.42     162      164     8.6     25.1       188     9.1     24.4
  7     -1.22     183      179     9.4     34.5       209    10.2     34.5
  8     -1.04     201      210    11.0     45.5       253    12.3     46.9
  9     -0.87     218      175     9.2     54.6       191     9.3     56.2
 10     -0.71     235      155     8.1     62.8       169     8.2     64.4
 11     -0.56     250      143     7.5     70.3       118     5.7     70.1
 12     -0.42     264      111     5.8     76.1        97     4.7     74.8
 13     -0.27     280       79     4.1     80.2        78     3.8     78.6
 14     -0.14     293       60     3.1     83.4        65     3.2     81.8
 15      0.00     307       39     2.0     85.4        46     2.2     84.0
 16      0.14     321       36     1.9     87.3        50     2.4     86.5
 17      0.28     336       45     2.4     89.7        39     1.9     88.4
 18      0.42     350       32     1.7     91.3        36     1.8     90.1
 19      0.56     364       28     1.5     92.8        30     1.5     91.6
 20      0.71     380       29     1.5     94.3        32     1.6     93.1
 21      0.87     396       27     1.4     95.8        24     1.2     94.3
 22      1.04     413       14      .7     96.5        28     1.4     95.7
 23      1.22     432       22     1.2     97.6        17      .8     96.5
 24      1.42     452       16      .8     98.5        19      .9     97.4
 25      1.65     476        6      .3     98.8        17      .8     98.2
 26      1.91     500       12      .6     99.4        14      .7     98.9
 27      2.24     500        7      .4     99.8        13      .6     99.6
 28      2.69     500        3      .2     99.9         7      .3     99.9
 29      3.42     500        1      .1    100.0         1      .0    100.0
 30      4.65     500        0      .0    100.0         1      .0    100.0
Total                     1907   100.0               2055   100.0