Ofqual conducts research on international comparisons of senior secondary assessments taken by students prior to university entrance. The research involves analyzing assessments in chemistry, English, history and mathematics from different international systems. Emerging findings show variations between assessments in topics covered, question types, and skills evaluated. The goal is to understand effective assessment models and ensure England's A Levels continue to properly prepare students for university.
International comparisons in senior secondary assessments
1. International comparisons in senior secondary assessments. Dennis Opposs. Presented at the 37th IAEA Annual Conference, Manila, 27 October 2011
15. CRAS analysis. Columns: Category; What is it?; Scoring (1, 2–3, 4). Descriptors are given for scores 1 and 4.
Complexity (C). What is it? The complexity of each component operation or idea and the links between them. Score 1: Simple operations (i.e. ideas/steps); no comprehension, except that required for natural language; no links between operations. Score 4: Synthesis or evaluation of operations; requires technical comprehension; makes links between operations.
Resources (R). What is it? The use of data and information. Score 1: All and only the data/information needed is given. Score 4: Learner must generate all the necessary data/information.
Abstractness (A). What is it? The extent to which the learner deals with ideas rather than concrete objects or phenomena. Score 1: Deals with concrete objects; avoids need for technical terms. Score 4: Highly abstract; requires use of technical terms.
Strategy (S). What is it? The extent to which the learner devises (or selects) and maintains a strategy for tackling and answering the question. Score 1: Strategy for answer is given; no need to monitor strategy; no selection of information required; no organisation required. Score 4: Learner needs to devise their own strategy; learner must monitor the application of their strategy; learner must select content from a large, complex pool of information; learner must organise how to communicate response.
16. CRAS analysis. Category: Resources. What is it? The use of data and information. Score 1: All and only the data/information needed is given. Score 4: Learner must generate all the necessary data/information.
17. CRAS analysis. Category: Abstractness. What is it? The extent to which the learner deals with ideas rather than concrete objects or phenomena. Score 1: Deals with concrete objects; avoids need for technical terms. Score 4: Highly abstract; requires use of technical terms.
Editor's notes
Ofqual is the regulator of qualifications and assessments in England. We regulate – we do not provide assessments ourselves. Our work is focussed on maintaining standards.
At the last two IAEA conferences, I have spoken about Ofqual’s two-year research study looking at the consistency of assessments and factors which may affect the reliability of results. We now have on our website more than 20 reports from the programme – from highly statistical analyses to work on public perceptions. So if that is an area of work that interests you, do look on our website. But I am here in Manila to talk about our international work.
How well do qualifications and assessments used in England stand up to their equivalents used in other countries? We thought that was a valuable question to tackle. Last year Ofqual started a six-year research programme. This will investigate the demand of assessments commonly taken by learners internationally in comparison to those taken by learners in England. It is not about producing an international league table. It is about gaining rich information and seeing what we can learn from other systems that we can use to strengthen our qualifications. We have found interesting features in other systems so what we have done is proving worthwhile.
The first of these studies has focused on assessments available to senior secondary learners intending to progress to university. A levels are the main qualifications used to gain entry to university in England. They are available in over 45 subjects and around 860,000 entries were made in summer 2011 by about 250,000 students. A levels are coming out comparatively well in the initial findings we are currently working through. Reflecting on it, though, we realise that A levels are designed for students who will mostly go to universities in England so in that sense they have an advantage over the other systems we studied. At first sight our findings for A levels don’t correspond to findings for England in international surveys like PISA, where our performance is typically in the middle of the range. However, we do not believe that the two sets of findings contradict each other. That is because of what each assessment is measuring and who takes them, and what conclusions can be drawn.
This slide is about what the two types of studies are measuring and who takes them. The first row here is about the issue of these assessments being different in terms of motivation for learners and what learners are used to in terms of style of assessment. The second row is about the issue that the international studies can only look at a part of the curriculum. They use that part that is common between all the systems that agree to be part of the survey. The third row: PISA results are for the whole cohort of 15 year olds across the ability range. That contrasts with our focus. We were effectively looking at the next stage of education but only for the top 10% of 18 year olds. A levels seem to be preparing these top students comparatively well for university. We must be clear that our work so far tells us nothing about students with lower A level grades. It tells us nothing about how the rest of our cohort at a younger age compares with the rest of the world.
So one reason we believe that the two sets of findings do not contradict each other is because of what each assessment is measuring and who takes them. A second reason is because you cannot use the findings of international surveys in isolation as a definitive answer to the question of whether one educational system is better than another. As OECD says ‘on their own, cross-sectional international comparisons such as PISA cannot identify cause-and-effect relationships between certain factors and educational outcomes, especially in relation to the classroom and the processes of teaching and learning that take place there’.
The model for Ofqual’s study takes into account the limitations of previous reviews. We included assessments offered across a range of educational systems. We considered a range of subjects. We acknowledge that the transferability of findings will be limited to just one dimension of the education systems - the demand of assessment. We invited 22 countries, states and provinces - education systems. We also invited Cambridge International Examinations, the International Baccalaureate Organization and the three A level organisations in England with the highest number of learners, to cooperate with the study. They were asked to contribute materials in four subjects representing a range of academic disciplines: chemistry, English (where this is the national language or the main language of tuition), history and mathematics.
(Apologies about India and Pakistan) The selection of educational systems invited to cooperate was based on a number of criteria:
We wanted to include European Union, British Commonwealth and other education systems representing a breadth of education traditions and economic structures.
We wanted to include educational systems where the primary language of tuition is not English.
We wanted to include educational systems with universities ranked highly in international comparisons.
We wanted to take into account the findings of the 2009 PISA survey.
We wanted to take into account educational systems which have the highest rates of learners coming to the UK to study.
(click) (click) Here we are in Manila. Of those that were invited the following educational systems accepted (sequence matches the sequence the pins appear on the map):
New South Wales, Australia
Alberta, Canada
China (The National Higher Education Entrance Examination – The Gaokao)
Denmark
Finland
France
Hong Kong (both the outgoing HKALE and the incoming HKDSE)
Netherlands (havo and vwo)
New Zealand
Norway
Republic of Ireland
Republic of Korea
New York State, USA
A handful of others wouldn’t play ball.
Also in the study are:
The International Baccalaureate (concentrating on higher level subjects)
The ACT, USA (standardised test for college admissions)
The Cambridge Pre-U
The Cambridge International A levels
And of course UK A levels (OCR – chemistry, AQA – English literature, and Edexcel – history and mathematics)
What the assessments have in common is that each is the main assessment used by learners to gain entry to university in its respective system. But they differed in purpose. Some were assessments (either provided by the state or by another organisation) with the specific purpose of selecting students for university (for example a matriculation examination). Others were assessments of educational achievement that in addition are used to identify students for university (of which A levels are one).
As the demand of any assessment is linked to its purpose and relation to other assessments, we had to redefine each assessment regardless of its original purpose and relationships, in a common way to enable meaningful comparisons to be made. Therefore the demand of each assessment was judged in relation to its appropriateness in preparing learners for entry to honours level degree study in England. In order to undertake this analysis, a framework with two dimensions was established that included assessment model level factors and assessment instrument level factors. This framework is grounded in recent thinking and experience from projects in the area of comparability. Four panels of about 10 members - one panel for each subject - were convened, each led by an independent subject expert. The panel memberships were designed to represent a broad spectrum of stakeholders and to bring together a range of different views.
One dimension of the framework is the assessment model level analysis. This part of the analysis required reviewers to consider what learners needed to know and be able to do to succeed in the assessment; the nature of the subject matter to be covered (including topics covered, and breadth and depth of subject coverage), and accessibility to the range of learners taking the assessment. So this analysis comprised:
a factual check of the content covered, the assessment objectives, the types of assessment, how the content was assessed and how the answers were rewarded, and how each assessment model compared to the A level;
a comparison made between these aspects for each of the sets of assessment materials reviewed and A levels.
The other dimension of the framework is the assessment instrument level analysis. We used a tool of assessment analysis at question level that was developed for Ofqual’s predecessor organisation, QCA. This CRAS analysis is used as a basis for organising the thoughts of reviewers with regard to specific features at question, task or question paper level.
CRAS is an acronym for complexity, resources, abstractness and strategy. In undertaking a CRAS analysis, panel members were asked to rate the assessment instruments and/or individual questions against a set of factors that are known to affect the demand of questions, judged against the standard that candidates would be expected to meet prior to entrance to university. The ratings use a scale of 1 (below the required level) to 4 (exceeding the required level) and cover the complexity of each idea and the links between them. This slide is difficult to read so here are two of the rows reformatted.
“Resources” is about the extent to which resources are provided. So if all the data or information needed is given, the question is scored 1. If the student must generate all the necessary data or information themselves, a score of 4 is given, the maximum.
Similarly here is “abstractness” - the extent to which the learner deals with abstract ideas rather than concrete objects or phenomena. (pause) Once the panel members had carried out their individual analyses of the qualification specifications and assessment instruments that they were assigned, the subject panel leaders collated these analyses to produce subject-level reports on the collective findings of the panels. Once each report is complete, a draft of the final report will be shared with participating educational systems and organisations whose qualifications were included. Ofqual will publish a final report in early 2012.
Emerging findings. Methodology. Feedback from those who participated indicates that the process for reviewing the assessment materials was an effective means of distilling the key features of each assessment and how they compared to A levels. The availability of parts of assessment (for example, where components were internally assessed) and the lack of learner work limited the depth of analysis that could be undertaken. We would like to extend future research to include these aspects. While A level specifications were seen as strong in terms of depth, breadth and analytical thinking, there were a number of features within other systems that were of particular interest.
(click) The inclusion of independent research and extended essays where they formed part of a baccalaureate or diploma style assessment system could bring additional depth to subject expertise.
(click) The high proportion of external assessment in A levels was unusual. While the standard of internally set assessment could not be reviewed as part of this study, it had the potential to stretch learners, especially where oral examinations were part of the system.
(click) Multiple-choice questions were common in other education systems at senior secondary level. They are not common in A levels. Many of the multiple-choice tests found in other assessment systems were very demanding and could assess skills that are more difficult to test by other means.
(click) A levels had amongst the most straightforward and transparent mark schemes for assessment. In other systems marks were allocated in complex ways, marks were deducted for incorrect answers, or learners were not permitted to see what had been rewarded in previous assessments.
(click) Predictability of questions - where very similar questions appear in successive assessments - appeared to be an issue for some education systems. While this was not part of the initial research question, such predictability would affect the demand of otherwise challenging-looking questions.
What did we find out about each of the four subjects? Mathematics was particularly interesting. There was little agreement on what was considered core mathematics teaching at senior secondary level between education systems. Some assessments focused almost entirely on pure mathematics, while others were concerned predominantly with the application of mathematics. The former would equip you well for a science, engineering or mathematics university course, the latter for a social science course – but neither would have applicability for both. A level mathematics was unusual in covering both areas in the same course. (click) The volume of different mathematics assessments (often at a lower level) available to learners in many education systems was also a contrast to A level mathematics. (click) Additionally, the use of new technologies in assessment (for example, allowing the use of algebraic calculators) in other education systems such as New Zealand was viewed as positive, as it reflects the type of work a learner would be undertaking in a university setting. (click) The case of mechanics is a particularly interesting example, as it is covered in A level mathematics but was absent in nearly all other mathematics assessments. A desk research exercise of 10 educational systems showed that they covered mechanics as core content within their physics specification rather than in mathematics. This was in many cases a sizeable proportion of the curriculum.
Chemistry. Distinct differences were found in coverage of topics, the balance between content-recall and application, and how practical work was assessed and rewarded. A levels had broader, more balanced content, including organic, physical, inorganic and analytical chemistry in similar proportions. They also included a number of modern methods of analysis, a greater depth of treatment of rates of reaction (click) and more challenging topics such as entropy and free energy and their inter-relationship. Reviewers felt that A levels had the most breadth and depth of all systems, and also – and this will surprise some in England (click) – had the most mathematical content of all the specifications considered in the review.
English. There was no consensus amongst the educational systems on what is meant by requirements for 'English' at senior secondary level. A levels stand apart in having an almost exclusive focus on reading and interpreting traditional forms of text. (click) In other systems there were very different views on what could be considered a text (from photo, to film, to Chaucer) and reviewers considered that this broader interpretation may prepare learners better for studying in a university setting. (click) Oral examinations were relatively common, as were multiple-choice question papers. The latter were thought to offer something distinct from that in A levels, enabling assessment in the use of English.
History. There were very different views on what the purpose of historical study was at senior secondary education between systems. Was it to promote 'right thinking' and/or good citizenship? Was it the development of historical study skills and concepts? Was it the ability to rote learn facts and figures from history? There were huge differences in the amount of national history required, and what was considered 'history'. The A level stood out with regard to preparation for university in England as it had a good balance of historical content with the concepts and skills to analyse and interpret materials.
In drawing conclusions from our work we need to take into account that in other countries the breadth and depth of the curriculum and of individual subjects are different from A levels. Like-for-like comparisons are difficult. Other systems can have different priorities from us and that’s reflected in their design. But looking at what happens elsewhere provides us with a rich source of ideas that we will think hard about and debate before we produce another generation of A levels.
Further studies looking at 11 and 16 year olds are getting under way. Mathematics and English seem key to those but no decisions yet on which subjects we’ll include or which other systems we will use as comparisons. In the meantime, a full report on what I’ve been talking about today will be published early next year. Thank you.