Large-scale testing:
Uses and abuses
Richard P. Phelps
Universidad Finis Terrae, Santiago, Chile
January 7, 2014

1. 3 types of large-scale tests
2. Measuring test quality
3. A chronology of mistakes
4. Economists misunderstand testing
5. How SIMCE is affected
1. Three types of large-scale tests

Achievement
Aptitude
Non-cognitive
Achievement tests
Historically, were larger versions of classroom tests
~ 1900 - “scientific” achievement tests developed
(Germany & USA)
J.M. Rice - systematically analyzed test structures & effects
E.L. Thorndike - developed scoring scales

SOURCE: Phelps, Standardized Testing Primer, 2007
Achievement tests
Purpose: to measure how much you know and can recall
Developed using: content coverage analysis
How validated: retrospective or concurrent validity
(correlation with past measures, such as high school grades)
Requires a mastery of content prior to test.
Fairness assumes that all have the same opportunity to learn the content
Coachable – specific content is known in advance

SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
1890s – A. Binet & T. Simon (France)
- Pre-school children with mental disabilities
- achievement test not possible
- developed content-free test of mental abilities
(association, attention, memory, motor skills, reasoning)

1917 – Adapted by U.S. Army to select, assign soldiers in World War 1
1930s – Harvard University president J. Conant
- wanted new admission test to identify students from lower social
classes with the potential to succeed at Harvard
- developed the first Scholastic Aptitude Test (SAT)
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
Purpose: predict how much can be learned
Developed using: skills/job analysis
How validated: predictive validity, correlation with future activity (e.g.,
university or job evaluations)
Content independent. Measures:
… what student does with content provided
… how student applies skills & abilities developed over a lifetime
Not easily coachable – the content is either…
… not known in advance,
… basic, broad, commonly known by all, curriculum-free;
… less dependent on the quality of schools
SOURCE: Phelps, Standardized Testing Primer, 2007
Aptitude tests
Aptitude tests can identify:
- Students bored in school who study
what interests them on their own
- Students not well adapted to high
school, but well adapted to university
- Students of high ability stuck in poor
schools
SOURCE: Phelps, Standardized Testing Primer, 2007
Comparing Achievement & Aptitude tests

              Achievement        Aptitude
Measure       past learning      potential
Development   content analysis   job/skills analysis
Validation    retrospective      predictive
Content       dependent          independent
Coachable?    very much          not much
Non-cognitive tests
More recently developed
– measure values, attitudes, preferences
Types:

integrity tests
career exploration
matchmaking
employment “fit”
Non-cognitive tests
Purpose: to identify “fit” with others or a situation

Developed using: surveys, personal interviews
How validated? success rate in future activities

Content is personal, not learned
“Faking” can be an issue (e.g., “honesty” tests)
Comparing Achievement, Aptitude, & Non-Cognitive Tests

              Achievement        Aptitude              Non-Cognitive
Measure       past learning      potential             attitudes, values, preferences
Development   content analysis   job/skills analysis   surveys
Validation    retrospective      predictive            predictive
Content       dependent          independent           independent
Coachable?    very much          very little           can be faked
2. Measuring test quality
Test reports can
be “data dumps”
3 measures are important:
1. Predictive validity
2. Content coverage
3. Sub-group differences
Predictive validity
(values from -1.0 to +1.0)

…measures how well higher scores
on admission test match better
outcomes at university (e.g., grades,
completion)
A test with low predictive validity provides
little information.
A positive correlation between two measures

Source: NIST, Engineering Statistics Handbook
A negative correlation between two measures

Source: NIST, Engineering Statistics Handbook
No correlation between two measures

Source: NIST, Engineering Statistics Handbook
How does one measure predictive capacity?

Correlation coefficient: ranges from -1 through 0 to +1
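
As a rough illustration of the correlation coefficient described above, the following sketch computes a Pearson correlation between admission test scores and a later university outcome. The data values and the names `pearson_r`, `admission_scores` and `first_year_grades` are invented for this example; they do not come from the SAT, the PSU, or any report cited in these slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data, invented for illustration only
admission_scores = [450, 520, 580, 610, 660, 700, 750]
first_year_grades = [4.1, 4.0, 4.8, 4.6, 5.2, 5.1, 5.9]

validity = pearson_r(admission_scores, first_year_grades)
print(f"predictive validity (r) = {validity:.2f}")
```

A value near +1 would mean higher test scores go with better outcomes, a value near 0 would mean the test says little about the outcome, and a negative value would mean higher scores go with worse outcomes.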
Predictive validities: SAT and PSU
[Bar chart. Vertical axis: 0 to 0.6. Categories: Language, Mathematics, SAT Writing, PSU Social Science. Series: SAT, PSU 2010.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
Predictive validities: SAT and PSU
(faculty: Administracion)
[Bar chart. Vertical axis: 0 to 0.6. Categories: Language, Mathematics, SAT Writing, PSU Social Science. Series: SAT, PSU Administracion.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
Predictive validities: SAT and PSU
(faculty: Arquitectura)
[Bar chart. Vertical axis: 0 to 0.6. Categories: Language, Mathematics, SAT Writing, PSU Social Science. Series: SAT, PSU Arquitectura.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
Predictive validities: SAT and PSU
(faculty: Educacion)
[Bar chart. Vertical axis: 0 to 0.6. Categories: Language, Mathematics, SAT Writing, PSU Social Science. Series: SAT, PSU Educacion.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
Predictive validities: ACT and PSU
[Bar chart. Vertical axis: 0 to 0.6. Categories: Language, Mathematics, Social Science, Science. Series: ACT, PSU.]

SOURCE: ACT, Research Summary Services, 1997-1998; Pearson, Final Report
Evaluation of the Chile PSU, January 2013
Predictive validities of the PSU
(CTA v Pearson estimates)
[Bar chart. Vertical axis: 0 to 0.6. Categories: Language, Mathematics. Series: CTA, Pearson.]

SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013; CTA
Incremental Predictive validities (engineering):
(controlling for NEM)
[Bar chart. Vertical axis: 0 to 35. Categories: U. Chile and PUC for "Language & Math"; U. Chile and PUC for "Language & Math + subject test". Series: PAA, PSU.]

SOURCE: S.A. Prado, Estudio de Validez Predictiva de la PSU y Comparacion con el Sistema
PAA, Universidad de Chile
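
One common way to express incremental predictive validity is the gain in explained variance (R squared) when the admission test is added to a baseline predictor such as NEM. The sketch below follows that definition; it is an assumption made for illustration and may differ from the method used in the study cited above. All data values and function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


def r_squared_two_predictors(outcome, first, second):
    """R squared of a linear regression of outcome on two predictors,
    computed from the three pairwise correlations."""
    r_o1 = pearson_r(outcome, first)
    r_o2 = pearson_r(outcome, second)
    r_12 = pearson_r(first, second)
    if abs(1 - r_12 ** 2) < 1e-12:
        raise ValueError("predictors are perfectly collinear")
    return (r_o1 ** 2 + r_o2 ** 2 - 2 * r_o1 * r_o2 * r_12) / (1 - r_12 ** 2)


def incremental_validity(outcome, baseline, test):
    """Gain in R squared from adding the test to the baseline predictor."""
    baseline_only = pearson_r(outcome, baseline) ** 2
    return r_squared_two_predictors(outcome, baseline, test) - baseline_only


# Hypothetical data, invented for illustration only
nem = [5.0, 5.5, 5.8, 6.0, 6.2, 6.5, 6.8]             # high school grade average
test_scores = [480, 610, 520, 640, 590, 700, 650]     # admission test scores
grades = [4.0, 4.6, 4.3, 5.0, 4.9, 5.6, 5.4]          # university grades

gain = incremental_validity(grades, nem, test_scores)
print(f"R squared gained by adding the test to NEM: {gain:.3f}")
```

A gain close to zero would suggest the test adds little information beyond what high school grades already provide.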
Content coverage (values from 0% to 100%)
…how much of
the content
domain of a test
has been taught
in the schools.

It is not fair to expect students to master
content to which they have not been exposed.
…or, to compare students who have been
exposed to students who have not.
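
The coverage percentage defined above can be sketched as the share of a test's content domain that a school has actually taught. The topic names and the function `coverage_percent` are hypothetical, and the Mineduc study cited in the following slides may have measured coverage differently.

```python
def coverage_percent(test_domain, taught_topics):
    """Percentage of the test's content domain that has been taught."""
    domain = set(test_domain)
    if not domain:
        raise ValueError("the test domain must contain at least one topic")
    covered = domain & set(taught_topics)
    return 100.0 * len(covered) / len(domain)


# Hypothetical topics, invented for illustration only
test_domain = ["algebra", "geometry", "probability", "functions"]
taught_in_school = ["algebra", "geometry", "statistics"]

print(f"coverage = {coverage_percent(test_domain, taught_in_school):.0f}%")
```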
Percentage curricular coverage in Chilean high schools, by type of school: 2012
Mathematics, Level 1

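[Bar chart. Vertical axis: percentage, 0 to 100. Categories: Municipal (public), Subvencionado (subsidized private), Pagado (fee-paying private).]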

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media
Lenguaje y Comunicacion – Matematica, Septiembre 2012
Percentage curricular coverage in Chilean high schools, by type of school: 2012
Language & Communication, Level 2
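[Bar chart. Vertical axis: percentage, 0 to 100. Categories: Municipal (public), Subvencionado (subsidized private), Pagado (fee-paying private).]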

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media
Lenguaje y Comunicacion – Matematica, Septiembre 2012
Percentage curricular coverage in Chilean high schools, by type of school: 2012
Mathematics, Level 3
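[Bar chart. Vertical axis: percentage, 0 to 100. Categories: Municipal (public), Subvencionado (subsidized private), Pagado (fee-paying private).]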

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media
Lenguaje y Comunicacion – Matematica, Septiembre 2012
Percentage curricular coverage in Chilean high schools, by type of school: 2012
Language & Communication, Level 4
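[Bar chart. Vertical axis: percentage, 0 to 100. Categories: Municipal (public), Subvencionado (subsidized private), Pagado (fee-paying private).]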

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media
Lenguaje y Comunicacion – Matematica, Septiembre 2012
Percentage curricular coverage in Chilean high schools, by type of curriculum: 2012
Mathematics, Level 4
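[Bar chart. Vertical axis: percentage, 0 to 100. Categories: Humanista Científica (scientific-humanistic), Técnico Profesional (technical-professional), Polivalente (mixed).]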

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media
Lenguaje y Comunicacion – Matematica, Septiembre 2012
Percentage curricular coverage in Chilean high schools, by type of curriculum: 2012
Language & Communication, Level 4
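[Bar chart. Vertical axis: percentage, 0 to 100. Categories: Humanista Científica (scientific-humanistic), Técnico Profesional (technical-professional), Polivalente (mixed).]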

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media
Lenguaje y Comunicacion – Matematica, Septiembre 2012
Percentage of Chilean high schools with full curricular coverage, by subject area: 2012
Levels 1--4

[Stacked bar chart. Vertical axis: 0% to 100%. Categories: Mathematics, Language & Communication. Series: Cover 100%, Do NOT Cover 100%.]

SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media
Lenguaje y Comunicacion – Matematica, Septiembre 2012
Subgroup differences

Differences in test scores among
subgroups (e.g., gender, ethnic, school
type) should be due only to differences
in the attribute measured by the test
and not to systematic biases in the test.
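
The difference between a raw gap and an adjusted gap can be sketched as follows. The raw gap is the plain difference in mean scores between two groups; the adjusted gap is the difference that remains after controlling for a background variable. The adjustment below uses a linear regression on a single covariate, which is an assumption made for illustration; the study cited in the following slides may have adjusted differently. All data values and names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Least-squares slope and intercept for ys regressed on xs."""
    mean_x, mean_y = mean(xs), mean(ys)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("predictor is constant")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ss_x
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """What is left of ys after removing its linear relation with xs."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def raw_gap(scores, group):
    """Mean score of group 1 minus mean score of group 0."""
    ones = [s for s, g in zip(scores, group) if g == 1]
    zeros = [s for s, g in zip(scores, group) if g == 0]
    return mean(ones) - mean(zeros)


def adjusted_gap(scores, group, covariate):
    """Coefficient on group in a regression of scores on group and covariate,
    obtained by removing the covariate from both scores and group first."""
    score_res = residuals(covariate, scores)
    group_res = residuals(covariate, group)
    slope, _ = fit_line(group_res, score_res)
    return slope


# Hypothetical data, invented for illustration only
group = [1, 1, 1, 1, 0, 0, 0, 0]          # 1 = one school type, 0 = another
covariate = [3, 4, 5, 6, 1, 2, 3, 4]      # e.g. a background index
scores = [655, 690, 720, 775, 540, 585, 615, 670]

print(f"raw gap      = {raw_gap(scores, group):.1f}")
print(f"adjusted gap = {adjusted_gap(scores, group, covariate):.1f}")
```

When the groups differ on the covariate, part of the raw gap is attributable to that difference, so the adjusted gap can be much smaller than the raw gap.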
Growing gaps in PSU Mathematics raw & adjusted scores,
by type of curriculum: 2002—2010
[Chart: "Brechas PSU Matemáticas para toda la muestra" (PSU Mathematics gaps for the whole sample). Horizontal axis: year, 2002 to 2010. Vertical axis: "Brechas" (gaps), 0 to 200. Series: PP Muni-TP unadjusted gap (Brecha Sin Ajustar), PP Muni-TP adjusted gap (Brecha Ajustada), PP Muni-CH unadjusted gap, PP Muni-CH adjusted gap.]

SOURCE: Koljatic, Silva, & Phelps, Consequential Tests and Conflicts of Interest: The Case of Chile’s
PSU, forthcoming, 2014
Growing gaps in PSU Language & Communication raw &
adjusted scores, by type of curriculum: 2002—2010
[Chart: "Brechas PSU Lenguaje para toda la muestra" (PSU Language gaps for the whole sample). Horizontal axis: year, 2002 to 2010. Vertical axis: "Brechas" (gaps), 0 to 200. Series: PP Muni-TP unadjusted gap (Brecha Sin Ajustar), PP Muni-TP adjusted gap (Brecha Ajustada), PP Muni-CH unadjusted gap, PP Muni-CH adjusted gap.]

SOURCE: Koljatic, Silva, & Phelps, Consequential Tests and Conflicts of Interest: The Case of Chile’s
PSU, forthcoming, 2014
3. A chronology of mistakes
2000, initial proposal, SIES/PSU project

This proposal attempts a redesign of the tests currently used to select
students for higher education in Chile. It is expected that [this new test
will] have a positive impact in the efficiency of the selection
process, improving the psychometric properties of the measuring
instruments, and establishing a better articulation between the
selection system and the secondary education curriculum.

SOURCE: Proyecto FONDEF, Reformulacion de las Pruebas de Seleccion a la Educacion Superior
A chronology of mistakes (cont.)
2001 (World Bank & MINEDUC)

…the Academic Aptitude Test for entry to the university system is under
revision, together with the universities belonging to the Council of Rectors.
This instrument of entry selection, needs also to be aligned with the new
curriculum and may become an exit exam from the secondary
education system.

SOURCE: World Bank, Implementation Completion Report on a Loan in the Amount of $35 million to the Republic of
Chile for Secondary Education, 2001
A chronology of mistakes (cont.)
2005 (World Bank)

…The new law adopted in May 2005 (Bulletin 3223-04) established a system of
student loans available to all students achieving a threshold score in the
University Admission Exam (PSU). …the new system does not impede students
unable to provide collateral from financing their studies. The new system
promises to improve equity further by increasing options for talented
students from non-affluent families to access higher education.

SOURCE: IMPLEMENTATION COMPLETION REPORT (TF-25378 SCL-44040 PPFB-P3360) ON A LOAN IN THE AMOUNT
OF US$145.45 MILLION TO THE REPUBLIC OF CHILE FOR THE HIGHER EDUCATION IMPROVEMENT
PROJECT, December 2005
A chronology of mistakes (cont.)
2009 (OECD & World Bank)

[One option for revising admission testing] would be for Chile to move
away from a university entry test towards a national school leaving
test or set of tests – ideally, not simple multiple choice tests but
longer exams, which test both knowledge and candidates’ ability to
think and to apply knowledge. Such school leaving exams or tests
could also remove the need for a separate school leaving
certificate, by having two pass levels, the lower level equivalent to the NEM
and the higher level setting the minimum standard for entry to an academic or
professional degree course.

SOURCE: OECD & World Bank, Tertiary Education in Chile, 2009
A chronology of mistakes (cont.)
2009 (OECD & World Bank)

The second option [to revising admissions testing] would be to reform the
PSU by incorporating elements other countries consider useful and
important in identifying the students most likely to benefit from HE. These
elements would include extended essays and questions designed to
test reasoning ability and learning potential. They could also include
personal statements which could cover non-curricular
experience, personal motivation and interest in the programme.
Again, there should be a variant for vocational secondary school students.

SOURCE: OECD & World Bank, Tertiary Education in Chile, 2009
A chronology of mistakes (cont.)
2010 (OECD)

Over time the government should consider replacing the university
entry exam with a national school leaving exam as the prime
criterion for entry into tertiary education institutions. This could
establish a closer link between test results and the school that is responsible
for them, making it easier to reach the goal that has been pursued with the
introduction of the PSU.

SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN;
ECONOMICS DEPARTMENT WORKING PAPERS No. 784
A chronology of mistakes (cont.)
2010 (OECD)

There is evidence that central curriculum based exit exams are strongly and
positively related to student academic performance (Wößmann, 2005;
Bishop, 2006). To allow students to show in more detail their knowledge and
their ability to apply it, the school exit exam could be a bit more in-depth
than the multiple-choice PSU, including verbal and nonverbal
reasoning.

SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN;
ECONOMICS DEPARTMENT WORKING PAPERS No. 784
4. Economists misunderstand testing
Testing & Measurement PhD program
(University of Massachusetts, USA, 2013-2014)
EDUC 501 Classroom Assessment
EDUC 553 Construction, Validation, and Uses of Criterion-Referenced Tests
EDUC 555 Introduction to Statistics & Computer Analysis I
EDUC 632 Principles of Educational & Psychological Testing
EDUC 637 Non-Parametric Statistics Analysis
EDUC 656 Introduction to Statistical & Computer Analysis II
EDUC 661 Educational Research Methods I
EDUC 727 Scale and Instrument Development
EDUC 731 Structural Equation Modeling
EDUC 735 Advanced Theory & Practice of Testing I
EDUC 736 Advanced Theory & Practice of Testing II
EDUC 771 Application of Applied Multivariate Statistics I
EDUC 772 Application of Applied Multivariate Statistics II
EDUC 821 Advanced Validity Theory & Test Validation
How economists misunderstand
testing - 1

Increasing an admission test’s correlation
with high school work can decrease its
correlation with university work
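
The claim above can be illustrated with a toy model. Everything in it is an assumption made for this sketch, not a result from the slides: school achievement and aptitude are treated as independent traits with equal variance, university performance is assumed to depend more on aptitude than on school content, and the test score is a weighted mix of the two traits. Under those assumptions the two correlations can be computed directly.

```python
import math


def correlations(w, weight_school=0.3, weight_aptitude=0.7, noise_sd=0.5):
    """Return (corr(test, school work), corr(test, university work)).

    Toy model (all assumed, not taken from the source):
      school work h and aptitude a are independent with variance 1
      test       = w * h + (1 - w) * a
      university = weight_school * h + weight_aptitude * a + noise
    """
    test_sd = math.sqrt(w ** 2 + (1 - w) ** 2)
    univ_sd = math.sqrt(weight_school ** 2 + weight_aptitude ** 2 + noise_sd ** 2)
    corr_school = w / test_sd
    cov_univ = w * weight_school + (1 - w) * weight_aptitude
    corr_univ = cov_univ / (test_sd * univ_sd)
    return corr_school, corr_univ


# Shift the test's content weight toward school curriculum (larger w)
for w in (0.5, 0.75, 1.0):
    corr_school, corr_univ = correlations(w)
    print(f"w={w:.2f}  corr with school work={corr_school:.2f}  "
          f"corr with university work={corr_univ:.2f}")
```

In this toy model, moving the test toward school content raises its correlation with high school work and lowers its correlation with university work. The direction of the second effect depends on the assumed weights.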
How economists misunderstand
testing - 2
Incentives aren’t all that
matter in improving
efficiency;
…also important: more
and better
information, better
classification &
allocation
How economists misunderstand
testing - 3
Incentives generally work best
when applied to the actor
responsible for the target
behavior;
…currently, students bear the
consequences when schools do
not teach the curriculum tested
on the PSU
How economists misunderstand
testing - 4
Many useful and successful tests serve multiple purposes.
But, some purposes are compatible and some are not.
Responsible authorities have argued that the PSU will:
1. Measure the implementation of a new curriculum;
2. Fairly measure mastery of two, very different curricula;
3. Incentivize high schools to implement the new curriculum;
4. Incentivize high school students to study more;
5. Predict success in university generally;
6. Predict success across very different types of university programs;
7. Reduce socio-economic disparities.
The PSU: A test at war with itself
(a science-humanities exit exam, sold originally as a science-humanities
curriculum coverage survey, that is used as an entry exam for all students)

Expected to do too many things…
…it does none of them well,
…and makes some of them worse.
You cannot get there from here
The PSU cannot be “fixed”; it is fundamentally flawed.

A non-cognitive test, used as a
high-stakes admission test, will
exacerbate the problems. It is
easily faked. Wealthier students
will pay for coaching and the
scores will be invalid.

The old system – PAA + PCEs – was a sensible system.
Other options to consider
Option for Technical-Professional Graduates:
As is done in Germany, offer a short course on the scientific-humanistic
11th & 12th grade curricula, with an exam at the end, for technical-professional
graduates who decide after graduation that they wish to change careers.
Create a separate test for technical-professional graduates to enter university.
ETS & Pearson recommendations:
Reduce the content of the PSU to the common level (10th grade) and
to that which is genuinely necessary for a good prediction.
How the PSU Runs:
• CRUCh: "owners" of the PSU
• Comité Técnico Asesor (CTA) para la PSU: designated
by CRUCh as supervisors of DEMRE and official
evaluators of the PSU
• DEMRE: responsible for developing test items, test
assembly, test administration, test
scoring, the application system for CRUCh and
associated universities, etc.
• Ministry of Education: has funded the system since 2007 (fee
waivers)
1/23/2014
[Organization chart, top to bottom: CRUCH; COMITÉ TECNICO ASESOR DEL CRUCH PARA LA PSU (CTA); DEMRE (U. de Chile)]

Source: adapted from the Pearson Report (2013)
5. How SIMCE is affected

What does this have to do with SIMCE?
Most do not see the difference among tests. In public
perception, one bad test makes all tests look bad.
SIMCE’s largest challenge may be the loss of public goodwill
towards all testing.
“If a thing exists, it exists in some amount. If it exists in some amount, then it
is capable of being measured.”

-- René Descartes, Principles of Philosophy, 1644

L'effet de tests standardisés sur les résultats scolaires des élèves : Méta-a...L'effet de tests standardisés sur les résultats scolaires des élèves : Méta-a...
L'effet de tests standardisés sur les résultats scolaires des élèves : Méta-a...
 
Source of Lake Wobegon
Source of Lake WobegonSource of Lake Wobegon
Source of Lake Wobegon
 
Worse Than Plagiarism: Dismissive Reviews
Worse Than Plagiarism: Dismissive ReviewsWorse Than Plagiarism: Dismissive Reviews
Worse Than Plagiarism: Dismissive Reviews
 

Dernier

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Dernier (20)

Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

University Admission Testing in Chile: The PSU

  • 1. Large-scale testing: Uses and abuses Richard P. Phelps Universidad Finis Terrae, Santiago, Chile January 7, 2014
  • 2. Large-scale testing: Uses and abuses 1. 3 types of large-scale tests 2. Measuring test quality 3. A chronology of mistakes 4. Economists misunderstand testing 5. How SIMCE is affected
  • 3. 1. Three types of large-scale tests Achievement Aptitude Non-cognitive
  • 4. Achievement tests Historically, were larger versions of classroom tests ~ 1900 - “scientific” achievement tests developed (Germany & USA) J.M. Rice - systematically analyzed test structures & effects E.L. Thorndike - developed scoring scales SOURCE: Phelps, Standardized Testing Primer, 2007
  • 5. Achievement tests Purpose: to measure how much you know and can recall Developed using: content coverage analysis How validated: retrospective or concurrent validity (correlation with past measures, such as high school grades) Requires a mastery of content prior to test. Fairness assumes that all have same opportunity to learn content Coachable – specific content is known in advance SOURCE: Phelps, Standardized Testing Primer, 2007
  • 6. Aptitude tests 1890s – A. Binet & T. Simon (France) - Pre-school children with mental disabilities - achievement test not possible - developed content-free test of mental abilities (association, attention, memory, motor skills, reasoning) 1917 – Adapted by U.S. Army to select, assign soldiers in World War 1 1930s – Harvard University president J. Conant - wanted new admission test to identify students from lower social classes with the potential to succeed at Harvard - developed the first Scholastic Aptitude Test (SAT) SOURCE: Phelps, Standardized Testing Primer, 2007
  • 7. Aptitude tests Purpose: predict how much can be learned Developed using: skills/job analysis How validated: predictive validity, correlation with future activity (e.g., university or job evaluations) Content independent. Measures: … what student does with content provided … how student applies skills & abilities developed over a lifetime Not easily coachable – the content is either… … not known in advance, … basic, broad, commonly known by all, curriculum-free; … less dependent on the quality of schools SOURCE: Phelps, Standardized Testing Primer, 2007
  • 8. Aptitude tests Aptitude tests can identify: - Students bored in school who study what interests them on their own - Students not well adapted to high school, but well adapted to university - Students of high ability stuck in poor schools SOURCE: Phelps, Standardized Testing Primer, 2007
  • 9. Comparing Achievement & Aptitude tests
                  | Achievement      | Aptitude
      Measure     | past learning    | potential
      Development | content analysis | job/skills analysis
      Validation  | retrospective    | predictive
      Content     | dependent        | independent
      Coachable?  | very much        | not much
  • 10. Non-cognitive tests More recently developed – measure values, attitudes, preferences Types: integrity tests, career exploration, matchmaking, employment “fit”
  • 11. Non-cognitive tests Purpose: to identify “fit” with others or a situation Developed using: surveys, personal interviews How validated? success rate in future activities Content is personal, not learned “Faking” can be an issue (e.g., “honesty” tests)
  • 12. Comparing Achievement, Aptitude, & Non-Cognitive Tests
                  | Achievement      | Aptitude            | Non-Cognitive
      Measure     | past learning    | potential           | attitudes, values, preferences
      Development | content analysis | job/skills analysis | surveys
      Validation  | retrospective    | predictive          | predictive
      Content     | dependent        | independent         | independent
      Coachable?  | very much        | very little         | can be faked
  • 13. 2. Measuring test quality Test reports can be “data dumps” 3 measures are important: 1. Predictive validity 2. Content coverage 3. Sub-group differences
  • 14. Predictive validity (values from -1.0 to +1.0) …measures how well higher scores on admission test match better outcomes at university (e.g., grades, completion) A test with low predictive validity provides little information.
  • 15. A positive correlation between two measures Source: NIST, Engineering Statistics Handbook
  • 16. A negative correlation between two measures Source: NIST, Engineering Statistics Handbook
  • 17. No correlation between two measures Source: NIST, Engineering Statistics Handbook
  • 18. How does one measure predictive capacity? Correlation coefficient: a scale running from -1 through 0 to +1
  • 19. Predictive validities: SAT and PSU [Bar chart; vertical axis 0 to 0.6; categories: Language, Mathematics, SAT Writing, PSU Social Science; series: SAT, PSU 2010] SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
  • 20. Predictive validities: SAT and PSU (faculty: Administracion) [Bar chart; vertical axis 0 to 0.6; categories: Language, Mathematics, SAT Writing, PSU Social Science; series: SAT, PSU Administracion] SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
  • 21. Predictive validities: SAT and PSU (faculty: Arquitectura) [Bar chart; vertical axis 0 to 0.6; categories: Language, Mathematics, SAT Writing, PSU Social Science; series: SAT, PSU Arquitectura] SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
  • 22. Predictive validities: SAT and PSU (faculty: Educacion) [Bar chart; vertical axis 0 to 0.6; categories: Language, Mathematics, SAT Writing, PSU Social Science; series: SAT, PSU Educacion] SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013
  • 23. Predictive validities: ACT and PSU [Bar chart; vertical axis 0 to 0.6; categories: Language, Mathematics, Social Science, Science; series: ACT, PSU] SOURCE: ACT, Research Summary Services, 1997_1998; Pearson, Final Report Evaluation of the Chile PSU, January 2013
  • 24. Predictive validities of the PSU (CTA v Pearson estimates) [Bar chart; vertical axis 0 to 0.6; categories: Language, Mathematics; series: CTA, Pearson] SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013; CTA
  • 25. Incremental Predictive validities (engineering): (controlling for NEM) [Bar chart; vertical axis 0 to 35; series: PAA, PSU; groups: Language & Math (U. Chile, PUC) and Language & Math + subject test (U. Chile, PUC)] SOURCE: S.A. Prado, Estudio de Validez Predictiva de la PSU y Comparacion con el Sistema PAA, Universidad de Chile
  • 26. Content coverage (values from 0% to 100%) …how much of the content domain of a test has been taught in the schools. It is not fair to expect students to master content to which they have not been exposed. …or, to compare students who have been exposed to students who have not.
  • 27. Percentage curricular coverage in Chilean high schools, by type of school: 2012 Mathematics, Level 1 [Chart; vertical axis 0 to 100; categories: Municipal, Subvencionado, Pagado] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012
  • 28. Percentage curricular coverage in Chilean high schools, by type of school: 2012 Language & Communication, Level 2 [Chart; vertical axis 0 to 100; categories: Municipal, Subvencionado, Pagado] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012
  • 29. Percentage curricular coverage in Chilean high schools, by type of school: 2012 Mathematics, Level 3 [Chart; vertical axis 0 to 100; categories: Municipal, Subvencionado, Pagado] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012
  • 30. Percentage curricular coverage in Chilean high schools, by type of school: 2012 Language & Communication, Level 4 [Chart; vertical axis 0 to 100; categories: Municipal, Subvencionado, Pagado] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012
  • 31. Percentage curricular coverage in Chilean high schools, by type of curriculum: 2012 Mathematics, Level 4 [Chart; vertical axis 0 to 100; categories: Humanista-Científica, Técnico-Profesional, Polivalente] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012
  • 32. Percentage curricular coverage in Chilean high schools, by type of curriculum: 2012 Language & Communication, Level 4 [Chart; vertical axis 0 to 100; categories: Humanista-Científica, Técnico-Profesional, Polivalente] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012
  • 33. Percentage of Chilean high schools with full curricular coverage, by subject area: 2012 Levels 1-4 [Stacked bar chart; vertical axis 0% to 100%; categories: Mathematics, Language & Communication; series: Cover 100%, Do NOT cover 100%] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Ensenanza Media Lenguaje y Comunicacion – Matematica, Septiembre 2012
  • 34. Subgroup differences Differences in test scores among subgroups (e.g., gender, ethnic, school type) should be due only to differences in the attribute measured by the test and not to systematic biases in the test.
  • 35.
  • 36.
  • 37.
  • 38. Growing gaps in PSU Mathematics raw & adjusted scores, by type of curriculum: 2002–2010 [Line chart, panel title "Gaps in PSU Mathematics for the whole sample"; vertical axis: gaps, 0 to 200; horizontal axis: 2002 to 2010; series: PP vs. Muni-TP unadjusted gap, PP vs. Muni-TP adjusted gap, PP vs. Muni-CH unadjusted gap, PP vs. Muni-CH adjusted gap] SOURCE: Koljatic, Silva, & Phelps, Consequential Tests and Conflicts of Interest: The Case of Chile’s PSU, forthcoming, 2014
  • 39. Growing gaps in PSU Language & Communication raw & adjusted scores, by type of curriculum: 2002–2010 [Line chart, panel title "Gaps in PSU Language for the whole sample"; vertical axis: gaps, 0 to 200; horizontal axis: 2002 to 2010; series: PP vs. Muni-TP unadjusted gap, PP vs. Muni-TP adjusted gap, PP vs. Muni-CH unadjusted gap, PP vs. Muni-CH adjusted gap] SOURCE: Koljatic, Silva, & Phelps, Consequential Tests and Conflicts of Interest: The Case of Chile’s PSU, forthcoming, 2014
  • 40. 3. A chronology of mistakes 2000, initial proposal, SIES/PSU project This proposal attempts a redesign of the tests currently used to select students for higher education in Chile. It is expected that [this new test will] have a positive impact in the efficiency of the selection process, improving the psychometric properties of the measuring instruments, and establishing a better articulation between the selection system and the secondary education curriculum. SOURCE: Proyecto FONDEF, Reformulacion de las Pruebas de Seleccion a la Educacion Superior
  • 41. A chronology of mistakes (cont.) 2001 (World Bank & MINEDUC) …the Academic Aptitude Test for entry to the university system is under revision, together with the universities belonging to the Council of Rectors. This instrument of entry selection, needs also to be aligned with the new curriculum and may become an exit exam from the secondary education system. SOURCE: World Bank, Implementation Completion Report on a Loan in the Amount of $35 million to the Republic of Chile for Secondary Education, 2001
  • 42. A chronology of mistakes (cont.) 2005 (World Bank) …The new law adopted in May 2005 (Bulletin 3223-04) established a system of student loans available to all students achieving a threshold score in the University Admission Exam (PSU). …the new system does not impede students unable to provide collateral from financing their studies. The new system promises to improve equity further by increasing options for talented students from non-affluent families to access higher education. SOURCE: IMPLEMENTATION COMPLETION REPORT (TF-25378 SCL-44040 PPFB-P3360) ON A LOAN IN THE AMOUNT OF US$145.45 MILLION TO THE REPUBLIC OF CHILE FOR THE HIGHER EDUCATION IMPROVEMENT PROJECT, December 2005
  • 43. A chronology of mistakes (cont.) 2009 (OECD & World Bank) [One option for revising admission testing] would be for Chile to move away from a university entry test towards a national school leaving test or set of tests – ideally, not simple multiple choice tests but longer exams, which test both knowledge and candidates’ ability to think and to apply knowledge. Such school leaving exams or tests could also remove the need for a separate school leaving certificate, by having two pass levels, the lower level equivalent to the NEM and the higher level setting the minimum standard for entry to an academic or professional degree course. SOURCE: OECD & World Bank, Tertiary Education in Chile, 2009
  • 44. A chronology of mistakes (cont.) 2009 (OECD & World Bank) The second option [to revising admissions testing] would be to reform the PSU by incorporating elements other countries consider useful and important in identifying the students most likely to benefit from HE. These elements would include extended essays and questions designed to test reasoning ability and learning potential. They could also include personal statements which could cover non-curricular experience, personal motivation and interest in the programme. Again, there should be a variant for vocational secondary school students. SOURCE: OECD & World Bank, Tertiary Education in Chile, 2009
  • 45. A chronology of mistakes (cont.) 2010 (World Bank) Over time the government should consider replacing the university entry exam with a national school leaving exam as the prime criterion for entry into tertiary education institutions. This could establish a closer link between test results and the school that is responsible for them, making it easier to reach the goal that has been pursued with the introduction of the PSU. SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN; ECONOMICS DEPARTMENT WORKING PAPERS No. 784
  • 46. A chronology of mistakes (cont.) 2010 (World Bank) There is evidence that central curriculum based exit exams are strongly and positively related to student academic performance (Wößmann, 2005; Bishop, 2006). To allow students to show in more detail their knowledge and their ability to apply it, the school exit exam could be a bit more in-depth than the multiple-choice PSU, including verbal and nonverbal reasoning. SOURCE: N. Brandt, CHILE: CLIMBING ON GIANTS' SHOULDERS: BETTER SCHOOLS FOR ALL CHILEAN CHILDREN; ECONOMICS DEPARTMENT WORKING PAPERS No. 784
  • 48. Testing & Measurement PhD program (University of Massachusetts, USA, 2013-2014)
      EDUC 501 Classroom Assessment
      EDUC 553 Construction, Validation, and Uses of Criterion-Referenced Tests
      EDUC 555 Introduction to Statistics & Computer Analysis I
      EDUC 632 Principles of Educational & Psychological Testing
      EDUC 637 Non-Parametric Statistics Analysis
      EDUC 656 Introduction to Statistical & Computer Analysis II
      EDUC 661 Educational Research Methods I
      EDUC 727 Scale and Instrument Development
      EDUC 731 Structural Equation Modeling
      EDUC 735 Advanced Theory & Practice of Testing I
      EDUC 736 Advanced Theory & Practice of Testing II
      EDUC 771 Application of Applied Multivariate Statistics I
      EDUC 772 Application of Applied Multivariate Statistics II
      EDUC 821 Advanced Validity Theory & Test Validation
  • 49. How economists misunderstand testing - 1 Increasing an admission test’s correlation with high school work can decrease its correlation with university work
  • 50. How economists misunderstand testing - 2 Incentives aren’t all that matter in improving efficiency; …also important: more and better information, better classification & allocation
  • 51. How economists misunderstand testing - 3 Incentives generally work best when applied to the actor responsible for the target behavior; …currently, students bear the consequences when schools do not teach the curriculum tested on the PSU
  • 52. How economists misunderstand testing - 4 Many useful and successful tests serve multiple purposes. But, some purposes are compatible and some are not. Responsible authorities have argued that the PSU will:
      1. Measure the implementation of a new curriculum;
      2. Fairly measure mastery of two, very different curricula;
      3. Incentivize high schools to implement the new curriculum;
      4. Incentivize high school students to study more;
      5. Predict success in university generally;
      6. Predict success across very different types of university programs;
      7. Reduce socio-economic disparities.
  • 53. The PSU: A test at war with itself (a science-humanities exit exam, sold originally as a science-humanities curriculum coverage survey, that is used as an entry exam for all students) Expected to do too many things… …it does none of them well, …and makes some of them worse.
  • 54. You cannot get there from here The PSU cannot be “fixed”; it is fundamentally flawed. A non-cognitive test, used as a high-stakes admission test, will exacerbate the problems. It is easily faked. Wealthier students will pay for coaching and the scores will be invalid. The old system – PAA + PCEs – was a sensible system.
  • 55. Other options to consider Option for Technical-Professional Graduates: As is done in Germany, offer a short course on the scientific-humanistic 11th & 12th grade curricula, with an exam at the end, for technical-professional graduates who decide after graduation that they wish to change careers. Create a separate test for technical-professionals to enter university. ETS & Pearson recommendations: Lessen the content in the PSU to the common level – 10th grade – and to that which is genuinely necessary for a good prediction.
  • 56. How the PSU Runs:
      • CRUCh: "owners" of the PSU
      • Comité Técnico Asesor (CTA) para la PSU: designated by CRUCh as supervisors of DEMRE and official evaluators of the PSU
      • DEMRE: responsible for developing test items, test assembly, test administration, test scoring, application system for CRUCh and associated universities, etc.
      • Ministry of Education: has funded the system since 2007 (fee waivers)
      1/23/2014
  • 57. [Organization chart: CRUCH; Comité Técnico Asesor del CRUCH para la PSU (CTA); DEMRE, U. de Chile] Source: adapted from the Pearson Report (2013)
  • 58. 5. How SIMCE is affected What does this have to do with SIMCE? Most do not see the difference among tests. In public perception, one bad test makes all tests look bad. SIMCE’s largest challenge may be the loss of public goodwill towards all testing.
  • 59. “If a thing exists, it exists in some amount. If it exists in some amount, then it is capable of being measured.” −−Rene Descartes, Principles of Philosophy, 1644
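
Slides 14 and 18 describe predictive validity as a correlation coefficient between -1 and +1, and slide 25 reports incremental validity after controlling for NEM. The sketch below illustrates both ideas in Python. It is not the method used in the Pearson or Prado studies: all scores are invented, the function names `pearson_r` and `incremental_r2` are hypothetical, and "incremental validity" is taken here, as an assumption, to mean the gain in R-squared when a test score is added to a linear model that already contains high school grades.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def incremental_r2(outcome, baseline, test):
    """Gain in R^2 from adding `test` to a linear model that already
    contains `baseline` (standard two-predictor formula)."""
    r_yb = pearson_r(outcome, baseline)
    r_yt = pearson_r(outcome, test)
    r_bt = pearson_r(baseline, test)
    if abs(r_bt) >= 1.0:
        raise ValueError("baseline and test are perfectly collinear")
    r2_both = (r_yb ** 2 + r_yt ** 2 - 2 * r_yb * r_yt * r_bt) / (1 - r_bt ** 2)
    return r2_both - r_yb ** 2


# Invented data for six hypothetical students (not from any study).
admission_scores = [480, 520, 560, 610, 650, 700]   # admission test score
hs_grades = [5.1, 5.6, 5.4, 6.0, 6.2, 6.5]          # stand-in for NEM
university_gpa = [4.2, 4.6, 4.5, 5.3, 5.1, 5.9]     # first-year outcome

print("predictive validity of test:", round(pearson_r(admission_scores, university_gpa), 3))
print("gain in R^2 over grades alone:", round(incremental_r2(university_gpa, hs_grades, admission_scores), 3))
```

A test whose scores add almost nothing to R-squared once grades are in the model supplies little new information for admission decisions, which is the sense in which slide 14 says that a test with low predictive validity "provides little information."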

Editor's notes

  1. Sorry, but this presentation will be in English. However, these slides will be made available later in both English and Spanish.
  2. Used by many North American universities to test students AFTER admission, on first day at school. Used for advising and placement.
  3. Scatterplot shows the relationship between two factors – for example, high school grades and university grades
  4. From the Ministry of Education content coverage study. Unit of analysis is school. Vertical line shows the range of coverage – some schools teach NONE of the required curriculum; some teach all. Horizontal line is the mean coverage.
  5. Schools with 0% coverage, I assume, are behind in the curriculum because their students are behind, and are still teaching lower-grade content.
  6. Now we compare by curriculum.
  7. Scatterplot of student performance by impact of socio-economic background on that performance. Chile falls in the below-average performance / above-average effect of SES quadrant, along with Mexico, Brazil, and the USA.
  8. Scatterplot of PISA scores and PISA SES index shows clear correlation for Chile by type of school: municipal, subsidized, & private fee-based.
  9. Trends in enrollment in Chile up to 2006 – high school and university showing opposite trends. SES gap in high school narrowed. SES gap in university widened.
  10. Converting from classical to IRT may “modernize” a test, but does not necessarily improve its psychometric properties. The articulation now seems worse than it was with the PAA and the PCEs.
  11. When a test becomes an exit exam from the previous level of schooling, it is no longer an “academic aptitude test”; it becomes an achievement test based on the previous curriculum. In some states of the USA, ACT administers tests that are highly predictive of future university work and are also good surveys of a student’s mastery of the high school curriculum. But the two functions are accomplished by separate items among the tests administered: subject-area achievement tests are added to the regular ACT test, and the collection of tests is administered over two or three days.
  12. Obviously, the new system has not improved equity; in fact, it has decreased it.
  13. It is ironic, because the World Bank and the OECD advocated and helped to finance the move away from tests that measured “candidates’ ability to think and to apply knowledge.”
  14. Again, the World Bank and OECD are now recommending ability testing, which they earlier helped to eliminate. But, they are also recommending using non-cognitive tests – that are easily faked – as part of the university entrance requirement. A non-persevering student can easily claim to be persevering if he knows it will help get him into university.
  15. Actually, the PSU is a curricular-based achievement test – the type typically appropriate for use as an exit exam – but it is being used as an entry exam. This recommendation is directly opposite the previous two World Bank recommendations.
  16. There are hundreds of studies by psychologists and psychometricians the World Bank could have cited. Instead, they cite two studies by economists that are only superficially appropriate for this issue. The fact alone that test items are constructed response rather than multiple-choice does not make them any more in-depth. One can write very superficial constructed-response items, and very deep, complex multiple-choice items.
  17. Suppose you have a leaky faucet and would like it to be fixed. Which of these professionals would you call?
  18. The courses taken by a testing and measurement student in one US PhD program.
  19. No single test can possibly do all of this.
  20. My opinion.
  21. It can be very discouraging if a decision a student (or the student’s parents) makes at age 14 will determine the student’s career … forever, and can never be altered.