THE EARLY YEARS OF VALIDITY
1800s-1951
Maryam Bolouri
Major developments in England, France, Germany, and the USA
• 1836: matriculation examinations
• 1845: first in the USA: written examinations judged superior to oral quizzes
• 1853: India Act, providing for impartial selection for the civil service
• 1858: local examinations set by Oxford and Cambridge
• Development of a statistical approach in Britain, e.g., Spearman's contribution
• 1904: Binet in France developed a series of tests to distinguish unmotivated and incapable children from the others
• USA: Yerkes et al. developed intelligence tests for army recruits
• Purpose: to bring scientific methods to the study of education, e.g., achievement tests and mental tests
• Problem: growing discontent with the unreliability of marks and the unfairness of evaluation by human judges
Concern about the "personal equation" (examiner subjectivity):
• Solution: sentence completion, true/false items, multiple-choice selection, etc.
• Development of objective, standardized assessment (its first roots were in the USA, so validity is a product of North America)
• This led to a mushrooming of published standardized tests and of research into tests and testing from 1910 to 1920
The outcome of the pre-1921 period
• Structured and objective assessment
• Distinction between sub-domains of educational and psychological measurement:
1. Professional communities: diagnosis, achievement, selection
2. Scientific communities: exploring personality characteristics and innate differences
• Distinction between different types of test (linguistic vs. performance, individual vs. group, written and standardized tests)
• Recognition of the correlation coefficient as a tool for judging the quality of tests
Post 1921 era
• The term "validity" began to take root in the lexicon of researchers and practitioners.
• 1911 Freeman: the technique and validity of test methods
• 1915 Terman: evaluated the validity of intelligence and IQ tests
• 1916 Starch: referred to the validity or fairness of measures
• 1916 Thorndike: the essentials of a valid scale
• 1919: APA attempts at professional certification in response to the use of mental tests by unqualified individuals
• 1921 NADER (National Association of Directors of Educational Research): sought standardization and consistency in concepts and procedures (similar to APA attempts in 1895 and 1906)
• Procedures proposed by NADER:
1. Preparation and selection
2. Experimental organization of the test and instructions
3. Trial of the tentative test
4. Final organization of the test
5. Final conduct of the test (scoring, tabulation and interpretation)
6. Determine validity
7. Determine reliability
8. Determine norms
First official definition of validity
• By NADER
• It challenged the field to promote and develop new methods
• First classic definition of validity:
The degree to which a test or examination measures what it purports to measure
The idea of a criterion was central to this definition, and the dominant approaches were predictive or concurrent.
Content considerations existed but were neither significant nor robust.
1915-1930 boom period: new tests multiplied like rabbits, with little critical scrutiny of the instruments or their results.
Early years:
• Over-simplistic descriptions
• Elaboration of insights that had been established before
• Elevation of empirical evidence at the expense of logical analysis ("dust-bowl empiricism")
According to Shepard: 1920-1950: deference to test-criterion correlations
1940s: validity = predictive correlation coefficient
According to Kane: the criterion phase
According to Cronbach: the whole of validity theory was prediction
Some issues regarding the early years:
1. We cannot ignore the early years: they involved not only a theory of prediction but also descriptive and explanatory investigations
• Omitting the early years from discussion is counterproductive; we should not teach validity from the baseline of 1954.
• Only with reference to the baseline of 1921 can the transition from the trinitarian conception of validity to present-day theory be understood.
2. Too many seminal works in the early years
• There were so many seminal works that it was impossible for a coherent tradition to emerge.
• Each offered new perspectives
• The 1920s were prolific for educational measurement
• Perspectives differed among authors within sub-domains as well as across sub-domains
3. Validity developed in different ways and phases
• Both world wars influenced testing and validation.
• Large-scale implementation of mental testing during the First World War, with Otis's stencil method of scoring for rapid marking
• The Army Alpha and Beta tests of military aptitude gave mental testing publicity and prestige
• Mechanical test construction to predict criterion measures (blindly empirical)
• This is only one side of a complex story running from the mid-19th to the mid-20th century (to 1952)
The prediction phase as a caricature:
1) Widespread adoption of blindly empirical methods, specifically aptitude testing for the army
2) The degradation of the classic definition over time, as the method for measuring validity was mistaken for the definition of validity. This happened in 3 stages:
a) Quality of measurement
b) Degree of correlation between the test and a criterion
c) Correlation coefficient between the test and the criterion
From a to b: 1922, McCall: only through correlations do we know what a test measures
• Classic definition: validity and validation treated as distinct
• It was a conceptual abstraction.
• A hypothetical true proficiency rank served as an absolute criterion
• There is no single true proficiency rank but a range of ranks
• No sense of prediction; just correlation between actual test results and hypothetical proficiency
2 methods to determine the correspondence:
1. Prolonged, careful observation in real-life situations → determine true proficiency and use it as the criterion → rank students on the test → correlate the two rankings
2. Rank pupils of known proficiency → rank them on the test → correlate the two rankings
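Both ranking procedures end by correlating a criterion ranking with a test ranking. A minimal sketch of that final step follows, assuming Spearman's rank-difference formula and no tied scores; the helper names `ranks` and `rank_correlation` and the pupil data are hypothetical illustrations, not the historical computations.

```python
def ranks(scores):
    """Return 1-based ranks (1 = highest score). Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def rank_correlation(criterion_ranks, test_scores):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties."""
    test_ranks = ranks(test_scores)
    n = len(test_ranks)
    d_squared = sum((c - t) ** 2 for c, t in zip(criterion_ranks, test_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: five pupils ranked by judged "true" proficiency
# (1 = most proficient) and their raw scores on a new test.
criterion_ranks = [1, 2, 3, 4, 5]
test_scores = [48, 45, 30, 35, 20]
print(round(rank_correlation(criterion_ranks, test_scores), 2))  # 0.9
```

Here the test swaps the third and fourth pupils relative to the criterion, so agreement is high but not perfect. Tied scores would need average ranks, which this sketch does not handle.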
Other approaches to developing a criterion:
• Expert or teacher judgment
• Results of multiple existing tests measuring the same thing
• Results from specific tests
From b to c: the criterion changed from a conceptual abstraction to more concrete and pragmatic measures
• Coefficient of validity = correlation coefficient between test scores and criterion scores
• Validity = observed agreement rather than hypothetical agreement between test scores and true proficiency
• Validity = empirical correlation
• The validity of the criterion scores themselves was never questioned!
• Fusion of definition and method
• Emphasis on test use: each test has a different validity for each use
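Under this criterion-based definition, the coefficient of validity is simply the correlation between test scores and criterion scores. The sketch below, with invented scores and hypothetical criterion names, computes a Pearson correlation from first principles and shows how one test yields a separate coefficient for each criterion, i.e., a different "validity" for each use. It is an illustration only, not a reconstruction of any historical calculation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for six examinees on one test and on two different criteria.
test = [12, 15, 9, 20, 17, 11]
criteria = {
    "teacher_marks": [60, 70, 55, 85, 75, 58],
    "later_job_rating": [3, 2, 4, 5, 3, 2],
}

# The same test gets a separate "coefficient of validity" for each criterion.
for name, scores in criteria.items():
    print(name, round(pearson_r(test, scores), 2))
# teacher_marks 0.99
# later_job_rating 0.38
```

Note that nothing in the calculation asks whether either criterion is itself valid, which is exactly the gap the slide points out.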
• Dominance of an atheoretical definition
• Distinction between practical validity and factorial validity
• Practical validity: a test is valid for anything with which it correlates (Guilford, 1946)
• There are 2 kinds of validity, and practical validity addresses the fundamental question of validity
• Undue emphasis on empirical evidence; problems: an inadequate definition and the criterion problem
Terman (1928): 3 primary concerns of educational and psychological measurement
1. achievement 2. intelligence 3. aptitudes
1. School achievement: Walter Monroe
Validity was expressed as a multifaceted concept based on correlation, together with a conceptual definition of validity:
a. Objectivity in describing performances (rater)
b. Reliability (coefficient of reliability, index of reliability, error of measurement, coefficient of correspondence, overlapping of grade groups)
c. Discrimination (agreement with the normal curve)
d. Comparison with criterion measures
e. Validity inferences based on test structure and administration
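Two of the reliability indices in item (b) have standard classical-test-theory formulas: the index of reliability is the square root of the reliability coefficient (the correlation between observed and true scores), and the standard error of measurement is the observed standard deviation times sqrt(1 - r_xx). The sketch below assumes these textbook formulas; Monroe's own computations are not reproduced, and the values are invented.

```python
from math import sqrt

def index_of_reliability(r_xx):
    """Correlation between observed and true scores: the square root of r_xx."""
    return sqrt(r_xx)

def standard_error_of_measurement(sd_x, r_xx):
    """Typical spread of a pupil's observed scores around their true score."""
    return sd_x * sqrt(1 - r_xx)

# Hypothetical test: reliability coefficient 0.91, observed-score SD of 10 points.
r_xx, sd_x = 0.91, 10.0
print(round(index_of_reliability(r_xx), 3))                  # 0.954
print(round(standard_error_of_measurement(sd_x, r_xx), 1))   # 3.0
```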
• 6 threats to valid interpretation:
1. Do the tasks require other abilities?
2. Can the tasks be answered by a variety of methods (other than the intended one)?
3. Is the test administered under a variety of conditions?
4. Do students continue to exercise their ability across all tasks?
5. Are the tasks representative of the field of ability being measured?
6. Are all students given the same opportunity?
Unitary conception of validity:
• Integration of multiple sources of empirical evidence and logical analysis
• 2 primary categories of sources of evidence:
1. Expert opinion vs. experimental (Ruch, 1929)
2. Curricular vs. statistical (Ruch, 1933)
• 3 approaches to logical analysis (Ruch, 1929):
1. Judgment by competent persons on the appropriateness of content
2. Alignment of content with textbooks
3. Alignment of content with the recommendations of national education committees
Fundamental role:
• Extensive sampling in school achievement tests: random sampling from the field, or representation of the most important elements, measuring the same thing or attribute
• Tests parallel to actual teaching
• Centrality of logical analysis
Problem: no field is perfectly homogeneous, so there will always be a certain degree of compromise
Major innovation:
Scaling: tests with items at different levels of difficulty; the items of a test were not selected effectively on the basis of content and representativeness
Problem: tension between discrimination and sampling
From random sampling to restricted sampling
It is not possible to construct a robust measure of overall achievement based on weighted sampling of behavior across the entire achievement domain.
So instead of a representative sample, we should tap the essence of achievement.
So items with a high correlation with general achievement must be selected; each item plays a role, contributing to the essence of the general achievement attribute.
Items that discriminate between high- and low-achieving students correlate highly with the criterion.
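The selection logic above (keep the items that best separate high from low achievers) can be illustrated with an upper-lower discrimination index. This is a hypothetical sketch: the index D, the top/bottom-third grouping, the 0.3 cut-off, the function names and the response data are illustrative assumptions, with total test score standing in for "general achievement".

```python
def discrimination_indices(responses):
    """Upper-lower discrimination index D for each item.

    responses: one list of 0/1 item scores per pupil.
    D = proportion correct in the top third minus proportion correct in the
    bottom third, with pupils grouped by total test score.
    """
    n_pupils, n_items = len(responses), len(responses[0])
    group_size = max(1, n_pupils // 3)
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    upper, lower = ranked[:group_size], ranked[-group_size:]
    indices = []
    for item in range(n_items):
        p_upper = sum(pupil[item] for pupil in upper) / group_size
        p_lower = sum(pupil[item] for pupil in lower) / group_size
        indices.append(p_upper - p_lower)
    return indices

def select_items(responses, min_d=0.3):
    """Keep the indices of items whose D meets a (hypothetical) cut-off."""
    return [i for i, d in enumerate(discrimination_indices(responses)) if d >= min_d]

# Hypothetical 0/1 responses: six pupils by four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(discrimination_indices(responses))  # [1.0, 0.0, 1.0, 0.5]
print(select_items(responses))            # [0, 2, 3]
```

Item 1 (index 1) is answered equally often by strong and weak pupils, so it is dropped regardless of how well it represents the curriculum. That is the tension between discrimination and sampling described above.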
• Validity from the curriculum viewpoint and validity from the general-achievement viewpoint need to arrive at a compromise.
• A large unresolved tension can be detected throughout the study by Lindquist (1936)
• Tyler (1931): validity in terms of the usefulness of the test in measuring the attainment of course objectives
• He was not opposed to the empirical approach, but he was not impressed by the use of teachers' marks as an empirical criterion
• His suggestion: develop preliminary tests for each course objective to help in
1) creating comprehensive criterion measures
2) diagnosis
Then prepare practical tests to be validated by correlation
Tyler's concerns:
1. Sampling
2. Test construction
3. Validity
4. Mental processes: no distinction was made between the content of the subject and the required mental process, and items tested information rather than the interpretation or application of principles
5. Negative impacts of tests on instruction and on curriculum reform: studying and teaching were adapted to the emphasis of the tests
Tension between empirical and logical approaches, 1930s-1940s
• Overemphasis on the empirical: inadequacy of criteria for establishing validity, and backwash effects on teaching and learning
• Overemphasis on the logical: impossibility of representative sampling and fallibility of human judgment
• Tyler: rational hypotheses in test construction
• The pendulum swings against empirical considerations (the technician's viewpoint)
2 key principles of the evaluation movement:
1. Evaluation could not begin until the curriculum had been defined in terms of behavioral objectives
2. Any useful device might be employed in producing an account of pupil growth:
• Teacher judgment
• Essay examination
• Objective test
Terman (1928): 3 primary concerns of educational and psychological measurement
1. achievement 2. intelligence 3. aptitudes
2. Intelligence
Logical approach: "raw brain power"; the Binet-Simon scales were extended.
Problem: a thorough description of the universe of intelligent behavior was not straightforward; there was no clear definition
Binet: the faculties are different from general intelligence; a single test can be a test of intelligence.
Post-Binet: not a single test but combined (manifold and heterogeneous) tests; performance on a test is the product of both the faculties and general intelligence.
• Solution: permissive sampling, assessing considerably more than the essence of intelligence
• Validity can be maximized by intentional construct under-representation or intentional construct-irrelevance
• Assumption: random, construct-irrelevant item variance cancels out according to the law of averages.
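The assumption that random construct-irrelevant variance cancels out can be checked with a small simulation. This is a hypothetical model, not taken from the source: each item score is assumed to be the pupil's latent ability plus independent noise, and the test score is the mean over items. Under that model the expected correlation with ability is 1 / sqrt(1 + noise_sd**2 / n_items), roughly 0.45, 0.75 and 0.95 for 1, 5 and 40 items.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def simulate(n_items, n_pupils=2000, noise_sd=2.0, seed=1):
    """Correlation between latent ability and the mean score over n_items items.

    Hypothetical model: item score = ability + independent, construct-irrelevant
    noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n_pupils)]
    test_scores = []
    for a in ability:
        items = [a + rng.gauss(0, noise_sd) for _ in range(n_items)]
        test_scores.append(sum(items) / n_items)
    return pearson_r(ability, test_scores)

for k in (1, 5, 40):
    print(k, round(simulate(k), 2))
```

The cancellation only works because the noise is assumed independent across items. Systematic irrelevant variance shared by many items would not average away.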
• Empirical approach:
• A criterion measure of intelligence is needed
• During the First World War, a number of reputed tests of higher quality were adopted as yardsticks:
Otis Group Test: most valid
Terman Group Test
Miller Group Test: least valid
Army Alpha
Cattell (1943): promoted factor analysis as an important validation technique and transformed it from a lay activity into scientific practice
Terman (1928): 3 primary concerns of educational and psychological measurement
1. achievement 2. intelligence 3. aptitudes
3. Aptitudes
• For the purpose of vocational guidance and selection
• 1st assumption: aptitudes were stable, if not innate
• 2nd assumption: aptitudes differ across and within individuals along continua
• What set aptitude measurement apart: the criterion was not something in the present but in the future.
• Successful performance in a vocation = exercise of skills and abilities that had not yet been developed.
• Problem: how should such a test be validated?
Empirical approach to aptitude testing:
• The idea of sampling is meaningless here, which led to the elevation of empirical approaches in 4 stages:
1) Administer the aptitude test
2) Wait until the required skills and abilities have been acquired
3) Assess job proficiency in situ
4) Correlate the test results with the assessment of job proficiency
• Absence of clear rational principles
• Development based on a haphazard, trial-and-error search for effective predictors
• With minimal rationality
• Large lists of preferences to discriminate between professions
• Selection of items with high correlation with the criterion in successive fashion (the multiple regression challenge): low inter-item correlation and high correlation with the criterion (a weakness of aptitude tests)
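The successive, regression-based item selection described above can be sketched as greedy forward selection. Hypothetical sketch: the function names, the constructed data and the use of R² as the selection rule are assumptions for illustration. `item_b` is built to largely duplicate `item_a`, so it is skipped despite its fairly high criterion correlation, while the less predictive but uncorrelated `item_c` is chosen. This mirrors the preference for low inter-item correlation and high criterion correlation.

```python
def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the columns (with intercept)."""
    xs = [center(col) for col in columns]
    yc = center(y)
    xtx = [[sum(p * q for p, q in zip(xi, xj)) for xj in xs] for xi in xs]
    xty = [sum(p * q for p, q in zip(xi, yc)) for xi in xs]
    beta = solve(xtx, xty)
    explained = sum(b * c for b, c in zip(beta, xty))
    return explained / sum(v * v for v in yc)

def forward_select(items, criterion, k):
    """Greedily add, one at a time, the item that most increases R^2."""
    chosen, history = [], []
    for _ in range(k):
        remaining = [name for name in items if name not in chosen]
        best = max(remaining, key=lambda name: r_squared(
            [items[n] for n in chosen + [name]], criterion))
        chosen.append(best)
        history.append(r_squared([items[n] for n in chosen], criterion))
    return chosen, history

# Hypothetical, constructed scores for eight examinees.
items = {
    "item_a": [1, 1, 1, 1, -1, -1, -1, -1],
    "item_b": [1.5, 0.5, 1.5, 0.5, -0.5, -1.5, -0.5, -1.5],  # mostly item_a again
    "item_c": [1, 1, -1, -1, 1, 1, -1, -1],                  # uncorrelated with item_a
}
criterion = [3, 3, 1, 1, -1, -1, -3, -3]  # later job-proficiency ratings

chosen, history = forward_select(items, criterion, 2)
print(chosen, [round(h, 2) for h in history])  # ['item_a', 'item_c'] [0.8, 1.0]
```

The sketch assumes the chosen items are never perfectly collinear; otherwise the normal equations become singular.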
Achilles heel of aptitude testing
• Robust criterion measures
• Validity of criterion measures
• 2 major components of the criterion problem:
1. The definition of the criterion: subjective judgment and widespread lack of agreement over occupational success
2. The development of a procedure to measure the criterion
Thorndike (1949): 3 categories of criteria
1. Ultimate: the complete final goal of a particular type of selection; multifaceted and not available for direct study
2. Intermediate
3. Immediate
Validation therefore has to fall back on categories 2 and 3
Blind empiricism is fragile and dangerous, as Messick repeatedly said from the 1970s to the 1990s
Mid-1940s: Paul Meehl and Lee Cronbach, construct validity
Paul Meehl:
Dissatisfied with client self-ratings
A self-rating should be used not as a surrogate for behavior but as an indirect sign of something deeper, because it requires:
1. An appropriate level of self-understanding
2. Willingness to disclose
• Lee Cronbach:
Impact of item format
Response set: a tendency to respond to items in a characteristic way, independent of their content
6 kinds of response set: giving many responses, speed, accuracy, gambling…
A threat to validity: different individuals display different response sets on the same set of items
Solution: use true/false items less and multiple-choice items more
Cronbach (1949): 5 technical criteria of a good test
1. Validity
2. Reliability
3. Objectivity
4. Norms
5. Good items
2 approaches: logical analysis (psychological understanding of the attribute) and empirical evidence
Cronbach (1949): validity as the correspondence of the test to the definition of the attribute
Some items correspond to the definition of the attribute yet bring in irrelevant variables that make them impure:
1. Items that different test takers answer by different methods
2. Items to which some test takers from certain cultural groups have limited access
3. Items that are vulnerable to response sets
4. Items that correspond to the content yet fail to assess the desired processes
Cronbach (1949): ultimate considerations
1. Logical analysis is inferior to empirical evidence.
2. Most frequently used criteria: instructor or supervisor ratings, and other tests of the same attribute
3. Discussed the criterion problem in depth
4. Rise of a particular empirical approach, factorial validity: the degree to which a test purely measures one type of ability
Newton ch2

Contenu connexe

Tendances

Calallen parent night staar high school 2012
Calallen parent night staar high school 2012Calallen parent night staar high school 2012
Calallen parent night staar high school 2012
rsendejo
 
NED 203 Criterion Referenced Test & Rubrics
NED 203 Criterion Referenced Test & RubricsNED 203 Criterion Referenced Test & Rubrics
NED 203 Criterion Referenced Test & Rubrics
Carmina Gurrea
 
254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx
254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx
254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx
ItsssClarizza
 

Tendances (20)

Measurement and Assessment in Teaching 11th Edition Miller Test Bank
Measurement and Assessment in Teaching 11th Edition Miller Test BankMeasurement and Assessment in Teaching 11th Edition Miller Test Bank
Measurement and Assessment in Teaching 11th Edition Miller Test Bank
 
Qualitative item analysis
Qualitative item analysisQualitative item analysis
Qualitative item analysis
 
Calallen parent night staar high school 2012
Calallen parent night staar high school 2012Calallen parent night staar high school 2012
Calallen parent night staar high school 2012
 
NED 203 Criterion Referenced Test & Rubrics
NED 203 Criterion Referenced Test & RubricsNED 203 Criterion Referenced Test & Rubrics
NED 203 Criterion Referenced Test & Rubrics
 
Administering, Analyzing, and Improving the Test or Assessment
Administering, Analyzing, and Improving the Test or AssessmentAdministering, Analyzing, and Improving the Test or Assessment
Administering, Analyzing, and Improving the Test or Assessment
 
254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx
254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx
254457273-LET-Review-Assessment-of-Learning-Test-Items.pptx
 
standardized Achievement tests SAT
standardized Achievement tests SATstandardized Achievement tests SAT
standardized Achievement tests SAT
 
Methods of teaching copy
Methods of teaching copyMethods of teaching copy
Methods of teaching copy
 
Algebra Competency
Algebra CompetencyAlgebra Competency
Algebra Competency
 
What is a test
What is a testWhat is a test
What is a test
 
Standardized Test
Standardized TestStandardized Test
Standardized Test
 
Criterion-Referenced Assessment Review
Criterion-Referenced Assessment ReviewCriterion-Referenced Assessment Review
Criterion-Referenced Assessment Review
 
International comparisons in senior secondary assessments
International comparisons in senior secondary assessmentsInternational comparisons in senior secondary assessments
International comparisons in senior secondary assessments
 
Multiple choice tests
Multiple choice testsMultiple choice tests
Multiple choice tests
 
Norm referenced & criterion-referenced tests
Norm referenced & criterion-referenced testsNorm referenced & criterion-referenced tests
Norm referenced & criterion-referenced tests
 
NORM- AND CRITERION-REFERENCED TESTS AND CONTENT VALIDITY EVIDENCE
NORM- AND CRITERION-REFERENCED TESTS AND CONTENT VALIDITY EVIDENCENORM- AND CRITERION-REFERENCED TESTS AND CONTENT VALIDITY EVIDENCE
NORM- AND CRITERION-REFERENCED TESTS AND CONTENT VALIDITY EVIDENCE
 
Nrt and crt
Nrt and crtNrt and crt
Nrt and crt
 
Rone ryan
Rone ryanRone ryan
Rone ryan
 
Chapter 6
Chapter 6Chapter 6
Chapter 6
 
Testing and assesment
Testing and assesmentTesting and assesment
Testing and assesment
 

En vedette

Relative clauses review exercises 1
Relative clauses   review exercises 1Relative clauses   review exercises 1
Relative clauses review exercises 1
Sandra Àlvarez
 
Creative Actuarial Resume
Creative Actuarial ResumeCreative Actuarial Resume
Creative Actuarial Resume
Michael Dixon
 

En vedette (20)

Clase 6 teoría del productor
Clase 6  teoría del productorClase 6  teoría del productor
Clase 6 teoría del productor
 
Keyhole implant placement in upper jaw
Keyhole implant placement in upper jawKeyhole implant placement in upper jaw
Keyhole implant placement in upper jaw
 
Top 10 Technological Breaktroughs In 2008
Top 10 Technological Breaktroughs In 2008Top 10 Technological Breaktroughs In 2008
Top 10 Technological Breaktroughs In 2008
 
Relative clauses review exercises 1
Relative clauses   review exercises 1Relative clauses   review exercises 1
Relative clauses review exercises 1
 
How to-thrive
How to-thriveHow to-thrive
How to-thrive
 
DEPLIANT convention
DEPLIANT conventionDEPLIANT convention
DEPLIANT convention
 
The One Thing You Must Do To Have New Ideas
The One Thing You Must Do To Have New IdeasThe One Thing You Must Do To Have New Ideas
The One Thing You Must Do To Have New Ideas
 
2015 - Images of JULY - July 09 - July 15
2015 - Images of JULY - July 09 - July 152015 - Images of JULY - July 09 - July 15
2015 - Images of JULY - July 09 - July 15
 
Enhanced Mobile IP Handover Using Link Layer Information
Enhanced Mobile IP Handover Using Link Layer InformationEnhanced Mobile IP Handover Using Link Layer Information
Enhanced Mobile IP Handover Using Link Layer Information
 
Desmistificando o Facebook
Desmistificando o Facebook Desmistificando o Facebook
Desmistificando o Facebook
 
Part 3 of the first Israeli Hackathon for students in primary schools
Part 3 of the first Israeli Hackathon for students in primary schoolsPart 3 of the first Israeli Hackathon for students in primary schools
Part 3 of the first Israeli Hackathon for students in primary schools
 
Creative Actuarial Resume
Creative Actuarial ResumeCreative Actuarial Resume
Creative Actuarial Resume
 
Compost
CompostCompost
Compost
 
Dimension economica sostenible
Dimension   economica sostenibleDimension   economica sostenible
Dimension economica sostenible
 
UX and Tech Teams Integration
UX and Tech Teams IntegrationUX and Tech Teams Integration
UX and Tech Teams Integration
 
動画のあれこれ
動画のあれこれ動画のあれこれ
動画のあれこれ
 
The Beauty of Paganism
The Beauty of PaganismThe Beauty of Paganism
The Beauty of Paganism
 
Simple Steps to Implement Kanban in Lean Manufacturing
Simple Steps to Implement Kanban in Lean ManufacturingSimple Steps to Implement Kanban in Lean Manufacturing
Simple Steps to Implement Kanban in Lean Manufacturing
 
Guia de boas práticas em redes sociais
Guia de boas práticas em redes sociaisGuia de boas práticas em redes sociais
Guia de boas práticas em redes sociais
 
The Middleware technology that connects the enterprise
The Middleware technology that connects the enterpriseThe Middleware technology that connects the enterprise
The Middleware technology that connects the enterprise
 

Similaire à Newton ch2

Fundamental Issues in MeasurementAdvanced Measurement and Eval.docx
Fundamental Issues in MeasurementAdvanced Measurement and Eval.docxFundamental Issues in MeasurementAdvanced Measurement and Eval.docx
Fundamental Issues in MeasurementAdvanced Measurement and Eval.docx
budbarber38650
 
Linn 2000
Linn 2000Linn 2000
Linn 2000
clinic
 
Assignment #71. What is the importance of communication, negotia.docx
Assignment #71. What is the importance of communication, negotia.docxAssignment #71. What is the importance of communication, negotia.docx
Assignment #71. What is the importance of communication, negotia.docx
festockton
 
comparative and non-comparative evaluation
comparative and non-comparative evaluationcomparative and non-comparative evaluation
comparative and non-comparative evaluation
u067328
 

Similaire à Newton ch2 (20)

Maryam Bolouri
Maryam BolouriMaryam Bolouri
Maryam Bolouri
 
Career Maturity Inventory Presentation
Career Maturity Inventory PresentationCareer Maturity Inventory Presentation
Career Maturity Inventory Presentation
 
Fundamental Issues in MeasurementAdvanced Measurement and Eval.docx
Fundamental Issues in MeasurementAdvanced Measurement and Eval.docxFundamental Issues in MeasurementAdvanced Measurement and Eval.docx
Fundamental Issues in MeasurementAdvanced Measurement and Eval.docx
 
pr2 dll week 1.docx
pr2 dll week 1.docxpr2 dll week 1.docx
pr2 dll week 1.docx
 
Rmic 823 master_syllabus_2_o1o
Rmic 823 master_syllabus_2_o1oRmic 823 master_syllabus_2_o1o
Rmic 823 master_syllabus_2_o1o
 
Linn 2000
Linn 2000Linn 2000
Linn 2000
 
Developing an in-house speaking assessment: Rasch analysis for action research
Developing an in-house speaking assessment: Rasch analysis for action researchDeveloping an in-house speaking assessment: Rasch analysis for action research
Developing an in-house speaking assessment: Rasch analysis for action research
 
A New Generation of Assessments: 3 Things You Need to Know
A New Generation of Assessments: 3 Things You Need to KnowA New Generation of Assessments: 3 Things You Need to Know
A New Generation of Assessments: 3 Things You Need to Know
 
Standard progressive matrices
Standard progressive matricesStandard progressive matrices
Standard progressive matrices
 
Src Voc
Src VocSrc Voc
Src Voc
 
Intro to philosophy Module1_Q1.pptx
Intro to philosophy Module1_Q1.pptxIntro to philosophy Module1_Q1.pptx
Intro to philosophy Module1_Q1.pptx
 
Qmet 252
Qmet 252Qmet 252
Qmet 252
 
Validity in Psychological Testing
Validity in Psychological TestingValidity in Psychological Testing
Validity in Psychological Testing
 
Fabio Arico
Fabio AricoFabio Arico
Fabio Arico
 
shorthistoryassessment.ppt
shorthistoryassessment.pptshorthistoryassessment.ppt
shorthistoryassessment.ppt
 
Assignment #71. What is the importance of communication, negotia.docx
Assignment #71. What is the importance of communication, negotia.docxAssignment #71. What is the importance of communication, negotia.docx
Assignment #71. What is the importance of communication, negotia.docx
 
Assessment+summary+from+workshop+march26 2012 bonnie
Assessment+summary+from+workshop+march26 2012 bonnieAssessment+summary+from+workshop+march26 2012 bonnie
Assessment+summary+from+workshop+march26 2012 bonnie
 
comparative and non-comparative evaluation
comparative and non-comparative evaluationcomparative and non-comparative evaluation
comparative and non-comparative evaluation
 
Presentation1 new copy
Presentation1 new   copyPresentation1 new   copy
Presentation1 new copy
 
Nursing Research IntroDuction SOP Hypothesis.ppt
Nursing Research IntroDuction SOP Hypothesis.pptNursing Research IntroDuction SOP Hypothesis.ppt
Nursing Research IntroDuction SOP Hypothesis.ppt
 

Plus de Allame Tabatabaei

Plus de Allame Tabatabaei (20)

political discourse
political discoursepolitical discourse
political discourse
 
discourse analysis
discourse analysis discourse analysis
discourse analysis
 
flowerdew basics
 flowerdew basics  flowerdew basics
flowerdew basics
 
religion discourse analysis
religion discourse analysisreligion discourse analysis
religion discourse analysis
 
discourse analysis EAP
discourse analysis EAPdiscourse analysis EAP
discourse analysis EAP
 
General points in letter writing
General points in letter writing General points in letter writing
General points in letter writing
 
Edmodo presentations
Edmodo presentationsEdmodo presentations
Edmodo presentations
 
Coleman1,2
Coleman1,2Coleman1,2
Coleman1,2
 
White bolouri
White bolouriWhite bolouri
White bolouri
 
Mc kay bolouri
Mc kay bolouriMc kay bolouri
Mc kay bolouri
 
Attitudes bolouri
Attitudes bolouriAttitudes bolouri
Attitudes bolouri
 
Swan.bolouri
Swan.bolouriSwan.bolouri
Swan.bolouri
 
Bell.bolouri
Bell.bolouriBell.bolouri
Bell.bolouri
 
Id
IdId
Id
 
structural
structuralstructural
structural
 
Regression presentation
Regression presentationRegression presentation
Regression presentation
 
attitide anxiety bolouri
 attitide anxiety bolouri attitide anxiety bolouri
attitide anxiety bolouri
 
ANxiety bolouri
ANxiety bolouriANxiety bolouri
ANxiety bolouri
 
Irt assessment
Irt assessmentIrt assessment
Irt assessment
 
Bolouri qualitative method
Bolouri qualitative methodBolouri qualitative method
Bolouri qualitative method
 

Dernier

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 

Dernier (20)

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx

Newton ch2

  • 1. The Early Years of Validity, 1800s-1951. Maryam Bolouri
  • 2. Major developments in England, France, Germany, and the USA  1836: matriculation examinations  1845: first in USA, the superiority of written exam over oral quiz  1853: India act for impartial selection for civil services  1858: local examinations in OXFORD and Cambridge  Development of statistical approach in Britain such as Spearman contribution
  • 3. Major developments in England, France, Germany, and the USA  1904: Binet in France , development of a series of test to discriminate unmotivated and incapable children from the others  USA,Yerkes et al. development of intelligence test in army recruits  Purpose: bring scientific methods to the study of edu such as achievement test or development of mental tests  Problem: growing discontent regarding the unreliability of marks and unfair evaluation by human minds
  • 4. The “personal equation” concern. Solution: sentence-completion, true/false, and multiple-choice items… Development of objective, standardized assessment (first rooted in the USA, so validity is a product of North America). This led to a proliferation of published standardized tests, and of research into tests and testing, from 1910 to 1920.
  • 5. The outcome of the pre-1921 period: structured, objective assessment; a distinction between sub-domains of educational and psychological measurement (1. professional communities: diagnosis, achievement, selection; 2. scientific communities: exploring personality characteristics and innate differences); a distinction between types of tests (linguistic vs. performance, individual vs. group, written and standardized tests); recognition of the correlation coefficient as a tool for judging the quality of tests.
  • 6. Post-1921 era: The term “validity” began to take root in the lexicon of researchers and practitioners. 1911, Freeman: technique and validity of test methods; 1915, Terman: evaluated the validity of intelligence and IQ tests; 1916, Starch: referred to the validity or fairness of measures; 1916, Thorndike: essentials of a valid scale; 1919, APA attempts at professional certification in response to the use of mental tests by unqualified individuals.
  • 7. Post 1921 era  1921 NADER national asso of Directors of edu research: seek standardization and consistency among concepts and procedures (similar to APA attempts in 1895, 1906).  Regulations proposed by them: 1. Preparation and selection 2. Experimental org of test and instruction 3. Trail of tentative test 4. Final org of test 5. Final cond of test (scoring, tabulation and interpretation) 6. DetermineV 7. Determine R 8. Determine norms
  • 8. 1st official definition of V  By NADER  Challenged to promote and develop new methods  1st classic definition ofV: The degree to which a test or examination measures what it purports to measure The idea of criterion was central to this and the dominant approaches were predictive or concurrent ones. Content consideration existed yet was not sig and robust 1915—1930 boom period: new tests multiplied like rabbits, being uncritical to the instruments and the results
  • 9. Early years:  Over simplistic descriptions  Elaboration of insights that had been established before  Elevation of empirical evidence at the expense of logical analysis (dust-bowl empiricism) According to Shepard: 1920—1950: defense to test criterion correlations 1940s:V= predictive Co. CO According to Kane: criterion phase According to Cronbach: whole ofV theory: prediction
  • 10. Some issues regarding the early years: 1. We cannot ignore the early years (the theory of prediction; descriptive and explanatory investigations). Omitting the early years from the discussion is counterproductive, and we should not teach validity from a 1954 baseline. Only with reference to a 1921 baseline can the transition from the trinitarian conception of validity to present-day theory be understood.
  • 11. Some issues regarding the early years: 2. Too many seminal works in the early years. There were so many seminal works, each with new perspectives, that no coherent tradition could emerge. The 1920s were prolific for educational measurement, and perspectives differed among authors within sub-domains as well as across sub-domains.
  • 12. Some issues regarding the early years: 3. Validity in different ways and phases. Both world wars influenced testing and validation. During World War I, mental testing was implemented on a large scale, and Otis introduced a stencil scoring method for rapid marking. The Army Alpha and Beta tests of military aptitude gave mental testing publicity and prestige. Test construction became mechanical, aimed at predicting criterion measures (blindly empirical). This is only one side of a complex story running from the mid-19th to the mid-20th century (to 1952).
  • 13. The prediction phase as a caricature: 1) widespread adoption of blindly empirical methods, specifically aptitude testing for the army; 2) degradation of the classic definition over time, as the method for measuring validity was mistaken for the definition of validity. This degradation consists of three stages: a) quality of measurement; b) degree of correlation between test and criterion; c) correlation coefficient between test and criterion.
  • 14. From a to b: 1922, McCall: only through correlations do we know what a test measures. Classic definition: discrete validity and validation; it was a conceptual abstraction, with a hypothetical true proficiency rank as an absolute criterion. There is no single true proficiency rank, but a range of ranks. No sense of prediction: validity was defined only in terms of the correlation between actual test results and hypothetical proficiency.
  • 15. From a to b: 1922, McCall: only through correlations do we know what a test measures. Two methods to determine the correspondence: 1. prolonged, careful observation in real-life situations → determine true proficiency and use it as the criterion → rank students on the test → correlate the two rankings; 2. rank pupils of known proficiency → rank them on the test → correlate the rankings.
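Both of McCall's methods end by correlating two rankings of the same pupils. The slides do not name the coefficient. Assuming Spearman's rank-order formula for untied ranks, a minimal Python sketch might look like this. The function name and the data are hypothetical.

```python
def spearman_rho(criterion_ranks, test_ranks):
    """Spearman rank correlation for two untied rankings of the same pupils:
    rho = 1 - 6 * sum(d**2) / (n * (n**2 - 1)), where d is each rank difference.
    Assumption: no tied ranks (the simple formula is only exact without ties)."""
    n = len(criterion_ranks)
    if n != len(test_ranks):
        raise ValueError("both rankings must cover the same pupils")
    if n < 2:
        raise ValueError("at least two pupils are needed")
    d_squared = sum((c - t) ** 2 for c, t in zip(criterion_ranks, test_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: five pupils ranked by prolonged observation
# ("true proficiency", the criterion) and by their scores on a new test.
criterion_ranks = [1, 2, 3, 4, 5]
test_ranks = [2, 1, 3, 4, 5]  # the test swaps only the top two pupils
print(spearman_rho(criterion_ranks, test_ranks))  # prints 0.9
```

A rho near 1 would be read, in McCall's terms, as the test ordering pupils much as their true proficiency does.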
  • 16. Other approaches to developing a criterion: expert or teacher judgment; results of multiple existing tests that measure the same thing; results from specific tests.
  • 17. From b to c: the criterion changes from a conceptual abstraction to more concrete, pragmatic measures. Coefficient of validity = correlation coefficient between test scores and criterion scores. Validity = observed rather than hypothetical agreement between test scores and true proficiency. Validity = empirical correlation. No one questioned the validity of the criterion scores! Definition and method fused. Test use was underscored: each test has a different validity for each use.
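Slide 17 reduces validity to a single number. Here is a minimal sketch of that coefficient, computed as a Pearson product-moment correlation between test scores and criterion scores. The function name and data are hypothetical, and treating "correlation coefficient" as Pearson's r is an assumption.

```python
import math


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and criterion scores,
    i.e. the 'coefficient of validity' in the empirical sense of slide 17."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two equal-length score lists with n >= 2")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("a score list with zero variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: one test evaluated against two different criteria.
test = [1, 2, 3, 4]
course_grades = [1, 3, 2, 4]
supervisor_ratings = [2, 1, 4, 3]
print(validity_coefficient(test, course_grades))       # prints 0.8
print(validity_coefficient(test, supervisor_ratings))  # prints 0.6
```

The same test yields a different coefficient against each criterion, which matches the slide's point that a test has a different validity for each use. Nothing in the computation checks the validity of the criterion scores themselves.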
  • 18. From b to c: the criterion changes from a conceptual abstraction to more concrete, pragmatic measures. Dominance of an atheoretical definition. Distinction between practical validity and factorial validity. Practical validity: “a test is valid for anything with which it correlates” (Guilford, 1946). There are two kinds of validity, and practical validity addresses the fundamental question of validity. Undue emphasis on empirical evidence; problems: an inadequate definition and the criterion problem.
  • 19. Terman (1928), three primary concerns of educational and psychological measurement: 1. achievement, 2. intelligence, 3. aptitudes. 1. School achievement (Walter Monroe): validity as a multifaceted concept, based on correlation but also on an expressed conceptual definition of validity: a. objectivity in describing performances (rater); b. reliability (coefficient of reliability, index of reliability, error of measurement, coefficient of correspondence, overlapping of grade groups); c. discrimination (agreement with the normal curve); d. comparison with criterion measures; e. validity inferences based on test structure and administration.
  • 20. Terman (1928), three primary concerns of educational and psychological measurement: 1. achievement, 2. intelligence, 3. aptitudes. Six threats to valid interpretation: 1. Do the tasks require other abilities? 2. Can the tasks be answered by a variety of methods (other than the intended one)? 3. Is the test administered under a variety of conditions? 4. Do students continue to exercise their ability across all tasks? 5. Are the tasks representative of the field of ability being measured? 6. Are all students given this opportunity?
  • 21. Unitary conception of validity: integration of multiple sources of empirical evidence and logical analysis. Two primary categories of sources of evidence: 1. expert opinion vs. experimental (Ruch, 1929); 2. curricular vs. statistical (Ruch, 1933). Three approaches to logical analysis (Ruch, 1929): 1. competent persons' judgment of the appropriateness of content; 2. alignment of content with textbooks; 3. alignment of content with the recommendations of national education committees.
  • 22. Terman (1928), three primary concerns of educational and psychological measurement: 1. achievement, 2. intelligence, 3. aptitudes. Fundamental role: extensive sampling in school achievement tests, by random sampling from the field or by representing its most important elements, so as to measure the same thing or attribute; tests parallel to actual teaching; centrality of logical analysis. Problem: no field is perfectly homogeneous, so there will always be a degree of compromise. Major innovation: scaling, i.e., tests with items of different difficulty levels, so the items of a test were not selected on the basis of content and representativeness. Problem: tension between discrimination and sampling.
  • 23. From random sampling to restricted sampling: It is not possible to construct a robust measure of overall achievement from a weighted sample of behavior across the entire achievement domain. So instead of drawing a representative sample, a test should tap the essence of achievement, which means selecting items that correlate highly with general achievement; each item contributes to the essence of the general achievement attribute. Items that discriminate between high- and low-achieving students correlate highly with the criterion.
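Slide 23's last point is that the items to keep are those that discriminate between high and low students. One classical way to operationalize this is the upper-lower discrimination index D = p_upper - p_lower. The sketch below makes three assumptions that are not details from the slides: it uses the total test score as the stand-in for "general achievement", it uses the conventional upper and lower 27% groups, and the function name is hypothetical.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower for one item.

    item_correct: 1 if the pupil answered the item correctly, else 0.
    total_scores: each pupil's total score, used here as the criterion.
    fraction: share of pupils placed in each extreme group (0 < fraction <= 0.5).
    """
    n = len(item_correct)
    if n != len(total_scores) or n < 2:
        raise ValueError("need equal-length lists for at least two pupils")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    # Group size, capped at n // 2 so the two groups never overlap.
    group_size = min(n // 2, max(1, round(n * fraction)))
    # Pupil indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:group_size], order[-group_size:]
    p_upper = sum(item_correct[i] for i in upper) / group_size
    p_lower = sum(item_correct[i] for i in lower) / group_size
    return p_upper - p_lower


# Hypothetical data: ten pupils' total scores and their answers to three items.
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
items = {
    "discriminating": [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
    "everyone_correct": [1] * 10,
    "reversed": [0, 0, 0, 1, 0, 1, 0, 1, 1, 1],
}
for name, responses in items.items():
    print(name, discrimination_index(responses, totals))
# discriminating 1.0, everyone_correct 0.0, reversed -1.0
```

Under this logic, items with D near zero or below would be dropped even if they cover important content. That is the tension between discrimination and sampling noted on slide 22.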
  • 24. From random sampling to restricted sampling  V from curriculum viewpoint andV from general achievement view point need to arrive at a compromise.  A large unresolved tension can be detected throughout the study by Lindquist (1936)
  • 25. Terman (1928) 3 primary concerns of edu and psycho measurement 1. achievement 2. intelligence 3. aptitudes  Tyler (1931):V in terms of usefulness of the test in measuring the attainment of course objectives  He was not opposed to empirical approach, but not impressed by the use ofT marks as empirical criterion  His suggestion: development of preliminary tests for each course objectives to help  1) creating comprehensive criterion measures  2) diagnostic purposes Then preparation of some practical tests to be validated by correlation
  • 26. Tyler’s concerns: 1. Sampling 2. Test construction 3. Validity 4. Mental process, no distinction btw content of subj and the required mental process, and items test info not the interpretation or application of principles 5. Negative impacts of tests on instruction and the reform of curriculum. Studying and teaching were adapted to the emphasis of tests
  • 27. Tension between the empirical and the logical, 1930s-1940s: overemphasis on the empirical brings inadequate criteria for establishing validity and a backwash effect on teaching and learning; overemphasis on the logical runs into the impossibility of representative sampling and the fallibility of human judgment. Tyler: rational hypotheses in test construction. The pendulum swings against empirical considerations (the technician's viewpoint).
  • 28. Two key principles of the evaluation movement: 1. evaluation could not begin until the curriculum had been defined in terms of behavioral objectives; 2. any useful device might be employed to produce an account of pupil growth: teacher judgment, essay examinations, objective tests.
  • 29. Terman (1928), three primary concerns of educational and psychological measurement: 1. achievement, 2. intelligence, 3. aptitudes. Logical approach: raw brain power; the Binet-Simon scales were extended. Problem: a thorough description of the universe of intelligent behavior was not straightforward, since there was no clear definition. Binet: faculties are different from general intelligence, and a single test can be a test of intelligence. Post-Binet: not a single test but combined tests (manifold and heterogeneous); performance on a test is the product of both faculties and general intelligence.
  • 30. Terman (1928), three primary concerns of educational and psychological measurement: 1. achievement, 2. intelligence, 3. aptitudes. Solution: permissive sampling, i.e., assessing considerably more than the essence of intelligence. Validity can be maximized by intentional construct under-representation or intentional construct-irrelevance. Assumption: random irrelevant item variance cancels out by the law of averages.
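The law-of-averages assumption can be made concrete under a simple additive model. This model is an assumption of this illustration, not something spelled out in the slides. Each item score X_i is a common component T (the intended attribute) plus an item-specific irrelevant component E_i. The E_i are uncorrelated with each other and with T, and all share the variance σ_E². Averaging k items then shrinks the irrelevant variance by a factor of k:

```latex
\begin{align*}
X_i &= T + E_i, \qquad i = 1, \dots, k \\
\bar{X} &= T + \frac{1}{k}\sum_{i=1}^{k} E_i \\
\operatorname{Var}(\bar{X}) &= \sigma_T^2 + \frac{\sigma_E^2}{k} \\
\frac{\sigma_T^2}{\operatorname{Var}(\bar{X})} &= \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2/k} \;\longrightarrow\; 1 \quad \text{as } k \to \infty
\end{align*}
```

Under these assumptions, the share of composite variance due to T rises toward 1 as heterogeneous items are added. This is the sense in which irrelevant variance is expected to cancel out. Irrelevant components that are correlated across items (for example, a shared response set) would not shrink this way.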
  • 31. Terman (1928), three primary concerns of educational and psychological measurement: 1. achievement, 2. intelligence, 3. aptitudes. Empirical approach: a criterion measure of intelligence is needed. During World War I, a number of reputable tests of higher quality were adopted as yardsticks: Otis Group Test (most valid); Terman Group; Miller Group Test (least valid); Army Alpha. Cattell (1943) promoted factor analysis as an important validation technique and transformed it from a lay activity into scientific practice.
  • 32. Terman (1928), three primary concerns of educational and psychological measurement: 1. achievement, 2. intelligence, 3. aptitudes. For the purposes of vocational guidance and selection. First assumption: aptitudes were stable, if not innate. Second assumption: aptitudes differ across and within individuals along continua. What distinguished aptitude measurement: the criterion was not something in the present but in the future, since successful performance in a vocation = the exercise of skills and abilities that had not yet been developed. Problem: how should such a test be validated?
  • 33. Empirical approach to aptitude testing: the idea of sampling is meaningless here, which led to the elevation of empirical approaches in four stages: 1) administer the aptitude test; 2) wait until the required skills and abilities have been acquired; 3) assess job proficiency in situ; 4) correlate the test results with the assessment of job proficiency.
  • 34. Empirical approach to aptitude testing: absence of clear rational principles; development based on a haphazard, trial-and-error search for effective predictors, with minimal rationality; long lists of preferences to discriminate between professions; successive selection of items with high correlation to the criterion (a multiple regression challenge), favoring low inter-item correlation and high correlation with the criterion (a weakness of aptitude tests).
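Successive item selection of the kind described on slide 34 can be sketched as greedy forward selection. At each step, add the item that most increases the correlation between the summed composite and the criterion, and stop when no remaining item helps. This is an assumed, simplified stand-in that uses unweighted sums rather than regression weights. The function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either list has zero variance
    (treated here as 'no predictive value' for selection purposes)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def successive_item_selection(items, criterion, max_items):
    """Greedy forward selection of items against a criterion.

    items: dict mapping item name -> list of scores (one per person).
    Returns the selected item names and the composite's final correlation.
    """
    remaining = dict(items)
    composite = [0.0] * len(criterion)
    selected, best_r = [], float("-inf")
    while remaining and len(selected) < max_items:
        # Correlation with the criterion if each candidate were added.
        trials = [
            (pearson([c + s for c, s in zip(composite, scores)], criterion), name)
            for name, scores in remaining.items()
        ]
        r, name = max(trials)
        if r <= best_r:
            break  # no remaining item improves prediction of the criterion
        best_r = r
        selected.append(name)
        composite = [c + s for c, s in zip(composite, remaining.pop(name))]
    return selected, best_r


# Hypothetical data: four applicants, a later job-proficiency criterion,
# and three candidate items. Item b largely duplicates item a.
criterion = [0, 1, 2, 3]
items = {"a": [0, 0, 1, 1], "b": [0, 1, 1, 1], "c": [0, 1, 0, 1]}
selected, r = successive_item_selection(items, criterion, max_items=3)
print(selected, round(r, 3))  # prints ['a', 'c'] 0.949
```

In these data, item b correlates more strongly with the criterion on its own than item c does. Yet c is chosen second, because b largely overlaps with a. This is how successive selection ends up favoring items with low inter-item correlation and high criterion correlation.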
  • 35. The Achilles heel of aptitude testing: robust criterion measures; validity of criterion measures. Two major components of the criterion problem: 1. the definition of the criterion, given subjective judgment and widespread disagreement over what counts as occupational success; 2. the development of a procedure to measure the criterion.
  • 36. Thorndike (1949), three categories of criteria: 1. ultimate: the complete, final goal of a particular type of selection, multifaceted and not available for direct study; 2. intermediate; 3. immediate. Validation therefore has to fall back on categories 2 and 3. Blind empiricism is fragile and dangerous, as Messick repeatedly argued from the 1970s to the 1990s.
  • 37. Mid-1940s: Paul Meehl and Lee Cronbach, construct validity. Paul Meehl was dissatisfied with client self-ratings: a self-rating should be used not as a surrogate for behavior but as an indirect sign of something deeper, because an accurate self-rating requires 1. an appropriate level of self-understanding and 2. willingness to disclose.
  • 38. Mid-1940s: Paul Meehl and Lee Cronbach, construct validity. Lee Cronbach: the impact of item format. Response set: a tendency to respond to items differently depending on their format rather than their content. Six kinds of response set: giving many responses, speed, accuracy, gambling… A threat to validity: different individuals display different response sets on the same set of items. Solution: use true/false items less and multiple-choice items more.
  • 39. Cronbach (1949), five technical criteria of a good test: 1. validity; 2. reliability; 3. objectivity; 4. norms; 5. good items. Two approaches: logical analysis (psychological understanding of the attribute) and empirical evidence.
  • 40. Cronbach (1949): validity as the correspondence of a test to the definition of the attribute. Some items correspond to the definition of the attribute yet bring in irrelevant variables that make them impure: 1. items that different test takers answer by different methods; 2. items less accessible to test takers from certain cultural groups; 3. items vulnerable to response sets; 4. items that correspond to the content yet fail to assess the desired processes.
  • 41. Cronbach (1949), ultimate considerations: 1. logical analysis is inferior to empirical evidence; 2. the most frequently used criteria: instructor or supervisor ratings, and other tests of the same attribute; 3. an in-depth discussion of the criterion problem; 4. the rise of a particular empirical approach, factorial validity: the degree to which a test purely measures one type of ability.