LANGUAGE TESTING (English), Second Bimester. Language School. Teacher: Orlando V. Lizaldes E. April – August 2011
Second Bimester
 6 Measurement
 7 The social character of language tests
 8 New directions and dilemmas?
Testing is a matter of using data to establish evidence of learning. 
What makes a good test good? Its qualities: reliable, valid, practical. There is no such thing as a “good test”.
Validity; Reliability (standardized tests); Inference; Judgment; Test validation
Testing the test
Key questions in assessment. Validity: does this test measure what it is supposed to measure? Reliability: does this test or instrument consistently measure what it is supposed to measure?
The harder of the two concepts is… Reliability does not often apply to classroom teachers or classroom-based tests.
Reliability: conceptual understanding
It has to do with consistency of measurement: giving the same test to the same group of students should produce the same results.
Reliability is not really applied in classroom-based teaching. We don't have time to give the same test over and over to the same person to see whether it is reliable. For high-stakes tests, yes.
Validity: the degree to which the test actually measures what it is intended to measure.
Without validation, there is potential for unfairness and injustice, and that potential is in proportion to what is at stake. The validation procedure guarantees the FACE VALIDITY of the test.
MEASUREMENT What is measurement? It is the estimation of a physical quantity such as distance, energy, temperature, or time. A measurement finds the ratio of a physical quantity to a standard quantity of the same type; thus a measurement of length is the ratio of a physical length to some standard length, such as a standard meter.
MEASUREMENT Assessment usually involves allocating a score, an attractively simple number. “A rose is a rose is a rose” (Gertrude Stein, Sacred Emily). But a score is not a score is not a score, because different raters may give the same performance different scores. Measurement is a dauntingly technical field: means, percentiles, standard deviations, and statistics.
Measurement always involves some error, and so in science measurements are accompanied by error bounds.
QUANTIFICATION: the assigning of numbers and scores. MATHEMATICAL PROCEDURES: the matrix of scores is examined for various kinds of mathematical and statistical patterning, in order to investigate the extent to which the necessary properties are present in the assessment.
Investigating the properties of individual test items (ITEM ANALYSIS) and investigating rater characteristics are important for guaranteeing the meaningfulness and fairness of performance assessment. Item analysis is a normal part of test development (PILOT version, then OPERATIONAL version).
Correlation coefficient r: it expresses the extent to which one set of scores is knowable from another. In general r ranges from -1 to +1; when it is used as a reliability coefficient (for example, for inter-rater reliability) it is normally reported on a scale from 0 to 1.
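As an illustration of how a correlation coefficient can be computed for two raters, here is a minimal Python sketch of Pearson's r. The function name `pearson_r` and the two sets of rater scores are invented for this example; they do not come from the slides.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each list
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores given by two raters to the same five candidates
rater_a = [4, 5, 3, 6, 2]
rater_b = [4, 6, 3, 5, 2]
r = pearson_r(rater_a, rater_b)
print(round(r, 2))
```

A value close to 1 suggests the two raters rank the candidates in much the same way; a value near 0 suggests little agreement.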
Norm-referenced and Criterion-referenced MEASUREMENTS. Norm-referenced Measurement (N-R-M) adopts a framework of comparison between individuals for understanding the significance of any single score. In Criterion-referenced Measurement (C-R-M), individual performances are evaluated against a verbal description of a satisfactory performance at a given level.
Criterion-referenced: the criteria are not always easily reduced to a yes/no judgment.
Norm-referenced: scores may not be consistent across instruments. (www.utpl.edu.ec)
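The contrast between the two frameworks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the percentile-rank formula is one common convention, the group scores are invented, and the numeric `cut_score` is a hypothetical stand-in for the verbal description of satisfactory performance that criterion-referenced measurement actually uses.

```python
def percentile_rank(score, group_scores):
    """Norm-referenced view: where does this score fall relative to the group?

    Uses one common convention: percent of scores below, plus half of the ties.
    """
    below = sum(1 for s in group_scores if s < score)
    equal = sum(1 for s in group_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(group_scores)


def meets_criterion(score, cut_score):
    """Criterion-referenced view: is the score at or above a fixed standard?

    The numeric cut score is an assumption made for this sketch.
    """
    return score >= cut_score


group = [20, 20, 20, 18, 17, 14, 14]  # hypothetical class scores
candidate = 18

rank = percentile_rank(candidate, group)            # depends on the other test takers
passed = meets_criterion(candidate, cut_score=16)   # depends only on the standard
print(rank, passed)
```

The first result changes if the group changes; the second does not, which is the practical difference between the two interpretations.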
[Figure: bell curve of a normal distribution]
CENTRAL TENDENCY The central tendency of a distribution is an estimate of the “center” of a distribution of values.
CENTRAL TENDENCY There are three major types of estimates of central tendency:
 - Mean
 - Median
 - Mode
CENTRAL TENDENCY The mean, or average, is probably the most commonly used method of describing central tendency.
CENTRAL TENDENCY The Mean: to compute the mean, add up all the values and divide by the number of values.
CENTRAL TENDENCY The Mean. For example: 20, 20, 20, 18, 17, 14, 14. The sum of these 7 values is 123, so the mean is 123 / 7 ≈ 17.57.
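The rule "add up all the values and divide by the number of values" can be checked with a short Python sketch. It uses the seven scores listed in the example; the variable names are chosen only for illustration.

```python
# The seven scores listed in the example
scores = [20, 20, 20, 18, 17, 14, 14]

total = sum(scores)          # add up all the values
mean = total / len(scores)   # divide by the number of values

print(total, len(scores), round(mean, 2))
```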
CENTRAL TENDENCY The Median is the score found at the exact middle of the set of values. One way to compute the median is to list all scores in numerical order, and then locate the score in the center of the sample.
The Median EXAMPLES: 15, 15, 15, 15, 15, 17, 18, 20. There are 8 scores, and scores #4 and #5 represent the halfway point. Since both of these scores are 15, the median is 15. Example: find the median of {12, 3, 5}. Put them in order: 3, 5, 12. The middle number is 5, so the median is 5.
CENTRAL TENDENCY If the two middle scores have different values, you have to interpolate to determine the median. Suppose there are fourteen numbers, so that we don't have just one middle number but a pair of middle numbers: 3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56. In this example the middle numbers are 21 and 23. To find the value halfway between them, add them together and divide by 2: 21 + 23 = 44, and 44 ÷ 2 = 22. So the median in this example is 22.
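The median procedure described above (sort the scores, take the middle one, or average the two middle ones) can be sketched in Python as follows. The function name `median` is chosen for this illustration; Python's standard `statistics.median` does the same job.

```python
def median(scores):
    """Middle value of the sorted scores; average of the two middle values if the count is even."""
    if not scores:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(scores)   # list all scores in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]    # odd count: a single middle score
    # even count: interpolate halfway between the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2


three = median([12, 3, 5])
eight = median([15, 15, 15, 15, 15, 17, 18, 20])
fourteen = median([3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56])
print(three, eight, fourteen)
```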
The social character of language tests. Educational assessment has traditionally drawn its concepts and procedures from the field of psychology. When test reforms are introduced within the educational system, they are likely to figure prominently in the press and become matters of public concern.
Conventional proficiency tests have been used for purposes of exclusion. Industrialized countries have developed more flexible policies for the recognition and certification of specific work-related skills (competencies). International students need to meet a standard on a language test for academic purposes.
Computers and Language Testing. The proponents of computer-based testing can point to a number of advantages. First, scoring of fixed-response items can be done automatically, and the candidate can be given a score immediately. Second, the computer can deliver tests that are tailored to the particular abilities of the candidate.
It seems inefficient for all candidates to take all the questions on a test; clearly some are so easy for some candidates that they provide little information on their abilities; others are too hard to be of use. It makes sense to use the very limited time available for testing to focus on those items that are just within, and just beyond a candidate’s threshold of ability.
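The idea of focusing on items near a candidate's threshold of ability can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the item bank, the difficulty scale, the simulated candidate, and the simple "move up after a correct answer, down after a wrong one, halving the step each time" rule. It is not the method described in the slides or used by any particular test; operational adaptive tests typically rely on item response theory instead.

```python
def run_adaptive_test(item_difficulties, answers_correctly,
                      start_ability=0.0, step=1.0, max_items=4):
    """Toy adaptive test: always give the unused item closest to the current ability estimate."""
    ability = start_ability
    remaining = dict(item_difficulties)   # item id -> difficulty (hypothetical scale)
    administered = []
    while remaining and len(administered) < max_items:
        # Pick the item whose difficulty is nearest the current estimate
        item = min(remaining, key=lambda i: abs(remaining[i] - ability))
        difficulty = remaining.pop(item)
        correct = answers_correctly(item, difficulty)
        # Raise the estimate after a correct answer, lower it after a wrong one
        ability += step if correct else -step
        step /= 2                         # smaller adjustments as evidence accumulates
        administered.append((item, correct))
    return ability, administered


# Hypothetical item bank and a simulated candidate who can handle difficulty up to 1.2
bank = {"q1": -2.0, "q2": -1.0, "q3": 0.0, "q4": 1.0, "q5": 2.0, "q6": 3.0}
estimate, history = run_adaptive_test(bank, lambda item, d: d <= 1.2)
print(estimate, history)
```

In this run the very easy items (q1, q2) are never administered, which reflects the point that such items would provide little information about this candidate.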
The use of computers for the delivery of test materials raises questions of validity. For example, different levels of familiarity with computers will affect people's performance with them, and interaction with the computer may be a stressful experience for some students or candidates (McNamara, 2000, pp. 79-81).
New directions. Computer-based tests (CBT). Do raters react differently to printed versus handwritten texts? Semi-direct tests of speaking are cheaper to administer, but they raise questions of validity since there is no COMMUNICATION at all.

Summing up: Language testing remains a complex and perplexing activity. Language testing is an uncertain and approximate business at the best of times, even if to the outsider this may be camouflaged by its impressive, even daunting, technical trappings (McNamara, 2000, p. 86).
Consulted Bibliography
McNamara, T. (2000). Language Testing. Oxford University Press. London.
Heaton, J. B. (1998). Classroom Testing. Keys to Language Teaching. Longman. New York (USA).
Richards, J. C. (2005). Communicative Language Teaching. Cambridge University Press.
Brown, H. D. (2004). Language Assessment: Principles and Classroom Practices. Longman. United States.
IBT Tests (2004). McGraw-Hill.
Freeman, D., Richards, J. C. (2001). Teacher Learning in Language Teaching. Pearson. USA.
O'Malley, J. M., Valdez Pierce, L. (1996). Authentic Assessment for English Language Learners: Practical Approaches for Teachers. Longman. USA.