PRINCIPLES OF LANGUAGE ASSESSMENT
Dr.VMS
Brown’s Model
COMPONENTS OF LANGUAGE ASSESSMENT
1. Practicality
2. Reliability
3. Validity
4. Authenticity
5. Washback
1. PRACTICALITY
 An effective test is practical. This means that it
 Is not excessively expensive,
 Stays within appropriate time constraints,
 Is relatively easy to administer, and
 Has a scoring/evaluation procedure that is
specific and time-efficient.
PRACTICALITY
 A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations.
2. RELIABILITY
 A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of the reliability of a test may best be addressed by considering a number of factors that may contribute to the unreliability of a test. Consider the following possibilities (adapted from Mousavi, 2002, p. 804): fluctuations in the student, in scoring, in test administration, and in the test itself.
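The "similar results on two occasions" idea can be put in numbers. The sketch below is not from the slides: the scores are invented, and Pearson correlation is only one of several common ways to estimate test-retest reliability.

```python
# Illustrative sketch: estimate test-retest reliability as the Pearson
# correlation between two administrations of the same test.
# The score lists are invented sample data, not real results.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

first_sitting = [72, 85, 60, 90, 78]    # same five students,
second_sitting = [70, 88, 62, 91, 75]   # two different occasions

r = pearson(first_sitting, second_sitting)
print(f"test-retest reliability estimate: r = {r:.2f}")  # close to 1.0 = consistent
```

A high coefficient (here about 0.98) suggests the test ranks the same students similarly on both occasions; a low one would point to one of the sources of unreliability discussed next.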
2.1 STUDENT-RELATED RELIABILITY
 The most common learner-related issue in
reliability is caused by temporary illness,
fatigue, a “bad day,” anxiety, and other
physical or psychological factors, which may
make an “observed” score deviate from one’s
“true” score. Also included in this category are
such factors as a test-taker’s “test-wiseness” or
strategies for efficient test taking (Mousavi,
2002, p. 804).
2.2 RATER RELIABILITY
 Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly for lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In the story above about the placement test, the initial scoring plan for the dictations was found to be unreliable; that is, the two scorers were not applying the same standards.
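As a rough illustration (not from the slides; the rubric scores are invented), inter-rater consistency can be checked by counting exact agreements and averaging the gap between two raters:

```python
# Illustrative sketch: a crude inter-rater consistency check.
# Two raters score the same five essays on a 0-10 rubric; we count
# exact agreements and the mean discrepancy. The scores are invented.

rater_a = [7, 5, 9, 4, 6]
rater_b = [7, 6, 8, 4, 6]

exact = sum(a == b for a, b in zip(rater_a, rater_b))
mean_gap = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / len(rater_a)

print(f"exact agreement: {exact}/{len(rater_a)}")   # 3/5
print(f"mean discrepancy: {mean_gap:.1f} points")   # 0.4 points
```

In practice, formal statistics such as Cohen's kappa or an intraclass correlation would be used, but even this simple check would have exposed the dictation scorers who were not applying the same standards.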
2.3 TEST ADMINISTRATION RELIABILITY
 Unreliability may also result from the conditions in
which the test is administered. I once witnessed the
administration of a test of aural comprehension in
which a tape recorder played items for comprehension,
but because of street noise outside the building,
students sitting next to windows could not hear the
tape accurately. This was a clear case of unreliability
caused by the conditions of the test administration.
Other sources of unreliability are found in
photocopying variations, the amount of light in
different parts of the room, variations in temperature,
and even the condition of desks and chairs.
2.4 TEST RELIABILITY
 Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit. We all know people (and you may be included in this category!) who “know” the course material perfectly but who are adversely affected by the presence of a clock ticking away. Poorly written test items (that are ambiguous or that have more than one correct answer) may be a further source of test unreliability.
3. VALIDITY
 By far the most complex criterion of an effective test, and arguably the most important principle, is validity, “the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment” (Gronlund, 1998, p. 226). A valid test of reading ability actually measures reading ability, not 20/20 vision, nor previous knowledge in a subject, nor some other variable of questionable relevance. To measure writing ability, one might ask students to write as many words as they can in 15 minutes, then simply count the words for the final score. Such a test would be easy to administer (practical), and the scoring quite dependable (reliable). But it would not constitute a valid test of writing ability without some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.
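The word-count test described above can be sketched directly. The two essays below are invented; the point is that the score is perfectly reproducible (reliable) while saying nothing about writing quality (invalid):

```python
# Illustrative sketch of the word-count "writing test" from the text.
# Any scorer gets the same number (highly reliable), but the number
# ignores comprehensibility and organization, so it is not a valid
# measure of writing ability.

def word_count_score(essay: str) -> int:
    """Score = number of words written, nothing more."""
    return len(essay.split())

coherent = "The test was fair because it covered what we studied in class."
gibberish = ("fair " * 13).strip()  # the same word repeated 13 times

print(word_count_score(coherent))   # -> 12
print(word_count_score(gibberish))  # -> 13: gibberish "outscores" real prose
```

The repeated-word essay beats the coherent one, which is exactly why reliability and practicality alone cannot guarantee a valid test.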
3.1 CONTENT-RELATED EVIDENCE
 If a test actually samples the subject matter about which conclusions are to be drawn, and if it requires the test-takers to perform the behavior that is being measured, it can claim content-related evidence of validity, often popularly referred to as content validity (e.g., Mousavi, 2002; Hughes, 2003). You can usually identify content-related evidence observationally if you can clearly define the achievement that you are measuring.
3.2 CRITERION-RELATED EVIDENCE
 A second form of evidence of the validity of a test may be found in what is called criterion-related evidence, also referred to as criterion-related validity, or the extent to which the “criterion” of the test has actually been reached. You will recall that in Chapter 1 it was noted that most classroom-based assessment with teacher-designed tests fits the concept of criterion-referenced assessment. In such tests, specified classroom objectives are measured, and implied predetermined levels of performance are expected to be reached (e.g., 80 percent is considered a minimal passing grade).
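A minimal sketch of a criterion-referenced decision, assuming the 80 percent cut score the slide mentions (the student names and scores are invented):

```python
# Illustrative sketch: applying a criterion-referenced cut score.
# Unlike norm-referenced scoring, each student is judged against the
# fixed criterion, not against other students. Data are invented.

CUT_SCORE = 0.80  # the minimal passing level mentioned above

scores = {"student_1": 0.85, "student_2": 0.78, "student_3": 0.92}

verdicts = {
    name: ("pass" if proportion >= CUT_SCORE else "below criterion")
    for name, proportion in scores.items()
}

for name, verdict in verdicts.items():
    print(f"{name}: {scores[name]:.0%} -> {verdict}")
```

Note that every student could pass, or every student could fail; the criterion, not the class ranking, decides.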
3.3 CONSTRUCT-RELATED EVIDENCE
 A third kind of evidence that can support validity, but one that does not play as large a role for classroom teachers, is construct-related evidence, commonly referred to as construct validity. A construct is any theory, hypothesis, or model that attempts to explain observed phenomena in our universe of perceptions. Constructs may or may not be directly or empirically measured; their verification often requires inferential data.
3.4 CONSEQUENTIAL VALIDITY
 In addition to the three widely accepted forms of evidence above, two other categories may be of some interest and utility in your own quest for validating classroom tests. Messick (1989), Gronlund (1998), McNamara (2000), and Brindley (2001), among others, underscore the potential importance of the consequences of using an assessment. Consequential validity encompasses all the consequences of a test, including such considerations as its accuracy in measuring intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the (intended and unintended) social consequences of a test's interpretation and use.
3.5 FACE VALIDITY
 An important facet of consequential validity is the extent to which “students view the assessment as fair, relevant, and useful for improving learning” (Gronlund, 1998, p. 210), or what is popularly known as face validity. “Face validity refers to the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgment of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers” (Mousavi, 2002, p. 244).
4. AUTHENTICITY
 A fourth major principle of language testing is
authenticity, a concept that is a little slippery to
define, especially within the art and science of
evaluating and designing tests. Bachman and
Palmer (1996, p. 23) define authenticity as “the
degree of correspondence of the characteristics of
a given language test task to the features of a
target language task,” and then suggest an
agenda for identifying those target language
tasks and for transforming them into valid test
items.
5. WASHBACK
 A facet of consequential validity, discussed above, is “the
effect of testing on teaching and learning” (Hughes, 2003, p.
1), otherwise known among language-testing specialists as
washback. In large-scale assessment, washback generally refers to the effects tests have on instruction in terms of
how students prepare for the test. “Cram” courses and
“teaching to the test” are examples of such washback.
Another form of washback that occurs more in classroom
assessment is the information that “washes back” to
students in the form of useful diagnoses of strengths and
weaknesses. Washback also includes the effects of an
assessment on teaching and learning prior to the
assessment itself, that is, on preparation for the
assessment.
5.1 WASHBACK/BACKWASH
 The term washback is commonly used in applied linguistics, but it is rarely found in dictionaries.
 However, the word backwash can be found in certain dictionaries, where it is defined by the Cambridge Advanced Learner's Dictionary as “an effect that is not the direct result of something.”
 In dealing with principles of language assessment, the two words can be used more or less interchangeably.
 Washback (Brown, 2004) or backwash (Heaton, 1990) refers to the influence of testing on teaching and learning.
 The influence itself can be positive or negative (Cheng et al. (Eds.), 2008:7-11).
5.2 POSITIVE WASHBACK
 Positive washback has a beneficial influence on teaching and learning: teachers and students have a positive attitude toward the examination or test and work willingly and collaboratively towards its objectives (Cheng & Curtis, 2008:10).
 A good test should have a good effect.
5.3 NEGATIVE WASHBACK
 Negative washback has no beneficial influence on teaching and learning (Cheng and Curtis, 2008:9).
 Tests with negative washback are considered to have a negative influence on teaching and learning.
Thank You