3. The ongoing process of gathering, analysing
and reflecting on evidence to make informed
and consistent judgements with the goal of
improving student outcomes.
Gathering, analysing and reflecting on
evidence to make informed judgements
within a targeted outcome area
4. Find out what students know
Identify students’ learning needs
Plan teaching programs
Select candidates for programs, scholarships
Monitor effectiveness of interventions
Monitor impact of policy
Report to parents
Report to governments
5. Assessment must be technically adequate
Assessment must be targeted toward the right level of difficulty, so that all students have the opportunity to demonstrate what they know, think, or can do.
Assessment should be based on a variety of
different measures to cater for learner
differences.
Assessment should be ongoing rather than
episodic, and should provide a meaningful basis
for feedback and reflection.
7. Given periodically to determine, at a particular point in time, what students know and do not know.
Occurs at the end of a unit of learning.
◦ Determines at a point in time what students know
and can do.
◦ Used for reporting against standards.
◦ Used for entry (e.g. to university)
8. Selected Response
◦ Multiple Choice
◦ True/False
◦ Matching
◦ Fill-in
Extended Written Response
Performance Assessment
Assessment of practical or laboratory work.
Oral examinations
Short answer
Portfolio
9. Formative assessment is generally seen as process-oriented. Although
the information that is gleaned from summative
assessments is important, it can only help in
evaluating certain aspects of the learning
process.
Can provide the information needed to adjust
teaching and learning while they are happening.
Often involves students in the formative
assessment process, both as assessors of their
own learning and as resources to other students.
11. 1. Identification by teachers & learners of learning goals,
intentions or outcomes and criteria for achieving
these.
2. Rich conversations between teachers & students that
continually build and deepen.
3. Provision of effective, timely feedback to enable
students to advance their learning.
4. Active involvement of students in their own learning.
5. Teachers responding to identified learning needs and
strengths by modifying their teaching approach(es).
Black & Wiliam, 1998
13. BALANCED CLASSROOM ASSESSMENT SYSTEM
FORMATIVE ASSESSMENT: A process used by teachers and students during instruction that provides feedback to adjust ongoing teaching and learning to help students improve their achievement of intended instructional outcomes.
SUMMATIVE ASSESSMENT: A tool used after instruction to measure student achievement, which provides evidence of student competence or program effectiveness.
14. Diagnostic assessment is seen by some as a
component of formative assessment, but in
general it is seen as a distinct form.
In practice, the purpose of diagnostic assessment
is to ascertain, prior to instruction, each
student’s strengths, weaknesses, knowledge, and
skills. Establishing these permits the instructor to
remediate students and adjust the curriculum to
meet each pupil’s unique needs.
Because the primary purpose of the diagnostic test is remediation, it is both ungraded and low-stakes.
16. Assessment FOR learning: used by teachers to inform their
teaching (formative assessment)
Assessment AS learning: students monitor their progress
to inform their learning goals
Assessment OF learning: teachers use evidence of student
learning to make judgments on student achievement
against goals and standards (summative assessment)
17. Norm-referenced assessment:
Expectation of variability
Marks allocated and norm calculated
Marks often statistically manipulated
Basis for student comparison
Reports give marks and class position
18. Criterion-referenced assessment:
Performance criteria established for each desired outcome
Most students expected to achieve minimum
criteria
Can be associated with pass/fail judgments
(rather than marks)
No direct comparisons between students
Reports indicate the number of outcomes achieved by students
19. Score  Description
5      Demonstrates excellent understanding of the problem. All requirements of the task are included in the response.
4      Demonstrates very good understanding of the problem. All requirements of the task are included.
3      Demonstrates adequate understanding of the problem. Most requirements of the task are included.
2      Demonstrates poor understanding of the problem. Many requirements of the task are missing.
1      Demonstrates no understanding of the problem.
0      No response / task not attempted.
20. Sample scoring for the history question: What caused World War II?

Student #1: "WWII was caused by Hitler and Germany invading Poland."
  Criterion-referenced assessment: This answer is correct.
  Norm-referenced assessment: This answer is worse than Student #2's answer, but better than Student #3's answer.

Student #2: "WWII was caused by multiple factors, including the Great Depression and the general economic situation, the rise of nationalism, fascism, and imperialist expansionism, and unresolved resentments related to WWI. The war in Europe began with the German invasion of Poland."
  Criterion-referenced assessment: This answer is correct.
  Norm-referenced assessment: This answer is better than Student #1's and Student #3's answers.

Student #3: "WWII was caused by the assassination of Archduke Ferdinand."
  Criterion-referenced assessment: This answer is wrong.
  Norm-referenced assessment: This answer is worse than Student #1's and Student #2's answers.
21. Behaviourist approach:
◦ state the specific task,
◦ teach the specific task,
◦ test the specific task
Assumes learning is linear.
Suited to criterion-referenced assessment.
22. Cognitive approach:
Assumes active involvement of students in
making meaning through thinking, reasoning,
engaging (constructive)
◦ Deals with complex learning outcomes
◦ Assessments of these need an extended period of time
◦ Assessments require meaningful context
Sometimes called ‘authentic’ assessment
23. ‘Authentic’ Assessment
Presents students with ‘real-world’
challenges to apply relevant skills and
knowledge
◦ Elicits higher-order thinking in addition to basic skills
◦ Allows for the possibility of multiple human judgments
24. Which of these indicate a behaviourist approach, and which a cognitive approach?
◦ Paper-and-Pen Tests
◦ Questionnaires, scales
◦ Portfolios
◦ Projects
◦ Performances
◦ Self- and peer-assessment
◦ Student Journals
25. 1. is integral to instructional design
2. is fair (free from biases)
3. is technically adequate
4. has clear purpose, goals, standards and criteria
5. attends to student outcomes and processes, recognising how students think and learn
6. is well targeted to allow students to show what they know and can do
7. uses a range of measures to cater for learner differences
8. is ongoing rather than episodic
9. provides feedback to the learner
10. informs the teacher what to teach next
26. Level 1 Knowledge
◦ Recall of specifics and universals
◦ Recall of methods and processes
◦ Recall of pattern, structure, setting
Level 2 Comprehension
◦ Lowest level of understanding
◦ Knowing what is being communicated and using the
material without relating this to other material or
seeing its fuller implications
27. Level 3 Application
◦ The use of abstractions in particular and concrete
situations
◦ Abstractions can be in the form of general ideas, rules of
procedures, or generalized methods
◦ Abstractions can be technical principles, ideas, and
theories which must be remembered and applied
Level 4 Analysis
◦ Breakdown of material into its elements to show relative
hierarchy of ideas and/or relations between ideas
◦ Such analyses clarify the material, to indicate how it is
organized
28. Level 5 Synthesis
◦ Putting together elements to form a whole, arranging and combining them to constitute a pattern or structure not clearly seen before.
Level 6 Evaluation
◦ Judgements about the value of material for given
purposes.
◦ Quantitative and qualitative judgements about the
extent to which material and methods satisfy
criteria.
29. Emphasis on higher-order thinking
Bloom's Taxonomy of Cognitive Objectives (1950s) expresses qualitatively different kinds of thinking.
It is one of the most widely used models for classroom assessment.
The taxonomy was revised by Lorin Anderson in the 1990s.
30. The names of the six major categories were changed from nouns to verbs to reflect the emphasis on thinking as an active process. (Each column is listed from highest to lowest level; note that in the revision, Creating, rather than Evaluating, sits at the top.)

Original Terms    New Terms
Evaluation        Creating
Synthesis         Evaluating
Analysis          Analyzing
Application       Applying
Comprehension     Understanding
Knowledge         Remembering
31. The consistency or stability with which a test measures what it is intended to measure.
Reliability is a property of the test.
All tests are imperfect at estimating the qualities or skills they are trying to measure:
◦ The score each student receives always includes some error.
◦ The more reliable a test, the less error in the score actually obtained.
32. Observed score = true score + random error
Common sources of measurement error
◦ Inconsistencies across testing occasions
◦ Inconsistencies across forms of the test
◦ Inconsistencies between raters
◦ Inconsistencies in sampling of the content domain
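To make the observed score = true score + random error model concrete, here is a minimal simulation sketch (not from the original slides; all names and values are illustrative). It estimates reliability as the correlation between two parallel forms of a test, and shows that more random error produces a lower reliability estimate.

```python
import random
import statistics  # statistics.correlation requires Python 3.10+

random.seed(1)

def simulate_forms(n_students=1000, true_sd=10.0, error_sd=5.0):
    """Simulate observed = true + random error for two parallel test forms."""
    true_scores = [random.gauss(50, true_sd) for _ in range(n_students)]
    form_a = [t + random.gauss(0, error_sd) for t in true_scores]
    form_b = [t + random.gauss(0, error_sd) for t in true_scores]
    return form_a, form_b

# Reliability can be estimated as the correlation between parallel forms:
# the more random error, the lower the correlation (less reliable test).
for sd in (2.0, 5.0, 10.0):
    a, b = simulate_forms(error_sd=sd)
    print(f"error SD = {sd:4.1f}  estimated reliability = {statistics.correlation(a, b):.2f}")
```

With a true-score spread of 10, the expected reliabilities are roughly .96, .80 and .50 for error SDs of 2, 5 and 10, matching the slide's point that less error means a more reliable score.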
33. Standardised tests take into account, and provide estimates of, how much students' scores would probably vary if they were tested repeatedly:
◦ the standard deviation of the distribution of scores from hypothesised repeated testing
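This quantity is the standard error of measurement from classical test theory (a standard textbook formula, not stated on the slide):

\mathrm{SEM} = \sigma_X \sqrt{1 - r_{XX'}}

where \sigma_X is the standard deviation of observed scores and r_{XX'} is the test's reliability. For example, a test with a standard deviation of 15 and reliability of .91 gives SEM = 15 × √(1 − 0.91) = 15 × 0.3 = 4.5, so scores from hypothetical repeated testing would typically scatter about ±4.5 points around a student's expected score.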
34. Whether the assessment measures what it is supposed to be measuring.
Validity is a property of test scores, not of the test itself; it depends on the person and the situation:
◦ A test may be valid for one purpose but not for another.
Validity is a matter of degree.
Evidence for validity: content-related, criterion-related, construct-related.
35. The extent to which the sample of items, tasks or questions on an assessment is representative of some defined domain of content.
Approaches to establishing content-related validity:
◦ Domain sampling, relevance, clarity
◦ Logical analysis of test content (e.g. against Bloom's Taxonomy)
◦ Examining test content and format
36. The extent to which scores are systematically related to one or more outcome criteria.
Approaches to establishing criterion-related validity:
◦ Predictive validity: does the score correlate highly with later performance?
◦ Concurrent validity: does the score correlate with a test known to measure the same assessment area?
◦ Face validity: does the evidence show that the test is assessing in line with the purposes of the decision?
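As a sketch of how criterion-related evidence is often quantified, predictive validity is commonly reported as the correlation between test scores and a later criterion measure. The data below are invented purely for illustration.

```python
import statistics  # statistics.correlation requires Python 3.10+

# Hypothetical data: selection-test scores and a later criterion measure
# (e.g. first-year average) for the same ten students.
test_scores = [55, 62, 70, 48, 81, 66, 59, 74, 52, 68]
later_criterion = [58, 60, 72, 50, 78, 70, 55, 76, 49, 65]

# Predictive validity coefficient: how strongly the test score
# correlates with the outcome it is supposed to predict.
r = statistics.correlation(test_scores, later_criterion)
print(f"predictive validity coefficient r = {r:.2f}")
```

The same calculation against an established test taken at the same time, rather than a later outcome, would give a concurrent validity estimate.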
37. The extent to which an assessment measures the identified underlying psychological characteristic of interest.
Approaches to construct-related validity:
◦ Explicating construct meaning
◦ Convergent evidence
◦ Divergent evidence
◦ Deriving and testing predictions about test
performance from the underlying theory
38. A test must be reliable to be valid, BUT a reliable test is not always valid. For example, a scale that consistently reads 5 kg too heavy is highly reliable, yet it is not a valid measure of a person's true weight.
40. Differences in the extent to which the
assessee has had the opportunity to know
and become familiar with the specific subject
matter or specific processes required by the
test item
Distorts the performance of a group – either
for better or worse
41. Test content and characteristics
Test takers
Test environment
Test usage
Examining bias is a matter of examining the validity of an assessment across groups.
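A minimal sketch of what examining an assessment across groups can look like in practice (the data and threshold below are invented for illustration): compare how each item functions for different groups and flag large gaps for review.

```python
# Hypothetical item responses (1 = correct, 0 = incorrect) for two groups.
group_a = {"item1": [1, 1, 1, 0, 1, 1], "item2": [1, 0, 1, 1, 0, 1]}
group_b = {"item1": [1, 0, 1, 1, 1, 0], "item2": [0, 0, 1, 0, 0, 0]}

def difficulty(responses):
    """Proportion of students answering the item correctly."""
    return sum(responses) / len(responses)

# Flag items whose difficulty differs sharply between groups; such items
# warrant review for construct-irrelevant sources of difficulty.
FLAG_GAP = 0.25  # illustrative threshold, not a standard value
for item in group_a:
    gap = abs(difficulty(group_a[item]) - difficulty(group_b[item]))
    status = "REVIEW" if gap > FLAG_GAP else "ok"
    print(f"{item}: difficulty gap = {gap:.2f} ({status})")
```

A raw gap alone does not establish bias: proper analyses (e.g. differential item functioning) control for overall ability before comparing groups.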
42. Bias/Fairness
Distribution of difficulty within assessment
◦ Bloom’s Taxonomy
Sources of difficulty in assessment items
◦ Construct relevant/construct irrelevant
◦ Subject or concept difficulty
◦ Process difficulty
◦ Question or stimulus difficulty
43. Negations
Referential
Vocabulary
Sentence and paragraph lengths
Abstraction of text
Location of relevant text
Problem complexity
Novelty
Item placement in test
Closeness of the best distractors to the
correct answer
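Some of these surface sources of difficulty can be screened automatically. The sketch below is a rough illustration (the chosen signals and the eight-letter cutoff are invented, not a standard readability measure):

```python
import re

def difficulty_signals(item_text):
    """Rough surface-level difficulty signals for a test item stem."""
    words = re.findall(r"[A-Za-z']+", item_text)
    sentences = [s for s in re.split(r"[.!?]", item_text) if s.strip()]
    long_words = [w for w in words if len(w) >= 8]
    return {
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "share_of_long_words": len(long_words) / max(len(words), 1),
        "has_negation": bool(re.search(r"\b(not|no|never|except)\b", item_text.lower())),
    }

stem = ("Which piece of laboratory equipment is best suited for "
        "accurately measuring the volume of a liquid?")
print(difficulty_signals(stem))
```

Signals like these only cover vocabulary, sentence length and negations; sources such as abstraction, novelty and closeness of distractors still require expert judgement.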
44. What piece of laboratory equipment is best suited for accurately measuring the volume of a liquid?
a) graduated cylinder
b) beaker
c) Erlenmeyer flask
d) more than one of the above
Which piece of laboratory equipment can be used to store chemicals for long periods of time?
a) buret
b) evaporating dish
c) beaker
d) more than one of the above
45. Which of the following is the most appropriate
unit for expressing the weight of a pencil?
a) pounds
b) ounces
c) quarts
d) pints
e) tons
Due to budget cutbacks, the university library now subscribes to fewer than _?_ periodicals.
a) 25,000
b) 20,000
c) 15,000
d) 10,000
46. Reliability
Does the assessment accurately reflect students' achievements?
Does moderation reveal consistency between markers in student grading?
Have the criteria for assessment been applied in the same way by different markers?
Validity
Does the assessment measure what it was designed to measure?
Is the assessment sufficiently challenging, engaging and relevant to students?
Does the assessment provide sufficiently broad evidence – have different types of evidence been considered?
Fairness
Are students familiar with the formats and expectations of the assessment task?
Do the assessment tasks favour one group over another?
Have learning activities prior to testing sufficiently prepared students for the assessment?
47. There are many factors to consider when designing assessments, including:
◦ the amount of assessment
◦ the types of assessment
◦ how to assess
◦ how to ensure the assessment is truly representative of ability
◦ analysis of the data obtained from assessment
48. Woolfolk, A., & Margetts, K. (2013). Educational psychology (3rd ed.). Australia: Pearson Education.
Marsh, C. J. (2010). Becoming a teacher: Knowledge, skills and issues (5th ed.). Australia: Pearson Education.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7-74.
Chapman, E. (2013). Technical adequacy. Approaches to student assessment. Presented at the University of Western Australia, Perth, Western Australia.
NC TEACH (2010). Assessment: Formative & summative practices for the classroom. Retrieved from uncw.edu/ed/ncteach/cohort3/documents/ASSESSMENT.ppt
Educational app (2011). Formative feedback. Retrieved from http://www.youtube.com/watch?v=pJ7v8TtAx8o
Mr Bean (2007). The Exam. Retrieved from http://www.youtube.com/watch?v=Ocd1D8fwdjU