1. Carlo Magno, PhD.
Lasallian Institute for Development and Educational Research
College of Education, De La Salle University, Manila
2. Answer the following questions as a group. Your answers
should reflect your current practices in assessing your
students. Write your answers on a piece of paper.
1. List the things that you do when preparing to write
your test items. (procedure)
2. What are the things that you consider when writing your
test items? (concepts)
3. What further steps do you take after you have scored and
recorded the test papers? (procedure)
4. What other forms of assessment do you conduct aside
from paper and pencil tests?
3. Prepare a Table of Specifications (TOS)
Use the taxonomy of cognitive skills (Bloom's
taxonomy)
Conduct an item review
4. Learning objectives
Curriculum/national standards
Needs of students
Higher order thinking skills
Test length
Test instruction
Test layout
Scoring
8. Decide what information should be sought
(1) No instruments are available to measure such a construct
(2) All available tests are foreign and not suitable for the
stakeholders or sample who will take the measure
(3) Existing measures are not appropriate for the purpose of
assessment
(4) The test developer intends to explore the underlying
factors of a construct and eventually confirm it
Search for Content Domain:
Search for relevant literature reviews
Look for the appropriate definition
Explain the theory
Specify the underlying variables (deconstruction)
11. Write the first draft of items:
Items are created for each subscale as guided by the conceptual
definition.
The number of items as planned in the Table of Specifications
is also considered.
As much as possible, a large number of items are written so
that the behavior being measured is well represented.
How to write Items:
Items are based on the definition of the subscales
Provide the manifestation of the construct
Descriptions from references
Ask experts to write sample items
12. One-grid Table of Specifications

Content Outline                      No. of Items
1. Table of specifications                10
2. Test and item characteristics          20
3. Test layout                             5
4. Test instructions                       5
5. Reproducing the test                    5
6. Test length                             5
7. Scoring the test                        5
TOTAL                                     55
15. Good questionnaire items should:
1. Include a vocabulary that is simple, direct, and familiar to all
respondents
2. Be clear and specific
3. Not involve leading, loaded, or double-barreled questions
4. Be as short as possible
5. Include all conditional information prior to the key ideas
6. Be edited for readability
7. Be generalizable to a large sample.
8. Avoid time-bound situations.
16. Examples of bad items:
I am satisfied with my wages and hours at the place where I
work. (Double Barreled)
I am not in favor of congress passing a law not allowing any
employer to force any employee to retire
at any age. (Double Negative)
Most people favor the death penalty. What do you think? (Leading
Question)
17. Select a Response Format:
After writing the items, the test developer decides on the appropriate
response format to be used in the scale.
The most common response formats used:
Binary type
Multiple choice
Short answer
Essay
Develop directions for responding:
Directions or instructions for the target respondents should be
created as early as when the items are written.
Keep them clear and concise.
Respondents should be informed how to answer.
If you intend to use a separate answer sheet, make sure to inform
the respondents about it in the instructions.
Instructions should also state how to mark answers (encircle,
check, or shade) and how to change an answer.
Tell the respondents in the instructions specifically what they need to do.
18. Conduct a judgmental review of items
Have experts review your items.
19. Reexamine and revise the questionnaire
Prepare a draft and gather preliminary pilot data:
Requires a layout of the test for the respondents.
Make the scale as easy as possible to use.
Each item can be identified with a number or a letter to
facilitate scoring of responses later.
The items should be structured for readability and recording
responses.
Whenever possible, items with the same response format are
placed together.
Make the layout visually appealing to increase the response rate.
The questionnaire should be self-explanatory, and respondents
should be able to complete it in a short time.
Ordering of items: The first few questions set the tone for the
rest of the items and determine how willingly and
conscientiously respondents will work on subsequent
questions.
20. Analyze the pilot data:
The responses in the test should be recorded using a
spreadsheet.
The numerical responses are then analyzed.
The analysis consists of determining whether the test is reliable
and valid.
Revise the Instrument:
The instrument is then revised: items with low factor
loadings (not significant in the CFA) are removed.
Items whose removal increases Cronbach's alpha are also dropped.
21. Gather final pilot data
A large sample is again selected, with a size of three times the
number of items.
Conduct Additional Validity and Reliability Analysis
The validity and reliability are again analyzed using the new pilot
data.
Edit the test and specify the procedures for its use
Items with low factor loadings are again removed, resulting in
fewer items.
A new form of the test with the reduced set of items is created.
Prepare the Test Manual
The test manual indicates the purpose of the test, instructions
for administering it, the procedure for scoring, and guidelines
for interpreting the scores, including the norms.
22. The test must be of sufficient length to yield
reliable scores.
The longer the test, the more reliable the
results. This also affects the validity of the test,
because a test cannot be valid unless it is reliable.
For grade school pupils, one must consider their
stamina and attention span.
The test should be long enough to be adequately
reliable and short enough to be practical to administer.
23. It is the function of the test instructions to
furnish the learning experiences needed
to enable each examinee to understand
clearly what he or she is being asked to do.
Instructions may be written or oral; a combination of
written and oral instructions is probably
desirable, except with very young children.
Instructions should be clear, concise, and specific.
24. The arrangement of the test items influences the
speed and accuracy of the examinee
Utilize the space available while retaining readability.
Items of the same type should be grouped together
Arrange test items from easiest to most difficult as a
means of reducing test anxiety.
The test should be ordered first by type then by
content
Each item should be completed in the column and
page in which it is started.
If reference material is needed, it should appear on
the same page as the item.
If you are using numbers to identify items, it is better
to use letters for the options.
26. Consistency of scores obtained by the same
person when retested with the identical test
or with an equivalent form of the test
27. Repeating the identical test on a second occasion
Temporal stability
Used when variables are stable, e.g., motor coordination, finger
dexterity, aptitude, capacity to learn
Correlate the scores from the first test and the second test.
The higher the correlation, the more reliable the test.
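The test-retest coefficient is simply the Pearson correlation between the two administrations. A minimal sketch in pure Python (the scores below are hypothetical, not from the source):

```python
def pearson_r(x, y):
    # Pearson correlation between two lists of scores
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# hypothetical scores of five examinees on two administrations of the same test
first_testing  = [78, 85, 62, 90, 70]
second_testing = [80, 83, 65, 92, 68]

retest_reliability = pearson_r(first_testing, second_testing)
```

A coefficient close to 1.0 indicates high temporal stability.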
28. Same person is tested with one form on the first occasion
and with another equivalent form on the second
Equivalence;
Temporal stability and consistency of response
Used for personality and mental ability tests
Correlate scores on the first form and scores on the second
form
29. Two scores are obtained for each person by dividing the test
into equivalent halves
Internal consistency;
Homogeneity of items
Used for personality and mental ability tests
The test should have many items
Correlate the scores on the odd- and even-numbered items
Convert the obtained half-test correlation into a full-test
reliability estimate using the Spearman-Brown formula
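The odd-even split and the Spearman-Brown step can be sketched as follows (a sketch, assuming the data are laid out as one row of item scores per examinee):

```python
def pearson_r(x, y):
    # Pearson correlation between two lists of scores
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(responses):
    # responses: one row of item scores per examinee
    odd  = [sum(row[0::2]) for row in responses]  # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    # Spearman-Brown: estimate the reliability of the full-length test
    return (2 * r_half) / (1 + r_half)
```

The correction is needed because the half-test correlation underestimates the reliability of the full-length test.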
30. Used when computing reliability for binary (e.g., true/false)
items
Consistency of responses to all items
Used if there is a correct answer (right or
wrong)
Use the KR-20 or KR-21 formula
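A minimal sketch of KR-20, which is k/(k-1) x (1 - sum(pq)/variance of totals), where p is the proportion answering each item correctly:

```python
def kr20(responses):
    # responses: rows of 0/1 item scores, one row per examinee
    n, k = len(responses), len(responses[0])
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance of totals
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion correct on item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)
```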
31. The reliability that would result if all values
for each item were standardized (z
transformed)
Consistency of responses to all items
Homogeneity of items
Used for personality tests with multiple
scored-items
Use the cronbach’s alpha formula
32. Consistency of responses to all items
Homogeneity of items
Used for personality tests with multiple
scored-items
Each item is correlated with every item in the
test
33. Having a sample of test papers independently scored by two
examiners
To decrease examiner or scorer variance
Used for clinical instruments employed in intensive individual
testing, e.g., projective tests
The two scores from the two raters obtained are correlated
with each other
34. Degree to which the test actually measures
what it purports to measure
35. Systematic examination of the test content to determine
whether it covers a representative sample of the behavior
domain to be measured.
More appropriate for achievement tests and teacher-made
tests
Items are based on instructional objectives, course syllabi, and
textbooks
Consultation with experts
Making test-specifications
36. Prediction from the test to a criterion
situation over a time interval
Hiring job applicants, selecting students for
admission to college, assigning military
personnel to occupational training programs
Test scores are correlated with other criterion
measures, e.g., mechanical aptitude with job
performance as a machinist
37. Tests are administered to a group on whom
criterion data are already available
Used for diagnosing existing status, e.g., correlating the
entrance exam scores of college students with their
average grade in their senior year.
Correlate the test score with the other
existing measure
38. The extent to which the test may be said to
measure a theoretical construct or trait.
Used for personality tests. Measures that
are multidimensional
Correlate a new test with a similar
earlier test that measures approximately the
same general behavior
Factor analysis
Comparison of the upper and lower
group
Point-biserial correlation (pass and fail
with total test score)
Correlate subtest with the entire test
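The point-biserial correlation named above (pass/fail against total test score) is the Pearson correlation with a 0/1 variable; a minimal sketch using the equivalent (M1 - M0)/SD x sqrt(pq) form:

```python
def point_biserial(item, totals):
    # item: 0 (fail) or 1 (pass) per examinee; totals: total test scores
    n = len(item)
    n_pass = sum(item)
    p = n_pass / n                  # proportion who passed the item
    q = 1 - p
    m1 = sum(t for i, t in zip(item, totals) if i == 1) / n_pass
    m0 = sum(t for i, t in zip(item, totals) if i == 0) / (n - n_pass)
    mean_t = sum(totals) / n
    sd = (sum((t - mean_t) ** 2 for t in totals) / n) ** 0.5
    return (m1 - m0) / sd * (p * q) ** 0.5
```

A high positive coefficient means examinees who pass the item tend to have high total scores, evidence that the item taps the same construct as the whole test.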
39. The test should correlate significantly with
variables to which it is related
Commonly used for personality measures
Multitrait-multimethod matrix
40. The test should not correlate significantly
with variables from which it should differ
Commonly used for personality measures
Multitrait-multimethod matrix
45. Item Difficulty – The percentage of
respondents who answered an item correctly
Item Discrimination – The degree to which an
item differentiates correctly among test
takers in the behavior that the test is
designed to measure
46. Difficulty Index    Remark
.76 or higher           Easy item
.25 to .75              Average item
.24 or lower            Difficult item
.40 and above – Very good item
.30 to .39 – Good item
.20 to .29 – Reasonably good item
.10 to .19 – Marginal item
Below .10 – Poor item
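Both indices can be computed from a matrix of scored responses. The sketch below takes the difficulty index as the proportion correct and the discrimination index as upper-group minus lower-group difficulty, using the top and bottom 27% of examinees by total score (a common convention; the exact grouping rule varies across references):

```python
def item_analysis(responses):
    # responses: rows of 0/1 item scores, one row per examinee
    n, k = len(responses), len(responses[0])
    # difficulty index: proportion of respondents answering each item correctly
    difficulty = [sum(row[j] for row in responses) / n for j in range(k)]
    # discrimination index: upper-group minus lower-group difficulty,
    # with groups formed from the top and bottom 27% by total score
    ranked = sorted(responses, key=sum, reverse=True)
    g = max(1, round(0.27 * n))          # size of each group
    upper, lower = ranked[:g], ranked[-g:]
    discrimination = [
        sum(row[j] for row in upper) / g - sum(row[j] for row in lower) / g
        for j in range(k)
    ]
    return difficulty, discrimination
```

Items can then be classified against the difficulty and discrimination ranges in the tables above.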
49. 1. Salvador Dali is
a. a famous Indian.
b. important in international law.
c. known for his surrealistic art.
d. the author of many avant-garde plays.
•It is recommended that the stem be a direct question.
•The stem should pose a clear, definite, explicit, and singular
problem.
Why is the item faulty?
50. IMPROVED: With which one of the fine arts is
Salvador Dali associated?
a. surrealistic painting
b. avant-garde theatre
c. polytonal symphonic music
d. impressionistic poetry
51. 2. Milk can be pasteurized at home by
a. heating it to a temperature of 130°
b. heating it to a temperature of 145°
c. heating it to a temperature of 160°
d. heating it to a temperature of 175°
•Include in the stem any words that might otherwise be repeated
in each response.
Why is the item faulty?
53. 3. Although the experimental research, particularly that by
Hansmocker must be considered equivocal and assumptions
viewed as too restrictive, most testing experts would
recommend as the easiest method of significantly improving
paper-and-pencil achievement test reliability to
a. increase the size of the group being tested.
b. increase the differential weighting of items.
c. increase the objectivity of scoring.
d. increase the number of items.
e. increase the amount of testing time.
Items should be stated simply and understandably, excluding
all nonfunctional words from stem and alternatives.
Why is the item faulty?
54. IMPROVED: Assume a 10-item, 10-minute paper-
and-pencil multiple choice achievement test has a
reliability of .40. The easiest way of increasing the
reliability to .80 would be to increase
a. group size
b. scoring objectivity
c. differential item scoring weights
d. the number of items
e. testing time
55. 4. None of the following cities is a state
capital except
a. Bangor
b. Los Angeles
c. Denver
d. New Haven
•Avoid negatively stated items
Why is the item faulty?
56. IMPROVED: Which of the following cities is a
state capital?
a. Bangor
b. Los Angeles
c. Denver
d. New Haven
57. 5. Who wrote Harry Potter and the Goblet of
Fire?
a. J. K. Rowling
b. Manny Paquiao
c. Lea Salonga
d. Mark Twain
•If possible the alternatives should be presented in some logical,
numerical, or systematic order.
•Response alternatives should be mutually exclusive.
Why is the item faulty?
59. 6. Which of the following statements makes
clear the meaning of the word “electron”?
a. An electronic tool
b. Neutral particles
c. Negative particles
d. A voting machine
e. The nuclei of atoms
•Make all responses plausible and attractive to the less
knowledgeable and skillful student.
Why is the item faulty?
60. IMPROVED: Which of the following phrases is
a description of an “electron”?
a. Neutral particle
b. Negative particle
c. Neutralized proton
d. Radiated particle
e. Atom nucleus
61. 7. What is the area of a right triangle whose
sides adjacent to the right angle are 4 inches
and 3 inches long, respectively?
a. 7
b. 12
c. 25
d. None of the above
•The response alternative “None of the above” should be used
with caution, if at all.
Why is the item faulty?
62. IMPROVED: What is the area of a right
triangle whose sides adjacent to the right
angle are 4 inches and 3 inches respectively?
a. 6 sq. inches
b. 7 sq. inches
c. 12 sq. inches
d. 25 sq. inches
e. None of the above
63. 8. As compared with the American factory worker in
the early part of the 19th century, the American
factory worker at the close of the century
a. was working long hours
b. received greater social security benefits
c. was to receive lower money wages
d. was less likely to belong to a labor union.
e. became less likely to have personal contact with
employers
Make options grammatically parallel to each other and
consistent with the stem.
Why is the item faulty?
64. IMPROVED: As compared with the American
factory worker in the early part of the century, the
American factory worker at the close of the century
a. worked longer hours.
b. had more social security.
c. received lower money wages.
d. was less likely to belong to a labor union.
e. had less personal contact with his employer.
65. 9. The “standard error of estimate” refers to
a. the objectivity of scoring.
b. the percentage of reduced error variance.
c. an absolute amount of possible error.
d. the amount of error in estimating criterion
scores.
Avoid such irrelevant cues as “common elements” and “pat
verbal associations.”
Why is the item faulty?
66. IMPROVED: The “standard error of estimate”
is most directly related to which of the
following test characteristics?
a. Objectivity
b. Reliability
c. Validity
d. Usability
e. Specificity
67. 10. What name is given to the group of
complex organic compounds that occur in
small quantities in natural foods that are
essential to normal nutrition?
a. Calorie
b. Minerals
c. Nutrients
d. Vitamins
In testing for understanding of a term or concept, it is generally
preferable to present the term in the stem and alternative
definitions in the options.
Why is the item faulty?