TESTING Validity: Internal Validity of Test Items and Item Analysis
Course VALIDITY & ASSESSMENT
Learner/Practitioner Assessment Project
Purpose of the Assessment: The purpose of the test was to assess knowledge of nurses
completing an in-service training about “Patient Safety”.
Persons being assessed: Learners who took the “Patient Safety” test are nursing staff on the
8th Floor of Vanderbilt University Medical Center, with varying total years of nursing experience
both at and outside of Vanderbilt. The learners attended the inservice, which counted towards
their 4 hours of annual required inservice time. This credit works as a motivator to get nurses
to attend inservices.
Framework – content: The content for the inservice was derived from current findings
published by the Institute for Healthcare Improvement safety initiative Transforming Care
at the Bedside (TCAB) (Viney et al. 2006). The concepts in the inservice were presented to
staff to help explain key quality and safety concepts about inpatient acute hospital falls, hospital
medication errors, adverse events, and nosocomial pressure ulcers. One arm of the
recommendations stemming from TCAB is that nurses and teams benefit from current
knowledge and awareness of evidence-based research on patient safety and
hospital quality improvement.
Framework – measurement and outcome level: The assessment for this inservice used a
criterion-referenced framework. The level of learning outcome being assessed is 3A, Learning:
Declarative Knowledge, measured by posttest (Moore et al. 2009). The passing score for this
test was 70%. Learners who did not achieve a score of 70% or greater did not receive a full
hour of inservice time. Of the 37 learners taking the assessment, 33 scored above the 70% mark.
Data collection tool: A 12-item true-or-false posttest delivered online. The link was
emailed to each attendee the Friday following the 4 separate nightshift and dayshift inservice
events. The test was not proctored, the use of other resources was not discussed, and
attendees were told that the test would be based on the PowerPoint lecture and that 70%
would be passing.
Person(s) completing the data collection tool: Participants in the inservice completed the test.
Frequency of data collection and the sample: The test was assigned once after the
inservices and had to be taken online within two weeks of the inservice for full inservice time.
It was a one-time test with no remediation. 100% of inservice attendees took the test.
Descriptive Results from the data set:
Two people are missing from some of this data: one learner did not answer every question
(one item was left unanswered), and another learner was not a nurse but an ancillary staff
member. Their data was removed from reliability testing and item analysis. The first bar chart
(Table 1) describes all test takers and their percent of items correct, with a mean of 91% and
a standard deviation of 16.5.
TABLE 1.
[Bar chart: Number of Learners by Percent Correct]
TABLE 2.
All Learners Percent Correct
Percent Correct (Valid)   Frequency   Percent   Valid Percent   Cumulative Percent
33.33                     1           2.7       2.7             2.7
41.67                     1           2.7       2.7             5.4
58.33                     1           2.7       2.7             8.1
66.67                     1           2.7       2.7             10.8
75.00                     2           5.4       5.4             16.2
83.33                     1           2.7       2.7             18.9
91.67                     7           18.9      18.9            37.8
100.00                    23          62.2      62.2            100.0
Total                     37          100.0     100.0
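The summary figures quoted for all 37 test takers can be cross-checked from the Table 2 frequencies. The following is a minimal sketch, assuming the sample (n - 1) standard deviation is the one reported; the variable names are illustrative and not part of the original analysis.

```python
from math import sqrt

# (percent correct, number of learners) pairs copied from Table 2.
freq = [(33.33, 1), (41.67, 1), (58.33, 1), (66.67, 1),
        (75.00, 2), (83.33, 1), (91.67, 7), (100.00, 23)]

n = sum(count for _, count in freq)
mean = sum(score * count for score, count in freq) / n

# Sample standard deviation (divides by n - 1); this choice is an assumption.
sd = sqrt(sum(count * (score - mean) ** 2 for score, count in freq) / (n - 1))

# Learners at or above the 70% passing score.
passed = sum(count for score, count in freq if score >= 70)

print(n, round(mean, 1), round(sd, 2), passed)
```

Under these assumptions the sketch agrees with the reported values: 37 learners, a mean near 91.2, a standard deviation near 16.54, and 33 passing scores.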
TABLE 3.
Reliability Statistics
Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.855               .866                                           11
In Table 3, the number of items for which we could perform reliability testing is 11. One item is
not included in the reliability measure because not all learners answered it.
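For readers unfamiliar with the statistic in Table 3, the sketch below shows how Cronbach's alpha is computed from item-level scores. The original analysis was run in SPSS; this function and the small 0/1 score matrix are hypothetical and only illustrate the formula alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores).

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents x 4 true/false items (1 = correct).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

The function expects complete data, which matches the approach described above of excluding the item that not every learner answered.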
TABLE 4.
Item (variable and statement)   Mean    Std. Deviation   N
gait_belts_scored               .7429   .44344           35
  Gait belts are used to prevent falls.
device_pu_scored                .8857   .32280           35
  Device related pressure ulcers may be unpreventable when a patient is compromised nutritionally, has poor perfusion and must have device secured in place for life support.
toiletting_scored               .9429   .23550           35
  Per Vanderbilt policy, if you assist a patient to the toilet, you must stay with them.
reimbursed_scored               .9143   .28403           35
  As of 2012, hospitals are reimbursed related to their patient safety scores.
rrt_scored                      .9714   .16903           35
  Rapid Response Systems were designed to prevent failure to rescue. Calling Rapid Response for first recognition of trigger is the reliable way to ensure Rapid Response Systems remain reliable.
reliability_scored              .9714   .16903           35
  Hospital reliability and nursing communication related to patient safety must include checklists, standardized communication formats and information system checks.
transfusion_scored              .9714   .16903           35
  Transfusion errors begin at the point of collecting the specimen.
ebp_fall_scored                 .9714   .16903           35
  Some hospitals are using hip protectors and helmets on patients who are known for falling.
stop_pu_scored                  .8571   .35504           35
  Pressure ulcers are prevented by appropriate surface selection, regular repositioning and turning, optimizing temperature control, and preventing moisture/providing moisture barrier products.
fall_liability_scored           .8857   .32280           35
  Patients who fall who have stated a high desire for independence, who have stated they do not have to use the call bell, can not hold us liable if they fall and are hurt.
zero_scored                     .8286   .38239           35
  Falls are preventable and achieving zero falls has been attained in other hospitals.
Table 4 presents the item statistics. The Mean column gives the proportion of learners who
answered each item correctly. Two people are missing from this data: one learner did not
answer every question, and another learner was not a nurse but an ancillary staff member.
Their data was removed from reliability testing and item analysis. The first item, “Gait
belts are used to prevent falls,” is a false statement and had the lowest mean (.7429). A true
statement would be “Gait belts are used to prevent injury during falls.” I suspect learners got
it wrong because the wording is slightly tricky.
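Because every item is scored 0 or 1, each Mean and Std. Deviation in Table 4 follows directly from the number of learners who answered correctly. The sketch below illustrates this; the counts of correct answers (for example 26 of 35 for the gait belt item) are inferred from the reported means and are not stated in the source, and the function name is hypothetical.

```python
from math import sqrt

N = 35  # learners retained for item analysis

def item_stats(n_correct, n=N):
    """Return (proportion correct, sample SD) for a 0/1 scored item."""
    p = n_correct / n
    sd = sqrt(p * (1 - p) * n / (n - 1))  # sample SD of a binary variable
    return p, sd

# Inferred counts (mean x 35), an assumption rather than reported data.
inferred_correct = {
    "gait_belts_scored": 26,
    "zero_scored": 29,
    "rrt_scored": 34,
}

for name, n_correct in inferred_correct.items():
    p, sd = item_stats(n_correct)
    print(f"{name}: mean={p:.4f} sd={sd:.5f}")
```

With 26 correct answers the sketch gives a mean of about .7429 and a standard deviation of about .44344, matching the first row of Table 4.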
TABLE 5.
Item-Total Statistics
Item (variable and statement)   Scale Mean if Item Deleted   Scale Variance if Item Deleted   Corrected Item-Total Correlation   Squared Multiple Correlation   Cronbach's Alpha if Item Deleted
gait_belts_scored               9.2000                       2.929                            .690                               .                              .834
  Gait belts are used to prevent falls.
device_pu_scored                9.0571                       3.291                            .664                               .                              .833
  Device related pressure ulcers may be unpreventable when a patient is compromised nutritionally, has poor perfusion and must have device secured in place for life support.
toiletting_scored               9.0000                       3.471                            .737                               .                              .832
  Per Vanderbilt policy, if you assist a patient to the toilet, you must stay with them.
reimbursed_scored               9.0286                       3.499                            .558                               .                              .842
  As of 2012, hospitals are reimbursed related to their patient safety scores.
rrt_scored                      8.9714                       3.793                            .533                               .                              .848
  Rapid Response Systems were designed to prevent failure to rescue. Calling Rapid Response for first recognition of trigger is the reliable way to ensure Rapid Response Systems remain reliable.
reliability_scored              8.9714                       3.793                            .533                               .                              .848
  Hospital reliability and nursing communication related to patient safety must include checklists, standardized communication formats and information system checks.
transfusion_scored              8.9714                       3.852                            .441                               .                              .852
  Transfusion errors begin at the point of collecting the specimen.
ebp_fall_scored                 8.9714                       3.970                            .260                               .                              .859
  Some hospitals are using hip protectors and helmets on patients who are known for falling.
stop_pu_scored                  9.0857                       3.081                            .775                               .                              .822
  Pressure ulcers are prevented by appropriate surface selection, regular repositioning and turning, optimizing temperature control, and preventing moisture/providing moisture barrier products.
fall_liability_scored           9.0571                       3.114                            .838                               .                              .817
  Patients who fall who have stated a high desire for independence, who have stated they do not have to use the call bell, can not hold us liable if they fall and are hurt.
zero_scored                     9.1143                       3.692                            .228                               .                              .876
  Falls are preventable and achieving zero falls has been attained in other hospitals.
Table 5 presents the item-total statistics. Cronbach’s Alpha if Item Deleted is very good for
every item (.817 to .876). Alpha is calculated to estimate internal consistency; it serves as an
index of consistency and is sometimes taken as a rough approximation to test-retest reliability.
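As a consistency check, the overall alpha can be approximately recovered from the summary statistics in Tables 4 and 5 without the raw data. This is a sketch under stated assumptions: the total-score variance is rebuilt from the first row of Table 5 (gait_belts_scored) using Var(total) = Var(rest) + Var(item) + 2 × r × SD(item) × SD(rest), and rounding in the published tables limits the precision of the result.

```python
from math import sqrt

# Item standard deviations from Table 4 (N = 35), in table order.
item_sd = [.44344, .32280, .23550, .28403, .16903, .16903,
           .16903, .16903, .35504, .32280, .38239]

# Table 5 values for the first item (gait_belts_scored).
var_rest = 2.929      # scale variance if item deleted
r_item_rest = 0.690   # corrected item-total correlation

# Rebuild the variance of the 11-item total score from that row.
sd_item = item_sd[0]
cov_item_rest = r_item_rest * sd_item * sqrt(var_rest)
total_var = var_rest + sd_item ** 2 + 2 * cov_item_rest

# Standard Cronbach's alpha formula on the summary statistics.
k = len(item_sd)
alpha = k / (k - 1) * (1 - sum(s ** 2 for s in item_sd) / total_var)
print(round(alpha, 3))
```

The result is close to the .855 reported in Table 3, which suggests the three tables are mutually consistent.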
Measurement Characteristics:
Reliability
We can measure internal consistency by calculating the test item intercorrelations and then
accepting or rejecting questions based on their reliability coefficients. We were able to accept
all items; deleting the last item (zero_scored) would change alpha only slightly (from .855 to .876).
My index of consistency was Cronbach’s Alpha. It was 0.86 for 11 test items, which is a very
good level of internal consistency. The standard deviation for this test is 16.54 and the mean
score is 91.2; that is, the average test score of all test participants was 91.2%. The
standard error of measurement (SEM) is an estimate of error to use in interpreting an
individual’s test score. A test score is an estimate of a person’s “true” test performance. Using a
reliability coefficient r and the test’s standard deviation sd, we can calculate this value:
$SEM = sd\sqrt{1 - r}$
The SEM of the test scores of the test participants was 6.40.
With 99% confidence the mean true test score lies between 74.69 and 100 (91.2 ± 16.51, capped
at 100). With 95% confidence the mean true test score lies between 78.66 and 100 (91.2 ± 12.54,
capped at 100).
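The SEM and the two confidence bands can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the original SPSS workflow: the function names are illustrative, the z values 2.58 and 1.96 are the conventional ones for 99% and 95%, and the cap at 100 reflects the maximum possible score. Note that the reported SEM of 6.40 appears to correspond to r of about 0.85; with r = 0.855 the same formula gives roughly 6.30.

```python
from math import sqrt

def sem(sd, r):
    """Standard error of measurement: sd * sqrt(1 - r)."""
    return sd * sqrt(1 - r)

def band(score, sem_value, z, lo=0.0, hi=100.0):
    """Confidence band around a score, clipped to the possible score range."""
    return max(lo, score - z * sem_value), min(hi, score + z * sem_value)

sem_085 = sem(16.54, 0.85)     # close to the reported 6.40
sem_0855 = sem(16.54, 0.855)   # slightly smaller with the Table 3 alpha

band_99 = band(91.2, 6.40, 2.58)  # uses the reported SEM of 6.40
band_95 = band(91.2, 6.40, 1.96)

print(round(sem_085, 2), round(sem_0855, 2))
print(band_99, band_95)
```

Using the reported SEM, the lower limits come out near 74.69 and 78.66, and both upper limits are capped at 100.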
Validity
The validity claim for this assessment is that it measured how much learners understood of
the concepts and ideas presented in a staff inservice about safety and quality.
Nurses who do not have a general understanding of key ideas about safety and quality may have
less motivation to implement new processes and strategies to improve quality and safety.
Decisions: Those who score 70% or higher on this assessment will be given a full inservice hour
towards the total 4 hours required by the department. Those who score less than 70% receive
only a half hour. This assessment is formative in that it gives learners feedback about
where they have weaknesses or where they could do further study.
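The decision rule can be written as a small function. This is only an illustrative sketch; the function name and the 12-item default are assumptions drawn from the test description, not part of any existing scoring system.

```python
def inservice_credit(n_correct, n_items=12, cutoff=70.0):
    """Return inservice hours earned: 1.0 at or above the cutoff, else 0.5."""
    percent = 100 * n_correct / n_items
    return 1.0 if percent >= cutoff else 0.5

print(inservice_credit(9))  # 9 of 12 is 75%, which meets the 70% cutoff
print(inservice_credit(8))  # 8 of 12 is about 66.7%, below the cutoff
```

On a 12-item test, a learner therefore needs at least 9 correct answers to earn the full hour, since 8 correct falls just under 70%.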
Content validity was assured because each question on the test was quoted exactly
from the inservice and from the PowerPoint slides shown at the inservice. The content was
related to the learning objectives given at the beginning of the class.
Construct validity rests on the importance of understanding key points about patient safety
and quality in the hospital setting. These ideas are also key points reflected in The Joint
Commission’s National Patient Safety Goals. Vanderbilt University Medical Center also has
5 Pillar Goals for 2012 that relate to patient safety and quality, including preventing falls and
pressure ulcers. The questions came directly from the lecture, and the content of the
assessment is the content of the inservice materials.
Taking Kane’s “argument-based approach to validity” and using “Criterion 1:
Clarity of the Argument,” the inservice lecture and test are based directly on the newest
evidence-based points that support a better understanding of the Transforming Care at the
Bedside initiative and the National Patient Safety Goals set by The Joint Commission. These
points of evidence lay the foundation for understanding patient safety and quality improvement
initiatives that are occurring in American hospitals. The inservice was conducted as a way to
spread the latest evidence-based information and increase the nurses’ base knowledge.
According to “Criterion 2: Coherence of the Argument,” because the inservice transmitted
evidence-based information that is relevant to a nurse’s work, the test is a way to measure the
transmission of that information.
According to “Criterion 3: Plausibility of the Assumptions,” it is very plausible that the test is
valid because the true test questions are quoted exactly from the lecture and PowerPoint
slides. When a test question is false, the statement is changed in a simple way to make it false.
Other sources of “error” and unwanted variance might undermine the measurement
characteristics of this assessment. I’ve listed nine examples of possible sources of error.
1. A test taker not being present during the inservice would undermine the results of the test.
2. The questions must be phrased in a clear, non-confusing way.
3. One attendee was not a nurse but an ancillary staff member who wanted to attend and take
the test. I did not include this person in the reliability and item analysis calculations.
4. Other factors, such as learning or reading disabilities that any of the participants may
have, may interfere with test-taking ability.
5. There could have been a distractor that caused a test participant to accidentally mark an
answer they did not intend.
6. The test was given through REDCap; scoring was precise and was completed using SPSS.
7. Some of the nurses may already have known the information, to the degree that the
inservice was unnecessary.
8. While this patient safety inservice is not given to improve patient safety directly, it is given
to improve nurses’ motivation and involvement in unit and patient safety awareness.
9. It is possible that those who scored poorly had already met their inservice time requirement
and did not take the test seriously.
There are numerous other possible sources of error (Kane 1992).
Improvement Plan
1. The first way to ensure that knowledge is being gained is to use this test as a pre-test.
I could assign this test before giving the class to assess baseline knowledge.
2. One aspect that could be improved upon is content validity. I could approach this by
having a few nurse colleagues assess the test for content as well as question writing
(Miller & Linn 2000).
3. Next, I could repeat this test and measure test item intercorrelations. I could re-conduct
this inservice on another floor with a separate, new cohort and see whether and how my
data differs.
References
1. Viney MM, Batcheller JM, Houston S, Belcik KB. Transforming Care at the Bedside:
Designing New Care Systems in an Age of Complexity. Journal of Nursing Care Quality.
2006;21(2):143–150.
2. Rutherford P, Moen R, Taylor J. TCAB: The “How” and the “What.” AJN, American
Journal of Nursing. 2009;109:5–17.
3. Kane MT. An argument-based approach to validity. Psychological Bulletin.
1992;112(3):527–535.
4. Moore Jr DE. Achieving desired results and improved outcomes: integrating planning
and assessment throughout learning activities. Journal of Continuing Education in the
Health Professions. 2009;29(1):1.
5. Miller DM, Linn RL. Validation of performance-based assessments. Applied Psychological
Measurement. 2000;24(4):367.