Selecting Measuring Instruments for Educational Research

Educational Research
Chapter 5
Selecting Measuring Instruments
Gay, Mills, and Airasian

Topics Discussed in this Chapter
 Data collection
 Measuring instruments
 Terminology
 Interpreting data
 Types of instruments
 Technical issues
 Validity
 Reliability
 Selection of a test

Data Collection
 Scientific inquiry requires the collection,
analysis, and interpretation of data
 Data – the pieces of information that are
collected to examine the research topic
 Issues related to the collection of this
information are the focus of this chapter

Data Collection
 Terminology related to data
 Constructs – abstractions that cannot be
observed directly but are helpful when
trying to explain behavior

Intelligence

Teacher effectiveness

Self-concept
Obj. 1.1 & 1.2

Data Collection
 Data terminology (continued)
 Operational definition – the ways by which
constructs are observed and measured

Wechsler IQ test

Virgilio Teacher Effectiveness Inventory

Tennessee Self-Concept Scale
 Variable – a construct that has been
operationalized and has two or more
values
Obj. 1.1 & 1.2

Data Collection
 Measurement scales
 Nominal – categories

Gender, ethnicity, etc.
 Ordinal – ordered categories

Rank in class, order of finish, etc.
 Interval – equal intervals

Test scores, attitude scores, etc.
 Ratio – absolute zero

Time, height, weight, etc.
Obj. 2.1

Data Collection
 Types of variables
 Categorical or quantitative

Categorical variables reflect nominal scales
and measure the presence of different qualities
(e.g., gender, ethnicity, etc.)

Quantitative variables reflect ordinal, interval, or
ratio scales and measure different quantities of
a variable (e.g., test scores, self-esteem
scores, etc.)
Obj. 2.2

Data Collection
 Types of variables
 Independent or dependent

Independent variables are purported causes

Dependent variables are purported effects

Two instructional strategies, co-operative groups and
traditional lectures, were used during a three-week social
studies unit. Students’ exam scores were analyzed for
differences between the groups.
 The independent variable is the instructional approach (of
which there are two levels)
 The dependent variable is the students’ achievement
Obj. 2.3

Measurement Instruments
 Important terms
 Instrument – a tool used to collect data
 Test – a formal, systematic procedure for
gathering information
 Assessment – the general process of
collecting, synthesizing, and interpreting
information
 Measurement – the process of quantifying
or scoring a subject’s performance
Obj. 3.1 & 3.2

 Important terms (continued)
 Cognitive tests – examining subjects’
thoughts and thought processes
 Affective tests – examining subjects’
feelings, interests, attitudes, beliefs, etc.
 Standardized tests – tests that are
administered, scored, and interpreted in a
consistent manner
Obj. 3.1

 Selected response item format – respondents
select answers from a set of alternatives

Multiple choice

True-false

Matching
 Supply response item format – respondents
construct answers

Short answer

Completion

Essay
Obj. 3.3 & 11.3

 Individual tests – tests administered on an
individual basis
 Group tests – tests administered to a group
of subjects at the same time
 Performance assessments – assessments
that focus on processes or products that
have been created
Obj. 3.6

 Interpreting data
 Raw scores – the actual score made on a
test
 Standard scores – statistical
transformations of raw scores

Percentiles (0.00 – 99.9)

Stanines (1 – 9)

Normal Curve Equivalents (0.00 – 99.99)
Obj. 3.4
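
The transformations listed above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular test publisher. It assumes the conventional definitions: stanines are z-scores rescaled to a mean of 5 and a standard deviation of 2, NCEs are z-scores rescaled to a mean of 50 and a standard deviation of 21.06, and the percentile rank is the percent of scores strictly below a given score (one of several conventions in use). The function names are hypothetical.

```python
from math import floor
from statistics import mean, pstdev


def z_scores(raw):
    """Convert raw scores to z-scores using the group mean and SD."""
    m = mean(raw)
    sd = pstdev(raw)  # population SD of the group being described
    return [(x - m) / sd for x in raw]


def stanine(z):
    """Map a z-score to a stanine (1-9): mean 5, SD 2, half-SD bands."""
    return max(1, min(9, floor(2 * z + 5.5)))


def nce(z):
    """Map a z-score to a Normal Curve Equivalent: mean 50, SD 21.06."""
    return 50 + 21.06 * z


def percentile_rank(score, scores):
    """Percent of scores strictly below the given score (assumed convention)."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


raw = [40, 50, 60]
for x, z in zip(raw, z_scores(raw)):
    print(x, round(z, 2), stanine(z), round(nce(z), 2), round(percentile_rank(x, raw), 1))
```

All three transformations start from the same z-score, which is why they carry the same information in different units.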

 Interpreting data (continued)
 Norm-referenced – scores are interpreted
relative to the scores of others taking the
test
 Criterion-referenced – scores are
interpreted relative to a predetermined
level of performance
 Self-referenced – scores are interpreted
relative to changes over time
Obj. 3.5

 Types of instruments
 Cognitive – measuring intellectual
processes such as thinking, memorizing,
problem solving, analyzing, or reasoning
 Achievement – measuring what students
already know
 Aptitude – measuring general mental
ability, usually for predicting future
performance
Obj. 4.1 & 4.2

 Types of instruments (continued)
 Affective – assessing individuals’ feelings,
values, attitudes, beliefs, etc.

Typical affective characteristics of interest
 Values – deeply held beliefs about ideas, persons, or
objects
 Attitudes – dispositions that are favorable or unfavorable
toward things
 Interests – inclinations to seek out or participate in
particular activities, objects, ideas, etc.
 Personality – characteristics that represent a person’s
typical behaviors
Obj. 4.1 & 4.5

 Affective (continued)

Scales used for responding to items on affective tests
 Likert

Positive or negative statements to which subjects
respond on scales such as strongly disagree,
disagree, neutral, agree, or strongly agree
 Semantic differential

Bipolar adjectives (i.e., two opposite adjectives) with a
scale between each adjective

Dislike: ___ ___ ___ ___ ___ :Like
 Rating scales – ratings based on how a subject would
rate the trait of interest
Obj. 5.1
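
Scoring a Likert scale can be sketched as follows. This is an illustration under stated assumptions, not a scoring key from any published instrument: the five response options are coded 1 to 5, and negatively worded statements are reverse-scored so that a higher total always indicates a more favorable attitude. The names `LIKERT`, `score_item`, and `scale_score` are hypothetical.

```python
# Assumed 1-5 coding for the five response options named on the slide.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def score_item(response, negatively_worded=False):
    """Return the numeric score for one response, reversing negative items."""
    value = LIKERT[response.lower()]
    return 6 - value if negatively_worded else value


def scale_score(responses, negative_items=()):
    """Sum item scores; negative_items holds the 0-based indices to reverse."""
    return sum(
        score_item(r, negatively_worded=(i in negative_items))
        for i, r in enumerate(responses)
    )


answers = ["agree", "strongly disagree", "neutral"]
# Item 1 (0-based) is assumed to be a negatively worded statement.
print(scale_score(answers, negative_items={1}))
```

Reverse-scoring matters because the slide notes that statements may be positive or negative. Without it, agreement with a negative statement would raise the total.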

 Affective (continued)

Scales used for responding to items on
affective tests (continued)
 Thurstone – statements related to the trait of interest
to which subjects agree or disagree
 Guttman – statements representing a
unidimensional trait
Obj. 5.1

 Issues for cognitive, aptitude, or affective
tests
 Problems inherent in the use of self-report
measures

Bias – distortions of a respondent’s performance or
responses based on ethnicity, race, gender, language,
etc.

Responses to affective test items
 Socially acceptable responses
 Accuracy of responses
 Response sets
 Alternatives include the use of projective tests
Obj. 4.3, 4.4

Technical Issues
 Two concerns
 Validity
 Reliability

Technical Issues
 Validity – extent to which interpretations
made from a test score are appropriate
 Characteristics

The most important technical characteristic

Situation specific

Does not refer to the instrument but to the
interpretations of scores on the instrument

Best thought of in terms of degree
Obj. 6.1 & 7.1

Technical Issues
 Validity (continued)
 Four types

Content – to what extent does the test measure
what it is supposed to measure
 Item validity
 Sampling validity
 Determined by expert judgment
Obj. 7.1 & 7.2

Technical Issues

Criterion-related
 Predictive – to what extent does the test predict a
future performance
 Concurrent – to what extent does the test relate to a
performance measured at the same time
 Estimated by correlations between two tests

Construct – the extent to which a test measures
the construct it represents
 Underlying difficulty defining constructs
 Estimated in many ways
Obj. 7.1, 7.3, & 7.4

Technical Issues
 Consequential – to what extent are the
consequences that occur from the test
harmful

Estimated by empirical evidence and expert judgment
 Factors affecting validity

Unclear test directions

Confusing and ambiguous test items

Vocabulary that is too difficult for test takers
Obj. 7.1, 7.5, & 7.7

Technical Issues
 Factors affecting validity (continued)
 Overly difficult and complex sentence
structure
 Inconsistent and subjective scoring
 Untaught items
 Failure to follow standardized
administration procedures
 Cheating by the participants or someone
teaching to the test items
Obj. 7.7

Technical Issues
 Reliability – the degree to which a test
consistently measures whatever it is
measuring
 Characteristics

Expressed as a coefficient ranging from 0 to 1

A necessary but not sufficient characteristic of
a test
Obj. 6.1, 8.1, & 8.7

Technical Issues
 Reliability (continued)
 Six reliability coefficients

Stability – consistency over time with the same
instrument
 Test – retest
 Estimated by a correlation between the two
administrations of the same test

Equivalence – consistency with two parallel
tests administered at the same time
 Parallel forms
 Estimated by a correlation between the parallel tests
Obj. 8.1, 8.2, 8.3, & 8.7
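
Both stability and equivalence are estimated with a correlation between two sets of scores from the same people. A minimal sketch of that computation, assuming the Pearson product-moment correlation is the coefficient used (the scores below are invented for illustration):

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y):
        raise ValueError("both administrations need the same examinees")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five students on two administrations.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

stability = pearson_r(first_administration, second_administration)
print(round(stability, 2))
```

The same function serves for parallel forms: replace the second administration with scores on the alternate form.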

Technical Issues
 Six reliability coefficients (continued)

Equivalence and stability – consistency over
time with parallel forms of the test
 Combines attributes of stability and equivalence
 Estimated by a correlation between the parallel forms

Internal consistency – consistency among the items
within a single test, estimated from one administration
(e.g., by artificially splitting the test into halves)
 Several coefficients – split halves, KR 20, KR 21, Cronbach
alpha
 All coefficients provide estimates ranging from 0 to 1
Obj. 8.1, 8.4, 8.5, & 8.7
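
Two of the internal consistency coefficients named above can be sketched briefly. This is an illustration using the standard textbook formulas, with invented data: Cronbach's alpha compares the sum of the item variances with the variance of the total scores, and the Spearman-Brown formula steps a split-half correlation up to an estimate for the full-length test. For items scored 0 or 1, alpha computed this way equals KR 20.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])  # number of items
    item_variances = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


# Hypothetical right/wrong (1/0) responses: five students, four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
print(round(cronbach_alpha(responses), 2))
print(round(spearman_brown(0.5), 2))
```

The split-half correlation needs the correction because each half is only half as long as the real test, and shorter tests tend to be less reliable.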

Technical Issues
 Six reliability coefficients (continued)

Scorer/rater – consistency of observations
between raters
 Inter-judge – two observers
 Intra-judge – one judge over two occasions
 Estimated by percent agreement between
observations
Obj. 8.1, 8.6, & 8.7
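
Percent agreement is simple enough to compute by hand, but a short sketch makes the definition concrete. The ratings are invented, and the function name is hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percent of observations on which two sets of ratings match exactly."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same observations")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Hypothetical rubric scores from two judges on eight essays.
judge_1 = [3, 4, 2, 5, 3, 1, 4, 2]
judge_2 = [3, 4, 3, 5, 3, 1, 4, 1]
print(percent_agreement(judge_1, judge_2))
```

For intra-judge reliability, pass the same judge's scores from two occasions instead of two judges' scores.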

Technical Issues
 Six reliability coefficients (continued)

Standard error of measurement (SEM) – an
estimate of how much difference there is
between a person’s obtained score and his or
her true score
 Function of the variation of the test and the reliability
coefficient (e.g., KR 20, Cronbach alpha, etc.)
 Estimated by specifying an interval rather than a
point estimate of a person’s score
Obj. 8.1, 8.7, & 9.1
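
The relationship described above is usually written as SEM = SD × sqrt(1 − r), where SD is the standard deviation of the test scores and r is the reliability coefficient. A minimal sketch, with invented numbers, showing how the SEM turns a single obtained score into an interval:

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_band(obtained, sd, reliability, z=1.0):
    """Interval of +/- z SEMs around an obtained score (z=1 is about 68%)."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return obtained - margin, obtained + margin


# Hypothetical test: SD of 15, reliability of .91, obtained score of 100.
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(100, 15, 0.91)
print(round(sem, 2), round(low, 2), round(high, 2))
```

A more reliable test gives a smaller SEM and therefore a narrower band around the obtained score. With perfect reliability the band collapses to the score itself.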

Selection of a Test
 Sources of test information
 Mental Measurements Yearbook (MMY)

The reviews in MMY are most easily accessed through
your university library and the services to which they
subscribe (e.g., EBSCO)

Provides factual information on all known tests

Provides objective test reviews

Comprehensive bibliography for specific tests

Indices: titles, acronyms, subject, publishers, developers
 Buros Institute
Obj. 10.1 & 12.1

Selection of a Test
 Sources (continued)
 Tests in Print

Tests in Print is a publication of the Buros Institute

The reviews in it are most easily accessed through your
university library and the services to which they
subscribe (e.g., EBSCO)

Bibliography of all known commercially produced tests
currently available

Very useful to determine availability
Obj. 10.1 & 12.1

Selection of a Test
 ETS Test Collection

Published and unpublished tests

Includes test title, author, publication date, target
population, publisher, and description of purpose

Annotated bibliographies on achievement, aptitude,
attitude and interests, personality, sensory motor, special
populations, vocational/occupational, and miscellaneous
Obj. 10.1 & 12.1

Selection of a Test
 Professional journals
 Test publishers and distributors
 Issues to consider when selecting tests
 Psychometric properties

Validity

Reliability

Length of test

Scoring and score interpretation
Obj. 10.1, 11.1, & 12.1

Selection of a Test
 Issues to consider when selecting tests
 Non-psychometric issues

Cost

Administrative time

Objections to content by parents or others

Duplication of testing
Obj. 11.1

Selection of a Test
 Designing your own tests
 Get help from others with experience in
developing tests
 Item writing guidelines

Avoid ambiguous and confusing wording and sentence
structure

Use appropriate vocabulary

Write items that have only one correct answer

Give information about the nature of the desired answer

Do not provide clues to the correct answer
 See Writing Multiple Choice Items
Obj. 11.2

Selection of a Test
 Test administration guidelines
 Plan ahead
 Be certain that there is consistency across
testing sessions
 Be familiar with any and all procedures
necessary to administer a test
Obj. 11.4
