Research
Methods for
Organizational
Studies
- 2ed, 2004
LIU YING
copyrightⓒ 2013 All rights reserved by
LIU YING
1
3. Measurement Foundations
Validity and Validation
• Construct Definitions
– Construct Domain
– Nomological Networks
• Construct Definition Illustration
• Construct Validity Challenges
– Random Errors
– Systematic Errors
– Scores Are Critical
• Construct Validation
– Content Validity
– Reliability
• Types of Reliability
• Reliability and Construct Validity
– Convergent Validity
– Discriminant Validity
– Criterion-Related Validity
– Investigating Nomological Networks
• Summary
• For Review
– Terms to Know
– Things to Know
Construct Definitions
Measurement produces numerical values that are designed to
summarize characteristics of cases under study. A measure
is an instrument to record such scores. Construct valid measures
yield numerical values that accurately represent the
characteristic.
The most useful conceptual definitions have two elements.
 Construct Domain: First, useful conceptual definitions
identify the nature of the construct by specifying its meaning.
This element explains what a researcher has in mind for
the construct.
 Nomological Networks: Second, a useful conceptual definition
specifies how the construct of interest relates to other
constructs in a broader web of relationships.
Construct Definition Illustration
Construct Validity Challenges
• Construct Validity Challenges
– Random Errors
• The more items, the more successfully this type
of random error is controlled.
– Systematic Errors
• A measure may be contaminated or deficient
– Scores Are Critical
• Systematic variance is sometimes called true
score variance
Scores Are Critical (Exhibit 3.2)
Scores Are Critical
• Exhibit 3.2 also shows three sources of construct
invalidity:
– 1. Scores on a measure may be less than construct
valid because of deficiency. In the example, observed
scores are deficient because the measure does not
capture satisfaction with computer durability.
– 2. Scores may be less than construct valid because of
systematic contamination. In the example, observed
scores are contaminated because the measure includes
satisfaction with the purchasing experience.
– 3. Finally, scores on a measure are less than construct
valid to the extent that they include random errors.
Construct Validation
Because construct validity cannot be assessed directly, it cannot
be directly established. However, there are procedures available
to help researchers develop construct valid measures and to
help evaluate those measures once developed. Six such
procedures are described here.
• Content Validity
• Reliability
– Types of Reliability
– Reliability and Construct Validity
• Convergent Validity
• Discriminant Validity
• Criterion-Related Validity
• Investigating Nomological Networks
Content Validity
• A measure is content valid when its items are judged to accurately
reflect the domain of the construct as defined conceptually.
Content validation ordinarily has experts in the subject matter of
interest provide assessments of content validity.
• As a part of development, the researcher has a panel of experts in
computer programming review the measure for its content. Content
validation of this sort provides information about potential
systematic errors in measures.
• Content validation can help improve the items that form a
measure. Nevertheless, it is not sufficient for construct validity. In
particular, content validation procedures may not provide
information about potential deficiency, nor can subject matter
experts provide much information about random errors that may
be present in the scores that are obtained.
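Expert content judgments are sometimes summarized numerically. One widely used index (not described in these slides) is Lawshe's content validity ratio, computed per item from the number of panel experts who rate the item essential. A minimal sketch with a hypothetical panel:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's content validity ratio (CVR) for a single item.

    n_essential: number of experts rating the item 'essential'
    n_experts:   total number of experts on the panel
    Ranges from -1 (no expert says essential) to +1 (all do).
    """
    return (n_essential - n_experts / 2) / (n_experts / 2)

# Hypothetical panel of 10 experts reviewing three items
print(content_validity_ratio(10, 10))  # 1.0  -> strong content evidence
print(content_validity_ratio(5, 10))   # 0.0  -> split panel
print(content_validity_ratio(2, 10))   # -0.6 -> candidate for revision
```

Low or negative values flag items that the panel judges to fall outside the construct domain.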
Reliability
• Types of Reliability
– Internal consistency reliability refers to the similarity of item
scores obtained on a measure that has multiple items. It can be
assessed when items are intended to measure a single construct. In
the computer example (Exhibit 3.1), satisfaction is measured with six
items. The internal consistency of that questionnaire can be estimated if
scores are available from a set of cases.
– Interrater reliability: indicates the degree to which a group of
observers or raters provide consistent evaluations. For example, the
observers may be a group of international judges who are asked to
evaluate ice skaters performing in a competition. In this case, the judges
serve as measurement repetitions just as the items serve as repetitions
in the computer satisfaction questionnaire. High reliability is obtained
when the judges agree on the evaluation of each skater.
– Stability reliability: refers to the consistency of measurement results
across time. Here measurement repetitions refer to time periods (a measure
is administered more than once).
• Reliability and Construct Validity
*** There are three common contexts in which researchers seek to assess the reliability of measurement.
Chapter 17 describes statistical procedures for estimating these types of reliability.
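Internal consistency is commonly estimated with Cronbach's alpha, one of the statistical procedures of the kind chapter 17 describes. A minimal sketch, assuming scores are arranged as cases x items (the data below are hypothetical responses to six satisfaction items like those of Exhibit 3.1):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha: rows are cases, columns are items.

    alpha = k/(k-1) * (1 - sum(item variances) / variance of total score)
    """
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of five cases to six satisfaction items
scores = [[5, 4, 5, 4, 5, 4],
          [2, 2, 3, 2, 2, 3],
          [4, 4, 4, 3, 4, 4],
          [1, 2, 1, 2, 1, 1],
          [3, 3, 4, 3, 3, 3]]
print(round(cronbach_alpha(scores), 2))  # 0.97 -> highly consistent items
```

High alpha indicates that the items behave as repetitions of one another, consistent with (but not proof of) a single underlying construct.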
Reliability and Construct Validity
Reliability speaks only to a measure's freedom from
random errors. It does not address systematic errors
involving contamination or deficiency. Reliability is
thus necessary for construct validity but not
sufficient. It is necessary because unreliable
variance must be construct invalid. It is not sufficient
because systematic variance may be contaminated
and because reliability simply does not account for
deficiency. In short, reliability addresses only
whether scores are consistent; it does not address
whether scores capture a particular construct as
defined conceptually.
Convergent Validity
• Convergent validity is present when there is a
high correspondence between scores from two
or more different measures of the same
construct. Convergent validity is important
because it must be present if scores from both
measures are construct valid. But convergent
validity is not sufficient for construct validity any
more than is reliability.
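In practice, convergent evidence is usually summarized as the correlation between scores from the two measures. A small sketch with hypothetical scores for eight cases:

```python
import numpy as np

# Hypothetical scores from two different measures of the same construct
# (e.g., two computer-satisfaction questionnaires), one value per case
measure_a = np.array([4.1, 3.5, 2.8, 4.6, 3.9, 2.2, 4.4, 3.1])
measure_b = np.array([4.0, 3.7, 3.0, 4.5, 3.6, 2.5, 4.2, 3.3])

r = np.corrcoef(measure_a, measure_b)[0, 1]  # Pearson correlation
print(round(r, 2))  # 0.98: high convergence, necessary but not sufficient
```

A high correlation is consistent with construct validity; a low one shows that at least one measure is not construct valid.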
Convergent Validity
The area crossed with vertical lines shows the
proportion of variance in scores from the two measures
that is convergent. However, only the area also crossed
with horizontal lines shows common construct valid
variance. The area covered only by vertical lines shows
where the two measures share variance that
represents contamination from a construct validity
perspective. Convergent validity also does not address
whether measures are deficient. Nor does it provide
construct validity information about the proportion of
variance in the two measures that does not converge.
Exhibit 3.3 shows that more of the variance unique to
measure A overlaps with construct-valid variance than
does the variance unique to measure B.
Despite these limitations, evidence of convergent
validity is desirable. If two measures that are designed
to measure the same construct do not converge, at
least one of them is not construct valid. Alternatively, if
they do converge, circumstantial evidence is obtained
that they may both be construct valid. Evidence of
convergent validity adds to a researcher's confidence
in the construct validity of measures.
Discriminant Validity
• Discriminant validity is inferred when scores
from measures of different constructs do not
converge. It thus provides information about
whether scores from a measure of a construct are
unique rather than contaminated by other
constructs.
• Proposed constructs should provide
contributions beyond constructs already in the
research domain. Consequently, measures of
proposed constructs should show evidence of
discriminant validity with measures of existing
constructs.
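The logic of convergent versus discriminant evidence can be illustrated by simulation: two measures of the same construct should correlate more strongly with each other than either does with a measure of a different construct. A sketch with simulated (hypothetical) latent and observed scores:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
latent_x = rng.normal(size=n)   # simulated construct X
latent_y = rng.normal(size=n)   # a distinct construct Y

# Each observed measure = latent score + independent random error
x1 = latent_x + rng.normal(scale=0.5, size=n)
x2 = latent_x + rng.normal(scale=0.5, size=n)
y1 = latent_y + rng.normal(scale=0.5, size=n)

r_convergent = np.corrcoef(x1, x2)[0, 1]    # same construct: high
r_discriminant = np.corrcoef(x1, y1)[0, 1]  # different constructs: near zero
print(round(r_convergent, 2), round(r_discriminant, 2))
```

The pattern (high same-construct correlation, low cross-construct correlation) is the evidence discriminant validation looks for.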
Criterion-Related Validity
• Criterion-related validity is present when scores on a
measure are related to scores on another measure that better
reflects the construct of interest. It differs from convergent
validity, where scores from the measures are assumed to be
equivalent representations of the construct. In criterion-
related validity the criterion measure is assumed to have
greater construct validity than the measure being developed
or investigated.
• Why not just use the criterion measure if it has greater
construct validity?
• Historically, researchers used the term to describe
relationships between a construct (represented by the
measure under consideration) and a measure of another
construct that is thought to be conceptually related to the first.
Investigating Nomological
Networks
• Nomological networks have been described as
relationships between a construct under
measurement consideration and other
constructs.
• Researchers with a conceptual orientation are
interested in the conceptual relationship
between the independent and dependent
variable constructs.
• Specifically, the measurement researcher
assumes the conceptual relationship is true (line
a).
Investigating Nomological
Networks
Investigating Nomological
Networks
• A relationship observed between a measure of one construct and
a measure of another provides only limited evidence for
construct validity. Thus, researchers usually seek to create
more elaborate nomological networks. These may include
several variables that are expected to vary with a measure of
the construct. It also may include variables that are expected
not to vary with it to show evidence of discriminant validity.
• Evidence for construct validity mounts as empirical research
supports relationships expected from a nomological network.
The richer the network and the more support, the
greater a researcher's confidence that the measure is
capturing variance that is construct valid.
• Summary
• For Review
– Terms to Know
– Things to Know
4. Measurement Applications
Research Questionnaires
Questionnaire Decisions
• Alternatives to Questionnaire
Construction
• Secondary Data
• Questionnaires Developed by
Others
• Questionnaire Type
• Self-Reports Versus
Observations
• Interviews Versus Written
Questionnaires
Questionnaire Construction
• Content Domain
• Items
• Item Wording
• Item Sequence
• Scaling
Questionnaire Response Styles
• Self-Reports
• Observations
• Implications for Questionnaire
Construction and Use
Pilot Testing
Summary
For Review
• Terms to Know
• Things to Know
Part II Suggested Readings
Measurement Applications
Research Questionnaires
• Questionnaires are measuring instruments that ask
individuals to answer a set of questions.
• If the questions ask for information about the individual
respondents, they are called self-report
questionnaires. Information obtained in self-report
questionnaires includes biographical
information, attitudes, opinions, and knowledge.
• Individuals may complete self-report questionnaires
by responding to written questions or to questions
shown on a computer terminal. Self-reports also
may be obtained through an interview in which
another individual (the interviewer) asks the questions
verbally and is responsible for recording responses.
Questionnaire Decisions
• Alternatives to Questionnaire
Construction
• Secondary Data
• Questionnaires Developed by Others
• Questionnaire Type
• Self-Reports Versus Observations
• Interviews Versus Written Questionnaires
Questionnaire Decisions
• Constructing a questionnaire is time-consuming
and challenging. It is particularly challenging
when abstract constructs are measured.
• There are two additional questions to address if
it is decided that a questionnaire must be
developed to carry out a research project.
– One, should information be obtained with a written
questionnaire or an interview?
– Two, if the questionnaire is designed to obtain
information about individuals, should the
questionnaire obtain it from outside observers or
from individuals reporting on themselves?
Alternatives to Questionnaire
Construction
Choosing measures depends foremost on the topic a researcher seeks to investigate. Given a
topic, a starting point is to see if the data you are interested in studying may already be
available. If not, a second step is to see if a measure(s) is available that will serve your
research interest.
• Secondary Data
– secondary data: data collected for some other purpose. Such data are available from many
sources.
– Sources include data collected for internal purposes, to meet external requirements, and by outside
organizations (industry trade associations and other organizations, individuals)
• Questionnaires Developed by Others
– Another researcher may have already developed a questionnaire that addresses your research
questions. Although data may not be available, a questionnaire may be available that you can
use to collect your own data.
(Questionnaires measuring constructs relating to many individual characteristics such as
ability, personality, and interests are readily available. Questionnaires are also available for measuring
characteristics of individuals interacting with organizations, such as employee satisfaction. A good method
for finding these measures is to examine research reports on topics related to your research interests.)
– If suitable for your research interests, questionnaires already constructed are obviously
advantageous in the time and effort they save. They are especially attractive if construct validation
research as described in the previous chapter has already been performed.
Questionnaire Type
Often secondary data or questionnaires already
developed are simply not viable options. This is
necessarily true when a researcher chooses
to investigate a construct that has not been
previously defined.
• Self-Reports Versus Observations
• Interviews Versus Written Questionnaires
Self-Reports Versus Observations
• Researchers are often curious about relationships that include
behaviors or characteristics of individuals interacting with
organizations.
• Some constructs require that the information be measured with
responses provided by research participants. (Attitudes and
opinions, intentions, interests, and preferences)
• However, there are other constructs that can be measured either
internally through self-reports or externally by observation. These
constructs typically involve overt behaviors, characteristics of
individuals that can be observed directly. Observations are typically
preferred when constructs can be assessed directly. External
observers are more likely to provide consistent assessments
across research participants. Furthermore, external observers may
be less likely to bias responses in a way that characterizes the
behavior in a favorable light.
Interviews Versus Written
Questionnaires
• A distinction is sometimes drawn between the development
of interviews and the development of questionnaires. This
distinction is largely unwarranted. The difference
between the two procedures resides primarily in the way
information is obtained from research participants.
Interviews elicit information verbally; questionnaires elicit
information in written form. The same care must be taken in
developing interview questions and response formats as is
taken in developing questionnaires.
• A case can be made that interviews allow greater flexibility.
Interviewers can follow up on answers with questions that
probe respondents' thinking in greater depth. Interviewers can
record responses and interviewee behaviors that are not
available as formal questionnaire responses.
Interviews Versus Written
Questionnaires
• These are differences that make interviews attractive in the early
stages of instrument development. Interviews can help researchers
refine questions to be asked and the response formats to be used.
However, when finalized, when a researcher is ready to collect data that
will be used to investigate the main research expectations, a typical
interview schedule will look much like a typical questionnaire.
• The decision about whether to use an interview or questionnaire as the final
measurement instrument depends on other criteria. Assuming the same
care in construction, questionnaires usually are less expensive to
administer. The decision to use an interview or a questionnaire also must
take account of respondents' abilities and motivations. Reading
abilities among some members of heterogeneous populations may make
the use of questionnaires problematic. Interviews may also be
advantageous from a motivational perspective. The interaction
that takes place between the interviewer and interviewee may be used
advantageously to motivate participation and complete responses.
Interviews Versus Written
Questionnaires
• Interaction between interviewers and interviewees
also poses dangers for interviews. There is a
greater risk that the administration of questions
differs from interview to interview.
Furthermore, because there is
interaction, interviewee responses may be
influenced by the particular individual conducting
the interview.
• It is generally desirable to use questionnaires
when possible. The importance of uniformity in
questions and response coding favors
questionnaires. When interviews are used, it is
important that they be conducted as systematically
as possible.
Questionnaire Construction
• Content Domain
• Items
• Item Wording
• Item Sequence
• Scaling
Questionnaire Construction
• Questionnaires, whether administered in
written form or through interviews, have
two essential characteristics. First, they
have items designed to elicit information of
research interest. Second, they have a
protocol for recording responses.
Questionnaire Construction
Content Domain
• A properly designed study will identify the variables to be measured
by the time questionnaire development becomes an issue. If one or
more constructs are included, these should be carefully defined as
described in chapter 3. Items should follow closely from the
definitions.
• Typically, researchers also want to obtain additional information from
their questionnaires. At the very least, information will be sought
about personal descriptive characteristics of the questionnaire
respondents.
• Interesting side issues are likely to occur while the questionnaire is
being developed. As a consequence, it is often tempting to add
items that are not central to the research investigation. Resist this
temptation. Attend to developing a set of items that focus directly
and unequivocally on your research topic. Diverting attention to
related items and issues will likely reduce the quality of items that
are essential. Furthermore, response rates inevitably decline as
questionnaire length increases.
Items
• Item wording and the arrangement of items obviously affect
the responses obtained.
• Item Wording
– 1. Keep the respondent in mind.
– 2. Make it simple.
– 3. Be specific.
– 4. Be honest.
• Item Sequence
– The way items are ordered in a questionnaire is constrained by
the type of items included. For example, order is of little
consequence if items are all similar. However, order can
influence responses when items differ.
– It is helpful to start a questionnaire with items that participants
find interesting and that are easy to complete.
– Ask for demographic information last.
Scaling
• Open-ended response formats permit respondents to
answer questions in their own words. They are sometimes
used on small groups early in the questionnaire development
process to make sure the full range of potential responses is
captured. They also are sometimes used in
interviews, particularly when the questions are designed to
elicit complex responses.
• Closed-ended response formats ask respondents
to choose the one category that most closely applies
to them. Closed-ended responses are easy to complete;
they are also easy to code reliably.
• Categories with equal intervals are attractive for conducting
statistical analyses on scores, as discussed in part IV.
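Coding closed-ended responses then amounts to mapping each category to its numeric score; equal-interval category values make the resulting numbers usable in the statistical analyses of part IV. A minimal sketch with a hypothetical 5-point satisfaction scale:

```python
# Hypothetical 5-point, equal-interval satisfaction scale
SCALE = {
    "very dissatisfied": 1,
    "dissatisfied": 2,
    "neutral": 3,
    "satisfied": 4,
    "very satisfied": 5,
}

responses = ["satisfied", "neutral", "very satisfied", "dissatisfied"]
scores = [SCALE[r] for r in responses]
print(scores)  # [4, 3, 5, 2]
```

Coding categories this way also makes the recording protocol explicit, so every response is scored identically across cases.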
Scaling
RESEARCH HIGHLIGHT 4.1
words to avoid in questionnaires
• Absolutes. Words expressing absolutes such as
always, never, everyone, and all create logical problems because
statements including them are almost always false.
• And. The word and usually signals that the item is getting at two
ideas not one—a double-barreled question. Double-barreled
questions are problematic because responses may differ depending
on which "barrel" is considered.
• You. You is problematic if there can be any question about whether
it refers to the respondent or to a group the respondent represents
(e.g., an organization).
• Adjectives to describe quantity. Words such as
occasionally, sometimes, frequently, and often mean different things
to different people. One person's occasionally may be
equivalent, numerically, to another person's frequently. Use
numerical values when you want to obtain numerical information.
Questionnaire Response Styles
Chapter 3 noted that scores are critical for establishing the
construct validity of measures. This is a reminder that the value of
information obtained from questionnaires is determined by the quality
of scores obtained. Items and scaling formats alone, no matter how
elegant, do not guarantee successful questionnaire outcomes.
Research has established that characteristics of the individuals
completing questionnaires and the environments in which they
complete them often affect the scores obtained. Some of these
characteristics have been studied in situations involving self-reports;
others have been studied in observational ratings.
• Self-Reports
• Observations
• Implications for Questionnaire Construction and Use
Self-Reports
• Two tendencies that influence self-report
responses have received substantial
attention. Social desirability refers to the
tendency to present oneself in a publicly
favorable light. For example, a socially desirable response expresses
approval for a public policy (e.g., the Supreme Court's decision on abortion) because the
respondent believes others approve. Response acquiescence
or yea-saying is a tendency to agree with a
statement regardless of its content. Of the
two, social desirability appears to be a more
general problem for questionnaire responses.
Observations
A number of response styles have also been identified when
individuals are asked to make observations about some object. These
include
• Leniency error, a tendency to systematically provide a more
favorable response than is warranted.
• Severity error, a tendency to systematically provide less favorable
responses than warranted; severity errors are less frequent.
• Central tendency error is present if an observer clusters
responses in the middle of a scale when more variable responses
should be recorded.
• Halo error is present when an observer evaluates an object in an
undifferentiated manner.
For example, a student may provide favorable evaluations to an instructor on all dimensions of teaching effectiveness because the
instructor is effective on one dimension of teaching.
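Some of these response styles leave simple statistical traces: leniency and severity shift a rater's mean away from the scale midpoint, and central tendency shrinks a rater's spread. A screening sketch with hypothetical ratings on a 1-7 scale:

```python
import numpy as np

# Hypothetical ratings: rows = raters, columns = ratees, 1-7 scale
ratings = np.array([
    [6, 6, 7, 6, 7],   # rater 0: mean well above midpoint -> possible leniency
    [2, 1, 2, 1, 2],   # rater 1: mean well below midpoint -> possible severity
    [4, 4, 4, 4, 4],   # rater 2: no spread -> possible central tendency
    [2, 5, 7, 3, 6],   # rater 3: uses the full scale
])

means = ratings.mean(axis=1)    # compare with the scale midpoint (4)
spreads = ratings.std(axis=1)   # unusually small spread is suspect
print(means)
print(spreads)
```

Such summaries only flag candidates for follow-up; as the next slide notes, these errors are difficult to identify conclusively in practice.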
Implications for Questionnaire
Construction and Use
Self-report and observer errors are difficult to identify in practice.
Furthermore, attempts to control for either self-report response styles
or observational errors through questionnaire construction have
only limited success.
• Forced-choice scales are designed to provide respondents with
choices that appear to be equal in social desirability or equal in
favorability. Behaviorally anchored observation or rating scales
(see Research Highlight 4.2) are designed to yield more
accurate ratings by providing respondents with meaningful scale
anchors to help generate scores that are less susceptible to
rating errors.
• Unfortunately, research investigations comparing formats on
common self-report and observational problems have not found one
format to be systematically better than others.
Pilot Testing
• No matter how much care is used, questionnaire construction remains an imprecise research
procedure. Before using a questionnaire for substantive research, it is essential to obtain
information by pilot testing the questionnaire on individuals similar to those who will be
asked to complete it as a part of the substantive research. Two types of pilot tests are
desirable.
• One type asks individuals, preferably like those who will complete the final
questionnaire, to provide their interpretation and understanding of each item. This
assessment will help identify errors in assumptions about participants' frames of reference.
It also helps identify items that are difficult to understand. Pilot tests of this sort will almost
always lead to changes in the design of a research questionnaire. These changes may
help increase response rates, reduce missing data, and obtain more valid responses on
the final questionnaire.
• A second type of pilot test is more like a regular research study; a large number of
respondents are desirable. Data from this type of pilot test are used to see if scores
behave as expected. Are average scores reasonable? Do scores on items vary as
expected? Analyses assessing relationships among items are also useful in this
type of pilot test. For example, internal consistency reliability of multi-item measures
can be assessed by a procedure described in chapter 17. Indeed, this second type of pilot
test can be viewed as an important step in construct validation as described in the last
chapter. However, its preliminary nature must be emphasized. Changes in items will
almost always be suggested the first time scores from a new questionnaire are analyzed.
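One common analysis in this second type of pilot test is the corrected item-total correlation: each item is correlated with the sum of the remaining items, and items that fail to track the rest of the measure become candidates for revision. A sketch with hypothetical pilot data:

```python
import numpy as np

def item_total_correlations(item_scores):
    """Corrected item-total correlation for each item: correlate the
    item with the sum of all *other* items (rows = cases, cols = items)."""
    x = np.asarray(item_scores, dtype=float)
    total = x.sum(axis=1)
    return np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1]
        for j in range(x.shape[1])
    ])

# Hypothetical pilot data: items 0 and 1 track each other; item 2 does not
pilot = [[1, 1, 5],
         [2, 2, 1],
         [3, 3, 4],
         [4, 4, 2],
         [5, 5, 3]]
print(item_total_correlations(pilot).round(2))  # item 2 stands out
```

A pilot sample this small is for illustration only; the text notes that this type of pilot test wants a large number of respondents.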
Summary
For Review
• Terms to Know
• Things to Know
Part II Suggested Readings

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Dernier (20)

GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Part 2

  • 1. Research Methods for Organizational Studies, 2nd ed., 2004. LIU YING. copyrightⓒ 2013 All rights reserved by LIU YING
  • 2. 3. Measurement Foundations Validity and Validation • Construct Definitions – Construct Domain – Nomological Networks • Construct Definition Illustration • Construct Validity Challenges – Random Errors – Systematic Errors – Scores Are Critical • Construct Validation – Content Validity – Reliability • Types of Reliability • Reliability and Construct Validity – Convergent Validity – Discriminant Validity – Criterion-Related Validity – Investigating Nomological Networks • Summary • For Review – Terms to Know – Things to Know
  • 3. Construct Definitions Measurement produces numerical values that are designed to summarize characteristics of cases under study. A measure is an instrument to record such scores. Construct valid measures yield numerical values that accurately represent the characteristic. The most useful conceptual definitions have two elements.  Construct Domain: First, useful conceptual definitions identify the nature of the construct by specifying its meaning. This element explains what a researcher has in mind for the construct.  Nomological Networks: The second element should also specify how the construct of interest relates to other constructs in a broader web of relationships.
  • 4. Construct Definition Illustration
  • 5. Construct Validity Challenges – Random Errors • The more items, the more successfully this type of random error is controlled. – Systematic Errors • A measure is Contaminated or Deficient – Scores Are Critical • Systematic variance is sometimes called true score variance
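The claim that more items control random error can be illustrated with a small simulation (not from the text; the true score, error spread, and sample sizes below are hypothetical). Each item score is the true score plus random noise, and averaging more items shrinks the variance of the observed score around the true score:

```python
import random

random.seed(42)

TRUE_SCORE = 4.0   # hypothetical "true" satisfaction level of one respondent
ERROR_SD = 1.0     # spread of the random measurement error per item


def observed_mean(n_items):
    """Average of n_items item scores, each = true score + random error."""
    items = [TRUE_SCORE + random.gauss(0, ERROR_SD) for _ in range(n_items)]
    return sum(items) / n_items


def error_variance(n_items, n_respondents=10_000):
    """Variance of observed means around the true score, over many respondents."""
    devs = [observed_mean(n_items) - TRUE_SCORE for _ in range(n_respondents)]
    return sum(d * d for d in devs) / len(devs)


for k in (1, 3, 6, 12):
    # Variance falls roughly as 1/k as items are added.
    print(k, round(error_variance(k), 3))
```

With one item the error variance is about 1.0 (the error variance itself); with twelve items it drops to roughly a twelfth of that, which is why multi-item measures are preferred.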
  • 6. Scores Are Critical
  • 7. Scores Are Critical • Exhibit 3.2 also shows three sources of construct invalidity: – 1. Scores on a measure may be less than construct valid because of deficiency. In the example, observed scores are deficient because the measure does not capture satisfaction with computer durability. – 2. Scores may be less than construct valid because of systematic contamination. In the example, observed scores are contaminated because the measure includes satisfaction with the purchasing experience. – 3. Finally, scores on a measure are less than construct valid to the extent that they include random errors.
  • 8. Construct Validation Because construct validity cannot be assessed directly, it can never be definitively established. However, procedures are available to help researchers develop construct valid measures and to evaluate those measures once developed. Six such procedures are described here. • Content Validity • Reliability – Types of Reliability – Reliability and Construct Validity • Convergent Validity • Discriminant Validity • Criterion-Related Validity • Investigating Nomological Networks
  • 9. Content Validity • A measure is content valid when its items are judged to accurately reflect the domain of the construct as defined conceptually. Content validation ordinarily has experts in the subject matter of interest provide assessments of content validity. • As a part of development, the researcher has a panel of experts in computer programming review the measure for its content. Content validation of this sort provides information about potential systematic errors in measures. • Content validation can help improve the items that form a measure. Nevertheless, it is not sufficient for construct validity. In particular, content validation procedures may not provide information about potential deficiency, nor can subject matter experts provide much information about random errors that may be present in the scores that are obtained.
  • 10. Reliability • Types of Reliability – Internal consistency reliability: refers to the similarity of item scores obtained on a measure that has multiple items. It can be assessed when items are intended to measure a single construct. In the computer example (Exhibit 3.1), satisfaction is measured with six items. The internal consistency of that questionnaire can be estimated if scores are available from a set of cases. – Interrater reliability: indicates the degree to which a group of observers or raters provide consistent evaluations. For example, the observers may be a group of international judges who are asked to evaluate ice skaters performing in a competition. In this case, the judges serve as measurement repetitions just as the items serve as repetitions in the computer satisfaction questionnaire. High reliability is obtained when the judges agree on the evaluation of each skater. – Stability reliability: refers to the consistency of measurement results across time. Here measurement repetitions refer to time periods (a measure is administered more than once). • Reliability and Construct Validity *** These are the three common contexts in which researchers seek to assess the reliability of measurement. Chapter 17 describes statistical procedures for estimating these types of reliability.
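Internal consistency is commonly estimated with Cronbach's alpha, one statistic of the kind chapter 17 describes (the text does not prescribe this particular coefficient). A minimal sketch, using an invented six-item, five-respondent data set rather than the actual Exhibit 3.1 data:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / variance of totals).

    item_scores: one list of scores per item, all the same length (one score
    per respondent).
    """
    k = len(item_scores)
    n = len(item_scores[0])

    def variance(xs):
        # Sample variance (n - 1 denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col[i] for col in item_scores) for i in range(n)]
    item_var_sum = sum(variance(col) for col in item_scores)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical six-item satisfaction measure, five respondents; each row is
# one item's scores across respondents.
items = [
    [5, 4, 2, 5, 3],
    [4, 4, 1, 5, 3],
    [5, 3, 2, 4, 2],
    [4, 4, 2, 5, 3],
    [5, 5, 1, 4, 3],
    [4, 3, 2, 5, 2],
]
print(round(cronbach_alpha(items), 2))  # ≈ 0.96: items are highly consistent
```

High alpha here reflects that each respondent answers all six items similarly, which is exactly the "similarity of item scores" the slide describes.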
  • 11. Reliability and Construct Validity Reliability speaks only to a measure's freedom from random errors. It does not address systematic errors involving contamination or deficiency. Reliability is thus necessary for construct validity but not sufficient. It is necessary because unreliable variance must be construct invalid. It is not sufficient because systematic variance may be contaminated and because reliability simply does not account for deficiency. In short, reliability addresses only whether scores are consistent; it does not address whether scores capture a particular construct as defined conceptually.
  • 12. Convergent Validity • Convergent validity is present when there is a high correspondence between scores from two or more different measures of the same construct. Convergent validity is important because it must be present if scores from both measures are construct valid. But convergent validity is not sufficient for construct validity any more than is reliability.
  • 13. Convergent Validity The area crossed with vertical lines shows the proportion of variance in scores from the two measures that is convergent. However, only the area also crossed with horizontal lines shows common construct valid variance. The area covered only by vertical lines shows where the two measures share variance that represents contamination from a construct validity perspective. Convergent validity also does not address whether measures are deficient. Nor does it provide construct validity information about the proportion of variance in the two measures that does not converge. Exhibit 3.3 shows that more of the variance unique to measure A overlaps with construct variance than variance unique to measure B. Despite these limitations, evidence of convergent validity is desirable. If two measures that are designed to measure the same construct do not converge, at least one of them is not construct valid. Alternatively, if they do converge, circumstantial evidence is obtained that they may both be construct valid. Evidence of convergent validity adds to a researcher's confidence in the construct validity of measures.
  • 14. Discriminant Validity • Discriminant validity is inferred when scores from measures of different constructs do not converge. It thus provides information about whether scores from a measure of a construct are unique rather than contaminated by other constructs. • Proposed constructs should provide contributions beyond constructs already in the research domain. Consequently, measures of proposed constructs should show evidence of discriminant validity with measures of existing constructs.
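Convergent and discriminant evidence are both usually examined through correlations: two measures of the same construct should correlate highly, while measures of different constructs should not. A minimal sketch with invented scores (Pearson's r is one common index; the text does not prescribe a specific statistic, and the variables below are hypothetical):

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical scores for ten respondents:
sat_a = [5, 4, 2, 5, 3, 4, 1, 5, 2, 4]            # satisfaction, measure A
sat_b = [4, 4, 2, 5, 3, 5, 2, 4, 1, 4]            # satisfaction, measure B (same construct)
age = [34, 51, 29, 42, 38, 45, 31, 27, 56, 40]    # a different construct entirely

print(pearson_r(sat_a, sat_b))  # high correlation -> convergent evidence
print(pearson_r(sat_a, age))    # near zero -> discriminant evidence
```

As the slides caution, a high A-B correlation is only circumstantial evidence of construct validity, since the shared variance may itself be contaminated.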
  • 15. Criterion-Related Validity • Criterion-related validity is present when scores on a measure are related to scores on another measure that better reflects the construct of interest. It differs from convergent validity, where scores from the measures are assumed to be equivalent representations of the construct. In criterion-related validity the criterion measure is assumed to have greater construct validity than the measure being developed or investigated. • Why not just use the criterion measure if it has greater construct validity? • Historically, researchers used the term to describe relationships between a construct (represented by the measure under consideration) and a measure of another construct that is thought to be conceptually related to the first.
  • 16. Investigating Nomological Networks • Nomological networks have been described as relationships between a construct under measurement consideration and other constructs. • Researchers with a conceptual orientation are interested in the conceptual relationship between the independent and dependent variable constructs. • Specifically, the measurement researcher assumes the conceptual relationship is true (line a).
  • 17. Investigating Nomological Networks
  • 18. Investigating Nomological Networks • A relationship observed between one measured construct and a measure of another provides only limited evidence for construct validity. Thus, researchers usually seek to create more elaborate nomological networks. These may include several variables that are expected to vary with a measure of the construct. They also may include variables that are expected not to vary with it, to show evidence of discriminant validity. • Evidence for construct validity mounts as empirical research supports relationships expected from a nomological network. The richer the network and the more support, the greater a researcher's confidence that the measure is capturing variance that is construct valid.
  • 19. • Summary • For Review – Terms to Know – Things to Know
  • 20. 4. Measurement Applications Research Questionnaires Questionnaire Decisions • Alternatives to Questionnaire Construction • Secondary Data • Questionnaires Developed by Others • Questionnaire Type • Self-Reports Versus Observations • Interviews Versus Written Questionnaires Questionnaire Construction • Content Domain • Items • Item Wording • Item Sequence • Scaling Questionnaire Response Styles • Self-Reports • Observations • Implications for Questionnaire Construction and Use Pilot Testing Summary For Review • Terms to Know • Things to Know Part II Suggested Readings
  • 21. Measurement Applications Research Questionnaires • Questionnaires are measuring instruments that ask individuals to answer a set of questions. • If the questions ask for information about the individual respondents, they are called self-report questionnaires. Information obtained in self-report questionnaires includes biographical information, attitudes, opinions, and knowledge. • Individuals may complete self-report questionnaires by responding to written questions or to questions shown on a computer terminal. Self-reports also may be obtained through an interview in which another individual (the interviewer) asks the questions verbally and is responsible for recording responses.
  • 22. Questionnaire Decisions • Alternatives to Questionnaire Construction • Secondary Data • Questionnaires Developed by Others • Questionnaire Type • Self-Reports Versus Observations • Interviews Versus Written Questionnaires
  • 23. Questionnaire Decisions • Constructing a questionnaire is time-consuming and challenging. It is particularly challenging when abstract constructs are measured. • There are two additional questions to address if it is decided that a questionnaire must be developed to carry out a research project. – One, should information be obtained with a written questionnaire or an interview? – Two, if the questionnaire is designed to obtain information about individuals, should the questionnaire obtain it from outside observers or from individuals reporting on themselves?
  • 24. Alternatives to Questionnaire Construction Choosing measures depends foremost on the topic a researcher seeks to investigate. Given a topic, a starting point is to see if the data you are interested in studying may already be available. If not, a second step is to see if a measure is available that will serve your research interest. • Secondary Data – Secondary data: data collected for some other purpose. Such data are available from many sources. – Internal purposes, external requirements, outside organizations (industry trade associations and organizations, individuals) • Questionnaires Developed by Others – Another researcher may have already developed a questionnaire that addresses your research questions. Although data may not be available, a questionnaire may be available that you can use to collect your own data. (Questionnaires measuring constructs relating to many individual characteristics such as ability, personality, and interests are readily available. Questionnaires are also available for measuring characteristics of individuals interacting with organizations, such as employee satisfaction. A good method for finding these measures is to examine research reports on topics related to your research interests.) – If suitable for your research interests, questionnaires already constructed are obviously advantageous in the time and effort they save. They are especially attractive if construct validation research as described in the previous chapter has already been performed.
  • 25. Questionnaire Type Often secondary data or questionnaires already developed are simply not viable options. This is necessarily true when a researcher chooses to investigate a construct that has not been previously defined. • Self-Reports Versus Observations • Interviews Versus Written Questionnaires
  • 26. Self-Reports Versus Observations • Researchers are often curious about relationships that include behaviors or characteristics of individuals interacting with organizations. • Some constructs require that the information be measured with responses provided by research participants. (Attitudes and opinions, intentions, interests, and preferences) • However, there are other constructs that can be measured either internally through self-reports or externally by observation. These constructs typically involve overt behaviors, characteristics of individuals that can be observed directly. Observations are typically preferred when constructs can be assessed directly. External observers are more likely to provide consistent assessments across research participants. Furthermore, external observers may be less likely to bias responses in a way that characterizes the behavior in a favorable light.
  • 27. Interviews Versus Written Questionnaires • A distinction is sometimes drawn between the development of interviews and the development of questionnaires. This distinction is largely unwarranted. The difference between the two procedures resides primarily in the way information is obtained from research participants. Interviews elicit information verbally; questionnaires elicit information in written form. The same care must be taken in developing interview questions and response formats as is taken in developing questionnaires. • A case can be made that interviews allow greater flexibility. Interviewers can follow up on answers with questions that probe respondents' thinking in greater depth. Interviewers can record responses and interviewee behaviors that are not available as formal questionnaire responses.
  • 28. Interviews Versus Written Questionnaires • These are differences that make interviews attractive in the early stages of instrument development. Interviews can help researchers refine questions to be asked and the response formats to be used. However, once finalized, when a researcher is ready to collect data that will be used to investigate the main research expectations, a typical interview schedule will look much like a typical questionnaire. • The decision about whether to use an interview or questionnaire as the final measurement instrument depends on other criteria. Assuming the same care in construction, questionnaires usually are less expensive to administer. The decision to use an interview or a questionnaire also must take account of respondents' abilities and motivations. Reading abilities among some members of heterogeneous populations may make the use of questionnaires problematic. Interviews may also be advantageous from a motivational perspective. The interaction that takes place between the interviewer and interviewee may be used advantageously to motivate participation and complete responses.
  • 29. Interviews Versus Written Questionnaires • Interaction between interviewers and interviewees also poses dangers for interviews. There is a greater risk that the administration of questions differs from interview to interview. Furthermore, because there is interaction, interviewee responses may be influenced by the particular individual conducting the interview. • It is generally desirable to use questionnaires when possible. The importance of uniformity in questions and response coding favors questionnaires. When interviews are used, it is important that they be conducted as systematically as possible.
  • 30. Questionnaire Construction • Content Domain • Items • Item Wording • Item Sequence • Scaling
  • 31. Questionnaire Construction • Questionnaires, whether administered in written form or through interviews, have two essential characteristics. First, they have items designed to elicit information of research interest. Second, they have a protocol for recording responses.
  • 32. Questionnaire Construction
  • 33. Content Domain • A properly designed study will identify the variables to be measured by the time questionnaire development becomes an issue. If one or more constructs are included, these should be carefully defined as described in chapter 3. Items should follow closely from the definitions. • Typically, researchers also want to obtain additional information from their questionnaires. At the very least, information will be sought about personal descriptive characteristics of the questionnaire respondents. • Interesting side issues are likely to occur while the questionnaire is being developed. As a consequence, it is often tempting to add items that are not central to the research investigation. Resist this temptation. Attend to developing a set of items that focus directly and unequivocally on your research topic. Diverting attention to related items and issues will likely reduce the quality of items that are essential. Furthermore, response rates inevitably decline as questionnaire length increases.
• 34. Items • Item wording and the arrangement of items obviously affect the responses obtained. • Item Wording – 1. Keep the respondent in mind. – 2. Make it simple. – 3. Be specific. – 4. Be honest. • Item Sequence – The way items are ordered in a questionnaire is constrained by the type of items included. For example, order is of little consequence if items are all similar. However, order can influence responses when a questionnaire contains different types of items. – It is helpful to start a questionnaire with items that participants find interesting and that are easy to complete. – Ask for demographic information last.
• 35. Scaling • An open-ended response format permits respondents to answer questions in their own words. Open-ended formats are sometimes used on small groups early in the questionnaire development process to make sure the full range of potential responses is captured. They are also sometimes used in interviews, particularly when the questions are designed to elicit complex responses. • A closed-ended response format asks respondents to choose the one category that most closely applies to them. Closed-ended responses are easy to complete; they are also easy to code reliably. • Categories with equal intervals are attractive for conducting statistical analyses on scores, as discussed in part IV.
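The coding of a closed-ended, equal-interval format can be sketched as follows. This is a minimal illustration only; the agreement categories and the sample responses are hypothetical, not drawn from the text.

```python
# Hypothetical five-category agreement scale coded with equal
# intervals, so the resulting scores can be used in the kinds of
# statistical analyses discussed in part IV.
AGREEMENT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Closed-ended responses from four hypothetical respondents.
responses = ["agree", "strongly agree", "disagree", "agree"]

# Reliable coding is mechanical: each category maps to one number.
scores = [AGREEMENT[r.lower()] for r in responses]
print(scores)                      # → [4, 5, 2, 4]
print(sum(scores) / len(scores))   # → 3.75
```

Because every respondent who picks the same category receives the same number, closed-ended formats sidestep the coder judgment that open-ended answers require.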
• 36. Scaling
• 37. RESEARCH HIGHLIGHT 4.1 Words to Avoid in Questionnaires • Absolutes. Words expressing absolutes such as always, never, everyone, and all create logical problems because statements including them are almost always false. • And. The word and usually signals that the item is getting at two ideas, not one (a double-barreled question). Double-barreled questions are problematic because responses may differ depending on which "barrel" is considered. • You. You is problematic if there can be any question about whether it refers to the respondent or to a group the respondent represents (e.g., an organization). • Adjectives to describe quantity. Words such as occasionally, sometimes, frequently, and often mean different things to different people. One person's occasionally may be equivalent, numerically, to another person's frequently. Use numerical values when you want to obtain numerical information.
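The highlight's word lists lend themselves to a rough automated screen during item drafting. The sketch below flags the words named on the slide; the word lists are taken from the highlight but are not exhaustive, and the function name is an invention for illustration.

```python
# Rough lint pass over draft questionnaire items, flagging word
# classes that Research Highlight 4.1 warns against.
FLAGGED = {
    "absolute": {"always", "never", "everyone", "all"},
    "double-barreled": {"and"},
    "vague quantity": {"occasionally", "sometimes", "frequently", "often"},
}

def lint_item(item):
    """Return (problem, word) pairs found in a draft item."""
    words = item.lower().replace(",", " ").replace(".", " ").split()
    hits = []
    for problem, vocab in FLAGGED.items():
        for w in words:
            if w in vocab:
                hits.append((problem, w))
    return hits

# Flags "always" (absolute) and "and" (double-barreled question).
print(lint_item("I always enjoy my work and my coworkers."))
```

A screen like this cannot judge context (an "and" inside a proper noun is harmless), so it supplements rather than replaces a careful read of each item.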
• 38. Questionnaire Response Styles Chapter 3 noted that scores are critical for establishing the construct validity of measures. This is a reminder that the value of information obtained from questionnaires is determined by the quality of scores obtained. Items and scaling formats alone, no matter how elegant, do not guarantee successful questionnaire outcomes. Research has established that characteristics of the individuals completing questionnaires and the environments in which they complete them often affect the scores obtained. Some of these characteristics have been studied in situations involving self-reports; others have been studied in observational ratings. • Self-Reports • Observations • Implications for Questionnaire Construction and Use
• 39. Self-Reports • Two tendencies that influence self-report responses have received substantial attention. Social desirability refers to the tendency to present oneself in a publicly favorable light. For example, a socially desirable response expresses approval for a public policy (e.g., the Supreme Court's decision on abortion) because the respondent believes others approve. Response acquiescence or yea-saying is a tendency to agree with a statement regardless of its content. Of the two, social desirability appears to be a more general problem for questionnaire responses.
• 40. Observations A number of response styles have also been identified when individuals are asked to make observations about some object. These include: • Leniency error, a tendency to systematically provide more favorable responses than are warranted. • Severity error, a tendency to systematically provide less favorable responses than are warranted; it is less frequent. • Central tendency error, present when an observer clusters responses in the middle of a scale when more variable responses should be recorded. • Halo error, present when an observer evaluates an object in an undifferentiated manner. For example, a student may provide favorable evaluations to an instructor on all dimensions of teaching effectiveness because the instructor is effective on one dimension of teaching.
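Leniency, severity, and central tendency errors all leave traces in simple descriptive statistics (rater means and rating spread), so a crude screen can be sketched. The thresholds below are illustrative assumptions, not values from the text, and such flags only suggest a response style rather than prove one.

```python
# Descriptive screen for rating-response styles on a bounded scale.
# Threshold choices (one point from the midpoint; variance 0.25)
# are arbitrary assumptions for illustration.
def screen_rater(ratings, scale_min=1, scale_max=5):
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / n
    midpoint = (scale_min + scale_max) / 2
    flags = []
    if mean > midpoint + 1:
        flags.append("possible leniency error")      # too favorable
    if mean < midpoint - 1:
        flags.append("possible severity error")      # too unfavorable
    if var < 0.25:
        flags.append("possible central tendency error")  # too little spread
    return flags

print(screen_rater([5, 5, 4, 5, 5]))   # uniformly high ratings
print(screen_rater([3, 3, 3, 3, 3]))   # clustered at the midpoint
```

Halo error is harder to screen this way because it concerns undifferentiated ratings across dimensions of one object, which requires comparing a rater's scores across dimensions rather than within one scale.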
• 41. Implications for Questionnaire Construction and Use Self-report and observer errors are difficult to identify in practice. Furthermore, attempts to control for either self-report response styles or observational errors through questionnaire construction have only limited success. • Forced-choice scales are designed to provide respondents with choices that appear to be equal in social desirability or equal in favorability. Behaviorally anchored observation or rating scales (see Research Highlight 4.2) are designed to yield more accurate ratings by providing respondents with meaningful scale anchors to help generate scores that are less susceptible to rating errors. • Unfortunately, research investigations comparing formats on common self-report and observational problems have not found one format to be systematically better than others.
• 42. Pilot Testing • No matter how much care is used, questionnaire construction remains an imprecise research procedure. Before using a questionnaire for substantive research, it is essential to obtain information by pilot testing the questionnaire on individuals similar to those who will be asked to complete it as a part of the substantive research. Two types of pilot tests are desirable. • One type asks individuals, preferably like those who will complete the final questionnaire, to provide their interpretation and understanding of each item. This assessment will help identify errors in assumptions about participants' frames of reference. It also helps identify items that are difficult to understand. Pilot tests of this sort will almost always lead to changes in the design of a research questionnaire. These changes may help increase response rates, reduce missing data, and obtain more valid responses on the final questionnaire. • A second type of pilot test is more like a regular research study; a large number of respondents are desirable. Data from this type of pilot test are used to see if scores behave as expected. Are average scores reasonable? Do scores on items vary as expected? Analyses assessing relationships among items are also useful in this type of pilot test. For example, internal consistency reliability of multi-item measures can be assessed by a procedure described in chapter 17. Indeed, this second type of pilot test can be viewed as an important step in construct validation as described in the last chapter. However, its preliminary nature must be emphasized. Changes in items will almost always be suggested the first time scores from a new questionnaire are analyzed.
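The text defers the internal-consistency procedure to chapter 17. As a rough sketch of what such a pilot-test check involves, the following computes Cronbach's alpha, a standard internal-consistency coefficient, on invented data (six respondents by four items, scored 1 to 5); the data and sample size are purely illustrative, and real pilot tests of this type want many more respondents.

```python
# Cronbach's alpha: internal consistency of a multi-item measure,
# computed from item variances and total-score variance using
# sample variances (n - 1 denominator).
def cronbach_alpha(scores):
    """scores: list of respondent rows, each a list of item scores."""
    k = len(scores[0])          # number of items
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical pilot data: rows are respondents, columns are items.
pilot = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(pilot), 2))   # → 0.96
```

A high value suggests the items covary enough to be summed into one score; a low value in a pilot test is exactly the kind of signal that prompts item revision before the substantive study.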
• 43. Summary • For Review – Terms to Know – Things to Know • Part II Suggested Readings