Validity and Reliability
Session 2
Chapter 4
Colton & Covert (2007)
What is Validity?
According to Colton and Covert (2007), validity is
“the ability of an instrument to measure what you intend
it to measure” (p. 65).
Validity ensures trustworthy and credible information.
“Validity is a matter of degree.”
(Colton & Covert, 2007, p. 65)
• Assessment instruments are not merely valid or
invalid.
• Validity exists in varying degrees across a
continuum.
• Validity is a characteristic of the responses/data
gathered.
• The greater the evidence of validity, the greater
the likelihood of credible, trustworthy data.
• Hence, the importance of establishing/testing
the validity before the instrument is used.
In order to gather evidence that an
instrument is valid, we need to establish
that it is measuring:
1. the right content (Content Validity)
(Does the instrument measure the content it’s intended to measure?)
2. the right construct (Construct Validity)
(Does the instrument measure the construct it’s designed to measure?)
3. the right criterion (Criterion Validity)
(Do the instrument scores align with one or more standards or outcomes
related to the instrument’s intent?)
Establishing Evidence of
Content Validity
To determine this, ask:
Do the items in the instrument represent
the topics or process being investigated?
Ex: An instrument designed to measure alcohol use
should measure behaviors associated with alcohol use
(not smoking, drug use, etc.).
Establishing Evidence of
Content Validity
These steps are done during the assessment development stage:
1. Define the content domain that the assessment intends to measure.
2. Through a literature review, define the components of the content
domain that should be represented in the assessment.
3. Write the items/questions that reflect this defined content domain.
4. Have a panel of topic experts review the items/questions.
An Example: Establishing Evidence of
Content-related Validity
You are to design an instrument to measure undergraduate college teaching
effectiveness.
1. Clearly define the domain of the content that the assessment intends to represent.
Determine the topics/principles related to college teaching effectiveness using the
literature.
2. Define the components of the content domain that should be represented in the
assessment.
Select the content areas that are specific to effective undergraduate college
teaching (not graduate school or adult learning).
3. Write items/questions that reflect this defined content domain.
Write response items for each component.
4. Have a panel of topic experts review the response items for clarity and coverage.
Recommended method for a response-item
review by a panel of topic experts (Popham, 2000):
1. Have the panel of experts individually examine each item for
content relevance, noting YES (relevant) or NO (not relevant).
2. Calculate the percentage of YES responses for each item
and then the average percent of YES across all items. This
reflects item relevance.
3. Have panel members individually review the instrument for
content coverage, noting a percentage estimate.
4. Compute the average of all panelists’ estimates of
coverage. This reflects content coverage (a short calculation
sketch follows).
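As a rough illustration, these tallies are simple percentage arithmetic. The sketch below assumes a hypothetical three-person panel rating five items; all numbers are invented for the example.

```python
# Hypothetical Popham-style panel tallies.
# relevance_votes: one inner list per panelist; True = "YES, relevant".
relevance_votes = [
    [True, True, True, False, True],   # panelist 1, items 1-5
    [True, True, False, True, True],   # panelist 2
    [True, True, True, True, True],    # panelist 3
]
coverage_estimates = [90, 80, 85]      # each panelist's coverage estimate (%)

n_panelists = len(relevance_votes)
n_items = len(relevance_votes[0])

# Step 2: percent of YES votes per item, then the average across items.
per_item = [
    100 * sum(votes[i] for votes in relevance_votes) / n_panelists
    for i in range(n_items)
]
item_relevance = sum(per_item) / n_items

# Step 4: average of the panelists' coverage estimates.
content_coverage = sum(coverage_estimates) / n_panelists

print(f"Item relevance:   {item_relevance:.0f}%")
print(f"Content coverage: {content_coverage:.0f}%")
```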
What do the results mean?
95% item relevance
85% content coverage
Impressive evidence of content-related validity!
You could say, with relative confidence, that the instrument
validly measures the content it intends to measure.
---------------------
65% item relevance
40% content coverage
Poor evidence of content-related validity.
You could NOT say, with confidence, that the instrument validly
measures the content it intends to measure.
Establishing Criterion Validity
To determine this, ask:
Are the results from the instrument
comparable to an external standard or
outcome?
There are 2 types:
1. Predictive-related validity
2. Concurrent-related validity
1. Predictive-related Criterion Validity
The assessment scores are valid for predicting future
outcomes regarding similar criteria.
A significant lag time is needed.
Ex: A group of students take a standardized math & verbal
aptitude test in 10th grade and score very low. In the
students’ senior year, 2 years later, the students’ math and
verbal aptitude scores (criterion data) on the SAT (a college
entrance exam) turn out to be similarly low.
In this case, evidence of predictive criterion-related validity has been established.
We can trust the predictive inferences regarding math & verbal skills made from
this standardized instrument 
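The usual statistic for this comparison is a correlation between the earlier test scores and the later criterion scores. A minimal sketch, with invented score pairs, using SciPy's pearsonr:

```python
# Hypothetical data: 10th-grade aptitude scores vs. SAT scores (criterion)
# collected for the same students two years later.
from scipy.stats import pearsonr

grade10_scores = [410, 380, 450, 400, 360, 430, 390, 370]
sat_scores     = [420, 400, 470, 410, 350, 450, 380, 390]

r, p = pearsonr(grade10_scores, sat_scores)
print(f"predictive validity coefficient r = {r:.2f} (p = {p:.3f})")
# A strong positive r supports predictive criterion-related validity.
```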
2. Concurrent-related Criterion Validity
The assessment scores are valid for indicating
current behavior.
Ex: A group of students take a standardized math &
reading comprehension aptitude test in 10th grade
and receive very low scores. The scores are
compared to grades in 10th grade algebra and English
literature courses. They are equally low.
In this case, evidence of concurrent criterion-related validity has been established.
We trust the inferences regarding math and reading comprehension scores made
from the standardized instrument.
Establishing Construct Validity
To determine this, ask:
Does the instrument measure the construct
(i.e. psychological characteristic or human
behavior) it’s designed to measure?
Note: “Constructs” are hypothetical or theoretical.
Example: “Love” is a theoretical construction. Everyone
has constructed their own theory of what it is.
Establishing Construct Validity
• The first step is to use the literature to
operationalize (i.e. define) the construct.
• A panel of topic experts can add additional
support.
• Specific studies provide additional evidence.
Studies for Establishing Construct
Validity
1. Intervention studies
2. Differential-population studies
3. Related-measures studies: compare scores to
other measures that measure the same
construct
1. Intervention studies
Demonstrate pre-post changes in the construct
of interest based on a treatment.
Ex: An inventory designed to measure test anxiety is given to 25
students self-identified as having test anxiety and 25 students
who claim they do not. The inventory is administered just
before a high-stakes final exam. As predicted, the scores were
significantly different between the test anxiety group and
non-test anxiety group.
In this case, evidence of construct-related validity has been established.
We can trust inferences regarding anxiety based on the anxiety inventory scores.
2. Differential-population studies:
Demonstrate different populations score differently
on the measure.
Ex: An inventory is designed to measure insecurity due to
baldness. The inventory is given to bald-headed men and
men with a head full of hair. As predicted, the bald-headed
men had much higher scores than the men with hair.
Evidence of construct-related validity has been established.
We can trust inferences regarding insecurity due to baldness based on the inventory scores.
3. Related-measures studies:
Correlate scores (positive or negative) to other
measures that measure similar constructs.
Ex: An inventory is designed to measure introversion.
The inventory is given to salespeople who scored high on an
extroversion inventory. As predicted, the salespeople
scored very low on the introversion inventory.
Evidence of construct-related validity has been established.
We can trust inferences regarding introversion based on the inventory scores.
It is recommended to continually establish construct-related validity as the instrument
is used, since the theoretical definition of a construct changes over time.
Other types of validity
• Convergent Validity
• Discriminant Validity
• Multicultural Validity: Evidence that the instrument measures what it
intends to measure as understood by participants of a particular culture.
For example: If your instrument is to be administered to the Hmong population, then
the language, phrases, and connotations should be understood by this culture.
Convergent and discriminant validity are both types of construct
validity (see the correlation sketch after this list).
• Convergent validity refers to evidence that
similar constructs are strongly related.
For example: If your instrument is
measuring Depression, the response items
related to Sadness should score similarly.
• Discriminant validity refers to evidence that
dissimilar constructs are NOT related.
For example: If your instrument is
measuring Depression, the response items
related to Happiness should score
dissimilarly.
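A minimal sketch of both checks, assuming invented subscale scores for Depression, Sadness, and Happiness:

```python
# Convergent check: Depression should correlate strongly with Sadness.
# Discriminant check: Depression should NOT track Happiness.
from scipy.stats import pearsonr

depression = [30, 12, 25, 8, 28, 15, 22]
sadness    = [28, 14, 24, 10, 27, 16, 20]   # similar construct
happiness  = [8, 25, 12, 30, 10, 22, 14]    # dissimilar construct

r_conv, _ = pearsonr(depression, sadness)
r_disc, _ = pearsonr(depression, happiness)
print(f"convergent r   = {r_conv:.2f}  (predicted: strongly positive)")
print(f"discriminant r = {r_disc:.2f}  (predicted: low or negative)")
```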
Quick summary
 In order to make valid decisions we need to use
appropriate instruments that have established
evidence of content-related, construct-related, and
criterion-related validity.
 This is determined in the developmental stage of the
instrument.
 If you are designing an instrument, you need to
establish this.
 If the instrument is already designed, review the
instrument’s manual to determine how this was done.
 If you alter an established instrument from its original
state, you need to re-establish validity and reliability.
Reliability
Assessment instruments need to yield valid data
AND be
reliable
What’s “Reliability”?
The ability to gather consistent results from a
particular instrument.
There are 3 approaches to establishing
instrument reliability.
1. Stability reliability
2. Alternate-form reliability
3. Internal consistency reliability
Each relies on a statistical test of correlation to measure
consistency.
1. Stability Reliability
Definition: Consistent results over time
Also known as “test-retest” reliability
Use this if the assessment is to be given to the same individuals
at different times.
How do we determine this?
 Give the assessment over again to the same group of people.
 Calculate the correlation between the 2 scores.
 Be sure to wait several days or a few weeks.
 Long enough to reduce the influence of the 1st testing (i.e.,
memory of test items) and short enough to limit the effect of
intervening events (see the sketch below).
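A minimal test-retest sketch, assuming invented scores from two administrations of the same instrument a few weeks apart:

```python
from scipy.stats import pearsonr

first_administration  = [23, 31, 18, 27, 35, 22, 29, 25]
second_administration = [25, 30, 17, 28, 33, 24, 28, 26]  # weeks later

# Stability (test-retest) reliability is the correlation between the two.
r, _ = pearsonr(first_administration, second_administration)
print(f"stability reliability r = {r:.2f}")
```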
2. Alternate-form reliability
Definition: Consistent results between different forms of the
same test.
Also known as “parallel form” reliability
Use this if multiple test forms are needed for interchangeability —
usually for test security (i.e. prevent cheating).
How do we determine this?
Create different forms that are similar in content (i.e. “content
parallel”) and difficulty (i.e. “difficulty-parallel”). Administer both
forms to the same group of people and calculate the correlation.
Are stability reliability and alternate-form
reliability ever combined?
YES!
This is called stability and alternate-form reliability.
This is where there are consistent results over time using two
different test forms of parallel-content and parallel-difficulty.
3. Internal Consistency reliability
The degree to which all test items measure the content domain
consistently.
Use this when there is no concern about stability over time and no
need for an alternate form.
How do we do this?
Split-half technique: Divide the test in half by treating the odd-numbered
items and even-numbered items as 2 separate tests. The entire
test is administered and the 2 sub-scores (scores from even items
& scores from odd items) are correlated.
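A minimal split-half sketch with an invented item-response matrix (rows = respondents, columns = items). The Spearman-Brown step shown here is the common adjustment for full-test length (it also appears as an option in the self-check later in this session):

```python
from scipy.stats import pearsonr

responses = [  # invented answers to an 8-item instrument
    [4, 3, 4, 5, 2, 3, 4, 4],
    [2, 2, 3, 2, 1, 2, 2, 3],
    [5, 4, 5, 5, 4, 4, 5, 5],
    [3, 3, 2, 3, 3, 2, 3, 2],
    [4, 4, 4, 3, 4, 4, 3, 4],
]

odd_half  = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, 8

r_half, _ = pearsonr(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown length correction
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```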
Reliability Coefficients (known as “r”)
When establishing reliability, a correlation between the two sets of data
needs to be calculated using an appropriate statistical formula.
• Stability reliability: Pearson product-moment
• Alternate-form reliability: Pearson product-moment
• Internal consistency reliability: Pearson product-moment (used to
correlate the two halves), Kuder-Richardson, or Cronbach’s alpha
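Cronbach's alpha can also be computed directly from an item-response matrix. A sketch with invented data:

```python
import numpy as np

responses = np.array([  # rows = respondents, columns = items
    [4, 3, 4, 5, 2],
    [2, 2, 3, 2, 1],
    [5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3],
    [4, 4, 4, 3, 4],
])

k = responses.shape[1]                         # number of items
item_vars = responses.var(axis=0, ddof=1)      # variance of each item
total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```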
Acceptable r values
A reliability value of 0.00 means a complete absence of
reliability, whereas a value of 1.00 means perfect reliability. An
acceptable reliability coefficient should not be below
0.80; anything less indicates inadequate reliability.
However, with stability and alternate-form combined
reliability, .70 is acceptable since more variables are
involved.
So let’s check your understanding
You design an instrument to be used as a pre-post assessment.
Which form of reliability should definitely be established?
____Stability
____Alternate-form
____Internal consistency
What type of statistical formula should you use to correlate the two
results (i.e., test and retest scores)?
____Pearson product-moment
____Spearman-Brown
The reliability coefficient was .70. Is the assessment reliable?
____Yes
____No (it needs to be at least .80)
Remember….
In order for an assessment to be worthwhile it
needs to be
RELIABLE
and able to yield
VALID data
AND….
It’s quite possible for an instrument to be
RELIABLE
and not
provide VALID inferences
HOWEVER….
It’s NOT possible for an instrument to provide
VALID inferences
without being
RELIABLE
This ends Info Session 2
“Validity and Reliability”
I highly recommend traveling through this session at least TWICE

Editor’s notes
1. It can be argued that it is almost impossible to establish predictive validity, since so many outside variables can impact the results over time.
2. Many believe it is nearly impossible to create two tests of the SAME difficulty. Assessment experts equalize the difficulty variance through a mathematical adjustment.