APPROACHES TO QUANTITATIVE DATA ANALYSIS
© LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON
STRUCTURE OF THE CHAPTER
• Scales of data
• Parametric and non-parametric data
• Descriptive and inferential statistics
• Kinds of variables
• Hypotheses
• One-tailed and two-tailed tests
• Distributions
• Statistical significance
• Hypothesis testing
• Effect size
• A note on symbols
FOUR SCALES OF DATA
NOMINAL
ORDINAL
INTERVAL
RATIO
It is incorrect to apply statistics that can only be used at a higher scale of data (e.g. the mean, which requires interval or ratio data) to data at a lower scale (e.g. nominal or ordinal data).
PARAMETRIC AND NON-PARAMETRIC STATISTICS
• Parametric statistics: where characteristics of, or factors in, the population are known (e.g. that the scores are normally distributed);
• Non-parametric statistics: where the characteristics of, or factors in, the population are unknown.
DESCRIPTIVE AND INFERENTIAL
STATISTICS
• Descriptive statistics: to summarize features of
the sample or simple responses of the sample
(e.g. frequencies or correlations).
• No attempt is made to infer or predict population
parameters.
• Inferential statistics: to infer or predict
population parameters or outcomes from simple
measures, e.g. from sampling and from
statistical techniques.
• Based on probability.
DESCRIPTIVE STATISTICS
• The mode (the score obtained by the greatest
number of people);
• The mean (the average score);
• The median (the score obtained by the middle
person in a ranked group of people, i.e. it has an
equal number of scores above it and below it);
• Minimum and maximum scores;
• The range (the distance between the highest
and the lowest scores);
• The variance (a measure of how far scores are
from the mean: the average of the squared
deviations of individual scores from the mean);
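As a minimal sketch, the statistics listed above can be computed with Python's standard `statistics` module. The five scores are illustrative only; `pvariance` is used because it divides by N, matching the definition of variance as the average of the squared deviations.

```python
import statistics

scores = [5, 6, 6, 6, 7]  # illustrative scores

summary = {
    "mode": statistics.mode(scores),        # most frequent score
    "mean": statistics.mean(scores),        # average score
    "median": statistics.median(scores),    # middle score when ranked
    "minimum": min(scores),
    "maximum": max(scores),
    "range": max(scores) - min(scores),     # highest minus lowest
    # pvariance divides by N: the average of the squared deviations from the mean
    "variance": statistics.pvariance(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

For these scores the mode, mean and median are all 6, the range is 2 and the variance is 0.4.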
SIMPLE STATISTICS
• Frequencies (raw scores and percentages)
– Look for skewness, intensity, distributions, spread and kurtosis (peakedness);
• Mode
– For nominal and ordinal data
• Mean
– For interval and ratio data
• Standard deviation
– For interval and ratio data
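The rule that a statistic must not be applied below the scale it requires can be sketched as a simple guard. This is a hypothetical helper (`describe` and the `MINIMUM_SCALE` table are illustrations, not from the chapter): the mode, mean and standard deviation follow the slide, while placing the median at the ordinal scale is an added assumption based on common practice. A statistic allowed at a lower scale may still be used on higher-scale data.

```python
import statistics

# Scales ordered from lowest to highest
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Lowest scale at which each statistic may be used.
# Mode, mean and stdev follow the slide; median at ordinal is an assumption.
MINIMUM_SCALE = {
    "mode": "nominal",
    "median": "ordinal",
    "mean": "interval",
    "stdev": "interval",
}

FUNCTIONS = {
    "mode": statistics.mode,
    "median": statistics.median,
    "mean": statistics.mean,
    "stdev": statistics.stdev,
}


def describe(scores, scale, statistic):
    """Return the statistic, refusing to apply it to data at too low a scale."""
    required = MINIMUM_SCALE[statistic]
    if SCALES.index(scale) < SCALES.index(required):
        raise ValueError(f"{statistic} needs at least {required} data, got {scale}")
    return FUNCTIONS[statistic](scores)


print(describe(["red", "blue", "red"], "nominal", "mode"))
print(describe([1, 2, 2, 3], "ordinal", "median"))
```

Calling `describe([1, 2, 2, 3], "ordinal", "mean")` raises a `ValueError`, because the mean requires at least interval data.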
[Figure: five scores (1, 2, 3, 4, 20) plotted on a scale of 1 to 20; mean = 6; high standard deviation]
[Figure: five scores (1, 2, 6, 10, 11) plotted on a scale of 1 to 20; mean = 6; moderately high standard deviation]
[Figure: five scores (5, 6, 6, 6, 7) plotted on a scale of 1 to 20; mean = 6; low standard deviation]
STANDARD DEVIATION
• The standard deviation is a measure of the dispersal of the scores, i.e. of how far, typically, each score lies from the mean (average). In its most simplified form it is calculated as:
$$SD = \sqrt{\frac{\sum d^2}{N - 1}}$$
or
$$SD = \sqrt{\frac{\sum d^2}{N}}$$
• d² = the deviation of the score from the mean (average), squared
• ∑ = the sum of
• N = the number of cases
• The first formula (dividing by N − 1) is used when the scores are a sample from which the population standard deviation is estimated; the second (dividing by N) treats the scores as the whole population.
• A low standard deviation indicates that the scores cluster together, whilst a high standard deviation indicates that the scores are widely dispersed.
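Both formulas can be sketched directly from the squared deviations d². The helper name `standard_deviation` and its `sample` flag are illustrative assumptions; the scores are the three sets of five from the standard deviation figures, each with a mean of 6.

```python
import math


def standard_deviation(scores, sample=True):
    """SD from the sum of squared deviations: divide by N - 1 (sample) or N (population)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_d2 = sum((x - mean) ** 2 for x in scores)  # sum of d^2
    divisor = n - 1 if sample else n
    return math.sqrt(sum_d2 / divisor)


# The three sets of scores from the figures; each has a mean of 6
for scores in ([1, 2, 3, 4, 20], [1, 2, 6, 10, 11], [5, 6, 6, 6, 7]):
    sample_sd = standard_deviation(scores)
    population_sd = standard_deviation(scores, sample=False)
    print(scores, round(sample_sd, 2), round(population_sd, 2))
```

With the N − 1 formula the three sets give standard deviations of about 7.91, 4.53 and 0.71; with the N formula, about 7.07, 4.05 and 0.63. The same mean can therefore hide very different spreads.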
DESCRIPTIVE STATISTICS
• The standard deviation (a measure of the dispersal or spread of scores: the square root of the variance);
• The standard error (the standard deviation of
sample means);
• The skewness (how far the data are
asymmetrical in relation to a ‘normal’ curve of
distribution);
• Kurtosis (how steep or flat is the shape of a
graph or distribution of data; a measure of how
peaked a distribution is and how steep is the
slope or spread of data around the peak).
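The standard error, skewness and kurtosis can be sketched as follows. The slides give no formulas, so this sketch assumes the simple moment-based definitions (skewness = m₃ / m₂^1.5, excess kurtosis = m₄ / m₂² − 3, where mₖ is the k-th central moment) and estimates the standard error as the sample SD divided by √N. Statistical packages often apply small-sample corrections, so their values may differ slightly.

```python
import math
import statistics


def standard_error(scores):
    """Estimated standard error of the mean: sample SD divided by sqrt(N)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


def central_moment(scores, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = statistics.fmean(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def skewness(scores):
    """0 for symmetrical data; positive when there is a long tail to the right."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    """Kurtosis minus 3, so a normal distribution scores about 0; negative means flatter."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


print(round(standard_error([5, 6, 6, 6, 7]), 3))
print(round(skewness([1, 2, 3, 4, 20]), 2))
print(round(skewness([5, 6, 6, 6, 7]), 2), round(excess_kurtosis([5, 6, 6, 6, 7]), 2))
```

The scores 1, 2, 3, 4, 20 give a skewness of about 1.43 (a long right tail caused by the score of 20), while the symmetrical scores 5, 6, 6, 6, 7 give a skewness of 0 and an excess kurtosis of about -0.5.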
INFERENTIAL STATISTICS
• Can use descriptive statistics.
• Correlations
• Regression
• Multiple regression
• Difference testing
• Factor analysis
• Structural equation modelling
DEPENDENT AND INDEPENDENT
VARIABLES
• An independent variable is an antecedent
variable, that which causes, in part or in total,
a particular outcome; it is a stimulus that
influences a response, a factor which may be
modified (e.g. under experimental or other
conditions) to affect an outcome.
• A dependent variable is the outcome
variable, that which is caused, in total or in
part, by the input, antecedent variable. It is
the effect, consequence of, or response to, an
independent variable.
DEPENDENT AND INDEPENDENT
VARIABLES
• In using statistical tests which require
independent and dependent variables,
exercise caution in assuming which is or is not
the dependent or independent variable, as the
direction of causality may not be one-way or in
the direction assumed.
FIVE KEY INITIAL QUESTIONS
1. What kind (scales) of data are there?
2. Are the data parametric or non-parametric?
3. Are descriptive or inferential statistics
required?
4. Do dependent and independent variables need
to be identified?
5. Are the relationships considered to be linear or
non-linear?
CATEGORICAL, DISCRETE AND
CONTINUOUS VARIABLES
• A categorical variable is a variable which has
categories of values, e.g. the variable ‘sex’ has
two values: male and female.
• A discrete variable has a finite number of separate, countable values, with no intervals or fractions between them, e.g. a person cannot have half an illness or half a mealtime.
• A continuous variable can vary by any amount along a scale, e.g. money in the bank, monthly earnings. There are equal intervals and, usually, a true zero, e.g. it is possible to have no money in the bank.
CATEGORICAL, DISCRETE AND
CONTINUOUS VARIABLES
• Categorical variables match categorical data.
• Continuous variables match interval and ratio
data.
KINDS OF ANALYSIS
• Univariate analysis: looks for differences
amongst cases within one variable.
• Bivariate analysis: looks for a relationship
between two variables.
• Multivariate analysis: looks for relationships amongst more than two variables.
HYPOTHESES
• Null hypothesis (H0)
• Alternative hypothesis (H1)
• The null hypothesis is the stronger hypothesis: rigorous evidence is required before it can be judged not supported.
• Begin by casting the research in the form of a null hypothesis, and turn to the alternative hypothesis only if the null hypothesis is found not to be supported.
HYPOTHESES
• Direction of hypothesis: states the kind of
difference or relationship between two conditions
or two groups of participants
• One-tailed (directional), e.g.: ‘people who study in
silent surroundings achieve better than those
who study in noisy surroundings’. (‘Better’
indicates the direction.)
• Two-tailed (no direction), e.g.: ‘there is a
difference between people who study in silent
surroundings and those who study in noisy
surroundings’. (There is no indication of which is
the better.)
ONE-TAILED AND TWO-TAILED TESTS
• A one-tailed test makes assumptions about the population and the direction of the outcome, e.g. Group A will score more highly than Group B on a test.
• A two-tailed test makes no assumptions about
the population and the direction of the
outcome, e.g. there will be a difference in the
test scores.
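The practical difference between the two kinds of test can be sketched for a test statistic that follows the standard normal distribution under the null hypothesis. The z value of 1.8 is hypothetical, and the `p_values` helper is an illustration rather than a procedure from the chapter.

```python
from statistics import NormalDist


def p_values(z):
    """One- and two-tailed p-values for a z statistic under a standard normal null."""
    standard_normal = NormalDist()  # mean 0, SD 1
    one_tailed = 1 - standard_normal.cdf(z)             # only 'higher than' counts
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))  # a difference in either direction
    return one_tailed, two_tailed


one, two = p_values(1.8)  # hypothetical test statistic
print(f"one-tailed p = {one:.3f}, two-tailed p = {two:.3f}")
```

Here the one-tailed p is about 0.036 and the two-tailed p about 0.072, so the same result reaches the 0.05 level only if the direction of the outcome was predicted in advance.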
THE NORMAL CURVE OF DISTRIBUTION
• A smooth, perfectly symmetrical, bell-shaped
curve.
• It is symmetrical about the mean and its tails
are assumed to meet the x-axis at infinity.
• Statistical calculations often assume that the
population is distributed normally and then
compare the data collected from the sample to
the population, allowing inferences to be made
about the population.
THE NORMAL CURVE OF DISTRIBUTION
In a normal distribution (figures rounded):
– 68.3 per cent of people fall within 1 standard deviation of the mean;
– 27.2 per cent are between 1 and 2 standard deviations away from the mean;
– 4.3 per cent are between 2 and 3 standard deviations away from the mean;
– 0.3 per cent are more than 3 standard deviations away from the mean.
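These percentages can be checked against the cumulative distribution function of the standard normal curve, here using Python's `statistics.NormalDist`. Each band counts both tails (above and below the mean).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

within_1 = z.cdf(1) - z.cdf(-1)
between_1_and_2 = 2 * (z.cdf(2) - z.cdf(1))  # both sides of the mean
between_2_and_3 = 2 * (z.cdf(3) - z.cdf(2))
beyond_3 = 2 * (1 - z.cdf(3))

for label, share in [
    ("within 1 SD", within_1),
    ("between 1 and 2 SD", between_1_and_2),
    ("between 2 and 3 SD", between_2_and_3),
    ("beyond 3 SD", beyond_3),
]:
    print(f"{label}: {share:.1%}")
```

Rounded to one decimal place this prints 68.3%, 27.2%, 4.3% and 0.3%; the exact shares add up to 100 per cent.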
SKEWNESS
The curve is not symmetrical or bell-shaped: the scores bunch towards one end and tail off towards the other.
KURTOSIS
(STEEPNESS OF THE CURVE)
STATISTICAL SIGNIFICANCE
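• Significance level p = 0.05: if the null hypothesis were true, a result at least as extreme as the one found would occur by chance no more than 5 times in 100 (loosely, the findings hold true 95 per cent of the time);
• Significance level p = 0.01: such a result would occur by chance no more than 1 time in 100 (loosely, 99 per cent of the time);
• Significance level p = 0.001: such a result would occur by chance no more than 1 time in 1,000 (loosely, 99.9 per cent of the time).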
CORRELATION
Shoe size Hat size
1 1
2 2
3 3
4 4
5 5
Perfect positive correlation: + 1
CORRELATION
Hand size Foot size
1 1
2 2
3 3
4 4
5 5
Perfect positive correlation: + 1
CORRELATION
Hand size Foot size
1 2
2 1
3 4
4 3
5 5
Positive correlation: less than +1 (here +0.8)
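The coefficients on these slides can be reproduced with a short sketch of Pearson's product-moment correlation. The slides do not name the coefficient, so Pearson's r is an assumption here; because the values are already ranks with no ties, the same calculation also gives Spearman's rho.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)


hand = [1, 2, 3, 4, 5]
print(pearson_r(hand, [1, 2, 3, 4, 5]))  # matching sizes
print(pearson_r(hand, [2, 1, 4, 3, 5]))  # sizes slightly out of step
print(pearson_r(hand, [5, 4, 3, 2, 1]))  # sizes in reverse order
```

Apart from floating-point rounding, this prints +1 (perfect positive), 0.8 (positive but less than +1) and -1 (perfect negative correlation).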
PERFECT POSITIVE CORRELATION
[Figure]
PERFECT NEGATIVE CORRELATION
[Figure]
MIXED CORRELATION
[Figure]
CORRELATIONS
Statistical significance is a function of the coefficient and the sample size:
– the smaller the sample, the larger the coefficient has to be in order to obtain statistical significance;
– the larger the sample, the smaller the coefficient can be and still obtain statistical significance;
– statistical significance can therefore be attained either by having a large coefficient together with a small sample or by having a small coefficient together with a large sample.
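The interplay between coefficient and sample size can be sketched with Fisher's z transformation, which converts r into an approximately normal statistic. This is an approximation swapped in for illustration (an exact test of r usually uses the t distribution), and the helper name and (r, n) pairs are hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_for_r(r, n):
    """Approximate two-tailed p-value for Pearson's r via Fisher's z (needs n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


for r, n in [(0.3, 20), (0.3, 200), (0.8, 5)]:
    print(f"r = {r}, N = {n}: p is roughly {approx_p_for_r(r, n):.4f}")
```

The same coefficient of 0.3 gives p of roughly 0.20 with 20 cases but well below 0.001 with 200 cases, while even r = 0.8 is not significant at the 0.05 level with only 5 cases (p is about 0.12).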
CORRELATIONS
• Begin with a null hypothesis (e.g. there is no relationship between the size of hands and the size of feet). The task is to see whether the data allow the null hypothesis to be rejected, i.e. the burden of proof lies with the evidence against the null hypothesis.
• If the null hypothesis is not supported at the 95, 99 or 99.9 per cent level of confidence, then there is a statistically significant relationship between the size of hands and the size of feet at the 0.05, 0.01 and 0.001 levels of significance respectively.
• These levels of significance – the 0.05, 0.01 and
0.001 levels – are the levels at which statistical
significance is frequently taken to be demonstrated.
HYPOTHESIS TESTING
• Commence with a null hypothesis.
• Set the level of significance (the alpha (α) level) at which the null hypothesis will be judged supported or not supported; α is determined by the researcher before the data are analysed.
• Compute the test statistic, and its probability (p), from the data.
• Determine whether the null hypothesis is supported (p > α) or not supported (p ≤ α).
• Avoid Type I and Type II errors.
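The steps above can be sketched end to end. The slides do not name a particular test, so this sketch swaps in an exact permutation test for a difference in means, chosen because it needs only the standard library. The `permutation_test` helper and the scores are hypothetical, loosely based on the earlier example of studying in silent and noisy surroundings.

```python
import itertools
import statistics


def permutation_test(group_a, group_b):
    """Exact permutation test for a difference in means; returns (one_tailed_p, two_tailed_p)."""
    pooled = group_a + group_b
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    one_count = two_count = total = 0
    # Try every way of splitting the pooled scores into groups of the original sizes
    for idx in itertools.combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = statistics.fmean(a) - statistics.fmean(b)
        total += 1
        if diff >= observed - 1e-9:            # as extreme in the predicted direction
            one_count += 1
        if abs(diff) >= abs(observed) - 1e-9:  # as extreme in either direction
            two_count += 1
    return one_count / total, two_count / total


# Step 1: null hypothesis: no difference between silent and noisy surroundings
silent = [14, 15, 16, 17, 18]  # hypothetical test scores
noisy = [10, 11, 12, 13, 14]
# Step 2: set alpha before looking at the results
alpha = 0.05
# Step 3: compute the test statistic and its probability
one_p, two_p = permutation_test(silent, noisy)
print(f"one-tailed p = {one_p:.3f}, two-tailed p = {two_p:.3f}")
# Step 4: decide
print("null hypothesis not supported" if two_p <= alpha else "null hypothesis supported")
```

All 252 ways of splitting the ten scores into two groups of five are examined. Only 2 splits give a difference as large as the observed 4 points in the predicted direction, and 4 in either direction, so p is about 0.008 (one-tailed) and 0.016 (two-tailed), both below α = 0.05.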
TYPE I AND TYPE II ERRORS
• Null hypothesis: there is no difference between x and y.
• TYPE I ERROR
– The researcher rejects the null hypothesis when it is in fact true (like convicting an innocent person)
∴ make the significance level more stringent (a smaller α, e.g. 0.01 rather than 0.05)
• TYPE II ERROR
– The researcher accepts (fails to reject) the null hypothesis when it is in fact false (like finding a guilty person innocent)
∴ make the significance level less stringent (a larger α) and/or increase the sample size.
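Both kinds of error can be illustrated by simulation. This is a sketch under stated assumptions: two groups drawn from normal populations with a known standard deviation of 1, compared with a two-tailed z-test; the `rejection_rate` helper, the effect of 0.5 standard deviations and the sample sizes are all hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_difference, n, alpha, trials=2000, seed=1):
    """Share of simulated experiments in which a two-tailed z-test rejects the null hypothesis.

    Each group has n scores from a normal population with SD 1 (assumed known);
    the null hypothesis is that the two population means do not differ.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n) ** 0.5  # standard error of a difference in means when SD = 1
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(true_difference, 1) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# Null hypothesis true: every rejection is a Type I error
type_i_05 = rejection_rate(0.0, 20, 0.05)
type_i_01 = rejection_rate(0.0, 20, 0.01)
# Null hypothesis false (difference of 0.5 SD): every failure to reject is a Type II error
type_ii_small = 1 - rejection_rate(0.5, 10, 0.05)
type_ii_large = 1 - rejection_rate(0.5, 50, 0.05)
print(type_i_05, type_i_01, type_ii_small, type_ii_large)
```

The Type I error rate stays close to α, so a stricter α (0.01) produces fewer Type I errors; the Type II error rate falls markedly when each group grows from 10 to 50 cases.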
EFFECT SIZE
• Increasingly seen as preferable to statistical
significance.
• A way of quantifying the difference between
two groups. It indicates how big the effect is,
something that statistical significance does not.
• For example, if one group has had an
experimental treatment and the other has not
(the control group), then the effect size is a
measure of the effectiveness of the treatment.
EFFECT SIZE
• It can be calculated thus:
$$\text{Effect size} = \frac{(\text{mean of experimental group}) - (\text{mean of control group})}{\text{standard deviation of the control group}}$$
or, as eta squared:
$$\text{Effect size } (\eta^2) = \frac{\text{sum of squares between groups}}{\text{total sum of squares}}$$
• Statistics for calculating effect size include r², adjusted R², η² (eta squared), ω² (omega squared), Cramér's V, Kendall's W, Cohen's d and eta (η).
• Different kinds of statistical treatments use different effect size calculations.
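The two formulas can be sketched as follows. The scores are hypothetical, and using the sample standard deviation (dividing by N − 1) of the control group is an assumption, since the slide does not say which version to use; dividing by the control group's SD is the variant sometimes called Glass's Δ.

```python
import statistics


def effect_size_control_sd(experimental, control):
    """(mean of experimental group - mean of control group) / SD of the control group."""
    difference = statistics.fmean(experimental) - statistics.fmean(control)
    return difference / statistics.stdev(control)


def eta_squared(*groups):
    """Sum of squares between groups divided by the total sum of squares."""
    scores = [x for group in groups for x in group]
    grand_mean = statistics.fmean(scores)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


experimental = [14, 15, 16, 17, 18]  # hypothetical scores
control = [10, 11, 12, 13, 14]
print(round(effect_size_control_sd(experimental, control), 2))
print(round(eta_squared(experimental, control), 2))
```

For these scores the means differ by 4 points, the control group's SD is about 1.58, so the effect size is about 2.53; the between-groups sum of squares (40) is two-thirds of the total (60), giving η² of about 0.67.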
EFFECT SIZE
• In using Cohen’s d:
– 0-0.20 = weak effect
– 0.21-0.50 = modest effect
– 0.51-1.00 = moderate effect
– >1.00 = strong effect
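A sketch of Cohen's d with the bands above follows. It assumes the common pooled-standard-deviation form of d (the slide gives no formula for d itself) and treats each band's upper bound as inclusive, so values such as 0.205 that fall between the printed bands go to the lower band. The scores are hypothetical.

```python
import math
import statistics


def cohens_d(group_1, group_2):
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    pooled_variance = (
        (n1 - 1) * statistics.variance(group_1) + (n2 - 1) * statistics.variance(group_2)
    ) / (n1 + n2 - 2)
    return (statistics.fmean(group_1) - statistics.fmean(group_2)) / math.sqrt(pooled_variance)


def describe_effect(d):
    """Label the size of d using the bands on the slide."""
    size = abs(d)
    if size <= 0.20:
        return "weak effect"
    if size <= 0.50:
        return "modest effect"
    if size <= 1.00:
        return "moderate effect"
    return "strong effect"


d = cohens_d([52, 55, 58, 61, 64], [50, 53, 56, 59, 62])  # hypothetical scores
print(round(d, 2), describe_effect(d))
```

Here the means differ by 2 points and the pooled SD is about 4.74, so d is about 0.42, a modest effect on this scale.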
THE POWER OF A TEST
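• An estimate of the ability of the test to separate the effect size from random variation, i.e. the probability that the test will correctly reject a false null hypothesis (1 minus the probability of a Type II error).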
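Power can be sketched analytically for one simple case: a two-tailed, two-sample z-test comparing groups of equal size, with the effect expressed in standard deviation units (as Cohen's d). The helper name and the choice of a z-test rather than a t-test are assumptions made to keep the sketch to the standard library.

```python
from statistics import NormalDist


def power_two_sample_z(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-sample z-test for a standardised mean difference."""
    standard_normal = NormalDist()
    z_crit = standard_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5  # expected z statistic if the effect is real
    # Probability that the observed z falls beyond either critical value
    return (1 - standard_normal.cdf(z_crit - shift)) + standard_normal.cdf(-z_crit - shift)


for n in (10, 20, 50, 100):
    print(n, round(power_two_sample_z(0.5, n), 2))
```

For an effect of half a standard deviation, power rises from about 0.2 with 10 cases per group to above 0.9 with 100 per group; with no real effect, the "power" equals α, the Type I error rate.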