DATA ANALYSIS
Group 5
The mean, median and mode
                          Presenter: Huu Loc

 The mean, median and mode are all valid
 measures of central tendency, but under
 different conditions some measures of
 central tendency become more
 appropriate to use than others.
Mean
• The mean (or average) is the most
   popular and well-known measure of
   central tendency. It can be used with
   both discrete and continuous data,
   although it is most often used with
   continuous data.
• The mean is equal to the sum of all
   the values in the data set divided by
   the number of values in the data set.
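So, if we have n values in a data set and
they have values $x_1, x_2, \ldots, x_n$, then the
sample mean, usually denoted by $\bar{x}$
(pronounced "x bar"), is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$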
• The mean is essentially a model of your
  data set: a single value that summarizes
  it. It is not necessarily the most common
  value (that is the mode), and it is often
  not one of the observed values at all.
• An important property of the mean is
  that it includes every value in your data
  set as part of the calculation. In
  addition, the mean is the only measure
  of central tendency for which the sum of
  the deviations of each value from the
  mean is always zero.
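A minimal Python sketch of the definition and of the zero-sum property. The helper names and the data are hypothetical (chosen so that the mean, 18, is not one of the observed values); `Fraction` from the standard library keeps the arithmetic exact, so the deviations sum to exactly zero rather than to a tiny floating-point residue.

```python
from fractions import Fraction

# Illustrative helpers; the names are hypothetical, not from the slides.
def sample_mean(values):
    """x-bar: the sum of all values divided by the number of values."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    # Fraction keeps the arithmetic exact (no floating-point rounding).
    return sum(Fraction(v) for v in values) / len(values)

def deviations_from_mean(values):
    """Each value minus the sample mean."""
    m = sample_mean(values)
    return [Fraction(v) - m for v in values]

data = [4, 8, 15, 16, 23, 42]  # hypothetical scores

print(sample_mean(data))                # 18
print(sum(deviations_from_mean(data)))  # 0
```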
Median

The median is the middle score for a set of
data that has been arranged in order of
magnitude. The median is less affected by
outliers and skewed data than the mean is.
To calculate the median, suppose we have
the data below:

We first need to rearrange that data into
order of magnitude (smallest first):
Our median mark is the middle mark:
in this case, 56 (highlighted in bold). It
is the middle mark because there are 5
scores before it and 5 scores after it.
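The sort-then-take-the-middle procedure can be sketched as follows. The helper name and the marks are hypothetical (not the data from the slide); for an even number of scores the sketch uses the common convention of averaging the two middle scores, which is also what Python's `statistics.median` does.

```python
# Illustrative helper; the name is hypothetical, not from the slides.
def middle_score(scores):
    """Median: sort the scores, then take the middle one, or the
    average of the two middle ones when the count is even."""
    if not scores:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

marks = [72, 40, 58, 91, 66, 49, 83]  # hypothetical marks, 7 values

print(middle_score(marks))       # 66   (3 scores below it, 3 above)
print(middle_score(marks[:-1]))  # 62.0 (even count: (58 + 66) / 2)

# Replacing 91 with an extreme 910 leaves the median unchanged,
# whereas the mean would jump considerably.
with_outlier = [72, 40, 58, 910, 66, 49, 83]
print(middle_score(with_outlier))  # 66
```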
Mode
The mode is the most frequent score in
our data set. On a bar chart or histogram,
it is represented by the highest bar. You
can, therefore, sometimes consider the mode
as being the most popular option. An
example of a mode is presented
below:
Normally, the mode is used for
categorical data where we wish to know
which is the most common category as
illustrated below:
One of the problems with the mode is that it is
not necessarily unique, which leaves us with
problems when two or more values share the
highest frequency, such as below:
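Because the mode may not be unique, a sketch that returns every value tied for the highest frequency is often more useful than one that returns a single value. The helper name and both data sets are hypothetical; Python 3.8+ also ships `statistics.multimode`, which behaves similarly.

```python
from collections import Counter

# Illustrative helper; the name is hypothetical, not from the slides.
def modes(values):
    """Return every value that shares the highest frequency.

    One element means the mode is unique; more than one element
    means the data are bimodal or multimodal.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    top = max(counts.values())
    # Counter keeps first-seen order, so ties come back in data order.
    return [value for value, count in counts.items() if count == top]

# Hypothetical categorical data: how students travel to class.
transport = ["bus", "car", "bike", "car", "walk", "car", "bus"]
print(modes(transport))  # ['car']

# Hypothetical scores with two values tied for the highest frequency.
scores = [3, 5, 5, 7, 9, 9]
print(modes(scores))  # [5, 9]
```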
Summary of when to use the
 mean, median and mode
Use the following summary table to decide
which measure of central tendency is best
for each type of variable.
MEASURES OF
DISPERSION
Presenter: Nguyen Ngoc Cam
Measures of Dispersion

Measures of central tendency give us good information about the scores in
our distribution.

However, we can have very different shapes to our distribution, yet have
the same central tendency.

Measures of dispersion or variability will give us information about the
spread of the scores in our distribution.

Are the scores clustered close together over a small portion of the scale, or
are the scores spread out over a large segment of the scale?
Main points:



1. Range

2. Standard Deviation

3. Variance
1. Range


The difference between the largest and the
smallest value in the group's data.

The range gives a quick indication of how spread
out the data are.

Problems:

1. It changes drastically with the magnitude of the extreme
  scores.

2. It is an unstable measure and is rarely used in statistical
  analyses.
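A small sketch showing both the definition and the first problem listed above: adding a single extreme score changes the range drastically. The helper name and the scores are hypothetical.

```python
# Illustrative helper; the name is hypothetical, not from the slides.
def value_range(scores):
    """Range: the largest value minus the smallest value."""
    if not scores:
        raise ValueError("the range of an empty data set is undefined")
    return max(scores) - min(scores)

scores = [12, 15, 14, 13, 16]  # hypothetical scores
print(value_range(scores))  # 4

# One extreme score changes the range drastically,
# even though the other five scores are unchanged.
print(value_range(scores + [60]))  # 48
```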
2. Standard Deviation

Standard Deviation is the most frequently used
measure of variability.

It measures the average variability of all the scores
around the mean; every score is taken into
account.

The larger the Standard Deviation, the more
variability from the central point in the
distribution.

The smaller the Standard Deviation, the closer
the distribution is to the central point.

The SD tells us roughly how far the individual
scores typically lie from the point of central
tendency.

It tells us something the mean doesn't, and this
information can be as important as the mean, or
even more so.
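The earlier point that two distributions can share a central tendency yet differ in spread can be illustrated with Python's standard `statistics` module. `stdev` computes the sample standard deviation (dividing by n - 1) and `pstdev` the population version (dividing by n); the two score lists are hypothetical.

```python
from statistics import mean, pstdev, stdev

tight = [48, 50, 50, 52]   # hypothetical scores clustered near 50
spread = [20, 40, 60, 80]  # hypothetical scores spread far from 50

for name, scores in [("tight", tight), ("spread", spread)]:
    # Both lists have mean 50, but very different standard deviations.
    print(name, round(stdev(scores), 2), round(pstdev(scores), 2))
# tight 1.63 1.41
# spread 25.82 22.36
```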
3. Variance
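Variance is the average squared deviation from the mean, and the standard deviation is its square root. The sketch below computes the sample variance by hand (dividing by n - 1, the usual convention for samples) and checks it against `statistics.variance`; the helper name and the scores are hypothetical.

```python
import math
from statistics import stdev, variance

# Illustrative helper; the name is hypothetical, not from the slides.
def sample_variance(scores):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)

scores = [20, 40, 60, 80]  # hypothetical scores
v = sample_variance(scores)

print(round(v, 2))             # 666.67
print(round(math.sqrt(v), 2))  # 25.82 (the standard deviation)

# The hand computation agrees with the standard library.
assert math.isclose(v, variance(scores))
assert math.isclose(math.sqrt(v), stdev(scores))
```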
PAIRED T-TEST
Presenter: Tran Thi Ngan Giang
Introduction
• A paired t-test is used to compare two population
  means where you have two samples in which
  observations in one sample can be paired with
  observations in the other sample.
• For example:
• A diagnostic test was given before students studied a
  particular module and then again after they
  completed the module. We want to find out whether, in
  general, our teaching leads to improvements in
  students’ knowledge/skills.
First, we see the descriptive statistics
           for both variables.




The post-test mean scores are higher.
Next, we see the correlation between
          the two variables.




There is a strong positive correlation: people who
did well on the pre-test also did well on the post-test.
Finally, we see the t value, degrees of
           freedom, and significance.
• Our significance is .053.
• If the significance value is less
  than .05, there is a significant
  difference.
  If the significance value is greater
  than .05, there is no significant
  difference.
• Here, we see that the significance
  value is approaching
  significance, but the difference is not
  significant. We cannot conclude that
  pre- and post-test scores differ, so
  these data do not show that our test
  preparation course helped.
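The t value and degrees of freedom in this kind of output can be reproduced by hand: a paired t-test works on the differences between each participant's two scores. Below is a minimal sketch with a hypothetical helper name and hypothetical pre/post scores (not the data behind the slides), computing t = mean(d) / (sd(d) / √n) with n - 1 degrees of freedom. The p-value needs the t-distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_rel(post, pre)` returns both the statistic and the p-value.

```python
import math
from statistics import mean, stdev

# Illustrative helper; the name is hypothetical, not from the slides.
def paired_t(before, after):
    """Paired t statistic and degrees of freedom.

    Position i must refer to the same participant in both lists.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    if len(before) < 2:
        raise ValueError("need at least two pairs")
    diffs = [a - b for a, b in zip(after, before)]  # post minus pre
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

pre = [10, 12, 9, 14, 11]   # hypothetical pre-test scores
post = [12, 13, 9, 16, 12]  # hypothetical post-test scores

t, df = paired_t(pre, post)
print(f"t({df}) = {t:.2f}")  # t(4) = 3.21
```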
INDEPENDENT SAMPLES T-TESTS

Presenter: Dinh Quoc Minh Dang
Outline



  1. Introduction

  2. Hypothesis for the independent t-test

  3. What do you need to run an independent t-test?

  4. Formula

  5. Example (Calculating + Reporting)
Introduction


The independent t-test, also called the two-sample t-test or Student's t-test, is
an inferential statistical test that determines whether there is a statistically
significant difference between the means of two unrelated groups.
Hypothesis for the independent t-test



The null hypothesis for the independent t-test is that the population means from the two
unrelated groups are equal:

$H_0: \mu_1 = \mu_2$

In most cases, we want to see whether we can reject the null hypothesis and accept the
alternative hypothesis, which is that the population means are not equal:

$H_A: \mu_1 \neq \mu_2$

To do this, we need to set a significance level (alpha) that determines whether we reject the
null hypothesis in favor of the alternative. Most commonly, this value is set at 0.05.
What do you need to run an independent t-test?


In order to run an independent t-test, you need the following:

     1.      One independent, categorical variable that has two levels.

     2.      One dependent variable, measured on a continuous (interval or ratio) scale.
Formula




          M: mean (the average score of the group)

          SD: Standard Deviation

          N: number of scores in each group

          Exp: Experimental Group

          Con: Control Group
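Assuming the common pooled-variance form of Student's t for two independent groups, the statistic can be computed from the M, SD and N values defined above, with df = N_Exp + N_Con - 2. The sketch also returns Cohen's d based on the pooled SD, one common effect-size measure. The helper name and the summary statistics in the usage line are hypothetical.

```python
import math

# Illustrative helper; the name is hypothetical, not from the slides.
def independent_t(m_exp, sd_exp, n_exp, m_con, sd_con, n_con):
    """Pooled-variance (Student's) independent-samples t from summary stats.

    Returns (t, df, d), where d is Cohen's d based on the pooled SD.
    """
    df = n_exp + n_con - 2
    # Pooled variance: each group's variance weighted by its own df.
    pooled_var = ((n_exp - 1) * sd_exp ** 2 + (n_con - 1) * sd_con ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_exp + 1 / n_con))
    t = (m_exp - m_con) / standard_error
    d = (m_exp - m_con) / math.sqrt(pooled_var)
    return t, df, d

# Hypothetical summary statistics for an experimental and a control group.
t, df, d = independent_t(m_exp=10, sd_exp=2, n_exp=5, m_con=8, sd_con=2, n_con=5)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(8) = 1.58, d = 1.00
```

If the two groups' variances differ markedly, Welch's version, which does not pool the variances, is often preferred.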
Example
Effect Size
Reporting the Result of an Independent T-Test


When reporting the result of an independent t-test, you need to include the
t-statistic value, the degrees of freedom (df) and the significance value of the
test (p-value). The format of the test result is: t(df) = t-statistic, p =
significance value.
Example result (APA Style)


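An independent-samples t-test is reported in the same way as a one-sample t-test:

                       t(75) = 2.11, p = .02 (one-tailed), d = .48

where:
• 75 is the degrees of freedom;
• 2.11 is the value of the statistic;
• p = .02 is the significance of the statistic;
• "(one-tailed)" is included only if the test is one-tailed;
• d = .48 is the effect size, included if available.
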
Example: Survey respondents who were employed by the federal, state, or local
government had significantly higher socioeconomic indices (M = 55.42, SD =
19.25) than survey respondents who were employed by a private employer (M =
47.54, SD = 18.94), t(255) = 2.363, p = .01 (one-tailed).
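The reporting format above is mechanical enough to generate from the computed values. A minimal sketch follows; the helper names are hypothetical, and the sketch mirrors the slide's example by writing p and d without a leading zero (some style guides keep the leading zero for statistics, such as d, that can exceed 1).

```python
# Illustrative helpers; the names are hypothetical, not from the slides.
def no_leading_zero(x, places=2):
    """Format a number, dropping the leading zero as the slide's example does."""
    s = f"{x:.{places}f}"
    if s.startswith("0."):
        return s[1:]
    if s.startswith("-0."):
        return "-" + s[2:]
    return s

def report_t(df, t, p, one_tailed=False, d=None):
    """Build a 't(df) = t, p = p' string, optionally with tails and effect size."""
    text = f"t({df}) = {t:.2f}, p = {no_leading_zero(p)}"
    if one_tailed:
        text += " (one-tailed)"
    if d is not None:
        text += f", d = {no_leading_zero(d)}"
    return text

print(report_t(75, 2.11, 0.02, one_tailed=True, d=0.48))
# t(75) = 2.11, p = .02 (one-tailed), d = .48
```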
Analysis of Variance (ANOVA)

      Presenter: Minh Sang
Introduction
We already learned about the chi-square test
for independence, which is useful for data
that are measured at the nominal or ordinal
level of analysis.
If we have data measured at the interval
level, we can compare two or more
population groups in terms of their
population means using a technique called
analysis of variance, or ANOVA.
Completely randomized design
Population 1: mean $\mu_1$, variance $\sigma_1^2$
Population 2: mean $\mu_2$, variance $\sigma_2^2$
…
Population k: mean $\mu_k$, variance $\sigma_k^2$

  We want to know something about how the
  populations compare. Do they have the same
  mean? We can collect random samples from each
  population, which gives us the following data.
Sample 1: mean $M_1$, variance $s_1^2$, $n_1$ cases
Sample 2: mean $M_2$, variance $s_2^2$, $n_2$ cases
…
Sample k: mean $M_k$, variance $s_k^2$, $n_k$ cases

Suppose we want to compare 3 college majors in a
  business school by the average annual income
  people make 2 years after graduation. We collect
  the following data (in $1000s) based on random
  surveys.
Accounting   Marketing   Finance
27           23          48
22           36          35
33           27          46
25           44          36
38           39          28
29           32          29
Can the dean conclude that there are
  differences among the majors' incomes?
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_A$: not all of $\mu_1$, $\mu_2$, $\mu_3$ are equal

In this problem we must take into account:
1) The variance between samples, or the actual
   differences by major. This is called the sum of
   squares for treatment (SST).
2) The variance within samples, or the
  variance of incomes within a single major.
  This is called the sum of squares for error
  (SSE).
Recall that when we sample, there is always
  a chance of getting something different
  from the population. We account for this
  through #2, or the SSE.
F-Statistic
For this test, we will calculate an F
  statistic, which is used to compare
  variances.
$$F = \frac{SST/(k-1)}{SSE/(n-k)}$$
SST = sum of squares for treatment
SSE = sum of squares for error
k = the number of populations
n = total sample size
Intuitively, the F statistic is:
$$F = \frac{\text{explained variance}}{\text{unexplained variance}}$$
Explained variance is the difference between
  majors.
Unexplained variance is the difference based
  on random sampling for each group (see
  Figure 10-1, page 327).
Calculating SST
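$$SST = \sum_{i=1}^{k} n_i (M_i - \bar{\bar{X}})^2$$
$\bar{\bar{X}}$ = grand mean: the sum of all values for all groups
  divided by the total sample size n (when all groups are
  the same size, this equals $\sum M_i / k$)
$M_i$ = mean for each sample
$n_i$ = number of cases in each sample
k = the number of populations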
By major:
Accounting: $M_1 = 29$, $n_1 = 6$
Marketing: $M_2 = 33.5$, $n_2 = 6$
Finance: $M_3 = 37$, $n_3 = 6$
Grand mean = (29 + 33.5 + 37)/3 = 33.17
$$SST = 6(29 - 33.17)^2 + 6(33.5 - 33.17)^2 + 6(37 - 33.17)^2 = 193$$
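A sketch that reproduces the SST calculation from the income table above. The helper name is hypothetical; the grand mean is computed as the sum of all values divided by the total sample size, which matches the average of the three group means here because the groups are the same size.

```python
# Annual incomes (in $1000s) from the income table above.
incomes = {
    "Accounting": [27, 22, 33, 25, 38, 29],
    "Marketing": [23, 36, 27, 44, 39, 32],
    "Finance": [48, 35, 46, 36, 28, 29],
}

# Illustrative helper; the name is hypothetical, not from the slides.
def sum_of_squares_treatment(groups):
    """SST: sum over groups of n_i * (M_i - grand mean)^2."""
    all_values = [x for g in groups.values() for x in g]
    grand_mean = sum(all_values) / len(all_values)
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
               for g in groups.values())

print(round(sum_of_squares_treatment(incomes), 2))  # 193.0
```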
Note that when M1 = M2 = M3, SST = 0,
  which would support the null hypothesis.
In this example, the samples are of equal size,
  but we can also run this analysis with
  samples of different sizes.
Calculating SSE
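$$SSE = \sum_{i=1}^{k} \sum_{t=1}^{n_i} (X_{it} - M_i)^2$$
In other words, for each sample we add up the squared deviations of its
   values from that sample's mean, and then add these totals across samples.
   (Each sample's total equals $(n_i - 1)s_i^2$, not the sample variance itself.)
$$SSE = \sum_t (X_{1t} - M_1)^2 + \sum_t (X_{2t} - M_2)^2 + \sum_t (X_{3t} - M_3)^2$$
$$SSE = [(27-29)^2 + (22-29)^2 + \cdots + (29-29)^2] + [(23-33.5)^2 + (36-33.5)^2 + \cdots] + [(48-37)^2 + (35-37)^2 + \cdots + (29-37)^2]$$
SSE = 819.5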
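The SSE calculation for the same income data, as a sketch (helper name hypothetical). Each group's squared deviations are taken from that group's own mean, matching the formula above.

```python
# Annual incomes (in $1000s) from the income table above.
incomes = {
    "Accounting": [27, 22, 33, 25, 38, 29],
    "Marketing": [23, 36, 27, 44, 39, 32],
    "Finance": [48, 35, 46, 36, 28, 29],
}

# Illustrative helper; the name is hypothetical, not from the slides.
def sum_of_squares_error(groups):
    """SSE: squared deviations of each value from its own group mean, summed."""
    total = 0.0
    for g in groups.values():
        group_mean = sum(g) / len(g)
        total += sum((x - group_mean) ** 2 for x in g)
    return total

print(sum_of_squares_error(incomes))  # 819.5
```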
Statistical Output
When you estimate this information in a computer
 program, it will typically be presented in a table as
 follows:
Source of     df    Sum of           Mean                F-ratio
variation           squares          squares
Treatment     k-1   SST              MST = SST/(k-1)     F = MST/MSE
Error         n-k   SSE              MSE = SSE/(n-k)
Total         n-1   SS = SST + SSE
Calculating F for our example
$$F = \frac{193/2}{819.5/15} = 1.77$$
Our calculated F is compared with the critical
  value $F_{\alpha,\,k-1,\,n-k}$ from the F-distribution, where:
k - 1 = numerator degrees of freedom (here 3 - 1 = 2)
n - k = denominator degrees of freedom (here 18 - 3 = 15)
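Putting SST, SSE and the degrees of freedom together gives the F ratio from the ANOVA table. A minimal sketch with a hypothetical helper name; if SciPy is available, `scipy.stats.f_oneway(*incomes.values())` returns the same F statistic together with a p-value, and `scipy.stats.f.ppf(0.95, 2, 15)` gives the critical value for α = .05 directly.

```python
# Annual incomes (in $1000s) from the income table above.
incomes = {
    "Accounting": [27, 22, 33, 25, 38, 29],
    "Marketing": [23, 36, 27, 44, 39, 32],
    "Finance": [48, 35, 46, 36, 28, 29],
}

# Illustrative helper; the name is hypothetical, not from the slides.
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for {name: values}."""
    all_values = [x for g in groups.values() for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
              for g in groups.values())
    sse = sum((x - sum(g) / len(g)) ** 2
              for g in groups.values() for x in g)
    mst = sst / (k - 1)  # mean square for treatment
    mse = sse / (n - k)  # mean square for error
    return {"df": (k - 1, n - k), "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse}

table = one_way_anova(incomes)
print(table["df"], round(table["F"], 2))  # (2, 15) 1.77
```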
The Results
For 95% confidence (α = .05), our critical F is
  3.68 (averaging across the table values for 14 and
  16 denominator degrees of freedom).
In this case, 1.77 < 3.68, so we fail to reject the
  null hypothesis.
The dean is puzzled by these results because
  just by eyeballing the data, it looks like
  finance majors make more money.
Many other factors, such as GPA, may determine
 salary level. The dean decides to collect new
 data, randomly selecting one student from
 each major at each of the following average
 grades.
New data
Average grade  Accounting     Marketing     Finance        Block mean M(b)
A+             41             45            51             M(b1) = 45.67
A              36             38            45             M(b2) = 39.67
B+             27             33            31             M(b3) = 30.33
B              32             29            35             M(b4) = 32
C+             26             31            32             M(b5) = 29.67
C              23             25            27             M(b6) = 25
Major mean     M(t)1 = 30.83  M(t)2 = 33.5  M(t)3 = 36.83

Grand mean = 33.72
Randomized Block Design
Now the data in the 3 samples are not
 independent; they are matched by GPA
 levels. Just like before, matched samples
 are superior to unmatched samples because
 they provide more information. In this
 case, we have added a factor that may
 account for some of the SSE.
Two-way ANOVA
Now SS(total) = SST + SSB + SSE,
 where SSB is the variability among blocks,
 and a block is a matched group of
 observations from each of the populations.
We can calculate a two-way ANOVA to test
 our null hypothesis. We will talk about this
 next week.
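A sketch of the partition SS(total) = SST + SSB + SSE for the GPA-matched data in the New data table, assuming one observation per major in each GPA block as in that table. The helper name is hypothetical. A one-way analysis of these data would leave SS(total) - SST, about 907, as error; the sketch shows how much of that the blocks absorb. Testing the treatment effect from these sums is the two-way ANOVA mentioned above and is not attempted here.

```python
# New data: rows are GPA blocks, columns are majors
# (Accounting, Marketing, Finance).
blocks = {
    "A+": [41, 45, 51],
    "A": [36, 38, 45],
    "B+": [27, 33, 31],
    "B": [32, 29, 35],
    "C+": [26, 31, 32],
    "C": [23, 25, 27],
}

# Illustrative helper; the name is hypothetical, not from the slides.
def block_design_ss(blocks):
    """Return (SS total, SST, SSB, SSE) for a randomized block design."""
    rows = list(blocks.values())
    b, k = len(rows), len(rows[0])  # number of blocks, number of majors
    all_values = [x for row in rows for x in row]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Treatment (major) means: average down each column.
    treatment_means = [sum(row[j] for row in rows) / b for j in range(k)]
    # Block (GPA level) means: average across each row.
    block_means = [sum(row) / k for row in rows]
    sst = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ssb = k * sum((m - grand_mean) ** 2 for m in block_means)
    sse = ss_total - sst - ssb  # what neither majors nor blocks explain
    return ss_total, sst, ssb, sse

ss_total, sst, ssb, sse = block_design_ss(blocks)
print([round(x, 2) for x in (ss_total, sst, ssb, sse)])
# [1015.61, 108.44, 854.94, 52.22]
```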

Contenu connexe

Tendances

Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersionForensic Pathology
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadRione Drevale
 
Thiyagu measures of central tendency final
Thiyagu   measures of central tendency finalThiyagu   measures of central tendency final
Thiyagu measures of central tendency finalThiyagu K
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencykreshajay
 
Introduction to the t test
Introduction to the t testIntroduction to the t test
Introduction to the t testSr Edith Bogue
 
Basic statistics for algorithmic trading
Basic statistics for algorithmic tradingBasic statistics for algorithmic trading
Basic statistics for algorithmic tradingQuantInsti
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencyNilanjan Bhaumik
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
3 descritive statistics measure of central tendency variatio
3 descritive statistics measure of   central   tendency variatio3 descritive statistics measure of   central   tendency variatio
3 descritive statistics measure of central tendency variatioLama K Banna
 
Measure of central tendency (Mean, Median and Mode)
Measure of central tendency (Mean, Median and Mode)Measure of central tendency (Mean, Median and Mode)
Measure of central tendency (Mean, Median and Mode)Shakehand with Life
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencyAlex Chris
 

Tendances (20)

Measures Of Central Tendencies
Measures Of Central TendenciesMeasures Of Central Tendencies
Measures Of Central Tendencies
 
Stat3 central tendency & dispersion
Stat3 central tendency & dispersionStat3 central tendency & dispersion
Stat3 central tendency & dispersion
 
Basic statistics
Basic statisticsBasic statistics
Basic statistics
 
Central tendency and Measure of Dispersion
Central tendency and Measure of DispersionCentral tendency and Measure of Dispersion
Central tendency and Measure of Dispersion
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spread
 
Thiyagu measures of central tendency final
Thiyagu   measures of central tendency finalThiyagu   measures of central tendency final
Thiyagu measures of central tendency final
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Introduction to the t test
Introduction to the t testIntroduction to the t test
Introduction to the t test
 
Basic statistics for algorithmic trading
Basic statistics for algorithmic tradingBasic statistics for algorithmic trading
Basic statistics for algorithmic trading
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
Stat11t chapter2
Stat11t chapter2Stat11t chapter2
Stat11t chapter2
 
3 descritive statistics measure of central tendency variatio
3 descritive statistics measure of   central   tendency variatio3 descritive statistics measure of   central   tendency variatio
3 descritive statistics measure of central tendency variatio
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Chap07 interval estimation
Chap07 interval estimationChap07 interval estimation
Chap07 interval estimation
 
Measure of central tendency (Mean, Median and Mode)
Measure of central tendency (Mean, Median and Mode)Measure of central tendency (Mean, Median and Mode)
Measure of central tendency (Mean, Median and Mode)
 
Hypo
HypoHypo
Hypo
 
Measure of Central Tendency
Measure of Central TendencyMeasure of Central Tendency
Measure of Central Tendency
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Stat11t chapter3
Stat11t chapter3Stat11t chapter3
Stat11t chapter3
 

Similaire à Data analysis

Experimental design data analysis
Experimental design data analysisExperimental design data analysis
Experimental design data analysismetalkid132
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptxCallplanetsDeveloper
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsmolly joy
 
Machine learning pre requisite
Machine learning pre requisiteMachine learning pre requisite
Machine learning pre requisiteRam Singh
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxAnusuya123
 
Day 4 normal curve and standard scores
Day 4 normal curve and standard scoresDay 4 normal curve and standard scores
Day 4 normal curve and standard scoresElih Sutisna Yanto
 
Statistics and permeability engineering reports
Statistics and permeability engineering reportsStatistics and permeability engineering reports
Statistics and permeability engineering reportswwwmostafalaith99
 
Inferential Statistics.pptx
Inferential Statistics.pptxInferential Statistics.pptx
Inferential Statistics.pptxjonatanjohn1
 
Basic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptxBasic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptxbajajrishabh96tech
 
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docxQUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docxmakdul
 
Describing quantitative data with numbers
Describing quantitative data with numbersDescribing quantitative data with numbers
Describing quantitative data with numbersUlster BOCES
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
Analyzing experimental research data
Analyzing experimental research dataAnalyzing experimental research data
Analyzing experimental research dataAtula Ahuja
 
Analyzing experimental research data
Analyzing experimental research dataAnalyzing experimental research data
Analyzing experimental research dataAtula Ahuja
 
Kinds Of Variables Kato Begum
Kinds Of Variables Kato BegumKinds Of Variables Kato Begum
Kinds Of Variables Kato BegumDr. Cupid Lucid
 
Frequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docx
Frequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docxFrequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docx
Frequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docxhanneloremccaffery
 

Similaire à Data analysis (20)

Experimental design data analysis
Experimental design data analysisExperimental design data analysis
Experimental design data analysis
 
Chapter 11 Psrm
Chapter 11 PsrmChapter 11 Psrm
Chapter 11 Psrm
 
Topic 2 Measures of Central Tendency.pptx
Topic 2   Measures of Central Tendency.pptxTopic 2   Measures of Central Tendency.pptx
Topic 2 Measures of Central Tendency.pptx
 
statistics
statisticsstatistics
statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Machine learning pre requisite
Machine learning pre requisiteMachine learning pre requisite
Machine learning pre requisite
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptx
 
Day 4 normal curve and standard scores
Day 4 normal curve and standard scoresDay 4 normal curve and standard scores
Day 4 normal curve and standard scores
 
Statistics and permeability engineering reports
Statistics and permeability engineering reportsStatistics and permeability engineering reports
Statistics and permeability engineering reports
 
Inferential Statistics.pptx
Inferential Statistics.pptxInferential Statistics.pptx
Inferential Statistics.pptx
 
Basic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptxBasic Statistical Concepts in Machine Learning.pptx
Basic Statistical Concepts in Machine Learning.pptx
 
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docxQUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
QUESTION 1Question 1 Describe the purpose of ecumenical servic.docx
 
Describing quantitative data with numbers
Describing quantitative data with numbersDescribing quantitative data with numbers
Describing quantitative data with numbers
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
Descriptive statistics i
Descriptive statistics iDescriptive statistics i
Descriptive statistics i
 
data
datadata
data
 
Analyzing experimental research data
Analyzing experimental research dataAnalyzing experimental research data
Analyzing experimental research data
 
Analyzing experimental research data
Analyzing experimental research dataAnalyzing experimental research data
Analyzing experimental research data
 
Kinds Of Variables Kato Begum
Kinds Of Variables Kato BegumKinds Of Variables Kato Begum
Kinds Of Variables Kato Begum
 
Frequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docx
Frequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docxFrequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docx
Frequencies, Proportion, GraphsFebruary 8th, 2016Frequen.docx
 

Plus de metalkid132

Qualitative research method
Qualitative research methodQualitative research method
Qualitative research methodmetalkid132
 
Problem (how to form good research question)
Problem (how to form good research question)Problem (how to form good research question)
Problem (how to form good research question)metalkid132
 
Literature review
Literature reviewLiterature review
Literature reviewmetalkid132
 
Experimental design
Experimental designExperimental design
Experimental designmetalkid132
 
Quantitative reseach method
Quantitative reseach methodQuantitative reseach method
Quantitative reseach methodmetalkid132
 
Chapter 8 (guidelines and communication strategies for disclosure)
Chapter 8 (guidelines and communication strategies for disclosure)Chapter 8 (guidelines and communication strategies for disclosure)
Chapter 8 (guidelines and communication strategies for disclosure)metalkid132
 
Chapter 7 (communication in the stages of relationships)
Chapter 7 (communication in the stages of relationships)Chapter 7 (communication in the stages of relationships)
Chapter 7 (communication in the stages of relationships)metalkid132
 
Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)metalkid132
 
Chapter 5 (what is listening + types)
Chapter 5 (what is listening + types)Chapter 5 (what is listening + types)
Chapter 5 (what is listening + types)metalkid132
 
Chapter 4 (types of non verbal communication)
Chapter 4 (types of non verbal communication)Chapter 4 (types of non verbal communication)
Chapter 4 (types of non verbal communication)metalkid132
 
Chapter 3 (use language that makes your messages memorable)
Chapter 3 (use language that makes your messages memorable)Chapter 3 (use language that makes your messages memorable)
Chapter 3 (use language that makes your messages memorable)metalkid132
 
Chapter 2 (perception of others)
Chapter 2 (perception of others)Chapter 2 (perception of others)
Chapter 2 (perception of others)metalkid132
 
Chapter 9 (social friendship groups)
Chapter 9 (social friendship groups)Chapter 9 (social friendship groups)
Chapter 9 (social friendship groups)metalkid132
 
Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)metalkid132
 

Plus de metalkid132 (16)

Qualitative research method
Qualitative research methodQualitative research method
Qualitative research method
 
Problem (how to form good research question)
Problem (how to form good research question)Problem (how to form good research question)
Problem (how to form good research question)
 
Literature review
Literature reviewLiterature review
Literature review
 
Experimental design
Experimental designExperimental design
Experimental design
 
Apa style
Apa styleApa style
Apa style
 
Action research
Action researchAction research
Action research
 
Quantitative reseach method
Quantitative reseach methodQuantitative reseach method
Quantitative reseach method
 
Chapter 8 (guidelines and communication strategies for disclosure)
Chapter 8 (guidelines and communication strategies for disclosure)Chapter 8 (guidelines and communication strategies for disclosure)
Chapter 8 (guidelines and communication strategies for disclosure)
 
Chapter 7 (communication in the stages of relationships)
Chapter 7 (communication in the stages of relationships)Chapter 7 (communication in the stages of relationships)
Chapter 7 (communication in the stages of relationships)
 
Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)
 
Chapter 5 (what is listening + types)
Chapter 5 (what is listening + types)Chapter 5 (what is listening + types)
Chapter 5 (what is listening + types)
 
Chapter 4 (types of non verbal communication)
Chapter 4 (types of non verbal communication)Chapter 4 (types of non verbal communication)
Chapter 4 (types of non verbal communication)
 
Chapter 3 (use language that makes your messages memorable)
Chapter 3 (use language that makes your messages memorable)Chapter 3 (use language that makes your messages memorable)
Chapter 3 (use language that makes your messages memorable)
 
Chapter 2 (perception of others)
Chapter 2 (perception of others)Chapter 2 (perception of others)
Chapter 2 (perception of others)
 
Chapter 9 (social friendship groups)
Chapter 9 (social friendship groups)Chapter 9 (social friendship groups)
Chapter 9 (social friendship groups)
 
Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)Chapter 6 (intercultural communication competence)
Chapter 6 (intercultural communication competence)
 

Data analysis

  • 2. The mean, median and mode Presenter: Huu Loc The mean, median and mode are all valid measures of central tendency but, under different conditions, some measures of central tendency become more appropriate to use than others.
  • 3. Mean  The mean (or average) is the most popular and well known measure of central tendency. It can be used with both discrete and continuous data, although its use is most often with continuous data.  The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.
  • 4. So, if we have n values in a data set and they have values x1, x2, ..., xn, then the sample mean, usually denoted by (pronounced x bar), is:
  • 5.  The mean is essentially a model of your data set. It is the value that is most common.  An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.
  • 6. Median The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data. In order to calculate the median, suppose we have the data below: We first need to rearrange that data into order of magnitude (smallest first):
  • 7. Our median mark is the middle mark - in this case 56 (highlighted in bold). It is the middle mark because there are 5 scores before it and 5 scores after it.
  • 8. Mode The mode is the most frequent score in our data set. On a histogram it represents the highest bar in a bar chart or histogram. You can, therefore, sometimes consider the mode as being the most popular option. An example of a mode is presented below:
  • 9.
  • 10. Normally, the mode is used for categorical data where we wish to know which is the most common category as illustrated below:
  • 11. One of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency, such as below:
  • 12. Summary of when to use the mean, median and mode Using the following summary table to know what the best measure of central tendency is with respect to the different types of variable.
  • 14. Measures of Dispersion Measure of central tendency give us good information about the scores in our distribution. However, we can have very different shapes to our distribution, yet have the same central tendency. Measures of dispersion or variability will give us information about the spread of the scores in our distribution. Are the scores clustered close together over a small portion of the scale, or are the scores spread out over a large segment of the scale?
  • 15. Main points: 1. Range 2. Standard Deviation 3. Variance
  • 16. 1. Range The difference between the biggest and the smallest number in the data of the group. The range tells you how spread out the data is.
  • 18. 1. Range Problem: 1. It changes drastically with the magnitude of the extreme scores 2. It’s an unstable measure  rarely used for statistical analyses
  • 19. 2. Standard Deviation The standard deviation is the most frequently used measure of variability. It measures the average variability of the scores around the mean, and every score is taken into account.
  • 20. 2. Standard Deviation The larger the standard deviation, the more the scores vary around the central point of the distribution. The smaller the standard deviation, the more closely the scores cluster around the central point.
  • 23. 2. Standard Deviation The SD tells us how far, typically, the individual scores lie from the point of central tendency. It gives information that the mean does not, which makes it as important as, or even more important than, the mean.
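The sketch below computes the variance and standard deviation for two hypothetical data sets with the same mean (50) but different spread, echoing the point that distributions can share a central tendency yet differ in variability. The slides' own formula is not in this transcript; the sketch assumes the sample versions, which divide the summed squared deviations by n - 1 (the population versions divide by n).

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation = square root of the variance."""
    return math.sqrt(sample_variance(values))


tight = [48, 49, 50, 51, 52]     # hypothetical scores clustered near 50
spread = [30, 40, 50, 60, 70]    # same mean of 50, far more spread out

print(sample_sd(tight), sample_sd(spread))
```

Both sets have the same mean, but the second standard deviation is ten times the first, so the SD captures information the mean alone does not.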
  • 26. Introduction • A paired t-test is used to compare two population means when you have two samples in which each observation in one sample can be paired with an observation in the other sample. • For example: • A diagnostic test was given before students studied a particular module and again after they completed it. We want to find out if, in general, our teaching leads to improvements in students' knowledge/skills.
  • 27. First, we see the descriptive statistics for both variables. The post-test mean scores are higher.
  • 28. Next, we see the correlation between the two variables. There is a strong positive correlation: people who did well on the pre-test also did well on the post-test.
  • 29. Finally, we see the t statistic, the degrees of freedom, and the significance. • Our significance value is .053. • If the significance value is less than .05, there is a significant difference. If the significance value is greater than .05, there is no significant difference. • Here, the significance value is close to .05 (sometimes described as "approaching significance"), but the difference is not significant. We therefore cannot conclude that pre- and post-test scores differ: these data do not show that our test preparation course helped.
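The paired t statistic is computed from the differences within each pair. This minimal sketch, using only the standard library, assumes the usual formula t = mean(d) / (sd(d) / sqrt(n)) with df = n - 1, where d = post - pre. The function name and the six pairs of scores are hypothetical; they are not the data behind the output on these slides.

```python
import math


def paired_t(before, after):
    """Return (t, df) for a paired t-test on the differences after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # sample standard deviation of the differences (divides by n - 1)
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


pre = [55, 60, 48, 70, 62, 58]     # hypothetical pre-test scores
post = [58, 61, 50, 74, 61, 63]    # hypothetical post-test scores
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.2f}")
```

Turning t into a significance value needs the t-distribution with n - 1 degrees of freedom (a printed table or, if SciPy is available, `scipy.stats.ttest_rel`). The decision rule is then the one on the slide: significant only if the value is below .05.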
  • 31. Outline 1. Introduction 2. Hypothesis for the independent t-test 3. What do you need to run an independent t-test? 4. Formula 5. Example (Calculating + Reporting)
  • 32. Introduction The independent t-test, also called the two-sample t-test or Student's t-test, is an inferential statistical test that determines whether there is a statistically significant difference between the means of two unrelated groups.
  • 33. Hypothesis for the independent t-test The null hypothesis for the independent t-test is that the population means of the two unrelated groups are equal: $H_0: \mu_1 = \mu_2$. In most cases, we want to show that we can reject the null hypothesis in favour of the alternative hypothesis, which is that the population means are not equal: $H_A: \mu_1 \neq \mu_2$. To do this we need to set a significance level (alpha) that determines whether we reject the null hypothesis and accept the alternative. Most commonly, this value is set at 0.05.
  • 34. What do you need to run an independent t-test? In order to run an independent t-test you need the following: 1. One independent, categorical variable that has two levels. 2. One dependent variable measured on a continuous (interval or ratio) scale.
  • 35. Formula Symbols: M = mean (the average score of the group); SD = standard deviation; N = number of scores in each group; Exp = experimental group; Con = control group.
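The formula itself is not reproduced in this transcript, so the sketch below assumes the pooled-variance (Student's) form, written with the slide's symbols: t = (M_Exp - M_Con) / sqrt(s_p^2 (1/N_Exp + 1/N_Con)), where s_p^2 pools SD_Exp^2 and SD_Con^2 and df = N_Exp + N_Con - 2. The function name and scores are hypothetical. An unpooled alternative (Welch's test) divides instead by sqrt(SD_Exp^2/N_Exp + SD_Con^2/N_Con).

```python
import math


def independent_t(exp, con):
    """Return (t, df) for Student's pooled-variance independent t-test."""
    n1, n2 = len(exp), len(con)
    m1, m2 = sum(exp) / n1, sum(con) / n2                 # M_Exp, M_Con
    var1 = sum((x - m1) ** 2 for x in exp) / (n1 - 1)     # SD_Exp squared
    var2 = sum((x - m2) ** 2 for x in con) / (n2 - 1)     # SD_Con squared
    # pooled variance: each group's variance weighted by its df
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


experimental = [78, 82, 85, 90, 75]   # hypothetical experimental-group scores
control = [70, 74, 80, 72, 69]        # hypothetical control-group scores
t, df = independent_t(experimental, control)
print(f"t({df}) = {t:.2f}")
```

The printed string already follows the t(df) = t-statistic reporting format. If SciPy is available, `scipy.stats.ttest_ind` computes the same pooled statistic by default (`equal_var=True`) along with its p-value.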
  • 40. Reporting the Result of an Independent T-Test When reporting the result of an independent t-test, you need to include the t-statistic value, the degrees of freedom (df) and the significance value of the test (p-value). The format of the test result is: t(df) = t-statistic, p = significance value.
  • 41. Example result (APA Style) An independent-samples t-test is reported in the same way as a one-sample t-test: t(75) = 2.11, p = .02 (one-tailed), d = .48. Here 75 is the degrees of freedom, 2.11 is the value of the statistic, .02 is the significance of the statistic, "(one-tailed)" is included only if the test is one-tailed, and d = .48 is the effect size, reported if available. Example: Survey respondents who were employed by the federal, state, or local government had significantly higher socioeconomic indices (M = 55.42, SD = 19.25) than survey respondents who were employed by a private employer (M = 47.54, SD = 18.94), t(255) = 2.363, p = .01 (one-tailed).
  • 42. Analysis of Variance (ANOVA) Presenter: Minh Sang
  • 43. Introduction We have already learned about the chi-square test for independence, which is useful for data measured at the nominal or ordinal level. If we have data measured at the interval level, we can compare two or more population groups in terms of their population means using a technique called analysis of variance, or ANOVA.
  • 44. Completely randomized design Populations 1, 2, …, k have means $\mu_1, \mu_2, \dots, \mu_k$ and variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2$. We want to know something about how the populations compare. Do they have the same mean? We can collect random samples from each population, which gives us the following data.
  • 45. Completely randomized design The samples have means $M_1, M_2, \dots, M_k$, variances $s_1^2, s_2^2, \dots, s_k^2$ and $n_1, n_2, \dots, n_k$ cases. Suppose we want to compare 3 college majors in a business school by the average annual income people make 2 years after graduation. We collect the following data (in thousands of dollars) based on random surveys.
  • 46. Completely randomized design
      Accounting   Marketing   Finance
          27           23         48
          22           36         35
          33           27         46
          25           44         36
          38           39         28
          29           32         29
  • 47. Completely randomized design Can the dean conclude that there are differences among the majors' incomes? $H_0: \mu_1 = \mu_2 = \mu_3$; $H_A$: not all of $\mu_1, \mu_2, \mu_3$ are equal. In this problem we must take into account: 1) The variation between samples, that is, the actual differences by major. This is called the sum of squares for treatment (SST).
  • 48. Completely randomized design 2) The variation within samples, that is, the variation of incomes within a single major. This is called the sum of squares for error (SSE). Recall that when we sample, there is always a chance of getting a result that differs from the population. We account for this through #2, the SSE.
  • 49. F-Statistic For this test, we will calculate an F statistic, which is used to compare variances: $F = \frac{SST/(k-1)}{SSE/(n-k)}$, where SST = sum of squares for treatment, SSE = sum of squares for error, k = the number of populations and n = total sample size.
  • 50. F-statistic Intuitively, the F statistic is $F = \frac{\text{explained variance}}{\text{unexplained variance}}$. Explained variance is the difference between majors; unexplained variance is the difference based on random sampling within each group (see Figure 10-1, page 327).
  • 51. Calculating SST $SST = \sum_{i=1}^{k} n_i (M_i - \bar{\bar{X}})^2$, where $\bar{\bar{X}}$ is the grand mean: the sum of all values for all groups divided by the total sample size (when all groups have the same size, this equals $\sum_i M_i / k$). $M_i$ = mean of each sample, $n_i$ = size of each sample, k = the number of populations.
  • 52. Calculating SST By major: Accounting $M_1 = 29$, $n_1 = 6$; Marketing $M_2 = 33.5$, $n_2 = 6$; Finance $M_3 = 37$, $n_3 = 6$. Grand mean $\bar{\bar{X}} = (29 + 33.5 + 37)/3 = 33.17$. $SST = 6(29 - 33.17)^2 + 6(33.5 - 33.17)^2 + 6(37 - 33.17)^2 = 193$.
  • 53. Calculating SST Note that when $M_1 = M_2 = M_3$, SST = 0, which would be consistent with the null hypothesis. In this example the samples are of equal size, but the analysis can also be run with samples of different sizes.
  • 54. Calculating SSE $SSE = \sum_{i=1}^{k} \sum_{t=1}^{n_i} (X_{it} - M_i)^2$. In other words, it is the sum of the squared deviations of each score from its own sample mean, added together across the samples (equivalently, each sample's variance multiplied by $n_i - 1$, then summed). $SSE = \sum_t (X_{1t} - M_1)^2 + \sum_t (X_{2t} - M_2)^2 + \sum_t (X_{3t} - M_3)^2 = [(27-29)^2 + (22-29)^2 + \dots + (29-29)^2] + [(23-33.5)^2 + (36-33.5)^2 + \dots] + [(48-37)^2 + (35-37)^2 + \dots + (29-37)^2] = 819.5$.
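The SST arithmetic above can be checked with a short sketch. The incomes are the three samples from the completely randomized design slide; the function name `sst` is hypothetical. The grand mean is computed as the sum of all values divided by the total sample size, so the sketch also works for unequal group sizes.

```python
def sst(groups):
    """Sum of squares for treatment: sum over groups of n_i * (M_i - grand mean)^2."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = sum(all_values) / len(all_values)   # sum of all values / total n
    total = 0.0
    for values in groups.values():
        m_i = sum(values) / len(values)              # sample mean M_i
        total += len(values) * (m_i - grand_mean) ** 2
    return total


# Annual incomes two years after graduation (in thousands of dollars)
incomes = {
    "Accounting": [27, 22, 33, 25, 38, 29],
    "Marketing": [23, 36, 27, 44, 39, 32],
    "Finance": [48, 35, 46, 36, 28, 29],
}
print(round(sst(incomes), 2))
```

Using the unrounded grand mean (199/6) gives exactly 193, matching the slide's hand calculation.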
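A minimal sketch of the SSE sum, using the same three income samples. The helper name `sse` is hypothetical. Each score is compared only with its own group's mean, so differences between majors do not enter this quantity.

```python
def sse(groups):
    """Sum of squares for error: squared deviations of each score from its group mean."""
    total = 0.0
    for values in groups.values():
        m_i = sum(values) / len(values)              # sample mean M_i
        total += sum((x - m_i) ** 2 for x in values)
    return total


# Annual incomes two years after graduation (in thousands of dollars)
incomes = {
    "Accounting": [27, 22, 33, 25, 38, 29],
    "Marketing": [23, 36, 27, 44, 39, 32],
    "Finance": [48, 35, 46, 36, 28, 29],
}
print(sse(incomes))
```

The three groups contribute 166, 301.5 and 352 respectively, which add up to the slide's 819.5.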
  • 55. Statistical Output When you estimate this information in a computer program, it will typically be presented in a table as follows:
      Source of variation   df      Sum of squares   Mean squares        F-ratio
      Treatment             k - 1   SST              MST = SST/(k - 1)   F = MST/MSE
      Error                 n - k   SSE              MSE = SSE/(n - k)
      Total                 n - 1   SS = SST + SSE
  • 56. Calculating F for our example $F = \frac{193/2}{819.5/15} = 1.77$. Our calculated F is compared to the critical value $F_{\alpha,\,k-1,\,n-k}$ from the F-distribution, with k - 1 = 2 numerator degrees of freedom and n - k = 15 denominator degrees of freedom.
  • 57. The Results For 95% confidence ($\alpha = .05$), our critical F is 3.68 (averaging across the tabled values at 14 and 16 denominator degrees of freedom). In this case, 1.77 < 3.68, so we fail to reject the null hypothesis. The dean is puzzled by these results because, just by eyeballing the data, it looks like finance majors make more money.
  • 58. The Results Many other factors may determine salary level, such as GPA. The dean decides to collect new data, randomly selecting one student from each major at each of the following average grade levels.
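The ANOVA table turns the two sums of squares into mean squares and an F-ratio. This sketch (hypothetical function name) plugs in SST = 193, SSE = 819.5, k = 3 and n = 18 from the example, then applies the decision rule using the critical value of 3.68 quoted in this deck.

```python
def one_way_anova_f(sst, sse, k, n):
    """Return (MST, MSE, F) from the rows of a one-way ANOVA table."""
    mst = sst / (k - 1)     # treatment mean square, df = k - 1
    mse = sse / (n - k)     # error mean square, df = n - k
    return mst, mse, mst / mse


mst, mse, f = one_way_anova_f(sst=193, sse=819.5, k=3, n=18)
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {f:.2f}")

critical_f = 3.68           # F(.05; 2, 15), as quoted in the slides
print("reject H0" if f > critical_f else "fail to reject H0")
```

If SciPy is available, `scipy.stats.f.ppf(0.95, 2, 15)` returns the exact critical value (about 3.68) without averaging neighbouring table rows.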
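  • 59. New data Income by major and average grade; M(b) = mean for each grade level (block), M(t) = mean for each major (treatment).
      Average grade   Accounting      Marketing      Finance         M(b)
      A+              41              45             51              M(b1) = 45.67
      A               36              38             45              M(b2) = 39.67
      B+              27              33             31              M(b3) = 30.33
      B               32              29             35              M(b4) = 32
      C+              26              31             32              M(b5) = 29.67
      C               23              25             27              M(b6) = 25
      M(t)            M(t)1 = 30.83   M(t)2 = 33.5   M(t)3 = 36.83   grand mean = 33.72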
  • 60. Randomized Block Design Now the data in the 3 samples are not independent; they are matched by GPA level. Just like before, matched samples are superior to unmatched samples because they provide more information. In this case, we have added a factor that may account for some of the SSE.
  • 61. Two way ANOVA Now $SS(\text{total}) = SST + SSB + SSE$, where SSB is the variability among blocks, and a block is a matched group of observations from each of the populations. We can calculate a two-way ANOVA to test our null hypothesis. We will talk about this next week.
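As a preview, the partition SS(total) = SST + SSB + SSE can be computed directly from the GPA-matched data. This sketch (hypothetical function name) only splits the total sum of squares; it does not carry out the two-way F test deferred to next week. It assumes one observation per major per grade level, as in the new data, and obtains SSE as whatever variation remains after treatments and blocks.

```python
def block_design_ss(table):
    """Partition SS(total) into SST (treatments), SSB (blocks) and SSE.

    `table` maps each block (grade level) to one score per treatment (major).
    """
    rows = list(table.values())
    b, k = len(rows), len(rows[0])                  # number of blocks, treatments
    grand = sum(sum(r) for r in rows) / (b * k)     # grand mean
    treat_means = [sum(r[j] for r in rows) / b for j in range(k)]   # M(t)
    block_means = [sum(r) / k for r in rows]                        # M(b)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    sst = b * sum((m - grand) ** 2 for m in treat_means)
    ssb = k * sum((m - grand) ** 2 for m in block_means)
    sse = ss_total - sst - ssb                      # variation left unexplained
    return ss_total, sst, ssb, sse


# Columns: Accounting, Marketing, Finance (from the "New data" slide)
gpa_table = {
    "A+": [41, 45, 51],
    "A": [36, 38, 45],
    "B+": [27, 33, 31],
    "B": [32, 29, 35],
    "C+": [26, 31, 32],
    "C": [23, 25, 27],
}
ss_total, sst, ssb, sse = block_design_ss(gpa_table)
print(f"SS(total) = {ss_total:.2f}, SST = {sst:.2f}, SSB = {ssb:.2f}, SSE = {sse:.2f}")
```

Comparing this SSE with the one-way SSE shows how much of the unexplained variation the GPA blocks absorb.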