T tests, ANOVAs and regression

Tom Jenkins
Ellen Meierotto
SPM Methods for Dummies 2007
Why do we need t tests?
Objectives
   Types of error
   Probability distribution
   Z scores
   T tests
   ANOVAs
Error
   Null hypothesis
   Type 1 error (α): false positive
   Type 2 error (β): false negative
Normal distribution
Z scores
   Standardised normal distribution
   µ = 0, σ = 1
   Z scores: 0, 1, 1.65, 1.96
   Need to know the population standard deviation

    z = (x − μ) / σ    for one point compared to the population
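
As an illustration (not from the original slides), here is a minimal Python sketch of the z-score formula above; the population values are invented for the example:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 100.0, 15.0   # hypothetical population mean and SD
x = 124.6                 # one observed data point

z = (x - mu) / sigma      # z = (x - mu) / sigma, as on the slide
p = 1 - norm.cdf(z)       # one-tailed probability of a value this extreme

print(f"z = {z:.2f}, one-tailed p = {p:.3f}")   # z = 1.64, p ~ 0.05
```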
T tests
   Comparing means
   1 sample t
   2 sample t
   Paired t
Different sample variances
2 sample t tests

    t = (x̄1 − x̄2) / s(x̄1 − x̄2)

   Pooled standard error of the mean:

    s(x̄1 − x̄2) = √( s1²/n1 + s2²/n2 )
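
A minimal sketch of this statistic on made-up data; note that scipy's ttest_ind with equal_var=False (Welch's test) uses the same standard error as the slide's formula:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(5.0, 1.0, 30)   # made-up sample 1
g2 = rng.normal(5.5, 1.2, 25)   # made-up sample 2

# t = (x1bar - x2bar) / s, with s = sqrt(s1^2/n1 + s2^2/n2)
se = np.sqrt(g1.var(ddof=1) / len(g1) + g2.var(ddof=1) / len(g2))
t_manual = (g1.mean() - g2.mean()) / se

# scipy's Welch t test uses this same standard error
t_scipy, p = stats.ttest_ind(g1, g2, equal_var=False)
print(t_manual, t_scipy, p)     # t_manual matches t_scipy
```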
1 sample t test
The effect of degrees of
freedom on t distribution
Paired t tests
T tests in SPM: Did the observed signal change occur by chance or is it
statistically significant?
   Recall the GLM: Y = Xβ + ε
   β1 is an estimate of signal change over time attributable to the
    condition of interest
   Set up a contrast vector cT = [1 0 … 0] for β1: cTβ = 1×β1 + 0×β2 + … + 0×βn
   Null hypothesis: cTβ = 0, i.e. no significant effect at each voxel for
    condition β1
   Contrast [1 −1]: is the difference between 2 conditions significantly
    non-zero?
   t = cTβ / sd[cTβ] – one-sided
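
As a hedged sketch of how such a contrast t statistic can be computed for the GLM Y = Xβ + ε — the design matrix and data here are simulated, not SPM's actual pipeline:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([rng.standard_normal(n),  # regressor for beta1
                     np.ones(n)])             # constant regressor
Y = X @ np.array([0.8, 2.0]) + rng.standard_normal(n)  # Y = X.beta + noise

beta = np.linalg.pinv(X) @ Y                  # least-squares beta estimates
resid = Y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])     # estimated error variance

c = np.array([1.0, 0.0])                      # contrast cT = [1 0] for beta1
t = (c @ beta) / np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
p = stats.t.sf(t, df=n - X.shape[1])          # one-sided, as on the slide
print(t, p)
```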
ANOVA
   Compares variances, not means
   Total variance = model variance + error variance
   Results in an F score, corresponding to a p value



Variance

    s² = Σ(xᵢ − x̄)² / (n − 1)

    F test = Model variance / Error variance
Partitioning the variance

  [Figure: three pairs of Group 1 / Group 2 distributions illustrating the
   total, model and error variance]

    Total  =  Model (Between groups)  +  Error (Within groups)
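
A minimal sketch of this partition for a one-way ANOVA on two made-up groups; scipy's f_oneway gives the same F:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
groups = [rng.normal(10, 2, 20), rng.normal(12, 2, 20)]  # made-up groups
allvals = np.concatenate(groups)
grand_mean = allvals.mean()

# Model (between-groups) and error (within-groups) sums of squares
ss_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((allvals - grand_mean) ** 2).sum()   # = ss_model + ss_error

df_model = len(groups) - 1
df_error = len(allvals) - len(groups)
F = (ss_model / df_model) / (ss_error / df_error)

print(F, stats.f_oneway(*groups).statistic)      # the two should match
```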
T vs F tests
   F tests detect any differences between multiple groups, including
    interactions
   You then have to determine where the differences are post hoc
   SPM t contrasts: one-tailed (con images)
   SPM F contrasts: two-tailed (ess images)
Conclusions
   T tests describe how unlikely it is that experimental
    differences are due to chance
   The higher the t score, the smaller the p value, and the less likely the
    difference is due to chance
   Can compare sample with population or 2 samples,
    paired or unpaired
   ANOVA/F tests are similar but use variances instead
    of means and can be applied to more than 2 groups
    and other more complex scenarios
Acknowledgements
   MfD slides 2004-2006
   Van Belle, Biostatistics
   Human Brain Function
   Wikipedia
Correlation and Regression
Topics Covered:
   Is there a relationship between x and y?
   What is the strength of this relationship?
       Pearson’s r
   Can we describe this relationship and use it to predict
    y from x?
       Regression
   Is the relationship we have described statistically
    significant?
       F- and t-tests
   Relevance to SPM
       GLM
Relationship between x and y
   Correlation describes the strength and
    direction of a linear relationship between two
    variables
   Regression tells you how well a certain
    independent variable predicts a dependent
    variable

    CORRELATION ≠ CAUSATION
       In order to infer causality: manipulate independent
        variable and observe effect on dependent variable
Scattergrams

  [Figure: three scatter plots of Y against X]

  Positive correlation      Negative correlation      No correlation
Variance vs. Covariance
   Do two variables change together?

   Variance ~ DX * DX:

    s²x = Σ(xᵢ − x̄)² / n

   Covariance ~ DX * DY:

    cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / n
Covariance

    cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / n

   When X and Y both increase (or both decrease) together: cov(x, y) is positive
   When X increases as Y decreases: cov(x, y) is negative
   When there is no constant relationship: cov(x, y) = 0
Example Covariance

  [Figure: scatter plot of the five (x, y) data points below]

    x        y        xᵢ − x̄    yᵢ − ȳ    (xᵢ − x̄)(yᵢ − ȳ)
    0        3        −3         0          0
    2        2        −1         −1         1
    3        4        0          1          0
    4        0        1          −3         −3
    6        6        3          3          9
    x̄ = 3    ȳ = 3                          Σ = 7

    cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / n = 7 / 5 = 1.4

   What does this number tell us?
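
The slide's arithmetic can be checked directly in Python (the slide divides by n; numpy's np.cov defaults to n − 1, so bias=True is needed to match):

```python
import numpy as np

x = np.array([0, 2, 3, 4, 6])
y = np.array([3, 2, 4, 0, 6])

cov_manual = ((x - x.mean()) * (y - y.mean())).sum() / len(x)  # divide by n
cov_numpy = np.cov(x, y, bias=True)[0, 1]                      # also / n

print(cov_manual, cov_numpy)   # both 1.4
```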
Example of how covariance value relies on variance

                High variance data                Low variance data
  Subject       x      y      x error * y error   x      y      x error * y error
  1             101    100    2500                54     53     9
  2             81     80     900                 53     52     4
  3             61     60     100                 52     51     1
  4             51     50     0                   51     50     0
  5             41     40     100                 50     49     1
  6             21     20     900                 49     48     4
  7             1      0      2500                48     47     9
  Mean          51     50                         51     50

  Sum of x error * y error:  7000                 Sum of x error * y error:  28
  Covariance:                1166.67              Covariance:                4.67
Pearson’s R

    −∞ ≤ cov(x, y) ≤ ∞
   On its own, covariance does not really tell us anything: its value
    depends on the scale of the data
       Solution: standardise this measure
   Pearson’s R: standardise by dividing by the standard deviations:

    rxy = cov(x, y) / (sx sy)
Basic assumptions
   Normal distributions
   Variances are constant and not zero
   Independent sampling – no autocorrelations
   No errors in the values of the independent
    variable
   All causation in the model is one-way (not
    necessary mathematically, but essential for
    prediction)
Pearson’s R: degree of linear dependence

    cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / n

    rxy = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n sx sy)

    −1 ≤ r ≤ 1

   Equivalently, in terms of z scores:

    rxy = Σ(Zxᵢ · Zyᵢ) / n
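
A sketch confirming that the two forms above agree, reusing the example data from the covariance slide (population-style division by n throughout, to match the slides):

```python
import numpy as np

x = np.array([0, 2, 3, 4, 6.0])
y = np.array([3, 2, 4, 0, 6.0])
n = len(x)

sx, sy = x.std(), y.std()                  # population SDs (ddof=0)
cov = ((x - x.mean()) * (y - y.mean())).sum() / n

r_from_cov = cov / (sx * sy)               # r = cov(x, y) / (sx sy)
zx, zy = (x - x.mean()) / sx, (y - y.mean()) / sy
r_from_z = (zx * zy).sum() / n             # r = sum(Zx * Zy) / n

print(r_from_cov, r_from_z, np.corrcoef(x, y)[0, 1])  # all equal
```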
Limitations of r

   The r we calculate is actually r̂:
       r = true r of the whole population
       r̂ = estimate of r based on data

   r is very sensitive to extreme values:

  [Figure: scatter plot in which a single extreme point substantially
   changes the correlation]
In the real world…
   In practice, r is never exactly 1 or –1
   Suggested interpretations for correlations in psychological research
    (Cohen):

Correlation         Negative         Positive
Small               –0.29 to –0.10   0.10 to 0.29
Medium              –0.49 to –0.30   0.30 to 0.49
Large               –1.00 to –0.50   0.50 to 1.00
Regression
   Correlation tells you if there is an
    association between x and y but it
    doesn’t describe the relationship or
    allow you to predict one variable from
    the other.

   To do this we need REGRESSION!
Best-fit Line
   Aim of linear regression is to fit a straight line, ŷ = ax + b, to data
    that gives the best prediction of y for any value of x

   This will be the line that minimises the distance between the data and
    the fitted line, i.e. the residuals

  [Figure: scatter plot with fitted line ŷ = ax + b (a = slope,
   b = intercept); points on the line are ŷ, the predicted values; the data
   points are yᵢ, the true values; ε is the residual error]
Least Squares Regression
     To find the best line we must minimise the
      sum of the squares of the residuals (the
      vertical distances from the data points to
      our line)
    Model line: ŷ = ax + b    (a = slope, b = intercept)

    Residual (ε) = y − ŷ
    Sum of squares of residuals = Σ(y − ŷ)²

   We must find the values of a and b that minimise Σ(y − ŷ)²
Finding b
   First we find the value of b that gives the minimum sum of squares

  [Figure: three lines with the same slope but different intercepts b, each
   with its residuals ε]

   Trying different values of b is equivalent to shifting the line up and
    down the scatter plot
Finding a
   Now we find the value of a that gives the minimum sum of squares

  [Figure: three lines through the same intercept b with different slopes a]

   Trying out different values of a is equivalent to changing the slope of
    the line, while b stays constant
Minimising sums of squares
   Need to minimise Σ(y − ŷ)²
   ŷ = ax + b
   so we need to minimise:

    S = Σ(y − ax − b)²

   If we plot the sum of squares S for all different values of a and b we
    get a parabola, because it is a squared term

   So the minimum sum of squares is at the bottom of the curve, where the
    gradient is zero

  [Figure: parabola of the sum of squares S against values of a and b; the
   minimum of S is where the gradient = 0]
The maths bit
   So we can find the a and b that give the minimum sum of squares by
    taking partial derivatives of Σ(y − ax − b)² with respect to a and b
    separately

   Setting each derivative to zero and solving gives the values of a and b
    that minimise the sum of squares
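
Written out explicitly (this derivation is implied but not printed on the slide), the two partial derivatives and their solutions are:

```latex
\frac{\partial S}{\partial b} = -2\sum_{i}(y_i - a x_i - b) = 0
  \quad\Rightarrow\quad b = \bar{y} - a\bar{x}

\frac{\partial S}{\partial a} = -2\sum_{i} x_i (y_i - a x_i - b) = 0
  \quad\Rightarrow\quad
  a = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
    = \frac{r\, s_y}{s_x}
```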
The solution
   Doing this gives the following equation for a:

    a = r sy / sx        r  = correlation coefficient of x and y
                         sy = standard deviation of y
                         sx = standard deviation of x

   You can see that:
       A low correlation coefficient gives a flatter slope (small value of a)
       A large spread of y, i.e. a high standard deviation, results in a
        steeper slope (higher value of a)
       A large spread of x, i.e. a high standard deviation, results in a
        flatter slope (lower value of a)
The solution cont.
   Our model equation is ŷ = ax + b
   This line must pass through the mean, so:

    ȳ = ax̄ + b,   hence   b = ȳ − ax̄

   Substituting the equation for a gives:

    b = ȳ − (r sy / sx) x̄       r  = correlation coefficient of x and y
                                 sy = standard deviation of y
                                 sx = standard deviation of x

   The smaller the correlation, the closer the intercept is to the mean
    of y
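
A quick sketch verifying on invented data that a = r sy / sx and b = ȳ − ax̄ reproduce numpy's least-squares line:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 2, 50)   # made-up linear data + noise

r = np.corrcoef(x, y)[0, 1]
a = r * y.std() / x.std()                  # a = r * sy / sx
b = y.mean() - a * x.mean()                # b = ybar - a * xbar

a_np, b_np = np.polyfit(x, y, 1)           # numpy's least-squares line
print((a, b), (a_np, b_np))                # the two pairs should match
```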
Back to the model

   We can calculate the regression line for any
    data, but the important question is:
    How well does this line fit the data, or how
    good is it at predicting y from x?
How good is our model?
   Total variance of y:

    sy² = Σ(y − ȳ)² / (n − 1) = SSy / dfy

   Variance of the predicted y values (ŷ) – this is the variance explained
    by our regression model:

    sŷ² = Σ(ŷ − ȳ)² / (n − 1) = SSpred / dfŷ

   Error variance – the variance of the error between our predicted y
    values and the actual y values, and thus the variance in y that is NOT
    explained by the regression model:

    serror² = Σ(y − ŷ)² / (n − 2) = SSer / dfer
How good is our model cont.
   Total variance = predicted variance + error variance

    sy² = sŷ² + ser²

   Conveniently, via some complicated rearranging:

    sŷ² = r² sy²

    r² = sŷ² / sy²

   so r² is the proportion of the variance in y that is explained by our
    regression model
How good is our model cont.
   Insert r² sy² into sy² = sŷ² + ser² and rearrange to get:

    ser² = sy² − r² sy²
         = sy² (1 − r²)

   From this we can see that the greater the correlation, the smaller the
    error variance, and so the better our prediction
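
A sketch checking this partition and the r² identity on invented data (using sums of squares, so the degrees-of-freedom conventions drop out):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 2, 50)   # same made-up data as before

a, b = np.polyfit(x, y, 1)
y_hat = a * x + b

ss_total = ((y - y.mean()) ** 2).sum()     # SSy
ss_pred = ((y_hat - y.mean()) ** 2).sum()  # SSpred, explained
ss_error = ((y - y_hat) ** 2).sum()        # SSer, unexplained

r2 = np.corrcoef(x, y)[0, 1] ** 2
print(ss_pred + ss_error, ss_total)        # partition: SSpred + SSer = SSy
print(ss_pred / ss_total, r2)              # both equal r^2
```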
Is the model significant?
   i.e. do we get a significantly better prediction of y from our
    regression equation than by just predicting the mean?

   F-statistic (after some complicated rearranging):

    F(dfŷ, dfer) = sŷ² / ser² = … = r² (n − 2) / (1 − r²)

   And, because F = t², it follows that:

    t(n−2) = r √(n − 2) / √(1 − r²)

    So all we need to know are r and n!
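
A sketch of the shortcut on invented data: t and its p value from r and n alone; scipy's pearsonr runs the same test, so the p values should agree:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 30)
y = 1.5 * x + rng.normal(0, 4, 30)
n = len(x)

r = np.corrcoef(x, y)[0, 1]
t = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)     # t(n-2) from r and n alone
p = 2 * stats.t.sf(abs(t), df=n - 2)             # two-tailed p value

r_sp, p_sp = stats.pearsonr(x, y)                # same test inside scipy
print((t, p), (r_sp, p_sp))                      # p values should agree
```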
General Linear Model
   Linear regression is actually a form of
    the General Linear Model where the
    parameters are a, the slope of the line,
    and b, the intercept.
    y = ax + b + ε
   A General Linear Model is just any
    model that describes the data in terms
    of a straight line
Multiple regression
   Multiple regression is used to determine the effect of a
    number of independent variables, x1, x2, x3 etc., on a
    single dependent variable, y
   The different x variables are combined in a linear way
    and each has its own regression coefficient:

    y = a1x1 + a2x2 + … + anxn + b + ε

   The a parameters reflect the independent contribution of
    each independent variable, x, to the value of the
    dependent variable, y.
   i.e. the amount of variance in y that is accounted for by
    each x variable after all the other x variables have been
    accounted for
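
A minimal sketch of multiple regression by ordinary least squares in numpy; the variables and coefficients here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x1, x2, x3 = rng.standard_normal((3, n))          # three independent variables
y = 2.0*x1 - 1.0*x2 + 0.5*x3 + 4.0 + rng.normal(0, 0.5, n)

X = np.column_stack([x1, x2, x3, np.ones(n)])     # design matrix with constant
coef, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares solution

a1, a2, a3, b = coef
print(a1, a2, a3, b)   # should recover roughly 2.0, -1.0, 0.5, 4.0
```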
SPM
   Linear regression is a GLM that models the effect of one
    independent variable, x, on ONE dependent variable, y

   Multiple Regression models the effect of several independent
    variables, x1, x2 etc, on ONE dependent variable, y

   Both are types of General Linear Model

   GLM can also allow you to analyse the effects of several
    independent x variables on several dependent variables, y1, y2, y3
    etc, in a linear combination

   This is what SPM does and will be explained soon…


Editor's notes

  1. We often want to know whether various variables are ‘linked’, i.e. correlated. This can be interesting in itself, but is also important if we want to predict one variable’s value given a value of the other.
  2. CORRELATION ≠ CAUSATION means that correlation cannot be validly used to infer a causal relationship between the variables; it should not be taken to mean that correlations cannot indicate causal relations. However, the causes underlying the correlation, if any, may be indirect and unknown. A correlation between age and height in children is fairly causally transparent, but a correlation between mood and health in people is less so. Does improved mood lead to improved health? Or does good health lead to good mood? Or does some other factor underlie both? Or is it pure coincidence? In other words, a correlation can be taken as evidence for a possible causal relationship, but cannot indicate what that causal relationship, if any, might be.
  3. Variance is just a definition. We square the deviations so that the result is positive whether dx is negative or positive, and so that positives and negatives do not cancel out when we sum them. Variance is spread around a mean; covariance is a measure of how much x and y change together. The two are very similar: we multiply two variables rather than squaring one.
  4. Problem with covariance: the value obtained depends on the size of the data’s standard deviations. If they are large, the value will be greater than if they are small, even if the relationship between x and y is exactly the same in the high and low standard deviation datasets.
  5. We can only compare covariances between different variable pairs to see which is greater.
  6. The distance of r from 0 indicates the strength of the correlation. r = 1 or r = −1 means that we can predict y from x and vice versa with certainty; all data points lie on a straight line, i.e. y = ax + b. The correlation is 1 in the case of an increasing linear relationship, −1 in the case of a decreasing linear relationship, and some value in between in all other cases, indicating the degree of linear dependence between the variables. The closer the coefficient is to −1 or 1, the stronger the correlation. The correlation coefficient detects only linear dependencies between two variables.
  7. Example for the outlier graph: with X = 1, 2, 3, 4, 5 and Y = 1, 2, 3, 4, 0, the means are x̄ = 3 and ȳ = 2; a single extreme value such as y = 5 would change the correlation substantially.
  8. The interpretation of a correlation coefficient depends on the context and purposes. A correlation of 0.9 may be very low if one is verifying a physical law using high-quality instruments, but may be regarded as very high in the social sciences, where there may be a greater contribution from complicating factors.
  9. So to understand the relationship between two variables, we want to draw the ‘best’ line through the cloud – to find the best fit. This is done using the principle of least squares.