UNIT - III
MOMENTS, SKEWNESS, AND
KURTOSIS
Moment Ratios
For population data:
  β1 = μ3² / μ2³ ,   β2 = μ4 / μ2²
For sample data:
  b1 = m3² / m2³ ,   b2 = m4 / m2²
NON-CENTRAL MOMENTS
CENTRAL MOMENTS
THEOREMS
SKEWNESS
Skewness
A distribution in which the values equidistant from the mean have equal
frequencies is called a symmetric distribution.
Any departure from symmetry is called skewness.
In a perfectly symmetric distribution, Mean = Median = Mode and the two
tails of the distribution are equal in length from the mean. These values are
pulled apart when the distribution departs from symmetry, and consequently
one tail becomes longer than the other.
If the right tail is longer than the left tail, the distribution is said to have
positive skewness. In this case, Mean > Median > Mode.
If the left tail is longer than the right tail, the distribution is said to have
negative skewness. In this case, Mean < Median < Mode.
KURTOSIS
Kurtosis
For a normal distribution, kurtosis is equal to 3.
When kurtosis is greater than 3, the curve is more sharply peaked and has narrower
tails than the normal curve and is said to be leptokurtic.
When it is less than 3, the curve has a flatter top and relatively wider tails
than the normal curve and is said to be platykurtic.
For population data:
  kurt = (1/n) Σ(i=1..n) [ (xi − μ) / σ ]⁴ = (1/n) Σ(i=1..n) zi⁴
For sample data:
  kurt = b2 = (1/n) Σ(i=1..n) [ (xi − x̄) / s ]⁴ = (1/n) Σ(i=1..n) zi⁴
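To make these moment-based measures concrete, here is a minimal Python sketch (the data and function names are illustrative, not from the slides) that computes central moments, the moment ratios b1 and b2, and hence the sample kurtosis.

```python
# Minimal sketch: central moments, moment ratios b1/b2, and kurtosis
# computed directly from the definitions above (names are illustrative).

def central_moment(data, r):
    """r-th central moment m_r = (1/n) * sum((x - mean)^r)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n

def moment_ratios(data):
    """Return (b1, b2) where b1 = m3^2 / m2^3 and b2 = m4 / m2^2."""
    m2 = central_moment(data, 2)
    m3 = central_moment(data, 3)
    m4 = central_moment(data, 4)
    return (m3 ** 2) / (m2 ** 3), m4 / (m2 ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative data
b1, b2 = moment_ratios(data)
print(b1, b2)                           # b2 near 3 suggests normal-like tails
```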
Curve Fitting and Correlation
This topic is concerned primarily with two
separate but closely interrelated processes:
(1) the fitting of experimental data to
mathematical forms that describe their
behavior and
(2) the correlation between different
experimental data to assess how closely
different variables are interdependent.
•The fitting of experimental data to a
mathematical equation is called regression.
Regression may be characterized by different
adjectives according to the mathematical form
being used for the fit and the number of
variables. For example, linear regression
involves using a straight-line or linear equation
for the fit. As another example, multiple
regression involves a function of more than one
independent variable.
Linear Regression
•Assume n points, with each point having values
of both an independent variable x and a
dependent variable y.
The values of x are x1, x2, x3, ...., xn.
The values of y are y1, y2, y3, ...., yn.
A best-fitting straight line equation
will have the form
  y = a1·x + a0
Preliminary Computations
  x̄       = (1/n) Σ(k=1..n) xk        sample mean of the x values
  ȳ       = (1/n) Σ(k=1..n) yk        sample mean of the y values
  mean(x²) = (1/n) Σ(k=1..n) xk²       sample mean-square of the x values
  mean(xy) = (1/n) Σ(k=1..n) xk·yk     sample mean of the product xy
Best-Fitting Straight Line
  a1 = [ mean(xy) − x̄·ȳ ] / [ mean(x²) − x̄² ]
  a0 = [ mean(x²)·ȳ − x̄·mean(xy) ] / [ mean(x²) − x̄² ]
Alternately, a0 = ȳ − a1·x̄
  y = a1·x + a0
Example-1. Find best fitting straight
line equation for the data shown
below.
x 0 1 2 3 4 5 6 7 8 9
y 4.00 6.10 8.30 9.90 12.40 14.30 15.70 17.40 19.80 22.30
  x̄ = (1/10) Σ(k=1..10) xk = (0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9)/10 = 45/10 = 4.50
  ȳ = (1/10) Σ(k=1..10) yk = (4 + 6.1 + 8.3 + 9.9 + 12.4 + 14.3 + 15.7 + 17.4 + 19.8 + 22.3)/10 = 130.2/10 = 13.02
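The slide stops after the two means; the short Python sketch below is one way to carry Example-1 through the remaining preliminary computations and the a1, a0 formulas above (variable names are illustrative).

```python
# Sketch: finish Example-1 by applying the best-fit formulas above.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [4.00, 6.10, 8.30, 9.90, 12.40, 14.30, 15.70, 17.40, 19.80, 22.30]
n = len(x)

mean_x  = sum(x) / n                                   # 4.50
mean_y  = sum(y) / n                                   # 13.02
mean_x2 = sum(xi * xi for xi in x) / n                 # mean-square of x
mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n     # mean of the product

a1 = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
a0 = mean_y - a1 * mean_x                              # alternate form a0 = ȳ - a1·x̄

print(f"y = {a1:.3f} x + {a0:.3f}")                    # roughly 1.97 and 4.15
```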
Multiple Linear Regression
  y = a0 + a1·x1 + a2·x2 + ..... + am·xm
Assume m independent variables x1, x2, ....., xm.
Assume a dependent variable y that is to be considered
as a linear function of the m independent variables.
Multiple Regression
(Continuation)
Assume that there are k values of each
of the m variables. For x1, we have
  x11, x12, x13, ....., x1k
Similar terms apply for all other variables.
For the m-th variable, we have
  xm1, xm2, xm3, ....., xmk
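Fitting such a model is normally done with a linear-algebra least-squares routine; the sketch below (NumPy, with made-up data for m = 2 predictors) is one common way to obtain a0, a1, ..., am.

```python
# Sketch: multiple linear regression y = a0 + a1*x1 + a2*x2 via least squares.
import numpy as np

# Illustrative data: k = 5 observations of m = 2 independent variables.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([6.1, 6.9, 12.2, 13.1, 17.0])

# Prepend a column of ones so the intercept a0 is estimated too.
A = np.column_stack([np.ones(len(y)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

a0, a1, a2 = coeffs
print(a0, a1, a2)
```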
Correlation
Cross-Correlation:
  corr(x, y) = E(xy) = mean(xy)
Covariance:
  cov(x, y) = E[(x − x̄)(y − ȳ)] = corr(x, y) − x̄·ȳ = mean(xy) − x̄·ȳ
Correlation Coefficient
  C(x, y) = cov(x, y) / √[ cov(x, x)·cov(y, y) ] = E[(x − x̄)(y − ȳ)] / (σx·σy)
Implications of Correlation Coefficient
• 1. If C(x, y) = 1, the two variables are totally
correlated in a positive sense.
• 2. If C(x, y) = -1 , the two variables are totally
correlated in a negative sense.
• 3. If C(x, y) = 0, the two variables are said to
be uncorrelated.
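As a quick illustration of the definitions above, here is a minimal Python sketch (illustrative data, standard library only) that computes cov(x, y) and the correlation coefficient C(x, y).

```python
# Sketch: covariance and correlation coefficient from the definitions above.
import math

def correlation_coefficient(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x  = sum((xi - mean_x) ** 2 for xi in x) / n      # cov(x, x)
    var_y  = sum((yi - mean_y) ** 2 for yi in y) / n      # cov(y, y)
    return cov_xy / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]          # illustrative, nearly linear in x
print(correlation_coefficient(x, y))   # close to +1: strong positive correlation
```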
Binomial Distribution and
Applications
Binomial Probability Distribution
A binomial random variable X is defined as the number
of “successes” in n independent trials, where
P(“success”) = p is constant.
Notation: X ~ BIN(n,p)
In the definition above, notice that the following conditions
need to be satisfied for a binomial experiment:
1. There is a fixed number of n trials carried out.
2. The outcome of a given trial is either a “success”
or “failure”.
3. The probability of success (p) remains constant
from trial to trial.
4. The trials are independent; the outcome of a trial is
not affected by the outcome of any other trial.
Binomial Distribution
• If X ~ BIN(n, p), then
  P(X = x) = C(n, x) p^x (1 − p)^(n − x) = [ n! / (x!(n − x)!) ] p^x (1 − p)^(n − x),   x = 0, 1, ..., n
• where
  C(n, x) = "n choose x" = the number of ways to obtain x "successes" in n trials
  n! = n(n − 1)(n − 2)···1, and also 0! = 1, 1! = 1
  P("success") = p
Binomial Distribution
• If X ~ BIN(n, p), then
• E.g. when n = 3 and p = .50 there are 8 possible
equally likely outcomes (e.g. flipping a coin)
SSS SSF SFS FSS SFF FSF FFS FFF
X=3 X=2 X=2 X=2 X=1 X=1 X=1 X=0
P(X=3)=1/8, P(X=2)=3/8, P(X=1)=3/8, P(X=0)=1/8
• Now let’s use binomial probability formula instead…
  P(X = x) = [ n! / (x!(n − x)!) ] p^x (1 − p)^(n − x),   x = 0, 1, ..., n
Binomial Distribution
• If X ~ BIN(n, p), then
• E.g. when n = 3, p = .50 find P(X = 2)
  P(X = x) = [ n! / (x!(n − x)!) ] p^x (1 − p)^(n − x),   x = 0, 1, ..., n
  C(3, 2) = 3! / (2!(3 − 2)!) = 3! / (2!·1!) = (3·2·1) / [(2·1)(1)] = 3 ways   (SSF, SFS, FSS)
  P(X = 2) = C(3, 2) (.5)² (.5)¹ = 3(.5)(.5)(.5) = .375 or 3/8
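The same calculation as a short Python sketch of the binomial pmf (math.comb is the standard-library binomial coefficient):

```python
# Sketch: binomial pmf P(X = x) for X ~ BIN(n, p).
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(2, 3, 0.5))                      # 0.375, i.e. 3/8
print([binom_pmf(x, 3, 0.5) for x in range(4)])  # [0.125, 0.375, 0.375, 0.125]
```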
The Poisson Distribution
The Poisson distribution is defined by:
  f(x) = μ^x · e^(−μ) / x!
Where
  f(x) is the probability of x occurrences in an interval
  μ is the expected value or mean number of occurrences within an interval
  e is the base of the natural logarithm, e = 2.71828
Properties of the Poisson Distribution
1. The probability of occurrences is the same for any
two intervals of equal length.
2. The occurrence or nonoccurrence of an event in one
interval is independent of the occurrence or
nonoccurrence of an event in any other interval.
Problem
Consider a Poisson probability distribution with an average
number of occurrences of two per period.
a. Write the appropriate Poisson distribution.
b. What is the average number of occurrences in three time periods?
c. Write the appropriate Poisson function to determine the probability
of x occurrences in three time periods.
d. Compute the probability of two occurrences in one time period.
e. Compute the probability of six occurrences in three time periods.
f. Compute the probability of five occurrences in two time periods.
Problem
(a) f(x) = 2^x · e^(−2) / x!
(b) Average number of occurrences in three time periods: μ = 3(2) = 6
(c) f(x) = 6^x · e^(−6) / x!
(d) f(2) = 2² · e^(−2) / 2! = .5413 / 2 = .27067
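A minimal Python sketch of the Poisson pmf used in parts (a)-(d) (the function name is illustrative):

```python
# Sketch: Poisson pmf f(x) = mu^x * e^(-mu) / x!
from math import exp, factorial

def poisson_pmf(x, mu):
    return mu**x * exp(-mu) / factorial(x)

print(poisson_pmf(2, 2))   # part (d): about 0.2707
print(poisson_pmf(6, 6))   # part (e) would use mu = 3*2 = 6 for three periods
```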
Hypergeometric Distribution
  f(x) = [ C(r, x) · C(N − r, n − x) ] / C(N, n)   for all 0 ≤ x ≤ r
Where
n = the number of trials.
N = number of elements in the population
r = number of elements in the population labeled a success
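A minimal Python sketch of this pmf, again using math.comb for the binomial coefficients (the example numbers are made up):

```python
# Sketch: hypergeometric pmf using the formula above.
from math import comb

def hypergeom_pmf(x, n, N, r):
    """P(x successes in n draws without replacement from N items, r of which are successes)."""
    return comb(r, x) * comb(N - r, n - x) / comb(N, n)

# Illustrative: population of N = 20 with r = 5 successes, draw n = 4, want x = 2.
print(hypergeom_pmf(2, n=4, N=20, r=5))
```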
The Chi-Square Test
Parametric and Nonparametric Tests
This section introduces two non-parametric hypothesis
tests using the chi-square statistic: the chi-square
test for goodness of fit and the chi-square test for
independence.
Parametric and Nonparametric Tests
(cont.)
• The term "non-parametric" refers to the fact that the
chi-square tests do not require assumptions about
population parameters nor do they test hypotheses
about population parameters.
• Previous examples of hypothesis tests, such as the t
tests and analysis of variance, are parametric tests
and they do include assumptions about parameters
and hypotheses about parameters.
Parametric and Nonparametric Tests
(cont.)
• The most obvious difference between the
chi-square tests and the other hypothesis
tests we have considered (t and ANOVA) is the
nature of the data.
• For chi-square, the data are frequencies rather
than numerical scores.
The Chi-Square Test for Goodness-of-Fit
• The chi-square test for goodness-of-fit uses
frequency data from a sample to test hypotheses
about the shape or proportions of a population.
• Each individual in the sample is classified into one
category on the scale of measurement.
• The data, called observed frequencies, simply count
how many individuals from the sample are in each
category.
The Chi-Square Test for Goodness-of-Fit
(cont.)
• The null hypothesis specifies the proportion of
the population that should be in each
category.
• The proportions from the null hypothesis are
used to compute expected frequencies that
describe how the sample would appear if it
were in perfect agreement with the null
hypothesis.
The Chi-Square Test for Independence
• The second chi-square test, the chi-square
test for independence, can be used and
interpreted in two different ways:
1. Testing hypotheses about the
relationship between two variables in a
population, or
2. Testing hypotheses about
differences between proportions for two
or more populations.
The Chi-Square Test for Independence
(cont.)
• Although the two versions of the test for
independence appear to be different, they are
equivalent and they are interchangeable.
• The first version of the test emphasizes the
relationship between chi-square and a
correlation, because both procedures examine
the relationship between two variables.
The Chi-Square Test for Independence
(cont.)
• The second version of the test emphasizes the
relationship between chi-square and an
independent-measures t test (or ANOVA)
because both tests use data from two (or
more) samples to test hypotheses about the
difference between two (or more)
populations.
The Chi-Square Test for Independence
(cont.)
• The first version of the chi-square test for
independence views the data as one sample in
which each individual is classified on two
different variables.
• The data are usually presented in a matrix
with the categories for one variable defining
the rows and the categories of the second
variable defining the columns.
The Chi-Square Test for Independence
(cont.)
• The data, called observed frequencies, simply
show how many individuals from the sample
are in each cell of the matrix.
• The null hypothesis for this test states that
there is no relationship between the two
variables; that is, the two variables are
independent.
The Chi-Square Test for Independence
(cont.)
• The second version of the test for independence
views the data as two (or more) separate samples
representing the different populations being
compared.
• The same variable is measured for each sample by
classifying individual subjects into categories of the
variable.
• The data are presented in a matrix with the different
samples defining the rows and the categories of the
variable defining the columns.
The Chi-Square Test for Independence
(cont.)
• The data, again called observed frequencies,
show how many individuals are in each cell of
the matrix.
• The null hypothesis for this test states that the
proportions (the distribution across
categories) are the same for all of the
populations.
The Chi-Square Test for Independence
(cont.)
• Both chi-square tests use the same statistic. The
calculation of the chi-square statistic requires two
steps:
1. The null hypothesis is used to construct an idealized
sample distribution of expected frequencies that
describes how the sample would look if the data
were in perfect agreement with the null hypothesis.
The Chi-Square Test for Independence
(cont.)
For the goodness of fit test, the expected frequency for each
category is obtained by
expected frequency = fe = pn
(p is the proportion from the null hypothesis and n is the size
of the sample)
For the test for independence, the expected frequency for each
cell in the matrix is obtained by
expected frequency = fe = (row total)(column total) / n
The Chi-Square Test for Independence
(cont.)
2. A chi-square statistic is computed to measure the
amount of discrepancy between the ideal sample
(expected frequencies from H0) and the actual
sample data (the observed frequencies = fo).
A large discrepancy results in a large value for chi-
square and indicates that the data do not fit the null
hypothesis and the hypothesis should be rejected.
The Chi-Square Test for Independence
(cont.)
The calculation of chi-square is the same for all chi-
square tests:
chi-square = χ2 = Σ (fo – fe)2 / fe
The fact that chi-square tests do not require scores
from an interval or ratio scale makes these tests a
valuable alternative to the t tests, ANOVA, or
correlation, because they can be used with data
measured on a nominal or an ordinal scale.
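To make the two-step calculation concrete, here is a hedged Python sketch of the chi-square statistic for a test of independence on a made-up 2x2 table of observed frequencies; the result would then be compared with a critical value from a chi-square table with df = (rows − 1)(columns − 1).

```python
# Sketch: chi-square statistic for a test of independence on a 2x2 table.
observed = [[30, 20],   # rows: categories of variable 1 (illustrative counts)
            [10, 40]]   # columns: categories of variable 2

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n        # expected frequency
        chi_square += (fo - fe) ** 2 / fe             # sum of (fo - fe)^2 / fe

print(chi_square)       # compare with the critical value for df = (2-1)(2-1) = 1
```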
Measuring Effect Size for the Chi-Square
Test for Independence
• When both variables in the chi-square test for
independence consist of exactly two
categories (the data form a 2x2 matrix), it is
possible to re-code the categories as 0 and 1
for each variable and then compute a
correlation known as a phi-coefficient that
measures the strength of the relationship.
Measuring Effect Size for the Chi-Square Test
for Independence (cont.)
• The value of the phi-coefficient, or the
squared value which is equivalent to an r2, is
used to measure the effect size.
• When there are more than two categories for
one (or both) of the variables, then you can
measure effect size using a modified version
of the phi-coefficient known as Cramér's V.
• The value of V is evaluated much the same as
a correlation.
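As an illustration (not part of the original slides), here is a minimal sketch of the phi-coefficient for a 2x2 table, computed in the standard form phi = √(χ²/n), reusing the hypothetical table from the chi-square sketch above.

```python
# Sketch: phi-coefficient for a 2x2 table, phi = sqrt(chi_square / n).
from math import sqrt

def phi_coefficient(observed):
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi_square = sum(
        (fo - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed) for j, fo in enumerate(row)
    )
    return sqrt(chi_square / n)

print(phi_coefficient([[30, 20], [10, 40]]))   # strength of the 2x2 relationship
```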
The t-test
Inferences about Population Means
Questions
• What is the main use of the t-test?
• How is the distribution of t related to the unit
normal?
• When would we use a t-test instead of a z-test? Why
might we prefer one to the other?
• What are the chief varieties or forms of the t-test?
• What is the standard error of the difference between
means? What are the factors that influence its size?
Background
• The t-test is used to test hypotheses about
means when the population variance is
unknown (the usual case). Closely related to
z, the unit normal.
• Developed by Gosset for the quality control
of beer.
• Comes in 3 varieties:
• Single sample, independent samples, and
dependent samples.
What kind of t is it?
• Single sample t – we have only 1 group; want to test
against a hypothetical mean.
• Independent samples t – we have 2 means, 2 groups;
no relation between groups, e.g., people randomly
assigned to a single group.
• Dependent t – we have two means. Either same
people in both groups, or people are related, e.g.,
husband-wife, left hand-right hand, hospital patient
and visitor.
Single-sample z test
• For large samples (N>100) can use z to test
hypotheses about means.
• z = (X̄ − μ) / est.σM, where est.σM = sX / √N and sX = √[ Σ(X − X̄)² / (N − 1) ]
• Suppose H0: μ = 10; H1: μ ≠ 10; sX = 5; N = 200
• Then est.σM = sX / √N = 5 / √200 = 5 / 14.14 = .35
• If X̄ = 11, then z = (11 − 10) / .35 = 2.83; 2.83 > 1.96, p < .05
The t Distribution
We use t when the population variance is unknown (the usual case) and
sample size is small (N<100, the usual case). If you use a stat package for
testing hypotheses about means, you will use t.
The t distribution is a short, fat relative of the normal. The shape of t depends
on its df. As N becomes infinitely large, t becomes normal.
Degrees of Freedom
For the t distribution, degrees of freedom are always a simple function of the
sample size, e.g., (N-1).
One way of explaining df is that if we know the total or mean, and all but one
score, the last score is not free to vary; it is fixed by the other scores.
For example, if 4+3+2+X = 10, then X = 1.
Single-sample t-test
With a small sample size, we compute the same numbers as we did for z,
but we compare them to the t distribution instead of the z distribution.
  H0: μ = 10; H1: μ ≠ 10; sX = 5; N = 25
  est.σM = sX / √N = 5 / √25 = 1
  t = (X̄ − μ) / est.σM = (11 − 10) / 1 = 1
  t(.05, 24) = 2.064;  1 < 2.064, n.s.
  Interval = X̄ ± t·est.σM = 11 ± 2.064(1) = [8.936, 13.064]
  Interval is about 9 to 13 and contains 10, so n.s. (c.f. z = 1.96)
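A Python sketch of this single-sample t-test from the summary statistics on the slide (a stat package routine such as scipy.stats.ttest_1samp would give the same t from raw scores):

```python
# Sketch: single-sample t from summary statistics (values from the slide example).
from math import sqrt

mu0, xbar, s, N = 10, 11, 5, 25
se = s / sqrt(N)                 # est. standard error of the mean = 1
t = (xbar - mu0) / se            # t = 1
df = N - 1                       # 24
t_crit = 2.064                   # t(.05, 24) from a t table

print(t, t < t_crit)                              # 1.0, True -> n.s.
print(xbar - t_crit * se, xbar + t_crit * se)     # interval of about 8.94 to 13.06
```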
Difference Between Means (1)
• Most studies have at least 2 groups (e.g., M
vs. F, Exp vs. Control)
• If we want to know diff in population means,
best guess is diff in sample means.
• Unbiased: E(ȳ1 − ȳ2) = E(ȳ1) − E(ȳ2) = μ1 − μ2
• Variance of the Difference: var(ȳ1 − ȳ2) = σ²M1 + σ²M2
• Standard Error: σdiff = √(σ²M1 + σ²M2)
Difference Between Means (2)
• We can estimate the standard error of the
difference between means.
• For large samples, can use z
  est.σdiff = √(est.σ²M1 + est.σ²M2)
  zdiff = [(X̄1 − X̄2) − (μ1 − μ2)] / est.σdiff
Example: H0: μ1 − μ2 = 0; H1: μ1 − μ2 ≠ 0
  X̄1 = 10; N1 = 100; SD1 = 2
  X̄2 = 12; N2 = 100; SD2 = 3
  est.σdiff = √(4/100 + 9/100) = √(13/100) = .36
  zdiff = [(10 − 12) − 0] / .36 = −5.56; p < .05
Independent Samples t (1)
• Looks just like z:
    tdiff = [(ȳ1 − ȳ2) − (μ1 − μ2)] / est.σdiff
• df = N1 − 1 + N2 − 1 = N1 + N2 − 2
• If SDs are equal, estimate is:
    σ²diff = σ²(1/N1 + 1/N2) = σ²(N1 + N2) / (N1·N2)
Pooled variance estimate is the weighted average:
    σ̂² = [(N1 − 1)s1² + (N2 − 1)s2²] / (N1 + N2 − 2)
Pooled Standard Error of the Difference (computed):
    est.σdiff = √{ [(N1 − 1)s1² + (N2 − 1)s2²] / (N1 + N2 − 2) · (N1 + N2) / (N1·N2) }
Independent Samples t (2)
  H0: μ1 − μ2 = 0; H1: μ1 − μ2 ≠ 0
  ȳ1 = 18; s1² = 7; N1 = 5
  ȳ2 = 20; s2² = 5.83; N2 = 7
  est.σdiff = √{ [4(7) + 6(5.83)] / (5 + 7 − 2) · 12/35 } = 1.47
  tdiff = [(18 − 20) − 0] / 1.47 = −2 / 1.47 = −1.36; n.s.
  tcrit = t(.05, 10) = 2.23
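The same two-sample calculation as a Python sketch from the summary statistics above (scipy.stats.ttest_ind is the library equivalent when raw scores are available):

```python
# Sketch: independent-samples t from summary statistics (slide example values).
from math import sqrt

y1, var1, n1 = 18, 7.0, 5
y2, var2, n2 = 20, 5.83, 7

pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_diff = sqrt(pooled_var * (n1 + n2) / (n1 * n2))      # about 1.47
t = (y1 - y2 - 0) / se_diff                             # about -1.36
df = n1 + n2 - 2                                        # 10
t_crit = 2.23                                           # t(.05, 10) from a table

print(t, abs(t) < t_crit)   # True -> not significant
```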
Dependent t (1)
Observations come in pairs. Brother, sister, repeated measure.
  σ²diff = σ²M1 + σ²M2 − 2·cov(ȳ1, ȳ2)
Problem solved by finding diffs between pairs Di = yi1 − yi2.
  D̄ = Σ Di / N
  sD = √[ Σ(Di − D̄)² / (N − 1) ]
  est.σMD = sD / √N
  t = (D̄ − E(D)) / est.σMD,   df = N(pairs) − 1
Dependent t (2)
Brother   Sister   Diff   (D − D̄)²
  5          7       2        1
  7          8       1        0
  3          3       0        1
ȳ = 5      ȳ = 6    D̄ = 1    Σ = 2

  sD = √[ Σ(D − D̄)² / (N − 1) ] = √(2/2) = 1
  est.σMD = sD / √N = 1 / √3 = .58
  t = (D̄ − E(D)) / est.σMD = (1 − 0) / .58 = 1.72
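The paired calculation as a Python sketch, using the brother/sister data from the table above (scipy.stats.ttest_rel is the library equivalent when raw scores are available):

```python
# Sketch: dependent (paired) t-test on the brother/sister data above.
from math import sqrt

brother = [5, 7, 3]
sister  = [7, 8, 3]

diffs = [s - b for b, s in zip(brother, sister)]             # [2, 1, 0]
n = len(diffs)
d_bar = sum(diffs) / n                                       # 1
s_d = sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))   # 1
se_d = s_d / sqrt(n)                                         # about .58
t = (d_bar - 0) / se_d                                       # about 1.72
df = n - 1                                                   # N(pairs) - 1 = 2

print(t, df)
```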