2. What we have Learned!
1. Inferential statistics helps us to
determine if what we have observed in a
sample, represents a similar phenomenon
in the population.
2. The assumption is that our sample is
quite similar to the population being
studied and we operate under the premise
that we have obtained a normal
distribution in our sample when looking
at scores of any type.
3. More…
3. Normal distributions have standard
deviations and are symmetrical.
4. There is a probability that we may not
have a normal distribution from our
sample (error), therefore we cannot make
inferences if that’s the case.
4. What do we do?
A. We try to make our distributions as normal
as possible so that they represent the one in the
population.
B. When this does not happen, we rectify the
problem by utilizing other statistics that are
associated with central tendency measures.
Solutions:
z- z scores
t- family of t scores
5. What are z and t scores?
A z score is used to determine where one
particular score stands in relation to the rest of the
scores in a distribution. Central tendency
measures give us parameters but not the
distance of each score from the mean.
However, using the mean and the standard
deviation allows us to calculate the z score.
z scores also help us to compare two individual
scores on two different variables
(e.g. a Math test score and a Spelling score).
6. The Answer is: Standardization
z scores help us to standardize scores in order
to compare individual scores on different
variables.
Scores on different tests use different scales, which
does not permit us to compare scores from
different distributions unless we standardize
them.
Standardization is the process of converting each
individual score in a distribution to a z score,
thus telling you how far from the mean a
given score is.
7. Example
Student X has taken two exams, one in Biology
and the other in Statistics. Here are the scores
from the total number of answers in each exam.
• Biology 65 out of 100 items
• Statistics 42 out of 200 items
• Question?- In which test did student X do
better?
• What do you mean by better?
8. Let’s look at each of these distributions:
             Score   Mean   SD
Biology      65      60     10
Statistics   42      37     5
So in what test did student X perform
better?
9. What did you mean by better?
1. If I am asking the percentage of correct
answers then my obvious answer is
__________.
2. But wait a minute that is not fair, the
Statistics exam was more difficult!
3. What is the dilemma?
10. Your answer should be:
A) How did Student X do in comparison to
other students?
B) We could answer this by looking at the
mean and standard deviation in each of
the two exams. They are different
distributions.
11. Conclusions
1. Can we compare two scores if the scales are
different?
A) Depending on your answer what is the next
step?
B) Think before you answer!
12. Standardization
You cannot compare two different scores
when the scales are different.
We need the same scale (standardization).
When we take raw scores from a test we can
convert them into standard deviation units
through the use of z scores.
Formula: $z = \dfrac{\text{raw score} - \text{mean}}{\text{standard deviation}}$
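As a rough sketch, the formula can be applied to Student X's two exam scores from the earlier table. The function name `z_score` is only an illustrative choice and does not come from the slides.

```python
def z_score(raw_score, mean, sd):
    """Return how many standard deviations raw_score lies from the mean."""
    return (raw_score - mean) / sd


# Values taken from the Biology/Statistics table in the example.
z_biology = z_score(65, 60, 10)
z_statistics = z_score(42, 37, 5)

print(z_biology)     # 0.5
print(z_statistics)  # 1.0
```

Relative to the other students, the Statistics score sits further above its mean than the Biology score does, even though its raw percentage is lower.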
13. Let’s pretend that
Student X took a spelling test and
received a z score of 1.0. What can you
tell from this score?
1) __________________________
2) __________________________
3) __________________________
14. Now that you know how to find a z score, let’s consider the following:
When a distribution of scores is standardized,
the average (mean) of the distribution is 0 and
the standard deviation is 1.0.
What does a z score tell us if z = -1.5?
What does a z score tell us if z = .29?
It can also tell us:
A) Whether an individual does better or worse than the
average person.
B) How much a score is above or below the average.
C) Whether the score is better or worse than the rest of the
scores.
15. The z score formula depends on whether you are observing a population or a sample
A normal distribution that is standardized (so that it has
a mean of 0 and an SD of 1) is called the standard normal
distribution, or the normal distribution of z-scores. If we
know the mean μ ("mu") and standard deviation σ
("sigma") of a set of scores which are normally
distributed, we can standardize each "raw" score, x, by
converting it into a z score by using the following
formula on each individual score:
$z = \dfrac{x - \mu}{\sigma}$
For a sample, the formula is
$z = \dfrac{x - \bar{x}}{s}$
where x-bar and s are used as estimators for the
population's true mean and standard deviation. Both
formulas essentially calculate the same thing: how many
standard deviations a score lies from the mean.
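The population and sample versions of the z score can be compared in a small sketch. The scores below are hypothetical, invented only for illustration; the only difference between the two versions is which standard deviation is used (`pstdev` divides by n, `stdev` divides by n - 1).

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical scores, not from the slides

# Population version: treat the scores as the whole population (mu, sigma).
mu = statistics.fmean(scores)
sigma = statistics.pstdev(scores)
z_population = [(x - mu) / sigma for x in scores]

# Sample version: x_bar and s estimate the population mean and SD.
x_bar = statistics.fmean(scores)
s = statistics.stdev(scores)
z_sample = [(x - x_bar) / s for x in scores]

for x, zp, zs in zip(scores, z_population, z_sample):
    print(x, round(zp, 3), round(zs, 3))
```

Because s is slightly larger than σ for the same data, the sample z scores come out slightly closer to 0.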
16. What is it from this score that I do not know?
1) I don’t know if the student did better
or worse than the average score.
2) I don’t know how much the score is
below or above the mean.
3) I don’t know how much better or
worse this score is in comparison to the rest
of the scores in the
distribution of scores from that given
Spelling test.
17. Suppose I told you the following:
The average score on that Spelling test was 12
and the total number of items on the test was 50.
The test taker is 7 years old!
Don’t despair, statisticians have already
figured this out and can predict the percentage of
scores that will fall between the mean and a z
score!
18. z scores can provide you with:
1) A way to determine percentile scores.
2) A mean that is equal to 0.
3) From this point we can determine that
50% of scores fall on either side of the
mean.
Can you explain why?
19. Figuring Out Percentiles with z scores
Step 1
The average SAT score for a white male is 517.
Suppose I want to know what score marks the
90th percentile?
Step 1- Use the z score table and find the value
closest to the 90th percentile. The closest is
.8997. The z score is 1.28 (intersection).
So a z score of 1.28 corresponds to the 90th
percentile.
What would be a z score that represents the 75th
percentile?
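The table lookup can be cross-checked in code. This is only a sketch that swaps the printed z table for Python's `statistics.NormalDist`, whose `inv_cdf` method returns the z score below which a given proportion of the standard normal distribution falls.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# z score that marks the 90th percentile.
z_90 = standard_normal.inv_cdf(0.90)
print(round(z_90, 2))  # 1.28
```

Changing 0.90 to another proportion gives the z score for a different percentile.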
20. Step 2 Convert z score to a raw score
We know what z score represents the 90th
percentile and we know the mean is 517. But
we do not know the raw score that marks the
90th percentile.
X values can be changed into z-scores just as
z-scores can be changed into X values.
Step 2- Convert the z score into the original unit
of measurement. We use this formula:
21. z-scores (cont.)
The formula for changing z-scores into X values
is
X = μ + (z)(σ)
X = 517 + (1.28)(100)
X = 517 + 128
X = 645
X – μ is the deviation score; dividing it by σ gives the z score.
22. Answer using this formula is
X = 517 + (1.28)(100)
X = 517 + 128
X = 645
The score that marks the 90th percentile for
white males who took the SAT in
2008 is 645.
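The conversion from a z score back to a raw score can be sketched as follows, using the mean of 517 and standard deviation of 100 from the worked example. The helper name `raw_from_z` is a hypothetical choice for illustration.

```python
def raw_from_z(z, mean, sd):
    """Convert a z score back into the original unit: X = mean + z * sd."""
    return mean + z * sd


# 90th percentile: z = 1.28, mean = 517, SD = 100 (values from the example).
x_90 = raw_from_z(1.28, 517, 100)
print(round(x_90))  # 645
```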
23. We can also use a z score to convert a known raw score into a percentile score
Suppose student X in my SAT distribution has a score
of 425 on the SAT Math test, and I want to
know how many students scored above or
below this score. Then:
Step 1- Convert the raw score into a z score
24. Step 1 Convert Raw Score to a z score
$z = \dfrac{425 - 517}{100} = \dfrac{-92}{100} = -.92$
25. Step 2 Use Appendix A
Find the z score that is equivalent to .92 in the
left column, moving vertically. z scores are not
reported as negative in the table because a
normal distribution is symmetrical, so the
proportion falling below -.92 is the same as the
proportion falling above .92.
So what does the z score tell me? It tells me,
using the table, that 82.12% of scores fall
below a z score of .92. It also tells me that
17.88% of the distribution will fall beyond a z
value of .92.
26. Step 3
A z score of -.92 corresponds to a raw
score of 425 on the SAT Math exam. A
score of 425 on this test marks the 17.88th
percentile among the distribution of white
males taking the exam in 2008.
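The three steps can be reproduced in a short sketch, assuming the scores are normally distributed with mean 517 and standard deviation 100 as in the example. `statistics.NormalDist` stands in here for the Appendix A table.

```python
from statistics import NormalDist

# Assumed distribution of scores from the example: mean 517, SD 100.
sat = NormalDist(mu=517, sigma=100)

# Step 1: convert the raw score to a z score.
z = (425 - sat.mean) / sat.stdev
print(round(z, 2))  # -0.92

# Steps 2 and 3: proportion of the distribution falling below 425.
below = sat.cdf(425)
print(round(below * 100, 2))  # 17.88
```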
27. z-scores (cont.)
We are able to transform every raw score in our
distribution into a distribution of z-scores
This new distribution of z-scores will have 3 main
properties
1. It will have the same shape as the distribution of X
values (if the X distribution was normal, the z
distribution will be normal)
2. It will always have a mean of zero
3. It will always have a standard deviation of one
(example on problem
This z-score distribution is called a standardized
distribution (being standardized now enables us to
compare distributions that we weren’t able to compare
before)
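The second and third properties can be checked numerically. The raw scores below are hypothetical and are treated as a complete population, which is why `pstdev` is used; this is a sketch, not a proof.

```python
import statistics

raw_scores = [417, 425, 517, 567, 617]  # hypothetical raw scores

mu = statistics.fmean(raw_scores)
sigma = statistics.pstdev(raw_scores)

# Transform every raw score into a z score.
z_scores = [(x - mu) / sigma for x in raw_scores]

z_mean = statistics.fmean(z_scores)
z_sd = statistics.pstdev(z_scores)

# Up to floating-point rounding, the mean is 0 and the SD is 1.
print(abs(z_mean) < 1e-9, abs(z_sd - 1) < 1e-9)
```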
28. z scores can also determine the proportion of scores that fall between two scores
Suppose that John received a score of 417
on the SAT exam. His cousin Mark
received a score of 567. The Joes family
is always quarreling as to whose son is
the brightest. Mark gets smart and says to
John, “I blew you away, there must be 50%
of the students that took this test between
you and me.”
John is upset and wants to show his cousin
he is wrong! What must he do?
29. Comparing Raw Scores using z scores
The formula for changing X values into z-scores
is
$z = \dfrac{X - \mu}{\sigma}$
Step 1- Convert both raw scores to z scores
John: $z = \dfrac{417 - 517}{100} = \dfrac{-100}{100} = -1.00$
Mark: $z = \dfrac{567 - 517}{100} = \dfrac{50}{100} = .50$
31. Step 2 Using Appendix A
Find the z scores that correspond to -1.00
and .50. Appendix A tells us that .8413 of
the distribution falls below a z value of
1.00. Remember that the mean splits
the distribution 50/50, so 50% of
scores will fall below the mean. .8413 - .50 = .3413;
this tells us that 34.13% of the normal
distribution will fall between the
mean and a z score of 1.00. By symmetry, the same
is true for a z score of -1.00.
32. Step 2 Continued…
Using the same process, we know that
69.15% of the distribution
falls below a z score of .50. Thus 19.15% of
the scores fall between the mean and a z
score of .50.
Recall that one z score is positive and the
other is negative. So if we add both
areas we find the total proportion of
scores between these two scores, and the
answer is .3413 + .1915 = .5328, or 53.28%. John
must accept defeat!
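The proportion between the two cousins' scores can be verified with a minimal sketch, again assuming a normal distribution with mean 517 and standard deviation 100, and using `statistics.NormalDist` in place of Appendix A.

```python
from statistics import NormalDist

# Assumed distribution of scores from the example: mean 517, SD 100.
sat = NormalDist(mu=517, sigma=100)

# Proportion below Mark's score minus proportion below John's score.
between = sat.cdf(567) - sat.cdf(417)
print(round(between * 100, 2))  # 53.28
```

Subtracting the two cumulative proportions gives the same area as adding the two mean-to-z areas from the table.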
33. You are on your own.
Mark has another cousin, Martin, who
scored 617 on the Math test.
1. Determine the proportion of the
population that scored between 517 and
617.
36. Scatter Plots
We can prepare a scatter plot by placing one point for
each pair of two variables that represent an
observation in the data set. The scatter plot provides
a picture of the data including the following:
1. Range of each variable;
2. Pattern of values over the range;
3. A suggestion as to a possible relationship between
the two variables;
4. Indication of outliers (extreme points).
37. Here are the types of scatter plots you are likely to see:
This could show how the distance
travelled in a vehicle increases as time
increases, if the vehicle maintains a
constant speed.
This could show the increase in a
student's height as their grade
38. Scattergrams or Scatter Plots
Scatter plots are used by researchers to
look for correlations. A correlation is a
relationship between the data, which can
suggest that one event may affect another
event. For example, you might want to
discover whether more hours of studying
will affect your Math mark in school. Perhaps
a scientist wants to find out if the distance
people live from a major city affects their
health.
39. X and Y Axis
In order to use scatter plots in this way, you
must have two sets of numerical data. One set is
plotted on the x-axis of a graph, and the other
set is plotted on the y-axis. The resulting scatter
plot will often show at a glance whether a
relationship exists between the two sets of data.
40. Example
Relationship between hours
studying and test score
Here's an example.
Suppose you want to find
out whether more hours
spent studying will have an
effect on a person's mark.
You set up an experiment
with some people,
recording how many hours
they spent studying and
then recording what
happened to their mark.
A correlation is a relationship
between the data, which can suggest
that one event may affect another
event.
41. Seeing Patterns to determine Relationship
You can see the data in the table at the
right.
It's difficult to see any pattern in the table,
although it's clear that different things
happened to different people. One person
studied for 1 hour and had their mark go
up 2%, while another person who also
studied for 1 hour saw a drop of 1%!
42. Line of Best Fit
Here is the graph again. We've shown a line that
seems to describe the direction the points are
heading in. This is called the line of best fit.
47. Covariance
The covariance is a measure of the linear relationship
between two variables. A positive value indicates a
direct or increasing linear relationship and a negative
value indicates a decreasing linear relationship. The
covariance calculation is defined by the equation
$$\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$
where $x_i$ and $y_i$ are the observed values, $\bar{X}$ and $\bar{Y}$ are the sample
means, and $n$ is the sample size.
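A minimal sketch of the sample covariance calculation follows. The hours and marks are hypothetical numbers made up for illustration, and `sample_covariance` is not a name from the slides.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (n - 1)


hours = [1, 2, 3, 4, 5]  # hypothetical hours studied
marks = [2, 4, 5, 4, 5]  # hypothetical change in mark

cov = sample_covariance(hours, marks)
print(cov)  # 1.5
```

The positive result matches the description above: larger values of one variable tend to go with larger values of the other.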
48. Covariance
Scatter Plots of Idealized Positive and Negative Covariance
[Figure 3.5a: Positive Covariance, y plotted against x]
[Figure 3.5b: Negative Covariance, y plotted against x]
50. Correlation Coefficient
1. The correlation ranges from –1 to +1, where:
• $r_{xy}$ = +1 indicates a perfect positive linear relationship – the X and
Y points would plot an increasing straight line.
• $r_{xy}$ = 0 indicates no linear relationship between X and Y.
• $r_{xy}$ = -1 indicates a perfect negative linear relationship – the X and
Y points would plot a decreasing straight line.
2. Positive correlations indicate positive or increasing linear
relationships with values closer to +1 indicating data points
closer to a straight line and closer to 0 indicating greater
deviations from a straight line.
3. Negative correlations indicate decreasing linear relationships
with values closer to –1 indicating points closer to a straight
line and closer to 0 indicating greater deviations from a
straight line.
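The slides do not spell out how the correlation coefficient is computed. The sketch below assumes the standard Pearson formula, the covariance divided by the product of the two standard deviations, and checks the two extreme cases on hypothetical data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: s_xy / (s_x * s_y); the n - 1 terms cancel."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]                      # hypothetical X values
r_up = pearson_r(x, [2, 4, 6, 8])     # points on an increasing line
r_down = pearson_r(x, [8, 6, 4, 2])   # points on a decreasing line

print(r_up, r_down)
```

Points lying exactly on an increasing line give a value of +1, and points on a decreasing line give -1.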
51. Scatter Plots and Correlation (Figure 3.6)
[Scatter plot of Y against X: (a) r = .8]
52. Scatter Plots and Correlation (Figure 3.6)
[Scatter plot of Y against X: (b) r = -.8]
53. Scatter Plots and Correlation (Figure 3.6)
[Scatter plot of Y against X: (c) r = 0]
54. Linear Relationships
Linear relationships can be represented by the basic
equation
$$Y = \beta_0 + \beta_1 X$$
where Y is the dependent or endogenous variable that is a
function of X, the independent or exogenous variable. The
model contains two parameters, β0 and β1, that are defined
as model coefficients. The coefficient β0 is the intercept
on the Y-axis and the coefficient β1 is the change in Y for
every unit change in X.
55. Linear Relationships (continued)
The nominal assumption made in linear applications is
that different values of X can be set and there will be a
corresponding mean value of Y that results because of the
underlying linear process being studied. The linear
equation model computes the mean of Y for every value
of X. This idea is the basis for Pearson’s Product
Moment Coefficient in obtaining and partitioning the
relationship between variables and how much effect they
have with one another. In educational issues, there is no
56. Least Squares Regression
Least Squares Regression is a technique
used to obtain estimates (i.e. numerical
values) for the linear coefficients β0 and
β1. These estimates are usually defined
as b0 and b1 respectively.
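The slides do not give the estimating formulas. The sketch below assumes the usual least squares results, b1 = s_xy / s_x² and b0 = ȳ − b1·x̄, and applies them to hypothetical study data.

```python
def fit_line(xs, ys):
    """Return least squares estimates (b0, b1) for Y = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must take at least two different values")
    b1 = sxy / sxx           # slope: change in Y per unit change in X
    b0 = y_bar - b1 * x_bar  # intercept on the Y-axis
    return b0, b1


hours = [1, 2, 3, 4, 5]  # hypothetical hours studied
marks = [2, 4, 5, 4, 5]  # hypothetical change in mark

b0, b1 = fit_line(hours, marks)
print(round(b0, 2), round(b1, 2))  # 2.2 0.6
```

The fitted values b0 + b1·X trace the line of best fit through the scatter plot.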
57. Cross Tables
Cross Tables present the number of observations
that are defined by the joint occurrence of specific
intervals for two variables. The combination of
all possible intervals for the two variables defines
the cells in a table.
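One way to build such a table is to count how often each pair of intervals occurs. This is a small sketch with hypothetical observations and interval labels chosen only for illustration.

```python
from collections import Counter

# Hypothetical observations: (hours-studied interval, mark interval).
observations = [
    ("0-2 h", "below 60"),
    ("0-2 h", "below 60"),
    ("0-2 h", "60 or above"),
    ("3-5 h", "60 or above"),
    ("3-5 h", "60 or above"),
    ("3-5 h", "below 60"),
]

# Each cell counts the joint occurrence of one interval from each variable.
cells = Counter(observations)

rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})

print("", cols)
for r in rows:
    print(r, [cells[(r, c)] for c in cols])
```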
58. Key Words
Least Squares Estimation Procedure
Least Squares Regression
Sample Correlation Coefficient
Sample Covariance
Scatter Plot
59. References for Additional Help
http://www.worsleyschool.net/science/files/scat
http://www.oswego.edu/~srp/stats/z.htm