Lecture on Introduction to Descriptive Statistics - Part 1 and Part 2. These slides were presented during a lecture at the Colombo Institute of Research and Psychology.
2. Overview of Intro to Descriptive Statistics I
This lecture will cover the following topics:
• Definition and Types of Descriptive Statistics
• Mean, Median, Mode and Range
• Skewness and Kurtosis
• Normality Curve
• Variance and Standard Deviation
• Quartiles
• Percentiles
• Using Excel for Descriptive Statistics
3. Defining Descriptive Statistics
Descriptive statistics is the analysis of data that helps describe,
show or summarize data in a meaningful way so that, for example,
patterns might emerge from the data.
Descriptive statistics do not, however, allow us to draw conclusions
beyond the data we have analyzed or to reach conclusions regarding
any hypotheses we might have made.
Descriptive vs. Inferential:
Descriptive statistics are used to describe our samples and
inferential statistics are used to generalize from our samples to
the wider population.
4. Types of Descriptive Statistics
1. Measures of central tendency:
These are ways of describing the central position of a
frequency distribution for a group of data.
◦ We can describe this central position using a number of statistics,
including the mode, median, and mean.
2. Measures of spread:
These are ways of summarizing a group of data by
describing how spread out the scores are.
◦ To describe this spread, a number of statistics are available
to us, including the range, quartiles, absolute deviation, variance,
and standard deviation.
5. Summarizing Descriptive Statistics
When we use descriptive statistics it is useful to summarize
our group of data using a combination of:
• tabulated description (i.e., tables)
• graphical description (i.e., graphs and charts)
• statistical commentary (i.e., a discussion of the results)
6. Mean, Median, Mode and Range
• Mean - The mean is the average of all numbers and is sometimes
called the arithmetic mean. To calculate the mean, add all of the
numbers in a set and then divide the sum by the total count of numbers.
• Median - The median is the middle number in an ordered sequence
of numbers. To find the median, arrange the numbers in order of
size; the number in the middle is the median. If the count is even,
take the mean of the two middle numbers.
• Mode - The mode is the number that occurs most often within a set
of numbers.
• Range - The range is the difference between the highest and lowest
values within a set of numbers. To calculate the range, subtract the
smallest number from the largest number in the set.
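The four measures above can be sketched in a few lines of Python using the standard library. The `scores` list is made-up data for illustration, not data from the lecture.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up example data

mean = statistics.mean(scores)           # sum of the values divided by the count
median = statistics.median(scores)       # middle value; averages the two middle values for an even count
mode = statistics.mode(scores)           # most frequently occurring value
value_range = max(scores) - min(scores)  # largest minus smallest

print(mean, median, mode, value_range)
```

Because this set has an even count, the median here is the mean of the two middle values (3 and 5).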
7. Skewness and Kurtosis
• Skewness - a measure of symmetry, or more precisely,
the lack of symmetry. A distribution, or data set, is
symmetric if it looks the same to the left and right of the
center point.
• Kurtosis - a measure of whether the data are heavy-tailed
or light-tailed relative to a normal distribution. That
is, data sets with high kurtosis tend to have heavy tails, or
outliers. Data sets with low kurtosis tend to have light
tails, or a lack of outliers. A uniform distribution would be the
extreme case.
• The histogram is an effective graphical technique for
showing both the skewness and kurtosis of a data set.
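As a rough numerical companion to the histogram, skewness and kurtosis can be computed from central moments. The sketch below assumes the simple moment-based (population) formulas; statistical packages often apply small-sample corrections, so their values may differ slightly. The function names and data are illustrative only.

```python
def central_moment(data, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    # 0 for symmetric data; positive when the right tail is longer
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    # 0 for a normal distribution; negative values indicate lighter tails
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]         # evenly spread, made-up data
right_skewed = [1, 1, 2, 2, 3, 10]  # one large value stretches the right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The evenly spread list behaves like a uniform distribution: its skewness is zero and its excess kurtosis is negative, matching the light-tailed case described above.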
8. Normality Curve
• The normal distribution is the most important and most widely used
distribution in statistics. It is sometimes called the "bell curve" or the
"Gaussian curve".
9. Seven Features of Normal Distributions
1. Normal distributions are symmetric around their mean.
2. The mean, median, and mode of a normal distribution are
equal.
3. The area under the normal curve is equal to 1.0.
4. Normal distributions are denser in the center and less dense in
the tails.
5. Normal distributions are defined by two parameters, the mean
(μ) and the standard deviation (σ).
6. Approximately 68% of the area of a normal distribution is within one
standard deviation of the mean.
7. Approximately 95% of the area of a normal distribution is
within two standard deviations of the mean.
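Features 6 and 7 can be checked numerically. This sketch assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # standard normal: mean 0, standard deviation 1

# Area under the curve between -k and +k standard deviations
within_one_sd = z.cdf(1) - z.cdf(-1)
within_two_sd = z.cdf(2) - z.cdf(-2)

print(f"within 1 SD: {within_one_sd:.4f}")
print(f"within 2 SD: {within_two_sd:.4f}")
```

The same proportions hold for any mean and standard deviation, because the areas depend only on distance from the mean measured in standard deviations.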
10. Variance and Standard Deviation
• Variance: measures how far a data set is spread out. The
technical definition is "the average of the squared
differences from the mean." Because it is expressed in squared
units, it gives only a general idea of the spread of your data.
◦ A value of zero means that there is no variability; all the
numbers in the data set are the same.
• Standard Deviation: the square root of the variance.
Because it is in the same units as the data, the standard
deviation is easier to interpret: it describes the typical
distance of a score from the mean.
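A short sketch of both measures using Python's standard library. The data are made up. The "average of the squared differences" definition corresponds to the population versions; the sample versions, which divide by n - 1, are shown alongside for comparison.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5

# Population versions divide the sum of squared differences by n.
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

# Sample versions divide by n - 1; commonly used when the data are a
# sample drawn from a larger population.
sample_var = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(pop_var, pop_sd)
print(sample_var, sample_sd)
```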
11. Quartiles
• Quartiles in statistics are values that divide your data into
quarters. They divide your data into four segments
according to where the numbers fall on the number line.
• The four quarters created by the quartiles are:
◦ The lowest 25% of numbers.
◦ The next lowest 25% of numbers (up to the median).
◦ The second highest 25% of numbers (above the median).
◦ The highest 25% of numbers.
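The three cut points that separate these quarters can be computed with `statistics.quantiles` (Python 3.8 or later). The data are made up, and the choice of `method="inclusive"` is an assumption for this sketch; other quartile conventions can give slightly different cut points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data

# Q1, Q2 and Q3 split the ordered data into four quarters.
# method="inclusive" treats the data as the whole population; the default
# method="exclusive" can return slightly different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)
```

The middle cut point, Q2, is the median.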
12. Percentiles
• The most common definition of a percentile is a number below which a certain
percentage of scores fall.
◦ The 25th percentile is also called the first quartile.
◦ The 50th percentile is generally the median (if you're using the third definition,
given below).
◦ The 75th percentile is also called the third quartile.
◦ The difference between the third and first quartiles is the interquartile range.
• Two alternative definitions:
◦ Second definition: the nth percentile is the lowest score that is greater than a certain
percentage ("n") of the scores.
◦ Third definition: the nth percentile is the smallest score that is greater than or equal to a
certain percentage ("n") of the scores.
• Percentile Rank: the percentage of data that falls at or below a certain observation.
• A percentile range is the difference between two specified percentiles.
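A minimal sketch of the "smallest score that is greater than or equal to n percent of the scores" definition (often called the nearest-rank rule), together with percentile rank. The function names and data are hypothetical, and statistical software frequently uses interpolating definitions that give different values.

```python
import math

def percentile(data, n):
    """Smallest score that is >= n percent of the scores (nearest-rank rule)."""
    ordered = sorted(data)
    rank = max(1, math.ceil(n * len(ordered) / 100))  # 1-based position in the ordered list
    return ordered[rank - 1]

def percentile_rank(data, score):
    """Percentage of the data that falls at or below `score`."""
    return 100 * sum(1 for x in data if x <= score) / len(data)

scores = [15, 20, 35, 40, 50]  # made-up data

print(percentile(scores, 50))       # the median under this definition
print(percentile_rank(scores, 35))  # share of scores at or below 35
```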
13. Conducting Descriptive Analysis in Excel
• Step 1: Type your data into Excel, in a single column. For
example, if you have ten items in your data set, type them
into cells A1 through A10.
• Step 2: Click the "Data" tab and then click "Data
Analysis" in the Analysis group. (This button appears only when
the Analysis ToolPak add-in is enabled.)
• Step 3: Highlight "Descriptive Statistics" in the pop-up
Data Analysis window and click "OK".
• Step 4: Type an input range into the "Input Range"
text box. For this example, type "A1:A10" into the box.
14. Conducting Descriptive Analysis in Excel
• Step 5: Check the "Labels in first row" check box if you
have titled the column in row 1; otherwise leave the box
unchecked.
• Step 6: Type a cell location into the "Output Range"
box. For example, type "C1". Make sure that the output column and
the column to its right do not have data in them.
• Step 7: Click the "Summary Statistics" check box and
then click "OK" to display Excel descriptive statistics. A
list of descriptive statistics will be returned in the column you
selected as the Output Range.
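For readers without Excel, a comparable summary can be sketched in Python. This is only an approximation of the idea, not a reproduction of Excel's output: the labels are chosen to resemble a typical summary table, skewness and kurtosis are left out, and `summary_statistics` and the ten values are hypothetical.

```python
import math
import statistics

def summary_statistics(data):
    """Return a dict of common summary measures for a list of numbers."""
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(data),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(data),
        "Mode": statistics.mode(data),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(data),
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }

values = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]  # ten made-up values, as if typed into A1:A10
summary = summary_statistics(values)
for name, value in summary.items():
    print(f"{name}: {value}")
```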
16. Overview of Intro to Descriptive Statistics II
This lecture will cover the following topics:
• Bar Charts
• Pie Charts
• Histograms
• Box-Plots
• Scatter Plots
17. Bar Charts
• A bar graph (also known as a bar chart or bar diagram) is
a visual tool that uses bars to compare data among
categories. A bar graph may run horizontally or vertically.
The important thing to know is that the longer the bar, the
greater its value.
• Bar graphs consist of two axes.
◦ On a vertical bar graph, the horizontal axis (or x-axis)
shows the data categories.
◦ The vertical axis (or y-axis) is the scale.
18. Bar Charts
• Bar graphs have three key attributes:
1. A bar diagram makes it easy to compare sets of data
between different groups at a glance.
2. The graph represents categories on one axis and a
discrete value on the other. The goal is to show the
relationship between the two axes.
3. Bar charts can also show big changes in data over
time.
21. Pie Charts
• A pie chart is a circular graph that shows the relative
contribution of different categories to an
overall total.
• A wedge of the circle represents each category's
contribution, such that the graph resembles a pie that
has been cut into different sized slices.
• Every 1% that a category contributes to the
total corresponds to a slice with an angle of 3.6 degrees.
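The 3.6-degree rule follows from dividing the 360 degrees of a circle by 100. A small sketch with made-up survey counts (the category names and numbers are illustrative only):

```python
pet_counts = {"dog": 12, "cat": 9, "fish": 6, "other": 3}  # made-up survey counts

total = sum(pet_counts.values())
slices = {}
for category, count in pet_counts.items():
    percent = 100 * count / total
    angle = 3.6 * percent  # each 1% of the total is 3.6 degrees
    slices[category] = (percent, angle)
    print(f"{category}: {percent:.1f}% -> {angle:.1f} degrees")
```

The slice angles always add up to 360 degrees, because the percentages add up to 100.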
22. Pie Charts
• Pie charts are a visual way of displaying data that might
otherwise be given in a small table.
• Pie charts are useful for displaying data that are classified
into nominal or ordinal categories.
◦ Nominal data are categorized according to descriptive or
qualitative information such as county of birth or type of
pet owned.
◦ Ordinal data are similar, but the different categories can
also be ranked. For example, in a survey people may be
asked whether they would class something as very poor,
poor, fair, good, or very good.
23. Pie Charts
• Pie charts are generally used to show percentage or
proportional data, and usually the percentage represented
by each category is provided next to the corresponding
slice of pie.
• Pie charts are good for displaying data for around 6
categories or fewer. When there are more categories it is
difficult for the eye to distinguish between the relative
sizes of the different sectors, and so the chart becomes
difficult to interpret.
26. Histograms
• A histogram is a plot that lets you discover, and show, the
underlying frequency distribution (shape) of a set
of continuous data. This allows the inspection of the data
for its underlying distribution (e.g., normal distribution),
outliers, skewness, etc.
• It is the area of the bar that indicates the frequency of
occurrences for each bin. This means that the height of
the bar does not necessarily indicate how many
occurrences of scores there were within each individual
bin. It is the product of the height and the width of
the bin that indicates the frequency of occurrences within
that bin.
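The area rule matters when bins have unequal widths. In the sketch below, the bin edges and counts are made up: each bar's height is the frequency divided by the bin width (the frequency density), so that height multiplied by width recovers the frequency.

```python
# (lower edge, upper edge, number of scores) for bins of unequal width; made-up data
bins = [(0, 10, 5), (10, 20, 12), (20, 40, 12), (40, 80, 8)]

bars = []
for lower, upper, count in bins:
    width = upper - lower
    height = count / width  # frequency density: the height of the bar
    bars.append((width, height))
    print(f"{lower}-{upper}: height = {height:.2f}, area = {height * width:.0f}")
```

The 10-20 and 20-40 bins hold the same number of scores, yet the wider bin is drawn half as tall.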
27. Histograms
• Bar height is often mistaken for frequency because many
histograms have equally wide bars (bins), and under these
circumstances the height of the bar does reflect the
frequency.
• The major difference between a histogram and a bar chart is
that a histogram is only used to plot the frequency of score
occurrences in a continuous data set that has been divided
into classes, called bins. Bar charts, on the other hand, can
be used for many other types of variables, including ordinal
and nominal data sets.
28. Histograms
A histogram showing frequencies of
different age groups in a sample.
Thinking Point:
What can you infer from this chart
about whether these data are
normally distributed?
29. Box-Plots
• A boxplot is a standardized way of displaying the
distribution of data based on a five-number summary
("minimum", first quartile (Q1), median, third quartile (Q3),
and "maximum").
• It can tell you about your outliers and what their values
are.
• It can also tell you if your data is symmetrical, how tightly
your data is grouped, and if and how your data is skewed.
30. Example of a Box-Plot
See next slide for description of this box-plot.
31. Elements of a Box-Plot
• A boxplot is a graph that gives you a good indication of
how the values in the data are spread out.
◦ median (Q2/50th percentile): the middle value of the dataset.
◦ first quartile (Q1/25th percentile): the middle number between
the smallest number (not the "minimum") and the median of the
dataset.
◦ third quartile (Q3/75th percentile): the middle value between
the median and the highest value (not the "maximum") of the dataset.
◦ interquartile range (IQR): the distance from the 25th to the 75th percentile (Q3 - Q1).
◦ whiskers (shown in blue): lines extending from the box toward the "minimum" and "maximum".
◦ outliers (shown as green circles): values that fall beyond the "minimum" or "maximum".
◦ "maximum": Q3 + 1.5*IQR
◦ "minimum": Q1 - 1.5*IQR
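The boxplot "minimum" and "maximum" can be worked out directly from the quartiles. The sketch below uses made-up data and assumes the `method="inclusive"` quartile convention of `statistics.quantiles` (Python 3.8 or later); plotting software may use a different convention and so place the box edges slightly differently.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

box_minimum = q1 - 1.5 * iqr  # the boxplot "minimum", not the smallest value
box_maximum = q3 + 1.5 * iqr  # the boxplot "maximum", not the largest value

# Values beyond these limits would be drawn as individual outlier points.
outliers = [x for x in data if x < box_minimum or x > box_maximum]

print(q1, median, q3, iqr)
print(box_minimum, box_maximum, outliers)
```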
32. Scatter Plots
• A scatter plot is a two-dimensional data visualization that
uses dots to represent the values obtained for two
different variables - one plotted along the x-axis and the
other plotted along the y-axis.
• Scatter plots are used when you want to show the
relationship between two variables. Scatter plots are
sometimes called correlation plots because they show
how two variables are correlated.
• However, not all relationships are linear.
33. Examples of Scatter Plots
A scatterplot showing the relationship between weight
(in lb) and height (in inches) in children.
This demonstrates a positive linear relationship.
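The strength of a linear relationship like this one is commonly summarized with Pearson's correlation coefficient r, which ranges from -1 to +1. The sketch below uses made-up height and weight pairs, not the data behind the slide's scatterplot, and `pearson_r` is an illustrative helper.

```python
import math

# Made-up height (inches) and weight (lb) pairs, for illustration only.
heights = [40, 43, 46, 50, 53, 56]
weights = [36, 41, 47, 56, 64, 75]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")

# A perfect but non-linear (U-shaped) relationship gives r = 0.
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(f"r for the U-shaped data = {curved:.3f}")
```

The second call shows why a scatter plot should be inspected before relying on r: a clear curved relationship can still produce a correlation of zero.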