Data Display and Summary

Biostatistics

By Dr Zahid Khan

Learning Objectives
• Acquire the basic knowledge of biostatistics needed to
understand medical literature and evidence-based medicine,
keep up with expanding medical knowledge and participate in
research.
• Identify the role of biostatistics in medical research.
• Define, enumerate and identify the different methods of data
summarization in the form of tables, graphs and numeric
measures of central tendency and dispersion, and report
dichotomous variables.
• Define, appraise, use and interpret the different tools used for
data analysis.

Data
• Data is a collection of facts, such as values or
measurements.

OR

• Data is information that has been translated into a
form that is more convenient to move or process.
OR

• Data are any facts, numbers, or text that can be
processed by a computer.


Statistics
Statistics is the study of the collection, summarizing,
organization, analysis, and interpretation of data.

Vital statistics
Vital statistics is the collection, summarizing, organization,
analysis, presentation, and interpretation of data related to
the vital events of life, such as births, deaths, marriages,
divorces, health and disease.

Biostatistics
Biostatistics is the application of statistical techniques to
scientific research in health-related fields, including
medicine, biology, and public health.


Descriptive Statistics
The term descriptive statistics refers to statistics
that are used to describe and summarize the data actually
collected, without generalizing beyond them. Every member
of the group being described is measured. A good example of
descriptive statistics is the census, in which all members of
a population are counted.

Inferential or Analytical Statistics
Inferential statistics are used to draw conclusions and make
predictions about a population based on the analysis of data
from a sample.

Primary & Secondary Data
• Raw or primary data: data as collected, still containing a
lot of unnecessary, irrelevant and unwanted information.

• Treated or secondary data: data from which this
unnecessary, irrelevant and unwanted information has been
removed.

• Cooked data: data that were not genuinely collected and
are false or fictitious.

Ungrouped & Grouped Data
• Ungrouped data: data presented or observed individually. For example, the
number of children observed in 6 families:
2, 4, 6, 4, 6, 4

• Grouped data: identical values grouped together with their frequency. For
example, the data above on children in 6 families can be grouped as:

No. of children    Families
2                  1
4                  3
6                  2

or alternatively we can make classes:

No. of children    Frequency
2-4                4
5-7                2
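The grouping of the six family sizes can be sketched in Python as follows. This is only an illustration: the variable names and the use of `collections.Counter` are assumptions and not part of the slides.

```python
from collections import Counter

children = [2, 4, 6, 4, 6, 4]  # number of children in each of the 6 families

# Group identical values and count how often each one occurs.
frequency = Counter(children)
for value in sorted(frequency):
    print(value, frequency[value])

# Alternatively, group the values into the classes 2-4 and 5-7.
classes = {"2-4": range(2, 5), "5-7": range(5, 8)}
class_frequency = {
    label: sum(1 for c in children if c in members)
    for label, members in classes.items()
}
print(class_frequency)
```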

Variable
A variable is a characteristic or value that can change or
differ from one individual to another. For example age,
height, weight, blood pressure etc.

Types of Variable
Independent variable: the variable that is manipulated or
changed (the exposure). For example, smoking.
Dependent variable: the observed result of the independent
variable being manipulated (the outcome). For example, cancer (ca) of the lung.

Confounding variable: a variable associated with both the exposure and
the disease. For example, age is a risk factor for many events.

Quantitative or Numerical data
These data describe information that can be counted
or expressed numerically (as numbers), for example
2, 4, 6, 8.5, 10.5

Quantitative or Numerical data (cont.)
These data are of two types

1. Discrete data: take whole-number values only, with no
fractions. For example
Number of children in a family = 4
Number of patients in hospital = 320

2. Continuous data (infinitely many possible values): measured on a
continuous scale and can take fractional values. For example
Height of a person = 5 feet 6 inches (5' 6")
Temperature = 92.3 °F

Qualitative or Categorical data
These are non-numerical data, such as Male/Female or Short/Tall.

They are of two types

1. Nominal data: a series of unordered categories
(only one category can be ticked at a time). For example
Sex = Male/Female
Blood group = O/A/B/AB

2. Ordinal or ranked data: distinct ordered/ranked categories.
For example
Height can be recorded as = Short / Medium / Tall
Degree of pain can be = None / Mild / Moderate / Severe

Measures of Central Tendency &
Variation (Dispersion)


Measures of Central Tendency
Measures of central tendency are quantitative indices that
describe the center of a distribution of data. These are

• Mean
• Median
• Mode

(the three Ms)

Mean

The mean, or arithmetic mean, is also called the AVERAGE and is calculated
only for numerical data. For example

• What is the average age of the children in years?

Child          1  2  3  4  5  6  7
Age (years)    6  4  4  3  2  4  5

Formula: $\bar{X} = \frac{\sum X}{n}$

Mean = (6 + 4 + 4 + 3 + 2 + 4 + 5) / 7 = 28 / 7 = 4 years
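As a quick check of the arithmetic, the mean of the seven ages used in the worked sum can be computed in Python. The variable names here are illustrative assumptions; `statistics.mean` is from the Python standard library.

```python
from statistics import mean

ages = [6, 4, 4, 3, 2, 4, 5]  # ages in years of the 7 children

total = sum(ages)       # sum of X
n = len(ages)           # number of values
average = total / n     # mean = sum of X divided by n

print(total, n, average)
print(mean(ages))       # the standard library gives the same mean
```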

Median
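• The median is the central-most value of the data arranged in order.
For example, what is the central value in 2, 3, 4, 4, 4, 5, 6?

• If we divide the ordered data into two equal groups, 2, 3, 4 | 4 | 4, 5, 6,
the value 4 is the central-most value.

• The formula gives the position of the central value, not the value itself:
Position of median = (n + 1)/2 (here n is the total no. of values)
Position of median = (7 + 1)/2 = 8/2 = 4, so the median is the 4th ordered value, which is 4.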
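A minimal sketch of finding the median in Python is shown below. The function name is a hypothetical helper. The even-count branch (averaging the two middle values) is the usual convention and is an assumption here, since the slide only covers an odd number of values.

```python
def median_value(values):
    """Return the central value of the data after sorting."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2  # zero-based index of position (n + 1) / 2 when n is odd
    if n % 2 == 1:
        return ordered[middle]
    # Assumed convention for an even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_value([6, 4, 4, 3, 2, 4, 5]))
```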

Mode
• The mode is the most frequently occurring (repeated) value in a set
of observations. Examples

• No mode
Raw data: 10.3 4.9 8.9 11.7 6.3 7.7

• One mode
Raw data: 2 3 4 4 4 5 6 (mode = 4)

• More than one mode
Raw data: 21 28 28 41 43 43 (modes = 28 and 43)
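The three cases above can be reproduced with a short Python sketch. The helper name `modes` is hypothetical; `statistics.multimode` is a standard-library function (Python 3.8 or later). Treating data in which every value occurs once as having no mode follows the slide's first example.

```python
from statistics import multimode

def modes(values):
    """Return a list of the modes, or an empty list if no value repeats."""
    if len(set(values)) == len(values):
        return []  # every value occurs once: no mode
    return multimode(values)  # all values tied for the highest frequency

print(modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7]))
print(modes([2, 3, 4, 4, 4, 5, 6]))
print(modes([21, 28, 28, 41, 43, 43]))
```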

Comparison of the Mode, the
Median, and the Mean
• In a normal distribution, the mode, the median, and the
mean have the same value.

• The mean is the most widely reported index of central
tendency for variables measured on an interval or ratio
scale.

• The mean takes each and every score into account.
• It is also the most stable index of central tendency and thus
yields the most reliable estimate of the central tendency
of the population.

Measures of Dispersion
Quantitative indices that describe the spread of a data set.
These are

• Range
• Mean deviation
• Variance
• Standard deviation
• Coefficient of variation
• Percentile

Range
It is the difference between the highest and lowest values
in a data series. For example:
the ages (in years) of 10 children are
2, 6, 8, 10, 11, 14, 1, 6, 9, 15
here the range of age will be 15 - 1 = 14 years
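The same calculation as a small Python sketch (variable names are illustrative assumptions):

```python
ages = [2, 6, 8, 10, 11, 14, 1, 6, 9, 15]  # ages in years of the 10 children

# Range = highest value minus lowest value.
age_range = max(ages) - min(ages)
print(age_range)
```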

Mean Deviation
This is the average absolute deviation of all observations from the mean

$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{n}$

here $X$ = value, $\bar{X}$ = mean,
$n$ = total no. of values

Mean Deviation Example
A student took 5 exams in a class and had scores of
92, 75, 95, 90, and 98. Find the mean deviation for her test scores.
• First step: find the mean.

$\bar{X} = \frac{\sum X}{n} = \frac{92 + 75 + 95 + 90 + 98}{5} = \frac{450}{5} = 90$

• Second step: find the mean deviation.

Value (X)      Mean (X̄)    Deviation from mean (X - X̄)    Absolute deviation |X - X̄| (ignoring ± signs)
92             90           2                               2
75             90           -15                             15
95             90           5                               5
90             90           0                               0
98             90           8                               8
Total = 450    n = 5        --                              ∑ |X - X̄| = 30

$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{n} = \frac{30}{5} = 6$

The average deviation from the mean is 6.

Dr. Riaz A. Bhutto, 9/3/2012
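The two steps of the exam-score example can be sketched in Python as below; the variable names are illustrative assumptions.

```python
scores = [92, 75, 95, 90, 98]  # the student's 5 exam scores

# Step 1: find the mean.
mean_score = sum(scores) / len(scores)

# Step 2: take the absolute deviation of each score from the mean,
# then average those absolute deviations.
absolute_deviations = [abs(x - mean_score) for x in scores]
mean_deviation = sum(absolute_deviations) / len(scores)

print(mean_score, absolute_deviations, mean_deviation)
```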

Variance
• It is a measure of variability that takes into account
the difference between each observation and the mean.

• The sample variance is the sum of the squared deviations
from the mean divided by the number of values in
the series minus 1.

• The sample variance is written s² and the population variance σ².

Variance (cont.)
• The variance is defined as the average of the squared differences from the mean.
• To calculate the variance follow these steps:
• Work out the mean (the simple average of the numbers).
• Then for each number: subtract the mean and square the
result (the squared difference).
• Then work out the average of those squared differences: divide their
sum by n for a population, or by n - 1 for a sample.

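Example: The household size of 5 families was recorded as follows:
2, 5, 4, 6, 3
Calculate the variance for the above data.

Step 1         Step 2       Step 3                          Step 4
Value (X)      Mean (X̄)    Deviation from mean (X - X̄)    (X - X̄)²
2              4            -2                              4
5              4            1                               1
4              4            0                               0
6              4            2                               4
3              4            -1                              1

Step 5: ∑ (X - X̄)² = 10

Step 6: treating the 5 families as a sample,

$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{10}{4} = 2.5$ persons²

If the 5 families are treated as the whole population, the divisor is n:

$\sigma^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{10}{5} = 2$ persons²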
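The household-size example can be checked with the Python sketch below. It computes the sum of squared deviations once and then divides by both n and n - 1, so that the two conventions can be compared side by side. Variable names are illustrative assumptions; `pvariance` and `variance` are standard-library functions.

```python
from statistics import pvariance, variance

sizes = [2, 5, 4, 6, 3]  # household size of the 5 families

n = len(sizes)
mean_size = sum(sizes) / n
squared_deviations = [(x - mean_size) ** 2 for x in sizes]
sum_of_squares = sum(squared_deviations)

divide_by_n = sum_of_squares / n                # population variance
divide_by_n_minus_1 = sum_of_squares / (n - 1)  # sample variance

print(sum_of_squares, divide_by_n, divide_by_n_minus_1)
print(pvariance(sizes), variance(sizes))  # same two values from the library
```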

Standard Deviation

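• The standard deviation is a measure of how spread out numbers are.
• Its symbol is σ (the Greek letter sigma) for a population and s for a sample.
• The formula is easy: it is the square root of the variance, i.e. $s = \sqrt{s^2}$
• SD is the most useful measure of dispersion.

Population: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

Sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

The choice of divisor depends on whether the data are a whole population or a
sample from it. As a rule of thumb, n is often used when n > 30 and n - 1 when
n < 30, because the two results differ little for large n.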
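As an illustration, the sketch below takes the square root of the variance for the household sizes 2, 5, 4, 6, 3 using both divisors. The variable names are assumptions; `pstdev` and `stdev` are the standard-library population and sample versions.

```python
from math import sqrt
from statistics import pstdev, stdev

sizes = [2, 5, 4, 6, 3]  # household size of 5 families

n = len(sizes)
mean_size = sum(sizes) / n
sum_of_squares = sum((x - mean_size) ** 2 for x in sizes)

sd_divide_by_n = sqrt(sum_of_squares / n)                # population form
sd_divide_by_n_minus_1 = sqrt(sum_of_squares / (n - 1))  # sample form

print(round(sd_divide_by_n, 3), round(sd_divide_by_n_minus_1, 3))
print(round(pstdev(sizes), 3), round(stdev(sizes), 3))
```

With only five values the two results differ noticeably; the gap shrinks as n grows.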

Standard Deviation and Standard
Error
• SD is a measure of the variability of the individual
observations; the sample SD is an estimate of the
population SD.

• SE (standard error) is a measure of the precision of an
estimate of a population parameter, such as the sample mean.
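To make the distinction concrete, the sketch below computes both quantities for the household sizes 2, 5, 4, 6, 3. The slides do not give a formula for SE; the standard formula for the standard error of the mean, SE = s / √n, is assumed here, and the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

sizes = [2, 5, 4, 6, 3]  # household size of 5 families

n = len(sizes)
sample_sd = stdev(sizes)             # spread of the individual observations
standard_error = sample_sd / sqrt(n) # assumed formula: SE of the mean = s / sqrt(n)

print(round(sample_sd, 3), round(standard_error, 3))
```

The SD describes how much individual households vary, while the SE describes how precisely the sample mean estimates the population mean, and it gets smaller as n increases.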

Graphs and their use
• Histograms and box plots are used for continuous or
scale variables like temperature, bone density etc.

• Bar charts and pie charts are used for categorical or
nominal variables like gender, name etc.

• Scatterplots are used to show the relationship between
two continuous variables.

BAR GRAPHS.
• Bar graphs are frequently used with categorical
data to compare the sizes of categories.

PIE CHARTS
• Like bar graphs, pie charts are best used with
categorical data to help us see what percentage of the
whole each category constitutes. Pie charts require all
categories to be included in a graph. Each graph
always represents the whole.

• One of the reasons why bar graphs are more flexible
than pie charts is the fact that bar graphs compare
selected categories, whereas pie charts must either
compare all categories or none.

QUANTITATIVE VARIABLES
STEM PLOTS
• Stemplots (sometimes called stem-and-leaf plots) are used with
quantitative data to display the shapes of distributions and to organize numbers
and make them more comprehensible.
• It is a descriptive technique which gives a good overall impression of the
data. Stemplots include the actual numerical values of the
observations, where each value is separated into two parts, a stem and a
leaf.
• A stem is usually the first digit, or the leftmost digit(s), and a leaf is the
final rightmost digit. We write the stems in a vertical column with the
smallest at the top, and draw a vertical line to the right of the column.
Finally, we write the leaves in the row to the right of the corresponding
stem, starting with the smallest one.

STEM PLOTS.
• Grades. The average test grades of 19 students are as
follows (on a scale from 0 to 100, with 100 being the
highest score): 92 95 96 81 95 75 91 79 92 100 89 94
92 86 93 73 74 94 91

• In increasing order:
73, 74, 75, 79, 81, 86, 89, 91, 91, 92, 92, 92, 93, 94, 94, 95, 95, 96, 100

STEMPLOT #1:
stem | leaf
 7 | 34
 7 | 59
 8 | 1
 8 | 69
 9 | 11222344
 9 | 556
10 | 0
10 |

STEMPLOT #2:
stem | leaf
 7 | 3459
 8 | 169
 9 | 11222344556
10 | 0

Depending on the number of stems, different conclusions can
be drawn about a given data set. In this example, even though
both stemplots show a slight left skewness of the data set,
stemplot #1 shows it more clearly than stemplot #2.
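A stemplot with one row per stem (the layout of stemplot #2) can be built with a short Python sketch. The function name `stemplot` is a hypothetical helper, and it assumes non-negative whole numbers with the last digit used as the leaf.

```python
from collections import defaultdict

def stemplot(values):
    """Return stem-and-leaf rows, one per stem, for non-negative integers."""
    rows = defaultdict(list)
    for value in sorted(values):         # sorted so leaves appear smallest first
        stem, leaf = divmod(value, 10)   # leaf = last digit, stem = the rest
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems, if any
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]
for line in stemplot(grades):
    print(line)
```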

Stem and Leaf Plots
• A simple way to order and display a data set.
• Abbreviate the observed data to two significant digits.

0.6  2.6  0.1  1.1  0.4  1.3  1.5  2.2  2.0  3.2

Stem | Leaf
0 | 6 1 4
1 | 1 3 5
2 | 6 2 0
3 | 2

HISTOGRAMS
• Histograms are another graphical way of
presenting data to show the distribution of the
observations. The histogram is one of the most common forms
of graphical presentation of a frequency distribution.
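The frequency distribution behind a histogram can be sketched in Python by counting how many observations fall into each class interval. The example below reuses the 19 test grades from the stemplot slides; the class width of 10 and the variable names are assumptions chosen for illustration.

```python
grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

width = 10  # assumed class width
counts = {}
for grade in grades:
    lower = (grade // width) * width       # lower limit of the grade's class
    counts[lower] = counts.get(lower, 0) + 1

# Print a simple text histogram, one row of stars per class interval.
for lower in sorted(counts):
    print(f"{lower:>3}-{lower + width - 1:<3} {'*' * counts[lower]}")
```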

BOXPLOTS
• Boxplots reveal the main features of a batch of
data, i.e. how the data are spread out.

• A boxplot is a graph of the five-number summary:
the minimum score, first quartile (Q1, the median of
the lower half of all scores), the median, third
quartile (Q3, the median of the upper half of all
scores), and the maximum score, with suspected
outliers plotted individually.

Continued (explained from the graph)
• The boxplot consists of a rectangular box, which
represents the middle half of all scores (between Q1 and
Q3). Approximately one-fourth of the values should fall
between the minimum and Q1, and approximately one-fourth
should fall between Q3 and the maximum. A line
in the box marks the median. Lines called whiskers extend
from the box out to the minimum and maximum scores
that are not possible outliers. If an observation falls more
than 1.5 × IQR outside of the box, it is plotted individually
as an outlier.

BOXPLOTS
FIVE-NUMBER SUMMARY:
• MINIMUM
• 1ST QUARTILE
• MEDIAN
• 3RD QUARTILE
• MAXIMUM

IQR, or the interquartile range, is the distance between
the first and third quartiles. IQR = Q3 - Q1
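The five-number summary, the IQR and the 1.5 × IQR outlier rule can be sketched in Python as below, using the 19 test grades from the stemplot slides. The function name is a hypothetical helper. Quartiles are computed as the medians of the lower and upper halves, as defined in the boxplot slides; statistical software may use other quartile conventions and give slightly different values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count the overall median is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

grades = [92, 95, 96, 81, 95, 75, 91, 79, 92, 100,
          89, 94, 92, 86, 93, 73, 74, 94, 91]

minimum, q1, med, q3, maximum = five_number_summary(grades)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr    # values below this are possible outliers
high_fence = q3 + 1.5 * iqr   # values above this are possible outliers
outliers = [g for g in grades if g < low_fence or g > high_fence]

print(minimum, q1, med, q3, maximum, iqr, outliers)
```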



ANY QUESTIONS

• THANK YOU
Dr. Riaz A. Bhutto

3/3/2012

