Bio statistics1

Bio Statistics
Part 1
INDIAN DENTAL ACADEMY
Leader in continuing dental education
www.indiandentalacademy.com


Contents
• Introduction
• Common Statistical Terms
• Source of data
• Types of data
• Data presentation
• Measures of statistical averages or central tendency
• Types of variability
• Measures of variation or dispersion
• Normal distribution or normal curve
• Sampling
• Determination of sample size
• Probability or p value

Introduction


• Any science needs precision for its
development.
• For precision, facts, observations or
measurements have to be expressed in figures.
• “When you can measure what you are speaking
about and express it in numbers, you know
something about it, but when you cannot express
it in numbers, your knowledge is of a meagre and
unsatisfactory kind.”
- Lord Kelvin

• Similarly in medicine, be it diagnosis,
treatment or research, everything depends
on measurement.
• E.g. you have to count the number of
missing teeth or measure the vertical
dimension and express it as a number
so that it makes sense.


• Statistic or datum means a measured or
counted fact or piece of information stated
as a figure, such as the height of one person or
the birth weight of a baby.
• Statistics or data is the plural of the same.
• Statistics is the science of figures.
• Bio statistics is the term used when tools of
statistics are applied to data derived from
biological sciences such as medicine.

Applications and uses of bio statistics as a science
• In physiology and anatomy
– To define the limits of normality for variables
such as height, weight or blood pressure
in a population.
– Variation beyond natural limits may be
pathological, i.e. abnormal, due to the play of
certain external factors.
– To find correlation between two variables like
height and weight.

• In pharmacology
– To find the action of drugs
– To compare the action of two drugs or two
successive dosages of same drug
– To find the relative potency of a new drug with
respect to a standard drug


• In medicine
– To compare the efficacy of a particular drug,
operation or line of treatment
– To find association between two attributes
such as cancer and smoking
– To identify signs and symptoms of disease


• In community medicine and public health
– To test usefulness of sera or vaccine in the
field
– In epidemiologic studies the role of causative
factors is statistically tested


• In research
– It helps in compilation of data, drawing
conclusions and making recommendations.


• For students
– By learning the methods in biostatistics a
student learns to evaluate articles published
in medical and dental journals or papers read
in medical and dental conferences.
– He also understands the basic methods of
observation in his clinical practice and
research.


• Constant
– Quantities that do not vary, e.g. in biostatistics the mean and
standard deviation are considered constant for a
population

• Variable
– A characteristic which takes different values for
different persons, places or things, such as height,
weight, blood pressure

• Population
– Population includes all persons, events and objects
under study. It may be finite or infinite.

• Sample
– Defined as a part of a population generally
selected so as to be representative of the
population whose variables are under study

• Parameter
– It is a constant that describes a population,
e.g. in a college there are 40% girls. This
describes the population, hence it is a
parameter.

• Statistic
– A statistic is a constant that describes the
sample, e.g. in a sample of 200 students of the same
college, 45% are girls. This 45% is a statistic, as
it describes the sample

• Attribute
– A characteristic based on which the
population can be divided into categories or
classes, e.g. gender, caste, religion.

Source of data


• The main sources for collection of data
– Experiments
– Surveys
– Records

• Experiments
– Experiments are performed to collect data for
investigations and research by one or more
workers.

• Surveys
– Carried out for epidemiological studies in the
field by trained teams to find incidence or
prevalence of health or disease in a
community.

• Records
– Records are maintained as a routine in
registers and books over a long period of time
– They provide ready-made data.

Types of data


• Data is of two types
– Qualitative or discrete data
– Quantitative or continuous data

• Qualitative or discrete data
– In such data there is no notion of magnitude or size of
an attribute, as it cannot be measured.
– The number of persons having the same attribute
varies and is counted
– e.g. out of 100 people, 75 have class I occlusion,
15 have class II occlusion and 10 have class III
occlusion.
– Class I, II and III are attributes, which cannot be measured
in figures; only the number of people having each can be
determined

• Quantitative or continuous data
– In this the attribute has a magnitude. Both the
attribute and the number of persons having
the attribute vary
– E.g. freeway space. It varies for every patient.
It is a quantity with a different value for each
individual and is measurable. It is continuous,
as it can take any value between 2 and 4 mm,
e.g. 2.10 or 2.55 or 3.07 etc.

Data presentation


• Statistical data once collected should be
systematically arranged and presented
– To arouse interest of readers
– For data reduction
– To bring out important points clearly and
strikingly
– For easy grasp and meaningful conclusions
– To facilitate further analysis
– To facilitate communication

• Two main types of data presentation are
– Tabulation
– Graphic representation with charts and
diagrams
Tabulation
• It is the most common method
• Data presentation is in the form of
columns and rows
• It can be of the following types
– Simple tables
– Frequency distribution tables

Simple Table
Number of patients at KIDS, Bgm

Month       Number of patients
Jan 06      2,800
Feb 06      1,900
March 06    1,750

Frequency distribution table
• In a frequency distribution table, the data
is first split into convenient groups (class
intervals) and the number of items
(frequency) which occurs in each group
is shown in the adjacent column.


Frequency distribution table

Number of Cavities    Number of Patients
0 to 3                78
3 to 6                67
6 to 9                32
9 and above           16
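
A frequency distribution table like this one can be built from raw counts with a short routine. The sketch below is only an illustration: the patient data are hypothetical, the function name `frequency_table` is made up for this example, and it assumes that a value on a shared boundary (such as 3) belongs to the upper class, since the slide does not say how boundaries are handled.

```python
from collections import Counter

def frequency_table(values, width, last_lower):
    """Count values into classes [lower, lower + width); the last class is open-ended."""
    counts = Counter()
    for v in values:
        # Assumption: a boundary value such as 3 falls in the upper class "3 to 6"
        lower = min((v // width) * width, last_lower)
        counts[lower] += 1
    rows = []
    for lower in range(0, last_lower + width, width):
        if lower == last_lower:
            label = f"{lower} and above"
        else:
            label = f"{lower} to {lower + width}"
        rows.append((label, counts[lower]))
    return rows

# Hypothetical cavity counts for 12 patients (not taken from the slides)
cavities = [0, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 12]
table = frequency_table(cavities, width=3, last_lower=9)
for label, freq in table:
    print(f"{label:<12} {freq}")
```

Choosing non-overlapping class limits in this way avoids counting the same patient in two groups.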

Data presentation
Charts and diagrams
• Useful method of presenting statistical
data
• Powerful impact on imagination of the
people


Charts and diagrams
• They are
– Bar chart
– Histogram
– Frequency polygon
– Frequency curve
– Line diagram
– Cumulative frequency diagram or ogive
– Scatter diagram
– Pie chart
– Pictogram
– Spot map or map diagram

Bar chart
• Length of bars, drawn vertical or horizontal,
is proportional to frequency of variable.
• A suitable scale is chosen
• Bars are usually equally spaced
• They are of three types
– Simple bar chart
– Multiple bar chart
• two or more variables are grouped together
– Component bar chart
• bars are divided into two parts
• each part represents a certain item and is
proportional to the magnitude of that item

Simple bar chart
[Figure: simple bar chart of number of CD patients per quarter, 1st Qtr to 4th Qtr; vertical axis 0 to 300]


Multiple bar chart
[Figure: multiple bar chart of CD, RPD and FPD patients per quarter, 1st Qtr to 4th Qtr; vertical axis 0 to 400]


Component bar chart
[Figure: component bar chart per quarter, 1st Qtr to 4th Qtr, with each bar split into patients to prostho and patients to other departments; vertical axis 0 to 3000]

Histogram
• Pictorial presentation of frequency
distribution
• Consists of a series of rectangles
• Class intervals are given on the horizontal axis
and frequencies on the vertical axis
• Area of each rectangle is proportional to the
frequency


Histogram
[Figure: histogram of number of carious lesions; class intervals 0 to 3 through 24 to 27 on the horizontal axis, frequency 0 to 80 on the vertical axis]

Frequency polygon
• obtained by joining midpoints of histogram
blocks at the height of frequency by
straight lines usually forming a polygon


Frequency polygon


Frequency curve
• When the number of observations is very large
and the class interval is reduced, the
frequency polygon loses its angulations,
becoming a smooth curve known as the
frequency curve


Frequency curve


Line diagram
• Line diagrams are used to show the trends
of events with the passage of time


Line Diagram
[Figure: line diagram of patients with periodontitis; vertical axis 0 to 90, horizontal axis 0 to 5; plotted values 85, 60, 25 and 10]

Cumulative Frequency Diagram
• Graphical representation of cumulative
frequency.
• Cumulative frequency is obtained by adding the
frequency of each class to the sum of the
frequencies of the previous classes

Cumulative Frequency Diagram
[Figure: prevalence of dental caries (in percent) by age group, 0 to 10 yrs through 60 to 70 yrs; vertical axis 0 to 100]

Scatter or Dot diagram
• shows relationship between two variables
• If the dots are clustered showing a straight
line, it shows a relationship of linear nature


Scatter or Dot diagram
[Figure: scatter diagram relating sugar exposure and carious lesions]

Pie chart
• In this, frequencies of the groups are shown
as segments of a circle
• Degree of angle denotes the frequency
• Angle is calculated by
$\text{angle} = \dfrac{\text{class frequency}}{\text{total observations}} \times 360^\circ$
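
The angle formula can be checked with a few lines of code. This sketch uses the five segment sizes printed on the pie chart slide (200, 180, 150, 70 and 30); the function name `pie_angles` is an assumption made for this example.

```python
def pie_angles(frequencies):
    """Angle in degrees for each segment: class frequency / total observations x 360."""
    total = sum(frequencies)
    return [f * 360 / total for f in frequencies]

# Segment sizes as printed on the pie chart slide
segments = [200, 180, 150, 70, 30]
angles = pie_angles(segments)

for size, angle in zip(segments, angles):
    print(f"{size:>4} -> {angle:6.1f} degrees")
```

The angles always add up to 360 degrees, whatever the class frequencies are.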


Pie chart
[Figure: pie chart of patients by department (PROSTHO, CONSO, PERIO, ORTHO, PEDO); segment labels 200 (31%), 180 (29%), 150 (24%), 70 (11%) and 30 (5%)]

Pictogram
• Popular method of presenting data to the
common man


Pictogram

City         Value
Delhi        9000
Bombay       11000
Chennai      8000
Kolkatta     5000
Hyderabad    6000
Bangalore    12000
Pune         4000
Lucknow      5000


Spot map or map diagram
• These maps are prepared to show the
geographic distribution of frequencies of
characteristics


Measures of statistical averages or central tendency


• Average value in a distribution is the one
central value around which all the other
observations are concentrated
• Average value helps
– to find most characteristic value of a set of
measurements
– to find which group is better off by comparing
the average of one group with that of the
other

• the most commonly used averages are
– mean
– median
– mode


Mean
• refers to arithmetic mean
• it is the summation of all the observations
divided by the total number of observations (n)
• denoted by $\bar{x}$ for sample and $\mu$ for population
• $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
• Advantages – it is easy to calculate
• Disadvantages – influenced by extreme values


Median
• When all the observations are arranged
in either ascending or descending
order, the middle observation is known as the
median
• In case of an even number of observations, the
average of the two middle values is taken
• Median is a better indicator of central value when
the data contain extreme values, as it is not
affected by them

Mode
• The most frequently occurring observation in a
data set is called the mode
• Not often used in medical statistics.


Example
• Number of decayed teeth in 10 children
2, 2, 4, 1, 3, 0, 10, 2, 3, 8
• Mean = 35 / 10 = 3.5
• Median: arranged in order (0, 1, 2, 2, 2, 3, 3, 4, 8, 10),
the two middle values are 2 and 3, so median = (2 + 3) / 2 = 2.5
• Mode = 2 (occurs 3 times)
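
The three averages for these ten children can be verified with Python's standard `statistics` module. This is only a checking sketch; the ten values sum to 35, so the mean comes out at 3.5.

```python
import statistics

# Number of decayed teeth in the 10 children from the example
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed_teeth)      # sum of observations / number of observations
median = statistics.median(decayed_teeth)  # average of the two middle values for even n
mode = statistics.mode(decayed_teeth)      # most frequently occurring value

print(f"Mean = {mean}, Median = {median}, Mode = {mode}")
```

Note how the two extreme values (8 and 10) pull the mean above the median.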

Types of variability


• There are three types of variability
– Biological variability
– Real variability
– Experimental variability

• Experimental variability is of three subtypes
– Observer Error
– Instrumental Error
– Sampling Error

Biological variability
• It is the natural difference which occurs in
individuals due to age, gender and other
attributes which are inherent
• This difference is small and occurs by
chance and is within certain accepted
biological limits
• e.g. vertical dimension may vary from
patient to patient

Real Variability
• such variability is more than the normal
biological limits
• the cause of difference is not inherent or
natural and is due to some external factors
• e.g. difference in incidence of cancer
among smokers and non smokers may be
due to excessive smoking and not due to
chance only

Experimental Variability
• it occurs due to the experimental study
• they are of three types
– Observer error
• the investigator may alter some information or not record the
measurement correctly

– Instrumental error
• this is due to defects in the measuring instrument
• both the observer and the instrument error are called non
sampling error

– Sampling error or errors of bias
• this is the error which occurs when the samples are not
chosen at random from population.
• Thus the sample does not truly represent the population

Measures of variation or dispersion


• Biological data collected by measurement
shows variation
• e.g. BP of an individual can show variation
even if taken by standardized method and
measured by the same person.
• Thus one should know what is the normal
variation and how to measure it.


• The various measures of variation or
dispersion are
– Range
– Mean or average deviation
– Standard deviation
– Co efficient of variation


Range
• It is the simplest measure of dispersion
• Defined as the difference between the
highest and the lowest figures in a sample
• Defines the normal limits of a biological
characteristic, e.g. freeway space ranges
between 2 and 4 mm
• Not satisfactory, as it is based on two extreme
values only

Mean deviation
• It is the average of the differences or
deviations from the mean in any
distribution, ignoring the + or – sign
• Denoted by MD
$MD = \dfrac{\sum |x - \bar{x}|}{n}$
$x$ = observation
$\bar{x}$ = mean
$n$ = number of observations

Standard deviation
• Also called root mean square deviation
• It is an improvement over mean deviation and is
used most commonly in statistical analysis
• Denoted by SD or s for sample and σ for a
population
• Given by the formula
$SD = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$, with $n - 1$ in place of $n$ when calculated from a sample

• Greater the standard deviation, greater will
be the magnitude of dispersion from mean
• Small standard deviation means a high
degree of uniformity of the observations
• Usually measurement beyond the range of
± 2 SD are considered rare or unusual in
any distribution


• Uses of Standard Deviation
– It summarizes the deviation of a large
distribution from its mean.
– It helps in finding the suitable size of sample
e.g. greater deviation indicates the need for
larger sample to draw meaningful conclusions
– It helps in calculation of standard error which
helps us to determine whether the difference
between two samples is by chance or real

Coefficient of variation
• It is used to compare the variability of two
characteristics measured in different units,
e.g. height and weight
• Denoted by CV
$CV = \dfrac{SD}{\text{Mean}} \times 100$
• It is expressed as a percentage
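
To make the three measures concrete, the sketch below computes mean deviation, standard deviation and coefficient of variation for the decayed-teeth data used in the averages example. It assumes the sample form of the SD (divisor n − 1); the variable names are illustrative only.

```python
import math
import statistics

# Decayed teeth in 10 children (data from the averages example)
data = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]
n = len(data)
mean = sum(data) / n

# Mean deviation: average of absolute deviations from the mean
mean_deviation = sum(abs(x - mean) for x in data) / n

# Standard deviation: root of the mean squared deviation
# (assumption: sample form, so the divisor is n - 1)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Coefficient of variation: SD as a percentage of the mean
cv = sd / mean * 100

print(f"MD = {mean_deviation:.2f}, SD = {sd:.2f}, CV = {cv:.1f}%")
```

Because CV is unit-free, the same calculation on heights in cm and weights in kg would give two percentages that can be compared directly.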

Normal distribution or normal curve


• So much of physiologic variation occurs in
any observation
• Necessary to
– Define normal limits
– Determine the chances of an observation
being normal
– To determine the proportion of observation
that lie within a given range

• Normal distribution or normal curve used
most commonly in statistics helps us to
find these
• Large number of observations with a
narrow class interval gives a frequency
curve called the normal curve


It has the following characteristics
• Bell shaped
• Bilaterally symmetrical
• Frequency increases from one side,
reaches its highest point and decreases exactly
the way it had increased
• The highest point denotes mean, median
and mode, which coincide

• Mean ± 1 SD includes 68.27% of all observations.
Such observations are fairly common
• Mean ± 2 SD includes 95.45% of all observations,
i.e. by convention values beyond this range are
uncommon or rare. Their chance of being
normal is 100 – 95.45%, i.e. only 4.55%.
• Mean ± 3 SD includes 99.73%. Values beyond this
range are very rare. Their chance of being normal
is only 0.27%
• These limits on either side of the mean are
called confidence limits
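
These percentages follow from the normal curve itself and can be reproduced with the error function in Python's `math` module. The helper name `proportion_within` is an assumption for this sketch.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = proportion_within(k) * 100
    print(f"mean ± {k} SD: {inside:.2f}% inside, {100 - inside:.2f}% outside")
```

The "outside" column gives the 4.55% and 0.27% figures quoted for values beyond 2 SD and 3 SD.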


• The look of a frequency distribution curve may
vary depending on mean and SD. Thus it
becomes necessary to standardize it.
• E.g. one study has an SD of 3 and another has an SD
of 2, thus it becomes difficult to compare them
• Thus the normal curve is standardized by using the
unit of standard deviation to place any
measurement with reference to the mean.
• The curve that emerges through this procedure
is called the standard normal curve

Properties of standard normal curve
• smooth bell shaped
• perfectly symmetrical
• based on infinite number of observations
thus curve does not touch X axis
• mean is zero
• SD is always 1
• total area under the curve is 1
• mean median mode coincide

• the unit of SD here is relative or standard
normal deviate and is denoted by Z
$Z = \dfrac{x - \bar{x}}{SD} = \dfrac{\text{Observation} - \text{Mean}}{SD}$


• With the help of Z value we can find the
area under the curve from a table
• This area helps to give the P value
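
A brief sketch of standardization: the same observation is converted to Z in two studies whose SDs are 3 and 2. The observation (76) and the common mean (70) are hypothetical values chosen for illustration, and `z_score` is a name made up for this example.

```python
def z_score(observation, mean, sd):
    """Standard normal deviate: distance from the mean in units of SD."""
    return (observation - mean) / sd

# Hypothetical observation and mean; SDs of 3 and 2 as in the two-study example
z_study_a = z_score(76, 70, 3)
z_study_b = z_score(76, 70, 2)

print(f"Study A: Z = {z_study_a}, Study B: Z = {z_study_b}")
```

The same raw value is 2 SD from the mean in one study and 3 SD in the other, which is why measurements are compared on the Z scale rather than in raw units.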


Sampling


• It is not possible to include each and every
member of a population, as it would be time
consuming, costly and laborious.
• Therefore sampling is done
• Sampling is a process by which some units of a
population or universe are selected for the study
and, by subjecting them to statistical computation,
conclusions are drawn about the population from
which these units are drawn

• The sample will be representative of the entire
population only if
– It is sufficiently large
– It is unbiased
• Such a sample will have its statistics almost equal
to the parameters of the entire population
• Two main characteristics of a representative
sample are
– Precision
– Unbiased character

Precision
• Precision depends on sample size
• Ordinarily sample size should not be less than 30
$\text{Precision} = \dfrac{\sqrt{n}}{s}$
n = sample size, s = standard deviation
• Precision is directly proportional to the square root of sample
size: the greater the sample size, the greater the precision
• Also, the greater the SD, the less the precision
• Thus in such cases, to obtain precision, sample size
needs to be increased
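
The square-root relationship can be seen numerically. In this sketch the sample sizes and SDs are hypothetical, and `precision` is simply the ratio of the square root of n to s as described above.

```python
import math

def precision(n, s):
    """Precision = square root of sample size divided by standard deviation."""
    return math.sqrt(n) / s

# Hypothetical values: quadrupling n doubles precision; doubling s halves it
print(precision(25, 5))    # baseline
print(precision(100, 5))   # four times the sample size
print(precision(100, 10))  # four times the sample size, twice the SD
```

Doubling the SD therefore needs a fourfold increase in sample size to keep precision unchanged.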


Unbiased character
• The sample should be unbiased i.e. every
individual should have an equal chance to be
selected in the sample.
• Thus a standard random sampling method
should be used
• Non sampling errors can be taken care of by
– Using standardized instruments and criteria
– By single , double , triple blind trials
– Use of a control group

Determination of sample size


For Quantitative Data
• The investigator needs to decide how
large an error due to sampling defect is
allowable, i.e. allowable error L
• Either the investigator should start with an
assumed SD or do a pilot study to
estimate SD
$n = \dfrac{4\,SD^2}{L^2}$, where n is the sample size

For Quantitative Data
• Mean pulse rate of a population is 70 beats
per min with a standard deviation of 8
beats. What will be the sample size if the
allowable error is ± 1?
$n = \dfrac{4 \times 8 \times 8}{1^2} = 256$
• If L is smaller, n will be larger, i.e. the smaller the
allowable error, the larger the sample size needed.
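
The pulse-rate example can be reproduced with a one-line function. The name `sample_size_quantitative` is an assumption for this sketch; the formula is the one given on the slide, n = 4 SD² / L².

```python
def sample_size_quantitative(sd, allowable_error):
    """Sample size for quantitative data: n = 4 * SD^2 / L^2."""
    return 4 * sd ** 2 / allowable_error ** 2

# Pulse-rate example from the slide: SD = 8 beats, allowable error L = 1 beat
n = sample_size_quantitative(sd=8, allowable_error=1)
print(n)

# Relaxing the allowable error to 2 beats reduces the sample size needed
n_relaxed = sample_size_quantitative(sd=8, allowable_error=2)
print(n_relaxed)
```

Halving or doubling L changes n by a factor of four, because L enters the formula squared.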

For qualitative data
• In such data we deal with proportion
$n = \dfrac{4pq}{L^2}$, where n is the sample size
• p = proportion of positive character
• q = proportion of negative character
• q = 1 - p (or 100 - p if expressed in percent)
• L = allowable error, usually 10% of p

For qualitative data
• e.g. the incidence rate in the last influenza epidemic was found to
be 5% of the population exposed
• What should be the size of the sample
to find the incidence rate in the current epidemic if
allowable error is 10%?
• p = 5%, q = 95%
• L = 10% of p = 0.5%
$n = \dfrac{4 \times 5 \times 95}{(0.5)^2} = 7600$
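
The influenza example works out the same way in code. This is a sketch with an assumed function name, `sample_size_qualitative`; p, q and L are all expressed in percent, as on the slide.

```python
def sample_size_qualitative(p, allowable_error):
    """Sample size for qualitative data: n = 4 * p * q / L^2, with p, q, L in percent."""
    q = 100 - p
    return 4 * p * q / allowable_error ** 2

# Influenza example from the slide: p = 5%, L = 10% of p = 0.5%
p = 5
allowable_error = p / 10
n = sample_size_qualitative(p, allowable_error)
print(n)
```

Rare characteristics (small p) with a tight allowable error lead to very large samples, as the result of 7600 shows.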

Probability or p value


• Concept of probability is very important in
statistics
• Probability is the chance of occurrence of any
event or permutation combination.
• It is denoted by p for sample and P for
population
• In various tests of significance we are often
interested to know whether the observed
difference between 2 samples is real or
due to chance (sampling variation).
• Here probability or p value is used

• P ranges from 0 to 1
• 0 = there is no chance that the observed
difference could be due to sampling
variation
• 1 = it is absolutely certain that the observed
difference between 2 samples is due to
sampling variation
• However, such extreme values are rare.

• P = 0.4 i.e. chances that the difference is
due to sampling variation is 4 in 10
• Obviously the chances that it is not due to
sampling variation will be 6 in 10
• The essence of any test of significance is
to find out p value and draw inference


• If p value is 0.05 or more
– it is customary to accept that difference is due
to chance (sampling variation) .
– The observed difference is said to be
statistically not significant.

• If p value is less than 0.05
– the observed difference is taken to be not due to
chance but due to the role of some external factors.
– The observed difference here is said to be
statistically significant.

Determination of p value
• From shape of normal curve
• We know that 95% observation lie within
mean ± 2SD . Thus probability of value
more or less than this range is 5%
• From probability tables
• p value is also determined by probability
tables in case of student t test or chi
square test

• By area under normal curve
• Here z = standard normal deviate is
calculated
• Corresponding to the z value, the area under
the curve between the mean and z is determined (A)
• Probability is given by 2(0.5 - A)
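
Instead of reading A from a printed table, the area can be computed with the error function. The sketch below assumes that A is the area between the mean and z on one side of the curve, which is what makes 2(0.5 − A) a two-tailed probability; the function names are illustrative only.

```python
import math

def area_mean_to_z(z):
    """Area A under the standard normal curve between the mean (0) and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def two_tailed_p(z):
    """Probability of a value at least |z| SD from the mean: p = 2(0.5 - A)."""
    return 2 * (0.5 - area_mean_to_z(z))

for z in (1.0, 1.96, 2.0, 3.0):
    print(f"z = {z}: A = {area_mean_to_z(z):.4f}, p = {two_tailed_p(z):.4f}")
```

A z of 2 gives p of about 0.0455, which matches the 4.55% of observations lying beyond mean ± 2 SD.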


Thank you
For more details please visit

