Bio statistics1
1. Bio Statistics
Part 1
INDIAN DENTAL ACADEMY
Leader in continuing dental education
www.indiandentalacademy.com
2. Contents
• Introduction
• Common Statistical Terms
• Source of data
• Types of data
• Data presentation
• Measures of statistical averages or central tendency
• Types of variability
• Measures of variation or dispersion
• Normal distribution or normal curve
• Sampling
• Determination of sample size
• Probability or p value
4. • Any science needs precision for its
development.
• For precision, facts, observations or
measurements have to be expressed in figures.
• “It has been said that when you can measure what
you are speaking about and express it in
numbers, you know something about it, but
when you cannot express it in numbers your
knowledge is of a meagre and unsatisfactory
kind.”
- Lord Kelvin
5. • Similarly in medicine, be it diagnosis,
treatment or research, everything depends
on measurement.
• E.g. you have to count the number of
missing teeth, or measure the vertical
dimension, and express the result as a
number so that it makes sense.
6. • Statistic or datum means a measured or
counted fact or piece of the information stated
as a figure such as height of one person, birth
weight of a baby etc.
• Statistics or data is plural of the same.
• Statistics is the science of figures.
• Bio statistics is the term used when tools of
statistics are applied to data that is derived from
biological sciences such as medicine.
7. Applications and uses of bio
statistics as a science
• In physiology and anatomy
– To define the limits of normality for variables
such as height, weight or blood pressure
in a population.
– Variation beyond the natural limits may be
pathological, i.e. abnormal due to the play of
certain external factors.
– To find the correlation between two variables such as
height and weight.
8. Applications and uses of bio
statistics as a science
• In pharmacology
– To find the action of drugs
– To compare the action of two drugs or two
successive dosages of same drug
– To find the relative potency of a new drug with
respect to a standard drug
9. Applications and uses of bio
statistics as a science
• In medicine
– To compare the efficacy of a particular drug,
operation or line of treatment
– To find an association between two attributes
such as cancer and smoking
– To identify signs and symptoms of disease
10. Applications and uses of bio
statistics as a science
• In community medicine and public health
– To test usefulness of sera or vaccine in the
field
– In epidemiologic studies the role of causative
factors is statistically tested
11. Applications and uses of bio
statistics as a science
• In research
– It helps in compiling data, drawing
conclusions and making recommendations.
12. Applications and uses of bio
statistics as a science
• For students
– By learning the methods of biostatistics a
student learns to evaluate articles published
in medical and dental journals and papers read
at medical and dental conferences.
– The student also comes to understand the basic
methods of observation in clinical practice and
research.
14. Common Statistical Terms
• Constant
– Quantities that do not vary, e.g. in biostatistics the mean
and standard deviation are considered constant for a
given population
• Variable
– A characteristic that takes different values for
different persons, places or things, such as height,
weight or blood pressure
• Population
– The population includes all persons, events and objects
under study. It may be finite or infinite.
15. Common Statistical Terms
• Sample
– Defined as a part of a population generally
selected so as to be representative of the
population whose variables are under study
• Parameter
– It is a constant that describes a population
e.g. in a college there are 40% girls. This
describes the population, hence it is a
parameter.
16. Common Statistical Terms
• Statistic
– A statistic is a constant that describes the
sample, e.g. in a sample of 200 students from the same
college, 45% are girls. This 45% is a statistic as
it describes the sample.
• Attribute
– A characteristic on the basis of which the
population can be divided into categories or
classes, e.g. gender, caste, religion.
18. Source of data
• The main sources for collection of data
– Experiments
– Surveys
– Records
• Experiments
– Experiments are performed to collect data for
investigations and research by one or more
workers.
19. Source of data
• Surveys
– Carried out for epidemiological studies in the
field by trained teams to find the incidence or
prevalence of health or disease in a
community.
• Records
– Records are maintained as a routine in
registers and books over a long period of time
– They provide ready-made data.
21. Types of data
• Data is of two types
• Qualitative or discrete data
• Quantitative or continuous data
22. Types of data
• Qualitative or discrete data
– In such data there is no notion of magnitude or size of
an attribute, as the attribute cannot be measured.
– The number of persons having the same attribute
varies and is counted.
– E.g. out of 100 people, 75 have class I occlusion,
15 have class II occlusion and 10 have class III
occlusion.
– Class I, II and III are attributes, which cannot be measured
in figures; only the number of people having each can be
determined.
23. Types of data
• Quantitative or continuous data
– Here the attribute has a magnitude. Both the
attribute and the number of persons having
the attribute vary.
– E.g. freeway space. It varies from patient to patient.
It is a quantity with a different value for each
individual and is measurable. It is continuous
as it can take any value between 2 and 4 mm, such as
2.10, 2.55 or 3.07.
25. Data presentation
• Statistical data once collected should be
systematically arranged and presented
– To arouse interest of readers
– For data reduction
– To bring out important points clearly and
strikingly
– For easy grasp and meaningful conclusions
– To facilitate further analysis
– To facilitate communication
26. Data presentation
• Two main types of data presentation are
– Tabulation
– Graphic representation with charts and
diagrams
27. Data presentation
Tabulation
• It is the most common method
• Data presentation is in the form of
columns and rows
• It can be of the following types
– Simple tables
– Frequency distribution tables
28. Simple Table
Number of patients at KIDS, Bgm

Month      Patients
Jan 06     2,800
Feb 06     1,900
March 06   1,750
29. Frequency distribution table
• In a frequency distribution table, the data
are first split into convenient groups (class
intervals), and the number of items
(frequency) occurring in each group
is shown in the adjacent column.
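A minimal Python sketch of how such a table can be built. The patient ages and the class-interval width of 10 are made-up illustrative values, not data from the slides.

```python
from collections import Counter

# Hypothetical ages of 12 patients (illustrative values only).
ages = [23, 35, 41, 28, 52, 37, 45, 31, 29, 58, 44, 36]
width = 10  # assumed class-interval width

# Map each observation to the lower limit of its class interval and count.
counts = Counter((age // width) * width for age in ages)

# One row per class interval: (label, frequency), in ascending order.
table = [(f"{low}-{low + width - 1}", counts[low]) for low in sorted(counts)]

for label, freq in table:
    print(f"{label:>7}  {freq}")
```

The frequencies in the right-hand column add up to the total number of observations, which is a quick check that no value was dropped.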
31. Data presentation
Charts and diagrams
• Useful method of presenting statistical
data
• Powerful impact on imagination of the
people
32. Charts and diagrams
• They are
– Bar chart
– Histogram
– Frequency polygon
– Frequency curve
– Line diagram
– Cumulative frequency diagram or ogive
– Scatter diagram
– Pie chart
– Pictogram
– Spot map or map diagram
33. Bar chart
• The length of the bars, drawn vertically or horizontally,
is proportional to the frequency of the variable.
• A suitable scale is chosen
• Bars are usually equally spaced
34. Bar chart
• They are of three types
– Simple bar chart
– Multiple bar chart
• two or more variables are grouped together
– Component bar chart
• bars are divided into two or more parts
• each part represents a certain item and is
proportional to the magnitude of that item
40. Frequency polygon
• Obtained by joining the midpoints of the tops of the
histogram blocks (at the height of each frequency) with
straight lines, usually forming a polygon
42. Frequency curve
• When the number of observations is very large
and the class interval is reduced, the
frequency polygon loses its angulations,
becoming a smooth curve known as the
frequency curve
46. Cumulative Frequency Diagram
• Graphical representation of cumulative
frequency.
• The cumulative frequency of a class is obtained by adding
its frequency to the frequencies of all previous classes
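As a small illustration, cumulative frequencies are simply a running total of the class frequencies. The frequencies below are hypothetical values chosen for the sketch.

```python
from itertools import accumulate

# Hypothetical class frequencies, in class order (illustrative values only).
frequencies = [3, 4, 3, 2]

# Running total: each entry adds the frequencies of all previous classes.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of observations; plotting these values against the upper class limits gives the ogive.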
48. Scatter or Dot diagram
• shows relationship between two variables
• If the dots are clustered showing a straight
line, it shows a relationship of linear nature
50. Pie chart
• Here the frequencies of the groups are shown
as segments of a circle
• The angle of each segment denotes the frequency
• The angle is calculated by
$$\text{angle} = \frac{\text{class frequency} \times 360^\circ}{\text{total observations}}$$
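A short sketch of the angle calculation, reusing the occlusion counts from the qualitative-data example earlier in the deck (75, 15 and 10 out of 100 people). The variable names are illustrative.

```python
# Occlusion counts from the qualitative-data example (100 people).
frequencies = {"Class I": 75, "Class II": 15, "Class III": 10}
total = sum(frequencies.values())

# Angle of each segment = class frequency x 360 / total observations.
angles = {name: freq * 360 / total for name, freq in frequencies.items()}
print(angles)
```

The angles of all segments should add up to 360 degrees.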
57. • Average value in a distribution is the one
central value around which all the other
observations are concentrated
• Average value helps
– to find most characteristic value of a set of
measurements
– to find which group is better off by comparing
the average of one group with that of the
other
58. • The most commonly used averages are
– mean
– median
– mode
59. Mean
• Refers to the arithmetic mean
• It is the sum of all the observations
divided by the total number of observations (n)
• Denoted by $\bar{x}$ for a sample and $\mu$ for a population
• $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
• Advantages – it is easy to calculate
• Disadvantages – influenced by extreme values
60. Median
• When all the observations are arranged
in either ascending or descending
order, the middle observation is known as the
median
• With an even number of observations, the average of the
two middle values is taken
• The median is a better indicator of central value when
the data contain extreme values, as it is not affected by them
61. Mode
• Most frequently occurring observation in a
data is called mode
• Not often used in medical statistics.
62. Example
• Number of decayed teeth in 10 children
2, 2, 4, 1, 3, 0, 10, 2, 3, 8
• Mean = 35 / 10 = 3.5
• Median: ordered values are 0, 1, 2, 2, 2, 3, 3, 4, 8, 10, so
median = (2 + 3) / 2 = 2.5
• Mode = 2 (occurs 3 times)
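The three averages for the decayed-teeth data can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

# Number of decayed teeth in 10 children (data from the example).
decayed_teeth = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]

mean = statistics.mean(decayed_teeth)      # sum of observations / n
median = statistics.median(decayed_teeth)  # average of the two middle values
mode = statistics.mode(decayed_teeth)      # most frequent observation

print(mean, median, mode)
```

Note how the two extreme values (8 and 10) pull the mean above the median, which is the disadvantage of the mean noted earlier.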
64. • There are three types of variability
– Biological variability
– Real variability
– Experimental variability
• Experimental variability is of three
subtypes
– Observer error
– Instrumental error
– Sampling error
65. Biological variability
• It is the natural difference which occurs in
individuals due to age, gender and other
attributes which are inherent
• This difference is small and occurs by
chance and is within certain accepted
biological limits
• e.g. vertical dimension may vary from
patient to patient
66. Real Variability
• Such variability exceeds the normal
biological limits
• The cause of the difference is not inherent or
natural but is due to some external factors
• E.g. the difference in incidence of cancer
between smokers and non-smokers may be
due to excessive smoking and not to
chance alone
67. Experimental Variability
• It occurs due to the experimental study itself
• It is of three types
– Observer error
• the investigator may alter some information or not record the
measurement correctly
– Instrumental error
• this is due to defects in the measuring instrument
• observer error and instrumental error together are called
non-sampling errors
– Sampling error or errors of bias
• this is the error which occurs when the samples are not
chosen at random from the population
• thus the sample does not truly represent the population
69. • Biological data collected by measurement
shows variation
• e.g. BP of an individual can show variation
even if taken by standardized method and
measured by the same person.
• Thus one should know what the normal
variation is and how to measure it.
70. • The various measures of variation or
dispersion are
– Range
– Mean or average deviation
– Standard deviation
– Coefficient of variation
71. Range
• It is the simplest measure of dispersion
• Defined as the difference between the
highest and the lowest figures in a sample
• Defines the normal limits of a biological
characteristic, e.g. freeway space ranges
between 2 and 4 mm
• Not satisfactory as it is based on the two extreme
values only
72. Mean deviation
• It is the average of the differences or
deviations from the mean in any
distribution, ignoring the + or – sign
• Denoted by MD
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
$x$ = observation
$\bar{x}$ = mean
$n$ = number of observations
73. Standard deviation
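• Also called root mean square deviation
• It is an improvement over mean deviation and is
used most commonly in statistical analysis
• Denoted by SD or s for a sample and σ for a
population
• Given by the formula
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{or} \quad \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
(divisor n when describing a whole population, n – 1 when estimating from a sample)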
74. • The greater the standard deviation, the greater
the magnitude of dispersion from the mean
• A small standard deviation means a high
degree of uniformity of the observations
• Usually measurements beyond the range of
mean ± 2 SD are considered rare or unusual in
any distribution
75. • Uses of Standard Deviation
– It summarizes the deviation of a large
distribution from its mean.
– It helps in finding a suitable sample size,
e.g. greater deviation indicates the need for a
larger sample to draw meaningful conclusions
– It helps in the calculation of standard error, which
helps us to determine whether the difference
between two samples is by chance or real
76. Coefficient of variation
• It is used to compare the variability of attributes
measured in different units, e.g. height
and weight
• Denoted by CV
$$CV = \frac{SD}{\text{Mean}} \times 100$$
• It is expressed as a percentage
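The four measures of dispersion can be sketched in Python on the decayed-teeth data used in the averages example. Treating those 10 children as a sample (divisor n – 1 for the SD) is an assumption made for this illustration.

```python
import statistics

# Number of decayed teeth in 10 children (data from the averages example).
data = [2, 2, 4, 1, 3, 0, 10, 2, 3, 8]
n = len(data)
mean = statistics.mean(data)

# Range: highest minus lowest observation.
value_range = max(data) - min(data)

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n

# Standard deviation with divisor n - 1 (sample SD, an assumption here).
sd = statistics.stdev(data)

# Coefficient of variation, expressed as a percentage of the mean.
cv = sd / mean * 100

print(value_range, mean_deviation, round(sd, 2), round(cv, 1))
```

`statistics.pstdev` would give the divisor-n version of the SD instead.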
78. • A great deal of physiologic variation occurs in
any observation
• It is therefore necessary to
– Define normal limits
– Determine the chances of an observation
being normal
– Determine the proportion of observations
that lie within a given range
79. • Normal distribution or normal curve used
most commonly in statistics helps us to
find these
• Large number of observations with a
narrow class interval gives a frequency
curve called the normal curve
80. • It has the following characteristics
• Bell shaped
• Bilaterally symmetrical
• Frequency increases from one side,
reaches its highest point and decreases exactly
the way it had increased
• The highest point denotes the mean, median
and mode, which coincide
82. • Mean ± 1 SD includes 68.27% of all observations.
Such observations are fairly common.
• Mean ± 2 SD includes 95.45% of all observations,
i.e. by convention values beyond this range are
uncommon or rare. Their chance of being
normal is 100 – 95.45%, i.e. only 4.55%.
• Mean ± 3 SD includes 99.73% of all observations. Values
beyond this range are very rare. Their chance of being
normal is only 0.27%.
• These limits on either side of the mean are
called confidence limits
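These percentages can be reproduced from the normal distribution itself. The sketch below uses the standard identity that the proportion within mean ± k SD equals erf(k / √2); the helper name `coverage` is illustrative.

```python
import math

def coverage(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))

# Percentage of observations within 1, 2 and 3 SD of the mean.
percentages = {k: round(100 * coverage(k), 2) for k in (1, 2, 3)}
print(percentages)
```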
85. • The look of a frequency distribution curve may
vary depending on its mean and SD. Thus it
becomes necessary to standardize it.
• E.g. one study has an SD of 3 and another an SD of
2, so it becomes difficult to compare them
• Thus the normal curve is standardized by using the
unit of standard deviation to place any
measurement with reference to the mean.
• The curve that emerges through this procedure
is called the standard normal curve
87. Properties of standard normal
curve
• Smooth and bell shaped
• Perfectly symmetrical
• Based on an infinite number of observations,
thus the curve never touches the X axis
• Mean is zero
• SD is always 1
• Total area under the curve is 1
• Mean, median and mode coincide
88. • The unit of SD here is the relative or standard
normal deviate and is denoted by Z
$$Z = \frac{x - \bar{x}}{SD} = \frac{\text{Observation} - \text{Mean}}{SD}$$
89. • With the help of Z value we can find the
area under the curve from a table
• This area helps to give the P value
92. • It is not possible to include each and every
member of a population, as this would be time
consuming, costly and laborious.
• Therefore sampling is done
• Sampling is a process by which some units of a
population or universe are selected for study;
by subjecting them to statistical computation,
conclusions are drawn about the population from
which these units are drawn
93. • The sample will be representative of the entire
population only if
– It is sufficiently large
– It is unbiased
• Such a sample will have its statistics almost equal
to the parameters of the entire population
• Two main characteristics of a representative
sample are
– Precision
– Unbiased character
94. Precision
• Precision depends on the sample size
• Ordinarily the sample size should not be less than 30
$$\text{Precision} = \frac{\sqrt{n}}{s}$$
n = sample size, s = standard deviation
• Precision is directly proportional to the square root of the sample
size: the greater the sample size, the greater the precision
• Also, the greater the SD, the lower the precision
• Thus in such cases, to obtain precision, the sample size
needs to be increased
95. Unbiased character
• The sample should be unbiased i.e. every
individual should have an equal chance to be
selected in the sample.
• Thus a standard random sampling method
should be used
• Non-sampling errors can be taken care of by
– Using standardized instruments and criteria
– Single, double or triple blind trials
– Use of a control group
97. For Quantitative Data
• The investigator needs to decide how
large an error due to sampling defect is
allowable, i.e. the allowable error L
• The investigator should either start with an
assumed SD or do a pilot study to
estimate the SD
$$\text{sample size } n = \frac{4\,SD^2}{L^2}$$
98. For Quantitative Data
• The mean pulse rate of a population is 70 beats
per min with a standard deviation of 8
beats. What will be the sample size if the
allowable error is ± 1 beat?
$$n = \frac{4 \times 8 \times 8}{1^2} = 256$$
• If L is smaller, n will be larger, i.e. the larger the
sample size, the smaller the error.
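The pulse-rate example can be reproduced with a small helper. The function name `sample_size_quantitative` is hypothetical and simply encodes the formula n = 4 SD² / L² from the slides.

```python
def sample_size_quantitative(sd, allowable_error):
    """Sample size for quantitative data: n = 4 * SD^2 / L^2."""
    return 4 * sd ** 2 / allowable_error ** 2

# Pulse-rate example: SD = 8 beats, allowable error L = 1 beat.
n = sample_size_quantitative(sd=8, allowable_error=1)
print(n)

# Halving the allowable error quadruples the required sample size.
n_half_error = sample_size_quantitative(sd=8, allowable_error=0.5)
print(n_half_error)
```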
99. For qualitative data
• In such data we deal with proportions
$$\text{sample size } n = \frac{4pq}{L^2}$$
• p = proportion of positive character
• q = proportion of negative character
• q = 1 – p (or 100 – p if expressed in percent)
• L = allowable error, usually 10% of p
100. For qualitative data
• E.g. the incidence rate in the last influenza epidemic was found to
be 5% of the population exposed
• What should be the size of the sample
to find the incidence rate in the current epidemic if the
allowable error is 10%?
• p = 5%, q = 95%
• L = 10% of p = 0.5%
$$n = \frac{4 \times 5 \times 95}{0.5^2} = 7600$$
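The influenza example follows the same pattern. The helper `sample_size_qualitative` is a hypothetical name; it takes p and L in percent, as in the slides.

```python
def sample_size_qualitative(p, allowable_error):
    """Sample size for qualitative data: n = 4 * p * q / L^2 (values in percent)."""
    q = 100 - p  # proportion without the character, in percent
    return 4 * p * q / allowable_error ** 2

# Influenza example: p = 5%, L = 10% of p = 0.5%.
n = sample_size_qualitative(p=5, allowable_error=0.5)
print(n)
```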
102. • The concept of probability is very important in
statistics
• Probability is the chance of occurrence of any
event or of any permutation or combination of events.
• It is denoted by p for a sample and P for a
population
• In various tests of significance we are often
interested to know whether the observed
difference between 2 samples is due to chance
(sampling variation) or is real.
• Here the probability or p value is used
103. • P ranges from 0 to 1
• 0 = there is no chance that the observed
difference is due to sampling
variation
• 1 = it is absolutely certain that the observed
difference between 2 samples is due to
sampling variation
• However such extreme values are rare.
104. • P = 0.4 means the chance that the difference is
due to sampling variation is 4 in 10
• Obviously the chance that it is not due to
sampling variation will be 6 in 10
• The essence of any test of significance is
to find the p value and draw an inference
105. • If the p value is 0.05 or more
– it is customary to accept that the difference is due
to chance (sampling variation).
– The observed difference is said to be
statistically not significant.
• If the p value is less than 0.05
– the observed difference is taken to be not due to chance but
due to the role of some external factors.
– The observed difference here is said to be
statistically significant.
106. Determination of p value
• From the shape of the normal curve
• We know that about 95% of observations lie within
mean ± 2 SD. Thus the probability of a value
falling outside this range is about 5%
• From probability tables
• The p value is also determined from probability
tables in the case of Student's t test or the chi
square test
107. Determination of p value
• By area under the normal curve
• Here z, the standard normal deviate, is
calculated
• Corresponding to the z value, the area under
the curve between the mean and z is determined (A)
• The two-sided probability is given by 2(0.5 – A)
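A minimal sketch of this calculation, using `statistics.NormalDist` from the Python standard library in place of a printed table. It assumes A is the area between the mean and z, and the function name `two_tailed_p` is illustrative.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-sided p value from a standard normal deviate z, as 2 * (0.5 - A)."""
    # A = area under the standard normal curve between the mean (0) and |z|.
    area = NormalDist().cdf(abs(z)) - 0.5
    return 2 * (0.5 - area)

# z = 1.96 sits almost exactly at the conventional 0.05 cut-off.
p = two_tailed_p(1.96)
print(round(p, 3))
```

A z value of 0 (observation equal to the mean) gives p = 1, and larger |z| values give progressively smaller p values.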
108. Thank you