1. What is Statistics
• Statistics is a mathematical science pertaining to collection,
analysis, interpretation, and presentation of data.
• It is applicable to a wide variety of academic disciplines from
the physical and social sciences to the humanities, as well as to
business, government, medicine and industry.
• Statistical skills enable you to intelligently collect, analyze, and
interpret data relevant to your decision-making.
• Statistical concepts enable us to solve problems in a diversity of
contexts.
• Statistical thinking enables you to add substance to your
decisions.
2. What is Data?
• Data can be defined as a systematic record of a particular
quantity. It is the different values of that quantity represented
together in a set. It is a collection of facts and figures to be used
for a specific purpose such as a survey or analysis. When
arranged in an organized form, data can be called information.
4. Qualitative or Categorical Data
• Qualitative data is information that cannot be measured in
the form of numbers. It is also known as categorical data. It normally
comprises words and narratives, and its values are labelled with names.
• It conveys information about the qualities of the things being studied. The
outcome of qualitative data analysis can take the form of highlighted
key words, extracted information, and elaborated ideas.
5. Quantitative Data or Numerical data
• Quantitative data is information that can be counted or
measured and then analyzed statistically.
Numerical data is another name for quantitative data. Simply, it
gives information about the quantities of items in the data, and
those quantities can be measured and expressed in
terms of numbers.
6. Nominal Data
• Nominal data are used to label variables that have no quantitative value and no order. So, if
you change the order of the values, the meaning remains the same.
• Thus, nominal data are observed but not measured, are unordered and non-equidistant, and have no
meaningful zero.
• The only comparison you can make on nominal data is to state that one observation is (or isn't)
equal to another (equality or inequality), and you can use this to group observations.
• Nominal data have no inherent order, so sorting them carries no meaning.
• Nor can you perform arithmetic on them, as that is reserved for numerical data. With
nominal data, you can calculate frequencies, proportions, percentages, and the central point (the mode).
7. Examples of Nominal data:
• What is your gender?
– Male
– Female
• What languages do you speak?
– Tamil
– German
– French
– English
• What’s your nationality?
– American
– Indian
– Japanese
– German
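The calculations available for nominal data (frequencies, proportions, percentages, and the mode) can be sketched in Python. This is a minimal illustration only; the list of survey answers is hypothetical and not taken from any real data set.

```python
from collections import Counter

# Hypothetical answers to "What's your nationality?" (illustrative values only)
responses = ["Indian", "American", "Indian", "Japanese",
             "German", "Indian", "American", "German"]

counts = Counter(responses)                 # frequency of each category
total = len(responses)

# Proportion and percentage of each category
proportions = {label: n / total for label, n in counts.items()}
percentages = {label: 100 * p for label, p in proportions.items()}

# The mode is the most frequent category; it is the only "central point"
# that makes sense for unordered labels
mode = counts.most_common(1)[0][0]

print(counts)
print(percentages)
print("Mode:", mode)
```

Note that no mean or median is computed: the labels have no numeric value and no order.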
8. Ordinal Data
• Ordinal data is almost the same as nominal data, except that its
categories can be ordered, like 1st, 2nd, etc. However, the
relative distances between adjacent categories are not known to be equal.
• Ordinal data is observed but not measured, is ordered but non-equidistant, and has
no meaningful zero. Ordinal scales are often used for measuring happiness,
satisfaction, etc.
• As ordinal data are ordered, they can be arranged by making basic comparisons
between the categories, for example, greater or less than, higher or lower, and so on.
• With ordinal data, you can calculate the same things as with nominal data (frequencies,
proportions, percentages, central point), plus order-based summary statistics such as
the median and percentiles. Bayesian statistics can be applied as well.
9. Examples of Ordinal data:
• Opinion
– Agree
– Mostly agree
– Neutral
– Mostly disagree
– Disagree
• Time of day
– Morning
– Noon
– Night
• Economic status:
– Low
– Medium
– High
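Because ordinal categories can be ranked, they can be sorted and a median category can be found, even though the distances between categories are unknown. The sketch below assumes the ordering Low < Medium < High and uses a hypothetical list of observations.

```python
# Ordered categories, lowest to highest (this ordering is an assumption)
levels = ["Low", "Medium", "High"]
rank = {level: position for position, level in enumerate(levels)}

# Hypothetical observations of economic status (illustrative values only)
observations = ["Medium", "Low", "High", "Medium", "Low", "Medium", "High"]

# Sort by rank rather than alphabetically
ordered = sorted(observations, key=lambda level: rank[level])

# With an odd number of observations, the median is the middle item
median_category = ordered[len(ordered) // 2]

print(ordered)
print("Median category:", median_category)
```

The ranks are used only for comparison. Averaging them would treat the categories as equidistant, which ordinal data do not guarantee.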
10. Interval Data or Discrete data
• Interval data are measured and ordered with equidistant items but have no meaningful
zero.
• The key point of an interval scale is that the word 'interval' signifies 'space in between':
interval scales tell us not only about the order
but also about the size of the difference between items.
• Even though interval data can look very similar to ratio data, the difference
lies in their defined zero-points. If the zero-point of the scale has been chosen
arbitrarily, then the data cannot be ratio data and must be interval data.
• The descriptive statistics that you can calculate for interval data are the central
point (mean, median, mode), range (minimum, maximum), and spread (percentiles,
interquartile range, and standard deviation).
• In addition, other statistical data analysis techniques can be applied for further
analysis.
11. Examples of Interval data:
• The number of students in a class
• The number of workers in a company
• Time on a 12-hour clock:
the difference between 5 minutes and 10 minutes is the
same as the difference between 15 minutes and 20 minutes.
• The number of test questions you answered correctly
• Temperature (30 °C, 60 °C, etc.)
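The effect of an arbitrary zero-point can be illustrated with temperature. In the sketch below, two hypothetical readings are compared in Celsius (arbitrary zero) and in Kelvin (absolute zero). The helper name celsius_to_kelvin is an illustrative choice, not from the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

# Two hypothetical temperature readings in degrees Celsius
a_c, b_c = 30.0, 60.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on an interval scale:
# the gap is the same size in both units
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios are not meaningful on an interval scale:
# the Celsius ratio changes when the zero-point moves
ratio_c = b_c / a_c
ratio_k = b_k / a_k

print("Difference:", diff_c, "vs", diff_k)
print("Ratio:", ratio_c, "vs", ratio_k)
```

The Celsius ratio suggests one reading is "twice as hot" as the other, but the Kelvin ratio shows this is an artifact of where zero was placed.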
12. Ratio Data or continuous data
• Ratio data are measured and ordered with equidistant items and
a meaningful zero and, unlike interval data, can never be negative.
• A good example of ratio data is the measurement of
heights. Height can be measured in centimetres, inches, meters, or
feet, and it is not possible to have a negative height.
• Ratio data tell us about the order of values and the
differences between them, and they have an absolute zero. This permits
a wide range of calculations and inferences to be performed and
drawn.
13. Example of Ratio data:
• Age (from 0 years to 100+)
• The height of children
• Distance (measured with a ruler or any other assessing
device)
• Speed of cars (80-100)
• The amount of time required to complete a project (1-2
hours)
16. Application of statistics
Biology and Medicine
• In biology and the medical sciences, statistical tools are regularly used
for collecting, presenting, and analyzing
observed data pertaining to the causes and incidence of
diseases.
• For example, statistics on the pulse rate, body temperature, blood
pressure, etc. of patients help the physician diagnose
the disease properly. Additionally, statistics help in testing the
efficacy of manufactured drugs, injections, or medicines for
controlling or curing certain diseases.
17. Measure of central tendency
• A measure of central tendency is a summary
statistic that represents the center point or a
typical value of a data set or sample.
• In statistics, there are three common measures of central
tendency:
–The mode
–The median
–The mean
18. Mean
The mean is the average of all values in a sample set.
For example,
Calculate the mean of the following data:
1 5 4 3 2
Sum the scores (∑X):
1 + 5 + 4 + 3 + 2 = 15
Divide the sum (∑X = 15) by the number of scores (N = 5):
15 / 5 = 3
• Mean = X̄ = 3
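The same calculation can be sketched in Python, first by hand (sum divided by count) and then with the standard library's statistics.mean as a cross-check.

```python
from statistics import mean

scores = [1, 5, 4, 3, 2]

total = sum(scores)          # sum of the scores
n = len(scores)              # number of scores (N)
manual_mean = total / n      # mean = sum / N

# Cross-check against the standard library
library_mean = mean(scores)

print(total, n, manual_mean, library_mean)
```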
19. Median
• What is the median of the following scores:
24 18 19 42 12 16
• Sort the scores:
42 24 19 18 16 12
• Determine the position of the middle score:
middle = (N + 1) / 2 = (6 + 1) / 2 = 3.5
• Median = average of 3rd and 4th scores:
(19 + 18) / 2 = 18.5
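A minimal Python sketch of the same steps follows. It sorts in ascending order (the result is the same as for the descending sort shown above) and handles both odd and even sample sizes; statistics.median is used as a cross-check.

```python
from statistics import median

scores = [24, 18, 19, 42, 12, 16]

ordered = sorted(scores)     # ascending order
n = len(ordered)

if n % 2 == 1:
    # Odd count: the median is the single middle score
    manual_median = ordered[n // 2]
else:
    # Even count: the median is the average of the two middle scores
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Cross-check against the standard library
library_median = median(scores)

print(ordered, manual_median, library_median)
```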
20. Mode
• The mode is the score that occurs most frequently in a set
of data
• Example:
8, 4, 6, 8, 5, 9, 7, 8
Mode=8
• When a distribution has two “modes,” it is called bimodal
• If a distribution has more than 2 “modes,” it is called
multimodal
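The mode can be found in Python with the standard library (statistics.multimode requires Python 3.8 or later). The bimodal list below is a hypothetical example added for illustration.

```python
from statistics import mode, multimode

scores = [8, 4, 6, 8, 5, 9, 7, 8]
single_mode = mode(scores)            # most frequent score

# multimode returns every value tied for the highest frequency
modes_of_scores = multimode(scores)

# Hypothetical bimodal data: 3 and 7 each occur twice
bimodal_scores = [2, 3, 3, 5, 7, 7]
modes_of_bimodal = multimode(bimodal_scores)

print(single_mode, modes_of_scores, modes_of_bimodal)
```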
21. Measures of variability
• Dispersion is the state of being dispersed or spread out. Statistical
dispersion means the extent to which numerical data are likely
to vary about an average value. In other words, dispersion helps
us understand the distribution of the data.
22. Absolute Measure of Dispersion
An absolute measure of dispersion is expressed in the same unit as the original data set.
Absolute dispersion methods express the variation in terms of the average of the
deviations of observations, such as the standard deviation or mean deviation. They include the range, standard
deviation, quartile deviation, etc. The types of absolute measures of dispersion are:
Range: It is simply the difference between the maximum value and the minimum
value given in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6
Variance: Subtract the mean from each value in the set, square each difference,
add the squares, and finally divide the sum by the total number of values in the data set.
Variance (σ²) = ∑(X − μ)² / N
Standard Deviation: The square root of the variance is known as the standard
deviation, i.e. S.D. = σ = √(σ²).
Quartiles and Quartile Deviation: The quartiles are values that divide a list of
numbers into quarters. The quartile deviation is half of the distance between the third
quartile and the first quartile.
23. Range
It is the simplest method of measurement of dispersion.
It is defined as the difference between the largest and the smallest item in a given
distribution.
Range = Largest item (L) – Smallest item (S)
Interquartile Range
It is defined as the difference between the Upper Quartile and Lower Quartile of a
given distribution.
Interquartile Range = Upper Quartile (Q3) – Lower Quartile (Q1)
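The range and interquartile range can be sketched in Python as follows, reusing the data set 1, 3, 5, 6, 7 from the range example. Quartiles can be computed under several conventions; this sketch assumes the default ("exclusive") method of statistics.quantiles, available from Python 3.8, so other tools may report slightly different Q1 and Q3 values.

```python
from statistics import quantiles

data = [1, 3, 5, 6, 7]

# Range = largest item (L) - smallest item (S)
data_range = max(data) - min(data)

# Cut points that split the data into four equal parts: Q1, Q2 (median), Q3.
# Assumption: the library's default "exclusive" quartile convention.
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range = Q3 - Q1
iqr = q3 - q1

# Quartile deviation = half of the interquartile range
quartile_deviation = iqr / 2

print(data_range, q1, q3, iqr, quartile_deviation)
```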
24. Variance
Variance is a measure of how far a set of data (numbers) is spread out from its mean
(average) value.
The larger the variance, the more scattered the data are around the mean; the smaller
the variance, the less scattered they are. Therefore, it is called a
measure of spread of data from the mean.
The formula for variance is
Var(X) = E[(X − μ)²]
The variance is the square of the standard deviation, i.e.,
Variance = (Standard deviation)² = σ²
25. Variance
Example: Find the variance of the numbers 3, 8, 6, 10, 12, 9,
11, 10, 12, 7.
Given,
3, 8, 6, 10, 12, 9, 11, 10, 12, 7
Step 1: Compute the mean of the 10 values given.
Mean (μ) = (3+8+6+10+12+9+11+10+12+7) / 10 = 88 / 10 = 8.8
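Only Step 1 of this example appears here. A minimal sketch of the remaining steps follows, assuming the population formula σ² = ∑(X − μ)² / N (dividing by N); if a sample variance is intended, the divisor would be N − 1 and the result would differ.

```python
import math
from statistics import pvariance

data = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]
n = len(data)

# Step 1: mean of the values
mu = sum(data) / n

# Step 2: squared deviation of each value from the mean
squared_deviations = [(x - mu) ** 2 for x in data]

# Step 3: population variance = average of the squared deviations
# (assumption: divide by N, not N - 1)
variance = sum(squared_deviations) / n

# Standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

# Cross-check against the standard library's population variance
library_variance = pvariance(data)

print(mu, variance, std_dev, library_variance)
```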
27. Coefficient of variation
• The coefficient of variation (CV) is a relative measure of variability that
indicates the size of the standard deviation in relation to the mean.
• It is a standardized, unitless measure that allows you to compare
variability between disparate groups and characteristics.
• It is also known as the relative standard deviation (RSD).
• The coefficient of variation facilitates meaningful comparisons in
scenarios where absolute measures cannot.
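As a rough illustration, the coefficient of variation can be computed as standard deviation divided by mean. The function name coefficient_of_variation and both data sets are hypothetical, and the sketch assumes the population standard deviation (statistics.pstdev).

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Return standard deviation / mean (population SD assumed)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m

# Hypothetical measurements in different units (illustrative values only)
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

sd_heights = pstdev(heights_cm)
sd_weights = pstdev(weights_kg)

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(sd_heights, sd_weights)
print(cv_heights, cv_weights)
```

Both groups have the same standard deviation, so the absolute measure cannot separate them, yet the weights vary more relative to their mean. Multiply the CV by 100 to express it as a percentage.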