3. Introduction
The 21st century is often
referred to as the
‘century of data’.
Massive amounts of data
are recorded in the
digital world we live in
today.
One of the most pressing
challenges in modern
science and technology is
to cope with this massive
amount of data and to
analyze it to extract
important information.
4. As It Happened!!
The term statistics is ultimately derived
from the New Latin statisticum collegium
("council of state") and the Italian word
statista ("statesman" or "politician").
The original principal purpose of the German
Statistik was to provide data for use by governmental and
(often centralized) administrative bodies.
It acquired the meaning of the collection
and classification of data generally in the
early 19th century.
5. 1791
The term was introduced into English in 1791 by Sir John
Sinclair when he published the first of 21 volumes titled
“Statistical Account of Scotland”.
1845
The first book to have 'statistics' in its title was
"Contributions to Vital Statistics" (1845) by Francis GP
Neison, actuary to the Medical Invalid and General Life
Office.
7. Scatter Diagram
• The scatter diagram graphs pairs of
numerical data, with one variable on
each axis, to look for a relationship
between them. If the variables are
correlated, the points will fall along a
line or curve. The better the
correlation, the tighter the points will
hug the line.
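How tightly the points hug a straight line can also be summarised as a number. The sketch below assumes Pearson's correlation coefficient as that measure (the slide does not name one); the function name `pearson_r` and the sample data are invented for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (assumed measure)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either variable is constant.
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: these points lie exactly on a rising straight line.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]
r = pearson_r(hours, scores)
print(f"r = {r:.2f}")
```

Values of r near +1 or -1 correspond to points that hug a straight line closely, while values near 0 suggest no linear relationship. A curved relationship may not be captured well by this coefficient.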
8. Histogram
• A histogram is a plot that lets you discover, and
show, the underlying frequency distribution (shape)
of a set of continuous data. This allows the
inspection of the data for its underlying distribution
(e.g., normal distribution), outliers, skewness, etc.
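The counting behind a histogram can be sketched in a few lines. This example assumes equal-width classes and a made-up list of heights; the function name `histogram_counts` and its parameters are hypothetical, chosen only for this illustration.

```python
def histogram_counts(values, low, width, n_bins):
    """Count how many values fall into each of n_bins equal-width classes.

    Class i covers [low + i*width, low + (i+1)*width); the top edge of the
    last class is counted in the last class. Values outside the overall
    range are ignored.
    """
    counts = [0] * n_bins
    top = low + width * n_bins
    for v in values:
        if v == top:
            index = n_bins - 1
        else:
            index = int((v - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Illustrative data (heights in cm), not taken from the slides.
heights = [150, 152, 158, 160, 161, 165, 167, 170, 171, 179]
counts = histogram_counts(heights, low=150, width=10, n_bins=3)

# Print a simple text histogram, one row per class.
for i, c in enumerate(counts):
    start = 150 + 10 * i
    print(f"{start}-{start + 10}: {'#' * c}")
```

The row lengths give a rough picture of the shape of the distribution; a plotting library would draw the same counts as adjacent bars.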
9. Ogive Curve
• An ogive (oh-jive), sometimes called
a cumulative frequency polygon, is a
type of frequency polygon that
shows cumulative frequencies.
• An ogive graph plots cumulative
frequency on the y-axis and class
boundaries along the x-axis.
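The points of an ogive can be computed by accumulating class frequencies. The sketch below assumes a "less than" ogive, in which each cumulative frequency is paired with the upper boundary of its class; the class boundaries and frequencies are illustrative values, not data from the slides.

```python
from itertools import accumulate

# Illustrative classes 150-160, 160-170, 170-180 and their frequencies.
upper_boundaries = [160, 170, 180]
frequencies = [3, 4, 3]

# Running total of the frequencies: each entry counts all observations
# below the corresponding upper class boundary.
cumulative = list(accumulate(frequencies))

# (x, y) pairs to plot: class boundary on the x-axis,
# cumulative frequency on the y-axis.
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no class was missed.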
10. Pie Chart
• A pie chart is a circular chart divided
into wedge-like sectors, illustrating
proportion. Each wedge represents a
proportionate part of the whole, and
the total value of the pie is always 100
percent.
• Pie charts can make the size of portions
easy to understand at a glance. They're
widely used in business presentations
and education to show the proportions
among a large variety of categories
including expenses, segments of a
population, or answers to a survey.
11. Bar Graph
• A bar graph (also known as a bar
chart or bar diagram) is a visual
tool that uses bars to compare
data among categories. A bar
graph may run horizontally or
vertically. The important thing to
know is that the longer the bar,
the greater its value.
• Bar graphs consist of two axes. On
a vertical bar graph, as shown
above, the horizontal axis (or
x-axis) shows the data categories.
13. Mean
• The term mean refers to a central value of a
discrete set of numbers: specifically, the sum of the
values divided by the number of values. The
arithmetic mean of a set of numbers $x_1, x_2, \ldots, x_n$ is
typically denoted by $\bar{x}$, pronounced "x bar".
• Formula:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
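As a small worked example of the definition (sum of the values divided by the number of values), here is a minimal sketch. The function name `arithmetic_mean` and the data are chosen for illustration only.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

# Illustrative data: the sum is 40 and there are 8 values.
data = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar = arithmetic_mean(data)
print(x_bar)
```

Python's standard library offers `statistics.mean` for the same calculation.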
14. Median
• Median is the middle number in a sorted list of numbers.
• To determine the median value in a sequence of numbers, the numbers must first
be arranged in value order from lowest to highest.
• median = observation in position $\frac{n+1}{2}$, if $n$ is odd;
= average of the two observations in positions $\frac{n}{2}$ and $\frac{n+2}{2}$, if $n$ is even.
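The two cases of the rule can be sketched directly in code. This is a minimal illustration; the function name `median_value` is hypothetical, and the positions in the rule are 1-based while Python lists are 0-based, hence the `- 1` in each index.

```python
def median_value(values):
    """Median following the odd/even position rule."""
    ordered = sorted(values)  # arrange from lowest to highest first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # n odd: the observation in position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the observations in positions n / 2 and (n + 2) / 2
    lower = ordered[n // 2 - 1]
    upper = ordered[(n + 2) // 2 - 1]
    return (lower + upper) / 2

print(median_value([7, 1, 3]))     # odd number of values
print(median_value([7, 1, 3, 5]))  # even number of values
```

The standard library function `statistics.median` applies the same rule.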
15. Mode
• The mode of a set of data values is the value
that appears most often.
• For a discrete random variable, it is the value of x at which the
probability mass function takes its maximum value.
• In other words, it is the value that is most likely
to be sampled.
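Finding the mode of a data set amounts to counting occurrences. The sketch below returns every value that shares the highest count, on the assumption that a data set may have more than one mode; the function name `modes` and the data are illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values that appear most often, in first-seen order."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

print(modes([2, 4, 4, 4, 5, 5, 7, 9]))  # a single mode
print(modes([1, 1, 2, 2, 3]))           # two values tie for most frequent
```

The standard library provides `statistics.mode` (first mode only) and `statistics.multimode` (all modes) for the same purpose.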
16. Standard Deviation
In statistics, the standard deviation (SD, also represented by the Greek letter sigma
σ or the Latin letter s) is a measure that is used to quantify the amount of variation
or dispersion of a set of data values.
A low standard deviation indicates that the data points tend to be close to the
mean (also called the expected value) of the set, while a high standard deviation
indicates that the data points are spread out over a wider range of values.
Formula:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
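The sample standard deviation, with n - 1 in the denominator, can be computed step by step as in this minimal sketch. The function name `sample_sd` and the data are assumptions made for illustration.

```python
import math

def sample_sd(values):
    """Sample standard deviation with n - 1 in the denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Illustrative data: mean 5, sum of squared deviations 32, n - 1 = 7.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_sd(data)
print(round(s, 3))
```

The standard library function `statistics.stdev` uses the same n - 1 definition, while `statistics.pstdev` divides by n instead.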
17. Coefficient of Variation
The coefficient of variation (CV) is a measure of relative variability.
It is the ratio of the standard deviation to the mean (average).
A higher CV implies more inconsistency (greater spread relative to the mean), and a lower CV implies less.
Formula:
$CV = \frac{s}{\bar{x}} \times 100$
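Because the CV is a ratio, it allows the spread of data sets to be compared on a common percentage scale. The sketch below uses the standard library's `statistics.stdev` (sample standard deviation) and `statistics.mean`; the function name `coefficient_of_variation` and the two data sets are invented for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation / mean * 100."""
    x_bar = statistics.mean(values)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / x_bar * 100

# Two illustrative data sets with the same mean (50) but different spread.
steady = [48, 50, 52]
erratic = [30, 50, 70]
cv_steady = coefficient_of_variation(steady)
cv_erratic = coefficient_of_variation(erratic)
print(f"steady: {cv_steady:.1f}%  erratic: {cv_erratic:.1f}%")
```

The data set with the larger CV is the less consistent of the two, even though both have the same mean.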
19. Conclusion
Data has long been an essential part of our lives, and we have been
using statistics to represent and analyze it for hundreds of years.
Statistics is very important in any field in which data is
involved.
Extracting important facts from this data is the prime
requirement.
This is done using Data Science and Business Analytics.
Editor's notes
Earlier, statistics was used for the tabular representation of data.