SlideShare une entreprise Scribd logo
1  sur  81
Télécharger pour lire hors ligne
STATISTICS
Visit: Learnbay.co
Data science and AI Certification Course
What is Statistics?
Statistics is a mathematical science including methods of collecting, organizing and analyzing data in
such a way that meaningful conclusions can be drawn from them. In general, its investigations and
analyses fall into two broad categories called descriptive and inferential statistics.
Visit: Learnbay.co
Developing Statistical Thinking
Statistics include numerical facts and figures. For instance:
• The largest earthquake measured 9.2 on the Richter scale.
• Men are at least 10 times more likely than women to commit murder.
• One in every 8 Americans is COVID positive.
The study of statistics involves math and relies upon calculations of numbers. But it also relies heavily on how the
numbers are chosen and how the statistics are interpreted. For example, consider some scenarios and the
interpretations based upon the presented statistics.
1. A new advertisement for Amul’s ice cream introduced in late May of last year resulted in a 30% increase in ice
cream sales for the following three months. Thus, the advertisement was effective.
2. The more liquor shop in a city, the more crime there is. Thus, liquor shops lead to crime.
Visit: Learnbay.co
1. Flaw: A major flaw is that ice cream consumption generally increases in the months of June, July, and
August regardless of advertisements. This effect is called a history effect and leads people to interpret
outcomes as the result of one variable when another variable (in this case, one having to do with the
passage of time) is actually responsible.
2. Flaw: A major flaw is that both increased liquor shops and increased crime rates can be explained by
larger populations. In bigger cities, there are both more liquor shops and more crime. This problem refers
to the third-variable problem. Namely, a third variable can cause both situations; however, people
erroneously believe that there is a causal relationship between the two primary variables rather than
recognize that a third variable can cause both.
Hence, the correct Interpretation of the numbers are necessary!!!!
Visit: Learnbay.co
Types of Statistics:
1. Descriptive statistics deals with the processing of data without attempting to draw any
inferences from it. The characteristics of the data are described in simple terms. Events that are
dealt with include everyday happenings such as accidents, prices of goods, business, incomes,
epidemics, sports data, population data.
2. Inferential statistics is a scientific discipline that uses mathematical tools to make forecasts
and projections by analysing the given data. This is of use to people employed in such fields as
engineering, economics, biology, the social sciences, business, agriculture and
communications.
Visit: Learnbay.co
Population?
Refers to the total amount of things.
Sample?
Small part of population that is used for
study.
Sample Size?
Total amount of things in a sample.
Variable?
What we are studying. (Measurable,
Countable, Categorized)
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
• The difference between interval and ratio scales comes from their ability to dip below zero. Interval scales hold no true zero and
can represent values below zero. For example, you can measure temperature below 0 degrees Celsius, such as -10 degrees.
• Ratio variables, on the other hand, never fall below zero. Height and weight measure from 0 and above, but never fall below it.
QUANTITATIVE VARIABLE
INTERVAL RATIO
An interval scale is one where there is order and
the difference between two values is meaningful.
A ratio variable, has all the properties of an interval variable,
and also has a clear definition of 0.
Ex: Ex:
Temperature (Fahrenheit),
Temperature (Celsius), pH, Weight, length, temperature in Kelvin
Visit: Learnbay.co
Data Visualization Basics
We need data visualization because a visual summary of information makes it easier to identify patterns and
trends than looking through thousands of rows on a spreadsheet. It’s the way the human brain works.
Since the purpose of data analysis is to gain insights, data is much more valuable when it is visualized.
Even if a data analyst can pull insights from data without visualization, it will be more difficult to communicate the
meaning without visualization.
Charts and graphs make communicating data findings easier even if you can identify the patterns without them.
Data visualization is the representation of data or information in a graph, chart, or other visual format. It
communicates relationships of the data with images. This is important because it allows trends and patterns to
be more easily seen.
What is data Visualization?
What is its importance?
Visit: Learnbay.co
Line Chart.
A line chart is, as one can imagine, a line or multiple lines showing how single, or multiple
variables develop over time.
Pie Chart.
A pie chart is a circular graph divided into slices. The larger a slice is the bigger portion of
the total quantity it represents.
Bar Graph.
A bar chart or bar graph is a chart or graph that presents categorical data with rectangular
bars with heights or lengths proportional to the values that they represent. The bars can be
plotted vertically or horizontally. Can be of one variable or many variable.
Histogram
A series of bins showing us the frequency of observations of a given variable.
Scatter Plots
A scatter plot is a great indicator that allows us to see whether there is a pattern to be
found between two variables. E.g. : Positive or negative relationship.
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Descriptive Statistics
deals with the processing of data without attempting to draw any inferences from it. The characteristics
of the data are described in simple terms. Events that are dealt with include everyday happenings such
as accidents, prices of goods, business, incomes, epidemics, sports data, population data.
When we give description of data, there can be 3 kinds:
1. Measures of Central Tendency – Mean, Median and Mode
2. Measures of Dispersion – Standard Deviation, Variance, Range, IQR (Inter Quartile Range)
3. Measure of Symmetricity/Shape – Skewness and Kurtosis
Visit: Learnbay.co
1. Measure of Central Tendency
A measure of central tendency is a summary statistic that represents the centre point or typical value of a dataset. These measures indicate
where most values in a distribution fall and are also referred to as the central location of a distribution.
1. Mean
Average value of the set of Numbers. Mean is a a number around which a whole data is spread out. Denoted by µ for population mean and
for sample mean.
Example: Find the mean of 5,5,2,6,3,8,9?
A: Mean is (5+5+2+6+3+8+9) / 7 = 38/7 = 5.43
2. Median
Median is the value which divides the data in 2 equal parts i.e. number of terms on right side of it is same as number of terms on left side of it
when data is arranged in either ascending or descending order.
(Note: If you sort data in descending order, it won’t affect median but IQR will be negative. IQR will be discussed in next slide.)
Example: Find the Median of 5,5,2,6,3,8,9?
A: Putting it in ascending order = 2,3,5,5,6,8,9. Hence, Median = Mid Number = 5.
(Note: Median of a even set of numbers can be found by taking the average of the 2 middle numbers.
E.g. Median of 2,3,4,7 = average of (3 and 4 ) = 3.5)
3. Mode
Mode is the term appearing maximum time in data set i.e. term that has highest frequency.
Example: Find the Median of 5,5,2,6,3,8,9?
A: Mode = Maximum number of repetition in dataset = 5. Hence, Mode = 5.
(Note: If there is no repetition of data then mode is not present. E.g.: What is the mode of 1,2,3,5,6?
A: None i.e. mode is not present.)
Visit: Learnbay.co
1. What is Minimum and Maximum value?
It is the minimum and Maximum values of the dataset respectively.
2. What is 1st and 3rd Quartile? – Also called the lower and upper quartile respectively.
When we divide the dataset into two groups while calculating median (sorted in ascending order), then the
median of first half is 1st Quartile and median of second half is 3rd Quartile.
3. Then where is the 2nd Quartile?
Your median is the 2nd Quartile ;-)
Let’s build some concepts before going ahead.
Visit: Learnbay.co
Visit: Learnbay.co
Q. Given is the ages of people registered for a webinar, calculate the 5 point summary (5 number summary) of
the ages of the participants?
19, 26, 25, 37, 32, 28, 22, 23, 29, 34, 39, 31
Visit: Learnbay.co
19, 26, 25, 37, 32, 28, 22, 23, 29, 34, 39, 31
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
2. Measure of Spread / Dispersion
1. Standard deviation
Standard deviation is the measurement of average distance between each quantity and mean. That is, how data is
spread out from mean. A low standard deviation indicates that the data points tend to be close to the mean of the
data set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
SD of Population
(s)
SD of Sample
(s)
i
i
In Python :
Population STD = pstdev()
Sample STD = stdev()
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Measure of Spread / Dispersion
2. Variance
Variance is a square of average distance between each quantity and mean. That is, it is square of standard deviation.
In Python : Population Var = pvariance() Sample Variance = variance()
3. Range
Range is one of the simplest techniques of descriptive statistics. It is the difference between lowest and highest value.
Range = Maximum - Minimum
4. IQR (Interquartile Range)
In statistics and probability, quartiles are values that divide your data into quarters provided data is sorted in an ascending
order.
IQR = Q3 – Q1
i i
Population Variance
(s2)
Sample Variance
(S2)
Visit: Learnbay.co
Measure of Spread / Dispersion
Steps to find out the IQR
1. Order the data from least to greatest
2. Find the median
3. The left side of median is lower half and right side of the data is upper half.
4. Calculate the median of both the lower and upper half of the data (Called Q1 and Q3 respectively)
5. The IQR is the difference between the upper and lower medians
(Note: When we write down Minimum, Maximum, Q1, Q2 (Median) and Q3, this is called 5-point summary or 5
number summary)
Let’s solve some questions to find IQR.
Visit: Learnbay.co
Transforming Data
Look at below Question.
1. Below are the weights of 5 persons. Calculate Mean, Standard Deviation :
105, 156, 145, 172, 100
2. Suppose each one of them gained extra 5 Kg. weight during winters. Can you calculate the new Mean and
Standard deviation?
Visit: Learnbay.co
Transforming Data
Example 2:
1. Considering the same set of people from previous example, Suppose that these persons are advised to drink
2.5 ml of water for every Kg they weigh plus 750 ml of water everyday.
105, 156, 145, 172, 100
What is the mean and STD for the amount of water consumed everyday?
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Multiplicative
term
Additive
term
Multiplicative
term
Visit: Learnbay.co
3. Measure of symmetricity – Skewness and Kurtosis
Visit: Learnbay.co
1. Skewness
Skewness is usually described as a measure of a dataset’s symmetry – or lack of symmetry. A perfectly
symmetrical data set will have a skewness of 0. The normal distribution has a skewness of 0. Skewness is
calculated as:
import numpy as np
from scipy.stats import skew
x = np.random.normal(0, 2, 10000) # create random values based on a normal distribution
print(skew(x))
….
Mathematically:
where n is the sample size, Xi is the ith X value, X-Bar is the average and s is the sample standard
deviation. Note the exponent in the summation. It is “3”. The skewness is referred to as the
“third standardized central moment for the probability model.” Visit: Learnbay.co
Skewness
So, when is the skewness too much? The rule of thumb seems to be:
1. If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
2. If the skewness is between -1 and – 0.5 or between 0.5 and 1, the data are moderately skewed.
3. If the skewness is less than -1 or greater than 1, the data are highly skewed.
Importance of Skewness:
Measures of asymmetry like skewness are the link between central tendency measures and probability
theory, which ultimately allows us to get a more complete understanding of the data we are working with.
Knowing that the market has a 70% probability of going up and a 30% probability of going down may appear
helpful if you rely on normal distributions. However, if you were told that if the market goes up, it will go up
2% and if it goes down, it will go down 10%, then you could see the skewed returns and make a better
informed decision.
E(r) = 0.7*0.02 + 0.3*-0.1 = -0.014
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
Visit: Learnbay.co
2. Kurtosis
Kurtosis is all about the tails of the distribution – not the peakness or flatness. It measures
the tail-heaviness of the distribution. Kurtosis is calculated as:
where n is the sample size, Xi is the ith X value, X-Bar is the average and s is the sample standard deviation. Note the exponent in the
summation. It is “4”. The kurtosis is referred to as the “fourth standardized central moment for the probability model.”
Note: Kurtosis calculated by Excel or through Python/R is actually excess kurtosis, which is (Kurtosis – 3)
import numpy as np
from scipy.stats import kurtosis
x = np.random.normal(0, 2, 10000) # create random values based on a normal distribution
print(kurtosis(x))
Mathematically:
Visit: Learnbay.co
The reference standard is a normal distribution, which has a kurtosis of 3. In token of this,
often the excess kurtosis is presented: excess kurtosis is simply kurtosis−3. For example, the
“kurtosis” reported by Excel or any statistical library is actually the excess kurtosis.
1. A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution
with kurtosis ≈3 (excess ≈0) is called mesokurtic.
2. A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic. Compared to a
normal distribution, its tails are shorter and thinner, and often its central peak is lower and
broader.
3. A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic. Compared to a
normal distribution, its tails are longer and fatter, and often its central peak is higher and
sharper.
What does the value of Kurtosis tells about the shape?
Visit: Learnbay.co
Uses of Kurtosis:
1. Depicts the shape of the distribution - specially tails.
2. Outlier Detection : Large Kurtosis suggests there could be outliers in the data.
3. With high kurtosis, there is a chance of high variance and hence test on Mean could lead to bad results.
Hence, in that case, we would need to choose a more robust option – like test on Median.
4. Financial Risk: E.g. The return of your asset can be farther from the mean. (Than predicted using normal
distribution).
Visit: Learnbay.co
Visit: Learnbay.co
Outliers
What is outlier?
An outlier is an observation that lies an abnormal distance from other values in a random sample from a
population. In a sense, this definition leaves it up to the analyst to decide what will be considered
abnormal.
Common Causes of Outliers
1. Data entry errors (human errors)
2. Measurement errors (instrument errors)
3. Experimental errors (data extraction or experiment
planning/executing errors)
4. Intentional (dummy outliers made to test detection methods)
5. Data processing errors (data manipulation or data set
unintended mutations)
6. Sampling errors (extracting or mixing data from wrong or various
sources)
7. Natural (not an error, novelties in data)
Visit: Learnbay.co
Common methods of determining an Outlier
1. Sort the data and see for the extreme values
2. Plotting – Boxplot, Scatterplot
3. IQR Method
4. Z-Score Method
Why do we need to treat outliers?
Outliers can impact the results of our analysis and statistical modelling in a drastic way.
Visit: Learnbay.co
IQR Method
Visit: Learnbay.co
Visit: Learnbay.co
Q. Can you identify the outliers from the below dataset, using the IQR method?
26.0 ℃ , 15.0 ℃ , 20.5 ℃ , 31 ℃ , -350.0 ℃ , 31.0 ℃ , 30.5 ℃
Visit: Learnbay.co
Outliers < Q1 – 1.5 (IQR)
> Q3 + 1.5 (IQR)
25 – 1.5 (11) = 8.5
36 + 1.5 (11) = 52.5
Hence, we can say that 59 is the only outlier we have in our dataset. Visit:
Learnbay.co
Visit: Learnbay.co
Z-Score Method
What is Z-Score?
Z-scores can quantify the unusualness of an observation when your data follow the normal distribution. Z-scores are
the number of standard deviations above and below the mean that each value falls, assuming a Normal distribution.
For example, a Z-score of 2 indicates that an observation is two standard deviations above the average while a Z-
score of -2 signifies it is two standard deviations below the mean.
Z-Score Formula?
Point to understand:
The further away an observation’s Z-score is from zero, the more unusual it is. A standard cut-off value for finding
outliers are Z-scores of +/-3 or further from zero. In a population that follows the normal distribution, Z-score values
more extreme than +/- 3 have a probability of 0.0027 (2 * 0.00135), which is about 1 in 370 observations.
Visit: Learnbay.co
Visit: Learnbay.co
Example: Consider the below dataset. Find out the outlier using Z-score method.
1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2
Mean = 2.66
Std = 3.36
Z (1) = (1 – 2.66)/3.36 = -0.49405
Z(2) = (2 – 2.66)/3.36 = -0.19643
Z(3) = (3 – 2.66)/3.36 = 0.10119
Z(15) = (15 – 2.66)/3.36 = 3.67262
We will term the point outlier if it has a z-score of 3 or above (in any side - positive or negative).
Hence, here the outlier is 15.
Solution:
Ref:
Z score formula =
Visit: Learnbay.co
Covariance
Visit: Learnbay.co
Covariance
x > µx, y > µy
x < µx, y < µy
x > µx,
x < µx,
+
-
+
-
+ -
- +
y < µy
y > µy
In Python: use cov() function
Visit: Learnbay.co
Correlation
Visit: Learnbay.co
What is Correlation?
Correlation is a statistical technique to depict the relationship between 2 variables – strength and
direction. We measure the correlation with the help of Correlation Coefficient.
For example, height and weight are related; taller people tend to be heavier than shorter people.
What is Correlation Coefficient?
The Pearson’s correlation coefficient (r) is a measure that determines the degree to which the movement
of two variables is associated. The value of Correlation Coefficient lies between -1 and 1.
Formula: (Pearson’s Correlation Coefficient) - Standard Formula
In Python: DataFrame.corr(method=’pearson’)
(n = sample size, and Sx, Sy are the standard deviation of samples x and y. X-bar and y-bar are the respective
means of x and y samples whereas Xi and Yi are sample points of X and Y respectively.) Visit: Learnbay.co
Positive and Negative Correlation:
1. Correlation Coefficient greater than zero indicates a positive relationship
2. while a value less than zero signifies a negative relationship
3. and a value of zero indicates no relationship between the two variables being compared.
Visit: Learnbay.co
Strong and Weak Correlation:
Kind of correlation = depicted by sign of correlation coefficient
How Strong =. Value of Correlation Coefficient
Visit: Learnbay.co
Rule of thumb: Any relationship with magnitude of r greater than 0.75 can be considered to be a strong
correlation.
E.g.: -0.84 is a strong Negative correlation and 0.90 is a strong positive correlation. Visit: Learnbay.co
Visit:
Learnbay.co
Visit: Learnbay.co
Question: The local ice cream shop keeps track of how much ice cream they sell versus the temperature on
that day, here are their figures for the last 12 days. Can you tell if Ice cream sales are correlated to that of
temperature? Find out the nature and strength of correlation.
Temperature Ice Cream Sales
14.2° $215
16.4° $325
11.9° $185
15.2° $332
18.5° $406
22.1° $522
19.4° $412
25.1° $614
23.4° $544
18.1° $421
22.6° $445
17.2° $408
Visit: Learnbay.co
Spearman Rank Correlation
• Used for Non-Linear Variables
• Spearman Corr Coeff = Pearson Corr coeff (rank varaibles)
• In Python: DataFrame.corr(method=’spearman’)
• Denoted by rho.
Visit: Learnbay.co
Steps for Spearman Correlation Coefficient
1. Create a new column for rank(x) and assign the rank of each variable.
2. Assign the rank of 2nd variable in a new column rank(y).
3. Calculate the difference in rank of both the variables = d.
4. Calculate the d-squared.
5. Add up d-squared score.
6. Put in the formula provided:
Visit:
Learnbay.co
Question: The scores for 10 students in English and Maths are as follows:
Compute the Spearman rank correlation.
Visit: Learnbay.co
Solution:
Step 1,2,3 and 4:
Visit: Learnbay.co
Solution Contd.
Step 5:
Step 6:
Hence, the Spearman Rank Coefficient is 0.67.
Visit: Learnbay.co
Points to Ponder (Correlation):
Example: A few years ago a survey of employees found a strong positive correlation
between "Studying an external course" and Sick Leave Days.
Does this mean?
• Studying makes them sick?
• Sick people study a lot?
• Or did they lie about being sick so they can study more?
Without further research we can't be sure why. :-D
Example: Poor suburbs are more likely to have high pollution.
Why?
• Do poor people make pollution?
• Are polluted suburbs the only place poor people can afford?
• Is it a common link, such as factories with low paying jobs and lots of pollution?
Visit: Learnbay.co
Recap – Descriptive Statistics
• Statistics? Its Importance. Population vs Sample.
• Types of variable – (Quantitative, Categorical), - (Ordinal, Nominal), (discrete, continuous), (interval, ratio).
• Types of charts – Pie, Donut, Line, Scatterplot, Histogram, Bar chart, Box-plot
• Descriptive Stats:
Measure of central tendency -- Mean , Median & Model
Measures of Dispersion/spread – Standard Deviation, Variance, range & IQR
Measures of symmetricity – Skewness and Kurtosis
• 5 Number Summary – Box Plot (Box and Whiskers)
• Effect of transformation on central tendency and spread.
• Outliers? How to detect? Modified Boxplot. (IQR Method, Z-Score Method)
• Covariance, Correlation. Pearson’s Correlation Coefficient. Nature & Strength of
Correlation. How to calculate Pearson’s Correlation Coefficient and Spearman’s
Rank Correlation Coefficient.
Visit: Learnbay.co
THANK YOU!
Visit: Learnbay.co

Contenu connexe

Tendances

Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE Smitha Sumod
 
Bcs 040 Descriptive Statistics
Bcs 040 Descriptive StatisticsBcs 040 Descriptive Statistics
Bcs 040 Descriptive StatisticsNarayan Thapa
 
Statics for the management
Statics for the managementStatics for the management
Statics for the managementRohit Mishra
 
Introduction to Statistics - Part 1
Introduction to Statistics - Part 1Introduction to Statistics - Part 1
Introduction to Statistics - Part 1Damian T. Gordon
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statisticsAndi Koentary
 
descriptive and inferential statistics
descriptive and inferential statisticsdescriptive and inferential statistics
descriptive and inferential statisticsMona Sajid
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statisticsaan786
 
Statics for management
Statics for managementStatics for management
Statics for managementparth06
 
Statistics
StatisticsStatistics
Statisticsitutor
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsSr Edith Bogue
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendencyChie Pegollo
 
Business Statistics
Business StatisticsBusiness Statistics
Business Statisticsshorab
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boaraileeanne
 
Statistics - Presentation
Statistics - PresentationStatistics - Presentation
Statistics - PresentationROCIO YUSTE
 
Statistics for management
Statistics for managementStatistics for management
Statistics for managementJohn Prarthan
 

Tendances (20)

Statistics Class 10 CBSE
Statistics Class 10 CBSE Statistics Class 10 CBSE
Statistics Class 10 CBSE
 
Bcs 040 Descriptive Statistics
Bcs 040 Descriptive StatisticsBcs 040 Descriptive Statistics
Bcs 040 Descriptive Statistics
 
Statics for the management
Statics for the managementStatics for the management
Statics for the management
 
Basic statistics
Basic statisticsBasic statistics
Basic statistics
 
Introduction to Statistics - Part 1
Introduction to Statistics - Part 1Introduction to Statistics - Part 1
Introduction to Statistics - Part 1
 
STATISTICS
STATISTICSSTATISTICS
STATISTICS
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 
descriptive and inferential statistics
descriptive and inferential statisticsdescriptive and inferential statistics
descriptive and inferential statistics
 
Chapter 1: Statistics
Chapter 1: StatisticsChapter 1: Statistics
Chapter 1: Statistics
 
Elementary Statistics
Elementary Statistics Elementary Statistics
Elementary Statistics
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
 
Statics for management
Statics for managementStatics for management
Statics for management
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Statistics
StatisticsStatistics
Statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Business Statistics
Business StatisticsBusiness Statistics
Business Statistics
 
General Statistics boa
General Statistics boaGeneral Statistics boa
General Statistics boa
 
Statistics - Presentation
Statistics - PresentationStatistics - Presentation
Statistics - Presentation
 
Statistics for management
Statistics for managementStatistics for management
Statistics for management
 

Similaire à Statistics

Statistic Project Essay
Statistic Project EssayStatistic Project Essay
Statistic Project EssayRobin Anderson
 
Mat 255 chapter 3 notes
Mat 255 chapter 3 notesMat 255 chapter 3 notes
Mat 255 chapter 3 notesadrushle
 
assignment of statistics 2.pdf
assignment of statistics 2.pdfassignment of statistics 2.pdf
assignment of statistics 2.pdfSyedDaniyalKazmi2
 
Statistics Based On Ncert X Class
Statistics Based On Ncert X ClassStatistics Based On Ncert X Class
Statistics Based On Ncert X ClassRanveer Kumar
 
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxMMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxPETTIROSETALISIC
 
Unit III - Statistical Process Control (SPC)
Unit III - Statistical Process Control (SPC)Unit III - Statistical Process Control (SPC)
Unit III - Statistical Process Control (SPC)Dr.Raja R
 
lecture 1 applied econometrics and economic modeling
lecture 1 applied econometrics and economic modelinglecture 1 applied econometrics and economic modeling
lecture 1 applied econometrics and economic modelingstone55
 
B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2marshalkalra
 
Statistics for Managers notes.pdf
Statistics for Managers notes.pdfStatistics for Managers notes.pdf
Statistics for Managers notes.pdfVelujv
 
Engineering Statistics
Engineering Statistics Engineering Statistics
Engineering Statistics Bahzad5
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
Chapter 4 MMW.pdf
Chapter 4 MMW.pdfChapter 4 MMW.pdf
Chapter 4 MMW.pdfRaRaRamirez
 
Unit 8 data analysis and interpretation
Unit 8 data analysis and interpretationUnit 8 data analysis and interpretation
Unit 8 data analysis and interpretationAsima shahzadi
 
Mapping Elections1. Identify at least three types of maps (e.docx
Mapping Elections1. Identify at least three types of maps (e.docxMapping Elections1. Identify at least three types of maps (e.docx
Mapping Elections1. Identify at least three types of maps (e.docxinfantsuk
 
1. week 1
1. week 11. week 1
1. week 1renz50
 
Computing Descriptive Statistics © 2014 Argos.docx
 Computing Descriptive Statistics     © 2014 Argos.docx Computing Descriptive Statistics     © 2014 Argos.docx
Computing Descriptive Statistics © 2014 Argos.docxaryan532920
 

Similaire à Statistics (20)

Statistic Project Essay
Statistic Project EssayStatistic Project Essay
Statistic Project Essay
 
Data Analysis
Data Analysis Data Analysis
Data Analysis
 
Unit 1 Introduction
Unit 1 IntroductionUnit 1 Introduction
Unit 1 Introduction
 
Mat 255 chapter 3 notes
Mat 255 chapter 3 notesMat 255 chapter 3 notes
Mat 255 chapter 3 notes
 
assignment of statistics 2.pdf
assignment of statistics 2.pdfassignment of statistics 2.pdf
assignment of statistics 2.pdf
 
Medical Statistics.pptx
Medical Statistics.pptxMedical Statistics.pptx
Medical Statistics.pptx
 
Data Science 1.pdf
Data Science 1.pdfData Science 1.pdf
Data Science 1.pdf
 
Statistics Based On Ncert X Class
Statistics Based On Ncert X ClassStatistics Based On Ncert X Class
Statistics Based On Ncert X Class
 
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptxMMW (Data Management)-Part 1 for ULO 2 (1).pptx
MMW (Data Management)-Part 1 for ULO 2 (1).pptx
 
Unit III - Statistical Process Control (SPC)
Unit III - Statistical Process Control (SPC)Unit III - Statistical Process Control (SPC)
Unit III - Statistical Process Control (SPC)
 
lecture 1 applied econometrics and economic modeling
lecture 1 applied econometrics and economic modelinglecture 1 applied econometrics and economic modeling
lecture 1 applied econometrics and economic modeling
 
B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2
 
Statistics for Managers notes.pdf
Statistics for Managers notes.pdfStatistics for Managers notes.pdf
Statistics for Managers notes.pdf
 
Engineering Statistics
Engineering Statistics Engineering Statistics
Engineering Statistics
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
Chapter 4 MMW.pdf
Chapter 4 MMW.pdfChapter 4 MMW.pdf
Chapter 4 MMW.pdf
 
Unit 8 data analysis and interpretation
Unit 8 data analysis and interpretationUnit 8 data analysis and interpretation
Unit 8 data analysis and interpretation
 
Mapping Elections1. Identify at least three types of maps (e.docx
Mapping Elections1. Identify at least three types of maps (e.docxMapping Elections1. Identify at least three types of maps (e.docx
Mapping Elections1. Identify at least three types of maps (e.docx
 
1. week 1
1. week 11. week 1
1. week 1
 
Computing Descriptive Statistics © 2014 Argos.docx
 Computing Descriptive Statistics     © 2014 Argos.docx Computing Descriptive Statistics     © 2014 Argos.docx
Computing Descriptive Statistics © 2014 Argos.docx
 

Plus de Learnbay Datascience

Artificial Intelligence- Neural Networks
Artificial Intelligence- Neural NetworksArtificial Intelligence- Neural Networks
Artificial Intelligence- Neural NetworksLearnbay Datascience
 
Artificial intelligence - expert systems
 Artificial intelligence - expert systems Artificial intelligence - expert systems
Artificial intelligence - expert systemsLearnbay Datascience
 
Artificial intelligence - research areas
Artificial intelligence - research areasArtificial intelligence - research areas
Artificial intelligence - research areasLearnbay Datascience
 
Artificial intelligence intelligent systems
Artificial intelligence   intelligent systemsArtificial intelligence   intelligent systems
Artificial intelligence intelligent systemsLearnbay Datascience
 

Plus de Learnbay Datascience (20)

Top data science projects
Top data science projectsTop data science projects
Top data science projects
 
Python my SQL - create table
Python my SQL - create tablePython my SQL - create table
Python my SQL - create table
 
Python my SQL - create database
Python my SQL - create databasePython my SQL - create database
Python my SQL - create database
 
Python my sql database connection
Python my sql   database connectionPython my sql   database connection
Python my sql database connection
 
Python - mySOL
Python - mySOLPython - mySOL
Python - mySOL
 
AI - Issues and Terminology
AI - Issues and TerminologyAI - Issues and Terminology
AI - Issues and Terminology
 
AI - Fuzzy Logic Systems
AI - Fuzzy Logic SystemsAI - Fuzzy Logic Systems
AI - Fuzzy Logic Systems
 
AI - working of an ns
AI - working of an nsAI - working of an ns
AI - working of an ns
 
Artificial Intelligence- Neural Networks
Artificial Intelligence- Neural NetworksArtificial Intelligence- Neural Networks
Artificial Intelligence- Neural Networks
 
AI - Robotics
AI - RoboticsAI - Robotics
AI - Robotics
 
Applications of expert system
Applications of expert systemApplications of expert system
Applications of expert system
 
Components of expert systems
Components of expert systemsComponents of expert systems
Components of expert systems
 
Artificial intelligence - expert systems
 Artificial intelligence - expert systems Artificial intelligence - expert systems
Artificial intelligence - expert systems
 
AI - natural language processing
AI - natural language processingAI - natural language processing
AI - natural language processing
 
Ai popular search algorithms
Ai   popular search algorithmsAi   popular search algorithms
Ai popular search algorithms
 
AI - Agents & Environments
AI - Agents & EnvironmentsAI - Agents & Environments
AI - Agents & Environments
 
Artificial intelligence - research areas
Artificial intelligence - research areasArtificial intelligence - research areas
Artificial intelligence - research areas
 
Artificial intelligence composed
Artificial intelligence composedArtificial intelligence composed
Artificial intelligence composed
 
Artificial intelligence intelligent systems
Artificial intelligence   intelligent systemsArtificial intelligence   intelligent systems
Artificial intelligence intelligent systems
 
Applications of ai
Applications of aiApplications of ai
Applications of ai
 

Dernier

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 

Dernier (20)

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 

Statistics

  • 1. STATISTICS Visit: Learnbay.co Data science and AI Certification Course
  • 2. What is Statistics? Statistics is a mathematical science including methods of collecting, organizing and analyzing data in such a way that meaningful conclusions can be drawn from them. In general, its investigations and analyses fall into two broad categories called descriptive and inferential statistics. Visit: Learnbay.co
  • 3. Developing Statistical Thinking Statistics include numerical facts and figures. For instance: • The largest earthquake measured 9.2 on the Richter scale. • Men are at least 10 times more likely than women to commit murder. • One in every 8 Americans is COVID positive. The study of statistics involves math and relies upon calculations of numbers. But it also relies heavily on how the numbers are chosen and how the statistics are interpreted. For example, consider some scenarios and the interpretations based upon the presented statistics. 1. A new advertisement for Amul’s ice cream introduced in late May of last year resulted in a 30% increase in ice cream sales for the following three months. Thus, the advertisement was effective. 2. The more liquor shop in a city, the more crime there is. Thus, liquor shops lead to crime. Visit: Learnbay.co
  • 4. 1. Flaw: A major flaw is that ice cream consumption generally increases in the months of June, July, and August regardless of advertisements. This effect is called a history effect and leads people to interpret outcomes as the result of one variable when another variable (in this case, one having to do with the passage of time) is actually responsible. 2. Flaw: A major flaw is that both increased liquor shops and increased crime rates can be explained by larger populations. In bigger cities, there are both more liquor shops and more crime. This problem refers to the third-variable problem. Namely, a third variable can cause both situations; however, people erroneously believe that there is a causal relationship between the two primary variables rather than recognize that a third variable can cause both. Hence, the correct Interpretation of the numbers are necessary!!!! Visit: Learnbay.co
  • 5. Types of Statistics: 1. Descriptive statistics deals with the processing of data without attempting to draw any inferences from it. The characteristics of the data are described in simple terms. Events that are dealt with include everyday happenings such as accidents, prices of goods, business, incomes, epidemics, sports data, population data. 2. Inferential statistics is a scientific discipline that uses mathematical tools to make forecasts and projections by analysing the given data. This is of use to people employed in such fields as engineering, economics, biology, the social sciences, business, agriculture and communications. Visit: Learnbay.co
  • 6. Population? Refers to the total amount of things. Sample? Small part of population that is used for study. Sample Size? Total amount of things in a sample. Variable? What we are studying. (Measurable, Countable, Categorized) Visit: Learnbay.co
  • 10. • The difference between interval and ratio scales comes from their ability to dip below zero. Interval scales hold no true zero and can represent values below zero. For example, you can measure temperature below 0 degrees Celsius, such as -10 degrees. • Ratio variables, on the other hand, never fall below zero. Height and weight measure from 0 and above, but never fall below it. QUANTITATIVE VARIABLE INTERVAL RATIO An interval scale is one where there is order and the difference between two values is meaningful. A ratio variable, has all the properties of an interval variable, and also has a clear definition of 0. Ex: Ex: Temperature (Fahrenheit), Temperature (Celsius), pH, Weight, length, temperature in Kelvin Visit: Learnbay.co
  • 11. Data Visualization Basics We need data visualization because a visual summary of information makes it easier to identify patterns and trends than looking through thousands of rows on a spreadsheet. It’s the way the human brain works. Since the purpose of data analysis is to gain insights, data is much more valuable when it is visualized. Even if a data analyst can pull insights from data without visualization, it will be more difficult to communicate the meaning without visualization. Charts and graphs make communicating data findings easier even if you can identify the patterns without them. Data visualization is the representation of data or information in a graph, chart, or other visual format. It communicates relationships of the data with images. This is important because it allows trends and patterns to be more easily seen. What is data Visualization? What is its importance? Visit: Learnbay.co
  • 12. Line Chart. A line chart is, as one can imagine, a line or multiple lines showing how single, or multiple variables develop over time. Pie Chart. A pie chart is a circular graph divided into slices. The larger a slice is the bigger portion of the total quantity it represents. Bar Graph. A bar chart or bar graph is a chart or graph that presents categorical data with rectangular bars with heights or lengths proportional to the values that they represent. The bars can be plotted vertically or horizontally. Can be of one variable or many variable. Histogram A series of bins showing us the frequency of observations of a given variable. Scatter Plots A scatter plot is a great indicator that allows us to see whether there is a pattern to be found between two variables. E.g. : Positive or negative relationship. Visit: Learnbay.co
  • 18. Descriptive Statistics deals with the processing of data without attempting to draw any inferences from it. The characteristics of the data are described in simple terms. Events that are dealt with include everyday happenings such as accidents, prices of goods, business, incomes, epidemics, sports data, population data. When we give description of data, there can be 3 kinds: 1. Measures of Central Tendency – Mean, Median and Mode 2. Measures of Dispersion – Standard Deviation, Variance, Range, IQR (Inter Quartile Range) 3. Measure of Symmetricity/Shape – Skewness and Kurtosis Visit: Learnbay.co
  • 19. 1. Measure of Central Tendency A measure of central tendency is a summary statistic that represents the centre point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. 1. Mean Average value of the set of Numbers. Mean is a a number around which a whole data is spread out. Denoted by µ for population mean and for sample mean. Example: Find the mean of 5,5,2,6,3,8,9? A: Mean is (5+5+2+6+3+8+9) / 7 = 38/7 = 5.43 2. Median Median is the value which divides the data in 2 equal parts i.e. number of terms on right side of it is same as number of terms on left side of it when data is arranged in either ascending or descending order. (Note: If you sort data in descending order, it won’t affect median but IQR will be negative. IQR will be discussed in next slide.) Example: Find the Median of 5,5,2,6,3,8,9? A: Putting it in ascending order = 2,3,5,5,6,8,9. Hence, Median = Mid Number = 5. (Note: Median of a even set of numbers can be found by taking the average of the 2 middle numbers. E.g. Median of 2,3,4,7 = average of (3 and 4 ) = 3.5) 3. Mode Mode is the term appearing maximum time in data set i.e. term that has highest frequency. Example: Find the Median of 5,5,2,6,3,8,9? A: Mode = Maximum number of repetition in dataset = 5. Hence, Mode = 5. (Note: If there is no repetition of data then mode is not present. E.g.: What is the mode of 1,2,3,5,6? A: None i.e. mode is not present.) Visit: Learnbay.co
  • 20. 1. What is Minimum and Maximum value? It is the minimum and Maximum values of the dataset respectively. 2. What is 1st and 3rd Quartile? – Also called the lower and upper quartile respectively. When we divide the dataset into two groups while calculating median (sorted in ascending order), then the median of first half is 1st Quartile and median of second half is 3rd Quartile. 3. Then where is the 2nd Quartile? Your median is the 2nd Quartile ;-) Let’s build some concepts before going ahead. Visit: Learnbay.co
  • 22. Q. Given is the ages of people registered for a webinar, calculate the 5 point summary (5 number summary) of the ages of the participants? 19, 26, 25, 37, 32, 28, 22, 23, 29, 34, 39, 31 Visit: Learnbay.co
  • 23. 19, 26, 25, 37, 32, 28, 22, 23, 29, 34, 39, 31 Visit: Learnbay.co
  • 28. 2. Measure of Spread / Dispersion 1. Standard deviation Standard deviation is the measurement of average distance between each quantity and mean. That is, how data is spread out from mean. A low standard deviation indicates that the data points tend to be close to the mean of the data set, while a high standard deviation indicates that the data points are spread out over a wider range of values. SD of Population (s) SD of Sample (s) i i In Python : Population STD = pstdev() Sample STD = stdev() Visit: Learnbay.co
  • 31. Measure of Spread / Dispersion 2. Variance Variance is a square of average distance between each quantity and mean. That is, it is square of standard deviation. In Python : Population Var = pvariance() Sample Variance = variance() 3. Range Range is one of the simplest techniques of descriptive statistics. It is the difference between lowest and highest value. Range = Maximum - Minimum 4. IQR (Interquartile Range) In statistics and probability, quartiles are values that divide your data into quarters provided data is sorted in an ascending order. IQR = Q3 – Q1 i i Population Variance (s2) Sample Variance (S2) Visit: Learnbay.co
  • 32. Measure of Spread / Dispersion Steps to find out the IQR 1. Order the data from least to greatest 2. Find the median 3. The left side of median is lower half and right side of the data is upper half. 4. Calculate the median of both the lower and upper half of the data (Called Q1 and Q3 respectively) 5. The IQR is the difference between the upper and lower medians (Note: When we write down Minimum, Maximum, Q1, Q2 (Median) and Q3, this is called 5-point summary or 5 number summary) Let’s solve some questions to find IQR. Visit: Learnbay.co
  • 33. Transforming Data Look at below Question. 1. Below are the weights of 5 persons. Calculate Mean, Standard Deviation : 105, 156, 145, 172, 100 2. Suppose each one of them gained extra 5 Kg. weight during winters. Can you calculate the new Mean and Standard deviation? Visit: Learnbay.co
  • 34. Transforming Data Example 2: 1. Considering the same set of people from previous example, Suppose that these persons are advised to drink 2.5 ml of water for every Kg they weigh plus 750 ml of water everyday. 105, 156, 145, 172, 100 What is the mean and STD for the amount of water consumed everyday? Visit: Learnbay.co
  • 39. 3. Measure of symmetricity – Skewness and Kurtosis Visit: Learnbay.co
  • 40. 1. Skewness Skewness is usually described as a measure of a dataset’s symmetry – or lack of symmetry. A perfectly symmetrical data set will have a skewness of 0. The normal distribution has a skewness of 0. Skewness is calculated as: import numpy as np from scipy.stats import skew x = np.random.normal(0, 2, 10000) # create random values based on a normal distribution print(skew(x)) …. Mathematically: where n is the sample size, Xi is the ith X value, X-Bar is the average and s is the sample standard deviation. Note the exponent in the summation. It is “3”. The skewness is referred to as the “third standardized central moment for the probability model.” Visit: Learnbay.co
  • 41. Skewness So, when is the skewness too much? The rule of thumb seems to be: 1. If the skewness is between -0.5 and 0.5, the data are fairly symmetrical. 2. If the skewness is between -1 and – 0.5 or between 0.5 and 1, the data are moderately skewed. 3. If the skewness is less than -1 or greater than 1, the data are highly skewed. Importance of Skewness: Measures of asymmetry like skewness are the link between central tendency measures and probability theory, which ultimately allows us to get a more complete understanding of the data we are working with. Knowing that the market has a 70% probability of going up and a 30% probability of going down may appear helpful if you rely on normal distributions. However, if you were told that if the market goes up, it will go up 2% and if it goes down, it will go down 10%, then you could see the skewed returns and make a better informed decision. E(r) = 0.7*0.02 + 0.3*-0.1 = -0.014 Visit: Learnbay.co
  • 50. 2. Kurtosis Kurtosis is all about the tails of the distribution – not the peakness or flatness. It measures the tail-heaviness of the distribution. Kurtosis is calculated as: where n is the sample size, Xi is the ith X value, X-Bar is the average and s is the sample standard deviation. Note the exponent in the summation. It is “4”. The kurtosis is referred to as the “fourth standardized central moment for the probability model.” Note: Kurtosis calculated by Excel or through Python/R is actually excess kurtosis, which is (Kurtosis – 3) import numpy as np from scipy.stats import kurtosis x = np.random.normal(0, 2, 10000) # create random values based on a normal distribution print(kurtosis(x)) Mathematically: Visit: Learnbay.co
  • 51. The reference standard is a normal distribution, which has a kurtosis of 3. In token of this, often the excess kurtosis is presented: excess kurtosis is simply kurtosis−3. For example, the “kurtosis” reported by Excel or any statistical library is actually the excess kurtosis. 1. A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution with kurtosis ≈3 (excess ≈0) is called mesokurtic. 2. A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic. Compared to a normal distribution, its tails are shorter and thinner, and often its central peak is lower and broader. 3. A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic. Compared to a normal distribution, its tails are longer and fatter, and often its central peak is higher and sharper. What does the value of Kurtosis tells about the shape? Visit: Learnbay.co
  • 52. Uses of Kurtosis: 1. Depicts the shape of the distribution - specially tails. 2. Outlier Detection : Large Kurtosis suggests there could be outliers in the data. 3. With high kurtosis, there is a chance of high variance and hence test on Mean could lead to bad results. Hence, in that case, we would need to choose a more robust option – like test on Median. 4. Financial Risk: E.g. The return of your asset can be farther from the mean. (Than predicted using normal distribution). Visit: Learnbay.co
  • 54. Outliers What is outlier? An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst to decide what will be considered abnormal. Common Causes of Outliers 1. Data entry errors (human errors) 2. Measurement errors (instrument errors) 3. Experimental errors (data extraction or experiment planning/executing errors) 4. Intentional (dummy outliers made to test detection methods) 5. Data processing errors (data manipulation or data set unintended mutations) 6. Sampling errors (extracting or mixing data from wrong or various sources) 7. Natural (not an error, novelties in data) Visit: Learnbay.co
  • 55. Common methods of determining an Outlier 1. Sort the data and see for the extreme values 2. Plotting – Boxplot, Scatterplot 3. IQR Method 4. Z-Score Method Why do we need to treat outliers? Outliers can impact the results of our analysis and statistical modelling in a drastic way. Visit: Learnbay.co
  • 58. Q. Can you identify the outliers from the below dataset, using the IQR method? 26.0 ℃ , 15.0 ℃ , 20.5 ℃ , 31 ℃ , -350.0 ℃ , 31.0 ℃ , 30.5 ℃ Visit: Learnbay.co
  • 59. Outliers < Q1 – 1.5 (IQR) > Q3 + 1.5 (IQR) 25 – 1.5 (11) = 8.5 36 + 1.5 (11) = 52.5 Hence, we can say that 59 is the only outlier we have in our dataset. Visit: Learnbay.co
  • 61. Z-Score Method What is Z-Score? Z-scores can quantify the unusualness of an observation when your data follow the normal distribution. Z-scores are the number of standard deviations above and below the mean that each value falls, assuming a Normal distribution. For example, a Z-score of 2 indicates that an observation is two standard deviations above the average while a Z- score of -2 signifies it is two standard deviations below the mean. Z-Score Formula? Point to understand: The further away an observation’s Z-score is from zero, the more unusual it is. A standard cut-off value for finding outliers are Z-scores of +/-3 or further from zero. In a population that follows the normal distribution, Z-score values more extreme than +/- 3 have a probability of 0.0027 (2 * 0.00135), which is about 1 in 370 observations. Visit: Learnbay.co
  • 63. Example: Consider the below dataset. Find out the outlier using Z-score method. 1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2 Mean = 2.66 Std = 3.36 Z (1) = (1 – 2.66)/3.36 = -0.49405 Z(2) = (2 – 2.66)/3.36 = -0.19643 Z(3) = (3 – 2.66)/3.36 = 0.10119 Z(15) = (15 – 2.66)/3.36 = 3.67262 We will term the point outlier if it has a z-score of 3 or above (in any side - positive or negative). Hence, here the outlier is 15. Solution: Ref: Z score formula = Visit: Learnbay.co
  • 65. Covariance x > µx, y > µy x < µx, y < µy x > µx, x < µx, + - + - + - - + y < µy y > µy In Python: use cov() function Visit: Learnbay.co
  • 67. What is Correlation? Correlation is a statistical technique to depict the relationship between 2 variables – strength and direction. We measure the correlation with the help of Correlation Coefficient. For example, height and weight are related; taller people tend to be heavier than shorter people. What is Correlation Coefficient? The Pearson’s correlation coefficient (r) is a measure that determines the degree to which the movement of two variables is associated. The value of Correlation Coefficient lies between -1 and 1. Formula: (Pearson’s Correlation Coefficient) - Standard Formula In Python: DataFrame.corr(method=’pearson’) (n = sample size, and Sx, Sy are the standard deviation of samples x and y. X-bar and y-bar are the respective means of x and y samples whereas Xi and Yi are sample points of X and Y respectively.) Visit: Learnbay.co
  • 68. Positive and Negative Correlation: 1. Correlation Coefficient greater than zero indicates a positive relationship 2. while a value less than zero signifies a negative relationship 3. and a value of zero indicates no relationship between the two variables being compared. Visit: Learnbay.co
  • 69. Strong and Weak Correlation: Kind of correlation = depicted by sign of correlation coefficient How Strong =. Value of Correlation Coefficient Visit: Learnbay.co
  • 70. Rule of thumb: Any relationship with magnitude of r greater than 0.75 can be considered to be a strong correlation. E.g.: -0.84 is a strong Negative correlation and 0.90 is a strong positive correlation. Visit: Learnbay.co
  • 73. Question: The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day, here are their figures for the last 12 days. Can you tell if Ice cream sales are correlated to that of temperature? Find out the nature and strength of correlation. Temperature Ice Cream Sales 14.2° $215 16.4° $325 11.9° $185 15.2° $332 18.5° $406 22.1° $522 19.4° $412 25.1° $614 23.4° $544 18.1° $421 22.6° $445 17.2° $408 Visit: Learnbay.co
  • 74. Spearman Rank Correlation • Used for Non-Linear Variables • Spearman Corr Coeff = Pearson Corr coeff (rank varaibles) • In Python: DataFrame.corr(method=’spearman’) • Denoted by rho. Visit: Learnbay.co
  • 75. Steps for Spearman Correlation Coefficient 1. Create a new column for rank(x) and assign the rank of each variable. 2. Assign the rank of 2nd variable in a new column rank(y). 3. Calculate the difference in rank of both the variables = d. 4. Calculate the d-squared. 5. Add up d-squared score. 6. Put in the formula provided: Visit: Learnbay.co
  • 76. Question: The scores for 10 students in English and Maths are as follows: Compute the Spearman rank correlation. Visit: Learnbay.co
  • 77. Solution: Step 1,2,3 and 4: Visit: Learnbay.co
  • 78. Solution Contd. Step 5: Step 6: Hence, the Spearman Rank Coefficient is 0.67. Visit: Learnbay.co
  • 79. Points to Ponder (Correlation): Example: A few years ago a survey of employees found a strong positive correlation between "Studying an external course" and Sick Leave Days. Does this mean? • Studying makes them sick? • Sick people study a lot? • Or did they lie about being sick so they can study more? Without further research we can't be sure why. :-D Example: Poor suburbs are more likely to have high pollution. Why? • Do poor people make pollution? • Are polluted suburbs the only place poor people can afford? • Is it a common link, such as factories with low paying jobs and lots of pollution? Visit: Learnbay.co
  • 80. Recap – Descriptive Statistics • Statistics? Its Importance. Population vs Sample. • Types of variable – (Quantitative, Categorical), - (Ordinal, Nominal), (discrete, continuous), (interval, ratio). • Types of charts – Pie, Donut, Line, Scatterplot, Histogram, Bar chart, Box-plot • Descriptive Stats: Measure of central tendency -- Mean , Median & Model Measures of Dispersion/spread – Standard Deviation, Variance, range & IQR Measures of symmetricity – Skewness and Kurtosis • 5 Number Summary – Box Plot (Box and Whiskers) • Effect of transformation on central tendency and spread. • Outliers? How to detect? Modified Boxplot. (IQR Method, Z-Score Method) • Covariance, Correlation. Pearson’s Correlation Coefficient. Nature & Strength of Correlation. How to calculate Pearson’s Correlation Coefficient and Spearman’s Rank Correlation Coefficient. Visit: Learnbay.co