STAT2111
INTRODUCTION TO STATISTICS &
PROBABILITY
Instructor: Mr. HM Zahid
Lecturer,
Department of Information Sciences
University of Education, Lahore, Pakistan
PART 1 - DATA
INTRODUCTION
PROBABILITY AND STATISTICS
◾ Statistics is the mathematical science behind the problem
“what can I know about a population if I’m unable to reach
every member?”
PROBABILITY AND STATISTICS
◾ If we could measure the height of every resident of Australia,
then we could make a statement about the average height of
Australians at the time we took our measurement.
◾ In practice we cannot measure everyone; this is where random sampling comes in.
PROBABILITY AND STATISTICS
◾ If we take a reasonably sized random sample of Australians and
measure their heights, we can form a statistical inference about
the population of Australia.
◾ Probability helps us know how sure we are of our conclusions!
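The idea can be sketched in Python. The population below is synthetic: the mean of 170 cm and standard deviation of 7 cm are hypothetical values chosen only for illustration, not real Australian data. The sketch shows that the mean of a random sample tends to land close to the mean of the full population.

```python
import random
import statistics

# Hypothetical population: 100,000 synthetic heights in cm (illustrative values only).
rng = random.Random(42)
population = [rng.gauss(170, 7) for _ in range(100_000)]

# Draw a simple random sample of 1,000 members without replacement.
sample = rng.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Measuring 1,000 members instead of 100,000 still gives an estimate close to the population mean; probability is what quantifies how close we should expect it to be.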
DATA
WHAT IS DATA
◾ Data = the collected observations we have about
something.
◾ Data can be continuous:
"What is the stock price?"
◾ or categorical:
"What car has the best repair history?"
WHY DATA MATTERS
◾ Helps us understand things as they are:
"What relationships if any exist between two events?"
"Do people who eat an apple a day enjoy fewer doctor's visits
than those who don't?"
WHY DATA MATTERS
◾ Helps us predict future behavior to guide
business decisions:
"Based on a user's click history which ad is
more likely to bring them to our site?"
VISUALIZING DATA
◾ Comparing a table:
◾ Not much can be gained by reading it.
[Table: Flights]
VISUALIZING DATA
◾ To a graph:
◾ The graph uncovers two distinct trends:
1. an increase in passengers flying over the years
2. a greater number of passengers flying in the summer months.
[Figure: Flights]
ANALYZE VISUALIZATIONS CRITICALLY!
◾ Graphs can be
misleading:
MEASURING DATA
LEVELS OF MEASUREMENTS
◾ Nominal
◾ Predetermined categories
◾ Can’t be sorted
◾ Animal classification (mammal, fish, or
reptile)
◾ Political party (PTI, PPP, or PMLN)
LEVELS OF MEASUREMENTS
◾ Ordinal
◾ Can be sorted
◾ Lacks scale
◾ Survey responses
LEVELS OF MEASUREMENTS
◾ Interval
◾ Provides scale
◾ Lacks a true “zero” point
◾ Temperature (°C, °F)
LEVELS OF MEASUREMENTS
◾ Ratio
◾ Values have a true zero point
◾ Age, weight, salary, temperature (K)
POPULATION VS. SAMPLE
◾ Population = every member
of a group
◾ Sample = a subset of
members that time and
resources allow you to
measure
MATHEMATICAL SYMBOLS & SYNTAX
EXPONENTS
$x^5 = x \times x \times x \times x \times x$ (five factors of $x$)
EXAMPLE: $3^4 = 3 \times 3 \times 3 \times 3 = 81$
EXPONENTS - SPECIAL CASES
FACTORIALS
SIMPLE SUMS
SERIES SUMS
EQUATION EXAMPLE
◾ Formula for calculating a sample mean:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
◾ Read out loud:
◾ “𝒙 bar (the symbol for the sample mean) is equal to the sum (indicated by the Greek letter sigma) of all the 𝒙-sub-𝒊 values in the series, as 𝒊 goes from 1 to the number 𝒏 of items in the series, divided by 𝒏.”
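The read-aloud version of the formula translates almost word for word into code. The following is a minimal sketch; the function name `sample_mean` is an illustrative choice, not something from the slides.

```python
def sample_mean(values):
    """Return x-bar: the sum of the x_i for i = 1..n, divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("sample_mean requires at least one value")
    total = 0
    for x_i in values:  # i goes from 1 to n
        total += x_i    # accumulate the sum (sigma)
    return total / n    # divide by n


print(sample_mean([2, 3, 2, 3, 2, 12]))  # 4.0
```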
MEASUREMENT TYPES
CENTRAL TENDENCY
MEASUREMENTS OF DATA
◾ “What was the average return?”
Measures of Central Tendency
◾ “How far from the average did individual values
stray?”
Measures of Dispersion
MEASURES OF CENTRAL TENDENCY
(MEAN, MEDIAN, MODE)
◾ Describe the “location” of the
data
◾ Fail to describe the “shape” of the
data
◾ mean= “calculated average”
◾ median= “middle value”
◾ mode= “most frequent value”
MEAN
◾ Shows “location” but not “how spread
out”
MEDIAN – ODD NUMBER OF VALUES
MEDIAN – EVEN NUMBER OF VALUES
MEAN VS. MEDIAN
◾ The mean can be influenced by outliers.
◾ The mean of {2,3,2,3,2,12} is 4
◾ The median is 2.5
◾ The median is much closer to most of the values
in the series!
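A short check with Python's standard `statistics` module illustrates the point. Removing the outlier 12 is an extra step added here for illustration: the mean drops from 4 to 2.4, while the median only moves from 2.5 to 2.

```python
import statistics

values = [2, 3, 2, 3, 2, 12]

mean_with = statistics.mean(values)
median_with = statistics.median(values)

# Drop the outlier (12) and recompute both measures.
without_outlier = [v for v in values if v != 12]
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print(f"with outlier:    mean={mean_with}, median={median_with}")
print(f"without outlier: mean={mean_without}, median={median_without}")
```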
MODE
◾ 10 10 11 13 15 16 16 16 21 23 28 30 33 34 36 44
◾ Mode = 16 (it appears three times, more often than any other value)
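MEASURES OF CENTRAL TENDENCY
◾ Data Set:
3, 4, 3, 1, 2, 3, 9, 5, 6, 7, 4, 8
◾ Mean
The 12 values sum to 55, so the mean is 55/12 ≈ 4.58.
◾ Median
Sorted: 1, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8, 9
With 12 values, the median is the average of the 6th and 7th values, (4 + 4)/2.
Hence Answer = 4
◾ Mode
The value 3 appears 3 times, 4 appears 2 times, and all other values appear once.
Hence 3 is the mode.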
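The three results can be reproduced with the standard `statistics` module. This is a small sketch for checking hand calculations, not part of the original slides.

```python
import statistics

data = [3, 4, 3, 1, 2, 3, 9, 5, 6, 7, 4, 8]

mean = statistics.mean(data)      # 55 / 12
median = statistics.median(data)  # average of the 6th and 7th sorted values
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode}")
```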
MEASUREMENT TYPES
DISPERSION
MEASURE OF DISPERSION
(RANGE, VARIANCE, STANDARD DEVIATION)
9 10 11 13 15 16 19 19 21 23 28 30 33 34 36 39
◾ In this sample the mean is 22.25
◾ How do we describe how “spread out” the
sample is?
RANGE
9 10 11 13 15 16 19 19 21 23 28 30 33 34 36 39
◾ Range = largest value − smallest value = 39 − 9 = 30
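Using the usual definition of the range (largest value minus smallest value), the calculation for this sample can be sketched as follows. The helper name `value_range` is illustrative.

```python
sample = [9, 10, 11, 13, 15, 16, 19, 19, 21, 23, 28, 30, 33, 34, 36, 39]


def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


print(value_range(sample))  # 30
```

Because it depends only on the two extreme values, the range says nothing about how the other 14 values are spread between them.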
VARIANCE
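◾ Calculated as the average of the squared distances from each point to the mean
◾ There’s a difference between the SAMPLE variance and the POPULATION variance
◾ The population variance divides the sum of squared distances by 𝒏; the sample variance is subject to Bessel's correction and divides by (𝒏−𝟏)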
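The difference between the two variances is only the divisor. The sketch below computes both for the 16-value sample used in this part (mean 22.25) and cross-checks them against the standard library. The helper names are illustrative.

```python
import statistics

sample = [9, 10, 11, 13, 15, 16, 19, 19, 21, 23, 28, 30, 33, 34, 36, 39]


def sum_of_squares(values):
    """Sum of squared distances from each point to the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def population_variance(values):
    """Divide by n: the data is the whole population."""
    return sum_of_squares(values) / len(values)


def sample_variance(values):
    """Divide by n - 1 (Bessel's correction): the data is a sample."""
    if len(values) < 2:
        raise ValueError("sample variance needs at least two values")
    return sum_of_squares(values) / (len(values) - 1)


print(population_variance(sample))  # 1469 / 16
print(sample_variance(sample))      # 1469 / 15
print(statistics.pvariance(sample), statistics.variance(sample))
```

The sample variance is always slightly larger than the population variance computed from the same numbers, and the gap shrinks as 𝒏 grows.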
VARIANCE
SAMPLE VARIANCE
STANDARD DEVIATION
◾ square root of the variance
◾ benefit: same units as the sample
◾ meaningful to talk about
“values that lie within one standard deviation of the
mean”
STANDARD DEVIATION
◾ 68-95-99.7 Rule
◾ The normal distribution is commonly associated with the 68-95-99.7 rule
◾ About 68% of the data is within 1 standard deviation (σ) of the mean (μ),
◾ about 95% of the data is within 2 standard deviations (σ) of the mean (μ),
◾ about 99.7% of the data is within 3 standard deviations (σ) of the mean (μ).
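Counting how many values lie within a given number of standard deviations of the mean can be sketched as below, using the 16-value sample from the dispersion slides. The helper names are illustrative, and the sample standard deviation (with 𝒏−𝟏) is an assumption about which version to use. A small sample like this one is not necessarily normal, so its proportions need not match 68-95-99.7.

```python
import math

sample = [9, 10, 11, 13, 15, 16, 19, 19, 21, 23, 28, 30, 33, 34, 36, 39]


def sample_std(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def count_within(values, k):
    """Count the values within k standard deviations of the mean."""
    mean = sum(values) / len(values)
    sd = sample_std(values)
    return sum(1 for x in values if abs(x - mean) <= k * sd)


for k in (1, 2, 3):
    print(f"within {k} SD: {count_within(sample, k)} of {len(sample)}")
```

Here 9 of the 16 values (about 56%) fall within one standard deviation and all 16 fall within two.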
SAMPLE STANDARD DEVIATION
POPULATION STANDARD
DEVIATION
QUIZ#1
◾ Dataset:
2, 5, 7, 8, 13, 16, 25, 30, 39,
45
◾ Calculate:
◾ Mean
◾ Variance
◾ Standard Deviation (SD)
◾ How many values fall
within:
◾ 1 SD
◾ 2 SD
◾ 3 SD
MEASUREMENT TYPES
QUARTILES
QUARTILES AND IQR
◾ The quartile concept is used to divide the data into four equal parts.
◾ It is analogous to the median, which divides the given data into two equal parts.
◾ The quartile concept belongs to statistics, the study of collecting, analyzing, interpreting, and presenting organized data.
QUARTILES AND IQR
◾ Another way to describe data is through quartiles
and the interquartile range (IQR)
◾ Has the advantage that every data point is
considered, not aggregated!
Quartile Formula
As mentioned above, quartiles divide the data into 4 equal parts. This can be represented visually by the figure below.
Quartile 1 lies between the starting term and the middle term.
Quartile 2 is the middle term, lying between the starting term and the last term.
Quartile 3 lies between quartile 2 and the last term.
There is a separate formula for finding each quartile value. To find these quartile values, first sort the given data into ascending order.
Quartile Formula
The steps to find a quartile are as follows:
1. Sort the given data in ascending order.
2. Find the respective quartile values/terms as needed from the formulae below.
Where n is the total count of numbers in the given data. And this formula
is valid for even number
Examples: Question 1: Find the Quartile 1 for the given data 10, 30, 5, 12, 20, 40, 25, 15, 18.
Step 1: Sort the given data in ascending order
5, 10, 12, 15, 18, 20, 25, 30, 40
Step 2: Find 1st Quartile [Note: These formulas are for an odd number of values]
First Quartile = $\left(\frac{n + 1}{4}\right)$th term
Second Quartile = $\left(\frac{n + 1}{2}\right)$th term
Third Quartile = $\left(\frac{3(n + 1)}{4}\right)$th term
Here n = 9 because there are 9 numbers in total in the given data.
First Quartile = ((9 + 1)/4)th term = 2.5th term
2.5th term = 2nd term + (0.5) (3rd term - 2nd term)
= (10) + (0.5) (12 - 10)
= 10+1
= 11
The First Quartile value is 11.
Question 2: Find the Second Quartile for the data 10, 30, 5, 12, 20, 40, 25, 15, 18.
Step 1: Sort the given data in ascending order
5, 10, 12, 15, 18, 20, 25, 30, 40
Step 2: Find 2nd Quartile
Here i = 2
Here n = 9 because there are 9 numbers in total in the given data.
Second Quartile = (2(9 + 1)/4)th term
= (10/2)th term
= 5th term
The 5th term is 18
So the second Quartile value is 18.
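The two worked questions follow the same recipe: take the i(n + 1)/4-th term of the sorted data and interpolate when that position falls between two terms. A minimal Python sketch of that recipe follows. The function name `quartile` and the clamping at the ends of the data are assumptions made for illustration.

```python
def quartile(values, i):
    """Return the i-th quartile (i = 1, 2, 3) as the i(n + 1)/4-th sorted term.

    A fractional position is resolved by linear interpolation between
    the two neighbouring terms, as in the 2.5th-term example.
    """
    s = sorted(values)
    n = len(s)
    position = i * (n + 1) / 4      # 1-based position in the sorted data
    lower = int(position)           # whole part, e.g. 2 for the 2.5th term
    fraction = position - lower     # fractional part, e.g. 0.5
    if lower < 1:                   # assumption: clamp to the first term
        return s[0]
    if lower >= n:                  # assumption: clamp to the last term
        return s[-1]
    return s[lower - 1] + fraction * (s[lower] - s[lower - 1])


data = [10, 30, 5, 12, 20, 40, 25, 15, 18]
print(quartile(data, 1))  # 11.0
print(quartile(data, 2))  # 18.0
print(quartile(data, 3))  # 27.5
```

Other quartile conventions exist and can give slightly different values on the same data; the standard library's `statistics.quantiles` offers both an "exclusive" and an "inclusive" method.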
QUARTILES AND IQR
◾ Consider the following series of 20 values:
9 10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44 45 47 60
1. Divide the series in half
2. Divide each half again
3. The dividing points become the quartiles
QUARTILES AND IQR
◾ 1st quartile = 14
◾ 2nd quartile = 22
◾ 3rd quartile = 35
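The divide-then-divide-again procedure for the 20-value series can be sketched in Python as the median of each half. The function names are illustrative. For an odd number of values, this sketch assumes the middle value is left out of both halves, which is one common convention among several.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles_by_halves(values):
    """Split the sorted series in half, then take the median of each half."""
    s = sorted(values)
    n = len(s)
    lower_half = s[: n // 2]
    upper_half = s[(n + 1) // 2:]  # skips the middle value when n is odd
    return median(lower_half), median(s), median(upper_half)


series = [9, 10, 10, 11, 13, 15, 16, 19, 19, 21,
          23, 28, 30, 33, 34, 36, 44, 45, 47, 60]
q1, q2, q3 = quartiles_by_halves(series)
print(q1, q2, q3)  # 14.0 22.0 35.0
```

The interquartile range is then Q3 − Q1 = 35 − 14 = 21.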
PLOT THE QUARTILES
◾ Quartile
ranges are
seldom the
same size!
FENCES & OUTLIERS
◾ What is considered an “outlier”?
◾ A common practice is to set a “fence” 1.5 times the width of the IQR below the 1st quartile and above the 3rd quartile
◾ Anything outside the fence is an outlier
◾ This is determined by the data, not an arbitrary
percentage!
FENCES & OUTLIERS
◾ In this set,
60 is not an
outlier, but
70 would be
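The fence rule can be checked numerically. In the sketch below, Q1 = 14 comes from the quartile slide, while Q3 = 35 is computed here from the same 20 values by the same halving method, so treat it as an assumption. The function names are illustrative.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """A value is an outlier if it lies outside either fence."""
    lower, upper = fences(q1, q3, k)
    return x < lower or x > upper


lower, upper = fences(14, 35)
print(lower, upper)            # -17.5 66.5
print(is_outlier(60, 14, 35))  # False
print(is_outlier(70, 14, 35))  # True
```

With an IQR of 21 the upper fence sits at 35 + 31.5 = 66.5, which is why 60 stays inside it and 70 does not.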
FENCES & OUTLIERS
◾ Here 70
is a
true
outlier
◾ When drawing box plots, the whiskers are brought inward to the outermost values inside the fences
