# Types of Probability Distributions - Statistics II

Data Scientist for Advanced Analytics
13 Jan 2022


• 1. Types of probability distributions
• 2. Types of probability distributions. 1. Discrete distributions: binomial, Poisson, hypergeometric, negative binomial, geometric. 2. Continuous distributions: normal distribution, t-distribution.
• 3. Continuous probability distribution. If a random variable is continuous (i.e. it can take any value within a given range), its probability distribution is called a continuous probability distribution. The equation that describes a continuous probability distribution is called the probability density function (pdf), or simply the density function. Rupak Roy
• 4. Discrete probability distribution. If a random variable is discrete, its probability distribution is called a discrete probability distribution. Example: what is the chance of getting exactly 10 heads in 20 tosses? In Excel: =BINOMDIST(10,20,0.5,FALSE). The answer is about 17.62%.
• 5. Binomial probability distribution. To understand the binomial distribution and binomial probability, let us first understand what a binomial experiment is. A binomial experiment is a statistical experiment with the following properties: - The experiment consists of n repeated trials. - Each trial can result in just two possible outcomes, a success or a failure. - The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.
• 6. The binomial distribution therefore describes the number of successes x in n repeated trials of a binomial experiment. Each individual trial is known as a Bernoulli trial. In Excel: =BINOM.DIST(number_s, trials, probability_s, cumulative), where number_s = number of successes, trials = total number of trials, probability_s = probability of success on each trial, e.g. 0.5 (50% heads, 50% tails), and cumulative = TRUE (probability of at most number_s successes, <=) or FALSE (point probability of exactly number_s successes).
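For reference, the quantity that the Excel binomial function evaluates can be written out as a formula. The symbols n, x and p are chosen here for illustration and stand for the trials, number_s and probability_s arguments; the first expression is the point probability (cumulative = FALSE) and the second is the cumulative one (cumulative = TRUE).

```latex
P(X = x) = \binom{n}{x}\, p^{x} (1-p)^{n-x},
\qquad
P(X \le x) = \sum_{k=0}^{x} \binom{n}{k}\, p^{k} (1-p)^{n-k}
```

For the earlier example of 10 heads in 20 tosses, this gives \(\binom{20}{10}\, 0.5^{20} = 184756 / 1048576 \approx 0.1762\).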
• 7. Example. Q: What is the probability of getting exactly 2 heads in 5 coin flips? number_s = number of successes = 2, trials = total number of trials = 5, probability_s = success rate = 0.5 (50% heads, 50% tails), cumulative = FALSE (point probability). In Excel: =BINOM.DIST(2,5,0.5,FALSE) = 0.3125. Q: What is the probability of getting 2 heads or fewer in 5 coin flips? =BINOM.DIST(2,5,0.5,TRUE) = 0.5. Note: cumulative = FALSE gives the point probability (exactly 2 heads); cumulative = TRUE gives the cumulative probability (2 heads or fewer, <=).
• 8. Example continued. Q: What is the probability of getting more than 2 heads in 5 coin flips? =1-BINOM.DIST(2,5,0.5,TRUE) = 1 - 0.5 = 0.5, i.e. 1 minus the probability of getting 2 heads or fewer. This gives the right-hand side of the distribution, that is, the probability of getting more than 2 heads (remember that a probability is always between 0 and 1). Alternatively, we can add up the point probabilities: probability(3 heads) + probability(4 heads) + probability(5 heads) = BINOM.DIST(3,5,0.5,FALSE) + BINOM.DIST(4,5,0.5,FALSE) + BINOM.DIST(5,5,0.5,FALSE) = 0.5, which gives the same answer but is a less efficient way to calculate it. As a simple analogy, 10 - 2 = 8 leaves the part that remains above 2, just as 1 minus the cumulative probability leaves the probability above 2 heads.
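As a cross-check outside Excel, the coin-flip answers from the last two slides can be reproduced with a minimal Python sketch. The helper names `binom_pmf` and `binom_cdf` are illustrative and not part of the slides; they are meant to mirror BINOM.DIST with cumulative set to FALSE and TRUE respectively.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k): at most k successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


exactly_two = binom_pmf(2, 5, 0.5)    # like BINOM.DIST(2,5,0.5,FALSE)
at_most_two = binom_cdf(2, 5, 0.5)    # like BINOM.DIST(2,5,0.5,TRUE)
more_than_two = 1 - at_most_two       # complement rule
# Same tail, built the long way from the three point probabilities.
point_sum = sum(binom_pmf(k, 5, 0.5) for k in (3, 4, 5))

print(exactly_two, at_most_two, more_than_two, point_sum)
```

Both routes to "more than 2 heads" agree, which is why the single complement calculation is usually preferred over summing point probabilities.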
• 9. Example 2. In a factory unit, the product has a fault rate of 30%. As a quality inspector, you randomly select 15 products and find that 7 of them are faulty. How likely is this outcome to be due to randomness? In Excel: =BINOM.DIST(number_s, trials, probability_s, cumulative), where number_s = 7, trials = 15, probability_s = 30% (0.3), cumulative = FALSE. =BINOM.DIST(7,15,0.3,FALSE) = 0.08 (approximately).
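The inspection example can be sketched in Python as well. The point probability below corresponds to the Excel call on the slide; the tail probability of 7 or more faulty items is an extra quantity added here as an assumption about what "how likely is the outcome" might also mean, and it is not computed in the original slides. The name `binom_pmf` is illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k): exactly k faulty items among n inspected."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_items, fault_rate = 15, 0.30

# Point probability, like BINOM.DIST(7,15,0.3,FALSE).
exactly_seven = binom_pmf(7, n_items, fault_rate)

# Assumption (not in the slides): probability of a result at least this extreme.
seven_or_more = sum(binom_pmf(k, n_items, fault_rate) for k in range(7, n_items + 1))

print(round(exactly_seven, 4), round(seven_or_more, 4))
```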
• 10. Poisson probability distribution. This is another discrete distribution; it gives the probability of a given number of independent events occurring in a fixed interval of time. Characteristics: the experiment consists of counting the number of events that occur during a specific interval of time, or within a specific distance, area, or volume.
• 11. Examples: number of calls in a day; number of car accidents in a month; number of cases of a disease over a period of a month; number of emergency services needed in a hospital in an hour.
• 12. Example. A soda vending machine has an average of 80 withdrawals per day and an average transaction amount of \$70. The owner needs to know how much to stock to maintain an equilibrium profit. What is the most appropriate amount of soda to stock for 5 days? The owner can tolerate a loss of up to 10%. In Excel: =POISSON.DIST(x, mean, cumulative). (Example continued on the next slide.)
• 13. Example continued. Poisson probability = POISSON.DIST(x, mean, cumulative); here P = POISSON.DIST(x, 80, TRUE), where x is the number of withdrawals, and the probability of more than x withdrawals is 1 - P. Reading down the resulting table, the slide places the 10% level at 101 withdrawals, which it takes as the appropriate amount to keep per day. Hence we can also conclude that the amount of sales for those 5 days = 101 withdrawals * \$70 average * 5 days = \$35,350. Note: here cumulative is TRUE, which gives the probability of at most x withdrawals (<=); therefore 1 - P gives the probability of more than x withdrawals.
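The stocking rule described above (find the smallest x for which the chance of more than x withdrawals is at most 10%) can be sketched in Python. The function names `poisson_cdf` and `stock_level` are illustrative and not from the slides. A rough normal approximation (mean 80, standard deviation about 8.9) suggests that the 10% level sits in the low 90s and that the tail beyond 101 is closer to 1%, so the 101 figure quoted on the slide may be worth re-checking against the output of this sketch.

```python
from math import exp


def poisson_cdf(x, mean):
    """P(X <= x) for a Poisson count, like POISSON.DIST(x, mean, TRUE)."""
    term = exp(-mean)          # P(X = 0)
    total = term
    for k in range(1, x + 1):
        term *= mean / k       # P(X = k) obtained from P(X = k - 1)
        total += term
    return total


def stock_level(mean, tolerated_risk):
    """Smallest x such that P(X > x) <= tolerated_risk."""
    x = 0
    while 1 - poisson_cdf(x, mean) > tolerated_risk:
        x += 1
    return x


daily_stock = stock_level(80, 0.10)
risk_at_stock = 1 - poisson_cdf(daily_stock, 80)
risk_at_101 = 1 - poisson_cdf(101, 80)

print(daily_stock, round(risk_at_stock, 4), round(risk_at_101, 4))
# Five-day sales estimate, following the slide's formula with the computed level.
print(daily_stock * 70 * 5)
```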
• 14. Hypergeometric probability distribution. The hypergeometric distribution gives the probability of a given number of successes when sampling without replacement, so the trials are not independent. Assume there are 196 voters in total, of whom 95 are male. A random sample of 10 voters is drawn; what is the probability that exactly 7 are male? In Excel: =HYPGEOM.DIST(sample_s, number_sample, population_s, number_population, cumulative).
• 15. Hypergeometric probability distribution (continued). In Excel: =HYPGEOM.DIST(sample_s, number_sample, population_s, number_population, cumulative), where sample_s = 7, number_sample = 10, population_s = 95, number_population = 196. Therefore =HYPGEOM.DIST(7,10,95,196,FALSE) = 0.100864.
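The voter example can be reproduced by counting combinations directly. This is a minimal sketch; the function name `hypergeom_pmf` is illustrative, and its parameter names simply reuse the Excel argument names from the slide.

```python
from math import comb


def hypergeom_pmf(sample_s, number_sample, population_s, number_population):
    """P(exactly sample_s successes) when drawing number_sample items
    without replacement from a population containing population_s successes."""
    population_f = number_population - population_s   # non-successes in the population
    sample_f = number_sample - sample_s               # non-successes in the sample
    favourable = comb(population_s, sample_s) * comb(population_f, sample_f)
    return favourable / comb(number_population, number_sample)


p_seven_males = hypergeom_pmf(7, 10, 95, 196)   # like HYPGEOM.DIST(7,10,95,196,FALSE)
print(round(p_seven_males, 6))
```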
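• 16. Negative binomial distribution. The negative binomial distribution models how many repeated trials are needed to reach X successes; Excel expresses this as the number of failures that occur before the required number of successes. In Excel: =NEGBINOM.DIST(number_f, number_s, probability_s, cumulative), where number_f = number of failures, number_s = required number of successes, probability_s = probability of success on each trial, and cumulative = FALSE gives the point probability.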
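• 17. Geometric probability distribution. This is a special case of the negative binomial distribution that deals with the number of trials required for a single success. Example: tossing a coin until it lands heads. What is the probability that the first head occurs on the third flip? This is known as a geometric probability. In Excel: =NEGBINOM.DIST(number_f, number_s, probability_s, cumulative) with number_f = 2 failures, number_s = 1 success and cumulative = FALSE, i.e. =NEGBINOM.DIST(2,1,0.5,FALSE) = 0.125. With cumulative = TRUE the same call gives the probability that the first head occurs on or before the third flip, 0.875.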
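The coin example can be checked with a short Python sketch of the negative binomial point probability, with the geometric case obtained by asking for a single success. The function name `negbinom_pmf` is illustrative; its arguments follow the Excel argument names. The point probability corresponds to cumulative = FALSE, and summing the point probabilities corresponds to cumulative = TRUE.

```python
from math import comb


def negbinom_pmf(number_f, number_s, probability_s):
    """P(exactly number_f failures occur before the number_s-th success)."""
    arrangements = comb(number_f + number_s - 1, number_f)
    return arrangements * probability_s**number_s * (1 - probability_s) ** number_f


# Geometric case: one success required, two tails before the first head.
first_head_on_third = negbinom_pmf(2, 1, 0.5)

# First head on or before the third flip: 0, 1 or 2 tails beforehand.
first_head_by_third = sum(negbinom_pmf(f, 1, 0.5) for f in range(3))

print(first_head_on_third, first_head_by_third)
```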
• 18. Normal probability distribution. A normal probability distribution function gives the probability that a real-valued observation falls between two specified real limits. As a rule of thumb, the sample size should be more than 30; otherwise the t-distribution is used instead. In probability theory, the normal (or Gaussian) distribution is one of the most common continuous probability distributions. It is important in statistics and is often used in the natural and social sciences to represent real-valued random variables whose distributions are not known. The normal distribution is useful because of the central limit theorem. Informally, the normal distribution is called the bell curve.
• 19. Central limit theorem, CLT (in brief). The central limit theorem says that, irrespective of the underlying population distribution, when you draw many random samples from the population, each with a sample size of at least about 30, the distribution of the sample averages will be approximately normal even if the underlying population is not normal.
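A small simulation can make the theorem concrete. The sketch below is an illustration under stated assumptions rather than anything from the slides: it uses an exponential population (strongly skewed, with mean 1 and standard deviation 1), samples of size 30, and 5000 repeats, all chosen arbitrarily. If the theorem holds, the sample averages should centre on 1 with a spread near 1/sqrt(30), and roughly 68% of them should fall within one standard deviation of their mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so that repeated runs give the same numbers

SAMPLE_SIZE = 30
REPEATS = 5000


def sample_mean():
    """Average of one random sample drawn from a skewed (exponential) population."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])


means = [sample_mean() for _ in range(REPEATS)]

centre = statistics.fmean(means)
spread = statistics.stdev(means)
# Share of sample averages within one standard deviation of their centre.
within_one_sd = sum(abs(m - centre) <= spread for m in means) / REPEATS

print(round(centre, 3), round(spread, 3), round(within_one_sd, 3))
```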
• 20. Most data values in a normal distribution cluster around the mean; the further a value is from the mean, the less likely it is to occur. Normal distributions are symmetric, unimodal and asymptotic, and the mean, median and mode are all equal. For a normal distribution we never calculate a point probability: because the variable is continuous, the probability of any single exact value is zero, so we always ask about outcomes less than or greater than a value (the cumulative form), never exactly equal to it.
• 21. In a normal distribution, 50% of the observations lie below the mean (which equals the median and the mode). About 68% of the observations are within 1 standard deviation of the mean, and about 95% are within 2 standard deviations. Remember: the standard deviation is a standard measure of how near or far, on average, the observations are from the mean.
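These percentages can be verified from the standard normal cumulative function. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

below_mean = standard_normal.cdf(0)
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(below_mean, 4), round(within_one_sd, 4), round(within_two_sd, 4))
```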
• 22. Example. A group of students took a test and the final grades have a mean of 70 and a standard deviation of 10. What proportion of these students a) scored higher than 80? b) should pass the test (grades > 60)? c) should fail the test (grades < 60)? In Excel: =NORM.DIST(outcome, mean, standard deviation, cumulative). A) =1-NORM.DIST(80,70,10,TRUE) = 1 - 0.841 = 0.159. B) =1-NORM.DIST(60,70,10,TRUE) = 1 - 0.159 = 0.841. C) =NORM.DIST(60,70,10,TRUE) = 0.159.
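The three grade questions can be cross-checked with `statistics.NormalDist`, whose `cdf` method plays the role of NORM.DIST with cumulative = TRUE. Variable names are illustrative. Note that part (b) is the complement of part (c), so it comes out as 1 - 0.159, about 0.841.

```python
from statistics import NormalDist

grades = NormalDist(mu=70, sigma=10)

scored_above_80 = 1 - grades.cdf(80)   # a) like 1-NORM.DIST(80,70,10,TRUE)
passed = 1 - grades.cdf(60)            # b) like 1-NORM.DIST(60,70,10,TRUE)
failed = grades.cdf(60)                # c) like NORM.DIST(60,70,10,TRUE)

print(round(scored_above_80, 3), round(passed, 3), round(failed, 3))
```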
• 23. Extra: probability distributions in R. R is open-source software for advanced statistical computing. In R, each probability distribution is covered by two kinds of function, density and cumulative. The density functions behave like the Excel functions with FALSE supplied as the cumulative argument. The cumulative functions behave like the Excel functions with TRUE supplied as the cumulative argument. For the binomial distribution: density function: dbinom(numberOfSuccess, numberOfTrials, probabilityOfSuccess); cumulative function: pbinom(numberOfSuccess, numberOfTrials, probabilityOfSuccess).
• 24. For the negative binomial distribution: density function: dnbinom(numberOfFailures, numberOf_s, probability_s); cumulative function: pnbinom(numberOfFailures, numberOf_s, probability_s). For the hypergeometric distribution: density function: dhyper(sample_s, pop_s, pop_f, sample_size); cumulative function: phyper(sample_s, pop_s, pop_f, sample_size). For the Poisson distribution: density function: dpois(x, mean); cumulative function: ppois(x, mean). For the normal distribution: probabilities are always of the form greater than or less than a value, never a point probability, so the function used for probabilities is the cumulative one: pnorm(observedDataValue, mean, standardDeviation).
• 25. As we proceed further into advanced analytics, we will become more familiar with open-source R programming concepts. Next: which probability distribution do we use if the sample size is less than 30?
• 26. To be continued.