1. Bayesian Models in R 10/3/14, 13:37
Bayesian Models in R
Vivian Zhang | SupStat Inc.
Copyright SupStat Inc., All rights reserved
http://docs.supstat.com/BayesianModelEN/#1 Page 1 of 53
Outline
1. Introduction to Bayes and Bayes' Theorem
2. Distribution estimation
3. Conditional probability
4. Bayesian models
Introduction to Bayes and Bayes' Theorem
The story behind the Bayesian model
· Thomas Bayes
· 18th-century English statistician
· Best known for Bayes' Theorem
· Essential contributor to the early development of probability theory
Source: http://www.bioquest.org/products/auth_images/422_bayes.gif
The Model
1. Models using Bayes' theorem (based on conditional probability)
   · Naive Bayes, Association Rules
2. Bayes Decision Theory
   · The classical Bayesian model for decision theory
3. Models implementing Bayesian thinking
   · Treat all parameters as random variables, especially in hierarchical models
Distribution Estimation
Distribution Estimation
Probability Density Function
· In statistics, the probability density function (PDF) of a continuous random variable describes the relative likelihood of the variable taking a value near a given point.
· Example: plot of the PDF of the normal distribution
Distribution Estimation
Probability Density Function
· The PDF has an important place in statistics
  - It contains all the information about the random variable
· Knowing the PDF, we can calculate the
  - Mean
  - Variance
  - Median
  - etc.
Distribution Estimation
Probability Density Function
· Once you have the PDF, you have everything about the random variable. This allows you to perform:
  - Bayesian hypothesis tests
  - Bayesian interval estimation
  - Bayesian regression models
  - Bayesian logistic models
  - etc.
Distribution Estimation
Probability Density Function
· Example: Bayesian regression

Y = Xβ + ϵ,  ϵ ∼ N(0, σ²)

· Estimation methods for the regression model:
  - OLS (Ordinary Least Squares): β̂ = (X′X)⁻¹X′Y is the estimator of β
  - Bayesian: β ∼ N((X′X)⁻¹X′Y, σ²(X′X)⁻¹)
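The OLS formula above can be checked directly in R. This is a small sketch, not from the slides; the data are simulated and the variable names are illustrative:

```r
# beta_hat = (X'X)^{-1} X'Y computed by hand, compared with lm()
set.seed(1)
n <- 100
X <- cbind(1, runif(n))                  # design matrix with an intercept column
beta <- c(2, -1)                         # true coefficients (assumed for the demo)
Y <- drop(X %*% beta + rnorm(n, 0, 1))
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y   # the matrix formula
cbind(formula = beta_hat,
      lm = coef(lm(Y ~ X[, 2])))         # the two columns agree
```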
Distribution Estimation
The Bayesian Model
· Before obtaining data, one has beliefs about the value of the proportion and models these beliefs in terms of a prior distribution.
· After the data have been observed, one updates one's beliefs about the proportion by computing the posterior distribution.
Distribution Estimation
The Bayesian Model
· Building a Bayesian model begins with Bayesian thinking (every value has its own distribution).
· Steps to build a Bayesian model:
  - Make inferences about the prior distribution
  - Calculate the parameters of the posterior distribution
  - Finish the statistical task (interval estimation, statistical decision, etc.)
Inferring from the posterior distribution
· Posterior inference is the core of Bayes' Theorem, because we do not actually know the population distribution that generated our data. We use the conditional distribution to address this gap indirectly. In this section a certain degree of mathematical sophistication is required, without which we cannot easily implement the model computationally.
· Essentials:
  - Bayes' theorem
  - Conditional distribution
    · For example: ϵ in regression comes from a normal distribution
  - A prior distribution
    · Possibly non-informative, when no prior information is given
Calculating the posterior distribution
· The most difficult part is calculating the posterior distribution, which requires integration.
· Markov chain Monte Carlo (MCMC)
  - Gibbs sampling
  - Metropolis-Hastings (MH)
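The slides do not show MCMC code; the sketch below is a minimal random-walk Metropolis-Hastings sampler (not from the deck) for the mean θ of normal data with a normal prior, with all numbers chosen for illustration:

```r
# Metropolis-Hastings for theta in x ~ N(theta, 5^2), prior theta ~ N(0, 10^2)
set.seed(42)
x <- rnorm(50, mean = 3, sd = 5)          # observed data (sd assumed known)
log_post <- function(theta) {
  sum(dnorm(x, theta, 5, log = TRUE)) +   # log-likelihood
    dnorm(theta, 0, 10, log = TRUE)       # log-prior
}
theta <- 0
draws <- numeric(5000)
for (i in seq_along(draws)) {
  prop <- theta + rnorm(1, 0, 1)          # random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(theta)) theta <- prop
  draws[i] <- theta
}
mean(draws[-(1:1000)])                    # posterior mean, close to mean(x)
```

With a nearly flat prior the chain's mean should sit very close to the sample mean; Gibbs sampling replaces the accept/reject step with draws from full conditionals.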
Conditional probability
Conditional probability
What is conditional probability?
· The probability that event A will occur given that event B has occurred. This probability is written as P(A|B).

P(A|B) = P(AB) / P(B)

· A and B are two events
· P(AB) is the probability that both A and B occur
· P(B) is the probability that B occurs
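The identity can be checked by simulation. A small sketch (not from the slides), with illustrative events: roll a die, let A = "the roll is even" and B = "the roll is at least 4":

```r
# Estimate P(A|B) two ways: via the formula P(AB)/P(B), and directly
set.seed(1)
roll <- sample(1:6, 1e5, replace = TRUE)
A <- roll %% 2 == 0
B <- roll >= 4
p_ab    <- mean(A & B)      # P(AB)
p_b     <- mean(B)          # P(B)
p_a_g_b <- mean(A[B])       # P(A|B), restricting attention to outcomes in B
c(formula = p_ab / p_b, direct = p_a_g_b)   # both near the true value 2/3
```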
Conditional probability
Why conditional probability?
Example
· Suppose
  - A: the event of getting a cold
  - B: the event of a rainy day (p = 0.2)
  - AB: the event that it rains and you get a cold (p = 0.1)

P(A|B) = P(AB) / P(B) = 0.1 / 0.2 = 0.5

· Interpretation:
  - When it rains, the probability of getting a cold is 50%
Conditional probability
Exercise
· There are two kids in a family.
  - If one of the kids is a boy, the probability that the other one is also a boy is 1/3.
  - If the first one is a boy, the probability that the other one is a boy is 1/2.
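The two answers can be verified by enumerating the four equally likely families. A quick sketch, not from the slides:

```r
# Enumerate BB, BG, GB, GG and count cases for each conditioning event
kids <- expand.grid(first = c("B", "G"), second = c("B", "G"))
at_least_one_boy <- kids$first == "B" | kids$second == "B"
both_boys        <- kids$first == "B" & kids$second == "B"
p1 <- sum(both_boys) / sum(at_least_one_boy)   # P(both boys | at least one boy)
p2 <- sum(both_boys) / sum(kids$first == "B")  # P(both boys | first is a boy)
c(p1, p2)    # 1/3 and 1/2
```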
Conditional Probability
The model relates to conditional probability
· Apriori
  - Mining association rules
  - The confidence of the rule A ⇒ B is defined as:

A ⇒ B := P(B|A) = P(AB) / P(A)

· In R, use the arules package
Conditional Probability
Apriori
· Goal: find the items with strong relationships
· First, load the data:

library(arules)
data = read.csv("data/BASKETS1n")
names(data)

 [1] cardid     value      pmethod    sex        homeown    income
 [7] age        fruitveg   freshmeat  dairy      cannedveg  cannedmeat
[13] frozenmeal beer       wine       softdrink  fish       confectionery
Conditional Probability
Apriori

library(Matrix)  # provides sparseMatrix()
# trans.items.ind / trans.code.ind: item and transaction indices built from the raw data
mat = sparseMatrix(i = trans.items.ind,
                   j = trans.code.ind,
                   x = 1,
                   dims = c(length(unique(trans.items)),
                            length(unique(trans.code))))
mat = as(mat, 'ngCMatrix')
# after setting the arguments we get the model:
trans.res = apriori(mat, parameter = list(confidence = 0.05,
                                          support = 0.05,
                                          minlen = 2, maxlen = 3))
Conditional Probability
Apriori

parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
       0.05    0.1    1 none FALSE            TRUE    0.05      2      3  rules FALSE

algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[11 item(s), 940 transaction(s)] done [0.00s].
sorting and recoding items ... [11 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [108 rule(s)] done [0.00s].
Conditional Probability
· At last, we have the items with the strongest relationships in one basket

#let's see these rules:
lhs.generic = unique(trans.items)[trans.res@lhs@data@i+1]
rhs.generic = unique(trans.items)[trans.res@rhs@data@i+1]
cbind(lhs.generic, rhs.generic)[1:10, ]

      lhs.generic   rhs.generic
 [1,] dairy         confectionery
 [2,] confectionery dairy
 [3,] dairy         fish
 [4,] fish          dairy
 [5,] dairy         fruitveg
 [6,] fruitveg      dairy
 [7,] dairy         frozenmeal
 [8,] frozenmeal    dairy
 [9,] freshmeat     confectionery
[10,] confectionery freshmeat
Conditional Probability
The model relates to conditional probability
· Naive Bayes
  - Used in recommendation systems and classification problems
  - Compute the posterior probability for all values of C using Bayes' theorem:

P(C|A1 A2 ⋯ An) = P(A1 A2 ⋯ An|C) P(C) / P(A1 A2 ⋯ An)

  - Choose the value of C that maximizes P(C|A1, A2, …, An)
  - Equivalent to choosing the value of C that maximizes P(A1 A2 ⋯ An|C) × P(C)
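The formula can be worked through by hand for a toy problem. This sketch is not from the slides; the two classes, two binary attributes, and all probabilities are made up for illustration, and conditional independence of A1 and A2 given C is assumed:

```r
# Naive Bayes posterior for one observation (A1 and A2 both present)
p_c  <- c(spam = 0.4, ham = 0.6)      # P(C), assumed class priors
p_a1 <- c(spam = 0.8, ham = 0.1)      # P(A1 | C)
p_a2 <- c(spam = 0.7, ham = 0.3)      # P(A2 | C)
unnorm <- p_a1 * p_a2 * p_c           # P(A1 A2 | C) P(C), per class
post   <- unnorm / sum(unnorm)        # divide by P(A1 A2) to normalize
post                                  # "spam" gets the larger posterior
```

Note that the normalizing denominator does not change which class wins, which is why maximizing P(A1 A2 ⋯ An|C) × P(C) is equivalent.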
Naive Bayes

library(e1071)  # provides naiveBayes()
data(iris)
m = naiveBayes(Species ~ ., data = iris)
## alternatively:
m = naiveBayes(iris[, -5], iris[, 5])
Naive Bayes
Model:

m

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = iris[, -5], y = iris[, 5])

A-priori probabilities:
iris[, 5]
    setosa versicolor  virginica
   0.33333    0.33333    0.33333

Conditional probabilities:
          Sepal.Length
iris[, 5]    [,1]    [,2]
   setosa   5.006 0.35249
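To see what the fitted tables are used for, the scoring step can be reproduced by hand with base R. A sketch, not from the slides: the [,1]/[,2] columns in the output are per-class means and standard deviations, so each class score is a product of Gaussian densities times the class prior:

```r
# Score one flower against each species using class-conditional normals
data(iris)
x <- iris[1, 1:4]                               # one setosa flower
score <- sapply(split(iris[, 1:4], iris$Species), function(d) {
  # product over the four attributes of dnorm(x_j; mean_j, sd_j), times prior 1/3
  prod(mapply(dnorm, x, colMeans(d), apply(d, 2, sd))) * (1/3)
})
names(which.max(score))                         # "setosa"
```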
From conditional probability to Bayes' Theorem
· We have:

P(B|A) = P(AB) / P(A)

· So:

P(AB) = P(B|A) P(A)

· Substituting into the conditional probability P(A|B):

P(A|B) = P(AB) / P(B) = P(B|A) P(A) / P(B)
Bayes' Theorem

P(A|B) = P(B|A) P(A) / P(B)

· Bayes' theorem relates the conditional probability to the marginal distribution of a random variable.
· Bayes' theorem tells us how to update our beliefs after obtaining new data.
· Harold Jeffreys claimed that Bayes' theorem is to statistics as the Pythagorean theorem is to geometry.
Bayes' theorem
Continuous situation
· The Bayes' theorem above is stated in discrete form
· In the real world we often analyze continuous random variables
· Bayes' theorem can be written in continuous form as:

π(θ|x) = f(x|θ) π(θ) / m(x)
Bayes' Theorem
Continuous form

π(θ|x) = f(x|θ) π(θ) / m(x)

· Here
  - θ is an unknown parameter
  - x is the observed data
  - The process goes from π(θ) to π(θ|x)
  - That is, from our original knowledge of θ to the updated state after we observe x
Bayes' Theorem
Continuous form

π(θ|x) = f(x|θ) π(θ) / m(x)

· Based on the properties of continuous random variables, it can be written as:

π(θ|x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ
Bayes' Theorem
Continuous form
· Important distributions:

π(θ|x) = f(x|θ) π(θ) / m(x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ

· π(θ): the prior distribution
· π(θ|x): the posterior distribution
Bayes' Theorem
Continuous form
· Other distributions:

π(θ|x) = f(x|θ) π(θ) / m(x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ

· m(x) = ∫ f(x|θ) π(θ) dθ: the marginal distribution
· f(x|θ) π(θ) = f(x, θ): the joint distribution
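The continuous form can be computed numerically. A sketch, not from the slides, with an assumed prior N(0, 2²), likelihood N(θ, 1), and a single observation x = 4; the numeric posterior is checked against the known closed-form normal posterior:

```r
# Posterior density = likelihood * prior / marginal, with m(x) by integrate()
x     <- 4
prior <- function(th) dnorm(th, mean = 0, sd = 2)   # pi(theta)
lik   <- function(th) dnorm(x, mean = th, sd = 1)   # f(x|theta)
m_x   <- integrate(function(th) lik(th) * prior(th), -Inf, Inf)$value
post  <- function(th) lik(th) * prior(th) / m_x     # pi(theta|x)
# Closed form here: posterior is N(3.2, 0.8), since mean = tau^2*x/(sigma^2+tau^2)
c(numeric = post(3.2), closed = dnorm(3.2, 3.2, sqrt(0.8)))   # agree
```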
Bayesian Models
Bayesian Models
Bayesian thinking

data(iris)
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

· Data are random variables with a mean of μ
Bayesian Models
Bayesian thinking
· The frequentist perspective: the mean μ is a constant

colMeans(iris[, 1:3])

Sepal.Length  Sepal.Width Petal.Length
      5.8433       3.0573       3.7580
Bayesian Models
Bayesian thinking
· The Bayesian perspective: the mean μ is a random variable

PROB   SEPAL LENGTH   SEPAL WIDTH   PETAL LENGTH
90%    5.843333       3.057333      3.758000
10%    Others         Others        Others
Bayesian Models
· In fact, nearly all modern Bayesian modeling uses Bayesian thinking
· Nearly all statistical models can be implemented as Bayesian-form models
· Even some non-parametric models can be transformed to Bayesian versions
· Bayes Cluster
· Bayes Regression
  - Logit, Probit, Tobit, Quantile, LASSO...
· Bayes Neural Net
· Non-parametric Bayes
· Hierarchical model
· etc.
Bayesian Modeling Example
Question
· A sample from a normal distribution; we want to know its mean.

X1, X2, …, Xn ∼ N(θ, σ²)

· Frequentists think θ is a constant: θ̂ = mean(x)
· Bayesians think θ is a random variable with a distribution; suppose θ ∼ N(μ, τ²)
  - Infer the posterior distribution
  - Calculate the posterior distribution
  - Estimate the mean of the sample
Bayesian Modeling Example
Inference
· Inferring the posterior distribution using Bayes' Theorem in continuous form:

π(θ|x) = f(x|θ) π(θ) / m(x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ

· Plug the distributions into the theorem to calculate the posterior distribution
  - Prior distribution: θ ∼ N(μ, τ²)
  - Conditional distribution: x|θ ∼ N(θ, σ²)
Bayesian Modeling Example
Inference
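The derivation on this slide did not survive the text export; a standard sketch of the normal-normal update for a single observation x with known σ² (consistent with the prior and conditional distributions stated above) is:

```latex
\pi(\theta \mid x) \propto f(x\mid\theta)\,\pi(\theta)
\propto \exp\!\Big(-\tfrac{(x-\theta)^2}{2\sigma^2}\Big)
        \exp\!\Big(-\tfrac{(\theta-\mu)^2}{2\tau^2}\Big)
\propto \exp\!\Big(-\tfrac{(\theta-\mu_1)^2}{2\tau_1^2}\Big),
\qquad
\mu_1 = \frac{\sigma^2\mu + \tau^2 x}{\sigma^2 + \tau^2},
\quad
\tau_1^2 = \frac{\sigma^2\tau^2}{\sigma^2 + \tau^2}.
```

So the posterior is again normal, with mean μ₁ and variance τ₁², which is the formula the code on the next slide computes.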
Bayesian Modeling Example
Calculating the posterior distribution
· According to the theorem, we know the mean and the variance of θ for a normal distribution.

postDis = function(miu=2, tau=4, n=100) {
  x = rnorm(n, 3, 5)   # sample from N(3, 5^2)
  a = list(0)
  a[[1]] = (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2)  # posterior mean
  a[[2]] = var(x)*tau^2 / (var(x) + tau^2)                  # posterior variance
  a
}
postDis(3, 5, 1000)

[[1]]
[1] 2.9284

[[2]]
[1] 12.254
Bayesian Modeling Example
Estimating the mean
· In ordinary statistics, the MLE and moment estimators of μ in a normal distribution are both the sample mean.
· For the Bayes posterior distribution:
  - MLE: the posterior maximum likelihood estimator
  - It can be considered the MLE of the posterior distribution
  - The posterior distribution is normal, too, so the mean parameter is:

(σ²μ + τ²x) / (σ² + τ²)
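Because the posterior mean above is a weighted average, it always lands between the prior mean and the sample mean (shrinkage). A quick sketch, not from the slides, with assumed prior values mu = 0 and tau = 2:

```r
# Shrinkage of the posterior mean toward the prior mean
set.seed(7)
mu <- 0; tau <- 2                    # prior N(0, 2^2), illustrative values
x  <- rnorm(200, mean = 3, sd = 5)   # data centered at 3
post_mean <- (var(x)*mu + tau^2*mean(x)) / (var(x) + tau^2)
c(prior = mu, posterior = post_mean, sample = mean(x))
```

The posterior mean sits strictly between mu and mean(x), pulled further toward mean(x) as tau grows or as the data variance shrinks.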
Bayesian Modeling Example
Estimating the mean
· x ∼ N(μ, σ²) = N(3, 5²)
  - The mean is 3
· Use different prior distributions
· Observe the error in each situation
Bayesian Modeling Example
· Prior distribution: N(3, 1)

library(ggplot2)
plot_dif = function(miu=3, tau=1) {
  i = seq(100, 10000, by=10)
  set.seed(123)
  meanCompare = function(n=100, miu=3, tau=1) {
    x = rnorm(n, 3, 5)
    (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2) - 3
  }
  aa = sapply(i, meanCompare, miu=miu, tau=tau)
  bb = sapply(i, function(i) mean(rnorm(i, 3, 5)) - 3)
  g = ggplot(data.frame(i=i, a=aa, b=bb)) +
    geom_line(aes(x=i, y=b), col="blue") +   # MLE error
    geom_line(aes(x=i, y=a), col="red")      # Bayes estimator error
  print(g)
}
Bayesian Modeling Example
· Prior distribution: N(3, 1) (Bayes estimator in red, MLE in blue)
plot_dif(3, 1)
Bayesian Modeling Example
· Prior distribution: N(2, 1) (Bayes estimator in red, MLE in blue)
plot_dif(2,1)
Bayesian Modeling Example
· Prior distribution: N(2, 4) (Bayes estimator in red, MLE in blue)
plot_dif(2,4)
Bayesian Modeling Example
· Prior distribution: N(2, 100) (Bayes estimator in red, MLE in blue)
plot_dif(2,100)
Bayesian Modeling Example
1. As we can see, if the prior distribution is very accurate, the Bayes estimator is better than the ordinary estimator.
2. If the prior distribution is not accurate enough:
   · A larger prior variance is better
   · For a suitable variance, more data is better
Bayesian Modeling Example
Choosing the prior distribution
· Choosing a prior distribution...
  - If you are confident in the prior, it can improve the accuracy of the estimator
  - If you are not, choose a larger prior variance to keep the estimator robust