1. Bayesian Models in R 10/3/14, 13:37
Bayesian Models in R
Vivian Zhang | SupStat Inc.
Copyright SupStat Inc., All rights reserved
http://docs.supstat.com/BayesianModelEN/#1 Page 1 of 53
Outline
1. Introduction to Bayes and Bayes' Theorem
2. Distribution estimation
3. Conditional probability
4. Bayesian models
Introduction to Bayes and Bayes' Theorem
The story behind the Bayesian model
· Thomas Bayes
· 18th-century English statistician
· Best known for Bayes' Theorem
· Essential contributor to the early development of probability theory
Source: http://www.bioquest.org/products/auth_images/422_bayes.gif
The Model
1. Models using Bayes' theorem (based on conditional probability)
   · Naive Bayes, Association Rules
2. Bayes Decision Theory
   · The classical Bayesian model for decision theory
3. Models implementing Bayesian thinking
   · Treat all parameters as random variables, especially in hierarchical models
Distribution Estimation
Distribution Estimation
Probability Density Function
· In statistics, the probability density function (PDF) of a continuous random variable describes the relative likelihood of the variable taking a value near a given point.
· Example: plot of the PDF of the normal distribution
Distribution Estimation
Probability Density Function
· The PDF has an important place in statistics
  - It contains all the information about the random variable
· Knowing the PDF, we can calculate the
  - Mean
  - Variance
  - Median
  - etc.
Distribution Estimation
Probability Density Function
· Once you have the PDF, you have everything about the random variable. This allows you to perform:
  - Bayesian hypothesis tests
  - Bayesian interval estimation
  - Bayesian regression models
  - Bayesian logistic models
  - etc.
Distribution Estimation
Probability Density Function
· Example: Bayesian regression

Y = Xβ + ϵ,  ϵ ∼ N(0, σ²)

· Estimation methods for the regression model:
  - OLS (Ordinary Least Squares): β̂ = (X′X)⁻¹X′Y is the estimator of β
  - Bayesian: β ∼ N((X′X)⁻¹X′Y, σ²(X′X)⁻¹)
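The OLS formula above can be checked directly in R. This is a small sketch, not from the slides; the data are simulated and the variable names are illustrative:

```r
# beta_hat = (X'X)^{-1} X'Y computed by hand, compared with lm()
set.seed(1)
n <- 100
X <- cbind(1, runif(n))                  # design matrix with an intercept column
beta <- c(2, -1)                         # true coefficients (assumed for the demo)
Y <- drop(X %*% beta + rnorm(n, 0, 1))
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y   # the matrix formula
cbind(formula = beta_hat,
      lm = coef(lm(Y ~ X[, 2])))         # the two columns agree
```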
Distribution Estimation
The Bayesian Model
· Before obtaining data, one has beliefs about the value of the proportion and models these beliefs in terms of a prior distribution.
· After the data have been observed, one updates one's beliefs about the proportion by computing the posterior distribution.
Distribution Estimation
The Bayesian Model
· Building a Bayesian model begins with Bayesian thinking (every value has its own distribution).
· Steps to build a Bayesian model:
  - Make inferences about the prior distribution
  - Calculate the parameters of the posterior distribution
  - Finish the statistical task (interval estimation, statistical decision, etc.)
Inferring from the posterior distribution
· Posterior inference is the core of Bayes' Theorem, because we do not actually know the population distribution that generated our data. We use the conditional distribution to address this gap indirectly. In this section a certain degree of mathematical sophistication is required, without which we cannot easily implement the model computationally.
· Essentials:
  - Bayes' theorem
  - Conditional distribution
    · For example: ϵ in regression comes from a normal distribution
  - A prior distribution
    · Possibly non-informative, when no prior information is given
Calculating the posterior distribution
· The most difficult part is calculating the posterior distribution, which requires integration.
· Markov chain Monte Carlo (MCMC)
  - Gibbs sampling
  - Metropolis-Hastings (MH)
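The slides do not show MCMC code; the sketch below is a minimal random-walk Metropolis-Hastings sampler (not from the deck) for the mean θ of normal data with a normal prior, with all numbers chosen for illustration:

```r
# Metropolis-Hastings for theta in x ~ N(theta, 5^2), prior theta ~ N(0, 10^2)
set.seed(42)
x <- rnorm(50, mean = 3, sd = 5)          # observed data (sd assumed known)
log_post <- function(theta) {
  sum(dnorm(x, theta, 5, log = TRUE)) +   # log-likelihood
    dnorm(theta, 0, 10, log = TRUE)       # log-prior
}
theta <- 0
draws <- numeric(5000)
for (i in seq_along(draws)) {
  prop <- theta + rnorm(1, 0, 1)          # random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(theta)) theta <- prop
  draws[i] <- theta
}
mean(draws[-(1:1000)])                    # posterior mean, close to mean(x)
```

With a nearly flat prior the chain's mean should sit very close to the sample mean; Gibbs sampling replaces the accept/reject step with draws from full conditionals.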
Conditional probability
Conditional probability
What is conditional probability?
· The probability that event A will occur given that event B has occurred. This probability is written as P(A|B).

P(A|B) = P(AB) / P(B)

· A and B are two events
· P(AB) is the probability that both A and B occur
· P(B) is the probability that B occurs
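The identity can be checked by simulation. A small sketch (not from the slides), with illustrative events: roll a die, let A = "the roll is even" and B = "the roll is at least 4":

```r
# Estimate P(A|B) two ways: via the formula P(AB)/P(B), and directly
set.seed(1)
roll <- sample(1:6, 1e5, replace = TRUE)
A <- roll %% 2 == 0
B <- roll >= 4
p_ab    <- mean(A & B)      # P(AB)
p_b     <- mean(B)          # P(B)
p_a_g_b <- mean(A[B])       # P(A|B), restricting attention to outcomes in B
c(formula = p_ab / p_b, direct = p_a_g_b)   # both near the true value 2/3
```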
Conditional probability
Why conditional probability?
Example
· Suppose
  - A: the event of getting a cold
  - B: the event of a rainy day (p = 0.2)
  - AB: the event that it rains and you get a cold (p = 0.1)

P(A|B) = P(AB) / P(B) = 0.1 / 0.2 = 0.5

· Interpretation:
  - When it rains, the probability of getting a cold is 50%
Conditional probability
Exercise
· There are two kids in a family.
  - If one of the kids is a boy, the probability that the other one is also a boy is 1/3.
  - If the first one is a boy, the probability that the other one is a boy is 1/2.
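The two answers can be verified by enumerating the four equally likely families. A quick sketch, not from the slides:

```r
# Enumerate BB, BG, GB, GG and count cases for each conditioning event
kids <- expand.grid(first = c("B", "G"), second = c("B", "G"))
at_least_one_boy <- kids$first == "B" | kids$second == "B"
both_boys        <- kids$first == "B" & kids$second == "B"
p1 <- sum(both_boys) / sum(at_least_one_boy)   # P(both boys | at least one boy)
p2 <- sum(both_boys) / sum(kids$first == "B")  # P(both boys | first is a boy)
c(p1, p2)    # 1/3 and 1/2
```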
Conditional Probability
The model relates to conditional probability
· Apriori
  - Mining association rules
  - The confidence of the rule A ⇒ B is defined as:

A ⇒ B := P(B|A) = P(AB) / P(A)

· In R, use the arules package
Conditional Probability
Apriori
· Goal: find the items with strong relationships
· First, load the data:

library(arules)
data = read.csv("data/BASKETS1n")
names(data)

 [1] cardid     value      pmethod    sex        homeown    income
 [7] age        fruitveg   freshmeat  dairy      cannedveg  cannedmeat
[13] frozenmeal beer       wine       softdrink  fish       confectionery
Conditional Probability
Apriori

library(Matrix)  # provides sparseMatrix()
# trans.items.ind / trans.code.ind: item and transaction indices built from the raw data
mat = sparseMatrix(i = trans.items.ind,
                   j = trans.code.ind,
                   x = 1,
                   dims = c(length(unique(trans.items)),
                            length(unique(trans.code))))
mat = as(mat, 'ngCMatrix')
# after setting the arguments we get the model:
trans.res = apriori(mat, parameter = list(confidence = 0.05,
                                          support = 0.05,
                                          minlen = 2, maxlen = 3))
Conditional Probability
Apriori

parameter specification:
 confidence minval smax arem  aval originalSupport support minlen maxlen target   ext
       0.05    0.1    1 none FALSE            TRUE    0.05      2      3  rules FALSE

algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

apriori - find association rules with the apriori algorithm
version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[11 item(s), 940 transaction(s)] done [0.00s].
sorting and recoding items ... [11 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [108 rule(s)] done [0.00s].
Conditional Probability
· At last, we have the items with the strongest relationships in one basket

#let's see these rules:
lhs.generic = unique(trans.items)[trans.res@lhs@data@i+1]
rhs.generic = unique(trans.items)[trans.res@rhs@data@i+1]
cbind(lhs.generic, rhs.generic)[1:10, ]

      lhs.generic   rhs.generic
 [1,] dairy         confectionery
 [2,] confectionery dairy
 [3,] dairy         fish
 [4,] fish          dairy
 [5,] dairy         fruitveg
 [6,] fruitveg      dairy
 [7,] dairy         frozenmeal
 [8,] frozenmeal    dairy
 [9,] freshmeat     confectionery
[10,] confectionery freshmeat
Conditional Probability
The model relates to conditional probability
· Naive Bayes
  - Used in recommendation systems and classification problems
  - Compute the posterior probability for all values of C using Bayes' theorem:

P(C|A1 A2 ⋯ An) = P(A1 A2 ⋯ An|C) P(C) / P(A1 A2 ⋯ An)

  - Choose the value of C that maximizes P(C|A1, A2, …, An)
  - Equivalent to choosing the value of C that maximizes P(A1 A2 ⋯ An|C) × P(C)
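The formula can be worked through by hand for a toy problem. This sketch is not from the slides; the two classes, two binary attributes, and all probabilities are made up for illustration, and conditional independence of A1 and A2 given C is assumed:

```r
# Naive Bayes posterior for one observation (A1 and A2 both present)
p_c  <- c(spam = 0.4, ham = 0.6)      # P(C), assumed class priors
p_a1 <- c(spam = 0.8, ham = 0.1)      # P(A1 | C)
p_a2 <- c(spam = 0.7, ham = 0.3)      # P(A2 | C)
unnorm <- p_a1 * p_a2 * p_c           # P(A1 A2 | C) P(C), per class
post   <- unnorm / sum(unnorm)        # divide by P(A1 A2) to normalize
post                                  # "spam" gets the larger posterior
```

Note that the normalizing denominator does not change which class wins, which is why maximizing P(A1 A2 ⋯ An|C) × P(C) is equivalent.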
Naive Bayes

library(e1071)  # provides naiveBayes()
data(iris)
m = naiveBayes(Species ~ ., data = iris)
## alternatively:
m = naiveBayes(iris[, -5], iris[, 5])
Naive Bayes
Model:

m

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = iris[, -5], y = iris[, 5])

A-priori probabilities:
iris[, 5]
    setosa versicolor  virginica
   0.33333    0.33333    0.33333

Conditional probabilities:
          Sepal.Length
iris[, 5]    [,1]    [,2]
   setosa   5.006 0.35249
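To see what the fitted tables are used for, the scoring step can be reproduced by hand with base R. A sketch, not from the slides: the [,1]/[,2] columns in the output are per-class means and standard deviations, so each class score is a product of Gaussian densities times the class prior:

```r
# Score one flower against each species using class-conditional normals
data(iris)
x <- iris[1, 1:4]                               # one setosa flower
score <- sapply(split(iris[, 1:4], iris$Species), function(d) {
  # product over the four attributes of dnorm(x_j; mean_j, sd_j), times prior 1/3
  prod(mapply(dnorm, x, colMeans(d), apply(d, 2, sd))) * (1/3)
})
names(which.max(score))                         # "setosa"
```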
From conditional probability to Bayes' Theorem
· We have:

P(B|A) = P(AB) / P(A)

· So:

P(AB) = P(B|A) P(A)

· Substituting into the conditional probability P(A|B):

P(A|B) = P(AB) / P(B) = P(B|A) P(A) / P(B)
Bayes' Theorem

P(A|B) = P(B|A) P(A) / P(B)

· Bayes' theorem relates the conditional probability to the marginal distribution of a random variable.
· Bayes' theorem tells us how to update our beliefs after obtaining new data.
· Harold Jeffreys claimed that Bayes' theorem is to statistics as the Pythagorean theorem is to geometry.
Bayes' theorem
Continuous situation
· The Bayes' theorem above is stated in discrete form
· In the real world we often analyze continuous random variables
· Bayes' theorem can be written in continuous form as:

π(θ|x) = f(x|θ) π(θ) / m(x)
Bayes' Theorem
Continuous form

π(θ|x) = f(x|θ) π(θ) / m(x)

· Here
  - θ is an unknown parameter
  - x is the observed data
  - The process goes from π(θ) to π(θ|x)
  - That is, from our original knowledge of θ to the updated state after we observe x
Bayes' Theorem
Continuous form

π(θ|x) = f(x|θ) π(θ) / m(x)

· Based on the properties of continuous random variables, it can be written as:

π(θ|x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ
Bayes' Theorem
Continuous form
· Important distributions:

π(θ|x) = f(x|θ) π(θ) / m(x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ

· π(θ): the prior distribution
· π(θ|x): the posterior distribution
Bayes' Theorem
Continuous form
· Other distributions:

π(θ|x) = f(x|θ) π(θ) / m(x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ

· m(x) = ∫ f(x|θ) π(θ) dθ: the marginal distribution
· f(x|θ) π(θ) = f(x, θ): the joint distribution
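The continuous form can be computed numerically. A sketch, not from the slides, with an assumed prior N(0, 2²), likelihood N(θ, 1), and a single observation x = 4; the numeric posterior is checked against the known closed-form normal posterior:

```r
# Posterior density = likelihood * prior / marginal, with m(x) by integrate()
x     <- 4
prior <- function(th) dnorm(th, mean = 0, sd = 2)   # pi(theta)
lik   <- function(th) dnorm(x, mean = th, sd = 1)   # f(x|theta)
m_x   <- integrate(function(th) lik(th) * prior(th), -Inf, Inf)$value
post  <- function(th) lik(th) * prior(th) / m_x     # pi(theta|x)
# Closed form here: posterior is N(3.2, 0.8), since mean = tau^2*x/(sigma^2+tau^2)
c(numeric = post(3.2), closed = dnorm(3.2, 3.2, sqrt(0.8)))   # agree
```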
Bayesian Models
Bayesian Models
Bayesian thinking

data(iris)
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

· Data are random variables with a mean of μ
Bayesian Models
Bayesian thinking
· The frequentist perspective: the mean μ is a constant

colMeans(iris[, 1:3])

Sepal.Length  Sepal.Width Petal.Length
      5.8433       3.0573       3.7580
Bayesian Models
Bayesian thinking
· The Bayesian perspective: the mean μ is a random variable

PROB   SEPAL LENGTH   SEPAL WIDTH   PETAL LENGTH
90%    5.843333       3.057333      3.758000
10%    Others         Others        Others
Bayesian Models
· In fact, nearly all modern Bayesian modeling uses Bayesian thinking
· Nearly all statistical models can be implemented as Bayesian-form models
· Even some non-parametric models can be transformed to Bayesian versions
· Bayes Cluster
· Bayes Regression
  - Logit, Probit, Tobit, Quantile, LASSO...
· Bayes Neural Net
· Non-parametric Bayes
· Hierarchical model
· etc.
Bayesian Modeling Example
Question
· A sample from a normal distribution; we want to know its mean.

X1, X2, …, Xn ∼ N(θ, σ²)

· Frequentists think θ is a constant: θ̂ = mean(x)
· Bayesians think θ is a random variable with a distribution; suppose θ ∼ N(μ, τ²)
  - Infer the posterior distribution
  - Calculate the posterior distribution
  - Estimate the mean of the sample
Bayesian Modeling Example
Inference
· Inferring the posterior distribution using Bayes' Theorem in continuous form:

π(θ|x) = f(x|θ) π(θ) / m(x) = f(x|θ) π(θ) / ∫ f(x|θ) π(θ) dθ

· Plug the distributions into the theorem to calculate the posterior distribution
  - Prior distribution: θ ∼ N(μ, τ²)
  - Conditional distribution: x|θ ∼ N(θ, σ²)
Bayesian Modeling Example
Inference
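The derivation on this slide did not survive the text export; a standard sketch of the normal-normal update for a single observation x with known σ² (consistent with the prior and conditional distributions stated above) is:

```latex
\pi(\theta \mid x) \propto f(x\mid\theta)\,\pi(\theta)
\propto \exp\!\Big(-\tfrac{(x-\theta)^2}{2\sigma^2}\Big)
        \exp\!\Big(-\tfrac{(\theta-\mu)^2}{2\tau^2}\Big)
\propto \exp\!\Big(-\tfrac{(\theta-\mu_1)^2}{2\tau_1^2}\Big),
\qquad
\mu_1 = \frac{\sigma^2\mu + \tau^2 x}{\sigma^2 + \tau^2},
\quad
\tau_1^2 = \frac{\sigma^2\tau^2}{\sigma^2 + \tau^2}.
```

So the posterior is again normal, with mean μ₁ and variance τ₁², which is the formula the code on the next slide computes.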
Bayesian Modeling Example
Calculating the posterior distribution
· According to the theorem, we know the mean and the variance of θ for a normal distribution.

postDis = function(miu=2, tau=4, n=100) {
  x = rnorm(n, 3, 5)   # sample from N(3, 5^2)
  a = list(0)
  a[[1]] = (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2)  # posterior mean
  a[[2]] = var(x)*tau^2 / (var(x) + tau^2)                  # posterior variance
  a
}
postDis(3, 5, 1000)

[[1]]
[1] 2.9284

[[2]]
[1] 12.254
Bayesian Modeling Example
Estimating the mean
· In ordinary statistics, the MLE and moment estimators of μ in a normal distribution are both the sample mean.
· For the Bayes posterior distribution:
  - MLE: the posterior maximum likelihood estimator
  - It can be considered the MLE of the posterior distribution
  - The posterior distribution is normal, too, so the mean parameter is:

(σ²μ + τ²x) / (σ² + τ²)
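Because the posterior mean above is a weighted average, it always lands between the prior mean and the sample mean (shrinkage). A quick sketch, not from the slides, with assumed prior values mu = 0 and tau = 2:

```r
# Shrinkage of the posterior mean toward the prior mean
set.seed(7)
mu <- 0; tau <- 2                    # prior N(0, 2^2), illustrative values
x  <- rnorm(200, mean = 3, sd = 5)   # data centered at 3
post_mean <- (var(x)*mu + tau^2*mean(x)) / (var(x) + tau^2)
c(prior = mu, posterior = post_mean, sample = mean(x))
```

The posterior mean sits strictly between mu and mean(x), pulled further toward mean(x) as tau grows or as the data variance shrinks.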
Bayesian Modeling Example
Estimating the mean
· x ∼ N(μ, σ²) = N(3, 5²)
  - The mean is 3
· Use different prior distributions
· Observe the error in each situation
Bayesian Modeling Example
· Prior distribution: N(3, 1)

library(ggplot2)
plot_dif = function(miu=3, tau=1) {
  i = seq(100, 10000, by=10)
  set.seed(123)
  meanCompare = function(n=100, miu=3, tau=1) {
    x = rnorm(n, 3, 5)
    (var(x)*miu + tau^2*mean(x)) / (var(x) + tau^2) - 3
  }
  aa = sapply(i, meanCompare, miu=miu, tau=tau)
  bb = sapply(i, function(i) mean(rnorm(i, 3, 5)) - 3)
  g = ggplot(data.frame(i=i, a=aa, b=bb)) +
    geom_line(aes(x=i, y=b), col="blue") +   # MLE error
    geom_line(aes(x=i, y=a), col="red")      # Bayes estimator error
  print(g)
}
Bayesian Modeling Example
· Prior distribution: N(3, 1) (Bayes estimator in red, MLE in blue)
plot_dif(3, 1)
Bayesian Modeling Example
· Prior distribution: N(2, 1) (Bayes estimator in red, MLE in blue)
plot_dif(2,1)
Bayesian Modeling Example
· Prior distribution: N(2, 4) (Bayes estimator in red, MLE in blue)
plot_dif(2,4)
Bayesian Modeling Example
· Prior distribution: N(2, 100) (Bayes estimator in red, MLE in blue)
plot_dif(2,100)
Bayesian Modeling Example
1. As we can see, if the prior distribution is very accurate, the Bayes estimator is better than the ordinary estimator.
2. If the prior distribution is not accurate enough:
   · A larger prior variance is better
   · For a suitable variance, more data is better
Bayesian Modeling Example
Choosing the prior distribution
· Choosing a prior distribution...
  - If you are confident in the prior, it can improve the accuracy of the estimator
  - If you are not, choose a larger prior variance to keep the estimator robust