Week 9:
Count Data - Poisson Regression
Applied Statistical Analysis II
Jeffrey Ziegler, PhD
Assistant Professor in Political Science & Data Science
Trinity College Dublin
Spring 2023
Roadmap through Stats Land
Where we’ve been:
Over-arching goal: We’re learning how to make inferences
about a population from a sample
Last time: We learned how to conduct a linear regression
when our outcome is an (un)ordered category
Today we will:
Review exam
Estimate & interpret a Poisson regression for count data!
Introduction to Poisson distribution
Let X be distributed as a Poisson random variable with single parameter λ
$$P(X = k) = \frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad k \in \{0, 1, 2, 3, 4, \dots\}$$
X is a discrete random variable that takes only whole-number values
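As a quick check of the probability mass function, the sketch below evaluates the formula by hand for a few counts and compares the result with R's built-in dpois(). The value λ = 2 and the helper name poisson_pmf are assumptions chosen only for illustration.

```r
# Hypothetical rate parameter, chosen only for illustration
lambda <- 2
k <- 0:5

# Poisson pmf written out from the formula: exp(-lambda) * lambda^k / k!
poisson_pmf <- function(k, lambda) exp(-lambda) * lambda^k / factorial(k)

manual  <- poisson_pmf(k, lambda)
builtin <- dpois(k, lambda)

# Side-by-side comparison of the two calculations
print(cbind(k, manual, builtin))
```

The two columns should agree, which is a simple way to confirm that the exponent on e is −λ rather than −k.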
Introduction to Poisson distribution
If Y ∼ Poisson(λ), then
E(Y) = λ and Var(Y) = λ
Mean and variance are equal, and variance is tied to mean
If mean of Y increases with covariate X, so does variance of Y
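The equality of mean and variance can be seen in a small simulation. This is only a sketch: the three λ values, the seed and the number of draws are arbitrary assumptions.

```r
set.seed(123)  # arbitrary seed so the sketch is reproducible

# Draw many Poisson counts for a few hypothetical values of lambda
lambdas <- c(0.5, 3, 10)
sims <- sapply(lambdas, function(l) {
  y <- rpois(100000, lambda = l)
  c(mean = mean(y), variance = var(y))
})
colnames(sims) <- paste0("lambda=", lambdas)

# Sample mean and sample variance should both sit close to lambda in each column
print(round(sims, 2))
```

As λ grows across the columns, the sample variance grows with it, which is the pattern the regression model later builds on.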
Framework: Poisson regression
Poisson regression model:
$$\ln(\lambda_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}$$
where
$$\lambda_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}}$$
Poisson parameter λi depends on covariates of each observation
  - So, each observation can have its own mean
Again, mean depends on covariates, and variance depends on covariates
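To make the "own mean per observation" idea concrete, here is a minimal sketch that turns a linear predictor into Poisson means. The coefficients and covariate values are hypothetical and not taken from any dataset in these slides.

```r
# Hypothetical coefficients and covariate values, for illustration only
beta <- c(intercept = -1.0, x1 = 0.05, x2 = 0.3)
X <- cbind(intercept = 1,
           x1 = c(20, 35, 50),
           x2 = c(0, 1, 1))

eta    <- drop(X %*% beta)  # linear predictor: ln(lambda_i)
lambda <- exp(eta)          # observation-specific Poisson mean (and variance)

print(data.frame(X[, -1], eta = eta, lambda = lambda))
```

Because of the exponential, every λi is positive no matter what sign the linear predictor takes.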
Background: Poisson regression
Poisson regression is another generalized linear model
Instead of a logit function of Bernoulli parameter πi (logistic regression), we use a log function of Poisson parameter λi
λi > 0 → −∞ < ln(λi) < ∞
Background: Poisson regression
The logit function in logistic model and log function in
Poisson model are called the link functions for these GLMs
In this modeling, we assume that ln(λi) is linearly related to
independent variables
  - And that mean and variance are equal for a given λi
An iterative process is used to solve the likelihood equations and get maximum likelihood estimates (MLE)
  - If you're interested in this specifically applied with Poisson, check out Gill (2001)
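One common iterative scheme for these likelihood equations is iteratively reweighted least squares (IRLS). The sketch below is an illustration on simulated data, not the lecture's own code: the true coefficients (0.5 and 0.8), the starting values and the stopping rule are all assumptions. It compares the hand-rolled result with glm().

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 2)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))  # simulated counts, hypothetical coefficients
X <- cbind(1, x)                            # design matrix with intercept

beta <- c(log(mean(y)), 0)                  # simple starting values
for (iter in 1:25) {
  eta <- drop(X %*% beta)                   # current linear predictor
  mu  <- exp(eta)                           # current fitted means
  z   <- eta + (y - mu) / mu                # working response for the log link
  W   <- mu                                 # working weights for the Poisson family
  beta_new <- drop(solve(t(X) %*% (W * X), t(X) %*% (W * z)))
  converged <- max(abs(beta_new - beta)) < 1e-10
  beta <- beta_new
  if (converged) break
}

fit <- glm(y ~ x, family = poisson)
print(rbind(irls = unname(beta), glm = unname(coef(fit))))
```

Each pass solves a weighted least-squares problem, and the weights are updated from the current fitted means until the coefficients stop changing.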
Zoology Example: mating of elephants
There is competition for female mates between young and old male elephants [1]
Male elephants continue to grow throughout their lives → older elephants are larger and Pr(Successful mating) ↑
Variables:
  - Response: # of mates
  - Predictor: Age of male elephant (years)
[1] Source: J. H. Poole, "Mate Guarding, Reproductive Success and Female Choice in African Elephants," Animal Behaviour 37 (1989): 842-49
Zoology Example: mating of elephants
Let’s look at jitter scatterplot first
[Figure: jittered scatterplot of Number of Mates (0 to 8) against Age (30 to 50)]
It looks like the number
of mates tends to be
higher for older
elephants
Seems to be more
variability in the
number of mates as
age increases
Elephants of age 30
have between 0 and 4
mates
Elephants of age 45
have between 0 and 9
mates
Zoology Example: Poisson regression model
If dispersion (variance) ↑ with mean for a count response,
then Poisson regression may be a good modeling choice
  - Why? Because variance is tied to mean!
$$\ln(\lambda_i) = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
elephant_poisson <- glm(Matings ~ Age, data = elephant, family = poisson)

                 Estimate     (SE)
(Intercept)      -1.582**     (0.545)
Age_in_Years      0.069***    (0.014)

AIC              156.458
BIC              159.885
Log Likelihood   -76.229
Deviance          51.012
Num. obs.         41
***p < 0.001, **p < 0.01, *p < 0.05
Example: Poisson regression curve
Add fitted curve to scatterplot:
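# extract the estimated intercept and slope
coeffs <- coefficients(elephant_poisson)
# evaluate the fitted mean over the observed ages, in increasing order
xvalues <- sort(elephant$Age)
means <- exp(coeffs[1] + coeffs[2] * xvalues)
# overlay the fitted curve on the existing scatterplot
lines(xvalues, means, lty = 2, col = "red")
[Figure: scatterplot of Number of Mates (0 to 8) against Age (30 to 50) with the fitted Poisson regression curve as a dashed red line]
Poisson regression is a nonlinear model for E[Y]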
Example: significance test
                 Estimate     (SE)
(Intercept)      -1.582**     (0.545)
Age_in_Years      0.069***    (0.014)

AIC              156.458
BIC              159.885
Log Likelihood   -76.229
Deviance          51.012
Num. obs.         41
***p < 0.001, **p < 0.01, *p < 0.05
Age is a reliable and positive predictor of # of mates for an elephant
Example: parameter interpretation
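One covariate: $\ln(\lambda_i) = \beta_0 + \beta_1 X_i$
β0 : $e^{\beta_0}$ is mean of Poisson distribution when X = 0
β1 : Increasing X by 1 unit has a multiplicative effect on the mean of Poisson by $e^{\beta_1}$
$$\frac{\lambda(x+1)}{\lambda(x)} = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}} = \frac{e^{\beta_0}\, e^{\beta_1 x}\, e^{\beta_1}}{e^{\beta_0}\, e^{\beta_1 x}} = e^{\beta_1}$$
$$\lambda(x+1) = \lambda(x)\, e^{\beta_1}$$
If β1 > 0, then expected count increases as X increases
If β1 < 0, then expected count decreases as X increases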
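The ratio above can be checked numerically. The sketch below plugs in the estimates reported for the elephant example (−1.582 for the intercept and 0.06859 for age, the unrounded value quoted elsewhere in the slides). The three starting ages and the helper name lambda are arbitrary choices for illustration.

```r
# Estimates reported on the slides for the elephant model
b0 <- -1.582
b1 <- 0.06859

# Expected count as a function of age under the fitted log-linear model
lambda <- function(x) exp(b0 + b1 * x)

ages   <- c(30, 40, 50)
ratios <- lambda(ages + 1) / lambda(ages)

# Each ratio equals exp(b1), whatever the starting age
print(ratios)
print(exp(b1))
```

The multiplicative effect is constant across ages, even though the absolute change in the expected count is larger for older elephants.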
Example: parameter interpretation
For the elephant data:
β̂0 : No inherent meaning in the context of the data since
age= 0 is not meaningful, outside of range of possible data
Since coefficient is positive, expected # of mates ↑ with age
β̂1 : An increase of 1 year in age increases expected number of elephant mates by a multiplicative factor of $e^{0.06859} \approx 1.07$
Example: Getting fitted values
Fitted model:
$$\lambda_i = e^{\hat{\beta}_0 + \hat{\beta}_1 X_i}$$
What is fitted count for an elephant of 30 years?
Estimated mean number of mates = 1.6
Estimated variance in number of mates = 1.6
Example: Estimating fitted values
$$\lambda_i = e^{\hat{\beta}_0 + \hat{\beta}_1 X_i}$$
What is fitted count for an elephant of 45 years?
Estimated mean number of mates = 4.5
Estimated variance in number of mates = 4.5
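The two fitted counts follow directly from the reported coefficients. A minimal sketch of the arithmetic, using the estimates from the regression table (the function name fitted_count is a hypothetical helper):

```r
# Estimates reported on the slides for the elephant model
b0 <- -1.582
b1 <- 0.06859

# Fitted mean (and, under the Poisson assumption, fitted variance) at a given age
fitted_count <- function(age) exp(b0 + b1 * age)

fitted_30 <- fitted_count(30)
fitted_45 <- fitted_count(45)
print(round(c(age30 = fitted_30, age45 = fitted_45), 1))
```

With the model object available, predict(elephant_poisson, data.frame(Age = c(30, 45)), type = "response") should give the same values up to rounding of the coefficients.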
Getting fitted values in R
predicted_values <- cbind(predict(elephant_poisson, data.frame(Age = seq(25, 55, 5)),
                                  type = "response", se.fit = TRUE),
                          data.frame(Age = seq(25, 55, 5)))
# create lower and upper bounds for CIs
predicted_values$lowerBound <- predicted_values$fit - 1.96 * predicted_values$se.fit
predicted_values$upperBound <- predicted_values$fit + 1.96 * predicted_values$se.fit
[Figure: Predicted # of mates against Age (Years)]
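Intervals built on the response scale, as above, can in principle dip below zero for small fitted counts. One alternative, sketched here as an assumption rather than the lecture's method, is to build the interval on the link (log) scale and exponentiate. The data are simulated stand-ins, since the elephant data file is not loaded in this snippet.

```r
set.seed(42)
# Simulated stand-in for the elephant data (not the real observations)
sim <- data.frame(Age = sample(27:52, 41, replace = TRUE))
sim$Matings <- rpois(41, lambda = exp(-1.582 + 0.069 * sim$Age))
fit <- glm(Matings ~ Age, data = sim, family = poisson)

# Predict on the link scale, where the normal approximation is applied
new_ages  <- data.frame(Age = seq(25, 55, 5))
link_pred <- predict(fit, newdata = new_ages, type = "link", se.fit = TRUE)

# Back-transform the fit and the interval end points to the count scale
new_ages$fit        <- exp(link_pred$fit)
new_ages$lowerBound <- exp(link_pred$fit - 1.96 * link_pred$se.fit)
new_ages$upperBound <- exp(link_pred$fit + 1.96 * link_pred$se.fit)
print(new_ages)
```

The back-transformed bounds are always positive and are asymmetric around the fitted count.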
Assumptions: Over-dispersion
Assuming that model is correctly specified, assumption that
conditional variance is equal to conditional mean should be
checked
There are several tests, including a likelihood ratio test of the over-dispersion parameter alpha, obtained by fitting the same model with a negative binomial distribution
The R package AER provides many functions for count data, including dispersiontest for testing over-dispersion
One common cause of over-dispersion is excess zeros, which
in turn are generated by an additional data generating
process
In this situation, zero-inflated model should be considered
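A rough way to eyeball the mean-variance assumption without extra packages is the Pearson dispersion statistic, which should be near 1 for well-behaved Poisson data. This is a sketch on simulated data. It is not the estimator used by dispersiontest, and the helper name pearson_dispersion and all simulation settings are assumptions.

```r
set.seed(7)
n <- 500
x <- runif(n)

# Two simulated outcomes with the same mean structure:
# one Poisson, one negative binomial (variance = mu + mu^2 / size)
mu       <- exp(1 + 0.5 * x)
y_pois   <- rpois(n, lambda = mu)
y_negbin <- rnbinom(n, mu = mu, size = 1)

# Pearson dispersion statistic: sum of squared Pearson residuals / residual df
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

disp_pois   <- pearson_dispersion(glm(y_pois ~ x, family = poisson))
disp_negbin <- pearson_dispersion(glm(y_negbin ~ x, family = poisson))
print(c(poisson = disp_pois, negbin = disp_negbin))
```

A value well above 1, as for the negative binomial outcome here, is a sign that the Poisson standard errors are too small.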
Zero-inflated Poisson: # of mates
[Figure: histogram of # of mates (0 to 8), with Frequency (0 to 14) on the vertical axis]
Though predictors do
seem to impact
distribution of
elephant mates,
Poisson regression
may not be a good fit
(large # of 0s)
We'll check by
  - Running an over-dispersion test
  - Fitting a zero-inflated Poisson regression
Over-dispersion test in R
library(AER)  # provides dispersiontest()
# check equal variance assumption
dispersiontest(elephant_poisson)

Overdispersion test
data: elephant_poisson
z = 0.49631, p-value = 0.3098
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
1.107951

Doesn't seem like we really need a ZIP model, but we'll do it anyway...
Intuition behind Zero-inflated Poisson
In terms of fitting the model, we combine logistic regression
model and Poisson regression model
ZIP model:
  - We model probability of being a perfect zero as a logistic regression
  - Then, we model Poisson part as a Poisson regression
There are two generalized linear models working together to explain data
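The two-part structure can be written as a mixture: with probability π an observation is a structural ("perfect") zero, and otherwise it is drawn from a Poisson distribution. The sketch below writes out that probability mass function. The parameter values and the helper name dzip are hypothetical. In a regression setting π would come from the logistic part and λ from the Poisson part.

```r
# Hypothetical parameters, chosen only for illustration
pi_zero <- 0.3   # probability of a structural ("perfect") zero
lambda  <- 2.5   # mean of the Poisson component

# ZIP pmf: mixture of a point mass at zero and a Poisson distribution
dzip <- function(k, pi_zero, lambda) {
  ifelse(k == 0, pi_zero, 0) + (1 - pi_zero) * dpois(k, lambda)
}

k <- 0:8
probs <- dzip(k, pi_zero, lambda)
print(round(setNames(probs, k), 3))

# Zeros are more likely than under a plain Poisson with the same lambda
print(c(zip_zero = probs[1], poisson_zero = dpois(0, lambda)))
```

Zeros can therefore arise from either part of the model, which is why the zero probability is π + (1 − π)e^{−λ} rather than π alone.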
ZIP model in R
The contributed R package "pscl" contains the function zeroinfl:
library(pscl)  # provides zeroinfl()
# same equation for logit and poisson
zeroinfl_poisson <- zeroinfl(Matings ~ Age, data = elephant, dist = "poisson")

                              Estimate     (SE)
Count model: (Intercept)       -1.45**     (0.55)
Count model: Age_in_Years       0.07***    (0.01)
Zero model: (Intercept)       222.47       (232.27)
Zero model: Age_in_Years       -8.12       (8.44)

AIC              157.88
Log Likelihood   -74.94
Num. obs.         41

Further evidence we don't really need zero-inflated model
Exposure Variables: Offset parameter
Count data often have an exposure variable, which indicates
# of times event could have happened
This variable should be incorporated into a Poisson model
using offset option
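In R's glm(), one way to do this is to add offset(log(exposure)) to the formula, which fixes the coefficient on log exposure at 1 so the model describes a rate. The sketch below uses simulated data. The variable names, the true rate of 0.4 and the exposure range are assumptions for illustration only.

```r
set.seed(99)
n <- 300
# Hypothetical exposure (e.g. observation time) and a true rate of 0.4 events per unit
exposure <- runif(n, min = 1, max = 10)
y <- rpois(n, lambda = 0.4 * exposure)

# offset(log(exposure)) enters the linear predictor with its coefficient fixed at 1,
# so the model is ln(lambda_i) = ln(exposure_i) + beta0
fit_rate <- glm(y ~ 1 + offset(log(exposure)), family = poisson)

estimated_rate <- unname(exp(coef(fit_rate)))
print(estimated_rate)          # estimated events per unit of exposure
print(sum(y) / sum(exposure))  # raw rate, which the intercept-only fit reproduces
```

Covariates can be added to the right-hand side as usual, and their exponentiated coefficients are then read as rate ratios.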
Ex: Food insecurity in Tanzania and Mozambique
Survey data from households about agriculture
Covered such things as:
  - Household features (e.g. construction materials used, number of household members)
  - Agricultural practices (e.g. water usage)
  - Assets (e.g. number and types of livestock)
  - Details about the household members
Collected through interviews conducted between Nov. 2016 and June 2017 using forms downloaded to Android smartphones
What predicts owning more livestock?
Outcome: Livestock count [1-5]
Predictors:
  - # of years lived in village
  - # of people who live in household
  - Whether they're a part of a farmer cooperative
  - Conflict with other farmers
Owning Livestock: Estimate poisson regression
# load data
safi <- read.csv("https://raw.githubusercontent.com/ASDS-TCD/StatsII_Spring2023/main/datasets/SAFI.csv",
                 stringsAsFactors = TRUE)

# estimate poisson regression model
safi_poisson <- glm(liv_count ~ no_membrs + years_liv + memb_assoc + affect_conflicts,
                    data = safi, family = poisson)

                               Estimate    (SE)
(Intercept)                     0.40**     (0.15)
no_membrs                       0.03       (0.02)
years_liv                       0.01*      (0.00)
memb_assoc_yes                 -0.03       (0.16)
affect_conflicts_frequently     0.09       (0.24)
affect_conflicts_more_once      0.14       (0.15)
affect_conflicts_once           0.09       (0.25)

AIC              417.98
BIC              438.11
Log Likelihood   -201.99
Deviance          54.52
N                 131
***p < 0.001; **p < 0.01; *p < 0.05
Owning Livestock: Poisson regression curve
Add fitted curve to scatterplot:
[Figure: scatterplot of Number of livestock (1 to 5) against Years lived in village (0 to 80) with fitted curve]
As # of years in village ↑, expected # of livestock ↑
Owning Livestock: Fitted values in R
# hypothetical households: average size, not in a cooperative, never in conflict
safi_ex <- data.frame(no_membrs = rep(mean(safi$no_membrs), 6),
                      years_liv = seq(1, 60, 10),
                      memb_assoc = rep("no", 6),
                      affect_conflicts = rep("never", 6))
# predicted counts and standard errors on the response scale
pred_safi <- cbind(predict(safi_poisson, safi_ex, type = "response", se.fit = TRUE), safi_ex)
[Figure: Predicted # of livestock (1.5 to 3.0) against Years in village (0 to 50)]
Owning Livestock: Over-dispersion
dispersiontest(safi_poisson)

Overdispersion test
data: safi_poisson
z = -12.433, p-value = 1
alternative hypothesis: true dispersion is greater than 1
sample estimates:
dispersion
0.4130252

No evidence of over-dispersion (the estimated dispersion is below 1), so we don't really need a ZIP model
Wrap Up
In this lesson, we went over how to...
Estimate and interpret a Poisson regression for count data
Next time, we’ll talk about...
Duration models
Censoring & truncation
Selection