7. logistics regression using spss

Dr Nisha Arora
Logistic Regression using SPSS

Object-wise Analysis
Steps to select appropriate statistical test
 Define clearly the objective of the study.
 Define the level of measurement (metric/non-metric) of each variable to be included in the analysis.

Selecting the appropriate technique
Bivariate techniques

                            Response Variable (DV)
Explanatory Variable (IDV)  Metric                               Non-metric
Metric                      Regression                           Logistic Regression / LDA
Non-metric                  Dummy Var Reg. / Hypothesis Test*    Chi-square test

Make sure to check all assumptions before applying any statistical technique.

Selecting the appropriate technique
                                    Response Variable(s) (DVs)
                                    One DV                                                                 More than one DV
Explanatory Variable(s) (IDVs)      Metric                     Non-metric                                  Metric
One IDV, metric                     Simple Regression          Binary/Multinomial (Logistic) Reg           Path Analysis
One IDV, non-metric                 t test / Anova             Chi Square Test                             Manova
More than one IDV, all metric       Multiple Reg               Multiple Logit Reg / Multiple Multinomial   Path Analysis
More than one IDV, all non-metric   n-way Anova                Complex Crosstab / Log-linear analysis      n-way Manova
More than one IDV, mixed            n-way Ancova / Dummy var   Multiple Logit Reg / Multiple Multinomial   n-way Mancova

Selecting the appropriate Technique
Binary (Binomial) Logistic Regression
Multinomial Logistic Regression
Ordinal Logistic Regression
Poisson Regression

Types of Logistic Regression
Binary
• Response has only two possible outcomes.
• E.g.: Spam or Not
Multinomial
• Three or more categories without ordering.
• E.g.: Predicting which food is preferred more (Veg, Non-Veg, Vegan)
Ordinal
• Three or more categories with ordering.
• E.g.: Movie rating from 1 to 5

Prediction or Classification?

Types of Classification Problems
 Multi-Label Classification
 Multi-Class Classification
 Binary Classification

Classification Problem
 To predict in advance whether a product launch will be successful or not
 An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent
 Benign or malignant tumor
 Spam detection
 Movie genre classification

Box-Tidwell Test
 In the model, include interactions between the continuous predictors and
their logs.
 If such an interaction is significant, then the assumption has been
violated.
 If any interaction is significant, try adding to the model powers of the
predictor (that is, going polynomial)
Caution:
 Not a very robust test as it gets affected by sample size.
 You should not be very concerned with a just significant interaction
when sample sizes are large.
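As a sketch of the model described above, for a single continuous predictor X (assumed strictly positive so that ln X is defined), the augmented logit can be written as follows. The symbols β0, β1 and γ are notation introduced here for illustration; they do not appear on the slide.

```latex
\log\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X + \gamma \, X \ln X
```

The test then checks whether γ differs significantly from zero; a significant γ suggests the logit is not linear in X.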

Assumptions of Logit Regression
 Binary response variable with mutually exclusive and exhaustive categories.
 One or more predictor variable(s).
 Independent observations.
 Linear relationship between continuous independent variable(s) and the logit transformation of the dependent variable.
 This assumption can be tested using the Box-Tidwell test: include in the model interactions between the continuous predictors and their logs. If such an interaction is significant, the assumption has been violated.

What about collinearity, perfect collinearity, and multicollinearity?
https://stats.stackexchange.com/a/432543/79100

More Discussion on Multicollinearity
• What happens when you have multicollinearity?
Multicollinearity isn't as deleterious for prediction, but it may affect the significance of individual variables.
https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not-checked-in-modern-statistics-machine-learning
• Can you safely ignore multicollinearity?
https://statisticalhorizons.com/multicollinearity
• How to handle multicollinearity?
https://www.researchgate.net/post/how_to_deal_with_multicolinearity#view=580ef132ed99e1c1046fcf01
• Why not use the stepwise method?
http://www.philender.com/courses/linearmodels/notes4/swprobs.html
http://www.danielezrajohnson.com/stepwise.pdf

Assumptions: More Considerations
 Logistic regression typically requires a large sample size because it uses maximum likelihood estimation [maximum likelihood estimates are less powerful at small sample sizes than ordinary least squares].
 It is also important to keep in mind that when the outcome is rare, even
if the overall dataset is large, it can be difficult to estimate a logit model.
 Empty cells or small cells: You should check for empty or small cells
by doing a crosstab between categorical predictors and the outcome
variable. If a cell has very few cases (a small cell), the model may
become unstable or it might not run at all.
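The crosstab check described above can be sketched in a few lines of Python. The data, the helper names `crosstab` and `small_cells`, and the `min_count` cut-off are all hypothetical choices made for illustration; in SPSS the same check would be done with the Crosstabs procedure.

```python
from collections import Counter

# Hypothetical toy data: (categorical predictor level, binary outcome)
records = [
    ("low", 0), ("low", 0), ("low", 1),
    ("medium", 0), ("medium", 1), ("medium", 1),
    ("high", 1), ("high", 1),  # no "high" case has outcome 0
]

def crosstab(rows):
    """Count cases in every predictor-level x outcome cell, including empty cells."""
    counts = Counter(rows)
    levels = sorted({level for level, _ in rows})
    outcomes = sorted({y for _, y in rows})
    return {(lv, y): counts.get((lv, y), 0) for lv in levels for y in outcomes}

def small_cells(table, min_count=5):
    """Return cells with fewer than min_count cases (min_count is an assumed cut-off)."""
    return {cell: n for cell, n in table.items() if n < min_count}

table = crosstab(records)
empty = [cell for cell, n in table.items() if n == 0]
print("Empty cells:", empty)
```

Building the table over every level and outcome combination, rather than only the observed pairs, is what makes empty cells visible.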

Why can’t we use Linear Regression for Classification Problems?

What is Logistic Regression?
The logistic regression curve is called the “Sigmoid Curve”, also known as the S-curve.
How to decide whether the value is 0 or 1 from this curve? Set a threshold.

How to set a threshold?
 Default: 0.5
 Based on group sizes (as we do in LDA)
 Based on performance evaluation metrics using cross-validation
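A minimal sketch of applying a threshold to predicted probabilities. The probabilities are made up, the function name `classify` is hypothetical, and using the share of events (here an assumed 30%) as the cut-off is only one possible reading of "based on group sizes".

```python
def classify(probabilities, threshold=0.5):
    """Label a case 1 when its predicted probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical predicted probabilities from a fitted model
probs = [0.10, 0.35, 0.50, 0.62, 0.90]

# Default cut-off of 0.5
default_labels = classify(probs)

# Cut-off taken from group sizes: an assumed 30% share of events in the sample
prior_labels = classify(probs, threshold=0.30)

print(default_labels)
print(prior_labels)
```

Lowering the threshold labels more cases as events, which raises sensitivity at the cost of specificity.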

Logistic Regression Equation
Rather than modeling this response Y directly,
Logistic regression models the probability that Y belongs
to a particular category.
 P(Y =1 | X) or P(X) can take values from 0 to 1
$$\log\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$$
How to interpret?
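To make the link between probability and log-odds concrete, here is a small sketch of the logit and its inverse. The function names are hypothetical, chosen for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: maps any real number z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_probability(intercept, coefficients, x):
    """P(Y = 1 | X) for the linear predictor b0 + b1*x1 + ... + bn*xn."""
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return sigmoid(z)

# A probability of 0.5 corresponds to log-odds of 0, and the two functions undo each other
print(logit(0.5))
print(sigmoid(logit(0.8)))
```

Because the right-hand side of the equation is unbounded while a probability is not, the model is linear on the logit scale rather than on the probability scale.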

Logistic Regression Equation
 Alternatively, we can write
$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$
 Or
$$P(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}$$

Understanding the Odds
Exp(B) is the multiplicative change in the odds of the event of interest for a one-unit change in the predictor.
$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$

Understanding Odds
$$\text{Logit} = \log(\text{Odds}) = \log\left(\frac{p}{1 - p}\right) = \log\left(\frac{\text{probability of event happening}}{\text{probability of event not happening}}\right)$$
$$\text{Odds Ratio (OR)} = \frac{\text{Odds in favor of event } Y \mid X = X_0 + 1}{\text{Odds in favor of event } Y \mid X = X_0}$$
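The odds ratio for a one-unit increase does not depend on the starting value of X; it always equals the exponential of the coefficient. The sketch below checks this numerically for a one-predictor model. The coefficients `b0` and `b1` are hypothetical, chosen only for illustration.

```python
import math

def odds(p):
    """Odds in favour of an event that has probability p."""
    return p / (1 - p)

def probability(intercept, slope, x):
    """P(Y = 1 | X = x) for a one-predictor logistic model."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))

# Hypothetical coefficients, for illustration only
b0, b1 = -2.0, 0.7

# Odds ratio for a one-unit increase, evaluated at two different starting points
or_at_1 = odds(probability(b0, b1, 2.0)) / odds(probability(b0, b1, 1.0))
or_at_5 = odds(probability(b0, b1, 6.0)) / odds(probability(b0, b1, 5.0))

print(or_at_1, or_at_5, math.exp(b1))
```

The change in probability, by contrast, does depend on the starting point, which is why coefficients are usually interpreted on the odds scale.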




Interpreting the coefficients
Response: default [Y/N]   Predictor: [Account] balance
Estimated coefficients of the logistic regression model that predicts the probability of default using balance (intercept -10.6513, balance coefficient 0.0055).
A one-unit increase in balance is associated with:
 an increase in the log odds of default by 0.0055 units, or equivalently
 a multiplication of the odds of default by exp(0.0055) ≈ 1.0055.
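The conversion from coefficient to odds ratio can be checked directly. The coefficient 0.0055 is taken from the slide; the 100-unit example and the variable names are additions made here to show how the effect scales.

```python
import math

b_balance = 0.0055  # coefficient of balance, from the slide

# Multiplicative change in the odds of default for a one-unit increase in balance
odds_ratio_per_unit = math.exp(b_balance)

# The same change expressed as a percentage increase in the odds
percent_change_per_unit = (odds_ratio_per_unit - 1) * 100

# For a 100-unit increase the coefficient is multiplied by 100 before exponentiating
odds_ratio_per_100 = math.exp(b_balance * 100)

print(round(odds_ratio_per_unit, 4))
print(round(odds_ratio_per_100, 3))
```

Effects multiply on the odds scale, so the odds ratio for 100 units equals the one-unit odds ratio raised to the power 100, not 100 times it.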

Interpreting the coefficients
Probability of default for an individual with a balance of $1,000 is
$$P(X) = \frac{e^{-10.6513 + 0.0055 \times 1000}}{1 + e^{-10.6513 + 0.0055 \times 1000}} = 0.00576 = 0.576\%$$
Probability of default for an individual with a balance of $2,000 is
$$P(X) = \frac{e^{-10.6513 + 0.0055 \times 2000}}{1 + e^{-10.6513 + 0.0055 \times 2000}} = 0.586 = 58.6\%$$
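The two probabilities can be reproduced with a few lines of Python. The intercept and coefficient are the values shown on the slide; the function name `p_default` is hypothetical.

```python
import math

INTERCEPT = -10.6513  # from the slide
B_BALANCE = 0.0055    # from the slide

def p_default(balance):
    """Predicted probability of default for a given account balance."""
    z = INTERCEPT + B_BALANCE * balance
    return math.exp(z) / (1 + math.exp(z))

for balance in (1000, 2000):
    print(f"balance = {balance}: P(default) = {p_default(balance):.5f}")
```

Doubling the balance moves the prediction from well under 1% to over 50%, which illustrates how steep the S-curve is in its middle section.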

Maximum Likelihood Estimation
The objective: not to “correctly” estimate the logit, but to classify better.
The parameters should take values that produce a score [probability p] that allows a good cut-off, meaning this score should be high for one class and low for the other.
If P(Yi = 1|Xi) = P(Xi) = Pi, then the likelihood of observation i is
$$L_i = P_i^{Y_i} (1 - P_i)^{1 - Y_i}$$
We maximize the collective form of this function over all n observations:
$$\max L = \max \prod_{i=1}^{n} L_i$$
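A small sketch of the likelihood for a handful of observations. The outcomes and the two sets of scores are made up; the point is only that scores which are high for one class and low for the other give a larger likelihood. The log-likelihood is included because it is the form usually maximized in practice.

```python
import math

def likelihood(y, p):
    """Product over observations of p_i^y_i * (1 - p_i)^(1 - y_i)."""
    result = 1.0
    for yi, pi in zip(y, p):
        result *= pi ** yi * (1 - pi) ** (1 - yi)
    return result

def log_likelihood(y, p):
    """Sum of the logs of the individual likelihoods (numerically safer than the product)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))

# Hypothetical outcomes and two candidate sets of predicted probabilities
y = [1, 0, 1, 0]
good_scores = [0.9, 0.1, 0.8, 0.2]  # high for events, low for non-events
poor_scores = [0.6, 0.5, 0.4, 0.5]  # barely separates the classes

print(likelihood(y, good_scores))
print(likelihood(y, poor_scores))
```

Maximum likelihood estimation searches for the coefficients whose predicted probabilities make this quantity as large as possible.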




Note Points
 For a standard logistic regression, ignore the Previous and Next buttons; they are for sequential (hierarchical) logistic regression.
 Keep the Method: option at its default value, Enter. "Enter" is the name SPSS Statistics gives to standard regression analysis.
 SPSS Statistics requires you to define all the categorical predictor variables in the logistic regression model. It does not do this automatically.
 The default behaviour in SPSS Statistics is for the last category (numerically) to be selected as the reference category.
 If we change the method from Enter to Forward: Wald, only the significant coefficients are included in the logistic regression equation. This gives a more parsimonious model, but stepwise selection has known problems (see the stepwise links in the multicollinearity discussion).
Source: https://statistics.laerd.com/

Interpretation of SPSS Output

Using ROC to find Optimal Cut-Off
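One common way to pick a cut-off from the ROC curve is to maximize Youden's J (sensitivity + specificity - 1). The slides do not say which criterion was used, so this is an assumed choice, and the outcomes, scores and function names below are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score used as cut-off."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        points.append((t, tp / positives, tn / negatives))
    return points

def best_cutoff(y_true, scores):
    """Cut-off that maximizes Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(y_true, scores), key=lambda pt: pt[1] + pt[2] - 1)

# Hypothetical observed outcomes and predicted probabilities
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

cutoff, sensitivity, specificity = best_cutoff(y_true, scores)
print(cutoff, sensitivity, specificity)
```

Other criteria, such as the point closest to the top-left corner or a cost-weighted rule, can give a different cut-off on the same curve.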

How to report the SPSS results
A logistic regression was performed to ascertain the effects of x, y, and gender on the likelihood that participants have the event (positive response).
1. The logistic regression model was statistically significant, χ2(df) = 28.605, p < 0.05 [Omnibus test].
2. A non-significant result (p = 0.78) of the Hosmer-Lemeshow test is an indicator of good model fit.
3. The pseudo R2 measures of explained variation are 56.4% (Cox & Snell R2) and 67.8% (Nagelkerke R2) [For validation data, pseudo R2 …].
4. The model correctly classified 81.0% of cases (model accuracy) for the training data set and 76% of cases for the validation data set. [The data set was randomly divided into training and validation sets, with 70% of the observations in the training set and the rest in the validation set.]
5. The model specificity at the cut-off value.
6. The model sensitivity at the cut-off value.
7. The ROC curve was used to optimize the cut-off point.

Also report the results from the "Variables in the Equation" table, including which of the predictor variables were statistically significant and what predictions can be made based on the odds ratios. E.g.,
Males had 6.02 times higher odds of the event than females.
Increasing x was associated with an increase in the likelihood of the event, but increasing y was associated with a reduction in the likelihood of the event.
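The accuracy, sensitivity and specificity that go into such a report come from the classification table. A minimal sketch, with made-up observed and predicted labels and hypothetical function names:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    tn = sum(1 for y, yhat in pairs if y == 0 and yhat == 0)
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)
    return tp, tn, fp, fn

def classification_summary(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical observed and predicted classes at some chosen cut-off
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

summary = classification_summary(y_true, y_pred)
print(summary)
```

All three numbers depend on the cut-off, so the report should state the cut-off value alongside them.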

Box-Tidwell (1962) Test:

Evaluation Metrics
 AIC
 Null & Residual Deviance
 Accuracy & Misclassification Error
 Sensitivity & Specificity
 ROC & AUC
 Precision & Recall
 Lift & Gain
 KS Statistic
 F Scores
 FDR & FOR
 FPR & FNR
 Hosmer-Lemeshow Test
 Customized function specific to business requirement
https://learnerworld.tumblr.com/
How to choose appropriate evaluation metrics?

ROC Curve_Applet
https://kennis-research.shinyapps.io/ROC-Curves/

Other Considerations
 Categorical Predictors
 Accuracy Paradox
 Balanced, Unbalanced & Rare Event Data
 Complete or Quasi-separation
 Pseudo R2 Measures
 Multinomial and Ordinal Logistic Regression
Note: Homoskedasticity is not an assumption in logistic regression.
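The accuracy paradox is easiest to see with rare-event data. In this made-up example, a "model" that always predicts the majority class looks accurate while detecting no events at all.

```python
# Hypothetical rare-event data: 5 events among 100 cases
y_true = [1] * 5 + [0] * 95

# A "model" that always predicts the majority class (no event)
y_pred = [0] * 100

accuracy = sum(1 for y, yhat in zip(y_true, y_pred) if y == yhat) / len(y_true)
true_positives = sum(1 for y, yhat in zip(y_true, y_pred) if y == 1 and yhat == 1)
sensitivity = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}, sensitivity = {sensitivity:.2f}")
```

This is why accuracy alone is a poor guide for unbalanced data, and why sensitivity, specificity and related measures are reported alongside it.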

Evaluation Metrics: Pseudo R2 Measures
 Efron’s R2
 McFadden’s R2
 McFadden’s Adjusted R2
 Cox & Snell R2
 Nagelkerke / Cragg & Uhler’s R2
 McKelvey & Zavoina R2
 Count R2
 Adjusted Count R2
 https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
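Three of the measures listed above can be computed from the log-likelihoods of the intercept-only (null) model and the fitted model. The sketch below uses the standard formulas for McFadden, Cox & Snell and Nagelkerke; the log-likelihood values and sample size are hypothetical, and the function name is made up.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """McFadden, Cox & Snell and Nagelkerke pseudo R2 from two log-likelihoods."""
    mcfadden = 1 - ll_model / ll_null
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Cox & Snell cannot reach 1; Nagelkerke rescales it by its maximum possible value
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

# Hypothetical log-likelihoods for a sample of 100 cases
result = pseudo_r2(ll_null=-68.0, ll_model=-45.0, n=100)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The measures are on different scales, so they should not be compared with each other or read as the share of variance explained in the OLS sense.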


My Interesting answers/posts
To understand results of logistic regression or other classifiers
https://learnerworld.tumblr.com/post/152327498485/enjoystatisticswithmebinaryclassifierperformance
Hypothesis testing in layman’s terms
https://learnerworld.tumblr.com/search/hypothesis
Understanding mediation effect
https://learnerworld.tumblr.com/post/146541892120/mediation-effectenjoystatisticswtihme
Dependence vs Correlation
https://www.quora.com/What-is-the-difference-between-dependence-and-correlation/answer/Nisha-Arora-9
Collinearity & Correlation
https://www.quora.com/In-statistics-what-is-the-difference-between-collinearity-and-correlation/answer/Nisha-Arora-9

Follow me
http://stats.stackexchange.com/users/79100/nisha-arora
http://stackoverflow.com/users/5114585/nisha-arora
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
http://learnerworld.tumblr.com/
https://www.slideshare.net/NishaArora1
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw/videos
https://www.linkedin.com/in/drnishaarora/

Any other topic which you want to hear/learn from me?
Feel free to leave a comment on my YouTube or mail me at
dr.aroranisha@gmail.com

Reach Out to Me
http://stats.stackexchange.com/users/79100/nisha-arora
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
http://learnerworld.tumblr.com/
nishaarora4@gmail.com
