Dr Nisha Arora
Logistic Regression using SPSS
Object-wise Analysis
Steps to select appropriate statistical test
• Define clearly the objective of the study.
• Define the level of measurement (metric/non-metric) of each variable to be included in the analysis.
Selecting the appropriate technique
Bivariate techniques
Rows: Explanatory Variable (IDV); columns: Response Variable (DV)

IDV | DV: Metric | DV: Non-metric
Metric | Regression | Logistic Regression / LDA
Non-metric | Dummy Var Reg. / Hypothesis Test* | Chi-square test

Make sure to check all assumptions before applying any statistical technique.
Selecting the appropriate technique
Rows: Explanatory Variable(s) (IDVs); columns: Response Variable(s) (DVs)

IDVs | IDV type | One DV: Metric | One DV: Non-metric | More than one DV: Metric
One IDV | Metric | Simple Regression | Binary/Multinomial (Logistic) Reg | Path Analysis
One IDV | Non-metric | t test/Anova | Chi Square Test | Manova
More than one IDV | All Metric | Multiple Reg | Multiple Logit Reg/Multiple Multinomial | Path Analysis
More than one IDV | All Non-metric | n-way Anova | Complex Crosstab/Log-linear analysis | n-way Manova
More than one IDV | Mixed | n-way Ancova/Dummy var | Multiple Logit Reg/Multiple Multinomial | n-way Mancova
Selecting the appropriate Technique
• Binary (Binomial) Logistic Regression
• Multinomial Logistic Regression
• Ordinal Logistic Regression
• Poisson Regression
Types of Logistic Regression
Binary
• Response has only two possible outcomes.
• E.g.: Spam or Not Spam
Multinomial
• Three or more categories without ordering.
• E.g.: Predicting which food is preferred more (Veg, Non-Veg, Vegan)
Ordinal
• Three or more categories with ordering.
• E.g.: Movie rating from 1 to 5
Prediction or Classification?
Types of Classification Problems
• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
Classification Problem
• To predict in advance whether a product launch will be successful or not
• An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent
• Benign or malignant tumor
• Spam detection
• Movie genre classification
Box-Tidwell Test
• In the model, include for each continuous predictor an interaction between the predictor and its natural log.
• If such an interaction is significant, then the assumption of linearity in the logit has been violated.
• If any interaction is significant, try adding powers of the predictor to the model (that is, going polynomial).
Caution:
• Not a very robust test, as it is affected by sample size.
• You should not be very concerned with a just-significant interaction when sample sizes are large.
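As a sketch of the model the test fits, assume a single continuous, strictly positive predictor X. The single-predictor setup and the symbol γ are illustrative assumptions, not taken from the slides. The augmented model adds the term X ln X to the usual logit equation:

```latex
\log\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X + \gamma \, X \ln X,
\qquad H_0 : \gamma = 0
```

A significant estimate of γ suggests that the logit is not linear in X. Because the logarithm requires X > 0, a predictor with zero or negative values would need to be shifted before the term is built.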
Assumptions of Logit Regression
• Binary response variable with mutually exclusive and exhaustive categories.
• One or more predictor variable(s).
• Independent observations.
• Linear relationship between the continuous independent variable(s) and the logit transformation of the dependent variable.
  • This assumption can be tested using the Box-Tidwell test: include in the model interactions between the continuous predictors and their logs. If such an interaction is significant, then the assumption has been violated.
What about collinearity, perfect collinearity, and multicollinearity?
https://stats.stackexchange.com/a/432543/79100
More Discussion on Multicollinearity
• What happens when you have multicollinearity?
Multicollinearity isn't as deleterious for prediction, but it may affect the significance of individual variables.
https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not-checked-in-modern-statistics-machine-learning
• Can you safely ignore multicollinearity?
https://statisticalhorizons.com/multicollinearity
• How to handle multicollinearity?
https://www.researchgate.net/post/how_to_deal_with_multicolinearity#view=580ef132ed99e1c1046fcf01
• Why not use the stepwise method?
http://www.philender.com/courses/linearmodels/notes4/swprobs.html
http://www.danielezrajohnson.com/stepwise.pdf
Assumptions: More Considerations
• Logistic regression typically requires a large sample size because it uses maximum likelihood estimation. [Maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.]
• It is also important to keep in mind that when the outcome is rare, even if the overall dataset is large, it can be difficult to estimate a logit model.
• Empty cells or small cells: check for empty or small cells by doing a crosstab between each categorical predictor and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or might not run at all.
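The crosstab check described above can be sketched in a few lines of plain Python. The data, the helper names `crosstab` and `small_cells`, and the cut-off of 5 cases for a "small" cell are all illustrative assumptions; the slides do not fix a particular threshold.

```python
from collections import Counter

def crosstab(predictor, outcome):
    """Count cases for every (predictor level, outcome level) pair."""
    counts = Counter(zip(predictor, outcome))
    levels = sorted(set(predictor))
    outcomes = sorted(set(outcome))
    # Include pairs that never occur, so empty cells show up with count 0.
    return {(lv, oc): counts.get((lv, oc), 0) for lv in levels for oc in outcomes}

def small_cells(table, min_count=5):
    """Return the cells whose count is below min_count (assumed rule of thumb)."""
    return {cell: n for cell, n in table.items() if n < min_count}

# Hypothetical data: a three-level categorical predictor and a binary outcome.
region = ["N"] * 12 + ["S"] * 12 + ["E"] * 6
event = [1] * 6 + [0] * 6 + [1] * 7 + [0] * 5 + [0] * 6

table = crosstab(region, event)
flagged = small_cells(table)
print(flagged)
```

Here the level "E" never occurs together with the event, which is the kind of empty cell that can make the estimation unstable.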
Why can’t we use Linear Regression for Classification Problems?
Why not Linear Regression?
What is Logistic Regression?
The logistic regression curve is called the “sigmoid curve”, also known as the S-curve.
How to decide whether the value is 0 or 1 from this curve? Set a threshold.
How to set a threshold?
• Default: 0.5
• Based on group sizes (as we do in LDA)
• Based on a performance evaluation metric, using cross-validation
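A minimal sketch of thresholding, assuming the model has already produced a linear score for each case. The scores and the alternative threshold of 0.7 are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(prob, threshold=0.5):
    """Return 1 when the predicted probability reaches the threshold, else 0."""
    return 1 if prob >= threshold else 0

# Hypothetical linear-predictor values (not from the slides).
scores = [-2.0, -0.3, 0.0, 0.8, 3.0]
probs = [sigmoid(z) for z in scores]

default_labels = [classify(p) for p in probs]                # threshold 0.5
strict_labels = [classify(p, threshold=0.7) for p in probs]  # stricter cut-off
print(default_labels, strict_labels)
```

Raising the threshold moves borderline cases from class 1 to class 0, which is why the choice of cut-off changes sensitivity and specificity.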
Logistic Regression Equation
Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.
• P(Y = 1 | X), or P(X), can take values from 0 to 1.
$$\log\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$$
How to interpret the coefficients?
Logistic Regression Equation
• Alternatively, we can write
$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$
• Or
$$P(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}$$
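The log-odds, odds, and probability forms of the equation describe the same model. A small numerical check, using hypothetical coefficients and predictor values that are not from the slides, might look like this:

```python
import math

def probability(beta0, betas, xs):
    """P(X) from the probability form: e^eta / (1 + e^eta)."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical coefficients and predictor values, for illustration only.
beta0, betas, xs = -1.5, [0.8, -0.4], [2.0, 1.0]

eta = beta0 + sum(b * x for b, x in zip(betas, xs))  # linear predictor
p = probability(beta0, betas, xs)
odds = p / (1.0 - p)       # odds form: should equal e^eta
log_odds = math.log(odds)  # log-odds form: should equal eta
print(p, odds, log_odds)
```

Starting from the probability and working back recovers the linear predictor, which is why the coefficients are usually read on the log-odds scale.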
Understanding the Odds
Exp(B) is the factor by which the odds of the event of interest are multiplied for a one-unit increase in the predictor, with the other predictors held constant.
$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$
Odds & Odds Ratio
Understanding Odds
Logit = log(Odds) = log(p / (1 − p))
= log(probability of event happening / probability of event not happening)
$$\text{Odds Ratio (OR)} = \frac{\text{Odds in favor of event } Y \mid X = X_0 + 1}{\text{Odds in favor of event } Y \mid X = X_0}$$
Interpreting the coefficients
Response: default [Y/N]. Predictor: [account] balance.
Estimated coefficients of the logistic regression model that predicts the probability of default using balance.
A one-unit increase in balance is associated with
• an increase in the log odds of default by 0.0055 units, or
• a multiplication of the odds of default by exp(0.0055) ≈ 1.0055, i.e. an increase in the odds of about 0.55%.
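The conversion from coefficient to odds ratio can be checked directly. The coefficient 0.0055 is the one quoted above; the 100-unit comparison is an added illustration, not something stated on the slide.

```python
import math

beta_balance = 0.0055  # coefficient of balance quoted on the slide

odds_ratio_per_unit = math.exp(beta_balance)       # one-unit increase in balance
odds_ratio_per_100 = math.exp(beta_balance * 100)  # 100-unit increase (illustrative)

print(round(odds_ratio_per_unit, 4))  # 1.0055
print(round(odds_ratio_per_100, 3))   # 1.733
```

Because the effect is multiplicative on the odds, a 100-unit increase multiplies the odds by exp(0.55), not by 100 times the one-unit change.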
Interpreting the coefficients
Probability of default for an individual with a balance of $1,000 is
$$\frac{e^{-10.6513 + 0.0055 \times 1000}}{1 + e^{-10.6513 + 0.0055 \times 1000}} = 0.00576 = 0.576\%$$
Probability of default for an individual with a balance of $2,000 is
$$\frac{e^{-10.6513 + 0.0055 \times 2000}}{1 + e^{-10.6513 + 0.0055 \times 2000}} = 0.586 = 58.6\%$$
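The two probabilities above can be reproduced with a short function. The intercept −10.6513 and slope 0.0055 are the values used on the slide; the function name `default_probability` is a hypothetical helper.

```python
import math

BETA0 = -10.6513  # intercept used on the slide
BETA1 = 0.0055    # coefficient of balance used on the slide

def default_probability(balance):
    """Predicted probability of default for a given account balance."""
    eta = BETA0 + BETA1 * balance
    return 1.0 / (1.0 + math.exp(-eta))

p_1000 = default_probability(1000)
p_2000 = default_probability(2000)
print(round(p_1000, 5), round(p_2000, 3))
```

The form 1 / (1 + e^(−eta)) used here is algebraically the same as e^eta / (1 + e^eta).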
Parameter Estimation
Maximum Likelihood Estimation
The objective: not to “correctly” estimate the logit, but to classify better.
The parameters should take values that yield a score (the probabilities p) for which a good cutoff exists, meaning that this score should be high for one class and low for the other.
If P(Y_i = 1 | X_i) = P(X_i) = P_i, then the likelihood of observation i is
$$L_i = P_i^{Y_i} \, (1 - P_i)^{1 - Y_i}$$
The goal is to maximize the collective form of this function over all observations:
$$\max L = \max \prod_{i=1}^{n} L_i$$
Log Likelihood Function
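Taking logs turns the product of the L_i into a sum, log L = Σ [Y_i log P_i + (1 − Y_i) log(1 − P_i)]. A minimal sketch, with hypothetical outcomes and predicted probabilities, shows that probabilities which are high for one class and low for the other give a larger log-likelihood:

```python
import math

def log_likelihood(y, p):
    """Sum over observations of Y_i*log(P_i) + (1 - Y_i)*log(1 - P_i)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))

# Hypothetical outcomes and two candidate sets of predicted probabilities.
y = [1, 0, 1, 1, 0]
p_good = [0.9, 0.2, 0.8, 0.7, 0.1]  # high where Y = 1, low where Y = 0
p_poor = [0.5, 0.5, 0.5, 0.5, 0.5]  # uninformative

ll_good = log_likelihood(y, p_good)
ll_poor = log_likelihood(y, p_poor)
print(ll_good, ll_poor)
```

Maximum likelihood estimation searches for the coefficients whose predicted probabilities make this quantity as large as possible.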
Let’s See It In Action
Note Points
• For a standard logistic regression you should ignore the Previous and Next buttons, because they are for sequential (hierarchical) logistic regression.
• The Method: option needs to be kept at the default value, which is Enter. "Enter" is the name given by SPSS Statistics to standard regression analysis.
• SPSS Statistics requires you to define all the categorical predictor variables in the logistic regression model. It does not do this automatically.
• The default behaviour in SPSS Statistics is for the last category (numerically) to be selected as the reference category.
• If we change the method from Enter to Forward: Wald, only the significant coefficients are included in the logistic regression equation. The model may appear to improve, but treat this with caution given the known problems with stepwise selection.
Source: https://statistics.laerd.com/
Interpretation of SPSS Output
We do not report this
Omnibus Test Output
Using ROC to find Optimal Cut-Off
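One common way to pick a cut-off from the ROC curve is to maximize Youden's J (sensitivity + specificity − 1). The slides do not say which criterion was used, so the criterion, the data, and the helper names below are assumptions for illustration.

```python
def roc_points(y_true, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for t in sorted(set(scores)):
        # A case is predicted positive when its score is at least t.
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        points.append((t, tp / positives, tn / negatives))
    return points

def best_cutoff(y_true, scores):
    """Pick the threshold maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(y_true, scores), key=lambda pt: pt[1] + pt[2] - 1)

# Hypothetical true labels and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.85, 0.95]

threshold, sensitivity, specificity = best_cutoff(y_true, scores)
print(threshold, sensitivity, specificity)
```

Other criteria, such as the point closest to the top-left corner or a cost-weighted rule, can give a different cut-off on the same curve.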
How to report the results (SPSS)
A logistic regression was performed to ascertain the effects of x, y, and gender on the likelihood that participants have the event (positive response).
1. The logistic regression model was statistically significant, χ²(df) = 28.605, p < 0.05 [Omnibus test].
2. A non-significant result (p = 0.78) on the Hosmer-Lemeshow test is an indicator of good model fit.
3. The pseudo R² measures of explained variation are 56.4% (Cox & Snell R²) and 67.8% (Nagelkerke R²). [For validation data, pseudo R² …]
… … … … … … … … … … … … … … … …
4. The model correctly classified 81.0% of cases (model accuracy) for the training data set and 76% of cases for the validation data set. [The data set was randomly divided into training and validation sets, with 70% of the observations in the training set and the rest in the validation set.]
5. The model specificity at the chosen cut-off value.
6. The model sensitivity at the chosen cut-off value.
7. The ROC curve was used to optimize the cut-off point.
… … … … … … … … … … … … … … … …
Report the results from the "Variables in the Equation" table, including which of the predictor variables were statistically significant and what predictions can be made based on the odds ratios. E.g.,
Males had 6.02 times the odds of the event compared with females.
Increasing x was associated with an increase in the likelihood of the event, but increasing y was associated with a reduction in the likelihood of the event.
… … … … … … … … … … … … … … … …
Box-Tidwell (1962) Test:
Source: Andy Field
Model Evaluation
Evaluation Metrics
• AIC
• Null & Residual Deviance
• Accuracy & Misclassification Error
• Sensitivity & Specificity
• ROC & AUC
• Precision & Recall
• Lift & Gain
• KS Statistic
• F Scores
• FDR & FOR
• FPR & FNR
• Hosmer-Lemeshow Test
• Customized function specific to business requirement
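Several of the metrics listed above come straight from the four cells of the confusion matrix. The sketch below uses hypothetical labels and assumes both classes occur among the true and predicted labels, so that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    """Compute common metrics; assumes no denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)  # also called recall
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_error": (fp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

# Hypothetical true and predicted labels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead on unbalanced data, which is one reason the list includes sensitivity, specificity, precision, and recall alongside it.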
https://learnerworld.tumblr.com/
How to choose appropriate evaluation metrics?
ROC Curve Applet
https://kennis-research.shinyapps.io/ROC-Curves/
Other Considerations
• Categorical Predictors
• Accuracy Paradox
• Balanced, Unbalanced & Rare Event Data
• Complete or Quasi-separation
• Pseudo R² Measures
• Multinomial and Ordinal Logistic Regression
Note: Homoskedasticity is not an assumption in logistic regression.
Pseudo R² Measures
• Efron’s R²
• McFadden’s R²
• McFadden’s Adjusted R²
• Cox & Snell R²
• Nagelkerke / Cragg & Uhler’s R²
• McKelvey & Zavoina R²
• Count R²
• Adjusted Count R²
• https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
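Three of the measures listed above can be computed from the log-likelihoods of the intercept-only (null) model and the fitted model. The formulas are the standard definitions; the log-likelihood values and sample size below are hypothetical.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """McFadden, Cox & Snell, and Nagelkerke R2 from two log-likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Cox & Snell cannot reach 1; Nagelkerke rescales it by its maximum.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

# Hypothetical log-likelihoods for a sample of 100 cases.
r2 = pseudo_r2(ll_null=-69.3, ll_model=-45.0, n=100)
print(r2)
```

The Nagelkerke value is never smaller than the Cox & Snell value, which matches the pattern seen in SPSS output.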
References
• Field, A. P. (2013). Discovering statistics using IBM SPSS Statistics: and sex and drugs and rock 'n' roll (4th ed.). London: Sage Publications.
• Field, A. P., Miles, J. N. V., & Field, Z. C. (2012). Discovering statistics using R: and sex and drugs and rock 'n' roll. London: Sage Publications.
• Field, A. P., & Miles, J. N. V. (2010). Discovering statistics using SAS: and sex and drugs and rock 'n' roll. London: Sage Publications.
• Kothari, C. R. (2004). Research methodology: methods & techniques. New Age publications.
My Interesting answers/posts
To understand the results of logistic regression or other classifiers
https://learnerworld.tumblr.com/post/152327498485/enjoystatisticswithmebinaryclassifierperformance
Hypothesis testing in layman’s terms
https://learnerworld.tumblr.com/search/hypothesis
Understanding mediation effect
https://learnerworld.tumblr.com/post/146541892120/mediation-effectenjoystatisticswtihme
Dependence vs. Correlation
https://www.quora.com/What-is-the-difference-between-dependence-and-correlation/answer/Nisha-Arora-9
Collinearity & Correlation
https://www.quora.com/In-statistics-what-is-the-difference-between-collinearity-and-correlation/answer/Nisha-Arora-9
My Expertise
Technical Topics:
• Python for Data Science or Data Analysis
• R Programming
• Data Visualization & Storytelling
• Machine Learning/Data Science
• Statistics [for researchers/data science practitioners/university students]: theory, mathematical proofs, application-based, using interactive tools, playing with data using some software
• Data Analysis using SPSS
• Mathematics [topics depend on the requirement]
• Excel [basic to intermediate/tools for data analysis/operations research/operations management/specific course for academicians, etc.]
Non-technical Topics:
• Interactive pedagogical tools/web resources
• The art of effective use of Information & Communication Tools (ICT)
• Tools/platforms for hosting online lectures/meetings/live sessions
• Effective Googling for finding the right resources (books/research papers/answers)
• Leveraging online research communities, Q/A sites, groups, and meet-ups to dive deep into a particular topic of interest
• Bridging the gap between industry & academia
• Creating a personal brand by leveraging the power of social media
• Getting smart with MS Office (Word, Excel, PowerPoint, etc.)
• Learning Google products (mentioned in the slide)
• Learning how to learn
• Learning how to teach
• Note Taking
• Effective Communication & Presentation
Dr.aroranisha@gmail.com
Follow me
http://stats.stackexchange.com/users/79100/nisha-arora
http://stackoverflow.com/users/5114585/nisha-arora
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
http://learnerworld.tumblr.com/
https://www.slideshare.net/NishaArora1
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw/videos
https://www.linkedin.com/in/drnishaarora/
Any other topic which you want to hear/learn from me?
Feel free to leave a comment on my YouTube or mail me at
dr.aroranisha@gmail.com
Thank You
Reach Out to Me
http://stats.stackexchange.com/users/79100/nisha-arora
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
http://learnerworld.tumblr.com/
nishaarora4@gmail.com
Thank You

7. logistics regression using spss

  • 1. Dr Nisha Arora Logistic Regression using SPSS
  • 2. 2
  • 3. Object-wise Analysis 4 Steps to select appropriate statistical test  Define clearly the objective of the study  Define the level of measurement (metric/non-metric) of each variable to be included in the analysis.
  • 4. 5
  • 5. Selecting the appropriate technique 10 Bivariate techniques Response Variable (DV) Explanatory Variable (IDV) Metric Non-metric Metric Regression Logistic Regression/ LDA Non-metric Dummy Var Reg./ Hypothesis Test* Chi-square test Make sure to check all assumptions before applying any statistical technique.
  • 6. Selecting the appropriate technique 12 Response Variable(s) (DVs) One DV More than one DV Explanatory Variable(s) (IDVs) One IDV Metric Non-metric Metric Metric Simple Regression Binary/Multi Nominal (Logistic) Reg Path Analysis Non-metric t test/Anova Chi Square Test Manova More than one IDV All Metric Multiple Reg Multiple Logit Reg/Multiple Multinominal Path Analysis All Non- metric n – way Anova Complex Crosstab/ Log-linear analysis n – way Manova Mixed n – way Ancova/Dumm y var Multiple Logit Reg/Multiple Multinominal n– way Mancova
  • 7. Selecting the appropriate Technique 13 Binary (Binomial) Logistic Regression Multi-Nominal Logistic Regression Ordinal Logistic regression Poisson Regression
  • 8. • Response has only two 2 possible outcomes. • E.g.: Spam or Not Binary • Three or more categories without ordering. • E.g.: Predicting which food is preferred more (Veg, Non-Veg, Vegan) Multinominal • Three or more categories with ordering. • E.g.: Movie rating from 1 to 5 Ordinal 14 Types of Logistic Regression
  • 10. 16 Types of Classification Problems Multi-Label Classification Multi-Class Classification Binary Classification
  • 11. 17  To predict in advance whether a product launch will be successful or not  An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent  Benign or malignant tumor  Spam detection  Movies genres classification Classification Problem
  • 12. Box-Tidwell Test  In the model, include interactions between the continuous predictors and their logs.  If such an interaction is significant, then the assumption has been violated.  If any interaction is significant, try adding to the model powers of the predictor (that is, going polynomial) Caution:  Not a very robust test as it gets affected by sample size.  You should not be very concerned with a just significant interaction when sample sizes are large.
  • 13. Assumptions of Logit Regression  Binary response variable with mutually exclusive and exhaustive categories.  One or more predictor variable(s)  Independent Observations  linear relationship between continuous independent variable(s) and the logit transformation of the dependent variable  This assumption can be tested by using Box-Tidwell Test.  including in the model interactions between the continuous predictors and their logs. If such an interaction is significant, then the assumption has been violated.
  • 14. Assumptions of Logit Regression  Binary response variable with mutually exclusive and exhaustive categories.  One or more predictor variable(s)  Independent Observations  linear relationship between continuous independent variable(s) and the logit transformation of the dependent variable What about Co-linearity, Perfect co-linearity, and Multi-co-linearity? https://stats.stackexchange.com/a/432543/79100
  • 15. More Discussion on Multicollinearity
      • What happens when you have multicollinearity? Multicollinearity is not as deleterious for prediction, but it may affect a variable's significance. https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not-checked-in-modern-statistics-machine-learning
      • Can you safely ignore multicollinearity? https://statisticalhorizons.com/multicollinearity
      • How to handle multicollinearity? https://www.researchgate.net/post/how_to_deal_with_multicolinearity#view=580ef132ed99e1c1046fcf01
      • Why not use the stepwise method? http://www.philender.com/courses/linearmodels/notes4/swprobs.html http://www.danielezrajohnson.com/stepwise.pdf
  • 17. Assumptions: More Considerations
      • Logistic regression typically requires a large sample size because it uses maximum likelihood estimation [maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares].
      • Keep in mind that when the outcome is rare, it can be difficult to estimate a logit model even if the overall dataset is large.
      • Empty cells or small cells: check for empty or small cells by running a crosstab between each categorical predictor and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or might not run at all.
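The empty-cell check described above can be sketched outside SPSS as a plain crosstab count. The data, the function names, and the `min_count=5` default are illustrative assumptions, not values taken from the slides.

```python
from collections import Counter

def crosstab(predictor, outcome):
    """Count cases for every (predictor level, outcome level) pair."""
    counts = Counter(zip(predictor, outcome))
    rows = sorted(set(predictor))
    cols = sorted(set(outcome))
    # Include cells with zero cases explicitly so empty cells are visible.
    return {(r, c): counts.get((r, c), 0) for r in rows for c in cols}

def small_cells(table, min_count=5):
    """Return the cells with fewer than `min_count` cases (assumed cut-off)."""
    return [cell for cell, n in table.items() if n < min_count]

# Hypothetical categorical predictor and binary outcome.
gender = ["M", "M", "F", "F", "F", "M"]
event = [1, 0, 0, 0, 0, 1]

table = crosstab(gender, event)
empty = small_cells(table, min_count=1)  # cells with no cases at all
```

Here no female case has the event, so the cell ("F", 1) is empty and a model using this predictor may be unstable.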
  • 18. Why can’t we use Linear Regression for Classification Problems?
  • 19. Why not Linear Regression?
  • 21. What is Logistic Regression? The logistic regression curve is called the “sigmoid curve”, also known as the S-curve. How to decide from this curve whether the value is 0 or 1? Set a threshold.
  • 22. How to set a threshold?
      • Default: 0.5
      • Based on group sizes (as we do in LDA)
      • Based on a performance evaluation metric, using cross-validation
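A minimal sketch of the first two threshold choices, assuming the predicted probabilities have already been saved from the model. Using the share of events in the data as the group-size threshold is one common reading of "based on group sizes" and is an assumption here, as are the function names and data.

```python
def classify(probabilities, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

def prior_based_threshold(labels):
    """Share of events (1s) in the observed data, used as a cut-off."""
    return sum(labels) / len(labels)

# Hypothetical predicted probabilities and observed outcomes.
probs = [0.2, 0.5, 0.9]
observed = [1, 0, 0, 0]

default_classes = classify(probs)             # default cut-off of 0.5
prior_cutoff = prior_based_threshold(observed)
prior_classes = classify(probs, prior_cutoff)
```

A lower cut-off classifies more cases as events, which usually raises sensitivity at the cost of specificity.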
  • 23. Logistic Regression Equation
    Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.
      • P(Y = 1 | X), written P(X), can take values from 0 to 1.
    $$\log\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$$
    How to interpret β?
  • 25. Logistic Regression Equation
      • Alternatively, we can write
    $$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$
      • Or
    $$P(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}$$
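The three forms of the equation are algebraically equivalent. The sketch below, with hypothetical function names and coefficients, computes the linear predictor, turns it into a probability, and maps the probability back to the log odds.

```python
import math

def linear_predictor(intercept, coefs, xs):
    """beta_0 + beta_1 * X_1 + ... + beta_n * X_n."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))

def sigmoid(z):
    """P(X) = e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p):
    """Log odds: log(P / (1 - P)); the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and predictor values.
z = linear_predictor(-1.0, [0.8, -0.5], [2.0, 1.0])  # -1.0 + 1.6 - 0.5
p = sigmoid(z)
```

Applying `logit` to `p` returns `z`, which is why the coefficients are read on the log-odds scale.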
  • 27. Understanding the Odds
    Exp(B) is the factor by which the odds of the event of interest are multiplied for a one-unit change in the predictor.
    $$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$
  • 28. Odds & Odds Ratio
  • 29. Understanding Odds
    Logit = log(odds) = log(p / (1 - p)) = log(probability of the event happening / probability of the event not happening)
    $$\text{Odds Ratio (OR)} = \frac{\text{odds in favor of event } Y \mid X = X_0 + 1}{\text{odds in favor of event } Y \mid X = X_0}$$
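A short numerical check of the odds ratio definition: for a one-unit increase in a predictor, the ratio of the odds equals exp(β). The coefficients and the starting value `x0` below are hypothetical.

```python
import math

def sigmoid(z):
    """Probability from the log odds z."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds in favor of the event: p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical single-predictor model: logit = b0 + b1 * x.
b0, b1 = -1.0, 0.7
x0 = 2.0

p_before = sigmoid(b0 + b1 * x0)          # P(Y = 1 | X = x0)
p_after = sigmoid(b0 + b1 * (x0 + 1.0))   # P(Y = 1 | X = x0 + 1)

odds_ratio = odds(p_after) / odds(p_before)
```

The result does not depend on `x0`: the odds ratio is the same wherever the one-unit step is taken, although the change in probability is not.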
  • 30. Interpreting the coefficients
    Response: default [Y/N]. Predictor: [account] balance.
    Estimated coefficients of the logistic regression model that predicts the probability of default using balance.
    A one-unit increase in balance is associated with
      • an increase in the log odds of default by 0.0055 units, or
      • a multiplication of the odds of default by exp(0.0055), i.e., about 1.0055.
  • 31. Interpreting the coefficients
    Probability of default for an individual with a balance of $1,000:
    $$\hat{p} = \frac{e^{-10.6513 + 0.0055 \times 1000}}{1 + e^{-10.6513 + 0.0055 \times 1000}} = 0.00576 = 0.576\%$$
    Probability of default for an individual with a balance of $2,000:
    $$\hat{p} = \frac{e^{-10.6513 + 0.0055 \times 2000}}{1 + e^{-10.6513 + 0.0055 \times 2000}} = 0.586 = 58.6\%$$
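The arithmetic on this slide can be reproduced as follows, using the intercept of -10.6513 and the balance coefficient of 0.0055 shown above; the function and variable names are illustrative.

```python
import math

INTERCEPT = -10.6513   # estimated intercept from the slide
BALANCE_COEF = 0.0055  # estimated coefficient for balance

def default_probability(balance):
    """Predicted probability of default for a given account balance."""
    z = INTERCEPT + BALANCE_COEF * balance
    return math.exp(z) / (1.0 + math.exp(z))

p_1000 = default_probability(1000)  # about 0.00576, i.e. 0.576%
p_2000 = default_probability(2000)  # about 0.586, i.e. 58.6%

# Factor by which the odds change per one-unit increase in balance.
odds_multiplier = math.exp(BALANCE_COEF)  # about 1.0055
```

The same one-unit odds multiplier of about 1.0055 produces very different changes in probability at different balances, because the curve is S-shaped.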
  • 33. Maximum Likelihood Estimation
    The objective: not to “correctly” estimate the logit, but to classify better. The parameters should take values that produce a score [the probabilities p] that allows a good cut-off; that is, the score should be high for one class and low for the other.
    If P(Y_i = 1 | X_i) = P(X_i) = P_i, then the likelihood of observation i is
    $$L_i = P_i^{Y_i} (1 - P_i)^{1 - Y_i}$$
    Maximum likelihood estimation maximizes the collective form of this function over all observations:
    $$\max L = \max \prod_{i=1}^{n} L_i$$
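The likelihood above can be evaluated directly for a set of candidate probabilities. This sketch only computes the likelihood and its log for given scores; it does not perform the maximization, and the outcomes and scores are hypothetical.

```python
import math

def likelihood(ys, ps):
    """Product over observations of P_i^Y_i * (1 - P_i)^(1 - Y_i)."""
    total = 1.0
    for y, p in zip(ys, ps):
        total *= (p ** y) * ((1.0 - p) ** (1 - y))
    return total

def log_likelihood(ys, ps):
    """Sum of log terms; maximizing it is equivalent and numerically safer."""
    return sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
               for y, p in zip(ys, ps))

# Hypothetical outcomes with two candidate sets of predicted probabilities.
ys = [1, 0, 1]
sharp_scores = [0.9, 0.1, 0.9]   # high for events, low for non-events
vague_scores = [0.5, 0.5, 0.5]   # no separation between the classes

l_sharp = likelihood(ys, sharp_scores)
l_vague = likelihood(ys, vague_scores)
```

Scores that are high for one class and low for the other give the larger likelihood, which is the behavior the slide describes.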
  • 35. Let’s See It In Action
  • 36. Note Points
      • For a standard logistic regression, ignore the Previous and Next buttons because they are for sequential (hierarchical) logistic regression.
      • Keep the Method: option at its default value, which is Enter.
      • "Enter" is the name SPSS Statistics gives to standard regression analysis.
      • SPSS Statistics requires you to define all the categorical predictors in the logistic regression model. It does not do this automatically.
      • By default, SPSS Statistics selects the last category (numerically) as the reference category.
      • If we change the method from Enter to Forward: Wald, the quality of the logistic regression improves. Now only the significant coefficients are included in the logistic regression equation.
    https://statistics.laerd.com/
  • 42. We do not report this
  • 53. Using ROC to find Optimal Cut-Off
  • 54. How to report the results (SPSS)
    A logistic regression was performed to ascertain the effects of x, y, and gender on the likelihood that participants have the event (positive response).
      1. The logistic regression model was statistically significant, χ²(df) = 28.605, p < 0.05 [Omnibus test].
      2. A non-significant result (p = 0.78) of the Hosmer-Lemeshow test is an indicator of good model fit.
      3. The pseudo R² measures for explained variation are 56.4% (Cox & Snell R²) and 67.8% (Nagelkerke R²). [For validation data, pseudo R² …]
  • 55. How to report the results (SPSS)
      1. The model correctly classified 81.0% of cases (model accuracy) for the training data set and 76% of cases for the validation data set. [The data set was randomly divided into training and validation sets, with 70% of the observations in the training set and the rest in the validation set.]
      2. The model specificity at the cut-off value
      3. The model sensitivity at the cut-off value
      4. The ROC curve was used to optimize the cut-off point.
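Accuracy, sensitivity, and specificity at a given cut-off, and a search for a cut-off along the ROC curve, can be sketched as below. The slides do not say which criterion was used to pick the optimal point; maximizing Youden's J (sensitivity + specificity - 1) is an assumption made for this example, and the data are hypothetical.

```python
def classification_rates(actual, scores, cutoff):
    """Accuracy, sensitivity, and specificity at one cut-off.

    Assumes both classes (0 and 1) are present in `actual`.
    """
    predicted = [1 if s >= cutoff else 0 for s in scores]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

def youden_j(actual, scores, cutoff):
    """Youden's J statistic: sensitivity + specificity - 1."""
    r = classification_rates(actual, scores, cutoff)
    return r["sensitivity"] + r["specificity"] - 1.0

def best_cutoff(actual, scores):
    """Try every observed score as a cut-off and keep the one with highest J."""
    return max(sorted(set(scores)), key=lambda c: youden_j(actual, scores, c))

# Hypothetical observed outcomes and predicted probabilities.
actual = [0, 0, 0, 1, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

cutoff = best_cutoff(actual, scores)
report = classification_rates(actual, scores, cutoff)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot or a cost-weighted rule, can give a different cut-off for the same data.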
  • 56. How to report the results (SPSS)
    Report the results from the "Variables in the Equation" table, including which of the predictor variables were statistically significant and what predictions can be made based on the odds ratios. E.g., the odds of doing this (event) were 6.02 times higher for males than for females. Increasing x was associated with an increase in the likelihood of the event, but increasing y was associated with a reduction in the likelihood of the event.
  • 57. How to report the results (SPSS): Box-Tidwell (1962) Test
  • 60. Evaluation Metrics
      • AIC
      • Null & Residual Deviance
      • Accuracy & Misclassification Error
      • Sensitivity & Specificity
      • ROC & AUC
      • Precision & Recall
      • Lift & Gain
      • KS Statistic
      • F Scores
      • FDR & FOR
      • FPR & FNR
      • Hosmer-Lemeshow Test
      • Customized function specific to the business requirement
    https://learnerworld.tumblr.com/
    How to choose appropriate evaluation metrics?
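Several of the listed metrics are simple ratios of the four confusion-matrix counts. The sketch below uses their standard definitions; the counts are hypothetical and the function name is illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard ratios built from true/false positives and negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # same as sensitivity
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fdr": fp / (tp + fp),       # false discovery rate = 1 - precision
        "for": fn / (fn + tn),       # false omission rate
        "fpr": fp / (fp + tn),       # false positive rate = 1 - specificity
        "fnr": fn / (fn + tp),       # false negative rate = 1 - recall
    }

# Hypothetical confusion-matrix counts.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
```

Which of these to report depends on the relative cost of false positives and false negatives in the application.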
  • 62. Other Considerations
      • Categorical Predictors
      • Accuracy Paradox
      • Balanced, Unbalanced & Rare Event Data
      • Complete or Quasi-separation
      • Pseudo R² Measures
      • Multinomial and Ordinal Logistic Regression
    Homoskedasticity is not an assumption in logistic regression.
  • 64. Evaluation Metrics
      • Efron’s R²
      • McFadden’s R²
      • McFadden’s Adjusted R²
      • Cox & Snell R²
      • Nagelkerke / Cragg & Uhler’s R²
      • McKelvey & Zavoina R²
      • Count R²
      • Adjusted Count R²
    https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
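Three of these measures can be computed from the log-likelihoods of the intercept-only (null) model and the fitted model. The sketch uses their standard formulas; the log-likelihood values and sample size are hypothetical, not output from the deck's SPSS run.

```python
import math

def mcfadden_r2(ll_null, ll_model):
    """McFadden: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell: 1 - (L_null / L_model)^(2/n), written with log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke: Cox & Snell rescaled by its maximum, 1 - L_null^(2/n)."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell

# Hypothetical log-likelihoods for n = 100 cases.
ll_null, ll_model, n = -69.3, -50.0, 100

r2_mcfadden = mcfadden_r2(ll_null, ll_model)
r2_cox_snell = cox_snell_r2(ll_null, ll_model, n)
r2_nagelkerke = nagelkerke_r2(ll_null, ll_model, n)
```

Nagelkerke's value is never smaller than Cox & Snell's, because Cox & Snell's measure cannot reach 1 and Nagelkerke's divides by its upper bound.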
  • 65. References
      • Field, A. P. (2013). Discovering statistics using IBM SPSS Statistics: and sex and drugs and rock 'n' roll (4th ed.). London: Sage Publications.
      • Field, A. P., Miles, J. N. V., & Field, Z. C. (2012). Discovering statistics using R: and sex and drugs and rock 'n' roll. London: Sage Publications.
      • Field, A. P., & Miles, J. N. V. (2010). Discovering statistics using SAS: and sex and drugs and rock 'n' roll. London: Sage Publications.
      • Kothari, C. R. (2004). Research methodology: methods & techniques. New Age Publications.
  • 66. My Interesting answers/posts
      • To understand results of logistic regression or other classifiers: https://learnerworld.tumblr.com/post/152327498485/enjoystatisticswithmebinaryclassifierperformance
      • Hypothesis testing in layman’s terms: https://learnerworld.tumblr.com/search/hypothesis
      • Understanding mediation effect: https://learnerworld.tumblr.com/post/146541892120/mediation-effectenjoystatisticswtihme
  • 67. My Interesting answers/posts
      • Dependence vs. correlation: https://www.quora.com/What-is-the-difference-between-dependence-and-correlation/answer/Nisha-Arora-9
      • Collinearity & correlation: https://www.quora.com/In-statistics-what-is-the-difference-between-collinearity-and-correlation/answer/Nisha-Arora-9
  • 68. My Expertise: Technical Topics
      • Python for Data Science or Data Analysis
      • R Programming
      • Data Visualization & Storytelling
      • Machine Learning/Data Science
      • Statistics [for researchers/data science practitioners/university students]: theory, mathematical proofs, application based, using interactive tools, playing with data using some software
      • Data Analysis using SPSS
      • Mathematics [depends on the requirement]
      • Excel [basic to intermediate/tools for data analysis/operations research/operations management/specific course for academicians, etc.]
  • 69. My Expertise: Non-technical Topics
      • Interactive pedagogical tools/web resources
      • The art of effective use of Information & Communication Tools (ICT)
      • Tools/platforms for hosting online lectures/meetings/live sessions
      • Effective Googling for finding the right resources (books/research papers/answers)
      • Leveraging online research communities, Q/A sites, groups, and meet-ups to dive deep into a particular topic of interest
      • Bridging the gap between industry & academia
      • Creating a personal brand by leveraging the power of social media
      • Getting smart with MS Office (Word, Excel, PowerPoint, etc.)
      • Learning Google products (mentioned in the slide)
      • Learning how to learn
      • Learning how to teach
      • Note taking
      • Effective communication & presentation
    Dr.aroranisha@gmail.com
  • 71. Any other topic which you want to hear/learn from me? Feel free to leave a comment on my YouTube or mail me at dr.aroranisha@gmail.com
  • 75. Reach Out to Me
      • http://stats.stackexchange.com/users/79100/nisha-arora
      • https://www.researchgate.net/profile/Nisha_Arora2/contributions
      • https://www.quora.com/profile/Nisha-Arora-9
      • http://learnerworld.tumblr.com/
      • nishaarora4@gmail.com