1. Analysis of a Binary Outcome Variable Using the FREQ and the LOGISTIC Procedures
Arthur Li
2.
3.
4.
5. ODDS RATIO

                    Outcome (Y)
Exposure (X)        1        0
1                   A        B
0                   C        D

$$\text{Odds}_1 = \frac{A}{B}, \qquad \text{Odds}_0 = \frac{C}{D}$$

$$\text{Odds Ratio} = \frac{\text{Odds}_1}{\text{Odds}_0} = \frac{AD}{BC}$$
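As a quick numerical check of the formula above, the following Python sketch computes the two odds and the odds ratio from four cell counts. Python is used here only for illustration (the presentation itself uses SAS), the function name `odds_ratio` is hypothetical, and the counts are those of the breathing-test table used in the PROC FREQ example.

```python
def odds_ratio(a, b, c, d):
    """Return (odds_1, odds_0, odds ratio) for a 2x2 table laid out as
    exposed row = (a, b) and unexposed row = (c, d)."""
    odds_1 = a / b   # odds of the outcome among the exposed
    odds_0 = c / d   # odds of the outcome among the unexposed
    return odds_1, odds_0, odds_1 / odds_0

# A = 131, B = 927, C = 38, D = 741 (current vs. never smokers)
odds_1, odds_0, or_hat = odds_ratio(131, 927, 38, 741)
print(round(odds_1, 4), round(odds_0, 4), round(or_hat, 4))
```

The ratio of the two odds and the cross-product AD/BC are the same quantity, so either form can be used.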
6.
7.
8. PROC FREQ

data breathTest;
  input test $ 1-8 neversmk $ 10-16 count;
  datalines;
abnormal current 131
normal   current 927
abnormal never    38
normal   never   741
;

                      BREATHING TEST (Y)
SMOKING STATUS (X)    ABNORMAL (1)    NORMAL (0)
CURRENT (1)           131 (A)         927 (B)
NEVER (0)              38 (C)         741 (D)
9. PROC FREQ

proc freq data=breathTest;
  weight count;   /* each input row is a cell count, not one subject */
  tables neversmk*test;
run;

The data are entered directly from the cell counts of the table.

The FREQ Procedure

Table of neversmk by test

neversmk     test
Frequency|
Percent  |
Row Pct  |
Col Pct  |abnormal|normal  |  Total
---------+--------+--------+
current  |    131 |    927 |   1058
         |   7.13 |  50.46 |  57.59
         |  12.38 |  87.62 |
         |  77.51 |  55.58 |
---------+--------+--------+
never    |     38 |    741 |    779
         |   2.07 |  40.34 |  42.41
         |   4.88 |  95.12 |
         |  22.49 |  44.42 |
---------+--------+--------+
Total         169     1668     1837
             9.20    90.80   100.00
10.
11.
12. PROC FREQ - CHISQ

proc freq data=breathTest;
  weight count;
  tables neversmk*test / relrisk chisq;
run;

Statistics for Table of neversmk by test

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     30.2421    <.0001
Likelihood Ratio Chi-Square    1     32.3820    <.0001
Continuity Adj. Chi-Square     1     29.3505    <.0001
Mantel-Haenszel Chi-Square     1     30.2257    <.0001
Phi Coefficient                       0.1283
Contingency Coefficient               0.1273
Cramer's V                            0.1283
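The first and fourth rows of this table, and the phi coefficient, can be reproduced by hand from the four cell counts. The Python sketch below is only an illustrative cross-check, not part of the SAS program; it uses the standard shortcut formula for a 2x2 Pearson chi-square, and the function name `pearson_chi_square` is hypothetical.

```python
import math

def pearson_chi_square(a, b, c, d):
    """Pearson chi-square for a 2x2 table with rows (a, b) and (c, d),
    using the shortcut n * (ad - bc)^2 / (product of the four margins)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

n = 131 + 927 + 38 + 741                  # 1837 subjects in total
chi2 = pearson_chi_square(131, 927, 38, 741)
phi = math.sqrt(chi2 / n)                 # magnitude of the phi coefficient
mh_chi2 = chi2 * (n - 1) / n              # Mantel-Haenszel chi-square for a 2x2 table
print(round(chi2, 4), round(phi, 4), round(mh_chi2, 4))
```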
13.
14.
15. LOGISTIC REGRESSION MODEL

Reference cell coding

β: the increment in log odds for current smokers compared with those who never smoked

                    BREATHING TEST
SMOKING STATUS      ABNORMAL    NORMAL
CURRENT             131         927
NEVER                38         741
16. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
  class neversmk / param=ref;
  weight count;
  model test = neversmk;
run;

The LOGISTIC Procedure

Model Information

Data Set                      WORK.BREATHTEST
Response Variable             test
Number of Response Levels     2
Weight Variable               count
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read      4
Number of Observations Used      4
Sum of Weights Read           1837
Sum of Weights Used           1837
17.
18.
19.
20.
21.
22.
23.
24.
25.
26. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
  class neversmk (ref="never") / param=ref;
  weight count;
  model test = neversmk;
run;

Type 3 Analysis of Effects

                          Wald
Effect      DF      Chi-Square    Pr > ChiSq
neversmk     1         28.2434        <.0001

Analysis of Maximum Likelihood Estimates

                                    Standard          Wald
Parameter            DF  Estimate      Error    Chi-Square    Pr > ChiSq
Intercept             1   -2.9704     0.1663      318.9365        <.0001
neversmk  current     1    1.0136     0.1907       28.2434        <.0001

A current smoker has a 1.01 increase in the log odds of an abnormal test compared with a person who never smoked.

OR = exp(1.0136) = 2.756
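Because this model has a single binary predictor with reference coding, it fits the 2x2 table exactly, so the two estimates can be recovered from the cell counts: the intercept is the log odds of an abnormal test among never smokers, and the slope is the log odds ratio. The Python sketch below is an illustrative cross-check only; the variable names are hypothetical.

```python
import math

# Cell counts from the breathing-test table
abn_current, norm_current = 131, 927
abn_never, norm_never = 38, 741

# Intercept: log odds of an abnormal test in the reference group (never smokers)
intercept = math.log(abn_never / norm_never)

# Slope: difference in log odds, current smokers minus never smokers
beta = math.log(abn_current / norm_current) - intercept

print(round(intercept, 4), round(beta, 4), round(math.exp(beta), 3))
```

The values agree with the PROC LOGISTIC estimates up to the rounding and convergence tolerance of the iterative fit.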
27. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
  class neversmk (ref="never") / param=ref;
  weight count;
  model test = neversmk;
run;

Odds Ratio Estimates

                                Point          95% Wald
Effect                       Estimate      Confidence Limits
neversmk current vs never       2.756       1.896       4.004

Result from PROC FREQ:

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value      95% Confidence Limits
---------------------------------------------------------------
Case-Control (Odds Ratio)     2.7557       1.8962       4.0047
Cohort (Col1 Risk)            2.5383       1.7904       3.5987
Cohort (Col2 Risk)            0.9211       0.8960       0.9470

Sample Size = 1837
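The agreement between the two procedures can be checked by hand. The Python sketch below, offered as an illustration rather than as what SAS does internally, builds the usual large-sample Wald interval for an odds ratio: the standard error of the log odds ratio is the square root of the sum of the reciprocal cell counts, and the limits are exponentiated back to the odds-ratio scale. Variable names are hypothetical.

```python
import math
from statistics import NormalDist

a, b, c, d = 131, 927, 38, 741            # breathing-test cell counts

log_or = math.log((a * d) / (b * c))      # log odds ratio
se = math.sqrt(1/a + 1/b + 1/c + 1/d)     # standard error of the log odds ratio
z = NormalDist().inv_cdf(0.975)           # two-sided 95% critical value

lower = math.exp(log_or - z * se)
upper = math.exp(log_or + z * se)
print(round(se, 4), round(lower, 3), round(upper, 3))
```

The standard error matches the 0.1907 reported for the `neversmk` parameter, which is why the Wald limits from PROC LOGISTIC and the odds-ratio limits from PROC FREQ coincide here.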
28. LOGISTIC REGRESSION MODEL

proc logistic data=breathTest;
  class neversmk (ref="never") / param=ref;
  weight count;
  model test = neversmk;
  oddsratio 'smoking' neversmk;
run;

ODDSRATIO <'label'> variable </ options>;   (new in SAS 9.2)

Wald Confidence Interval for Odds Ratios

Label                                  Estimate     95% Confidence Limits
smoking neversmk current vs never         2.756       1.896       4.004
29.
30. CONFOUNDING

[Diagram with nodes: Smoking, Test, Age]

Omitting Age from the analysis can lead to either an overestimate or an underestimate of the relationship between Smoking and Test.
31. CONFOUNDING

[Diagram with nodes: Smoking, Test, Age]

[Plot: log(odds) against Age for non-smokers and smokers, shown for the age groups < 40 and ≥ 40]

Adjusting for age means comparing smokers and non-smokers at common values of age.
32.
33.
34.
35.
36.
37. THE PURPOSES AND STRATEGIES FOR MODEL BUILDING

Is the association between "Smoking" and "Test" different in the 2 age groups?
  Y: There is an interaction. Report the age-specific ORs.
  N: No interaction. Is "Age" a confounder?
       Y: Report the age-adjusted OR.
       N: Report the crude OR.
38.
39. PROC FREQ: INTERACTION EFFECT

data breathTestAge;
  input test $ 1-8 neversmk $ 10-16 over40 $ 18-20 count;
  datalines;
normal   never   no  577
abnormal never   no   34
normal   current no  682
abnormal current no   57
normal   never   yes 164
abnormal never   yes   4
normal   current yes 245
abnormal current yes  74
;
40.
41. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test / chisq relrisk cmh;
run;

Breslow-Day Test for Homogeneity of the Odds Ratios
---------------------------------------------------
Chi-Square        18.0829
DF                      1
Pr > ChiSq         <.0001

Total Sample Size = 1837

The association between smoking status and the breathing test is not the same across the age groups.
42. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test / chisq relrisk cmh;
run;

Statistics for Table 1 of neversmk by test
Controlling for over40=no

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1      2.4559    0.1171
Likelihood Ratio Chi-Square    1      2.4893    0.1146
Continuity Adj. Chi-Square     1      2.1260    0.1448
Mantel-Haenszel Chi-Square     1      2.4541    0.1172
Phi Coefficient                       0.0427
Contingency Coefficient               0.0426
Cramer's V                            0.0427

Statistics for Table 1 of neversmk by test
Controlling for over40=no

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value      95% Confidence Limits
---------------------------------------------------------------
Case-Control (Odds Ratio)     1.4184       0.9144       2.2000
Cohort (Col1 Risk)            1.3861       0.9190       2.0906
Cohort (Col2 Risk)            0.9772       0.9499       1.0054

Sample Size = 1350
43. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test / chisq relrisk cmh;
run;

Statistics for Table 2 of neversmk by test
Controlling for over40=yes

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     35.4510    <.0001
Likelihood Ratio Chi-Square    1     45.1246    <.0001
Continuity Adj. Chi-Square     1     33.9203    <.0001
Mantel-Haenszel Chi-Square     1     35.3782    <.0001
Phi Coefficient                       0.2698
Contingency Coefficient               0.2605
Cramer's V                            0.2698

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value      95% Confidence Limits
---------------------------------------------------------------
Case-Control (Odds Ratio)    12.3837       4.4416      34.5272
Cohort (Col1 Risk)            9.7429       3.6253      26.1844
Cohort (Col2 Risk)            0.7868       0.7374       0.8394
44. PROC FREQ: INTERACTION EFFECT

proc freq data=breathTestAge;
  weight count;
  tables over40*neversmk*test / chisq relrisk cmh;
run;

Summary Statistics for neversmk by test
Controlling for over40

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF       Value      Prob
---------------------------------------------------------------
    1        Nonzero Correlation        1     25.2444    <.0001
    2        Row Mean Scores Differ     1     25.2444    <.0001
    3        General Association        1     25.2444    <.0001

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method                  Value      95% Confidence Limits
--------------------------------------------------------------------------
Case-Control      Mantel-Haenszel        2.5683       1.7618       3.7441
  (Odds Ratio)    Logit                  1.9840       1.3252       2.9702
Cohort            Mantel-Haenszel        2.4174       1.6754       3.4879
  (Col1 Risk)     Logit                  1.8475       1.2641       2.7001
Cohort            Mantel-Haenszel        0.9289       0.9046       0.9538
  (Col2 Risk)     Logit                  0.9437       0.9195       0.9686

These statistics and the adjusted OR are useful only if the OR is homogeneous across the categories of the adjusting variable.
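To see where the Mantel-Haenszel odds ratio comes from, and why pooling is misleading here, the Python sketch below recomputes it from the `breathTestAge` counts alongside the two stratum-specific odds ratios. It is an illustrative cross-check using the standard Mantel-Haenszel estimator, not a description of SAS internals; the function name `mantel_haenszel_or` is hypothetical.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio for a list of 2x2 tables (a, b, c, d):
    sum(a*d/n) / sum(b*c/n), where n is the stratum total."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# (current abnormal, current normal, never abnormal, never normal) per age stratum
under_40 = (57, 682, 34, 577)
over_40 = (74, 245, 4, 164)

stratum_ors = [(a * d) / (b * c) for a, b, c, d in (under_40, over_40)]
mh_or = mantel_haenszel_or([under_40, over_40])
print([round(x, 4) for x in stratum_ors], round(mh_or, 4))
```

The pooled value sits between two very different stratum-specific odds ratios, so it summarizes neither age group well.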
45. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;
46. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;

Analysis of Maximum Likelihood Estimates

                                              Standard          Wald
Parameter                      DF  Estimate      Error    Chi-Square    Pr > ChiSq
Intercept                       1   -2.8315     0.1765      257.4193        <.0001
neversmk         current        1    0.3495     0.2240        2.4355        0.1186
over40           yes            1   -0.8820     0.5359        2.7086        0.0998
neversmk*over40  current yes    1    2.1668     0.5691       14.4985        0.0001

Wald Test: the neversmk*over40 row tests the interaction (Wald chi-square = 14.4985, p = 0.0001).
47. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;

Likelihood Ratio Test:
48. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
run;

Model Fit Statistics

                                 Intercept
                 Intercept          and
Criterion          Only          Covariates
AIC              1130.417          1055.467
SC               1130.497          1055.785
-2 Log L         1128.417          1047.467

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        80.9500     3        <.0001
Score                   95.7956     3        <.0001
Wald                    81.3305     3        <.0001
49. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40;
run;

Model Fit Statistics

                                 Intercept
                 Intercept          and
Criterion          Only          Covariates
AIC              1130.417          1074.123
SC               1130.497          1074.361
-2 Log L         1128.417          1068.123

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        60.2942     2        <.0001
Score                   61.2515     2        <.0001
Wald                    56.4737     2        <.0001
50. PROC LOGISTIC: INTERACTION EFFECT

/* Full model (with interaction): save fit statistics and global tests */
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
  ods output FitStatistics=log2Ratio_full GlobalTests=df_full;
run;

/* Store -2 Log L of the full model in a macro variable */
data _null_;
  set log2Ratio_full;
  if Criterion = '-2 Log L';
  call symput('neg2L_full', InterceptAndCovariates);
run;

/* Store the model degrees of freedom of the full model */
data _null_;
  set df_full;
  if Test = 'Likelihood Ratio';
  call symput('df_full', DF);
run;
51. PROC LOGISTIC: INTERACTION EFFECT

/* Reduced model (no interaction): save fit statistics and global tests */
proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40;
  ods output FitStatistics=log2Ratio_reduce GlobalTests=df_reduce;
run;

/* Store -2 Log L of the reduced model in a macro variable */
data _null_;
  set log2Ratio_reduce;
  if Criterion = '-2 Log L';
  call symput('neg2L_reduce', InterceptAndCovariates);
run;

/* Store the model degrees of freedom of the reduced model */
data _null_;
  set df_reduce;
  if Test = 'Likelihood Ratio';
  call symput('df_reduce', DF);
run;
52. PROC LOGISTIC: INTERACTION EFFECT

data result;
  LR = &neg2L_reduce - &neg2L_full;   /* difference in -2 Log L */
  df = &df_full - &df_reduce;         /* difference in model degrees of freedom */
  p = 1 - probchi(LR, df);
  label LR = 'Likelihood Ratio';
run;

proc print data=result label noobs;
  title "Likelihood ratio test";
run;

Likelihood ratio test

Likelihood
   Ratio      df         p
  20.6558      1    .000005497
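The same likelihood ratio test can be reproduced by hand from the two "-2 Log L" values printed in the Model Fit Statistics tables. The Python sketch below is an illustrative cross-check only; it relies on the identity that a chi-square variable with 1 degree of freedom has upper-tail probability erfc(sqrt(x/2)), so it applies to this 1-df comparison and not to tests with more degrees of freedom.

```python
import math

neg2logl_reduced = 1068.123   # model without the interaction term
neg2logl_full = 1047.467      # model with neversmk*over40
df_reduced, df_full = 2, 3    # DF of the global likelihood ratio tests

lr = neg2logl_reduced - neg2logl_full
df = df_full - df_reduced

# Upper-tail probability of a chi-square with exactly 1 degree of freedom
assert df == 1, "the erfc shortcut below is valid only for df = 1"
p_value = math.erfc(math.sqrt(lr / 2))
print(round(lr, 3), df, p_value)
```

The small difference from the SAS value of LR arises because the printed "-2 Log L" values are rounded to three decimals.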
53. PROC LOGISTIC: INTERACTION EFFECT

proc logistic data=breathTestAge;
  class neversmk (ref="never") over40 (ref="no") / param=ref;
  weight count;
  model test = neversmk over40 neversmk*over40;
  oddsratio neversmk / at (over40='no');
  oddsratio neversmk / at (over40='yes');
run;

Wald Confidence Interval for Odds Ratios

Label                                       Estimate     95% Confidence Limits
neversmk current vs never at over40=no         1.418       0.914       2.200
neversmk current vs never at over40=yes       12.383       4.441      34.525
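With reference coding, these age-specific odds ratios follow directly from the interaction model's coefficients: in the reference age group the log odds ratio for smoking is the `neversmk` coefficient alone, and in the other age group the interaction coefficient is added to it. The Python sketch below illustrates this with the estimates from the Analysis of Maximum Likelihood Estimates table; the variable names are hypothetical.

```python
import math

# Estimates from the interaction model (reference levels: never, over40 = no)
b_smoke = 0.3495         # neversmk: current
b_interaction = 2.1668   # neversmk*over40: current, yes

or_under_40 = math.exp(b_smoke)                   # over40 = no
or_over_40 = math.exp(b_smoke + b_interaction)    # over40 = yes
print(round(or_under_40, 3), round(or_over_40, 3))
```

The `over40` main effect does not enter either calculation, because it cancels when current and never smokers are compared within the same age group.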
54.
55. NURSE HEALTH STUDY

data nurse_study;
  input bc age oc count;
  datalines;
1 0 1 71
0 0 1 28418
1 0 0 35
0 0 0 12267
1 1 1 143
0 1 1 20661
1 1 0 321
0 1 0 44424
;

                                   BREAST CANCER
AGE              OC USE        CASE (1)    CONTROL (0)
30 – 39 (0)      YES (1)          71          28418
                 NO (0)           35          12267
40 – 55 (1)      YES (1)         143          20661
                 NO (0)          321          44424
56. NURSE HEALTH STUDY

proc freq data=nurse_study order=data;
  weight count;
  tables age*oc*bc / chisq relrisk cmh;
run;

Breslow-Day Test for Homogeneity of the Odds Ratios
---------------------------------------------------
Chi-Square         0.1521
DF                      1
Pr > ChiSq         0.6966

There is no interaction, so the next step is to check for confounding.
57. NURSE HEALTH STUDY

Summary Statistics for oc by bc
Controlling for age

Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

Statistic    Alternative Hypothesis    DF       Value      Prob
---------------------------------------------------------------
    1        Nonzero Correlation        1      0.4361    0.5090
    2        Row Mean Scores Differ     1      0.4361    0.5090
    3        General Association        1      0.4361    0.5090

Estimates of the Common Relative Risk (Row1/Row2)

Type of Study     Method                  Value      95% Confidence Limits
--------------------------------------------------------------------------
Case-Control      Mantel-Haenszel        0.9419       0.7882       1.1256
  (Odds Ratio)    Logit                  0.9415       0.7882       1.1246
Cohort            Mantel-Haenszel        0.9422       0.7897       1.1243
  (Col1 Risk)     Logit                  0.9419       0.7894       1.1238
Cohort            Mantel-Haenszel        1.0003       0.9994       1.0013
  (Col2 Risk)     Logit                  1.0003       0.9995       1.0012
58. NURSE HEALTH STUDY

proc freq data=nurse_study order=data;
  weight count;
  tables oc*bc / chisq relrisk;
run;

Statistics for Table of oc by bc

Statistic                     DF       Value      Prob
------------------------------------------------------
Chi-Square                     1     17.8881    <.0001
Likelihood Ratio Chi-Square    1     18.1401    <.0001
Continuity Adj. Chi-Square     1     17.5337    <.0001
Mantel-Haenszel Chi-Square     1     17.8879    <.0001
Phi Coefficient                      -0.0130
Contingency Coefficient               0.0130
Cramer's V                           -0.0130

Statistics for Table of oc by bc

Estimates of the Relative Risk (Row1/Row2)

Type of Study                  Value      95% Confidence Limits
---------------------------------------------------------------
Case-Control (Odds Ratio)     0.6944       0.5858       0.8230
Cohort (Col1 Risk)            0.6957       0.5874       0.8239
Cohort (Col2 Risk)            1.0019       1.0010       1.0028
59.
60. NURSE HEALTH STUDY

proc logistic data=nurse_study descending;
  weight count;
  model bc = oc age;
run;

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -5.9083      0.1156     2612.5788        <.0001
oc            1     -0.0602      0.0911        0.4360        0.5090
age           1      0.9835      0.1133       75.3707        <.0001

Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits
oc            0.942       0.788       1.126
age           2.674       2.141       3.338
61. NURSE HEALTH STUDY

proc logistic data=nurse_study descending;
  weight count;
  model bc = oc;
run;

Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept     1     -5.0704      0.0532     9095.8096        <.0001
oc            1     -0.3646      0.0867       17.6834        <.0001

Odds Ratio Estimates

              Point          95% Wald
Effect     Estimate      Confidence Limits
oc            0.694       0.586       0.823
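The gap between the crude odds ratio (0.694) and the age-adjusted one (0.942) can be explored directly from the `nurse_study` counts. The Python sketch below is an illustrative cross-check, not part of the SAS analysis: it computes the odds ratio within each age group and again after collapsing over age. The age-specific values are not printed on the slides; they are derived here from the data step, and the helper names are hypothetical.

```python
# (cases, controls) by OC use within each age group, from the nurse_study data step
strata = {
    "age 30-39": {"oc_yes": (71, 28418), "oc_no": (35, 12267)},
    "age 40-55": {"oc_yes": (143, 20661), "oc_no": (321, 44424)},
}

def odds_ratio(exposed, unexposed):
    """Cross-product odds ratio for (cases, controls) in exposed vs. unexposed."""
    (a, b), (c, d) = exposed, unexposed
    return (a * d) / (b * c)

def collapse(key):
    """Add the (cases, controls) counts for one OC-use level over all age groups."""
    return tuple(sum(table[key][i] for table in strata.values()) for i in range(2))

stratum_or = {age: odds_ratio(t["oc_yes"], t["oc_no"]) for age, t in strata.items()}
crude_or = odds_ratio(collapse("oc_yes"), collapse("oc_no"))

for age, value in stratum_or.items():
    print(age, round(value, 4))
print("crude", round(crude_or, 4))
```

Both age-specific odds ratios are closer to 1 than the crude value, which is the pattern expected when age confounds the association between OC use and breast cancer.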