Heart Data | 1
Christy Lee
Dana Alswyan
Thuan Nguyen
Business Analytics Project: Data on Heart Diseases
EXECUTIVE OVERVIEW
We analyzed two datasets: LA Heart, on which we ran a logistic regression, and Cardiovas, on
which we ran correlation, linear regression, and multiple regression analyses. From the LA Heart
data, we found that systolic blood pressure is the variable most clearly associated with having
heart disease. In the Cardiovas data, the models for hemoglobin A1C and blood glucose explain a
moderate to high share of the variation in their dependent variables. To explain hemoglobin A1C,
one must therefore look at levels of blood glucose and cholesterol, and take into account the
patient’s age and waist size. Likewise, to explain blood glucose, one must look at hemoglobin A1C
levels as well as age and weight.
LA HEART - LOGISTIC REGRESSION
The data for LA Heart was recorded in 1950. We analyzed four variables (cholesterol,
diastolic blood pressure, systolic blood pressure, and socioeconomic status) against whether the
patients were ill with complications of the cardiovascular system. The dataset has 200
observations: 171 healthy patients and 29 ill patients.
We ran a binary logistic regression to generate a probabilistic classification from the
analyzed variables. The model's convergence criterion was satisfied. Our dependent variable was
whether the patient was ill (with some sort of heart condition), and we examined what role the
following variables played in being ill: systolic blood pressure (mm Hg), diastolic blood
pressure (mm Hg), cholesterol (mg%), and socioeconomic status (ordinal; 1=high,...,5=low).
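As a minimal sketch of what a binary logistic regression estimates (this is not the SAS procedure used for the report), the following fits a one-predictor model by gradient ascent on the log-likelihood. The data are synthetic, and the function names, learning rate, and step count are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Map log odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted probability
            g0 += err
            g1 += err * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Synthetic, standardized predictor values (hypothetical, NOT the LA Heart data)
x = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.3, 0.6, 1.0, 1.4, 2.0]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]  # 1 = ill, 0 = healthy

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)  # multiplicative change in the odds per one-unit increase in x
```

A positive `b1` means the odds of illness rise with the predictor, which is the same reading applied to SBP_50 in the key findings.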
The model as a whole is statistically significant: the p value of its overall test is
< 0.0001, well below 0.05, as seen in Figure 1. The predictors taken together therefore help
distinguish ill from healthy patients, which makes the model useful as a practical reference.
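One common overall test for a logistic model is the likelihood ratio test, sketched below under stated assumptions. The intercept-only log-likelihood is computed from the class counts reported above (29 ill, 171 healthy); the full-model log-likelihood is an assumed placeholder, not a value from Figure 1, and we assume 4 degrees of freedom (one per predictor). The chi-square tail uses a closed form that holds only for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def null_loglik(n_ill, n_healthy):
    """Log-likelihood of the intercept-only model, which predicts the base rate."""
    n = n_ill + n_healthy
    p = n_ill / n
    return n_ill * math.log(p) + n_healthy * math.log(1.0 - p)

loglik_null = null_loglik(29, 171)  # class counts reported for LA Heart
loglik_full = -55.0                 # ASSUMED value, for illustration only
lr_stat = 2.0 * (loglik_full - loglik_null)
p_value = chi2_sf_even_df(lr_stat, df=4)  # assumed df: one per predictor
```

The larger the gain in log-likelihood over the intercept-only model, the smaller the p value.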
Figure 1. LA Heart Predictive Power
Our findings are also concordant: in most pairs of one ill and one healthy patient, the model
assigns the higher log odds of illness to the ill patient. The model ranked these pairs
correctly, as seen in Figure 2.
Figure 2. LA Heart Concordance
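Concordance can be sketched as a count over every pair made of one ill and one healthy patient; with 29 ill and 171 healthy patients there are 29 × 171 = 4,959 such pairs. The predicted probabilities below are hypothetical, and the function name is an assumption for illustration.

```python
def concordance(probs, labels):
    """Count concordant, discordant, and tied (ill, healthy) pairs."""
    ill = [p for p, y in zip(probs, labels) if y == 1]
    healthy = [p for p, y in zip(probs, labels) if y == 0]
    conc = disc = tied = 0
    for p_ill in ill:
        for p_healthy in healthy:
            if p_ill > p_healthy:
                conc += 1   # model ranks the ill patient higher: concordant
            elif p_ill < p_healthy:
                disc += 1   # model ranks the healthy patient higher: discordant
            else:
                tied += 1
    pairs = len(ill) * len(healthy)
    return {
        "pairs": pairs,
        "concordant": conc,
        "discordant": disc,
        "tied": tied,
        "c": (conc + 0.5 * tied) / pairs,  # c statistic; ties count as half
    }

# Hypothetical predicted probabilities (NOT model output from the report)
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0]
result = concordance(probs, labels)
```

The c statistic computed this way equals the area under the ROC curve.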
The ROC curve shows an area under the curve of 0.9141, which is classified as excellent (A).
This attests to how well the model as a whole separates ill from healthy patients.
Figure 3. LA Heart ROC Curve
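As an illustration of how an area under the ROC curve is obtained, the sketch below lowers the classification threshold step by step, records the false positive and true positive rates, and integrates with the trapezoid rule. The probabilities are hypothetical and the function names are assumptions.

```python
def roc_points(probs, labels):
    """(false positive rate, true positive rate) pairs as the threshold is lowered."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical predicted probabilities (NOT model output from the report)
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.4]
labels = [1, 1, 1, 0, 0, 0, 0]
area = auc(roc_points(probs, labels))
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.9141 sits near the top of the scale.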
LA HEART - KEY FINDINGS
The 95% confidence interval for the odds ratio of SBP_50 lies entirely above 1 (even odds),
so we are confident that the odds of illness go up with SBP_50; that is, its log odds
coefficient is positive. In Figure 4, SBP_50 is the variable estimated with the highest
confidence, so we can infer that as SBP_50 goes up, the odds of having heart problems also
increase. The 95% confidence intervals of DBP_50, SES, and CHOL_50 lie on both sides of the line
at 1, so we are not confident that the odds go either up or down with these three variables;
their log odds coefficients are not significant. The confidence interval for SES is especially
broad, so it is the variable with the least certain odds ratio.
Figure 4. LA Heart Odds Ratios
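The reading of Figure 4 can be sketched as follows: an odds ratio and its Wald 95% confidence limits are the exponentials of the log odds coefficient and of the coefficient plus or minus 1.96 standard errors. The coefficients and standard errors below are hypothetical placeholders, not the report's estimates.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% confidence limits from a log odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def excludes_one(lower, upper):
    """True when the whole interval lies on one side of 1 (even odds)."""
    return lower > 1.0 or upper < 1.0

# HYPOTHETICAL (coefficient, standard error) pairs, for illustration only
estimates = {"SBP_50": (0.030, 0.010), "SES": (-0.10, 0.25)}

summary = {}
for name, (beta, se) in estimates.items():
    ratio, lower, upper = odds_ratio_ci(beta, se)
    summary[name] = {
        "odds_ratio": ratio,
        "lower": lower,
        "upper": upper,
        "significant": excludes_one(lower, upper),
    }
```

An interval that straddles 1 cannot rule out that the variable leaves the odds unchanged, and a large standard error produces the kind of broad interval described for SES.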
Also, the influence diagnostics show that some data points influence the estimate for each
variable more than others. For example, the influence diagnostic graph for SES shows many data
points that are far from 0. This wide spread is in line with SES having the widest, least
certain confidence interval, as seen in Figure 5.
Figure 5. LA Heart Influence Diagnostics
LA HEART - RESULTS
Our results show that systolic blood pressure is the variable most clearly related to the
probability of having heart disease. We see this in the influence diagnostics, where the spread
for SBP_50 is tighter than for the other variables. We also have high confidence in this
relationship, as seen in Figure 4: both 95% confidence limits for SBP_50 are above 1 and lie
close together, indicating a measurable and consistent effect. Socioeconomic status has little
to do with the probability of having heart disease, and the other variables show only weak
relationships.
CARDIOVAS - CORRELATION
The Cardiovas dataset consists of cardiovascular risk factor data, with 403 observations.
We first conducted a correlation analysis to see which variables would be most useful in our
linear and multiple regressions. Based on the correlation analysis in Figure 6, we chose
cholesterol, systolic blood pressure, and hemoglobin A1C as dependent variables. We conducted a
multiple regression for each dependent variable, each with 8 independent (explanatory)
variables.
Figure 6. Cardiovas Correlation Analysis
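A correlation analysis of this kind can be sketched as a matrix of pairwise Pearson coefficients. The column names and values below are hypothetical stand-ins, not rows from the Cardiovas dataset, and the function names are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# HYPOTHETICAL values, for illustration only
data = {
    "age": [34, 45, 52, 61, 70, 48],
    "sbp": [118, 130, 128, 150, 160, 135],
    "a1c": [4.9, 5.2, 6.1, 5.8, 7.0, 5.5],
}
matrix = correlation_matrix(data)
```

Pairs with coefficients far from 0 are the natural candidates to carry forward into the regressions.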
Even a slight increase in blood glucose raises the risk of heart disease.[1] Excess
cholesterol in the blood builds up in the walls of the arteries, causing what is known as
“atherosclerosis”. There are two forms of cholesterol: low-density lipoprotein (LDL), known as
“bad” cholesterol, and high-density lipoprotein (HDL), known as “good” cholesterol.[2] We did
not include HDL as a factor in the analysis because it is “good” cholesterol, and the dataset we
selected contained no LDL data. As for age, it is known that as people grow older, the heart
goes through many physiological changes; age can compound heart problems when a cardiovascular
disease is already present.[3] According to WebMD, people over the age of 50 have the highest
chance of getting heart disease.
CARDIOVAS - LINEAR REGRESSION
In most people, systolic blood pressure rises steadily with age due to increasing stiffness
of large arteries, long-term build-up of plaque, and increased incidence of cardiac and vascular
disease.[4] Systolic blood pressure as an independent variable could be a strong predictor for
risk of cardiovascular diseases.[5] Age has a moderate positive correlation with systolic blood
pressure (r = 0.443), and the r2 value of 0.1963 in the linear regression indicates that age
explains about 19.63% of the variation in systolic blood pressure, as seen in Figure 7. Body
weight also plays a big role in indicating whether a patient is at risk of developing a blockage
in the heart.[6] However, the waist-to-hip ratio has been shown to be a better predictor of
cardiovascular diseases than body-mass index.[7] A high level of hemoglobin A1c is one sign of
heart disease risk: a higher concentration of hemoglobin A1c in the blood is associated with a
higher risk of cardiovascular diseases.[8]
[1] MediLexicon International. "Glucose Increases Raise Heart Disease Risk." Medical News Today.
http://www.medicalnewstoday.com/articles/246612.php (accessed December 15, 2013).
[2] WebMD. "Cholesterol and Heart Disease." WebMD.
http://www.webmd.com/heart-disease/guide/heart-disease-lower-cholesterol-risk (accessed
December 13, 2013).
[3] "Heart of the matter." Deccan Herald.
http://www.deccanherald.com/content/302951/heart-matter.html (accessed December 15, 2013).
[4] "Understanding Blood Pressure Readings." American Heart Association.
http://www.heart.org/HEARTORG/Conditions/HighBloodPressure/AboutHighBloodPressure/Understanding-Blood-Pressure-Readings_UCM_301764_Article.jsp#
(accessed December 14, 2013).
Figure 7. Cardiovas Linear Regression on Age and Systolic Blood Pressure
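In a regression with a single predictor, r2 is the square of the correlation coefficient, which is why a correlation of 0.443 goes with an r2 near 0.196. The sketch below fits a simple least-squares line on hypothetical age and blood pressure values (not Cardiovas rows) and then checks the reported figures against each other; the function name is an assumption.

```python
def simple_ols(x, y):
    """Least-squares slope, intercept, and r2 for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# HYPOTHETICAL values, for illustration only
age = [30, 40, 50, 60, 70]
sbp = [120, 118, 135, 130, 150]
slope, intercept, r2 = simple_ols(age, sbp)

# Consistency check on the figures reported in the text
r_reported = 0.443
r2_from_r = r_reported ** 2  # close to the reported r2 of 0.1963, up to rounding of r
```

Read this way, age leaves roughly four fifths of the variation in systolic blood pressure unexplained.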
In general, the higher the hemoglobin A1C, the higher the risk that a person will develop
heart disease. If hemoglobin A1C stays high for a long period of time, the risk of heart
problems is even greater. Our test results show that older people are more likely to have higher
levels of hemoglobin A1C. Diabetesjournals.org reports that hemoglobin A1C levels ≥5.5–6.0%
are associated with incident heart failure in a middle-aged population, suggesting that
hemoglobin A1C in relation to older age contributes to the development of heart failure.
[5] U.S. National Library of Medicine. "Elevated systolic blood pressure and risk of
cardiovascular and renal disease." National Center for Biotechnology Information.
http://www.ncbi.nlm.nih.gov/pubmed/10467215 (accessed December 15, 2013).
[6] "Weight & Waistlines: Heart Disease Risk Factors." WebMD.
http://www.webmd.com/heart-disease/features/weight-waistlines-heart-disease-risk (accessed
December 15, 2013).
[7] Wang, Z. "Waist Circumference, Body Mass Index, Hip Circumference and Waist-To-Hip Ratio as
Predictors of Cardiovascular Disease in Aboriginal People." The University of Queensland.
http://espace.library.uq.edu.au/eserv.php?pid=UQ:9338&dsID=wh.pdf (accessed December 15, 2013).
[8] U.S. National Library of Medicine. "Association of hemoglobin A1c with cardiovascular
disease and mortality in adults: the European prospective investigation into cancer in Norfolk."
National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/pubmed/15381514
(accessed December 15, 2013).
Figure 8. Cardiovas Correlation of Hemoglobin A1C & Age
Figure 9. Cardiovas Linear Regression of Hemoglobin A1C & Age
CARDIOVAS - MULTIPLE REGRESSION TEST 1: CHOLESTEROL
The independent variables entered into SAS against cholesterol were the following: Age,
Blood Glucose, Diastolic Blood Pressure, Hemoglobin A1C, Hip, Systolic Blood Pressure, Waist,
and Weight. We used stepwise selection and found that only 3 independent variables remained
significant: hemoglobin A1C, age, and diastolic blood pressure.
The r2 values in the stepwise summary rise with each variable entered, so they are best read
as the cumulative r2 of the model at each step rather than as each variable's separate share.
Hemoglobin A1C alone gives an r2 of 0.0714, adding age raises it to 0.1002, and adding diastolic
blood pressure raises it to 0.1204. Thus, the three variables together explain 12.04% of the
variation in cholesterol levels.
Figure 10. Cardiovas Cholesterol Stepwise Summary
CARDIOVAS - MULTI. REGR. TEST 2: SYSTOLIC BLOOD PRESSURE
The independent variables included in the model run in SAS against systolic blood pressure
were the following: Age, Blood Glucose, Cholesterol, Diastolic Blood Pressure, Hemoglobin A1C,
Hip, Waist, and Weight. The stepwise selection retained only four independent variables:
diastolic blood pressure, age, hip, and weight. The cumulative r2 values at each step were as
follows: 0.3686 with diastolic blood pressure alone (36.86% of the variation in systolic blood
pressure), 0.5402 once age was added (54.02%), 0.5440 once hip was added (54.40%), and 0.5476
once weight was added (54.76%). Together, the four variables explain a moderate share of the
variation in systolic blood pressure, most of it attributable to diastolic blood pressure and
age.
Figure 11. Cardiovas Systolic Blood Pressure Stepwise Summary
CARDIOVAS - MULTI. REGR. TEST 3: HEMOGLOBIN A1C
The independent variables in the third test, against hemoglobin A1C, were as follows: Age,
Blood Glucose, Cholesterol, Diastolic Blood Pressure, Hip, Systolic Blood Pressure, Waist, and
Weight. The stepwise selection retained four independent variables: blood glucose, cholesterol,
age, and waist. Together they explain a moderate to high share of the variation in hemoglobin
A1C. The cumulative r2 values at each step were the following: 0.5542 with blood glucose alone
(55.42%), 0.5745 once cholesterol was added (57.45%), 0.5838 once age was added (58.38%), and
0.5869 once waist was added (58.69%).
Figure 12. Cardiovas Hemoglobin A1C Stepwise Summary
CARDIOVAS - MULTI. REGR. TEST 4: BLOOD GLUCOSE
The independent variables included in the model run in SAS against blood glucose were the
following: Age, Cholesterol, Diastolic Blood Pressure, Hemoglobin A1C, Hip, Systolic Blood
Pressure, Waist, and Weight. The stepwise selection retained only three independent variables:
hemoglobin A1C, weight, and age. The cumulative r2 values at each step were as follows: 0.5542
with hemoglobin A1C alone (55.42% of the variation in blood glucose), 0.5579 once weight was
added (55.79%), and 0.5605 once age was added (56.05%). Together, the three variables explain a
moderate to high share of the variation in blood glucose.
Figure 13. Cardiovas Blood Glucose Stepwise Summary
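The r2 values reported for the four stepwise summaries increase in the order the variables are listed, which suggests they are cumulative model values. Under that assumption, and assuming the listed order is the order of entry, the sketch below derives how much each variable adds at its step. The r2 figures are the ones reported in the text; the function name is an assumption.

```python
# Reported r2 values per dependent variable, in the order listed in the text
stepwise = {
    "cholesterol": [
        ("hemoglobin A1C", 0.0714), ("age", 0.1002), ("diastolic blood pressure", 0.1204),
    ],
    "systolic blood pressure": [
        ("diastolic blood pressure", 0.3686), ("age", 0.5402), ("hip", 0.5440), ("weight", 0.5476),
    ],
    "hemoglobin A1C": [
        ("blood glucose", 0.5542), ("cholesterol", 0.5745), ("age", 0.5838), ("waist", 0.5869),
    ],
    "blood glucose": [
        ("hemoglobin A1C", 0.5542), ("weight", 0.5579), ("age", 0.5605),
    ],
}

def increments(steps):
    """Gain in r2 at each step, ASSUMING the inputs are cumulative model r2 values."""
    gains = []
    previous = 0.0
    for name, cumulative in steps:
        gains.append((name, round(cumulative - previous, 4)))
        previous = cumulative
    return gains

gains_by_model = {target: increments(steps) for target, steps in stepwise.items()}
final_r2 = {target: steps[-1][1] for target, steps in stepwise.items()}
```

On this reading, the first variable entered carries most of the explanatory power in Tests 3 and 4, and the later variables each add well under 0.03.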
CARDIOVAS - MULTI. REGR. RESULTS
Across the four multiple regressions, the models for hemoglobin A1C and blood glucose have
the strongest (moderate-high) explanatory power; the model for systolic blood pressure has
moderate explanatory power; and the model for cholesterol has the weakest (low) explanatory
power.
Therefore, the independent variables for both hemoglobin A1C and blood glucose should be
referred to in order to explain how the results of each dependent variable came to be. However,
the independent variables for cholesterol, despite their weak explanatory power, should not be
discarded, as they still explain some part of the outcome of their dependent variable, albeit a
rather small portion of it.
Our most valuable tests are Test 3 and Test 4, as the independent variables in each test
can explain at least 50% of the outcome seen in their dependent variable. Accordingly, to
explain the results of hemoglobin A1C, levels of blood glucose and cholesterol, as well as the
patient’s age and waist size, should be taken into account. Additionally, to explain what
affects blood glucose, hemoglobin A1C levels as well as age and weight must be inspected.
CONCLUSIONS: LA HEART & CARDIOVAS
From the LA Heart dataset, our logistic regression shows that systolic blood pressure is the
variable most clearly related to developing heart disease, while the other independent variables
(diastolic blood pressure, cholesterol, and socioeconomic status) have less to do with
predicting heart illnesses. Both 95% confidence limits of the odds ratio for systolic blood
pressure are above 1. The influence diagnostics also show that systolic blood pressure has the
most compact graph, meaning its estimate depends least on individual data points.
The Cardiovas dataset shows us that hemoglobin A1C is best explained by the four-variable
model of blood glucose, cholesterol, age, and waist size, which reaches a cumulative r2 value of
0.5869, as seen in Figure 12; this is moderate to strong explanatory power. Blood glucose, the
first variable entered, accounts for most of it (r2 of 0.5542), while cholesterol, age, and
waist size each add a smaller amount. Additionally, blood glucose is best explained by the model
of hemoglobin A1C, weight, and age, with a cumulative r2 value of 0.5605, as seen in Figure 13.
Here too, hemoglobin A1C accounts for most of the explanatory power (r2 of 0.5542), with weight
and age adding smaller amounts.

Contenu connexe

Tendances

Lifestyle factors in cv ds
Lifestyle factors in cv dsLifestyle factors in cv ds
Lifestyle factors in cv ds
jayarajgr
 
Physical activity and risk of cardiovascular disease—a
Physical activity and risk of cardiovascular disease—aPhysical activity and risk of cardiovascular disease—a
Physical activity and risk of cardiovascular disease—a
ArhamSheikh1
 
International Journal of Cardiovascular Diseases & Diagnosis
International Journal of Cardiovascular Diseases & DiagnosisInternational Journal of Cardiovascular Diseases & Diagnosis
International Journal of Cardiovascular Diseases & Diagnosis
SciRes Literature LLC. | Open Access Journals
 
WHARTON-Flyer_ASCEND
WHARTON-Flyer_ASCENDWHARTON-Flyer_ASCEND
WHARTON-Flyer_ASCEND
Colin Lamb
 

Tendances (20)

Risk factors for adverse coutcomes
Risk factors for adverse coutcomesRisk factors for adverse coutcomes
Risk factors for adverse coutcomes
 
SXGYFL
SXGYFLSXGYFL
SXGYFL
 
11. DM STUDY
11. DM STUDY11. DM STUDY
11. DM STUDY
 
Revisiting the Diet Heart Disease Hypothesis Paper
Revisiting the Diet Heart Disease Hypothesis PaperRevisiting the Diet Heart Disease Hypothesis Paper
Revisiting the Diet Heart Disease Hypothesis Paper
 
Risc cardiovascular i dislipèmies
Risc cardiovascular i dislipèmiesRisc cardiovascular i dislipèmies
Risc cardiovascular i dislipèmies
 
Cigarette smoking, systolic blood pressure, and cardiovascular diseases in th...
Cigarette smoking, systolic blood pressure, and cardiovascular diseases in th...Cigarette smoking, systolic blood pressure, and cardiovascular diseases in th...
Cigarette smoking, systolic blood pressure, and cardiovascular diseases in th...
 
ECG
ECGECG
ECG
 
ACC/AHA lipid guidelines 2018
ACC/AHA lipid guidelines 2018ACC/AHA lipid guidelines 2018
ACC/AHA lipid guidelines 2018
 
Coronary Artery Disease and Menopause: A Consequence of Adverse Lipid Changes
Coronary Artery Disease and Menopause: A Consequence of Adverse Lipid ChangesCoronary Artery Disease and Menopause: A Consequence of Adverse Lipid Changes
Coronary Artery Disease and Menopause: A Consequence of Adverse Lipid Changes
 
Lifestyle factors in cv ds
Lifestyle factors in cv dsLifestyle factors in cv ds
Lifestyle factors in cv ds
 
Physical activity and risk of cardiovascular disease—a
Physical activity and risk of cardiovascular disease—aPhysical activity and risk of cardiovascular disease—a
Physical activity and risk of cardiovascular disease—a
 
Seattle heart failure model
Seattle heart failure modelSeattle heart failure model
Seattle heart failure model
 
International Journal of Cardiovascular Diseases & Diagnosis
International Journal of Cardiovascular Diseases & DiagnosisInternational Journal of Cardiovascular Diseases & Diagnosis
International Journal of Cardiovascular Diseases & Diagnosis
 
Heart disease causes prevention and current
Heart disease causes prevention and currentHeart disease causes prevention and current
Heart disease causes prevention and current
 
What’s new in Lipidology, Lessons from “recent guidelines“
What’s new in Lipidology, Lessons from “recent guidelines“What’s new in Lipidology, Lessons from “recent guidelines“
What’s new in Lipidology, Lessons from “recent guidelines“
 
1472 6874-4-s1-s15
1472 6874-4-s1-s151472 6874-4-s1-s15
1472 6874-4-s1-s15
 
Aspirin use for the primary prevention of cardiovascular disease and colorect...
Aspirin use for the primary prevention of cardiovascular disease and colorect...Aspirin use for the primary prevention of cardiovascular disease and colorect...
Aspirin use for the primary prevention of cardiovascular disease and colorect...
 
Cardiovascular diseases traditional_and_non-tradit
Cardiovascular diseases traditional_and_non-traditCardiovascular diseases traditional_and_non-tradit
Cardiovascular diseases traditional_and_non-tradit
 
WHARTON-Flyer_ASCEND
WHARTON-Flyer_ASCENDWHARTON-Flyer_ASCEND
WHARTON-Flyer_ASCEND
 
Diabetes And Heart
Diabetes And HeartDiabetes And Heart
Diabetes And Heart
 

En vedette

Regulatory Analysis and Performance Management
Regulatory Analysis and Performance ManagementRegulatory Analysis and Performance Management
Regulatory Analysis and Performance Management
Mercatus Center
 
129 sample 1_st few pages for final doc
129  sample 1_st few pages for final doc129  sample 1_st few pages for final doc
129 sample 1_st few pages for final doc
sshaili
 
Marsh Magazine, Issue 3 - December 2014
Marsh Magazine, Issue 3 - December 2014Marsh Magazine, Issue 3 - December 2014
Marsh Magazine, Issue 3 - December 2014
Fabio Tomassini
 
Marka patent kavrami
Marka patent kavramiMarka patent kavrami
Marka patent kavrami
MhmtTprdmz
 
Technology’s role in 21st century learning
Technology’s role in 21st century learningTechnology’s role in 21st century learning
Technology’s role in 21st century learning
f3tm3
 
Hello my name is
Hello my name isHello my name is
Hello my name is
Mmoyer94
 

En vedette (18)

2011 09-09 activiti
2011 09-09 activiti2011 09-09 activiti
2011 09-09 activiti
 
linee guida per la valorizzazione della montagna piemontese
linee guida per la valorizzazione della montagna piemonteselinee guida per la valorizzazione della montagna piemontese
linee guida per la valorizzazione della montagna piemontese
 
Regulatory Analysis and Performance Management
Regulatory Analysis and Performance ManagementRegulatory Analysis and Performance Management
Regulatory Analysis and Performance Management
 
BIO International Convention Attendee and Exhibitor Summary 2011
BIO International Convention Attendee and Exhibitor Summary 2011BIO International Convention Attendee and Exhibitor Summary 2011
BIO International Convention Attendee and Exhibitor Summary 2011
 
Interieur Projecten JHK
Interieur Projecten JHKInterieur Projecten JHK
Interieur Projecten JHK
 
129 sample 1_st few pages for final doc
129  sample 1_st few pages for final doc129  sample 1_st few pages for final doc
129 sample 1_st few pages for final doc
 
Quantum ikhlastekhnologiaktivasikekuatanhati
Quantum ikhlastekhnologiaktivasikekuatanhatiQuantum ikhlastekhnologiaktivasikekuatanhati
Quantum ikhlastekhnologiaktivasikekuatanhati
 
Foreclosure-Underground Credit Restoration
Foreclosure-Underground Credit RestorationForeclosure-Underground Credit Restoration
Foreclosure-Underground Credit Restoration
 
잡코리아 글로벌 프런티어 8기_newSage_탐방 보고서
잡코리아 글로벌 프런티어 8기_newSage_탐방 보고서잡코리아 글로벌 프런티어 8기_newSage_탐방 보고서
잡코리아 글로벌 프런티어 8기_newSage_탐방 보고서
 
Marsh Magazine, Issue 3 - December 2014
Marsh Magazine, Issue 3 - December 2014Marsh Magazine, Issue 3 - December 2014
Marsh Magazine, Issue 3 - December 2014
 
Strategic Storytelling
Strategic Storytelling Strategic Storytelling
Strategic Storytelling
 
36th mtg in NIBIO
 36th mtg in NIBIO 36th mtg in NIBIO
36th mtg in NIBIO
 
Marka patent kavrami
Marka patent kavramiMarka patent kavrami
Marka patent kavrami
 
The market
The marketThe market
The market
 
Technology’s role in 21st century learning
Technology’s role in 21st century learningTechnology’s role in 21st century learning
Technology’s role in 21st century learning
 
CSS Meetup at The Hive in Rock Hill, SC - 2014
CSS Meetup at The Hive in Rock Hill, SC - 2014CSS Meetup at The Hive in Rock Hill, SC - 2014
CSS Meetup at The Hive in Rock Hill, SC - 2014
 
Hello my name is
Hello my name isHello my name is
Hello my name is
 
Примеры проектов по брендингу. Предметный промоушен
Примеры проектов по брендингу. Предметный промоушенПримеры проектов по брендингу. Предметный промоушен
Примеры проектов по брендингу. Предметный промоушен
 

Similaire à Tom Nguyen - SAS Project

Role of quantitative assessment in fetal echocardiography
Role of quantitative assessment in fetal echocardiographyRole of quantitative assessment in fetal echocardiography
Role of quantitative assessment in fetal echocardiography
gisa_legal
 
CVD Egypt Clinical Diabetes Reprint Summer 2010
CVD Egypt Clinical Diabetes Reprint Summer 2010CVD Egypt Clinical Diabetes Reprint Summer 2010
CVD Egypt Clinical Diabetes Reprint Summer 2010
Mahmoud IBRAHIM
 
Stroke in young adults
Stroke in young adults Stroke in young adults
Stroke in young adults
Ekta Chaudhary
 
A N NA L S O F FA M I LY M E D I C I N E ✦ W W W. A N N FA.docx
A N NA L S  O F  FA M I LY  M E D I C I N E  ✦ W W W. A N N FA.docxA N NA L S  O F  FA M I LY  M E D I C I N E  ✦ W W W. A N N FA.docx
A N NA L S O F FA M I LY M E D I C I N E ✦ W W W. A N N FA.docx
ransayo
 

Similaire à Tom Nguyen - SAS Project (20)

Heart attack possibility.pptx
Heart attack possibility.pptxHeart attack possibility.pptx
Heart attack possibility.pptx
 
Cholesterol Lowering - A Failed Strategy
Cholesterol Lowering - A Failed StrategyCholesterol Lowering - A Failed Strategy
Cholesterol Lowering - A Failed Strategy
 
PREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCE
PREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCEPREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCE
PREVENTION OF HEART PROBLEM USING ARTIFICIAL INTELLIGENCE
 
Index Percentile Distribution Poster
Index Percentile Distribution PosterIndex Percentile Distribution Poster
Index Percentile Distribution Poster
 
Impact of diastolic and systolic blood pressure on mortality - implications ...
Impact of diastolic and systolic blood pressure on mortality  - implications ...Impact of diastolic and systolic blood pressure on mortality  - implications ...
Impact of diastolic and systolic blood pressure on mortality - implications ...
 
Goal attainments and their discrepancies for low density lipoprotein choleste...
Goal attainments and their discrepancies for low density lipoprotein choleste...Goal attainments and their discrepancies for low density lipoprotein choleste...
Goal attainments and their discrepancies for low density lipoprotein choleste...
 
Oscillometric Blood Pressure Limits
Oscillometric Blood Pressure LimitsOscillometric Blood Pressure Limits
Oscillometric Blood Pressure Limits
 
Role of quantitative assessment in fetal echocardiography
Role of quantitative assessment in fetal echocardiographyRole of quantitative assessment in fetal echocardiography
Role of quantitative assessment in fetal echocardiography
 
CVD Egypt Clinical Diabetes Reprint Summer 2010
CVD Egypt Clinical Diabetes Reprint Summer 2010CVD Egypt Clinical Diabetes Reprint Summer 2010
CVD Egypt Clinical Diabetes Reprint Summer 2010
 
Diagnosis of Early Risks, Management of Risks, and Reduction of Vascular Dise...
Diagnosis of Early Risks, Management of Risks, and Reduction of Vascular Dise...Diagnosis of Early Risks, Management of Risks, and Reduction of Vascular Dise...
Diagnosis of Early Risks, Management of Risks, and Reduction of Vascular Dise...
 
Lipids and cerebrovascular diseases
Lipids and cerebrovascular diseasesLipids and cerebrovascular diseases
Lipids and cerebrovascular diseases
 
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart DiseaseLogistic Regression: Predicting The Chances Of Coronary Heart Disease
Logistic Regression: Predicting The Chances Of Coronary Heart Disease
 
Jnc 8 full
Jnc 8 fullJnc 8 full
Jnc 8 full
 
Review of Lipid Guidelines 2011 to 2017
Review of Lipid Guidelines 2011 to 2017Review of Lipid Guidelines 2011 to 2017
Review of Lipid Guidelines 2011 to 2017
 
Stroke in young adults
Stroke in young adults Stroke in young adults
Stroke in young adults
 
Project ppt
Project pptProject ppt
Project ppt
 
Dyslipidemia in stroke
Dyslipidemia in stroke  Dyslipidemia in stroke
Dyslipidemia in stroke
 
A N NA L S O F FA M I LY M E D I C I N E ✦ W W W. A N N FA.docx
A N NA L S  O F  FA M I LY  M E D I C I N E  ✦ W W W. A N N FA.docxA N NA L S  O F  FA M I LY  M E D I C I N E  ✦ W W W. A N N FA.docx
A N NA L S O F FA M I LY M E D I C I N E ✦ W W W. A N N FA.docx
 
Heart Disease Classification: Machine Learning Analysis
Heart Disease Classification: Machine Learning AnalysisHeart Disease Classification: Machine Learning Analysis
Heart Disease Classification: Machine Learning Analysis
 
Heart Disease Classification: Machine Learning Analysis
Heart Disease Classification: Machine Learning AnalysisHeart Disease Classification: Machine Learning Analysis
Heart Disease Classification: Machine Learning Analysis
 

Tom Nguyen - SAS Project

  • 1. Heart Data | 1 1 Christy Lee Dana Alswyan Thuan Nguyen Business Analytics Project: Data on Heart Diseases EXECUTIVE OVERVIEW We have 2 datasets, LA Heart, which we did a logistic regression, and Cardiovas, which we ran correlations and multiple regression analyses with respectively. From the LA Heart data, we found that systolic blood pressure has a high probability of being related to having heart disease. The Cardiovas data shows that the dependent variables hemoglobin A1C and blood glucose each have independent variables that are moderately to highly significant to explaining the outcome of their dependent variable. Therefore, to explain the results of hemoglobin A1C, one must refer to levels of blood glucose and cholesterol, and take into account the patient’s age and waist size. Also, to explain outcomes of blood glucose, one must see hemoglobin A1C levels as well as age and weight. LA HEART - LOGISTIC REGRESSION The data for LA Heart was recorded in 1950. We analyzed three variables: cholesterol, diastolic blood pressure, systolic blood pressure, and socioeconomic status, against whether the patients under analysis are ill in terms of complications found in their cardiovascular system. The dataset has 200 observations; 171 of those observations are healthy patients and 29 people are ill patients. We ran a binary logistic regression as a means to generate a probabilistic statistical classification for the analysed variables. Convergence status per each independent variable has been satisfied. Our dependent variable was whether the patient was ill (with some sort of heart condition), and what role the following variables played in them being ill: systolic blood pressure (mm Hg), diastolic blood pressure (mm Hg), cholesterol (mg%), and socioeconomic status (Ordinal; 1=high,...,5=low). The predictive power of the model, which is < 0.0001, is high since the p value is less than 0.05, as seen in Figure 1. 
This means the prediction is significant, which is valuable to put to practical reference.
  • 2. Heart Data | 2 2 Figure 1. LA Heart Predictive Power Our findings are also concordant; the log odds of the first observation are higher than the second one. The model was predicted correctly, as seen in Figure 2. Figure 2. LA Heart Concordance The ROC curve testing showcases that the area under the curve= 0.9141 which is classified as excellent (A). This attests to the accuracy of the test we ran as a whole. Figure 3. LA Heart ROC Curve
  • 3. Heart Data | 3 3 LA HEART - KEY FINDINGS The 95% confidence interval for SBP_50 lies entirely above 1 to 1 odds, so we are confident that the odds go up with (SBP_50), the log odds are positive. In Figure 4, we see the variable with the highest confidence is SBP_50, so we can infer that as SBP_50 goes up, the odds of having heart problems also increases. The 95% confidence interval of the variables (DBP_50), (SES), and (CHOL_50) lie on both sides of 1 to 1 odds line, so we aren’t confident that the odds go either up or down with the three variables mentioned above. The log odds coefficient is not significant. The confidence interval for SES is especially broad, thus it is the variable with the least confident odds. Figure 4. LA Heart Odds Ratios Also, the influence diagnostics show that the there are some data points that will influence the confidence levels of each variable versus whether the patient is ill or not. For example, the influence diagnostic graph for SES shows there are many data points that are far from 0. Since there is a widespread, the confidence levels for SES are the widest/least confident, as seen in Figure 5.
  • 4. Heart Data | 4 4 Figure 5. LA Heart Influence Diagnostics LA HEART - RESULTS Our results show that the probability of having heart disease being related to the systolic blood pressure is high. We see this from the influence diagnostics and how the spread for SBP_50 is tighter compared to the other variables that were compared. Also, we have high confidence as seen in Figure 4, that SBP_50 is probable due to it having both confidence limits above 1, and the 95% confidence is also close together, indicating measurable and consistent results. Socio-economic status has little to do with the probability of having heart disease, while the other variables have weak probabilities. CARDIOVAS - CORRELATION The Cardiovas dataset consists of cardiovascular risk factor data, with 403 observations. We first conducted a correlation analysis to see which variables would be most useful in our linear and multiple regressions. We decided cholesterol, systolic blood pressure, and hemoglobin A1C were dependent variables due to our correlation analysis in Figure 6. We conducted a multiple regression for each dependent variable, each with 8 independent (explanatory) variables.
  • 5. Figure 6. Cardiovas Correlation Analysis
Even a slight increase in blood glucose raises the risk of having heart disease.1 Excess cholesterol in the blood builds up in the walls of the arteries, causing what is known as “atherosclerosis”. There are two forms of cholesterol: low-density lipoprotein (LDL), known as “bad” cholesterol, and high-density lipoprotein (HDL), known as “good” cholesterol.2 We did not include HDL as a factor in the analysis because it is “good” cholesterol, and the dataset we selected contained no LDL data. As for age, the heart goes through many physiological changes as people grow older, and age can compound heart problems when cardiovascular disease is already present.3 According to WebMD, people over the age of 50 have the highest chance of getting heart disease.
CARDIOVAS - LINEAR REGRESSION
In most people, systolic blood pressure rises steadily with age due to increasing stiffness of large arteries, long-term build-up of plaque, and increased incidence of cardiac and vascular disease.4 Systolic blood pressure as an independent variable could be a strong predictor of risk
1 MediLexicon International. "Glucose Increases Raise Heart Disease Risk." Medical News Today. http://www.medicalnewstoday.com/articles/246612.php (accessed December 15, 2013).
2 WebMD. "Cholesterol and Heart Disease." WebMD. http://www.webmd.com/heart-disease/guide/heart-disease-lower-cholesterol-risk (accessed December 13, 2013).
3 "Heart of the matter." Deccan Herald. http://www.deccanherald.com/content/302951/heart-matter.html (accessed December 15, 2013).
4 "Understanding Blood Pressure Readings." American Heart Association. http://www.heart.org/HEARTORG/Conditions/HighBloodPressure/AboutHighBloodPressure/Understanding-Blood-Pressure-Readings_UCM_301764_Article.jsp# (accessed December 14, 2013).
  • 6. of cardiovascular diseases.5 Age also shows a moderate correlation with systolic blood pressure (44.3%), and the r2 value of 19.63% in the linear regression supports the point that age explains part of the variation in systolic blood pressure, as seen in Figure 7. Body weight also plays a big role in indicating whether a patient is at risk of developing a blockage in the heart.6 However, the waist-to-hip ratio has been shown to be a better predictor of cardiovascular disease than body-mass index.7 Signs of heart disease include a high level of hemoglobin A1C: a higher concentration of hemoglobin A1C in the blood is associated with an increased risk of cardiovascular disease.8
Figure 7. Cardiovas Linear Regression on Age and Systolic Blood Pressure
In general, the higher the hemoglobin A1C, the higher the risk that a person will develop heart disease. If hemoglobin A1C stays high for a long period of time, the risk of heart problems is even greater. Our test results show that older people are more likely to have higher levels of hemoglobin A1C. Diabetesjournals.org reports that hemoglobin A1C levels ≥5.5–6.0%
5 U.S. National Library of Medicine. "Elevated systolic blood pressure and risk of cardiovascular and renal disease." National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/pubmed/10467215 (accessed December 15, 2013).
6 "Weight & Waistlines: Heart Disease Risk Factors." WebMD. http://www.webmd.com/heart-disease/features/weight-waistlines-heart-disease-risk (accessed December 15, 2013).
7 Wang, Z. "Waist Circumference, Body Mass Index, Hip Circumference and Waist-To-Hip Ratio as Predictors of Cardiovascular Disease in Aboriginal People." The University of Queensland. http://espace.library.uq.edu.au/eserv.php?pid=UQ:9338&dsID=wh.pdf (accessed December 15, 2013).
8 U.S. National Library of Medicine. "Association of hemoglobin A1c with cardiovascular disease and mortality in adults: the European prospective investigation into cancer in Norfolk." National Center for Biotechnology Information. http://www.ncbi.nlm.nih.gov/pubmed/15381514 (accessed December 15, 2013).
  • 7. are associated with incident heart failure in a middle-aged population, suggesting that elevated hemoglobin A1C, together with older age, contributes to the development of heart failure.
Figure 8. Cardiovas Correlation of Hemoglobin A1C & Age
Figure 9. Cardiovas Linear Regression of Hemoglobin A1C & Age
CARDIOVAS - MULTIPLE REGRESSION TEST 1: CHOLESTEROL
The independent variables entered into SAS against cholesterol are the following: Age, Blood Glucose, Diastolic Blood Pressure, Hemoglobin A1C, Hip, Systolic Blood Pressure, Waist, and Weight. We used stepwise selection and found that only 3 independent variables remained significant: hemoglobin A1C, age, and diastolic blood pressure. Looking at the r2 value, which measures how much of the variation in cholesterol is explained, the highest value is 0.1204, shown for diastolic blood pressure. Thus, DBP
  • 8. brings the explained share of the variation in cholesterol to 12.04%, while the r2 values shown for age and hemoglobin A1C are 0.1002 and 0.0714 respectively.
Figure 10. Cardiovas Cholesterol Stepwise Summary
CARDIOVAS - MULTI. REGR. TEST 2: SYSTOLIC BLOOD PRESSURE
The independent variables included in the model run in SAS against systolic blood pressure are the following: Age, Blood Glucose, Cholesterol, Diastolic Blood Pressure, Hemoglobin A1C, Hip, Waist, and Weight. The stepwise selection retained only four independent variables: diastolic blood pressure, age, hip, and weight. The model r2 rose as each variable entered: 0.3686 (36.86%) with diastolic blood pressure, 0.5402 (54.02%) once age was added, 0.5440 (54.40%) with hip, and 0.5476 (54.76%) with weight. Together, the variables entered explain a moderate share of the variation in systolic blood pressure.
Figure 11. Cardiovas Systolic Blood Pressure Stepwise Summary
CARDIOVAS - MULTI. REGR. TEST 3: HEMOGLOBIN A1C
The independent variables in the third test, against hemoglobin A1C, are as follows: Age, Blood Glucose, Cholesterol, Diastolic Blood Pressure, Hip, Systolic Blood Pressure, Waist, and Weight. The stepwise selection retained four independent variables: blood glucose, cholesterol, age, and waist. Together they explain a moderate to high share of the variation in hemoglobin A1C. The model r2 was 0.5542 (55.42%) with blood glucose, 0.5745 (57.45%) once cholesterol was added, 0.5838 (58.38%) with age, and 0.5869 (58.69%) with waist.
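If the r2 figures quoted above for Test 3 are read as the cumulative model R-square after each stepwise step (an assumption; the text does not say which column of the SAS stepwise summary they come from), the gain contributed by each variable is the difference between consecutive values. The sketch below does that subtraction; the helper name `incremental_r2` is hypothetical, and the only data used are the four r2 values quoted in the text.

```python
def incremental_r2(cumulative):
    """Turn cumulative model R^2 values into the gain added at each step.

    `cumulative` is a list of (variable name, model R^2 after it entered),
    in order of entry. Assumes the values are cumulative, not partial.
    """
    gains = []
    previous = 0.0
    for name, model_r2 in cumulative:
        gains.append((name, round(model_r2 - previous, 4)))
        previous = model_r2
    return gains


# r2 values quoted in the text for Test 3 (hemoglobin A1C), in order of entry.
test3 = [
    ("blood glucose", 0.5542),
    ("cholesterol", 0.5745),
    ("age", 0.5838),
    ("waist", 0.5869),
]

for name, gain in incremental_r2(test3):
    print(f"{name}: +{gain:.4f}")
```

Under this reading, the variable that entered first carries most of the explanatory power, and a large r2 next to a later variable reflects the whole model at that step rather than that variable alone.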
  • 9. Figure 12. Cardiovas Hemoglobin A1C Stepwise Summary
CARDIOVAS - MULTI. REGR. TEST 4: BLOOD GLUCOSE
The independent variables included in the model run in SAS against blood glucose are the following: Age, Cholesterol, Diastolic Blood Pressure, Hemoglobin A1C, Hip, Systolic Blood Pressure, Waist, and Weight. The stepwise selection retained only three independent variables: hemoglobin A1C, weight, and age. The model r2 was 0.5542 (55.42%) with hemoglobin A1C, 0.5579 (55.79%) once weight was added, and 0.5605 (56.05%) with age. Together, the variables entered explain a moderate to high share of the variation in blood glucose.
Figure 13. Cardiovas Blood Glucose Stepwise Summary
CARDIOVAS - MULTI. REGR. RESULTS
Of the 4 multiple regressions, the models for hemoglobin A1C and blood glucose have the strongest (moderate to high) explanatory power; the model for systolic blood pressure has moderate explanatory power; and the model for cholesterol has the weakest (low) explanatory power. Therefore, the independent variables for both hemoglobin A1C and blood glucose should be referred to in order to explain how the results of each dependent variable came to be. However, the independent variables for cholesterol should not be discarded, as they still explain some part of the outcome of their dependent variable, albeit a rather small portion of it.
Our most valuable tests are Test 3 and Test 4, as the independent variables in each test can explain at least 50% of the outcome seen in their dependent variable. Accordingly, to explain
  • 10. the results of hemoglobin A1C, levels of blood glucose and cholesterol, along with the patient’s age and waist size, should be taken into account. Additionally, to explain what affects blood glucose, hemoglobin A1C levels as well as age and weight must be inspected.
CONCLUSIONS: LA HEART & CARDIOVAS
For the LA Heart dataset, our logistic regression shows that systolic blood pressure is the variable most clearly related to developing heart disease, while the other independent variables (diastolic blood pressure, cholesterol, and socioeconomic status) have less to do with predicting heart illness. Both 95% confidence limits of the odds ratio for systolic blood pressure are above 1 on the odds line. The influence diagnostics also show that systolic blood pressure has the most compact graph, which supports the consistency of this result.
The Cardiovas dataset shows that hemoglobin A1C is explained by blood glucose, cholesterol, age, and waist size with a model r2 value of 0.5869, as seen in Figure 12, which is a moderate to strong level of explanatory power. Blood glucose, the first variable listed, accounts for most of this (r2 of 0.5542), while cholesterol, age, and waist size each raise the r2 only slightly. Similarly, blood glucose is explained by hemoglobin A1C, weight, and age with a model r2 value of 0.5605, as seen in Figure 13; hemoglobin A1C accounts for most of this (r2 of 0.5542), with weight and age adding small amounts.