CLA 2 Presentation
BUS 606 Advanced Statistical Concepts And Business Analytics
Agenda
Introduction
Multiple linear regression is the most appropriate statistical
technique for predicting the value of a dependent variable from
several predictor variables (Keith, 2019).
The study assessed the relationship between the cost of
constructing an LWR Plant and the three predictor variables S,
N, and CT.
We also assessed the association between the two tests used to
examine employee performance.
Assumption of Regression Analysis
Multicollinearity
Multicollinearity is the condition where the predictor variables
are highly correlated (Alin, 2010).
Correlation Analysis
Assumption of Regression Analysis Cont’
Normality test
The normality assumption is not violated after transforming the
outcome variable C using the natural log, ln(C) (Shapiro-Wilk =
0.967, p = 0.414).
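The original construction-cost data are not reproduced here, but the transform-then-test workflow can be sketched in Python with SciPy. The data below are synthetic stand-ins, not the study's values:

```python
import numpy as np
from scipy import stats

# Illustrative stand-in for the cost variable C (right-skewed,
# as costs typically are); NOT the study's actual data.
rng = np.random.default_rng(42)
C = rng.lognormal(mean=5.0, sigma=0.3, size=32)

ln_C = np.log(C)  # natural-log transform, as in the analysis

# Shapiro-Wilk test: H0 = the data are normally distributed.
stat, p = stats.shapiro(ln_C)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
# p > 0.05 means we fail to reject normality for ln(C).
```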
Results and Discussion – Regression Analysis
Use Residual Analysis and R2 to Check Your Model
The R-squared of 0.232 indicates that the model explains only
about 23.2% of the variation in ln(C).
This low R-squared indicates that the model does not fit the data
well (Brown, 2009).
Results and Discussion Cont’
State which variables are important in predicting the cost of
constructing an LWR plant.
S is a significant predictor of ln(C) (p = 0.021), but N and CT
have no significant effect (p > 0.05).
Results and Discussion Cont’
State a prediction equation that can be used to predict ln(C).
After dropping N and CT from the model, since they do not have
a significant effect in predicting ln(C), the prediction equation
is given by:
Does adding CT improve R2? If so, by what amount?
Adding CT to the model changes R-squared by 0.001, from 0.232
to 0.234, a change that is not significantly different from zero (p > 0.05).
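The R-squared comparison between the two nested models can be reproduced in outline with NumPy least squares. The predictor names follow the study, but the data are synthetic and illustrative:

```python
import numpy as np

def r_squared(X, y):
    """Fit y = Xb by ordinary least squares and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Synthetic stand-ins for the study's variables (illustrative only)
rng = np.random.default_rng(0)
n = 32
S = rng.normal(1000, 200, n)    # e.g. plant size
N = rng.normal(10, 3, n)
CT = rng.integers(0, 2, n)      # cooling-tower indicator
ln_C = 5.3 + 0.001 * S + rng.normal(0, 0.35, n)

r2_reduced = r_squared(np.column_stack([S, N]), ln_C)
r2_full = r_squared(np.column_stack([S, N, CT]), ln_C)
print(f"R2 (S, N) = {r2_reduced:.3f}; R2 (S, N, CT) = {r2_full:.3f}")
# On training data, adding a predictor can never lower R^2;
# the real question is whether the small increase is significant.
```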
Results and Discussion Cont’ - Correlational Analysis
Evaluate the correlation between the two scores and state if
there seems to be any association between the two.
There was a weak positive correlation between the two tests (r =
0.187), suggesting little association between the two test scores.
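Pearson's r, which underlies this result, is the covariance normalized by both standard deviations. A minimal pure-Python computation (the scores below are illustrative, not the employees' actual data):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)
                    * sum((b - my) ** 2 for b in y))
    return num / den

# Illustrative scores on two tests (hypothetical data)
test1 = [84, 88, 90, 85, 92, 95, 87, 91]
test2 = [79, 83, 78, 85, 80, 88, 84, 82]
print(f"r = {pearson_r(test1, test2):.3f}")
```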
Results and Discussion Cont’
Find the probability of upgrading for each division of the
sample by Bayes' theorem.
P(Up|T1) = P(T1|Up) P(Up) ÷ P(T1)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
P(Up|T2) = P(T2|Up) P(Up) ÷ P(T2)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
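The slide's arithmetic can be checked exactly with Python's fractions module, using the counts from the slide:

```python
from fractions import Fraction

# Counts from the slide
p_t1_given_up = Fraction(23, 46)  # P(T1 | Up)
p_up = Fraction(46, 86)           # P(Up)
p_t1 = Fraction(43, 86)           # P(T1)

# Bayes' theorem: P(Up | T1) = P(T1 | Up) P(Up) / P(T1)
p_up_given_t1 = p_t1_given_up * p_up / p_t1
print(p_up_given_t1)  # 23/43
```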
Results and Discussion Cont’
Find the probability of upgrading for each division of the
sample by the naïve version of Bayes' theorem.
P(Up|T1) = P(T1|Up) P(Up) ÷ P(T1)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
P(Up|T2) = P(T2|Up) P(Up) ÷ P(T2)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
Results and Discussion Cont’
Compare your results in parts b and c and explain the difference
or indifference based on observed probabilities
The naïve version and Bayes' theorem yield identical
probabilities here because there is only one predictor in each
sample division. Naïve Bayes applies Bayes' theorem under an
assumption of independence between the predictor features
(Webb, 2010).
Conclusion and Recommendations
– LWR Plant
Most of the variation in the outcome variable is explained by
variables not included in the model.
Further analysis indicated that the S predictor had a significant
effect in predicting the cost of constructing an LWR plant (C),
while N and CT did not.
The N and CT predictors should therefore be dropped from the model.
Conclusion and Recommendations
– Employee Performance
The analysis also indicated that the two tests were not related
to each other, but that the first test had better discriminating
ability than the second.
This suggests the first test is the better choice for predicting
whether employees will be successful or unsuccessful in the
position.
References
Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary
Reviews: Computational Statistics, 2(3), 370-374.
Brown, J. D. (2009). The coefficient of determination.
Daoud, J. I. (2017, December). Multicollinearity and regression
analysis. In Journal of Physics: Conference Series (Vol. 949,
No. 1, p. 012009). IOP Publishing.
Keith, T. Z. (2019). Multiple regression and beyond: An
introduction to multiple regression and structural equation
modeling. Routledge.
Puth, M. T., Neuhäuser, M., & Ruxton, G. D. (2014). Effective
use of Pearson's product–moment correlation
coefficient. Animal behaviour, 93, 183-189.
Webb, G. I. (2010). Naïve Bayes. Encyclopedia of machine
learning, 15, 713-714.
Tests of Normality
          Kolmogorov-Smirnov(a)       Shapiro-Wilk
          Statistic   df   Sig.       Statistic   df   Sig.
ln_C      .104        32   .200*      .967        32   .414
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Model Summary
                                                        Change Statistics
Model  R      R       Adjusted  Std. Error of  R Square  F       df1  df2  Sig. F
              Square  R Square  the Estimate   Change    Change             Change
1      .482a  .232    .179      .34240         .232      4.385   2    29   .022
2      .483b  .234    .151      .34814         .001      .052    1    28   .822
a. Predictors: (Constant), N, S
b. Predictors: (Constant), N, S, CT
Coefficients(a)
                 Unstandardized    Standardized                  Collinearity
                 Coefficients      Coefficients                  Statistics
Model            B      Std. Error Beta    t       Sig.   Tolerance  VIF
1  (Constant)    5.300  .277               19.161  .000
   S             .001   .000       .406    2.447   .021   .963       1.039
   N             .012   .010       .193    1.164   .254   .963       1.039
2  (Constant)    5.294  .283               18.718  .000
   S             .001   .000       .403    2.385   .024   .958       1.044
   N             .011   .010       .189    1.110   .276   .950       1.053
   CT            .028   .125       .038    .227    .822   .978       1.022
a. Dependent Variable: ln_C
Mohammed Alsaadi
Maria Claver
Gern 400
6/11/2021
Life Review: Proposal/Script
In life, every individual goes through unique
experiences during the different stages of their lives. It is
through these experiences that one is able to develop personal
perspectives and judgments, and to acquire personal strengths
and wisdom. Much of what an individual is today is determined
by the physical, emotional, and mental experiences, challenges,
and hardships that they have gone through in life.
For this proposal, I plan to interview my elderly
neighbour Salim, whom I have always been in awe of and who
has always fascinated me with his dinner conversation during
his visits. He is 67 years old, and we have been neighbours for
more than 20 years. He and his family are very kind, and I have
spent so much time with them that they have become like family.
The interview would be very comfortable and easy-going, since
we have known each other for quite a long time. Furthermore,
this is a great chance to talk to him and learn more about his
experience growing up. I hope to learn about his family
background, his childhood, his career and retirement, and all
the significant experiences and events that have taken place in
his life. Through this interview, I expect to find out how one's
life experiences can determine a person's mental outlook and
physical condition in the course of growing old. Apart from
acquiring information in this interview, I also intend to apply
the lessons and knowledge that I have accumulated in life.
The questions in this interview will contain components
of the biopsychosocial model, which entails the biological,
psychological, and social aspects that make up the reality of life
(Bolton & Gillett, 2019). To conclude, I am hopeful of gaining a
better understanding of how these three aspects have influenced
his life and how they have affected both his mental and physical
health.
Biological Questions
1. When were you born? Do you know the date?
2. Can you describe your family background?
3. What was your childhood like and how were you brought up?
4. Who were your friends growing up, how did they influence
you? Are they still present in your life?
5. What are your opinions on the physical aspects of your life
and the changes to your body?
6. Do you suffer from any health complications, and if so, do
you need assistance in relation to them?
7. What is your current diet, and how does it relate to your health?
Psychological Questions
1. How do you perceive yourself and how would you describe
your life experiences?
2. What are some of the challenges that you have experienced in
your life and how have they influenced you? How were you able
to conquer these scenarios and how did it affect your life and
the person that you are today?
3. What was your coping mechanism in overcoming the stress,
anxiety and frustration caused by the challenges in your life?
4. Would you change the past given an opportunity to go back
in time, and what would you do differently?
5. What do you think is most different today from when you
were growing up?
6. How has technology and telecommunication affected you? Do
you find it easy or hard to cope with this era of technology?
Social Questions
1. What is your support system: family or social support?
How has it changed your experience?
2. Do you have a written will or testament and what does it
entail? What are your views on nursing homes or specialized
facilities?
3. Are you happy with the life you have led and do you feel
happier, sad or depressed as you advanced in age?
4. What do you do in your spare time for fun? What do you find
most entertaining?
5. What is your legacy? What do you think you will be
remembered for?
6. What positive or negative impacts will you leave on the
world? Will you leave the world a better place than you found
it?
Works Cited
Bolton, D., & Gillett, G. The Biopsychosocial Model of
Health and Disease: New Philosophical and Scientific
Developments. Springer, 2019.
The Biology of Aging.
Alazhar Alsaadi
CSULB
Dr. Maria Claver
6/11/2021
Study of aging.
Abstract.
I got the opportunity to interview Mrs. Fatima Al-Saadi, with
whom I live in the same house. She suffers from back pains. I
got the chance to assess her life, the challenges facing her, and
how she deals with them. Further, we assessed the cause of the
condition and how it was being managed.
Proposal.
Mrs. Fatima, 75 years old, is my grandmother, and we stay in the
same house. The duty of caring for her is entrusted to me.
She is of medium build and has a condition of the backbone,
such that it does not support her body well; as a result,
all her movement is through the use of a wheelchair.
Before conducting the interview, she must be prepared and given
ample time to recollect herself and get ready for the questions.
After preparation, the interview will take place over two
sessions: one on the first day of the week, after which she will
be allowed to rest, and the other on the third day of the week.
The interviews will take place in the living room, which is where
she receives and entertains guests, and will be conducted in the
mid-morning. This timing will ensure that she is not too
exhausted to give accurate answers.
The reason I chose her as the subject of the interview was the
stories she narrated about her childhood. She was a strong and
hardworking lady before an illness at the age of 45 affected her
spinal cord; since then, she has never been able to walk again.
This aroused a curiosity within me as I tried to figure out the
cause of her medical condition and its effect on her health and
her perception of life. Every time I see her so vibrant in her
wheelchair, it becomes harder to imagine her without it.
As a result of my curiosity to know what happened to my
grandmother, I am very comfortable with this assignment.
Further, she has accepted her condition and moved on, and she is
an inspiration to us, so I was comfortable choosing her as the
subject of this assignment. Besides, because of the bond between
us, we are free with each other, making the interview easier to
conduct.
By the time I complete this assignment, I hope to have learned
the cause of her illness, whether it can be cured, and whether it
is hereditary and, if so, how to protect future generations. The
interview theory that I shall use is the biology of aging theory,
because it deals with the human body and the damage that results
from how it has been programmed.
Script.
Interview questions.
1. Can you elaborate on your life prior to the illness?
2. Are there activities that you used to regularly carry out
before the illness?
3. Was there anything that happened that could have triggered
the illness?
4. What were the initial signs and symptoms of the condition?
5. Were you bullied?
6. How do you explain the illness to your friends and relatives?
7. Does being questioned about the illness in any way make
you uncomfortable?
8. Do you feel as if you differ from other people?
9. What is the most challenging thing about being disabled?
10. What is your perception of self?
11. Do people treat you differently when they visit than they
once did?
12. Do you ever wish you were able to walk again?
13. Are the medication expenses a burden to you?
14. What does it feel like to be on medication on a daily basis?
15. How are you able to be so optimistic about life in your
condition?
16. Was the illness by any chance preventable? If it was, how
would you advise future generations to protect themselves?
LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS
NAME:
INSTRUCTOR:
DATE:
Part I
Logistic Regression
The age and gender of guests in a nursing home were examined
to determine whether they predicted deaths in 2015. Data were
collected on gender, age, and whether the guest died. In
this case, death is the dependent variable, while age and gender
are the independent variables.
Since the dependent variable "died" was categorical with two
levels, logistic regression analysis was suitable for prediction
in this study (Austin & Merlo, 2017). The assumption of a
dichotomous dependent variable was met: "died" took two values,
0) No and 1) Yes.
The assumption of one or more predictor variables was also met.
Age was quantitative, reporting the respective ages of the guests.
Gender was categorical with two levels, 0) Female and 1)
Male.
Analysis
The collected data was analyzed to examine the
relationship between the predictor variables and the binary
dependent variable. A sample of 284 guests was used for this
study for easy analysis and generalizations. The overall logistic
regression model is given as;
Table 1 shows the total number of participants and the valid
sample that was utilized in this study.
Table 1: Logistic regression summary
The analysis showed that there were 144 successes and 140
failures. According to the results, the overall model was
statistically significant, with χ2 (2) = 82.46, p < 0.001 (Warner,
2012), which implies that we can carry on with the analysis.
The analysis showed that gender and age were both statistically
significant and contributed to the variation in deaths. Gender
was statistically significant, with b = 1.96, OR = 7.08, p < 0.05,
implying it had an impact on deaths: according to the odds
ratio, the odds of dying are 7.08 times higher for males than for
females. Age was also statistically significant, with b = 0.196,
OR = 1.22, p < 0.05: for each additional year of age, the odds of
dying increase by a factor of 1.22 (Norton et al., 2018).
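The reported odds ratios follow from the coefficients via OR = e^b. A quick check, taking the coefficients from the text (the intercept below is an illustrative assumption, since it is not reported):

```python
import math

b_gender = 1.96   # coefficient for gender (males vs. females)
b_age = 0.196     # coefficient for age (per year)

# Odds ratio = exp(coefficient)
or_gender = math.exp(b_gender)
or_age = math.exp(b_age)
print(f"OR(gender) = {or_gender:.2f}")  # ~7.10 (reported as 7.08)
print(f"OR(age)    = {or_age:.2f}")     # ~1.22

# Predicted probability of death for a hypothetical 80-year-old male,
# assuming an illustrative intercept b0 = -18 (NOT from the study):
b0 = -18.0
logit = b0 + b_gender * 1 + b_age * 80
p = 1 / (1 + math.exp(-logit))
print(f"P(death) = {p:.3f}")
```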
The logistic regression equation that helps in prediction of
death of a person given the age and gender will be given as;
Part II
Discriminant Analysis
Two tests were developed in a firm to determine whether some
of the employees will perform well in a given position. A sample
of 43 employees was examined. The main aim is to group
employees as either successful or unsuccessful using the given
tests.
Discriminant analysis suits this case, since exclusive grouping
was required and the dependent variable was categorical with
two groups, 0) Unsuccessful and 1) Successful (Bowerman et al.,
2019). The two independent variables used in this study (Test 1
and Test 2) were quantitative, reporting the scores of the
employees on the two tests.
Analysis
Discriminant analysis was carried out in SPSS to classify the
employees as successful or unsuccessful based on the two tests.
Descriptive statistics were as shown in table 2.
Table 2: Descriptive statistics
Group Statistics
Group                 Mean      Std. Deviation   Valid N (listwise)
                                                 Unweighted   Weighted
Unsuccessful  Test1   84.7500   4.24109          20           20.000
              Test2   79.1000   4.38778          20           20.000
Successful    Test1   92.4348   3.47492          23           23.000
              Test2   84.7826   6.23740          23           23.000
Total         Test1   88.8605   5.43175          43           43.000
              Test2   82.1395   6.10847          43           43.000
The mean of test 1 in the unsuccessful group was 84.75 while
for test 2 in the unsuccessful group was 79.10. The means for
test 1 and 2 in the successful group were 92.43 and 84.78
respectively.
Table 3 shows the importance of the independent variables in
the discriminant function used to group the employees.
Table 3: Test of equality of group means
Tests of Equality of Group Means
         Wilks' Lambda   F        df1   df2   Sig.
Test1    .490            42.644   1     41    .000
Test2    .780            11.593   1     41    .001
According to the analysis, both test scores were statistically
significant for the discriminant function.
Table 4 shows the correlation matrix of the predictor variables.
Table 4: Correlation matrix
Pooled Within-Groups Matrices
                      Test1   Test2
Correlation   Test1   1.000   .187
              Test2   .187    1.000
According to the analysis, the correlation between the scores of
test 1 and test 2 was r = 0.19. This is a weak positive
relationship, implying that the independent variables are not
highly correlated.
The assumption of multivariate normality was examined, and the
test results were as shown in the Box's M statistics given in
table 5 (Ul Hassan et al., 2017).
Table 5: Homogeneity of covariance matrix
Test Results
Box's M              5.014
F       Approx.      1.582
        df1          3
        df2          936960.353
        Sig.         .191
Tests null hypothesis of equal population covariance matrices.
According to the analysis, the groups did not differ in their
covariance matrices, implying that the assumption is not
violated and the analysis can continue.
According to table 6, one discriminant function was found, given
the two-group dependent variable.
Table 6: Canonical discriminant function
Eigenvalues
Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.161a       100.0           100.0          .733
a. First 1 canonical discriminant functions were used in the
analysis.
The strong positive canonical correlation implies that there was
a strong association between the discriminant function and the
dependent variable (Uurtio et al., 2017).
Table 7 shows the coefficients of the independent variables.
Table 7: Standardized canonical discriminant function
coefficients
Standardized Canonical Discriminant Function Coefficients
          Function 1
Test1     .885
Test2     .328
According to the analysis, Test 1 had better discriminating
ability than Test 2. This implies that Test 1 is the more
important predictor of whether employees will be successful or
unsuccessful in the position.
Table 8 shows the unstandardized canonical coefficients of the
model.
Table 8: Unstandardized canonical coefficients
Canonical Discriminant Function Coefficients
             Function 1
Test1        .230
Test2        .060
(Constant)   -25.380
Unstandardized coefficients
The discriminant equation becomes:
D = -25.38 + 0.23*Test 1 + 0.06*Test 2
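The fitted function can be applied directly. Scoring each group's mean test scores from Table 2 shows that the function separates the groups, with positive scores on the successful side:

```python
def discriminant_score(test1, test2):
    """Unstandardized canonical discriminant function (Table 8)."""
    return -25.38 + 0.23 * test1 + 0.06 * test2

# Group centroids built from the Table 2 means
d_unsuccessful = discriminant_score(84.7500, 79.1000)
d_successful = discriminant_score(92.4348, 84.7826)
print(f"D(unsuccessful means) = {d_unsuccessful:.3f}")  # negative
print(f"D(successful means)   = {d_successful:.3f}")    # positive
```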
Table 9 shows the classification of the given variables.
Table 9: Classification
Classification Results(a,c)
                                   Predicted Group Membership
                    Group          Unsuccessful   Successful   Total
Original     Count  Unsuccessful   16             4            20
                    Successful     5              18           23
             %      Unsuccessful   80.0           20.0         100.0
                    Successful     21.7           78.3         100.0
Cross-validated(b)
             Count  Unsuccessful   16             4            20
                    Successful     5              18           23
             %      Unsuccessful   80.0           20.0         100.0
                    Successful     21.7           78.3         100.0
a. 79.1% of original grouped cases correctly classified.
b. Cross validation is done only for those cases in the analysis.
In cross validation, each case is classified by the functions
derived from all cases other than that case.
c. 79.1% of cross-validated grouped cases correctly classified.
The analysis showed that 80% of the genuinely unsuccessful
employees were classified as unsuccessful, while 20% of them
were classified as successful. Of the successful employees,
78.3% were classified as successful and 21.7% as unsuccessful.
Overall, 79.1% of cases were correctly classified.
References
Austin, P. C., & Merlo, J. (2017). Intermediate and advanced
topics in multilevel logistic regression analysis. Statistics in
medicine, 36(20), 3257-3277.
Warner, R. M. (2012). Applied statistics: From bivariate
through multivariate techniques. Sage Publications.
Norton, E. C., Dowd, B. E., & Maciejewski, M. L. (2018). Odds
ratios—current best practice and use. Jama, 320(1), 84-85.
Bowerman, B., Drougas, A. M., Duckworth, A. G., Hummel, R.
M., Moniger, K. B., & Schur, P. J. (2019). Business
statistics and analytics in practice (9th ed.). McGraw-Hill.
Ul Hassan, E., Zainuddin, Z., & Nordin, S. (2017). A review of
financial distress prediction models: logistic regression and
multivariate discriminant analysis. Indian-Pacific Journal of
Accounting and Finance, 1(3), 13-23.
Uurtio, V., Monteiro, J. M., Kandola, J., Shawe-Taylor, J.,
Fernandez-Reyes, D., & Rousu, J. (2017). A tutorial on
canonical correlation methods. ACM Computing Surveys
(CSUR), 50(6), 1-33.
Introduction
In this paper, I will use the cardUpgrade dataset to run the K
nearest neighbor algorithm in JMP and predict the likeli hood of
a customer upgrading to platinum status or not. This dataset has
three attributes:
a) UpGrade – this column contains categorical data represented
in nominal form.
b) Purchases – is a numeric column that describes the monetary
value of purchases made by each customer.
c) PlatProfile – this is also a categorical data that was used at
the time the customers signed up to evaluate whether they fit
the profile for a platinum member or not.
K Nearest Neighbors (KNN)
The K Nearest Neighbors algorithm is a supervised non-
parametric lazy learning algorithm applicable to either
classification or regression tasks (Okfalisa et al., 2017). First, a
supervised machine learning algorithm relies on labeled training
input to produce a desired output given non-labelled data.
Secondly, non-parametric means that the algorithm makes no
assumptions and any model built on it relies entirely on the
training data it is fed without further assumptions on the
structure of the data. Thirdly, the algorithm is lazy learning
meaning that it does not make any generalization since the
training is minimal. In essence, as opposed to most machine
learning algorithms, the training data used in a KNN model is
also used to test the model. KNN classifies a single data point
by comparing it to the points it is closest and most similar to. It
assumes that similar items exist in close proximity
KNN is apt for different ML tasks, including decision-making
(as in this task), recommender systems, and image recognition.
For this particular task, I will use JMP to apply KNN to a
customer dataset. The model will compare the initial customer
profile attribute and purchase history to the current upgrade
status in order to predict whether other customers would be
willing to upgrade. Because this is a classification task, the
output should be a discrete value showing whether a customer
will upgrade or not. There is no middle ground, which is why
the values are binary, that is, 0 or 1. The model will have two
predictors and a label. The output will also be nominal,
representing the upgrade status of an individual. While the
output will be in the form of numerical values (0 or 1), these
numbers are only representational and have no mathematical
meaning (Ghattas et al., 2017).
KNN Scheme on JMP
First I imported the excel data set into JMP. The application
treats the two categorical columns as numeric. This must be
specified back to nominal in JMP. While this process is rather
limited, it is a form of data cleaning when creating a machine
learning model. Other data cleaning tasks typically include
identification and removal of duplicate records/observations and
finding ways to handle missing values. The second step is to
conduct an exploratory data analysis (EDA). According to Jebb
et al., this stage helps to decide the algorithm and variables that
would be used (2017). EDA also involves data visualization
which gives a cursory insight about the observations. For
instance, below is a bubble plot of UpGrade status against
purchases. It shows that those who did not upgrade are more
concentrated on the lower end of purchases while those who did
upgrade made higher purchase volumes. It also shows some
outliers which should be handled. I have chosen to ignore these
outliers because there are only one for each class meaning that
they would have minimal impact on the model. Additionally, the
dataset contains only forty observations, thus, there is very
little room for cropping out other observations.
I also visualized the distribution of the UpGrade column to
determine the class percentages. The graph shows that the
distribution is almost evenly split (55% non-upgrades and 45%
upgrades).
Splitting the Data
For most machine learning modeling tasks, the data should only
be split into training and testing sets. The train-test approach is
used to avoid evaluating the model based on training dataset as
this would result in a biased score. Kuhn & Johnson state that it
is pragmatic for the model to be evaluated on data that was not
used to either build or finetune it (2013, p. 67). Some
algorithms including KNN, on the other hand, produce the best
results when the data is split into three including a validation
set. This third set is often seen as another set of test data, but it
is used to tune the hyperparameters of the model. However,
Touvron et al., elaborate that this splitting of the data into two
or three sets only produces optimal results when the dataset has
a large amount of data such that each class that could
potentially be observed is included in each set (2020).
Since this dataset has only forty observations, it cannot be split
three ways without compromising model performance. It is,
instead, split into training and test sets at a ratio of 4:1, giving
32 observations for training and 8 for testing. The KNN
algorithm also expects a k value, which is typically selected as
the square root of the total number of observations. When there
are only two classes to be predicted, the standard practice is to
pick an odd number to avoid a tie during majority voting. For
this reason, I have picked 7 as the value of k.
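The classifier itself is simple enough to sketch without JMP. A minimal pure-Python k-nearest-neighbors with k = 7 and majority voting; the toy points below are illustrative, not the cardUpgrade records:

```python
import math
from collections import Counter

def knn_predict(train, labels, point, k=7):
    """Classify `point` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(x, point), y) for x, y in zip(train, labels)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Illustrative training data: (Purchases, PlatProfile) -> UpGrade
train = [(10, 0), (12, 0), (15, 0), (18, 1), (20, 0), (22, 0), (25, 1),
         (30, 1), (33, 1), (35, 0), (38, 1), (40, 1), (45, 1), (50, 1)]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

print(knn_predict(train, labels, (42, 1)))  # high purchases -> 1
print(knn_predict(train, labels, (11, 0)))  # low purchases  -> 0
```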
Confusion Matrix Interpretation
The confusion or error matrix is used to enhance the
understanding of a classification task. Simply reporting on the
accuracy of the model does not fully capture the performance of
the algorithm (Okfalisa et al., 2017). The precision and recall
scores are calculated from the output of the confusion matrix. In
the matrix, the predicted values are described as either positive
or negative whereas the actual values are resented as true or
false. In the end, there are four possibilities:
a) True Positive – values the model correctly predicts as
positive; they are actually positive.
b) True Negative – values the model correctly predicts as
negative; they are actually negative.
c) False Positive – values predicted as positive that are
actually negative.
d) False Negative – values predicted as negative that are
actually positive.
In this case these values would be interpreted as follows:
a) True Positive (TP) – customers who did upgrade to platinum
and whom the model correctly predicted to have upgraded.
b) True Negative (TN) – customers who did not upgrade and
whom the model correctly predicted not to have upgraded.
c) False Positive (FP) – customers predicted to upgrade who
did not actually upgrade.
d) False Negative (FN) – customers predicted not to upgrade
who actually did upgrade.
The matrix for this model is as in the image below:
This means that the model produces 2 TPs, 5 TNs, 1 FP, and 0
FNs. Thus, only one of the eight predictions is incorrect, and
the model has an accuracy score of 87.5%.
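Given a confusion matrix, accuracy, precision, and recall follow directly from the four counts. A sketch using counts consistent with the reported 87.5% accuracy (TP = 2, TN = 5, FP = 1, FN = 0):

```python
def classification_scores(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

acc, prec, rec = classification_scores(tp=2, tn=5, fp=1, fn=0)
print(f"accuracy  = {acc:.3f}")   # 0.875
print(f"precision = {prec:.3f}")  # 2/3
print(f"recall    = {rec:.3f}")   # 1.0
```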
PART 2
Using the Bayes theorem formula as indicated below, we can
calculate the probability that a customer who makes purchases
above 32.450 will also be likely to upgrade to platinum (Rouder
& Morey, 2018).
Below are the notations for the probabilities:
P(A) – the probability that a customer upgrades to platinum
(45%)
P(B) – the probability that a customer makes purchases equal to
or more than 32.450 (52.5%)
P(A|B) – the probability that a customer upgrades given that he
makes purchases equal to or above 32.450 (79.32%)
P(B|A) – the probability that a customer makes purchases equal
to or above 32.450 given that he has upgraded.
Thus P(B|A) = .9254, that is, 92.54%, which is a significant
improvement over 87.5%. Therefore, using the Bayes theorem
formula, the model's accuracy has been improved.
References
Bowerman, B., Drougas, A. M., Duckworth, A. G., Hummel, R.
M., Moniger, K. B., & Schur, P. J. (2019). Business statistics
and analytics in practice (pp. 186–189). Mcgraw-Hill Education.
Ghattas, B., Michel, P., & Boyer, L. (2017). Clustering nominal
data using unsupervised binary decision trees: Comparisons
with the state of the art methods. Pattern Recognition, 67, 177–
185. https://doi.org/10.1016/j.patcog.2017.01.031
Jebb, A. T., Parrigon, S., & Woo, S. E. (2017). Exploratory data
analysis as a foundation of inductive research. Human Resource
Management Review, 27(2), 265–276.
https://doi.org/10.1016/j.hrmr.2016.08.003
Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling.
In Google Books (p. 67). Springer Science & Business Media.
https://books.google.com/books/about/Applied_Predictive_Mod
eling.html?id=xYRDAAAAQBAJ&source=kp_book_description
Okfalisa, Gazalba, I., Mustakim, & Reza, N. G. I. (2017).
Comparative analysis of k-nearest neighbor and modified k-
nearest neighbor algorithm for data classification. 2017 2nd
International Conferences on Information Technology,
Information Systems and Electrical Engineering (ICITISEE).
https://doi.org/10.1109/icitisee.2017.8285514
Rouder, J. N., & Morey, R. D. (2018). Teaching Bayes’
Theorem: Strength of Evidence as Predictive Accuracy. The
American Statistician, 73(2), 186–190.
https://doi.org/10.1080/00031305.2017.1341334
Touvron, H., Vedaldi, A., Douze, M., & Jégou, H. (2020).
Fixing the train-test resolution discrepancy. ArXiv:1906.06423
[Cs]. https://arxiv.org/abs/1906.06423
Running Head: MERGING AND ACQUISITION
NAME:
INSTRUCTOR:
DATE:
Merging and Acquisition
Introduction
This paper examines some of the factors that determine merger
and acquisition activity in retailing, to support decisions that
have a positive impact on a firm. The characteristics of firms
targeted for acquisition, and of firms willing to make
acquisitions, were examined. The sales growth rates of target
firms and of bidders were tested to enable management to make
acquisition decisions wisely (Christofi et al., 2017).
Growth rate of sales for target firms
The growth rate of sales for firms targeted for acquisition was
tested against a benchmark value. A sample of 25 firms was
collected for this study. The mean sales growth rate was 0.16
and the standard deviation was 0.12; these summary statistics
are what the test requires, under the assumption of normally
distributed data (D'Agostino, 2017). We performed a one-sample
t-test since only the sample statistics were known
(Emmert-Streib & Dehmer, 2019).
The research question for this study is: Is there a
statistically significant difference between the hypothesized
population mean and the sample mean sales growth rate of target
firms? The hypothesis testing process was as follows:
Hypotheses
The following are the null and alternative hypotheses.
H0: μ = .10. The mean growth rate of sales for target firms is
not different from 10%.
Ha: μ > .10. The mean growth rate of sales for target firms
exceeds 10%.
Test statistic
A t-test was carried out to examine whether the sample mean
sales growth rate of 0.16 was statistically significantly
different from 0.10.
The t statistic is given by:
t = (x̄ − μ0) / (s / √n) = (0.16 − 0.10) / (0.12 / √25)
= 0.06 / 0.024 = 2.50
The degrees of freedom are the sample size minus 1 (Sung & Han,
2018):
df = 25 − 1 = 24
Determine the critical value from the t table at α = 0.10, 0.05,
0.01 and 0.001.
For α = 0.10, tα = 1.318 which is less than the calculated value
of t = 2.50 implying we reject the null hypothesis at α = 0.10
and conclude Ha: μ >.10.
For α = 0.05, tα = 1.711 which is less than the calculated value
of t = 2.50 implying we reject the null hypothesis at α = 0.05
and conclude Ha: μ >.10.
For α = 0.01, tα = 2.492 which is less than the calculated value
of t = 2.50 implying we reject the null hypothesis at α = 0.01
and conclude Ha: μ >.10.
For α = 0.001, tα = 3.467 which is greater than the calculated
value of t = 2.50 implying we fail to reject the null hypothesis
at α = 0.001 and retain H0: μ = .10.
Decision
There was very strong evidence to reject the null
hypothesis at α = 0.01 and we conclude that the mean growth
rate of sales for target firms exceeds 10% (Bowerman et al.,
2019).
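A minimal sketch of the same one-sample t-test from summary statistics, using scipy only for the one-tailed critical values:

```python
# One-sample t statistic from summary statistics (mean, sd, n),
# compared against one-tailed critical values, mirroring the
# hand calculation above for the target firms.
from math import sqrt
from scipy.stats import t as t_dist

xbar, mu0, s, n = 0.16, 0.10, 0.12, 25
t_stat = (xbar - mu0) / (s / sqrt(n))
df = n - 1
print(round(t_stat, 2))  # 2.5

for alpha in (0.10, 0.05, 0.01, 0.001):
    crit = t_dist.ppf(1 - alpha, df)
    print(f"alpha={alpha}: critical={crit:.3f}, reject H0: {t_stat > crit}")
```

The loop reproduces the table lookups: H0 is rejected at α = 0.10, 0.05, and 0.01 but not at α = 0.001.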
Growth rate of sales for bidders
The growth rate of sales for firms willing to make
acquisitions was examined in order to reach decisions regarding
acquisition. A sample of 25 firms was collected for the
analysis. The mean growth rate of sales was 0.12 and the
standard deviation was 0.09. We performed a one-sample t-test
since only the sample statistics were known.
Hypotheses
The following are the null and alternative hypotheses:
H0: μ = .10. The mean growth rate of sales for bidders is not
different from 10%.
Ha: μ > .10. The mean growth rate of sales for bidders exceeds
10%.
Test statistic
A t-test was carried out to examine whether the sample mean
sales growth rate of 0.12 was statistically significantly
different from 0.10.
The t statistic is given by:
t = (x̄ − μ0) / (s / √n) = (0.12 − 0.10) / (0.09 / √25)
= 0.02 / 0.018 = 1.111
The degrees of freedom are the sample size minus 1:
df = 25 − 1 = 24
Determine the critical value from the t table at α = 0.10, 0.05,
0.01 and 0.001.
For α = 0.10, tα = 1.318 which is greater than the calculated
value of t = 1.111 implying we fail to reject the null
hypothesis at α = 0.10 and retain H0: μ = .10 (Trafimow & Earp,
2017).
For α = 0.05, tα = 1.711 which is greater than the calculated
value of t = 1.111 implying we fail to reject the null
hypothesis at α = 0.05 and retain H0: μ = .10.
For α = 0.01, tα = 2.492 which is greater than the calculated
value of t = 1.111 implying we fail to reject the null
hypothesis at α = 0.01 and retain H0: μ = .10.
For α = 0.001, tα = 3.467 which is greater than the calculated
value of t = 1.111 implying we fail to reject the null
hypothesis at α = 0.001 and retain H0: μ = .10.
Decision
We failed to reject the null hypothesis even at α = 0.10 and
conclude that there was no convincing evidence that the mean
growth rate of sales for bidders exceeded 10%.
Conclusion
The analysis shows that the sales growth rate of firms targeted
for acquisition exceeded 10%, while the firms that were willing
to place bids for acquisition had a sales growth rate not
exceeding 10%. The bidding firms should therefore be analyzed
critically to ensure they meet the requirements for acquiring
the existing firms that have strong growth curves.
References
Bowerman, B., Drougas, A. M., Duckworth, A. G., Hummel, R.
M., Moniger, K. B., & Schur, P. J. (2019). Business
statistics and analytics in practice (9th ed.). McGraw-Hill.
Christofi, M., Leonidou, E., & Vrontis, D. (2017). Marketing
research on mergers and acquisitions: A systematic review and
future directions. International Marketing Review.
D'Agostino, R. B. (2017). Tests for the normal distribution.
In Goodness-of-fit techniques (pp. 367-420). Routledge.
Emmert-Streib, F., & Dehmer, M. (2019). Understanding
statistical hypothesis testing: The logic of statistical
inference. Machine Learning and Knowledge Extraction, 1(3),
945-961.
Sung, W. P., & Han, T. Y. (Eds.). (2018, July). Exploration and
practice of a new formula for calculating the degree of
freedom. In MATEC Web of Conferences (Vol. 175, p. 03018).
EDP Sciences.
Trafimow, D., & Earp, B. D. (2017). Null hypothesis
significance testing and Type I error: The domain
problem. New Ideas in Psychology, 45, 19-27.
Introduction
Prediction is the process of determining the effect of
predictors on a response variable; it helps determine future
values of the outcome variable from the factors included in the
study. Multiple linear regression is a statistical test used to
assess the relationship between a response variable and more
than one predictor variable (Keith, 2019). Cacoullos (2014)
argued that discriminant analysis, which tests the equality of
group centroids, is closely related to multivariate analysis of
variance since it uses Wilks' lambda as applied in GLM
multivariate procedures.
In this regard, multiple linear regression and discriminant
analysis are the most appropriate statistical techniques here:
the former predicts the outcome of a dependent variable at
different values of the predictor variables and assesses the
contribution of each predictor, while the latter assesses the
association between groups (Keith, 2019). The researcher can
also assess how much of the variation in the outcome is
explained by the independent variables included in the model.
Discriminant analysis was used to assess the association between
the two tests developed in a firm to examine employee
performance. The study used a sample of 43 employees who were
grouped as either successful or unsuccessful and who took the
two tests assessing their suitability for a given position.
Before conducting either multiple linear regression or
discriminant analysis, it is good practice to check whether the
variables meet the necessary assumptions.
For multiple linear regression, the dependent variable must be
continuous and approximately normally distributed, while the
predictor variables can be either continuous or categorical. For
discriminant analysis, the dependent variable must be divided
into two or more groups. In addition, for both tests the
predictor variables should not be correlated with each other;
that is, there should be no multicollinearity (Alin, 2010). This
can be assessed using the variance inflation factor, which
should be less than ten, or through the correlation coefficients
between the predictor variables. The current study assessed the
relationship between the cost of constructing an LWR plant and
the three predictor variables S, N, and CT, and assessed the
association between the two tests used to examine employee
performance.
Assumption of Regression Analysis
Multicollinearity
Correlation analysis is used to determine the strength and
direction of association between two variables (Puth et al.,
2014). The Pearson correlation coefficient tests the strength
and direction of association between two continuous variables,
whereas the Spearman rank correlation coefficient is used when
the variables are ordinal (Puth et al., 2014). The Pearson
correlation coefficient ranges from -1 to 1, with -1 or 1
indicating perfect correlation and zero indicating no
correlation.
Table 1: Correlation Analysis
                            S       N
S   Pearson Correlation     1       .193
    Sig. (2-tailed)                 .289
    N                       32      32
N   Pearson Correlation     .193    1
    Sig. (2-tailed)         .289
    N                       32      32
The correlation analysis in Table 1 above indicates that the
correlation between the two predictor variables (S and N) is
weakly positive and not significant at the 0.05 level of
significance (r = 0.193, p = 0.289). This suggests that
multicollinearity does not exist and the multicollinearity
assumption is not violated. Furthermore, based on the analysis,
the variance inflation factor (VIF) is less than 10 (Daoud,
2017), confirming that the multicollinearity assumption is not
violated.
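With only two predictors, the VIF can be checked directly from their pairwise correlation, since VIF = 1 / (1 − r²). Using the S–N correlation reported in Table 1:

```python
# VIF for one of two predictors from their pairwise correlation:
# VIF = 1 / (1 - r^2), where r = 0.193 is the S-N correlation.
r_sn = 0.193
vif = 1 / (1 - r_sn ** 2)
print(round(vif, 3))  # 1.039, matching the collinearity statistics
```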
Normality test
Table 2: Tests of Normality
        Kolmogorov-Smirnov(a)           Shapiro-Wilk
        Statistic   df   Sig.           Statistic   df   Sig.
ln_C    .104        32   .200*          .967        32   .414
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
The test of normality of the dependent variable ln(C) revealed
that the normality assumption is not violated after transforming
the variable C using the natural log, at the 0.05 level of
significance (Shapiro-Wilk = 0.967, p = 0.414).
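The check can be sketched as follows. The cost values below are simulated stand-ins, since the study's raw data are not reproduced here; only the procedure (log-transform, then Shapiro-Wilk) mirrors the text.

```python
# Sketch of the normality check on the log-transformed cost.
# The raw costs are simulated stand-ins for the study's data.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
cost = rng.lognormal(mean=5.3, sigma=0.35, size=32)  # stand-in for C
ln_c = np.log(cost)

stat, p = shapiro(ln_c)
print(f"Shapiro-Wilk = {stat:.3f}, p = {p:.3f}")
# p > 0.05 would indicate no evidence against normality
```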
Results and discussion
Regression Analysis
a. Use residual analysis and R2 to check your model.
Table 3: Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
1       .482a   .232       .179                .34240                       .232              4.385      2     29    .022
2       .483b   .234       .151                .34814                       .001              .052       1     28    .822
a. Predictors: (Constant), N, S
b. Predictors: (Constant), N, S, CT
The R-squared of 0.232 indicates that the model explains about
23.2% of the variation in ln(C); the remaining 76.8% of the
variation is explained by variables not included in the model.
In addition, the residual analysis indicated large residuals.
The low R-squared and large residuals indicate that the model
does not fit the data well (Brown, 2009).
b. State which variables are important in predicting the cost of
constructing an LWR plant.
Table 4: Regression Coefficients
Model            B       Std. Error   Beta   t        Sig.   Tolerance   VIF
1  (Constant)    5.300   .277                19.161   .000
   S             .001    .000         .406   2.447    .021   .963        1.039
   N             .012    .010         .193   1.164    .254   .963        1.039
2  (Constant)    5.294   .283                18.718   .000
   S             .001    .000         .403   2.385    .024   .958        1.044
   N             .011    .010         .189   1.110    .276   .950        1.053
   CT            .028    .125         .038   .227     .822   .978        1.022
a. Dependent Variable: ln_C
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.)
The regression analysis displayed in Table 4 above indicates
that S is a significant contributing factor in predicting ln(C)
at the 0.05 level of significance (p = 0.021). However, the
predictor variable N does not significantly predict ln(C) at the
0.05 level of significance (p = 0.254). According to the
analysis, there is also no significant difference in ln(C)
between the two levels of the cooling tower (p = 0.822),
suggesting that the dummy variable CT does not have a
significant effect in predicting ln(C). Therefore, the
researcher used the S predictor to predict the cost of
constructing an LWR plant and removed N and CT from the model.
c. State a prediction equation that can be used to predict
ln(C). After dropping N and CT from the model, since they do not
have a significant effect in predicting ln(C), the prediction
equation is given by:
d. Does adding CT improve R2? If so, by what amount?
Based on the analysis displayed in Table 3 above, there is no
significant improvement in R-squared after adding CT (p =
0.822). Adding CT to the model changes the R-squared by only
0.001 (the rounded values move from 0.232 to 0.234), which is
not significantly different from zero.
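The R-squared-change conclusion can be sanity-checked by converting the reported F Change back into a p-value. The F value is taken from Table 3's change statistics rather than recomputed from the R-squared values, because those are rounded.

```python
# Partial-F (R-squared change) check using Table 3's reported
# F Change = .052 with df1 = 1 and df2 = 28.
from scipy.stats import f as f_dist

f_change, df1, df2 = 0.052, 1, 28
p_value = f_dist.sf(f_change, df1, df2)
print(round(p_value, 3))  # close to the reported Sig. F Change of .822
```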
Correlational Analysis
a. Evaluate the correlation between the two scores and state if
there seems to be any association between the two.
Table 5: Pooled Within-Groups Matrices
Correlation   Test1   Test2
Test1         1.000   .187
Test2         .187    1.000
The correlation analysis shown in Table 5 above indicates that
there was a weak positive correlation between the two tests (r =
0.187). This suggests that the two test scores were at most
weakly correlated.
b. Find the probability of upgrading for each division of the
sample by the Bayes’ theorem.
Given that: P(T1) = 43/86; P(T2) = 43/86
P(T1|Up) = 23/46; P(T2|Up) = 23/46
P(Up|T1) = P(T1|Up) × P(Up) ÷ P(T1)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
P(Up|T2) = P(T2|Up) × P(Up) ÷ P(T2)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
c. Find the probability of upgrading for each division of the
sample by the naïve version of the Bayes’ theorem.
P(Up|T1) = P(T1|Up) × P(Up) ÷ P(T1)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
P(Up|T2) = P(T2|Up) × P(Up) ÷ P(T2)
= (23/46 × 46/86) ÷ (43/86)
= 23/43
d. Compare your results in parts b and c and explain the
difference or indifference based on observed probabilities
Since there is only one predictor in each division of the
sample, the naïve version and Bayes' theorem give identical
probabilities, so the observed probabilities show no difference.
This is because the naïve version applies Bayes's theorem with
an assumption of independence between the predictor features
(Webb, 2010), and with a single predictor that independence
assumption imposes no additional constraint.
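The equality of the two computations can be verified with exact fractions:

```python
# Reproducing the Bayes computation with exact fractions:
# P(Up|T1) = P(T1|Up) * P(Up) / P(T1).
from fractions import Fraction

p_t1 = Fraction(43, 86)
p_up = Fraction(46, 86)
p_t1_given_up = Fraction(23, 46)

p_up_given_t1 = p_t1_given_up * p_up / p_t1
print(p_up_given_t1)  # 23/43
```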
Conclusion and Recommendations
The analysis revealed that the model with the three predictors
predicting the cost of constructing an LWR plant does not fit
the data well. This suggested that most of the variations of the
outcome variable are explained by variables not included in the
model. Further analysis indicated that the S predictor had a
significant effect in predicting the cost of constructing an LWR
plant (C). However, N and CT did not have a significant effect
in predicting it. Therefore, the researcher should drop the N
and CT predictors from the model and use only the S predictor in
predicting the cost of constructing the LWR plant. The analysis
also indicated that the two tests were not associated with each
other, but the first test had greater discriminating power than
the second. This suggests that the first test is the better one
to use in predicting whether employees will be successful or
unsuccessful in the position.
Nevertheless, the study did not control for alternative
explanations, which could affect the validity of the findings.
A further study is needed that includes variables giving a good
fit to the data, to help predict the cost of constructing an
LWR plant. In addition, a study that controls all possible
confounders is required to help predict the outcome variable and
to assess the better test for predicting employee performance.
References
Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary
Reviews: Computational Statistics, 2(3), 370-374.
Brown, J. D. (2009). The coefficient of determination.
Cacoullos, T. (Ed.). (2014). Discriminant analysis and
applications. Academic Press.
Daoud, J. I. (2017, December). Multicollinearity and regression
analysis. In Journal of Physics: Conference Series (Vol. 949,
No. 1, p. 012009). IOP Publishing.
Keith, T. Z. (2019). Multiple regression and beyond: An
introduction to multiple regression and structural equation
modeling. Routledge.
Puth, M. T., Neuhäuser, M., & Ruxton, G. D. (2014). Effective
use of Pearson's product–moment correlation
coefficient. Animal Behaviour, 93, 183-189.
Webb, G. I. (2010). Naïve Bayes. Encyclopedia of machine
learning, 15, 713-714.

  • 1. CLA 2 Presentation BUS 606 Advanced Statistical Concepts And Business Analytics Agenda Introduction Multiple linear regression is the most appropriate statistical technique in predicting the outcome of a dependent variable at different values (Keith, 2019). The study assessed the relationship between the cost of constructing an LWR Plant and the three predictor variables S, N, and CT. We assessed the association between the two-test used to examine the employee performance.
  • 2. Assumption of Regression Analysis Multicollinearity Multicollinearity is the condition where the predictor variables are highly correlated (Alin, 2010). Correlation Analysis 4 Assumption of Regression Analysis Cont’ Normality test The normality assumption is not violated after transforming the outcome variable C, using natural log (C) (Shapiro-Wilk = 0.967, p = 0.414). 5 Results and Discussion – Regression Analysis Use Residual Analysis and R2 to Check Your Model
  • 3. The R-Squared of 0.232 indicates that the model can explain about 23.2% of ln(C) The low R-Square indicated that the model does not fit the data well (Brown, 2009). 6 Results and Discussion Cont’ State which Variables are Important in predicting the cost of constructing an LWR plant? S is a significant contributing factor in predicting ln(C)(p = 0.021), but N and CT have no significant effect in predicting (p > 0.05) 7 Results and Discussion Cont’ State a prediction equation that can be used to predict ln(C). After dropping N and CT from the model since they do not have a significance effect in predicting ln(C), the prediction equation is given by:
  • 4. Does adding CT improve R2? If so, by what amount? Adding CT in the model changes R-Square by 0.001 from 0.232 to 0.234 which is not significant different from zero (p > 0.05). 8 Results and Discussion Cont’ - Correlational Analysis Evaluate the correlation between the two scores and state if there seems to be any association between the two. There was a weak positive correlation between the two tests (r = 0.187). This suggested that the two test scores were not correlated. 9 Results and Discussion Cont’ Find the probability of upgrading for each division of the sample by the Bayes’ theorem. P(Up/T1) = P (T1/Up) P(Up) ÷ P(T1) = (23/46*46/86) ÷43/86 = 23/43 P(Up/T2) = P (T2/Up) P(Up) ÷ P(T2) = (23/46*46/86) ÷43/86
  • 5. = 23/43 10 Results and Discussion Cont’ Find the probability of upgrading for each division of the sample by the naïve version of the Bayes’ theorem P(Up/T1) = P (T1/Up) P(Up) ÷ P(T1) = (23/46*46/86) ÷43/86 = 23/43 P(Up/T2) = P (T2/Up) P(Up) ÷ P(T2) = (23/46*46/86) ÷43/86 = 23/43 11 Results and Discussion Cont’ Compare your results in parts b and c and explain the difference or indifference based on observed probabilities Naïve version and Bayes theorem have similar probabilities. We have only one predictor in each sample division This is because Naïve is applied with Bayes's theorem with an assumption of independence between the features of predictor variables (Webb, 2010). 12
  • 6. Conclusion and Recommendations – LWR Plant Most of the variations of the outcome variable are explained by variables not included in the model. Further analysis indicated that the S predictor had a significant effect in predicting the cost of constructing an LWR plant (C). N and CT did not have a significant effect in predicting. Should drop N and CT predictors from the model. 13 Conclusion and Recommendations – Employee Performance The analysis also indicated that the two test were not related with each other but, the first test had the best ability in discriminating than the second test. Which suggested that the first test was the best to use in predicting whether employees will be unsuccessful or successful in the position. 14 References Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics, 2(3), 370-374. Brown, J. D. (2009). The coefficient of determination. Daoud, J. I. (2017, December). Multicollinearity and regression analysis. In Journal of Physics: Conference Series (Vol. 949, No. 1, p. 012009). IOP Publishing. Keith, T. Z. (2019). Multiple regression and beyond: An
  • 7. introduction to multiple regression and structural equation modeling. Routledge. Puth, M. T., Neuhäuser, M., & Ruxton, G. D. (2014). Effective use of Pearson's product–moment correlation coefficient. Animal behaviour, 93, 183-189. Webb, G. I. (2010). Naïve Bayes. Encyclopedia of machine learning, 15, 713-714. Kolmogorov-Smirnov a Shapiro-Wilk Statistic df Sig. Statistic df Sig. ln_C .104 32 .200 * .967 32 .414 *. This is a lower bound of the true significance. a. Lilliefors Significance Correction Model R R Square Adjusted R Square Std. Error of the Estimate Change Statistics R Square Change F Change df1 df2 Sig. F Change 1 .482 a
  • 8. .232 .179 .34240 .232 4.385 2 29 .022 2 .483 b .234 .151 .34814 .001 .052 1 28 .822 a. Predictors: (Constant), N, S b. Predictors: (Constant), N, S, CT Model Unstandardized Coefficients Standardized Coefficients t Sig. Collinearity Statistics B Std. Error Beta Tolerance VIF (Constant) 5.300 .277 19.161 .000 S .001 .000 .406 2.447 .021 .963 1.039 N .012 .010 .193 1.164 .254 .963 1.039 (Constant) 5.294 .283 18.718 .000 S .001 .000 .403 2.385 .024 .958 1.044 N .011 .010 .189 1.110 .276 .950 1.053 CT .028 .125 .038 .227 .822 .978 1.022 a. Dependent Variable: ln_C 2
  • 9. Mohammed Alsaadi Maria Claver Gern 400 6/11/2021 Life Review: Proposal/Script In life, every individual goes through unique experiences during the different stages of their lives. It is through these experiences that one is able to develop their personal perspectives, judgments, and also acquire personal strengths and wisdom. Much of what an individual is a today is determined by the physical, emotional and mental experiences, challenges and hardships that they have gone through in life. For this proposal, I plan to interview my elderly neighbour Salim, whom I have always been in awe of and who has always fascinated me with his dinner conversation during his visits. He is 67 years old and we have been neighbour for more than 20 years. He and his family are very nice I spent many times with them until they become like a family. The level of this interview would be very comfortable and easy-going since we know each other for a quite a long time. Furthermore, this is a great chance to talk to him and learn more about his
  • 10. experience growing up. I hope to learn from him about his family background, his childhood, career and retirement and all the significant experiences and events that have taken place in his life. Through this interview, I expect to find out how one’s life experiences can determine a person’s mental outlook and physical conditions in the course of getting old. Apart from acquiring information in this interview, I also intend to apply the lessons and knowledge that I have accumulated in life also. The questions in this interview will be containing components of the biopsychosocial model, which entail biological, psychological and social aspects that make up the reality of life (Derek Bolton). To conclude, am hopeful of gaining a better understanding of how these three aspects have influenced his life and how it has affected both his mental and physical health. Biological Questions 1. When were you born? Do you know the date? 2. Can you describe the family background? 3. What was your childhood like and how were you brought up? 4. Who were your friends growing up, how did they influence you? Are they still present in your life? 5. What are your opinions on the physical aspects of your life and the changes to your body? 6. Do you suffer from health complication and if so, do you need assistance in relation to the health complication? 7. What is your current diet and is it in relation to your health? Psychological Questions 1. How do you perceive yourself and how would you describe your life experiences? 2. What are some of the challenges that you have experienced in your life and how have they influenced you? How were you able to conquer these scenarios and how did it affect your life and the person that you are today?
  • 11. 3. What was your coping mechanism in overcoming the stress, anxiety and frustration caused by the challenges in your life? 4. Would you change the past given an opportunity to go back in time, and what would you do differently? 5. What do you think is most different today from when you were growing up? 6. How has technology and telecommunication affected you? Do you find it easy or hard to cope with this era of technology? Social Questions 1. What is your support system, is it family or social support? How has it changed your experience? 2. Do you have a written will or testament and what does it entail? What are your views on nursing homes or specialized facilities? 3. Are you happy with the life you have led and do you feel happier, sad or depressed as you advanced in age? 4. What do you do in your spare time for fun? What do you find most entertaining? 5. What is your legacy? What do you think you will be remembered for? 6. What positive or negative impacts will you leave on the world? Will you leave the world a better place than you found it? 7. Works Cited Derek Bolton, Grant Gillett. The Biopsychosocial Model of Health and Disease: New Philosophical and Scientific Development. Springer, 2019. THE BIOLOGY OF AGING 3
  • 12. The Biology of Aging. Alazhar Alsaadi CSULB Dr. Maria Claver 6/11/2021 Study of aging. Abstract. I got the opportunity to interview Mrs. Fatima Al-Saadi with who we live in the same house. She suffers from back pains. I got the chance to assess her life, the challenges that are facing her, and how she was dealing with them. Further, we assessed the cause and how the condition was being managed.
  • 13. Proposal. Mrs. Fatima, 75 years old, is my grandmother and we stay in the same house. It is entrusted upon me the duty of caring for her. She is of medium built and has a condition with her backbone and as a result, it does not support her body well. As a result, all her movement is through the use of a wheelchair. Before conducting the interview, she must be prepared to give her ample time to recollect herself and be prepared for the questions. After preparation, the interview will take place after a period of two days. One will take place on the first day of the week after which she will be allowed to rest and the other will take place on the third day of the week. The interview will take place in the living room which is where she receives and entertains guests and will be conducted in the mid-morning. The timing will ensure that she is not too exhausted to give accurate answers. The reason I chose her as the subject of the interview was due to the stories that she narrated about herself as a child. She was a strong and hardworking lady before an illness at the age of 45 that affected her spinal cord and since then, she has never been able to walk again. As a result, it arouses a curiosity within me as I tried to figure out the cause of her medical condition, and the resultant effect of this on her health and her perception of life. Every time looking at her vibrant in her wheelchair, it becomes more of a challenge to see her without it. As a result of the curiosity to know what happened to my grandmother, this assignment level is very comfortable. Further, she accepted her condition and moved on and is also an inspiration to us thus I was comfortable choosing her as a recipient of this assignment. Besides, as a result of the bond between us, we are free with each other thus making the interview easier to conduct. By the time I complete this assignment, I hope to have known the cause of her illness, whether it can be cured, and whether it
  • 14. is hereditary and if so, how to protect future generations. The interview theory that I shall use is the biology of aging theory because it deals with the human body and the damages as a result of how it has been programmed. Script. Interview questions. 1. Can you elaborate on your life prior to the illness? 2. Are there activities that you used to regularly carry out before the illness? 3. Was there anything that happened that could have triggered the illness? 4. What were the initial signs and symptoms pf the condition? 5. Were you bullied? 6. How do you explain the illness to your friends and relatives? 7. Does questioning you about the diseases by any way make you uncomfortable? 8. Do you feel as if you differ from other people? 9. What is the most challenging thing about being disabled? 10. What is your perception of self? 11. Do people treat you differently when they visit than they once did? 12. Do you ever wish you were able to walk again? 13. Are the medication expenses a burden to you? 14. What does it feel like to be on medication on a daily basis? 15. How are you able to be so optimistic about life in your condition? 16. Was the illness by any chance preventable? If it was, how would you advise future generations to protect themselves? Running Head: LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS 1 LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS 9
LOGISTIC REGRESSION AND DISCRIMINANT ANALYSIS

NAME: INSTRUCTOR: DATE:

Part I: Logistic Regression

The age and gender of guests in a nursing home were examined to determine whether they predicted deaths in 2015. Data were collected on gender, age, and whether each guest died. In this case, death is the dependent variable, while age and gender are the independent variables. Since the dependent variable "died" was categorical with two levels, logistic regression analysis was suitable for prediction in this study (Austin & Merlo, 2017). The assumption of a dichotomous dependent variable was met, with "died" taking two values: 0) No, 1) Yes. The assumption of one or more predictor variables was also met: age was quantitative, recording the respective ages of the guests, and gender was categorical with two levels, 0) Female and 1) Male.

Analysis

The collected data were analyzed to examine the relationship between the predictor variables and the binary
dependent variable. A sample of 284 guests was used for this study for ease of analysis and generalization. The overall logistic regression model is given as;

Table 1 shows the total number of participants and the valid sample that was utilized in this study.

Table 1: Logistic regression summary

The analysis showed that there were 144 successes and 140 failures. According to the results, the overall model was statistically significant, with χ2(2) = 82.46, p < 0.001 (Warner, 2020), which implies that we can carry on with the analysis. Gender and age were both statistically significant and contributed to the variation in deaths. Gender was significant with b = 1.96, OR = 7.08, p < 0.05, implying it had an impact on deaths. Age was also significant with b = 0.196, OR = 1.22, p < 0.05. According to the odds ratios, the likelihood of dying is 7.08 times higher in males than in females, and each additional year of age multiplies the odds of dying by 1.22 (Norton et al., 2018). The logistic regression equation that predicts the death of a person given age and gender is given as;

Part II: Discriminant Analysis

Two tests were developed in a firm to determine whether employees would perform well in a given position. A sample of 43 employees was examined. The main aim is to group employees as either successful or unsuccessful by using the given tests. Discriminant analysis suits this case since exclusive grouping was required and the dependent variable was categorical with two groups, 0) Unsuccessful and 1) Successful (Bowerman et al., 2019). The two independent variables used in this study (Test 1 and
Test 2) were quantitative, recording the scores of the employees on the two tests.

Analysis

Discriminant analysis was carried out in SPSS to classify the employees as successful or unsuccessful based on the two tests. Descriptive statistics are shown in table 2.

Table 2: Descriptive statistics (group statistics; unweighted and weighted N were equal)

Group          Test    Mean      Std. Deviation   N
Unsuccessful   Test1   84.7500   4.24109          20
               Test2   79.1000   4.38778          20
Successful     Test1   92.4348   3.47492          23
               Test2   84.7826   6.23740          23
Total          Test1   88.8605   5.43175          43
               Test2   82.1395   6.10847          43

The mean of test 1 in the unsuccessful group was 84.75, while that of test 2 in the unsuccessful group was 79.10. The means of tests 1 and 2 in the successful group were 92.43 and 84.78, respectively. Table 3 shows the importance of the independent variables in the discriminant function used to group the employees.

Table 3: Tests of equality of group means

        Wilks' Lambda   F        df1   df2   Sig.
Test1   .490            42.644   1     41    .000
Test2   .780            11.593   1     41    .001

According to the analysis, both test scores were statistically significant for the discriminant function. Table 4 shows the correlation matrix of the predictor variables.

Table 4: Pooled within-groups correlation matrix

        Test1   Test2
Test1   1.000   .187
Test2   .187    1.000

According to the analysis, the correlation between the scores of test 1 and test 2 was r = 0.19. This is a weak positive relationship, implying the independent variables are not correlated. The assumption of equal group covariance matrices was examined using the Box's M statistic given in table 5 (Ul Hassan et al., 2017).

Table 5: Homogeneity of covariance matrices (Box's M test)

Box's M     5.014
F Approx.   1.582
df1         3
df2         936960.353
Sig.        .191

The test evaluates the null hypothesis of equal population covariance matrices. According to the analysis, the groups did not differ in their covariance matrices, implying that the assumption is not violated and the analysis can continue. According to table 6, one discriminant function was found, given the two-group dependent variable.

Table 6: Canonical discriminant function (eigenvalues)

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          1.161        100.0           100.0          .733

(The first 1 canonical discriminant function was used in the analysis.) The strong positive canonical correlation implies that there was a strong association between the discriminant function and the dependent variable (Uurtio et al., 2017). Table 7 shows the standardized coefficients of the independent variables.

Table 7: Standardized canonical discriminant function coefficients

        Function 1
Test1   .885
Test2   .328

According to the analysis, Test 1 had better discriminating ability than Test 2. This implies that Test 1 is very significant in predicting whether employees will be successful or unsuccessful in the position. Table 8 shows the unstandardized canonical coefficients of the model.

Table 8: Unstandardized canonical discriminant function coefficients

             Function 1
Test1        .230
Test2        .060
(Constant)   -25.380

The discriminant equation becomes:

D = -25.38 + 0.23*Test1 + 0.06*Test2

Table 9 shows the classification of the given variables.
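Before turning to the classification results, the discriminant equation can be sanity-checked in code. This is a hedged sketch, not SPSS output: the cutoff is assumed, for illustration, to be the midpoint of the two group centroids evaluated at the group means from Table 2.

```python
# Sketch: apply the unstandardized discriminant function from Table 8.
def discriminant_score(test1, test2):
    return -25.38 + 0.23 * test1 + 0.06 * test2

# Group centroids, evaluated at the group means from Table 2.
d_unsuccessful = discriminant_score(84.75, 79.10)  # roughly -1.14
d_successful = discriminant_score(92.43, 84.78)    # roughly 0.97

# Assumed cutoff: midpoint of the two centroids (an illustration,
# not taken from the SPSS output).
cutoff = (d_unsuccessful + d_successful) / 2

def classify(test1, test2):
    return "Successful" if discriminant_score(test1, test2) > cutoff else "Unsuccessful"

print(classify(90, 85))
```

A score pair near the successful group's means lands above the midpoint cutoff, while a pair near the unsuccessful group's means lands below it.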
Table 9: Classification results

                                    Predicted group membership
                                    Unsuccessful   Successful   Total
Original          Count
                  Unsuccessful      16             4            20
                  Successful        5              18           23
                  %
                  Unsuccessful      80.0           20.0         100.0
                  Successful        21.7           78.3         100.0
Cross-validated   Count
                  Unsuccessful      16             4            20
                  Successful        5              18           23
                  %
                  Unsuccessful      80.0           20.0         100.0
                  Successful        21.7           78.3         100.0

Notes: 79.1% of original grouped cases were correctly classified. Cross-validation is done only for the cases in the analysis; in cross-validation, each case is classified by the functions derived from all cases other than that case. 79.1% of cross-validated grouped cases were correctly classified.

The analysis showed that 80% of the unsuccessful employees were correctly classified as unsuccessful, while 20% of them were misclassified as successful. Likewise, 78.3% of the successful employees were correctly classified as successful, while 21.7% were misclassified as unsuccessful. Overall, 79.1% of the cases were
correctly classified.

References

Austin, P. C., & Merlo, J. (2017). Intermediate and advanced topics in multilevel logistic regression analysis. Statistics in Medicine, 36(20), 3257-3277.
Warner, R. M. (2012). Applied statistics: From bivariate through multivariate techniques. Sage Publications.
Norton, E. C., Dowd, B. E., & Maciejewski, M. L. (2018). Odds ratios—current best practice and use. JAMA, 320(1), 84-85.
Bowerman, B., Drougas, A. M., Duckworth, A. G., Hummel, R. M., Moniger, K. B., & Schur, P. J. (2019). Business statistics and analytics in practice (9th ed.). McGraw-Hill.
Ul Hassan, E., Zainuddin, Z., & Nordin, S. (2017). A review of financial distress prediction models: Logistic regression and multivariate discriminant analysis. Indian-Pacific Journal of Accounting and Finance, 1(3), 13-23.
Uurtio, V., Monteiro, J. M., Kandola, J., Shawe-Taylor, J., Fernandez-Reyes, D., & Rousu, J. (2017). A tutorial on canonical correlation methods. ACM Computing Surveys (CSUR), 50(6), 1-33.
Introduction

In this paper, I will use the cardUpgrade dataset to run the K nearest neighbor algorithm in JMP and predict the likelihood of a customer upgrading to platinum status. This dataset has three attributes:

a) UpGrade – a categorical column represented in nominal form.
b) Purchases – a numeric column that describes the monetary value of purchases made by each customer.
c) PlatProfile – also categorical; it was used at the time the customers signed up to evaluate whether they fit the profile of a platinum member.

K Nearest Neighbors (KNN)

The K Nearest Neighbors algorithm is a supervised, non-parametric, lazy-learning algorithm applicable to either classification or regression tasks (Okfalisa et al., 2017). First, a supervised machine learning algorithm relies on labeled training input to produce a desired output for unlabeled data. Secondly, non-parametric means that the algorithm makes no assumptions about the structure of the data; any model built on it relies entirely on the training data it is fed. Thirdly, the algorithm is a lazy learner, meaning that it does not generalize beyond the data since training is minimal. In essence, as opposed to most machine learning algorithms, the training data used in a KNN model is also used to test the model. KNN classifies a single data point by comparing it to the points it is closest and most similar to. It
assumes that similar items exist in close proximity. KNN is apt for different ML tasks, including decision-making (as in this task), recommender systems, and image recognition. For this particular task, I will use JMP to apply KNN to a customer dataset. The model will compare the initial customer profile attribute and purchase history to the current upgrade status in order to predict whether other customers would be willing to upgrade. Because this is a classification task, the output should be a discrete value that shows whether a customer will upgrade or not. There is no middle ground, which is why the values are binary, that is, 0 or 1. The model will have two predictors and a label. The output from this model will also be nominal, meaning it represents the upgrade status of an individual. While the output will be in the form of numerical values (0 or 1), these numbers are only representational and have no mathematical meaning (Ghattas et al., 2017).

KNN Scheme in JMP

First, I imported the Excel dataset into JMP. The application treats the two categorical columns as numeric, so they must be set back to nominal in JMP. While this step is rather limited, it is a form of data cleaning when creating a machine learning model. Other data cleaning tasks typically include identifying and removing duplicate records/observations and finding ways to handle missing values. The second step is to conduct an exploratory data analysis (EDA). According to Jebb et al. (2017), this stage helps to decide the algorithm and variables that will be used. EDA also involves data visualization, which gives a cursory insight into the observations. For instance, below is a bubble plot of UpGrade status against purchases. It shows that those who did not upgrade are concentrated on the lower end of purchases, while those who did upgrade made higher purchase volumes. It also shows some outliers, which should be handled.
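The cleaning steps just described (the nominal conversion, duplicate removal, and handling of missing values) happen inside JMP. Purely as an illustration, the same logic can be sketched in plain Python; the column names follow the dataset description, and the rows themselves are invented:

```python
# Hypothetical sketch, in plain Python rather than JMP, of the cleaning
# steps described above. The rows are invented for illustration only.
rows = [
    {"UpGrade": "1", "Purchases": 45.2, "PlatProfile": "1"},
    {"UpGrade": "0", "Purchases": 12.9, "PlatProfile": "0"},
    {"UpGrade": "0", "Purchases": 12.9, "PlatProfile": "0"},  # duplicate record
    {"UpGrade": "1", "Purchases": None, "PlatProfile": "1"},  # missing value
]

def clean(records):
    """Drop rows with missing values, then drop exact duplicates."""
    seen, out = set(), []
    for r in records:
        if any(v is None for v in r.values()):
            continue  # handle missing values by removal
        key = tuple(sorted(r.items()))
        if key in seen:
            continue  # remove duplicate observation
        seen.add(key)
        # UpGrade and PlatProfile stay as strings, i.e. nominal labels,
        # mirroring the numeric-to-nominal fix made in JMP.
        out.append(r)
    return out

cleaned = clean(rows)
print(len(cleaned))  # 2 rows survive
```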
I have chosen to ignore these outliers because there is only one for each class, meaning that they would have minimal impact on the model. Additionally, the dataset contains only forty observations; thus, there is very
little room for cropping out other observations. I also visualized the distribution of the UpGrade column to determine the class percentages. The graph shows that the distribution is almost evenly split (55% non-upgrades and 45% upgrades).

Splitting the Data

For most machine learning modeling tasks, the data should be split into training and testing sets. The train-test approach is used to avoid evaluating the model on the training dataset, as this would result in a biased score. Kuhn & Johnson (2013, p. 67) state that it is pragmatic for the model to be evaluated on data that was not used to either build or fine-tune it. Some algorithms, including KNN, on the other hand, produce the best results when the data is split into three sets, including a validation set. This third set is often seen as another set of test data, but it is used to tune the hyperparameters of the model. However, Touvron et al. (2020) elaborate that splitting the data into two or three sets only produces optimal results when the dataset is large enough that each class that could potentially be observed is included in each set. Since this dataset has only forty observations, it cannot be split three ways without compromising model performance. It is, instead, split into training and test sets at a ratio of 4:1. This gives 32 observations for training and 8 for testing. The KNN algorithm also expects a k value, which is typically selected as the square root of the total number of observations. When there are only two classes to be predicted, the standard practice is to pick an odd number to avoid a tie during majority voting. For this reason, I have picked 7 as the value of k.

Confusion Matrix Interpretation
The confusion or error matrix is used to enhance the understanding of a classification task. Simply reporting the accuracy of the model does not fully capture the performance of the algorithm (Okfalisa et al., 2017). The precision and recall scores are calculated from the output of the confusion matrix. In the matrix, each prediction is compared with the actual value, giving four possibilities:

a) True Positive (TP) – values predicted as positive that are actually positive.
b) True Negative (TN) – values predicted as negative that are actually negative.
c) False Positive (FP) – values predicted as positive that are actually negative.
d) False Negative (FN) – values predicted as negative that are actually positive.

In this case these values would be interpreted as follows:

a) True Positive – customers who upgraded to platinum and whom the model correctly predicted to have upgraded.
b) True Negative – customers who did not upgrade and whom the model correctly predicted not to have upgraded.
c) False Positive – customers predicted to upgrade who did not actually upgrade.
d) False Negative – customers predicted not to upgrade who actually did upgrade.

The matrix for this model is as in the image below. The model produces 2 TPs, 5 TNs, 1 FP, and 0 FNs. Thus, only one of the eight predictions is incorrect, and the model has an accuracy score of 87.5%.

PART 2

Using the Bayes theorem formula as indicated below, we can calculate the probability that a customer who makes purchases above 32.450 will also be likely to upgrade to platinum (Rouder & Morey, 2018).
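This calculation can be reproduced in a few lines. The following is a hedged sketch that plugs in the probability estimates quoted in the text rather than recomputing them from the raw data:

```python
# Hedged sketch of the Bayes'-rule calculation for Part 2, using the
# sample estimates quoted in the text (not recomputed from raw data).
p_a = 0.45            # P(A): customer upgrades to platinum
p_b = 0.525           # P(B): purchases equal to or above 32.450
p_a_given_b = 0.7932  # P(A|B): upgrade given purchases >= 32.450

# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 4))  # 0.9254
```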
Below are the notations for the probabilities:

P(A) – the probability that a customer upgrades to platinum (45%).
P(B) – the probability that a customer makes purchases equal to or above 32.450 (52.5%).
P(A|B) – the probability that a customer upgrades given that they make purchases equal to or above 32.450 (79.32%).
P(B|A) – the probability that a customer makes purchases equal to or above 32.450 given that they have upgraded.

Thus P(B|A) = .9254, that is, 92.54%, which is a significant improvement over 87.5%. Therefore, using the Bayes theorem formula, the model's accuracy has been improved.

References

Bowerman, B., Drougas, A. M., Duckworth, A. G., Hummel, R. M., Moniger, K. B., & Schur, P. J. (2019). Business statistics and analytics in practice (pp. 186–189). McGraw-Hill Education.
Ghattas, B., Michel, P., & Boyer, L. (2017). Clustering nominal data using unsupervised binary decision trees: Comparisons with the state of the art methods. Pattern Recognition, 67, 177–185. https://doi.org/10.1016/j.patcog.2017.01.031
Jebb, A. T., Parrigon, S., & Woo, S. E. (2017). Exploratory data analysis as a foundation of inductive research. Human Resource Management Review, 27(2), 265–276. https://doi.org/10.1016/j.hrmr.2016.08.003
Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (p. 67). Springer Science & Business Media. https://books.google.com/books/about/Applied_Predictive_Modeling.html?id=xYRDAAAAQBAJ&source=kp_book_description
Okfalisa, Gazalba, I., Mustakim, & Reza, N. G. I. (2017). Comparative analysis of k-nearest neighbor and modified k-nearest neighbor algorithm for data classification. 2017 2nd International Conferences on Information Technology,
Information Systems and Electrical Engineering (ICITISEE). https://doi.org/10.1109/icitisee.2017.8285514
Rouder, J. N., & Morey, R. D. (2018). Teaching Bayes' Theorem: Strength of evidence as predictive accuracy. The American Statistician, 73(2), 186–190. https://doi.org/10.1080/00031305.2017.1341334
Touvron, H., Vedaldi, A., Douze, M., & Jégou, H. (2020). Fixing the train-test resolution discrepancy. ArXiv:1906.06423 [cs]. https://arxiv.org/abs/1906.06423

MERGING AND ACQUISITION

NAME: INSTRUCTOR: DATE:

Merging and Acquisition

Introduction
Some of the factors determining merger and acquisition activity in retailing were examined in this paper to aid decision making that has a positive impact on a firm. The characteristics of firms that were targeted for acquisition and of the firms that were willing to make acquisitions were looked into. The growth rates of sales for target firms and bidders were tested to enable management to make acquisition decisions wisely (Christofi et al., 2017).

Growth rate of sales for target firms

The growth rate of sales for firms targeted for acquisition was tested using normal distribution tests. A sample of 25 firms was collected for this study. The mean sales growth rate was 0.16 and the standard deviation was 0.12; these are required in order to test for normally distributed data (D'Agostino, 2017). We perform a t-test since only sample statistics were known (Emmert-Streib & Dehmer, 2019). The research question for this study is: Is there a statistically significant difference between the population mean and the sample mean of the sales growth rate for target firms? The hypothesis testing process was as given below.

Hypotheses

H0: μ = .10. The mean growth rate of sales for target firms is not different from 10%.
Ha: μ > .10. The mean growth rate of sales for target firms exceeds 10%.

Test statistic

A t-test was carried out to examine whether the sales growth rate of 0.16 was statistically significantly different from 0.10. The t statistic is given as:

t = (x̄ − μ0) / (s / √n) = (0.16 − 0.10) / (0.12 / √25) = 2.50

The degrees of freedom are obtained by subtracting 1 from the sample size (Sung & Han, 2018): df = 25 − 1 = 24
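The t statistic and degrees of freedom above can be reproduced in a few lines; this sketch uses the summary statistics reported in the text:

```python
import math

# Sketch of the one-sample t test for the target firms, using the
# summary statistics from the text: x_bar = 0.16, s = 0.12, n = 25, mu0 = 0.10.
x_bar, s, n, mu0 = 0.16, 0.12, 25, 0.10

t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
print(round(t, 2), df)  # 2.5 24
```

Swapping in the bidder statistics (0.12 and 0.09) reproduces the second test's t = 1.111 in the same way.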
Determine the critical values from the t table at α = 0.10, 0.05, 0.01, and 0.001.

For α = 0.10, tα = 1.318, which is less than the calculated t = 2.50, implying we reject the null hypothesis at α = 0.10 and conclude Ha: μ > .10.
For α = 0.05, tα = 1.711, which is less than the calculated t = 2.50, implying we reject the null hypothesis at α = 0.05 and conclude Ha: μ > .10.
For α = 0.01, tα = 2.492, which is less than the calculated t = 2.50, implying we reject the null hypothesis at α = 0.01 and conclude Ha: μ > .10.
For α = 0.001, tα = 3.467, which is greater than the calculated t = 2.50, implying we fail to reject the null hypothesis at α = 0.001 and retain H0: μ = .10.

Decision

There was very strong evidence to reject the null hypothesis at α = 0.01, and we conclude that the mean growth rate of sales for target firms exceeds 10% (Bowerman et al., 2019).

Growth rate of sales for bidders

The growth rate for firms that were willing to make acquisitions was examined in order to reach decisions regarding acquisition. A sample of 25 firms was collected for the analysis necessary for decision making. The mean growth rate of sales was 0.12 and the standard deviation was 0.09. We perform a t-test since only sample statistics were known.

Hypotheses

H0: μ = .10. The mean growth rate of sales for bidders is not different from 10%.
Ha: μ > .10. The mean growth rate of sales for bidders exceeds 10%.

Test statistic
A t-test was carried out to examine whether the sales growth rate of 0.12 was statistically significantly different from 0.10. The t statistic is:

t = (x̄ − μ0) / (s / √n) = (0.12 − 0.10) / (0.09 / √25) = 1.111

The degrees of freedom are obtained by subtracting 1 from the sample size: df = 25 − 1 = 24.

Determine the critical values from the t table at α = 0.10, 0.05, 0.01, and 0.001.

For α = 0.10, tα = 1.318, which is greater than the calculated t = 1.111, implying we fail to reject the null hypothesis at α = 0.10 and retain H0: μ = .10 (Trafimow & Earp, 2017).
For α = 0.05, tα = 1.711, which is greater than the calculated t = 1.111, implying we fail to reject the null hypothesis at α = 0.05 and retain H0: μ = .10.
For α = 0.01, tα = 2.492, which is greater than the calculated t = 1.111, implying we fail to reject the null hypothesis at α = 0.01 and retain H0: μ = .10.
For α = 0.001, tα = 3.467, which is greater than the calculated t = 1.111, implying we fail to reject the null hypothesis at α = 0.001 and retain H0: μ = .10.

Decision

We failed to reject the null hypothesis even at α = 0.10 and conclude that there was no evidence that the mean growth rate of sales for bidders exceeded 10%.

Conclusion

The analysis shows that the growth rate of sales for firms targeted for acquisition exceeded 10%, but the firms that were willing to place bids for acquisition had a sales growth rate that did not exceed 10%. The bidding firms should be analyzed critically to ensure they meet the requirements for acquiring the existing firms that have strong growth curves.

Reference

Christofi, M., Leonidou, E., & Vrontis, D. (2017). Marketing
research on mergers and acquisitions: A systematic review and future directions. International Marketing Review.
D'Agostino, R. B. (2017). Tests for the normal distribution. In Goodness-of-fit techniques (pp. 367-420). Routledge.
Emmert-Streib, F., & Dehmer, M. (2019). Understanding statistical hypothesis testing: The logic of statistical inference. Machine Learning and Knowledge Extraction, 1(3), 945-961.
Bowerman, B., Drougas, A. M., Duckworth, A. G., Hummel, R. M., Moniger, K. B., & Schur, P. J. (2019). Business statistics and analytics in practice (9th ed.). McGraw-Hill. ISBN 9781260187496.
Sung, W. P., & Han, T. Y. (Eds.). (2018, July). Exploration and practice of a new formula for calculating the degree of freedom. In MATEC Web of Conferences (Vol. 175, p. 03018). EDP Sciences.
Trafimow, D., & Earp, B. D. (2017). Null hypothesis significance testing and Type I error: The domain problem. New Ideas in Psychology, 45, 19-27.

Introduction

Prediction is the process of determining the magnitude of the effect of predictors on the response variable. Prediction helps determine the future value of the outcome variable using the predictors or factors included in the study. Multiple linear regression is a statistical test used to assess the relationship between the response variable and more than one predictor variable (Keith, 2019). Also, Cacoullos (2014) argued that discriminant analysis, which is used to test the equality of group centroids, is associated with multivariate analysis of variance since it uses Wilks' lambda as applied in GLM multivariate procedures. In this regard, multiple linear regression and discriminant analysis are the most appropriate statistical techniques for predicting the outcome of a dependent variable at different
values of the predictor variables, for assessing the contribution of the predictors to the outcome, and for assessing the association between the independent variables, respectively (Keith, 2019). Besides, the researcher can assess the variation in the outcome explained by the independent variables included in the model. The test is used to predict the values of the response variable using more than one predictor variable. Discriminant analysis was used to assess the association between the two tests that were developed in a firm to examine employee performance. The study used a sample of 43 employees who were grouped as either successful or unsuccessful and who took the two tests to assess their performance in a given position. Before conducting either multiple linear regression or discriminant analysis, it is good practice to check whether the variables meet the necessary assumptions. For multiple linear regression, the dependent variable must be continuous and approximately normally distributed, while the predictor variables can be either continuous or categorical. For discriminant analysis, the dependent variable must be divided into two or more groups. Besides, for both statistical tests the predictor variables should not be correlated with each other; that is, there should be no multicollinearity (Alin, 2010). This can be assessed using the variance inflation factor, which should be less than ten, or through correlation coefficients between the predictor variables. The current study assessed the relationship between the cost of constructing an LWR plant and the three predictor variables S, N, and CT, and assessed the association between the two tests used to examine employee performance.

Assumption of Regression Analysis

Multicollinearity

Correlation analysis is used to determine the strength and direction of the association between two variables (Puth et al., 2014).
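Since the discussion that follows turns on correlation coefficients, here is a minimal sketch of how such a coefficient is computed from first principles; the x and y values are toy data for illustration, not taken from the study:

```python
import math

# Minimal sketch of the Pearson correlation coefficient on toy data
# (the values are illustrative, not from the study).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 3.4, 4.8, 5.1]

mx, my = sum(x) / len(x), sum(y) / len(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
var_x = sum((a - mx) ** 2 for a in x)
var_y = sum((b - my) ** 2 for b in y)

# r = cov(x, y) / (sd(x) * sd(y)); always in [-1, 1]
r = cov / math.sqrt(var_x * var_y)
print(round(r, 3))
```

On this toy data the two lists rise together, so r comes out close to +1, illustrating a strong positive correlation.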
The Pearson correlation coefficient is used for testing the strength and direction of association between two variables with a continuous level of measurement. However, when the variables have an ordinal level of measurement, we use the
Spearman rank correlation coefficient (Puth et al., 2014). The Pearson correlation coefficient ranges from -1 to 1, with -1 or 1 indicating perfect correlation and zero indicating no correlation.

Table 1: Correlation Analysis

                          S      N
S   Pearson Correlation   1      .193
    Sig. (2-tailed)              .289
    N                     32     32
N   Pearson Correlation   .193   1
    Sig. (2-tailed)       .289
    N                     32     32

The correlation analysis in table 1 above indicates that the correlation between the two predictor variables (S and N) is weakly positive and not significant at the 0.05 level of significance
(r = 0.193, p = 0.289). This suggests that multicollinearity does not exist and the multicollinearity assumption is not violated. Furthermore, based on the analysis, the variance inflation factor (VIF) is less than 10 (Daoud, 2017), also suggesting that the multicollinearity assumption is not violated.

Normality test

Table 2: Tests of Normality

        Kolmogorov-Smirnov(a)        Shapiro-Wilk
        Statistic   df   Sig.        Statistic   df   Sig.
ln_C    .104        32   .200*       .967        32   .414

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

The test of normality of the dependent variable, ln(C), revealed that the normality assumption is not violated after transforming the variable C using the natural log, at the 0.05 level of significance (Shapiro-Wilk = 0.967, p = 0.414).

Results and discussion

Regression Analysis

a. Use residual analysis and R2 to check your model.
Table 3: Model Summary

Model   R      R Square   Adjusted R Square   Std. Error   R Square Change   F Change   df1   df2   Sig. F Change
1       .482   .232       .179                .34240       .232              4.385      2     29    .022
2       .483   .234       .151                .34814       .001              .052       1     28    .822

a. Predictors: (Constant), N, S
b. Predictors: (Constant), N, S, CT

The R-Square of 0.232 indicates that the model can explain about 23.2% of the variation in ln(C); the remaining 76.8% of the variation is explained by variables not included in the model. Besides, the analysis indicated high residuals. The low R-Square and high residuals indicate that the model does not fit the data well (Brown, 2009).

b. State which variables are important in predicting the cost of constructing an LWR plant.

Table 4: Regression Coefficients (Dependent Variable: ln_C)

             B       Std. Error   Beta   t        Sig.   Tolerance   VIF
(Constant)   5.300   .277                19.161   .000
N            .011    .010         .189   1.110    .276   .950        1.053
CT           .028    .125         .038   .227     .822   .978        1.022

The regression analysis displayed in table 4 above indicates that S is a significant contributing factor in predicting ln(C) at the 0.05 level of significance (p = 0.021). However, the predictor variable N does not significantly predict ln(C) at the 0.05 level of significance (p = 0.254). According to the analysis, there is no significant difference in ln(C) between the two levels of the cooling tower (p = 0.822), suggesting that the dummy variable CT does not have a significant effect in predicting ln(C). Therefore, the researcher used the S predictor to predict the cost of constructing an LWR plant and removed N and CT from the model.

c. State a prediction equation that can be used to predict ln(C).

After dropping N and CT from the model, since they do not have a significant effect in predicting ln(C), the prediction equation is given by:
d. Does adding CT improve R2? If so, by what amount?

Based on the analysis displayed in table 3 above, there is no significant improvement in R-Square after adding CT (p = 0.822). Adding CT to the model changes R-Square by 0.001, from 0.232 to 0.234, which is not significantly different from zero.

Correlational Analysis

a. Evaluate the correlation between the two scores and state if there seems to be any association between the two.

Table 5: Pooled Within-Groups Matrices

        Test1   Test2
Test1   1.000   .187
Test2   .187    1.000

The correlation analysis shown in table 5 above indicates that there was a weak positive correlation between the two tests (r = 0.187). This suggests that the two test scores were not correlated.

b. Find the probability of upgrading for each division of the sample by Bayes' theorem.

Given that P(T1) = 43/86; P(T2) = 43/86; P(T1|Up) = 23/46; P(T2|Up) = 23/46:

P(Up|T1) = P(T1|Up)P(Up) ÷ P(T1) = (23/46 × 46/86) ÷ 43/86 = 23/43
P(Up|T2) = P(T2|Up)P(Up) ÷ P(T2) = (23/46 × 46/86) ÷ 43/86 = 23/43

c. Find the probability of upgrading for each division of the
sample by the naïve version of Bayes' theorem.

P(Up|T1) = P(T1|Up)P(Up) ÷ P(T1) = (23/46 × 46/86) ÷ 43/86 = 23/43
P(Up|T2) = P(T2|Up)P(Up) ÷ P(T2) = (23/46 × 46/86) ÷ 43/86 = 23/43

d. Compare your results in parts b and c and explain the difference or indifference based on observed probabilities.

Since there is only one predictor in each sample division, the naïve version and Bayes' theorem give identical probabilities; there is no difference in the observed probabilities. This is because the naïve version applies Bayes' theorem with an assumption of independence between the predictor features, and with a single predictor that assumption changes nothing (Webb, 2010).

Conclusion and Recommendations

The analysis revealed that the model with the three predictors of the cost of constructing an LWR plant does not fit the data well. This suggests that most of the variation in the outcome variable is explained by variables not included in the model. Further analysis indicated that the S predictor had a significant effect in predicting the cost of constructing an LWR plant (C), whereas N and CT did not. Therefore, the researcher should drop the N and CT predictors from the model and use only the S predictor to predict the cost of constructing the LWR plant. The analysis also indicated that the two tests were not associated with each other, but the first test had greater discriminating potential than the second. This suggests that the first test is the better one to use in predicting whether employees will be successful or unsuccessful in the position. Nevertheless, the study did not control for alternative explanations that would affect the validity of the findings. Further study is needed that includes variables that fit the data well to help predict the cost of constructing an LWR plant. In addition, a study that controls all possible
confounding is required to help in the prediction of the outcome variable and in assessing the best test for predicting employee performance.

References

Alin, A. (2010). Multicollinearity. Wiley Interdisciplinary Reviews: Computational Statistics, 2(3), 370-374.
Brown, J. D. (2009). The coefficient of determination.
Cacoullos, T. (Ed.). (2014). Discriminant analysis and applications. Academic Press.
Daoud, J. I. (2017, December). Multicollinearity and regression analysis. In Journal of Physics: Conference Series (Vol. 949, No. 1, p. 012009). IOP Publishing.
Keith, T. Z. (2019). Multiple regression and beyond: An introduction to multiple regression and structural equation modeling. Routledge.
Puth, M. T., Neuhäuser, M., & Ruxton, G. D. (2014). Effective use of Pearson's product–moment correlation coefficient. Animal Behaviour, 93, 183-189.
Webb, G. I. (2010). Naïve Bayes. Encyclopedia of Machine Learning, 15, 713-714.