Prediction and Analysis of Mood Disorders Based On Physical and Social Health Indicators
1. Prediction and Analysis
of Mood Disorders Based
On Physical and Social
Health Indicators
Findings from the CCHS-2014 survey
IMAT5314 PROJECT 2019
P16233152
2. Problem
Statement
Although anxiety and mood
disorders are commonly found in
many communities, there is little
empirical evidence of one single
concrete cause of these illnesses.
In fact, mental illnesses typically
have multiple causes that can
stem from factors such as
individual emotional experiences,
state of living, addiction and/or
upbringing.
How can we understand which
factors influence, cause or
deepen anxiety and mood
disorders?
3. Solution Scope
By extracting results from the CCHS
(Canadian Community Health Survey), it
was possible to perform an exploratory
data analysis on physical, social and mental
health factors.
The extracted data showed an opportunity
to apply machine learning techniques to
attempt to uncover patterns and to
attempt to understand the relationships
between physical and social health factors
and mood and anxiety disorders.
This research focuses on the underlying
influences of physical health (such as onset
of physical illnesses, level of exercise,
smoking habits) and social factors
(including sense of belonging, individual
income) and their relationship with anxiety
and mood disorders.
4. Project Goal and Objectives
Goal:
To understand the relationships between physical health and
social aspects and whether they coincide with anxiety or mood
disorders.
Objectives:
1. To achieve a deeper general understanding of the physical and
social factors that potentially influence or are influenced by
mental health
2. To understand identified relationships and patterns from a
technical perspective in the data
3. To transform the data using techniques so that it is a suitable
input for the models being used.
4. To create the basis for a machine learning model that can be
used to predict the onset of mental disease and to ultimately
answer the question of whether mental illness can be predicted
based on a set of physical and social factors
6. Literature Review: Short Summary of Findings:
Relationships between physical/social health
aspects and mental illnesses
Relationships have been previously researched and observed between:
Poverty, social cohesiveness, identity, self-esteem <-> Anxiety, Depression
Anxiety (Stress, Heart rhythm) <-> Blood pressure
Alcohol <-> Anxiety, Depression
Anxiety <-> Smoking
Diabetes / Cancer (low number of cases) <-> Anxiety, Depression
Arthritis (RA) / Asthma <-> Depression
Physical Activity <-> Anxiety
Note: All references are included in the Project Documentation
8. Literature Review: Short Summary of
Findings: Technical understanding of
analysis methods suitable for the dataset
Exploring case studies from existing research projects that applied
machine learning to health data and digital health:
Many used ML methods to predict the onset of mental illnesses using
different techniques such as random forests, neural networks and
naïve Bayes
Data Analytics and ML go hand in hand, where DA attempts to construct
hypotheses through investigation, and ML attempts to test these
hypotheses through training and testing data
Understanding Machine Learning methods:
Classification VS Regression
Supervised VS Unsupervised
Random Forests, Regression, Ensemble Methods, Pattern Mining
Class imbalance
Feature selection and reduction
Performance Metrics (confusion matrix, AUC, F score)
9. Methodology
First, conduct exploratory pattern analysis of the data and
extract meaningful findings, thus addressing research objectives
1 and 2.
Transform, resample and normalise the data to address research
objective 3, which is also a prerequisite for objective 4.
Apply machine learning models in order to build a prototype for
a mood disorder prediction model, thus addressing objective 4.
Tools:
Python & various libraries for Machine Learning
Objectives
revisited:
1. Deeper general understanding
2. Understand patterns from a
technical perspective
3. Transform the data
4. Apply/Configure a machine
learning model to predict the
onset of mental disease
10. Methodology cont.
Data pre-processing and sampling (DA)
Splitting between nominal and continuous
Deriving count, mean, ranges, standard deviation and
distribution
Correlation analysis to determine whether features would
need to be stripped
Comparative analysis
Tools:
Data exploration was done using Microsoft Excel
Python was used with packages for statistical analysis
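The descriptive and correlation steps listed on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the project's actual code; the column names and values are made up and are not taken from the CCHS data.

```python
import math
import statistics

def describe(values):
    """Return count, mean, range and sample standard deviation for a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
    }

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy columns, for illustration only.
exercise_hours = [0, 1, 2, 3, 4, 5]
stress_score = [9, 7, 8, 5, 4, 3]

summary = describe(exercise_hours)
r = pearson(exercise_hours, stress_score)
print(summary)
print(f"Pearson r = {r:.3f}")
```

A pair of features with a coefficient close to 1 or -1 carries largely the same information, which is the usual reason for stripping one of them before modelling.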
11. Methodology cont.
Data pre-processing and sampling (ML)
Normalisation
Splitting between test and training data 70%/30%
Tackling class imbalance with SMOTE
Applying classification models
Measuring performance
Selecting the top-performing models based on the highest scores
Tools:
Data exploration was done using Microsoft Excel
Python & various libraries for Machine Learning
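The pre-processing steps on this slide (normalisation, a 70%/30% split, and oversampling the minority class) could look roughly like the sketch below. It is written with the standard library only so that it is self-contained; `smote_like` is a simplified, hypothetical stand-in for the SMOTE interpolation idea, not the library implementation used in the project, and the data is randomly generated.

```python
import random

def min_max_normalise(rows):
    """Rescale every feature column to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle the row indices and hold out `test_fraction` of the rows for testing."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    pick = lambda idx: ([rows[i] for i in idx], [labels[i] for i in idx])
    return pick(train_idx), pick(test_idx)

def smote_like(minority_rows, n_new, k=2, seed=0):
    """Create synthetic minority rows by interpolating towards a near neighbour.

    Needs at least two minority rows. Simplified sketch of the SMOTE idea.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (r for r in minority_rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between base and neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy data: 20 rows, 2 features, 6 positive cases (class imbalance).
data_rng = random.Random(42)
rows = [[data_rng.uniform(20, 80), data_rng.uniform(0, 10)] for _ in range(20)]
labels = [1 if i < 6 else 0 for i in range(20)]

scaled = min_max_normalise(rows)
(train_x, train_y), (test_x, test_y) = train_test_split(scaled, labels)

# Oversample the training set only, so the test set keeps the real class balance.
minority = [x for x, y in zip(train_x, train_y) if y == 1]
if len(minority) >= 2:
    extra = smote_like(minority, n_new=train_y.count(0) - len(minority))
    train_x += extra
    train_y += [1] * len(extra)
```

Resampling is applied after the split in this sketch so that no synthetic rows leak into the test set, which would otherwise inflate the reported scores.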
12. Data Analysis Findings
The comparative analysis exercise did in fact confirm that mental illness was, in
general, more present in those suffering from physical illness, although the
differences were not significant.
The complexity of the relationships between the variables, which cannot be
explained directly with correlation alone, is a further reason why machine
learning is a suitable candidate to analyse this sort of data.
Some of the results:
Alcohol drinkers experience more mood disorders and anxiety, although the difference is
minimal.
Out of the segment of smokers that smoke at least 31 cigarettes a day, 25.52% are classified
as suffering from a mood disorder, an increase of 16.86% from the general sample.
People who engage in regular physical exercise show a lower proportion of diagnosed
anxiety (1.42% lower) and mood disorder (2.69% lower)
Association results demonstrated that the strongest physical illness links with mental
illnesses were arthritis, followed by high blood pressure and asthma.
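The comparative analysis behind these figures amounts to comparing the share of diagnosed participants in a segment with the share in the whole sample. A minimal sketch follows, assuming hypothetical field names and toy records (not CCHS values); it reports the gap both in percentage points and as a relative change, since the two are easy to confuse.

```python
def prevalence(records, condition, group=None):
    """Share of records where `condition` is true, optionally within rows where `group` is true."""
    subset = [r for r in records if group is None or r[group]]
    return sum(r[condition] for r in subset) / len(subset)

# Hypothetical toy records: 4 heavy smokers (2 diagnosed), 6 others (1 diagnosed).
records = (
    [{"heavy_smoker": True, "mood_disorder": True}] * 2
    + [{"heavy_smoker": True, "mood_disorder": False}] * 2
    + [{"heavy_smoker": False, "mood_disorder": True}] * 1
    + [{"heavy_smoker": False, "mood_disorder": False}] * 5
)

overall = prevalence(records, "mood_disorder")
smokers = prevalence(records, "mood_disorder", group="heavy_smoker")

point_difference = (smokers - overall) * 100        # percentage points
relative_change = (smokers - overall) / overall * 100  # percent of the baseline

print(f"overall {overall:.0%}, heavy smokers {smokers:.0%}")
print(f"difference: {point_difference:.1f} points ({relative_change:.1f}% relative)")
```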
13. Machine
Learning Results
SVMs were observed to
be the most effective
predictor for mood
disorders and anxiety
in terms of accuracy
14. Conclusions & Lessons
Learnt
• Data availability is the true bottleneck for DA and ML projects
• Pre-processing was possibly the most important step in this project.
Throughout the first phases, lots of experimentation and research was
done in order to fine tune and prepare the data in the best way possible
• Machine Learning can prove to be a reliable predictor for classification
problems and can be applied in many ways as long as the data is
available
• Tools and learning resources are prevalent, updates are frequent,
techniques are evolving continuously
15. Future steps for mental illness classification?
Other data types could be
explored, such as images stemming
from brain scans (MRI, PET) that
show brain activity for individuals
experiencing mood disorders or
anxiety.
Better infrastructure allows for
heavier algorithms, which means
that there can be better results
Eventually, a more refined version
of this model can be used as a
back-end structure to an app or
website that raises awareness for
individuals to be able to gauge how
their lifestyle, habits and physical
factors could potentially affect
their mental health.
Editor's notes
Thank you for taking the time to listen to this presentation. I would like to describe the project which was done for my masters course, based on the prediction and analysis of mood disorders based on physical and social health indicators.
This project was done in order to utilise the data mining and machine learning methods that were taught during the course
So the problem statement is based on the issue that anxiety and mood disorders are very common, although their cause is not easily identifiable from existing research. Mental illnesses can in fact stem from a number of different sources, such as emotional, physical and social factors. So how can we understand which factors influence, cause or deepen anxiety and mood disorders?
Anxiety and mood disorders in particular are found in many communities, and whilst the causes of these disorders are often researched, there is little empirical evidence of one single concrete cause of these illnesses, since “anxiety” and “mood disorders” are also used as umbrella terms for more specialised disorders. Research (also discussed in Chapter 2) shows that mental illnesses typically have multiple triggers and causes that stem from factors such as individual emotional experiences, state of living, addiction and upbringing. This research focuses on the underlying influences of physical health (such as onset of physical illnesses, level of exercise, smoking habits) and social factors (including sense of belonging, individual income) and their role in causing or deepening anxiety and mood disorders. By extracting results from the CCHS (Canadian Community Health Survey), it was possible to perform exploratory data analysis on physical, social and mental health factors. The extracted data showed an opportunity to apply advanced machine learning techniques to attempt to uncover patterns and to understand the relationships between physical and social health factors and mood and anxiety disorders.
After going through the available data repositories, I came across survey data from CCHS which showed the collected data from various participants including generic data on physical and social wellbeing, and mental illnesses.
Particularly, the survey isolated anxiety and mood disorders.
Having two variables available as target variables presented an opportunity to use DA to perform exploratory analysis and ML as a form of prediction, to understand whether, and which, variables can affect mental illnesses.
The first section of the literature review in Chapter 2 goes over the theoretical principles and concepts of the machine learning and data mining models and metrics that were used to derive the results of this thesis. Next, mood and anxiety disorders, their researched causes and links with health factors are reviewed. Furthermore, an analysis of the state-of-the-art academic papers that have analysed relationships between physical/social health aspects and mental illnesses is described in the last section of the review.
The thesis then expands on the methodology and models used, and on the data and its structure (Chapter 3), then reports the results of each model tested (Chapters 4, 5). Finally, the results of each model are illustrated to compare their performance based on different indicators (Chapter 6).
Through the literature review and analysis of research online, various links can be found between certain factors and mental illnesses.
Moving on to the technical side of things. This diagram represents, at a high level, the approach that data analytics takes to solve problems.
Analytics first begins with describing a problem area, understanding why it happened, understanding how or when it will happen again, and ultimately taking action for it to happen more or less. (in our case, less).
Diving deeper into the technical side of things, the literature review of the project goes into various topics of ML that were applied in this project. Existing case studies on disease prediction were analysed from the papers available in the library, and the models used were noted and studied in further detail.
It was also useful to understand certain ML concepts, such as the fact that ML essentially splits into solving two kinds of problem: classification (predicting categorical labels, which is what was used in this project) and regression (predicting numerical values).
Supervised vs unsupervised ML, which dictates whether variables are labelled prior to training the model.
Different ML models such as RF, regression and pattern mining.
Understanding the challenges with class imbalance, which was the major challenge that I found during this project.
Applying feature selection and reduction techniques such as PCA (principal component analysis)
Then understanding and assessing performance using the right metrics
Read up. ^^^^^
For the purpose of this study and for testing the hypothesis, the arguments for and against conducting a quantitative analysis for the context of this study are as follows:
The aim is to classify features and use statistical models to explain observations
The outcome is known by the researcher before the study
Data is available and collected. It can be transformed to be used for statistical models
Measurement and analysis of target concepts are part of the objective
Researcher is objectively separated from the subject matter
Quantitative data is efficient for hypothesis testing although contextual detail may be missed.
Adapted from: Miles (1994, p. 40), Qualitative Data Analysis, Table 3.1: Features of Qualitative & Quantitative Research [online]. Available at: http://wilderdom.com/research/QualitativeVersusQuantitativeResearch.html (Accessed: 28 January 2019).
More specifically, the data was preprocessed by splitting the variables, deriving statistical information about them and performing correlation and comparative analysis.
For the machine learning stage, the data was preprocessed using normalisation to make sure that the data points were on the same scale. The data was split 70/30 into training and test sets. SMOTE, which incorporates a blend of up- and down-sampling, was used to deal with the class imbalance issue. The different classification models, which were tuned beforehand and tested with different combinations of parameters, were then applied.
The performance was then measured and reported. Then these metrics were compared to select the top performing models.
Moving on to the findings and results now,
There is no evidence from this study of a strong correlation between physical and mental attributes, since only weak (< 0.3) correlation coefficients were revealed during the data analysis. The comparative analysis exercise did in fact confirm that mental illness was, in general, more present in those suffering from physical illness, having detrimental health habits or reporting low social satisfaction, although the differences were not significant when looking at variables individually. The question that developed was then whether a combination of multiple factors at once could have a larger effect on mental illness (for instance alcohol abuse plus low social connectedness).
some of the results:…
Despite the findings in the literature review, the data that was investigated during this study showed weak correlations between physical and social attributes and mood disorders and anxiety. However, one could question whether multiple physical illnesses and low social health existing simultaneously for one participant could together greatly impact the outcome of mental illness, since several of these factors display some positive correlation. In fact, during the principal component analysis detailed in section 3.3, 31 factors were retained, meaning that there were at least 31 variables that explain the variation of data for CCC_280 and CCC_290. The complexity of the relationships between the variables, which cannot be explained directly with correlation alone, is a further reason why machine learning is a suitable candidate to analyse this sort of data. In terms of the hypothesis, this insight introduces the possibility that multiple inputs (not one specific physical or social variable) could in fact predict, to a certain level of accuracy, the onset of mental illness.
CCC_280: mood disorder
CCC_290: anxiety
SVMs were observed to be the most effective predictor for mood and anxiety disorders in terms of accuracy, with ridge, logistic regression and voting classifiers also performing quite well.
High precision means that an algorithm returned substantially more relevant results than irrelevant ones, while high recall means that an algorithm returned most of the relevant results.
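Recall: $\text{Recall} = \frac{TP}{TP + FN}$
Precision: $\text{Precision} = \frac{TP}{TP + FP}$
F score: the harmonic mean of precision and recall, $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
Cohen's kappa: $\kappa = \frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed agreement and $p_e$ is the expected (chance) agreement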
When a model outputs a predicted value, there is a chance that the value was correctly guessed based on chance. Cohen’s kappa takes this random factor into consideration and measures the observed accuracy: that is, the correctly guessed observations against the expected accuracy that could result out of chance.
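As a worked example of the precision, recall, F score and Cohen's kappa definitions in these notes, the sketch below computes all four from the four cells of a binary confusion matrix. The counts are invented for illustration and are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1 and Cohen's kappa from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Observed agreement: the share of observations the model got right.
    observed = (tp + tn) / total
    # Expected agreement: how often model and truth would agree by chance,
    # given how often each of them says "positive" and "negative".
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (observed - expected) / (1 - expected)

    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the plain accuracy is 0.85, but kappa is noticeably lower because a large part of that agreement would be expected by chance on an imbalanced sample.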
AUC for ROC
Based on the confusion matrix results, the true positive rate can be plotted against the false positive rate (y and x axis, respectively) across classification thresholds, and the area under this curve measured. A greater area under the curve indicates that the model separates the two classes better.
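The area under the ROC curve can be computed by sweeping the decision threshold over the model's scores and integrating with the trapezoidal rule. The following is a minimal sketch with made-up labels and scores; it assumes both classes are present in `y_true`.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points, one per distinct threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical labels and model scores for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

area = auc(roc_points(y_true, scores))
print(f"AUC = {area:.3f}")
```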
Confusion matrix:
In the context of any binary classification problem, the actual data (test set) is compared to the ‘guessed’ observations that are output by the model, based on some threshold that determines the classified observation.
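To make the role of the threshold concrete, the sketch below turns model scores into class predictions and tallies them against the actual labels. The labels, scores and threshold values are invented for illustration.

```python
def confusion_matrix(y_true, scores, threshold=0.5):
    """Count true/false positives and negatives after thresholding the scores."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["tp"] += 1
        elif predicted == 1 and actual == 0:
            counts["fp"] += 1
        elif predicted == 0 and actual == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Hypothetical test-set labels and model scores.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.1]

default_cm = confusion_matrix(y_true, scores, threshold=0.5)
lenient_cm = confusion_matrix(y_true, scores, threshold=0.3)
print(default_cm)
print(lenient_cm)
```

Lowering the threshold here recovers the missed positive case without adding a false positive, but on real data the two usually trade off against each other.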
My laptop can stay on for days without crashing during Gridsearch!
The CCHS is conducted on an annual basis, and this ML model could easily be used by entities to understand underlying factors of anxiety and mood disorders.
Other researchers could further tweak this model to develop it further, refine the used variables and continually evolve it to give better results.