MACHINE LEARNING MODELS FOR CLASSIFICATION AND PREDICTION ON OSTEOPOROTIC SPINAL FRACTURES
Erennio Iannotta – UP919761
MSc Information Systems
THE JOURNEY PLAN
01 PROBLEM DOMAIN
A brief introduction to the problem of osteoporosis, then to Machine Learning: its purposes, techniques and evaluation methods, focusing on the techniques applied in the project.
02 PROJECT WORK
Presentation of the project development tools and workflow, going into the details of the main steps.
03 CONCLUSIONS AND FUTURE DEVELOPMENTS
A final evaluation with personal conclusions regarding the whole project work, with hints for future developments.
04 QUESTIONS & ANSWERS
Any further questions? Just ask :)
Erennio Iannotta – UP919761 MSc Information Systems – 2018/2019
01 PROBLEM DOMAIN
OSTEOPOROSIS
Osteoporosis is a progressive condition characterized by a reduction in Bone Mineral Density (BMD), which makes bones more fragile.
[Figure: bone density comparison, healthy bone vs. osteoporotic bone]
OSTEOPOROSIS: SPINAL FRACTURES
Consequences:
• Pain
• Difficulty walking
• Paralysis
• Death
MACHINE LEARNING
WHAT IS IT?
NEURAL NETWORKS
A neural network is a machine learning model loosely inspired by the human brain.
Through a learning algorithm, this artificial neural network lets the computer learn by incorporating new data.
Made of:
• Nodes
Two ways of learning:
• Supervised
• Unsupervised
Evaluation criteria:
• ROC-AUC Curve
• Confusion Matrix
ROC-AUC CURVE
True Positive Rate (TPR) vs False Positive Rate (FPR)
True Positive Rate: $\mathrm{TPR} = \dfrac{TP}{TP + FN}$
False Positive Rate: $\mathrm{FPR} = \dfrac{FP}{TN + FP}$
[Figures: ROC curves for three cases: perfect separability, good separability, no separability]
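As a rough illustration of how the curve and its area can be obtained from the two rates, here is a minimal sketch in plain Python. The function names `roc_points` and `auc` and the toy scores are assumptions made for this example; they are not taken from the project code.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping a threshold over the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        # TPR = TP / (TP + FN), FPR = FP / (TN + FP)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 0]
perfect = auc(roc_points(labels, [0.9, 0.8, 0.2, 0.1]))  # classes fully separated
no_skill = auc(roc_points(labels, [0.5, 0.5, 0.5, 0.5]))  # scores carry no information
print(perfect, no_skill)  # 1.0 0.5
```

An area of 1.0 corresponds to the perfect separability case and 0.5 to the no separability case.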
THE CONFUSION MATRIX
                      Predicted Positive (1)   Predicted Negative (0)
Actual Positive (1)   True Positive            False Negative
Actual Negative (0)   False Positive           True Negative
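The four cells can be counted directly from the true and predicted labels. The sketch below is only illustrative: `confusion_counts` is a hypothetical helper and the labels are made up.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Made-up labels: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 2, 'FN': 1, 'FP': 1, 'TN': 4}
```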
MODELS' CONFUSION-MATRIX EVALUATION
FALSE POSITIVE: "You are pregnant!"
FALSE NEGATIVE: "You are not pregnant!"
"Wait… What?!"
02 PROJECT WORK
PROJECT WORKFLOW
DATA PREPROCESSING
Understand the data in order to improve its quality and give a better knowledge base to the Machine Learning algorithms.
MODELING
The application of Machine Learning algorithms to learn and predict new information, based on the previously prepared data.
FINAL EVALUATION
Cost-analysis-based evaluation to find the best of the analyzed methodologies.
DEVELOPMENT TOOLS
R + R-STUDIO
Used for:
• Exploratory Data Analysis during the Data Understanding step
PYTHON + JUPYTER NOTEBOOK
Used for:
• Data preparation
• Modeling and local Evaluation
• Final Evaluation
DATA PREPROCESSING
DATA UNDERSTANDING
A phase of information extraction, meant to find the best insight about the composition of the data in order to manipulate it in the next steps.
DATA PREPARATION
In this phase the data is prepared, following the insight given by the understanding phase, to become the best possible knowledge base.
DATA UNDERSTANDING
DATA DESCRIPTION
Data source:
• UK Biobank (http://www.ukbiobank.ac.uk/)
Data Composition:
• Reduced from 680 to 29 variables
Data Acquisition:
• Supervised
• Answering survey
• Analysis of biological samples (blood, saliva)
All the variables (29):
• Eid
• Sex (gender)
• Age
• Ethnic
• Weight
• Height
• BMI
• Waist
• BMI_Category
• Waist_Category
• VBX
• HIPX
• Menopause
• HRT
• Smoking
• ReumathoidArthrits
• SecondaryOsteoporosis
• Alcohol
• Alcohol24
• VitaminD
• Calcium
• Dose_Walk
• Dose_moderate
• Dose_vigorous
• Dose_pleasure
• Dose_sport
• Dose_exercise
• Dose_lightDIY
• Dose_heavyDIY
Main variables kept after the analysis (17 of 29):
• Sex (gender)
• Age
• Weight
• Height
• VBX
• HIPX
• Menopause
• HRT
• Smoking
• ReumathoidArthrits
• SecondaryOsteoporosis
• Alcohol
• VitaminD
• Calcium
• Dose_Walk
• Dose_moderate
• Dose_vigorous
Final variable list: the same 17 variables as above, with VBX replaced by Class.
MISSING DATA ANALYSIS
Number of patients without missing values:
• 74,708
Number of patients without a spinal fracture:
• 74,554
Number of patients affected by spinal fracture:
• 154
DISTRIBUTION OF MISSING DATA PER FEATURE
[Figures: distribution of missing data per feature]
MISSING DATA ANALYSIS
Number of patients without missing values:
• 153,884
Number of patients without a spinal fracture:
• 153,606
Number of patients affected by spinal fracture:
• 278
[Figure: patient counts, without fracture vs. affected by fracture]
SEPARABILITY ANALYSIS THROUGH T-SNE
t-SNE, which stands for t-distributed Stochastic Neighbor Embedding, is an embedding approach for visualizing high-dimensional datasets in a low-dimensional space (usually of two or three dimensions) through nonlinear dimensionality reduction.
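For readers who want to try the technique, the following is a minimal sketch using scikit-learn's `TSNE`. It is not the project's own analysis: the slides indicate that the exploratory step was done in R, the data here is random noise standing in for the patient table, and the perplexity value is an arbitrary choice.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for the patient table: 60 rows, 16 numeric features.
X = rng.normal(size=(60, 16))

# Project the 16-dimensional rows onto 2 dimensions for plotting.
# perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(embedding.shape)  # (60, 2)
```

Each row of `embedding` can then be drawn as a point and coloured by class to judge visually how well the classes separate.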
t-SNE was applied to the following groups of patients:
• All the Patients together
• Male Patients
• Female Patients
• Female Patients affected by menopause
• Female Patients Not affected by menopause
[Figures: t-SNE plots of all the observations and of sampled observations for each group: all patients, male patients, all female patients, female patients affected by menopause, female patients not affected by menopause]
SEPARABILITY ANALYSIS THROUGH T-SNE: RESULTS
Female patients in menopause condition + all other patients
DATA PREPARATION
The phase in which the dataset becomes a good knowledge base.
1 DATA STANDARDIZATION
2 DATA SPLITTING
The dataset has to be split into 3 different groups of data because of the previous t-SNE considerations:
   1. Complete data
   2. Female patients in menopause / other patients
   3. Male patients / female patients in menopause / female patients not in menopause
Then, for each subset, the data has to be split into the Training, Validation and Test sets.
3 DATA UNDERSAMPLING
The subsets have to be undersampled to reduce the imbalance between the two classes of patients.
The technique used is the random undersampler.
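The first two preparation steps can be sketched as follows. This assumes that standardization means z-scoring each numeric column and that the split fractions are 60/20/20; the slides state neither, and the helper names are hypothetical.

```python
import random
import statistics


def standardize(column):
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(value - mean) / std for value in column]


def split_indices(n_rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the row indices and cut them into training, validation and test sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)
    n_val = int(n_rows * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


ages = [50.0, 60.0, 70.0]  # made-up values
print(standardize(ages))
train, val, test = split_indices(10)
print(len(train), len(val), len(test))  # 6 2 2
```

In practice the mean and standard deviation are usually estimated on the training set only and then reused for the validation and test sets.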
MODELING
• NEURAL NETWORK TRAINING
• NEURAL NETWORK LOCAL EVALUATION
TRAINING ALGORITHM
For each group previously defined:
1. Two base trainings on the original dataset, undersampled with a ratio of 0.5 and of 1.0 between the two classes (for instance, 100 fractured patients and 200 non-fractured ones with a ratio of 0.5; 100 fractured patients and 100 non-fractured ones with a ratio of 1.0).
2. A search for the best Neural Network among those with one or two hidden layers, each composed of a number of neurons ranging from 1 to double the input size, compared using the AUC score.
3. Once the best Neural Network has been found, all the False Negatives produced by this best model are classified and ranked through a formula obtained by Professor Lee in one of his publications.
4. Five new datasets are then created for each of the starting bases: 10, 20, 30, 40 and 50 percent of the ranked patients are added to the datasets, and a new Neural Network is searched for on each of the new datasets, with ratios of 0.5 and 1.0.
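Steps 1 and 2 can be sketched in plain Python as below. This is only an illustration under assumptions: `random_undersample` and `candidate_architectures` are hypothetical helpers, label 1 marks a fractured patient, and the ratio is read as fractured / non-fractured, as in the worked example of step 1.

```python
import random
from itertools import product


def random_undersample(labels, ratio, seed=0):
    """Keep every minority row (label 1) and a random subset of majority rows
    (label 0) so that minority / majority equals the requested ratio."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_majority = min(len(majority), round(len(minority) / ratio))
    kept = random.Random(seed).sample(majority, n_majority)
    return sorted(minority + kept)


def candidate_architectures(input_size):
    """Hidden-layer sizes of every one- and two-layer network in the search space."""
    sizes = range(1, 2 * input_size + 1)
    one_layer = [(n,) for n in sizes]
    two_layers = list(product(sizes, repeat=2))
    return one_layer + two_layers


labels = [1] * 100 + [0] * 1000  # made-up class counts
print(len(random_undersample(labels, 0.5)))  # 300 rows: 100 fractured + 200 not
print(len(random_undersample(labels, 1.0)))  # 200 rows: 100 fractured + 100 not
print(len(candidate_architectures(16)))      # 1056 candidate networks
```

With 16 inputs the search space holds 32 one-layer and 1,024 two-layer candidates; each would be trained and compared on its AUC score.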
TRAINING ALGORITHM STEPS
1. Base Neural Networks training.
2. Search for the best Neural Network among all the possible ones.
3. False Negative evaluation.
4. Dataset enhancing.
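The dataset enhancing step can be pictured with the sketch below. It assumes the False Negative patients have already been ranked (the ranking formula itself is not given in the slides) and that the top share of the ranking is the part that gets added; `enhanced_datasets` is a hypothetical helper.

```python
def enhanced_datasets(base_rows, ranked_false_negatives, percentages=(10, 20, 30, 40, 50)):
    """Build one enlarged dataset per percentage by appending the top share
    of the ranked false-negative rows to the base rows."""
    datasets = {}
    for pct in percentages:
        n_added = len(ranked_false_negatives) * pct // 100
        datasets[pct] = base_rows + ranked_false_negatives[:n_added]
    return datasets


base = list(range(300))                 # made-up row ids of an undersampled base dataset
ranked_fn = list(range(1000, 1020))     # made-up ids of 20 ranked False Negative patients
datasets = enhanced_datasets(base, ranked_fn)
print({pct: len(rows) for pct, rows in datasets.items()})
# {10: 302, 20: 304, 30: 306, 40: 308, 50: 310}
```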
LOCAL EVALUATION
Best Neural Network specifics:
• Trained with a base ratio of 0.5
• Enhanced with 20% of the False Negative patients
• Trained with a local ratio of 0.5
Once the best Neural Network has been found, the Test set is used to get the model ready for the final evaluation.
FINAL EVALUATION: A COST-MATRIX ANALYSIS
COST MATRIX ANALYSIS
Confusion Matrix:
True Positive     False Negative
False Positive    True Negative
Cost Matrix:
£ 0.00            £ 47.00
£ 453.00          £ 0.00
• Data source for costs: Southampton Hospital
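A total cost can be computed by weighting each confusion-matrix cell with its cost. The sketch below assumes that the cost matrix lines up cell by cell with the confusion matrix as laid out on the slide (TP, FN, FP, TN); the outcome counts are made up and `total_cost` is a hypothetical helper.

```python
# Costs per outcome in GBP, read positionally from the slide's cost matrix
# (assumption: the cells line up with TP, FN, FP, TN in that order).
COSTS = {"TP": 0.00, "FN": 47.00, "FP": 453.00, "TN": 0.00}


def total_cost(counts, costs=COSTS):
    """Sum of (number of patients in each cell) x (cost of that cell)."""
    return sum(counts[outcome] * costs[outcome] for outcome in costs)


example_counts = {"TP": 20, "FN": 10, "FP": 100, "TN": 900}  # made-up counts
print(total_cost(example_counts))  # 45770.0
```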
COST MATRIX ANALYSIS: TOTAL COSTS
Group 1: All the Patients
• Total Costs: £ 202,277.00
Group 2: t-SNE Division
• Total Costs: £ 226,965.00
Group 3: Complete Division
• Total Costs: £ 295,974.00
03 CONCLUSIONS AND FUTURE DEVELOPMENTS
THE BEST MODEL
• 2 hidden layers
• 16 neurons on input layer
• 32 neurons on first hidden layer
• 29 neurons on second hidden layer
• 1 neuron on output layer
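To give a sense of the size of this network, the sketch below counts its trainable parameters. It assumes fully connected layers with one bias per neuron, which the slides do not state; `count_parameters` is a hypothetical helper.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# 16 inputs, hidden layers of 32 and 29 neurons, 1 output neuron.
n_parameters = count_parameters([16, 32, 29, 1])
print(n_parameters)  # 1531
```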
THE BEST MODEL: RESULTS
• 13% of the healthy patients will be sent for a precautionary check
• 1 ill patient out of 3 needs a second check for the fracture to be detected
IN CONCLUSION…
• t-SNE gave us better results for single networks, but worse results in the overall cost analysis.
• Even with a misclassification of 1 out of 3, Neural Networks are a good tool to deal with this kind of problem.
FUTURE DEVELOPMENTS
• Deep Learning with Deep Neural Networks could be applied following this same workflow, to find the differences from the Machine Learning networks used here, compare the two kinds of Neural Networks, and find the best approach to this problem.
04 QUESTIONS & ANSWERS
ANY QUESTIONS?
THANK YOU FOR YOUR ATTENTION
Project data available at:
https://github.com/TimeParadox89/MSc-Thesis
Slides available at:
https://www.slideshare.net/ErennioIannotta

 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 

Machine Learning models for classification and prediction on osteoporotic spinal fractures

  • 1. MACHINE LEARNING MODELS FOR CLASSIFICATION AND PREDICTION ON OSTEOPOROTIC SPINAL FRACTURES. Erennio Iannotta – UP919761, MSc Information Systems
  • 2. THE JOURNEY PLAN. 01 PROBLEM DOMAIN: A brief introduction to the problem of osteoporosis, then to Machine Learning, its purposes, techniques and evaluation methods, focusing on the techniques applied in the project. 02 PROJECT WORK: Project development tools and workflow, going into the details of the main steps. 03 CONCLUSIONS AND FUTURE DEVELOPMENTS: A final evaluation with personal conclusions on the whole project work, with hints for future developments. 04 QUESTIONS & ANSWERS: Any further questions? Just ask :) Erennio Iannotta – UP919761 MSc Information Systems – 2018/2019
  • 3. THE JOURNEY PLAN. 01 PROBLEM DOMAIN: A brief introduction to the problem of osteoporosis, then to Machine Learning, its purposes, techniques and evaluation methods, focusing on the techniques applied in the project.
  • 4. PROBLEM DOMAIN: OSTEOPOROSIS. Osteoporosis is a progressive condition characterized by a reduction of Bone Mineral Density (BMD), leading to greater bone fragility. [Figure: bone density comparison, healthy bone vs osteoporotic bone]
  • 5. PROBLEM DOMAIN: OSTEOPOROSIS. Spinal Fractures. Consequences: • Pain • Difficulty walking • Paralysis • Death
  • 6. MACHINE LEARNING: WHAT IS IT?
  • 7. MACHINE LEARNING: NEURAL NETWORKS. A neural network is a type of machine learning model loosely modeled on the human brain: an artificial network that, through a training algorithm, lets the computer learn by incorporating new data. Made of: • Nodes Two ways of learning: • Supervised • Unsupervised Evaluation criteria: • ROC-AUC Curve • Confusion Matrix
  • 8. MACHINE LEARNING: ROC-AUC CURVE. True Positive Rate (TPR) vs False Positive Rate (FPR): perfect separability. True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$. False Positive Rate: $\mathrm{FPR} = \frac{FP}{TN + FP}$.
  • 9. MACHINE LEARNING: ROC-AUC CURVE. True Positive Rate (TPR) vs False Positive Rate (FPR): good separability. True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$. False Positive Rate: $\mathrm{FPR} = \frac{FP}{TN + FP}$.
  • 10. MACHINE LEARNING: ROC-AUC CURVE. True Positive Rate (TPR) vs False Positive Rate (FPR): no separability. True Positive Rate: $\mathrm{TPR} = \frac{TP}{TP + FN}$. False Positive Rate: $\mathrm{FPR} = \frac{FP}{TN + FP}$.
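The three ROC slides can be made concrete with a small sketch. It is not taken from the project code: the function names `roc_points` and `auc` are hypothetical, and it assumes binary labels (1 = fractured, 0 = not fractured) with one numeric score per patient. It sweeps the decision threshold, records (FPR, TPR) at each step using the two formulas above, and integrates the curve with the trapezoidal rule.

```python
from typing import List, Tuple


def roc_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        # TPR = TP / (TP + FN), FPR = FP / (TN + FP)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Perfect separability: every positive scores above every negative.
perfect = auc(roc_points([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))
# No separability: the scores carry no information about the class.
uninformative = auc(roc_points([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))
```

Under these assumptions `perfect` is 1.0 and `uninformative` is 0.5, the two extremes shown on the slides; a model with good separability falls between them.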
  • 11. MACHINE LEARNING: THE CONFUSION MATRIX. Rows are actual values, columns are predicted values. Actual Positive (1): True Positive (predicted Positive), False Negative (predicted Negative). Actual Negative (0): False Positive (predicted Positive), True Negative (predicted Negative).
  • 12. MACHINE LEARNING: MODELS' CONFUSION-MATRIX EVALUATION. FALSE POSITIVE: "You are pregnant!" FALSE NEGATIVE: "You are not pregnant!" "Wait… What?!"
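As an illustration of the four confusion-matrix cells, here is a minimal sketch that counts them from actual and predicted labels and derives the two rates used by the ROC slides. The helper names `confusion_counts` and `rates` and the toy labels are assumptions made for this example, not part of the project.

```python
from collections import Counter
from typing import Dict, List, Tuple


def confusion_counts(actual: List[int], predicted: List[int]) -> Dict[str, int]:
    """Count the four confusion-matrix cells for binary labels (1 = positive)."""
    pairs = Counter(zip(actual, predicted))
    return {
        "TP": pairs[(1, 1)],  # actual positive, predicted positive
        "FN": pairs[(1, 0)],  # actual positive, predicted negative
        "FP": pairs[(0, 1)],  # actual negative, predicted positive
        "TN": pairs[(0, 0)],  # actual negative, predicted negative
    }


def rates(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (TPR, FPR) from the confusion-matrix counts."""
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])
    fpr = counts["FP"] / (counts["TN"] + counts["FP"])
    return tpr, fpr


# Toy data: 3 actual positives, 5 actual negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(actual, predicted)
tpr, fpr = rates(counts)
```

With these toy labels the counts are 2 true positives, 1 false negative, 1 false positive and 4 true negatives, so the TPR is 2/3 and the FPR is 1/5.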
  • 13. THE JOURNEY PLAN. 02 PROJECT WORK: Project development tools and workflow, going into the details of the main steps.
  • 14. PROJECT WORKFLOW. DATA PREPROCESSING: Understand the data in order to improve its quality and give the Machine Learning algorithms a better knowledge base. MODELING: Apply Machine Learning algorithms to learn from the previously prepared data and predict new information. FINAL EVALUATION: Cost-analysis-based evaluation to find the best of the analyzed methodologies.
  • 15. DEVELOPMENT TOOLS. R + R-STUDIO, used for: • Explorative Data Analysis during the Data Understanding step. PYTHON + JUPYTER NOTEBOOK, used for: • Data preparation • Modeling and local Evaluation • Final Evaluation
  • 16. DATA PREPROCESSING. DATA UNDERSTANDING: A phase of information extraction, meant to gain the best insight into the composition of the data before manipulating it in the next steps. DATA PREPARATION: In this phase the data is prepared, following the insight given by the understanding phase, to be the best possible knowledge base.
  • 17. DATA UNDERSTANDING: DATA DESCRIPTION. Data source: • UK Biobank (http://www.ukbiobank.ac.uk/) Data Composition: • Reduced from 680 to 29 variables Data Acquisition: • Supervised • Survey answers • Analysis of biological samples (blood, saliva)
  • 18. DATA UNDERSTANDING: DATA DESCRIPTION. All the variables (29): • Eid • Sex (gender) • Age • Ethnic • Weight • Height • BMI • Waist • BMI_Category • Waist_Category • VBX • HIPX • Menopause • HRT • Smoking • ReumathoidArthrits • SecondaryOsteoporosis • Alcohol • Alcohol24 • VitaminD • Calcium • Dose_Walk • Dose_moderate • Dose_vigorous • Dose_pleasure • Dose_sport • Dose_exercise • Dose_lightDIY • Dose_heavyDIY
  • 19. DATA UNDERSTANDING: DATA DESCRIPTION. All the variables: 29. Main variables kept after the analysis (17): • Sex (gender) • Age • Weight • Height • VBX • HIPX • Menopause • HRT • Smoking • ReumathoidArthrits • SecondaryOsteoporosis • Alcohol • VitaminD • Calcium • Dose_Walk • Dose_moderate • Dose_vigorous
  • 20. DATA UNDERSTANDING: DATA DESCRIPTION. All the variables: 29. Main variables kept after the analysis (17): • Sex (gender) • Age • Weight • Height • Class • HIPX • Menopause • HRT • Smoking • ReumathoidArthrits • SecondaryOsteoporosis • Alcohol • VitaminD • Calcium • Dose_Walk • Dose_moderate • Dose_vigorous
  • 21. DATA UNDERSTANDING: MISSING DATA ANALYSIS. Number of patients without missing values: • 74,708 Number of patients without a spinal fracture: • 74,554 Number of patients affected by spinal fracture: • 154
  • 22. DATA UNDERSTANDING: DISTRIBUTION OF MISSING DATA PER FEATURE
  • 23. DATA UNDERSTANDING: DISTRIBUTION OF MISSING DATA PER FEATURE
  • 24. DATA UNDERSTANDING: DISTRIBUTION OF MISSING DATA PER FEATURE
  • 25. DATA UNDERSTANDING: MISSING DATA ANALYSIS. Number of patients without missing values: • 153,884 Number of patients without a spinal fracture: • 153,606 Number of patients affected by spinal fracture: • 278 [Chart: Without Fracture vs Affected by Fracture]
  • 26. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. t-SNE, which stands for t-distributed Stochastic Neighbor Embedding, is a nonlinear dimensionality reduction technique designed for visualizing high-dimensional datasets in a low-dimensional space (usually of two or three dimensions).
  • 27. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. • All the Patients together • Male Patients • Female Patients • Female Patients affected by menopause • Female Patients not affected by menopause
  • 28. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. ALL PATIENTS. [Plots: All the Observations; Sampled Observations]
  • 29. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. MALE PATIENTS. [Plots: All the Observations; Sampled Observations]
  • 30. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. ALL FEMALE PATIENTS. [Plots: All the Observations; Sampled Observations]
  • 31. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. FEMALE PATIENTS AFFECTED BY MENOPAUSE. [Plots: All the Observations; Sampled Observations]
  • 32. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. FEMALE PATIENTS NOT AFFECTED BY MENOPAUSE. [Plots: All the Observations; Sampled Observations]
  • 33. DATA UNDERSTANDING: SEPARABILITY ANALYSIS THROUGH T-SNE. RESULTS: Female Patients in menopause condition + All other patients
  • 34. DATA PREPARATION: the phase in which the dataset becomes a good knowledge base. 1 DATA STANDARDIZATION. 2 DATA SPLITTING: Following the previous t-SNE considerations, the dataset is split into 3 different groups of data: 1. Complete data 2. Female Patients in Menopause Condition / Other Patients 3. Male Patients / Female Patients in Menopause / Female Patients Not in Menopause. Then, for each subset, the data is split into Training, Validation and Test sets. 3 DATA UNDERSAMPLING: The subsets are undersampled to reduce the imbalance between the two classes of patients. The technique used is random undersampling.
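The standardization and undersampling steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's code: `standardize` and `random_undersample` are hypothetical helpers, label 1 marks a fractured patient, and `ratio` follows the convention given on the training slide (minority count divided by majority count, so 0.5 keeps two non-fractured patients per fractured one).

```python
import random
import statistics
from typing import List, Sequence, Tuple


def standardize(column: Sequence[float]) -> List[float]:
    """Rescale a numeric column to zero mean and unit standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def random_undersample(
    rows: Sequence, labels: Sequence[int], ratio: float, seed: int = 0
) -> Tuple[list, List[int]]:
    """Keep every minority row (label 1) and a random subset of majority rows.

    ratio = minority count / majority count after sampling.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(majority), round(len(minority) / ratio))
    chosen = minority + rng.sample(majority, keep)
    rng.shuffle(chosen)
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: 100 fractured patients among 5,100 (row contents are placeholders).
labels = [1] * 100 + [0] * 5000
rows = list(range(len(labels)))
rows_half, labels_half = random_undersample(rows, labels, ratio=0.5)
rows_even, labels_even = random_undersample(rows, labels, ratio=1.0)
scaled = standardize([1.0, 2.0, 3.0])
```

A ratio of 0.5 leaves 100 fractured and 200 non-fractured rows, and a ratio of 1.0 leaves 100 of each. In general practice the standardization statistics are computed on the Training set only and then reused for the Validation and Test sets, so that no information leaks from the held-out data.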
  • 35. MODELING. NEURAL NETWORK TRAINING. NEURAL NETWORK LOCAL EVALUATION.
  • 36. MODELING: TRAINING ALGORITHM. For each group previously defined: 1. Two base trainings on the original dataset, undersampled with a ratio of 0.5 and of 1.0 between the two classes (for instance, 100 fractured patients and 200 non-fractured ones with a ratio of 0.5; 100 fractured patients and 100 non-fractured ones with a ratio of 1.0). 2. A search for the best Neural Network with one or two hidden layers, each with a number of neurons ranging from 1 to double the input size, compared using the AUC score. 3. Once the best Neural Network has been found, all the False Negatives produced by this best model are classified and ranked through a formula obtained by Professor Lee in one of his publications. 4. Five new datasets are then built for each of the starting bases: 10, 20, 30, 40 and 50 percent of the ranked patients are added to the datasets, and a new Neural Network is searched for each of the new datasets, with ratios of 0.5 and 1.0.
  • 37. MODELING: TRAINING ALGORITHM STEPS. 1. Base Neural Networks training. 2. Search for the best Neural Networks among all the possible ones. 3. False Negative evaluation. 4. Dataset enhancing.
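The search space of step 2 and the dataset enhancing of step 4 can be sketched as below. Several things here are assumptions: that the neuron range of 1 to twice the input size applies to each hidden layer independently, that the search is exhaustive, and that the ranked False Negatives are available as an ordered list (the ranking formula itself is not reproduced). The names `candidate_architectures`, `best_by_auc` and `top_ranked_fraction` are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Architecture = Tuple[int, ...]


def candidate_architectures(input_size: int) -> List[Architecture]:
    """List hidden-layer layouts with one or two layers of 1..2*input_size neurons."""
    sizes = range(1, 2 * input_size + 1)
    one_layer = [(n,) for n in sizes]
    two_layers = list(product(sizes, repeat=2))
    return one_layer + two_layers


def best_by_auc(
    candidates: Sequence[Architecture],
    auc_score: Callable[[Architecture], float],
) -> Architecture:
    """Pick the architecture with the highest validation AUC.

    auc_score is a stand-in for "train this network, then score it".
    """
    return max(candidates, key=auc_score)


def top_ranked_fraction(ranked_patients: Sequence, percent: int) -> list:
    """Return the first `percent` percent of an already ranked list."""
    count = round(len(ranked_patients) * percent / 100)
    return list(ranked_patients[:count])


# With 16 input features, each hidden layer may hold 1 to 32 neurons.
candidates = candidate_architectures(16)
# The five enhancement levels of step 4, applied to 50 placeholder patients.
enhancements = {p: top_ranked_fraction(list(range(50)), p) for p in (10, 20, 30, 40, 50)}
```

For 16 inputs this gives 32 one-layer and 1,024 two-layer layouts, 1,056 candidates in total, each of which would be trained once per undersampling ratio under the exhaustive-search assumption.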
  • 38. MODELING: LOCAL EVALUATION
  • 39. MODELING: LOCAL EVALUATION. Best Neural Network specifics: • Trained with a base ratio of 0.5 • Enhanced with 20% of the False Negative patients • Trained with a local ratio of 0.5. Once the best Neural Network has been found, the Test set is used to get the model ready for the final evaluation.
  • 40. FINAL EVALUATION: A COST-MATRIX ANALYSIS
  • 41. FINAL EVALUATION: COST MATRIX ANALYSIS. Confusion Matrix: True Positive | False Negative | False Positive | True Negative. Cost Matrix: £ 0.00 | £ 47.00 | £ 453.00 | £ 0.00. • Data source for costs: Southampton Hospital
  • 42. FINAL EVALUATION: COST MATRIX ANALYSIS. Group 1 (All the Patients): • Total Costs: £ 202,277.00 Group 2 (t-SNE Division): • Total Costs: £ 226,965.00 Group 3 (Complete Division): • Total Costs: £ 295,974.00
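A cost-matrix evaluation of this kind multiplies each confusion-matrix count by the cost of its cell and sums the results. The sketch below shows that arithmetic only. It assumes the four costs are listed in the same cell order as the confusion matrix on the cost-matrix slide, which the flattened transcript does not confirm, and the patient counts are invented for the example, so the result is unrelated to the totals reported for the three groups.

```python
from typing import Dict

# Assumption: costs read in the same order as the confusion-matrix cells
# (True Positive, False Negative, False Positive, True Negative).
ASSUMED_COSTS: Dict[str, float] = {"TP": 0.00, "FN": 47.00, "FP": 453.00, "TN": 0.00}


def total_cost(counts: Dict[str, int], costs: Dict[str, float]) -> float:
    """Sum count * unit cost over the four confusion-matrix cells."""
    return sum(counts[cell] * costs[cell] for cell in ("TP", "FN", "FP", "TN"))


# Hypothetical test-set outcome, for illustration only.
example_counts = {"TP": 60, "FN": 30, "FP": 100, "TN": 900}
example_total = total_cost(example_counts, ASSUMED_COSTS)
```

With these invented counts the total is 30 × 47 + 100 × 453 = £46,710. Swapping the two non-zero costs in `ASSUMED_COSTS` is enough to evaluate the opposite reading of the cost matrix.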
  • 43. THE JOURNEY PLAN. 03 CONCLUSIONS AND FUTURE DEVELOPMENTS: A final evaluation with personal conclusions on the whole project work, with hints for future developments.
  • 44. CONCLUSIONS AND FUTURE DEVELOPMENTS: THE BEST MODEL. • 2 hidden layers • 16 neurons on input layer • 32 neurons on first hidden layer • 29 neurons on second hidden layer • 1 neuron on output layer
  • 45. CONCLUSIONS AND FUTURE DEVELOPMENTS: THE BEST MODEL. • 13% of the healthy patients are sent for a precautionary check • 1 ill patient out of 3 needs a second check to detect the fracture
  • 46. CONCLUSIONS AND FUTURE DEVELOPMENTS: IN CONCLUSION… • t-SNE gave better results for single networks, but worse ones in the overall cost analysis. • Even with a misclassification rate of 1 out of 3, Neural Networks are a good tool for dealing with this kind of problem.
  • 47. CONCLUSIONS AND FUTURE DEVELOPMENTS: FUTURE DEVELOPMENTS. • Deep Learning with Deep Neural Networks could be applied following this same workflow, to compare the two kinds of Neural Networks and find the best approach to this problem.
  • 48. THE JOURNEY PLAN. 04 QUESTIONS & ANSWERS: Any further questions? Just ask :)
  • 49. ANY QUESTIONS?
  • 50. THANK YOU FOR THE ATTENTION. Project data available at: https://github.com/TimeParadox89/MSc-Thesis Slides available at: https://www.slideshare.net/ErennioIannotta