2. WHO AM I?
Javier Samir Rey
Systems engineer
Machine learning engineer - Direktio
Co-organizer of the Big Data Colombia meetup
jreyro@gmail.com
javier-samir-rey-7104195
github/jasam
5. 3 - GOOD HEALTH AND WELL-BEING
“Ensure healthy lives and promote
well-being for all at all ages.”
● Reproductive, maternal and child
health.
● Communicable, non-communicable
and environmental diseases.
● Health risk reduction and
management.
● Universal health coverage.
6. COMMUNICABLE AND NON-COMMUNICABLE DISEASES
Communicable diseases (CDs)
The incidence of major infectious
diseases: HIV, tuberculosis and
malaria.
Almost half the world’s
population is at risk of malaria.
889,000 people died from
infectious diseases caused largely by
faecal contamination of water.
Non-communicable diseases (NCDs)
40 million global deaths were due to NCDs.
48% of deaths were premature.
75% of premature deaths were caused by
cardiovascular disease,
cancer, diabetes and chronic
respiratory disease.
80% of heart disease, stroke and diabetes
can be prevented.
Source: United Nations
7. NON-COMMUNICABLE DISEASES
Noncommunicable diseases (NCDs), also known as chronic diseases, tend
to be of long duration and are the result of a combination of genetic,
physiological, environmental and behavioural factors.
Detection, screening and treatment of NCDs, as well as palliative care, are
key components of the response to NCDs.
An important way to control NCDs is to focus on reducing the risk factors
associated with these diseases. Low-cost solutions exist for
governments and other stakeholders to reduce the common modifiable
risk factors. Monitoring progress and trends of NCDs and their risk is
important for guiding policy and priorities.
8. IMPACT
Decreased quality
of life of the
human being.
In low-resource settings,
health-care costs for NCDs
quickly drain
household resources.
The exorbitant costs of
NCDs, including often
lengthy and expensive
treatment and loss of
breadwinners, force millions
of people into poverty
annually and stifle development.
9. COLOMBIA NON-COMMUNICABLE DISEASES
Hypertension and
Diabetes Mellitus are
major precursors of
- Ischemic cardiovascular disease
- Cerebrovascular events
- End-stage renal disease
- Death
Prevalence
- Hypertension: 6.5 %
- Diabetes: 1.9 %
20% of the population
consumes 80% of the
resources.
Source: Cuenta de Alto Costo
10. SOME REVIEW
Data is quickly emerging as the greatest asset of the
healthcare industry. The trend in our industry is to drive many
decisions supported by data. It is a walk of maturity with the real gold
nuggets coming in Analytics 3.0 and beyond. This will not be solved
with a product or purchased off the shelf. Big Data needs to be
part of the DNA of an organization.
-- Chris Belmont, MBA
Vice President and Chief Information Officer
MD Anderson Cancer Center
11. WANAMAKER’S QUESTION
“I know that 50% of my
advertising is wasted, I just
don’t know which half.”
The healthcare industry is now awash in data in a way that it has never
been before: biological, gene expression, sensors, DNA sequences, EHRs,
drug and medical data. We have entered a new era in which we can work on
massive datasets, effectively combining them. We can start asking
the important questions, the Wanamaker questions!
The opportunities are huge!
Source: Wikipedia
15. AGILE DATA SCIENCE MANIFESTO
Source: Agile Data Science 2.0
1) Iterate, iterate, iterate: tables, charts, reports, predictions
- roadmap projects.
2) Ship intermediate output. Even failed experiments have
output.
3) Prototype experiments over implementing tasks.
4) Integrate the tyrannical opinion of data in product
management.
16. AGILE DATA SCIENCE MANIFESTO
Source: Agile Data Science 2.0
5) Climb up and down the data-value pyramid as we work.
6) Discover and pursue the critical path to a killer product.
7) Get meta. Describe the process, not just the end-state.
18. BUSINESS UNDERSTANDING
It is one of the most
important concepts of
data science!
1) It is vital to understand the problem to be solved and its context.
2) Often recasting the problem and designing a solution is an
iterative process of discovery.
3) The Business Understanding stage represents a part of the
craft where the analysts’ creativity plays a large role.
19. BUSINESS UNDERSTANDING
It is one of the most
important concepts of
data science!
4) The key to great success is creative problem formulation: how
to cast the business problem as one or more data science
problems (subproblems).
5) What is the expected value?
6) The team’s help is really important; we are not alone.
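The expected value question can be made concrete with a small calculation. The sketch below is only an illustration, assuming a hypothetical outreach campaign in which each contacted patient either responds or does not; the probability, benefit and cost figures are made up for the example and do not come from the talk.

```python
def expected_value(p_response: float, value_response: float, value_no_response: float) -> float:
    """Expected value of acting on one case: probability-weighted average of both outcomes."""
    if not 0.0 <= p_response <= 1.0:
        raise ValueError("p_response must be a probability in [0, 1]")
    return p_response * value_response + (1.0 - p_response) * value_no_response


# Hypothetical figures: contacting a patient costs 5; a response is worth 200.
ev = expected_value(p_response=0.1, value_response=200 - 5, value_no_response=-5)
print(f"Expected value per contacted patient: {ev:.2f}")
```

A positive result suggests the action is worth taking on average; the estimate is only as good as the probability and values fed into it.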
20. BUSINESS UNDERSTANDING - HEALTH
Source: McKinsey & Company
Big data has high
potential in 3 ways:
● Precision medicine
● Diagnose diseases
● Optimize clinical
trials
21. BUSINESS UNDERSTANDING - HEALTH
ACTORS
● Clinicians, domain experts and financial
analysts
● Managers, IT developers, consultants and
vendors
● Policy makers
● Patients and consumers
● Executives and lines-of-business leaders
● Researchers and academia
● Health institutions
● Society
Build your strategy together!
22. BUSINESS UNDERSTANDING - HEALTH
CHRONIC CONDITIONS CARE MODEL
Source: Cuidado das Condições Crônicas na Atenção Primária à Saúde
Inspired by
the pyramid
of Kaiser
Permanente!
23. DATA UNDERSTANDING
1) Solving the business problem is the goal.
2) It is important to understand the strengths and
limitations of the data because rarely is there an exact
match with the problem.
3) Some data will be available virtually for free while others
will require effort to obtain.
4) Cleaning and matching different sources into a single record
can itself be a complicated analytics problem.
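Matching records across sources can be sketched with the standard library alone. This is a minimal illustration, not a production record-linkage method: it assumes each source has a `name` field, uses `difflib.SequenceMatcher` for string similarity, and the records, field names and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(left, right, threshold=0.85):
    """Pair each left record with its most similar right record, if above the threshold."""
    matches = []
    for rec in left:
        best = max(right, key=lambda r: similarity(rec["name"], r["name"]), default=None)
        if best is not None and similarity(rec["name"], best["name"]) >= threshold:
            matches.append((rec["id"], best["id"]))
    return matches


# Hypothetical records from two systems that may describe the same patients.
clinic = [{"id": "C1", "name": "Maria  Fernanda Lopez"}, {"id": "C2", "name": "Juan Perez"}]
claims = [{"id": "K7", "name": "maria fernanda lopes"}, {"id": "K9", "name": "Carlos Ruiz"}]

print(match_records(clinic, claims))
```

Comparing every pair is quadratic in the number of records, so real pipelines usually narrow the candidates first (for example by birth date) before scoring similarity.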
24. DATA UNDERSTANDING
5) Remember all the V’s of data: volume, velocity, variety,
variability, veracity, visualization and value.
6) Design and build a data engineering team that supports
your data requirements.
7) Data governance: DAMA (Data Management Association
International)
25. DATA UNDERSTANDING - HEALTH
SOURCES FOR DATA IN HEALTHCARE
Healthcare data      | Examples
Images               | Radiographic images, MRIs, ultrasounds and nuclear imaging
Un-/semi-structured  | Clinical narratives, physician notes, Level 2, 3 OMICS, summaries, pathology reports
Streaming            | Bedside, remote monitors, implants, fitness bands, smart watches and smart phones
Social media         | Facebook, Twitter, web forums and communities
Structured data      | All claims, EHR, ERP and other information systems
Dark data            | Server logs, application error logs, account information, emails and documents
29. DATA PREPARATION
1) The analytic technologies can be powerful, but they impose
certain requirements on the data they use (data table).
2) Typical examples of data preparation are converting data
to tabular format.
3) Feature engineering.
4) Technology is important but this is not the main point.
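Converting data to tabular format can be sketched as flattening nested records into one row per patient. The nested layout and the field names (`vitals`, `height_cm`, `weight_kg`) are assumptions made for this example; only the sample values are borrowed from the deck's tabular slide.

```python
import csv
import io

# Hypothetical nested records, as they might arrive from a source system.
patients = [
    {"id": 1, "age": 15, "vitals": {"height_cm": 168, "weight_kg": 60}},
    {"id": 2, "age": 20, "vitals": {"height_cm": 185, "weight_kg": 80}},
]


def to_row(patient: dict) -> dict:
    """Flatten one nested record into a single flat row (one column per field)."""
    row = {key: value for key, value in patient.items() if key != "vitals"}
    row.update(patient.get("vitals", {}))
    return row


rows = [to_row(p) for p in patients]

# Write the rows as CSV text to show the resulting table.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "age", "height_cm", "weight_kg"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```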
30. DATA PREPARATION
4) The process of defining the variables. This is one of the main
points at which human creativity, common sense,
and business knowledge come into play.
5) Document your time process.
6) Think about process optimization (Big O).
7) Use little blocks of processing; plan for scale.
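One way to read "little blocks of processing" is to stream the data in fixed-size chunks so that memory use stays bounded as volume grows. The sketch below assumes that interpretation; the helper names and the synthetic readings are hypothetical. The running mean is O(n) in time and holds only one block in memory at a time.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def mean_in_blocks(values, size=1000):
    """Compute a mean in one pass, holding only one block in memory at a time."""
    total, count = 0.0, 0
    for block in chunks(values, size):
        total += sum(block)
        count += len(block)
    return total / count if count else float("nan")


# Hypothetical stream of one million readings; a generator, so never fully in memory.
readings = (i % 100 for i in range(1_000_000))
print(mean_in_blocks(readings, size=10_000))
```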
31. DATA PREPARATION - COMPUTING BOUND
Source: Hadoop in the Enterprise: Architecture
32. DATA PREPARATION - DATA ENGINEERING
Pair review
Modularize
your project
Create professional
projects - world
class solutions
using: versioning,
standards, right
tools, unit tests.
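As an illustration of the unit-test point, here is a minimal sketch using Python's built-in `unittest` module. The function under test, `parse_flag`, is hypothetical: it stands in for any small, modular data preparation step, in this case turning a yes/no column coded as Y/N into a boolean.

```python
import unittest


def parse_flag(value: str) -> bool:
    """Convert a 'Y'/'N' flag to a boolean, rejecting anything else."""
    cleaned = value.strip().upper()
    if cleaned not in {"Y", "N"}:
        raise ValueError(f"unexpected flag: {value!r}")
    return cleaned == "Y"


class ParseFlagTest(unittest.TestCase):
    def test_accepts_both_flags(self):
        self.assertTrue(parse_flag("Y"))
        self.assertFalse(parse_flag("N"))

    def test_ignores_case_and_whitespace(self):
        self.assertTrue(parse_flag(" y "))

    def test_rejects_unknown_values(self):
        with self.assertRaises(ValueError):
            parse_flag("maybe")


# Run the suite without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseFlagTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Small functions like this are easy to review, version and reuse, which is the point of modularizing the project.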
33. DATA PREPARATION - TABULAR FORM - THE GOAL
Primary care
Secondary care
Medication
Other data… a lot of
types
ID age med height weight BMI diet
1 15 Y 168 60 21.3 Y
2 20 Y 185 80 23.4 Y
3 65 N 192 90 24.4 N
4 48 N 172 85 28.7 N
5 45 Y 185 79 23.1 N
6 79 N 182 71 21.4 Y
7 22 Y 186 79 22.8 Y
Feature engineering
Data points: this is
the key (N*M)! After a
very expensive
process
Putting data together
is challenging
Data engineering
N features
M observations
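The BMI column in the table above is an example of an engineered feature: it is derived from height and weight. The sketch below assumes height is recorded in centimetres and weight in kilograms (the slide does not state units, but these reproduce the BMI values shown).

```python
def bmi(height_cm: float, weight_kg: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    height_m = height_cm / 100.0
    return round(weight_kg / height_m ** 2, 1)


# Rows taken from the table above: (ID, height, weight).
rows = [(1, 168, 60), (2, 185, 80), (3, 192, 90), (4, 172, 85)]

for patient_id, height, weight in rows:
    print(patient_id, bmi(height, weight))
```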
34. DATA MODELING
The creation of models from data is known as model induction.
Induction is a term from philosophy that refers to generalizing from
specific cases to general rules (or laws, or truths).
Source: Data Science for Business
Generally speaking, a model is a simplified representation of
reality created to serve a purpose.
In data science, a predictive model is a formula for estimating the
unknown value of interest: the target. The formula could be
mathematical, or it could be a logical statement such as a rule. Often it is
a hybrid of the two.
Many names for the
same things!
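The idea that a predictive model can be a mathematical formula, a logical rule, or a hybrid of the two can be sketched as follows. Everything here is hypothetical: the weights, the cutoff and the rule are made up to show the three shapes a model can take, and have no clinical meaning.

```python
import math


def formula_model(age: float, bmi: float) -> float:
    """Mathematical formula: logistic function of a weighted sum (made-up weights)."""
    score = -6.0 + 0.05 * age + 0.10 * bmi
    return 1.0 / (1.0 + math.exp(-score))


def rule_model(bmi: float, on_medication: bool) -> bool:
    """Logical rule: flag the case when both conditions hold (made-up rule)."""
    return bmi >= 30.0 and not on_medication


def hybrid_model(age: float, bmi: float, on_medication: bool) -> bool:
    """Hybrid: flag if the rule fires or the formula's estimate reaches a cutoff."""
    return rule_model(bmi, on_medication) or formula_model(age, bmi) >= 0.5


# Estimate the target (a hypothetical "high risk" flag) for one patient.
print(round(formula_model(65, 24.4), 3), hybrid_model(65, 24.4, on_medication=False))
```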
35. DATA MODELING - BEST PRACTICES
1) Ask a specific question. Remember you are solving a
business problem, not a math problem.
2) Start simple, start with the minimal set of data.
3) Try many algorithms, but remember that data is more
important than the exact algorithm: improve your features.
4) Treat your data with suspicion, understand its
idiosyncrasies.
5) Normalize your inputs.
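One common way to normalize inputs is z-score standardization, which rescales each column to zero mean and unit standard deviation. This is a sketch of that one technique, not necessarily the normalization the speaker had in mind; the values reuse the age and height columns from the tabular example earlier in the deck.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no signal
    return [(v - mu) / sigma for v in values]


# Ages and heights from the tabular example earlier in the deck.
ages = [15, 20, 65, 48, 45, 79, 22]
heights = [168, 185, 192, 172, 185, 182, 186]

print([round(z, 2) for z in standardize(ages)])
print([round(z, 2) for z in standardize(heights)])
```

After standardization both columns are on the same scale, so neither dominates a distance-based or gradient-based algorithm simply because of its units.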
36. DATA MODELING - BEST PRACTICES
● Validate your model (validation set and clinical validation)
● Build a benchmark first; don’t be afraid to launch your product
without ML
● Set up a feedback loop
● Healthcare doesn’t trust black boxes
● Correlation is not causation
● Monitor ongoing performance
● Don’t be fooled by “accuracy”
● Labeled data
● Use medical support libraries, e.g. PubMed, Cochrane, American
Heart Association, Diabetes UK and so on.
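Why accuracy can fool you is easy to show on imbalanced data. In the sketch below, a hypothetical cohort has 2 sick patients out of 100, and a baseline that never predicts the condition still scores high accuracy while finding no one. The cohort is made up for illustration.

```python
def confusion(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = has the condition)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual positives that the model finds."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical cohort: 2 of 100 patients have the condition.
y_true = [1] * 2 + [0] * 98
always_negative = [0] * 100  # baseline that never predicts the condition

print("accuracy:", accuracy(y_true, always_negative))
print("recall:  ", recall(y_true, always_negative))
```

The same baseline doubles as the benchmark attempt: any model worth shipping should beat it on the metric that matters clinically, not just on accuracy.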
42. DATA MODELING - USE CASE - ELSEVIER
RISK PREDICTIONS: WHICH DISEASES WILL YOU
LIKELY GET WITHIN 4 YEARS
1600+ models
integrated into the
same
information
system.
Source: Elsevier Medical Graph - slideshare
43. DATA MODELING - USE CASE - ELSEVIER
Source: Elsevier Medical Graph - slideshare
Physicians want
explanations.
Otherwise they
will not trust
the predictions.
Typical best-in-class
classification methods
(deep learning, random
forest) do not yet deliver
explainable models.
In practice, you
need to save the
users’ processing
time, not add to it.
Visualization is
key.
Building a classification
model using open source
tools is simple. Scaling
input data size is also
manageable. Building
1000+ models is complex.
Open source tools have
failures (as have proprietary
tools). Debugging can be a
nightmare.
Implementing, applying
and maintaining a
security framework to
keep personal health
information secure is a
substantial effort.