2. WHO AM I?
Javier Samir Rey Rodríguez
● Systems engineer.
● Postgraduate studies in software construction.
● MOOC learner (40 courses completed).
● Machine learning engineer - Direktio.
● Co-organizer of the ML and Data Science meetup in Bogotá (COL).
● Currently pursuing a master's degree in data science (UOC).
jreyro@gmail.com
javier-samir-rey-7104195
github/jasam
jasam
5. “If there is no health, there is
nothing.”
Dr. Tedros Adhanom Ghebreyesus
Director-general of the World Health
Organization.
6. “Information is the oil of the 21st
century, and analytics is the
combustion engine.”
Peter Sondergaard, Senior Vice
President and Global Head of
Research for Gartner.
8. 3 - GOOD HEALTH AND WELL-BEING
“Ensure healthy lives and promote
well-being for all at all ages.”
● Reproductive, maternal and child
health.
● Communicable, non-communicable
and environmental diseases.
● Health risk reduction and
management.
● Universal health coverage.
9. COMMUNICABLE AND NON-COMMUNICABLE DISEASES
Communicable diseases (CDs)
● The incidence of major infectious diseases: HIV, tuberculosis and malaria.
● Almost half the world’s population is at risk of malaria.
● 889,000 people died from infectious diseases caused largely by faecal contamination of water.
Non-communicable diseases (NCDs)
● 40 million global deaths were due to NCDs.
● 48% of deaths were premature.
● 75% of premature deaths were caused by cardiovascular disease, cancer, diabetes and chronic respiratory disease.
● 80% of heart disease, stroke and diabetes can be prevented.
Source: United Nations
16. SPAIN - NON-COMMUNICABLE DISEASES
Hypertension and diabetes mellitus are major precursors of:
- Ischemic cardiovascular disease
- Cerebrovascular events
- End-stage renal disease
- Death
Prevalence:
- Hypertension: 25%
- Diabetes: 7.17%
~20% of the population consumes 80% of the resources.
Source: WHO
18. “I know that 50% of my
advertising is wasted, I just
don’t know which half.”
WANAMAKER’S QUESTION
Source: Wikipedia
19. “Health care is the most
difficult, chaotic, and
complex industry to
manage today.”
Peter Drucker, 2002
21. THE POTENTIAL OF AI IN HEALTHCARE
● Expand capacity to generate new knowledge:
- the effectiveness of treatments.
- the prediction of outcomes.
● Knowledge dissemination.
● Using AI (analytics) to combine EHR and genomic data to translate personalized medicine into clinical practice.
● Deliver information directly to patients and increase patient participation in their health care.
23. AGILE AI MANIFESTO
Source: Agile Data Science 2.0
1) Iterate, iterate, iterate: tables, charts, reports, predictions - roadmap projects.
2) Ship intermediate output. Even failed experiments have output.
3) Prototype experiments over implementing tasks.
4) Integrate the tyrannical opinion of data in product management.
24. AGILE DATA SCIENCE MANIFESTO
Source: Agile Data Science 2.0
5) Climb up and down the data-value pyramid as we work.
6) Discover and pursue the critical path to a killer product.
7) Get meta. Describe the process, not just the end-state.
27. BUSINESS UNDERSTANDING
It is one of the most important concepts of data science!
1) It is vital to understand the problem to be solved and its context.
2) Often recasting the problem and designing a solution is an iterative process of discovery.
3) The Business Understanding stage represents a part of the craft where the analysts’ creativity plays a large role.
28. BUSINESS UNDERSTANDING
It is one of the most important concepts of data science!
4) The key to great success is creative problem formulation: how to cast the business problem as one or more data science problems (subproblems).
5) What is the expected value?
6) The team’s help is really important; we are not alone.
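The expected-value question on this slide can be made concrete with a small worked sketch. The scenario, probabilities and monetary values below are hypothetical placeholders, not figures from the talk; the function only weights each outcome's value by its probability and sums the results.

```python
def expected_value(outcomes):
    """Return sum(p * v) over (probability, value) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical outreach program: with probability 0.10 a contacted patient
# benefits (value 900), otherwise the contact only costs money (value -50).
outreach = [(0.10, 900.0), (0.90, -50.0)]
ev = expected_value(outreach)  # roughly 45 per contacted patient
```

A positive expected value per case suggests the action is worth taking on average, provided the probabilities come from a validated model.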
29. BUSINESS UNDERSTANDING - HEALTH ACTORS
● Clinicians, domain experts and financial
analysts
● Managers, IT developers, consultants and
vendors
● Policy makers
● Patients and consumers
● Executives and lines-of-business leaders
● Researchers and academia
● Health institutions
● Society
Build your strategy together!
31. DAMA - DMBOK GUIDE
Source: "The DAMA Guide to the Data Management Body of Knowledge" (DAMA-DMBOK Guide)
Culture and
empowerment
Education
32. DATA UNDERSTANDING - HEALTH
SOURCES FOR DATA IN HEALTHCARE
Healthcare data | Examples
Images | Radiographic images, MRIs, ultrasounds and nuclear imaging
Un-/semi-structured | Clinical narratives, physician notes, Level 2,3 OMICS, summaries, pathology reports
Streaming | Bedside, remote monitors, implants, fitness bands, smart watches and smartphones
Social media | Facebook, Twitter, web forums and communities
Structured data | All claims, EHR, ERP and other information systems
Dark data | Server logs, application error logs, account information, emails and documents
34. DATA MODELING
Many names for the same things!
The creation of models from data is known as model induction. Induction is a term from philosophy that refers to generalizing from specific cases to general rules (or laws, or truths).
Generally speaking, a model is a simplified representation of reality created to serve a purpose.
In ML (artificial intelligence), a predictive model is a formula for estimating the unknown value of interest: the target. The formula could be mathematical, or it could be a logical statement such as a rule. Often it is a hybrid of the two.
Source: Data Science for Business
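To illustrate the three shapes a predictive model can take (a mathematical formula, a logical rule, or a hybrid), here is a minimal sketch. The feature names, coefficients and thresholds are hypothetical and not fitted to any data; they only show the form of each kind of model.

```python
import math

# Hypothetical coefficients, for illustration only (not learned from data).
INTERCEPT = -6.0
COEF_AGE = 0.05
COEF_BMI = 0.10


def formula_model(age, bmi):
    """Mathematical model: logistic function of a weighted sum of features."""
    z = INTERCEPT + COEF_AGE * age + COEF_BMI * bmi
    return 1.0 / (1.0 + math.exp(-z))


def rule_model(age, bmi):
    """Logical model: a hand-written rule that flags a case as high risk."""
    return age >= 60 and bmi >= 30


def hybrid_model(age, bmi, threshold=0.5):
    """Hybrid: flag when the rule fires or the formula's score is high."""
    return rule_model(age, bmi) or formula_model(age, bmi) >= threshold
```

In all three cases the output estimates the same target; only the representation of the model differs.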
35. DATA MODELING - BEST PRACTICES
1) Ask a specific question. Remember you are solving a business problem, not a math problem.
2) Start simple; start with the minimal set of data.
3) Try many algorithms, but remember that data is more important than the exact algorithm: improve your features.
4) Treat your data with suspicion; understand its idiosyncrasies.
5) Normalize your inputs.
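One common way to normalize inputs is z-score standardization, sketched below with the standard library only. The slide does not say which normalization method is meant, so this choice is an assumption, and the sample values are purely illustrative.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative inputs on very different scales.
ages = [15, 20, 65, 48, 45, 79, 22]
weights_kg = [60, 80, 90, 85, 79, 71, 79]

z_ages = standardize(ages)
z_weights = standardize(weights_kg)
```

In a real pipeline the mean and standard deviation would be computed on the training set only and then reused for validation and production data.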
36. DATA MODELING - BEST PRACTICES
● Validate your model (test-set validation and clinical validation).
● Build a baseline benchmark; don’t be afraid to launch your product without AI.
● Operationalize your models.
● Set up a feedback loop.
● Healthcare doesn’t trust black boxes.
● Correlation is not causation.
● Monitor ongoing performance.
● Don’t be fooled by “accuracy”.
● Use medical support libraries, e.g. PubMed, Cochrane, American Heart Association, Diabetes UK and so on.
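The warning about “accuracy” can be shown with a small sketch on imbalanced labels. The data below is synthetic (5 positive cases out of 100) and the "model" is a trivial baseline that always predicts the negative class; both are assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that the model found."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Synthetic, imbalanced labels: 5 patients with the condition, 95 without.
y_true = [1] * 5 + [0] * 95
# Trivial baseline that always predicts "no condition".
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)  # 0.95
rec = recall(y_true, y_pred)    # 0.0
```

The baseline scores 95% accuracy while missing every patient with the condition, which is why metrics such as recall or precision are usually reported alongside accuracy.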
38. DATA MODELING - TRADE OFF - NO FREE LUNCH
Source: O'Reilly Strata 2013
43. DATA PREPARATION - TABULAR FORM - THE GOAL
Data sources: primary care, secondary care, medication, other data… a lot of types.
Data engineering: putting the data together is challenging.
Feature engineering: the data points (N*M) are the key! They come after a very expensive process.
Table shape: M observations (rows) by N features (columns), plus the TARGET.
ID age med height weight BMI diet
1 15 Y 168 60 21.3 Y
2 20 Y 185 80 23.4 Y
3 65 N 192 90 24.4 N
4 48 N 172 85 28.7 N
5 45 Y 185 79 23.1 N
6 79 N 182 71 21.4 Y
7 22 Y 186 79 22.8 Y
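In the sample table on this slide, the BMI column is consistent with a feature derived from height and weight. The sketch below recomputes it, assuming height is in centimetres and weight in kilograms (the slide does not state units); the tuple layout is a hypothetical representation of the rows.

```python
# Rows copied from the slide's table:
# (ID, age, med, height_cm, weight_kg, BMI, diet)
ROWS = [
    (1, 15, "Y", 168, 60, 21.3, "Y"),
    (2, 20, "Y", 185, 80, 23.4, "Y"),
    (3, 65, "N", 192, 90, 24.4, "N"),
    (4, 48, "N", 172, 85, 28.7, "N"),
    (5, 45, "Y", 185, 79, 23.1, "N"),
    (6, 79, "N", 182, 71, 21.4, "Y"),
    (7, 22, "Y", 186, 79, 22.8, "Y"),
]


def bmi(height_cm, weight_kg):
    """Body mass index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


# Derived feature, rounded to one decimal like the table.
derived_bmi = [round(bmi(h, w), 1) for (_, _, _, h, w, _, _) in ROWS]
table_bmi = [row[5] for row in ROWS]
```

Under the stated unit assumption, `derived_bmi` matches `table_bmi` for every row, a small example of an engineered feature built from two raw columns.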
44. FEATURE ENGINEERING
1) Dealing with outliers, binarization; categorical variables are managed by the algorithm.
2) Dealing with duplicates, log transform, design thinking in terms of TIME (generate lag features).
3) Creating a key (id, date), creating lag and lead features. AutoML.
“Coming up with features is difficult, time-consuming, requires expert knowledge. ‘Applied machine learning’ is basically feature engineering.”
Prof. Andrew Ng
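Lag and lead features keyed on (id, date) can be sketched as follows. The record layout, the field names (`id`, `date`, `value`, `lag_1`, `lead_1`) and the sample readings are hypothetical; dates are ISO strings so that plain string sorting gives chronological order.

```python
from itertools import groupby
from operator import itemgetter


def add_lag_lead(records, value_key="value"):
    """Add the previous (lag) and next (lead) value per id, ordered by date."""
    ordered = sorted(records, key=itemgetter("id", "date"))
    out = []
    for _, group in groupby(ordered, key=itemgetter("id")):
        rows = list(group)
        for i, row in enumerate(rows):
            lag = rows[i - 1][value_key] if i > 0 else None
            lead = rows[i + 1][value_key] if i < len(rows) - 1 else None
            out.append({**row, "lag_1": lag, "lead_1": lead})
    return out


# Hypothetical blood-pressure readings for two patients, out of order.
readings = [
    {"id": 1, "date": "2018-01-03", "value": 130},
    {"id": 1, "date": "2018-01-01", "value": 120},
    {"id": 2, "date": "2018-01-01", "value": 95},
    {"id": 1, "date": "2018-01-02", "value": 125},
]
features = add_lag_lead(readings)
```

Lag features never cross patient boundaries here, because the grouping is done per id before neighbours are looked up. Lead features use future values, so they are only appropriate for building targets, not model inputs.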
50. CHALLENGES
● Data quality.
● Lack of access to data and tools.
● Insufficient training in data science methods.
● Interoperability.
● Operational clinical data may be:
❏ Inaccurate.
❏ Incomplete.
❏ Of unknown provenance.
❏ Of insufficient granularity.
● Data has many idiosyncrasies.
● Business models (regulations, interdisciplinarity, monetization, investment, and so on).