Where Are We Heading in the Fight Against Cancer ?
How Is Big Data Analytics Transforming Oncology ?
Which Big Data OncoAnalytics Architecture Patterns ?
Which Big Data OncoAnalytics Solutions ?
Which Big Data OncoAnalytics Solution Examples ?
How to Get Started with a Big Data OncoAnalytics Project ?
Which Big Data OncoAnalytics Value Proposition ?
2. 2
PERSONAL INTERESTS
Big Data
Analytics
Artificial Intelligence
Data Warehousing Technology
Cloud Computing Data Center
Oncology
Architecture
Astronomy
Data Science
Healthcare
Philippe Julio – Big Data Analytics Architect
https://www.linkedin.com/in/juliophilippe/
Digital
Big data OncoAnalytics
Consulting
3. 3
ONCOANALYTICS INSIGHT
ONCOPHYSICS ONCOGENOMICS
See into the future
• Objects : x-rays, gamma-rays, magnetic
resonance, ultrasounds, laser light...
• Quantum mechanics : photon, electron,
magnetism…
• Objects : tumors, cells, chromosomes, genes,
DNA, enzyme, hormone, antibody…
• Quantum biology : DNA mutation, cellular
respiration…
• Objects : data, bits, qubits
• Quantum computer : artificial
Intelligence, algorithms, cryptography,
search, simulation, linear equations,
prediction, recommendation, risk…
See into the present
Quantum Physics and
Artificial Intelligence
ONCOANALYTICS
Radioscopy vision
Computer vision
• Objects : tumors, molecules, proteins,
hormones, enzymes, biomarkers, amino
acids …
• Quantum chemistry : biomolecular modeling,
chemical energy, spectra analysis…
ONCOPHARMACEUTICS
Microscopic vision
Infinite
Analytics Use Cases
See into the present
See into the present
Microscopic vision
Big data OncoAnalytics
4. 4
KEY QUESTIONS
• Where Are We Heading in the Fight Against Cancer ?
• How Is Big Data Analytics Transforming Oncology ?
• Which Big Data OncoAnalytics Architecture Patterns ?
• Which Big Data OncoAnalytics Solutions ?
• Which Big Data OncoAnalytics Solution Examples ?
• How to Get Started with a Big Data OncoAnalytics Project ?
• Which Big Data OncoAnalytics Value Proposition ?
5. Where Are We Heading in the
Fight Against Cancer ?
6. 6
• Cancer can start any place in the body
• Cells grow out of control and crowd out normal
cells
• There are more than 200 different types of
cancer
• Cancer starts when gene changes make one cell
or a few cells begin to grow and multiply too
much
• The resulting mass of cells is called a tumor
• Primary tumor is the name for where a cancer
starts
• Secondary tumor or a metastasis is a name for
where the cancer spreads to other parts of the
body
• Oncology is a branch of medicine that
specializes in the diagnosis and treatment of
cancer. It includes medical oncology, radiation
oncology and surgical oncology
WHAT IS CANCER AND ONCOLOGY ?
Tumor Nodes Metastasis Classification
T : (0-4) tumor size or direct extent of the primary tumor
N : (0-3) degree of spread to regional lymph nodes
M : (0-1) presence of distant metastasis
Sub classifications for some cancer types
Grade Tumor Growth and Spread Classification
Gx: Grade cannot be assessed (undetermined grade)
G1: Well differentiated (low grade)
G2: Moderately differentiated (intermediate grade)
G3: Poorly differentiated (high grade)
G4: Undifferentiated (high grade)
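The two classifications above map naturally onto a small validated record. The following is a minimal sketch, not a clinical tool: the class name `TumorStage`, its fields and its helper methods are hypothetical, and the N range is assumed here to be 0 to 3, since the upper bound depends on the cancer type.

```python
from dataclasses import dataclass

# Allowed values per TNM component. T and M follow the slide; the upper
# bound for N is an assumption and varies by cancer type.
TNM_RANGES = {"T": range(0, 5), "N": range(0, 4), "M": range(0, 2)}

# Grade labels as listed on the slide.
GRADES = {
    "GX": "Grade cannot be assessed (undetermined grade)",
    "G1": "Well differentiated (low grade)",
    "G2": "Moderately differentiated (intermediate grade)",
    "G3": "Poorly differentiated (high grade)",
    "G4": "Undifferentiated (high grade)",
}


@dataclass(frozen=True)
class TumorStage:
    """Hypothetical container for a TNM stage and a tumor grade."""
    t: int
    n: int
    m: int
    grade: str = "GX"

    def __post_init__(self):
        # Reject values outside the ranges declared above.
        for name, value in (("T", self.t), ("N", self.n), ("M", self.m)):
            if value not in TNM_RANGES[name]:
                raise ValueError(f"{name}{value} is outside the expected range")
        if self.grade.upper() not in GRADES:
            raise ValueError(f"Unknown grade: {self.grade}")

    def code(self) -> str:
        """Return the compact form, for example 'T2N1M0'."""
        return f"T{self.t}N{self.n}M{self.m}"

    def has_distant_metastasis(self) -> bool:
        return self.m == 1

    def grade_label(self) -> str:
        return GRADES[self.grade.upper()]


stage = TumorStage(t=2, n=1, m=0, grade="G2")
print(stage.code(), "-", stage.grade_label())
```

Sub-classifications used for some cancer types (such as T1a) are deliberately left out of this sketch.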
7. 7
• Building a world where every cancer patient receives
the right care at the right place at the right time
• Curing or considerably prolonging the life of patients
and ensuring the best possible quality of life for
cancer survivors
• Providing the most effective treatments in an equitable
and sustainable way, thanks to early detection,
accurate diagnosis and staging, and adherence to
evidence-based standards of care
• Forecasting cancer to offer a real opportunity
for clinical benefit based on anticipatory action
ONCOLOGY GOALS
8. 8
CANCER GENOMICS BASICS
Genome is unique for every person
• All the DNA molecules contained in our cells make up our
genome
• Human genome sequencing was completed in 2004
• 99.9% of our genome is the same from person to person
• 0.1% is enough to make each one of us unique
• On average, a human gene has 1-3 bases that differ from person to person
• Differences can change the shape and function of a protein,
or they can change how much protein is made, when it's
made, or where it's made
• In most cells, the genome is packaged into two sets of
chromosomes: one set from our mother and one set from
our father
Cancer is a disease of the genome
• In cancer cells, small changes in the genetic letters can
change what a genomic word or sentence means
• A changed letter (A, C, T, G) can cause the cell to make a
protein that doesn’t allow the cell to work as it should
• Scientists can discover what letter changes are causing a cell
to become a cancer
• The genome of a cancer cell can also be used to tell one type of
cancer from another
• In some cases, studying the genome in a cancer can help
identify a subtype of cancer within that type
Cell, Chromosome, DNA, RNA, Gene, Protein, Amino Acid
• Humans have 23 pairs of chromosomes in every cell: 22 pairs of autosomes and
1 pair of gonosomes, or sex chromosomes (XY male, XX female)
• DNA (deoxyribonucleic acid) is a complex molecule whose content is
determined by the sequence of 3.2 billion nucleotides (Adenine, Cytosine,
Thymine, Guanine), including 20,000 to 25,000 protein-coding genes
• RNA is a molecule composed of nitrogenous bases (Guanine, Uracil, Adenine,
Cytosine) similar to DNA but containing ribose rather than deoxyribose. RNA is
formed upon a DNA template. There are several classes of RNA
molecules (messenger RNA : mRNA, transfer RNA : tRNA and ribosomal RNA : rRNA)
• Protein is a large molecule (enzyme, hormone, antibody…) composed of one or
more chains of amino acids
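The idea that one changed letter can change the protein a cell makes can be illustrated in a few lines. This is a simplified sketch under stated assumptions: the sequence is invented for illustration, the codon table is a small subset of the standard genetic code, and the function names are hypothetical.

```python
# Partial codon table (DNA coding-strand codons), enough for this sketch only.
CODON_TABLE = {
    "ATG": "Met",
    "GGT": "Gly",
    "GTT": "Val",
    "GCT": "Ala",
    "TAA": "Stop",
}


def transcribe(dna: str) -> str:
    """Return the mRNA sequence for a DNA coding strand (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(dna: str) -> list:
    """Translate a DNA coding strand codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "?")  # "?" = not in the subset
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Replace the base at a 0-based position with another letter."""
    return dna[:position] + new_base + dna[position + 1:]


normal = "ATGGGTGCTTAA"                    # invented example sequence
mutated = point_mutation(normal, 4, "T")   # GGT becomes GTT

print(transcribe(normal))
print(translate(normal), "->", translate(mutated))
```

A single substitution in the second codon swaps glycine for valine, which is the kind of letter change the slide describes.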
9. 9
WORLD CANCER STATISTICS 2018
Source : International Agency for Research on Cancer
Cancer burden rises
to 18.1 million new
cases and 9.6
million cancer
deaths in the world
Cancer is the
second leading
cause of death in
the world
The most common
cancers in the world
are (in decreasing
order) those of the
lung, breast,
colorectum and
prostate
10. 10
CANCER RISK FACTORS
• Tobacco
• Alcohol
• Nutrition
• Certain classes of drugs
• Genome
• Physical inactivity
• Radio frequency
• Phytosanitary products
• Pollution
• Radioactivity
• Skin exposure (sun, ultraviolet light)
• Infectious agents (HPV, HBV, HCV, HIV…)
Source : American Institute for Cancer Research
11. 11
CANCER MANAGEMENT
Diagnostics
• Health history review
• Physical examination
• Laboratory tests (blood, urine…)
• Biopsy
• Medical imaging (X-ray, PET/CT, MRI,
Ultrasound…)
• Endoscopy
• Genetic tests
Treatments
• Surgery
• Chemotherapy
• Radiotherapy (Brachytherapy, Proton Therapy)
• Immunotherapy
• CRISPR-Cas9 gene editing
• Stem cell transplant
• Hyperthermia
• Photodynamic therapy
• Laser
• Blood products, transfusion
Supportive Care
• Psychiatry
• Psychology
• Sophrology, Meditation, Mindfulness, Wellness
• Dietetics
• Speech therapy
• Physiotherapy
• Spiritual
• Rehabilitation
• Social work
• Volunteer
12. 12
CANCER MEDICINE MODEL
Personal medicine
taking into account the patient genetic
or protein profile
Preventive medicine
taking into consideration health problems by
focusing on wellness and not disease
Predictive medicine
indicating the most appropriate
treatments for the patient and trying to
avoid drug reactions
Participative medicine
leading patients to be more responsible for
their health and care
Proof medicine
proving medical service to patients,
particularly when it is based on connected
health and remote medicine
Pathway medicine
connecting between different medical and
para-medical actors, outpatient medicine,
digital hospital and home care
Cancer Medicine
of the Future
13. 13
FUTURE OF CANCER TREATMENT
• Personal medicine
A vision that all people will one day be offered customized care, with treatments that match their genomic
profiles and personal histories
• Immunotherapy drugs
Immunotherapy is the fruition of a century-old idea: that a person’s own immune system can be stimulated to
fight cancer
• Cell-based therapies
Patient’s own T cells are directly manipulated to more readily attack cancer cells. In this treatment, T cells are
collected from a patient’s blood, genetically engineered to recognize certain proteins on cancer cells, and
loaded back into the patient’s bloodstream
• Epigenetic therapies
Cancer could be treated in a different way, by transforming cancer cells back to normal rather than destroying
them. CRISPR technology is used to easily alter DNA sequences and modify gene function. The protein Cas9 (or
"CRISPR-associated") is an enzyme that acts like a pair of molecular scissors, capable of cutting strands of DNA
• Battling metastases
Metastatic tumor cells have a remarkable tendency to cling to blood vessels, a survival mechanism that might
be important for the spread of many types of cancer
14. How Is Big Data Analytics
Transforming Oncology ?
15. 15
ONCOANALYTICS NEEDS
• Providing the best diagnosis and treatment plans
for the patient
• Using the significant advances in data to better
prioritize resources, lowering costs and improving
patient outcomes
• Developing a cognitive interface between clinicians
and technology
• Integrating various primary and secondary sources
to enhance patient and treatment pathway
insights
• Providing a high-performance, scalable, available
and secure architecture that supports large
volumes of varied data
16. 16
ONCOANALYTICS MATURITY MODEL
DESCRIPTIVE
ANALYTICS
What
happened ?
DIAGNOSTIC
ANALYTICS
Why did it
happen ?
PREDICTIVE
ANALYTICS
What could happen ?
PRESCRIPTIVE
ANALYTICS
What should be
done ?
Value
Maturity
ADVANCED ANALYTICS
ANALYTICS
17. 17
• Protein isoform biomarker analysis
Better estimating cancer patient survival with protein
isoform biomarkers
• Survivorship analysis for breast cancer
Understanding what happens to patients after breast
cancer diagnosis
• Rapid diagnosis of rare leukemia
Identifying cause and life-saving therapy faster for a rare
leukemia
• Improving early detection of breast cancer
Detecting breast cancer at the earliest stage
• Approach to developing cancer drugs
Developing cancer drugs based on the genetic mutations
of tumors
• Biomarkers for relapse in lymphoma patients
Understanding why some lymphoma patients relapse and
others don’t
ONCOANALYTICS USE CASE EXAMPLES
Analytics
The future of
Cancer Care
18. 18
ONCOANALYTICS EMERGING TECHNOLOGIES
ARTIFICIAL INTELLIGENCE
Makes it possible for machines to learn from experience, adjust to new inputs and perform human-like tasks.
Computers can be trained to accomplish specific tasks by processing large amounts of data and recognizing
patterns in the data
MACHINE LEARNING
Learns without being explicitly
programmed to do so
NATURAL LANGUAGE
PROCESSING
Helps computers
communicate with doctors
and researchers in their
own language, making it
possible for computers to
read text, hear speech,
interpret it, measure
sentiment and determine
which parts are important
COMPUTER VISION
Trains computers to
interpret and understand
the visual world. Using
images, machines can
accurately identify and
classify objects and then
react to what they “see”
DEEP LEARNING
Makes the computation of
multi-layer neural networks
feasible
IoT PLATFORM
Facilitates communication, data flows, device management and application functionalities
CONVERSATIONAL AI PLATFORM
Uses a set of technologies that enable computers to simulate real conversations
KNOWLEDGE GRAPHS
Creates a knowledge domain with the help of intelligent machine learning algorithms
BLOCKCHAIN FOR DATA SECURITY
Creates a continuously growing list of digital records in packages (called blocks) which are linked and secured
using cryptography
QUANTUM COMPUTING
Uses quantum-mechanical phenomena such as superposition and entanglement to perform computation
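The blockchain description above (records packaged in blocks that are linked and secured using cryptography) can be sketched as a simple hash chain. This is a minimal illustration, assuming SHA-256 as the hash and a plain Python list as storage. It leaves out consensus, signatures and distribution, and all function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, record, previous_hash):
    """Hash the block content together with the hash of the previous block."""
    payload = json.dumps(
        {"index": index, "record": record, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record as a new block linked to the current last block."""
    previous_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "record": record,
        "previous_hash": previous_hash,
        "hash": block_hash(index, record, previous_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link; any tampering breaks the chain."""
    previous_hash = GENESIS_HASH
    for index, block in enumerate(chain):
        if block["index"] != index or block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(index, block["record"], previous_hash):
            return False
        previous_hash = block["hash"]
    return True


chain = []
append_block(chain, {"event": "consent recorded", "patient_code": "a1b2"})
append_block(chain, {"event": "dataset shared", "patient_code": "a1b2"})
print(is_valid(chain))
```

Because each block stores the hash of its predecessor, changing an earlier record invalidates every later link, which is what makes the list tamper-evident.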
19. 19
ONCOANALYTICS BIG DATA MODEL
Variety
Great data variety combining traditional clinical and
administrative data, unstructured data (genomics,
imaging, text…), socioeconomic data and social data
Volume
Great use of precision medicine, big data
explosion in cancer care, especially as genomic
and environmental data become more ubiquitous
Velocity
Rapidly increasing speed at which new data is
being created by technological advances, and the
corresponding need for that data to be integrated
and analyzed in near real-time
Value
Significant advances in data to improve
diagnosis, treatment plans and
patient outcomes, better prioritize
resources and lower costs
Veracity
Good data quality. Data source is authoritative.
Privacy and data protection safeguards. Data are
regularly updated. Data are unambiguous,
complete, easy to find, understand and use
Big Data for
OncoAnalytics
20. 20
ONCOANALYTICS ACTORS AND ROLES
Doctor
Gets personalized guidance on treatment decisions by matching each patient’s care against quality
standards and data from patients like theirs
Researcher
Receives new insight for discovery through access to a massive body of de-identified patient care data to
analyze patterns
Manager
Has access to metrics and tools that support high-quality, efficient care and costs
Administrator
Has access to metrics and tools that support high-quality, efficient data and IT costs
21. 21
ONCOANALYTICS DATA SOURCES
Imaging
Pharmaceutics Financial
Biologic Genomics
Radiology Literature
Text
Publications Documents Studies
Video
Patient data
Trial data
Research data
Diagnosis Treatments Supportive cares
Social Economic Environment
22. 22
ONCOANALYTICS PATIENT PRIVACY
Patient data extraction
Patient data loading
Patient data transformation
Private data
Public data
• Private patient data may be used only for medical and non-commercial purposes
• Only the patient and their doctor are authorized to access private patient data
de-identification
Data Privacy Method
Data Privacy Rules
Integration Process
• De-identification replaces each patient identification code number with a unique
random code number, creating a de-identified dataset. It is a reversible process, since the correspondence between codes is kept
• Anonymization destroys all links between the de-identified dataset and the original dataset. It is a non-reversible process
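The difference between the two methods can be shown with a small sketch. It assumes patient records are dictionaries with a `patient_id` field, which is a hypothetical layout, and uses Python's `secrets` module to draw the random codes. The function names are illustrative only.

```python
import secrets


def de_identify(records, id_field="patient_id"):
    """Replace each identifier with a random code.

    Returns the de-identified records and the key table (code -> original),
    which is what makes the process reversible.
    """
    code_by_original = {}
    de_identified = []
    for record in records:
        original = record[id_field]
        if original not in code_by_original:
            code_by_original[original] = secrets.token_hex(8)
        de_identified.append({**record, id_field: code_by_original[original]})
    original_by_code = {code: orig for orig, code in code_by_original.items()}
    return de_identified, original_by_code


def re_identify(records, original_by_code, id_field="patient_id"):
    """Reverse the de-identification using the key table."""
    return [{**r, id_field: original_by_code[r[id_field]]} for r in records]


records = [
    {"patient_id": "P-001", "tumor_type": "lung"},
    {"patient_id": "P-002", "tumor_type": "breast"},
    {"patient_id": "P-001", "tumor_type": "lung"},
]
coded, key_table = de_identify(records)

# Anonymization in this sketch simply means destroying the key table:
# without it, the random codes can no longer be linked to the originals.
anonymized = coded
del key_table
```

In practice the key table would be stored separately under strict access control for as long as reversibility is required.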
23. 23
ONCOANALYTICS PRIVATE PATIENT DATA
• Names
• All geographic subdivisions (except country)
• All elements of dates (except year)
• Telephone numbers
• Fax numbers
• Email addresses
• Social security numbers
• Medical record numbers
• Health plan beneficiary numbers
• Account numbers
• Certificate/license numbers
• Vehicle identifiers and serial numbers
• Device identifiers and serial numbers
• URL
• IP address
• Biometric identifiers, including finger and voice prints
• Full-face photographs and any comparable images
• Any other unique identifying number, characteristic, or
code
Private patient data must be
de-identified before integration within public
databases according to data sharing agreements
Data Sharing
Agreement
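As a rough illustration of removing the private fields listed above before integration, the sketch below drops direct identifiers from a record and reduces dates to the year. The field names are hypothetical, dates are assumed to be ISO strings (YYYY-MM-DD) stored in fields ending in `_date`, and a real pipeline would need a reviewed, complete field inventory.

```python
# Hypothetical field names standing in for the identifier categories above.
PRIVATE_FIELDS = {
    "name", "address", "city", "postal_code", "phone", "fax", "email",
    "social_security_number", "medical_record_number",
    "health_plan_number", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address",
    "biometric_id", "photo",
}


def strip_private_fields(record):
    """Return a copy of the record without private fields.

    Date fields (assumed to end in '_date' and hold 'YYYY-MM-DD' strings)
    are reduced to a '<prefix>_year' field; the country is kept.
    """
    cleaned = {}
    for field, value in record.items():
        if field in PRIVATE_FIELDS:
            continue
        if field.endswith("_date") and isinstance(value, str):
            cleaned[field[: -len("_date")] + "_year"] = value[:4]
        else:
            cleaned[field] = value
    return cleaned


record = {
    "name": "Jane Doe",
    "email": "jane@example.org",
    "country": "FR",
    "diagnosis_date": "2018-03-14",
    "tumor_type": "breast",
}
print(strip_private_fields(record))
```

Free-text fields can still contain identifying details, so this kind of field-level filter is only one part of de-identification.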
24. 24
ONCOANALYTICS DISCOVERY
Cellular : looking for patterns in the data of individual cancer cells
to discover genetic biomarkers. Finding common features could
help us better predict how individual tumors might mutate and
what drug treatments might be most effective
Patient : patient medical history and DNA data could be used to
help define the best combination of therapies and
recommendations for them, based on their tumor, their genes
and the effects of treatments on patients with similar disease
patterns and genetics
Population : population data can be analyzed to inform treatment
strategies for patients based on their different lifestyles,
geographies, and cancer types
• Molecular analysis
• Survivorship analysis
• Early detection of cancer
• Cancer drugs development
• Benchmarking and standardization of care
• Cancer prevention and recommendations
• Prediction with better precision
• Diagnosis and treatment plans
• …
Collecting data from various sources by detecting patterns and outliers with
the help of guided advanced analytics and visual navigation of data, thus
enabling consolidation of cellular, patient and population data
Use Cases Examples
25. 25
• Patient diagnosis, treatments and supportive care
• Expenses and investments
• Billing and reimbursement
• Profits and margin
• Cost savings
• Quality control of care
• Clinical trials
ONCOANALYTICS CLINICAL
Diagnostics
Treatments
Supportive care
Clinical trials
Research
Coordination
Others
Others
Expenses Investments
Time
• Return on investment
• Payment models
• Donation
• Forecasting
• Predictive financial modeling
• Health surveys
• …
Using patient data to generate insights,
make decisions, increase revenues, enhance care,
diagnosis and coordination, minimize abuse and
fraud, and save on costs
Technology
Use Cases Examples
Using investments to generate insights and
enhance diagnosis and treatments
27. 27
BIG DATA ONCOANALYTICS ARCHITECTURE PATTERNS
Big data
analytics is the
often complex
process of
examining large
and varied data
sets
Big data
warehouse is
mainly technology,
which stands on
volume, velocity
and variety of data
sources
Big data fabric is
a system that
provides
seamless, real-
time integration
and access across
the multiple data
sources
Big data
management is
the organization,
administration
and governance
of large volumes
of both
structured and
unstructured data
Big data
infrastructure is the
consistent efficient
hardware
architecture,
massively parallel,
highly scalable and
available to handle
very large data
volumes up to several
petabytes
Forming an integrated and modular architecture for OncoAnalytics
Which big data OncoAnalytics architecture patterns ?
Doctor Researcher Manager Administrator
Patient, Trial and Research data
BIG DATA WAREHOUSE
BIG DATA MANAGEMENT
Administration Governance
DATA DISCOVERY DATA CLINICAL
BIG DATA INFRASTRUCTURE
Storage Servers
BIG DATA FABRIC
Extraction Transformation Loading
DATA LAKE
BIG DATA ANALYTICS
Diagnostic
Analytics
Prescriptive
Analytics
Predictive
Analytics
Descriptive
Analytics
Visualization
Network
28. 28
BIG DATA ANALYTICS PATTERNS
Helping doctors, researchers and managers run different types
of analytics, from dashboards and visualization to big data processing,
real-time analytics, and machine learning to guide better decisions
Doctor Researcher Manager
Visualization
Descriptive Analytics
Using data aggregation and data mining to provide insight into the past
Diagnostic Analytics
Using techniques such as drill-down, data mining and correlations
Predictive Analytics
Using techniques such as statistics, predictive modeling and forecasting
Prescriptive Analytics
Using techniques such as graph analysis, simulation, complex event processing, machine learning, neural
networks
29. 29
BIG DATA WAREHOUSE PATTERNS
Database
Operating
System
Languages
Connectors
Hardware
Software system
Analytics Optimized System (Appliance)
• Software system and hardware integrated
• Languages (R, SQL, NoSQL…)
• Unstructured/structured data
• Components redundancy
• Massively Parallel Processing
• In-memory computing
• Resources management
• Partitioning, indexing
• Column database or HDFS
• Compression
• Connectivity (Hadoop…)
• Scalability, high availability
• Security
Centralizing all data at any scale with flexible software and an available
architecture for massively parallel data processing on a network of
lower-cost commodity hardware
Moving data processing to storage
Appliance architecture
Storage patterns
Direct-Attached Storage (DAS)
• Storage inside each server
• Improve data access performance
• Reduce network latency
• Improve data flows
30. 30
BIG DATA FABRIC PATTERNS
• Extraction phase : extracts data from various data sources
• Transformation phase : transforms data into a de-identified format or structure suitable for querying and analysis
• Loading phase : loads data into the big data warehouse
• Phases run as real-time streaming or batch processing
ETL flow : Data sources > Extract > Transform > Load > Database
ELT flow : Data sources > Extract > Load > Database > Transform (in-database)
• ETL : while data is being extracted, the transformation phase is executed and the
already received data are prepared for loading. As soon as there is some data
ready to be loaded into the big data warehouse, the data loading kicks off
without waiting for the completion of the previous phase
• ELT : while data is being extracted, the already received data are prepared for loading.
As soon as there is some data ready to be loaded into the big data warehouse,
the data loading kicks off and transformation is executed in-database without
waiting for the completion of the previous phase
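The two variants above can be contrasted in a short sketch. It uses Python generators so that each row flows to the next phase as soon as it is available, and an in-memory SQLite database as a stand-in for the big data warehouse. Table names, column names and the clean-up rule (trim and lower-case) are assumptions made for illustration.

```python
import sqlite3


def extract(source_rows):
    """Yield source rows one at a time, as they become available."""
    for row in source_rows:
        yield row


def transform(rows):
    """Clean each row as it arrives, without waiting for extraction to finish."""
    for row in rows:
        yield {
            "patient_code": row["patient_code"],
            "tumor_type": row["tumor_type"].strip().lower(),
        }


def etl(source_rows, connection):
    """ETL: transform in the pipeline, then load the cleaned rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS diagnoses (patient_code TEXT, tumor_type TEXT)"
    )
    for row in transform(extract(source_rows)):
        connection.execute(
            "INSERT INTO diagnoses VALUES (?, ?)",
            (row["patient_code"], row["tumor_type"]),
        )
    connection.commit()


def elt(source_rows, connection):
    """ELT: load raw rows first, then transform inside the database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS staging (patient_code TEXT, tumor_type TEXT)"
    )
    for row in extract(source_rows):
        connection.execute(
            "INSERT INTO staging VALUES (?, ?)",
            (row["patient_code"], row["tumor_type"]),
        )
    connection.execute(
        "CREATE TABLE IF NOT EXISTS diagnoses_elt AS "
        "SELECT patient_code, lower(trim(tumor_type)) AS tumor_type FROM staging"
    )
    connection.commit()


rows = [
    {"patient_code": "a1b2", "tumor_type": " Lung "},
    {"patient_code": "c3d4", "tumor_type": "BREAST"},
]
connection = sqlite3.connect(":memory:")
etl(rows, connection)
elt(rows, connection)
```

Both paths end with the same cleaned table. The difference lies in where the transformation runs, which matters when the warehouse has more processing capacity than the integration layer.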
Combining relevant data residing in different sources and providing
doctors, researchers and managers with a unified view of them
Phases
ETL
ELT
Data sources
31. 31
BIG DATA MANAGEMENT PATTERNS
• Maintaining a full audit history across all data in a
single place
• Tracking, classifying and locating data to comply
with governance and compliance rules
• Visualizing the upstream and downstream lineage
of data to verify reliability
• Defining and automating complex data lifecycle
activities with integrated metadata policies
• Verifying access privileges
• Searching metadata and visualizing lineage
• Encrypting or decrypting data
Organization
Doctors and researchers relationship, people management and costs control
Metadata
Managing data about other data, generally referred to as content data (catalog,
dictionary, taxonomy)
Data Security
Data security management is a way to maintain data integrity and to make sure that the
data is not accessible by unauthorized parties or susceptible to corruption
Data Quality
Set of data characteristics that must fulfill requirements : completeness, validity, accuracy,
consistency, availability and timeliness
Master Data Management
Processes, policies, standards and tools that consistently define and manage the critical
data to provide a single point of reference
Data Life Cycle Management
Managing information throughout its lifecycle, from requirements through retirement.
Data archiving and lineage
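Two of the data quality characteristics named above, completeness and validity, are easy to measure. The sketch below assumes records are dictionaries and that the list of required fields and allowed values is supplied by the data steward. The function names and sample values are hypothetical.

```python
def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1
        for record in records
        if all(record.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(records)


def validity(records, field, allowed_values):
    """Share of non-empty values of a field that belong to the allowed set."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return sum(1 for value in values if value in allowed_values) / len(values)


records = [
    {"patient_code": "a1b2", "grade": "G2"},
    {"patient_code": "c3d4", "grade": ""},
    {"patient_code": "e5f6", "grade": "G7"},   # not an allowed grade
    {"patient_code": "g7h8", "grade": "G1"},
]
allowed_grades = {"GX", "G1", "G2", "G3", "G4"}

print(completeness(records, ["patient_code", "grade"]))
print(validity(records, "grade", allowed_grades))
```

Scores like these can be tracked over time per data source to detect a drop in quality before it reaches the analytics layer.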
Backup/Restore
• Monitoring and scheduling system
• Hardware failure, disk or server crash, rack failure
• Data deletion, data corruption
• Site failure, disaster (fire, water, network, power…)
• Backup and restore management
Primary site and disaster recovery site : Storage Array, Data backup, Data restore, Data replication
Providing an integrated, modular environment to manage application
data and optimize data-driven applications over their lifetime
Monitoring
Administration
Governance
32. 32
BIG DATA INFRASTRUCTURE PATTERNS
Providing an efficient infrastructure, from traditional SQL databases
to massively parallel big data systems that are highly scalable and available and handle
very large data volumes
Traditional Infrastructure
• SQL
• High availability
• Limited scalability
• Structured data
• Limited data volume
• Cluster
• Up to several terabytes
• Limited horizontal scalability (scale-out) : server add-on, network switch add-on, appliance add-on
• Limited vertical scalability (scale-up) : CPU add-on, RAM add-on, disks add-on, I/O controller add-on
Big Data Infrastructure
• SQL and NoSQL
• Very high availability
• Virtually unlimited horizontal scalability
• Unstructured and structured data
• Massively Parallel Processing
• High data volume
• Grid
• Up to several petabytes
• Virtually unlimited horizontal scalability (scale-out) : server add-on, network switch add-on, appliance add-on, rack add-on
• Limited vertical scalability (scale-up) : CPU add-on, RAM add-on, disks add-on, I/O controller add-on
33. 33
BIG DATA CLOUD PATTERNS
Service models
• Infrastructure as a Service (IaaS)
• Platform as a Service (PaaS)
• Software as a Service (SaaS)
Each model stacks the same layers (Hardware, OS Virtualization, Database, Software), shared between the Hospital
and the Cloud Provider
PRIVATE CLOUD
The cloud is hosted within the hospital data center. It is tailored to the hospital's
needs and infrastructure. The data is located within the hospital data
center. Investment costs are managed by the hospital
HYBRID CLOUD
The hybrid cloud is a mix of private cloud and public cloud. Cloud services
are distributed according to the needs. Costs are split between investment
and services
PUBLIC CLOUD
The hardware and software are owned and managed by a cloud services
provider. It is fast and inexpensive to set up. It adapts quickly to
fluctuating needs. De-identified data is hosted by the provider. Services
are billed by usage
Providing an integrated, modular environment to manage application
data and lower investment costs
35. 35
• New tools and techniques are required to efficiently process all information as more
data sources emerge
• Just as there is no one cure-all for cancer, there is no single tool for data analytics
• Supercomputing power is required to rapidly process huge volumes of structured and
unstructured data
• The solution is classified into 6 domains :
1. BIG DATA FABRIC
2. BIG DATA WAREHOUSE
3. BIG DATA ANALYTICS
4. BIG DATA MANAGEMENT
5. BIG DATA HADOOP
6. BIG DATA CLOUD PLATFORM
The solution lies in greater collaboration, working together to use
multiple software tools, each looking for specific features
Which big data OncoAnalytics solutions ?
NO ONE-SIZE-FITS-ALL SOLUTION
36. 36
Big Data Fabric,
Q2 2018
• Provides single enterprise-class
solution for data integration,
data quality, data profiling and
text data processing
• Integrates data from
multiple, varied sources
• Provides rich connectivity to
many sources and targets
• Manages services
BIG DATA FABRIC
• Discovers hidden insights with a
personalized analytics
experience
• Provides high performance
analytics, analytical data
preparation, data discovery
• Provides superfast processing
for large-scale data
manipulation, exploration,
advanced analytics, artificial
intelligence and machine
learning
• Manages services
Big Data Predictive Analytics
and Machine Learning Solutions, Q3 2018
BIG DATA ANALYTICS
Source : Forrester
• Complies with regulations
• Executes health center
programs based on secured data
of quality
• Ensures analytics and reports can be
trusted
• Leverages understanding and
knowledge with data and
context
• Measures, analyzes and
visualizes the care programs
• Executes administration process
(backup, disaster recovery…)
Data Governance Stewardship and
Discovery Providers Q2 2017
BIG DATA MANAGEMENT
Source : Forrester
Big Data Warehouse,
Q2 2017
• Provides flexible, high-
performance, secure platform
• Delivers powerful ready-to-run
enterprise platform that is pre-
configured and optimized
specifically for big data
• Builds platform for big data
storage, refinement and
analytics of structured and
unstructured data
• Connects to Hadoop
• Manages services
BIG DATA WAREHOUSE
Source : Forrester
BIG DATA MARKET ANALYSIS
37. 37
• Runs Hadoop with quick setup, higher performance and
automation
• Optimizes the infrastructure with automation, balanced
system resources, and
integrated testing
• Runs Hadoop framework
• Optionally uses Apache Spark and Storm
• Manages services
Big Data Hadoop-Optimized Systems
Q2 2016
BIG DATA HADOOP
Source : Forrester
HADOOP FRAMEWORK
• HDFS : scalable, fault-tolerant, high-performance
distributed file system. The NameNode holds filesystem metadata.
Files are broken up and spread over
the DataNodes
• MAPREDUCE : software framework for
distributed computation. JobTracker
schedules and manages jobs.
TaskTracker executes individual map
and reduce tasks on each cluster node
• YARN : foundation of the new
generation of Hadoop. Architectural
center that allows multiple data
processing engines
HADOOP ADD-ON
• SPARK : fast and general engine for
large-scale data processing. Faster than
MapReduce. Combines SQL, streaming,
and complex analytics
• STORM : a system for processing
streaming data in real time, adding real-time data
processing capabilities to Enterprise
Hadoop
• KAFKA : a distributed streaming
platform. Kafka brokers massive
message streams for low-latency
analysis in Enterprise Apache Hadoop
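The map and reduce tasks mentioned for MapReduce can be illustrated without a cluster. The sketch below is plain Python rather than Hadoop code: it imitates the three logical steps (map, shuffle, reduce) on a handful of invented records, and the function names are hypothetical. On a real cluster the map and reduce tasks would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit one (key, value) pair per record."""
    yield record["cancer_type"], 1


def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three steps in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Invented de-identified records, for illustration only.
records = [
    {"patient_code": "a1", "cancer_type": "lung"},
    {"patient_code": "b2", "cancer_type": "breast"},
    {"patient_code": "c3", "cancer_type": "lung"},
    {"patient_code": "d4", "cancer_type": "prostate"},
]
print(map_reduce(records))
```

Because each map call looks at a single record and each reduce call at a single key, the work can be split across many nodes, which is what lets the framework scale with data volume.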
BIG DATA HADOOP MARKET ANALYSIS
38. 38
BIG DATA CLOUD MARKET ANALYSIS
Forrester Wave : Global public cloud platforms For
enterprise Developers, Q3 2016
Source : Forrester
• Provides cloud solution with
massive volumes of medical
documents and patient data
• Integrates big data fabric, big
data warehouse, big data
analytics and big data
management
• Provides clinicians with artificial
intelligence for open-domain
question answering
• Loads de-identified patient data
• Manages cloud services
BIG DATA CLOUD PLATFORM
Big data fabric, big data warehouse, big data
analytics and big data management integrated
in a single OncoAnalytics Cloud Platform
Cloud providers : AWS CLOUD, MICROSOFT CLOUD, GOOGLE CLOUD, SAP CLOUD, IBM CLOUD
Oncology cloud platforms :
• IBM : Watson for Oncology (IBM CLOUD)
• IBM : Watson for Genomics (IBM CLOUD)
• FLATIRON : OncoCloud
• ASCO : CancerLinQ
• VARIAN : OncoPeer
• SEVEN BRIDGES : Cancer Genomics Cloud
• GOOGLE : Google Genomics (GOOGLE CLOUD)
39. 39
BIG DATA OPEN SOURCE MARKET ANALYSIS
CompuCell3D
• Flexible scriptable modeling
environment, which allows the
rapid construction of sharable
virtual tissue in-silico simulations
of a wide variety of multi-scale,
multi-cellular problems including
angiogenesis, bacterial colonies,
cancer, developmental biology,
evolution, the immune system,
tissue engineering, toxicology
and even non-cellular soft
materials
Cell-based Chaste
• Mathematical and
computational models of
biological and physiological
systems are rapidly increasing in
complexity. This is especially
true of fields such as cancer
modeling, where the amount of
available biological data is
increasing exponentially.
Modeling approaches therefore
span the range from detailed
models of molecular level
processes, right through to
biomechanical models at the
tissue level
PhysiCell
• Many multicellular systems
problems can only be
understood by studying how
cells move, grow, divide,
interact, and die. Tissue-scale
dynamics emerge from systems
of many interacting cells as they
respond to and influence their
microenvironment. The ideal
"virtual laboratory" for such
multicellular systems simulates
both the biochemical
microenvironment (the "stage")
and many mechanically and
biochemically interacting cells
SimCells
Dichloroacetic acid (DCA) cell
simulator used to simulate cellular
and biochemical processes,
calculated by a DCA algorithm. The
user, through the use of the
interface may create : small
molecules, membrane, membrane
proteins (which can only exist in
membranes), protein/RNA
molecules, DNA molecules (which
are non-mobile), and genes (which
can only exist in membranes).
Four open source software packages have proved particularly useful for modeling tumor
growth and the effects of therapies, and are acknowledged in the
cancer modeling community
OncoAnalytics Discovery at cellular level
Open source software is software with
source code that anyone can inspect,
modify, and enhance. "Source code" is the
part of software that most computer users
don't ever see; it's the code computer
programmers can manipulate to change
how a piece of software, whether a program
or an application, works
41. 41
BIG DATA ONCOANALYTICS ARCHITECTURE
Software Architecture
• Data Discovery cloud platform
• Data Lake Hadoop appliance
• Data Clinical appliance
• Data Integration software
• Data Analytics software
• Data Management software
• Software support
Technical Architecture
• Servers
• Storage
• InfiniBand & Ethernet Network
• Hardware support
Which Big data OncoAnalytics solution examples ?
Users
• Doctor, Researcher, Manager, Administrator
Data
• Patient, Trial and Research data
BIG DATA ANALYTICS
• Descriptive Analytics
• Diagnostic Analytics
• Predictive Analytics
• Prescriptive Analytics
• Visualization
BIG DATA WAREHOUSE
• DATA DISCOVERY
• DATA CLINICAL
• DATA LAKE
BIG DATA FABRIC
• Extraction, Transformation, Loading
BIG DATA MANAGEMENT
• Administration, Governance
BIG DATA INFRASTRUCTURE
• Servers, Storage, Network
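The extraction, transformation and loading steps of the fabric layer, feeding a simple descriptive analytics query, could be sketched as follows. This is a minimal Python illustration under stated assumptions: the records, field names (patient_id, tumor_site, stage) and function names are hypothetical, and a plain list stands in for the data lake.

```python
from collections import Counter

# Hypothetical raw records, standing in for rows extracted from a source system.
RAW_RECORDS = [
    {"patient_id": "P1", "tumor_site": " Lung ", "stage": "ii"},
    {"patient_id": "P2", "tumor_site": "breast", "stage": "I"},
    {"patient_id": "P3", "tumor_site": "LUNG", "stage": "iv"},
    {"patient_id": "P4", "tumor_site": "", "stage": "III"},  # missing site
]

def extract(source):
    """Extraction: read records from a source (here, an in-memory list)."""
    return list(source)

def transform(records):
    """Transformation: normalize text fields, drop records without a tumor site."""
    cleaned = []
    for r in records:
        site = r["tumor_site"].strip().lower()
        if not site:
            continue
        cleaned.append({
            "patient_id": r["patient_id"],
            "tumor_site": site,
            "stage": r["stage"].upper(),
        })
    return cleaned

def load(records, data_lake):
    """Loading: append cleaned records to the target store (a plain list here)."""
    data_lake.extend(records)
    return data_lake

def cases_per_site(data_lake):
    """Descriptive analytics: count cases per tumor site."""
    return Counter(r["tumor_site"] for r in data_lake)

data_lake = load(transform(extract(RAW_RECORDS)), [])
```

Calling cases_per_site(data_lake) then answers a descriptive question ("how many cases per site?"). Diagnostic, predictive and prescriptive analytics would build on the same cleaned records.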
42. 42
BIG DATA ONCOANALYTICS DATA FLOWS
SOURCES
ANALYTICS
DISCOVERY
CLINICAL
Public data
Public data
Private Patient data
DATA LAKE
Private Trial data
Private Research data
Private data
Researcher
Manager
Doctor
Doctor
Doctor
Public data
de-identified
MANAGEMENT
Administrator
Public data
de-identified
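The "de-identified" step, applied before private patient data is exposed as public data, could look roughly like the following Python sketch. It is an illustration only, not a complete de-identification procedure: the field names, the list of direct identifiers and the de_identify function are hypothetical, and the birth date is assumed to be an ISO "YYYY-MM-DD" string.

```python
import hashlib
import hmac

# Hypothetical direct identifiers to drop before data leaves the clinical zone.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def de_identify(record, secret_key):
    """Return a copy of a patient record with direct identifiers removed.

    The patient ID is replaced by a keyed hash (a pseudonym), and the
    birth date is generalized to the birth year.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    digest = hmac.new(secret_key,
                      record["patient_id"].encode("utf-8"),
                      hashlib.sha256).hexdigest()
    out["patient_id"] = digest[:16]
    # Assumes an ISO "YYYY-MM-DD" birth date; keep only the year.
    out["birth_year"] = int(out.pop("birth_date")[:4])
    return out

# Hypothetical example record.
patient = {
    "patient_id": "P-000123",
    "name": "Jane Doe",
    "address": "1 Example Street",
    "phone": "555-0100",
    "birth_date": "1960-04-17",
    "diagnosis": "lung adenocarcinoma",
}
public_record = de_identify(patient, secret_key=b"hypothetical-secret")
```

Using a keyed hash rather than a plain hash means the same patient always maps to the same pseudonym, so records can still be linked, while the mapping cannot be recomputed without the key.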
Which Big data OncoAnalytics solution examples ?
43. 43
BIG DATA ONCOANALYTICS INFRASTRUCTURE
PRIMARY SITE
• Users (Web Browser): Doctor, Researcher, Manager, Administrator
• MANAGEMENT: Web/Application Servers
• INTEGRATION: Web/Application Servers
• ANALYTICS: Web/Application Servers
• DISCOVERY: Oncology Platform
• CLINICAL: Analytics Appliance
• DATA LAKE: Hadoop Platform
• BACKUP: Backup Array
SECONDARY SITE
• MANAGEMENT: Web/Application Servers
• INTEGRATION: Web/Application Servers
• ANALYTICS: Web/Application Servers
• DISCOVERY: Oncology Platform
• CLINICAL: Analytics Appliance
• DATA LAKE: Hadoop Platform
• BACKUP: Backup Array
NETWORK
• 40Gb InfiniBand
• 10Gb Ethernet
• 1 to 10Gb Ethernet
Asynchronous data replication from primary to secondary site
Which Big data OncoAnalytics solution examples ?
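Asynchronous replication from the primary to the secondary site means that the primary acknowledges a write immediately and the secondary catches up later. A minimal Python sketch of that idea follows. The Site and AsyncReplicator classes are hypothetical, a dictionary stands in for real storage, and replication is triggered by an explicit call rather than a background process, to keep the example deterministic.

```python
from collections import deque

class Site:
    """A site holding a key-value store (a stand-in for the real storage)."""
    def __init__(self, name):
        self.name = name
        self.store = {}

class AsyncReplicator:
    """The primary acknowledges writes at once; the secondary catches up later."""
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # writes not yet applied on the secondary

    def write(self, key, value):
        """Write to the primary and queue the change for later replication."""
        self.primary.store[key] = value
        self.pending.append((key, value))
        return "ack"

    def replicate(self, max_items=None):
        """Apply queued writes to the secondary in order; return the count applied."""
        applied = 0
        while self.pending and (max_items is None or applied < max_items):
            key, value = self.pending.popleft()
            self.secondary.store[key] = value
            applied += 1
        return applied

    def lag(self):
        """Number of writes the secondary is behind the primary."""
        return len(self.pending)
```

The trade-off is visible in lag(): writes stay fast on the primary, but any changes still queued when the primary site fails are not yet present on the secondary.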
45. 45
BIG DATA ONCOANALYTICS PROJECT - GETTING STARTED
• Gain big data insight – understand the concepts of big data, its domains, needs, opportunities,
market and social behavior
• Talk with key people – doctors, researchers, managers…
• Identify new skills and competencies – data scientists, architects…
• Identify alliances – service providers, software and hardware vendors
• Build a case – there are few public proof points or metrics to leverage, so create much of it from scratch;
focus on a single problem and only a handful of metrics
• Prioritize internal data – Electronic Medical Records already exist in the care center. Integrate external
data as a second step
• Evangelize big data in financial and social terms – make an evangelization deck explaining how the
care center will benefit from big data and the financial and social opportunities it creates. The
objective is for clinicians to embrace it and include it in their plans. Make it friendly
• Identify a sponsor – this is the challenge with big data technology: look for someone dynamic,
who understands the stakes and believes that technology can drive competitive advantage
• Capture metrics and use them to tell a story – identify only a few metrics to measure and
tell a story; people will remember the story long after they forget the numbers in the case
• Emphasize the big data opportunity – some people can’t see big data, and it’s hard to
get passionate about abstract concepts. Visualize the problem and the opportunity.
Demonstrate a big data project and show what new results it will produce. A picture is
worth a thousand words
How to get started with a big data OncoAnalytics project ?
46. 46
BIG DATA ONCOANALYTICS PROJECT
Consulting (IT Consultant, IT Architect)
• Oncology goals
• Oncology needs
• Actors and roles
• Data sources
• Data flows
• Data model
• Data volume
• Software and tools
• Macro architecture
• Documentation
Designing (IT Consultant, IT Architect)
• Data structures
• Data flows
• Algorithms and data processing
• Dashboards and reports
• Architecture design
• Data administration and governance
• Test plan
• Documentation
Building (IT Developer, IT Engineer)
• Database building
• Data integration
• Data analytics
• Dashboards and reports
• Data administration and governance
• Application deployment
• Hardware and software installation
• Tests
• Documentation
Running (IT Engineer, IT Administrator)
• Hardware and software support according to the Service Level Agreement
• Hardware requirements
• Monitoring and scheduling
• Data security
• Master data
• Data quality
• Data life cycle
• Data backup and disaster recovery
• Documentation
Managing (IT Manager)
• Planning
• Team organization and coordination
• Meetings with doctors, researchers, managers and IT people
How to get started with a big data OncoAnalytics project ?
48. 48
BIG DATA ONCOANALYTICS VALUE PROPOSITION
• Clinicians, researchers and IT engineers working together
to fight against cancer
• Patient diagnosis, treatment and care support plans
• Cancer predictions and recommendations
• High-quality, cost-efficient care
• Sharing of best practices, publications and cancer literature across
many care centers and laboratories
• EMR and EHR integration (best practices, documents, genomics,
biology, radiology, pharmacology, financial…)
• Advanced algorithms based on natural language processing, artificial
intelligence, machine learning and deep learning
• Unified discovery and clinical platform integrating hardware,
software and value-added services
• Available, scalable and flexible high-performance platform
• Hardware and software vendor partnerships
• Reference architectures
Which big data OncoAnalytics value proposition ?