8. Analytics – Trends
• Lots of (big) data
– Combination of external and internal data
– And the diversity of sources
• Visualisation and exploration of relationships
– Dashboards and interactive plots
• New insights from old data
• Internet of Things
• … and wearables
• Rise of the data scientist role
• Analytics as a service
• “No IT” solutions
• Cloud analytics
• …
9. Not all Analytics are the Same
Oversight, Hindsight, Insight, Foresight
[Figure: spectrum running from manual to automatic, labelled: visual exploration, pattern finding, predictive models, optimal choice]
11. R&D and the need for speed
• R&D is a unique environment
• Speed, Trust and Agility needed
• People need answers to questions immediately
– Have rapidly changing priorities across a number of scientific domains
– Direct and govern the next phase of work
– Can’t wait for months or even years
12. In R&D the right technology can start the analytics opportunity. The right people with the right skills are needed to deliver quick solutions and keep the research cadence going.
Analytics presents many opportunities and challenges…
How is R&D responding?
13. Data Management Strategy
• Data retention strategy
– When storage costs can outstrip generation costs, we need a very clear data retention strategy.
• Data blueprinting
– How is data flowing through your processes, and how is it informing decisions?
• Clear data standards
– Which ones are you adopting, and how will you translate between leading standards? (Batch, Lot, Sample, Batch record, etc.)
• Roles like Chief Data Officer are becoming important
– Company-wide standards, sponsored from the top.
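The retention point on this slide can be framed as a simple break-even calculation: how long can a data set be stored before the cumulative storage bill exceeds what it would cost to generate the data again? The sketch below is only an illustration. The function name and every figure in the example are hypothetical assumptions, not numbers from the presentation, and it ignores factors such as irreplaceable samples or regulatory retention periods.

```python
def breakeven_years(regeneration_cost, storage_cost_per_tb_year, size_tb):
    """Years after which cumulative storage cost exceeds the cost of
    regenerating the data set (hypothetical helper, flat yearly pricing)."""
    annual_storage = storage_cost_per_tb_year * size_tb
    if annual_storage <= 0:
        raise ValueError("annual storage cost must be positive")
    return regeneration_cost / annual_storage


# Hypothetical example: a 50 TB data set that would cost 20,000 to regenerate,
# stored at 100 per TB per year.
years = breakeven_years(regeneration_cost=20_000,
                        storage_cost_per_tb_year=100,
                        size_tb=50)
print(f"Storage overtakes regeneration after {years:.1f} years")
```

A data set whose break-even point is shorter than its expected useful life is a candidate for keeping only the analysed results, provided the raw data really can be regenerated.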
14. Secondary Data Use
Data sources: Public Data, Published Data, In-House Data, Real World Data, Partner Data
Data Translation (re-formatting, text analytics, normalisation) to produce ANALYTICS READY DATA
Data Analysis
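As a rough illustration of the data translation step on this slide, the sketch below renames source-specific fields to a shared schema and converts values to a common unit. The field names, the mappings, and the choice of micrograms as the common unit are all hypothetical assumptions made for the example; real pipelines would also need text analytics and much richer validation.

```python
def normalise_record(record, field_map, unit_factors):
    """Map a source record onto a common schema.

    field_map:    source field name -> common field name (hypothetical)
    unit_factors: source field name -> multiplier into the common unit
    """
    out = {}
    for source_field, value in record.items():
        target = field_map.get(source_field)
        if target is None:
            continue  # skip fields with no agreed mapping
        if isinstance(value, str):
            out[target] = value.strip().lower()  # tidy identifiers
        else:
            out[target] = value * unit_factors.get(source_field, 1)
    return out


# Two hypothetical sources describing the same sample in different conventions.
in_house = {"SampleID": " S-001 ", "dose_mg": 5, "operator": "abc"}
partner = {"sample": "s-001", "dose_ug": 5000}

in_house_clean = normalise_record(
    in_house,
    field_map={"SampleID": "sample_id", "dose_mg": "dose_ug"},
    unit_factors={"dose_mg": 1000},  # milligrams to micrograms
)
partner_clean = normalise_record(
    partner,
    field_map={"sample": "sample_id", "dose_ug": "dose_ug"},
    unit_factors={},
)
print(in_house_clean == partner_clean)
```

Once both records share field names and units they can be compared or merged directly, which is the sense in which the data becomes "analytics ready".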
15. Bringing our software to their data.
Challenge:
• Data sets can be too large to mirror.
• Data may not be permitted to leave a Partner organisation.
One Solution:
• Embassy Approach:
• Secured computing environment in the Partner organisation.
• GSK has set up one of these at the EBI.
• Firewall
• Intrusion detection
• Antivirus
• 2FA
• Encryption at rest
• VM management
• Software licensing
• Back-ups
• User Account management
• Support
• etc.
16. Complex calculations.
http://aws.amazon.com/solutions/case-studies/novartis/
One of the most extreme use cases is from Novartis, who run complex in silico screening on the cloud. The linked example shows how they ran a 10 million compound screen over 87,000 compute cores for 9 hours (39 CPU years of calculations), for a cost of $4,232.
Need for a clear information protection strategy
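The figures quoted for the Novartis screen can be turned into unit costs with a little arithmetic. The sketch below uses only the numbers on this slide; the variable names are illustrative, and a 365-day year is assumed when converting core-hours to core-years.

```python
# Figures quoted on the slide
compounds = 10_000_000
cores = 87_000
wall_hours = 9
cpu_years_reported = 39
cost_usd = 4_232

HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year

cost_per_compound = cost_usd / compounds
cost_per_cpu_year = cost_usd / cpu_years_reported

# Upper bound if every core had been busy for the full nine hours
peak_core_years = cores * wall_hours / HOURS_PER_YEAR

print(f"Cost per compound: ${cost_per_compound:.7f}")
print(f"Cost per CPU year: ${cost_per_cpu_year:.2f}")
print(f"Peak capacity:     {peak_core_years:.1f} core-years")
```

This works out to roughly four hundredths of a cent per compound and about $108.51 per CPU year. The peak capacity of about 89.4 core-years is higher than the 39 CPU years reported, which would be consistent with the cluster not running at full size for the whole nine hours; that reading is an inference, not something the slide states.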
17. Agenda
• Data Science examples
• The biomedical industry vision
• Data re-use examples in the biomedical sector
◦ Gene Expression Omnibus (GEO) – Children’s Tumor Foundation
◦ SciDB - Novartis
◦ The Cancer Genome Atlas (TCGA) - Roche
• Devices re-use new trend
• Potentiating a new role
20. Agenda
• Data Science examples
• The biomedical industry vision
• Data re-use examples in the biomedical sector
◦ Gene Expression Omnibus (GEO) – Children’s Tumor Foundation
◦ SciDB - Novartis
◦ The Cancer Genome Atlas (TCGA) - Roche
• Devices re-use new trend
• Potentiating a new role
• Targeting Data Scientists
22. The biomedical industry vision
• Translational Medicine is already transforming how new therapies are discovered and developed, while high-content omics technologies accelerate this trend
[Figure: Patient Population → Segregated Patient Population; assay labels: IHC, FISH, Multiplex ELISA, NGS, GEA, CNV and Translocations]
26. Data re-use examples in the biomedical sector
• The Cancer Genome Atlas (TCGA)
◦ Predicting novel therapeutic targets, novel biomarkers…
◦ Example of public genomic data mining & leveraging
◦ TCGA
- Contains the main genomic changes in cancer
- >30 cancer types
- >45000 archives
- Size ~75 TB
29. Devices re-use new trend
• Sensor analysis for improved trial designs with Kinect
33. Potentiating a new role
• Producing the data is not where the value resides
◦ Mining the data provides the valuable findings
◦ Algorithms are at the core of analytic capabilities
• Analytics potential value
◦ Find patterns/markers associated with a patient response
◦ Extract quantitative and discriminative information (faster) from sensors
• Technologies involved
◦ Computing closer to the data source
◦ “Translational” databases
34. Potentiating a new role
• Challenge
◦ The complexity and size of the data, coupled with complex technologies, limit the opportunity for life science experts to explore and interpret the data
• A solution – Data Science
◦ Integrate the tools that process the data and visualize the results
◦ Data Science combines strong scientific and disease domain expertise with analytics capabilities to generate answers rather than information
• Educate “Data Scientists” to use such integrated tools, enabling them to perform advanced exploration and querying of results
◦ For instance, finding patients with similar patterns of mutations in large genome-wide association study databases
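One simple way to picture "finding patients with similar patterns of mutations" is to treat each patient as a set of mutated genes and rank other patients by Jaccard similarity (shared genes divided by all genes seen in either patient). This is a minimal sketch under that assumption, not the method used in the presentation. The patient identifiers and their mutation sets are invented for illustration, and real genome-wide data would be variant-level and far larger.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of mutated genes."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query_id, mutations_by_patient, top_n=3):
    """Rank the other patients by similarity to the query patient."""
    query = mutations_by_patient[query_id]
    scores = [
        (other, jaccard(query, muts))
        for other, muts in mutations_by_patient.items()
        if other != query_id
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


# Hypothetical patients and mutation sets, for illustration only
patients = {
    "P1": {"TP53", "KRAS", "EGFR"},
    "P2": {"TP53", "KRAS"},
    "P3": {"BRCA1"},
    "P4": {"TP53", "KRAS", "EGFR", "ALK"},
}

for patient_id, score in most_similar("P1", patients, top_n=2):
    print(patient_id, round(score, 2))
```

In this toy data P4 ranks first for P1 (three shared genes out of four) and P2 second (two out of three). At database scale the same idea would normally run inside the data store rather than in a Python loop, in line with the "computing closer to the data source" point made earlier in the deck.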
37. Next webinar:
Sharing Data with my Co-opetition
Wednesday 7th October @ 11am-midday EDT
Register at https://attendee.gotowebinar.com/register/8451409689562232065
Where to store? (In-house, offline, cloud?)
What standards to use?
How to index?
How long to keep the raw data versus the analysed results? (How likely is it that a new analysis method will shed new light on the data?)
Using these data types?
E.g. using the R statistical language
Spinal Muscular Atrophy (SMA)
• SMA is a fatal genetic neurological disease
• 1 in 8,000 to 11,000 babies is born with SMA
• 1 in 50 people carry the gene mutation for SMA
• SMA causes progressive and irreversible muscular atrophy
• 50% of babies with SMA will die before their second birthday
• Children who survive are profoundly physically disabled
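The carrier and incidence figures above can be cross-checked against each other. SMA is inherited in an autosomal recessive pattern, so a child is affected only when both parents are carriers and both pass on the mutated copy. The sketch below assumes random pairing of parents and a 1 in 4 chance per child of two carriers; these are standard textbook assumptions rather than statements from the presentation.

```python
from fractions import Fraction

carrier_frequency = Fraction(1, 50)               # from the notes above
p_both_parents_carriers = carrier_frequency ** 2  # assumes random pairing
p_affected_given_carriers = Fraction(1, 4)        # autosomal recessive

expected_incidence = p_both_parents_carriers * p_affected_given_carriers
print(f"Expected incidence: 1 in {expected_incidence.denominator}")
```

The result, 1 in 10,000 births, falls inside the quoted range of 1 in 8,000 to 11,000, so the two figures are consistent with each other.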
Will require decentralised / embedded (on-device) computation
Real-time processing
Connectivity with the Cloud
According to Gartner, data science is the discipline of extracting nontrivial knowledge from (often complex and voluminous) data in order to improve decision making. It involves a variety of core steps, ranging from business and data understanding, through data preparation and modeling/optimization/simulation, to testing and then final deployment into the business environment.