How 2019 became the year FAIR landed in biopharmaceutical R&D

Kees van Bochove, Founder, The Hyve
@keesvanbochove
#PharmaTec19
London, 24 Sep 2019

Outline
1. FAIR Data is about people
2. The data lake is a passing phase
3. Relational data models are back

The Hyve
We advance biology and medical research…
… by building and serving thriving open source communities.
Services
Professional support for
open source software in
biomedical informatics
➢Software development
➢Data engineering
➢Consultancy
➢Hosting / SLAs
Core values
Share
Reuse
Specialize
Office Locations
Utrecht, The Netherlands
Cambridge, MA, United States
Customer Segments
Pharma
Life Sciences
Healthcare
Fast-growing
Started in 2012
40+ people by now

FAIR Data is
about people
Statement #1
@keesvanbochove @TheHyveNL

The roots of FAIR
► Public-private partnership to advance:
  ► Open Science
  ► Sustainability & reuse of data
► Workshop in Leiden in 2014:
  ► Towards a Modular Blueprint ‘Floor-plan’ of a safe and fair Data Stewardship, Trading and Routing environment, provisionally called the Data FAIRPORT
https://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602

FAIR Workshop at The Hyve in Utrecht, 2018
http://blog.thehyve.nl/blog/highlights-from-pistoia-alliances-fair-workshop
https://www.sciencedirect.com/science/article/pii/S1359644618303039

FAIR Data Principles <> People
● GO-CHANGE: socio-cultural changes around working together on data: it’s about connecting people to each other’s data
● GO-TRAIN: promote awareness of FAIR and teach best practices on how to make your data available to others
● GO-BUILD: provide the infrastructure that supports this change
● Goes by many names: digital transformation, data-driven, FAIR, silo-breaking etc., but the result is improved (scientific) collaboration

Why resilience to change matters
● Domain changes and focus shifts: new data types,
applications etc.
● Organizational changes: M&A, re-orgs, people
moving roles etc.
● Technology changes: new software and hardware
platforms, analysis methods, automation, ML/AI etc.

Let’s look at one of the 15 principles as an example
Findable:
F1. (meta)data are assigned a globally
unique and persistent identifier;
F2. data are described with rich metadata;
GO-CHANGE
● Adapt information processes to systematically
acquire, capture and persist metadata
GO-TRAIN
● Work with data and domain experts to define
important metadata to capture for all datasets
GO-BUILD
▶ Choose a widely accepted, easy-to-produce machine-readable format for describing metadata (hint: RDFa, JSON-LD etc.)
▶ Master metadata management services
FAIR Maturity Indicators
● F2A Structured Metadata
● F2B Grounded Metadata

FAIR Data is
about people
Statement #1
● Connecting people to
each other’s data
● Changing processes
● Supporting change

The classical monolith
Diagram labels: Enterprise Data Warehouse; ETL (×3); Business Intelligence / Analytics

The modern (?) monolith
Ingest
Self-service
Pipelines
Analytics
Enterprise Data Lake
Teams: Ingestion Team, Data Engineering Team, Unification Team, Search Team, Platform API Team, Analytics Team
Architectural division
Axis of change

Decentralized data management
● IRI / identifier schemes
● Metadata standards
● Provenance standards
CDO
Data Federation
Oncology
Neuroscience
Development
ClinOps
HCS
Omics platforms
Data science
Preclinical
ADME/Tox
Biomarker dev.
RWD
Epidemiology
● Catalog function
● Data standards
● Entities / data sets
Publish
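
The publish step in this picture could be sketched roughly as follows: teams keep their data where it is and register only metadata and a link with a central catalog. The class names `DatasetEntry` and `Catalog`, the field names, and every IRI and URL are hypothetical and serve only to illustrate the division of responsibilities.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata a team publishes; the data itself stays with the team."""
    iri: str            # identifier following the central IRI scheme
    owner_team: str
    title: str
    access_url: str     # where the owning team serves the data
    standards: tuple = ()


@dataclass
class Catalog:
    """Central catalog function: holds metadata and links only."""
    entries: dict = field(default_factory=dict)

    def publish(self, entry):
        # Reject duplicate identifiers so each IRI names exactly one dataset.
        if entry.iri in self.entries:
            raise ValueError(f"IRI already registered: {entry.iri}")
        self.entries[entry.iri] = entry

    def find(self, standard):
        """Return all entries that declare the given data standard."""
        return [e for e in self.entries.values() if standard in e.standards]


catalog = Catalog()
catalog.publish(DatasetEntry(
    iri="https://data.example.org/dataset/clinops-001",
    owner_team="ClinOps",
    title="Illustrative trial dataset",
    access_url="https://clinops.example.org/data/clinops-001",
    standards=("CDISC SDTM",),
))
catalog.publish(DatasetEntry(
    iri="https://data.example.org/dataset/rwd-001",
    owner_team="RWD",
    title="Illustrative observational extract",
    access_url="https://rwd.example.org/data/rwd-001",
    standards=("OMOP CDM v5",),
))
print([e.owner_team for e in catalog.find("OMOP CDM v5")])
```

Nothing is ingested centrally here: the catalog answers "who has data in this standard, and where", and access still goes through the owning team.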

Advantages of a decentralized FAIR approach
● More resilient to change: no dependency on large central functions
● Allows for an iterative data strategy operationalization (no ‘big bang’
data lake delivery needed, FAIRification can start today and locally)
● No need to shuffle people around to start a big data lake project:
embed informatics and data experts directly in the research and
development teams
● Centralize only standardization functions, decentralize the rest → empower teams to do their own data science and informatics
● Embrace usage of external data and collaborations, no need to
‘ingest first’ via a central function, but use & link directly

The data lake is a
passing phase
Statement #2
● Centralization is a
potential bottleneck and
a barrier for change
● The solution is in
decentralization of
storage, applications etc.
● Standards management
and data federation as
central functions

Teams at The Hyve: open source communities
Research Data Management
● FAIR Data Governance consultancy
● Fairspace (meta)data management
Genomics
● Cancer data portal: cBioPortal
● Knowledge base: Open Targets
Health Data Networks
● Data warehouses: tranSMART, i2b2
● Cohort selection: Glowing Bear
● Request Portals: Podium
Real World Data
● Real world evidence: OMOP/OHDSI
● Wearables platform: RADAR-BASE

FAIR Services at The Hyve
● Semantic modelling: creating (meta)data models that allow traversal of
linked data
● Data conformance: choose the right data standard for specific problems,
align with community standards to maximize benefits from the open
science communities and precompetitive collaborations
● Data landscape: create an understanding of existing applications and
data sources in the company and readiness for FAIR
● FAIRification: get started with FAIRifying datasets, defining metadata,
appropriate standards, provenance etc.
● Data catalog: build collaborative environment around data catalog (e.g.
using Fairspace)

Example: OMOP CDM v5 for RWE/RWD
● Observational
healthcare
data
● Fields defined
per domain
● Standardized
Vocabularies
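
To make the three points above concrete, here is a heavily reduced sketch of the OMOP pattern in an in-memory SQLite database: domain tables hold concept IDs, and a vocabulary table resolves them to names. Only a handful of columns are shown, and the inserted rows (including the concept IDs and names) are illustrative values that should be checked against the actual Standardized Vocabularies before any real use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced subset of OMOP CDM v5 columns, for illustration only.
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    domain_id     TEXT NOT NULL,
    vocabulary_id TEXT NOT NULL
);
CREATE TABLE person (
    person_id         INTEGER PRIMARY KEY,
    gender_concept_id INTEGER NOT NULL,
    year_of_birth     INTEGER NOT NULL
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER NOT NULL,
    condition_concept_id    INTEGER NOT NULL,
    condition_start_date    TEXT NOT NULL
);
""")

# Illustrative rows; verify concept IDs against the real vocabularies.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (8532, "FEMALE", "Gender", "Gender"),
        (201826, "Type 2 diabetes mellitus", "Condition", "SNOMED"),
    ],
)
conn.execute("INSERT INTO person VALUES (1, 8532, 1970)")
conn.execute("INSERT INTO condition_occurrence VALUES (1, 1, 201826, '2019-01-15')")

# Resolve the coded fields to human-readable names via the vocabulary table.
rows = conn.execute("""
    SELECT p.person_id, g.concept_name, c.concept_name, co.condition_start_date
    FROM condition_occurrence AS co
    JOIN person  AS p ON p.person_id = co.person_id
    JOIN concept AS c ON c.concept_id = co.condition_concept_id
    JOIN concept AS g ON g.concept_id = p.gender_concept_id
""").fetchall()
print(rows)
```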

cBioPortal: hard to resist value proposition
● 4000+ citations
in literature
● ~20k+ unique
users per
month
● Local instances
deployed in
many pharma
companies
and cancer
centers

Relational data
models are back
Statement #3
● RDBMS abandoned in favor
of NoSQL, ‘schemaless’,
‘we use ElasticSearch’ etc.
● But some applications need
strong (relational)
semantics (e.g. CDISC)
● Descriptions can be in
relational db (e.g. OMOP),
RDF, JSON-LD etc.
● Underlying infrastructure
doesn’t matter as long as it
does not leak abstractions
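
One way to read "strong (relational) semantics" is that the store itself refuses data that violates the model. The sketch below contrasts a SQLite schema with a foreign key against a plain list of dictionaries standing in for a schemaless store. The `subject` and `visit` tables are hypothetical and are not taken from CDISC or any other standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE subject (
    subject_id TEXT PRIMARY KEY
);
CREATE TABLE visit (
    visit_id   INTEGER PRIMARY KEY,
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    visit_date TEXT NOT NULL
);
""")

conn.execute("INSERT INTO subject VALUES ('S-001')")
conn.execute(
    "INSERT INTO visit (subject_id, visit_date) VALUES ('S-001', '2019-09-24')"
)

# A visit for an unknown subject is rejected by the relational model...
try:
    conn.execute(
        "INSERT INTO visit (subject_id, visit_date) VALUES ('S-999', '2019-09-24')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# ...while a schemaless store (here just a list of dicts) accepts it silently.
document_store = []
document_store.append({"subject_id": "S-999", "visit_date": "2019-09-24"})

print(rejected, len(document_store))
```

The same guarantee could be expressed with other technology, for example shape or schema validation over RDF or JSON-LD; the point of the slide is the semantics, not the database engine.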

We advance biology and medical
sciences by building and serving
thriving open source communities
