The document outlines plans for the VODAN Africa FAIR data project. It discusses the FAIR principles of findability, accessibility, interoperability, and reusability and how they will guide the project. The architecture will include tools like CEDAR for machine-readable data production and a triple store for exposing metadata. An initial minimum viable product will integrate clinical data with DHIS2 to validate the approach before full deployment.
2. Outline
FAIR Principles, FAIR Metrics and FAIRness
Internet of FAIR Data and Services
VODAN Africa Overarching Architecture
MVP and Deployment Plan
3. FAIR
A set of guiding principles and practices to find, access, interoperate, and reuse digital objects.
They were designed with data-driven and machine-assisted open science in mind.
They are universal in scope, and some of the principles will need to be refined for particular communities, such as medical informatics.
They aim to guide scientific data management and stewardship, and are relevant to all stakeholders in the digital health ecosystem.
4. FAIR Principles
• Findable: Datasets should be described, identified and registered or indexed in a clear and unequivocal manner.
• Accessible: Datasets should be accessible through a clearly defined access procedure (in line with the GDPR), ideally using automated means. Metadata should always remain accessible.
• Interoperable: Data and metadata are conceptualised, expressed and structured using common, published standards.
• Reusable: Characteristics of data and their provenance are described in detail according to domain-relevant community standards, with clear and accessible conditions for use.
5. FAIR Principles
Findable:
F1. (meta)data are assigned a globally unique and persistent identifier (e.g., DOI, PURL)
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource (e.g., Google indexing)
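The F1 and F3 principles lend themselves to simple automated tests. The Python sketch below is a rough, syntax-only illustration: the record layout (the `identifier` and `describes` keys), the DOI pattern, and the example URIs are hypothetical, and no syntactic test can confirm that an identifier will actually persist.

```python
import re

# Hypothetical patterns: a bare DOI ("10.<registrant>/<suffix>") or any http(s) URI.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
HTTP_URI_PATTERN = re.compile(r"^https?://\S+$")


def is_persistent_identifier(identifier: str) -> bool:
    """Rough, syntax-only F1 check; it cannot verify that the identifier persists."""
    return bool(DOI_PATTERN.match(identifier) or HTTP_URI_PATTERN.match(identifier))


def check_findable(metadata: dict, data_identifier: str) -> dict:
    """Return pass/fail results for F1 and F3 on a metadata record (hypothetical keys)."""
    return {
        # F1: the record carries a globally unique, resolvable identifier.
        "F1": is_persistent_identifier(metadata.get("identifier", "")),
        # F3: the metadata explicitly names the identifier of the data it describes.
        "F3": metadata.get("describes") == data_identifier,
    }


record = {
    "identifier": "https://doi.org/10.1234/vodan-example",  # hypothetical DOI
    "title": "COVID-19 admissions (example)",
    "describes": "https://example.org/data/covid19-admissions",  # hypothetical data URI
}
print(check_findable(record, "https://example.org/data/covid19-admissions"))
```

F2 and F4 are left out on purpose: "rich" metadata and registration in a searchable resource need human judgement or an external index to assess.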
Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol (e.g., HTTP/S)
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary (e.g., certificates, API keys)
A2. metadata are accessible, even when the data are no longer available
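Principles A1 and A1.2 can be illustrated with Python's standard `urllib.request`. This sketch only builds the request: it asks for RDF (Turtle) through HTTP content negotiation and, where needed, attaches a credential. The bearer-token scheme and the example URL are assumptions; the actual authentication mechanism depends on the server.

```python
import urllib.request
from typing import Optional


def build_metadata_request(identifier_url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET that retrieves metadata by its identifier (A1)."""
    # Ask the server for RDF in Turtle syntax via standard HTTP content negotiation.
    headers = {"Accept": "text/turtle"}
    if api_key is not None:
        # A1.2: assumed bearer-token scheme; the real mechanism depends on the server.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(identifier_url, headers=headers, method="GET")


request = build_metadata_request("https://example.org/fdp/dataset/1", api_key="secret")
print(request.get_method(), request.full_url, request.get_header("Accept"))
# Sending it would look like:
#     with urllib.request.urlopen(request) as response:
#         turtle = response.read().decode("utf-8")
```

Because HTTP is open and universally implemented (A1.1), the same request could be issued from any language or tool.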
Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation (e.g., RDF)
I2. (meta)data use vocabularies that follow FAIR principles (e.g., BioPortal/OntoPortal)
I3. (meta)data include qualified references to other (meta)data
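As an illustration of I1 to I3, the sketch below serialises a few metadata statements as RDF in the N-Triples syntax using only the Python standard library. It borrows the published DCAT and Dublin Core vocabularies; the dataset and catalogue IRIs are hypothetical placeholders, and a real deployment would more likely use an RDF library such as rdflib.

```python
from typing import Iterable, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCTERMS = "http://purl.org/dc/terms/"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialise (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


dataset = "https://example.org/dataset/covid19-admissions"  # hypothetical IRI
catalog = "https://example.org/catalog/vodan"  # hypothetical IRI
triples = [
    (iri(dataset), iri(RDF_TYPE), iri(DCAT + "Dataset")),  # I1/I2: shared vocabulary
    (iri(dataset), iri(DCTERMS + "title"), literal("COVID-19 admissions")),
    (iri(dataset), iri(DCTERMS + "isPartOf"), iri(catalog)),  # I3: qualified reference
]
print(to_ntriples(triples))
```

Using terms from published vocabularies rather than local field names is what lets another system interpret `dcat:Dataset` or `dcterms:title` without a bespoke mapping.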
Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license (e.g., Creative Commons)
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
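R1.1 and R1.2 can likewise be checked mechanically. In this hedged sketch, the licence allow-list, the provenance field names (`creator`, `created`, `generated_by`) and the example record are all assumptions chosen for illustration, not a community standard.

```python
# Hypothetical allow-list of licence URIs; a project would choose its own.
KNOWN_LICENSES = {
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
}
# Hypothetical provenance fields that must be present and non-empty.
PROVENANCE_FIELDS = ("creator", "created", "generated_by")


def check_reusable(metadata: dict) -> dict:
    """Return pass/fail results for R1.1 (licence) and R1.2 (provenance)."""
    return {
        "R1.1": metadata.get("license") in KNOWN_LICENSES,
        "R1.2": all(metadata.get(name) for name in PROVENANCE_FIELDS),
    }


record = {
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": "Example Clinic",  # hypothetical values throughout
    "created": "2020-06-01",
    "generated_by": "CEDAR template v1",
}
print(check_reusable(record))
```

Referring to a licence by a stable URI, rather than free text, is what keeps R1.1 machine-checkable.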
6. FAIR Principles
FAIRness
• It is often useful to assess the extent to which a resource (data or metadata) follows the FAIR principles.
• FAIR metrics are a specific type of metadata: they tell us about the FAIRness of a dataset.
• FAIRness applies to all phases of the data life cycle.
FAIR Metrics Group: http://www.fairmetrics.org
[Figure: FAIRification, FAIR metrics, FAIRness]
Note: some metrics may not be applicable to certain types of resources from a given community.
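One simple way to summarise FAIRness, sketched here under our own assumptions rather than as the FAIR Metrics Group's method, is the share of applicable metrics a resource passes. Marking a metric as `None` excludes it, reflecting the note that some metrics do not apply to every resource.

```python
from typing import Dict, Optional


def fairness_score(results: Dict[str, Optional[bool]]) -> float:
    """Fraction of applicable metrics that pass; None marks 'not applicable'."""
    applicable = [passed for passed in results.values() if passed is not None]
    if not applicable:
        raise ValueError("no applicable metrics to score")
    return sum(applicable) / len(applicable)


results = {"F1": True, "F3": True, "A1": True, "A2": None, "R1.1": False}
print(fairness_score(results))  # 3 of 4 applicable metrics pass -> 0.75
```

Excluding non-applicable metrics, instead of counting them as failures, avoids penalising a resource for tests that make no sense in its community.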
7. FAIR Principles
The scalable and transparent ‘routing’ of data, tools, and compute (to run the tools on) is a central feature of the envisioned Internet of FAIR Data & Services (IFDS).
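The idea of routing tools to data can be sketched as a registry that resolves a dataset identifier to the node hosting it and runs the requested tool there. The `Node` and `Router` classes and the example identifier below are hypothetical; they only illustrate that the tool travels to the data and that just the result comes back.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Records = List[Dict[str, Any]]


@dataclass
class Node:
    """A hypothetical data station that hosts datasets and runs tools locally."""
    name: str
    datasets: Dict[str, Records] = field(default_factory=dict)

    def run(self, dataset_id: str, tool: Callable[[Records], Any]) -> Any:
        # The tool executes next to the data; only its return value leaves the node.
        return tool(self.datasets[dataset_id])


class Router:
    """Resolves dataset identifiers to the node that hosts them."""

    def __init__(self) -> None:
        self._index: Dict[str, Node] = {}

    def register(self, node: Node) -> None:
        for dataset_id in node.datasets:
            self._index[dataset_id] = node

    def route(self, dataset_id: str, tool: Callable[[Records], Any]) -> Tuple[str, Any]:
        node = self._index[dataset_id]  # KeyError if no node hosts the dataset
        return node.name, node.run(dataset_id, tool)


router = Router()
router.register(Node("clinic-a", {"https://example.org/dataset/1": [{"age": 34}, {"age": 51}]}))
print(router.route("https://example.org/dataset/1", len))  # ('clinic-a', 2)
```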
8. FAIR in VODAN Africa
The VODAN Africa FAIR data project depends on:
• Ensuring that data production increases understanding of the relevance of such data at the point of care and improves the quality of health care
• Replacing one-directional use of the data (away from where the data is produced) with a model of collaboration that benefits the data subjects
• Enhancing data stewardship capacity for data handling, processing, analytics and visualization, to serve the data producers
Objectives
• Enhance the Findability, Accessibility, Interoperability and Reusability of digital resources, such as patient data, for both humans and machines.
• Learn from data without the data leaving its source.
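The objective of learning from data without the data leaving its source can be illustrated with a federated count: each facility reduces its own records to summary counts, and only those counts are combined centrally. The `outcome` field and the example records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def local_summary(records: Iterable[Dict[str, str]]) -> Counter:
    """Runs inside a facility: reduces patient records to outcome counts."""
    return Counter(record["outcome"] for record in records)


def combine(summaries: List[Counter]) -> Counter:
    """Runs centrally: receives only per-facility counts, never the records."""
    total: Counter = Counter()
    for summary in summaries:
        total.update(summary)
    return total


facility_a = [{"outcome": "recovered"}, {"outcome": "recovered"}, {"outcome": "deceased"}]
facility_b = [{"outcome": "recovered"}]
print(combine([local_summary(facility_a), local_summary(facility_b)]))
```

Counts are additive, so the central total equals what pooling the raw records would give, while the records themselves never move.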
9. Architecture
Flexible and agile machine-readable data production, with templates seamlessly integrated into the data flows in clinics and hospitals.
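Template-driven data capture can be sketched as validating each clinical record against a declared set of fields before it enters the data flow. The template below is a hypothetical, heavily simplified stand-in; real CEDAR templates use a richer model built on JSON Schema and JSON-LD, which this sketch does not reproduce.

```python
from typing import Any, Dict, List

# Hypothetical, simplified template: field name -> expected type, whether it is
# required, and (optionally) the permitted values.
TEMPLATE: Dict[str, Dict[str, Any]] = {
    "admission_date": {"type": str, "required": True},
    "age_years": {"type": int, "required": True},
    "sars_cov2_test_result": {
        "type": str,
        "required": False,
        "allowed": {"positive", "negative", "pending"},
    },
}


def validate(record: Dict[str, Any], template: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the record fits the template."""
    errors: List[str] = []
    for name, spec in template.items():
        if name not in record:
            if spec["required"]:
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            errors.append(f"{name}: expected {spec['type'].__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            errors.append(f"{name}: {value!r} is not an allowed value")
    # Reject fields the template does not declare, so every value has a defined meaning.
    errors.extend(f"unknown field: {name}" for name in record if name not in template)
    return errors


print(validate({"admission_date": "2020-05-01", "age_years": "forty"}, TEMPLATE))
```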
10. Architecture
CEDAR: a workbench used to produce machine-readable data.
Bulk Upload: uploading bulk datasets from different sources.
• COVID-19
HMIS: push aggregated data from CEDAR to the HMIS.
• DHIS2
FDP (FAIR Data Point): enables data visiting by exposing (meta)data globally in a machine-readable format.
• Triple Store (AllegroGraph)
Internal Dashboard: aggregated data visualisation at each facility.
External Dashboard: aggregated data visualisation at VODAN Africa level.
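The HMIS step (pushing aggregated data to DHIS2) can be sketched as building a payload for the DHIS2 Web API `dataValueSets` resource. The organisation unit, period and data element identifiers below are hypothetical placeholders, and the counts stand in for facility-level aggregation; the exact import options depend on the DHIS2 instance.

```python
import json
from collections import Counter
from typing import Dict, Mapping


def build_data_value_set(
    org_unit: str, period: str, counts: Mapping[str, int], element_ids: Dict[str, str]
) -> dict:
    """Shape facility-level counts as a DHIS2 dataValueSets import payload."""
    return {
        "orgUnit": org_unit,  # DHIS2 organisation unit UID (the facility)
        "period": period,  # DHIS2 period string, e.g. "202006" for June 2020
        "dataValues": [
            # One data value per aggregated indicator; values are serialised as strings here.
            {"dataElement": element_ids[key], "value": str(count)}
            for key, count in sorted(counts.items())
        ],
    }


counts = Counter(["recovered", "recovered", "deceased"])  # aggregated at the facility
payload = build_data_value_set(
    "OU_CLINIC_A",  # hypothetical organisation unit UID
    "202006",
    counts,
    {"recovered": "DE_RECOVERED", "deceased": "DE_DECEASED"},  # hypothetical data element UIDs
)
print(json.dumps(payload, indent=2))
# The payload would then be POSTed as JSON to <DHIS2 base URL>/api/dataValueSets.
```

Only aggregated counts appear in the payload, which matches the architecture's split between record-level data in CEDAR and aggregates in the HMIS.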
28. VODANA GLOSSARY
FAIR: Findable, Accessible, Interoperable, Reusable.
GO BUILD: one of the three GO FAIR pillars; it deals with building the technical infrastructure.
GO CHANGE: one of the three GO FAIR pillars; it aims to instigate cultural change to make the FAIR principles a working standard in science and to reform reward systems to incorporate open science activities.
GO TRAIN: one of the three GO FAIR pillars; it is about training data stewards capable of providing FAIR data services.
IFDS: Internet of FAIR Data & Services.
Machine-actionability: the capacity of computational systems to find, access, interoperate, and reuse data with no or minimal human intervention.
Metadata: information about the data, such as the protocol used to create the data or the type of molecules that are the focus of the study.
Ontology: can be roughly described as a vocabulary with hierarchies, meaningful relations among concepts, and their constraints.
Persistent identifier: An identifier is a sequence of characters that identifies an entity. The term ‘persistent identifier’ is usually used in the context of digital objects that are accessible over the Internet. Typically, such an identifier is not only persistent but also actionable, i.e., it is a Uniform Resource Identifier (URI), usually of type http/s, that you can paste into a web browser address bar to be taken to the identified resource.
RDF: Resource Description Framework, a standard model for data interchange on the Web.
URI: Uniform Resource Identifier, an actionable sequence of characters that identifies an entity (see also: persistent identifier).
Vocabulary: a computer-readable file that
captures terms, their URIs, and descriptions