Presentation by Luiz Olavo Bonino about the current state of the developments on FAIR Data supporting tools at the Dutch Techcentre for Life Sciences Partners Event on November 3-4 2016.
6. Jan 2014
SWT
1st Skunkworks hackathon
Maastricht - NL
W3C DCAT - FAIR Profiles
Apr 2014 Sept 2014
FAIR Data Principles
@ FORCE 11
7. Mar 2015 Apr 2015 Aug 2015
DFDET
ODEX4all project
SWT
2nd Skunkworks hackathon
Hinxton - UK
FAIR Profiles, Beacons,
Molgenis
DFDET
Released first beta version of
ORKA
DTL
8. Sept 2015 Feb 2016 Jun 2016
The FAIR Guiding Principles
paper on Scientific Data *
SWT
Skunkworks @ Biohackathon
Final version of the Principles
First attempt FAIR Projection
DFDET
Starts the work on
FAIR Data Point
DFDET + SWT
FAIR Data Point paper **
* http://www.nature.com/articles/sdata201618
** http://www.iste.co.uk/index.php?f=a&ACTION=View&id=1073
Mar 2016
SWT + DFDET + others
Starts work group
on FAIR metrics for
data and services
9. Sept 2016
DFDET
Starts the work on the
FAIR Data Search Engine
and on the FAIRifier.
FAIR Data Point incorporates
RML
Oct 2016 Nov 2016
SWT + DFDET
FAIR Technologies paper
LDP, LDF, RML,
FAIR Projectors
DFDET
First FAIR Hackathons with
Molgenis, Castor EDC,
RDRF and OSSE
FAIR Data Point
DFDET
FAIR Hackathons with
Mendeley and Quaero Systems
FAIR Data Point
DFDET
FAIR Data Point v. 1.0
FAIR Data Point
DFDET + SWT
FAIR Data workshop
@ ECCB 2016
The Hague - NL
15. FAIR DATA POINT
A particular class of FAIR Data System that provides access to
datasets in a FAIR manner. The datasets can be external or
internal to the FAIR Data Point. Also, the source data can be a
non-FAIR dataset or a FAIR Data Resource. If the source data is
non-FAIR, the FAIR Data Point needs to make the necessary FAIR
transformations on the fly.
30. FAIR Data Point metadata
Title
Responsible institution(s)
Contact
FAIR API version
License
…
31. FDP METADATA
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp> dct:alternative "DTL FDP"@en ;
    dct:description "The DTL FAIR Data Point hosts the FAIR Data versions of datasets that have been made FAIR during BYODs as well as other relevant life sciences datasets"@en ;
    dct:subject "FAIR Data" , "Life Sciences" ;
    dct:title "DTL FAIR Data Point"@en ;
    <http://www.re3data.org/schema/3-0#api> <http://dtls.nl/fdp#api=1> ;
    <http://www.re3data.org/schema/3-0#catalog> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/biobank> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/comparativeGenomics> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/patient-registry> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/textmining> ,
        <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/transcriptomics> ;
    <http://www.re3data.org/schema/3-0#institution> <http://dtls.nl> ;
    <http://www.re3data.org/schema/3-0#institutionCountry> <http://lexvo.org/id/iso3166/NL> ;
    <http://www.re3data.org/schema/3-0#lastUpdate> "2016-10-27"^^xsd:date ;
    <http://www.re3data.org/schema/3-0#software> "FAIR Data Point" ;
    <http://www.re3data.org/schema/3-0#startDate> "2016-10-27"^^xsd:date ;
    a <http://www.re3data.org/schema/3-0#Repository> ;
    rdfs:label "DTL FAIR Data Point"@en ;
    <http://xmlns.com/foaf/0.1/landingpage> <http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html> .
32. FAIR Data Point metadata
Catalog metadata
Title
Theme taxonomy
Issued date
…
40. FAIR Data Point metadata
Catalog 1 metadata
Catalog 2 metadata
Dataset 1 metadata
  Distribution 1.a metadata
    Data record metadata
  Distribution 1.b metadata
Dataset 2 metadata
  Distribution 2.a metadata
    Data record metadata
  Distribution 2.b metadata
Dataset 3 metadata
  Distribution 3.a metadata
    Data record metadata
42. METADATA LAYERS
Layer | Description | Example | Standard
FDP (Data repository) | Information about the FDP as a data repository | PID, title, description, license, owner, API version, etc. | RE3Data
Catalog | Information about the catalog of datasets offered | PID, title, description, publisher, etc. | W3C DCAT #Catalog
Dataset | Information about each of the offered datasets | Publisher, issue date, theme, etc. | W3C DCAT #Dataset
Distribution | Information about how the dataset is distributed | accessURL, downloadURL, format, mediaType, etc. | W3C DCAT #Distribution
Data record | Information about the actual data, types, identifiers, etc. | Data item types, identifiers, domain, range, etc. | RML, OAI-PMH
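The nesting of the metadata layers can be illustrated with a small Python sketch. It is not the FAIR Data Point implementation: the class names, field names and example.org URLs are assumptions chosen for illustration, and the data record layer is left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    # Distribution layer: how the dataset can be obtained
    title: str
    access_url: str
    media_type: str


@dataclass
class Dataset:
    # Dataset layer: one offered dataset and its distributions
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    # Catalog layer: a collection of datasets
    title: str
    datasets: List[Dataset] = field(default_factory=list)


@dataclass
class FairDataPoint:
    # Repository layer: the FAIR Data Point itself
    title: str
    api_version: str
    catalogs: List[Catalog] = field(default_factory=list)


def list_access_urls(fdp: FairDataPoint) -> List[str]:
    """Walk repository -> catalog -> dataset -> distribution and collect access URLs."""
    return [
        dist.access_url
        for catalog in fdp.catalogs
        for dataset in catalog.datasets
        for dist in dataset.distributions
    ]


# Hypothetical content; the URLs are placeholders, not real endpoints.
fdp = FairDataPoint(
    title="Example FAIR Data Point",
    api_version="1",
    catalogs=[
        Catalog(
            title="Biobank",
            datasets=[
                Dataset(
                    title="Dataset 1",
                    publisher="Example institution",
                    distributions=[
                        Distribution("Dataset 1 as Turtle", "http://example.org/d1.ttl", "text/turtle"),
                        Distribution("Dataset 1 as CSV", "http://example.org/d1.csv", "text/csv"),
                    ],
                )
            ],
        )
    ],
)

access_urls = list_access_urls(fdp)
```

A client that only knows the address of the FAIR Data Point can follow the same path, from repository metadata down to a distribution, to find out where the data can be retrieved.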
43. DEMO FAIR DATA POINT
API: http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/swagger-ui.html
GUI: http://dev-vm.fair-dtls.surf-hosted.nl:8082/fdp/
51. FAIRIFICATION PROCESS
Retrieve original data
Dataset identification and analysis
Definition of the semantic model
Data transformation
License assignment
Metadata definition
FAIR Data resource (data, metadata, license) deployment
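The steps above can be sketched as a tiny Python pipeline. This is a simplified illustration under stated assumptions, not the FAIRifier itself: the CSV content, the column-to-predicate mapping, the example.org base IRI and all function names are hypothetical, and triples are kept as plain tuples instead of being serialised to RDF.

```python
import csv
import io

# Hypothetical non-FAIR source data (step: retrieve original data)
RAW_CSV = "id,name\n1,BRCA1\n2,TP53\n"

# Hypothetical semantic model: maps a column to the predicate IRI that describes it
SEMANTIC_MODEL = {"name": "http://www.w3.org/2000/01/rdf-schema#label"}

# Placeholder base IRI used to mint subject identifiers
BASE_IRI = "http://example.org/gene/"


def retrieve(raw: str) -> list:
    """Parse the original data into rows (dataset identification and analysis)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list, model: dict, base_iri: str, id_column: str) -> list:
    """Apply the semantic model, producing (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = base_iri + row[id_column]
        for column, predicate in model.items():
            triples.append((subject, predicate, row[column]))
    return triples


def fairify(raw: str, model: dict, license_iri: str, metadata: dict) -> dict:
    """Bundle transformed data, license and metadata into one resource."""
    rows = retrieve(raw)
    triples = transform(rows, model, BASE_IRI, "id")
    return {"data": triples, "license": license_iri, "metadata": metadata}


resource = fairify(
    RAW_CSV,
    SEMANTIC_MODEL,
    "https://creativecommons.org/publicdomain/zero/1.0/",
    {"title": "Example gene list"},
)
```

The returned dictionary mirrors the three parts of a FAIR Data resource named in the last step (data, metadata, license); deployment to a FAIR Data Point is outside the scope of this sketch.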
54. FAIRIFIER
Transform non-FAIR datasets into FAIR Data Resources
(dataset in FAIR format, license and metadata)
Data munging
Semantic modeling
License definition
Metadata definition and extraction
Data publication
58. FAIRIFICATION - NEW DATASET TYPE
[Diagram. Elements: FAIR Data Resource, FAIR Data Model Registry, Non-FAIR - FAIR mapping. Actions: submit, generate, store.]
59. FAIRIFICATION - RECURRING DATASET TYPE
[Diagram. Elements: FAIR Data Resource, FAIR Data Model Registry, Non-FAIR - FAIR mapping. Actions: submit, generate, query, retrieve.]
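The difference between a new and a recurring dataset type can be sketched in Python as a lookup that falls back to creating and storing a mapping. This is one reading of the two diagrams, not the FAIR Data Model Registry's actual interface: the class, its method names and the example mapping are assumptions.

```python
class ModelRegistry:
    """Hypothetical in-memory stand-in for the FAIR Data Model Registry."""

    def __init__(self):
        self._mappings = {}

    def store(self, dataset_type, mapping):
        # New dataset type: keep the non-FAIR to FAIR mapping for later reuse
        self._mappings[dataset_type] = mapping

    def query(self, dataset_type):
        # Recurring dataset type: return the stored mapping, or None if unknown
        return self._mappings.get(dataset_type)


def get_mapping(registry, dataset_type, build_mapping):
    """Reuse a stored mapping when one exists; otherwise build and store one."""
    mapping = registry.query(dataset_type)
    if mapping is not None:
        return mapping, "recurring"
    mapping = build_mapping()
    registry.store(dataset_type, mapping)
    return mapping, "new"


calls = []


def build_patient_mapping():
    # Stands in for the manual semantic modelling done the first time
    calls.append("built")
    return {"patient_id": "http://example.org/vocab/patientIdentifier"}


registry = ModelRegistry()
first = get_mapping(registry, "patient-registry-export", build_patient_mapping)
second = get_mapping(registry, "patient-registry-export", build_patient_mapping)
```

The modelling effort is spent once per dataset type: the second submission of the same type retrieves the stored mapping instead of repeating the work.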
60. A particular class of FAIR Data System to provide
support for data interoperability;
Supports publication and access to FAIR data;
Fosters an ecosystem of applications and services;
Federated architecture: different FAIRports (and other
FAIR Data Systems) are interconnectable;
Supports citations of datasets and data items;
Provides metrics for data usage and citation.
63. Allow third-party annotation on existing knowledge bases
Capture the provenance of the annotator and the original
statement
Open RDF Knowledge Annotator (ORKA)
67. TOOLS ROADMAP
Dec 16 Jan 17 Feb 17 Mar 17
FAIR Data Point
  Version 1: Metadata editor, release metadata, POST, FAIR accessor
  Version 1.1: Reintroduce OAI-PMH compliance
  Version 1.2: Update notification
FAIR Data Search Engine
  Beta 1: Crawler, metadata index and search GUI
  Beta 2: Improved search GUI, search API
FAIRifier
  Beta 1: OpenRefine + RDF plugin, publication to FAIR Data Point
  Beta 2: Metadata definition and extraction (RML), license picker
68. TOOLS ROADMAP
Dec 16 Jan 17 Feb 17 Mar 17
FAIR Data Model Registry
  Alpha 1: Start of the integration work
ORKA
  Beta 1: Definition of 2-3 use cases
  Beta 2: Extended with features required by the use cases
Data FAIRport
  Alpha 1: Start of the integration work
71. FAIR HACKATHON - GOALS
Align solutions with FAIR Data Point specifications.
Metadata content
API
Data
72. FAIR HACKATHON OUTCOME
FAIR data model for solutions content;
Architecture of the required adjustments/extensions;
Technical specification of the adjustments/extensions;
Proof-of-concept of the adjusted solution;
77. DTL’S FAIR HACKATHONS ROADMAP
EUDAT (pilot project ongoing)
EGA (July 6-8 2016)
Molgenis (Oct 19-20 2016)
Patient registry solution providers (Oct 25-27 2016)
Mendeley (Nov 18 2016)
Quaero Systems (Nov 24 2016)
tranSMART (TBD)
phenotypeDB (TBD)
Euretos Knowledge Platform (TBD)
NIH, Australian National Data Services, Brazilian open government
data, …
78. BRING YOUR OWN DATA - BYOD
Goals:
■ Learn how to make data linkable “hands-on” with experts
■ Create a “telling story” to demonstrate its use
■ Make FAIR Data at the source
Composition:
■ Data owners – specialists on given datasets
■ Data interoperability experts
■ Domain experts
Source: Marco Roos
82. BYOD Planning
Preparation
Identify Plan
Datasets
Attendees' profile
Output data access
Tentative dates
Tentative venue
Costs
Funds
Coordination
Set date
Invite attendees
Set venue
Catering
Lodging
Financial planning
Publicity
Working document
Preparatory calls
Data hosting
Software hosting
Documentation hosting
83. BYOD Planning
Execution
Day One
Introduction
SW, LD, Ontology intro
Use case intro
Workgroups division
Working sessions
WWW/TTTALA
Day Two
Progress report
Working sessions
Groups reports
WWW/TTTALA
Day Three
Data integration
Answer driving question
Explore data
Demo improvement
Final report
WWW/TTTALA
One point to stress is that data in the functionally interlinkable format has the sole purpose of facilitating data integration and interoperability. This doesn’t mean that the data in this format should be used for other purposes. Once the datasets are integrated and the scientific questions have been answered, further processing may be necessary to streamline analysis of the selected integrated datasets by transforming the data into a format that is optimal for the intended analysis.
Stores FAIRification information for different dataset types.
In summary, in a BYOD we take non-FAIR datasets and, with the expertise of data owners, data experts and domain specialists, produce functionally interlinkable FAIR data by combining the data with the appropriate ontologies. These FAIR datasets can then be integrated more easily, answering questions that could not be answered with the isolated datasets. Such questions tend to be richer and more complex, fostering richer knowledge discovery.