Mark D. Wilkinson
CBGP-UPM/INIA, Madrid
markw@illuminae.com
A novel, API-free approach to
interoperability leads to FAIRness for
legacy and prospective data.
IBC Scientific Days
January 17-18, 2017
The Problem
“...one recent survey of 18 microarray studies found that only two were fully reproducible using the archived data. Another study of 19 papers in population genetics found that 30% of analyses could not be reproduced from the archived data and that 35% of datasets were incorrectly or insufficiently described.”
Dominique G. Roche, Loeske E. B. Kruuk,
Robert Lanfear, Sandra A. Binning (2015)
http://dx.doi.org/10.1371/journal.pbio.1002295
“We surveyed 100 datasets associated with nonmolecular studies in journals that commonly publish ecological and evolutionary research and have a strong PDA policy. Out of these datasets, 56% were incomplete, and 64% were archived in a way that partially or entirely prevented reuse.”
Dominique G. Roche, Loeske E. B. Kruuk,
Robert Lanfear, Sandra A. Binning (2015)
http://dx.doi.org/10.1371/journal.pbio.1002295
Is that data, therefore... Useless?
NO!!
It’s Reuseless!

h.t. to Barend Mons
FAIR: The Four Principles
Findable
→ Globally unique, resolvable, and persistent identifiers
→ Machine-actionable contextual information supporting discovery
Accessible
→ Clearly defined access protocol
→ Clearly defined rules for authorization/authentication
Interoperable
→ Use shared vocabularies and/or ontologies
→ Syntactically and semantically machine-accessible format
Reusable
→ Be compliant with the F, A, and I Principles
→ Contextual information, allowing proper interpretation
→ Rich provenance information facilitating accurate citation
“Skunkworks”
Task: Build a prototype
Skunkworks Participants
Mark Wilkinson
Michel Dumontier
Barend Mons
Tim Clark
Jun Zhao
Paolo Ciccarese
Paul Groth
Erik van Mulligen
Luiz Olavo Bonino da Silva Santos
Matthew Gamble
Carole Goble
Joël Kuiper
Morris Swertz
Erik Schultes
Mercè Crosas
Adrian Garcia
Philip Durbin
Jeffrey Grethe
Katy Wolstencroft
Sudeshna Das
M. Emily Merrill
The Hourglass Concept
We want a large ecosystem of apps that use FAIR Data.
We want to support a wide range of source providers.
The FAIR solution between them must be THIN!
Skunkworks participants had tons of experience with metadata around scholarly publication:
RDA, Force11, Dataverse, Research Objects, NanoPubs, Semantic Science, SADI, AlzForum, SWAN, LSID, …
There was very little disagreement about F, about A, or about R.
The “I” is the big problem
Interoperability is Hard!!
Keeping the history brief
A series of teleconferences led to the concept of putting metadata into a nested series of ~identical “containers”
Skunkworks Hackathons
The “containers of containers of containers” idea was refined by the conviction that we should also reject any solution that required a new API.
ProgrammableWeb.com already catalogues >16,000 different Web APIs.
APIs DO NOT MAKE YOU INTEROPERABLE!
Are there existing standards that are … and have the properties of …?
LDP (W3C Linked Data Platform)
Useful Features (FAIR facets: I; I + R; F + A; I)
→ Uses machine-accessible standards and representations, following a REST paradigm
→ Defines the concept of a “Container”: a machine-actionable way to represent repositories, data deposits, data files, data points, and their metadata
→ Defines HTTP-resolvable URIs for each of these containers
→ Uses a widely accepted standard (DCAT) to relate metadata to data → machine-actionable data mining
The FAIR Accessor
In incremental detail
What can we describe with FAIR Accessors?
FAIR Accessors provide a machine-actionable, structured, REST-oriented way to publish Metadata about a wide range of scholarly “entities”:
Warehouses (e.g. EBI)
Databases (e.g. UniProt)
Repositories (e.g. Zenodo, INRA-URGI Wheat Repo, UniProt)
Datasets (e.g. output from a workflow)
Research Objects (data and/or workflow and/or results and/or publications)
Data “slices” (e.g. the result of a database query)
Data Records (e.g. image, Excel file, patient clinical record)
Other…
What does a FAIR Accessor “look like”?
Container Resource --HTTP GET--> <FAIR metadata/>
    Contains: MetaRecordResource1, MetaRecordResource2, MetaRecordResource3, ...
MetaRecordResource3 --HTTP GET--> <FAIR metadata/>
    foaf:primaryTopic Record R
    dcat:Distribution_1: Source URL_U1, format rdf+xml
    dcat:Distribution_2: Source URL_U2, format application/xml
Container Resource: a “resource” is a URI / URL.
There is a URI for the “Container” (of any of the kinds listed in the previous slide).
Resources are manipulated using the HTTP protocol on the Resource URI.
For the FAIR Accessor, the only HTTP method we currently require is HTTP GET.
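Because the only required operation is HTTP GET, a client needs nothing beyond a standard HTTP library. A minimal sketch using Python's standard library follows; the Container URL is a hypothetical placeholder, and asking for `text/turtle` in the Accept header is an assumption (which RDF serializations are offered is up to each Accessor's server).

```python
import urllib.request

# Hypothetical placeholder; substitute a real FAIR Accessor Container URL.
CONTAINER_URL = "http://example.org/accessor/container"


def build_get(url: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build (without sending) an HTTP GET for a Resource URI.

    The Accept header asks for one RDF serialization; the server decides
    which serializations it can actually return.
    """
    return urllib.request.Request(url, method="GET", headers={"Accept": media_type})


def fetch(url: str, media_type: str = "text/turtle", timeout: float = 10.0) -> str:
    """Resolve a Resource URI with HTTP GET and return the response body as text."""
    with urllib.request.urlopen(build_get(url, media_type), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    req = build_get(CONTAINER_URL)
    # Shows what would be sent; call fetch(CONTAINER_URL) to actually resolve it.
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```

No Accessor-specific client library or endpoint description is involved; the same two functions work for the Container and for every Resource it lists.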
What is returned is a document full of metadata richly describing that Container (warehouse, database, dataset, slice, etc.), and a list of Resources (URIs) that represent the contained “things”.
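To make “a list of Resources” concrete, here is a sketch that pulls the contained Resources out of a Container document. It assumes the Container lists its members with `ldp:contains` (the LDP membership predicate) and that the document is served as N-Triples; the tiny parser handles only simple statements, and a real client would use a full RDF parser such as rdflib instead. The sample URIs are hypothetical.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"

# One N-Triples statement with IRI subject and predicate, and an IRI or a
# plain quoted literal as object. Deliberately minimal: no blank nodes,
# escape sequences, language tags or datatypes.
_STATEMENT = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.\s*$'
)


def parse_ntriples(text: str) -> List[Triple]:
    """Parse the simple statements in `text`; unrecognized lines are skipped."""
    triples: List[Triple] = []
    for line in text.splitlines():
        m = _STATEMENT.match(line)
        if m:
            s, p, o_iri, o_lit = m.groups()
            triples.append((s, p, o_iri if o_iri is not None else o_lit))
    return triples


def contained_resources(text: str, container: str) -> List[str]:
    """URIs of the "things" a Container lists via ldp:contains, in document order."""
    return [o for s, p, o in parse_ntriples(text) if s == container and p == LDP_CONTAINS]


SAMPLE = """\
<http://example.org/c> <http://purl.org/dc/terms/title> "Example container" .
<http://example.org/c> <http://www.w3.org/ns/ldp#contains> <http://example.org/c/r1> .
<http://example.org/c> <http://www.w3.org/ns/ldp#contains> <http://example.org/c/r2> .
"""

if __name__ == "__main__":
    print(contained_resources(SAMPLE, "http://example.org/c"))
```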
Looking more closely at one of those contained things...
The contained thing is a Resource.
That Resource can be resolved by HTTP GET to retrieve a Metadata document describing that resource (e.g. a single record).
Which record does this Metadata describe? The foaf:primaryTopic attribute defines this.
Using the metadata structures defined by DCAT, the FAIR Accessor may also tell you how to get the content of the record, and what formats are available.
In this case, the record is available in XML format by calling HTTP GET on URL_U2.
Or in RDF format by calling HTTP GET on URL_U1
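An agent reading those DCAT distributions only has to pick the one whose format it can use. A minimal sketch, assuming each distribution has already been reduced to an access URL plus a media type (in DCAT these would typically come from `dcat:accessURL`/`dcat:downloadURL` and `dcat:mediaType`); the URLs are hypothetical stand-ins for URL_U1 and URL_U2.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Distribution:
    """One dcat:Distribution of a record: where to GET it, and in which format."""
    access_url: str   # would come from dcat:accessURL / dcat:downloadURL
    media_type: str   # would come from dcat:mediaType


def choose_distribution(dists: Sequence[Distribution],
                        preferred: Sequence[str]) -> Optional[Distribution]:
    """Pick the distribution matching the earliest acceptable media type.

    Returns None when nothing matches, including when the record advertises
    no distributions at all.
    """
    for wanted in preferred:
        for d in dists:
            if d.media_type == wanted:
                return d
    return None


# Record R from the slide, with hypothetical URLs standing in for URL_U1 / URL_U2.
RECORD_R = [
    Distribution("http://example.org/URL_U1", "application/rdf+xml"),
    Distribution("http://example.org/URL_U2", "application/xml"),
]

if __name__ == "__main__":
    print(choose_distribution(RECORD_R, ["text/turtle", "application/rdf+xml"]))
```

The caller states its preferences in order, so an RDF-aware agent and an XML-only tool can share the same metadata and simply end up at different URLs.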
Or you may add additional layers...
Metadata
Metadata
Metadata
Metadata
DATA - format 1
DATA - format 2
Features of the FAIR Accessor
1: There is no API
GET → Interpret the Metadata → Select the desired Resource → GET
ANY Web agent can explore/index a FAIR Accessor (e.g. Google).
An agent that understands globally accepted vocabularies can explore it “intelligently”.
It’s difficult to get thinner than nothing...
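The GET → interpret → select → GET loop is small enough to write out in full. The sketch below assumes metadata documents have already been parsed into triples, that Containers use `ldp:contains`, and that record-level documents state `foaf:primaryTopic`; `fetch` is injected so the example runs against an in-memory, hypothetical Accessor instead of the network.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"


def crawl(start: str, fetch: Callable[[str], List[Triple]]) -> Dict[str, str]:
    """Explore an Accessor with nothing but GET plus shared vocabularies.

    Returns {metarecord_url: primary_topic} for every MetaRecord reachable
    from `start` through ldp:contains links.
    """
    found: Dict[str, str] = {}
    seen = set()
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        for s, p, o in fetch(url):           # GET, then interpret the metadata
            if s != url:
                continue
            if p == LDP_CONTAINS:            # select a contained Resource...
                queue.append(o)              # ...and GET it on a later iteration
            elif p == FOAF_PRIMARY_TOPIC:    # this document describes record `o`
                found[url] = o
    return found


# Hypothetical in-memory Accessor: one Container holding two MetaRecords.
ACC = "http://example.org/acc"
FAKE_WEB: Dict[str, List[Triple]] = {
    ACC: [(ACC, LDP_CONTAINS, ACC + "/r1"), (ACC, LDP_CONTAINS, ACC + "/r2")],
    ACC + "/r1": [(ACC + "/r1", FOAF_PRIMARY_TOPIC, "http://example.org/record/1")],
    ACC + "/r2": [(ACC + "/r2", FOAF_PRIMARY_TOPIC, "http://example.org/record/2")],
}

if __name__ == "__main__":
    print(crawl(ACC, lambda url: FAKE_WEB.get(url, [])))
```

Swapping the lambda for a real HTTP fetch plus an RDF parser is the only change needed to point this at a live Accessor; the traversal logic itself relies on nothing but two widely used predicates.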
Features of the FAIR Accessor
2: Identifiers for unidentified/unidentifiable things
HTTP GET --> <FAIR metadata/>
    This is the ArrayExpress query I did for paper doi:10/1234.56
    Results: MetaRecordResource1, MetaRecordResource2, MetaRecordResource3, ...
This should assist with reproducibility and transparency.
Features of the FAIR Accessor
3: A predictable “place” for metadata
PrimaryTopic: record 1A445
Record Metadata...
DATA - format 1
DATA - format 2
Different “kinds” of metadata have distinct ontological types, and
distinct document structures. There is no ambiguity regarding
what the metadata is describing - a repository or a record.
Repository metadata
MetaRecordURL
Features of the FAIR Accessor
3: Symmetry & predictable path to citation
XXX
Part of dataset XXX
Metadata...
DATA - format 1
DATA - format 2
The record metadata contains an “upward” link to the Repository-
level metadata, which should contain license and citation
information
Repository metadata:
Cite: doi:10/8847.384
License: cc-by
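Following the “upward” link until citation or license information appears can be sketched as below. The predicate for the upward link is an assumption (`dct:isPartOf` is used here), as are `dct:license` and `dct:bibliographicCitation` for the repository-level fields; the record and dataset URIs are hypothetical, and the CC BY 4.0 URL is only one possible reading of “cc-by”.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
DCT = "http://purl.org/dc/terms/"
IS_PART_OF = DCT + "isPartOf"            # assumed predicate for the "upward" link
LICENSE = DCT + "license"
CITATION = DCT + "bibliographicCitation"


def find_upward(url: str, prop: str, fetch: Callable[[str], List[Triple]],
                max_hops: int = 5) -> Optional[str]:
    """Return the first value of `prop`, starting at `url` and following
    dct:isPartOf links upward; None if no level states it."""
    current: Optional[str] = url
    for _ in range(max_hops):
        if current is None:
            break
        triples = fetch(current)
        values = [o for s, p, o in triples if s == current and p == prop]
        if values:
            return values[0]
        parents = [o for s, p, o in triples if s == current and p == IS_PART_OF]
        current = parents[0] if parents else None
    return None


# Hypothetical two-level example mirroring the slide: a record that is
# "Part of dataset XXX", whose repository-level metadata holds cite/license.
RECORD = "http://example.org/record/1A445"
REPO = "http://example.org/dataset/XXX"
DOCS: Dict[str, List[Triple]] = {
    RECORD: [(RECORD, IS_PART_OF, REPO)],
    REPO: [
        (REPO, CITATION, "doi:10/8847.384"),
        (REPO, LICENSE, "https://creativecommons.org/licenses/by/4.0/"),
    ],
}

if __name__ == "__main__":
    print(find_upward(RECORD, LICENSE, lambda u: DOCS.get(u, [])))
```

`max_hops` simply guards against cyclic or very deep link chains in badly formed metadata.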
Features of the FAIR Accessor
4: Granularity of Access/Privacy/Security
Container
Resource HTTP GET
<FAIR metadata/>
Contains
<<184 Records>>
Contact Mark Wilkinson for more information about these records
Features of the FAIR Accessor
Container HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource3
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:distribution
<<NONE>>
HTTP GET
4: Granularity of Access/Privacy/Security
Container HTTP GET
<FAIR metadata/>
Contains
MetaRecordResource3
...
MetaRecord
Resource3
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
Source URL_U1
format rdf+xml
HTTP GET
Features of the FAIR Accessor
4: Granularity of Access/Privacy/Security
Thin solution - if it’s private, do nothing! Literally!
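“Do nothing” works because a client simply finds no distribution to follow. A sketch of how an agent might report that, assuming distributions hang off the MetaRecord via `dcat:distribution` (the subject choice is a simplification) and using hypothetical URIs:

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"


def distributions(metarecord: str, triples: Iterable[Triple]) -> List[str]:
    """Distributions advertised for a MetaRecord; possibly none at all."""
    return [o for s, p, o in triples if s == metarecord and p == DCAT_DISTRIBUTION]


def access_summary(metarecord: str, triples: Iterable[Triple]) -> str:
    dists = distributions(metarecord, triples)
    if not dists:
        # Private data: the publisher simply left dcat:distribution out.
        # The record is still findable and citable; there is just nothing to GET.
        return "metadata only"
    return f"{len(dists)} distribution(s) available"


if __name__ == "__main__":
    mr = "http://example.org/acc/r3"
    print(access_summary(mr, []))
    print(access_summary(mr, [(mr, DCAT_DISTRIBUTION, "http://example.org/dist/1")]))
```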
The Real Thing
A working FAIR Accessor
Serving a “Slice” of UniProt
A real-world scenario...
You are publishing a paper describing the
evolution of proteins in the RNA Processing
machineries of the fungus Aspergillus nidulans.
You want to be a good scholarly publisher
interested in transparency and reproducibility
So you must describe, in detail, the inclusion/exclusion
criteria for selecting proteins for your dataset
(today, this is generally done either in the text of the
paper, or not at all...)
The query that returns the relevant proteins
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?protein ?id   # projection assumed; the slide shows only the WHERE clause
WHERE
{
?protein a up:Protein .
?protein up:organism ?organism .
?organism rdfs:subClassOf taxon:162425 .
?protein up:classifiedWith ?go .
?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> .
bind(replace(str(?protein),
"http://purl.uniprot.org/uniprot/", "", "i") as ?id)
}
taxon:162425 = NCBI Taxonomy: Aspergillus nidulans
GO_0006396 = Gene Ontology: RNA Processing
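Written down this precisely, the inclusion criteria can also be turned into a single resolvable URL via the SPARQL 1.1 Protocol (“query via GET”, with the URL-encoded query in the `query` parameter). A sketch follows; the endpoint address is an assumption and should be checked against UniProt's current documentation, and the `SELECT` projection is the same assumed one as above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: UniProt's public SPARQL endpoint address; verify before use.
ENDPOINT = "https://sparql.uniprot.org/sparql"

QUERY = """PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?protein ?id
WHERE {
  ?protein a up:Protein .
  ?protein up:organism ?organism .
  ?organism rdfs:subClassOf taxon:162425 .
  ?protein up:classifiedWith ?go .
  ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> .
  BIND(REPLACE(STR(?protein), "http://purl.uniprot.org/uniprot/", "", "i") AS ?id)
}"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol "query via GET": the URL-encoded query goes in
    the `query` parameter, so the whole slice has a single resolvable URL."""
    return endpoint + "?" + urlencode({"query": query})


def query_from_url(url: str) -> str:
    """Recover the exact query text from such a URL (e.g. for a methods section)."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, QUERY))
```

Such a URL identifies the query, not a frozen result: the answer can change as UniProt is updated, which is one reason to also publish Accessor metadata (date, creator, license) about the slice.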
Create and publish a FAIR Accessor for that query
http://linkeddata.systems/Accessors/UniProtAccessor
Container Resource --HTTP GET--> <FAIR metadata/>
    Contains: MetaRecordResource1, MetaRecordResource2, MetaRecordResource3, ...
Resolve the URI (in software or in your browser).
It returns a page of metadata (in this example, in RDF).
Note that this Metadata is about ME! I am the creator of this dataset, and may be credited for it.
Step down to individual Record metadata
MetaRecordResource3 --HTTP GET--> <FAIR metadata/>
    foaf:primaryTopic Record R
    dcat:Distribution_1: Source URL_U1, format rdf+xml
    dcat:Distribution_2: Source URL_U2, format application/xml
Software calls HTTP GET on the URL representing the MetaRecord Resource for the desired record in the Container (or you can just click on it, or type it into your browser).
The document that is returned:
Note the change in metadata focus! This metadata is about the UniProt Record (not about Mark Wilkinson).
The record described in this metadata was created by UniProt, so the citation and authorship information is now THEIRS, not MINE.
Symmetrical link back upward to the Accessor Container Resource, for additional metadata.
Two ways to retrieve the record: RDF or HTML (in REST-speak, two Representations of that Resource).
Note that this metadata record is somewhat more FAIR than what you can (easily) retrieve from UniProt itself! For example, the UniProt record does not include the citation or license information; you have to manually surf around the UniProt Web page to find that.
So the Accessor makes UniProt’s already notably FAIR data even more FAIR (with respect to “R”).
How FAIR are we now?
What does the Accessor give us?
What we have achieved
[F] We have created a FAIR record for something (i.e. a slice of a database) that was, historically, un-recordable and un-identifiable in any formal way.
[F + R] Accessors are a standard approach to providing human- and machine-accessible metadata to facilitate appropriate discovery (contextual, biological), proper usage (license), and proper citation for any kind of data.
[F + A] The discovery, accessibility, and drill-down/up behaviors do not require any novel API; they simply rely on global Web standards, which allows them to be indexed by existing Web search engines.
[I] The metadata itself uses machine-accessible syntaxes and widely adopted ontologies and vocabularies, and thus integrates easily with other metadata.
[A] Accessors provide a lightweight means to protect privacy while still providing the maximum degree of transparency possible.
[+] Accessors can be static or dynamic; i.e. we can provide template Accessor file(s) that are edited in Notepad, then published together with the data, or Accessors can dynamically generate their output from code (e.g. layered on a database server).
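A “dynamic” Accessor can be as simple as a function that renders Container metadata from whatever record IDs a database or query returns. The sketch below emits Turtle; the vocabulary choices (`ldp:BasicContainer`, `dct:title`, `dct:license`, `ldp:contains`) and the `container_url/record_id` URL pattern are assumptions for illustration, not a description of the published Accessors.

```python
from typing import Iterable


def _turtle_string(value: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def container_turtle(container_url: str, title: str, license_url: str,
                     record_ids: Iterable[str]) -> str:
    """Generate Container-level Accessor metadata (Turtle) from record IDs."""
    lines = [
        "@prefix ldp: <http://www.w3.org/ns/ldp#> .",
        "@prefix dct: <http://purl.org/dc/terms/> .",
        "",
        f"<{container_url}> a ldp:BasicContainer ;",
        f"    dct:title {_turtle_string(title)} ;",
        f"    dct:license <{license_url}>",
    ]
    # Hypothetical URL pattern: one MetaRecord Resource per record ID.
    members = [f"<{container_url}/{rid}>" for rid in record_ids]
    if members:
        lines[-1] += " ;"
        lines.append("    ldp:contains " + ", ".join(members) + " .")
    else:
        lines[-1] += " ."
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(container_turtle("http://example.org/acc", "Demo slice",
                           "https://creativecommons.org/licenses/by/4.0/",
                           ["P12345", "Q67890"]))
```

The static variant is just the output of such a function saved to a file and published next to the data.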
So far, we have focused on
FAIR Metadata
Are there approaches to
making the DATA FAIR?
Making a Plant-related Resource FAIR
FAIR reformatting
of the plant component of the
Pathogen Host Interaction Database
(PHI-base)
Dr. Mikel Egaña Aranguren
Ontologist
Dr. Alejandro Rodríguez González
Database Expert
Dr. Alejandro Rodríguez Iglesias
(PhD student at the time)
Rodriguez-Iglesias A., Rodriguez-González A., Irvine AG., Sesma A., Urban M., Hammond-Kosack KE.,
Wilkinson MD. 2016. Publishing FAIR Data: An Exemplar Methodology Utilizing PHI-Base.
Frontiers in Plant Science 7.
Extract → Transform → Load
A “Brute Force” approach to FAIRness
Requires a ~comprehensive data/semantic model
The Plant Pathogen Interaction Ontology (PPIO)
Written in OWL2
Many of the Classes are defined by rich logical axioms
Designed for automated classification and enrichment of data
through logical reasoning
(e.g. if attached to a data stream)
Semantic Modeling of
Plant Pathogen Interaction Data
The Disease Triangle – Pathogen/Host/Environment
http://fyi.uwex.edu/fieldcroppathology/field-crops-fungicide-information/
This concept has evolved over decades of
domain-expert thought and discussion
Why not use this as the basis for our
Semantic model of Pathogenicity?
The Disease Triangle, Modelled as “Contexts”
Interaction Context
Environmental Context
Host Context
Pathogen Context
Resulting
phenotype
Interaction Context
Phenotype
Resistance phenotype
Susceptibility phenotype
Phenotypic Process
1. Abnormal growth development phenotype.
2. Color variation phenotype.
3. Tissue disintegration phenotype.
4. Vascular system damage phenotype.
Interaction Context: Phenotypic Process Branch
Environmental Context
manually extracted from the literature, e.g.:
A pathogen that enters through the stomata will be more successful in high humidity, and will have higher pathogenicity.
• Plant-pathogen interaction data, including:
  • Resulting phenotypes
  • Molecular/genetic basis of pathogenicity
  • Experimental approaches
  • Provenance information
• 4800 interactions
• 3300 gene-mutant records
• 220 pathogens
• 130 hosts
• 261 registered diseases
• 1700 references
[Figure: Interaction Context model. An Interaction Context [WT] and an Interaction Context [mutant], each with Host, Pathogen and Environmental Contexts, a Description, a Protocol, and a Citation (PubMed ID). Example values: “Historical observation”, “Base state”, “reduced virulence”, “soft rot”, “gene deletion”, “PMID:1234”.]
Rodríguez-Iglesias A. et al., Front. Plant Sci. (2016)
[Figure: Pathogen Context model. Pathogen Context → Allele → Gene, with Locus ID, Gene function, Gene name and Gene accession. Example values: “AEQ95741”, “Effector protein”, “TAL2G”, “G7TJZ8”. Annotations: http://identifiers.org/*/* (identifiers) and rdfs:label (human-readable values).]
Rodríguez-Iglesias A. et al., Front. Plant Sci. (2016)
KEY MESSAGE:
Because we use identifiers.org URIs,
and AgroLD does also,
we can query our Pathogen Host
Interaction database, and
DYNAMICALLY RETRIEVE
additional information from AgroLD
with NO additional effort!!
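Why shared identifiers.org URIs make this “free”: the join is plain string equality on URIs, with no mapping table in between. The sketch below uses invented triples; the PHI-base-side predicate and the AgroLD-side data are hypothetical stand-ins, and in practice the same join would typically be written as a SPARQL 1.1 federated query (`SERVICE`).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def enrich(local: Iterable[Triple],
           remote: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """For each URI used as an object in `local` that is also a subject in
    `remote`, collect the remote (predicate, object) pairs about it.
    The join is plain string equality on URIs; no mapping table needed."""
    remote_by_subject: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in remote:
        remote_by_subject[s].append((p, o))
    return {o: remote_by_subject[o] for _, _, o in local if o in remote_by_subject}


# Invented data: one interaction pointing at a gene product by identifiers.org URI.
GENE = "http://identifiers.org/uniprot/G7TJZ8"
PHI = [("http://example.org/phi/interaction/1", "http://example.org/ppio/hasPathogenGene", GENE)]
AGROLD = [(GENE, RDFS_LABEL, "example label"),
          ("http://example.org/unrelated", RDFS_LABEL, "ignored")]

if __name__ == "__main__":
    print(enrich(PHI, AGROLD))
```

If one side had minted a different URI for the same entity (say a `purl.uniprot.org` URI), the join would silently return nothing, which is the point of the key message.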
Transform PHI-base data into RDF compliant with the PPIO Ontology
Load into Virtuoso Triplestore
(this was a LOT of work!!)
…but are we now FAIR?
…Not really…
Findable: X
Accessible: HTTP GET, SPARQL, open access
Interoperable: RDF with published ontologies
Reusable: X
• How would you find this database?
• How would you know if anything interesting is in it?
• How would you (your machine) find a record?
• Who do you cite if you reuse a piece of data?
• What are the license conditions?
• Can I reuse the data at all??
Build a FAIR Accessor
http://linkeddata.systems/SemanticPHIBase/Metadata
Container Resource --HTTP GET--> <FAIR metadata/>
    Contains: MetaRecordResource1, MetaRecordResource2, MetaRecordResource3, ...
MetaRecordResource3 --HTTP GET--> <FAIR metadata/>
    foaf:primaryTopic SemPHI #1
    dcat:Distribution_1: Source URL_U1, format rdf+xml (the URL of the (RDF) record in Semantic PHI-base)
    dcat:Distribution_2: Source URL_U2, format HTML (the URL of the record in “native” PHI-base)
This allows us to find Semantic PHI-base
based on its Repository-level Metadata
“what kind of data does Semantic PHI-base Contain?”
“Does it have any information about my gene of interest?”
And “drill-down” to a record of interest
selected based on its Record Metadata
But… FAIR Accessors should be symmetrical
How do I go from data back “upwards” to metadata?
To allow the retrieval of the Metadata for any piece of data in Semantic PHI-base, use the URL of the FAIR Accessor (Container Resource) as the URL of the “Named Graph” in the triplestore, using RDF “Quads”:
SubjectURI  PredicateURI  ObjectURI  ContainerURL
Container URL --HTTP GET--> <FAIR metadata/>
    Contains: MetaRecordResource1, MetaRecordResource2, MetaRecordResource3, ...
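A small sketch of the quad trick: every statement carries its Container URL as the graph name, so going “upward” from any piece of data is a lookup on the fourth position (in a triplestore, roughly `SELECT DISTINCT ?g WHERE { GRAPH ?g { <subject> ?p ?o } }`). The Container URL is the one above; the data URIs are hypothetical.

```python
from typing import Iterable, NamedTuple, Set

CONTAINER = "http://linkeddata.systems/SemanticPHIBase/Metadata"


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # named graph: the URL of the FAIR Accessor Container


def metadata_for(subject: str, quads: Iterable[Quad]) -> Set[str]:
    """Container URLs (i.e. metadata entry points) for any statement about `subject`."""
    return {q.g for q in quads if q.s == subject}


def to_nquads(q: Quad) -> str:
    """Serialize an all-IRI quad as one N-Quads line."""
    return f"<{q.s}> <{q.p}> <{q.o}> <{q.g}> ."


if __name__ == "__main__":
    # Hypothetical statement about one interaction in Semantic PHI-base.
    q = Quad("http://example.org/phi/i1", "http://example.org/ppio/p",
             "http://example.org/phi/x", CONTAINER)
    print(to_nquads(q))
    print(metadata_for(q.s, [q]))
```

Resolving the returned Container URL with HTTP GET then yields the Accessor metadata, including citation and license.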
Findable
Accessible
Interoperable
Reusable
The Brute Force approach is…
a lot of work!
Worthwhile for community-critical
resources and databases
like AgroLD, UniProt, PHI-base, ChEMBL, etc.
Is there a more “elegant” & lightweight
way to be FAIR?
FAIR Projection:
Providing FAIR Data
from non-FAIR Data
Dynamically
This is going to be a bit complicated, but
please be patient
Imagine the data
we need to integrate
is in a CSV file
in FigShare or Zenodo
How do we discover and integrate that data?
Things we need to do:
We need a way to query “opaque” data blobs (like CSV) about their content
We need a way to retrieve that content in a FAIR format
We need, therefore, to model semantics for that opaque data content
We need to model various semantics for that content (one “size” doesn’t fit all!)
We need to associate those semantic models with a record or record-sets
We need a way to query those semantics to determine which “size” fits our requirements
We would like to reuse semantic definitions as much as possible
We need to do all of this without creating a new API :-)
Triple Pattern Fragments
+
RDF Mapping Language
Ruben Verborgh
Ghent University
Anastasia Dimou
Ghent University
Triple Pattern Fragments (TPF)
A REST interface for requesting/retrieving RDF Triples
(from any source)
Ruben Verborgh
“Slices” of data, from any source, are considered Resources
and are therefore represented by a distinct URL:
http://some.database.org/dataset?s=___;p=___;o=___
Calling HTTP GET on a TPF URL returns the set of Triples matching {?s, ?p, ?o}
PLUS hypermedia instructions and Resource URLs for other relevant slices.
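Building a fragment URL is just filling in the bound parts of the pattern. The sketch follows the slide's illustrative `s`/`p`/`o` parameter names but joins them with `&` and percent-encodes values; real TPF servers advertise their own parameter names through the hypermedia controls in each response, so a robust client should read them from there rather than hard-code them.

```python
from typing import Optional
from urllib.parse import urlencode


def tpf_url(base: str, s: Optional[str] = None, p: Optional[str] = None,
            o: Optional[str] = None) -> str:
    """URL of the fragment {s, p, o}; a component left as None is a wildcard."""
    params = {k: v for k, v in (("s", s), ("p", p), ("o", o)) if v is not None}
    return base + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    # The "BMI column" slice from the registry example
    print(tpf_url("http://my.registry.org/patients", p="CMO:0000105"))
```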
Triple Pattern Fragments (TPF)
A REST interface for retrieving RDF Triples
(from any source)
Ruben Verborgh
For example, the “BMI” column from a patient registry is a Resource with the URL:
http://my.registry.org/patients?p=CMO:0000105 (CMO:0000105 = “body mass index”)
HTTP GET gives me all BMI triples in the registry, together with other Resource URLs
representing other “slices” that might be useful, for example:
http://my.registry.org/patients?p=CMO:0000004 (CMO:0000004 = “systolic B.P.”)
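On the server side, answering a fragment request is a simple pattern match, and the “other useful slices” can be derived from the predicates present. A sketch over an invented toy registry (patient IDs and values are illustrative only):

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """All triples fitting the pattern {s, p, o}; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def other_slices(triples: Iterable[Triple], current_p: str) -> List[str]:
    """Predicates of other slices worth linking to from the current one."""
    return sorted({t[1] for t in triples if t[1] != current_p})


# Invented toy registry; values are illustrative only.
REGISTRY: List[Triple] = [
    ("patient:1", "CMO:0000105", "24.1"),
    ("patient:1", "CMO:0000004", "120"),
    ("patient:2", "CMO:0000105", "31.0"),
]

if __name__ == "__main__":
    print(match(REGISTRY, p="CMO:0000105"))       # the BMI slice
    print(other_slices(REGISTRY, "CMO:0000105"))  # e.g. systolic B.P.
```

Real TPF responses also page large results and include count metadata; those parts are omitted here.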
We have a standard, RESTful way to
request triples from any data source
i.e. every slice of every dataset will be considered a distinct Resource
→ simply call HTTP GET on that Resource to get the Triples
But...
We have no way to know what TPF Resources
are available for any given dataset
or what those Resources “are”
(proteins? genes? patients? articles?)
RML
A way to describe the structure of an RDF document
Anastasia Dimou
RML allows us to create models of (meta)data structures
“What could this data look like, if it were mapped to RDF?”
RML fulfills similar objectives to DCAT Profiles, the Dublin Core Application Profile, and ISO/IEC 11179 (Metadata Registries), but has added advantages!
http://rml.io/RMLmappingLanguage.html
Using RML to describe the structure and semantics of a single Triple
THE MODEL
Map1
    Subject Map: template “http://example.org/patient/{id}”, class ex:PatientRecord
    Predicate-Object Map: predicate ex:hasVariant; Object Map → Map2
Map2
    Subject Map: template “http://identifiers.org/dbsnp/{snp}”, class SO:0000694 (“SNP”)
We call this a “Triple Descriptor”. These are used to describe the structure of data “slices” in which all Triples have the same structure.
THE DATA
Patient:123 rdf:type ex:PatientRecord
Patient:123 ex:hasVariant snp:rs0020394
snp:rs0020394 rdf:type SO:0000694
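To see how such a model turns an opaque CSV blob into triples, here is a hand-coded equivalent of the Triple Descriptor on the slide applied to CSV rows. It does not parse RML; it only reproduces what an RML processor would be expected to emit for these two maps. The column names `id` and `snp` come from the templates, the expansion of `ex:` to `http://example.org/` is an assumption, and the CSV content is invented.

```python
import csv
import io
import re
from typing import Dict, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"                               # assumed expansion of ex:
SO_SNP = "http://purl.obolibrary.org/obo/SO_0000694"     # SO:0000694 ("SNP")

PATIENT_TEMPLATE = "http://example.org/patient/{id}"     # Map1 subject template
SNP_TEMPLATE = "http://identifiers.org/dbsnp/{snp}"      # Map2 subject template


def fill(template: str, row: Dict[str, str]) -> str:
    """Expand {column} references with percent-encoded values from a CSV row
    (roughly how R2RML/RML string templates build IRIs)."""
    return re.sub(r"\{([^}]+)\}", lambda m: quote(row[m.group(1)], safe=""), template)


def project(csv_text: str) -> List[Triple]:
    """Apply the two maps to every CSV row and collect the resulting triples."""
    triples: List[Triple] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        patient = fill(PATIENT_TEMPLATE, row)
        snp = fill(SNP_TEMPLATE, row)
        triples += [
            (patient, RDF_TYPE, EX + "PatientRecord"),
            (patient, EX + "hasVariant", snp),
            (snp, RDF_TYPE, SO_SNP),
        ]
    return triples


CSV_BLOB = "id,snp\n123,rs0020394\n"

if __name__ == "__main__":
    for t in project(CSV_BLOB):
        print(t)
```

Because every row yields triples of the same shape, one descriptor is enough to tell a client exactly what the whole projected slice will look like before it fetches anything.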
Where are we now?
TPF: a standard, RESTful way to request Triples
Triple Descriptors: a standard way to describe the structure and meaning of a Triple
We need a way to associate these with each other.
We need a way to associate these with a dataset or record.
Luckily, we have already solved this!
MetaRecord (Resource3)
HTTP GET returns:
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
  Source URL_U1
  format rdf+xml
dcat:Distribution_2
  Source URL_U2
  format application/xml
The FAIR Accessor can do this.
Using the metadata structures defined by DCAT, the
FAIR Accessor also tells you how to get the content of
the record, and what formats are available.
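On the client side, the distribution list is what lets software choose a usable copy of Record R without human help. Below is a minimal sketch assuming the MetaRecord has already been fetched and parsed into plain Python objects. `MetaRecord`, `Distribution`, and `pick_distribution` are hypothetical names, the URLs are placeholders, and the slide's "rdf+xml" is written as the registered media type `application/rdf+xml`.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    source: str      # access URL of this copy of the record (URL_U1, URL_U2, ...)
    media_type: str  # format of that copy


@dataclass
class MetaRecord:
    primary_topic: str  # foaf:primaryTopic, i.e. the record being described
    distributions: list = field(default_factory=list)


def pick_distribution(record, preferred):
    """Return the first distribution whose format matches the caller's
    preferences, tried in order; None if no format is acceptable."""
    for wanted in preferred:
        for d in record.distributions:
            if d.media_type == wanted:
                return d
    return None


record = MetaRecord(
    primary_topic="http://example.org/record/R",
    distributions=[
        Distribution("http://example.org/U1", "application/rdf+xml"),
        Distribution("http://example.org/U2", "application/xml"),
    ],
)

chosen = pick_distribution(record, ["text/turtle", "application/xml"])
print(chosen.source)  # http://example.org/U2
```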
If we consider the TPF Resource URL to be just another
DCAT Distribution, we get...
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
  Source URL_U1
  format rdf+xml
dcat:Distribution_2
  Source URL_U2
  format application/xml
dcat:Distribution_3
  Source TPF_URL
  format rdf+xml
(TPF_URL: the URL representing the Triple Pattern Fragment Resource)
Now add the Triple Descriptor:
<FAIR metadata/>
foaf:primaryTopic Record R
dcat:Distribution_1
  Source URL_U1
  format rdf+xml
dcat:Distribution_2
  Source URL_U2
  format application/xml
dcat:Distribution_3
  Source TPF_URL
  format rdf+xml
  Model: Triple_Desc_URL
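The Model link is what makes the Projector discoverable by meaning: an agent can check which triples a distribution will return before calling it. The sketch below assumes the Triple Descriptors have already been fetched (via HTTP GET on each Model URL) and reduced to the predicate each one describes. `Distribution`, `find_projectors`, and all URLs are hypothetical placeholders, not part of DCAT or RML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Distribution:
    source: str                  # access URL (URL_U1, URL_U2, TPF_URL, ...)
    media_type: str              # e.g. "application/rdf+xml"
    model: Optional[str] = None  # Triple Descriptor URL, present only on Projectors


def find_projectors(distributions, descriptors, predicate):
    """Return the Source URLs of distributions whose Model describes `predicate`.

    `descriptors` maps a Triple Descriptor URL to the predicate it describes.
    """
    return [
        d.source
        for d in distributions
        if d.model is not None and descriptors.get(d.model) == predicate
    ]


dists = [
    Distribution("http://example.org/U1", "application/rdf+xml"),
    Distribution("http://example.org/U2", "application/xml"),
    Distribution("http://example.org/tpf", "application/rdf+xml",
                 model="http://example.org/desc/hasVariant"),
]
descs = {"http://example.org/desc/hasVariant": "http://example.org/hasVariant"}

print(find_projectors(dists, descs, "http://example.org/hasVariant"))
```

Distributions without a Model (the first two) are still valid downloads; they simply cannot be matched by meaning this way.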
HTTP GET on Triple_Desc_URL returns the RML Triple Descriptor (the model).
Record + TPF Server + RML Model = FAIR Projector
HTTP GET on TPF_URL returns rdf+xml triples from Record R
that look like the model described by the Triple Descriptor.
Interoperability without Brute Force
I hear you objecting… I skipped something important!!!
We still have not defined a way to
CREATE these triples
How does this return Triples?
Sadly, there is no magic wand to create interoperability
Someone has to write the TPF server that converts the data
Interoperability will never come “for free”
(because semantics will never come “for free”)
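What "writing the TPF server that converts the data" involves can be sketched in a few lines: read the legacy record, turn each row into the triple the descriptor promises, and answer a triple-pattern query. This is a hypothetical illustration only (`project_csv` and the second CSV row are invented), with no HTTP layer, paging, or Hydra controls, all of which a real TPF server would add.

```python
import csv
import io


def project_csv(csv_text, subject_tmpl, predicate, object_tmpl,
                s=None, p=None, o=None):
    """Convert each CSV row into one triple and keep those matching the
    pattern (s, p, o); a None component matches anything."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triple = (subject_tmpl.format(**row), predicate, object_tmpl.format(**row))
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            matches.append(triple)
    return matches


data = "id,snp\n123,rs0020394\n456,rs0000002\n"  # hypothetical legacy record
hits = project_csv(
    data,
    "http://example.org/patient/{id}",
    "http://example.org/hasVariant",
    "http://identifiers.org/dbsnp/{snp}",
    s="http://example.org/patient/123",
)
for t in hits:
    print("<%s> <%s> <%s> ." % t)
```

The conversion runs at request time over the untouched source file, which is how a Projector can expose legacy data as triples without first republishing it.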
However, there are reasons for optimism!
1. Researchers transform data anyway to integrate it - this
is a daily routine in most bioinformatics labs
2. For the most common file formats (e.g. CSV or Excel),
there are RML-based tools to automate the RDF
transformation; simply create an RML model of what
you want, and ask the tool to convert the file.
3. Investing time into creating an RML model is more FAIR
than ad hoc “re-useless” brute-force transformation.
When you create a FAIR Projector for your own data
transformation needs, it is reusable!
4. RML Triple Descriptors are very simple (one triple!), so
we can also templatize their construction; creating a
FAIR Projector is therefore quite easy in many cases!
5. Citations Citations Citations!
FAIR Accessors/Projectors are FAIR objects - You can get
credit if other people use your Projector for their analyses
Summary of FAIR Projectors
FAIR Projectors provide a discoverable and standardized REST interface to
retrieve interoperable data, and its interoperable metadata (F + I + R)
A + I
I
FAIR Projectors can convert non-FAIR data into FAIR data, or can change
the structure, URL format, or semantics of existing FAIR data sources
FAIR Projectors can be deployed over, and provide a common interface to:
- Static Data Deposits, in any format, anywhere
- Databases
- Triplestores
- Certain (common) types of Web Services
Triple Descriptors are FAIR entities, intended for reuse (R+++)
And none of this required a new API
Siri…
I need data about the expression of the Oryza dwarf-1 gene under
high salt conditions.
Please find that data, regardless of location and format
If possible, please reformat it automatically to match my local dataset
Also, please collect the citation information for each piece of data
If the data is not under an open access license, or if any of the data is
behind a firewall or paywall, please provide me the contact
information of the data owner so that I can ask them for a copy.
The (near!) future of FAIR
Thanks to:
Michel Dumontier - Stanford Center for Biomedical Informatics Research, Stanford, California.
Ruben Verborgh – Ghent University – imec, Ghent, Belgium
Luiz Olavo Bonino da Silva Santos - Dutch Techcentre for Life Sciences, Utrecht, The Netherlands -
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.
Tim Clark - Department of Neurology, Massachusetts General Hospital Boston MA and Harvard Medical
School, Boston, MA, USA
Morris A. Swertz - Genomics Coordination Center and Department of Genetics, University Medical
Center Groningen, Groningen, The Netherlands
Fleur D.L. Kelpin - Genomics Coordination Center and Department of Genetics, University Medical
Center Groningen, Groningen, The Netherlands
Alasdair J. G. Gray - Department of Computer Science, School of Mathematical and Computer Sciences,
Heriot-Watt University, Edinburgh, UK
Erik A. Schultes - Department of Human Genetics, Leiden University Medical Center, The Netherlands
Erik M. van Mulligen - Department of Medical Informatics, Erasmus University Medical Center
Rotterdam, The Netherlands
Paolo Ciccarese - Perkin Elmer Innovation Lab, Cambridge MA and Harvard Medical School, Boston
MA, USA
Mark Thompson - Leiden University Medical Center, Leiden, The Netherlands
Jerven T. Bolleman - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical
Universitaire, Geneva, Switzerland
Thanks to my former lab members… I MISS YOU!!!
Dr. Mikel Egaña Aranguren
Ontologist
Dr. Alejandro Rodríguez González
Database Expert
Dr. Alejandro Rodríguez Iglesias
(PhD student at the time)
Funding for Mark Wilkinson from:
Fundacion BBVA and the UPM Isaac Peral programme, and the
Spanish Ministerio de Economía y Competitividad grant number
TIN2014-55993-R.
Additional support for FAIR Skunkworks members comes from:
European Union funded projects ELIXIR-EXCELERATE (H2020 no. 676559),
ADOPT BBMRI-ERIC (H2020 no. 676550)
CORBEL (H2020 no. 654248)
Netherlands Organisation for Scientific Research (Odex4all project)
Stichting Topconsortium voor Kennis en Innovatie High Tech Systemen en Materialen
(FAIRdICT project)
BBMRI-NL
RD-Connect and ELIXIR (Rare disease implementation study FP7 no. 305444).
You are not only welcome to share
and reuse this presentation...
...You are encouraged to!
http://tinyurl.com/IBC-FAIR
Some graphical elements were taken from slide templates provided by:
FGST (Free GoogleSlides Templates)


IBC FAIR Data Prototype Implementation slideshow

  • 1. Mark D. Wilkinson CBGP-UPM/INIA, Madrid markw@illuminae.com A novel, API-free approach to interoperability leads to FAIRness for legacy and prospective data. IBC Scientific Days January 17-18, 2017
  • 2. The Problem “...one recent survey of 18 microarray studies found that only two were fully reproducible using the archived data. Another study of 19 papers in population genetics found that 30% of analyses could not be reproduced from the archived data and that 35% of datasets were incorrectly or insufficiently described.” Dominique G. Roche, Loeske E. B. Kruuk, Robert Lanfear, Sandra A. Binning (2015) http://dx.doi.org/10.1371/journal.pbio.1002295
  • 3. The Problem “We surveyed 100 datasets associated with nonmolecular studies in journals that commonly publish ecological and evolutionary research and have a strong PDA policy. Out of these datasets, 56% were incomplete, and 64% were archived in a way that partially or entirely prevented reuse.” Dominique G. Roche, Loeske E. B. Kruuk, Robert Lanfear, Sandra A. Binning (2015) http://dx.doi.org/10.1371/journal.pbio.1002295
  • 4. The Problem Is that data, therefore... Useless?
  • 7. FAIR Findable → Globally unique, resolvable, and persistent identifiers → Machine-actionable contextual information supporting discovery Accessible → Clearly-defined access protocol → Clearly-defined rules for authorization/authentication Interoperable → Use shared vocabularies and/or ontologies → Syntactically and semantically machine-accessible format Reusable → Be compliant with the F, A, and I Principles → Contextual information, allowing proper interpretation → Rich provenance information facilitating accurate citation The Four Principles
  • 9. Skunkworks Participants: Mark Wilkinson, Michel Dumontier, Barend Mons, Tim Clark, Jun Zhao, Paolo Ciccarese, Paul Groth, Erik van Mulligen, Luiz Olavo Bonino da Silva Santos, Matthew Gamble, Carole Goble, Joël Kuiper, Morris Swertz, Erik Schultes, Mercè Crosas, Adrian Garcia, Philip Durbin, Jeffrey Grethe, Katy Wolstencroft, Sudeshna Das, M. Emily Merrill
  • 10. The Hourglass Concept We want a large ecosystem of apps that use FAIR Data
  • 11. The Hourglass Concept We want to support a wide range of source providers
  • 12. The Hourglass Concept The FAIR solution between them must be THIN!
  • 13. Skunkworks participants had tons of experience vis-à-vis metadata around scholarly publication
  • 14. Skunkworks participants had tons of experience vis-à-vis metadata around scholarly publication: RDA, Force11, Dataverse, Research Objects, NanoPubs, Semantic Science, SADI, AlzForum, SWAN, LSID, …
  • 15. There was very little disagreement about F, about A, or about R
  • 16. The “I” is the big problem
  • 18. Keeping the history brief A series of teleconferences led to the concept of putting metadata into an iterative set of ~identical “containers”
  • 19. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API ProgrammableWeb.com already catalogues >16,000 different Web APIs
  • 20. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API ProgrammableWeb.com already catalogues >16,000 different Web APIs APIs DO NOT MAKE YOU INTEROPERABLE!
  • 21. Skunkworks Hackathons The “containers of containers of containers” idea was elaborated by the belief that we should also reject any solution that required a new API
  • 22. Skunkworks Hackathons Are there existing standards that are And have the properties of ?
  • 24. LDP (Linked Data Platform) useful features (slide annotations: I, I + R, F + A, I): uses machine-accessible standards and representations, following a REST paradigm; defines HTTP-resolvable URIs for each of these containers; defines the concept of a “Container”, a machine-actionable way to represent repositories, data deposits, data files, data points, and their metadata; uses a widely accepted standard (DCAT) to relate metadata to data → machine-actionable data mining.
  • 26. The FAIR Accessor In incremental detail
  • 27. What can we describe with FAIR Accessors? FAIR Accessors provide a machine-actionable, structured, REST-oriented way to publish Metadata about a wide range of scholarly “entities”
  • 28. What can we describe with FAIR Accessors? Warehouses (e.g. EBI) Databases (e.g. UniProt) Repositories (e.g. Zenodo, INRA-URGI Wheat Repo, UniProt) Datasets (e.g. output from a workflow) Research Objects (data a/o workflow a/o results a/o publications) Data “slices” (e.g. the result of a database query) Data Records (e.g. image, excel file, patient clinical record) Other…
  • 29. HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? Container Resource
  • 30. (a “resource” is a URI / URL) HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”?
  • 32. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”?
  • 33. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? There is a URI for the “Container” (of any of the kinds listed in the previous slide)
  • 34. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? Resources are manipulated using the HTTP protocol on the Resource URI. For the FAIR Accessor, the only HTTP method we currently require is HTTP GET.
  • 35. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? What is returned is a document full of metadata richly describing that Container (warehouse, database, dataset, slice, etc.), and a list of Resources (URIs) that represent the contained “things”.
  • 36. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... What does a FAIR Accessor “look like”? Looking more closely at one of those contained things...
  • 37. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? The contained thing is a Resource
  • 38. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? That Resource can be resolved by HTTP GET
  • 39. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? To retrieve a Metadata document describing that resource (e.g. a single record)
  • 40. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? Which record does this Metadata describe? The foaf:primaryTopic property defines this.
  • 41. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? Using the metadata structures defined by DCAT, the FAIR Accessor may also tell you how to get the content of the record, and what formats are available.
  • 42. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? In this case, the record is available in XML format by calling HTTP GET on URL_U2.
  • 43. MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”? Or in RDF format by calling HTTP GET on URL_U1
  • 44. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET What does a FAIR Accessor “look like”?
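The two-level navigation just described can be sketched in Python, assuming each metadata document has already been parsed into (subject, predicate, object) tuples. Mapping the slide's “Contains”, “Source” and “format” labels onto ldp:contains, dcat:downloadURL and dcat:mediaType, and attaching dcat:distribution directly to the MetaRecord, are our assumptions; all example.org URLs are hypothetical.

```python
# Minimal sketch: a FAIR Accessor as already-parsed (subject, predicate, object) triples.
# Predicate IRIs are ASSUMED mappings of the slide's labels; example.org URLs are hypothetical.
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"
FOAF_PRIMARY_TOPIC = "http://xmlns.com/foaf/0.1/primaryTopic"
DCAT_DISTRIBUTION = "http://www.w3.org/ns/dcat#distribution"
DCAT_DOWNLOAD_URL = "http://www.w3.org/ns/dcat#downloadURL"
DCAT_MEDIA_TYPE = "http://www.w3.org/ns/dcat#mediaType"


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) appears in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def meta_records(container_triples, container_url):
    """Container level: the MetaRecord Resources it "Contains"."""
    return objects(container_triples, container_url, LDP_CONTAINS)


def primary_topic(record_triples, meta_record_url):
    """MetaRecord level: which record this metadata is about."""
    topics = objects(record_triples, meta_record_url, FOAF_PRIMARY_TOPIC)
    return topics[0] if topics else None


def source_for(record_triples, meta_record_url, media_type):
    """Source URL of the first distribution offered in media_type, or None."""
    for dist in objects(record_triples, meta_record_url, DCAT_DISTRIBUTION):
        if media_type in objects(record_triples, dist, DCAT_MEDIA_TYPE):
            urls = objects(record_triples, dist, DCAT_DOWNLOAD_URL)
            if urls:
                return urls[0]
    return None


# Example documents mirroring the diagram (all URLs hypothetical).
CONTAINER = "http://example.org/accessor"
MR3 = "http://example.org/accessor/record3"
container_doc = [(CONTAINER, LDP_CONTAINS, MR3)]
record_doc = [
    (MR3, FOAF_PRIMARY_TOPIC, "http://example.org/record/R"),
    (MR3, DCAT_DISTRIBUTION, "_:d1"),
    ("_:d1", DCAT_DOWNLOAD_URL, "http://example.org/record/R.rdf"),
    ("_:d1", DCAT_MEDIA_TYPE, "application/rdf+xml"),
    (MR3, DCAT_DISTRIBUTION, "_:d2"),
    ("_:d2", DCAT_DOWNLOAD_URL, "http://example.org/record/R.xml"),
    ("_:d2", DCAT_MEDIA_TYPE, "application/xml"),
]
```

Asking for "application/xml" returns URL_U2's stand-in, and asking for a format that is not offered returns None, so the agent can fall back to another distribution.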
  • 45. Or you may add additional layers... Metadata Metadata Metadata Metadata DATA - format 1 DATA - format 2
  • 46. Features of the FAIR Accessor 1: There is no API GET Interpret the Metadata Select the desired Resource GET
  • 47. Features of the FAIR Accessor 1: There is no API GET Interpret the Metadata Select the desired Resource GET ANY Web agent can explore/index a FAIR Accessor (e.g. Google) An agent that understands globally-accepted vocabularies can explore it “intelligently”
  • 48. Features of the FAIR Accessor 1: There is no API. It’s difficult to get thinner than nothing...
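Because the only operation is HTTP GET, a generic agent needs no Accessor-specific client. A minimal sketch of the GET → interpret → select → GET loop, with the fetch, interpret and select steps passed in as plain functions (an illustrative design of ours, not a published library) so the loop can be exercised offline:

```python
import urllib.request


def build_get(url, accept="text/turtle, application/rdf+xml;q=0.9"):
    """An ordinary HTTP GET with content negotiation; nothing Accessor-specific."""
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


def http_fetch(url):
    """Real network fetch for live use (not exercised in the example below)."""
    with urllib.request.urlopen(build_get(url)) as response:
        return response.read()


def explore(start_url, fetch, interpret, select):
    """GET -> interpret the metadata -> select the next Resource -> GET, until select returns None."""
    url, visited = start_url, []
    while url is not None and url not in visited:  # the visited check guards against cycles
        visited.append(url)
        url = select(interpret(fetch(url)))
    return visited


# Offline usage: a fake "Web" in which each document simply names the next Resource to follow.
pages = {
    "http://example.org/accessor": b"http://example.org/accessor/record3",
    "http://example.org/accessor/record3": b"http://example.org/record/R.xml",
    "http://example.org/record/R.xml": b"<record/>",
}
path = explore(
    "http://example.org/accessor",
    fetch=pages.__getitem__,
    interpret=bytes.decode,
    select=lambda text: text if text.startswith("http://") else None,
)
```

In live use, fetch would be http_fetch, and interpret/select would parse the returned RDF and pick a MetaRecord or distribution URL; the same loop works for any Accessor because nothing in it is Accessor-specific.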
  • 49. Features of the FAIR Accessor 2: Identifiers for unidentifi-ed/-able things HTTP GET <FAIR metadata/> This is the ArrayExpress query I did for paper doi:10/1234.56 Results: MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 50. Features of the FAIR Accessor 2: Identifiers for unidentifi-ed/-able things HTTP GET <FAIR metadata/> This is the ArrayExpress query I did for paper doi:10/1234.56 Results: MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... Should assist with reproducibility and transparency
  • 51. Features of the FAIR Accessor 3: A predictable “place” for metadata PrimaryTopic: record 1A445 Record Metadata... DATA - format 1 DATA - format 2 Different “kinds” of metadata have distinct ontological types, and distinct document structures. There is no ambiguity regarding what the metadata is describing - a repository or a record. Repository metadata MetaRecordURL
  • 52. Features of the FAIR Accessor 3: Symmetry & predictable path to citation XXX Part of dataset XXX Metadata... DATA - format 1 DATA - format 2 The record metadata contains an “upward” link to the Repository-level metadata, which should contain license and citation information. Repository metadata: Cite: doi:10/8847.384 License: cc-by
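The “upward” path to citation can be sketched as a short walk over parsed metadata. The Dublin Core properties used here (dct:isPartOf, dct:license, dct:bibliographicCitation) are our assumption of how the link, license and citation might be expressed; the record and repository URLs are hypothetical, while the citation and license values are the slide's examples.

```python
# Hypothetical parsed metadata, keyed by URL. The Dublin Core property names for the
# "upward" link, license and citation are ASSUMED, not the Accessor's published schema.
DCT = "http://purl.org/dc/terms/"

METADATA = {
    "http://example.org/accessor/record1A445": {
        DCT + "isPartOf": "http://example.org/accessor",
    },
    "http://example.org/accessor": {
        DCT + "license": "cc-by",
        DCT + "bibliographicCitation": "doi:10/8847.384",
    },
}


def reuse_terms(meta_url, metadata=METADATA, max_hops=5):
    """Walk isPartOf links upward, collecting the nearest license and citation."""
    found = {}
    for _ in range(max_hops):
        doc = metadata.get(meta_url, {})
        for key in ("license", "bibliographicCitation"):
            if key not in found and DCT + key in doc:
                found[key] = doc[DCT + key]
        meta_url = doc.get(DCT + "isPartOf")
        if meta_url is None or len(found) == 2:
            break
    return found


terms = reuse_terms("http://example.org/accessor/record1A445")
```

Because record-level terms are collected first, a record that carries its own license would override the repository-level one.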
  • 53. Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security Container Resource HTTP GET <FAIR metadata/> Contains <<184 Records>> Contact Mark Wilkinson for more information about these records
  • 55. Features of the FAIR Accessor Container HTTP GET <FAIR metadata/> Contains MetaRecordResource3 MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:distribution <<NONE>> HTTP GET 4: Granularity of Access/Privacy/Security
  • 57. Container HTTP GET <FAIR metadata/> Contains MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml HTTP GET Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security
  • 59. Features of the FAIR Accessor 4: Granularity of Access/Privacy/Security Thin solution - if it’s private, do nothing! Literally!
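The “do nothing” privacy rule means an agent must cope with record metadata that has no distribution at all. A sketch, assuming a hypothetical parsed-metadata dict (the key names and the contact address are illustrative, not part of any specification):

```python
def access_plan(record_meta):
    """Decide what an agent can do with one record, given its parsed metadata.

    record_meta is a hypothetical dict: 'distributions' (a list of {'source', 'format'})
    and an optional 'contactPoint'. The key names are illustrative only.
    """
    distributions = record_meta.get("distributions", [])
    if distributions:
        return [("GET", d["source"], d["format"]) for d in distributions]
    # Private record: the metadata is public, but no distribution is published.
    contact = record_meta.get("contactPoint")
    return [("CONTACT", contact, None)] if contact else []


public = {
    "primaryTopic": "R",
    "distributions": [{"source": "http://example.org/R.rdf", "format": "application/rdf+xml"}],
}
private = {"primaryTopic": "R", "contactPoint": "mailto:curator@example.org"}
```

The private record is still findable through its metadata; only the route to the data itself is withheld.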
  • 60. The Real Thing A working FAIR Accessor Serving a “Slice” of UniProt
  • 61. A real-world scenario... You are publishing a paper describing the evolution of proteins in the RNA Processing machineries of the fungus Aspergillus nidulans. You want to be a good scholarly publisher interested in transparency and reproducibility So you must describe, in detail, the inclusion/exclusion criteria for selecting proteins for your dataset (today, this is generally done either in the text of the paper, or not at all...)
  • 62. The query that returns the relevant proteins WHERE { ?protein a up:Protein . ?protein up:organism ?organism . ?organism rdfs:subClassOf taxon:162425 . ?protein up:classifiedWith ?go . ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> . bind(replace(str(?protein), "http://purl.uniprot.org/uniprot/", "", "i") as ?id) }
  • 63. The query that returns the relevant proteins WHERE { ?protein a up:Protein . ?protein up:organism ?organism . ?organism rdfs:subClassOf taxon:162425 . ?protein up:classifiedWith ?go . ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> . bind(replace(str(?protein), "http://purl.uniprot.org/uniprot/", "", "i") as ?id) } NCBI Taxonomy: Aspergillus nidulans
  • 64. The query that returns the relevant proteins WHERE { ?protein a up:Protein . ?protein up:organism ?organism . ?organism rdfs:subClassOf taxon:162425 . ?protein up:classifiedWith ?go . ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> . bind(replace(str(?protein), "http://purl.uniprot.org/uniprot/", "", "i") as ?id) } Gene Ontology: RNA Processing
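The slide shows only the WHERE clause. To run it, the query also needs PREFIX declarations and a SELECT clause. The ones below are our reconstruction: the prefix IRIs are UniProt's published namespaces, while the selected variables are an assumption. Under the SPARQL 1.1 Protocol, a query can be sent URL-encoded as the query parameter of an HTTP GET; the UniProt endpoint URL used here is assumed to be current.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# ASSUMED prefix declarations; the slide shows only the WHERE clause.
PREFIXES = """\
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""

# The SELECT clause is our assumption; the WHERE clause is the slide's query.
QUERY = PREFIXES + """\
SELECT DISTINCT ?protein ?id
WHERE {
  ?protein a up:Protein .
  ?protein up:organism ?organism .
  ?organism rdfs:subClassOf taxon:162425 .
  ?protein up:classifiedWith ?go .
  ?go rdfs:subClassOf* <http://purl.obolibrary.org/obo/GO_0006396> .
  bind(replace(str(?protein), "http://purl.uniprot.org/uniprot/", "", "i") as ?id)
}
"""


def sparql_get_url(endpoint, query):
    """SPARQL 1.1 Protocol 'query via GET': the query travels URL-encoded in ?query=."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url("https://sparql.uniprot.org/sparql", QUERY)
# Sanity check: decoding the URL gives back exactly the query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

The resulting URL is itself a stable identifier for this data “slice”, which is what the Accessor below wraps with metadata.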
  • 65. Create and publish a FAIR Accessor for that query http://linkeddata.systems/Accessors/UniProtAccessor Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 66. Create and publish a FAIR Accessor for that query http://linkeddata.systems/Accessors/UniProtAccessor Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... Resolve the URI (in software or in your browser)
  • 67. Create and publish a FAIR Accessor for that query Returns a page of metadata (in this example, in RDF) Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 70. Note that this Metadata is about ME! I am the creator of this dataset, and may be credited for it.
  • 75. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 76. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET Step down to individual Record metadata
  • 77. Step down to individual Record metadata MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml HTTP GET Software calls HTTP GET on the URL representing the MetaRecord Resource for the desired record in the Container (or just click on it, or type it into your browser)
  • 78. <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml The document that is returned
  • 80. Note the change in metadata focus! This metadata is about the UniProt Record (not about Mark Wilkinson). The record described in this metadata was created by UniProt, so the citation and authorship information is now THEIRS, not MINE.
  • 81. Container Resource Symmetrical Link back upward to the Accessor Container, for additional metadata
  • 82. <FAIR metadata/> foaf:primaryTopic Record R dcat:Distribution_1 Source URL_U1 format rdf+xml dcat:Distribution_2 Source URL_U2 format application/xml
  • 83. Two ways to retrieve the record - RDF or HTML (in REST-speak, two Representations of that Resource)
  • 84. Note that this metadata record is somewhat more FAIR than what you can (easily) retrieve from UniProt itself! For example, the UniProt record does not include the citation or license information; you have to manually surf around the UniProt Web page to find it. So the Accessor makes UniProt’s already notably FAIR data even more FAIR (with respect to “R”).
  • 85. How FAIR are we now? What does the Accessor give us?
  • 86. What we have achieved (slide annotations: F, F + A, F + R): We have created a FAIR record for something, i.e. a slice of a database, that was historically un-recordable and un-identifiable in any formal way. Accessors are a standard approach to providing human- and machine-accessible metadata that facilitates appropriate discovery (contextual, biological), proper usage (license), and proper citation for any kind of data. The discovery, accessibility, and drill-down/up behaviors do not require any novel API; rather, they simply rely on global Web standards, which allows them to be indexed by existing Web search engines.
  • 87. What we have achieved (slide annotations: F + A, I, A, +): The metadata itself uses machine-accessible syntaxes and widely adopted ontologies and vocabularies, and thus easily integrates with other metadata. Accessors provide a lightweight means to protect privacy while still providing the maximum degree of transparency possible. Accessors can be static or dynamic, i.e. we can provide template Accessor file(s) that are edited in Notepad, then published together with the data; or Accessors can dynamically generate their output from code (e.g. layered on a database server).
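As an illustration of a dynamic Accessor, a Container document can be generated from code. This sketch emits Turtle using ldp:contains and Dublin Core terms; the vocabulary choice, the one-MetaRecord-per-id URL scheme and all example values are our assumptions, and the title is not escaped. Writing the returned string to a file gives the static variant.

```python
def container_turtle(container_url, title, creator, record_ids):
    """Render a minimal Accessor Container document in Turtle.

    Assumptions: one MetaRecord URL per record id, formed as <container_url>/<id>,
    and a title that contains no double quotes (no escaping is done).
    """
    statements = [
        "a ldp:BasicContainer",
        f'dct:title "{title}"',
        f"dct:creator <{creator}>",
    ]
    if record_ids:  # an empty container simply omits ldp:contains
        members = ", ".join(f"<{container_url}/{rid}>" for rid in record_ids)
        statements.append(f"ldp:contains {members}")
    prefixes = (
        "@prefix ldp: <http://www.w3.org/ns/ldp#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n\n"
    )
    return prefixes + f"<{container_url}> " + " ;\n    ".join(statements) + " .\n"


# Dynamic use: regenerate on each request from, e.g., the current query results.
doc = container_turtle(
    "http://example.org/accessor",
    "Example protein slice",
    "http://example.org/people/curator",
    ["P1", "P2", "P3"],
)
```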
  • 88. So far, we have focused on FAIR Metadata
  • 89. Are there approaches to making the DATA FAIR?
  • 90. Making a Plant-related Resource FAIR FAIR reformatting of the plant component of the Pathogen Host Interaction Database (PHI-base)
  • 91. Making a Plant-related Resource FAIR Dr. Mikel Egaña Aranguren Ontologist Dr. Alejandro Rodríguez González Database Expert Dr. Alejandro Rodríguez Iglesias (PhD student at the time) Rodriguez-Iglesias A., Rodriguez-González A., Irvine AG., Sesma A., Urban M., Hammond-Kosack KE., Wilkinson MD. 2016. Publishing FAIR Data: An Exemplar Methodology Utilizing PHI-Base. Frontiers in Plant Science 7.
  • 92. Extract → Transform → Load: A “Brute Force” approach to FAIRness. Requires a ~comprehensive data/semantic model. Making a Plant-related Resource FAIR
  • 93. The Plant Pathogen Interaction Ontology (PPIO) Written in OWL2 Many of the Classes are defined by rich logical axioms Designed for automated classification and enrichment of data through logical reasoning (e.g. if attached to a data stream) Semantic Modeling of Plant Pathogen Interaction Data
  • 94. General introduction The Disease Triangle – Pathogen/Host/Environment http://fyi.uwex.edu/fieldcroppathology/field-crops-fungicide-information/
  • 95. General introduction The Disease Triangle – Pathogen/Host/Environment This concept has evolved over decades of domain-expert thought and discussion. Why not use this as the basis for our Semantic model of Pathogenicity?
  • 96. The Disease Triangle, Modelled as “Contexts” Interaction Context Environmental Context Host Context Pathogen Context Resulting phenotype
  • 97. Interaction Context Phenotype Resistance phenotype Susceptibility phenotype Phenotypic Process 1. Abnormal growth development phenotype. 2. Color variation phenotype. 3. Tissue disintegration phenotype. 4. Vascular system damage phenotype.
  • 101. Environmental Context manually extracted from the literature Environmental Context
  • 104. A pathogen that enters through the stomata will be more successful in high humidity, and have higher pathogenicity Environmental Context manually extracted from the literature Environmental Context
  • 106. • Plant-pathogen interaction data, including: • Resulting phenotypes • Molecular/genetic basis of pathogenicity • Experimental approaches • Provenance information Host Context Pathogen Context
  • 107. • 4800 interactions • 3300 gene-mutant records • 220 pathogens • 130 hosts • 261 registered diseases • 1700 references Host Context Pathogen Context
  • 108. Interaction Context Interaction Context [WT] Interaction Context [mutant] Host Context Host Context Pathogen Context Pathogen Context Description Description Protocol Protocol “Historical observation” “Base state” “reduced virulence” “soft rot” Protocol description Citation PubMed ID “PMID:1234” “gene deletion” Environmental Context Environmental Context Rodríguez-Iglesias A. et al Front. Plant Sci., (2016)
  • 114. Pathogen Context Allele Gene Locus ID Gene function Gene name Gene accession “AEQ95741” “Effector protein” “TAL2G” “G7TJZ8” Rodríguez-Iglesias A. et al Front. Plant Sci., (2016) http://identifiers.org/*/* KEY MESSAGE: Because we use identifiers.org URIs, and AgroLD does also, we can query our Pathogen Host Interaction database, and DYNAMICALLY RETRIEVE additional information from AgroLD with NO additional effort!!
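The benefit of shared identifiers.org URIs can be shown without any federation machinery: two sources that use the same URI for a gene can be joined by plain string equality. This is a toy sketch; the “uniprot” namespace, the predicates and the remote annotation are hypothetical, and a real deployment would do this with a (federated) SPARQL query rather than Python lists.

```python
def identifiers_org(namespace, accession):
    """Build a URI following the http://identifiers.org/<namespace>/<accession> pattern."""
    return f"http://identifiers.org/{namespace}/{accession}"


def join_on_uri(local, remote):
    """Combine facts from two independent sources that use the same subject URI."""
    remote_by_subject = {}
    for s, p, o in remote:
        remote_by_subject.setdefault(s, []).append((p, o))
    return [
        (s, p, o, rp, ro)
        for s, p, o in local
        for rp, ro in remote_by_subject.get(s, [])
    ]


# Illustrative only: the namespace, predicates and remote value are hypothetical.
gene = identifiers_org("uniprot", "G7TJZ8")
semantic_phibase = [(gene, "gene_name", "TAL2G"), (gene, "gene_function", "Effector protein")]
agrold = [
    (gene, "remote_annotation", "example AgroLD annotation"),
    ("http://identifiers.org/uniprot/OTHER", "remote_annotation", "unrelated"),
]
combined = join_on_uri(semantic_phibase, agrold)
```

No mapping table is needed: the join works only because both sources chose the same URI scheme.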
  • 115. Transform PHI-base data into RDF compliant with the PPIO Ontology Load into Virtuoso Triplestore
  • 116. Transform PHI-base data into RDF compliant with the PPIO Ontology Load into Virtuoso Triplestore (this was a LOT of work!!)
  • 117. Transform PHI-base data into RDF compliant with the PPIO Ontology Load into Virtuoso Triplestore …but are we now FAIR?
  • 118. Transform PHI-base data into RDF compliant with the PPIO Ontology Load into Virtuoso Triplestore …but are we now FAIR? …Not really….
  • 119. Findable: X; Accessible: HTTP GET, SPARQL, open access; Interoperable: RDF with published ontologies; Reusable: X
  • 120. Findable: X; Accessible: HTTP GET, SPARQL, open access; Interoperable: RDF with published ontologies; Reusable: X • How would you find this database? • How would you know if anything interesting is in it? • How would you (your machine) find a record? • Who do you cite if you reuse a piece of data? • What are the license conditions? • Can I reuse the data at all??
  • 121. Build a FAIR Accessor http://linkeddata.systems/SemanticPHIBase/Metadata Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic SemPHI #1 dcat:Distribution_1 Source URL_U1 Format rdf+xml dcat:Distribution_2 Source URL_U2 Format HTML HTTP GET
  • 122. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic SemPHI #1 dcat:Distribution_1 Source URL_U1 Format rdf+xml dcat:Distribution_2 Source URL_U2 Format HTML HTTP GET Build a FAIR Accessor http://linkeddata.systems/SemanticPHIBase/Metadata The URL of the record in “native” PHI-base
  • 123. Container Resource HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... MetaRecord Resource3 <FAIR metadata/> foaf:primaryTopic SemPHI #1 dcat:Distribution_1 Source URL_U1 Format rdf+xml dcat:Distribution_2 Source URL_U2 Format HTML HTTP GET Build a FAIR Accessor http://linkeddata.systems/SemanticPHIBase/Metadata The URL of the (RDF) record in Semantic PHI-base
  • 124. This allows us to find Semantic PHI-base based on its Repository-level Metadata “what kind of data does Semantic PHI-base Contain?” “Does it have any information about my gene of interest?” And “drill-down” to a record of interest selected based on its Record Metadata
  • 125. But… FAIR Accessors should be symmetrical How do I go from data back “upwards” to metadata?
  • 126. But… FAIR Accessors should be symmetrical. How do I go from data back “upwards” to metadata? To allow the retrieval of the Metadata for any piece of data in Semantic PHI-base, use the URL of the FAIR Accessor (Container Resource) as the URL of the “Named Graph” in the triplestore, using RDF “Quads”: SubjectURI → PredicateURI → ObjectURI → ContextURL Container URL HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ...
  • 127. But… FAIR Accessors should be symmetrical. How do I go from data back “upwards” to metadata? Container URL HTTP GET <FAIR metadata/> Contains MetaRecordResource1 MetaRecordResource2 MetaRecordResource3 ... To allow the retrieval of the Metadata for any piece of data in Semantic PHI-base, use the URL of the FAIR Accessor (Container Resource) as the URL of the “Named Graph” in the triplestore, using RDF “Quads”: SubjectURI → PredicateURI → ObjectURI → Container URL
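A sketch of this quad layout, assuming IRI-only terms: each data triple is stored with the Accessor's Container URL as its graph label, so finding the metadata for a piece of data is a lookup of its graph (in SPARQL, SELECT ?g WHERE { GRAPH ?g { <subject> ?p ?o } }). The Container URL is the one on the slides; the data IRIs are hypothetical.

```python
def nquad(s, p, o, graph):
    """One N-Quads line with IRI terms only; the graph label is the Accessor Container URL."""
    return f"<{s}> <{p}> <{o}> <{graph}> ."


def containers_for(subject, quads):
    """'Upward' lookup: the named graphs (Container URLs) in which a data subject occurs."""
    return sorted({g for s, _p, _o, g in quads if s == subject})


CONTAINER = "http://linkeddata.systems/SemanticPHIBase/Metadata"  # from the slides
# Hypothetical data IRIs standing in for Semantic PHI-base triples.
quads = [
    (
        "http://example.org/phibase/interaction/1",
        "http://example.org/ppio/hasPathogenContext",
        "http://example.org/phibase/pathogen/7",
        CONTAINER,
    ),
]
lines = [nquad(*q) for q in quads]
```

Resolving the returned graph URL with HTTP GET then yields the Container metadata, closing the loop from data back up to license and citation.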
  • 129. The Brute Force approach is… a lot of work! Worthwhile for community-critical resources and databases like AgroLD, UniProt, PHI-base, ChEMBL, etc.
  • 130. Is there a more “elegant” & lightweight way to be FAIR?
  • 131. FAIR Projection: Providing FAIR Data from non-FAIR Data Dynamically
  • 132. This is going to be a bit complicated, but please be patient
  • 133. Imagine the data we need to integrate is in a CSV file in FigShare or Zenodo How do we discover and integrate that data?
  • 134. Things we need to do: We need a way to query “opaque” data blobs (like CSV) about their content. We need a way to retrieve that content in a FAIR format. We need, therefore, to model semantics for that opaque data content. We need to model various semantics for that content (one “size” doesn’t fit all!). We need to associate those semantic models with a record or record set. We need a way to query those semantics to determine which “size” fits our requirements. We would like to reuse semantic definitions as much as possible. We need to do all of this without creating a new API :-)
  • 135. Triple Pattern Fragments + RDF Mapping Language Ruben Verborgh Ghent University Anastasia Dimou Ghent University
  • 136. Triple Pattern Fragments (TPF) A REST interface for requesting/retrieving RDF Triples (from any source) Ruben Verborgh “Slices” of data, from any source, are considered Resources and are therefore represented by a distinct URL: http://some.database.org/dataset?s=___;p=___;o=___ Calling HTTP GET on a TPF URL returns the set of Triples matching {?s, ?p, ?o} PLUS hypermedia instructions and Resource URLs for other relevant slices.
  • 137. Triple Pattern Fragments (TPF) A REST interface for retrieving RDF Triples (from any source) Ruben Verborgh For example, the “BMI” column from a patient registry is a Resource with the URL: http://my.registry.org/patients?p=CMO:0000105 (CMO:0000105 = “body mass index”). HTTP GET gives me all BMI triples in the registry, together with other Resource URLs representing other “slices” that might be useful, for example: http://my.registry.org/patients?p=CMO:0000004 (CMO:0000004 = “systolic B.P.”)
  • 139. We have a standard, RESTful way to request triples from any data source i.e. every slice of every dataset will be considered a distinct Resource → simply call HTTP GET on that Resource to get the Triples
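A toy version of both sides of a Triple Pattern Fragment request. Following the slides, the parameters are named s, p and o and unbound positions are omitted. This is a simplification: real TPF servers advertise their own parameter names through hypermedia controls and also return paging and count metadata, and urlencode joins parameters with & rather than the slide's ;. The registry triples are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def tpf_url(base, s=None, p=None, o=None):
    """URL of one fragment; positions left as None are unbound and omitted."""
    params = {k: v for k, v in (("s", s), ("p", p), ("o", o)) if v is not None}
    return base + ("?" + urlencode(params) if params else "")


def fragment(triples, s=None, p=None, o=None):
    """Server side: every triple matching the pattern (None acts as a variable)."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


# Hypothetical registry contents.
registry = [
    ("patient:1", "CMO:0000105", "24.1"),
    ("patient:1", "CMO:0000004", "120"),
    ("patient:2", "CMO:0000105", "31.7"),
]
bmi_url = tpf_url("http://my.registry.org/patients", p="CMO:0000105")
bmi_params = parse_qs(urlparse(bmi_url).query)
bmi_triples = fragment(registry, p="CMO:0000105")
```

urlencode percent-encodes the colon (CMO%3A0000105); a server decodes it back to CMO:0000105, so the Resource is the same.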
  • 140. But... We have no way to know what TPF Resources are available for any given dataset or what those Resources “are” (proteins? genes? patients? articles?)
  • 141. RML A way to describe the structure of an RDF document Anastasia Dimou RML allows us to create models of (meta)data structures “What could this data look like, if it were mapped to RDF?” RML fulfills similar objectives to DCAT Profiles, the Dublin Core Application Profile, and ISO 11179 - Metadata Registries; but has added advantages! http://rml.io/RMLmappingLanguage.html
  • 142. Using RML to describe the structure and semantics of a single Triple Map1 Predicate Object Map Subject Map Object Map ex:PatientRecord subjectMap template “http://example.org/patient/{id}” predicate ex:hasVariant Map2 Subject Map2 SO:0000694 (“SNP”) template “http://identifiers.org/dbsnp/{snp}” THE MODEL
  • 143. Using RML to describe the structure and semantics of a single Triple Map1 Predicate Object Map Subject Map Object Map ex:PatientRecord subjectMap template “http://example.org/patient/{id}” predicate ex:hasVariant Map2 Subject Map2 SO:0000694 (“SNP”) template “http://identifiers.org/dbsnp/{snp}” THE MODEL We call this a “Triple Descriptor”. These are used to describe the structure of data “slices” in which all Triples have the same structure.
  • 144. THE MODEL Using RML to describe the structure and semantics of a single Triple Map1 Predicate Object Map Subject Map Object Map ex:PatientRecord subjectMap template “http://example.org/patient/{id}” predicate ex:hasVariant Map2 Subject Map2 SO:0000694 (“SNP”) template “http://identifiers.org/dbsnp/{snp}” Patient:123 rdf:type ex:PatientRecord snp:rs0020394 ex:hasVariant rdf:type ex:PatientRecord The Data
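What a Triple Descriptor promises can be imitated in a few lines: fill the subject and object templates from each row of an “opaque” CSV blob and emit the typed triples. This is a hand-rolled stand-in for illustration, not an RML processor; the dict layout is hypothetical, while the templates, classes and predicate are the ones on the slide.

```python
import csv
import io

# Hypothetical stand-in for one RML Triple Descriptor (NOT an RML processor).
DESCRIPTOR = {
    "subject_template": "http://example.org/patient/{id}",
    "subject_class": "ex:PatientRecord",
    "predicate": "ex:hasVariant",
    "object_template": "http://identifiers.org/dbsnp/{snp}",
    "object_class": "SO:0000694",  # "SNP"
}


def apply_descriptor(csv_text, d=DESCRIPTOR):
    """Yield the triples a descriptor implies for every row of a CSV blob."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        s = d["subject_template"].format(**row)  # extra CSV columns are ignored
        o = d["object_template"].format(**row)
        yield (s, "rdf:type", d["subject_class"])
        yield (s, d["predicate"], o)
        yield (o, "rdf:type", d["object_class"])


triples = list(apply_descriptor("id,snp\n123,rs0020394\n"))
```

Every triple produced has the same shape, which is exactly the property that lets one descriptor describe a whole “slice”.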
  • 145. Where are we now? TPF: a standard, RESTful way to request Triples. Triple Descriptors: a standard way to describe the structure and meaning of a Triple.
  • 146. Where are we now? TPF: a standard, RESTful way to request Triples. Triple Descriptors: a standard way to describe the structure and meaning of a Triple. We need a way to associate these with each other. We need a way to associate these with a dataset or record.
  • 147. Luckily, we have already solved this!
  • 148. The FAIR Accessor can do this. HTTP GET on the MetaRecord (Resource3) returns <FAIR metadata/>: foaf:primaryTopic Record R; dcat:Distribution_1 (source URL_U1, format rdf+xml); dcat:Distribution_2 (source URL_U2, format application/xml). Using the metadata structures defined by DCAT, the FAIR Accessor also tells you how to get the content of the record, and what formats are available.
  • 149. If we consider the TPF Resource URL to be just another DCAT Distribution, we get <FAIR metadata/>: foaf:primaryTopic Record R; dcat:Distribution_1 (source URL_U1, format rdf+xml); dcat:Distribution_2 (source URL_U2, format application/xml); dcat:Distribution_3 (source TPF_URL, format rdf+xml).
  • 150. The metadata of slide 149, where TPF_URL, the source of dcat:Distribution_3, is the URL representing the Triple Pattern Fragment Resource.
  • 151. If we consider the TPF Resource URL to be just another DCAT Distribution, we get… now add the Triple Descriptor: <FAIR metadata/>: foaf:primaryTopic Record R; dcat:Distribution_1 (source URL_U1, format rdf+xml); dcat:Distribution_2 (source URL_U2, format application/xml); dcat:Distribution_3 (source TPF_URL, format rdf+xml, Model: Triple_Desc_URL).
  • 152. The metadata of slide 151, with dcat:Distribution_3 (source TPF_URL_1, format rdf+xml, Model: Triple_Desc_URL). HTTP GET on that URL returns:
  • 153. The metadata of slide 151: Record + TPF Server + RML Model = FAIR Projector.
  • 154. The metadata of slide 151: HTTP GET on TPF_URL returns rdf+xml triples from Record R that look like the model at Triple_Desc_URL. Interoperability without Brute Force.
  • 155. I hear you objecting… I skipped something important!!! We still have not defined a way to CREATE these triples
  • 156. I hear you objecting… I skipped something important!!! How does dcat:Distribution_3 (source TPF_URL, format rdf+xml, Model: Triple_Desc_URL) return Triples? We still have not defined a way to CREATE these triples.
  • 157. Sadly, there is no magic wand to create interoperability
  • 158. Sadly, there is no magic wand to create interoperability Someone has to write the TPF server that converts the data Interoperability will never come “for free” (because semantics will never come “for free”)
  • 159. However, there are reasons for optimism! 1. Researchers transform data anyway to integrate it; this is a daily routine in most bioinformatics labs. 2. For the most common file formats (e.g. CSV or Excel), there are RML-based tools to automate the RDF transformation: simply create an RML model of what you want, and ask the tool to convert the file. 3. Investing time into creating an RML model is more FAIR than ad hoc, “re-useless” brute-force transformation. When you create a FAIR Projector for your own data transformation needs, it is reusable!
  • 160. However, there are reasons for optimism! AND 4. RML Triple Descriptors are very simple (one triple!), so we can also templatize their construction → creating a FAIR Projector is quite easy in many cases! 5. Citations, Citations, Citations! FAIR Accessors/Projectors are FAIR objects: you can get credit if other people use your Projector for their analyses.
  • 161. Summary of FAIR Projectors: FAIR Projectors provide a discoverable and standardized REST interface to retrieve interoperable data, and its interoperable metadata (F + I + R; A + I; I). FAIR Projectors can convert non-FAIR data into FAIR data, or can change the structure, URL format, or semantics of existing FAIR data sources. FAIR Projectors can be deployed over, and provide a common interface to: static data deposits, in any format, anywhere; databases; triplestores; certain (common) types of Web Services. (R+++) Triple Descriptors are FAIR entities, intended for reuse. And none of this required a new API!
  • 162. The (near!) future of FAIR: “Siri… I need data about the expression of the Oryza dwarf-1 gene under high salt conditions. Please find that data, regardless of location and format. If possible, please reformat it automatically to match my local dataset. Also, please collect the citation information for each piece of data. If the data is not under an open access license, or if any of the data is behind a firewall or paywall, please provide me the contact information of the data owner so that I can ask them for a copy.”
  • 163. Thanks to: Michel Dumontier - Stanford Center for Biomedical Informatics Research, Stanford, California. Ruben Verborgh – Ghent University – imec, Ghent, Belgium Luiz Olavo Bonino da Silva Santos - Dutch Techcentre for Life Sciences, Utrecht, The Netherlands - Vrije Universiteit Amsterdam, Amsterdam, The Netherlands. Tim Clark - Department of Neurology, Massachusetts General Hospital Boston MA and Harvard Medical School, Boston, MA, USA Morris A. Swertz - Genomics Coordination Center and Department of Genetics, University Medical Center Groningen, Groningen, The Netherlands Fleur D.L. Kelpin - Genomics Coordination Center and Department of Genetics, University Medical Center Groningen, Groningen, The Netherlands Alasdair J. G. Gray - Department of Computer Science, School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, UK Erik A. Schultes - Department of Human Genetics, Leiden University Medical Center, The Netherlands Erik M. van Mulligen - Department of Medical Informatics, Erasmus University Medical Center Rotterdam, The Netherlands Paolo Ciccarese - Perkin Elmer Innovation Lab, Cambridge MA and Harvard Medical School, Boston MA, USA Mark Thompson - Leiden University Medical Center, Leiden, The Netherlands Jerven T. Bolleman - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
  • 164. Thanks to my former lab members… I MISS YOU!!! Dr. Mikel Egaña Aranguren Ontologist Dr. Alejandro Rodríguez González Database Expert Dr. Alejandro Rodríguez Iglesias (PhD student at the time)
  • 165. Funding for Mark Wilkinson from: Fundacion BBVA and the UPM Isaac Peral programme, and the Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R. Additional support for FAIR Skunkworks members comes from: European Union funded projects ELIXIR-EXCELERATE (H2020 no. 676559), ADOPT BBMRI-ERIC (H2020 no. 676550) CORBEL (H2020 no. 654248) Netherlands Organisation for Scientific Research (Odex4all project) Stichting Topconsortium voor Kennis en Innovatie High Tech Systemen en Materialen (FAIRdICT project) BBMRI-NL RD-Connect and ELIXIR (Rare disease implementation study FP7 no. 305444).
  • 166. You are not only welcome to share and reuse this presentation... ...You are encouraged to! http://tinyurl.com/IBC-FAIR
  • 167. Credits: some graphical elements were taken from slide templates provided by FGST (Free GoogleSlides Templates). These templates, including their graphic assets (photos, icons and typographies), are free to use under a Creative Commons Attribution License, provided the credits are kept.
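Slide 139 treats every triple-pattern slice of a dataset as a Resource that can be fetched with a plain HTTP GET. The sketch below illustrates that idea with a toy in-memory dataset. It assumes hypothetical query parameter names `subject`, `predicate` and `object`; a real TPF server advertises its own URL template in its hypermedia controls, and clients should follow that template instead.

```python
from urllib.parse import urlencode

# Toy dataset (hypothetical): each triple is a (subject, predicate, object) tuple.
TRIPLES = [
    ("ex:patient/123", "rdf:type", "ex:PatientRecord"),
    ("ex:patient/123", "ex:hasVariant", "dbsnp:rs0020394"),
    ("ex:patient/456", "rdf:type", "ex:PatientRecord"),
]


def fragment_url(base, subject=None, predicate=None, obj=None):
    """Build the URL of one Triple Pattern Fragment.

    The parameter names are an assumption for illustration; a real TPF
    server declares its own search template.
    """
    params = {k: v for k, v in (("subject", subject),
                                ("predicate", predicate),
                                ("object", obj)) if v is not None}
    return f"{base}?{urlencode(params)}" if params else base


def match_pattern(triples, subject=None, predicate=None, obj=None):
    """Return the triples a server would place in the fragment (None acts as a wildcard)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]


if __name__ == "__main__":
    print(fragment_url("http://example.org/fragments", predicate="ex:hasVariant"))
    print(match_pattern(TRIPLES, predicate="ex:hasVariant"))
```

Leaving every position unbound returns the whole dataset. Binding one or more positions narrows the slice, so each distinct pattern corresponds to its own GET-able URL.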
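Slides 142 to 144 describe a Triple Descriptor, in which one subject template, one predicate and one object template are shared by every Triple in a slice. The minimal sketch below checks whether a concrete triple has the described structure. It holds the descriptor in a plain Python dataclass rather than in RML, and the classes are stored but not checked. The names `TripleDescriptor`, `conforms` and `template_to_regex` are hypothetical.

```python
import re
from dataclasses import dataclass


def template_to_regex(template):
    """Turn an RML-style template such as 'http://x/{id}' into a regex for filled-in values."""
    # Escape the literal parts; each {placeholder} matches one non-empty path segment.
    parts = re.split(r"\{[^}]+\}", template)
    return re.compile("^" + "([^/]+)".join(re.escape(p) for p in parts) + "$")


@dataclass
class TripleDescriptor:
    subject_template: str
    subject_class: str
    predicate: str
    object_template: str
    object_class: str

    def conforms(self, subject, predicate, obj):
        """True if the triple matches the templates and predicate of this descriptor.

        The classes (rdf:type) are kept for documentation but are not checked here.
        """
        return (predicate == self.predicate
                and template_to_regex(self.subject_template).match(subject) is not None
                and template_to_regex(self.object_template).match(obj) is not None)


# The model from slide 142 (ex: prefix as on the slide; SO:0000694 is "SNP").
PATIENT_HAS_SNP = TripleDescriptor(
    subject_template="http://example.org/patient/{id}",
    subject_class="ex:PatientRecord",
    predicate="ex:hasVariant",
    object_template="http://identifiers.org/dbsnp/{snp}",
    object_class="SO:0000694",
)

if __name__ == "__main__":
    print(PATIENT_HAS_SNP.conforms("http://example.org/patient/123", "ex:hasVariant",
                                   "http://identifiers.org/dbsnp/rs0020394"))
```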
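Slides 148 to 154 add the TPF URL to a record's metadata as one more DCAT Distribution, tagged with the Triple Descriptor it follows. A client can then choose a Distribution by format, or by the meaning of its Triples. The sketch below models that metadata as plain dictionaries. The keys `source`, `format` and `model` mirror the slide labels and are assumptions, not DCAT property names; DCAT itself uses terms such as `dcat:accessURL` and `dcat:mediaType`. The descriptor registry is likewise hypothetical.

```python
# Hypothetical FAIR Accessor metadata for Record R, following the slides
# (the slides' "rdf+xml" is written here as the media type application/rdf+xml).
RECORD_R_META = {
    "primaryTopic": "Record R",
    "distributions": [
        {"source": "URL_U1", "format": "application/rdf+xml"},
        {"source": "URL_U2", "format": "application/xml"},
        {"source": "TPF_URL", "format": "application/rdf+xml",
         "model": "Triple_Desc_URL"},
    ],
}

# Hypothetical registry: Triple Descriptor URL -> the predicate it describes.
DESCRIPTORS = {"Triple_Desc_URL": {"predicate": "ex:hasVariant"}}


def by_format(meta, fmt):
    """Sources of every Distribution offered in the requested format."""
    return [d["source"] for d in meta["distributions"] if d["format"] == fmt]


def projectors_for(meta, descriptors, predicate):
    """Sources of Distributions (FAIR Projectors) whose Triple Descriptor uses `predicate`."""
    return [d["source"] for d in meta["distributions"]
            if "model" in d
            and descriptors.get(d["model"], {}).get("predicate") == predicate]


if __name__ == "__main__":
    print(by_format(RECORD_R_META, "application/rdf+xml"))
    print(projectors_for(RECORD_R_META, DESCRIPTORS, "ex:hasVariant"))
```

Both lookups use only the metadata a client already retrieves with HTTP GET, which is the sense in which no new API is needed.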
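Slide 159 notes that RML-based tools can convert common formats such as CSV into RDF, driven by an RML model. The sketch below is not one of those tools. It only imitates, with the standard library, what applying the slide 142 model to a CSV file would produce. The column names `id` and `snp`, the N-Triples output and the `http://example.org/` stand-in for the `ex:` namespace are all assumptions for illustration.

```python
import csv
import io

# Stand-in IRIs; SO_0000694 is the Sequence Ontology term for SNP.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PATIENT_RECORD = "http://example.org/PatientRecord"
HAS_VARIANT = "http://example.org/hasVariant"
SNP_CLASS = "http://purl.obolibrary.org/obo/SO_0000694"

SUBJECT_TEMPLATE = "http://example.org/patient/{id}"
OBJECT_TEMPLATE = "http://identifiers.org/dbsnp/{snp}"


def project_csv(text):
    """Yield N-Triples lines for each CSV row, following the slide 142 model.

    Cell values are inserted as-is; a real tool would also IRI-escape them.
    """
    for row in csv.DictReader(io.StringIO(text)):
        s = SUBJECT_TEMPLATE.format(**row)
        o = OBJECT_TEMPLATE.format(**row)
        yield f"<{s}> <{RDF_TYPE}> <{PATIENT_RECORD}> ."
        yield f"<{s}> <{HAS_VARIANT}> <{o}> ."
        yield f"<{o}> <{RDF_TYPE}> <{SNP_CLASS}> ."


SAMPLE = "id,snp\n123,rs0020394\n"

if __name__ == "__main__":
    for line in project_csv(SAMPLE):
        print(line)
```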
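Slide 160 points out that a Triple Descriptor covers a single triple, so its construction can be templatized. The minimal sketch below fills a Turtle template from five values. It assumes the R2RML vocabulary (`rr:`) that RML builds on, and it omits the `rml:logicalSource` that each real RML triples map also declares. The function name `make_triple_descriptor` and the map identifiers `<#Map1>` and `<#Map2>` are hypothetical.

```python
from string import Template

# R2RML terms (rr:) as reused by RML; a complete RML mapping would also
# give each triples map an rml:logicalSource, which this sketch leaves out.
DESCRIPTOR_TEMPLATE = Template("""\
@prefix rr: <http://www.w3.org/ns/r2rml#> .

<#Map1> rr:subjectMap [
    rr:template "$subject_template" ;
    rr:class <$subject_class>
  ] ;
  rr:predicateObjectMap [
    rr:predicate <$predicate> ;
    rr:objectMap [ rr:parentTriplesMap <#Map2> ]
  ] .

<#Map2> rr:subjectMap [
    rr:template "$object_template" ;
    rr:class <$object_class>
  ] .
""")


def make_triple_descriptor(subject_template, subject_class, predicate,
                           object_template, object_class):
    """Return Turtle text describing one subject-predicate-object shape."""
    return DESCRIPTOR_TEMPLATE.substitute(
        subject_template=subject_template, subject_class=subject_class,
        predicate=predicate, object_template=object_template,
        object_class=object_class)


if __name__ == "__main__":
    print(make_triple_descriptor(
        "http://example.org/patient/{id}", "http://example.org/PatientRecord",
        "http://example.org/hasVariant", "http://identifiers.org/dbsnp/{snp}",
        "http://purl.obolibrary.org/obo/SO_0000694"))
```

The example uses `string.Template` rather than `str.format` because the RML templates themselves contain `{...}` placeholders, which `$`-substitution leaves untouched.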

Editor's Notes

  1. http://www.freedigitalphotos.net/images/Business_people_g201-Businessman_shaking_hand_p88735.html
  2-3. One of the cornerstones of plant pathology is the disease triangle, which states that for a disease to occur, three factors must be present: a virulent pathogen, a susceptible host and a propitious environment (case a). If one of the components is manipulated, the severity of the disease is affected (the remaining cases). Thus, host-pathogen interactions are context-dependent, shaped by internal (e.g. genotype) and external (e.g. environment) factors.
  4. Therefore, for every Interaction Context, there must be an Environment Context, a Host Context and a Pathogen Context. Moreover, the interaction can result in either a resistance response or the development of the disease. These outcomes take the form of an observable phenotype. I will now describe how all these elements were modeled.
  5. Significant effort went into modeling phenotypes in PPIO, both pathogen and plant phenotypes. An infection of a susceptible host will usually result in an altered phenotype, produced through some physiological infective process. We model these using two classes. The first, PHENOTYPE, is the semantic representation of the outcome of the interaction, that is, a resistance response or the development of the disease. The RESISTANCE PHENOTYPE class contains a set of subclasses representing the different resistance mechanisms, such as the hypersensitive response (HR), cell wall reinforcement or the production of antimicrobial compounds. The disease situation, modeled as the SUSCEPTIBILITY PHENOTYPE class, was divided into four general subclasses: ABNORMAL GROWTH DEVELOPMENT PHENOTYPE, COLOR VARIATION PHENOTYPE, TISSUE DISINTEGRATION PHENOTYPE and VASCULAR SYSTEM DAMAGE PHENOTYPE. We also created the PHENOTYPIC PROCESS class to represent the symptoms plants exhibit due to alterations at the cellular level. Every phenotypic response in PPIO is deeply axiomatized and connected to one or more of the SUSCEPTIBILITY PHENOTYPE subclasses, according to the symptomatology of the type of lesion.
  6-7. This example shows how the CHLOROTIC STRIPE LESION subclass was axiomatically modeled as resulting in a COLOR VARIATION PHENOTYPE subclass, showing the connection between the two classes I just mentioned. In some cases, the interconnection between these two classes was more complex. In this example, the CANKER subclass was classified and connected with two Phenotype subclasses, namely the VASCULAR SYSTEM DAMAGE PHENOTYPE and the TISSUE DISINTEGRATION PHENOTYPE. At this point, following ontology and FAIR best practices, we decided to make use of the Plant Trait Ontology (PTO). This ontology, part of the Gramene project, contains semantic descriptions of physiological, biochemical and molecular plant traits. By importing it into our own ontology, we made it possible to connect every phenotypic response to the PTO traits it affects, as in these two examples. As a result, the interconnection of the SUSCEPTIBILITY PHENOTYPE, PHENOTYPIC PROCESS and PTO classes provides a solid way to express the disease outcome of every interaction.
  9-12. By the time PPIO was created, we found no semantic databases storing environmental data in a form that allowed automatic extraction, so we were limited to extracting the initial knowledge from the literature. We selected moisture, nutrient availability, soil traits and temperature as the main subclasses of the ENVIRONMENTAL PARAMETER class, modeled here using Protégé. The relation between infection and environment is shown in this simple example. A combination of the environmental annotation HIGH HUMIDITY (which favours stomatal opening) and a pathogen annotation such as ENTRY THROUGH STOMATA could be passed to a logical reasoner, which could automatically infer that a particular pathogen would be more successful under those conditions. In the future, we expect to achieve a rich axiomatization of all these parameters to support powerful reasoning.
  14-15. Although we had already modeled both the Host and Pathogen conceptual entities, we needed to model a set of additional concepts around them. The previously mentioned Pathogen-Host Interaction Database (PHI-base) was used as the source of the data we needed to model. This is an important resource for the plant sciences: while not limited to plants, it captures data on thousands of plant/pathogen interactions, the resulting phenotypes, in many cases the molecular/genetic basis of pathogenicity, experimental approaches and provenance information. After obtaining permission from the PHI-base consortium, we extracted the plant portion of the data, containing around 4800 interactions, 3300 gene records, 225 pathogens, 132 hosts, 261 registered diseases and 1693 references.
  16-17. For each of these interactions, we actually created two contexts: an Interaction Context for the wild-type (WT) situation and another in which the pathogen's genetic context is altered. Both contexts contained an Environmental, a Host and a Pathogen context, plus a Description and a Protocol class. The Description class represents the literal phenotypic descriptions per se. The Protocol class describes the experimental approaches used to study the pathogen's genetic background in each case. Since PHI-base only contains genetic and experimental data for the mutant situation, the WT context was modeled with basic information. The mutant context, on the other hand, was modeled in depth, adding all the information PHI-base contained, such as the mutant-derived phenotypic outcome, the experimental approach chosen and its provenance information (including the PubMed article ID). During this modeling phase we reused class and property names from a wide variety of third-party ontologies, such as the Relation Ontology, the EDAM Ontology, the Experimental Factor Ontology and Schema.org. One of the most useful pieces of information PHI-base provided was the pathogen genetic framework, with information about 3000 gene records.
  18-22. Extensive modeling of this data was also performed, beginning by connecting the Pathogen Context and the Gene class via a conceptual Allele class. The Gene class was further annotated with additional information PHI-base provided, like… In summary, by combining literature data, third-party ontologies and web resources like PHI-base around a central plant pathology paradigm, we constructed the first semantic platform that fully describes the plant-pathogen-environment interaction knowledge domain, achieving our key goal of plant pathology knowledge modeling.
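Notes 9 to 12 describe passing an environmental annotation (HIGH HUMIDITY, which favours stomatal opening) together with a pathogen annotation (ENTRY THROUGH STOMATA) to a logical reasoner. The sketch below does not use an OWL reasoner. It substitutes a single hand-written forward-chaining rule over plain Python sets, only to show the shape of the inference. The lookup tables, the example pathogens and the ENTRY THROUGH WOUNDS annotation are hypothetical.

```python
# Hypothetical annotations, named after the classes mentioned in the notes.
HIGH_HUMIDITY = "HIGH HUMIDITY"
STOMATAL_OPENING = "STOMATAL OPENING"
ENTRY_THROUGH_STOMATA = "ENTRY THROUGH STOMATA"

# Background knowledge: which environmental parameters favour which host states.
FAVOURS = {HIGH_HUMIDITY: {STOMATAL_OPENING}}
# Which host state each pathogen entry route exploits.
EXPLOITS = {ENTRY_THROUGH_STOMATA: STOMATAL_OPENING}


def favoured_pathogens(environment, pathogens):
    """Return pathogens whose entry route exploits a host state the environment favours."""
    favoured_states = set()
    for parameter in environment:
        favoured_states |= FAVOURS.get(parameter, set())
    return sorted(name for name, routes in pathogens.items()
                  if any(EXPLOITS.get(r) in favoured_states for r in routes))


if __name__ == "__main__":
    pathogens = {"pathogen_A": {ENTRY_THROUGH_STOMATA},
                 "pathogen_B": {"ENTRY THROUGH WOUNDS"}}
    print(favoured_pathogens({HIGH_HUMIDITY}, pathogens))
```

In an ontology these two lookup tables would be axioms, and a reasoner would derive the same conclusion without a bespoke rule.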
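Notes 16 and 17 describe two Interaction Contexts per PHI-base interaction, one wild-type and one with an altered pathogen genotype. Each context holds an Environmental, a Host and a Pathogen context, plus a Description and a Protocol. The dataclass below sketches that shape in Python. The field names, the helper `contexts_for_interaction` and all example values are hypothetical, not PPIO class or property IRIs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InteractionContext:
    environment: dict                  # environmental parameters (left empty here)
    host: str
    pathogen: str
    pathogen_genotype: str             # "WT" or an altered allele
    description: Optional[str] = None  # literal phenotypic description
    protocol: Optional[str] = None     # experimental approach used
    provenance: list = field(default_factory=list)  # e.g. PubMed IDs


def contexts_for_interaction(host, pathogen, mutant_allele, phenotype, protocol, pmids):
    """Build a basic WT context and a fully annotated mutant context."""
    wt = InteractionContext(environment={}, host=host, pathogen=pathogen,
                            pathogen_genotype="WT")
    mutant = InteractionContext(environment={}, host=host, pathogen=pathogen,
                                pathogen_genotype=mutant_allele, description=phenotype,
                                protocol=protocol, provenance=list(pmids))
    return wt, mutant


if __name__ == "__main__":
    wt, mutant = contexts_for_interaction("host_X", "pathogen_Y", "geneZ_deletion",
                                          "reduced virulence", "gene deletion", ["PMID:0"])
    print(wt)
    print(mutant)
```

Only the mutant context carries description, protocol and provenance, mirroring the notes' point that PHI-base records this information for mutants only.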