Journées Scientifiques de Rochebrune 2023 (JSR'23)
Slava Tykhonov, R&D
(DANS-KNAW, the Netherlands)
29 March 2023
Decentralized research data infrastructure
and knowledge graphs
About me: DANS-KNAW projects (2016-2023)
● MuseIT H2020 (ongoing)
● Polifonia H2020 (ongoing)
● CLARIAH+ (ongoing)
● ODISSEI (ongoing)
● EOSC Synergy
● SSHOC Dataverse
● CESSDA Dataverse Europe 2018
● Time Machine Europe Supervisor at DANS-KNAW
● PARTHENOS Horizon 2020
● CESSDA PID (Persistent Identifiers) Horizon 2020
● CLARIAH-NL
● RDA (Research Data Alliance) PITTS Horizon 2020
● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020
Source: LinkedIn
Building an Operating System for Open Science
● A generic common research and data infrastructure should be distributed
and robust enough to be scaled up and reused for any challenging task, such as
cancer research
● Networked services are built from Open Source components
● Data is processed and published in a FAIR way; provenance information is
part of our Data Lake
● Data evaluation and credibility are the top priority: we provide tools for the
expert community to verify our datasets
● The transparency of data and services guarantees the reproducibility of all
experiments and can bring new insights into multidisciplinary research
● Infrastructure should encourage collaboration between people and bring together
the general public, researchers, citizen scientists, etc.
● Infrastructure is free of charge; (meta)data is protected and licensed.
Looking for Commons
Merce Crosas, “Harvard Data Commons”
Building a horizontal platform to serve vertical teams
Source: CoronaWhy infrastructure introduction
DANS Data Stations - Future Data Services
Dataverse is an API-based data platform and a key framework for Open Innovation!
What is Dataverse?
● Open source data repository developed by IQSS at Harvard University
● Great product with a very long history (since 2006), created by an experienced and
Agile development team
● Clear vision and understanding of research communities' requirements, public
roadmap
● Well-developed architecture with rich APIs makes it possible to build application layers
around Dataverse
● The strong community behind Dataverse is helping to improve the basic
functionality and develop it further.
● DANS-KNAW delivered a production-ready (Docker/k8s) Dataverse repository for
the European Open Science Cloud (EOSC) communities CESSDA, CLARIN and
DARIAH.
● Dataverse is a de facto standard for FAIR data repositories in Europe, with wide
adoption in the Netherlands, France, Norway, Portugal and other EU countries
Data integration challenges
● datasets are very heterogeneous and multilingual
● data usually lacks sufficient quality control
● data providers use different modeling schemas and styles
● linked data cleansing and versioning are very difficult to track and maintain
properly; web resources aren't persistent
● even modern data repositories provide only metadata records
describing data, without giving access to individual data items stored in
files
● it is difficult to assign entity relationships in knowledge graphs and keep
them up to date manually
Benefits of the Common Data Infrastructure
● It's distributed and sustainable, suitable for the future
● maintenance costs will drop massively: the more organizations join,
the less expensive it will be to support
● maintenance costs could be reallocated to training and further
development of new (common) features
● reuse of the same infrastructure components will improve the quality and
the speed of knowledge exchange
● building multidisciplinary teams that reuse the same infra can bring us new
insights and unexpected views
● Common Data Infrastructure acts as a “universal gravity” force
for Data Science projects
(and so on…)
Semantic interoperability on the infrastructure level
We envision a situation where thousands of Dataverse instances on the web (due to EOSC)
can be searched for data simultaneously and will form a shared Data Lake.
The old dream of Federated search/Universal catalogue can only be realised if:
(1) crosswalks (mappings across different metadata schemes) are implemented
(2) metadata schemes offer ways to enrich indexes with values from controlled
vocabularies
Standard response (centralized) = standardisation and harmonisation = prescribing repository
software, certain metadata standards, or certain controlled vocabularies
New response (distributed) = explore agile solutions (Proofs of Concept) which can be
implemented by different communities (even smaller ones), so we keep variety and still enable
integration in the Distributed Data Network by applying Linked Data technologies.
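The two conditions above (crosswalks and enrichment from controlled vocabularies) can be illustrated with a minimal sketch. All field names, the mapping and the vocabulary URIs below are hypothetical and chosen only for illustration; they are not taken from Dataverse or from any specific metadata scheme.

```python
# Hypothetical crosswalk: local field name -> field name in a shared scheme.
CROSSWALK = {
    "title": "dc:title",
    "author": "dc:creator",
    "keyword": "dc:subject",
}

# Hypothetical controlled vocabulary: lower-cased label -> concept URI.
VOCABULARY = {
    "covid-19": "https://example.org/vocab/covid-19",
    "vaccination": "https://example.org/vocab/vaccination",
}


def apply_crosswalk(record, crosswalk):
    """Rename the fields of `record`; fields without a mapping are dropped."""
    return {crosswalk[key]: value for key, value in record.items() if key in crosswalk}


def enrich_subjects(record, vocabulary, field="dc:subject"):
    """Replace known free-text subjects with concept URIs; keep unknown ones as text."""
    enriched = dict(record)
    enriched[field] = [vocabulary.get(s.lower(), s) for s in record.get(field, [])]
    return enriched


source_record = {
    "title": "Survey 2022",
    "author": "A. Researcher",
    "keyword": ["COVID-19", "housing"],
    "internal_id": 17,  # has no mapping, so it is dropped by the crosswalk
}
harmonised = enrich_subjects(apply_crosswalk(source_record, CROSSWALK), VOCABULARY)
print(harmonised)
```

Once records from different instances share field names and concept URIs, the same query can be run against all of them, which is the precondition for federated search.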
“Archive in a box” features (SSHOC Dataverse)
● Dockerized version of the Dataverse application and shared networked services
● fully automatic Dataverse deployment with Traefik proxy
● Dataverse configuration managed through an environment file (.env)
● different Dataverse distributions, with services of your choice, suitable for different
use cases and research communities
● external controlled vocabularies support (demo of CESSDA CMM metadata fields
connected to the Skosmos framework)
● S3-compatible MinIO storage support for Cloud Storage
● data previewers integrated in the Dataverse distribution
● startup process managed through scripts located in the init.d folder
● automatic Solr reindex
● external services integration with PostgreSQL triggers
● support of custom metadata schemes (CESSDA CMM, CLARIN CMDI, ...)
● built-in Web interface localization uses the Dataverse language pack to support multiple
languages out of the box
https://github.com/IQSS/dataverse-docker
“Archive in a box” infra suitable both for academics and industry
Source: Citizen Science and Open Science Core Concepts and Areas of Synergy (Vohland and Göbel, 2017)
Anyone can set up their own digital archive and share the content in a distributed infra
Decentralized FAIR Dataverse network with APIs to share (meta)data, search, storage and provenance
Open vs Closed Innovation in the decentralized world
Open Data vs Restricted (Sensitive) Data
Credits: OECD
Can Data still be Sensitive and FAIR at the same time?
Building FAIR decentralized data network for any type of content
Source: Wikipedia
We're considering an experimental implementation of decentralized identifiers for controlled
vocabularies, and a content types extension to archive various types of content.
DIDs can be assigned to any artefact, including images, audio and video, for example to store and link
metadata records and provenance information together with their digitized content.
A DID can be private (invisible to and not resolvable by the public) but accessible with a cryptographic key.
DOI costs for Open Data
The DataCite agency charges data providers a fee that depends on the number of identifiers,
and the amount can become significant starting from 1 million DOIs. What about DIDs?
Typical problems of “centralized” identifiers
Disambiguation and authorship issues:
● two authors with the same name are mentioned in different papers: how do you know who is who?
● it's very difficult to assign a paper to a specific person with an ORCID without knowing for a fact that this person is the original author
● some people can make false (fraudulent) authorship claims
A centralized entity can be considered a single point of failure.
Typical questions:
● can an email address be considered an identifier?
● what should happen when an email address changes because the domain name changes, and the identifier disappears
or is no longer resolvable?
● how reliable is the ORCID database?
“Centralized” controlled vocabularies
The European Language Social Science Thesaurus (ELSST) is hosted in Skosmos
by various data providers such as CESSDA and ODISSEI.
CESSDA has an updated version with more language properties.
How should versions of vocabularies, concept changes and concept drift be handled?
Decentralized identifiers as possible solution
We envision a near future where it will be possible to create a decentralized system which does not depend on any specific
registry, single provider or single authority, so that all connections are established in a peer-to-peer network but remain persistent at
the same time.
The resolution of a global decentralized identifier (DID) should be cryptographically verifiable, to prove the identity and the
ownership of that identifier.
Core DID features are listed below:
1. A permanent (persistent) identifier (never changes)
2. A resolvable identifier (you can look it up to discover metadata)
3. A cryptographically-verifiable identifier (with private and public keys)
4. A decentralized identifier (no centralized authority)
DIDs should bring control of all provenance and metadata back to their owners instead of giving it away. At the same time, the public part
will not be very different from other persistent identifiers like DOIs, and could even replace them for specific use cases like sharing sensitive
data.
Major Concerns about DIDs
● Selection of PID technology, governance and business model depends heavily on a variety of
additional non-technical factors; depending on the use case, one needs a sensible
mechanism for identifying the best solution.
● Centralized solutions can work better for some use cases, depending on the requirements.
● The cost of DIDs can increase if you don't have the resources to run the infrastructure; more expertise
is required.
● DIDs take power away from centralized authorities and give it back to individuals, who should
be prepared for the concept shift, for example how to use “digital wallets” to keep their
ownership.
● The automation of trust with DID technology means there is no human in the loop, which could
be risky in the long run.
The place of DID as unified resource
Source: “Self-Sovereign Identity”. by Alex Preukschat, Drummond Reed
A DID can be considered a “replacement” for the domain names and DNS of the “centralized” network
Example of DID with private and public key, and service endpoints
Service endpoints describe how exactly to interact with the subject: which protocols and which network endpoints
are available, for example to connect to an agent that represents the data subject so that you can then exchange
credentials or other messages.
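As a minimal sketch of what such a document can look like, the snippet below builds a DID document as a plain dictionary using property names from the W3C DID Core data model (`id`, `verificationMethod`, `authentication`, `service`). The DID, the key value, the service type and the endpoint URL are placeholders invented for illustration. Note that only the public key is published in the document; the private key stays with the controller.

```python
import json


def build_did_document(did, public_key_multibase, endpoint_url):
    """Return a minimal DID document with one public key and one service endpoint."""
    key_id = f"{did}#key-1"
    return {
        "@context": "https://www.w3.org/ns/did/v1",
        "id": did,
        "verificationMethod": [
            {
                "id": key_id,
                "type": "Ed25519VerificationKey2020",
                "controller": did,
                "publicKeyMultibase": public_key_multibase,  # placeholder value below
            }
        ],
        "authentication": [key_id],
        "service": [
            {
                "id": f"{did}#agent",
                "type": "ExampleAgentService",  # hypothetical service type
                "serviceEndpoint": endpoint_url,
            }
        ],
    }


def find_endpoints(document, service_type):
    """List the endpoints of all services of the given type."""
    return [
        service["serviceEndpoint"]
        for service in document.get("service", [])
        if service.get("type") == service_type
    ]


document = build_did_document(
    "did:example:123456789abcdefghi",
    "zPLACEHOLDER-PUBLIC-KEY",
    "https://agent.example.org/didcomm",
)
print(json.dumps(document, indent=2))
print(find_endpoints(document, "ExampleAgentService"))
```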
Attributes in DID document
DID URLs with parameters
Source: Decentralized identifiers (DIDs) fundamentals and deep dive, SSIMeetup
“Decentralized” technology is not the same as “Blockchain” technology
“Blockchain is a digitally distributed database that is shared among nodes, which are computers in the blockchain network, that makes
it difficult or impossible to change, hack, or cheat the system”.
Blockchain parties:
- Holder (Owner of the Verifiable Credential)
- Issuer (provides a credential to a holder and signs the credential with their private key)
- Verifier can check the blockchain to ensure that the issued certificate belongs to who it was issued to.
It is not necessary to use a blockchain to issue decentralized identifiers: about 100 methods for registering DIDs are being
developed by various companies and organizations around the world. They implement the same interface specification in
different ways, with standardized input and output.
The OYDID method was developed in Vienna and provides a self-sustained environment for managing digital identifiers
(DIDs). The did:oyd method links the identifier cryptographically to the DID Document and, through provenance information
in a public log that is also cryptographically linked, ensures resolution to the latest valid version of the DID Document.
Universal Resolver for DIDs
Try this! https://dev.uniresolver.io
curl https://dev.uniresolver.io/1.0/identifiers/did:oyd:zQmdQvLdpogfEf5EHK7778EM9xoxFMVFdJgRD7SdYRcCHeL
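The same request can be sketched in Python with the standard library only. The base URL and the DID are the ones from the curl command above; the function names are our own, and the structure of the JSON response is not assumed here, so the parsed result is returned as-is.

```python
import json
import urllib.parse
import urllib.request

RESOLVER_BASE = "https://dev.uniresolver.io/1.0/identifiers/"
DID = "did:oyd:zQmdQvLdpogfEf5EHK7778EM9xoxFMVFdJgRD7SdYRcCHeL"


def resolver_url(did, base=RESOLVER_BASE):
    """Build the resolver URL for a DID, keeping the ':' separators unescaped."""
    return base + urllib.parse.quote(did, safe=":")


def resolve(did, timeout=30):
    """Fetch the resolution result for `did` and return the parsed JSON."""
    with urllib.request.urlopen(resolver_url(did), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(resolver_url(DID))
```

Calling `resolve(DID)` performs the actual network request; it is left out of the snippet so that the example runs offline.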
OYDID methods explained
“OYDID (Own Your Decentralized IDentifier) takes the approach to not maintain DID and DID Document on a public ledger
but on one or more local storages (that usually are publicly available). Through cryptographically linking the DID identifier
to the DID Document, and furthermore linking the DID Document to a chained provenance trail, the same security and
validation properties as a traditional DID are maintained while avoiding highly redundant storage and general public access.”
(from OYDID docs)
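The idea of cryptographically linking an identifier to its document can be illustrated with a simplified sketch: the identifier is derived from a hash of the document, so anyone holding both can check that they belong together. This is not the actual OYDID algorithm (which has its own encoding and document structure); the `did:example:` prefix, the canonicalisation and the use of SHA-256 hex digests are assumptions made for illustration.

```python
import hashlib
import json


def canonical_bytes(document):
    """Serialise the document deterministically so equal content gives equal bytes."""
    return json.dumps(document, sort_keys=True, separators=(",", ":")).encode("utf-8")


def derive_identifier(document):
    """Derive a content-based identifier from the document hash."""
    return "did:example:" + hashlib.sha256(canonical_bytes(document)).hexdigest()


def verify(identifier, document):
    """Check that `document` is the one the identifier was derived from."""
    return identifier == derive_identifier(document)


original = {"doc": {"label": "COVID-19"}, "log": "hash-of-latest-log-entry"}
identifier = derive_identifier(original)

tampered = {"doc": {"label": "changed"}, "log": "hash-of-latest-log-entry"}
print(verify(identifier, original))  # the untouched document matches
print(verify(identifier, tampered))  # a modified document does not
```

Because the identifier depends only on the content, the document can be kept on any local storage: a resolver does not need to trust the storage, only to recompute the hash.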
DIDs for controlled vocabularies
Generic problem of CVs: most controlled vocabularies are published and distributed in an unsustainable way and often
don't even have persistent identifiers resolving to their concepts.
Possible solution for CLARIAH FAIR vocabularies:
● assign a DID to every vocabulary concept and use the built-in DID “update” mechanism to keep all revisions in a chain of
linked DIDs resolving to the archived version of every change
● metadata records can be linked in a distributed way to DIDs corresponding to a specific version of a concept
preserved in the data ledger
● this approach is more sustainable by design and can be considered a step towards FAIR vocabularies, with higher scores in
FAIR assessments
● vocabulary management/update is in the hands of the vocabulary owner/creator; a separate private key will be generated for every
concept and should be stored in a secure place
● extra properties and attributes could be added to DID documents representing specific vocabulary concepts, such as
provenance information containing the date of creation or modification, authors, the name of the ontology, relations to other
ontologies. They can even have their own labels.
● statistics on concept usage, linkages, relations and other metrics will be available directly from the DID chains
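A chain of linked concept revisions, as proposed above, could be sketched as follows. Each revision stores the hash of the previous one, so the full history can be walked and checked. The record fields, the example concept URI and the hashing scheme are hypothetical; a real DID method defines its own update mechanism.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    concept_uri: str
    label: str
    modified: str            # ISO date of the change
    previous: Optional[str]  # digest of the previous revision, None for the first

    def digest(self) -> str:
        payload = json.dumps([self.concept_uri, self.label, self.modified, self.previous])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_revision(chain: List[Revision], concept_uri: str, label: str, modified: str) -> List[Revision]:
    """Return a new chain with one more revision linked to the current head."""
    previous = chain[-1].digest() if chain else None
    return chain + [Revision(concept_uri, label, modified, previous)]


def chain_is_valid(chain: List[Revision]) -> bool:
    """Check that every revision points at the digest of its predecessor."""
    expected = None
    for revision in chain:
        if revision.previous != expected:
            return False
        expected = revision.digest()
    return True


URI = "https://example.org/vocab/c19"  # hypothetical concept URI
chain = append_revision([], URI, "Coronavirus disease", "2020-03-01")
chain = append_revision(chain, URI, "COVID-19", "2021-01-15")

print(chain[-1].label)        # latest version of the concept
print(chain_is_valid(chain))  # history is intact
```

A metadata record that cites the digest of a specific revision stays linked to exactly that version of the concept, even after later updates.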
CoronaWhy Proof of Concept on DIDs
A Dataverse with information on the 2022 Monkeypox outbreak uses DIDs as persistent identifiers
https://datasets.coronawhy.org
DID summarizer
https://github.com/Dans-labs/did-summarizer
Vocabulary recommender
The Vocabulary Recommender command-line interface
(CLI) was developed by Triply and provides a
recommendation interface which returns relevant
Internationalized Resource Identifiers (IRIs) based on
the search input. It works with SPARQL or
Elasticsearch endpoints which contain relevant
vocabulary datasets.
DANS has turned it into a service.
Decentralized archiving with DIDs
Cache and storage
All concepts are cached in RAM using Redis and preserved in a MongoDB database. After every restart the URI:DID key:value
pairs are reindexed and available for lookup in the cache. It should be possible to move all DID data from one network to
another without too much effort.
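The cache-plus-store pattern described above can be sketched without external services. In this illustration a plain list stands in for the persistent database and a dictionary stands in for the in-memory cache; the class and method names are our own, not those of the actual service.

```python
class ConceptIndex:
    """URI -> DID lookup backed by a persistent store and an in-memory cache."""

    def __init__(self, store):
        self.store = store  # persistent records (stand-in for the database)
        self.cache = {}     # fast lookup (stand-in for the in-memory cache)
        self.reindex()      # rebuild the cache on every start

    def reindex(self):
        """Reload all URI:DID pairs from the store into the cache."""
        self.cache = {record["uri"]: record["did"] for record in self.store}

    def register(self, uri, did):
        """Persist a new pair and make it available for lookup immediately."""
        self.store.append({"uri": uri, "did": did})
        self.cache[uri] = did

    def lookup(self, uri):
        """Return the DID for a URI, or None if it is unknown."""
        return self.cache.get(uri)


store = []
index = ConceptIndex(store)
index.register("https://example.org/vocab/c19", "did:example:abc123")

# Simulate a restart: a fresh index over the same store rebuilds its cache.
restarted = ConceptIndex(store)
print(restarted.lookup("https://example.org/vocab/c19"))
```

Since the store holds the complete set of pairs, copying it to another deployment and reindexing there is enough to move the data.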
Archiving layer
Content archiving functionality is optional and is implemented using the S3 protocol, for compatibility with cloud storage services such as AWS,
Azure Blob Storage and Google Cloud Platform (GCP). By default, the contents of every object or web page with a global DID can be
stored in MinIO High Performance Object Storage.
Use case: COVID-19 Museum (C19M) with Yves Rozenholc
“Archive in a box” infrastructure based on Dataverse
Archive in a box: increasing Dataverse metadata interoperability
External controlled vocabularies support contributed by SSHOC project (data infrastructure for the EOSC)
COVID-19 questions in SKOSMOS framework
Interactive C19M timeline
Demo
C19M components: Cloud Storage - MinIO
MinIO is an open source distributed object storage
server written in Go, designed for Private Cloud
infrastructure and providing S3 storage functionality.
MinIO is suited for storing unstructured data such as
photos, videos, log files, backups, and container images.
Some features:
● MinIO supports multiple, sophisticated server-side
encryption schemes to protect data wherever it
may be.
● MinIO supports the most advanced standards in
identity management, integrating with
OpenID Connect compatible providers
● MinIO's continuous replication is designed for
large scale, cross data center deployments
● A MinIO Federation Server supports an unlimited
number of Distributed Mode sets
Human-in-the-Loop for Machine Learning
“Computers are incredibly fast, accurate
and stupid; humans are incredibly slow,
inaccurate and brilliant; together they
are powerful beyond imagination.”
Albert Einstein
“A combination of AI and Human
Intelligence gives rise to an extremely
high level of accuracy and intelligence
(Super Intelligence)”
Source: Hackernoon.com
C19M components: annotation tool (Doccano)
C19M components: Hypothes.is as a peer review service
1. The AI pipeline does domain-specific entity extraction and ranking of relevant CORD-19 papers.
2. Automatically extracted entities and statements will be added; important fragments should be highlighted.
3. Human annotators should verify the results and validate all statements.
SEMAF service - semantic transformations
Proposal: SEMAF: A Proposal for a Flexible Semantic Mapping Framework
C19M components: visualizations with Apache Superset
Source: Apache Superset (Open Source)
C19M Graph Network Sustainability with DIDs
COVID-19 Museum Knowledge Graph. Q142 Wikidata: France@en, Frankrijk@nl, Frankreich@de, Франція@ua, France@fr
Graph scalability challenges (C19M covid graph)
https://kg.zandbak.dans.knaw.nl/graph/text-nodes/ COVID: https://kg.zandbak.dans.knaw.nl/graph/covid/
Questions?
Slava Tykhonov, R&D
(DANS-KNAW, the Netherlands)
vyacheslav.tykhonov@dans.knaw.nl

Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataversevty
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research datavty
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse vty
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataversevty
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes vty
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challengesvty
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloudvty
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2vty
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloudvty
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesvty
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC projectvty
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repositoryvty
 

More from vty (19)

Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
External CV support in Dataverse 5.7
External CV support in Dataverse 5.7External CV support in Dataverse 5.7
External CV support in Dataverse 5.7
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes CLARIN CMDI use case and flexible metadata schemes
CLARIN CMDI use case and flexible metadata schemes
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21Flexible metadata schemes for research data repositories - CLARIN Conference'21
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Controlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repositoryControlled vocabularies and ontologies in Dataverse data repository
Controlled vocabularies and ontologies in Dataverse data repository
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
External controlled vocabularies support in Dataverse
External controlled vocabularies support in DataverseExternal controlled vocabularies support in Dataverse
External controlled vocabularies support in Dataverse
 
Setting up Dataverse repository for research data
Setting up Dataverse repository for research dataSetting up Dataverse repository for research data
Setting up Dataverse repository for research data
 
CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse CLARIN CMDI support in Dataverse
CLARIN CMDI support in Dataverse
 
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC,  Service QA and DataverseIntegration of WORSICA’s thematic service in EOSC,  Service QA and Dataverse
Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
 
Technical integration of data repositories status and challenges
Technical integration of data repositories status and challengesTechnical integration of data repositories status and challenges
Technical integration of data repositories status and challenges
 
SSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science CloudSSHOC Dataverse in the European Open Science Cloud
SSHOC Dataverse in the European Open Science Cloud
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2Dataverse SSHOC enrichment of DDI support at EDDI'19 2
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
Dataverse in the European Open Science Cloud
Dataverse in the European Open Science CloudDataverse in the European Open Science Cloud
Dataverse in the European Open Science Cloud
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
Development in Dataverse SSHOC projectDevelopment in Dataverse SSHOC project
Development in Dataverse SSHOC project
 
DataverseEU as multilingual repository
DataverseEU as multilingual repositoryDataverseEU as multilingual repository
DataverseEU as multilingual repository
 

Recently uploaded

GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 

Recently uploaded (20)

GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 

Decentralised identifiers and knowledge graphs

  • 1. Journées Scientifiques de Rochebrune 2023 (JSR'23) Slava Tykhonov, R&D (DANS-KNAW, the Netherlands) 29 March 2023 Decentralized research data infrastructure and knowledge graphs
  • 2. About me: DANS-KNAW projects (2016-2023) ● MuseIT H2020 (ongoing) ● Polifonia H2020 (ongoing) ● CLARIAH+ (ongoing) ● ODISSEI (ongoing) ● EOSC Synergy ● SSHOC Dataverse ● CESSDA Dataverse Europe 2018 ● Time Machine Europe Supervisor at DANS-KNAW ● PARTHENOS Horizon 2020 ● CESSDA PID (Persistent Identifiers) Horizon 2020 ● CLARIAH-NL ● RDA (Research Data Alliance) PITTS Horizon 2020 ● CESSDA SaW H2020-EU.1.4.1.1 Horizon 2020 Source: LinkedIn
  • 3. Building an Operating System for Open Science ● A generic Common Research and Data Infrastructure should be distributed and robust enough to be scaled up and reused for any challenging task, such as cancer research ● Networked services are built from Open Source components ● Data is processed and published in a FAIR way; provenance information is part of our Data Lake ● Data evaluation and credibility are the top priority; we provide tools for the expert community to verify our datasets ● The transparency of data and services guarantees the reproducibility of all experiments and brings new insights into multidisciplinary research ● Infrastructure should foster collaboration between people and bring together the general public, researchers, citizen scientists, etc. ● Infrastructure is free of charge; (meta)data is protected and licensed.
  • 4. Looking for Commons Merce Crosas, “Harvard Data Commons”
  • 5. Building a horizontal platform to serve vertical teams Source: CoronaWhy infrastructure introduction
  • 6. DANS Data Stations - Future Data Services Dataverse is an API-based data platform and a key framework for Open Innovation!
  • 7. What is Dataverse? ● Open source data repository developed by IQSS at Harvard University ● Great product with a very long history (since 2006), created by an experienced and Agile development team ● Clear vision and understanding of the requirements of research communities, with a public roadmap ● Well-developed architecture with rich APIs that allows application layers to be built around Dataverse ● The strong community behind Dataverse helps to improve the basic functionality and develop it further ● DANS-KNAW delivered a production-ready (Docker/k8s) Dataverse repository for the European Open Science Cloud (EOSC) communities CESSDA, CLARIN and DARIAH ● Dataverse is the de facto standard for FAIR data repositories in Europe, with wide adoption in the Netherlands, France, Norway, Portugal and other European countries
  • 8. Data integration challenges ● datasets are very heterogeneous and multilingual ● data usually lacks sufficient quality control ● data providers use different modeling schemas and styles ● linked data cleansing and versioning are very difficult to track and maintain properly, and web resources aren’t persistent ● even modern data repositories provide only metadata records describing the data, without giving access to individual data items stored in files ● it is difficult to assign entity relationships in knowledge graphs and keep them up to date manually
  • 9. Benefits of the Common Data Infrastructure ● It’s distributed and sustainable, suitable for the future ● maintenance costs will drop massively: the more organizations join, the less expensive it will be to support ● maintenance costs could be reallocated to training and to further development of new (common) features ● reuse of the same infrastructure components will improve the quality and the speed of knowledge exchange ● building multidisciplinary teams that reuse the same infrastructure can bring new insights and unexpected views ● the Common Data Infrastructure plays the role of a “universal gravity” force for Data Science projects (and so on…)
  • 10. Semantic interoperability on the infrastructure level We envision a situation where thousands of Dataverse instances on the web (due to EOSC) can be searched for data simultaneously and will form a shared Data Lake. The old dream of Federated search/Universal catalogue can only be realised if: (1) crosswalks, i.e. mappings across different metadata schemes, are implemented (2) metadata schemes offer ways to enrich indexes with values from controlled vocabularies. Standard response (centralized) = standardisation and harmonisation = repository software, certain metadata standards, or certain controlled vocabularies. New response (distributed) = explore agile solutions (Proofs of Concept) which can be implemented by different communities (even smaller ones), so that we keep variety and still enable integration in the Distributed Data Network by applying Linked Data technologies.
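A crosswalk of the kind mentioned in (1) can be sketched as a plain field mapping. This is only an illustration: the field names on both sides are assumptions chosen for the example and are not taken from an actual Dataverse, DDI or Dublin Core crosswalk.

```python
# Hypothetical crosswalk: source field name -> target field name.
# All names are illustrative placeholders.
CROSSWALK = {
    "title": "dc:title",
    "author": "dc:creator",
    "keyword": "dc:subject",
}


def apply_crosswalk(record, crosswalk=CROSSWALK):
    """Translate a flat metadata record into the target scheme.

    Returns (mapped, unmapped) so that fields without a mapping
    are reported instead of being silently dropped.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


mapped, unmapped = apply_crosswalk(
    {"title": "Survey 2020", "author": "A. Researcher", "depositor": "x"}
)
```

Keeping the unmapped fields visible is a deliberate choice in this sketch: it shows where a community-specific scheme has no counterpart in the target scheme yet.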
  • 11. “Archive in a box” features (SSHOC Dataverse) ● Dockerized version of the Dataverse application and shared networked services ● fully automatic Dataverse deployment with Traefik proxy ● Dataverse configuration managed through the environment file .env ● different Dataverse distributions with services of your choice, suitable for different use cases and research communities ● external controlled vocabularies support (demo of CESSDA CMM metadata fields connected to the Skosmos framework) ● S3-compatible MinIO storage support for Cloud Storage ● data previewers integrated in the Dataverse distribution ● startup process managed through scripts located in the init.d folder ● automatic SOLR reindex ● external services integration with PostgreSQL triggers ● support for custom metadata schemes (CESSDA CMM, CLARIN CMDI, ...) ● built-in Web interface localization uses the Dataverse language pack to support multiple languages out of the box https://github.com/IQSS/dataverse-docker
  • 12. “Archive in a box” infrastructure suitable for both academia and industry Source: Citizen Science and Open Science Core Concepts and Areas of Synergy (Vohland and Göbel, 2017) Anyone can set up their own digital archive and share its content in a distributed infrastructure. Decentralized FAIR Dataverse network with APIs to share (meta)data, search, storage and provenance
  • 13. Open vs Closed Innovation in the decentralized world
  • 14. Open Data vs Restricted (Sensitive) Data Credits: OECD Can Data still be Sensitive and FAIR at the same time?
  • 15. Building a FAIR decentralized data network for any type of content Source: Wikipedia We’re considering an experimental implementation of decentralized identifiers for controlled vocabularies and a content types extension to archive various types of content. DIDs can be assigned to any artefacts, including images, audio and video, for example to store and link metadata records and provenance information together with their digitized content. A DID can be private (invisible and not resolvable for the public) but still accessible with a cryptographic key.
  • 16. DOI costs for Open Data The DataCite agency charges data providers a fee that depends on the number of identifiers, and it can be a significant amount starting from 1 million DOIs. What about DIDs?
  • 17. Typical problems of “centralized” identifiers Disambiguation and authorship issues: ● when two authors with the same name are mentioned in different papers, how do you know who is who? ● it’s very difficult to assign a paper to a specific person with an ORCID without knowing for a fact that this person is the original author ● some people can make false (fraudulent) authorship claims. A centralized entity can be considered a single point of failure. Typical questions: ● can an email address be considered an identifier? ● what to do when an email address changes because the domain name changes, and the identifier disappears or is no longer resolvable? ● how reliable is the ORCID database?
  • 18. “Centralized” controlled vocabularies The European Language Social Science Thesaurus (ELSST) is hosted in Skosmos by various data providers such as CESSDA and ODISSEI. CESSDA has an updated version with more language properties. What about vocabulary versions, concept changes and drift?
  • 19. Decentralized identifiers as a possible solution We envision a near future where it will be possible to create a decentralized system which does not depend on any specific registry, single provider or single authority, so that all connections are established in a peer-to-peer network but remain persistent at the same time. The resolution of the global decentralized identifier (DID) should be cryptographically verifiable to prove the identity and the ownership of that identifier. Core DID features are listed below: 1. A permanent (persistent) identifier (it never changes) 2. A resolvable identifier (you can look it up to discover metadata) 3. A cryptographically verifiable identifier (with private and public keys) 4. A decentralized identifier (no centralized authority) DIDs should bring control of all provenance and metadata back to their owners instead of giving it away. At the same time, the public part need not be very different from other persistent identifiers like DOIs, and DIDs could even replace them for specific use cases such as sharing sensitive data.
  • 20. Major Concerns about DIDs ● The selection of PID technology, governance and business model depends heavily on a variety of additional non-technical factors, so one needs a sensible mechanism for identifying the best solution for each use case. ● Centralized solutions can work better for some use cases, depending on the requirements. ● The cost of DIDs can increase if you don’t have the resources to run the infrastructure; more expertise is required. ● DIDs take power away from centralized authorities and give it back to individuals, who should be prepared for this conceptual shift, for example by learning how to use “digital wallets” to keep their ownership. ● The automation of trust with DID technology means there is no human in the loop, which could be risky in the long run.
  • 21. The place of DID as unified resource Source: “Self-Sovereign Identity” by Alex Preukschat, Drummond Reed A DID can be considered a “replacement” for the domain names and DNS of the “centralized” network
  • 22. Example of DID with private and public key, and service endpoints Service endpoints describe exactly how to interact with the subject: which protocols and which network endpoints are available to connect, for example, to an agent that represents the data subject, so that you can then exchange credentials or other messages.
  • 23. Attributes in DID document
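As a rough illustration of the attributes discussed on the two slides above, a minimal DID document could look like the sketch below. The property names follow the W3C DID Core data model; the identifier (using the `did:example` method reserved for examples), the key value, the service type and the endpoint URL are all made-up placeholders. Only the public key appears in the document; the private key stays with the controller.

```python
import json

# Placeholder identifier; "did:example" is used for illustration only.
did = "did:example:123456789abcdefghi"

did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": did,
    # Public key material only; the private key is never published.
    "verificationMethod": [
        {
            "id": f"{did}#key-1",
            "type": "Ed25519VerificationKey2020",
            "controller": did,
            "publicKeyMultibase": "zPLACEHOLDER",  # not a real key
        }
    ],
    "authentication": [f"{did}#key-1"],
    # Service endpoints tell a client how to interact with the subject.
    "service": [
        {
            "id": f"{did}#agent",
            "type": "ExampleAgentService",  # hypothetical service type
            "serviceEndpoint": "https://agent.example.org/inbox",
        }
    ],
}


def service_endpoints(doc):
    """Return a mapping of service type to endpoint for a DID document."""
    return {s["type"]: s["serviceEndpoint"] for s in doc.get("service", [])}


print(json.dumps(did_document, indent=2))
```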
  • 24. DID URLs with parameters Source: Decentralized identifiers (DIDs) fundamentals and deep dive, SSIMeetup
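A DID URL consists of a DID followed by an optional path, query and fragment. The helper below is a minimal sketch of splitting such a URL into its parts; `parse_did_url` is a hypothetical name, the example values are placeholders, and the function does not validate the full grammar of the DID specification.

```python
from urllib.parse import parse_qs


def parse_did_url(did_url):
    """Split a DID URL into DID, method, path, parameters and fragment."""
    rest, _, fragment = did_url.partition("#")
    rest, _, query = rest.partition("?")
    did, slash, path = rest.partition("/")
    # The method-specific id may itself contain colons, so split at most twice.
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or not parts[1] or not parts[2]:
        raise ValueError(f"not a DID URL: {did_url!r}")
    return {
        "did": did,
        "method": parts[1],
        "id": parts[2],
        "path": slash + path,
        "params": {k: v[0] for k, v in parse_qs(query).items()},
        "fragment": fragment,
    }


parsed = parse_did_url("did:example:123?service=agent&relativeRef=/credentials#degree")
```

Here `service` and `relativeRef` are parameters defined in the DID Core specification; a resolver would use them to select a service endpoint and a resource relative to it.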
  • 25. “Decentralized” technology is not the same as “Blockchain” technology “Blockchain is a digitally distributed database that is shared among nodes, which are computers in the blockchain network, that makes it difficult or impossible to change, hack, or cheat the system”. Blockchain parties: - Holder (owner of the Verifiable Credential) - Issuer (provides a credential to a holder and signs the credential with their private key) - Verifier (can check the blockchain to ensure that the issued certificate belongs to the party it was issued to). It is not necessary to use a blockchain to issue decentralized identifiers: about 100 methods for registering DIDs are being developed by various companies and organizations around the world. They implement the same interface specification in different ways, with standardized input and output. The OYDID method was developed in Vienna and provides a self-sustained environment for managing decentralized identifiers (DIDs). The did:oyd method links the identifier cryptographically to the DID Document and, through provenance information in a public log that is also cryptographically linked, ensures resolution to the latest valid version of the DID Document.
  • 26. Universal Resolver for DIDs Try this! https://dev.uniresolver.io curl https://dev.uniresolver.io/1.0/identifiers/did:oyd:zQmdQvLdpogfEf5EHK7778EM9xoxFMVFdJgRD7SdYRcCHeL
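The same request as the curl call above can be sketched in Python with the standard library only. The base URL and the DID are taken from the slide; the function names are hypothetical, and nothing is assumed here about the structure of the JSON that the resolver returns or about the service still being available.

```python
import json
import urllib.request

# Base URL as shown on the slide.
RESOLVER = "https://dev.uniresolver.io/1.0/identifiers/"


def resolver_url(did, base=RESOLVER):
    """Build the resolution URL for a DID."""
    if not did.startswith("did:"):
        raise ValueError(f"not a DID: {did!r}")
    return base + did


def resolve(did, timeout=30):
    """Fetch the resolution result as parsed JSON (performs a network call)."""
    with urllib.request.urlopen(resolver_url(did), timeout=timeout) as response:
        return json.load(response)


url = resolver_url("did:oyd:zQmdQvLdpogfEf5EHK7778EM9xoxFMVFdJgRD7SdYRcCHeL")
```

Calling `resolve(...)` with the same DID would perform the actual lookup; it is left out of the example so that the snippet runs without network access.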
  • 27. OYDID methods explained “OYDID (Own Your Decentralized IDentifier) takes the approach to not maintain DID and DID Document on a public ledger but on one or more local storages (that usually are publicly available). Through cryptographically linking the DID identifier to the DID Document, and furthermore linking the DID Document to a chained provenance trail, the same security and validation properties as a traditional DID are maintained while avoiding highly redundant storage and general public access.” (from OYDID docs)
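The idea of linking an identifier to its document and to a chained provenance trail can be illustrated with plain hashes. This is a simplified sketch and not the OYDID algorithm: the real method uses its own encodings and signatures, whereas this example uses SHA-256 hex digests, unsigned log entries and a made-up `did:demo` method name.

```python
import hashlib
import json


def digest(obj):
    """Hash a JSON-serialisable object in a canonical (sorted-key) form."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def new_entry(document, previous=None):
    """Create a log entry that binds a document version to the previous entry."""
    entry = {"doc_hash": digest(document), "previous": previous}
    return entry, digest(entry)


def verify(document, entry, entry_id):
    """Check that the entry matches its id and the document matches the entry."""
    return digest(entry) == entry_id and digest(document) == entry["doc_hash"]


# First version: the identifier is derived from the log entry itself.
doc_v1 = {"label": "unemployment"}
entry1, id1 = new_entry(doc_v1)
did = "did:demo:" + id1  # hypothetical method name

# An update creates a new entry that points back to the previous one.
doc_v2 = {"label": "unemployment", "note": "revised"}
entry2, id2 = new_entry(doc_v2, previous=id1)
```

Because every entry contains the hash of its predecessor, changing any earlier document or entry changes all later identifiers, which is what makes the trail verifiable without a public ledger.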
  • 28. DIDs for controlled vocabularies Generic problem of CVs: most controlled vocabularies are published and distributed in an unsustainable way and often don’t even have persistent identifiers resolving to their concepts. Possible solution for CLARIAH FAIR vocabularies: ● assign a DID to every vocabulary concept and use the built-in “update” mechanism to keep all revisions in a chain of linked DIDs, each resolving to the archived version of a change ● metadata records can be linked in a distributed way to the DIDs corresponding to a specific version of a concept preserved in the data ledger ● this approach is more sustainable by design and can be considered a step towards FAIR vocabularies, with high scores in FAIR assessments ● vocabulary management/update is in the hands of the vocabulary owner/creator; a separate private key is generated for every concept and should be stored in a secure place ● extra properties and attributes can be added to the DID documents representing a specific vocabulary concept, such as provenance information containing the date of creation or modification, authors, the name of the ontology and relations to other ontologies. They can even have their own labels. ● statistics on concept usage, linkages, relations and other metrics will be available directly from the DID chains
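The linking of metadata records to specific concept revisions, and the usage statistics that follow from it, can be sketched with a small in-memory model. All identifiers, labels and record structures below are hypothetical placeholders; an actual implementation would resolve the DIDs rather than look them up in a dictionary.

```python
from collections import Counter

# Hypothetical revision chain for one concept: every revision has its own DID
# and points back to the revision it replaces.
revisions = [
    {"did": "did:example:concept-v1", "label": "unemployment", "previous": None},
    {
        "did": "did:example:concept-v2",
        "label": "unemployment (revised)",
        "previous": "did:example:concept-v1",
    },
]
by_did = {rev["did"]: rev for rev in revisions}

# Metadata records pin the exact revision that was used when they were described.
records = [
    {"id": "dataset-1", "keyword": "did:example:concept-v1"},
    {"id": "dataset-2", "keyword": "did:example:concept-v2"},
    {"id": "dataset-3", "keyword": "did:example:concept-v2"},
]


def history(did, index):
    """Walk the chain backwards from a revision to the first version."""
    chain = []
    while did is not None:
        chain.append(did)
        did = index[did]["previous"]
    return chain


def usage(items):
    """Count how many records refer to each concept revision."""
    return Counter(item["keyword"] for item in items)


chain = history("did:example:concept-v2", by_did)
counts = usage(records)
```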
  • 29. CoronaWhy Proof of Concept on DIDs Dataverse with information on Monkeypox 2022 outbreak use DIDs as persistent identifiers https://datasets.coronawhy.org
  • 31. Vocabulary recommender The Vocabulary Recommender command-line interface (CLI) was developed by Triply and provides a recommendation interface that returns relevant Internationalized Resource Identifiers (IRIs) for a search input. It works with SPARQL or Elasticsearch endpoints that contain relevant vocabulary datasets. DANS has turned it into a service.
  • 32. Decentralized archiving with DIDs Cache and storage: all concepts are cached in RAM using Redis and persisted in a MongoDB database. After every restart the URI:DID key-value pairs are reindexed and become available for lookup in the cache. It should be possible to move all DID data from one network to another without much effort. Archiving layer: content archiving is optional and is implemented using the S3 protocol, which is compatible with cloud storage services such as AWS, Azure Blob Storage and Google Cloud Platform (GCP). By default the contents of every object or web page with a global DID can be stored in MinIO High Performance Object Storage.
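The cache-and-storage behaviour described above can be sketched without any external services. In this illustration a plain dictionary stands in for the MongoDB collection and another for the Redis cache; the class `DidIndex` and its methods are hypothetical names, not the API of the DANS service.

```python
from typing import Dict, Optional


class DidIndex:
    """URI -> DID lookup with a persistent store and an in-memory cache.

    Assumption: `store` stands in for the MongoDB collection and
    `_cache` for Redis; both are plain dicts in this sketch.
    """

    def __init__(self, store: Dict[str, str]) -> None:
        self._store = store
        self._cache: Dict[str, str] = {}
        self.reindex()  # rebuild the cache on every start

    def reindex(self) -> None:
        """Reload all URI:DID pairs from the persistent store into the cache."""
        self._cache = dict(self._store)

    def register(self, uri: str, did: str) -> None:
        """Persist a new pair and make it available for lookup immediately."""
        self._store[uri] = did
        self._cache[uri] = did

    def lookup(self, uri: str) -> Optional[str]:
        """Return the DID for a URI from the cache, or None if unknown."""
        return self._cache.get(uri)
```

Creating a second `DidIndex` over the same store simulates a restart: the cache is empty at first and is repopulated by `reindex()`.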
  • 33. Use case: COVID-19 Museum (C19M) with Yves Rozenholc “Archive in a box” infrastructure based on Dataverse
  • 34. Archive in a box: increasing Dataverse metadata interoperability External controlled vocabularies support contributed by SSHOC project (data infrastructure for the EOSC)
  • 35. COVID-19 questions in SKOSMOS framework
  • 37. C19M components: Cloud Storage - MinIO MinIO is an open source distributed object storage server written in Go, designed for private cloud infrastructure and providing S3 storage functionality. MinIO is suited for storing unstructured data such as photos, videos, log files, backups, and container images. Some features: ● supports multiple, sophisticated server-side encryption schemes to protect data wherever it may be ● supports the most advanced standards in identity management, integrating with OpenID Connect compatible providers ● continuous replication designed for large-scale, cross-data-center deployments ● a MinIO Federation Server supports an unlimited number of Distributed Mode sets
  • 38. Human-in-the-Loop for Machine Learning “Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination.” Albert Einstein “A combination of AI and Human Intelligence gives rise to an extremely high level of accuracy and intelligence (Super Intelligence)” Source: Hackernoon.com
  • 39. C19M components: annotation tool (Doccano)
  • 40. C19M components: Hypothes.is as a peer review service 1. An AI pipeline performs domain-specific entity extraction and ranks relevant CORD-19 papers. 2. Automatically extracted entities and statements are added, and important fragments are highlighted. 3. Human annotators verify the results and validate all statements.
  • 41. SEMAF service - semantic transformations Proposal: SEMAF: A Proposal for a Flexible Semantic Mapping Framework
  • 42. C19M components: visualizations with Apache Superset Source: Apache Superset (Open Source)
  • 43. C19M Graph Network Sustainability with DIDs COVID-19 Museum Knowledge Graph. Q142 Wikidata: France@en, Frankrijk@nl, Frankreich@de, Франція@uk, France@fr
  • 44. Graph scalability challenges (C19M covid graph) https://kg.zandbak.dans.knaw.nl/graph/text-nodes/ COVID: https://kg.zandbak.dans.knaw.nl/graph/covid/
  • 45. Questions? Slava Tykhonov, R&D (DANS-KNAW, the Netherlands) vyacheslav.tykhonov@dans.knaw.nl