Skolemising Blank Nodes while
Preserving Isomorphism
Aidan Hogan – DCC, Universidad de Chile
WHY? BLANK NODES ARE GREAT!
When life gives you blank nodes …
Blank Nodes are glue!
Blank node names aren’t important …
(Isomorphic)
Blank nodes are common in real-world data …
Aidan Hogan, Marcelo Arenas, Alejandro Mallea and Axel Polleres
"Everything You Always Wanted to Know About Blank Nodes".
Journal of Web Semantics 27: pp. 42–69, 2014
BLANK NODES ENABLE SYNTAX SHORTCUTS
They represent implicit nodes in the graph
They help specify order, higher-arity relations, reification, etc., succinctly
They are common in real-world data
BLANK NODES:
WHAT’S THE PROBLEM?
Are two RDF graphs isomorphic?
RDF ISOMORPHISM IS GI-COMPLETE
A general algorithm to check whether two RDF graphs are the “same” (isomorphic) will (probably) not be tractable
BLANK NODES ADD COMPLEXITY?
WHAT TO DO?
RDF 1.1 proposes Skolemisation
But minting fresh IRIs every time is not ideal
Would prefer a “consistent” labelling
Compute isomorphically-unique graph hash
Finding duplicate documents from a crawler
CANONICAL LABELLING USEFUL FOR:
1. Mapping blank nodes to IRIs
2. Computing unique hashes for RDF graphs
OLD BUT RECURRING QUESTION
An old question that won’t go away …
Jeremy J. Carroll. “Signing RDF Graphs.” ISWC 2003.
Edzard Höfig, Ina Schieferdecker. “Hashing of RDF Graphs
and a Solution to the Blank Node Problem.” URSW 2014.
NO EXISTING APPROACH IS GENERAL
• Hard cases seem unlikely in practice
• Let’s build a general (and thus worst-case exponential) algorithm that’s efficient for practical cases
NAÏVE CANONICAL LABELLING SCHEME
(Naïve) Canonical labels for blank nodes
But wait … what happens if … ?
Or another case …
Fixpoint does not distinguish all blank nodes!
NAÏVE: COLOUR BLANK NODES RECURSIVELY UNTIL FIXPOINT
• Efficient
• Incomplete
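The naïve scheme can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the author’s implementation: triples are plain string tuples, blank nodes are strings starting with `_:`, SHA-256 from `hashlib` stands in for whatever hash is used, and the helper names (`h`, `refine`, `naive_colouring`) are hypothetical. Each round recolours a blank node from its previous colour plus the predicates and colours of its neighbours, and stops once no new colour classes appear.

```python
import hashlib

def h(*parts: str) -> str:
    """Hash a sequence of strings; the NUL separator avoids ambiguous concatenations."""
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()

def refine(triples, colour):
    """Recolour blank nodes from their neighbourhoods until no new colour classes appear."""
    while True:
        new = {}
        for b in colour:
            edges = []
            for s, p, o in triples:
                if s == b:  # outgoing edge: predicate plus colour (or IRI/literal) of the object
                    edges.append(h("+", p, colour.get(o, o)))
                if o == b:  # incoming edge: predicate plus colour of the subject
                    edges.append(h("-", p, colour.get(s, s)))
            new[b] = h(colour[b], *sorted(edges))
        if len(set(new.values())) == len(set(colour.values())):
            return new  # fixpoint: the partition into colour classes is stable
        colour = new

def naive_colouring(triples):
    """Give every blank node the same initial colour, then refine to a fixpoint."""
    bnodes = {t for s, _, o in triples for t in (s, o) if t.startswith("_:")}
    return refine(triples, {b: "0" for b in bnodes})

# A symmetric two-node cycle: the fixpoint cannot tell the blank nodes apart.
cycle = {("_:a", "<p>", "_:b"), ("_:b", "<p>", "_:a")}
c = naive_colouring(cycle)
print(c["_:a"] == c["_:b"])  # True
```

The final check illustrates the “Incomplete” bullet: both blank nodes end with the same colour, so colour-derived labels would clash.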
CANONICAL LABELLING SCHEME:
ALWAYS DISTINGUISH ALL BLANK NODES
Brendan D. McKay. "Practical graph isomorphism". Congressus Numerantium 30: pp. 45–87, 1981.
Start with a (non-distinguished) colouring …
Let’s distinguish a node …
Colouring is no longer a fixpoint!
Rerun colouring to fixpoint
Fixpoint reached: still not finished!
So again let’s distinguish another …
… and rerun colouring to fixpoint
Now all blank nodes are distinguished!
Blank node labels computed from colour
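The distinguish-and-refine loop from the previous slides could look like this, again as a hypothetical sketch rather than the author’s code. The same `h`/`refine` helpers are repeated so the block stands alone; “distinguishing” a node is modelled here (an assumption) as rehashing its colour with a marker, and labels are then taken directly from the final colours. Note that the node to distinguish is picked by name, which is not isomorphism-invariant: that arbitrariness is exactly what the next slide questions.

```python
import hashlib

def h(*parts: str) -> str:
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()

def refine(triples, colour):
    """Recolour blank nodes from their neighbourhoods until no new colour classes appear."""
    while True:
        new = {}
        for b in colour:
            edges = []
            for s, p, o in triples:
                if s == b:
                    edges.append(h("+", p, colour.get(o, o)))
                if o == b:
                    edges.append(h("-", p, colour.get(s, s)))
            new[b] = h(colour[b], *sorted(edges))
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def distinguish_all(triples):
    """Refine; while two blank nodes share a colour, distinguish one and refine again."""
    bnodes = {t for s, _, o in triples for t in (s, o) if t.startswith("_:")}
    colour = refine(triples, {b: "0" for b in bnodes})
    while len(set(colour.values())) < len(colour):
        classes = {}
        for b, col in colour.items():
            classes.setdefault(col, []).append(b)
        # Smallest shared colour class, ties broken by colour value.
        cell = min((bs for bs in classes.values() if len(bs) > 1),
                   key=lambda bs: (len(bs), colour[bs[0]]))
        chosen = min(cell)  # arbitrary pick by name: NOT isomorphism-invariant
        colour = dict(colour)
        colour[chosen] = h(colour[chosen], "@")  # give it a colour of its own
        colour = refine(triples, colour)
    return colour

def relabel(triples, colour):
    """Blank node labels computed from colour."""
    return {tuple("_:" + colour[t] if t in colour else t for t in tr) for tr in triples}
```

Each round adds at least one colour class, so the loop ends after at most as many rounds as there are blank nodes.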
Let’s go back: first, why pick _:a and _:c?
Okay so: why _:a …
Adapt ideas from the Nauty algorithm
(for standard graph isomorphism)
Check all leaves of the search tree for the minimum graph
What happened?
Automorphisms cause repetitions
CORE ALGORITHM: FIND MINIMAL GRAPH FOLLOWING FIXED COLOURING RULES
• Complete
• Efficient for many cases?
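A sketch of the search-tree idea, loosely following the Nauty-style individualisation and refinement the slides adapt; it is an illustration under assumptions, not the author’s algorithm or hashing scheme. Instead of picking one node, the search branches on every node of the chosen colour class, and the minimum relabelled graph over all leaves is kept. Because the cell choice and the colours depend only on the graph’s structure, isomorphic inputs should yield the same result (barring hash collisions). Function names are hypothetical, and no automorphism pruning is done, so symmetric graphs repeat work.

```python
import hashlib

def h(*parts: str) -> str:
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()

def refine(triples, colour):
    """Recolour blank nodes from their neighbourhoods until no new colour classes appear."""
    while True:
        new = {}
        for b in colour:
            edges = []
            for s, p, o in triples:
                if s == b:
                    edges.append(h("+", p, colour.get(o, o)))
                if o == b:
                    edges.append(h("-", p, colour.get(s, s)))
            new[b] = h(colour[b], *sorted(edges))
        if len(set(new.values())) == len(set(colour.values())):
            return new
        colour = new

def canonical_graph(triples):
    """Return the minimum relabelled graph over all leaves of the search tree."""
    bnodes = {t for s, _, o in triples for t in (s, o) if t.startswith("_:")}
    best = None

    def search(colour):
        nonlocal best
        classes = {}
        for b, col in colour.items():
            classes.setdefault(col, []).append(b)
        shared = [bs for bs in classes.values() if len(bs) > 1]
        if not shared:  # leaf: every blank node has a unique colour
            graph = sorted(tuple("_:" + colour[t] if t in colour else t for t in tr)
                           for tr in triples)
            if best is None or graph < best:
                best = graph
            return
        # Structure-based choice of cell: smallest class, ties broken by colour value.
        cell = min(shared, key=lambda bs: (len(bs), colour[bs[0]]))
        for b in cell:  # branch on every node of the cell, not just one
            individualised = dict(colour)
            individualised[b] = h(colour[b], "@")
            search(refine(triples, individualised))

    search(refine(triples, {b: "0" for b in bnodes}))
    return best
```

Each leaf gives every blank node a distinct label, so the result is always isomorphic to the input. Equal outputs therefore imply isomorphic inputs, which makes the scheme complete, while the number of leaves can grow exponentially in the worst case.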
OKAY … SO WHAT HASHING TO USE?
What about hash collisions?
128 bit: MD5, Murmur3_128
160 bit: SHA-1
HASHING MAY LEAD TO COLLISIONS
• The scheme does not depend on which hash function you use
• A 128-bit hash is the shortest hash with an acceptable collision probability
• For cryptographic use-cases, SHA-256 or better might be needed
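To put the collision question in numbers, the usual birthday-bound approximation can be computed directly. This is a generic estimate that assumes hash outputs behave like uniform random values. It says nothing about deliberately crafted collisions, which is the concern behind the SHA-256 remark. The function name is illustrative.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n uniformly random `bits`-bit hashes contain a collision."""
    # Birthday bound: p ~ 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps tiny values accurate.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))

for bits in (128, 160):
    print(bits, collision_probability(10**9, bits))
```

For example, a billion graphs hashed to 128 bits give a collision probability on the order of 10^-21.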
EVALUATION
Evaluation: Real-world Graphs
Evaluation: Nasty Synthetic Graphs
CONCLUSIONS
In loving memory of
Linked Data
2007–2012
Survived by its research community
_:b
1999–2015
Conclusions
Aside: Why GI-Hard?
(Can Encode Graph Isomorphism as RDF Isomorphism)
if and only if
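One standard way to see the GI-hardness direction is sketched below under assumptions. Each vertex of a simple undirected graph becomes a blank node, and each edge becomes a pair of triples, one in each direction. The predicate and class IRIs are hypothetical placeholders. The extra `rdf:type` triple per vertex keeps isolated vertices from disappearing, since an RDF graph only contains terms that occur in some triple. Two graphs are then isomorphic exactly when their encodings are.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
VERTEX = "<http://example.org/Vertex>"  # hypothetical class IRI
EDGE = "<http://example.org/edge>"      # hypothetical predicate IRI

def graph_to_rdf(vertices, edges):
    """Encode a simple undirected graph as an RDF graph whose nodes are all blank."""
    bnode = {v: f"_:v{i}" for i, v in enumerate(vertices)}
    triples = {(bnode[v], RDF_TYPE, VERTEX) for v in vertices}
    for u, v in edges:
        # An undirected edge becomes two directed triples.
        triples.add((bnode[u], EDGE, bnode[v]))
        triples.add((bnode[v], EDGE, bnode[u]))
    return triples
```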
Aside: Why GI-Complete?
(Can we encode RDF isomorphism as graph isomorphism?)
if and only if
?
Aside: Why GI-Complete?
(Yes: We can encode RDF isomorphism as graph isomorphism)
if and only if
COMPLETE CANONICAL LABELLING SCHEME
A complete canonical labelling?
Find a canonical labelling for H
Choose the lowest possible graph
COMPLETE: FIND MINIMUM POSSIBLE GRAPH USING FIXED BLANK NODE LABELS
• Complete
• Inefficient
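The complete but inefficient scheme fits in a few lines if the fixed labels are taken to be `_:b0`, `_:b1`, … (an assumed naming) and “lowest” means the lexicographically smallest sorted list of triples. Trying every assignment of those labels costs n! relabellings for n blank nodes, which is why the slide marks it as inefficient. The function name is hypothetical.

```python
from itertools import permutations

def canonical_brute_force(triples):
    """Try every assignment of fixed labels to blank nodes and keep the lowest graph."""
    bnodes = sorted({t for tr in triples for t in tr if t.startswith("_:")})
    fixed = [f"_:b{i}" for i in range(len(bnodes))]
    best = None
    for perm in permutations(fixed):
        mapping = dict(zip(bnodes, perm))
        graph = sorted(tuple(mapping.get(t, t) for t in tr) for tr in triples)
        if best is None or graph < best:
            best = graph
    return best

cycle = {("_:a", "<p>", "_:b"), ("_:b", "<p>", "_:a")}
print(canonical_brute_force(cycle))  # [('_:b0', '<p>', '_:b1'), ('_:b1', '<p>', '_:b0')]
```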
The need for a graph-level hash
OPTIMISATION: PRUNE THE TREE USING AUTOMORPHISMS
Trim the search tree using “found” automorphisms
Found Automorphisms …
PRUNING PER AUTOMORPHISMS AVOIDS SYMMETRIC REPETITIONS
• Automorphisms are found naturally
• Makes very “regular” structures (like cliques) a lot easier
• Need to be careful how to manage the automorphism group
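Why automorphisms are “found naturally” can be shown in a small hypothetical sketch. If two leaves of the search tree, with labellings λ1 and λ2, produce the identical relabelled graph, then mapping each blank node b to λ2⁻¹(λ1(b)) is an automorphism of the original graph. How such automorphisms are then stored and used to prune branches is left out here. The names and label format are illustrative assumptions.

```python
def relabel(triples, labelling):
    """Apply a blank-node labelling; IRIs and literals are left unchanged."""
    return frozenset(tuple(labelling.get(t, t) for t in tr) for tr in triples)

def automorphism_from_leaves(triples, lab1, lab2):
    """If two leaf labellings yield the same graph, return lab2^-1 composed with lab1."""
    if relabel(triples, lab1) != relabel(triples, lab2):
        return None  # different leaf graphs: no automorphism found
    inverse2 = {label: b for b, label in lab2.items()}
    return {b: inverse2[label] for b, label in lab1.items()}

g = {("_:a", "<p>", "_:b"), ("_:b", "<p>", "_:a")}
sigma = automorphism_from_leaves(g, {"_:a": "_:0", "_:b": "_:1"},
                                    {"_:a": "_:1", "_:b": "_:0"})
print(sigma)  # {'_:a': '_:b', '_:b': '_:a'}
```

Once σ is known, any branch that σ maps onto an already-explored branch would yield the same leaf graphs and can be skipped.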
