Graph-based modelling is becoming more popular, in the sciences and elsewhere, as a flexible and powerful way to exploit data to power world-changing digital applications. Compared to the initial vision of the Semantic Web, knowledge graphs and graph databases are becoming a practical and computationally less formal way to manage graph data. On the other hand, linked data based on Semantic Web standards are a complementary, rather than alternative, approach to deal with these data, since they still provide a common way to represent and exchange information. In this paper we introduce rdf2neo, a tool to populate Neo4j databases starting from RDF data sets, based on a configurable mapping between the two. By employing agrigenomics-related real use cases, we show how such mapping can allow for a hybrid approach to the management of networked knowledge, based on taking advantage of the best of both RDF and property graphs.
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMiner Use Case
1. Getting the best of Linked Data and
Property Graphs: rdf2neo and the
KnetMiner Use Case
Marco Brandizi
marco.brandizi@rothamsted.ac.uk
Find these slides at:
https://www.slideshare.net/mbrandizi
4. A short story about Gene Knowledge
Can we improve? Graph DBs?
Query Languages? Open Data?
FAIR?
Sure! RDF! OWL!
Triple Store! SPARQL!
Uhm, we’ve tried that,
but…
I can feel what you mean,
but, it’s
not so difficult, let me…
Look! I’ve seen this Neo4j! It
has relations with properties!
Uhm… well… yeah, but no data
format, bad with ontologies, No
URIs/merging…
And look how cool a browser!
Oh, yes, that’s cool, but
maybe not the most
important thing…
And Cypher is a
breeze!
Uhm… let me try. Oh, cool,
but UNION sucks, and…
And has graph algorithms!
And devs got the APIs in
minutes!
Uhm… Are Jena/RDF4J
that harder?
… …
Source: https://digiday.com/uk/weve-created-monster-publishers-vent-ad-tech-frustration
14. Comparing Functionality
• Data ELT and Integration
• See our example: https://github.com/Rothamsted/bioknet-onto/tree/master/examples/bmp_reg_human
• Semantic Web is focused on standardised data sharing
• Neo4j doesn’t have a data format, focused on backing applications
• URI-based merging in RDF
• CONSTRUCT-based data transformations in Sem Web (including tools like TARQL)
• MATCH/CREATE in Cypher, but not the same
• Query languages
• Cypher considered compact and simple to learn
• SPARQL better at complex graph patterns with branches
• Cypher very good at chain patterns
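The "URI-based merging in RDF" point above can be illustrated with a minimal Python sketch. It assumes triples are plain (subject, predicate, object) tuples and that a graph is a set of them; the example.org URIs and the helper names merge and describe are made up for illustration and come from no real library.

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# All URIs below are hypothetical, chosen only for this illustration.
EX = "http://example.org/"

source_a = {
    (EX + "prot/P53", EX + "name", "TP53"),
    (EX + "prot/P53", EX + "consumed_by", EX + "react/R1"),
}
source_b = {
    # Same statement as in source_a: it collapses when the graphs are merged
    (EX + "prot/P53", EX + "name", "TP53"),
    (EX + "react/R1", EX + "part_of", EX + "path/apoptosis"),
}


def merge(*graphs):
    """Merge graphs by set union: identical triples are stored only once."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, uri):
    """Return the sorted (predicate, object) pairs stated about one URI."""
    return sorted((p, o) for s, p, o in graph if s == uri)


merged = merge(source_a, source_b)
print(len(merged))
for pair in describe(merged, EX + "prot/P53"):
    print(pair)
```

Because both sources name the protein with the same URI, their statements about it end up on one node without any explicit matching step. In a property graph, the equivalent result needs an agreed identifier property and an explicit merge operation.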
19. Conclusions
• Hybrid architectures might be good at getting the best of both
• They’re feasible, performance is acceptable with both technologies
• rdf2neo can help you with keeping everything aligned to a conceptual
data model
• Helps with Linked Data and FAIR Principles
• Please check out GitHub, get in touch (especially if you’re in
agriculture/plant biology)
• It comes with some overhead. You might need just one half
• Whatever you do, follow LOD/FAIR
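To give a rough idea of what aligning RDF and a property graph to one conceptual model can look like, here is a minimal Python sketch. It is not rdf2neo's actual implementation, whose mapping is configurable. It assumes one fixed convention: rdf:type triples become node labels, literal-valued triples become node properties, and URI-valued triples become relations. The URI class, the example.org namespace and the function names are hypothetical.

```python
class URI(str):
    """Marks a string as a URI, so it is not confused with a literal value."""


RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
EX = "http://example.org/"  # hypothetical namespace


def local_name(uri):
    """Last segment of a URI, used as label, property or relation name."""
    return uri.replace("#", "/").rstrip("/").rsplit("/", 1)[-1]


def to_property_graph(triples):
    """Convert triples to (nodes, relations) under the fixed convention above."""
    nodes = {}      # URI -> {"labels": set of str, "props": dict}
    relations = []  # (source URI, relation type, target URI)

    def node(uri):
        # Create the node on first use; keep its URI as the 'iri' property
        return nodes.setdefault(uri, {"labels": set(), "props": {"iri": str(uri)}})

    for s, p, o in triples:
        if p == RDF_TYPE:
            node(s)["labels"].add(local_name(o))
        elif isinstance(o, URI):
            node(s)
            node(o)
            relations.append((s, local_name(p), o))
        else:
            node(s)["props"][local_name(p)] = o
    return nodes, relations


triples = [
    (URI(EX + "prot1"), RDF_TYPE, URI(EX + "Protein")),
    (URI(EX + "prot1"), URI(EX + "name"), "DNA ligase"),
    (URI(EX + "react1"), RDF_TYPE, URI(EX + "Reaction")),
    (URI(EX + "prot1"), URI(EX + "consumed_by"), URI(EX + "react1")),
]
nodes, relations = to_property_graph(triples)
print(nodes)
print(relations)
```

Keeping the original URI as a node property is one simple way to preserve the link back to the RDF side, so that the same entity can still be looked up in a triple store.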
20. Acknowledgements
Ajit Singh
Software Engineer
Monika Mistry
Master Student, Data Curator
Keywan Hassani-Pak
KnetMiner Team Leader
Chris Rawlings
Head of Computational & Analytical Sciences
William Brown
IT Admin
21. And You All!
24. Cypher vs SPARQL
Proteins->Reactions->Pathways:
// chain of paths, node selection via property (exploits indices)
MATCH (prot:Protein) - [csby:consumed_by] -> (:Reaction) - [:part_of] ->
(pway:Path{ title: 'apoptosis' })
// further conditions, not always so performant
WHERE prot.name =~ '(?i)^DNA.+'
// Usual projection and post-selection operators
RETURN prot.name, pway
// Relations can have properties
ORDER BY csby.pvalue
LIMIT 1000
Proteins->Reactions->Pathways:
// Single-path (or same-direction branching) easy to write
MATCH (prot:Protein) - [:produced_by|consumed_by] -> (:Reaction)
- [:part_of*1..3] -> (pway:Path)
RETURN ID(prot), ID(pway) LIMIT 1000
// Very compact forms available, depending on the data
MATCH (prot:Protein) -- (pway:Path) RETURN pway
25. Cypher vs SPARQL
select distinct ?prot ?pway
where {
  { # Branch 1
    ?prot kb:pd_by|kb:cs_by ?react.
    ?prot a kb:Protein.
    ?react a kb:Reaction.
    ?react kb:part_of ?pway.
    ?pway a kb:Path.
  }
  union { # Branch 2
    ?prot ^kb:ac_by|kb:is_a ?enz.
    ?prot a kb:Protein.
    ?enz a kb:Enzyme.
    { # Branch 2.1
      ?enz kb:ac_by|kb:in_by ?comp.
      ?comp a kb:Compound.
      ?comp kb:cs_by|kb:pd_by ?trns.
      ?trns a kb:Transport
    } union {
      # Branch 2.2
      ?enz ^kb:ca_by ?trns.
      ?comp a kb:Compound.
      ?trns a kb:Transport
    }
    ?trns kb:part_of ?pway.
    ?pway a kb:Path.
  }
} LIMIT 1000
27. Conclusions
Data xchg format
• Neo4J, Cypher DBs, Graph DBs:
  - No official one, just Cypher, Support for GraphML, RDF
  +/- Focus on backing applications
• Semantic Web/Triple Stores:
  + Focus on data sharing standards
Data model
• Neo4J, Cypher DBs, Graph DBs:
  + Relations with properties
  - Metadata/schemas/ontologies management
• Semantic Web/Triple Stores:
  - Relations cannot have properties (reification required)
  + Metadata/schemas/ontologies as first-class citizens and standardised OWL
Performance
• Neo4J, Cypher DBs, Graph DBs:
  + complex graph traversals
• Semantic Web/Triple Stores:
  + Comparable in most cases
Query Language
• Neo4J, Cypher DBs, Graph DBs:
  + Cypher is easier (eg, compact, implicit elems)?
  - Expressivity issues (unions)
  - No standard QL (but efforts in progress, eg, OpenCypher)
• Semantic Web/Triple Stores:
  - SPARQL is Harder? (URIs, namespaces, verbosity)
  + SPARQL More expressive
Standardisation, openness
• Neo4J, Cypher DBs, Graph DBs:
  +/- (TinkerPop is open, Neo4J isn’t)
  + Commercial support
  + More alive and up-to-date (e.g., support for Hadoop, nice Neo4j browser, easy installation)
• Semantic Web/Triple Stores:
  + Natively open, many open implementations
  - Instability and many short-lived prototypes
  - Advancements seem to be slowing down
  + Some nice open and commercial browser (LODEStar,
Scalability, big data
• Neo4J, Cypher DBs, Graph DBs:
  +/- Commercial support to clustering/clouds for Neo4J
  + Open support in TinkerPop
• Semantic Web/Triple Stores:
  + Load Balancing/Cluster solutions, Commercial Cloud support (eg GraphDB)
  + SPARQL Over TinkerPop (via SAIL interface)
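The "reification required" entry under Data model can be made concrete with a small Python sketch. It uses the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) to attach a property to a relation. The example.org URIs, the pvalue property and the reify helper are assumptions made for illustration only.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace


def reify(subject, predicate, obj, properties, statement_uri):
    """Express a relation with properties as plain RDF triples.

    The relation itself is kept as an ordinary triple; an extra
    rdf:Statement resource points at its three parts and carries
    the relation's properties.
    """
    triples = [
        (subject, predicate, obj),
        (statement_uri, RDF + "type", RDF + "Statement"),
        (statement_uri, RDF + "subject", subject),
        (statement_uri, RDF + "predicate", predicate),
        (statement_uri, RDF + "object", obj),
    ]
    for name, value in sorted(properties.items()):
        triples.append((statement_uri, EX + name, value))
    return triples


triples = reify(
    EX + "prot1", EX + "consumed_by", EX + "react1",
    {"pvalue": 0.01},
    EX + "stmt1",
)
for triple in triples:
    print(triple)
```

One relation with one property becomes six triples here, whereas in a property graph the same information is a single relation carrying a pvalue attribute. This is the verbosity that the comparison above counts against RDF.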
Editor's notes
Let me start from a little story…
In the end, it’s not really a tug of war; we realised they’re complementary.
So, why not take the best of both worlds?
Hopefully less controversial than that…
We have done some formal modelling of the mapping procedure
Main results are:
1) it works as expected
2) Computational complexity is no worse than SPARQL, which is known to be constrainable into LOGSPACE.
Loading is scalable and fairly OK, skipped here
In querying, both have comparable performance. Looking at single use cases, they’re complementary again.
Virtuoso is better with queries involving distant subgraphs (traversal not possible) and with complex branching, based on (nested) UNIONs.
In the latter case, SPARQL looks easier to write (especially with nested patterns), though OpenCypher promised subqueries