Evolution of the Graph Schema
Data Day Seattle 2017
Joshua Shinavier, PhD
20.10.2017
1. Knowledge and graphs
2. Semantic Web to Property Graphs
3. Re-emergence of the graph schema
4. Elements of a schema language
5. Graph and schema management
6. Graph generation
Outline
Knowledge and graphs
• Performance is a factor, but
• Many storage back-ends can be adapted to graphs
• E.g. relational DBs, column stores, key-value stores
• Better reasons:
• The domain model is graph-like
• We can take inspiration from the way we naturally
understand the world
Why graph databases?
Early data modelers
κατηγορία! (Greek: “category”)
• How do we relate data with concepts in order to make
inferences or take action?
• We use schemata — rules that constrain data to a
language of “categories” (concepts)
• Some fundamental categories are built in
• E.g. plurality, necessity, limitation, negation,
reciprocity, etc.
• Others are built upon the foundation
Kant’s “schemata”
This schematism […] is an art, hidden in the depths of the human
soul, whose true modes of action we shall only with difficulty
discover and unveil. (Kant, 1781)
• Psychologists saw Kant’s “schemata” in the organization
of human memory (Head, 1920), but
• Memory is more than storage and recall
• We react to data by combining a schema with an
attitude (Bartlett, 1933)
Schemas in psychology
• Scripts, plans, goals (Schank & Abelson, 1970s)
• Frames (Minsky, 1974)
• Early KR languages
• Upper-level ontologies and commonsense
knowledge bases
Enter databases and AI
Semantic Web to Property Graphs
• A vocabulary for vocabulary sharing
• Includes a handful of basic terms
• Classes, properties, inheritance
• Meets the needs of most Web schemas
RDF Schema (RDFS)
• A much more expressive language for ontology development
• Supports:
• Classes and properties with inheritance
• Equality (sameAs, differentFrom, equivalentTo)
• Property domain/range restrictions, cardinality restrictions
• Inverse, transitive, and symmetric properties
• Ontology metadata (imports, versioning)
• Sublanguages OWL Full, OWL DL (description logic), OWL Lite
• OWL 2 profiles EL (polynomial-time checking), QL (memory-efficient query answering), DL (completeness and decidability)
• Is this slide too dense? OWL is huge.
OWL
• What commercial applications ended up using
• Supported by AllegroGraph, TopBraid, etc.
• All of RDFS
• Classes, properties, inheritance
• A few terms stolen from OWL
• e.g. sameAs, inverseOf, TransitiveProperty
“RDFS+”
The Web of Data…
• Property Graph data model takes a minimalist
approach
• Typically no inference or rules support
• Graph DBs, schema.org are a response to real-world
demands
…simplified
[Figure: example graph with vertices 1, 2, 3 and labels foo, foo, bar]
Re-emergence of the graph schema
• There is power in simplicity
• NoSQL databases are said to have no predefined
schema
• In practice, every graph DB has a schema
• A set of constraints or assumptions about correct
structure
• Useful for validation and optimization
• There is no graph schema standard
NoSQL ⇏ no schema
• Property Graph data model is a basic schema
• Edge labels (required)
• Vertex labels (optional)
• Property keys (required)
• Property data types (optional, with optional constraints)
• Vertex meta-properties (optional)
Schemas in TinkerPop
• Labels
• Simple types on nodes and/or relationships
• Indexes
• Single-property — equality, existence, containment,
ranges
• Composite (multiple properties) — equality only
• Constraints
• Node property uniqueness
• Node/relationship property existence
• Node key (set of properties unique for the node)
Schemas in Neo4j
• Vertex and edge labels
• Property keys
• Property cardinality (SINGLE, LIST, SET)
• Indexes
• Graph-centric
• Individual properties, composite
• Vertex-centric (index on incoming/outgoing edges)
• Sorting key, sort order
• Automatic/implicit schema creation
Schemas in JanusGraph
• Object databases ≠ graph databases, but similar
• Built-in, object-oriented schemas
• Classes, extension, relationships, recursivity, etc.
• Used for encapsulation, composition, inheritance,
delegation, etc.
• OOP frameworks for graph DBs
• Frames, Ferma, etc.
Schemas in object databases
• Hypernode
• Objects, relations, and functions
• GROOVY
• Multi-level OOP schemas
• Hypergraph DB
• Types and relationships
• Grakn.AI
• Entities, relations, roles, and resources (data type,
uniqueness, regex)
• Single inheritance
Schemas in hypergraph databases
Elements of a schema language
• Support for a basic schema vocabulary
• Entity and relationship types, constraints
• Good coverage of existing schema frameworks
• Extensibility of schemas and types
• Mappings to RDF, schema.org, and storage frameworks
• Reference APIs for
• Schema validation
• Graph schema initialization and migration
• Statistical models, graph generation
Design goals
• Things about which we can make assertions
• “Classes” in RDF, “types” in schema.org, “vertex labels”
in TinkerPop, etc.
• May extend other entity types
Entity types
entities:
  - label: Trip
    sameAs: http://schema.org/TravelAction
    description: A trip taken by a driver or requested by a rider
• Assertions about things
• “Properties” in RDF and schema.org
• “Edges” vs. “properties” in graph databases
• Hyperedges, meta-properties are also “relations”
Relationship types
relations:
  - label: requested
    description: Relates a rider to a trip he or she has requested
    extends:
      - core.relatedTo
    cardinality: OneToMany
    from: users.User
    to: Trip
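A relation definition like the one above could back a simple validation check. The following is a minimal sketch, not the authors' implementation: the class and method names (RelationCheck, RelationType, Edge, conforms, sampleRelations) are hypothetical, and the check only compares an edge's endpoint types with `from` and `to`, ignoring `extends` and `cardinality`.

```java
import java.util.HashMap;
import java.util.Map;

class RelationCheck {
    /** Hypothetical in-memory form of one entry under "relations:". */
    static final class RelationType {
        final String label, from, to;
        RelationType(String label, String from, String to) {
            this.label = label; this.from = from; this.to = to;
        }
    }

    /** Hypothetical edge: a label plus the entity types of its two endpoints. */
    static final class Edge {
        final String label, outType, inType;
        Edge(String label, String outType, String inType) {
            this.label = label; this.outType = outType; this.inType = inType;
        }
    }

    /** True if the edge's label is declared and its endpoint types match from/to exactly. */
    static boolean conforms(Edge edge, Map<String, RelationType> relations) {
        RelationType type = relations.get(edge.label);
        return type != null
            && type.from.equals(edge.outType)
            && type.to.equals(edge.inType);
    }

    /** The "requested" relation from the slide, keyed by label. */
    static Map<String, RelationType> sampleRelations() {
        Map<String, RelationType> relations = new HashMap<>();
        relations.put("requested", new RelationType("requested", "users.User", "Trip"));
        return relations;
    }

    public static void main(String[] args) {
        Map<String, RelationType> relations = sampleRelations();
        // A rider requesting a trip matches the declared from/to types.
        System.out.println(conforms(new Edge("requested", "users.User", "Trip"), relations));
        // Reversed endpoints do not match.
        System.out.println(conforms(new Edge("requested", "Trip", "users.User"), relations));
        // An undeclared label is rejected ("cancelled" is a made-up label).
        System.out.println(conforms(new Edge("cancelled", "users.User", "Trip"), relations));
    }
}
```

A fuller validator would also walk the `extends` chain and enforce the declared cardinality; both are left out here to keep the sketch short.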
• Graph-centric
• Single-relation, composite
• Entity-centric
• Ordering on a secondary key
Index hints
indexes:
  - key: core.uuid
  - key: trips.requested
    direction: Out
    orderBy: core.createdAt
    order: Decreasing
• Schemas import other schemas, like software modules
• Give developers/teams autonomy, but
• Coordinate schema integration top-down
Schema imports
name: production
version: 1.2
includes:
  - name: trips
    version: 1.2
  - name: referrals
    version: 1.2
Graph and schema management
• Study the source data
• Extend and validate the shared schema
• Generate artificial graph data
• Study system performance, iterate on the model
• Develop ingestion mappings for real data
• Review and check in schema changes
• Apply the schema to a live database
• Ingest data into the live database
Graph onboarding workflow
Revision control for schemas
• The schema is constantly changing
• Is this database compatible with this schema?
• How to update the database w.r.t. the schema?
• Use revision control to find diffs
• Ordered lists of basic changes
• Translate diffs to storage-specific workflows
• Ordered lists of idempotent operations
• Apply diff workflows to the database
Schema initialization, migration
public enum SchemaChange {
    AbstractAttributeChanged,
    CardinalityChanged,
    DomainChanged,
    EntityAdded,
    EntityRemoved,
    ExtensionAdded,
    ExtensionRemoved,
    IncludeAdded,
    IncludeRemoved,
    IndexAdded,
    IndexRemoved,
    RangeChanged,
    RelationAdded,
    RelationRemoved,
    RequiredAttributeChanged,
    RequiredOfAttributeChanged,
    SchemaAdded,
    SchemaRemoved,
    SchemaNameChanged,
    SchemaVersionChanged,
}
Schema diff and patch
[Diagram: New Database → (initialize with Schema x.1) → Database at Schema x.1 → (apply diff) → Database at Schema x.2; the Diff of x.1 and x.2 comes from a find-diff step over Schema x.1 and Schema x.2]
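The find-diff and apply-diff steps can be illustrated with a small sketch. It is an assumption-laden toy, not the tool described in the talk: a schema is reduced to two sets of labels, only four of the change kinds from the SchemaChange enum are handled, and the names SchemaDiffSketch, Schema, Change, diff, and apply, as well as the labels "Referral" and "referred", are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SchemaDiffSketch {
    /** A subset of the change kinds listed in the SchemaChange enum. */
    enum Kind { EntityAdded, EntityRemoved, RelationAdded, RelationRemoved }

    static final class Change {
        final Kind kind;
        final String label;
        Change(Kind kind, String label) { this.kind = kind; this.label = label; }
        @Override public String toString() { return kind + "(" + label + ")"; }
    }

    /** Hypothetical, heavily reduced schema: just the declared labels. */
    static final class Schema {
        final Set<String> entities = new TreeSet<>();
        final Set<String> relations = new TreeSet<>();
    }

    /** Ordered list of basic changes that turns `from` into `to`. */
    static List<Change> diff(Schema from, Schema to) {
        List<Change> changes = new ArrayList<>();
        // Add entities before relations, so new relations can refer to new entities.
        for (String e : to.entities)
            if (!from.entities.contains(e)) changes.add(new Change(Kind.EntityAdded, e));
        for (String r : to.relations)
            if (!from.relations.contains(r)) changes.add(new Change(Kind.RelationAdded, r));
        // Remove relations before the entities they may refer to.
        for (String r : from.relations)
            if (!to.relations.contains(r)) changes.add(new Change(Kind.RelationRemoved, r));
        for (String e : from.entities)
            if (!to.entities.contains(e)) changes.add(new Change(Kind.EntityRemoved, e));
        return changes;
    }

    /** Applies changes in order; set add/remove makes each step idempotent. */
    static void apply(List<Change> changes, Schema db) {
        for (Change c : changes) {
            switch (c.kind) {
                case EntityAdded:     db.entities.add(c.label); break;
                case EntityRemoved:   db.entities.remove(c.label); break;
                case RelationAdded:   db.relations.add(c.label); break;
                case RelationRemoved: db.relations.remove(c.label); break;
            }
        }
    }

    public static void main(String[] args) {
        Schema x1 = new Schema();
        x1.entities.add("Trip"); x1.entities.add("User");
        x1.relations.add("requested");

        Schema x2 = new Schema();
        x2.entities.add("Trip"); x2.entities.add("User"); x2.entities.add("Referral");
        x2.relations.add("requested"); x2.relations.add("referred");

        List<Change> changes = diff(x1, x2);
        System.out.println(changes);

        apply(changes, x1);
        apply(changes, x1); // applying the same diff twice leaves the same result
        System.out.println(x1.entities.equals(x2.entities) && x1.relations.equals(x2.relations));
    }
}
```

Because each operation is idempotent, a migration interrupted partway through can simply be re-run from the start of the list. A real storage back-end would translate each change into its own schema-management calls.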
Migration is not always possible
Don’t feel bad!
Basic schemas can’t be changed!
• E.g.
• Removal or abstraction of types already in use
• Changes unsupported at the storage level
Graph generation
• Problem:
• Need to predict write throughput, read latency
given 10x more data
• Analytical solutions are difficult
• Solution?
• Generate graphs of different sizes
• Study the trends
• Problem:
• Where do we get the data?
• Shrinking or growing real data is difficult
Capacity planning
• Existing graph benchmarks
• Lancichinetti-Fortunato-Radicchi (LFR) benchmark
• graphdb-benchmarks
• Linked Data Benchmark Council (LDBC)
• SPARQL benchmarks for triple stores
• None of these are very much like our data
• Not a social network; no power law distributions
• Vastly different topology
• Idea: use the schema to generate statistically
representative data
Benchmarking options
• Gather some statistics
• Entity and relationship type distributions
• Per-relationship in- and out-degree distributions
• Add these to the schema
• Give the Graphgen utility a dataset size, random seed
• Graphgen attempts to create a graph in accordance
with the model
• Gather statistics from the generated graph
• Compare and contrast
• Same dataset can be generated in different
environments
Graph generation workflow
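The reproducibility point in this workflow (same dataset size and random seed, same graph) can be sketched as follows. This is not the Graphgen utility itself: the class GraphgenSketch, its methods, and the type proportions are assumptions made for illustration, and the sketch only samples vertex labels from an entity type distribution, leaving out relationships and degree distributions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class GraphgenSketch {
    /** Draws one label from a weighted distribution (weights need not sum to 1). */
    static String sample(Map<String, Double> weights, Random rng) {
        double total = 0;
        for (double w : weights.values()) total += w;
        double r = rng.nextDouble() * total;
        String last = null;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            last = e.getKey();
            r -= e.getValue();
            if (r < 0) return last;
        }
        return last; // guards against floating-point round-off
    }

    /** Generates `size` vertex labels; the seed alone determines the sequence. */
    static List<String> generateVertexLabels(Map<String, Double> typeWeights, int size, long seed) {
        Random rng = new Random(seed);
        List<String> labels = new ArrayList<>(size);
        for (int i = 0; i < size; i++) labels.add(sample(typeWeights, rng));
        return labels;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps iteration order fixed, which the sampling relies on.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("User", 0.3); // made-up proportions, not measured statistics
        weights.put("Trip", 0.7);

        List<String> a = generateVertexLabels(weights, 1000, 42L);
        List<String> b = generateVertexLabels(weights, 1000, 42L);
        System.out.println(a.equals(b)); // same size and seed give the same labels

        // Gather a statistic from the generated data to compare with the model.
        System.out.println(Collections.frequency(a, "Trip") / 1000.0);
    }
}
```

Comparing the observed frequency with the configured weight mirrors the "gather statistics, compare and contrast" step: a large gap would suggest the generator is not following the model.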
Q&A
Joshua Shinavier
joshsh@uber.com
Kyler Liu
kylerliu@uber.com
Vignesh Ganapathy
vigneshg@uber.com
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 

Evolution of the Graph Schema

  • 1. Evolution of the Graph Schema Data Day Seattle 2017 Joshua Shinavier, PhD 20.10.2017
  • 2. 1. Knowledge and graphs 2. Semantic Web to Property Graphs 3. Re-emergence of the graph schema 4. Elements of a schema language 5. Graph and schema management 6. Graph generation Outline
  • 4. • Performance is a factor, but • Many storage back-ends can be adapted to graphs • E.g. relational DBs, column stores, key-value stores • Better reasons: • The domain model is graph-like • We can take inspiration from the way we naturally understand the world Why graph databases?
  • 6. • How do we relate data with concepts in order to make inferences or take action? • We use schemata — rules that constrain data to a language of “categories” (concepts) • Some fundamental categories are built in • E.g. plurality, necessity, limitation, negation, reciprocity, etc. • Others are built upon the foundation Kant’s “schemata” This schematism […] is an art, hidden in the depths of the human soul, whose true modes of action we shall only with difficulty discover and unveil. (Kant, 1781)
  • 7. • Psychologists saw Kant’s “schemata” in the organization of human memory (Head, 1920), but • Memory is more than storage and recall • We react to data by combining a schema with an attitude (Bartlett, 1933) Schemas in psychology
  • 8. • Scripts, plans, goals (Schank & Abelson, 1970s) • Frames (Minsky, 1974) • Early KR languages • Upper-level ontologies and commonsense knowledge bases Enter databases and AI
  • 9. Semantic Web to Property Graphs
  • 10. • A vocabulary for vocabulary sharing • Includes a handful of basic terms • Classes, properties, inheritance • Meets the needs of most Web schemas RDF Schema (RDFS)
  • 11. • A much more expressive language for ontology development • Supports: • Classes and properties with inheritance • Equality (sameAs, differentFrom, equivalentTo) • Property domain/range restrictions, cardinality restrictions • Inverse, transitive, and symmetric properties • Ontology metadata (imports, versioning) • Sublanguages OWL Full, OWL DL (description logic), OWL Lite • OWL 2 profiles EL (polynomial-time checking), QL (memory-efficient query answering), DL (completeness and decidability) • Is this slide too dense? OWL is huge. OWL
  • 12. • What commercial applications ended up using • Supported by AllegroGraph, TopBraid, etc. • All of RDFS • Classes, properties, inheritance • A few terms stolen from OWL • e.g. sameAs, inverseOf, TransitiveProperty “RDFS+”
  • 13. The Web of Data…
  • 14. • Property Graph data model takes a minimalist approach • Typically no inference or rules support • Graph DBs, schema.org are a response to real-world demands …simplified [Figure: a small property graph with vertices 1, 2, and 3 and edges labeled foo, foo, and bar]
  • 15. Re-emergence of the graph schema
  • 16. • There is power in simplicity • NoSQL databases are said to have no predefined schema • In practice, every graph DB has a schema • A set of constraints or assumptions about correct structure • Useful for validation and optimization • There is no graph schema standard NoSQL ⇏ no schema
  • 17. • Property Graph data model is a basic schema • Edge labels (required) • Vertex labels (optional) • Property keys (required) • Property data types (optional, with optional constraints) • Vertex meta-properties (optional) Schemas in TinkerPop
  • 18. • Labels • Simple types on nodes and/or relationships • Indexes • Single-property — equality, existence, containment, ranges • Composite (multiple properties) — equality only • Constraints • Node property uniqueness • Node/relationship property existence • Node key (set of properties unique for the node) Schemas in Neo4j
  • 19. • Vertex and edge labels • Property keys • Property cardinality (SINGLE, LIST, SET) • Indexes • Graph-centric • Individual properties, composite • Vertex-centric (index on incoming/outgoing edges) • Sorting key, sort order • Automatic/implicit schema creation Schemas in JanusGraph
  • 20. • Object databases ≠ graph databases, but similar • Built-in, object-oriented schemas • Classes, extension, relationships, recursivity, etc. • Used for encapsulation, composition, inheritance, delegation, etc. • OOP frameworks for graph DBs • Frames, Ferma, etc. Schemas in object databases
  • 21. • Hypernode • Objects, relations, and functions • GROOVY • Multi-level OOP schemas • Hypergraph DB • Types and relationships • Grakn.AI • Entities, relations, roles, and resources (data type, uniqueness, regex) • Single inheritance Schemas in hypergraph databases
  • 22. Elements of a schema language
  • 23. • Support for a basic schema vocabulary • Entity and relationship types, constraints • Good coverage of existing schema frameworks • Extensibility of schemas and types • Mappings to RDF, schema.org, and storage frameworks • Reference APIs for • Schema validation • Graph schema initialization and migration • Statistical models, graph generation Design goals
  • 24. • Things about which we can make assertions • “Classes” in RDF, “types” in schema.org, “vertex labels” in TinkerPop, etc. • Extend other entity types Entity types
        entities:
          - label: Trip
            sameAs: http://schema.org/TravelAction
            description: A trip taken by a driver or requested by a rider
  • 25. • Assertions about things • “Properties” in RDF and schema.org • “Edges” vs. “properties” in graph databases • Hyperedges, meta-properties are also “relations” Relationship types
        relations:
          - label: requested
            description: Relates a rider to a trip he or she has requested
            extends:
              - core.relatedTo
            cardinality: OneToMany
            from: users.User
            to: Trip
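One way a validator might enforce the `cardinality: OneToMany` constraint above is sketched below. It assumes that OneToMany means one source (rider) may relate to many targets (trips) while each target has at most one source. That reading, the class name `CardinalityCheck`, and the method name are illustrative assumptions, not part of the schema language shown on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical checker for a OneToMany relation; not taken from any real schema API. */
final class CardinalityCheck {

    /**
     * Returns the target ids that violate OneToMany, i.e. targets reached
     * from more than one distinct source. Each edge is a {from, to} pair.
     */
    static List<String> oneToManyViolations(List<String[]> edges) {
        Map<String, String> sourceOf = new HashMap<>();
        List<String> violations = new ArrayList<>();
        for (String[] edge : edges) {
            String from = edge[0];
            String to = edge[1];
            // Remember the first source seen for each target.
            String previous = sourceOf.putIfAbsent(to, from);
            if (previous != null && !previous.equals(from) && !violations.contains(to)) {
                violations.add(to);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] {"rider1", "trip1"});
        edges.add(new String[] {"rider1", "trip2"}); // allowed: one rider, many trips
        edges.add(new String[] {"rider2", "trip2"}); // violation: trip2 has two riders
        System.out.println(oneToManyViolations(edges));
    }
}
```

A real validator would presumably read the relation definitions from the schema file and run such a check per relation label.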
  • 26. • Graph-centric • Single-relation, composite • Entity-centric • Ordering on a secondary key Index hints
        indexes:
          - key: core.uuid
          - key: trips.requested
            direction: Out
            orderBy: core.createdAt
            order: Decreasing
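To illustrate what an entity-centric index with a secondary ordering key buys, here is a minimal in-memory sketch: outgoing edges are kept per source entity, sorted by a `createdAt` value in decreasing order, so "the latest N trips requested by this user" is a prefix scan. The class `EntityCentricIndex` and its methods are hypothetical; real graph databases implement this at the storage level.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for an entity-centric (vertex-centric) index. */
final class EntityCentricIndex {

    /** One indexed out-edge: the target id plus the secondary sort key. */
    static final class Entry {
        final String target;
        final long createdAt;

        Entry(String target, long createdAt) {
            this.target = target;
            this.createdAt = createdAt;
        }
    }

    // Decreasing createdAt; ties are broken by target id so distinct edges are not merged.
    private static final Comparator<Entry> NEWEST_FIRST = (a, b) -> {
        int byTime = Long.compare(b.createdAt, a.createdAt);
        return byTime != 0 ? byTime : a.target.compareTo(b.target);
    };

    private final Map<String, TreeSet<Entry>> outEdges = new HashMap<>();

    void add(String source, String target, long createdAt) {
        outEdges.computeIfAbsent(source, k -> new TreeSet<>(NEWEST_FIRST))
                .add(new Entry(target, createdAt));
    }

    /** Returns up to {@code limit} targets of {@code source}, newest first. */
    List<String> latest(String source, int limit) {
        List<String> result = new ArrayList<>();
        TreeSet<Entry> edges = outEdges.get(source);
        if (edges == null) {
            return result;
        }
        for (Entry e : edges) {
            if (result.size() == limit) {
                break;
            }
            result.add(e.target);
        }
        return result;
    }

    public static void main(String[] args) {
        EntityCentricIndex index = new EntityCentricIndex();
        index.add("user1", "tripA", 100L);
        index.add("user1", "tripB", 300L);
        index.add("user1", "tripC", 200L);
        System.out.println(index.latest("user1", 2));
    }
}
```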
  • 27. • Schemas import other schemas, like software modules • Give developers/teams autonomy, but • Coordinate schema integration top-down Schema imports
        name: production
        version: 1.2
        includes:
          - name: trips
            version: 1.2
          - name: referrals
            version: 1.2
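Because schemas import each other like software modules, a loader has to process included schemas before the schemas that depend on them, and it should reject cyclic includes. The sketch below shows one plausible approach, a depth-first traversal. The class `SchemaImports`, the map-based representation, and the `core` schema in the example are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical resolver for schema includes; the representation is an assumption. */
final class SchemaImports {

    /**
     * Returns schema names in load order (included schemas before the
     * schemas that include them), starting from {@code root}.
     * Throws IllegalStateException if the includes form a cycle.
     */
    static List<String> loadOrder(String root, Map<String, List<String>> includes) {
        List<String> order = new ArrayList<>();
        visit(root, includes, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private static void visit(String name, Map<String, List<String>> includes,
                              Set<String> inProgress, Set<String> done, List<String> order) {
        if (done.contains(name)) {
            return; // already loaded through another include path
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Cyclic include involving " + name);
        }
        for (String included : includes.getOrDefault(name, Collections.emptyList())) {
            visit(included, includes, inProgress, done, order);
        }
        inProgress.remove(name);
        done.add(name);
        order.add(name);
    }

    public static void main(String[] args) {
        Map<String, List<String>> includes = new HashMap<>();
        includes.put("production", Arrays.asList("trips", "referrals"));
        includes.put("trips", Arrays.asList("core"));     // "core" is an assumed base schema
        includes.put("referrals", Arrays.asList("core"));
        System.out.println(loadOrder("production", includes));
    }
}
```

A fuller version would also compare the `version` fields and report conflicting versions of the same included schema.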
  • 28. Graph and schema management
  • 29. • Study the source data • Extend and validate the shared schema • Generate artificial graph data • Study system performance, iterate on the model • Develop ingestion mappings for real data • Review and check in schema changes • Apply the schema to a live database • Ingest data into the live database Graph onboarding workflow
  • 31. • The schema is constantly changing • Is this database compatible with this schema? • How to update the database w.r.t. the schema? • Use revision control to find diffs • Ordered lists of basic changes • Translate diffs to storage-specific workflows • Ordered lists of idempotent operations • Apply diff workflows to the database Schema initialization, migration
        public enum SchemaChange {
            AbstractAttributeChanged,
            CardinalityChanged,
            DomainChanged,
            EntityAdded,
            EntityRemoved,
            ExtensionAdded,
            ExtensionRemoved,
            IncludeAdded,
            IncludeRemoved,
            IndexAdded,
            IndexRemoved,
            RangeChanged,
            RelationAdded,
            RelationRemoved,
            RequiredAttributeChanged,
            RequiredOfAttributeChanged,
            SchemaAdded,
            SchemaRemoved,
            SchemaNameChanged,
            SchemaVersionChanged,
        }
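The diff-then-apply idea can be sketched in a few lines. The example below compares two schema versions, emits an ordered list of basic changes (using four of the change names from the enum above), and applies them with idempotent operations, so running the same diff twice leaves the database in the same state. The `Schema` and `Change` classes, and the choice to order additions before removals, are assumptions made for illustration rather than the talk's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical schema diff and patch over entity and relation labels only. */
final class SchemaDiffSketch {

    /** A subset of the change kinds, listed in the order they are applied. */
    enum Kind { EntityAdded, RelationAdded, RelationRemoved, EntityRemoved }

    static final class Change {
        final Kind kind;
        final String label;

        Change(Kind kind, String label) {
            this.kind = kind;
            this.label = label;
        }

        @Override
        public String toString() {
            return kind + "(" + label + ")";
        }
    }

    /** Minimal stand-in for both a schema version and a database's applied schema. */
    static final class Schema {
        final Set<String> entities = new TreeSet<>();
        final Set<String> relations = new TreeSet<>();
    }

    /** Ordered changes that turn {@code from} into {@code to}. */
    static List<Change> diff(Schema from, Schema to) {
        List<Change> changes = new ArrayList<>();
        // Assumed ordering: add entity types before relations that may reference
        // them, and remove relations before the entity types they connect.
        collect(changes, Kind.EntityAdded, minus(to.entities, from.entities));
        collect(changes, Kind.RelationAdded, minus(to.relations, from.relations));
        collect(changes, Kind.RelationRemoved, minus(from.relations, to.relations));
        collect(changes, Kind.EntityRemoved, minus(from.entities, to.entities));
        return changes;
    }

    /** Applies changes using set add/remove, which are idempotent. */
    static void apply(List<Change> changes, Schema database) {
        for (Change c : changes) {
            switch (c.kind) {
                case EntityAdded: database.entities.add(c.label); break;
                case RelationAdded: database.relations.add(c.label); break;
                case RelationRemoved: database.relations.remove(c.label); break;
                case EntityRemoved: database.entities.remove(c.label); break;
            }
        }
    }

    private static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    private static void collect(List<Change> out, Kind kind, Set<String> labels) {
        for (String label : labels) {
            out.add(new Change(kind, label));
        }
    }

    public static void main(String[] args) {
        Schema v1 = new Schema();
        v1.entities.add("Trip");
        v1.relations.add("requested");
        Schema v2 = new Schema();
        v2.entities.add("Trip");
        v2.entities.add("Referral"); // hypothetical new entity type
        v2.relations.add("requested");
        System.out.println(diff(v1, v2));
    }
}
```

In a real system each change would be translated into storage-specific operations (for example, creating a label or an index) rather than set updates.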
  • 32. Schema diff and patch [Diagram: New Database + Schema x.1 → initialize → Database at Schema x.1; Schema x.1 + Schema x.2 → find diff → Diff of x.1 and x.2; Database at Schema x.1 + Diff of x.1 and x.2 → apply diff → Database at Schema x.2]
  • 33. Migration is not always possible Don’t feel bad! Basic schemas can’t be changed! • E.g. • Removal or abstraction of types already in use • Changes unsupported at the storage level
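A migration tool can detect the first case, removal of types already in use, before touching the database. The following is a small sketch under the assumption that per-type instance counts are available; `MigrationCheck` and its method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check for type removals; names are illustrative. */
final class MigrationCheck {

    /**
     * Returns the types whose removal should be refused because the
     * database still holds instances of them.
     */
    static List<String> blockedRemovals(List<String> removedTypes,
                                        Map<String, Long> instanceCounts) {
        List<String> blocked = new ArrayList<>();
        for (String type : removedTypes) {
            // Types with no recorded count are treated as unused.
            if (instanceCounts.getOrDefault(type, 0L) > 0) {
                blocked.add(type);
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("Trip", 1200L);   // made-up counts for illustration
        counts.put("Referral", 0L);
        System.out.println(blockedRemovals(Arrays.asList("Trip", "Referral"), counts));
    }
}
```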
  • 35. • Problem: • Need to predict write throughput, read latency given 10x more data • Analytical solutions are difficult • Solution? • Generate graphs of different sizes • Study the trends • Problem: • Where do we get the data? • Shrinking or growing real data is difficult Capacity planning
  • 36. • Existing graph benchmarks • Lancichinetti-Fortunato-Radicchi (LFR) benchmark • graphdb-benchmarks • Linked Data Benchmark Council (LDBC) • SPARQL benchmarks for triple stores • None of these are very much like our data • Not a social network; no power law distributions • Vastly different topology • Idea: use the schema to generate statistically representative data Benchmarking options
  • 37. • Gather some statistics • Entity and relationship type distributions • Per-relationship in- and out-degree distributions • Add these to the schema • Give the Graphgen utility a dataset size, random seed • Graphgen attempts to create a graph in accordance with the model • Gather statistics from the generated graph • Compare and contrast • Same dataset can be generated in different environments Graph generation workflow
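The workflow above can be illustrated with a toy generator. This is not the Graphgen utility itself, only a sketch of the idea: given an out-degree distribution for one relationship type, a dataset size, and a random seed, it produces edges reproducibly and then gathers the empirical out-degree histogram for comparison with the model. Targets are chosen uniformly, which ignores the in-degree distribution; that simplification and all names here are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Toy seeded graph generator; a sketch, not the Graphgen utility from the talk. */
final class GraphGenSketch {

    /** Samples an index from a discrete distribution given as non-negative weights. */
    static int sample(double[] weights, Random random) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        double r = random.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) {
                return i;
            }
        }
        return weights.length - 1; // guard against floating-point rounding
    }

    /**
     * Generates {source, target} edges for one relationship type. Each source
     * gets an out-degree drawn from outDegreeWeights (index = degree); targets
     * are picked uniformly, so in-degrees are not modeled here.
     */
    static List<int[]> generateEdges(int sourceCount, int targetCount,
                                     double[] outDegreeWeights, long seed) {
        Random random = new Random(seed); // same seed, same graph
        List<int[]> edges = new ArrayList<>();
        for (int source = 0; source < sourceCount; source++) {
            int degree = sample(outDegreeWeights, random);
            for (int k = 0; k < degree; k++) {
                edges.add(new int[] {source, random.nextInt(targetCount)});
            }
        }
        return edges;
    }

    /** Counts how many sources have each out-degree from 0 to maxDegree. */
    static int[] outDegreeHistogram(List<int[]> edges, int sourceCount, int maxDegree) {
        int[] degree = new int[sourceCount];
        for (int[] edge : edges) {
            degree[edge[0]]++;
        }
        int[] histogram = new int[maxDegree + 1];
        for (int d : degree) {
            histogram[d]++;
        }
        return histogram;
    }

    public static void main(String[] args) {
        double[] model = {0.2, 0.5, 0.3}; // made-up P(out-degree = 0, 1, 2)
        List<int[]> edges = generateEdges(1000, 500, model, 42L);
        System.out.println(Arrays.toString(outDegreeHistogram(edges, 1000, 2)));
    }
}
```

Because the generator is driven only by the model and the seed, the same dataset can be regenerated in a different environment, which is the property the last bullet relies on.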