5. What is a Graph in mathematics
● represents a set of objects together with the connections between them
● graph:
○ vertex (node/point)
○ edge (line/relationship) - undirected; arc (arrow) - directed
○ attribute (property) - on a node or relationship
● types:
○ undirected graph, an ordered pair: G = (V, E)
○ directed graph (digraph): D = (V, A)
○ mixed graph: G = (V, E, A)
● example (undirected graph):
○ V = {1, 2, 3, 4, 5, 6}
○ E = {{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}}
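A minimal Python sketch of the example graph stored as an adjacency list, the usual in-memory form of G = (V, E). E is written as a list of Python sets, because a set of sets would need frozenset. The helper name `adjacency` is illustrative only.

```python
# The example graph G = (V, E); each edge is an unordered pair of vertices.
V = {1, 2, 3, 4, 5, 6}
E = [{1, 2}, {1, 5}, {2, 3}, {2, 5}, {3, 4}, {4, 5}, {4, 6}]

def adjacency(vertices, edges):
    """Map every vertex to the set of its neighbours (undirected, so both ways)."""
    adj = {v: set() for v in vertices}
    for edge in edges:
        u, w = tuple(edge)
        adj[u].add(w)
        adj[w].add(u)
    return adj

adj = adjacency(V, E)
degrees = {v: len(adj[v]) for v in sorted(adj)}
print(degrees)  # {1: 2, 2: 3, 3: 2, 4: 3, 5: 3, 6: 1}

# Handshake lemma: in an undirected graph the degrees sum to 2 * |E|.
assert sum(degrees.values()) == 2 * len(E)
```

Storing neighbours per vertex is what makes "follow the edges of this vertex" cheap, which is the operation the rest of this deck builds on.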
6. What is a Graph database
● stores data as a graph and is built for retrieving large networks of connected data
● excels at storing richly connected data
● consists of nodes connected by relationships
○ a Graph records data in Nodes, which have Properties
○ Nodes are organized by Relationships, which also have Properties
○ Nodes are grouped by Labels into Sets
○ a Traversal navigates a Graph; it identifies Paths, which order Nodes
○ an Index maps from Properties to either Nodes or Relationships
○ a Graph Database manages a Graph and also manages related Indexes
8. Graph Traversal
● a Traversal navigates a Graph; it identifies Paths, which order Nodes
● example questions answered by a traversal:
○ what music do my friends like that I don't yet own?
○ if this power supply goes down, what web services are affected?
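The power-supply question is a reachability traversal. The sketch below assumes a hypothetical dependency graph (all component names are invented for illustration) in which an edge A -> B means "B depends on A", walks it breadth-first from the failing power supply, and keeps the nodes that carry a hypothetical `web-` prefix.

```python
from collections import deque

# Hypothetical dependency graph: an edge A -> B means "B depends on A",
# so a failure of A propagates to B.
DEPENDENTS = {
    "power-supply-1": ["server-a", "server-b"],
    "server-a": ["db-primary", "web-shop"],
    "server-b": ["cache"],
    "db-primary": ["web-shop", "web-api"],
    "cache": ["web-api"],
    "server-c": ["web-blog"],
}

def affected(graph, start):
    """Breadth-first traversal: every node reachable from start (start excluded)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen

def affected_web_services(graph, start):
    """Keep only reachable nodes that are web services (hypothetical 'web-' prefix)."""
    return sorted(n for n in affected(graph, start) if n.startswith("web-"))

print(affected_web_services(DEPENDENTS, "power-supply-1"))  # ['web-api', 'web-shop']
```

Only the subgraph reachable from the start node is visited; web-blog hangs off an unrelated server and is never touched.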
9. Graph Index
● an Index maps from Properties to either Nodes or Relationships
● example: find the Account for username master-of-graphs
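Conceptually, an index is a map from a property value to the nodes that carry it, so a lookup avoids scanning every node. This is a hash-map sketch over a hypothetical node store; Neo4j's own indexes are Lucene-based (see the features slide), and the names `build_index` and `lookup` are illustrative only.

```python
# Hypothetical node store: node id -> property map.
NODES = {
    1: {"username": "master-of-graphs", "email": "master@example.com"},
    2: {"username": "graph-novice"},
    3: {"name": "Alice"},  # no username property, so it is not indexed below
}

def build_index(store, key):
    """Map each value of property `key` to the set of node ids carrying it."""
    index = {}
    for node_id, props in store.items():
        if key in props:
            index.setdefault(props[key], set()).add(node_id)
    return index

def lookup(index, value):
    """Average constant-time lookup instead of a scan over all nodes."""
    return index.get(value, set())

username_index = build_index(NODES, "username")
print(lookup(username_index, "master-of-graphs"))  # {1}
```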
13. A Graph Database elaborates a Key-Value Store
● figure legend: K* = key, V* = value
14. A Graph Database relates a Column-Family Store
● BigTable-style databases are an evolution of key-value stores, using "families" to allow grouping of rows
● stored in a graph, the families could become hierarchical, and the relationships among the data become explicit
15. A Graph Database navigates a Document Store
● figure legend: D = Document, S = Subdocument, V = Value, D2/S2 = reference
18. Neo4j features
● intuitive: uses a graph model for data representation
● reliable: fully transactional, upholds ACID
● durable and fast: uses a custom, disk-based, native storage engine
● massively scalable: up to several billion nodes/relationships/properties
● highly available when distributed across multiple machines
● expressive: a powerful, human-readable, declarative graph query language
● fast: a powerful traversal framework for high-speed graph queries
● embeddable: a few small jars
● simple: accessible through a convenient REST API or an object-oriented Java API
● indexes are based on Apache Lucene; supports secondary indexes
● in commercial development since 2003 (10 years) and in production for over 7 years
● cross-platform; simple set-up; well documented; open source
● GPL for Community, AGPL for Enterprise
19. Neo4j requirements
● CPU - Intel Core i3/i7
● Memory - 2GB .. 16/32GB
● Disk - 10GB SATA .. SSD w/ SATA
● Filesystem - ext4 .. ext4/ZFS
● Software - Oracle Java 7
20. Neo4j license
● Neo4j Community
○ open-source, high-performance, fully ACID transactional graph database
● Neo4j Enterprise
○ high-performance cache (up to 10x faster)
○ horizontal scalability with Neo4j Clustering (predictable scalability)
○ high availability and online backups
○ cache-based sharding (shard your graph in memory)
○ advanced monitoring (operational metrics)
○ certified for Windows and Linux
○ email/phone support (10x5 or 24x7)
○ subscriptions
■ Personal (up to 3 devs, up to $100k annual revenue) = free
■ Startups (<$10M funding, <$5M annual revenue) = $12k
■ Business (medium-sized companies up to Global 2000) = contact sales
21. Neo4j vs. MySQL
● for the simple friends-of-friends query, Neo4j is 60% faster than MySQL
● for friends of friends of friends, Neo4j is 180 times faster
● for the depth-four query, Neo4j is 1,135 times faster
● MySQL just chokes on the depth-five query
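A common explanation for this gap is that each extra hop in a graph database follows adjacency lists directly, while a relational schema needs one more self-join per hop. The sketch below only illustrates the graph side: a depth-limited breadth-first search over a hypothetical friendship graph (names invented). It does not reproduce the benchmark.

```python
from collections import deque

# Hypothetical undirected friendship graph stored as adjacency sets.
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "eve"},
    "dan": {"bob", "fay"},
    "eve": {"cat"},
    "fay": {"dan"},
}

def friends_at_depth(graph, start, depth):
    """People whose shortest distance from start is exactly `depth` hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == depth:
            continue  # do not expand past the requested depth
        for friend in graph[person]:
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return {p for p, d in dist.items() if d == depth}

print(sorted(friends_at_depth(FRIENDS, "ann", 2)))  # ['dan', 'eve']
```

Each additional level costs only the neighbours actually reached at that level, rather than another join over the whole relationship table.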
22. Neo4j: Nodes
● fundamental units that form a graph
● can have key/value-style properties
● nodes and relationships can be indexed by {key, value} pairs
● represent entities
23. Neo4j: Relationships #1/2
● connect entities and structure the domain
● allow finding related data
● are always directed (outgoing or incoming)
● are equally well traversed in either direction
● a node can have a relationship to itself
● have a relationship type (label)
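One way to get "always directed, yet equally cheap to traverse in either direction" is to register every relationship at both of its endpoints. This is a simplified Python sketch of that idea, not Neo4j's storage format; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Relationship:
    start: str
    end: str
    rel_type: str  # every relationship carries exactly one type

@dataclass
class Graph:
    outgoing: dict = field(default_factory=dict)  # node -> [Relationship]
    incoming: dict = field(default_factory=dict)  # node -> [Relationship]

    def relate(self, start, end, rel_type):
        """Create a directed relationship start -> end."""
        rel = Relationship(start, end, rel_type)
        # Register it at both ends so either direction is a direct lookup.
        self.outgoing.setdefault(start, []).append(rel)
        self.incoming.setdefault(end, []).append(rel)
        return rel

    def neighbours(self, node, direction):
        """Nodes reached from `node` along OUTGOING or INCOMING relationships."""
        if direction == "OUTGOING":
            return [r.end for r in self.outgoing.get(node, [])]
        if direction == "INCOMING":
            return [r.start for r in self.incoming.get(node, [])]
        raise ValueError(f"unknown direction: {direction}")

g = Graph()
g.relate("alice", "bob", "KNOWS")
g.relate("alice", "alice", "LIKES")  # a relationship from a node to itself
print(g.neighbours("bob", "INCOMING"))    # ['alice']
print(g.neighbours("alice", "OUTGOING"))  # ['bob', 'alice']
```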
25. Neo4j: Properties
● nodes and relationships can have properties
● properties are key-value pairs
○ the key is a string
○ the value is either a primitive or an array of one primitive type
■ boolean, String, int, int[], etc.
■ types as defined by the Java Language Specification
● used for entity attributes, relationship qualities, and metadata
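The value rule above (a primitive, or an array of one primitive type) can be expressed as a small validator. This Python sketch approximates the Java primitive types with bool, int, float, and str, which is an assumption; it also rejects empty lists because their element type cannot be inferred here. The function name is hypothetical.

```python
# Assumption: Java primitives and String approximated by these Python types.
PRIMITIVES = (bool, int, float, str)

def is_valid_property(key, value):
    """Key must be a string; value a primitive or a homogeneous list of one primitive type."""
    if not isinstance(key, str):
        return False
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list) and value:
        first = type(value[0])
        # Exact type check, so [True, 1] counts as mixed (bool vs int).
        return first in PRIMITIVES and all(type(v) is first for v in value)
    return False

print(is_valid_property("age", 42))           # True
print(is_valid_property("tags", ["a", "b"]))  # True
print(is_valid_property("mixed", [1, "a"]))   # False
```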
26. Neo4j: Labels
● used to group nodes into sets
● any number of labels, including none
● can be added and removed during runtime
● can be used to mark temporary states for nodes
● names are case-sensitive
● CamelCase by convention
27. Neo4j: Paths
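● a path is one or more nodes with connecting relationships
● the shortest possible path has length zero: a single node and no relationships
● a path of length one: two nodes joined by one relationship
● a path of length one can also be a single node with a relationship to itself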
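Path length counts relationships, not nodes. Assuming a path is written as an alternating list node, relationship, node, ... (a hypothetical representation chosen for illustration), the length falls out directly:

```python
def path_length(path):
    """A path alternates node, relationship, node, ...; its length is the relationship count."""
    if len(path) % 2 == 0:
        raise ValueError("a path must start and end with a node")
    return len(path) // 2

print(path_length(["alice"]))                    # 0: a single node, no relationships
print(path_length(["alice", "KNOWS", "bob"]))    # 1
print(path_length(["alice", "LIKES", "alice"]))  # 1: a node related to itself
```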
28. Neo4j: Traversal
● Traversal Framework available out of the box
● traversing means visiting nodes by following relationships according to rules
● in most cases only a subgraph is visited
● callback-based traversal API
○ you can specify the traversal rules
● breadth-first or depth-first traversal
● open Java API
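The ideas above (rule-driven, callback-based, breadth- or depth-first) can be sketched in a few lines. This is a hypothetical Python stand-in, loosely modeled on the concept of an evaluator callback, and not Neo4j's Java Traversal API: `evaluator(path)` returns whether to report the path's end node and whether to keep expanding from it.

```python
from collections import deque

def visit_all(path):
    """Default rule: report every node and keep expanding."""
    return True, True

def traverse(graph, start, order="breadth", evaluator=visit_all):
    """Visit nodes reachable from start; each node is visited at most once."""
    frontier = deque([[start]])
    visited, result = set(), []
    while frontier:
        # A FIFO queue gives breadth-first order, a LIFO stack depth-first.
        path = frontier.popleft() if order == "breadth" else frontier.pop()
        node = path[-1]
        if node in visited:
            continue
        visited.add(node)
        include, expand = evaluator(path)  # the callback decides
        if include:
            result.append(node)
        if expand:
            neighbours = graph.get(node, [])
            if order == "depth":
                neighbours = reversed(neighbours)  # explore first neighbour first
            for nxt in neighbours:
                if nxt not in visited:
                    frontier.append(path + [nxt])
    return result

G = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
print(traverse(G, "a", "breadth"))  # ['a', 'b', 'c', 'd', 'e']
print(traverse(G, "a", "depth"))    # ['a', 'b', 'd', 'c', 'e']
# Rule: report everything, but stop expanding beyond depth 1.
print(traverse(G, "a", evaluator=lambda p: (True, len(p) - 1 < 1)))  # ['a', 'b', 'c']
```

The last call shows why usually only a subgraph is visited: the rule prunes expansion, so d and e are never reached.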
29. Neo4j: graph algorithms
● A*: uses the A* algorithm to find the cheapest path between two nodes
● Dijkstra (dijkstra): uses Dijkstra's algorithm to find the cheapest path between two nodes
● PathWithLength: finds all paths of a certain length (depth) between two nodes
● Shortest paths (shortestPath, the default): finds all the shortest paths between two nodes
● All simple paths (allSimplePaths): finds all simple paths (without loops) between two nodes
● All paths (allPaths): finds all available paths between two nodes
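For readers unfamiliar with the Dijkstra entry, this is a textbook priority-queue version in Python on a hypothetical weighted graph. It is a generic sketch, not Neo4j's own implementation, and it assumes non-negative costs, which Dijkstra's algorithm requires.

```python
import heapq

def dijkstra(graph, source, target):
    """Cheapest path source -> target; graph[u] = [(v, cost), ...] with cost >= 0."""
    queue = [(0, source, [source])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path  # first time the target is popped, its cost is minimal
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None  # target unreachable

ROADS = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
}
print(dijkstra(ROADS, "A", "D"))  # (4, ['A', 'B', 'C', 'D'])
```

A* follows the same loop but orders the queue by cost plus a heuristic estimate of the remaining distance.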
31. Neo4j: Index
● introduced in Neo4j 2.0
● eventually available (populated in the background, so not immediately available for querying)
○ comes online once fully populated
○ on failed status, drop and recreate the index
● can be created on a label, i.e. for a group of nodes
● nodes and relationships can be indexed
● auto-indexing configuration: node_auto_indexing=false, node_keys_indexable
32. Neo4j: Constraints
● can help you keep your data clean
● specify the rules for what your data should
look like
● unique constraints are the only available constraint type
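A unique constraint amounts to checking a (label, property) value against the values already in use before each write. The Python sketch below uses a hypothetical in-memory store and class names. It checks every applicable constraint before writing anything and assumes constraints are created while the store is still empty.

```python
class ConstraintViolation(Exception):
    """Raised when a write would break a uniqueness constraint."""

class NodeStore:
    """Hypothetical node store enforcing unique (label, property) values."""

    def __init__(self):
        self.nodes = []
        self.unique = {}  # (label, key) -> set of values already in use

    def create_unique_constraint(self, label, key):
        # Assumes the store is still empty; existing duplicates are not checked.
        self.unique[(label, key)] = set()

    def create_node(self, labels, props):
        applicable = [(lab, k) for (lab, k) in self.unique if lab in labels and k in props]
        # Validate every applicable constraint before writing anything.
        for label, key in applicable:
            if props[key] in self.unique[(label, key)]:
                raise ConstraintViolation(f"{label}.{key} = {props[key]!r} already exists")
        for label, key in applicable:
            self.unique[(label, key)].add(props[key])
        node = {"labels": set(labels), "props": dict(props)}
        self.nodes.append(node)
        return node

store = NodeStore()
store.create_unique_constraint("Person", "email")
store.create_node(["Person"], {"email": "a@example.com"})
store.create_node(["Company"], {"email": "a@example.com"})  # other label: allowed
```

A second Person with the same email would raise ConstraintViolation and leave the store unchanged.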
33. Neo4j: Data Size
● single server instance
○ nodes = 2^35 (~34 billion)
○ relationships = 2^35 (~34 billion)
○ labels = 2^31 (~2 billion)
○ properties = 2^36 to 2^38 depending on property types (maximum ~274 billion, always at least ~68 billion)
○ relationship types = 2^15 (~32'000)
34. Cypher Query Language
“Makes the simple things easy, and the complex things possible”
● powerful graph query language
● relatively simple
● declarative grammar (say what you want, not how)
● humane query language
● self-explanatory (based on English prose and neat iconography)
● written in Scala
● pattern matching (borrows expression approaches from SPARQL)
● aggregation, ordering, limits
● create, update, delete
● structure and most keywords inspired by SQL
● changing rather rapidly (a version prefix selects older syntax, e.g. CYPHER 1.9 START ...)
37. Cypher: START / RETURN
“It all starts with the START”
Michael Hunger, Cypher webinar, Sep 2012
● designates the start points
● START is optional (in Neo4j >= 2.0)
Examples:
● START <lookup> RETURN <expression>
● START n=node(0) RETURN n
● START n=node(*) RETURN n.name
38. Cypher: MATCH
● primary way of getting data from the database
● START <lookup> MATCH <pattern> RETURN <expr>
● OPTIONAL MATCH <pattern> RETURN <expr>
Examples:
● MATCH (n) RETURN count(n)
● MATCH (actor:Actor) RETURN actor.name;
● START me=node(0) MATCH (me)--(f) RETURN f.name
● MATCH (n)-[r]->(m) RETURN n AS FROM, r AS `->`, m AS TO
40. Cypher: WHERE
● filters the results
● MATCH <pattern> WHERE <condition> RETURN <expr>
Examples:
● WHERE n.name =~ "(?i)John.*"
● WHERE NOT ..
● WHERE type(rel) =~ "Perso.*"
41. Cypher: RETURN
● creates the result table
● any query can return data
● can be nodes, relationships, or properties on these
● RETURN DISTINCT <expression> AS x
● RETURN aggregate(expr) AS alias
● RETURN nodes, rels, properties
● RETURN expressions built from functions and operators
● RETURN aggregation functions applied to the above
42. Cypher: etc
● CASE / WHEN / ELSE
● ORDER BY node.key, node2.key, .. ASC|DESC
● LIMIT / SKIP
● WITH (WITH count(*) AS c)
● UNION / UNION ALL (combining results from multiple queries)
● USING INDEX/SCAN
● MERGE / SET / DELETE / REMOVE / FOREACH
● Expressions
● Operators
● Comments
● Functions: ALL, ANY, LENGTH, {Math}, {String}, ...
43. Cypher: Transactions
● any updating query runs in a transaction
● ACID
● “it is very important to finish each transaction”
● write lock on a node/relationship:
○ adding, changing or removing a property on a node/relationship
● write lock on a node:
○ creating or deleting a node
● write lock on a relationship and both its nodes:
○ creating or deleting a relationship
45. Cypher: back to SQL #1/5
● SELECT *
FROM Person
WHERE name = 'Valentin' AND age > 30
● START person=node:Person(name="Valentin")
WHERE person.age > 30
RETURN person
46. Cypher: back to SQL #2/5
● SELECT "Email".*
FROM "Person"
JOIN "Email" ON "Person".id = "Email".person_id
WHERE "Person".name = 'Benedikt'
● START person=node:Person(name="Benedikt")
MATCH person-[:email]->email
RETURN email
47. Cypher: back to SQL #3/5
● show me all people that are both actors and
directors
● SELECT name FROM Person
WHERE
person_id IN (SELECT person_id FROM Actor) AND
person_id IN (SELECT person_id FROM Director)
● START person=node:Person("name:*")
WHERE (person)-[:ACTS_IN]->()
AND (person)-[:DIRECTED]->()
RETURN person.name
48. Cypher: back to SQL #4/5
● show me all Tom Hanks’s co-actors
● SELECT DISTINCT co_actor.name FROM Person tom
JOIN Actor a1 ON tom.person_id = a1.person_id
JOIN Actor a2 ON a1.movie_id = a2.movie_id
JOIN Person co_actor ON co_actor.person_id = a2.person_id
WHERE tom.name = 'Tom Hanks' AND co_actor.person_id <> tom.person_id
● START tom=node:Person(name="Tom Hanks")
MATCH tom-[:ACTS_IN]->movie,
co_actor-[:ACTS_IN]->movie
RETURN DISTINCT co_actor.name
49. Cypher: back to SQL #5/5
● show me all of Lucy's favorite directors
● SELECT dir.name, count(*) FROM Person lucy
JOIN Actor ON lucy.person_id = Actor.person_id
JOIN Director ON Actor.movie_id = Director.movie_id
JOIN Person dir ON Director.person_id = dir.person_id
WHERE lucy.name = 'Lucy Liu'
GROUP BY dir.name
ORDER BY count(*) DESC
● START lucy=node:Person(name="Lucy Liu")
MATCH lucy-[:ACTS_IN]->movie,
director-[:DIRECTED]->movie
RETURN director.name, count(*)
ORDER BY count(*) DESC
50. Cypher: back to SQL #6/5
● find the shortest ACTS_IN connection between Lucy Liu and Kevin Bacon
START
lucy = node:Person(name="Lucy Liu"),
kevin = node:Person(name="Kevin Bacon")
MATCH
p = shortestPath( lucy-[:ACTS_IN*]-kevin )
RETURN
EXTRACT (n in NODES(p):
COALESCE(n.name?, n.title?))
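The pattern lucy-[:ACTS_IN*]-kevin ignores relationship direction, so on an unweighted graph shortestPath amounts to a breadth-first search over actors and movies. The sketch below uses toy ACTS_IN data with hypothetical actors and movies (Movie A, Actor X, ...), not the Cineasts dataset, and returns one shortest path alternating actors and movies, like the EXTRACT over names and titles.

```python
from collections import deque

# Toy (actor, movie) ACTS_IN pairs; all names are hypothetical.
ACTS_IN = [
    ("Lucy Liu", "Movie A"),
    ("Actor X", "Movie A"),
    ("Actor X", "Movie B"),
    ("Kevin Bacon", "Movie B"),
    ("Actor Y", "Movie C"),
]

def shortest_path(edges, source, target):
    """BFS over ACTS_IN edges in both directions; returns one shortest path or None."""
    adj = {}
    for actor, movie in edges:
        adj.setdefault(actor, []).append(movie)
        adj.setdefault(movie, []).append(actor)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

print(shortest_path(ACTS_IN, "Lucy Liu", "Kevin Bacon"))
# ['Lucy Liu', 'Movie A', 'Actor X', 'Movie B', 'Kevin Bacon']
```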
52. Neo4j: Security
● does not deal with data encryption explicitly
● supports all the means built into Java and the JVM, e.g. encrypting data before storing it
● can run on an encrypted datastore
● webadmin over HTTPS
53. SPARQL
● queries and manipulates data stored in RDF format
● focused on matching sets of triples
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
?person a foaf:Person.
?person foaf:name ?name.
?person foaf:mbox ?email.
}
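Matching a set of triple patterns means finding variable bindings that satisfy every pattern at once. The recursive matcher below is a simplified Python sketch over hypothetical in-memory triples; the query's `a` is SPARQL shorthand for rdf:type, spelled out here. It covers basic graph patterns only, none of SPARQL's FILTER, OPTIONAL, or datatypes.

```python
# Hypothetical (subject, predicate, object) triples.
TRIPLES = [
    ("_:alice", "rdf:type", "foaf:Person"),
    ("_:alice", "foaf:name", "Alice"),
    ("_:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("_:bob", "rdf:type", "foaf:Person"),
    ("_:bob", "foaf:name", "Bob"),  # no mbox, so Bob will not match
]

def match(patterns, triples, binding=None):
    """Yield bindings satisfying every pattern; terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new, ok = dict(binding), True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind an unbound variable, or require the existing binding to agree.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

QUERY = [
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:mbox", "?email"),
]
print([(b["?name"], b["?email"]) for b in match(QUERY, TRIPLES)])
# [('Alice', 'mailto:alice@example.org')]
```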
54. Gremlin
● graph traversal language
● scripting language
● pipe & filter (similar to jQuery)
● works across different graph databases
● based on Groovy (runs on the JVM)
● not as stable in Neo4j
● XPath-like
● ./outE[label="family"]/inV/@name
● g.v(1).out('likes').in('likes').out('likes').groupCount(m)
● g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
● g.v(1).in('LOVE_OF').out('SOME_IN').has('title','abc').back(2)
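The pipe-and-filter idea behind g.v(1).out('likes').in('likes').out('likes').groupCount(m) can be mimicked with plain list transformations: each step maps the current items to their neighbours (keeping duplicates, as a pipe does), and the final count tallies what comes out. A Python sketch with hypothetical data; `out` and `in_` are illustrative helpers, not Gremlin API.

```python
from collections import Counter

# Hypothetical 'likes' edges: person -> things they like.
LIKES = {
    "me": ["jazz", "blues"],
    "ann": ["jazz", "rock"],
    "bob": ["jazz", "blues", "funk"],
}

def out(items, graph):
    """Pipe step: follow outgoing edges from every item."""
    return [t for i in items for t in graph.get(i, [])]

def in_(items, graph):
    """Pipe step: follow incoming edges (who points at each item)."""
    return [s for i in items for s, targets in graph.items() if i in targets]

# Rough analogue of g.v(1).out('likes').in('likes').out('likes').groupCount(m):
# what do people who like what I like also like?
counts = Counter(out(in_(out(["me"], LIKES), LIKES), LIKES))
print(counts.most_common())  # [('jazz', 5), ('blues', 4), ('funk', 2), ('rock', 1)]
```

As in the Gremlin one-liner, nothing filters out "me" or items I already like; a real recommendation query would add such a filter step.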
55. Neo4j and PHP
● everyman/neo4jphp (on packagist.org)
○ PHP wrapper for Neo4j using the REST interface
○ follows the PSR-0 autoloading standard
○ basic wrappers for all components
○ last update: a month ago
○ supports Gremlin
● Neo4j-PHP OGM < a lot of based on
○ Object Graph Mapper, inspired by Doctrine
○ based on Doctrine Common
○ borrows significantly from the Doctrine ORM design
○ uses annotations on classes
○ MIT License
● Neo4j PHP REST API client
○ uses the Neo4j REST API
○ node create/find/delete
○ relationship create/list/filter
56. High Availability with Neo4j
● in HA: a single master and zero or more slaves
● slaves synchronize with the master to preserve consistency
● the master writes to the slaves before a transaction completes
57. Demo
Neo4j.org Example Datasets:
● DrWho (nodes=1'060; rels=2'286)
● Cineasts Movies & Actors (nodes=64'069; rels=121'778)
● Hubway Data Challenge (nodes=554'674; rels=2'011'904)
GraphGist:
● JIRA and neo4j
● PHP and neo4j
● Kant in neo4j
XSS
65. Analogues
● GrapheneDB - based on Neo4j
● AllegroGraph - closed source, commercial, RDF quad store
○ graph database built around the W3C spec for the Resource Description Framework
○ supports SPARQL, RDFS++, and Prolog
● Sones - closed source, .NET focused
● Virtuoso - closed source, RDF focused
● GraphDB - graph database built in .NET by the German company sones
● InfiniteGraph - goal is to create a graph database with "virtually unlimited scalability"
● FlockDB
67. Conclusion
● best used for graph-style data:
○ rich or complex, densely structured data
○ deep graphs of unlimited depth, including cycles
○ weighted connections
○ highly interconnected data
● new functionality can be added quickly without impacting existing deployments
● being schema-less forces you to rethink your entire approach to data
● not a silver bullet for all problems