15. New Solution to the Bacon Problem
$keanu = $actorIndex->find('name', 'Keanu Reeves');
$kevin = $actorIndex->find('name', 'Kevin Bacon');
$path = $keanu->findPathTo($kevin);
* Six degrees game * Relational databases can't easily answer certain types of questions
* first pass using a relational database * cast table: actor_name, movie_title * hard to visualize the solution * In order to do this, you need to do multiple passes or joins
* Each degree adds a join * Increases complexity * Decreases performance * Stop when the actor you're looking for is in the list
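A minimal sketch of the relational first pass, assuming the `cast` table from the notes (`actor_name`, `movie_title`) and Python's built-in `sqlite3`. The helper names `costars_query` and `degrees_between` and the four sample rows are illustrative, not from the talk. With this particular schema every extra degree costs another pair of self-joins, and the loop stops as soon as the target actor shows up in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "cast" is quoted because CAST is also an SQL keyword.
conn.execute('CREATE TABLE "cast" (actor_name TEXT, movie_title TEXT)')
conn.executemany(
    'INSERT INTO "cast" VALUES (?, ?)',
    [
        ("Keanu Reeves", "The Matrix"),
        ("Laurence Fishburne", "The Matrix"),
        ("Laurence Fishburne", "Mystic River"),
        ("Kevin Bacon", "Mystic River"),
    ],
)


def costars_query(degrees):
    """Build the SQL that lists actors reachable in `degrees` co-star hops."""
    sql = f'SELECT DISTINCT m{degrees}.actor_name FROM "cast" a0'
    for i in range(1, degrees + 1):
        # m{i}: every cast row of a movie the previous actor appeared in.
        sql += f' JOIN "cast" m{i} ON m{i}.movie_title = a{i - 1}.movie_title'
        if i < degrees:
            # a{i}: every movie of that co-star, feeding the next degree.
            sql += f' JOIN "cast" a{i} ON a{i}.actor_name = m{i}.actor_name'
    return sql + " WHERE a0.actor_name = ?"


def degrees_between(conn, start, target, max_degrees=6):
    """Try 1, 2, 3... degrees; stop when the target is in the result list."""
    for degrees in range(1, max_degrees + 1):
        rows = conn.execute(costars_query(degrees), (start,)).fetchall()
        if (target,) in rows:
            return degrees
    return None


print(degrees_between(conn, "Keanu Reeves", "Kevin Bacon"))
print(costars_query(2))
```

The generated query for two degrees already contains three joins of the same table, which is the growth in complexity the notes describe.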
* this problem highlights the ugly truth about RDBs * they weren't designed to handle these types of problems. * RDB relationships join data, but are not data in themselves
* Gather everything in the set that matches these criteria, then tell me if this thing is in the set
* 1st set, no problem
* 2nd set, no problem
* 3rd set not related to 1st
* 4th not related to 2nd
* 5th related to 1st and 4th
* etc.
* Relationships are only available between overlapping sets
* disjoint sets
* Graphs * Not X-Y * Computer Science definition of graphs
* graph theory
* Nodes can have arbitrary properties * Relationships can have arbitrary properties * Paths are found using traversal algorithms * Indexes help find starting points
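The four points above can be sketched as a tiny in-memory property graph. This is a plain Python illustration, not the Neo4j or Neo4jPHP API; the `Node`, `Relationship`, and `Graph` classes, their method names, and the `role` property are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by property values
class Node:
    properties: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)  # relationships carry data too


class Graph:
    def __init__(self):
        self.index = {}  # (key, value) -> Node, used to find starting points

    def add_node(self, **properties):
        return Node(properties=properties)

    def relate(self, start, end, rel_type, **properties):
        rel = Relationship(start, end, rel_type, properties)
        start.relationships.append(rel)
        end.relationships.append(rel)
        return rel

    def index_node(self, key, value, node):
        self.index[(key, value)] = node

    def find(self, key, value):
        return self.index.get((key, value))


graph = Graph()
keanu = graph.add_node(name="Keanu Reeves")
matrix = graph.add_node(title="The Matrix")
graph.relate(keanu, matrix, "IN", role="Neo")
graph.index_node("name", "Keanu Reeves", keanu)

start = graph.find("name", "Keanu Reeves")
for rel in start.relationships:
    print(rel.type, rel.end.properties, rel.properties)
```

The index is only consulted once, to locate the starting node; everything after that follows relationships directly from node to node.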
* This is how graph dbs solve the problems that RDBs can't
* Tree data-structures * Networks * Maps * vehicles on streets == packets through network
* Make each record a node
* Make every foreign key a relationship
* RDB indexes are usually stored in a tree structure
* Trees are graphs
* Why not use RDBs?
** The trouble with RDBs is how they are stored in memory and queried
** Require a translation step from memory blocks to graph structure
** Relationships not first-class citizens
** Many problem domains map poorly to rows/tables
* Actors are nodes * Movies are nodes * Relationship: Actor is IN a movie * pseudo-code shortened for brevity * Compare to degree selection join queries
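To make the comparison with the join queries concrete, here is a rough sketch of what a path finder might do under the hood: a breadth-first search over a graph where actors and movies are both nodes and each IN relationship is an edge. It is plain Python rather than the Neo4jPHP calls on the slide; `build_graph`, `find_path`, and the small sample cast list are assumptions for illustration.

```python
from collections import deque

# Each pair is one "actor IN movie" relationship (small hand-picked sample).
APPEARS_IN = [
    ("Keanu Reeves", "The Matrix"),
    ("Laurence Fishburne", "The Matrix"),
    ("Laurence Fishburne", "Mystic River"),
    ("Kevin Bacon", "Mystic River"),
]


def build_graph(pairs):
    """Actors and movies are both nodes; every pair links the two."""
    graph = {}
    for actor, movie in pairs:
        graph.setdefault(actor, set()).add(movie)
        graph.setdefault(movie, set()).add(actor)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest node list or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph[current]:
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


graph = build_graph(APPEARS_IN)
print(find_path(graph, "Keanu Reeves", "Kevin Bacon"))
```

The search code stays the same no matter how many degrees separate the two actors, whereas the relational version needs a longer query for each extra degree.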
* Social networking - friends of friends of friends of friends * Assembly/Manufacturing - 1 widget contains 3 gadgets each contain 2 gizmos * Map directions - starting at my house find a route to the office that goes past the pub * Multi-tenancy - root node per tenant * all queries start at root * No overlap between graphs = no accidental data spillage * Fraud: track transactions back to origination * Pretty much anything that can be drawn on a whiteboard
* Example: retail system * Customer makes Order * Store sells Order * Order contains Items * Supplier supplied Items * Customer rates Items * Did this customer rank supplier X highly? * Which suppliers sell the highest rated items? * Does item A get rated higher when ordered with Item B? * All can be answered with RDBs as well * Not as elegant * Not as performant
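As a rough illustration of the first question ("Did this customer rank supplier X highly?"), the sketch below walks two relationships: Customer rates Item, and Supplier supplied Item. The relationship tuples, the `stars` property, and the function name `ratings_for_supplier` are hypothetical; a real graph database would follow stored relationships rather than scan a list.

```python
# (start node, relationship type, end node, relationship properties)
# All names and ratings are made up for the example.
RELATIONSHIPS = [
    ("supplier_x", "SUPPLIED", "item_a", {}),
    ("supplier_x", "SUPPLIED", "item_b", {}),
    ("supplier_y", "SUPPLIED", "item_c", {}),
    ("customer_1", "RATES", "item_a", {"stars": 5}),
    ("customer_1", "RATES", "item_c", {"stars": 2}),
    ("customer_2", "RATES", "item_b", {"stars": 4}),
]


def ratings_for_supplier(relationships, customer, supplier):
    """Follow customer -RATES-> item <-SUPPLIED- supplier and collect the stars."""
    supplied = {
        end
        for start, rel_type, end, _ in relationships
        if rel_type == "SUPPLIED" and start == supplier
    }
    return [
        props["stars"]
        for start, rel_type, end, props in relationships
        if rel_type == "RATES" and start == customer and end in supplied
    ]


print(ratings_for_supplier(RELATIONSHIPS, "customer_1", "supplier_x"))
```

The rating lives on the relationship itself, which is what lets the question be answered by following a path instead of joining tables.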
* Recreate Google+
* billions of nodes and relationships in a single instance * cluster replication * transactions * native bindings for Ruby, Python, and any language that can run on the JVM * Licensing * Neo4jPHP - Josh's REST client, not affiliated with Neo Technology
* Index can be saved separately * Or it is saved on `add` * Note that indexes don't have to be on real properties or values
* This is where the power of graph dbs comes from * Paths - find any relationship chain between A and B * Traversal - filter out paths that don't meet criteria * Queries - Here is what I want, find it however you can
* Paths deal with two known nodes * start and end point * This is the Kevin Bacon example, but with multiple datatypes * Path can be treated as an array of nodes or relationships * findPathsTo() returns a PathFinder which can have further restrictions placed on it
* Written in JavaScript
* plugins provide other languages: Groovy, Python
* Anything that runs on the JVM
* Path object, check apidocs
* inline edit/update/delete
* explicit prune evaluator of maxDepth = 1 unless overridden
* built in prune: none
* built in return: all or all-but-start
* Prune: should we continue down this path?
* Return: should we return the entity at this position?
* You can return things and still continue traversing
* Pros: expressive, powerful, complex search behaviors, in-line edit/update
* Cons: complex to write, complex to understand (query languages make this better)
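The prune/return split can be sketched in a few lines of plain Python. This is not Neo4j's traversal API: `traverse`, `max_depth`, `prune_none`, `return_all`, `return_all_but_start`, and the friend graph are assumptions, named only to mirror the built-ins listed above. Each evaluator receives the path walked so far.

```python
from collections import deque


def traverse(graph, start, prune, include):
    """Breadth-first traversal driven by two evaluators.

    prune(path)   -> True means "do not continue down this path"
    include(path) -> True means "return the node at the end of this path"
    """
    results = []
    visited = {start}
    queue = deque([(start,)])
    while queue:
        path = queue.popleft()
        if include(path):
            results.append(path[-1])
        if prune(path):
            continue  # the node may already be returned; we just stop expanding
        for neighbor in sorted(graph[path[-1]]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + (neighbor,))
    return results


def max_depth(limit):
    return lambda path: len(path) - 1 >= limit


prune_none = lambda path: False
return_all = lambda path: True
return_all_but_start = lambda path: len(path) > 1

# Hypothetical friend graph: alice - bob - carol - dave
friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}

print(traverse(friends, "alice", max_depth(1), return_all_but_start))
print(traverse(friends, "alice", max_depth(2), return_all_but_start))
print(traverse(friends, "alice", prune_none, return_all))
```

With `max_depth(2)`, bob is returned and the traversal still continues on to carol, which is the "return things and still continue traversing" behavior from the notes.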
* Not very familiar with it * Just mentioning it's out there
* Cypher is "what to find" * describe the "shape" of the thing you're looking for * Very white-board friendly * Pros: easy to understand, query looks like domain model * Cons: not as powerful, not fully featured (YET) * result set is an array of arrays
* Three parts
** Where to start
** Shape to find
*** possibly qualifiers
** What to return
* If there could be more than one relationship type, could further constrain by ratings
* Webadmin built into neo4j server
* RDBs are really good at data aggregation * Set math, duh * Have to traverse the whole graph in order to do aggregation * Truly tabular means not a lot of relationships between the data types