Introduction to Gremlin
1. Introduction to Gremlin
Chicago Graph Database Meet-Up
Max De Marzi
2. About Me
• Built the Neography gem (a Ruby wrapper for the Neo4j REST API)
• Playing with Neo4j since 10/2009
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: maxdemarzi@gmail.com
• GitHub: http://github.com/maxdemarzi
3. Agenda
• What is Gremlin?
• Gremlin in Neo4j
• Gremlin Steps
• Gremlin Recommends
7. Gremlin is
• A graph traversal language
• A domain-specific language for traversing property graphs
• Implemented by most graph database vendors
• Primarily seen with the Groovy language
• Accessible from Java, Scala, and other JVM languages
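A toy model may make “traversing property graphs” easier to picture. The sketch below is plain Python, not Gremlin and not any vendor's API. Vertices and edges carry key/value properties, and edges have a label and a direction. The hypothetical methods out_e, in_e and out mimic what the Gremlin steps outE, inE and out return. The sample data is made up.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality/hash, like graph elements
class Vertex:
    id: int
    props: dict = field(default_factory=dict)

@dataclass(eq=False)
class Edge:
    out_v: Vertex      # the vertex the edge leaves
    label: str
    in_v: Vertex       # the vertex the edge enters
    props: dict = field(default_factory=dict)

class Graph:
    def __init__(self):
        self.edges = []

    def add_edge(self, out_v, label, in_v, **props):
        edge = Edge(out_v, label, in_v, props)
        self.edges.append(edge)
        return edge

    def out_e(self, v, label):
        """Like outE(label): edges leaving v with this label."""
        return [e for e in self.edges if e.out_v is v and e.label == label]

    def in_e(self, v, label):
        """Like inE(label): edges entering v with this label."""
        return [e for e in self.edges if e.in_v is v and e.label == label]

    def out(self, v, label):
        """Like out(label): the vertices at the far end of outE(label)."""
        return [e.in_v for e in self.out_e(v, label)]


# Hypothetical sample data
g = Graph()
movie = Vertex(1, {"title": "Movie A"})
scifi = Vertex(10, {"name": "Sci-Fi"})
alice = Vertex(100, {"name": "Alice"})
g.add_edge(movie, "hasGenre", scifi)
g.add_edge(alice, "rated", movie, stars=5)

print([v.props["name"] for v in g.out(movie, "hasGenre")])  # ['Sci-Fi']
print([e.props["stars"] for e in g.in_e(movie, "rated")])   # [5]
```

A real traversal language chains these steps lazily instead of building lists. The model only shows what each hop selects.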
36. Recommendation Algorithm
m = [:];
x = [] as Set;
v = g.v(node_id);
v.
out('hasGenre').
aggregate(x).
back(2).
inE('rated').
filter{it.stars > 3}.
outV.
outE('rated').
filter{it.stars > 3}.
inV.
filter{it != v}.
filter{it.out('hasGenre').toSet().equals(x)}.
groupCount(m){"${it.id}:${it.title}"}.iterate();
m.sort{a,b -> b.value <=> a.value}[0..24]
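The whole pipeline can be modeled in plain Python on a tiny in-memory graph to see what it computes. This is an illustrative sketch, not Gremlin. The vertex ids ("m1", "u1", …), titles and ratings are hypothetical. The model follows the order of the Gremlin steps but makes no attempt to reproduce Gremlin's exact path semantics.

```python
from collections import Counter

# Hypothetical toy data: (from_vertex, label, to_vertex, properties)
EDGES = [
    ("m1", "hasGenre", "action", {}), ("m1", "hasGenre", "scifi", {}),
    ("m2", "hasGenre", "action", {}), ("m2", "hasGenre", "scifi", {}),
    ("m3", "hasGenre", "action", {}),
    ("m4", "hasGenre", "action", {}), ("m4", "hasGenre", "scifi", {}),
    ("u1", "rated", "m1", {"stars": 5}), ("u1", "rated", "m2", {"stars": 4}),
    ("u1", "rated", "m3", {"stars": 5}), ("u1", "rated", "m4", {"stars": 4}),
    ("u2", "rated", "m1", {"stars": 4}), ("u2", "rated", "m2", {"stars": 5}),
    ("u2", "rated", "m4", {"stars": 2}),
    ("u3", "rated", "m1", {"stars": 2}), ("u3", "rated", "m4", {"stars": 5}),
]
TITLES = {"m1": "Movie A", "m2": "Movie B", "m3": "Movie C", "m4": "Movie D"}

def out_e(v, label):
    return [e for e in EDGES if e[0] == v and e[1] == label]

def in_e(v, label):
    return [e for e in EDGES if e[2] == v and e[1] == label]

def genres(movie):
    return {e[2] for e in out_e(movie, "hasGenre")}

def recommend(v, limit=25):
    x = genres(v)                                  # out('hasGenre').aggregate(x)
    m = Counter()                                  # m = [:]
    for liked in in_e(v, "rated"):                 # back(2).inE('rated')
        if liked[3]["stars"] <= 3:                 # filter{it.stars > 3}
            continue
        for r in out_e(liked[0], "rated"):         # outV.outE('rated')
            movie = r[2]                           # inV
            if r[3]["stars"] <= 3 or movie == v:   # filter{it.stars > 3}, filter{it != v}
                continue
            if genres(movie) != x:                 # filter{...toSet().equals(x)}
                continue
            m[f"{movie}:{TITLES[movie]}"] += 1     # groupCount(m){"${it.id}:${it.title}"}
    # m.sort{a,b -> b.value <=> a.value}[0..24]
    return sorted(m.items(), key=lambda kv: kv[1], reverse=True)[:limit]

print(recommend("m1"))  # [('m2:Movie B', 2), ('m4:Movie D', 1)]
```

In this data, "m3" is dropped because its genres differ from those of "m1". The 2-star ratings are never followed.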
37. Explanation
m = [:];
x = [] as Set;
v = g.v(node_id);
In Groovy, [:] is an empty map; we will fill it with results and return it.
The set “x” will hold the collection of genres we want our recommended movies to have.
v is our starting point: the movie identified by node_id.
38. Explanation
v.
out('hasGenre'). (we are now at a genre node)
aggregate(x).
We fill the empty set “x” with the genres of our movie.
These are the genres we want to make sure our recommendations have.
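The aggregate(x) step amounts to collecting the starting movie's genres into a set. Here is a minimal plain-Python sketch of that idea. The edge tuples and the function name aggregate_genres are hypothetical, not part of Gremlin.

```python
# Hypothetical genre edges: (movie, label, genre)
EDGES = [
    ("m1", "hasGenre", "action"),
    ("m1", "hasGenre", "scifi"),
    ("m2", "hasGenre", "drama"),
]

def aggregate_genres(v, edges):
    """Model of v.out('hasGenre').aggregate(x): collect v's genres into a set."""
    x = set()
    for src, label, dst in edges:
        if src == v and label == "hasGenre":
            x.add(dst)
    return x

x = aggregate_genres("m1", EDGES)
print(sorted(x))  # ['action', 'scifi']
```

A set fits this step because order and duplicates do not matter. The later filter compares genre sets with toSet().equals(x).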
39. Explanation
back(2). (we are back to our starting point)
inE('rated').
filter{it.stars > 3}. (we are now on the ‘rated’ links between users and our movie)
We go back two steps to our starting movie, follow its incoming ‘rated’ relationships,
and keep only those with more than 3 stars.
40. Explanation
outV. (we are now at a user node)
outE('rated').
filter{it.stars > 3}. (we are now at the link between user and movie)
We follow those relationships to the users who made them, then take each user's own
“rated” relationships, again keeping only the ratings with more than 3 stars.
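The two-hop walk from our movie to like-minded users and back out to their other ratings can be sketched in plain Python. The rating tuples and the names co_ratings and threshold are hypothetical. User "u2" gave exactly 3 stars, so that user fails the strict > 3 test and is never followed.

```python
# Hypothetical ratings: (user, movie, stars)
RATINGS = [
    ("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 2),
    ("u2", "m1", 3), ("u2", "m4", 5),
]

def co_ratings(v, ratings, threshold=3):
    # inE('rated').filter{it.stars > 3}, then outV: users who liked v
    fans = [user for user, movie, stars in ratings if movie == v and stars > threshold]
    # outE('rated').filter{it.stars > 3}: every high rating by those users
    # (this still includes v itself; it is filtered out in a later step)
    return [r for fan in fans for r in ratings if r[0] == fan and r[2] > threshold]

print(co_ratings("m1", RATINGS))  # [('u1', 'm1', 5), ('u1', 'm2', 4)]
```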
41. Explanation
inV. (we are now at a movie node)
filter{it != v}.
We follow our relationships to the movies that received them, but filter out “v”,
which is our starting movie. We do not want the system to recommend the
same movie we just watched.
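Slide 36 also applies filter{it.out('hasGenre').toSet().equals(x)} right after this step. That filter keeps only candidates whose genre set exactly equals x. Both filters are sketched here in plain Python. The dictionaries and the function name candidates are hypothetical.

```python
# Hypothetical genre sets per movie
GENRES = {
    "m1": {"action", "scifi"},
    "m2": {"action", "scifi"},
    "m3": {"action"},
}
# Hypothetical (user, movie, stars) ratings, already filtered to stars > 3
GOOD_RATINGS = [("u1", "m1", 5), ("u1", "m2", 4), ("u1", "m3", 5)]

def candidates(v, good_ratings, genres):
    x = genres[v]
    found = []
    for _user, movie, _stars in good_ratings:  # inV: the rated movie
        if movie == v:                         # filter{it != v}
            continue
        if genres[movie] != x:                 # filter{...toSet().equals(x)}
            continue
        found.append(movie)
    return found

print(candidates("m1", GOOD_RATINGS, GENRES))  # ['m2']
```

Because the comparison is exact set equality, a movie with an extra genre is excluded as well as one with a missing genre.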
43. Explanation
groupCount(m){"${it.id}:${it.title}"}.iterate();
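groupCount does what it sounds like: it counts how many times each movie is reached and
stores the counts in the map “m” we created earlier. The closure builds each key as “id:title”,
so we retain both the id and the title of the movies.
iterate() is needed when running the script through the Neo4j REST API; the Gremlin shell
does it automatically for you. Gremlin pipelines are lazy, so without iterate() the steps never
run and “m” stays empty. You will forget this one day and kill
30 minutes of your life trying to figure out why you get nothing.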
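A lazy Python generator gives a rough analogy for why iterate() matters. The side effect that fills “m” only happens when something drains the pipeline. This is a sketch of the behaviour, not Gremlin's implementation. The group_count function and the sample movies are hypothetical.

```python
from collections import Counter

def group_count(items, m, key):
    """Lazy model of groupCount(m){key}: counts items as they stream through."""
    for item in items:
        m[key(item)] += 1
        yield item

# Hypothetical movies reaching this step (a repeat means two paths led here)
reached = [
    {"id": 2, "title": "Movie B"},
    {"id": 4, "title": "Movie D"},
    {"id": 2, "title": "Movie B"},
]

m = Counter()
pipeline = group_count(reached, m, key=lambda it: f"{it['id']}:{it['title']}")
print(m)            # Counter(): nothing has run yet

for _ in pipeline:  # the analogue of .iterate(): drain the pipeline
    pass
print(m)            # Counter({'2:Movie B': 2, '4:Movie D': 1})
```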
44. Explanation
m.sort{a,b -> b.value <=> a.value}[0..24]
Finally, we sort our map by value in descending order and grab the top
25 items ([0..24] is an inclusive range in Groovy)… and you’re done.
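In plain Python, the final sort and cut look like the sketch below. The counts are hypothetical. Python's slice end is exclusive, so [:25] corresponds to Groovy's inclusive [0..24]. Python slicing also simply returns fewer entries when there are fewer than 25.

```python
# Hypothetical counts as produced by groupCount
m = {"2:Movie B": 2, "4:Movie D": 1, "7:Movie G": 5}

# m.sort{a,b -> b.value <=> a.value}[0..24]:
# order entries by count, highest first, and keep at most 25 of them
top = sorted(m.items(), key=lambda kv: kv[1], reverse=True)[:25]
print(top)  # [('7:Movie G', 5), ('2:Movie B', 2), ('4:Movie D', 1)]
```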
See http://maxdemarzi.com/2012/01/16/neo4j-on-heroku-part-two/
for the full walk-through including data loading.
45. How to treat Gremlin in Neo4j
As the equivalent of stored procedures in SQL.
Accept only parameters from end users; do not generate Gremlin dynamically,
or you’ll have the mother of all SQL injection vulnerabilities…
Gremlin => Groovy => JVM => Full Power: an injected script can run arbitrary code on the server.
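The “stored procedure” discipline can be sketched as building the request body for a fixed script plus a parameter map. This assumes a JSON body with "script" and "params" keys, as accepted by the Gremlin plugin of the Neo4j REST API of that era. Check your server's documentation. The sketch only builds JSON and sends nothing. The function names and the injected string are hypothetical.

```python
import json

# Fixed script text: written once by the developer, never built from user input.
SCRIPT = "g.v(node_id).out('hasGenre')"

def build_request(user_supplied_id):
    """Safe: user input travels only as a parameter value."""
    node_id = int(user_supplied_id)  # validate/coerce; raises ValueError on junk
    return json.dumps({"script": SCRIPT, "params": {"node_id": node_id}})

def build_request_unsafe(user_supplied_id):
    """Dangerous: user input becomes part of the executable script."""
    return json.dumps({"script": "g.v(" + user_supplied_id + ").out('hasGenre')"})

print(build_request("42"))
# {"script": "g.v(node_id).out('hasGenre')", "params": {"node_id": 42}}
```

With the unsafe version, an input such as "1); malicious_code(); g.v(1" becomes Groovy code that the server would execute with full JVM power. The safe version rejects that input before anything is sent.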