Gremlin       G = (V, E)

A Graph-Based Programming Language
             Marko A. Rodriguez
       T-5, Center for Nonlinear Studies
       Los Alamos National Laboratory
        http://markorodriguez.com
      http://gremlin.tinkerpop.com

              February 25, 2010
Abstract
Gremlin is a Turing-complete, graph-based programming language
developed for key/value-pair multi-relational graphs called property graphs.
Gremlin makes extensive use of XPath 1.0 to support complex graph
traversals. Connectors exist to various graph databases and frameworks.
This language has application in the areas of graph query, analysis, and
manipulation.




                Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
Acknowledgements
• Marko A. Rodriguez [http://markorodriguez.com]
  designed, developed, tested, and documented Gremlin.
• Peter Neubauer [http://www.linkedin.com/in/neubauer]
  aided in the design and the evangelizing of Gremlin.
• Pavel Yaskevich [http://github.com/xedin]
  aided in the development of user defined functions in Gremlin.
• Joshua Shinavier [http://fortytwo.net]
  provided initial conceptual support for Gremlin.
• Ketrina Yim [http://csillustrated.berkeley.edu]
  designed the logo for Gremlin.
• Gremlin-Users Group [http://groups.google.com/group/gremlin-users]
  provided much direction in the design and implementation of Gremlin.




Outline

• Introduction to Graphs and Graph Software

• Basic Gremlin Concepts

• Gremlin Language Description

• Advanced Gremlin Concepts

• Conclusions




What is a Graph?
• A graph (network) is composed of a collection of vertices (dots) and edges (lines).
  There are many types of graphs: directed/undirected, weighted, attributed, etc.



[Figure: examples of graph types. Panel labels: vertex-labeled, vertex-attributed
(type="person", name="emil"), edge-labeled (knows, hired), edge-attributed
(created=2-01-09, modified=2-11-09), weighted (0.2), directed, undirected, multi,
hyper, half-edge, pseudo, regular, semantic, and resource description framework
(http://ex.com/123).]
Why Use a Graph?

• A graph is a very general data structure that can be used to model
  various systems.
    A graph can model the structure of transportation, technological,
    bibliographic, etc. systems.
    A graph can model a list, a map, a tree, etc.

• There are numerous graph algorithms that are defined independently of
  the domain of the graph model.

• There are numerous graph databases, frameworks, packages, etc.
  that aid in the creation, manipulation, and analysis of graphs.




Graph Databases, Frameworks, and Packages
•   Neo4j Graph Database [http://neo4j.org]
•   AllegroGraph Quad Store [http://www.franz.com/agraph]
•   HyperGraphDB [http://www.kobrix.com/hgdb.jsp]
•   Java Universal Network/Graph Framework [http://jung.sourceforge.net]
•   OpenRDF Sesame Framework [http://www.openrdf.org]
•   InfoGrid Graph Database [http://infogrid.org]
•   Filament Graph Toolkit [http://filament.sourceforge.net]
•   OWLim Semantic Repository [http://www.ontotext.com/owlim]
•   Sones Graph Database [http://www.sones.com]
•   NetworkX Graph Toolkit [http://networkx.lanl.gov]
•   iGraph Toolkit [http://igraph.sourceforge.net]
•   Blueprints Graph API [http://blueprints.tinkerpop.com]
•   ... and many more.



What Makes Gremlin Different?
• Gremlin is a domain specific language for working with graphs.

• Gremlin is not an application programming interface (API).

• Gremlin makes use of various graph databases, frameworks, packages.

• Gremlin is a language that currently has a virtual machine
  implementation written in Java.

• What can be succinctly expressed in Gremlin is verbose/clumsy to
  express in general purpose languages such as Java, Python, Ruby, etc.

• Gremlin allows one to map single-relational graph analysis algorithms
  over to the multi-relational domain.


Single-Relational Graphs
• In single-relational graphs, all edges have the same meaning
  (e.g. all edges are either friendship, kinship, worksWith, knows, etc.).
       G = (V, E ⊆ (V × V ))

• Most graph algorithms are defined for single-relational graphs
  (e.g. centrality/ranking, clustering/community detection, etc.).

[Figure: a single-relational graph over three vertices: person-a, person-b, and person-c.]




NOTE: These types of graphs are also known as directed, vertex-labeled graphs.


Multi-Relational Graphs
• In multi-relational graphs, edges can have different meanings.
       G = (V, E ⊆ (V × V ), ω : E → Σ∗)

• Most graph software is designed for multi-relational graphs (e.g. arbitrary
  objects as vertices and edges, knowledge-based reasoning systems, etc.).


[Figure: a multi-relational graph over person-a, book-b, and book-c, with edges
labeled read, authored, and cites.]




NOTE: These types of graphs are also known as directed, vertex/edge-labeled graphs.


Gremlin and Multi-Relational Graphs

• Gremlin provides a means to elegantly map single-relational graph
  analysis algorithms over to the multi-relational graph domain.

• Gremlin provides an elegant way to do automated reasoning in
  multi-relational graphs using path expressions.

These two points form the primary thesis of this presentation.


Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis
Algorithms,” Journal of Informetrics, 4(1), 29–41, doi:10.1016/j.joi.2009.06.004, LA-UR-08-03931,
http://arxiv.org/abs/0806.2274, December 2009.




Property Graphs

• Gremlin works with a type of multi-relational graph called a property
  graph.
       Vertices and edges are labeled with unique identifiers.
       Edges are directed, labeled, and can form loops.
       Multiple edges of the same label can exist for the same vertex pair.
       Vertices and edges can have any number of key/value pair
       properties/attributes.

Property graphs are a relatively general graph structure that can be constrained to model other graph
structures, though a property-based hypergraph would be the most general (see HyperGraphDB and the
JUNG API).
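
The four property graph constraints listed above can be sketched as a small in-memory data structure. This is only an illustration: the class and method names (PropertyGraphSketch, addVertex, addEdge, countEdges) are hypothetical and are not the Blueprints or Gremlin API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal property graph; all names here are illustrative assumptions.
class PropertyGraphSketch {
    // A vertex or an edge: a unique identifier plus any number of key/value properties.
    static class Element {
        final String id;
        final Map<String, Object> properties = new LinkedHashMap<>();
        Element(String id) { this.id = id; }
    }

    static class Vertex extends Element {
        Vertex(String id) { super(id); }
    }

    // An edge is directed (tail -> head) and labeled.
    static class Edge extends Element {
        final Vertex tail;
        final String label;
        final Vertex head;
        Edge(String id, Vertex tail, String label, Vertex head) {
            super(id);
            this.tail = tail;
            this.label = label;
            this.head = head;
        }
    }

    final Map<String, Vertex> vertices = new LinkedHashMap<>();
    // A list rather than a set keyed by endpoints, so that loops and multiple
    // edges of the same label between the same vertex pair are both allowed.
    final List<Edge> edges = new ArrayList<>();

    Vertex addVertex(String id) {
        Vertex v = new Vertex(id);
        vertices.put(id, v);
        return v;
    }

    Edge addEdge(String id, Vertex tail, String label, Vertex head) {
        Edge e = new Edge(id, tail, label, head);
        edges.add(e);
        return e;
    }

    // Counts the edges with the given label that run from tail to head.
    long countEdges(Vertex tail, String label, Vertex head) {
        return edges.stream()
                .filter(e -> e.tail == tail && e.head == head && e.label.equals(label))
                .count();
    }

    public static void main(String[] args) {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Vertex marko = g.addVertex("1");
        marko.properties.put("name", "marko");
        marko.properties.put("age", 29);
        Vertex vadas = g.addVertex("2");
        vadas.properties.put("name", "vadas");
        Edge knows = g.addEdge("7", marko, "knows", vadas);
        knows.properties.put("weight", 0.5);
        System.out.println(g.countEdges(marko, "knows", vadas));
    }
}
```

Storing edges in a plain list is the simplest way to permit parallel edges; a real graph database would index edges by vertex for faster traversal.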




Property Graphs
[Figure: the example property graph used throughout this presentation.]

Vertices (id: properties)
   1: name = "marko", age = 29
   2: name = "vadas", age = 27
   3: name = "lop", lang = "java"
   4: name = "josh", age = 32
   5: name = "ripple", lang = "java"
   6: name = "peter", age = 35

Edges (id: tail-label->head, properties)
   7: 1-knows->2, weight = 0.5
   8: 1-knows->4, weight = 1.0
   9: 1-created->3, weight = 0.4
  10: 4-created->5, weight = 1.0
  11: 4-created->3, weight = 0.4
  12: 6-created->3, weight = 0.2

Gremlin System Architecture

[Figure: the Gremlin software stack. Labels shown: Gremlin Console, Gremlin
ScriptEngine, Neo4j, NativeStore, TinkerGraph.]

• The Gremlin console is a scripting environment which allows for the
  dynamic evaluation of Gremlin code.
• Gremlin implements JSR 223, which allows Gremlin to also be used within
  the Java language and thus, as a virtual machine directly accessible to
  Java applications. Popular JSR 223 implementations include Jython, JRuby,
  and Groovy. For a fine list of implementations see
  https://scripting.dev.java.net.
• Blueprints is a set of interfaces for abstract data structures such as
  graphs and documents. Implementations of these interfaces exist for
  various data management systems.
• There exist many graph data management systems that span various graph
  data models (e.g. edge labeled graphs, RDF graphs, hypergraphs, etc.).
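
The JSR 223 point can be illustrated with the standard javax.script API, which is how a Java application would look up any scripting engine. This is a sketch under assumptions: the engine name "gremlin" is assumed, not confirmed by these slides, and the lookup simply returns null when no such engine is on the classpath.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

class Jsr223Lookup {
    // Reports whether a JSR 223 engine is registered under the given name.
    static String describe(String engineName) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        return engine == null
                ? engineName + ": not installed"
                : engineName + ": " + engine.getFactory().getEngineName();
    }

    public static void main(String[] args) throws ScriptException {
        // "gremlin" is an assumed engine name; use whatever name the installed engine registers.
        System.out.println(describe("gremlin"));

        ScriptEngine engine = new ScriptEngineManager().getEngineByName("gremlin");
        if (engine != null) {
            // Evaluate a Gremlin expression from Java, as the console would.
            System.out.println(engine.eval("1 + 2"));
        }
    }
}
```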



“Hello World” in the Gremlin Console


marko$ ./gremlin.sh

         ,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin>
gremlin> concat('goodbye', ' ', 'self')
==>goodbye self




Simple Traversals in Gremlin
gremlin> $_ := g:key('name','marko')
==>v[1]
gremlin> .
==>v[1]
gremlin> ./outE
==>e[7][1-knows->2]
==>e[9][1-created->3]
==>e[8][1-knows->4]
gremlin> ./outE/@weight
==>0.5
==>0.4
==>1.0



./outE/@weight: “Get the current object(s). Then get the outgoing edges of those objects. Then get the
weights of those edges.”
$_ is a reserved variable meaning the root list of objects.


Simple Traversals in Gremlin
gremlin> .
==>v[1]
gremlin> ./outE[@label='created']/inV
==>v[3]
gremlin> $_ := $_last
==>v[3]
gremlin> ./@name
==>lop
gremlin> g:map(.)
==>name=lop
==>lang=java




./outE[@label='created']/inV: “Get the current object(s). Then get the outgoing edges of those
objects, where their labels equal ‘created’. Then get the incoming vertices of those ‘created’ edges.”
$_last is a reserved variable meaning the last value evaluated.


Simple Traversals in Gremlin
./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
==>vadas

Simple Traversals in Gremlin
./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name


1. .: Get the current object(s).

2. outE[@label='knows']: Get the outgoing edges of the current
   object(s), where their labels equal ‘knows’.

3. inV[matches(@name,'va.{3}') and @age > 21]: Get the incoming
   vertices of those ‘knows’ edges, where the names of those vertices are 5
   characters long, start with ‘va’, and whose age is greater than 21.

4. @name: Get the name of those particular incoming vertices.
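
The filter in step 3 can be checked outside of Gremlin with a few lines of Java. This sketch assumes that matches() tests the whole string against the pattern, which is what the "5 characters long" reading above implies; the Person class and filterNames method are hypothetical helpers, not part of Gremlin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KnowsFilterSketch {
    // Hypothetical stand-in for a vertex with name and age properties.
    static class Person {
        final String name;
        final int age;
        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Mirrors [matches(@name,'va.{3}') and @age > 21]/@name.
    // String.matches requires the entire name to match the pattern.
    static List<String> filterNames(List<Person> people) {
        List<String> names = new ArrayList<>();
        for (Person p : people) {
            if (p.name.matches("va.{3}") && p.age > 21) {
                names.add(p.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The two vertices that marko knows in the example graph.
        List<Person> known = Arrays.asList(new Person("vadas", 27), new Person("josh", 32));
        System.out.println(filterNames(known)); // prints [vadas]
    }
}
```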



Knowledge-Based Reasoning
• Blueprints implements the Sesame SAIL interfaces and thus, Gremlin
  can be used over the many Resource Description Framework (RDF)
  triple/quad stores. In such cases, RDF is modeled as a property graph
  where the named graph component is the @ng edge property.

• Gremlin makes use of the Sesame SAIL SPARQL engine to allow for
  queries based on graph-pattern matching.

gremlin> sail:sparql('SELECT ?x ?y WHERE { ?x foaf:knows ?y }')
==>{y=v[http://ex.com#2], x=v[http://ex.com#1]}
==>{y=v[http://ex.com#4], x=v[http://ex.com#1]}

• Gremlin is useful for knowledge-based reasoning using path
  expressions.


Reasoning as Defining New Types of Adjacency
• Graph-based reasoning is the process of making explicit what is implicit in
  the graph.
• A reasoner takes a graph G and a collection of graph-patterns
  (i.e. transformation/rewrite rules) and creates a new graph G′ (usually,
  G ⊂ G′). G′ has new relationships/edges and thus, new definitions of vertex
  adjacency.
• Example: The co-developers of person A are those people who have created
  the same software as person A and who are themselves not person A (as
  person A has created the same software as him or herself).

[Figure: the example graph (marko, vadas, josh, peter, lop, ripple; knows and
created edges) with inferred co-developer edges added.]

For these “co-developer” examples, we will use vertex 1 (marko) as the source
of the reasoning process.
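
The reasoner described above can be sketched as a function from a graph G to a larger graph G′. The code below is an illustrative assumption rather than Gremlin's implementation: it represents the example graph as a flat edge list, names vertices by their name property, and applies a single hard-coded co-developer rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CoDeveloperReasoner {
    // A directed, labeled edge: tail -label-> head. Vertices are named by strings.
    static class Edge {
        final String tail;
        final String label;
        final String head;
        Edge(String tail, String label, String head) {
            this.tail = tail;
            this.label = label;
            this.head = head;
        }
    }

    // The example graph from this presentation, as an edge list.
    static List<Edge> toyGraph() {
        return Arrays.asList(
                new Edge("marko", "knows", "vadas"),
                new Edge("marko", "knows", "josh"),
                new Edge("marko", "created", "lop"),
                new Edge("josh", "created", "ripple"),
                new Edge("josh", "created", "lop"),
                new Edge("peter", "created", "lop"));
    }

    // Returns G' = G plus one co-developer edge for each ordered pair of
    // distinct people who created the same software. Two people sharing
    // several projects would get several parallel edges.
    static List<Edge> infer(List<Edge> graph) {
        List<Edge> result = new ArrayList<>(graph);
        for (Edge a : graph) {
            if (!a.label.equals("created")) {
                continue;
            }
            for (Edge b : graph) {
                boolean sameSoftware = b.label.equals("created") && b.head.equals(a.head);
                if (sameSoftware && !b.tail.equals(a.tail)) {
                    result.add(new Edge(a.tail, "co-developer", b.tail));
                }
            }
        }
        return result;
    }

    // Follows the inferred co-developer edges leaving the given person.
    static List<String> coDevelopersOf(String person, List<Edge> inferred) {
        List<String> names = new ArrayList<>();
        for (Edge e : inferred) {
            if (e.label.equals("co-developer") && e.tail.equals(person)) {
                names.add(e.head);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Edge> gPrime = infer(toyGraph());
        System.out.println(coDevelopersOf("marko", gPrime)); // prints [josh, peter]
    }
}
```

Materializing the inferred edges is only one option; a path expression can also compute the same adjacency on demand without storing G′.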


The Co-Developers of Marko A. Rodriguez in SPARQL


SELECT ?x WHERE {
  marko created ?y .
  ?z created ?y .
  ?z != marko .
  ?z name ?x
}

This query would return: josh and peter.

[Figure: the example graph annotated with the query variables: ?y on lop
(vertex 3), ?z on josh (vertex 4) and peter (vertex 6), and ?x on their name
properties.]




The Co-Developers of Marko A. Rodriguez in Gremlin
[Figure: the example graph with inferred co-developer edges added.]

gremlin> ./@name
==>marko
gremlin> ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
==>josh
==>peter


The Co-Developers of Marko A. Rodriguez in Gremlin

./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name


1. .: Get the current object(s) (i.e. vertex 1, denoting Marko).
2. outE[@label='created']: Get the outgoing edges of the Marko vertex, where their
   labels equal ‘created’.
3. inV: Get the incoming (i.e. head) vertices of those ‘created’ edges.
4. inE[@label='created']: Get the incoming edges of those vertices, where their
   labels equal ‘created’.
5. outV[g:except($_)]: Get the outgoing (i.e. tail) vertices of those ‘created’ edges,
   where those vertices are not the Marko vertex.
6. @name: Get the name of those non-Marko vertices.




Defining Co-Developers in Gremlin


path co-developer
  ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]
  end

Once defined, you can use it like any other path segment.
gremlin> ./co-developer
==>v[4]
==>v[6]
gremlin> ./co-developer/@name
==>josh
==>peter




Defining Co-Developers in Java
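import java.util.ArrayList;
import java.util.List;

// Path, Vertex, and Edge are the Gremlin/Blueprints types; their imports are omitted here.
public class CoDeveloperPath implements Path {
   public List invoke(Object root) {
      if(root instanceof Vertex) {
         // Step 1: collect the projects that the root vertex created.
         List<Vertex> projects = new ArrayList<Vertex>();
         for(Edge edge : ((Vertex)root).getOutEdges()) {
             if(edge.getLabel().equals("created")) {
                projects.add(edge.getInVertex());
             }
         }
         // Step 2: collect the other creators of those projects.
         List<Vertex> coDevelopers = new ArrayList<Vertex>();
         for(Vertex project : projects) {
             for(Edge edge : project.getInEdges()) {
                // equals() rather than != so the check does not rely on object identity.
                if(edge.getLabel().equals("created") && !edge.getOutVertex().equals(root)) {
                    coDevelopers.add(edge.getOutVertex());
                }
             }
         }
         return coDevelopers;
      } else {
         // Non-vertex roots have no co-developers.
         return null;
      }
   }
}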



Gremlin Type System

object
   element
      vertex
      edge
   graph
   number
   string
   boolean
   map
   list




Predefined Paths and Properties
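[Figure: a fragment of the example graph (vertices 1, 3, and 4; edges 8 knows,
9 created, and 11 created) annotated with: vertex 1 out edges, vertex 3 in edges,
edge 9 out vertex, edge 9 in vertex, edge 9 label, edge 9 id, vertex 4 id, and
vertex 4 properties (name = "josh", age = 32).]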




object        property   description                              example
graph         V          the vertex iterator of the graph         $g/V
graph         E          the edge iterator of the graph           $g/E
vertex/edge   @id        the identifier of the element            $v/@id
vertex        outE       the outgoing edges of the vertex         $v/outE
vertex        inE        the incoming edges of the vertex         $v/inE
vertex        bothE      both in and out edges of the vertex      $v/bothE
edge          outV       the outgoing tail vertex of the edge     $e/outV
edge          inV        the incoming head vertex of the edge     $e/inV
edge          bothV      both in and out vertices of the edge     $e/bothV
edge          @label     the label of the edge                    $e/@label




Predefined Functions

g:assign()        g:remove-idx()           g:list()                 g:sort()                  g:print()
g:assign()        g:load()                 g:dedup()                g:map()                   g:time()
g:unassign()      g:save()                 g:union()                g:keys()                  g:p()
g:id()            g:clear()                g:intersect()            g:values()                g:to-json()
g:key()           g:close()                g:difference()           g:rand-nat()              g:from-json()
g:add-v()         g:keys()                 g:retain()               g:rand-real()             ...
g:add-e()         g:values()               g:except()               g:prob()                  ..
g:remove-ve()     g:map()                  g:remove()               g:cont()                  .
g:idx-all()       g:get()                  g:get()                  g:halt()
g:add-idx()       g:op-value()             g:op-value()             g:type()


There are over 70 predefined functions. See the following for a description of each.
http://wiki.github.com/tinkerpop/gremlin/core-function-library
http://wiki.github.com/tinkerpop/gremlin/gremlin-function-library



Working With Non-Graph Types
gremlin> 1.2 + 6
==>7.2
gremlin> 'this is a string'
==>this is a string
gremlin> true() or false()
==>true
gremlin> g:map('marko','lanl','peter','neotech','josh','rpi')
==>marko=lanl
==>peter=neotech
==>josh=rpi
gremlin> g:list('graphs','hockey','motorcycles',6)
==>graphs
==>hockey
==>motorcycles
==>6.0

Working With Non-Graph Types
gremlin> $m := g:map('hobbies',g:list('hockey','graphs'),
   'location', g:map('state','new mexico', 'city', 'santa fe',
      'zipcode', 87501), 'age', 30)
==>location={zipcode=87501.0, state=new mexico, city=santa fe}
==>age=30.0
==>hobbies=[hockey, graphs]
gremlin> $m/@age
==>30.0
gremlin> $m/@hobbies[2]
==>graphs
gremlin> $m/@location/@city
==>santa fe




Variables

• Variables in Gremlin are prefixed with a $ character.

• There is a collection of reserved variables that all begin with $_.
     $_ is the root list of objects.
     $_last is the last result evaluated by the evaluator.
     $_g is the “working graph” to reduce typing with graph functions.

gremlin> $x := 1
==>1.0
gremlin> $y := 2
==>2.0
gremlin> $x + $y
==>3.0

Language Statements
Variable Assignment

gremlin> $i := 1 + 5
==>6.0
gremlin> $i
==>6.0

If/Else

gremlin> if true()
    $i := 1
  else
    $i := 2
  end
==>1.0

Repeat

gremlin> $i := 0
==>0.0
gremlin> repeat 10
  $i := $i + 1
  end
==>10.0

While

gremlin> $i := 'g'
==>g
gremlin> while not(matches($i, 'ggg'))
  $i := concat($i,'g')
  end
==>ggg


Language Statements
Foreach

gremlin> $i := 0
==>0.0
gremlin> foreach $j in 1 | 2 | 3
   $i := $i + $j
   end
==>6.0

Path

gremlin> path friend_name
   ./outE[@label='knows']/inV/@name
   end
gremlin> ./friend_name
==>vadas
==>josh

Function

gremlin> func ex:hello($name)
   concat('hello ', $name)
   end
gremlin> ex:hello('pavel')
==>hello pavel

You can define functions and paths in native Gremlin (as demonstrated above) or in Java.


XPath Filters

• Use [ ] filters to filter objects in a path expression (read them as “such that” or
  “where”).

• The evaluated result of [ ] must be a number or a boolean.
      If it is a number, it is treated as a position within an array (i.e. list); positions start at 1.
      If it is a boolean, it decides whether the object is included in or excluded from
      the next step of the path.

gremlin> ./outE[@label='knows']
==>e[7][1-knows->2]
==>e[8][1-knows->4]
gremlin> ./outE[@label='knows' and @weight>0.5]/inV[@age<21 or @name='josh'][true()][1]
==>v[4]
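
The two filter behaviours can be mimicked outside Gremlin. The following Python sketch is an illustration only, not how Gremlin evaluates filters internally. The helper apply_filter and the toy edge records (including their weights) are hypothetical stand-ins for the two knows edges shown above.

```python
from typing import Any, Callable, List, Union

# A filter is either a 1-based position or a boolean predicate.
Filter = Union[int, Callable[[Any], bool]]


def apply_filter(objects: List[Any], f: Filter) -> List[Any]:
    """Apply one [ ] filter to the objects at the current step of a path."""
    if isinstance(f, int) and not isinstance(f, bool):
        # Numeric filter: select the object at that 1-based position, if any.
        return objects[f - 1:f] if f >= 1 else []
    # Boolean filter: keep the objects for which the predicate holds.
    return [o for o in objects if f(o)]


# Assumed toy data: two 'knows' edges with made-up weights.
edges = [
    {"id": 7, "label": "knows", "weight": 0.5, "in": 2},
    {"id": 8, "label": "knows", "weight": 1.0, "in": 4},
]

heavy = apply_filter(edges, lambda e: e["label"] == "knows" and e["weight"] > 0.5)
first = apply_filter(heavy, 1)
print([e["in"] for e in first])  # prints [4]
```

Chaining the calls mirrors chaining [ ] filters in a path: each filter sees only what the previous one let through.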




A Grateful Dead Dataset




2,500 concerts
35,000 songs played
600 songs
30 years
11 members
1 band
... the Grateful Dead.



A Grateful Dead Dataset
• Vertices denote songs and artists.
     type: “song” or “artist”.
     name: name of the song or artist.
     performances: number of times the song was played in concert.
     song_type: whether the song was a “cover” or an “original”.

• Edges denote followed_by, sung_by, and written_by relationships.
     weight: number of times a song was followed by another song over all concerts
     played.


Rodriguez, M.A., Gintautas, V., Pepe, A., “A Grateful Dead Analysis: The Relationship Between Concert and Listening
Behavior,” First Monday, 14(1), University of Illinois at Chicago Library, http://arxiv.org/abs/0807.2466, January 2009.

NOTE: A portion of the raw dataset is courtesy of Mark Leone, http://www.cs.cmu.edu/~mleone/gdead/setlists.html



A Grateful Dead Dataset

Stanley Theater, Pittsburgh, PA (11/30/79), 2nd Set:
Scarlet Begonias, Fire on the Mountain, Passenger, Terrapin Station, ...

[Figure: the set list drawn as a property graph. Song vertices 1 (name="Scarlet.."),
2 (name="Fire on.."), 3 (name="Pass..") and 4 (name="Terrap..") are chained by followed_by
edges with weight=239, weight=1 and weight=2. written_by and sung_by edges link the songs to
the artist vertices 5 (name="Garcia"), 6 (name="Lesh") and 7 (name="Hunter").]
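
One plausible way to derive the followed_by weights from raw set lists is to count adjacent song pairs. The Python sketch below assumes that reading; the function followed_by_weights is a hypothetical helper, and the second set list is made up purely so that one pair occurs twice.

```python
from collections import Counter
from typing import Dict, List, Tuple


def followed_by_weights(setlists: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count how often each song is immediately followed by another song."""
    weights: Counter = Counter()
    for songs in setlists:
        # Pair every song with the one played right after it.
        for current, nxt in zip(songs, songs[1:]):
            weights[(current, nxt)] += 1
    return dict(weights)


setlists = [
    # The second set from the Stanley Theater show.
    ["Scarlet Begonias", "Fire on the Mountain", "Passenger", "Terrapin Station"],
    # A made-up set, for illustration only.
    ["Scarlet Begonias", "Fire on the Mountain"],
]
weights = followed_by_weights(setlists)
print(weights[("Scarlet Begonias", "Fire on the Mountain")])  # prints 2
```

Each key of the resulting dictionary corresponds to one followed_by edge, and its value to that edge's weight property.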




A Grateful Dead Dataset – Load Data/Basic Stats

gremlin> g:load('data/graph-example-2.xml')
==>true
gremlin> count($_g/V)
==>809.0
gremlin> count($_g/E)
==>8049.0




A Grateful Dead Dataset – Out-Degree of Each Vertex


gremlin> $degrees := g:map()
gremlin> foreach $v in $_g/V
  $degrees[@name=$v/@name] := count($v/outE)
end




A Grateful Dead Dataset – Out-Degree of Each Vertex

gremlin> g:sort($degrees, 'value', true())
==>PLAYING IN THE BAND=96.0
==>SUGAR MAGNOLIA=92.0
==>PROMISED LAND=89.0
==>GOOD LOVING=87.0
==>NOT FADE AWAY=86.0
==>I KNOW YOU RIDER=85.0
==>CASSIDY=83.0
==>DEAL=82.0
==>JACK STRAW=81.0
==>ONE MORE SATURDAY NIGHT=81.0
==>EL PASO=80.0
==>MEXICALI BLUES=79.0
...
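
For readers who do not have a Gremlin console at hand, the same out-degree computation can be sketched in Python. The edge list below is a tiny hypothetical sample, not the Grateful Dead dataset. As with count($v/outE), every outgoing edge is counted regardless of its label.

```python
from typing import Dict, List, Tuple

# Hypothetical sample: (out-vertex name, edge label, in-vertex name).
edges: List[Tuple[str, str, str]] = [
    ("DEAL", "followed_by", "JACK STRAW"),
    ("DEAL", "sung_by", "Garcia"),
    ("JACK STRAW", "followed_by", "DEAL"),
]
vertices = ["DEAL", "JACK STRAW", "Garcia"]

# Start every vertex at zero so vertices without outgoing edges still appear.
degrees: Dict[str, int] = {name: 0 for name in vertices}
for out_name, _label, _in_name in edges:
    degrees[out_name] += 1

# Sort by value, largest first, like the g:sort call above.
ranked = sorted(degrees.items(), key=lambda kv: kv[1], reverse=True)
for name, degree in ranked:
    print(f"{name}={degree}")
```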

A Grateful Dead Dataset – Inspecting Single Vertex


gremlin> $v := g:key('name','CHINA DOLL')[1]
==>v[129]
gremlin> g:map($v)
==>name=CHINA DOLL
==>song_type=original
==>performances=114
==>type=song
gremlin> $v/outE[@label='sung_by']/inV/@name
==>Garcia




A Grateful Dead Dataset – Inspecting Single Vertex
gremlin> $v/outE[@label='followed_by']/inV/@name
==>BIG RIVER
==>THROWING STONES
==>SAMSON AND DELILAH
==>TRUCKING
==>CASEY JONES
==>HIGH TIME
...
gremlin> $v/outE[@label='followed_by']/@weight
==>2
==>8
==>1
==>2
==>1
==>1
...

Introduction to PageRank
• The remainder of this section will discuss the PageRank algorithm and
  its application to multi-relational graphs.

• The arguments made and the examples presented generalize to all other
  single-relational graph algorithms. However, for the sake of brevity and
  consistency, only PageRank will be discussed.




Introduction to Matrix-Based PageRank

• PageRank is a centrality measure based on the primary eigenvector
  of a modified version of a graph. Let \(A \in \mathbb{R}_{+}^{|V| \times |V|}\) denote the
  adjacency matrix representing the graph.

• In order to ensure positive real values in the eigenvector, the graph
  must be strongly connected. PageRank induces strong connectivity
  by overlaying a low-probability (defined by \(\alpha \in [0, 1]\), usually 0.15)
  “teleportation” graph over the original graph. Let
  \(B \in \left\{\tfrac{1}{|V|}\right\}^{|V| \times |V|}\) denote
  a teleportation adjacency matrix where every vertex is connected to every vertex
  with equal probability.
     \(C = (1 - \alpha)A + \alpha B\), where \(C \in \mathbb{R}_{+}^{|V| \times |V|}\)
     \(\lambda = \lambda C\), where \(\lambda \in \mathbb{R}_{+}^{|V|}\) is the PageRank vector over \(V\).
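
The fixed point λ = λC can be approximated by repeatedly multiplying a start vector by C (power iteration). The Python sketch below is a minimal illustration under two assumptions that the slide does not state: the rows of A are normalised to sum to 1, and a fixed number of iterations is enough for convergence. The function name pagerank and the 3-vertex graph are made up for the example.

```python
from typing import List


def pagerank(A: List[List[float]], alpha: float = 0.15, iters: int = 200) -> List[float]:
    """Approximate lambda = lambda C, with C = (1 - alpha) A + alpha B, by power iteration."""
    n = len(A)
    # Every entry of the teleportation matrix B is 1/n.
    C = [[(1 - alpha) * A[i][j] + alpha / n for j in range(n)] for i in range(n)]
    lam = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # Row vector times matrix: new_lam[j] = sum_i lam[i] * C[i][j].
        lam = [sum(lam[i] * C[i][j] for i in range(n)) for j in range(n)]
    return lam


# Assumed toy graph: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1 (rows sum to 1).
A = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
ranks = pagerank(A)
print([round(r, 3) for r in ranks])
```

Because every row of C sums to 1, the entries of the returned vector keep summing to 1 and can be read directly as a probability distribution over V.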


Introduction to Random Walk-Based PageRank
• PageRank can be implemented by a random walk.

• Create a vertex counter map, m : V → N+.

• Place a walker on a random vertex in V. Denote the walker’s current
  vertex i ∈ V.
 1.   Increment the vertex counter by 1 (i.e. m(i) ← m(i) + 1).
 2.   With probability 1 − α, the walker moves to a random adjacent vertex.
 3.   Otherwise (with probability α), the walker teleports to a random vertex in V.
 4.   Rinse and repeat until m reaches a stationary probability distribution
      (continually normalize m if you want a probability distribution).

• We will use this random walk model in the Gremlin examples to follow.
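
The random walk can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: α is taken to be the teleportation probability, as in C = (1 − α)A + αB, a walker on a vertex with no outgoing edges always teleports, and the walk runs for a fixed number of steps instead of testing for stationarity. The function name and the toy adjacency map are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def random_walk_pagerank(adj: Dict[str, List[str]], alpha: float = 0.15,
                         steps: int = 10000, seed: int = 0) -> Dict[str, float]:
    """Estimate PageRank from the visit counts of a single random walker."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    vertices = list(adj)
    m: Counter = Counter()
    current = rng.choice(vertices)
    for _ in range(steps):
        m[current] += 1  # step 1: count the visit
        neighbours = adj[current]
        if neighbours and rng.random() >= alpha:
            current = rng.choice(neighbours)  # step 2: follow an edge
        else:
            current = rng.choice(vertices)    # step 3: teleport
    # Normalise the counts into a probability distribution.
    return {v: m[v] / steps for v in vertices}


# Assumed toy graph; vertex "d" has no outgoing edges.
adj = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": []}
scores = random_walk_pagerank(adj)
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```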


PageRank over Multi-Relational Graphs

• PageRank was designed for single-relational graphs (i.e. where all edges
  have the same meaning).

• In a multi-relational graph, what does it mean to find the centrality
  of a vertex when vertices can be related by various types of edges?
  For example, if there exist “socializes with” and “met once” edges, then a
  person who has “met once” many people could be the most centrally located
  in the graph. Also, what if your graph has more than just “person”-type
  vertices (e.g. cars, pets, buildings, articles, etc.) and “person”-type
  edges (e.g. owns, walks, livesAt, cites, etc.)?




PageRank over Multi-Relational Graphs
• Calculating single-relational PageRank would yield Person as the most central
  vertex.
• You can boolean filter certain edge labels (e.g. ignore type edges; in such cases,
  you would have the centrality scores over the knows social graph).
• However, what if you only wanted to traverse knows edges if and only if the
  adjacent vertex knows more than 10 other people?
• In the end, you want complete control (universal computability) over the paths
  that the traverser/walker can take through a graph.

[Figure: the vertices Herbert, Johan, Marko, Josh, Jen, ... each have a type edge to the
vertex Person and are linked to one another by knows edges.]
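
The "knows more than 10 other people" condition is an example of adjacency defined by computation rather than by a single edge label. A minimal Python sketch of such a rule follows; the function selective_knows, its min_known parameter, and the toy social graph are all hypothetical.

```python
from typing import Dict, List


def selective_knows(knows: Dict[str, List[str]], person: str,
                    min_known: int = 10) -> List[str]:
    """People that `person` knows who themselves know more than `min_known` others."""
    return [p for p in knows.get(person, [])
            if len(knows.get(p, [])) > min_known]


# Assumed toy data: josh knows 11 people, jen knows only one.
knows = {
    "marko": ["josh", "jen"],
    "josh": [f"friend{i}" for i in range(11)],
    "jen": ["marko"],
}
print(selective_knows(knows, "marko"))  # prints ['josh']
```

A walker that uses this function in place of plain knows adjacency only ever steps onto well-connected people, which changes what the resulting centrality scores mean.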


PageRank over Multi-Relational Graphs
• In multi-relational graphs, the meaning of your graph algorithm’s results is
  defined by your definition of adjacency.
• With respect to random walk-based PageRank, define the path that the walker
  should take. That path is the definition of adjacency.
• The stationary probability distribution created from this walk yields a path-dependent
  centrality.
• Thus, in a multi-relational graph, there are many types of PageRanks that can
  be calculated — one for each type of path defined for a walker.


Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, Knowledge-Based Systems,
21(7), 727–739, http://arxiv.org/abs/0803.4355, October 2008.




PageRank over “Garcia Followed By” SubGraph

• Define a path that will go from song-to-song by “followed by” edges and
  only traverse songs that are “sung by” Jerry Garcia.

(./outE[@label='followed_by']/inV/outE[@label='sung_by']
         /inV[@name='Garcia']/../..)[g:rand-nat()]

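[Figure: the path expression annotated in steps A to D. From the current vertex (.), three
followed_by edges lead to songs whose sung_by edges end at artists with name="Garcia",
name="Garcia" and name="Weir". The filter keeps the two Garcia branches, /../.. steps back to
the songs, and g:rand-nat() picks one of them at random.]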




PageRank over “Garcia Followed By” SubGraph
path garcia-followed_by
   (./outE[@label='followed_by']/inV/outE[@label='sung_by']
         /inV[@name='Garcia']/../..)[g:rand-nat()]
   end

# $m counts visits per song; $alpha is the teleportation probability
$m := g:map()
$alpha := 0.15
# start the walker at a random song
$_ := g:key('type', 'song')[g:rand-nat()]
repeat 2500
  $_ := ./garcia-followed_by
  if count($_) > 0
    g:op-value('+',$m,$_[1]/@name, 1.0)
  end
  # teleport with probability $alpha, or when the path yields no song
  if g:rand-real() < $alpha or count($_) = 0
    $_ := g:key('type', 'song')[g:rand-nat()]
  end
end
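
The same walk can be sketched in Python to make the control flow explicit. This is a rough analogue, not a translation of Gremlin's semantics: the toy sung_by and followed_by tables are made up, and garcia_followed_by and walk are hypothetical names standing in for the path definition and the repeat loop.

```python
import random
from collections import Counter
from typing import Dict, List

# Made-up toy data: who sings each song, and which songs follow it.
sung_by: Dict[str, str] = {"DEAL": "Garcia", "BERTHA": "Garcia", "EL PASO": "Weir"}
followed_by: Dict[str, List[str]] = {
    "DEAL": ["BERTHA", "EL PASO"],
    "BERTHA": ["DEAL"],
    "EL PASO": ["DEAL", "BERTHA"],
}


def garcia_followed_by(song: str) -> List[str]:
    """Songs that follow `song` and are sung by Garcia (the path definition)."""
    return [s for s in followed_by.get(song, []) if sung_by.get(s) == "Garcia"]


def walk(steps: int = 2500, alpha: float = 0.15, seed: int = 0) -> Counter:
    rng = random.Random(seed)  # seeded so that runs are repeatable
    songs = list(sung_by)
    m: Counter = Counter()
    current = rng.choice(songs)
    for _ in range(steps):
        candidates = garcia_followed_by(current)
        if candidates:
            current = rng.choice(candidates)  # take one random path result
            m[current] += 1
        # Teleport with probability alpha, or when the path is a dead end.
        if not candidates or rng.random() < alpha:
            current = rng.choice(songs)
    return m


m = walk()
print(m.most_common())
```

Only songs reached through the path are counted, so a song sung by someone other than Garcia can be a starting point for the walker but never receives a score.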

PageRank over “Garcia Followed By” SubGraph
gremlin> g:sort($m,'value',true())
==>CRAZY FINGERS=98.0
==>HES GONE=85.0
==>CHINA CAT SUNFLOWER=79.0
==>BERTHA=76.0
==>UNCLE JOHNS BAND=74.0
==>TERRAPIN STATION=72.0
==>GOING DOWN THE ROAD FEELING BAD=71.0
==>WHARF RAT=71.0
==>EYES OF THE WORLD=65.0
==>COLD RAIN AND SNOW=62.0
==>SHIP OF FOOLS=58.0
==>RAMBLE ON ROSE=53.0
==>CASEY JONES=51.0
==>DARK STAR=47.0
==>DEAL=46.0
...

Universal Computation in Paths
path path-name
  # any arbitrary computation can occur here
  end

• A path definition can be used to define adjacencies.
    Adjacency can be expressed as anything that can be computed by a Turing machine.
    Path definitions are used to create “semantically meaningful” results from
    single-relational graph algorithms applied to multi-relational graphs.
    Path definitions make explicit what is implicit in the structure of the graph. This
    has applications to knowledge-based reasoning.
• A path definition can perform any arbitrary computation.
    Path definitions can check/set vertex/edge properties.
    Path definitions can create new vertices and edges.
    Path definitions can call/define functions.

This allows fine-grained control over how your traverser/walker moves through a graph.


The Current Gremlin Ecosystem
• Webling: Web console for Gremlin
  (developed by Pavel Yaskevich w/ funding from Neo Technology)

• Project Gargamel: Distributed Graph Computing
  (uses Linked Process and Gremlin)

• ReXster: A Graph-Based Recommender Engine
Thank You
Please enjoy Gremlin at http://gremlin.tinkerpop.com ...




My homepage is http://markorodriguez.com.
Please feel free to contact me with any questions or comments.





Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Gremlin: A Graph-Based Programming Language

  • 1. Gremlin G = (V, E) A Graph-Based Programming Language Marko A. Rodriguez T-5, Center for Nonlinear Studies Los Alamos National Laboratory http://markorodriguez.com http://gremlin.tinkerpop.com February 25, 2010
  • 2. Abstract Gremlin is a Turing-complete, graph-based programming language developed for key/value-pair multi-relational graphs called property graphs. Gremlin makes extensive use of XPath 1.0 to support complex graph traversals. Connectors exist to various graph databases and frameworks. This language has application in the areas of graph query, analysis, and manipulation. Center for Nonlinear Studies PostDoc Seminar – Los Alamos National Laboratory – February 25, 2010
  • 3. Acknowledgements
      • Marko A. Rodriguez [http://markorodriguez.com] designed, developed, tested, and documented Gremlin.
      • Peter Neubauer [http://www.linkedin.com/in/neubauer] aided in the design and the evangelizing of Gremlin.
      • Pavel Yaskevich [http://github.com/xedin] aided in the development of user defined functions in Gremlin.
      • Joshua Shinavier [http://fortytwo.net] provided initial conceptual support for Gremlin.
      • Ketrina Yim [http://csillustrated.berkeley.edu] designed the logo for Gremlin.
      • Gremlin-Users Group [http://groups.google.com/group/gremlin-users] provided much direction in the design and implementation of Gremlin.
  • 4. Outline
      • Introduction to Graphs and Graph Software
      • Basic Gremlin Concepts
      • Gremlin Language Description
      • Advanced Gremlin Concepts
      • Conclusions
  • 5. Outline
      • Introduction to Graphs and Graph Software
      • Basic Gremlin Concepts
      • Gremlin Language Description
      • Advanced Gremlin Concepts
      • Conclusions
  • 6. What is a Graph?
      • A graph (network) is composed of a collection of vertices (dots) and edges (lines). There are many types of graphs: directed/undirected, weighted, attributed, etc.
      [Figure: examples of graph types: vertex-labeled, vertex-attributed, edge-labeled, edge-attributed, weighted, directed, undirected, multi, hyper, half-edge, pseudo, regular, semantic, and Resource Description Framework graphs.]
  • 7. Why Use a Graph?
      • A graph is a very general data structure that can be used to model various systems. A graph can model the structure of transportation, technological, bibliographic, etc. systems. A graph can model a list, a map, a tree, etc.
      • There are numerous graph algorithms that are defined independently of the domain of the graph model.
      • There are numerous graph databases, frameworks, packages, etc. that aid in the creation, manipulation, and analysis of graphs.
  • 8. Graph Databases, Frameworks, and Packages
      • Neo4j Graph Database [http://neo4j.org]
      • AllegroGraph Quad Store [http://www.franz.com/agraph]
      • HyperGraphDB [http://www.kobrix.com/hgdb.jsp]
      • Java Universal Network/Graph Framework [http://jung.sourceforge.net]
      • OpenRDF Sesame Framework [http://www.openrdf.org]
      • InfoGrid Graph Database [http://infogrid.org]
      • Filament Graph Toolkit [http://filament.sourceforge.net]
      • OWLim Semantic Repository [http://www.ontotext.com/owlim]
      • Sones Graph Database [http://www.sones.com]
      • NetworkX Graph Toolkit [http://networkx.lanl.gov]
      • iGraph Toolkit [http://igraph.sourceforge.net]
      • Blueprints Graph API [http://blueprints.tinkerpop.com]
      • ... and many more.
  • 9. What Makes Gremlin Different?
      • Gremlin is a domain specific language for working with graphs.
      • Gremlin is not an application programming interface (API).
      • Gremlin makes use of various graph databases, frameworks, and packages.
      • Gremlin is a language that currently has a virtual machine implementation written in Java.
      • What can be succinctly expressed in Gremlin is verbose/clumsy to express in general purpose languages such as Java, Python, Ruby, etc.
      • Gremlin allows one to map single-relational graph analysis algorithms over to the multi-relational domain.
  • 10. Single-Relational Graphs
      • In single-relational graphs, all edges have the same meaning (e.g. all edges are either friendship, kinship, worksWith, knows, etc.).
          G = (V, E ⊆ (V × V))
      • Most graph algorithms are defined for single-relational graphs (e.g. centrality/ranking, clustering/community detection, etc.).
      [Figure: a single-relational graph over the vertices person-a, person-b, and person-c.]
      NOTE: These types of graphs are also known as directed, vertex-labeled graphs.
  • 11. Multi-Relational Graphs
      • In multi-relational graphs, edges can have different meanings.
          G = (V, E ⊆ (V × V), ω : E → Σ*)
      • Most graph software is designed for multi-relational graphs (e.g. arbitrary objects as vertices and edges, knowledge-based reasoning systems, etc.).
      [Figure: a multi-relational graph over person-a, book-b, and book-c with edges labeled read, authored, and cites.]
      NOTE: These types of graphs are also known as directed, vertex/edge-labeled graphs.
  • 12. Gremlin and Multi-Relational Graphs
      • Gremlin provides a means to elegantly map single-relational graph analysis algorithms over to the multi-relational graph domain.
      • Gremlin provides an elegant way to do automated reasoning in multi-relational graphs using path expressions.
      These two points form the primary thesis of this presentation.
      Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), 29–41, doi:10.1016/j.joi.2009.06.004, LA-UR-08-03931, http://arxiv.org/abs/0806.2274, December 2009.
  • 13. Property Graphs
      • Gremlin works with a type of multi-relational graph called a property graph.
          • Vertices and edges are labeled with unique identifiers.
          • Edges are directed, labeled, and can form loops.
          • Multiple edges of the same label can exist for the same vertex pair.
          • Vertices and edges can have any number of key/value pair properties/attributes.
      Property graphs are a relatively general graph structure that can be constrained to model other graph structures, though a property-based hypergraph would be the most general (see HyperGraphDB and the JUNG API).
  • 14. Property Graphs
      [Figure: the example property graph used in the traversals that follow.]
      Vertices:
          1: name = "marko", age = 29
          2: name = "vadas", age = 27
          3: name = "lop", lang = "java"
          4: name = "josh", age = 32
          5: name = "ripple", lang = "java"
          6: name = "peter", age = 35
      Edges:
          7: 1 -knows-> 2, weight = 0.5
          8: 1 -knows-> 4, weight = 1.0
          9: 1 -created-> 3, weight = 0.4
          10: 4 -created-> 5, weight = 1.0
          11: 4 -created-> 3, weight = 0.4
          12: 6 -created-> 3, weight = 0.2
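For readers who want to follow the later traversals by hand, the example property graph can be held in two dictionaries. This is a minimal Python sketch, not Gremlin or the Blueprints API; the names `vertices`, `edges`, `out_edges`, and `in_edges` are illustrative choices, and the data is the six-vertex, six-edge example graph of this slide.

```python
# Example property graph, stored in plain dictionaries (illustration only).
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}

# edge id -> (tail vertex, label, head vertex, properties)
edges = {
    7: (1, "knows", 2, {"weight": 0.5}),
    8: (1, "knows", 4, {"weight": 1.0}),
    9: (1, "created", 3, {"weight": 0.4}),
    10: (4, "created", 5, {"weight": 1.0}),
    11: (4, "created", 3, {"weight": 0.4}),
    12: (6, "created", 3, {"weight": 0.2}),
}


def out_edges(vertex_id):
    """Ids of the edges whose tail is vertex_id, in id order."""
    return sorted(eid for eid, (tail, _, _, _) in edges.items() if tail == vertex_id)


def in_edges(vertex_id):
    """Ids of the edges whose head is vertex_id, in id order."""
    return sorted(eid for eid, (_, _, head, _) in edges.items() if head == vertex_id)


print(out_edges(1))
print([edges[eid][3]["weight"] for eid in out_edges(1)])
```

Every vertex and edge carries a unique identifier and an arbitrary key/value map, which is the defining feature of a property graph.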
  • 15. Outline
      • Introduction to Graphs and Graph Software
      • Basic Gremlin Concepts
      • Gremlin Language Description
      • Advanced Gremlin Concepts
  • 16. Gremlin System Architecture
      • The Gremlin console is a scripting environment which allows for the dynamic evaluation of Gremlin code.
      • Gremlin implements JSR 223, which allows Gremlin to also be used within the Java language and thus, as a virtual machine directly accessible to Java applications. Popular JSR 223 implementations include Jython, JRuby, and Groovy. For a fine list of implementations see https://scripting.dev.java.net.
      • Blueprints is a set of interfaces for abstract data structures such as graphs and documents. Implementations of these interfaces exist for various data management systems.
      • There exist many graph data management systems that span various graph data models (e.g. edge labeled graphs, RDF graphs, hypergraphs, etc.).
      [Figure: architecture stack with the labels Gremlin Console, Gremlin ScriptEngine, Neo4j, NativeStore, and TinkerGraph.]
  • 17. “Hello World” in the Gremlin Console
        marko$ ./gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin>
        gremlin> concat('goodbye', ' ', 'self')
        ==>goodbye self
  • 18. Simple Traversals in Gremlin
      [Figure: the example property graph from slide 14.]
        gremlin> $_ := g:key('name','marko')
        ==>v[1]
        gremlin> .
        ==>v[1]
        gremlin> ./outE
        ==>e[7][1-knows->2]
        ==>e[9][1-created->3]
        ==>e[8][1-knows->4]
        gremlin> ./outE/@weight
        ==>0.5
        ==>0.4
        ==>1.0
      ./outE/@weight: “Get the current object(s). Then get the outgoing edges of those objects. Then get the weights of those edges.”
      $_ is a reserved variable meaning the root list of objects.
  • 19. Simple Traversals in Gremlin
      [Figure: the example property graph from slide 14.]
        gremlin> .
        ==>v[1]
        gremlin> ./outE[@label='created']/inV
        ==>v[3]
        gremlin> $_ := $_last
        ==>v[3]
        gremlin> ./@name
        ==>lop
        gremlin> g:map(.)
        ==>name=lop
        ==>lang=java
      ./outE[@label='created']/inV: “Get the current object(s). Then get the outgoing edges of those objects, where their labels equal 'created'. Then get the incoming vertices of those 'created' edges.”
      $_last is a reserved variable meaning the last value evaluated.
  • 20. Simple Traversals in Gremlin
      [Figure: the example property graph from slide 14.]
        ./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
        ==>vadas
  • 21. Simple Traversals in Gremlin
        ./outE[@label='knows']/inV[matches(@name,'va.{3}') and @age > 21]/@name
      1. .: Get the current object(s).
      2. outE[@label='knows']: Get the outgoing edges of the current object(s), where their labels equal 'knows'.
      3. inV[matches(@name,'va.{3}') and @age > 21]: Get the incoming vertices of those 'knows' edges, where the names of those vertices are 5 characters long, start with 'va', and whose age is greater than 21.
      4. @name: Get the name of those particular incoming vertices.
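The four steps above can be traced in a general-purpose language. The following Python sketch is an analogy only (it is not how Gremlin evaluates paths); the function name `friends_matching` is hypothetical, and `re.fullmatch` is used on the assumption that the pattern must cover the whole name, as the slide's "5 characters long" reading implies.

```python
import re

# Relevant part of the example graph from slide 14.
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
}
# (edge id, tail vertex, label, head vertex)
edges = [
    (7, 1, "knows", 2),
    (8, 1, "knows", 4),
    (9, 1, "created", 3),
]


def friends_matching(start, pattern, min_age):
    """Names reached over outgoing 'knows' edges whose head vertex passes both tests."""
    names = []
    for _eid, tail, label, head in edges:
        # Step 2: outgoing edges of the start vertex labeled 'knows'.
        if tail != start or label != "knows":
            continue
        # Step 3: keep the head vertex only if the name matches and the age is large enough.
        props = vertices[head]
        name_ok = re.fullmatch(pattern, props["name"]) is not None
        age_ok = "age" in props and props["age"] > min_age
        if name_ok and age_ok:
            # Step 4: project the name property.
            names.append(props["name"])
    return names


print(friends_matching(1, r"va.{3}", 21))
```

The path expression states the same logic in one line, which is the conciseness argument the talk is making.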
  • 22. Knowledge-Based Reasoning
      • Blueprints implements the Sesame SAIL interfaces and thus, Gremlin can be used over the many Resource Description Framework (RDF) triple/quad stores. In such cases, RDF is modeled as a property graph where the named graph component is the @ng edge property.
      • Gremlin makes use of the Sesame SAIL SPARQL engine to allow for queries based on graph-pattern matching.
        gremlin> sail:sparql('SELECT ?x ?y WHERE { ?x foaf:knows ?y }')
        ==>{y=v[http://ex.com#2], x=v[http://ex.com#1]}
        ==>{y=v[http://ex.com#4], x=v[http://ex.com#1]}
      • Gremlin is useful for knowledge-based reasoning using path expressions.
  • 23. Reasoning as Defining New Types of Adjacency
      • Graph-based reasoning is the process of making explicit what is implicit in the graph.
      • A reasoner takes a graph G and a collection of graph-patterns (i.e. transformation/rewrite rules) and creates a new graph G′ (usually, G ⊂ G′). G′ has new relationships/edges and thus, new definitions of vertex adjacency.
      • Example: The co-developers of person A are those people who have created the same software as person A and who are themselves not person A (as person A has created the same software as him or herself).
      For these “co-developer” examples, we will use vertex 1 (marko) as the source of the reasoning process.
      [Figure: the example graph with inferred co-developer edges drawn alongside the created and knows edges.]
  • 24. The Co-Developers of Marko A. Rodriguez in SPARQL
        SELECT ?x WHERE {
          marko created ?y .
          ?z created ?y .
          ?z != marko .
          ?z name ?x
        }
      This query would return: josh and peter.
      [Figure: the example graph annotated with the query variables ?y (the software created by marko), ?z (its other creators), and ?x (their names).]
  • 25. The Co-Developers of Marko A. Rodriguez in Gremlin
      [Figure: the example graph with inferred co-developer edges drawn alongside the created and knows edges.]
        gremlin> ./@name
        ==>marko
        gremlin> ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
        ==>josh
        ==>peter
  • 26. The Co-Developers of Marko A. Rodriguez in Gremlin
        ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]/@name
      1. .: Get the current object(s) (i.e. vertex 1, denoting Marko).
      2. outE[@label='created']: Get the outgoing edges of the Marko vertex, where their labels equal 'created'.
      3. inV: Get the incoming (i.e. head) vertices of those 'created' edges.
      4. inE[@label='created']: Get the incoming edges of those vertices, where their labels equal 'created'.
      5. outV[g:except($_)]: Get the outgoing (i.e. tail) vertices of those 'created' edges, where those vertices are not the Marko vertex.
      6. @name: Get the name of those non-Marko vertices.
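The six steps above amount to an out-and-back walk over 'created' edges. A minimal Python sketch of that walk over the example graph follows; it is an illustration rather than Gremlin's evaluator, and the function name `co_developers` is a hypothetical choice.

```python
# 'knows' and 'created' edges of the example graph as (tail, label, head).
edges = [
    (1, "knows", 2),
    (1, "knows", 4),
    (1, "created", 3),
    (4, "created", 5),
    (4, "created", 3),
    (6, "created", 3),
]
names = {1: "marko", 2: "vadas", 3: "lop", 4: "josh", 5: "ripple", 6: "peter"}


def co_developers(root):
    """Vertices that created something the root vertex also created, excluding the root."""
    # Steps 2-3: follow outgoing 'created' edges to the created software.
    projects = [head for tail, label, head in edges
                if tail == root and label == "created"]
    # Steps 4-5: follow incoming 'created' edges back, dropping the root itself.
    result = []
    for project in projects:
        for tail, label, head in edges:
            if head == project and label == "created" and tail != root:
                result.append(tail)
    return result


# Step 6: project the name property.
print([names[v] for v in co_developers(1)])
```

The returned pairs (root, co-developer) are exactly the inferred edges a reasoner would add to form the enlarged graph.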
  • 27. Defining Co-Developers in Gremlin
        path co-developer
          ./outE[@label='created']/inV/inE[@label='created']/outV[g:except($_)]
        end
      Once defined, you can use it like any other path segment.
        gremlin> ./co-developer
        ==>v[4]
        ==>v[6]
        gremlin> ./co-developer/@name
        ==>josh
        ==>peter
  • 28. Defining Co-Developers in Java
        public class CoDeveloperPath implements Path {
          public List invoke(Object root) {
            if (root instanceof Vertex) {
              List<Vertex> projects = new ArrayList<Vertex>();
              for (Edge edge : ((Vertex) root).getOutEdges()) {
                if (edge.getLabel().equals("created")) {
                  projects.add(edge.getInVertex());
                }
              }
              List<Vertex> coDevelopers = new ArrayList<Vertex>();
              for (Vertex project : projects) {
                for (Edge edge : project.getInEdges()) {
                  if (edge.getLabel().equals("created") && edge.getOutVertex() != root) {
                    coDevelopers.add(edge.getOutVertex());
                  }
                }
              }
              return coDevelopers;
            } else {
              return null;
            }
          }
        }
  • 29. Outline
      • Introduction to Graphs and Graph Software
      • Basic Gremlin Concepts
      • Gremlin Language Description
      • Advanced Gremlin Concepts
      • Conclusions
  • 30. Gremlin Type System
      [Figure: type hierarchy. object is the root type, with subtypes element, graph, number, string, boolean, map, and list; vertex and edge are subtypes of element.]
  • 31. Predefined Paths and Properties
      [Figure: a fragment of the example graph annotated with the out edges of vertex 1, the in edges of vertex 3, the id, label, out vertex, and in vertex of edge 9, and the id and properties (name = "josh", age = 32) of vertex 4.]
      object       property  description                            example
      graph        V         the vertex iterator of the graph       $g/V
      graph        E         the edge iterator of the graph         $g/E
      vertex/edge  @id       the identifier of the element          $v/@id
      vertex       outE      the outgoing edges of the vertex       $v/outE
      vertex       inE       the incoming edges of the vertex       $v/inE
      vertex       bothE     both in and out edges of the vertex    $v/bothE
      edge         outV      the outgoing tail vertex of the edge   $e/outV
      edge         inV       the incoming head vertex of the edge   $e/inV
      edge         bothV     both in and out vertices of the edge   $e/bothV
      edge         @label    the label of the edge                  $e/@label
  • 32. Predefined Functions
      g:assign(), g:remove-idx(), g:list(), g:sort(), g:print(), g:load(), g:dedup(), g:map(), g:time(), g:unassign(), g:save(), g:union(), g:keys(), g:p(), g:id(), g:clear(), g:intersect(), g:values(), g:to-json(), g:key(), g:close(), g:difference(), g:rand-nat(), g:from-json(), g:add-v(), g:retain(), g:rand-real(), g:add-e(), g:except(), g:prob(), g:remove-ve(), g:remove(), g:cont(), g:idx-all(), g:get(), g:halt(), g:add-idx(), g:op-value(), g:type(), ...
      There are over 70 predefined functions. See the following for a description of each.
          http://wiki.github.com/tinkerpop/gremlin/core-function-library
          http://wiki.github.com/tinkerpop/gremlin/gremlin-function-library
  • 33. Working With Non-Graph Types
        gremlin> 1.2 + 6
        ==>7.2
        gremlin> 'this is a string'
        ==>this is a string
        gremlin> true() or false()
        ==>true
        gremlin> g:map('marko','lanl','peter','neotech','josh','rpi')
        ==>marko=lanl
        ==>peter=neotech
        ==>josh=rpi
        gremlin> g:list('graphs','hockey','motorcycles',6)
        ==>graphs
        ==>hockey
        ==>motorcycles
        ==>6.0
  • 34. Working With Non-Graph Types
        gremlin> $m := g:map('hobbies',g:list('hockey','graphs'), 'location', g:map('state','new mexico', 'city', 'santa fe', 'zipcode', 87501), 'age', 30)
        ==>location={zipcode=87501.0, state=new mexico, city=santa fe}
        ==>age=30.0
        ==>hobbies=[hockey, graphs]
        gremlin> $m/@age
        ==>30.0
        gremlin> $m/@hobbies[2]
        ==>graphs
        gremlin> $m/@location/@city
        ==>santa fe
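The nested map on this slide corresponds roughly to nested dictionaries and lists in a general-purpose language. The Python sketch below is only an analogy (it is not Gremlin, and `xpath_position` is a hypothetical helper). It highlights one detail that is easy to miss: XPath positions start at 1, so `$m/@hobbies[2]` selects the second list item, whereas Python indexes from 0.

```python
# Python analogue of the Gremlin map $m from the slide (illustration only).
m = {
    "hobbies": ["hockey", "graphs"],
    "location": {"state": "new mexico", "city": "santa fe", "zipcode": 87501},
    "age": 30,
}


def xpath_position(items, position):
    """Return the item at a 1-based, XPath-style position."""
    return items[position - 1]


print(m["age"])                          # analogue of $m/@age
print(xpath_position(m["hobbies"], 2))   # analogue of $m/@hobbies[2]
print(m["location"]["city"])             # analogue of $m/@location/@city
```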
  • 35. Variables
      • Variables in Gremlin are prefixed with a $ character.
      • There is a collection of reserved variables that all begin with $_.
          • $_ is the root list of objects.
          • $_last is the last result evaluated by the evaluator.
          • $_g is the “working graph” to reduce typing with graph functions.
        gremlin> $x := 1
        ==>1.0
        gremlin> $y := 2
        ==>2.0
        gremlin> $x + $y
        ==>3.0
  • 36. Language Statements
      Variable Assignment
        gremlin> $i := 1 + 5
        ==>6.0
        gremlin> $i
        ==>6.0
      Repeat
        gremlin> $i := 0
        ==>0.0
        gremlin> repeat 10
                   $i := $i + 1
                 end
        ==>10.0
      If/Else
        gremlin> if true()
                   $i := 1
                 else
                   $i := 2
                 end
        ==>1.0
      While
        gremlin> $i := 'g'
        ==>g
        gremlin> while not(matches($i, 'ggg'))
                   $i := concat($i,'g')
                 end
        ==>ggg
  • 37. Language Statements
      Foreach
        gremlin> $i := 0
        ==>0.0
        gremlin> foreach $j in 1 | 2 | 3
                   $i := $i + $j
                 end
        ==>6.0
      Path
        gremlin> path friend_name
                   ./outE[@label='knows']/inV/@name
                 end
        gremlin> ./friend_name
        ==>vadas
        ==>josh
      Function
        gremlin> func ex:hello($name)
                   concat('hello ', $name)
                 end
        gremlin> ex:hello('pavel')
        ==>hello pavel
      You can define functions and paths in native Gremlin (as demonstrated above) or in Java.
  • 38. XPath Filters
      • Use [ ] filters to filter objects in a path expression (i.e. “such that” or “where”).
      • The evaluated result of [ ] must be a number or boolean. If it is a number, it is treated as the position within an array (i.e. list). If it is a boolean, it determines whether to include or exclude the object from the next path in the sequence.
        gremlin> ./outE[@label='knows']
        ==>e[7][1-knows->2]
        ==>e[8][1-knows->4]
        gremlin> ./outE[@label='knows' and @weight>0.5]/inV[@age<21 or @name='josh'][true()][1]
        ==>v[4]
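The two filter behaviours described above (a boolean keeps or drops an object, a number selects a 1-based position) can be mimicked in a few lines. This Python sketch is an illustration of the rule, not Gremlin's implementation; `apply_filter` and the dictionary layout are assumptions made for the example.

```python
def apply_filter(items, predicate):
    """Apply one [ ] filter: booleans include/exclude, numbers select a 1-based position."""
    kept = []
    for position, item in enumerate(items, start=1):
        result = predicate(item)
        # bool must be tested first, because bool is a subclass of int in Python.
        if isinstance(result, bool):
            if result:
                kept.append(item)
        elif result == position:
            kept.append(item)
    return kept


vertices = {
    2: {"id": 2, "name": "vadas", "age": 27},
    3: {"id": 3, "name": "lop"},
    4: {"id": 4, "name": "josh", "age": 32},
}
# Outgoing edges of vertex 1 in the example graph.
out_e = [
    {"id": 7, "label": "knows", "weight": 0.5, "in": 2},
    {"id": 9, "label": "created", "weight": 0.4, "in": 3},
    {"id": 8, "label": "knows", "weight": 1.0, "in": 4},
]

# ./outE[@label='knows' and @weight>0.5]
step1 = apply_filter(out_e, lambda e: e["label"] == "knows" and e["weight"] > 0.5)
# /inV[@age<21 or @name='josh']
in_v = [vertices[e["in"]] for e in step1]
step2 = apply_filter(in_v, lambda v: ("age" in v and v["age"] < 21) or v["name"] == "josh")
# [true()][1]
step3 = apply_filter(step2, lambda v: True)
step4 = apply_filter(step3, lambda v: 1)

print([v["id"] for v in step4])
```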
  • 39. Outline
      • Introduction to Graphs and Graph Software
      • Basic Gremlin Concepts
      • Gremlin Language Description
      • Advanced Gremlin Concepts
      • Conclusion
  • 40. A Grateful Dead Dataset
      2,500 concerts
      35,000 songs played
      600 songs
      30 years
      11 members
      1 band ... the Grateful Dead.
  • 41. A Grateful Dead Dataset
      • vertices denote songs and artists
          • type: “song” or “artist”.
          • name: name of song or artist.
          • performances: number of times song was played in concert.
          • song_type: whether the song was a “cover” or “original”.
      • edges denote followed_by, sung_by, written_by
          • weight: number of times a song was followed by another song over all concerts played.
      Rodriguez, M.A., Gintautas, V., Pepe, A., “A Grateful Dead Analysis: The Relationship Between Concert and Listening Behavior,” First Monday, 14(1), University of Illinois at Chicago Library, http://arxiv.org/abs/0807.2466, January 2009.
      NOTE: A portion of the raw dataset courtesy of Mark Leone http://www.cs.cmu.edu/~mleone/gdead/setlists.html
  • 42. A Grateful Dead Dataset
      [Figure: an excerpt of the 2nd set played at the Stanley Theater, Pittsburgh, PA (11/30/79): Scarlet Begonias, Fire on the Mountain, Passenger, Terrapin Station, ... shown beside the corresponding graph, in which song vertices are connected by weighted followed_by edges and linked to artist vertices (Hunter, Garcia, Lesh) by written_by and sung_by edges.]
  • 43. A Grateful Dead Dataset – Load Data/Basic Stats
        gremlin> g:load('data/graph-example-2.xml')
        ==>true
        gremlin> count($_g/V)
        ==>809.0
        gremlin> count($_g/E)
        ==>8049.0
  • 44. A Grateful Dead Dataset – Out-Degree of Each Vertex
        gremlin> $degrees := g:map()
        gremlin> foreach $v in $_g/V
                   $degrees[@name=$v/@name] := count($v/outE)
                 end
  • 45. A Grateful Dead Dataset – Out-Degree of Each Vertex
        gremlin> g:sort($degrees, 'value', true())
        ==>PLAYING IN THE BAND=96.0
        ==>SUGAR MAGNOLIA=92.0
        ==>PROMISED LAND=89.0
        ==>GOOD LOVING=87.0
        ==>NOT FADE AWAY=86.0
        ==>I KNOW YOU RIDER=85.0
        ==>CASSIDY=83.0
        ==>DEAL=82.0
        ==>JACK STRAW=81.0
        ==>ONE MORE SATURDAY NIGHT=81.0
        ==>EL PASO=80.0
        ==>MEXICALI BLUES=79.0
        ...
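The out-degree computation on slides 44 and 45 amounts to counting outgoing edges per vertex and sorting the resulting map by value. A minimal Python sketch of the same idea follows, run on the six-vertex example graph from earlier in the talk rather than on the Grateful Dead data (which is not reproduced here). The descending order is an assumption read off the slide's output.

```python
# Edges of the small example graph as (tail name, label, head name).
edges = [
    ("marko", "knows", "vadas"),
    ("marko", "knows", "josh"),
    ("marko", "created", "lop"),
    ("josh", "created", "ripple"),
    ("josh", "created", "lop"),
    ("peter", "created", "lop"),
]
names = ["marko", "vadas", "lop", "josh", "ripple", "peter"]

# Count the outgoing edges of every vertex, including those with none.
degrees = {name: 0 for name in names}
for tail, _label, _head in edges:
    degrees[tail] += 1

# Sort the map by value, largest first.
ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
for name, degree in ranked:
    print(f"{name}={degree}")
```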
  • 46. A Grateful Dead Dataset – Inspecting Single Vertex
        gremlin> $v := g:key('name','CHINA DOLL')[1]
        ==>v[129]
        gremlin> g:map($v)
        ==>name=CHINA DOLL
        ==>song_type=original
        ==>performances=114
        ==>type=song
        gremlin> $v/outE[@label='sung_by']/inV/@name
        ==>Garcia
  • 47. A Grateful Dead Dataset – Inspecting Single Vertex
        gremlin> $v/outE[@label='followed_by']/inV/@name
        ==>BIG RIVER
        ==>THROWING STONES
        ==>SAMSON AND DELILAH
        ==>TRUCKING
        ==>CASEY JONES
        ==>HIGH TIME
        ...
        gremlin> $v/outE[@label='followed_by']/@weight
        ==>2
        ==>8
        ==>1
        ==>2
        ==>1
        ==>1
        ...
  • 48. Introduction to PageRank
      • The remainder of this section will discuss the PageRank algorithm and its application to multi-relational graphs.
      • The arguments made and the examples presented generalize to all other single-relational graph algorithms. However, for the sake of brevity and consistency, only PageRank will be discussed.
  • 49. Introduction to Matrix-Based PageRank
      • PageRank is a centrality measure based on the primary eigenvector of a modified version of a graph. Let $A \in \mathbb{R}_+^{|V| \times |V|}$ denote the adjacency matrix representing the graph.
      • In order to ensure positive real values in the eigenvector, the graph must be strongly connected. PageRank induces strong connectivity by overlaying a low probability (defined by $\alpha \in [0, 1]$, usually 0.15) “teleportation” graph over the original graph. Let $B \in \left\{\frac{1}{|V|}\right\}^{|V| \times |V|}$ denote a teleportation adjacency matrix where every vertex is connected to every vertex with equal probability.
          $C = (1 - \alpha) A + \alpha B$, where $C \in \mathbb{R}_+^{|V| \times |V|}$
          $\lambda = \lambda C$, where $\lambda \in \mathbb{R}_+^{|V|}$ is the PageRank vector over $V$.
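The fixed point λ = λC can be approximated by repeatedly multiplying a start vector by C (power iteration). The following pure-Python sketch does this under two assumptions the slide does not state: the rows of A are normalised to sum to 1, and every vertex has at least one outgoing edge. The function name `pagerank` and the three-vertex graph are illustrative.

```python
def pagerank(adjacency, alpha=0.15, iterations=100):
    """Approximate the PageRank vector by power iteration on C = (1 - alpha)A + alpha B."""
    n = len(adjacency)
    # Assumption: normalise each row of A so that it sums to 1.
    # Every vertex must have at least one outgoing edge for this division to work.
    a = [[value / sum(row) for value in row] for row in adjacency]
    # Every entry of the teleportation matrix B is 1/n.
    c = [[(1 - alpha) * a[i][j] + alpha / n for j in range(n)] for i in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        # Row-vector times matrix: rank <- rank C.
        rank = [sum(rank[i] * c[i][j] for i in range(n)) for j in range(n)]
    return rank


# Hypothetical 3-vertex graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
adjacency = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
ranks = pagerank(adjacency)
print([round(r, 3) for r in ranks])
```

Because every row of C sums to 1, the entries of the rank vector keep summing to 1 throughout the iteration, so no renormalisation is needed.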
  • 50. Introduction to Random Walk-Based PageRank
    • PageRank can be implemented by a random walk.
    • Create a vertex counter map, m : V → N+.
    • Place a walker on a random vertex in V. Denote the walker’s current vertex i ∈ V.
      1. increment the vertex counter by 1 (i.e. m(i) ← m(i) + 1).
      2. the walker chooses a random adjacent vertex with probability 1 − α.
      3. the walker chooses a random vertex in V with probability α (teleportation, as in the matrix formulation).
      4. rinse and repeat until m reaches a stationary probability distribution (continually normalize m if you want a probability distribution).
    • We will use this random walk model in the Gremlin examples to follow.
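As a rough illustration of the random walk model, here is a minimal Python sketch. It assumes α is the teleportation probability (usually 0.15), consistent with the matrix formulation, and that a walker stuck on a vertex with no outgoing edges teleports. The function name, the adjacency-dictionary representation, and the toy graph are hypothetical.

```python
import random


def random_walk_pagerank(adjacent, vertices, alpha=0.15, steps=20000, seed=0):
    """Estimate PageRank by counting visits of a single random walker."""
    rng = random.Random(seed)
    counts = {v: 0 for v in vertices}        # the vertex counter map m
    current = rng.choice(vertices)           # place walker on a random vertex
    for _ in range(steps):
        counts[current] += 1                 # m(i) <- m(i) + 1
        neighbors = adjacent.get(current, [])
        if neighbors and rng.random() >= alpha:
            current = rng.choice(neighbors)  # follow a random outgoing edge
        else:
            current = rng.choice(vertices)   # teleport to a random vertex
    total = sum(counts.values())
    # Normalize the counters into a probability distribution.
    return {v: c / total for v, c in counts.items()}


# Hypothetical graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = random_walk_pagerank(graph, ["a", "b", "c"])
```

The normalized counters only approximate the stationary distribution; more steps give a closer estimate.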
  • 51. PageRank over Multi-Relational Graphs
    • PageRank was designed for single-relational graphs (i.e. where all edges have the same meaning).
    • In a multi-relational graph, what does it mean to find the centrality of a vertex when vertices can be related by various types of edges? For example, if there exist “socializes with” and “met once” edges, then a person who has “met once” many people could be the most centrally located in the graph. Also, what if your graph has more than just “person”-type vertices and edges (e.g. vertices for cars, pets, buildings, articles, etc. and edges such as owns, walks, livesAt, cites, etc.)?
  • 52. PageRank over Multi-Relational Graphs
    • Calculating single-relational PageRank would yield Person as the most central vertex.
    • You can boolean filter certain edge labels (e.g. ignore type edges; in such cases, you would have the centrality scores over the knows social graph).
    • However, what if you only wanted to traverse knows edges if and only if the adjacent vertex knows more than 10 other people?
    • In the end, you want complete control (universal computability) over the paths that the traverser/walker can take through a graph.
    [Figure: people vertices (Herbert, Johan, Marko, Josh, Jen, ...) connected to a Person vertex by type edges and to one another by knows edges.]
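The two kinds of restriction discussed on this slide, filtering by edge label and filtering by a computed condition on the adjacent vertex, can be sketched in Python. This is an illustrative sketch only: the toy edge list, the helper names `out` and `knows_popular`, and the small threshold (1 rather than 10) are assumptions chosen to keep the example short.

```python
from collections import defaultdict

# Hypothetical toy graph as (source, label, target) triples.
edges = [
    ("Marko", "type", "Person"),
    ("Josh", "type", "Person"),
    ("Jen", "type", "Person"),
    ("Marko", "knows", "Josh"),
    ("Marko", "knows", "Jen"),
    ("Josh", "knows", "Jen"),
    ("Jen", "knows", "Marko"),
]

out_edges = defaultdict(list)
for source, label, target in edges:
    out_edges[source].append((label, target))


def out(vertex, label):
    """Boolean label filter: targets of outgoing edges with this label."""
    return [t for l, t in out_edges[vertex] if l == label]


def knows_popular(vertex, threshold):
    """Follow 'knows' edges only to vertices that themselves
    know more than `threshold` other people."""
    return [t for t in out(vertex, "knows")
            if len(out(t, "knows")) > threshold]


# Label filter alone ignores the 'type' edges.
marko_knows = out("Marko", "knows")
# Computed filter: only Marko knows more than one person.
jen_popular = knows_popular("Jen", 1)
```

Either function could serve as the adjacency definition handed to a random walker, which is the sense in which the walker's path is under the programmer's control.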
  • 53. PageRank over Multi-Relational Graphs
    • In multi-relational graphs, the meaning of your graph algorithm’s results is defined by your definition of adjacency.
    • With respect to random walk-based PageRank, define the path that the walker should take. That path is the definition of adjacency.
    • The stationary probability distribution created from this walk yields a path-dependent centrality.
    • Thus, in a multi-relational graph, there are many types of PageRanks that can be calculated: one for each type of path defined for a walker.
    Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, Knowledge-Based Systems, 21(7), 727–739, http://arxiv.org/abs/0803.4355, October 2008.
  • 54. PageRank over “Garcia Followed By” SubGraph
    • Define a path that will go from song-to-song by “followed by” edges and only traverse songs that are “sung by” Jerry Garcia.
      (./outE[@label='followed_by']/inV/outE[@label='sung_by']
        /inV[@name='Garcia']/../..)[g:rand-nat()]
    [Figure: the path annotated in stages A to D. From the current vertex (.), followed_by edges lead to songs and sung_by edges lead to singers; the name="Garcia" filter drops the branch ending at name="Weir", /../.. steps back to the songs, and g:rand-nat() picks one of them.]
  • 55. PageRank over “Garcia Followed By” SubGraph
    path garcia-followed_by
      (./outE[@label='followed_by']/inV/outE[@label='sung_by']
        /inV[@name='Garcia']/../..)[g:rand-nat()]
    end

    $m := g:map()
    $alpha := 0.15
    # start the walker on a random song
    $_ := g:key('type', 'song')[g:rand-nat()]
    repeat 2500
      $_ := ./garcia-followed_by
      # count the visit if the path led to a song
      if count($_) > 0
        g:op-value('+', $m, $_[1]/@name, 1.0)
      end
      # teleport with probability alpha, or when the path is a dead end
      if g:rand-real() < $alpha or count($_) = 0
        $_ := g:key('type', 'song')[g:rand-nat()]
      end
    end
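For readers unfamiliar with the Gremlin syntax, the same control flow can be sketched in Python. This is not a translation of the Gremlin runtime: the three-song dataset, the dictionary representation, and the function names are hypothetical stand-ins for the Grateful Dead graph, used only to show how a path definition plugs into the walk.

```python
import random

# Hypothetical toy data standing in for the Grateful Dead graph.
followed_by = {
    "SONG A": ["SONG B", "SONG C"],
    "SONG B": ["SONG A"],
    "SONG C": ["SONG B"],
}
sung_by = {"SONG A": "Garcia", "SONG B": "Garcia", "SONG C": "Weir"}


def garcia_followed_by(song, rng):
    """Path definition: one random following song sung by Garcia, or None."""
    candidates = [s for s in followed_by.get(song, [])
                  if sung_by.get(s) == "Garcia"]
    return rng.choice(candidates) if candidates else None


def garcia_rank(steps=2500, alpha=0.15, seed=0):
    rng = random.Random(seed)
    songs = list(sung_by)
    counts = {}                               # plays the role of $m
    current = rng.choice(songs)               # random starting song
    for _ in range(steps):
        current = garcia_followed_by(current, rng)
        if current is not None:
            counts[current] = counts.get(current, 0.0) + 1.0
        # Teleport on a dead end, or with probability alpha.
        if current is None or rng.random() < alpha:
            current = rng.choice(songs)
    return counts


garcia_counts = garcia_rank()
```

Because the path only ever lands on songs sung by Garcia, the song sung by Weir never accumulates a count, even though the walker may start or teleport there.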
  • 56. PageRank over “Garcia Followed By” SubGraph
    gremlin> g:sort($m, 'value', true())
    ==>CRAZY FINGERS=98.0
    ==>HES GONE=85.0
    ==>CHINA CAT SUNFLOWER=79.0
    ==>BERTHA=76.0
    ==>UNCLE JOHNS BAND=74.0
    ==>TERRAPIN STATION=72.0
    ==>GOING DOWN THE ROAD FEELING BAD=71.0
    ==>WHARF RAT=71.0
    ==>EYES OF THE WORLD=65.0
    ==>COLD RAIN AND SNOW=62.0
    ==>SHIP OF FOOLS=58.0
    ==>RAMBLE ON ROSE=53.0
    ==>CASEY JONES=51.0
    ==>DARK STAR=47.0
    ==>DEAL=46.0
    ...
  • 57. Universal Computation in Paths
    path path-name
      # any arbitrary computation can occur here
    end
    • A path definition can be used to define adjacencies.
      • Adjacency can be expressed as anything that can be computed by a Turing machine.
      • Path definitions are used to create “semantically meaningful” results from single-relational graph algorithms applied to multi-relational graphs.
      • Path definitions make explicit what is implicit in the structure of the graph. This has applications to knowledge-based reasoning.
    • A path definition can perform any arbitrary computation.
      • Path definitions can check/set vertex/edge properties.
      • Path definitions can create new vertices and edges.
      • Path definitions can call/define functions.
      This allows fine-grained control over how your traverser/walker moves through a graph.
  • 58. Outline
    • Introduction to Graphs and Graph Software
    • Basic Gremlin Concepts
    • Gremlin Language Description
    • Advanced Gremlin Concepts
    • Conclusions
  • 59. The Current Gremlin Ecosystem
    • Webling: Web console for Gremlin (developed by Pavel Yaskevich with funding from Neo Technology)
    • Project Gargamel: Distributed Graph Computing (uses Linked Process and Gremlin)
    • ReXster: A Graph-Based Recommender Engine
  • 60. Thank You
    Please enjoy Gremlin at http://gremlin.tinkerpop.com.
    My homepage is http://markorodriguez.com. Please feel free to contact me with any questions or comments.