This document discusses graph databases and analyzing relationships at scale. It gives an overview of graph databases, which represent entities as nodes (vertices) and the relationships between them as edges, and shows how graph queries and analytics reveal useful insights by traversing the graph along those relationships. It also introduces the Aurelius Graph Cluster, an open source (Apache 2) set of graph tools comprising Titan, Faunus, and Fulgora, and some of their features for working with large graph datasets.
23. Aurelius Graph Cluster (Apache 2)
Titan: stores a massive-scale property graph, allowing real-time traversals and updates
Faunus: batch processing of large graphs with Hadoop
Fulgora: runs global graph algorithms on large, compressed, in-memory graphs
[Figure: the three components exchange data through Map/Reduce, Load, and Bulk Load steps; analysis results are loaded back into Titan.]
24. Titan Features
Numerous Concurrent Users
Many Short Read/Write Transactions
Real-time Traversals (OLTP)
High Availability
Dynamic Scalability
Variable Consistency Model: ACID or Eventual Consistency
Real-time Big Graph Data
26. $ ./titan-0.2.0/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules').next()
==>v[4]
gremlin> v.out('father').out('brother').name
27. Vertex-Centric Indices
Sort and index edges per vertex by primary key
Primary key can be composite
Enables efficient, focused traversals: only retrieve edges that matter
Uses push-down predicates for quick, index-driven retrieval
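To make the idea concrete, here is a minimal Python sketch of a vertex-centric index, assuming a composite sort key of (edge label, `time`): the label selects a partition of the vertex's edges and `time` orders edges within it, so a label-plus-time-range predicate becomes two binary searches instead of a scan. The class and method names and the opponent names `a` to `d` are hypothetical; the times 1, 3, 5, 9 echo the `battled` edges in the next slide. This is not how Titan lays out its storage.

```python
import bisect
from collections import defaultdict


class VertexCentricIndex:
    """Hypothetical per-vertex edge index sorted by a (label, time) key."""

    def __init__(self):
        # (vertex, label) -> sorted edge times, plus neighbours aligned with them
        self._times = defaultdict(list)
        self._others = defaultdict(list)

    def add_edge(self, vertex, label, time, other):
        key = (vertex, label)
        i = bisect.bisect_right(self._times[key], time)  # keep times sorted
        self._times[key].insert(i, time)
        self._others[key].insert(i, other)

    def query(self, vertex, label, t_min, t_max):
        """Neighbours reached via `label` edges with t_min <= time <= t_max."""
        key = (vertex, label)
        times = self._times.get(key, [])
        lo = bisect.bisect_left(times, t_min)   # first time >= t_min
        hi = bisect.bisect_right(times, t_max)  # one past the last time <= t_max
        return self._others.get(key, [])[lo:hi]


idx = VertexCentricIndex()
for t, enemy in [(1, "a"), (9, "d"), (3, "b"), (5, "c")]:
    idx.add_edge("v", "battled", t, enemy)

print(idx.query("v", "battled", 3, 6))  # ['b', 'c']
```

Only the slice between the two search positions is read, which is the sense in which a push-down predicate "only retrieves edges that matter".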
28. [Figure: vertex v with edges labeled battled (time: 1, 3, 5, 9), mother, father, and two fought edges]
v.query()
29. [Figure: the same vertex v; after .direction(OUT) only the battled (time: 1, 3, 5, 9), mother, and father edges remain]
v.query().direction(OUT)
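Slides 28 and 29 build up `v.query()` one constraint at a time, and adding `.direction(OUT)` keeps only the edges that point away from v. A minimal Python sketch of that chained, filter-accumulating style follows. `VertexQuery`, `Edge`, `labels()`, and the sample edges (including the assumption that the two `fought` edges are incoming) are hypothetical illustrations rather than Titan's API; per slide 27, Titan would push such constraints down to the vertex-centric index instead of scanning a list as done here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

OUT, IN = "OUT", "IN"


@dataclass(frozen=True)
class Edge:
    label: str
    direction: str              # OUT or IN, relative to the queried vertex
    other: str                  # vertex at the other end of the edge
    time: Optional[int] = None


class VertexQuery:
    """Hypothetical builder mimicking the chained style of v.query().direction(OUT)."""

    def __init__(self, edges: List[Edge]):
        self._edges = list(edges)
        self._direction: Optional[str] = None
        self._labels: Optional[Set[str]] = None

    def direction(self, d: str) -> "VertexQuery":
        self._direction = d
        return self  # returning self allows further chaining

    def labels(self, *names: str) -> "VertexQuery":
        self._labels = set(names)
        return self

    def edges(self) -> List[Edge]:
        # Every constraint set so far must hold; unset constraints match everything.
        return [
            e for e in self._edges
            if (self._direction is None or e.direction == self._direction)
            and (self._labels is None or e.label in self._labels)
        ]


# Edges around v from slides 28-29; which edges are incoming is an assumption.
V_EDGES = [Edge("battled", OUT, f"enemy{t}", t) for t in (1, 3, 5, 9)] + [
    Edge("mother", OUT, "m"),
    Edge("father", OUT, "f"),
    Edge("fought", IN, "x"),
    Edge("fought", IN, "y"),
]

print(len(VertexQuery(V_EDGES).edges()))                 # 8
print(len(VertexQuery(V_EDGES).direction(OUT).edges()))  # 6
```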
32. Titan Server
REST and RexPro interfaces
$ wget http://s3.thinkaurelius.com/downloads/titan/titan-cassandra-0.3.0.zip
$ unzip titan-cassandra-0.3.0.zip
$ cd titan-cassandra-0.3.0
$ sudo bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
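Once the server is running, clients can reach the graph over REST through Rexster. A hedged Python sketch of building and issuing a vertex lookup follows. The port 8182, the graph name `graph`, and the `/graphs/<graph>/vertices?key=...&value=...` route are assumptions based on Rexster's defaults, so check them against `config/titan-server-rexster.xml`; the helper names are hypothetical, only the standard library is used, and `fetch_json` needs a live server.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumptions: Rexster's default HTTP port and a graph registered as "graph".
BASE_URL = "http://localhost:8182"
GRAPH_NAME = "graph"


def vertices_by_key_url(key, value, base=BASE_URL, graph=GRAPH_NAME):
    """Build a Rexster-style REST URL that looks up vertices by a property value."""
    query = urlencode({"key": key, "value": value})
    return f"{base}/graphs/{quote(graph)}/vertices?{query}"


def fetch_json(url, timeout=5.0):
    """GET the URL and decode the JSON response body (requires a running server)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_json(url) against a live Titan Server.
    print(vertices_by_key_url("name", "Hercules"))
```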
33. Graph Indexing
Vertex and edge indexing
Pluggable index providers: ElasticSearch, Lucene
Full-text search
Numeric range search
Geographic search
34. [Figure: example property graph. Vertices: Neptune (age: 4500, title: God of the earth and ocean); Alcmene (type: human, age: 45); Jupiter (age: 4800, title: God of the heaven and skies); Saturn (type: titan, age: 10000); Hercules (title: Divine hero); Hydra (type: monster); Pluto (age: 4000, title: God of the underworld); Cerberus (title: Ugly beast of the underworld). Edges: two brother, two father, one mother, one pet, and two battled edges with properties {time: 2, location: [37.7,23.9]} and {time: 12, location: [39,22]}.]
35. [Figure: same example graph as slide 34]
g.query().has('title',Txt.CONTAINS,'god').vertices()
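This query asks the index provider for vertices whose `title` contains the word "god". Full-text providers such as Lucene or ElasticSearch (slide 33) typically answer this from an inverted index that maps each token to the items containing it. A minimal Python sketch, assuming `Txt.CONTAINS` matches whole words case-insensitively (which is how "God" in a title can match 'god'); the tokenizer and function names are hypothetical and far simpler than a real analyzer. The titles are the ones recovered from slide 34.

```python
import re
from collections import defaultdict

# Vertex titles from the example graph on slide 34.
TITLES = {
    "Neptune": "God of the earth and ocean",
    "Jupiter": "God of the heaven and skies",
    "Pluto": "God of the underworld",
    "Hercules": "Divine hero",
    "Cerberus": "Ugly beast of the underworld",
}


def tokenize(text):
    """Lower-cased word tokens (a deliberately simplified analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of vertices whose text contains it."""
    index = defaultdict(set)
    for vertex, text in docs.items():
        for token in tokenize(text):
            index[token].add(vertex)
    return index


def contains(index, word):
    """Vertices whose text contains `word` as a whole token, case-insensitively."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_inverted_index(TITLES)
print(contains(INDEX, "god"))  # ['Jupiter', 'Neptune', 'Pluto']
```

Because matching is per token, a fragment such as "go" finds nothing; substring or prefix search would need a different index structure.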
36. [Figure: same example graph as slide 34]
g.query().has('age',GREATER_THAN,4500)
    .has('title',CONTAINS,'god').vertices()
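Chaining two `has()` calls conjoins the conditions, so a vertex must satisfy both. A minimal Python filter over the vertex properties recovered from slide 34 shows the expected answer: Neptune is excluded because 4500 is not strictly greater than 4500, and Saturn because it has no title. The word test assumes `CONTAINS` matches whole words case-insensitively, and the function names are hypothetical; an index provider would answer each condition from its index rather than scanning every vertex as this sketch does.

```python
import re

# (age, title) per vertex from slide 34; None means the slide shows no value.
VERTICES = {
    "Neptune": (4500, "God of the earth and ocean"),
    "Alcmene": (45, None),
    "Jupiter": (4800, "God of the heaven and skies"),
    "Saturn": (10000, None),
    "Hercules": (None, "Divine hero"),
    "Hydra": (None, None),
    "Pluto": (4000, "God of the underworld"),
    "Cerberus": (None, "Ugly beast of the underworld"),
}


def has_word(text, word):
    """Whole-word, case-insensitive match; a missing text never matches."""
    return text is not None and word.lower() in re.findall(r"\w+", text.lower())


def age_gt_and_title_contains(vertices, min_age, word):
    """Both conditions must hold (logical AND); a missing property fails its test."""
    return sorted(
        name
        for name, (age, title) in vertices.items()
        if age is not None and age > min_age and has_word(title, word)
    )


print(age_gt_and_title_contains(VERTICES, 4500, "god"))  # ['Jupiter']
```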
37. [Figure: same example graph as slide 34]
g.query().has('location',WITHIN,
    Geoshape.circle(38,24,50)).edges()
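Assuming `Geoshape.circle(38,24,50)` describes a circle centred at latitude 38, longitude 24 with a 50 km radius, this query keeps the edges whose `location` falls within that distance. A minimal Python sketch of the containment test using the haversine great-circle distance follows; the helper names are hypothetical, and a real geo index would first prune candidates (for example with a bounding box or a spatial tree) instead of testing every edge.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_circle(point, center_lat, center_lon, radius_km):
    """True if the point lies inside the circle (boundary included)."""
    lat, lon = point
    return haversine_km(center_lat, center_lon, lat, lon) <= radius_km


# The two battled edges from slide 34, keyed by their `time` property.
BATTLE_LOCATIONS = {2: (37.7, 23.9), 12: (39.0, 22.0)}

hits = sorted(t for t, loc in BATTLE_LOCATIONS.items() if within_circle(loc, 38, 24, 50))
print(hits)  # [2]
```

The battle at time 2 lies roughly 35 km from the centre, while the one at time 12 is about 200 km away, so only the first edge is returned.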
38. Faunus Features
Hadoop-based Graph Computing Framework
Graph Analytics
Breadth-first Traversals
Global Graph Computations
Batch Big Graph Data
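Faunus runs graph computations as Hadoop MapReduce jobs (the log on slide 41 shows a traversal compiled to one MapReduce job). To illustrate a global computation in that style, here is a hypothetical single-process Python sketch that computes every vertex's out-degree with explicit map, shuffle, and reduce phases. It is not Faunus code, and the edge list is invented for the example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical edge list (source, label, target) used only for illustration.
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Hydra"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Pluto", "pet", "Cerberus"),
]


def map_phase(edge):
    """Emit (source, 1) to count the edge, and (target, 0) so sinks still appear."""
    src, _label, dst = edge
    return [(src, 1), (dst, 0)]


def shuffle(pairs):
    """Group emitted values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    return key, sum(values)


def out_degrees(edges):
    mapped = chain.from_iterable(map_phase(e) for e in edges)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


print(out_degrees(EDGES)["Hercules"])  # 4
```

Each map call sees one edge in isolation and each reduce call sees one vertex's values, which is what lets a framework like Hadoop spread both phases across machines.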
41. Faunus Setup
$ bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true
gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
42. Build a Knowledge Graph
Based on DBpedia, a graph version of Wikipedia
~290 million edges (~1B triples)
1. Bulk load RDF into Faunus (6 m1.xlarge EC2 instances)
2. Convert to a property graph
3. Bulk load into Titan (3 m1.xlarge EC2 instances running Cassandra)
4. OLTP + OLAP
Total time: ~2 hours
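Step 2 turns RDF triples into a property graph. One common mapping, sketched here as an assumption rather than the pipeline used in this deck: a triple whose object is an IRI becomes an edge labeled by the predicate, and a triple whose object is a literal becomes a property on the subject vertex. The `<...>` convention for IRIs, the class and function names, and the sample triples are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PropertyGraph:
    vertices: Dict[str, Dict[str, str]] = field(default_factory=dict)  # IRI -> properties
    edges: List[Tuple[str, str, str]] = field(default_factory=list)   # (subject, predicate, object)


def is_iri(term: str) -> bool:
    """Assumed convention for this sketch: IRIs are written as <...>."""
    return term.startswith("<") and term.endswith(">")


def triples_to_property_graph(triples) -> PropertyGraph:
    g = PropertyGraph()
    for s, p, o in triples:
        g.vertices.setdefault(s, {})
        if is_iri(o):
            g.vertices.setdefault(o, {})  # object becomes a vertex
            g.edges.append((s, p, o))     # predicate becomes the edge label
        else:
            g.vertices[s][p] = o          # literal becomes a vertex property
    return g


TRIPLES = [
    ("<Hercules>", "name", "Hercules"),
    ("<Hercules>", "father", "<Jupiter>"),
    ("<Jupiter>", "name", "Jupiter"),
]

graph = triples_to_property_graph(TRIPLES)
print(graph.edges)  # [('<Hercules>', 'father', '<Jupiter>')]
```

In this sketch a repeated literal predicate overwrites the earlier value; real DBpedia data would need multi-valued properties or a list per key.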
49. Aurelius Graph Cluster
Mailing list: aureliusgraphs@googlegroups.com