The document discusses graphs and their components such as nodes and edges. It provides examples of different types of graphs like directed and undirected graphs. It also discusses graph data structures and storage engines that can be used for graph databases like Neo4j and Redis. Common graph algorithms and operations on social networks like recommendations are also covered.
56. There are many of these ‘fundamental’
graph units:
- tripartite graphs (user/asset/tag)
- folksonomies
- multicolor-multiparity graph
- etc.
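As a rough illustration of the user/asset/tag case, a folksonomy can be sketched as a set of (user, asset, tag) triples. The sample data and the `assets_by_tag` helper below are hypothetical, not taken from the talk.

```python
from collections import defaultdict

# Hypothetical sample data: each tagging event links one user, one asset, one tag.
taggings = {
    ("alice", "photo:1", "sunset"),
    ("bob", "photo:1", "beach"),
    ("alice", "photo:2", "beach"),
}

def assets_by_tag(triples):
    """Project the tripartite graph onto a tag -> set-of-assets index."""
    index = defaultdict(set)
    for _user, asset, tag in triples:
        index[tag].add(asset)
    return dict(index)

tag_index = assets_by_tag(taggings)
```

The same triples could be projected onto user -> tags or user -> assets by changing which two positions are paired.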
Wednesday, March 9, 2011
58. Neo4j
“An embedded, disk-based, fully transactional Java persistence engine that
stores data structured in graphs rather than in tables.”
http://neo4j.org
59. HypergraphDB
“A general purpose, extensible, portable, distributed, embeddable, open-source
data storage mechanism. It is a graph database designed specifically for
artificial intelligence and semantic web projects.”
http://kobrix.org/hgdb.jsp
61. FlockDB
“FlockDB is a database that stores graph data, but it isn't a database
optimized for graph-traversal operations. Instead, it's optimized for very
large adjacency lists, fast reads and writes, and page-able set arithmetic
queries.”
http://engineering.twitter.com/2010/05/introducing-flockdb.html
62. Redis
“Redis is an advanced key-value store. [...] the dataset is not volatile, and values
can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All
this data types can be manipulated with atomic operations to push/pop elements,
add/remove elements, perform server side union, intersection, difference between
sets, etc.”
http://code.google.com/p/redis
64. Redis makes you think in terms of data structures,
and operations on those structures.
65. Set:
Finite (for our cases) collection of objects in which
order has no significance and multiplicity is generally
ignored.
S = { Alice, Bob, Carol }
List:
Finite (for our cases) collection of objects in which
order *is* significant and multiplicity is allowed.
L = [ X, Y, X, Z, Q]
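The two definitions can be checked with Python's built-in types, used here only as a stand-in for the Redis set and list types. The variable names are illustrative.

```python
# A set ignores order and multiplicity: the repeated "Alice" collapses.
S = {"Alice", "Bob", "Carol", "Alice"}

# A list keeps both order and duplicates.
L = ["X", "Y", "X", "Z", "Q"]

set_size = len(S)    # duplicates are ignored
list_size = len(L)   # duplicates are counted

# Reordering the elements gives an equal set but a different list.
same_set = S == {"Carol", "Bob", "Alice"}
same_list = L == ["Y", "X", "X", "Z", "Q"]
```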
66. Store a username under a key (SET assigns a string value to a key; it does not add to a Redis set)
SET uid:1000:username jperras
Command Key Value
67. Use sets for denoting my followers/people
I follow.
Wednesday, March 9, 2011
68. Adding a new follower
SADD uid:1000:following 1001
SADD uid:1001:followers 1000
Command Key Value
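A minimal in-memory sketch of the same bookkeeping, assuming a plain dictionary of Python sets stands in for the Redis keys. The `sadd` and `follow` helpers are hypothetical names that only mimic the SADD semantics shown above.

```python
from collections import defaultdict

store = defaultdict(set)  # stand-in for Redis set keys, not a real client

def sadd(key, member):
    """Mimic SADD: return 1 if the member was newly added, 0 if already present."""
    if member in store[key]:
        return 0
    store[key].add(member)
    return 1

def follow(follower_id, followee_id):
    """Record one directed edge by updating both sides of the relationship."""
    sadd(f"uid:{follower_id}:following", followee_id)
    sadd(f"uid:{followee_id}:followers", follower_id)

follow(1000, 1001)
follow(1000, 1001)  # repeating the call is harmless because sets ignore duplicates
```

Keeping both a `following` and a `followers` set per user trades extra writes for cheap reads in either direction.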
69. Posting Updates
// Assumes the phpredis extension and a Redis server on localhost;
// $User and $status are supplied by the surrounding application.
$r = new Redis();
$r->connect('127.0.0.1', 6379);
$postid = $r->incr("global:nextPostId");
$post = $User['id'] . "|" . time() . "|" . $status;
$r->set("post:$postid", $post);
$followers = $r->sMembers("uid:" . $User['id'] . ":followers");
if ($followers === false) $followers = array();
$followers[] = $User['id']; /* Add the post to our own posts too */
foreach ($followers as $fid) {
    $r->lPush("uid:$fid:posts", $postid);
}
# Push the post on the timeline, and trim the timeline to the
# newest 1000 elements (indexes 0 through 999).
$r->lPush("global:timeline", $postid);
$r->lTrim("global:timeline", 0, 999);
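The posting flow above is a fan-out on write: each new post id is pushed onto the list of every follower. Below is a rough in-memory sketch of that flow, assuming dictionaries and deques in place of Redis keys. All names (`post_update`, `TIMELINE_MAX`, and so on) are illustrative and not part of any Redis client.

```python
from collections import defaultdict, deque
import itertools
import time

TIMELINE_MAX = 1000                    # keep only the newest 1000 post ids
next_post_id = itertools.count(1)      # stand-in for INCR global:nextPostId
posts = {}                             # stand-in for the post:<id> string keys
followers = defaultdict(set)           # stand-in for uid:<id>:followers
user_posts = defaultdict(deque)        # stand-in for uid:<id>:posts
global_timeline = deque(maxlen=TIMELINE_MAX)

def post_update(user_id, status):
    """Store the post, then push its id to the author and every follower."""
    post_id = next(next_post_id)
    posts[post_id] = f"{user_id}|{int(time.time())}|{status}"
    for fid in followers[user_id] | {user_id}:
        user_posts[fid].appendleft(post_id)      # newest first, like LPUSH
    # With maxlen set, appendleft discards the oldest id from the right end,
    # which plays the role of LPUSH followed by LTRIM.
    global_timeline.appendleft(post_id)
    return post_id

followers[1000].update({1001, 1002})
first = post_update(1000, "hello graphs")
```

Writes cost one push per follower, so reading a user's timeline is a single list read.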
70. Common followers? - Set intersections!
SINTER uid:1000:followers uid:1001:followers
Command Key 1 Key 2
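The intersection can be sketched with plain Python sets. The follower ids are made-up sample data, and `sinter` is a hypothetical helper that imitates the command rather than calling Redis.

```python
# Made-up follower sets for two users, keyed the way the slides key them.
followers = {
    "uid:1000:followers": {1001, 1002, 1003},
    "uid:1001:followers": {1000, 1002, 1003, 1004},
}

def sinter(*keys):
    """Mimic SINTER: a missing key counts as an empty set, so the result is empty."""
    sets = [followers.get(key, set()) for key in keys]
    return set.intersection(*sets) if sets else set()

common = sinter("uid:1000:followers", "uid:1001:followers")
```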
72. # Mutual Friends (edges that exist in both directions)
select f.friend_id
from friends f
join friends m
on m.user_id = f.friend_id
and m.friend_id = f.user_id
where f.user_id = 1234;
# Following (for directed graphs): people 1234 follows who do not follow back
select f.friend_id
from friends f
left join friends m
on m.user_id = f.friend_id
and m.friend_id = f.user_id
where f.user_id = 1234
and m.user_id is null;
# Followers (for directed graphs): people who follow 1234 and are not followed back
# (select from f, because every column of m is null for the unmatched rows)
select f.user_id
from friends f
left join friends m
on m.user_id = f.friend_id
and m.friend_id = f.user_id
where f.friend_id = 1234
and m.user_id is null;
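To try the three queries on a small dataset, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `friends` schema and the sample edges are assumptions, since the slides do not show the table definition. The followers query selects `f.user_id`, because the columns of `m` are NULL for unmatched rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per directed edge user_id -> friend_id.
conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
# Hypothetical edges: 1234 <-> 1 (mutual), 1234 -> 2 (one-way), 3 -> 1234 (one-way).
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)",
    [(1234, 1), (1, 1234), (1234, 2), (3, 1234)],
)

MUTUAL = """
    SELECT f.friend_id FROM friends f
    JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.user_id = ?
"""
FOLLOWING_ONLY = """
    SELECT f.friend_id FROM friends f
    LEFT JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.user_id = ? AND m.user_id IS NULL
"""
FOLLOWERS_ONLY = """
    SELECT f.user_id FROM friends f
    LEFT JOIN friends m ON m.user_id = f.friend_id AND m.friend_id = f.user_id
    WHERE f.friend_id = ? AND m.user_id IS NULL
"""

def ids(sql, user_id):
    """Run one query for the given user and return the matching ids, sorted."""
    return sorted(row[0] for row in conn.execute(sql, (user_id,)))

mutual = ids(MUTUAL, 1234)
following_only = ids(FOLLOWING_ONLY, 1234)
followers_only = ids(FOLLOWERS_ONLY, 1234)
```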
73. Not too bad.
74. Relational databases can work for the simplest
of cases, but are not always the best solution for
many graph operations/algorithms.
76. However, graph algorithms are hard.
So don’t write your own.
And make sure you use a persistent storage engine
that is best suited for the type of queries
you will be performing.
78. Resources
The Algorithm Design Manual, Steven S. Skiena
Programming Collective Intelligence, Toby Segaran
Introduction to Algorithms, Cormen, Leiserson, Rivest
80. Photo Credits
Graph of the internet, circa 2003: http://www.duniacyber.com/freebies/education/what-is-internet-lookslike/ (built from partial troll of public servers using traceroute)
My real friends for letting me use their Facebook profile images.
81. References
Large Scale Graph Algorithms (class lectures), Yuri Lifshits, Steklov Institute of Mathematics at St. Petersburg
http://mathworld.wolfram.com/Set.html
Programming Collective Intelligence, Toby Segaran
The Algorithm Design Manual, Steven S. Skiena