OQGraph @ SCaLE 11x 2013
1. OQGraph 3 for MariaDB
Graphs and Hierarchies in Plain SQL
http://goo.gl/gqr7b
Antony T Curtis <atcurtis@gmail.com>
graph@openquery.com
http://openquery.com/graph
2. Graphs / Networks
● Nodes connected by Edges.
● Edges may be directional.
● Edges may have a "weight" / "cost" attribute.
● Directed graphs may have bi-directional edges.
● Unconnected sets of nodes may exist on the same graph.
● There need not be a "root" node.
Examples:
● "Social Graphs" / friend relationships.
● Decision / State graphs.
● Airline routes
OQGRAPH computation engine © 2009-2013 Open Query
3. RDBMS with Hierarchies and Graphs
● Not always a particularly good fit.
● Various tree models exist; each with limitations:
○ Adjacency model
■ Either uses fixed max depth or recursive queries.
■ Oracle has CONNECT BY PRIOR
■ SQL99 has WITH RECURSIVE...UNION...
○ Nested set
■ complex
■ recursive queries to find path to root.
○ Materialised path
■ Ugly and not relational.
■ Can be quite effective when used correctly.
Further reading: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
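To make the adjacency-model point concrete, here is a minimal sketch of a WITH RECURSIVE walk from a node up to the root. It uses Python's built-in sqlite3 module only as a convenient SQL engine that supports recursive CTEs; the `tree` table, its rows and the `path_to_root` helper are hypothetical and have nothing to do with OQGRAPH itself.

```python
import sqlite3

# Adjacency model: each row stores a node and its parent (NULL for the root).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO tree (id, parent, name) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "a1"), (5, 4, "a1x")],
)

# Walk from a node up to the root; depth is unbounded, unlike a fixed
# number of self-joins.
ANCESTORS = """
WITH RECURSIVE path(id, parent, name, depth) AS (
    SELECT id, parent, name, 0 FROM tree WHERE id = ?
    UNION ALL
    SELECT t.id, t.parent, t.name, p.depth + 1
    FROM tree AS t JOIN path AS p ON t.id = p.parent
)
SELECT name FROM path ORDER BY depth DESC
"""

def path_to_root(node_id):
    """Return node names from the root down to node_id."""
    return [row[0] for row in conn.execute(ANCESTORS, (node_id,))]

print(" -> ".join(path_to_root(5)))  # root -> a -> a1 -> a1x
```

The recursion stops by itself when the join finds no parent row, which is what lets the adjacency model avoid a fixed maximum depth.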
4. What is OQGRAPH?
● Implemented as a storage engine.
○ Original concept by Arjen Lentz
● Mk. 2 implementation 2008
○ GPLv2+
○ Bundled with MariaDB 5.2+
○ Boost Graph Library
● Mk. 3 implementation
○ GPLv2+
○ Bundled with MariaDB 10.0 (soon)
● Easy to enable
○ INSTALL PLUGIN oqgraph SONAME 'ha_oqgraph';
5. OQGRAPH: A Computation Engine
● It is not a general purpose data engine.
○ unlike MyISAM, InnoDB or MEMORY.
● Looks like an ordinary table.
● Has a very different internal architecture.
● It does not operate in terms of
○ storing data for later retrieval.
○ having indexes on data.
● May be regarded as a "magic view" or "table function".
6. OQGRAPH: A Computation Engine
[Diagram: MySQL Server architecture. Communications, session and thread management; SQL interface (DDL, DML, views, stored procedure engine); SQL parser; query optimizer and execution engine; management services and utilities; caches and buffers; logging and runtime libraries. The storage engine layer (built-in and run-time loaded plugins) includes OQGraph alongside InnoDB.]
7. What's new in OQGRAPH 3
Features:
● Judy array bitmaps for graph coloring.
● Uses existing tables for edge data.
● Much lower memory cost per query.
● Does not impose any strict structure on the source table.
● Can handle significantly larger graphs than OQGRAPH v2.
○ 100K+ index reads per second are possible.
○ Millions of edges are possible.
● The graph's edges need not all fit in memory.
○ Only the Judy bitmap array must be held in RAM.
Notes:
● OQGRAPH tables are read-only; they only read from the backing table.
● The OQGRAPH table must be in the same schema as the backing table.
● The backing table must have appropriate indexes (on its origin and destination columns).
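A sketch of what "graph coloring" with a bitmap means during a traversal: one bit per node id records whether that node has already been visited, so the working memory grows with the number of nodes rather than with the number of edges. A Python int stands in for the Judy array here and `reachable_bitmap` is a hypothetical helper; this illustrates the idea only and is not OQGRAPH's internal code. The edges are those of the `foo` example table used later in the deck.

```python
from collections import deque

# Edges of the `foo` example table (directed: origid -> destid).
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def reachable_bitmap(edges, start):
    """Breadth-first search that marks ("colors") visited nodes in a bitmap.

    Bit n of the returned int is set when node n is reachable from start.
    A Python int is only a stand-in for a Judy bitmap array.
    """
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    visited = 1 << start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if not visited & (1 << nxt):  # bit for nxt not yet set
                visited |= 1 << nxt
                queue.append(nxt)
    return visited

print(bin(reachable_bitmap(EDGES, 4)))  # 0b1110000: nodes 4, 5 and 6
```

Only the bitmap has to stay in memory for the whole search; the adjacency lookups could just as well be index reads against a backing table.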
8. Anatomy of an OQGRAPH 3 table
CREATE TABLE db.tblname (
latch SMALLINT UNSIGNED NULL,
origid BIGINT UNSIGNED NULL,
destid BIGINT UNSIGNED NULL,
weight DOUBLE NULL,
seq BIGINT UNSIGNED NULL,
linkid BIGINT UNSIGNED NULL,
KEY (latch, origid, destid) USING HASH,
KEY (latch, destid, origid) USING HASH
) ENGINE=OQGRAPH
data_table='link' -- data table
origid='source' -- column name
destid='target' -- column name
weight='weight'; -- optional column name
9. OQGRAPH - Data source
● Edges are directed.
● Edge weights are optional and default to 1.0.
● Undirected edges may be represented as two directed
edges, in opposite directions.
CREATE TABLE foo (
origid INT UNSIGNED NOT NULL,
destid INT UNSIGNED NOT NULL,
PRIMARY KEY(origid, destid),
KEY (destid)
);
INSERT INTO foo (origid,destid) VALUES
(1,2), (2,3), (2,4),
(4,5), (3,6), (5,6);
10. OQGRAPH - Data source, cont.
Creating the OQGRAPH table:
CREATE TABLE foo_graph (
latch SMALLINT UNSIGNED NULL,
origid BIGINT UNSIGNED NULL,
destid BIGINT UNSIGNED NULL,
weight DOUBLE NULL,
seq BIGINT UNSIGNED NULL,
linkid BIGINT UNSIGNED NULL,
KEY (latch, origid, destid) USING HASH,
KEY (latch, destid, origid) USING HASH
) ENGINE=OQGRAPH
data_table='foo' origid='origid' destid='destid';
11. Selecting Edges
MariaDB [foo]> select * from foo_graph;
+-------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq | linkid |
+-------+--------+--------+--------+------+--------+
| NULL | 1 | 2 | 1 | NULL | NULL |
| NULL | 2 | 3 | 1 | NULL | NULL |
| NULL | 2 | 4 | 1 | NULL | NULL |
| NULL | 3 | 6 | 1 | NULL | NULL |
| NULL | 4 | 5 | 1 | NULL | NULL |
| NULL | 5 | 6 | 1 | NULL | NULL |
+-------+--------+--------+--------+------+--------+
6 rows in set (0.38 sec)
12. Now, it's time for some magic.
(shortest path calculation)
● SELECT * FROM foo_graph
WHERE latch=1 AND origid=1 AND destid=6;
+-------+--------+--------+--------+------+--------+
| latch | origid | destid | weight | seq | linkid |
+-------+--------+--------+--------+------+--------+
| 1 | 1 | 6 | NULL | 0 | 1 |
| 1 | 1 | 6 | 1 | 1 | 2 |
| 1 | 1 | 6 | 1 | 2 | 3 |
| 1 | 1 | 6 | 1 | 3 | 6 |
+-------+--------+--------+--------+------+--------+
● SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
FROM foo_graph WHERE latch=1 AND origid=1 AND destid=6\G
path: 1,2,3,6
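The latch=1 result can be reproduced outside the database with a textbook Dijkstra search over the same six edges. This is a minimal sketch, assuming unit weights (the default when no weight column is configured) and assuming the per-row `weight` is the weight of the edge used to reach that row's `linkid`, which matches the output shown above. `shortest_path_rows` is a hypothetical name; the tuples follow the column order (latch, origid, destid, weight, seq, linkid).

```python
import heapq

EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def shortest_path_rows(edges, origid, destid, default_weight=1.0):
    """Dijkstra's algorithm, returning rows shaped like a latch=1 result."""
    adj = {}
    for src, dst, *w in edges:  # optional third element is the weight
        adj.setdefault(src, []).append((dst, w[0] if w else default_weight))
    dist = {origid: 0.0}
    prev = {}  # node -> (predecessor, weight of the edge used)
    heap = [(0.0, origid)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == destid:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, w)
                heapq.heappush(heap, (nd, nxt))
    if destid not in dist:
        return []  # unreachable: no rows
    hops, node = [], destid
    while node != origid:  # walk predecessors back to the origin
        p, w = prev[node]
        hops.append((node, w))
        node = p
    hops.reverse()
    rows = [(1, origid, destid, None, 0, origid)]
    for seq, (node, w) in enumerate(hops, start=1):
        rows.append((1, origid, destid, w, seq, node))
    return rows

rows = shortest_path_rows(EDGES, 1, 6)
print(",".join(str(r[5]) for r in rows))  # 1,2,3,6
```

Joining the `linkid` values in `seq` order is the Python counterpart of the GROUP_CONCAT query.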
13. Other computations
● Which nodes have a path to node 4?
SELECT GROUP_CONCAT(linkid) AS list
FROM foo_graph WHERE latch=1 AND destid=4\G
list: 1,2,4
● Where can I get to from node 4?
SELECT GROUP_CONCAT(linkid) AS list
FROM foo_graph WHERE latch=1 AND origid=4\G
list: 6,5,4
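Both queries are reachability questions: with only `destid` given, the result lists nodes that have a path to node 4; with only `origid` given, nodes reachable from node 4. A minimal breadth-first sketch, assuming the `foo` edges and a hypothetical `reachable` helper that reverses edge direction for the first case. It compares sets, since the sketch does not try to reproduce the row order OQGRAPH returns.

```python
from collections import deque

EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]

def reachable(edges, start, reverse=False):
    """Nodes reachable from start; with reverse=True, nodes that can reach start."""
    adj = {}
    for src, dst in edges:
        if reverse:
            src, dst = dst, src  # follow edges backwards
        adj.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable(EDGES, 4, reverse=True)))  # [1, 2, 4]  (destid=4)
print(sorted(reachable(EDGES, 4)))                # [4, 5, 6]  (origid=4)
```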
14. Other computations, continued.
● See the docs for latch 0 and latch NULL.
● latch 1 : Dijkstra's shortest path.
○ O((V + E) log V)
● latch 2 : Breadth-first search.
○ O(V + E)
● Other algorithms are possible.
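The practical difference between latch 1 and latch 2 appears once weights are not all equal: breadth-first search minimizes the number of hops, while Dijkstra minimizes the total weight. A sketch of both on a hypothetical four-node weighted graph (not taken from the slides); the helper names are illustrative and this is not OQGRAPH's code.

```python
import heapq
from collections import deque

# Hypothetical weighted edges (origid, destid, weight).
EDGES = [(1, 2, 10.0), (1, 3, 1.0), (3, 4, 1.0), (4, 2, 1.0)]

def neighbours(edges):
    adj = {}
    for src, dst, w in edges:
        adj.setdefault(src, []).append((dst, w))
    return adj

def unwind(prev, destid):
    """Rebuild the path from the predecessor map (empty if unreachable)."""
    if destid not in prev:
        return []
    path, node = [], destid
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]

def bfs_path(edges, origid, destid):
    """Fewest hops; weights are ignored. O(V + E)."""
    adj = neighbours(edges)
    prev = {origid: None}
    queue = deque([origid])
    while queue:
        node = queue.popleft()
        if node == destid:
            break
        for nxt, _ in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return unwind(prev, destid)

def dijkstra_path(edges, origid, destid):
    """Lowest total weight. O((V + E) log V) with a binary heap."""
    adj = neighbours(edges)
    dist, prev = {origid: 0.0}, {origid: None}
    heap = [(0.0, origid)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == destid:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in adj.get(node, []):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = d + w, node
                heapq.heappush(heap, (d + w, nxt))
    return unwind(prev, destid)

print(bfs_path(EDGES, 1, 2))       # [1, 2]        one hop, weight 10
print(dijkstra_path(EDGES, 1, 2))  # [1, 3, 4, 2]  three hops, weight 3
```

With unit weights, as in the `foo` example, both searches return paths of the same length.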
15. Joins make it prettier
● INSERT INTO people VALUES
(1,'pearce'), (2,'hunnicut'), (3,'potter'),
(4,'hoolihan'), (5,'winchester'), (6,'mulcahy');
● SELECT GROUP_CONCAT(name ORDER BY seq) path
FROM foo_graph
JOIN people ON (foo_graph.linkid = people.id)
WHERE latch=1 AND origid=1 AND destid=6\G
path: pearce,hunnicut,potter,mulcahy
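What the join does can be emulated in a few lines: take the (seq, linkid) rows of the shortest path, look each `linkid` up in `people`, and keep them in `seq` order. This sketch uses Python's sqlite3 with a hypothetical `path_rows` table standing in for the OQGRAPH result rows; the ordering is done with ORDER BY and the concatenation in Python, rather than relying on ordered GROUP_CONCAT support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [
    (1, "pearce"), (2, "hunnicut"), (3, "potter"),
    (4, "hoolihan"), (5, "winchester"), (6, "mulcahy"),
])
# Hypothetical stand-in for the latch=1 rows (seq, linkid) for origid=1, destid=6.
conn.execute("CREATE TABLE path_rows (seq INTEGER, linkid INTEGER)")
conn.executemany("INSERT INTO path_rows VALUES (?, ?)",
                 [(0, 1), (1, 2), (2, 3), (3, 6)])

names = [row[0] for row in conn.execute(
    "SELECT p.name FROM path_rows AS r JOIN people AS p ON r.linkid = p.id "
    "ORDER BY r.seq")]
print(",".join(names))  # pearce,hunnicut,potter,mulcahy
```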
16. Tree of Life
Load the tol.sql schema.
Create the tol_link backing store table:
CREATE TABLE tol_link (
source INT UNSIGNED NOT NULL,
target INT UNSIGNED NOT NULL,
PRIMARY KEY (source, target),
KEY (target) ) ENGINE=innodb;
Populate it with all the edges we need:
INSERT INTO tol_link (source,target)
SELECT parent,id FROM tol WHERE parent IS NOT NULL
UNION ALL
SELECT id,parent FROM tol WHERE parent IS NOT NULL;
Query OK, 178102 rows affected (46.35 sec)
Records: 178102 Duplicates: 0 Warnings: 0
Direct download: http://bazaar.launchpad.net/~openquery-core/oqgraph/trunk/view/head:/examples/tree-of-life/tol.sql
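The INSERT ... SELECT above turns each parent pointer into two directed edges, the same technique slide 9 describes for undirected graphs, so the 178,102 rows correspond to 89,051 parent links. A sketch of that statement on a hypothetical four-row miniature of `tol`, run through Python's sqlite3; the ids are made up, and the names are borrowed from the path on slide 18.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the tol table: each node points at its parent.
conn.execute("CREATE TABLE tol (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)")
conn.executemany("INSERT INTO tol VALUES (?, ?, ?)", [
    (1, None, "Life on Earth"), (2, 1, "Eukaryotes"),
    (3, 2, "Unikonts"), (4, 3, "Opisthokonts"),
])
conn.execute("""CREATE TABLE tol_link (
    source INTEGER NOT NULL, target INTEGER NOT NULL,
    PRIMARY KEY (source, target))""")
conn.execute("CREATE INDEX tol_link_target ON tol_link (target)")

# Same statement as on the slide: one edge per direction for every parent link.
conn.execute("""INSERT INTO tol_link (source, target)
    SELECT parent, id FROM tol WHERE parent IS NOT NULL
    UNION ALL
    SELECT id, parent FROM tol WHERE parent IS NOT NULL""")

count = conn.execute("SELECT COUNT(*) FROM tol_link").fetchone()[0]
print(count)  # 6: two directed edges for each of the 3 non-root nodes
```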
17. Tree of Life, cont.
Creating the OQGRAPH table:
CREATE TABLE tol_tree (
latch SMALLINT UNSIGNED NULL,
origid BIGINT UNSIGNED NULL,
destid BIGINT UNSIGNED NULL,
weight DOUBLE NULL,
seq BIGINT UNSIGNED NULL,
linkid BIGINT UNSIGNED NULL,
KEY (latch, origid, destid) USING HASH,
KEY (latch, destid, origid) USING HASH
) ENGINE=OQGRAPH
data_table='tol_link' origid='source' destid='target';
18. Tree of Life - finding H.Sapiens
SELECT
GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
FROM tol_tree JOIN tol ON (linkid=id)
WHERE latch=1 AND origid=1 AND destid=16421\G
path: Life on Earth -> Eukaryotes -> Unikonts ->
Opisthokonts -> Animals -> Bilateria ->
Deuterostomia -> Chordata -> Craniata -> Vertebrata
-> Gnathostomata -> Teleostomi -> Osteichthyes ->
Sarcopterygii -> Terrestrial Vertebrates ->
Tetrapoda -> Reptiliomorpha -> Amniota -> Synapsida
-> Eupelycosauria -> Sphenacodontia ->
Sphenacodontoidea -> Therapsida -> Theriodontia ->
Cynodontia -> Mammalia -> Eutheria -> Primates ->
Catarrhini -> Hominidae -> Homo -> Homo sapiens
19. Internet Movie DataBase (IMDB)
Transform and load the movie database (this takes a long time):
CREATE TABLE `entity` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` enum('ACTOR','MOVIE','TV MOVIE','TV MINI','TV SERIES','VIDEO MOVIE','VIDEO GAME','VOICE','ARCHIVE') NOT NULL,
`name` varchar(128) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `type` (`type`,`name`) USING BTREE
) ENGINE=InnoDB;
CREATE TABLE `link` (
`rel_id` int(11) NOT NULL AUTO_INCREMENT,
`link_from` int(11) NOT NULL,
`link_to` int(11) NOT NULL,
PRIMARY KEY (`rel_id`),
KEY `link_from` (`link_from`,`link_to`),
KEY `link_to` (`link_to`)
) ENGINE=InnoDB;
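In this schema actors and movies are both `entity` rows and `link` rows connect them, so a "degrees of separation" path alternates actor, movie, actor. With every edge weighted equally, Dijkstra (latch=1) finds a path with the fewest hops, which is also what a plain breadth-first search finds. A sketch, assuming links may be followed in either direction and using names instead of integer ids as node keys; the first four links are read off the result path shown on the following slides, and the placeholder entities are made up.

```python
from collections import deque

# Hypothetical actor/movie links. Each link is traversable in both directions.
LINKS = [
    ("Kevin Bacon", "The Air Up There (1994)"),
    ("Fanyana H. Sidumo", "The Air Up There (1994)"),
    ("Fanyana H. Sidumo", "The Gods Must Be Crazy (1981)"),
    ("N!xau", "The Gods Must Be Crazy (1981)"),
    ("Kevin Bacon", "Placeholder Movie A"),
    ("Placeholder Actor B", "Placeholder Movie A"),
]

def degrees_path(links, start, goal):
    """Breadth-first search over the bipartite actor/movie graph."""
    adj = {}
    for actor, movie in links:
        adj.setdefault(actor, []).append(movie)
        adj.setdefault(movie, []).append(actor)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk predecessors back to start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return []

path = degrees_path(LINKS, "Kevin Bacon", "N!xau")
print(" -> ".join(path))
print("degrees:", (len(path) - 1) // 2)  # actor-to-actor steps
```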
21. Degrees of N!xau
Graph of movies: approximately 3.7 million nodes and 9 million edges. Tables are about 1 GB, with the InnoDB buffer pool configured at 512 MB.
MariaDB [imdb]> SELECT
-> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
-> FROM movie_graph JOIN entity ON (id=linkid)
-> WHERE latch=1
-> AND origid=(SELECT a.id FROM entity a
-> WHERE name='Kevin Bacon')
-> AND destid=(SELECT b.id FROM entity b
-> WHERE name='N!xau')\G
*************************** 1. row ***************************
path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo ->
The Gods Must Be Crazy (1981) -> N!xau
1 row in set (3 min 9.67 sec)
-- Same query, run again:
*************************** 1. row ***************************
path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo ->
The Gods Must Be Crazy (1981) -> N!xau
1 row in set (1 min 7.13 sec)
Each query requires approximately 7.8 million secondary key reads.
22. Degrees of N!xau
Graph of approximately 3.7 million nodes and 30 million edges. Tables are about 3.5 GB, with the InnoDB buffer pool configured at 512 MB.
MariaDB [imdb]> SELECT
-> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
-> FROM imdb_graph JOIN entity ON (id=linkid)
-> WHERE latch=1
-> AND origid=(SELECT a.id FROM entity a
-> WHERE name='Kevin Bacon')
-> AND destid=(SELECT b.id FROM entity b
-> WHERE name='N!xau')\G
*************************** 1. row ***************************
path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) ->
Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid
(1993) -> N!xau
1 row in set (10 min 6.55 sec)
-- Same query, run again:
*************************** 1. row ***************************
path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) ->
Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid
(1993) -> N!xau
1 row in set (8 min 29.66 sec)
Each query requires approximately 16.6 million secondary key reads.
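The timings and read counts on slides 21 and 22 give a rough throughput figure. A small worked calculation, using only the numbers printed on those slides and a hypothetical `reads_per_second` helper:

```python
def reads_per_second(reads, minutes, seconds):
    """Average secondary key reads per second over one query run."""
    return reads / (minutes * 60 + seconds)

# (label, secondary key reads, minutes, seconds) as reported on the slides.
RUNS = [
    ("9M edges, first run", 7.8e6, 3, 9.67),
    ("9M edges, second run", 7.8e6, 1, 7.13),
    ("30M edges, first run", 16.6e6, 10, 6.55),
    ("30M edges, second run", 16.6e6, 8, 29.66),
]

for label, reads, m, s in RUNS:
    print(f"{label}: {reads_per_second(reads, m, s):,.0f} reads/s")
```

This gives roughly 41K and 116K reads/s for the 9-million-edge graph (first and second run) and roughly 27K and 33K reads/s for the 30-million-edge graph. The warm second run on the smaller graph is consistent with the "100K+ index reads per second" claim on slide 7; the larger graph's 3.5 GB of tables cannot fit in the 512 MB buffer pool, which plausibly explains its lower rate.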
23. We want your feedback!
● Very easy to use...
But do feel free to ask us for help/advice.
● Open Query created friendlist_graph for Drupal 6.
○ Currently based on OQGraph v2
○ Addition to the existing friendlist module.
○ Enables easy social networking in Drupal.
○ Peter Lieverdink (@cafuego) did this in about 30 minutes
● We would like to know how you are using OQGRAPH!
○ You could be doing something really cool...
24. Links and support
● Binaries & Packages
○ http://mariadb.com (MariaDB 10.0 soon)
● Source collaboration
○ https://launchpad.net/oqgraph
○ https://code.launchpad.net/~oqgraph-dev/maria/10.0-oqgraph3
● Info, Docs, Support, Licensing, Engineering
○ http://openquery.com/graph
○ This presentation: http://goo.gl/gqr7b
Thank you!
Antony Curtis & Arjen Lentz
graph@openquery.com