OQGraph 3 for MariaDB
   Graphs and Hierarchies in Plain SQL
             http://goo.gl/gqr7b




Antony T Curtis <atcurtis@gmail.com>


                        graph@openquery.com
                        http://openquery.com/graph
Graphs / Networks
     ● Nodes connected by Edges.
     ● Edges may be directional.
     ● Edges may have a "weight" / "cost" attribute.
     ● Directed graphs may have bi-directional edges.
     ● Unconnected sets of nodes may exist on same graph.
     ● There need not be a "root" node.




   Examples:
    ● "Social Graphs" / friend relationships.
    ● Decision / State graphs.
    ● Airline routes
OQGRAPH computation engine © 2009-2013 Open Query
RDBMS with Hierarchies and Graphs

     ● Not always a particularly good fit.
     ● Various tree models exist; each with limitations:
        ○ Adjacency model
           ■ Either uses fixed max depth or recursive queries.
           ■ Oracle has CONNECT BY PRIOR
           ■ SQL99 has WITH RECURSIVE...UNION...
        ○ Nested set
           ■ complex
           ■ recursive queries to find path to root.
        ○ Materialised path
           ■ Ugly and not relational.
           ■ Can be quite effective when used correctly.

                                              Further reading: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
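
As a minimal sketch of the adjacency model queried with WITH RECURSIVE, the following uses Python's built-in sqlite3 module. SQLite is an assumption made only to keep the example self-contained (the slides target MariaDB), and the `category` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical adjacency-model table: each row stores the id of its parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category (id, parent, name) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "animals"), (3, 2, "mammals"), (4, 3, "primates")],
)


def path_to_root(conn, node_id):
    """Return the names from the root down to node_id using WITH RECURSIVE."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id, parent, name, depth) AS (
            SELECT id, parent, name, 0 FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.parent, c.name, a.depth + 1
            FROM category AS c JOIN ancestors AS a ON c.id = a.parent
        )
        SELECT name FROM ancestors ORDER BY depth DESC
        """,
        (node_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(" -> ".join(path_to_root(conn, 4)))  # root -> animals -> mammals -> primates
```

The query climbs one parent per recursion step, so its cost grows with the depth of the hierarchy; this is the kind of work the rest of the deck hands to a graph engine instead.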

What is OQGRAPH?

    ● Implemented as a storage engine.
       ○ Original concept by Arjen Lentz
    ● Mk. 2 implementation 2008
       ○ GPLv2+
       ○ Bundled with MariaDB 5.2+
       ○ Boost Graph Library
    ● Mk. 3 implementation
       ○ GPLv2+
       ○ Bundled with MariaDB 10.0 (soon)
    ● Easy to enable
         ○ INSTALL PLUGIN oqgraph SONAME 'ha_oqgraph';




OQGRAPH: A Computation Engine

     ● It is not a general purpose data engine.
        ○ unlike MyISAM, InnoDB or MEMORY.
     ● Looks like an ordinary table.
     ● Has a very different internal architecture.
     ● It does not operate in terms of
        ○ storing data for later retrieval.
        ○ having indexes on data.

     ● May be regarded as a "magic view" or "table function".




OQGRAPH: A Computation Engine

   [Diagram: MySQL Server architecture. Blocks shown: Communications, Session
   and Thread Management; Management Services, Logging, Utilities and Runtime
   Libraries; DDL, DML, Tables, Views, Lock Management; SQL Parser and SQL
   Stored Procedure Engine; Buffers and Caches; Query Optimizer and Execution
   Engine; built-in and run-time loaded plug-ins, including OQGraph and InnoDB.]




What's new in OQGRAPH 3
   Features:
    ● Judy array bitmaps for Graph coloring.
    ● Uses existing tables for edge data.
    ● Much lower memory cost per query.
    ● Does not impose any strict structure on the source table.
    ● Can handle significantly larger graphs than OQGRAPHv2.
       ○ 100K+ index reads per second are possible.
       ○ Millions of edges are possible.
    ● All edges of graph need not fit in memory.
       ○ Only Judy bitmap array must be held in RAM.
   Notes:
    ● OQGRAPH tables are read-only; they only read from the backing table.
    ● The OQGRAPH table must be in the same schema as the backing table.
    ● The table must have appropriate indexes.
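
To illustrate why only a bitmap needs to stay in RAM, here is a rough sketch of a graph walk that keeps one bit per node and fetches edges on demand. It is not OQGRAPH's implementation: the `Bitmap` class is a plain bytearray standing in for a Judy array, and the `neighbours` callback is a hypothetical stand-in for an index lookup on the backing table. The edges are the sample `foo` rows used later in these slides.

```python
from collections import deque


class Bitmap:
    """Fixed-size bitmap, one bit per node id (a simple stand-in for a Judy array)."""

    def __init__(self, size):
        self.bits = bytearray((size + 7) // 8)

    def test(self, i):
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def set(self, i):
        self.bits[i >> 3] |= 1 << (i & 7)


def reachable(start, neighbours, max_id):
    """Breadth-first walk that holds only a bitmap and a queue in memory.

    `neighbours(node)` is a hypothetical callback that returns the destination
    ids for one origin id, as an index read on the backing table would.
    """
    seen = Bitmap(max_id + 1)
    seen.set(start)
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in neighbours(node):
            if not seen.test(nxt):
                seen.set(nxt)
                queue.append(nxt)
    return order


# Sample edges keyed by origid; the dict plays the role of the backing table.
EDGES = {1: [2], 2: [3, 4], 3: [6], 4: [5], 5: [6]}
print(reachable(1, lambda n: EDGES.get(n, []), max_id=6))  # [1, 2, 3, 4, 6, 5]
```

Memory use here is one bit per possible node id plus the queue, independent of how many edges the backing store holds.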

Anatomy of an OQGRAPH 3 table
   CREATE TABLE db.tblname (
     latch SMALLINT UNSIGNED NULL,
     origid BIGINT UNSIGNED NULL,
     destid BIGINT UNSIGNED NULL,
     weight DOUBLE NULL,
     seq BIGINT UNSIGNED NULL,
     linkid BIGINT UNSIGNED NULL,
     KEY (latch, origid, destid) USING HASH,
     KEY (latch, destid, origid) USING HASH
   ) ENGINE=OQGRAPH
     data_table='link'       -- data table
     origid='source'         -- column name
     destid='target'         -- column name
     weight='weight';        -- optional column name

OQGRAPH - Data source
     ● Edges are directed edges.
     ● Edge weights are optional and default to 1.0.
     ● Undirected edges may be represented as two directed
       edges, in opposite directions.

   CREATE TABLE foo (
      origid INT UNSIGNED NOT NULL,
      destid INT UNSIGNED NOT NULL,
      PRIMARY KEY(origid, destid),
      KEY (destid)
   );
   INSERT INTO foo (origid,destid) VALUES
   (1,2), (2,3), (2,4),
   (4,5), (3,6), (5,6);
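
The point about undirected edges can be sketched outside SQL as well. The helper below is illustrative only (the name `as_directed` is made up for this example); it expands each undirected edge into the two directed rows that would be stored in the backing table.

```python
def as_directed(undirected_edges):
    """Expand each undirected edge (a, b) into the rows (a, b) and (b, a)."""
    directed = set()
    for a, b in undirected_edges:
        directed.add((a, b))
        directed.add((b, a))
    return sorted(directed)


# The six edges inserted into `foo`, here treated as undirected.
FOO = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]
rows = as_directed(FOO)
print(len(rows))  # 12: one row per direction
print(rows[:4])   # [(1, 2), (2, 1), (2, 3), (2, 4)]
```

Using a set means an edge that is already listed in both directions is not duplicated, which matters when (origid, destid) is the primary key.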


OQGRAPH - Data source, cont.

   Creating the OQGRAPH table:
   CREATE TABLE foo_graph (
     latch SMALLINT UNSIGNED NULL,
     origid BIGINT UNSIGNED NULL,
     destid BIGINT UNSIGNED NULL,
     weight DOUBLE NULL,
     seq BIGINT UNSIGNED NULL,
     linkid BIGINT UNSIGNED NULL,
     KEY (latch, origid, destid) USING HASH,
     KEY (latch, destid, origid) USING HASH
   ) ENGINE=OQGRAPH
     data_table='foo' origid='origid' destid='destid';




Selecting Edges

   MariaDB [foo]> select * from foo_graph;
   +-------+--------+--------+--------+------+--------+
   | latch | origid | destid | weight | seq  | linkid |
   +-------+--------+--------+--------+------+--------+
   |  NULL |      1 |      2 |      1 | NULL |   NULL |
   |  NULL |      2 |      3 |      1 | NULL |   NULL |
   |  NULL |      2 |      4 |      1 | NULL |   NULL |
   |  NULL |      3 |      6 |      1 | NULL |   NULL |
   |  NULL |      4 |      5 |      1 | NULL |   NULL |
   |  NULL |      5 |      6 |      1 | NULL |   NULL |
   +-------+--------+--------+--------+------+--------+
   6 rows in set (0.38 sec)




Now, it's time for some magic.
   (shortest path calculation)

      ● SELECT * FROM foo_graph
        WHERE latch=1 AND origid=1 AND destid=6;
        +-------+--------+--------+--------+------+--------+
        | latch | origid | destid | weight | seq  | linkid |
        +-------+--------+--------+--------+------+--------+
        |     1 |      1 |      6 |   NULL |    0 |      1 |
        |     1 |      1 |      6 |      1 |    1 |      2 |
        |     1 |      1 |      6 |      1 |    2 |      3 |
        |     1 |      1 |      6 |      1 |    3 |      6 |
        +-------+--------+--------+--------+------+--------+


      ● SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
        FROM foo_graph WHERE latch=1 AND origid=1 AND destid=6 \G
        path: 1,2,3,6
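
For readers who want to see what the latch=1 query computes, here is a small sketch of Dijkstra's algorithm over the same six `foo` edges, with every weight set to 1.0. It is an independent illustration, not the engine's code; the function name and the (seq, linkid, weight) tuple layout are chosen to mirror the result table above.

```python
import heapq

# Edges from the sample table `foo`; each weight defaults to 1.0.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def shortest_path_rows(edges, origid, destid):
    """Dijkstra's algorithm; returns (seq, linkid, weight) tuples along the path."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append((dst, 1.0))

    dist = {origid: 0.0}
    prev = {}
    heap = [(0.0, origid)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == destid:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in adjacency.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, w)
                heapq.heappush(heap, (nd, nxt))

    if destid not in dist:
        return []  # no path exists

    # Walk back from the destination, then reverse into path order.
    steps = []
    node = destid
    while node != origid:
        parent, w = prev[node]
        steps.append((node, w))
        node = parent
    steps.append((origid, None))  # the first row carries no edge weight
    steps.reverse()
    return [(seq, linkid, w) for seq, (linkid, w) in enumerate(steps)]


rows = shortest_path_rows(EDGES, 1, 6)
print(",".join(str(linkid) for _, linkid, _ in rows))  # 1,2,3,6
```

The alternative route 1,2,4,5,6 has four edges, so the three-edge route through node 3 wins.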


Other computations,
     ● Which nodes have a path to node 4?
        SELECT GROUP_CONCAT(linkid) AS list
        FROM foo_graph WHERE latch=1 AND destid=4 \G

        list: 1,2,4


     ● Where can I get to from node 4?
        SELECT GROUP_CONCAT(linkid) AS list
        FROM foo_graph WHERE latch=1 AND origid=4 \G

        list: 6,5,4
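
The two lists above can be reproduced with a plain reachability walk. The sketch below is an illustration with made-up helper names, not the engine's code: one function follows edges forwards, and the other reuses it on reversed edges. Results are sorted, whereas GROUP_CONCAT without ORDER BY returns them in whatever order the engine produces.

```python
from collections import deque

# Edges from the sample table `foo`.
EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def reachable_from(edges, start):
    """Nodes reachable from `start` by following edges forwards (start included)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


def leads_to(edges, target):
    """Nodes with a path to `target`: the same walk over reversed edges."""
    return reachable_from([(dst, src) for src, dst in edges], target)


print(leads_to(EDGES, 4))        # [1, 2, 4]
print(reachable_from(EDGES, 4))  # [4, 5, 6]
```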




Other computations, continued.

     ● See docs for latch 0 and latch NULL
     ● latch 1 : Dijkstra's shortest path.
        ○ O((V + E) log V)
     ● latch 2 : Breadth-first search.
        ○ O(V+E)
     ● Other algorithms possible
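
To make the difference between the two algorithms concrete, here is a sketch that runs both on a small weighted graph. The graph is hypothetical (it does not come from the slides), and the code shows the textbook algorithms in general rather than how OQGRAPH implements its latch values. Breadth-first search minimises the number of edges; Dijkstra minimises total weight.

```python
import heapq
from collections import deque

# Hypothetical weighted edges (src, dst, weight); not taken from the slides.
EDGES = [("A", "B", 10.0), ("A", "C", 1.0), ("C", "D", 1.0), ("D", "B", 1.0)]


def build(edges):
    adjacency = {}
    for src, dst, w in edges:
        adjacency.setdefault(src, []).append((dst, w))
    return adjacency


def walk_back(prev, start, goal):
    """Rebuild the path by following predecessor links from goal to start."""
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


def bfs_path(edges, start, goal):
    """Fewest edges, ignoring weights: O(V + E)."""
    adjacency, prev, queue = build(edges), {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return walk_back(prev, start, goal)
        for nxt, _ in adjacency.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def dijkstra_path(edges, start, goal):
    """Lowest total weight, using a binary heap: O((V + E) log V)."""
    adjacency, dist, prev = build(edges), {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return walk_back(prev, start, goal)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, w in adjacency.get(node, []):
            if d + w < dist.get(nxt, float("inf")):
                dist[nxt] = d + w
                prev[nxt] = node
                heapq.heappush(heap, (d + w, nxt))
    return None


print(bfs_path(EDGES, "A", "B"))       # ['A', 'B']
print(dijkstra_path(EDGES, "A", "B"))  # ['A', 'C', 'D', 'B']
```

When every weight is 1.0, as in the `foo` example, the two functions find paths of the same length.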




Joins make it prettier,
     ● INSERT INTO people VALUES
       (1,'pearce'), (2,'hunnicut'), (3,'potter'),
       (4,'hoolihan'), (5,'winchester'), (6,'mulcahy');


     ● SELECT GROUP_CONCAT(name ORDER BY seq) path
       FROM foo_graph
       JOIN people ON (foo_graph.linkid = people.id)
       WHERE latch=1 AND origid=1 AND destid=6 \G

        path: pearce,hunnicut,potter,mulcahy


Tree of Life
 Load the tol.sql schema.

 Create the tol_link backing store table:
 CREATE TABLE tol_link (
   source INT UNSIGNED NOT NULL,
   target INT UNSIGNED NOT NULL,
   PRIMARY KEY (source, target),
   KEY (target) ) ENGINE=innodb;

 Populate it with all the edges we need:
 INSERT INTO tol_link (source,target)
 SELECT parent,id FROM tol WHERE parent IS NOT NULL
 UNION ALL
 SELECT id,parent FROM tol WHERE parent IS NOT NULL;
 Query OK, 178102 rows affected (46.35 sec)
 Records: 178102 Duplicates: 0 Warnings: 0

                 Direct download: http://bazaar.launchpad.net/~openquery-core/oqgraph/trunk/view/head:/examples/tree-of-life/tol.sql

Tree of Life, cont.

   Creating the OQGRAPH table:
   CREATE TABLE tol_tree (
     latch SMALLINT UNSIGNED NULL,
     origid BIGINT UNSIGNED NULL,
     destid BIGINT UNSIGNED NULL,
     weight DOUBLE NULL,
     seq BIGINT UNSIGNED NULL,
     linkid BIGINT UNSIGNED NULL,
     KEY (latch, origid, destid) USING HASH,
     KEY (latch, destid, origid) USING HASH
   ) ENGINE=OQGRAPH
     data_table='tol_link' origid='source' destid='target';




Tree of Life - finding H.Sapiens

   SELECT
      GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
      FROM tol_tree JOIN tol ON (linkid=id)
      WHERE latch=1 AND origid=1 AND destid=16421 \G

   path: Life on Earth -> Eukaryotes -> Unikonts ->
   Opisthokonts -> Animals -> Bilateria ->
   Deuterostomia -> Chordata -> Craniata -> Vertebrata
   -> Gnathostomata -> Teleostomi -> Osteichthyes ->
   Sarcopterygii -> Terrestrial Vertebrates ->
   Tetrapoda -> Reptiliomorpha -> Amniota -> Synapsida
   -> Eupelycosauria -> Sphenacodontia ->
   Sphenacodontoidea -> Therapsida -> Theriodontia ->
   Cynodontia -> Mammalia -> Eutheria -> Primates ->
   Catarrhini -> Hominidae -> Homo -> Homo sapiens

OQGRAPH computation engine © 2009-2011 Open Query
Internet Movie DataBase (IMDB)
 Transform and load the movie database (this takes a long time)
 CREATE TABLE `entity` (
   `id` int(11) NOT NULL AUTO_INCREMENT,
   `type` enum('ACTOR','MOVIE','TV MOVIE','TV MINI','TV SERIES','VIDEO MOVIE',
               'VIDEO GAME','VOICE','ARCHIVE') NOT NULL,
   `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `type` (`type`,`name`) USING BTREE
 ) ENGINE=InnoDB;

 CREATE TABLE `link` (
   `rel_id` int(11) NOT NULL AUTO_INCREMENT,
   `link_from` int(11) NOT NULL,
   `link_to` int(11) NOT NULL,
   PRIMARY KEY (`rel_id`),
   KEY `link_from` (`link_from`,`link_to`),
   KEY `link_to` (`link_to`)
 ) ENGINE=InnoDB;




Degrees of N!xau
 Graph of movies: approximately 3.7 million nodes with 9 million edges. Tables are
 about 1 GB and InnoDB is configured with a 512 MB buffer pool.
 MariaDB [imdb]> SELECT
                -> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
                -> FROM movie_graph JOIN entity ON (id=linkid)
                -> WHERE latch=1
                -> AND origid=(SELECT a.id FROM entity a
                ->               WHERE name='Kevin Bacon')
                -> AND destid=(SELECT b.id FROM entity b
                ->               WHERE name='N!xau')\G
 *************************** 1. row ***************************
 path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo ->
 The Gods Must Be Crazy (1981) -> N!xau
 1 row in set (3 min 9.67 sec)
 --again
 *************************** 1. row ***************************
 path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo ->
 The Gods Must Be Crazy (1981) -> N!xau
 1 row in set (1 min 7.13 sec)
 Each query requires approximately 7.8 million secondary key reads.




Degrees of N!xau
 Graph of approximately 3.7 million nodes with 30 million edges. Tables are about
 3.5 GB and InnoDB is configured with a 512 MB buffer pool.
 MariaDB [imdb]> SELECT
                -> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
                -> FROM imdb_graph JOIN entity ON (id=linkid)
                -> WHERE latch=1
                -> AND origid=(SELECT a.id FROM entity a
                ->               WHERE name='Kevin Bacon')
                -> AND destid=(SELECT b.id FROM entity b
                ->               WHERE name='N!xau')\G
 *************************** 1. row ***************************
 path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) ->
 Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid
 (1993) -> N!xau
 1 row in set (10 min 6.55 sec)
 --again
 *************************** 1. row ***************************
 path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) ->
 Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid
 (1993) -> N!xau
 1 row in set (8 min 29.66 sec)
 Each query requires approximately 16.6 million secondary key reads.
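
As a quick worked calculation, the reported timings and key-read counts from the two "Degrees of N!xau" runs can be turned into an approximate read rate. The figures are copied from the slides; the labels and function name are only for this example, and the rates are rough because the read counts themselves are approximate.

```python
# (label, secondary key reads, elapsed seconds) as reported on the slides.
RUNS = [
    ("9M edges, first run", 7.8e6, 3 * 60 + 9.67),
    ("9M edges, second run", 7.8e6, 1 * 60 + 7.13),
    ("30M edges, first run", 16.6e6, 10 * 60 + 6.55),
    ("30M edges, second run", 16.6e6, 8 * 60 + 29.66),
]


def reads_per_second(reads, seconds):
    """Average secondary key reads per second over one query."""
    return reads / seconds


for label, reads, seconds in RUNS:
    print(f"{label}: {reads_per_second(reads, seconds):,.0f} key reads/s")
```

This works out to roughly 41,000 and 116,000 reads per second for the 9-million-edge graph, and roughly 27,000 and 33,000 for the 30-million-edge graph, where the tables are several times larger than the 512 MB buffer pool.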

We want your feedback!

     ● Very easy to use...
         But do feel free to ask us for help/advice.

     ● Open Query created friendlist_graph for Drupal 6.
        ○ Currently based on OQGRAPH v2.
        ○ Addition to the existing friendlist module.
        ○ Enables easy social networking in Drupal.
        ○ Peter Lieverdink (@cafuego) did this in about 30 minutes.

     ● We would like to know how you are using OQGRAPH!
       ○ You could be doing something really cool...



Links and support
    ● Binaries & Packages
       ○ http://mariadb.com (MariaDB 10.0 soon)
    ● Source collaboration
       ○ https://launchpad.net/oqgraph
       ○ https://code.launchpad.net/~oqgraph-dev/maria/10.0-oqgraph3
    ● Info, Docs, Support, Licensing, Engineering
       ○ http://openquery.com/graph
       ○ This presentation: http://goo.gl/gqr7b




                                     Thank you!
                                     Antony Curtis & Arjen Lentz
                                     graph@openquery.com
OQGRAPH computation engine © 2009-2013 Open Query

Contenu connexe

Tendances

Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Takahiro Harada
 

Tendances (20)

TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big Data
 
Migrating to Apache Spark at Netflix
Migrating to Apache Spark at NetflixMigrating to Apache Spark at Netflix
Migrating to Apache Spark at Netflix
 
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
Introduction to Monte Carlo Ray Tracing (CEDEC 2013)
 
51811680 open layers
51811680 open layers51811680 open layers
51811680 open layers
 
Velocity 2015 linux perf tools
Velocity 2015 linux perf toolsVelocity 2015 linux perf tools
Velocity 2015 linux perf tools
 
福岡大学における公開用NTPサービス事例(LACNOG2019発表資料日本語版)
福岡大学における公開用NTPサービス事例(LACNOG2019発表資料日本語版)福岡大学における公開用NTPサービス事例(LACNOG2019発表資料日本語版)
福岡大学における公開用NTPサービス事例(LACNOG2019発表資料日本語版)
 
PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"PostgreSQL Query Cache - "pqc"
PostgreSQL Query Cache - "pqc"
 
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert HodgesA Fast Intro to Fast Query with ClickHouse, by Robert Hodges
A Fast Intro to Fast Query with ClickHouse, by Robert Hodges
 
GeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony FoxGeoMesa on Apache Spark SQL with Anthony Fox
GeoMesa on Apache Spark SQL with Anthony Fox
 
Open vSwitch 패킷 처리 구조
Open vSwitch 패킷 처리 구조Open vSwitch 패킷 처리 구조
Open vSwitch 패킷 처리 구조
 
OpenGL ES 3.1 Reference Card
OpenGL ES 3.1 Reference CardOpenGL ES 3.1 Reference Card
OpenGL ES 3.1 Reference Card
 
Getting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and KafkaGetting started with Hadoop, Hive, Spark and Kafka
Getting started with Hadoop, Hive, Spark and Kafka
 
BPStudy32 CouchDB 再入門
BPStudy32 CouchDB 再入門BPStudy32 CouchDB 再入門
BPStudy32 CouchDB 再入門
 
Monitoring IO performance with iostat and pt-diskstats
Monitoring IO performance with iostat and pt-diskstatsMonitoring IO performance with iostat and pt-diskstats
Monitoring IO performance with iostat and pt-diskstats
 
Linux Performance Profiling and Monitoring
Linux Performance Profiling and MonitoringLinux Performance Profiling and Monitoring
Linux Performance Profiling and Monitoring
 
いちから始めるクラウドCAE:Rescale ScaleX入門セミナー Part 2
いちから始めるクラウドCAE:Rescale ScaleX入門セミナー Part 2いちから始めるクラウドCAE:Rescale ScaleX入門セミナー Part 2
いちから始めるクラウドCAE:Rescale ScaleX入門セミナー Part 2
 
Duel of Two Libraries: Cairo & Skia
Duel of Two Libraries: Cairo & SkiaDuel of Two Libraries: Cairo & Skia
Duel of Two Libraries: Cairo & Skia
 
用 Go 語言 打造微服務架構
用 Go 語言打造微服務架構用 Go 語言打造微服務架構
用 Go 語言 打造微服務架構
 
A simple and powerful property system for C++ (talk at GCDC 2008, Leipzig)
A simple and powerful property system for C++ (talk at GCDC 2008, Leipzig)   A simple and powerful property system for C++ (talk at GCDC 2008, Leipzig)
A simple and powerful property system for C++ (talk at GCDC 2008, Leipzig)
 
TiDB Introduction
TiDB IntroductionTiDB Introduction
TiDB Introduction
 

Similaire à OQGraph @ SCaLE 11x 2013

OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011
Antony T Curtis
 
Design and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageDesign and Implementation of the Security Graph Language
Design and Implementation of the Security Graph Language
Asankhaya Sharma
 

Similaire à OQGraph @ SCaLE 11x 2013 (20)

OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011OQGraph at MySQL Users Conference 2011
OQGraph at MySQL Users Conference 2011
 
MySQL HA Orchestrator Proxysql Consul.pdf
MySQL HA Orchestrator Proxysql Consul.pdfMySQL HA Orchestrator Proxysql Consul.pdf
MySQL HA Orchestrator Proxysql Consul.pdf
 
Neo4j: Graph-like power
Neo4j: Graph-like powerNeo4j: Graph-like power
Neo4j: Graph-like power
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Design and Implementation of the Security Graph Language
Design and Implementation of the Security Graph LanguageDesign and Implementation of the Security Graph Language
Design and Implementation of the Security Graph Language
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
 
Apache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them AllApache Calcite: One Frontend to Rule Them All
Apache Calcite: One Frontend to Rule Them All
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series Forecasting
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Apache Hive for modern DBAs
Apache Hive for modern DBAsApache Hive for modern DBAs
Apache Hive for modern DBAs
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisImpala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
 
Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
 
What’s new in 9.6, by PostgreSQL contributor
What’s new in 9.6, by PostgreSQL contributorWhat’s new in 9.6, by PostgreSQL contributor
What’s new in 9.6, by PostgreSQL contributor
 
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
Gruter_TECHDAY_2014_03_ApacheTajo (in Korean)
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming model
 
MapReduce and Hadoop
MapReduce and HadoopMapReduce and Hadoop
MapReduce and Hadoop
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

OQGraph @ SCaLE 11x 2013

  • 1. OQGraph 3 for MariaDB Graphs and Hierarchies in Plain SQL http://goo.gl/gqr7b Antony T Curtis <atcurtis@gmail.com> graph@openquery.com http://openquery.com/graph
  • 2. Graphs / Networks ● Nodes connected by Edges. ● Edges may be directional. ● Edges may have a "weight" / "cost" attribute. ● Directed graphs may have bi-directional edges. ● Unconnected sets of nodes may exist on same graph. ● There need not be a "root" node. Examples: ● "Social Graphs" / friend relationships. ● Decision / State graphs. ● Airline routes OQGRAPH computation engine © 2009-2013 Open Query
  • 3. RDBMS with Heirarchies and Graphs ● Not always a particularly good fit. ● Various tree models exist; each with limitations: ○ Adjacency model ■ Either uses fixed max depth or recursive queries. ■ Oracle has CONNECT BY PRIOR ■ SQL99 has WITH RECURSIVE...UNION... ○ Nested set ■ complex ■ recursive queries to find path to root. ○ Materialised path ■ Ugly and not relational. ■ Can be quite effective when used correctly. Further reading: http://dev.mysql.com/tech-resources/articles/hierarchical-data.html OQGRAPH computation engine © 2009-2013 Open Query
  • 4. What is OQGRAPH? ● Implemented as a storage engine. ○ Original concept by Arjen Lentz ● Mk. 2 implementation 2008 ○ GPLv2+ ○ Bundled with MariaDB 5.2+ ○ Boost Graph Library ● Mk. 3 implementation ○ GPLv2+ ○ Bundled with MariaDB 10.0 (soon) ● Easy to enable ○ INSTALL PLUGIN oqgraph SONAME ‘ha_oqgraph’; OQGRAPH computation engine © 2009-2013 Open Query
  • 5. OQGRAPH: A Computation Engine ● It is not a general purpose data engine. ○ unlike MyISAM, InnoDB or MEMORY. ● Looks like an ordinary table. ● Has a very different internal architecture. ● It does not operate in terms of ○ storing data for later retrieval. ○ having indexes on data. ● May be regarded as a "magic view" or "table function". OQGRAPH computation engine © 2009-2013 Open Query
  • 6. OQGRAPH: A Computation Engine MySQL Server Communications, Session and Thread Management DDL, DML, Management Tables, SQL Parser and SQL Views, Lock Services, Management Stored Procedure Engine Buffers Logging, and Utilities and Caches Runtime Libraries Query Optimizer and Execution Engine built in and run-time loaded plug ins OQGraph InnoDB OQGRAPH computation engine © 2009-2013 Open Query
  • 7. What's new in OQGRAPH 3 Features: ● Judy array bitmaps for Graph coloring. ● Uses existing tables for edge data. ● Much lower memory cost per query. ● Does not impose any strict structure on the source table. ● Can handle significantly larger graphs than OQGRAPHv2. ○ 100K+ index reads per second are possible. ○ Millions of edges are possible. ● All edges of graph need not fit in memory. ○ Only Judy bitmap array must be held in RAM. Notes: ● Tables are read-only and only read from the backing table. ● Table must be in same schema as the backing table. ● Table must have appropriate indexes. OQGRAPH computation engine © 2009-2013 Open Query
  • 8. Anatomy of an OQGRAPH 3 table CREATE TABLE db.tblname ( latch SMALLINT UNSIGNED NULL, origid BIGINT UNSIGNED NULL, destid BIGINT UNSIGNED NULL, weight DOUBLE NULL, seq BIGINT UNSIGNED NULL, linkid BIGINT UNSIGNED NULL, KEY (latch, origid, destid) USING HASH, KEY (latch, destid, origid) USING HASH ) ENGINE=OQGRAPH data_table='link' -- data table origid='source' -- column name destid='target' -- column name weight='weight'; -- optional column name ; OQGRAPH computation engine © 2009-2013 Open Query
  • 9. OQGRAPH - Data source ● Edges are directed edges. ● Edge weight are optional and default to 1.0 ● Undirected edges may be represented as two directed edges, in opposite directions. CREATE TABLE foo ( origid INT UNSIGNED NOT NULL, destid INT UNSIGNED NOT NULL, PRIMARY KEY(origid, destid), KEY (destid) ); INSERT INTO foo (origid,destid) VALUES (1,2), (2,3), (2,4), (4,5), (3,6), (5,6); OQGRAPH computation engine © 2009-2013 Open Query
  • 10. OQGRAPH - Data source, cont. Creating the OQGRAPH table: CREATE TABLE foo_graph ( latch SMALLINT UNSIGNED NULL, origid BIGINT UNSIGNED NULL, destid BIGINT UNSIGNED NULL, weight DOUBLE NULL, seq BIGINT UNSIGNED NULL, linkid BIGINT UNSIGNED NULL, KEY (latch, origid, destid) USING HASH, KEY (latch, destid, origid) USING HASH ) ENGINE=OQGRAPH data_table='foo' origid='origid' destid='destid'; OQGRAPH computation engine © 2009-2013 Open Query
  • 11. Selecting Edges MariaDB [foo]> select * from foo_graph; +-------+--------+--------+--------+------+--------+ | latch | origid | destid | weight | seq | linkid | +-------+--------+--------+--------+------+--------+ | NULL | 1 | 2 | 1 | NULL | NULL | | NULL | 2 | 3 | 1 | NULL | NULL | | NULL | 2 | 4 | 1 | NULL | NULL | | NULL | 3 | 6 | 1 | NULL | NULL | | NULL | 4 | 5 | 1 | NULL | NULL | | NULL | 5 | 6 | 1 | NULL | NULL | +-------+--------+--------+--------+------+--------+ 6 rows in set (0.38 sec) OQGRAPH computation engine © 2009-2013 Open Query
  • 12. Now, it's time for some magic. (shortest path calculation)
    ● SELECT * FROM foo_graph WHERE latch=1 AND origid=1 AND destid=6;
    +-------+--------+--------+--------+------+--------+
    | latch | origid | destid | weight | seq  | linkid |
    +-------+--------+--------+--------+------+--------+
    |     1 |      1 |      6 |   NULL |    0 |      1 |
    |     1 |      1 |      6 |      1 |    1 |      2 |
    |     1 |      1 |      6 |      1 |    2 |      3 |
    |     1 |      1 |      6 |      1 |    3 |      6 |
    +-------+--------+--------+--------+------+--------+
    ● SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
        FROM foo_graph
        WHERE latch=1 AND origid=1 AND destid=6 \G
      path: 1,2,3,6
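To make the result above less magical, here is a small Python sketch of Dijkstra's algorithm run over the same six edges. It is an illustration of the technique that a later slide names for latch 1, not the engine's actual implementation (which is built on the Boost Graph Library); `shortest_path` is a name invented here, and every edge is assumed to carry the default weight of 1.0.

```python
import heapq

# Edge rows copied from the foo table on the earlier slide.
FOO_EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def shortest_path(edges, origin, destination):
    """Return the node sequence of a cheapest directed path, or [] if none."""
    adjacency = {}
    for origid, destid in edges:
        adjacency.setdefault(origid, []).append((destid, 1.0))  # default weight

    best = {origin: 0.0}       # cheapest known cost to reach each node
    previous = {}              # predecessor of each node on its cheapest path
    queue = [(0.0, origin)]    # min-heap ordered by cost
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbour, weight in adjacency.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))

    if destination not in best:
        return []
    # Walk the predecessor chain back from the destination to the origin.
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    path.reverse()
    return path


path = shortest_path(FOO_EDGES, 1, 6)
```

The returned node list corresponds to the `linkid` values ordered by `seq` in the query output. Because the edges are directed, asking for a path from 6 back to 1 returns an empty list.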
  • 13. Other computations
    ● Which nodes lead to node 4?
      SELECT GROUP_CONCAT(linkid) AS list
        FROM foo_graph
        WHERE latch=1 AND destid=4 \G
      list: 1,2,4
    ● Where can I get to from node 4?
      SELECT GROUP_CONCAT(linkid) AS list
        FROM foo_graph
        WHERE latch=1 AND origid=4 \G
      list: 6,5,4
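Both questions on this slide are reachability queries: one follows the edges forwards from node 4, the other follows them backwards into node 4. A minimal Python sketch of that idea follows; `reachable` is a hypothetical helper written for this illustration, and it returns a sorted list, so the ordering differs from the GROUP_CONCAT output on the slide.

```python
# Edge rows copied from the foo table on the earlier slide.
FOO_EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def reachable(edges, start, reverse=False):
    """Return the sorted nodes reachable from start (start included).

    With reverse=True the edges are followed backwards, which answers
    "which nodes lead to start?" instead.
    """
    adjacency = {}
    for origid, destid in edges:
        if reverse:
            origid, destid = destid, origid
        adjacency.setdefault(origid, []).append(destid)

    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return sorted(seen)


leads_to_4 = reachable(FOO_EDGES, 4, reverse=True)
from_4 = reachable(FOO_EDGES, 4)
```

The two results contain the same node sets as the `list` values shown on the slide.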
  • 14. Other computations, continued.
    ● See the docs for latch 0 and latch NULL.
    ● latch 1: Dijkstra's shortest path.
      ○ O((V + E) log V)
    ● latch 2: Breadth-first search.
      ○ O(V + E)
    ● Other algorithms are possible.
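The O(V + E) bound quoted for breadth-first search comes from visiting each node once and examining each edge once. A short Python sketch, assuming unweighted hops over the foo edges from the earlier slides, shows the traversal; `bfs_hops` is a name invented for this example and says nothing about how the engine implements latch 2 internally.

```python
from collections import deque

# Edge rows copied from the foo table on the earlier slide.
FOO_EDGES = [(1, 2), (2, 3), (2, 4), (4, 5), (3, 6), (5, 6)]


def bfs_hops(edges, origin):
    """Return {node: number of hops from origin} for every reachable node."""
    adjacency = {}
    for origid, destid in edges:
        adjacency.setdefault(origid, []).append(destid)

    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()                    # each node is dequeued once
        for neighbour in adjacency.get(node, []):  # each edge is examined once
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


hops_from_1 = bfs_hops(FOO_EDGES, 1)
```

When every edge has the same weight, breadth-first search finds the same minimum hop counts as Dijkstra's algorithm without needing a priority queue, which is where the missing log V factor goes.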
  • 15. Joins make it prettier
    ● INSERT INTO people VALUES
        (1,'pearce'), (2,'hunnicut'), (3,'potter'),
        (4,'hoolihan'), (5,'winchester'), (6,'mulcahy');
    ● SELECT GROUP_CONCAT(name ORDER BY seq) path
        FROM foo_graph
        JOIN people ON (foo_graph.linkid = people.id)
        WHERE latch=1 AND origid=1 AND destid=6 \G
      path: pearce,hunnicut,potter,mulcahy
  • 16. Tree of Life
    Load the tol.sql schema.
    Create the tol_link backing store table:
    CREATE TABLE tol_link (
      source INT UNSIGNED NOT NULL,
      target INT UNSIGNED NOT NULL,
      PRIMARY KEY (source, target),
      KEY (target)
    ) ENGINE=innodb;
    Populate it with all the edges we need:
    INSERT INTO tol_link (source, target)
      SELECT parent, id FROM tol WHERE parent IS NOT NULL
      UNION ALL
      SELECT id, parent FROM tol WHERE parent IS NOT NULL;
    Query OK, 178102 rows affected (46.35 sec)
    Records: 178102 Duplicates: 0 Warnings: 0
    Direct download: http://bazaar.launchpad.net/~openquery-core/oqgraph/trunk/view/head:/examples/tree-of-life/tol.sql
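The INSERT ... UNION ALL above turns every parent pointer into two directed edges, one down the tree and one back up, so the tree can be walked in either direction. A minimal Python sketch of the same transformation follows; `parent_links` and the three sample rows are hypothetical and are not taken from tol.sql.

```python
def parent_links(rows):
    """Turn (id, parent) rows into directed (source, target) edges.

    Rows whose parent is None (the root) are skipped, mirroring the
    "WHERE parent IS NOT NULL" filter in both halves of the UNION ALL.
    """
    downward = [(parent, node) for node, parent in rows if parent is not None]
    upward = [(node, parent) for node, parent in rows if parent is not None]
    return downward + upward  # same order as the two SELECTs


# Hypothetical sample: node 1 is the root, 2 is its child, 3 is a child of 2.
SAMPLE_ROWS = [(1, None), (2, 1), (3, 2)]
links = parent_links(SAMPLE_ROWS)
```

Each non-root row contributes exactly two edges, so the 178102 rows reported by the slide's INSERT correspond to 89051 rows with a non-NULL parent.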
  • 17. Tree of Life, cont.
    Creating the OQGRAPH table:
    CREATE TABLE tol_tree (
      latch  SMALLINT UNSIGNED NULL,
      origid BIGINT UNSIGNED NULL,
      destid BIGINT UNSIGNED NULL,
      weight DOUBLE NULL,
      seq    BIGINT UNSIGNED NULL,
      linkid BIGINT UNSIGNED NULL,
      KEY (latch, origid, destid) USING HASH,
      KEY (latch, destid, origid) USING HASH
    ) ENGINE=OQGRAPH
      data_table='tol_link' origid='source' destid='target';
  • 18. Tree of Life - finding H. sapiens
    SELECT GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
      FROM tol_tree JOIN tol ON (linkid=id)
      WHERE latch=1 AND origid=1 AND destid=16421 \G
    path: Life on Earth -> Eukaryotes -> Unikonts -> Opisthokonts -> Animals ->
      Bilateria -> Deuterostomia -> Chordata -> Craniata -> Vertebrata ->
      Gnathostomata -> Teleostomi -> Osteichthyes -> Sarcopterygii ->
      Terrestrial Vertebrates -> Tetrapoda -> Reptiliomorpha -> Amniota ->
      Synapsida -> Eupelycosauria -> Sphenacodontia -> Sphenacodontoidea ->
      Therapsida -> Theriodontia -> Cynodontia -> Mammalia -> Eutheria ->
      Primates -> Catarrhini -> Hominidae -> Homo -> Homo sapiens
  • 19. Internet Movie Database (IMDB)
    Transform and load the movie database (this takes a long time):
    CREATE TABLE `entity` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `type` enum('ACTOR','MOVIE','TV MOVIE','TV MINI','TV SERIES',
                  'VIDEO MOVIE','VIDEO GAME','VOICE','ARCHIVE') NOT NULL,
      `name` varchar(128) COLLATE utf8_unicode_ci NOT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `type` (`type`,`name`) USING BTREE
    ) ENGINE=InnoDB;
    CREATE TABLE `link` (
      `rel_id` int(11) NOT NULL AUTO_INCREMENT,
      `link_from` int(11) NOT NULL,
      `link_to` int(11) NOT NULL,
      PRIMARY KEY (`rel_id`),
      KEY `link_from` (`link_from`,`link_to`),
      KEY `link_to` (`link_to`)
    ) ENGINE=InnoDB;
  • 21. Degrees of N!xau
    Graph of movies: approximately 3.7 million nodes with 9 million edges.
    Tables are about 1GB and InnoDB is configured for a 512MB buffer pool.
    MariaDB [imdb]> SELECT
        -> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
        -> FROM movie_graph JOIN entity ON (id=linkid)
        -> WHERE latch=1
        -> AND origid=(SELECT a.id FROM entity a
        -> WHERE name='Kevin Bacon')
        -> AND destid=(SELECT b.id FROM entity b WHERE name='N!xau')\G
    *************************** 1. row ***************************
    path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo -> The Gods Must Be Crazy (1981) -> N!xau
    1 row in set (3 min 9.67 sec)
    --again
    *************************** 1. row ***************************
    path: Kevin Bacon -> The Air Up There (1994) -> Fanyana H. Sidumo -> The Gods Must Be Crazy (1981) -> N!xau
    1 row in set (1 min 7.13 sec)
    Each query requires approximately 7.8 million secondary key reads.
  • 22. Degrees of N!xau
    Graph of approximately 3.7 million nodes with 30 million edges.
    Tables are about 3.5GB and InnoDB is configured for a 512MB buffer pool.
    MariaDB [imdb]> SELECT
        -> GROUP_CONCAT(name ORDER BY seq SEPARATOR ' -> ') AS path
        -> FROM imdb_graph JOIN entity ON (id=linkid)
        -> WHERE latch=1
        -> AND origid=(SELECT a.id FROM entity a
        -> WHERE name='Kevin Bacon')
        -> AND destid=(SELECT b.id FROM entity b WHERE name='N!xau')\G
    *************************** 1. row ***************************
    path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) -> Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid (1993) -> N!xau
    1 row in set (10 min 6.55 sec)
    --again
    *************************** 1. row ***************************
    path: Kevin Bacon -> The 45th Annual Golden Globe Awards (1988) -> Richard Attenborough -> In Darkest Hollywood: Cinema and Apartheid (1993) -> N!xau
    1 row in set (8 min 29.66 sec)
    Each query requires approximately 16.6 million secondary key reads.
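The timings and key-read counts quoted on the Degrees of N!xau slides can be turned into a rough reads-per-second figure. The arithmetic below is a back-of-the-envelope Python sketch that uses only the numbers shown on the slides; the key-read counts are approximate, so the results are approximate too, and `reads_per_second` is a name invented for this example.

```python
def reads_per_second(key_reads, seconds):
    """Average secondary key reads per second over one query."""
    return key_reads / seconds


# (label, approximate key reads per query, elapsed seconds) from the slides.
RUNS = [
    ("9M edges, first run", 7.8e6, 3 * 60 + 9.67),
    ("9M edges, second run", 7.8e6, 1 * 60 + 7.13),
    ("30M edges, first run", 16.6e6, 10 * 60 + 6.55),
    ("30M edges, second run", 16.6e6, 8 * 60 + 29.66),
]

rates = {label: reads_per_second(reads, secs) for label, reads, secs in RUNS}

for label, rate in rates.items():
    print(f"{label}: about {rate:,.0f} key reads per second")
```

Only the second run on the 9-million-edge graph exceeds 100,000 reads per second. The other three runs come out between roughly 27,000 and 41,000, which would be consistent with the larger data set not fitting in the 512MB buffer pool, although the slides do not state the cause.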
  • 23. We want your feedback!
    ● Very easy to use... But do feel free to ask us for help/advice.
    ● Open Query created friendlist_graph for Drupal 6.
      ○ Currently based on OQGRAPH v2.
      ○ Addition to the existing friendlist module.
      ○ Enables easy social networking in Drupal.
      ○ Peter Lieverdink (@cafuego) did this in about 30 minutes.
    ● We would like to know how you are using OQGRAPH!
      ○ You could be doing something really cool...
  • 24. Links and support
    ● Binaries & Packages
      ○ http://mariadb.com (MariaDB 10.0 soon)
    ● Source collaboration
      ○ https://launchpad.net/oqgraph
      ○ https://code.launchpad.net/~oqgraph-dev/maria/10.0-oqgraph3
    ● Info, Docs, Support, Licensing, Engineering
      ○ http://openquery.com/graph
      ○ This presentation: http://goo.gl/gqr7b
    Thank you!
    Antony Curtis & Arjen Lentz
    graph@openquery.com