Graph Databases and the Future of Large-Scale
          Knowledge Management



                  Marko A. Rodriguez
           T-5, Center for Nonlinear Studies
           Los Alamos National Laboratory
              http://markorodriguez.com

                     April 8, 2009
Abstract

Modern-day open source and commercial graph databases can store on the
order of 1 billion relationships with some databases reaching the 10 billion
mark. These developments are making the graph database practical for
applications that require large-scale knowledge structures. Moreover, with
the Web of Data standards set forth by the Linked Data community, it is
possible to interlink graph databases across the web into a giant global
knowledge structure. This talk will discuss graph databases, their
underlying data model, their querying mechanisms, and the benefits of the
graph data structure for modeling and analysis.




                                          Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Outline

• The Relational Database vs. the Graph Database

• The World Wide Web vs. the Web of Data




Our Make Believe World

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.




Our World Modeled in a Relational Database

 Object_Table

    ID      Name     Type    Legs    Fur
    0001    Marko    Human   2       false
    0002    Fluffy   Dog     4       true

 Friendship_Table

    ID1     ID2
    0001    0002
    0002    0001




Our World Modeled in a Graph Database

    0001 --type--> Human          0002 --type--> Dog
    0001 --name--> Marko          0002 --name--> Fluffy
    0001 --legs--> 2              0002 --legs--> 4
    0001 --fur---> false          0002 --fur---> true

    0001 <--friend--> 0002
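A graph like this one can be held in memory as a set of (subject, label, object) triples. The following is a minimal sketch under that assumption: the vertex ids and edge labels come from the slide, while the set-of-tuples layout and the `out_edges` helper are illustrative and not part of any particular graph database.

```python
# Each edge of the graph is one (subject, label, object) triple.
# The friendship is stored in both directions, as in Friendship_Table.
graph = {
    ("0001", "type", "Human"),
    ("0001", "name", "Marko"),
    ("0001", "legs", 2),
    ("0001", "fur", False),
    ("0002", "type", "Dog"),
    ("0002", "name", "Fluffy"),
    ("0002", "legs", 4),
    ("0002", "fur", True),
    ("0001", "friend", "0002"),
    ("0002", "friend", "0001"),
}


def out_edges(graph, vertex):
    """Return {label: object} for every edge leaving `vertex`.

    Assumes at most one edge per label per vertex, which holds here.
    """
    return {label: obj for subj, label, obj in graph if subj == vertex}


fluffy = out_edges(graph, "0002")
print(fluffy["name"], fluffy["legs"], fluffy["friend"])
```

Everything known about a vertex is found by following its outgoing edges; no table layout has to be consulted first.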




Extending our Make Believe World

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are subclasses of mammal.




Our Extended World Modeled in a Relational Database

 Object_Table

    ID      Name     Type    Legs    Fur
    0001    Marko    Human   2       false
    0002    Fluffy   Dog     4       true

 Friendship_Table

    ID1     ID2
    0001    0002
    0002    0001

 Subclass_Table

    Type1   Type2
    Human   Mammal
    Dog     Mammal




Our Extended World Modeled in a Graph Database
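    Human --subclassof--> Mammal  Dog --subclassof--> Mammal

    0001 --type--> Human          0002 --type--> Dog
    0001 --name--> Marko          0002 --name--> Fluffy
    0001 --legs--> 2              0002 --legs--> 4
    0001 --fur---> false          0002 --fur---> true

    0001 <--friend--> 0002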




Extending our Extended Make Believe World
• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are subclasses of mammal.

• Fluffy peed on the carpet.




Our Extended Extended World Modeled in a Relational Database

 Object_Table

    ID      Name     Type     Legs    Fur
    0001    Marko    Human    2       false
    0002    Fluffy   Dog      4       true
    0003    My_Rug   Carpet   N/A     N/A

 Friendship_Table

    ID1     ID2
    0001    0002
    0002    0001

 Subclass_Table

    Type1   Type2
    Human   Mammal
    Dog     Mammal

 Pee_Table

    ID1     ID2
    0002    0003




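Our Extended Extended World Modeled in a Graph Database

    Human --subclassof--> Mammal  Dog --subclassof--> Mammal

    0001 --type--> Human          0002 --type--> Dog            0003 --type--> Carpet
    0001 --name--> Marko          0002 --name--> Fluffy         0003 --name--> My_Rug
    0001 --legs--> 2              0002 --legs--> 4
    0001 --fur---> false          0002 --fur---> true

    0001 <--friend--> 0002        0002 --peedOn--> 0003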
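The difference between the two extensions can be sketched side by side. In this illustrative example (SQLite via Python's standard `sqlite3` module stands in for the relational database, and a plain set of triples stands in for the graph database; both are assumptions), the new "peed on" relationship needs a new table on the relational side but only new edges on the graph side.

```python
import sqlite3

# Relational side: table and column names follow the slide.
# The N/A cells of the slide are stored as NULL here.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE Object_Table "
    "(ID TEXT, Name TEXT, Type TEXT, Legs INTEGER, Fur INTEGER)"
)
db.execute("INSERT INTO Object_Table VALUES ('0003', 'My_Rug', 'Carpet', NULL, NULL)")
db.execute("CREATE TABLE Pee_Table (ID1 TEXT, ID2 TEXT)")  # schema change
db.execute("INSERT INTO Pee_Table VALUES ('0002', '0003')")

# Graph side: the same facts are simply more triples; nothing is altered.
graph = {("0002", "type", "Dog"), ("0002", "name", "Fluffy")}
graph |= {
    ("0003", "type", "Carpet"),
    ("0003", "name", "My_Rug"),
    ("0002", "peedOn", "0003"),
}

tables = [
    row[0]
    for row in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)      # the relational schema now contains Pee_Table
print(len(graph))  # the graph grew from 2 to 5 edges
```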




The Graph as the Natural World Model
• The world is inherently (or perceived as) object-oriented.

• The world is filled with objects and relations among them.

• The multi-relational graph is a very natural representation of the world.
                       undirected single-relational graph:   x ------- z

                       directed single-relational graph:     x ------> z

                       directed multi-relational graph:      x ---y--> z
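The three kinds of graph differ only in what an edge records. A minimal sketch, assuming edges are kept in Python sets (the label "w" and the `labels_between` helper are hypothetical, added only to show two relationships between the same pair of vertices):

```python
# Undirected single-relational graph: an edge is an unordered pair.
undirected = {frozenset({"x", "z"})}

# Directed single-relational graph: an edge is an ordered (tail, head) pair.
directed = {("x", "z")}

# Directed multi-relational graph: an edge is (tail, label, head), so the
# same two vertices can be related in several ways.
multi_relational = {("x", "y", "z"), ("x", "w", "z")}


def labels_between(edges, tail, head):
    """Return the set of labels on edges going from `tail` to `head`."""
    return {label for t, label, h in edges if t == tail and h == head}


print(frozenset({"z", "x"}) in undirected)                 # True: order is irrelevant
print(("z", "x") in directed)                              # False: direction matters
print(sorted(labels_between(multi_relational, "x", "z")))  # ['w', 'y']
```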




The Graph as the Natural Programming Model

• Many high-level programming languages are object-oriented.

• There is almost no impedance mismatch between the multi-relational graph
  and the programming object.

• It is easy to go from graph database to in-memory object.

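          // Human and Dog are illustrative classes for the make-believe world.
          Dog fluffy = new Dog();
          fluffy.setName("Fluffy");

          Human marko = new Human();
          marko.setName("Marko");
          marko.addFriend(fluffy);   // the "friend" edge
          marko.setHasFur(false);
          marko.setLegs(2);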
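Going from graph to object can be as simple as walking the edges that leave a vertex. The sketch below assumes the graph is available as a list of (subject, label, object) triples; the `Being` class and the `load` function are hypothetical names, not part of any graph database API.

```python
from dataclasses import dataclass, field


@dataclass
class Being:
    """In-memory object for one vertex of the make-believe world."""
    id: str
    name: str = ""
    legs: int = 0
    fur: bool = False
    friends: list = field(default_factory=list)


def load(graph, vertex_id):
    """Build a Being from the edges that leave `vertex_id`."""
    being = Being(id=vertex_id)
    for subj, label, obj in graph:
        if subj != vertex_id:
            continue
        if label == "friend":
            being.friends.append(obj)      # multi-valued property
        elif label in ("name", "legs", "fur"):
            setattr(being, label, obj)     # single-valued property
    return being


graph = [
    ("0001", "name", "Marko"), ("0001", "legs", 2), ("0001", "fur", False),
    ("0001", "friend", "0002"),
    ("0002", "name", "Fluffy"), ("0002", "legs", 4), ("0002", "fur", True),
]
marko = load(graph, "0001")
print(marko.name, marko.legs, marko.fur, marko.friends)
```

Each edge label maps onto one field of the object, which is why so little translation code is needed.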


The Relational Database vs. the Graph Database

• A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model
  is a collection of interlinked tables.

• A graph database’s (e.g. OpenRDF Sesame, AllegroGraph, Neo4j) data model
  is a multi-relational graph.


    [Figure: a relational database at 127.0.0.1 beside a graph database at 127.0.0.2]




SQL vs. SPARQL

SELECT OTY.Name FROM Object_Table AS OTX,
        Object_Table AS OTY, Friendship_Table WHERE
   OTX.Name = 'Marko'
   AND Friendship_Table.ID1 = OTY.ID
   AND Friendship_Table.ID2 = OTX.ID;

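# The ":" prefix is assumed to be the lanl namespace.
PREFIX :    <http://www.lanl.gov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?z WHERE {
  ?x :name "Marko"^^xsd:string .
  ?y :friend ?x .
  ?y :name ?z }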
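To see that both queries ask the same question, they can be run against toy data. This is only a sketch: SQLite (through Python's `sqlite3` module) stands in for the relational database, the Friendship_Table columns are assumed to be named ID1 and ID2, and the graph pattern is matched with plain Python loops over a set of triples rather than with a real SPARQL engine.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Object_Table (ID TEXT, Name TEXT, Type TEXT, Legs INTEGER, Fur INTEGER);
    CREATE TABLE Friendship_Table (ID1 TEXT, ID2 TEXT);
    INSERT INTO Object_Table VALUES ('0001', 'Marko', 'Human', 2, 0);
    INSERT INTO Object_Table VALUES ('0002', 'Fluffy', 'Dog', 4, 1);
    INSERT INTO Friendship_Table VALUES ('0001', '0002');
    INSERT INTO Friendship_Table VALUES ('0002', '0001');
""")
sql_names = [row[0] for row in db.execute("""
    SELECT OTY.Name
    FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table AS F
    WHERE OTX.Name = 'Marko' AND F.ID1 = OTY.ID AND F.ID2 = OTX.ID
""")]

# The same question as a graph pattern:
#   ?x name "Marko" .  ?y friend ?x .  ?y name ?z
graph = {
    ("0001", "name", "Marko"), ("0002", "name", "Fluffy"),
    ("0001", "friend", "0002"), ("0002", "friend", "0001"),
}
graph_names = [
    z
    for x, p1, n in graph if p1 == "name" and n == "Marko"
    for y, p2, o in graph if p2 == "friend" and o == x
    for y2, p3, z in graph if p3 == "name" and y2 == y
]
print(sql_names, graph_names)
```

The SQL version joins three table references on id columns; the graph version binds the variables ?x, ?y and ?z by matching one edge at a time.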




Outline

• The Relational Database vs. the Graph Database

• The World Wide Web vs. the Web of Data




Internet Address Spaces

• The Uniform Resource Identifier (URI) is the superclass of the Uniform
  Resource Locator (URL) and Uniform Resource Name (URN).




The Uniform Resource Locator
• The set of all URLs is the address space of all resources that can be
  located and retrieved on the Web. URLs denote where a resource is.
    http://markorodriguez.com/index.html
    ∗ Domain Name System (DNS): markorodriguez.com → 216.251.43.6
    ∗ http:// means an HTTP request (typically GET) on the default port 80.
    ∗ /index.html names the resource to get at that Internet location.

    [Figure: the web server at markorodriguez.com (216.251.43.6) serving index.html]
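The parts of the example URL can be pulled apart with Python's standard `urllib.parse` module. A small sketch; the fallback to port 80 is written out by hand here to mirror the bullet above, it is not something `urlsplit` does on its own.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://markorodriguez.com/index.html")
print(parts.scheme)    # http
print(parts.hostname)  # markorodriguez.com
print(parts.path)      # /index.html

# No port is written in the URL, so parts.port is None; a client falls
# back to the default port of the scheme (80 for http).
port = parts.port or 80
print(port)
```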



The Uniform Resource Name

• The set of all URNs is the address space of all resources within the urn:
  namespace.
    urn:uuid:bd93def0-8026-11dd-842be54955baa12
    urn:issn:0892-3310
    urn:doi:10.1016/j.knosys.2008.03.030

• Named resources need not be retrievable through the Web.

• URNs denote what a resource is.
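A URN has the shape urn:<namespace>:<name>, so it can be taken apart with a simple split. A minimal sketch (the `parse_urn` helper is hypothetical and does no validation beyond checking the urn: prefix):

```python
def parse_urn(urn):
    """Split 'urn:<namespace>:<name>' into (namespace, name)."""
    scheme, namespace, name = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: " + urn)
    return namespace, name


print(parse_urn("urn:issn:0892-3310"))
print(parse_urn("urn:doi:10.1016/j.knosys.2008.03.030"))
```

Nothing in the result says where the resource lives; the namespace and name only say what it is.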




The Uniform Resource Identifier
• The URI address space is an infinite space for all Internet resources.
    http://markorodriguez.com/index.html
    urn:issn:0892-3310
    ftp://markorodriguez.com/private/markos_secrets.txt
    http://www.lanl.gov#fluffy

• Important: URIs can denote concepts, instances, and data.
    e.g. lanl:fluffy, lanl:fluffy_legs




The “Uniform Resource Graph”

• We can denote where something is, what something is, but how do we
  denote how something relates to something else?

• How can we denote what something means, where meaning is determined
  by its place within a larger relational structure?
    URIs are like words. They denote things in the real or imaginary world.
    Linking URIs is like defining words, much as a dictionary defines
    words in terms of other words.




The World Wide Web
• The World Wide Web is primarily concerned with the Hypertext
  Transfer Protocol (HTTP) and with retrievable resources in the URL
  address space.

• These retrievable resources are files: HTML documents, images, audio,
  etc. The “web” is created when HTML documents contain URLs.
    [Figure: index.html, Resume.html, Home.html, and Research.html at
    http://markorodriguez.com/, connected by href links]




The Web of Data

• The Web of Data is primarily concerned with URIs. If the World Wide
  Web is the web of files, the Web of Data is the web of data. In other
  words, for the World Wide Web, the level of granularity is the retrievable
  file. For the Web of Data, it is the information in that file. Moreover,
  this information is not necessarily contained in a file. The existence of
  a datum is predicated on its URI, and its meaning is predicated on its
  relationships to other URIs. The web of URIs is the Web of Data.




The Resource Description Framework

• The Resource Description Framework (RDF) is the standard for
  representing the relationships between URIs and literals (e.g. float,
  string, date-time). I would have preferred the name “Uniform Resource
  Graph” (URG).

• Relationships are directed, labeled links between URIs. A subject URI
  points to an object URI or literal by means of a predicate URI.

    subject        predicate     object

    lanl:marko     foaf:knows    lanl:fluffy
    lanl:marko     foaf:name     "Marko A. Rodriguez"^^xsd:string
    lanl:fluffy    foaf:name     "Fluffy P. Everywhere"^^xsd:string
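Prefixed names such as lanl:marko are shorthand for full URIs. The sketch below shows one way a triple could be represented and its prefixed names expanded; the `Triple` type and the `expand` helper are illustrative, and the namespace URIs are the ones used for lanl: and foaf: elsewhere in this talk.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Prefix bindings for the two vocabularies used in the example.
PREFIXES = {
    "lanl": "http://www.lanl.gov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'foaf:knows' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


t = Triple(expand("lanl:marko"), expand("foaf:knows"), expand("lanl:fluffy"))
print(t.subject)
print(t.predicate)
print(t.object)
```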




    lanl:Human     rdfs:subClassOf   lanl:Mammal
    lanl:Dog       rdfs:subClassOf   lanl:Mammal

    lanl:marko     rdf:type          lanl:Human
    lanl:marko     foaf:name         "Marko A. Rodriguez"^^xsd:string
    lanl:marko     lanl:legs         "2"^^xsd:integer
    lanl:marko     lanl:fur          "false"^^xsd:boolean
    lanl:marko     foaf:knows        lanl:fluffy

    lanl:fluffy    rdf:type          lanl:Dog
    lanl:fluffy    foaf:name         "Fluffy P. Everywhere"^^xsd:string
    lanl:fluffy    lanl:legs         "4"^^xsd:integer
    lanl:fluffy    lanl:fur          "true"^^xsd:boolean
    lanl:fluffy    foaf:knows        lanl:marko
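One kind of analysis such a graph supports is traversal. As an illustrative sketch (not an RDFS reasoner), the function below follows rdf:type and then rdfs:subClassOf edges to collect every class a resource belongs to; the `classes_of` name and the set-of-tuples representation are assumptions.

```python
graph = {
    ("lanl:Human", "rdfs:subClassOf", "lanl:Mammal"),
    ("lanl:Dog", "rdfs:subClassOf", "lanl:Mammal"),
    ("lanl:marko", "rdf:type", "lanl:Human"),
    ("lanl:fluffy", "rdf:type", "lanl:Dog"),
}


def classes_of(graph, resource):
    """Return the asserted types of `resource` plus all their superclasses."""
    found = set()
    frontier = [o for s, p, o in graph if s == resource and p == "rdf:type"]
    while frontier:
        cls = frontier.pop()
        if cls in found:
            continue  # already visited; also guards against cycles
        found.add(cls)
        frontier.extend(
            o for s, p, o in graph if s == cls and p == "rdfs:subClassOf"
        )
    return found


print(sorted(classes_of(graph, "lanl:fluffy")))  # ['lanl:Dog', 'lanl:Mammal']
```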




The RDF Data Model and its Serializations

• RDF is a data model. As such, there exist many serializations (encoding
  formats) of that model.

• RDF/XML is not RDF. It is a serialization of RDF. It is wise to avoid
  learning RDF/XML, as it is an unintuitive standard. Other
  serializations include: N-Triples, N3, TriX, TriG, ...

<http://www.lanl.gov#marko> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.lanl.gov#Human> .
<http://www.lanl.gov#marko> <http://xmlns.com/foaf/0.1/name> "Marko A. Rodriguez"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://www.lanl.gov#marko> <http://www.lanl.gov#legs> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.lanl.gov#marko> <http://www.lanl.gov#fur> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
<http://www.lanl.gov#marko> <http://xmlns.com/foaf/0.1/knows> <http://www.lanl.gov#fluffy> .
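Because the model and its serialization are separate, writing out triples in this line-per-statement form takes very little code. A minimal sketch, assuming terms are already plain strings: the helper names are hypothetical and the literal helper does no character escaping, so it is not a complete N-Triples writer.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def uri(value):
    """Wrap a URI in angle brackets."""
    return "<" + value + ">"


def literal(value, datatype):
    """Format a typed literal; special characters are NOT escaped here."""
    return '"' + value + '"^^' + uri(XSD + datatype)


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


marko = uri("http://www.lanl.gov#marko")
triples = [
    (marko, uri("http://www.lanl.gov#legs"), literal("2", "integer")),
    (marko, uri("http://xmlns.com/foaf/0.1/knows"), uri("http://www.lanl.gov#fluffy")),
]
print(to_ntriples(triples))
```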




The Web of Data is a Distributed Database

• The URI address space is distributed.

• URIs can denote data.

• RDF denotes the relationships between URIs.

• The Web of Data’s foundational standard is RDF.

• Therefore, the Web of Data is a distributed database.
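The argument can be illustrated with two small sets of triples. In this sketch the split of the triples across two stores is hypothetical; the point is only that, because both stores use the same URI for Fluffy, merging them is a plain set union with no key mapping.

```python
FLUFFY = "http://www.lanl.gov#fluffy"

# Triples imagined to be held by two different hosts.
store_a = {
    ("http://www.lanl.gov#marko", "http://xmlns.com/foaf/0.1/knows", FLUFFY),
}
store_b = {
    (FLUFFY, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "http://www.lanl.gov#Dog"),
}

# Merging is a set union; the shared URI is what joins the two graphs.
merged = store_a | store_b


def mentions(graph, resource):
    """Return every triple with `resource` as subject or object."""
    return {t for t in graph if resource in (t[0], t[2])}


print(len(mentions(merged, FLUFFY)))  # 2
```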




The World Wide Web vs. the Web of Data

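    [Figure: on the World Wide Web, an HTML document on the web server at
    127.0.0.1 links by href to an HTML document on the web server at
    127.0.0.2. On the Web of Data, RDF on the web server at 127.0.0.1 links
    by foaf:knows into the graph database at 127.0.0.2.]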




The Current Web of Data
    [Figure: diagram of the interlinked data sets that make up the current
    Web of Data, e.g. dbpedia, freebase, geonames, musicbrainz, uniprot, pubmed]

data set           domain       data set           domain        data set              domain
audioscrobbler     music        govtrack           government    pubguide              books
bbclatertotp       music        homologene         biology       qdos                  social
bbcplaycountdata   music        ibm                computer      rae2001               computer
bbcprogrammes      media        ieee               computer      rdfbookmashup         books
budapestbme        computer     interpro           biology       rdfohloh              social
chebi              biology      jamendo            music         resex                 computer
crunchbase         business     laascnrs           computer      riese                 government
dailymed           medical      libris             books         semanticweborg        computer
dblpberlin         computer     lingvoj            reference     semwebcentral         social
dblphannover       computer     linkedct           medical       siocsites             social
dblprkbexplorer    computer     linkedmdb          movie         surgeradio            music
dbpedia            general      magnatune          music         swconferencecorpus    computer
doapspace          social       musicbrainz        music         taxonomy              reference
drugbank           medical      myspacewrapper     social        umbel                 general
eurecom            computer     opencalais         reference     uniref                biology
eurostat           government   opencyc            general       unists                biology
flickrexporter     images       openguides         reference     uscensusdata          government
flickrwrappr       images       pdb                biology       virtuososponger       reference
foafprofiles       social       pfam               biology       w3cwordnet            reference
freebase           general      pisa               computer      wikicompany           business
geneid             biology      prodom             biology       worldfactbook         government
geneontology       biology      projectgutenberg   books         yago                  general
geonames           geographic   prosite            biology       ...



Cultural Differences that are Leading to a New World of
          Large-Scale Knowledge Management

• Relational databases tend not to maintain public access points.

• Relational database users tend not to publish their schemas.

• Web of Data graph databases maintain public access points called
  SPARQL endpoints.

• Web of Data graph databases tend to reuse and extend public schemas
  called ontologies.
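A public SPARQL endpoint is reached over HTTP, with the query text passed as the `query` parameter of the request. The sketch below only builds such a request URL and does not contact any server; the endpoint address is a placeholder, not a real service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint address

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
"""

# The SPARQL protocol lets a client send a query as the `query`
# parameter of an HTTP GET request.
request_url = ENDPOINT + "?" + urlencode({"query": query})
print(request_url)

# Decoding the URL again recovers the original query text.
sent = parse_qs(urlsplit(request_url).query)["query"][0]
print(sent == query)
```

Because the access point is just a URL, any client that can issue an HTTP request can query the database.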




Conclusion

• Thank you for your time...
    My homepage: http://markorodriguez.com
    Neno/Fhat: http://neno.lanl.gov
    Collective Decision Making Systems: http://cdms.lanl.gov
    Faith in the Algorithm: http://faithinthealgorithm.net
    MESUR: http://www.mesur.org




The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 

Dernier (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

Graph Databases and the Future of Large-Scale Knowledge Management

  • 1. Graph Databases and the Future of Large-Scale Knowledge Management Marko A. Rodriguez T-5, Center for Nonlinear Studies Los Alamos National Laboratory http://markorodriguez.com April 8, 2009
  • 2. Abstract Modern day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 3. Outline • The Relational Database vs. the Graph Database • The World Wide Web vs. the Web of Data
  • 4. Outline • The Relational Database vs. the Graph Database • The World Wide Web vs. the Web of Data
  • 5. Our Make Believe World • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends.
  • 6. Our World Modeled in a Relational Database
      Object_Table
        ID    Name    Type   Legs  Fur
        0001  Marko   Human  2     false
        0002  Fluffy  Dog    4     true
      Friendship_Table
        ID1   ID2
        0001  0002
        0002  0001
  • 7. Our World Modeled in a Graph Database
      [Figure: graph with the following labeled edges]
        0001 type Human;  0001 name Marko;   0001 legs 2;  0001 fur false
        0002 type Dog;    0002 name Fluffy;  0002 legs 4;  0002 fur true
        0001 friend 0002
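The graph on slide 7 can be written down as a set of (subject, predicate, object) triples. The following is a minimal Python sketch of that idea, not the API of any particular graph database. The names `graph` and `objects` are illustrative assumptions, and the friend edge is stored in both directions to mirror the two rows of Friendship_Table.

```python
# Each edge of the slide's graph as a (subject, predicate, object) triple.
graph = {
    ("0001", "type", "Human"),
    ("0001", "name", "Marko"),
    ("0001", "legs", 2),
    ("0001", "fur", False),
    ("0002", "type", "Dog"),
    ("0002", "name", "Fluffy"),
    ("0002", "legs", 4),
    ("0002", "fur", True),
    # Assumption: friendship is symmetric, so it is stored both ways.
    ("0001", "friend", "0002"),
    ("0002", "friend", "0001"),
}


def objects(graph, subject, predicate):
    """Return every object reachable from `subject` over `predicate` edges."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Names of Marko's friends: follow `friend` edges, then `name` edges.
friend_names = {
    name
    for friend in objects(graph, "0001", "friend")
    for name in objects(graph, friend, "name")
}
print(friend_names)
```

Adding a new kind of relationship, such as the peedOn edge introduced later in the deck, would only add one more triple to the set; no new table is needed.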
  • 8. Extending our Make Believe World • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are subclasses of mammal.
  • 9. Our Extended World Modeled in a Relational Database
      Object_Table
        ID    Name    Type   Legs  Fur
        0001  Marko   Human  2     false
        0002  Fluffy  Dog    4     true
      Friendship_Table
        ID1   ID2
        0001  0002
        0002  0001
      Subclass_Table
        Type1  Type2
        Human  Mammal
        Dog    Mammal
  • 10. Our Extended World Modeled in a Graph Database
      [Figure: graph with the following labeled edges]
        Human subclassof Mammal;  Dog subclassof Mammal
        0001 type Human;  0001 name Marko;   0001 legs 2;  0001 fur false
        0002 type Dog;    0002 name Fluffy;  0002 legs 4;  0002 fur true
        0001 friend 0002
  • 11. Extending our Extended Make Believe World • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are subclasses of mammal. • Fluffy peed on the carpet.
  • 12. Our Extended Extended World Modeled in a Relational Database
      Object_Table
        ID    Name    Type    Legs  Fur
        0001  Marko   Human   2     false
        0002  Fluffy  Dog     4     true
        0003  My_Rug  Carpet  N/A   N/A
      Friendship_Table
        ID1   ID2
        0001  0002
        0002  0001
      Subclass_Table
        Type1  Type2
        Human  Mammal
        Dog    Mammal
      Pee_Table
        ID1   ID2
        0002  0003
  • 13. Our Extended Extended World Modeled in a Graph Database
      [Figure: graph with the following labeled edges]
        Human subclassof Mammal;  Dog subclassof Mammal
        0001 type Human;   0001 name Marko;   0001 legs 2;  0001 fur false
        0002 type Dog;     0002 name Fluffy;  0002 legs 4;  0002 fur true
        0003 type Carpet;  0003 name My_Rug
        0001 friend 0002;  0002 peedOn 0003
  • 14. The Graph as the Natural World Model • The world is inherently (or perceived as) object-oriented. • The world is filled with objects and relations among them. • The multi-relational graph is a very natural representation of the world. [Figure: three example graphs: an undirected single-relational graph, a directed single-relational graph, and a directed multi-relational graph]
  • 15. The Graph as the Natural Programming Model • High-level computer languages are object-oriented. • There is almost no impedance mismatch between the multi-relational graph and the programming object. • It is easy to go from graph database to in-memory object.
      Human marko = new Human();
      marko.name = "Marko";
      marko.addFriend(fluffy);
      marko.setHasFur(false);
      marko.setLegs(2);
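To make the step from graph to in-memory object concrete, here is a minimal Python sketch that fills an object from the edges leaving one vertex. The `Human` dataclass and the `load_human` helper are hypothetical names chosen for illustration; they mirror the Java fragment on slide 15 but do not correspond to any specific graph database API.

```python
from dataclasses import dataclass, field


@dataclass
class Human:
    name: str = ""
    legs: int = 0
    has_fur: bool = False
    friends: list = field(default_factory=list)  # identifiers of friend vertices


def load_human(triples, node_id):
    """Build a Human from the triples whose subject is `node_id`."""
    human = Human()
    for subject, predicate, obj in triples:
        if subject != node_id:
            continue
        if predicate == "name":
            human.name = obj
        elif predicate == "legs":
            human.legs = obj
        elif predicate == "fur":
            human.has_fur = obj
        elif predicate == "friend":
            human.friends.append(obj)
    return human


# Assumed sample data: the edges leaving vertex 0001 in the deck's example.
triples = [
    ("0001", "type", "Human"),
    ("0001", "name", "Marko"),
    ("0001", "legs", 2),
    ("0001", "fur", False),
    ("0001", "friend", "0002"),
]
marko = load_human(triples, "0001")
print(marko)
```

Each outgoing edge becomes one field assignment, which is the sense in which the mapping involves little translation.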
  • 16. The Relational Database vs. the Graph Database • A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model is a collection of interlinked tables. • A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model is a multi-relational graph. [Figure: a relational database at 127.0.0.1 beside a graph database at 127.0.0.2]
  • 17. SQL vs. SPARQL
      SELECT OTY.Name
      FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
      WHERE OTX.Name = "Marko"
        AND Friendship_Table.ID = OTY.ID
        AND Friendship_Table.Friend = OTX.ID;

      SELECT ?z
      WHERE {
        ?x name "Marko"^^xsd:string .
        ?y friend ?x .
        ?y name ?z
      }
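The two queries on slide 17 ask the same question: what are the names of those who list Marko as a friend? The sketch below runs the SQL against an in-memory SQLite database and evaluates the SPARQL pattern by hand over a set of triples. It is an illustration under assumptions: the Friendship_Table column names `ID` and `Friend` are taken from the query text, the string literal uses single quotes as standard SQL expects, and the nested comprehension is a hand-written stand-in for a SPARQL engine, not a real one.

```python
import sqlite3

# Relational side: the deck's two tables in an in-memory SQLite database.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Object_Table (ID TEXT, Name TEXT, Type TEXT, Legs INTEGER, Fur INTEGER);
    CREATE TABLE Friendship_Table (ID TEXT, Friend TEXT);
    INSERT INTO Object_Table VALUES ('0001', 'Marko', 'Human', 2, 0);
    INSERT INTO Object_Table VALUES ('0002', 'Fluffy', 'Dog', 4, 1);
    INSERT INTO Friendship_Table VALUES ('0001', '0002');
    INSERT INTO Friendship_Table VALUES ('0002', '0001');
""")
sql_rows = db.execute("""
    SELECT OTY.Name
    FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table
    WHERE OTX.Name = 'Marko'
      AND Friendship_Table.ID = OTY.ID
      AND Friendship_Table.Friend = OTX.ID
""").fetchall()
sql_names = {row[0] for row in sql_rows}

# Graph side: the same facts as (subject, predicate, object) triples.
triples = {
    ("0001", "name", "Marko"),
    ("0002", "name", "Fluffy"),
    ("0001", "friend", "0002"),
    ("0002", "friend", "0001"),
}

# Pattern: ?x name "Marko" . ?y friend ?x . ?y name ?z
graph_names = {
    z
    for (x, p1, o1) in triples if p1 == "name" and o1 == "Marko"
    for (y, p2, o2) in triples if p2 == "friend" and o2 == x
    for (s3, p3, z) in triples if p3 == "name" and s3 == y
}

print(sql_names, graph_names)
```

The SQL needs a self-join of Object_Table through a link table, while the graph pattern simply walks two kinds of edge.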
  • 18. Outline • The Relational Database vs. the Graph Database • The World Wide Web vs. the Web of Data
  • 19. Internet Address Spaces • The Uniform Resource Identifier (URI) is the superclass of the Uniform Resource Locator (URL) and Uniform Resource Name (URN).
  • 20. The Uniform Resource Locator • The set of all URLs is the address space of all resources that can be located and retrieved on the Web. URLs denote where a resource is. http://markorodriguez.com/index.html ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6 ∗ http:// means an HTTP GET request, by default on port 80. ∗ /index.html names the resource to get at that Internet location. [Figure: a web server at markorodriguez.com (216.251.43.6) serving index.html]
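The parts of the URL on slide 20 can be pulled apart with Python's standard `urllib.parse` module. This is a small sketch: the `describe_url` helper and the `DEFAULT_PORTS` table are illustrative names, and the DNS lookup step is deliberately left out because it needs network access.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only the http scheme and its default port are listed.
DEFAULT_PORTS = {"http": 80}


def describe_url(url):
    """Split a URL into the pieces the slide describes."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,  # how to retrieve the resource
        "host": parts.hostname,  # resolved to an IP address by DNS
        "port": port,
        "path": parts.path,      # which resource to get at that location
    }


info = describe_url("http://markorodriguez.com/index.html")
print(info)
```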
  • 21. The Uniform Resource Name • The set of all URNs is the address space of all resources within the urn: namespace. urn:uuid:bd93def0-8026-11dd-842be54955baa12 urn:issn:0892-3310 urn:doi:10.1016/j.knosys.2008.03.030 • Named resources need not be retrievable through the Web. • URNs denote what a resource is.
  • 22. The Uniform Resource Identifier • The URI address space is an infinite space for all Internet resources. http://markorodriguez.com/index.html urn:issn:0892-3310 ftp://markorodriguez.com/private/markos_secrets.txt http://www.lanl.gov#fluffy • Important: URIs can denote concepts, instances, and data. lanl:fluffy lanl:fluffy_legs
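Slide 22 writes the same resource both as http://www.lanl.gov#fluffy and as the shorthand lanl:fluffy. The sketch below shows how such a prefixed name can be expanded to a full URI. The `PREFIXES` table and the `expand` function are hypothetical helpers for illustration; the lanl: mapping is the one implied by the deck.

```python
# Namespace prefix table; the lanl: mapping follows the deck's examples.
PREFIXES = {"lanl": "http://www.lanl.gov#"}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as lanl:fluffy into a full URI.

    Names whose prefix is not in the table (for example urn:... or http:...)
    are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


print(expand("lanl:fluffy"))
print(expand("lanl:fluffy_legs"))
print(expand("urn:issn:0892-3310"))
```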
  • 23. The “Uniform Resource Graph” • We can denote where something is and what something is, but how do we denote how something relates to something else? • How can we denote what something means, where meaning is determined by its place within a larger relational structure? URIs are like words. They denote things in the real or imaginary world. Linking URIs is like defining words, similar to how a dictionary defines words in terms of other words.
  • 24. The World Wide Web • The World Wide Web is primarily concerned with the Hyper-Text Transfer Protocol (HTTP) and with retrievable resources in the URL address space. • These retrievable resources are files: HTML documents, images, audio, etc. The “web” is created when HTML documents contain URLs. [Figure: index.html at http://markorodriguez.com/ linking by href to Resume.html, Home.html, and Research.html]
  • 25. The Web of Data • The Web of Data is primarily concerned with URIs. If the World Wide Web is the web of files, the Web of Data is the web of data. In other words, for the World Wide Web, the level of granularity is the retrievable file. For the Web of Data, it is the information in that file. Moreover, these data are not necessarily contained in a file. Their existence is predicated on their URIs. Their meaning is predicated on their relationships to other URIs. The web of URIs is the Web of Data.
  • 26. The Resource Description Framework • The Resource Description Framework (RDF) is the standard for representing the relationship between URIs and literals (e.g. float, string, date time). I would have preferred the name “Uniform Resource Graph” (URG). • Relationships are directed, labeled links between URIs. A subject URI points to an object URI or literal by means of a predicate URI.
      [Figure: triples in subject, predicate, object order]
        lanl:marko   foaf:knows  lanl:fluffy
        lanl:marko   foaf:name   "Marko A. Rodriguez"^^xsd:string
        lanl:fluffy  foaf:name   "Fluffy P. Everywhere"^^xsd:string
  • 27. [Figure: the make believe world as an RDF graph with the following edges]
        lanl:Human   rdfs:subClassOf  lanl:Mammal
        lanl:Dog     rdfs:subClassOf  lanl:Mammal
        lanl:marko   rdf:type         lanl:Human
        lanl:fluffy  rdf:type         lanl:Dog
        lanl:marko   foaf:knows       lanl:fluffy
        lanl:fluffy  foaf:knows       lanl:marko
        lanl:marko   foaf:name        "Marko A. Rodriguez"^^xsd:string
        lanl:marko   lanl:legs        "2"^^xsd:integer
        lanl:marko   lanl:fur         "false"^^xsd:boolean
        lanl:fluffy  foaf:name        "Fluffy P. Everywhere"^^xsd:string
        lanl:fluffy  lanl:legs        "4"^^xsd:integer
        lanl:fluffy  lanl:fur         "true"^^xsd:boolean
  • 28. The RDF Data Model and its Serializations • RDF is a data model. As such, there exist many serializations (encoding formats) of that model. • RDF/XML is not RDF. It is a serialization of RDF. It is wise to avoid learning RDF/XML at all costs, as it is an unintuitive standard. Other serializations include: N-Triples, N3, TriX, TriG, ...
      <http://www.lanl.gov#marko> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.lanl.gov#Human> .
      <http://www.lanl.gov#marko> <http://xmlns.com/foaf/0.1/name> "Marko A. Rodriguez"^^<http://www.w3.org/2001/XMLSchema#string> .
      <http://www.lanl.gov#marko> <http://www.lanl.gov#legs> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .
      <http://www.lanl.gov#marko> <http://www.lanl.gov#fur> "false"^^<http://www.w3.org/2001/XMLSchema#boolean> .
      <http://www.lanl.gov#marko> <http://xmlns.com/foaf/0.1/knows> <http://www.lanl.gov#fluffy> .
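The distinction between the RDF data model and one of its serializations can be shown with a few lines of Python: the triples live in a plain list, and a separate function writes them out in the line-based format used on slide 28. This is a simplified sketch, not a complete N-Triples writer. The `term` and `to_ntriples` names are illustrative, typed literals are represented as (lexical form, datatype URI) tuples by assumption, and only backslashes and double quotes are escaped.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
LANL = "http://www.lanl.gov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term(value):
    """Render one RDF term in N-Triples syntax."""
    if isinstance(value, tuple):
        # Assumed convention: a typed literal is (lexical form, datatype URI).
        lexical, datatype = value
        escaped = lexical.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"^^<{datatype}>'
    # Any plain string is treated as a URI.
    return f"<{value}>"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


# The data model: two of the statements from the slide.
triples = [
    (LANL + "marko", RDF_TYPE, LANL + "Human"),
    (LANL + "marko", LANL + "legs", ("2", XSD + "integer")),
]
output = to_ntriples(triples)
print(output)
```

The same list of triples could be handed to a different writer to produce another serialization without changing the data itself.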
  • 29. The Web of Data is a Distributed Database • The URI address space is distributed. • URIs can denote data. • RDF denotes the relationships between URIs. • The Web of Data’s foundational standard is RDF. • Therefore, the Web of Data is a distributed database.
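One way to see why shared URIs make a distributed database possible: two stores that use the same URIs can be combined by plain set union, and a query can then follow edges across the former boundary. The sketch below assumes two hypothetical stores, `store_a` and `store_b`, and simplifies literals to plain Python strings; it illustrates the argument and does not model any real Linked Data service.

```python
LANL = "http://www.lanl.gov#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Two triple sets held by different (hypothetical) servers.
store_a = {
    (LANL + "marko", FOAF + "knows", LANL + "fluffy"),
    (LANL + "marko", FOAF + "name", "Marko A. Rodriguez"),
}
store_b = {
    (LANL + "fluffy", FOAF + "knows", LANL + "marko"),
    (LANL + "fluffy", FOAF + "name", "Fluffy P. Everywhere"),
}

# Because both stores use the same global URIs, merging is a set union.
web_of_data = store_a | store_b

# Whom does marko know, and what are they called? The knows edge comes from
# store_a and the name edge from store_b.
known_names = {
    name
    for (s, p, o) in web_of_data
    if s == LANL + "marko" and p == FOAF + "knows"
    for (s2, p2, name) in web_of_data
    if s2 == o and p2 == FOAF + "name"
}
print(known_names)
```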
  • 30. The World Wide Web vs. the Web of Data [Figure: two web servers (127.0.0.1 and 127.0.0.2) whose HTML documents are linked by href, contrasted with a web server and a graph database (127.0.0.1 and 127.0.0.2) whose RDF resources are linked by foaf:knows]
  • 31. The Current Web of Data [Figure: diagram of the interlinked data sets on the Web of Data, e.g. dbpedia, freebase, geonames, musicbrainz, uniprot, pubmed, drugbank, ...]
  • 32. The Current Web of Data
      data set          domain      data set          domain      data set            domain
      audioscrobbler    music       govtrack          government  pubguide            books
      bbclatertotp      music       homologene        biology     qdos                social
      bbcplaycountdata  music       ibm               computer    rae2001             computer
      bbcprogrammes     media       ieee              computer    rdfbookmashup       books
      budapestbme       computer    interpro          biology     rdfohloh            social
      chebi             biology     jamendo           music       resex               computer
      crunchbase        business    laascnrs          computer    riese               government
      dailymed          medical     libris            books       semanticweborg      computer
      dblpberlin        computer    lingvoj           reference   semwebcentral       social
      dblphannover      computer    linkedct          medical     siocsites           social
      dblprkbexplorer   computer    linkedmdb         movie       surgeradio          music
      dbpedia           general     magnatune         music       swconferencecorpus  computer
      doapspace         social      musicbrainz       music       taxonomy            reference
      drugbank          medical     myspacewrapper    social      umbel               general
      eurecom           computer    opencalais        reference   uniref              biology
      eurostat          government  opencyc           general     unists              biology
      flickrexporter    images      openguides        reference   uscensusdata        government
      flickrwrappr      images      pdb               biology     virtuososponger     reference
      foafprofiles      social      pfam              biology     w3cwordnet          reference
      freebase          general     pisa              computer    wikicompany         business
      geneid            biology     prodom            biology     worldfactbook       government
      geneontology      biology     projectgutenberg  books       yago                general
      geonames          geographic  prosite           biology     ...
  • 33. Cultural Differences that are Leading to a New World of Large-Scale Knowledge Management • Relational databases tend not to maintain public access points. • Relational database users tend not to publish their schemas. • Web of Data graph databases maintain public access points called SPARQL end-points. • Web of Data graph databases tend to reuse and extend public schemas called ontologies.
  • 34. Conclusion • Thank you for your time... My homepage: http://markorodriguez.com Neno/Fhat: http://neno.lanl.gov Collective Decision Making Systems: http://cdms.lanl.gov Faith in the Algorithm: http://faithinthealgorithm.net MESUR: http://www.mesur.org