SlideShare une entreprise Scribd logo
1  sur  27
Digital Enterprise Research Institute                                                           www.deri.ie




                       Exchange and Consumption of
                             Huge RDF Data
                            Miguel A. Martínez-Prieto1,2 <migumar2@infor.uva.es>
                                                Mario Arias1,3 <mario.arias@deri.org>
                                        Javier D. Fernández1,2 <jfergar@infor.uva.es>

              1. Department   of Computer Science, Universidad de Valladolid (Spain)
              2. Department of Computer Science, Universidad de Chile (Chile)
              3. Digital Enterprise Research Institute, National University of Ireland Galway




 Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Sharing RDF in the Web of Data.
Digital Enterprise Research Institute                                                              www.deri.ie




                                                                              Parsing / Indexing
                                                                              Reasoning
                                                                                             R
                                                   •    Dataset analysis.      I
                                                   •    Setup a SPARQL server. P
                                                   •    Vocabulary interlinking / integration.
                                                   •    Browsing and Visualization.
                        sensor                     •    Exchange between servers
                                                   •    Data-intensive tasks.

                                                       dereferenceable URIs


                                        RDF dump
                                                       SPARQL Endpoints/
                                                             APIs
Dataset Exchange Workflow
Digital Enterprise Research Institute                                           www.deri.ie




                 1º                             2º                   3º
              Publication                    Exchange            Consumption

                   Convert                     Transfer            Decompress

                                 If RDF is meant to be machine processable,
                  Serialize                                            Parse
                         Why are we using plain text serialization formats??

                Compress                                               Index
HDT: RDF Binary Format
Digital Enterprise Research Institute                                www.deri.ie




            Compact Data Structure for RDF.
            W3C Submission. http://www.w3.org/Submission/2011/03/
            Open Source C++/Java library.
HDT Focused on Querying
Digital Enterprise Research Institute                                         www.deri.ie




                                                                 FoQ
            Contribution of this paper:
                   A complementary Index to make the HDT fully queryable.
                   Analysis on how HDT reduces exchange and indexing time.
                   Evaluate in-memory query performance.
Dictionary
Digital Enterprise Research Institute                    www.deri.ie




        Mapping of strings to correlative IDs. {1..n}
        Lexicographically sorted, no duplicates.
        Section compression explained at [8]
Triples Model
Digital Enterprise Research Institute                                                         www.deri.ie


            Triples
                                        S         1                     2                 3
             126
             132
             213                        P[   2        3]   [   1        2      ] [4   ]   3

             224
             225                        O[   6   ][   2]   [   ][
                                                               3    4   ] [5   ] [1   ]   2
             241
             332
Adjacency Lists
Digital Enterprise Research Institute                                                                                      www.deri.ie



                                        1                                2                       3

                           [     2      ,       3]   [           ,
                                                                 1           ,2       ] [4           ]       3
                                 1              2            3           4        5              6


                                        Array            2           3       1    2          4           3
                                     Bitmap              1           0       1    0          0           1

            Operations:
              –      access(g) = Given a global position, get the value.                                         O(1)
              –      findList(g) = Given a global position, get the list number.                                 O(1)
                                                                                                                 O(log log n)
              –      first(l), last(l), = Given a list, find the first and last.
Triples Model and Coding
Digital Enterprise Research Institute                                                                         www.deri.ie


            Triples
                                        S             1                       2                       3
             126
             132
             213                        P      2          3       1           2               4       3

             224
             225                        O      6          2       3       4           5       1       2
             241                            Array Y           2   3   1           2       4       3
             332                        Bitmap Y              1   0   1           0       0       1

                                            Array Z           6   2   3           4       5       1       2
                                        Bitmap Z              1   1   1           1       0       1   1
Searching by Subject
Digital Enterprise Research Institute                                                                           www.deri.ie


            Triples
                                        S             1   ( 2, 2, ? )           2                       3
             126
             132
             213                        P      2          3       1             2               4       3

             224
             225                        O      6          2       3         4           5       1       2
             241                            Array Y           2   3     1           2       4       3
             332                        Bitmap Y              1   0     1           0       0       1

     SPO, SP?                               Array Z           6   2     3           4       5       1       2
     S??, S?O                           Bitmap Z              1   1     1           1       0       1   1
Searching by Predicate
Digital Enterprise Research Institute                                                                           www.deri.ie


            Triples
                                        S             1       ( ?, 2, ? )       2                       3
             126
             132
             213                        P      2          3         1           2               4       3

             224
             225                        O      6          2         3       4           5       1       2
             241                            Array Y           2    3    1           2       4       3
             332                        Bitmap Y              1    0    1           0       0       1

           ?P?                              Array Z           6    2    3           4       5       1       2
                                        Bitmap Z              1    1    1           1       0       1   1
Wavelet Tree
Digital Enterprise Research Institute                                                                              www.deri.ie




            Compact Sequence of Integers {0,σ}.
                                                                        rank(3, 7) = 2
                      2      3      6       3       6       1   2
                                                                1   3     6   2    5     2   4   1    4   2
                      1      2     3    4       5       6   7   8   9    10 11 12 13 14 15
                                                                          9                      16
                                                                                  select(6, 3) = 9

                   access(position) = Value at position.
                   rank(entry, position) = Number of appearances of                                          O(log σ)
                                                                                                              O(log σ)
                    “entry” up to “position”.                                                                 O(log σ)
                   select(entry, i) = Position where “entry” appears for the
                    i-th time.
Searching by Predicate w/ Wavelet
Digital Enterprise Research Institute                                                                           www.deri.ie


            Triples
                                        S             1       ( ?, 2, ? )       2                       3
             126
             132
             213                        P      2          3         1           2               4       3

             224
             225                        O      6          2         3       4           5       1       2
             241
                                        Wavelet Y             2    3    1           2       4       3
             332                        Bitmap Y              1    0    1           0       0       1

           ?P?                              Array Z           6    2    3           4       5       1       2
                                        Bitmap Z              1    1    1           1       0       1   1
Triples: Object-Search
Digital Enterprise Research Institute                                                                        www.deri.ie


            Triples
                                        S            1       ( ?, ?, 2 )        2                        3
             126
             132
             213                        P       2        3         1            2               4        3

             224
             225                        O       6        2         3        4        5          1        2
             241
             332

     ??O                OP-Index            [   6   ][   2        ][
                                                                  7     ]3[         ] [4    ] [5 ]       1

     ?PO                                        O1           O2        O3       O4         O5       O6
Data Structure Summary.
Digital Enterprise Research Institute                            www.deri.ie




            From HDT to HDT-FoQ:
                   Convert Array Y to Wavelet.
                   Generate OP-Index.


            Triple Patterns:

                         SPO, SP?, S??, S?O       Original HDT
                         ?P?                      Wavelet Tree
                         ?PO, ??O                 OP-Index
Evaluation Environment
Digital Enterprise Research Institute                                                  www.deri.ie




          Dataset           Triples      Size NTriples
          LinkedMDB         6,1M         850 Mb
          DBLP              73M          11,1 Gb
          Geonames          112M         12,3 Gb
                                                         Producer:       Consumer:
          DBPedia           258M         37,3 Gb
                                                         Xeon @ 2.4Ghz   Phenom-II @ 3.2Ghz
                         Datasets                        96GB RAM        8GB RAM



                                        Compressors:                     RDF Storage

                                        • GZIP                           • Virtuoso
                                        • LZMA                           • RDF-3x
                                                                         • Hexastore
Compression Ratio
Digital Enterprise Research Institute                                                                 www.deri.ie



            DBPedia



         Geonames

                                                                                      hdt

                                                                                      gz
                 DBLP
                                                                                      lzma

                                                                                      hdt.gz
        LinkedMDB
                                                                                      hdt.lzma


                            0       1   2    3    4    5    6    7    8    9   10   11      12   13    14
                                            Compression ratio (% against plain ntriples)
Publication Times
Digital Enterprise Research Institute                                                                                          www.deri.ie


                                         NT+GZIP         NT+LZMA          HDT             HDT+GZIP        HDT+LZMA
                          linkedMDB      11,3 sec        14,7 min         1,05 min        1,09 min        1,52 min
                          DBLP           2,72 min        103 min          12 min          13,5 min        21,9 min
                          Geonames       3,28 min        244 min          25 min          26,4 min        38,9 min
                          DBPedia        15,9 min        466 min          56 min          60 min          121 min



          dbpedia



       geonames



             dblp



       linkedMDB


                    0      5      10    15    20    25      30      35      40       45         50   55   60   65    70   75       80
                                                            Times slower than Ntriples+GZIP

                                                     gz     lzma    hdt   hdt.gz     hdt.lzma
Publication Times2
Digital Enterprise Research Institute                                                                                         www.deri.ie


                                            NT+GZIP        NT+LZMA             HDT            HDT+GZIP       HDT+LZMA
                          linkedMDB         11,3 sec       14,7 min            1,05 min       1,09 min       1,52 min
                          DBLP              2,72 min       103 min             12 min         13,5 min       21,9 min
                          Geonames          3,28 min       244 min             25 min         26,4 min       38,9 min
                          DBPedia           15,9 min       466 min             56 min         60 min         121 min



          dbpedia



       geonames



             dblp



       linkedMDB


                    0       1           2      3       4        5          6         7        8          9   10     11   12    13
                                                            Times slower than Ntriples + GZIP

                                                           gz       hdt   hdt.gz   hdt.lzma
Exchange & Decompression Time
Digital Enterprise Research Institute                                                                 www.deri.ie




            GZIP




            LZMA




       HDT+GZIP




      HDT+LZMA                                                                           Exchange
                                                                                         Decompress

                   0                    50   100               150                200   250              300
                                             Seconds (Geometric Mean of all datasets)



                                                       *Assuming a Network Bandwidth of 2MByte/s
Overall Client Time
Digital Enterprise Research Institute                                                                                                    www.deri.ie




     LZMA+Virtuoso




       GZ+Virtuoso



                                                                                                                                 Exchange
      LZMA+RDF3x
                                                                                                                                 Decompress
                                                                                                                                 Index

         GZ+RDF3x
                                                                                                     LZMA+RDF3x              HDT+LZMA
                                                                                linkedMDB                      2,1 min              9,21 sec
  HDT+LZMA+FOQ
                                                                                dblp                           27 min              2,02 min
                                                                                geonames                     49,2 min              3,04 min
   HDT+GZIP+FOQ                                                                 dbpedia                       159 min              14,3 min

                     0     200    400   600   800   1000   1200   1400   1600   1800   2000   2200   2400   2600   2800   3000   3200    3400   3600
                                                            Seconds (Geometric mean of all datasets)
In-memory Data Store.
Digital Enterprise Research Institute                                                               www.deri.ie




                                        Triples                   Index Size (Mb)
                                                  Virtuoso       Hexastore       RDF3x    HDT-FoQ
              LinkedMDB                    6,1M         518           6976         337          68
              DBLP                          46M        3982                  -     3252        850
              Geonames                     112M        9216                  -     6678       1435
              DBPedia                      258M              -               -    15802       5260



            Less size = more data in memory = less I/O access!
Query Performance, Triple Patterns
Digital Enterprise Research Institute                                                                 www.deri.ie



                                LinkedMDB                                     Geonames
                    16                                       16
                    15                                       15
                    14                                       14                            RDF-3x
                    13                                       13                            Virtuoso
                    12                                       12
                    11                                       11
 Times HDT Faster




                    10                                       10
                     9                                        9
                     8                                        8
                     7                                        7
                     6                                        6
                     5                                        5
                     4                                        4
                     3                                        3
                     2                                        2
                     1                                        1
                     0                                        0
                         SP?   S?O   S??   ?PO   ?P?   ??O        SP?   S?O    S??   ?PO      ?P?      ??O
Query Performance Two-way Joins
Digital Enterprise Research Institute                                                                                                 www.deri.ie


                                     LinkedMDB                                                          Geonames
                     3                                                           3

                                                                                                                                   RDF-3x
                                                                                                                                   Virtuoso
                    2.5                                                         2.5




                     2                                                           2
 Times HDT Faster




                    1.5                                                         1.5




                     1                                                           1




                    0.5                                                         0.5




                     0                                                           0
                          SSbig   SSsmall   SObig   SOsmall   OObig   OOsmall         SSbig   SSsmall    SObig   SOsmall   OObig      OOsmall
Conclusions
Digital Enterprise Research Institute                                        www.deri.ie




         Data is ready to be consumed 10-15x faster.
               Exchange time reduced.
               Indexing burden on server = Lightweight client processing.
         Competitive query performance.
               Very fast on triple patterns.
               Joins on the same scale of existing solutions.
         This is useful to you:
               If you need a fast, compact read-only in-memory RDF store.
               If you want to share self-queryable RDF dumps.
               If you need fast download & query.
         Addresses the volume issue of Big Data.
Future work.
Digital Enterprise Research Institute                 www.deri.ie


            Full SPARQL support.
                   UNION, Optional, Multiple Join.
                   Optimized query evaluation.
            Integration:
                   Jena, Any23…
            Diffussion.
                   Get more people to use it!
            Additional services on top of HDT.
                   SPARQL Endpoint.
                   Distributed Stream Processing.
                   Mobile Applications.
Thanks! http://www.rdf-hdt.org
Digital Enterprise Research Institute   www.deri.ie

Contenu connexe

Tendances

Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 OverviewNeo4j
 
Resource description framework
Resource description frameworkResource description framework
Resource description frameworkStanley Wang
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
How Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global TravelHow Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global TravelNeo4j
 
Risk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptx
Risk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptxRisk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptx
Risk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptxNeo4j
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQLOpen Data Support
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jjexp
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDFNarni Rajesh
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...Joshua Shinavier
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Simplilearn
 
Road to NODES - Blazing Fast Ingest with Apache Arrow
Road to NODES - Blazing Fast Ingest with Apache ArrowRoad to NODES - Blazing Fast Ingest with Apache Arrow
Road to NODES - Blazing Fast Ingest with Apache ArrowNeo4j
 
Data Quality and the FAIR principles
Data Quality and the FAIR principlesData Quality and the FAIR principles
Data Quality and the FAIR principlesAmrapali Zaveri, PhD
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4jNeo4j
 
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開KnowledgeGraph
 
A Universe of Knowledge Graphs
A Universe of Knowledge GraphsA Universe of Knowledge Graphs
A Universe of Knowledge GraphsNeo4j
 

Tendances (20)

Neo4j 4 Overview
Neo4j 4 OverviewNeo4j 4 Overview
Neo4j 4 Overview
 
Resource description framework
Resource description frameworkResource description framework
Resource description framework
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
How Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global TravelHow Expedia’s Entity Graph Powers Global Travel
How Expedia’s Entity Graph Powers Global Travel
 
Risk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptx
Risk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptxRisk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptx
Risk Signature Profiles in Health Care Claims(Risk_Signature_Profiles)_.pptx
 
Introduction to RDF & SPARQL
Introduction to RDF & SPARQLIntroduction to RDF & SPARQL
Introduction to RDF & SPARQL
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
A Graph is a Graph is a Graph: Equivalence, Transformation, and Composition o...
 
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
Hadoop Training | Hadoop Training For Beginners | Hadoop Architecture | Hadoo...
 
Road to NODES - Blazing Fast Ingest with Apache Arrow
Road to NODES - Blazing Fast Ingest with Apache ArrowRoad to NODES - Blazing Fast Ingest with Apache Arrow
Road to NODES - Blazing Fast Ingest with Apache Arrow
 
Data Quality and the FAIR principles
Data Quality and the FAIR principlesData Quality and the FAIR principles
Data Quality and the FAIR principles
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
Linked Open Data勉強会2020 前編:LODの基礎・作成・公開
 
Session 14 - Hive
Session 14 - HiveSession 14 - Hive
Session 14 - Hive
 
A Universe of Knowledge Graphs
A Universe of Knowledge GraphsA Universe of Knowledge Graphs
A Universe of Knowledge Graphs
 
Apache hive
Apache hiveApache hive
Apache hive
 
06 Vector Visualization
06 Vector Visualization06 Vector Visualization
06 Vector Visualization
 

Similaire à Exchange and Consumption of Huge RDF Data

SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)net2-project
 
Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015dhiguero
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Hubert Fan Chiang
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsRichard Cyganiak
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkWill Du
 
Linked Data: opportunities and challenges
Linked Data: opportunities and challengesLinked Data: opportunities and challenges
Linked Data: opportunities and challengesMichael Hausenblas
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesData Ninja API
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...Databricks
 
Fiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheFiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheChristopher Brown
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in SparkDatabricks
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprisesnvvrajesh
 
Capturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance WorkflowsCapturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance WorkflowsAndre Freitas
 
Omitola o rian_eswc_idts final
Omitola o rian_eswc_idts finalOmitola o rian_eswc_idts final
Omitola o rian_eswc_idts finalTope Omitola
 
Rethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed SemanticRethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed SemanticAlexandre Passant
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksAnyscale
 
On the diversity and availability of temporal information in linked open data
On the diversity and availability of temporal information in linked open dataOn the diversity and availability of temporal information in linked open data
On the diversity and availability of temporal information in linked open dataAnisa Rula
 
How to Build Linked Data Sites with Drupal 7 and RDFa
How to Build Linked Data Sites with Drupal 7 and RDFaHow to Build Linked Data Sites with Drupal 7 and RDFa
How to Build Linked Data Sites with Drupal 7 and RDFascorlosquet
 

Similaire à Exchange and Consumption of Huge RDF Data (20)

SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015Adios hadoop, Hola Spark! T3chfest 2015
Adios hadoop, Hola Spark! T3chfest 2015
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
VoID: Metadata for RDF Datasets
VoID: Metadata for RDF DatasetsVoID: Metadata for RDF Datasets
VoID: Metadata for RDF Datasets
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache Spark
 
Linked Data: opportunities and challenges
Linked Data: opportunities and challengesLinked Data: opportunities and challenges
Linked Data: opportunities and challenges
 
Building DBpedia Japanese and Linked Data Cloud in Japanese
Building DBpedia Japanese and Linked Data Cloud in JapaneseBuilding DBpedia Japanese and Linked Data Cloud in Japanese
Building DBpedia Japanese and Linked Data Cloud in Japanese
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databases
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
Fiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheFiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents Fiche
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
Hadoop For Enterprises
Hadoop For EnterprisesHadoop For Enterprises
Hadoop For Enterprises
 
Capturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance WorkflowsCapturing Interactive Data Transformation Operations using Provenance Workflows
Capturing Interactive Data Transformation Operations using Provenance Workflows
 
Omitola o rian_eswc_idts final
Omitola o rian_eswc_idts finalOmitola o rian_eswc_idts final
Omitola o rian_eswc_idts final
 
Rethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed SemanticRethinking Microblogging: Open Distributed Semantic
Rethinking Microblogging: Open Distributed Semantic
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
 
On the diversity and availability of temporal information in linked open data
On the diversity and availability of temporal information in linked open dataOn the diversity and availability of temporal information in linked open data
On the diversity and availability of temporal information in linked open data
 
How to Build Linked Data Sites with Drupal 7 and RDFa
How to Build Linked Data Sites with Drupal 7 and RDFaHow to Build Linked Data Sites with Drupal 7 and RDFa
How to Build Linked Data Sites with Drupal 7 and RDFa
 

Dernier

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 

Dernier (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 

Exchange and Consumption of Huge RDF Data

  • 1. Digital Enterprise Research Institute www.deri.ie Exchange and Consumption of Huge RDF Data Miguel A. Martínez-Prieto1,2 <migumar2@infor.uva.es> Mario Arias1,3 <mario.arias@deri.org> Javier D. Fernández1,2 <jfergar@infor.uva.es> 1. Department of Computer Science, Universidad de Valladolid (Spain) 2. Department of Computer Science, Universidad de Chile (Chile) 3. Digital Enterprise Research Institute, National University of Ireland Galway Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
  • 2. Sharing RDF in the Web of Data. Digital Enterprise Research Institute www.deri.ie Parsing / Indexing Reasoning R • Dataset analysis. I • Setup a SPARQL server. P • Vocabulary interlinking / integration. • Browsing and Visualization. sensor • Exchange between servers • Data-intensive tasks. dereferenceable URIs RDF dump SPARQL Endpoints/ APIs
  • 3. Dataset Exchange Workflow Digital Enterprise Research Institute www.deri.ie 1º 2º 3º Publication Exchange Consumption Convert Transfer Decompress If RDF is meant to be machine processable, Serialize Parse Why are we using plain text serialization formats?? Compress Index
  • 4. HDT: RDF Binary Format Digital Enterprise Research Institute www.deri.ie  Compact Data Structure for RDF.  W3C Submission. http://www.w3.org/Submission/2011/03/  Open Source C++/Java library.
  • 5. HDT Focused on Querying Digital Enterprise Research Institute www.deri.ie FoQ  Contribution of this paper:  A complementary Index to make the HDT fully queryable.  Analysis on how HDT reduces exchange and indexing time.  Evaluate in-memory query performance.
  • 6. Dictionary Digital Enterprise Research Institute www.deri.ie  Mapping of strings to correlative IDs. {1..n}  Lexicographically sorted, no duplicates.  Section compression explained at [8]
  • 7. Triples Model Digital Enterprise Research Institute www.deri.ie Triples S 1 2 3 126 132 213 P[ 2 3] [ 1 2 ] [4 ] 3 224 225 O[ 6 ][ 2] [ ][ 3 4 ] [5 ] [1 ] 2 241 332
  • 8. Adjacency Lists Digital Enterprise Research Institute www.deri.ie 1 2 3 [ 2 , 3] [ , 1 ,2 ] [4 ] 3 1 2 3 4 5 6 Array 2 3 1 2 4 3 Bitmap 1 0 1 0 0 1  Operations: – access(g) = Given a global position, get the value. O(1) – findList(g) = Given a global position, get the list number. O(1) O(log log n) – first(l), last(l), = Given a list, find the first and last.
  • 9. Triples Model and Coding Digital Enterprise Research Institute www.deri.ie Triples S 1 2 3 126 132 213 P 2 3 1 2 4 3 224 225 O 6 2 3 4 5 1 2 241 Array Y 2 3 1 2 4 3 332 Bitmap Y 1 0 1 0 0 1 Array Z 6 2 3 4 5 1 2 Bitmap Z 1 1 1 1 0 1 1
  • 10. Searching by Subject Digital Enterprise Research Institute www.deri.ie Triples S 1 ( 2, 2, ? ) 2 3 126 132 213 P 2 3 1 2 4 3 224 225 O 6 2 3 4 5 1 2 241 Array Y 2 3 1 2 4 3 332 Bitmap Y 1 0 1 0 0 1 SPO, SP? Array Z 6 2 3 4 5 1 2 S??, S?O Bitmap Z 1 1 1 1 0 1 1
  • 11. Searching by Predicate Digital Enterprise Research Institute www.deri.ie Triples S 1 ( ?, 2, ? ) 2 3 126 132 213 P 2 3 1 2 4 3 224 225 O 6 2 3 4 5 1 2 241 Array Y 2 3 1 2 4 3 332 Bitmap Y 1 0 1 0 0 1 ?P? Array Z 6 2 3 4 5 1 2 Bitmap Z 1 1 1 1 0 1 1
  • 12. Wavelet Tree Digital Enterprise Research Institute www.deri.ie  Compact Sequence of Integers {0,σ}. rank(3, 7) = 2 2 3 6 3 6 1 2 1 3 6 2 5 2 4 1 4 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 9 16 select(6, 3) = 9  access(position) = Value at position.  rank(entry, position) = Number of appearances of O(log σ) O(log σ) “entry” up to “position”. O(log σ)  select(entry, i) = Position where “entry” appears for the i-th time.
  • 13. Searching by Predicate w/ Wavelet Digital Enterprise Research Institute www.deri.ie Triples S 1 ( ?, 2, ? ) 2 3 126 132 213 P 2 3 1 2 4 3 224 225 O 6 2 3 4 5 1 2 241 Wavelet Y 2 3 1 2 4 3 332 Bitmap Y 1 0 1 0 0 1 ?P? Array Z 6 2 3 4 5 1 2 Bitmap Z 1 1 1 1 0 1 1
  • 14. Triples: Object-Search Digital Enterprise Research Institute www.deri.ie Triples S 1 ( ?, ?, 2 ) 2 3 126 132 213 P 2 3 1 2 4 3 224 225 O 6 2 3 4 5 1 2 241 332 ??O OP-Index [ 6 ][ 2 ][ 7 ]3[ ] [4 ] [5 ] 1 ?PO O1 O2 O3 O4 O5 O6
  • 15. Data Structure Summary. Digital Enterprise Research Institute www.deri.ie  From HDT to HDT-FoQ:  Convert Array Y to Wavelet.  Generate OP-Index.  Triple Patterns: SPO, SP?, S??, S?O Original HDT ?P? Wavelet Tree ?PO, ??O OP-Index
  • 16. Evaluation Environment Digital Enterprise Research Institute www.deri.ie Dataset Triples Size NTriples LinkedMDB 6,1M 850 Mb DBLP 73M 11,1 Gb Geonames 112M 12,3 Gb Producer: Consumer: DBPedia 258M 37,3 Gb Xeon @ 2.4Ghz Phenom-II @ 3.2Ghz Datasets 96GB RAM 8GB RAM Compressors: RDF Storage • GZIP • Virtuoso • LZMA • RDF-3x • Hexastore
  • 17. Compression Ratio Digital Enterprise Research Institute www.deri.ie DBPedia Geonames hdt gz DBLP lzma hdt.gz LinkedMDB hdt.lzma 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Compression ratio (% against plain ntriples)
  • 18. Publication Times Digital Enterprise Research Institute www.deri.ie NT+GZIP NT+LZMA HDT HDT+GZIP HDT+LZMA linkedMDB 11,3 sec 14,7 min 1,05 min 1,09 min 1,52 min DBLP 2,72 min 103 min 12 min 13,5 min 21,9 min Geonames 3,28 min 244 min 25 min 26,4 min 38,9 min DBPedia 15,9 min 466 min 56 min 60 min 121 min dbpedia geonames dblp linkedMDB 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 Times slower than Ntriples+GZIP gz lzma hdt hdt.gz hdt.lzma
  • 19. Publication Times2 Digital Enterprise Research Institute www.deri.ie NT+GZIP NT+LZMA HDT HDT+GZIP HDT+LZMA linkedMDB 11,3 sec 14,7 min 1,05 min 1,09 min 1,52 min DBLP 2,72 min 103 min 12 min 13,5 min 21,9 min Geonames 3,28 min 244 min 25 min 26,4 min 38,9 min DBPedia 15,9 min 466 min 56 min 60 min 121 min dbpedia geonames dblp linkedMDB 0 1 2 3 4 5 6 7 8 9 10 11 12 13 Times slower than Ntriples + GZIP gz hdt hdt.gz hdt.lzma
  • 20. Exchange & Decompression Time Digital Enterprise Research Institute www.deri.ie GZIP LZMA HDT+GZIP HDT+LZMA Exchange Decompress 0 50 100 150 200 250 300 Seconds (Geometric Mean of all datasets) *Assuming a Network Bandwidth of 2MByte/s
  • 21. Overall Client Time Digital Enterprise Research Institute www.deri.ie LZMA+Virtuoso GZ+Virtuoso Exchange LZMA+RDF3x Decompress Index GZ+RDF3x LZMA+RDF3x HDT+LZMA linkedMDB 2,1 min 9,21 sec HDT+LZMA+FOQ dblp 27 min 2,02 min geonames 49,2 min 3,04 min HDT+GZIP+FOQ dbpedia 159 min 14,3 min 0 200 400 600 800 1000 1200 1400 1600 1800 2000 2200 2400 2600 2800 3000 3200 3400 3600 Seconds (Geometric mean of all datasets)
  • 22. In-memory Data Store. Digital Enterprise Research Institute www.deri.ie Triples Index Size (Mb) Virtuoso Hexastore RDF3x HDT-FoQ LinkedMDB 6,1M 518 6976 337 68 DBLP 46M 3982 - 3252 850 Geonames 112M 9216 - 6678 1435 DBPedia 258M - - 15802 5260  Less size = more data in memory = less I/O access!
  • 23. Query Performance, Triple Patterns Digital Enterprise Research Institute www.deri.ie LinkedMDB Geonames 16 16 15 15 14 14 RDF-3x 13 13 Virtuoso 12 12 11 11 Times HDT Faster 10 10 9 9 8 8 7 7 6 6 5 5 4 4 3 3 2 2 1 1 0 0 SP? S?O S?? ?PO ?P? ??O SP? S?O S?? ?PO ?P? ??O
  • 24. Query Performance Two-way Joins Digital Enterprise Research Institute www.deri.ie LinkedMDB Geonames 3 3 RDF-3x Virtuoso 2.5 2.5 2 2 Times HDT Faster 1.5 1.5 1 1 0.5 0.5 0 0 SSbig SSsmall SObig SOsmall OObig OOsmall SSbig SSsmall SObig SOsmall OObig OOsmall
  • 25. Conclusions Digital Enterprise Research Institute www.deri.ie  Data is ready to be consumed 10-15x faster.  Exchange time reduced.  Indexing burden on server = Lightweight client processing.  Competitive query performance.  Very fast on triple patterns.  Joins on the same scale of existing solutions.  This is useful to you:  If you need a fast, compact read-only in-memory RDF store.  If you want to share self-queryable RDF dumps.  If you need fast download & query.  Addresses the volume issue of Big Data.
  • 26. Future work. Digital Enterprise Research Institute www.deri.ie  Full SPARQL support.  UNION, Optional, Multiple Join.  Optimized query evaluation.  Integration:  Jena, Any23…  Diffussion.  Get more people to use it!  Additional services on top of HDT.  SPARQL Endpoint.  Distributed Stream Processing.  Mobile Applications.
  • 27. Thanks! http://www.rdf-hdt.org Digital Enterprise Research Institute www.deri.ie

Notes de l'éditeur

  1. Importance of exchange. The Web is for exchanging data. Data flows between nodes. We are in the “Big Data era” We need fast speed, from the network to the application layers.Role of providers / Consumers.Consumption =~ QueryingHow data is shared:Dereferenceable URIs.SPARQL Endpoints.Big datasets: RDF dump. ( Similar to XML, PDF ).Examples where RDF dumps are important: - Setup a mirror. - Overloaded SPARQL Server. - Data analysis. - Vocabulary integration. - Download instead of crawl. - Visualization.Opens new applications. - Processing intensive. - Cooperating applications.
  2. Triples are sorted component by component.We represent them in a tree: - Each level represents S, P, O. - Each path / leave node represents one triple. How we encode the tree for 1 Space 2 Traverse. - Level by level. S implicit. P, O Array. - Relations with brackets / Bitmap. -
  3. CPUs are fast, memory/bandwidth are precious.Variable-length.Compression.Compact In-memory representations.
  4. Triples are sorted component by component.We represent them in a tree: - Each level represents S, P, O. - Each path / leave node represents one triple. How we encode the tree for 1 Space 2 Traverse. - Level by level. S implicit. P, O Array. - Relations with brackets / Bitmap. -
  5. Triples are sorted component by component.We represent them in a tree: - Each level represents S, P, O. - Each path / leave node represents one triple. How we encode the tree for 1 Space 2 Traverse. - Level by level. S implicit. P, O Array. - Relations with brackets / Bitmap. -
  6. Triples are sorted component by component.We represent them in a tree: - Each level represents S, P, O. - Each path / leave node represents one triple. How we encode the tree for 1 Space 2 Traverse. - Level by level. S implicit. P, O Array. - Relations with brackets / Bitmap. -
  7. Triples are sorted component by component.We represent them in a tree: - Each level represents S, P, O. - Each path / leave node represents one triple. How we encode the tree for 1 Space 2 Traverse. - Level by level. S implicit. P, O Array. - Relations with brackets / Bitmap. -
  8. Triples are sorted component by component.We represent them in a tree: - Each level represents S, P, O. - Each path / leave node represents one triple. How we encode the tree for 1 Space 2 Traverse. - Level by level. S implicit. P, O Array. - Relations with brackets / Bitmap. -
  9. DatasetsServersData stores.CompilerCompressors.GZIPLZMA
  10. From NTRIPLES to XXXFrom a data store could be faster (Already sorted).
  11. Includes dictionary!!!Great for mobile.