Building a High Performance Environment for RDF Publishing

Pascal Christoph
These slides and all the graphics made by the author and those
taken from https://openclipart.org/ are dedicated to the public
domain : https://creativecommons.org/about/cc0 .

All marks mentioned may be trademarks or registered trademarks
of their respective owners.

Read about the license of "The Scream" by Edvard Munch at
https://en.wikipedia.org/wiki/File:The_Scream.jpg

             Linking Open Data cloud diagram, by Richard
             Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Overview

Publishing is for Consuming
  • Mandatory
  • Nice to have

Story so far - experiences with lobid.org
  • What is lobid.org?
  • Storing the data
  • Getting the data

Publishing RDF through elasticsearch
  • Benefits
  • Some more details
  • Caveats

Future prospects

Publishing is for Consuming

Publishing is for Consuming
Mandatory

A resource:



gets a dereferenceable URI:


which provides RDF:
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .

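To show how little a consumer needs in order to use such a response, here is a minimal Python sketch that reads the four statements above into tuples. It is an assumption-laden toy: the regular expression only covers IRIs and plain string literals (no language tags, datatypes, blank nodes or escapes), and `parse_ntriples` is a name made up for this example. A real client would use an RDF library.

```python
import re

# One simple N-Triples statement: subject IRI, predicate IRI, and either an
# object IRI or a plain string literal. Deliberately incomplete (see lead-in).
TRIPLE = re.compile(r'^\s*<([^>]*)>\s*<([^>]*)>\s*(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) tuples."""
    triples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        subject, predicate, iri, literal = match.groups()
        triples.append((subject, predicate, iri if iri is not None else literal))
    return triples


# The statements shown on the slide.
DATA = '''
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .
'''

for triple in parse_ntriples(DATA):
    print(triple)
```
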
Publishing is for Consuming
Mandatory


=> basic LOD publishing is very simple:


you just need a web server

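As an illustration of "you just need a web server", the following sketch serves one N-Triples document per path using only the Python standard library. Everything in it is an assumption made for the example: the in-memory `DOCUMENTS` table, the `respond` and `serve` helpers, and port 8000. It is not how lobid.org is implemented.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "dataset": request path -> N-Triples document.
DOCUMENTS = {
    "/resource/HT002948556": (
        "<http://lobid.org/resource/HT002948556> "
        '<http://purl.org/dc/terms/title> "With reference to reference" .\n'
    ),
}


def respond(path):
    """Return (status, content type, body bytes) for a request path."""
    body = DOCUMENTS.get(path)
    if body is None:
        return 404, "text/plain; charset=utf-8", b"not found\n"
    return 200, "application/n-triples", body.encode("utf-8")


class RdfHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the RDF stored for the requested path."""

    def do_GET(self):
        status, content_type, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the server until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("localhost", port), RdfHandler).serve_forever()
```

Calling `serve()` and requesting `http://localhost:8000/resource/HT002948556` should return the stored statement; any other path yields a 404.
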
Publishing is for Consuming
Nice to have
• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (best: RDFa in HTML)
• Data searchable
• Timely updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
• ...

Publishing is for Consuming
SPARQL Endpoint
• (Dumps): but may be painfully slow when there is a lot of data
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (best: RDFa in HTML)
• (Data searchable): may be painfully slow
• Timely updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
    • most triple stores provide JSON/RDF
    • simple, powerful API: too powerful/complex?
• ...

Publishing is for Consuming
Nice to have

In principle, web developers already have simple APIs:

                 LOD is the API!

Remember:

Publishing is for Consuming
Mandatory

A resource:



gets a dereferenceable URI:


which provides the data (in RDF):

<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .

Publishing is for Consuming
Nice to have

In principle, web developers already have powerful APIs:

                 RESTful SPARQL

Publishing is for Consuming
RESTful SPARQL example
Getting all data of all resources that have a particular ISBN:


curl -H "Accept: application/json" --data-urlencode 'query=
prefix bibo: <http://purl.org/ontology/bibo/>
SELECT * WHERE {
 ?s bibo:isbn13 "9780851706238" ;
   ?p ?o .
 } LIMIT 100
' http://lobid.org/sparql/



Publishing is for Consuming
RESTful SPARQL example
… and the JSON/RDF result:
{
    "head": {
       "vars": [ "s", "p","o"]
    },
    "results": {
       "bindings": [ {
             "o": {
                "type": "uri",
                "value": "http://openlibrary.org/works/OL2109573W"
             },
             "p": {
                "type": "uri",
                "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested"
             },
             "s": {
                "type": "uri",
                "value": "http://lobid.org/resource/HT007824357"
             }
          },
          {
             "o": { ...




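To give an idea of what a web developer has to do with such a result, here is a small Python sketch that flattens the bindings into (s, p, o) tuples. The `bindings_to_triples` helper is a name invented for this example, and `SAMPLE` contains only the first binding shown above.

```python
import json


def bindings_to_triples(result_json):
    """Turn a SPARQL JSON result with variables s, p, o into (s, p, o) tuples."""
    result = json.loads(result_json)
    triples = []
    for binding in result["results"]["bindings"]:
        # Each variable is bound to an object with a "type" and a "value".
        triples.append(tuple(binding[var]["value"] for var in ("s", "p", "o")))
    return triples


# First binding of the result shown on the slide.
SAMPLE = '''
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {
    "bindings": [
      {
        "o": {"type": "uri", "value": "http://openlibrary.org/works/OL2109573W"},
        "p": {"type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested"},
        "s": {"type": "uri", "value": "http://lobid.org/resource/HT007824357"}
      }
    ]
  }
}
'''

for s, p, o in bindings_to_triples(SAMPLE):
    print(s, p, o)
```
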
Publishing is for Consuming
Nice to have
As it is, web developers don't like SPARQL.

                               web developer

Publishing is for Consuming
Nice to have


Web developers want APIs like:


http://lobid.org/resources/api/isbn/$isbn




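One way to offer such a simple URL is to hide the SPARQL query from the earlier curl example behind it. The sketch below only builds the query and the request URL. The helper names `isbn_query` and `isbn_request_url` are assumptions for illustration, and whether lobid.org implements its ISBN API this way is not stated on the slides.

```python
import re
from urllib.parse import urlencode

SPARQL_ENDPOINT = "http://lobid.org/sparql/"  # endpoint used in the curl example


def isbn_query(isbn):
    """Build the SPARQL query that a simple /isbn/<isbn> route could hide."""
    digits = isbn.replace("-", "")
    if not re.fullmatch(r"\d{13}", digits):
        raise ValueError(f"not an ISBN-13: {isbn!r}")
    return (
        "prefix bibo: <http://purl.org/ontology/bibo/>\n"
        "SELECT * WHERE {\n"
        f'  ?s bibo:isbn13 "{digits}" ;\n'
        "     ?p ?o .\n"
        "} LIMIT 100"
    )


def isbn_request_url(isbn):
    """Return a GET URL carrying the query in the standard `query` parameter."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": isbn_query(isbn)})


print(isbn_query("978-0-85170-623-8"))
```

The validation step also keeps arbitrary user input out of the query string, which a public endpoint would need anyway.
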
Happy web developer
What is lobid.org?
● lobid := linking open bibliographic data
● LOD services of the hbz
    ● lobid-resources:
        ● exposes 85% of the hbz cooperative catalogue
        ● entries coming from > 200 German academic libraries
        ● ~ 16 M records with 700 M triples
            ● with links to ~ 5 M other resources
            ● with links to ~ 32 M items (consisting of 300 M triples)
    ● lobid-organisations:
        ● exposes the German Sigelverzeichnis and the MARC-Isil directory
        ● ~ 40 k descriptions of institutions

What is lobid.org?
What's missing?
• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Timely updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
• ...

Storing the data
2010 - 2011, lobid-organisations

Filesystem :
     + easy to maintain
     + reliable
     + fast
     - no search
     - no SPARQL
     - ...




Storing the data
lobid today

Triple Store (4store) :
     + power of SPARQL
     +/- depending on the query: fast to horribly slow
     +/- search (but string searches often slow and limited)
     - sometimes gets stuck !




Storing the data
lobid today

Search engine (elasticsearch):
     + fast search
     + stemming, linguistics …
     + wildcard searching
     + facets
     + geo search
     + JSON
     + schema-less
     + simple RESTful API
     + many plugins
     + ...
     + easy to achieve High Availability
     + scales nicely



lobid today storing/getting the data
Getting the data
lobid: technology/dependency stack

            Webapp
              we can do that

                  Search Engine
                     highly available!

                         Triple Store
                              sometimes gets stuck!

Storing/getting the data
Variant 1: technology/dependency stack

redundant, complex …

Triple Store (closed, internal): will be safe from malicious queries.
Triple Store (for external access): sometimes gets stuck!

Publishing LOD with elasticsearch
Variant 2: technology/dependency stack

                           Webapp
                             we can do that

                                 Search Engine
                                    highly available!

Basic LOD functionality (and some other APIs) is highly available.

Triple Store: for external access and some fancy nice-to-have stuff.
Sometimes gets stuck!

Publishing LOD with elasticsearch
Benefits
• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• (Versioning)
• Web developers want simple APIs returning JSON
• ...

Publishing LOD with elasticsearch
Benefits

fast, scalable search engine

Publishing LOD with elasticsearch
performance test

Data: 10 M records <=> 300 M triples
Case-insensitive query: "beach"
SELECT ?s
WHERE
 { ?s <http://purl.org/dc/terms/title> ?o
   FILTER regex(str(?o), "beach", "i")
 }
#### => SPARQL execution time for Q8316: 108.7s, returned 2815 rows.

http://$ip:9200/_search?q=beach&from=0&size=2800
# => Elasticsearch needed 0.4s

=> Elasticsearch is roughly 270 times faster (108.7 s vs. 0.4 s)

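The elasticsearch request above is a plain URI search with the `q`, `from` and `size` parameters, and the speed-up is simple division. A small sketch of both, assuming a hypothetical host name and the helper names `uri_search_url` and `speedup`:

```python
from urllib.parse import urlencode


def uri_search_url(host, query, start=0, size=10):
    """Build a URI search URL like the one on the slide (port 9200 as shown there)."""
    params = urlencode({"q": query, "from": start, "size": size})
    return f"http://{host}:9200/_search?{params}"


def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast measurement is."""
    return slow_seconds / fast_seconds


# "localhost" stands in for the $ip placeholder used on the slide.
print(uri_search_url("localhost", "beach", start=0, size=2800))
print(round(speedup(108.7, 0.4)))
```
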
Publishing LOD with elasticsearch
performance test

(4store has support for text indexing; we have not tested it.)

Publishing LOD with elasticsearch
performance test

Elasticsearch: 18 M records, 6 GB RAM: 5 hours
4store: 1 B triples, 72 GB RAM: 7 hours

CPU: quad core at 2.4 GHz with Hyper-Threading => 8 CPUs
HD: 6 x 2.5" 10k rpm, 146 GB each

(Don't take benchmarks too seriously – they just give a clue!)

Publishing LOD with elasticsearch
Benefits

built to be easily made highly available!

Publishing LOD with elasticsearch
Benefits

Versioning with elasticsearch:
Not available out of the box, but elasticsearch at least comes with
* concurrency control
* a version number for each document
=> implementing versioning is not hard

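To illustrate how version numbers plus concurrency control add up to versioning, here is a toy in-memory store in Python. It is only a sketch of the idea and deliberately does not use the elasticsearch API. `VersionedStore` and `VersionConflict` are names invented for this example.

```python
class VersionConflict(Exception):
    """Raised when a writer's expected version is stale."""


class VersionedStore:
    """Toy in-memory document store that keeps every version of a document."""

    def __init__(self):
        self._history = {}  # document id -> list of bodies, index 0 is version 1

    def current_version(self, doc_id):
        """Return the latest version number, or 0 if the document is unknown."""
        return len(self._history.get(doc_id, []))

    def put(self, doc_id, body, expected_version):
        """Store a new version if the caller has seen the latest one."""
        # Optimistic concurrency control: refuse the write if somebody else
        # updated the document since the caller last read it.
        if expected_version != self.current_version(doc_id):
            raise VersionConflict(doc_id)
        self._history.setdefault(doc_id, []).append(body)
        return self.current_version(doc_id)

    def get(self, doc_id, version=None):
        """Return the given version of a document (default: the latest)."""
        versions = self._history[doc_id]
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise KeyError((doc_id, version))
        return versions[version - 1]


store = VersionedStore()
v1 = store.put("HT002948556", {"issued": "1983"}, expected_version=0)
v2 = store.put("HT002948556", {"issued": "1983", "note": "updated"}, expected_version=v1)
print(v2, store.get("HT002948556", version=1))
```
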
Publishing LOD with elasticsearch
Benefits, relying on elasticsearch as basic LOD storage
• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• Versioning
• Web developers want:
    • JSON (LD)
    • Simple APIs
• ...

Publishing LOD with elasticsearch
Why JSON-LD?

JSON is:
• stored natively by many tools (e.g. elasticsearch)
• loved by consumers (web developers)

JSON-LD is:
• supported by RDF libraries (e.g. transforming to NTriples)

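The following Python sketch shows why the same JSON document can serve both audiences: web developers read it as plain JSON, while the `@context` lets it be turned back into triples. The converter is a hand-rolled toy for flat documents only (an assumption made to keep the example free of dependencies). A real setup would rely on an RDF library that implements the JSON-LD algorithms. The short term names in `DOC` are also made up for the example.

```python
import json


def flat_jsonld_to_ntriples(doc):
    """Serialize a flat JSON-LD object as N-Triples lines.

    Handles only an inline @context mapping terms to IRIs, an @id subject,
    string values, and {"@id": ...} references.
    """
    context = doc["@context"]
    subject = doc["@id"]
    lines = []
    for term, value in doc.items():
        if term.startswith("@"):
            continue  # keywords are not properties
        predicate = context[term]
        if isinstance(value, dict):
            obj = f'<{value["@id"]}>'
        else:
            # json.dumps adds the quotes and escapes the literal
            obj = json.dumps(str(value), ensure_ascii=False)
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


# The example resource from the "Mandatory" slide, written as JSON-LD.
DOC = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",
        "issued": "http://purl.org/dc/terms/issued",
        "isbn13": "http://purl.org/ontology/bibo/isbn13",
        "creator": "http://purl.org/dc/elements/1.1/creator",
    },
    "@id": "http://lobid.org/resource/HT002948556",
    "title": "With reference to reference",
    "issued": "1983",
    "isbn13": "9780915145539",
    "creator": {"@id": "http://d-nb.info/gnd/135539897"},
}

print("\n".join(flat_jsonld_to_ntriples(DOC)))
```
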
Publishing LOD with elasticsearch
Benefits

RESTful elasticsearch API, e.g.:

http://lobid.org/resources/_search?q=isbn:$isbn

Publishing LOD with elasticsearch
Benefits

• … and many other nice things come with elasticsearch
    • geo-search: "Query only libraries/items residing up to 10 km from me."
    • …

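What such a "within 10 km" query computes can be sketched in a few lines of Python using the haversine formula. This only illustrates the idea and says nothing about how elasticsearch implements geo search internally. The place names and coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within(places, lat, lon, max_km=10.0):
    """Return the names of all places at most max_km away from (lat, lon)."""
    return [
        name
        for name, (plat, plon) in places.items()
        if distance_km(lat, lon, plat, plon) <= max_km
    ]


# Hypothetical libraries with made-up coordinates (latitude, longitude).
LIBRARIES = {
    "Library A": (50.94, 6.96),
    "Library B": (50.73, 7.10),
}

print(within(LIBRARIES, 50.93, 6.95))
```
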
Publishing LOD with elasticsearch
Benefits
• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• (Versioning)
• Web developers want simple APIs providing JSON
• ...

Mission accomplished!

Publishing LOD with elasticsearch
( … OK, something is left to be done! )

Overview

Publishing is for Consuming
  • Mandatory
  • Nice to have

Story so far - experiences with lobid.org
  • What is lobid.org?
  • Storing the data
  • Getting the data

Publishing RDF through elasticsearch
  • Benefits
  • Caveats
  • Auto suggest demo

Conclusion

Publishing LOD with elasticsearch
Caveats
• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
• ...

!?

Publishing LOD with elasticsearch
Caveats
How to integrate semantic search into a document store?

dct:contributor --------> dct:creator -------> dc:creator
                --------> dc:contributor
                --------> bibo:translator
                …

There is no inferencing as there is with SPARQL!

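One possible workaround, sketched here under loud assumptions, is to expand a query over a broad property into a query over all related properties before sending it to the search engine. The `RELATED` table simply mirrors the arrows on the slide, the field names are assumed to equal the prefixed property names, and `expand` and `expanded_query` are hypothetical helpers. The resulting dictionary uses the `bool`/`should`/`match` clauses of the elasticsearch query DSL.

```python
# Property relations as drawn on the slide (treated here as "also search in").
RELATED = {
    "dct:contributor": ["dct:creator", "dc:contributor", "bibo:translator"],
    "dct:creator": ["dc:creator"],
}


def expand(prop, related=RELATED):
    """Return prop plus all transitively related properties, without duplicates."""
    seen = []
    stack = [prop]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(related.get(current, []))
    return seen


def expanded_query(prop, value):
    """Build a query body that matches the value in any of the expanded fields."""
    return {
        "query": {
            "bool": {
                "should": [{"match": {field: value}} for field in sorted(expand(prop))],
                "minimum_should_match": 1,
            }
        }
    }


print(sorted(expand("dct:contributor")))
```

This moves a small, fixed part of the inferencing into the application, which is far less than what a reasoner offers.
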
Publishing LOD with elasticsearch
Caveats

Our data flow:

from records to RDF triples to records !?

Publishing LOD with elasticsearch
Caveats

From records to RDF triples
                  |-----> graph-database
                  '------> computing ---> record-database




 MARC/MAB/PICA...
                                                                JSON-LD




Publishing LOD with elasticsearch
Caveats

tree-based vs graph-based:

Pre-render the whole document?
What is the document? Only the top-level node?
… but then you couldn't even search for the author's name!

Searching needs the integration of some fields from subgraphs into the document.

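A minimal Python sketch of that integration step: the document for a resource is its top-level node plus a few selected fields copied in from directly linked nodes, so that a search for the author's name can hit the resource. The graph layout, the `preferredName` field, the placeholder name and the `build_document` helper are all assumptions for illustration.

```python
# Hypothetical mini graph: every node is keyed by its URI.
GRAPH = {
    "http://lobid.org/resource/HT002948556": {
        "title": "With reference to reference",
        "creator": {"@id": "http://d-nb.info/gnd/135539897"},
    },
    "http://d-nb.info/gnd/135539897": {
        "preferredName": "Example, Author",  # placeholder, not real data
    },
}


def build_document(graph, root_id, embed=("preferredName",)):
    """Copy the root node and embed selected fields of directly linked nodes."""
    document = {"@id": root_id}
    for key, value in graph[root_id].items():
        if isinstance(value, dict) and "@id" in value:
            linked = graph.get(value["@id"], {})
            embedded = {"@id": value["@id"]}
            # Only the fields needed for searching are copied from the subgraph.
            embedded.update({f: linked[f] for f in embed if f in linked})
            document[key] = embedded
        else:
            document[key] = value
    return document


print(build_document(GRAPH, "http://lobid.org/resource/HT002948556"))
```

The price of this denormalization is that a change to the linked node means re-indexing every document that embeds it.
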
Publishing LOD with elasticsearch
auto suggest

authority IDs must be easily found
   => in need of auto suggest

auto suggest needs fast searching

Demo: auto suggest

Publishing LOD with elasticsearch
auto suggest
RESTful APIs:

http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl

http://demo.lobid.org/search?format=page&index=gnd-index&author=Schmidt%2C+Karl

http://demo.lobid.org/search?format=full&index=gnd-index&author=Schmidt%2C+Karl
…

API usage:
GET /search?format=<page|full|short>&index=<lobid-index|gnd-index>&author=<query>

Easy to enhance with the Play framework and the elasticsearch API.

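A client-side sketch of the API usage above, in Python: it validates the `format` and `index` values listed on the slide and builds the request URL. The `suggest_url` helper and its default values are assumptions for this example. No request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "http://demo.lobid.org/search"
FORMATS = {"page", "full", "short"}
INDEXES = {"lobid-index", "gnd-index"}


def suggest_url(author, fmt="short", index="gnd-index"):
    """Build a search URL following the 'API usage' pattern on the slide."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if index not in INDEXES:
        raise ValueError(f"unknown index: {index!r}")
    # urlencode percent-encodes the comma and turns the space into "+"
    query = urlencode({"format": fmt, "index": index, "author": author})
    return f"{BASE_URL}?{query}"


print(suggest_url("Schmidt, Karl"))
```
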
Publishing LOD with elasticsearch
auto suggest
RESTful APIs:

http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl
  [
      "Schmidt, Karl (1894-1945)",
      "Schmidt, Karl",
      "Schmidt, Karl (1910-)",
      "Schmidt, Karl (1846-1928)",
      "Schmidt, Karl (1913-)",
      "Schmidt, Karl (1899-)",
      "Schmidt, Karl (1924-)",
      "Schmidt, Karl (1836-1888)",
      "Schmidt, L. F. Karl",
      "Schmidt, Karl (1902-1945)",
      "Schmidt, Karl J.",
      "Schmidt, Karl (1848-1905)",
      "Schmidt, Karl (1817-1882)",
      "Schmidt, Karl R.",
      "Schmidt, Karl (1954-)",
      "Schmidt, Karl (1888-)",
      "Schmidt, Karl (1867-)",
      ...
  ]

Publishing LOD with elasticsearch
auto suggest

GND authority file in lobid-resources

Publishing LOD with elasticsearch
Conclusion
a highly customizable/reliable/feature-rich LOD service

                           Webapp
                             we can do that

                                 Search Engine
                                    highly available!

Basic LOD functionality (and some other APIs) is highly available.

Triple Store: for external access and some fancy nice-to-have stuff.
Sometimes gets stuck!

Publishing LOD with elasticsearch
The software is Open Source:

http://elasticsearch.org/
https://hadoop.apache.org/
http://www.playframework.org/
http://4store.org/
https://github.com/lobid/

Any Questions ?


     Pascal Christoph
     christoph@hbz-nrw.de


     semweb@hbz-nrw.de
Using a dark background, this presentation saves maybe 70% of energy

Building a High Performance Environment for RDF Publishing

  • 1. Building a High Performance Environment for RDF Publishing Pascal Christoph
  • 2. These slides and all the graphics made by the author and those taken from https://openclipart.org/ are dedicated to the public domain: https://creativecommons.org/about/cc0 . All marks mentioned may be trademarks or registered trademarks of their respective owners. Read about the license of "The Scream" by Edvard Munch at https://en.wikipedia.org/wiki/File:The_Scream.jpg . Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net/
  • 3. Overview. Publishing is for Consuming (Mandatory; Nice to have). Story so far - experiences with lobid.org (What is lobid.org?; Storing the data; Getting the data). Publishing RDF through elasticsearch (Benefits; Some more details; Caveats). Future prospects.
  • 5. Publishing is for Consuming
  • 6. Publishing is for Consuming: Mandatory. A resource:
  • 7. Publishing is for Consuming: Mandatory. A resource: gets a dereferenceable URI:
  • 8. Publishing is for Consuming: Mandatory. A resource: gets a dereferenceable URI: which provides RDF: <http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" . <http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" . <http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" . <http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .
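A minimal sketch of how a consumer might read the four statements above, assuming they are served as plain N-Triples. The regular expression is a deliberate simplification (no blank nodes, language tags, datatypes, or escaped quotes), and `parse_ntriples` is a hypothetical helper name, not part of lobid.

```python
import re

# Matches one simple N-Triples statement: <s> <p> (<o> | "literal") .
# Assumption: no blank nodes, language tags, datatypes, or escaped quotes.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s*<([^>]*)>\s*(?:<([^>]*)>|"([^"]*)")\s*\.\s*$')

def parse_ntriples(text):
    """Return a list of (subject, predicate, object, object_is_uri) tuples."""
    triples = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"not a simple N-Triples line: {line!r}")
        s, p, o_uri, o_lit = m.groups()
        if o_uri is not None:
            triples.append((s, p, o_uri, True))
        else:
            triples.append((s, p, o_lit, False))
    return triples

# The statements from the slide; the last one has no spaces between terms.
DATA = """
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556><http://purl.org/dc/elements/1.1/creator><http://d-nb.info/gnd/135539897> .
"""

triples = parse_ntriples(DATA)
```

For real data a full RDF parser would be the safer choice; the sketch only shows that the published format is line-oriented and easy to consume.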
  • 9. Publishing is for Consuming: Mandatory. => Basic LOD publishing is very simple: you just need a web server.
  • 10. Publishing is for Consuming: Nice to have. • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (best: RDFa in HTML) • Data searchable • Timely updates • High Availability • Versioning • Web developers want simple APIs providing JSON • ...
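Content negotiation from the list above can be sketched as a small function that maps an HTTP Accept header to one of the offered serializations. The list of supported media types and the function name `negotiate` are assumptions for illustration. The matching is simplified: it takes the highest q-value and does not apply the specificity rules of full HTTP negotiation.

```python
# Serializations this hypothetical service offers, in server preference order.
SUPPORTED = [
    "text/html",             # human readable page (ideally with RDFa)
    "text/turtle",
    "application/n-triples",
    "application/rdf+xml",
    "application/ld+json",
]

def negotiate(accept_header, supported=SUPPORTED):
    """Pick the supported media type with the highest q-value, or None."""
    ranges = []
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    best, best_q = None, 0.0
    for candidate in supported:
        for media, q in ranges:
            main = media.split("/")[0]
            matches = (
                media == candidate
                or media == "*/*"
                or (media.endswith("/*") and candidate.startswith(main + "/"))
            )
            # Strict ">" keeps the server's preferred type on ties.
            if matches and q > best_q:
                best, best_q = candidate, q
    return best
```

A browser sending `*/*` would get the HTML page, while an RDF client asking for `text/turtle` would get Turtle from the same URI.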
  • 11. Publishing is for Consuming: SPARQL Endpoint. • (Dumps) • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (best: RDFa in HTML) • Data searchable • Timely updates • High Availability • Versioning • Web developers want simple APIs providing JSON • ...
  • 12. Publishing is for Consuming: SPARQL Endpoint. • (Dumps): but may be painfully slow with lots of data • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (best: RDFa in HTML) • (Data searchable): may be painfully slow • Timely updates • High Availability • Versioning • Web developers want simple APIs providing JSON: most triple stores provide JSON/RDF; simple powerful API: too powerful/complex? • ...
  • 13. Publishing is for Consuming: Nice to have. In principle, web developers already have simple APIs: LOD is the API!
  • 14. Publishing is for Consuming: Nice to have. In principle, web developers already have simple APIs. Remember:
  • 15. Publishing is for Consuming: Mandatory. A resource: gets a dereferenceable URI: which provides the data (in RDF): <http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" . <http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" . <http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" . <http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .
  • 16. Publishing is for Consuming: Nice to have. In principle, web developers already have powerful APIs: RESTful SPARQL
  • 17. Publishing is for Consuming: RESTful SPARQL example. Getting all data of all resources having a particular ISBN: curl -H "Accept: application/json" --data-urlencode 'query=PREFIX bibo: <http://purl.org/ontology/bibo/> SELECT * WHERE { ?s bibo:isbn13 "9780851706238" ; ?p ?o . } LIMIT 100' http://lobid.org/sparql/
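The curl call above can be approximated with Python's standard library. This is a sketch under assumptions: `isbn_query` and `build_request` are hypothetical helper names, and the request is only built, not sent, so it makes no claim about whether the endpoint is still reachable.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://lobid.org/sparql/"  # endpoint URL as given on the slide

def isbn_query(isbn13, limit=100):
    """Build the SPARQL query from the slide for one ISBN-13."""
    if not (isbn13.isdigit() and len(isbn13) == 13):
        raise ValueError("expected 13 digits")
    return (
        "PREFIX bibo: <http://purl.org/ontology/bibo/>\n"
        "SELECT * WHERE {\n"
        f'  ?s bibo:isbn13 "{isbn13}" ;\n'
        "     ?p ?o .\n"
        f"}} LIMIT {int(limit)}"
    )

def build_request(isbn13):
    """Return a POST request equivalent to the curl call (nothing is sent)."""
    # Like --data-urlencode, this form-encodes the query parameter.
    body = urlencode({"query": isbn_query(isbn13)}).encode("ascii")
    return Request(ENDPOINT, data=body, headers={"Accept": "application/json"})

req = build_request("9780851706238")
```

Passing `req` to `urllib.request.urlopen` would perform the actual HTTP call.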
  • 18. Publishing is for Consuming: Nice to have
  • 19. Publishing is for Consuming: RESTful SPARQL example. ... and the JSON/RDF result: { "head": { "vars": [ "s", "p", "o" ] }, "results": { "bindings": [ { "o": { "type": "uri", "value": "http://openlibrary.org/works/OL2109573W" }, "p": { "type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested" }, "s": { "type": "uri", "value": "http://lobid.org/resource/HT007824357" } }, { "o": { ...
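To show what a web developer has to do with such a result, here is a small sketch that flattens the bindings into plain tuples. Only the first binding from the slide is used, since the rest is elided there. `rows` is a hypothetical helper name.

```python
import json

# First binding from the slide; the rest of the result is elided there.
RESULT = json.loads("""
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {"bindings": [
    {
      "o": {"type": "uri", "value": "http://openlibrary.org/works/OL2109573W"},
      "p": {"type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested"},
      "s": {"type": "uri", "value": "http://lobid.org/resource/HT007824357"}
    }
  ]}
}
""")

def rows(result):
    """Yield one tuple per binding, ordered as in head.vars (None if unbound)."""
    names = result["head"]["vars"]
    for binding in result["results"]["bindings"]:
        yield tuple(binding.get(n, {}).get("value") for n in names)

triples = list(rows(RESULT))
```

The extra unwrapping step is part of why the result format feels heavy compared to a purpose-built JSON API.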
  • 20. Publishing is for Consuming: Nice to have. As it is, web developers don't like SPARQL. [image: web developer]
  • 21. Publishing is for Consuming: Nice to have. Web developers want APIs like: http://lobid.org/resources/api/isbn/$isbn
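One way such a simple API could sit in front of a search engine is a thin routing layer that turns the path into a query body. This is a sketch under assumptions: the field name `isbn`, the result size, and the function `route` are hypothetical and not taken from the lobid code base. The `term` query is a standard Elasticsearch query type.

```python
import re

# Path pattern modelled on http://lobid.org/resources/api/isbn/$isbn
ISBN_ROUTE = re.compile(r"^/resources/api/isbn/(\d{13}|\d{9}[\dXx])$")

def route(path):
    """Translate a simple API path into a search-engine query body, or None."""
    m = ISBN_ROUTE.match(path)
    if m is None:
        return None  # unknown route or malformed ISBN
    isbn = m.group(1).upper()
    # "isbn" is a hypothetical field name in a hypothetical index mapping.
    return {"query": {"term": {"isbn": isbn}}, "size": 100}
```

The caller never sees SPARQL or the query DSL, only a URL with an ISBN in it.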
  • 23. Overview. Publishing is for Consuming (Mandatory; Nice to have). Story so far - experiences with lobid.org (What is lobid.org?; Storing the data; Getting the data). Publishing RDF through elasticsearch (Benefits; Some more details; Caveats). Future prospects.
  • 24. What is lobid.org? lobid.org
  • 25. What is lobid.org? ● lobid := linking open bibliographic data ● LOD services of the hbz ● lobid-resources: exposes 85% of the hbz cooperative catalogue; entries coming from > 200 scientific German libraries; ~ 16 M records with 700 M triples; with links to ~ 5 M other resources; with links to ~ 32 M items (consisting of 300 M triples) ● lobid-organisations: exposes the German Sigelverzeichnis and MARC-Isil directory; ~ 40 k descriptions of institutions
  • 26. What is lobid.org? What's missing? • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Timely updates • High Availability • Versioning • Web developers want simple APIs providing JSON • ...
  • 27. Overview. Publishing is for Consuming (Mandatory; Nice to have). Story so far - experiences with lobid.org (What is lobid.org?; Storing the data; Getting the data). Publishing RDF through elasticsearch (Benefits; Some more details; Caveats). Future prospects.
  • 28. Storing the data: 2010 - 2011, lobid-organisation. Filesystem: + easy to maintain + reliable + fast - no search - no SPARQL - ...
  • 29. Storing the data: lobid today. Triple Store (4store): + power of SPARQL +/- depending on the query: fast to horribly slow +/- search (but string searches are often slow and limited) - sometimes gets stuck!
  • 30. Storing the data: lobid today. Search engine (elasticsearch): + fast search + stemming, linguistics ... + wildcard searching + facets + geo search + JSON + schema-less + simple RESTful API + many plugins + ... + easy to achieve High Availability + scales nicely
  • 32. Overview. Publishing is for Consuming (Mandatory; Nice to have). Story so far - experiences with lobid.org (What is lobid.org?; Storing the data; Getting the data). Publishing RDF through elasticsearch (Benefits; Some more details; Caveats). Future prospects.
  • 33. Getting the data: lobid technology/dependency stack. Webapp. Search Engine. Triple Store.
  • 34. Getting the data: lobid technology/dependency stack. Webapp: we can do that. Search Engine: highly available! Triple Store: sometimes gets stuck!
  • 35. Getting the data: lobid technology/dependency stack. Webapp: we can do that. Search Engine: < highly available! Triple Store: = sometimes gets stuck!
  • 36. Getting the data: lobid technology/dependency stack. Webapp: we can do that. Search Engine: highly available! Triple Store: sometimes gets stuck!
  • 37. Storing/getting the data: Variant 1, technology/dependency stack. Triple Store: closed, internal; will be safe from malicious queries. Triple Store: for external access; sometimes gets stuck!
  • 38. Storing/getting the data: Variant 1, technology/dependency stack: redundant, complex ... Triple Store: closed, internal; will be safe from malicious queries. Triple Store: for external access; sometimes gets stuck!
  • 39. Overview. Publishing is for Consuming (Mandatory; Nice to have). Story so far - experiences with lobid.org (What is lobid.org?; Storing the data; Getting the data). Publishing RDF through elasticsearch (Benefits; Some more details; Caveats). Future prospects.
  • 40. Publishing LOD with elasticsearch: Variant 2, technology/dependency stack. Webapp: we can do that. Search Engine: highly available! LOD basic functionality (and some other APIs) is highly available. Triple Store: for external access and some fancy nice-to-have stuff; sometimes gets stuck!
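The point of Variant 2 is failure isolation: basic LOD lookups depend only on the search engine, so a stuck triple store degrades SPARQL and nothing else. The following sketch illustrates that idea with in-memory stand-ins. The class and function names and the status codes are assumptions for illustration, not the lobid implementation.

```python
class TripleStoreStuck(Exception):
    """Raised by the stand-in triple store when it does not answer."""

class Webapp:
    """Serves basic LOD from the search engine; SPARQL goes to the triple store."""

    def __init__(self, search_engine, triple_store):
        self.search_engine = search_engine  # callable: resource id -> document or None
        self.triple_store = triple_store    # callable: SPARQL string -> result

    def get_resource(self, resource_id):
        # Dereferencing never touches the triple store.
        doc = self.search_engine(resource_id)
        return (200, doc) if doc is not None else (404, None)

    def sparql(self, query):
        # A stuck triple store only degrades this optional feature.
        try:
            return (200, self.triple_store(query))
        except TripleStoreStuck:
            return (503, None)

# Stand-ins: a dict plays the search index, and the triple store always hangs.
INDEX = {"HT002948556": {"title": "With reference to reference"}}

def stuck_store(query):
    raise TripleStoreStuck()

app = Webapp(INDEX.get, stuck_store)
```

With the triple store down, `app.get_resource("HT002948556")` still answers, which is the availability property the slide claims for the LOD basics.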
  • 41. Publishing LOD with elasticsearch: Benefits. • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • (Versioning) • Web developers want simple APIs returning JSON • ...
  • 42. Publishing LOD with elasticsearch: Benefits. Fast, scalable search engine
  • 43. Publishing LOD with elasticsearch: performance test. Data: 10 M records <=> 300 M triples. Case-insensitive query: "beach". SPARQL: SELECT ?s WHERE { ?s <http://purl.org/dc/terms/title> ?o FILTER regex(str(?o), "beach", "i") } => SPARQL execution time for Q8316: 108.7s, returned 2815 rows. Elasticsearch: http://$ip:9200/_search?q=beach&from=0&size=2800 => Elasticsearch needed 0.4s. => Elasticsearch is 250 times faster
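The Elasticsearch side of the test is a plain URI search, and the speed-up follows from the two quoted timings. A small sketch, assuming `localhost` as a stand-in for `$ip` and with hypothetical helper names:

```python
from urllib.parse import urlencode

def search_url(host, term, start=0, size=2800, port=9200):
    """Build an Elasticsearch URI search like the one on the slide."""
    params = urlencode({"q": term, "from": start, "size": size})
    return f"http://{host}:{port}/_search?{params}"

def speedup(slow_seconds, fast_seconds):
    """How many times faster the second measurement is."""
    return slow_seconds / fast_seconds

url = search_url("localhost", "beach")   # "localhost" stands in for $ip
ratio = speedup(108.7, 0.4)              # timings quoted on the slide
```

With the rounded timings as quoted, the ratio comes out near 270, the same order of magnitude as the factor stated on the slide.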
  • 44. Publishing LOD with elasticsearch. Performance test (4store has support for text indexing; that has not been tested here.)
  • 45. Publishing LOD with elasticsearch. Performance test. Elasticsearch: 18 M records, 6 GB RAM: 5 hours. 4store: 1 B triples, 72 GB RAM: 7 hours. CPU: quad core at 2.4 GHz with Hyperthreading => 8 CPUs. HD: 6 x 2.5" 10k rpm, 146 GB each. (Don't take benchmarks too seriously; they just give a clue!)
  • 46. Publishing LOD with elasticsearch. Benefits • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • (Versioning) • Web developers want simple APIs providing JSON • ...
  • 47. Publishing LOD with elasticsearch. Benefits: built to be easily made highly available!
  • 48. Publishing LOD with elasticsearch. Benefits • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • (Versioning) • Web developers want simple APIs providing JSON • ...
  • 49. Publishing LOD with elasticsearch. Benefits. Versioning with elasticsearch: not available out of the box, but elasticsearch provides building blocks, e.g. * concurrency control * documents have a version number => implementing versioning is not hard
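How a per-document version number and concurrency control can be combined into versioning may be easier to see in a toy model. The following is a minimal in-memory sketch, not elasticsearch code: the class `VersionedStore` and all of its methods are hypothetical and only imitate the idea of rejecting writes that are based on an outdated version while keeping every accepted version.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version number."""


class VersionedStore:
    """Toy document store: version counter, conflict check, full history."""

    def __init__(self):
        self._current = {}  # doc_id -> (version, document)
        self._history = {}  # doc_id -> list of (version, document)

    def get(self, doc_id):
        return self._current[doc_id]

    def put(self, doc_id, document, expected_version=None):
        current_version = self._current.get(doc_id, (0, None))[0]
        # Optimistic concurrency control: refuse the write if the caller
        # worked on a version that is no longer the current one.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._current[doc_id] = (new_version, document)
        # Versioning: keep every accepted state instead of overwriting it.
        self._history.setdefault(doc_id, []).append((new_version, document))
        return new_version

    def history(self, doc_id):
        return list(self._history.get(doc_id, []))


store = VersionedStore()
store.put("HT001", {"title": "Beach holidays"})
store.put("HT001", {"title": "Beach holidays, 2nd ed."}, expected_version=1)
print(store.history("HT001"))
```

In a real setup the history would have to live somewhere persistent, for example in a separate index; that part is left open here.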
  • 50. Publishing LOD with elasticsearch. Benefits, relying on elasticsearch as basic LOD storage • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • Versioning • Web developers want: • JSON (LD) • Simple APIs • ...
  • 51. Publishing LOD with elasticsearch. Benefits • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • (Versioning) • Web developers want simple APIs providing JSON • ...
  • 52. Publishing LOD with elasticsearch. Why JSON-LD? JSON is: • stored natively by many tools (e.g. elasticsearch) • loved by consumers (web developers). JSON-LD is: • supported by RDF libraries (e.g. transforming to N-Triples)
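To illustrate why one JSON document can serve both web developers and RDF tooling, here is a deliberately tiny sketch that turns a flat JSON-LD node into N-Triples lines. It assumes the node already uses full IRIs as keys and has no `@context`, nesting, datatypes, or language tags; the function name and the example IRIs are made up for this illustration. A real service would use an RDF library rather than this hand-written conversion.

```python
import json


def flat_jsonld_to_ntriples(node):
    """Convert one flat, context-free JSON-LD node to N-Triples lines.

    Assumption: keys are full property IRIs; values are plain strings,
    {"@value": ...} literals, or {"@id": ...} references (or lists of these).
    """
    subject = f"<{node['@id']}>"
    triples = []
    for predicate, values in node.items():
        if predicate.startswith("@"):
            continue  # skip JSON-LD keywords such as @id
        if not isinstance(values, list):
            values = [values]
        for value in values:
            if isinstance(value, dict) and "@id" in value:
                obj = f"<{value['@id']}>"
            else:
                text = value["@value"] if isinstance(value, dict) else value
                escaped = str(text).replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'
            triples.append(f"{subject} <{predicate}> {obj} .")
    return triples


document = json.loads("""
{
  "@id": "http://example.org/resource/1",
  "http://purl.org/dc/terms/title": "Beach holidays",
  "http://purl.org/dc/terms/creator": {"@id": "http://example.org/person/1"}
}
""")

for line in flat_jsonld_to_ntriples(document):
    print(line)
```

The same `document` could be indexed as it is in a JSON store, which is the point the slide makes.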
  • 53. Publishing LOD with elasticsearch. Benefits • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • (Versioning) • Web developers want simple APIs providing JSON • ...
  • 54. Publishing LOD with elasticsearch. Benefits. RESTful elasticsearch API, e.g.: http://lobid.org/resources/_search?q=isbn:$isbn
  • 55. Publishing LOD with elasticsearch. Benefits • … and many other nice things come with elasticsearch • geo-search: "Query only libraries/items residing up to 10 km from me." • …
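The geo-search example can be written down as an elasticsearch request body. The sketch below uses the `geo_distance` query inside a `bool` filter, which is the current query DSL and may differ from the syntax of the elasticsearch version used at the time of the talk. The field name `location`, which has to be mapped as a `geo_point`, and the coordinates are assumptions for illustration.

```python
import json


def libraries_within(lat, lon, distance_km=10, field="location"):
    """Build a request body matching documents within distance_km of a point.

    Assumption: `field` is a geo_point field in the index mapping.
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{distance_km}km",
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


# Example: everything up to 10 km around a point in Cologne.
body = libraries_within(50.94, 6.96)
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of a `_search` request against the index that holds the library or item records.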
  • 56. Publishing LOD with elasticsearch. Benefits • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • (Versioning) • Web developers want simple APIs providing JSON • ... Mission accomplished!
  • 57. Publishing LOD with elasticsearch ( … ok, something is left to be done! ) • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • Versioning • Web developers want simple APIs providing JSON • ...
  • 58. Overview. Publishing is for Consuming • Mandatory • Nice to have. Story so far - experiences with lobid.org • What is lobid.org? • Storing the data • Getting the data. Publishing RDF through elasticsearch • Benefits • Caveats • Auto suggest demo. Conclusion
  • 59. Publishing LOD with elasticsearch. Caveats • Dumps • Content Negotiation (different RDF serializations) • SPARQL • Human readable representation (RDFa in HTML) • Data searchable • Near Real Time updates • High Availability • Versioning !? • Web developers want simple APIs providing JSON • ...
  • 60. Publishing LOD with elasticsearch. Caveats. How to integrate semantic search into a document storage? dct:contributor --------> dct:creator -------> dc:creator ---------> dc:contributor --------> bibo:translator … There is no inferencing as there is with SPARQL!
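One common workaround for missing inferencing in a document store is to expand a property into its sub-properties when the query is built. The sketch below shows that idea only; it is not necessarily what lobid.org does. The `SUBPROPERTIES` table is an illustrative stand-in for a real vocabulary lookup, and using prefixed property names as field names is an assumption about the index layout.

```python
# Illustrative sub-property table (assumption, not loaded from a vocabulary).
SUBPROPERTIES = {
    "dc:contributor": ["dct:contributor"],
    "dct:contributor": ["dct:creator", "bibo:translator"],
    "dc:creator": ["dct:creator"],
}


def expand_property(prop, subproperties=SUBPROPERTIES):
    """Return prop plus all of its transitive sub-properties, depth first."""
    expanded = []
    stack = [prop]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.append(current)
        # Reverse so that sub-properties are visited in table order.
        stack.extend(reversed(subproperties.get(current, [])))
    return expanded


def contributor_query(name, prop="dc:contributor"):
    """Search `name` in the property and every sub-property field."""
    return {
        "query": {
            "multi_match": {"query": name, "fields": expand_property(prop)}
        }
    }


print(expand_property("dc:contributor"))
print(contributor_query("Schmidt, Karl"))
```

The alternative is to do the expansion at index time and copy each value into the fields of its super-properties, which costs storage but keeps queries simple.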
  • 61. Publishing LOD with elasticsearch. Caveats. Our data flow: from records to RDF triples to records
  • 62. Publishing LOD with elasticsearch. Caveats. Our data flow: from records to RDF triples to records
  • 63. Publishing LOD with elasticsearch. Caveats. From records to RDF triples to records !?
  • 64. Publishing LOD with elasticsearch. Caveats. From records (MARC/MAB/PICA...) to RDF triples |-----> graph-database '------> computing ---> record-database (JSON-LD)
  • 65. Publishing LOD with elasticsearch. Caveats. Tree-based vs graph-based: Pre-render the whole document? What is the document?
  • 66. Publishing LOD with elasticsearch. Caveats
  • 67. Publishing LOD with elasticsearch. Caveats. What is the document? Only the top-level node?
  • 68. Publishing LOD with elasticsearch. Caveats. What is the document? Only the top-level node? … but then you couldn't even search for the author's name!
  • 69. Publishing LOD with elasticsearch. Caveats. Searching needs integration of some fields from subgraphs into the document
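Pulling a few fields from linked nodes into the indexed document can be sketched as follows. The graph is modelled as a plain dictionary of nodes, and the function name, the node identifiers, and the choice of `name` as the embedded field are all assumptions for illustration.

```python
def build_search_document(resource_id, graph, embed_fields=("name",)):
    """Copy the top-level node and embed selected fields of linked nodes.

    Assumption: `graph` maps node ids to property dicts, and links to other
    nodes are written as {"@id": <node id>}.
    """
    document = {"@id": resource_id}
    for key, value in graph[resource_id].items():
        if isinstance(value, dict) and value.get("@id") in graph:
            linked = graph[value["@id"]]
            embedded = {"@id": value["@id"]}
            for field in embed_fields:
                if field in linked:
                    embedded[field] = linked[field]
            document[key] = embedded
        else:
            document[key] = value
    return document


graph = {
    "http://example.org/resource/1": {
        "title": "Beach holidays",
        "creator": {"@id": "http://example.org/person/1"},
    },
    "http://example.org/person/1": {
        "name": "Schmidt, Karl",
        "birthDate": "1894",
    },
}

print(build_search_document("http://example.org/resource/1", graph))
```

Only the author's name is copied into the resource document, so the name becomes searchable while the rest of the person node stays in its own document.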
  • 70. Overview. Publishing is for Consuming • Mandatory • Nice to have. Story so far - experiences with lobid.org • What is lobid.org? • Storing the data • Getting the data. Publishing RDF through elasticsearch • Benefits • Caveats • Auto suggest demo. Conclusion
  • 71. Publishing LOD with elasticsearch. Auto suggest: authority IDs must be easily found
  • 72. Publishing LOD with elasticsearch. Auto suggest: authority IDs must be easily found => in need of auto suggest
  • 73. Publishing LOD with elasticsearch. Auto suggest: auto suggest needs fast searching
  • 74. Publishing LOD with elasticsearch. Demo auto suggest
  • 76. Publishing LOD with elasticsearch. Auto suggest. RESTful APIs: http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl http://demo.lobid.org/search?format=page&index=gnd-index&author=Schmidt%2C+Karl http://demo.lobid.org/search?format=full&index=gnd-index&author=Schmidt%2C+Karl … API usage: GET /search?format=<page|full|short>&index=<lobid-index|gnd-index>&author=<query>. Easy to enhance with the play framework and the elasticsearch API
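A client for the API pattern above mainly has to encode the query parameters correctly. The sketch below only builds the URL and does not send a request, since the demo host may no longer be reachable. The function name and the validation of the parameter values are additions for illustration; the allowed values are the ones listed in the API usage line.

```python
from urllib.parse import urlencode

BASE_URL = "http://demo.lobid.org/search"
FORMATS = {"page", "full", "short"}
INDEXES = {"lobid-index", "gnd-index"}


def search_url(author, fmt="short", index="gnd-index"):
    """Build a search URL following the GET /search pattern on the slide."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if index not in INDEXES:
        raise ValueError(f"unknown index: {index}")
    # urlencode escapes the comma and turns the space into '+'.
    query = urlencode({"format": fmt, "index": index, "author": author})
    return f"{BASE_URL}?{query}"


print(search_url("Schmidt, Karl"))
print(search_url("Schmidt, Karl", fmt="full"))
```

For "Schmidt, Karl" this reproduces the first example URL from the slide, including the `%2C+` encoding of the comma and space.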
  • 77. Publishing LOD with elasticsearch. Auto suggest. RESTful APIs: http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl [ "Schmidt, Karl (1894-1945)", "Schmidt, Karl", "Schmidt, Karl (1910-)", "Schmidt, Karl (1846-1928)", "Schmidt, Karl (1913-)", "Schmidt, Karl (1899-)", "Schmidt, Karl (1924-)", "Schmidt, Karl (1836-1888)", "Schmidt, L. F. Karl", "Schmidt, Karl (1902-1945)", "Schmidt, Karl J.", "Schmidt, Karl (1848-1905)", "Schmidt, Karl (1817-1882)", "Schmidt, Karl R.", "Schmidt, Karl (1954-)", "Schmidt, Karl (1888-)", "Schmidt, Karl (1867-)", ... ]
  • 78. Publishing LOD with elasticsearch. Auto suggest: GND authority file in lobid-resources
  • 80. Publishing LOD with elasticsearch
  • 81. Overview. Publishing is for Consuming • Mandatory • Nice to have. Story so far - experiences with lobid.org • What is lobid.org? • Storing the data • Getting the data. Publishing RDF through elasticsearch • Benefits • Caveats • Auto suggest demo. Conclusion
  • 82. Publishing LOD with elasticsearch. Conclusion: a highly customizable/reliable/feature-rich LOD service. Webapp: we can do that. Search Engine: highly available! LOD basis functionality (and some other APIs) are highly available. Triple Store: for external access and some fancy nice-to-have stuff. Sometimes gets stuck!
  • 83. Publishing LOD with elasticsearch. The software is Open Source: http://elasticsearch.org/ https://hadoop.apache.org/ http://www.playframework.org/ http://4store.org/ https://github.com/lobid/
  • 84. Any Questions ? Pascal Christoph christoph@hbz-nrw.de semweb@hbz-nrw.de
  • 85. Using a dark background, this presentation saves maybe 70% of energy