FedBench
A Benchmark Suite for
Federated Semantic Data Processing



Michael Schmidt1, Olaf Görlitz2, Peter Haase1, Günter Ladwig3,
Andreas Schwarte1, Thanh Tran3





                   10th Intl. Semantic Web Conference, Oct 26, 2011, Bonn
Linked Data Evaluation Strategies

[Figure: two architectures side by side]
•  Centralized Linked Data Processing: Query → Central Repository, loaded from several RDF data sources
•  Federated Linked Data Processing: Query → Federation Layer → Local Repository, SPARQL Endpoints, Dynamic HTTP Lookups, each backed by its own RDF data
Centralized vs. Federated Approaches

Centralized Processing
•  Data periodically crawled, gathered, and updated
•  High reliability and controllability
•  Inflexible set of data sources
•  Comprehensive knowledge about data, useful for query optimization

Federated Processing
•  Use of original data sources ensures that data is always “up-to-date”
•  No control over federation members
•  Ad-hoc integration of remote sources
•  Requires careful optimization, but also offers opportunities (parallelization)



Key Observations
(1)  Both centralized and federated Linked Data processing have practical use cases
(2)  Radically different requirements, challenges, and characteristics
Benchmarking Linked Data Evaluation

•  Centralized Linked Data Processing: BSBM, LUBM, SP2Bench, ...
•  Federated Linked Data Processing: so far no benchmarks proposed
Challenges in Federated Linked Data Benchmarking:
Heterogeneity of Use Cases

Data level
•  (D1) Physical Distribution
   ◦  Local vs. remote
•  (D2) Data Access Interfaces
   ◦  Native repository
   ◦  SPARQL Endpoint
   ◦  Linked Data (HTTP)
•  (D3) Knowledge about Data Source Existence
•  (D4) Data Statistics

Query level
•  (Q1) Query Language
   ◦  Expressiveness
   ◦  Complexity
•  (Q2) Result Completeness
•  (Q3) Ranking
•  Various other characteristics
   ◦  Join types
   ◦  Result size
   ◦  ...

Need for a flexible benchmark suite rather
than “one-size-fits-all” benchmark scenario!
FedBench Components (ctd)

Data Sets

•  Vary in structuredness,
   domain, size, etc.
•  Grouped in collections
Data Collections

•  Cross-Domain Collection
•  Life Science Collection
•  SP2Bench Data Collection
   ◦  Synthetic data
   ◦  Split into sub-datasets according to types
FedBench Components (ctd)

Queries

•  Operate on the data collections
•  Logically grouped
Example Query

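List all US presidents including their party and associated news.

    PREFIX rdf:         <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl:         <http://www.w3.org/2002/07/owl#>
    PREFIX dbpedia:     <http://dbpedia.org/resource/>
    PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
    PREFIX nytimes:     <http://data.nytimes.com/elements/>

    SELECT ?pres ?party ?page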
    WHERE {
        ?pres rdf:type dbpedia-owl:President .
        ?pres dbpedia-owl:nationality dbpedia:United_States .
        ?pres dbpedia-owl:party ?party .
        ?x nytimes:topicPage ?page .
        ?x owl:sameAs ?pres
    }
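
As a rough illustration of why this query needs more than one source, the sketch below joins its five triple patterns over two tiny in-memory lists standing in for DBpedia and the New York Times dataset. The sample triples, the `match` helper, and the `evaluate` function are assumptions made for illustration only; they are not part of FedBench or of any system evaluated in the talk.

```python
# Toy federation: two in-memory "sources" holding (subject, predicate, object) triples.
# All data below is made up for illustration.
DBPEDIA = [
    ("dbpedia:Barack_Obama", "rdf:type", "dbpedia-owl:President"),
    ("dbpedia:Barack_Obama", "dbpedia-owl:nationality", "dbpedia:United_States"),
    ("dbpedia:Barack_Obama", "dbpedia-owl:party", "dbpedia:Democratic_Party"),
]
NYTIMES = [
    ("nyt:obama", "nytimes:topicPage", "http://example.org/topics/obama"),
    ("nyt:obama", "owl:sameAs", "dbpedia:Barack_Obama"),
]
SOURCES = [DBPEDIA, NYTIMES]

# The five triple patterns of the example query, in the order they appear.
PATTERNS = [
    ("?pres", "rdf:type", "dbpedia-owl:President"),
    ("?pres", "dbpedia-owl:nationality", "dbpedia:United_States"),
    ("?pres", "dbpedia-owl:party", "?party"),
    ("?x", "nytimes:topicPage", "?page"),
    ("?x", "owl:sameAs", "?pres"),
]

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def evaluate(patterns, sources):
    """Join the patterns left to right, asking every source for every pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for source in sources
            for triple in source
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings

rows = evaluate(PATTERNS, SOURCES)
for row in rows:
    print(row["?pres"], row["?party"], row["?page"])
```

The first three patterns can only be answered by the DBpedia stand-in and the last two only by the New York Times stand-in, so neither source alone produces a result row.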
Queries

•  Partially taken from prototype systems, partially designed
   to capture challenges in federated query processing
•  Four sets of queries
   ◦  Life Science
      ▪  Life Science query set (full SPARQL): 7 queries (LS)
   ◦  Cross Domain
      ▪  Cross Domain query set (full SPARQL): 7 queries (CD)
      ▪  Linked Data query set (BGPs): 11 queries (LD)
   ◦  SP2Bench
      ▪  SP2Bench query set (full SPARQL): 14 queries (SP)
•  Focus on different functional aspects
   ◦  General federated query processing requirements
   ◦  Pure Linked Data processing
Queries

[Table: characteristics of the individual queries; not recoverable from this transcript]

Operators:          A – AND, U – UNION, O – OPTIONAL, F – FILTER
Solution Modifiers: Or – ORDER BY, D – DISTINCT, L – LIMIT, Of – OFFSET
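
The abbreviations in this legend can be approximated from a query string with a plain keyword scan. The following is a minimal sketch under that assumption: the `KEYWORDS` table and the `features` function are hypothetical helpers, not part of the FedBench tooling, and a keyword scan will misfire on keywords that appear inside string literals or comments.

```python
import re

# Abbreviations from the legend above, mapped to the SPARQL keyword they stand for.
# "A" (AND) is left out: spotting joins needs a real parser, not a keyword scan.
KEYWORDS = {
    "U": r"\bUNION\b",
    "O": r"\bOPTIONAL\b",
    "F": r"\bFILTER\b",
    "Or": r"\bORDER\s+BY\b",
    "D": r"\bDISTINCT\b",
    "L": r"\bLIMIT\b",
    "Of": r"\bOFFSET\b",
}

def features(query: str) -> list[str]:
    """Return the legend abbreviations whose keyword occurs in `query`."""
    return [abbr for abbr, pattern in KEYWORDS.items()
            if re.search(pattern, query, flags=re.IGNORECASE)]

# Made-up query used only to exercise the scan.
example = """
SELECT DISTINCT ?s ?label WHERE {
  ?s ?p ?o .
  OPTIONAL { ?s rdfs:label ?label }
  FILTER (?o != ?s)
} ORDER BY ?s LIMIT 10
"""
print(features(example))
```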
FedBench Components (ctd)

Benchmark Driver
•  Executes FedBench in a unified way
•  Java, open source → easily adjustable and extensible
Evaluation Framework
•  Parametrizable benchmark driver
•  Implemented in Java using the Sesame framework
•  Highly customizable via config files
   ◦  Data and query sets
   ◦  Number of runs, timeouts
   ◦  Deployment method of data sets
   ◦  Metrics (loading time, evaluation time, #requests)
•  Highly extendable, which makes it easy to connect
   new systems on demand
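
The interplay of runs, timeouts, and metrics can be sketched as a small driver loop. This is an illustrative Python sketch, not the actual Java driver: the `CONFIG` keys, `run_benchmark`, and `fake_evaluate` are hypothetical names, and the timeout is only checked after a run finishes rather than enforced by interrupting it.

```python
import time
from statistics import mean

# Hypothetical settings mirroring the options listed above; not FedBench's real config format.
CONFIG = {"runs": 3, "warmup_runs": 1, "timeout_s": 600.0}

def run_benchmark(queries, evaluate, config=CONFIG):
    """Time each query; `evaluate(query)` must return the number of requests it sent."""
    report = {}
    for name, query in queries.items():
        for _ in range(config["warmup_runs"]):
            evaluate(query)  # warm-up runs are executed but not recorded
        times, requests = [], []
        for _ in range(config["runs"]):
            start = time.perf_counter()
            sent = evaluate(query)
            elapsed = time.perf_counter() - start
            times.append(elapsed)
            requests.append(sent)
        report[name] = {
            "avg_time_s": mean(times),
            "avg_requests": mean(requests),
            # The run is only flagged afterwards; this sketch does not interrupt it.
            "timed_out": any(t > config["timeout_s"] for t in times),
        }
    return report

def fake_evaluate(query):
    """Stand-in for a system under test: pretends each triple pattern costs one request."""
    return query.count(" .") + 1

report = run_benchmark({"Q1": "?s a ?t . ?s ?p ?o"}, fake_evaluate)
print(report["Q1"]["avg_requests"], report["Q1"]["timed_out"])
```

Passing the system under test as a callable keeps the loop independent of any particular engine, which is one way to read the extensibility claim above.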
FedBench Components (ctd)

Benchmark Results
•  CSV, RDF

Publishing
•  Wiki-based platform for Linked Data
•  Publishing and discussion of benchmark results
Evaluation
•  Goal: prove practicability & flexibility of the benchmark
   ◦  Cover a variety of scenarios
   ◦  Assess first state-of-the-art results
   ◦  Identify weaknesses and strengths of systems
•  Measures
   ◦  Query evaluation time
   ◦  Number of requests sent to remote sources
•  Hardware
   ◦  ILO2 HP server ProLiant DL360
   ◦  4-core CPU, 2000 MHz
   ◦  64-bit Windows Server 2008, running 64-bit JVM 1.6.0_22
   ◦  32 GB RAM (20 GB for the federation mediator, rest distributed
      among federation members)
Evaluation: Scenario A

•  “Centralized vs. Federated” query processing
   ◦  Scenario A1: Centralized processing
      ▪  Sesame 2.3.1
   ◦  Scenario A2: Local federation
      ▪  Sesame 2.3.1 + AliBaba
   ◦  Scenario A3: SPARQL Endpoint federation (HTTP)
      ▪  Sesame 2.3.1 + AliBaba
      ▪  SPLENDID from WeST
•  10 min timeout per query
•  Average over three runs (after warm-up phase)
Scenario A: Life Science Queries
Data size: 50M triples in total

#Requests to Endpoints              LS1    LS2    LS3    LS4    LS5    LS6    LS7
Endpoint Federation (AliBaba)        13     61  (410)    21k    17k  (130)  (876)
Endpoint Federation (SPLENDID)        2     49      9     10   4778    322   4889
Evaluation: Scenario B
•  Scenario B: Linked Data query set on CD collection
   ◦  Bottom-up approach
   ◦  Top-down approach
   ◦  Mixed approach
•  Local CumulusRDF Linked Data server
•  Systems: dedicated prototype implementations*
•  Major findings
   ◦  Top-down approach most performant
   ◦  Mixed approach competitive, bringing the merits of
      earlier result reporting
* G. Ladwig, T. Tran: Linked Data Query Processing Strategies. In Proc. ISWC, 2010.
Summary: Central Findings

•  Effective join ordering is often impossible when no
   intelligent source selection strategy is given
•  In such cases: often a very high number of requests
   (10^4+), caused by the iterative, nested-loop evaluation
   strategy of AliBaba
•  Limited capabilities of Sesame to deal with
   parallelization cause problems (locking issues)
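
To see why an iterative, nested-loop strategy drives the request count up, the sketch below counts the calls made when every intermediate binding triggers its own remote lookup. For contrast it also ships the same bindings in batches. The `CountingEndpoint` class, the batch size of 15, and the data are assumptions for illustration; batching is shown only as one conceivable mitigation, not as what AliBaba, SPLENDID, or any other system in this talk actually does.

```python
class CountingEndpoint:
    """Stand-in for a remote SPARQL endpoint that counts incoming requests."""

    def __init__(self, data):
        self.data = data      # maps a join key to its matching values
        self.requests = 0

    def lookup(self, keys):
        """Answer one request covering all `keys`."""
        self.requests += 1
        return {key: self.data.get(key, []) for key in keys}

def nested_loop_join(left_keys, endpoint):
    """One remote request per intermediate binding."""
    pairs = []
    for key in left_keys:
        for value in endpoint.lookup([key])[key]:
            pairs.append((key, value))
    return pairs

def batched_join(left_keys, endpoint, batch_size=15):
    """Same join, but bindings are shipped in groups of `batch_size`."""
    pairs = []
    for i in range(0, len(left_keys), batch_size):
        answers = endpoint.lookup(left_keys[i:i + batch_size])
        for key, values in answers.items():
            pairs.extend((key, value) for value in values)
    return pairs

keys = [f"drug{i}" for i in range(10_000)]          # made-up intermediate results
data = {key: [key + "-target"] for key in keys}

nested, batched = CountingEndpoint(data), CountingEndpoint(data)
assert nested_loop_join(keys, nested) == batched_join(keys, batched)
print(nested.requests, batched.requests)
```

With 10,000 intermediate bindings the nested-loop variant issues 10,000 requests, the same order of magnitude as the counts reported for AliBaba above.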

In the following talk:
FedX – a federated query processing system that tackles these issues!
Conclusion

•  Benchmark flexible enough to cover a wide range
   of semantic data use cases/applications
•  Evaluation reveals severe deficiencies of today's
   approaches
•  Upcoming tasks/future work
   ◦  General SPARQL 1.1 extensions
   ◦  SPARQL 1.1 federation extensions
   ◦  Distributed reasoning
•  Laid out as community project: you are invited to
   contribute with your own data & queries!
Questions?

    http://code.google.com/p/fbench/

Contenu connexe

Tendances

GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architecturesnine
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Hubert Fan Chiang
 
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan AgrawalApache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan AgrawalDatabricks
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming宇 傅
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureDataWorks Summit
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark Summit
 
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Databricks
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversityAlex Zeltov
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMapR Technologies
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemCloudera, Inc.
 

Tendances (20)

GCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native ArchitecturesGCP Meetup #3 - Approaches to Cloud Native Architectures
GCP Meetup #3 - Approaches to Cloud Native Architectures
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
 
Deeplearning
Deeplearning Deeplearning
Deeplearning
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan AgrawalApache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal
Apache Spark Based Reliable Data Ingestion in Datalake with Gagan Agrawal
 
Spark and Spark Streaming
Spark and Spark StreamingSpark and Spark Streaming
Spark and Spark Streaming
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Hadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and FutureHadoop Infrastructure @Uber Past, Present and Future
Hadoop Infrastructure @Uber Past, Present and Future
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
Spark-on-Yarn: The Road Ahead-(Marcelo Vanzin, Cloudera)
 
Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...Designing and Building Next Generation Data Pipelines at Scale with Structure...
Designing and Building Next Generation Data Pipelines at Scale with Structure...
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
 
Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale Time-oriented event search. A new level of scale
Time-oriented event search. A new level of scale
 
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton UniversitySpark Advanced Analytics NJ Data Science Meetup - Princeton University
Spark Advanced Analytics NJ Data Science Meetup - Princeton University
 
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 

Similaire à Fedbench - A Benchmark Suite for Federated Semantic Data Processing

No SQL : Which way to go? Presented at DDDMelbourne 2015
No SQL : Which way to go?  Presented at DDDMelbourne 2015No SQL : Which way to go?  Presented at DDDMelbourne 2015
No SQL : Which way to go? Presented at DDDMelbourne 2015Himanshu Desai
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sqlaftab alam
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchMapR Technologies
 
The Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth StudyThe Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth StudyKevin Tong
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreUri Laserson
 
ModeShape 3 overview
ModeShape 3 overviewModeShape 3 overview
ModeShape 3 overviewRandall Hauch
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Minerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFSMinerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFSBowenDing4
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysDemi Ben-Ari
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science Robert H. McDonald
 
high_level_parallel_processing_model
high_level_parallel_processing_modelhigh_level_parallel_processing_model
high_level_parallel_processing_modelMingliang Sun
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsYasin Memari
 

Similaire à Fedbench - A Benchmark Suite for Federated Semantic Data Processing (20)

No SQL : Which way to go? Presented at DDDMelbourne 2015
No SQL : Which way to go?  Presented at DDDMelbourne 2015No SQL : Which way to go?  Presented at DDDMelbourne 2015
No SQL : Which way to go? Presented at DDDMelbourne 2015
 
NoSQL, which way to go?
NoSQL, which way to go?NoSQL, which way to go?
NoSQL, which way to go?
 
Apache Spark sql
Apache Spark sqlApache Spark sql
Apache Spark sql
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
 
The Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth StudyThe Performance of MapReduce: An In-depth Study
The Performance of MapReduce: An In-depth Study
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant Store
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
ModeShape 3 overview
ModeShape 3 overviewModeShape 3 overview
ModeShape 3 overview
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Minerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFSMinerva: Drill Storage Plugin for IPFS
Minerva: Drill Storage Plugin for IPFS
 
Apache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - PanoraysApache Spark 101 - Demi Ben-Ari - Panorays
Apache Spark 101 - Demi Ben-Ari - Panorays
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science
 
high_level_parallel_processing_model
high_level_parallel_processing_modelhigh_level_parallel_processing_model
high_level_parallel_processing_model
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data Genomics
 

Plus de Peter Haase

Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryVisual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryPeter Haase
 
Hybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsPeter Haase
 
Ephedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationEphedra: efficiently combining RDF data and services using SPARQL federation
Ephedra: efficiently combining RDF data and services using SPARQL federationPeter Haase
 
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudBuilding Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudPeter Haase
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsPeter Haase
 
Getting Started with Knowledge Graphs
Getting Started with Knowledge GraphsGetting Started with Knowledge Graphs
Getting Started with Knowledge GraphsPeter Haase
 
Smart Data Applications powered by the Wikidata Knowledge Graph
Smart Data Applications powered by the Wikidata Knowledge GraphSmart Data Applications powered by the Wikidata Knowledge Graph
Smart Data Applications powered by the Wikidata Knowledge GraphPeter Haase
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsPeter Haase
 
Mapping, Interlinking and Exposing MusicBrainz as Linked Data
Mapping, Interlinking and Exposing MusicBrainz as Linked DataMapping, Interlinking and Exposing MusicBrainz as Linked Data
Mapping, Interlinking and Exposing MusicBrainz as Linked DataPeter Haase
 
The Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterpriseThe Information Workbench - Linked Data and Semantic Wikis in the Enterprise
The Information Workbench - Linked Data and Semantic Wikis in the EnterprisePeter Haase
 
On demand access to Big Data through Semantic Technologies
 On demand access to Big Data through Semantic Technologies On demand access to Big Data through Semantic Technologies
On demand access to Big Data through Semantic TechnologiesPeter Haase
 
Linked Data as a Service
Linked Data as a ServiceLinked Data as a Service
Linked Data as a ServicePeter Haase
 
Everything Self-Service:Linked Data Applications with the Information Workbench
Everything Self-Service:Linked Data Applications with the Information WorkbenchEverything Self-Service:Linked Data Applications with the Information Workbench
Everything Self-Service:Linked Data Applications with the Information WorkbenchPeter Haase
 
The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...The Information Workbench as a Self-Service Platform for Linked Data Applicat...
The Information Workbench as a Self-Service Platform for Linked Data Applicat...Peter Haase
 
Cloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentCloud-based Linked Data Management for Self-service Application Development
Cloud-based Linked Data Management for Self-service Application DevelopmentPeter Haase
 
Semantic Technologies for Enterprise Cloud Management
Semantic Technologies for Enterprise Cloud ManagementSemantic Technologies for Enterprise Cloud Management
Semantic Technologies for Enterprise Cloud ManagementPeter Haase
 

Plus de Peter Haase (16)

Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactoryVisual Ontology Modeling for Domain Experts and Business Users with metaphactory
Visual Ontology Modeling for Domain Experts and Business Users with metaphactory
 
Hybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge Graphs
 

FedBench - A Benchmark Suite for Federated Semantic Data Processing

  • 1. FedBench: A Benchmark Suite for Federated Semantic Data Processing. Michael Schmidt1, Olaf Görlitz2, Peter Haase1, Günter Ladwig3, Andreas Schwarte1, Thanh Tran3. 10th Intl. Semantic Web Conference, Oct 26, 2011, Bonn
  • 2. Linked Data Evaluation Strategies [Diagram: centralized Linked Data processing. A query is evaluated against a central repository that gathers several RDF data sources.]
  • 3. Linked Data Evaluation Strategies [Diagram: centralized vs. federated Linked Data processing. Centralized: a query is evaluated against a central repository built from several RDF data sources. Federated: a query is sent to a federation layer that accesses RDF data through local repositories, SPARQL endpoints, and dynamic HTTP lookups.]
  • 4. Centralized vs. Federated Approaches
      Centralized Processing
      •  Data periodically crawled, gathered, and updated
      •  High reliability and controllability
      •  Inflexible set of data sources
      •  Comprehensive knowledge about data, useful for query optimization
      Federated Processing
      •  Use of original data sources ensures that data is always "up-to-date"
      •  No control over federation members
      •  Ad-hoc integration of remote sources
      •  Requires careful optimization, but also offers opportunities (parallelization)
  • 5. Centralized vs. Federated Approaches (comparison as on slide 4)
      Key Observations
      (1)  Both centralized and federated Linked Data processing have practical use cases
      (2)  Radically different requirements, challenges, and characteristics
  • 6. Benchmarking Linked Data Evaluation [Diagram: centralized vs. federated Linked Data processing, as on slide 3.]
      •  Centralized Linked Data processing: BSBM, LUBM, SP2Bench, ...
      •  Federated Linked Data processing: so far no benchmarks proposed
  • 7. Challenges in Federated Linked Data Benchmarking: Heterogeneity of Use Cases
      Data level
      •  (D1) Physical Distribution
         -  Local vs. remote
      •  (D2) Data Access Interfaces
         -  Native repository
         -  SPARQL Endpoint
         -  Linked Data (HTTP)
      •  (D3) Knowledge about Data Source Existence
      •  (D4) Data Statistics
      Query level
      •  (Q1) Query Language
         -  Expressiveness
         -  Complexity
      •  (Q2) Result Completeness
      •  (Q3) Ranking
      •  Various other characteristics
         -  Join types
         -  Result size
         -  ...
  • 8. Challenges in Federated Linked Data Benchmarking: Heterogeneity of Use Cases (data-level and query-level dimensions as on slide 7)
      Need for a flexible benchmark suite rather than a "one-size-fits-all" benchmark scenario!
  • 9. FedBench Components (ctd)
      Data Sets
      •  Vary in structuredness, domain, size, etc.
      •  Grouped in collections
  • 10. Data Collections
      •  Cross-Domain Collection
      •  Life Science Collection
      •  SP2Bench Data Collection
         -  Synthetic data
         -  Split into sub-datasets according to types
  • 11. FedBench Components (ctd)
      Data Sets
      •  Vary in structuredness, domain, size, etc.
      •  Grouped in collections
      Queries
      •  Operate on the data collections
      •  Logically grouped
  • 12. Example Query: List all US presidents including their party and associated news.
      SELECT ?pres ?party ?page WHERE {
        ?pres rdf:type dbpedia-owl:President .
        ?pres dbpedia-owl:nationality dbpedia:United_States .
        ?pres dbpedia-owl:party ?party .
        ?x nytimes:topicPage ?page .
        ?x owl:sameAs ?pres
      }
  • 13. Queries
      •  Partially taken from prototype systems, partially designed to capture challenges in federated query processing
      •  Four sets of queries
         -  Life Science
            Life Science query set (full SPARQL): 7 queries (LS)
         -  Cross Domain
            Cross Domain query set (full SPARQL): 7 queries (CD)
            Linked Data query set (BGPs): 11 queries (LD)
         -  SP2Bench
            SP2Bench query set (full SPARQL): 14 queries (SP)
      •  Focus on different functional aspects
         -  General federated query processing requirements
         -  Pure Linked Data processing
  • 14. Queries Operators: A – AND, U – UNION, O – OPTIONAL, F – FILTER Solution Modifiers: Or – ORDER BY, D – DISTINCT, L – LIMIT, Of – OFFSET
  • 16. FedBench Components (ctd)
      Data Sets and Queries (as on slide 11)
      Benchmark Driver
      •  Allows FedBench to be executed in a unified way
      •  Java, Open Source → easily adjustable and extensible
  • 17. Evaluation Framework
      •  Parametrizable benchmark driver
      •  Implemented in Java using the Sesame framework
      •  Highly customizable via config files
         -  Data and query sets
         -  Number of runs, timeouts
         -  Deployment method of data sets
         -  Metrics (loading time, evaluation time, #requests)
      •  Highly extensible, which makes it easy to connect new systems on demand
  • 18. FedBench Components (ctd)
      Data Sets and Queries (as on slide 11)
      Benchmark Driver
      •  Allows FedBench to be executed in a unified way
      •  Java, Open Source → easily adjustable and extensible
      Benchmark Results
      •  Output formats: CSV, RDF
  • 19. FedBench Components (ctd)
      Data Sets and Queries (as on slide 11)
      Benchmark Driver
      •  Allows FedBench to be executed in a unified way
      •  Java, Open Source → easily adjustable and extensible
      Benchmark Results
      •  Output formats: CSV, RDF
      Publishing
      •  Wiki-based platform for Linked Data
      •  Publishing and discussion of benchmark results
  • 20. Evaluation
      •  Goal: prove practicability & flexibility of benchmark
         -  Cover a variety of scenarios
         -  Assess first state-of-the-art results
         -  Identify weaknesses and strengths of systems
      •  Measures
         -  Query evaluation time
         -  Number of requests sent to remote sources
      •  Hardware
         -  ILO2 HP server ProLiant DL360
         -  4-core CPU, 2000 MHz
         -  64-bit Windows Server 2008, running 64-bit JVM 1.6.0_22
         -  32 GB RAM (20 GB for federation mediator, rest distributed among federation members)
  • 21. Evaluation: Scenario A
      •  "Centralized vs. Federated" query processing
         -  Scenario A1: Centralized processing
            Sesame 2.3.1
         -  Scenario A2: Local federation
            Sesame 2.3.1 + AliBaba
         -  Scenario A3: SPARQL Endpoint federation (HTTP)
            Sesame 2.3.1 + AliBaba
            SPLENDID from WeST
      •  10 min timeout per query
      •  Average over three runs (after warm-up phase)
  • 22. Scenario A: Life Science Queries
      Data size: 50M triples in total
      #Requests to endpoints:
      System                          LS1  LS2  LS3    LS4  LS5   LS6    LS7
      Endpoint Federation (AliBaba)   13   61   (410)  21k  17k   (130)  (876)
      Endpoint Federation (SPLENDID)  2    49   9      10   4778  322    4889
  • 23. Evaluation: Scenario B
      •  Scenario B: Linked Data query set on CD collection
         -  Bottom-up approach
         -  Top-down approach
         -  Mixed approach
      •  Local CumulusRDF Linked Data server
      •  Systems: dedicated prototype implementations*
      •  Major findings
         -  Top-down approach most performant
         -  Mixed approach competitive, bringing the merits of earlier result reporting
      * G. Ladwig, T. Tran: Linked Data Query Processing Strategies. In Proc. ISWC, 2010.
  • 24. Summary: Central Findings
      •  Effective join ordering is often impossible when no intelligent source selection strategy is given
      •  In such cases: often a very high number of requests (10^4+), caused by the iterative, nested-loop evaluation strategy of AliBaba
      •  Limited capabilities of Sesame to deal with parallelization cause problems (locking issues)
      In the following talk: FedX, a federated query processing system that tackles these issues!
  • 25. Conclusion
      •  Benchmark flexible enough to cover a wide range of semantic data use cases/applications
      •  Evaluation reveals severe deficiencies of today's approaches
      •  Upcoming tasks/future work
         -  General SPARQL 1.1 extensions
         -  SPARQL 1.1 federation extensions
         -  Distributed reasoning
      •  Laid out as a community project: you are invited to contribute your own data & queries!
  • 26. Questions? http://code.google.com/p/fbench/
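
The central findings on slide 24 attribute very high request counts to an iterative, nested-loop evaluation strategy combined with missing source selection. The following is a minimal Python sketch of that effect, not AliBaba's or FedBench's actual code: the `Endpoint` class, the in-memory triples, and the shortened predicate names are all hypothetical stand-ins for remote SPARQL endpoints. It joins three triple patterns loosely modelled on the example query from slide 12 and counts one request per intermediate binding per endpoint.

```python
class Endpoint:
    """Hypothetical in-memory stand-in for a remote SPARQL endpoint."""

    def __init__(self, name, triples):
        self.name = name
        self.triples = list(triples)
        self.requests = 0  # number of pattern lookups sent to this endpoint

    def match(self, pattern):
        """Answer one triple pattern (None = wildcard); counts as one request."""
        self.requests += 1
        return [t for t in self.triples
                if all(p is None or p == v for p, v in zip(pattern, t))]


def bind(pattern, binding):
    """Substitute already-bound variables; unbound variables become wildcards."""
    return tuple(binding.get(term) if term.startswith("?") else term
                 for term in pattern)


def extend(pattern, triple, binding):
    """Return a copy of the binding extended with the variables of this match."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            new[term] = value
    return new


def nested_loop_join(patterns, endpoints):
    """Evaluate patterns left to right, one request per binding per endpoint."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for ep in endpoints:  # no source selection: every endpoint is asked
                for triple in ep.match(bind(pattern, binding)):
                    next_bindings.append(extend(pattern, triple, binding))
        bindings = next_bindings
    return bindings


def make_federation():
    """Invented toy data: 40 presidents, 5 of which have a news topic page."""
    dbpedia = Endpoint("dbpedia", [(f"pres{i}", "type", "President")
                                   for i in range(40)])
    nytimes = Endpoint("nytimes",
                       [(f"nyt{i}", "sameAs", f"pres{i}") for i in range(5)] +
                       [(f"nyt{i}", "topicPage", f"page{i}") for i in range(5)])
    return [dbpedia, nytimes]


def run(patterns):
    """Return (number of results, total number of requests) for a join order."""
    endpoints = make_federation()
    results = nested_loop_join(patterns, endpoints)
    return len(results), sum(ep.requests for ep in endpoints)


P1 = ("?pres", "type", "President")
P2 = ("?x", "sameAs", "?pres")
P3 = ("?x", "topicPage", "?page")

print(run([P1, P2, P3]))  # unselective pattern first: (5, 92)
print(run([P3, P2, P1]))  # selective pattern first:   (5, 22)
```

Both join orders return the same five results, but starting with the unselective pattern multiplies the number of requests, because every intermediate binding triggers a lookup at every endpoint. With realistic data sizes the same mechanism plausibly leads to request counts in the range reported on slide 22.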