SlideShare une entreprise Scribd logo
1  sur  31
International Semantic Web Conference ISWC 2012




      SRBench: A Streaming
     RDF/SPARQL Benchmark

Ying Zhang1, Pham Minh Duc1, Oscar Corcho2, Jean-Paul Calbimonte2
                 1Centrum    Wiskunde & Informatica, Amsterdam
         2Ontology   Engineering Group, Universidad Politécnica de Madrid.



                               jp.calbimonte@upm.es




                                                                             Date: 14/11/2012
Streaming Data


                            Data streams

                            Timestamped tuples


                            Time windows



                            Continuous evaluation

e.g. Data Stream Management Systems (DSMS)

                     2
Sensor Web




“too much (streaming) data but not enough (tools to
gain and derive) knowledge”*




                            3
Semantic Web Tech for Streaming Data

Why?
 Annotate sensor data with semantic metadata

 Apply Linked Data principles to publish streaming data

 Interlink streaming data with existing datasets

 Integrate data stream processing + reasoning

 Raise the query abstraction level with ontologies




                          4
Semantic Sensor Web
   “too much (streaming) data but not enough (tools to
   gain and derive) knowledge”*
                                                                                                  LinkedSensorData
Sensor data publishing

Linked Data                                  LSM

                                                                 Sense2Web


Semantic sensor metadata                             Sensor APIs


                                                                                       ETALIS
Semantic Sensor Network ontology                                                                    Videk
                                                                             SwissEx


                                                                                            BOTTARI,
                                                                                            UrbanMatch
                                                 AEMET
                                                 transporte.
                                                 linkeddata.es



                                                                                                         SSN +CEP




  * Sheth et al. 2008, Semantic Sensor Web
                                             5
Semantic Sensor Web
         Querying semantic streaming data


RDF Streams
                         CQELS


SPARQL extensions
                                                            Streaming
                                                            SPARQL


                                                     EP-SPARQL


SPARQL-based continuous query processors

                                                        C-SPARQL

                                 SPARQL-Stream




                             6
Approaches


Extend RDF for streaming data
                                     ~Similarities
Extend SPARQL for streaming RDF



Apply reasoning on streaming RDF

Query rewriting to DSMS or CEP        Divergence
Logic-programming based query evaluation

RDF Streaming engine from scratch



                          7
RDF Streaming Processing Challenges


               How to specify queries?

          Standard query language extensions
not now

               How to compare systems?

          Streaming RDF/SPARQL benckmarks




                          8
SRBench

      First benchmark for streaming RDF engines


Streaming RDF/SPARQL benchmark

Assess engines abilities of dealing with streaming data

Based on real-world datasets

Functional evaluation     missing?

                          crucial?

                          distinctive?


                           9
Existing Benchmarks?
   Linear Road Benchmark
        relational-based model ≠ RDF graph model
        no interlinking data with other datasets
        no reasoning
   RDF /SPARQL benchmarks
        • LUBM, BSBM, SP2Bench, ...
               meant for stored data
               one off-queries
               single static pre-generated dataset
               do not exploit LOD datasets
               no SPARQL 1.1 features*, no reasoning


* Now the BSBM BI use case includes aggregates
                                                 10
SRBench Challenges


                                   relevant    realistic
Proper benchmark dataset          semantically valid
                                    interlinkable


                                  time-bounded       summarization
A concise set of features         continuous     data abstraction
                                   contextual data




                                    descriptive definition
No standard query language         C-SPARQL      CQELS
                                     SPARQL-Stream


                             11
SRBench Datasets
                                              LinkedSensorData
                                       real-world U.S. weather data1
                                       first & largest sensor dataset in LOD


    LinkedSensorMetadata                                                            LinkedObservationData
 ~20k US weather stations, ~100k sensors                                     hurricane & blizzard observations in US
 links to locations in GeoNames nearby                                       ~1.73 billion RDF triples
                                                                             ~159 million observations

                       Name      Storm Type    Date                      #Triples      #Observations   Data size
                       Bill      Hurricane     Aug. 17 – 22, 2009        231,021,108   21,272,790      ~15 GB
                       Ike       Hurricane     Sep. 01 – 13, 2008        374,094,660   34,430,964      ~34 GB
                       Gustav    Hurricane     Aug. 25 – 31, 2008        258,378,511   23,792,818      ~17 GB
                       Bertha    Hurricane     Jul. 06 – 17, 2008        278,235,734   25,762,568      ~13 GB
                       Wilma     Hurricane     Oct. 17 – 23, 2005        171,854,686   15,797,852      ~10 GB
                       Katrina   Hurricane     Aug. 23 – 30, 2005        203,386,049   18,832,041      ~12 GB
                       Charley   Hurricane     Aug. 09 – 15, 2004        101,956,760   9,333,676       ~7 GB
                                 Blizzard      Apr. 01 – 06, 2003        111,357,227   10,237,791      ~2 GB

1 http://mesowest.utah.edu
                                                                    12
SRBench Datasets



GeoNames                                 DBpedia
geographical database, >8M places        largest & most popular dataset in LOD
~8M geographic features                  structured information from Wikipedia
~146M RDF triples                        links to GeoNames through owl:sameAs

~10GB on disk                            we only use the English language collection
                                             ~181M RDF triples

                                             ~27GB on disk




                                    13
SRBench Dataset model


                                             LinkedSensorData

             LinkedObservationData                                        LinkedSensorMetadata
                                                    om-owl:procedure
                      Observation                                             System

om-owl:samplingTime                 om-owl:result                                         om-owl:hasLocatedNearRel
                                                          om-owl:processLocation

       Instant                  ResultData
                                                                                  Point       LocatedNearRel
                       MeasureData          TruthData




                              DBpedia                                  GeoNames
                                                    owl:sameAs                                   om-owl:hasLocation
                                Airport                                 Feature




                                                         14
SRBench Queries

         graph pattern matching     and, filter, union, optional


               solution modifier    projection, distinct
17 queries




                    query form      select, construct, ask

                                    aggregate, subquery
                   SPARQL 1.1
                                    select expr, property path

                     reasoning      subclass, subproperty, sameAs

                                    time window, istream
                     streaming
                                    dstream, rstream

                                    observations, sensor metadata
                    data access
                                    geonames, dbpedia

                                   15
Query Features
                       Q1    Q2      Q3   Q4      Q5    Q6      Q7     Q8     Q9      Q10   Q11     Q12     Q13     Q14     Q15     Q16     Q17


1.Graph pattern        A     A,F,O   A    A,F     A     A,F,U   A     A       A       A     A,F     A,F,U   A,F     A,F,U   A,F     A,F     A,F
matching

2. Solution modifier   P,D   P,D     P    P       P     P       P,D   P       P       P,D   P,D     P       P       P,D     P       P       P


3. Query form          S     S       A    S       C     S       S     S       S       S     S       S       S       S       S       S       S

4. SPARQL 1.1                F,P     A    A,E,M   A,S           N     A,E,M   A,E,M         A,S,M   A,S,E, A,E,M    F,P     A,E,M   P       P
                                          ,F                                                ,F      M,F,P ,F,P              ,P

5. Reasoning                 C       R                                                                                      C       A       C

6. Streaming           T     T       T    T       T     T       T,D   T       T       T     T       T       T       T       T

7. Dataset             O     O       O    O       O     O       O     O,S     O,S     O,S   O,S     O,S,G   O,S,G   O,S,G   O,S,D   O,S,G   S
                                                                                                                                    ,D




                       1. And, Filter, Union, Optional
                       2. Projection, Distinct
                       3. Select, Construct, Ask
                       4. Aggregate, Subquery, Negation, Expr in SELECT, assignMent,
                          Functions&operators, PropertyPath
                       5. subClassOf, subpRopertyOf, owl:sameAs
                       6. Time-based window, Istream, Dstream,Rstream
                       7. LinkedObservationData, LinkedSensorMetadata, GeoNames, Dbpedia



                                                                      16
Sample Queries

     Q1. Get the rainfall observed once in an hour
                                om-owl:result
             weather:                                 result
        RainfallObservation                                                                time-window
                              om-owl:floatValue    om-owl:uom


basic graph patterns                     value                  uom




     Q2. Get all precipitation observed once in an hour
                                                                         om-owl:result
                                weather:                                                     result
                        PrecipitationObservation
                                                                       om-owl:floatValue   om-owl:uom

      weather:                                                                    value                 uom
 RainfallObservation
                                                       weather:
                                                  GraupelObservation
                       weather:
                  DrizzleObservation




                                                                17
Sample Queries



Q4. Get average wind speed at stations where the air
temperature is >32 deg. in the last hour every 10 min
        aggregates      filter                        time-window

                        om-owl:procedure


       weather:               weather:
 WindSpeedObservation   TemperatureObservation




                                                 18
Sample Queries


Q9. Get the daily average wind force and direction
observed by the sensor at a given location.
                           om-owl:result        om-owl:floatValue              <1 0
                                                                               <4 1




                                                                         AVG
       weather:                            result                value
 WindSpeedObservation                                                          <8 2
                                                                               <13  3
                                                                               …
                           om-owl:result        om-owl:floatValue
                                                                               Beaufort scale




                                                                         AVG
       weather:                            result                value
WindDirectionObservation




      Some semantics to bare wind speed numbers

      Post process qualified triple patterns




                                                            19
Sample Queries

Q12. Get the hourly average air temperature
and humidity of large cities
        om-owl:hasLocatedNearRel
    sensor

                                   om-owl:hasLocation

                             feature




                             feature                   population   >15000
                                       gn:population




                                             20
Query Implementations

     http://www.w3.org/wiki/SRBench


C-SPARQL

SPARQLStream

CQELS

Not exhaustive!




                     21
Query implementations
Q6. Get the stations that have observed extremely low visibility in the
last hour.
SELECT ?sensor
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS]
WHERE {
 { ?observation om-owl:procedure ?sensor ; a weather:VisibilityObservation ;

 UNION
                om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value < "10"^^xsd:float) }   SPARQLStream
 { ?observation om-owl:procedure ?sensor ; a weather:RainfallObservation ;
                om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value > "30"^^xsd:float) }
 UNION
 { ?observation om-owl:procedure ?sensor ; a weather:SnowfallObservation . } }
SELECT ?sensor
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations>[RANGE 1h TUMBLING]
WHERE {
 { ?observation om-owl:procedure ?sensor ; a weather:VisibilityObservation ;
                om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value < "10"^^xsd:float) }
 UNION
 { ?observation om-owl:procedure ?sensor ; a weather:RainfallObservation ;
                om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value > "30"^^xsd:float) }
                                                                                                   C-SPARQL
 UNION
 { ?observation om-owl:procedure ?sensor ; a weather:SnowfallObservation . } }
 SELECT ?sensor
 WHERE {
 STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 3600s] {
 { ?observation om-owl:procedure ?sensor ; a weather:VisibilityObservation ;
                om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value < "10"^^xsd:float)}
 UNION
 { ?observation om-owl:procedure ?sensor ; a weather:RainfallObservation ;
                                                                                                   CQELS
                om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value > "30"^^xsd:float) }
 UNION
 { ?observation om-owl:procedure ?sensor ; a weather:SnowfallObservation . } } }

                                                                 22
Query implementations
   Q3. Detect if a hurricane has been observed
     « A hurricane has a sustained wind (for more than 3 hours) of at least 33 metres
     per second or 74 miles per hour (119 km/h) »
ASK FROM NAMED STREAM
   <http://www.cwi.nl/SRBench/observations> [NOW - 3 HOURS SLIDE 10 MINUTES]
WHERE {
 ?observation om-owl:procedure ?sensor ; om-owl:observedProperty weather:WindSpeed ;
              om-owl:result [ om-owl:floatValue ?value ] . }
GROUP BY ?sensor HAVING ( AVG(?value) >= "74"^^xsd:float )         SPARQLStream

ASK FROM STREAM
    <http://www.cwi.nl/SRBench/observations> [RANGE 1h STEP 10m]
WHERE {
 ?observation om-owl:procedure ?sensor ; om-owl:observedProperty weather:WindSpeed ;
                      om-owl:result [ om-owl:floatValue ?value ] . }
GROUP BY ?sensor HAVING ( AVG(?value) >= "74"^^xsd:float )           C-SPARQL

ASK WHERE {
 STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 10800s SLIDE 600s] {
  ?observation om-owl:procedure ?sensor ; om-owl:observedProperty weather:WindSpeed ;
                om-owl:result [ om-owl:floatValue ?value ] .} }
GROUP BY ?sensor HAVING ( AVG(?value) >= "74"^^xsd:float )         CQELS
                                              23
Query implementations
  Q2. Get all precipitation observed once in an hour

SELECT DISTINCT ?sensor ?value ?uom
FROM NAMED STREAM
       <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS]
WHERE {
  ?observation om-owl:procedure ?sensor ;
               rdf:type/rdfs:subClassOf* weather:PrecipitationObservation ;
               om-owl:result ?result .
  ?result om-owl:floatValue ?value .
  OPTIONAL { ?result om-owl:uom ?uom . }
}




                                     24
Functional Evaluation


System         Q1   Q2   Q3   Q4   Q5   Q6 Q7    Q8   Q9        Q10 Q11 Q12    Q13   Q14   Q15   Q16   Q17


SPARQLStream        PP   A    G    G             G    G,IF SD       SD   PP,SD PP,SD PP,SD PP,SD PP,SD PP,SD


CQELS               PP   A                 D/N        IF                 PP    PP    PP    PP    PP    PP


C-SPARQL            PP   A                 D          IF                 PP    PP    PP    PP    PP    PP



    Ask
    Dstream
    Group by and aggregations
    IF expression
    Negation
    Property Path
    Static Dataset




                                                           25
Evaluation:Discussion


• the graph pattern matching features
• solution modifiers
• SELECT and CONSTRUCT query forms

• property path expressions are not supported
• lack of support for the ASK
• DSTREAM, alternatively NOT EXISTS

•   Lack of reasoning
•   C-SPARQLsimple RDF entailment
•   SPARQLStream  ontology-based query rewriting
•   CQELS  Native implementation
                          26
Ongoing work: Tests criteria
• Correctness
   •   query results validated
   •   possible variations in ordering
   •   possibly multiple valid results per query
   •   mismatch, precision/recall
• Throughput:
   • maximal number data items a strRS engine is able to
     process per time unit
• Scalability:
   • increasing number of incoming streams
   • Increasing number of continuous queries to be processed
• Response time:
   • minimal elapsed time between a data item entering the
     system and being returned as output of a query
   • mainly relevant for queries allowing immediate query results
     upon receiving of a data item
                                   27
Evaluation Issues


• Correctness, Throughput, Scalability
   •   Different outputs
   •   Differences in query semantics?
   •   Very different query evaluation approaches
   •   Reasoning
   •   Execution parameters




                                 28
Some related work


• Framework with a toolset for evaluating Linked Stream
  Data engines
   Linked Stream Data Processing Engines: Facts and Figures.
   Le-Phuoc et al. ISWC 2012


• Subset of functionalities
   • Missing functions, reasoning, property paths, window-to-stream,
• Synthetic data
   • but flexibility
• First set of performace tests
• Showcase benefits of using semantic technologies?




                                     29
Close to the end


 SRBench: the first benchmark for streaming RDF engines

 Version 1

  ◦ SRBench specification
  ◦ Functional evaluation

 Much room left for improvements

  ◦ Streaming RDF processing is an evolving topic
  ◦ Exploiting more reasoning possibilities on semantic data
  ◦ Performance evaluation in Version 2




                                 30
…Thanks


    Questions, please.




jp.calbimonte@upm.es


            31

Contenu connexe

Similaire à SRBench Streaming RDF SPARQL Benchmark

The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...Gezim Sejdiu
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupOscar Corcho
 
Sasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation DefenseSasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation DefenseSasa Nesic
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...Muhammad Saleem
 
Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_Tina Zhang
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...DataWorks Summit
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeSumant Tambe
 
SSONDE: Semantic Similarity On liNked Data Entities
SSONDE: Semantic Similarity On liNked Data EntitiesSSONDE: Semantic Similarity On liNked Data Entities
SSONDE: Semantic Similarity On liNked Data EntitiesRiccardo Albertoni
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactJean-Paul Calbimonte
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Zekeriya Besiroglu
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDCAstroAtom
 
Dynamic and repeatable transformation of existing Thesauri and Authority list...
Dynamic and repeatable transformation of existing Thesauri and Authority list...Dynamic and repeatable transformation of existing Thesauri and Authority list...
Dynamic and repeatable transformation of existing Thesauri and Authority list...DESTIN-Informatique.com
 
ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …Valery Tkachenko
 
Myriam phd
Myriam phdMyriam phd
Myriam phdiammyr
 
Big Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioBig Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioCHAKER ALLAOUI
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingHemant Kumar
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsPat Patterson
 
Ivan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3CIvan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3Csssw2012
 

Similaire à SRBench Streaming RDF SPARQL Benchmark (20)

The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering Group
 
Sasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation DefenseSasa Nesic - PhD Dissertation Defense
Sasa Nesic - PhD Dissertation Defense
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
 
Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_Predictive maintenance withsensors_in_utilities_
Predictive maintenance withsensors_in_utilities_
 
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
Observing Intraday Indicators Using Real-Time Tick Data on Apache Superset an...
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/Subscribe
 
SSONDE: Semantic Similarity On liNked Data Entities
SSONDE: Semantic Similarity On liNked Data EntitiesSSONDE: Semantic Similarity On liNked Data Entities
SSONDE: Semantic Similarity On liNked Data Entities
 
RDF Stream Processing: Let's React
RDF Stream Processing: Let's ReactRDF Stream Processing: Let's React
RDF Stream Processing: Let's React
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...
 
Data-intensive profile for the VAMDC
Data-intensive profile for the VAMDCData-intensive profile for the VAMDC
Data-intensive profile for the VAMDC
 
Dynamic and repeatable transformation of existing Thesauri and Authority list...
Dynamic and repeatable transformation of existing Thesauri and Authority list...Dynamic and repeatable transformation of existing Thesauri and Authority list...
Dynamic and repeatable transformation of existing Thesauri and Authority list...
 
ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …ChemSpider compound database as one of the pillars of a semantic web for …
ChemSpider compound database as one of the pillars of a semantic web for …
 
Myriam phd
Myriam phdMyriam phd
Myriam phd
 
Big Data to SMART Data : Process Scenario
Big Data to SMART Data : Process ScenarioBig Data to SMART Data : Process Scenario
Big Data to SMART Data : Process Scenario
 
Tracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracingTracing Micro Services with OpenTracing
Tracing Micro Services with OpenTracing
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
Ivan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3CIvan Herman - Semantic Web Activities @ W3C
Ivan Herman - Semantic Web Activities @ W3C
 

Plus de Jean-Paul Calbimonte

Towards Collaborative Creativity in Persuasive Multi-agent Systems
Towards Collaborative Creativity in Persuasive Multi-agent SystemsTowards Collaborative Creativity in Persuasive Multi-agent Systems
Towards Collaborative Creativity in Persuasive Multi-agent SystemsJean-Paul Calbimonte
 
A Platform for Difficulty Assessment and Recommendation of Hiking Trails
A Platform for Difficulty Assessment andRecommendation of Hiking TrailsA Platform for Difficulty Assessment andRecommendation of Hiking Trails
A Platform for Difficulty Assessment and Recommendation of Hiking TrailsJean-Paul Calbimonte
 
Decentralized Management of Patient Profiles and Trajectories through Semanti...
Decentralized Management of Patient Profiles and Trajectories through Semanti...Decentralized Management of Patient Profiles and Trajectories through Semanti...
Decentralized Management of Patient Profiles and Trajectories through Semanti...Jean-Paul Calbimonte
 
Personal Data Privacy Semantics in Multi-Agent Systems Interactions
Personal Data Privacy Semantics in Multi-Agent Systems InteractionsPersonal Data Privacy Semantics in Multi-Agent Systems Interactions
Personal Data Privacy Semantics in Multi-Agent Systems InteractionsJean-Paul Calbimonte
 
SanTour: Personalized Recommendation of Hiking Trails to Health Pro files
SanTour: Personalized Recommendation of Hiking Trails to Health ProfilesSanTour: Personalized Recommendation of Hiking Trails to Health Profiles
SanTour: Personalized Recommendation of Hiking Trails to Health Pro filesJean-Paul Calbimonte
 
Multi-agent interactions on the Web through Linked Data Notifications
Multi-agent interactions on the Web through Linked Data NotificationsMulti-agent interactions on the Web through Linked Data Notifications
Multi-agent interactions on the Web through Linked Data NotificationsJean-Paul Calbimonte
 
The MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition MetadataThe MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition MetadataJean-Paul Calbimonte
 
Linked Data Notifications for RDF Streams
Linked Data Notifications for RDF StreamsLinked Data Notifications for RDF Streams
Linked Data Notifications for RDF StreamsJean-Paul Calbimonte
 
Fundamentos de Scala (Scala Basics) (español) Catecbol
Fundamentos de Scala (Scala Basics) (español) CatecbolFundamentos de Scala (Scala Basics) (español) Catecbol
Fundamentos de Scala (Scala Basics) (español) CatecbolJean-Paul Calbimonte
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebJean-Paul Calbimonte
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsJean-Paul Calbimonte
 
Query Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingQuery Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingJean-Paul Calbimonte
 
Toward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the WebToward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the WebJean-Paul Calbimonte
 
Detection of hypoglycemic events through wearable sensors
Detection of hypoglycemic events through wearable sensorsDetection of hypoglycemic events through wearable sensors
Detection of hypoglycemic events through wearable sensorsJean-Paul Calbimonte
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsJean-Paul Calbimonte
 
The Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor NetworksThe Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor NetworksJean-Paul Calbimonte
 
Scala Programming for Semantic Web Developers ESWC Semdev2015
Scala Programming for Semantic Web Developers ESWC Semdev2015Scala Programming for Semantic Web Developers ESWC Semdev2015
Scala Programming for Semantic Web Developers ESWC Semdev2015Jean-Paul Calbimonte
 

Plus de Jean-Paul Calbimonte (20)

Towards Collaborative Creativity in Persuasive Multi-agent Systems
Towards Collaborative Creativity in Persuasive Multi-agent SystemsTowards Collaborative Creativity in Persuasive Multi-agent Systems
Towards Collaborative Creativity in Persuasive Multi-agent Systems
 
A Platform for Difficulty Assessment and Recommendation of Hiking Trails
A Platform for Difficulty Assessment andRecommendation of Hiking TrailsA Platform for Difficulty Assessment andRecommendation of Hiking Trails
A Platform for Difficulty Assessment and Recommendation of Hiking Trails
 
Stream reasoning agents
Stream reasoning agentsStream reasoning agents
Stream reasoning agents
 
Decentralized Management of Patient Profiles and Trajectories through Semanti...
Decentralized Management of Patient Profiles and Trajectories through Semanti...Decentralized Management of Patient Profiles and Trajectories through Semanti...
Decentralized Management of Patient Profiles and Trajectories through Semanti...
 
Personal Data Privacy Semantics in Multi-Agent Systems Interactions
Personal Data Privacy Semantics in Multi-Agent Systems InteractionsPersonal Data Privacy Semantics in Multi-Agent Systems Interactions
Personal Data Privacy Semantics in Multi-Agent Systems Interactions
 
RDF data validation 2017 SHACL
RDF data validation 2017 SHACLRDF data validation 2017 SHACL
RDF data validation 2017 SHACL
 
SanTour: Personalized Recommendation of Hiking Trails to Health Pro files
SanTour: Personalized Recommendation of Hiking Trails to Health ProfilesSanTour: Personalized Recommendation of Hiking Trails to Health Profiles
SanTour: Personalized Recommendation of Hiking Trails to Health Pro files
 
Multi-agent interactions on the Web through Linked Data Notifications
Multi-agent interactions on the Web through Linked Data NotificationsMulti-agent interactions on the Web through Linked Data Notifications
Multi-agent interactions on the Web through Linked Data Notifications
 
The MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition MetadataThe MedRed Ontology for Representing Clinical Data Acquisition Metadata
The MedRed Ontology for Representing Clinical Data Acquisition Metadata
 
Linked Data Notifications for RDF Streams
Linked Data Notifications for RDF StreamsLinked Data Notifications for RDF Streams
Linked Data Notifications for RDF Streams
 
Fundamentos de Scala (Scala Basics) (español) Catecbol
Fundamentos de Scala (Scala Basics) (español) CatecbolFundamentos de Scala (Scala Basics) (español) Catecbol
Fundamentos de Scala (Scala Basics) (español) Catecbol
 
Connecting Stream Reasoners on the Web
Connecting Stream Reasoners on the WebConnecting Stream Reasoners on the Web
Connecting Stream Reasoners on the Web
 
RDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementationsRDF Stream Processing Tutorial: RSP implementations
RDF Stream Processing Tutorial: RSP implementations
 
Query Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream ProcessingQuery Rewriting in RDF Stream Processing
Query Rewriting in RDF Stream Processing
 
Toward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the WebToward Semantic Sensor Data Archives on the Web
Toward Semantic Sensor Data Archives on the Web
 
Detection of hypoglycemic events through wearable sensors
Detection of hypoglycemic events through wearable sensorsDetection of hypoglycemic events through wearable sensors
Detection of hypoglycemic events through wearable sensors
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of Semantics
 
The Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor NetworksThe Schema Editor of OpenIoT for Semantic Sensor Networks
The Schema Editor of OpenIoT for Semantic Sensor Networks
 
Scala Programming for Semantic Web Developers ESWC Semdev2015
Scala Programming for Semantic Web Developers ESWC Semdev2015Scala Programming for Semantic Web Developers ESWC Semdev2015
Scala Programming for Semantic Web Developers ESWC Semdev2015
 
Streams of RDF Events Derive2015
Streams of RDF Events Derive2015Streams of RDF Events Derive2015
Streams of RDF Events Derive2015
 

SRBench Streaming RDF SPARQL Benchmark

  • 1. International Semantic Web Conference ISWC 2012 SRBench: A Streaming RDF/SPARQL Benchmark Ying Zhang1, Pham Minh Duc1, Oscar Corcho2, Jean-Paul Calbimonte2 1Centrum Wiskunde & Informatica, Amsterdam 2Ontology Engineering Group, Universidad Politécnica de Madrid. jp.calbimonte@upm.es Date: 14/11/2012
  • 2. Streaming Data Data streams Timestamped tuples Time windows Continuous evaluation e.g. Data Stream Management Systems (DSMS) 2
  • 3. Sensor Web “too much (streaming) data but not enough (tools to gain and derive) knowledge”* 3
  • 4. Semantic Web Tech for Streaming Data Why? Annotate sensor data with semantic metadata Apply Linked Data principles to publish streaming data Interlink streaming data with existing datasets Integrate data stream processing + reasoning Raise the query abstraction level with ontologies 4
  • 5. Semantic Sensor Web “too much (streaming) data but not enough (tools to gain and derive) knowledge”* LinkedSensorData Sensor data publishing Linked Data LSM Sense2Web Semantic sensor metadata Sensor APIs ETALIS Semantic Sensor Network ontology Videk SwissEx BOTTARI, UrbanMatch AEMET transporte. linkeddata.es SSN +CEP * Sheth et al. 2008, Semantic Sensor Web 5
  • 6. Semantic Sensor Web Querying semantic streaming data RDF Streams CQELS SPARQL extensions Streaming SPARQL EP-SPARQL SPARQL-based continuous query processors C-SPARQL SPARQL-Stream 6
  • 7. Approaches Extend RDF for streaming data ~Similarities Extend SPARQL for streaming RDF Apply reasoning on streaming RDF Query rewriting to DSMS or CEP Divergence Logic-programming based query evaluation RDF Streaming engine from scratch 7
  • 8. RDF Streaming Processing Challenges How to specify queries? Standard query language extensions not now How to compare systems? Streaming RDF/SPARQL benckmarks 8
  • 9. SRBench First benchmark for streaming RDF engines Streaming RDF/SPARQL benchmark Assess engines abilities of dealing with streaming data Based on real-world datasets Functional evaluation missing? crucial? distinctive? 9
  • 10. Existing Benchmarks? Linear Road Benchmark relational-based model ≠ RDF graph model no interlinking data with other datasets no reasoning RDF /SPARQL benchmarks • LUBM, BSBM, SP2Bench, ... meant for stored data one off-queries single static pre-generated dataset do not exploit LOD datasets no SPARQL 1.1 features*, no reasoning * Now the BSBM BI use case includes aggregates 10
  • 11. SRBench Challenges relevant realistic Proper benchmark dataset semantically valid interlinkable time-bounded summarization A concise set of features continuous data abstraction contextual data descriptive definition No standard query language C-SPARQL CQELS SPARQL-Stream 11
  • 12. SRBench Datasets LinkedSensorData real-world U.S. weather data1 first & largest sensor dataset in LOD LinkedSensorMetadata LinkedObservationData ~20k US weather stations, ~100k sensors hurricane & blizzard observations in US links to locations in GeoNames nearby ~1.73 billion RDF triples ~159 million observations Name Storm Type Date #Triples #Observations Data size Bill Hurricane Aug. 17 – 22, 2009 231,021,108 21,272,790 ~15 GB Ike Hurricane Sep. 01 – 13, 2008 374,094,660 34,430,964 ~34 GB Gustav Hurricane Aug. 25 – 31, 2008 258,378,511 23,792,818 ~17 GB Bertha Hurricane Jul. 06 – 17, 2008 278,235,734 25,762,568 ~13 GB Wilma Hurricane Oct. 17 – 23, 2005 171,854,686 15,797,852 ~10 GB Katrina Hurricane Aug. 23 – 30, 2005 203,386,049 18,832,041 ~12 GB Charley Hurricane Aug. 09 – 15, 2004 101,956,760 9,333,676 ~7 GB Blizzard Apr. 01 – 06, 2003 111,357,227 10,237,791 ~2 GB 1 http://mesowest.utah.edu 12
  • 13. SRBench Datasets GeoNames DBpedia geographical database, >8M places largest & most popular dataset in LOD ~8M geographic features structured information from Wikipedia ~146M RDF triples links to GeoNames through owl:sameAs ~10GB on disk we only use the English language collection ~181M RDF triples ~27GB on disk 13
  • 14. SRBench Dataset model LinkedSensorData LinkedObservationData LinkedSensorMetadata om-owl:procedure Observation System om-owl:samplingTime om-owl:result om-owl:hasLocatedNearRel om-owl:processLocation Instant ResultData Point LocatedNearRel MeasureData TruthData DBpedia GeoNames owl:sameAs om-owl:hasLocation Airport Feature 14
  • 15. SRBench Queries graph pattern matching and, filter, union, optional solution modifier projection, distinct 17 queries query form select, construct, ask aggregate, subquery SPARQL 1.1 select expr, property path reasoning subclass, subproperty, sameAs time window, istream streaming dstream, rstream observations, sensor metadata data access geonames, dbpedia 15
  • 16. Query Features Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 1.Graph pattern A A,F,O A A,F A A,F,U A A A A A,F A,F,U A,F A,F,U A,F A,F A,F matching 2. Solution modifier P,D P,D P P P P P,D P P P,D P,D P P P,D P P P 3. Query form S S A S C S S S S S S S S S S S S 4. SPARQL 1.1 F,P A A,E,M A,S N A,E,M A,E,M A,S,M A,S,E, A,E,M F,P A,E,M P P ,F ,F M,F,P ,F,P ,P 5. Reasoning C R C A C 6. Streaming T T T T T T T,D T T T T T T T T 7. Dataset O O O O O O O O,S O,S O,S O,S O,S,G O,S,G O,S,G O,S,D O,S,G S ,D 1. And, Filter, Union, Optional 2. Projection, Distinct 3. Select, Construct, Ask 4. Aggregate, Subquery, Negation, Expr in SELECT, assignMent, Functions&operators, PropertyPath 5. subClassOf, subpRopertyOf, owl:sameAs 6. Time-based window, Istream, Dstream,Rstream 7. LinkedObservationData, LinkedSensorMetadata, GeoNames, Dbpedia 16
  • 17. Sample Queries Q1. Get the rainfall observed once in an hour om-owl:result weather: result RainfallObservation time-window om-owl:floatValue om-owl:uom basic graph patterns value uom Q2. Get all precipitation observed once in an hour om-owl:result weather: result PrecipitationObservation om-owl:floatValue om-owl:uom weather: value uom RainfallObservation weather: GraupelObservation weather: DrizzleObservation 17
  • 18. Sample Queries Q4. Get average wind speed at stations where the air temperature is >32 deg. in the last hour every 10 min aggregates filter time-window om-owl:procedure weather: weather: WindSpeedObservation TemperatureObservation 18
  • 19. Sample Queries Q9. Get the daily average wind force and direction observed by the sensor at a given location. om-owl:result om-owl:floatValue <1 0 <4 1 AVG weather: result value WindSpeedObservation <8 2 <13  3 … om-owl:result om-owl:floatValue Beaufort scale AVG weather: result value WindDirectionObservation Some semantics to bare wind speed numbers Post process qualified triple patterns 19
  • 20. Sample Queries Q12. Get the hourly average air temperature and humidity of large cities om-owl:hasLocatedNearRel sensor om-owl:hasLocation feature feature population >15000 gn:population 20
  • 21. Query Implementations http://www.w3.org/wiki/SRBench C-SPARQL SPARQLStream CQELS Not exhaustive! 21
  • 22. Query implementations Q6. Get the stations that have observed extremely low visibility in the last hour. SELECT ?sensor FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS] WHERE { { ?observation om-owl:procedure ?sensor ; a weather:VisibilityObservation ; UNION om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value < "10"^^xsd:float) } SPARQLStream { ?observation om-owl:procedure ?sensor ; a weather:RainfallObservation ; om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value > "30"^^xsd:float) } UNION { ?observation om-owl:procedure ?sensor ; a weather:SnowfallObservation . } } SELECT ?sensor FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations>[RANGE 1h TUMBLING] WHERE { { ?observation om-owl:procedure ?sensor ; a weather:VisibilityObservation ; om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value < "10"^^xsd:float) } UNION { ?observation om-owl:procedure ?sensor ; a weather:RainfallObservation ; om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value > "30"^^xsd:float) } C-SPARQL UNION { ?observation om-owl:procedure ?sensor ; a weather:SnowfallObservation . } } SELECT ?sensor WHERE { STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 3600s] { { ?observation om-owl:procedure ?sensor ; a weather:VisibilityObservation ; om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value < "10"^^xsd:float)} UNION { ?observation om-owl:procedure ?sensor ; a weather:RainfallObservation ; CQELS om-owl:result [om-owl:floatValue ?value ] . FILTER ( ?value > "30"^^xsd:float) } UNION { ?observation om-owl:procedure ?sensor ; a weather:SnowfallObservation . } } } 22
  • 23. Query implementations Q3. Detect if a hurricane has been observed « A hurricane has a sustained wind (for more than 3 hours) of at least 33 metres per second or 74 miles per hour (119 km/h) » ASK FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 3 HOURS SLIDE 10 MINUTES] WHERE { ?observation om-owl:procedure ?sensor ; om-owl:observedProperty weather:WindSpeed ; om-owl:result [ om-owl:floatValue ?value ] . } GROUP BY ?sensor HAVING ( AVG(?value) >= "74"^^xsd:float ) SPARQLStream ASK FROM STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 1h STEP 10m] WHERE { ?observation om-owl:procedure ?sensor ; om-owl:observedProperty weather:WindSpeed ; om-owl:result [ om-owl:floatValue ?value ] . } GROUP BY ?sensor HAVING ( AVG(?value) >= "74"^^xsd:float ) C-SPARQL ASK WHERE { STREAM <http://www.cwi.nl/SRBench/observations> [RANGE 10800s SLIDE 600s] { ?observation om-owl:procedure ?sensor ; om-owl:observedProperty weather:WindSpeed ; om-owl:result [ om-owl:floatValue ?value ] .} } GROUP BY ?sensor HAVING ( AVG(?value) >= "74"^^xsd:float ) CQELS 23
  • 24. Query implementations Q2. Get all precipitation observed once in an hour SELECT DISTINCT ?sensor ?value ?uom FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS] WHERE { ?observation om-owl:procedure ?sensor ; rdf:type/rdfs:subClassOf* weather:PrecipitationObservation ; om-owl:result ?result . ?result om-owl:floatValue ?value . OPTIONAL { ?result om-owl:uom ?uom . } } 24
  • 25. Functional Evaluation System Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 SPARQLStream PP A G G G G,IF SD SD PP,SD PP,SD PP,SD PP,SD PP,SD PP,SD CQELS PP A D/N IF PP PP PP PP PP PP C-SPARQL PP A D IF PP PP PP PP PP PP Ask Dstream Group by and aggregations IF expression Negation Property Path Static Dataset 25
  • 26. Evaluation:Discussion • the graph pattern matching features • solution modifiers • SELECT and CONSTRUCT query forms • property path expressions are not supported • lack of support for the ASK • DSTREAM, alternatively NOT EXISTS • Lack of reasoning • C-SPARQLsimple RDF entailment • SPARQLStream  ontology-based query rewriting • CQELS  Native implementation 26
  • 27. Ongoing work: Tests criteria • Correctness • query results validated • possible variations in ordering • possibly multiple valid results per query • mismatch, precision/recall • Throughput: • maximal number data items a strRS engine is able to process per time unit • Scalability: • increasing number of incoming streams • Increasing number of continuous queries to be processed • Response time: • minimal elapsed time between a data item entering the system and being returned as output of a query • mainly relevant for queries allowing immediate query results upon receiving of a data item 27
  • 28. Evaluation Issues • Correctness, Throughput, Scalability • Different outputs • Differences in query semantics? • Very different query evaluation approaches • Reasoning • Execution parameters 28
  • 29. Some related work • Framework with a toolset for evaluating Linked Stream Data engines Linked Stream Data Processing Engines: Facts and Figures. Le-Phuoc et al. ISWC 2012 • Subset of functionalities • Missing functions, reasoning, property paths, window-to-stream, • Synthetic data • but flexibility • First set of performace tests • Showcase benefits of using semantic technologies? 29
  • 30. Close to the end  SRBench: the first benchmark for streaming RDF engines  Version 1 ◦ SRBench specification ◦ Functional evaluation  Much room left for improvements ◦ Streaming RDF processing is an evolving topic ◦ Exploiting more reasoning possibilities on semantic data ◦ Performance evaluation in Version 2 30
  • 31. …Thanks Questions, please. jp.calbimonte@upm.es 31