RDF Stream Processing: Let's React

1 961 vues

Publié le

RDF Stream processing and reactive systems

Publié dans : Logiciels
1 commentaire
4 j’aime
Aucun téléchargement
Nombre de vues
1 961
Sur SlideShare
Issues des intégrations
Intégrations 0
Aucune incorporation

Aucune remarque pour cette diapositive

RDF Stream Processing: Let's React

  1. 1. RDF Stream Processing: Let’s React! @jpcik Jean-Paul Calbimonte LSIR EPFL OrdRing 2014. International Semantic Web Conference ISWC 2014 Riva del Garda, 20.10.2014
  2. 2. The RSP Community Research work Many Papers PhD Thesis Datasets Prototypes Benchmarks RDF Streams Stream Reasoning Complex Event Processing Stream Query Processing Stream Compression Semantic Sensor Web Many topics Tons of work W3C RSP Community Group http://www.w3.org/community/rsp discuss standardize combine formalize evangelize Effort to our work on RDF stream processing 2
  3. 3. The RSP Community 3
  4. 4. Why Streams? Internet of Things Sensor Networks Mobile Networks Smart Devices Participatory Sensing Transportation Financial Data Social Media Urban Planning Health Monitoring Marketing “It’s a streaming world!”[1] 4 Della Valle, et al : It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems
  5. 5. Why Streams? Web standards Data discovery Data sharing Web queries Go Web Semantics Vocabularies Data Harvesting Data linking Matching Integration Ontologies Expressivity Inference Rule processing Knowledge bases Reasoning Query languages Query answering Efficient processing Query Federation Processing Is this what we require? 5
  6. 6. Looking back 10 years ago… Stonebreaker et al. The 8 requirement of Real-TimeStream Processing. SIGMOD Record. 2005. “8 requirements of real-time stream processing”[2] Do we address them? Do we have more requirements? Do we need to do more? Keep data moving Query with stream SQL Handle imperfections Predictable outcomes Integrate stored data Data safety & availability Partition & scale Respond Instantaneously 8 Requirements 6
  7. 7. Reactive Systems Keep data moving Query with stream SQL Handle imperfections Predictable outcomes Integrate stored data Data safety & availability Partition & scale Respond Instantaneously 8 Requirements Event-Driven Jonas Boner. Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems. 2013. Events: react to Load: Scalable Failure: Resilient Users: Responsive Do we address them? Do we have more requirements? Do we need to do more? 7
  8. 8. Warning: You may see code 8
  9. 9. ① Keep the data moving Process data in-stream Not required to store Active processing model input streams RSP queries/ rules output streams/events RDF Streams 9
  10. 10. … RDF Stream Gi Gi+1 Gi+2 … Gi+n … unbounded sequence Gi {(s1,p1,o1), (s2,p2,o2),…} [ti] 1+ triples implicit/explicit timestamp/interval public class SensorsStreamer extends RdfStream implements Runnable { public void run() { .. while(true){ the stream is “observable” ... RdfQuadruple q=new RdfQuadruple(subject,predicate,object, System.currentTimeMillis()); this.put(q); } } } How do I code this? something to run on a thread timestamped triple Data structure, execution and callbacks are mixed 10 Observer pattern Tightly coupled listener
  11. 11. Actor Model Actor 1 11 send: fire-forget m No shared mutable state Actor 2 Avoid blocking operators Lightweight objects Loose coupling communicate through messages mailbox state non-blocking response behavior More goodies later… Implementations: e.g. Akka for Java/Scala
  12. 12. RDF Stream Data structure object DemoStreams { ... def streamTriples={ Ideas using akka actors Iterator.from(1) map{i=> ... new Triple(subject,predicate,object) } } Infinite triple iterator Execution val f=Future(DemoStreams.streamTriples) f.map{a=>a.foreach{triple=> //do something }} Asynchronous iteration Message passing f.map{a=>a.foreach{triple=> someSink ! triple }} send triple to actor Immutable RDF stream  avoid shared mutable state  avoid concurrent writes  unbounded sequence Futures  non blocking composition  concurrent computations  work with not-yet-computed results Actors  message-based  share-nothing async  distributable 12
  13. 13. RDF Stream  Loose coupling  Immutable data streams  Asynchronous message passing  Well defined input/output 13 … other issues: Graph implementation? Timestamps: application vs system? Serialization?
  14. 14. ② Query using SQL on Streams Model Continuous execution Union, Join, Optional, Filter Aggregates Time window Triple window R2S operator Sequence, Co-ocurrence Time function TA-SPARQL TA-RDF ✗ ✔ Limited ✗ ✗ ✗ ✗ ✗ tSPARQL tRDF ✗ ✔ ✗ ✗ ✗ ✗ ✗ ✗ Streaming RDF Stream ✔ ✔ ✗ ✔ ✔ ✗ ✗ ✗ SPARQL C-SPARQL RDF Stream ✔ ✔ ✔ ✔ ✔ ✗ ✗ ✔ CQELS RDF Stream ✔ ✔ ✔ ✔ ✔ ✗ ✗ ✗ SPARQLStream (Virtual) RDF Stream ✔ ✔ ✔ ✔ ✗ ✔ ✗ ✗ EP-SPARQL RDF Stream ✔ ✔ ✔ ✗ ✗ ✗ ✔ ✗ Instans RDF ✔ ✔ ✔ ✗ ✗ ✗ ✗ ✗ W3C RSP  review features in existing systems  agree on fundamental operators  discuss on possible semantics https://www.w3.org/community/rsp/wiki/RSP_Query_Features RSP is not always/only SPARQL-like querying SPARQL protocol is not enough RSP RESTful interfaces? 14 Powerful languages for continuous query processing
  15. 15. SELECT ?person ?loc WHERE { STREAM <http://deri.org/streams/rfid> [RANGE 3s] {?person :detectedAt ?loc} } ExecContext context=new ExecContext(HOME, false); register query String queryString =" SELECT ?person ?loc … ContinuousSelect selQuery=context.registerSelect(queryString); selQuery.register(new ContinuousListener() { public void update(Mapping mapping){ adding listener get result updates String result=""; for(Iterator<Var> vars=mapping.vars();vars.hasNext();) result+=" "+ context.engine().decode(mapping.get(vars.next())); System.out.println(result); } }); RSP Querying Example with CQELS (code.google.com/p/cqels) CQELS continuous query: “Queries evaluated continuously against the changing dataset”[4] Le Phuoc et al. An Native and adaptive approach for unifyied processing of Linked Streams and Linked Data. ISWC2011. 15 Tightly coupled listeners Results delivery: push & pull?
  16. 16. Dynamic Push-Pull Producer 16 Consumer demand flow m data flow Push when consumer is faster Pull when producer is faster Dynamically switch modes Communication is dynamic depending on demand vs supply
  17. 17. ③ Handle stream imperfections Delayed, missing, out-of-order … Setting timeouts in message passing val future = myActor.ask("hello")(5 seconds) context.setReceiveTimeout(100 milliseconds) def receive = { case "Hello" => //do something case ReceiveTimeout => throw new RuntimeException("Timed out") 17 timeout after 5 sec Timeout receiving messages in case of timeout “order matters!”[1] Async message passing helps Still to be studied at protocol level Different guarantee requirements In W3C RSP: usually ‘out-of-scope’ 
  18. 18. ④ Generate predictable outcomes Correctness in RSP 18 “RSP engines produce different but correct results”[1] RSP queries: operational semantics Common query execution model Comparable / predictable results Dell’Aglio Calbimonte, Balduini, Corcho, Della Valle. On Correctness on RDF Stream Processing Benchmarking. ISWC 2013
  19. 19. ⑤ Integrate stored and streaming data Integration in RSP languages SELECT ?person1 ?person2 FROM NAMED <http://deri.org/floorplan/> WHERE { GRAPH <http://deri.org/floorplan/> {?loc1 lv:connected ?loc2} STREAM <http://deri.org/streams/rfid> [NOW] {?person1 lv:detectedAt ?loc1} SELECT ?road ?speed WHERE { { ?road :slowTrafficDue ?observ } { ?observ rdfs:subClassOf :SlowTraffic { ?observ :speed ?speed } FILTER (getDURATION() < ”P1H”^^xsd:duration) Loading background knowledge 19 } stored RDF graph RDF stream Clean query model for stream+stored data Web-ready inter-dataset queries Shared vocabularies/ontologies RDF stream + stored “EP-SPARQL uses background ontologies to enable stream reasoning”[1] More than just joins with stored data Stream reasoning / inferences context.loadDataset("{URI OF GRAPH}", "{DIRECTORY TO LOAD}"); Anicic et al. EP-SPARQL: a unified language for event processing and stream reasoning. WWW2011.
  20. 20. ⑥ Guarantee data safety and availability Restart, Suspend, Stop, Escalate, etc 20 Parent Actor 1 Automatic supervision Isolate failures Manage local failures Supervision strategies: All-for-One One-for-one Death watch handling Supervision hierarchy Supervision Actor 2 Actor 3 4X Actor
  21. 21. Data availability High load: input streams 21 RSP queries/ rules under stress “eviction: delete from cache of operators”[1] binding1 binding2 .. bindingX binding3 binding5 .. bindingY X Join operator X Evict cache items based on: recency LRU Likeliness of matching Gao et al. The CLOCK Data-Aware Eviction Approach: Towards Processing Linked Data Streams with Limited Resources. ESWC 2014
  22. 22. ⑦ Partition and scale automatically 22 Dynamite: Parallel Materialization Maintaining a dynamic ontology Parallel threads
  23. 23. Actors everywhere Actor 1 23 m No difference in Actor 2 one core many cores many servers Actor 3 Actor 4 Transparent Remoting Locality optimization Define Routing policies Define actor clusters m m Existing ‘map reduce’ for streams: Storm, S4, Spark, Akka Streams Create workflows of Stream Processors?
  24. 24. ⑧ Process and respond instantaneously Blocking queries Sync communication Bottlenecks No end to end reactivity  24 SPARQLStream Virtual RDF Stream DSMS CEP Sensor middleware … rewritten queries users, applications query processing Morph-streams data layer “Query virtual RDF streams”[1] Push: Using Websockets!  Two way communication Responsive answers
  25. 25. RSP Stream Reasoning Example with TrOWL (trowl.eu) Ontologies evolve over time! ONTO + axiom1 + axiom2 ONTO’ - axiom3 ONTO’’ Adding and removing axioms over time “reasoning with changing knowledge”[3] add 10 axioms get instances of ‘System’ class Ren, Pan, Zhao. Towards Scalable Reasoning on Ontology Streams via Syntactic Approximation. IWOD2010. val onto=mgr.createOntology val reasoner = new RELReasonerFactory().createReasoner(onto) (1 to 10).foreach{i=> reasoner += (Indiv(rspEx+"sys"+i) ofClass system) reasoner.reclassify reclassify } println (reasoner.getInstances(SystemClass, true)) 25 Powerful Dynamic Ontology Maintenance Streaming query results? Notify new inferences?
  26. 26. A lot to do… 26
  27. 27. Reactive RSPs Keep data moving Query with stream SQL Handle imperfections Predictable outcomes Integrate stored data Data safety & availability Partition & scale Respond Instantaneously 8 Requirements We go beyond only these 27 Data Heterogeneity Data Modeling Stream Reasoning Data discovery Stream data linking Query optimization … more Reactive Principles Needed if we want to build relevant systems
  28. 28. Reactive RSP workflows 28 Morph Streams CSPARQL s Etalis TrOWL s Dyna mite s Event-driven message passing Async communication Immutable streams Transparent Remoting Parallel and distributed Supervised Failure Handling Responsive processing s CQELS Minimal agreements: standards, serialization, interfaces Formal models for RSPs and reasoning Working prototypes/systems! Reactive RSPs
  29. 29. We like to do things 29
  30. 30. Muchas gracias! @jpcik Jean-Paul Calbimonte LSIR EPFL