Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
RDF Stream Processing: Let's React
1. RDF Stream Processing:
Let’s React!
@jpcik
Jean-Paul Calbimonte
LSIR EPFL
OrdRing 2014. International Semantic Web Conference ISWC 2014
Riva del Garda, 20.10.2014
2. The RSP Community
Research work
Many Papers
PhD Thesis
Datasets
Prototypes
Benchmarks
RDF Streams
Stream Reasoning
Complex Event Processing
Stream Query Processing
Stream Compression
Semantic Sensor Web
Many topics
Tons of work
W3C RSP Community Group
http://www.w3.org/community/rsp
discuss
standardize
combine
formalize
evangelize
Effort to our work on RDF stream processing
2
4. Why Streams?
Internet of Things
Sensor Networks
Mobile Networks
Smart Devices
Participatory Sensing
Transportation
Financial Data
Social Media
Urban Planning
Health Monitoring
Marketing
“It’s a
streaming
world!”[1]
4 Della Valle, et al : It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems
5. Why Streams?
Web standards
Data discovery
Data sharing
Web queries
Go Web
Semantics
Vocabularies
Data Harvesting
Data linking
Matching
Integration
Ontologies
Expressivity
Inference
Rule processing
Knowledge bases
Reasoning
Query languages
Query answering
Efficient processing
Query Federation
Processing
Is this what
we require?
5
6. Looking back 10 years ago…
Stonebreaker et al. The 8 requirement of Real-TimeStream Processing. SIGMOD Record. 2005.
“8 requirements of
real-time stream
processing”[2]
Do we address them?
Do we have more requirements?
Do we need to do more?
Keep data moving
Query with stream SQL
Handle imperfections
Predictable outcomes
Integrate stored data
Data safety & availability
Partition & scale
Respond Instantaneously
8 Requirements
6
7. Reactive Systems
Keep data moving
Query with stream SQL
Handle imperfections
Predictable outcomes
Integrate stored data
Data safety & availability
Partition & scale
Respond Instantaneously
8 Requirements
Event-Driven
Jonas Boner. Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems. 2013.
Events:
react to
Load: Scalable
Failure: Resilient
Users: Responsive
Do we address them?
Do we have more requirements?
Do we need to do more?
7
9. ① Keep the data moving
Process data in-stream
Not required to store
Active processing model
input streams
RSP
queries/
rules output streams/events
RDF Streams
9
10. … RDF Stream
Gi
Gi+1
Gi+2
…
Gi+n
…
unbounded sequence
Gi {(s1,p1,o1),
(s2,p2,o2),…} [ti]
1+ triples
implicit/explicit
timestamp/interval
public class SensorsStreamer extends RdfStream implements Runnable {
public void run() {
..
while(true){
the stream is
“observable”
...
RdfQuadruple q=new RdfQuadruple(subject,predicate,object,
System.currentTimeMillis());
this.put(q);
}
}
}
How do I code this?
something
to run on a
thread
timestamped
triple
Data structure, execution
and callbacks are mixed
10
Observer pattern
Tightly coupled listener
11. Actor Model
Actor
1
11
send: fire-forget
m No shared mutable state
Actor
2
Avoid blocking operators
Lightweight objects
Loose coupling
communicate
through messages
mailbox
state
non-blocking response behavior
More goodies later…
Implementations: e.g. Akka for Java/Scala
12. RDF Stream
Data structure
object DemoStreams {
...
def streamTriples={
Ideas using akka actors
Iterator.from(1) map{i=>
...
new Triple(subject,predicate,object)
}
}
Infinite triple
iterator
Execution
val f=Future(DemoStreams.streamTriples)
f.map{a=>a.foreach{triple=>
//do something
}}
Asynchronous
iteration
Message passing
f.map{a=>a.foreach{triple=>
someSink ! triple
}}
send triple to
actor
Immutable RDF stream
avoid shared mutable state
avoid concurrent writes
unbounded sequence
Futures
non blocking composition
concurrent computations
work with not-yet-computed results
Actors
message-based
share-nothing async
distributable
12
13. RDF Stream
Loose coupling
Immutable data streams
Asynchronous message passing
Well defined input/output
13
… other issues:
Graph implementation?
Timestamps: application vs system?
Serialization?
14. ② Query using SQL on Streams
Model
Continuous
execution
Union, Join,
Optional, Filter
Aggregates
Time window
Triple window
R2S operator
Sequence, Co-ocurrence
Time function
TA-SPARQL TA-RDF ✗ ✔ Limited ✗ ✗ ✗ ✗ ✗
tSPARQL tRDF ✗ ✔ ✗ ✗ ✗ ✗ ✗ ✗
Streaming
RDF Stream ✔ ✔ ✗ ✔ ✔ ✗ ✗ ✗
SPARQL
C-SPARQL RDF Stream ✔ ✔ ✔ ✔ ✔ ✗ ✗ ✔
CQELS RDF Stream ✔ ✔ ✔ ✔ ✔ ✗ ✗ ✗
SPARQLStream (Virtual)
RDF Stream
✔ ✔ ✔ ✔ ✗ ✔ ✗ ✗
EP-SPARQL RDF Stream ✔ ✔ ✔ ✗ ✗ ✗ ✔ ✗
Instans RDF ✔ ✔ ✔ ✗ ✗ ✗ ✗ ✗
W3C RSP
review features in existing systems
agree on fundamental operators
discuss on possible semantics
https://www.w3.org/community/rsp/wiki/RSP_Query_Features
RSP is not always/only SPARQL-like
querying
SPARQL protocol is not enough
RSP RESTful interfaces?
14
Powerful languages for continuous
query processing
15. SELECT ?person ?loc WHERE {
STREAM <http://deri.org/streams/rfid> [RANGE 3s]
{?person :detectedAt ?loc}
}
ExecContext context=new ExecContext(HOME, false);
register
query
String queryString =" SELECT ?person ?loc …
ContinuousSelect selQuery=context.registerSelect(queryString);
selQuery.register(new ContinuousListener()
{
public void update(Mapping mapping){
adding listener
get result updates
String result="";
for(Iterator<Var> vars=mapping.vars();vars.hasNext();)
result+=" "+ context.engine().decode(mapping.get(vars.next()));
System.out.println(result);
}
});
RSP Querying
Example with CQELS (code.google.com/p/cqels)
CQELS continuous query: “Queries evaluated
continuously against
the changing dataset”[4]
Le Phuoc et al. An Native and adaptive approach for unifyied processing of Linked Streams and Linked Data. ISWC2011.
15
Tightly coupled listeners
Results delivery: push & pull?
16. Dynamic Push-Pull
Producer
16
Consumer
demand flow
m
data flow
Push when consumer is faster
Pull when producer is faster
Dynamically switch modes
Communication is dynamic
depending on demand vs supply
17. ③ Handle stream imperfections
Delayed, missing, out-of-order …
Setting timeouts in message passing
val future = myActor.ask("hello")(5 seconds)
context.setReceiveTimeout(100 milliseconds)
def receive = {
case "Hello" =>
//do something
case ReceiveTimeout =>
throw new RuntimeException("Timed out")
17
timeout after 5 sec
Timeout receiving messages
in case of timeout
“order
matters!”[1]
Async message passing helps
Still to be studied at protocol level
Different guarantee requirements
In W3C RSP: usually ‘out-of-scope’
18. ④ Generate predictable outcomes
Correctness in RSP
18
“RSP engines produce
different but correct results”[1]
RSP queries: operational semantics
Common query execution model
Comparable / predictable results
Dell’Aglio Calbimonte, Balduini, Corcho, Della Valle. On Correctness on RDF Stream Processing Benchmarking. ISWC 2013
19. ⑤ Integrate stored and streaming data
Integration in RSP languages
SELECT ?person1 ?person2
FROM NAMED <http://deri.org/floorplan/>
WHERE {
GRAPH <http://deri.org/floorplan/>
{?loc1 lv:connected ?loc2}
STREAM <http://deri.org/streams/rfid> [NOW]
{?person1 lv:detectedAt ?loc1}
SELECT ?road ?speed WHERE {
{ ?road :slowTrafficDue ?observ }
{ ?observ rdfs:subClassOf :SlowTraffic
{ ?observ :speed ?speed }
FILTER (getDURATION() < ”P1H”^^xsd:duration)
Loading background knowledge
19
}
stored RDF graph
RDF stream
Clean query model for stream+stored data
Web-ready inter-dataset queries
Shared vocabularies/ontologies
RDF stream +
stored
“EP-SPARQL uses
background ontologies to enable
stream reasoning”[1]
More than just joins with stored data
Stream reasoning / inferences
context.loadDataset("{URI OF GRAPH}", "{DIRECTORY TO LOAD}");
Anicic et al. EP-SPARQL: a unified language for event processing and stream reasoning. WWW2011.
20. ⑥ Guarantee data safety and availability
Restart,
Suspend,
Stop,
Escalate, etc
20
Parent
Actor
1
Automatic supervision
Isolate failures
Manage local failures
Supervision strategies:
All-for-One
One-for-one
Death watch handling
Supervision
hierarchy
Supervision
Actor
2
Actor
3
4X
Actor
21. Data availability
High load:
input streams
21
RSP
queries/
rules
under stress
“eviction: delete from
cache of operators”[1]
binding1
binding2
..
bindingX
binding3
binding5
..
bindingY
X
Join
operator
X
Evict cache items based on:
recency
LRU
Likeliness of matching
Gao et al. The CLOCK Data-Aware Eviction Approach: Towards Processing Linked Data Streams with Limited Resources. ESWC 2014
22. ⑦ Partition and scale automatically
22
Dynamite: Parallel Materialization
Maintaining a dynamic ontology
Parallel threads
23. Actors everywhere
Actor
1
23
m No difference in
Actor
2
one core
many cores
many servers
Actor
3
Actor
4
Transparent Remoting
Locality optimization
Define Routing policies
Define actor clusters
m
m
Existing ‘map reduce’ for streams: Storm, S4, Spark, Akka Streams
Create workflows of Stream Processors?
24. ⑧ Process and respond instantaneously
Blocking queries
Sync communication
Bottlenecks
No end to end reactivity
24
SPARQLStream
Virtual RDF Stream
DSMS CEP Sensor
middleware
…
rewritten
queries
users, applications
query processing Morph-streams
data layer
“Query virtual RDF streams”[1]
Push:
Using Websockets!
Two way communication
Responsive answers
25. RSP Stream Reasoning
Example with TrOWL (trowl.eu)
Ontologies evolve over time!
ONTO
+ axiom1
+ axiom2
ONTO’
- axiom3
ONTO’’
Adding and removing axioms over time
“reasoning with
changing
knowledge”[3]
add 10
axioms
get instances of
‘System’ class
Ren, Pan, Zhao. Towards Scalable Reasoning on Ontology Streams via Syntactic Approximation. IWOD2010.
val onto=mgr.createOntology
val reasoner = new RELReasonerFactory().createReasoner(onto)
(1 to 10).foreach{i=>
reasoner += (Indiv(rspEx+"sys"+i) ofClass system)
reasoner.reclassify
reclassify
}
println (reasoner.getInstances(SystemClass, true))
25
Powerful Dynamic Ontology Maintenance
Streaming query results?
Notify new inferences?
27. Reactive RSPs
Keep data moving
Query with stream SQL
Handle imperfections
Predictable outcomes
Integrate stored data
Data safety & availability
Partition & scale
Respond Instantaneously
8 Requirements
We go beyond only these
27
Data Heterogeneity
Data Modeling
Stream Reasoning
Data discovery
Stream data linking
Query optimization
… more
Reactive Principles
Needed if we want to build relevant systems
28. Reactive RSP workflows
28
Morph
Streams
CSPARQL
s
Etalis
TrOWL
s
Dyna
mite
s
Event-driven message passing
Async communication
Immutable streams
Transparent Remoting
Parallel and distributed
Supervised Failure Handling
Responsive processing
s CQELS
Minimal agreements: standards, serialization, interfaces
Formal models for RSPs and reasoning
Working prototypes/systems!
Reactive RSPs