5. Why Streams?
Internet of Things
Sensor Networks
Mobile Networks
Smart Devices
Participatory Sensing
Transportation
Financial Data
Social Media
Urban Planning
Health Monitoring
Marketing
“It’s a streaming world!” [1]
[1] Della Valle, et al.: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems
6. Why Streams?
Web standards
Data discovery
Data sharing
Web queries
GoWeb
Semantics
Vocabularies
Data Harvesting
Data linking
Matching
Integration
Ontologies
Expressivity
Inference
Rule processing
Knowledge bases
Reasoning
Query languages
Query answering
Efficient processing
Query Federation
Processing
7. Raw Streams to Semantics
Raw observations
Patterns
Semantics Upstairs | Standing | Downstairs
Example: Activity recognition
Time series → Symbols → Ontologies
9. Data items
• A data item can be:
1. A triple
2. A graph
<:alice :isWith :bob>
<:alice :posts :p>
<:p :who :bob>
<:p :where :redRoom>
:graph1
10. RDF stream model
• A commonly adopted RDF stream model
• An RDF triple is an event
• Application time: point-based
<:alice :isWith :bob>:[1]
<:alice :isWith :carl>:[3]
<:bob :isWith :diana>:[6]
...
[Figure: stream S with events e1, e2, e3, e4 on a time axis t, at timestamps 1, 3, 6, 9]
:alice :isWith :bob
:alice :isWith :carl
:bob :isWith :diana
:diana :isWith :carl
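The triple-as-event model above can be sketched in a few lines of Java. This is a minimal illustration, not an API from any RDF library: the `Triple` and `Event` records and the `isOrdered` helper are hypothetical names, and the requirement that timestamps never decrease along the stream is an assumption made for the example.

```java
import java.util.List;

public class RdfStreamModel {
    // Hypothetical types for illustration; not taken from any RDF library.
    record Triple(String subject, String predicate, String object) {}

    // An event pairs one triple with a point-based application timestamp.
    record Event(Triple triple, long timestamp) {}

    // The three explicitly timestamped events shown on the slide.
    static List<Event> exampleStream() {
        return List.of(
            new Event(new Triple(":alice", ":isWith", ":bob"), 1),
            new Event(new Triple(":alice", ":isWith", ":carl"), 3),
            new Event(new Triple(":bob", ":isWith", ":diana"), 6));
    }

    // Assumption: timestamps never decrease along the stream.
    static boolean isOrdered(List<Event> stream) {
        for (int i = 1; i < stream.size(); i++) {
            if (stream.get(i).timestamp() < stream.get(i - 1).timestamp()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Event e : exampleStream()) {
            System.out.println(e.triple() + " @ " + e.timestamp());
        }
    }
}
```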
12. RDF Stream…
Gi
Gi+1
Gi+2
…
Gi+n
…
unbounded sequence
Gi {(s1,p1,o1), (s2,p2,o2),…} [ti]
1+ triples
implicit/explicit timestamp/interval
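One way to picture an unbounded sequence of timestamped graphs is a lazy Java `Stream`. The sketch below is only an illustration under stated assumptions: `Triple`, `TimestampedGraph`, and the generated `:sensor`/`:hasValue` terms are hypothetical, each graph holds a single triple, and the timestamp is an explicit counter.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GraphStream {
    // Hypothetical types for illustration only.
    record Triple(String s, String p, String o) {}

    // A graph G_i: a set of one or more triples with an explicit timestamp t_i.
    record TimestampedGraph(Set<Triple> triples, long timestamp) {}

    // Lazily generates G_1, G_2, ...; elements are built only when a consumer asks for them.
    static Stream<TimestampedGraph> graphs() {
        return Stream.iterate(1L, i -> i + 1)
            .map(i -> new TimestampedGraph(
                Set.of(new Triple(":sensor" + i, ":hasValue", String.valueOf(i))), i));
    }

    public static void main(String[] args) {
        // The sequence is unbounded, so the consumer decides how much of it to take.
        List<TimestampedGraph> firstThree = graphs().limit(3).collect(Collectors.toList());
        firstThree.forEach(System.out::println);
    }
}
```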
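public class SensorsStreamer extends RdfStream implements Runnable {
    public void run() {
        ..
        while (true) {
            ...
            // Stamp each triple with the system time at which it is produced
            RdfQuadruple q = new RdfQuadruple(subject, predicate, object,
                    System.currentTimeMillis());
            // Hand the timestamped triple to the stream
            this.put(q);
        }
    }
}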
How do I code this?
something to run on a thread
timestamped triple
the stream is “observable”
Data structure, execution and callbacks are mixed
Observer pattern
Tightly coupled listener
13. Reactive Systems
React to:
Events: Event-Driven
Load: Scalable
Failure: Resilient
Users: Responsive
Jonas Bonér. Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems. 2013.
15. Actor Model
[Diagram: Actor 1 sends message m to Actor 2]
No shared mutable state
Avoid blocking operators
Lightweight objects
Loose coupling
communicate through messages
mailbox
state
behavior
non-blocking response
send: fire-forget
Implementations: e.g. Akka for Java/Scala
[Diagram: Producer Actor sends a stream of triples to Processor Actor, which sends a stream of results to Consumer Actor]
asynchronous messages
non-blocking response
[Diagram: several Producers connected to several Consumers]
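The mailbox, behavior, and fire-and-forget send described on this slide can be sketched with the Java standard library alone. This is not Akka and not how Akka is implemented: `MiniActor`, `tell`, and `demo` are hypothetical names, and the sketch uses one thread per actor for simplicity, whereas real actor runtimes schedule many lightweight actors over a few threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class MiniActor<M> {
    // Each actor owns a private mailbox; nothing else is shared with other actors.
    private final BlockingQueue<M> mailbox = new LinkedBlockingQueue<>();

    public MiniActor(Consumer<M> behavior) {
        Thread worker = new Thread(() -> {
            try {
                // Process messages one at a time, in arrival order.
                while (true) {
                    behavior.accept(mailbox.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for the demo
        worker.start();
    }

    // Fire-and-forget: enqueue the message and return immediately.
    public void tell(M message) {
        mailbox.offer(message);
    }

    // Producer -> processor -> consumer pipeline over a tiny stream of triples.
    static List<String> demo() {
        List<String> results = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(2);
        MiniActor<String> consumer = new MiniActor<>(r -> {
            results.add(r);
            done.countDown();
        });
        MiniActor<String> processor = new MiniActor<>(t -> consumer.tell("seen " + t));
        processor.tell("<:alice :isWith :bob>");
        processor.tell("<:alice :isWith :carl>");
        try {
            // Waiting is only needed so the demo can return the collected results.
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The only coupling between the two actors is the message type, which is the loose coupling the slide contrasts with the observer-style listener.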
16. RDF Stream
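Data structure: infinite triple iterator
object DemoStreams {
  ...
  // Lazily generates an unbounded sequence of triples
  def streamTriples = {
    Iterator.from(1) map { i =>
      ...
      // `object` is a reserved word in Scala, so the third term is named obj
      new Triple(subject, predicate, obj)
    }
  }
}
Execution: asynchronous iteration
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val f = Future(DemoStreams.streamTriples)
f.map { a => a.foreach { triple =>
  // do something
}}
Message passing: send triple to actor
f.map { a => a.foreach { triple =>
  someSink ! triple
}}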
Immutable RDF stream
avoid shared mutable state
avoid concurrent writes
unbounded sequence
Ideas using Akka actors
Futures
non-blocking composition
concurrent computations
work with not-yet-computed results
Actors
message-based
share-nothing async
distributable
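The same idea of composing over a not-yet-computed result can be shown in Java with `CompletableFuture`, as a rough counterpart to the Scala `Future` used on this slide. The sketch is illustrative only: `fetchTriples` and `countAbout` are hypothetical helpers and the triples are hard-coded strings.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FutureComposition {
    // Hypothetical asynchronous source of triples, computed on another thread.
    static CompletableFuture<List<String>> fetchTriples() {
        return CompletableFuture.supplyAsync(() ->
            List.of(":alice :isWith :bob", ":alice :isWith :carl", ":bob :isWith :diana"));
    }

    // Describes the next step on the future result without blocking the caller.
    static CompletableFuture<Long> countAbout(String subject) {
        return fetchTriples().thenApply(triples ->
            triples.stream().filter(t -> t.startsWith(subject + " ")).count());
    }

    public static void main(String[] args) {
        CompletableFuture<Long> count = countAbout(":alice");
        // join() blocks; it is used here only so the demo prints before the JVM exits.
        System.out.println(count.join());
    }
}
```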
17. Dynamic Push-Pull
[Diagram: Producer sends messages m to Consumer (data flow); Consumer signals demand to Producer (demand flow)]
Push when consumer is faster
Pull when producer is faster
Dynamically switch modes
Communication is dynamic
depending on demand vs supply
[Diagram: Producer pushes a sequence of messages m to Consumer]
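The demand flow can be sketched with the `java.util.concurrent.Flow` interfaces, where the consumer signals how many items it is ready for through `Subscription.request(n)`. This is a simplified, single-threaded sketch and not a compliant Reactive Streams implementation: `IteratorPublisher`, `CollectingSubscriber`, and `demo` are hypothetical names, and completion and error signalling are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Flow;
import java.util.stream.Stream;

public class DemandDemo {
    // Publisher that emits only as many items as the subscriber has requested.
    static class IteratorPublisher implements Flow.Publisher<Integer> {
        private final Iterator<Integer> source;

        IteratorPublisher(Iterator<Integer> source) {
            this.source = source;
        }

        @Override
        public void subscribe(Flow.Subscriber<? super Integer> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private boolean cancelled;

                @Override
                public void request(long n) {
                    // Data flows downstream only in response to demand.
                    for (long i = 0; i < n && !cancelled && source.hasNext(); i++) {
                        subscriber.onNext(source.next());
                    }
                }

                @Override
                public void cancel() {
                    cancelled = true;
                }
            });
        }
    }

    // Subscriber that stores what it receives and exposes its subscription.
    static class CollectingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; }
        @Override public void onNext(Integer item) { received.add(item); }
        @Override public void onError(Throwable t) { }
        @Override public void onComplete() { }
    }

    static List<Integer> demo() {
        // Unbounded producer: 1, 2, 3, ...
        Iterator<Integer> unbounded = Stream.iterate(1, i -> i + 1).iterator();
        CollectingSubscriber consumer = new CollectingSubscriber();
        new IteratorPublisher(unbounded).subscribe(consumer);
        consumer.subscription.request(2); // consumer is ready for two items
        consumer.subscription.request(3); // later, it is ready for three more
        return consumer.received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A consumer that keeps a large outstanding demand is effectively being pushed to, while one that requests a single item at a time is effectively pulling, which is how the two modes on the slide can coexist in one protocol.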
21. The RSP Community
Research work
Many Papers
PhD Theses
Datasets
Prototypes
Benchmarks
RDF Streams
Stream Reasoning
Complex Event Processing
Stream Query Processing
Stream Compression
Semantic Sensor Web
Many topics
Tons of work
http://www.w3.org/community/rsp
W3C RSP Community Group
Effort to discuss, standardize, combine, formalize, and evangelize our work on RDF stream processing
23. RDF Stream
… other issues:
Graph implementation?
Timestamps: application vs system?
Serialization?
Loose coupling
Immutable data streams
Asynchronous message passing
Well defined input/output
24. ⑥ Guarantee data safety and availability
Restart, Suspend, Stop, Escalate, etc.
Parent: Actor 1
Automatic supervision
Isolate failures
Manage local failures
Supervision strategies:
All-for-One
One-for-One
Death watch handling
Supervision hierarchy
[Diagram: Actor 1 supervises Actor 2, Actor 3, and Actor 4; one child has failed (X)]
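A restart directive under a one-for-one strategy can be illustrated with a small synchronous sketch. It is not Akka's supervision API: `Worker`, `CountingWorker`, `Supervisor`, and `deliver` are hypothetical names, there is a single child, and the failing message is simply dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SupervisorSketch {
    interface Worker {
        String handle(String message) throws Exception;
    }

    // Hypothetical child: counts handled messages and fails on the message "bad".
    static class CountingWorker implements Worker {
        private int handled = 0;

        @Override
        public String handle(String message) {
            if (message.equals("bad")) {
                throw new IllegalStateException("cannot handle " + message);
            }
            handled++;
            return message + "#" + handled;
        }
    }

    // Parent that isolates a child's failure and replaces the child with a fresh instance.
    static class Supervisor {
        private final Supplier<Worker> factory;
        private Worker child;
        int restarts = 0;

        Supervisor(Supplier<Worker> factory) {
            this.factory = factory;
            this.child = factory.get();
        }

        String deliver(String message) {
            try {
                return child.handle(message);
            } catch (Exception e) {
                // Restart: the failed child's state is discarded, the parent keeps running.
                child = factory.get();
                restarts++;
                return null;
            }
        }
    }

    static List<String> demo() {
        Supervisor supervisor = new Supervisor(CountingWorker::new);
        List<String> out = new ArrayList<>();
        for (String m : List.of("a", "b", "bad", "c")) {
            out.add(String.valueOf(supervisor.deliver(m)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

After the failure the counter starts again from 1, which shows that the restarted child begins with fresh state while the supervisor itself is unaffected.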
25. Actors everywhere
[Diagram: Actors 1 to 4 exchanging messages m]
No difference in:
one core
many cores
many servers
Transparent Remoting
Locality optimization
Define Routing policies
Define actor clusters
Existing ‘map reduce’ for streams: Storm, S4, Spark, Akka Streams
Create workflows of Stream Processors?