Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note
1. Stream Reasoning
Where We Got So Far
Oxford - 2011.1.18
http://streamreasoning.org
Emanuele Della Valle
DEI - Politecnico di Milano
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
Joint work with:
Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, and Michael Grossniklaus
• For more information visit http://wiki.larkc.eu/UrbanComputing
2. Agenda
• Motivation
• Running Example
• Background
• Concept
• Achievements
• Retrospective and Conclusions
Oxford, 2011-1-18 Emanuele Della Valle - visit http://streamreasoning.org 2
3. Motivation
It's a Streaming World! [IEEE-IS2009]
• Sensor networks, …
• traffic engineering, …
• social networking, …
• financial markets, …
• generate streams!
4. Running Example
Real-Time Streams on the Web
• Streams are appearing more and more often on the
Web in sites that distribute and present information in
real-time streams.
• Checkout http://activitystrea.ms/ for a standard API
• E.g.
5. Running Example
Examples of Questions Users are Asking
• Which topics have my close friends discussed in the
last hour?
• Which book is my friend likely to read next?
• What impact have I been creating with my tweets in
the last day?
• …
• <query> … <time dimension> ?
6. Motivation
Problem Statement
• Making sense
– in real time
– of gigantic and inevitably noisy data streams
– in order to support the decision process of
extremely large numbers of concurrent users
7. Background
What are data streams anyway?
• Formally:
– Data streams are unbounded sequences of time-varying
data elements
• Less formally:
– an (almost) continuous flow of information
– with the recent information being more relevant as it
describes the current state of a dynamic system
8. Background
Continuous Semantics
• Processing data streams in the space of
one-time semantics is difficult
because of the very nature of the underlying data
• Innovative* assumption: continuous semantics!
– streams can be consumed on the fly rather than being
stored forever and
– queries are registered and continuously produce
answers
* This innovation arose in DB community in 90s
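A minimal Python sketch of the continuous-semantics idea, not taken from any of the systems discussed in this talk: a query is registered once and then yields a fresh answer for every arriving element, keeping only a small running state instead of storing the stream. The names `register`, `count_traffic`, and the `(timestamp, payload)` element shape are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Element = Tuple[int, str]  # (timestamp in seconds, payload); hypothetical shape
State = TypeVar("State")


def register(update: Callable[[State, Element], State],
             initial: State,
             stream: Iterable[Element]) -> Iterator[State]:
    """Consume the stream on the fly and yield an answer after each element.

    Only the running state is kept; the elements themselves are not stored.
    """
    state = initial
    for element in stream:
        state = update(state, element)
        yield state


def count_traffic(count: int, element: Element) -> int:
    """Toy continuous query: number of elements mentioning 'traffic' so far."""
    _, payload = element
    return count + (1 if "traffic" in payload else 0)


stream = [(1, "traffic jam"), (2, "sunny"), (3, "traffic cleared")]
answers = list(register(count_traffic, 0, stream))
```

Because `register` is a generator, it also works over an unbounded source: answers are produced as long as elements keep arriving.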
9. Background
Stream Processing
• Continuous queries registered over streams that
are observed through windows
[Figure: an input stream is observed through a window by a registered continuous query, which produces a stream of answers]
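As a rough illustration of observing a stream through a window, the following Python sketch evaluates a time-based sliding window with a range and a step, in the spirit of the `[RANGE ... STEP ...]` clause used later in the talk. It is a simplification under stated assumptions: timestamps are integer seconds, the stream is ordered by timestamp, and the function name `sliding_windows` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

Element = Tuple[int, str]  # (timestamp in seconds, payload); hypothetical shape


def sliding_windows(stream: Iterable[Element], range_s: int, step_s: int
                    ) -> Iterator[Tuple[int, List[str]]]:
    """Every step_s seconds, yield (t, payloads with timestamp in (t - range_s, t]).

    Assumes the stream is ordered by non-decreasing timestamp.
    """
    window: Deque[Element] = deque()
    next_eval = step_s

    def emit(t: int) -> Tuple[int, List[str]]:
        # Evict elements that have fallen out of the window ending at t.
        while window and window[0][0] <= t - range_s:
            window.popleft()
        return t, [payload for _, payload in window]

    for ts, payload in stream:
        # Emit every evaluation point that lies before this element.
        while ts > next_eval:
            yield emit(next_eval)
            next_eval += step_s
        window.append((ts, payload))
    # Emit the evaluation point that covers the last elements seen.
    yield emit(next_eval)


example = [(1, "a"), (4, "b"), (6, "c"), (11, "d")]
windows = list(sliding_windows(example, range_s=10, step_s=5))
```

With a range of 10 seconds and a step of 5 seconds, consecutive windows overlap, so an element can contribute to more than one answer.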
10. Background
Data Stream Management Systems (DSMS)
• Research Prototypes
– Amazon/Cougar (Cornell) – sensors
– Aurora (Brown/MIT) – sensor monitoring, dataflow
– Gigascope: AT&T Labs – Network Monitoring
– Hancock (AT&T) – Telecom streams
– Niagara (OGI/Wisconsin) – Internet DBs & XML
– OpenCQ (Georgia) – triggers, view maintenance
– Stream (Stanford) – general-purpose DSMS
– Stream Mill (UCLA) - power & extensibility
– Tapestry (Xerox) – publish/subscribe filtering
– Telegraph (Berkeley) – adaptive engine for sensors
– Tribeca (Bellcore) – network monitoring
• High-tech startups
– Streambase, Coral8, Apama, Truviso
• Major DBMS vendors are all adding stream extensions as well
– Oracle http://www.oracle.com/technology/products/dataint/htdocs/streams_fo.html
– DB2 http://www.eweek.com/c/a/Database/IBM-DB2-Turns-25-and-Prepares-for-New-Life/
11. Background
Can the Semantic Web process data streams?
• The Semantic Web, the Web of Data is doing fine
– RDF, RDF Schema, SPARQL, OWL, RIF
– well understood theory,
– rapid increase in scalability
• BUT it pretends that the world is static
or at best has a low change rate
both in change-volume and change-frequency
– ontology versioning
– belief revision
– time stamps on named graphs
• It sticks to the traditional one-time semantics
12. Concept
Stream Reasoning [IEEE-IS2010]
• Idea origination
– Can continuous semantics be ported to reasoning?
– This is an unexplored yet high impact research area!
• Stream Reasoning
– Logical reasoning in real time on gigantic and
inevitably noisy data streams in order to support
the decision process of extremely large numbers
of concurrent users.
-- S. Ceri, E. Della Valle, F. van Harmelen and H. Stuckenschmidt, 2010
• Note: making sense of streams necessarily requires
processing them against rich background knowledge
13. Concept
Research Challenges
• Relation with data-stream systems
– Just as RDF relates to data-base systems?
• Query languages for semantic streams
– Just as SPARQL for RDF but with continuous semantics?
• Reasoning on Streams
– Formal representations for stream reasoning
– Notions of soundness and completeness
– Efficiency
– Scalability
• Dealing with incomplete & noisy data
– Even more so than on the current Web of Data
• Distributed and parallel processing
– Streams are parallel in nature
14. Achievements
Explored Continuous Semantics for SeWeb
• We investigated
– Architecture of a Stream Reasoner
– RDF streams
• the natural extension of the RDF data model to the new
continuous scenario and
– Continuous SPARQL (or simply C-SPARQL)
• the extension of SPARQL for querying RDF streams.
– Efficient incremental updates of deductive
closures
• specifically considering the nature of data streams
– Effective inductive stream reasoning (joint work
with Siemens - Munich)
• See paper in IEEE IS special issue on Social Media
Analytics
15. Achievements
Architecture [IEEE-IS2010]
[Figure: stream reasoner architecture. Components: Selector (Window, DSMS), Abstracter, Deductive Reasoner, Long-Term Matrix and Hype Matrix (each with an Abstracter and an Inductive Reasoner), Social Media Analytics. Legend: data stream, RDF stream, RDF graph, C = C-SPARQL query, P = SPARQL with Probability]
• Based on the LarKC conceptual framework
http://www.larkc.eu
16. Achievements
RDF Stream [WWW2009,EDBT2010,IJSC2010]
• RDF Stream Data Type
– Ordered sequence of pairs, where each pair is made
of an RDF triple and its timestamp t
(< triple >, t)
• E.g.,
(<:Giulia :likes :Twilight >, 2010-02-12T13:34:41)
(<:John :likes :TheLordOfTheRings >, 2010-02-12T13:36:28)
(<:Alice :dislikes :Twilight >, 2010-02-12T13:36:28)
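The data type above can be sketched in Python as follows. This is only an illustration: the class name `TimestampedTriple` and the helper `is_ordered` are hypothetical, and since two of the example triples share a timestamp, the sketch assumes that "ordered" means non-decreasing timestamps.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class TimestampedTriple:
    """One element of an RDF stream: an RDF triple and its timestamp."""
    triple: Tuple[str, str, str]  # (subject, predicate, object)
    timestamp: datetime


def is_ordered(stream: List[TimestampedTriple]) -> bool:
    """True if timestamps never decrease (assumed reading of 'ordered')."""
    return all(a.timestamp <= b.timestamp for a, b in zip(stream, stream[1:]))


# The three example elements from the slide.
rdf_stream = [
    TimestampedTriple((":Giulia", ":likes", ":Twilight"),
                      datetime.fromisoformat("2010-02-12T13:34:41")),
    TimestampedTriple((":John", ":likes", ":TheLordOfTheRings"),
                      datetime.fromisoformat("2010-02-12T13:36:28")),
    TimestampedTriple((":Alice", ":dislikes", ":Twilight"),
                      datetime.fromisoformat("2010-02-12T13:36:28")),
]
```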
17. Achievements
C-SPARQL [WWW2009,EDBT2010,IJSC2010]
• We specified the C-SPARQL syntax
– Incrementally, from existing specifications
• Including windows, grouping, aggregates, timestamping
• We gave the formal semantics of C-SPARQL
– Query registration, handling overloads
– Order of evaluation, pattern matching over time, …
• We investigated efficiency of evaluation
– Defining a suitable algebra
– Applying optimizations
– Efficient materialization of inferred data from streams
18. Achievements
An Example of C-SPARQL Query
Who are the opinion makers? i.e., the users who are likely to influence
the behavior of other users who follow them
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://streamingsocialdata.org/interactions>
[RANGE 30m STEP 5m]
WHERE {
?opinionMaker ?opinion ?resource .
?follower sioc:follows ?opinionMaker.
?follower ?opinion ?resource.
FILTER ( cs:timestamp(?follower) >
cs:timestamp(?opinionMaker)
&& ?opinion != sd:accesses )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )
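To make the logic of the query concrete, here is a hedged Python sketch of what one evaluation over the content of a single window might compute. It is not the C-SPARQL engine. Assumptions: facts are `(subject, predicate, object, timestamp)` tuples, results are grouped per `(opinionMaker, resource)` pair, `sioc:follows` triples are not themselves treated as opinions, and the predicate `sd:likes` in the sample data is made up.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

Fact = Tuple[str, str, str, int]  # (subject, predicate, object, timestamp)


def opinion_makers(window: Iterable[Fact], min_followers: int = 3
                   ) -> Set[Tuple[str, str]]:
    """Return (opinionMaker, resource) pairs echoed by more than
    min_followers distinct followers at a later time."""
    facts = list(window)
    follows = {(s, o) for s, p, o, _ in facts if p == "sioc:follows"}
    # FILTER: ignore plain accesses (and, in this sketch, the follows triples).
    opinions = [f for f in facts if f[1] not in ("sioc:follows", "sd:accesses")]

    followers: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for maker, opinion, resource, t_maker in opinions:
        for follower, opinion2, resource2, t_follower in opinions:
            if (opinion2 == opinion and resource2 == resource
                    and (follower, maker) in follows
                    and t_follower > t_maker):  # follower reacted later
                followers[(maker, resource)].add(follower)

    # HAVING ( COUNT(DISTINCT ?follower) > 3 )
    return {key for key, who in followers.items() if len(who) > min_followers}


sample_window = [
    ("bob", "sioc:follows", "alice", 0), ("carl", "sioc:follows", "alice", 0),
    ("dave", "sioc:follows", "alice", 0), ("eve", "sioc:follows", "alice", 0),
    ("bob", "sioc:follows", "zed", 0),
    ("alice", "sd:likes", "book", 1), ("zed", "sd:likes", "film", 1),
    ("bob", "sd:likes", "book", 2), ("carl", "sd:likes", "book", 3),
    ("dave", "sd:likes", "book", 4), ("eve", "sd:likes", "book", 5),
    ("bob", "sd:likes", "film", 2),
]
makers = opinion_makers(sample_window)
```

In the continuous setting this function would be re-run every 5 minutes over the last 30 minutes of the stream.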
19. Achievements
An Example of C-SPARQL Query
Who are the opinion makers? i.e., the users who are likely to influence
the behavior of other users who follow them
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS   # Query registration (for continuous execution); RDF stream added as new output format
CONSTRUCT { ?opinionMaker sd:about ?resource }
FROM STREAM <http://streamingsocialdata.org/interactions>   # FROM STREAM clause
[RANGE 30m STEP 5m]   # WINDOW
WHERE {
?opinionMaker ?opinion ?resource .
?follower sioc:follows ?opinionMaker.
?follower ?opinion ?resource.
FILTER ( cs:timestamp(?follower) >   # Built-in to access timestamps
cs:timestamp(?opinionMaker)
&& ?opinion != sd:accesses )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )   # Aggregates as in SPARQL 1.1
20. Achievements
Efficiency of Evaluation 1/3 [IEEE-IS2010]
• Evaluation of Window-based Selection
21. Achievements
Efficiency of Evaluation 2/3 [EDBT2010]
• Several transformations can be applied to algebraic
representation of C-SPARQL
• some recalling well known results from classical
relational optimization
– push of FILTERs and projections
• some being more specific to the domain of streams.
– push of aggregates.
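The effect of pushing a FILTER below a join can be sketched in Python with two hand-written plans. This is an illustrative toy, not the C-SPARQL algebra: the tuple shapes, the function names, and the sample data are all assumptions. Both plans return the same result, but the pushed plan builds a smaller intermediate join.

```python
from typing import List, Tuple

Opinion = Tuple[str, str, str]  # (user, opinion, resource), from the stream
Follows = Tuple[str, str]       # (follower, followed), static data
Joined = Tuple[Opinion, Follows]


def join(opinions: List[Opinion], follows: List[Follows]) -> List[Joined]:
    """Nested-loop join on opinion.user == follows.followed."""
    return [(o, f) for o in opinions for f in follows if o[0] == f[1]]


def filter_late(opinions: List[Opinion], follows: List[Follows]
                ) -> Tuple[List[Joined], int]:
    """Plan A: join first, then discard 'accesses' opinions."""
    joined = join(opinions, follows)
    result = [(o, f) for o, f in joined if o[1] != "accesses"]
    return result, len(joined)  # result and intermediate join size


def filter_pushed(opinions: List[Opinion], follows: List[Follows]
                  ) -> Tuple[List[Joined], int]:
    """Plan B: push the filter below the join."""
    kept = [o for o in opinions if o[1] != "accesses"]
    joined = join(kept, follows)
    return joined, len(joined)


opinions = [("alice", "likes", "book"), ("alice", "accesses", "page"),
            ("bob", "accesses", "page")]
follows = [("carl", "alice"), ("dave", "alice"), ("eve", "bob")]
late_result, late_size = filter_late(opinions, follows)
pushed_result, pushed_size = filter_pushed(opinions, follows)
```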
22. Achievements
Efficiency of Evaluation 3/3 [EDBT2010]
• Push of filters and projections
[Figure: time in ms (0 to 125) versus window size (10 to 100000) for four optimization settings: None, Static Only, Streaming Only, Both]
23. Achievements
Example of C-SPARQL and Reasoning 1/2
What impact have I been creating with my tweets in the last hour?
Is it positive or negative? Let’s count them …
REGISTER QUERY CountPositiveAndNegativeReactions AS
PREFIX : <http://ex.org/twitterImpactMining#>
SELECT ?t count(?pos) count(?neg)
FROM STREAM <http://ex.org/discussions.trdf>
[RANGE 30m STEP 30s]
WHERE {
?t a :MonitoredTweet .
{ ?pos :discuss ?t ;
:ProduceReaction [ a :PositiveReaction ] .
} UNION {
?neg :discuss ?t ;
:ProduceReaction [ a :NegativeReaction ] .
}
} GROUP BY ?t
Background knowledge:
:discuss a owl:TransitiveProperty .
:reply rdfs:subPropertyOf :discuss .
:retweet rdfs:subPropertyOf :discuss .
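The reasoning this query relies on can be sketched in Python: `:reply` and `:retweet` edges entail `:discuss` edges (sub-property axioms), and `:discuss` is closed transitively before counting. This is a naive fixpoint for illustration only, not the reasoner used in the work; the function names, the `reaction` lookup table, and the sample tweets are assumptions.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (source tweet, property, target tweet)

# Properties that entail :discuss (itself plus its declared sub-properties).
DISCUSS_LIKE = {":discuss", ":reply", ":retweet"}


def discuss_closure(edges: Set[Edge]) -> Set[Tuple[str, str]]:
    """Apply the sub-property axioms, then close :discuss transitively."""
    discuss = {(s, o) for s, p, o in edges if p in DISCUSS_LIKE}
    while True:
        new = {(a, d) for a, b in discuss for c, d in discuss if b == c} - discuss
        if not new:
            return discuss
        discuss |= new


def count_reactions(monitored: str, edges: Set[Edge],
                    reaction: Dict[str, str]) -> Tuple[int, int]:
    """Count tweets discussing `monitored`, split by reaction polarity."""
    sources = {s for s, o in discuss_closure(edges) if o == monitored}
    pos = sum(1 for s in sources if reaction.get(s) == "positive")
    neg = sum(1 for s in sources if reaction.get(s) == "negative")
    return pos, neg


# Hypothetical window content: a chain of replies and retweets on t1.
edges = {("t1-1", ":reply", "t1"),
         ("t1-2", ":retweet", "t1-1"),
         ("t1-3", ":reply", "t1-2")}
reaction = {"t1-1": "positive", "t1-2": "negative", "t1-3": "positive"}
counts = count_reactions("t1", edges, reaction)
```

Without the entailments only `t1-1` would be counted, because it is the only tweet directly connected to `t1`.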
24. Achievements
Example of C-SPARQL and Reasoning 2/2
[Figure: monitored tweet t1 and the tweets t1-1, t1-2, t1-3 connected by reply and retweet edges, together with the entailed discuss edges. Legend: Monitored, Positive, Negative]
25. Achievements
State-of-the-Art Approach [Ceri1994,Volz2005]
1. Overestimation of deletion: Overestimates deletions
by computing all direct consequences of a deletion.
2. Rederivation: Prunes those estimated deletions for
which alternative derivations (via some other facts
in the program) exist.
3. Insertion: Adds the new derivations that are
consequences of insertions to extensional
predicates.
26. Achievements
Our Approach [ESWC2010] 1/2
• Assumption
– Insertions and deletions are triples respectively
entering and exiting the window
– The window size is known
• Therefore
– The time when each triple will expire is known and
determined by the window size
• E.g. if the window is 10s long a triple entering at time t will
exit at time t+10s
– Note: all knowledge can be annotated with an
expiration time
• i.e., background knowledge is annotated with +∞
27. Achievements
Our Approach [ESWC2010] 2/2
• The algorithm
1. deletes all triples (asserted or inferred) that have just
expired
2. computes the entailments derived by the inserts,
3. annotates each entailed triple with an expiration time,
and
4. eliminates from the current state all copies of derived
triples except the one with the highest timestamp.
• learn more
– http://www.slideshare.net/emanueledellavalle/incremental-reasoning-on-streams-andrich-background-knowledge
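The four steps of the algorithm can be sketched in Python for a single rule, the transitivity of one property. This is an illustrative simplification, not the ESWC 2010 implementation. Assumptions: triples are reduced to `(subject, object)` edges of the transitive property, a derived edge expires when the earliest of its premises expires, background knowledge carries an expiration time of `math.inf`, and the function name `slide` is hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (subject, object) of one transitive property


def slide(state: Dict[Edge, float], inserts: Iterable[Edge],
          now: float, window: float) -> Dict[Edge, float]:
    """Advance the materialization to time `now`; values are expiration times."""
    # 1. Delete all edges (asserted or inferred) that have just expired.
    state = {edge: exp for edge, exp in state.items() if exp > now}

    # An edge entering at time t exits the window at time t + window.
    agenda: List[Tuple[Edge, float]] = [(edge, now + window) for edge in inserts]
    while agenda:
        edge, exp = agenda.pop()
        # 4. Keep only the copy with the highest expiration time.
        if state.get(edge, -math.inf) >= exp:
            continue
        state[edge] = exp
        a, b = edge
        # 2. and 3. Compute entailments of the insert and annotate each one
        # with the earliest expiration time among its premises.
        for (c, d), other in list(state.items()):
            if c == b:
                agenda.append(((a, d), min(exp, other)))
            if d == a:
                agenda.append(((c, b), min(exp, other)))
    return state


background = {("b", "c"): math.inf}  # background knowledge never expires
s0 = slide(background, [("a", "b")], now=0, window=10)
s1 = slide(s0, [("a", "b")], now=5, window=10)  # the same edge arrives again
s2 = slide(s1, [], now=15, window=10)           # everything streamed has expired
```

Note that step 1 needs no rederivation phase: an inferred edge is dropped exactly when its own expiration time is reached.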
28. Achievements
Comparative Evaluation 1/2 [ESWC2010]
• Hypothesis
– Background knowledge does not change and it is fully materialized
– Changes only take place in the window
• An experiment comparing the time required to compute a new
materialization using
– Re-computing from scratch (i.e.,1250 ms in our setting)
– State of the art incremental approach [Volz, 2005]
– Our approach
• Results at increasing % of the materialization changed when
the window slides
[Figure: time in ms (logarithmic scale, 10 to 10000) versus % of the materialization changed when the window slides (0% to 20%), for incremental-volz and incremental-stream]
29. Achievements
Comparative Evaluation 2/2
• Comparison of the average time needed to answer a
C-SPARQL query using
– a forward reasoner,
– the naive approach of re-computing the materialization
– our approach
Average time in ms:
                   forward reasoning   naive approach   incremental-stream
query              5.82                1.61             1.61
materialization    0                   15.91            0.28
30. Retrospective and Conclusions
Wrap Up
• RDF Streams
– Notion defined
• C-SPARQL
– Syntax and semantics defined as a SPARQL extension
– Engine designed
– Engine implemented based on the decision to keep stream
management and query evaluation separated
• Experiments with C-SPARQL under simple RDF entailment
regimes
– window based selection of C-SPARQL outperforms the standard
FILTER based selection
– having formally defined C-SPARQL semantics, algebraic
optimizations are possible
• Experiment with C-SPARQL under OWL-RL entailment
regimes
– efficient incremental updates of deductive closures investigated
– our approach outperforms the state of the art when updates come as a
stream
31. Retrospective and Conclusions
Achievements vs. Research Challenges
• Relation with data-stream systems
– Notion of RDF stream :-|
• Query languages for semantic streams
– C-SPARQL :-D
• Reasoning on Streams
– Formal representations for stream reasoning
• :-P
– Notions of soundness and completeness
• :-P
– Efficient incremental updates of deductive closures
• ESWC 2010 paper :-) ... but much more work is needed!
– How to combine streams and background knowledge
• ESWC 2010 paper :-| ... but a lot needs to be studied ...
• Dealing with incomplete & noisy data
– :-P
• Distributed and parallel processing
– :-P
32. References
• Vision
[IEEE-IS2009] Emanuele Della Valle, Stefano Ceri, Frank van Harmelen, Dieter Fensel
It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent
Systems 24(6): 83-89 (2009)
• Continuous SPARQL (C-SPARQL)
[EDBT2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri and Michael
Grossniklaus. An Execution Environment for C-SPARQL Queries. EDBT 2010
[WWW2009] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle,
Michael Grossniklaus: C-SPARQL: SPARQL for continuous querying. WWW 2009:
1061-1062
[IJSC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle,
Michael Grossniklaus: C-SPARQL: a Continuous Query Language for RDF Data Streams.
Int. J. Semantic Computing 4(1): 3-25 (2010)
[IEEE-IS2010] Davide Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Yi Huang,
Volker Tresp, Achim Rettinger, Hendrik Wermser, "Deductive and Inductive Stream
Reasoning for Semantic Social Media Analytics," IEEE Intelligent Systems, 30 Aug. 2010.
• Stream Reasoning
[ESWC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle,
Michael Grossniklaus. Incremental Reasoning on Streams and Rich Background
Knowledge. In: 7th Extended Semantic Web Conference (ESWC 2010)
• Background work
[Ceri1994] Stefano Ceri, Jennifer Widom: Deriving Incremental Production Rules for Deductive
Data. Inf. Syst. 19(6): 467-490 (1994)
[Volz2005] Raphael Volz, Steffen Staab, Boris Motik: Incrementally Maintaining
Materializations of Ontologies Stored in Logic Databases. J. Data Semantics 2: 1-34 (2005)
33. Thank You! Questions?
Much More to Come!
Keep an eye on
http://www.streamreasoning.org
For more information visit http://www.larkc.eu/