OrientDB - Time Series and Event Sequences - Codemotion Milan 2014
1. Time flows, my friend
Managing event sequences and time series with a
Document-Graph Database
Codemotion Milan 2014
Luigi Dell’Aquila
Orient Technologies LTD
Twitter: @ldellaquila
3. Time What…?
Time series:
A time series is a sequence of data points, typically
consisting of successive measurements made over a
time interval (Wikipedia)
4. Time What…?
Event sequences:
• A set of events with a timestamp
• A set of relationships “happened
before/after”
• Cause and effect relationships
5. Time What…?
Time as a dimension:
• Direct:
– Eg. begin and end of relationships (I’m a
friend of John since…)
• Calculated
– Eg. Speed (distance/time)
8. Fast and Effective
Fast write: Time doesn’t wait! Writes just arrive
Fast read: a lot of data to be read in a short time
Effective manipulation: complex operations like
- Aggregation
- Prediction
- Analysis
11. Current approaches
0. Relational approach: table
HH MM SS Value
14 35 0 1321
14 35 1 2444
14 35 2 2135
14 35 3 1833
12. Current approaches
0. Relational – Advantages
• Simple
• It can be used together with your application data
(operational)
13. Current approaches
0. Relational – Disadvantages
• Slow read (relies on an index)
• Slow insert (update the index…)
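As a rough illustration of the relational layout above, here is a minimal sketch using Python's built-in sqlite3 module. The table name "samples", the column names and the index name are assumptions made for this example; the rows are the ones shown in the table slide. It only shows the shape of the approach (one row per sample, one index that every insert must maintain), not a benchmark.

```python
import sqlite3

# In-memory database; the table mirrors the HH / MM / SS / Value layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (hh INTEGER, mm INTEGER, ss INTEGER, value INTEGER)"
)
# The index that reads rely on, and that every insert has to update.
conn.execute("CREATE INDEX idx_samples_time ON samples (hh, mm, ss)")

rows = [
    (14, 35, 0, 1321),
    (14, 35, 1, 2444),
    (14, 35, 2, 2135),
    (14, 35, 3, 1833),
]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", rows)

# Aggregate all samples of one minute.
minute_sum = conn.execute(
    "SELECT SUM(value) FROM samples WHERE hh = ? AND mm = ?", (14, 35)
).fetchone()[0]
print(minute_sum)
```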
14. Current approaches
1. Document Database
• Collections of Documents instead of tables
• Schemaless
• Complex data structures
15. Current approaches
1. Document approach: Minute Based
{
timestamp: "2014-11-21 12:05",
load: [10, 15, 3, … 30] //array of 60, one per second
}
16. Current approaches
1. Document approach: Hour Based
{
timestamp: "2014-11-21 12:00",
load: {
0: [10, 15, 3, … 30], //array of 60, one per second
1: [0, 12, 31, … 24],
…
59: [10, 10, 1, … 16]
}
}
17. Current approaches
1. Document approach – Advantages
• Fast write: one insert per 60 updates
• Fast fetch
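The minute-based document layout can be sketched as follows. This is only an in-memory illustration: the dictionary "collection" and the function "record" are hypothetical names standing in for a real document store. The point is that the first sample of a minute creates the document, and every later sample of that minute is an in-place update of one array slot.

```python
from datetime import datetime

# Hypothetical in-memory "collection": minute timestamp -> document.
collection = {}

def record(ts: datetime, value: int) -> str:
    """Store one per-second sample; returns "insert" or "update"."""
    key = ts.strftime("%Y-%m-%d %H:%M")
    doc = collection.get(key)
    if doc is None:
        # First sample of the minute: create the document with 60 empty slots.
        doc = {"timestamp": key, "load": [None] * 60}
        collection[key] = doc
        op = "insert"
    else:
        op = "update"
    # Each second has its own slot in the array.
    doc["load"][ts.second] = value
    return op

# One sample per second for a whole minute.
ops = [record(datetime(2014, 11, 21, 12, 5, s), 10 + s) for s in range(60)]
print(ops.count("insert"), ops.count("update"))
```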
18. Current approaches
1. Document approach – Disadvantages
• Fixed time windows
• Single point per unit
• How to pre-aggregate?
• Relationships with the rest of the world?
• Relationships between events?
19. Current approaches
2. Graph Database
• Nodes/Edges instead of tables
• Index free adjacency
• Fast traversal
• Dynamic structure
20. Current approaches
2. Graph approach: linked sequence
[Diagram: events e1 -> e2 -> e3 -> e4 -> e5 linked by "next" edges (timestamp on vertex)]
21. Current approaches
2. Graph approach: linked sequence (tag based)
[Diagram: events e1 … e5, each labeled with its tags ([Tag1, Tag2], [Tag1], [Tag2]); events sharing a tag are linked by per-tag edges nextTag1 and nextTag2]
22. Current approaches
2. Graph approach: Hierarchy
[Diagram: tree of time units (Days -> Hours -> Minutes -> Seconds) with events e1, e2, e3 attached to it]
23. Current approaches
2. Graph approach: mixed
[Diagram: time hierarchy (Days -> Hours -> Minutes -> Seconds) with events e1, e2, e3, combining the hierarchy and linked-sequence approaches]
24. Current approaches
2. Graph approach – Advantages
• Flexible
• Events can be connected together in different ways
• You can connect events to other entities
• Fast traversal of dynamic time windows
• Fast aggregation (based on hierarchy)
25. Current approaches
2. Graph approach – Disadvantages
• Slow writes (vertex + edge + maintenance)
• Not so fast reads
26. Can we mix different models and get
all the advantages?
27. Can we mix all this with the rest of
application logic?
30. OrientDB
First step: put them together
[Diagram: time hierarchy of vertices (Days -> Hours -> Minutes); each minute holds a document with one value per second]
{
0: 1000,
1: 1500,
…
59: 96
}
31. OrientDB
First step: put them together
[Diagram: the same hierarchy and document as on the previous slide, now labeled: the hierarchy is the Graph, the per-second values are the Document]
Document <- IT’S A VERTEX TOO!!!
32. OrientDB
First step: put them together
[Diagram: a shallower hierarchy (Days -> Hours) labeled Graph; each hour holds a Document keyed by minute, then by second]
{
0: {
0: 1000,
1: 1500,
…
59: 210
},
1: { … },
…
59: { … }
}
33. Where should I stop?
It depends on my domain and
requirements.
34. OrientDB
Result:
• Same insert speed as the Document approach
• But with the flexibility of a Graph
• (as a side effect of mixing models,
documents can also contain “pointers” to
other elements of app domain)
38. OrientDB
How to aggregate
Hooks: server-side triggers (Java or JavaScript),
executed when DB operations happen (e.g. insert
or update)
Java interface:
public RESULT onBeforeInsert(…);
public void onAfterInsert(…);
public RESULT onBeforeUpdate(…);
public void onAfterUpdate(…);
39. OrientDB
Aggregation logic
• Second 0 -> insert
• Second 1 -> update
• …
• Second 57 -> update
• Second 58 -> update
• Second 59 -> update + aggregate
– Write aggregate value on minute vertex
• Minute == 59? Calculate aggregate on hour vertex
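The aggregation logic above can be sketched in a few lines. This is a plain in-memory model, not OrientDB hook code: the class "Vertex" and the functions "on_after_write" and "write" are hypothetical names, and "on_after_write" only plays the role that an after-insert or after-update hook would play on the server.

```python
class Vertex:
    """Hypothetical vertex: children or raw values, plus a cached aggregate."""

    def __init__(self):
        self.children = {}  # hour vertex: minute index -> minute Vertex
        self.values = {}    # minute vertex: second index -> value
        self.sum = None     # stays None until the time window is complete

def on_after_write(hour, minute, second):
    """Stand-in for a hook that runs after every insert or update."""
    minute_vertex = hour.children[minute]
    if second == 59:
        # Last second of the minute: write the aggregate on the minute vertex.
        minute_vertex.sum = sum(minute_vertex.values.values())
        if minute == 59:
            # Last minute of the hour: aggregate the minute sums on the hour.
            hour.sum = sum(
                m.sum for m in hour.children.values() if m.sum is not None
            )

def write(hour, minute, second, value):
    minute_vertex = hour.children.setdefault(minute, Vertex())
    minute_vertex.values[second] = value
    on_after_write(hour, minute, second)

hour = Vertex()
for s in range(59):
    write(hour, 0, s, 1)
sum_before = hour.children[0].sum  # minute 0 is still incomplete
write(hour, 0, 59, 1)
sum_after = hour.children[0].sum   # second 59 triggered the aggregation

for m in range(1, 60):
    for s in range(60):
        write(hour, m, s, 1)
print(sum_before, sum_after, hour.sum)
```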
40. OrientDB
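[Diagram: the time hierarchy (Days -> Hours -> Minutes) with an aggregate on each vertex. Complete vertices carry a value (sum = 1000, sum = 15000, sum = 300); incomplete vertices have sum = null. The current minute's document holds the per-second values:]
{
0: 1,
1: 12,
…
59: 3
}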
41. OrientDB
Query logic:
• Traverse from root node to specified level
(filtering based on vertex data)
• Is there an aggregate value?
– Yes: return it
– No: go one level down and do the same
Aggregation on a level will be VERY fast if you
have horizontal edges!
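The query logic can be sketched as a short recursive function. The class "Node" and the function "total" are hypothetical names, and the numbers are made up for the example. A node with an aggregate answers immediately; a node without one delegates to the level below and adds any raw values it holds.

```python
class Node:
    """Hypothetical hierarchy node (day, hour or minute)."""

    def __init__(self, aggregate=None, children=(), values=()):
        self.aggregate = aggregate      # None means "window not complete yet"
        self.children = list(children)  # next level down in the hierarchy
        self.values = list(values)      # raw samples, lowest level only

def total(node, visited):
    """Sum below `node`, reusing stored aggregates where they exist."""
    visited.append(node)
    if node.aggregate is not None:
        return node.aggregate           # complete window: no need to descend
    return sum(total(c, visited) for c in node.children) + sum(node.values)

# A complete hour (its children are never visited) and an incomplete one.
old_minute = Node(aggregate=1000)
hour_done = Node(aggregate=15000, children=[old_minute])
minute_done = Node(aggregate=300)
minute_open = Node(values=[1, 12, 3])
hour_open = Node(children=[minute_done, minute_open])
day = Node(children=[hour_done, hour_open])

visited = []
result = total(day, visited)
print(result, len(visited))
```

Only the incomplete branch is walked down to the raw values; the complete hour is answered from its stored aggregate.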
42. OrientDB
How to calculate aggregate values with a query
Input params:
- Root node (suppose it is #11:11)
select sum(aggregateVal) from (
traverse out() from #11:11
while in().aggregateVal is null
)
With the same logic you can query based on time
windows
44. OrientDB
Another use case: Event Categories and OO
[Diagram: the tag-based linked sequence of events e1 … e5 with per-tag edges nextTag1 and nextTag2, extended with a third tag: an additional event tagged [Tag3] is linked from e1 ([Tag1, Tag2, Tag3]) by a nextTag3 edge]
45. OrientDB
Another use case: Event Categories and OO
Suppose tags are hierarchical categories
(Classes for vertices and/or edges)
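[Diagram: edge class hierarchy. nextTAG is the root class, with subclasses nextTagX and nextTag3; nextTagX has subclasses nextTag1 and nextTag2]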
46. OrientDB
Subset of events
TRAVERSE out('nextTag1') FROM <e1>
[Diagram: result e1 -> e2 -> e4 -> e5, following only nextTag1 edges]
47. OrientDB
Subset of events
TRAVERSE out('nextTag2') FROM <e1>
[Diagram: result e1 -> e3 -> e5, following only nextTag2 edges]
48. OrientDB
Subset of events (Polymorphic!!!)
TRAVERSE out('nextTagX') FROM <e1>
[Diagram: result e1, e2, e3, e4, e5, following both nextTag1 and nextTag2 edges, since both are subclasses of nextTagX]
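The polymorphic traversal can be imitated with ordinary class inheritance. In this sketch the Python classes stand in for edge classes, the class hierarchy is the one suggested by the slides (nextTag1 and nextTag2 under nextTagX, nextTagX and nextTag3 under nextTAG), and the edge list, including the event "e6", is made up for the example. The function "traverse" is a hypothetical helper, not the OrientDB implementation.

```python
from collections import deque

# Edge classes: Python inheritance stands in for the edge class hierarchy.
class NextTag: pass
class NextTagX(NextTag): pass
class NextTag1(NextTagX): pass
class NextTag2(NextTagX): pass
class NextTag3(NextTag): pass

# Hypothetical edges: (source event, edge class, target event).
EDGES = [
    ("e1", NextTag1, "e2"),
    ("e2", NextTag1, "e4"),
    ("e4", NextTag1, "e5"),
    ("e1", NextTag2, "e3"),
    ("e3", NextTag2, "e5"),
    ("e1", NextTag3, "e6"),
]

def traverse(start, edge_class):
    """Follow outgoing edges whose class is `edge_class` or a subclass."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for src, cls, dst in EDGES:
            if src == current and issubclass(cls, edge_class) and dst not in seen:
                seen.append(dst)
                queue.append(dst)
    return seen

tag1_events = traverse("e1", NextTag1)
tag2_events = traverse("e1", NextTag2)
tagx_events = traverse("e1", NextTagX)
print(tag1_events, tag2_events, sorted(tagx_events))
```

Asking for the parent class returns the union of what the two subclasses reach, without naming either subclass in the query.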
52. Chase
• Your target is running away
• You have informers that track his moves
(coordinates in a point of time) and give
you additional (unstructured) information
• You have a street map
• You want to:
– Catch him ASAP
– Predict his moves
– Be sure that he is inside an area
55. Chase
• Map is made of points and distances
• You also have speed limits for streets
[Diagram: map points (point1 … pointN) connected by street edges; each street has a distance and a maximum speed, e.g. 1 km at 70 km/h, 2 km at 120 km/h, 8 km at 90 km/h]
56. Chase
• Map is made of points and distances
• You also have speed limits for streets
• Distance / Speed = TIME!!!
57. Chase
You have a time series of your target’s moves
Event sequence:
{
{
Timestamp: 29/11/2014 17:15:00
LAT: 19,12223
LON: 42,134
}
{
Timestamp: 29/11/2014 17:55:00
LAT: 19,12223
LON: 42,134
}
}
Event:
{
Timestamp: 29/11/2014 17:55:00
LAT: 19,12223
LON: 42,134
}
58. Chase
You have a time series of your target’s moves
[Diagram: street map (map points connected by streets) with the target's sightings placed on map points: 20/11/2014 1:20:00 PM and 21/11/2014 2:35:00 PM]
59. Chase
You have a time series of your target’s moves
[Diagram: events (20/11/2014 13:20:00, 21/11/2014 14:35:00, 29/11/2014 17:55:00) linked in an event sequence; each event has a "Where" edge to a map point, and map points are connected by streets]
60. Chase
Vertices and edges are also documents
So you can store complex information inside them
{
timestamp: 22213989487987,
lat: xxxx,
lon: yyy,
informer: 15,
additional: {
speed: 120,
description: "the target was in a car",
car: {
model: "Fiat 500",
licensePlate: "AA 123 BB"
}
}
}
61. Chase
Now you can:
• Predict his moves (eg. statistical methods,
interpolation on lat/lon + time)
• Calculate how far he can be (based on last
position, avg speed and street data)
• Reach him quickly (shortest path, Dijkstra)
• … intelligence?
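The "how far he can be" and "reach him quickly" points both reduce to a shortest-path search where the edge weight is travel time (distance / max speed). Below is a minimal sketch of Dijkstra's algorithm over a tiny street map. The point names A to D and the street list are invented for the example (the distances and speed limits reuse the figures from the map slide), and "build_graph", "travel_times" and "reachable" are hypothetical helper names.

```python
import heapq

# Hypothetical street map: (from, to, distance in km, max speed in km/h).
STREETS = [
    ("A", "B", 1.0, 70.0),
    ("B", "C", 2.0, 120.0),
    ("A", "C", 8.0, 90.0),
    ("C", "D", 8.0, 90.0),
]

def build_graph(streets):
    """Adjacency list weighted by travel time in hours (distance / speed)."""
    graph = {}
    for a, b, km, kmh in streets:
        hours = km / kmh
        graph.setdefault(a, []).append((b, hours))
        graph.setdefault(b, []).append((a, hours))  # streets are two-way here
    return graph

def travel_times(graph, source):
    """Dijkstra: minimum travel time from `source` to every reachable point."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, hours in graph.get(node, []):
            candidate = t + hours
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return best

def reachable(graph, source, budget_hours):
    """Points the target could have reached within the time budget."""
    times = travel_times(graph, source)
    return {n for n, t in times.items() if t <= budget_hours}

graph = build_graph(STREETS)
times = travel_times(graph, "A")
area = reachable(graph, "A", 0.05)  # 3 minutes since the last sighting
print(sorted(area))
```

Note that the direct 8 km street from A to C is slower than the 3 km detour through B, which is why the search works on time rather than distance.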
62. Chase
But to have all this you need:
• An easy way for your informers to send
time series events
Hint: REST interface
With OrientDB you can expose Javascript
functions as REST services!
63. Chase
And you need:
• An extended query language
Eg.
TRAVERSE out("street") FROM (
SELECT out("point") FROM #11:11
// my last event
) WHILE canBeReached($current, #11:11)
(where he could be)
64. Chase
With OrientDB you can write
function canBeReached(node, event)
in JavaScript and use it in your queries
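The slides do not show the body of canBeReached, so the following is only one possible reading, written in Python rather than JavaScript for illustration. It assumes that a node and an event both carry "lat" and "lon" in decimal degrees, that the event has a "timestamp" in seconds, and that a fixed maximum speed applies; all of these, along with the helper "haversine_km" and the "now" parameter, are assumptions. It uses straight-line distance, so it is an optimistic upper bound compared with following the street graph.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def can_be_reached(node, event, now, max_speed_kmh=120.0):
    """True if `node` is within straight-line range of the last sighting.

    `event["timestamp"]` and `now` are in seconds (assumed convention).
    """
    elapsed_hours = (now - event["timestamp"]) / 3600.0
    distance = haversine_km(event["lat"], event["lon"], node["lat"], node["lon"])
    return distance <= max_speed_kmh * elapsed_hours

last_event = {"timestamp": 0, "lat": 19.12223, "lon": 42.134}
node = {"lat": 19.22223, "lon": 42.134}  # about 11 km north of the sighting

after_ten_minutes = can_be_reached(node, last_event, now=600)
after_five_minutes = can_be_reached(node, last_event, now=300)
print(after_ten_minutes, after_five_minutes)
```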
65. Chase
It’s just a game, but think about:
• Fraud detection
• Traffic routing
• Multi-dimensional analytics
• Forecasting
• …
67. One model is not enough
One of the most common issues of my customers
is:
“I have a zoo of technologies in my application
stack, and it’s getting worse every day”
My answer is: Multi-Model DB
of course ;-)
69. From:
“choose the right data model for your
use case”
To:
“Your application has multiple data
models, you need all of them!”