2. Introduction
● plista GmbH
○ recommendations & advertising
○ founded in 2008, Berlin [DE]
○ ~3k recommendations/second
● never batch = never Hadoop
● stream computing with In Memory Database
● we love
4. How to build recommendations?
welt.de/football/berlin_wins.html
We only have the URL?
To show recommendations we are integrated on the website,
so "at least" we can count the hits.
5. Most popular
welt.de/football/berlin_wins.html
● ZINCRBY "p:welt.de" 1 berlin_wins
● ZREVRANGEBYSCORE
p:welt.de
berlin_wins 689 +1
summer_is_coming 420
plista_company 135
Live Read
+ Live Write
= Real Time Recommendations
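
A minimal sketch of the live write and live read above, in plain Python so it runs without a Redis server. The dict stands in for the sorted set p:welt.de; `zincrby` and `zrevrange_by_score` are hypothetical helper names that only mimic the Redis commands ZINCRBY and ZREVRANGEBYSCORE, they are not a client API.

```python
# In-memory stand-in for Redis sorted sets: key -> {member: score}.
sorted_sets = {
    "p:welt.de": {
        "berlin_wins": 689.0,
        "summer_is_coming": 420.0,
        "plista_company": 135.0,
    }
}


def zincrby(key, increment, member):
    """Mimics ZINCRBY key increment member: add to a member's score."""
    zset = sorted_sets.setdefault(key, {})
    zset[member] = zset.get(member, 0.0) + increment
    return zset[member]


def zrevrange_by_score(key, limit):
    """Mimics ZREVRANGEBYSCORE key +inf -inf WITHSCORES LIMIT 0 limit."""
    zset = sorted_sets.get(key, {})
    ranked = sorted(zset.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


# Live write: one hit on welt.de/football/berlin_wins.html
new_score = zincrby("p:welt.de", 1, "berlin_wins")

# Live read: the two most popular articles for this publisher
top = zrevrange_by_score("p:welt.de", 2)
```

Because the write and the read touch the same structure, the ranking reflects every hit immediately, with no batch step in between.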
8. Most popular with timeseries
[Figure: time-series chart; keys carry a timestamp suffix such as :1360007000; x-axis from -1h to -8h]
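
One way to get a time series is to write each hit into a key that includes its time bucket. The sketch below is an assumption, not the plista implementation: the helper names are hypothetical, and the bucket width of 1000 seconds is only inferred from key suffixes such as 1360007 in this deck (the chart axis suggests hourly buckets, so the width is a parameter).

```python
def bucket_key(prefix, value, timestamp, bucket_seconds=1000):
    """Build a key such as 'p:welt.de:1360007' for the bucket holding timestamp."""
    return f"{prefix}:{value}:{timestamp // bucket_seconds}"


def recent_bucket_keys(prefix, value, now, count, bucket_seconds=1000):
    """Return the keys of the `count` most recent buckets, newest first."""
    current = now // bucket_seconds
    return [f"{prefix}:{value}:{current - i}" for i in range(count)]


# A hit at Unix time 1360007000 on publisher welt.de goes into this key:
key = bucket_key("p", "welt.de", 1360007000)

# Reading "most popular" then looks at the newest few buckets:
keys = recent_bucket_keys("p", "welt.de", 1360007000, 3)
```

Old buckets simply stop being read, so recency is handled by key choice rather than by rewriting scores.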
9. Most popular to any context
● it's not only the publisher, we use ~50 context attributes
● context attributes:
○ publisher
○ weekday
○ geolocation
○ demographics
○ ...

publisher = welt.de
berlin_wins 689 +1
summer_is_coming 420
plista_company 135

weekday = sunday
berlin_wins 400 +1
dortmund_wins 200
... 100

geolocation = dortmund
dortmund_wins 200
berlin_wins 10 +1
... 5
10. Most popular to any context
● what it looks like in Redis

ZUNION ... WEIGHTS
p:welt.de:1360007 4
p:welt.de:1360006 2
p:welt.de:1360005 1
w:sunday:1360007 4
w:sunday:1360006 2
w:sunday:1360005 1
g:dortmund:1360007 4
g:dortmund:1360006 2
g:dortmund:1360005 1

publisher = welt.de
berlin_wins 689 +1
summer_is_coming 420
plista_company 135

weekday = sunday
berlin_wins 400
dortmund_wins 200
... 100

geolocation = dortmund
dortmund_wins 200
berlin_wins 10
... 5
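
The weighted union can be sketched in plain Python. This mimics what ZUNIONSTORE with WEIGHTS and AGGREGATE SUM computes; the function name `zunion_weighted` is hypothetical, and the per-bucket hit counts are made up for illustration (the slides only show totals per context).

```python
from collections import defaultdict


def zunion_weighted(sorted_sets, weights):
    """Sum weight * score per member over all keys, highest total first."""
    result = defaultdict(float)
    for key, weight in weights.items():
        for member, score in sorted_sets.get(key, {}).items():
            result[member] += weight * score
    return sorted(result.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-bucket hit counts (not taken from the slides).
buckets = {
    "p:welt.de:1360007": {"berlin_wins": 30, "summer_is_coming": 10},
    "p:welt.de:1360006": {"berlin_wins": 20, "summer_is_coming": 40},
    "g:dortmund:1360007": {"dortmund_wins": 25, "berlin_wins": 1},
}

# Newer buckets get higher weights, as in the 4 / 2 / 1 scheme above.
weights = {
    "p:welt.de:1360007": 4,
    "p:welt.de:1360006": 2,
    "g:dortmund:1360007": 4,
}

ranking = zunion_weighted(buckets, weights)
```

Here berlin_wins ends up first because it collects weighted hits from both the publisher and the geolocation context.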
11. Most popular with Effect size
● which context has an influence?

ZUNION ... WEIGHTS (weight * effect size)
p:welt.de:1360007 4 * 70%
p:welt.de:1360006 2 * 70%
p:welt.de:1360005 1 * 70%
w:sunday:1360007 4 * 10%
w:sunday:1360006 2 * 10%
w:sunday:1360005 1 * 10%
g:dortmund:1360007 4 * 30%
g:dortmund:1360006 2 * 30%
g:dortmund:1360005 1 * 30%

Examples:
○ small effect: weather
○ big effect: publisher

Data with a small effect should not be taken into account,
otherwise we get average results.
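
A small sketch of how the per-key weights above could be derived: each time-decay weight is multiplied by the effect size of its context, and contexts with too small an effect are left out. The function name and the `min_effect` threshold of 0.2 are assumptions for illustration; the slides do not state a cut-off value.

```python
def effective_weights(bucket_ids, decay, effect_size, min_effect=0.0):
    """Return {key: decay * effect} for every context that passes min_effect."""
    weights = {}
    for context, effect in effect_size.items():
        if effect < min_effect:
            continue  # context with a small effect is not taken into account
        for bucket_id, decay_weight in zip(bucket_ids, decay):
            weights[f"{context}:{bucket_id}"] = decay_weight * effect
    return weights


# Effect sizes from the slide: publisher 70%, weekday 10%, geolocation 30%.
effect_size = {"p:welt.de": 0.70, "w:sunday": 0.10, "g:dortmund": 0.30}

weights = effective_weights(
    bucket_ids=[1360007, 1360006, 1360005],
    decay=[4, 2, 1],
    effect_size=effect_size,
    min_effect=0.2,  # hypothetical threshold
)
```

With this threshold the weekday keys drop out entirely, and the remaining dict can be passed as the WEIGHTS of the union.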
12. Most popular with Significance
● some data has more significance/trust
● so we add a significance matrix
publisher = welt.de        sig:publisher = welt.de
berlin_wins 689        ×   berlin_wins 1
summer_is_coming 420   ×   summer_is_coming 1
plista_company 135     ×   plista_company 0.5

● Significance might depend on a common limit,
like 200 (in the example)
13. Most popular with Significance
● some data has more significance/trust
● so we add a significance matrix
Numerator: SUM over all contexts
Σ( publisher = welt.de × sig:publisher = welt.de )
publisher = welt.de        sig:publisher = welt.de
berlin_wins 689        ×   berlin_wins 1
summer_is_coming 420   ×   summer_is_coming 1
plista_company 135     ×   plista_company 0.5

Denominator: SUM over all contexts
Σ( sig:publisher = welt.de )
berlin_wins 1
summer_is_coming 1
plista_company 0.5
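
Reading the numerator and denominator as a quotient gives a significance-weighted average per item. The sketch below follows that reading; it is an interpretation of the slide, not a confirmed formula. The publisher values come from the slide, while the significance values for weekday = sunday are hypothetical.

```python
def significance_weighted_score(hits, significance):
    """Per item: sum(hits * sig) over contexts divided by sum(sig) over contexts."""
    numerator = {}
    denominator = {}
    for context, items in hits.items():
        for item, count in items.items():
            sig = significance[context].get(item, 0.0)
            numerator[item] = numerator.get(item, 0.0) + count * sig
            denominator[item] = denominator.get(item, 0.0) + sig
    # Items without any significance are skipped to avoid dividing by zero.
    return {
        item: numerator[item] / denominator[item]
        for item in numerator
        if denominator[item] > 0
    }


hits = {
    "publisher=welt.de": {"berlin_wins": 689, "summer_is_coming": 420, "plista_company": 135},
    "weekday=sunday": {"berlin_wins": 400, "dortmund_wins": 200},
}
significance = {
    "publisher=welt.de": {"berlin_wins": 1.0, "summer_is_coming": 1.0, "plista_company": 0.5},
    # Hypothetical significance values for the second context.
    "weekday=sunday": {"berlin_wins": 0.5, "dortmund_wins": 0.5},
}

scores = significance_weighted_score(hits, significance)
```

For berlin_wins the trusted publisher count dominates: (689 × 1 + 400 × 0.5) / (1 + 0.5), which is about 592.7.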
14. SUM over..
SUM (Σ) over:
● timeseries
● different context
● previous hits of the user
● similar publisher knowledge

ZUNION ... WEIGHTS
p:welt.de:1360007 4
p:welt.de:1360006 2
p:welt.de:1360005 1
w:sunday:1360007 4
w:sunday:1360006 2
w:sunday:1360005 1
g:dortmund:1360007 4
g:dortmund:1360006 2
g:dortmund:1360005 1

publisher = welt.de
berlin_wins 689
summer_is_coming 420
plista_company 135

... redis can do it ;)
15. Even more Matrix Operations ;)
● Similarity Matrix
● Human Control Matrix
● Meta-learning Matrix
○ might be covered in next talk
○ cooperation with
○ aided from
[Diagram: Σ and ∏ over the matrices]
16. Conclusions
● Redis fits perfectly for simple operations
○ SUM + AGGREGATE + MIN + MAX
● In-Memory operations are pretty fast
● Real-time features feel better in a real-time
database (e.g. time series)
● We don't need batch
17. What else?
In Redis
● Incremental Collaborative Filtering
● More recommenders
● Live Statistics
At plista
● Semantics with Lucene
● Cloud Technologies
○ Scalability
○ Enterprise Service Bus
● Contest for Recommenders