SlideShare une entreprise Scribd logo
1  sur  31
Télécharger pour lire hors ligne
Scalable real-time 
processing techniques 
How to almost count 
Lars Albertsson, Schibsted
“We promised to count live... 
...but since you can’t do that, we used historical 
numbers and this cool math to extrapolate.” 
?!?
Stream counting is simple 
You already have the building blocks 
Yet many wait for batch execution 
Or go through estimation hoops
Accurate counting 
Server Bus 
Bucketiser 
Bucketiser 
Bucketiser 
Aggregator 
Server 
Server 
Server 
● Straightforward, with some plumbing. 
● Heavier than you need.
Now or later? Exact or rough? 
Approximation now >> accurate later
Basic scenarios 
● How many distinct items in last x minutes? 
● What are the top k items in last x minutes? 
● How many Ys in last x minutes? 
These base techniques are sufficient for 
implementing e.g. personalisation and 
recommendation algorithms.
Cardinality - distinct stream count 
● Naive: Set of hashes. X bits per item.
Cardinality - distinct stream count 
● Naive: Set of hashes. X bits per item. 
● Naive 2: Set approximation with Bloom filter 
+ counter.
Counting in context 
● Look backward, different time windows, 
compare. 
● Count for a small time quantum, keep 
history. 
● Aggregate old windows. 
● Monoid representations are desirable.
Cardinality - distinct stream count 
● Naive: Set of hashes. X bits per item. 
● Naive 2: Set approximation with Bloom filter 
+ counter. 
● Naive 3: Hash to bitmap. Count bits.
Cardinality - distinct stream count 
● Naive: Set of hashes. X bits per item. 
● Naive 2: Set approximation with Bloom filter 
+ counter. 
● Naive 3: Hash to bitmap. Count bits. 
● Attempt 4: Hash, bitmap, count + collision 
compensation. Linear Probabilistic Counter.
Cardinality - distinct stream count 
● Naive: Set of hashes. X bits per item. 
● Naive 2: Set approximation with Bloom filter 
+ counter. 
● Naive 3: Hash to bitmap. Count bits. 
● Attempt 4: Hash, bitmap, count + collision 
compensation. Linear Probabilistic Counter. 
● Read papers… -> HyperLogLog counter
Cardinality - distinct stream count 
Source: Shakespeare, highscalability.com
Top K counting 
U2 65 
Gaga 46 
Avicii 23 
Eminem 21 
Dolly 18 
U2 65 
Gaga 46 
Avicii 23 
Eminem 21 
Peps 19 
U2 65 
Gaga 46 
Avicii 23 
Eminem 21 
Dolly 20 
● Keep k items, assume absentees have 
lowest value 
● Accurate at top, overcounting in bottom
Approx counting - Count-Min Sketch 
● Compute n hashes for key. 
● Increment once on each row, col by mod 
(hash) 
● Retrieve by min() over rows 
3 7 20 3 11 6 3+1 4 1 1 
3 8 6 2+1 17 13 1 0 4 5 
12 7 6 14 2 0 2 3 6+1 7 
3 2 12 8+1 10 2 7 2 11 2
Top K with Count-Min Sketch 
U2 65 
Gaga 46 
Avicii 23 
Eminem 21 
Dolly 18 
U2 65 
Gaga 46 
Avicii 23 
Eminem 21 
Peps 2 
U2 65 
Gaga 46 
Avicii 23 
Eminem 21 
Dolly 19 
● Keep Heavy Hitters list. 
● Lookup absentees in CMS. 
● Risk of overcount is smaller and spread out.
Cubic CMS 
● Decorate song with geo, age, etc. Pour into 
CMS. 
● Keep heavy hitters per geo, age group. 
*:*:<U2> 
SE:*:<U2> 
*:31-40:<U2> 
SE:31-40:<U2> 
+1 
+1 
+1 
+1
Machinery 
O(104) messages / s per machine. 
You probably only need one. If not, use Storm. 
Read and write to pub/sub channel, e.g. Kafka 
or ZeroMQ.
Brute force alternative 
Dump every single message into 
ElasticSearch. 
Suitable for high dimensionality cubes.
Recommendations, you said? 
● Collaborative filtering - similarity matrix 
Users 
2 4 1 1 5 2 
0 1 7 1 0 6 
5 2 9 0 3 0 
3 8 0 6 0 7 
Items
Shave the matrix 
Users 
Items 
0,0 3 
0,1 5 
0,2 0 
0,3 2 
1,0 8 
... ... 
2,1 9 
1,0 8 
2,2 7 
5,0 7 
5,2 6 
... ... 
Flip Sort 
2,1 9 
1,0 8 
2,2 7 
5,0 7 
5,2 6 
Cut 
0 0 0 0 0 0 
0 0 7 0 0 6 
0 0 9 0 0 0 
0 8 0 0 0 7 
Noise removed - fine for 
recommendations 
2 4 1 1 5 2 
0 1 7 1 0 6 
5 2 9 0 3 0 
3 8 0 6 0 7
Hungry for more? 
Mikio Braun: http://www.berlinbuzzwords.de/session/real-time-personalization-and- 
recommendation-stream-mining 
Ted Dunning on deep learning for real-time anomaly detection: http://www. 
berlinbuzzwords.de/session/deep-learning-high-performance-time-series-databases 
Ted Dunning on Storm: http://www.youtube.com/watch?v=7PcmbI5aC20 
Open source: stream-lib, Algebird
Want to work in this area? 
lalle@schibsted.com
Scalable real-time processing techniques
Scalable real-time processing techniques
Scalable real-time processing techniques
Scalable real-time processing techniques
Scalable real-time processing techniques
Scalable real-time processing techniques
Scalable real-time processing techniques
Scalable real-time processing techniques

Contenu connexe

Tendances

(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...Altinity Ltd
 
Introduction to InfluxDB
Introduction to InfluxDBIntroduction to InfluxDB
Introduction to InfluxDBJorn Jambers
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...InfluxData
 
Time series database, InfluxDB & PHP
Time series database, InfluxDB & PHPTime series database, InfluxDB & PHP
Time series database, InfluxDB & PHPCorley S.r.l.
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseFlorian Lautenschlager
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series WorldMapR Technologies
 
Devoxx france 2015 influxdb
Devoxx france 2015 influxdbDevoxx france 2015 influxdb
Devoxx france 2015 influxdbNicolas Muller
 
Statsd introduction
Statsd introductionStatsd introduction
Statsd introductionRick Chang
 
Developing Ansible Dynamic Inventory Script - Nov 2017
Developing Ansible Dynamic Inventory Script - Nov 2017Developing Ansible Dynamic Inventory Script - Nov 2017
Developing Ansible Dynamic Inventory Script - Nov 2017Ahmed AbouZaid
 
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...Altinity Ltd
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkDatabricks
 
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLONPaul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLONOutlyer
 
Monitoring in a scalable world
Monitoring in a scalable worldMonitoring in a scalable world
Monitoring in a scalable worldTechExeter
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...InfluxData
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingGyula Fóra
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016DataStax
 

Tendances (20)

(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Tick
TickTick
Tick
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
Introduction to InfluxDB
Introduction to InfluxDBIntroduction to InfluxDB
Introduction to InfluxDB
 
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
Meet the Experts: Visualize Your Time-Stamped Data Using the React-Based Gira...
 
Time series database, InfluxDB & PHP
Time series database, InfluxDB & PHPTime series database, InfluxDB & PHP
Time series database, InfluxDB & PHP
 
Apache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series databaseApache Solr as a compressed, scalable, and high performance time series database
Apache Solr as a compressed, scalable, and high performance time series database
 
Time Series Data in a Time Series World
Time Series Data in a Time Series WorldTime Series Data in a Time Series World
Time Series Data in a Time Series World
 
Devoxx france 2015 influxdb
Devoxx france 2015 influxdbDevoxx france 2015 influxdb
Devoxx france 2015 influxdb
 
Statsd introduction
Statsd introductionStatsd introduction
Statsd introduction
 
Developing Ansible Dynamic Inventory Script - Nov 2017
Developing Ansible Dynamic Inventory Script - Nov 2017Developing Ansible Dynamic Inventory Script - Nov 2017
Developing Ansible Dynamic Inventory Script - Nov 2017
 
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
ClickHouse Paris Meetup. ClickHouse Analytical DBMS, Introduction. By Alexand...
 
InfluxDb
InfluxDbInfluxDb
InfluxDb
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache SparkArbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
 
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLONPaul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
 
Monitoring in a scalable world
Monitoring in a scalable worldMonitoring in a scalable world
Monitoring in a scalable world
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
RBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at KingRBea: Scalable Real-Time Analytics at King
RBea: Scalable Real-Time Analytics at King
 
Graphite
GraphiteGraphite
Graphite
 
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
 

En vedette

Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelinesLars Albertsson
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...confluent
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot Instances(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot InstancesAmazon Web Services
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systemsXavier Amatriain
 

En vedette (9)

Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelines
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot Instances(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot Instances
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 

Similaire à Scalable real-time processing techniques

FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
Reactor, Reactive streams and MicroServices
Reactor, Reactive streams and MicroServicesReactor, Reactive streams and MicroServices
Reactor, Reactive streams and MicroServicesStéphane Maldini
 
Webinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlWebinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlScyllaDB
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasFlink Forward
 
streamingalgo88585858585858585pppppp.pptx
streamingalgo88585858585858585pppppp.pptxstreamingalgo88585858585858585pppppp.pptx
streamingalgo88585858585858585pppppp.pptxGopiNathVelivela
 
[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...
[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...
[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...Insight Technology, Inc.
 
Sketch algorithms
Sketch algorithmsSketch algorithms
Sketch algorithmsSimon Belak
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"NUS-ISS
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithmsSandeep Joshi
 
Counting (Using Computer)
Counting (Using Computer)Counting (Using Computer)
Counting (Using Computer)roshmat
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
 
Interview questions slide deck
Interview questions slide deckInterview questions slide deck
Interview questions slide deckMikeBegley
 
#GDC15 Code Clinic
#GDC15 Code Clinic#GDC15 Code Clinic
#GDC15 Code ClinicMike Acton
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon RedshiftJeff Patti
 
Probabilistic data structure
Probabilistic data structureProbabilistic data structure
Probabilistic data structureThinh Dang
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 

Similaire à Scalable real-time processing techniques (20)

FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
14 query processing-sorting
14 query processing-sorting14 query processing-sorting
14 query processing-sorting
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
Reactor, Reactive streams and MicroServices
Reactor, Reactive streams and MicroServicesReactor, Reactive streams and MicroServices
Reactor, Reactive streams and MicroServices
 
Webinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under ControlWebinar: Using Control Theory to Keep Compactions Under Control
Webinar: Using Control Theory to Keep Compactions Under Control
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
 
streamingalgo88585858585858585pppppp.pptx
streamingalgo88585858585858585pppppp.pptxstreamingalgo88585858585858585pppppp.pptx
streamingalgo88585858585858585pppppp.pptx
 
[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...
[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...
[db analytics showcase Sapporo 2018] B33 H2O4GPU and GoAI: harnessing the pow...
 
Sketch algorithms
Sketch algorithmsSketch algorithms
Sketch algorithms
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithms
 
Counting (Using Computer)
Counting (Using Computer)Counting (Using Computer)
Counting (Using Computer)
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
 
Interview questions slide deck
Interview questions slide deckInterview questions slide deck
Interview questions slide deck
 
#GDC15 Code Clinic
#GDC15 Code Clinic#GDC15 Code Clinic
#GDC15 Code Clinic
 
Amazon Redshift
Amazon RedshiftAmazon Redshift
Amazon Redshift
 
Probabilistic data structure
Probabilistic data structureProbabilistic data structure
Probabilistic data structure
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 

Plus de Lars Albertsson

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divideLars Albertsson
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with ScalametaLars Albertsson
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfLars Albertsson
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdfLars Albertsson
 
The 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdfThe 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdfLars Albertsson
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application qualityLars Albertsson
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetLars Albertsson
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesLars Albertsson
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift leftLars Albertsson
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityLars Albertsson
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish styleLars Albertsson
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data opsLars Albertsson
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data qualityLars Albertsson
 
Eventually, time will kill your data processing
Eventually, time will kill your data processingEventually, time will kill your data processing
Eventually, time will kill your data processingLars Albertsson
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisisLars Albertsson
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipelineLars Albertsson
 

Plus de Lars Albertsson (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Crossing the data divide
Crossing the data divideCrossing the data divide
Crossing the data divide
 
Schema management with Scalameta
Schema management with ScalametaSchema management with Scalameta
Schema management with Scalameta
 
How to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdfHow to not kill people - Berlin Buzzwords 2023.pdf
How to not kill people - Berlin Buzzwords 2023.pdf
 
Data engineering in 10 years.pdf
Data engineering in 10 years.pdfData engineering in 10 years.pdf
Data engineering in 10 years.pdf
 
The 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdfThe 7 habits of data effective companies.pdf
The 7 habits of data effective companies.pdf
 
Holistic data application quality
Holistic data application qualityHolistic data application quality
Holistic data application quality
 
Secure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budgetSecure software supply chain on a shoestring budget
Secure software supply chain on a shoestring budget
 
DataOps - Lean principles and lean practices
DataOps - Lean principles and lean practicesDataOps - Lean principles and lean practices
DataOps - Lean principles and lean practices
 
Ai legal and ethics
Ai   legal and ethicsAi   legal and ethics
Ai legal and ethics
 
The right side of speed - learning to shift left
The right side of speed - learning to shift leftThe right side of speed - learning to shift left
The right side of speed - learning to shift left
 
Mortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data qualityMortal analytics - Covid-19 and the problem of data quality
Mortal analytics - Covid-19 and the problem of data quality
 
Data ops in practice - Swedish style
Data ops in practice - Swedish styleData ops in practice - Swedish style
Data ops in practice - Swedish style
 
The lean principles of data ops
The lean principles of data opsThe lean principles of data ops
The lean principles of data ops
 
Data democratised
Data democratisedData democratised
Data democratised
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
Eventually, time will kill your data processing
Eventually, time will kill your data processingEventually, time will kill your data processing
Eventually, time will kill your data processing
 
Taming the reproducibility crisis
Taming the reproducibility crisisTaming the reproducibility crisis
Taming the reproducibility crisis
 
Eventually, time will kill your data pipeline
Eventually, time will kill your data pipelineEventually, time will kill your data pipeline
Eventually, time will kill your data pipeline
 

Dernier

MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 

Dernier (20)

MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 

Scalable real-time processing techniques

  • 1. Scalable real-time processing techniques How to almost count Lars Albertsson, Schibsted
  • 2. “We promised to count live... ...but since you can’t do that, we used historical numbers and this cool math to extrapolate.” ?!?
  • 3. Stream counting is simple You already have the building blocks Yet many wait for batch execution Or go through estimation hoops
  • 4. Accurate counting Server Bus Bucketiser Bucketiser Bucketiser Aggregator Server Server Server ● Straightforward, with some plumbing. ● Heavier than you need.
  • 5. Now or later? Exact or rough? Approximation now >> accurate later
  • 6. Basic scenarios ● How many distinct items in last x minutes? ● What are the top k items in last x minutes? ● How many Ys in last x minutes? These base techniques are sufficient for implementing e.g. personalisation and recommendation algorithms.
  • 7. Cardinality - distinct stream count ● Naive: Set of hashes. X bits per item.
  • 8. Cardinality - distinct stream count ● Naive: Set of hashes. X bits per item. ● Naive 2: Set approximation with Bloom filter + counter.
  • 9. Counting in context ● Look backward, different time windows, compare. ● Count for a small time quantum, keep history. ● Aggregate old windows. ● Monoid representations are desirable.
  • 10. Cardinality - distinct stream count ● Naive: Set of hashes. X bits per item. ● Naive 2: Set approximation with Bloom filter + counter. ● Naive 3: Hash to bitmap. Count bits.
  • 11. Cardinality - distinct stream count ● Naive: Set of hashes. X bits per item. ● Naive 2: Set approximation with Bloom filter + counter. ● Naive 3: Hash to bitmap. Count bits. ● Attempt 4: Hash, bitmap, count + collision compensation. Linear Probabilistic Counter.
  • 12. Cardinality - distinct stream count ● Naive: Set of hashes. X bits per item. ● Naive 2: Set approximation with Bloom filter + counter. ● Naive 3: Hash to bitmap. Count bits. ● Attempt 4: Hash, bitmap, count + collision compensation. Linear Probabilistic Counter. ● Read papers… -> HyperLogLog counter
  • 13. Cardinality - distinct stream count Source: Shakespeare, highscalability.com
  • 14. Top K counting U2 65 Gaga 46 Avicii 23 Eminem 21 Dolly 18 U2 65 Gaga 46 Avicii 23 Eminem 21 Peps 19 U2 65 Gaga 46 Avicii 23 Eminem 21 Dolly 20 ● Keep k items, assume absentees have lowest value ● Accurate at top, overcounting in bottom
  • 15. Approx counting - Count-Min Sketch ● Compute n hashes for key. ● Increment once on each row, col by mod (hash) ● Retrieve by min() over rows 3 7 20 3 11 6 3+1 4 1 1 3 8 6 2+1 17 13 1 0 4 5 12 7 6 14 2 0 2 3 6+1 7 3 2 12 8+1 10 2 7 2 11 2
  • 16. Top K with Count-Min Sketch U2 65 Gaga 46 Avicii 23 Eminem 21 Dolly 18 U2 65 Gaga 46 Avicii 23 Eminem 21 Peps 2 U2 65 Gaga 46 Avicii 23 Eminem 21 Dolly 19 ● Keep Heavy Hitters list. ● Lookup absentees in CMS. ● Risk of overcount is smaller and spread out.
  • 17. Cubic CMS ● Decorate song with geo, age, etc. Pour into CMS. ● Keep heavy hitters per geo, age group. *:*:<U2> SE:*:<U2> *:31-40:<U2> SE:31-40:<U2> +1 +1 +1 +1
  • 18. Machinery O(104) messages / s per machine. You probably only need one. If not, use Storm. Read and write to pub/sub channel, e.g. Kafka or ZeroMQ.
  • 19. Brute force alternative Dump every single message into ElasticSearch. Suitable for high dimensionality cubes.
  • 20. Recommendations, you said? ● Collaborative filtering - similarity matrix Users 2 4 1 1 5 2 0 1 7 1 0 6 5 2 9 0 3 0 3 8 0 6 0 7 Items
  • 21. Shave the matrix Users Items 0,0 3 0,1 5 0,2 0 0,3 2 1,0 8 ... ... 2,1 9 1,0 8 2,2 7 5,0 7 5,2 6 ... ... Flip Sort 2,1 9 1,0 8 2,2 7 5,0 7 5,2 6 Cut 0 0 0 0 0 0 0 0 7 0 0 6 0 0 9 0 0 0 0 8 0 0 0 7 Noise removed - fine for recommendations 2 4 1 1 5 2 0 1 7 1 0 6 5 2 9 0 3 0 3 8 0 6 0 7
  • 22. Hungry for more? Mikio Braun: http://www.berlinbuzzwords.de/session/real-time-personalization-and- recommendation-stream-mining Ted Dunning on deep learning for real-time anomaly detection: http://www. berlinbuzzwords.de/session/deep-learning-high-performance-time-series-databases Ted Dunning on Storm: http://www.youtube.com/watch?v=7PcmbI5aC20 Open source: stream-lib, Algebird
  • 23. Want to work in this area? lalle@schibsted.com