BERLIN BUZZWORDS 2014	

SELECTED TALKS OVERVIEW
tech talk @ ferret
Andrii Gakhov	

05/06/2014
SAMZA AT LINKEDIN
TAKING STREAM PROCESSING TO THE NEXT LEVEL
by Martin Kleppmann
ABOUT THE AUTHOR
• Software engineer at LinkedIn	

• co-founder of Rapportive (acquired by LinkedIn
in 2012)	

• http://martin.kleppmann.com/	

• @martinkl
APACHE SAMZA
• Apache Samza is a distributed stream processing framework.	

• http://samza.incubator.apache.org/	

• uses Apache Kafka for messaging	

• LinkedIn uses it in production
APACHE KAFKA
• A high-throughput distributed messaging system (commit log
service). Fast, Scalable, Durable and Distributed.	

• http://kafka.apache.org/	

• A single Kafka broker can handle hundreds of megabytes of
reads and writes per second from thousands of clients.	

• Messages are persisted on disk and replicated within the
cluster to prevent data loss. Each broker can handle terabytes
of messages without performance impact.	

• Key-value storage, but only appends are supported for each key
SAMZA VS. STORM
• Both systems provide a partitioned stream model, a distributed execution environment, an API
for stream processing, fault tolerance, etc.

• Similar parallelism model, but Storm uses 1 thread per task by default, while Samza uses
single-threaded processes. Samza doesn't support dynamic rebalancing.

• Written in Java/Scala, and currently supports only JVM languages	

• Guaranteed delivery: Samza currently supports only the at-least-once delivery model
(exactly-once is planned).

• Completely different state management. Instead of using a remote DB for durable storage,
each Samza task includes an embedded key-value store, located on the same machine.
Changes are replicated.

• Samza is better suited for handling keyed data (because it never processes messages in a
partition out of order).
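The keyed, in-order processing with task-local state described above can be sketched roughly as follows. This is a plain Python illustration, not Samza's API: the partition count, the crc32 hash, and the names partition_for, CountingTask, run and restore are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key):
    """Map a key to a partition with a stable hash (crc32 is an arbitrary choice)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class CountingTask:
    """One task per partition, with its state kept locally next to the task."""

    def __init__(self):
        self.store = {}      # stands in for the embedded key-value store
        self.changelog = []  # stands in for the replicated stream of changes

    def process(self, key):
        # Messages of one partition arrive in log order, so per-key
        # updates are applied in the order they were written.
        self.store[key] = self.store.get(key, 0) + 1
        self.changelog.append((key, self.store[key]))


def run(messages):
    """Route every message to the task that owns its partition."""
    tasks = defaultdict(CountingTask)
    for key in messages:
        tasks[partition_for(key)].process(key)
    return tasks


def restore(changelog):
    """Rebuild a task's local store by replaying its changelog."""
    store = {}
    for key, value in changelog:
        store[key] = value
    return store
```

Because all messages for a key land in the same partition, a single task sees every update for that key, and replaying the changelog is enough to rebuild its local state on another machine.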
SAMZA ARCHITECTURE
[Diagram: three machines, each with a YARN NodeManager and Kafka; each NodeManager hosts two Samza Containers, and the containers run Tasks.]
CRDTS
CONSISTENCY WITHOUT CONSENSUS
by Peter Bourgon
ABOUT THE AUTHOR
• Engineer at SoundCloud	

• background in search and distributed systems

• http://peter.bourgon.org	

• @peterbourgon
DISTRIBUTED SYSTEM THEORY
• Partition-tolerance

system continues to operate despite message loss
due to network and/or node failure	

• Consistency

all nodes see the same data at the same time	

• Availability

a guarantee that every request receives a response
about whether it was successful or failed
CAP THEOREM
[Diagram: triangle with corners Partition tolerance, Consistency and Availability; the Consistency side is labelled CP, the Availability side AP.]
EXAMPLES
AP
• Cassandra	

• Riak	

• CouchBase	

• MongoDB



* eventual consistency: some nodes could
be stale, but not wrong
CP
• Paxos (Doozer, Chubby)

• Zab (ZooKeeper)	

• Raft (Consul)	

* Consensus protocols
CRDTS
• CRDTs are data structures for distributed systems	

• C = Conflict-free	

• R = Replicated	

• D = Data	

• T = Types

CRDTs achieve eventual consistency by using CALM / ACID 2.0
principles
INCREMENT ONLY COUNTERS
Associative: {1} U ({2} U {3}) = ({1} U {2}) U {3}	

Commutative: {1} U {2} = {2} U {1}	

Idempotent: {1} U {1} = {1}
[Animation: three replicas start as empty sets { }. 123 is added on one replica and propagates until all replicas hold {123}; then 456 is added and propagates until all replicas hold {123, 456}.]
Items are unique IDs of users who listened to the track. A user can't revoke their "choice".
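A minimal sketch of such a grow-only set, assuming replicas exchange their full state and merge by set union. The class name GSet and its methods are illustrative and not taken from the talk.

```python
class GSet:
    """Grow-only set: items can be added but never removed."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        # Returns a new replica state that also contains the item.
        return GSet(self.items | {item})

    def merge(self, other):
        # Merging is set union, which is associative, commutative
        # and idempotent, so the order and repetition of merges
        # between replicas does not matter.
        return GSet(self.items | other.items)

    def __eq__(self, other):
        return isinstance(other, GSet) and self.items == other.items

    def __hash__(self):
        return hash(self.items)


# Three replicas of the "who listened to this track" set.
r1 = GSet().add(123)
r2 = GSet().add(456)
r3 = GSet()

# Whatever order the merges happen in, the replicas converge.
left = r1.merge(r2).merge(r3)
right = r3.merge(r2.merge(r1))
```

Here left and right hold the same two user IDs, which is the converged state the slides arrive at.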
SOUNDCLOUD EXAMPLE
Event
• Timestamp (At 2014-05-26 12:04:56.097403 UTC)	

• User (snoopdogg)	

• Verb (Reposted)	

• Identifier (theeconomist/election-day)
DATA MODELS
Fan-out-on-write
(close to the data consumer)
SoundCloud uses Cassandra and gives each user a row in a column family.
Reads are fast, but writes take more time when you have a lot of followers.

Fan-in-on-read
(close to the data producer)
Reads are difficult. Nobody currently builds timelines via fan-in-on-read. But there is
big potential here: huge storage reduction, keeping the data set in memory, etc.
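The trade-off between the two models can be illustrated with a small in-memory sketch. The follower graph, the event tuple layout and all function names here are assumptions made for the example; they do not describe SoundCloud's actual schema.

```python
from collections import defaultdict

# Hypothetical follow graph: producer -> consumers, and its inverse.
followers = {"snoopdogg": ["alice", "bob"]}
following = {"alice": ["snoopdogg"], "bob": ["snoopdogg"]}

# Fan-out-on-write: copy each event into every follower's inbox.
inboxes = defaultdict(list)


def write_fan_out(actor, event):
    for follower in followers.get(actor, []):
        inboxes[follower].append(event)  # one copy per follower


def read_fan_out(user):
    return sorted(inboxes[user], reverse=True)  # single lookup


# Fan-in-on-read: store each event once, in the producer's outbox.
outboxes = defaultdict(list)


def write_fan_in(actor, event):
    outboxes[actor].append(event)  # one copy in total


def read_fan_in(user):
    events = []
    for actor in following.get(user, []):  # one lookup per followed user
        events.extend(outboxes[actor])
    return sorted(events, reverse=True)


# Events are (timestamp, actor, verb, identifier) tuples.
event = (1401105896.097403, "snoopdogg", "Reposted", "theeconomist/election-day")
write_fan_out("snoopdogg", event)
write_fan_in("snoopdogg", event)
```

Both reads return the same timeline for alice, but the fan-out store holds one copy of the event per follower while the fan-in store holds exactly one.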
CRDT SETS
Events are unique, so use a set!	

• G-set: can’t delete	

• 2P-set: add, remove once	

• OR-set: storage overhead	

• CRDT sets	

S+ = {A B C}

S- = {B}

S = {A C}
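The S+ / S- notation above corresponds to a two-phase set (2P-set). A minimal sketch, with the class and method names chosen for illustration:

```python
class TwoPSet:
    """Two-phase set: an item can be added and then removed, once."""

    def __init__(self):
        self.added = set()    # S+
        self.removed = set()  # S-

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only items that were added can be removed; the removal is
        # permanent because S- never shrinks.
        if item in self.added:
            self.removed.add(item)

    def merge(self, other):
        # Both halves are grow-only sets, so merging is a pair of unions.
        self.added |= other.added
        self.removed |= other.removed

    def value(self):
        return self.added - self.removed  # S = S+ minus S-


s = TwoPSet()
for item in ("A", "B", "C"):
    s.add(item)
s.remove("B")
s.add("B")  # re-adding has no visible effect: B stays removed
```

After these operations s.value() contains A and C only, matching S = {A C} above.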
SET
• S = actor’s set keys (snoopdogg:outbox)	

• A, B, C, D = actor:verb:identifier	

• 1, 2, 3, 4 = timestamp	

S+ = {A/1 B/2 C/3}	

S- = {D/4}	

S = {A/1 B/2 C/3}	

• Read is easy, write is interesting!
EXAMPLE
• S+ = {A/1 B/2} S- = {C/3}	

• Insert D/4 => S+ = {A/1 B/2 D/4} S- = {C/3}	

• Insert D/4 => S+ = {A/1 B/2 D/4} S- = {C/3}	

• Insert D/3 => S+ = {A/1 B/2 D/4} S- = {C/3}	

• Delete D/3 => S+ = {A/1 B/2 D/4} S- = {C/3}	

• Delete D/4 => S+ = {A/1 B/2 D/4} S- = {C/3}

• Delete D/5 => S+ = {A/1 B/2} S- = {C/3 D/5}

• Insert D/5 => S+ = {A/1 B/2} S- = {C/3 D/5}	

• Delete D/6 => S+ = {A/1 B/2} S- = {C/3 D/6}
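The trace above is consistent with a simple rule: a key lives in exactly one of S+ or S- together with a timestamp, and an operation only takes effect if its timestamp is strictly newer than the stored one. The sketch below encodes that rule; it is inferred from the example and is not Roshi's actual code. The class and method names are hypothetical.

```python
class TimestampedSet:
    """Set where each key is in S+ or S- with the newest timestamp seen."""

    def __init__(self):
        self.adds = {}     # S+: key -> timestamp
        self.removes = {}  # S-: key -> timestamp

    def _current(self, key):
        # Newest timestamp recorded for the key in either half.
        return max(self.adds.get(key, float("-inf")),
                   self.removes.get(key, float("-inf")))

    def insert(self, key, ts):
        if ts <= self._current(key):
            return False  # stale or duplicate write: no-op
        self.removes.pop(key, None)
        self.adds[key] = ts
        return True

    def delete(self, key, ts):
        if ts <= self._current(key):
            return False  # stale or duplicate write: no-op
        self.adds.pop(key, None)
        self.removes[key] = ts
        return True


s = TimestampedSet()
s.insert("A", 1)
s.insert("B", 2)
s.delete("C", 3)

# Replay of the slide's operations, in order.
results = [
    s.insert("D", 4),  # applied
    s.insert("D", 4),  # no-op, same timestamp
    s.insert("D", 3),  # no-op, older
    s.delete("D", 3),  # no-op, older
    s.delete("D", 4),  # no-op, same timestamp
    s.delete("D", 5),  # applied, D moves to S-
    s.insert("D", 5),  # no-op, same timestamp
    s.delete("D", 6),  # applied, timestamp bumped
]
```

Since every write is idempotent and stale writes are dropped, the same operations can be replayed or reordered without changing the final state, as long as timestamps are distinct per key.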
CRDTS AGAIN
• It’s possible to map fan-in-on-read stream product
to a data model that could be implemented with a
specific type of CRDT
ROSHI
• Roshi is an open source distributed storage system for time-
series events.	

• written in Go (5K lines including 2.3K lines of tests)

• implements a novel CRDT set type	

• uses Redis ZSET sorted sets to store state
ARCHITECTURE
[Diagram: three Pools, each under its own Cluster, with a single Farm spanning the three Clusters.]
{A B C} {A B C} {A C}
U = {A B C}
∆ = {B}
so it is possible to do read repair
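A minimal sketch of the read-repair idea from this slide, assuming each replica answers a read with a plain set. The function name read_with_repair is made up for the example.

```python
def read_with_repair(replicas):
    """Return the union of all replica responses and, per replica,
    the items it is missing and should be sent."""
    union = set().union(*replicas)
    deltas = [union - replica for replica in replicas]
    return union, deltas


responses = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "C"}]
union, deltas = read_with_repair(responses)
```

With the three responses from the slide, the union is {A B C} and only the third replica has a non-empty delta, {B}, which is what would be written back to it.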

Contenu connexe

Tendances

Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic searchmarkstory
 
Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with CassandraRyan King
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearchpmanvi
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloJoe Stein
 
SQL for Elasticsearch
SQL for ElasticsearchSQL for Elasticsearch
SQL for ElasticsearchJodok Batlogg
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra ExplainedEric Evans
 
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to ElasticsearchRuslan Zavacky
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogJoe Stein
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REXdidmarin
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache MesosJoe Stein
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosJoe Stein
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLucidworks
 
Cassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCaleb Rackliffe
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosJoe Stein
 
RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...
RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...
RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...InfluxData
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 

Tendances (20)

Hadoop on osx
Hadoop on osxHadoop on osx
Hadoop on osx
 
Simple search with elastic search
Simple search with elastic searchSimple search with elastic search
Simple search with elastic search
 
Scaling Twitter with Cassandra
Scaling Twitter with CassandraScaling Twitter with Cassandra
Scaling Twitter with Cassandra
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
SQL for Elasticsearch
SQL for ElasticsearchSQL for Elasticsearch
SQL for Elasticsearch
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to Sqoop | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to Elasticsearch
Introduction to ElasticsearchIntroduction to Elasticsearch
Introduction to Elasticsearch
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REX
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
 
High Performance Solr
High Performance SolrHigh Performance Solr
High Performance Solr
 
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAwareLeveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
Leveraging the Power of Solr with Spark: Presented by Johannes Weigend, QAware
 
Cassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE Search
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache Mesos
 
RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...
RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...
RESTful API – How to Consume, Extract, Store and Visualize Data with InfluxDB...
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 

En vedette

Millennials are YOUR future!
Millennials are YOUR future!Millennials are YOUR future!
Millennials are YOUR future!Don Polley
 
Online Marketing : A General Overview
Online Marketing : A General OverviewOnline Marketing : A General Overview
Online Marketing : A General OverviewAlankaram Duraisamy
 
Soft ideation action templates _idea_matrix_board
Soft ideation action templates _idea_matrix_boardSoft ideation action templates _idea_matrix_board
Soft ideation action templates _idea_matrix_boardThe Innovation Lab
 
Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...
Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...
Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...Sherry Jones
 
Workshops on sound and moving image preservation hanoi v2
Workshops on sound and moving image preservation hanoi v2Workshops on sound and moving image preservation hanoi v2
Workshops on sound and moving image preservation hanoi v2Richard Wright
 
International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...
International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...
International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...Oswar Mungkasa
 
151119 iotlt-talk
151119 iotlt-talk151119 iotlt-talk
151119 iotlt-talksonycsl
 
Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...
Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...
Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...Marcel Lebrun
 
9 big steve jobs mistakes
9 big steve jobs mistakes9 big steve jobs mistakes
9 big steve jobs mistakesHeyday ApS
 
Mobile Ads Before and After piece
Mobile Ads Before and After pieceMobile Ads Before and After piece
Mobile Ads Before and After pieceJan Rezab
 
Mongolian Nomads' Spring Migration-Photographer Timothy Allen
Mongolian Nomads' Spring Migration-Photographer Timothy AllenMongolian Nomads' Spring Migration-Photographer Timothy Allen
Mongolian Nomads' Spring Migration-Photographer Timothy Allenmaditabalnco
 
Customer Service Excellence Programme (Email)
Customer Service Excellence Programme (Email)Customer Service Excellence Programme (Email)
Customer Service Excellence Programme (Email)DavidGMontague
 
Patient forms we welcome you to mattison podiatry group as our patient drs ma...
Patient forms we welcome you to mattison podiatry group as our patient drs ma...Patient forms we welcome you to mattison podiatry group as our patient drs ma...
Patient forms we welcome you to mattison podiatry group as our patient drs ma...Mattison Podiatry Group
 
Competency based feedback system workshop slides chadramowly
Competency based feedback system workshop slides  chadramowlyCompetency based feedback system workshop slides  chadramowly
Competency based feedback system workshop slides chadramowlyChandramowly :
 

En vedette (16)

Millennials are YOUR future!
Millennials are YOUR future!Millennials are YOUR future!
Millennials are YOUR future!
 
Online Marketing : A General Overview
Online Marketing : A General OverviewOnline Marketing : A General Overview
Online Marketing : A General Overview
 
Luisa ramirez
Luisa ramirezLuisa ramirez
Luisa ramirez
 
EY_A vision for growth_EN 2016
EY_A vision for growth_EN 2016EY_A vision for growth_EN 2016
EY_A vision for growth_EN 2016
 
Soft ideation action templates _idea_matrix_board
Soft ideation action templates _idea_matrix_boardSoft ideation action templates _idea_matrix_board
Soft ideation action templates _idea_matrix_board
 
Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...
Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...
Social Media: A Writer's Essentials (A 2016 Toolkit for Creative and Academic...
 
Workshops on sound and moving image preservation hanoi v2
Workshops on sound and moving image preservation hanoi v2Workshops on sound and moving image preservation hanoi v2
Workshops on sound and moving image preservation hanoi v2
 
International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...
International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...
International Year of Sanitation 2008. Indonesia water and Sanitation Magazin...
 
151119 iotlt-talk
151119 iotlt-talk151119 iotlt-talk
151119 iotlt-talk
 
Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...
Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...
Methodal 2016 : Les classes inversées, un phénomène précurseur pour « l’école...
 
9 big steve jobs mistakes
9 big steve jobs mistakes9 big steve jobs mistakes
9 big steve jobs mistakes
 
Mobile Ads Before and After piece
Mobile Ads Before and After pieceMobile Ads Before and After piece
Mobile Ads Before and After piece
 
Mongolian Nomads' Spring Migration-Photographer Timothy Allen
Mongolian Nomads' Spring Migration-Photographer Timothy AllenMongolian Nomads' Spring Migration-Photographer Timothy Allen
Mongolian Nomads' Spring Migration-Photographer Timothy Allen
 
Customer Service Excellence Programme (Email)
Customer Service Excellence Programme (Email)Customer Service Excellence Programme (Email)
Customer Service Excellence Programme (Email)
 
Patient forms we welcome you to mattison podiatry group as our patient drs ma...
Patient forms we welcome you to mattison podiatry group as our patient drs ma...Patient forms we welcome you to mattison podiatry group as our patient drs ma...
Patient forms we welcome you to mattison podiatry group as our patient drs ma...
 
Competency based feedback system workshop slides chadramowly
Competency based feedback system workshop slides  chadramowlyCompetency based feedback system workshop slides  chadramowly
Competency based feedback system workshop slides chadramowly
 

Similaire à Buzzwords 2014 / Overview / part1

Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageKai Sasaki
 
Introdução ao Data Warehouse Amazon Redshift
Introdução ao Data Warehouse Amazon RedshiftIntrodução ao Data Warehouse Amazon Redshift
Introdução ao Data Warehouse Amazon RedshiftAmazon Web Services LATAM
 
Real World Storage in Treasure Data
Real World Storage in Treasure DataReal World Storage in Treasure Data
Real World Storage in Treasure DataKai Sasaki
 
Introdução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftIntrodução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftAmazon Web Services LATAM
 
Ga4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteGa4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteMatt Massie
 
Cassandra
CassandraCassandra
Cassandraexsuns
 
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J..."Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...Dataconomy Media
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Cassandra Tutorial
Cassandra Tutorial Cassandra Tutorial
Cassandra Tutorial Na Zhu
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage systemArunit Gupta
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAMfnothaft
 
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentationSergey Enin
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
Ultimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per secUltimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per secb0ris_1
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon RedshiftAmazon Web Services
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 

Similaire à Buzzwords 2014 / Overview / part1 (20)

Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
 
Introdução ao Data Warehouse Amazon Redshift
Introdução ao Data Warehouse Amazon RedshiftIntrodução ao Data Warehouse Amazon Redshift
Introdução ao Data Warehouse Amazon Redshift
 
Real World Storage in Treasure Data
Real World Storage in Treasure DataReal World Storage in Treasure Data
Real World Storage in Treasure Data
 
Amazon Redshift
Amazon Redshift Amazon Redshift
Amazon Redshift
 
Introdução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon RedshiftIntrodução ao data warehouse Amazon Redshift
Introdução ao data warehouse Amazon Redshift
 
Ga4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteGa4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger institute
 
Cassandra
CassandraCassandra
Cassandra
 
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J..."Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Cassandra Tutorial
Cassandra Tutorial Cassandra Tutorial
Cassandra Tutorial
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
Design for Scalability in ADAM
Design for Scalability in ADAMDesign for Scalability in ADAM
Design for Scalability in ADAM
 
Cassandra presentation
Cassandra presentationCassandra presentation
Cassandra presentation
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
Ultimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per secUltimate journey towards realtime data platform with 2.5M events per sec
Ultimate journey towards realtime data platform with 2.5M events per sec
 
Getting started with Amazon Redshift
Getting started with Amazon RedshiftGetting started with Amazon Redshift
Getting started with Amazon Redshift
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 

Plus de Andrii Gakhov

Let's start GraphQL: structure, behavior, and architecture
Let's start GraphQL: structure, behavior, and architectureLet's start GraphQL: structure, behavior, and architecture
Let's start GraphQL: structure, behavior, and architectureAndrii Gakhov
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Andrii Gakhov
 
Too Much Data? - Just Sample, Just Hash, ...
Too Much Data? - Just Sample, Just Hash, ...Too Much Data? - Just Sample, Just Hash, ...
Too Much Data? - Just Sample, Just Hash, ...Andrii Gakhov
 
Implementing a Fileserver with Nginx and Lua
Implementing a Fileserver with Nginx and LuaImplementing a Fileserver with Nginx and Lua
Implementing a Fileserver with Nginx and LuaAndrii Gakhov
 
Pecha Kucha: Ukrainian Food Traditions
Pecha Kucha: Ukrainian Food TraditionsPecha Kucha: Ukrainian Food Traditions
Pecha Kucha: Ukrainian Food TraditionsAndrii Gakhov
 
Probabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. SimilarityProbabilistic data structures. Part 4. Similarity
Probabilistic data structures. Part 4. SimilarityAndrii Gakhov
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyAndrii Gakhov
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityAndrii Gakhov
 
Вероятностные структуры данных
Вероятностные структуры данныхВероятностные структуры данных
Вероятностные структуры данныхAndrii Gakhov
 
Recurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryRecurrent Neural Networks. Part 1: Theory
Recurrent Neural Networks. Part 1: TheoryAndrii Gakhov
 
Apache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksApache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksAndrii Gakhov
 
Swagger / Quick Start Guide
Swagger / Quick Start GuideSwagger / Quick Start Guide
Swagger / Quick Start GuideAndrii Gakhov
 
API Days Berlin highlights
API Days Berlin highlightsAPI Days Berlin highlights
API Days Berlin highlightsAndrii Gakhov
 
ELK - What's new and showcases
ELK - What's new and showcasesELK - What's new and showcases
ELK - What's new and showcasesAndrii Gakhov
 
Apache Spark Overview @ ferret
Apache Spark Overview @ ferretApache Spark Overview @ ferret
Apache Spark Overview @ ferretAndrii Gakhov
 
Data Mining - lecture 8 - 2014
Data Mining - lecture 8 - 2014Data Mining - lecture 8 - 2014
Data Mining - lecture 8 - 2014Andrii Gakhov
 
Data Mining - lecture 7 - 2014
Data Mining - lecture 7 - 2014Data Mining - lecture 7 - 2014
Data Mining - lecture 7 - 2014Andrii Gakhov
 
Data Mining - lecture 6 - 2014
Data Mining - lecture 6 - 2014Data Mining - lecture 6 - 2014
Data Mining - lecture 6 - 2014Andrii Gakhov
 
Data Mining - lecture 5 - 2014
Data Mining - lecture 5 - 2014Data Mining - lecture 5 - 2014
Data Mining - lecture 5 - 2014Andrii Gakhov
 

Plus de Andrii Gakhov (20)

Let's start GraphQL: structure, behavior, and architecture
Let's start GraphQL: structure, behavior, and architectureLet's start GraphQL: structure, behavior, and architecture
Let's start GraphQL: structure, behavior, and architecture
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
 
Too Much Data? - Just Sample, Just Hash, ...
Buzzwords 2014 / Overview / part1

  • 1. BERLIN BUZZWORDS 2014 SELECTED TALKS OVERVIEW tech talk @ ferret Andrii Gakhov 05/06/2014
  • 2. SAMZA AT LINKEDIN TAKING STREAM PROCESSING TO THE NEXT LEVEL by Martin Kleppmann
  • 3. ABOUT THE AUTHOR • Software engineer at LinkedIn • co-founder of Rapportive (acquired by LinkedIn in 2012) • http://martin.kleppmann.com/ • @martinkl
  • 4. APACHE SAMZA • Apache Samza is a distributed stream processing framework. • http://samza.incubator.apache.org/ • uses Apache Kafka for messaging • LinkedIn uses it in production
  • 5. APACHE KAFKA • A high-throughput distributed messaging system (commit log service). Fast, scalable, durable and distributed. • http://kafka.apache.org/ • A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. • Messages are persisted on disk and replicated within the cluster to prevent data loss. Each broker can handle terabytes of messages without performance impact. • Key-value storage, but only appends are supported for a key
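The "key-value storage, but only appends" bullet can be pictured with a toy model. This is a minimal sketch, not Kafka's API: the class `AppendOnlyLog` and its methods are hypothetical names, and real Kafka adds partitions, replication and disk persistence on top of this idea.

```python
class AppendOnlyLog:
    """Toy in-memory commit log: records are only ever appended."""

    def __init__(self):
        self._records = []  # list of (key, value); the list index is the offset

    def append(self, key, value):
        """Append a record and return its offset."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read(self, offset=0):
        """Return all records from the given offset onward, in order."""
        return self._records[offset:]

    def latest(self, key):
        """Return the most recently appended value for a key, or None."""
        for k, v in reversed(self._records):
            if k == key:
                return v
        return None


log = AppendOnlyLog()
first = log.append("user:1", "signed_up")
second = log.append("user:1", "upgraded")
```

An "update" to a key is just a newer record: older values stay in the log, and a consumer can replay from any offset.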
  • 6. SAMZA VS. STORM • Both systems provide a partitioned stream model, a distributed execution environment, an API for stream processing, fault tolerance, etc. • Similar parallelism model, but Storm uses 1 thread per task by default, while Samza uses single-threaded processes. Samza doesn’t support dynamic rebalancing. • Samza is written in Java/Scala and currently supports only JVM languages • Guaranteed delivery: Samza currently supports only the at-least-once delivery model (exactly-once is planned). • Completely different state management. Instead of using a remote DB for durable storage, each Samza task includes an embedded key-value store, located on the same machine. Changes are replicated. • Samza is better suited for handling keyed data (because it never processes messages in a partition out of order).
  • 7. SAMZA ARCHITECTURE (diagram: several machines, each with a YARN NodeManager, Kafka, and Samza Containers that hold Tasks)
  • 9. ABOUT THE AUTHOR • Engineer at SoundCloud • background in search and distributed systems • http://peter.bourgon.org • @peterbourgon
  • 10. DISTRIBUTED SYSTEM THEORY • Partition-tolerance: the system continues to operate despite message loss due to network and/or node failure • Consistency: all nodes see the same data at the same time • Availability: a guarantee that every request receives a response about whether it was successful or failed
  • 12. EXAMPLES AP (eventual consistency: some nodes could be stale, but not wrong) • Cassandra • Riak • CouchBase • MongoDB CP (consensus protocols) • Paxos (Doozer, Chubby) • Zab (ZooKeeper) • Raft (Consul)
  • 13. CRDTS • CRDTs are data structures for distributed systems • C = Conflict-free • R = Replicated • D = Data • T = Types CRDTs achieve eventual consistency by using CALM / ACID 2.0 principles
  • 14. INCREMENT ONLY COUNTERS Associative: {1} U ({2} U {3}) = ({1} U {2}) U {3} Commutative: {1} U {2} = {2} U {1} Idempotent: {1} U {1} = {1} (diagram: replicas that start with {123} exchange state until all of them hold {123, 456}) Items are unique IDs of users who listened to the track. A user can’t revoke their “choice”.
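The three properties on this slide are what make replicas converge no matter how their states are exchanged. The following is a minimal sketch, assuming the counter is modelled as a grow-only set of listener IDs merged by set union; the function `merge` and the IDs are illustrative and not taken from the talk.

```python
from itertools import permutations


def merge(a, b):
    """Merge two replicas of a grow-only set by taking their union."""
    return a | b


# Three replicas that each saw different listeners (hypothetical user IDs).
r1 = frozenset({123})
r2 = frozenset({456})
r3 = frozenset({123, 789})

# Try every merge order and both groupings; collect the distinct outcomes.
results = set()
for x, y, z in permutations([r1, r2, r3]):
    results.add(merge(merge(x, y), z))
    results.add(merge(x, merge(y, z)))

listeners = next(iter(results))
play_count = len(listeners)  # the "counter" is the size of the merged set
```

Because union is associative, commutative and idempotent, `results` contains a single state, and merging a replica with itself changes nothing.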
  • 15. SOUNDCLOUD EXAMPLE Event • Timestamp (At 2014-05-26 12:04:56.097403 UTC) • User (snoopdogg) • Verb (Reposted) • Identifier (theeconomist/election-day)
  • 16. DATA MODELS • Fan-out-on-write (close to the data consumer): SoundCloud uses Cassandra and gives each user a row in a column family. Reads are fast, but writes take more time when you have a lot of followers. • Fan-in-on-read (close to the data producer): reads are difficult, and nobody builds timelines via fan-in-on-read now. But there is big potential here: a huge storage reduction, keeping the data set in memory, etc.
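The trade-off between the two data models can be sketched in a few lines. This is an illustrative in-memory model only: the users `alice` and `bob`, the follow graph, and all function names are assumptions, and events are simplified to `(timestamp, user, verb, identifier)` tuples.

```python
from collections import defaultdict

# Hypothetical follow graph, stored in both directions.
followers = {"snoopdogg": ["alice", "bob"], "theeconomist": ["alice"]}
following = {"alice": ["snoopdogg", "theeconomist"], "bob": ["snoopdogg"]}

inboxes = defaultdict(list)   # fan-out-on-write: one copy per follower
outboxes = defaultdict(list)  # fan-in-on-read: one copy per producer


def publish_fanout(actor, event):
    """Write cost grows with the number of followers."""
    for follower in followers.get(actor, []):
        inboxes[follower].append(event)


def timeline_fanout(user):
    """Reading is a single lookup of the user's own inbox."""
    return sorted(inboxes[user], reverse=True)


def publish_fanin(actor, event):
    """Writing is a single append to the producer's outbox."""
    outboxes[actor].append(event)


def timeline_fanin(user):
    """Read cost grows with the number of followed producers."""
    events = [e for actor in following.get(user, []) for e in outboxes[actor]]
    return sorted(events, reverse=True)


e1 = (1, "snoopdogg", "reposted", "theeconomist/election-day")
e2 = (2, "theeconomist", "posted", "theeconomist/election-day")
for event in (e1, e2):
    publish_fanout(event[1], event)
    publish_fanin(event[1], event)

stored_fanout = sum(len(v) for v in inboxes.values())
stored_fanin = sum(len(v) for v in outboxes.values())
```

Both models produce the same timeline, but fan-out stores one copy per follower while fan-in stores each event once, which is the storage reduction the slide refers to.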
  • 17. CRDTS SET Events are unique, so use a set! • G-set: can’t delete • 2P-set: add, remove once • OR-set: storage overhead • CRDT sets: S+ = {A B C}, S- = {B}, S = {A C}
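The S+ / S- notation on this slide corresponds to a two-phase set (2P-set): one grow-only set of additions, one grow-only set of removals. A minimal sketch, with a hypothetical class name and method names:

```python
class TwoPhaseSet:
    """2P-set: an add-set (S+) and a remove-set (S-), both grow-only."""

    def __init__(self):
        self.added = set()    # S+
        self.removed = set()  # S-

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only items that were added can be removed, and removal is final.
        if item in self.added:
            self.removed.add(item)

    def value(self):
        """Visible set: S = S+ minus S-."""
        return self.added - self.removed

    def merge(self, other):
        """Merge two replicas by taking the union of each component."""
        merged = TwoPhaseSet()
        merged.added = self.added | other.added
        merged.removed = self.removed | other.removed
        return merged


s = TwoPhaseSet()
for item in ("A", "B", "C"):
    s.add(item)
s.remove("B")
visible = s.value()

s.add("B")  # re-adding has no visible effect: B stays in S-
visible_after_readd = s.value()
```

This shows the "add, remove once" limitation: once an element is in S-, it can never become visible again.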
  • 18. SET • S = actor’s set keys (snoopdogg:outbox) • A, B, C, D = actor:verb:identifier • 1, 2, 3, 4 = timestamp S+ = {A/1 B/2 C/3} S- = {D/4} S = {A/1 B/2 C/3} • Read is easy, write is interesting!
  • 19. EXAMPLE • S+ = {A/1 B/2} S- = {C/3} • Insert D/4 => S+ = {A/1 B/2 D/4} S- = {C/3} • Insert D/4 => S+ = {A/1 B/2 D/4} S- = {C/3} • Insert D/3 => S+ = {A/1 B/2 D/4} S- = {C/3} • Delete D/3 => S+ = {A/1 B/2 D/4} S- = {C/3} • Delete D/4 => S+ = {A/1 B/2 D/4} S- = {C/3} • Delete D/5 => S+ = {A/1 B/2} S- = {C/3 D/5} • Insert D/5 => S+ = {A/1 B/2} S- = {C/3 D/5} • Delete D/6 => S+ = {A/1 B/2} S- = {C/3 D/6}
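The sequence on this slide can be replayed with a small last-writer-wins set. This is a sketch inferred from the slide's behaviour, not Roshi's actual code: the class `LWWSet` is hypothetical, and it assumes an operation is applied only when its timestamp is strictly newer than the newest one recorded for that key in either S+ or S-.

```python
class LWWSet:
    """Timestamped set: each key lives in S+ or S- with its latest timestamp."""

    def __init__(self, adds=None, removes=None):
        self.adds = dict(adds or {})        # S+: key -> timestamp
        self.removes = dict(removes or {})  # S-: key -> timestamp

    def _latest(self, key):
        """Newest timestamp recorded for key in either set."""
        return max(self.adds.get(key, float("-inf")),
                   self.removes.get(key, float("-inf")))

    def insert(self, key, ts):
        if ts <= self._latest(key):
            return False  # stale or duplicate write: no-op
        self.removes.pop(key, None)
        self.adds[key] = ts
        return True

    def delete(self, key, ts):
        if ts <= self._latest(key):
            return False  # stale or duplicate write: no-op
        self.adds.pop(key, None)
        self.removes[key] = ts
        return True

    def members(self):
        return set(self.adds)


s = LWWSet(adds={"A": 1, "B": 2}, removes={"C": 3})
ops = [("insert", "D", 4), ("insert", "D", 4), ("insert", "D", 3),
       ("delete", "D", 3), ("delete", "D", 4), ("delete", "D", 5),
       ("insert", "D", 5), ("delete", "D", 6)]
applied = [getattr(s, op)(key, ts) for op, key, ts in ops]
```

Only the first insert and the deletes at timestamps 5 and 6 change state; every other operation is a no-op, matching the slide line by line.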
  • 20. CRDTS AGAIN • It’s possible to map a fan-in-on-read stream product to a data model that can be implemented with a specific type of CRDT
  • 21. ROSHI • Roshi is an open source distributed storage system for time-series events. • written in Go (5K lines, including 2.3K lines of tests) • implements a novel CRDT set type • uses the Redis ZSET sorted set to store state
  • 22. ARCHITECTURE (diagram layers: Pool, Cluster, Farm) Reads return {A B C}, {A B C} and {A C}: U = {A B C}, ∆ = {B}, so it is possible to do read-repair
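The union and delta on this slide can be computed directly. A minimal sketch, assuming each replica's answer is a plain set of keys; the function `read_with_repair` is a hypothetical name, and a real system would compare timestamped entries rather than bare keys.

```python
def read_with_repair(replicas):
    """Return the union of all replica answers plus what each replica lacks."""
    union = set().union(*replicas)
    # Per-replica delta: the items to write back so the replica catches up.
    repairs = [union - replica for replica in replicas]
    return union, repairs


replicas = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "C"}]
union, repairs = read_with_repair(replicas)
```

The reader gets the full set immediately, and the non-empty delta for the third replica is what a read-repair step would send to it.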