3. ABOUT THE AUTHOR
• Software engineer at LinkedIn
• Co-founder of Rapportive (acquired by LinkedIn in 2012)
• http://martin.kleppmann.com/
• @martinkl
4. APACHE SAMZA
• Apache Samza is a distributed stream processing framework.
• http://samza.incubator.apache.org/
• uses Apache Kafka for messaging
• LinkedIn uses it in production
5. APACHE KAFKA
• A high-throughput distributed messaging system (commit log
service): fast, scalable, durable, and distributed.
• http://kafka.apache.org/
• A single Kafka broker can handle hundreds of megabytes of
reads and writes per second from thousands of clients.
• Messages are persisted on disk and replicated within the
cluster to prevent data loss. Each broker can handle terabytes
of messages without performance impact.
• Key-value storage, but only appends are supported for a key (the log is append-only)
6. SAMZA VS. STORM
• Both systems provide a partitioned stream model, a distributed execution environment, an API
for stream processing, fault tolerance, etc.
• Similar parallelism model, but Storm uses one thread per task by default, while Samza uses single-
threaded processes. Samza doesn’t support dynamic rebalancing.
• Written in Java/Scala, and currently supports only JVM languages
• Guaranteed delivery: Samza currently supports only an at-least-once delivery model
(exactly-once is planned).
• Completely different state management. Instead of using a remote DB for durable storage,
each Samza task includes an embedded key-value store located on the same machine.
Changes are replicated.
• Samza is better suited for handling keyed data (because it never processes messages in a
partition out of order).
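The local-state idea from the bullets above can be sketched generically; a minimal illustration in Python (all names here are invented for illustration, this is not the Samza API):

```python
# Sketch of the embedded-state pattern: each task keeps its own
# key-value store on its machine and appends every change to a
# changelog stream, so the store can be rebuilt after a failure.
class LocalStateTask:
    def __init__(self):
        self.store = {}       # embedded KV store, local to the task
        self.changelog = []   # replicated change stream (for recovery)

    def put(self, key, value):
        self.store[key] = value
        self.changelog.append((key, value))

    @classmethod
    def recover(cls, changelog):
        task = cls()
        for key, value in changelog:  # replay to rebuild local state
            task.store[key] = value
        return task

t = LocalStateTask()
t.put("clicks:user1", 3)
t.put("clicks:user1", 4)
restored = LocalStateTask.recover(t.changelog)
print(restored.store)  # {'clicks:user1': 4}
```

Reads and writes hit local state at memory/disk speed instead of a remote DB; durability comes from replaying the replicated changelog.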
9. ABOUT THE AUTHOR
• Engineer at SoundCloud
• background in search and distributed systems
• http://peter.bourgon.org
• @peterbourgon
10. DISTRIBUTED SYSTEM THEORY
• Partition tolerance
the system continues to operate despite message loss
due to network and/or node failure
• Consistency
all nodes see the same data at the same time
• Availability
a guarantee that every request receives a response
about whether it succeeded or failed
12. EXAMPLES
AP
• Cassandra
• Riak
• Couchbase
• MongoDB
* eventual consistency: a node may be
stale, but not wrong
CP
• Paxos (Doozer, Chubby)
• Zab (ZooKeeper)
• Raft (Consul)
* Consensus protocols
13. CRDTS
• CRDTs are data structures for distributed systems
• C = Conflict-free
• R = Replicated
• D = Data
• T = Types
CRDTs achieve eventual consistency by following the CALM / ACID 2.0
principles
14. INCREMENT-ONLY COUNTERS
Associative: {1} U ({2} U {3}) = ({1} U {2}) U {3}
Commutative: {1} U {2} = {2} U {1}
Idempotent: {1} U {1} = {1}
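These three merge properties (associativity, commutativity, idempotence: the "ACI" in ACID 2.0) can be checked directly with Python's built-in sets; a minimal sketch:

```python
# Union's ACI properties are what let replicas merge state in any
# grouping, in any order, and any number of times, and still converge.
a, b, c = {1}, {2}, {3}

assert a | (b | c) == (a | b) | c   # associative
assert a | b == b | a               # commutative
assert a | a == a                   # idempotent
print("all three merge properties hold")
```

Because merges can be reordered, regrouped, and retried safely, replicas need no coordination to agree on the final value.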
[Slide animation: several replicas start as empty sets { }, independently add elements 123 and 456, and after exchanging merges all converge to {123, 456}]
Items are unique IDs of the users who listened to the track. A user can’t revoke this “choice”
15. SOUNDCLOUD EXAMPLE
Event
• Timestamp (At 2014-05-26 12:04:56.097403 UTC)
• User (snoopdogg)
• Verb (Reposted)
• Identifier (theeconomist/election-day)
17. CRDT SETS
Events are unique, so use a set!
• G-set: can’t delete
• 2P-set: add, remove once
• OR-set: storage overhead
• CRDT sets
S+ = {A B C}
S- = {B}
S = {A C}
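A minimal 2P-set sketch in Python (class and method names are invented for illustration), reproducing the S+ / S- example above:

```python
class TwoPSet:
    """2P-set: an add set S+ and a remove (tombstone) set S-.
    An element can be removed only once; re-adding has no effect."""
    def __init__(self):
        self.added = set()    # S+
        self.removed = set()  # S-

    def add(self, e):
        self.added.add(e)

    def remove(self, e):
        if e in self.added:
            self.removed.add(e)

    def value(self):
        return self.added - self.removed  # S = S+ \ S-

    def merge(self, other):
        # Merge is just union of both component sets; union is ACI,
        # so replicas converge regardless of delivery order.
        self.added |= other.added
        self.removed |= other.removed

s = TwoPSet()
for e in "ABC":
    s.add(e)
s.remove("B")
print(sorted(s.value()))  # ['A', 'C']
```

The tombstone set is why a removal is permanent, and why the G-set and OR-set variants trade that restriction for either no deletes at all or extra storage.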
18. SET
• S = actor’s set keys (snoopdogg:outbox)
• A, B, C, D = actor:verb:identifier
• 1, 2, 3, 4 = timestamp
S+ = {A/1 B/2 C/3}
S- = {D/4}
S = {A/1 B/2 C/3}
• Read is easy, write is interesting!
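The "interesting" write can be sketched as a timestamped set in Python (the conflict rule, higher timestamp wins, is assumed here from the last-write-wins semantics described; all names are illustrative, this is not Roshi's code):

```python
class LWWSet:
    """Timestamped set: each element lives on either the add side (S+)
    or the remove side (S-), tagged with the timestamp that put it there.
    A write takes effect only if its timestamp beats both sides."""
    def __init__(self):
        self.adds = {}     # S+: element -> timestamp
        self.removes = {}  # S-: element -> timestamp

    def _beats(self, side, other, e, ts):
        # Stale (out-of-order) writes are ignored.
        return ts > side.get(e, -1) and ts > other.get(e, -1)

    def add(self, e, ts):
        if self._beats(self.adds, self.removes, e, ts):
            self.adds[e] = ts
            self.removes.pop(e, None)

    def remove(self, e, ts):
        if self._beats(self.removes, self.adds, e, ts):
            self.removes[e] = ts
            self.adds.pop(e, None)

s = LWWSet()
s.add("A", 1); s.add("B", 2); s.add("C", 3)
s.remove("D", 4)   # D was never added: lands in S-
s.remove("B", 1)   # stale remove (ts 1 < 2): ignored
print(sorted(s.adds))  # ['A', 'B', 'C']
```

Reads just return the add side; every write must compare timestamps, which is where the design work goes.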
20. CRDTS AGAIN
• It’s possible to map the fan-in-on-read stream product
to a data model that can be implemented with a
specific type of CRDT
21. ROSHI
• Roshi is an open source distributed storage system for time-
series events.
• written in Go (~5K lines, including 2.3K lines of tests)
• implements a novel CRDT set type
• uses Redis ZSETs (sorted sets) to store state