A simple explanation of the basic principles of Distributed Programming with NodeJS. The CAP Theorem is fully explained, with working code that you can try yourself!
Whoami
● Developer since 1988
● XP Coach 2000+
● Co-founder of JUG Torino
● Java Champion since 2005
● CTO @ EF (Education First)
I live in London, love the weather...
Agenda
● Distributed programming
● How does it work, what does it mean
● The CAP theorem
● CAP explained with code
– CA system using two-phase commit
– AP system using sloppy quorums
– CP system using majority quorums
● What next?
● Q&A
Distributed programming
● Any system should deal with two tasks:
– Storage
– Computation
● How do we deal with scale?
● How do we use multiple computers to do what we used to
do on one?
Scalability
● The ability of a system/network/process to:
– handle a growing amount of work
– be enlarged to accommodate new growth
A scalable system continues to meet the needs of its users as the scale increases
Scalability flavours
● size:
– more nodes, more speed
– more nodes, more space
– more data, same latency
● geographic:
– more data centers, quicker response
● administrative:
– more machines, no additional work
How do we scale? partitioning
● Slice the dataset into smaller independent sets
● reduces the impact of dataset growth
– improves performance by limiting the amount of data to be examined
– improves availability by allowing partitions to fail independently (see the sketch below)
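As an illustration I'm adding here (not from the original deck), a minimal Node.js sketch of hash-based partitioning; the node names and the md5-modulo scheme are assumptions:

// Illustrative sketch: hash-based partitioning of keys across nodes.
const crypto = require('crypto');

const nodes = ['node-0', 'node-1', 'node-2'];

// Map a key to a partition by hashing it and taking the hash modulo the node count.
function partitionFor(key) {
  const hash = crypto.createHash('md5').update(key).digest();
  return hash.readUInt32BE(0) % nodes.length;
}

console.log(`'user:42' -> ${nodes[partitionFor('user:42')]}`);
console.log(`'user:43' -> ${nodes[partitionFor('user:43')]}`);

Real systems usually prefer consistent hashing, so that adding a node does not reshuffle almost every key.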
How do we scale? partitioning
● But can also be a source of problems
– what happens if a partition becomes unavailable?
– what if it becomes slower?
– what if it becomes unresponsive?
How do we scale? replication
● Copies of the same data on multiple machines
● Benefits:
– allows more servers to take part in the computation
– improves performance by making additional computing power and bandwidth available
– improves availability by creating copies of the data
How do we scale? replication
● But it's also a source of problems
– there are independent copies of the data
– they need to be kept in sync on multiple machines
● Your system must follow a consistency model (see the sketch below)
[diagram: replicas holding diverging versions of the same data – v4, v5, v7, v8]
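To make the problem concrete, here is a minimal illustrative Node.js sketch (not the talk's code) of naive replication, where a single lost update leaves the copies out of sync:

// Illustrative sketch: three fully replicated nodes, each with its own copy.
class Replica {
  constructor(name) {
    this.name = name;
    this.data = new Map();
  }
  set(key, value) { this.data.set(key, value); }
  get(key) { return this.data.get(key); }
}

const replicas = [new Replica('A'), new Replica('B'), new Replica('C')];

// Naive replication: write to every copy...
replicas.forEach(r => r.set('x', 'v4'));

// ...but if an update reaches only one replica (crash, dropped message),
// the copies diverge:
replicas[0].set('x', 'v5');
console.log(replicas.map(r => `${r.name}=${r.get('x')}`).join(' '));
// => A=v5 B=v4 C=v4  -- this is why you need a consistency model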
Availability
● The proportion of time a system is in functioning conditions
● The system is fault-tolerant
– the ability of your system to behave in a well-defined manner once a fault occurs
● All clients can always read and write
– in distributed systems this is achieved by redundancy
Introducing: performance
● The amount of useful work accomplished compared to the
time and resources used
● Basically:
– short response time for a unit of work
– high rate of processing
– low utilization of resources
Introducing: latency
● The period between the initiation of something and its occurrence
● The time between when something happens and when it has an impact or becomes visible
● Some higher-level examples:
– how long until you become a zombie
after a bite?
– how long until my post is visible
to others?
Consistency
● Any read on a data item X returns a value corresponding
to the result of the most recent write on X.
● Each client always has the same view of the data
● Also known as “Strong Consistency”
Consistency flavours
● Strong consistency
– every replica sees every update in the same order.
– no two replicas may have different values at the same time.
● Weak consistency
– every replica will see every update, but possibly in different
orders.
● Eventual consistency
– every replica will eventually see every update and will
eventually agree on all values.
The CAP theorem
● You cannot have all three :(
● You can select two properties at once
Sorry, this has been mathematically proven and no, it has not been debunked.
The CAP theorem: CA systems!
● You selected consistency
and availability!
● Strict quorum protocols (two/multi-phase commit)
● Most RDBMS
Hey! A network partition will
f**k you up good!
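As a toy illustration of the two-phase commit idea (a sketch I'm adding here, not the talk's actual nodeapp code): every replica must vote yes in the propose phase before anything is committed, so a single unreachable node blocks the write.

// Toy sketch of two-phase commit, not the talk's nodeapp code.
class Replica {
  constructor(name, alive = true) {
    this.name = name;
    this.alive = alive;              // simulate a crashed or partitioned node
    this.data = new Map();           // committed values
    this.staged = new Map();         // values staged in phase 1
  }
  propose(key, value) {              // phase 1: stage the write and vote
    if (!this.alive) return false;   // unreachable nodes cannot vote yes
    this.staged.set(key, value);
    return true;
  }
  commit(key) {                      // phase 2a: make the staged value visible
    this.data.set(key, this.staged.get(key));
    this.staged.delete(key);
  }
  rollback(key) {                    // phase 2b: discard the staged value
    this.staged.delete(key);
  }
}

function set(replicas, key, value) {
  const votes = replicas.map(r => r.propose(key, value));
  if (votes.every(v => v)) {         // unanimity required
    replicas.forEach(r => r.commit(key));
    return 'committed';
  }
  replicas.forEach(r => r.rollback(key));
  return 'rolled back';
}

console.log(set([new Replica('A'), new Replica('B'), new Replica('C')], 'x', 'v1'));
// => committed
console.log(set([new Replica('A'), new Replica('B'), new Replica('C', false)], 'x', 'v1'));
// => rolled back: one dead node and the system stops accepting writes

That refusal is exactly the CA trade-off: the data stays consistent, but availability is lost the moment a partition appears.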
The CAP theorem: AP systems!
● You selected availability
and partition tolerance!
● Sloppy quorums and
conflict resolution protocols
● Amazon Dynamo, Riak,
Cassandra
The CAP theorem: CP systems!
● You selected consistency
and partition tolerance!
● Majority quorum protocols (Paxos, Raft, ZAB)
● Apache Zookeeper,
Google Spanner
NodeJS time!
● Let's write our brand new key value store
● We will code all three different flavours
● We will have many nodes, fully replicated
● No sharding
● We will kill servers!
● We will trigger network
partitions!
– (no worries, it's a simulation!)
AP: sloppy quorums, simplified
[architecture diagram: GET(k) and SET(k,v) requests enter through the nodeapp's QUORUM API, backed by a Storage API and Database; the QUORUM Core exchanges (read)/(repair) and propose(tx)/commit(tx)/rollback(tx) messages with the other nodes]
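Here is a minimal Node.js sketch of the quorum read/write flow with read repair (my simplified stand-in for the sloppy-quorum machinery on the slide; N, W and R are assumed values, and a real sloppy quorum would also use fallback nodes and hinted handoff):

// Illustrative quorum store, not the talk's nodeapp code.
const N = 3, W = 2, R = 2;           // assumed: 3 replicas, write/read quorum of 2

class Replica {
  constructor(name, alive = true) {
    this.name = name; this.alive = alive; this.data = new Map();
  }
  write(key, value, version) {
    if (!this.alive) return false;   // dropped message
    const current = this.data.get(key);
    if (!current || version > current.version) this.data.set(key, { value, version });
    return true;
  }
  read(key) {
    return this.alive ? this.data.get(key) : undefined;
  }
}

const replicas = [new Replica('A'), new Replica('B'), new Replica('C', false)];

function set(key, value, version) {
  const acks = replicas.filter(r => r.write(key, value, version)).length;
  return acks >= W;                  // 2 of 3 is enough: still available with C down
}

function get(key) {
  const answers = replicas.map(r => r.read(key)).filter(Boolean);
  if (answers.length < R) return undefined;
  const newest = answers.reduce((a, b) => (a.version >= b.version ? a : b));
  // read repair: push the newest version back to every replica we can reach
  replicas.forEach(r => r.write(key, newest.value, newest.version));
  return newest.value;
}

console.log(set('x', 'v1', 1));      // => true, despite one dead replica
console.log(get('x'));               // => 'v1'

Because W + R > N, any read quorum overlaps any write quorum, so a read always sees at least one copy of the latest successful write.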
CP: majority quorums (raft, simplified)
[architecture diagram: GET(k) and SET(k,v) requests enter through the nodeapp's RAFT API, backed by a Storage API and Database; the RAFT Core exchanges beat, voteme and history messages with the other nodes. Annotation on the slide: "Urgently needs refactoring!!!!"]
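And a deliberately tiny Node.js sketch of the majority rule at the heart of Raft (my illustration only: no elections, terms, heartbeats or log repair):

// Illustrative majority-quorum commit, nothing like full Raft.
const nodes = [
  { name: 'leader',    alive: true,  log: [] },
  { name: 'follower1', alive: true,  log: [] },
  { name: 'follower2', alive: false, log: [] },  // partitioned away
];
const majority = Math.floor(nodes.length / 2) + 1;

function replicate(entry) {
  let acks = 0;
  for (const node of nodes) {
    if (!node.alive) continue;       // unreachable nodes never acknowledge
    node.log.push(entry);
    acks++;
  }
  // an entry is committed only once a strict majority holds it
  return acks >= majority ? 'committed' : 'not committed';
}

console.log(replicate({ key: 'x', value: 'v1' }));
// => committed (2 of 3); with two nodes down the write would be refused:
//    consistency is preserved at the cost of availability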
What about BASE?
● It's just a way to qualify eventually consistent systems
● BAsic Availability
– The database appears to work most of the time.
● Soft-state
– Stores don’t have to be write-consistent, nor do different
replicas have to be mutually consistent all the time.
● Eventual consistency
– Stores exhibit consistency at some later point (e.g.,
lazily at read time).
What about Lamport clocks?
● It's a mechanism to maintain a distributed notion of time
● Each process maintains a counter
– Whenever a process does work, increment the counter
– Whenever a process sends a message, include the
counter
– When a message is received, set the counter to
max(local_counter, received_counter) + 1
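Those three rules fit in a few lines of Node.js; a minimal illustrative sketch (not the talk's code):

// Illustrative Lamport clock.
class LamportClock {
  constructor() { this.counter = 0; }
  tick() {                           // local work: increment the counter
    return ++this.counter;
  }
  send() {                           // sending is an event too: tick, attach counter
    return { timestamp: this.tick() };
  }
  receive(message) {                 // merge rule: max(local, received) + 1
    this.counter = Math.max(this.counter, message.timestamp) + 1;
    return this.counter;
  }
}

const a = new LamportClock(), b = new LamportClock();
a.tick(); a.tick();                  // a.counter = 2
b.receive(a.send());                 // a sends at 3; b jumps to max(0, 3) + 1 = 4
console.log(b.counter);              // => 4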
What about Vector clocks?
● Maintains an array of N Lamport clocks, one per node
● Whenever a process does work, increment the logical clock
value of the node in the vector
● Whenever a process sends a message, include the full vector
● When a message is received:
– update each element to max(local, received)
– increment the logical clock of the current node in the vector
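A minimal illustrative Node.js sketch, using named nodes instead of array indices:

// Illustrative vector clock.
class VectorClock {
  constructor(me, nodes) {
    this.me = me;
    this.vector = Object.fromEntries(nodes.map(n => [n, 0]));
  }
  tick() {                           // local work: bump our own entry
    this.vector[this.me]++;
  }
  send() {                           // include the full vector in the message
    this.tick();
    return { ...this.vector };
  }
  receive(remote) {                  // element-wise max, then bump our own entry
    for (const node of Object.keys(this.vector)) {
      this.vector[node] = Math.max(this.vector[node], remote[node] ?? 0);
    }
    this.vector[this.me]++;
  }
}

const a = new VectorClock('A', ['A', 'B']);
const b = new VectorClock('B', ['A', 'B']);
a.tick();                            // A: { A: 1, B: 0 }
b.receive(a.send());                 // A sends { A: 2, B: 0 }
console.log(b.vector);               // => { A: 2, B: 1 }

Comparing two vectors element-wise tells you whether one event happened before the other or whether they are concurrent, which is what Dynamo-style stores use to detect conflicting writes.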
What next?
● Learn the lingo and the basics
● Do your homework
● Start playing with these concepts
● It's complicated, but not rocket science
● Be inspired!
The 93 petaflop Sunway TaihuLight is installed at the National Supercomputing Centre in Wuxi. At its peak, the computer can perform around 93,000 trillion calculations per second.
It has more than 10.5 million processing cores and 40,960 nodes and runs on a Linux-based operating system.
There are tradeoffs involved in optimizing for any of these outcomes. For example, a system may achieve a higher throughput by processing larger batches of work thereby reducing operation overhead. The tradeoff would be longer response times for individual pieces of work due to batching.
I find that low latency - achieving a short response time - is the most interesting aspect of performance, because it has a strong connection with physical (rather than financial) limitations. It is harder to address latency using financial resources than the other aspects of performance.
Strong consistency: every replica sees every update in the same order. Updates are made atomically, so that no two replicas may have different values at the same time.
Weak consistency: every replica will see every update, but possibly in different orders.
Eventual consistency: every replica will eventually see every update (i.e. there is a point in time after which every replica has seen a given update), and will eventually agree on all values. Updates are therefore not atomic.
Consistency means that each client always has the same view of the data.
Availability means that all clients can always read and write.
Partition tolerance means that the system works well across physical network partitions.
Consistency is considered strong here:
“Atomic, linearizable, consistency: there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time”
Raft, Paxos and ZooKeeper's ZAB all provide linearizable writes.
This is intuitive since they use a leader which publishes the quorum-voted changes atomically and in order, creating a virtual synchrony.
CockroachDB and Google Spanner also provide linearizability (Google also uses atomic clocks to optimize latency).
explain CAP theorem with a distributed key-value store
move to AP and implement Lamport clock
move to CP and implement consensus
It provides the illusion of behaving like a single system but cannot tolerate network partitions or failures of its parts
Example: Amazon Dynamo (Riak, Cassandra...)
Dynamo prioritizes availability over consistency; it does not guarantee single-copy consistency. Instead, replicas may diverge from each other when values are written; when a key is read, there is a read reconciliation phase that attempts to reconcile differences between replicas before returning the value back to the client.
For many features on Amazon, it is more important to avoid outages than it is to ensure that data is perfectly consistent, as an outage can lead to lost business and a loss of credibility. Furthermore, if the data is not particularly important, then a weakly consistent system can provide better performance and higher availability at a lower cost than a traditional RDBMS.