3. Locks
• How can we get a "critical section" on a resource?
• Various operations require locks
  • Exclusive operations:
    concurrent operations would break consistency
  • Metadata operations:
    several consecutive operations must be seen "atomically"
4. Concurrency
• How many operations can we process concurrently?
• Concurrent operations work independently of each other
  • No concurrency if two operations need to lock the same resource
• Concurrent operations can be processed "in parallel"
[Diagram: concurrent operations hold locks on different resources (A, B, C); non-concurrent operations all need the lock on Resource A, so they run one after another]
5. Throughput
• How many operations can we process in a second?
• ≈ determined by the length of the critical sections held under locks
[Diagram: on a 1 s timeline, long "Op w/ lock A" critical sections give low throughput (3 ops/sec); short ones give high throughput (6 ops/sec)]
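The relation between critical-section length and throughput can be sketched as simple arithmetic. This is only an illustration: `max_throughput` is a hypothetical helper, and the two durations are assumptions chosen to reproduce the 3 ops/sec and 6 ops/sec figures on the slide.

```python
def max_throughput(critical_section_seconds: float) -> float:
    """Upper bound on ops/sec when every op must hold the same lock this long."""
    return 1.0 / critical_section_seconds


# Assumed durations: each op holds lock A for 1/3 s or for 1/6 s.
low = max_throughput(1 / 3)
high = max_throughput(1 / 6)
print(f"low: {low:.1f} ops/sec, high: {high:.1f} ops/sec")
```

Halving the time spent under the lock doubles the ceiling, which is why the patterns that follow focus on shrinking critical sections.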
6. Patterns of Implementations
• How can we minimize critical sections?
• Let's think about a distributed key-value store
  • Key: a UUID
  • Value: metadata (size, index, ...) and a file (binary)
[Diagram: Node X (master) and Node Y (slave) each hold File A, File B, metadata A, and metadata B; X replicates to Y]
7. Patterns of Implementations
• Patterns:
  1. Naive Giant Lock
  2. Metadata Giant Lock + Simple Resource Lock
  3. Reference Counting Lock
  4. Reference Counting Lock + Async Operation Pipeline
[Diagram: Node X (master) and Node Y (slave) each hold File A, File B, metadata A, and metadata B; X replicates to Y]
8. Naive Giant Lock (1a)
Appending data to File A
• Lock the entire storage for any operation
  • until replication finishes
Step 1: Lock the entire storage on X
[Diagram: Node X and Node Y each hold File A, File B, metadata A, and metadata B; X replicates to Y]
9. Naive Giant Lock (1b)
Appending data to File A
Step 2: Edit file A
10. Naive Giant Lock (1c)
Appending data to File A
Step 3: Edit metadata A
11. Naive Giant Lock (1d)
Appending data to File A
Step 4: Send a request to replicate the operation
[Diagram: the updated File A and metadata A are sent from Node X to Node Y]
12. Naive Giant Lock (1e)
Appending data to File A
Step 5: Release the lock on Y and respond to X
13. Naive Giant Lock (1f)
Appending data to File A
Step 6: Release the lock on X
14. Naive Giant Lock (2a)
Adding a resource C
• Lock the entire storage for any operation
  • until replication finishes
Step 1: Lock the entire storage on X
[Diagram: Node X and Node Y each hold File A, File B, metadata A, and metadata B; X replicates to Y]
15. Naive Giant Lock (2b)
Adding a resource C
Step 2: Add a file C on X
16. Naive Giant Lock (2c)
Adding a resource C
Step 3: Add metadata C using the content of file C (e.g., ctime, checksum)
17. Naive Giant Lock (2d)
Adding a resource C
Step 4: Send a request to replicate the operation
[Diagram: File C and metadata C are sent from Node X to Node Y]
18. Naive Giant Lock (2e)
Adding a resource C
Step 5: Release the lock on Y and respond to X
[Diagram: File C and metadata C now exist on both nodes]
19. Naive Giant Lock (2f)
Adding a resource C
Step 6: Release the lock on X
[Diagram: File C and metadata C exist on both nodes]
20. Naive Giant Lock
• Pros:
  • Very easy to implement, understand, and maintain
• Cons:
  • Very poor throughput: each operation runs entirely inside the critical section
  • Very poor concurrency: all operations on every resource are exclusive
• OK only when the total request rate is below about 1 to 3 req/sec
  • Local operation: ~10 ms, replication: ~200 ms
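A minimal sketch of the Naive Giant Lock pattern, assuming an in-memory store. `GiantLockStore`, its methods, and the `replicate` callable are hypothetical names, not taken from any real implementation. With the figures above, one operation holds the lock for about 10 ms + 200 ms = 210 ms, so the ceiling is roughly 1 / 0.21 s, about 4.8 operations per second, across all resources.

```python
import threading


class GiantLockStore:
    """Key-value store guarded by one lock (hypothetical names throughout)."""

    def __init__(self, replicate):
        self._lock = threading.Lock()  # the single "giant" lock
        self._files = {}               # key -> bytes
        self._metadata = {}            # key -> dict
        self._replicate = replicate    # assumed: blocks until node Y acknowledges

    def append(self, key, data):
        with self._lock:                                  # 1. lock the entire storage
            content = self._files.get(key, b"") + data    # 2. edit the file
            self._files[key] = content
            self._metadata[key] = {"size": len(content)}  # 3. edit the metadata
            self._replicate(("append", key, data))        # 4-5. replicate and wait
        # 6. the lock is released here, only after replication has finished

    def read(self, key):
        with self._lock:
            return self._files[key]


sent = []  # stands in for node Y in this sketch
store = GiantLockStore(replicate=sent.append)
store.append("A", b"P1")
store.append("A", b"P2")
print(store.read("A"), len(sent))
```

Because `replicate` runs inside the `with` block, an append to File A also blocks every operation on File B for the full replication round trip.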
22. Metadata Giant Lock + Simple Resource Lock (1a)
Appending data to File A
• Adding, deleting, or checking the existence of metadata requires the giant lock
• Updating metadata or a file requires a lock on that metadata/file
  • the metadata is used as the key of a resource
Step 1: Lock the entire metadata set
[Diagram: Node X and Node Y each hold File A, File B, metadata A, and metadata B; X replicates to Y]
23. Metadata Giant Lock + Simple Resource Lock (1b)
Appending data to File A
Step 2: Check metadata A, and lock it
24. Metadata Giant Lock + Simple Resource Lock (1c)
Appending data to File A
Step 3: Release the lock on the entire metadata set
25. Metadata Giant Lock + Simple Resource Lock (1d)
Appending data to File A
Step 4: Edit file A
26. Metadata Giant Lock + Simple Resource Lock (1e)
Appending data to File A
Step 5: Edit metadata A
27. Metadata Giant Lock + Simple Resource Lock (1f)
Appending data to File A
Step 6: Send a request to replicate the operation
[Diagram: the updated File A and metadata A are sent from Node X to Node Y]
28. Metadata Giant Lock + Simple Resource Lock (1g)
Appending data to File A
Step 7: On Y, check metadata A and lock it
29. Metadata Giant Lock + Simple Resource Lock (1h)
Appending data to File A
Step 8: On Y, release the lock on the entire metadata set, and edit A
30. Metadata Giant Lock + Simple Resource Lock (1i)
Appending data to File A
Step 9: Release the lock on Y and respond to X
31. Metadata Giant Lock + Simple Resource Lock (1j)
Appending data to File A
Step 10: Release the lock on A
32. Metadata Giant Lock + Simple Resource Lock (2a)
Adding a resource C
• Adding, deleting, or checking the existence of metadata requires the giant lock
• Updating metadata or a file requires a lock on that metadata/file
  • the metadata is used as the key of a resource
Step 1: Lock the entire metadata set
[Diagram: Node X and Node Y each hold File A, File B, metadata A, and metadata B; X replicates to Y]
33. Metadata Giant Lock + Simple Resource Lock (2b)
Adding a resource C
Step 2: Add a file C on X
34. Metadata Giant Lock + Simple Resource Lock (2c)
Adding a resource C
Step 3: Add a file C on X
35. Metadata Giant Lock + Simple Resource Lock (2d)
Adding a resource C
Step 4: Add metadata C using the content of file C (e.g., ctime, checksum)
36. Metadata Giant Lock + Simple Resource Lock (2e)
Adding a resource C
Step 5: Send a request to replicate the operation
[Diagram: File C and metadata C are sent from Node X to Node Y]
37. Metadata Giant Lock + Simple Resource Lock (2f)
Adding a resource C
Step 6: Release the lock on Y and respond to X
[Diagram: File C and metadata C now exist on both nodes]
38. Metadata Giant Lock + Simple Resource Lock (2g)
Adding a resource C
Step 7: Release the lock on X
[Diagram: File C and metadata C exist on both nodes]
39. Metadata Giant Lock + Simple Resource Lock
• Pros:
  • Still easy to implement and understand
  • Better concurrency for updating resources
• Cons:
  • Poor throughput: each operation on a resource runs entirely inside the critical section
  • Poor concurrency for add/delete operations:
    adding or deleting resources requires the giant exclusive lock
• Fits workloads with many concurrent updates
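The two lock levels can be sketched as follows, assuming an in-memory store. `MetadataLockStore`, the entry layout, and the `replicate` callable are hypothetical names chosen for illustration. The step comments follow the append walkthrough on node X.

```python
import threading


class MetadataLockStore:
    def __init__(self, replicate):
        self._meta_lock = threading.Lock()  # giant lock over the metadata set
        self._entries = {}                  # key -> {"lock", "meta", "file"}
        self._replicate = replicate         # assumed: blocks until node Y acknowledges

    def add(self, key, data):
        # Adding a resource holds the giant lock for the whole operation.
        with self._meta_lock:
            if key in self._entries:
                raise KeyError(f"{key} already exists")
            self._entries[key] = {
                "lock": threading.Lock(),
                "meta": {"size": len(data)},
                "file": data,
            }
            self._replicate(("create", key, data))

    def append(self, key, data):
        with self._meta_lock:            # 1. lock the entire metadata set
            entry = self._entries[key]   # 2. check the metadata ...
            entry["lock"].acquire()      #    ... and lock the resource
        # 3. the giant lock is released here; only this resource stays locked
        try:
            entry["file"] += data                        # 4. edit the file
            entry["meta"]["size"] = len(entry["file"])   # 5. edit the metadata
            self._replicate(("append", key, data))       # 6. replicate and wait
        finally:
            entry["lock"].release()                      # release the lock on the resource


sent = []  # stands in for node Y in this sketch
store = MetadataLockStore(replicate=sent.append)
store.add("A", b"P0")
store.append("A", b"P1")
print(store._entries["A"]["file"], store._entries["A"]["meta"])
```

In this sketch, appends to different keys replicate concurrently, while `add` still serializes everything behind the giant lock. A thread waiting for a busy resource lock also keeps the giant lock while it waits, which matches the order of steps 2 and 3.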
41. Reference Counting Lock (1a)
Appending data to File A
• A dictionary of lock objects, with reference counting
• All operations on a resource require locking its lock object
• Adding/deleting a lock object to/from the dictionary requires a lock on the dictionary
Step 1: Get the lock object for A, or create it if missing, then lock it
[Diagram: Node X holds a lock object for A with reference count 1; both nodes hold File A, File B, metadata A, and metadata B]
42. Reference Counting Lock (1b)
Appending data to File A
Step 2: Edit A
[Diagram: lock on A at Node X, reference count 1]
43. Reference Counting Lock (1c)
Appending data to File A
Step 3: Send a request to replicate
[Diagram: lock on A at Node X, reference count 1; lock on A at Node Y, reference count 1]
44. Reference Counting Lock (1d)
Appending data to File A
Step 4: On Y, release and decrement the lock, delete it if the counter is 0, then respond to X
[Diagram: lock on A at Node X, reference count 1; lock on A at Node Y, reference count 0]
45. Reference Counting Lock (1e)
Appending data to File A
Step 5: On X, release and decrement the lock, and delete it if the counter is 0
46. Reference Counting Lock (2a)
Adding a resource C
• A dictionary of lock objects, with reference counting
• All operations on a resource require locking its lock object
• Adding/deleting a lock object to/from the dictionary requires a lock on the dictionary
Step 1: Get the lock object for C, or create it if missing, then lock it
[Diagram: Node X holds a lock object for C with reference count 1]
47. Reference Counting Lock (2b)
Adding a resource C
Step 2: Add a file C on X
[Diagram: lock on C at Node X, reference count 1]
48. Reference Counting Lock (2c)
Adding a resource C
Step 3: Add metadata C using the content of file C (e.g., ctime, checksum)
[Diagram: lock on C at Node X, reference count 1]
49. Reference Counting Lock (2d)
Adding a resource C
Step 4: Send a request to replicate the operation
[Diagram: lock on C at Node X, reference count 1; lock on C at Node Y, reference count 1]
50. Reference Counting Lock (2e)
Adding a resource C
Step 5: On Y, release and decrement the lock, delete it if the counter is 0, then respond to X
[Diagram: lock on C at Node X, reference count 1; lock on C at Node Y, reference count 0]
51. Reference Counting Lock (2f)
Adding a resource C
Step 6: On X, release and decrement the lock, and delete it if the counter is 0
[Diagram: File C and metadata C exist on both nodes]
52. Reference Counting Lock
• Pros:
  • Better concurrency for any operation on resources:
    adding/deleting lock dictionary entries is very lightweight
• Cons:
  • Poor throughput: each operation on a resource runs entirely inside the critical section
  • Somewhat complex locking code with reference counting,
    but it is required to prevent resource leaks
• Fits workloads with many concurrent operations
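One way to sketch the lock dictionary with reference counting is shown below. `RefCountedLocks`, `hold`, and `count` are hypothetical names. The dictionary lock is held only while looking up, creating, or deleting an entry, never while waiting for the per-resource lock.

```python
import threading
from contextlib import contextmanager


class RefCountedLocks:
    def __init__(self):
        self._dict_lock = threading.Lock()  # guards the dictionary itself
        self._entries = {}                  # key -> [lock, reference count]

    @contextmanager
    def hold(self, key):
        with self._dict_lock:
            entry = self._entries.get(key)
            if entry is None:               # create the lock object if missing
                entry = [threading.Lock(), 0]
                self._entries[key] = entry
            entry[1] += 1                   # count holders and waiters
        entry[0].acquire()                  # may block; the dictionary lock is free
        try:
            yield
        finally:
            entry[0].release()
            with self._dict_lock:
                entry[1] -= 1
                if entry[1] == 0:           # nobody holds or waits: drop the entry
                    del self._entries[key]

    def count(self, key):
        with self._dict_lock:
            entry = self._entries.get(key)
            return entry[1] if entry else 0


locks = RefCountedLocks()
with locks.hold("A"):
    inside = locks.count("A")   # the lock object exists while it is in use
after = locks.count("A")        # and is removed once the count drops to 0
print(inside, after)
```

Deleting entries at count 0 keeps the dictionary from growing with every key ever touched, which is the resource leak the reference counting is there to prevent.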
54. Reference Counting Lock w/o Replication Protection (1a)
Appending data to File A
Step 1 (Thread T1): Get the lock object for A, or create it if missing, then lock it
[Diagram: lock on A at Node X, reference count 1; both nodes hold File A, File B, metadata A, and metadata B]
55. Reference Counting Lock w/o Replication Protection (1b)
Appending data to File A
Step 2 (Thread T2): Get the lock object for A, then try to lock it, but it has to wait
[Diagram: lock on A at Node X, reference count 2]
56. Reference Counting Lock w/o Replication Protection (1c)
Appending data to File A
Step 3 (Thread T1): Edit A to append P1
[Diagram: Node X holds File A (P1); lock on A, reference count 2]
57. Reference Counting Lock w/o Replication Protection (1d)
Appending data to File A
Step 4 (Thread T1): Release the lock on A
... and T1 is about to replicate the operation
(but the thread is not scheduled on a CPU)
[Diagram: Node X holds File A (P1); lock on A, reference count 1]
58. Reference Counting Lock w/o Replication Protection (1e)
Appending data to File A
Step 5 (Thread T2): Get the lock on A, then edit A to append P2
[Diagram: Node X holds File A (P1, P2); lock on A, reference count 1]
59. Reference Counting Lock w/o Replication Protection (1f)
Appending data to File A
Step 6 (Thread T2): Release the lock
... and T2 is about to replicate the operation
(and it IS scheduled on a CPU)
[Diagram: Node X holds File A (P1, P2)]
60. Reference Counting Lock w/o Replication Protection (1g)
Appending data to File A
Step 7: The request from T2 reaches Y first; Y gets the lock on A and appends P2
[Diagram: Node X holds File A (P1, P2); Node Y holds File A (P2), lock on A with reference count 1]
61. Reference Counting Lock w/o Replication Protection (1h)
Appending data to File A
Step 8: Then the request from T1 arrives; Y gets the lock on A and appends P1
[Diagram: Node X holds File A (P1, P2); Node Y holds File A (P2, P1), lock on A with reference count 1]
62. Reference Counting Lock w/o Replication Protection (1h)
Appending data to File A
Step 8: Then the request from T1 arrives; Y gets the lock on A and appends P1
[Diagram: Node X holds File A (P1, P2); Node Y holds File A (P2, P1)]
Result: File A is INCONSISTENT between Node X and Node Y
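The outcome of this interleaving can be replayed deterministically. This is only a sketch of the scenario, not a real race: `apply_appends` is a hypothetical helper, and the two orders are written out by hand to match the walkthrough.

```python
def apply_appends(payloads):
    """Apply append operations in the given order and return the file content."""
    content = []
    for payload in payloads:
        content.append(payload)
    return content


order_on_x = ["P1", "P2"]       # order in which X applied the appends under its lock
arrival_on_y = ["P2", "P1"]     # T2's replication request overtakes T1's

file_a_on_x = apply_appends(order_on_x)
file_a_on_y = apply_appends(arrival_on_y)
print(file_a_on_x, file_a_on_y, file_a_on_x == file_a_on_y)
```

Appends do not commute, so replication sent after the lock is released needs some other mechanism to preserve the order in which X applied the operations.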
63. ... But, We *Seriously* Need
MORE Throughput !!!!
64. Reference Counting Lock w/ Async Operation Pipeline (1a)
Appending data to File A
Step 1 (Thread T1): Get the lock object for A, or create it if missing, then lock it
[Diagram: lock on A at Node X, reference count 1; operations pipeline to Y: empty]
65. Reference Counting Lock w/ Async Operation Pipeline (1b)
Appending data to File A
Step 2 (Thread T2): Get the lock object for A, then try to lock it, but it has to wait
[Diagram: lock on A at Node X, reference count 2; operations pipeline to Y: empty]
66. Reference Counting Lock w/ Async Operation Pipeline (1c)
Appending data to File A
Step 3 (Thread T1): Edit A to append P1,
and enqueue an operation to add P1 on A
[Diagram: Node X holds File A (P1); lock on A, reference count 2; operations pipeline to Y: ADD(A, P1)]
67. Reference Counting Lock w/ Async Operation Pipeline (1d)
Appending data to File A
Step 4 (Thread T1): Release the lock on A,
and wait for the callback from "ADD(A, P1)" to be invoked
[Diagram: Node X holds File A (P1); lock on A, reference count 1; operations pipeline to Y: ADD(A, P1)]
68. Reference Counting Lock w/ Async Operation Pipeline (1e)
Appending data to File A
Step 5 (Thread T2): Get the lock on A, then edit A to append P2,
and enqueue an operation to add P2 on A
[Diagram: Node X holds File A (P1, P2); lock on A, reference count 1; operations pipeline to Y: ADD(A, P1), ADD(A, P2)]
69. Reference Counting Lock w/ Async Operation Pipeline (1f)
Appending data to File A
Step 6 (Thread T2): Release the lock on A,
and wait for the callback from "ADD(A, P2)" to be invoked
[Diagram: Node X holds File A (P1, P2); operations pipeline to Y: ADD(A, P1), ADD(A, P2)]
70. Reference Counting Lock w/ Async Operation Pipeline (1g)
Appending data to File A, and adding a resource C
Step 7: Other operations can be enqueued into the pipeline
[Diagram: Node X holds File A (P1, P2), File C, and metadata C; operations pipeline to Y: ADD(A, P1), ADD(A, P2), CREATE(C)]
71. Reference Counting Lock w/ Async Operation Pipeline (1h)
Appending data to File A, and adding a resource C
Step 8 (background worker threads): Send the pipelined operations to Y
as a batch request, in the order they were requested
[Diagram: operations pipeline to Y: ADD(A, P1), ADD(A, P2), CREATE(C); operations to be applied on Node Y: ADD(A, P1), ADD(A, P2), CREATE(C)]
72. Reference Counting Lock w/ Async Operation Pipeline (1i)
Appending data to File A, and adding a resource C
Step 9: Node Y applies the operations on A and C
and responds to X
[Diagram: Node Y now holds File A (P1, P2), File C, and metadata C; operations to be applied: ADD(A, P1), ADD(A, P2), CREATE(C)]
73. Reference Counting Lock w/ Async Operation Pipeline (1j)
Appending data to File A, and adding a resource C
Step 10 (background worker threads): Call the callbacks of the finished operations
[Diagram: both nodes hold File A (P1, P2), File C, and metadata C; operations to be applied on Node Y: empty]
74. Reference Counting Lock w/ Asynchronous Operation Pipeline
• Pros:
  • Better throughput and concurrency for every operation:
    local operations are fast enough, and
    remote operations are processed in micro-batches
• Cons:
  • Hard to implement:
    serializable operations, queues per peer, background worker threads,
    and callback management
• Fits heavy traffic (but overkill in many cases...)
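A minimal sketch of the pipeline for a single peer, assuming one FIFO queue and one background worker. `ReplicationPipeline`, `enqueue`, `close`, `max_batch`, and the `send_batch` callable are hypothetical names. To keep the order seen on X, a caller would enqueue while still holding the resource lock, then release it and wait for the callback.

```python
import queue
import threading


class ReplicationPipeline:
    def __init__(self, send_batch, max_batch=100):
        self._queue = queue.Queue()      # FIFO: keeps the requested order
        self._send_batch = send_batch    # assumed: sends ops to Y, returns after Y responds
        self._max_batch = max_batch
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, op, callback):
        self._queue.put((op, callback))

    def close(self):
        self._queue.put(None)            # sentinel: drain what is queued, then stop
        self._worker.join()

    def _run(self):
        while True:
            item = self._queue.get()     # block until there is work
            if item is None:
                return
            batch, stop = [item], False
            while len(batch) < self._max_batch:
                try:
                    nxt = self._queue.get_nowait()   # gather a micro-batch
                except queue.Empty:
                    break
                if nxt is None:
                    stop = True
                    break
                batch.append(nxt)
            self._send_batch([op for op, _ in batch])   # one request to Y
            for op, callback in batch:
                callback(op)                            # notify the waiting callers
            if stop:
                return


received = []   # stands in for node Y applying the operations
done = []       # operations whose callbacks have fired
pipeline = ReplicationPipeline(send_batch=received.extend)
pipeline.enqueue(("ADD", "A", "P1"), done.append)
pipeline.enqueue(("ADD", "A", "P2"), done.append)
pipeline.enqueue(("CREATE", "C"), done.append)
pipeline.close()
print(received)
```

The slow replication round trip now happens outside every resource lock, and the single queue per peer is what keeps ADD(A, P1) ahead of ADD(A, P2) on Y.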
75. Conclusion
There's no way to achieve
high-throughput and highly concurrent systems
other than to
1. have fine-grained locks
2. execute operations outside of critical sections
without losing consistency!
76. Watch Your Traffic Carefully,
Then Implement "Just Enough" Locks & Concurrency
For Your Workload!
Thanks!
@tagomoris