Transactions with Replicated Data
Distributed Software Systems
One copy serializability
• Replicated transactional service
  – Each replica manager provides concurrency control and recovery of its own data items in the same way as it would for non-replicated data
• Effects of transactions performed by various clients on replicated data items are the same as if they had been performed one at a time on a single data item
• Additional complications: failures, network partitions
  – Failures should be serialized with respect to transactions, i.e. any failure observed by a transaction must appear to have happened before that transaction started
Instructor’s Guide for Coulouris, Dollimore and Kindberg Distributed Systems: Concepts and Design Edn. 2 (2nd impression) © Addison-Wesley Publishers 1994 F200
Figure 14.18 Replicated transactional service. (Diagram: two clients with front ends, running transactions T and U, issue GetBalance(A) and Deposit(B,3) to replica managers holding copies of A and B.)
Replication Schemes
• Read one / write all
  – Cannot handle network partitions
• Schemes that can handle network partitions
  – Available copies with validation
  – Quorum consensus
  – Virtual partition
Read one/Write All
• One copy serializability
  – Each write operation sets a write lock at each replica manager
  – Each read sets a read lock at one replica manager
• Two phase commit
  – Two-level nested transaction
    ◦ Coordinator -> Workers
    ◦ If either the coordinator or a worker is a replica manager, it has to communicate with the other replica managers
• Primary copy replication
  – ALL client requests are directed to a single primary server
    ◦ Different from scheme discussed earlier
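The locking rule above can be illustrated with a minimal Python sketch. It is not the textbook's implementation: the `ReplicaManager` class, its lock tables and the `read_one` / `write_all` helpers are hypothetical names introduced here, and the sketch simply refuses a conflicting lock instead of making the transaction wait.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaManager:
    """Hypothetical replica manager with a simple per-item lock table."""
    name: str
    read_locks: dict = field(default_factory=dict)   # item -> set of transaction ids
    write_locks: dict = field(default_factory=dict)  # item -> transaction id

    def lock_read(self, txn, item):
        holder = self.write_locks.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction holds the write lock
        self.read_locks.setdefault(item, set()).add(txn)
        return True

    def lock_write(self, txn, item):
        holder = self.write_locks.get(item)
        other_readers = self.read_locks.get(item, set()) - {txn}
        if (holder is not None and holder != txn) or other_readers:
            return False  # conflicting read or write lock
        self.write_locks[item] = txn
        return True


def read_one(txn, item, replicas):
    """A read lock at ONE replica manager is sufficient."""
    return any(rm.lock_read(txn, item) for rm in replicas)


def write_all(txn, item, replicas):
    """A write lock is needed at EVERY replica manager.

    Assumption: on failure a real system would wait or abort and release
    any locks already taken; this sketch only reports the conflict.
    """
    return all(rm.lock_write(txn, item) for rm in replicas)


rms = [ReplicaManager("X"), ReplicaManager("Y")]
t_reads_a = read_one("T", "A", rms)     # T read-locks A at X only
u_writes_a = write_all("U", "A", rms)   # U needs X too, so it conflicts with T
```

Because every write must lock every copy, a write always meets any read lock on the same item, wherever that single read lock was placed.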
Available copies replication
• Can tolerate some replica managers being unavailable, either because they have failed or because of a communication failure
• Reads can be performed by any available replica manager, but writes must be performed by all available replica managers
• Normal case is like read one/write all
  – As long as the set of available replica managers does not change during a transaction
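As a rough illustration of the read and write rules above, the following Python sketch reads from any available copy and writes to every available copy. The `Copy` class, the helper names and the starting balance of 10 are assumptions made for the example; the replica manager names M, P and N are those used for item B in Figure 14.19.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """Hypothetical replica manager holding one copy of a data item."""
    name: str
    value: int
    available: bool = True


def read_any(copies):
    """Read from the first available replica manager."""
    for c in copies:
        if c.available:
            return c.value
    raise RuntimeError("no copy available")


def write_all_available(copies, value):
    """Apply the write at every available replica manager; return their names."""
    updated = []
    for c in copies:
        if c.available:
            c.value = value
            updated.append(c.name)
    if not updated:
        raise RuntimeError("no copy available")
    return updated


# Item B is replicated at M, P and N; the balance of 10 is an assumed value.
b_copies = [Copy("M", 10), Copy("P", 10), Copy("N", 10)]
b_copies[2].available = False                                     # N fails
updated = write_all_available(b_copies, read_any(b_copies) + 3)   # Deposit(B, 3)
```

Here the deposit is applied at M and P only, and the failed copy at N is left stale until it recovers.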
Figure 14.19 Available copies. (Diagram: transaction T performs GetBalance(A) and Deposit(B,3); transaction U performs GetBalance(B) and Deposit(A,3). Replica managers X and Y hold copies of A; M, P and N hold copies of B.)
Available copies replication
• Failure case
  – One copy serializability requires that failures and recoveries be serialized with respect to transactions
  – This is not achieved when different transactions make conflicting failure observations
  – The example shows that local concurrency control is not enough
  – An additional concurrency control procedure (called local validation) has to be performed to ensure correctness
• Available copies with local validation assumes no network partition, i.e. functioning replica managers can communicate with one another
Local validation - example
• Assume X fails just after T has performed GetBalance and N fails just after U has performed GetBalance
• Assume X and N fail before T and U have performed their Deposit operations
  – T’s Deposit will be performed at M and P, while U’s Deposit will be performed at Y
  – Concurrency control on A at X does not prevent U from updating A at Y; similarly, concurrency control on B at N does not prevent T from updating B at M and P
  – Local concurrency control is not enough!
Local validation cont’d
• T has read from an item at X, so X’s failure must be after T
• T observes the failure of N, so N’s failure must be before T
  – N fails -> T reads A at X; T writes B at M and P -> T commits -> X fails
  – Similarly, for U: X fails -> U reads B at N; U writes A at Y -> U commits -> N fails
Local validation cont’d
• Local validation ensures that such incompatible sequences cannot both occur
• Before a transaction commits, it checks for failures (and recoveries) of replica managers of data items it has accessed
• In the example, if T validates before U, T would check that N is still unavailable and that X, M and P are available. If so, it can commit
• U’s validation would fail because N has already failed
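The commit-time check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `validate` and the use of plain sets of replica manager names are hypothetical, and the timing follows the scenario in which N has failed before T validates and X fails before U validates.

```python
def validate(observed_failed, accessed, currently_failed):
    """Return True if the transaction may commit.

    observed_failed:  managers the transaction found unavailable
    accessed:         managers it actually read from or wrote to
    currently_failed: managers that are unavailable at validation time
    """
    still_failed = observed_failed <= currently_failed      # no unexpected recovery
    still_up = accessed.isdisjoint(currently_failed)        # no unexpected failure
    return still_failed and still_up


# T read A at X, wrote B at M and P, and observed that N had failed.
t_ok = validate({"N"}, {"X", "M", "P"}, currently_failed={"N"})

# U read B at N, wrote A at Y, and observed that X had failed.
# By the time U validates, both N and X are down.
u_ok = validate({"X"}, {"N", "Y"}, currently_failed={"N", "X"})
```

T passes because its view of failures is still accurate; U is rejected because N, which it read from, has failed, so only one of the two incompatible orderings can commit.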
Figure 14.20 Network partition. (Diagram: two clients with front ends, running transactions T and U, issue Deposit(B,3) and Withdraw(B,4); a network partition divides the replica managers holding copies of B.)
Handling Network Partitions
• Network partitions separate replica managers into two or more subgroups, in such a way that the members of a subgroup can communicate with one another but members of different subgroups cannot communicate
• Optimistic approaches
  – Available copies with validation
• Pessimistic approaches
  – Quorum consensus
Available Copies With Validation
• Available copies algorithm applied within each partition
  – Maintains availability for Read operations
• When the partition is repaired, possibly conflicting transactions in separate partitions are validated
  – The effects of a committed transaction that is now aborted on validation will have to be undone
    ◦ Only feasible for applications where such compensating actions can be taken
Available copies with validation cont’d
• Validation
  – Version vectors (detect Write-Write conflicts)
  – Precedence graphs: each partition maintains a log of data items affected by the Read and Write operations of transactions
  – The log is used to construct a precedence graph whose nodes are transactions and whose edges represent conflicts between Read and Write operations
    ◦ The graph corresponding to each individual partition has no cycles
  – If there are cycles in the combined graph, validation fails
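Precedence-graph validation reduces to cycle detection, which the following Python sketch illustrates with a depth-first search. The transaction names T1 to T4 and the edge lists are invented for the example, and how the cross-partition conflict edges are derived from the logs is assumed rather than shown.

```python
def has_cycle(edges):
    """Detect a cycle in a directed graph given as (from, to) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:       # back edge: cycle found
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


# Hypothetical logs: each partition on its own is serializable (acyclic).
partition_1 = [("T1", "T2")]
partition_2 = [("T3", "T4")]
# Assumed conflict edges discovered when the partition is repaired.
cross_edges = [("T2", "T3"), ("T4", "T1")]

validation_fails = has_cycle(partition_1 + partition_2 + cross_edges)
```

In this example each partition's graph is acyclic, but the merged graph contains the cycle T1 -> T2 -> T3 -> T4 -> T1, so validation fails and some transaction would have to be undone.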
Quorum consensus
• A quorum is a subgroup of replica managers whose size gives it the right to carry out operations
• Majority voting is one instance of a quorum consensus scheme
  – R + W > total number of votes in group (R = votes in a read quorum, W = votes in a write quorum)
  – W > half the total votes
  – Ensures that each read quorum intersects a write quorum, and that any two write quora intersect
• Each replica has a version number that is used to detect whether the replica is up to date
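The two inequalities and the role of version numbers can be checked with a small Python sketch. The function names and the `(version, value)` tuple representation of a copy are assumptions made for illustration.

```python
def valid_quorum_sizes(r, w, total_votes):
    """Check R + W > total votes and W > half the total votes."""
    return r + w > total_votes and 2 * w > total_votes


def latest_copy(read_quorum):
    """Pick the most up-to-date copy from a read quorum.

    read_quorum is a list of (version_number, value) pairs; because every
    read quorum overlaps every write quorum, at least one pair carries the
    latest version.
    """
    return max(read_quorum, key=lambda copy: copy[0])


majority = valid_quorum_sizes(r=2, w=2, total_votes=3)
read_one_write_all = valid_quorum_sizes(r=1, w=3, total_votes=3)
too_small = valid_quorum_sizes(r=1, w=2, total_votes=3)   # 1 + 2 is not > 3

newest = latest_copy([(3, "stale balance"), (5, "current balance")])
```

With three votes, R = 2 and W = 2 is a valid choice, and R = 1 with W = 3 reproduces read one/write all as a special case.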
Virtual Partitions scheme
• Combines available copies and quorum consensus
• Virtual partition = set of replica managers that have a read and write quorum
• If a virtual partition can be formed, available copies is used
  – Improves performance of Reads
• If a failure occurs and the virtual partition changes during a transaction, the transaction is aborted
• Have to ensure virtual partitions do not overlap
Figure 14.21 Two network partitions. (Diagram: replica managers V, X, Y and Z and a transaction T, divided by a network partition.)
Figure 14.22 Virtual partition. (Diagram: replica managers X, V, Y and Z, showing a virtual partition and the network partition.)
Figure 14.23 Two overlapping virtual partitions. (Diagram: replica managers Y, X, V and Z covered by two overlapping virtual partitions, V1 and V2.)
Figure 14.24 Creating a virtual partition.
Phase 1:
• The initiator sends a Join request to each potential member. The argument of Join
is a proposed logical timestamp for the new virtual partition;
• When a replica manager receives a Join request it compares the proposed logical
timestamp with that of its current virtual partition;
– If the proposed logical timestamp is greater it agrees to join and replies Yes;
– If it is less, it refuses to join and replies No;
Phase 2:
• If the initiator has received sufficient Yes replies to have Read and Write quora, it
may complete the creation of the new virtual partition by sending a Confirmation
message to the sites that agreed to join. The creation timestamp and list of actual
members are sent as arguments;
• Replica managers receiving the Confirmation message join the new virtual
partition and record its creation timestamp and list of actual members.
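The two phases can be sketched in Python as follows. This is a simplified, single-process illustration, not the protocol as implemented anywhere: the `Member` class, its method names, integer timestamps and one vote per replica manager are all assumptions, and a proposed timestamp equal to the current one is treated as a refusal because the text only covers the greater and lesser cases.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical replica manager taking part in partition creation."""
    name: str
    current_ts: int = 0      # logical timestamp of its current virtual partition
    members: tuple = ()

    def on_join(self, proposed_ts):
        # Phase 1: reply Yes only if the proposed timestamp is greater.
        return proposed_ts > self.current_ts

    def on_confirm(self, creation_ts, members):
        # Phase 2: join and record the creation timestamp and member list.
        self.current_ts = creation_ts
        self.members = tuple(members)


def create_virtual_partition(proposed_ts, candidates, read_quorum, write_quorum):
    """Return the member names of the new partition, or None if no quorum."""
    # Phase 1: send Join to every potential member and collect the Yes replies.
    agreed = [m for m in candidates if m.on_join(proposed_ts)]
    # Phase 2: proceed only with enough Yes replies for both quora
    # (assumes one vote per replica manager).
    if len(agreed) < max(read_quorum, write_quorum):
        return None
    names = [m.name for m in agreed]
    for m in agreed:
        m.on_confirm(proposed_ts, names)
    return names


v, x, y, z = Member("V", 1), Member("X", 1), Member("Y", 1), Member("Z", 5)
created = create_virtual_partition(2, [v, x, y, z], read_quorum=2, write_quorum=3)
retry = create_virtual_partition(1, [v, x, y, z], read_quorum=2, write_quorum=3)
```

In the first call Z refuses because it already belongs to a partition with a later timestamp, yet V, X and Y still form both quora; the second call uses a stale timestamp and gathers no Yes replies.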
