Container Sync


Gregory Holt
gholt@rackspace.com
http://tlohg.com/
OpenStack Design Summit
April 26-29, 2011
Original Goal
Provide greater availability and durability with geographically distinct replicas.

Multi-Region Replication
• Replicate objects to other Swift clusters.
• Allow a configurable number of remote replicas.
• Ideally allow per-container configuration.

Problems
• Very complex to implement; even the simpler feature proposed here is already fairly complex.
• Swift currently only has a cluster-wide replica count.
• Tracking how many replicas are remote, and where, adds complexity.
• Per-container remote replica counts add still more complexity.

Complexity = More Time and More Bugs
New Goal
Provide greater availability and durability with geographically distinct replicas.

Simpler Container Synchronization
• Replicate objects to other Swift clusters.
• The remote replica count is not configurable; it is whatever replica count the remote cluster is already configured for.
• Per-container configuration is allowed, but only of "to where".

Benefits
• Much simpler (but still complex).
• Doesn't alter fundamental Swift internals.
• Per-container configuration that doesn't change behavior, only the destination.
• Side benefit: can also synchronize containers within the same cluster (when migrating an account to another, for instance).

Simpler = Less Time and Fewer Bugs
How the User Would Use It
1. Set the first container's X-Container-Sync-To and
   X-Container-Sync-Key values: To is the second
   container's URL and Key is a made-up shared secret:
  $ st post -t https://cluster2/v1/AUTH_gholt/container2 -k secret container1



2. Set the second container's X-Container-Sync-To and
   X-Container-Sync-Key values: To is the first
   container's URL and Key is the same made-up value:
  $ st post -t https://cluster1/v1/AUTH_gholt/container1 -k secret container2




  Any existing objects in the containers, as well as any objects
  added later, will now be synced between the two.
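Under the hood, the st commands above just set two pieces of container metadata. A minimal sketch of the header pair they send (the helper function here is hypothetical; only the header names come from the slides):

```python
def sync_headers(sync_to, sync_key):
    """Build the container metadata that `st post -t ... -k ...` sets."""
    return {
        "X-Container-Sync-To": sync_to,    # URL of the peer container
        "X-Container-Sync-Key": sync_key,  # shared secret both containers use
    }

headers = sync_headers("https://cluster2/v1/AUTH_gholt/container2", "secret")
```

The same pair, with To pointing back the other way, is what step 2 sets on the second container.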
Advanced Container Synchronization
      You can synchronize more than just two containers.


   Normally you just synchronize two containers:

         Container 1  <-->  Container 2

   But you can synchronize more by using a chain:

      Container 1  <-->  Container 2  <-->  Container 3
Caveats
• Valid X-Container-Sync-To destinations must be configured for each
  cluster ahead of time. The feature is based on Cluster Trust.
• The Swift cluster clocks need to be set reasonably close to one another.
  Swift timestamps each operation, and these timestamps are used in conflict
  resolution. For example, if an object is deleted on one cluster and
  overwritten on the other, whichever operation has the newest timestamp wins.
• There needs to be enough bandwidth between the clusters to keep up
  with all the changes to the synchronized containers.
• There will be a burst of bandwidth used when turning the feature on for
  an existing container full of objects.
• A user has no explicit guarantee of when a change will make it to the remote
  cluster. For example, a successful PUT means that cluster has the object,
  not the remote cluster. The synchronization happens in the background.
• Does not sync object POSTs yet (more on this later).
• Since background syncs come from the container servers themselves, they
  need to communicate with the remote cluster, probably requiring an HTTP
  proxy, and probably one per zone to avoid choke points.
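The timestamp-based conflict resolution in the second caveat amounts to a last-write-wins rule. A sketch (illustrative only, not Swift's actual code; the tie-breaking choice here is an assumption):

```python
def resolve(local_op, remote_op):
    """Last-write-wins: the operation with the newest timestamp survives.

    Each op is a dict like {"op": "PUT" or "DELETE", "timestamp": float}.
    Ties favoring the local copy is an illustrative assumption.
    """
    return local_op if local_op["timestamp"] >= remote_op["timestamp"] else remote_op

# Object deleted on one cluster at t=100, overwritten on the other at t=105:
winner = resolve({"op": "DELETE", "timestamp": 100.0},
                 {"op": "PUT", "timestamp": 105.0})
# the overwrite wins because its timestamp is newer
```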
What’s Left To Do?

• HTTP Proxying
• Tests
• Documentation
• POSTs

Because object POSTs don't currently cause a container database update, we need
  to either cause an update or come up with another way to synchronize them.

     The current plan is to modify POSTs to actually be a COPY internally.

                Downside: POSTs to large files will take longer.

            Upside: We have noticed very few POSTs in production.
Live Account Migrations
       This is a big step towards live account migrations.
1.   Turn on sync for the linked accounts on the two clusters.
2.   Wait for the new account to get caught up.
3.   Switch auth response URL to new account and revoke all existing account tokens.
4.   Put old account in a read-only mode.
5.   Turn off sync from the new account to the old.
6.   Wait until old account is no longer sending updates plus some safety time.
7.   Purge old account.

                                 Missing Pieces:
• Account sync (creating new containers on both sides; deletes and POSTs too).
• An account read-only mode.
• Using alternate, operator-only headers so they don't conflict with the user's,
     and so the user can't see or modify the values.
Implementation
st
• Updated to set/read X-Container-Sync-To and X-Container-Sync-Key.


Swauth and container-server
• Requires a new conf value allowed_sync_hosts indicating the allowed remote
    clusters.


swift-container-sync
•   New daemon that runs on every container server.
•   Scans every container database, looking for ones with sync turned on.
•   Sends updates based on any new ROWIDs in the container database.
•   Keeps sync points (the last ROWIDs sent out) in the local container databases.
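The daemon's scan step can be sketched roughly like this (the function, dict layout, and field names are illustrative assumptions, not Swift's actual container schema):

```python
def containers_to_sync(container_dbs):
    """Pick out containers that have sync configured (illustrative)."""
    return [db for db in container_dbs
            if db.get("X-Container-Sync-To") and db.get("X-Container-Sync-Key")]

dbs = [
    {"name": "container1",
     "X-Container-Sync-To": "https://cluster2/v1/AUTH_gholt/container2",
     "X-Container-Sync-Key": "secret"},
    {"name": "plain"},  # no sync metadata, so the daemon skips it
]
```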
Complexity - swift-container-sync
There are three container databases on different servers for each container.
There's no need for each of them to send all the updates; that would be quite wasteful.
The easiest solution is to have just one send out the updates, but:
    • What if that one is down?
    • Couldn't synchronization be done faster if all three were involved?


Instead, each sends a different third of the updates (assuming 3 replicas here).
    • Downside: If one is down, a third of the updates will be delayed until it comes back up.


So, in addition, each node will send all older updates to ensure quicker synchronization.
    • Normally, each server does a third of the updates.
    • Each server also does all older updates for assurance.
    • The vast majority of assurance updates will short circuit.
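The split described above can be sketched as a simple modulo assignment (illustrative; the actual row-to-node assignment rule is a stand-in here):

```python
def rows_for_node(rowids, node_index, replica_count=3):
    """This node's primary share: every replica_count-th row (illustrative)."""
    return [r for r in rowids if r % replica_count == node_index]

rows = [1, 2, 3, 4, 5, 6]
shares = [rows_for_node(rows, n) for n in range(3)]
# each node handles a disjoint third; together the three cover every row
```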
In The Weeds

• Two sync points are kept per container database.
• All rows between the two sync points trigger updates. *
• Any rows newer than both sync points cause updates
  depending on the node's position for the container (primary
  nodes do one third, etc., depending on the replica count of
  course).
• After a sync run, the first sync point is set to the newest
  ROWID known and the second sync point is set to the newest
  ROWID for which all updates have been sent.


 * This is a slight lie. It actually only needs to send the two-thirds of updates it isn't
 primarily responsible for, since it knows it already sent the other third.
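Putting the pieces together, one sync run can be sketched as follows (a simplification; using rowid % replicas as the share rule is an illustrative assumption, and the "slight lie" optimization above is omitted):

```python
def sync_run(rowids, sync_point1, sync_point2, node_index, replicas=3):
    """One pass of the two-sync-point scheme (sketch).

    Returns (rows_sent, new_sync_point1, new_sync_point2).
    """
    sent = []
    for rowid in sorted(rowids):
        if sync_point2 < rowid <= sync_point1:
            sent.append(rowid)   # older rows: send all of them, for assurance
        elif rowid > sync_point1 and rowid % replicas == node_index:
            sent.append(rowid)   # newer rows: send only this node's share
    new_sp1 = max(rowids) if rowids else sync_point1  # newest ROWID known
    new_sp2 = sync_point1                             # everything up to here was sent
    return sent, new_sp1, new_sp2

# First run from the example slides: 6 rows, both sync points at -1, node 1
sent, sp1, sp2 = sync_run(range(1, 7), -1, -1, node_index=1)
```

This reproduces the worked example on the next two slides: the first run sends only a third of the rows and leaves SyncPoint2 at -1; the second run re-sends rows 1-6 for assurance plus a third of rows 7-12.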
In The Weeds

  An example may help. Assume replica count is 3 and perfectly
  matching ROWIDs starting at 1.


  First sync run, database has 6 rows:
• SyncPoint1 starts as -1.
• SyncPoint2 starts as -1.
• No rows between points, so no "all updates" rows.
• Six rows newer than SyncPoint1, so a third of the rows are sent by
  node 1, another third by node 2, remaining third by node 3.
• SyncPoint1 is set as 6 (the newest ROWID known).
• SyncPoint2 is left as -1 since no "all updates" rows were synced.
In The Weeds

 Next sync run, database has 12 rows:
• SyncPoint1 starts as 6.
• SyncPoint2 starts as -1.
• The rows between -1 and 6 all trigger updates (most of which should
 short-circuit on the remote end as having already been done).
• Six more rows newer than SyncPoint1, so a third of the rows are
 sent by node 1, another third by node 2, remaining third by node 3.
• SyncPoint1 is set as 12 (the newest ROWID known).
• SyncPoint2 is set as 6 (the newest "all updates" ROWID).

 In this way, under normal circumstances each node sends its share of updates
 each run and just sends a batch of older updates to ensure nothing was missed.
Extras
• swift-container-sync can be configured to only spend x amount of time trying
  to sync a given container -- avoids one crazy container starving out all the others.
• A crash of a container server means lost container database copies, which will be
  replaced by one of the remaining copies on the other servers. The
  reestablished server will get the sync points from that copy, and no updates will
  be lost, thanks to the "all updates" algorithm the other two followed.
• Rebalancing the container ring moves container database copies around, but
  results in the same behavior as a crashed server would.
• For bidirectional sync setups, the receiver will send the sender back the
  updates (though they will short-circuit). The only way I can think of to prevent
  that is to track where updates were received from (X-Loop), but that's expensive.


                                   Anything Else?
                              gholt@rackspace.com
                                 http://tlohg.com/

Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Swift container sync

  • 1. Container Sync
       Gregory Holt
       gholt@rackspace.com
       http://tlohg.com/
       OpenStack Design Summit, April 26-29, 2011
  • 2. Original Goal
       Provide greater availability and durability with geographically distinct replicas.
       Multi-Region Replication
       • Replicate objects to other Swift clusters.
       • Allow a configurable number of remote replicas.
       • Ideally allow per-container configuration.
       Problems
       • Very complex to implement; the simpler feature I propose is already pretty complex.
       • Swift currently only has a cluster-wide replica count.
       • Tracking how many replicas are remote, and where, adds complexity.
       • Per-container remote replica counts add complexity.
       Complexity = More Time and More Bugs
  • 3. New Goal
       Provide greater availability and durability with geographically distinct replicas.
       Simpler Container Synchronization
       • Replicate objects to other Swift clusters.
       • The remote replica count is not configurable; it is whatever replica count the remote cluster is already configured for.
       • Per-container configuration is allowed, but only for "to where".
       Benefits
       • Much simpler (but still complex).
       • Doesn't alter fundamental Swift internals.
       • Per-container configuration that doesn't change behavior, only the destination.
       • Side benefit: can also synchronize containers within the same cluster (migrating an account to another, for instance).
       Simpler = Less Time and Fewer Bugs
  • 4. How the User Would Use It
       1. Set the first container's X-Container-Sync-To and X-Container-Sync-Key values: the To to the second container's URL, and the Key to a made-up value:
          $ st post -t https://cluster2/v1/AUTH_gholt/container2 -k secret container1
       2. Set the second container's X-Container-Sync-To and X-Container-Sync-Key values: the To to the first container's URL, and the Key to the same made-up value:
          $ st post -t https://cluster1/v1/AUTH_gholt/container1 -k secret container2
       Now any existing objects in the containers, as well as any objects added later, will be synced to one another.
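Under the hood, the `st` commands above boil down to a container POST carrying the two sync headers. The sketch below is illustrative, not the `st` source: the header names are the real X-Container-Sync-* headers, while the `sync_headers` helper, token, and URLs are placeholders.

```python
def sync_headers(sync_to, sync_key, auth_token):
    """Build the headers for a container POST that enables sync.

    Hypothetical helper; in practice a client like `st` sends these
    headers with a POST to the container's own URL.
    """
    return {
        "X-Auth-Token": auth_token,            # normal auth token for this cluster
        "X-Container-Sync-To": sync_to,        # URL of the peer container
        "X-Container-Sync-Key": sync_key,      # shared secret, same on both sides
    }

# Equivalent of: st post -t https://cluster2/v1/AUTH_gholt/container2 -k secret container1
headers = sync_headers("https://cluster2/v1/AUTH_gholt/container2", "secret", "<token>")
```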
  • 5. Advanced Container Synchronization
       You can synchronize more than just two containers.
       Normally you synchronize two containers with each other:
          Container 1 <-> Container 2
       But you could synchronize more by using a chain:
          Container 1 -> Container 2 -> Container 3 -> Container 1
  • 6. Caveats
       • Valid X-Container-Sync-To destinations must be configured for each cluster ahead of time. The feature is based on cluster trust.
       • The Swift cluster clocks need to be set reasonably close to one another. Swift timestamps each operation, and these timestamps are used in conflict resolution. For example, if an object is deleted on one cluster and overwritten on the other, whichever has the newest timestamp wins.
       • There needs to be enough bandwidth between the clusters to keep up with all the changes to the synchronized containers.
       • There will be a burst of bandwidth used when turning the feature on for an existing container full of objects.
       • A user has no explicit guarantee of when a change will make it to the remote cluster. For example, a successful PUT means that cluster has the object, not the remote cluster; the synchronization happens in the background.
       • Does not sync object POSTs yet (more on this later).
       • Since background syncs come from the container servers themselves, they need to communicate with the remote cluster, probably requiring an HTTP proxy, and probably one per zone to avoid choke points.
  • 7. What's Left To Do?
       • HTTP proxying
       • Tests
       • Documentation
       • POSTs
       Because object POSTs don't currently cause a container database update, we need to either cause an update or come up with another way to synchronize them. The current plan is to modify POSTs to actually be a COPY internally.
       Downside: POSTs to large files will take longer.
       Upside: We have noticed very few POSTs in production.
  • 8. Live Account Migrations
       This is a big step towards live account migrations.
       1. Turn on sync for the linked accounts on the two clusters.
       2. Wait for the new account to get caught up.
       3. Switch the auth response URL to the new account and revoke all existing account tokens.
       4. Put the old account in a read-only mode.
       5. Turn off sync from the new account to the old.
       6. Wait until the old account is no longer sending updates, plus some safety time.
       7. Purge the old account.
       Missing Pieces:
       • Account sync (creating new containers on both sides; deletes and POSTs too).
       • Account read-only mode.
       • Using alternate operator-only headers so as not to conflict with the user's, and keeping the user from seeing or modifying the values.
  • 9. Implementation
       st
       • Updated to set/read X-Container-Sync-To and X-Container-Sync-Key.
       Swauth and container-server
       • Require a new conf value, allowed_sync_hosts, indicating the allowed remote clusters.
       swift-container-sync
       • New daemon that runs on every container server.
       • Scans every container database looking for ones with sync turned on.
       • Sends updates based on any new ROWIDs in the container database.
       • Keeps sync points in the local container databases recording the last ROWIDs sent out.
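The allowed_sync_hosts check might look roughly like this. This is a sketch under stated assumptions, not the actual Swift code: the function name `validate_sync_to` and the return convention are illustrative; only the header name and the idea of an operator-configured host whitelist come from the slides.

```python
from urllib.parse import urlparse

def validate_sync_to(sync_to, allowed_sync_hosts):
    """Return None if the X-Container-Sync-To value is acceptable,
    else an error message. Hypothetical sketch of the cluster-trust
    check the container server would perform."""
    parsed = urlparse(sync_to)
    if parsed.scheme not in ("http", "https"):
        return "Invalid scheme %r in X-Container-Sync-To" % parsed.scheme
    if parsed.hostname not in allowed_sync_hosts:
        return "Invalid host %r in X-Container-Sync-To" % parsed.hostname
    return None

# Example whitelist an operator might configure (illustrative values):
allowed = ["cluster1", "cluster2"]
ok = validate_sync_to("https://cluster2/v1/AUTH_gholt/container2", allowed)
bad = validate_sync_to("https://untrusted/v1/AUTH_x/c", allowed)
```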
  • 10. Complexity - swift-container-sync
       There are three container databases on different servers for each container. There is no need for each to send all the updates, and doing so would be quite wasteful.
       The easiest solution is to have just one send out the updates, but:
       • What if that one is down?
       • Couldn't synchronization be done faster if all three were involved?
       Instead, each sends a different third of the updates (assuming 3 replicas here).
       • Downside: if one is down, a third of the updates will be delayed until it comes back up.
       So, in addition, each node also sends all older updates to ensure quicker synchronization.
       • Normally, each server does a third of the updates.
       • Each server also does all older updates for assurance.
       • The vast majority of assurance updates will short-circuit.
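One simple way to split the newest updates three ways, sketched below, is to have node i of N take the rows where ROWID mod N equals i. This is a minimal illustration of the partitioning idea, not the actual Swift selection logic.

```python
def rows_for_node(rows, node_index, replica_count=3):
    """Select the share of new rows this node is primarily responsible
    for: row IDs congruent to node_index modulo the replica count."""
    return [row for row in rows if row % replica_count == node_index]

# Six new rows split across three nodes; each node gets a disjoint third.
new_rows = [1, 2, 3, 4, 5, 6]
shares = [rows_for_node(new_rows, i) for i in range(3)]
```

Each node computes its share independently from the same container database, so no coordination is needed beyond agreeing on node positions.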
  • 11. In The Weeds
       • Two sync points are kept per container database.
       • All rows between the two sync points trigger updates. *
       • Any rows newer than both sync points cause updates depending on the node's position for the container (primary nodes do one third, etc., depending on the replica count, of course).
       • After a sync run, the first sync point is set to the newest ROWID known, and the second sync point is set to the newest ROWID for which all updates have been sent.
       * This is a slight lie. It actually only needs to send the two-thirds of updates it isn't primarily responsible for, since it knows it already sent the other third.
  • 12. In The Weeds
       An example may help. Assume the replica count is 3 and perfectly matching ROWIDs starting at 1.
       First sync run, database has 6 rows:
       • SyncPoint1 starts as -1.
       • SyncPoint2 starts as -1.
       • No rows between the points, so no "all updates" rows.
       • Six rows are newer than SyncPoint1, so a third of the rows are sent by node 1, another third by node 2, and the remaining third by node 3.
       • SyncPoint1 is set to 6 (the newest ROWID known).
       • SyncPoint2 is left as -1 since no "all updates" rows were synced.
  • 13. In The Weeds
       Next sync run, database has 12 rows:
       • SyncPoint1 starts as 6.
       • SyncPoint2 starts as -1.
       • The rows between -1 and 6 all trigger updates (most of which should short-circuit on the remote end as having already been done).
       • Six more rows are newer than SyncPoint1, so a third of the rows are sent by node 1, another third by node 2, and the remaining third by node 3.
       • SyncPoint1 is set to 12 (the newest ROWID known).
       • SyncPoint2 is set to 6 (the newest "all updates" ROWID).
       In this way, under normal circumstances each node sends its share of the newest updates each run, and also sends a batch of older updates to ensure nothing was missed.
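The two-run example above can be sketched as a small simulation. Assumed simplifications: ROWIDs are sequential integers, this is node 0 of 3, and the assurance pass re-sends all rows between the sync points rather than just the two-thirds noted in the earlier footnote.

```python
def sync_run(rows, sync_point1, sync_point2, node_index, replica_count=3):
    """One sync pass for one node.

    Returns (rows_sent, new_sync_point1, new_sync_point2). Rows between
    the two sync points are all re-sent for assurance; rows newer than
    SyncPoint1 are split by node position (illustrative sketch)."""
    # "All updates" rows: everything between the two sync points.
    assurance = [r for r in rows if sync_point2 < r <= sync_point1]
    # Newer rows: this node sends only its third.
    share = [r for r in rows
             if r > sync_point1 and r % replica_count == node_index]
    new_sp2 = sync_point1   # all updates have now been sent up to here
    new_sp1 = max(rows)     # the newest ROWID known
    return assurance + share, new_sp1, new_sp2

# First run: 6 rows, both sync points start at -1.
sent1, sp1, sp2 = sync_run(list(range(1, 7)), -1, -1, node_index=0)
# Second run: the database has grown to 12 rows.
sent2, sp1, sp2 = sync_run(list(range(1, 13)), sp1, sp2, node_index=0)
```

Running this reproduces the slide's numbers: the first run sends only this node's third of rows 1-6 and leaves SyncPoint2 at -1; the second run re-sends rows 1-6 for assurance plus this node's third of rows 7-12, ending with SyncPoint1 = 12 and SyncPoint2 = 6.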
  • 14. Extras
       • swift-container-sync can be configured to spend only x amount of time trying to sync a given container, which keeps one crazy container from starving out all the others.
       • A crash of a container server means lost container database copies, which will be replaced by one of the remaining copies on the other servers. The reestablished server will get the sync points from the copy, but no updates will be lost thanks to the "all updates" algorithm the other two followed.
       • Rebalancing the container ring moves container database copies around, but results in the same behavior as a crashed server would.
       • For bidirectional sync setups, the receiver will send the sender back the updates (though they will short-circuit). The only way I can think of to prevent that is to track where updates were received from (X-Loop), but that's expensive.
       Anything Else?
       gholt@rackspace.com
       http://tlohg.com/