Using OpenStack Swift for 
extreme data durability 
Florent Flament, Cloudwatt 
Christian Schwede, eNovance 
OpenStack Summit Paris, November 2014
Intro - Cloudwatt 
● Florent Flament 
● Dev & Fireman @ Cloudwatt 
● Fixing & tuning of OpenStack (Cinder, Keystone, Nova, Swift) 
● Email: florent.flament@cloudwatt.com 
● IRC: florentflament on #openstack-dev (Freenode) 
● Twitter: @florentflament_ 
● Blogs: http://dev.cloudwatt.com / http://www.florentflament.com
Intro - eNovance 
● Christian Schwede 
● Developer @ eNovance / Red Hat 
● Mostly working on Swift, testing, automation and developer tools 
● Swift Core 
● IRC: cschwede in #openstack-swift 
● christian@enovance.com / cschwede@redhat.com 
● Twitter: @cschwede_de
Architecture
[Diagram: two proxy nodes, a network, and 18 disks]
[Diagram: two proxy nodes, a network, and 18 disks grouped into Zone 0, Zone 1 and Zone 2]
[Diagram: two proxy nodes and a network in front of disks grouped into zones and regions. Labels: Zone 0, Zone 1, Zone 2; Region 0 (⅔ of the data); Region 1 (⅓ of the data)]
The Ring
Ring: the map of data 
● One Ring file per type of data (accounts, containers, objects). A Ring file 
maps each replica of a piece of data to a physical device through partitions. 
● An object’s partition number is computed from the hash 
of the object’s name. 
● A Ring file contains: a (replica, partition) to device ID table, a 
devices table and a number of hash bits (the partition power). 
● Visualize a Ring: https://github.com/victorlin/swiftsense
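The hash-to-partition step can be sketched in a few lines of Python. This is a minimal illustration, assuming an MD5 hash of the object path whose top bits select the partition; the paths and the partition power of 3 are made up for the example, and the cluster-wide secret that Swift mixes into the hash is left out.

```python
import hashlib
import struct

PART_POWER = 3  # hypothetical partition power: 2**3 = 8 partitions


def partition_for(path: str, part_power: int = PART_POWER) -> int:
    """Map an object path to a partition number using the top bits of its hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Read the first 4 bytes as an unsigned big-endian integer, then keep
    # only the top `part_power` bits of those 32 bits.
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


if __name__ == "__main__":
    # Hypothetical object paths, for illustration only.
    for name in ("/AUTH_demo/photos/cat.jpg", "/AUTH_demo/photos/dog.jpg"):
        print(name, "-> partition", partition_for(name))
```

Because only the hash decides the partition, any proxy can locate an object without a central lookup service.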
Concrete example of Ring 
Replica & Partition to Device ID table (rows: replica number, columns: partition number) 

Partition    0  1  2  3  4  5  6  7 
Replica 0    0  1  2  3  0  1  2  3 
Replica 1    1  2  3  0  1  2  3  0 
Replica 2    2  3  0  1  2  3  0  1 

Devices table 

ID  Host          Port  Device 
0   192.168.0.10  6000  sdb1 
1   192.168.0.10  6000  sdc1 
2   192.168.0.11  6000  sdb1 
3   192.168.0.11  6000  sdc1 

Bit count (partition power) = 3 
→ 2^3 = 8 partitions
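Looking up the devices for a partition is then a plain table lookup. The sketch below hard-codes the two tables from this example as Python lists; the variable and function names are illustrative and are not Swift's own API.

```python
# Rows are replicas, columns are partitions (values are device IDs).
REPLICA2PART2DEV = [
    [0, 1, 2, 3, 0, 1, 2, 3],  # replica 0
    [1, 2, 3, 0, 1, 2, 3, 0],  # replica 1
    [2, 3, 0, 1, 2, 3, 0, 1],  # replica 2
]

# Devices table, indexed by device ID.
DEVICES = [
    {"id": 0, "host": "192.168.0.10", "port": 6000, "device": "sdb1"},
    {"id": 1, "host": "192.168.0.10", "port": 6000, "device": "sdc1"},
    {"id": 2, "host": "192.168.0.11", "port": 6000, "device": "sdb1"},
    {"id": 3, "host": "192.168.0.11", "port": 6000, "device": "sdc1"},
]


def devices_for_partition(partition: int) -> list:
    """Return one device per replica for the given partition."""
    return [DEVICES[row[partition]] for row in REPLICA2PART2DEV]


if __name__ == "__main__":
    for dev in devices_for_partition(5):
        print(dev["host"], dev["port"], dev["device"])
```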
Storage policies 
● Included in the Juno release (Swift ≥ 2.0.0) 
● Applied on a per-container basis 
● Flexibility to use multiple rings, for example: 
○ Basic: 2 replicas on spinning disks, single datacenter 
○ Strong: 3 replicas in three different datacenters around the globe 
○ Fast: 3 replicas on SSDs and much more powerful proxies
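A policy is chosen when a container is created, by sending the `X-Storage-Policy` header on the container PUT. The sketch below only builds such a request with the Python standard library and does not send it; the endpoint, token and container name are hypothetical, and the policy name "Strong" assumes the operator has defined a policy with that name.

```python
import urllib.request


def create_container_request(storage_url, token, container, policy):
    """Build (but do not send) a PUT request creating a container with a policy."""
    return urllib.request.Request(
        url=f"{storage_url}/{container}",
        method="PUT",
        headers={"X-Auth-Token": token, "X-Storage-Policy": policy},
    )


if __name__ == "__main__":
    req = create_container_request(
        "https://swift.example.com/v1/AUTH_demo",  # hypothetical endpoint
        "example-token",                           # hypothetical token
        "backups",                                 # hypothetical container
        "Strong",                                  # policy name from the list above
    )
    print(req.get_method(), req.full_url)
    print(dict(req.header_items()))
```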
Availability & Durability
Object durability 
● Disk failures: p_d ≈ 2-5% per year 
● Unrecoverable bit read errors: p_b = 10^-15 ⋅ 8 ⋅ objectsize (objectsize in bytes) 
● Each failure removes one replica (3 replicas → 2 replicas → 1 replica → data loss); 
replication restores the missing replica after each failure 
● Durability in the range of 10 to 11 nines with 3 replicas (99.99999999%) 
● http://enovance.github.io/swift-durability-calculator/
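To get a feel for these numbers, here is a rough back-of-the-envelope sketch. The bit-error formula is the one given above; the annual loss estimate is a deliberately naive toy model (independent disk failures, a fixed repair window), not the model used by the calculator linked above, and the 3% failure rate and 24-hour repair window are assumed values.

```python
import math

BIT_ERROR_RATE = 1e-15  # unrecoverable read errors per bit, as stated above


def bit_error_probability(object_size_bytes):
    """p_b = 10^-15 * 8 * objectsize, with the object size given in bytes."""
    return BIT_ERROR_RATE * 8 * object_size_bytes


def naive_annual_loss_probability(p_disk, repair_hours, replicas=3):
    """Toy model: one disk fails during the year, then every other replica's
    disk fails too before replication finishes (within `repair_hours`)."""
    p_window = p_disk * repair_hours / (365 * 24)
    return p_disk * p_window ** (replicas - 1)


def nines(p_loss):
    """Express a loss probability as a number of nines of durability."""
    return -math.log10(p_loss)


if __name__ == "__main__":
    print("p_b for a 1 MiB object:", bit_error_probability(1024 ** 2))
    p_loss = naive_annual_loss_probability(p_disk=0.03, repair_hours=24)
    print("toy annual loss probability:", p_loss)
    print("nines:", round(nines(p_loss), 1))
```

The point of the toy model is only that a shorter repair window improves durability sharply, because the window enters once per additional replica.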
Recover from a disk failure 
Set failed device weight to 0, rebalance, push new ring 
[Diagram: one disk marked as failed]
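The recovery procedure above maps onto a short sequence of commands. The sketch below only assembles them as argument lists, so they can be reviewed before being run. The builder file name, device ID and node names are hypothetical, and copying the ring with `scp` is just one possible way to push it to the nodes.

```python
def disk_failure_commands(builder_file, device_id):
    """Commands to drain a failed device: set its weight to 0, then rebalance."""
    search = f"d{device_id}"  # select the device by its ID
    return [
        ["swift-ring-builder", builder_file, "set_weight", search, "0"],
        ["swift-ring-builder", builder_file, "rebalance"],
    ]


def push_ring_commands(ring_file, nodes):
    """Commands to copy the new ring file to every node (assumes scp access)."""
    return [["scp", ring_file, f"{node}:/etc/swift/"] for node in nodes]


if __name__ == "__main__":
    commands = disk_failure_commands("object.builder", 3)
    # Hypothetical node names, for illustration only.
    commands += push_ring_commands("object.ring.gz", ["storage1", "storage2"])
    for cmd in commands:
        print(" ".join(cmd))
```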
Object availability & durability 
[Diagram: disks grouped into zones and regions. Labels: Zone 0, Zone 1, Zone 2; Region 0 (⅔ of the data); Region 1 (⅓ of the data)]
Maintenance
Maintainability by Simplicity 
● Standalone `swift-ring-builder` tool to manipulate the Ring 
○ Uses builder files to keep architectural information about the cluster 
○ Smartly assigns partitions to devices 
○ Generates Ring files that are easy to check 
● Processes on Swift nodes focus on ensuring that files are stored 
uncorrupted at the appropriate location
Splitting a running Swift Cluster 
● Ensuring no data is lost 
○ Move only 1 replica at a time 
○ Small steps to limit the impact 
○ Check for data corruption 
○ Check data location 
○ Rollback in case of failure 
● Limiting the impact on performance 
○ Availability of cluster resources 
○ Load incurred by cluster being split 
○ Small steps to limit the impact 
○ Control nodes accessed by users 
Natively available in Swift
Small steps: new in Swift 2.2
Adding a new region 
Add a new region smoothly by limiting the amount of data moved 
● Practical since Swift 2.2 
● Final weight in the new region should be at least ⅓ of the total cluster weight 
Example of process: 
1. Add devices to the new region with a very low weight 
2. Increase the devices’ weights to store 5% of the data in the new region 
3. Progressively increase the amount of data in the new region in steps of 5% 
More details: http://www.florentflament.com/blog/splitting-swift-cluster.html
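The weights for each step can be derived from the target share of data. The sketch below assumes that the share of data stored in a region is proportional to its share of the total cluster weight, which ignores the placement constraints the ring builder also applies; the existing weight of 1000 is an arbitrary example value. With existing weight W and target fraction f, the new region needs weight f ⋅ W / (1 - f).

```python
def new_region_weight(existing_weight, target_fraction):
    """Total weight the new region needs to hold `target_fraction` of the data."""
    if not 0 <= target_fraction < 1:
        raise ValueError("target_fraction must be in [0, 1)")
    return existing_weight * target_fraction / (1 - target_fraction)


def weight_schedule(existing_weight, final_fraction=1 / 3, step=0.05):
    """List of (fraction, new-region weight) pairs, one per rebalance step."""
    fractions = []
    k = 1
    while k * step < final_fraction - 1e-9:
        fractions.append(k * step)
        k += 1
    fractions.append(final_fraction)  # finish exactly on the final target
    return [(f, new_region_weight(existing_weight, f)) for f in fractions]


if __name__ == "__main__":
    # 1000 is a made-up total weight for the existing region.
    for fraction, weight in weight_schedule(1000.0):
        print(f"{fraction:.0%} of data -> new region weight {weight:.1f}")
```

Each step would be followed by a rebalance, a ring push, and a wait for replication to settle before moving on.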
Outlook & Summary
Erasure coding 
● Coming soon 
● Instead of N copies of each object: 
○ apply EC to the object and split it into multiple fragments, for example 14 
○ store them on different disks/nodes 
○ objects can be rebuilt from any 10 fragments 
■ Tolerates the loss of 4 fragments → higher durability 
■ Only ~40% overhead (compared to 200% with 3 replicas) → much cheaper
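The overhead figures follow from simple arithmetic, sketched below with the numbers from this slide (14 fragments of which 10 are needed, versus 3 full replicas). The function names are illustrative only.

```python
def storage_overhead(stored_units, needed_units):
    """Extra storage relative to the original object size (0.4 means 40%)."""
    return stored_units / needed_units - 1


def tolerated_losses(stored_units, needed_units):
    """How many fragments (or replicas) can be lost without losing the object."""
    return stored_units - needed_units


if __name__ == "__main__":
    print(f"EC 10 of 14: {storage_overhead(14, 10):.0%} overhead, "
          f"tolerates {tolerated_losses(14, 10)} lost fragments")
    print(f"3 replicas:  {storage_overhead(3, 1):.0%} overhead, "
          f"tolerates {tolerated_losses(3, 1)} lost replicas")
```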
Durability calculation 
● More detailed calculation 
○ Number of disks, servers, partitions 
● Add erasure coding 
● Include in Swift documentation? 
● Community effort 
○ Discussion started at the last Swift hackathon 
■ NTT, SwiftStack, IBM, Seagate, Red Hat / eNovance 
○ Ad-Hoc session on Thursday/Friday - join us!
Summary 
● High availability, even if large parts of the cluster are not accessible 
● Automatic failure correction ensures high durability and, depending on 
your cluster configuration, exceeds known industry standards 
● Swift 2.2 (Juno release) 
○ Even smoother and more predictable cluster upgrades 
○ Storage Policies allow fine-grained control over data placement 
● Erasure Coding will increase durability even more while lowering costs