SlideShare une entreprise Scribd logo
1  sur  54
Télécharger pour lire hors ligne
NO COMPROMISES:
DISTRIBUTED TRANSACTIONS WITH
CONSISTENCY,
AVAILABILITY AND
PERFORMANCE
DRAGOJEVIC ET. AL., SOSP '15
TODAY
> Overview of FaRM, plus technological context
> No proofs this time! (yay)
> Only cursory overview of recovery protocol
WHAT'S TO
LOVE?
1. CHALLENGE
TO ORTHODOXY
2. FORWARD LOOKING
.. (WITHOUT BEING OVERLY SPECULATIVE)
3. ENGINEERING
IS GREAT
DO WE NEED TO
COMPROMISE?
1980S: DISKS ARE SLOW AND MEMORY IS SMALL
1980S: DISKS ARE SLOW AND MEMORY IS SMALL
... SO LET'S INVENT GRACE JOIN AND FRIENDS.1
1
'Implementation techniques for main memory database systems', DeWitt et. al., SIGMOD'84
1990S: WANS ARE SLOW!
1990S: WANS ARE SLOW!
... SO LET'S BUILD A CROSS-SITE OPTIMIZER2
2
'Mariposa: a wide-area distributed database system', Stonebraker et. al.
2000S: MEMORY IS SLOW!
2000S: MEMORY IS SLOW!... SO LET'S BUILD A CACHE-EFFICIENT JOIN ALGORITHM (X-100)3
3
'Database Architecture Optimized for the new Bottleneck, Memory Access', Boncz et. al., VLDB'99
2010: DISKS ARE SLOW AGAIN!
2010: DISKS ARE SLOW AGAIN!
... SO LET'S PUT LOTS OF THEM IN A SINGLE MACHINE!
DATABASE SYSTEM DESIGN CAN BE VIEWED AS
AN EXERCISE IN CHASING A MOVING TARGET.
2015: CPUS ARE
GOING TO
BECOME SLOW
2015: CPUS ARE GOING TO BECOME SLOW
... WHAT CAN WE DO ABOUT IT?
WHY ARE CPUS GOING TO BECOME SLOW?
> Non-volatile storage is going to get much, much quicker
> Message latency is going to decrease
WHY ARE CPUS GOING TO BECOME SLOW?
> Non-volatile storage is going to get much, much quicker
> Message latency is going to decrease
AND BOTH WILL BECOME AFFORDABLE IN
DATACENTERS
FASTER NON-VOLATILE STORAGE
> Add a UPS to main memory
> When power is lost, write to SSD!
> NV-DRAM is not new, but this is a cheap (effective) hack.
LOW-LATENCY IN-DATACENTER MESSAGING
> Remote Direct Memory Access (RDMA) is a low-latency
link (v1) or IP (v2)-level protocol
> Allows machines to directly access memory of remote
peers
> with no CPU involvement at all!
> Infiniband was expensive, but RDMA-over-Ethernet
(RoCE) is cheaper and becoming popular.
DISTRIBUTED
DATABASE
CONTEXT
DURABILITY REQUIRES WRITES TO NON-VOLATILE STORAGE
MESSAGING IS EXTREMELY CPU EXPENSIVE
THE CPU COST OF AN RPC:
> Interrupt for kernel service
> Memory copy into kernel
> Copy into userspace
> Wake-up handler thread
> De-serialize message
> Do something
THE CPU COST OF AN RPC:
> Interrupt for kernel service
> Memory copy into kernel
> Copy into userspace
> Wake-up handler thread
> De-serialize message
> Do something
4
4
'Profiling a warehouse-scale computer', Kanev et. al. ISCA'15
RDMA
> No CPU on the usual write or read path
> NIC has its own set of page tables (without paging)
> Address memory regions directly
> FaRM uses two data structures:
> Transactional log
> Messaging ring-buffer
FARM
TWO PAPERS:
> 'No compromises...', Dragojevic et. al., SOSP'15
> 'FaRM: Fast Remote Memory', Dragojevic et. al., NSDI'14
MAIN CONTRIBUTIONS:
> Very low-latency, high-throughput transactional
system.
> Very fast failure detection / recovery protocol
> Unusual distributed system architecture based on
Vertical Paxos
> Commit protocol optimised for RDMA / low message
count
WHAT YOU GET: ABSTRACTIONS
> Global address space of addressable memory
> Transactional API, including lock-free reads
PROGRAMMING MODEL
> Application threads run in FARM servers
> Can perform arbitrary logic during transaction (but no
side-effects, please!)
> May have to deal with anomolies on read, thanks to
optimistic commit
SYSTEM ARCHITECTURE
ADDRESSABLE MEMORY: REGIONS
> Memory is partitioned into 2GB regions, pinned into
memory on each machine
> Regions are served by a primary, but have f backups
> Region->primary mapping is maintained by the
'configuration manager'
> Regions may be co-located at application's behest
HOW A CHUNK OF MEMORY BECOMES A
REGION
> Two-phase commit from CM (initiated by machine)
> Ensures that all replicas have mapping before it gets
used
REGION MAPPING RECOVERY?
> State is present in the cluster, so if CM fails can
recover it from active replicas.
> Individual machines cache mapping after fetching
through RDMA
TRANSACTIONAL
PROTOCOL
OPTIMISTIC CONCURRENCY:
TRANSACTIONS MAY FAIL AFTER LOCK ACQUISITION
COMMIT PROTOCOL
COMMIT PROTOCOL
COMMIT PROTOCOL NOTES
> All communication is over RDMA
> Total message delays not fewer than Paxos
> But total number of messages is:
4P(2f + 1) vs Pw(f+3) + Pr
> And some of those are extremely cheap
*
FAILURE DETECTION
AND RECOVERY
LEASES
> i.e. registration + keepalive, created by three-way-
handshake
> 5ms leases for 90-node cluster, with 1ms-frequency
retries!!
LEASES - HOW THEY DID IT
> Preallocation of lease manager memory
> Pin code in RAM
> Keep hardware threads free
> Use unreliable transport
SEVEN-STEP PROCESS TOWARDS RECOVERY
1. Suspect - block external requests
2. Probe - check for correlated failures
3. Update configuration - atomically move configuration
to next version in ZK
4. Remap regions - recover replication guarantee from
existing replicas
SEVEN-STEP PROCESS: COMMIT PROTOCOL
1. Send new configuration - replicas are informed of
new configuration
2. Apply new configuration - replicas update their
configurations in parallel, and wait...
3. Commit new configuration - replicas are told to
start serving requests again
Commit protocol ensures consistent membership state,
TRANSACTION RECOVERY
THANKS! QUESTIONS?
@HENRYR / HENRY.ROBINSON@GMAIL.COM
No Compromises: Distributed Transactions with Consistency, Availability and Performance

Contenu connexe

Tendances

LF_OVS_17_Ingress Scheduling
LF_OVS_17_Ingress SchedulingLF_OVS_17_Ingress Scheduling
LF_OVS_17_Ingress SchedulingLF_OpenvSwitch
 
Packet Framework - Cristian Dumitrescu
Packet Framework - Cristian DumitrescuPacket Framework - Cristian Dumitrescu
Packet Framework - Cristian Dumitrescuharryvanhaaren
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.Naoto MATSUMOTO
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunheut2008
 
Cpu高效编程技术
Cpu高效编程技术Cpu高效编程技术
Cpu高效编程技术Feng Yu
 
The n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkThe n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkmarkdgray
 
CS4344 Lecture 9: Traffic Analysis
CS4344 Lecture 9: Traffic AnalysisCS4344 Lecture 9: Traffic Analysis
CS4344 Lecture 9: Traffic AnalysisWei Tsang Ooi
 
Performance Lessons learned in vRouter - Stephen Hemminger
Performance Lessons learned in vRouter - Stephen HemmingerPerformance Lessons learned in vRouter - Stephen Hemminger
Performance Lessons learned in vRouter - Stephen Hemmingerharryvanhaaren
 
Open vSwitch Implementation Options
Open vSwitch Implementation Options Open vSwitch Implementation Options
Open vSwitch Implementation Options Netronome
 
OpenvSwitch Deep Dive
OpenvSwitch Deep DiveOpenvSwitch Deep Dive
OpenvSwitch Deep Diverajdeep
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATOpen vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATThomas Graf
 
DPDK Support for New HW Offloads
DPDK Support for New HW OffloadsDPDK Support for New HW Offloads
DPDK Support for New HW OffloadsNetronome
 
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionVirtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionFlavio Bertini
 

Tendances (20)

Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
 
LF_OVS_17_Ingress Scheduling
LF_OVS_17_Ingress SchedulingLF_OVS_17_Ingress Scheduling
LF_OVS_17_Ingress Scheduling
 
Packet Framework - Cristian Dumitrescu
Packet Framework - Cristian DumitrescuPacket Framework - Cristian Dumitrescu
Packet Framework - Cristian Dumitrescu
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.
 
Enable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zunEnable DPDK and SR-IOV for containerized virtual network functions with zun
Enable DPDK and SR-IOV for containerized virtual network functions with zun
 
Cpu高效编程技术
Cpu高效编程技术Cpu高效编程技术
Cpu高效编程技术
 
The n00bs guide to ovs dpdk
The n00bs guide to ovs dpdkThe n00bs guide to ovs dpdk
The n00bs guide to ovs dpdk
 
Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
CS4344 Lecture 9: Traffic Analysis
CS4344 Lecture 9: Traffic AnalysisCS4344 Lecture 9: Traffic Analysis
CS4344 Lecture 9: Traffic Analysis
 
Performance Lessons learned in vRouter - Stephen Hemminger
Performance Lessons learned in vRouter - Stephen HemmingerPerformance Lessons learned in vRouter - Stephen Hemminger
Performance Lessons learned in vRouter - Stephen Hemminger
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Open vSwitch Implementation Options
Open vSwitch Implementation Options Open vSwitch Implementation Options
Open vSwitch Implementation Options
 
OpenvSwitch Deep Dive
OpenvSwitch Deep DiveOpenvSwitch Deep Dive
OpenvSwitch Deep Dive
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATOpen vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NAT
 
First steps on CentOs7
First steps on CentOs7First steps on CentOs7
First steps on CentOs7
 
DPDK Support for New HW Offloads
DPDK Support for New HW OffloadsDPDK Support for New HW Offloads
DPDK Support for New HW Offloads
 
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solutionVirtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
Virtual Distro Dispatcher - A light-weight Desktop-as-a-Service solution
 
Mercurial
MercurialMercurial
Mercurial
 
Dpdk performance
Dpdk performanceDpdk performance
Dpdk performance
 

Similaire à No Compromises: Distributed Transactions with Consistency, Availability and Performance

Offloading for Databases - Deep Dive
Offloading for Databases - Deep DiveOffloading for Databases - Deep Dive
Offloading for Databases - Deep DiveUniFabric
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
 
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance ConsiderationsShawn Wells
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET Journal
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Fwdays
 
Intel® RDT Hands-on Lab
Intel® RDT Hands-on LabIntel® RDT Hands-on Lab
Intel® RDT Hands-on LabMichelle Holley
 
VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...
VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...
VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...VMworld
 
Practical steps to mitigate DDoS attacks
Practical steps to mitigate DDoS attacksPractical steps to mitigate DDoS attacks
Practical steps to mitigate DDoS attacksSecurity Session
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)Sri Prasanna
 
A Reimplementation of NetBSD Based on a Microkernel by Andrew S. Tanenbaum
A Reimplementation of NetBSD Based on a Microkernel by Andrew S. TanenbaumA Reimplementation of NetBSD Based on a Microkernel by Andrew S. Tanenbaum
A Reimplementation of NetBSD Based on a Microkernel by Andrew S. Tanenbaumeurobsdcon
 
MARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicMARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicEric Verhulst
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Виталий Стародубцев
 
Towards Software Defined Persistent Memory
Towards Software Defined Persistent MemoryTowards Software Defined Persistent Memory
Towards Software Defined Persistent MemorySwaminathan Sundararaman
 
Basic ccna interview questions and answers ~ sysnet notes
Basic ccna interview questions and answers ~ sysnet notesBasic ccna interview questions and answers ~ sysnet notes
Basic ccna interview questions and answers ~ sysnet notesVamsi Krishna Kalavala
 
What a Modern Database Enables_Srini Srinivasan.pdf
What a Modern Database Enables_Srini Srinivasan.pdfWhat a Modern Database Enables_Srini Srinivasan.pdf
What a Modern Database Enables_Srini Srinivasan.pdfAerospike, Inc.
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitDon Marti
 

Similaire à No Compromises: Distributed Transactions with Consistency, Availability and Performance (20)

Coa presentation3
Coa presentation3Coa presentation3
Coa presentation3
 
Offloading for Databases - Deep Dive
Offloading for Databases - Deep DiveOffloading for Databases - Deep Dive
Offloading for Databases - Deep Dive
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The Sequel
 
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations2009-01-28 DOI NBC Red Hat on System z Performance Considerations
2009-01-28 DOI NBC Red Hat on System z Performance Considerations
 
L05 parallel
L05 parallelL05 parallel
L05 parallel
 
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDLIRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
IRJET- Design And VLSI Verification of DDR SDRAM Controller Using VHDL
 
Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...Anton Moldovan "Building an efficient replication system for thousands of ter...
Anton Moldovan "Building an efficient replication system for thousands of ter...
 
Intel® RDT Hands-on Lab
Intel® RDT Hands-on LabIntel® RDT Hands-on Lab
Intel® RDT Hands-on Lab
 
os
osos
os
 
VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...
VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...
VMworld 2013: Silent Killer: How Latency Destroys Performance...And What to D...
 
Practical steps to mitigate DDoS attacks
Practical steps to mitigate DDoS attacksPractical steps to mitigate DDoS attacks
Practical steps to mitigate DDoS attacks
 
Intro (Distributed computing)
Intro (Distributed computing)Intro (Distributed computing)
Intro (Distributed computing)
 
A Reimplementation of NetBSD Based on a Microkernel by Andrew S. Tanenbaum
A Reimplementation of NetBSD Based on a Microkernel by Andrew S. TanenbaumA Reimplementation of NetBSD Based on a Microkernel by Andrew S. Tanenbaum
A Reimplementation of NetBSD Based on a Microkernel by Andrew S. Tanenbaum
 
MARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 AltreonicMARC ONERA Toulouse2012 Altreonic
MARC ONERA Toulouse2012 Altreonic
 
Mainframe
MainframeMainframe
Mainframe
 
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
Технологии работы с дисковыми хранилищами и файловыми системами Windows Serve...
 
Towards Software Defined Persistent Memory
Towards Software Defined Persistent MemoryTowards Software Defined Persistent Memory
Towards Software Defined Persistent Memory
 
Basic ccna interview questions and answers ~ sysnet notes
Basic ccna interview questions and answers ~ sysnet notesBasic ccna interview questions and answers ~ sysnet notes
Basic ccna interview questions and answers ~ sysnet notes
 
What a Modern Database Enables_Srini Srinivasan.pdf
What a Modern Database Enables_Srini Srinivasan.pdfWhat a Modern Database Enables_Srini Srinivasan.pdf
What a Modern Database Enables_Srini Srinivasan.pdf
 
Seastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration SummitSeastar at Linux Foundation Collaboration Summit
Seastar at Linux Foundation Collaboration Summit
 

Dernier

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Dernier (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

No Compromises: Distributed Transactions with Consistency, Availability and Performance

  • 1. NO COMPROMISES: DISTRIBUTED TRANSACTIONS WITH CONSISTENCY, AVAILABILITY AND PERFORMANCE DRAGOJEVIC ET. AL., SOSP '15
  • 2. TODAY > Overview of FaRM, plus technological context > No proofs this time! (yay) > Only cursory overview of recovery protocol
  • 3.
  • 4.
  • 7. 2. FORWARD LOOKING .. (WITHOUT BEING OVERLY SPECULATIVE)
  • 9. DO WE NEED TO COMPROMISE?
  • 10. 1980S: DISKS ARE SLOW AND MEMORY IS SMALL
  • 11. 1980S: DISKS ARE SLOW AND MEMORY IS SMALL ... SO LET'S INVENT GRACE JOIN AND FRIENDS.1 1 'Implementation techniques for main memory database systems', DeWitt et. al., SIGMOD'84
  • 13. 1990S: WANS ARE SLOW! ... SO LET'S BUILD A CROSS-SITE OPTIMIZER2 2 'Mariposa: a wide-area distributed database system', Stonebraker et. al.
  • 15. 2000S: MEMORY IS SLOW!... SO LET'S BUILD A CACHE-EFFICIENT JOIN ALGORITHM (X-100)3 3 'Database Architecture Optimized for the new Bottleneck, Memory Access', Boncz et. al., VLDB'99
  • 16. 2010: DISKS ARE SLOW AGAIN!
  • 17. 2010: DISKS ARE SLOW AGAIN! ... SO LET'S PUT LOTS OF THEM IN A SINGLE MACHINE!
  • 18. DATABASE SYSTEM DESIGN CAN BE VIEWED AS AN EXERCISE IN CHASING A MOVING TARGET.
  • 19. 2015: CPUS ARE GOING TO BECOME SLOW
  • 20. 2015: CPUS ARE GOING TO BECOME SLOW ... WHAT CAN WE DO ABOUT IT?
  • 21. WHY ARE CPUS GOING TO BECOME SLOW? > Non-volatile storage is going to get much, much quicker > Message latency is going to decrease
  • 22. WHY ARE CPUS GOING TO BECOME SLOW? > Non-volatile storage is going to get much, much quicker > Message latency is going to decrease AND BOTH WILL BECOME AFFORDABLE IN DATACENTERS
  • 23. FASTER NON-VOLATILE STORAGE > Add a UPS to main memory > When power is lost, write to SSD! > NV-DRAM is not new, but this is a cheap (effective) hack.
  • 24. LOW-LATENCY IN-DATACENTER MESSAGING > Remote Direct Memory Access (RDMA) is a low-latency link (v1) or IP (v2)-level protocol > Allows machines to directly access memory of remote peers > with no CPU involvement at all! > Infiniband was expensive, but RDMA-over-Ethernet (RoCE) is cheaper and becoming popular.
  • 26. DURABILITY REQUIRES WRITES TO NON-VOLATILE STORAGE
  • 27. MESSAGING IS EXTREMELY CPU EXPENSIVE
  • 28. THE CPU COST OF AN RPC: > Interrupt for kernel service > Memory copy into kernel > Copy into userspace > Wake-up handler thread > De-serialize message > Do something
  • 29. THE CPU COST OF AN RPC: > Interrupt for kernel service > Memory copy into kernel > Copy into userspace > Wake-up handler thread > De-serialize message > Do something
  • 30. 4 4 'Profiling a warehouse-scale computer', Kanev et. al. ISCA'15
  • 31. RDMA > No CPU on the usual write or read path > NIC has its own set of page tables (without paging) > Address memory regions directly > FaRM uses two data structures: > Transactional log > Messaging ring-buffer
  • 32. FARM
  • 33. TWO PAPERS: > 'No compromises...', Dragojevic et. al., SOSP'15 > 'FaRM: Fast Remote Memory', Dragojevic et. al., NSDI'14
  • 34.
  • 35. MAIN CONTRIBUTIONS: > Very low-latency, high-throughput transactional system. > Very fast failure detection / recovery protocol > Unusual distributed system architecture based on Vertical Paxos > Commit protocol optimised for RDMA / low message count
  • 36. WHAT YOU GET: ABSTRACTIONS > Global address space of addressable memory > Transactional API, including lock-free reads
  • 37. PROGRAMMING MODEL > Application threads run in FARM servers > Can perform arbitrary logic during transaction (but no side-effects, please!) > May have to deal with anomolies on read, thanks to optimistic commit
  • 39. ADDRESSABLE MEMORY: REGIONS > Memory is partitioned into 2GB regions, pinned into memory on each machine > Regions are served by a primary, but have f backups > Region->primary mapping is maintained by the 'configuration manager' > Regions may be co-located at application's behest
  • 40. HOW A CHUNK OF MEMORY BECOMES A REGION > Two-phase commit from CM (initiated by machine) > Ensures that all replicas have mapping before it gets used
  • 41. REGION MAPPING RECOVERY? > State is present in the cluster, so if CM fails can recover it from active replicas. > Individual machines cache mapping after fetching through RDMA
  • 43. OPTIMISTIC CONCURRENCY: TRANSACTIONS MAY FAIL AFTER LOCK ACQUISITION
  • 46. COMMIT PROTOCOL NOTES > All communication is over RDMA > Total message delays not fewer than Paxos > But total number of messages is: 4P(2f + 1) vs Pw(f+3) + Pr > And some of those are extremely cheap *
  • 48. LEASES > i.e. registration + keepalive, created by three-way- handshake > 5ms leases for 90-node cluster, with 1ms-frequency retries!!
  • 49. LEASES - HOW THEY DID IT > Preallocation of lease manager memory > Pin code in RAM > Keep hardware threads free > Use unreliable transport
  • 50. SEVEN-STEP PROCESS TOWARDS RECOVERY 1. Suspect - block external requests 2. Probe - check for correlated failures 3. Update configuration - atomically move configuration to next version in ZK 4. Remap regions - recover replication guarantee from existing replicas
  • 51. SEVEN-STEP PROCESS: COMMIT PROTOCOL 1. Send new configuration - replicas are informed of new configuration 2. Apply new configuration - replicas update their configurations in parallel, and wait... 3. Commit new configuration - replicas are told to start serving requests again Commit protocol ensures consistent membership state,
  • 53. THANKS! QUESTIONS? @HENRYR / HENRY.ROBINSON@GMAIL.COM