KEY CONCEPTS FOR
SCALABLE STATEFUL
SERVICES
Nikolay Novik
https://github.com/jettify
PyConUA 2017
I AM ...
Software Engineer at DataRobot Ukraine
Github: https://github.com/jettify
Twitter: https://twitter.com/isinf
aio-libs: https://github.com/aio-libs
My Projects:
database clients: aiomysql, aioodbc, aiogibson
web and others: aiomonitor, aiohttp_debugtoolbar, aiobotocore, aiohttp_mako, aiohttp_admin, aiorwlock
POLL: HAVE YOU EVER READ DYNAMO PAPER?
1. I have read this paper.
2. I have heard about this paper and know the key ideas.
3. I think distributed systems are kinda cool.
AGENDA
1. Motivation: why and when we might want to use stateful services.
2. Industry examples: Uber, Halo 4, DragonAge, HPC
3. Problem statement, required components
4. Overview of consistent hashing, gossip dissemination and SWIM failure detection
5. Possible improvements
USE STATELESS (DUCK TAPE) WHEN YOU CAN!
A stateless protocol is a proven technique; use it like duck tape
ISSUES WITH STATELESS SERVICES
Soft real-time requirements
State serialization
Wasteful data fetching
DB leaky transactions
STATELESS SERVICE EXAMPLE
Notice that user data is fetched several times and cached on multiple servers.
BENEFITS OF STATEFUL SERVICES
Data locality: logic is executed where the data is stored, with fast access
Lower latency: state is in memory, no extra network hops needed
Higher performance: no need to deserialize data
STATEFUL SERVICE EXAMPLE
Extra trips to the database are avoided, which reduces latency.
Even if the database is down, the request can still be handled.
INDUSTRY EXAMPLE: UBER
Geospatial index service to match drivers with users
INDUSTRY EXAMPLE: HALO 4
Orleans is used as the backbone for the server side of the Halo game, including presence, statistics, cheat detection, etc.
INDUSTRY EXAMPLE: HPC
San Diego Supercomputer Center uses Serf to coordinate compute resources in multiple locations; the cluster size is about 2k nodes
LET'S TRY TO SOLVE A CLOSE TO REAL WORLD PROBLEM: PREDICTION SERVICE
A service that predicts the resale prices of different products, based on product specifications
The user enters the specs of a used product and obtains a price estimate
Each product category
FUNCTIONAL REQUIREMENTS
Dynamic scaling
Fault tolerance
Exploit data locality
Flexible API
REQUIRED COMPONENTS
1. Work distribution and routing: move a job request to the appropriate node
2. Cluster membership update: provide a means to determine which nodes participate in the cluster, both in stable conditions and while the cluster is resizing
3. Failure detector: periodically check nodes and remove unresponsive/dead ones
ROUTING. NAIVE SOLUTION WITH HARD CODED
CLUSTER NODES
Very easy to implement, viable solution when dynamic
resizing is not required
Does not support dynamic scaling in or scaling out
Requires cluster restart for changing nodes configuration
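To make the resizing problem concrete, here is a minimal sketch. The slide does not say how the hard-coded router picks a node, so the `hash(key) % len(nodes)` scheme, the node names, and the helpers `stable_hash`, `route` and `moved_fraction` are all illustrative assumptions.

```python
import hashlib

# Hypothetical hard-coded cluster; the names are illustrative only.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def stable_hash(key: str) -> int:
    """Deterministic hash (the built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def route(key: str, nodes: list) -> str:
    """Naive routing (assumed scheme): hash modulo the cluster size."""
    return nodes[stable_hash(key) % len(nodes)]


def moved_fraction(keys: list, before: list, after: list) -> float:
    """Fraction of keys whose owner changes between two node lists."""
    moved = sum(route(k, before) != route(k, after) for k in keys)
    return moved / len(keys)


keys = [f"product-{i}" for i in range(10_000)]
# Growing the cluster from 4 to 5 nodes changes the owner of most keys.
fraction = moved_fraction(keys, NODES, NODES + ["node-e"])
print(f"{fraction:.0%} of keys changed owner")
```

Under this assumed scheme a key keeps its owner only when its hash gives the same remainder modulo 4 and modulo 5, so most in-memory state would have to move after a resize.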
ROUTING. CONSISTENT HASHING SOLUTION
This simple algorithm made Akamai a multi-billion-dollar company
CONSISTENT HASHING. BASIC IDEA
Consistent hashing minimizes the number of keys that need to be remapped
http://blog.carlosgaldino.com/consistent-hashing.html
CONSISTENT HASHING. ADDING NODE
When capacity is added, only a fraction of the keys is moved
CONSISTENT HASHING. REMOVING NODE
When a node fails, the next node on the ring handles its keys
CONSISTENT HASHING. VIRTUAL NODES
Virtual nodes help with key distribution, moving each node's share close to 1/n
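The ring described on the last few slides can be sketched in a few lines. This is one possible layout, not the implementation used by any of the frameworks mentioned in the talk: the `HashRing` class, the MD5-based `stable_hash`, and the default of 100 virtual nodes per node are assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring; each node owns `vnodes` points on the ring."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, stable_hash(key))
        return self._owner[self._points[idx % len(self._points)]]


keys = [f"product-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-e")
after = {k: ring.lookup(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys) / len(keys)
print(f"{moved:.0%} of keys moved after adding a fifth node")
```

Every key that moves lands on the new node, and removing that node again restores the previous mapping, which is the property the slides rely on for scaling in and out.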
CLUSTER MEMBERSHIP PROBLEM
We have routing and job distribution; let's figure out how to add and remove nodes.
WHY NOT JUST USE ZOOKEEPER/CONSUL/ETCD (OR IN OTHER WORDS ZAB, PAXOS, RAFT)?
Issues
Availability
Performance
Network partitions
Operation overhead
TYPICAL SYSTEM WITH COORDINATION
Zookeeper forces its own view
Possible links: n(n−1)/2, but only n are used for FD
Node availability decisions are best made locally
CLUSTER MEMBERSHIP UPDATE PROBLEM. NAIVE
SOLUTION
Broadcast: could be used for cluster membership update
Use network broadcast (usually disabled)
Send the message to each peer one by one (not reliable)
GOSSIP PROTOCOL
Xerox invented gossip protocols: anti-entropy and rumor mongering.
GOSSIP OVERVIEW
Basic gossip protocol
Send the message to k random peers
Peers retransmit the message to the next k random peers
In log(n) steps, the information will be disseminated
GOSSIP PROTOCOL VS PACKET LOSS
Heavy packet loss does not stop dissemination; it simply takes a bit longer, about 2 times longer for 50% loss.
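The dissemination process from the two gossip slides above can be explored with a small simulation. It is a toy model under stated assumptions: rounds are synchronous, every informed node keeps pushing each round, a node may pick itself as a target, and `gossip_rounds` with its `loss` and `seed` parameters is a hypothetical helper, not part of any library.

```python
import math
import random


def gossip_rounds(n, k=3, loss=0.0, seed=0):
    """Rounds of push gossip until all n nodes know the message.

    Every round, each informed node sends the message to k nodes picked
    at random (for simplicity a node may pick itself); every message is
    dropped independently with probability `loss`.
    """
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    rng = random.Random(seed)       # seeded so runs are repeatable
    informed = {0}                  # node 0 starts the rumor
    rounds = 0
    while len(informed) < n:
        delivered = set()
        for _ in range(len(informed)):          # one push per informed node
            for peer in rng.sample(range(n), min(k, n)):
                if rng.random() >= loss:        # message survived the network
                    delivered.add(peer)
        informed |= delivered
        rounds += 1
    return rounds


for n in (10, 100, 1_000):
    print(n, gossip_rounds(n), gossip_rounds(n, loss=0.5),
          f"ln(n)={math.log(n):.1f}")
```

Printing the round count next to ln(n), with and without 50% loss, gives a rough feel for how slowly the number of rounds grows with cluster size in this model.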
FAILURE DETECTION PROTOCOL
We can route jobs and communicate cluster updates; the last component is a failure detector.
FAILURE DETECTORS FOR ASYNCHRONOUS SYSTEMS
In asynchronous distributed systems, the detection of crash failures is imperfect. There will be false positives and false negatives.
Chandra, Tushar Deepak, and Sam Toueg. "Unreliable failure detectors for reliable distributed systems." Journal of the ACM (JACM) 43.2 (1996): 225-267.
FAILURE DETECTORS. PROPERTIES
Completeness - every crashed process is eventually suspected
Accuracy - no correct process is ever suspected
Speed - how fast we can detect a faulty node
Network message load - number of messages required during a protocol period
BASIC FAILURE DETECTOR
Each process periodically sends out an incremented
heartbeat counter to the outside world.
Another process is detected as failed when a heartbeat is not
received from it for some time
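The heartbeat scheme above fits in a small class. This is a sketch of the receiving side only; the `HeartbeatDetector` name, its `timeout` parameter and the injectable `clock` are assumptions chosen to keep the example testable.

```python
import time


class HeartbeatDetector:
    """Marks a peer as failed when its heartbeat counter stops advancing."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last = {}  # peer -> (highest counter seen, time it was seen)

    def on_heartbeat(self, peer, counter):
        """Record a heartbeat; stale or duplicate counters are ignored."""
        seen = self._last.get(peer)
        if seen is None or counter > seen[0]:
            self._last[peer] = (counter, self._clock())

    def failed_peers(self):
        """Peers whose counter has not advanced within `timeout` seconds."""
        now = self._clock()
        return sorted(p for p, (_, t) in self._last.items()
                      if now - t > self.timeout)


# Usage with a fake clock so the example does not need to sleep.
now = [0.0]
fd = HeartbeatDetector(timeout=3.0, clock=lambda: now[0])
fd.on_heartbeat("node-a", 1)
fd.on_heartbeat("node-b", 1)
now[0] = 2.0
fd.on_heartbeat("node-a", 2)    # node-b stays silent
now[0] = 4.0
print(fd.failed_peers())        # ['node-b']
```

Because every node has to hear from every other node, the message load of this design grows quadratically with the cluster size.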
BASIC FAILURE DETECTOR. PROPERTIES
Completeness - each process eventually misses a heartbeat
Speed - configurable, as little as one protocol interval
Accuracy - high, depends on speed
Network message load - O(n^2), each node sends a message to all other nodes
SWIM FAILURE DETECTOR
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
SWIM FAILURE DETECTOR
On each protocol round, a node sends only k = 3 ping messages
SWIM uses ping as the primary way to do FD, and indirect ping for better tolerance to network partitions
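One probe round with direct and indirect pings might look like the sketch below. The `probe` function and the `ping(src, dst)` transport callback are hypothetical names, and the indirect pings are issued one after another here for simplicity instead of in parallel.

```python
import random


def probe(me, target, members, ping, k=3, rng=random):
    """One SWIM-style probe of `target` from node `me`.

    `ping(src, dst)` is a hypothetical transport callback that returns
    True when dst acknowledged src's ping within the timeout.
    """
    if ping(me, target):
        return True                      # direct ack
    helpers = [m for m in members if m not in (me, target)]
    for helper in rng.sample(helpers, min(k, len(helpers))):
        # Indirect ping: the helper must be reachable from us and must
        # itself get an ack from the target.
        if ping(me, helper) and ping(helper, target):
            return True                  # indirect ack
    return False                         # no ack at all: report a failure


members = ["a", "b", "c", "d", "e"]
broken = {("a", "d")}                    # only the a -> d link is down


def flaky_link(src, dst):
    return (src, dst) not in broken


print(probe("a", "d", members, flaky_link))   # True, via an indirect ping
```

A single broken link no longer leads to a failure report, because any of the k helpers can vouch for the target.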
SWIM FAILURE DETECTOR. PROPERTIES
Completeness - each process will eventually be pinged
Speed - configurable, 1 protocol interval
Accuracy - 99.9% with delivery probability 0.95 and k = 3
Network message load - O(n), (4k + 2)n
SWIM VS CONNECTION LOSS. SUSPICION
SUBPROTOCOL
Provides a mechanism to reduce the rate of false positives by
“suspecting” a process before “declaring” it as failed within
the group.
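The suspect-before-declare idea can be written as a tiny state machine. It is a simplified sketch: the `SuspicionTracker` class, the single fixed `suspicion_timeout`, and the rule that a dead peer stays dead are assumptions made for this example.

```python
import enum


class State(enum.Enum):
    ALIVE = "alive"
    SUSPECT = "suspect"
    DEAD = "dead"


class SuspicionTracker:
    """Tracks peers through the alive -> suspect -> dead transitions."""

    def __init__(self, suspicion_timeout):
        self.suspicion_timeout = suspicion_timeout
        self.state = {}           # peer -> State
        self._suspected_at = {}   # peer -> time the suspicion started

    def probe_failed(self, peer, now):
        """A failed probe only raises suspicion; it never kills directly."""
        if self.state.get(peer, State.ALIVE) is State.ALIVE:
            self.state[peer] = State.SUSPECT
            self._suspected_at[peer] = now

    def heard_from(self, peer):
        """An ack or refutation from a suspect clears the suspicion."""
        if self.state.get(peer) is not State.DEAD:
            self.state[peer] = State.ALIVE
            self._suspected_at.pop(peer, None)

    def tick(self, now):
        """Declare suspects dead once the suspicion timeout has passed."""
        for peer, since in list(self._suspected_at.items()):
            if now - since >= self.suspicion_timeout:
                self.state[peer] = State.DEAD
                del self._suspected_at[peer]


tracker = SuspicionTracker(suspicion_timeout=5.0)
tracker.probe_failed("node-b", now=0.0)
tracker.tick(now=3.0)
print(tracker.state["node-b"])    # State.SUSPECT
```

The timeout is the knob that trades detection speed for fewer false positives: a slow node gets that long to prove it is still alive.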
SWIM VS PACKET ORDER
Ordering between messages is important, but total order is not required, only happens-before/causal ordering.
Logical timestamp for state updates
Peer-specific and only incremented by the peer itself
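A per-peer logical timestamp (called an incarnation number in the SWIM paper, reference 3) lets nodes merge state updates that arrive in any order. The sketch below assumes the paper's precedence rules; `should_apply`, `merge` and the tuple layout are illustrative names, not an existing API.

```python
RANK = {"alive": 0, "suspect": 1, "dead": 2}


def should_apply(current, update):
    """Return True when `update` overrides `current`.

    Both arguments are (status, incarnation) pairs about the same peer.
    """
    cur_status, cur_inc = current
    new_status, new_inc = update
    if cur_status == "dead":
        return False                  # assumed: dead is final in this sketch
    if new_status == "dead":
        return True                   # a death report overrides alive/suspect
    if new_inc != cur_inc:
        return new_inc > cur_inc      # newer incarnation wins
    # Same incarnation: suspicion beats alive, never the other way round.
    return RANK[new_status] > RANK[cur_status]


def merge(current, updates):
    """Fold a sequence of updates into the current view of one peer."""
    for update in updates:
        if should_apply(current, update):
            current = update
    return current


start = ("alive", 1)
# The peer was suspected at incarnation 1 and refuted with incarnation 2.
print(merge(start, [("suspect", 1), ("alive", 2)]))   # ('alive', 2)
print(merge(start, [("alive", 2), ("suspect", 1)]))   # ('alive', 2)
```

Both delivery orders converge on the same state, which is why no total order of messages is needed.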
SWIM VS NETWORK PARTITIONS
Nodes in each subnet can talk to each other and, as a result, declare peers on the other subnet dead.
How can we recover the cluster after the network heals?
Do not purge nodes marked as dead
Periodically try to rejoin
PROBLEM SOLVED! IMPLEMENTATION DETAILS
How can Python help with implementation?
What frameworks to use?
OVERVIEW OF FRAMEWORKS FOR BUILDING
CLUSTER AWARE SYSTEMS
Name | Language | Developer | Description
??? | Python | ??? | ???
RingPop | node.js | Uber | Used in services that match users and drivers, with follow-up location updates
Serf | Go | HashiCorp | Used in a number of applications, for instance in HPC to manage computing resources
Orleans | .NET | Microsoft | General-purpose framework, used in the Halo online game
Orbit/jGroups | Java | EA Games | Used in BioWare games such as Dragon Age (not sure where, though). Inspired by Orleans
riak_core | Erlang | Basho | Building block for the Riak database and Erlang distributed systems
Akka | Scala | Lightbend | General-purpose distributed systems framework, often used as a microservices platform
IMPROVEMENT: NETWORK COORDINATES
A famous paper from MIT (Vivaldi) describes synthetic network coordinates based on ping delays; they are used in Serf/Consul for data center failover
IMPROVEMENT: NETWORK COORDINATES
VISUALIZATION
Notice the coordinates drifting in space while the distance between clusters stays stable
IMPROVEMENT: PARTIAL VIEW FOR HUGE
CLUSTERS
For huge clusters full membership is not scalable; the HyParView paper proposes a partial membership protocol
IMPROVEMENT: PARTIAL VIEW IN CASE OF NODE
FAILURES
Even for failure rates as high as 95%, HyParView still
manages to maintain a reliability value in the order of
deliveries to 90% of the active processes.
IMPROVEMENT: DHT FOR MORE BALANCING
Orleans uses a one-hop distributed hash table that maps actors between machines; as a result, actors can be moved across the cluster
STATEFUL SERVICES CHALLENGES
Work distribution
Code deployment
Unbounded data structures
Memory management
Persistent strategies
READ MORE PAPERS!
REFERENCES
1. Karger, David, et al. "Consistent hashing and random trees: Distributed caching protocols for
relieving hot spots on the World Wide Web." Proceedings of the twenty-ninth annual ACM
symposium on Theory of computing. ACM, 1997.
2. Chandra, Tushar Deepak, and Sam Toueg. "Unreliable failure detectors for reliable distributed
systems." Journal of the ACM (JACM) 43.2 (1996): 225-267.
3. Das, Abhinandan, Indranil Gupta, and Ashish Motivala. "Swim: Scalable weakly-consistent
infection-style process group membership protocol." Dependable Systems and Networks, 2002.
DSN 2002. Proceedings. International Conference on. IEEE, 2002.
4. Dabek, Frank, et al. "Vivaldi: A decentralized network coordinate system." ACM SIGCOMM
Computer Communication Review 34.4 (2004): 15-26.
5. Leitao, Joao, José Pereira, and Luis Rodrigues. "HyParView: A membership protocol for reliable
gossip-based broadcast." Dependable Systems and Networks, 2007. DSN'07. 37th Annual
IEEE/IFIP International Conference on. IEEE, 2007.
6. Stoica, Ion, et al. "Chord: A scalable peer-to-peer lookup service for internet applications."
ACM SIGCOMM Computer Communication Review 31.4 (2001): 149-160.
7. Bailis, Peter, and Kyle Kingsbury. "The network is reliable." Queue 12.7 (2014): 20.
8. Lamport, Leslie. "Time, clocks, and the ordering of events in a distributed system."
Communications of the ACM 21.7 (1978): 558-565.
THANK YOU!
aio-libs: https://github.com/aio-libs
slides: https://jettify.github.io/pyconua2017

KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES

  • 1. KEY CONCEPTS FOR SCALABLE STATEFUL SERVICES Nikolay Novik https://github.com/jettify PyConUA 2017
  • 2. I AM ... Software Engineer: at DataRobot Ukraine Github: Twitter: aio-libs: My Projects: database clients: aiomysql, aioobc, aiogibson web and etc: aiomonitor, aiohttp_debugtoolbar, aiobotocore, aiohttp_mako, aiohttp_admin, aiorwlock https://github.com/jettify https://twitter.com/isinf https://github.com/aio-libs
  • 3. POLL: HAVE YOU EVER READ DYNAMO PAPER? 1. I read this papers. 2. I heard about this paper and know key ideas. 3. I think distributed systems is kinda cool.
  • 4. AGENDA 1. Motivation, why and when we might want to user stateful services. 2. Industry examples: Uber, Halo 4, DragonAge, HPC 3. Problem statement, required components 4. Overview of consistent hashing, gossip dissemination and swim failure detection 5. Possible improvements
  • 5. USE STATELESS (DUCK TAPE) WHEN YOU CAN! Stateless protocol is proved technique, use it like duck tape
  • 6. ISSUES WITH STATELESS SERVICES Soft real time is requirement State serialization Wasteful data fetching DB leaky transactions
  • 7. STATELESS SERVICE EXAMPLE Notice that user data fetched several times and cached on multiple servers.
  • 8. BENEFITS OF STATEFUL SERVICES Data locality, logic executed where data is stored with fast access Lower latency state in memory, no need extra network hops Higher performance no need to deserialize data
  • 9. STATEFUL SERVICE EXAMPLE Avoided are extra trips to the database which reduces latency. Even if the database is down the request can be handled.
  • 10. INDUSTRY EXAMPLE: UBER Geo spatial index service to match driver and user
  • 11. INDUSTRY EXAMPLE: HALO 4 Orleans used as backbone for server part of Halo game, including: presence, statistics, cheat detection, etc
  • 12. INDUSTRY EXAMPLE: HPC San Diego Supercomputer Center uses Serf to coordinate compute resources in multiple locations, cluster size is about 2k nodes
  • 13. LETS TRY TO SOLVE CLOSE TO REAL WORLD PROBLEM: PREDICTION SERVICE Services that predicts reselling prices of different products, based on product specification User enters used product specs, and obtains price estimate Each product category
  • 14. FUNCTIONAL REQUIREMENTS Dynamic scaling Fault tolerance Exploit data locality Flexible API
  • 15. REQUIRED COMPONENTS 1. Work distribution and routing move job request to appropriate node 2. Cluster membership update provide means to determine nodes participating in cluster in stable and cluster resizing conditions 3. Failure detector periodically check nodes and remove unresponsive/dead ones
  • 16. ROUTING. NAIVE SOLUTION WITH HARD CODED CLUSTER NODES Very easy to implement, viable solution when dynamic resizing is not required Does not support dynamic scaling in or scaling out Requires cluster restart for changing nodes configuration
  • 17. ROUTING. CONSISTENT HASHING SOLUTION This simple algorithms made Akamai multi billion worth company
  • 18. CONSISTENT HASHING. BASIC IDEA Consistent hashing minimizes number of keys, need to be remapped http://blog.carlosgaldino.com/consistent-hashing.html
  • 19. CONSISTENT HASHING. ADDING NODE In case of adding capacity, only fraction of keys will be moved
  • 20. CONSISTENT HASHING. REMOVING NODE In case of node failure next address will handle related keys
  • 21. CONSISTENT HASHING. VIRTUAL NODES Virtual nodes help with keys distribution, moving it close to 1/n
  • 22. CLUSTER MEMBERSHIP PROBLEM We have routing and job distribution, lets figure out how to add and remove nodes.
  • 23. WHY NOT JUST USE ZOOKEEPER/CONSUL/ECTD (OR IN OTHER WORDS ZAB, PAXOS, RAFT)? Issues Availability Performance Network partitions Operation overhead
  • 24. TYPICAL SYSTEM WITH COORDINATION Zookeeper forces own view Possible links: but for FD used only Nodes availability decision best when it is local n(n−1) 2 n
  • 25. CLUSTER MEMBERSHIP UPDATE PROBLEM. NAIVE SOLUTION Broadcast: could be used for cluster membership update Use network broadcast (usually disabled) Send message one by one to each peer(not reliable)
  • 26. Xerox invented gossip protocols: and . GOSSIP PROTOCOL anti-entropy rumor mongering
  • 27. GOSSIP OVERVIEW Basic gossip protocol Send message to k random peers peers retransmit message to next k random peers in steps, information will be disseminated log(n)
  • 28. GOSSIP PROTOCOL VS PACKET LOSS Heavy packet loss does not stop dissemination, it simply will take a bit longer, 2 times for 50% loss.
  • 29. FAILURE DETECTION PROTOCOL We can route jobs and communicate cluster update, last component is failure detector.
  • 30. Chandra, Tushar Deepak, and Sam Toueg. "Unreliable failure detectors for reliable distributed systems." Journal of the ACM (JACM) 43.2 (1996): 225-267. FAILURE DETECTORS FOR ASYNCHRONOUS SYSTEMS In asynchronous distributed systems, the detection of crash failures is imperfect. There will be false positives and false negatives.
  • 31. FAILURE DETECTORS. PROPERTIES Completeness - every crashed process is eventually suspected Accuracy - no correct process is ever suspected Speed - how fast we can detect fault node Network message load - number of messages required during protocol period
  • 32. BASIC FAILURE DETECTOR Each process periodically sends out an incremented heartbeat counter to the outside world. Another process is detected as failed when a heartbeat is not received from it for some time
  • 33. BASIC FAILURE DETECTOR. PROPERTIES Completeness each process eventually miss heartbeat Speed configurable, as little as protocol interval Accuracy high, depends on speed Network message load each node sends message to all other nodes O( )n 2
  • 34. SWIM FAILURE DETECTOR SWIM: Scalable Weakly-consistent Infection-style Process Group Membership. Protocol
  • 35. SWIM FAILURE DETECTOR On each protocol round, node sends only pings messages SWIM uses ping as primary way to do FD, and indirect ping for better tolerance to network partitions k = 3
  • 36. SWIM FAILURE DETECTOR. PROPERTIES Completeness: each process eventually will be pinged. Speed: configurable, 1 protocol interval. Accuracy: 99.9% with delivery probability 0.95 and k = 3. Network message load: O(n), (4k + 2)n messages.
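The probing step of SWIM can be sketched as follows. The network is replaced by a hypothetical `can_reach(src, dst)` callback that stands for "a ping from src to dst was acknowledged in time"; the function names and the label `"self"` for the local node are assumptions for illustration only.

```python
import random


def probe(target, peers, can_reach, k=3, rng=random):
    """Return True if `target` answered a direct or an indirect ping.

    `peers` is a list of other known members (without `target`);
    `can_reach` is a hypothetical stand-in for real network I/O.
    """
    if can_reach("self", target):  # direct ping
        return True
    # direct ping failed: ask up to k random peers to ping the target
    helpers = rng.sample(peers, min(k, len(peers)))
    return any(
        can_reach("self", helper) and can_reach(helper, target)
        for helper in helpers
    )


def worst_case_messages(n, k=3):
    """Message load per protocol period from the slide: (4k + 2) * n."""
    return (4 * k + 2) * n
```

Because every node probes only one target per round and asks at most k helpers, the total load grows linearly with the cluster size instead of quadratically as with all-to-all heartbeats.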
  • 37. SWIM VS CONNECTION LOSS. SUSPICION SUBPROTOCOL Provides a mechanism to reduce the rate of false positives by “suspecting” a process before “declaring” it as failed within the group.
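The suspicion step can be pictured as a small state machine: a failed probe only moves a peer to "suspect", and the peer is declared dead only if no ack arrives before a timeout. This is a sketch; the class `SuspicionTracker`, its method names, and the explicit `now` arguments are assumptions, not part of any particular library.

```python
ALIVE, SUSPECT, DEAD = "alive", "suspect", "dead"


class SuspicionTracker:
    def __init__(self, suspicion_timeout):
        self.suspicion_timeout = suspicion_timeout
        self.state = {}         # peer -> ALIVE / SUSPECT / DEAD
        self.suspected_at = {}  # peer -> time when suspicion started

    def probe_failed(self, peer, now):
        # a failed probe only raises suspicion, it does not declare death
        if self.state.get(peer, ALIVE) == ALIVE:
            self.state[peer] = SUSPECT
            self.suspected_at[peer] = now

    def ack_received(self, peer):
        # any sign of life clears the suspicion (but not a confirmed death)
        if self.state.get(peer) != DEAD:
            self.state[peer] = ALIVE
            self.suspected_at.pop(peer, None)

    def tick(self, now):
        # declare peers dead once they stayed suspected for too long
        for peer, since in list(self.suspected_at.items()):
            if now - since >= self.suspicion_timeout:
                self.state[peer] = DEAD
                del self.suspected_at[peer]
```

A longer `suspicion_timeout` lowers the false positive rate at the price of slower detection of real failures.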
  • 38. SWIM VS PACKET ORDER Ordering between messages is important, but total order is not required, only happens-before (causal) ordering. A logical timestamp is used for state updates; it is peer specific and only incremented by that peer.
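One way to apply such a per-peer logical timestamp (called an incarnation number in the SWIM paper) is to decide, for two updates about the same peer, which one wins. The sketch below follows the precedence rules as described in the SWIM paper to the best of my reading; the function name `overrides` and the tuple layout are assumptions.

```python
def overrides(new, old):
    """Return True if update `new` should replace `old`.

    Both updates describe the same peer and have the form
    (status, incarnation) with status "alive", "suspect" or "dead".
    """
    new_status, i = new
    old_status, j = old
    if old_status == "dead":
        return False  # a confirmed death is never overridden
    if new_status == "dead":
        return True   # a confirmation overrides alive and suspect
    if new_status == "suspect" and old_status == "alive":
        return i >= j  # suspicion wins against the same incarnation
    return i > j       # otherwise only a newer incarnation wins
```

A suspected peer refutes the suspicion by incrementing its own incarnation number and gossiping a fresh "alive" update, which then overrides the older "suspect" one regardless of the order in which the messages arrive.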
  • 39. SWIM VS NETWORK PARTITIONS Nodes in each subnet can talk to each other and as a result declare peers on the other subnet as dead. How can we recover the cluster after the network heals? Do not purge nodes once they are declared dead; periodically try to rejoin them.
  • 40. PROBLEM SOLVED! IMPLEMENTATION DETAILS How can Python help with the implementation? Which frameworks to use?
  • 41. OVERVIEW OF FRAMEWORKS FOR BUILDING CLUSTER AWARE SYSTEMS
    Name | Language | Developer | Description
    ??? | Python | ??? | ???
    RingPop | Node.js | Uber | Used in services for matching user and driver, with follow-up location updates
    Serf | Go | HashiCorp | Used in a number of applications, for instance in HPC to manage computing resources
    Orleans | .NET | Microsoft | General purpose framework, used in the Halo online game
    Orbit/jGroups | Java | EA Games | Used in BioWare games such as DragonAge, not sure where though. Inspired by Orleans
    riak_core | Erlang | Basho | Building block for the Riak database and Erlang distributed systems
    Akka | Scala | Lightbend | General purpose distributed systems framework, often used as a microservices platform
  • 42. IMPROVEMENT: NETWORK COORDINATES The famous Vivaldi paper from MIT describes synthetic network coordinates based on ping delays; they are used in Serf/Consul for data center failover.
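The core idea of synthetic coordinates can be sketched with the simple constant-timestep update from the Vivaldi paper (Dabek et al. in the references): after measuring a round-trip time to a peer, a node moves its own coordinate along the line to that peer so that the coordinate distance gets closer to the measured delay. The function name, the fixed `delta`, and the fallback direction for coincident points are assumptions; production implementations such as the one in Serf are more elaborate.

```python
import math


def vivaldi_step(xi, xj, rtt, delta=0.25):
    """Return the new coordinate of node i after one RTT sample to node j."""
    dist = math.dist(xi, xj)
    if dist == 0:
        # coincident points: pick an arbitrary direction (assumption)
        direction = [1.0] + [0.0] * (len(xi) - 1)
    else:
        # unit vector pointing from j towards i
        direction = [(a - b) / dist for a, b in zip(xi, xj)]
    error = rtt - dist  # positive: the nodes are too close in coordinate space
    return [a + delta * error * d for a, d in zip(xi, direction)]


if __name__ == "__main__":
    x, peer = [0.0, 0.0], [10.0, 0.0]
    for _ in range(50):
        x = vivaldi_step(x, peer, rtt=20.0)
    print(x, math.dist(x, peer))
```

With coordinates in place, the latency between two nodes that never pinged each other can be estimated as the distance between their coordinates.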
  • 43. IMPROVEMENT: NETWORK COORDINATES VISUALIZATION Notice the coordinates drifting in space while the distance between clusters stays stable.
  • 44. IMPROVEMENT: PARTIAL VIEW FOR HUGE CLUSTERS For huge clusters full membership is not scalable; the HyParView paper proposes a partial membership protocol.
  • 45. IMPROVEMENT: PARTIAL VIEW IN CASE OF NODE FAILURES Even for failure rates as high as 95%, HyParView still manages to maintain a reliability value in the order of deliveries to 90% of the active processes.
  • 46. IMPROVEMENT: DHT FOR MORE BALANCING Orleans uses a one-hop distributed hash table that maps actors to machines; as a result, actors can be moved across the cluster.
  • 47. STATEFUL SERVICES CHALLENGES Work distribution; code deployment; unbounded data structures; memory management; persistence strategies.
  • 49. REFERENCES
    1. Karger, David, et al. "Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web." Proceedings of the twenty-ninth annual ACM symposium on Theory of computing. ACM, 1997.
    2. Chandra, Tushar Deepak, and Sam Toueg. "Unreliable failure detectors for reliable distributed systems." Journal of the ACM (JACM) 43.2 (1996): 225-267.
    3. Das, Abhinandan, Indranil Gupta, and Ashish Motivala. "Swim: Scalable weakly-consistent infection-style process group membership protocol." Dependable Systems and Networks, 2002. DSN 2002. Proceedings. International Conference on. IEEE, 2002.
    4. Dabek, Frank, et al. "Vivaldi: A decentralized network coordinate system." ACM SIGCOMM Computer Communication Review 34.4 (2004): 15-26.
    5. Leitao, Joao, José Pereira, and Luis Rodrigues. "HyParView: A membership protocol for reliable gossip-based broadcast." Dependable Systems and Networks, 2007. DSN'07. 37th Annual IEEE/IFIP International Conference on. IEEE, 2007.
    6. Stoica, Ion, et al. "Chord: A scalable peer-to-peer lookup service for internet applications." ACM SIGCOMM Computer Communication Review 31.4 (2001): 149-160.
    7. Bailis, Peter, and Kyle Kingsbury. "The network is reliable." Queue 12.7 (2014): 20.
    8. Lamport, Leslie. "Time, clocks, and the ordering of events in a distributed system." Communications of the ACM 21.7 (1978): 558-565.
  • 50. THANK YOU! aio-libs: https://github.com/aio-libs slides: https://jettify.github.io/pyconua2017