SlideShare une entreprise Scribd logo
1  sur  31
Chord: A Scalable Peer-
 to-peer Lookup Protocol
 for Internet Applications
  Ion Stoica, Robert Morris, David Liben-Nowell, David R.
Karger, M. Frans Kaashoek, Frank Dabek Hari Balakrishnan
About Chord

•   Protocol and algorithm for distributed hash
    table (DHT)

•   Structured-decentralized P2P overlay
    network
About Chord
•   Supports just one operation: lookup(key)

    •   Given a key, it maps the key onto a node

    •   Returns the IP-address of the node

•   Associate a value with each key and store
    the pair on the resulting node:

    •   Distributed hash table

•   Uses consistent hashing for this operation
Chord properties (1)
•   Load balancing:

    •   spreads keys evenly over nodes

•   Decentralized:

    •   fully distributed, all nodes equally
        important

    •   does not take into account heterogeneity
        of nodes
Chord properties (2)
•   Scalability:
    •   lookup cost: O(log(N))
•   Availability:
    •   automatically updates lookup tables as
        nodes join, leave, fail
•   Flexible naming:
    •   Flat key-space, no constraint on structure
        of keys
Traditional hashing

•   Given an object, assign it to a bin from a set
    of N bins

•   Each bin: roughly same amount of objects

•   Problem:

    •   If N changes, all objects need to be
        reassigned
Consistent hashing
•   Evenly distributes K objects across N bins,
    about K/N each

•   When N changes:

    •   Not all objects need to be moved

    •   Only O(K/N) need to

•   Uses a base hash function, such as SHA-1
Chord protocol
•   Assign each node and key an m-bit
    identifier using SHA-1:

    •   node: hash the IP-address

    •   key: hash the key

•   Identifiers are ordered on an identifier circle
    module 2m, the Chord ring:

    •   circle of numbers from 0 to 2m-1
Assigning keys to nodes

 •   Assigning key k to a node:

     •   first node whose identifier is equal to or
         follows the identifier of key k in the
         identifier space

     •   successor node of k: successor(k)

     •   first node clockwise from k on Chord ring
Chord ring (m=6)
                          N1



        N56                          N8
                                           K10
 K54
   N51
                                           N14
  N48



                                       N21
        N42

                                     K24
          K38 N38
                    N32
                               K30
Minimal disruption
•   Minimal disruption when node joins/leaves

•   n joins:

    •   certain keys assigned to successor(n) are
        now assigned to n

•   n leaves:

    •   all keys assigned to n are now assigned
        to successor(n)
Simple Chord lookup
•   Simplest Chord lookup algorithm:

    •   Each node only needs to know its
        successor on Chord ring

    •   Query for identifier id is passed around
        Chord ring until successor(id) is found

    •   Result returned along reverse path

•   Easy but not efficient: # messages O(N)
Simple Chord lookup
                                                                        N1

                                                                                     lookup(K54)
/ ask node n to fi nd the successor of id                                              N8
                                               K54 N56
n.find successor(id) the successor
    // ask node n to find
  if// of id successor])
     (id ∈ (n,
    n.find_successor(id)
     return successor;                          N51
                                                                                           N14
  else return (n, successor])
      if (id ∈
                successor;                     N48
     // forward the query around the circle
      else
     return successor.fiquery around the
         // forward the nd successor(id);
       // circle
       return successor.find_successor(id);                                             N21
                                                     N42

                    (a)
                                                           N38
                                                                  N32

                                                                     (b)
seudocode to fi nd the successor node of an identifi er id. Remote procedure calls and variable lookups
 a query from node 8 for key 54, using the pseudocode in Figure 3(a).
Finger table
•   Chord maintains additional routing info:

    •   not essential for correctness

    •   but allows acceleration of lookups

•   Each node maintains finger table with m
    entries:

    •   n.finger[i] = successor(n + 2i-1)
Finger table
le (but slow) pseudocode to fi nd the successor node of an identifi er id. Rem
e path taken by a query from node 8 for key 54, using the pseudocode in Figu

                                        N1

                                                                  Finger table
                                                      N8
                                                       +1         N8 + 1   N14
                                                                  N8 + 2   N14   K54 N
                                                     +2           N8 + 4   N14
                                                   +4             N8 + 8   N21
             N51                                                                  N51
                            +32               +8            N14   N8 +16   N32
                                                                  N8 +32   N42
            N48                         +16                                      N48



                                                          N21
               N42
                                                                                       N


                      N38
                                  N32
Improved lookup
•   Lookup for id now works as follows:

    •   If id falls between n and successor(n), successor(n) is
        returned

    •   Otherwise, lookup is performed at node n’, which is the
        node in the finger table of n that most immediately
        precedes id

•   Since each node has finger entries at power of two
    intervals around the identifier circle, each node can
    forward a query at least halfway along the remaining
    distance between the node and the target identifier.

•   Thus O(log(N)) nodes need to be contacted
N32

                                                          (b)

                        Chord protocol
 successor node of an identifi er id. Remote procedure calls and variable lookups ar
for key 54, using the pseudocode in Figure 3(a).

1                                                                     N1

                       Finger table                                        lookup(54)
 // ask node n to find the successor
           N8
 // of id    +1       N8 + 1 N14                                           N8
 n.find_successor(id) N8 + 2 N14        K54 N56
   if (id ∈ (n, successor]) N14
                      N8 + 4
          +2
     return successor; + 8 N21
        +4            N8
   else                                  N51
     +8         N14 N8 +16 N32                                                  N14
     n′ = closest_preceding_node(id);
                      N8 +32 N42
16   return n′.find_successor(id);      N48
    // search the local table for the
    // highest predecessor of id
    n.closest_preceding_node(id)
                 N21                                                        N21
      for i = m downto 1
        if (finger[i] ∈ (n, id))              N42
          return finger[i];
      return n;
                                                    N38
                                                                N32


(a)                                                               (b)
Coping with joining/leaving

 •   Essential that successor pointers stay up to
     date

 •   Each node periodically runs stabilization
     protocol which updates Chord’s successor
     pointers and finger tables

 •   For node n to join, it must know a node n’ of
     the Chord ring
Coping with joining/leaving
 •   To join, n performs join()

     •   node n asks n’ to find_successor(n), which
         n sets as its sucessor

 •   Each node periodically performs stabilize() to
     learn about newly joined nodes

     •   n asks his successor s for s’ predecessor p

     •   decides whether p should be n’s successor
         instead
Coping with joining/leaving
•   stabilize() also notifies n’s successor s of n’s
    existence

    •   s might change it’s predecessor to n

•   Each node periodically calls fix_fingers()

    •   makes sure fingers are up to date

    •   is how new nodes acquire finger table

    •   is how old nodes add new nodes to their
        tables
Coping with joining/leaving
                                        // n′ thinks it might be our
//join a Chord ring containing node n′.
                                        // predecessor.
n.join(n′)
                                        n.notify(n′)
  predecessor = nil;
                                          if (predecessor is nil or
  successor = n′.find_successor(n);
                                                   n′ ∈ (predecessor, n))
                                            predecessor = n′;
// called periodically. verifies n’s
// immediate successor, and tells the
// successor about n.
                                        // called periodically. refreshes
n.stabilize()
                                        // finger table entries. next stores
  x = successor.predecessor;
                                        // the index of the next finger to fix.
  if (x ∈ (n, successor))
                                        n.fix_fingers()
    successor = x;
                                          next = next + 1 ;
  successor.notify(n);
                                          if (next > m)
                                            next = 1;
                                          finger[next] = find_successor(n+2next−1);
Coping with joining/leaving

                              N21                          N21                           N21                         N21


          successor(N21)

                                                     N26                           N26                         N26
                                                                                   K24                         K24
           N32                         N32                          N32                           N32
           K24                         K24                          K24                           K30

           K30                         K30                          K30



                  (a)                         (b)                            (c)                            (d)

 lustrating the join operation. Node 26 joins the system between nodes 21 and 32. The arcs represent the successor relationsh
  to node 32; (b) node 26 fi nds its successor (i.e., node 32) and points to it; (c) node 26 copies all keys less than 26 from node
ates the successor of node 21 to node 26.


most immediate predecessor of id. In addition,                     its r successors means that it can inform the hig
Coping with joining/leaving
 •   Lookup will only fail if successor pointers are
     not up to date => retry after a pause

 •   As soon as successor pointers are correct,
     lookup will work

     •   over time, finger tables will be up to date
         and linear search is no longer needed

 •   Old finger tables can still be used and in
     general lookup will remain O(log(N)) even if
     not all finger tables are up to date
Successor list
•   Lookup fails if a node doesn’t know its correct
    successor

    •   can occur when nodes fail

•   Solution: each node maintains a successor
    list:

    •   contains the node’s first r successors

    •   if first successor does not respond, try the
        rest of the list
Replication

•   Successor list can also be used for replication

    •   application on top of Chord might store
        replicas of data associated with a key on k
        nodes from n’s successor list

    •   Chord can inform application when this
        successor list changes, so application can
        create new replicas
Chord in practice
•   In practice, Chord ring will never be in stable
    state:

    •   Joins and leaves occur continuously

    •   Interleaved by stabilization algorithm

    •   Not enough time to stabilize before new
        change

•   If stabilization is run at a certain rate:

    •   Chord ring will be continuously “almost
        stable” and lookup will be fast and correct
Virtual nodes
•   Chord load balancing can be improved
    through virtual nodes:

    •   node ids do not uniformly cover entire id
        space

    •   to make (# keys / node) more uniform,
        associate keys with virtual nodes

    •   map r virtual nodes, with unrelated ids, to
        each real node

•   Tradeoff: each real node needs r times as
    much space to store finger table
Network latency
•   Chord path length is O(log(N)), but latency
    can be quite large

    •   Nodes close in id space can be far away in
        the underlying network

•   Idea:

    •   choose next-hop finger based on progress
        in id space and latency in network

        •   maximize progress in id space

        •   minimize latency
Network latency
•   Different approach to improving latency:

    •   each entry in finger table can point to list of
        nodes instead of a single node

    •   nodes have similar ids and are equivalent
        for routing purposes

    •   latency of the nodes could be different

    •   use closest node in terms of network
        distance in the underlying network
Applications of Chord
•   Chord File System (CFS):

    •   Chord locates storage blocks

•   Distributed indexes:

    •   solves Napster’s central server issue
•   Large-scale combinatorial search:

    •   candidate solutions are keys, Chord maps these to
        machines which test them

•   Cooperative mirroring:

    •   Uses Chord to provide load balancing

•   Time-shared storage:

    •   Uses Chord to provide availability
?

Contenu connexe

Tendances

Propositional logic
Propositional logicPropositional logic
Propositional logic
Rushdi Shams
 

Tendances (20)

Sorting
SortingSorting
Sorting
 
Parallel sorting Algorithms
Parallel  sorting AlgorithmsParallel  sorting Algorithms
Parallel sorting Algorithms
 
Merkle Trees and Fusion Trees
Merkle Trees and Fusion TreesMerkle Trees and Fusion Trees
Merkle Trees and Fusion Trees
 
Binary Search Tree
Binary Search TreeBinary Search Tree
Binary Search Tree
 
Complements
ComplementsComplements
Complements
 
Data Structure and Algorithms Merge Sort
Data Structure and Algorithms Merge SortData Structure and Algorithms Merge Sort
Data Structure and Algorithms Merge Sort
 
Quick sort-Data Structure
Quick sort-Data StructureQuick sort-Data Structure
Quick sort-Data Structure
 
Time complexity
Time complexityTime complexity
Time complexity
 
Branch & bound
Branch & boundBranch & bound
Branch & bound
 
Introduction to Neural networks (under graduate course) Lecture 7 of 9
Introduction to Neural networks (under graduate course) Lecture 7 of 9Introduction to Neural networks (under graduate course) Lecture 7 of 9
Introduction to Neural networks (under graduate course) Lecture 7 of 9
 
First Order Logic resolution
First Order Logic resolutionFirst Order Logic resolution
First Order Logic resolution
 
CDMA - USE WALSH TABLE TO GENERATE CHIP SEQUENCE
CDMA - USE WALSH TABLE TO GENERATE CHIP SEQUENCE CDMA - USE WALSH TABLE TO GENERATE CHIP SEQUENCE
CDMA - USE WALSH TABLE TO GENERATE CHIP SEQUENCE
 
Uncertain knowledge and reasoning
Uncertain knowledge and reasoningUncertain knowledge and reasoning
Uncertain knowledge and reasoning
 
1.5 binary search tree
1.5 binary search tree1.5 binary search tree
1.5 binary search tree
 
Propositional logic
Propositional logicPropositional logic
Propositional logic
 
Red black tree
Red black treeRed black tree
Red black tree
 
Priority Queue in Data Structure
Priority Queue in Data StructurePriority Queue in Data Structure
Priority Queue in Data Structure
 
Digital Search Tree
Digital Search TreeDigital Search Tree
Digital Search Tree
 
Multi ways trees
Multi ways treesMulti ways trees
Multi ways trees
 
Kademlia introduction
Kademlia introductionKademlia introduction
Kademlia introduction
 

Plus de GertThijs

Easter presentation
Easter presentationEaster presentation
Easter presentation
GertThijs
 
X-mas thesis presentation
X-mas thesis presentationX-mas thesis presentation
X-mas thesis presentation
GertThijs
 
Game Interaction Storyboards
Game Interaction StoryboardsGame Interaction Storyboards
Game Interaction Storyboards
GertThijs
 
Game proposals
Game proposalsGame proposals
Game proposals
GertThijs
 
Thesis Overview
Thesis OverviewThesis Overview
Thesis Overview
GertThijs
 
Eindpresentatie CHI11 - Team Chili Peppers
Eindpresentatie CHI11 - Team Chili PeppersEindpresentatie CHI11 - Team Chili Peppers
Eindpresentatie CHI11 - Team Chili Peppers
GertThijs
 
Storyboard 2 #chi11
Storyboard 2 #chi11Storyboard 2 #chi11
Storyboard 2 #chi11
GertThijs
 

Plus de GertThijs (7)

Easter presentation
Easter presentationEaster presentation
Easter presentation
 
X-mas thesis presentation
X-mas thesis presentationX-mas thesis presentation
X-mas thesis presentation
 
Game Interaction Storyboards
Game Interaction StoryboardsGame Interaction Storyboards
Game Interaction Storyboards
 
Game proposals
Game proposalsGame proposals
Game proposals
 
Thesis Overview
Thesis OverviewThesis Overview
Thesis Overview
 
Eindpresentatie CHI11 - Team Chili Peppers
Eindpresentatie CHI11 - Team Chili PeppersEindpresentatie CHI11 - Team Chili Peppers
Eindpresentatie CHI11 - Team Chili Peppers
 
Storyboard 2 #chi11
Storyboard 2 #chi11Storyboard 2 #chi11
Storyboard 2 #chi11
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Dernier (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Chord presentation

  • 1. Chord: A Scalable Peer- to-peer Lookup Protocol for Internet Applications Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek Hari Balakrishnan
  • 2. About Chord • Protocol and algorithm for distributed hash table (DHT) • Structured-decentralized P2P overlay network
  • 3. About Chord • Supports just one operation: lookup(key) • Given a key, it maps the key onto a node • Returns the IP-address of the node • Associate a value with each key and store the pair on the resulting node: • Distributed hash table • Uses consistent hashing for this operation
  • 4. Chord properties (1) • Load balancing: • spreads keys evenly over nodes • Decentralized: • fully distributed, all nodes equally important • does not take into account heterogeneity of nodes
  • 5. Chord properties (2) • Scalability: • lookup cost: O(log(N)) • Availability: • automatically updates lookup tables as nodes join, leave, fail • Flexible naming: • Flat key-space, no constraint on structure of keys
  • 6. Traditional hashing • Given an object, assign it to a bin from a set of N bins • Each bin: roughly same amount of objects • Problem: • If N changes, all objects need to be reassigned
  • 7. Consistent hashing • Evenly distributes K objects across N bins, about K/N each • When N changes: • Not all objects need to be moved • Only O(K/N) need to • Uses a base hash function, such as SHA-1
  • 8. Chord protocol • Assign each node and key an m-bit identifier using SHA-1: • node: hash the IP-address • key: hash the key • Identifiers are ordered on an identifier circle module 2m, the Chord ring: • circle of numbers from 0 to 2m-1
  • 9. Assigning keys to nodes • Assigning key k to a node: • first node whose identifier is equal to or follows the identifier of key k in the identifier space • successor node of k: successor(k) • first node clockwise from k on Chord ring
  • 10. Chord ring (m=6) N1 N56 N8 K10 K54 N51 N14 N48 N21 N42 K24 K38 N38 N32 K30
  • 11. Minimal disruption • Minimal disruption when node joins/leaves • n joins: • certain keys assigned to successor(n) are now assigned to n • n leaves: • all keys assigned to n are now assigned to successor(n)
  • 12. Simple Chord lookup • Simplest Chord lookup algorithm: • Each node only needs to know its successor on Chord ring • Query for identifier id is passed around Chord ring until successor(id) is found • Result returned along reverse path • Easy but not efficient: # messages O(N)
  • 13. Simple Chord lookup N1 lookup(K54) / ask node n to fi nd the successor of id N8 K54 N56 n.find successor(id) the successor // ask node n to find if// of id successor]) (id ∈ (n, n.find_successor(id) return successor; N51 N14 else return (n, successor]) if (id ∈ successor; N48 // forward the query around the circle else return successor.fiquery around the // forward the nd successor(id); // circle return successor.find_successor(id); N21 N42 (a) N38 N32 (b) seudocode to fi nd the successor node of an identifi er id. Remote procedure calls and variable lookups a query from node 8 for key 54, using the pseudocode in Figure 3(a).
  • 14. Finger table • Chord maintains additional routing info: • not essential for correctness • but allows acceleration of lookups • Each node maintains finger table with m entries: • n.finger[i] = successor(n + 2i-1)
  • 15. Finger table le (but slow) pseudocode to fi nd the successor node of an identifi er id. Rem e path taken by a query from node 8 for key 54, using the pseudocode in Figu N1 Finger table N8 +1 N8 + 1 N14 N8 + 2 N14 K54 N +2 N8 + 4 N14 +4 N8 + 8 N21 N51 N51 +32 +8 N14 N8 +16 N32 N8 +32 N42 N48 +16 N48 N21 N42 N N38 N32
  • 16. Improved lookup • Lookup for id now works as follows: • If id falls between n and successor(n), successor(n) is returned • Otherwise, lookup is performed at node n’, which is the node in the finger table of n that most immediately precedes id • Since each node has finger entries at power of two intervals around the identifier circle, each node can forward a query at least halfway along the remaining distance between the node and the target identifier. • Thus O(log(N)) nodes need to be contacted
  • 17. N32 (b) Chord protocol successor node of an identifi er id. Remote procedure calls and variable lookups ar for key 54, using the pseudocode in Figure 3(a). 1 N1 Finger table lookup(54) // ask node n to find the successor N8 // of id +1 N8 + 1 N14 N8 n.find_successor(id) N8 + 2 N14 K54 N56 if (id ∈ (n, successor]) N14 N8 + 4 +2 return successor; + 8 N21 +4 N8 else N51 +8 N14 N8 +16 N32 N14 n′ = closest_preceding_node(id); N8 +32 N42 16 return n′.find_successor(id); N48 // search the local table for the // highest predecessor of id n.closest_preceding_node(id) N21 N21 for i = m downto 1 if (finger[i] ∈ (n, id)) N42 return finger[i]; return n; N38 N32 (a) (b)
  • 18. Coping with joining/leaving • Essential that successor pointers stay up to date • Each node periodically runs stabilization protocol which updates Chord’s successor pointers and finger tables • For node n to join, it must know a node n’ of the Chord ring
  • 19. Coping with joining/leaving • To join, n performs join() • node n asks n’ to find_successor(n), which n sets as its sucessor • Each node periodically performs stabilize() to learn about newly joined nodes • n asks his successor s for s’ predecessor p • decides whether p should be n’s successor instead
  • 20. Coping with joining/leaving • stabilize() also notifies n’s successor s of n’s existence • s might change it’s predecessor to n • Each node periodically calls fix_fingers() • makes sure fingers are up to date • is how new nodes acquire finger table • is how old nodes add new nodes to their tables
  • 21. Coping with joining/leaving // n′ thinks it might be our //join a Chord ring containing node n′. // predecessor. n.join(n′) n.notify(n′) predecessor = nil; if (predecessor is nil or successor = n′.find_successor(n); n′ ∈ (predecessor, n)) predecessor = n′; // called periodically. verifies n’s // immediate successor, and tells the // successor about n. // called periodically. refreshes n.stabilize() // finger table entries. next stores x = successor.predecessor; // the index of the next finger to fix. if (x ∈ (n, successor)) n.fix_fingers() successor = x; next = next + 1 ; successor.notify(n); if (next > m) next = 1; finger[next] = find_successor(n+2next−1);
  • 22. Coping with joining/leaving N21 N21 N21 N21 successor(N21) N26 N26 N26 K24 K24 N32 N32 N32 N32 K24 K24 K24 K30 K30 K30 K30 (a) (b) (c) (d) lustrating the join operation. Node 26 joins the system between nodes 21 and 32. The arcs represent the successor relationsh to node 32; (b) node 26 fi nds its successor (i.e., node 32) and points to it; (c) node 26 copies all keys less than 26 from node ates the successor of node 21 to node 26. most immediate predecessor of id. In addition, its r successors means that it can inform the hig
  • 23. Coping with joining/leaving • Lookup will only fail if successor pointers are not up to date => retry after a pause • As soon as successor pointers are correct, lookup will work • over time, finger tables will be up to date and linear search is no longer needed • Old finger tables can still be used and in general lookup will remain O(log(N)) even if not all finger tables are up to date
  • 24. Successor list • Lookup fails if a node doesn’t know its correct successor • can occur when nodes fail • Solution: each node maintains a successor list: • contains the node’s first r successors • if first successor does not respond, try the rest of the list
  • 25. Replication • Successor list can also be used for replication • application on top of Chord might store replicas of data associated with a key on k nodes from n’s successor list • Chord can inform application when this successor list changes, so application can create new replicas
  • 26. Chord in practice • In practice, Chord ring will never be in stable state: • Joins and leaves occur continuously • Interleaved by stabilization algorithm • Not enough time to stabilize before new change • If stabilization is run at a certain rate: • Chord ring will be continuously “almost stable” and lookup will be fast and correct
  • 27. Virtual nodes • Chord load balancing can be improved through virtual nodes: • node ids do not uniformly cover entire id space • to make (# keys / node) more uniform, associate keys with virtual nodes • map r virtual nodes, with unrelated ids, to each real node • Tradeoff: each real node needs r times as much space to store finger table
  • 28. Network latency • Chord path length is O(log(N)), but latency can be quite large • Nodes close in id space can be far away in the underlying network • Idea: • choose next-hop finger based on progress in id space and latency in network • maximize progress in id space • minimize latency
  • 29. Network latency • Different approach to improving latency: • each entry in finger table can point to list of nodes instead of a single node • nodes have similar ids and are equivalent for routing purposes • latency of the nodes could be different • use closest node in terms of network distance in the underlying network
  • 30. Applications of Chord • Chord File System (CFS): • Chord locates storage blocks • Distributed indexes: • solves Napster’s central server issue • Large-scale combinatorial search: • candidate solutions are keys, Chord maps these to machines which test them • Cooperative mirroring: • Uses Chord to provide load balancing • Time-shared storage: • Uses Chord to provide availability
  • 31. ?

Notes de l'éditeur

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n