etcd - Mission-Critical Key-Value Store

Xiang Li, Head of Distributed Systems at CoreOS
SF CoreOS Meetup
May 25th, 2016

Published in: Technology

  1. Xiang Li (xiang.li@coreos.com) - etcd: mission-critical key-value store - CoreOS Meetup, May 25th
  2. Uncoordinated Upgrades
  3. Uncoordinated Upgrades: Unavailable
  4. Motivation: CoreOS cluster reboot lock - Decrement a semaphore key atomically - Reboot and wait... - After reboot, increment the semaphore key
  5. CoreOS updates coordination: 3
  6. CoreOS updates coordination: 3
  7. CoreOS updates coordination: 2
  8. CoreOS updates coordination: 0
  9. CoreOS updates coordination: 0
  10. CoreOS updates coordination: 0
  11. CoreOS updates coordination: 0
  12. CoreOS updates coordination: 1
  13. CoreOS updates coordination: 0
  14. CoreOS updates coordination
  15. Store Application Configuration: config
  16. Store Application Configuration: config, Start / Restart
  17. Store Application Configuration: config, Update
  18. Store Application Configuration: config, Unavailable
  19. Requirements: Strong Consistency - mutually exclusive at any time, for locking purposes; Highly Available - resilient to single points of failure and network partitions; Watchable - push configuration updates to applications
  20. Requirements: CAP - We want CP - We want something like Paxos
  21. Common problem: Google - “All” infrastructure relies on Paxos (diagram labels: GFS, Bigtable, Spanner, CFS, Chubby, Paxos)
  22. Common problem: Amazon - Replicated log powers EC2; Microsoft - Boxwood powers storage infrastructure; Hadoop - ZooKeeper is the heart of the ecosystem
  23. etcd Key-Value Store: Fully Replicated, Highly Available, Consistent
  24. Key-value Operations: PUT(foo, bar), GET(foo), DELETE(foo); Watch(foo); CAS(foo, bar, bar1)
  25. Demo: play.etcd.io
  26. etcd Operationality: Runtime Reconfiguration, Point-in-time Backup, Extensive Metrics
  27. etcd v3: Successor of etcd v2
  28. etcd v3: Better Performance
  29. etcd v3: More Efficient APIs
  30. Multi-Version
      Put(foo, bar)
      Put(foo, bar1)
      Put(foo, bar2)
      Get(foo) -> bar2
  31. Multi-Version
      Put(foo, bar)
      Put(foo, bar1)
      Put(foo, bar2)
      Get(foo, 1) -> bar
  32. Mini-Transactions
      Tx.If(
          Compare(Value("foo"), ">", "bar"),
          Compare(Version("foo"), "=", 2),
          ...
      ).Then(
          Put("ok","true")...
      ).Else(
          Put("ok","false")...
      ).Commit()
  33. Leases
      l = CreateLease(15 * second)
      Put(foo, bar, l)
      l.KeepAlive()
      l.Revoke()
  34. Streaming Watch
      w = Watch(foo)
      for {
          r = w.Recv()
          print (r.Event) // PUT
          print (r.KV) // foo,bar
      }
  35. etcd v2: machine coordination -> Thousands
  36. etcd v3: app/container coordination -> Millions
  37. Performance: 1K keys
  38. Performance, etcd2 - 600K keys: Snapshot caused performance degradation
  39. Performance, etcd2 - 600K keys: Snapshot triggered elections
  40. ZooKeeper Performance: Non-blocking full snapshot; Efficient memory management
  41. Performance, ZooKeeper default
  42. Performance, ZooKeeper default: Snapshot triggered election
  43. Performance, ZooKeeper default: Snapshot
  44. Performance, ZooKeeper snapshot disabled: GC
  45. Reliable Performance - Similar to ZooKeeper with snapshot disabled - Incremental snapshot - No Garbage Collection Pauses - Off-heap storage
  46. Performance: etcd3 / ZooKeeper snapshot disabled
  47. Performance: etcd3 / ZooKeeper snapshot disabled
  48. Memory (512MB data - 2M 256B keys): 10GB, 2.4GB, 0.8GB
  49. Reliability: 99% at small scale is easy - Failure is infrequent and human manageable; 99% at large scale is not enough - Not manageable by humans; 99.99% at large scale - Reliable systems at bottom layer
  50. How do we achieve reliability: WAL, Snapshots, Testing
  51. Write Ahead Log: Append only - Simple is good; Rolling CRC protected - Storage & OSes can be unreliable
  52. Snapshots: Torturing DBs for Fun and Profit (OSDI 2014) - The simpler database is safer - LMDB was the winner; BoltDB, an append-only B+Tree - A simpler LMDB written in Go
  53. Testing Cluster Failures: Inject failures into running clusters; White box runtime checking - Hash state of the system - Progress of the system
  54. Testing Cluster Health with Failures: Issue lock operations across cluster; Ensure the correctness of client library
  55. Testing Cluster: dash.etcd.io
  56. etcd/raft Reliability: Designed for testability and flexibility; Used by large scale db systems and others - CockroachDB, TiKV, Dgraph
  57. etcd Reliability: Do it Really Well
  58. etcd v3.0 Beta: Efficient and Scalable
  59. Beta available today: github.com/coreos/etcd
  60. Future work: Proxy, Caching, Watch Coalescing, Secondary Index
  61. Xiang Li (xiang.li@coreos.com) - etcd: mission-critical key-value store - Thank you!
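
The reboot lock on slide 4 and the CAS operation on slide 24 fit together naturally, so a small sketch may help. The Go program below is not etcd code: `Store`, `Acquire`, `Release` and the key name `reboot-lock` are hypothetical names invented for illustration, and the store is an in-memory map standing in for a replicated key-value store. It only shows how a semaphore key could be decremented and incremented atomically with a compare-and-swap retry loop.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// Store is a hypothetical in-memory stand-in for a key-value store
// that offers compare-and-swap. It is not the etcd client API.
type Store struct {
	mu   sync.Mutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

func (s *Store) Put(key, val string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = val
}

func (s *Store) Get(key string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

// CAS sets key to next only if its current value equals prev.
func (s *Store) CAS(key, prev, next string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[key] != prev {
		return false
	}
	s.data[key] = next
	return true
}

// add changes the counter by delta using a CAS retry loop.
// It refuses to take the counter below zero.
func add(s *Store, key string, delta int) bool {
	for {
		cur := s.Get(key)
		n, err := strconv.Atoi(cur)
		if err != nil || n+delta < 0 {
			return false
		}
		if s.CAS(key, cur, strconv.Itoa(n+delta)) {
			return true
		}
		// Another client changed the key first; re-read and retry.
	}
}

// Acquire decrements the semaphore before a reboot.
func Acquire(s *Store, key string) bool { return add(s, key, -1) }

// Release increments the semaphore after the reboot finishes.
func Release(s *Store, key string) bool { return add(s, key, 1) }

func main() {
	s := NewStore()
	s.Put("reboot-lock", "1") // allow one machine to reboot at a time

	var wg sync.WaitGroup
	var mu sync.Mutex
	granted := 0
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if Acquire(s, "reboot-lock") {
				mu.Lock()
				granted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("machines allowed to reboot:", granted)

	Release(s, "reboot-lock")
	fmt.Println("semaphore after release:", s.Get("reboot-lock"))
}
```

With the semaphore initialised to 1, only one of the three simulated machines obtains permission to reboot; the others see a zero counter and back off. In a real deployment the machines that were refused would presumably wait and retry rather than give up.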

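Slide 51 describes the write-ahead log as append only and protected by a rolling CRC. The sketch below illustrates that idea under stated assumptions: the `Record` and `Log` types are hypothetical and do not reflect etcd's on-disk format, and the log lives in memory. Each record stores a checksum that chains over all earlier data, so replaying the log can point at the first record that no longer matches.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// table selects the Castagnoli polynomial; the choice is an assumption
// made for this sketch.
var table = crc32.MakeTable(crc32.Castagnoli)

// Record is a hypothetical log entry. CRC covers this record's data
// chained onto the checksum of everything appended before it.
type Record struct {
	Data []byte
	CRC  uint32
}

// Log is an append-only list of records with a running checksum.
type Log struct {
	records []Record
	crc     uint32
}

// Append adds data to the log and stores the updated rolling CRC with it.
func (l *Log) Append(data []byte) {
	l.crc = crc32.Update(l.crc, table, data)
	cp := append([]byte(nil), data...) // copy so callers cannot mutate the log
	l.records = append(l.records, Record{Data: cp, CRC: l.crc})
}

// Verify replays the records and returns the index of the first record
// whose stored CRC does not match, or -1 if the whole log is intact.
func Verify(records []Record) int {
	var crc uint32
	for i, r := range records {
		crc = crc32.Update(crc, table, r.Data)
		if crc != r.CRC {
			return i
		}
	}
	return -1
}

func main() {
	var l Log
	l.Append([]byte("put foo bar"))
	l.Append([]byte("put foo bar1"))
	l.Append([]byte("delete foo"))
	fmt.Println("first bad record:", Verify(l.records)) // -1: log is intact

	// Simulate storage corruption by flipping bits in the second record.
	l.records[1].Data[0] ^= 0xff
	fmt.Println("first bad record after corruption:", Verify(l.records)) // 1
}
```

Because the checksum is chained, a damaged record also invalidates the checksums that follow it, so a replay can stop at the first mismatch instead of trusting later entries.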