In-memory Caching: Curb Tail Latency with Pelikan

Video and slides synchronized; mp3 and slide download available at http://bit.ly/2h34ibv.

Yao Yue introduces Pelikan, a framework to implement distributed caches such as Memcached and Redis. Filmed at qconsf.com.

Yao Yue is a Distributed Systems Engineer Working on Cache at Twitter.



  1. 1. IN-MEMORY CACHING: CURB TAIL LATENCY WITH PELIKAN
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/pelikan
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  4. 4. ABOUT ME • 6 years at Twitter, on cache • maintainer of Twemcache (OSS), Twitter’s Redis fork • operations of thousands of machines • hundreds of (internal) customers • Now working on Pelikan, a next-gen cache framework to replace the above @twitter • Twitter: @thinkingfish
  5. 5. THE PROBLEM: CACHE PERFORMANCE
  6. 6. CACHE RULES EVERYTHING AROUND ME CACHE DB SERVICE
  7. 7. CACHE RUINS EVERYTHING AROUND ME CACHE DB SERVICE 😣 SENSITIVE! 😣
  8. 8. GOOD CACHE PERFORMANCE = PREDICTABLE LATENCY
  9. 9. GOOD CACHE PERFORMANCE = PREDICTABLE TAIL LATENCY
  10. 10. “MILLIONS OF QPS PER MACHINE” “SUB-MILLISECOND LATENCIES” “NEAR LINE-RATE THROUGHPUT” … KING OF PERFORMANCE
  11. 11. “USUALLY PRETTY FAST” “HICCUPS EVERY ONCE IN A WHILE” “TIMEOUT SPIKES AT THE TOP OF THE HOUR” “SLOW ONLY WHEN MEMORY IS LOW” … GHOSTS OF PERFORMANCE
  12. 12. I SPENT FIRST 3 MONTHS AT TWITTER LEARNING CACHE BASICS… …AND THE NEXT 5 YEARS CHASING GHOSTS
  13. 13. MINIMIZE INDETERMINISTIC BEHAVIOR CONTAIN GHOSTS =
  14. 14. HOW? IDENTIFY AVOID MITIGATE
  15. 15. A PRIMER: CACHING IN THE DATACENTER
  16. 16. CONTEXT • geographically centralized • highly homogeneous network • reliable, predictable infrastructure • long-lived connections • high data rate • simple data/operations
  17. 17. MAINLY: REQUEST → RESPONSE CACHE IN PRODUCTION INITIALLY: CONNECT ALSO (BECAUSE WE ARE ADULTS): STATS, LOGGING, HEALTH CHECK…
  18. 18. CACHE: BIRD’S VIEW HOST event-driven server protocol data storage OS network infrastructure
  19. 19. HOW DID WE UNCOVER THE UNCERTAINTIES?
  20. 20. “BANDWIDTH UTILIZATION WENT WAY UP, BUT REQUEST RATE WAY DOWN.”
  21. 21. SYSCALLS
  22. 22. CONNECTING IS SYSCALL-HEAVY: read event → accept → config → register (4+ syscalls)
  23. 23. REQUEST IS SYSCALL-LIGHT: read event → IO (read) → post-read → parse → process → compose → write event → IO (write) → post-write — 3 syscalls* (*: the event loop returns multiple read events at once, and I/O syscalls can be further amortized by batching/pipelining)
  24. 24. TWEMCACHE IS MOSTLY SYSCALLS • 1-2 µs overhead per call • syscalls dominate CPU time in a simple cache • what if we have 100k conns/sec?
  25. 25. CONNECTION STORM culprit:
  26. 26. “…TWEMCACHE RANDOM HICCUPS, ALWAYS AT THE TOP OF THE HOUR.”
  27. 27. [diagram: inside the cache, tworker does logging I/O to DISK, while a cron job “x” does I/O against the same disk ⏱]
  28. 28. BLOCKING I/O culprit:
  29. 29. “WE ARE SEEING SEVERAL “BLIPS” AFTER EACH CACHE REBOOT…”
  30. 30. LOCKING FACTS • ~25 ns per operation • more expensive on NUMA • much more costly when contended
  31. 31. A TIMELINE: memcache restart → … → everything is fine → requests suddenly get slow/timed out (lock!) → connection storm → clients topple (lock!) → slowly recover → (repeat a few times) → … → stabilize
  32. 32. LOCKING culprit:
  33. 33. “HOSTS WITH LONG-RUNNING CACHES TRIGGER OOM WHEN LOAD SPIKES.”
  34. 34. “REDIS INSTANCES WERE KILLED BY THE SCHEDULER.”
  35. 35. MEMORY culprit:
  36. 36. CONNECTION STORM BLOCKING I/O LOCKING MEMORY SUMMARY
  37. 37. HOW TO MITIGATE?
  38. 38. DATA PLANE, CONTROL PLANE
  39. 39. PUT OPERATIONS OF DIFFERENT NATURE / PURPOSE ON SEPARATE THREADS HIDE EXPENSIVE OPS
  40. 40. LISTENING (ADMIN CONNECTIONS) STATS AGGREGATION STATS EXPORTING LOG DUMP SLOW: CONTROL PLANE
  41. 41. FAST: DATA PLANE / REQUEST — on tworker: read event → IO (read) → post-read → parse → process → compose → write event → IO (write) → post-write
  42. 42. FAST: DATA PLANE / CONNECT — on tserver: read event → accept → config → dispatch; on tworker: read event → register
  43. 43. LATENCY-ORIENTED THREADING — tworker: REQUESTS; tserver: CONNECTS, handing each new connection to tworker; tadmin: OTHER, receiving logging and stats updates from both
  44. 44. WHAT TO AVOID?
  45. 45. LOCKING
  46. 46. WHAT WE KNOW • inter-thread communication in cache: stats, logging, connection hand-off (tserver → tworker: new connection; tworker, tserver → tadmin: logging, stats update) • locking propagates blocking/delay between threads
  47. 47. MAKE STATS UPDATE LOCKLESS LOCKLESS OPERATIONS w/ atomic instructions
  48. 48. MAKE LOGGING WAITLESS LOCKLESS OPERATIONS RING/CYCLIC BUFFER read position writer reader write position
  49. 49. MAKE CONNECTION HAND-OFF LOCKLESS LOCKLESS OPERATIONS RING ARRAY read position writer reader write position … …
  50. 50. MEMORY
  51. 51. WHAT WE KNOW • alloc-free cycles cause fragmentation • internal vs external fragmentation • OOM/swapping is deadly • memory alloc/copy is relatively expensive
  52. 52. AVOID EXTERNAL FRAGMENTATION CAP ALL MEMORY RESOURCES PREDICTABLE FOOTPRINT
  53. 53. REUSE BUFFER PREALLOCATE PREDICTABLE RUNTIME
  54. 54. PELIKAN CACHE IMPLEMENTATION
  55. 55. WHAT IS PELIKAN CACHE? • (Datacenter-) Caching framework • A summary of Twitter’s cache ops • Perf goal: deterministically fast • Clean, modular design • Open-source — components: waitless logging, lockless metrics, composed config, pooling, channels, buffers, timer/alarm, streams, events, data store, parse/compose/trace, data model, request/response, server, orchestration, threading (grouped into common, core, and cache-process layers) — pelikan.io
  56. 56. A COMPARISON — PERFORMANCE DESIGN DECISIONS

                 latency-oriented  Memory/         Memory/          Memory/               locking
                 threading         fragmentation   buffer caching   pre-allocation, cap
     Memcached   partial           internal        partial          partial               yes
     Redis       no->partial       external        no               partial               no->yes
     Pelikan     yes               internal        yes              yes                   no
  57. 57. TO BE FAIR… MEMCACHED: • multiple worker threads • binary protocol + SASL — REDIS: • rich set of data structures • master-slave replication • redis-cluster • modules • tools
  58. 58. THE BEST CACHE IS… ALWAYS FAST
  59. 59. QUESTIONS?
  60. 60. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/pelikan
