IN-MEMORY CACHING:
CURB TAIL LATENCY
WITH PELIKAN
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/pelikan
Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
ABOUT ME
• 6 years at Twitter, working on cache
• maintainer of Twemcache (OSS) and Twitter’s Redis fork
• operating thousands of machines
• serving hundreds of (internal) customers
• now working on Pelikan, a next-gen cache framework to replace the above at Twitter
• Twitter: @thinkingfish
THE PROBLEM:
CACHE PERFORMANCE
CACHE RULES EVERYTHING AROUND ME
(diagram: SERVICE, CACHE, DB)
CACHE RUINS EVERYTHING AROUND ME
(diagram: SERVICE, CACHE, DB)
😣 SENSITIVE! 😣
GOOD CACHE PERFORMANCE
=
PREDICTABLE LATENCY
GOOD CACHE PERFORMANCE
=
PREDICTABLE TAIL LATENCY
“MILLIONS OF QPS PER MACHINE”
“SUB-MILLISECOND LATENCIES”
“NEAR LINE-RATE THROUGHPUT”
…
KING OF PERFORMANCE
“USUALLY PRETTY FAST”
“HICCUPS EVERY ONCE IN A WHILE”
“TIMEOUT SPIKES AT THE TOP OF THE HOUR”
“SLOW ONLY WHEN MEMORY IS LOW”
…
GHOSTS OF PERFORMANCE
I SPENT THE FIRST 3 MONTHS AT TWITTER
LEARNING CACHE BASICS…
…AND THE NEXT 5 YEARS CHASING GHOSTS
CONTAIN GHOSTS
=
MINIMIZE NONDETERMINISTIC BEHAVIOR
HOW?
IDENTIFY, AVOID, MITIGATE
A PRIMER:
CACHING IN DATACENTER
CONTEXT
• geographically centralized
• highly homogeneous network
• reliable, predictable infrastructure
• long-lived connections
• high data rate
• simple data/operations
CACHE IN PRODUCTION
MAINLY: REQUEST → RESPONSE
INITIALLY: CONNECT
ALSO (BECAUSE WE ARE ADULTS): STATS, LOGGING, HEALTH CHECK…
CACHE: BIRD’S VIEW
(diagram: inside a HOST, the cache consists of an event-driven server, protocol and data storage, running on the OS; hosts are connected by the network infrastructure)
HOW DID WE UNCOVER THE UNCERTAINTIES?
“BANDWIDTH UTILIZATION WENT WAY UP, BUT REQUEST RATE WAY DOWN.”
SYSCALLS
CONNECTING IS SYSCALL-HEAVY
read event → accept → config → register: 4+ syscalls
REQUEST IS SYSCALL-LIGHT
read event → IO (read) → post-read → parse → process → compose → write event → IO (write) → post-write: 3 syscalls*
*: the event loop returns multiple read events at once; I/O syscalls can be further amortized by batching/pipelining
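To illustrate the footnote: with Linux `epoll`, a single `epoll_wait()` call can report many ready descriptors, so that syscall's cost is shared across connections. A minimal, Linux-only sketch in which pipes stand in for client sockets; `ready_in_one_wait` and `NPIPES` are hypothetical names, not Twemcache or Pelikan code:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 4  /* pipes stand in for NPIPES client connections */

/* Linux-only: make NPIPES descriptors readable, then ask the kernel
 * once. Returns how many ready descriptors that single epoll_wait()
 * reported, or -1 if setup failed. */
static int ready_in_one_wait(void)
{
    int fds[NPIPES][2];
    struct epoll_event events[NPIPES];
    int opened = 0, n = -1;
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    for (; opened < NPIPES; opened++) {
        if (pipe(fds[opened]) != 0)
            goto cleanup;
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[opened][0] };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[opened][0], &ev) != 0 ||
            write(fds[opened][1], "x", 1) != 1) {
            opened++;  /* this pipe was created, so it must be closed too */
            goto cleanup;
        }
    }

    n = epoll_wait(ep, events, NPIPES, 0);  /* one syscall, many ready events */
    assert(n <= NPIPES);                     /* never more than maxevents */

cleanup:
    for (int i = 0; i < opened; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(ep);
    return n;
}
```

On Linux, `ready_in_one_wait()` should return `NPIPES`. This shows only the batching of readiness notifications; each read and write is still its own syscall unless requests are pipelined.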
TWEMCACHE IS MOSTLY SYSCALLS
• 1-2 µs overhead per call
• syscalls dominate CPU time in a simple cache
• what if we have 100k conns/sec?
(source)
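A back-of-the-envelope check of the last bullet, using only the figures on this slide and the previous one, and assuming exactly 4 syscalls per new connection (the slide says 4+, so this is a lower bound):

```latex
\[
10^{5}\,\frac{\text{conn}}{\text{s}}
\times 4\,\frac{\text{syscalls}}{\text{conn}}
\times (1\ \text{to}\ 2)\,\frac{\mu\text{s}}{\text{syscall}}
= 0.4\ \text{to}\ 0.8\ \frac{\text{CPU-s}}{\text{s}}
\]
```

That is 40% to 80% of one core spent only on setting up connections, before any request on them is served.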
culprit: CONNECTION STORM
“…TWEMCACHE RANDOM HICCUPS, ALWAYS AT THE TOP OF THE HOUR.”
(diagram: the cache’s tworker thread does logging to DISK; at the top of the hour ⏱, cron job “x” also does I/O on the same DISK)
culprit: BLOCKING I/O
“WE ARE SEEING SEVERAL ‘BLIPS’ AFTER EACH CACHE REBOOT…”
LOCKING FACTS
• ~25ns per operation
• more expensive on NUMA
• much more costly when contended
source
MEMCACHE RESTART: A TIMELINE
…
EVERYTHING IS FINE
REQUESTS SUDDENLY GET SLOW/TIMED-OUT
CONNECTION STORM
CLIENTS TOPPLE
SLOWLY RECOVER
(REPEAT A FEW TIMES)
…
STABILIZE
(timeline annotated twice: lock!)
culprit: LOCKING
“HOSTS WITH LONG-RUNNING CACHES TRIGGER OOM WHEN LOAD SPIKES.”
“REDIS INSTANCES WERE KILLED BY THE SCHEDULER.”
culprit: MEMORY
SUMMARY
• CONNECTION STORM
• BLOCKING I/O
• LOCKING
• MEMORY
HOW TO MITIGATE?
DATA PLANE, CONTROL PLANE
PUT OPERATIONS OF DIFFERENT NATURE / PURPOSE ON SEPARATE THREADS
HIDE EXPENSIVE OPS
SLOW: CONTROL PLANE
LISTENING (ADMIN CONNECTIONS), STATS AGGREGATION, STATS EXPORTING, LOG DUMP
FAST: DATA PLANE / REQUEST
tworker: read event → IO (read) → post-read → parse → process → compose → write event → IO (write) → post-write
FAST: DATA PLANE / CONNECT
tserver: read event → accept → config → dispatch
tworker: read event → register
LATENCY-ORIENTED THREADING
(diagram: tserver handles CONNECTS and passes each new connection to tworker; tworker handles REQUESTS; tadmin handles OTHER work; tserver and tworker both send logging and stats updates to tadmin)
WHAT TO AVOID?
LOCKING
WHAT WE KNOW
• inter-thread communication in cache: stats, logging, connection hand-off
• locking propagates blocking/delay between threads
(threading diagram: tserver, tworker, tadmin; new connection; logging, stats update)
LOCKLESS OPERATIONS
MAKE STATS UPDATE LOCKLESS
w/ atomic instructions
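A minimal sketch of a lockless counter using C11 `<stdatomic.h>`, assuming one shared counter per metric that worker threads increment and the admin thread reads; `metric_t`, `metric_incr`, `metric_incr_n` and `metric_read` are hypothetical names, not Pelikan's actual metrics API:

```c
#include <assert.h>
#include <stdatomic.h>

/* Refuse to build if 64-bit atomics would be emulated with a hidden lock. */
static_assert(ATOMIC_LLONG_LOCK_FREE == 2, "64-bit atomics must be lock-free");

/* Hypothetical metric: a name plus one atomic 64-bit counter. */
typedef struct {
    const char *name;
    _Atomic unsigned long long value;
} metric_t;

/* Hot path (worker threads): a single atomic add, no mutex, so a slow
 * reader can never make a writer wait. Relaxed ordering is enough
 * because the counter does not guard any other data. */
static inline void metric_incr_n(metric_t *m, unsigned long long n)
{
    atomic_fetch_add_explicit(&m->value, n, memory_order_relaxed);
}

static inline void metric_incr(metric_t *m)
{
    metric_incr_n(m, 1);
}

/* Control plane (admin thread): read the current value for export. */
static inline unsigned long long metric_read(metric_t *m)
{
    return atomic_load_explicit(&m->value, memory_order_relaxed);
}
```

The trade-off is that the admin thread may read a value that is a few increments behind, which is harmless for monitoring.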
LOCKLESS OPERATIONS
MAKE LOGGING WAITLESS
RING/CYCLIC BUFFER
(diagram: a writer advances the write position and a reader advances the read position around a ring buffer)
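A minimal single-producer/single-consumer sketch of such a ring buffer in C11, assuming one ring per worker thread (the writer) drained by the admin thread (the reader). Dropping a message when the ring is full is an assumption made here so that the writer never waits; `log_ring_t`, `log_ring_write` and `log_ring_read` are hypothetical, not Pelikan's implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOG_RING_CAP 4096  /* hypothetical size; must be a power of two */
static_assert((LOG_RING_CAP & (LOG_RING_CAP - 1)) == 0, "capacity must be a power of two");

typedef struct {
    char data[LOG_RING_CAP];
    _Atomic size_t wpos;     /* write position: advanced only by the writer */
    _Atomic size_t rpos;     /* read position: advanced only by the reader */
    _Atomic size_t dropped;  /* bytes discarded because the ring was full */
} log_ring_t;

/* Writer (worker thread, request path): never waits. If the reader has
 * fallen behind and the message does not fit, it is dropped and counted. */
static bool log_ring_write(log_ring_t *r, const char *msg, size_t len)
{
    size_t w = atomic_load_explicit(&r->wpos, memory_order_relaxed);
    size_t rd = atomic_load_explicit(&r->rpos, memory_order_acquire);
    if (len > LOG_RING_CAP - (w - rd)) {
        atomic_fetch_add_explicit(&r->dropped, len, memory_order_relaxed);
        return false;
    }
    size_t off = w & (LOG_RING_CAP - 1);
    size_t first = len < LOG_RING_CAP - off ? len : LOG_RING_CAP - off;
    memcpy(r->data + off, msg, first);          /* up to the end of the array */
    memcpy(r->data, msg + first, len - first);  /* wrapped remainder, maybe 0 */
    atomic_store_explicit(&r->wpos, w + len, memory_order_release);
    return true;
}

/* Reader (admin thread): copy out up to `cap` bytes. The slow disk write
 * would happen after this, off the request path. Returns bytes copied. */
static size_t log_ring_read(log_ring_t *r, char *out, size_t cap)
{
    size_t rd = atomic_load_explicit(&r->rpos, memory_order_relaxed);
    size_t w = atomic_load_explicit(&r->wpos, memory_order_acquire);
    size_t n = w - rd < cap ? w - rd : cap;
    size_t off = rd & (LOG_RING_CAP - 1);
    size_t first = n < LOG_RING_CAP - off ? n : LOG_RING_CAP - off;
    memcpy(out, r->data + off, first);
    memcpy(out + first, r->data, n - first);
    atomic_store_explicit(&r->rpos, rd + n, memory_order_release);
    return n;
}
```

Because each position is advanced by exactly one thread, no lock is needed: the release store on one position paired with the acquire load of it on the other side is what makes the copied bytes visible across threads.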
LOCKLESS OPERATIONS
MAKE CONNECTION HAND-OFF LOCKLESS
RING ARRAY
(diagram: a writer advances the write position and a reader advances the read position around an array of slots)
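The same idea with fixed-size slots: a sketch of a single-producer/single-consumer ring array carrying accepted file descriptors from tserver to one tworker. How the worker gets woken up (the "read event" on its side of the connect path) is left out, and `handoff_ring_t`, `handoff_push` and `handoff_pop` are hypothetical names:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define HANDOFF_CAP 1024  /* hypothetical slot count; must be a power of two */
static_assert((HANDOFF_CAP & (HANDOFF_CAP - 1)) == 0, "capacity must be a power of two");

typedef struct {
    int fd[HANDOFF_CAP];   /* one slot per pending connection */
    _Atomic size_t head;   /* next slot to pop: advanced only by tworker */
    _Atomic size_t tail;   /* next slot to push: advanced only by tserver */
} handoff_ring_t;

/* tserver, after accept and config: publish the fd to the worker.
 * Returns false if the ring is full; the caller picks a policy. */
static bool handoff_push(handoff_ring_t *q, int fd)
{
    size_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == HANDOFF_CAP)
        return false;
    q->fd[t & (HANDOFF_CAP - 1)] = fd;
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return true;
}

/* tworker, on wakeup: take the next pending fd (to be registered with
 * its own event loop). Returns -1 when nothing is pending. */
static int handoff_pop(handoff_ring_t *q)
{
    size_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (h == t)
        return -1;
    int fd = q->fd[h & (HANDOFF_CAP - 1)];
    atomic_store_explicit(&q->head, h + 1, memory_order_release);
    return fd;
}
```

What to do when the ring is full (close the connection, or try another worker's ring) is a policy choice outside this sketch.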
MEMORY
WHAT WE KNOW
• alloc/free cycles cause fragmentation
• internal vs. external fragmentation
• OOM/swapping is deadly
• memory alloc/copy is relatively expensive
(source)
AVOID EXTERNAL FRAGMENTATION
CAP ALL MEMORY RESOURCES
→ PREDICTABLE FOOTPRINT
REUSE BUFFERS
PREALLOCATE
→ PREDICTABLE RUNTIME
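A sketch of "preallocate, cap, reuse" applied to I/O buffers, assuming each worker thread owns its own pool (so no locking is needed) and that all buffers have the same size; `buf_t`, `buf_pool_t` and their functions are hypothetical, not Pelikan's API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size buffer; `data` is the payload area. */
typedef struct buf {
    struct buf *next;   /* free-list link, used only while the buffer is pooled */
    char data[];
} buf_t;

/* Hypothetical per-worker pool: one slab carved into equal buffers. */
typedef struct {
    buf_t *free_list;   /* singly linked list of available buffers */
    char *slab;         /* the single up-front allocation */
    size_t nfree;
    size_t nmax;
} buf_pool_t;

/* Preallocate: all memory is requested once at startup, so the
 * footprint is fixed and capped at nmax buffers. Returns 0 on success. */
static int buf_pool_create(buf_pool_t *p, size_t nmax, size_t bufsize)
{
    size_t align = _Alignof(max_align_t);
    size_t stride = (sizeof(buf_t) + bufsize + align - 1) / align * align;

    p->slab = malloc(nmax * stride);
    if (p->slab == NULL)
        return -1;
    p->free_list = NULL;
    for (size_t i = 0; i < nmax; i++) {
        buf_t *b = (buf_t *)(p->slab + i * stride);
        b->next = p->free_list;
        p->free_list = b;
    }
    p->nfree = p->nmax = nmax;
    return 0;
}

/* Runtime: O(1), no malloc. NULL means the cap has been reached. */
static buf_t *buf_borrow(buf_pool_t *p)
{
    buf_t *b = p->free_list;
    if (b == NULL)
        return NULL;
    p->free_list = b->next;
    p->nfree--;
    return b;
}

/* Reuse: hand the buffer back to the pool instead of calling free(). */
static void buf_return(buf_pool_t *p, buf_t *b)
{
    assert(p->nfree < p->nmax);  /* more returns than borrows is a bug */
    b->next = p->free_list;
    p->free_list = b;
    p->nfree++;
}

static void buf_pool_destroy(buf_pool_t *p)
{
    free(p->slab);
    p->slab = NULL;
    p->free_list = NULL;
    p->nfree = p->nmax = 0;
}
```

Running out of buffers returns `NULL` rather than calling `malloc`, so under a load spike the process sheds work instead of growing toward OOM.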
PELIKAN CACHE
IMPLEMENTATION
WHAT IS PELIKAN CACHE?
• (Datacenter-) Caching framework
• A summary of Twitter’s cache ops
• Perf goal: deterministically fast
• Clean, modular design
• Open-source
(architecture diagram; layers labeled process, cache, core and common; components shown: waitless logging, lockless metrics, composed config, channels, buffers, timer, alarm, pooling, streams, events, data store, data model, parse/compose/trace, request, response, server, orchestration, threading)
pelikan.io
A COMPARISON
PERFORMANCE DESIGN DECISIONS
           latency-oriented  memory:         memory:         memory:              locking
           threading         fragmentation   buffer caching  pre-allocation, cap
Memcached  partial           internal        partial         partial              yes
Redis      no -> partial     external        no              partial              no -> yes
Pelikan    yes               internal        yes             yes                  no
TO BE FAIR…
MEMCACHED
• multiple worker threads
• binary protocol + SASL
REDIS
• rich set of data structures
• master-slave replication
• redis-cluster
• modules
• tools
THE BEST CACHE IS…
ALWAYS FAST
QUESTIONS?