From check-ins to recommendations 
Jon Hoffman @hoffrocket 
QCon NYC – June 11, 2014
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/scale-foursquare
Presented at QCon New York 
www.qconnewyork.com 
About Foursquare
Scaling in two parts 
• Part one: data storage 
• Part two: application complexity
Part 1: Data Storage 
2009
Table splits
• DB: Checkins, Venues, Users, Friends
• DB.A: Checkins, Venues
• DB.B: Users, Friends
Replication
• Master (RW)
• Slave (RO)
• Slave (RO)
Outgrowing our hardware 
• Not enough RAM for indexes and working data set
• 100 writes/second/disk
Sharding 
• Manage it ourselves in application code on top of Postgres?
• Use something called Cassandra? 
• Use something called HBase? 
• Use something called Mongo?
Besides Mongo 
• Memcache 
• Elasticsearch
– nearby venue search
– user search
• Custom data services
– Read-only key-value server
– In-memory cache with business logic
HFile Service: Read-only KV Store
• Hadoop: MR, HDFS (hfile_0, hfile_1)
• HFile Servers: hfile_0_a, hfile_0_b, hfile_1_a, hfile_1_b
• Application Servers
• ZooKeeper:
– data type to machine mapping
– key hash to shard mapping
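The "key hash to shard mapping" on this slide can be illustrated with a small sketch. It is not Foursquare's implementation: the SHA-256 based hash, the server names, and the in-process dictionary standing in for the mapping kept in ZooKeeper are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List

# Hypothetical stand-in for the shard -> machine mapping that the slide
# says is stored in ZooKeeper. Server names are invented for illustration.
SHARD_REPLICAS: Dict[int, List[str]] = {
    0: ["hfile-server-a", "hfile-server-b"],
    1: ["hfile-server-a", "hfile-server-b"],
}


def shard_for_key(key: bytes, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce modulo
    # the shard count so every client computes the same shard.
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for_key(key: bytes, shard_replicas: Dict[int, List[str]]) -> List[str]:
    """Return the servers that hold the shard containing this key."""
    shard = shard_for_key(key, len(shard_replicas))
    return shard_replicas[shard]
```

Because the hash is deterministic, any application server holding the same mapping resolves a key to the same shard, which is what lets the lookup happen client-side.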
Caching Services
• Mongo → Oplog Tailer → Kafka → Kafka Consumers → Redis
• Cache Servers
• Application Servers
getUserVenueCounts(
  1: list<i64> userIds
  2: list<ObjectId> venues)
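A minimal sketch of the idea behind this slide: change events flow into a cache that answers batched count lookups. The event shape, the class name, and the returned dictionary are assumptions; the slide gives only the getUserVenueCounts argument list, not its return type, and a plain dictionary stands in for Redis here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class UserVenueCountCache:
    """In-memory stand-in for a cache service fed by change events."""

    def __init__(self) -> None:
        self._counts: Dict[Tuple[int, str], int] = defaultdict(int)

    def apply_event(self, event: dict) -> None:
        # Assumed event shape, loosely modeled on an insert notification:
        # {"op": "insert", "user_id": 1, "venue_id": "abc"}
        if event.get("op") == "insert":
            self._counts[(event["user_id"], event["venue_id"])] += 1

    def get_user_venue_counts(
        self, user_ids: Iterable[int], venues: Iterable[str]
    ) -> Dict[Tuple[int, str], int]:
        # Batched lookup over every (user, venue) pair; missing pairs
        # are reported as zero without being added to the cache.
        venue_list = list(venues)
        return {
            (u, v): self._counts.get((u, v), 0)
            for u in user_ids
            for v in venue_list
        }
```

Keeping the counting logic inside the cache service, rather than in every caller, is one reading of the earlier "in-memory cache with business logic" bullet.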
Part 2: Application Complexity
2009
RPC Tracing
Throttles
Remember the goats?
Monolithic problems 
• Compiling all the code, all the time
• Deploying all the code, all the time
• Hard to isolate cause of performance regressions and resource leaks
SOA Infancy 
• Single codebase, multiple builds
– Web
– API
– Offline
Finagle Era 
• Twitter’s Scala-based RPC library
service Geocoder {
  GeocodeResponse geocode(1: GeocodeRequest r)
}
Benefits 
• Independent compile targets 
• Fine-grained control over releases and bug fixes
• Functional isolation
Problems 
• Duplication in packaging and deployment efforts
• Hard to trace execution problems 
• Hard to define/change where things live 
• Networks aren’t reliable
Builds and deploys 
• single service definition file 
• consistent build packaging 
• simple deployment of canary & fleet 
./service_releaser -j service_name
Monitoring 
• healthcheck endpoint over HTTP
• consistent metric names 
• dashboard for every service
Distributed Tracing
Exception Aggregation
Application Discovery 
• Finagle Server Sets + ZK
Circuit Breaking 
• Fail RPC calls fast once an error-rate threshold is crossed
• Loosely based on Netflix’s Hystrix
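A minimal sketch of an error-rate circuit breaker, to make the idea concrete. It is neither Foursquare's code nor Hystrix's API: the class name, the sliding window of recent calls, and every default value are assumptions.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when a call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, error_threshold=0.5, window=20, min_calls=5,
                 cooldown_seconds=5.0, clock=time.monotonic):
        self.error_threshold = error_threshold
        self.min_calls = min_calls
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._results = deque(maxlen=window)  # True marks a failed call
        self._opened_at = None

    def _error_rate(self):
        if not self._results:
            return 0.0
        return sum(self._results) / len(self._results)

    def call(self, fn, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                # Circuit is open: fail fast instead of calling fn.
                raise CircuitOpenError("circuit open, failing fast")
            # Cooldown elapsed: close the circuit and start a fresh window.
            self._opened_at = None
            self._results.clear()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._results.append(True)
            if (len(self._results) >= self.min_calls
                    and self._error_rate() >= self.error_threshold):
                self._opened_at = self._clock()
            raise
        self._results.append(False)
        return result
```

A caller would wrap each outbound RPC, for example `breaker.call(client.geocode, request)` with a hypothetical `client`, and treat `CircuitOpenError` as a signal to fall back or degrade rather than wait on a struggling dependency.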
SOA Problem Recap 
• Duplication in packaging and deployment efforts 
– Build and deploy automation 
• Hard to trace execution problems 
– Monitoring consistency 
– Distributed Tracing 
– Error aggregation 
• Hard to define/change where things live 
– Application discovery with ZooKeeper
• Networks aren’t reliable 
– Circuit breaking
Organization 
• Smaller teams owning front to back implementation of features
• Desire to have quick deploy cycles on new API endpoints
Remote Endpoints 
Wouldn’t it be cool if a developer could expose a new API endpoint without redeploying our still monolithic API server?
Remote Endpoint Benefits 
• Very easy to experiment with new endpoints
• Tight contract for service interaction
– JSON responses
– all HTTP params passed along
• Clear path to breaking off more chunks from API monolith
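One way to picture the contract described here (all HTTP params passed along, JSON back) is a small routing table inside the API server. The slides do not show how remote endpoints are actually registered or invoked, so the class, method names, and paths below are hypothetical, and a local callable stands in for the RPC to the remote service.

```python
import json
from typing import Callable, Dict, Tuple

# A remote handler receives every HTTP parameter and returns a
# JSON-serializable dictionary. In a real system this would be an RPC
# to another service; a plain callable is used here as a stand-in.
RemoteHandler = Callable[[Dict[str, str]], dict]


class RemoteEndpointRouter:
    def __init__(self) -> None:
        self._remote: Dict[str, RemoteHandler] = {}

    def register(self, path: str, handler: RemoteHandler) -> None:
        # New endpoints are added at runtime, without redeploying the
        # server that owns this router.
        self._remote[path] = handler

    def handle(self, path: str, params: Dict[str, str]) -> Tuple[int, str]:
        handler = self._remote.get(path)
        if handler is None:
            return 404, json.dumps({"error": "unknown endpoint"})
        # Pass a copy of all params through untouched; the response
        # body is always JSON.
        return 200, json.dumps(handler(dict(params)))
```

Under this sketch, experimenting with a new endpoint amounts to registering one more path, which matches the "very easy to experiment" bullet.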
Future work: Part 3? 
• Further isolating services with independent storage layers?
• Completely automated continuous deployment
• Hybrid immutable/mutable data storage
– Mongo & HFile & cache service
Thanks! 
• Want to build these things? https://foursquare.com/jobs
• jon@foursquare.com

Scaling Foursquare: From Check-ins to Recommendations
