Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1FQhXpx.
Jon Hoffman discusses the general architecture, storage systems and development practices created to handle the ever increasing volume and complexity at Foursquare. Filmed at qconnewyork.com.
Jon Hoffman has been a software engineer at Foursquare for over 4 years. He's led teams building features and backend infrastructure. Before Foursquare, Jon spent a brief time working at a three-person startup, built distributed systems at Goldman Sachs, worked on VoIP apps for the telephone company, created a medical record app for Palm Pilot, and graduated from Carnegie Mellon.
Scaling Foursquare: From Check-ins to Recommendations
1. From check-ins to recommendations
Jon Hoffman @hoffrocket
QCon NYC – June 11, 2014
2. Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/scale-foursquare
3. Presented at QCon New York
www.qconnewyork.com
11. Outgrowing our hardware
• Not enough RAM for indexes and working data set
• 100 writes/second/disk
12. Sharding
• Manage ourselves in application code on top of Postgres?
• Use something called Cassandra?
• Use something called HBase?
• Use something called Mongo?
13.
14. Besides Mongo
• Memcache
• Elasticsearch
– nearby venue search
– user search
• Custom data services
– Read-only key-value server
– In-memory cache with business logic
15. HFile Service: Read only KV Store
[Diagram labels: Hadoop (MR, HDFS) with hfile_0 and hfile_1; HFile Servers with hfile_0_a, hfile_0_b, hfile_1_a, hfile_1_b; Application Servers]
Zookeeper:
- data type to machine mapping
- key hash to shard mapping
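The two Zookeeper mappings named on this slide can be illustrated with a small lookup sketch. This is only a guess at the shape of the idea, not Foursquare's implementation: the hash function, the shard count, the `venue_summaries` data type and the host names are all hypothetical, and plain dictionaries stand in for data that would be read from Zookeeper.

```python
import hashlib
from typing import Dict, List

# Hypothetical stand-ins for the mappings the slide says live in Zookeeper.
SHARD_COUNT = 2  # assumed; matches the two shards (hfile_0, hfile_1) in the diagram
MACHINES_BY_TYPE: Dict[str, Dict[int, List[str]]] = {
    # data type -> shard -> servers holding a replica of that shard
    "venue_summaries": {
        0: ["hfile-server-a:9000", "hfile-server-b:9000"],
        1: ["hfile-server-a:9000", "hfile-server-b:9000"],
    },
}


def shard_for_key(key: bytes, shard_count: int = SHARD_COUNT) -> int:
    """Key hash to shard mapping: a stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def servers_for(data_type: str, key: bytes) -> List[str]:
    """Data type to machine mapping: which servers can answer a read for this key."""
    shard = shard_for_key(key)
    return MACHINES_BY_TYPE[data_type][shard]
```

An application server would call `servers_for("venue_summaries", b"some-venue-id")` and send the read to one of the returned hosts. Because the files are read only, any replica of a shard can serve the request.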
23. Monolithic problems
• Compiling all the code, all the time
• Deploying all the code, all the time
• Hard to isolate cause of performance regressions and resource leaks
24. SOA Infancy
• Single codebase, multiple builds
– Web
– API
– Offline
25. Finagle Era
• Twitter’s Scala-based RPC library
service Geocoder {
  GeocodeResponse geocode(1: GeocodeRequest r)
}
26. Benefits
• Independent compile targets
• Fine-grained control over releases and bug fixes
• Functional isolation
27.
28. Problems
• Duplication in packaging and deployment efforts
• Hard to trace execution problems
• Hard to define/change where things live
• Networks aren’t reliable
29. Builds and deploys
• single service definition file
• consistent build packaging
• simple deployment of canary & fleet
./service_releaser -j service_name
30. Monitoring
• healthcheck endpoint over HTTP
• consistent metric names
• dashboard for every service
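As an illustration of the first two bullets, here is a minimal sketch of a healthcheck endpoint and a metric naming helper. The `/health` path, the JSON body and the `service.<name>.<component>.<metric>` naming scheme are assumptions made for this example; the slides do not say what Foursquare actually used.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def metric_name(service, component, metric):
    """Build metric names the same way in every service (hypothetical scheme)."""
    return ".".join(["service", service, component, metric])


def health_response(path):
    """Return (status code, JSON body) for a request path."""
    if path == "/health":  # assumed path
        return 200, json.dumps({"status": "ok"}).encode("utf-8")
    return 404, json.dumps({"status": "not found"}).encode("utf-8")


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = health_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet; a real service would log requests.
        pass


def make_health_server(port=0):
    """Create (but do not start) a server; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), HealthHandler)
```

Calling `make_health_server(8080).serve_forever()` would serve the check on port 8080. When every service exposes the same endpoint and emits the same metric names, one dashboard template can cover every service.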
34. Circuit Breaking
• Fast-failing RPC calls after some error rate threshold
• Loosely based on Netflix’s Hystrix
35. SOA Problem Recap
• Duplication in packaging and deployment efforts
– Build and deploy automation
• Hard to trace execution problems
– Monitoring consistency
– Distributed Tracing
– Error aggregation
• Hard to define/change where things live
– Application discovery with Zookeeper
• Networks aren’t reliable
– Circuit breaking
36. Organization
• Smaller teams owning front to back implementation of features
• Desire to have quick deploy cycles on new API endpoints
37. Remote Endpoints
Wouldn’t it be cool if a developer could expose a new API endpoint without redeploying our still monolithic API server?
38.
39.
40. Remote Endpoint Benefits
• Very easy to experiment with new endpoints
• Tight contract for service interaction
– JSON responses
– all HTTP params passed along
• Clear path to breaking off more chunks from API monolith
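The contract described above (all HTTP params passed along, JSON back) can be illustrated with a small dispatch sketch. The slides that presumably showed the real mechanism have no surviving text, so everything here is an assumption: the registry class, the example path, and the use of in-process callables where the real system would make an RPC to a separate service.

```python
import json


class RemoteEndpointRegistry:
    """Hypothetical table the API server consults for endpoints owned by other services."""

    def __init__(self):
        self._handlers = {}

    def register(self, path, handler):
        # In a real deployment the handler would be a remote service, not a local callable.
        self._handlers[path] = handler

    def dispatch(self, path, params):
        """Pass every HTTP param through untouched and return (status, JSON string)."""
        handler = self._handlers.get(path)
        if handler is None:
            return 404, json.dumps({"error": "unknown endpoint"})
        return 200, json.dumps(handler(dict(params)))


# Usage: a team exposes a new endpoint without touching the API server's own code.
registry = RemoteEndpointRegistry()
registry.register("/v2/demo/echo", lambda params: {"echo": params})
status, body = registry.dispatch("/v2/demo/echo", {"ll": "40.7,-74.0"})
```

Because the API server only forwards parameters and relays JSON, it needs no knowledge of what the endpoint does, which is what lets a team ship a new endpoint on its own deploy cycle.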
41. Future work: Part 3?
• Further isolating services with independent storage layers?
• Completely automated continuous deployment
• Hybrid immutable/mutable data storage
– Mongo & HFile & cache service
42. Thanks!
• Want to build these things?
https://foursquare.com/jobs
• jon@foursquare.com