3. Traffic History at Yelp
• 2005 - 2007
– ~200k searches/day
– ~850k reviews total
– www.yelp.com
– LAMP stack (the P is Python)
– Code pushes were daily
– Master-slave MySQL setup: 1 main database, mix of InnoDB and MyISAM tables
– Enabled gzip compression in Apache
– Squid for image proxying
4. As we grew, so did our infrastructure… 2008 - 2009
5. Traffic History at Yelp
• 2008 - 2009
– Search scaling
• sharded by geography
• index distribution: rsync w/ fadvise
– Log aggregation:
• syslog + rsync + S3
• Scribe + S3
– Gearman used for async queue processing
– MySQL split vertically into 3 databases
– Dirty session cookie
– Mobile apps: iPhone, Android, BlackBerry, etc.
– 4 countries
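Sharding search by geography means a query only needs to hit the index covering its location. A minimal sketch of that routing, assuming hypothetical shard names and rectangular lat/lon boundaries (the slides do not describe Yelp's actual shard layout):

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GeoShard:
    """One search index shard covering a lat/lon bounding box (hypothetical layout)."""
    name: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


# Hypothetical shard map; real boundaries would follow business density, not round numbers.
SHARDS: List[GeoShard] = [
    GeoShard("us-west", 24.0, 50.0, -125.0, -100.0),
    GeoShard("us-east", 24.0, 50.0, -100.0, -66.0),
    GeoShard("europe", 35.0, 72.0, -11.0, 40.0),
]


def shard_for(lat: float, lon: float) -> str:
    """Route a search centered at (lat, lon) to the first shard whose box contains it."""
    for shard in SHARDS:
        if shard.contains(lat, lon):
            return shard.name
    raise LookupError(f"no shard covers ({lat}, {lon})")
```

For example, `shard_for(37.77, -122.42)` (San Francisco) routes to `"us-west"`. A query whose search radius crosses a shard boundary would need to fan out to every overlapping shard and merge the results; the sketch ignores that case.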
7. Infrastructure History at Yelp
• 2010 - 2011
– Introduced “read-only” mode for the site
– First CDN put into use
– Photos migrated to S3
– mrjob built and open-sourced
– AWS EMR used for mrjob processing
– Managed DNS - DynDNS
– 13 countries
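A site-wide read-only mode usually keeps pages rendering while refusing writes, for example during database maintenance. A minimal sketch, assuming a hypothetical `FLAGS` dictionary and `write_handler` decorator; nothing here reflects Yelp's actual code:

```python
import functools
from typing import Callable, Dict, Tuple

# Hypothetical site-wide switch; in practice it might come from config or a flag service.
FLAGS: Dict[str, bool] = {"read_only": False}

Response = Tuple[int, str]


def write_handler(func: Callable[..., Response]) -> Callable[..., Response]:
    """Wrap a handler that writes to the database so it degrades gracefully in read-only mode."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> Response:
        if FLAGS["read_only"]:
            # 503 signals a temporary condition; read-only handlers keep working.
            return 503, "The site is in read-only mode; please try again later."
        return func(*args, **kwargs)
    return wrapper


def view_business(business_id: int) -> Response:
    # Reads are never blocked by read-only mode.
    return 200, f"business {business_id}"


@write_handler
def post_review(business_id: int, text: str) -> Response:
    return 201, f"review saved for {business_id}"
```

Checking the flag at call time, rather than at import time, lets operators flip the site into read-only mode without a deploy.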
9. Infrastructure History at Yelp
• 2012 - 2013
– Introduced “read-only” datacenters
• Cacheserv introduced
– Load balance traffic between datacenters
– Elasticsearch
– Ahead of IPO traffic, we added the ability to quickly reduce load
– Direct connection with AWS
– All schema changes are done online
– Moved all DB hosts to FusionIO
– 24 countries
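“Read-only” datacenters can serve reads locally while sending writes to a single writable datacenter. A minimal routing sketch, assuming hypothetical datacenter names, one writable primary, and a read/write split on HTTP method; the slides do not say how Yelp actually made this decision:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Datacenter:
    name: str
    writable: bool       # in this sketch only the primary accepts writes
    healthy: bool = True


# Hypothetical topology: one primary plus read-only datacenters.
DATACENTERS: Dict[str, Datacenter] = {
    "dc-primary": Datacenter("dc-primary", writable=True),
    "dc-west": Datacenter("dc-west", writable=False),
    "dc-east": Datacenter("dc-east", writable=False),
}

WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def route(method: str, nearest: str) -> str:
    """Writes go to the primary; reads go to the nearest healthy datacenter."""
    if method.upper() in WRITE_METHODS:
        # Health of the primary is not checked here; a real system would need a failover plan.
        return next(dc.name for dc in DATACENTERS.values() if dc.writable)
    dc = DATACENTERS.get(nearest)
    if dc is not None and dc.healthy:
        return dc.name
    # Fall back to any healthy datacenter so reads survive a single-datacenter outage.
    for candidate in DATACENTERS.values():
        if candidate.healthy:
            return candidate.name
    raise RuntimeError("no healthy datacenter")
```

Because reads can land anywhere, load balancing between datacenters becomes a matter of choosing `nearest` (or weighting it), while writes stay pinned to one place.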
11. Traffic Infrastructure - current picture
• 2014 - current
– Combating abusive scraping
– Starting to serve traffic from EC2 (for Asia/Europe)
– Elasticsearch, Logstash, and Kibana
– Gearman w/ MySQL
– Kafka == Scribe
– Pyleus (doing work in real time)
– 29 countries
12. Pyleus: A Python Framework for Storm Topologies
● Pyleus: Yelp’s brand-new Python bindings for Storm
● Build topologies in Python
● Declaratively describe structure in YAML
● Respects requirements.txt
● Compose a topology from Python packaged components!
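Pyleus itself runs topologies on a Storm cluster, so the following is not Pyleus code. It is a hypothetical, in-process toy that only mimics the two ideas on the slide: describing a topology's structure as data (the role YAML plays in Pyleus) and composing it from plain Python components. The names `TOPOLOGY`, `line_spout`, `split_words`, `uppercase`, and `run` are all illustrative:

```python
from typing import Dict, Iterable, List


# Hypothetical components: plain Python callables standing in for a spout and bolts.
def line_spout() -> Iterable[str]:
    yield from ["the quick brown fox", "the lazy dog"]


def split_words(line: str) -> Iterable[str]:
    yield from line.split()


def uppercase(word: str) -> Iterable[str]:
    yield word.upper()


# Declarative structure, analogous in spirit to a YAML topology file:
# each bolt names the component whose output it consumes.
TOPOLOGY = {
    "spout": {"name": "lines", "component": line_spout},
    "bolts": [
        {"name": "words", "component": split_words, "source": "lines"},
        {"name": "shouted", "component": uppercase, "source": "words"},
    ],
}


def run(topology: dict) -> Dict[str, List[str]]:
    """Execute the declared pipeline in-process, collecting each component's output."""
    spout = topology["spout"]
    outputs: Dict[str, List[str]] = {spout["name"]: list(spout["component"]())}
    # Bolts run in declaration order, so each source must be declared before it is used.
    for bolt in topology["bolts"]:
        emitted: List[str] = []
        for item in outputs[bolt["source"]]:
            emitted.extend(bolt["component"](item))
        outputs[bolt["name"]] = emitted
    return outputs
```

Swapping a component means editing the structure, not the wiring code. Storm adds what this toy lacks: parallelism, stream groupings, and fault-tolerant delivery across machines.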