This deck describes Hootsuite's scaling journey from Apache and PHP backed by a single MySQL server to a service-oriented architecture built on technologies such as Nginx, PHP-FPM, Memcached, MongoDB, Gearman, and Scala/Akka services communicating via ZeroMQ. Key steps included caching with Memcached to reduce MySQL load, using Gearman for asynchronous tasks, and using MongoDB for large datasets. Monitoring with StatsD, Logstash and Elasticsearch was added for visibility. The team moved to a service-oriented architecture of independent services so that both the large codebase and the engineering team could keep scaling.
7. Solution - Caching
Memcached.
● Distributed cache, cluster of boxes with lots of RAM, trivial to scale
● Cache as much as possible, invalidate only when necessary
● Use cache instead of DB
● No joins - decouple entities (collection caching)
● Twemproxy!
8. “There are only two hard things in Computer Science: cache invalidation and naming things.”
- Phil Karlton
10. Solution - Caching
SELECT * FROM member WHERE org_id=888
set individual cache records
member_1 {data}
member_5 {data}
member_9 {data}
set collection cache
member_org_888 [1,5,9]
Automatic invalidation of collection cache
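The collection-caching idea on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not Hootsuite's PHP code: the `CollectionCache` class, its in-memory dictionary (standing in for Memcached) and the `load_members` callback (standing in for the MySQL query) are all assumptions made for the example. Only the key names `member_<id>` and `member_org_<org_id>` come from the slide.

```python
from typing import Callable, Dict, List


class CollectionCache:
    """Hypothetical helper: a dict plays the role of the Memcached cluster."""

    def __init__(self, load_members: Callable[[int], List[dict]]) -> None:
        self._store: Dict[str, object] = {}
        self._load_members = load_members  # stands in for the MySQL SELECT
        self.db_queries = 0                # counts how often the DB is hit

    def members_for_org(self, org_id: int) -> List[dict]:
        coll_key = f"member_org_{org_id}"
        ids = self._store.get(coll_key)
        if ids is None:
            # Collection miss: query once, then set individual records
            # and the collection record (a plain list of ids).
            self.db_queries += 1
            rows = self._load_members(org_id)
            for row in rows:
                self._store[f"member_{row['id']}"] = row
            ids = [row["id"] for row in rows]
            self._store[coll_key] = ids
        # No join: resolve each id through its own cache record.
        # (A real cache would also have to handle evicted individual records.)
        return [self._store[f"member_{i}"] for i in ids]

    def member_changed(self, member: dict) -> None:
        # Refresh the individual record and drop the collection it belongs to,
        # so the next read rebuilds the id list from the database.
        self._store[f"member_{member['id']}"] = member
        self._store.pop(f"member_org_{member['org_id']}", None)


if __name__ == "__main__":
    db = {888: [{"id": i, "org_id": 888} for i in (1, 5, 9)]}
    cache = CollectionCache(lambda org_id: list(db[org_id]))
    print([m["id"] for m in cache.members_for_org(888)])  # [1, 5, 9]
```

Because the collection record holds only ids, a change to one member touches one small record plus the affected collection key, rather than every cached query result that happens to contain that member.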
11. Solution - Caching
It’s hard to scale MySQL horizontally
Now:
● No need to scale MySQL
● Able to serve the whole site on 1 MySQL server
● 500 MySQL SELECTs per second. 50,000 Memcached GETs.
● 99+% hit rate
13. Problem
Need a way to perform asynchronous, distributed tasks using a
single-threaded language.
14. Solution - Gearman
Gearman.
● Distribute work to other servers to handle (workers also using
PHP, same codebase)
● Precursor to SOA where everything is truly distributed
● Many other solutions, queueing systems.
16. Solution - Gearman
Need a way to perform asynchronous, distributed tasks using a
single-threaded language.
Now:
● Moved key tasks to Gearman
● Another cluster, scalable separately from web
● Discrete tasks, callable sync or async
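As a rough illustration of "discrete tasks, callable sync or async", the sketch below registers named tasks and lets a caller either run one inline or hand it to a background worker. It is written in Python with a thread and an in-process queue, so it only imitates the shape of a job server: in Gearman the workers are separate processes on other machines, and even synchronous calls go through the job server. The `TaskRunner` class and its method names are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Dict, List, Tuple


class TaskRunner:
    """Hypothetical stand-in for a job server plus a single worker."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Callable[..., Any]] = {}
        self._jobs: "queue.Queue[Tuple[str, tuple]]" = queue.Queue()
        self.results: List[Any] = []
        # The worker thread plays the role of a separate worker cluster.
        threading.Thread(target=self._work, daemon=True).start()

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tasks[name] = fn

    def do_sync(self, name: str, *args: Any) -> Any:
        # Synchronous call: the caller waits for the result.
        return self._tasks[name](*args)

    def do_background(self, name: str, *args: Any) -> None:
        # Asynchronous call: enqueue the job and return immediately.
        self._jobs.put((name, args))

    def wait(self) -> None:
        # Block until every queued job has been processed.
        self._jobs.join()

    def _work(self) -> None:
        while True:
            name, args = self._jobs.get()
            try:
                self.results.append(self._tasks[name](*args))
            finally:
                self._jobs.task_done()


if __name__ == "__main__":
    runner = TaskRunner()
    runner.register("shout", str.upper)
    print(runner.do_sync("shout", "hi"))   # HI
    runner.do_background("shout", "later")
    runner.wait()
    print(runner.results)                  # ['LATER']
```

Keeping each task behind a name is what allows the worker pool to be scaled separately from the web tier: the caller only needs to know the task name and its arguments.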
18. Problem
Need to store data with the potential to grow too big to handle
effectively with MySQL.
19. Solution - MongoDB
MongoDB.
● Certain data did not need to be highly relational
● NoSQL DB, many other solutions these days
● Mongo can be a pain, lots of moving parts
● Had to make our own sequencer where auto-incremented ids were
necessary
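The slides do not say how the sequencer was built. One common approach, sketched here in Python as an assumption, is a named counter that is incremented atomically and whose new value becomes the id. With a database the counter would typically live in its own record and be bumped with an atomic increment (MongoDB's `$inc` update operator is one such primitive); the in-memory `Sequencer` class below is hypothetical and uses a lock in its place.

```python
import threading
from collections import defaultdict
from typing import DefaultDict


class Sequencer:
    """Hypothetical id generator: one monotonically increasing counter per name."""

    def __init__(self) -> None:
        self._counters: DefaultDict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def next_id(self, name: str) -> int:
        # Increment and read under one lock so two callers never see the same id.
        with self._lock:
            self._counters[name] += 1
            return self._counters[name]


if __name__ == "__main__":
    seq = Sequencer()
    print(seq.next_id("message"), seq.next_id("message"), seq.next_id("stream"))
    # 1 2 1
```

The important property is that increment and read happen as a single atomic step; reading the current maximum and adding one in application code would hand out duplicate ids under concurrency.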
20. Solution - MongoDB
Need to store data with the potential to grow too big to handle
effectively with MySQL.
Now:
● Multiple clusters containing amounts of data that likely would
have crushed MySQL
● Billions of rows per collection, many TB of data on disk
23. Problem
With a codebase and an engineering team increasing in size, how do
we keep up the pace of development and maintain control of the
system?
(SVN, big branches, merge hell)
24. Solution - Dark Launching
Dark Launching.
● Wrap code in block with a specific name
● That name will appear in a management page
● Can control whether or not that block is executed by modifying its value
● Boolean, random percentage, session-based, member list, organization
list, etc.
25. Solution - Dark Launching
if (In_Feature::isEnabled('TWITTER_ADS')) {
// execute new code
} else {
// execute old code
}
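How a check like `isEnabled` might decide is not shown in the slides. The Python sketch below is one possible shape, covering three of the rule types listed earlier (Boolean, percentage, member list). The `Rule` and `Features` names are hypothetical. Note one deliberate substitution: instead of a truly random percentage, it hashes the feature name and member id into a bucket from 0 to 99, so a given member gets the same answer on every request.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Rule:
    enabled: bool = False                       # plain Boolean switch
    percentage: int = 0                         # share of members, 0 to 100
    member_ids: Set[int] = field(default_factory=set)  # explicit allow list


class Features:
    """Hypothetical registry that a management page would edit."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def set_rule(self, name: str, rule: Rule) -> None:
        self._rules[name] = rule

    def is_enabled(self, name: str, member_id: Optional[int] = None) -> bool:
        rule = self._rules.get(name)
        if rule is None:
            return False            # unknown blocks stay dark
        if rule.enabled:
            return True
        if member_id is None:
            return False
        if member_id in rule.member_ids:
            return True
        # Stable bucket in [0, 100) derived from feature name and member id.
        bucket = zlib.crc32(f"{name}:{member_id}".encode()) % 100
        return bucket < rule.percentage


if __name__ == "__main__":
    features = Features()
    features.set_rule("TWITTER_ADS", Rule(member_ids={42}))
    if features.is_enabled("TWITTER_ADS", member_id=42):
        print("execute new code")
    else:
        print("execute old code")
```

Defaulting unknown names to "off" means a new block can ship to production before anyone has configured it, which is the point of launching dark.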
26. Dark Launching - Reasons
• Control your code
• Limit risk -> raise confidence -> speed up pace of releases
• “Branching in Production”
• Learning happens in Production
27. Solution - Dark Launching
With a codebase and an engineering team increasing in size, how do
we keep up the pace of development and maintain control of the
system?
Now:
● Work fast with more confidence
● Huge amount of control over production systems
● Typically 10+ code releases to production per day
● Push-based distribution with Consul
32. Solution - Monitoring
Logger::event('user liked from in-stream', In_Log::CATEGORY_UX, $logData);
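What `Logger::event` does internally is not shown. As one illustration of how an event like this could reach a metrics pipeline, the Python sketch below formats a counter in the StatsD plain-text line format (`name:value|type`) and sends it over UDP. The function names, the metric name and the choice of host are assumptions for the example; 8125 is the conventional StatsD port.

```python
import socket


def format_metric(name: str, value: int, metric_type: str) -> bytes:
    """Build one StatsD line, e.g. b'ux.liked_in_stream:1|c' for a counter."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    # UDP is fire-and-forget: the application never waits on the metrics server.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # "c" marks a counter; "ms" would mark a timing in milliseconds.
    send_metric(format_metric("ux.liked_in_stream", 1, "c"))
```

Sending metrics without waiting for a reply keeps instrumentation cheap enough to leave in hot code paths, which matters when the same events are used to judge a dark-launched change.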
33. Solution - Monitoring
• Visibility into the performance and behaviour of your application
• Iterate upon your code, measure results
• Pairs well with dark launching
• Also systems like New Relic
34. Solution - Monitoring
With a rapidly growing codebase and ever more users and traffic,
how do we keep visibility into the performance of the code?
Now:
● Able to watch performance / behaviour in real time.
● Able to view important events both in aggregate and at a very
granular level
● Able to control the system and watch the effect of changes
39. Optimizations - Push work to users
• Within reason, push work up to users
• Make your users into a distributed processing grid
• e.g. Stream rendering
40. Optimizations - Performance / Risks
• Performance is more important than clean code or business requirements
(in the instances where they may be mutually exclusive)
• Fine line between future proofing and premature optimization
• Don’t add burdensome processes, but make it easy for your team
to do things the right way
• Know your weak spots, protect against abuse
43. Problem
With a huge and growing monolithic codebase and over 80
engineers, how to keep scaling in a manageable way?
44. Solution - SOA
SOA.
● Split up the system into independent services which communicate only via APIs
● Teams can work on their own services with encapsulated business logic and have their own
deployment schedules.
● We chose to use Scala/Akka for services, communicating via ZeroMQ
● SOA transition made easier by the “no joins” philosophy
● Tons of work
45. Solution - SOA
SOM.
● “Service Oriented Monolith”
● When splitting up a monolithic codebase, dependencies are what kill you
● Fulfill dependencies by writing interim services using existing PHP code
● Maintain the contract and future Scala services will be drop-in
replacements
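The "maintain the contract" point can be illustrated with a small Python sketch. Everything named here is hypothetical: `MemberService` is the contract, `InterimMemberService` wraps a stand-in for existing monolith code, and `NewMemberService` represents a later rewrite. In the deck the interim services are PHP and the replacements are Scala; the sketch only shows why callers do not need to change.

```python
from typing import Dict, Protocol


class MemberService(Protocol):
    """The contract callers depend on."""

    def get_member(self, member_id: int) -> Dict[str, object]: ...


def legacy_fetch_member(member_id: int) -> Dict[str, object]:
    # Stands in for existing monolith code with its own internal fields.
    return {"id": member_id, "name": f"member-{member_id}", "legacy_flags": 0}


class InterimMemberService:
    """Interim service: fulfils the contract by reusing the legacy code."""

    def get_member(self, member_id: int) -> Dict[str, object]:
        row = legacy_fetch_member(member_id)
        # Expose only what the contract promises, not legacy internals.
        return {"id": row["id"], "name": row["name"]}


class NewMemberService:
    """Later rewrite: different internals, same contract."""

    def __init__(self, records: Dict[int, Dict[str, object]]) -> None:
        self._records = records

    def get_member(self, member_id: int) -> Dict[str, object]:
        return dict(self._records[member_id])


def greeting(service: MemberService, member_id: int) -> str:
    # Caller code is written against the contract only.
    return f"Hello, {service.get_member(member_id)['name']}"


if __name__ == "__main__":
    print(greeting(InterimMemberService(), 5))  # Hello, member-5
    print(greeting(NewMemberService({5: {"id": 5, "name": "member-5"}}), 5))
```

The interim service deliberately hides `legacy_flags`: anything leaked through the contract becomes something the replacement must also reproduce.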
46. Solution - SOA
With a huge and growing monolithic codebase and over 130
engineers, how to keep scaling in a manageable way?
Today:
● Transitioning to Scala SOA
● PHP will still be used as the Façade, a thin layer built on top of
the business logic of the services it interacts with.