What happens when site traffic outgrows the ability of your Ruby on Rails site (and related infrastructure) to handle it? What does Rails provide to solve this problem, and where do those built-ins break down? Where do you go from there?
5. Who am I? Why should you care?
Proprietary and
Confidential
■ Application developer
  Many and various technologies.
  Worked with Rails for ~5 years.
■ Recent focus has been operational
  Chef, PostgreSQL, SmartOS, monitoring
■ TDD, BDD, Agile, DevOps, SOA, etc.
  I care about how code is organized, and always want to learn how to do that better.
■ Now I’m at Wanelo
Thursday, June 6, 13
6. What is Wanelo?
■ Wanelo (“Wah-nee-lo”, from Want, Need, Love) is a global platform for shopping.
13. What is the bad place?
green: disk reads, red: disk writes on DB server
Is this actually a problem?
14. What is the bad place?
Can be difficult to predict, using most of the
default metrics we track
22. Both are important!
■ Tracking saturation helps you predict
problems
■ Tracking utilization can tell you how to
solve that problem
■ Basically, watch every video by
Brendan Gregg on YouTube
http://www.youtube.com/results?search_query=brendan+gregg
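As a rough illustration of the distinction (not taken from the talk's tooling): utilization can be read as the fraction of capacity that is busy, and saturation as the work that is queued because capacity is exhausted. The sample data, the worker count, and the `summarize` helper below are all hypothetical.

```ruby
# Hypothetical samples from a pool of 8 workers, taken once per interval.
# busy:   workers handling a request during the sample
# queued: requests waiting because no worker was free
Sample = Struct.new(:busy, :queued)

WORKERS = 8

def summarize(samples, workers)
  {
    # Utilization: share of available worker slots that were busy.
    utilization: samples.sum(&:busy).to_f / (workers * samples.size),
    # Saturation: average number of requests waiting in the queue.
    saturation: samples.sum(&:queued).to_f / samples.size
  }
end

samples = [Sample.new(6, 0), Sample.new(8, 3), Sample.new(8, 5), Sample.new(7, 0)]
stats = summarize(samples, WORKERS)
puts format("utilization: %.0f%%, average queue depth: %.1f",
            stats[:utilization] * 100, stats[:saturation])
```

Two of the four samples show a fully busy pool, but only the queue depth reveals that requests were actually waiting.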
23. Know the limits of your data
Averages can be extremely useful, but
do not give a complete picture
24. Know the limits of your data
Outliers can cause severe problems
even when the average is great
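A small sketch of why an average alone can mislead, using made-up response times. The numbers and the nearest-rank `percentile` helper are assumptions for illustration, not data from the talk.

```ruby
# Hypothetical response times in milliseconds: 98 fast requests, 2 very slow ones.
latencies = Array.new(98, 20) + [4_000, 6_000]

def mean(values)
  values.sum.to_f / values.size
end

# Nearest-rank percentile: the smallest value with at least pct% of the
# samples at or below it.
def percentile(values, pct)
  sorted = values.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

puts "mean: #{mean(latencies)} ms"
puts "p50:  #{percentile(latencies, 50)} ms"
puts "p99:  #{percentile(latencies, 99)} ms"
puts "max:  #{latencies.max} ms"
```

The mean sits far above what a typical request sees and far below what the slowest requests see, which is why percentiles and maximums are worth tracking alongside it.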
25. This is why we ❤ PostgreSQL
■ pg_stat_activity
■ pg_stat_user_indexes
■ pg_stat_user_tables
■ pg_stat_statements
■ PostgreSQL gives you tools to monitor
it and operate at scale
32. What were the goals of caching?
■ Using model attributes as cache keys
means you need to fetch DB records
■ At high scale, fetching DB records for
every page becomes problematic
■ Cache sweepers are painful, but
they're the only thing we've found to
be scalable and reliable
33. Action cache all the things!
■ Cache hits skip all rendering
■ Still able to run before filters, for
instance when doing A/B testing
■ Some cached pages can be put
behind a CDN
■ Requires page personalization to be
added via Ajax
36. Fragment cache the rest!
■ Fragments can be shared between pages
■ Can reduce rendering time
■ Can remove queries for related records
37.
■ Difficult to remove top level query
■ Joins/eager loading means some related
records are still queried
■ How many trips to memcached?
38.
def multi_get_on_collection(objects, cache_options = {})
  # Map each object to its cache key, preserving order
  cache_keys = objects.inject(ActiveSupport::OrderedHash.new) do |key_map, obj|
    key_map[obj] = cache_options[:cache_key_proc].call obj
    key_map
  end

  # Fetch every fragment in a single round trip
  pre_rendered_objects = Rails.cache.read_multi *cache_keys.values

  cache_keys.map do |object, cache_key|
    cached_html = pre_rendered_objects[cache_key]
    if cached_html.present? && caching_enabled?
      cached_html
    else
      # Cache miss: render the fragment and store it
      cache_options[:render_proc].call(object).tap do |fragment|
        Rails.cache.write cache_key, fragment, cache_options
      end
    end
  end
end
41. MultiGet is your friend
■ Turns 100+ memcached calls into a
single request
cache_key = ->(product) do
  "products_thumb_#{product.id}"
end
renderer = ->(product) do
  render "products/thumb", model: product
end
multi_get_on_collection(@products,
  cache_key_proc: cache_key,
  render_proc: renderer,
  expires_in: 6.hours).join("\n").html_safe
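To make the round-trip saving concrete, here is a self-contained sketch that counts trips to a stand-in cache. `CountingCache` and `render_thumb` are hypothetical; the class only mimics the `read` / `read_multi` / `write` shape of `Rails.cache` and is not a real memcached client.

```ruby
# A stand-in cache that counts round trips (hypothetical, in-memory only).
class CountingCache
  attr_reader :round_trips

  def initialize
    @store = {}
    @round_trips = 0
  end

  def read(key)
    @round_trips += 1
    @store[key]
  end

  # One round trip for any number of keys; only hits are returned.
  def read_multi(*keys)
    @round_trips += 1
    keys.each_with_object({}) do |key, hits|
      hits[key] = @store[key] if @store.key?(key)
    end
  end

  def write(key, value)
    @round_trips += 1
    @store[key] = value
  end
end

def render_thumb(id)
  "<li>product #{id}</li>"
end

ids = (1..100).to_a
key_for = ->(id) { "products_thumb_#{id}" }

cache = CountingCache.new
ids.each { |id| cache.write(key_for.call(id), render_thumb(id)) }

# Naive: one read per fragment.
before = cache.round_trips
naive = ids.map { |id| cache.read(key_for.call(id)) }
naive_trips = cache.round_trips - before

# MultiGet: one read for the whole collection, render only the misses.
before = cache.round_trips
hits = cache.read_multi(*ids.map(&key_for))
multi = ids.map { |id| hits[key_for.call(id)] || render_thumb(id) }
multi_trips = cache.round_trips - before

puts "naive: #{naive_trips} round trips, multi-get: #{multi_trips}"
```

Both approaches produce the same fragments; only the number of trips to the cache differs.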
53. Rails has the answer!
■ counter_cache column on table
■ Adding records executes INCR
■ Removing records executes DECR
class Product < ActiveRecord::Base
  belongs_to :store, counter_cache: true
end
55. Use background jobs
■ Stop updating counter caches on save
■ Queue a delayed job for the near future
■ Job performs a complete recalculation of
the counter cache and is idempotent
■ The higher the count, the further we
delay the job (less likely users will notice)
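A minimal sketch of the last two bullets, assuming a stepped delay schedule. The thresholds, `recount_delay`, and `recount` are hypothetical; the talk does not give the actual values or helper names.

```ruby
# Hypothetical delay schedule: the larger the current count, the longer we
# wait before recalculating, since a stale value is harder to notice.
DELAY_STEPS = [
  [100, 10],              # under 100 items: recalculate after 10 seconds
  [10_000, 60],           # under 10,000 items: after 1 minute
  [Float::INFINITY, 600]  # anything larger: after 10 minutes
].freeze

def recount_delay(current_count)
  DELAY_STEPS.find { |limit, _delay| current_count < limit }.last
end

# The job recomputes the count from scratch rather than incrementing,
# so running it twice (or late) gives the same result.
def recount(store, products)
  store[:product_count] = products.count { |p| p[:store_id] == store[:id] }
end

store = { id: 1, product_count: 0 }
products = [{ store_id: 1 }, { store_id: 1 }, { store_id: 2 }]

puts "delay: #{recount_delay(store[:product_count])}s"
recount(store, products)
puts "count: #{store[:product_count]}"
```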
56. Deduplicate delayed jobs
■ Sidekiq with UniqueJob plugin
■ Updates are serialized via a fixed
number of workers
■ Workers can be stopped to alleviate DB
load in an emergency
■ Same pattern can be applied elsewhere,
like updating Solr indexes
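The deduplication idea can be sketched with a plain in-memory queue. `UniqueQueue` is hypothetical and only illustrates the behaviour; it is not how any Sidekiq plugin is implemented.

```ruby
require 'set'

# Hypothetical queue that drops a job if an identical one is already waiting.
class UniqueQueue
  def initialize
    @pending = Set.new
    @jobs = []
  end

  # Returns true if the job was enqueued, false if it was a duplicate.
  def push(worker, args)
    key = [worker, args]
    return false unless @pending.add?(key)
    @jobs << key
    true
  end

  # Removes and returns the oldest job, allowing it to be enqueued again.
  def pop
    job = @jobs.shift
    @pending.delete(job) if job
    job
  end

  def size
    @jobs.size
  end
end

queue = UniqueQueue.new
1_000.times { queue.push(:update_product_count, { id: 42 }) }
queue.push(:update_product_count, { id: 7 })
puts "#{queue.size} jobs queued"
```

Because the job performs a full recalculation, collapsing a thousand identical requests into one loses nothing.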
57. Deduplicate delayed jobs
class UpdateProductCountWorker < WaneloWorker
  sidekiq_options queue: :product_counts,
                  unique: true
  wait 10.minutes

  def perform!(params)
    id = params[:id].to_i
    ActiveRecord::Base.connection.execute %Q{
      update stores set product_count =
        (select count(*)
         from products
         where store_id = stores.id)
      where stores.id = #{id}
    }
  end
end
58. STILL TOO MANY COUNTS
63. Pagination gems run counts
■ Kaminari executes a count(*) to determine
total page count
  SELECT "stores".* FROM "stores"
  WHERE (state = 'approved')
  LIMIT 20 OFFSET 0

  SELECT COUNT(*) FROM "stores" WHERE (state = 'approved')
■ We paginate EVERYTHING
■ We often already know total count from
counter cache (or can hard code 10,000)
65.
module Kaminari
  module ActiveRecordRelationMethods
    # a workaround for AR 3.0.x that returns 0 for #count when page > 1
    # if +limit_value+ is specified, load all the records and count them
    if ActiveRecord::VERSION::STRING < '3.1'
      def count(column_name = nil, options = {}) #:nodoc:
        limit_value ? length : super(column_name, options)
      end
    end

    def total_count(column_name = nil, options = {}) #:nodoc:
      @total_count ||= begin
        c = except(:offset, :limit, :order)
        # Remove includes only if they are irrelevant
        c = c.except(:includes) unless references_eager_loaded_tables?
        # .group returns an OrderedHash that responds to #count
        c = c.count(column_name, options)
        if c.is_a?(ActiveSupport::OrderedHash)
          c.count
        else
          c.respond_to?(:count) ? c.count(column_name, options) : c
        end
      end
    end
  end
end
66. Time for some monkey patches!
module ActiveRecord
  class Relation
    def custom_counter(count)
      @total_count ||= count
      self
    end
  end
end

module Sunspot
  module Search
    class AbstractSearch
      def custom_counter(count)
        @total ||= count
        self
      end
    end
  end
end

@products = @store.products.
  custom_counter(@store.products_count).
  page(params[:page])
67. How do we know what Rails is really doing?
68. ruby-prof and pilfer
■ Profile the entire stack trace of an action
or a set of classes
https://github.com/eric/pilfer
https://github.com/ruby-prof/ruby-prof
69.
■ Can work as Rack middleware
■ Outputs HTML that will tell you...
if Rails.env.profile?
  use Rack::RubyProf, :path => '/tmp/profile'
end
■ How long are we spending generating
URLs???
74. HTML vs JSON
■ Since launching native iOS and Android
apps, the majority of Wanelo traffic is
served via a JSON API
■ The longer we spend rendering JSON,
the more CPUs are tied up, the more
servers we need
76. Things we did not expect
■ RABL serializes, then deserializes JSON
to do merges!
■ ActiveSupport defines inefficient :to_json
on every object
77.
■ Rendering JSON partials should merge hashes
(hash.merge!), not JSON strings
■ Fragment caching should marshal hashes,
not JSON
■ Caching should allow for MultiGet
■ Should allow for arbitrary composition
■ JSON conversion should use OJ to
call :to_json ONCE
https://github.com/wanelo/compositor
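A minimal sketch of the first and last bullets: partials return plain hashes, composition uses `Hash#merge!`, and conversion to JSON happens once. The helper names and data are hypothetical, and the standard library's `JSON.generate` stands in for Oj so the example has no gem dependencies. This is not the compositor gem's API.

```ruby
require 'json'

# Hypothetical partials: each returns a plain Hash instead of a JSON string.
def product_hash(product)
  { id: product[:id], name: product[:name] }
end

def store_hash(store)
  { store: { id: store[:id], name: store[:name] } }
end

product = { id: 1, name: "Lamp" }
store   = { id: 9, name: "Lights R Us" }

# Compose with Hash#merge!, with no intermediate JSON strings...
payload = product_hash(product)
payload.merge!(store_hash(store))

# ...then convert to JSON exactly once, at the edge.
json = JSON.generate(payload)
puts json
```

Hashes like `payload` are also what would be marshalled into the fragment cache, so a cache hit can be merged into a larger response without parsing anything.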
78.
require 'oj'
require 'active_support/json'
require 'active_support/core_ext/object/to_json'

# Override ActiveSupport's to_json on the core classes so that
# serialization goes through Oj
[Object, Array, FalseClass, Float,
 Hash, Integer, NilClass, String,
 TrueClass].each do |klass|
  klass.class_eval do
    def to_json(options = nil)
      Oj.dump(self, options)
    end
  end
end
80. Do you read, or do you write?
■ Tools like iostat, vmstat, kstat, collectd, and
pg_stat_user_tables can show you
utilization
■ Reads are much easier to scale than writes:
read/write splitting, caching.
■ What do you do when writes become the
bottleneck?
81. Writes committed to disk?
■ Some workloads can tolerate delayed
writes or possible data loss
■ Applicable to many technologies
■ Even microsecond delays can reduce
load
■ Waiting for Solr to respond can keep DB
transactions open longer
83. i.e. Solr
<!-- Perform a <commit/> automatically under certain
     conditions -->
<autoCommit>
  <!-- number of updates since last commit -->
  <maxDocs>1000</maxDocs>
  <!-- maximum age (in ms) of the oldest uncommitted update -->
  <maxTime>30000</maxTime>
</autoCommit>
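The same trade-off can be applied in application code. The sketch below loosely mirrors the policy in the config above: flush when a maximum number of updates is pending, or when the oldest pending update has waited too long. `CommitBuffer` and its parameters are hypothetical and do not describe Solr's internals.

```ruby
# Hypothetical write buffer with a count threshold and an age threshold.
# The clock is injected so the behaviour can be exercised without sleeping.
class CommitBuffer
  attr_reader :commits

  DEFAULT_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) }

  def initialize(max_docs:, max_time_ms:, clock: DEFAULT_CLOCK)
    @max_docs = max_docs
    @max_time_ms = max_time_ms
    @clock = clock
    @pending = []
    @oldest_at = nil
    @commits = []
  end

  def add(doc)
    @oldest_at ||= @clock.call
    @pending << doc
    commit if due?
  end

  # Call periodically so that a quiet buffer still gets committed.
  def tick
    commit if due?
  end

  private

  def due?
    return false if @pending.empty?
    @pending.size >= @max_docs || @clock.call - @oldest_at >= @max_time_ms
  end

  # In a real system this is where the batch would be written out.
  def commit
    @commits << @pending
    @pending = []
    @oldest_at = nil
  end
end

buffer = CommitBuffer.new(max_docs: 1000, max_time_ms: 30_000)
buffer.add(id: 1)
puts "commits so far: #{buffer.commits.size}"
```

Anything still in the buffer when the process dies is lost, which is the delay and data-loss tolerance the previous slide refers to.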
84. What happens when data grows larger than a single database?
87. The only real way to scale
■ Tune code, infrastructure for a very
particular workload
■ Hide sharding from other codebases
■ Allow small teams to manage small(er)
codebases
■ In the Rails world, everyone talks about
why, no one talks about how
89. Services are hard (the first time)
■ Synchronous vs asynchronous data
persistence
■ Message passing
■ Testing
■ Iterative development
90. Iteration is the key
■ Isolate data
■ Isolate interface
■ Extract interface with a database
adapter
■ Launch service layer
■ Switch interface to use service adapter
93.
# Before (crossed out on the slide):
class Product < ActiveRecord::Base
  has_many :saves
end

# After:
class Product < ActiveRecord::Base
  def saves
    Save.where(product_id: self.id)
  end
end
Do this everywhere (you have tests, right?)
95.
class Save < ActiveRecord::Base
  establish_connection "saves_#{Rails.env}"
end
■ Set up a read replica
■ Take down site
■ Promote replica to be a master
■ Restart unicorns with new config
■ Bring up site
■ Clean up unnecessary tables on each DB
96.
class Save < ActiveRecord::Base
  establish_connection "saves_#{Rails.env}"

  def self.by_product(product)
    where(product_id: product.id)
  end
end

class Product < ActiveRecord::Base
  def saves
    Save.by_product(self)
  end
end
■ Reduce Ruby interface to minimum possible
■ Easy to deploy this
97.
class Save
  include SavesClient
end

module SavesClient
  def self.included(other)
    other.send(:attr_accessor, :id, :product_id)
    other.extend ClientClassMethods
  end

  module ClientClassMethods
    def by_product(*args)
      adapter.new(self).by_product(*args)
    end

    def adapter
      @adapter ||= SavesClient::DbAdapter
    end
  end
end
100.
module SavesClient
  class DbAdapter
    def self.close_connections
      # Check in the database connection,
      # since we're shutting down this thread
      SavesService::Save.clear_active_connections!
    end

    def by_product(*args)
      relation :by_product, *args
    end

    def relation(method, *args)
      SavesClient::AdapterRelation.new(self,
        SavesService::Save.send(method, *args))
    end

    def all(scope)
      # Scope is an AR Relation instance returned from
      # SavesService::Save.by_product(product)
      scope.all.map { |m| client_class.new save_attrs_from(m) }
    end
  end
end
Thread safety is EXTREMELY important
103.
module SavesClient
  class AdapterRelation
    attr_reader :adapter, :scope

    def initialize(adapter, scope)
      @adapter, @scope = adapter, scope
    end

    def limit(num); end
    def order(order); end
    def page(num); end
    def first; end

    def all
      adapter.all(scope)
    end
  end
end
■ Calling :by_product instantiates a Relation
■ Calling :all executes the query
104. What executes the query?
■ The ActiveRecord model moves into the gem
■ The adapter translates the Save methods into AR calls,
maps columns into attributes on our class
■ Important to deploy at this stage, as there are many
problems to solve, like:
■ Getting your tests green, fixtures consistent
■ Figuring out whether you really covered all access
patterns
module SavesService
  class Save < ActiveRecord::Base
  end
end
105. Minimize the Ruby access
■ We were able to reduce everything to 7 scopes,
i.e. :by_product, :by_user, etc
■ Reduced Relation methods to these:
■ limit
■ page
■ order
■ count
■ all
■ first
■ last
■ pluck
■ find_in_batches
106. Launch the service layer
■ Now that we have a small public Ruby
interface, we can pair that to a Sinatra app
■ Sinatra can serve as a long-term fake for
Selenium, even after service is re-written
107.
module SavesService
  class Web < Sinatra::Base
    set :environment, ENV['RACK_ENV'] || "development"
    PAGE_SIZE = 50
    ActiveRecord::Base.include_root_in_json = false
    register Sinatra::ActiveRecordExtension

    configure do
      set :database_file, SavesService.config.db_config
      set :root, File.expand_path("../../../", __FILE__)
      disable :raise_errors
      disable :show_exceptions
      set :logger, nil
    end

    error do
      e = env['sinatra.error']
      ActiveRecord::Base.logger.error ["#{e.class}: #{e.message}",
                                       *e.backtrace].join("\n ")
      status 500
      body '{"errors":":("}'
    end

    before { content_type 'application/json' }
  end
end
108.
get '/products/:pid/saves' do
  paginate SavesService::Save.by_product(params[:pid])
end

private

def paginate(scope)
  return count(scope) if params[:count].present?
  # Only accept an exact "asc" or "desc"; anything else falls back to desc
  order = params[:order] if params[:order] =~ /\A(?:desc|asc)\z/i
  scope = scope.order("created_at #{order || 'desc'}")
  limit = params[:limit] ? params[:limit].to_i : PAGE_SIZE
  # Row offset of the first record on the requested page
  offset = [params[:page].to_i - 1, 0].max * limit
  scope = scope.offset(offset).limit(limit)
  scope = scope.pluck(params[:pluck]) if params[:pluck]
  body Oj.dump(saves: scope)
end
109.
#!/usr/bin/env ruby
require 'optparse'

options = {
  :environment => 'development',
  :port => 3000
}

OptionParser.new do |opts|
  opts.banner = "Usage: saves_service [options]"
  opts.on("-d", "--dbconfig OPT", "path to database.yml") do |opt|
    options[:db_config] = File.expand_path(opt, Dir.pwd)
  end
  opts.on("-E", "--environment OPT", "RACK_ENV to use") do |opt|
    options[:environment] = opt
  end
  opts.on("-p", "--port OPT", "port to use") do |opt|
    options[:port] = opt
  end
end.parse!

cmd_env = { 'RACK_ENV' => options[:environment],
            'DB_CONFIG' => options[:db_config],
          }.delete_if{|k,v| v.nil? }
rackup_file = File.expand_path('../../config.ru', __FILE__)
exec cmd_env, "unicorn -p #{options[:port]} #{rackup_file}"
110. DbAdapter vs HTTPAdapter
■ Maps Ruby interface to Net::HTTP::Persistent
■ Deserializes JSON into values on class
■ Scope becomes a map
by_product(1) => "/products/1/saves"
■ Finder methods map params
limit(10) => "?limit=10"
■ AdapterRelation uses Adapter to fetch records,
does not change at all
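The scope-to-path and finder-to-params mapping described above could look roughly like this. `HttpScope` is a hypothetical name; the talk does not show the HTTP adapter's code, so this is only a sketch of the mapping and performs no network calls.

```ruby
require 'uri'

# Hypothetical request builder: a scope fixes the path, finder methods
# accumulate query parameters, and nothing is fetched until the URL is used.
class HttpScope
  attr_reader :path, :params

  def initialize(path, params = {})
    @path = path
    @params = params
  end

  def self.by_product(product_id)
    new("/products/#{Integer(product_id)}/saves")
  end

  def limit(num)
    with(limit: num)
  end

  def page(num)
    with(page: num)
  end

  def order(direction)
    with(order: direction)
  end

  def to_url
    return path if params.empty?
    "#{path}?#{URI.encode_www_form(params)}"
  end

  private

  # Return a new object so that scopes can be chained without mutation.
  def with(extra)
    self.class.new(path, params.merge(extra))
  end
end

puts HttpScope.by_product(1).limit(10).page(2).to_url
```

An HTTP adapter would pass the resulting URL to its persistent connection and build client objects from the JSON response, in the same place where the DB adapter maps ActiveRecord rows.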
112. Deploy as a setting change
Rails.application.config.after_initialize do |app|
  if Settings.saves_service.enabled
    Save.saves_base_url = Settings.saves_service.url
  else
    require 'saves_client/db_adapter'
    Save.adapter = SavesClient::DbAdapter
    require 'saves_service/save'
    saves_db = "saves_#{Rails.env}"
    SavesService::Save.establish_connection saves_db
  end
end
114.
■ Choose technologies that are easy to
operate and monitor
■ Don't immediately break when they hit
resource thresholds
■ Sound replication strategies
■ Assume that data should be tracked, even if
you don't yet understand the relevance
■ Small iterative performance improvements
can have massive payoff over time