A presentation with tips and tools for integrating batch and asynchronous operations into a generic Ruby on Rails application.
Presented at rubyday.it 2011.
1. To Batch Or Not To Batch
Luca Mearelli
rubyday.it 2011
2. First and foremost, we believe that speed
is more than a feature. Speed is the most
important feature. If your application is
slow, people won’t use it.
Fred Wilson
@lmea #rubyday
3. Not all the interesting features are fast
Interacting with remote APIs
Sending emails
Media transcoding
Large dataset handling
4. Anatomy of an asynchronous action
The app decides it needs to do a long operation
The app asks the async system to do the operation and quickly returns the response
The async system executes the operation out-of-band
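The three steps above can be sketched in plain Ruby with an in-process queue and a worker thread. This is only an illustration under assumptions: `JOBS`, `RESULTS`, `handle_request` and the `sleep` standing in for the slow work are hypothetical names, and a real application would rely on a separate queueing system rather than a thread inside the web process.

```ruby
require 'thread'

JOBS    = Queue.new   # hypothetical in-process stand-in for the async system
RESULTS = Queue.new   # where the worker reports finished work

# The async system: a worker thread that executes operations out-of-band.
worker = Thread.new do
  while (job = JOBS.pop) != :stop
    sleep 0.1                        # pretend this is the long operation
    RESULTS << "done: #{job}"
  end
end

# The app: decides the operation is slow, hands it off, responds immediately.
def handle_request(item)
  JOBS << item
  "accepted: #{item}"                # returned before the work has happened
end

response = handle_request('export-42')
JOBS << :stop                        # tell the worker to finish
worker.join
outcome = RESULTS.pop
```

The response is built before the worker has touched the job, which is the whole point: the user gets an answer quickly and the result becomes available later.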
7. Cron
scheduled operations
unrelated to the requests
low frequency
longer run time
8. Anatomy of a cron batch: the rake task
namespace :export do
  task :items_xml => :environment do
    # read the env variables
    # make the export
  end
end
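The two placeholder comments in the task could be filled in along these lines. This is a minimal sketch, not the code from the original application: `export_items_xml`, `xml_escape`, the `items.xml` file name and the shape of the items (hashes with `:id` and `:name`) are all assumptions made for illustration.

```ruby
require 'fileutils'

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape the characters that are not allowed verbatim in XML text.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical export routine the rake task body could delegate to.
# `items` is assumed to be an array of hashes with :id and :name keys.
def export_items_xml(items, folder)
  FileUtils.mkdir_p(folder)
  path = File.join(folder, 'items.xml')
  File.open(path, 'w') do |f|
    f.puts '<items>'
    items.each do |item|
      f.puts "  <item id=\"#{item[:id]}\">#{xml_escape(item[:name])}</item>"
    end
    f.puts '</items>'
  end
  path
end

# read the env variables, falling back to a default folder
folder = ENV.fetch('XML_FOLDER', 'data/exports')
# make the export (inside the task): export_items_xml(items, folder)
```

Keeping the logic in a plain method and leaving only the `ENV` handling in the task makes the export testable without going through rake.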
9. Anatomy of a cron batch: the shell script
#!/bin/sh
# this goes in script/item_export_full.sh
cd /usr/rails/MyApp/current
export RAILS_ENV=production
echo "Item Export Full started: `date`"
rake export:items_xml XML_FOLDER='data/exports'
echo "Item Export Full completed: `date`"
10. Anatomy of a cron batch: the crontab entry
0 0 1 * * /usr/rails/MyApp/current/script/item_export_full.sh >> /usr/rails/MyApp/current/log/dump_item_export.log 2>&1
30 13 * * * cd /usr/rails/MyApp/current; ruby /usr/rails/MyApp/current/script/runner -e production "Newsletter.deliver_daily" >> /usr/rails/MyApp/current/log/newsletter_daily.log 2>&1
12. Whenever: schedule.rb
# adds ">> /path/to/file.log 2>&1" to all commands
set :output, '/path/to/file.log'
every 3.hours do
  rake "my:rake:task"
end
every 1.day, :at => '4:30 am' do
  runner "MyModel.task_to_run_at_four_thirty_in_the_morning"
end
every :hour do # Many shortcuts available: :hour, :day, :month, :year, :reboot
  command "/usr/bin/my_great_command", :output => {:error => 'error.log', :standard => 'cron.log'}
end
17. Delayed job
Any object method can be a job
DB-backed queue
Integer-based priority
Lifecycle hooks (enqueue, before, after, ... )
18. Delayed job: simple jobs
# without delayed_job
@user.notify!(@event)
# with delayed_job
@user.delay.notify!(@event)
# always asynchronous method
class Newsletter
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end
newsletter = Newsletter.new
newsletter.deliver # enqueued as a job instead of running inline
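To make the idea behind `delay` more concrete, here is a toy proxy that records a method call instead of running it. It is a sketch of the general technique only, not delayed_job's actual implementation (which serializes the call into a database row): `PENDING`, `DelayProxy`, `Delayable` and the `User` class below are hypothetical.

```ruby
PENDING = []   # stands in for the database-backed queue

# Forwards nothing: every call is captured as [target, method, args].
class DelayProxy < BasicObject
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args)
    ::PENDING << [@target, name, args]
    nil
  end
end

module Delayable
  def delay
    DelayProxy.new(self)
  end
end

class User
  include Delayable
  attr_reader :notified

  def notify!(event)
    (@notified ||= []) << event
  end
end

user = User.new
user.delay.notify!('signup')      # returns immediately, nothing has run yet
ran_before = !user.notified.nil?

# Later, a worker takes the recorded calls and executes them.
PENDING.each { |target, name, args| target.public_send(name, *args) }
```

Because the call is only data until a worker replays it, any public method of any object can become a job without writing a dedicated job class.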
22. Delayed job: checking the job status
The queue only holds scheduled and running jobs
Handle the status outside the Delayed::Job object
23. Delayed job: checking the job status
# Include this in your initializers somewhere
# (named JobQueue because Queue is already a core Ruby class)
class JobQueue < Delayed::Job
  def self.status(id)
    job = find_by_id(id)
    # a job that is no longer in the table has completed
    return "success" if job.nil?
    job.last_error.nil? ? "queued" : "failure"
  end
end
# Use this method in your poll method like so:
def poll
  status = JobQueue.status(params[:id])
  if status == "success"
    # Success, notify the user!
  elsif status == "failure"
    # Failure, notify the user!
  end
end
24. Delayed job: checking the job status
class AJob < Struct.new(:options)
  def perform
    do_something(options)
  end
  def success(job)
    # record success of job.id
    Rails.cache.write("status:#{job.id}", "success")
  end
end
# a helper
def job_completed_with_success(job_id)
  Rails.cache.read("status:#{job_id}") == "success"
end
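The same approach extends to the other lifecycle hooks, so that queued and failed jobs can be told apart as well. The sketch below runs outside Rails under stated assumptions: `MemoryCache` is a hypothetical stand-in for `Rails.cache`, `FakeJob` mimics only the `id` of a `Delayed::Job`, and `simulate_worker` is a much simplified imitation of what a worker does (a real worker calls `failure` only once the retries are exhausted).

```ruby
# Hypothetical stand-in for Rails.cache so the sketch runs on its own.
class MemoryCache
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end
end

CACHE   = MemoryCache.new
FakeJob = Struct.new(:id)   # only the id of the queued job is used here

class TrackedJob < Struct.new(:options)
  def perform
    raise ArgumentError, 'nothing to do' if options.empty?
  end

  # delayed_job lifecycle hooks: record each transition under the job id
  def enqueue(job)
    CACHE.write("status:#{job.id}", 'queued')
  end

  def success(job)
    CACHE.write("status:#{job.id}", 'success')
  end

  def failure(job)
    CACHE.write("status:#{job.id}", 'failure')
  end
end

def job_status(job_id)
  CACHE.read("status:#{job_id}") || 'unknown'
end

# Simplified imitation of a worker calling the hooks around perform.
def simulate_worker(payload, job)
  payload.enqueue(job)
  payload.perform
  payload.success(job)
rescue StandardError
  payload.failure(job)
end

simulate_worker(TrackedJob.new({ :limit => 10 }), FakeJob.new(1))
simulate_worker(TrackedJob.new({}), FakeJob.new(2))
```

A polling action can then call `job_status` with the id it handed to the browser, without ever querying the jobs table.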
25. Resque
Redis-backed queues
Queue/dequeue speed independent of list size
Forking behaviour
Built in front-end
Multiple queues / no priorities
26. Resque: the job
class Export
  @queue = :export_jobs
  def self.perform(dataset_id, kind = 'full')
    ds = Dataset.find(dataset_id)
    ds.create_export(kind)
  end
end
27. Resque: enqueuing the job
class Dataset
  def async_create_export(kind)
    Resque.enqueue(Export, self.id, kind)
  end
end
ds = Dataset.find(100)
ds.async_create_export('full')
28. Resque: persisting the job
# jobs are persisted as JSON,
# so jobs should only take arguments that can be expressed as JSON
{
  "class": "Export",
  "args": [ 100, "full" ]
}
# don't do this: Resque.enqueue(Export, self, kind)
# do this:
Resque.enqueue(Export, self.id, kind)
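A quick round trip through the standard `json` library shows why only JSON-friendly arguments are safe. This sketch only approximates the payload Resque stores; it does not call Resque itself.

```ruby
require 'json'

# Roughly what gets stored for Resque.enqueue(Export, 100, :full)
payload = JSON.generate({ 'class' => 'Export', 'args' => [100, :full] })

# What the worker sees after decoding the payload
decoded = JSON.parse(payload)
kind = decoded['args'].last   # the symbol :full has come back as a String
```

Anything richer than numbers, strings, arrays and hashes loses its type on the way, which is why the job receives a record id and reloads the record itself.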
29. Resque: generic async methods
# A simple async helper
class Repository < ActiveRecord::Base
  # This will be called by a worker when a job needs to be processed
  def self.perform(id, method, *args)
    find(id).send(method, *args)
  end
  # We can pass this any Repository instance method that we want to
  # run later.
  def async(method, *args)
    Resque.enqueue(Repository, id, method, *args)
  end
end
# Now we can call any method and have it execute later:
@repo.async(:update_disk_usage)
@repo.async(:update_network_source_id, 34)
30. Resque: anatomy of a worker
# a worker does this:
start
loop do
  if job = reserve
    job.process
  else
    sleep 5
  end
end
shutdown
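The loop above is pseudocode; a runnable approximation with in-memory queues looks like the sketch below. `QUEUE_STORE`, the lambda jobs and the `work` method are assumptions standing in for Redis lists and job classes, and the loop exits when the queues are empty instead of sleeping so that the example terminates. Checking the queues in the order they are listed mirrors how a worker treats a comma-separated queue list.

```ruby
# In-memory stand-ins for the Redis lists; names are hypothetical.
QUEUE_STORE = {
  'critical' => [],
  'high'     => [],
  'low'      => []
}

# Return the next job, checking the queues in the order they were listed.
def reserve(queue_names)
  queue_names.each do |name|
    job = QUEUE_STORE[name].shift
    return job if job
  end
  nil
end

def work(queue_names, processed)
  loop do
    if (job = reserve(queue_names))
      processed << job.call   # job.process in the pseudocode
    else
      break                   # a real worker would `sleep 5` and poll again
    end
  end
end

QUEUE_STORE['low']      << -> { 'low job' }
QUEUE_STORE['critical'] << -> { 'critical job' }
QUEUE_STORE['high']     << -> { 'high job' }

processed = []
work(%w[critical high low], processed)
```

Even though the low-priority job was enqueued first, it runs last: with no per-job priorities, the order of the queue list is what decides precedence.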
31. Resque: working the queues
$ QUEUES=critical,high,low rake resque:work
$ QUEUES=* rake resque:work
$ PIDFILE=./resque.pid QUEUE=export_jobs rake environment resque:work
task "resque:setup" => :environment do
  AppConfig.a_parameter = ...
end
32. Resque: monit recipe
# example monit monitoring recipe
check process resque_worker_batch_01
  with pidfile /app/current/tmp/pids/worker_01.pid
  start program = "/bin/bash -c 'cd /app/current; RAILS_ENV=production QUEUE=batch_queue nohup rake environment resque:work > log/worker_01.log 2>&1 & echo $! > tmp/pids/worker_01.pid'" as uid deploy and gid deploy
  stop program = "/bin/bash -c 'cd /app/current && kill -s QUIT `cat tmp/pids/worker_01.pid` && rm -f tmp/pids/worker_01.pid; exit 0;'"
  if totalmem is greater than 1000 MB for 10 cycles then restart # eating up memory?
  group resque_workers
35. Resque-status
Simple trackable jobs for resque
Job instances have a UUID
Jobs can report their status while running
36. Resque-status
# inheriting from JobWithStatus
class ExportJob < Resque::JobWithStatus
  # perform is an instance method
  def perform
    # default to 1000 when no limit is given (nil.to_i would be 0)
    limit = (options['limit'] || 1000).to_i
    items = Item.limit(limit)
    total = items.count
    exported = []
    items.each_with_index do |item, num|
      at(num, total, "At #{num} of #{total}")
      exported << item.to_csv
    end
    File.open(local_filename, 'w') { |f| f.write(exported.join("\n")) }
    complete(:filename=>local_filename)
  end
end
37. Resque-status
job_id = SleepJob.create(:length => 100)
status = Resque::Status.get(job_id)
# the status object tells us:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Resque::Status.kill(job_id)
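The mechanics behind a trackable job can be approximated with a UUID and a shared status hash. This is a toy model and not resque-status internals: `STATUSES`, `create_job`, and the standalone `at` and `pct_complete` helpers below are hypothetical, with the hash standing in for what the library keeps in Redis.

```ruby
require 'securerandom'

STATUSES = {}   # hypothetical stand-in for the per-job status kept in Redis

# Creating a job hands back a UUID the caller can poll with.
def create_job
  uuid = SecureRandom.uuid
  STATUSES[uuid] = { 'status' => 'queued', 'num' => 0, 'total' => 1 }
  uuid
end

# Called by the running job to report its progress.
def at(uuid, num, total, message)
  STATUSES[uuid] = { 'status' => 'working', 'num' => num,
                     'total' => total, 'message' => message }
end

def pct_complete(uuid)
  status = STATUSES.fetch(uuid)
  (status['num'] * 100) / status['total']
end

job_id = create_job
initial_status = STATUSES[job_id]['status']
at(job_id, 5, 20, 'At 5 of 20')
progress = pct_complete(job_id)
```

The caller never needs a reference to the job object itself; the UUID is enough to look up progress from another process or request.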
38. Resque-scheduler
Queueing for future execution
Scheduling jobs (like cron!)
39. Resque-scheduler
# run a job in 5 days
Resque.enqueue_in(5.days, SendFollowupEmail)
# run SomeJob at a specific time
Resque.enqueue_at(5.days.from_now, SomeJob)
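Conceptually, delayed jobs wait in a list ordered by timestamp until a scheduler process moves the due ones onto a regular queue. The sketch below models that idea in memory; `DELAYED`, `READY` and `transfer_due_jobs` are hypothetical names, and the methods only imitate the shape of the real `enqueue_at` / `enqueue_in` calls.

```ruby
DELAYED = []   # [run_at, job_name] pairs, kept sorted by time
READY   = []   # stands in for the normal work queue

def enqueue_at(time, job_name)
  DELAYED << [time, job_name]
  DELAYED.sort_by!(&:first)
end

# "in N seconds" is just "at now + N"
def enqueue_in(seconds, job_name)
  enqueue_at(Time.now + seconds, job_name)
end

# What a scheduler process would do on each tick.
def transfer_due_jobs(now = Time.now)
  while DELAYED.any? && DELAYED.first.first <= now
    READY << DELAYED.shift.last
  end
end

start = Time.now
enqueue_in(3600, 'SendFollowupEmail')   # due in one hour
enqueue_at(start - 1, 'SomeJob')        # already due
transfer_due_jobs(start)
ready_now = READY.dup
```

Only the job whose time has passed is moved; the follow-up email stays in the delayed list until a later tick.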
40. Resque-scheduler
namespace :resque do
  task :setup do
    require 'resque'
    require 'resque_scheduler'
    require 'resque/scheduler'
    Resque.redis = 'localhost:6379'
    # The schedule doesn't need to be stored in a YAML, it just needs to
    # be a hash. YAML is usually the easiest.
    Resque::Scheduler.schedule = YAML.load_file('your_resque_schedule.yml')
    # When dynamic is set to true, the scheduler process looks for
    # schedule changes and applies them on the fly.
    # Also if dynamic the Resque::Scheduler.set_schedule (and remove_schedule)
    # methods can be used to alter the schedule
    #Resque::Scheduler.dynamic = true
  end
end
$ rake resque:scheduler
41. Resque-scheduler: the yaml configuration
queue_documents_for_indexing:
  cron: "0 0 * * *"
  class: QueueDocuments
  queue: high
  args:
  description: "This job queues all content for indexing in solr"
export_items:
  cron: "30 6 * * 1"
  class: Export
  queue: low
  args: full
  description: "This job does a weekly export"
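Since the schedule is just a hash, it can be inspected with the standard `yaml` library before handing it to the scheduler. The sketch below loads one of the entries above from an inline string (rather than from a file) and summarises it; the `summary` format is purely illustrative.

```ruby
require 'yaml'

schedule = YAML.load(<<~SCHEDULE)
  export_items:
    cron: "30 6 * * 1"
    class: Export
    queue: low
    args: full
    description: "This job does a weekly export"
SCHEDULE

# Each entry maps a name to a plain hash of settings.
summary = schedule.map do |name, config|
  minute, hour, _day, _month, weekday = config['cron'].split
  "#{name}: #{config['class']} on queue #{config['queue']} at #{hour}:#{minute}, weekday #{weekday}"
end
```

Note that the nested keys must be indented under the job name; a flat file would parse as a single hash of unrelated top-level keys.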