1. Celery
An introduction to the distributed task queue.
Rich Leland
ZPUGDC // April 6, 2010
@richleland
richard_leland@discovery.com
http://creative.discovery.com
3. What is Celery?
An asynchronous, concurrent, distributed,
super-awesome task queue.
4. A brief history
• First commit in April 2009 as "crunchy"
• Originally built for use with Django
• Django is still a requirement
• Don't be scurred! No Django app required!
• Django is needed for the ORM, caching, and signaling
• The plan is for Celery to use SQLAlchemy and louie instead
6. User perspective
• Minimize request/response cycle
• Smoother user experience
• The difference between a pleasant and an unpleasant experience
7. Developer perspective
• Offload time/cpu intensive processes
• Scalability - add workers as needed
• Flexibility - many points of customization
• Celery is about to turn 1 (April 24)
• Actively developed
• Great documentation
• Lots of tutorials
9. Business perspective
• Latency == $$$
• Every 100ms of latency cost Amazon 1% in sales
• Google found that an extra 0.5 seconds of search page generation time dropped traffic by 20%
• A trading platform that is 5 ms behind the competition could cost a broker $4 million in revenue per millisecond
http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it
10. Example Uses
• Image processing
• Calculate points and award badges
• Upload files to a CDN
• Re-generate static files
• Generate graphs for enormous data sets periodically
• Send blog comments through a spam filter
• Transcoding of audio and video
19. Define a task
from celery.decorators import task

@task
def add(x, y):
    return x + y
20. Execute the task
>>> from tasks import add
>>> add.delay(4, 4)
<AsyncResult: 889143a6-39a2-4e52-837b-d80d33efb22d>
21. Analyze the results
>>> result = add.delay(4, 4)
>>> result.ready() # has the task finished processing?
False
>>> result.result # task is not ready, so no return value yet.
None
>>> result.get() # wait until the task is done and get retval.
8
>>> result.result # access result
8
>>> result.successful()
True
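The same submit-then-poll pattern can be tried without a broker or a worker. The sketch below uses the standard library's `concurrent.futures` as a stand-in, not Celery itself: `submit()` plays the role of `delay()`, `done()` of `ready()`, and `result()` of `get()`. The `slow_add` function and the `gate` event are illustrative assumptions, not part of the talk.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a Celery task. The gate holds the task open
# so the "not ready yet" state can be observed deterministically.
gate = threading.Event()

def slow_add(x, y):
    gate.wait()  # block until the caller releases the gate
    return x + y

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_add, 4, 4)  # analogous to add.delay(4, 4)

ready_before = future.done()       # analogous to result.ready()
gate.set()                         # let the task finish
value = future.result(timeout=5)   # analogous to result.get()
ready_after = future.done()

executor.shutdown()
print(ready_before, value, ready_after)
```

The important difference is that a Future lives in the calling process, while Celery's result is looked up by task id in a result store, so the caller and the worker can be on different machines.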
22. The Task class
from datetime import date

from celery.task import Task

# Person is the application's Django model; its import is not shown on the slide.


class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """

    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)
        person = Person.objects.get(pk=person_id)
        now = date.today()
        diff = now - person.date_of_birth
        # i know, i know, this doesn't account for leap year
        age = diff.days // 365
        # store the outcome of the age check directly
        person.can_drink = age >= 21
        person.save()
        return True
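The leap-year caveat in the comment above can be avoided by comparing calendar dates instead of dividing a day count by 365. A minimal sketch, assuming plain `datetime.date` values; the helper names `age_on` and `can_drink` are hypothetical and not part of the talk:

```python
from datetime import date

def age_on(born, today):
    """Return the number of whole years between born and today."""
    # Subtract one year if this year's birthday has not happened yet.
    before_birthday = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - int(before_birthday)

def can_drink(born, today, minimum_age=21):
    """Return True if the person has reached minimum_age by today."""
    return age_on(born, today) >= minimum_age

# Someone born April 7, 1989 is still 20 on April 6, 2010.
print(age_on(date(1989, 4, 7), date(2010, 4, 6)))
print(can_drink(date(1989, 4, 6), date(2010, 4, 6)))
```

For the first case the day-count approach reports 21, because the five leap days in between push the total to 7669 days, which is more than 21 * 365.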
23. Task retries
class CanDrinkTask(Task):
    """
    A task that determines if a person is 21 years of age or older.
    """
    default_retry_delay = 5 * 60  # retry in 5 minutes
    max_retries = 5

    def run(self, person_id, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running determine_can_drink task for person %s" % person_id)
        ...
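What the two settings mean can be sketched in plain Python: one initial attempt, then up to `max_retries` further attempts spaced `default_retry_delay` seconds apart. This is only an illustration of the semantics, not Celery's implementation; Celery re-delivers the task through the broker rather than sleeping in-process. The names `run_with_retries`, `MaxRetriesExceeded`, and `flaky` are hypothetical.

```python
import time

class MaxRetriesExceeded(Exception):
    """Raised when the hypothetical runner gives up."""

def run_with_retries(func, max_retries=5, retry_delay=5 * 60, sleep=time.sleep):
    """Call func; on failure wait retry_delay seconds and try again.

    Makes one initial attempt plus at most max_retries retries.
    """
    attempt = 0
    while True:
        try:
            return func()
        except Exception as exc:
            if attempt >= max_retries:
                raise MaxRetriesExceeded(str(exc)) from exc
            attempt += 1
            sleep(retry_delay)

# A task body that fails twice before succeeding.
calls = []

def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("database unavailable")
    return True

waits = []  # record the delays instead of actually sleeping
outcome = run_with_retries(flaky, sleep=waits.append)
print(outcome, len(calls), waits)
```

Passing `sleep` as a parameter keeps the sketch testable without waiting five minutes per retry.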
24. The PeriodicTask class
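from datetime import timedelta

from celery.task import PeriodicTask

# Person is the application's Django model; its import is not shown on the slide.


class FullNameTask(PeriodicTask):
    """
    A periodic task that concatenates fields to form a person's full name.
    """
    run_every = timedelta(seconds=60)

    def run(self, **kwargs):
        logger = self.get_logger(**kwargs)
        logger.info("Running full name task.")
        for person in Person.objects.all():
            parts = [person.prefix, person.first_name, person.middle_name,
                     person.last_name, person.suffix]
            # skip empty fields so the name has no doubled or stray spaces
            person.full_name = " ".join(part for part in parts if part)
            person.save()
        return True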
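The role of `run_every` can be illustrated with a small interval check: a scheduler compares the last run time plus the interval against the current time. The sketch below is a hypothetical helper written for illustration, not Celery's scheduler code; `interval_due` and its return shape are assumptions.

```python
from datetime import datetime, timedelta

def interval_due(last_run_at, now, run_every):
    """Return (due, seconds_until_next_check) for a fixed-interval task."""
    remaining = (last_run_at + run_every) - now
    seconds = remaining.total_seconds()
    if seconds <= 0:
        # Due now; the next run is a full interval away.
        return True, run_every.total_seconds()
    return False, seconds

run_every = timedelta(seconds=60)
last_run = datetime(2010, 4, 6, 12, 0, 0)

print(interval_due(last_run, datetime(2010, 4, 6, 12, 0, 30), run_every))
print(interval_due(last_run, datetime(2010, 4, 6, 12, 1, 0), run_every))
```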
25. Holy chock-full of features, Batman!
• Messaging
• Distribution
• Concurrency
• Scheduling
• Performance
• Return values
• Result stores
• Webhooks
• Rate limiting
• Routing
• Remote-control
• Monitoring
• Serialization
• Tracebacks
• Retries
• Task sets
• Web views
• Error reporting
• Supervising
• init scripts