This document covers strategies for building applications and services that scale to handle large volumes of traffic and data. It discusses using relational databases like Postgres and MySQL and augmenting them with Redis for tasks like counters and activity streams; processing asynchronous tasks with queues like Celery backed by brokers like RabbitMQ; and object caching and view caching with Redis as further scaling techniques. Throughout, the emphasis is on enhancing existing databases rather than replacing them, queuing all operations, and keeping solutions relatively simple rather than overly complex.
22. Counters in SQL
UPDATE table SET counter = counter + 1 WHERE id = %s;
Tuesday, February 26, 13
23. Counters in Redis
INCR counter
>>> redis.incr('counter')
24. Counters in Sentry
[Diagram: event IDs 1, 2, and 3 each trigger a Redis INCR; the buffered counts are flushed with a single SQL UPDATE.]
25. Counters in Sentry
‣ INCR event_id in Redis
‣ Queue buffer incr task
‣ 5-10s explicit delay
‣ Task does atomic GET event_id and DEL event_id (Redis pipeline)
‣ No-op if GET is not > 0
‣ One SQL UPDATE per unique event per delay
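The buffered flow above can be sketched as follows. This is a simulation, not Sentry's actual code: `FakeRedis`, `record_event`, and `flush_counter` are illustrative names, and a dict stands in for both Redis and the SQL table (in production this would be redis-py plus a delayed Celery task).

```python
class FakeRedis:
    """Dict-backed stand-in for the three Redis commands the buffer uses."""

    def __init__(self):
        self.data = {}

    def incr(self, key):
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)


def record_event(redis, event_id):
    # Hot path: only an INCR; a flush task is queued with a 5-10s delay.
    redis.incr('counter:%s' % event_id)


def flush_counter(redis, sql_counts, event_id):
    # Delayed task: in practice GET and DEL run as one atomic Redis pipeline.
    key = 'counter:%s' % event_id
    pending = redis.get(key)
    redis.delete(key)
    if not pending:
        return  # no-op: an earlier flush already claimed the count
    # One SQL UPDATE per unique event per delay window (a dict stands in here).
    sql_counts[event_id] = sql_counts.get(event_id, 0) + pending


redis = FakeRedis()
sql = {}
for _ in range(5):
    record_event(redis, 'abc')
flush_counter(redis, sql, 'abc')  # single UPDATE covering all five INCRs
flush_counter(redis, sql, 'abc')  # second queued task is a no-op
```

The second `flush_counter` call shows the "con" from the next slide: any task that fires after the count has already been claimed does nothing.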
26. Counters in Sentry (cont.)
Pros
‣ Solves database row lock contention
‣ Redis nodes are horizontally scalable
‣ Easy to implement

Cons
‣ Too many dummy (no-op) tasks
27. Alternative Counters
[Diagram: event IDs 1, 2, and 3 each trigger a Redis ZINCRBY; pending counts are flushed with SQL UPDATEs.]
29. Alternative Counters
‣ ZINCRBY events event_id in Redis
‣ Cron buffer flush
‣ ZRANGE events to get pending updates
‣ Fire individual task per update
‣ Atomic ZSCORE events event_id and ZREM events event_id to get and flush count
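The ZSET variant can be sketched the same way, again with a dict-backed stand-in for the four Redis commands involved; `FakeZSet` and `cron_flush` are illustrative names, not Sentry's implementation.

```python
class FakeZSet:
    """Dict-backed stand-in for ZINCRBY, ZRANGE, ZSCORE, and ZREM."""

    def __init__(self):
        self.scores = {}

    def zincrby(self, member, amount=1):
        self.scores[member] = self.scores.get(member, 0) + amount

    def zrange(self):
        return sorted(self.scores)

    def zscore(self, member):
        return self.scores.get(member)

    def zrem(self, member):
        self.scores.pop(member, None)


def cron_flush(events, sql_counts):
    # ZRANGE lists the pending event IDs; per event, ZSCORE + ZREM run as
    # one atomic pipeline in practice, each firing an individual SQL UPDATE.
    for event_id in events.zrange():
        count = events.zscore(event_id)
        events.zrem(event_id)
        if count:
            sql_counts[event_id] = sql_counts.get(event_id, 0) + count


events = FakeZSet()
sql = {}
for eid in ('a', 'a', 'b', 'a'):
    events.zincrby(eid)
cron_flush(events, sql)  # flushes the pending counts and empties the zset
```

Because the cron job only fires tasks for members actually present in the zset, the no-op tasks of the previous design mostly disappear, at the cost of funnelling all pending updates through one Redis key.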
30. Alternative Counters (cont.)
Pros
‣ Removes (most) no-op tasks
‣ Works without a complex queue due to no required delay on jobs

Cons
‣ Single Redis key stores all pending updates
33. Streams in SQL
class Activity:
    SET_RESOLVED = 1
    SET_REGRESSION = 6

    TYPE = (
        (SET_RESOLVED, 'set_resolved'),
        (SET_REGRESSION, 'set_regression'),
    )

    event = ForeignKey(Event)
    type = IntegerField(choices=TYPE)
    user = ForeignKey(User, null=True)
    datetime = DateTimeField()
    data = JSONField(null=True)
34. Streams in SQL (cont.)
>>> Activity(event, SET_RESOLVED, user, now)
"David marked this event as resolved."
>>> Activity(event, SET_REGRESSION, datetime=now)
"The system marked this event as a regression."
>>> Activity(type=DEPLOY_START, datetime=now)
"A deploy started."
>>> Activity(type=SET_RESOLVED, datetime=now)
"All events were marked as resolved."
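Rendering rows into messages like these might look as follows; this is a hedged sketch, with the `TEMPLATES` dict and `render` helper being illustrative names rather than Sentry's actual code.

```python
SET_RESOLVED = 1
SET_REGRESSION = 6

# One template per activity type; the actor falls back to "The system"
# when no user is attached to the row.
TEMPLATES = {
    SET_RESOLVED: '{actor} marked this event as resolved.',
    SET_REGRESSION: '{actor} marked this event as a regression.',
}


def render(type, user=None):
    actor = user if user is not None else 'The system'
    return TEMPLATES[type].format(actor=actor)
```

For example, `render(SET_RESOLVED, user='David')` produces the first message above, and `render(SET_REGRESSION)` the second.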
38. Views in Redis (cont.)
MAX_SIZE = 10000

def add(self, event):
    score = float(event.date.strftime('%s.%f'))
    # increment the key and trim the data to avoid
    # data bloat in a single key
    with self.db.pipeline() as pipe:
        pipe.zadd(self.key, event.id, score)
        # drop everything below the MAX_SIZE newest (highest-scored) members
        pipe.zremrangebyrank(self.key, 0, -MAX_SIZE - 1)
        pipe.execute()
45. Object Cache Prerequisites
‣ Your database can't handle the read-load
‣ Your data changes infrequently
‣ You can handle slightly worse performance
46. Distributing Load with Memcache
[Diagram: event IDs 01-15 partitioned across three nodes: Memcache 1 holds 01, 04, 07, 10, 13; Memcache 2 holds 02, 05, 08, 11, 14; Memcache 3 holds 03, 06, 09, 12, 15.]
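One way to produce a layout like the one above is to partition keys by hash(key) modulo the node count. This is a simplified sketch (real memcache clients typically use consistent hashing so that adding or removing a node remaps fewer keys); `NODES` and `node_for` are illustrative names.

```python
import zlib

NODES = ['memcache-1', 'memcache-2', 'memcache-3']


def node_for(key):
    # crc32 is stable across processes, unlike Python's randomized hash()
    return NODES[zlib.crc32(key.encode()) % len(NODES)]


# Every key deterministically lands on exactly one node.
placement = {id: node_for('Event:%02d' % id) for id in range(1, 16)}
```

Reads and writes for a given key always go to the same node, so the fifteen event IDs spread across the three nodes without any shared lookup table.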
47. Querying the Object Cache
def make_key(model, id):
    return '{}:{}'.format(model.__name__, id)

def get_by_ids(model, id_list):
    keys = {make_key(model, id): id for id in id_list}
    cached = cache.get_multi(list(keys))
    # get_multi returns only the keys it found; everything else is a miss
    res = {keys[key]: value for key, value in cached.items()}
    pending = set(id_list) - set(res)
    if pending:
        mres = model.objects.in_bulk(pending)
        cache.set_multi({make_key(model, id): obj for id, obj in mres.items()})
        res.update(mres)
    return res
48. Pushing State
def save(self):
    cache.set(make_key(type(self), self.id), self)

def delete(self):
    cache.delete(make_key(type(self), self.id))
49. Redis for Persistence
[Diagram: the same partitioning, with Redis nodes in place of memcache: Redis 1 holds event IDs 01, 04, 07, 10, 13; Redis 2 holds 02, 05, 08, 11, 14; Redis 3 holds 03, 06, 09, 12, 15.]
50. Routing with Nydus
# create a cluster of Redis connections which
# partition reads/writes by (hash(key) % size)
from nydus.db import create_cluster

redis = create_cluster({
    'engine': 'nydus.db.backends.redis.Redis',
    'router': 'nydus.db...redis.PartitionRouter',
    'hosts': {n: {'db': n} for n in xrange(10)},
})
github.com/disqus/nydus
54. Sentry's Team Dashboard
‣ Data limited to a single team
‣ Simple views which could be materialized
‣ Only entry point for "data for team"
55. Sentry's Stream View
‣ Data limited to a single project
‣ Each project could map to a different DB
58. [Diagram: redis-1 hosts shards DB0-DB4; redis-2 hosts shards DB5-DB9.]
When a physical machine becomes overloaded, migrate a chunk of shards to another machine.
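A minimal sketch of this shard layout, assuming ten logical shards (Redis DBs) spread over two physical hosts; `SHARD_MAP` and `host_for` are illustrative names, not actual configuration.

```python
# Moving a shard to a relieve an overloaded host is a data copy plus one
# entry change in this map; the project -> shard assignment never changes.
SHARD_MAP = {
    0: 'redis-1', 1: 'redis-1', 2: 'redis-1', 3: 'redis-1', 4: 'redis-1',
    5: 'redis-2', 6: 'redis-2', 7: 'redis-2', 8: 'redis-2', 9: 'redis-2',
}
NUM_SHARDS = len(SHARD_MAP)


def host_for(project_id):
    return SHARD_MAP[project_id % NUM_SHARDS]
```

Because there are many more logical shards than hosts, rebalancing can move load in small increments instead of rehashing every key.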