This is a talk that I gave on July 20, 2012 at the Southern California Python Interest Group meetup at Cross Campus, with food and drinks provided by Graph Effect.
Python redis talk
1. Redis and Python
by Josiah Carlson
@dr_josiah
dr-josiah.blogspot.com
bit.ly/redis-in-action
2. Redis and Python:
It's PB & J time
by Josiah Carlson
@dr_josiah
dr-josiah.blogspot.com
bit.ly/redis-in-action
3. What will be covered
• Who am I?
• What is Redis?
• Why Redis with Python?
• Cool stuff you can do by combining them
4. Who am I?
• A Python user for 12+ years
• Former python-dev bike-shedder
• Former maintainer of Python async sockets libraries
• Author of a few small OS projects
o rpqueue, parse-crontab, async_http, timezone-utils, PyPE
• Worked at some cool places you've never heard of
(Networks In Motion, Ad.ly)
• Cool places you have (Google)
• And cool places you will (ChowNow)
• Heavy user of Redis
• Author of upcoming Redis in Action
5. What is Redis?
• In-memory database/data structure server
o Limited to main memory; vm and diskstore defunct
• Persistence via snapshot or append-only file
• Support for master/slave replication (multiple slaves
and slave chaining supported)
o No master-master, don't even try
o Client-side sharding
o Cluster is in-progress
• Five data structures + publish/subscribe
o Strings, Lists, Sets, Hashes, Sorted Sets (ZSETs)
• Server-side scripting with Lua in Redis 2.6
6. What is Redis? (compared to other databases/caches)
• Memcached
o in-memory, no-persistence, counters, strings, very fast, multi-threaded
• Redis
o in-memory, optionally persisted, data structures, very fast, server-side
scripting, single-threaded
• MongoDB
o on-disk, speed inversely related to data integrity, bson, master/slave,
sharding, multi-master, server-side mapreduce, database-level locking
• Riak
o on-disk, pluggable data stores, multi-master sharding, RESTful API,
server-side map-reduce, (Erlang + C)
• MySQL/PostgreSQL
o on-disk/in-memory, pluggable data stores, master/slave, sharding,
stored procedures, ...
7. What is Redis? (Strings)
• Really scalars of a few different types
o Character strings
concatenate values to the end
get/set individual bits
get/set byte ranges
o Integers (platform long int)
increment/decrement
auto "casting"
o Floats (IEEE 754 double-precision)
increment/decrement
auto "casting"
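The string operations map closely onto things Python can already do to a mutable byte buffer. A rough server-free analogy (this is plain Python, not the redis-py API):

```python
# Python analogy for Redis string operations, no server involved.
buf = bytearray(b"hello")

# APPEND: concatenate values to the end
buf += b" world"

# GETRANGE / SETRANGE: get and set byte ranges
assert bytes(buf[0:5]) == b"hello"
buf[0:5] = b"HELLO"

# INCR-style auto "casting": Redis parses the stored bytes as an
# integer, increments, and stores the result back as bytes.
counter = b"41"
counter = str(int(counter) + 1).encode()
```

The difference, of course, is that Redis does each of these atomically on the server, so many clients can share one value safely.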
8. What is Redis? (Lists)
• Doubly-linked list of character strings
o Push/pop from both ends
o [Blocking] pop from multiple lists
o [Blocking] pop from one list, push on another
o Get/set/search for item in a list
o Sortable
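The non-blocking operations behave much like Python's own double-ended queue; a sketch of the correspondence (an analogy, not redis-py calls):

```python
from collections import deque

# Python analogy: a deque also pushes/pops at both ends in O(1).
tasks = deque()
tasks.append(b"job1")      # RPUSH tasks job1
tasks.appendleft(b"job0")  # LPUSH tasks job0
first = tasks.popleft()    # LPOP tasks
last = tasks.pop()         # RPOP tasks
```

What a deque cannot give you is what makes Redis lists interesting: blocking pops and the atomic pop-from-one, push-to-another, shared across many client processes.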
9. What is Redis? (Sets)
• Unordered collection of unique character
strings
o Backed by a hash table
o Add, remove, check membership, pop, random pop
o Set intersection, union, difference
o Sortable
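The set commands mirror Python's built-in set operations almost one-to-one; a server-free sketch (key names are invented for illustration):

```python
# Python analogy for Redis set operations.
monday = {"alice", "bob", "carol"}  # SADD visitors:mon alice bob carol
tuesday = {"bob", "dave"}           # SADD visitors:tue bob dave

both = monday & tuesday             # SINTER visitors:mon visitors:tue
either = monday | tuesday           # SUNION visitors:mon visitors:tue
only_monday = monday - tuesday      # SDIFF visitors:mon visitors:tue
```

Redis can also store the result server-side under a new key (SINTERSTORE and friends), which keeps large intermediate results off the wire.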
10. What is Redis? (Hashes)
• Key-value mapping inside a key
o Get/Set/Delete single/multiple
o Increment values by ints/floats
o Bulk fetch of Keys/Values/Both
o Sort-of like a small version of Redis that only
supports strings/ints/floats
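That "small Redis inside a key" intuition is really just a dict; a rough analogy (field names are invented, and this is plain Python, not redis-py):

```python
# Python analogy: a Redis hash behaves much like a dict stored at one key.
profile = {}
profile["name"] = "josiah"                       # HSET user:1 name josiah
profile["views"] = profile.get("views", 0) + 5   # HINCRBY user:1 views 5
fields = list(profile.keys())                    # HKEYS user:1
```

As with lists and sets, the win over a plain dict is that the increment is atomic on the server, so concurrent clients never lose updates.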
11. What is Redis? (Sorted Sets - ZSETs)
• Like a Hash, with 'members' and 'scores',
scores limited to float values
o Get, set, delete, increment
o Can be accessed by the sorted order of the
(score,member) pair
By score
By index
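The ordering rule is worth spelling out: members are sorted by score first, with ties broken by the member string. A pure-Python sketch of that rule (the data is invented):

```python
scores = {"alice": 95.0, "bob": 87.5, "carol": 95.0}

# Redis orders a sorted set by the (score, member) pair: score first,
# then the member bytes break ties lexicographically.
ranked = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))
members = [m for m, s in ranked]  # lowest score first, as with ZRANGE
```

Because ties are resolved deterministically, index-based commands like ZRANGE and ZRANK always agree on a single stable ordering.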
12. What is Redis? (Publish/Subscribe)
• Readers subscribe to "channels" (exact
strings or patterns)
• Writers publish to channels, broadcasting to
all subscribers
• Messages are transient
13. Why Redis with Python?
• The power of Python lies in:
o Reasonably sane syntax/semantics
o Easy manipulation of data and data structures
o Large and growing community
• Redis also has:
o Reasonably sane syntax/semantics
o Easy manipulation of data and data structures
o Medium-sized and growing community
o Available as remote server
Like a remote IPython, only for data
So useful, people have asked for a library version
14. Per-hour and Per-day hit counters
from itertools import imap
import redis

def process_lines(prefix, logfile):
    conn = redis.Redis()
    # parse_line (defined elsewhere) yields entries with
    # .timestamp and .path attributes
    for log in imap(parse_line, open(logfile, 'rb')):
        time = log.timestamp.isoformat()
        hour = time.partition(':')[0]        # e.g. '2012-07-20T19'
        day = time.partition('T')[0]         # e.g. '2012-07-20'
        # note: redis-py 3.x reversed this to zincrby(name, amount, member)
        conn.zincrby(prefix + hour, log.path)
        conn.zincrby(prefix + day, log.path)
        conn.expire(prefix + hour, 7*86400)  # keep hourly data for a week
        conn.expire(prefix + day, 30*86400)  # keep daily data for a month
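The hour/day key derivation in the slide is plain string slicing on an ISO-8601 timestamp; isolated as a helper it is easy to test (the function name and prefix are my own, not from the slides):

```python
def counter_keys(prefix, timestamp_iso):
    """Derive the per-hour and per-day ZSET key names from an
    ISO-8601 timestamp such as '2012-07-20T19:30:00'."""
    hour = timestamp_iso.partition(':')[0]  # '2012-07-20T19'
    day = timestamp_iso.partition('T')[0]   # '2012-07-20'
    return prefix + hour, prefix + day
```

For example, `counter_keys('hits:', '2012-07-20T19:30:00')` yields `('hits:2012-07-20T19', 'hits:2012-07-20')`, the two ZSET keys that each hit's path is counted under.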
15. Per-hour and Per-day hit counters (with pipelines for speed)
from itertools import imap
import redis

def process_lines(prefix, logfile):
    # a non-transactional pipeline batches commands to cut round trips
    pipe = redis.Redis().pipeline(False)
    for i, log in enumerate(imap(parse_line, open(logfile, 'rb'))):
        time = log.timestamp.isoformat()
        hour = time.partition(':')[0]
        day = time.partition('T')[0]
        pipe.zincrby(prefix + hour, log.path)
        pipe.zincrby(prefix + day, log.path)
        pipe.expire(prefix + hour, 7*86400)
        pipe.expire(prefix + day, 30*86400)
        if not i % 1000:
            pipe.execute()  # flush the batch every 1000 lines
    pipe.execute()          # flush whatever remains