The document discusses caching techniques in Python. It begins with an introduction to caching and how it is similar to manual memory management. It then covers common caching patterns like memoization and cache invalidation. Some common problems with caching are discussed such as invalidating too much/little data and dependencies between cached values. Finally, it presents solutions like using process-level caching with dicts, application-level caching with Memcache, and batch invalidation of keys.
Caching Techniques in Python: Memoization, Invalidation and Process Level Cache
1. Caching techniques in Python
Michael Domanski
EuroPython 2010
Thursday, 22 July 2010
2. who I am
• Python developer, professionally for a few years now
• also experienced in C and Objective-C
• currently working for 10clouds.com
3. Interesting intro
• a bit of theory
• common patterns
• common problems
• common solutions
4. How I think about cache
• imagine a giant dict storing all your data
• you have to manage all data manually
• or provide some automated behaviour
5. similar to...
• manual memory management in C
• cache is memory
• and you have to control it manually
10. • very old pattern (circa 1968)
• we owe the name to Donald Michie
11. how it works
• we associate input with output, and store it somewhere
• based on the assumption that for a given
input, output is always the same
12. code example
CACHE_DICT = {}

def cached(key):
    # note: the key is fixed per decorated function, so call arguments
    # do not take part in the lookup
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
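The decorator on this slide stores one value per fixed key. When the output depends on the arguments, the key has to be built from them. A minimal sketch of that variant, assuming all arguments are hashable; `memoize`, `ARG_CACHE`, `CALLS` and `square` are illustrative names, not from the talk. (The standard library offers `functools.lru_cache` for the same purpose.)

```python
import functools

ARG_CACHE = {}  # hypothetical name: process-level store keyed by call signature


def memoize(func):
    """Cache results per (function name, args, kwargs); assumes hashable arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # sort kwargs so that f(a=1, b=2) and f(b=2, a=1) share one entry
        key = (func.__qualname__, args, tuple(sorted(kwargs.items())))
        if key not in ARG_CACHE:
            ARG_CACHE[key] = func(*args, **kwargs)
        return ARG_CACHE[key]
    return wrapper


CALLS = []  # records real executions so cache hits are visible


@memoize
def square(x):
    CALLS.append(x)
    return x * x


square(4)
square(4)   # served from ARG_CACHE, the body of square() is not run again
square(5)
```

After these three calls `CALLS` holds only the two distinct inputs, which is the "same input, same output" assumption from slide 11 put to work.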
13. what if output can change?
• our pattern is still useful
• we simply need to add something
15. There are only two hard problems in Computer
Science: cache invalidation and naming things
Phil Karlton
16. • basically, we update data in cache
• we need to know when and what to
change
• the more granular you want to be, the
harder it gets
17. code example
def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        # invalidating a missing key is harmless, just report it
        print("someone tried to invalidate a key that is not present: %s" % key)
19. invalidating too much/not enough
• flushing all data any time something changes
• not flushing cache at all
• tragic effects
20.
# db_get / db_set stand for database helpers that the slides do not show
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE
@cached('big_key1')
def some_bigger_function():
    """
    this function depends on big_key1, key1 and key2
    """
    def inner_workings():
        db_set(1, 'something totally new')
    #######
    ## imagine 100 lines of code here :)
    ######
    inner_workings()
    return [simple_function1(), simple_function2()]

if __name__ == '__main__':
    simple_function1()
    simple_function2()
    a, b = some_bigger_function()
    assert a == db_get(id=1), "this fails because we didn't invalidate the cache properly"
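One way to avoid the stale read shown on this slide is to invalidate in the same place that writes. The sketch below is self-contained and makes several assumptions: `DB` is a plain dict standing in for a real database, and `cached_get` and the `'row:<id>'` key scheme are illustrative names, not from the talk. It only covers the per-row keys; a cached aggregate such as `big_key1` would still need its own invalidation.

```python
DB = {1: 'old value', 2: 'other value'}   # stand-in for a real database
CACHE = {}


def db_get(id):
    return DB[id]


def cached_get(id):
    key = 'row:%s' % id
    if key not in CACHE:
        CACHE[key] = db_get(id)
    return CACHE[key]


def db_set(id, value):
    DB[id] = value
    # invalidate right after the write, in the same function,
    # so no caller has to remember to do it
    CACHE.pop('row:%s' % id, None)


cached_get(1)                        # warms the cache
db_set(1, 'something totally new')   # write + invalidation
fresh = cached_get(1)                # re-read from DB, not the stale entry
```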
21. invalidating too soon/too late
• your cache has to be synchronised with your db
• sometimes very hard to spot
• leads to tragic mistakes
22.
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE
def some_bigger_function():
    db_set(1, 'something')
    value = simple_function1()
    db_set(2, 'something else')
    #### now we know we used 2 cached functions so....
    invalidate('key1')
    invalidate('key2')
    #### now we know we are safe, but for a price
    return simple_function2()

if __name__ == '__main__':
    some_bigger_function()
23. superposition of dependency
• a somewhat less obvious problem
• eventually you will start caching the results of computation
• you have to know very precisely what your data depends on
24.
@cached('key1')
def simple_function1():
    return db_get(id=1)

@cached('key2')
def simple_function2():
    return db_get(id=2)

# SUPPOSE THIS IS IN ANOTHER MODULE
@cached('key')
def some_bigger_function():
    return {
        '1': simple_function1(),
        '2': simple_function2(),
        '3': db_get(id=3)
    }

if __name__ == '__main__':
    simple_function1()
    # somewhere else
    db_set(1, 'foobar')
    # and again
    db_set(3, 'bazbar')
    invalidate('key')
    # ooops, we forgot something
    data = some_bigger_function()
    assert data['1'] == db_get(id=1), "this fails because we didn't manage to invalidate all the keys"
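The failure on this slide comes from a cached value that is built from other cached values. One possible remedy is to record those dependencies explicitly so that invalidation cascades. A rough sketch, assuming the dependency graph has no cycles; `DEPENDENTS` and `depends_on` are hypothetical names, not from the talk.

```python
CACHE = {}
DEPENDENTS = {}   # key -> set of keys computed from it (hypothetical registry)


def depends_on(key, *parents):
    """Declare that `key` is computed from each of `parents`."""
    for parent in parents:
        DEPENDENTS.setdefault(parent, set()).add(key)


def invalidate(key):
    """Drop `key` and, recursively, everything derived from it."""
    CACHE.pop(key, None)
    for child in DEPENDENTS.get(key, ()):
        invalidate(child)


# 'key' (the big function) is built from 'key1' and 'key2'
depends_on('key', 'key1', 'key2')

CACHE.update({'key1': 'a', 'key2': 'b', 'key': {'1': 'a', '2': 'b'}})
invalidate('key1')   # also drops 'key', leaves 'key2' alone
```

With such a registry, whoever changes row 1 only needs to invalidate `'key1'`; the aggregate under `'key'` goes with it. Data read directly inside the aggregate (row 3 on the slide) still has to invalidate `'key'` itself.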
25. summing up
• know your data....
• be aware of what you cache and when
• take care when using cached data in computation
30. code example
CACHE_DICT = {}

def cached(key):
    # note: the key is fixed per decorated function, so call arguments
    # do not take part in the lookup
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            if key not in CACHE_DICT:
                value = func(*args, **kwargs)
                CACHE_DICT[key] = value
            return CACHE_DICT[key]
        return arg_wrapper
    return func_wrapper
32. code example
def invalidate(key):
    try:
        del CACHE_DICT[key]
    except KeyError:
        # invalidating a missing key is harmless, just report it
        print("someone tried to invalidate a key that is not present: %s" % key)
37. why no benchmarks
• not the point of this talk :)
• benchmarks are generic, caching is specific
• pick your flavour, think for yourself
38. code example
import memcache  # python-memcached client

cache = memcache.Client(['localhost:11211'])

def memcached(key):
    def func_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = cache.get(str(key))
            # compare with None: a cached 0, '' or [] is still a hit
            if value is None:
                value = func(*args, **kwargs)
                cache.set(str(key), value)
            return value
        return arg_wrapper
    return func_wrapper
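Memcached keys are limited to 250 bytes and may not contain whitespace or control characters, so keys derived from call arguments are usually hashed first. A small sketch of such a key builder; `make_cache_key` is a hypothetical helper, not part of the talk or of any memcached client, and it assumes that `repr()` of each argument is stable and identifies it well enough.

```python
import hashlib


def make_cache_key(prefix, *args, **kwargs):
    """Build a memcached-safe key from a prefix and the call arguments."""
    # sort kwargs so keyword order does not change the key
    raw = repr((args, sorted(kwargs.items())))
    digest = hashlib.sha1(raw.encode('utf-8')).hexdigest()
    # prefix stays readable, the digest is always 40 hex characters
    return '%s:%s' % (prefix, digest)


k1 = make_cache_key('user', 42, lang='en')
k2 = make_cache_key('user', 42, lang='en')
k3 = make_cache_key('user', 43, lang='en')
```

Here `k1` and `k2` are identical while `k3` differs, so the key can replace the fixed `str(key)` when results depend on arguments.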
42. • what if I don’t want to expire each key manually?
• that’s a lot to remember
• and we have to be careful :(
43. groups?
• group keys into sets
• which are tied to one key per set
• expire one key, instead of twenty
44. how to get there?
• store some extra data
• you can store dicts in cache
• and cache behaves like dict
• so it’s a case of comparing keys and values
45.
# get_grouped is a hypothetical wrapper name; the slide shows only its body
def get_grouped(memcached_client, key, group):
    # we start with specified key and group, e.g.
    # key = 'some_key', group = 'some_group'
    # now retrieve some data from memcached (get_multi takes a list of keys)
    data = memcached_client.get_multi([key, group])
    # now data is a dict that should look like
    # {'some_key': {'group_key': '1234',
    #               'value': 'some_value'},
    #  'some_group': '1234'}
    if data and (key in data) and (group in data):
        if data[key]['group_key'] == data[group]:
            return data[key]['value']
46.
# tools.make_key and make_group_value are helpers that the slides do not show
def cached(key, group_key='', exp_time=0):
    # we don't want to mix time based and event based expiration models
    if group_key:
        assert exp_time == 0, "can't set expiration time for grouped keys"

    def f_wrapper(func):
        def arg_wrapper(*args, **kwargs):
            value = None
            if group_key:
                data = cache.get_multi([tools.make_key(group_key), tools.make_key(key)])
                data_dict = data.get(tools.make_key(key))
                if data_dict:
                    value = data_dict['value']
                    group_value = data_dict['group_value']
                    # .get() so a missing group key counts as a mismatch
                    if group_value != data.get(tools.make_key(group_key)):
                        value = None
            else:
                # read with the same key transformation that is used for writing
                value = cache.get(tools.make_key(key))
            if value is None:
                value = func(*args, **kwargs)
                if exp_time:
                    cache.set(tools.make_key(key), value, exp_time)
                elif not group_key:
                    cache.set(tools.make_key(key), value)
                else:  # exp_time not set and we have group_keys
                    group_value = make_group_value(group_key)
                    data_dict = {'value': value, 'group_value': group_value}
                    cache.set_multi({tools.make_key(key): data_dict,
                                     tools.make_key(group_key): group_value})
            return value
        arg_wrapper.__name__ = func.__name__
        return arg_wrapper
    return f_wrapper
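The group idea can be tried out without a memcached server. The sketch below uses a plain dict in place of the client and stamps each entry with the group's current value; replacing that value makes every older entry stale in one write. All names here (`STORE`, `new_group_value`, `set_grouped`, `get_grouped`, `invalidate_group`) are illustrative assumptions, not the helpers used in the talk.

```python
import uuid

STORE = {}   # plain dict standing in for the memcached client


def new_group_value():
    """Hypothetical helper: any value that differs on every call works."""
    return uuid.uuid4().hex


def set_grouped(key, group, value):
    # reuse the group's current stamp, creating it on first use
    stamp = STORE.setdefault(group, new_group_value())
    STORE[key] = {'value': value, 'group_value': stamp}


def get_grouped(key, group):
    entry = STORE.get(key)
    if entry is not None and entry['group_value'] == STORE.get(group):
        return entry['value']
    return None   # missing, or stamped with an outdated group value


def invalidate_group(group):
    # one write makes every entry stamped with the old value stale
    STORE[group] = new_group_value()


set_grouped('user:1', 'users', 'alice')
set_grouped('user:2', 'users', 'bob')
before = get_grouped('user:1', 'users')
invalidate_group('users')
after = (get_grouped('user:1', 'users'), get_grouped('user:2', 'users'))
```

The stale entries are never deleted explicitly; they simply stop matching, which is the "expire one key, instead of twenty" point from slide 43.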