Advanced Python, Part 2
1. More topics in Advanced Python
© 2014 Zaar Hai tech.zarmory.com
● Generators
● Async programming
2. Appetizer – Slots vs Dictionaries
(Almost) every Python object has a built-in __dict__ dictionary
It can waste memory when there are many objects that have only
a small number of attributes
class A(object):
    pass
class B(object):
    __slots__ = ["a", "b"]
>>> A().c = 1
>>> B().c = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'B' object has no attribute 'c'
Slots come to save memory (and CPU)
But do they really?
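A minimal sketch of the difference, written for Python 3 rather than the Python 2 used on the slides. The class names WithDict and WithSlots are illustrative only.

```python
class WithDict:
    """Ordinary class: every instance carries its own __dict__."""
    def __init__(self):
        self.a = 1


class WithSlots:
    """Only the names listed in __slots__ may be set on an instance."""
    __slots__ = ["a", "b"]

    def __init__(self):
        self.a = 1


d, s = WithDict(), WithSlots()
d.c = 3                              # fine: stored in d.__dict__
has_dict = hasattr(s, "__dict__")    # slotted instances have no __dict__

try:
    s.c = 3                          # "c" is not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True
```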
3. Slots vs Dictionaries - competitors
class A(object):
    # __slots__ = ["a", "b", "c"]
    def __init__(self):
        self.a = "foot"
        self.b = 2
        self.c = True
l = []
for i in xrange(50000000):
    l.append(A())
import resource
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
4. Slots vs Dictionaries – memory
[Chart: memory in megabytes (0 to 400) against number of objects (1000 to 1000000) for Py 2.7, Py 3.4 and PyPy, each with slots and with dicts]
5. Slots vs Dictionaries – MEMORY
[Chart: memory in megabytes (0 to 20000) against number of objects (1000 to 50000000) for Py 2.7, Py 3.4 and PyPy, each with slots and with dicts]
6. Slots vs Dictionaries – cpu
[Chart: time in seconds (0 to 1.6) against number of objects (1000 to 1000000) for Py 2.7, Py 3.4 and PyPy, each with slots and with dicts]
7. Slots vs Dictionaries – CPU
[Chart: time in seconds (0 to 70) against number of objects (1000 to 50000000) for Py 2.7, Py 3.4 and PyPy, each with slots and with dicts]
8. Slots vs Dictionaries - conclusions
Slots vs dicts – and the winner is... PyPy
Seriously – forget the slots, and just move to PyPy if
performance becomes an issue. As a bonus you get
performance improvements in other areas
Most important – run your micro benchmarks before jumping
into new stuff
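One possible micro benchmark, as a sketch for Python 3.4+. It swaps the ru_maxrss measurement from the earlier slide for the standard tracemalloc module. The class names, the helper allocated_bytes and the object count are assumptions made for illustration. Results vary between interpreters and versions, so no numbers are claimed here.

```python
import tracemalloc


class DictPoint:
    def __init__(self):
        self.a, self.b, self.c = "foot", 2, True


class SlotPoint:
    __slots__ = ("a", "b", "c")

    def __init__(self):
        self.a, self.b, self.c = "foot", 2, True


def allocated_bytes(cls, count):
    """Return the bytes still allocated after building `count` instances."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    objects = [cls() for _ in range(count)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del objects
    return after - before


dict_bytes = allocated_bytes(DictPoint, 10000)
slot_bytes = allocated_bytes(SlotPoint, 10000)
print("dict: %d bytes, slots: %d bytes" % (dict_bytes, slot_bytes))
```

Run it on every interpreter you care about before drawing conclusions.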
10. The magic yield statement
A function becomes a generator if it contains a yield statement
def gen():
    yield 1
    yield 2
When invoked, “nothing” happens, i.e. the function code does
not run yet
>>> g = gen()
>>> g
<generator object gen at 0x7f423b1b3f00>
The next() method runs the function until the next yield statement and
returns the yielded value
>>> g.next()
1
>>> g.next()
2
>>> g.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
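A small sketch that makes the lazy start visible, in Python 3 syntax (next(g) instead of g.next()). The log list is an illustrative addition that records which parts of the function body have run.

```python
log = []


def gen():
    log.append("started")
    yield 1
    log.append("resumed")
    yield 2


g = gen()
log_after_call = list(log)     # calling gen() has not run any of its code
first = next(g)                # runs the body up to the first yield
log_after_first = list(log)
second = next(g)               # resumes after the first yield
```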
11. Generator exceptions
StopIteration is raised when the generator is exhausted
The for statement catches StopIteration automagically
>>> for i in gen():
...     print i
...
1
2
If the generator function raises an exception, the generator stops
def gen2():
    yield 1
    raise ValueError
    yield 2
>>> g = gen2()
>>> g.next()
1
>>> g.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in gen2
ValueError
>>> g.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
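The same behaviour as a Python 3 script. The results list is an illustrative addition that records what each next() call produced.

```python
def gen2():
    yield 1
    raise ValueError("boom")
    yield 2   # never reached


g = gen2()
results = [next(g)]

try:
    next(g)
except ValueError:
    results.append("ValueError")

# The exception finished the generator, so it is now exhausted.
try:
    next(g)
except StopIteration:
    results.append("StopIteration")
```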
12. Stopping generator prematurely
def producer():
    conn = db.connection()
    for row in conn.execute("SELECT * FROM t LIMIT 1000"):
        yield row
    conn.close()
def consumer():
    rows = producer()
    print "First row %s" % rows.next()
In the above example the connection is never closed: the consumer stops
after the first row, so the code after the loop never runs. Fix:
def producer():
    conn = db.connection()
    try:
        for row in conn.execute("SELECT * FROM t LIMIT 1000"):
            yield row
    finally:
        conn.close()
def consumer():
    rows = producer()
    print "First row %s" % rows.next()
    rows.close() # Raises GeneratorExit in producer code, so finally runs
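A runnable sketch of the fix in Python 3. FakeConnection is a hypothetical stand-in for the db connection on the slide. It only records whether close() was called.

```python
class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self):
        self.closed = False

    def execute(self, query):
        return iter(range(1000))   # pretend these are rows

    def close(self):
        self.closed = True


conn = FakeConnection()


def producer():
    try:
        for row in conn.execute("SELECT * FROM t LIMIT 1000"):
            yield row
    finally:
        conn.close()


rows = producer()
first_row = next(rows)
closed_before = conn.closed   # generator is suspended inside the try block
rows.close()                  # GeneratorExit is raised at the yield
closed_after = conn.closed
```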
13. Syntactic sugar
Most of us use generators without even knowing about them
>>> [i for i in [1,2,3]]
[1, 2, 3]
The […] above behaves like a list built from a generator expression
>>> ( i for i in [1,2,3] )
<generator object <genexpr> at 0x7f423b1b3f00>
list's constructor accepts any iterable and iterates through it
to create itself
More goodies:
>>> [i for i in range(6, 100) if i % 6 == i % 7 ]
[42, 43, 44, 45, 46, 47, 84, 85, 86, 87, 88, 89]
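A short Python 3 sketch comparing the two forms. The variable names are illustrative. It also shows that a generator expression can be fed straight into a consuming function such as sum() without building a list first.

```python
squares_list = [i * i for i in [1, 2, 3]]     # list comprehension
squares_gen = (i * i for i in [1, 2, 3])      # generator expression
from_gen = list(squares_gen)                  # list() iterates the generator
leftover = list(squares_gen)                  # a generator is consumed only once

# No intermediate list is built here: sum() pulls values on demand.
total = sum(i for i in range(6, 100) if i % 6 == i % 7)
```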
14. Generators produce stuff on demand
Writing a Fibonacci series generator is a piece of cake:
def fibogen():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b
No recursion
O(1) memory
Generates as much as you want to consume
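Because the generator is infinite, the consumer decides how much to take. One way to do that, sketched here with the standard itertools.islice (the name first_ten is illustrative):

```python
from itertools import islice


def fibogen():
    a, b = 0, 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b


# Take only the first ten values; the rest are never computed.
first_ten = list(islice(fibogen(), 10))
```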
15. Returning value from a generator
Before Python 3.3 a generator could not return a value (only a bare return was allowed)
Since 3.3 you can:
def gen():
    yield 1
    yield 2
    return 3
>>> g=gen()
>>> next(g)
1
>>> next(g)
2
>>> try:
...     next(g)
... except StopIteration as e:
...     print(e.value)
...
3
In earlier versions:
class Return(Exception):
    def __init__(self, value):
        self.value = value
Then raise it from the generator and catch it outside
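A sketch of that workaround, runnable on Python 3 as well. The helper drain is an illustrative addition: it collects the yielded items and picks up the value carried by Return.

```python
class Return(Exception):
    def __init__(self, value):
        super().__init__(value)
        self.value = value


def gen():
    yield 1
    yield 2
    raise Return(3)   # stands in for "return 3"


def drain(generator):
    """Collect yielded items and the value carried by Return, if any."""
    items, result = [], None
    try:
        for item in generator:
            items.append(item)
    except Return as ret:
        result = ret.value
    return items, result


items, result = drain(gen())
```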
16. Consumer generator
You can send stuff back to a generator
def db_stream():
    conn = db.connection()
    try:
        while True:
            try:
                row = yield
                conn.execute("INSERT INTO t VALUES(%s)", row)
            except ConnCommit:
                conn.commit()
            except ConnRollBack:
                conn.rollback()
    except GeneratorExit:
        conn.commit()
    finally:
        conn.close()
>>> g = db_stream()
>>> g.next()  # prime the generator: run to the first yield
>>> g.send([1])
>>> g.throw(ConnCommit)
>>> g.close()
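The same pattern as a self-contained Python 3 sketch, with a plain list in place of the database. The names Commit, collector and committed are hypothetical. Note the next(g) call: a generator must be advanced to its first yield before send() can deliver a value.

```python
class Commit(Exception):
    """Hypothetical signal: flush the rows received so far."""


def collector(committed):
    pending = []
    try:
        while True:
            try:
                row = yield              # receives the value passed to send()
                pending.append(row)
            except Commit:
                committed.extend(pending)
                del pending[:]
    except GeneratorExit:
        committed.extend(pending)        # flush what is left on close()


committed = []
g = collector(committed)
next(g)                  # prime: run to the first yield
g.send("r1")
g.send("r2")
g.throw(Commit)          # handled inside; the generator keeps running
after_commit = list(committed)
g.send("r3")
g.close()                # raises GeneratorExit at the yield
```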
18. Async in a nutshell
Technion CS “Introduction to Operating Systems”, HW 2
Setup:
import socket, select, time
from collections import defaultdict, deque
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("", 1234))
sock.listen(20000)
sock.setblocking(0)
rqueue = set([sock])
wqueue = set()
pending = defaultdict(deque)
19. Async in a nutshell – event loop
Technion CS “Introduction to Operating Systems”, HW 2
while True:
    rq, wq, _ = select.select(rqueue, wqueue, [])
    for s in rq:
        if s == sock:
            new_sock, _ = sock.accept()
            new_sock.setblocking(0)
            rqueue.add(new_sock)
            continue
        data = s.recv(1024)
        if not data:
            s.close()
            rqueue.remove(s)
            wqueue.discard(s)  # stop watching the closed socket
            pending.pop(s, None)
        else:
            pending[s].append(data)
            wqueue.add(s)
    for s in wq:
        if s not in wqueue:  # closed during the read phase above
            continue
        if not pending[s]:
            wqueue.remove(s)
            continue
        data = pending[s].popleft()
        sent = s.send(data)
        if sent != len(data):
            data = data[sent:]
            pending[s].appendleft(data)
20. Why bother with async?
Less memory resources
Stack memory is allocated for each spawned thread: 2 MB on
x86 Linux
For a server to handle 10k connections, 20 GB of memory is
required just for starters!
Less CPU resources
Context switching between 10k threads is expensive
Async moves the switching logic from the OS / interpreter level to
the application level, which is usually more efficient
21. C10k problem
The art of managing a large number of connections
Why is that a problem? - long polling / websockets
With modern live web applications, each client / browser
holds an open connection to the server
Gmail has 425 million active users
I.e. Gmail servers have to handle ~400 million active
connections at any given time
22. Concurrency vs Parallelism
Concurrency
Dealing with several tasks simultaneously
But with one task at a time
All Intel processors up to Pentium were concurrent
Parallelism
Dealing with several tasks simultaneously
But with several tasks at any given time
All Intel processors since Pentium can execute more than
one instruction per clock cycle
(C)Python is always concurrent
Either with threads or with async approach
23. Thread abuse
Naive approach – spawn a thread for every tiny task:
Resource waste
Burden on OS / Interpreter
Good single-thread code can saturate a single core
Usually you don't need more than 1 thread / process per CPU
In the web world
Your application needs to scale beyond a single machine
I.e. you'll have to run in multiple isolated processes anyway
24. Explicit vs Implicit context switching
Implicit context switching
OS / Interpreter decides when to switch
The coder must assume that control can be lost at any time
Synchronization required – mutexes, etc
Explicit context switching
The coder decides when to give up execution control
No synchronization primitives required!
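A toy illustration of explicit switching, sketched in Python 3 with plain generators. The scheduler run_all, the balance dictionary and the amounts are assumptions made for the example. They do not come from any framework. Tasks give up control only at a yield, so the check and the update below cannot be interleaved with another task, and no lock is needed.

```python
from collections import deque

balance = {"from": 100, "to": 0}


def transfer(amount):
    # No lock: nothing else runs between the check and the update,
    # because control is given up only at the yield below.
    if balance["from"] >= amount:
        balance["from"] -= amount
        balance["to"] += amount
    yield   # explicit switch point


def run_all(tasks):
    """Round-robin scheduler: resume each task in turn until it finishes."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
            queue.append(task)     # not finished yet, schedule it again
        except StopIteration:
            pass                   # task is done


run_all([transfer(30) for _ in range(5)])
```

Three of the five transfers fit into the starting balance; the other two are refused by the check.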
25. Explicit vs Implicit context switching
Threads
def transfer(acc_f, acc_t, sum):
    acc_f.lock()
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
        acc_f.commit_balance()
        acc_t.commit_balance()
    acc_f.release()  # release even when the balance is too low
Explicit Async
def transfer(acc_f, acc_t, sum):
    if acc_f.balance > sum:
        acc_f.balance -= sum
        acc_t.balance += sum
        yield acc_f.commit_balance()
        yield acc_t.deposit(sum)
26. Practical approach
Traditionally, the async approach was implemented through
callbacks
In JavaScript it can get as nasty as this:
button.on("click", function() {
    jQuery.ajax("http://...", {
        success: function(data) {
            // do something
        }
    });
});
Thankfully, Python's support for anonymous functions is not
that good
27. Back to fun – Async frameworks in python
Explicit
Tornado
Twisted
Tulip – part of the Python standard library since 3.4 (as asyncio)
Implicit
Gevent (for python < 3)
28. Tornado Hello World
import tornado.ioloop
import tornado.web
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world")
application = tornado.web.Application([
    (r"/", MainHandler),
])
if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
So far everything is synchronous
29. Tornado + database = async magic
import tornado.web
from tornado.gen import coroutine
from momoko.connections import Pool
db = Pool(host=...)
class MainHandler(tornado.web.RequestHandler):
    @coroutine
    def get(self):
        cursor = yield db.execute("SELECT * FROM greetings")
        for row in cursor.fetchall():
            self.write(str(row))
        self.finish()
30. Demystifying the magic
Future – proxy to an object that will be available later
AKA “promise” in JavaScript, “deferred” in Twisted
Traditional thread-related usage:
# Caller side
future = r.invoke("model_get")
res = future.get_result()
# Inside invoke() (pseudocode)
future = Future()
new_thread({
    r = _invoke(...)
    future.set_result(r)
})
return future
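A runnable counterpart to the pseudocode above, sketched in Python 3 with the standard concurrent.futures.Future. The helper invoke and the use of pow as the "remote call" are illustrative assumptions. The standard Future exposes result() rather than the get_result() shown on the slide.

```python
import threading
from concurrent.futures import Future


def invoke(func, *args):
    """Run func(*args) in a new thread and return a Future for its result."""
    future = Future()

    def worker():
        try:
            future.set_result(func(*args))
        except Exception as exc:
            future.set_exception(exc)

    threading.Thread(target=worker).start()
    return future


future = invoke(pow, 2, 10)
res = future.result(timeout=5)   # blocks until the worker thread has finished
```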
31. Futures in async
@coroutine
def get(self):
    rows = yield db.execute(...)
# Simplified sketch of what the decorator does
def coroutine(func):
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        future = gen.next()
        Runner(gen, future)
    return wrapper
from tornado.ioloop import IOLoop
class Runner(object):
    def __init__(self, gen, future):
        self.ioloop = IOLoop.instance()
        self.gen = gen
        self.future = future
        self.handle_yield()
    def run(self):
        value = self.future.result()
        next_future = self.gen.send(value)
        # check StopIteration
        self.future = next_future
        self.handle_yield()
    def handle_yield(self):
        if self.future.done():
            self.run()
        else:
            # the callback receives the finished future as its argument
            self.ioloop.add_future(
                self.future, lambda f: self.run())
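The mechanism can be tried without Tornado. Below is a minimal Python 3 sketch that uses the standard concurrent.futures.Future and its add_done_callback in place of an ioloop. The names fake_query, pending and handler are hypothetical, and error propagation into the generator is left out. It is not how Tornado itself is implemented, only the same idea in miniature.

```python
from concurrent.futures import Future


def coroutine(func):
    """Drive a generator that yields Futures; return a Future for its result."""
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        outcome = Future()

        def step(value=None):
            try:
                future = gen.send(value)       # run until the next yield
            except StopIteration as stop:
                outcome.set_result(stop.value)
                return
            # Resume the generator once the yielded future has a result.
            future.add_done_callback(lambda f: step(f.result()))

        step()
        return outcome
    return wrapper


pending = []   # futures that a hypothetical I/O layer will resolve later


def fake_query(sql):
    future = Future()
    pending.append((future, "rows for " + sql))
    return future


@coroutine
def handler():
    first = yield fake_query("q1")
    second = yield fake_query("q2")
    return [first, second]


outcome = handler()
done_before = outcome.done()       # handler is suspended at its first yield

# Stand-in for the event loop: resolve futures as "I/O" completes.
while pending:
    future, rows = pending.pop(0)
    future.set_result(rows)
```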
32. Now the magical db.execute(...)
class Connection(object):
    def __init__(self, host=...):
        self.sock = …
        self.ioloop = IOLoop.instance()
    def execute(self, query):
        self.future = Future()
        self.query = query
        self.ioloop.add_handler(self.sock, self.handle_write, IOLoop.WRITE)
        return self.future
    def handle_write(self, fd, events):
        self.sock.write(self.query)
        self.ioloop.add_handler(self.sock, self.handle_read, IOLoop.READ)
    def handle_read(self, fd, events):
        rows = self.sock.read()
        self.future.set_result(rows)
33. Writing async-ready libraries
You have a library that uses, let's say, sockets
You want to make it async compatible
Two options:
Either choose which ioloop implementation you use
(Tornado IOLoop, Python 3.4 Tulip, etc). But it's a hard
choice that limits your users
Or implement the library in a poll-able way. This way it can be
plugged into any ioloop.
34. (dumb) Pollable example: psycopg2 async mode
The following example is dumb, because it uses async in a
sync way. But it demonstrates the principle
import select
import psycopg2
from psycopg2.extensions import POLL_OK, POLL_WRITE, POLL_READ
def wait(conn):
    while 1:
        state = conn.poll()
        if state == POLL_OK:
            break
        elif state == POLL_WRITE:
            select.select([], [conn.fileno()], [])
        elif state == POLL_READ:
            select.select([conn.fileno()], [], [])
        else:
            raise psycopg2.OperationalError("...")
>>> aconn = psycopg2.connect(database='test', async=1)  # async_=1 on Python 3.7+, where async is a keyword
>>> wait(aconn)
>>> acurs = aconn.cursor()
>>> acurs.execute("SELECT pg_sleep(5); SELECT 42;")
>>> wait(acurs.connection)
>>> acurs.fetchone()[0]
42
35. Pollable example – the goal
class POLL_BASE(object): pass
class POLL_OK(POLL_BASE): pass
class POLL_READ(POLL_BASE): pass
class POLL_WRITE(POLL_BASE): pass
class Connection(object):
    …
conn = Connection(host, port, …)
conn.read(10)
wait(conn) # poll, poll, poll
print "Received: %s" % conn.buff
36. Pollable example - implementation
class POLL_BASE(object): pass
class POLL_OK(POLL_BASE): pass
class POLL_READ(POLL_BASE): pass
class POLL_WRITE(POLL_BASE): pass
from collections import deque
class Connection(object):
    def __init__(self, …):
        self.async_queue = deque()
    def _read(self, total):
        buff = []
        left = total
        while left:
            yield POLL_READ
            data = self.sock.recv(left)
            left -= len(data)
            buff.append(data)
        # Return is the exception class from the "Returning value" slide
        raise Return("".join(buff))
    def _read_to_buff(self, total):
        self.buff = yield self._read(total)
    def read(self, total):
        self.async_queue.append(self._read_to_buff(total))
37. Pollable example – implementation cont
    # Method of Connection; needs "import types" at module level
    def poll(self, value=None):
        try:
            if value:
                value = self.async_queue[0].send(value)
            else:
                # Because we can't send non-None values to not started gens
                value = next(self.async_queue[0])
        except (Return, StopIteration) as err:
            value = getattr(err, "value", None)
            self.async_queue.popleft()
        if not len(self.async_queue):
            return POLL_OK  # All generators are done - operation finished
        if value in (POLL_READ, POLL_WRITE):
            return value  # Need to wait for socket
        if isinstance(value, types.GeneratorType):
            self.async_queue.appendleft(value)
            return self.poll()  # Continue "pulling" next generator
        # Pass return value to previous (caller) generator
        return self.poll(value)