Python Performance 101
1. About me :)
● Computer programmer,
● Coding in Python for the last 3 years,
● Part of the team at HP that developed an early-warning
application (written in Python) that parses over 40 TB of
data annually to find problems before they happen,
● Skilled in Django and PyQt,
● http://uptosomething.in [ Homepage ]
3. Performance : Measurement
Reading cProfile output
ncalls : the number of calls,
tottime : the total time spent in the given function (excluding time spent
in calls to sub-functions),
percall : the quotient of tottime divided by ncalls,
cumtime : the total time spent in this function and all sub-functions (from
invocation till exit). This figure is accurate even for recursive functions.
percall : the quotient of cumtime divided by primitive calls,
filename:lineno(function) : the file, line, and name of each function
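To see these columns in practice, you can profile a small function programmatically; a minimal Python 3 sketch (the function name `busy` is made up for illustration):

```python
import cProfile
import io
import pstats

def busy():
    # Something worth profiling: a sum of squares.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Render the standard cProfile table: ncalls, tottime, percall,
# cumtime, percall, filename:lineno(function).
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

The same table can be produced for a whole script with `python -m cProfile myscript.py`.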
6. Python Performance : Low Hanging Fruits
● String concatenation benchmark ( http://sprocket.io/blog/2007/10/string-concatenation-performance-in-
python/ )
add: a + b + c + d
add equals: a += b; a += c; a += d
format strings: '%s%s%s%s' % (a, b, c, d)
named format strings: '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}
join: ''.join([a, b, c, d])
#!/usr/bin/python
# Benchmark various string concatenation methods. Run each 5 * 1,000,000 times
# and pick the best time out of the 5. Repeats for string lengths of
# 4, 16, 64, 256, 1024, and 4096. Outputs in CSV format via stdout.
import timeit

tests = {
    'add': "x = a + b + c + d",
    'join': "x = ''.join([a, b, c, d])",
    'addequals': "x = a; x += b; x += c; x += d",
    'format': "x = '%s%s%s%s' % (a, b, c, d)",
    'full_format': "x = '%(a)s%(b)s%(c)s%(d)s' % {'a': a, 'b': b, 'c': c, 'd': d}"
}

count = 1
for i in range(6):
    count = count * 4
    init = "a = '%s'; b = '%s'; c = '%s'; d = '%s'" % \
        ('a' * count, 'b' * count, 'c' * count, 'd' * count)
    for test in tests:
        t = timeit.Timer(tests[test], init)
        best = min(t.repeat(5, 1000000))
        print "'%s',%s,%s" % (test, count, best)
7. Python Performance : Low Hanging Fruits
Simple addition is the fastest string concatenation for small strings, followed by add equals.
''.join() is the fastest string concatenation for large strings.
* Named format is always the worst performer.
* Using string formatting for joins is as good as add equals for large strings, but for small strings it is mediocre.
8. Python Performance : Low Hanging Fruits
Four ways to upper-case a list of words:

newlist = map(str.upper, oldlist)

newlist = [s.upper() for s in oldlist]

# Plain loop with an attribute lookup and method call on every iteration:
newlist = []
for word in oldlist:
    newlist.append(word.upper())

# Hoisting the lookups out of the loop (I wouldn't do this):
upper = str.upper
newlist = []
append = newlist.append
for word in oldlist:
    append(upper(word))
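For a rough sense of the differences, the four variants above can be timed with timeit; a small Python 3 sketch (list size and repetition counts are arbitrary, and in Python 3 map returns an iterator, so it is wrapped in list()):

```python
import timeit

setup = "oldlist = ['word'] * 1000"

variants = {
    "for-loop append": (
        "newlist = []\n"
        "for word in oldlist:\n"
        "    newlist.append(word.upper())"
    ),
    "map": "newlist = list(map(str.upper, oldlist))",
    "list comprehension": "newlist = [s.upper() for s in oldlist]",
    "bound methods": (
        "upper = str.upper\n"
        "newlist = []\n"
        "append = newlist.append\n"
        "for word in oldlist:\n"
        "    append(upper(word))"
    ),
}

# Best of 3 runs of 200 iterations each, per variant.
results = {name: min(timeit.repeat(stmt, setup, repeat=3, number=200))
           for name, stmt in variants.items()}
for name, best in sorted(results.items(), key=lambda kv: kv[1]):
    print("%-20s %.4f s" % (name, best))
```

Absolute numbers vary by machine and interpreter; what matters is the relative ordering on your workload.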
Exceptions for branching

# Membership test on every iteration:
wdict = {}
for word in words:
    if word not in wdict:
        wdict[word] = 0
    wdict[word] += 1

# Exception-based version; faster when most words are already present:
wdict = {}
for word in words:
    try:
        wdict[word] += 1
    except KeyError:
        wdict[word] = 1
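A third option for this counting pattern, usually the most readable and competitive in speed, is collections.defaultdict; a minimal sketch (the sample sentence is made up):

```python
from collections import defaultdict

words = "the quick brown fox jumps over the lazy dog the".split()

wdict = defaultdict(int)  # missing keys start at int() == 0
for word in words:
    wdict[word] += 1

print(dict(wdict))
```

`dict.get(word, 0) + 1` achieves the same thing with a plain dict, and `collections.Counter(words)` does it in one line.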
9. Python Performance : Low Hanging Fruits
Function call overhead

# doit1 is called once per element; doit2 moves the loop inside the function.
import time

x = 0
def doit1(i):
    global x
    x = x + i

list = range(100000)
t = time.time()
for i in list:
    doit1(i)
print "%.3f" % (time.time() - t)

import time

x = 0
def doit2(list):
    global x
    for i in list:
        x = x + i

list = range(100000)
t = time.time()
doit2(list)
print "%.3f" % (time.time() - t)

>>> t = time.time()
>>> for i in list:
...     doit1(i)
...
>>> print "%.3f" % (time.time() - t)
0.758
>>> t = time.time()
>>> doit2(list)
>>> print "%.3f" % (time.time() - t)
0.204
10. Python Performance : Low Hanging Fruits
xrange vs range: in Python 2, xrange generates values lazily instead of
building the full list that range creates.
Membership testing with sets and dictionaries is much faster, O(1), than searching
sequences, O(n).
When testing "a in b", b should be a set or dictionary instead of a list or tuple.
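The gap is easy to demonstrate; a small Python 3 sketch (sizes chosen arbitrarily, searching for a value near the end of the list as the worst case):

```python
import timeit

items = list(range(10000))
as_list = items
as_set = set(items)

# 1000 membership tests each: the list scans up to 10000 elements,
# the set does a single hash lookup.
list_time = timeit.timeit("9999 in as_list", globals=globals(), number=1000)
set_time = timeit.timeit("9999 in as_set", globals=globals(), number=1000)
print("list: %.4f s, set: %.4f s" % (list_time, set_time))
```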
Lists perform well as either fixed-length arrays or variable-length stacks. However, for queue
applications using pop(0) or insert(0, v), collections.deque() offers superior O(1)
performance because it avoids the O(n) step of rebuilding the full list for each insertion or
deletion.
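A quick comparison of draining a queue from the front both ways; a Python 3 sketch (queue size and repeat count are arbitrary):

```python
import timeit
from collections import deque

def drain_list():
    q = list(range(5000))
    while q:
        q.pop(0)        # O(n): shifts every remaining element left

def drain_deque():
    q = deque(range(5000))
    while q:
        q.popleft()     # O(1): no shifting

list_time = timeit.timeit(drain_list, number=5)
deque_time = timeit.timeit(drain_deque, number=5)
print("list.pop(0): %.4f s, deque.popleft(): %.4f s" % (list_time, deque_time))
```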
In functions, local variables are accessed more quickly than global variables, builtins, and
attribute lookups. So, it is sometimes worth localizing variable access in inner-loops.
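Localizing a lookup means binding it to a local name once, outside the loop; a Python 3 sketch with a made-up pair of functions:

```python
import math
import timeit

def with_global_lookup(n=100000):
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)   # module attribute lookup on every iteration
    return total

def with_local_lookup(n=100000):
    sqrt = math.sqrt            # look the function up once, bind it locally
    total = 0.0
    for i in range(n):
        total += sqrt(i)
    return total

g = timeit.timeit(with_global_lookup, number=20)
l = timeit.timeit(with_local_lookup, number=20)
print("global lookup: %.4f s, local lookup: %.4f s" % (g, l))
```

The gain is modest per call, so this is only worth doing in hot inner loops.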
http://wiki.python.org/moin/PythonSpeed
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
11. Python : Multi-core Architecture
● In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from
executing Python bytecodes at once. This lock is necessary mainly because CPython's memory
management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the
guarantees that it enforces.) More here http://wiki.python.org/moin/GlobalInterpreterLock
● Use Multi Processing to overcome GIL
from multiprocessing import Process, Queue

def f(iq, oq):
    # Block until a chunk of work arrives; checking iq.empty() first is
    # racy across processes, so a blocking get() is safer.
    values = iq.get()
    oq.put(sum(values))

if __name__ == '__main__':
    inputQueue = Queue()
    outputQueue = Queue()
    values = range(0, 1000000)
    processOne = Process(target=f, args=(inputQueue, outputQueue))
    processTwo = Process(target=f, args=(inputQueue, outputQueue))
    inputQueue.put(values[0:len(values)/2])
    inputQueue.put(values[len(values)/2:])
    processOne.start()
    processTwo.start()
    processOne.join()
    processTwo.join()
    outputOne = outputQueue.get()
    outputTwo = outputQueue.get()
    print sum([outputOne, outputTwo])
12. Python : Multi-core Architecture
● IPC primitives come encapsulated: Queue, Pipe, Lock.
● Use the logging module to log from multiple processes, e.g. via SocketHandler.
● Good practice is to spawn at most 2 * (number of cores) processes.
● Debugging is a little painful: cProfile has to be attached to each
process, then you dump the stats output of each and join
them all together.
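For a simple fan-out like the summing example, multiprocessing.Pool hides the queues entirely; a minimal Python 3 sketch (the helper name `chunk_sum` is made up):

```python
from multiprocessing import Pool

def chunk_sum(chunk):
    # Work done in a child process: sum one slice of the data.
    return sum(chunk)

if __name__ == "__main__":
    values = list(range(1000000))
    half = len(values) // 2
    chunks = [values[:half], values[half:]]

    # Pool distributes the chunks across worker processes and
    # collects the partial results in order.
    with Pool(processes=2) as pool:
        partials = pool.map(chunk_sum, chunks)

    print(sum(partials))
```

Pool handles the queueing, chunk dispatch, and joining for you, at the cost of less control than explicit Process/Queue wiring.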
13. Python : Interpreter
CPython - the default install everyone uses
Jython - Python on the JVM, currently targets Python 2.5, true
concurrency, strong JVM integration. About even with CPython
speed-wise, maybe a bit slower.
IronPython - Python on the CLR, currently targets 2.6, with a 2.7
pre-release available, true concurrency, good CLR integration. Speed
comparison with CPython varies greatly depending on which feature you're
looking at.
PyPy - Python on RPython (a static subset of Python), currently targets
2.5, with a branch targeting 2.7, has a GIL, and a JIT, which can result in
huge performance gains (see http://speed.pypy.org/).
Unladen Swallow - a branch of CPython utilizing LLVM to do just-in-time
compilation. Branched from 2.6, although with the acceptance of PEP
3146 it is slated for merger into py3k.
Source: Alex Gaynor @ Quora
14. Python : Interpreter
PyPy
http://pypy.org
PyPy is a fast, compliant alternative implementation of the Python language (2.7.1). It has several
advantages and distinct features:
Speed: thanks to its Just-in-Time compiler, Python programs often run faster on PyPy.
(What is a JIT compiler?)
Memory usage: large, memory-hungry Python programs might end up taking less space than they
do in CPython.
Compatibility: PyPy is highly compatible with existing Python code. It supports ctypes and can run
popular Python libraries like Twisted and Django.
Sandboxing: PyPy provides the ability to run untrusted code in a fully secure way.
Stackless: PyPy can be configured to run in stackless mode, providing micro-threads for massive
concurrency.
Source : http://pypy.org
15. Python : Interpreter
● Unladen swallow
An optimization branch of CPython, intended to
be fully compatible and significantly faster.
http://code.google.com/p/unladen-swallow/
● The mandate is to merge the codebase into the Python
3.x series.
● It's a Google-sponsored project.
● Known to be used at YouTube, which is written in Python.