Getting started with Concurrency
...using Multiprocessing and Threading

PyWorks, Atlanta 2008
Jesse Noller
Who am I?
•   Just another guy.
•   Wrote PEP 371 - “Addition of the multiprocessing package”
•   Did a lot of the integration, now the primary point of
    contact.
    •   This means I talk a lot on the internet.
•   Test Engineering is my focus - many of the systems I test are
    distributed and/or concurrent.
    •   This stuff is not my full time job.
What is concurrency?


•   Simultaneous execution

•   Potentially interacting tasks

•   Uses multi-core hardware

•   Includes Parallelism.
What is a thread?

•   Share the memory and state of the parent.
•   Are “light weight”
•   Each gets its own stack
•   Do not use Inter-Process Communication or messaging.

•   POSIX “Threads” - pthreads.
What are they good for?

•   Adding throughput and reducing latency within most
    applications.
    •   Throughput: adding threads allows you to process
        more information faster.
    •   Latency: adding threads to make the application react
        faster, such as GUI actions.

•   Algorithms which rely on shared data/state
What is a process?

•   An independent process-of-control.
•   Processes are “share nothing”.
•   Must use some form of Inter-Process
    Communication to communicate/coordinate.

•   Processes are “big”.
Uses for Processes.

•   When you don’t need to share lots of state and want a large
    amount of throughput.
•   Shared-Nothing-or-Little is “safer” than shared-everything.
•   Processes automatically run on multiple cores.
•   Easier to turn into a distributed application.
The Difference

•   Threads are implicitly “share everything” - this makes the
    programmer have to protect (lock) anything which will be
    shared between threads.
•   Processes are “share nothing” - programmers must explicitly
    share any data/state - this means that the programmer is
    forced to think about what is being shared
•   Explicit is better than Implicit
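A minimal sketch of the "share everything" point above. It is written for Python 3 rather than the 2.6 used in this talk, and the names `Counter` and `run_threads` are hypothetical, invented for illustration. Every thread sees the same `Counter` object, so the read-modify-write in `increment` has to be protected with a lock.

```python
import threading


class Counter:
    """A counter shared by every thread that holds a reference to it."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "value += 1" is a read-modify-write, not an atomic step,
        # so the lock keeps two threads from interleaving inside it.
        with self._lock:
            self.value += 1


def run_threads(n_threads, n_increments):
    """Start n_threads threads that all bump one shared Counter."""
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_threads(4, 10000))
```

With processes, each child would work on its own copy of `counter`; the total would have to be sent back explicitly, for example over a queue.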
Python Threads


•   Python has threads, they are real, OS/Kernel level POSIX (p)
    threads.
    •   When you use threading.Thread, you get a pthread.
2.6 changes

•   camelCase method names are now foo_bar() style, e.g.:
    active_count, current_thread, is_alive, etc.
•   Attributes of threads and processes have been turned into
    properties.
    •   E.g.: daemon is now Thread.daemon = <bool>
•   For threading: these changes are optional. The old methods
    still exist.
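A short sketch of the renamed calls listed above, written for Python 3 (an assumption of this example; the talk targets 2.6). The helper name `describe_worker` is hypothetical. The comments note which camelCase spelling each call replaces.

```python
import threading


def describe_worker():
    """Start one daemon thread and inspect it with the foo_bar() style names."""
    stop = threading.Event()
    t = threading.Thread(target=stop.wait, name="worker")
    t.daemon = True                               # property, replaces setDaemon(True)
    t.start()
    info = {
        "alive": t.is_alive(),                    # replaces isAlive()
        "daemon": t.daemon,                       # replaces isDaemon()
        "active": threading.active_count(),       # replaces activeCount()
        "current": threading.current_thread().name,  # replaces currentThread()
    }
    stop.set()                                    # let the worker finish
    t.join()
    info["alive_after_join"] = t.is_alive()
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_worker().items()):
        print(key, value)
```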
...


Python does not use “green threads”. It has real threads. OS
Ones. Stop saying it doesn’t.
...


But: Python only allows a single thread to be executing within
the interpreter at once. This restriction is enforced by the
GIL.
The GIL


•   GIL: “Global Interpreter Lock” - this is a lock which must be
    acquired for a thread to enter the interpreter’s space.
•   Only one thread may be executing within the Python
    interpreter at once.
Yeah but...
•   No, it is not a bug.
•   It is an implementation detail of the CPython interpreter
•   Interpreter maintenance is easier.
•   Creation of new C extension modules easier.
•   (mostly) sidestepped if the app is I/O (file, socket) bound.
•   A threaded app which makes heavy use of sockets won’t see
    a huge GIL penalty: it is still there though.
•   Not going away right now.
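The I/O-bound case above can be sketched as follows, in Python 3 (an assumption; the talk uses 2.6). `time.sleep` stands in for a blocking socket or file call, since a sleeping thread releases the GIL much as a thread waiting on I/O does. The names `fake_io` and `timed_run` are hypothetical.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)


def timed_run(n_threads, delay):
    """Run n_threads blocking calls concurrently; return elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_run(4, 0.2)
    print("4 threads x 0.2s of waiting took %.2fs" % elapsed)
```

The waits overlap, so the total should be close to one delay rather than four. Replace the sleep with pure-Python number crunching and that overlap disappears, which is the CPU-bound penalty the prime example later in the talk measures.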
The other guys

•   Jython: No GIL, allows “free” threading by using the
    underlying Java threading system.
•   IronPython: No GIL - uses the underlying CLR, threads all
    run concurrently.
•   Stackless: Has a GIL. But has micro threads.
•   PyPy: Has a GIL... for now (dun dun dun)
Enter Multiprocessing
What is multiprocessing?

•   Follows the threading API closely but uses Processes and
    inter-process communication under the hood
•   Also offers distributed-computing facilities.
•   Allows the side-stepping of the GIL for CPU bound
    applications.
•   Allows for data/memory sharing.
•   CPython only.
Why include it?

•   Covered in PEP 371, we wanted to have something which was
    fast and “freed” many users from the GIL restrictions.
•   Wanted to add to the concurrency toolbox for Python as a
    whole.
     •   It is not the “final” answer. Nor is it “feature complete”
•   Oh, and it beats the threading module in speed.*
    * lies, damned lies and benchmarks
How much faster?


•   It depends on the problem.
•   For example: number crunching, it’s significantly faster than
    adding threads.
•   Also faster in wide-finder/map-reduce situations
•   Process creation can be sluggish: create the workers up front.
Example: Crunching Primes


•       Yes, I picked something embarrassingly parallel.
•       Sum all of the primes in a range of integers starting from
        1,000,000 and going to 5,000,000.
•       Run on an 8 Core Mac Pro with 8 GB of ram with Python 2.6,
        completely idle, except for iTunes.
    •     The single threaded version took so long I needed music.
# Single threaded version
import math

def isprime(n):
    """Returns True if n is prime and False otherwise"""
    if not isinstance(n, int):
        raise TypeError("argument passed to is_prime is not of 'int' type")
    if n < 2:
        return False
    if n == 2:
        return True
    max = int(math.ceil(math.sqrt(n)))
    i = 2
    while i <= max:
        if n % i == 0:
            return False
        i += 1
    return True

def sum_primes(n):
    """Calculates sum of all primes below given integer n"""
    return sum([x for x in xrange(2, n) if isprime(x)])

if __name__ == "__main__":
    for i in xrange(100000, 5000000, 100000):
        print sum_primes(i)
# Multi Threaded version
from threading import Thread
from Queue import Queue, Empty
...
def do_work(q):
    while True:
        try:
            x = q.get(block=False)
            print sum_primes(x)
        except Empty:
            break

if __name__ == "__main__":
    work_queue = Queue()
    for i in xrange(100000, 5000000, 100000):
        work_queue.put(i)

    threads = [Thread(target=do_work, args=(work_queue,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
# Multiprocessing version
from multiprocessing import Process, Queue
from Queue import Empty
...

if __name__ == "__main__":
    work_queue = Queue()
    for i in xrange(100000, 5000000, 100000):
        work_queue.put(i)

    processes = [Process(target=do_work, args=(work_queue,)) for i in range(8)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Results

•   All results are in wall-clock time.
•   Single Threaded: 41 minutes, 57 seconds
•   Multi Threaded (8 threads):    106 minutes, 29 seconds
•   MultiProcessing (8 Processes): 6 minutes, 22 seconds
•   This is a trivial example. More benchmarks/data were
    included in the PEP.
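For readers who want to try the comparison, here is a self-contained Python 3 port of the multiprocessing version. It is a sketch, not the code that produced the timings above: the slides elide the worker definitions with "...", so reusing `isprime`, `sum_primes` and `do_work` here is an assumption, and the bounds are shrunk so it finishes in seconds. The `main` wrapper is also an addition.

```python
import math
from multiprocessing import Process, Queue
from queue import Empty


def isprime(n):
    """Return True if n is prime, by trial division up to sqrt(n)."""
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


def sum_primes(n):
    """Sum all primes below n."""
    return sum(x for x in range(2, n) if isprime(x))


def do_work(q):
    """Pull upper bounds off the queue until it stays empty."""
    while True:
        try:
            # A short timeout rather than block=False: items put on a
            # multiprocessing.Queue can take a moment to become visible.
            n = q.get(timeout=0.5)
        except Empty:
            break
        print(n, sum_primes(n))


def main():
    work_queue = Queue()
    # Much smaller bounds than the slides use, so this runs quickly.
    for n in range(10000, 50000, 10000):
        work_queue.put(n)
    workers = [Process(target=do_work, args=(work_queue,)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()


if __name__ == "__main__":
    main()
```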
The catch.
•   Objects that are shared between processes must be
    serialize-able (pickle).
    •   40921 objects/sec versus 24989 objects/sec.
•   Processes are “heavy-weight”.
•   Processes can be slow to start (on Windows).
•   Supported on Linux, Solaris, Windows, OS X - but not *BSD,
    and possibly others.
•   If your application creates and destroys lots of threads, switching
    to processes will have a significant performance impact.
API Time
It starts with a Process

•   Exactly like threading:
    •    Thread(target=func, args=(args,)).start()
    •    Process(target=func, args=(args,)).start()
•   You can subclass multiprocessing.Process exactly as you
    would with threading.Thread.
from threading import Thread
threads = [Thread(target=do_work, args=(q,)) for i in range(8)]

from multiprocessing import Process
processes = [Process(target=do_work, args=(q,)) for i in range(8)]
# Multiprocessing version             # Threading version
from multiprocessing import Process   from threading import Thread

class MyProcess(Process):             class MyThread(Thread):
    def __init__(self):                   def __init__(self):
        Process.__init__(self)                Thread.__init__(self)
    def run(self):                        def run(self):
        a, b = 0, 1                           a, b = 0, 1
        for i in range(100000):               for i in range(100000):
            a, b = b, a + b                       a, b = b, a + b

if __name__ == "__main__":            if __name__ == "__main__":
    p = MyProcess()                       t = MyThread()
    p.start()                             t.start()
    print p.pid                           t.join()
    p.join()
    print p.exitcode
Queues


•   multiprocessing includes 2 Queue implementations - Queue
    and JoinableQueue.
•   Queue is modeled after Queue.Queue but uses pipes
    underneath to transmit the data.
•   JoinableQueue is the same as Queue except it adds .join()
    and .task_done() methods, à la Queue.Queue in Python 2.5.
Queue.warning

•   The first is that if you call .terminate or kill a process which is
    currently accessing a queue: that queue may become corrupted.
•   The second is that any Queue that a Process has put data on
    must be drained prior to joining the processes which have put
    data there: otherwise, you’ll get a deadlock.
     •   Avoid this by calling Queue.cancel_join_thread() in
         the child process.
     •   Or just eat everything on the results pipe before calling
         join (e.g. work_queue, results_queue).
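The "drain before join" rule above can be sketched like this, in Python 3 (an assumption; the talk uses 2.6). The names `square_all`, `collect` and `main` are hypothetical. The parent knows how many results to expect, takes exactly that many off the queue, and only then joins the child.

```python
from multiprocessing import Process, Queue


def square_all(numbers, results):
    """Child: put one result per input on the shared results queue."""
    for n in numbers:
        results.put(n * n)


def collect(results, expected):
    """Parent: take exactly `expected` items off the queue."""
    return [results.get() for _ in range(expected)]


def main():
    results = Queue()
    numbers = list(range(1000))
    p = Process(target=square_all, args=(numbers, results))
    p.start()
    squares = collect(results, len(numbers))  # drain first...
    p.join()                                  # ...then join
    print(sum(squares))


if __name__ == "__main__":
    main()
```

Swapping the last two steps in `main` risks the deadlock described above once the child has more buffered data than the underlying pipe can hold.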
Pipes and Locks

•   Multiprocessing supports communication primitives.
    •   multiprocessing.Pipe(), which returns a pair of Connection
        objects which represent the ends of the pipe.
    •   The data sent on the connection must be pickle-able.
•   Multiprocessing has clones of all of the threading module’s
    Lock/RLock, Event, Condition and Semaphore objects.
    •   Most of these support timeout arguments, too!
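A small sketch of both primitives, in Python 3 (an assumption; the talk uses 2.6, where the timeout spelling may differ). The helper names `reply_upper` and `main` are hypothetical. The child receives one message on its end of the pipe and sends a reply back; the lock part shows a timed acquire giving up instead of blocking forever.

```python
from multiprocessing import Lock, Pipe, Process


def reply_upper(conn):
    """Receive one picklable message and send back its upper-cased form."""
    message = conn.recv()
    conn.send(message.upper())


def main():
    parent_end, child_end = Pipe()   # a pair of connected Connection objects
    p = Process(target=reply_upper, args=(child_end,))
    p.start()
    parent_end.send("hello from the parent")
    print(parent_end.recv())
    p.join()

    lock = Lock()
    lock.acquire()
    # The lock is already held, so this acquire cannot succeed;
    # the timeout makes it return False instead of waiting forever.
    print(lock.acquire(timeout=0.1))
    lock.release()


if __name__ == "__main__":
    main()
```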
Shared Memory
•   Multiprocessing has a sharedctypes module.
•   This module allows you to create a ctypes object in shared memory
    and share it with other processes.
•   The sharedctypes module offers some safety through the use/allocation
    of locks which prevent simultaneous accessing/modification of the
    shared objects.

from multiprocessing import Process
from multiprocessing.sharedctypes import Value
from ctypes import c_int

def modify(x):
    x.value += 1

x = Value(c_int, 7)
p = Process(target=modify, args=(x,))
p.start()
p.join()

print x.value
Pools


•   One of the big “ugh” moments using threading is when you
    have a simple problem you simply want to pass to a pool of
    workers to hammer out.
•   Fact: There are more thread pool implementations out there
    than stray cats in my neighborhood.
Process Pools!
•   Multiprocessing has the Pool object. This supports the up-front
    creation of a number of processes and a number of methods of
    passing work to the workers.
•   Pool.apply() - this is a clone of the built-in apply() function.
     •   Pool.apply_async() - which can call a callback for you
         when the result is available.
•   Pool.map() - again, a parallel clone of the built in function.
     •   Pool.map_async() method, which can also get a callback to
         ring up when the results are done.
•   Fact: Functional programming people love this!
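A minimal sketch of `Pool.map()` and `Pool.map_async()` with a callback, in Python 3 (an assumption; the talk uses 2.6). The function `cube`, the `collected` list and the `main` wrapper are hypothetical names chosen for illustration.

```python
from multiprocessing import Pool


def cube(x):
    """Work function; it must live at module level so workers can find it."""
    return x ** 3


def main():
    collected = []
    with Pool(processes=4) as pool:
        # Blocking, ordered, parallel version of map().
        print(pool.map(cube, range(6)))

        # Non-blocking version: the callback runs in the parent and
        # receives the complete result list once every item is done.
        async_result = pool.map_async(cube, range(6),
                                      callback=collected.append)
        async_result.wait()
    print(collected)


if __name__ == "__main__":
    main()
```

The pool is created once and reused for both calls, which matches the earlier advice to create the workers up front.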
Pools raise insurance rates!

  from multiprocessing import Pool

  def f(x):
      return x*x

  if __name__ == '__main__':
      pool = Pool(processes=2)
      result = pool.apply_async(f, (10,))
      print result.get()


The output is 100. Note that apply_async() returns an AsyncResult;
result.get() blocks until the value is ready.
Managers

•   Managers are a network and process-based way of sharing data
    between processes (and machines).
•   The primary manager type is the BaseManager - this is the
    basic Manager object, and can easily be subclassed to share
    data remotely
•   A proxy object is the type returned when accessing a
    shared object - this is a reference to the actual object being
    exported by the manager.
Sharing a queue (server)
# Manager Server
from Queue import Empty
from multiprocessing import managers, Queue

_queue = Queue()
def get_queue():
    return _queue

class QueueManager(managers.BaseManager): pass

QueueManager.register('get_queue', callable=get_queue)

m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
_queue.put("What's up remote process")

s = m.get_server()
s.serve_forever()
Sharing a queue (client)

# Manager Client
from multiprocessing import managers

class QueueManager(managers.BaseManager): pass

QueueManager.register('get_queue')

m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
m.connect()
remote_queue = m.get_queue()
print remote_queue.get()
Gotchas

•   A process which feeds a multiprocessing.Queue will block on exit
    until all the objects it put there have been removed.
•   Data must be pickle-able: this means some objects (for
    instance, GUI ones) can not be shared.
•   Arguments to proxy (manager) methods must be pickle-able
    as well.
•   While it supports locking/semaphores: using those means
    you’re sharing something you may not need to be sharing.
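A quick way to check the pickle-ability gotcha before handing an object to a queue, pipe or proxy. This is a Python 3 sketch (an assumption; the talk uses 2.6), and `is_picklable` is a hypothetical helper, not part of multiprocessing.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives pickle.dumps(), False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # Different objects fail with different exception types
        # (TypeError, PicklingError, ...), so catch broadly here.
        return False
    return True


if __name__ == "__main__":
    print(is_picklable({"job": 1, "args": [2, 3]}))  # plain data
    print(is_picklable(threading.Lock()))            # OS-level resource
    print(is_picklable(lambda x: x))                 # anonymous function
```

Plain data containers pass; objects wrapping OS resources and functions that cannot be looked up by name do not, which is why GUI objects and similar handles cannot cross a process boundary.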
In Closing

•   Multiple processes are not mutually exclusive with using
    Threads
•   Multiprocessing offers a simple and known API
     •   This lowers the barrier to entry significantly
     •   Side steps the GIL
•   In addition to “just processes” multiprocessing offers the start
    of grid-computing utilities
Questions?

How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 

Multiprocessing with python

  • 1. Getting started with Concurrency ...using Multiprocessing and Threading PyWorks, Atlanta 2008 Jesse Noller
  • 2. Who am I? • Just another guy. • Wrote PEP 371- “Addition of the multiprocessing package” • Did a lot of the integration, now the primary point of contact. • This means I talk a lot on the internet. • Test Engineering is my focus - many of those are distributed and/or concurrent. • This stuff is not my full time job.
  • 3. What is concurrency? • Simultaneous execution • Potentially interacting tasks • Uses multi-core hardware • Includes Parallelism.
  • 4. What is a thread? • Share the memory and state of the parent. • Are “light weight” • Each gets its own stack • Do not use Inter-Process Communication or messaging. • POSIX “Threads” - pthreads.
  • 5. What are they good for? • Adding throughput and reducing latency within most applications. • Throughput: adding threads allows you to process more information faster. • Latency: adding threads makes the application react faster, such as for GUI actions. • Algorithms which rely on shared data/state
  • 6. What is a process? • An independent process-of-control. • Processes are “share nothing”. • Must use some form of Inter-Process Communication to communicate/coordinate. • Processes are “big”.
  • 7. Uses for Processes. • When you don’t need to share lots of state and want a large amount of throughput. • Shared-Nothing-or-Little is “safer” than shared-everything. • Processes automatically run on multiple cores. • Easier to turn into a distributed application.
  • 8. The Difference • Threads are implicitly “share everything” - this makes the programmer have to protect (lock) anything which will be shared between threads. • Processes are “share nothing” - programmers must explicitly share any data/state - this means that the programmer is forced to think about what is being shared • Explicit is better than Implicit
  • 9. Python Threads • Python has threads, they are real, OS/Kernel level POSIX (p) threads. • When you use threading.Thread, you get a pthread.
  • 10. 2.6 changes • camelCase method names are now foo_bar() style, e.g.: active_count, current_thread, is_alive, etc. • Attributes of threads and processes have been turned into properties. • E.g.: daemon is now Thread.daemon = <bool> • For threading: these changes are optional. The old methods still exist.
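A minimal sketch of the renamed threading API, written for Python 3 (an assumption; the slides target Python 2.6). The `worker` and `demo` names are hypothetical and exist only for illustration; the comments note the older camelCase spellings each call replaces.

```python
import threading


def worker(stop_event):
    # Block until the main thread signals that we may exit.
    stop_event.wait()


def demo():
    stop = threading.Event()
    t = threading.Thread(target=worker, args=(stop,), name="demo-worker")
    t.daemon = True  # property assignment; older spelling: t.setDaemon(True)
    t.start()

    info = {
        "alive_while_running": t.is_alive(),  # older spelling: isAlive()
        "is_daemon": t.daemon,
        # older spelling: threading.currentThread()
        "current": threading.current_thread().name,
        # older spelling: threading.activeCount()
        "count_at_least_two": threading.active_count() >= 2,
    }

    stop.set()
    t.join()
    info["alive_after_join"] = t.is_alive()
    return info


if __name__ == "__main__":
    print(demo())
```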
  • 11. ... Python does not use “green threads”. It has real threads. OS Ones. Stop saying it doesn’t.
  • 12. ... But: Python only allows a single thread to be executing within the interpreter at once. This restriction is enforced by the GIL.
  • 13. The GIL • GIL: “Global Interpreter Lock” - this is a lock which must be acquired for a thread to enter the interpreter’s space. • Only one thread may be executing within the Python interpreter at once.
  • 14. Yeah but... • No, it is not a bug. • It is an implementation detail of the CPython interpreter • Interpreter maintenance is easier. • Creation of new C extension modules is easier. • (mostly) sidestepped if the app is I/O (file, socket) bound. • A threaded app which makes heavy use of sockets won’t see a huge GIL penalty: it is still there though. • Not going away right now.
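The I/O-bound point can be illustrated with a small Python 3 sketch. It assumes `time.sleep` is an acceptable stand-in for a blocking socket or file read (both release the GIL while waiting); `fake_io` and `run_threaded` are hypothetical names.

```python
import threading
import time


def fake_io(delay, results, index):
    # time.sleep releases the GIL while waiting, much like blocking I/O does.
    time.sleep(delay)
    results[index] = delay


def run_threaded(n_tasks=4, delay=0.2):
    results = [None] * n_tasks
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(n_tasks)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, results


if __name__ == "__main__":
    elapsed, _ = run_threaded()
    # Run serially, four 0.2 s waits would take about 0.8 s.
    print("elapsed: %.2f s" % elapsed)
```

Because the waits overlap, the threaded run should finish in roughly the time of one wait rather than the sum of all of them. The same structure with a CPU-bound function in place of `fake_io` would not show this benefit under the GIL.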
  • 15. The other guys • Jython: No GIL, allows “free” threading by using the underlying Java threading system. • IronPython: No GIL - uses the underlying CLR, threads all run concurrently. • Stackless: Has a GIL. But has micro threads. • PyPy: Has a GIL... for now (dun dun dun)
  • 17. What is multiprocessing? • Follows the threading API closely but uses Processes and inter-process communication under the hood • Also offers distributed-computing facilities. • Allows the side-stepping of the GIL for CPU bound applications. • Allows for data/memory sharing. • CPython only.
  • 18. Why include it? • Covered in PEP 371, we wanted to have something which was fast and “freed” many users from the GIL restrictions. • Wanted to add to the concurrency toolbox for Python as a whole. • It is not the “final” answer. Nor is it “feature complete” • Oh, and it beats the threading module in speed.* * lies, damned lies and benchmarks
  • 19. How much faster? • It depends on the problem. • For example: number crunching, it’s significantly faster than adding threads. • Also faster in wide-finder/map-reduce situations • Process creation can be sluggish: create the workers up front.
  • 20. Example: Crunching Primes • Yes, I picked something embarrassingly parallel. • Sum all of the primes in a range of integers starting from 1,000,000 and going to 5,000,000. • Run on an 8 Core Mac Pro with 8 GB of ram with Python 2.6, completely idle, except for iTunes. • The single threaded version took so long I needed music.
  • 21. # Single threaded version
        import math

        def isprime(n):
            """Returns True if n is prime and False otherwise"""
            if not isinstance(n, int):
                raise TypeError("argument passed to is_prime is not of 'int' type")
            if n < 2:
                return False
            if n == 2:
                return True
            max = int(math.ceil(math.sqrt(n)))
            i = 2
            while i <= max:
                if n % i == 0:
                    return False
                i += 1
            return True

        def sum_primes(n):
            """Calculates sum of all primes below given integer n"""
            return sum([x for x in xrange(2, n) if isprime(x)])

        if __name__ == "__main__":
            for i in xrange(100000, 5000000, 100000):
                print sum_primes(i)
  • 22. # Multi Threaded version
        from threading import Thread
        from Queue import Queue, Empty
        ...
        def do_work(q):
            while True:
                try:
                    x = q.get(block=False)
                    print sum_primes(x)
                except Empty:
                    break

        if __name__ == "__main__":
            work_queue = Queue()
            for i in xrange(100000, 5000000, 100000):
                work_queue.put(i)
            threads = [Thread(target=do_work, args=(work_queue,)) for i in range(8)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
  • 23. # Multiprocessing version
        from multiprocessing import Process, Queue
        from Queue import Empty
        ...
        if __name__ == "__main__":
            work_queue = Queue()
            for i in xrange(100000, 5000000, 100000):
                work_queue.put(i)
            processes = [Process(target=do_work, args=(work_queue,)) for i in range(8)]
            for p in processes:
                p.start()
            for p in processes:
                p.join()
  • 24. Results • All results are in wall-clock time. • Single Threaded: 41 minutes, 57 seconds • Multi Threaded (8 threads): 106 minutes, 29 seconds • MultiProcessing (8 Processes): 6 minutes, 22 seconds • This is a trivial example. More benchmarks/data were included in the PEP.
  • 25. The catch. • Objects that are shared between processes must be serialize-able (pickle). • 40921 objects/sec versus 24989 objects/sec. • Processes are “heavy-weight”. • Processes can be slow to start. (on windows) • Supported on Linux, Solaris, Windows, OS X - but not *BSD, and possibly others. • If you are creating and destroying lots of threads, switching them to processes has a significant cost.
  • 27. It starts with a Process • Exactly like threading: • Thread(target=func, args=(args,)).start() • Process(target=func, args=(args,)).start() • You can subclass multiprocessing.Process exactly as you would with threading.Thread.
  • 28. from threading import Thread
        threads = [Thread(target=do_work, args=(q,)) for i in range(8)]

        from multiprocessing import Process
        processes = [Process(target=do_work, args=(q,)) for i in range(8)]
  • 29. # Multiprocessing version
        from multiprocessing import Process

        class MyProcess(Process):
            def __init__(self):
                Process.__init__(self)
            def run(self):
                a, b = 0, 1
                for i in range(100000):
                    a, b = b, a + b

        if __name__ == "__main__":
            p = MyProcess()
            p.start()
            print p.pid
            p.join()
            print p.exitcode

        # Threading version
        from threading import Thread

        class MyThread(Thread):
            def __init__(self):
                Thread.__init__(self)
            def run(self):
                a, b = 0, 1
                for i in range(100000):
                    a, b = b, a + b

        if __name__ == "__main__":
            t = MyThread()
            t.start()
            t.join()
  • 30. Queues • multiprocessing includes 2 Queue implementations - Queue and JoinableQueue. • Queue is modeled after Queue.Queue but uses pipes underneath to transmit the data. • JoinableQueue is the same as Queue except it adds the .join() and .task_done() methods, à la Queue.Queue in Python 2.5.
  • 31. Queue.warning • The first is that if you call .terminate or kill a process which is currently accessing a queue: that queue may become corrupted. • The second is that any Queue that a Process has put data on must be drained prior to joining the processes which have put data there: otherwise, you’ll get a deadlock. • Avoid this by calling Queue.cancel_join_thread() in the child process. • Or just eat everything on the results pipe before calling join (e.g. work_queue, results_queue).
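The "drain before join" advice above can be sketched as follows, in Python 3 (an assumption; the slides use 2.6). The `SENTINEL` marker, `worker`, and `run` are hypothetical names. Using a sentinel per worker, rather than a non-blocking `get`, is a choice made for this sketch and not something the slides prescribe.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical "no more work" marker, one per worker


def worker(tasks, results):
    # Pull work until the sentinel arrives, pushing each answer to results.
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(item * item)


def run(numbers, n_workers=2):
    tasks, results = Queue(), Queue()
    procs = [Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(SENTINEL)

    # Drain the results queue BEFORE joining: a child that still has
    # buffered items on a queue will not exit, so join() would deadlock.
    out = [results.get() for _ in numbers]

    for p in procs:
        p.join()
    return sorted(out)


if __name__ == "__main__":
    print(run([1, 2, 3, 4]))
```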
  • 32. Pipes and Locks • Multiprocessing supports communication primitives. • multiprocessing.Pipe(), which returns a pair of Connection objects which represent the ends of the pipe. • The data sent on the connection must be pickle-able. • Multiprocessing has clones of all of the threading module’s Lock/RLock, Event, Condition and Semaphore objects. • Most of these support timeout arguments, too!
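A minimal Python 3 sketch of `multiprocessing.Pipe()` in use. The `sum_worker` function and the convention of sending `None` to stop are assumptions made for illustration, not part of the talk.

```python
from multiprocessing import Pipe, Process


def sum_worker(conn):
    # Receive lists of numbers until None arrives; reply with each sum.
    while True:
        msg = conn.recv()
        if msg is None:
            break
        conn.send(sum(msg))
    conn.close()


if __name__ == "__main__":
    # Pipe() returns the two Connection ends of a duplex pipe.
    parent_conn, child_conn = Pipe()
    p = Process(target=sum_worker, args=(child_conn,))
    p.start()

    parent_conn.send([1, 2, 3])  # anything sent must be pickle-able
    print(parent_conn.recv())

    parent_conn.send(None)       # ask the child to stop
    p.join()
```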
  • 33. Shared Memory • Multiprocessing has a sharedctypes module. • This module allows you to create a ctypes object in shared memory and share it with other processes. • The sharedctypes module offers some safety through the use/allocation of locks which prevent simultaneous accessing/modification of the shared objects.
        from multiprocessing import Process
        from multiprocessing.sharedctypes import Value
        from ctypes import c_int

        def modify(x):
            x.value += 1

        x = Value(c_int, 7)
        # args must be a tuple, hence the trailing comma
        p = Process(target=modify, args=(x,))
        p.start()
        p.join()
        print x.value
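When several processes update the same shared value, the lock mentioned above has to be held explicitly around each read-modify-write. A Python 3 sketch under that assumption follows; `add_many` is a hypothetical helper, and `multiprocessing.Value` is used here as the convenience wrapper around the same shared ctypes machinery.

```python
from multiprocessing import Process, Value


def add_many(counter, n):
    for _ in range(n):
        # "counter.value += 1" is a read-modify-write, so it is not atomic;
        # hold the lock that comes with the shared Value while updating.
        with counter.get_lock():
            counter.value += 1


if __name__ == "__main__":
    counter = Value("i", 0)  # "i" is the typecode for a C int
    procs = [Process(target=add_many, args=(counter, 1000))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)
```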
  • 34. Pools • One of the big “ugh” moments using threading is when you have a simple problem you simply want to pass to a pool of workers to hammer out. • Fact: There’s more thread pool implementations out there than stray cats in my neighborhood.
  • 35. Process Pools! • Multiprocessing has the Pool object. This supports the up-front creation of a number of processes and a number of methods of passing work to the workers. • Pool.apply() - this is a clone of the built-in apply() function. • Pool.apply_async() - which can call a callback for you when the result is available. • Pool.map() - again, a parallel clone of the built-in function. • Pool.map_async() method, which can also get a callback to ring up when the results are done. • Fact: Functional programming people love this!
  • 36. Pools raise insurance rates!
        from multiprocessing import Pool

        def f(x):
            return x*x

        if __name__ == '__main__':
            pool = Pool(processes=2)
            result = pool.apply_async(f, (10,))
            print result.get()

        The output is 100. Note that the object returned by apply_async() is an AsyncResult; calling .get() on it yields the value.
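For comparison, the prime-summing benchmark from earlier in the talk could be handed to a pool with `Pool.map()`. This is a Python 3 sketch, not the code that produced the timings above: the ranges are much smaller, the worker count of 4 is arbitrary, and `math.isqrt` (Python 3.8+) replaces the original square-root loop bound.

```python
import math
from multiprocessing import Pool


def isprime(n):
    """Return True if n is prime, False otherwise (trial division)."""
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


def sum_primes(n):
    """Sum all primes below n."""
    return sum(x for x in range(2, n) if isprime(x))


if __name__ == "__main__":
    limits = range(10000, 50000, 10000)
    # The workers are created once, up front, and reused for every limit.
    with Pool(processes=4) as pool:
        totals = pool.map(sum_primes, limits)
    for limit, total in zip(limits, totals):
        print(limit, total)
```

`Pool.map()` returns results in the same order as its inputs, which is why the totals can be zipped back against `limits`.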
  • 37. Managers • Managers are a network and process-based way of sharing data between processes (and machines). • The primary manager type is the BaseManager - this is the basic Manager object, and can easily be subclassed to share data remotely • A proxy object is the type returned when accessing a shared object - this is a reference to the actual object being exported by the manager.
  • 38. Sharing a queue (server)
        # Manager Server
        from Queue import Empty
        from multiprocessing import managers, Queue

        _queue = Queue()

        def get_queue():
            return _queue

        class QueueManager(managers.BaseManager):
            pass

        QueueManager.register('get_queue', callable=get_queue)
        m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
        _queue.put('What’s up remote process')
        s = m.get_server()
        s.serve_forever()
  • 39. Sharing a queue (client)
        # Manager Client
        from multiprocessing import managers

        class QueueManager(managers.BaseManager):
            pass

        QueueManager.register('get_queue')
        m = QueueManager(address=('127.0.0.1', 8081), authkey="lol")
        m.connect()
        remote_queue = m.get_queue()
        print remote_queue.get()
  • 40. Gotchas • Processes which feed into a multiprocessing.Queue will block waiting for all the objects they put there to be removed. • Data must be pickle-able: this means some objects (for instance, GUI ones) cannot be shared. • Arguments to Proxies (managers) methods must be pickle-able as well. • While it supports locking/semaphores: using those means you’re sharing something you may not need to be sharing.
  • 41. In Closing • Multiple processes are not mutually exclusive with using Threads • Multiprocessing offers a simple and known API • This lowers the barrier of entry significantly • Side steps the GIL • In addition to “just processes” multiprocessing offers the start of grid-computing utilities