Making Apache suck less for hosting
Python web applications

Graham Dumpleton
PyCon US - March 2013




Saturday, 16 March 2013
Following Along




                          http://www.slideshare.net/GrahamDumpleton


                                   Slides contain presenter notes!






If you want to follow along with the slides for this talk on
your laptop, you can view them on slideshare.net. The
slides do have my presenter notes, so you can also read
it all again later if you are too busy surfing the net, or
didn't catch some point.
Apache Sucks


      •   Is too hard to configure.
      •   Is bloated and uses heaps of memory.
      •   Is slow and doesn't perform very well.
      •   Is not able to handle a high number of concurrent requests.






These days there seems to be a never ending line of
people who will tell you that Apache sucks and that you
are crazy if you use it, especially for hosting Python web
applications. What is the truth? Do the people who make
such claims actually understand how Apache works or
are they just repeating what someone else told them?
Apache Sucks


      •   Is too hard to configure.
      •   Is bloated and uses heaps of memory.
      •   Is slow and doesn't perform very well.
      •   Is not able to handle a high number of concurrent requests.






As the author of the mod_wsgi module for Apache, what
I want to do in this talk is go through and look at what
some of the pain points are when configuring Apache to
run Python web applications. The intent is that you can
walk away with a bit more insight into how Apache works
and what is required to properly setup Apache and
mod_wsgi. So, if you like, I am going to explain how to
make Apache suck less.
Where Has All My Memory Gone


      •   Reasons for excessive memory usage.
          •   Python web applications are fat to start with.
          •   Poor choice of multiprocessing module (MPM).
          •   Poor choice of configuration for the MPM used.
          •   Loading of Apache modules you aren't using.
          •   Size of Apache memory pools for each thread.
          •   Inability to benefit from copy on write.






The biggest criticism which seems to be levelled at
Apache is that it is bloated and uses too much memory.
There are various reasons Apache can use a lot of
memory. Many of these are under the control of the user
and not necessarily a failing of Apache though.
Why Is Response Time So Bad


      •   Reasons for slow response times.
          •   Not enough capacity configured to handle throughput.
          •   Keep alive causing artificial reduction in capacity.
          •   Machine slowing down due to frequent process recycling.
          •   High cost of loading WSGI applications on process startup.






Another criticism is why is Apache so slow. Like with
memory usage, this can also have a lot to do with how
Apache has been configured. In practice, if Apache is
simply setup properly for the specifics of running
dynamic Python web applications, and takes into
consideration the constraints of the system it is being
run on, neither of these should be an issue.
Streamlining The Apache Installation




             LoadModule   authz_host_module modules/mod_authz_host.so
             LoadModule   mime_module modules/mod_mime.so
             LoadModule   rewrite_module modules/mod_rewrite.so
             LoadModule   wsgi_module modules/mod_wsgi.so






The first thing one can do is to strip down what modules
Apache is loading. Because Apache is a workhorse that
can be used for many different tasks, it comes with a
range of pluggable modules. There are likely to be any
number of modules being loaded that you aren't using.
To cut down on base memory used by Apache itself, you
should disable all Apache modules you are not using.
Python Web Applications Are Fat


      •   Raw Web Server
          •    Apache - Streamlined (2 MB)

      •   Python WSGI Hello World
          •   Apache/mod_wsgi - Streamlined (5 MB)
          •   Apache/mod_wsgi - Kitchen Sink (10MB)
          •   Gunicorn - Sync Worker (10MB)

      •   Real Python Web Application
          •   Django (20-100MB+)






Beyond the server, it has to be recognised that any use
of Python will cause an immediate increase in memory
used. Load a typical web application, along with all the
modules from the standard library it requires, as well as
third party modules and memory use will grow quite
quickly. The actual base memory consumed by the web
server at that point can be quite small in comparison.
Python Web Applications Are Fat


      •   Raw Web Server
          •    Apache - Streamlined (2 MB)

      •   Python WSGI Hello World
          •   Apache/mod_wsgi - Streamlined (5 MB)
          •   Apache/mod_wsgi - Kitchen Sink (10MB)
          •   Gunicorn - Sync Worker (10MB)

      •   Real Python Web Application
          •   Django (20-100MB+)






Overall, it shouldn't really matter what WSGI server you
use: the Python interpreter and the Python web
application itself should always use a comparable
amount of memory for a comparable configuration. The
laws of nature don't suddenly change when you start
using Apache to host a Python web application. Memory
used by the Python web application itself in a single
process should not suddenly balloon out for no reason.
Processes Vs Threads


[Diagram: a browser client connects to the Apache server parent process, which manages multiple server worker processes, each containing threads.]


An appearance therefore of increased memory usage is
more likely going to be due to differences in the server
architecture. When I say server architecture, I specifically
mean the mix of processes vs threads that are used by
the server hosting the WSGI application to handle
requests. Using processes in preference to threads will
obviously mean that more memory is being used.
Different Server Architectures


      •   Apache
          •   Prefork MPM - Multiple single threaded processes.
          •   Worker MPM - Multiple multi threaded processes.
          •   WinNT MPM - Single multi threaded processes.

      •   Gunicorn
          •   Sync Worker - Multiple single threaded processes.




                                                          MPM = Multiprocessing Module





The big problem in this respect is that beginners know
no better and will use whatever the default configuration
is that their server distribution provides. For Apache,
which is often supplied with the prefork multiprocessing
module, or MPM, this can very easily cause problems,
because it uses single threaded processes.
Default Server Configuration


      •   Apache
          •   Prefork MPM - 150 processes (maximum) / 1 thread per process.
          •   Worker MPM - 6 processes (maximum) / 25 threads per process.
          •   WinNT MPM - 1 process (fixed) / 150 threads per process.

      •   Gunicorn
          •   Sync Worker - 1 process (fixed) / 1 thread per process.






Although the prefork MPM may only initially start out with
a single process, it can automatically scale out to 150
processes. That is 150 copies of your Python web
application. At 20MB per process that is already 3GB, and
20MB would be considered a small Python web
application. In contrast, the gunicorn sync worker defaults
to a single process and single thread and doesn't scale, so
its memory requirement would stay the same over time.
Making Apache Suck Less (1)


      •   Don't use Apache default configurations for hosting Python
          web applications.
      •   Don't use the prefork MPM unless you know how to
          configure Apache properly, use the worker MPM, it is more
          forgiving.
      •   Don't allow Apache to automatically scale out the number
          of processes over too great a range.
      •   Don't try and use a single Apache instance to host Python,
          PHP, Perl web applications at the same time.






So, whatever you do, don't use the default configuration
that comes with your server distribution. For Python web
applications you generally can't avoid having to tune it.
This is because the Apache defaults are setup for static
file serving and PHP applications. Especially don't try and
use the same Apache instance to host Python web
applications at the same time as running PHP or Perl
applications as each has different configuration
requirements.
What MPM Are You Using?




                      $ /usr/sbin/httpd -V | grep 'Server MPM'
                      Server MPM:     Prefork






How do you work out which MPM you are using? Prior to
Apache 2.4 the type of MPM being used was defined
when Apache was being compiled and was statically
linked into the Apache executable. From Apache 2.4, the
MPM can also be dynamically loaded and so defined at
runtime by the Apache configuration. Either way, you can
determine the MPM in use by running the Apache
executable with the '-V' option.
WSGI multiprocess/multithread


                            wsgi.run_once   wsgi.multiprocess   wsgi.multithread

                     CGI       TRUE              TRUE                FALSE

                  Prefork      FALSE             TRUE                FALSE

                 Worker        FALSE             TRUE                TRUE

                 WinNT         FALSE             FALSE               TRUE






Another way of determining the specific process
architecture in use is by consulting the multiprocess and
multithread attributes passed in the WSGI environ with
each request. Neither of these though will actually tell
you how many processes or threads are in use. For that
you need to start looking at the Apache configuration
itself.
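As an illustration of consulting these attributes, a trivial WSGI application can echo the flags back with each request. This is a minimal sketch; the `wsgi.*` keys are the standard environ entries defined by PEP 3333.

```python
# Minimal WSGI app reporting the process/thread model flags that the
# hosting server passes in the environ (standard PEP 3333 keys).
def application(environ, start_response):
    body = "multiprocess=%s multithread=%s run_once=%s" % (
        environ.get("wsgi.multiprocess"),
        environ.get("wsgi.multithread"),
        environ.get("wsgi.run_once"),
    )
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

Under the prefork MPM, for example, a request to this application would report multiprocess as True and multithread as False.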
Defaults From Configuration File


                              extra/httpd-mpm.conf



          <IfModule mpm_prefork_module>
          StartServers          1
          MinSpareServers       1
          MaxSpareServers      10
          MaxClients          150
          MaxRequestsPerChild   0
          </IfModule>

          <IfModule mpm_worker_module>
          StartServers          2
          MaxClients          150
          MinSpareThreads      25
          MaxSpareThreads      75
          ThreadsPerChild      25
          MaxRequestsPerChild   0
          </IfModule>






This is where we can actually end up in a trap. For the
standard Apache configuration as provided with the
Apache Software Foundation's distribution, although
there are example MPM settings provided, that
configuration file isn't actually included by default. The
settings in that file are also different to what is compiled
into Apache, so you can't even use it as a guide to what
the compiled in defaults are.
Compiled In Prefork MPM Settings



                          StartServers              5
                          MinSpareServers           5
                          MaxSpareServers          10
                          MaxClients              256
                          MaxRequestsPerChild   10000






So although I said before that the prefork MPM could
scale up to 150 processes automatically, that was on the
assumption that the default settings in the Apache
configuration file were actually used. If those settings
aren't used, then it is instead 256 processes, making it
even worse. Some people recommend throwing away the
default configuration files and starting from scratch,
meaning one uses the even more lethal compiled-in settings.
Meaning Of Prefork Settings


      •   StartServers - Number of child server processes created
          at startup.
      •   MaxClients - Maximum number of connections that will be
          processed simultaneously.
      •   MaxRequestsPerChild - Limit on the number of requests
          that an individual child server will handle during its life.






For those who are not familiar with these settings, what
do they actually mean? StartServers is the initial number
of processes created to handle requests. Because prefork
uses single threaded processes, the maximum number
of processes ends up being dictated by MaxClients.
Automatic Scaling In Prefork


      •   MinSpareServers - Minimum number of idle child server
          processes.
      •   MaxSpareServers - Maximum number of idle child server
          processes.






The settings which need more explanation are the min
and max spare processes. The purpose of these is to
control how Apache dynamically adjusts the number of
processes being used to handle requests.
Algorithm For Scaling In Prefork




                if idle_process_count > max_spare_servers:
                  kill a single process

                elif idle_process_count < min_spare_servers:
                  spawn one or more processes






In very simple terms, what Apache does is wake up each
second and looks at how many idle processes it has at
that point which are not handling requests. If it has more
idle processes than the maximum specified, it will kill off
a single process. If it has fewer idle processes than the
minimum spare required it will spawn more. How many it
spawns will depend on whether it had spawned any in
the previous check and whether it is creating them
quickly enough.
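The once-per-second check described above can be sketched in Python. This is a simplified model, not Apache's actual source; the spawn-rate doubling and the cap of 32 are assumptions made for the sketch.

```python
def maintain(total, idle, min_spare, max_spare, max_clients, spawn_rate):
    """One once-per-second maintenance pass, loosely modelled on the
    prefork MPM. Returns (kill, spawn, next_spawn_rate)."""
    if idle > max_spare:
        # Too many idle processes: kill off a single process.
        return 1, 0, 1
    if idle < min_spare:
        # Not enough spare: spawn some, doubling the rate each pass
        # (capped, and never growing the total beyond MaxClients).
        spawn = max(0, min(spawn_rate, max_clients - total))
        return 0, spawn, min(spawn_rate * 2, 32)
    # Within bounds: nothing to do, reset the spawn rate.
    return 0, 0, 1
```

For example, with MinSpareServers 5 and MaxSpareServers 10, a pass that finds 15 idle processes kills one, while a pass that finds only 2 idle processes spawns more at the current spawn rate.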
Visualising Scaling In Prefork

[Chart: simulated concurrent requests, process creation, total number of processes, and idle processes over time.]






When presented with the MPM settings being used, even
I have to think hard sometimes about what the result of
those settings will be. To make it easier to understand, I
run the settings through a simulator and chart the
results. In this example the number of concurrent
requests is ramped up from zero and when maximum
capacity is reached, it is ramped back down to zero
again.
Visualising Scaling In Prefork

[Chart: the same simulation, annotated with the points where processes are created and killed.]






By using a simulator and visualising the results, it
becomes much easier to understand how Apache will
behave. We can see when Apache would create processes
or kill them off. What the total number of processes will
be as it scales, and how that is being driven by trying to
always maintain a pool of idle processes within the
bounds specified.
Floor On The Number Of Processes
[Chart: total number of processes, with the MinSpareServers and MaxSpareServers levels marked.]






Zooming in on the chart for the total number of
processes available to handle requests, one thing that
stands out for example is that there is an effective floor
on the number of processes which will be kept around.
No processes will be killed if we are at or below this level.
Starting Less Than The Minimum



                          StartServers        1
                          MinSpareServers     5
                          MaxSpareServers    10
                          MaxClients        256






Too often, people who have no idea how to configure
Apache get in and start mucking around with these
values, not understanding the implications of what they
are doing. The simulator is great in being able to give a
quick visual indicator as to whether something is amiss.
One example is where the number of servers to be
started is less than the minimum number of spare
servers.
Delayed Start Of Processes


[Chart: delayed creation of processes after startup.]






What happens in this case is that as soon as Apache
starts doing its checks, it sees that it isn't actually
running enough processes to satisfy the requirement for
the minimum number of idle processes. It therefore
starts creating more, doubling the number each time
until it has started enough. Rather than actually starting
all processes immediately, in this case it takes 3 seconds,
thus potentially limiting the initial capacity of the server.
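The three-second delay can be reproduced with a small calculation. This sketch assumes one maintenance pass per second and a spawn rate that starts at one and doubles each pass, as described above.

```python
def seconds_until_minimum(start_servers, min_spare_servers):
    """Passes needed before the process count reaches MinSpareServers,
    assuming one pass per second and a doubling spawn rate."""
    procs, rate, seconds = start_servers, 1, 0
    while procs < min_spare_servers:
        procs += rate
        rate = min(rate * 2, 32)  # rate cap is an assumption for the sketch
        seconds += 1
    return seconds
```

With StartServers 1 and MinSpareServers 5, this gives three passes (1, then 2, then 4 new processes), matching the three seconds mentioned above.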
Starting Up Too Many Servers



                          StartServers      100
                          MinSpareServers     5
                          MaxSpareServers    10
                          MaxClients        256






At the other end of the scale, we have people who
change the number of servers to be started to be greater
than the maximum spare allowed.
Immediate Kill Off Of Processes


[Chart: excess processes being killed off one per second after startup.]






This time when Apache starts doing its checks, it finds it
has more idle processes than allowed and starts killing
them off at a rate of 1 per second. Presuming no traffic
came in that necessitated those processes actually
existing, it would take over a minute to kill off all the
excess processes.
Making Apache Suck Less (2)


      •   Ensure MaxSpareServers is greater than MinSpareServers.
          If you don't, Apache will set MaxSpareServers to be
          MinSpareServers+1 for you anyway.
      •   Don't set StartServers to be less than MinSpareServers as
          it will delay start up of processes so as to reach minimum
          spare required.
      •   Don't set StartServers to be greater than MaxSpareServers
          as processes will start to be killed off immediately.






To avoid such delayed process creation, or immediate
killing off of processes on startup, you should ensure
that the value of StartServers is bounded by
MinSpareServers and MaxSpareServers.
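Put together, a prefork configuration respecting those bounds might look like the following. The numbers here are illustrative, not recommendations; suitable values depend on available memory and traffic.

```apache
<IfModule mpm_prefork_module>
# MinSpareServers <= StartServers <= MaxSpareServers, so no
# delayed creation or immediate kill off occurs at startup.
StartServers         10
MinSpareServers      10
MaxSpareServers      20
MaxClients          100
MaxRequestsPerChild   0
</IfModule>
```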
Overheads Of Process Creation


      •   Initialisation of the Python interpreter.
      •   Loading of the WSGI application.
      •   Loading of required standard library modules.
      •   Loading of required third party modules.
      •   Initialisation of the WSGI application.






Why do we care about unnecessary process creation?
After all, aren't the processes just a fork of the Apache
parent process and so cheap to create? The problem is
that unlike mod_php where PHP is initialised and all
extension modules preloaded into the Apache parent
process, when using mod_wsgi, Python initialisation is
deferred until after the processes are forked. Any
application code and required Python modules are then
lazily loaded.
Preloading Is A Security Risk


      •   You don't necessarily know which WSGI application to load
          until the first request arrives for it.
      •   Python web applications aren't usually designed properly
          for preloading prior to forking of worker processes.
      •   All code run in the Apache parent process is run as root.






If initialising Python and loading the WSGI application in
the worker process can be expensive, why can't we
preload everything in the Apache parent process before
the worker processes are forked? Even if a Python web
application were designed to be able to be preloaded
and run properly after the process was forked, the key
issue is that a user's application code on startup would run
as root if executed in the parent process and that is one
very big security risk.
Preloading Causes Memory Leaks


      •   The Python interpreter will leak memory into the parent
          process when an Apache restart occurs.






Add to that, because of what Python does (or should I
say doesn't do) when the interpreter is destroyed,
combined with the way in which Apache reloads
mod_wsgi when restarting, the Python interpreter will
leak memory into the Apache parent process. If Apache
restarts are done on a regular basis, the size of the
Apache parent will keep growing over time and thus so
will the forked worker processes as well.
Downsides Of Not Preloading


      •   Additional CPU load when creating new worker process.
      •   Worker processes will not be immediately ready.
      •   No saving in memory usage from copy on write.






So running within Apache we have no choice and have to
defer initialisation of Python and loading of the WSGI
application until after the child processes are forked.
This causes additional CPU load each time a process is
started up and the time taken will also mean that
requests will be held up. Finally, because we are not
preloading in the parent, we cannot benefit from reduced
memory usage from copy on write features of the
operating system.
Avoiding Process Creation






Process startup is therefore expensive. We want to avoid
doing it and certainly don't want it occurring at a time
which is inconvenient. Unfortunately, especially with
prefork MPM, allowing Apache to dynamically create
processes can result in a lot of process churn if you are
not careful. The more problematic situation is where
there is a sudden burst of traffic and there are not
enough processes already running to handle it.
Sudden Burst In Traffic



                          Sudden need to create processes.






In the worst case scenario, the increased load from
creating processes when a sustained traffic spike occurs,
could see the whole system slow down. The slow down
can make Apache think it isn't creating enough
processes quickly enough, so it keeps creating more and
more. Pretty quickly it has created the maximum number
of processes, with the combined CPU load of loading the
WSGI application for all of them, causing the server to
grind to a halt.
Constant Churn Of Processes



                          Continual killing off and creation of processes.






Even after the processes have been created we can still
see process churn. This is because Apache doesn't look
at request load over time, it only looks at the number of
concurrent requests running at the time of the check.
This can actually bounce around quite a lot each second.
There will therefore be a continual churn of processes:
Apache decides it has more than required and kills some
off, then believes it does not have enough and creates
more.
Raising The Floor



                          StartServers        5
                          MinSpareServers     5
                          MaxSpareServers   150
                          MaxClients        256






What if we raise that floor on the number of processes as
determined by the MaxSpareServers setting? We said that
so long as the number of processes was below this level,
none would be killed off. Let's try then setting that to a
level above the average number of processes in use.
Number Of Processes Plateaus



[Chart: an initial spike, followed by a reduced incidence of process creation.]






What will happen is that although there will be an initial
spike in the number of processes created, after that
there will only be a slow increase as the number of
processes finds its natural level and plateaus. So long as
we stay below MaxSpareServers we will avoid process
churn.
Breaching The Maximum



[Chart: the number of processes falls back to the maximum spare level after a spike.]






We haven't set the maximum number of spare processes
to the maximum allowed clients quite yet though. So it is
still possible that when a spike occurs we will create
more than the maximum spare allowed. When traffic
recedes, the number of processes will reduce back to the
level of the maximum and no further. Finding the
optimal level for the maximum spare processes to avoid
churn can be tricky, but one can work it out by
monitoring utilisation.
Maximum Number Of Requests




                          MaxRequestsPerChild   0






Do be aware though that all this work in tuning the
settings can be undone by the MaxRequestsPerChild
setting. We want to avoid process churn. It is no good
setting this to such a low value that this would cause
process recycling after a very short period of time, as it
just reintroduces process churn in another way. It is
better to have this be zero resulting in processes staying
persistent in memory until shutdown.
Handling Large Number Of Clients




                          MaxClients     256






Now the only reason that the defaults for Apache specify
such a large value for MaxClients, and thus a large
number of processes when single threading is used, is
because of slow clients and keep alive. A high number is
required to support concurrent sessions from many
users. If using single threaded processes though, this
means you will need to have much more memory
available.
Front End Proxy And KeepAlive



[Diagram: an nginx front end sits between the browser client and the Apache server parent, which manages the server worker processes.]


A much better solution is to put a nginx proxy in front of
Apache. The nginx server will isolate Apache from slow
clients, as it will only forward a request once it has been
completely received and can be handled immediately.
The nginx server can also handle keep alive connections
meaning it can be turned off in Apache. This allows us to
significantly reduce the number of processes needed for
Apache to handle the same amount of traffic as before.
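A minimal sketch of such a front end follows. The back end port and header choices here are assumptions for illustration, not values from the talk.

```nginx
# nginx front end: buffers slow clients and handles keep alive,
# forwarding complete requests to Apache on a back end port.
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;  # Apache assumed to listen here
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

On the Apache side, `KeepAlive Off` can then be set, since nginx is now the one maintaining client connections.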
Making Apache Suck Less (3)


      •   Set MaxSpareServers at a level above the typical
          number of concurrent requests you would need to handle.
      •   Do not use MaxRequestsPerChild, especially at a low
          count which would cause frequent process churn.
      •   Remember that if you don't set MaxRequestsPerChild
          explicitly, it defaults to 10000.
      •   Use nginx as a front end proxy to isolate Apache from slow
          clients.
      •   Turn off keep alive in Apache when using nginx as a front
          end.




Key in eliminating unwanted CPU usage was therefore
avoiding process churn which is achieved by adjusting
the maximum allowed number of spare processes and
ensuring we aren't periodically recycling processes for no
good reason. We can though also reduce the number of
processes we need in the first place by adding nginx as a
proxy in front of Apache.
Prefork MPM Vs Worker MPM

[Chart: process scaling under the prefork MPM.]






All of what I have explained so far focused on prefork
MPM. I have concentrated on it because it magnifies the
problems that can arise. When people say Apache sucks
it is usually because they were using prefork MPM with
an inadequate configuration. Use of prefork MPM with a
nginx proxy will give you the best performance possible
if setup correctly. As I said before though, using worker
MPM is much more forgiving of you having a poor setup.
Visualising Scaling In Worker

[Chart: process scaling under the worker MPM.]






The reason for this is that for the same large default
value of MaxClients, the worker MPM will use far fewer
processes than the prefork MPM. This is because each
process will have 25 threads handling requests instead
of 1. With fewer processes, the worker MPM holds fewer
copies of your Python web application and so uses less
memory.
Compiled In Worker MPM Settings



                          StartServers              3
                          MinSpareThreads          75
                          MaxSpareThreads         250
                          ThreadsPerChild          25
                          MaxClients              400
                          MaxRequestsPerChild   10000






In the case of worker MPM, by default 3 processes would
be started initially with the compiled in defaults. With
MaxClients of 400 and ThreadsPerChild being 25, that
means a maximum of 16 processes would be created.
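The process count follows directly from the two settings, as this one-line calculation shows:

```python
def max_worker_processes(max_clients, threads_per_child):
    """Maximum number of worker MPM processes: each process serves
    ThreadsPerChild concurrent requests, so MaxClients bounds the total."""
    return max_clients // threads_per_child
```

With the compiled-in defaults of MaxClients 400 and ThreadsPerChild 25, this gives 16 processes.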
Automatic Scaling In Worker


      •   MinSpareThreads - Minimum number of idle threads
          available to handle request spikes.
      •   MaxSpareThreads - Maximum number of idle threads.






Settings related to scaling when using worker MPM refer
to threads whereas with prefork MPM they were in terms
of processes. MaxSpareThreads defaults to 250, which
equates to the equivalent of 10 processes.
Making Apache Suck Less (4)


      •   Ensure MaxSpareThreads is at least MinSpareThreads
          + ThreadsPerChild. If you don't, Apache will set it to that for
          you anyway.
      •   Suggested that MinSpareThreads and MaxSpareThreads
          be set as multiples of ThreadsPerChild.
      •   Don't set StartServers to be less than MinSpareThreads/
          ThreadsPerChild as it will delay start up of processes so as
          to reach minimum spare required.
      •   Don't set StartServers to be greater than
          MaxSpareThreads/ThreadsPerChild as processes will start
          to be killed off immediately.



One very important thing to note, is that although these
are expressed in terms of threads, Apache doesn't scale
at the thread level. The number of threads per process is
static. When scaling it is the same as prefork, a process
will either be created or killed. The decision though is
based on available threads instead.
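A worker MPM configuration following those guidelines might look like the following; the values are illustrative, chosen to satisfy the multiples rule above.

```apache
<IfModule mpm_worker_module>
ThreadsPerChild      25
MinSpareThreads      75     # 3 x ThreadsPerChild
MaxSpareThreads     250     # 10 x ThreadsPerChild
StartServers          3     # between 75/25 = 3 and 250/25 = 10
MaxClients          400
MaxRequestsPerChild   0
</IfModule>
```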
Worker Defaults More Forgiving



[Chart: the initial spike involves far fewer processes, followed by no churn at all.]






Running our simulation of random traffic from before
with a similar level of concurrent requests, although
we still had an initial spike in creating processes, no new
processes were needed after that, as we were within the
level specified by max spare threads. No churn means no
wasted CPU through continually creating and killing
processes. Using the compiled in defaults at least, this is
why worker MPM is more forgiving than prefork MPM.
Reducing Per Thread Memory Use


      •   MaxMemFree - Maximum amount of memory that the
          main allocator is allowed to hold without calling free().




                    MaxMemFree      256         # KBytes






As before, and especially if using nginx as a front end
proxy, one can adjust MaxClients, min and max spare
threads and perhaps bring down even further the
amount of resources used. A more important setting
though is MaxMemFree. This is the maximum amount of
memory the Apache per thread memory pool is allowed
to hold before calling free on memory. Prior to Apache
2.4, this was unbounded. In Apache 2.4 it is 2MB.
Making Apache Suck Less (5)


      •   Ensure that MaxMemFree is set and not left to be
          unbounded.
      •   Even on Apache 2.4, where it is 2MB, consider reducing the
          value further.




Saturday, 16 March 2013


Even at 2MB in Apache 2.4, this could mean that for 25
threads, 50MB can be held by the persistent memory
pools in each process. When running mod_wsgi, under
normal circumstances there should not be much call for
memory to be allocated from the per-request memory
pool. To be safe though, you should ensure MaxMemFree
is set, with a reduced value if possible.
Daemon Mode Of Apache/mod_wsgi

                                                  Server             Browser
                                                  Parent              Client




                                      Server      Server   Server
                          Processes
                                      Worker      Worker   Worker




                                      Daemon Process(es)   Daemon
                                                           Process




                                               Threads




Saturday, 16 March 2013


Now, the configuration of the prefork or worker MPM is
principally an issue when using what is called embedded
mode of mod_wsgi. That is, your WSGI application runs
inside the Apache server child worker processes, and it is
Apache's dynamic scaling algorithm that can cause us
grief when doing this. Using the worker MPM helps, but
an even safer alternative is to use mod_wsgi daemon
mode instead. In this case your WSGI application runs in
a separate set of managed processes.
Daemon Mode Configuration




         WSGIDaemonProcess myapp processes=3 threads=5

         WSGIScriptAlias / /some/path/wsgi.py \
             process-group=myapp application-group=%{GLOBAL}




Saturday, 16 March 2013


The main difference when using daemon mode is that
there is no automatic scaling of the number of
processes. The number of processes and threads is
instead fixed. Being fixed, everything is more predictable
and you only need to ensure you have sufficient capacity.
Using daemon mode, the need to have nginx as a front
end is reduced, as the Apache server child worker
processes serve much the same purpose in isolating the
WSGI application from slow clients.
Exclusively Using Daemon Mode


      •   WSGIRestrictEmbedded - Controls whether the Python
          interpreter is initialised in Apache server worker processes.




                          WSGIRestrictEmbedded        On




Saturday, 16 March 2013


Because the Apache server processes are now only acting
as a proxy, forwarding requests to the mod_wsgi
daemon processes, as well as serving static files, we don't
need to initialise the Python interpreter in the Apache
server processes. Process creation is again lightweight,
and we have sidestepped the need to pay so much
attention to the Apache MPM settings.
The Things That Make Apache Suck


      •   An algorithm for dynamically scaling processes which isn't
          particularly suited to embedded Python web applications.
      •   Default MPM and settings which magnify the issues which
          can arise with dynamic scaling when running Python web
          applications.
      •   A concurrency mechanism that can use a lot of memory for
          a high number of concurrent requests, especially around
          handling of keep alive connections.
      •   Defaults for memory pool sizes which cause Apache to be
          heavyweight on memory usage.


Saturday, 16 March 2013


So Apache can certainly be a challenging environment for
running Python web applications. The main pain points
are how its algorithm for dynamic scaling works and
memory requirements to support high concurrency. With
careful attention it is possible though to configure
Apache to reduce the problems these can cause.
Application Performance Monitoring




Saturday, 16 March 2013


The simulator I demonstrated can be used to try and
validate any configuration before you use it, but the
random nature of web site traffic means that it will not
be conclusive. This is where live monitoring of traffic in
your production web site provides a much better level of
feedback. New Relic is obviously the package I would like
to see you using, but any monitoring is better than none.
Capacity Analysis




Saturday, 16 March 2013


In New Relic, one of the reports it generates which is
particularly relevant to coming up with the best
processes/threads configuration is its capacity analysis
report. From this report one can see whether you have
provided enough capacity, whether you have
over-allocated and so are wasting memory, or whether
you are running your application over more hosts than
you need and are therefore paying more for hosting than necessary.
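The underlying idea can be sketched as a simple calculation (illustrative only; this is not how New Relic actually computes the report):

```python
def capacity_utilisation(processes, threads_per_process, busy_samples):
    """Peak fraction of configured request-handling capacity in use.

    busy_samples: counts of concurrently busy threads, sampled over
    the reporting period across all processes.
    """
    capacity = processes * threads_per_process
    return max(busy_samples) / capacity

# 3 daemon processes of 5 threads with a peak of 6 busy threads
# means well under half the configured capacity was ever in use.
print(capacity_utilisation(3, 5, [1, 4, 6, 2]))
```

A persistently low peak suggests you can shed processes or hosts; a peak near 1.0 suggests requests are queueing and you need more capacity.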
Capacity Analysis




Saturday, 16 March 2013


Although this talk has been about Apache/mod_wsgi,
this report is just as relevant to other WSGI hosting
mechanisms, such as gunicorn and uWSGI. Working at
New Relic and being able to see data coming in from a
wide variety of deployments it is really quite amazing
how poorly some servers are being set up, and not just
when Apache is being used. So if you are using New
Relic, I would really suggest paying a bit more attention
to this report. Doing so can help you make your server
run better and possibly save you money as well.
More Information


      •   Slides (with presenter notes).
          •   http://www.slideshare.net/GrahamDumpleton

      •   Apache/mod_wsgi mailing list (preferred contact point).
          •   http://groups.google.com/group/modwsgi

      •   New Relic (Application Performance Monitoring)
          •   http://newrelic.com
          •   http://newrelic.com/pycon (special 30 day promo code - pycon13)

      •   Personal blog posts on Apache/mod_wsgi and WSGI.
          •   http://blog.dscpl.com.au

      •   If you really really must bother me directly.
          •   Graham.Dumpleton@gmail.com
          •   @GrahamDumpleton



Saturday, 16 March 2013


And that is all I want to cover today. If you are after more
information, especially if you are interested in the
simulator I demonstrated, keep an eye on my blog for
more details of that sometime in the near future. If you
are interested in using New Relic to better configure your
WSGI server, then you can catch me in the expo hall after
the talk. Questions?

Contenu connexe

En vedette

Glimpse Pp You Are What You Absorb!
Glimpse Pp You Are What You Absorb!Glimpse Pp You Are What You Absorb!
Glimpse Pp You Are What You Absorb!Lori Jones
 
Deteccio maltractament infantil
Deteccio maltractament infantilDeteccio maltractament infantil
Deteccio maltractament infantilJordi Muner
 
Brochure oilandgas
Brochure oilandgasBrochure oilandgas
Brochure oilandgasapallares1
 
PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016
PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016
PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016Controltecnica
 
Targeting Your Extension Audience Through Social Media
Targeting Your Extension Audience Through Social MediaTargeting Your Extension Audience Through Social Media
Targeting Your Extension Audience Through Social MediaAndy Kleinschmidt
 
Presentación filtros y mantenimiento wabco
Presentación filtros y mantenimiento wabcoPresentación filtros y mantenimiento wabco
Presentación filtros y mantenimiento wabcoJonathan Ortiz
 
Educación por la Paz en 7º Grado Turno MañAna
Educación por la Paz en  7º Grado Turno MañAnaEducación por la Paz en  7º Grado Turno MañAna
Educación por la Paz en 7º Grado Turno MañAnaesc3de2
 
Capítulo 17 muestre de aceptació1
Capítulo 17 muestre de aceptació1Capítulo 17 muestre de aceptació1
Capítulo 17 muestre de aceptació1leodanabelardo
 

En vedette (20)

Semillero de informática iee
Semillero de informática ieeSemillero de informática iee
Semillero de informática iee
 
Fun Print
Fun PrintFun Print
Fun Print
 
Grupo2
Grupo2Grupo2
Grupo2
 
Glimpse Pp You Are What You Absorb!
Glimpse Pp You Are What You Absorb!Glimpse Pp You Are What You Absorb!
Glimpse Pp You Are What You Absorb!
 
Deteccio maltractament infantil
Deteccio maltractament infantilDeteccio maltractament infantil
Deteccio maltractament infantil
 
Lenovo
LenovoLenovo
Lenovo
 
SAP and Alfresco
SAP and AlfrescoSAP and Alfresco
SAP and Alfresco
 
La crisis económica en europa
La crisis económica en europaLa crisis económica en europa
La crisis económica en europa
 
O buraco negro
O buraco negroO buraco negro
O buraco negro
 
Espanhol
EspanholEspanhol
Espanhol
 
WTM | C-Magazine n 27
WTM | C-Magazine n 27WTM | C-Magazine n 27
WTM | C-Magazine n 27
 
Proyecto de digitalización documental asam
Proyecto de digitalización documental asamProyecto de digitalización documental asam
Proyecto de digitalización documental asam
 
Bufy lana cool
Bufy lana coolBufy lana cool
Bufy lana cool
 
Brochure oilandgas
Brochure oilandgasBrochure oilandgas
Brochure oilandgas
 
PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016
PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016
PROMOCIÓN EQUIPOS BÁSICOS LABORATORIO 2015-2016
 
Targeting Your Extension Audience Through Social Media
Targeting Your Extension Audience Through Social MediaTargeting Your Extension Audience Through Social Media
Targeting Your Extension Audience Through Social Media
 
Presentación filtros y mantenimiento wabco
Presentación filtros y mantenimiento wabcoPresentación filtros y mantenimiento wabco
Presentación filtros y mantenimiento wabco
 
Educación por la Paz en 7º Grado Turno MañAna
Educación por la Paz en  7º Grado Turno MañAnaEducación por la Paz en  7º Grado Turno MañAna
Educación por la Paz en 7º Grado Turno MañAna
 
Capítulo 17 muestre de aceptació1
Capítulo 17 muestre de aceptació1Capítulo 17 muestre de aceptació1
Capítulo 17 muestre de aceptació1
 
Sistemas operativos moviles Android
Sistemas operativos moviles  AndroidSistemas operativos moviles  Android
Sistemas operativos moviles Android
 

Similaire à PyCon US 2013 Making Apache suck less for hosting Python web applications

Panther loves Symfony apps
Panther loves Symfony appsPanther loves Symfony apps
Panther loves Symfony appsSimone D'Amico
 
Scale Apache with Nginx
Scale Apache with NginxScale Apache with Nginx
Scale Apache with NginxBud Siddhisena
 
Web servers presentacion
Web servers presentacionWeb servers presentacion
Web servers presentacionKiwi Science
 
Django book15 caching
Django book15 cachingDjango book15 caching
Django book15 cachingShih-yi Wei
 
Benchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible DisastersBenchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible DisastersMongoDB
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Serversupertom
 
Joomla! Performance on Steroids
Joomla! Performance on SteroidsJoomla! Performance on Steroids
Joomla! Performance on SteroidsSiteGround.com
 
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Chris Tankersley
 
High Performance Web Sites
High Performance Web SitesHigh Performance Web Sites
High Performance Web SitesRavi Raj
 
PyCon US 2012 - Web Server Bottlenecks and Performance Tuning
PyCon US 2012 - Web Server Bottlenecks and Performance TuningPyCon US 2012 - Web Server Bottlenecks and Performance Tuning
PyCon US 2012 - Web Server Bottlenecks and Performance TuningGraham Dumpleton
 
The Art of Message Queues - TEKX
The Art of Message Queues - TEKXThe Art of Message Queues - TEKX
The Art of Message Queues - TEKXMike Willbanks
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with BlackfireMarko Mitranić
 
WordPress Development Tools and Best Practices
WordPress Development Tools and Best PracticesWordPress Development Tools and Best Practices
WordPress Development Tools and Best PracticesDanilo Ercoli
 
Growing MongoDB on AWS
Growing MongoDB on AWSGrowing MongoDB on AWS
Growing MongoDB on AWScolinthehowe
 

Similaire à PyCon US 2013 Making Apache suck less for hosting Python web applications (20)

Panther loves Symfony apps
Panther loves Symfony appsPanther loves Symfony apps
Panther loves Symfony apps
 
Scale Apache with Nginx
Scale Apache with NginxScale Apache with Nginx
Scale Apache with Nginx
 
ForkJoinPools and parallel streams
ForkJoinPools and parallel streamsForkJoinPools and parallel streams
ForkJoinPools and parallel streams
 
Web servers presentacion
Web servers presentacionWeb servers presentacion
Web servers presentacion
 
2016 03 15_biological_databases_part4
2016 03 15_biological_databases_part42016 03 15_biological_databases_part4
2016 03 15_biological_databases_part4
 
Django book15 caching
Django book15 cachingDjango book15 caching
Django book15 caching
 
webservers
webserverswebservers
webservers
 
Benchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible DisastersBenchmarking, Load Testing, and Preventing Terrible Disasters
Benchmarking, Load Testing, and Preventing Terrible Disasters
 
Apache Traffic Server
Apache Traffic ServerApache Traffic Server
Apache Traffic Server
 
Joomla! Performance on Steroids
Joomla! Performance on SteroidsJoomla! Performance on Steroids
Joomla! Performance on Steroids
 
Solu
SoluSolu
Solu
 
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
Your Inner Sysadmin - Tutorial (SunshinePHP 2015)
 
High Performance Web Sites
High Performance Web SitesHigh Performance Web Sites
High Performance Web Sites
 
PyCon US 2012 - Web Server Bottlenecks and Performance Tuning
PyCon US 2012 - Web Server Bottlenecks and Performance TuningPyCon US 2012 - Web Server Bottlenecks and Performance Tuning
PyCon US 2012 - Web Server Bottlenecks and Performance Tuning
 
The Art of Message Queues - TEKX
The Art of Message Queues - TEKXThe Art of Message Queues - TEKX
The Art of Message Queues - TEKX
 
Concurrency in ruby
Concurrency in rubyConcurrency in ruby
Concurrency in ruby
 
2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire2019 StartIT - Boosting your performance with Blackfire
2019 StartIT - Boosting your performance with Blackfire
 
Introduce Django
Introduce DjangoIntroduce Django
Introduce Django
 
WordPress Development Tools and Best Practices
WordPress Development Tools and Best PracticesWordPress Development Tools and Best Practices
WordPress Development Tools and Best Practices
 
Growing MongoDB on AWS
Growing MongoDB on AWSGrowing MongoDB on AWS
Growing MongoDB on AWS
 

Plus de Graham Dumpleton

Implementing a decorator for thread synchronisation.
Implementing a decorator for thread synchronisation.Implementing a decorator for thread synchronisation.
Implementing a decorator for thread synchronisation.Graham Dumpleton
 
Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.Graham Dumpleton
 
“warpdrive”, making Python web application deployment magically easy.
“warpdrive”, making Python web application deployment magically easy.“warpdrive”, making Python web application deployment magically easy.
“warpdrive”, making Python web application deployment magically easy.Graham Dumpleton
 
Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.
Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.
Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.Graham Dumpleton
 
OpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSOpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSGraham Dumpleton
 
Automated Image Builds in OpenShift and Kubernetes
Automated Image Builds in OpenShift and KubernetesAutomated Image Builds in OpenShift and Kubernetes
Automated Image Builds in OpenShift and KubernetesGraham Dumpleton
 
PyCon HK 2015 - Monitoring the performance of python web applications
PyCon HK 2015 -  Monitoring the performance of python web applicationsPyCon HK 2015 -  Monitoring the performance of python web applications
PyCon HK 2015 - Monitoring the performance of python web applicationsGraham Dumpleton
 
PyCon AU 2015 - Using benchmarks to understand how wsgi servers work
PyCon AU 2015  - Using benchmarks to understand how wsgi servers workPyCon AU 2015  - Using benchmarks to understand how wsgi servers work
PyCon AU 2015 - Using benchmarks to understand how wsgi servers workGraham Dumpleton
 
PyCon NZ 2013 - Advanced Methods For Creating Decorators
PyCon NZ 2013 - Advanced Methods For Creating DecoratorsPyCon NZ 2013 - Advanced Methods For Creating Decorators
PyCon NZ 2013 - Advanced Methods For Creating DecoratorsGraham Dumpleton
 
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.Graham Dumpleton
 
PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2Graham Dumpleton
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsGraham Dumpleton
 
DjangoCon US 2011 - Monkeying around at New Relic
DjangoCon US 2011 - Monkeying around at New RelicDjangoCon US 2011 - Monkeying around at New Relic
DjangoCon US 2011 - Monkeying around at New RelicGraham Dumpleton
 

Plus de Graham Dumpleton (14)

Implementing a decorator for thread synchronisation.
Implementing a decorator for thread synchronisation.Implementing a decorator for thread synchronisation.
Implementing a decorator for thread synchronisation.
 
Not Tom Eastman
Not Tom EastmanNot Tom Eastman
Not Tom Eastman
 
Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.Data analytics in the cloud with Jupyter notebooks.
Data analytics in the cloud with Jupyter notebooks.
 
“warpdrive”, making Python web application deployment magically easy.
“warpdrive”, making Python web application deployment magically easy.“warpdrive”, making Python web application deployment magically easy.
“warpdrive”, making Python web application deployment magically easy.
 
Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.
Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.
Hear no evil, see no evil, patch no evil: Or, how to monkey-patch safely.
 
OpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaSOpenShift, Docker, Kubernetes: The next generation of PaaS
OpenShift, Docker, Kubernetes: The next generation of PaaS
 
Automated Image Builds in OpenShift and Kubernetes
Automated Image Builds in OpenShift and KubernetesAutomated Image Builds in OpenShift and Kubernetes
Automated Image Builds in OpenShift and Kubernetes
 
PyCon HK 2015 - Monitoring the performance of python web applications
PyCon HK 2015 -  Monitoring the performance of python web applicationsPyCon HK 2015 -  Monitoring the performance of python web applications
PyCon HK 2015 - Monitoring the performance of python web applications
 
PyCon AU 2015 - Using benchmarks to understand how wsgi servers work
PyCon AU 2015  - Using benchmarks to understand how wsgi servers workPyCon AU 2015  - Using benchmarks to understand how wsgi servers work
PyCon AU 2015 - Using benchmarks to understand how wsgi servers work
 
PyCon NZ 2013 - Advanced Methods For Creating Decorators
PyCon NZ 2013 - Advanced Methods For Creating DecoratorsPyCon NZ 2013 - Advanced Methods For Creating Decorators
PyCon NZ 2013 - Advanced Methods For Creating Decorators
 
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
PyCon AU 2010 - Getting Started With Apache/mod_wsgi.
 
PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
 
DjangoCon US 2011 - Monkeying around at New Relic
DjangoCon US 2011 - Monkeying around at New RelicDjangoCon US 2011 - Monkeying around at New Relic
DjangoCon US 2011 - Monkeying around at New Relic
 

PyCon US 2013 Making Apache suck less for hosting Python web applications

  • 1. Making Apache suck less for hosting Python web applications Graham Dumpleton PyCon US - March 2013 Saturday, 16 March 2013
  • 2. Following Along http://www.slideshare.net/GrahamDumpleton Slides contains presenter notes! Saturday, 16 March 2013 If you want to follow along with the slides for this talk on your laptop, you can view them on slideshare.net. The slides do have my presenter notes, so you can also read it all again later if you are too busy surfing the net, or didn't catch some point.
  • 3. Apache Sucks • Is too hard to configure. • Is bloated and uses heaps of memory. • Is slow and doesn't perform very well. • Is not able to handle a high number of concurrent requests. Saturday, 16 March 2013 These days there seems to be a never ending line of people who will tell you that Apache sucks and that you are crazy if you use it, especially for hosting Python web applications. What is the truth? Do the people who make such claims actually understand how Apache works or are they just repeating what someone else told them?
  • 4. Apache Sucks • Is too hard to configure. • Is bloated and uses heaps of memory. • Is slow and doesn't perform very well. • Is not able to handle a high number of concurrent requests. Saturday, 16 March 2013 As the author of the mod_wsgi module for Apache, what I want to do in this talk is go through and look at what some of the pain points are when configuring Apache to run Python web applications. The intent is that you can walk away with a bit more insight into how Apache works and what is required to properly setup Apache and mod_wsgi. So, if you like, I am going to explain how to make Apache suck less.
  • 5. Where Has All My Memory Gone • Reasons for excessive memory usage. • Python web applications are fat to start with. • Poor choice of multiprocessing module (MPM). • Poor choice of configuration for the MPM used. • Loading of Apache modules you aren't using. • Size of Apache memory pools for each thread. • Inability to benefit from copy on write. Saturday, 16 March 2013 The biggest criticism which seems to be levelled at Apache is that it is bloated and uses too much memory. There are various reasons Apache can use a lot of memory. Many of these are under the control of the user and not necessarily a failing of Apache though.
  • 6. Why Is Response Time So Bad • Reasons for slow response times. • Not enough capacity configured to handle throughput. • Keep alive causing artificial reduction in capacity. • Machine slowing down due to frequent process recycling. • High cost of loading WSGI applications on process startup. Saturday, 16 March 2013 Another criticism is why is Apache so slow. Like with memory usage, this can also have a lot to do with how Apache has been configured. In practice, if Apache is simply setup properly for the specifics of running dynamic Python web applications, and takes into consideration the constraints of the system it is being run on, neither of these should be an issue.
  • 7. Streamlining The Apache Installation LoadModule authz_host_module modules/mod_authz_host.so LoadModule mime_module modules/mod_mime.so LoadModule rewrite_module modules/mod_rewrite.so LoadModule wsgi_module modules/mod_wsgi.so Saturday, 16 March 2013 The first thing one can do is to strip down what modules Apache is loading. Because Apache is a workhorse that can be used for many different tasks, it comes with a range of pluggable modules. There is likely going to be any number of modules getting loaded you aren't using. To cut down on base memory used by Apache itself, you should disable all Apache modules you are not using.
  • 8. Python Web Applications Are Fat • Raw Web Server • Apache - Streamlined (2 MB) • Python WSGI Hello World • Apache/mod_wsgi - Streamlined (5 MB) • Apache/mod_wsgi - Kitchen Sink (10MB) • Gunicorn - Sync Worker (10MB) • Real Python Web Application • Django (20-100MB+) Saturday, 16 March 2013 Beyond the server, it has to be recognised that any use of Python will cause an immediate increase in memory used. Load a typical web application, along with all the modules from the standard library it requires, as well as third party modules and memory use will grow quite quickly. The actual base memory consumed by the web server at that point can be quite small in comparison.
  • 9. Python Web Applications Are Fat • Raw Web Server • Apache - Streamlined (2 MB) • Python WSGI Hello World • Apache/mod_wsgi - Streamlined (5 MB) • Apache/mod_wsgi - Kitchen Sink (10MB) • Gunicorn - Sync Worker (10MB) • Real Python Web Application • Django (20-100MB+) Saturday, 16 March 2013 Overall, it shouldn't really matter what WSGI server you use, the Python interpreter and the Python web application itself should always use a comparable amount of memory for a comparable configuration. The laws of nature don't suddenly change when you start using Apache to host a Python web application. Memory used by the Python web application itself in a single process should not suddenly balloon out for no reason.
  • 10. Processes Vs Threads Server Browser Parent Client Server Server Server Processes Worker Worker Worker Threads Saturday, 16 March 2013 An appearance therefore of increased memory usage is more likely going to be due to differences in the server architecture. When I say server architecture, I specifically mean the mix of processes vs threads that are used by the server hosting the WSGI application to handle requests. Using processes in preference to threads will obviously mean that more memory is being used.
  • 11. Different Server Architectures • Apache • Prefork MPM - Multiple single threaded processes. • Worker MPM - Multiple multi threaded processes. • WinNT MPM - Single multi threaded processes. • Gunicorn • Sync Worker - Multiple single threaded processes. MPM = Multiprocessing Module Saturday, 16 March 2013 The big problem in this respect is that beginners know no better and will use whatever the default configuration is that their server distribution provides. For Apache, which is often supplied with the prefork multiprocessing module, or MPM, this can very easily cause problems, because it uses single threaded processes.
  • 12. Default Server Configuration • Apache • Prefork MPM - 150 processes (maximum) / 1 thread per process. • Worker MPM - 6 processes (maximum) / 25 threads per process. • WinNT MPM - 1 process (fixed) / 150 threads per process. • Gunicorn • Sync Worker - 1 process (fixed) / 1 thread per process. Saturday, 16 March 2013 Although prefork MPM may only initially start out with a single process, it can automatically scale out to 150 processes. That is 150 copies of your Python web application. At 20MB per process that is already 3GB and at 20MB that would be considered a small Python web application. In contrast, gunicorn sync worker defaults to a single process and single thread and doesn't scale. The memory requirement of gunicorn would therefore stay the same over time.
  • 13. Making Apache Suck Less (1) • Don't use Apache default configurations for hosting Python web applications. • Don't use the prefork MPM unless you know how to configure Apache properly, use the worker MPM, it is more forgiving. • Don't allow Apache to automatically scale out the number of processes over too great a range. • Don't try and use a single Apache instance to host Python, PHP, Perl web applications at the same time. Saturday, 16 March 2013 So, whatever you do, don't use the default configuration that comes with your server distribution. For Python web applications you generally can't avoid having to tune it. This is because the Apache defaults are setup for static file serving and PHP applications. Especially don't try and use the same Apache instance to host Python web applications at the same time as running PHP or Perl applications as each has different configuration requirements.
  • 14. What MPM Are You Using? $ /usr/sbin/httpd -V | grep 'Server MPM' Server MPM: Prefork Saturday, 16 March 2013 How do you work out which MPM you are using? Prior to Apache 2.4 the type of MPM being used was defined when Apache was being compiled and was statically linked into the Apache executable. From Apache 2.4, the MPM can also be dynamically loaded and so defined at runtime by the Apache configuration. Either way, you can determine the MPM in use by running the Apache executable with the '-V' option.
  • 15. WSGI multiprocess/multithread wsgi.run_once wsgi.multiprocess wsgi.multithread CGI TRUE TRUE FALSE Prefork FALSE TRUE FALSE Worker FALSE TRUE TRUE WinNT FALSE FALSE TRUE Saturday, 16 March 2013 Another way of determining the specific process architecture in use is by consulting the multiprocess and multithread attributes passed in the WSGI environ with each request. Neither of these though will actually tell you how many processes or threads are in use. For that you need to start looking at the Apache configuration itself.
  • 16. Defaults From Configuration File extra/httpd-mpm.conf <IfModule mpm_prefork_module> <IfModule mpm_worker_module> StartServers 1 StartServers 2 MinSpareServers 1 MaxClients 150 MaxSpareServers 10 MinSpareThreads 25 MaxClients 150 MaxSpareThreads 75 MaxRequestsPerChild 0 ThreadsPerChild 25 MaxRequestsPerChild 0 </IfModule> </IfModule> Saturday, 16 March 2013 This is where we can actually end up in a trap. For the standard Apache configuration as provided with the Apache Software Foundation's distribution, although there are example MPM settings provided, that configuration file isn't actually included by default. The settings in that file are also different to what is compiled into Apache, so you can't even use it as a guide to what the compiled in defaults are.
  • 17. Compiled In Prefork MPM Settings StartServers 5 MinSpareServers 5 MaxSpareServers 10 MaxClients 256 MaxRequestsPerChild 10000 Saturday, 16 March 2013 So although I said before that the prefork MPM could scale up to 150 processes automatically, that was on the assumption that the default settings in the Apache configuration file were actually used. If those settings aren't used, then it is instead 256 processes, making it even worse. Some people recommend throwing away the default configuration files and starting from scratch meaning one uses the more lethal compiled in settings.
  • 18. Meaning Of Prefork Settings • StartServers - Number of child server processes created at startup. • MaxClients - Maximum number of connections that will be processed simultaneously. • MaxRequestsPerChild - Limit on the number of requests that an individual child server will handle during its life. Saturday, 16 March 2013 For those who are not familiar with these settings, what do they actually mean. StartServers is the initial number of processes created to handle requests. Because prefork uses single threaded processes, the maximum number of processes ends up being dictated by MaxClients.
  • 19. Automatic Scaling In Prefork • MinSpareServers - Minimum number of idle child server processes. • MaxSpareServers - Maximum number of idle child server processes. Saturday, 16 March 2013 The settings which need more explanation are the min and max spare processes. The purpose of these is to control how Apache dynamically adjusts the number of processes being used to handle requests.
• 20. Algorithm For Scaling In Prefork

if idle_process_count > max_spare_servers:
    kill a single process
elif idle_process_count < min_spare_servers:
    spawn one or more processes

Saturday, 16 March 2013 In very simple terms, what Apache does is wake up each second and looks at how many idle processes it has at that point which are not handling requests. If it has more idle processes than the maximum specified, it will kill off a single process. If it has less idle processes than the minimum spare required it will spawn more. How many it spawns will depend on whether it had spawned any in the previous check and whether it is creating them quickly enough.
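The once-a-second check described above can be sketched in Python. This is a simplified model in the spirit of the simulator used for the charts that follow, not Apache's actual code; the real logic lives in server/mpm/prefork/prefork.c and the doubling, capped spawn rate is only approximated here.

```python
def simulate_prefork(concurrent, start, min_spare, max_spare, max_clients):
    """Simplified model of the prefork parent's once-a-second check.

    `concurrent` lists the number of busy processes at each check.
    Returns the total process count after each check. Approximations:
    the spawn rate doubles (capped at 32) while below MinSpareServers
    and resets otherwise; one process is killed per check when above
    MaxSpareServers.
    """
    total, rate, history = start, 1, []
    for busy in concurrent:
        idle = max(total - busy, 0)
        if idle > max_spare:
            total -= 1                  # kill a single idle process
            rate = 1
        elif idle < min_spare:
            spawn = min(rate, max_clients - total)
            total += spawn              # spawn more, doubling the rate
            rate = min(rate * 2, 32)
        else:
            rate = 1
        history.append(total)
    return history
```

Running it with an idle server and StartServers 1, MinSpareServers 5 shows the pool doubling up through 2, 4 and 8 processes; with excess idle processes it sheds exactly one per check.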
  • 21. Visualising Scaling In Prefork Concurrent Requests Process Creation Total No Of Processes Idle Processes Saturday, 16 March 2013 When presented with the MPM settings being used, even I have to think hard sometimes about what the result of those settings will be. To make it easier to understand, I run the settings through a simulator and chart the results. In this example the number of concurrent requests is ramped up from zero and when maximum capacity is reached, it is ramped back down to zero again.
  • 22. Visualising Scaling In Prefork Concurrent Requests Create Kill Process Creation Total No Of Processes Idle Processes Saturday, 16 March 2013 By using a simulator and visualising the results, it becomes much easier to understand how Apache will behave. We can see when Apache would create processes or kill them off. What the total number of processes will be as it scales, and how that is being driven by trying to always maintain a pool of idle processes within the bounds specified.
  • 23. Floor On The Number Of Processes MaxSpareServers MinSpareServers Saturday, 16 March 2013 Zooming in on the chart for the total number of processes available to handle requests, one thing that stands out for example is that there is an effective floor on the number of processes which will be kept around. No processes will be killed if we are at or below this level.
• 24. Starting Less Than The Minimum

StartServers          1
MinSpareServers       5
MaxSpareServers      10
MaxClients          256

Saturday, 16 March 2013 Too often, people who have no idea how to configure Apache get in and start mucking around with these values, not understanding the implications of what they are doing. The simulator is great in being able to give a quick visual indicator as to whether something is amiss. One example is where the number of servers to be started is less than the minimum number of spare servers.
• 25. Delayed Start Of Processes Creating Processes Saturday, 16 March 2013 What happens in this case is that as soon as Apache starts doing its checks, it sees that it isn't actually running enough processes to satisfy the requirement for the minimum number of idle processes. It therefore starts creating more, doubling the number each time until it has started enough. Rather than actually starting all processes immediately, in this case it takes 3 seconds, thus potentially limiting the initial capacity of the server.
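The doubling behaviour means the start up delay can be estimated directly. A hedged sketch, assuming an idle server, one maintenance check per second, and ignoring the cap Apache places on the spawn rate (so only valid for small pools):

```python
def seconds_to_reach_min_spare(start_servers, min_spare_servers):
    """Checks (seconds) of doubling spawns needed for the idle pool
    to reach MinSpareServers on an otherwise idle server."""
    total, rate, seconds = start_servers, 1, 0
    while total < min_spare_servers:
        total += rate    # spawn, then double the rate for the next check
        rate *= 2
        seconds += 1
    return seconds
```

For StartServers 1 and MinSpareServers 5 this gives the 3 seconds quoted above.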
• 26. Starting Up Too Many Servers

StartServers        100
MinSpareServers       5
MaxSpareServers      10
MaxClients          256

Saturday, 16 March 2013 At the other end of the scale, we have people who change the number of servers to be started to be greater than the maximum spare allowed.
• 27. Immediate Kill Off Of Processes Killing Processes Saturday, 16 March 2013 This time when Apache starts doing its checks, it finds it has more idle processes than allowed and starts killing them off at a rate of 1 per second. Presuming no traffic came in that necessitated those processes actually existing, it would take over a minute to kill off all the excess processes.
  • 28. Making Apache Suck Less (2) • Ensure MaxSpareServers is greater than MinSpareServers. If you don't, Apache will set MaxSpareServers to be MinSpareServers+1 for you anyway. • Don't set StartServers to be less than MinSpareServers as it will delay start up of processes so as to reach minimum spare required. • Don't set StartServers to be greater than MaxSpareServers as processes will start to be killed off immediately. Saturday, 16 March 2013 To avoid such delayed process creation, or immediate killing off of processes on startup, you should ensure that the value of StartServers is bounded by MinSpareServers and MaxSpareServers.
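Those three rules can be captured as a small configuration lint. This is a hypothetical helper written for illustration, not part of Apache or mod_wsgi:

```python
def check_prefork_settings(start_servers, min_spare, max_spare):
    """Flag the prefork misconfigurations described above.
    Returns a list of warnings, empty if the settings look sane."""
    warnings = []
    if max_spare <= min_spare:
        warnings.append("MaxSpareServers will be forced to MinSpareServers+1")
    if start_servers < min_spare:
        warnings.append("StartServers < MinSpareServers: delayed process start up")
    if start_servers > max_spare:
        warnings.append("StartServers > MaxSpareServers: processes killed at start")
    return warnings
```

The two bad configurations from the earlier slides each trigger exactly one warning here.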
  • 29. Overheads Of Process Creation • Initialisation of the Python interpreter. • Loading of the WSGI application. • Loading of required standard library modules. • Loading of required third party modules. • Initialisation of the WSGI application. Saturday, 16 March 2013 Why do we care about unnecessary process creation? After all, aren't the processes just a fork of the Apache parent process and so cheap to create? The problem is that unlike mod_php where PHP is initialised and all extension modules preloaded into the Apache parent process, when using mod_wsgi, Python initialisation is deferred until after the processes are forked. Any application code and required Python modules are then lazily loaded.
• 30. Preloading Is A Security Risk • You don't necessarily know which WSGI application to load until the first request arrives for it. • Python web applications aren't usually designed properly for preloading prior to forking of worker processes. • All code run in the Apache parent process is run as root. Saturday, 16 March 2013 If initialising Python and loading the WSGI application in the worker process can be expensive, why can't we preload everything in the Apache parent process before the worker processes are forked? Even if a Python web application were designed to be able to be preloaded and run properly after the process was forked, the key issue is that a user's application code would run as root on startup if executed in the parent process, and that is one very big security risk.
  • 31. Preloading Causes Memory Leaks • The Python interpreter will leak memory into the parent process when an Apache restart occurs. Saturday, 16 March 2013 Add to that, because of what Python does (or should I say doesn't do) when the interpreter is destroyed, combined with the way in which Apache reloads mod_wsgi when restarting, the Python interpreter will leak memory into the Apache parent process. If Apache restarts are done on a regular basis, the size of the Apache parent will keep growing over time and thus so will the forked worker processes as well.
  • 32. Downsides Of Not Preloading • Additional CPU load when creating new worker process. • Worker processes will not be immediately ready. • No saving in memory usage from copy on write. Saturday, 16 March 2013 So running within Apache we have no choice and have to defer initialisation of Python and loading of the WSGI application until after the child processes are forked. This causes additional CPU load each time a process is started up and the time taken will also mean that requests will be held up. Finally, because we are not preloading in the parent, we cannot benefit from reduced memory usage from copy on write features of the operating system.
• 33. Avoiding Process Creation Saturday, 16 March 2013 Process startup is therefore expensive. We want to avoid doing it and certainly don't want it occurring at a time which is inconvenient. Unfortunately, especially with prefork MPM, allowing Apache to dynamically create processes can result in a lot of process churn if you are not careful. The more problematic situation is where there is a sudden burst of traffic and there are not enough processes already running to handle it.
  • 34. Sudden Burst In Traffic Sudden need to create processes. Saturday, 16 March 2013 In the worst case scenario, the increased load from creating processes when a sustained traffic spike occurs, could see the whole system slow down. The slow down can make Apache think it isn't creating enough processes quickly enough, so it keeps creating more and more. Pretty quickly it has created the maximum number of processes, with the combined CPU load of loading the WSGI application for all of them, causing the server to grind to a halt.
  • 35. Constant Churn Of Processes Continual killing off and creation of processes. Saturday, 16 March 2013 Even after the processes have been created we can still see process churn. This is because Apache doesn't look at request load over time, it only looks at the number of concurrent requests running at the time of the check. This can actually bounce around quite a lot each second. There will therefore be a continual churn of processes as Apache thinks there is more than required and kills some off and then when it again believes it does not have enough and creates more.
• 36. Raising The Floor

StartServers          5
MinSpareServers       5
MaxSpareServers     150
MaxClients          256

Saturday, 16 March 2013 What if we raise that floor on the number of processes as determined by the MaxSpareServers setting? We said that so long as the number of processes was below this level, none would be killed off. Lets try then setting that to a level above the average number of processes in use.
  • 37. Number Of Processes Plateaus Initial Spike Reduced Incidence Of Process Creation Saturday, 16 March 2013 What will happen is that although there will be an initial spike in the number of processes created, after that there will only be a slow increase as the number of processes finds its natural level and plateaus. So long as we stay below MaxSpareServers we will avoid process churn.
  • 38. Breaching The Maximum Falls Back To Maximum After Spike Saturday, 16 March 2013 We haven't set the maximum number of spare processes to the maximum allowed clients quite yet though. So it is still possible that when a spike occurs we will create more than the maximum spare allowed. When traffic recedes, the number of processes will reduce back to the level of the maximum and no further. Finding the optimal level for the maximum spare processes to avoid churn can be tricky, but one can work it out by monitoring utilisation.
  • 39. Maximum Number Of Requests MaxRequestsPerChild 0 Saturday, 16 March 2013 Do be aware though that all this work in tuning the settings can be undone by the MaxRequestsPerChild setting. We want to avoid process churn. It is no good setting this to such a low value that this would cause process recycling after a very short period of time, as it just reintroduces process churn in another way. It is better to have this be zero resulting in processes staying persistent in memory until shutdown.
  • 40. Handling Large Number Of Clients MaxClients 256 Saturday, 16 March 2013 Now the only reason that the defaults for Apache specify such a large value for MaxClients, and thus a large number of processes when single threading is used, is because of slow clients and keep alive. A high number is required to support concurrent sessions from many users. If using single threaded processes though, this means you will need to have much more memory available.
• 41. Front End Proxy And KeepAlive (diagram: Browser/Client → nginx Front End Server → Apache Parent Server and Worker Server Processes) Saturday, 16 March 2013 A much better solution is to put a nginx proxy in front of Apache. The nginx server will isolate Apache from slow clients as it will only forward a request when it is completely available and can be handled immediately. The nginx server can also handle keep alive connections meaning it can be turned off in Apache. This allows us to significantly reduce the number of processes needed for Apache to handle the same amount of traffic as before.
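A minimal sketch of such a front end, assuming Apache has been moved to listen on port 8080 on the same host; the port and forwarded headers here are illustrative choices, not requirements.

```nginx
# nginx front end: absorbs slow clients and keep alive connections,
# only forwarding complete requests through to Apache on port 8080.
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

With nginx handling keep alive, set "KeepAlive Off" in the Apache configuration so each Apache process is released as soon as a response is sent.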
• 42. Making Apache Suck Less (3) • Set MaxSpareServers at a level above the typical number of concurrent requests you would need to handle. • Do not use MaxRequestsPerChild, especially at a low count which would cause frequent process churn. • Remember that if you don't set MaxRequestsPerChild explicitly, it defaults to 10000. • Use nginx as a front end proxy to isolate Apache from slow clients. • Turn off keep alive in Apache when using nginx as a front end. Saturday, 16 March 2013 Key in eliminating unwanted CPU usage was therefore avoiding process churn which is achieved by adjusting the maximum allowed number of spare processes and ensuring we aren't periodically recycling processes for no good reason. We can though also reduce the number of processes we need in the first place by adding nginx as a proxy in front of Apache.
  • 43. Prefork MPM Vs Worker MPM Prefork MPM Saturday, 16 March 2013 All of what I have explained so far focused on prefork MPM. I have concentrated on it because it magnifies the problems that can arise. When people say Apache sucks it is usually because they were using prefork MPM with an inadequate configuration. Use of prefork MPM with a nginx proxy will give you the best performance possible if setup correctly. As I said before though, using worker MPM is much more forgiving of you having a poor setup.
  • 44. Visualising Scaling In Prefork Worker MPM Saturday, 16 March 2013 The reasons for this are that for the same large default value of MaxClients, worker MPM will use a lot less processes than prefork MPM. This is because each process will have 25 threads handling requests instead of 1. Being less processes, worker MPM will therefore see less copies of your Python web application and so less memory usage.
• 45. Compiled In Worker MPM Settings

StartServers          3
MinSpareThreads      75
MaxSpareThreads     250
ThreadsPerChild      25
MaxClients          400
MaxRequestsPerChild 10000

Saturday, 16 March 2013 In the case of worker MPM, by default 3 processes would be started initially with the compiled in defaults. With MaxClients of 400 and ThreadsPerChild being 25, that means a maximum of 16 processes would be created.
  • 46. Automatic Scaling In Worker • MinSpareThreads - Minimum number of idle threads available to handle request spikes. • MaxSpareThreads - Maximum number of idle threads. Saturday, 16 March 2013 Settings related to scaling when using worker MPM refer to threads whereas with prefork MPM they were in terms of processes. MaxSpareThreads defaults to 250, which equates to the equivalent of 10 processes.
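Because worker still creates and kills whole processes when scaling, it helps to translate the thread-based settings back into process counts. A small sketch, assuming the thread values are exact multiples of ThreadsPerChild:

```python
def worker_process_counts(max_clients, threads_per_child, max_spare_threads):
    """Translate worker MPM thread settings back into process counts,
    since worker scales whole processes, not individual threads."""
    max_processes = max_clients // threads_per_child
    max_spare_processes = max_spare_threads // threads_per_child
    return max_processes, max_spare_processes
```

With the compiled in defaults this gives at most 16 processes, with the equivalent of 10 spare, matching the figures above.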
• 47. Making Apache Suck Less (4) • Ensure MaxSpareThreads is at least MinSpareThreads + ThreadsPerChild. If you don't, Apache will set it to that for you anyway. • Suggested that MinSpareThreads and MaxSpareThreads be set as multiples of ThreadsPerChild. • Don't set StartServers to be less than MinSpareThreads/ThreadsPerChild as it will delay start up of processes so as to reach minimum spare required. • Don't set StartServers to be greater than MaxSpareThreads/ThreadsPerChild as processes will start to be killed off immediately. Saturday, 16 March 2013 One very important thing to note, is that although these are expressed in terms of threads, Apache doesn't scale at the thread level. The number of threads per process is static. When scaling it is the same as prefork, a process will either be created or killed. The decision though is based on available threads instead.
• 48. Worker Defaults More Forgiving Initial Spike Is For Much Fewer Processes Followed By No Churn At All Saturday, 16 March 2013 Running our simulation of random traffic from before with a similar level of concurrent requests and although we still had an initial spike in creating processes, no new processes were needed after that, as we were within the level specified by max spare threads. No churn means no wasted CPU through continually creating and killing processes. Using the compiled in defaults at least, this is why worker MPM is more forgiving than prefork MPM.
• 49. Reducing Per Thread Memory Use • MaxMemFree - Maximum amount of memory that the main allocator is allowed to hold without calling free(). MaxMemFree 256 (value in KBytes) Saturday, 16 March 2013 As before, and especially if using nginx as a front end proxy, one can adjust MaxClients, min and max spare threads and perhaps bring down even further the amount of resources used. A more important setting though is MaxMemFree. This is the maximum amount of memory the Apache per thread memory pool is allowed to hold before calling free on memory. Prior to Apache 2.4, this was unbounded. In Apache 2.4 it is 2MB.
• 50. Making Apache Suck Less (5) • Ensure that MaxMemFree is set and not left to be unbounded. • Even on Apache 2.4 where it is 2MB, consider reducing the value further. Saturday, 16 March 2013 Even at 2MB in Apache 2.4, this could mean that for 25 threads, 50MB can be held by the persistent memory pools in each process. When running mod_wsgi, under normal circumstances, there should not be much call for memory to be allocated from the per request memory pool. To be safe though you should ensure MaxMemFree is set and with a reduced value if possible.
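The worst case figure quoted in the notes is simple arithmetic; a sketch:

```python
def pool_memory_upper_bound_mb(threads_per_child, max_mem_free_kb):
    """Worst case memory (MB) the persistent per-thread allocator
    pools can hold in one worker process before free() is called."""
    return threads_per_child * max_mem_free_kb / 1024.0
```

At the Apache 2.4 default of 2048 KBytes this is 50MB per process for 25 threads; dropping MaxMemFree to 256 brings it down to 6.25MB.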
• 51. Daemon Mode Of Apache/mod_wsgi (diagram: Browser/Client → Apache Parent Server and Worker Server Processes → mod_wsgi Daemon Process(es) running Daemon Process Threads) Saturday, 16 March 2013 Now the configuration for prefork or worker MPM are principally an issue when using what is called embedded mode of mod_wsgi. That is, your WSGI application runs inside of the Apache server child worker processes. The dynamic scaling algorithm of Apache being what can cause us grief when doing this. Using worker MPM helps, but an even safer alternative is to use mod_wsgi daemon mode instead. In this case your WSGI application runs in a separate set of managed processes.
• 52. Daemon Mode Configuration

WSGIDaemonProcess myapp processes=3 threads=5
WSGIScriptAlias / /some/path/wsgi.py process-group=myapp application-group=%{GLOBAL}

Saturday, 16 March 2013 The main difference when using daemon mode is that there is no automatic scaling of the number of processes. The number of processes and threads is instead fixed. Being fixed everything is more predictable and you only need to ensure you have sufficient capacity. Using daemon mode, the need to have nginx as a front end is reduced as the Apache server child worker processes are serving much the same process in isolating the WSGI application from slow clients.
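For sizing that fixed processes/threads pool, a common rule of thumb is Little's law: the average number of requests in flight is the request rate times the average response time. This is a general capacity heuristic, not something mod_wsgi prescribes:

```python
import math

def threads_needed(requests_per_sec, avg_response_time_sec):
    """Rule-of-thumb capacity estimate via Little's law: the average
    number of requests in flight. Size processes*threads comfortably
    above this to leave headroom for bursts."""
    return math.ceil(requests_per_sec * avg_response_time_sec)
```

The example configuration above provides 3*5 = 15 request threads, which would comfortably cover, say, 20 requests a second at a 500ms average response time.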
  • 53. Exclusively Using Daemon Mode • WSGIRestrictEmbedded - Controls whether the Python interpreter is initialised in Apache server worker processes. WSGIRestrictEmbedded On Saturday, 16 March 2013 Because the Apache server processes are now only acting as a proxy, forwarding requests to the mod_wsgi daemon process, as well as serving static files, we don't need to initialise the Python interpreter in the Apache server processes. Process creation is again lightweight and we have side stepped the need to pay so much attention to the Apache MPM settings.
  • 54. The Things That Make Apache Suck • An algorithm for dynamically scaling processes which isn't particularly suited to embedded Python web applications. • Default MPM and settings which magnify the issues which can arise with dynamic scaling when running Python web applications. • A concurrency mechanism that can use a lot of memory for a high number of concurrent requests, especially around handling of keep alive connections. • Defaults for memory pool sizes which cause Apache to be heavyweight on memory usage. Saturday, 16 March 2013 So Apache can certainly be a challenging environment for running Python web applications. The main pain points are how its algorithm for dynamic scaling works and memory requirements to support high concurrency. With careful attention it is possible though to configure Apache to reduce the problems these can cause.
  • 55. Application Performance Monitoring Saturday, 16 March 2013 The simulator I demonstrated can be used to try and validate any configuration before you use it, but the random nature of web site traffic means that it will not be conclusive. This is where live monitoring of traffic in your production web site provides a much better level of feedback. New Relic is obviously the package I would like to see you using, but any monitoring is better than none.
  • 56. Capacity Analysis Saturday, 16 March 2013 In New Relic, one of the reports it generates which is particularly relevant to coming up with the best processes/threads configuration is its capacity analysis report. From this report one can see whether you have provided enough capacity, or whether you have over allocated and so wasting memory, or are running your application over more hosts than you need and therefore paying more money for hosting than you need.
  • 57. Capacity Analysis Saturday, 16 March 2013 Although this talk has been about Apache/mod_wsgi, this report is just as relevant to other WSGI hosting mechanisms, such as gunicorn and uWSGI. Working at New Relic and being able to see data coming in from a wide variety of deployments it is really quite amazing how poorly some servers are being set up, and not just when Apache is being used. So if you are using New Relic, I would really suggest paying a bit more attention to this report. Doing so can help you make your server run better and possibly save you money as well.
  • 58. More Information • Slides (with presenter notes). • http://www.slideshare.net/GrahamDumpleton • Apache/mod_wsgi mailing list (preferred contact point). • http://groups.google.com/group/modwsgi • New Relic (Application Performance Monitoring) • http://newrelic.com • http://newrelic.com/pycon (special 30 day promo code - pycon13) • Personal blog posts on Apache/mod_wsgi and WSGI. • http://blog.dscpl.com.au • If you really really must bother me directly. • Graham.Dumpleton@gmail.com • @GrahamDumpleton Saturday, 16 March 2013 And that is all I want to cover today. If you are after more information, especially if you are interested in the simulator I demonstrated, keep an eye on my blog for more details of that sometime in the near future. If you are interested in using New Relic to better configure your WSGI server, then you can catch me in the expo hall after the talk. Questions?