Capacity Planning For Web Operations Presentation

Capacity Planning for Web Operations
John Allspaw
Operations Engineering

Are you tracking how your servers are performing?
- Do you know how many servers you have?
- Do you know how much traffic your servers can handle (without dying)?
- Are you tracking how your application is being used?

monitoring, testing, deployment, forecasting, architecture, metrics, product planning, capex, procurement

monitoring, testing, deployment (go see Adam Jacob’s talk!), forecasting, architecture, metrics, product planning, capex, procurement

traditional capacity planning

capacity planning for web

Why capacity planning is important
- Hardware* costs $$ (Cloudware costs $$, too)
- Having too little is bad (!@#!!) -> ($$$)
- Having too much is bad ($$$$!)
* and network, datacenter space, power, etc.

Growth
- “Normal”: projected, planned (yay!), expected, hoped for
- “Instantaneous” spikes (yay?): unexpected external events (omg! wtf!), digg, etc.

“Normal” growth at Flickr in a year...
- 4x increase in photo requests/sec
- 2.2x increase in uploads/day
- 3x increase in database queries/sec

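A yearly multiplier like "4x" is easier to plan against month by month once it is converted into an equivalent compound monthly rate. The sketch below does that, assuming (as a simplification, not something the slide claims) that growth compounds evenly across the year; the function names are hypothetical.

```python
def monthly_rate(annual_multiplier: float) -> float:
    """Compound monthly growth rate equivalent to a yearly multiplier.

    Assumes growth is spread evenly over 12 months (a simplification).
    """
    return annual_multiplier ** (1 / 12) - 1


def project(current: float, annual_multiplier: float, months: float) -> float:
    """Project a metric `months` ahead under the same even-compounding assumption."""
    return current * annual_multiplier ** (months / 12)


if __name__ == "__main__":
    # Yearly multipliers from the slide above.
    for name, mult in [
        ("photo requests/sec", 4.0),
        ("uploads/day", 2.2),
        ("database queries/sec", 3.0),
    ]:
        print(f"{name}: {mult}x per year ~ {monthly_rate(mult):.1%} per month")
```

For example, 4x per year works out to roughly 12% per month, so a metric at 100 today would be expected near 200 after six months if growth really were that smooth.
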
“Instantaneous”
(chart labels: Yahoo! FrontPage link, XMas lights)

“Instantaneous” coping
- Disabling “heavier” features on the site
- Cache aggressively or serve stale data
- Bake dynamic pages into static ones

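The "serve stale data" tactic can be sketched as a small cache that keeps returning an expired entry when the site is switched into a degraded mode, or when the backend call fails. This is a hypothetical illustration: the class name, the `degraded` flag, the TTL and the `fetch` callable are assumptions, not how Flickr implemented it.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Hypothetical cache that prefers stale data over extra backend load."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.fetch = fetch  # expensive backend call (e.g. a database query)
        self.ttl = ttl  # seconds an entry counts as fresh
        self.clock = clock
        self.degraded = False  # flip on during a spike to stop refreshing
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None:
            stored_at, value = cached
            if now - stored_at < self.ttl or self.degraded:
                return value  # fresh hit, or stale hit while shedding load
        try:
            value = self.fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]  # backend struggling: serve stale, don't fail
            raise  # nothing cached, so the error has to surface
        self._store[key] = (now, value)
        return value
```

The `clock` parameter is only there so the expiry behaviour can be exercised without waiting in real time.
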
capacity != performance
- Making something fast doesn’t necessarily make it last
- Performance tuning = good, just don’t count on it
- Accept (for now) the performance you have, not the performance you wish you had, or the performance you think you might have later

Stewart: “Allspaw!!!! OMG!!!”
How many servers will we need next year?!
(we need to tell finance by 2pm today)

“Ah, just buy twice as much as we need”
2 x (how much we need) = ?

measurement

Good capacity measurement tools can...
- Easily measure and record any number that changes over time
- Easily compare metrics to any other metrics from anywhere else (import/export)
- Easily make graphs

good tools are out there
- cacti.net
- munin.projects.linpro.no
- hyperic.com
- ganglia.info (Flickr uses this one)

application metrics
(graph: photo uploads via email per minute, by hour)

your stuff, not just system stuff
- photos uploaded (and processed) per minute
- average photo processing time per minute
- average photo size
- disk space consumed per day
- user registrations per day
- etc etc etc

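Metrics like "photos uploaded per minute" or "average processing time per minute" boil down to bucketing application events by minute. A minimal sketch, assuming you already have event timestamps (and, for the average, a measured value per event); the function names are hypothetical and a real setup would feed these numbers into a graphing tool such as the ones listed earlier.

```python
from collections import Counter, defaultdict
from datetime import datetime


def minute_of(t: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute."""
    return t.replace(second=0, microsecond=0)


def count_per_minute(events: list[datetime]) -> dict[datetime, int]:
    """Count events (e.g. photo uploads) in each minute."""
    return dict(sorted(Counter(minute_of(t) for t in events).items()))


def average_per_minute(samples: list[tuple[datetime, float]]) -> dict[datetime, float]:
    """Average a measured value (e.g. processing seconds) within each minute."""
    totals: dict[datetime, float] = defaultdict(float)
    counts: Counter = Counter()
    for t, value in samples:
        m = minute_of(t)
        totals[m] += value
        counts[m] += 1
    return {m: totals[m] / counts[m] for m in sorted(totals)}
```
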
Tie application metrics to system metrics
Pretty!! But what does this mean?

It means that with about 60% total CPU...
It means we can process ~120 images per minute
...and we can process them in ~3.5 seconds (on average)

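Two quick calculations follow from those numbers. The first extrapolates throughput to a higher CPU ceiling, assuming throughput grows in proportion to CPU use; that is an optimistic simplification, since systems usually degrade non-linearly near saturation, and the 80% ceiling is a hypothetical choice. The second applies Little's law (L = λW) to estimate how many images are being processed at any moment on average. Function names are hypothetical.

```python
def rate_at_cpu(target_cpu: float, observed_cpu: float, observed_rate: float) -> float:
    """Extrapolate throughput to a target CPU level.

    Assumes throughput scales linearly with CPU use (optimistic near saturation).
    """
    return observed_rate * target_cpu / observed_cpu


def in_flight(rate_per_minute: float, seconds_each: float) -> float:
    """Little's law, L = lambda * W: average number of items in progress at once."""
    return (rate_per_minute / 60.0) * seconds_each


if __name__ == "__main__":
    # From the slide: ~120 images/minute at ~60% total CPU, ~3.5 s per image.
    # The 80% ceiling is a hypothetical planning limit.
    print(rate_at_cpu(0.80, 0.60, 120))  # roughly 160 images/minute
    print(in_flight(120, 3.5))  # 7.0 images in progress, on average
```
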
Benchmarking
- Great for comparing hardware platforms and configurations
BUT
- Doesn’t represent real workloads

not exactly like a bike messenger

Finding your ceilings
- Use real data, from production servers (if at all possible)
- No, really

How much traffic can each webserver take before it dies?
How many webservers can fail before we’re screwed?
When should I add more webservers?

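Once you know a per-server ceiling from production, the last two questions become arithmetic. A minimal sketch, assuming traffic is spread evenly across webservers by the load balancers; the function names and the example numbers (2,000 req/s peak, 150 req/s ceiling per server) are hypothetical.

```python
import math


def servers_needed(peak_rps: float, ceiling_rps_per_server: float,
                   tolerated_failures: int = 0) -> int:
    """Servers required so peak traffic stays under each server's ceiling,
    even after `tolerated_failures` servers drop out."""
    return math.ceil(peak_rps / ceiling_rps_per_server) + tolerated_failures


def failures_tolerated(servers: int, peak_rps: float,
                       ceiling_rps_per_server: float) -> int:
    """How many servers can fail at peak before survivors pass their ceiling.

    A negative result means the pool is already under-provisioned.
    """
    return servers - math.ceil(peak_rps / ceiling_rps_per_server)


if __name__ == "__main__":
    print(servers_needed(2000, 150, tolerated_failures=2))  # 16
    print(failures_tolerated(16, 2000, 150))  # 2
```

A reasonable "when should I add more webservers?" rule is then: when `failures_tolerated` at the projected peak drops below the margin you have decided you need.
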
people -> LBs -> webservers -> (databases, etc.)

people -> LBs -> webservers -> (databases, etc.)
measured: network, cpu, memory, disk usage, disk i/o

people -> LBs -> webservers -> (databases, etc.)
measured: network, cpu, memory, disk usage, disk i/o
plus application metrics:
- comments/min
- photos/min
- videos/min
- kittens/min
- etc etc etc/min

what happens here?
Ceiling = upper limit of “work” (and resources)

Trends of peaks
(graph: peak values over time)

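Putting the ceiling and the trend of peaks together gives a rough "when do we hit the wall?" date. The sketch below fits an ordinary least-squares straight line to daily peak values and reports how many days remain until that line reaches the ceiling. The straight-line fit is a swapped-in simplification (growth may well compound, as the Flickr numbers earlier suggest), and the function names are hypothetical.

```python
from typing import Optional


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept (needs 2+ distinct xs)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_ceiling(daily_peaks: list[float], ceiling: float) -> Optional[float]:
    """Days after the last sample until the fitted peak trend reaches `ceiling`.

    Returns None if the trend is flat or falling, 0.0 if it is already past.
    """
    xs = [float(i) for i in range(len(daily_peaks))]  # day index 0, 1, 2, ...
    slope, intercept = linear_fit(xs, daily_peaks)
    if slope <= 0:
        return None
    crossing = (ceiling - intercept) / slope
    return max(0.0, crossing - xs[-1])
```

Feeding it peaks rather than averages matters: the ceiling is hit at the busiest moment of the day, not the typical one.
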
Benchmarking
Might be your only option if you have a single server.
Some good benchmarking tools:
- Siege: http://www.joedog.org/JoeDog/Siege
- httperf/autobench: http://www.hpl.hp.com/research/linux/httperf/ and http://www.xenoclast.org/autobench
- sysbench: http://sysbench.sf.net

Economics

Time makes everything cheaper (the Moore’s Law thing)
BUT
you don’t have a lot of time to wait around, do you?

Vertical scaling

Horizontal architectures

Diagonal scaling

Diagonal scaling
Replacing 67 dual-core webservers with 18 dual quad-core webservers

Diagonal scaling
more traffic from fewer machines

Diagonal Scaling

servers   CPUs/server   RAM/server   drives/server   total power (W) @60% peak
     67             2          4GB          1x80GB                      8763.6
     18             8          4GB         1x146GB                      2332.8

~70% less power
49U less rack space

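The numbers in that table can be checked with a few lines of arithmetic. A small sketch; the function name is hypothetical, and `rack_units_each=1` assumes 1U servers, which matches the "49U less rack space" figure.

```python
def diagonal_scaling_summary(
    old_servers: int, old_cores_each: int, old_watts: float,
    new_servers: int, new_cores_each: int, new_watts: float,
    rack_units_each: int = 1,  # assumption: 1U servers
) -> dict[str, float]:
    """Compare an old and a new server fleet on cores, power and rack space."""
    return {
        "old_cores": old_servers * old_cores_each,
        "new_cores": new_servers * new_cores_each,
        "old_watts_per_server": old_watts / old_servers,
        "new_watts_per_server": new_watts / new_servers,
        "power_saving": 1 - new_watts / old_watts,
        "rack_units_freed": (old_servers - new_servers) * rack_units_each,
    }


if __name__ == "__main__":
    # Figures from the table above (total power measured at 60% of peak).
    summary = diagonal_scaling_summary(67, 2, 8763.6, 18, 8, 2332.8)
    for key, value in summary.items():
        print(f"{key}: {value:.3f}")
```

Run on the table's figures, total cores go from 134 to 144 while per-server draw stays about the same (roughly 130 W), so the ~70% power saving (about 73%) comes from running 49 fewer boxes.
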
Utility Computing
Disclosure: We don’t use clouds at Flickr. (but we know folks who do)

clouds
- Help with deployment timelines
- Help with procurement timelines
BUT
- Still have to pay attention
- Many people use the same forecasting methods

Use Common Sense(tm)
- Pay attention to the right metrics
- Don’t pretend to know the exact future
- Measure constantly, adapt constantly
- Complex simulation and modeling is rarely worth it
- Don’t expect tuning and tweaking will ever win you any excess capacity

Some more stats
- Serving 32,000 photos per second at peak
- Consuming 6-8TB per day
- Consumed >34TB per day during Y!Photos migration
- ~3M uploads per day, 60 per second at peak

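Those stats also show why peaks, not averages, drive capacity. A quick sketch of the arithmetic; the function names are hypothetical, and the storage figure assumes decimal units (1 TB = 10^6 MB) and writes spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day: float) -> float:
    """Average rate per second for a daily total."""
    return per_day / SECONDS_PER_DAY


def peak_to_average(peak_per_second: float, per_day: float) -> float:
    """How many times higher the peak rate is than the daily average rate."""
    return peak_per_second / per_second(per_day)


def avg_mb_per_second(tb_per_day: float) -> float:
    """Average write bandwidth in MB/s, assuming decimal units (1 TB = 10**6 MB)."""
    return tb_per_day * 10**6 / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"uploads: {per_second(3_000_000):.1f}/s average vs 60/s peak "
          f"(ratio {peak_to_average(60, 3_000_000):.2f})")
    print(f"storage: {avg_mb_per_second(6):.0f}-{avg_mb_per_second(8):.0f} MB/s average")
```

Sizing for the daily average (about 35 uploads/s) would leave you roughly 1.7x short at the 60/s peak.
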
June 23-24, 2008
20% discount: “vel08js”

We Are Hiring! (DBA, engineers)

http://flickr.com/photos/85013738@N00/542591121/
http://flickr.com/photos/formatc1/2301500208/
http://flickr.com/photos/mikefats/11546240/
http://flickr.com/photos/kanaka/491064256/
http://flickr.com/photos/randysonofrobert/1035003071/
http://flickr.com/photos/halcyonsnow/446166047/
http://flickr.com/photos/wwworks/2313927146/
http://flickr.com/photos/sunxez/1392677065/
http://flickr.com/photos/spacesuitcatalyst/536389937/
http://flickr.com/photos/theklan/1276710183/
