Building Scalable Systems
                 an Asynchronous Approach


                         / pleasure and pain



Monday, June 20, 2011
Who am I? @postwait on twitter


                        Author of “Scalable Internet Architectures”
                        Pearson, ISBN: 067232699X

                        Contributor to “Web Operations”
                        O’Reilly, ISBN:



                        Founder of OmniTI, Message Systems, Fontdeck, & Circonus
                        I like to tackle problems that are “always on” and “always growing.”




                        I am an Engineer
                        A practitioner of academic computing.
                        IEEE member and Senior ACM member.
                        On the Editorial Board of ACM’s Queue magazine.



Some rants about systems.


                         / cutting through the crap



BIG data




                    •   BIG data: doesn’t exist

                        •   but it sure is a good way to market things.

                    •   If you measure things in petabytes
                        as we have for the last several years
                        you have data, not BIG data.




Data stores




            •       “new” NoSQL systems can scale better than an RDBMS

                  •     yes

            •       They scale better than sharded RDBMS

                  •     no

            •       I need NoSQL systems for “BIG” data.

                  •     no




Why NoSQL




                  •     The choice to use a NoSQL system is driven by:

                        •   a more suitable data model

                        •   a desire to shift our CAP theorem constraints




The cloud




            •       The cloud is not magic.

            •       The cloud enables engineers to rapidly deploy an
                    architecture to run an application.

            •       There is nothing wrong with this.




The cloud: gotcha 1



            •       Provisioning services “the old way” took time due to:

                  •     installing systems

                  •     installing packages

                  •     determining and enforcing data security

                  •     determining and enforcing SLA monitoring

                  •     determining and enforcing DR: RPO and RTO

                  •     documenting escalation and remediation efforts




The cloud: gotcha 2




                  •     Quality of service in multi-tenancy environments is a
                        very, very hard problem.

                  •     In fact, no one has solved it.




The cloud: gotcha 3




                  •     Platform as a service provides:

                        •   MySQL? PostgreSQL? Redis? Node.js?

                        •   .. patched.

                        •   .. patched.

                        •   .. patched.

                        •   .. patched.




The cloud: gotcha 4




                        •   Scaling isn’t magic pixie dust.

                        •   “I can run more instances of my app.”

                            •   horribly false sense of security.

                        •   You must have a scalable architecture




Design & Implementation
                 Techniques


                         / some say architecture != implementation



Architecture vs. Implementation




                        Architecture is without specification of the vendor,
                        make, and model of components.
                        Implementation is the adaptation of an architecture
                        to embrace available technologies.
                        They are intrinsically tied.
                        Insisting on separation is a metaphysical argument
                        (with no winners)




Respect Engineering Math



                        Engineering math:
                            19 + 89 = 110
                        “Precise” Math:
                            19 + 89 = 10.8



                                      Ok. Ok. I must have, I must have put a decimal
                                      point in the wrong place or something. Shit. I
                                      always do that. I always mess up some mundane
                                      detail.

                                                        - Michael Bolton in Office Space



Ensure the gods aren’t angry.

          Bob: We need to grow our cluster of web servers.
          Alice: How many requests per second do they do, how many
          do you have and what is their current resource utilization?
          Bob: About 200 req/second, 8 servers and they have no
          headroom.
          Alice: How many req/second do you need?
          Bob: 800 req/second would be good.
          Alice: Um, Bob, 200 x 8 = 1600... you have 50% headroom on
          your goal.
          Bob: No... 200 / 8 = 25 req/second per server.
          Alice: Bob... the gods are angry.


Why you’ve pissed off the gods.


          Most web apps are CPU bound (as I/O happens on a different
          layer)
          Typical box today:
           8 cores at 2.8GHz, or
           22.4 BILLION instructions per second.
          22×10^9 instr/s / 25 req/s = 880 MILLION instructions per
          request.
          This same effort (per-request) provided me with approximately
          15 minutes enjoying “Might & Magic 2” on my Apple IIe
          - you’ve certainly pissed me off.
          No wonder the gods are angry.
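
The arithmetic behind Bob's numbers can be checked in a few lines. This is only a sketch using the figures quoted on the slides; like the slide, it treats one clock cycle as one instruction, which is a simplification. The variable names are illustrative.

```python
# Figures quoted on the slides.
cluster_rate = 200   # req/s across the whole cluster (what Bob actually meant)
servers = 8
cores = 8
clock_hz = 2.8e9     # treated as instructions per second per core, as the slide does

per_server_rate = cluster_rate / servers              # req/s handled by one server
instr_per_sec = cores * clock_hz                      # instructions per second per box
instr_per_request = instr_per_sec / per_server_rate   # instructions spent per request

print(f"{per_server_rate:.0f} req/s per server")
print(f"{instr_per_sec / 1e9:.1f} billion instr/s per box")
print(f"{instr_per_request / 1e6:.0f} million instr per request")
```

Without rounding 22.4 down to 22, the result is 896 million instructions per request rather than the slide's 880 million. Either way the conclusion is the same.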



Develop a model


             Queue theoretic models are for “other people.”
             Sorta, not really.
             Problems:
                    very hard to develop a complete and accurate model
             Benefits:
                    provides insight on architecture capacitance dependencies
                    relatively easy to understand
                    illustrates opportunities to further isolate work




Rationalize your model

              Draw your model out
              Take measurements and walk through the model to rationalize it
              i.e. prove it to be empirically correct
              You should be able to map actions to consequences:
              A user signs up ➙
                4 synchronous DB inserts (1 synch IOPS + 4 asynch writes)
                1 AMQP durable, persistent message
                1 asynch DB read ➙ 1/10 IOPS writing new Lucene indexes
              In a dev environment, simulate traffic and rationalize your model
              I call this a “data flow causality map”
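
One way to write such a map down is as plain data that can be scaled to a traffic rate and compared against measurements. The sketch below copies the per-signup costs from the slide. The key names, the `load_at` helper, and the rate of 50 signups per second are assumptions made for illustration, not part of the talk.

```python
# Per-signup consequences, copied from the slide; key names are illustrative.
SIGNUP = {
    "sync_db_iops": 1,           # 4 synchronous inserts cost 1 synchronous IOPS
    "async_db_writes": 4,        # ...plus 4 asynchronous writes
    "amqp_durable_messages": 1,  # 1 durable, persistent AMQP message
    "async_db_reads": 1,         # 1 asynchronous DB read
    "lucene_index_iops": 0.1,    # 1/10 IOPS writing new Lucene indexes
}


def load_at(rate_per_sec, per_action=SIGNUP):
    """Scale the per-action costs to a sustained action rate."""
    return {resource: cost * rate_per_sec for resource, cost in per_action.items()}


predicted = load_at(50)  # 50 signups/s is an assumed rate, not from the talk
for resource, per_sec in predicted.items():
    print(f"{resource}: {per_sec:g}/s")
```

Rationalizing the model then means simulating that rate in a dev environment and checking the measured numbers against `predicted`.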




Complexity will eat your lunch

                there will always be empirical variance from your model
                explaining the phantoms leads to enlightenment
                service decoupling in complex systems gives:
                        simplified modeling and capacity planning
                        slight inefficiencies
                        promotes lower contention
                        requires design of systems with less coherency
                        requirements
                        each isolated service is simpler and safer
                        SCALES.


Asynchronous Systems


                         / it’s likely you have no idea what you’re doing



Asynchronous



                    •   of or requiring a form of computer control timing
                        protocol in which a specific operation begins upon
                        receipt of an indication (signal) that the preceding
                        operation has been completed.



                    •   ...or “I’ll act when you tell me you are done”



                    •   ...or a protocol wherein the initiation of a task and
                        the report of its completion are separate operations.
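
The last definition can be seen in miniature with a future: starting the task and learning its result are two separate calls. This sketch uses Python's standard `concurrent.futures`; `slow_task` is a made-up stand-in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(x):
    time.sleep(0.05)  # stand-in for real work
    return x * 2


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_task, 21)  # initiation: returns at once
    # ...the caller is free to do other things here...
    result = future.result(timeout=1)    # completion report: a separate operation

print(result)
```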




Protocols




                    •   Standards:

                        •   AMQP
                            (impl: ActiveMQ, RabbitMQ, OpenAMQ, etc.)

                    •   Others:

                        •   ZeroMQ

                        •   Gearman




Guarantees




                    •   Queueing protocols can be misleading.

                    •   Are you sure you did what you think you did?



                    •   Let’s use a publish as an example.




Publication



                    •   Imagine a Queue:




                    •   You assume that by calling “publish” that
                        your message is placed on the queue and
                        will eventually be consumed
                        (assuming a consumer).

                    •   Most systems are ‘more’ asynchronous than that.




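Publication: what you think happens

    [Sequence diagram. Lanes: User Space | Kernel | Network Stack | Network Stack | Queue]

    Message 1 (SAFE): call ➙ publish ➙ write ➙ message frame ➙ read (queue),
                      then write ➙ message frame ➙ read ➙ error ➙ return
    Message 2 (BOOM): call ➙ publish ➙ write ➙ message frame ➙ read (queue),
                      then write ➙ message frame ➙ read ➙ error ➙ return

    Each call returns only after the queue's answer has come back.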




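Publication: what really happens

    [Sequence diagram. Lanes: User Space | Kernel | Network Stack | Network Stack | Queue]

    Message 1 (SAFE): call ➙ publish ➙ return;
                      afterwards write ➙ message frame ➙ read (queue)
    Message 2 (BOOM): call ➙ publish ➙ return;
                      afterwards write ➙ message frame ➙ read (queue)
    Later:            write (queue) ➙ message frame ➙ read ➙ error ➙ return

    Each call returns before the message reaches the queue; a failure is
    reported only afterwards.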




Why?




                    •   Why do queueing protocols use

                                      “silence for success?”



                    •   Simple: performance

                        •   no need for a roundtrip before the next message

                        •   success is common, failure rare
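
A toy model makes the consequence concrete. It is not an AMQP or 0MQ client: `ToyBroker`, `SilentPublisher`, and `flush` are hypothetical names invented for this sketch, and `flush` stands in for the kernel and network delivering frames some time after `publish` has returned.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory broker that rejects one particular message."""

    def __init__(self):
        self.stored = []

    def receive(self, msg):
        if msg == "bad":
            raise ValueError("rejected: " + msg)
        self.stored.append(msg)


class SilentPublisher:
    """publish() only buffers; a failure surfaces on a later call."""

    def __init__(self, broker):
        self.broker = broker
        self.outbox = deque()
        self.pending_error = None

    def publish(self, msg):
        # Report any failure left over from an EARLIER message first.
        if self.pending_error is not None:
            err, self.pending_error = self.pending_error, None
            raise err
        self.outbox.append(msg)  # no round trip: silence means success

    def flush(self):
        """Stand-in for delivery happening after publish() returned."""
        while self.outbox:
            try:
                self.broker.receive(self.outbox.popleft())
            except ValueError as exc:
                self.pending_error = exc


broker = ToyBroker()
pub = SilentPublisher(broker)
pub.publish("ok")   # returns immediately
pub.publish("bad")  # also returns immediately: no error yet
pub.flush()         # delivery happens; the broker rejects "bad"

try:
    pub.publish("next")
    late_error = None
except ValueError as exc:
    late_error = str(exc)  # the failure of "bad" shows up only now

print(broker.stored, late_error)
```

The caller that published "bad" was told nothing; a different call pays for it later. That is the price of skipping the round trip.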




Why?




                        •   AMQP is not alone in this...

                        •   0MQ as well.




Now what?




                    •   In each component you must decide if you need:

                        •   synchronous system w/ synchronous protocol

                        •   asynchronous system w/ synchronous protocol

                        •   asynchronous system w/ asynchronous protocol




Service decisions




                    •   Knowing you can lose messages is... okay?

                        •   it can be

                        •   there are plenty of uses for unreliable
                            communications

                        •   however... generally,
                            it is much easier to build services that have
                            end-to-end guarantees.




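Non-asynchronous: synchronous

    [Sequence diagram. Lanes: User Space | Kernel | Network Stack | Network Stack | Database]

    Request 1 (SAFE): call ➙ publish ➙ write ➙ message frame ➙ read (database),
                      then write ➙ message frame ➙ read ➙ error ➙ return
    Request 2 (BOOM): call ➙ publish ➙ write ➙ message frame ➙ read (database),
                      then write ➙ message frame ➙ read ➙ error ➙ return

    Each call returns only after the database's answer has come back.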




Asynchronous to the purpose


                    •   Why is a Queue “asynchronous”

                    •   and a Database “synchronous”?



                    •   I lied... “asynchronous” is “to the purpose.”



                    •   If the ultimate, final goal is: storage in a DB

                        •   and you return the result only after a commit

                        •   then you are synchronous




Simple example: image thumbnailing


                    •   A user uploads an image to a web site

                    •   you need to produce 7 different transformations

                        •   (size, color, etc.)



                    •   Asynchronous system:

                        •   synchronous upload protocol:

                            •   user upload -> thank you we have it

                        •   asynchronous processing

                            •   file -> 7 mutations
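
A minimal sketch of that split, using an in-process queue and one worker thread. The in-memory queue stands in for durable storage plus a real message queue, and the seven "transformations" are placeholder strings; `handle_upload`, `worker`, and `TRANSFORMS` are names invented for this example.

```python
import queue
import threading

TRANSFORMS = [f"variant-{i}" for i in range(1, 8)]  # 7 hypothetical transformations

jobs = queue.Queue()
results = {}


def handle_upload(name, data):
    """Synchronous protocol: accept the file and acknowledge before returning."""
    jobs.put((name, data))  # stand-in for durable storage + enqueue
    return "thank you, we have it"


def worker():
    """Asynchronous processing: file -> 7 mutations."""
    while True:
        item = jobs.get()
        if item is None:  # shutdown signal
            break
        name, _data = item
        results[name] = [f"{name}:{t}" for t in TRANSFORMS]
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

ack = handle_upload("cat.jpg", b"fake image bytes")  # returns right away
jobs.join()     # demo only: wait so the results can be inspected
jobs.put(None)  # stop the worker

print(ack, len(results["cat.jpg"]))
```

The user gets the acknowledgement as soon as the file is accepted; how long the seven mutations take no longer affects the upload.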



A (more) complete example.


                         / foursquare-like, untappd.com-like service



Better example: rewards calculation




                    •   A user performs an action on your site

                    •   and you need to reward them based on:

                        •   social network, history, value

                    •   you want to show them their reward “immediately.”



                    •   Step 1: engineer for failure.




Rewards calculation: step 1




                    •   the inability to calculate the reward
                        shall not prevent the action.
                        (think: beer checkin on untappd)

                    •   I want the reward calculation immediately.

                    •   I need the checkin to be recorded.




Rewards calculation: step 2




                    •   Decouple the rewards calculation:
                        1. receive user request
                        2. store(C)
                        3. queue the checkin(C) on QC
                        4. wait up to 500ms (reading rewards R from QR)
                        5. return R witnessed.




Rewards calculation: step 3




                    •   Decouple the rewards calculation:
                        1. dequeue checkin: C from QC
                        2. calculate rewards(C) -> R
                        3. store(R)
                        4. queue(R) on QR




Rewards calculation: win

                    •   You win big.

                        •   If the rewards calculation system is

                            •   too slow, or

                            •   goes offline

                        •   checkins still proceed and

                        •   responses are served within 500ms



                    •   You have decoupled the service availability
                        requirements of the checkin system from the
                        rewards system: happier users.


Final random thoughts


                         / think outside of the box



Things to look at: free your mind



                    •   Node.js

                        •   Javascript? Seriously?

                        •   Yes.

                    •   Forces you to think asynchronously

                    •   Forces you to share nothing

                    •   Forces you to build stateless systems

                    •   These systems scale




unsafe: when to use




                    •   “silence is success” messaging is almost always
                        useful when new, more temporally relevant data is
                        bound to arrive.

                        •   game location data

                        •   performance data

                        •   status data

                        •   the casual observer




be mindful




                        •   Always monitor:

                            •   message rates

                            •   queue depths

                            •   queue counts

                            •   connection concurrency




Thank you.



                    •   Thank you

                    •   Merci beaucoup.




Monday, June 20, 2011


Building Scalable Systems: an asynchronous approach

  • 1. Building Scalable Systems an Asynchronous Approach / pleasure and pain Monday, June 20, 2011
  • 2. Who am I? @postwait on twitter Author of “Scalable Internet Architectures” Pearson, ISBN: 067232699X Contributor to “Web Operations” O’Reilly, ISBN: Founder of OmniTI, Message Systems, Fontdeck, & Circonus I like to tackle problems that are “always on” and “always growing.” I am an Engineer A practitioner of academic computing. IEEE member and Senior ACM member. On the Editorial Board of ACM’s Queue magazine. 2 Monday, June 20, 2011
  • 3. Some rants about systems. / cutting through the crap Monday, June 20, 2011
  • 4. BIG data • BIG data: doesn’t exist • but it sure is a good way to market things. • If you measure things in petabytes as we have for the last several years you have data, not BIG data. Monday, June 20, 2011
  • 5. Data stores • “new” NoSQL systems can scale better than an RDBMS • yes • They scale better than sharded RDBMS • no • I need NoSQL systems for “BIG” data. • no Monday, June 20, 2011
  • 6. Why NoSQL • The choice to use a NoSQL system is driven by: • a more suitable data model • we desire to shift our CAP theorem constraints Monday, June 20, 2011
  • 7. The cloud • The cloud is not magic. • The cloud enables engineers to rapidly deploy an architecture to run an application. • There is nothing wrong with this. Monday, June 20, 2011
  • 8. The cloud: gotcha 1 • Provisioning services “the old way” took time due to: • installing systems • install packages • determining and enforcing data security • determining and enforcing SLA monitoring • determining and enforcing DR: RPO and RTO • documenting escalation and remediation efforts Monday, June 20, 2011
  • 9. The cloud: gotcha 2 • Quality of service in multi-tenancy environments is a very, very hard problem. • In fact, no one has solved it. Monday, June 20, 2011
  • 10. The cloud: gotcha 3 • Platform as a service provides: • MySQL? PostgreSQL? Redis? Node.js? • .. patched. • .. patched. • .. patched. • .. patched. Monday, June 20, 2011
  • 11. The cloud: gotcha 4 • Scaling isn’t magic pixie dust. • “I can run more instances of my app.” • horribly false sense of security. • You must have a scalable architecture Monday, June 20, 2011
  • 12. Design & Implementation Techniques / some say architecture != implementation Monday, June 20, 2011
  • 14. Architecture vs. Implementation Architecture is without specification of the vendor, make, and model of components. Monday, June 20, 2011
  • 15. Architecture vs. Implementation Architecture is without specification of the vendor, make, and model of components. Implementation is the adaptation of an architecture to embrace available technologies. Monday, June 20, 2011
  • 16. Architecture vs. Implementation Architecture is without specification of the vendor, make, and model of components. Implementation is the adaptation of an architecture to embrace available technologies. They are intrinsically tied. Insisting on separation is a metaphysical argument (with no winners) Monday, June 20, 2011
  • 17. Respect Engineering Math Engineering math: 19 + 89 = 110 “Precise” Math: 19 + 89 = 10.8 Monday, June 20, 2011
  • 18. Respect Engineering Math Engineering math: 19 + 89 = 110 “Precise” Math: 19 + 89 = 10.8 Ok. Ok. I must have, I must have put a decimal point in the wrong place or something. Shit. I always do that. I always mess up some mundane detail. - Michael Bolton in Office Space Monday, June 20, 2011
  • 19. Ensure the gods aren’t angry. Monday, June 20, 2011
  • 20. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Monday, June 20, 2011
  • 21. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is their current resource utilization? Monday, June 20, 2011
  • 22. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is their current resource utilization? Bob: About 200 req/second, 8 servers and they have no headroom. Monday, June 20, 2011
  • 23. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is their current resource utilization? Bob: About 200 req/second, 8 servers and they have no headroom. Alice: How many req/second do you need? Monday, June 20, 2011
  • 24. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is their current resource utilization? Bob: About 200 req/second, 8 servers and they have no headroom. Alice: How many req/second do you need? Bob: 800 req/second would be good. Monday, June 20, 2011
  • 25. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is their current resource utilization? Bob: About 200 req/second, 8 servers and they have no headroom. Alice: How many req/second do you need? Bob: 800 req/second would be good. Alice: Um, Bob, 200 x 8 = 1600... you have 50% headroom on your goal. Monday, June 20, 2011
  • 26. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is their current resource utilization? Bob: About 200 req/second, 8 servers and they have no headroom. Alice: How many req/second do you need? Bob: 800 req/second would be good. Alice: Um, Bob, 200 x 8 = 1600... you have 50% headroom on your goal. Bob: No... 200 / 8 = 25 req/second per server. Monday, June 20, 2011
  • 27. Ensure the gods aren’t angry. Bob: We need to grow our cluster of web servers. Alice: How many requests per second do they do, how many do you have and what is their current resource utilization? Bob: About 200 req/second, 8 servers and they have no headroom. Alice: How many req/second do you need? Bob: 800 req/second would be good. Alice: Um, Bob, 200 x 8 = 1600... you have 50% headroom on your goal. Bob: No... 200 / 8 = 25 req/second per server. Alice: Bob... the gods are angry. Monday, June 20, 2011
  • 28. Why you’ve pissed off the gods. Monday, June 20, 2011
  • 29. Why you’ve pissed off the gods. Most web apps are CPU bound (as I/O happens on a different layer) Monday, June 20, 2011
  • 30. Why you’ve pissed off the gods. Most web apps are CPU bound (as I/O happens on a different layer) Typical box today: 8 cores are 2.8GHz or 22.4 BILLION instructions per second. Monday, June 20, 2011
  • 31. Why you’ve pissed off the gods. Most web apps are CPU bound (as I/O happens on a different layer) Typical box today: 8 cores are 2.8GHz or 22.4 BILLION instructions per second. 22x109 instr/s / 25 req/s = 880 MILLION instructions per request. Monday, June 20, 2011
  • 32. Why you’ve pissed off the gods. Most web apps are CPU bound (as I/O happens on a different layer) Typical box today: 8 cores are 2.8GHz or 22.4 BILLION instructions per second. 22x109 instr/s / 25 req/s = 880 MILLION instructions per request. This same effort (per-request) provided me with approximately 15 minutes enjoying “Might & Magic 2” on my Apple IIe - you’ve certainly pissed me off. Monday, June 20, 2011
  • 33. Why you’ve pissed off the gods. Most web apps are CPU bound (as I/O happens on a different layer) Typical box today: 8 cores are 2.8GHz or 22.4 BILLION instructions per second. 22x109 instr/s / 25 req/s = 880 MILLION instructions per request. This same effort (per-request) provided me with approximately 15 minutes enjoying “Might & Magic 2” on my Apple IIe - you’ve certainly pissed me off. No wonder the gods are angry. Monday, June 20, 2011
  • 34. Develop a model Monday, June 20, 2011
  • 35. Develop a model Queue theoretic models are for “other people.” Monday, June 20, 2011
  • 36. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Monday, June 20, 2011
  • 37. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Problems: Monday, June 20, 2011
  • 38. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Problems: very hard to develop a complete and accurate model Monday, June 20, 2011
  • 39. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Problems: very hard to develop a complete and accurate model Benefits: Monday, June 20, 2011
  • 40. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Problems: very hard to develop a complete and accurate model Benefits: provides insight on architecture capacitance dependencies Monday, June 20, 2011
  • 41. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Problems: very hard to develop a complete and accurate model Benefits: provides insight on architecture capacitance dependencies relatively easy to understand Monday, June 20, 2011
  • 42. Develop a model Queue theoretic models are for “other people.” Sorta, not really. Problems: very hard to develop a complete and accurate model Benefits: provides insight on architecture capacitance dependencies relatively easy to understand illustrates opportunities to further isolate work Monday, June 20, 2011
  • 44. Rationalize your model Draw your model out Monday, June 20, 2011
  • 45. Rationalize your model Draw your model out Take measurements and walk through the model to rationalize it i.e. prove it to be empirically correct Monday, June 20, 2011
  • 46. Rationalize your model Draw your model out Take measurements and walk through the model to rationalize it i.e. prove it to be empirically correct You should be able to map actions to consequences: Monday, June 20, 2011
  • 47. Rationalize your model Draw your model out Take measurements and walk through the model to rationalize it i.e. prove it to be empirically correct You should be able to map actions to consequences: A user signs up ➙ 4 synchronous DB inserts (1 synch IOPS + 4 asynch writes) 1 AMQP durable, persistent message 1 asynch DB read ➙ 1/10 IOPS writing new Lucene indexes Monday, June 20, 2011
  • 48. Rationalize your model Draw your model out Take measurements and walk through the model to rationalize it i.e. prove it to be empirically correct You should be able to map actions to consequences: A user signs up ➙ 4 synchronous DB inserts (1 synch IOPS + 4 asynch writes) 1 AMQP durable, persistent message 1 asynch DB read ➙ 1/10 IOPS writing new Lucene indexes In a dev environment, simulate traffic and rationalize your model Monday, June 20, 2011
  • 49. Rationalize your model Draw your model out Take measurements and walk through the model to rationalize it i.e. prove it to be empirically correct You should be able to map actions to consequences: A user signs up ➙ 4 synchronous DB inserts (1 synch IOPS + 4 asynch writes) 1 AMQP durable, persistent message 1 asynch DB read ➙ 1/10 IOPS writing new Lucene indexes In a dev environment, simulate traffic and rationalize your model I call this a “data flow causality map” Monday, June 20, 2011
  • 50. Complexity will eat your lunch Monday, June 20, 2011
  • 51. Complexity will eat your lunch there will always be empirical variance from your model Monday, June 20, 2011
  • 52. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment Monday, June 20, 2011
  • 53. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems gives: Monday, June 20, 2011
  • 54. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems gives: simplified modeling and capacity planning Monday, June 20, 2011
  • 55. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems gives: simplified modeling and capacity planning slight inefficiencies Monday, June 20, 2011
  • 56. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems gives: simplified modeling and capacity planning slight inefficiencies promotes lower contention Monday, June 20, 2011
  • 57. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems gives: simplified modeling and capacity planning slight inefficiencies promotes lower contention requires design of systems with less coherency requirements Monday, June 20, 2011
  • 58. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems gives: simplified modeling and capacity planning slight inefficiencies promotes lower contention requires design of systems with less coherency requirements each isolated service is simpler and safer Monday, June 20, 2011
  • 59. Complexity will eat your lunch there will always be empirical variance from your model explaining the phantoms leads to enlightenment service decoupling in complex systems gives: simplified modeling and capacity planning slight inefficiencies promotes lower contention requires design of systems with less coherency requirements each isolated service is simpler and safer SCALES. Monday, June 20, 2011
  • 60. Asynchronous Systems / it’s likely you have no idea what you’re doing Monday, June 20, 2011
  • 61. Asychronous • of or requiring a form of computer control timing protocol in which a specific operation begins upon receipt of an indication (signal) that the preceding operation has been completed. • ...or “I’ll act when you tell me you are done” • ...or a protocol wherein the initiation of a task and the report of its completion are separate operations. Monday, June 20, 2011
  • 62. Protocols • Standards: • AMQP (impl: ActiveMQ, RabbitMQ, OpenAMQ, etc.) • Others: • ZeroMQ • Gearman Monday, June 20, 2011
  • 63. Guarantees • Queueing protocols can be misleading. • Are you sure you did what you think you did? • Let’s use a publish as an example. Monday, June 20, 2011
  • 64. Publication • Imagine a Queue: • You assume that by calling “publish” that your message is placed on the queue and will eventually be consumed (assuming a consumer). • Most systems are ‘more’ asynchronous than that. Monday, June 20, 2011
  • 65. Publication: what you think happens • [Sequence diagram. Columns: User Space, Kernel, Network Stack, Network Stack, Queue. Two cases, labeled SAFE and BOOM. In both: call publish -> write -> message frame -> read, then a reply travels back (write -> message frame -> read) carrying the error status, and only then does publish return.]
  • 66. Publication: what really happens • [Sequence diagram. Columns: User Space, Kernel, Network Stack, Network Stack, Queue. Two cases, labeled SAFE and BOOM. In both, publish returns right after the write, before the message frame has been sent and read. In the BOOM case the error frame travels back only after publish has already returned.]
  • 67. Why? • Why do queueing protocols use “silence for success?” • Simple: performance • no need for a roundtrip before the next message • success is common, failure rare
  • 68. Why? • AMQP is not alone in this... • 0MQ as well.
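To make “silence for success” concrete, here is a toy model of a fire-and-forget publish. It is not an AMQP or ZeroMQ client: `FireAndForgetChannel`, `broker_accepts`, and `flush` are hypothetical names invented for this sketch, and the network is simulated by a local buffer.

```python
from collections import deque


class FireAndForgetChannel:
    """Toy model only: publish() buffers locally and reports success at once."""

    def __init__(self, broker_accepts):
        self.outbound = deque()
        self.broker_accepts = broker_accepts  # stands in for the remote queue
        self.async_errors = []                # failures learned about later

    def publish(self, message: str) -> bool:
        self.outbound.append(message)
        return True  # only means the local write worked, nothing more

    def flush(self) -> None:
        """Simulates frames reaching the broker after publish() has returned."""
        while self.outbound:
            message = self.outbound.popleft()
            if not self.broker_accepts(message):
                self.async_errors.append(message)


channel = FireAndForgetChannel(broker_accepts=lambda m: m != "to-missing-queue")
results = [channel.publish(m) for m in ("ok-1", "to-missing-queue", "ok-2")]
channel.flush()

print(results)               # every publish reported success
print(channel.async_errors)  # the failure only surfaces afterwards
```

Every call to `publish` returns success, and the one rejected message is only discoverable later and out of band, which is the gap between what you think happens and what really happens.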
  • 69. Now what? • In each component you must decide if you need: • synchronous system w/ synchronous protocol • asynchronous system w/ synchronous protocol • asynchronous system w/ asynchronous protocol
  • 70. Service decisions • Knowing you can lose messages is... okay? • it can be • there are plenty of uses for unreliable communications • however... generally, it is much easier to build services that have end-to-end guarantees.
  • 71. Non-asynchronous: synchronous • [Sequence diagram. Columns: User Space, Kernel, Network Stack, Network Stack, Database. Two cases, labeled SAFE and BOOM. In both: call publish -> write -> message frame -> read, then a reply travels back (write -> message frame -> read) carrying the error status, and only then does publish return.]
  • 72. Asynchronous to the purpose • Why is a Queue “asynchronous” • and a Database “synchronous”? • I lied... “asynchronous” is “to the purpose.” • If the ultimate, final goal is: storage in a DB • and you return the result only after a commit • then you are synchronous
  • 73. Simple example: image thumbnailing • A user uploads an image to a web site • you need to produce 7 different transformations • (size, color, etc.) • Asynchronous system: • synchronous upload protocol: • user upload -> thank you we have it • asynchronous processing • file -> 7 mutations
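The thumbnailing split above (synchronous upload acknowledgement, asynchronous processing) might look roughly like this. It is a minimal in-process sketch using Python's standard `queue` and `threading`; the transform names, `handle_upload`, and the in-memory `results` dict are all assumptions made for illustration, and no real image work is done.

```python
import queue
import threading

# Seven hypothetical mutations; the slide only says "size, color, etc."
TRANSFORMS = ["small", "medium", "large", "grayscale", "sepia", "square", "blurred"]

work = queue.Queue()
results = {}  # in-memory stand-in for wherever the thumbnails would be stored


def handle_upload(name: str, data: bytes) -> str:
    """Synchronous part: accept the file, enqueue it, answer immediately."""
    work.put((name, data))
    return "thank you, we have it"


def worker() -> None:
    """Asynchronous part: produce the mutations after the user was answered."""
    while True:
        item = work.get()
        if item is None:  # sentinel: shut the worker down
            work.task_done()
            break
        name, _data = item
        results[name] = [f"{name}:{t}" for t in TRANSFORMS]  # placeholder work
        work.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

reply = handle_upload("cat.jpg", b"raw image bytes")
print(reply)  # the user is answered before any transformation exists

work.put(None)
work.join()
thread.join()
print(len(results["cat.jpg"]))
```

The upload reply does not depend on the worker at all; if the worker were stopped, the user would still get the acknowledgement and the file would simply wait on the queue.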
  • 74. A (more) complete example. / foursquare-like, untappd.com-like service
  • 75. Better example: rewards calculation • A user performs an action on your site • and you need to reward them based on: • social network, history, value • you want to show them their reward “immediately.” • Step 1: engineer for failure.
  • 76. Rewards calculation: step 1 • the inability to calculate the reward shall not prevent the action. (think: beer checkin on untappd) • I want the reward calculation immediately. • I need the checkin to be recorded.
  • 77. Rewards calculation: step 2 • Decouple the rewards calculation: 1. receive user request 2. store(C) 3. queue the checkin(C) on QC 4. wait up to 500ms (reading rewards R from QR) 5. return R witnessed.
  • 78. Rewards calculation: step 3 • Decouple the rewards calculation: 1. dequeue checkin: C from QC 2. calculate rewards(C) -> R 3. store(R) 4. queue(R) on QR
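Steps 2 and 3 can be sketched together as a single-process model. The names `QC` and `QR` follow the slides; everything else (`handle_checkin`, `rewards_worker`, the `badge-for-...` reward, the in-memory lists standing in for storage) is hypothetical. The sketch also assumes one request at a time: with a single shared `QR`, a real service would need to correlate each reward with the request that is waiting for it.

```python
import queue
import threading

QC = queue.Queue()            # checkins awaiting reward calculation
QR = queue.Queue()            # calculated rewards
checkins, rewards = [], []    # in-memory stand-ins for durable storage


def handle_checkin(c, wait_s=0.5):
    """Step 2: store C, queue it on QC, wait a bounded time for R on QR."""
    checkins.append(c)        # store(C) happens whether or not rewards work
    QC.put(c)
    try:
        return QR.get(timeout=wait_s)   # wait up to 500ms by default
    except queue.Empty:
        return None           # no reward witnessed; the checkin still succeeded


def rewards_worker():
    """Step 3: dequeue C from QC, calculate R, store R, queue R on QR."""
    while True:
        c = QC.get()
        if c is None:         # sentinel: take the rewards system "offline"
            return
        r = f"badge-for-{c}"  # hypothetical reward calculation
        rewards.append(r)
        QR.put(r)


# Rewards system online: the reward arrives inside the wait window.
worker = threading.Thread(target=rewards_worker)
worker.start()
with_rewards = handle_checkin("ipa")
QC.put(None)
worker.join()

# Rewards system offline: the checkin is still recorded, just without a reward.
without_rewards = handle_checkin("stout", wait_s=0.05)

print(with_rewards, without_rewards, checkins)
```

The bounded wait is the whole design: the checkin path owns its own latency budget, and the rewards path can be slow or absent without changing that budget.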
  • 79. Rewards calculation: win • You win big. • If the rewards calculation system is • too slow, or • goes offline • checkins still proceed and • responses are served within 500ms • You have decoupled the service availability requirements of the checkin system from the rewards system: happier users.
  • 80. Final random thoughts / think outside of the box
  • 81. Things to look at: free your mind • Node.js • JavaScript? Seriously? • Yes. • Forces you to think asynchronously • Forces you to share nothing • Forces you to build stateless systems • These systems scale
  • 82. unsafe: when to use • “silence is success” messaging is almost always useful when new, more temporally relevant data is bound to arrive. • game location data • performance data • status data • the casual observer
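One way to picture why loss is tolerable for this kind of data: if only the freshest values matter, a bounded buffer that silently drops the oldest entries is acceptable. The sketch below uses Python's standard `collections.deque`; the player and position fields are made up for illustration.

```python
from collections import deque

# Bounded buffer: once full, appending silently discards the oldest update.
latest_positions = deque(maxlen=3)

for tick in range(10):
    # Hypothetical game location updates; newer ones supersede older ones.
    latest_positions.append({"player": "p1", "x": tick})

freshest = latest_positions[-1]
kept = [p["x"] for p in latest_positions]
print(freshest, kept)
```

Seven of the ten updates are gone and nobody was told, yet the consumer still sees the current position, which is all a casual observer needs.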
  • 83. be mindful • Always monitor: • message rates • queue depths • queue counts • connection concurrency
  • 84. Thank you. • Thank you • Merci beaucoup.
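As a rough illustration of two of the items above (message rates and queue depths), a queue can be wrapped with counters. `MonitoredQueue` and its `stats` method are hypothetical names for this sketch; the counters are not thread-safe, and the elapsed time is passed in rather than measured so the example stays deterministic.

```python
import queue


class MonitoredQueue:
    """Hypothetical wrapper that counts traffic through an in-process queue."""

    def __init__(self, name: str):
        self.name = name
        self._q = queue.Queue()
        self.enqueued = 0
        self.dequeued = 0

    def put(self, item) -> None:
        self._q.put(item)
        self.enqueued += 1

    def get(self):
        item = self._q.get_nowait()  # raises queue.Empty if nothing is waiting
        self.dequeued += 1
        return item

    def stats(self, elapsed_s: float) -> dict:
        """Depth now, plus average in/out rates over the given interval."""
        return {
            "queue": self.name,
            "depth": self._q.qsize(),
            "in_per_s": self.enqueued / elapsed_s,
            "out_per_s": self.dequeued / elapsed_s,
        }


mq = MonitoredQueue("checkins")
for i in range(10):
    mq.put(i)
for _ in range(4):
    mq.get()

snapshot = mq.stats(elapsed_s=2.0)
print(snapshot)
```

A depth that keeps growing while the inbound rate exceeds the outbound rate is the early warning that a consumer is too slow or offline.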