What’s in a Number?
          monitoring and measuring;
          the spiritual and the carnal.




Thursday, July 12, 12
Theo Schlossnagle                                                @postwait
                                                               “irascible”

                Entrepreneur: OmniTI, Message Systems, Circonus
                Author: Scalable Internet Architectures, Web Operations
                Speaker: over 50 speaking engagements around the world
                Engineer: Senior ACM Member, IEEE Member, ACM Queue
                Open Source:
                        Apache Traffic Server, Varnish, Node.js, node-iptrie, node-amqp,
                        Solaris HTTP accept filters, OpenSSH + SecurID, JLog, Mungo,
                        Portable umem, Reconnoiter, mod_log_spread/spreadlogd, Spread,
                        Wackamole, Zetaback, concurrencykit, Net::RabbitMQ, MCrypt,
                        node-gmp, node-compress, and others.

Monitoring ~ Voyeurism

                Watching from the outside in.
                Observing systems and their behaviors.
                Ultimately, a single observation distills into two things:
                        a set of circumstances
                        a value




Circumstances: Dimensions
                What are you looking at?
                        outbound octets on
                        an Ethernet port (1/37) on a switch
                        in rack 12, in datacenter 6
                        for client ACME
                What time is it?



Circumstances: Dimensions
                What are you looking at?
                        length of time until DOM ready for
                        a user loading a web page (/store/foo/item/bar)
                        using Chrome 10
                        from Verizon FioS in Fulton, Maryland.
                What time is it?



Value
                Ultimately, there are two types of values we care about:
                        text
                          a version number, HTTP code, SSH fingerprint
                        numeric
                          0.0000000000093123,
                          18446744064986053382,
                          1.1283e97, 6.33e-23,
                          0, -1, 1000, 42
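
One way to picture "a set of circumstances plus a value" is as a small record type. This is only a sketch: the class name, field names, and the example timestamp are assumptions made for illustration, not something the slides define.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Observation:
    """A single observation: circumstances (dimensions and time) plus a value."""
    dimensions: Dict[str, str]       # what are you looking at?
    timestamp: float                 # what time is it? (seconds since the Unix epoch)
    value: Union[str, int, float]    # text or numeric

    def is_numeric(self) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(self.value, (int, float)) and not isinstance(self.value, bool)


# The switch-port example from the slides, with an arbitrary example timestamp.
obs = Observation(
    dimensions={
        "metric": "outbound_octets",
        "port": "1/37",
        "rack": "12",
        "datacenter": "6",
        "client": "ACME",
    },
    timestamp=1342051200.0,
    value=18446744064986053382,
)
```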


Text Data
                Incredibly useful,
                because the “when” can be correlated
                with other events and conditions.
                Idea:
                        if the distinct set of text values is sufficiently small,
                        consider the text value as a dimension, and
                        the new value = 1
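
A minimal sketch of that idea, assuming observations are plain dictionaries of dimensions: the text value moves into the dimensions and the stored value becomes 1, so it can be counted like any other number. The function name and the HTTP-code sample data are hypothetical.

```python
from collections import Counter


def text_to_dimension(dimensions, name, text_value):
    """Fold a text value into the dimensions; the new numeric value is 1."""
    new_dims = dict(dimensions)
    new_dims[name] = text_value
    return new_dims, 1


# Invented sample data: HTTP status codes observed on one host.
counts = Counter()
for code in ["200", "200", "404", "200", "500"]:
    dims, value = text_to_dimension({"metric": "http_code", "host": "web1"}, "code", code)
    counts[tuple(sorted(dims.items()))] += value
```

This only stays manageable while the distinct set of text values is small, since every distinct value creates a new series.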



Text Metrics in Use




                Code deployments vs. process activity



Numeric Data

                There tends to be a lot of it.
                Example web hits:
                        one box, one metric, 1 billion / month.
                        that’s only about 400 events/second.
                        each event has about 20 dimensions.
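
The arithmetic behind that figure, as a quick check. The 30-day month is an assumption; the slide does not say which month length it used.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(events_per_month, days_in_month=30):
    """Average event rate for a monthly total, assuming a 30-day month."""
    return events_per_month / (days_in_month * SECONDS_PER_DAY)


rate = events_per_second(1_000_000_000)   # roughly 386, i.e. "about 400"
dimension_values_per_second = rate * 20   # each event carries about 20 dimensions
```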




Numeric Metrics in Use

                672px ≅ 4 weeks (2419200s) or 3600s ≅ 1px
                in previous example: 24000 samples per ‘point’
                even at one minute samples: 60 samples per ‘point’

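
The pixel arithmetic can be sketched as follows. The function names are made up for illustration; the 672px width and four-week span come from the slide.

```python
GRAPH_WIDTH_PX = 672
FOUR_WEEKS_S = 4 * 7 * 24 * 60 * 60   # 2419200 seconds


def seconds_per_pixel(span_seconds, width_px):
    """How much wall-clock time a single pixel column has to represent."""
    return span_seconds / width_px


def samples_per_pixel(span_seconds, width_px, sample_interval_s):
    """How many samples get squeezed into one pixel column."""
    return seconds_per_pixel(span_seconds, width_px) / sample_interval_s


per_px = seconds_per_pixel(FOUR_WEEKS_S, GRAPH_WIDTH_PX)
one_minute = samples_per_pixel(FOUR_WEEKS_S, GRAPH_WIDTH_PX, 60)
```

Whatever is drawn at each pixel is therefore already a summary of many samples.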
An Event
          each is a singularity:
          individual;
          unrepeatable;
          unique.
                                   mondolithic.com




Passive vs. Active
                Active is a synthesized event.
                        You perform some action and measure it.
                Passive is the collection of other events.
                        Stuff that really happened.


                Ask yourself which is better.



Conceding the Truth
                Passive events are ideal in high traffic environments.
                Active events are necessary in low traffic environments.


                “high and low” are relative, so...
                        you need both
                        always



Numbers: how they work.
                Think of a car’s speedometer.
                        it shows you how fast you are going (sorta)
                        at the moment you read it (almost)
                        the only reason this is acceptable is:
                          you have a rudimentary understanding of physics,
                          you are in the car, and
                          your body is exceptionally good
                          at detecting acceleration.

Computers are different.

                computers are fast: billions of instructions per second.
                you aren’t inside the computer.
                most “users” somehow forget rudimentary physics.
                interfaces and visual representations mislead you.
                a speedometer (or familiar readout) leads you to
                a false confidence in system stability between samples.



Write IOPS
          Simple, relatively stable, easy to trend...
          and wrong.

Zooming in shows insight.
          It is not as smooth as we thought.
          Good to know.

Dishonesty Revealed.
          This is “inside” a 5 minute sample.
          Note: we’re still sampling; now at 0.5 Hz.

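
A toy illustration of how a five minute sample can hide what happens inside it. All the IOPS numbers here are invented for the sketch; only the 5 minute window and the 0.5 Hz rate come from the slides.

```python
WINDOW_S = 300      # one five minute sample
SAMPLE_HZ = 0.5     # one reading every two seconds

n = int(WINDOW_S * SAMPLE_HZ)   # 150 readings inside the window
samples = [50] * n              # a quiet baseline (invented)
for i in range(60, 75):         # a 30 second burst: 15 readings (invented)
    samples[i] = 550

five_minute_average = sum(samples) / len(samples)
peak = max(samples)
baseline = min(samples)
```

The single stored number is 100, which matches neither the baseline of 50 nor the burst of 550.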
You’ve Been Deceived.

          1. Denial and Isolation     let’s just skip this one, okay?
          2. Anger                    If you’re not angry, get angry:
                                      it’s the next step in healing
          3. Bargaining               This is where it gets interesting.
          4. Depression
          5. Acceptance

Bargaining.

                I can’t reasonably store all the data:
                        likely true.
                Since I can’t store all the data,
                I have to store a summarization
                        yes; they’re called “aggregates.”




Bargaining: you’re losing.
                You are sampling things and losing data.
                        the speedometer example.


                or


                You are taking an average of a large set of samples.



“Average” is a bad bargain.

                The average of a set of numbers is:
                        the single most useful statistic
                By itself, it provides very little insight into
                the nature of the original set.
                Hello statistics!
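
A small sketch of why the average alone says so little: two invented data sets with very different shapes share the same average.

```python
from statistics import mean, pstdev

steady = [4.0] * 1000                   # every sample is exactly 4
bimodal = [1.0] * 500 + [7.0] * 500     # no sample is anywhere near 4

same_average = mean(steady) == mean(bimodal)   # both average 4.0
spread_steady = pstdev(steady)                 # no spread at all
spread_bimodal = pstdev(bimodal)               # every sample sits 3 away from the average
```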




Rounding out your statistics.
                  You should store, in order of importance:
                        average                   derivatives
                        cardinality
                        standard deviation        derivative stddev
                        covariance                derivative covariance
                        min, max                  derivative min, max

                        FWIW: we don’t store these at Circonus (today)

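
A hedged sketch of keeping several of these aggregates without storing every sample. It uses Welford's online algorithm, which is a choice made for this example and not something the slides name; covariance and the derivative statistics are left out to keep it short, and the class name is hypothetical.

```python
import math


class Rollup:
    """Running average, cardinality, population stddev, min and max."""

    def __init__(self):
        self.count = 0          # cardinality
        self.mean = 0.0         # average
        self._m2 = 0.0          # sum of squared distances from the mean
        self.min = math.inf
        self.max = -math.inf

    def add(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.min = min(self.min, x)
        self.max = max(self.max, x)

    @property
    def stddev(self):
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


rollup = Rollup()
for sample in [2, 4, 4, 4, 5, 5, 7, 9]:
    rollup.add(sample)
```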
What statistics give you.

                With the cardinality, average and stddev
                        you can understand if a “reasonable” number of the
                        data points are “near” the average.
                However, this is often misleading itself.
                We expect normal distributions or
                (more likely) gamma distributions



So, we can make do.


                No.
                Let’s revisit the five stages of grieving.




You’ve Been Deceived.

          1. Denial and Isolation     let’s just skip this one, okay?
          2. Anger                    If you’re not angry, get angry:
                                      it’s the next step in healing
          3. Bargaining               This is where it gets interesting.
          4. Depression
          5. Acceptance               Here we are.

Gamma Gamma Gamma
                [Figures: example distributions plotted on an axis from 0 to 10,
                with averages of 0.72, 1.16, and 4.67.]

Do we need more?
                Wait. Stop.
                In some systems we know enough.
                We know their distributions,
                ... the leather seat, the inertia of the vessel.
                A sample here and there, an average, etc.
                Good enough: temperature, load, payments, bugs.
                Not enough: disk IOPS, memory use, CPU time.


We do need more.


                we have a lot of data sources from which
                we typically compose an average of many values, and
                understanding the distribution is extremely valuable.




Histograms: the need.
                Given an average of 4, and a cardinality of 1000
                maybe you had 1000 samples of exactly 4?
                Dunno.
                If you have a stddev of 1
                (and a normal distribution),
                about 68.2% of your
                data points will be
                between 3 and 5.


                This will never be as detailed as a
                histogram of the full dataset.
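
Where the 68.2% comes from, assuming the data really is normally distributed: the fraction within k standard deviations of the average is erf(k / √2). The function name is made up for this sketch.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


within_one = fraction_within(1)   # average 4, stddev 1: between 3 and 5
within_two = fraction_within(2)   # between 2 and 6
```

For a gamma-shaped or multi-modal data set this rule does not hold, which is the motivation for histograms.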

Histograms: the want

                Ideally, we’d want histograms
                bucketed as we see fit
                with all data represented.
                Too much data: this isn’t tractable.
                So, how do we bucket?




Bucketing Basics
                If I had numbers bounded by a small space:
                        between 0ms and 10ms
                I could bucket [0,1),[1,2),...,[9,10).
                        ten buckets, fairly manageable.
                What do I do when I get 20? or 200? or 20000?
                What do I do when I get lots of data
                between 0.001 and 1?


How we do more.

                We make an assumption (yes: ass: u & mption)
                We assume the same thing floating point assumes:
                        The difference between “very large” numbers and the
                        difference between “very small” numbers can have
                        accuracy inversely proportional to their magnitude.
                We take this and build histograms.



Solution: log buckets
                [0.001,0.01),                [2^-10,2^-9),[2^-9,2^-8),[2^-8,2^-7),
                [0.01,0.1),                  [2^-7,2^-6),[2^-6,2^-5),[2^-5,2^-4),
                [0.1,1),                     [2^-4,2^-3),[2^-3,2^-2),[2^-2,2^-1),
                [1,10),                      [2^-1,1),[1,2),[2,4),[4,8),
                [10,100),                    [8,16),[16,32),[32,64),
                [100,1000),                  [64,128),[2^7,2^8),[2^9,2^10),
                [1000,10000),                [2^10,2^11),[2^11,2^12),[2^12,2^13),
                [10000,100000)               [2^13,2^14),[2^14,2^15)

                            Engineers might like these,
                               but humans do not.
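
A minimal sketch of assigning a positive value to a log bucket in either base. The function name is hypothetical, and the two correction loops are there because floating point logarithms can land just below an exact power of the base.

```python
import math


def log_bucket(x, base=10):
    """Return the [lower, upper) power-of-base bucket that contains x."""
    if x <= 0:
        raise ValueError("log buckets are only defined for positive values")
    exponent = math.floor(math.log(x) / math.log(base))
    # Guard against floating point error at exact powers of the base.
    while base ** exponent > x:
        exponent -= 1
    while base ** (exponent + 1) <= x:
        exponent += 1
    return base ** exponent, base ** (exponent + 1)


decimal_bucket = log_bucket(42)          # falls in [10, 100)
binary_bucket = log_bucket(5, base=2)    # falls in [4, 8)
```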

Histograms: the compromise

                Log-linear:
                        0.1 , 0.15 , 0.2 , 0.25 , 0.3 , 0.35 , 0.4 , 0.45
                        0.5 ,        0.6 ,        0.7 ,        0.8 ,      0.9
                        1.0 , 1.5 , 2.0 , 2.5 , 3.0 , 3.5 , 4.0 , 4.5
                        5.0 ,        6.0 ,        7.0 ,        8.0 ,      9.0
                        10...
                I first witnessed this in DTrace courtesy of Bryan Cantrill
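
A sketch of the bucket boundaries listed above, assuming the pattern repeats in every decade: steps of half the decade's unit up to 5, then steps of one unit up to 10. It uses `Decimal` so that boundary values such as 0.15 land in the right bucket; this is an illustration of the scheme as the slide lists it, not the DTrace or Circonus implementation, and the function names are made up.

```python
from collections import Counter
from decimal import Decimal, ROUND_FLOOR


def log_linear_bucket(x):
    """Return the lower bound of the log-linear bucket that contains x."""
    d = Decimal(str(x))
    if d <= 0:
        raise ValueError("this sketch only handles positive values")
    exponent = d.adjusted()              # decade: 0.17 -> -1, 4.7 -> 0, 10 -> 1
    mantissa = d.scaleb(-exponent)       # value scaled into [1, 10)
    if mantissa < 5:
        # 1.0, 1.5, 2.0, ... 4.5
        lower = (mantissa * 2).to_integral_value(rounding=ROUND_FLOOR) / 2
    else:
        # 5, 6, 7, 8, 9
        lower = mantissa.to_integral_value(rounding=ROUND_FLOOR)
    return lower.scaleb(exponent)


def histogram(values):
    """Count values per log-linear bucket."""
    return Counter(log_linear_bucket(v) for v in values)


h = histogram([0.17, 0.18, 4.7, 7.3])
```

The resulting counter is small no matter how many samples go in, because the number of buckets per decade is fixed.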



Where are we now?
                For some data, we store simple statistical aggregations
                Other bits of data we use log-linear histograms
                        quite insightful, while
                        still small enough to store
                A single “five minute average” can look more like this:




Histograms 101
                This.
                This is a histogram.
                It shows the frequency of
                values within a population.
                Height represents frequency.
                Now, height and color represent frequency.
                Now, only color represents frequency.

Histograms ➠ time series
                This is a histogram
                where only color represents frequency,
                at a single time interval.

Histogramificationism




You’ve Been Deceived.

          1. Denial and Isolation     let’s just skip this one, okay?
          2. Anger                    If you’re not angry, get angry:
                                      it’s the next step in healing
          3. Bargaining               This is where it gets interesting.
          4. Depression
          5. Acceptance               We are still here...
                                      at least in the open source world.

Reconnoiter

                Reconnoiter:
                        https://labs.omniti.com/labs/reconnoiter
                        https://github.com/omniti-labs/reconnoiter


                Open source (BSD)




Reconnoiter: backend
                C (as God intended),
                with lua bindings to ease plugin development
                Java (external) bridge for making things like JMX easier
                Esper (Java) for streaming queries (fault detection)
                AMQP and REST for component intercommunication
                PostgreSQL backing store
                Very actively developed


Reconnoiter: frontend

                HTML + CSS
                JavaScript: jQuery, flot
                PHP for AJAX APIs


                Needs engineers.




Features (1)
                Data Collection:
                        most standard protocols (SNMP, REST, JDBC, etc.)
                        some nice bridges to legacy systems:
                        (nagios, munin, etc.)
                Data Storage:
                        supports sharding
                        never delete old data


Features (2)
                        We don’t do histograms now
                        Because all the old data is stored, we could
                          If we have enough data to make this worthwhile,
                          we will likely need an on-the-fly approach
                        We store raw data and rollups including:
                          average, cardinality, stddev, derivatives
                          (and counters)
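
The slides do not show how derivatives are computed. A hedged sketch of the usual approach for an ever-increasing counter, with a hypothetical function name and invented sample data, might look like this:

```python
def counter_rates(samples):
    """Turn (timestamp_s, counter_value) pairs, ascending by time, into per-second rates."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue   # skip duplicate or out-of-order timestamps
        if v1 < v0:
            continue   # the counter went backwards (reset or wrap); skip this interval
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


# Invented example: a counter read once a minute.
rates = counter_rates([(0, 100), (60, 700), (120, 1300)])
```

Skipping an interval on a counter reset is one possible policy; another is to treat the new value as the delta.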



Features (3)
                        separates data from visualization
                        drag-n-drop graphing
                        real-time data:
                          you can run checks “often” (every 100ms?)
                          you can see that data in the browser
                          you can compose a graph from disparate data



Active vs. Passive
                Reconnoiter supports passive metric submission
                        what version of code is deployed
                Reconnoiter’s design is focused on active collection
                        how many users have signed up
                        how many visitors to your site
                        how full is your disk
                        how many open bugs
                        how many hours spent

Bias
                Reconnoiter doesn’t solve all your problems
                (we’re busy trying to do that with Circonus)


                I believe Reconnoiter is the best tool available
                for heterogeneous, multi-datacenter, full-architecture
                monitoring.


                I am seriously biased.


You’ve Been Deceived.

          1. Denial and Isolation     let’s just skip this one, okay?
          2. Anger                    If you’re not angry, get angry:
                                      it’s the next step in healing
          3. Bargaining               This is where it gets interesting.
          4. Depression
          5. Acceptance               Here we are.

                             This is on you. Good luck.

Thank You!
          Questions?

Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Dernier (20)

Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

What's in a Number? Measuring System Behavior

  • 1. What’s in a Number? monitoring and measuring; the spiritual and the carnal. Thursday, July 12, 12
  • 2. Theo Schlossnagle @postwait Entrepreneur: OmniTI, Message Systems, Circonus. Author: Scalable Internet Architectures, Web Operations. Speaker: over 50 speaking engagements around the world. Engineer: Senior ACM Member, IEEE Member, ACM Queue. Open Source: Apache Traffic Server, Varnish, Node.js, node-iptrie, node-amqp, Solaris HTTP accept filters, OpenSSH + SecurID, JLog, Mungo, Portable umem, Reconnoiter, mod_log_spread/spreadlogd, Spread, Wackamole, Zetaback, concurrencykit, Net::RabbitMQ, MCrypt, node-gmp, node-compress, and others.
  • 3. Theo Schlossnagle @postwait: “irascible”
  • 4. Monitoring ~ Voyeurism. Watching from the outside in. Observing systems and their behaviors. Ultimately, a single observation distills into two things: a set of circumstances and a value.
  • 5. Circumstances: Dimensions. What are you looking at? Outbound octets on an Ethernet port (1/37) on a switch in rack 12, in datacenter 6, for client ACME. What time is it?
  • 6. Circumstances: Dimensions. What are you looking at? Length of time until DOM ready for a user loading a web page (/store/foo/item/bar) using Chrome 10 from Verizon FiOS in Fulton, Maryland. What time is it?
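The "circumstances plus a value" idea from the slides above can be sketched as a small record type. This is only an illustration: the `Observation` class, the `observe` helper, the dimension names, and the sample value and timestamp are all assumptions, not anything defined in the talk.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Observation:
    """One observation: its circumstances (dimensions and time) plus a value."""
    dimensions: Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs
    timestamp: float                         # "What time is it?"
    value: Union[str, float]                 # text or numeric


def observe(value, timestamp, **dimensions):
    """Hypothetical helper: build an Observation from keyword dimensions."""
    return Observation(tuple(sorted(dimensions.items())), timestamp, value)


# The switch-port example from slide 5; value and timestamp are made up.
octets = observe(
    123456789,
    1342051200.0,
    metric="outbound_octets",
    port="1/37",
    rack="12",
    datacenter="6",
    client="ACME",
)
print(dict(octets.dimensions)["rack"], octets.value)
```

Keeping the dimensions as a sorted tuple makes the record hashable, so identical circumstances can later be used as a dictionary key when aggregating.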
  • 7. Value. Ultimately, there are two types of values we care about: text (a version number, HTTP code, SSH fingerprint) and numeric (0.0000000000093123, 18446744064986053382, 1.1283e97, 6.33e-23, 0, -1, 1000, 42).
  • 8. Text Data. Incredibly useful, because the “when” can be correlated with other events and conditions. Idea: if the distinct set of text values is sufficiently small, consider the text value as a dimension, and the new value = 1.
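A minimal sketch of the "text value as a dimension" idea: the text value moves into the dimension set and the observation's value becomes 1, so occurrences can simply be summed. The function name `text_to_dimension`, the `host` dimension, and the sample HTTP codes are hypothetical.

```python
from collections import Counter


def text_to_dimension(metric, text_value, dimensions):
    """Fold a text value into the dimensions; the new numeric value is 1."""
    new_dimensions = dict(dimensions)
    new_dimensions[metric] = text_value
    return new_dimensions, 1


# Made-up stream of HTTP status codes observed on one host.
http_codes = ["200", "200", "404", "200", "500"]

counts = Counter()
for code in http_codes:
    dims, value = text_to_dimension("http_code", code, {"host": "web1"})
    counts[tuple(sorted(dims.items()))] += value

for key, total in sorted(counts.items()):
    print(dict(key), total)
```

This only stays manageable when the set of distinct text values is small, which is the condition the slide states.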
  • 9. Text Metrics in Use. Code deployments vs. process activity.
  • 10. Numeric Data. There tends to be a lot of it. Example, web hits: one box, one metric, 1 billion / month. That’s only 400 events/second. Each event has about 20 dimensions.
  • 11. Numeric Metrics in Use. 672px ≅ 4 weeks (2419200s), or 3600s ≅ 1px.
  • 12. Numeric Metrics in Use. 672px ≅ 4 weeks (2419200s), or 3600s ≅ 1px. In previous example: 24000 samples per ‘point’.
  • 13. Numeric Metrics in Use. 672px ≅ 4 weeks (2419200s), or 3600s ≅ 1px. In previous example: 24000 samples per ‘point’. Even at one minute samples: 60 samples per ‘point’.
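The arithmetic behind these figures can be checked in a few lines. The 30-day month is an assumption. The 24000 figure equals 400 events/s over 60 seconds, which suggests it counts events per one-minute point; that reading is an interpretation, not something the slides state.

```python
# Assumption: a 30-day month for the "1 billion / month" example.
SECONDS_PER_MONTH = 30 * 24 * 3600
events_per_second = 1_000_000_000 / SECONDS_PER_MONTH  # roughly 386, "only 400"

# A 672-pixel-wide graph covering four weeks.
GRAPH_WIDTH_PX = 672
WINDOW_SECONDS = 4 * 7 * 24 * 3600                   # 2419200 seconds
seconds_per_pixel = WINDOW_SECONDS / GRAPH_WIDTH_PX  # 3600 seconds per pixel

# Using the slide's rounded rate of 400 events per second.
ROUNDED_RATE = 400
events_per_minute = ROUNDED_RATE * 60                    # 24000
one_minute_samples_per_pixel = seconds_per_pixel / 60    # 60
events_per_pixel = ROUNDED_RATE * seconds_per_pixel      # raw events behind one pixel

print(round(events_per_second), seconds_per_pixel, events_per_minute,
      one_minute_samples_per_pixel, events_per_pixel)
```

Under that reading, each pixel on the four-week graph stands in for 60 one-minute samples, and each of those samples already summarizes thousands of raw events.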
  • 14. An Event. Each is a singularity: individual; unrepeatable; unique. mondolithic.com
  • 15. Passive vs. Active. Active is a synthesized event. You perform some action and measure it. Passive is the collection of other events. Stuff that really happened. Ask yourself which is better.
  • 16. Conceding the Truth. Passive events are ideal in high traffic environments. Active events are necessary in low traffic environments. “High and low” are relative, so... you need both, always.
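The distinction can be sketched in code: an active check synthesizes an event by performing an action and timing it, while a passive collector only records measurements of things that really happened. `active_check` and `PassiveCollector` are hypothetical names for illustration, not part of any tool mentioned in the talk.

```python
import time
from typing import Callable, List


def active_check(action: Callable[[], object]) -> float:
    """Active: perform some action ourselves and measure how long it took."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


class PassiveCollector:
    """Passive: collect measurements of events that happened on their own."""

    def __init__(self) -> None:
        self.latencies: List[float] = []

    def record(self, latency_seconds: float) -> None:
        self.latencies.append(latency_seconds)


# Active: works even when there is no traffic at all.
synthetic_latency = active_check(lambda: sum(range(1000)))

# Passive: the application reports what real requests experienced.
collector = PassiveCollector()
for observed in (0.120, 0.095, 0.310):
    collector.record(observed)

print(synthetic_latency >= 0, len(collector.latencies))
```

In a low-traffic environment the passive list stays nearly empty, which is why the active path is still needed.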
  • 17. Numbers: how they work. Think of a car’s speedometer. It shows you how fast you are going (sorta) at the moment you read it (almost). The only reason this is acceptable is: you have a rudimentary understanding of physics, you are in the car, and your body is exceptionally good at detecting acceleration.
  • 18. Computers are different. Computers are fast: billions of instructions per second. You aren’t inside the computer. Most “users” somehow forget rudimentary physics. Interfaces and visual representations mislead you. A speedometer (or familiar readout) leads you to a false confidence in system stability between samples.
  • 19. Write IOPS. Simple, relatively stable, easy to trend... and wrong.
  • 20. Zooming in shows insight. It is not as smooth as we thought. Good to know.
  • 21. Dishonesty Revealed. This is “inside” a 5 minute sample. Note: we’re still sampling; now at 0.5 Hz.
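A toy illustration of why a smooth five-minute number can be "wrong": two made-up write-IOPS traces with identical five-minute averages but very different behavior, and a 0.5 Hz sampler that either catches every burst or misses all of them depending on its phase. The traces are synthetic assumptions, not data from the talk.

```python
from statistics import mean

# Two hypothetical 300-second traces of write IOPS, one value per second.
steady = [100] * 300
bursty = [1000 if second % 10 == 0 else 0 for second in range(300)]

# The five-minute average cannot tell them apart.
print(mean(steady), mean(bursty))

# Sampling at 0.5 Hz means reading one value every 2 seconds.
samples_even = bursty[0::2]  # sampler happens to line up with the bursts
samples_odd = bursty[1::2]   # same rate, shifted by one second

print(max(samples_even), mean(samples_even))
print(max(samples_odd), mean(samples_odd))
```

The shifted sampler reports a completely idle disk, which is the point of the note that zooming in is still sampling.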
  • 22. You’ve Been Deceived. 1. Denial and Isolation 2. Anger 3. Bargaining 4. Depression 5. Acceptance
  • 23. You’ve Been Deceived. 1. Denial and Isolation (let’s just skip this one, okay?) 2. Anger 3. Bargaining 4. Depression 5. Acceptance
  • 24. You’ve Been Deceived. 1. Denial and Isolation (let’s just skip this one, okay?) 2. Anger (if you’re not angry, get angry: it’s the next step in healing) 3. Bargaining 4. Depression 5. Acceptance
  • 25. You’ve Been Deceived. 1. Denial and Isolation (let’s just skip this one, okay?) 2. Anger (if you’re not angry, get angry: it’s the next step in healing) 3. Bargaining (this is where it gets interesting) 4. Depression 5. Acceptance
  • 26. Bargaining. “I can’t reasonably store all the data.” Likely true. “Since I can’t store all the data, I have to store a summarization.” Yes; they’re called “aggregates.”
  • 27. Bargaining: you’re losing. You are sampling things and losing data (the speedometer example), or you are taking an average of a large set of samples.
  • 28. “Average” is a bad bargain. The average of a set of numbers is the single most useful statistic. By itself, it provides very little insight into the nature of the original set. Hello statistics!
  • 29. Rounding out your statistics. You should store, in order of importance: average; derivatives; cardinality; standard deviation; derivative stddev; covariance; derivative covariance; min, max; derivative min, max.
  • 30. Rounding out your statistics. You should store, in order of importance: average; derivatives; cardinality; standard deviation; derivative stddev; covariance; derivative covariance; min, max; derivative min, max. FWIW: we don’t store these at Circonus (today).
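A rough sketch of what storing such a rollup for one window might look like. The `rollup` function and its field names are hypothetical, the derivative is taken as the rate of change between consecutive samples, population standard deviation is an arbitrary choice, and covariance is left out because it needs a second series.

```python
from statistics import fmean, pstdev


def rollup(samples):
    """Summarize (timestamp, value) samples from one window into aggregates."""
    values = [value for _, value in samples]
    # Derivative: rate of change between consecutive samples.
    rates = [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    ]
    return {
        "average": fmean(values),
        "cardinality": len(values),
        "stddev": pstdev(values),
        "min": min(values),
        "max": max(values),
        "derivative_average": fmean(rates),
        "derivative_stddev": pstdev(rates),
        "derivative_min": min(rates),
        "derivative_max": max(rates),
    }


# Made-up samples: (seconds since window start, value).
window = [(0, 10.0), (60, 16.0), (120, 13.0), (180, 25.0)]
summary = rollup(window)
print(summary)
```

Nine numbers per window is far cheaper than the raw samples, but as the following slides argue, it still says little about the shape of the data.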
  • 31. What statistics give you. With the cardinality, average and stddev you can understand if a “reasonable” number of the data points are “near” the average. However, this is often misleading itself. We expect normal distributions or (more likely) gamma distributions.
  • 32. So, we can make do. No. Let’s revisit the five stages of grieving.
  • 33. You’ve Been Deceived. 1. Denial and Isolation (let’s just skip this one, okay?) 2. Anger (if you’re not angry, get angry: it’s the next step in healing) 3. Bargaining (this is where it gets interesting) 4. Depression 5. Acceptance
  • 34. You’ve Been Deceived. 1. Denial and Isolation (let’s just skip this one, okay?) 2. Anger (if you’re not angry, get angry: it’s the next step in healing) 3. Bargaining (this is where it gets interesting) 4. Depression 5. Acceptance. Here we are.
  • 35. Gamma Gamma Gamma. [Plot, x-axis 0 to 10.] Average: 0.72
  • 36. Gamma Gamma Gamma. [Plot, x-axis 0 to 10.]
  • 37. Gamma Gamma Gamma. [Plot, x-axis 0 to 10.] Average: 1.16
  • 38. Gamma Gamma Gamma. [Plot, x-axis 0 to 10.]
  • 40. Gamma Gamma Gamma. [Plot, x-axis 0 to 9.] Average: 4.67
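To get a feel for why an average alone says little about gamma-shaped data, one can draw samples and compare the mean with the median. The shape and scale parameters below are illustrative assumptions; they are not the parameters behind the plots on these slides.

```python
import random
from statistics import fmean, median


def gamma_samples(shape, scale, count=100_000, seed=42):
    """Draw gamma-distributed samples with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(count)]


# Hypothetical parameters; the theoretical mean is shape * scale.
for shape, scale in [(1.0, 1.0), (2.0, 2.0)]:
    data = gamma_samples(shape, scale)
    print(f"shape={shape} scale={scale} "
          f"mean={fmean(data):.2f} median={median(data):.2f} max={max(data):.2f}")
```

Because the distribution is skewed to the right, the mean sits above the median, so the "average" value is larger than what most of the samples actually look like.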
  • 41. Do we need more? Wait. Stop. In some systems we know enough. We know their distributions, ... the leather seat, the inertia of the vessel. A sample here and there, an average, etc. Good enough: temperature, load, payments, bugs. Not enough: disk IOPS, memory use, CPU time.
  • 42. We do need more. We have a lot of data sources from which we typically compose an average of many values, and understanding the distribution is extremely valuable.
  • 43. Histograms: the need. Given an average of 4 and a cardinality of 1000, maybe you had 1000 samples of exactly 4? Dunno. If you have a stddev of 1 and the data is normally distributed, about 68.2% of your data points will be between 3 and 5. This will never be as detailed as a histogram of the full dataset.
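The 68.2% figure is the share of a normal distribution within one standard deviation of the mean, and it only holds under that assumption. The sketch below computes it and then builds two made-up datasets with the same average and cardinality, one of which also has a stddev of 1 yet has no points near the average at all.

```python
import math
from statistics import fmean, pstdev

# Share of a normal distribution within one stddev of the mean.
within_one_sigma = math.erf(1 / math.sqrt(2))
print(f"{within_one_sigma:.4f}")

# Two hypothetical datasets: both have average 4 and cardinality 1000.
all_fours = [4.0] * 1000
split = [3.0] * 500 + [5.0] * 500

for name, data in [("all_fours", all_fours), ("split", split)]:
    near_average = sum(1 for v in data if abs(v - 4.0) < 0.5) / len(data)
    print(name, fmean(data), len(data), pstdev(data), near_average)
```

The `split` dataset matches "average 4, cardinality 1000, stddev 1" exactly, but none of its values fall within 0.5 of the average; only a histogram would show that.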
  • 44. Histograms: the want. Ideally, we’d want histograms bucketed as we see fit with all data represented. Too much data: this isn’t tractable. So, how do we bucket?
  • 45. Bucketing Basics. If I had numbers bounded by a small space, between 0ms and 10ms, I could bucket [0,1), [1,2), ... [9,10). Eleven buckets, fairly manageable. What do I do when I get 20? Or 200? Or 20000? What do I do when I get lots of data between 0.001 and 1?
  • 46. How we do more. We make an assumption (yes: ass: u & mption). We assume the same thing floating point assumes: the difference between “very large” numbers and the difference between “very small” numbers can have accuracy inversely proportional to their magnitude. We take this and build histograms.
  • 47. Solution: log buckets. Powers of ten: [0.001,0.01), [0.01,0.1), [0.1,1), [1,10), [10,100), [100,1000), [1000,10000), [10000,100000). Powers of two: [2^-10,2^-9), [2^-9,2^-8), [2^-8,2^-7), [2^-7,2^-6), [2^-6,2^-5), [2^-5,2^-4), [2^-4,2^-3), [2^-3,2^-2), [2^-2,2^-1), [2^-1,1), [1,2), [2,4), [4,8), [8,16), [16,32), [32,64), [64,128), [2^7,2^8), [2^9,2^10), [2^10,2^11), [2^11,2^12), [2^12,2^13), [2^13,2^14), [2^14,2^15). Engineers might like these, but humans do not.
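A minimal sketch of log bucketing, where a bucket is identified by its exponent rather than by a predefined range, so values like 20, 200, or 20000 need no special handling. The function names are hypothetical, and using `Decimal.adjusted()` for base ten and `math.frexp` for base two is just one way to avoid rounding trouble at bucket boundaries.

```python
import math
from collections import Counter
from decimal import Decimal


def log10_bucket(value):
    """Return e such that 10**e <= value < 10**(e + 1), for value > 0."""
    if value <= 0:
        raise ValueError("log buckets need a positive value")
    # adjusted() is the exponent of the most significant decimal digit.
    return Decimal(str(value)).adjusted()


def log2_bucket(value):
    """Return e such that 2**e <= value < 2**(e + 1), for value > 0."""
    if value <= 0:
        raise ValueError("log buckets need a positive value")
    # frexp gives value = m * 2**exponent with 0.5 <= m < 1.
    _, exponent = math.frexp(value)
    return exponent - 1


# Made-up latencies in milliseconds spanning many orders of magnitude.
latencies_ms = [0.005, 0.3, 1, 20, 200, 20000]
decades = Counter(log10_bucket(v) for v in latencies_ms)
octaves = Counter(log2_bucket(v) for v in latencies_ms)

for exponent in sorted(decades):
    print(f"[10^{exponent}, 10^{exponent + 1}): {decades[exponent]}")
```

Only buckets that actually receive data take up space, which is what keeps the histogram small despite the unbounded range.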
  • 48. Histograms: the compromise. Log-linear: 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10... I first witnessed this in DTrace, courtesy of Bryan Cantrill.
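The boundaries listed on this slide follow a pattern within each power of ten: steps of 0.5 (relative to the decade) below 5, and steps of 1 from 5 upward. The sketch below reproduces that pattern; it is inferred from the listed numbers only and is not the DTrace or Circonus implementation. `log_linear_bucket` is a hypothetical name, and `Decimal` is used to keep boundaries like 0.15 exact.

```python
from collections import Counter
from decimal import Decimal


def log_linear_bucket(value):
    """Return the lower bound of the log-linear bucket containing value (> 0)."""
    if value <= 0:
        raise ValueError("log-linear buckets need a positive value")
    number = Decimal(str(value))
    exponent = number.adjusted()          # decade: 10**exponent <= value
    mantissa = number.scaleb(-exponent)   # scaled into [1, 10)
    # Within a decade: steps of 0.5 below 5, steps of 1 from 5 upward.
    step = Decimal("0.5") if mantissa < 5 else Decimal("1")
    lower = (mantissa // step) * step
    return lower.scaleb(exponent)


# Made-up latencies in milliseconds.
latencies_ms = [0.17, 0.19, 0.55, 4.99, 7.3, 20000]
histogram = Counter(log_linear_bucket(v) for v in latencies_ms)

for lower_bound in sorted(histogram):
    print(lower_bound, histogram[lower_bound])
```

Each decade gets 13 buckets under this scheme, so the boundaries stay human-readable while the resolution still scales with magnitude.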
  • 49. Where are we now? For some data, we store simple statistical aggregations. For other bits of data we use log-linear histograms: quite insightful, while still small enough to store. A single “five minute average” can look more like this:
  • 50. Histograms 101. This. This is a histogram. It shows the frequency of values within a population. Height represents frequency.
  • 51. Histograms 101. This. This is a histogram. It shows the frequency of values within a population. Now, height and color represent frequency.
  • 52. Histograms 101. This. This is a histogram. It shows the frequency of values within a population. Now, only color represents frequency.
  • 53. Histograms 101. This. This is a histogram. It shows the frequency of values within a population. Now, only color represents frequency.
  • 54. Histograms ➠ time series. This. This is a histogram. It shows the frequency of values within a population. Now, only color represents frequency.
  • 55. Histograms ➠ time series. This. This is a histogram. It shows the frequency of values within a population. Now, only color represents frequency.
  • 56. Histograms ➠ time series. This. This is a histogram. It shows the frequency of values within a population. Now, only color represents frequency at a single time interval.
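The step from a single histogram to a time series of histograms can be sketched as one histogram per time interval, where each count would drive the color of one cell in a heatmap. The function name, the 300-second interval, and the fixed-width value buckets are assumptions made to keep the example short; log-linear buckets would slot into the same structure.

```python
from collections import Counter, defaultdict


def histogram_series(events, interval_seconds=300, bucket_width=1.0):
    """Group (timestamp, value) events into one histogram per time interval."""
    series = defaultdict(Counter)
    for timestamp, value in events:
        interval_start = int(timestamp // interval_seconds) * interval_seconds
        bucket = int(value // bucket_width) * bucket_width
        series[interval_start][bucket] += 1
    return series


# Made-up events: (seconds, latency in ms).
events = [(0, 1.2), (10, 1.7), (20, 4.1), (310, 1.1), (320, 9.5)]
series = histogram_series(events)

# One column per interval; each count would become a color intensity.
for interval_start in sorted(series):
    print(interval_start, dict(sorted(series[interval_start].items())))
```

Each time interval stores a handful of bucket counts instead of one average, which is what lets a graph show the distribution over time rather than a single line.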
  • 58. You’ve Been Deceived. 1. Denial and Isolation (let’s just skip this one, okay?) 2. Anger (if you’re not angry, get angry: it’s the next step in healing) 3. Bargaining (this is where it gets interesting) 4. Depression 5. Acceptance. We are still here... at least in the open source world.
  • 59. Reconnoiter: https://labs.omniti.com/labs/reconnoiter https://github.com/omniti-labs/reconnoiter Open source (BSD).
  • 60. Reconnoiter: backend. C (as God intended), with Lua bindings to ease plugin development. Java (external) bridge for making things like JMX easier. Esper (Java) for streaming queries (fault detection). AMQP and REST for component intercommunication. PostgreSQL backing store. Very actively developed.
  • 61. Reconnoiter: frontend. HTML + CSS. JavaScript: jQuery, flot. PHP for AJAX APIs. Needs engineers.
  • 62. Features (1). Data Collection: most standard protocols (SNMP, REST, JDBC, etc.); some nice bridges to legacy systems (Nagios, Munin, etc.). Data Storage: supports sharding; never delete old data.
  • 63. Features (2). We don’t do histograms now. Because all the old data is stored, we could. If we have enough data to make this worthwhile, we will likely need an on-the-fly approach. We store raw data and rollups including: average, cardinality, stddev, derivatives (and counters).
  • 64. Features (3). Separates data from visualization. Drag-n-drop graphing. Real-time data: you can run checks “often” (every 100ms?), you can see that data in the browser, you can compose a graph from disparate data.
  • 65. Active vs. Passive. Reconnoiter supports passive metric submission: what version of code is deployed. Reconnoiter’s design is focused on active collection: how many users have signed up, how many visitors to your site, how full is your disk, how many open bugs, how many hours spent.
  • 66. Bias. Reconnoiter doesn’t solve all your problems (we’re busy trying to do that with Circonus). I believe Reconnoiter is the best tool available for heterogeneous, multi-datacenter, full-architecture monitoring. I am seriously biased.
  • 67. You’ve Been Deceived. 1. Denial and Isolation (let’s just skip this one, okay?) 2. Anger (if you’re not angry, get angry: it’s the next step in healing) 3. Bargaining (this is where it gets interesting) 4. Depression 5. Acceptance. Here we are. This is on you. Good luck.
  • 68. Thank You! Questions?