Within the observability community, there’s a saying, “nines don’t matter if users aren’t happy,” meaning that 99.999% server uptime is a pointless goal if our customers aren’t having a fast, smooth, productive experience. But how do we know if users are happy? As members of the web performance community, we’ve been thinking about the best ways to answer that question for years. Now the observability community is asking the same questions, but coming at them from the opposite side of the stack. What can we learn from each other? Emily will talk about how approaching web performance through the lens of observability has changed the way her team thinks about performance instrumentation and optimization. She’ll cover the nuts & bolts of how Honeycomb instrumented its customer-facing web app, and she’ll show how the Honeycomb team is using this data to find and fix some of its trickiest performance issues, optimize customer productivity, and drive the design of new features.
6. [Map illustration: the communities drawn as islands (Site Reliability Engineering (SRE) Island, The Isle of Ops, Backend Engineer Island, Fullstackville, Frontend Island, Perf Island, The Isle of Browser Vendors, and the Expansive Domain of Designers & Product Managers), with Observability and Web Perf at opposite ends of the map]
7. Observability: An observable system is one whose internal state can be deeply understood just by observing its outputs.
13. Same deal with Web Perf & Observability folks
[Photos: a Web Perf practitioner and an Observability practitioner]
@solutionist | https://www.flickr.com/photos/solutionist/48227528782/
14. Observability: An observable system is one whose internal state can be deeply understood just by observing its outputs. Observability is a system property, just like performance.
16. Same deal with Web Perf & Observability folks
[Photos: a Web Perf practitioner and an Observability practitioner, annotated with traits both share]
@solutionist | https://www.flickr.com/photos/solutionist/48227528782/
• Worries about developer adoption
• Thinks these numbers look wrong
• Cares deeply about UX
• Not sure how to balance this, my real job, with what they think they pay me for
• Debating nine different ways to measure the same thing
• Lots of emotional energy going into some standards or spec process
• Obsessed with numbers
18. This talk
1. Talk about birds for five minutes
2. Data models
3. SLOs vs. performance budgets
4. Observability for perf optimization
5. Observability for UX design
21. Two communities, different superpowers
Web Perf:
• Sophisticated tooling, many tool types
• Amazing, mature developer experience
• Lots of experience improving the ecosystem through specs & new browser APIs
Observability:
• Focused on instrumentation best practices
• Goal is to enable answering any question about the state of your software
• Just starting on specs
26. High cardinality
Fields that may have many unique values
Common examples:
• email address
• username / user id / team id
• server hostname
• IP address
• user agent string
• build id
• request url
• feature flags / flag combinations
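To make "high cardinality" concrete, here is a minimal TypeScript sketch (the event shape and values are invented): a field's cardinality is just the number of distinct values it takes across your events.

// Hypothetical event shape; cardinality = number of distinct values.
type TelemetryEvent = Record<string, string | number>;

function cardinality(events: TelemetryEvent[], field: string): number {
  return new Set(events.map((e) => e[field])).size;
}

const events: TelemetryEvent[] = [
  { type: "page-load", user_id: "u-1042", status: 200 },
  { type: "page-load", user_id: "u-7731", status: 200 },
  { type: "page-load", user_id: "u-0009", status: 500 },
];

cardinality(events, "user_id"); // 3: grows with the user base (high cardinality)
cardinality(events, "status");  // 2: small, bounded set (low cardinality)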
27. What about the Three Pillars?
Logs, Metrics, Traces
207.46.1.2 - [03/Nov/2016:16:11:43 -0700] "GET /robots.txt HTTP/1.1"
29. Derive all Three Pillars from Events
1 (structured) log line ~= 1 event
metrics can be derived from events
traces = n events (spans) with parent/child relationships
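As a rough TypeScript sketch (the span shape here is an assumption, not Honeycomb's actual schema), deriving a metric and a trace from the same event stream might look like:

// Assumed span shape for illustration.
interface Span {
  id: string;
  parent_id?: string; // absent on the root span
  name: string;
  duration_ms: number;
}

// A metric is an aggregate over events, e.g. the median duration.
function medianDuration(events: Span[]): number {
  const sorted = events.map((e) => e.duration_ms).sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
}

// A trace is the same events reassembled via parent/child links.
function childrenOf(events: Span[], parentId: string): Span[] {
  return events.filter((e) => e.parent_id === parentId);
}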
36. When we create events (spans)
• On page load
• On history state change (SPA navigation)
• On significant user actions
• On error (also send to error monitoring tools)
• On page unload
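In browser terms, those hooks might look roughly like this (sendEvent is a hypothetical helper that queues an event for delivery):

declare function sendEvent(type: string, fields?: Record<string, unknown>): void;

window.addEventListener("load", () => sendEvent("page-load"));

// popstate only fires on back/forward; SPA routers typically also wrap
// history.pushState to catch forward navigations.
window.addEventListener("popstate", () => sendEvent("spa-navigation"));

// Also forward errors to error monitoring tools.
window.addEventListener("error", (e) => sendEvent("error", { message: e.message }));

// pagehide fires more reliably than unload, especially on mobile.
window.addEventListener("pagehide", () => sendEvent("page-unload"));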
37. What's in an event?
{
  // For the page load event, collect information about the page
  "type": "page-load",
  "duration": 1278,
  "device_type": "tablet",
  "connection_type": "3g",
  "user_agent": "Mozilla/5.0 (Macintosh)…",
  // ...all feature flag states
  // ...all navigation timing measurements
}
{
  // For the button click, collect information about the interaction
  "type": "usage-mode-button-click",
  "duration": 28,
  "location": "dataset list",
  "animation_render_duration": 127
}
47. SLOs, SLIs, SLAs
For each facet of system performance (latency, errors, etc.) ask:
• SLI: Service Level Indicator — what do we measure?
○ response time of web app requests
• SLA: Service Level Agreement — what did we promise our customers?
○ response time will be under 10 seconds, 99% of the time
• SLO: Service Level Objective — what number would keep our users happy?
○ response time should be under 1 second, 99.9% of the time
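As a toy worked example (numbers invented), checking that SLO against a window of measured response times is just a ratio:

// Invented sample: does this window meet "under 1 second, 99.9% of the time"?
const responseTimesMs = [312, 480, 1250, 97, 640];

const withinTarget = responseTimesMs.filter((ms) => ms < 1000).length;
const sli = withinTarget / responseTimesMs.length; // 0.8 for this sample
const sloMet = sli >= 0.999;                       // false: one request blew the target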
49. Two tools, different superpowers
Performance Budgets:
• Many easy ways to get started
• Great tooling support (webpack, Lighthouse)
• Easy to understand
• Business stakeholders might not care
SLOs (Service Level Objectives):
• Extremely flexible: use any number you measure!
• Get burn alerts when your budget runs low
• Requires reading multiple book chapters to understand
• You get business stakeholder buy-in up front
50. 5. Observability for Perf Optimization
Using observability to make things faster
56. High-performance browser instrumentation code:
1. Batch requests together so you don’t run down battery & use up resources
2. Use the Beacon API to send events in a non-blocking way
3. Use `requestIdleCallback` or `setTimeout` to handle slower calculations
Don’t shoot yourself in the foot
while trying to look at your own foot
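Put together, the three tips above might look something like this sketch (the endpoint and flush policy are assumptions, not Honeycomb's actual client code):

const ENDPOINT = "/telemetry"; // hypothetical collection endpoint
const queue: object[] = [];

// Tip 3: defer slower work to idle time, falling back to setTimeout.
const whenIdle: (cb: () => void) => void =
  "requestIdleCallback" in window
    ? (cb) => window.requestIdleCallback(() => cb())
    : (cb) => setTimeout(cb, 0);

// Tip 1: batch events in memory instead of sending one request apiece.
function enqueue(event: object) {
  whenIdle(() => queue.push(event)); // expensive field computation goes here too
}

// Tip 2: flush the whole batch with the non-blocking Beacon API.
function flush() {
  if (queue.length === 0) return;
  navigator.sendBeacon(ENDPOINT, JSON.stringify(queue.splice(0)));
}

// Flush when the page is hidden or unloading, not on every event.
document.addEventListener("visibilitychange", () => {
  if (document.visibilityState === "hidden") flush();
});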
70. How to tell a hawk from a falcon
Red-Tailed Hawk
@hmclin | https://www.flickr.com/photos/hmclin/14119319574
Peregrine Falcon
@zonotrichia | https://www.flickr.com/photos/zonotrichia/31001823086