It’s impossible to understand the health of your system without real-time logging. Our logging platform is one of our favorite and most popular features, and it currently handles millions of requests and gigabytes of traffic per second. It’s stateless, real-time, and can provide insight and ship data to a multitude of destinations. Fastly has iterated continuously to keep up with the growth of the platform, and we’ve learned many lessons along the way. In this talk, you’ll get a peek into the system and how we’ve developed it.
12. But Why?
This pipeline is one of the oldest systems at Fastly
Born out of our dissatisfaction with the status quo
We wanted something that would send you logs extremely fast (streamed in near real-time), anywhere you want (many endpoints)
22. Logging pipeline is Stateless
We don’t batch your logs
We don’t store your logs
We stream your logs in near real-time to your defined endpoints
We really don’t want your logs on disk
27. Logging pipeline is Best Effort
We try our best to send logs to your defined endpoint
Your endpoint must be up & healthy in order for us to be able to send data to it
We have minimal buffering
Pipeline optimized for log streaming speed
28. Logging Endpoints
We don’t limit the number of endpoints or log lines per request
~8.6K active endpoints
Ecosystem of endpoints in different stages of evolution
Aggregators → endpoints: s3, syslog, gcs, sumologic, bigquery, ftp, papertrail, …
36. Logging Endpoints
We send a lot of data continuously to our supported endpoints
Syslog continues to be our most popular endpoint, but S3 & GCS have the highest volume
The ’70s are still alive with a very respectable 13 MBps to ftp and 74 kBps to sftp*
* for the non-millennials
39. Volume Challenges
No hard limits on what you can log; this can be challenging
The system is multi-tenant, so noisy neighbors can affect delivery
Consider sampling for high-volume logging
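One way to act on the sampling advice above is deterministic hash-based sampling. This is a minimal sketch, not Fastly’s implementation; the 1% rate and the SHA-256 scheme are our assumptions. Hashing a request ID instead of rolling a random number means all log lines for a sampled request are kept together:

```python
import hashlib

def sampled(request_id: str, rate: float = 0.01) -> bool:
    """Deterministically keep roughly `rate` of requests.

    Hashing the request ID (instead of calling random()) means every
    log line for a sampled request is kept, so requests stay whole.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare to the rate.
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate

# With a uniform hash, ~1% of 100,000 synthetic request IDs survive.
kept = [rid for rid in (f"req-{i}" for i in range(100_000)) if sampled(rid)]
print(len(kept))
```

Because the decision is a pure function of the request ID, the same request is sampled identically on every node, with no coordination.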
40. Burden of Many Endpoints
Classic integration challenges (each endpoint is a downstream dependency)
Standard endpoint clients often don’t meet our needs
Having our own clients affords us extra optimizations
41. Endpoints & Health
Some endpoints have known limitations (infamous examples: S3, BigQuery, GCS)
Difficult to infer whether an endpoint is working or not (hard to test a setup, too)
Structured logging (JSON via VCL) is challenging
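To see why JSON built by string interpolation (as hand-written VCL log formats tend to be) is challenging: any field value containing a quote silently produces an invalid line. A small Python illustration of the failure mode, with a made-up header value:

```python
import json

user_agent = 'Mozilla/5.0 "compatible"'  # quotes are common in real headers

# Naive interpolation, roughly what a hand-built JSON log format looks like:
naive = '{"ua": "%s"}' % user_agent
try:
    json.loads(naive)
except json.JSONDecodeError:
    print("naive line is not valid JSON")

# Escaping every field (json.dumps here; VCL has its own escaping helpers)
# keeps the line parseable:
safe = '{"ua": %s}' % json.dumps(user_agent)
print(json.loads(safe)["ua"])
```

The takeaway is that every interpolated field needs escaping, which is easy to forget when the format string is assembled by hand.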
42. Service Isolation
Prioritize delivery of content over log retention
An aggregator discards the oldest logs it has when it can’t deliver them fast enough
On a cache node we are our own customers, so senders do the same when they can’t reach aggregators fast enough
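The “discard the oldest logs” behavior described above, on both senders and aggregators, can be sketched as a bounded drop-oldest buffer. This is our simplified sketch, not the pipeline’s actual data structure, and the capacity is arbitrary:

```python
from collections import deque

class DropOldestBuffer:
    """Bounded buffer that evicts the oldest line when full, so a slow
    downstream can never make the buffer grow without bound."""

    def __init__(self, capacity: int):
        self.lines = deque(maxlen=capacity)  # deque drops from the left when full
        self.dropped = 0

    def push(self, line: str) -> None:
        if len(self.lines) == self.lines.maxlen:
            self.dropped += 1  # the oldest line is about to be evicted
        self.lines.append(line)

    def drain(self):
        while self.lines:
            yield self.lines.popleft()

buf = DropOldestBuffer(capacity=3)
for i in range(5):  # downstream too slow: 5 lines arrive before any drain
    buf.push(f"line-{i}")
print(list(buf.drain()), "dropped:", buf.dropped)
# → ['line-2', 'line-3', 'line-4'] dropped: 2
```

Dropping oldest-first favors freshness: when delivery resumes, the endpoint sees the most recent lines rather than a backlog of stale ones, which matches a near-real-time pipeline.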
43. Expectation Mismatch
The burden of a system that works so well is that it makes you believe you have strong guarantees
Design constraints determine the SLA of the pipeline
General advice: understand the design choices of the systems you use, because they limit what is possible to guarantee
45. The team has been busy bees
H1: Platform performance & addressing the challenges of individual endpoints
H2: We are getting fancy!
46. Platform Performance
Reducing lock contention & CPU usage
Smarter memory allocation & management
Overhauling all endpoints
Halving the time it takes for a log line to be processed (from sender read to aggregator line preparation)
51. Want More?
Want more endpoints?
Want metrics?
Want easier structured logging?
Want VCL counters + secondly aggregation + a higher SLA?
Dom Fee
53. tl;dr LOGGING
Fastly lets you extend the visibility of your system to the edge & gain meaningful insights in near real-time
It is a pipeline with very specific constraints & guarantees
Exciting things are coming!