Matt Carroll
Infrastructure Security Engineer at Yelp
"Attestation is hard" is something you might hear from security researchers tracking nation states and APTs, but it holds true for most network-connected systems!
Modern deployment methodologies mean that disparate teams create workloads for shared worker-hosts (ranging from Jenkins to Kubernetes and all the other orchestrators and CI tools in between). At any given moment your hosts could be running any one of a number of services, connecting to who-knows-what on the internet.
So when your network-based intrusion detection system (IDS) opaquely declares that one of these machines has made an "anomalous" network connection, how do you even determine whether it's business as usual? Sure, you can log on to the host to try to figure it out, but (in case you hadn't noticed) computers are pretty fast these days, and once the connection is closed it might as well not have happened... assuming it wasn't actually a reverse shell...
At Yelp we turned to the Linux kernel to tell us whodunit! Utilizing the Linux kernel's eBPF subsystem - an in-kernel VM with syscall hooking capabilities - we're able to aggregate metadata about the calling process tree for any internet-bound TCP connection by filtering IPs and ports in-kernel and enriching with process tree information in userland. The result is "pidtree-bcc": a supplementary IDS. Now whenever there's an alert for a suspicious connection, we just search for it in our SIEM (spoiler alert: it's nearly always an engineer doing something "innovative")! And the cherry on top? It's stupid fast with negligible overhead, creating a much higher signal-to-noise ratio than the kernel's firehose-like audit subsystems.
This talk will look at how you can tune the signal-to-noise ratio of your IDS by making it reflect your business logic and common usage patterns, get more work done by reducing MTTR for false positives, use eBPF and the kernel to do all the hard work for you, accidentally load test your new IDS by not filtering all RFC-1918 addresses, and abuse Docker to get to production ASAP!
As well as looking at some of the technologies the kernel puts at your disposal, this talk will trace pidtree-bcc's road from hackathon project to production system, and how a focus on demonstrating business value early on earned us organizational buy-in to build and deploy a brand-new project from scratch.
3. Who am I?
● Matt Carroll
○ @grimmware
○ github.com/oholiab
● Infrastructure Security Engineer at Yelp
● Ex-SRE (like a sysadmin but with more YAML)
● Hand-wringing Linux botherer
4. What is this about?
● We built a supplementary* IDS and it’s pretty cool!
● Utilizing OS features as security features
● Told in (roughly) the order it happened.
5. What is this about?
● How to get a greenfield security project off the ground
○ Treating defensive security like economics
○ Gluing together extant technologies to bootstrap custom security tools
○ Using your business logic to maximize signal vs. noise
15. Attestation From Inference
● What host class connected?
● What IP/ASN did it connect to?
● What’s on the other end?
● How long was the connection?
● What direction?
● How many bytes were transferred?
● What did the pslogs say?
16. Attestation From Inference
● What host class connected?
● What IP/ASN did it connect to?
● What’s on the other end?
● How long was the connection?
● What direction?
● How many bytes were transferred?
● What did the pslogs say?
lol jk
18. What if we could reduce MTTR for false positives?
19. The problem space
● When a GuardDuty alert fires I want to be able to determine if it’s a false positive quickly
● Only for GuardDuty traffic (not internal to our VPCs)
● Only for outbound TCP (i.e. non-RFC1918)
● I want the entire calling process tree so I can see full local causality
● Include process ownership information
● Must not require workload tooling
22. BPF
● “Berkeley Packet Filter” from BSD
● An in-kernel VM accessed as a device (/dev/bpf)
● Limited number of registers
● No loops (to prevent kernel deadlocking)
● Used for packet filtering
23. eBPF
● An in-kernel VM in Linux (and now FreeBSD!)
● It’s “extended”!
● Moar registers than BPF
● Used for hooking syscalls, tracing, proxying sockets, and (you guessed it) in-kernel packet filtering
○ Can actually offload to some NICs!
● In our case, attaching a kprobe to the tcp_v4_connect kernel function
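A minimal BCC-style sketch of that last bullet (illustrative only, not pidtree-bcc's actual source; with BCC, a C function named `kprobe__<name>` auto-attaches to the kernel function `<name>` when loaded):

```python
# Sketch: a kprobe on tcp_v4_connect, in the style of BCC tools.
# The C program below runs in-kernel; loading it needs root plus the
# BCC toolchain, so the load step is left as a comment.
BPF_PROGRAM = r"""
#include <net/sock.h>

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    // A real filter would inspect the destination address here and drop
    // uninteresting (e.g. RFC1918) destinations before waking userland.
    bpf_trace_printk("tcp_v4_connect by pid %d\\n", pid);
    return 0;
}
"""

# To load and watch the global trace pipe (root + BCC required):
#   from bcc import BPF
#   BPF(text=BPF_PROGRAM).trace_print()
```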
34. ● Filters in-kernel, from Jinja2 templates which iterate over subnets in YAML configuration
● Events that don’t get filtered out are passed to a userland Python daemon
● psutil used to crawl the process tree to init and log it alongside other metadata
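The enrichment step can be sketched as follows. pidtree-bcc uses psutil for this; the stdlib-only stand-in below (the name `crawl_to_init` is illustrative) walks parent PIDs via /proc, so it is Linux-specific:

```python
# Walk the calling process tree from a PID up to init (PID 1), the
# userland enrichment pidtree-bcc logs alongside each connection event.
import os

def crawl_to_init(pid):
    """Return [(pid, comm), ...] from `pid` up to PID 1."""
    tree = []
    while pid > 0:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
        # comm is parenthesised and may itself contain ')': split on the last one
        comm = stat[stat.index("(") + 1 : stat.rindex(")")]
        ppid = int(stat[stat.rindex(")") + 2 :].split()[1])  # stat(5) field 4
        tree.append((pid, comm))
        pid = ppid  # PID 1's parent is reported as 0, which ends the walk
    return tree

print(crawl_to_init(os.getpid()))
```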
36. Except it was a hackathon project, so all it did was print events to stdout, it could only match classful networks, and I developed it on my personal laptop.
39. Matching all CIDRs
● I realised that only the classful networks worked because they fall on byte boundaries
● Don’t try to do clever bitwise shifting with the mask length
● Endianness and byte ordering between network and host don’t work how you think they do
● No srs
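To illustrate the gotcha (a stand-alone sketch, not pidtree-bcc's code): the address a kprobe reads out of a socket struct is in network (big-endian) byte order, so a mask built from host-order integers matches the wrong bits on non-byte-boundary prefixes unless you keep everything in one byte order:

```python
import socket
import struct

addr = socket.inet_aton("10.20.30.40")  # 4 raw bytes, network order

# The same 4 bytes interpreted big-endian vs little-endian:
net = struct.unpack(">I", addr)[0]  # 0x0A141E28
le = struct.unpack("<I", addr)[0]   # 0x281E140A - not what a host-order mask expects!

def in_subnet(ip, network, prefix):
    """CIDR match done consistently in network byte order."""
    ip_u32 = struct.unpack(">I", socket.inet_aton(ip))[0]
    net_u32 = struct.unpack(">I", socket.inet_aton(network))[0]
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return (ip_u32 & mask) == (net_u32 & mask)

assert in_subnet("10.20.30.40", "10.16.0.0", 12)    # inside 10.16.0.0/12
assert not in_subnet("10.32.0.1", "10.16.0.0", 12)  # /12 isn't a byte boundary
```

Classful networks (/8, /16, /24) happen to work in either byte order because their masks are whole bytes, which is exactly why the hackathon version only matched those.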
40. Dockerizing for debugging
● A coworker was trying to figure out which batch jobs were accessing a service for a data auth project
● He asked me if we could match ports
● I said I’d have it:
○ Matching ports
○ Dockerized for ad-hoc usage
○ By the next day
● The next day he found all the unauthenticated clients.
41. pidtree-bcc in Docker
● Contains Python 2.7 and dependencies (sorry)
● Needs some setup at runtime
● Volume-mount /etc/passwd for UID mapping
● Not your typical flags:
○ --privileged
○ --cap-add sys_admin
○ --pid host
● Don’t worry, I am a professional, probably.
42. Opportunistic deploy with Docker
● We run our own PaaS called PaaSTA which uses Docker as its containerizer
● Runs the vast majority of our workloads
● Can pull-from-registry and run in a systemd unit file without further setup
● Don’t have to install dependencies (inc. LLVM, python2)
● Get coverage quickly
43. Log aggregation
● Previous projects with goaudit meant we already had a secure logging pipeline for reading a FIFO and outputting to Amazon Kinesis
○ syslog2kinesis adds other Yelpy metadata (e.g. hostname, environment, Puppet role...)
● Originally fed to our Logstash => Elasticsearch SIEM
● Migrated to Kinesis Firehose => Splunk this quarter <3
44. Dip Test
● Better to ask forgiveness than permission...
● Rolled out to two security devboxes and watched the logs roll in!
● Negligible performance impact!!!
○ As postulated, the cost of subnet filtering << the cost of instantiating a TCP connection
● Lots of connections out to public Amazon IPs creating a lot of noise
45. If only Amazon maintained some kind of list of their public prefixes...
48. Surely you can’t load ~200 netblocks into the kernel and compare all non-RFC1918 tcp_v4_connect calls to them in a performant manner...
51. Load
● ~25,000 - ~50,000 messages per hour across dev and stage
● Once accidentally load-tested at ~80,000 messages in 5 minutes from one host, for several hours
● Nobody on the host noticed
● TCP connections are way more expensive than the filters!
52. Undoing my nasty hacks
● bpf_trace_printk() -> BPF_PERF_OUTPUT()
○ From global (i.e. per-kernel) debug output with hand-rolled JSON and string manipulation
○ To structured data in a ring buffer
○ Multi-tenancy makes it a better utility and more testable!
● Added unit tests
● Adding integration tests
● Adding infrastructure for deploy in the production environment
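The bpf_trace_printk() -> BPF_PERF_OUTPUT() move can be sketched like this (the struct layout and names are illustrative, not pidtree-bcc's actual source; loading requires root plus the BCC toolchain):

```python
# Sketch: emitting structured events through a per-program perf ring
# buffer instead of the single global trace pipe.
BPF_PROGRAM = r"""
#include <linux/in.h>
#include <net/sock.h>

struct connect_event_t {
    u32 pid;
    u32 daddr;   // destination IPv4, network byte order
    u16 dport;   // destination port, network byte order
};
BPF_PERF_OUTPUT(events);   // per-program ring buffer, not the global trace pipe

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk,
                           struct sockaddr *uaddr)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
    struct connect_event_t evt = {};
    evt.pid = bpf_get_current_pid_tgid() >> 32;
    evt.daddr = sin->sin_addr.s_addr;
    evt.dport = sin->sin_port;
    events.perf_submit(ctx, &evt, sizeof(evt));
    return 0;
}
"""

# Userland side (root + BCC required):
#   from bcc import BPF
#   b = BPF(text=BPF_PROGRAM)
#   b["events"].open_perf_buffer(lambda cpu, data, size: ...)
#   while True:
#       b.perf_buffer_poll()
```

Because each consumer opens its own perf buffer, several tools can trace concurrently without trampling one shared text stream, which is what makes the multi-tenancy and testing story better.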
53. Future work
● De-containerize (e.g. ship a Debian package)
● Python 3
● Plugin for container awareness
○ Easy mapping to service, and therefore owner!
● Enable immutable loginuid and add that to the metadata
○ --loginuid-immutable under `man auditctl`
○ Cryptically says “but can cause some problems in certain kinds of containers”
● Threat modelling/hardening!