Matt Carroll
Infrastructure Security Engineer at Yelp
"Attestation is hard" is something you might hear from security researchers tracking nation states and APTs, but it holds true for most network-connected systems!
Modern deployment methodologies mean that disparate teams create workloads for shared worker-hosts (ranging from Jenkins to Kubernetes and all the other orchestrators and CI tools in between). At any given moment your hosts could be running any one of a number of services, connecting to who-knows-what on the internet.
So when your network-based intrusion detection system (IDS) opaquely declares that one of these machines has made an "anomalous" network connection, how do you even determine whether it's business as usual? Sure, you can log on to the host to try to figure it out, but (in case you hadn't noticed) computers are pretty fast these days, and once the connection is closed it might as well not have happened... assuming it wasn't actually a reverse shell...
At Yelp we turned to the Linux kernel to tell us whodunit! Utilizing the Linux kernel's eBPF subsystem - an in-kernel VM with syscall hooking capabilities - we're able to aggregate metadata about the calling process tree for any internet-bound TCP connection by filtering IPs and ports in-kernel and enriching with process tree information in userland. The result is "pidtree-bcc": a supplementary IDS. Now whenever there's an alert for a suspicious connection, we just search for it in our SIEM (spoiler alert: it's nearly always an engineer doing something "innovative")! And the cherry on top? It's stupid fast with negligible overhead, creating a much higher signal-to-noise ratio than the kernel's firehose-like audit subsystems.
This talk will look at how you can tune the signal-to-noise ratio of your IDS by making it reflect your business logic and common usage patterns, get more work done by reducing MTTR for false positives, use eBPF and the kernel to do all the hard work for you, accidentally load test your new IDS by not filtering all RFC-1918 addresses, and abuse Docker to get to production ASAP!
As well as looking at some of the technologies the kernel puts at your disposal, this talk will trace pidtree-bcc's road from hackathon project to production system, and how a focus on demonstrating business value early on earned us organizational buy-in to build and deploy a brand-new project from scratch.
3. Who am I?
● Matt Carroll
○ @grimmware
○ github.com/oholiab
● Infrastructure Security Engineer at Yelp
● Ex-SRE (like a sysadmin but with more YAML)
● Hand-wringing Linux botherer
4. What is this about?
● We built a supplementary* IDS and it’s pretty cool!
● Utilizing OS features as security features
● Told in (roughly) the order it happened.
5. What is this about?
● How to get a greenfield security project off the ground
○ Treating defensive security like economics
○ Gluing together extant technologies to bootstrap custom security tools
○ Using your business logic to maximize signal vs. noise
15. Attestation From Inference
● What host class connected?
● What IP/ASN did it connect to?
● What’s on the other end?
● How long was the connection?
● What direction?
● How many bytes were transferred?
● What did the pslogs say?
16. Attestation From Inference
● What host class connected?
● What IP/ASN did it connect to?
● What’s on the other end?
● How long was the connection?
● What direction?
● How many bytes were transferred?
● What did the pslogs say?
lol jk
18. What if we could reduce MTTR for false positives?
19. The problem space
● When a GuardDuty alert fires I want to be able to determine if it’s a false positive quickly
● Only for GuardDuty traffic (not internal to our VPCs)
● Only for outbound TCP (i.e. non-RFC1918)
● I want the entire calling process tree so I can see full local causality
● Include process ownership information
● Must not require workload tooling
22. BPF
● “Berkeley Packet Filter” from BSD
● An in-kernel VM accessed as a device (/dev/bpf)
● Limited number of registers
● No loops (to prevent kernel deadlocking)
● Used for packet filtering
23. eBPF
● An in-kernel VM in Linux (and now FreeBSD!)
● It’s “extended”!
● Moar registers than BPF
● Used for hooking syscalls, tracing, proxying sockets, and (you guessed it) in-kernel packet filtering
○ Can actually offload to some NICs!
● In our case, attaching a kprobe to the tcp_v4_connect kernel function
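A minimal BCC-style sketch of that last bullet (illustrative only, not pidtree-bcc's actual source; with BCC, a C function named `kprobe__<name>` auto-attaches to the kernel function `<name>` when loaded):

```python
# Sketch: a kprobe on tcp_v4_connect, in the style of BCC tools.
# The C program below runs in-kernel; loading it needs root plus the
# BCC toolchain, so the load step is left as a comment.
BPF_PROGRAM = r"""
#include <net/sock.h>

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    // A real filter would inspect the destination address here and drop
    // uninteresting (e.g. RFC1918) destinations before waking userland.
    bpf_trace_printk("tcp_v4_connect by pid %d\\n", pid);
    return 0;
}
"""

# To load and watch the global trace pipe (root + BCC required):
#   from bcc import BPF
#   BPF(text=BPF_PROGRAM).trace_print()
```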
34. ● Filters in-kernel, from Jinja2 templates which iterate over subnets in YAML configuration
● Events that don’t get filtered out are passed to a userland Python daemon
● psutil used to crawl the process tree to init and log it alongside other metadata
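The enrichment step can be sketched as follows. pidtree-bcc uses psutil for this; the stdlib-only stand-in below (the name `crawl_to_init` is illustrative) walks parent PIDs via /proc, so it is Linux-specific:

```python
# Walk the calling process tree from a PID up to init (PID 1), the
# userland enrichment pidtree-bcc logs alongside each connection event.
import os

def crawl_to_init(pid):
    """Return [(pid, comm), ...] from `pid` up to PID 1."""
    tree = []
    while pid > 0:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
        # comm is parenthesised and may itself contain ')': split on the last one
        comm = stat[stat.index("(") + 1 : stat.rindex(")")]
        ppid = int(stat[stat.rindex(")") + 2 :].split()[1])  # stat(5) field 4
        tree.append((pid, comm))
        pid = ppid  # PID 1's parent is reported as 0, which ends the walk
    return tree

print(crawl_to_init(os.getpid()))
```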
36. Except it was a hackathon project, so all it did was print events to stdout, it could only match classful networks, and I developed it on my personal laptop.
39. Matching all CIDRs
● I realised that only the classful networks worked because they fall on byte boundaries
● Don’t try to do clever bitwise shifting with the mask length
● Endianness and byte ordering between network and host don’t work how you think they do
● No srs
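To illustrate the gotcha (a stand-alone sketch, not pidtree-bcc's code): the address a kprobe reads out of a socket struct is in network (big-endian) byte order, so a mask built from host-order integers matches the wrong bits on non-byte-boundary prefixes unless you keep everything in one byte order:

```python
import socket
import struct

addr = socket.inet_aton("10.20.30.40")  # 4 raw bytes, network order

# The same 4 bytes interpreted big-endian vs little-endian:
net = struct.unpack(">I", addr)[0]  # 0x0A141E28
le = struct.unpack("<I", addr)[0]   # 0x281E140A - not what a host-order mask expects!

def in_subnet(ip, network, prefix):
    """CIDR match done consistently in network byte order."""
    ip_u32 = struct.unpack(">I", socket.inet_aton(ip))[0]
    net_u32 = struct.unpack(">I", socket.inet_aton(network))[0]
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return (ip_u32 & mask) == (net_u32 & mask)

assert in_subnet("10.20.30.40", "10.16.0.0", 12)    # inside 10.16.0.0/12
assert not in_subnet("10.32.0.1", "10.16.0.0", 12)  # /12 isn't a byte boundary
```

Classful networks (/8, /16, /24) happen to work in either byte order because their masks are whole bytes, which is exactly why the hackathon version only matched those.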
40. Dockerizing for debugging
● A coworker was trying to figure out which batch jobs were accessing a service for a data auth project
● He asked me if we could match ports
● I said I’d have it:
○ Matching ports
○ Dockerized for ad-hoc usage
○ By the next day
● The next day he found all the unauthenticated clients.
41. pidtree-bcc in Docker
● Contains Python 2.7 and dependencies (sorry)
● Needs some setup at runtime
● Volume-mount /etc/passwd for UID mapping
● Not your typical flags:
○ --privileged
○ --cap-add sys_admin
○ --pid host
● Don’t worry, I am a professional, probably.
42. Opportunistic deploy with Docker
● We run our own PaaS called PaaSTA which uses Docker as its containerizer
● Runs the vast majority of our workloads
● Can pull-from-registry and run in a systemd unit file without further setup
● Don’t have to install dependencies (inc. LLVM, python2)
● Get coverage quickly
43. Log aggregation
● Previous projects with goaudit meant we already had a secure logging pipeline for reading a FIFO and outputting to Amazon Kinesis
○ syslog2kinesis adds other Yelpy metadata (e.g. hostname, environment, Puppet role...)
● Originally fed to our Logstash => Elasticsearch SIEM
● Migrated to Kinesis Firehose => Splunk this quarter <3
44. Dip Test
● Better to ask forgiveness than permission...
● Rolled out to two security devboxes and watched the logs roll in!
● Negligible performance impact!!!
○ As postulated, the cost of subnet filtering << the cost of instantiating a TCP connection
● Lots of connections out to public Amazon IPs creating a lot of noise
45. If only Amazon maintained some kind of list of their public prefixes...
48. Surely you can’t load ~200 netblocks into the kernel and compare all non-RFC1918 tcp_v4_connect calls to them in a performant manner...
51. Load
● ~25,000 - ~50,000 messages per hour across dev and stage
● Once accidentally load-tested at ~80,000 messages in 5 minutes from one host, for several hours
● Nobody on the host noticed
● TCP connections are way more expensive than the filters!
52. Undoing my nasty hacks
● bpf_trace_printk() -> BPF_PERF_OUTPUT()
○ From global (i.e. per-kernel) debug output with hand-rolled JSON and string manipulation
○ To structured data in a ring buffer
○ Multi-tenancy makes it a better utility and more testable!
● Added unit tests
● Adding integration tests
● Adding infrastructure for deploy in the production environment
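The bpf_trace_printk() -> BPF_PERF_OUTPUT() move can be sketched like this (the struct layout and names are illustrative, not pidtree-bcc's actual source; loading requires root plus the BCC toolchain):

```python
# Sketch: emitting structured events through a per-program perf ring
# buffer instead of the single global trace pipe.
BPF_PROGRAM = r"""
#include <linux/in.h>
#include <net/sock.h>

struct connect_event_t {
    u32 pid;
    u32 daddr;   // destination IPv4, network byte order
    u16 dport;   // destination port, network byte order
};
BPF_PERF_OUTPUT(events);   // per-program ring buffer, not the global trace pipe

int kprobe__tcp_v4_connect(struct pt_regs *ctx, struct sock *sk,
                           struct sockaddr *uaddr)
{
    struct sockaddr_in *sin = (struct sockaddr_in *)uaddr;
    struct connect_event_t evt = {};
    evt.pid = bpf_get_current_pid_tgid() >> 32;
    evt.daddr = sin->sin_addr.s_addr;
    evt.dport = sin->sin_port;
    events.perf_submit(ctx, &evt, sizeof(evt));
    return 0;
}
"""

# Userland side (root + BCC required):
#   from bcc import BPF
#   b = BPF(text=BPF_PROGRAM)
#   b["events"].open_perf_buffer(lambda cpu, data, size: ...)
#   while True:
#       b.perf_buffer_poll()
```

Because each consumer opens its own perf buffer, several tools can trace concurrently without trampling one shared text stream, which is what makes the multi-tenancy and testing story better.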
53. Future work
● De-containerize (e.g. ship a Debian package)
● Python 3
● Plugin for container awareness
○ Easy mapping to service, and therefore owner!
● Enable immutable loginuid and add that to the metadata
○ --loginuid-immutable under `man auditctl`
○ Cryptically says “but can cause some problems in certain kinds of containers”
● Threat modelling/hardening!