This talk is a follow-up to "Deploying systemd at scale", presented at systemd.conf 2016, and covers the aftermath of migrating our fleet to CentOS 7. Now that systemd is available everywhere, more and more services have adopted it for their deployment, leveraging its features and occasionally exposing interesting behaviors. At the same time, we have honed our process for integrating and rolling out new versions of systemd on the fleet, and started building tooling to manage and monitor it at scale.
6. • 100% of the bare metal fleet on CentOS 7!
• Migrated countless services to systemd
• libsystemd integration in our build system
• Containers: see Zeal’s talk later today!
Recap
CentOS 7 migration
8. • systemd 231 → 232 → 233 → (234 → 235)
• Also tracking util-linux, dbus, etc.
• Published our Rawhide-based backports on:
https://github.com/facebookincubator/rpm-backports
• Binary RPMs based on it on:
https://copr.fedorainfracloud.org/coprs/jsynacek/systemd-backports-for-centos-7/
Tracking upstream
Staying up to date
9. • Not specific to systemd
• Duplicate systemd RPMs: package-cleanup wrapper
• rpmdb corruption: dcrpm
• Mismatch between systemd and systemd-libs
Tracking upstream
RPM issues
# Reinstall systemd when its binary links against a missing systemd library
if ldd /usr/lib/systemd/systemd | grep 'systemd.*not found$'; then
  yum reinstall -y $systemd_packages
fi
10. • Rebuild packaging for the Meson transition
• Backported meson, ninja-build in CentOS
• Standalone systemd-compat-libs
https://github.com/facebookincubator/systemd-compat-libs
Tracking upstream
Meson and compat-libs
11. Tracking upstream
tty woes with 234
• When rolling 234 we discovered a race in the kernel tty
subsystem (repros all the way back to 4.0)
• Turns out both systemd and Tupperware use the real tty0
• Investigation still in progress, likely a use-after-free bug
• Tupperware should probably just use a pty here
13. • See Chris’s talk tomorrow for all things cgroup2!
• Using systemd to partition services and apply limits
• Lightweight daemon to collect metrics from /sys/fs/cgroup
• Chef API to apply configurations and manage experiments
Resource management
Rolling out cgroup2
16. Service monitoring
• systemd exposes lots of useful metrics over dbus
• Unit properties (e.g. *Timestamp*, NRestarts)
• Status events (e.g. unit state changes)
• Options: python-dbus, sd-bus, coreos/go-systemd/dbus
Getting metrics out of systemd
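As a rough illustration of the unit properties mentioned above, the sketch below turns the Key=Value output of `systemctl show` (which queries systemd over dbus) into simple metric lines. The function name `props_to_metrics` and the output format are assumptions made for this example, not part of any tool from the talk.

```shell
#!/bin/sh
# Read `systemctl show` output (Key=Value lines) on stdin and print one
# "<unit>.<property> <value>" line per property.
# props_to_metrics is a hypothetical helper name.
props_to_metrics() {
  unit=$1
  while IFS='=' read -r key value; do
    if [ -n "$key" ]; then
      printf '%s.%s %s\n' "$unit" "$key" "$value"
    fi
  done
}

# Usage on a host running a recent systemd:
#   systemctl show -p NRestarts -p ActiveEnterTimestampMonotonic foo.service \
#     | props_to_metrics foo.service
```

Keeping the parsing separate from the `systemctl` call makes it easy to exercise with canned input.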
17. Service monitoring
• Lightweight daemon to feed systemd metrics to various
monitoring systems
• Polling for unit properties, subscriptions for status events
• Initial implementation in Go
systemdmon
18. Service monitoring
• Thin Cython wrapper on top of sd-bus
• Expose systemd dbus object model
• ipython REPL for prototyping
• Will be open sourced together with systemdmon
pystemd
20. Case studies
dbus reliability
• Issues with dbus-daemon or the system bus affect systemd
• systemctl hanging or failing → Chef failing
• Easy to DoS the bus, especially with user services
• Hard to remediate without a reboot
• Looking forward to dbus-broker!
21. Case studies
rpm macros for systemd services
• By default RPM macros will restart units on upgrade...
• …which is a problem if you’ve also set up Chef to restart them
• Solution: knob in our internal packaging tool to optionally
disable the restart macro
22. Case studies
Logging
• Journald setup: 10 MB of in-memory logging, feeding rsyslog
• journalctl is awesome
• Double writing problem
• No way to set per-unit limits
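A setup along these lines could be expressed as a journald drop-in. The sketch below is only an approximation: the file name, the helper name `write_journald_dropin`, and the use of `ForwardToSyslog=yes` to feed rsyslog are assumptions, since the talk does not say how rsyslog is fed.

```shell
#!/bin/sh
# Write a journald drop-in into DIR (default: /etc/systemd/journald.conf.d).
# write_journald_dropin and 10-volatile.conf are hypothetical names.
write_journald_dropin() {
  dir=${1:-/etc/systemd/journald.conf.d}
  mkdir -p "$dir"
  cat > "$dir/10-volatile.conf" <<'EOF'
[Journal]
# Keep the journal in memory only (under /run/log/journal).
Storage=volatile
# Cap the in-memory journal at 10 MB.
RuntimeMaxUse=10M
# Assumption: forward messages to the syslog socket for rsyslog.
ForwardToSyslog=yes
EOF
}

# After writing the drop-in on a real host:
#   systemctl restart systemd-journald
```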
23. Case studies
Unit loops
• Easy to create loops with x-systemd.requires in fstab
• systemd will delete a random unit to break loops
• Solution: add _netdev to the fstab entry
• systemd-analyze to help debugging
systemd-tmpfiles-setup.service: Job systemd-tmpfiles-setup.service/start deleted to break ordering cycle starting with smc_proxy.service/start
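One way to spot fstab entries that may be exposed to this problem is to list mount points that use `x-systemd.requires=` without `_netdev`. This is a minimal sketch; the function name `fstab_missing_netdev` is made up for illustration, and whether `_netdev` is the right fix depends on the mount in question.

```shell
#!/bin/sh
# Read fstab-formatted lines on stdin and print the mount point of every
# entry whose options include x-systemd.requires= but not _netdev.
# fstab_missing_netdev is a hypothetical helper name.
fstab_missing_netdev() {
  awk '$1 !~ /^#/ &&
       $4 ~ /x-systemd\.requires=/ &&
       $4 !~ /(^|,)_netdev(,|$)/ { print $2 }'
}

# Usage:
#   fstab_missing_netdev < /etc/fstab
#   systemd-analyze verify default.target   # reports ordering cycles, if any
```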
24. Case studies
Transient unit creep
• systemd-run creates units in /run/systemd/transient
• If the unit fails, it sticks around in ‘failed’ state
• 10k failed units → 50% CPU usage for pid 1
• 30k failed units → 100% CPU usage for pid 1
• Fix: call systemctl reset-failed periodically
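The periodic cleanup could look roughly like the script below, run from cron or a timer. The threshold `MAX_FAILED` and both function names are assumptions for this sketch; the talk only says to call `systemctl reset-failed` periodically.

```shell
#!/bin/sh
# MAX_FAILED is a hypothetical threshold, not a value from the talk.
MAX_FAILED=${MAX_FAILED:-1000}

# Count units listed on stdin, one per non-blank line.
count_failed_units() {
  grep -c '[^[:space:]]' || true
}

# Clear failed units once their number exceeds MAX_FAILED.
reset_failed_if_needed() {
  failed=$(systemctl list-units --state=failed --no-legend --plain | count_failed_units)
  if [ "$failed" -gt "$MAX_FAILED" ]; then
    systemctl reset-failed
  fi
}

# Usage, e.g. from a cron job or a systemd timer:
#   reset_failed_if_needed
```

Note that `systemctl reset-failed` without arguments clears the failed state of all units, so a monitoring system should record failures before this runs.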
25. Case studies
KillMode=process
• KillMode=process may leave stray processes in the cgroup
• Changes to unit slices don’t apply unless the old slice is
empty
• Fix: switch to KillMode=control-group
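Before switching, it can help to find which units still use KillMode=process. The sketch below reads unit names on stdin and prints the matching ones; `units_with_killmode_process` is a hypothetical helper name.

```shell
#!/bin/sh
# Print every unit, read one per line from stdin, whose KillMode is "process".
# units_with_killmode_process is a hypothetical helper name.
units_with_killmode_process() {
  while read -r unit; do
    # </dev/null keeps systemctl from consuming the list of units on stdin.
    if [ "$(systemctl show -p KillMode "$unit" </dev/null)" = "KillMode=process" ]; then
      printf '%s\n' "$unit"
    fi
  done
}

# Usage:
#   systemctl list-units --type=service --no-legend --plain \
#     | awk '{print $1}' | units_with_killmode_process
```

Each unit found this way can then be given `KillMode=control-group` in its `[Service]` section, either in the unit file or in a drop-in.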
26. Case studies
Unit escaping
• Escape logic relies on shell control characters:
/dev/dm-1 → dev-dm\x2d1.swap
• Chef fix: https://github.com/chef/chef/pull/6230
• path_to_unit wrapper in fb_systemd
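To show where the backslashes come from, here is a simplified, ASCII-only approximation of the path escaping rule. It is not systemd's implementation, and `escape_path` is a name chosen for this sketch; on a real host, `systemd-escape --path --suffix=swap /dev/dm-1` is the tool to use.

```shell
#!/bin/sh
# Approximate systemd path escaping for ASCII paths:
#   - collapse repeated slashes, drop the leading and trailing slash
#   - "/" becomes "-"; an empty path (the root) becomes "-"
#   - characters outside [A-Za-z0-9:_.] become \xNN, as does a leading "."
# escape_path is a hypothetical helper, not a systemd or Chef API.
escape_path() {
  printf '%s\n' "$1" | LC_ALL=C awk '
    BEGIN { for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i }
    {
      gsub(/\/+/, "/"); sub(/^\//, ""); sub(/\/$/, "")
      if ($0 == "") { print "-"; next }
      out = ""
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "/") out = out "-"
        else if (c ~ /[A-Za-z0-9:_]/ || (c == "." && i > 1)) out = out c
        else out = out sprintf("\\x%02x", ord[c])
      }
      print out
    }'
}

# Usage: always single-quote the result when passing it through a shell.
#   unit="$(escape_path /dev/dm-1).swap"
#   systemctl status 'dev-dm\x2d1.swap'
```

Because the escaped name contains a backslash, any unquoted use in a shell command line silently changes the unit name, which is why a wrapper that produces and quotes the name in one place is useful.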