Today's systems are complex and the most successful products are SaaS. When you need to ship a SaaS architecture to someone (private SaaS) there are a lot of moving parts to install and maintain. I'll talk about what we do at Circonus to provide our complex software stack on large clusters on-premise using Chef as the orchestration framework.
The InstallShield of the 21st Century – Theo Schlossnagle
1. Chef
The InstallShield®
of the 21st Century
InstallShield is a registered trademark of Flexera Software LLC
Neither Circonus nor Chef, nor their related companies, has or makes any claim thereto.
Monday, May 6, 13
2. Hi, I’m @postwait
My background is in computing systems engineering
hardware debugging
cut kernel code
network debugging
storage debugging
cut user-space code
operating system release management
3. Circonus is...
An API-accessible, self-service, state-of-the-art,
scalable monitoring and telemetry analysis platform.
With Chef cookbooks available to make monitoring
your architecture elegant and simple.
Run as a SaaS... sometimes.
5. This talk is...
about the challenges involved with
turn-key software installation.
6. Installing software
Installing (most closed and some open) software
is easy.
RPM, deb, IPS, etc.
(they all suck, but they all work passably well)
<pkgcommand> install <pkgname>
e.g. on OmniOS:
pkg install chef
8. Operating Software
Is actually not that hard.
Most software that survives consumers, runs.
Except when it doesn’t.
This is why we monitor and measure things.
9. What if...
Your software was a complex distributed system?
Your software was the monitoring system itself?
Your software wasn’t your SaaS,
but instead your customer’s SaaS on their IaaS.
10. We didn’t start here...
Our software started as SaaS only.
We ran the one, true copy.
Turns out there is still a strong business model around
selling enterprise software that
companies run on-premise.
(in their own cloud.... whatever).
11. Right tool for the job...
In SaaS, we take the “right tool for the job” seriously.
In shipped software, we historically have not.
integration costs are high
support costs are high
licensing challenges
12. A brief look at Circonus
PostgreSQL/pg_amqp, RabbitMQ, redis, memcached,
Apache/mod_perl, Node.js, ElasticSearch, OpenSSL (ca)
CEP system (Ernie), case management system (Bert),
real-time OLAP system (Razalbath), websockets/etc.
(Enzo), metric storage (Srollup & Snowth), API services,
web portal, broker (noitd), metric transit (stratcond),
long-tail storage services
13. More than 20
asynchronous components
Everything can go wrong...
it’s just like an Internet infrastructure
Despair.
14. How Chef helped
First: Chef did not save the day.
It did help us quite a bit.
It provided a framework for
converging on an expected state.
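The idea of converging on an expected state can be sketched in plain Ruby. This is only an illustration of the concept, not Chef's real API: the `converge` helper and the state hashes are hypothetical names made up for this example.

```ruby
# Hypothetical sketch of "converge on an expected state"; not Chef's real API.
# Desired and current state are plain hashes of service name => status.

def converge(desired, current)
  actions = []
  desired.each do |name, want|
    have = current[name]
    next if have == want            # already in the expected state: do nothing
    actions << "#{name}: #{have.inspect} -> #{want.inspect}"
    current[name] = want            # stand-in for making the real change
  end
  actions
end

desired = { "rabbitmq" => :running, "redis" => :running }
current = { "rabbitmq" => :stopped }

first  = converge(desired, current)  # changes whatever differs
second = converge(desired, current)  # a second run finds nothing left to do
puts first
puts "second run actions: #{second.length}"
```

The point of the sketch is that the run is described by the end state rather than by a list of steps, so repeating it is harmless.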
16. 1 databag to rule them all
One databag ‘site.json’ that describes
the global topology of all that is Circonus.
It exposes all tunables our clients can control.
A special role called ‘self-configure’
/opt/circonus/bin/run-chef self-configure
builds a node file with appropriate roles
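As a rough sketch of what "builds a node file with appropriate roles" might look like, the Ruby below derives a run list for one host from a topology document. The site.json layout, the host names, and the helpers `roles_for` and `node_file` are all assumptions for illustration; the real Circonus schema and the internals of run-chef are not shown in the talk.

```ruby
require "json"

# Hypothetical site.json layout; the real Circonus schema is not shown in the talk.
SITE_JSON = <<~JSON
  {
    "services": {
      "ernie":     { "hosts": ["node1.example.com"] },
      "stratcond": { "hosts": ["node1.example.com", "node2.example.com"] },
      "noitd":     { "hosts": ["node3.example.com"] }
    }
  }
JSON

# Return the sorted role names whose host list includes this node.
def roles_for(site, hostname)
  site["services"]
    .select { |_role, cfg| cfg["hosts"].include?(hostname) }
    .keys
    .sort
end

# Build a node file as a hash with a "run_list" of role[...] entries.
def node_file(site, hostname)
  { "run_list" => roles_for(site, hostname).map { |r| "role[#{r}]" } }
end

site = JSON.parse(SITE_JSON)
puts JSON.pretty_generate(node_file(site, "node1.example.com"))
```

Keeping the topology in one document means each node can work out its own roles from the same input, which is the appeal of a single databag.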
17. Chef templating...
Services need to know about other services
site.json has what and where
Chef uses that to template out
all the configs for
all the services running
on the current node
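Chef templates are ERB, so the templating step can be approximated with Ruby's standard `erb` library. This is a minimal sketch under assumptions: the topology hash, the config format, and the `render_config` helper are hypothetical, and a real Chef `template` resource does more than this.

```ruby
require "erb"

# Hypothetical topology data; in the talk this comes from site.json.
site = {
  "rabbitmq" => { "hosts" => ["mq1.example.com", "mq2.example.com"], "port" => 5672 }
}

# Hypothetical config format for a service that needs to find its brokers.
TEMPLATE = <<~CONF
  # generated from site data, do not edit by hand
  node = <%= node %>
  brokers = <%= brokers.join(",") %>
CONF

# Render the config for one node; the template reads `node` and `brokers`
# from this method's local variables via `binding`.
def render_config(site, node)
  port = site["rabbitmq"]["port"]
  brokers = site["rabbitmq"]["hosts"].map { |host| "#{host}:#{port}" }
  ERB.new(TEMPLATE).result(binding)
end

out = render_config(site, "web1.example.com")
puts out
```

Because "what and where" live in the data, adding a broker host changes every dependent config on the next run without touching the template.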
18. Chef’s upsides
Chef knows how to start services and
how to restart them if they are disabled/in maintenance
(this is crucially important in distributed systems)
It means that all dependent services can
self-recover simply through diligence.
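"Self-recover through diligence" can be pictured as repeated runs, each of which starts whatever is currently startable. The sketch below is an assumption-laden toy: the service names in `DEPENDS_ON` and the `run_once` helper are hypothetical, and real runs would check and start actual services.

```ruby
# Hypothetical services and dependencies; a service can only start once
# everything it depends on is already running.
DEPENDS_ON = {
  "postgres"  => [],
  "rabbitmq"  => [],
  "stratcond" => ["postgres", "rabbitmq"],
  "web"       => ["stratcond"]
}

# One run: start every stopped service whose dependencies were all up
# at the start of the run. Returns the names started in this run.
def run_once(running)
  startable = DEPENDS_ON.select { |name, deps|
    !running.include?(name) && deps.all? { |d| running.include?(d) }
  }.keys
  running.concat(startable)
  startable
end

running = []
runs = 0
until running.length == DEPENDS_ON.length
  run_once(running)
  runs += 1
end
puts "all services up after #{runs} runs"
```

No run needs to know the full start order; with the dependencies above, three runs are enough for everything to come up.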
19. Chef’s upsides
As everything is “automated,” the system is far less
tolerant of procedures that sometimes don’t work.
It has the effect of
automating the QA around
installation and maintenance tasks
20. Chef’s downsides
It has been horribly impracticable to
perform good upgrades (due to packaging)
act as omnipotent state control (due to cost)
21. Chef’s downsides
We support OmniOS and Linux
I want to do the same thing on both:
Apps should live in their own filesystem, logs in
another, data in another, use a filesystem for
everything. And, of course, use ZFS.
Instead of role: “ernie”
we have roles: “ernie”, “ernie-omnios”, “ernie-rhel”
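One way to picture the role split is a helper that expands a base role into the base plus its platform variant. This is a guess at the shape, not how Circonus does it: the `PLATFORM_SUFFIX` table, the platform names, and `expand_role` are hypothetical, as is the assumption that a node gets both the base and the platform role.

```ruby
# Hypothetical mapping from a detected platform name to the role suffix
# used in the talk ("omnios", "rhel").
PLATFORM_SUFFIX = {
  "omnios" => "omnios",
  "redhat" => "rhel",
  "centos" => "rhel"
}

# Expand a base role into the base plus its platform-specific variant.
def expand_role(base, platform)
  suffix = PLATFORM_SUFFIX.fetch(platform) do
    raise ArgumentError, "unsupported platform: #{platform}"
  end
  [base, "#{base}-#{suffix}"]
end

puts expand_role("ernie", "omnios").inspect
puts expand_role("ernie", "centos").inspect
```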
22. Summary
Chef sucks. Long live Chef.
The process of automating through Chef has:
improved the quality of our deployment process
made it possible to ship a software platform to clients
that runs all the same bits as our production SaaS
and stays up-to-date with our latest bits
23. Thanks
Eric Sproul ~ Circonus Release Engineer
(who implemented all of these things)