Slides for my keynote at incontrodevops.it, where I talked about distributed architectures, microservices, Kubernetes and cloud native environments, all leading to one question: are microservices worth it?
2. Let’s take a step back
microkernels or monolithic kernels?
3. “ [...]
If you'd ask me to define the problem with microkernels with one word, that would be
"complexity". And it’s a kind of complexity that impacts everything:
- Debugging is hard: On monolithic kernels, you have a single image, with both code and state. Hunting a
bug is just a matter of jumping into the internal debugger (or attaching an external one, or generating a
dump, or...) and looking around. On Hurd, the state is spread among Mach and the servers, so you'll
have to look at each one trying to follow the trail left by the bug.
- Managing resources is hard: Mach knows everything about the machine, but nothing about the user. The
server knows everything about the user, but nothing about the machine. And keeping them in sync is too
much expensive. Go figure.
- Obtaining a reasonable performance is har... impossible: You want to read() a pair of bytes from
disk? Good, prepare a message, call to Mach, yield a little while the server is scheduled, copy the message,
unmarshall it, process the request, prepare another message to Mach to read from disk, call to Mach, yield
waiting for rescheduling, obtain the data, prepare the answer, call to Mach, yield waiting for rescheduling,
obtain your 2 bytes. Easy!
[...]
That said, if you're into kernels, microkernels are different and fun! Don't miss the opportunity of doing
some hacking with one of them. Just don't be a fool like me, and avoid become obsessed trying to
achieve the impossible.
-- Sergio Lopez, news.ycombinator.com, 2015
6. And now...
● Microservices and
infrastructure
● From our experience at
Wikimedia: challenges...
● … and mitigations
● So, are microservices
worth it?
9. Microservices have a
cost.
Using a microservices-based architecture has costs in terms of observability,
debugging, tooling, performance. You have to compensate for most of those at
the infrastructure level. Such infrastructural investments are expensive.
10. “If you’re not using
microservices you’re
doing it wrong” -- the internet
13. Some background
● The Wikimedia Foundation is the
organization running the
infrastructure supporting
Wikipedia and other projects
● Monthly stats:
○ 17 Billion page views
○ 45 Million edits
○ 323 Thousand new registered
users
14. Our infrastructure
● 2 Primary DCs (Ashburn, Dallas)
● 3 Caching DCs (San Francisco,
Amsterdam, Singapore)
● ~1400 hardware machines
● Size of engineering: ~170 people
● SRE team: ~ 25 people (6 dedicated
to the application layer)
24. Standard interface
adapters
The proliferation of services developed
in different languages can quickly lead
to a higher cost of setup and
maintenance from all points of view.
So you need to create adapters.
25. Streamlined service
delivery
Goals: reduce the toil on the SRE
team, and the technical challenges
dev teams face when a new service
gets created. Also, provide a
platform that can run our
microservices optimally
Requirements:
● Dev/staging/prod consistency, flexibility
● Ability to dynamically schedule
resource allocation across the cluster.
● Uniform testing/CI/deployment
interfaces
● More control (and less access) for
developers
26. Implementation
● Run containers everywhere. Abstract
the logic of Dockerfiles
● Use a declarative .pipeline directory to
configure how your project will be built
● Uniform testing/CI/deployment
interfaces
● Use Kubernetes to manage execution of
such containers, resource allocation
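As a rough illustration of what "resource allocation" means once Kubernetes runs the containers, the sketch below builds a standard apps/v1 Deployment manifest with CPU and memory requests and limits. The service name, image URL and default values are placeholders invented for this example; nothing here reflects the actual content of a .pipeline directory or of the manifests used in production.

```python
import json


def deployment_manifest(name, image, replicas=2,
                        cpu_request="100m", mem_request="128Mi",
                        cpu_limit="500m", mem_limit="256Mi"):
    """Build a Kubernetes apps/v1 Deployment as a plain dict.

    All default values are illustrative assumptions, not recommendations.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Requests drive scheduling; limits cap usage.
                        "resources": {
                            "requests": {"cpu": cpu_request, "memory": mem_request},
                            "limits": {"cpu": cpu_limit, "memory": mem_limit},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and registry URL.
    manifest = deployment_manifest(
        "example-service", "registry.example.org/example-service:1.0"
    )
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -` on a test cluster.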
28. Monitoring interfaces
● Developer-defined monitoring
● “Give me a URL that describes
what requests/responses are to be
expected”
● Service-checker does the
server-side work, for now only
available for Swagger specs
● Functional testing needs some
balance
● Prometheus adapter for legacy
services
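The idea of developer-defined monitoring driven by a spec can be sketched as follows. This is not the service-checker code: it is a minimal stand-in that assumes a Swagger-shaped dict in which each operation lists expected responses under a hypothetical `x-examples` key, and it takes the HTTP call as an injected `send` function so it runs without a network.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def run_checks(spec, send):
    """Run every example declared in the spec and return a list of failures.

    `spec` is a dict shaped like a Swagger document. `send` is any callable
    (method, path) -> (status_code, body_text).
    """
    failures = []
    for path, operations in spec.get("paths", {}).items():
        for method, operation in operations.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            # "x-examples" is a hypothetical vendor extension for this sketch.
            for example in operation.get("x-examples", []):
                expected = example["response"]
                status, body = send(method.upper(), path)
                if status != expected["status"]:
                    failures.append(
                        f"{method.upper()} {path}: status {status}, "
                        f"expected {expected['status']}"
                    )
                elif expected.get("body_contains", "") not in body:
                    failures.append(
                        f"{method.upper()} {path}: body does not contain "
                        f"{expected['body_contains']!r}"
                    )
    return failures


# Example spec a developer might ship next to the service (made up).
SPEC = {
    "paths": {
        "/healthz": {
            "get": {
                "x-examples": [
                    {"title": "service is up",
                     "response": {"status": 200, "body_contains": "ok"}}
                ]
            }
        }
    }
}


def fake_send(method, path):
    """Stand-in for a real HTTP client."""
    return (200, '{"status": "ok"}') if path == "/healthz" else (404, "")


if __name__ == "__main__":
    print(run_checks(SPEC, fake_send) or "all checks passed")
```

The appeal of this design is that the monitoring system needs no per-service knowledge: whoever writes the service also writes down what a healthy response looks like.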
29. Logging pipeline
● Standard formats
● Sink everything in a local daemon
● Kafka for reliable, secure transport
(even cross-dc)
● Multi-tiered ELK for storage /
querying
● Unified request ID
● Full tracing will be implemented
soon™
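To make "standard formats" and "unified request ID" concrete, here is a minimal sketch using only the Python standard library: every log line is one JSON object, and each line carries the ID of the request being handled. The field names and helper names are assumptions chosen for the example, and the stream handler stands in for the local daemon; this is not the format actually used in production.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID of the request currently being handled.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "request_id": request_id_var.get(),
        })


def start_request(incoming_id=None):
    """Reuse the caller's request ID if present, otherwise mint a new one."""
    rid = incoming_id or str(uuid.uuid4())
    request_id_var.set(rid)
    return rid


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("example-service")
    # In a real service the ID would come from an incoming request header.
    start_request("req-123")
    log.info("fetching article")
```

Passing the same ID along to downstream calls is what lets log lines from different services be joined into one request's story once they reach storage.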
30. RPC interface [WiP]
● Envoy as a middleware/proxy
between services
● Protection from internal DDoS,
thundering herd
● Telemetry on both client and server
side
● Support our discovery service
● TLS tunneling
● Tracing
● KISS: not a true service mesh
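One of the protections a proxy layer can offer against internal DDoS is circuit breaking: after repeated failures, callers are rejected immediately instead of piling more load onto a struggling backend. Envoy provides this natively through configuration; the sketch below is not Envoy configuration, only an in-process illustration of the idea, with class names and thresholds invented for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the backend."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=10.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("backend marked unhealthy, failing fast")
            # Cool-down elapsed: allow calls again, but a single failure
            # reopens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


if __name__ == "__main__":
    breaker = CircuitBreaker(max_failures=2, reset_after=5.0)

    def flaky():
        raise ConnectionError("backend down")

    for attempt in range(4):
        try:
            breaker.call(flaky)
        except CircuitOpenError as exc:
            print(attempt, "rejected:", exc)
        except ConnectionError as exc:
            print(attempt, "failed:", exc)
```

Doing this in a shared proxy rather than in each service is the point of the slide: the logic is written once, regardless of the language each service uses.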
31. Bandwidth
● Relatively effective
● Standards and
infrastructural helpers
● Developer portal / new
service request
● Implementation
guidelines.
● Rules of engagement
with SRE.
34. Good and bad reasons
➔ Performance of critical parts of the
application needs different
languages
➔ You’re Google, Facebook, Uber,
Spotify etc. and you have 100s of
engineering teams and several
100s of services
➔ You have very specific security
requirements and separation of
concerns is needed
➔ You’re secretly working for a large
cloud provider and you get a cut
of the bill
❖ “Devs can’t work with our monolith
code, it’s too legacy”
❖ Addressing future, hypothetical
scalability needs.
❖ You want to use the new cool toys
❖ Netflix and Spotify do it
❖ Conway’s Law
❖ “A microservice architecture is
theoretically superior”
35. How much investment?
Either you implement infrastructure yourself or you pay for one built by someone else (which puts you in a
vendor lock-in).
To attain the same level of operational excellence when using microservices, your infrastructure will be
exponentially more complex and less boring. You need to hire people that will work on infrastructure
and consider that work a first-class citizen. Else, you’re harming yourself.
36. TL;DR
● Understand if you need
microservices
● Create a “cloud native”
environment
● Have middlewares
abstract the complexity
away from the services
● Invest in infrastructure
● Prepare for a non-boring
production environment