This document provides guidance on upgrading Kafka. It emphasizes the importance of upgrading early and often to the latest bugfix release in order to address security vulnerabilities and other bugs. It recommends using automated rolling upgrades to upgrade brokers one by one with zero downtime. It also outlines best practices like backing up configurations, reading release notes, and ensuring protocol compatibility when upgrading.
11. 11
It was a dark and
stormy night...
● Support got a call: “We
are seeing high number
of URP. Some brokers
are not replicating”
● Earlier: Several Kafka
brokers were restarted
in quick succession
for… reasons.
12. 12
kafka.common.InvalidOffsetException: Attempt to append an offset (1239742691) to position 35728 no larger than the last offset appended (1239742822) to /data3/kafka/mt_xp/00000000001239444214.index

[2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because log truncation is not allowed for topic test, Current leader 1's latest offset 0 is less than replica 2's latest offset 151 (kafka.server.ReplicaFetcherThread)

"Error due to... kafka.common.KafkaException: Error processing data for partition topic-KK offset NNNN ... Caused by: java.lang.IllegalArgumentException: Out of order offsets found in List(...
13. 13
< 0.11.0
High watermark used for follower log truncation. Unfortunately, it is propagated asynchronously.
0.11.0.0
KIP-101: Added leader epoch. Followers truncate to the first offset of the new epoch.
2.0.0
KIP-279: Negotiating the correct epoch. Followers get the previous epoch and its last offset, and can ask for an older epoch.
2.1.0
KIP-320: Fencing against zombie replicas. Fetch requests include the epoch, so zombies are rejected.
2.3.0
KIP-320: Consumers handle log truncation from unclean election without resetting offsets.
KIP-461: Replica threads no longer die.
???
????
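The KIP-101 idea above (truncate to where the leader's log diverged, based on epochs rather than the high watermark) can be illustrated with a small model. This is a simplified sketch, not Kafka's implementation: the function names, the `(epoch, start_offset)` list, and the sample offsets are all hypothetical, and it ignores the multi-round negotiation that KIP-279 added.

```python
from typing import List, Tuple

# Hypothetical model: (epoch, start_offset) pairs, sorted by epoch,
# as a leader might record them in its leader-epoch history.
EpochEntry = Tuple[int, int]


def leader_end_offset(leader_epochs: List[EpochEntry],
                      leader_log_end: int,
                      epoch: int) -> int:
    """End offset of `epoch` on the leader: the start offset of the first
    later epoch, or the leader's log end offset if `epoch` is the latest."""
    for entry_epoch, start_offset in leader_epochs:
        if entry_epoch > epoch:
            return start_offset
    return leader_log_end


def follower_truncation_offset(follower_last_epoch: int,
                               follower_log_end: int,
                               leader_epochs: List[EpochEntry],
                               leader_log_end: int) -> int:
    """Offset the follower keeps its log up to (exclusive) in this model."""
    end_on_leader = leader_end_offset(leader_epochs, leader_log_end,
                                      follower_last_epoch)
    return min(follower_log_end, end_on_leader)


# Leader started epoch 1 at offset 100 and has since written up to offset 120.
leader_epochs = [(0, 0), (1, 100)]

# A follower stuck in epoch 0 with 105 messages drops its last 5.
print(follower_truncation_offset(0, 105, leader_epochs, 120))  # 100
# A follower already in epoch 1 and behind the leader truncates nothing.
print(follower_truncation_offset(1, 110, leader_epochs, 120))  # 110
```

In this model the follower never relies on a high watermark that may not have been propagated yet, which is the weakness the pre-0.11.0 entry describes.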
16. 16
It was a dark and
stormy night...
● You restarted a broker
● When it came back,
some replicas were
offline
● Or maybe partitions are
truncated to zero
● Or maybe clients
timeout during
controlled leader
election
17. 17
< 0.11.0.0
Controller is multi-threaded, communicates with ZK and brokers synchronously and un-batched
0.11.0.0
Controller is now single threaded, has integration tests, several bug fixes
1.1.0
Async ZK client: faster failover and election. Fencing against controller coming back after GC pause as zombie.
2.1.0
Controller fenced against direct updates to ZK
2.2.0
Broker epochs protect brokers from controller messages sent before restart
???
???
19. 19
Kubernetes?
● Your Java clients must be on 2.1.1, or on 2.2.0 and above
● Otherwise they don’t resolve IPs correctly and don’t recover correctly
● KAFKA-6863
● KAFKA-7755
20. 20
JBOD?
● You need to be at least on: 1.1.2, 2.0.2,
2.1.2 or 2.2.0
● And this may not be enough…
EOS?
● Definitely not earlier than 1.0.2
● Watch for KIP-360
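The JBOD minimums above are per release line, so a plain "greater than or equal" comparison against one version is not enough (2.0.1 is newer than 1.1.2 but still below its own line's minimum). A minimal sketch of such a check follows; the function and constant names are hypothetical, and the version table only encodes the numbers listed on this slide.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.0.2' into (2, 0, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Minimum bugfix release per release line, taken from the slide above.
JBOD_MINIMUMS = {
    (1, 1): (1, 1, 2),
    (2, 0): (2, 0, 2),
    (2, 1): (2, 1, 2),
}
# From this release line on, every release is at or above the slide's minimum.
JBOD_FIRST_OK_LINE = (2, 2)


def meets_jbod_minimum(version_text: str) -> bool:
    version = parse_version(version_text)
    line = version[:2]
    if line >= JBOD_FIRST_OK_LINE:
        return True
    minimum = JBOD_MINIMUMS.get(line)
    # Release lines not listed on the slide are treated as too old.
    return minimum is not None and version >= minimum


for candidate in ["1.0.2", "1.1.2", "2.0.1", "2.1.2", "2.2.0"]:
    print(candidate, meets_jbod_minimum(candidate))
```

As the slide warns, passing this check is a necessary condition only; it does not mean a JBOD deployment is free of known issues.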
21. 21
Vulnerabilities
● CVE-2018-17196 AUTHENTICATED
CLIENTS WITH WRITE PERMISSION MAY
BYPASS TRANSACTION/IDEMPOTENT ACL
VALIDATION - fixed in 2.1.1 and later
● CVE-2018-1288 AUTHENTICATED KAFKA
CLIENTS MAY INTERFERE WITH DATA
REPLICATION -
● CVE-2017-12610 AUTHENTICATED KAFKA
CLIENTS MAY IMPERSONATE OTHER
USERS
22. 22
Lessons!
● First release of a major feature has more bugs
● Bugfix releases have only bug fixes.
● Always upgrade to the latest bugfix release: 2.3.1
● Especially if you care about security.
24. 24
The Basics
● Backup configuration
● Read the docs
● Read the “notable changes” list.
● Don’t skip anything in bold font
● Make sure you have current version
configured:
inter.broker.protocol.version=CURRENT_KAFKA_VERSION
log.message.format.version=CURRENT_MESSAGE_FORMAT_VERSION
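One way to catch a broker whose versions are not pinned before the upgrade starts is to scan its configuration file for both properties. The sketch below is an illustration under assumptions: it handles only simple `key=value` lines (no line continuations), the helper names are hypothetical, and the sample file content is made up.

```python
from typing import Dict, List

REQUIRED_PINS = (
    "inter.broker.protocol.version",
    "log.message.format.version",
)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comments."""
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or line.startswith("!"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def missing_version_pins(properties_text: str) -> List[str]:
    """Return the required properties that are absent or empty."""
    props = parse_properties(properties_text)
    return [key for key in REQUIRED_PINS if not props.get(key)]


# Made-up example content; a real check would read the broker's own file.
sample = """
# broker settings
broker.id=1
inter.broker.protocol.version=2.1
"""
print(missing_version_pins(sample))  # ['log.message.format.version']
```

An empty list means both properties are set explicitly; it does not verify that the values match the version currently running.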
32. 32
Protocol Bumps
● Repeat the roll twice
● First time you upgrade the Kafka binaries
● Second time you change one or both of these to the new version:
○ inter.broker.protocol.version
○ log.message.format.version
● You can run for years with the new version and the old protocol
● Only bump protocol and message format when you are certain things look good.
● Try to bump the message format version after most consumers are on the new version
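The two rolls described above can be sketched as two passes over the brokers, one broker at a time, each gated on a health check. Everything here is hypothetical scaffolding: the callbacks (`install_new_binaries`, `bump_protocol_version`, `restart`, `cluster_is_healthy`) stand in for whatever tooling actually performs those steps, and the health check is assumed to mean something like "no under-replicated partitions".

```python
from typing import Callable, Iterable, List


def rolling_pass(brokers: List[str],
                 change: Callable[[str], None],
                 restart: Callable[[str], None],
                 cluster_is_healthy: Callable[[], bool],
                 max_checks: int = 10) -> None:
    """Apply `change` and restart each broker in turn, one at a time."""
    for broker in brokers:
        change(broker)
        restart(broker)
        # Do not touch the next broker until the cluster has recovered.
        if not any(cluster_is_healthy() for _ in range(max_checks)):
            raise RuntimeError(f"cluster unhealthy after restarting {broker}")


def upgrade_with_protocol_bump(brokers: Iterable[str],
                               install_new_binaries: Callable[[str], None],
                               bump_protocol_version: Callable[[str], None],
                               restart: Callable[[str], None],
                               cluster_is_healthy: Callable[[], bool]) -> None:
    broker_list = list(brokers)
    # Roll 1: new binaries, old protocol and message format.
    rolling_pass(broker_list, install_new_binaries, restart, cluster_is_healthy)
    # Roll 2: change the version properties, then restart again.
    rolling_pass(broker_list, bump_protocol_version, restart, cluster_is_healthy)


# Dry run that only records the order of operations.
log: List[str] = []
upgrade_with_protocol_bump(
    ["broker-1", "broker-2"],
    install_new_binaries=lambda b: log.append(f"install {b}"),
    bump_protocol_version=lambda b: log.append(f"bump {b}"),
    restart=lambda b: log.append(f"restart {b}"),
    cluster_is_healthy=lambda: True,
)
print(log)
```

The sketch runs both passes back to back only to show the ordering. Since the cluster can run for a long time on new binaries with the old protocol, the second pass would normally be a separate, later operation.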
33. 33
Rollbacks and Downgrades
As long as you only updated the binaries, a downgrade is easy:
switch back to the old binaries.
Some message and protocol
version bumps are reversible,
but there are lots of caveats.
Tread carefully.
The protocol bump from 2.1.x to
2.2.0+ is not reversible. Not
even with replication to older
cluster!
35. 35
How not to upgrade
● Add a broker with new version, move
partitions, remove old broker.
● Replicate to a cluster of a new version.
● Set up two clusters. Produce to both
for 30 days. Slowly move consumers
over.
36. 36
Some (bad)
reasons not to
upgrade
My users won’t let me
upgrade.
Upgrades are
considered risky in my
organization, we are
discouraged from
making any changes.
Testing is difficult.
Once we went through
all the tests, we just
want to stick to our
version.
I don’t have time.