JavaEEConf 2016: How to cook Apache Kafka with Camel and Spring Boot

HOW TO COOK APACHE KAFKA
WITH CAMEL AND SPRING BOOT
Java EE conference 2016
Ivan Vasyliev
Playtika Core Services Team

AGENDA
Basics of Apache Kafka
Apache Camel
Spring Boot
Demo
Q&A
CODE SLIDES

WHY APACHE KAFKA?
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

WHY APACHE KAFKA?
Designed for large scale
Widely adopted by top tech companies
Hardened production quality product
Data replication out of the box

FEATURES
At most once, at least once guarantees
Batching for high throughput cases
Efficient with DEFAULT settings

EVEN MORE FEATURES
Mirroring between datacenters
Connectors to various DWH
Complex event processing integrations

HIGH LEVEL VIEW
http://kafka.apache.org/documentation.html#introduction

HIGH LEVEL VIEW
Publisher/subscriber and point-to-point models
Client that sends messages - producer
Client that receives messages - consumer

WHAT IS NOT INCLUDED - JMS
Not a JMS compliant server
No message headers
Can employ message key
Send in payload
Wait for it, on roadmap
No transactions/JTA support

WHAT IS NOT INCLUDED - EXACTLY ONCE GUARANTEE
No exactly once guarantee
Duplicates because of failures
De-duplication is on roadmap
De-duplication on consumer
With Camel EIP (idempotent consumer), by message ID/body
Consumer can tolerate duplicates
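
The consumer-side de-duplication above can be sketched in plain Java. This is not Camel's API, only an illustration of what an idempotent consumer does: remember recently seen message IDs and drop repeats. The class name IdempotentFilter, the method firstDelivery and the bounded-cache size are assumptions made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of an idempotent consumer: remembers recent message IDs and flags repeats. */
final class IdempotentFilter {
    private final Map<String, Boolean> seen;

    IdempotentFilter(final int capacity) {
        // Access-ordered LinkedHashMap used as a bounded LRU cache of message IDs.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time an ID is seen, false for a duplicate that is still cached. */
    synchronized boolean firstDelivery(String messageId) {
        return seen.put(messageId, Boolean.TRUE) == null;
    }
}
```

A consumer would call firstDelivery(id) for each message and skip processing when it returns false. Because the cache is bounded, a duplicate that arrives after its ID has been evicted is processed again, so the capacity has to cover the expected redelivery window.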

APACHE KAFKA LANGUAGE
http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

APACHE KAFKA LANGUAGE
Topic - represents stream of messages
Contains set of partitions
Partition - subset of messages in stream
Partitioning is done by message key on producer
No “queue” in dictionary
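
Partitioning by message key can be illustrated with a few lines of Java. This is a minimal sketch, assuming a simple byte-array hash as a stand-in; the Kafka Java client's default partitioner hashes the serialized key with a different function (murmur2), so real partition numbers will differ. KeyPartitioner and partitionFor are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch of key-based partition selection on the producer side. */
final class KeyPartitioner {
    /** Maps a message key to a partition index in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Stand-in hash for illustration only; not the hash Kafka itself uses.
        int hash = Arrays.hashCode(keyBytes);
        // floorMod keeps the result non-negative even for negative hash values.
        return Math.floorMod(hash, numPartitions);
    }
}
```

The property that matters is that equal keys always land in the same partition as long as the partition count stays the same, which is what gives per-key ordering.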

TOPICS AND PARTITIONS
http://kafka.apache.org/documentation.html#intro_topics

TOPICS AND PARTITIONS
A partition is the smallest unit of storage in Kafka
A partition is a data file with messages
Producers always append to the end of the file
Consumers scroll/seek over the file
Consumer offset is persisted (in ZooKeeper or Kafka)
Strong ordering guarantees for consumer
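
The append-only file and the seeking consumer can be modelled in memory. This is a sketch of the idea only, not of Kafka's storage format; PartitionLog and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory model of one partition: an append-only log addressed by offset. */
final class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    /** Appends to the end of the log and returns the offset assigned to the message. */
    synchronized long append(String message) {
        messages.add(message);
        return messages.size() - 1L;
    }

    /** Reads up to maxMessages starting at the given offset, in append order. */
    synchronized List<String> read(long offset, int maxMessages) {
        if (offset < 0 || offset > messages.size() || maxMessages < 0) {
            throw new IllegalArgumentException("invalid offset or maxMessages");
        }
        int from = (int) offset;
        int to = (int) Math.min((long) messages.size(), offset + maxMessages);
        return new ArrayList<>(messages.subList(from, to));
    }

    /** Offset that the next appended message will receive. */
    synchronized long endOffset() {
        return messages.size();
    }
}
```

Reading never removes anything: a consumer is just a position in the log, which is why several consumers can read the same partition independently and always see messages in append order.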

QUEUE SEMANTIC IS DONE ON CLIENT
http://kafka.apache.org/documentation.html#intro_consumers

QUEUE
Consumer offset is persisted by group id/per partition
Queue semantic inside of consumer group
Topic semantic between consumer groups
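
Queue semantics inside a group come from splitting the partitions among the group's members, so each message is handled by one member. A different group gets its own split over all partitions, which gives topic semantics between groups. The sketch below uses a simple round-robin split as an assumption; the actual assignment strategy is chosen by Kafka and may differ. GroupAssignment and assign are hypothetical names, and member names are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: how partitions might be divided among the members of ONE consumer group. */
final class GroupAssignment {
    /** Spreads partitions 0..numPartitions-1 over the given members, round-robin. */
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one member");
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String member : members) {
            result.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = members.get(partition % members.size());
            result.get(owner).add(partition);
        }
        return result;
    }
}
```

With three partitions, a group of two members splits them as [0, 2] and [1], while a second group with a single member reads all three.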

CONSUMPTION IS ALL ABOUT OFFSETS
https://hadoopabcd.wordpress.com/2015/04/11/kafka-building-a-real-time-data-pipeline/

CONSUMPTION IS ALL ABOUT OFFSETS
Consumer polls data from broker
Consumer offset is sent (committed) to server
Auto offset commit enabled
By separate thread, periodically
Auto offset commit disabled
By your code, when batch of messages processed
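
Turning auto-commit off is a matter of consumer configuration. The sketch below builds the properties for the Java consumer client using its documented key names; the older Scala consumer used different key names. The class name, the String deserializers, and the idea of passing the broker address and group ID as parameters are assumptions for illustration.

```java
import java.util.Properties;

/** Sketch: consumer properties with auto-commit switched off. */
final class ManualCommitConsumerConfig {
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Offsets are committed by application code, not periodically in the background.
        props.setProperty("enable.auto.commit", "false");
        // String payloads are an assumption of this sketch.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

With these properties the application is responsible for committing, typically by calling the consumer's commitSync() once a polled batch has been fully processed.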

CONSUMER OFFSET AND AUTO-COMMIT
With “auto-commit” enabled you can lose messages
Step 1: One thread did not finish processing and failed
Step 2: Auto-commit thread does not care
Auto-commit is OK for status heartbeats
Auto-commit is NOT OK if you need “at least once” guarantee, e.g. payment processing
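
The two steps above can be turned into a small calculation. This is a worst-case model, assuming the auto-commit records the end of the polled batch before processing fails; it does not use the Kafka client. All names are hypothetical.

```java
/** Worst-case model of what a restarted consumer misses after a processing failure. */
final class AutoCommitSimulation {
    /** Offset the consumer resumes from after a restart. */
    static long resumeOffset(long batchStart, int batchSize, boolean autoCommit) {
        // Auto-commit: the whole batch was already committed, processed or not.
        // Manual commit after the batch: nothing was committed, so the batch is replayed.
        return autoCommit ? batchStart + batchSize : batchStart;
    }

    /** Number of messages that are never processed because their offsets were already committed. */
    static long lostMessages(long batchStart, int batchSize, int failedIndex, boolean autoCommit) {
        if (failedIndex < 0 || failedIndex >= batchSize) {
            throw new IllegalArgumentException("failedIndex must be inside the batch");
        }
        long firstUnprocessed = batchStart + failedIndex;
        return Math.max(0L, resumeOffset(batchStart, batchSize, autoCommit) - firstUnprocessed);
    }
}
```

For a batch of 10 messages starting at offset 100 with a failure at index 4, the auto-commit case skips 6 messages, while the manual-commit case skips none and instead reprocesses the first 4, which is the duplicate side of "at least once".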

DATA REPLICATION
Leader receives all reads and writes
Decides when to commit message
Follower syncs messages from leader
Takes over if leader is down
Replication controller maintains leader
ZooKeeper used for coordination
Leader election
Consensus protocol
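
One way to picture "leader decides when to commit" is that a message counts as committed once every in-sync replica has written it. The sketch below computes that boundary as the minimum log end offset across the leader and its in-sync followers. It is a simplified model with hypothetical names, not broker code.

```java
import java.util.Collection;

/** Simplified model of the commit boundary for one replicated partition. */
final class ReplicationSketch {
    /**
     * Returns the offset (exclusive) up to which every in-sync replica,
     * the leader included, has already written the log.
     */
    static long committedUpTo(long leaderEndOffset, Collection<Long> inSyncFollowerEndOffsets) {
        long committed = leaderEndOffset;
        for (long followerEnd : inSyncFollowerEndOffsets) {
            committed = Math.min(committed, followerEnd);
        }
        return committed;
    }
}
```

A follower that lags behind holds the boundary back, which is why a slow follower is eventually dropped from the in-sync set rather than allowed to stall commits.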

APACHE KAFKA PRODUCER
Performs load balancing
Uses message key to select partition
Finds appropriate kafka broker leader for partition
Has a few configurable acknowledgement modes
Can do batching in async mode
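
The acknowledgement modes and batching are controlled through producer configuration. The sketch below uses the Java producer client's documented key names; the concrete values, the class name and the String serializers are illustrative assumptions, not recommendations from the talk.

```java
import java.util.Properties;

/** Sketch: producer properties covering acknowledgements and batching. */
final class ProducerConfigSketch {
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // "0" = no acknowledgement, "1" = leader only, "all" = all in-sync replicas.
        props.setProperty("acks", "all");
        // Wait a few milliseconds so that more records can share one batch.
        props.setProperty("linger.ms", "5");
        // Upper bound, in bytes, for one batch per partition.
        props.setProperty("batch.size", "16384");
        // String payloads are an assumption of this sketch.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

Stronger acknowledgement settings trade latency for durability, while a longer linger time trades latency for throughput.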

DELIVERY GUARANTEED
Durability with ack levels on producer side
Data replication between brokers
No in-memory state, efficient persistence
Manually committing offset on consumer side

ISSUES - OPS
Ops is not free
There is Zookeeper on board
Easy to set up with Docker/Rancher
Need to learn the basics to set up and monitor

ISSUES – DATA
Can’t auto-scale existing data
Option 1: Add new partitions, they will go to new nodes
Option 2: Do it manually, move partitions around
Option 3: Wait for it, on roadmap
Mirroring seems to work in one direction only
Can’t handle very large number of topics

WHY APACHE CAMEL?
Message routing DSL (Java/Scala/Groovy)
Enterprise Integration Patterns
Idempotent consumer (de-duplication)
Aggregator
…
Abstractions for testing
MockEndpoint
Route Advice

APACHE CAMEL
http://camel.apache.org/java-dsl.html

APACHE CAMEL
Lightweight and embeddable
Spring boot integration
Connectors to various message and data sources

SPRING BOOT
Fat jar/jee containerless deployment
Autoconfiguration and conditionals
Codeless usage of Spring Cloud/Netflix projects

GOTCHAS – PRODUCER FASTER THAN CONSUMERS, PRECONDITIONS
It’s not recommended to have lots of partitions
Each partition is consumed by one consumer thread
Producer X times faster than consumer

GOTCHAS – PRODUCER FASTER THAN CONSUMERS, ACTIONS
Monitor Kafka lag
Messages not consumed by group
Add intermediate multiplexing queue
See Camel “seda” component
Think carefully since in-memory state can lead to data loss
Consider adding more partitions
Will allow more consumption threads
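
Lag, meaning messages not yet consumed by a group, is the distance between the end of each partition and the group's committed offset. The sketch below sums it over partitions. It is a plain calculation with hypothetical names; where the offsets come from (a monitoring tool, broker metrics) is left open, and a partition without a committed offset is assumed to start at 0.

```java
import java.util.Map;

/** Sketch: consumer-group lag computed from per-partition offsets. */
final class LagCalculator {
    /** Sums (log end offset - committed offset) over all partitions of a topic. */
    static long totalLag(Map<Integer, Long> logEndOffsets, Map<Integer, Long> committedOffsets) {
        long lag = 0L;
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            // Assumption: no committed offset means nothing has been consumed yet.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            lag += Math.max(0L, entry.getValue() - committed);
        }
        return lag;
    }
}
```

A lag that keeps growing over time is the signal that the producer outpaces the group and that one of the actions above is needed.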

GOTCHAS – PRODUCER FASTER THAN CONSUMERS, TOOLS
https://github.com/quantifind/KafkaOffsetMonitor

GOTCHAS – AUTO OFFSET RESET
When you start a test, you do not receive any messages
Producer sends message before consumer is UP
Check auto.offset.reset setting in unit test
Latest (or largest in the old API) leads to consumption of only new messages
Earliest (or smallest in the old API) means “from the beginning”
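
The effect of auto.offset.reset on a fresh consumer group can be shown as a small decision function. This is a model of the behaviour described above, not client code; the names are hypothetical, and only the new-API values "earliest" and "latest" are handled.

```java
/** Sketch: where a consumer starts reading a partition. */
final class OffsetResetSketch {
    /**
     * @param committedOffset the group's committed offset, or null if the group has none
     * @param autoOffsetReset "earliest" or "latest"
     * @param logStartOffset  offset of the oldest message still in the partition
     * @param logEndOffset    offset that the next produced message will get
     */
    static long startingOffset(Long committedOffset, String autoOffsetReset,
                               long logStartOffset, long logEndOffset) {
        if (committedOffset != null) {
            // The reset policy is consulted only when there is no committed offset.
            return committedOffset;
        }
        switch (autoOffsetReset) {
            case "earliest":
                return logStartOffset;
            case "latest":
                return logEndOffset;
            default:
                throw new IllegalArgumentException("unsupported policy: " + autoOffsetReset);
        }
    }
}
```

In the test scenario above, three messages are produced before the consumer is up. With "latest" the consumer starts at offset 3 and sees none of them; with "earliest" it starts at offset 0 and receives all three.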

GOTCHAS – CLIENT VERSION MIGHT NEED TO MATCH SERVER
Clients are supposed to be “backward compatible”, but …
If you see weird things, check the classpath

GOTCHAS – WATCH THE CLASSPATH
Multiple versions of Kafka client
Multiple versions of Kafka client dependencies
Multiple versions of ZooKeeper client

DEPENDENCY MANAGEMENT
Use dependency management to force versions and exclusions
Use the “Maven Helper” IntelliJ plugin to check for issues
https://github.com/krasa/MavenHelper
https://plugins.jetbrains.com/plugin/7179

Thank you!
ivasylyev@playtika.com
Join us:
http://goo.gl/LuWMo3
