Imagine a world where operational data is continuously flowing from applications and devices at an extremely high rate. Now imagine services intercepting this data and analyzing it in real time. Sounds futuristic? It's not; it's here today. Mark Richards describes what streaming architecture is all about: what it is, when to use it, and how to implement it in a microservices ecosystem. Mark describes the overall ecosystem for streaming architecture, including a brief discussion of the differences between Apache Spark, Flink, and Hadoop, and then explains how Apache Kafka works. Using live coding examples in Kafka, Mark demonstrates techniques for leveraging streaming data, such as metrics gathering for monitoring, distributed logging, request distribution analysis, and threshold analysis in a microservices ecosystem. Join Mark as he discusses some of the implications and limitations of Kafka and the important topic of when to choose Kafka over standard messaging such as JMS, AMQP, and MSMQ.
NFJS Software Symposium Series 2016
Author of Software Architecture Fundamentals Video Series (O’Reilly)
Author of Microservices Pitfalls and AntiPatterns (O’Reilly)
Author of Microservices vs. Service-Oriented Architecture (O’Reilly)
Author of Enterprise Messaging Video Series (O’Reilly)
Author of Java Message Service 2nd Edition (O’Reilly)
Independent Consultant
Hands-on Software Architect
Published Author / Conference Speaker
www.wmrichards.com
Mark Richards
Leveraging Streaming Data in a
Microservices Ecosystem
agenda
kafka producers and consumers
streaming architecture patterns
real-world examples of streaming data
kafka vs. standard messaging
kafka overview
kafka producers and consumers

[diagram: client → service (producer) → metrics (consumer); offsets (0.11)]
kafka producers and consumers
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "…kafka…StringSerializer");
props.put("value.serializer", "…kafka…StringSerializer");
KafkaProducer<String, String> producer =
    new KafkaProducer<String, String>(props);

String topic = "customer_comment_service_metrics";
String key = "duration";
String value = "320";
ProducerRecord<String, String> msg =
    new ProducerRecord<>(topic, key, value);
producer.send(msg);
producer.flush();
producer.close();

messages are sent in a batch within a separate thread

service (producer)
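The note above, that messages are sent in a batch within a separate thread, is the key to the producer's throughput: send() only buffers the record, a background I/O thread ships batches to the broker, and flush()/close() block until everything buffered has gone out. A minimal plain-Java sketch of that pattern (a hypothetical BatchingSender, not Kafka's actual internals):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the batching pattern the producer uses internally:
// send() only enqueues the record, a background I/O thread drains the queue
// into batches, and flush()/close() wait until the buffer is empty.
public class BatchingSender {
    private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
    private final List<List<String>> sentBatches = new ArrayList<>();
    private volatile boolean running = true;
    private final Thread ioThread = new Thread(this::drainLoop);

    public BatchingSender() { ioThread.start(); }

    /** Non-blocking, like KafkaProducer.send(): just hands off to the I/O thread. */
    public void send(String record) { buffer.add(record); }

    private void drainLoop() {
        List<String> batch = new ArrayList<>();
        while (running || !buffer.isEmpty()) {
            try {
                String r = buffer.poll(10, TimeUnit.MILLISECONDS);
                if (r != null) batch.add(r);
                // ship a batch when it is full or the queue goes momentarily idle
                if (!batch.isEmpty() && (r == null || batch.size() >= 16)) {
                    synchronized (sentBatches) { sentBatches.add(new ArrayList<>(batch)); }
                    batch.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        if (!batch.isEmpty()) {  // flush any residue on shutdown
            synchronized (sentBatches) { sentBatches.add(new ArrayList<>(batch)); }
        }
    }

    /** Block until everything buffered has been handed to the I/O thread. */
    public void flush() throws InterruptedException {
        while (!buffer.isEmpty()) Thread.sleep(5);
    }

    public void close() throws InterruptedException {
        flush();
        running = false;
        ioThread.join();
    }

    public int recordCount() {
        synchronized (sentBatches) {
            return sentBatches.stream().mapToInt(List::size).sum();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BatchingSender sender = new BatchingSender();
        for (int i = 0; i < 40; i++) sender.send("duration=" + i);
        sender.close();  // drains the buffer, then stops the I/O thread
        System.out.println("records sent: " + sender.recordCount());
    }
}
```

Skipping flush()/close() in the real producer has the same failure mode as this sketch: records still sitting in the buffer when the JVM exits are silently lost.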
kafka producers and consumers
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "CG1");
props.put("key.deserializer", "…kafka…StringDeserializer");
props.put("value.deserializer", "…kafka…StringDeserializer");
KafkaConsumer<String, String> consumer =
    new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList(
    "customer_comment_service_metrics"));
metrics (consumer)
try {
    while (true) {
        ConsumerRecords<String, String> msgs = consumer.poll(100);
        for (ConsumerRecord<String, String> msg : msgs) {
            System.out.println("topic: " + msg.topic());
            System.out.println("key: " + msg.key());
            System.out.println("value: " + msg.value());
            System.out.println("partition: " + msg.partition());
            System.out.println("offset: " + msg.offset());
        }
    }
} finally {
    consumer.close();
}

metrics (consumer)

we are using auto-commit of our offset sync point (every 5 sec)
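With auto-commit, the committed offset lags the offsets actually processed by up to the commit interval (5 seconds here). A small sketch with illustrative numbers of why that yields at-least-once delivery: after a crash, the consumer resumes from the last committed offset and re-reads the gap, so a consumer like this metrics service must tolerate duplicate records.

```java
// Simplified illustration (not Kafka code): auto-commit means the committed
// offset trails the offsets already processed. On restart after a crash,
// the consumer resumes from the last committed offset, so every record in
// (lastCommitted, lastProcessed] is delivered a second time.
public class AutoCommitGap {
    static long redeliveredOnRestart(long lastProcessed, long lastCommitted) {
        return lastProcessed - lastCommitted;
    }

    public static void main(String[] args) {
        long processed = 1042;  // consumer had handled up to offset 1042
        long committed = 1000;  // but auto-commit last ran at offset 1000
        System.out.println("duplicates on restart: "
                + redeliveredOnRestart(processed, committed));  // 42
    }
}
```

This is exactly the window the manual commitSync() on the next slide shrinks: committing after each poll bounds redelivery to a single batch instead of five seconds of traffic.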
kafka producers and consumers
try {
    while (true) {
        ConsumerRecords<String, String> msgs = consumer.poll(100);
        for (ConsumerRecord<String, String> msg : msgs) {
            System.out.println("topic: " + msg.topic());
            System.out.println("key: " + msg.key());
            System.out.println("value: " + msg.value());
            System.out.println("partition: " + msg.partition());
            System.out.println("offset: " + msg.offset());
        }
        try {
            consumer.commitSync();
        } catch (CommitFailedException e) {
            log.error("rats - I have no idea what to do now!");
        }
    }
} finally {
    consumer.close();
}

metrics (consumer)
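One detail worth knowing about manual commits: the no-argument commitSync() commits the offsets returned by the last poll(), and the offset Kafka records per partition is the offset of the *next* record to consume, i.e. last processed + 1. A plain-Java sketch of that bookkeeping (class and method names are hypothetical; the real per-partition variant takes a Map<TopicPartition, OffsetAndMetadata>):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (plain Java, not the Kafka API): when committing offsets manually
// per partition, the offset to commit is that of the next record to read,
// i.e. lastProcessedOffset + 1. Committing lastProcessedOffset itself would
// re-deliver the final record after a restart.
public class OffsetBookkeeping {
    private final Map<Integer, Long> lastProcessed = new HashMap<>();

    public void markProcessed(int partition, long offset) {
        // out-of-order marks keep the highest offset seen per partition
        lastProcessed.merge(partition, offset, Math::max);
    }

    /** Offsets as they would be handed to a per-partition commit. */
    public Map<Integer, Long> offsetsToCommit() {
        Map<Integer, Long> commits = new HashMap<>();
        lastProcessed.forEach((p, o) -> commits.put(p, o + 1));
        return commits;
    }

    public static void main(String[] args) {
        OffsetBookkeeping book = new OffsetBookkeeping();
        book.markProcessed(0, 41);
        book.markProcessed(1, 7);
        book.markProcessed(0, 42);
        System.out.println(book.offsetsToCommit());  // {0=43, 1=8}
    }
}
```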
apache kafka vs. standard messaging

|                   | apache kafka           | standard messaging     |
|-------------------|------------------------|------------------------|
| primary data type | operational data       | transactional data     |
| throughput        | up to 1 million/sec    | up to 4K/sec (10K/sec) |
| payload           | single name/value pair | aggregate data         |
| data loss         | possible*              | rare                   |
| data duplication  | possible*              | rare                   |
| msg confirmation  | consumer managed       | broker managed         |

* StreamsAPI significantly reduces data loss and duplication
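The single name/value-pair payloads in the left column are what make demos like the abstract's threshold analysis possible: a consumer keeps a sliding window over a stream of metric values and raises an alert when the rolling average crosses a limit. A sketch of that idea without Kafka (class name, window size, and threshold are illustrative, not from the talk):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of threshold analysis over a stream of "duration" metric values:
// keep a sliding window of the last N samples and flag when the rolling
// average exceeds a threshold. In the talk's demos the values would be
// consumed from a Kafka topic; here they come from a plain array.
public class ThresholdAnalyzer {
    private final Deque<Integer> window = new ArrayDeque<>();
    private final int windowSize;
    private final double threshold;
    private int sum = 0;

    public ThresholdAnalyzer(int windowSize, double threshold) {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    /** Returns true when the window is full and its average exceeds the threshold. */
    public boolean accept(int value) {
        window.addLast(value);
        sum += value;
        if (window.size() > windowSize) sum -= window.removeFirst();
        return window.size() == windowSize && (double) sum / windowSize > threshold;
    }

    public static void main(String[] args) {
        ThresholdAnalyzer analyzer = new ThresholdAnalyzer(3, 300.0);
        int[] durations = {120, 180, 200, 350, 400, 450};
        for (int d : durations) {
            if (analyzer.accept(d)) {
                System.out.println("ALERT: rolling avg over 300ms at sample " + d);
            }
        }
    }
}
```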
apache kafka vs. standard messaging

|                     | apache kafka     | standard messaging   |
|---------------------|------------------|----------------------|
| load balancing      | supported        | supported            |
| message order       | consumer control | preserved (fifo)*    |
| msg properties      | supported**      | supported            |
| messaging models    | pub/sub          | pub/sub, p2p, hybrid |
| msg persistence     | always           | optional             |
| guaranteed delivery | not supported**  | supported            |

* use of message priority can change message order
** StreamsAPI provides some support for these
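The "consumer control" of message order deserves a note: Kafka guarantees order only within a partition, and its default partitioner routes a record by hashing its key modulo the partition count (Kafka uses a murmur2 hash; plain hashCode() stands in below as a simplification), so all records with the same key land on the same partition and stay in FIFO order. A minimal sketch:

```java
// Sketch: Kafka preserves order only within a partition. The default
// partitioner picks a partition by hashing the record key (murmur2 in
// Kafka; hashCode() stands in here), so the same key always maps to the
// same partition, giving per-key FIFO ordering for consumers.
public class KeyPartitioning {
    static int partitionFor(String key, int numPartitions) {
        // mask off the sign bit so the modulo result is non-negative
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 8;
        // every "duration" metric lands on the same partition, so its
        // samples are consumed in the order they were produced
        int p1 = partitionFor("duration", partitions);
        int p2 = partitionFor("duration", partitions);
        System.out.println("same partition: " + (p1 == p2));  // true
    }
}
```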