Talk given at the Apache Kafka NYC Meetup, October 20, 2015.
http://www.meetup.com/Apache-Kafka-NYC/events/225697500/
Kafka has emerged as a clear choice for a high-throughput, low-latency messaging system that addresses the needs of high-performance streaming applications. Over the last decade, the Spring Framework has become the de facto standard for developing enterprise Java applications, providing a simple yet powerful programming model that lets developers focus on business needs while leaving boilerplate and middleware integration to the framework itself. It has since evolved into a rich and powerful ecosystem, with projects focusing on specific aspects of enterprise software development - Spring Boot, Spring Data, Spring Integration, Spring XD, and Spring Cloud Stream/Data Flow, to name just a few.
In this presentation, Marius Bogoevici from the Spring team will take the perspective of the Kafka user and show, with live demos, how the various projects in the Spring ecosystem address those needs:
- how to build simple data integration applications using Spring Integration Kafka;
- how to build sophisticated data pipelines with Spring XD and Kafka;
- how to build cloud-native message-driven microservices using Spring Cloud Stream and Kafka, and how to orchestrate them using Spring Cloud Data Flow.
2. Agenda
• The Spring ecosystem today
• Spring Integration and Spring Integration Kafka
• Data integration
• Spring XD
• Spring Cloud Data Flow
3. Spring Framework
• Since 2002
• Java-based enterprise application development
• “Plumbing” should not be a developer concern
• Platform agnostic
4. Have you seen Spring lately?
• XML-less operation (since Spring 3.0, 2009)
• Component detection via @ComponentScan
• Declarative stereotypes:
• @Component, @Controller, @Repository
• Dependency injection via @Autowired
• Extensive ecosystem
5. A simple REST controller
@RestController
public class GreetingController {

    private static final String template = "Hello, %s!";
    private final AtomicLong counter = new AtomicLong();

    @RequestMapping("/greeting")
    public Greeting greeting(@RequestParam(value = "name", defaultValue = "World") String name) {
        return new Greeting(counter.incrementAndGet(),
                String.format(template, name));
    }
}
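The Greeting class returned by the controller is not shown on the slide; here is a minimal sketch of what it might look like (the field names id and content are assumptions, not taken from the slides):

```java
// Minimal immutable payload for the /greeting endpoint.
// Field names (id, content) are assumed; the slide does not show them.
class Greeting {

    private final long id;
    private final String content;

    Greeting(long id, String content) {
        this.id = id;
        this.content = content;
    }

    long getId() { return id; }

    String getContent() { return content; }
}
```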
9. Spring Data
• Spring-based data access model
• Data mapping and repository abstractions
• Retains the characteristics of underlying data store
• Framework-generated implementations
• Customized query support
10. Spring Data Repositories
public interface PersonRepository extends CrudRepository<Person, Long> {
    Person findByFirstName(String firstName);
}

@RestController
public class PersonController {

    @Autowired PersonRepository repository;

    @RequestMapping("/")
    public Iterable<Person> getAll() {
        return repository.findAll();
    }

    @RequestMapping("/{firstName}")
    public Person readOne(@PathVariable String firstName) {
        return repository.findByFirstName(firstName);
    }
}
Only the interfaces are declared; the implementation is generated and injected.
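The Person entity backing the repository is not shown on the slide; a minimal sketch follows (field names are assumed, and store-specific mapping annotations such as @Id are omitted to keep it self-contained):

```java
// Minimal Person entity backing the repository example above.
// Field names are assumed; store-specific mapping annotations
// (e.g. @Id) are omitted to keep the sketch self-contained.
class Person {

    private final Long id;
    private final String firstName;

    Person(Long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }

    Long getId() { return id; }

    String getFirstName() { return firstName; }
}
```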
12. Spring Boot
• Auto-configuration: infrastructure created automatically based on classpath contents
• Smart defaults
• Standalone executable artifacts (“just run”)
• Uberjar + embedded runtime
• Configuration via CLI, environment
13. Spring Boot Application
@Controller
@EnableAutoConfiguration
public class SampleController {

    @RequestMapping("/")
    @ResponseBody
    String home() {
        return "Hello World!";
    }

    public static void main(String[] args) throws Exception {
        SpringApplication.run(SampleController.class, args);
    }
}
java -jar application.jar
14. Spring Integration
• Since 2007
• Pipes and Filters: Messages, Channels, Endpoints
• Enterprise Integration Patterns as first-class constructs
• Large set of adapters
• Java DSL
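The pipes-and-filters model above (messages flowing through channels between filter and transformer endpoints) can be sketched in plain Java. This illustrates the pattern only; it is not the Spring Integration API:

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy pipes-and-filters chain: a message passes through a filter
// endpoint, then a transformer endpoint. Illustrates the EIP idea
// behind Spring Integration; not the framework's real API.
class MiniFlow {

    static Optional<String> process(String message,
                                    Predicate<String> filter,
                                    Function<String, String> transformer) {
        if (!filter.test(message)) {
            return Optional.empty();   // filter endpoint drops the message
        }
        return Optional.of(transformer.apply(message)); // transformer endpoint
    }
}
```

For example, `MiniFlow.process("kafka", s -> s.startsWith("k"), String::toUpperCase)` passes the filter and yields "KAFKA", while a message that fails the predicate is simply dropped.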
20. Spring Integration Kafka
• Started in 2011
• Goal: adapt Kafka to the Spring Messaging and Spring Integration abstractions
• Easy access to the unique features of Kafka
• Namespace and Java DSL support
• To migrate to the 0.9 client once available
• Defaults focused on performance (ID and timestamp generation disabled)
22. Spring Integration Kafka
Producer Configuration
• Default producer configuration
• Distinct per-topic producer configurations
• Target topic or partition controlled via expression evaluation or headers
23. Spring Integration Kafka
Consumer
• Own client based on Simple Consumer API
• Listen to specific partitions!
• Offset control: when and where offsets are written (no Zookeeper)
• Programmer-controlled acknowledgment
• Concurrent message processing (preserving per-partition ordering)
• Basic operations via KafkaTemplate
• Kafka specific headers
24. Spring Integration Kafka
Message Listener
• Auto-acknowledging
• With manual acknowledgment
public interface MessageListener {
    void onMessage(KafkaMessage message);
}

public interface AcknowledgingMessageListener {
    void onMessage(KafkaMessage message, Acknowledgment acknowledgment);
}
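With simplified stand-ins for KafkaMessage and Acknowledgment (the real types live in spring-integration-kafka and carry topic/partition/offset metadata), a manual-acknowledgment listener might look like this:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-ins for the spring-integration-kafka types shown
// on the slide; the real classes carry topic/partition/offset metadata.
class KafkaMessage {
    final String payload;
    KafkaMessage(String payload) { this.payload = payload; }
}

interface Acknowledgment {
    void acknowledge();
}

// A listener that acknowledges only after the message is processed,
// mirroring the AcknowledgingMessageListener contract above.
class CountingListener {
    final AtomicInteger processed = new AtomicInteger();

    void onMessage(KafkaMessage message, Acknowledgment acknowledgment) {
        processed.incrementAndGet();   // do the actual work first
        acknowledgment.acknowledge();  // then commit the offset
    }
}
```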
25. Spring Integration Kafka:
Offset Management
• Injectable strategy
• Allows customizing the starting offsets
• Implementations: SI MetadataStore-backed (e.g. Redis, Gemfire), Kafka compacted-topic-backed (pre-0.8.2), Kafka 0.8.2 native
• Messages can be auto-acknowledged (by the adapter) or manually acknowledged (by the user)
• Manual acknowledgment is useful when messages are processed asynchronously
• Acknowledgment passed as a message header or as an argument
26. Stream processing with Spring XD
• Higher abstractions are required
• Integrating seamlessly and transparently with the middleware
• Building on top of Spring Integration and Spring Batch
• Pre-built modules using the entire power of the Spring ecosystem
27. Streams in Spring XD
[Diagram: sources (HTTP, JMS, Kafka, RabbitMQ, Gemfire, File, SFTP, Mail, JDBC, Twitter, Syslog, TCP, UDP, MQTT, Trigger) flow through processors (Filter, Transformer, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Python, Groovy, Java, RxJava, Spark Streaming) into sinks (File, HDFS, HAWQ, Kafka, RabbitMQ, Redis, Splunk, Mongo, JDBC, TCP, Log, Mail, Gemfire, MQTT, Dynamic Router, Counters).]
Note: Named channels allow for a directed graph of data flow.
29. Spring XD - Message Bus abstraction
• Binds module inputs and outputs to a transport
• Performs serialization (Kryo)
• Local, Rabbit, Redis, and Kafka implementations
31. Spring XD and Kafka - the message bus
• Each pipe between modules is a topic
• Spring XD creates topics automatically
• Topics are pre-partitioned based on module count and concurrency
• Overpartitioning is available as an option
• Multiple consumer modules ‘divide’ the partition set of a topic using a deterministic algorithm
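The slides do not spell out the division algorithm; one deterministic scheme (a sketch, assumed for illustration, not necessarily Spring XD's exact code) gives module i every partition p where p % moduleCount == i:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a deterministic partition split: module i out of
// moduleCount takes every partition p where p % moduleCount == i.
// Assumed for illustration; Spring XD's actual algorithm may differ.
class PartitionSplit {

    static List<Integer> partitionsFor(int moduleIndex, int moduleCount, int partitionCount) {
        List<Integer> assigned = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (p % moduleCount == moduleIndex) {
                assigned.add(p);
            }
        }
        return assigned;
    }
}
```

With 5 partitions and 2 modules, module 0 gets [0, 2, 4] and module 1 gets [1, 3]; because the split depends only on the indices, every module computes the same answer independently.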
32. Partitioning in Spring XD
• Required in distributed stateful processing: related data must be processed on the same node
• Partitioning logic is configured in Spring XD via the deployment manifest
• partitionKeyExpression=payload.sensorId
• When using Kafka as the bus, the partition key logic maps natively to Kafka transport partitioning
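Once partitionKeyExpression yields a key such as payload.sensorId, the bus must map it to a partition so the same key always lands on the same consumer. A minimal sketch of such a mapping (the hash-modulo scheme is an assumption for illustration, not Spring XD's exact implementation):

```java
// Sketch: map a partition key to one of N partitions so that the
// same key always lands on the same partition (and thus the same
// consumer). The hash-mod scheme is assumed for illustration.
class PartitionKeyMapper {

    static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for
        // negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```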
37. Goals
• Scale without undeploying running stream or batch pipelines
• Avoid hierarchical ‘classloader’ issues and the inadvertent spiral of ‘xd/lib’
• Skip network hops within a stream
• Do rolling upgrades and continuous deployments
49. Summary
• Scalable pipelines composed of Spring Boot cloud-native applications
• Spring Cloud Stream provides the programming model
• Transparently maps to Kafka-native concepts
• Spring Cloud Data Flow provides the orchestration model