Talk given at the Apache Kafka NYC Meetup, October 20, 2015.
http://www.meetup.com/Apache-Kafka-NYC/events/225697500/
Kafka has emerged as a clear choice for a high-throughput, low-latency messaging system that addresses the needs of high-performance streaming applications. Over the last decade, the Spring Framework has become the de facto standard for developing enterprise Java applications, providing a simple yet powerful programming model that lets developers focus on business needs while leaving boilerplate and middleware integration to the framework itself. It has since evolved into a rich and powerful ecosystem, with projects focusing on specific aspects of enterprise software development - Spring Boot, Spring Data, Spring Integration, Spring XD, and Spring Cloud Stream/Data Flow, to name just a few.
In this presentation, Marius Bogoevici from the Spring team will take the perspective of the Kafka user and show, with live demos, how the various projects in the Spring ecosystem address those needs:
- how to build simple data integration applications using Spring Integration Kafka;
- how to build sophisticated data pipelines with Spring XD and Kafka;
- how to build cloud-native message-driven microservices using Spring Cloud Stream and Kafka, and how to orchestrate them using Spring Cloud Data Flow.
2. Agenda
• The Spring ecosystem today
• Spring Integration and Spring Integration Kafka
• Data integration
• Spring XD
• Spring Cloud Data Flow
3. Spring Framework
• Since 2002
• Java-based enterprise application development
• “Plumbing” should not be a developer concern
• Platform agnostic
4. Have you seen Spring lately?
• XML-less operation (since Spring 3.0, 2009)
• Component detection via @ComponentScan
• Declarative stereotypes:
• @Component, @Controller, @Repository
• Dependency injection via @Autowired
• Extensive ecosystem
5. A simple REST controller
@RestController
public class GreetingController {

    private static final String template = "Hello, %s!";
    private final AtomicLong counter = new AtomicLong();

    @RequestMapping("/greeting")
    public Greeting greeting(@RequestParam(value = "name", defaultValue = "World") String name) {
        return new Greeting(counter.incrementAndGet(),
                String.format(template, name));
    }
}
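The Greeting class returned by the controller is not shown on the slide; here is a minimal sketch of what it might look like (the field names id and content are assumptions, not taken from the slides):

```java
// Minimal immutable payload for the /greeting endpoint.
// Field names (id, content) are assumed; the slide does not show them.
class Greeting {

    private final long id;
    private final String content;

    Greeting(long id, String content) {
        this.id = id;
        this.content = content;
    }

    long getId() { return id; }

    String getContent() { return content; }
}
```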
9. Spring Data
• Spring-based data access model
• Data mapping and repository abstractions
• Retains the characteristics of underlying data store
• Framework-generated implementations
• Customized query support
10. Spring Data Repositories
public interface PersonRepository extends CrudRepository<Person, Long> {
    Person findByFirstName(String firstName);
}

@RestController
public class PersonController {

    @Autowired PersonRepository repository;

    @RequestMapping("/")
    public Iterable<Person> getAll() {
        return repository.findAll();
    }

    @RequestMapping("/{firstName}")
    public Person readOne(@PathVariable String firstName) {
        return repository.findByFirstName(firstName);
    }
}
Only the interfaces are declared; the implementation is generated and injected.
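The Person entity backing the repository is not shown on the slide; a minimal sketch follows (field names are assumed, and store-specific mapping annotations such as @Id are omitted to keep it self-contained):

```java
// Minimal Person entity backing the repository example above.
// Field names are assumed; store-specific mapping annotations
// (e.g. @Id) are omitted to keep the sketch self-contained.
class Person {

    private final Long id;
    private final String firstName;

    Person(Long id, String firstName) {
        this.id = id;
        this.firstName = firstName;
    }

    Long getId() { return id; }

    String getFirstName() { return firstName; }
}
```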
12. Spring Boot
• Auto-configuration: infrastructure created automatically based on classpath contents
• Smart defaults
• Standalone executable artifacts (“just run”)
• Uberjar + embedded runtime
• Configuration via CLI, environment
13. Spring Boot Application
@Controller
@EnableAutoConfiguration
public class SampleController {

    @RequestMapping("/")
    @ResponseBody
    String home() {
        return "Hello World!";
    }

    public static void main(String[] args) throws Exception {
        SpringApplication.run(SampleController.class, args);
    }
}
java -jar application.jar
14. Spring Integration
• Since 2007
• Pipes and Filters: Messages, Channels, Endpoints
• Enterprise Integration Patterns as first-class constructs
• Large set of adapters
• Java DSL
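The pipes-and-filters model above (messages flowing through channels between filter and transformer endpoints) can be sketched in plain Java. This illustrates the pattern only; it is not the Spring Integration API:

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy pipes-and-filters chain: a message passes through a filter
// endpoint, then a transformer endpoint. Illustrates the EIP idea
// behind Spring Integration; not the framework's real API.
class MiniFlow {

    static Optional<String> process(String message,
                                    Predicate<String> filter,
                                    Function<String, String> transformer) {
        if (!filter.test(message)) {
            return Optional.empty();   // filter endpoint drops the message
        }
        return Optional.of(transformer.apply(message)); // transformer endpoint
    }
}
```

For example, `MiniFlow.process("kafka", s -> s.startsWith("k"), String::toUpperCase)` passes the filter and yields "KAFKA", while a message that fails the predicate is simply dropped.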
20. Spring Integration Kafka
• Started in 2011
• Goal: adapt Kafka to the Spring Messaging and Spring Integration abstractions
• Easy access to the unique features of Kafka
• Namespace and Java DSL support
• To migrate to the 0.9 client once available
• Defaults focused on performance (ID and timestamp generation disabled)
22. Spring Integration Kafka
Producer Configuration
• Default producer configuration
• Distinct per-topic producer configurations
• Target topic or partition controlled via expression evaluation or headers
23. Spring Integration Kafka
Consumer
• Own client based on Simple Consumer API
• Listen to specific partitions!
• Offset control: when and where offsets are written (no Zookeeper)
• Programmer-controlled acknowledgment
• Concurrent message processing (preserving per-partition ordering)
• Basic operations via KafkaTemplate
• Kafka specific headers
24. Spring Integration Kafka
Message Listener
• Auto-acknowledging
• With manual acknowledgment
public interface MessageListener {
    void onMessage(KafkaMessage message);
}

public interface AcknowledgingMessageListener {
    void onMessage(KafkaMessage message, Acknowledgment acknowledgment);
}
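With simplified stand-ins for KafkaMessage and Acknowledgment (the real types live in spring-integration-kafka and carry topic/partition/offset metadata), a manual-acknowledgment listener might look like this:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-ins for the spring-integration-kafka types shown
// on the slide; the real classes carry topic/partition/offset metadata.
class KafkaMessage {
    final String payload;
    KafkaMessage(String payload) { this.payload = payload; }
}

interface Acknowledgment {
    void acknowledge();
}

// A listener that acknowledges only after the message is processed,
// mirroring the AcknowledgingMessageListener contract above.
class CountingListener {
    final AtomicInteger processed = new AtomicInteger();

    void onMessage(KafkaMessage message, Acknowledgment acknowledgment) {
        processed.incrementAndGet();   // do the actual work first
        acknowledgment.acknowledge();  // then commit the offset
    }
}
```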
25. Spring Integration Kafka:
Offset Management
• Injectable strategy
• Allows customizing the starting offsets
• Implementations: SI MetadataStore-backed (e.g. Redis, Gemfire), Kafka compacted-topic-backed (pre-0.8.2), Kafka 0.8.2 native
• Messages can be auto-acknowledged (by the adapter) or manually acknowledged (by the user)
• Manual acknowledgment is useful when messages are processed asynchronously
• Acknowledgment passed as a message header or as an argument
26. Stream processing with Spring XD
• Higher abstractions are required
• Integrating seamlessly and transparently with the middleware
• Building on top of Spring Integration and Spring Batch
• Pre-built modules using the entire power of the Spring ecosystem
27. Streams in Spring XD
[Diagram: sources (HTTP, JMS, Kafka, RabbitMQ, Gemfire, File, SFTP, Mail, JDBC, Twitter, Syslog, TCP, UDP, MQTT, Trigger) flow through processors (Filter, Transformer, Splitter, Aggregator, HTTP Client, JPMML Evaluator, Shell, Python, Groovy, Java, RxJava, Spark Streaming) into sinks (File, HDFS, HAWQ, Kafka, RabbitMQ, Redis, Splunk, Mongo, JDBC, TCP, Log, Mail, Gemfire, MQTT, Dynamic Router, Counters).]
Note: Named channels allow for a directed graph of data flow.
29. Spring XD - Message Bus abstraction
• Binds module inputs and outputs to a transport
• Performs serialization (Kryo)
• Local, Rabbit, Redis, and Kafka implementations
31. Spring XD and Kafka - the message bus
• Each pipe between modules is a topic
• Spring XD creates topics automatically
• Topics are pre-partitioned based on module count and concurrency
• Overpartitioning is available as an option
• Multiple consumer modules ‘divide’ the partition set of a topic using a deterministic algorithm
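The slides do not spell out the division algorithm; one deterministic scheme (a sketch, assumed for illustration, not necessarily Spring XD's exact code) gives module i every partition p where p % moduleCount == i:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a deterministic partition split: module i out of
// moduleCount takes every partition p where p % moduleCount == i.
// Assumed for illustration; Spring XD's actual algorithm may differ.
class PartitionSplit {

    static List<Integer> partitionsFor(int moduleIndex, int moduleCount, int partitionCount) {
        List<Integer> assigned = new ArrayList<>();
        for (int p = 0; p < partitionCount; p++) {
            if (p % moduleCount == moduleIndex) {
                assigned.add(p);
            }
        }
        return assigned;
    }
}
```

With 5 partitions and 2 modules, module 0 gets [0, 2, 4] and module 1 gets [1, 3]; because the split depends only on the indices, every module computes the same answer independently.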
32. Partitioning in Spring XD
• Required in distributed stateful processing: related data must be processed on the same node
• Partitioning logic is configured in Spring XD via the deployment manifest
• partitionKeyExpression=payload.sensorId
• When using Kafka as the bus, the partition key logic maps natively to Kafka transport partitioning
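Once partitionKeyExpression yields a key such as payload.sensorId, the bus must map it to a partition so the same key always lands on the same consumer. A minimal sketch of such a mapping (the hash-modulo scheme is an assumption for illustration, not Spring XD's exact implementation):

```java
// Sketch: map a partition key to one of N partitions so that the
// same key always lands on the same partition (and thus the same
// consumer). The hash-mod scheme is assumed for illustration.
class PartitionKeyMapper {

    static int partitionFor(Object key, int partitionCount) {
        // floorMod keeps the result in [0, partitionCount) even for
        // negative hash codes.
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```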
37. Goals
• Scale without undeploying running stream or batch pipelines
• Avoid hierarchical ‘classloader’ issues and the inadvertent spiral of ‘xd/lib’
• Skip network hops within a stream
• Do rolling upgrades and continuous deployments
49. Summary
• Scalable pipelines composed of Spring Boot cloud-native applications
• Spring Cloud Stream provides the programming model
• Transparently maps to Kafka-native concepts
• Spring Cloud Data Flow provides the orchestration model