In this presentation we examine several scalability options for improving the robustness and performance of your Spring Batch applications. We start with a single-threaded Spring Batch application and refactor it to demonstrate how to run it using:
* Concurrent Steps
* Remote Chunking
* AsyncItemProcessor and AsyncItemWriter
* Remote Partitioning
Additionally, we will show how you can deploy Spring Batch applications to Spring XD, which provides high availability and failover capabilities. Spring XD also lets you integrate Spring Batch applications with other Big Data processing needs.
4. “Batch processing ... is defined as the processing of data without interaction or interruption.”
Michael T. Minella, Pro Spring Batch
5. Batch Jobs
• Generally long-running
• Non-interactive
• Often include logic for handling errors and restartability options
• Process large volumes of data
o More than what may fit in memory or a single transaction
6. Batch and offline processing
• Close of business processing
o Order processing, Business reporting, Account reconciliation, Payroll
• Import / export handling
o a.k.a. ETL jobs (Extract-Transform-Load)
• Data warehouse synchronization
• Large-scale output jobs
o Loyalty program emails, Bank statements
• Hadoop job orchestration
7. Features
• Transaction management
• Chunk-based processing
• Schema and Java Config support
• Annotations for callback-type scenarios such as listeners
• Start/Restart/Skip capabilities
• Based on the Spring framework
• JSR 352: Batch Applications for the Java Platform
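The chunk-based processing feature listed above can be sketched in plain Java. This is only an illustration of the idea, not Spring Batch's API: the class `ChunkSketch` and the method `runStep` are hypothetical names, and the transaction that the framework wraps around each chunk is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration of chunk-oriented processing (not Spring Batch code).
public class ChunkSketch {

    // Reads items one at a time, processes each, and hands the results to the
    // writer in groups of at most chunkSize. Returns the number of chunks written.
    static <I, O> int runStep(Iterator<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        int chunks = 0;
        List<O> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == chunkSize) {
                // A framework would commit a transaction around this write.
                writer.accept(new ArrayList<>(chunk));
                chunk.clear();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {
            // Final, possibly smaller, chunk.
            writer.accept(new ArrayList<>(chunk));
            chunks++;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = i -> "item-" + i;
        Consumer<List<String>> writer = written::add;
        int chunks = runStep(Arrays.asList(1, 2, 3, 4, 5).iterator(), processor, writer, 2);
        System.out.println(chunks + " chunks: " + written);
    }
}
```

Because only one chunk is held in memory at a time, the input can be larger than what fits in memory or in a single transaction.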
16. Integration Styles
• Business-to-Business Integration (B2B)
• Inter-Application Integration (EAI)
• Intra-Application Integration
[Diagram labels: JVM, JVM, EAI, Core Messaging, B2B, External Business Partner]
18. Enterprise Integration Patterns
• By Gregor Hohpe & Bobby Woolf
• Published 2003
• Collection of well-known patterns
• Icon library provided
http://www.eaipatterns.com/eaipatterns.html
19. “Spring Integration provides an extension of the Spring programming model to support the well-known enterprise integration patterns.”
Spring Integration Website
23. Launching batch jobs through messages
• Event-driven execution of the JobLauncher
• Spring Integration retrieves the data (e.g. file system, FTP, ...)
• Easy to support separate input sources simultaneously
[Diagram: FTP -> Inbound Channel Adapter -> File -> Transformer -> JobLaunchRequest -> JobLauncher]
24. JobLaunchRequest
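import java.io.File;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.integration.launch.JobLaunchRequest;
import org.springframework.integration.annotation.Transformer;
import org.springframework.messaging.Message;

public class FileMessageToJobRequest {
    private Job job;
    private String fileParameterName;
    ...
    // Turns an incoming file message into a request to launch the job,
    // passing the file's absolute path as a job parameter.
    @Transformer
    public JobLaunchRequest toRequest(Message<File> message) {
        JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
        jobParametersBuilder.addString(fileParameterName,
                message.getPayload().getAbsolutePath());
        return new JobLaunchRequest(job, jobParametersBuilder.toJobParameters());
    }
}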
26. Get feedback with informational messages
• Spring Batch provides support for listeners:
• StepExecutionListener
• ChunkListener
• JobExecutionListener
27. Get feedback with informational messages
<batch:job id="importPayments">
    ...
    <batch:listeners>
        <batch:listener ref="notificationExecutionsListener"/>
    </batch:listeners>
</batch:job>

<int:gateway id="notificationExecutionsListener"
    service-interface="org.springframework.batch.core.JobExecutionListener"
    default-request-channel="jobExecutions"/>
30. Scaling and externalizing batch process execution
• Uses Spring Integration for multi-process communication
• Distribute complex processing
• Single process
o Multi-threaded steps
o Parallel steps
o Local partitioning
• Multi-process
o Remote chunking
o Remote partitioning
• Asynchronous item processing support
o AsyncItemProcessor
o AsyncItemWriter
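As a rough illustration of the "local partitioning" option listed above, the sketch below splits a range of ids into partitions and works on each partition in its own thread inside one JVM. It is plain Java rather than Spring Batch's partitioning API; `LocalPartitionSketch`, `partition`, and `sumInParallel` are hypothetical names, and summing ids stands in for real per-item work. It assumes `gridSize` is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of local partitioning (not Spring Batch code).
public class LocalPartitionSketch {

    // Splits the inclusive range [min, max] into at most gridSize contiguous ranges.
    static List<long[]> partition(long min, long max, int gridSize) {
        long total = max - min + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        List<long[]> ranges = new ArrayList<>();
        for (long start = min; start <= max; start += size) {
            ranges.add(new long[] {start, Math.min(start + size - 1, max)});
        }
        return ranges;
    }

    // Processes every partition on its own worker thread and combines the results.
    static long sumInParallel(long min, long max, int gridSize) {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (long[] range : partition(min, max, gridSize)) {
                Callable<Long> worker = () -> {
                    long sum = 0;
                    for (long id = range[0]; id <= range[1]; id++) {
                        sum += id; // stand-in for real work on the item with this id
                    }
                    return sum;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // wait for each partition to finish
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInParallel(1, 10, 3));
    }
}
```

The remote variants in the list follow the same split-and-aggregate shape, with the workers running in other processes and the communication carried over messaging.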
31. Single Thread
[Diagram: Reader -> Processor -> Writer in a single thread; labels: Input, Item, Result, Gateway, Output]
35. Asynchronous Processors
• AsyncItemProcessor
o Dispatches ItemProcessor logic on a new thread, returning a Future to the AsyncItemWriter
• AsyncItemWriter
o Writes the processed items after processing is complete
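The hand-off described above, where processing returns a Future and the writer unwraps it, can be sketched with the JDK's own concurrency classes. This is a minimal illustration under stated assumptions, not the Spring Batch implementation: `AsyncItemSketch`, `processAsync`, `writeAsync`, and `run` are hypothetical names, and the pool size of 4 is arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of the async processor/writer hand-off (not Spring Batch code).
public class AsyncItemSketch {

    // "Processor" side: submits the delegate's work for each item to the pool
    // and returns the Futures immediately, without waiting for the results.
    static <I, O> List<Future<O>> processAsync(List<I> items,
                                               Function<I, O> delegate,
                                               ExecutorService pool) {
        List<Future<O>> futures = new ArrayList<>();
        for (I item : items) {
            Callable<O> task = () -> delegate.apply(item);
            futures.add(pool.submit(task));
        }
        return futures;
    }

    // "Writer" side: unwraps each Future, blocking until that item is processed,
    // and collects the results in submission order.
    static <O> List<O> writeAsync(List<Future<O>> futures) {
        List<O> written = new ArrayList<>();
        try {
            for (Future<O> future : futures) {
                written.add(future.get());
            }
        } catch (Exception e) {
            throw new IllegalStateException("item processing failed", e);
        }
        return written;
    }

    static List<String> run(List<Integer> items) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            Function<Integer, String> delegate = i -> "processed-" + i;
            return writeAsync(processAsync(items, delegate, pool));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(1, 2, 3)));
    }
}
```

Items are processed concurrently, but the output order stays stable because the Futures are unwrapped in the order they were submitted.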
41. Demo - Launching via messages & informational messages
Does not provide scaling, but demonstrates how to launch a job via messages and send informational messages to integration points.
42. Spring XD
http://projects.spring.io/spring-xd/
43. Tackling Big Data Complexity
• Data Ingestion
• Real-time Analytics
• Workflow Orchestration
• Data Export
44. Tackling Big Data Complexity cont.
• Built on existing Spring assets
o Spring Integration
o Spring Batch
o Spring Data
o Spring Boot
o Spring for Apache Hadoop
o Spring Shell
• Redis, GemFire, Hadoop
45. Data Ingestion Streams
• DSL based on Unix pipes and filters syntax
• Modules are parameterizable
• Simple logic can be added via expressions or scripts
http | file
twittersearch --query=spring | file --dir=/spring
http | filter --expression=payload=='Spring' | hdfs
46. Hadoop workflow managed by Spring Batch
• Reuse Batch infrastructure and features to manage Hadoop workflows
• Job state management, launching, monitoring, restart/retry policies, etc.
• Step can be any Hadoop job type or HDFS script
• Can mix and match with other Batch readers/writers, e.g. JDBC for import/export use cases