1. Spring Batch
A reference guide for software developers learning Spring Batch.
Jayasree Perilakkalam
2. Spring Batch Layered Architecture
• Reference: https://docs.spring.io/spring-batch/docs/3.0.x/reference/html/spring-batch-intro.html
• Application: All batch jobs and custom code written by developers using Spring Batch.
• Spring Batch Core: Core runtime classes necessary to launch and control a batch job, such as JobLauncher, Job, and Step.
• Infrastructure: Common readers, writers, and services such as the RetryTemplate.
4. Spring Batch Domain/Concepts
• Job
• “Job” encapsulates an entire batch process. As is common with other Spring projects, a “Job” is
wired together using either an XML configuration or a Java configuration.
• A “Job” has one or more steps. A chunk-oriented step has exactly one ItemReader and one ItemWriter,
and optionally one ItemProcessor. A “Job” allows for configuration of properties global to all steps,
such as restartability.
• A “Job” needs to be launched using a “JobLauncher” and the metadata about the currently
running process is stored in “JobRepository”.
• A default implementation of the “Job” interface is provided by Spring Batch in the form of the
“SimpleJob” class. When using Java-based configuration, a collection of builders is available for
instantiating a “Job”.
• A “JobInstance” refers to the concept of a logical job run, so a “Job” can have many “JobInstance”s. A
“Job” can be scheduled to run many times, and each logical run is a “JobInstance”. Each “JobInstance” is
tracked separately, so if one fails, it is that instance that must be run again. Therefore, each
“JobInstance” can have multiple executions (“JobExecution”). Only one “JobInstance” corresponding to a
particular “Job” and identifying “JobParameters” can run at a given time.
• The definition of “JobInstance” has no bearing on the data to be loaded. It’s entirely up to the
“ItemReader” implementation to determine how the data is loaded.
5. Spring Batch Domain/Concepts
• Whether the same “JobInstance” is used determines whether the state (i.e. “ExecutionContext”) from the
previous execution is used. Using a new “JobInstance” means starting from the beginning, and using an existing
“JobInstance” generally means starting from where you left off.
• How is one “JobInstance” distinguished from another? The answer is “JobParameters”. A “JobParameters”
object holds a set of parameters used to start a batch job. They can be used for identification or even as
reference data during the run. Thus, JobInstance = Job + identifying JobParameters. Note: not all job
parameters are required to contribute to the identification of a “JobInstance”.
• “JobExecution” is the technical concept of a single attempt to run a job. An execution may end in failure or
success, but the “JobInstance” corresponding to a given execution is not considered complete unless the
execution completes successfully. Consider a “JobInstance” that failed: when it is run again with the same
identifying “JobParameters”, a new “JobExecution” is created. However, there is still only one “JobInstance”
(the same one as before).
• A “Job” defines what a job is and how it is to be executed. A “JobInstance” is a purely organizational object to
group executions together, primarily to enable correct restart semantics. A “JobExecution”, however, is the
primary storage mechanism for what actually happened during a run and contains properties that must be
controlled and persisted.
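The relationship JobInstance = Job + identifying JobParameters can be sketched in plain Java. This is a toy model of the bookkeeping only, not the Spring Batch API: `JobInstanceModel`, `InstanceKey`, `launch`, the job name "endOfDay", and the parameter "schedule.date" are all hypothetical names chosen for illustration, and only the JDK (Java 9 or later) is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model (not Spring Batch classes): one instance per job name plus
// identifying parameters, and one execution record per launch attempt.
class JobInstanceModel {

    // Hypothetical key: job name + identifying parameters only.
    static final class InstanceKey {
        final String jobName;
        final Map<String, String> identifyingParams;

        InstanceKey(String jobName, Map<String, String> identifyingParams) {
            this.jobName = jobName;
            this.identifyingParams = new HashMap<>(identifyingParams);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof InstanceKey)) return false;
            InstanceKey other = (InstanceKey) o;
            return jobName.equals(other.jobName)
                    && identifyingParams.equals(other.identifyingParams);
        }

        @Override
        public int hashCode() {
            return Objects.hash(jobName, identifyingParams);
        }
    }

    // Each instance maps to the outcomes of its executions, in launch order.
    private final Map<InstanceKey, List<Boolean>> executions = new HashMap<>();

    // Records one execution attempt; refuses to rerun a completed instance.
    void launch(String jobName, Map<String, String> identifyingParams, boolean succeeds) {
        InstanceKey key = new InstanceKey(jobName, identifyingParams);
        List<Boolean> attempts = executions.computeIfAbsent(key, k -> new ArrayList<>());
        if (attempts.contains(Boolean.TRUE)) {
            throw new IllegalStateException("instance already complete");
        }
        attempts.add(succeeds);
    }

    int instanceCount() {
        return executions.size();
    }

    int executionCount(String jobName, Map<String, String> identifyingParams) {
        List<Boolean> attempts = executions.get(new InstanceKey(jobName, identifyingParams));
        return attempts == null ? 0 : attempts.size();
    }

    public static void main(String[] args) {
        JobInstanceModel model = new JobInstanceModel();
        Map<String, String> jan1 = Map.of("schedule.date", "2020-01-01");
        model.launch("endOfDay", jan1, false); // first attempt fails
        model.launch("endOfDay", jan1, true);  // restart: same instance, new execution
        model.launch("endOfDay", Map.of("schedule.date", "2020-01-02"), true);
        System.out.println(model.instanceCount());                  // prints 2
        System.out.println(model.executionCount("endOfDay", jan1)); // prints 2
    }
}
```

The refusal to relaunch a completed instance loosely mirrors the JobInstanceAlreadyCompleteException declared by “JobLauncher”; the real framework tracks far more state than this sketch does.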
6. Spring Batch Domain/Concepts
• “executionContext” is a property of “JobExecution”. It is the property bag that contains any user data that needs to be
preserved (persisted) between executions.
• The batch job metadata tables are “batch_job_instance”, “batch_job_execution_params”, and “batch_job_execution”.
• Step
• This is a domain object that encapsulates an independent, sequential phase of a batch job. Thus every job is
composed of one or more steps.
• As with “Job”, “Step” has an individual “StepExecution” that correlates with a unique “JobExecution”.
“StepExecution” represents a single attempt to execute a “Step”. A new “StepExecution” is created each time a
“Step” is run, similar to “JobExecution”. However, if a “Step” fails to execute because a “Step” before it failed, no
execution is persisted for it. A “StepExecution” is created only when its “Step” is actually started. “Step” executions are
represented by objects of the “StepExecution” class. Each execution contains a reference to its corresponding step and
“JobExecution”, along with transaction-related data such as commit and rollback counts and start and end times.
• Additionally, each “StepExecution” has an “executionContext” property, which contains any data a developer needs to
have persisted across batch runs, such as statistics or state information needed to restart.
• An executionContext represents a collection of key/value pairs that are persisted/controlled by the framework in
order to allow developers to store persistent state that is scoped to a “StepExecution” object or a ”JobExecution”
object. The best usage example is to facilitate restart.
• Also, there is at least one executionContext per JobExecution and one for every StepExecution. They are two
different executionContexts. The one scoped to the step is saved at every commit point in the step, whereas the one
scoped to the job is saved in between every step execution.
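How a step-scoped execution context enables restart can be sketched with a plain `Map` standing in for the context. This is an illustration under stated assumptions, not Spring Batch code: `RestartSketch`, `run`, and the key "reader.position" are hypothetical, and the simulated failure is only a stand-in for a real error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a plain Map stands in for a step-scoped execution context.
class RestartSketch {

    static final String POSITION_KEY = "reader.position"; // hypothetical key name

    // Copies items to output in chunks of commitInterval and saves the read
    // position after each chunk. Throws when it reaches index failAt
    // (pass -1 for no failure) to simulate a crash in the middle of a chunk.
    static void run(List<String> items, Map<String, Object> context,
                    List<String> output, int commitInterval, int failAt) {
        // Resume from the last committed position, or from 0 on a fresh run.
        int position = (Integer) context.getOrDefault(POSITION_KEY, 0);
        while (position < items.size()) {
            int end = Math.min(position + commitInterval, items.size());
            List<String> chunk = new ArrayList<>();
            for (int i = position; i < end; i++) {
                if (i == failAt) {
                    // The partial chunk is discarded, like a rolled-back transaction.
                    throw new IllegalStateException("simulated failure at item " + i);
                }
                chunk.add(items.get(i));
            }
            output.addAll(chunk);                // "write" the whole chunk
            position = end;
            context.put(POSITION_KEY, position); // saved at the commit point
        }
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e");
        Map<String, Object> context = new HashMap<>();
        List<String> output = new ArrayList<>();
        try {
            run(items, context, output, 2, 3);   // fails inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println("failed after writing " + output); // failed after writing [a, b]
        }
        run(items, context, output, 2, -1);      // restart with the same context
        System.out.println(output);              // [a, b, c, d, e]
    }
}
```

Because the position is saved only at commit points, the restarted run picks up at the start of the chunk that failed and no item is written twice.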
8. Spring Batch Domain/Concepts
• JobRepository
• “JobRepository” is the persistence mechanism for all the batch stereotypes.
• It provides CRUD operations for JobLauncher, Job and Step implementations.
• When a “Job” is first launched, a “JobExecution” is obtained from “JobRepository” and during the
course of the execution, “StepExecution” and “JobExecution” implementations are persisted by
passing them to “JobRepository”.
• When using Java configuration, @EnableBatchProcessing annotation provides a “JobRepository”
as one of the components automatically configured.
• JobLauncher
• “JobLauncher” represents a simple interface for launching a “Job” with a given set of
“JobParameters”.
public interface JobLauncher {
    public JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}
• It is expected that a valid “JobExecution” is obtained from “JobRepository” to execute the “Job”.
9. Spring Batch Domain/Concepts
• ItemReader
• This is an abstraction that represents the retrieval of input for a “Step”, one item at a
time.
• When “ItemReader” has exhausted the items it can provide, it indicates this by
returning null.
• ItemWriter
• “ItemWriter” is an abstraction that represents the output of a “Step”, one batch or
chunk of items at a time.
• ItemProcessor
• “ItemProcessor” is an abstraction that represents the business processing of an item.
• If while processing the item, it is determined that the item is not valid, returning null
indicates that the item should not be written out.
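The two null conventions above (a reader returning null when input is exhausted, a processor returning null to filter an item out) can be sketched with stand-in interfaces. `Reader`, `Processor`, `Writer`, `runOnce`, and `demo` are hypothetical names for this illustration; they mirror the described contracts but are not the Spring Batch types, and the blank-line rule is an assumed example of an invalid item.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Stand-in interfaces that mirror the contracts described above. They are
// NOT the Spring Batch types, which also declare checked exceptions.
class ItemContractsSketch {

    interface Reader<T> { T read(); }                 // null means "no more input"
    interface Processor<I, O> { O process(I item); }  // null means "skip this item"
    interface Writer<T> { void write(List<? extends T> items); }

    // Reads until the reader returns null, drops items the processor rejects,
    // and hands the surviving items to the writer in a single call.
    static <I, O> void runOnce(Reader<I> reader, Processor<I, O> processor, Writer<O> writer) {
        List<O> accepted = new ArrayList<>();
        for (I item = reader.read(); item != null; item = reader.read()) {
            O result = processor.process(item);
            if (result != null) {
                accepted.add(result);
            }
        }
        writer.write(accepted);
    }

    static List<String> demo(List<String> input) {
        Iterator<String> it = input.iterator();
        Reader<String> reader = () -> it.hasNext() ? it.next() : null;
        // Assumed rule for the example: blank lines are invalid and filtered out.
        Processor<String, String> processor = s -> s.trim().isEmpty() ? null : s.trim();
        List<String> sink = new ArrayList<>();
        Writer<String> writer = sink::addAll;
        runOnce(reader, processor, writer);
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(demo(List.of("alpha", "  ", " beta "))); // [alpha, beta]
    }
}
```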
10. Maven Dependency Configuration
• Add the following in the pom.xml file
<!-- https://mvnrepository.com/artifact/org.springframework.batch/spring-batch-core -->
<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-core</artifactId>
    <version>4.2.0.RELEASE</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.springframework.batch/spring-batch-infrastructure -->
<dependency>
    <groupId>org.springframework.batch</groupId>
    <artifactId>spring-batch-infrastructure</artifactId>
    <version>4.2.0.RELEASE</version>
</dependency>
11. Spring Batch Sample Configuration
Reference: https://docs.spring.io/spring-batch/docs/current/reference/html/job.html#javaConfig
@Configuration
@EnableBatchProcessing
@Import(PersistExampleConfig.class)
public class ExampleBatchConfig {

    // Builder factories made available by @EnableBatchProcessing
    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;
Contd…
13. Spring Batch Sample Configuration
Contd…
    @Bean
    protected Step step2(Tasklet tasklet) {
        return steps.get("step2")
                .tasklet(tasklet)
                .build();
    }
}
Notes:
1. “Tasklet” is a simple interface with a single method, “execute”, which “TaskletStep” calls repeatedly
until it either returns “RepeatStatus.FINISHED” or throws an exception to signal a failure. A “Tasklet” is meant
to perform a single task within a step. To create a “TaskletStep”, the bean passed to the tasklet method of the
step builder (as indicated above) must implement the “Tasklet” interface.
2. Spring Batch incorporates chunk-oriented processing as well. Instead of processing all the data at once, it
processes chunks of data. Items are read one at a time by “ItemReader”, passed to “ItemProcessor”, and aggregated.
Once the number of items read equals the commit interval, the entire chunk is written out by “ItemWriter” and the
transaction is committed.
Reference: https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#chunkOrientedProcessing
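The repeat contract from note 1 can be sketched as a small loop in plain Java. `TaskletLoopSketch`, `SimpleTasklet`, `Status`, and `runStep` are hypothetical stand-ins: the real “Tasklet.execute” takes a StepContribution and a ChunkContext and declares `throws Exception`, both of which this sketch leaves out.

```java
// Toy model of how a tasklet-style step keeps calling execute().
class TaskletLoopSketch {

    enum Status { CONTINUABLE, FINISHED }        // stand-in for RepeatStatus

    interface SimpleTasklet { Status execute(); } // stand-in for Tasklet

    // Calls execute() until it reports FINISHED and returns the number of
    // calls made. An exception thrown by the tasklet propagates to the
    // caller, which is how a failure would be signalled.
    static int runStep(SimpleTasklet tasklet) {
        int calls = 0;
        Status status;
        do {
            status = tasklet.execute();
            calls++;
        } while (status == Status.CONTINUABLE);
        return calls;
    }

    // Hypothetical tasklet that has a fixed number of batches of work to do.
    static int demo(int batches) {
        int[] remaining = { batches };
        return runStep(() -> --remaining[0] > 0 ? Status.CONTINUABLE : Status.FINISHED);
    }

    public static void main(String[] args) {
        System.out.println(demo(3)); // prints 3
    }
}
```

A tasklet that does all its work in one call simply returns the finished status the first time, which is the common case for tasks such as cleaning up a directory.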
14. Intercepting Job Execution
• Reference: https://docs.spring.io/spring-batch/docs/current/reference/html/job.html#interceptingJobExecution
• During the course of the job execution, it may be useful to be notified of
various events in its lifecycle. This can be achieved by registering a
“JobExecutionListener” on the job, for example through the listener method of the job builder.
e.g.
@Bean
public Job footballJob() {
    return this.jobBuilderFactory.get("footballJob")
            .listener(sampleListener())
            ...
            .build();
}
15. Intercepting Job Execution
• “JobExecutionListener” is an interface in Spring Batch (shown below):
public interface JobExecutionListener {
    void beforeJob(JobExecution jobExecution);
    void afterJob(JobExecution jobExecution);
}
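The order in which the two callbacks fire can be sketched without the framework. `ListenerOrderSketch`, `Listener`, `runJob`, and `demo` are hypothetical stand-ins; the real callbacks receive a “JobExecution” rather than a name and a flag. The sketch assumes, as in Spring Batch, that afterJob is invoked whether the job succeeds or fails.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the listener lifecycle around a single job run.
class ListenerOrderSketch {

    // Stand-in for JobExecutionListener; the real methods take a JobExecution.
    interface Listener {
        void beforeJob(String jobName);
        void afterJob(String jobName, boolean succeeded);
    }

    // Runs the body between the two callbacks; afterJob fires on failure too.
    static boolean runJob(String jobName, Runnable body, Listener listener) {
        listener.beforeJob(jobName);
        boolean succeeded = true;
        try {
            body.run();
        } catch (RuntimeException e) {
            succeeded = false;
        }
        listener.afterJob(jobName, succeeded);
        return succeeded;
    }

    // Records the sequence of events for one run of a hypothetical job.
    static List<String> demo(boolean fail) {
        List<String> events = new ArrayList<>();
        Listener recorder = new Listener() {
            @Override
            public void beforeJob(String jobName) {
                events.add("before " + jobName);
            }

            @Override
            public void afterJob(String jobName, boolean succeeded) {
                events.add("after " + jobName + (succeeded ? " COMPLETED" : " FAILED"));
            }
        };
        runJob("footballJob", () -> {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            events.add("run");
        }, recorder);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // [before footballJob, run, after footballJob COMPLETED]
        System.out.println(demo(true));  // [before footballJob, after footballJob FAILED]
    }
}
```

In a real listener, the afterJob callback would typically inspect the status of the “JobExecution” it receives to decide, for example, whether to send a success or a failure notification.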
16. Conclusion
• This is a reference for developers for understanding/implementing
Spring Batch in a software application.
• Other batch processing frameworks exist as well, such as “Easy
Batch”.
• This reference will help developers to build batch processing
applications faster.
Thank you