Spring & SpringBatch EN
Copyright © 2017 Capgemini. All rights reserved
CSD | October 2017
Plan & Goals
Plan
1. A tour of Spring & Spring-Boot
2. What are Batch processing and Spring-Batch?
3. How does it work?
4. Advanced notions and applications
5. A good example
Goals
1. Understanding how Spring-Batch works
2. Detecting use cases
3. Being able to write Batch jobs and go even further with ...
Spring Framework (Core 1/3)
Open-source Java framework first released in 2003 by Rod Johnson (maintained by Pivotal Software as of 2017)
A lightweight container (unlike EJB containers)
Mainly based on dependency injection and AOP (aspect-oriented programming)
Easy integration with other frameworks (Hibernate, JSF, Thymeleaf, ...)
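As an illustration of dependency injection, here is a minimal plain-Java sketch, without Spring itself. All names (OrderStore, InMemoryOrderStore, OrderService, InjectionDemo) are hypothetical; the wireAndRun method only plays the role that a container would normally play.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Spring code.
interface OrderStore {
    void save(String order);
    int count();
}

// One possible implementation; a test could swap in another without touching OrderService.
class InMemoryOrderStore implements OrderStore {
    private final List<String> orders = new ArrayList<>();
    @Override public void save(String order) { orders.add(order); }
    @Override public int count() { return orders.size(); }
}

// The service never instantiates its dependency: it receives it through the constructor.
class OrderService {
    private final OrderStore store;
    OrderService(OrderStore store) { this.store = store; }
    void register(String order) { store.save(order); }
}

class InjectionDemo {
    // Plays the role of the container: builds the objects and wires them together.
    static OrderStore wireAndRun() {
        OrderStore store = new InMemoryOrderStore();
        OrderService service = new OrderService(store);
        service.register("order-1");
        service.register("order-2");
        return store;
    }
}
```

With a container such as Spring, the wiring done by hand in wireAndRun is declared once (for example with annotations) and performed by the framework.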
Spring Framework (Boot 2/3)
Minimal configuration (auto-configured wherever possible)
Runnable, standalone application, even in production
Embedded Tomcat (or Jetty), no WAR deployment needed
No need for XML files (context.xml, web.xml, …): annotations are used instead
Spring Framework (Trend 3/3 @GoogleTrend)
Batch (What is it? 1/2)
Processing data in batches (lots)
Large volumes of data
Several operations follow one another for each batch
Automatic or manual triggering
Batch (Example 2/2)
Example: import of daily orders
1. Read a batch of orders from a file
2. Check the orders
3. Save the batch of orders to the storage system
Repeat this cycle for each batch of orders until the last one
Triggered by cron every day at 5 am, before the staff arrive
Spring-Batch (and ! 1/6)
Open-source framework for Batch Processing
Robust Batch applications (enterprise applications)
Reusable functions for data processing
Adds a standard approach to defining Batch Jobs
Spring-Batch (More 2/6)
Transaction management (Commit interval & Rollback)
Easy batch processing (chunked processing)
Error management, recovery and stopping of Jobs
All in Spring…
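To make the commit interval and rollback idea concrete, here is a toy model in plain Java. It is only a sketch under assumptions: in Spring Batch the commit and rollback are handled by a transaction manager, whereas here a "pending" list stands in for uncommitted work, and a negative number stands in for a bad record. The class name CommitIntervalDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CommitIntervalDemo {
    // Returns the items that ended up committed.
    // Assumption for this sketch: a negative value represents a record that fails.
    static List<Integer> run(List<Integer> input, int commitInterval) {
        List<Integer> committed = new ArrayList<>();
        List<Integer> pending = new ArrayList<>();   // uncommitted work of the current chunk
        try {
            for (int item : input) {
                if (item < 0) {
                    throw new IllegalArgumentException("bad item: " + item);
                }
                pending.add(item);
                if (pending.size() == commitInterval) {
                    committed.addAll(pending);       // commit: the chunk becomes durable
                    pending.clear();
                }
            }
            committed.addAll(pending);               // commit the last, possibly partial chunk
        } catch (IllegalArgumentException e) {
            pending.clear();                         // rollback: only the current chunk is lost
        }
        return committed;
    }
}
```

With a commit interval of 2 and the input 1, 2, 3, -1, 5, the first chunk (1, 2) stays committed while the chunk containing 3 is rolled back when -1 fails.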
Spring-Batch (Concept 3/6)
A Job is defined by a flow of one or more Steps
A Step is usually defined by a Reader, a Processor and a Writer
A Step can also be defined by a simple Tasklet
A Job repository (JobRepository) and a Job launcher (JobLauncher)
Source: Spring.io
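The relationship between a Job and its Steps can be pictured with a toy model in plain Java. This is not the Spring Batch API: MiniStep, MiniJob and MiniStepDemo are hypothetical names, and the in-memory log only stands in for what a job repository would record.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the Spring Batch API.
interface MiniStep {
    String name();
    void execute() throws Exception;
}

class MiniJob {
    private final List<MiniStep> steps = new ArrayList<>();
    private final List<String> log = new ArrayList<>(); // stands in for the job repository

    MiniJob step(MiniStep step) { steps.add(step); return this; }

    // Runs the steps in order and stops at the first failure; returns the final status.
    String run() {
        for (MiniStep step : steps) {
            try {
                step.execute();
                log.add(step.name() + ":COMPLETED");
            } catch (Exception e) {
                log.add(step.name() + ":FAILED");
                return "FAILED";
            }
        }
        return "COMPLETED";
    }

    List<String> log() { return log; }
}

class MiniStepDemo {
    // Builds a tasklet-like step from a name and a body.
    static MiniStep tasklet(String name, Runnable body) {
        return new MiniStep() {
            @Override public String name() { return name; }
            @Override public void execute() { body.run(); }
        };
    }
}
```

A job built as new MiniJob().step(...).step(...) runs its steps sequentially, which is the simplest kind of flow.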
Spring-Batch (How does it work ? 4/6)
The Reader reads one item at a time
The Processor processes one item at a time
Once a whole chunk has been read and processed, the Writer writes it
Source: Spring.io
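The read, process and write cycle can be sketched as a simplified loop in plain Java. This is an illustration only: ChunkLoopDemo is a hypothetical name, an Iterator stands in for the Reader and a Function for the Processor, and the real framework may order reads and processing differently inside a chunk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class ChunkLoopDemo {
    // Reads and processes item by item, writes chunk by chunk.
    // Returns the chunks handed to the writer so that the grouping is visible.
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            I item = reader.next();                  // the reader returns one item
            chunk.add(processor.apply(item));        // the processor transforms one item
            if (chunk.size() == chunkSize) {
                written.add(new ArrayList<>(chunk)); // the writer receives the whole chunk
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(new ArrayList<>(chunk));     // last, possibly partial chunk
        }
        return written;
    }
}
```

With three items and a chunk size of 2, the writer is called twice: once with two items and once with the remaining one.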
Spring-Batch (Advanced 5/6)
Sequential, conditional or parallel Step flows
Split Flow and Multi-threading
Many Listeners (StepExecutionListener, ChunkListener, ...)
Exception management
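The idea behind a split flow (several flows running in parallel, then joining before the Job continues) can be sketched with the standard java.util.concurrent API. This is not how Spring Batch implements it; SplitFlowDemo is a hypothetical name and each Callable stands in for one flow. The sketch assumes at least one flow is given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SplitFlowDemo {
    // Runs every flow on its own thread and waits for all of them.
    // Results are returned in submission order, not completion order.
    static List<String> runInParallel(List<Callable<String>> flows) {
        ExecutorService pool = Executors.newFixedThreadPool(flows.size());
        try {
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(flows)) {
                results.add(future.get());           // blocks until that flow has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for flows", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a flow failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```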
Spring-Batch (Diagram 6/6)
Source: Cépria FR
A real-life example…
Example of use (Context 1/8)
Web application that can launch Jobs from the view
Track Job status
Allow Jobs to be run synchronously or asynchronously
Three Jobs: one for importing CSV files into the database, one for exporting
data to a JSON file, and one for importing and exporting large
amounts of data (200,000 lines) in less than 10 minutes
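The difference between running a Job synchronously and asynchronously can be sketched in plain Java. This is an illustration, not the project's code: LaunchModesDemo is a hypothetical name and a Supplier returning a status string stands in for a Job. In Spring Batch itself, the choice depends on the task executor given to the job launcher.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class LaunchModesDemo {
    // Synchronous: the caller blocks until the job has finished and gets the status directly.
    static String launchSync(Supplier<String> job) {
        return job.get();
    }

    // Asynchronous: the job runs on another thread; the caller gets a handle
    // that it can poll or wait on, which suits a web view tracking job status.
    static CompletableFuture<String> launchAsync(Supplier<String> job) {
        return CompletableFuture.supplyAsync(job);
    }
}
```

A web controller would typically return immediately after launchAsync and let the view query the status later, whereas launchSync keeps the HTTP request open for the whole run.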
Example of use (Business request 2/8)
Import Job :
• Input : a CSV file containing employee information and their annual gross
salaries
• Output : processed information and taxes calculated and saved in the
database
Export Job :
• Input : employee information and taxes calculated and saved in the
database
• Output : A JSON file containing all data
+ A REST API to perform tax calculation and validation
Example of use (Architecture & technical choices 3/8)
Use of Spring-Boot with Spring-MVC and Spring-Batch
Integration of Thymeleaf as a templating engine (+ nekohtml)
MySQL driver, Mockito for tests and Jackson for JSON support
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
…
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
…
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
…
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
…
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-thymeleaf</artifactId>
…
<groupId>net.sourceforge.nekohtml</groupId>
<artifactId>nekohtml</artifactId>
…
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
Example of use (Solution diagram 4/8)
Example of use (Programming 5/8)
Three interfaces are available to define a Reader, Processor and Writer
public interface ItemReader<T> {
    // Returns the next item, or null when the input is exhausted
    T read() throws Exception, UnexpectedInputException, ParseException;
}
public interface ItemWriter<T> {
    // Receives a whole chunk of items at once
    void write(List<? extends T> items) throws Exception;
}
public interface ItemProcessor<I, O> {
    // Transforms one input item into one output item
    O process(I item) throws Exception;
}
An interface is available to define a Tasklet
public interface Tasklet {
    RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception;
}
An interface is available to define a Partitioner
public interface Partitioner {
    // One ExecutionContext per partition, keyed by partition name
    Map<String, ExecutionContext> partition(int gridSize);
}
Example of use (Annotation job 6/8)
@Bean
public ItemReader<Person> reader() {
    return new PersonReaderFromFile();
}
@Bean
public ImportPersonItemProcessor processor() {
    return new ImportPersonItemProcessor();
}
@Bean
public ItemWriter<Person> writer() {
    return new PersonWriterToDataBase();
}
@Bean
public Tasklet cleaner() {
    return new CleanDBTasklet();
}
// Job: clean the database first, then import the file
@Bean
public Job importUserJob() {
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .flow(stepClean())
            .next(stepImport())
            .end()
            .listener(new ImportJobExecutionListener(reader()))
            .validator(new FileNameParameterValidator())
            .build();
}
// Chunk-oriented step: items are committed in chunks of 10
@Bean
public Step stepImport() {
    return stepBuilderFactory.get("stepImport")
            .<Person, Person>chunk(10)
            .reader(reader())
            .processor(processor())
            .writer(writer())
            .build();
}
// Tasklet step: a single unit of work
@Bean
public Step stepClean() {
    return stepBuilderFactory.get("stepClean").tasklet(cleaner()).build();
}
Example of use (XML Jobs 7/8)
<batch:job id="importUserJob">
<batch:step id="stepClean" next="importStep">
<batch:tasklet ref="cleanDBTasklet" />
</batch:step>
<batch:step id="importStep">
<batch:tasklet>
<batch:chunk reader="reader" writer="writer" processor="processor" commit-interval="10" />
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="reader" class="com.capgemini.reader.PersonReaderFromFile" scope="step" />
<bean id="processor" class="com.capgemini.processor.ImportPersonItemProcessor" scope="step" />
<bean id="writer" class="com.capgemini.writer.PersonWriterToDataBase" scope="step" />
<bean id="cleanDBTasklet" class="com.capgemini.tasklet.CleanDBTasklet" />
<batch:job id="exportUserJob">
<batch:step id="exportStep">
<batch:tasklet>
<batch:chunk reader="reader" writer="writer" processor="processor" commit-interval="10" />
</batch:tasklet>
</batch:step>
</batch:job>
<bean id="reader" class="com.capgemini.reader.PersonReaderFromDataBase" scope="step" />
<bean id="processor" class="com.capgemini.processor.ExportPersonItemProcessor" scope="step" />
<bean id="writer" class="com.capgemini.writer.PersonWriterToFile" scope="step" />
Example of use (Application 8/8)
Demo (Result and Conclusion 1/6)
Small or medium file import:
• For small files, an average of 4 seconds in both synchronous and
asynchronous modes
• For medium-sized files (10,000 lines), an average of 43 seconds of
processing across synchronous and asynchronous modes
Large file import: an average processing time of 13 minutes
Export 1000 rows or 10,000 rows : For the 1000 rows, an average of 4
seconds of processing, and for the 100,000 rows, an average of 30
seconds
Export of 200,000 rows: an average processing time of 12 minutes
=> Importing and exporting 200,000 lines takes more than 10 minutes
Demo (Solution 2/6)
To import and export 200,000 lines in less than 10 minutes, multi-threading
is one of the solutions
Multi-threading: multiple threads perform tasks in parallel
New problem: the FileReader and FileWriter are not thread-safe
Solution for the FileReader: Split the input file so that each file is processed
by a Thread
Solution for the FileWriter: paginate data from the database to export
several files and concatenate them at the end of the Process
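The pagination idea can be sketched as a small plain-Java function that computes one (page, size) pair per partition, so that each thread exports its own page to its own file. This is an assumption-laden sketch, not the project's DBPartitioner: PagePartitionDemo and the "partitionN" keys are hypothetical, pages are assumed to be zero-based, and nested maps stand in for Spring Batch's ExecutionContext.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PagePartitionDemo {
    // Splits totalRows into pages of at most pageSize rows.
    // Each entry describes one partition through its "page" (zero-based) and "size".
    static Map<String, Map<String, Integer>> partition(int totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        Map<String, Map<String, Integer>> partitions = new LinkedHashMap<>();
        int pages = (totalRows + pageSize - 1) / pageSize; // ceiling division
        for (int page = 0; page < pages; page++) {
            Map<String, Integer> context = new LinkedHashMap<>();
            context.put("page", page);
            context.put("size", pageSize);
            partitions.put("partition" + page, context);
        }
        return partitions;
    }
}
```

For 200,000 rows and a page size of 20,000, this yields 10 partitions, each of which can be read and written by a separate thread before the resulting files are concatenated.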
Demo (Configuration 3/6)
<batch:job id="transformJob">
<batch:step id="deleteDir" next="cleanDB">
<batch:tasklet ref="fileDeletingTasklet" />
</batch:step>
<batch:step id="cleanDB" next="countThread">
<batch:tasklet ref="cleanDBTasklet" />
</batch:step>
<batch:step id="countThread" next="split">
<batch:tasklet ref="countThreadTasklet" />
</batch:step>
<batch:step id="split" next="partitionerMasterImporter">
<batch:tasklet>
<batch:chunk reader="largeCSVReader" writer="smallCSVWriter"
commit-interval="#{jobExecutionContext['chunk.count']}" />
</batch:tasklet>
</batch:step>
<batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
<batch:partition step="importChunked" partitioner="filePartitioner">
<batch:handler grid-size="10" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
<batch:step id="partitionerMasterExporter" next="concat">
<batch:partition step="exportChunked" partitioner="dbPartitioner">
<batch:handler grid-size="10" task-executor="taskExecutor" />
</batch:partition>
</batch:step>
<batch:step id="concat">
<batch:tasklet ref="concatFileTasklet" />
</batch:step>
</batch:job>
Demo (Configuration 4/6)
<batch:step id="importChunked">
<batch:tasklet>
<batch:chunk reader="smallCSVFileReader" writer="dbWriter"
processor="importProcessor" commit-interval="500">
</batch:chunk>
</batch:tasklet>
</batch:step>
<batch:step id="exportChunked">
<batch:tasklet>
<batch:chunk reader="dbReader" writer="jsonFileWriter" processor="exportProcessor"
commit-interval="#{jobExecutionContext['chunk.count']}">
</batch:chunk>
</batch:tasklet>
</batch:step>
<bean id="jsonFileWriter" class="com.capgemini.writer.PersonWriterToFile" scope="step">
<property name="outputPath" value="csv/chunked/paged-#{stepExecutionContext[page]}.json" />
</bean>
<bean id="dbReader" class="com.capgemini.reader.PersonReaderFromDataBase" scope="step">
<property name="iPersonRepository" ref="IPersonRepository" />
<property name="page" value="#{stepExecutionContext[page]}"/>
<property name="size" value="#{stepExecutionContext[size]}"/>
</bean>
<bean id="countThreadTasklet" class="com.capgemini.tasklet.CountingTasklet" scope="step">
<property name="input" value="file:csv/input/#{jobParameters[filename]}" />
</bean>
<bean id="cleanDBTasklet" class="com.capgemini.tasklet.CleanDBTasklet" />
<bean id="fileDeletingTasklet" class="com.capgemini.tasklet.FileDeletingTasklet">
<property name="directory" value="file:csv/chunked/" />
</bean>
Demo (Configuration 5/6)
<bean id="concatFileTasklet" class="com.capgemini.tasklet.FileConcatTasklet">
<property name="directory" value="file:csv/chunked/" />
<property name="outputFilename" value="csv/output/export.json" />
</bean>
<bean id="filePartitioner" class="com.capgemini.partitioner.FilePartitioner">
<property name="outputPath" value="csv/chunked/" />
</bean>
<bean id="dbPartitioner" class="com.capgemini.partitioner.DBPartitioner" scope="step">
<property name="pageSize" value="#{jobExecutionContext['chunk.count']}" />
</bean>
<bean id="largeCSVReader" class="com.capgemini.reader.LineReaderFromFile" scope="step">
<property name="inputPath" value="csv/input/#{jobParameters[filename]}" />
</bean>
<bean id="smallCSVWriter" class="com.capgemini.writer.LineWriterToFile" scope="step">
<property name="outputPath" value="csv/chunked/"></property>
</bean>
<bean id="smallCSVFileReader" class="com.capgemini.reader.PersonReaderFromFile" scope="step">
<constructor-arg value="csv/chunked/#{stepExecutionContext[file]}" />
</bean>
<bean id="importProcessor" class="com.capgemini.processor.ImportPersonItemProcessor" />
<bean id="exportProcessor" class="com.capgemini.processor.ExportPersonItemProcessor" />
<bean id="dbWriter" class="com.capgemini.writer.PersonWriterToDataBase">
<property name="iPersonRepository" ref="IPersonRepository" />
</bean>
The test
With multi-threading
Test (Threads 2/3)
Before Job launch
During the execution of the Job
Test (Output 3/3)
Summary (Good + 1/2)
Define a pattern for Batch
Embed Batch in any kind of Spring application easily
Reliability, maintainability
Advanced functions such as "Multi-Threading"
Integrated batch testing
Error Tolerance and Recovery
Summary (Not good – 2/2)
The Spring Batch Admin project is no longer maintained: switching to
Spring Cloud Data Flow is mandatory
Difficult to run an annotation-defined Job in a project packaged as a JAR
that bundles several Jobs
Version compatibility issues between Spring Batch and H2 Database, very
useful for testing Jobs
Useful links
Full source code of the project : https://gitlab.com/mmohamed/spring-batch
Documentation :
• http://projects.spring.io/spring-batch/#quick-start
• https://blog.octo.com/spring-batch-par-quel-bout-le-prendre
• https://blog.netapsys.fr/spring-batch-par-lexemple-2
• http://jeremy-jeanne.developpez.com/tutoriels/spring/spring-batch/
www.capgemini.com
The information contained in this presentation is proprietary.
© 2017 Capgemini. All rights reserved. Rightshore® is a trademark belonging to Capgemini.
About Capgemini
With more than 190,000 people, Capgemini is present in over 40
countries and celebrates its 50th Anniversary year in 2017. A
global leader in consulting, technology and outsourcing services,
the Group reported 2016 global revenues of EUR 12.5 billion.
Together with its clients, Capgemini creates and delivers
business, technology and digital solutions that fit their needs,
enabling them to achieve innovation and competitiveness. A
deeply multicultural organization, Capgemini has developed its
own way of working, Collaborative Business Experience™, and
draws on Rightshore®, its worldwide delivery model.
Learn more about us at www.capgemini.com