© 2022 RWS
A way to parse huge JSON files when memory used to be a limitation
Negruti Andrei
Why do we have to process huge JSON files?
Our global scale and experience
• Over 7,500 experts across 36 countries and a client base spanning Europe, North and South America and Asia Pacific
• Our unrivalled experience and deep understanding of language has been developed over more than 60 years
• We support 330+ language variants and translate 378+ billion words a year
Books to BCMs (Bilingual Content Model)
The Great Gatsby
F. Scott Fitzgerald
• 48,922 words
• Original file: 0.7 MB
• BCM (JSON): 2.9 MB
1984
George Orwell
• 105,204 words
• Original file: 1.6 MB
• BCM (JSON): 4.8 MB
The Clean Coder
Robert C. Martin
• 67,495 words
• Original file: 3.2 MB
• BCM (JSON): 8.5 MB
War and Peace
Leo Tolstoy
• 572,298 words
• Original file: 3.6 MB
• BCM (JSON): 33.2 MB
The Lord of the Rings (entire trilogy)
J.R.R. Tolkien
• 561,317 words
• Original file: 6.4 MB
• BCM (JSON): 53.5 MB
Introduction to Algorithms
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein
• 449,467 words
• Original file: 14.1 MB
• BCM (JSON): 145.8 MB
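The book slides give an original size and a BCM size for each title, so the growth factor is a simple division. The sketch below computes it from the slide figures; the class and method names are illustrative only and are not part of any RWS tooling.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BcmExpansion {
    /** Ratio of BCM (JSON) size to original file size. */
    static double ratio(double originalMb, double bcmMb) {
        return bcmMb / originalMb;
    }

    public static void main(String[] args) {
        // Sizes in MB, copied from the slides: {original, BCM}
        Map<String, double[]> books = new LinkedHashMap<>();
        books.put("The Great Gatsby", new double[] {0.7, 2.9});
        books.put("1984", new double[] {1.6, 4.8});
        books.put("The Clean Coder", new double[] {3.2, 8.5});
        books.put("War and Peace", new double[] {3.6, 33.2});
        books.put("The Lord of the Rings", new double[] {6.4, 53.5});
        books.put("Introduction to Algorithms", new double[] {14.1, 145.8});

        for (Map.Entry<String, double[]> entry : books.entrySet()) {
            double[] sizes = entry.getValue();
            System.out.printf("%-28s %5.1fx%n", entry.getKey(), ratio(sizes[0], sizes[1]));
        }
    }
}
```

With these figures the BCM is between roughly 2.7 and 10.3 times larger than the original file.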
A way to parse huge JSON files when memory used to be a limitation
Negruti Andrei
• 2,565 words
• Original file: 10.2 MB
• BCM (JSON): 1.3 MB
Zoom into the Apply Machine Translation step
Processing JSON in a streaming way
{
  "action": "SUM_NUMBERS",
  "requester": {
    "id": "307d3a82",
    "username": "admin"
  },
  "numbers": [123, 731, ..., 421]
}
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
</dependency>
JsonParser parser = new JsonFactory().createParser(input);
parser.nextToken()
{
  "action": "SUM_NUMBERS",
  "requester": {
    "id": "307d3a82",
    "username": "admin"
  },
  "numbers": [123, 731, ..., 421]
}
Each call to parser.nextToken() returns the next token:
JsonToken.START_OBJECT        {
JsonToken.FIELD_NAME          "action"
JsonToken.VALUE_STRING        "SUM_NUMBERS"
JsonToken.FIELD_NAME          "requester"
JsonToken.START_OBJECT        {
JsonToken.FIELD_NAME          "id"
JsonToken.VALUE_STRING        "307d3a82"
JsonToken.FIELD_NAME          "username"
JsonToken.VALUE_STRING        "admin"
JsonToken.END_OBJECT          }
JsonToken.FIELD_NAME          "numbers"
JsonToken.START_ARRAY         [
JsonToken.VALUE_NUMBER_INT    123
JsonToken.VALUE_NUMBER_INT    731
...
JsonToken.VALUE_NUMBER_INT    421
JsonToken.END_ARRAY           ]
JsonToken.END_OBJECT          }
null                          (end of input)
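To make the pull model concrete without any dependency, here is a minimal hand-written tokenizer that uses only the JDK. It is not Jackson and not a validating parser: it handles only objects, arrays, strings and integers, and the class name TinyPullParser and its Token enum are hypothetical names chosen to mirror the token names on the slide. The point it illustrates is that each nextToken() call reads just enough characters to produce one token, so memory use does not grow with the size of the input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Simplified pull tokenizer for illustration; not Jackson and not a validating parser. */
final class TinyPullParser {
    enum Token {
        START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY,
        FIELD_NAME, VALUE_STRING, VALUE_NUMBER_INT
    }

    private static final int NONE = -2;   // marker: no character buffered
    private final Reader in;
    private int peeked = NONE;            // one character of lookahead

    TinyPullParser(Reader in) {
        this.in = in;
    }

    private int read() throws IOException {
        if (peeked != NONE) {
            int c = peeked;
            peeked = NONE;
            return c;
        }
        return in.read();
    }

    /** Returns the next token, or null once the input is exhausted. */
    Token nextToken() throws IOException {
        int c = read();
        // Separators and whitespace carry no information for this sketch.
        while (c == ',' || c == ':' || (c != -1 && Character.isWhitespace(c))) {
            c = read();
        }
        switch (c) {
            case -1:  return null;
            case '{': return Token.START_OBJECT;
            case '}': return Token.END_OBJECT;
            case '[': return Token.START_ARRAY;
            case ']': return Token.END_ARRAY;
            case '"': return readString();
            default:  return readInt();
        }
    }

    private Token readString() throws IOException {
        for (int c = read(); c != '"' && c != -1; c = read()) {
            if (c == '\\') {
                read();                   // skip the escaped character
            }
        }
        int next = read();
        while (next != -1 && Character.isWhitespace(next)) {
            next = read();
        }
        peeked = next;                    // put the lookahead back
        // A string directly followed by ':' is a field name, otherwise a value.
        return next == ':' ? Token.FIELD_NAME : Token.VALUE_STRING;
    }

    private Token readInt() throws IOException {
        int c = read();
        while (c >= '0' && c <= '9') {
            c = read();
        }
        peeked = c;                       // first character after the number
        return Token.VALUE_NUMBER_INT;
    }

    /** Collects every token of the given JSON text. */
    static List<Token> tokens(String json) {
        TinyPullParser parser = new TinyPullParser(new StringReader(json));
        List<Token> result = new ArrayList<>();
        try {
            for (Token t = parser.nextToken(); t != null; t = parser.nextToken()) {
                result.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the sample document, with three numbers
        String json = "{\"action\":\"SUM_NUMBERS\","
                + "\"requester\":{\"id\":\"307d3a82\",\"username\":\"admin\"},"
                + "\"numbers\":[123,731,421]}";
        tokens(json).forEach(System.out::println);
    }
}
```

Running main prints the same sequence of token names as listed on the slide, with three VALUE_NUMBER_INT entries for the three sample numbers.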
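import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

long total = 0;
// try-with-resources closes the parser and the underlying input
try (JsonParser parser = new JsonFactory().createParser(numbersFile)) {
    JsonToken token;
    while ((token = parser.nextToken()) != null) {
        // Look for the "numbers" field, then consume its array
        if (token == JsonToken.FIELD_NAME && "numbers".equals(parser.getCurrentName())) {
            parser.nextToken(); // Position cursor at START_ARRAY
            while (parser.nextToken() != JsonToken.END_ARRAY) {
                total += parser.getValueAsInt();
            }
        }
    }
}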
We built a new way to process JSON
<dependency>
    <groupId>com.sdl.lt.lc.json.streaming</groupId>
    <artifactId>json-streaming-processor</artifactId>
    <version>0.0.1</version>
</dependency>
ReadJsonProcessor processor = JsonProcessorBuilder.initProcessor(numbersFile);
PathMatcher pathMatcher = PathMatcherBuilder.builder()
        .field("numbers").startArray()
        .build();
Iterator<Integer> numbersIterator = processor.readValues(pathMatcher, Integer.class);
long total = 0;
while (numbersIterator.hasNext()) {
    total += numbersIterator.next();
}
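The value of returning an Iterator is that elements are produced on demand rather than collected into a list first. The sketch below shows that idea with JDK classes only; it is not how the library is implemented. LazyIntIterator is a hypothetical class, and it assumes the input is just a flat array of integers, which it reads through a java.util.Scanner with a bounded internal buffer.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.Scanner;

/** Lazily yields integers from a character stream, one per call to next(). */
final class LazyIntIterator implements Iterator<Integer> {
    private final Scanner scanner;

    LazyIntIterator(Reader source) {
        // Treat every run of non-numeric characters as a separator.
        this.scanner = new Scanner(source).useDelimiter("[^0-9-]+");
    }

    @Override
    public boolean hasNext() {
        return scanner.hasNextInt();
    }

    @Override
    public Integer next() {
        return scanner.nextInt();   // throws NoSuchElementException when exhausted
    }

    /** Sums all integers without ever holding them in a collection. */
    static long sum(Reader source) {
        Iterator<Integer> numbers = new LazyIntIterator(source);
        long total = 0;
        while (numbers.hasNext()) {
            total += numbers.next();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new StringReader("[123, 731, 421]")));
    }
}
```

The consuming loop has the same shape as the one on the slide: only the current element is in memory at any time.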
Rewrite JSON and add +1 to each number
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

JsonFactory jsonFactory = new JsonFactory();
// Closing the generator flushes buffered output (and, by default, closes outputStream)
try (JsonParser parser = jsonFactory.createParser(numbersFile);
     JsonGenerator generator = jsonFactory.createGenerator(outputStream)) {
    JsonToken token;
    while ((token = parser.nextToken()) != null) {
        generator.copyCurrentEvent(parser); // Copy the token unchanged
        if (token == JsonToken.FIELD_NAME && "numbers".equals(parser.getCurrentName())) {
            parser.nextToken(); // Position cursor at START_ARRAY
            generator.copyCurrentEvent(parser);
            while (parser.nextToken() != JsonToken.END_ARRAY) {
                generator.writeNumber(parser.getValueAsInt() + 1);
            }
            generator.copyCurrentEvent(parser); // Copy END_ARRAY
        }
    }
}
JsonProcessorBuilder builder = JsonProcessorBuilder.initBuilder(numbersFile, outputStream);
PathMatcher pathMatcher = PathMatcherBuilder.builder()
        .field("numbers").startArray()
        .build();
JsonElementTransformer plusOneEachNumber = builder.mapEach(pathMatcher, Integer.class, nr -> nr + 1);
JsonVisitor visitor = JsonVisitor.withTransformer(plusOneEachNumber);
VisitJsonProcessor processor = builder.build();
processor.visit(visitor);
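The underlying idea, copying the document token by token and applying a function only to the elements under one path, can be sketched with the JDK alone. The example below uses java.io.StreamTokenizer as a stand-in tokenizer; it is not the library's implementation. StreamingRewrite and its methods are hypothetical names, and the sketch assumes integer numbers, strings without escape sequences, and a flat target array. The output is written compactly, without the original whitespace.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.IntUnaryOperator;

/** Token-by-token JSON copy with a per-element mapping; illustration only. */
final class StreamingRewrite {
    static void mapArray(Reader in, Writer out, String field, IntUnaryOperator fn)
            throws IOException {
        StreamTokenizer st = new StreamTokenizer(in);
        boolean fieldSeen = false;   // the previous string token was the target field name
        boolean inArray = false;     // currently inside the target array
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == '"') {
                out.write("\"" + st.sval + "\"");
                fieldSeen = st.sval.equals(field);
            } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                int n = (int) st.nval;       // integers only in this sketch
                out.write(Integer.toString(inArray ? fn.applyAsInt(n) : n));
            } else if (st.ttype == StreamTokenizer.TT_WORD) {
                out.write(st.sval);          // true, false, null
            } else {
                if (st.ttype == '[' && fieldSeen) {
                    inArray = true;
                } else if (st.ttype == ']') {
                    inArray = false;
                }
                if (st.ttype != ':') {
                    fieldSeen = false;
                }
                out.write(st.ttype);         // structural character: { } [ ] , :
            }
        }
    }

    /** Convenience wrapper for in-memory strings. */
    static String rewrite(String json, String field, IntUnaryOperator fn) {
        StringWriter out = new StringWriter();
        try {
            mapArray(new StringReader(json), out, field, fn);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{\"action\":\"PLUS_ONE\",\"numbers\":[123, 731, 421]}";
        System.out.println(rewrite(json, "numbers", n -> n + 1));
    }
}
```

Because mapArray takes a Reader and a Writer, the same code can be pointed at files: nothing but the current token is kept in memory.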
Rewrite JSON and add +1 to each number
Bonus: retrieve username
{
  "action": "PLUS_ONE",
  "requester": {
    "id": "307d3a82",
    "username": "admin"
  },
  "numbers": [123, 731, ..., 421]
}
JsonProcessorBuilder builder = JsonProcessorBuilder.initBuilder(numbersFile, outputStream);
PathMatcher numbersPathMatcher = PathMatcherBuilder.builder()
        .field("numbers").startArray()
        .build();
PathMatcher usernamePathMatcher = PathMatcherBuilder.builder()
        .field("requester").field("username")
        .build();
AtomicReference<String> usernameValue = new AtomicReference<>();
JsonVisitor visitor = JsonVisitor.withTransformers(
        List.of(
                builder.mapEach(numbersPathMatcher, Integer.class, nr -> nr + 1),
                builder.peek(usernamePathMatcher, String.class, e -> usernameValue.set(e.getElement()))
        )
);
VisitJsonProcessor processor = builder.build();
processor.visit(visitor);
System.out.println(usernameValue.get());
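The AtomicReference is there because a Java lambda can only capture local variables that are effectively final, so a callback cannot assign to a plain local String. A mutable holder works around that. The sketch below isolates the pattern with JDK classes only; PeekCapture and peekEach are hypothetical stand-ins for a callback-driven API and are not the library's peek.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

final class PeekCapture {
    /** Calls the callback for each element and returns the elements unchanged. */
    static <T> List<T> peekEach(List<T> elements, Consumer<T> callback) {
        elements.forEach(callback);
        return elements;
    }

    /** Returns the last value the callback saw, or null for an empty list. */
    static String lastSeen(List<String> values) {
        // A plain local String could not be assigned from inside the lambda;
        // the holder itself is effectively final, only its content changes.
        AtomicReference<String> holder = new AtomicReference<>();
        peekEach(values, value -> holder.set(value));
        return holder.get();
    }

    public static void main(String[] args) {
        System.out.println(lastSeen(List.of("guest", "admin")));
    }
}
```

The holder is read only after processing has finished, which is the same order of operations as on the slide: visit first, then usernameValue.get().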
Performance
Summing all numbers (time in ms, by input size)
           10 MB   20 MB   40 MB   60 MB   100 MB
Jackson       54     104     352     366      904
Library       76     148     435     482     1498
Memory        81     155     589    1025     4868
+1 each number and rewrite JSON (time in ms, by input size)
           10 MB   20 MB   40 MB   60 MB   100 MB
Jackson       92     208     699     972     1789
Library      135     258     853    1324     2013
Memory       150     295    1067    3541    10747
When should you use this library?
You want to avoid building complex token-based logic
You can break your JSON into smaller processable units
You don't mind the deserialization penalty
You want to be up and running faster
Our results from processing JSON in a streaming way
No more out-of-memory errors
Can process JSON of ANY SIZE
Can process multiple files in parallel
Cheaper to run our services
https://github.com/RWS/json-streaming-processor

IT Days - Parse huge JSON files in a streaming way.pptx

  • 1. © 2022 RWS 1 A way to parse huge JSON files when the memory used to be a limitation Negruti Andrei
  • 2. © 2022 RWS 2 2 © 2022 RWS Why do we have to process huge JSON files?
  • 3. © 2022 RWS 3 3 © 2022 RWS Over 7,500 experts across 36 countries and a client base spanning Europe, North and South America and Asia Pacific Our unrivalled experience and deep understanding of language has been developed over more than 60 years Our global scale and experience We support 330+ language variants and translate 378+ billion words a year
  • 5. © 2022 RWS 5 5 © 2022 RWS Books to BCM’s (Bilingual Content Model)
  • 6. © 2022 RWS 6 • 48.922 words • Original file: 0.7 Mb • BCM (JSON): 2.9 Mb The Great Gatsby F. Scott Fitzgerald
  • 7. © 2022 RWS 7 • 105.204 words • Original file: 1.6 Mb • BCM (JSON): 4.8 Mb 1984 George Orwell
  • 8. © 2022 RWS 8 • 67.495 words • Original file: 3.2 Mb • BCM (JSON): 8.5 Mb The Clean Coder Robert E. Martin
  • 9. © 2022 RWS 9 • 572.298 words • Original file: 3.6 Mb • BCM (JSON): 33.2 Mb War and Peace Leo Tolstoy
  • 10. © 2022 RWS 10 • 561.317 words • Original file: 6.4 Mb • BCM (JSON): 53.5 Mb The Lord of The Rings (Entire trilogy) J.R.R. Tolkien
  • 11. © 2022 RWS 11 • 449.467 words • Original file: 14.1 Mb • BCM (JSON): 145.8 Mb Introduction to Algorithms Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein
  • 12. © 2022 RWS 12 • 2.565 words • Original file: 10.2 Mb • BCM (JSON): 1.3 Mb A way to parse huge JSON files when the memory used to be a limitation Negruti Andrei
  • 13. © 2022 RWS 13 13 © 2022 RWS Zoom into the Apply Machine Translation Step
  • 26. © 2022 RWS 26 26 © 2022 RWS Processing a JSON in a Streaming way
  • 27. © 2022 RWS 27 { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] }
  • 29. © 2022 RWS 29 JsonParser parser = new JsonFactory().createParser(input)
  • 31. © 2022 RWS 31 { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, 
..., 421] } { “action”: “SUM_NUMBERS”, “requester”: { “id”: “307d3a82”, “username”: “admin” }, “numbers”: [123, 731, ..., 421] } JsonToken.START_OBJECT JsonToken.FIELD_NAME JsonToken.VALUE_STRING JsonToken.FIELD_NAME JsonToken.START_OBJECT JsonToken.FIELD_NAME JsonToken.VALUE_STRING JsonToken.FIELD_NAME JsonToken.VALUE_STRING JsonToken.END_OBJECT JsonToken.FIELD_NAME JsonToken.START_ARRAY JsonToken.VALUE_NUMBER_INT JsonToken.VALUE_NUMBER_INT ... JsonToken.VALUE_NUMBER_INT JsonToken.END_ARRAY JsonToken.END_OBJECT null parser.nextToken()
  • 32. © 2022 RWS 32 JsonParser parser = new JsonFactory().createParser(numbersFile); JsonToken token = parser.nextToken(); long total = 0; while (token != null) { token = parser.nextToken(); if (JsonToken.FIELD_NAME.equals(token) && parser.getCurrentName().equals("numbers")) { parser.nextToken(); //Position cursor at START_ARRAY while (parser.nextToken() != JsonToken.END_ARRAY) { total += parser.getValueAsInt(); } } }
  • 33. © 2022 RWS 33 33 © 2022 RWS We built a new way to process JSON’s
  • 35. © 2022 RWS 35 ReadJsonProcessor processor = JsonProcessorBuilder.initProcessor(numbersFile); PathMatcher pathMatcher = PathMatcherBuilder.builder() .field("numbers").startArray() .build(); Iterator<Integer> numbersIterator = processor.readValues(pathMatcher, Integer.class); long total = 0; while (numbersIterator.hasNext()) { total += numbersIterator.next(); }
  • 36. Rewrite JSON and add +1 to each number
  • 37.
      JsonFactory jsonFactory = new JsonFactory();
      JsonParser parser = jsonFactory.createParser(numbersFile);
      JsonGenerator generator = jsonFactory.createGenerator(outputStream);
      JsonToken token = parser.nextToken();
      generator.copyCurrentEvent(parser);
      while (token != null) {
          token = parser.nextToken();
          if (token == null) {
              break;
          }
          generator.copyCurrentEvent(parser);
          if (JsonToken.FIELD_NAME.equals(token) && parser.getCurrentName().equals("numbers")) {
              parser.nextToken(); // Position cursor at START_ARRAY
              generator.copyCurrentEvent(parser);
              while (parser.nextToken() != JsonToken.END_ARRAY) {
                  generator.writeNumber(parser.getValueAsInt() + 1);
              }
              generator.copyCurrentEvent(parser); // Copy END_ARRAY
          }
      }
      generator.close(); // Flush buffered output to the stream
  • 38.
      JsonProcessorBuilder builder = JsonProcessorBuilder.initBuilder(numbersFile, outputStream);
      PathMatcher pathMatcher = PathMatcherBuilder.builder()
          .field("numbers").startArray()
          .build();
      JsonElementTransformer plusOneEachNumber = builder.mapEach(pathMatcher, Integer.class, nr -> nr + 1);
      JsonVisitor visitor = JsonVisitor.withTransformer(plusOneEachNumber);
      VisitJsonProcessor processor = builder.build();
      processor.visit(visitor);
  • 39. Rewrite JSON and add +1 to each number. Bonus: retrieve username
  • 40. { "action": "PLUS_ONE", "requester": { "id": "307d3a82", "username": "admin" }, "numbers": [123, 731, ..., 421] }
  • 41.
      JsonProcessorBuilder builder = JsonProcessorBuilder.initBuilder(numbersFile, outputStream);
      PathMatcher numbersPathMatcher = PathMatcherBuilder.builder()
          .field("numbers").startArray()
          .build();
      PathMatcher usernamePathMatcher = PathMatcherBuilder.builder()
          .field("requester").field("username")
          .build();
      AtomicReference<String> usernameValue = new AtomicReference<>();
      JsonVisitor visitor = JsonVisitor.withTransformers(
          List.of(
              builder.mapEach(numbersPathMatcher, Integer.class, nr -> nr + 1),
              builder.peek(usernamePathMatcher, String.class, e -> usernameValue.set(e.getElement()))
          )
      );
      VisitJsonProcessor processor = builder.build();
      processor.visit(visitor);
      System.out.println(usernameValue.get());
  • 42. Performance
  • 43. SUMMING ALL NUMBERS (chart: time by JSON file size)
      Size     Jackson    Library    Memory
      10MB     54 ms      76 ms      81 ms
      20MB     104 ms     148 ms     155 ms
      40MB     352 ms     435 ms     589 ms
      60MB     366 ms     482 ms     1025 ms
      100MB    904 ms     1498 ms    4868 ms
  • 44. +1 EACH NUMBER AND REWRITE JSON (chart: time by JSON file size)
      Size     Jackson    Library    Memory
      10MB     92 ms      135 ms     150 ms
      20MB     208 ms     258 ms     295 ms
      40MB     699 ms     853 ms     1067 ms
      60MB     972 ms     1324 ms    3541 ms
      100MB    1789 ms    2013 ms    10747 ms
  • 45. When should you use this library?
  • 46. You want to avoid building complex token-based logic
  • 47. You can break your JSON into smaller processable units
  • 48. You don’t mind the deserialization penalty
  • 49. You want to be up and running faster
  • 50. Our results from processing JSON in a streamed way
  • 51. No more out of memory errors
  • 52. Can process JSON files of ANY SIZE
  • 53. Can process multiple files in parallel
  • 54. Cheaper to run our services
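
Slides 47 and 48 describe breaking a JSON into smaller processable units and deserializing one unit at a time. A minimal sketch of that idea using plain Jackson (jackson-core plus jackson-databind), not the library's own API, might look like the following. The class name ParagraphStreamer and the field names "paragraphs" and "text" are assumptions made for illustration; the slides do not show the real BCM layout.

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.util.Map;

// Hypothetical example class: streams a document and binds one array element at a time.
final class ParagraphStreamer {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /** Counts words over all paragraphs while holding only one paragraph in memory. */
    static long countWords(String json) throws IOException {
        long words = 0;
        try (JsonParser parser = MAPPER.getFactory().createParser(json)) {
            while (parser.nextToken() != null) {
                boolean atParagraphs = parser.currentToken() == JsonToken.FIELD_NAME
                        && "paragraphs".equals(parser.currentName()); // assumed field name
                if (!atParagraphs || parser.nextToken() != JsonToken.START_ARRAY) {
                    continue;
                }
                // Each START_OBJECT inside the array is one processable unit.
                while (parser.nextToken() == JsonToken.START_OBJECT) {
                    // readValue consumes exactly one object and stops at its END_OBJECT.
                    Map<?, ?> paragraph = MAPPER.readValue(parser, Map.class);
                    Object text = paragraph.get("text"); // assumed field name
                    if (text instanceof String && !((String) text).trim().isEmpty()) {
                        words += ((String) text).trim().split("\\s+").length;
                    }
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"id\":\"doc\",\"paragraphs\":["
                + "{\"text\":\"Hello big world\"},{\"text\":\"Two words\"}]}";
        System.out.println(countWords(json));
    }
}
```

The deserialization penalty from slide 48 shows up here as one readValue call per paragraph, so the cost grows with the number of units rather than with the size of the whole file.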

Editor's notes

  1. Hello and welcome. This talk is about a production issue we fixed with the help of an open source library we built.
  2. Let’s start from the top: why do we even have to process huge JSON files? By "process" I mean actually manipulating these files, not only storing them for import/export flows or analytics.
  3. At RWS we do translations, a lot of them, for a lot of clients. In our system, whatever the customer uploads turns into a JSON file that we call a BCM (Bilingual Content Model).
  4. What does a simple translation flow look like? Upload the files. Convert them to BCMs (JSON files that hold both the original file details and the resulting translated document details). Apply machine translation. Maybe someone then rewrites parts of the machine-translated text. Finally, we convert this JSON back to the original file type with the translated text.
  5. To give you a sense of how big these files can get, I will quickly show some books and tell you the BCM size for each of them. Keep in mind this is before applying any translation, which will of course make the files even bigger.
  6. About 50 thousand words gets you to 3 MB.
  7. About 105 thousand words gets you to 5 MB.
  8. Couldn’t skip Uncle Bob’s book. About 68 thousand words gets you to an 8.5 MB JSON.
  9. Let’s jump to one of the longest novels. About 570 thousand words gives us a 33 MB JSON.
  10. The entire Lord of the Rings trilogy has slightly fewer words than the previous book, yet gives a 53 MB JSON.
  11. Finally, Introduction to Algorithms: about 450 thousand words and almost 150 MB of JSON.
  12. And just to introduce an Inception moment: this presentation has about 2.5k words including notes, and the JSON we will process to translate it is a little over 1 MB.
  13. Now that you have a sense of how big some of our BCMs can be, let’s zoom into the place where we first noticed problems.
  14. First, we receive a message that we have to apply translation to a file.
  15. Then we download the file into memory.
  16. We remove the already translated paragraphs.
  17. We send the file to translation.
  18. Sending the file to translation might give us some problems, but we had more problems when merging the translation back into the original file. The merge is triggered by another message telling us that the translation for a given file is done.
  19. First we download the original file again.
  20. Then we download the translation.
  21. When we have both files in memory, we merge them together.
  22. Then we upload the BCM to be used by other services. Where’s the problem?
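  23. Go back to the merging. It’s here that we need two JSON documents in memory at the same time to be able to merge them. Let me show you how much memory this might use.
  24. What happens if we try to translate Introduction to Algorithms? The file without any translation is 150 MB. Assume the translation doubles the file size, just to make the math easier. In memory those same files can be a lot bigger; with one larger JSON that I tested, it was 3 times bigger. Assuming the same multiplier, (150 MB + 300 MB) × 3 results in about 1.3 GB of memory being used. This is on top of everything else running on the service.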
  25. What’s the immediate fix? Pay for more memory, or reduce your throughput. Neither is a long-term solution.
  26. This is where we started working on a different approach. Working in the Java ecosystem, we looked around for a solution that fits our needs, and this is where I found out about Jackson’s streaming capabilities.
  27. Say we have this JSON: a request from a user specifying an action to perform on a list of numbers. The JSON is too big to deserialize in memory. How do we sum all the numbers with limited memory, using Jackson?
  28. First we add the Jackson dependency to our project.
  29. Then we create a JsonParser. To do this we need an input source from which the JSON will be read; let’s assume a file.
  30. From here on, parser.nextToken() is the method we will use the most to process this JSON.
  31. If we keep calling nextToken(), we get the tokens shown here one at a time, ending with null. To sum the numbers you need a while loop that finds the "numbers" field, then iterates over all the numbers and adds them to a total. Doing this ensures that at most a few KB of memory are in use, for the bytes that Jackson buffers for better performance.
  32. This is what a version of the code might look like. We managed to sum the numbers. The only issue is that this is the logic needed even for our simple example; imagine what your own logic would look like when mapped to tokens. Performance-wise this is the best option, but our use cases are too complex to implement with tokens.
  33. We built a new library, backed by Jackson but without the token logic. Let’s try again to sum all the numbers with our new library.
  34. As of last night you can import the library into your own project.
  35. We initialize a processor to read from our file and define the path where the list of numbers can be found. Then, using the processor and that path, we say that we expect to find Integers at the given path and want to read them all. Using the iterator we can quickly sum all the numbers.
  36. Most of our operations on BCMs require us to rewrite the file and apply different logic while we do that. We made sure that the library we wrote lets us do that as well. Let’s look at another exercise and see how we can do it in a streamed way.
  37. I won’t bother you with how to do the rewrite and +1 using Jackson, because it’s a lot. I’m just showing you a snippet of how one might do it in over 20 lines of code.
  38. Using our library this is a lot easier. We create a processor builder and initialize it with the input source pointing to the numbers file and an output stream where we write our JSON. We define the numbers path again. We create what we call a transformer, which we tell to look at the numbers array path, expect to find Integers, and apply the given function. Then we create a visitor using this transformer, build our visiting processor, and call its visit method. This iterates over the JSON and applies all the specified transformers while rewriting the entire JSON to the OutputStream.
  39. Let’s complicate the problem even more and say we also want to extract the username from the JSON.
  40. Remember, we had a requester field holding an object that includes the username.
  41. Using our library we can quickly achieve this. Define our path matchers. Define an object in which to store our username value. Define our JSON visitor, specifying the two transformers. Then we visit the JSON. At the end we have the username value stored and can do whatever we want with it.
  42. How does our library compare with Jackson and with processing files in memory?
  43. I ran some benchmark tests to give you a glimpse of how Jackson, our library, and in-memory processing compare. As you can see, from 10 MB to 100 MB JSON files, if we want to sum the numbers Jackson is the fastest, followed closely by our library. Starting from about 60 MB, processing in memory becomes much slower. I don’t have a graph of how much memory is used, but it is KB versus MB. For my numbers JSON I only used 3-digit numbers. Three digits on disk take 3 bytes; the same value as an int in Java takes 4 bytes, and as a boxed Integer object it typically takes several times that once the object header and the reference to it are counted. Each number is several times bigger in memory than on disk.
  44. Comparing performance for rewriting JSON tells a similar story. In these benchmarks, doing the work in memory was slower than doing it in a streamed way at every file size.
  45. If you know your JSON files are too big to hold in memory, when is it a good idea to use our library over Jackson?
  46. This is a good telltale sign that you should be using our library.
  47. I’ve only shown you deserialization to Integers and Strings, but with our library you can deserialize to an entire object. Split your logic into smaller processable units.
  48. As I mentioned on the previous slide, you can deserialize to any object and process in smaller units. Deserialization has its performance penalty, so you have to check how many deserializations you end up doing.
  49. Even for the simplest logic, building it with our library will be faster.
  50. For now we use the library in 4 distinct flows. What are our own results from processing files with the library we built? For our BCMs the processable units are paragraphs, and we were able to do everything we needed by deserializing only one paragraph at a time.
  51. No matter the size of a BCM, a single paragraph can’t be so big that we can’t hold it in memory.
  52. We are back to processing multiple files in parallel, since we use less memory.
  53. We simply don’t need as much memory allocated to these services, because we don’t use as much now.
  54. On GitHub you can find the source code, all the other transformers you can use, and tests that show how they all work. I’ve got a list of refactorings, new APIs, and tests to add, so if you like the library please follow the progress there. I would love to receive any feedback on it. If you have a use case where you think this library might work for you, try it or contact me. If you use a programming language other than Java and need the same functionality, contact me as well, because we can make something work.
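
The memory reasoning in notes 24 and 43 can be sketched as a small calculation. This is only an illustration of the arithmetic under the assumptions stated in the notes (translation doubles the file, in-memory size is 3 times the on-disk size); the class and method names are hypothetical and the multiplier is the speaker's single observation, not a general rule.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that reproduces the back-of-the-envelope numbers from the notes.
final class FootprintEstimate {

    /**
     * Heap needed to hold the original and the translated file at the same time, in MB.
     * translationGrowth = translated size / original size (2.0 in note 24).
     * inMemoryMultiplier = in-memory size / on-disk size (3.0 in note 24).
     */
    static double mergeHeapMb(double originalMb, double translationGrowth, double inMemoryMultiplier) {
        double translatedMb = originalMb * translationGrowth;
        return (originalMb + translatedMb) * inMemoryMultiplier;
    }

    /** Bytes a number occupies on disk when written as ASCII digits in JSON. */
    static int asciiBytes(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        // Note 24: Introduction to Algorithms, 150 MB before translation.
        System.out.println("Merge heap estimate (MB): " + mergeHeapMb(150, 2.0, 3.0));
        // Note 43: a 3-digit number on disk versus a primitive int in memory.
        System.out.println("Bytes on disk for 123: " + asciiBytes(123));
        System.out.println("Bytes for a primitive int: " + Integer.BYTES);
    }
}
```

A boxed Integer is larger than the 4 bytes of a primitive int, and its exact size depends on the JVM, so the sketch leaves it out rather than guess a number.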