SlideShare une entreprise Scribd logo
1  sur  43
Let’s Get to the Rapids
Understanding Java 8 Stream Performance
QCon NewYork
June 2015
@mauricenaftalin
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/java8-stream-performance
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
@mauricenaftalin
Maurice Naftalin
Developer, designer, architect, teacher, learner, writer
Repeat offender:
Maurice Naftalin
Java 5 Java 8
@mauricenaftalin
The Lambda FAQ
www.lambdafaq.org
Agenda
– Background
– Java 8 Streams
– Parallelism
– Microbenchmarking
– Case study
– Conclusions
Streams – Why?
• Bring functional style to Java
• Exploit hardware parallelism – “explicit but unobtrusive”
Streams – Why?
• Intention: replace loops for aggregate operations
List<Person> people = …
Set<City> shortCities = new HashSet<>();

for (Person p : people) {
City c = p.getCity();
if (c.getName().length() < 4 ) {
shortCities.add(c);
}
}
instead of writing this:
7
Streams – Why?
• Intention: replace loops for aggregate operations
• more concise, more readable, composable operations, parallelizable
Set<City> shortCities = new HashSet<>();

for (Person p : people) {
City c = p.getCity();
if (c.getName().length() < 4 ) {
shortCities.add(c);
}
}
instead of writing this:
List<Person> people = …
Set<City> shortCities = people.stream()
.map(Person::getCity)

.filter(c -> c.getName().length() < 4)
.collect(toSet());
8
we’re going to write this:
Streams – Why?
• Intention: replace loops for aggregate operations
• more concise, more readable, composable operations, parallelizable
Set<City> shortCities = new HashSet<>();

for (Person p : people) {
City c = p.getCity();
if (c.getName().length() < 4 ) {
shortCities.add(c);
}
}
instead of writing this:
List<Person> people = …
Set<City> shortCities = people.parallelStream()
.map(Person::getCity)

.filter(c -> c.getName().length() < 4)
.collect(toSet());
9
we’re going to write this:
@mauricenaftalin
x0x1 x0x2x3 y0y1
Intermediate Op(s)
(Mutable)
ReductionSpliterator
Visualizing Stream Operations
x1
Practical Benefits of Streams?
Functional style will affect (nearly) all collection processing
Automatic parallelism is useful, in certain situations
- but everyone cares about performance!
Parallelism – Why?
The Free Lunch Is Over
http://www.gotw.ca/publications/concurrency-ddj.htm
Intel Xeon E5 2600 10-core
@mauricenaftalin
x2
x0
x1
x3
x0
x1
x2
x3
y0
y1
y2
y3
Intermediate Op(s)
(Mutable)
Reduction
Spliterator
Visualizing Stream Operations
What to Measure?
How do code changes affect system performance?
Controlled experiment, production conditions
- difficult!
So: controlled experiment, lab conditions
- beware the substitution effect!
Microbenchmarking
Really hard to get meaningful results from a dynamic runtime:
– timing methods are flawed
– System.currentTimeMillis() and System.nanoTime()
– compilation can occur at any time
– garbage collection interferes
– runtime optimizes code after profiling it for some time
– then may deoptimize it
– optimizations include dead code elimination
Microbenchmarking
Don’t try to eliminate these effects yourself!
Use a benchmarking library
– Caliper
– JMH (Java Benchmarking Harness)
Ensure your results are statistically meaningful
Get your benchmarks peer-reviewed
Case Study: grep -b
The Moving Finger writes; and, having writ,
Moves on: nor all thy Piety nor Wit
Shall bring it back to cancel half a Line
Nor all thy Tears wash out a Word of it.
rubai51.txt
grep -b:
“The offset in bytes of a matched pattern
is displayed in front of the matched line.”
$ grep -b 'W.*t' rubai51.txt
44:Moves on: nor all thy Piety nor Wit
122:Nor all thy Tears wash out a Word of it.
Because we don’t have a problem
Why Shouldn’t We Optimize Code?
Why Shouldn’t We Optimize Code?
Because we don’t have a problem
- No performance target!
Because we don’t have a problem
- No performance target!
Else there is a problem, but not in our process
Why Shouldn’t We Optimize Code?
Because we don’t have a problem
- No performance target!
Else there is a problem, but not in our process
- The OS is struggling!
Why Shouldn’t We Optimize Code?
Because we don’t have a problem
- No performance target!
Else there is a problem, but not in our process
- The OS is struggling!
Else there’s a problem in our process, but not in the code
Why Shouldn’t We Optimize Code?
Because we don’t have a problem
- No performance target!
Else there is a problem, but not in our process
- The OS is struggling!
Else there’s a problem in our process, but not in the code
- GC is using all the cycles!
Why Shouldn’t We Optimize Code?
Because we don’t have a problem
- No performance target!
Else there is a problem, but not in our process
- The OS is struggling!
Else there’s a problem in our process, but not in the code
- GC is using all the cycles!
Why Shouldn’t We Optimize Code?
Else there’s a problem in the code… somewhere
- now we can consider optimising!
41 122Nor …Moves … 36 44
grep -b: Collector combiner
0The … 44[ , 42 80Shall … ]
41 42Nor …Moves … 36 440The … 44[ , 42 0Shall … ]
, ,
,] [
Moves … 36 0 41 0Nor …42 0Shall …0The … 44
grep -b: Collector accumulator
44 0The moving … writ,
“Moves on: … Wit”
44 0The moving … writ, 36 44Moves on: … Wit
][
][ ,
[ ]
Supplier “The moving … writ,”
accumulator
accumulator
41 122Nor …Moves … 36 44
grep -b: Collector solution
0The … 44[ , 42 80Shall … ]
41 42Nor …Moves … 36 440The … 44[ , 42 0Shall … ]
, ,
,] [
80
What’s wrong?
• Possibly very little
- overall performance comparable to Unix grep -b
• Can we improve it by going parallel?
Serial vs. Parallel
• The problem is a prefix sum – every element contains the
sum of the preceding ones.
- Combiner is O(n)
• The source is streaming IO (BufferedReader.lines())
• Amdahl’s Law strikes:
A Parallel Solution for grep -b
Need to get rid of streaming IO – inherently serial
Parallel streams need splittable sources
Stream Sources
Implemented by a Spliterator
Moves …Wit
LineSpliterator
The moving Finger … writ n Shall … Line Nor all thy … itn n n
spliterator coverage
new spliterator coverage
MappedByteBuffer mid
Parallelizing grep -b
• Splitting action of LineSpliterator is O(log n)
• Collector no longer needs to compute index
• Result (relatively independent of data size):
- sequential stream ~2x as fast as iterative solution
- parallel stream >2.5x as fast as sequential stream
- on 4 hardware threads
When to go Parallel
The workload of the intermediate operations must be great
enough to outweigh the overheads (~100µs):
– initializing the fork/join framework
– splitting
– concurrent collection
Often quoted as N x Q
size of data set processing cost per element
Intermediate Operations
Parallel-unfriendly intermediate operations:
stateful ones
– need to store some or all of the stream data in memory
– sorted()
those requiring ordering
– limit()
Collectors Cost Extra!
Depends on the performance of accumulator and
combiner functions
• toList(), toSet(), toCollection() – performance
normally dominated by accumulator
• but allow for the overhead of managing multithread access to non-
threadsafe containers for the combine operation
• toMap(), toConcurrentMap() – map merging is slow.
Resizing maps, especially concurrent maps, is very expensive.
Whenever possible, presize all data structures, maps in
particular.
Threads for executing parallel streams are (all but one) drawn
from the common Fork/Join pool
• Intermediate operations that block (for example on I/O) will
prevent pool threads from servicing other requests
• Fork/Join pool assumes by default that it can use all cores
– Maybe other thread pools (or other processes) are running?
Parallel Streams in the Real World
Performance mostly doesn’t matter
But if you must…
• sequential streams normally beat iterative solutions
• parallel streams can utilize all cores, providing
- the data is efficiently splittable
- the intermediate operations are sufficiently expensive and are
CPU-bound
- there isn’t contention for the processors
Conclusions
@mauricenaftalin
Resources
http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html
http://shipilev.net/talks/devoxx-Nov2013-benchmarking.pdf
http://openjdk.java.net/projects/code-tools/jmh/
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/java8-
stream-performance

Contenu connexe

Plus de C4Media

Plus de C4Media (20)

High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Let’s Get to the Rapids: Java 8 Stream Performance

  • 1. Let’s Get to the Rapids Understanding Java 8 Stream Performance QCon NewYork June 2015 @mauricenaftalin
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /java8-stream-performance
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. @mauricenaftalin Maurice Naftalin Developer, designer, architect, teacher, learner, writer
  • 7. Agenda – Background – Java 8 Streams – Parallelism – Microbenchmarking – Case study – Conclusions
  • 8. Streams – Why? • Bring functional style to Java • Exploit hardware parallelism – “explicit but unobtrusive”
  • 9. Streams – Why? • Intention: replace loops for aggregate operations List<Person> people = … Set<City> shortCities = new HashSet<>();
 for (Person p : people) { City c = p.getCity(); if (c.getName().length() < 4 ) { shortCities.add(c); } } instead of writing this: 7
  • 10. Streams – Why? • Intention: replace loops for aggregate operations • more concise, more readable, composable operations, parallelizable Set<City> shortCities = new HashSet<>();
 for (Person p : people) { City c = p.getCity(); if (c.getName().length() < 4 ) { shortCities.add(c); } } instead of writing this: List<Person> people = … Set<City> shortCities = people.stream() .map(Person::getCity)
 .filter(c -> c.getName().length() < 4) .collect(toSet()); 8 we’re going to write this:
  • 11. Streams – Why? • Intention: replace loops for aggregate operations • more concise, more readable, composable operations, parallelizable Set<City> shortCities = new HashSet<>();
 for (Person p : people) { City c = p.getCity(); if (c.getName().length() < 4 ) { shortCities.add(c); } } instead of writing this: List<Person> people = … Set<City> shortCities = people.parallelStream() .map(Person::getCity)
 .filter(c -> c.getName().length() < 4) .collect(toSet()); 9 we’re going to write this:
  • 12. @mauricenaftalin x0x1 x0x2x3 y0y1 Intermediate Op(s) (Mutable) ReductionSpliterator Visualizing Stream Operations x1
  • 13. Practical Benefits of Streams? Functional style will affect (nearly) all collection processing Automatic parallelism is useful, in certain situations - but everyone cares about performance!
  • 14. Parallelism – Why? The Free Lunch Is Over http://www.gotw.ca/publications/concurrency-ddj.htm
  • 15. Intel Xeon E5 2600 10-core
  • 17. What to Measure? How do code changes affect system performance? Controlled experiment, production conditions - difficult! So: controlled experiment, lab conditions - beware the substitution effect!
  • 18. Microbenchmarking Really hard to get meaningful results from a dynamic runtime: – timing methods are flawed – System.currentTimeMillis() and System.nanoTime() – compilation can occur at any time – garbage collection interferes – runtime optimizes code after profiling it for some time – then may deoptimize it – optimizations include dead code elimination
  • 19. Microbenchmarking Don’t try to eliminate these effects yourself! Use a benchmarking library – Caliper – JMH (Java Benchmarking Harness) Ensure your results are statistically meaningful Get your benchmarks peer-reviewed
  • 20. Case Study: grep -b The Moving Finger writes; and, having writ, Moves on: nor all thy Piety nor Wit Shall bring it back to cancel half a Line Nor all thy Tears wash out a Word of it. rubai51.txt grep -b: “The offset in bytes of a matched pattern is displayed in front of the matched line.” $ grep -b 'W.*t' rubai51.txt 44:Moves on: nor all thy Piety nor Wit 122:Nor all thy Tears wash out a Word of it.
  • 21. Because we don’t have a problem Why Shouldn’t We Optimize Code?
  • 22. Why Shouldn’t We Optimize Code? Because we don’t have a problem - No performance target!
  • 23. Because we don’t have a problem - No performance target! Else there is a problem, but not in our process Why Shouldn’t We Optimize Code?
  • 24. Because we don’t have a problem - No performance target! Else there is a problem, but not in our process - The OS is struggling! Why Shouldn’t We Optimize Code?
  • 25. Because we don’t have a problem - No performance target! Else there is a problem, but not in our process - The OS is struggling! Else there’s a problem in our process, but not in the code Why Shouldn’t We Optimize Code?
  • 26. Because we don’t have a problem - No performance target! Else there is a problem, but not in our process - The OS is struggling! Else there’s a problem in our process, but not in the code - GC is using all the cycles! Why Shouldn’t We Optimize Code?
  • 27. Because we don’t have a problem - No performance target! Else there is a problem, but not in our process - The OS is struggling! Else there’s a problem in our process, but not in the code - GC is using all the cycles! Why Shouldn’t We Optimize Code? Else there’s a problem in the code… somewhere - now we can consider optimising!
  • 28. 41 122Nor …Moves … 36 44 grep -b: Collector combiner 0The … 44[ , 42 80Shall … ] 41 42Nor …Moves … 36 440The … 44[ , 42 0Shall … ] , , ,] [ Moves … 36 0 41 0Nor …42 0Shall …0The … 44
  • 29. grep -b: Collector accumulator 44 0The moving … writ, “Moves on: … Wit” 44 0The moving … writ, 36 44Moves on: … Wit ][ ][ , [ ] Supplier “The moving … writ,” accumulator accumulator
  • 30. 41 122Nor …Moves … 36 44 grep -b: Collector solution 0The … 44[ , 42 80Shall … ] 41 42Nor …Moves … 36 440The … 44[ , 42 0Shall … ] , , ,] [ 80
  • 31. What’s wrong? • Possibly very little - overall performance comparable to Unix grep -b • Can we improve it by going parallel?
  • 32. Serial vs. Parallel • The problem is a prefix sum – every element contains the sum of the preceding ones. - Combiner is O(n) • The source is streaming IO (BufferedReader.lines()) • Amdahl’s Law strikes:
  • 33. A Parallel Solution for grep -b Need to get rid of streaming IO – inherently serial Parallel streams need splittable sources
  • 35. Moves …Wit LineSpliterator The moving Finger … writ n Shall … Line Nor all thy … itn n n spliterator coverage new spliterator coverage MappedByteBuffer mid
  • 36. Parallelizing grep -b • Splitting action of LineSpliterator is O(log n) • Collector no longer needs to compute index • Result (relatively independent of data size): - sequential stream ~2x as fast as iterative solution - parallel stream >2.5x as fast as sequential stream - on 4 hardware threads
  • 37. When to go Parallel The workload of the intermediate operations must be great enough to outweigh the overheads (~100µs): – initializing the fork/join framework – splitting – concurrent collection Often quoted as N x Q size of data set processing cost per element
  • 38. Intermediate Operations Parallel-unfriendly intermediate operations: stateful ones – need to store some or all of the stream data in memory – sorted() those requiring ordering – limit()
  • 39. Collectors Cost Extra! Depends on the performance of accumulator and combiner functions • toList(), toSet(), toCollection() – performance normally dominated by accumulator • but allow for the overhead of managing multithread access to non- threadsafe containers for the combine operation • toMap(), toConcurrentMap() – map merging is slow. Resizing maps, especially concurrent maps, is very expensive. Whenever possible, presize all data structures, maps in particular.
  • 40. Threads for executing parallel streams are (all but one) drawn from the common Fork/Join pool • Intermediate operations that block (for example on I/O) will prevent pool threads from servicing other requests • Fork/Join pool assumes by default that it can use all cores – Maybe other thread pools (or other processes) are running? Parallel Streams in the Real World
  • 41. Performance mostly doesn’t matter But if you must… • sequential streams normally beat iterative solutions • parallel streams can utilize all cores, providing - the data is efficiently splittable - the intermediate operations are sufficiently expensive and are CPU-bound - there isn’t contention for the processors Conclusions
  • 43. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/java8- stream-performance