The slides from the talk I gave at Oracle III #JuevesTecnológicos in Madrid.
A review of how the ParallelStreams Work in Java 8 and some considerations we must know in order to get the better performance from the concurrent data processing in #Java8
2. Do you remember?
use stream()
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
Thread.activeCount());
}
4999299 elements computed in 225 msecs with 9 threads
4999299 elements computed in 230 msecs with 9 threads
4999299 elements computed in 250 msecs with 9 threads
@dgomezg
5. A Stream is…
An convenience method to iterate over
collections in a declarative way
List<Integer> numbers = new ArrayList<Integer>();
for (int i= 0; i < 100 ; i++) {
numbers.add(i);
}
List<Integer> evenNumbers = numbers.stream()
.filter(n -> n % 2 == 0)
.collect(toList());
@dgomezg
6. Anatomy of a Stream
Source
Intermediate
Operations
filter
map
order
function
Final
operation
pipeline
@dgomezg
7. Iterating a Stream
List<Integer> evenNumbers = numbers.stream()
.filter(n -> n % 2 == 0)
.collect(toList());
Internal Iteration
- No manual Iterators handling
- Concise
- Fluent API: chain sequence processing
Elements computed only when needed
@dgomezg
8. Iterating a Stream
List<Integer> evenNumbers = numbers.parallelStream()
.filter(n -> n % 2 == 0)
.collect(toList());
Easily Parallelism
- Concurrency is hard to be done right!
- Uses ForkJoin
- Process steps should be
- stateless
- independent
@dgomezg
9. Parallel Streams
use stream()
List<Integer> numbers = new ArrayList<>();
for (int i= 0; i < 10_000_000 ; i++) {
numbers.add((int)Math.round(Math.random()*100));
}
//This will use just a single thread
Stream<Integer> evenNumbers = numbers.stream();
or parallelStream()
//Automatically select the optimum number of threads
Stream<Integer> evenNumbers = numbers.parallelStream();
@dgomezg
10. Let’s test it
use stream()
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.stream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
Thread.activeCount());
}
5001983 elements computed in 828 msecs with 2 threads
5001983 elements computed in 843 msecs with 2 threads
5001983 elements computed in 675 msecs with 2 threads
5001983 elements computed in 795 msecs with 2 threads
@dgomezg
11. Going parallel
use stream()
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
Thread.activeCount());
}
4999299 elements computed in 225 msecs with 9 threads
4999299 elements computed in 230 msecs with 9 threads
4999299 elements computed in 250 msecs with 9 threads
@dgomezg
14. Fork/Join Framework
Proposed by Doug Lea
"a style of parallel programming in
which problems are solved by
(recursively) splitting them into
subtasks that are solved in parallel."
Available in Java 7
Used by ParallelStreams
15. The F/J algorithm
Result solve(Problem problem)
{
if (problem is small)
directly solve problem
else
{
split problem into independent parts
fork new subtasks to solve each part
join all subtasks
compose result from subresults
}
}
as proposed by Doug Lea
17. ForkJoinTask
Abstract class that represents a task to be run
concurrently
Every ForkJoinTask could be splitted (if not small
enough) and solved Recursively
Two concrete implementations
• RecursiveAction if not returning value
• RecursiveTask if returning a value
18. ForkJoinWorkerThread
Any of the threads created by the ForkJoinPool
Executes ForkJoinTasks
Everyone has a Dequeue for tasks (allows task
stealing)
19. ForkJoinWorkerThread
Result solve(Problem problem)
{
if (problem is small)
directly solve problem
else
{
split problem into independent parts
fork new subtasks to solve each part
join all subtasks
compose result from subresults
}
}
the F/J algorithm
plus Task Stealing.
20. Fork/Join. When to use?
For computations that could be splitted into smaller
tasks
aka ‘divide and conquer’ algorithms
Independent
Reduction with no contention.
22. ParallellStreams
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
Thread.activeCount());
}
4999299 elements computed in 225 msecs with 9 threads
4999299 elements computed in 230 msecs with 9 threads
4999299 elements computed in 250 msecs with 9 threads
23. Thread.activeCount not accurate
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
Thread.activeCount());
}
Thread.activeCount() does not show the effective
number of threads processing the stream
24. Better count threads involved
Set<String> workerThreadNames = new ConcurrentSet<>();
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.stream()
.filter(n -> n % 2 == 0)
.peek(n -> workerThreadNames.add(
Thread.currentThread().getName()))
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
workerThreadNames.size());
}
25. Threads usage
ParallelStreams use the common ForkJoinPool
Number of worker threads configured with
-‐Djava.util.concurrent.ForkJoinPool.common.parallelism=n
Useful to keep CPU parallelism under control…
…but …
26. Limiting parallelism
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.stream()
.filter(n -> n % 2 == 0)
.peek(n -> workerThreadNames.add(
Thread.currentThread().getName()))
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
workerThreadNames.size());
}
-‐Djava.util.concurrent.ForkJoinPool.common.parallelism=4
5001069 elements computed in 269 msecs with 5 threads
WTF
27. Limiting parallelism
for (int i = 0; i < 100; i++) {
long start = System.currentTimeMillis();
List<Integer> even = numbers.stream()
.filter(n -> n % 2 == 0)
.peek(n -> workerThreadNames.add(
Thread.currentThread().getName()))
.sorted()
.collect(toList());
System.out.printf(
"%d elements computed in %5d msecs with %d threadsn”,
even.size(), System.currentTimeMillis() - start,
workerThreadNames.size());
}
System.out.println("credits to threads: “
+ workerThreadNames);
5001069 elements computed in 269 msecs with 5 threads
credits to threads:
ForkJoinPool.commonPool-worker-0,
ForkJoinPool.commonPool-worker-1,
ForkJoinPool.commonPool-worker-2,
ForkJoinPool.commonPool-worker-3, main
WTF
28. Threads Involved in ParallelStream
ParallelStreams use the common ForkJoinPool
Thread invoking ParallelStream also used as
Worker
Caveats:
•ParallelStream processing is synchronous for
invoking thread
•Other Threads using common ForkJoinPool
could be affected
29. ParallelStream Hack
ParallelStream can be forced to use a custom
ForkJoinPool
ForkJoinPool forkJoinPool = new ForkJoinPool(4);
long start = System.currentTimeMillis();
numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
30. ParallelStream Hack
ParallelStream can be forced to use a custom
ForkJoinPool
ForkJoinPool forkJoinPool = new ForkJoinPool(4);
long start = System.currentTimeMillis();
ForkJoinTask<List<Integer>> task =
forkJoinPool.submit(() -> {
return numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
}
);
List<Integer> even = task.get();
31. ParallelStream Hack
ParallelStream can be forced to use a custom
ForkJoinPool
ForkJoinPool forkJoinPool = new ForkJoinPool(4);
ForkJoinTask<List<Integer>> task =
forkJoinPool.submit(() -> {
return numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
}
);
List<Integer> even = task.get();
Task submitted in 1 msecs
5000805 elements computed in 328 msecs with 4 threads
32. ParallelStream Hack benefits
A custom ExecutorService
• Does not affect other ParallelStreams
• Does not affect Common ForkJoinPool users
• Reduces unpredictable latency due to other
CommonForkJoin Pool load
• Invoking thread not used as worker (async
parallel process)
34. Blocking for IO
If firsts URLs stuck on a ConnectionTimeOut, overall
performance could be affected
Stream<String> urls =
Files.lines(Paths.get("urlsToCheck.txt"));
List<String> errors = urls.parallel().filter(url -> {
//Connect to URL and wait for 200 response or timeout
return true;
}).collect(toList());
35. Nested parallelStreams
Outer parallelStream could exhaust ForkJoin
Workers:
long start = System.currentTimeMillis();
IntStream.range(0, 10_000).parallel()
.forEach(i -> {
results[i][0] = (int) Math.round(Math.random() * 100);
IntStream.range(1, 9_999)
.parallel().forEach((int j) ->
results[i][j] =
(int) Math.round(Math.random() * 1000));
});
Process finalized in 22974 msecs
Process finalized in 22575 msecs
Process finalized in 22606 msecs
36. Nested parallelStreams
Outer parallelStream could exhaust ForkJoin
Workers:
long start = System.currentTimeMillis();
IntStream.range(0, 10_000).parallel()
.forEach(i -> {
results[i][0] = (int) Math.round(Math.random() * 100);
IntStream.range(1, 9_999)
.sequential().forEach((int j) ->
results[i][j] =
(int) Math.round(Math.random() * 1000));
});
Process finalized in 12491 msecs
Process finalized in 12589 msecs
Process finalized in 12798 msecs
38. Too much Auto(un)boxing
outboxing and boxing of Integers in every filter call
List<Integer> even = numbers.parallelStream()
.filter(n -> n % 2 == 0)
.sorted()
.collect(toList());
4999464 elements computed in 290 msecs with 8 threads
4999464 elements computed in 276 msecs with 8 threads
4999464 elements computed in 257 msecs with 8 threads
4999464 elements computed in 265 msecs with 8 threads
39. Less Auto(un)boxing
outboxing and boxing of Integers in every filter call
List<Integer> even = numbers.parallelStream()
.mapToInt(n -> n)
.filter(n -> n % 2 == 0)
.sorted()
.boxed()
.collect(toList());
4999460 elements computed in 160 msecs with 8 threads
4999460 elements computed in 243 msecs with 8 threads
4999460 elements computed in 144 msecs with 8 threads
4999460 elements computed in 140 msecs with 8 threads
41. Conclusions
ParallelStreams eases concurrent processing but:
• Understand how it works
• Don’t abuse the default common ForkJoinPool
• Don’t use when blocking by IO
• Or use a custom ForkJoinPool
• Avoid unnecessary autoboxing
• Don’t add contention or synchronisation
• Be careful with nested parallel streams
• Use method references when sorting