This document summarizes a presentation about functional programming and how functions like map(), flatMap(), and reduce() can simplify collection processing, concurrency, and big data problems. The presentation introduces functional programming concepts and how languages like Java 8 have adopted these with features like lambda expressions and streams. It provides examples of how to use streams to map, filter, and reduce collections in a more declarative way compared to imperative for loops. It also discusses how functions and futures can help simplify concurrent operations by allowing asynchronous work to be expressed more clearly.
1. @crichardson
Map(), flatMap() and reduce() are your
new best friends:
Simpler collections, concurrency, and big
data
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson
chris@chrisrichardson.net
http://plainoldobjects.com
4. @crichardson
About Chris
Founder of a buzzword compliant (stealthy, social, mobile,
big data, machine learning, ...) startup
Consultant helping organizations improve how they
architect and deploy applications using cloud, micro
services, polyglot applications, NoSQL, ...
10. @crichardson
Functions as first class
citizens
Assign functions to variables
Store functions in fields
Use and write higher-order functions:
Pass functions as arguments
Return functions as values
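A minimal Java 8 sketch of the points above. The class and method names (FirstClassFunctions, twice, adder) are made up for illustration and are not from the talk.

```java
import java.util.function.Function;

final class FirstClassFunctions {
    // A function stored in a field.
    static final Function<Integer, Integer> SQUARE = x -> x * x;

    // Higher-order function: takes a function as an argument
    // and returns a new function that applies it twice.
    static <T> Function<T, T> twice(Function<T, T> f) {
        return f.andThen(f);
    }

    // Returns a function as a value; the lambda captures n.
    static Function<Integer, Integer> adder(int n) {
        return x -> x + n;
    }

    public static void main(String[] args) {
        // Functions assigned to local variables.
        Function<Integer, Integer> addThree = adder(3);
        Function<Integer, Integer> fourthPower = twice(SQUARE);
        System.out.println(addThree.apply(4));     // 4 + 3
        System.out.println(fourthPower.apply(3));  // (3 * 3) squared
    }
}
```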
11. @crichardson
Avoids mutable state
Use:
Immutable data structures
Single assignment variables
Some functional languages such as Haskell don’t allow side-effects
There are benefits to immutability
Easier concurrency
More reliable code
But be pragmatic
13. @crichardson
Why functional programming?
More expressive
More concise
More intuitive - solution matches problem definition
Elimination of error-prone mutable state
Easy parallelization
16. @crichardson
Lisp = an early functional
language invented in 1958
http://en.wikipedia.org/wiki/Lisp_(programming_language)
[Timeline axis: 1940 to 2010]
garbage collection
dynamic typing
self-hosting compiler
tree data structures
(defun factorial (n)
(if (<= n 1)
1
(* n (factorial (- n 1)))))
17. @crichardson
My final year project in 1985:
Implementing SASL
sieve (p:xs) =
p : sieve [x | x <- xs, rem x p > 0];
primes = sieve [2..]
A list of integers starting with 2
Filter out multiples of p
18. Mostly an Ivory Tower
technology
Lisp was used for AI
FP languages: Miranda,
ML, Haskell, ...
“Side-effects
kills kittens and
puppies”
20. @crichardson
But today FP is mainstream
Clojure - a dialect of Lisp
Scala - a hybrid OO/functional language
F# - a hybrid OO/FP language for .NET
Java 8 has lambda expressions
21. @crichardson
Java 8 lambda expressions
are functions
x -> x * x
x -> {
for (int i = 2; i <= Math.sqrt(x); i = i + 1) {
if (x % i == 0)
return false;
}
return true;
};
(x, y) -> x * x + y * y
22. @crichardson
Java 8 lambdas are a
shorthand* for an anonymous
inner class
* not exactly. See http://programmers.stackexchange.com/questions/177879/type-inference-in-java-8
23. @crichardson
Java 8 functional interfaces
Interface with a single abstract method
e.g. Runnable, Callable, Spring’s TransactionCallback
A lambda expression is an instance of a functional
interface.
You can use a lambda wherever a functional interface
“value” is expected
The type of the lambda expression is determined from its
context
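To illustrate how context determines the type, here is a small sketch in which the same kind of lambda is assigned to several different functional interfaces. IntCheck and LambdaTargetTypes are hypothetical names introduced only for this example; IntPredicate, Supplier and Callable are standard JDK interfaces.

```java
import java.util.concurrent.Callable;
import java.util.function.IntPredicate;
import java.util.function.Supplier;

final class LambdaTargetTypes {
    // A hypothetical functional interface: exactly one abstract method.
    @FunctionalInterface
    interface IntCheck {
        boolean check(int value);
    }

    static boolean isPrime(int x) {
        if (x < 2) return false;
        for (int i = 2; (long) i * i <= x; i++) {
            if (x % i == 0) return false;
        }
        return true;
    }

    // Identical lambda text, two different types: the declared type decides.
    static final IntPredicate AS_PREDICATE = x -> isPrime(x);
    static final IntCheck AS_CUSTOM = x -> isPrime(x);

    // A zero-argument lambda can be a Supplier or a Callable, again by context.
    static final Supplier<Boolean> AS_SUPPLIER = () -> isPrime(997);
    static final Callable<Boolean> AS_CALLABLE = () -> isPrime(997);
}
```

A lambda has no type of its own, which is why each one needs a target such as a variable declaration, a method parameter, or a cast.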
24. @crichardson
Example Functional Interface
Function<Integer, Integer> square = x -> x * x;
BiFunction<Integer, Integer, Integer>
sumSquares = (x, y) -> x * x + y * y;
Predicate<Integer> makeIsDivisibleBy(int y) {
return x -> x % y == 0;
}
Predicate<Integer> isEven = makeIsDivisibleBy(2);
Assert.assertTrue(isEven.test(8));
Assert.assertFalse(isEven.test(11));
25. @crichardson
Example Functional Interface
ExecutorService executor = ...;
final int x = 999;
Future<Boolean> outcome = executor.submit(() -> {
for (int i = 2; i <= Math.sqrt(x); i = i + 1) {
if (x % i == 0)
return false;
}
return true;
});
This lambda is
a Callable
28. @crichardson
Social network example
public class Person {
enum Gender { MALE, FEMALE }
private Name name;
private LocalDate birthday;
private Gender gender;
private Hometown hometown;
private Set<Friend> friends = new HashSet<Friend>();
....
public class Friend {
private Person friend;
private LocalDate becameFriends;
...
}
public class SocialNetwork {
private Set<Person> people;
...
29. @crichardson
Mapping, filtering, and
reducing
public class Person {
public Set<Hometown> hometownsOfFriends() {
Set<Hometown> result = new HashSet<>();
for (Friend friend : friends) {
result.add(friend.getPerson().getHometown());
}
return result;
}
30. @crichardson
Mapping, filtering, and
reducing
public class Person {
public Set<Person> friendOfFriends() {
Set<Person> result = new HashSet<>();
for (Friend friend : friends)
for (Friend friendOfFriend : friend.getPerson().friends)
if (friendOfFriend.getPerson() != this)
result.add(friendOfFriend.getPerson());
return result;
}
31. @crichardson
Mapping, filtering, and
reducing
public class SocialNetwork {
private Set<Person> people;
...
public Set<Person> lonelyPeople() {
Set<Person> result = new HashSet<Person>();
for (Person p : people) {
if (p.getFriends().isEmpty())
result.add(p);
}
return result;
}
32. @crichardson
Mapping, filtering, and
reducing
public class SocialNetwork {
private Set<Person> people;
...
public int averageNumberOfFriends() {
int sum = 0;
for (Person p : people) {
sum += p.getFriends().size();
}
return sum / people.size();
}
33. @crichardson
Problems with this style of
programming
Low level
Imperative (how to do it) NOT declarative (what to do)
Verbose
Mutable variables are potentially error prone
Difficult to parallelize
34. @crichardson
Java 8 streams to the rescue
A sequence of elements
“Wrapper” around a collection
Streams can also be infinite
Provides a functional/lambda-based API for transforming,
filtering and aggregating elements
Much simpler, cleaner code
35. @crichardson
Using Java 8 streams -
mapping
class Person ..
private Set<Friend> friends = ...;
public Set<Hometown> hometownsOfFriends() {
return friends.stream()
.map(f -> f.getPerson().getHometown())
.collect(Collectors.toSet());
}
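A self-contained variant of the mapping slide that can be run on its own. Person here is a simplified stand-in (a name and a hometown string) rather than the Person class from the social network example, and sampleFriends() is invented test data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class MappingExample {
    // Simplified stand-in for the slides' Person class.
    static final class Person {
        final String name;
        final String hometown;

        Person(String name, String hometown) {
            this.name = name;
            this.hometown = hometown;
        }
    }

    // Invented sample data.
    static List<Person> sampleFriends() {
        return Arrays.asList(
                new Person("Ann", "Oakland"),
                new Person("Bob", "London"),
                new Person("Cat", "Oakland"));
    }

    // map() transforms each element; collecting into a Set
    // collapses duplicate hometowns into one entry.
    static Set<String> hometowns(List<Person> friends) {
        return friends.stream()
                .map(p -> p.hometown)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(hometowns(sampleFriends()));
    }
}
```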
37. @crichardson
Using Java 8 streams -
filtering
public class SocialNetwork {
private Set<Person> people;
...
public Set<Person> peopleWithNoFriends() {
Set<Person> result = new HashSet<Person>();
for (Person p : people) {
if (p.getFriends().isEmpty())
result.add(p);
}
return result;
}
public class SocialNetwork {
private Set<Person> people;
...
public Set<Person> lonelyPeople() {
return people.stream()
.filter(p -> p.getFriends().isEmpty())
.collect(Collectors.toSet());
}
38. @crichardson
Using Java 8 streams - friend
of friends V1
class Person ..
public Set<Person> friendOfFriends() {
Set<Set<Friend>> fof = friends.stream()
.map(friend -> friend.getPerson().friends)
.collect(Collectors.toSet());
...
}
Using map()
=> Set of Sets :-(
Somehow we need to flatten
39. @crichardson
Using Java 8 streams -
mapping
class Person ..
public Set<Person> friendOfFriends() {
return friends.stream()
.flatMap(friend -> friend.getPerson().friends.stream())
.map(Friend::getPerson)
.filter(f -> f != this)
.collect(Collectors.toSet());
}
maps and flattens
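The difference between map() and flatMap() can be sketched with a runnable example. It assumes a simplified model in which the friend graph is a Map from a person's name to the names of their friends; that model and the sample data are illustrative, not the Person/Friend classes from the slides.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class FriendOfFriends {
    // Invented sample graph: person name -> names of friends.
    static Map<String, Set<String>> sampleNetwork() {
        Map<String, Set<String>> g = new HashMap<>();
        g.put("ann", new HashSet<>(Arrays.asList("bob", "cat")));
        g.put("bob", new HashSet<>(Arrays.asList("ann", "dan")));
        g.put("cat", new HashSet<>(Arrays.asList("ann", "eve")));
        return g;
    }

    // map() alone yields one Set per friend: a Set of Sets.
    static Set<Set<String>> nested(Map<String, Set<String>> g, String person) {
        return g.getOrDefault(person, Collections.emptySet()).stream()
                .map(friend -> g.getOrDefault(friend, Collections.emptySet()))
                .collect(Collectors.toSet());
    }

    // flatMap() maps each friend to a stream of their friends
    // and concatenates those streams into one.
    static Set<String> friendOfFriends(Map<String, Set<String>> g, String person) {
        return g.getOrDefault(person, Collections.emptySet()).stream()
                .flatMap(friend -> g.getOrDefault(friend, Collections.emptySet()).stream())
                .filter(name -> !name.equals(person))
                .collect(Collectors.toSet());
    }
}
```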
41. @crichardson
Using Java 8 streams -
reducing
public class SocialNetwork {
private Set<Person> people;
...
public long averageNumberOfFriends() {
return people.stream()
.map ( p -> p.getFriends().size() )
.reduce(0, (x, y) -> x + y)
/ people.size();
}
reduce(0, (x, y) -> x + y) behaves like this loop:
int x = 0;
for (int y : inputStream)
x = x + y;
return x;
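A runnable sketch of reducing, working on a plain list of friend counts instead of the SocialNetwork class. It also shows one possible way to avoid the truncation that integer division causes when computing an average; the use of mapToInt().average() is a suggestion, not something the slide uses.

```java
import java.util.List;

final class ReducingExample {
    // reduce(identity, op) starts from 0 and repeatedly combines
    // the running total with the next element.
    static int sum(List<Integer> friendCounts) {
        return friendCounts.stream().reduce(0, (x, y) -> x + y);
    }

    // Dividing two ints truncates; IntStream.average() computes a double.
    // An empty list yields 0.0 here instead of a division by zero.
    static double average(List<Integer> friendCounts) {
        return friendCounts.stream()
                .mapToInt(Integer::intValue)
                .average()
                .orElse(0.0);
    }
}
```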
45. @crichardson
Tony’s $1B mistake
“I call it my billion-dollar mistake.
It was the invention of the null
reference in 1965....But I couldn't
resist the temptation to put in a
null reference, simply because it
was so easy to implement...”
http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake
46. @crichardson
Coding with null pointers
class Person
public Friend longestFriendship() {
Friend result = null;
for (Friend friend : friends) {
if (result == null ||
friend.getBecameFriends()
.isBefore(result.getBecameFriends()))
result = friend;
}
return result;
}
Friend oldestFriend = person.longestFriendship();
if (oldestFriend != null) {
...
} else {
...
}
Null check is essential yet
easily forgotten
47. @crichardson
Java 8 Optional<T>
A wrapper for nullable references
It has two states:
empty - throws an exception if you try to get the reference
non-empty - contains a non-null reference
Provides methods for:
testing whether it has a value
getting the value
...
Return reference wrapped in an instance of this type instead of null
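The two states can be seen in a short sketch. The nickname example and the class name OptionalStates are invented for illustration; the Optional methods used (ofNullable, orElse, empty, get) are standard Java 8.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

final class OptionalStates {
    // Wrap a possibly-null reference instead of returning null directly.
    static Optional<String> nickname(String raw) {
        return Optional.ofNullable(raw);
    }

    // Safe access: supply a fallback rather than calling get()
    // on an Optional that might be empty.
    static String displayName(String raw) {
        return nickname(raw).orElse("(none)");
    }

    // get() on an empty Optional throws NoSuchElementException.
    static boolean getThrowsWhenEmpty() {
        try {
            Optional.empty().get();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }
}
```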
48. @crichardson
Coding with optionals
class Person
public Optional<Friend> longestFriendship() {
Friend result = null;
for (Friend friend : friends) {
if (result == null ||
friend.getBecameFriends().isBefore(result.getBecameFriends()))
result = friend;
}
return Optional.ofNullable(result);
}
Optional<Friend> oldestFriend = person.longestFriendship();
// Might throw java.util.NoSuchElementException: No value present
// Person dangerous = popularPerson.get();
if (oldestFriend.isPresent()) {
...oldestFriend.get()
} else {
...
}
50. @crichardson
Using Optional.map()
public class Person {
public Optional<Friend> longestFriendship() {
return ...;
}
public Optional<Long> ageDifferenceWithOldestFriend() {
Optional<Friend> oldestFriend = longestFriendship();
return oldestFriend.map ( of ->
Math.abs(of.getPerson().getAge() - getAge()));
}
Eliminates messy conditional logic
51. @crichardson
Using flatMap()
class Person
public Optional<Friend> longestFriendship() {...}
public Optional<Friend> longestFriendshipOfLongestFriend() {
return
longestFriendship()
.flatMap(friend ->
friend.getPerson().longestFriendship());
}
not always a symmetric
relationship. :-)
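A runnable sketch contrasting Optional.map() with Optional.flatMap(). It assumes a toy lookup table of names instead of Person and Friend objects; OLDEST_FRIEND and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class OptionalChaining {
    // Invented data: each person's oldest friend, if any.
    static final Map<String, String> OLDEST_FRIEND = new HashMap<>();
    static {
        OLDEST_FRIEND.put("ann", "bob");
        OLDEST_FRIEND.put("bob", "cat");
    }

    static Optional<String> oldestFriend(String person) {
        return Optional.ofNullable(OLDEST_FRIEND.get(person));
    }

    // map(): the function returns a plain value, which map() re-wraps.
    static Optional<Integer> oldestFriendNameLength(String person) {
        return oldestFriend(person).map(String::length);
    }

    // flatMap(): the function itself returns an Optional,
    // so the result is not a nested Optional<Optional<String>>.
    static Optional<String> oldestFriendOfOldestFriend(String person) {
        return oldestFriend(person).flatMap(OptionalChaining::oldestFriend);
    }
}
```

In both cases an empty Optional simply flows through the chain, so no explicit null or isPresent() check is needed.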
53. @crichardson
Let’s imagine you are performing
a CPU intensive operation
class Person ..
public Set<Hometown> hometownsOfFriends() {
return friends.stream()
.map(f -> cpuIntensiveOperation())
.collect(Collectors.toSet());
}
54. @crichardson
class Person ..
public Set<Hometown> hometownsOfFriends() {
return friends.parallelStream()
.map(f -> cpuIntensiveOperation())
.collect(Collectors.toSet());
}
Parallel streams = simple
concurrency
Potentially uses N cores
Nx speed up
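A small sketch of switching a CPU-bound pipeline between sequential and parallel execution. Counting primes stands in for cpuIntensiveOperation(); the class and method names are invented. The actual speed-up depends on the number of cores, the cost per element, and whether the lambdas are free of shared mutable state, so it should be measured rather than assumed.

```java
import java.util.stream.IntStream;

final class ParallelPrimes {
    static boolean isPrime(int x) {
        if (x < 2) return false;
        for (int i = 2; (long) i * i <= x; i++) {
            if (x % i == 0) return false;
        }
        return true;
    }

    // The pipeline is identical either way; parallel() only changes
    // how the work is scheduled, not the result.
    static long countPrimes(int limit, boolean parallel) {
        IntStream range = IntStream.rangeClosed(2, limit);
        if (parallel) {
            range = range.parallel();
        }
        return range.filter(ParallelPrimes::isPrime).count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(1_000_000, true));
    }
}
```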
56. @crichardson
The need for concurrency
Step #1
Web service request to get the user profile including wish
list (list of product Ids)
Step #2
For each productId: web service request to get product info
Sequentially = terrible response time
Need to fetch productInfo concurrently
57. @crichardson
Futures are a great
concurrency abstraction
http://en.wikipedia.org/wiki/Futures_and_promises
59. @crichardson
Benefits
Simple way for two concurrent activities to communicate safely
Abstraction:
Client does not know how the asynchronous operation is
implemented
Easy to implement scatter/gather:
Scatter: Client can invoke multiple asynchronous operations
and gets a Future for each one.
Gather: Get values from the futures
60. @crichardson
Example wish list service
public interface UserService {
Future<UserProfile> getUserProfile(long userId);
}
public class UserServiceProxy implements UserService {
private ExecutorService executorService;
@Override
public Future<UserProfile> getUserProfile(long userId) {
return executorService.submit(() ->
restfulGet("http://uservice/user/" + userId,
UserProfile.class));
}
...
}
public interface ProductInfoService {
Future<ProductInfo> getProductInfo(long productId);
}
61. @crichardson
Example wish list service
public class WishlistService {
private UserService userService;
private ProductInfoService productInfoService;
public Wishlist getWishlistDetails(long userId) throws Exception {
// get user info
Future<UserProfile> userProfileFuture = userService.getUserProfile(userId);
UserProfile userProfile = userProfileFuture.get(300, TimeUnit.MILLISECONDS);
List<Future<ProductInfo>> productInfoFutures =
userProfile.getWishListProductIds().stream()
.map(productInfoService::getProductInfo)
.collect(Collectors.toList());
long deadline = System.currentTimeMillis() + 300;
List<ProductInfo> products = new ArrayList<ProductInfo>();
for (Future<ProductInfo> pif : productInfoFutures) {
long timeout = deadline - System.currentTimeMillis();
if (timeout <= 0) throw new TimeoutException(...);
products.add(pif.get(timeout, TimeUnit.MILLISECONDS));
}
...
return new Wishlist(products);
}
asynchronously
get all products
wait for
product
info
63. @crichardson
Better: Futures with callbacks
no blocking!
Guava ListenableFutures, Spring 4 ListenableFuture
Java 8 CompletableFuture, Scala Futures
def asyncSquare(x : Int)
: Future[Int] = ... x * x...
val f = asyncSquare(25)
f onSuccess {
case x : Int => println(x)
}
f onFailure {
case e : Exception => println("exception thrown")
}
Partial function applied to
successful outcome
Applied to failed outcome
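For comparison, a rough Java 8 equivalent of the Scala snippet using CompletableFuture, one of the callback-capable futures listed above. asyncSquare mirrors the Scala name; sumOfSquares and safeDivide are additional hypothetical helpers showing composition and a failure callback.

```java
import java.util.concurrent.CompletableFuture;

final class AsyncSquare {
    // supplyAsync runs the supplier on the common ForkJoinPool by default.
    static CompletableFuture<Integer> asyncSquare(int x) {
        return CompletableFuture.supplyAsync(() -> x * x);
    }

    // Both squares are started first; thenCombine registers a callback
    // that runs once both results are available, without blocking here.
    static CompletableFuture<Integer> sumOfSquares(int a, int b) {
        return asyncSquare(a).thenCombine(asyncSquare(b), Integer::sum);
    }

    // exceptionally() plays the role of onFailure: it maps a failure
    // (here an ArithmeticException when b is 0) to a fallback value.
    static CompletableFuture<Integer> safeDivide(int a, int b) {
        return CompletableFuture.supplyAsync(() -> a / b)
                .exceptionally(e -> -1);
    }

    public static void main(String[] args) {
        asyncSquare(25)
                .thenAccept(x -> System.out.println(x)) // like onSuccess
                .join(); // only so the demo does not exit before the callback runs
    }
}
```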
71. @crichardson
Introducing Reactive
Extensions (Rx)
The Reactive Extensions (Rx) is a library for composing
asynchronous and event-based programs using
observable sequences and LINQ-style query operators.
Using Rx, developers represent asynchronous data
streams with Observables , query asynchronous
data streams using LINQ operators , and .....
https://rx.codeplex.com/
72. @crichardson
About RxJava
Reactive Extensions (Rx) for the JVM
Original motivation for Netflix was to provide rich Futures
Implemented in Java
Adaptors for Scala, Groovy and Clojure
https://github.com/Netflix/RxJava
73. @crichardson
RxJava core concepts
trait Observable[T] {
def subscribe(observer : Observer[T]) : Subscription
...
}
trait Observer[T] {
def onNext(value : T)
def onCompleted()
def onError(e : Throwable)
}
Notifies
An asynchronous stream of items
Used to
unsubscribe
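To make the push model concrete, here is a toy sketch in plain Java. It is not RxJava and does not use its API: the Observer interface only mirrors the three callbacks of the trait above, the "observable" is a fixed list delivered synchronously, and Subscription and unsubscribe are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class PushSketch {
    // Mirrors the three callbacks of the Observer trait.
    interface Observer<T> {
        void onNext(T value);
        void onCompleted();
        void onError(Throwable e);
    }

    // Toy, synchronous source: pushes each item to the observer,
    // then signals completion, or signals an error if a callback throws.
    static <T> void subscribe(List<T> items, Observer<T> observer) {
        try {
            for (T item : items) {
                observer.onNext(item);
            }
        } catch (RuntimeException e) {
            observer.onError(e);
            return;
        }
        observer.onCompleted();
    }

    // Records the notifications an observer receives, in order.
    static List<String> record(List<Integer> items) {
        List<String> log = new ArrayList<>();
        subscribe(items, new Observer<Integer>() {
            @Override public void onNext(Integer value) { log.add("next:" + value); }
            @Override public void onCompleted() { log.add("completed"); }
            @Override public void onError(Throwable e) { log.add("error"); }
        });
        return log;
    }
}
```

The consumer never asks for the next element, as it would with an Iterator; the source decides when to call onNext.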
74. Comparing Observable to...
Observer pattern - similar but
adds
Observer.onCompleted()
Observer.onError()
Iterator pattern - mirror image
Push rather than pull
Futures - similar
Can be used as Futures
But Observables = a stream
of multiple values
Collections and Streams -
similar
Functional API supporting
map(), flatMap(), ...
But Observables are
asynchronous
75. @crichardson
Fun with observables
val every10Seconds = Observable.interval(10 seconds)
-1 0 1 ...
t=0 t=10 t=20 ...
val oneItem = Observable.items(-1L)
val ticker = oneItem ++ every10Seconds
val subscription = ticker.subscribe { (value: Long) => println("value=" + value) }
...
subscription.unsubscribe()
76. @crichardson
Connecting observables to the
outside world
def getTableStatus(tableName: String) : Observable[DynamoDbStatus] =
Observable { subscriber: Subscriber[DynamoDbStatus] =>
amazonDynamoDBAsyncClient.describeTableAsync(
new DescribeTableRequest(tableName),
new AsyncHandler[DescribeTableRequest, DescribeTableResult] {
override def onSuccess(request: DescribeTableRequest,
result: DescribeTableResult) = {
subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus))
subscriber.onCompleted()
}
override def onError(exception: Exception) = exception match {
case t: ResourceNotFoundException =>
subscriber.onNext(DynamoDbStatus("NOT_FOUND"))
subscriber.onCompleted()
case _ =>
subscriber.onError(exception)
}
})
}
78. @crichardson
Calculating rolling average
class AverageTradePriceCalculator {
def calculateAverages(trades: Observable[Trade]):
Observable[AveragePrice] = {
...
}
case class Trade(
symbol : String,
price : Double,
quantity : Int
...
)
case class AveragePrice(
symbol : String,
price : Double,
...)
79. @crichardson
Calculating average prices
def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = {
trades.groupBy(_.symbol).map { symbolAndTrades =>
val (symbol, tradesForSymbol) = symbolAndTrades
val openingEverySecond =
Observable.items(-1L) ++ Observable.interval(1 seconds)
def closingAfterSixSeconds(opening: Any) =
Observable.interval(6 seconds).take(1)
tradesForSymbol.window(...).map {
windowOfTradesForSymbol =>
windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) =>
val (sum, count, prices) = soFar
(sum + trade.price, count + trade.quantity, trade.price +: prices)
} map { x =>
val (sum, length, prices) = x
AveragePrice(symbol, sum / length, prices)
}
}.flatten
}.flatten
}
84. @crichardson
Apache Hadoop
Open-source software for reliable, scalable, distributed computing
Hadoop Distributed File System (HDFS)
Efficiently stores very large amounts of data
Files are partitioned and replicated across multiple machines
Hadoop MapReduce
Batch processing system
Provides plumbing for writing distributed jobs
Handles failures
...
86. @crichardson
MapReduce Word count -
mapper
class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
(“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ...
Four score and seven years
http://wiki.apache.org/hadoop/WordCount
88. @crichardson
MapReduce Word count -
reducer
class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key,
Iterable<IntWritable> values, Context context) {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
(“the”, 11)
(“the”, (1, 1, 1, 1, 1, 1, ...))
http://wiki.apache.org/hadoop/WordCount
89. @crichardson
About MapReduce
Very simple programming abstraction yet incredibly powerful
By chaining together multiple map/reduce jobs you can process
very large amounts of data
e.g. Apache Mahout for machine learning
But
Mappers and Reducers = verbose code
Development is challenging, e.g. unit testing is difficult
It’s disk-based batch processing, so it is slow
90. @crichardson
Scalding: Scala DSL for
MapReduce
class WordCountJob(args : Args) extends Job(args) {
TextLine( args("input") )
.flatMap('line -> 'word) { line : String => tokenize(line) }
.groupBy('word) { _.size }
.write( Tsv( args("output") ) )
def tokenize(text : String) : Array[String] = {
text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "")
.split("\\s+")
}
}
https://github.com/twitter/scalding
Expressive and unit testable
Each row is a map of
named fields
91. @crichardson
Apache Spark
Part of the Hadoop ecosystem
Key abstraction = Resilient Distributed Datasets (RDD)
Collection that is partitioned across cluster members
Operations are parallelized
Created from either a Scala collection or a Hadoop
supported datasource - HDFS, S3 etc
Can be cached in-memory for super-fast performance
Can be replicated for fault-tolerance
http://spark.apache.org
92. @crichardson
Spark Word Count
val sc = new SparkContext(...)
sc.textFile("s3n://mybucket/...")
.flatMap { _.split(" ")}
.groupBy(identity)
.mapValues(_.length)
.toArray.toMap
Expressive, unit testable and very fast
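The same flatMap, group, count shape can be sketched on a single machine with Java 8 streams. This is an in-memory analogue for illustration only, not Spark or Hadoop code; StreamWordCount is an invented name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class StreamWordCount {
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // "map" phase: one line becomes many words
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                // leading whitespace produces an empty token; drop it
                .filter(word -> !word.isEmpty())
                // "reduce" phase: group equal words and count each group
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("Four score and seven years")));
    }
}
```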
93. @crichardson
Summary
Functional programming enables the elegant expression of
good ideas in a wide variety of domains
map(), flatMap() and reduce() are remarkably versatile
higher-order functions
Use FP and OOP together
Java 8 has taken a good first step towards supporting FP