Higher-order functions such as map(), flatMap(), filter() and reduce() have their origins in mathematics and in early functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox. In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.
Map, flatmap and reduce are your new best friends (javaone, svcc)
1. map(), flatMap() and reduce() are your new best friends:
Simpler collections, concurrency, and big data
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson
chris@chrisrichardson.net
http://plainoldobjects.com
2. Presentation goal
How functional programming simplifies your code
Show that
map(), flatMap() and reduce()
are remarkably versatile functions
4. About Chris
Founder of a buzzword compliant (stealthy, social, mobile,
big data, machine learning, ...) startup
Consultant helping organizations improve how they architect and deploy applications using cloud, microservices, polyglot applications, NoSQL, ...
5. Agenda
Why functional programming?
Simplifying collection processing
Eliminating NullPointerExceptions
Simplifying concurrency with Futures and Rx Observables
Tackling big data problems with functional programming
6. Functional programming is a programming paradigm
Functions are the building blocks of the
application
Best done in a functional
programming language
7. Functions as first-class citizens
Assign functions to variables
Store functions in fields
Use and write higher-order functions:
Take functions as parameters
Return functions as values
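The points on this slide can be sketched in a few lines of Java 8. This is only an illustration: the class name HigherOrderSketch and the helpers applyTwice() and adder() are made up for the example and do not come from the talk.

```java
import java.util.function.Function;

final class HigherOrderSketch {
    // Takes a function as a parameter and applies it twice.
    static <T> T applyTwice(Function<T, T> f, T value) {
        return f.apply(f.apply(value));
    }

    // Returns a function as a value: an adder for a fixed amount.
    static Function<Integer, Integer> adder(int amount) {
        return x -> x + amount;
    }

    public static void main(String[] args) {
        // A function assigned to a variable.
        Function<Integer, Integer> square = x -> x * x;
        System.out.println(applyTwice(square, 3));   // squares 3, then squares the result
        System.out.println(adder(10).apply(5));      // adds 10 to 5
    }
}
```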
8. Avoids mutable state
Use:
Immutable data structures
Single assignment variables
Some functional languages such as Haskell don’t allow
side-effects
9. Why functional programming?
"the highest goal of programming-language design to enable good ideas to be elegantly expressed"
http://en.wikipedia.org/wiki/Tony_Hoare
10. Why functional programming?
More expressive
More concise
More intuitive - solution matches problem definition
Functional code is usually much more composable
Immutable state:
Less error-prone
Easy parallelization and concurrency
But be pragmatic
13. Lisp = an early functional language invented in 1958
http://en.wikipedia.org/wiki/Lisp_(programming_language)
[Figure: timeline from 1940 to 2010]
garbage collection
dynamic typing
self-hosting compiler
tree data structures
(defun factorial (n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))
14. My final year project in 1985:
Implementing SASL in LISP
Filter out multiples of p
sieve (p:xs) =
p : sieve [x | x <- xs, rem x p > 0];
primes = sieve [2..]
A list of integers starting with 2
15. Mostly an Ivory Tower
technology
Lisp was used for AI
FP languages: Miranda,
ML, Haskell, ...
“Side-effects
kills kittens and
puppies”
17. But today FP is mainstream
Clojure - a dialect of Lisp
Scala - a hybrid OO/functional language
F# - a hybrid OO/FP language for .NET
Java 8 has lambda expressions
18. Java 8 lambda expressions are functions
x -> x * x
x -> {
for (int i = 2; i <= Math.sqrt(x); i = i + 1) { // <= so that perfect squares such as 9 are rejected
if (x % i == 0)
return false;
}
return true;
};
(x, y) -> x * x + y * y
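To be used, a lambda has to be assigned to (or passed as) a functional interface type. A minimal sketch of the three lambdas above, assuming the standard java.util.function interfaces are an acceptable choice of target types; the class and constant names are hypothetical:

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

final class LambdaSketch {
    // x -> x * x as a function from int to int.
    static final IntUnaryOperator SQUARE = x -> x * x;

    // The block-bodied lambda as a predicate on int (trial division up to sqrt(x)).
    static final IntPredicate IS_PRIME = x -> {
        if (x < 2) {
            return false;
        }
        for (int i = 2; i * i <= x; i++) {
            if (x % i == 0) {
                return false;
            }
        }
        return true;
    };

    // (x, y) -> x * x + y * y as a function of two ints.
    static final IntBinaryOperator SUM_OF_SQUARES = (x, y) -> x * x + y * y;
}
```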
19. Agenda
Why functional programming?
Simplifying collection processing
Eliminating NullPointerExceptions
Simplifying concurrency with Futures and Rx Observables
Tackling big data problems with functional programming
20. Lots of application code = collection processing:
Mapping, filtering, and reducing
21. Social network example
public class Person {
enum Gender { MALE, FEMALE }
private Name name;
private LocalDate birthday;
private Gender gender;
private Hometown hometown;
private Set<Friend> friends = new HashSet<Friend>();
....
public class Friend {
private Person friend;
private LocalDate becameFriends;
...
}
public class SocialNetwork {
private Set<Person> people;
...
22. Mapping, filtering, and reducing
public class Person {
public Set<Hometown> hometownsOfFriends() {
Set<Hometown> result = new HashSet<>();
for (Friend friend : friends) {
result.add(friend.getPerson().getHometown());
}
return result;
}
Declare result variable
Modify result
Return result
Iterate
23. Mapping, filtering, and reducing
public class SocialNetwork {
private Set<Person> people;
...
public Set<Person> lonelyPeople() {
Set<Person> result = new HashSet<Person>();
for (Person p : people) {
if (p.getFriends().isEmpty())
result.add(p);
}
return result;
}
Declare result variable
Modify result
Return result
Iterate
24. Mapping, filtering, and reducing
public class SocialNetwork {
private Set<Person> people;
...
public int averageNumberOfFriends() {
int sum = 0;
for (Person p : people) {
sum += p.getFriends().size();
}
return sum / people.size();
}
Declare scalar result variable
Iterate
Modify result
Return result
25. Problems with this style of programming
Lots of verbose boilerplate - basic operations require 5+ LOC
Imperative (how to do it) NOT declarative (what to do)
Mutable variables are potentially error prone
Difficult to parallelize
26. Java 8 streams to the rescue
A sequence of elements
“Wrapper” around a collection
Streams are lazy, so they can even be infinite
Provides a functional/lambda-based API for transforming,
filtering and aggregating elements
Much simpler, cleaner and declarative code
27. Using Java 8 streams - mapping
class Person ..
private Set<Friend> friends = ...;
public Set<Hometown> hometownsOfFriends() {
return friends.stream()
.map(f -> f.getPerson().getHometown())
.collect(Collectors.toSet());
}
transforming
lambda expression
28. The map() function
s1 a b c d e ...
s2 = s1.map(f)
s2 f(a) f(b) f(c) f(d) f(e) ...
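A minimal, self-contained sketch of the picture above: each element of the input is replaced by f(element), and the order and number of elements are preserved. The class MapSketch and the choice of String::length as f are assumptions made for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MapSketch {
    // s2 = s1.map(f), with f = String::length.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }
}
```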
29. Using Java 8 streams - filtering
public class SocialNetwork {
private Set<Person> people;
...
public Set<Person> lonelyPeople() {
return people.stream()
.filter(p -> p.getFriends().isEmpty())
.collect(Collectors.toSet());
}
predicate
lambda expression
30. Using Java 8 streams - friend of friends V1
class Person ..
public Set<Person> friendOfFriends() {
Set<Set<Friend>> fof = friends.stream()
.map(friend -> friend.getPerson().friends)
.collect(Collectors.toSet());
...
}
Using map()
=> Set of Sets :-(
Somehow we need to flatten
31. Using Java 8 streams - mapping
class Person ..
public Set<Person> friendOfFriends() {
return friends.stream()
.flatMap(friend -> friend.getPerson().friends.stream())
.map(Friend::getPerson)
.filter(person -> person != this)
.collect(Collectors.toSet());
}
maps and flattens
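The difference between map() and flatMap() can be seen in isolation with plain strings. This is a sketch with hypothetical names (FlatMapSketch, nestedWords, words), not code from the talk: map() yields one inner list per line, while flatMap() concatenates the per-line streams into a single stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapSketch {
    // map() alone keeps one inner list per input element: a list of lists.
    static List<List<String>> nestedWords(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.asList(line.split(" ")))
                .collect(Collectors.toList());
    }

    // flatMap() maps each line to a stream of words and concatenates the streams.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }
}
```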
33. Using Java 8 streams - reducing
public class SocialNetwork {
private Set<Person> people;
...
public long averageNumberOfFriends() {
return people.stream()
.map ( p -> p.getFriends().size() )
.reduce(0, (x, y) -> x + y)
/ people.size();
}
// The reduce() call behaves like this loop:
int x = 0;
for (int y : inputStream)
x = x + y;
return x;
34. The reduce() function
s1 a b c d e ...
x = s1.reduce(initial, f)
f(f(f(f(f(f(initial, a), b), c), d), e), ...)
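A small sketch of the left fold shown above, using hypothetical helper names. Note that dividing an int sum by people.size(), as on the previous slide, truncates and fails on an empty collection; one possible alternative, shown here as an assumption rather than the author's choice, is IntStream.average().

```java
import java.util.List;

final class ReduceSketch {
    // f(f(f(0, a), b), c) with f = addition.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, (x, y) -> x + y);
    }

    // The same fold with a different initial value and function.
    static int max(List<Integer> xs) {
        return xs.stream().reduce(Integer.MIN_VALUE, Integer::max);
    }

    // Average without integer truncation; 0.0 is an arbitrary default for empty input.
    static double average(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }
}
```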
35. Newton's method for calculating sqrt(x)
It’s an iterative algorithm
initial value = guess
betterValue = value - (value * value - x) / (2 * value)
Iterate until |value - betterValue| < precision
36. Functional square root in Scala
Creates an infinite stream:
seed, f(seed), f(f(seed)), .....
package net.chrisrichardson.fp.scala.squareroot
object SquareRootCalculator {
def squareRoot(x: Double, precision: Double) : Double =
Stream.iterate(x / 2)(
value => value - (value * value - x) / (2 * value) ).
sliding(2).map( s => (s.head, s.last)).
find { case (value , newValue) =>
Math.abs(value - newValue) < precision}.
get._2
}
a, b, c, ... =>
(a, b), (b, c), (c, ...), ...
Find the first convergent
approximation
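The same idea can be sketched with Java 8 streams. Java 8 has no sliding(), so this version, which is an adaptation and not code from the talk, iterates over (value, betterValue) pairs instead. It assumes x > 0; for x = 0 the first guess is 0 and the iteration never converges.

```java
import java.util.stream.Stream;

final class NewtonSqrtSketch {
    // One Newton step: betterValue = value - (value * value - x) / (2 * value).
    static double improve(double value, double x) {
        return value - (value * value - x) / (2 * value);
    }

    // Assumes x > 0. Builds an infinite lazy stream of {value, betterValue} pairs
    // and stops at the first pair whose members differ by less than precision.
    static double squareRoot(double x, double precision) {
        double guess = x / 2;
        return Stream.iterate(new double[] {guess, improve(guess, x)},
                        pair -> new double[] {pair[1], improve(pair[1], x)})
                .filter(pair -> Math.abs(pair[0] - pair[1]) < precision)
                .findFirst()
                .get()[1];
    }
}
```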
37. Adopting FP with Java 8 is straightforward
Switch your application to Java 8
Start using streams and lambdas
Eclipse can refactor anonymous inner
classes to lambdas
Or write modules in Scala: more
expressive and runs on older JVMs
38. Agenda
Why functional programming?
Simplifying collection processing
Eliminating NullPointerExceptions
Simplifying concurrency with Futures and Rx Observables
Tackling big data problems with functional programming
39. Tony’s $1B mistake
“I call it my billion-dollar mistake.
It was the invention of the null
reference in 1965....But I couldn't
resist the temptation to put in a
null reference, simply because it
was so easy to implement...”
http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake
40. Coding with null pointers
Return null if no friends
class Person
public Friend longestFriendship() {
Friend result = null;
for (Friend friend : friends) {
if (result == null ||
friend.getBecameFriends()
.isBefore(result.getBecameFriends()))
result = friend;
}
return result;
}
Friend oldestFriend = person.longestFriendship();
if (oldestFriend != null) {
...
} else {
...
}
Null check is essential yet
easily forgotten
41. Java 8 Optional<T>
A wrapper for nullable references
It has two states:
empty ⇒ throws an exception if you try to get the reference
non-empty ⇒ contains a non-null reference
Provides methods for: testing whether it has a value, getting the
value, ...
Use an Optional<T> parameter if caller can pass in null
Return reference wrapped in an instance of this type instead of null
Uses the type system to explicitly represent
nullability
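A minimal sketch of the two states and the basic accessors. The lookup method findNickname() and its data are hypothetical; only the java.util.Optional calls are real API.

```java
import java.util.Optional;

final class OptionalSketch {
    // Hypothetical lookup that may or may not find a value.
    static Optional<String> findNickname(String name) {
        return "Christopher".equals(name) ? Optional.of("Chris") : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> found = findNickname("Christopher");
        Optional<String> missing = findNickname("Bob");

        System.out.println(found.isPresent());        // non-empty state
        System.out.println(found.get());              // safe only because the value is present
        System.out.println(missing.orElse("none"));   // default instead of a null check
        // missing.get() would throw java.util.NoSuchElementException
    }
}
```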
42. Coding with optionals
class Person
public Optional<Friend> longestFriendship() {
Friend result = null;
for (Friend friend : friends) {
if (result == null ||
friend.getBecameFriends().isBefore(result.getBecameFriends()))
result = friend;
}
return Optional.ofNullable(result);
}
Optional<Friend> oldestFriend = person.longestFriendship();
// Might throw java.util.NoSuchElementException: No value present
// Friend dangerous = oldestFriend.get();
if (oldestFriend.isPresent()) {
...oldestFriend.get()
} else {
...
}
44. Transforming with map()
public class Person {
public Optional<Friend> longestFriendship() {
return ...;
}
public Optional<Long> ageDifferenceWithOldestFriend() {
Optional<Friend> oldestFriend = longestFriendship();
return oldestFriend.map ( of ->
Math.abs(of.getPerson().getAge() - getAge()) );
}
Eliminates messy conditional logic
45. Chaining with flatMap()
class Person
public Optional<Friend> longestFriendship() {...}
public Optional<Friend> longestFriendshipOfLongestFriend() {
return
longestFriendship()
.flatMap(friend ->
friend.getPerson().longestFriendship());
}
not always a symmetric
relationship. :-)
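A self-contained sketch of the same two operations, with people reduced to plain names so that it runs on its own. The class OptionalChainSketch and its map-backed data are hypothetical stand-ins for Person and Friend.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class OptionalChainSketch {
    // Hypothetical data: person name -> name of that person's oldest friend.
    private final Map<String, String> oldestFriendOf = new HashMap<>();

    void befriend(String person, String oldestFriend) {
        oldestFriendOf.put(person, oldestFriend);
    }

    // Empty when the person has no recorded friend.
    Optional<String> oldestFriend(String person) {
        return Optional.ofNullable(oldestFriendOf.get(person));
    }

    // map(): transform the value if present, stay empty otherwise.
    Optional<Integer> oldestFriendNameLength(String person) {
        return oldestFriend(person).map(String::length);
    }

    // flatMap(): chain a second lookup that itself returns an Optional.
    Optional<String> oldestFriendOfOldestFriend(String person) {
        return oldestFriend(person).flatMap(this::oldestFriend);
    }
}
```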
46. Agenda
Why functional programming?
Simplifying collection processing
Eliminating NullPointerExceptions
Simplifying concurrency with Futures and Rx Observables
Tackling big data problems with functional programming
47. Let’s imagine you are performing a CPU intensive operation
class Person ..
public Set<Hometown> hometownsOfFriends() {
return friends.stream()
.map(f -> cpuIntensiveOperation(f))
.collect(Collectors.toSet());
}
48. Parallel streams = simple concurrency
Potentially uses N cores
class Person ..
public Set<Hometown> hometownsOfFriends() {
return friends.parallelStream()
.map(f -> cpuIntensiveOperation(f))
.collect(Collectors.toSet());
}
⇒
Nx speed up
Perhaps this will be faster.
Perhaps not
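A runnable sketch of switching a pipeline between sequential and parallel execution. The prime-counting workload is a made-up stand-in for cpuIntensiveOperation(); whether the parallel version is actually faster depends on the workload and the machine, so it is worth measuring.

```java
import java.util.stream.IntStream;

final class ParallelStreamSketch {
    // Stand-in for a CPU intensive operation: trial-division primality test.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    // Same pipeline either way; only the execution mode changes.
    static long countPrimes(int limit, boolean parallel) {
        IntStream candidates = IntStream.rangeClosed(2, limit);
        if (parallel) {
            candidates = candidates.parallel();
        }
        return candidates.filter(ParallelStreamSketch::isPrime).count();
    }
}
```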
49. Let’s imagine that you are writing code to display the products in a user’s wish list
50. The need for concurrency
Step #1
Web service request to get the user profile including wish
list (list of product Ids)
Step #2
For each productId: web service request to get product info
Sequentially ⇒ terrible response time
Need to fetch productInfo concurrently
Composing sequential + scatter/gather-style
operations is very common
51. Futures are a great concurrency abstraction
http://en.wikipedia.org/wiki/Futures_and_promises
52. Composition with futures
[Diagram: a client on the main thread initiates asynchronous operations 1 and 2, which run in worker threads or event-driven code; each operation sets its outcome on a future (Future 1, Future 2), and the client gets the outcome from that future]
53. Benefits
Simple way for multiple concurrent activities to communicate
safely
Abstraction:
Client does not know how the asynchronous operation is
implemented, e.g. thread pool, event-driven, ....
Easy to implement scatter/gather:
Scatter: Client can invoke multiple asynchronous operations
and gets a Future for each one.
Gather: Get values from the futures
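Scatter/gather as described above can be sketched with Java 8's CompletableFuture. getProductInfo() is a hypothetical stand-in for a web service call, and the gather step deliberately uses the blocking join() to keep the sketch short.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class ScatterGatherSketch {
    // Hypothetical asynchronous operation; a real one would call a web service.
    static CompletableFuture<String> getProductInfo(int productId) {
        return CompletableFuture.supplyAsync(() -> "product-" + productId);
    }

    static List<String> getWishList(List<Integer> productIds) {
        // Scatter: start every operation first. Collecting into a list matters,
        // because a lazy stream would otherwise start and join them one at a time.
        List<CompletableFuture<String>> futures = productIds.stream()
                .map(ScatterGatherSketch::getProductInfo)
                .collect(Collectors.toList());

        // Gather: get the value from each future, in the original order.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```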
54. But composition with basic futures is difficult
Java 7 future.get([timeout]):
Blocking API ⇒ client blocks thread ⇒ poor scalability
Difficult to compose multiple concurrent operations
Futures with callbacks:
e.g. Guava ListenableFutures, Spring 4 ListenableFuture
Attach callbacks to all futures and asynchronously consume outcomes
But callback-based code = messy code
See http://techblog.netflix.com/2013/02/rxjava-netflix-api.html
We need functional futures!
55. Functional futures - Scala, Java 8 CompletableFuture
Asynchronously transforms future
Calls asyncSquare() with the eventual outcome of asyncPlus(), i.e. chaining
def asyncPlus(x : Int, y :Int): Future[Int] = ... x + y ...
val future2 = asyncPlus(4, 5).map{ _ * 3 }
assertEquals(27, Await.result(future2, 1 second))
def asyncSquare(x : Int) : Future[Int] = ... x * x ...
val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) }
assertEquals(169, Await.result(f2, 1 second))
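A rough Java 8 counterpart of the Scala code, as a sketch: on CompletableFuture, thenApply() plays the role of map() and thenCompose() the role of flatMap(). The bodies of asyncPlus() and asyncSquare() are assumptions, since the slide elides them.

```java
import java.util.concurrent.CompletableFuture;

final class FunctionalFutureSketch {
    // Assumed implementations; the originals are elided on the slide.
    static CompletableFuture<Integer> asyncPlus(int x, int y) {
        return CompletableFuture.supplyAsync(() -> x + y);
    }

    static CompletableFuture<Integer> asyncSquare(int x) {
        return CompletableFuture.supplyAsync(() -> x * x);
    }

    // map(): asynchronously transform the eventual outcome.
    static CompletableFuture<Integer> plusThenTriple() {
        return asyncPlus(4, 5).thenApply(sum -> sum * 3);
    }

    // flatMap(): chain a second asynchronous operation onto the first.
    static CompletableFuture<Integer> plusThenSquare() {
        return asyncPlus(5, 8).thenCompose(FunctionalFutureSketch::asyncSquare);
    }
}
```

Calling join() on either result blocks until the value is available, which is convenient in tests but defeats the purpose in production code.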
56. map() etc are asynchronous
f2 = f1 map (someFn)
[Diagram: when f1 completes with outcome1, f2 completes with outcome2 = someFn(outcome1)]
Implemented using callbacks
59. Your mouse is your database
Erik Meijer
http://queue.acm.org/detail.cfm?id=2169076
60. Introducing Reactive Extensions (Rx)
The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs ....
Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and .....
https://rx.codeplex.com/
61. About RxJava
Reactive Extensions (Rx) for the JVM
Developed by Netflix
Original motivation was to provide rich, functional Futures
Implemented in Java
Adaptors for Scala, Groovy and Clojure
Embraced by Akka and Spring Reactor: http://www.reactive-streams.org/
https://github.com/Netflix/RxJava
62. RxJava core concepts
An asynchronous stream of items
trait Observable[T] {
def subscribe(observer : Observer[T]) : Subscription
...
}
Notifies
trait Observer[T] {
def onNext(value : T)
def onCompleted()
def onError(e : Throwable)
}
Used to
unsubscribe
63. Comparing Observable to...
Observer pattern - similar but
adds
Observer.onCompleted()
Observer.onError()
Iterator pattern - mirror image
Push rather than pull
Futures - similar
Can be used as Futures
But Observables = a stream
of multiple values
Collections and Streams -
similar
Functional API supporting
map(), flatMap(), ...
But Observables are
asynchronous
64. Fun with observables
val oneItem = Observable.items(-1L)
val every10Seconds = Observable.interval(10 seconds)
val ticker = oneItem ++ every10Seconds
-1 0 1 ...
t=0 t=10 t=20 ...
val subscription = ticker.subscribe { (value: Long) => println("value=" + value) }
...
subscription.unsubscribe()
65. Observables as the result of an asynchronous operation
def getTableStatus(tableName: String) : Observable[DynamoDbStatus]=
Observable { subscriber: Subscriber[DynamoDbStatus] =>
amazonDynamoDBAsyncClient.describeTableAsync(
new DescribeTableRequest(tableName),
new AsyncHandler[DescribeTableRequest, DescribeTableResult] {
override def onSuccess(request: DescribeTableRequest,
result: DescribeTableResult) = {
subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus))
subscriber.onCompleted()
}
override def onError(exception: Exception) = exception match {
case t: ResourceNotFoundException =>
subscriber.onNext(DynamoDbStatus("NOT_FOUND"))
subscriber.onCompleted()
case _ =>
subscriber.onError(exception)
}
})
}
66. Transforming/chaining observables with flatMap()
val tableStatus = ticker.flatMap { i =>
logger.info("{}th describe table", i + 1)
getTableStatus(name)
}
Status1 Status2 Status3 ...
t=0 t=10 t=20 ...
+ Usual collection methods: map(), filter(), take(), drop(), ...
67. Calculating rolling average
class AverageTradePriceCalculator {
def calculateAverages(trades: Observable[Trade]):
Observable[AveragePrice] = {
...
}
case class Trade(
symbol : String,
price : Double,
quantity : Int
...
)
case class AveragePrice(
symbol : String,
price : Double,
...)
68. Calculating average prices
def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = {
trades.groupBy(_.symbol).map { symbolAndTrades =>
val (symbol, tradesForSymbol) = symbolAndTrades
val openingEverySecond =
Observable.items(-1L) ++ Observable.interval(1 seconds)
def closingAfterSixSeconds(opening: Any) =
Observable.interval(6 seconds).take(1)
tradesForSymbol.window(...).map {
windowOfTradesForSymbol =>
windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) =>
val (sum, count, prices) = soFar
(sum + trade.price, count + trade.quantity, trade.price +: prices)
} map { x =>
val (sum, length, prices) = x
AveragePrice(symbol, sum / length, prices)
}
}.flatten
}.flatten
}
69. Agenda
Why functional programming?
Simplifying collection processing
Eliminating NullPointerExceptions
Simplifying concurrency with Futures and Rx Observables
Tackling big data problems with functional programming
71. Scala Word Count
val frequency : Map[String, Int] =
Source.fromFile("gettysburgaddress.txt").getLines()
.flatMap { _.split(" ") }.toList
.groupBy(identity)
.mapValues(_.length)
frequency("THE") should be(11)
frequency("LIBERTY") should be(1)
Map
Reduce
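For comparison, a Java 8 streams sketch of the same word count. It takes the lines as a list rather than reading a file, and the class name is hypothetical; flatMap() does the "map" step and the groupingBy/counting collector does the "reduce" step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class WordCountSketch {
    static Map<String, Long> frequency(List<String> lines) {
        return lines.stream()
                // Map: split every line into words and flatten into one stream.
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                // Reduce: group equal words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```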
72. But how to scale to a cluster of machines?
73. Apache Hadoop
Open-source ecosystem for reliable, scalable, distributed computing
Hadoop Distributed File System (HDFS)
Efficiently stores very large amounts of data
Files are partitioned and replicated across multiple machines
Hadoop MapReduce
Batch processing system
Provides plumbing for writing distributed jobs
Handles failures
And, much, much more...
74. Overview of MapReduce
[Diagram: Input Data → Mappers, each emitting (K,V)* → Shuffle, grouping values by key as (K1,V, ....)*, (K2,V, ....)*, (K3,V, ....)* → Reducers, each emitting (K,V) → Output Data]
75. MapReduce Word count - mapper
http://wiki.apache.org/hadoop/WordCount
class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
context.write(word, one);
}
}
}
Four score and seven years
⇒
(“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ...
77. MapReduce Word count - reducer
class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key,
Iterable<IntWritable> values, Context context) {
int sum = 0;
for (IntWritable val : values) {
sum += val.get();
}
context.write(key, new IntWritable(sum));
}
}
(“the”, (1, 1, 1, 1, 1, 1, ...))
⇒
(“the”, 11)
http://wiki.apache.org/hadoop/WordCount
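To see how the mapper and reducer fit together, the whole flow can be imitated in memory with plain Java 8, without Hadoop. This is only a single-machine sketch with hypothetical names; it mimics the map, shuffle and reduce phases but none of the distribution or fault handling.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MapReduceSketch {
    // Map phase: one input line becomes a list of (word, 1) pairs.
    static List<SimpleEntry<String, Integer>> mapPhase(String line) {
        return Arrays.stream(line.split("\\s+"))
                .filter(word -> !word.isEmpty())
                .map(word -> new SimpleEntry<String, Integer>(word, 1))
                .collect(Collectors.toList());
    }

    // Shuffle phase: group the values of all pairs by key.
    static Map<String, List<Integer>> shuffle(List<SimpleEntry<String, Integer>> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
                pair -> pair.getKey(),
                Collectors.mapping(pair -> pair.getValue(), Collectors.toList())));
    }

    // Reduce phase: fold the grouped values for one key into a single count.
    static int reducePhase(List<Integer> counts) {
        return counts.stream().reduce(0, Integer::sum);
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<SimpleEntry<String, Integer>> pairs = lines.stream()
                .flatMap(line -> mapPhase(line).stream())
                .collect(Collectors.toList());
        Map<String, Integer> result = new HashMap<>();
        shuffle(pairs).forEach((word, counts) -> result.put(word, reducePhase(counts)));
        return result;
    }
}
```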
78. About MapReduce
Very simple programming abstraction yet incredibly powerful
By chaining together multiple map/reduce jobs you can process
very large amounts of data in interesting ways
e.g. Apache Mahout for machine learning
But
Mappers and Reducers = verbose code
Development is challenging, e.g. unit testing is difficult
It’s disk-based, batch processing ⇒ slow
79. Scalding: Scala DSL for MapReduce
class WordCountJob(args : Args) extends Job(args) {
TextLine( args("input") )
.flatMap('line -> 'word) { line : String => tokenize(line) }
.groupBy('word) { _.size }
.write( Tsv( args("output") ) )
def tokenize(text : String) : Array[String] = {
text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "")
.split("\\s+")
}
}
Each row is a map of named fields
Expressive and unit testable
https://github.com/twitter/scalding
80. Apache Spark
Created at UC Berkeley and now part of the Hadoop ecosystem
Key abstraction = Resilient Distributed Datasets (RDD)
Collection that is partitioned across cluster members
Operations are parallelized
Created from either a collection or a Hadoop supported datasource -
HDFS, S3 etc
Can be cached in-memory for super-fast performance
Can be replicated for fault-tolerance
Scala, Java, and Python APIs
http://spark.apache.org
81. Spark Word Count
val sc = new SparkContext(...)
sc.textFile("s3n://mybucket/...")
.flatMap { _.split(" ")}
.groupBy(identity)
.mapValues(_.length)
.toArray.toMap
Very similar to Scala collection code!!
Expressive, unit testable and very fast
82. Summary
Functional programming enables the elegant expression of
good ideas in a wide variety of domains
map(), flatMap() and reduce() are remarkably versatile
higher-order functions
Use FP and OOP together
Java 8 has taken a good first step towards supporting FP
Go write some functional code!