Monads and Monoids by Oleksiy Dyagilev


Monads and Monoids: from daily Java to Big Data analytics in Scala

Finally, after two decades of evolution, Java 8 took a step towards functional programming. What can Java learn from mature functional languages? How can obscure mathematical abstractions such as Monad or Monoid be put to practical use? People usually find them scary and difficult to understand. Oleksiy will explain these concepts in simple words and show that they are powerful tools applicable in many domains, from daily Java and Scala routines to Big Data analytics with Storm or Hadoop.

Published in: Technology


  1. Oleksiy Dyagilev
  2. • Lead software engineer at EPAM
     • Works on scalable computing and data grids (GigaSpaces, Storm, Spark)
     • Blog: http://dyagilev.org
  7. • Abstract Algebra (1900s?) and Category Theory (1940s)
     • Mathematicians study abstract structures and the relationships between them
     • Early 1990s: Eugenio Moggi described the general use of monads to structure programs
     • Early 1990s: monads appeared in Haskell, a purely functional language, along with other concepts such as Functor, Monoid, and Arrow
     • 2003: Martin Odersky creates Scala, a language that unifies the object-oriented and functional paradigms. Influenced by Haskell, Java, Erlang, etc.
     • 2014: Java 8 released, with functional programming support (lambdas, streams)
  8. • How abstractions from math (Category Theory, Abstract Algebra) help in functional programming and Big Data
     • How to leverage them and become a better programmer
  9. Example #1

     User user = findUser(userId);
     if (user != null) {
         Address address = user.getAddress();
         if (address != null) {
             String zipCode = address.getZipCode();
             if (zipCode != null) {
                 City city = findCityByZipCode(zipCode);
                 if (city != null) {
                     return city.getName();
                 }
             }
         }
     }
     return null;
  10. Refactored with Optional (for steps which may not return a result)

      Optional<String> cityName = findUser(userId)
          .flatMap(user -> user.getAddress())
          .flatMap(address -> address.getZipCode())
          .flatMap(zipCode -> findCityByZipCode(zipCode))
          .map(city -> city.getName());
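
The refactored chain can be tried end to end with a small sketch like the one below. The `User`, `Address`, and `City` records and the in-memory lookup maps are hypothetical stand-ins invented for this illustration; the slide only shows the chain itself. Records require Java 16 or later.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical domain model for illustration; these types are assumptions,
// not something taken from the slides or from a library.
record City(String name) {}
record Address(Optional<String> zipCode) {}
record User(Optional<Address> address) {}

final class CityLookup {
    // In-memory stand-ins for the lookups used on the slide.
    static final Map<Integer, User> USERS = Map.of(
        1, new User(Optional.of(new Address(Optional.of("01001")))),
        2, new User(Optional.empty()));
    static final Map<String, City> CITIES = Map.of("01001", new City("Kyiv"));

    static Optional<User> findUser(int userId) {
        return Optional.ofNullable(USERS.get(userId));
    }

    static Optional<City> findCityByZipCode(String zipCode) {
        return Optional.ofNullable(CITIES.get(zipCode));
    }

    // Each step may yield no value; flatMap short-circuits to an empty
    // Optional at the first missing piece, so no null checks are needed.
    static Optional<String> cityName(int userId) {
        return findUser(userId)
            .flatMap(User::address)
            .flatMap(Address::zipCode)
            .flatMap(CityLookup::findCityByZipCode)
            .map(City::name);
    }

    public static void main(String[] args) {
        System.out.println(cityName(1)); // user with a complete address
        System.out.println(cityName(2)); // user without an address
        System.out.println(cityName(3)); // unknown user
    }
}
```

The caller never sees `null`: a gap at any step surfaces as an empty `Optional`.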
  11. Example #2 (for steps which can return several values)

      Stream<Employee> employees = companies.stream()
          .flatMap(company -> company.departments())
          .flatMap(department -> department.employees());
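
A self-contained version of this second example might look as follows. The `Company`, `Department`, and `Employee` records and the sample data are assumptions made up for the sketch; only the `flatMap` chain comes from the slide.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical model types for illustration only.
record Employee(String name) {}

record Department(List<Employee> staff) {
    Stream<Employee> employees() { return staff.stream(); }
}

record Company(List<Department> units) {
    Stream<Department> departments() { return units.stream(); }
}

final class Staffing {
    // Every step may produce zero, one, or many values; flatMap concatenates
    // the per-element streams in encounter order.
    static List<String> employeeNames(List<Company> companies) {
        return companies.stream()
            .flatMap(Company::departments)
            .flatMap(Department::employees)
            .map(Employee::name)
            .collect(Collectors.toList());
    }

    // Sample data: one department is empty and simply contributes nothing.
    static List<Company> sample() {
        return List.of(
            new Company(List.of(
                new Department(List.of(new Employee("Ann"), new Employee("Bob"))),
                new Department(List.of()))),
            new Company(List.of(
                new Department(List.of(new Employee("Cid"))))));
    }

    public static void main(String[] args) {
        System.out.println(employeeNames(sample()));
    }
}
```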
  14. A monad consists of:
      • a container with a type M<T> (e.g. Optional<T>)
      • a method M<U> flatMap(T -> M<U>) (e.g. flatMap(T -> Optional<U>))
      • a constructor to put T into M<T>; same as a static method M<T> unit(T) (e.g. Optional.of(x))

      Monad laws:
      1. Left identity: unit(x).flatMap(f) = f(x)
      2. Right identity: m.flatMap(x -> unit(x)) = m
      3. Associativity: m.flatMap(f).flatMap(g) = m.flatMap(x -> f(x).flatMap(g))

      Bonus: now we can define M<U> map(T -> U)

      M<U> map(f) { return flatMap(x -> unit(f(x))) }
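
As a rough sanity check, the three laws and the derived `map` can be exercised against `java.util.Optional`. The functions `HALF` and `LABEL` are arbitrary examples chosen for this sketch, and the checks only cover non-null values (`Optional.of` rejects `null`).

```java
import java.util.Optional;
import java.util.function.Function;

final class OptionalLaws {
    // Two sample functions of shape T -> Optional<U>, chosen arbitrarily.
    static final Function<Integer, Optional<Integer>> HALF =
        n -> n % 2 == 0 ? Optional.of(n / 2) : Optional.empty();
    static final Function<Integer, Optional<String>> LABEL =
        n -> n > 0 ? Optional.of("#" + n) : Optional.empty();

    // Left identity: unit(x).flatMap(f) = f(x)
    static boolean leftIdentity(int x) {
        return Optional.of(x).flatMap(HALF).equals(HALF.apply(x));
    }

    // Right identity: m.flatMap(x -> unit(x)) = m
    static boolean rightIdentity(Optional<Integer> m) {
        return m.flatMap(Optional::of).equals(m);
    }

    // Associativity: m.flatMap(f).flatMap(g) = m.flatMap(x -> f(x).flatMap(g))
    static boolean associativity(Optional<Integer> m) {
        return m.flatMap(HALF).flatMap(LABEL)
            .equals(m.flatMap(x -> HALF.apply(x).flatMap(LABEL)));
    }

    // map derived from flatMap and unit; f must not return null here,
    // because Optional.of throws on null.
    static <T, U> Optional<U> map(Optional<T> m, Function<T, U> f) {
        return m.flatMap(x -> Optional.of(f.apply(x)));
    }

    public static void main(String[] args) {
        System.out.println(leftIdentity(4) && rightIdentity(Optional.of(7))
            && associativity(Optional.of(8)));
    }
}
```

Passing checks on a few inputs are evidence, not proof, that the laws hold.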
  17. Java: looks ugly

      Optional<User> user = findUser(userId);
      Optional<Order> order = findOrder(orderId);
      Optional<Payment> payment = findPayment(orderId);
      Optional<Placement> placement = user
          .flatMap(u -> order
              .flatMap(o -> payment
                  .map(p -> submitOrder(u, o, p))));

      Scala: built-in monad support

      val placement = for {
        u <- findUser(userId)
        o <- findOrder(orderId)
        p <- findPayment(orderId)
      } yield submitOrder(u, o, p)

      Languages with dedicated syntax for monads:
      • Scala: for-comprehensions
      • Haskell: do-notation
      • F#: computation expressions
  19. trait Parser[T] extends (String => ParseResult[T])

      // +T (covariance) lets Failure, a ParseResult[Nothing], stand in for any ParseResult[T]
      sealed abstract class ParseResult[+T]
      case class Success[T](result: T, rest: String) extends ParseResult[T]
      case class Failure() extends ParseResult[Nothing]

      val letter: Parser[Char] = …
      val digit: Parser[Char] = …
      val space: Parser[Char] = …

      // members of Parser[T]
      def map[U](f: T => U): Parser[U] = parser { in => this(in) map f }
      def flatMap[U](f: T => Parser[U]): Parser[U] = parser { in => this(in) withNext f }
      def * : Parser[List[T]] = …

      val userParser = for {
        firstName <- letter.*
        _ <- space
        lastName <- letter.*
        _ <- space
        phone <- digit.*
      } yield User(firstName, lastName, phone)

      Sample input: “John Doe 0671112222”
  20. Monad type                                          Purpose
      scala.Option, java.Optional                         Absence of value
      scala.List, java.Stream                             Multiple results
      scala.Future, scalaz.Task, java.CompletableFuture   Asynchronous computations
      scalaz.Reader                                       Read from shared environment
      scalaz.Writer                                       Collect data in addition to computed values
      scalaz.State                                        Maintain state
      scala.Try, scalaz.\/                                Handling failures
  21. • Remove boilerplate
      • Modularity: separate computations from the combination strategy
      • Composability: compose computations from simpler ones
      • Improve maintainability
      • Better readability
      • Vocabulary
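  22. [Diagram: Lambda Architecture. New data feeds both batch processing (all data, producing a batch view) and real-time processing (data stream, producing a real-time view); the serving layer answers a query by merging the two views.]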
  25. • Write job logic once and run it on many platforms (Hadoop, Storm)
      • Library authors talk about monoids all the time

      def wordCount[P <: Platform[P]]
        (source: Producer[P, String], store: P#Store[String, Long]) =
          source.flatMap { sentence =>
            toWords(sentence).map(_ -> 1L)
          }.sumByKey(store)

      def sumByKey(store: P#Store[K, V])(implicit semigroup: Semigroup[V]): Summer[P, K, V] = …
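
To see why `sumByKey` asks for a semigroup, here is a plain-Java sketch of the same idea. It is not the API of the library shown on the slide: `sumByKey` and `wordCount` below are hypothetical helpers working on in-memory lists, and the whitespace tokenizer is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

final class WordCount {
    // Combine all values that share a key with a caller-supplied associative
    // operation. That operation is the only thing required of the value type.
    static <K, V> Map<K, V> sumByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> plus) {
        Map<K, V> store = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            store.merge(pair.getKey(), pair.getValue(), plus);
        }
        return store;
    }

    // Emit (word, 1) for every word, then sum the ones per word.
    static Map<String, Long> wordCount(List<String> sentences) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String sentence : sentences) {
            for (String word : sentence.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1L));
                }
            }
        }
        return sumByKey(pairs, Long::sum);
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or", "not to be")));
    }
}
```

Swapping `Long::sum` for another associative operation (max, set union, a sketch merge) changes the aggregate without touching the job logic.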
  26. Given a set 𝑆 and a binary operation +, we say that (𝑆, +) is a semigroup if ∀ 𝑥, 𝑦, 𝑧 ∈ 𝑆:
      • Closure: 𝑥 + 𝑦 ∈ 𝑆
      • Associativity: (𝑥 + 𝑦) + 𝑧 = 𝑥 + (𝑦 + 𝑧)

      A monoid is a semigroup with an identity element:
      • Identity: ∃ 𝑒 ∈ 𝑆 such that ∀ 𝑥 ∈ 𝑆: 𝑒 + 𝑥 = 𝑥 + 𝑒 = 𝑥

      Examples:
      • 3 * 2 (numbers under multiplication, 1 is the identity element)
      • 1 + 5 (numbers under addition, 0 is the identity element)
      • “ab” + “cd” (strings under concatenation, the empty string is the identity element)
      • many more
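
The definition translates almost directly into code. The `Monoid` interface below is a minimal, hypothetical one written for this sketch (it is not taken from any library); the three instances mirror the slide's examples.

```java
import java.util.List;
import java.util.function.BinaryOperator;

// A minimal, hypothetical Monoid interface; not taken from any library.
interface Monoid<T> {
    T empty();            // identity element e
    T combine(T x, T y);  // associative binary operation +

    // Reduce a list left to right, starting from the identity element.
    default T combineAll(List<T> xs) {
        T acc = empty();
        for (T x : xs) {
            acc = combine(acc, x);
        }
        return acc;
    }
}

final class Monoids {
    // Build a monoid from an identity element and an operation. Nothing here
    // verifies associativity or identity; the caller must guarantee them.
    static <T> Monoid<T> of(T identity, BinaryOperator<T> plus) {
        return new Monoid<T>() {
            public T empty() { return identity; }
            public T combine(T x, T y) { return plus.apply(x, y); }
        };
    }

    static final Monoid<Long> SUM = of(0L, Long::sum);
    static final Monoid<Long> PRODUCT = of(1L, (x, y) -> x * y);
    static final Monoid<String> CONCAT = of("", String::concat);

    public static void main(String[] args) {
        System.out.println(SUM.combineAll(List.of(1L, 5L)));
        System.out.println(PRODUCT.combineAll(List.of(3L, 2L)));
        System.out.println(CONCAT.combineAll(List.of("ab", "cd")));
    }
}
```

Reducing an empty list returns the identity element, which is one practical reason to prefer a monoid over a bare semigroup.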
  27. [Diagram: input data → four parallel map tasks → three reduce tasks → output]

      Given a sequence of elements of a monoid M, we can reduce them into a final value.
      Associativity ensures that we can parallelize the computation (not exactly true).
      Identity allows us to skip elements that don’t affect the result.
  28. Associativity: (𝑥 + 𝑦) + 𝑧 = 𝑥 + (𝑦 + 𝑧)

      General Associativity Theorem
      https://proofwiki.org/wiki/General_Associativity_Theorem

      Given 𝑎 + 𝑏 + 𝑐 + 𝑑 + 𝑒 + 𝑓 + 𝑔 + ℎ, you can place parentheses anywhere:
      ((𝑎 + 𝑏) + (𝑐 + 𝑑)) + (𝑒 + 𝑓 + 𝑔 + ℎ)
      or
      (𝑎 + 𝑏 + 𝑐 + 𝑑) + (𝑒 + 𝑓 + 𝑔 + ℎ)
  29. [Diagram: a tree of + operations reducing 𝑎 𝑏 𝑐 𝑑 𝑒 𝑓 𝑔 ℎ to a single value]
  30. [Diagram: a tree of + operations reducing 𝑎 𝑏 𝑐 𝑑 𝑒 𝑓 𝑔 ℎ to a single value]
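
The regrouping argument can be checked with string concatenation, which is associative but not commutative. The class and method names are made up for this sketch; `Stream.reduce` and `parallelStream` are standard JDK calls.

```java
import java.util.List;

final class Regrouping {
    static final List<String> LETTERS = List.of("a", "b", "c", "d", "e", "f", "g", "h");

    // One long left-to-right chain: ((((a + b) + c) + d) + ...) + h
    static String sequential() {
        return LETTERS.stream().reduce("", String::concat);
    }

    // Two halves reduced independently, then combined:
    // (a + b + c + d) + (e + f + g + h)
    static String twoHalves() {
        String left = LETTERS.subList(0, 4).stream().reduce("", String::concat);
        String right = LETTERS.subList(4, 8).stream().reduce("", String::concat);
        return left + right;
    }

    // The JDK's parallel reduce relies on the same property: the operation
    // must be associative and "" must be its identity.
    static String parallel() {
        return LETTERS.parallelStream().reduce("", String::concat);
    }

    public static void main(String[] args) {
        System.out.println(sequential());
        System.out.println(twoHalves());
        System.out.println(parallel());
    }
}
```

All three groupings agree because the chunks stay in their original order. If chunks could arrive in any order, the operation would also need to be commutative, which may be what the "(not exactly true)" caveat on slide 27 refers to.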
  31. [Diagram: a time axis marked "1h" and "now", split into batches 𝐵0 … 𝐵7 holding the values a … h. Batch processing covers a + b + c + d + e + f; real-time processing covers the most recent batches.]

      • Real-time sums from 0 for each batch
      • Batch processing recomputes the total sum
  32. [Diagram: same timeline as slide 31]

      Query and sum real-time + batch:
      (𝑎 + 𝑏 + 𝑐 + 𝑑 + 𝑒 + 𝑓) + 𝑔 + ℎ
      (this is where Semigroup is required)
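
A small sketch of that query step, using (Long, +) as the semigroup. The class name, method names, and numbers are invented for illustration; a real system would read the two views from separate stores.

```java
import java.util.List;

final class LambdaQuery {
    // Sum a list of per-batch values with the semigroup (Long, +).
    static long sum(List<Long> values) {
        return values.stream().reduce(0L, Long::sum);
    }

    // Serving layer: combine the precomputed batch view with the real-time
    // partial sums for the batches the batch layer has not covered yet.
    static long query(long batchView, List<Long> realtimeSums) {
        return batchView + sum(realtimeSums);
    }

    public static void main(String[] args) {
        List<Long> all = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L); // a … h
        long batchView = sum(all.subList(0, 6));                   // a + … + f
        List<Long> realtime = all.subList(6, 8);                   // g, h
        // Associativity guarantees the merged answer equals the full sum.
        System.out.println(query(batchView, realtime) == sum(all));
    }
}
```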
  33. A Bloom filter is a space-efficient probabilistic data structure for testing whether an element is present in a set.

      Consists of:
      • 𝑘 hash functions: ℎ1, ℎ2, …, ℎ𝑘
      • a bit array of 𝑚 bits: 0 0 0 0 0 0 0 0 0 0 0 0

      Operations:
      • Insert an element
      • Query whether an element is present. The answer is either No or Maybe (false positives are possible)
  34. Insert: for element 𝑒, compute ℎ1(𝑒), ℎ2(𝑒), …, ℎ𝑘(𝑒) and set the bit at each of those positions to 1

      0 0 1 0 0 0 0 1 0 1 0 0
  35. Query: for element 𝑒, compute ℎ1(𝑒), ℎ2(𝑒), …, ℎ𝑘(𝑒) and check whether all of those bits are set to 1

      0 0 1 0 1 0 1 1 0 0 0 0
  36. Filter A: {𝑒1, 𝑒2, 𝑒3}
      0 0 1 0 1 0 0 1 0 0 0 0

      Filter B: {𝑒4, 𝑒5, 𝑒6}
      1 0 1 0 0 0 0 0 1 0 0 0

      Filter A + B: {𝑒1, 𝑒2, 𝑒3, 𝑒4, 𝑒5, 𝑒6}, where + is bitwise OR
      1 0 1 0 1 0 0 1 1 0 0 0
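
Slides 33 to 36 can be condensed into a toy implementation. This is a sketch under stated assumptions, not how Algebird or any production library does it: the `BloomFilter` class is hypothetical, and the 𝑘 hash positions are derived from `String.hashCode` by double hashing, which was chosen only for brevity.

```java
import java.util.BitSet;

// A toy Bloom filter. The hashing scheme is an assumption for illustration.
final class BloomFilter {
    private final int m;        // number of bits
    private final int k;        // number of hash functions
    private final BitSet bits;

    BloomFilter(int m, int k) {
        this(m, k, new BitSet(m));
    }

    private BloomFilter(int m, int k, BitSet bits) {
        this.m = m;
        this.k = k;
        this.bits = bits;
    }

    // Position given by the i-th hash of e, built from two base hashes.
    private int position(String e, int i) {
        int h1 = e.hashCode();
        int h2 = (h1 >>> 16) | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, m);
    }

    // Insert: set the bit at every hash position to 1.
    void insert(String e) {
        for (int i = 0; i < k; i++) {
            bits.set(position(e, i));
        }
    }

    // Query: false means "No", true means "Maybe" (false positives possible).
    boolean mightContain(String e) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(e, i))) {
                return false;
            }
        }
        return true;
    }

    // The + from the slide: bitwise OR of two filters with the same m and k.
    // The empty filter acts as the identity element.
    BloomFilter plus(BloomFilter other) {
        if (m != other.m || k != other.k) {
            throw new IllegalArgumentException("filters must share m and k");
        }
        BitSet merged = (BitSet) bits.clone();
        merged.or(other.bits);
        return new BloomFilter(m, k, merged);
    }

    public static void main(String[] args) {
        BloomFilter a = new BloomFilter(64, 3);
        a.insert("e1");
        BloomFilter b = new BloomFilter(64, 3);
        b.insert("e4");
        BloomFilter ab = a.plus(b);
        System.out.println(ab.mightContain("e1") && ab.mightContain("e4"));
    }
}
```

Because OR is associative and commutative, filters built on separate machines or for separate time windows can be merged in any grouping and order.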
  37. A few such structures can be found in Algebird (Abstract Algebra for Scala): https://github.com/twitter/algebird/
      • Bloom Filter
      • HyperLogLog
      • CountMinSketch
      • TopK
      • etc.
  38. • A monad is just a useful pattern in functional programming
      • You don’t need to understand Category Theory to use monads
      • Once you grasp the idea, you will see this pattern everywhere
      • Semigroup (commutative) and monoid define properties useful in distributed computing and the Lambda Architecture
      • It’s all about associativity and commutativity. No nonsense!
