Spark As A Distributed Scala
Who is who?
Alexey Zvolinskiy (@Fruzenshtein)
~4 years of Scala experience
Taking the Functional Programming in Scala Specialization on Coursera
Currently interested in Scala, Akka, Spark…
I write a lot on my blog: www.Fruzenshtein.com
Why Scala?
What makes Scala so great?
1. Functional programming language*
2. Immutability
3. Type system
4. Collections API
5. Pattern matching
6. Implicits
Functional programming language
1. Functions are first-class citizens
2. Totality
3. Determinism
4. Purity
[Diagram: a function A => B maps inputs A1, A2, …, An to outputs B1, B2, …, Bn; each input Ai yields an output Bi.]
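The four properties listed on this slide can be illustrated with a small sketch. The object and function names here (`FunctionProperties`, `double`, `unsafeDivide`, `safeDivide`, `twice`) are hypothetical and do not come from the talk; they only show one way to read "totality", "determinism" and "purity" in plain Scala.

```scala
object FunctionProperties {
  // Total, deterministic and pure: defined for every Int,
  // the same input always gives the same output, no side effects.
  def double(n: Int): Int = n * 2

  // Not total: throws ArithmeticException when d == 0.
  def unsafeDivide(n: Int, d: Int): Int = n / d

  // Total again: the missing result is made explicit in the return type.
  def safeDivide(n: Int, d: Int): Option[Int] =
    if (d == 0) None else Some(n / d)

  // Functions are values: they can be stored and passed to other functions.
  val twice: (Int => Int) => Int => Int = f => x => f(f(x))
}
```

For example, `FunctionProperties.twice(FunctionProperties.double)(3)` applies `double` two times to 3.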
Immutability
1. Makes code more predictable
2. Reduces the effort needed to understand code
3. Key to thread safety
Books:
Java Concurrency in Practice
Effective Java, 2nd Edition
Type system
1. Static typing
2. Type inference
3. Bounds
Map[V, K]
List[T1 <: T2]
Set[+T]
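As a rough illustration of the notation on this slide, the sketch below shows an upper bound (`T <: Animal`) and a covariant type parameter (`+T`). All names in it (`Animal`, `Dog`, `Cat`, `Box`, `names`) are made up for the example and are not part of the talk.

```scala
object TypeSystemSketch {
  sealed trait Animal { def name: String }
  final case class Dog(name: String) extends Animal
  final case class Cat(name: String) extends Animal

  // Upper bound: T must be a subtype of Animal, so `.name` is available.
  def names[T <: Animal](xs: List[T]): List[String] = xs.map(_.name)

  // Covariance: Box[+T] means a Box[Dog] can be used as a Box[Animal].
  final case class Box[+T](value: T)
  val dogBox: Box[Dog] = Box(Dog("Rex"))
  val animalBox: Box[Animal] = dogBox

  // Type inference: no annotation needed; the compiler infers a list
  // type whose elements conform to Animal.
  val pets = List(Dog("Rex"), Cat("Tom"))
}
```

Static typing shows up here as a compile-time check: `names(List(1, 2, 3))` would be rejected because `Int` is not a subtype of `Animal`.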
Collections API
val numbers = List(1,2,3,4,5,6,7,8,9,10)
numbers.filter(_ % 2 == 0)  // filter(n: Int => Boolean), here n => n % 2 == 0
  .map(_ * 10)              // map(n: Int => Int), here n => n * 10
// List(20, 40, 60, 80, 100)
Collections API
val groupsOfStudents = List(
  List(("Alex", 65), ("Kate", 87), ("Sam", 98)),
  List(("Peter", 84), ("Bob", 79), ("Samanta", 71)),
  List(("Rob", 82), ("Jack", 55), ("Ann", 90))
)
groupsOfStudents.flatMap(students => students)  // flatten the groups into one list
  .groupBy(student => student._2 > 75)          // split by "score above 75"
  .getOrElse(true, Nil)                         // safer than .get(true).get when nobody passes
// List((Kate,87), (Sam,98), (Peter,84), (Bob,79), (Rob,82), (Ann,90))
So what?!
=Parallelism=
Idea of parallelism
How to divide a problem into subproblems?
How to use the hardware optimally?
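One way to picture "dividing a problem into subproblems" is to split the input into chunks, process each chunk concurrently, and combine the partial results. The sketch below uses standard-library Futures; `ChunkedSum`, `parallelSum` and the `chunkSize` parameter are hypothetical names chosen for this illustration, not something shown in the talk.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ChunkedSum {
  // Split the input into fixed-size chunks, sum each chunk in its own
  // Future, then combine the partial sums. chunkSize must be positive.
  def parallelSum(xs: Vector[Long], chunkSize: Int): Long = {
    val partials: List[Future[Long]] =
      xs.grouped(chunkSize).map(chunk => Future(chunk.sum)).toList
    Await.result(Future.sequence(partials), 1.minute).sum
  }
}
```

The chunk size is the knob behind the second question on the slide: chunks that are too small waste time on scheduling, while chunks that are too large leave cores idle.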
Parallelism background
Scala parallel collections
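val from0to100000: Range = 0 until 100000
val list = from0to100000.toList
// On Scala 2.13+, .par needs the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
val parList = list.par  // scala.collection.parallel.immutable.ParSeq[Int]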
Some benchmarks
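// Deliberately naive primality test; the n > 1 guard keeps 0 and 1 out
def isPrime(n: Int): Boolean =
  n > 1 && !((2 until n - 1) exists (n % _ == 0))

// Sequential: ten timed runs over the plain List
val list = from0to100000.toList
for (i <- 1 to 10) {
  val t0 = System.currentTimeMillis()
  list.filter(isPrime(_))
  println(System.currentTimeMillis - t0)
}

// Parallel: the same ten runs over the parallel collection
val parList = list.par
for (i <- 1 to 10) {
  val t1 = System.currentTimeMillis()
  parList.filter(isPrime(_))
  println(System.currentTimeMillis - t1)
}

Results (ms, one value per run):
Sequential: 7106, 6467, 6315, 6275, 6478, 8732, 6543, 6296, 6299, 6286
Parallel:   5130, 5106, 4649, 4568, 4580, 4446, 4447, 4437, 4290, 4476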
OK, but what about Spark?!
Why distributed computations?
Single machine (shared memory): parallel collections (Scala)
Multiple nodes (network): RDDs (Spark)
Almost the same API
RDD example
val tweets: RDD[Tweet] = …
tweets.filter(_.body.contains("bigdata"))
Latency
Numbers from Jeff Dean (http://research.google.com/people/jeff/), https://gist.github.com/2841832
Graph and scale by Thomas Lee
Computation model
memory: seconds to days
disk: weeks to months
network: weeks to years
Spark transformations & actions
1. Transformations are lazy: map, filter, flatMap, …
2. Actions are eager: reduce, collect, count, …
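// Only transformations: nothing is computed yet
val tweets: RDD[Tweet] = …
tweets.filter(_.body.contains("bigdata"))
  .map(t => (t.author, t.body))

// collect() is an action: it triggers the whole pipeline
val tweets: RDD[Tweet] = …
tweets.filter(_.body.contains("bigdata"))
  .map(t => (t.author, t.body))
  .collect()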
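The lazy/eager split can be tried locally without a cluster. The sketch below is only an analogy: it uses a standard Scala collection view in place of an RDD, and `toList` plays the role of an action such as `collect()`. The object name and the `evaluated` counter are hypothetical helpers added to make the evaluation order visible.

```scala
object LazyVsEager {
  // Counts how many times the predicate has run.
  var evaluated = 0

  val numbers = List(1, 2, 3, 4, 5, 6)

  // "Transformations": on a view, filter and map only describe the work.
  val pipeline = numbers.view
    .filter { n => evaluated += 1; n % 2 == 0 }
    .map(_ * 10)

  // Nothing has been evaluated at this point.
  val evaluatedBeforeAction = evaluated

  // "Action": forcing the view runs the whole pipeline.
  val result = pipeline.toList
  val evaluatedAfterAction = evaluated
}
```

Spark behaves in a similar way for RDDs: chaining `filter` and `map` only records the lineage, and work starts when an action is called.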
Rules of thumb
1. Cache
2. Apply efficiently
3. Avoid shuffling
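val tweets: RDD[Tweet] = …
val cachedTweets = tweets.cache()

// Filter first: map runs only on the matching tweets
cachedTweets.filter(_.body.contains("USA"))
  .map(t => (t.author, t.body))

// Map first: every tweet is mapped before most are discarded
cachedTweets.map(t => (t.author, t.body))
  .filter(_._2.contains("USA"))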
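The cost difference between the two orderings can be seen on a plain Scala list by counting how often the mapping function runs. This is a local sketch, not Spark code: the `Tweet` case class mirrors the `author` and `body` fields used on the slides, while the sample tweets, `toPair` and the counters are invented for the illustration.

```scala
object FilterBeforeMap {
  final case class Tweet(author: String, body: String)

  // Hypothetical sample data.
  val tweets = List(
    Tweet("ann", "Greetings from the USA"),
    Tweet("bob", "Learning Scala"),
    Tweet("kate", "USA road trip"),
    Tweet("sam", "Spark meetup tonight")
  )

  // Mapping function instrumented with a call counter.
  var mapCalls = 0
  def toPair(t: Tweet): (String, String) = { mapCalls += 1; (t.author, t.body) }

  // Filter first: toPair runs only for the matching tweets.
  val filterThenMap = tweets.filter(_.body.contains("USA")).map(toPair)
  val callsFilterFirst = mapCalls

  // Map first: toPair runs for every tweet.
  mapCalls = 0
  val mapThenFilter = tweets.map(toPair).filter(_._2.contains("USA"))
  val callsMapFirst = mapCalls
}
```

Both orderings produce the same pairs; only the amount of work differs, which is the point of the "apply efficiently" rule.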
Shuffling
Transaction(id: Int, amount: Int)
We want to know how much money each client spent.
(1, 240) (2, 500) (2, 105)
(3, 100) (1, 200) (1, 500)
(1, 450) (3, 100) (3, 100)
groupByKey()
(2, [500, 105]) (1, [240, 200, 500, 450]) (3, [100, 100, 100])
Reduce before group
(1, 240) (2, 500) (2, 105)
(3, 100) (1, 200) (1, 500)
(1, 450) (3, 100) (3, 100)
reduceByKey(…)
(1, 240) (2, 605)
(3, 100) (1, 700)
(1, 450) (3, 200)
groupByKey()
(2, [605]) (1, [240, 700, 450]) (3, [100, 200])
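The effect of reducing before the shuffle can be modelled with plain Scala collections. The sketch below is not Spark code: it assumes the nine transactions from the slide sit in three partitions of three records each (an assumption that matches the partial sums shown above), and it counts how many records would have to cross the network in each approach. `ReduceBeforeGroup`, `sumByKey` and the other names are hypothetical.

```scala
object ReduceBeforeGroup {
  type Record = (Int, Int) // (client id, amount)

  // The nine transactions from the slide, in three assumed partitions.
  val partitions: List[List[Record]] = List(
    List((1, 240), (2, 500), (2, 105)),
    List((3, 100), (1, 200), (1, 500)),
    List((1, 450), (3, 100), (3, 100))
  )

  // Sum the amounts per client id.
  def sumByKey(records: List[Record]): Map[Int, Int] =
    records.groupBy(_._1).map { case (id, rs) => id -> rs.map(_._2).sum }

  // groupByKey style: every record is shuffled, then summed.
  val shuffledWithoutReduce: List[Record] = partitions.flatten
  val totalsGroupFirst: Map[Int, Int] = sumByKey(shuffledWithoutReduce)

  // reduceByKey style: sum inside each partition first,
  // then shuffle only the partial sums.
  val shuffledAfterLocalReduce: List[Record] =
    partitions.flatMap(p => sumByKey(p).toList)
  val totalsReduceFirst: Map[Int, Int] = sumByKey(shuffledAfterLocalReduce)
}
```

Both approaches give the same totals per client, but the second one moves six records instead of nine in this model. On real data with many transactions per client per partition the gap is usually far larger.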
Thanks :)
@Fruzenshtein
