SlideShare une entreprise Scribd logo
1  sur  26
Télécharger pour lire hors ligne
Jul 2014
WHY FUNCTIONAL?
WHY SCALA?
Neville Li
@sinisa_lyh
MONOID!
Actuallyit'sa semigroup,monoid just soundsmore interesting :)
A Little Teaser
Crunch:CombineFns are used to representthe associative operations...
PGroupedTable<K,V>::combineValues(CombineFn<K,V>combineFn,
CombineFn<K,V>reduceFn)
Scalding:reduce with fn which mustbe associative and commutative
KeyedList[K,T]::reduce(fn:(T,T)=>T)
Spark:Merge the values for each key using an associative reduce function
PairRDDFunctions[K,V]::reduceByKey(fn:(V,V)=>V)
All ofthem work on both mapper and reducer side
0
MY STORY
Before
Mostly Python/C++ (and PHP...)
No Java experience at all
Started using Scala early 2013
Now
Discovery's* Java backend/riemann guy
The Scalding/Spark/Storm guy
Contributor to Spark, chill, cascading.avro
*Spotify'smachine learning and recommendation team
WHY THIS TALK?
Not a tutorial
Discovery's experience
Why FP matters
Why Scala matters
Common misconceptions
WHAT WE ALREADY USE
Kafka
Scalding
Spark / MLLib
Stratosphere
Storm / Riemann (Clojure)
WHAT WE WANT TO INVESTIGATE
Summingbird (Scala for Storm + Hadoop)
Spark Streaming
Shark / SparkSQL
GraphX (Spark)
BIDMach (GPU ML with GPU)
DISCOVERY
Mid 2013: 100+ Python jobs
10+ hires since (half since new year)
Few with Java experience, none with Scala
As of May 2014: ~100 Scalding jobs & 90 tests
More uncommited ad-hoc jobs
12+ commiters, 4+ using Spark
DISCOVERY
rec-sys-scalding.git
DISCOVERY
GUESS HOW MANY JOBS
WRITTEN BY YOURS TRUELY?
3
WHY FUNCTIONAL
Immutable data
Copy and transform
Not mutate in place
HDFS with M/R jobs
Storm tuples, Riemann streams
WHY FUNCTIONAL
Higher order functions
Expressions, not statements
Focus on problem solving
Not solving programming problems
WHY FUNCTIONAL
Word count in Python
lyrics=["WeallliveinAmerika","Amerikaistwunderbar"]
wc=defaultdict(int)
forlinlyrics:
forwinl.split():
wc[w]+=1
Screen too small for the Java version
WHY FUNCTIONAL
Map and reduce are key concepts in FP
vallyrics=List("WeallliveinAmerika","Amerikaistwunderbar")
lyrics.flatMap(_.split("")) //map
.groupBy(identity) //shuffle
.map{case(k,g)=>(k,g.size)} //reduce
(deflyrics["WeallliveinAmerika""Amerikaistwunderbar"])
(->>lyrics(mapcat#(clojure.string/split%#"s"))
(group-byidentity)
(map(fn[[kg]][k(countg)])))
importControl.Arrow
importData.List
letlyrics=["WeallliveinAmerika","Amerikaistwunderbar"]
mapwords>>>concat
>>>sort>>>group
>>>map(x->(headx,lengthx))$lyrics
WHY FUNCTIONAL
Linear equation in ALS matrixfactorization
= ( Y + ( − I)Y p(u)xu Y
T
Y
T
C
u
)
−1
Y
T
C
u
vectors.map{case(id,vec)=>(id,vec*vec.T)} //YtY
.map(_._2).reduce(_+_)
ratings.keyBy(fixedKey).join(outerProducts) //YtCuIY
.map{case(_,(r,op))=>(solveKey(r),op*(r.rating*alpha))}
.reduceByKey(_+_)
ratings.keyBy(fixedKey).join(vectors) //YtCupu
.map{case(_,(r,vec))=>
valCui=r.rating*alpha+1
valpui=if(Cui>0.0)1.0else0.0
(solveKey(r),vec*(Cui*pui))
}.reduceByKey(_+_)
WHY SCALA
JVM - libraries and tools
Pythonesque syntax
Static typing with inference
Transition from imperative to FP
WHY SCALA
Performance vs. agility
http://nicholassterling.wordpress.com/2012/11/16/scala-performance/
WHY SCALA
Type inference
classComplexDecorationService{
publicList<ListenableFuture<Map<String,Metadata>>>
lookupMetadata(List<String>keys){/*...*/}
}
valdata=service.lookupMetadata(keys)
typeDF=List[ListenableFuture[Map[String,Track]]]
defprocess(data:DF)={/*...*/}
WHY SCALA
Higher order functions
List<Integer>list=Lists.newArrayList(1,2,3);
Lists.transform(list,newFunction<Integer,Integer>(){
@Override
publicIntegerapply(Integerinput){
returninput+1;
}
});
vallist=List(1,2,3)
list.map(_+1) //List(2,3,4)
And then imagine ifyou have to chain or nested functions
WHY SCALA
Collections API
vall=List(1,2,3,4,5)
l.map(_+1) //List(2,3,4,5,6)
l.filter(_>3) //45
l.zip(List("a","b","c")).toMap //Map(1->a,2->b,3->c)
l.partition(_%2==0) //(List(2,4),List(1,3,5))
List(l,l.map(_*2)).flatten //List(1,2,3,4,5,2,4,6,8,10)
l.reduce(_+_) //15
l.fold(100)(_+_) //115
"WeallliveinAmerika".split("").groupBy(_.size)
//Map(2->Array(We,in),4->Array(live),
// 7->Array(Amerika),3->Array(all))
WHY SCALA
Scalding field based word count
TextLine(path))
.flatMap('line->'word){line:String=>line.split("""W+""")}
.groupBy('word){_.size}
Scalding type-safe word count
TextLine(path).read.toTypedPipe[String](Fields.ALL)
.flatMap(_.split(""W+""))
.groupBy(identity).size
Scrunch word count
read(from.textFile(file))
.flatMap(_.split("""W+""")
.count
WHY SCALA
Summingbird word count
source
.flatMap{line:String=>line.split("""W+""").map((_,1))}
.sumByKey(store)
Spark word count
sc.textFile(path)
.flatMap(_.split("""W+"""))
.map(word=>(word,1))
.reduceByKey(_+_)
Stratosphere word count
TextFile(textInput)
.flatMap(_.split("""W+"""))
.map(word=>(word,1))
.groupBy(_._1)
.reduce{(w1,w2)=>(w1._1,w1._2+w2._2)}
WHY SCALA
Many patterns also common in Java
Java 8 lambdas and streams
Guava, Crunch, etc.
Optional, Predicate
Collection transformations
ListenableFuture and transform
parallelDo, DoFn, MapFn, CombineFn
COMMON MISCONCEPTIONS
It's complex
True for language features
Not from user's perspective
We only use 20% features
Not more than needed in Java
COMMON MISCONCEPTIONS
It's slow
No slower than Python
Depend on how pure FP
Trade off with productivity
Drop down to Java or native libraries
COMMON MISCONCEPTIONS
I don't want to learn a new language
How about flatMap, reduce, fold, etc.?
Unnecessary overhead
interfacing with Python or Java
You've used monoids, monads,
or higher order functions already
THE END
THANK YOU

Contenu connexe

Tendances

Orthogonal Functional Architecture
Orthogonal Functional ArchitectureOrthogonal Functional Architecture
Orthogonal Functional Architecture
John De Goes
 

Tendances (11)

Beginning Scala Svcc 2009
Beginning Scala Svcc 2009Beginning Scala Svcc 2009
Beginning Scala Svcc 2009
 
Functional Programming with JavaScript
Functional Programming with JavaScriptFunctional Programming with JavaScript
Functional Programming with JavaScript
 
ZIO Queue
ZIO QueueZIO Queue
ZIO Queue
 
20191116 custom operators in swift
20191116 custom operators in swift20191116 custom operators in swift
20191116 custom operators in swift
 
Functional Programming in Scala
Functional Programming in ScalaFunctional Programming in Scala
Functional Programming in Scala
 
Functional Programming in PHP
Functional Programming in PHPFunctional Programming in PHP
Functional Programming in PHP
 
eMan Dev Meetup: Kotlin For Android (part 03/03) 18.5.2017
eMan Dev Meetup: Kotlin For Android (part 03/03) 18.5.2017eMan Dev Meetup: Kotlin For Android (part 03/03) 18.5.2017
eMan Dev Meetup: Kotlin For Android (part 03/03) 18.5.2017
 
Scala collections
Scala collectionsScala collections
Scala collections
 
Sneaking inside Kotlin features
Sneaking inside Kotlin featuresSneaking inside Kotlin features
Sneaking inside Kotlin features
 
Learning Functional Programming Without Growing a Neckbeard
Learning Functional Programming Without Growing a NeckbeardLearning Functional Programming Without Growing a Neckbeard
Learning Functional Programming Without Growing a Neckbeard
 
Orthogonal Functional Architecture
Orthogonal Functional ArchitectureOrthogonal Functional Architecture
Orthogonal Functional Architecture
 

En vedette

Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)
stasimus
 
Monad presentation scala as a category
Monad presentation   scala as a categoryMonad presentation   scala as a category
Monad presentation scala as a category
samthemonad
 
Introduction to Monads in Scala (1)
Introduction to Monads in Scala (1)Introduction to Monads in Scala (1)
Introduction to Monads in Scala (1)
stasimus
 
Introduction to Functional Programming with Scala
Introduction to Functional Programming with ScalaIntroduction to Functional Programming with Scala
Introduction to Functional Programming with Scala
pramode_ce
 

En vedette (11)

Scala the language matters
Scala the language mattersScala the language matters
Scala the language matters
 
Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)Introduction to Monads in Scala (2)
Introduction to Monads in Scala (2)
 
Monad presentation scala as a category
Monad presentation   scala as a categoryMonad presentation   scala as a category
Monad presentation scala as a category
 
Introduction to Option monad in Scala
Introduction to Option monad in ScalaIntroduction to Option monad in Scala
Introduction to Option monad in Scala
 
Thinking functional-in-scala
Thinking functional-in-scalaThinking functional-in-scala
Thinking functional-in-scala
 
Introduction to Monads in Scala (1)
Introduction to Monads in Scala (1)Introduction to Monads in Scala (1)
Introduction to Monads in Scala (1)
 
Developers Summit 2015 - Scala Monad
Developers Summit 2015 - Scala MonadDevelopers Summit 2015 - Scala Monad
Developers Summit 2015 - Scala Monad
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
 
Scala: functional programming for the imperative mind
Scala: functional programming for the imperative mindScala: functional programming for the imperative mind
Scala: functional programming for the imperative mind
 
Functional Programming Fundamentals
Functional Programming FundamentalsFunctional Programming Fundamentals
Functional Programming Fundamentals
 
Introduction to Functional Programming with Scala
Introduction to Functional Programming with ScalaIntroduction to Functional Programming with Scala
Introduction to Functional Programming with Scala
 

Similaire à Why functional why scala

Refactoring Functional Type Classes
Refactoring Functional Type ClassesRefactoring Functional Type Classes
Refactoring Functional Type Classes
John De Goes
 
Spark by Adform Research, Paulius
Spark by Adform Research, PauliusSpark by Adform Research, Paulius
Spark by Adform Research, Paulius
Vasil Remeniuk
 
The Magnificent Seven
The Magnificent SevenThe Magnificent Seven
The Magnificent Seven
Mike Fogus
 

Similaire à Why functional why scala (20)

FP in scalaで鍛える関数型脳
FP in scalaで鍛える関数型脳FP in scalaで鍛える関数型脳
FP in scalaで鍛える関数型脳
 
Type class survival guide
Type class survival guideType class survival guide
Type class survival guide
 
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
Neatly Hashing a Tree: FP tree-fold in Perl5 & Perl6
 
Functional perl
Functional perlFunctional perl
Functional perl
 
Implementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickImplementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with Slick
 
Refactoring Functional Type Classes
Refactoring Functional Type ClassesRefactoring Functional Type Classes
Refactoring Functional Type Classes
 
Spark by Adform Research, Paulius
Spark by Adform Research, PauliusSpark by Adform Research, Paulius
Spark by Adform Research, Paulius
 
Fp in scala part 1
Fp in scala part 1Fp in scala part 1
Fp in scala part 1
 
Metaprogramming in Haskell
Metaprogramming in HaskellMetaprogramming in Haskell
Metaprogramming in Haskell
 
Is Haskell an acceptable Perl?
Is Haskell an acceptable Perl?Is Haskell an acceptable Perl?
Is Haskell an acceptable Perl?
 
Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017
Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017 Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017
Davide Cerbo - Kotlin: forse è la volta buona - Codemotion Milan 2017
 
Composition birds-and-recursion
Composition birds-and-recursionComposition birds-and-recursion
Composition birds-and-recursion
 
The Magnificent Seven
The Magnificent SevenThe Magnificent Seven
The Magnificent Seven
 
Advance Scala - Oleg Mürk
Advance Scala - Oleg MürkAdvance Scala - Oleg Mürk
Advance Scala - Oleg Mürk
 
Rewriting Java In Scala
Rewriting Java In ScalaRewriting Java In Scala
Rewriting Java In Scala
 
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & AnalyticsQuark: A Purely-Functional Scala DSL for Data Processing & Analytics
Quark: A Purely-Functional Scala DSL for Data Processing & Analytics
 
Scalding - the not-so-basics @ ScalaDays 2014
Scalding - the not-so-basics @ ScalaDays 2014Scalding - the not-so-basics @ ScalaDays 2014
Scalding - the not-so-basics @ ScalaDays 2014
 
JSDC 2014 - functional java script, why or why not
JSDC 2014 - functional java script, why or why notJSDC 2014 - functional java script, why or why not
JSDC 2014 - functional java script, why or why not
 
Practical scalaz
Practical scalazPractical scalaz
Practical scalaz
 
Fp in scala with adts part 2
Fp in scala with adts part 2Fp in scala with adts part 2
Fp in scala with adts part 2
 

Plus de Neville Li

Plus de Neville Li (7)

Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache BeamScio - A Scala API for Google Cloud Dataflow & Apache Beam
Scio - A Scala API for Google Cloud Dataflow & Apache Beam
 
Scio
ScioScio
Scio
 
From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...From stream to recommendation using apache beam with cloud pubsub and cloud d...
From stream to recommendation using apache beam with cloud pubsub and cloud d...
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at Spotify
 

Dernier

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Dernier (20)

How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 

Why functional why scala

  • 1. Jul 2014 WHY FUNCTIONAL? WHY SCALA? Neville Li @sinisa_lyh
  • 2. MONOID! Actuallyit'sa semigroup,monoid just soundsmore interesting :) A Little Teaser Crunch:CombineFns are used to representthe associative operations... PGroupedTable<K,V>::combineValues(CombineFn<K,V>combineFn, CombineFn<K,V>reduceFn) Scalding:reduce with fn which mustbe associative and commutative KeyedList[K,T]::reduce(fn:(T,T)=>T) Spark:Merge the values for each key using an associative reduce function PairRDDFunctions[K,V]::reduceByKey(fn:(V,V)=>V) All ofthem work on both mapper and reducer side 0
  • 3. MY STORY Before Mostly Python/C++ (and PHP...) No Java experience at all Started using Scala early 2013 Now Discovery's* Java backend/riemann guy The Scalding/Spark/Storm guy Contributor to Spark, chill, cascading.avro *Spotify'smachine learning and recommendation team
  • 4. WHY THIS TALK? Not a tutorial Discovery's experience Why FP matters Why Scala matters Common misconceptions
  • 5. WHAT WE ALREADY USE Kafka Scalding Spark / MLLib Stratosphere Storm / Riemann (Clojure)
  • 6. WHAT WE WANT TO INVESTIGATE Summingbird (Scala for Storm + Hadoop) Spark Streaming Shark / SparkSQL GraphX (Spark) BIDMach (GPU ML with GPU)
  • 7. DISCOVERY Mid 2013: 100+ Python jobs 10+ hires since (half since new year) Few with Java experience, none with Scala As of May 2014: ~100 Scalding jobs & 90 tests More uncommited ad-hoc jobs 12+ commiters, 4+ using Spark
  • 9. DISCOVERY GUESS HOW MANY JOBS WRITTEN BY YOURS TRUELY? 3
  • 10. WHY FUNCTIONAL Immutable data Copy and transform Not mutate in place HDFS with M/R jobs Storm tuples, Riemann streams
  • 11. WHY FUNCTIONAL Higher order functions Expressions, not statements Focus on problem solving Not solving programming problems
  • 12. WHY FUNCTIONAL Word count in Python lyrics=["WeallliveinAmerika","Amerikaistwunderbar"] wc=defaultdict(int) forlinlyrics: forwinl.split(): wc[w]+=1 Screen too small for the Java version
  • 13. WHY FUNCTIONAL Map and reduce are key concepts in FP vallyrics=List("WeallliveinAmerika","Amerikaistwunderbar") lyrics.flatMap(_.split("")) //map .groupBy(identity) //shuffle .map{case(k,g)=>(k,g.size)} //reduce (deflyrics["WeallliveinAmerika""Amerikaistwunderbar"]) (->>lyrics(mapcat#(clojure.string/split%#"s")) (group-byidentity) (map(fn[[kg]][k(countg)]))) importControl.Arrow importData.List letlyrics=["WeallliveinAmerika","Amerikaistwunderbar"] mapwords>>>concat >>>sort>>>group >>>map(x->(headx,lengthx))$lyrics
  • 14. WHY FUNCTIONAL Linear equation in ALS matrixfactorization = ( Y + ( − I)Y p(u)xu Y T Y T C u ) −1 Y T C u vectors.map{case(id,vec)=>(id,vec*vec.T)} //YtY .map(_._2).reduce(_+_) ratings.keyBy(fixedKey).join(outerProducts) //YtCuIY .map{case(_,(r,op))=>(solveKey(r),op*(r.rating*alpha))} .reduceByKey(_+_) ratings.keyBy(fixedKey).join(vectors) //YtCupu .map{case(_,(r,vec))=> valCui=r.rating*alpha+1 valpui=if(Cui>0.0)1.0else0.0 (solveKey(r),vec*(Cui*pui)) }.reduceByKey(_+_)
  • 15. WHY SCALA JVM - libraries and tools Pythonesque syntax Static typing with inference Transition from imperative to FP
  • 16. WHY SCALA Performance vs. agility http://nicholassterling.wordpress.com/2012/11/16/scala-performance/
  • 18. WHY SCALA Higher order functions List<Integer>list=Lists.newArrayList(1,2,3); Lists.transform(list,newFunction<Integer,Integer>(){ @Override publicIntegerapply(Integerinput){ returninput+1; } }); vallist=List(1,2,3) list.map(_+1) //List(2,3,4) And then imagine ifyou have to chain or nested functions
  • 19. WHY SCALA Collections API vall=List(1,2,3,4,5) l.map(_+1) //List(2,3,4,5,6) l.filter(_>3) //45 l.zip(List("a","b","c")).toMap //Map(1->a,2->b,3->c) l.partition(_%2==0) //(List(2,4),List(1,3,5)) List(l,l.map(_*2)).flatten //List(1,2,3,4,5,2,4,6,8,10) l.reduce(_+_) //15 l.fold(100)(_+_) //115 "WeallliveinAmerika".split("").groupBy(_.size) //Map(2->Array(We,in),4->Array(live), // 7->Array(Amerika),3->Array(all))
  • 20. WHY SCALA Scalding field based word count TextLine(path)) .flatMap('line->'word){line:String=>line.split("""W+""")} .groupBy('word){_.size} Scalding type-safe word count TextLine(path).read.toTypedPipe[String](Fields.ALL) .flatMap(_.split(""W+"")) .groupBy(identity).size Scrunch word count read(from.textFile(file)) .flatMap(_.split("""W+""") .count
  • 21. WHY SCALA Summingbird word count source .flatMap{line:String=>line.split("""W+""").map((_,1))} .sumByKey(store) Spark word count sc.textFile(path) .flatMap(_.split("""W+""")) .map(word=>(word,1)) .reduceByKey(_+_) Stratosphere word count TextFile(textInput) .flatMap(_.split("""W+""")) .map(word=>(word,1)) .groupBy(_._1) .reduce{(w1,w2)=>(w1._1,w1._2+w2._2)}
  • 22. WHY SCALA Many patterns also common in Java Java 8 lambdas and streams Guava, Crunch, etc. Optional, Predicate Collection transformations ListenableFuture and transform parallelDo, DoFn, MapFn, CombineFn
  • 23. COMMON MISCONCEPTIONS It's complex True for language features Not from user's perspective We only use 20% features Not more than needed in Java
  • 24. COMMON MISCONCEPTIONS It's slow No slower than Python Depend on how pure FP Trade off with productivity Drop down to Java or native libraries
  • 25. COMMON MISCONCEPTIONS I don't want to learn a new language How about flatMap, reduce, fold, etc.? Unnecessary overhead interfacing with Python or Java You've used monoids, monads, or higher order functions already