Expressiveness, Simplicity, and Users
Craig Chambers, Google
A Brief Bio
- MIT: 82-86
  - Argus, with Barbara Liskov, Bill Weihl, Mark Day
- Stanford: 86-91
  - Self, with David Ungar, Urs Hölzle, …
- U. of Washington: 91-07
  - Cecil, MultiJava, ArchJava; Vortex, DyC, Rhodium, ...
  - Jeff Dean, Dave Grove, Jonathan Aldrich, Todd Millstein, Sorin Lerner, …
- Google: 07-
  - Flume, …
Some Questions
- What makes an idea successful?
- Which ideas are adopted most?
- Which ideas have the most impact?
Outline
- Some past projects
  - Self language, Self compiler
  - Cecil language, Vortex compiler
- A current project
  - Flume: data-parallel programming system
Self Language [Ungar & Smith 87]
- Purified essence of Smalltalk-like languages
  - all data are objects
    - no classes
  - all actions are messages
    - field accesses, control structures
- Core ideas are very simple
  - widely cited and understood
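The slide's core idea is small enough to sketch. Below is a minimal Java illustration (not Self syntax, and not how the Self VM represents objects) of a classless object: a table of named slots plus an optional parent, where reading a field and invoking a method are both just a message send. `SelfObject`, `send`, `put`, and `copy` are hypothetical names, and modeling method slots as `BiFunction`s is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of a classless, prototype-style object: named slots plus an
// optional parent to delegate to. Reading a "field" and calling a method are the
// same operation: sending a message.
final class SelfObject {
    private final Map<String, Object> slots = new HashMap<>();
    private final SelfObject parent;

    SelfObject(SelfObject parent) { this.parent = parent; }

    // Define a data slot (any value) or a method slot (a BiFunction of self and argument).
    SelfObject put(String name, Object value) {
        slots.put(name, value);
        return this;
    }

    // Message send: find the slot locally, else along the parent chain.
    Object send(String selector, Object arg) {
        for (SelfObject o = this; o != null; o = o.parent) {
            if (!o.slots.containsKey(selector)) continue;
            Object slot = o.slots.get(selector);
            if (slot instanceof BiFunction) {
                @SuppressWarnings("unchecked")
                BiFunction<SelfObject, Object, Object> method =
                        (BiFunction<SelfObject, Object, Object>) slot;
                return method.apply(this, arg); // "self" stays bound to the original receiver
            }
            return slot; // a data slot just answers its value
        }
        throw new IllegalStateException("message not understood: " + selector);
    }

    Object send(String selector) { return send(selector, null); }

    // No classes: new objects are made by copying an existing one.
    SelfObject copy() {
        SelfObject c = new SelfObject(parent);
        c.slots.putAll(slots);
        return c;
    }
}

class SelfDemo {
    public static void main(String[] args) {
        // A shared "traits" object holds behavior; no class is declared for points.
        SelfObject pointTraits = new SelfObject(null).put("sum",
                (BiFunction<SelfObject, Object, Object>)
                        (self, arg) -> (Integer) self.send("x") + (Integer) self.send("y"));
        SelfObject p = new SelfObject(pointTraits).put("x", 3).put("y", 4);
        System.out.println(p.send("sum"));  // prints 7
        SelfObject q = p.copy().put("x", 10);
        System.out.println(q.send("sum"));  // prints 14
    }
}
```

Because `sum` lives in a shared parent rather than in a class, `q` picks it up unchanged after being copied from `p`; only its own `x` slot differs.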
Self v2 [Chambers, Ungar, Chang 91]
- Added encapsulation and privacy
- Added prioritized multiple inheritance
  - supported both ordered and unordered multiple inheritance
- Sophisticated, or complicated?
- Unified, or kitchen sink?
- Not adopted; dropped from Self v3
Self Compiler [Chambers, Ungar 89-91]
- Dynamic optimizer (an early JIT compiler)
- Customization: specialize code for each receiver class
- Class/type dataflow analysis; lots of inlining
- Lazy compilation of uncommon code paths
- 89: customization + simple analysis: effective
- 90: + complicated analysis: more effective but slow
- 91: + lazy compilation: still more effective, and fast
- [Hölzle, … 92-94]: + dynamic type feedback: zowie!
- Simple analysis + type feedback widely adopted
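Type feedback relies on recording which receiver classes actually show up at each call site. A minimal sketch of that bookkeeping, in the style of a polymorphic inline cache, is below. `CallSite`, the map-based method table, and the `area` example are assumptions for illustration; real VMs keep this information in generated machine code rather than in Java hash maps.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a polymorphic inline cache at a single call site. It caches the
// method found for each receiver class and counts the classes it sees; an optimizer can
// read those counts ("type feedback") to inline the common case behind a class test.
final class CallSite {
    private final String selector;
    private final Map<Class<?>, Map<String, Function<Object, Object>>> methodTable;
    private final Map<Class<?>, Function<Object, Object>> cache = new HashMap<>();
    private final Map<Class<?>, Integer> receiverCounts = new LinkedHashMap<>();
    private int slowLookups = 0;

    CallSite(String selector, Map<Class<?>, Map<String, Function<Object, Object>>> methodTable) {
        this.selector = selector;
        this.methodTable = methodTable;
    }

    Object send(Object receiver) {
        Class<?> cls = receiver.getClass();
        receiverCounts.merge(cls, 1, Integer::sum);   // profile the receiver class
        Function<Object, Object> target = cache.get(cls);
        if (target == null) {                          // miss: full lookup, once per class
            slowLookups++;
            target = methodTable.getOrDefault(cls, Map.of()).get(selector);
            if (target == null) throw new IllegalStateException("message not understood: " + selector);
            cache.put(cls, target);
        }
        return target.apply(receiver);
    }

    int slowLookups() { return slowLookups; }

    // The most frequent receiver class: the case an optimizer would inline.
    Class<?> dominantReceiverClass() {
        return receiverCounts.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}

final class Circle { final double r; Circle(double r) { this.r = r; } }
final class Square { final double s; Square(double s) { this.s = s; } }

class TypeFeedbackDemo {
    public static void main(String[] args) {
        Function<Object, Object> circleArea = o -> Math.PI * ((Circle) o).r * ((Circle) o).r;
        Function<Object, Object> squareArea = o -> ((Square) o).s * ((Square) o).s;
        Map<Class<?>, Map<String, Function<Object, Object>>> table = new HashMap<>();
        table.put(Circle.class, Map.of("area", circleArea));
        table.put(Square.class, Map.of("area", squareArea));

        CallSite site = new CallSite("area", table);
        for (Object shape : new Object[] { new Square(2), new Square(3), new Circle(1), new Square(4) }) {
            site.send(shape);
        }
        System.out.println(site.slowLookups());                            // prints 2
        System.out.println(site.dominantReceiverClass().getSimpleName());  // prints Square
    }
}
```

Only two full lookups happen in this run, and the profile shows `Square` dominating, so an optimizer could inline the `Square` case behind a class test and fall back to an ordinary send otherwise.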
Cecil Language [Chambers, Leavens, Millstein, Litvinov 92-99]
- Pure objects, pure messages
- Multimethods, static typechecking
  - encapsulation
  - modules, modular typechecking
  - constraint-based polymorphic type system
    - integrates F-bounded polymorphism and “where” clauses
  - later: MultiJava, EML [Lee], Diesel, …
- Work on multimethods, “open classes” is well-known
- Multimethods not widely available
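With multimethods, the method body is chosen from the run-time classes of all arguments, not only the receiver. Java dispatches only on the receiver and picks overloads from static types, so this minimal sketch emulates a two-argument multimethod by hand. It is not Cecil syntax or semantics: `Multimethod`, `define`, and `call` are hypothetical, and the rule used (the winning case must be at least as specific as every other applicable case in every argument position, otherwise the call is ambiguous) is an assumed symmetric-dispatch policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical two-argument multimethod: the case is chosen from the run-time
// classes of BOTH arguments, with no argument position taking priority.
final class Multimethod<R> {
    private static final class Case<R> {
        final Class<?> first;
        final Class<?> second;
        final BiFunction<Object, Object, R> body;

        Case(Class<?> first, Class<?> second, BiFunction<Object, Object, R> body) {
            this.first = first;
            this.second = second;
            this.body = body;
        }

        boolean applies(Object a, Object b) {
            return first.isInstance(a) && second.isInstance(b);
        }

        // At least as specific as `other` in every argument position.
        boolean atLeastAsSpecificAs(Case<R> other) {
            return other.first.isAssignableFrom(first) && other.second.isAssignableFrom(second);
        }
    }

    private final List<Case<R>> cases = new ArrayList<>();

    <A, B> Multimethod<R> define(Class<A> first, Class<B> second, BiFunction<A, B, R> body) {
        cases.add(new Case<>(first, second, (a, b) -> body.apply(first.cast(a), second.cast(b))));
        return this;
    }

    R call(Object a, Object b) {
        List<Case<R>> applicable = new ArrayList<>();
        for (Case<R> c : cases) {
            if (c.applies(a, b)) applicable.add(c);
        }
        // The winner must be at least as specific as every other applicable case.
        Case<R> best = null;
        for (Case<R> c : applicable) {
            boolean beatsAll = true;
            for (Case<R> other : applicable) {
                if (!c.atLeastAsSpecificAs(other)) { beatsAll = false; break; }
            }
            if (beatsAll) {
                if (best != null) throw new IllegalStateException("message ambiguous");
                best = c;
            }
        }
        if (best == null) {
            throw new IllegalStateException(applicable.isEmpty() ? "message not understood" : "message ambiguous");
        }
        return best.body.apply(a, b);
    }
}

class Shape {}
class Rect extends Shape {}
class Circle extends Shape {}

class MultimethodDemo {
    public static void main(String[] args) {
        Multimethod<String> intersect = new Multimethod<String>()
                .define(Shape.class, Shape.class, (a, b) -> "general shape/shape algorithm")
                .define(Rect.class, Rect.class, (a, b) -> "fast rect/rect algorithm")
                .define(Circle.class, Shape.class, (a, b) -> "circle/any algorithm");
        Shape r = new Rect();
        Shape c = new Circle();  // both have static type Shape; dispatch uses run-time classes
        System.out.println(intersect.call(r, r));  // prints fast rect/rect algorithm
        System.out.println(intersect.call(c, r));  // prints circle/any algorithm
        System.out.println(intersect.call(r, c));  // prints general shape/shape algorithm
    }
}
```

Both `r` and `c` have static type `Shape`, so ordinary Java overloading on those variables would always select the `Shape, Shape` version; the hand-rolled dispatch picks the more specialized cases instead.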
Vortex Compiler [Chambers, Dean, Grove, Lerner, … 94-01]
- Whole-program optimizer, for Cecil, Java, …
- Class hierarchy analysis
- Profile-guided class/type feedback
- Dataflow analysis, code specialization
- Interprocedural static class/type analysis
  - Fast context-insensitive [Defouw], context-sensitive
- Incremental recompilation; composable dataflow analyses
- Project well-known
  - CHA: my most cited paper; a very simple idea
  - More-sophisticated work less widely adopted
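Class hierarchy analysis uses knowledge of the whole program's class hierarchy: a call `x.m()` where `x` has static type `C` can only reach the definitions of `m` seen by `C` and its subclasses, so if there is exactly one, the call can be bound statically and inlined. A minimal sketch over a toy, single-inheritance hierarchy follows; `Hierarchy`, `addClass`, and `possibleTargets` are hypothetical, and this is not Vortex's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of a whole program's single-inheritance class hierarchy.
final class Hierarchy {
    private final Map<String, String> superclassOf = new HashMap<>();
    private final Map<String, List<String>> subclassesOf = new HashMap<>();
    private final Map<String, Set<String>> methodsDefinedIn = new HashMap<>();

    Hierarchy addClass(String name, String superclass, String... methods) {
        superclassOf.put(name, superclass);  // null for a root class
        subclassesOf.computeIfAbsent(name, k -> new ArrayList<>());
        if (superclass != null) {
            subclassesOf.computeIfAbsent(superclass, k -> new ArrayList<>()).add(name);
        }
        methodsDefinedIn.put(name, new HashSet<>(Arrays.asList(methods)));
        return this;
    }

    // The class whose definition of `method` an instance of exactly `cls` would run.
    private String definingClass(String cls, String method) {
        for (String c = cls; c != null; c = superclassOf.get(c)) {
            if (methodsDefinedIn.getOrDefault(c, Set.of()).contains(method)) return c;
        }
        return null;
    }

    // CHA: every implementation receiver.method() could invoke when the receiver's
    // static type is `staticType`, i.e. the definitions seen by it and all its subclasses.
    Set<String> possibleTargets(String staticType, String method) {
        Set<String> targets = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(staticType));
        while (!work.isEmpty()) {
            String c = work.pop();
            String def = definingClass(c, method);
            if (def != null) targets.add(def + "." + method);
            work.addAll(subclassesOf.getOrDefault(c, List.of()));
        }
        return targets;
    }
}

class ChaDemo {
    public static void main(String[] args) {
        Hierarchy h = new Hierarchy()
                .addClass("Shape", null, "area", "name")
                .addClass("Rect", "Shape", "area")
                .addClass("Square", "Rect")
                .addClass("Circle", "Shape", "area");
        // One possible target: the call can be bound statically and inlined.
        System.out.println(h.possibleTargets("Shape", "name"));  // prints [Shape.name]
        System.out.println(h.possibleTargets("Rect", "area"));   // prints [Rect.area]
        // Several possible targets: a dynamic dispatch is still needed.
        System.out.println(h.possibleTargets("Shape", "area"));  // prints [Circle.area, Rect.area, Shape.area]
    }
}
```

The analysis is conservative: `Shape.area` stays in the target set even if `Shape` is never instantiated, since the hierarchy alone cannot rule that out.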
Some Other Work
- DyC [Grant, Philipose, Mock, Eggers 96-00]
  - Dynamic compilation for C
- ArchJava, AliasJava, … [Aldrich, Notkin 01-04 …]
  - PL support for software architecture
- Cobalt, Rhodium [Lerner, Millstein 02-05 …]
  - Provably correct compiler optimizations
Trends
- Simpler ideas easier to adopt
- Sophisticated ideas need a simple story to be impactful
  - Ideal: “deceptively simple”
- Unification != Swiss Army Knife
- Language papers have had more citations; compiler work has had more practical impact
  - The combination can work well
A Current Project: Flume [Chambers, Raniwala, Perry, ... 10]
- Make data-parallel MapReduce-like pipelines easy to write yet efficient to run
Data-Parallel Programming
- Analyze & transform large, homogeneous data sets, processing separate elements in parallel
  - Web pages
  - Click logs
  - Purchase records
  - Geographical data sets
  - Census data
  - …
- Ideal: “embarrassingly parallel” analysis of petabytes of data
Challenges
- Parallel distributed programming is hard
- To do:
  - Assign machines
  - Distribute program binaries
  - Partition input data across machines
  - Synchronize jobs, communicate data when needed
  - Monitor jobs
  - Deal with faults in programs, machines, network, …
  - Tune: stragglers, work stealing, …
- What if user is a domain expert, not a systems/PL expert?
MapReduce [Dean & Ghemawat, 04]

  phase     purchases                 queries
  map       item -> co-item           term -> hour+city
  shuffle   item -> all co-items      term -> (hour+city)*
  reduce    item -> recommend         term -> what’s hot, when
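Both columns follow the same three phases. A single-process sketch of the MapReduce programming model (none of the distribution, fault tolerance, or scheduling) applied to the purchases column is below: map emits item -> co-item, the shuffle gathers item -> all co-items, and reduce picks a recommendation. `MiniMapReduce.run`, the basket data, and the "most frequent co-item" rule are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Single-process sketch of the MapReduce programming model (not the distributed system):
// map each input to key/value pairs, shuffle (group all values by key), reduce each group.
final class MiniMapReduce {
    static <I, K extends Comparable<K>, V, O> Map<K, O> run(
            List<I> inputs,
            BiConsumer<I, BiConsumer<K, V>> mapper,   // mapper(input, emit)
            BiFunction<K, List<V>, O> reducer) {      // reducer(key, allValuesForKey)
        // Map + shuffle: a TreeMap keeps keys in a deterministic order.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            mapper.accept(input, (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Reduce: one call per distinct key.
        Map<K, O> output = new TreeMap<>();
        groups.forEach((k, vs) -> output.put(k, reducer.apply(k, vs)));
        return output;
    }
}

class CoPurchaseDemo {
    public static void main(String[] args) {
        // Hypothetical purchase records: each one is a basket of items bought together.
        List<List<String>> purchases = List.of(
                List.of("tent", "stove", "lamp"),
                List.of("tent", "stove"),
                List.of("lamp", "batteries"),
                List.of("lamp", "batteries"));
        Map<String, String> recommend = MiniMapReduce.<List<String>, String, String, String>run(
                purchases,
                // map: item -> co-item, for each pair of distinct items in a basket
                (basket, emit) -> {
                    for (String item : basket)
                        for (String other : basket)
                            if (!item.equals(other)) emit.accept(item, other);
                },
                // reduce: item -> all co-items -> recommend the most frequent co-item
                (item, coItems) -> {
                    Map<String, Integer> freq = new TreeMap<>();
                    for (String c : coItems) freq.merge(c, 1, Integer::sum);
                    return Collections.max(freq.entrySet(), Map.Entry.comparingByValue()).getKey();
                });
        System.out.println(recommend);  // prints {batteries=lamp, lamp=batteries, stove=tent, tent=stove}
    }
}
```

The user writes only the two small functions; everything between them (grouping by key) is the framework's job, which is what lets the real system parallelize and recover from faults behind the scenes.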
MapReduce
- Greatly eases writing fault-tolerant data-parallel programs
  - Handles many tedious and/or tricky details
  - Has excellent (batch) performance
  - Offers a simple programming model
  - Lots of knobs for tuning
- Pipelines of MapReduces?
  - Additional details to handle
    - temp files
    - pipeline control
  - Programming model becomes low-level
Flume
- Ease task of writing data-parallel pipelines
- Offer high-level data-parallel abstractions, as a Java or C++ library
  - Classes for (possibly huge) immutable collections
  - Methods for data-parallel operations
  - Easily composed to form pipelines
  - Entire pipeline in a single program
- Automatically optimize and execute pipeline, e.g., via a series of MapReduces
  - Manage lower-level details automatically
Flume Classes and Methods
- Core data-parallel collection classes:
  - PCollection<T>, PTable<K,V>
- Core data-parallel methods:
  - parallelDo(DoFn)
  - groupByKey()
  - combineValues(CombineFn)
  - flatten(...)
  - read(Source), writeTo(Sink), …
- Derive other methods from these primitives:
  - join(...), count(), top(CompareFn,N), ...
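To show how a derived method can be built purely from the primitives, here is an eager, in-memory sketch of `parallelDo`, `groupByKey`, and `combineValues`, with `count()` defined in terms of them. `LocalPCollection`, the `(element, emit)` DoFn shape, and the use of `Map.Entry` for table rows are assumptions; this is not the FlumeJava API, which is lazy and runs distributed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;

// Hypothetical, eager, in-memory stand-in for a PCollection. Tables are modeled as
// collections of Map.Entry pairs; DoFns as (element, emit) callbacks.
final class LocalPCollection<T> {
    final List<T> elements;

    LocalPCollection(List<T> elements) { this.elements = List.copyOf(elements); }  // immutable

    // Primitive: apply doFn to each element independently; it may emit zero or more outputs.
    <U> LocalPCollection<U> parallelDo(BiConsumer<T, Consumer<U>> doFn) {
        List<U> out = new ArrayList<>();
        for (T t : elements) doFn.accept(t, out::add);
        return new LocalPCollection<>(out);
    }

    // Primitive: gather all values that share a key.
    static <K extends Comparable<K>, V> LocalPCollection<Map.Entry<K, List<V>>> groupByKey(
            LocalPCollection<Map.Entry<K, V>> table) {
        Map<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> e : table.elements) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        List<Map.Entry<K, List<V>>> out = new ArrayList<>();
        groups.forEach((k, vs) -> out.add(Map.entry(k, List.copyOf(vs))));
        return new LocalPCollection<>(out);
    }

    // Primitive: fold each key's values with an associative combining function.
    static <K, V> LocalPCollection<Map.Entry<K, V>> combineValues(
            LocalPCollection<Map.Entry<K, List<V>>> grouped, BinaryOperator<V> combineFn) {
        return grouped.<Map.Entry<K, V>>parallelDo((e, emit) ->
                emit.accept(Map.entry(e.getKey(), e.getValue().stream().reduce(combineFn).orElseThrow())));
    }

    // Derived operation: count() is just parallelDo + groupByKey + combineValues.
    static <E extends Comparable<E>> LocalPCollection<Map.Entry<E, Long>> count(LocalPCollection<E> c) {
        LocalPCollection<Map.Entry<E, Long>> ones =
                c.<Map.Entry<E, Long>>parallelDo((e, emit) -> emit.accept(Map.entry(e, 1L)));
        return combineValues(groupByKey(ones), Long::sum);
    }
}

class PrimitivesDemo {
    public static void main(String[] args) {
        LocalPCollection<String> lines = new LocalPCollection<>(List.of("to be or", "not to be"));
        // Hypothetical stand-in for ExtractWordsFn: split each line on spaces.
        LocalPCollection<String> words = lines.<String>parallelDo((line, emit) -> {
            for (String w : line.split(" ")) emit.accept(w);
        });
        System.out.println(LocalPCollection.count(words).elements);  // prints [be=2, not=1, or=1, to=2]
    }
}
```

`count()` is a static method here only so it can require comparable elements for the sorted grouping; on the slide it is an ordinary method of the collection.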
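Example: TopWords
```java
// Read every line of the input files.
PCollection<String> lines = read(TextIO.source("/gfs/corpus/*.txt"));
// Split each line into words.
PCollection<String> words = lines.parallelDo(new ExtractWordsFn());
// Count the occurrences of each distinct word.
PTable<String, Long> wordCounts = words.count();
// Keep the top 1000 (word, count) pairs, as ordered by OrderCountsFn.
PCollection<Pair<String, Long>> topWords = wordCounts.top(new OrderCountsFn(), 1000);
// Format each pair as a line of output text.
PCollection<String> formattedOutput = topWords.parallelDo(new FormatCountFn());
formattedOutput.writeTo(TextIO.sink("cnts.txt"));
// Nothing has run yet: run() optimizes, then executes, the whole pipeline.
FlumeJava.run();
```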
Example: TopWords
```java
// The same pipeline, written as a single chained expression.
read(TextIO.source("/gfs/corpus/*.txt"))
    .parallelDo(new ExtractWordsFn())
    .count()
    .top(new OrderCountsFn(), 1000)
    .parallelDo(new FormatCountFn())
    .writeTo(TextIO.sink("cnts.txt"));
FlumeJava.run();
```
Execution Graph
- Data-parallel primitives (e.g., parallelDo) are “lazy”
  - Don’t actually run right away, but wait until demanded
- Calls to primitives build an execution graph
  - Nodes are operations to be performed
  - Edges are PCollections that will hold the results
- An unevaluated result PCollection is a “future”
  - Points to the graph that computes it
- Derived operations (e.g., count, user code) call lazy primitives and so get inlined away
- Evaluation is “demanded” by FlumeJava.run()
  - Optimizes, then executes
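The deferred-evaluation idea can be sketched in plain Java: each operation returns a placeholder ("future") that remembers how to compute its value, and nothing runs until `run()` demands the registered outputs. `Deferred`, `Pipeline`, and the execution log are hypothetical; a real system also keeps an explicit graph and optimizes it before executing, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical sketch of deferred evaluation. Each call only creates a node (here a
// memoized closure over its input node); nothing executes until results are demanded.
final class Deferred<T> {
    private final String op;
    private final Supplier<T> compute;
    private T value;
    private boolean evaluated = false;

    private Deferred(String op, Supplier<T> compute) {
        this.op = op;
        this.compute = compute;
    }

    static <T> Deferred<T> source(String op, Supplier<T> read) {
        return new Deferred<>(op, read);
    }

    // Like a lazy primitive: returns a "future" for the result right away.
    <U> Deferred<U> apply(String op, Function<T, U> fn) {
        return new Deferred<>(op, () -> fn.apply(this.get()));
    }

    // Demand the value; memoized so a shared input node runs only once.
    T get() {
        if (!evaluated) {
            value = compute.get();
            evaluated = true;
            Pipeline.log.add(op);  // recorded after its inputs have run
        }
        return value;
    }
}

final class Pipeline {
    static final List<String> log = new ArrayList<>();  // ops that actually ran, in order
    private static final List<Deferred<?>> sinks = new ArrayList<>();

    static void writeTo(Deferred<?> result) { sinks.add(result); }  // records demand only

    // Analogous to FlumeJava.run(); a real system would optimize the graph first.
    static void run() {
        for (Deferred<?> d : sinks) d.get();
        sinks.clear();
    }
}

class LazyDemo {
    public static void main(String[] args) {
        Deferred<List<String>> lines = Deferred.source("read", () -> List.of("a b", "c"));
        Deferred<List<String>> words = lines.<List<String>>apply("extractWords", ls -> ls.stream()
                .flatMap(l -> Arrays.stream(l.split(" ")))
                .collect(Collectors.toList()));
        Deferred<Integer> count = words.<Integer>apply("count", List::size);
        Pipeline.writeTo(count);
        System.out.println(Pipeline.log);  // prints [] (building the graph ran nothing)
        Pipeline.run();
        System.out.println(Pipeline.log);  // prints [read, extractWords, count]
        System.out.println(count.get());   // prints 3
    }
}
```

Because construction and execution are separated, an optimizer gets to see the whole chain before any data moves, which is what makes the fusion described next possible.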
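Execution Graph
[Figure: the TopWords execution graph, from input to output (pDo = parallelDo, gbk = groupByKey, cv = combineValues):
- read: read(TextIO.source("/…/*.txt"))
- pDo: parallelDo(new ExtractWordsFn())
- pDo, gbk, cv: count()
- pDo, gbk, pDo: top(new OrderCountsFn(), 1000)
- pDo: parallelDo(new FormatCountFn())
- write: writeTo(TextIO.sink("cnts.txt"))]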
Optimizer
- Fuse trees of parallelDo operations into one
  - Producer-consumer, co-consumers (“siblings”)
  - Eliminate now-unused intermediate PCollections
- Form MapReduces
  - pDo + gbk + cv + pDo -> MapShuffleCombineReduce (MSCR)
  - General: multi-mapper, multi-reducer, multi-output
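Producer-consumer fusion can be pictured as composing two DoFns so that the second runs on each output of the first as soon as it is emitted; sibling fusion runs two consumers of the same input in one pass. A minimal in-memory sketch follows, assuming a hypothetical `DoFn` interface of the form `accept(element, emit)`; FlumeJava's own DoFn class and optimizer are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical DoFn shape: accept(element, emit), emitting zero or more outputs per element.
interface DoFn<T, U> extends BiConsumer<T, Consumer<U>> {}

final class Fusion {
    // Unfused execution: materializes the whole output list.
    static <T, U> List<U> parallelDo(List<T> input, DoFn<T, U> fn) {
        List<U> out = new ArrayList<>();
        for (T t : input) fn.accept(t, out::add);
        return out;
    }

    // Producer-consumer fusion: g runs on each output of f as soon as f emits it,
    // so the intermediate collection between them is never built.
    static <T, U, V> DoFn<T, V> fuse(DoFn<T, U> f, DoFn<U, V> g) {
        return (t, emit) -> f.accept(t, u -> g.accept(u, emit));
    }

    // Sibling fusion: two consumers of the same input share a single pass over it.
    static <T, A, B> void fusedSiblings(List<T> input, DoFn<T, A> f1, List<A> out1,
                                        DoFn<T, B> f2, List<B> out2) {
        for (T t : input) {
            f1.accept(t, out1::add);
            f2.accept(t, out2::add);
        }
    }
}

class FusionDemo {
    public static void main(String[] args) {
        DoFn<String, String> splitWords = (line, emit) -> {
            for (String w : line.split(" ")) emit.accept(w);
        };
        DoFn<String, Integer> lengths = (word, emit) -> emit.accept(word.length());
        List<String> lines = List.of("map shuffle reduce", "flume");

        List<Integer> twoPasses = Fusion.parallelDo(Fusion.parallelDo(lines, splitWords), lengths);
        List<Integer> onePass = Fusion.parallelDo(lines, Fusion.fuse(splitWords, lengths));
        System.out.println(onePass);                    // prints [3, 7, 6, 5]
        System.out.println(twoPasses.equals(onePass));  // prints true

        List<String> upper = new ArrayList<>();
        List<Integer> lineLengths = new ArrayList<>();
        Fusion.fusedSiblings(lines, (l, emit) -> emit.accept(l.toUpperCase()), upper,
                                    (l, emit) -> emit.accept(l.length()), lineLengths);
        System.out.println(upper + " " + lineLengths);  // prints [MAP SHUFFLE REDUCE, FLUME] [18, 5]
    }
}
```

Fusion is safe here because each DoFn processes elements independently; the fused version produces the same outputs in the same order while skipping the intermediate list.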
Final Pipeline
[Figure: after fusion, the 8 operations of the TopWords execution graph become 2 MSCR operations between read(TextIO.source("/…/*.txt")) and writeTo(TextIO.sink("cnts.txt")).]
Executor
- Runs each optimized MSCR
  - If small data, runs locally, sequentially
    - develop and test in normal IDE
  - If large data, runs remotely, in parallel
- Handles creating, deleting temp files
- Supports fast re-execution of incomplete runs
  - Caches, reuses partial pipeline results
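A minimal sketch of two executor behaviors listed above: choosing local or remote execution from the input size, and reusing cached stage results so a re-run after a failure only redoes unfinished work. `Executor`, `chooseMode`, the 64 MiB threshold, and the string stage names are hypothetical (the slide does not say how the decision is made), and "remote" is only a label here. Keying the cache by stage name alone assumes the input and stage code are unchanged between runs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class Executor {
    // Hypothetical cutoff for running locally; not taken from FlumeJava.
    static final long LOCAL_THRESHOLD_BYTES = 64L * 1024 * 1024;

    static String chooseMode(long estimatedInputBytes) {
        return estimatedInputBytes <= LOCAL_THRESHOLD_BYTES ? "local, sequential" : "remote, parallel";
    }

    // Outputs of completed stages, kept across runs (a real system would persist them as temp files).
    private final Map<String, Object> completed = new HashMap<>();
    final List<String> executedThisRun = new ArrayList<>();

    // Runs stages in order, skipping any stage whose output is cached from an earlier run.
    Object run(Object input, LinkedHashMap<String, Function<Object, Object>> stages) {
        executedThisRun.clear();
        Object current = input;
        for (Map.Entry<String, Function<Object, Object>> stage : stages.entrySet()) {
            if (completed.containsKey(stage.getKey())) {
                current = completed.get(stage.getKey());  // reuse a partial result
                continue;
            }
            current = stage.getValue().apply(current);    // may throw, simulating a failed stage
            completed.put(stage.getKey(), current);
            executedThisRun.add(stage.getKey());
        }
        return current;
    }
}

class ExecutorDemo {
    public static void main(String[] args) {
        Executor ex = new Executor();
        boolean[] failOnce = { true };
        LinkedHashMap<String, Function<Object, Object>> stages = new LinkedHashMap<>();
        stages.put("mscr1", in -> ((String) in).toUpperCase());
        stages.put("mscr2", in -> {
            if (failOnce[0]) { failOnce[0] = false; throw new RuntimeException("machine lost"); }
            return ((String) in).length();
        });
        try {
            ex.run("flume", stages);
        } catch (RuntimeException e) {
            System.out.println("first run failed: " + e.getMessage());  // prints first run failed: machine lost
        }
        Object result = ex.run("flume", stages);
        System.out.println(result + " " + ex.executedThisRun);  // prints 5 [mscr2]
        System.out.println(Executor.chooseMode(1_000));         // prints local, sequential
    }
}
```

On the second run only `mscr2` executes; `mscr1`'s output comes from the cache.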
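Another Example: SiteData
[Figure: execution graph of a larger pipeline built from parallelDo (pDo), groupByKey (gbk), combineValues (cv), and join() operations, with user functions GetPScoreFn, GetVerticalFn, GetDocInfoFn, PickBestFn, and MakeDocTraitsFn.]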
Another Example: SiteData
[Figure: after optimization, the SiteData pipeline's 11 operations become 2 MSCR operations.]
Experience
- FlumeJava released to Google users in May 2009
- Now: hundreds of pipelines run by hundreds of users every month
- Real pipelines process anywhere from megabytes to petabytes
- Users find FlumeJava a lot easier than MapReduce
- Advanced users can exert control over optimizer and executor if/when necessary
- But when things go wrong, lower abstraction levels intrude
How Well Does It Work?
- How does FlumeJava compare in speed to:
  - an equally modular Java MapReduce pipeline?
  - a hand-optimized Java MapReduce pipeline?
  - a hand-optimized Sawzall pipeline?
    - Sawzall: language for logs processing
- How big are pipelines in practice?
- How much does the optimizer help?
Performance
Optimizer Impact
Current and Future Work
- FlumeC++ just released to Google users
- Auto-tuner
  - Profile executions, choose good settings for tuning MapReduces
- Other execution substrates than MapReduce
  - Continuous/streaming execution?
- Dynamic code generation and optimization?
A More Advanced Approach
- Apply advanced PL ideas to the data-parallel domain
  - A custom language tuned to this domain
  - A sophisticated static optimizer and code generator
  - An integrated parallel run-time system
Lumberjack
- A language designed for data-parallel programming
- An implicitly parallel model
  - All collections potentially PCollections
  - All loops potentially parallel
- Functional
  - Mostly side-effect free
  - Concise lambdas
- Advanced type system to minimize verbosity
Static Optimizer
- Decide which collections are PCollections, which loops are parallel loops
- Interprocedural context-sensitive analysis
  - OO type analysis
  - side-effect analysis
  - inlining
  - dead assignment elimination
  - …
Parallel Run-Time System
- Similar to Flume’s run-time system
  - Schedules MapReduces
  - Manages temp files
  - Handles faults
Result: Not Successful
- A new language is a hard sell to most developers
  - Language details obscure key new concepts
  - Hard to be proficient in yet another language with yet another syntax
  - Libraries?
  - Increases risk to their projects
- Optimizer constrained by limits of static analysis
Response: FlumeJava
- Replace custom language with Java + Flume library
- More verbose syntactically
- All standard libraries & coding idioms preserved
- Much less risk
- Easy to try out, easy to like, easy to adopt

Similar to Expressiveness, Simplicity and Users
- Big Data Essentials meetup @ IBM Ljubljana 23.06.2015 (Andrey Vykhodtsev)
- Another Intro To Hadoop (Adeel Ahmad)
- 20150716 introduction to apache spark v3 (Andrey Vykhodtsev)
- Hadoop Frameworks Panel__HadoopSummit2010 (Yahoo Developer Network)
- Hands on Mahout! (OSCON Byrum)
- Hive @ Hadoop day seattle_2010 (nzhang)
- Getting Started on Hadoop (Paco Nathan)
- NoSQL, Hadoop, Cascading June 2010 (Christopher Curtin)
- Building and deploying LLM applications with Apache Airflow (Kaxil Naik)
- Vitus Masters Defense (derDoc)
- Bhupeshbansal bigdata (Bhupesh Bansal)
- Source-to-source transformations: Supporting tools and infrastructure (kaveirious)
- Hadoop Technologies (zahid-mian)
- Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading (Paco Nathan)
- Software engineering (Fahe Em)
- Software Abstractions for Parallel Hardware (Joel Falcou)
 

Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Expressiveness, Simplicity and Users

  • 1. Expressiveness, Simplicity, and Users. Craig Chambers, Google.
  • 2. A Brief Bio. MIT, 82-86: Argus, with Barbara Liskov, Bill Weihl, Mark Day. Stanford, 86-91: Self, with David Ungar, Urs Hölzle, … U. of Washington, 91-07: Cecil, MultiJava, ArchJava; Vortex, DyC, Rhodium, ...; with Jeff Dean, Dave Grove, Jonathan Aldrich, Todd Millstein, Sorin Lerner, … Google, 07-: Flume, …
  • 3. Some Questions. What makes an idea successful? Which ideas are adopted most? Which ideas have the most impact?
  • 4. Outline. Some past projects: Self language, Self compiler; Cecil language, Vortex compiler. A current project: Flume, a data-parallel programming system.
  • 5. Self Language [Ungar & Smith 87]. Purified essence of Smalltalk-like languages: all data are objects (no classes); all actions are messages (field accesses, control structures). Core ideas are very simple: widely cited and understood.
  • 6. Self v2 [Chambers, Ungar, Chang 91]. Added encapsulation and privacy. Added prioritized multiple inheritance, supporting both ordered and unordered multiple inheritance. Sophisticated, or complicated? Unified, or kitchen sink? Not adopted; dropped from Self v3.
  • 7. Self Compiler [Chambers, Ungar 89-91]. Dynamic optimizer (an early JIT compiler). Customization: specialize code for each receiver class. Class/type dataflow analysis; lots of inlining. Lazy compilation of uncommon code paths. 89: customization + simple analysis: effective. 90: + complicated analysis: more effective but slow. 91: + lazy compilation: still more effective, and fast. [Hölzle, … 92-94]: + dynamic type feedback: zowie! Simple analysis + type feedback widely adopted.
  • 8. Cecil Language [Chambers, Leavens, Millstein, Litvinov 92-99]. Pure objects, pure messages. Multimethods, static typechecking. Encapsulation. Modules, modular typechecking. Constraint-based polymorphic type system, integrating F-bounded polymorphism and “where” clauses. Later: MultiJava, EML [Lee], Diesel, … Work on multimethods and “open classes” is well known. Multimethods not widely available.
  • 9. Vortex Compiler [Chambers, Dean, Grove, Lerner, … 94-01]. Whole-program optimizer for Cecil, Java, … Class hierarchy analysis. Profile-guided class/type feedback. Dataflow analysis, code specialization. Interprocedural static class/type analysis: fast context-insensitive [Defouw], context-sensitive. Incremental recompilation; composable dataflow analyses. Project well known. CHA: my most cited paper; a very simple idea. More-sophisticated work less widely adopted.
  • 10. Some Other Work. DyC [Grant, Philipose, Mock, Eggers 96-00]: dynamic compilation for C. ArchJava, AliasJava, … [Aldrich, Notkin 01-04 …]: PL support for software architecture. Cobalt, Rhodium [Lerner, Millstein 02-05 …]: provably correct compiler optimizations.
  • 11. Trends. Simpler ideas easier to adopt. Sophisticated ideas need a simple story to be impactful. Ideal: “deceptively simple.” Unification != Swiss Army knife. Language papers have had more citations; compiler work has had more practical impact. The combination can work well.
  • 12. A Current Project: Flume [Chambers, Raniwala, Perry, ... 10]. Make data-parallel MapReduce-like pipelines easy to write yet efficient to run.
  • 13. Data-Parallel Programming. Analyze and transform large, homogeneous data sets, processing separate elements in parallel: web pages, click logs, purchase records, geographical data sets, census data, … Ideal: “embarrassingly parallel” analysis of petabytes of data.
  • 14. Challenges. Parallel distributed programming is hard. To do: assign machines; distribute program binaries; partition input data across machines; synchronize jobs and communicate data when needed; monitor jobs; deal with faults in programs, machines, network, …; tune for stragglers, work stealing, … What if the user is a domain expert, not a systems/PL expert?
  • 15. MapReduce [Dean & Ghemawat, 04]. Two example jobs, one over purchases and one over queries. Map: item -> co-item; term -> hour+city. Shuffle: item -> all co-items; term -> (hour+city)*. Reduce: item -> recommend; term -> what’s hot, when.
  • 16. MapReduce. Greatly eases writing fault-tolerant data-parallel programs: handles many tedious and/or tricky details, has excellent (batch) performance, offers a simple programming model, and has lots of knobs for tuning. Pipelines of MapReduces? Additional details to handle: temp files, pipeline control. The programming model becomes low-level.
  • 17. Flume. Ease the task of writing data-parallel pipelines. Offer high-level data-parallel abstractions as a Java or C++ library: classes for (possibly huge) immutable collections, and methods for data-parallel operations, easily composed to form pipelines, with the entire pipeline in a single program. Automatically optimize and execute the pipeline, e.g., via a series of MapReduces. Manage lower-level details automatically.
  • 18. Flume Classes and Methods. Core data-parallel collection classes: PCollection<T>, PTable<K,V>. Core data-parallel methods: parallelDo(DoFn), groupByKey(), combineValues(CombineFn), flatten(...), read(Source), writeTo(Sink), … Derive other methods from these primitives: join(...), count(), top(CompareFn,N), ...
  • 19. Example: TopWords
      PCollection<String> lines = read(TextIO.source("/gfs/corpus/*.txt"));
      PCollection<String> words = lines.parallelDo(new ExtractWordsFn());
      PTable<String, Long> wordCounts = words.count();
      PCollection<Pair<String, Long>> topWords = wordCounts.top(new OrderCountsFn(), 1000);
      PCollection<String> formattedOutput = topWords.parallelDo(new FormatCountFn());
      formattedOutput.writeTo(TextIO.sink("cnts.txt"));
      FlumeJava.run();
  • 20. Example: TopWords
      read(TextIO.source("/gfs/corpus/*.txt"))
          .parallelDo(new ExtractWordsFn())
          .count()
          .top(new OrderCountsFn(), 1000)
          .parallelDo(new FormatCountFn())
          .writeTo(TextIO.sink("cnts.txt"));
      FlumeJava.run();
  • 21. Execution Graph. Data-parallel primitives (e.g., parallelDo) are “lazy”: they don’t actually run right away, but wait until demanded. Calls to primitives build an execution graph: nodes are operations to be performed; edges are PCollections that will hold the results. An unevaluated result PCollection is a “future” that points to the graph that computes it. Derived operations (e.g., count, user code) call lazy primitives and so get inlined away. Evaluation is “demanded” by FlumeJava.run(), which optimizes, then executes.
  • 22. Execution Graph (figure): the unoptimized TopWords graph, with read and write nodes at the ends and pDo (parallelDo), gbk (groupByKey), and cv (combineValues) nodes contributed by parallelDo(new ExtractWordsFn()), count(), top(new OrderCountsFn(), 1000), and parallelDo(new FormatCountFn()).
  • 23. Optimizer. Fuse trees of parallelDo operations into one, for both producer-consumer pairs and co-consumers (“siblings”). Eliminate now-unused intermediate PCollections. Form MapReduces: pDo + gbk + cv + pDo becomes a MapShuffleCombineReduce (MSCR), which is general: multi-mapper, multi-reducer, multi-output.
  • 24. Final Pipeline (figure): after fusion, the TopWords graph shrinks from 8 operations to 2 MSCR operations between read(TextIO.source("/…/*.txt")) and writeTo(TextIO.sink("cnts.txt")).
  • 25. Executor. Runs each optimized MSCR. If the data is small, runs locally and sequentially, so pipelines can be developed and tested in a normal IDE. If the data is large, runs remotely, in parallel. Handles creating and deleting temp files. Supports fast re-execution of incomplete runs by caching and reusing partial pipeline results.
  • 26. Another Example: SiteData (figure): an execution graph of pDo, gbk, and cv nodes, including a join(), built from the user functions GetPScoreFn, GetVerticalFn, GetDocInfoFn, PickBestFn, and MakeDocTraitsFn.
  • 27. Another Example: SiteData (figure): after fusion, 11 operations become 2 MSCR operations.
  • 28. Experience. FlumeJava was released to Google users in May 2009. Now: hundreds of pipelines run by hundreds of users every month. Real pipelines process megabytes to petabytes. Users find FlumeJava a lot easier than MapReduce. Advanced users can exert control over the optimizer and executor if/when necessary. But when things go wrong, lower abstraction levels intrude.
  • 29. How Well Does It Work? How does FlumeJava compare in speed to an equally modular Java MapReduce pipeline? To a hand-optimized Java MapReduce pipeline? To a hand-optimized Sawzall pipeline (Sawzall: a language for logs processing)? How big are pipelines in practice? How much does the optimizer help?
  • 32. Current and Future Work. FlumeC++ just released to Google users. Auto-tuner: profile executions, choose good settings for tuning MapReduces. Execution substrates other than MapReduce. Continuous/streaming execution? Dynamic code generation and optimization?
  • 33. A More Advanced Approach. Apply advanced PL ideas to the data-parallel domain: a custom language tuned to this domain, a sophisticated static optimizer and code generator, and an integrated parallel run-time system.
  • 34. Lumberjack. A language designed for data-parallel programming. An implicitly parallel model: all collections potentially PCollections; all loops potentially parallel. Functional: mostly side-effect free; concise lambdas. Advanced type system to minimize verbosity.
  • 35. Static Optimizer. Decide which collections are PCollections and which loops are parallel loops. Interprocedural context-sensitive analysis: OO type analysis, side-effect analysis, inlining, dead assignment elimination, …
  • 36. Parallel Run-Time System. Similar to Flume’s run-time system: schedules MapReduces, manages temp files, handles faults.
  • 37. Result: Not Successful. A new language is a hard sell to most developers. Language details obscure key new concepts. Hard to be proficient in yet another language with yet another syntax. Libraries? Increases risk to their projects. The optimizer is constrained by the limits of static analysis.
  • 39. All standard libraries & coding idioms preserved
  • 41. Easy to try out, easy to like, easy to adopt
  • 42. Dynamic optimizer less constrained than static optimizer
  • 45. Conclusions. Simpler ideas are easier to adopt, by researchers and by users. Sophisticated ideas are still needed, to support simple interfaces. Doing things dynamically instead of statically can be liberating.
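Slide 7 credits dynamic type feedback [Hölzle, … 92-94] as the big win for the Self compiler. The sketch below illustrates the idea in Java rather than Self, under stated assumptions: a call site records which receiver classes actually reach it, and a recompiled version of that site guards an inlined body for the dominant class with a cheap class test, falling back to an ordinary dynamic send. All names (`CallSiteProfile`, `specializedArea`, the `Shape` classes) are hypothetical, and a real JIT applies this to generated code, not to source. Requires Java 16+ (records, pattern-matching `instanceof`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TypeFeedbackSketch {
    public interface Shape { double area(); }

    public record Circle(double r) implements Shape {
        public double area() { return Math.PI * r * r; }
    }

    public record Square(double s) implements Shape {
        public double area() { return s * s; }
    }

    /** Hypothetical profile for one call site: counts receiver classes seen at run time. */
    public static final class CallSiteProfile {
        private final Map<Class<?>, Integer> counts = new LinkedHashMap<>();

        /** An unoptimized send of area() that records the receiver's class as type feedback. */
        public double invokeArea(Shape receiver) {
            counts.merge(receiver.getClass(), 1, Integer::sum);
            return receiver.area(); // ordinary dynamic dispatch
        }

        /** The most frequently observed receiver class, or null if the site never ran. */
        public Class<?> dominantClass() {
            return counts.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElse(null);
        }
    }

    /**
     * What a recompiled call site might look like once feedback says Square dominates:
     * a class test guards the inlined body; other classes take the normal send.
     */
    public static double specializedArea(Shape receiver) {
        if (receiver instanceof Square sq) {
            return sq.s() * sq.s(); // inlined body of Square.area()
        }
        return receiver.area();     // uncommon case: dynamic dispatch
    }
}
```

The guard keeps the optimized code correct for every receiver; feedback only decides which class gets the fast path.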
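Slide 8 notes that multimethods are still not widely available. Java, like most mainstream languages, dispatches only on the receiver, so the sketch below emulates multimethod selection under stated assumptions: each method case is registered for a pair of argument classes, and a call runs the unique most-specific applicable case, failing when no case applies or when two cases are each more specific in a different argument position. This symmetric rule is in the spirit of Cecil's dispatch, but the class and method names are hypothetical, and the sketch detects ambiguity only at run time rather than through static typechecking. Requires Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

class MultimethodSketch {
    /** One method case, applicable when both arguments are instances of the given classes. */
    private record Case(Class<?> first, Class<?> second, BiFunction<Object, Object, String> body) {
        boolean appliesTo(Object x, Object y) {
            return first.isInstance(x) && second.isInstance(y);
        }

        /** True if this case is at least as specific as other in every argument position. */
        boolean atLeastAsSpecificAs(Case other) {
            return other.first.isAssignableFrom(first) && other.second.isAssignableFrom(second);
        }
    }

    private final List<Case> cases = new ArrayList<>();

    /** Adds a method case specialized on the classes of both arguments. */
    public void define(Class<?> first, Class<?> second, BiFunction<Object, Object, String> body) {
        cases.add(new Case(first, second, body));
    }

    /** Dispatches on the run-time classes of both arguments, not just the first. */
    public String invoke(Object x, Object y) {
        List<Case> applicable = cases.stream().filter(c -> c.appliesTo(x, y)).toList();
        for (Case candidate : applicable) {
            if (applicable.stream().allMatch(candidate::atLeastAsSpecificAs)) {
                return candidate.body().apply(x, y);
            }
        }
        throw new IllegalStateException(
                applicable.isEmpty() ? "message not understood" : "message ambiguous");
    }
}
```

For example, with cases for (Object, Object), (Integer, Object), and (Integer, Integer), `invoke(1, "s")` selects the (Integer, Object) case, while cases for only (Integer, Object) and (Object, Integer) make `invoke(1, 2)` ambiguous.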
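Slide 9 calls class hierarchy analysis (CHA) "a very simple idea." Its core fits in a few lines: given the whole program's class hierarchy, a message sent to a receiver whose static type is T can reach only the implementations that T and its subclasses inherit or define; when that set has one element, the send can be bound statically and inlined. The sketch below is a hypothetical, string-based model of single inheritance, not Vortex's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ClassHierarchyAnalysis {
    private final Map<String, String> superclassOf = new HashMap<>(); // value is null for a root class
    private final Map<String, Set<String>> methodsDefinedIn = new HashMap<>();

    /** Declares a class, its superclass (or null), and the methods it defines or overrides. */
    public void addClass(String name, String superclass, String... methods) {
        superclassOf.put(name, superclass);
        methodsDefinedIn.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    private boolean isSubclassOrSame(String sub, String sup) {
        for (String c = sub; c != null; c = superclassOf.get(c)) {
            if (c.equals(sup)) return true;
        }
        return false;
    }

    /** Method lookup: the nearest class, walking up from c, that defines the method. */
    private String lookup(String c, String method) {
        for (; c != null; c = superclassOf.get(c)) {
            if (methodsDefinedIn.getOrDefault(c, Set.of()).contains(method)) return c;
        }
        return null; // message not understood
    }

    /** The implementations a send of `method` could invoke, given the receiver's static type. */
    public Set<String> possibleTargets(String staticType, String method) {
        Set<String> targets = new TreeSet<>();
        for (String c : superclassOf.keySet()) {
            if (isSubclassOrSame(c, staticType)) {
                String impl = lookup(c, method);
                if (impl != null) targets.add(impl);
            }
        }
        return targets;
    }

    /** CHA's payoff: a single possible target means the send can be statically bound. */
    public boolean isMonomorphic(String staticType, String method) {
        return possibleTargets(staticType, method).size() == 1;
    }
}
```

With Shape defining `area` and `name`, Circle and Square overriding `area`, and Unit extending Square without overrides, a `name` send through Shape is monomorphic, and so is an `area` send through Square, even though `area` through Shape has three targets.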
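Slide 15 shows the three phases of a MapReduce. As a minimal sketch under stated assumptions, the Java below models one MapReduce sequentially and in memory: the mapper emits key/value pairs, the shuffle groups values by key, and the reducer runs once per key. The real system partitions each phase across many machines and handles faults; none of that is modeled here, and `MapReduceSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

class MapReduceSketch {
    /**
     * Sequential model of one MapReduce.
     * mapper:  called once per input with an emit(key, value) callback.
     * reducer: called once per distinct key with all of that key's values.
     */
    static <I, K extends Comparable<K>, V, R> Map<K, R> mapReduce(
            List<I> inputs,
            BiConsumer<I, BiConsumer<K, V>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map phase, with the shuffle folded in: group every emitted value under its key.
        Map<K, List<V>> shuffled = new TreeMap<>();
        BiConsumer<K, V> emit = (k, v) -> shuffled.computeIfAbsent(k, unused -> new ArrayList<>()).add(v);
        for (I input : inputs) {
            mapper.accept(input, emit);
        }
        // Reduce phase: one reducer call per distinct key, in key order.
        Map<K, R> results = new TreeMap<>();
        shuffled.forEach((k, values) -> results.put(k, reducer.apply(k, values)));
        return results;
    }
}
```

A word count, for instance, maps each line to a (word, 1) pair per word and reduces each word's list of ones to its size; over the lines "a b a" and "b c" that yields a=2, b=2, c=1.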
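Slide 18 says that methods such as count() are derived from the core primitives. The sketch below shows that derivation on a toy, eager, in-memory stand-in for a PCollection; `Coll`, its methods, and their signatures are hypothetical and are not the FlumeJava API (the real parallelDo takes a DoFn, and the real groupByKey and combineValues operate on a PTable). Here count() is just parallelDo (pair each element with 1), then groupByKey, then combineValues with addition. Requires Java 16+.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

class DerivedOpsSketch {
    /** Hypothetical eager stand-in for PCollection<T>: just a wrapped list. */
    record Coll<T>(List<T> elems) {
        /** parallelDo: each element may produce zero or more outputs. */
        <R> Coll<R> parallelDo(Function<T, List<R>> fn) {
            List<R> out = new ArrayList<>();
            for (T e : elems) out.addAll(fn.apply(e));
            return new Coll<>(out);
        }
    }

    /** groupByKey: collect all values that share a key (keys kept in sorted order). */
    static <K extends Comparable<K>, V> Map<K, List<V>> groupByKey(Coll<Map.Entry<K, V>> table) {
        Map<K, List<V>> groups = new TreeMap<>();
        for (Map.Entry<K, V> kv : table.elems()) {
            groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return groups;
    }

    /** combineValues: fold each key's (non-empty) value list with an associative operator. */
    static <K, V> Map<K, V> combineValues(Map<K, List<V>> grouped, BinaryOperator<V> combine) {
        Map<K, V> out = new LinkedHashMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().reduce(combine).orElseThrow()));
        return out;
    }

    /** count(): derived purely from the three primitives above. */
    static <T extends Comparable<T>> Map<T, Long> count(Coll<T> c) {
        Coll<Map.Entry<T, Long>> pairs = c.<Map.Entry<T, Long>>parallelDo(x -> List.of(Map.entry(x, 1L)));
        return combineValues(groupByKey(pairs), Long::sum);
    }
}
```

Because combineValues requires an associative operator, a real system is free to apply it partially before the shuffle (as a combiner), which is one reason count() is worth expressing this way.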
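Slides 19 and 20 do not show the bodies of ExtractWordsFn, OrderCountsFn, or FormatCountFn. To make the pipeline's intended result concrete, here is a sequential, in-memory reference using only java.util.stream, under stated assumptions: words are whitespace-separated tokens, ranking is by descending count with ties broken alphabetically, and each output line looks like `word: count`. Those choices, and the name `TopWordsSequential`, are hypothetical stand-ins for the three user functions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TopWordsSequential {
    /** Sequential stand-in for the TopWords pipeline over in-memory lines. */
    static List<String> topWords(List<String> lines, int n) {
        // parallelDo(new ExtractWordsFn()) + count(): words assumed to be whitespace-separated.
        Map<String, Long> wordCounts = lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // top(new OrderCountsFn(), n): assumed order is count descending, then word ascending.
        // parallelDo(new FormatCountFn()): assumed format "word: count".
        return wordCounts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.toList());
    }
}
```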
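Slide 21 describes lazy primitives that build an execution graph of "futures," evaluated only when demanded. The sketch below models just that laziness in Java, under stated assumptions: `Deferred`, `parallelDo`, `get`, and `describe` are hypothetical names, evaluation is demanded per node rather than by a global FlumeJava.run(), and there is no optimizer between building and running the graph. The static `executed` counter exists only to show that building the graph runs nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

class LazyGraphSketch {
    /** Number of parallelDo nodes actually evaluated so far (to make laziness visible). */
    static int executed = 0;

    /** Hypothetical "future" collection: a node in the execution graph. */
    static final class Deferred<T> {
        private final String name;
        private final Deferred<?> producer;      // edge back to the node feeding this one
        private final Supplier<List<T>> compute; // how to produce this node's result
        private List<T> value;                   // null until evaluation is demanded

        private Deferred(String name, Deferred<?> producer, Supplier<List<T>> compute) {
            this.name = name;
            this.producer = producer;
            this.compute = compute;
        }

        static <E> Deferred<E> source(String name, List<E> data) {
            return new Deferred<>(name, null, () -> data);
        }

        /** Lazy primitive: adds a node to the graph; fn does not run yet. */
        <R> Deferred<R> parallelDo(String opName, Function<T, List<R>> fn) {
            return new Deferred<>(opName, this, () -> {
                executed++;
                List<R> out = new ArrayList<>();
                for (T e : this.get()) out.addAll(fn.apply(e));
                return out;
            });
        }

        /** Demands evaluation; results are memoized so shared producers run once. */
        List<T> get() {
            if (value == null) value = compute.get();
            return value;
        }

        /** The chain of graph edges from this node back to its source. */
        String describe() {
            return producer == null ? name : name + " <- " + producer.describe();
        }
    }
}
```

Keeping the whole graph around before anything runs is what gives an optimizer, such as the one on slide 23, the chance to rewrite it.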
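Slide 23's first optimization, fusing parallelDo operations, is sketched below on plain Java lists under stated assumptions: a DoFn is modeled as a `Function<T, List<R>>`, and all names are hypothetical. Producer-consumer fusion composes two functions so that each output of the first feeds directly into the second and the full intermediate collection is never materialized; sibling fusion lets two consumers of the same input share a single pass. Requires Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class FusionSketch {
    /** Runs one DoFn-like function over a list: one pass over the data. */
    static <T, R> List<R> apply(List<T> in, Function<T, List<R>> fn) {
        List<R> out = new ArrayList<>();
        for (T e : in) out.addAll(fn.apply(e));
        return out;
    }

    /** Producer-consumer fusion: g consumes each output of f immediately,
        so the full intermediate collection between them is never built. */
    static <T, U, V> Function<T, List<V>> fuse(Function<T, List<U>> f, Function<U, List<V>> g) {
        return t -> {
            List<V> out = new ArrayList<>();
            for (U u : f.apply(t)) out.addAll(g.apply(u));
            return out;
        };
    }

    /** The two outputs of a fused pair of sibling consumers. */
    record Outputs<A, B>(List<A> first, List<B> second) {}

    /** Sibling fusion: two consumers of the same input share a single pass. */
    static <T, A, B> Outputs<A, B> runSiblings(List<T> in, Function<T, List<A>> f, Function<T, List<B>> g) {
        List<A> as = new ArrayList<>();
        List<B> bs = new ArrayList<>();
        for (T e : in) {
            as.addAll(f.apply(e));
            bs.addAll(g.apply(e));
        }
        return new Outputs<>(as, bs);
    }
}
```

Fusion is only valid because the result is unchanged: running `fuse(f, g)` once must give the same list as running `f` and then `g` as separate passes.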
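Slide 25 lists two executor behaviors that are easy to model: choosing local or remote execution by data size, and reusing the results of stages that finished during an earlier, incomplete run. The sketch below is hypothetical throughout: the 64 MB threshold, the stage ids, and the in-memory map standing in for persisted temp files are assumptions, not Flume's actual policy or storage.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class ExecutorSketch {
    enum Mode { LOCAL_SEQUENTIAL, REMOTE_PARALLEL }

    /** Hypothetical cutoff: inputs up to this size run locally. */
    static final long LOCAL_THRESHOLD_BYTES = 64L * 1024 * 1024;

    static Mode chooseMode(long inputBytes) {
        return inputBytes <= LOCAL_THRESHOLD_BYTES ? Mode.LOCAL_SEQUENTIAL : Mode.REMOTE_PARALLEL;
    }

    /** Outputs of completed stages; stands in for temp files kept between runs. */
    private final Map<String, List<String>> completed = new HashMap<>();
    int stagesRun = 0;

    /** Runs a stage unless a previous run already produced its output.
        A stage that throws records nothing, so the next run retries it. */
    List<String> runStage(String stageId, Supplier<List<String>> stage) {
        return completed.computeIfAbsent(stageId, id -> {
            stagesRun++;
            return stage.get();
        });
    }
}
```

The retry behavior relies on `Map.computeIfAbsent` recording no mapping when the mapping function throws.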
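Slide 35 lists dead assignment elimination among the static optimizer's passes. The textbook version for straight-line, side-effect-free code is a single backward liveness pass, sketched below; it is not Lumberjack's implementation, and the `Assign` representation is a hypothetical simplification (real passes work over a control-flow graph and must keep assignments with side effects). Requires Java 16+.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DeadAssignmentSketch {
    /** Hypothetical straight-line statement `target = f(uses...)`, assumed side-effect free. */
    record Assign(String target, List<String> uses) {}

    /** Backward liveness pass: keep an assignment only if its target is live after it. */
    static List<Assign> eliminate(List<Assign> code, Set<String> liveOut) {
        Set<String> live = new HashSet<>(liveOut);
        Deque<Assign> kept = new ArrayDeque<>();
        for (int i = code.size() - 1; i >= 0; i--) {
            Assign a = code.get(i);
            if (live.contains(a.target())) {
                kept.addFirst(a);
                live.remove(a.target()); // this definition kills the variable...
                live.addAll(a.uses());   // ...and makes its operands live
            }
        }
        return new ArrayList<>(kept);
    }
}
```

Removing the target before adding the uses handles self-referencing assignments such as `x = x + 1` correctly: `x` stays live above the statement.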