Ramping up your Devops Fu for Big Data developers
François Garillot
Lessons learned in building a Spark distribution
1.
Ramping up your devops-fu for Big Data Developers
2.
François Garillot, Typesafe, @huitseeker
5.
Apache Mesos
• top-level Apache project since July 2013
• framework agnostic
• a cluster manager & resource manager
• developed by Twitter & Mesosphere, among others
• "The data center's operating system"
6.
Mesos Principles
Mesos = cluster + cgroups + LXC
9.
Mesos internals
12.
Mesos topology
14.
So, why do we care?
• multi-processes
• multi-roles
• multi-versions
• legacy use cases
15.
Spark
"To validate our hypothesis [...], we have also built a new framework on top of Mesos called Spark, optimized for iterative jobs where a dataset is reused in many parallel operations, and shown that Spark can outperform Hadoop by 10x in iterative machine learning workloads."
(Hindman et al., 2011)
16.
Spark
• top-level Apache Project since February 2014
• also, growth
17.
Spark expressivity

val textFile = spark.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
                     .map(word => (word, 1))
                     .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
18.
Java word count

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
19.
Spark advantages
• Fast! ...
• Because no dump to disk between every operation
• Combiners (map-side reduce) automatically applied ...
• ... and easy to define
• clever map pipeline
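The "combiners (map-side reduce)" bullet can be made concrete without a cluster. The following is a rough sketch in plain Java, not Spark or Hadoop API: the class and method names are hypothetical, and it only counts how many (word, count) records would leave the map side with and without per-partition pre-aggregation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of map-side combining; not Spark or Hadoop API.
final class MapSideCombineSketch {

    // Without a combiner: every word becomes one (word, 1) record sent to the reducers.
    static int recordsWithoutCombiner(List<List<String>> partitions) {
        int records = 0;
        for (List<String> partition : partitions) {
            records += partition.size();
        }
        return records;
    }

    // With a combiner: each partition sums its own counts locally first,
    // so it sends at most one record per distinct word.
    static List<Map<String, Integer>> combinePerPartition(List<List<String>> partitions) {
        List<Map<String, Integer>> combined = new ArrayList<>();
        for (List<String> partition : partitions) {
            Map<String, Integer> local = new HashMap<>();
            for (String word : partition) {
                local.merge(word, 1, Integer::sum);
            }
            combined.add(local);
        }
        return combined;
    }

    // Number of records that leave the map side once combining is applied.
    static int recordsWithCombiner(List<List<String>> partitions) {
        int records = 0;
        for (Map<String, Integer> local : combinePerPartition(partitions)) {
            records += local.size();
        }
        return records;
    }

    // Reduce side: merge the partial counts coming from all partitions.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
        }
        return totals;
    }
}
```

The final counts are the same either way. Only the number of records moved between the map and reduce sides changes, which is the saving that `reduceByKey` gives in the Spark word count without any extra code.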
20.
Spark advantages
• flexible I/O: interfaces to DBs, Streaming, S3, local filesystem and HDFS
• fault-tolerance for executor & master
• SparkSQL
• MLlib, GraphX
21.
Spark Streaming
22.
Spark advantages
Momentum!!
• Sparkling Water = H2O + Spark
• Apache Mahout rewrite since March 2014
• DeepLearning4j-Scaleout = Deeplearning4j on ND4J + Spark
• 'Lingua Franca' of distributed data analysis
23.
Spark clustering modes
• local
• standalone
• Mesos
• YARN
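In practice the clustering mode is selected through the master URL handed to Spark. As a small sketch, the hypothetical helper below maps each mode from the list to the URL scheme documented by Spark. The class, enum and parameter names are assumptions made for illustration, and the `yarn-client` spelling is the Spark 1.x one.

```java
// Hypothetical helper; only the URL schemes come from Spark's documentation.
final class MasterUrlSketch {

    enum Mode { LOCAL, STANDALONE, MESOS, YARN }

    // host and port are only used for STANDALONE and MESOS.
    static String masterUrl(Mode mode, String host, int port) {
        switch (mode) {
            case LOCAL:
                // Run in-process, with one worker thread per available core.
                return "local[*]";
            case STANDALONE:
                // Spark's own cluster manager (the master listens on 7077 by default).
                return "spark://" + host + ":" + port;
            case MESOS:
                // Mesos master (5050 by default).
                return "mesos://" + host + ":" + port;
            case YARN:
                // Spark 1.x spelling; the cluster location comes from the Hadoop configuration.
                return "yarn-client";
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```

For instance, `masterUrl(Mode.MESOS, "mesos-master", 5050)` yields `mesos://mesos-master:5050`, where `mesos-master` is a placeholder host name.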
24.
Spark on Mesos
26.
Fine-grained mode
• "fine-grained" mode (default): each Spark task runs as a separate Mesos task.
• each application gets more or fewer machines as it ramps up and down,
• but overhead in launching each task.
27.
Coarse-grained mode
• "coarse-grained" mode: only one long-running Spark task on each Mesos machine,
• which dynamically schedules its own "mini-tasks" within it.
• much lower startup overhead,
• but reserving the Mesos resources for the duration of the application
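The trade-off between the two modes can be put into a back-of-the-envelope model. The sketch below is a toy with made-up parameters, not a measurement and not Spark or Mesos API. It assumes every Spark task occupies one core, and all names are hypothetical.

```java
// Toy cost model with hypothetical parameters; not Spark or Mesos API.
final class MesosModeSketch {

    // Fine-grained: every Spark task is launched as its own Mesos task,
    // so the launch overhead is paid once per task.
    static long fineGrainedLaunchOverheadMs(int sparkTasks, long launchCostMs) {
        return (long) sparkTasks * launchCostMs;
    }

    // Coarse-grained: one long-running Mesos task per machine,
    // so the launch overhead is paid once per machine.
    static long coarseGrainedLaunchOverheadMs(int machines, long launchCostMs) {
        return (long) machines * launchCostMs;
    }

    // Fine-grained only holds a core while a task is running on it
    // (assumption: one core per task).
    static long fineGrainedCoreSecondsHeld(int sparkTasks, long taskDurationSec) {
        return (long) sparkTasks * taskDurationSec;
    }

    // Coarse-grained keeps all its cores for the whole application, busy or not.
    static long coarseGrainedCoreSecondsHeld(int machines, int coresPerMachine, long appDurationSec) {
        return (long) machines * coresPerMachine * appDurationSec;
    }
}
```

With, say, 1000 tasks on 10 machines, the fine-grained launch overhead is 100 times the coarse-grained one for the same per-launch cost, while the coarse-grained reservation grows with the application's duration rather than with the work actually done.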
28.
Deployment
29.
Automation
30.
Ansible
• pilots through ssh
• no dependencies on slaves
• YAML scripting, but can drop down to Python
• integrated modules for EC2, apt ...
31.
Ansible

...
- name: download spark sources
  git:
    repo: "{{ spark_repo }}"
    dest: "{{ spark_dir }}"
    version: "{{ spark_ref }}"
    force: yes

- name: prepare sources for {{ scala_major_version }}
  command: dev/change-version-to-{{scala_major_version}}.sh
  args:
    chdir: "{{spark_dir}}"

- name: build spark
  command: ./make-distribution.sh -Pyarn -Phadoop-{{hadoop_major_version}}
  args:
    chdir: "{{ spark_dir }}"
  environment: java_env
...
32.
Packer
• hybrid virtual image generation
• provision on VirtualBox
• provision on Amazon AWS
• Vagrant an interesting target as well
33.
Tinc
• VPN
• simple file-based configuration (BSD-style)
• automatic mesh routing in 1 config line: AutoConnect = yes
• multiple operating systems
34.
Tinc and Spark
• Spark binds using naming only (see SPARK-624)
• Tinc name resolution only works reliably in some configurations
• use avahi-daemon or your own DNS
• more simply, set hostnames and write to /etc/hosts everywhere
• avoid non-ascii in both tinc network and machine names
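The "write to /etc/hosts everywhere" approach could be scripted along these lines. This is a minimal sketch with a hypothetical helper class. The name check is an assumption, deliberately stricter than the advice above: it accepts only ASCII letters, digits and inner hyphens.

```java
import java.util.Map;

// Hypothetical helper for generating /etc/hosts entries for VPN nodes.
final class HostsFileSketch {

    // True if the name uses only ASCII letters, digits and inner hyphens.
    static boolean isSafeHostname(String name) {
        return name.matches("[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?");
    }

    // Renders one "address<TAB>hostname" line per node, in the map's iteration order.
    // Throws if any hostname fails the ASCII check.
    static String render(Map<String, String> addressByHost) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : addressByHost.entrySet()) {
            if (!isSafeHostname(entry.getKey())) {
                throw new IllegalArgumentException("unsafe hostname: " + entry.getKey());
            }
            out.append(entry.getValue()).append('\t').append(entry.getKey()).append('\n');
        }
        return out.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the entries in insertion order. The rendered text would then be appended to /etc/hosts on every machine, for example by the same automation that deploys the cluster.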
35.
So Far
• deployment of Mesos, HDFS, Spark
• fully automated, from any commit of Mesos / Spark git repositories
• ... or our forks
• stress-testing, in collaboration with Mesosphere & Databricks
• partnership for huge prototype deployment
36.
Ongoing steps
37.
Mesos and Spark integration
• dynamic allocation for coarse-grained mode & external shuffle service
• co-testing with Databricks and Mesosphere
• cluster mode
38.
Docker: your favorite containerizer