SlideShare a Scribd company logo
1 of 12
Download to read offline
Map/Reduce
Обзор решений
Алексей Злобин
alexey.zlobin@gmail.com
Sample job: driver
public static void main(String[] a) throws Exception {
Configuration conf = new Configuration();
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(a[0]));
FileOutputFormat.setOutputPath(job, new Path(a[1]));
job.waitForCompletion(true);
}
Sample job: mapper
class M extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable k, Text v, Context ctx) {
String line = v.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
word.set(tokenizer.nextToken());
ctx.write(word, one);
}
}
}
Sample job: reducer
class R extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text k, Iterable<IntWritable> v, Context ctx)
{
int sum = 0;
for (IntWritable val : v)
sum += val.get();
context.write(k, new IntWritable(sum));
}
}
Pig snippet
raw =
LOAD 'excite.log' USING PigStorage('t') AS (user, time, qry);
clean1 = FILTER raw BY
org.apache.pig.tutorial.NonURLDetector(qry);
clean2 = FOREACH clean1
GENERATE user, time, org.apache.pig.tutorial.ToLower(qr)
as query;
Hive snippet
CREATE TABLE invites
(foo INT, bar STRING) PARTITIONED BY (ds STRING);
LOAD DATA LOCAL
INPATH './examples/files/kv2.txt' OVERWRITE
INTO TABLE invites PARTITION (ds='2008-08-15');
SELECT a.foo
FROM invites a
WHERE a.ds='2008-08-15';
INSERT OVERWRITE DIRECTORY '/tmp/reg_5'
SELECT a.foo, a.bar FROM invites a;
Spark: example
val counts = lines.flatMap(line => line.split(“ “))
.map(word => (word, 1))
.reduceByKey(_ + _)
Shark example
CREATE TABLE src(key INT, value STRING);
LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt'
INTO TABLE src;
SELECT COUNT(1) FROM src;
CREATE TABLE src_cached AS SELECT * FROM SRC;
SELECT COUNT(1) FROM src_cached;
Disco example
def fun_map(line, params):
for word in line.split():
yield word, 1
def fun_reduce(iter, params):
for word, counts in kvgroup(sorted(iter)):
yield word, sum(counts)
Disco driver
job = Job().run(
input=["http://discoproject.org/media/text/chekhov.
txt"],
map=map,
reduce=reduce)
for word, count in result_iterator(job.wait(show=True)):
print(word, count)
References I
● “MapReduce: Simplified Data Processing on Large Clusters” Dean, Jeffrey and
Ghemawat, Sanjay
● “A Comparison of Join Algorithms for Log Processing in MapReduce” S. Blanas, J.
Patel, V. Ercegovac, J. Rao, E. Shekita, Y. Tian
● “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster
Computing” Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave,
Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica
● “Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing
on Large Clusters” Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, Ion
Stoica
● “Shark: Fast Data Analysis Using Coarse-grained Distributed Memory” Cliff Engle,
Antonio Lupher, Reynold Xin, Matei Zaharia, Haoyuan Li, Scott Shenker, Ion Stoica
References II
● Disco Technical Overview http://disco.readthedocs.org/en/latest/overview.html
● Disco Distributed Filesystem http://disco.readthedocs.org/en/latest/howto/ddfs.html
● An efficient, immutable, persistent mapping object http://discodb.readthedocs.
org/en/latest/

More Related Content

What's hot

Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
decision tree regression
decision tree regressiondecision tree regression
decision tree regressionAkhilesh Joshi
 
Do something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as databaseDo something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as databaseBruce McPherson
 
Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"
Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"
Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"Anna Shymchenko
 
Air Quality in Taiwan 2013
Air Quality in Taiwan 2013 Air Quality in Taiwan 2013
Air Quality in Taiwan 2013 Kuang-Hao Cheng
 
Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...
Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...
Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...Bruce McPherson
 
python高级内存管理
python高级内存管理python高级内存管理
python高级内存管理rfyiamcool
 
Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019Aerospike
 
Hacking the Internet of Things for Fun & Profit
Hacking the Internet of Things for Fun & ProfitHacking the Internet of Things for Fun & Profit
Hacking the Internet of Things for Fun & ProfitRuben van Vreeland
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Anuj Jain
 
Do something in 5 with gas 3-simple invoicing app
Do something in 5 with gas 3-simple invoicing appDo something in 5 with gas 3-simple invoicing app
Do something in 5 with gas 3-simple invoicing appBruce McPherson
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridMatt Williams
 
Look Mommy, No GC! (TechDays NL 2017)
Look Mommy, No GC! (TechDays NL 2017)Look Mommy, No GC! (TechDays NL 2017)
Look Mommy, No GC! (TechDays NL 2017)Dina Goldshtein
 
polynomial linear regression
polynomial linear regressionpolynomial linear regression
polynomial linear regressionAkhilesh Joshi
 
Google Sheets in Python with gspread
Google Sheets in Python with gspreadGoogle Sheets in Python with gspread
Google Sheets in Python with gspreadJure Cuhalev
 
Do something in 5 with gas 7-email log
Do something in 5 with gas 7-email logDo something in 5 with gas 7-email log
Do something in 5 with gas 7-email logBruce McPherson
 
LINE iOS開発で実践しているGit tips
LINE iOS開発で実践しているGit tipsLINE iOS開発で実践しているGit tips
LINE iOS開発で実践しているGit tipsLINE Corporation
 

What's hot (20)

Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
05. haskell streaming io
05. haskell streaming io05. haskell streaming io
05. haskell streaming io
 
R seminar dplyr package
R seminar dplyr packageR seminar dplyr package
R seminar dplyr package
 
decision tree regression
decision tree regressiondecision tree regression
decision tree regression
 
Do something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as databaseDo something in 5 minutes with gas 1-use spreadsheet as database
Do something in 5 minutes with gas 1-use spreadsheet as database
 
Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"
Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"
Вячеслав Блинов: "Spring Integration as an Integration Patterns Provider"
 
Code
CodeCode
Code
 
Air Quality in Taiwan 2013
Air Quality in Taiwan 2013 Air Quality in Taiwan 2013
Air Quality in Taiwan 2013
 
Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...
Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...
Do something useful in Apps Script 5. Get your analytics pageviews to a sprea...
 
python高级内存管理
python高级内存管理python高级内存管理
python高级内存管理
 
Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019
 
Hacking the Internet of Things for Fun & Profit
Hacking the Internet of Things for Fun & ProfitHacking the Internet of Things for Fun & Profit
Hacking the Internet of Things for Fun & Profit
 
Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1Aggregation Framework in MongoDB Overview Part-1
Aggregation Framework in MongoDB Overview Part-1
 
Do something in 5 with gas 3-simple invoicing app
Do something in 5 with gas 3-simple invoicing appDo something in 5 with gas 3-simple invoicing app
Do something in 5 with gas 3-simple invoicing app
 
Ganga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing gridGanga: an interface to the LHC computing grid
Ganga: an interface to the LHC computing grid
 
Look Mommy, No GC! (TechDays NL 2017)
Look Mommy, No GC! (TechDays NL 2017)Look Mommy, No GC! (TechDays NL 2017)
Look Mommy, No GC! (TechDays NL 2017)
 
polynomial linear regression
polynomial linear regressionpolynomial linear regression
polynomial linear regression
 
Google Sheets in Python with gspread
Google Sheets in Python with gspreadGoogle Sheets in Python with gspread
Google Sheets in Python with gspread
 
Do something in 5 with gas 7-email log
Do something in 5 with gas 7-email logDo something in 5 with gas 7-email log
Do something in 5 with gas 7-email log
 
LINE iOS開発で実践しているGit tips
LINE iOS開発で実践しているGit tipsLINE iOS開発で実践しているGit tips
LINE iOS開発で実践しているGit tips
 

Similar to 20140427 parallel programming_zlobin_lecture11

Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Introduction to R
Introduction to RIntroduction to R
Introduction to Ragnonchik
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...Adrian Florea
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...CloudxLab
 
JRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop PapyrusJRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop PapyrusKoichi Fujikawa
 
Implementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickImplementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickHermann Hueck
 
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013Stratosphere System Overview Big Data Beers Berlin. 20.11.2013
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013Robert Metzger
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopDilum Bandara
 
Modern technologies in data science
Modern technologies in data science Modern technologies in data science
Modern technologies in data science Chucheng Hsieh
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_PennonsoftPennonSoft
 
Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分sg7879
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoopdatasalt
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs偉格 高
 
ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions Dr. Volkan OBAN
 

Similar to 20140427 parallel programming_zlobin_lecture11 (20)

MapReduce
MapReduceMapReduce
MapReduce
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
 
JRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop PapyrusJRubyKaigi2010 Hadoop Papyrus
JRubyKaigi2010 Hadoop Papyrus
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Implementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickImplementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with Slick
 
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013Stratosphere System Overview Big Data Beers Berlin. 20.11.2013
Stratosphere System Overview Big Data Beers Berlin. 20.11.2013
 
Pune Clojure Course Outline
Pune Clojure Course OutlinePune Clojure Course Outline
Pune Clojure Course Outline
 
Introduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with HadoopIntroduction to Map-Reduce Programming with Hadoop
Introduction to Map-Reduce Programming with Hadoop
 
Modern technologies in data science
Modern technologies in data science Modern technologies in data science
Modern technologies in data science
 
Hadoop_Pennonsoft
Hadoop_PennonsoftHadoop_Pennonsoft
Hadoop_Pennonsoft
 
Hw09 Hadoop + Clojure
Hw09   Hadoop + ClojureHw09   Hadoop + Clojure
Hw09 Hadoop + Clojure
 
Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分Hadoop源码分析 mapreduce部分
Hadoop源码分析 mapreduce部分
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoop
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
Refactoring
RefactoringRefactoring
Refactoring
 
Functional programming using underscorejs
Functional programming using underscorejsFunctional programming using underscorejs
Functional programming using underscorejs
 
ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions
 

More from Computer Science Club

20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugsComputer Science Club
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugsComputer Science Club
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugsComputer Science Club
 
20140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture1220140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture12Computer Science Club
 
20140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture1020140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture10Computer Science Club
 
20140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture0920140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture09Computer Science Club
 
20140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture0220140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture02Computer Science Club
 
20140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture0120140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture01Computer Science Club
 
20140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-0420140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-04Computer Science Club
 
20140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture0120140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture01Computer Science Club
 

More from Computer Science Club (20)

20141223 kuznetsov distributed
20141223 kuznetsov distributed20141223 kuznetsov distributed
20141223 kuznetsov distributed
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs
 
20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs20140531 serebryany lecture02_find_scary_cpp_bugs
20140531 serebryany lecture02_find_scary_cpp_bugs
 
20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs20140531 serebryany lecture01_fantastic_cpp_bugs
20140531 serebryany lecture01_fantastic_cpp_bugs
 
20140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture1220140511 parallel programming_kalishenko_lecture12
20140511 parallel programming_kalishenko_lecture12
 
20140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture1020140420 parallel programming_kalishenko_lecture10
20140420 parallel programming_kalishenko_lecture10
 
20140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture0920140413 parallel programming_kalishenko_lecture09
20140413 parallel programming_kalishenko_lecture09
 
20140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture0220140329 graph drawing_dainiak_lecture02
20140329 graph drawing_dainiak_lecture02
 
20140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture0120140329 graph drawing_dainiak_lecture01
20140329 graph drawing_dainiak_lecture01
 
20140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-0420140310 parallel programming_kalishenko_lecture03-04
20140310 parallel programming_kalishenko_lecture03-04
 
20140223-SuffixTrees-lecture01-03
20140223-SuffixTrees-lecture01-0320140223-SuffixTrees-lecture01-03
20140223-SuffixTrees-lecture01-03
 
20140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture0120140216 parallel programming_kalishenko_lecture01
20140216 parallel programming_kalishenko_lecture01
 
20131106 h10 lecture6_matiyasevich
20131106 h10 lecture6_matiyasevich20131106 h10 lecture6_matiyasevich
20131106 h10 lecture6_matiyasevich
 
20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich
 
20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich20131027 h10 lecture5_matiyasevich
20131027 h10 lecture5_matiyasevich
 
20131013 h10 lecture4_matiyasevich
20131013 h10 lecture4_matiyasevich20131013 h10 lecture4_matiyasevich
20131013 h10 lecture4_matiyasevich
 
20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich
 
20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich20131006 h10 lecture3_matiyasevich
20131006 h10 lecture3_matiyasevich
 
20131006 h10 lecture2_matiyasevich
20131006 h10 lecture2_matiyasevich20131006 h10 lecture2_matiyasevich
20131006 h10 lecture2_matiyasevich
 

Recently uploaded

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 

Recently uploaded (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

20140427 parallel programming_zlobin_lecture11

  • 2. Sample job: driver public static void main(String[] a) throws Exception { Configuration conf = new Configuration(); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); job.setReducerClass(Reduce.class); job.setInputFormatClass(TextInputFormat.class); job.setOutputFormatClass(TextOutputFormat.class); FileInputFormat.addInputPath(job, new Path(a[0])); FileOutputFormat.setOutputPath(job, new Path(a[1])); job.waitForCompletion(true); }
  • 3. Sample job: mapper class M extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable k, Text v, Context ctx) { String line = v.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); ctx.write(word, one); } } }
  • 4. Sample job: reducer class R extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text k, Iterable<IntWritable> v, Context ctx) { int sum = 0; for (IntWritable val : v) sum += val.get(); context.write(k, new IntWritable(sum)); } }
  • 5. Pig snippet raw = LOAD 'excite.log' USING PigStorage('t') AS (user, time, qry); clean1 = FILTER raw BY org.apache.pig.tutorial.NonURLDetector(qry); clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.ToLower(qr) as query;
  • 6. Hive snippet CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15'); SELECT a.foo FROM invites a WHERE a.ds='2008-08-15'; INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
  • 7. Spark: example val counts = lines.flatMap(line => line.split(“ “)) .map(word => (word, 1)) .reduceByKey(_ + _)
  • 8. Shark example CREATE TABLE src(key INT, value STRING); LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src; SELECT COUNT(1) FROM src; CREATE TABLE src_cached AS SELECT * FROM SRC; SELECT COUNT(1) FROM src_cached;
  • 9. Disco example def fun_map(line, params): for word in line.split(): yield word, 1 def fun_reduce(iter, params): for word, counts in kvgroup(sorted(iter)): yield word, sum(counts)
  • 10. Disco driver job = Job().run( input=["http://discoproject.org/media/text/chekhov. txt"], map=map, reduce=reduce) for word, count in result_iterator(job.wait(show=True)): print(word, count)
  • 11. References I ● “MapReduce: Simplified Data Processing on Large Clusters” Dean, Jeffrey and Ghemawat, Sanjay ● “A Comparison of Join Algorithms for Log Processing in MapReduce” S. Blanas, J. Patel, V. Ercegovac, J. Rao, E. Shekita, Y. Tian ● “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing” Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica ● “Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters” Matei Zaharia, Tathagata Das, Haoyuan Li, Scott Shenker, Ion Stoica ● “Shark: Fast Data Analysis Using Coarse-grained Distributed Memory” Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Haoyuan Li, Scott Shenker, Ion Stoica
  • 12. References II ● Disco Technical Overview http://disco.readthedocs.org/en/latest/overview.html ● Disco Distributed Filesystem http://disco.readthedocs.org/en/latest/howto/ddfs.html ● An efficient, immutable, persistent mapping object http://discodb.readthedocs. org/en/latest/