Hadoop MapReduce Streaming and Pipes
Hanborq Inc.
An introduction to Hadoop MapReduce Streaming and Pipes, prepared for training.
1.
Hadoop Streaming and Pipes
July 10, 2012
Clay Jiang, Big Data Engineering Team, Hanborq Inc.
2.
Hadoop Streaming
• Hadoop Streaming is a utility that runs a MapReduce job using any executable program or script as the map and/or reduce function
• $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar
3.
First Streaming Run
• Basic command:
– hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar -input /path/to/inputdir -output /path/to/outputdir -mapper /path/to/map_exec -reducer /path/to/reduce_exec
4.
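How Streaming Works
• The Mapper/Reducer launches map_exec/reduce_exec as a separate process
• The Mapper/Reducer exchanges <key,value> pairs with that process over its stdin and stdout
• <key,value> pairs are passed to map_exec/reduce_exec in an agreed-upon format; the default is "key\tvalue"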
5.
How Streaming Works
6.
Hadoop Streaming Example
• Streaming WordCount
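The WordCount example on this slide did not survive extraction. As a stand-in, here is a minimal sketch of a Streaming WordCount mapper and reducer in Python, assuming the default text protocol (one `key\tvalue` line per record) and that the framework delivers reducer input sorted by key. The names `map_lines`, `reduce_lines` and the `map`/`reduce` command-line switch are hypothetical, not from the original deck.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word\t1' line per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: input arrives sorted by key, so equal words are adjacent.

    Streaming passes one 'key\tvalue' line per value, so the script must
    detect key boundaries itself; groupby does that here.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: wordcount.py map|reduce   (reads stdin, writes stdout)
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for out in step(sys.stdin):
        print(out)
```

Locally, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` approximates the job, with `sort` standing in for the shuffle.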
7.
Streaming Internals
• Just a tool, not a new mechanism
• Adds an adaptation layer on top of the existing MapReduce framework:
– PipeMapper + PipeMapRunner
– PipeCombiner
– PipeReducer
– No PipePartitioner
8.
Streaming Internals
• PipeMapper/PipeReducer are responsible for exchanging data with the executable over stdin/stdout
9.
Streaming Internals
• Main entry point of hadoop-streaming*.jar:
• It runs one of three tools:
10.
Streaming: StreamJob
• StreamJob
– parseArgv: command-line arguments → member fields
– setJobConf: member fields → JobConf
– submitAndMonitorJob: submits the JobConf through JobClient and monitors the job
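The three StreamJob phases can be mirrored in a short Python sketch: parse arguments into fields, copy the fields into a configuration map, then hand that map to a submitter. This is not StreamJob's code. The configuration keys (`mapred.input.dir`, `mapred.output.dir`, `stream.map.streamprocessor`, `stream.reduce.streamprocessor`) are assumptions modeled on Hadoop 0.20-era names, and `submit` is a hypothetical stand-in for JobClient.

```python
import argparse
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class StreamJobFields:
    """Member fields filled in by parse_argv (illustrative subset)."""
    inputs: List[str]
    output: str
    mapper: str
    reducer: Optional[str]


def parse_argv(argv: List[str]) -> StreamJobFields:
    """Phase 1: command-line arguments -> member fields."""
    p = argparse.ArgumentParser()
    p.add_argument("-input", action="append", required=True)  # repeatable
    p.add_argument("-output", required=True)
    p.add_argument("-mapper", required=True)
    p.add_argument("-reducer")
    ns = p.parse_args(argv)
    return StreamJobFields(ns.input, ns.output, ns.mapper, ns.reducer)


def set_job_conf(fields: StreamJobFields) -> Dict[str, str]:
    """Phase 2: member fields -> job configuration (assumed key names)."""
    conf = {
        "mapred.input.dir": ",".join(fields.inputs),
        "mapred.output.dir": fields.output,
        "stream.map.streamprocessor": fields.mapper,
    }
    if fields.reducer is not None:
        conf["stream.reduce.streamprocessor"] = fields.reducer
    return conf


def submit_and_monitor(conf: Dict[str, str],
                       submit: Callable[[Dict[str, str]], bool]) -> int:
    """Phase 3: hand the configuration to a hypothetical submitter."""
    return 0 if submit(conf) else 1
```

Keeping the phases separate means the parsed fields can be validated before any configuration is built, and the configuration can be inspected before submission.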
11.
Streaming Map
• -mapper <cmd|JavaClassName>
• PipeMapRunner/PipeMapper
– startOutputThreads: starts an MROutputThread that "tails" map_exec's stdout, reads the output with an OutputReader, parses it and writes the result to the collector
– PipeMapper.map: uses an InputWriter to encode each key/value as a string map_exec can parse, and writes it to map_exec's stdin
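A rough Python emulation of this arrangement, assuming the default text encoding: the parent writes encoded pairs to the child's stdin (the PipeMapper.map role) while a separate thread tails the child's stdout and collects parsed pairs (the MROutputThread role). The separate thread matters because the child can emit output before it has consumed all its input; reading and writing from a single thread could deadlock once both pipe buffers fill. `run_pipe_mapper` and `DEMO_MAPPER_SRC` are hypothetical names.

```python
import subprocess
import sys
import threading
from typing import Iterable, List, Tuple

# Hypothetical child mapper: passes the key through, upper-cases the value.
DEMO_MAPPER_SRC = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    k, v = line.rstrip('\\n').split('\\t', 1)\n"
    "    print(k + '\\t' + v.upper())\n"
)


def run_pipe_mapper(cmd: List[str],
                    records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Feed (key, value) pairs to an external mapper and collect its output."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    collected: List[Tuple[str, str]] = []

    def output_thread() -> None:
        # MROutputThread role: tail stdout and decode each "key\tvalue" line.
        for line in proc.stdout:
            key, _, value = line.rstrip("\n").partition("\t")
            collected.append((key, value))

    reader = threading.Thread(target=output_thread)
    reader.start()
    # PipeMapper.map + InputWriter role: encode each pair, write to stdin.
    for key, value in records:
        proc.stdin.write(f"{key}\t{value}\n")
    proc.stdin.close()  # EOF tells the mapper there is no more input
    reader.join()
    proc.wait()
    return collected


if __name__ == "__main__":
    print(run_pipe_mapper([sys.executable, "-c", DEMO_MAPPER_SRC],
                          [("1", "abc"), ("2", "de")]))
```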
12.
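Streaming Reduce
• -reducer <cmd|JavaClassName>
• PipeReducer
– Relies on MapReduce's internal mechanism to shuffle data to the reducer
– startOutputThread: on the first reduce call, similarly starts an MROutputThread to collect the stdout of the "reducer cmd"
– Similarly, uses an InputWriter to translate the reduce key/values and feeds them to the "reducer cmd" one pair at a time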
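One consequence of feeding the "reducer cmd" pair by pair is worth making concrete: Hadoop hands PipeReducer a key plus an iterator of values, but the executable only sees a flat stream of lines, one `key\tvalue` line per value. A minimal sketch of that flattening, assuming the default text encoding; `encode_reduce_group` and `encode_reduce_input` are hypothetical names.

```python
from typing import Iterable, Iterator, Tuple


def encode_reduce_group(key: str, values: Iterable[str],
                        sep: str = "\t") -> Iterator[str]:
    """Translate one reduce(key, values) call into lines for the reducer cmd.

    The key is repeated on every line, so the executable has to notice on
    its own where one key's run of lines ends and the next key begins.
    """
    for value in values:
        yield f"{key}{sep}{value}\n"


def encode_reduce_input(
        groups: Iterable[Tuple[str, Iterable[str]]]) -> Iterator[str]:
    """Concatenate successive groups in shuffle (sorted-key) order."""
    for key, values in groups:
        yield from encode_reduce_group(key, values)
```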
13.
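InputWriter/OutputReader
• InputWriter
– Writes <key,value> pairs to the executable's stdin in a predefined encoding
• OutputReader
– Reads the executable's stdout and decodes it into <key,value> pairs
• InputWriter + OutputReader
– Together they form the data-transfer protocol between the Java process and the map/reduce executable process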
14.
TextInputWriter/TextOutputReader
• Used by default:
– TextInputWriter, TextOutputReader
• <key,value> → key + separator + value
• Default separator: \t
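The InputWriter/OutputReader pair and its text implementation can be sketched as small Python classes, assuming the behavior described on these two slides: the writer joins key and value with the separator (tab by default) and the reader splits each output line at the first separator. Treating a line with no separator as a key with an empty value roughly follows the Hadoop Streaming documentation; the class and method names mirror the slides but are not Hadoop's Java API.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class InputWriter(ABC):
    """Encodes a <key,value> pair for the executable's stdin."""

    @abstractmethod
    def write(self, key: str, value: str) -> str: ...


class OutputReader(ABC):
    """Decodes one line of the executable's stdout into <key,value>."""

    @abstractmethod
    def read(self, line: str) -> Tuple[str, str]: ...


class TextInputWriter(InputWriter):
    def __init__(self, separator: str = "\t") -> None:
        self.separator = separator

    def write(self, key: str, value: str) -> str:
        # key + separator + value, one record per line
        return f"{key}{self.separator}{value}\n"


class TextOutputReader(OutputReader):
    def __init__(self, separator: str = "\t") -> None:
        self.separator = separator

    def read(self, line: str) -> Tuple[str, str]:
        # Split at the first separator; with no separator, partition()
        # returns the whole line as the key and an empty value.
        key, _, value = line.rstrip("\n").partition(self.separator)
        return key, value
```

Because both sides share one separator setting, a writer/reader pair configured identically round-trips any key that does not itself contain the separator.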
15.
Streaming Data Flow
16.
Streaming Combiner
• -combiner <cmd|JavaClassName>
• PipeCombiner simply extends PipeReducer; its flow is the same as PipeReducer's
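Because PipeCombiner reuses the PipeReducer flow, the same reduce executable can often double as the combiner, provided its operation gives the same result whether applied once or in stages (summing counts does; averaging does not). A small Python check of that property, using a hypothetical `sum_counts` reducer over already-sorted `word\tcount` lines and two made-up map-task outputs:

```python
from itertools import groupby
from typing import Iterable, List


def sum_counts(lines: Iterable[str]) -> List[str]:
    """Reducer/combiner: sum the counts of adjacent equal keys."""
    pairs = (line.split("\t", 1) for line in lines)
    return [f"{word}\t{sum(int(c) for _, c in grp)}"
            for word, grp in groupby(pairs, key=lambda kv: kv[0])]


# Two map tasks' sorted outputs; each is combined locally first.
map_task_1 = ["a\t1", "a\t1", "b\t1"]
map_task_2 = ["a\t1", "c\t1"]
combined = sum_counts(map_task_1) + sum_counts(map_task_2)

# The reducer sees the merged, re-sorted combiner output.
with_combiner = sum_counts(sorted(combined))
without_combiner = sum_counts(sorted(map_task_1 + map_task_2))
```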
17.
Streaming Partitioner
• -partitioner <javaClassName>
• For now, the partitioner must be a Java class
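Since the partitioner must be a Java class, a Streaming job typically relies on Hadoop's default hash partitioning. The Python sketch below imitates that scheme under stated assumptions: the key is a `Text` whose hash is computed over its UTF-8 bytes as `h = 31*h + b`, starting from 1, with Java's signed bytes and 32-bit overflow, and the partition is `(hash & Integer.MAX_VALUE) % numReduceTasks`. Treat it as an illustration of why equal keys always meet at the same reducer, not as a byte-exact reproduction of Hadoop.

```python
def java_int(x: int) -> int:
    """Wrap an integer to Java's signed 32-bit range."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x


def text_hash(key: str) -> int:
    """Assumed Text-style hash: h = 31*h + signed byte, starting at h = 1."""
    h = 1
    for b in key.encode("utf-8"):
        signed = b - 256 if b > 127 else b  # Java bytes are signed
        h = java_int(31 * h + signed)
    return h


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """(hash & Integer.MAX_VALUE) % numReduceTasks."""
    return (text_hash(key) & 0x7FFFFFFF) % num_reduce_tasks
```

The result depends only on the key bytes and the number of reduce tasks, which is what guarantees that every record with a given key reaches the same reducer.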
18.
Streaming I/O Format
• -inputformat <javaClassName>
– JobConf.setInputFormat()
• -outputformat <javaClassName>
– JobConf.setOutputFormat()
• -inputreader <javaClassName>:
– Uses StreamInputFormat as the InputFormat
19.
Streaming IO Spec
• TextInputWriter/TextOutputReader:
– stream.map/reduce.output.field.separator: the separator used in the output of the map/reduce executable
– stream.map/reduce.input.field.separator: the separator used in the input of the map/reduce executable
– stream.num.map/reduce.output.key.fields: the separator splits a line into multiple fields, and this setting specifies how many leading fields form the key
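The key/value split controlled by these properties can be sketched as follows, assuming the behavior described in the Hadoop Streaming documentation: the key is everything before the N-th separator (N being `stream.num.map.output.key.fields`), the value is the rest, and a line with fewer than N separators becomes a key with an empty value. `split_key_value` is a hypothetical helper, not Hadoop code.

```python
from typing import Tuple


def split_key_value(line: str, separator: str = "\t",
                    num_key_fields: int = 1) -> Tuple[str, str]:
    """Split an executable's output line into (key, value)."""
    if num_key_fields < 1:
        raise ValueError("num_key_fields must be >= 1")
    line = line.rstrip("\n")
    start, pos = 0, -1
    for _ in range(num_key_fields):
        pos = line.find(separator, start)
        if pos == -1:
            return line, ""  # fewer separators than key fields
        start = pos + len(separator)
    # Key: the first N fields (separators between them kept); value: the rest.
    return line[:pos], line[start:]
```

For example, with separator `.` and four key fields, `10.1.2.3.rest.more` splits into the key `10.1.2.3` and the value `rest.more`.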
20.
Streaming IO Spec
• -io text|rawbytes|typedbytes
– text: TextInputWriter/TextOutputReader
– rawbytes: RawBytesInputWriter/RawBytesOutputReader
– typedbytes: TypedBytesInputWriter/TypedBytesOutputReader
– The option is resolved by IdentifierResolver
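The `rawbytes` mode replaces line-oriented text with binary framing, so tabs and newlines inside keys or values no longer break the protocol. A sketch of such framing, assuming each key and each value is written as a 4-byte big-endian length followed by the raw bytes (the layout Java's `DataOutput.writeInt` would produce). The function names are hypothetical, and `typedbytes`, which additionally tags each item with a type code, is not shown.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple

_LEN = struct.Struct(">i")  # 4-byte big-endian signed int, like writeInt


def write_raw_pair(out: BinaryIO, key: bytes, value: bytes) -> None:
    """Frame one pair as <len><key bytes><len><value bytes>."""
    for item in (key, value):
        out.write(_LEN.pack(len(item)))
        out.write(item)


def _read_exact(inp: BinaryIO, n: int) -> bytes:
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("truncated raw-bytes stream")
    return data


def read_raw_pairs(inp: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (key, value) pairs until a clean end of stream."""
    while True:
        header = inp.read(_LEN.size)
        if not header:
            return  # clean EOF between records
        if len(header) != _LEN.size:
            raise EOFError("truncated length prefix")
        key = _read_exact(inp, _LEN.unpack(header)[0])
        (value_len,) = _LEN.unpack(_read_exact(inp, _LEN.size))
        yield key, _read_exact(inp, value_len)
```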
21.
User-Defined IO Spec
• MyInputWriter/MyOutputReader
– extend InputWriter/OutputReader
• MyIdentifierResolver
– extends IdentifierResolver
– resolves the identifier for MyInputWriter/MyOutputReader
– -Dstream.io.identifier.resolver.class=MyIdentifierResolver
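The resolver's job amounts to a lookup from an `-io` identifier to a writer/reader pair, and a custom resolver extends that lookup. A Python sketch of the idea, with a subclass registering one extra identifier; the identifier `my`, the table layout and the method shown are hypothetical, and only the general shape of IdentifierResolver is imitated, not its Java signatures.

```python
from typing import Dict, Tuple


class IdentifierResolver:
    """Maps an -io identifier to (input writer name, output reader name)."""

    def __init__(self) -> None:
        self.table: Dict[str, Tuple[str, str]] = {
            "text": ("TextInputWriter", "TextOutputReader"),
            "rawbytes": ("RawBytesInputWriter", "RawBytesOutputReader"),
            "typedbytes": ("TypedBytesInputWriter", "TypedBytesOutputReader"),
        }

    def resolve(self, identifier: str) -> Tuple[str, str]:
        try:
            return self.table[identifier]
        except KeyError:
            raise ValueError(f"unknown -io identifier: {identifier}") from None


class MyIdentifierResolver(IdentifierResolver):
    """Custom resolver: keeps the built-ins and adds one new identifier."""

    def __init__(self) -> None:
        super().__init__()
        self.table["my"] = ("MyInputWriter", "MyOutputReader")
```

Subclassing rather than replacing the table keeps `text`, `rawbytes` and `typedbytes` working alongside the custom identifier.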
22.
Debug Streaming
• -mapdebug/-reducedebug
– When a map/reduce task fails, runs the debug script
– $script $stdout $stderr $syslog $jobconf
• -debug
– When the job finishes, /tmp/${user.name}/streamjob.jar is not deleted
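Given the invocation `$script $stdout $stderr $syslog $jobconf`, the debug script receives four file paths. A minimal Python debug script under that assumption might print the last lines of the failed task's stderr; the choice of what to print, the script name `debug.py` and the `tail`/`main` helpers are illustrative only.

```python
import sys
from collections import deque
from typing import List


def tail(path: str, n: int = 20) -> List[str]:
    """Return the last n lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


def main(argv: List[str]) -> int:
    # Assumed argument order: stdout, stderr, syslog, jobconf file paths.
    if len(argv) != 4:
        print("usage: debug.py STDOUT STDERR SYSLOG JOBCONF", file=sys.stderr)
        return 2
    _stdout_path, stderr_path, _syslog_path, _jobconf_path = argv
    print(f"--- last lines of {stderr_path} ---")
    for line in tail(stderr_path):
        print(line)
    return 0


if __name__ == "__main__" and len(sys.argv) == 5:
    sys.exit(main(sys.argv[1:]))
```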
23.
Streaming vs. Hadoop Pipes
• stdin/stdout vs. socket
• Pipes has a fixed I/O interface, declared in the headers under $HADOOP_HOME/c++/$PLATFORM/include:
– HadoopPipes::Mapper::map(MapContext& context)
– HadoopPipes::Reducer::reduce(ReduceContext& context)
• Performance: is one better than the other?
24.
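Streaming vs. Hadoop Pipes
• Very similar in implementation (Streaming vs. Pipes):
– PipeMapper/PipeReducer vs. PipesMapper/PipesReducer
– InputWriter/OutputReader vs. Application
– Any executable vs. a Pipes client, which must link against the C++ library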
26.
The End
Thank You Very Much!
chiangbing@gmail.com