
Getting Started on Hadoop


Hadoop Streaming

One way to approach MapReduce jobs in Hadoop is to use streaming.
In other words, the mapper and reducer can be any script or executable
that runs from a command line, reads its input from stdin, and writes
its output to stdout:
http://hadoop.apache.org/common/docs/current/streaming.html#Hadoop+Streaming


The following examples use Python scripts for Hadoop Streaming. One
major benefit of this approach is that you can develop, test, and debug
your MapReduce code on small data sets from a command line, simply by
using pipes:


cat input.txt | ./mapper.py | sort | ./reducer.py
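
As a rough sketch of the kind of script such a pipeline could run, here is a minimal word count. The talk's own mapper and reducer are not reproduced here, so everything below is an assumption for illustration: the word-count task, the function names `map_lines` and `reduce_lines`, the tab-separated record format, and the convention of passing `map` or `reduce` as the first argument so that one file can play both roles.

```python
#!/usr/bin/env python
# Illustrative sketch only: a word-count mapper and reducer for streaming.
# The task, names, and "map"/"reduce" argument are assumptions, not the
# scripts from the talk.
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one "word<TAB>1" record for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word.lower(), 1)


def reduce_lines(sorted_lines):
    """Sum the counts for each word; records must arrive sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical convention: the first argument selects the role.
    step = {"map": map_lines, "reduce": reduce_lines}[sys.argv[1]]
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as wordcount.py, it can be tried on a small file in the same piped style: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. The `sort` step stands in for Hadoop's shuffle, which is why the reducer can assume that all records for a given word arrive next to each other.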

BTW, there are much better ways to handle Hadoop Streaming in Python
on Elastic MapReduce – for example, using the “boto” library. However,
these examples are kept simple so they’ll fit into a tech talk!

Published in: Technology
