
“WordCount”, a “Hello World” for Getting Started on Hadoop


“WordCount”, a “Hello World” for MapReduce

Definition: count how often each word appears within a collection
of text documents.
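As a point of reference before any parallelism, the definition above can be sketched as a sequential baseline. This is a minimal sketch, not Hadoop code. The function name `word_count` is hypothetical, and the tokenization rule (lowercase, runs of `\w` characters count as words) is an assumption.

```python
import re
from collections import Counter

def word_count(documents):
    """Count how often each word appears across all documents (sequential baseline)."""
    counts = Counter()
    for text in documents:
        # Assumed tokenization: lowercase the text, treat runs of \w characters as words.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts

docs = ["the quick brown fox", "The lazy dog and the fox"]
print(word_count(docs).most_common(2))  # [('the', 3), ('fox', 2)]
```

A MapReduce version computes the same result. It splits the loop body into independent per-document work and a final merge of the counts.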

A simple program that makes a good test case for what
MapReduce can do, since it incorporates:

• a minimal amount of code
• document feature extraction (where words are “terms”)
• symbolic and numeric values
• potential use of a combiner
• bipartite graph of (doc, term) tuples
• not so many steps away from useful indexing…
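The phases implied by the list above (map, optional combiner, shuffle/group by key, reduce) can be simulated in a single Python process. This is an illustrative sketch under stated assumptions, not Hadoop's Java `Mapper`/`Reducer` API. The function names are hypothetical, each document stands in for one input split, and tokenization is the same assumed `\w+` rule.

```python
import re
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Mapper (hypothetical): emit a (term, 1) pair for every token in one document."""
    for term in re.findall(r"\w+", text.lower()):
        yield term, 1

def combine(pairs):
    """Combiner (hypothetical): pre-sum counts locally before they are shuffled."""
    local = defaultdict(int)
    for term, n in pairs:
        local[term] += n
    return list(local.items())

def shuffle(mapped_outputs):
    """Shuffle: group every emitted value by its key (the term)."""
    groups = defaultdict(list)
    for term, n in chain.from_iterable(mapped_outputs):
        groups[term].append(n)
    return groups

def reduce_phase(term, values):
    """Reducer (hypothetical): sum all partial counts for one term."""
    return term, sum(values)

def mapreduce_word_count(documents, use_combiner=True):
    # Assumption: each document plays the role of one input split with its own mapper.
    mapped = []
    for doc_id, text in documents.items():
        pairs = list(map_phase(doc_id, text))
        mapped.append(combine(pairs) if use_combiner else pairs)
    grouped = shuffle(mapped)
    return dict(reduce_phase(t, vs) for t, vs in sorted(grouped.items()))

docs = {"d1": "the quick brown fox", "d2": "The lazy dog and the fox"}
print(mapreduce_word_count(docs))
# {'and': 1, 'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because addition is associative and commutative, running with or without the combiner gives the same totals. The combiner only shrinks the data that has to be shuffled. `doc_id` is unused here. Keying the mapper's output on `(doc_id, term)` instead would produce per-document counts, i.e. the weighted (doc, term) bipartite graph that an inverted index is built from.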
If a framework can run “WordCount” in parallel at scale, it can
handle much larger, more interesting compute problems as well.

Published in: Technology
