How Does MapReduce Work?
map(k1, v1) → list(k2, v2)
reduce(k2, list(v2)) → list(v3)
The property of data independence among tasks allows for highly
parallel processing… maybe, if the stars are all aligned :)
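To make the two signatures concrete, here is a minimal single-process sketch of a word count. The names map_fn, reduce_fn, and run_mapreduce are hypothetical and chosen only for illustration. The sketch assumes everything fits in memory and leaves out distribution and fault tolerance entirely. It only shows that each map call touches a single input record, and each reduce call a single key, which is the independence that makes parallelism possible.

```python
from collections import defaultdict
from itertools import chain


def map_fn(doc_id, text):
    # map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in one document.
    return [(word, 1) for word in text.lower().split()]


def reduce_fn(word, counts):
    # reduce(k2, list(v2)) -> list(v3): collapse all counts for one word.
    return [sum(counts)]


def run_mapreduce(inputs, map_fn, reduce_fn):
    # Map phase: every call depends only on its own (k1, v1) pair,
    # so a real framework could run these calls on different machines.
    mapped = [map_fn(k1, v1) for k1, v1 in inputs]

    # Shuffle: group all intermediate values by their key k2.
    groups = defaultdict(list)
    for k2, v2 in chain.from_iterable(mapped):
        groups[k2].append(v2)

    # Reduce phase: every call depends only on one key and its values.
    return {k2: reduce_fn(k2, v2s) for k2, v2s in groups.items()}


docs = [("d1", "big iron"), ("d2", "commodity iron")]
result = run_mapreduce(docs, map_fn, reduce_fn)
print(result)
```

Note that the reduce phase here starts only after the whole mapped list has been built and grouped.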
A MapReduce framework is largely about fault tolerance, and
how to leverage “commodity hardware” to replace “big iron” solutions…
That phrase “big iron” might apply to Oracle + NetApp. Or perhaps an
IBM zSeries mainframe… Or something – expensive, undoubtedly.
Bonus question for self-admitted math geeks: Do you foresee any concerns
about O(n) complexity, given the functional definitions listed above?
Keep in mind that each phase cannot conclude and progress to the
next phase until after each of its tasks has successfully completed.
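One consequence of that barrier can be sketched with a toy scheduler: the wall-clock time of a phase is set by its slowest task, not by the average. The function phase_wall_time and the task durations below are hypothetical. The sketch assumes tasks are handed out in order to whichever worker frees up first, and it ignores failures and retries.

```python
import heapq


def phase_wall_time(task_durations, workers):
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    for duration in task_durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + duration)
    # The phase ends only when the last task finishes (the barrier).
    return max(free_at)


even = phase_wall_time([1, 1, 1, 1], workers=4)        # 1
straggler = phase_wall_time([1, 1, 1, 10], workers=4)  # 10
print(even, straggler)
```

In the second call, three workers sit idle for nine time units while the single slow task finishes, and the next phase cannot begin until it does.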