2. •A software framework that lets one easily write
and run applications that process vast
amounts of data.
•Created by Doug Cutting and Mike Cafarella
in 2005.
•Cutting named the project after his son's
toy elephant.
6. Collection of common utilities and libraries
that support the other Hadoop modules
Assumes hardware failures are common and
should be handled automatically in
software by the Hadoop framework
Also known as Hadoop Core
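These shared utilities are what the other modules build on. Below is a minimal sketch, assuming only hadoop-common on the classpath, of the Configuration class from Hadoop Common loading cluster settings; the fallback value is an illustrative assumption:

```java
import org.apache.hadoop.conf.Configuration;

public class CommonConfigDemo {
    public static void main(String[] args) {
        // Configuration lives in Hadoop Common; it loads
        // core-default.xml and core-site.xml from the classpath.
        Configuration conf = new Configuration();

        // Read the default file-system URI (e.g. an HDFS NameNode);
        // the fallback "file:///" here is just an illustrative default.
        String fsUri = conf.get("fs.defaultFS", "file:///");
        System.out.println("Default file system: " + fsUri);
    }
}
```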
7. Hadoop Distributed File System (HDFS)
Developed as part of the Apache Hadoop project
Stores large amounts of data spread across
multiple machines (thousands of simultaneously
connected nodes)
Replicates data and stores the copies on
different nodes
Organized as clusters
Written in Java
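A minimal sketch of writing a replicated file through the HDFS FileSystem API; the NameNode URI, file path, and replication factor below are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; in practice this comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/demo/hello.txt"); // hypothetical path

        // Create the file with a replication factor of 3, so HDFS
        // keeps three copies of each block on different nodes.
        try (FSDataOutputStream out = fs.create(path, (short) 3)) {
            out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```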
9. The component of Hadoop responsible for
resource management in Big Data analysis (YARN).
A cluster-management platform that allocates
resources to particular applications and
schedules their tasks.
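A minimal sketch of asking the ResourceManager which applications it is currently tracking, via the YarnClient API; it assumes a reachable cluster configured through yarn-site.xml:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnListApps {
    public static void main(String[] args) throws Exception {
        // YarnConfiguration picks up yarn-site.xml from the classpath.
        YarnConfiguration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for the applications it is tracking.
        List<ApplicationReport> apps = yarnClient.getApplications();
        for (ApplicationReport app : apps) {
            System.out.println(app.getApplicationId() + " : " + app.getName());
        }

        yarnClient.stop();
    }
}
```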
10. Map function: distributes the work to the
different nodes of the distributed cluster
Reduce function: collects the nodes' results
and reduces them into one final result
Main advantage: fault tolerance
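The classic word-count job illustrates both functions; below is a minimal sketch against the org.apache.hadoop.mapreduce API, with class names chosen for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map: each node scans its split of the input and emits (word, 1).
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: all counts for the same word arrive together and are
    // folded into one final total.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

A Job driver would wire these classes together with input and output paths and submit the job to the cluster. The fault tolerance comes from the framework, which re-runs failed map or reduce tasks on other nodes; the functions themselves need no failure-handling logic.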
12. Computational cluster
Runs Hadoop software on low-cost commodity
computers
The systems (nodes) in a cluster share only
the network and nothing else (a shared-nothing
architecture)
Boosts processing speed
Highly scalable
Highly resistant to failure
13. J. Dean and S. Ghemawat, "MapReduce:
Simplified Data Processing on Large Clusters,"
OSDI 2004. (Google)
D. Cutting and E. Baldeschwieler, "Meet
Hadoop," OSCON, Portland, OR, USA, 25 July
2007. (Yahoo!)
R. E. Bryant, "Data-Intensive Scalable
Computing: The Case for DISC," Tech Report
CMU-CS-07-128, http://www.cs.cmu.edu/~bryant