2. •A software framework that lets one easily write
and run applications that process vast
amounts of data.
•Created by Doug Cutting and Mike Cafarella
in 2005.
•Cutting named the project after his son's
toy elephant.
6. Collection of common utilities and libraries
that support the other Hadoop modules
Assumes hardware failures are common and
should be handled automatically in
software by the Hadoop framework
Also known as Hadoop Core
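These shared utilities are what the other modules build on. Below is a minimal sketch, assuming only hadoop-common on the classpath, of the Configuration class from Hadoop Common loading cluster settings; the fallback value is an illustrative assumption:

```java
import org.apache.hadoop.conf.Configuration;

public class CommonConfigDemo {
    public static void main(String[] args) {
        // Configuration lives in Hadoop Common; it loads
        // core-default.xml and core-site.xml from the classpath.
        Configuration conf = new Configuration();

        // Read the default file-system URI (e.g. an HDFS NameNode);
        // the fallback "file:///" here is just an illustrative default.
        String fsUri = conf.get("fs.defaultFS", "file:///");
        System.out.println("Default file system: " + fsUri);
    }
}
```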
7. Hadoop Distributed File System (HDFS)
Developed as part of the Apache Hadoop project
Stores large amounts of data spread across
multiple machines (thousands of simultaneously
connected nodes)
Replicates data and stores the copies on
different nodes
Organized as clusters
Written in Java
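A minimal sketch of writing a replicated file through the HDFS FileSystem API; the NameNode URI, file path, and replication factor below are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; in practice this comes from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/demo/hello.txt"); // hypothetical path

        // Create the file with a replication factor of 3, so HDFS
        // keeps three copies of each block on different nodes.
        try (FSDataOutputStream out = fs.create(path, (short) 3)) {
            out.write("Hello, HDFS".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```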
9. The component of Hadoop responsible for
resource management in Big Data analysis (YARN).
A cluster-management platform that allocates
resources to particular applications and
schedules their tasks.
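A minimal sketch of asking the ResourceManager which applications it is currently tracking, via the YarnClient API; it assumes a reachable cluster configured through yarn-site.xml:

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnListApps {
    public static void main(String[] args) throws Exception {
        // YarnConfiguration picks up yarn-site.xml from the classpath.
        YarnConfiguration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager for the applications it is tracking.
        List<ApplicationReport> apps = yarnClient.getApplications();
        for (ApplicationReport app : apps) {
            System.out.println(app.getApplicationId() + " : " + app.getName());
        }

        yarnClient.stop();
    }
}
```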
10. Map function: distributes the work to the
different nodes of the distributed cluster
Reduce function: collects the nodes' results
and reduces them into one final result
Main advantage: fault tolerance
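The classic word-count job illustrates both functions; below is a minimal sketch against the org.apache.hadoop.mapreduce API, with class names chosen for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map: each node scans its split of the input and emits (word, 1).
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: all counts for the same word arrive together and are
    // folded into one final total.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```

A Job driver would wire these classes together with input and output paths and submit the job to the cluster. The fault tolerance comes from the framework, which re-runs failed map or reduce tasks on other nodes; the functions themselves need no failure-handling logic.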
12. Computational cluster
Runs Hadoop software on low-cost commodity
computers
The systems (nodes) in a cluster share only
the network and nothing else (a shared-nothing
architecture)
Boosts processing speed
Highly scalable
Highly resistant to failure
13. J. Dean and S. Ghemawat, "MapReduce:
Simplified Data Processing on Large Clusters,"
OSDI 2004. (Google)
D. Cutting and E. Baldeschwieler, "Meet
Hadoop," OSCON, Portland, OR, USA, 25 July
2007. (Yahoo!)
R. E. Bryant, "Data-Intensive Scalable
Computing: The Case for DISC," Tech Report
CMU-CS-07-128, http://www.cs.cmu.edu/~bryant