3. Big Data
Huge amount of data (terabytes or petabytes)
Big data is the term for a collection of data sets so large
and complex that it becomes difficult to process using
on-hand database management tools or traditional data
processing applications (e.g. MySQL, Oracle).
The challenges include capture, storage, search, sharing,
transfer, analysis, and visualization.
5. Hadoop
Apache Hadoop is a framework that allows the distributed
processing of large data sets across clusters of
commodity hardware using a simple programming model.
It is an open-source data management framework with
scale-out storage and distributed processing.
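The "simple programming model" referred to above is MapReduce. As a rough illustration only (plain Python, not the Hadoop API; the function names `map_fn` and `reduce_fn` are made up for this sketch), the classic word-count job looks like this:

```python
from collections import defaultdict

def map_fn(line):
    # Map phase: emit a (word, 1) pair for every word in an input line.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce phase: sum all the counts emitted for one key.
    return word, sum(counts)

def word_count(lines):
    # The framework normally does the grouping ("shuffle") between
    # map and reduce; here it is simulated with a dictionary.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_fn(line):
            grouped[word].append(one)
    return dict(reduce_fn(w, c) for w, c in grouped.items())
```

In real Hadoop the map and reduce functions run in parallel across the cluster; the programmer only supplies the two functions.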
9. Block Split
It is the physical division of a data file performed by
HDFS while storing it.
The default block size is 128 MB in Hadoop 2.0.
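A quick sketch of the arithmetic (illustrative only; HDFS does this internally when a file is written):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the Hadoop 2.0 default

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    # Every block is full-sized except possibly the last one,
    # which holds whatever remains of the file.
    blocks = []
    offset = 0
    while offset < file_size:
        blocks.append(min(block_size, file_size - offset))
        offset += blocks[-1]
    return blocks
```

So a 300 MB file becomes three blocks: 128 MB + 128 MB + 44 MB.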
10. Rack Awareness
HDFS stores blocks on the cluster in a rack-aware
fashion, i.e. one block on one rack and the other
two blocks on another rack.
Block Replication in HDFS
Provides redundancy and fault tolerance for the
saved data. The default replication factor is 3.
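The placement rule above can be sketched as follows (a simplified model, not the actual NameNode code; the rack and node names are invented for the example):

```python
def place_replicas(racks):
    # racks: {rack_name: [node, ...]}; assumes at least two racks,
    # and at least two nodes on the second rack.
    # Simplified default policy: replica 1 on the "local" rack,
    # replicas 2 and 3 together on a different rack.
    rack_names = sorted(racks)
    local, remote = rack_names[0], rack_names[1]
    return [(local, racks[local][0]),
            (remote, racks[remote][0]),
            (remote, racks[remote][1])]
```

Spreading replicas over two racks means a single rack failure (e.g. a switch outage) cannot destroy all copies of a block.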
13. RM-Resource Manager
1. It is the global resource scheduler.
2. It runs on the Master Node of the cluster.
3. It is responsible for negotiating the resources of
the system amongst the competing applications.
4. It keeps track of the heartbeats from the Node
Managers.
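Point 4 above can be sketched as a small liveness check (a conceptual model only, with an invented 10-second timeout; YARN's actual expiry interval is configurable):

```python
def live_nodes(last_heartbeat, now, timeout=10.0):
    # last_heartbeat: {node: timestamp of its most recent heartbeat}.
    # A node is considered live if it has sent a heartbeat within
    # `timeout` seconds; otherwise the RM treats it as lost.
    return {n for n, t in last_heartbeat.items() if now - t <= timeout}
```

When a Node Manager misses its heartbeats, the Resource Manager stops scheduling new containers on that node.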
14. NM-Node Manager
1. The Node Manager communicates with the Resource
Manager.
2. It runs on the Slave Nodes of the cluster.
AM-Application Master
1. There is one AM per application, which is
application specific or framework specific.
2. The AM runs in containers that are created by
the Resource Manager on request.
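The request/grant flow between an Application Master and the Resource Manager can be sketched like this (a toy model tracking only memory, with invented application IDs; real YARN also accounts for vcores, queues, and locality):

```python
def allocate_containers(free_memory_mb, requests):
    # requests: list of (app_id, memory_mb) container requests.
    # A request is granted when enough free memory remains;
    # otherwise it waits for resources to be released.
    granted, waiting = [], []
    for app_id, mem in requests:
        if mem <= free_memory_mb:
            free_memory_mb -= mem
            granted.append((app_id, mem))
        else:
            waiting.append((app_id, mem))
    return granted, waiting
```

This is the negotiation mentioned under the Resource Manager: competing applications ask for containers, and the scheduler grants what the cluster can currently hold.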