More about Hadoop
www.beinghadoop.com
https://www.facebook.com/hadoopinfo
This PPT gives information about:
What is Hadoop?
What is big data?
How is data loaded into HDFS?
Hadoop architecture
How is a user request processed in a Hadoop cluster?
2. Big Data:
Big data is an all-encompassing term for any collection of data
sets so large and complex that it becomes difficult to process
them using on-hand data management tools or traditional data
processing applications.
In short, big data is data too large to process using traditional
methods; it is measured in terabytes, petabytes, or even exabytes.
The data can be structured, semi-structured, or unstructured.
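The three forms named above can be sketched with a small illustration (not from the slides; the sample values are hypothetical): structured data fits a fixed schema, semi-structured data is self-describing but flexible, and unstructured data has no schema at all.

```python
import csv
import io
import json

# Structured: fixed schema, like a row in a relational table or CSV file
structured = io.StringIO("id,name,amount\n1,alice,250\n2,bob,125\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing, schema can vary record to record (JSON)
semi = json.loads('{"id": 3, "name": "carol", "tags": ["vip", "2024"]}')

# Unstructured: free text with no schema; only generic processing applies
unstructured = "Customer called about a delayed shipment; sounded frustrated."

print(rows[0]["name"])            # field access via the fixed schema
print(semi["tags"])               # nested, optional fields
print(len(unstructured.split()))  # e.g. a simple word count
```

Traditional tools handle the first form well; Hadoop was built to also cope with the second and third at scale.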
3. BIG DATA CAN BE
1. Petabytes/exabytes of data,
2. Millions/billions of people,
3. Billions/trillions of records,
4. Loosely structured and often distributed data,
5. Flat schemas with few complex interrelationships,
6. Often involving time-stamped events,
7. Often made up of incomplete data,
8. Often including connections between data elements that must be probabilistically inferred.
4. DATA REPRESENTATION
1 Byte = 8 bits
1 Kilobyte (KB) = 1024 bytes
1 Megabyte (MB) = 1024 Kilobytes (~1,000,000 bytes)
1 Gigabyte (GB) = 1024 Megabytes (~1,000,000,000 bytes)
1 Terabyte (TB) = 1024 Gigabytes (~1,000,000,000,000 bytes)
1 Petabyte (PB) = 1024 Terabytes (~1,000,000,000,000,000 bytes)
1 Exabyte (EB) = 1024 Petabytes (~1,000,000,000,000,000,000 bytes)
1 Zettabyte (ZB) = 1024 Exabytes (~1,000,000,000,000,000,000,000 bytes)
1 Yottabyte (YB) = 1024 Zettabytes (~1,000,000,000,000,000,000,000,000 bytes)
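The table above can be reproduced with a little arithmetic: each unit is 1024 times the previous one, so the n-th unit holds 1024**n bytes. A minimal sketch (the `unit_bytes` helper is illustrative, not part of any Hadoop API):

```python
# Binary (1024-based) sizes for each unit in the table above
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def unit_bytes(unit: str) -> int:
    """Exact number of bytes in one of the given unit (1024-based)."""
    return 1024 ** UNITS.index(unit)

for u in UNITS:
    print(f"1 {u} = {unit_bytes(u):,} bytes")

# e.g. a 3 TB data set occupies 3 * 1024**4 bytes
three_tb = 3 * unit_bytes("TB")
```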
5. RDBMS vs. HADOOP

              RDBMS                       HADOOP
DATA SIZE     Gigabytes                   Petabytes
ACCESS        Interactive and batch       Batch
UPDATE        Read and write many times   Write once, read many times
STRUCTURE     Static schema               Dynamic schema
INTEGRITY     High                        Low
SCALING       Non-linear                  Linear
9. APACHE HADOOP:
Apache Hadoop is a scalable framework for storing and processing
data on a cluster of commodity hardware nodes. Hadoop is designed
to scale from a single node to thousands of nodes. Hadoop has two
main components: a computing framework (MapReduce) and the Hadoop
Distributed File System (HDFS). HDFS uses commodity server nodes
with JBOD (Just a Bunch Of Disks) storage drives to store the data
and provide large aggregate I/O bandwidth to it.
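Hadoop's computing framework follows the MapReduce model: a map phase emits key-value pairs, the framework shuffles them by key, and a reduce phase combines each key's values. A minimal in-memory sketch of that model (plain Python, not the Hadoop API) using the classic word-count example:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for each word in an input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into the final count
    return {key: sum(values) for key, values in groups.items()}

lines = ["Hadoop stores data in HDFS", "Hadoop processes data in parallel"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts["hadoop"])  # 2
```

On a real cluster the same three steps run in parallel across many nodes, with HDFS holding the input splits and the results.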
11. Hadoop Use cases
MANUFACTURING:
Use Apache Hadoop to Increase Production, Reduce Costs & Improve Quality
Assure Just-In-Time Delivery of Raw Materials
Control Quality with Real-Time & Historical Assembly Line Data
Avoid Stoppages with Proactive Equipment Maintenance
Increase Yields in Drug Manufacturing
12. HEALTH CARE:
Use Apache Hadoop to Save Lives While Delivering More Efficient Care
Access Genomic Data for Medical Trials
Monitor Patient Vitals in Real-Time
Track Equipment and Medicines with RFID Data
Improve Prescription Adherence

RETAIL:
Build a 360° View of the Customer
Analyze Brand Sentiment
Localize & Personalize Promotions
Optimize Websites
Optimize Store Layouts
13. TELECOM:
Use Apache Hadoop to Improve Service & Launch New Products
Analyze Call Detail Records (CDRs)
Service Equipment Proactively
Rationalize Infrastructure Investments
Recommend Next Product to Buy (NPTB)
Allocate Bandwidth in Real-Time
Develop New Products