More about Hadoop
www.beinghadoop.com
https://www.facebook.com/hadoopinfo
This PPT gives information about:
What is Hadoop?
What is big data?
How is data loaded into HDFS?
Hadoop architecture
How is a user request processed in a Hadoop cluster?
2. Big Data:
Big data is an all-encompassing term for any collection of data
sets so large and complex that it becomes difficult to process
them using on-hand data management tools or traditional data
processing applications.
In short, big data is data too large to process using traditional
methods; it is measured in terabytes, petabytes, or even exabytes.
The data can be structured, semi-structured, or unstructured.
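The three forms named above can be sketched with a small illustration (not from the slides; the sample values are hypothetical): structured data fits a fixed schema, semi-structured data is self-describing but flexible, and unstructured data has no schema at all.

```python
import csv
import io
import json

# Structured: fixed schema, like a row in a relational table or CSV file
structured = io.StringIO("id,name,amount\n1,alice,250\n2,bob,125\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing, schema can vary record to record (JSON)
semi = json.loads('{"id": 3, "name": "carol", "tags": ["vip", "2024"]}')

# Unstructured: free text with no schema; only generic processing applies
unstructured = "Customer called about a delayed shipment; sounded frustrated."

print(rows[0]["name"])            # field access via the fixed schema
print(semi["tags"])               # nested, optional fields
print(len(unstructured.split()))  # e.g. a simple word count
```

Traditional tools handle the first form well; Hadoop was built to also cope with the second and third at scale.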
3. BIG DATA CAN BE
1. Petabytes/exabytes of data,
2. Millions/billions of people,
3. Billions/trillions of records,
4. Loosely structured and often distributed data,
5. Flat schemas with few complex interrelationships,
6. Often involving time-stamped events,
7. Often made up of incomplete data,
8. Often including connections between data elements that must be probabilistically inferred.
4. DATA REPRESENTATION
1 Byte = 8 bits
1 Kilobyte (KB) = 1024 bytes
1 Megabyte (MB) = 1024 Kilobytes (~1,000,000 bytes)
1 Gigabyte (GB) = 1024 Megabytes (~1,000,000,000 bytes)
1 Terabyte (TB) = 1024 Gigabytes (~1,000,000,000,000 bytes)
1 Petabyte (PB) = 1024 Terabytes (~1,000,000,000,000,000 bytes)
1 Exabyte (EB) = 1024 Petabytes (~1,000,000,000,000,000,000 bytes)
1 Zettabyte (ZB) = 1024 Exabytes (~1,000,000,000,000,000,000,000 bytes)
1 Yottabyte (YB) = 1024 Zettabytes (~1,000,000,000,000,000,000,000,000 bytes)
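The table above can be reproduced with a little arithmetic: each unit is 1024 times the previous one, so the n-th unit holds 1024**n bytes. A minimal sketch (the `unit_bytes` helper is illustrative, not part of any Hadoop API):

```python
# Binary (1024-based) sizes for each unit in the table above
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def unit_bytes(unit: str) -> int:
    """Exact number of bytes in one of the given unit (1024-based)."""
    return 1024 ** UNITS.index(unit)

for u in UNITS:
    print(f"1 {u} = {unit_bytes(u):,} bytes")

# e.g. a 3 TB data set occupies 3 * 1024**4 bytes
three_tb = 3 * unit_bytes("TB")
```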
5. RDBMS vs. HADOOP

              RDBMS                       HADOOP
DATA SIZE     Gigabytes                   Petabytes
ACCESS        Interactive and batch       Batch
UPDATE        Read and write many times   Write once, read many times
STRUCTURE     Static schema               Dynamic schema
INTEGRITY     High                        Low
SCALING       Non-linear                  Linear
9. APACHE HADOOP:
Apache Hadoop is a scalable framework for storing and processing
data on a cluster of commodity hardware nodes. Hadoop is designed
to scale from a single node to thousands of nodes. Hadoop has two
main components: a computing framework (MapReduce) and the Hadoop
Distributed File System (HDFS). HDFS uses commodity server nodes
with JBOD (Just a Bunch Of Disks) storage drives to store the data
and provide large aggregate I/O bandwidth to it.
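Hadoop's computing framework follows the MapReduce model: a map phase emits key-value pairs, the framework shuffles them by key, and a reduce phase combines each key's values. A minimal in-memory sketch of that model (plain Python, not the Hadoop API) using the classic word-count example:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for each word in an input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into the final count
    return {key: sum(values) for key, values in groups.items()}

lines = ["Hadoop stores data in HDFS", "Hadoop processes data in parallel"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts["hadoop"])  # 2
```

On a real cluster the same three steps run in parallel across many nodes, with HDFS holding the input splits and the results.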
11. Hadoop Use cases
MANUFACTURING:
Use Apache Hadoop to Increase Production, Reduce Costs & Improve Quality
Assure Just-In-Time Delivery of Raw Materials
Control Quality with Real-Time & Historical Assembly Line Data
Avoid Stoppages with Proactive Equipment Maintenance
Increase Yields in Drug Manufacturing
12. HEALTH CARE:
Use Apache Hadoop to Save Lives While Delivering More Efficient Care
Access Genomic Data for Medical Trials
Monitor Patient Vitals in Real-Time
Track Equipment and Medicines with RFID Data
Improve Prescription Adherence

RETAIL:
Build a 360° View of the Customer
Analyze Brand Sentiment
Localize & Personalize Promotions
Optimize Websites
Optimize Store Layouts
13. TELECOM:
Use Apache Hadoop to Improve Service & Launch New Products
Analyze Call Detail Records (CDRs)
Service Equipment Proactively
Rationalize Infrastructure Investments
Recommend Next Product to Buy (NPTB)
Allocate Bandwidth in Real-Time
Develop New Products