2. Index
Introduction and History
Use and Advantages
Issues and Need of Hadoop
Users of Hadoop
Framework and Architecture
HDFS Basic Concept
MapReduce
Summary
3. Introduction and History
• Apache Software Foundation Project
• Open Source - Reliable, Scalable, Distributed
Computing and Data Storage
• Concept: Moving computation is cheaper than moving large data.
History:
• Google File System paper – Oct 2003
• Google MapReduce paper – Dec 2004
• Started by Doug Cutting and Mike Cafarella in 2005
• Named by Doug Cutting at Yahoo – Feb 2006
• The name comes from the toy elephant of Doug Cutting’s son
4. Use & Advantages
• Data-intensive text processing
• Assembly of large genomes
• Graph mining
• Machine learning and data mining
• Large scale social network analysis
Advantages:
• Massive Scalability
• Flexible Schema
• Quicker/Cheaper to set up
• Consistency with high performance
Limitations:
• Gaps in analytic functionality
• Multiple copies of already-big data
• Inefficient execution and a challenging framework
5. Issues and Need of Hadoop
• 500 TB per day
• Over 170 PB
• Over 6 PB
• Getting the data to the processors becomes the bottleneck
10. HDFS Basic Concept
• HDFS works best with a smaller number of large files
o Millions, as opposed to billions, of files
o Typically 100 MB or more per file
• Files in HDFS are write-once
• Optimized for streaming reads of large files, not random reads
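To make the "small number of large files" point concrete, the sketch below (plain Python, not the HDFS API; the 128 MB block size is the common HDFS default, an assumption here) shows how HDFS splits a large file into fixed-size blocks that are then distributed across the cluster:

```python
# Illustrative sketch only -- not Hadoop code.
# HDFS stores a file as a sequence of fixed-size blocks;
# 128 MB is a commonly used default block size (assumed here).
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB in bytes

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return (block_index, block_length) pairs for a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((len(blocks), length))
        offset += length
    return blocks

# A 300 MB file becomes three blocks: 128 MB, 128 MB, and 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
```

This is why many tiny files are wasteful: each file occupies at least one block entry in the NameNode's metadata, regardless of how small it is.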
11. MapReduce Components
• JobTracker and TaskTracker
• The JobTracker splits a job into smaller tasks (“Map”) and sends them to the TaskTracker process on each node
• Each TaskTracker reports job progress back to the JobTracker, sends data (“Reduce”), or requests new tasks
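The Map/shuffle/Reduce flow described above can be sketched in plain Python (a simulation for illustration, not Hadoop's Java API) using the classic word-count example:

```python
# Illustrative simulation of the MapReduce flow -- not Hadoop code.
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # "Map": emit an intermediate (word, 1) pair for every word in the split
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group intermediate pairs by key
    # (in Hadoop, the framework does this between the two phases)
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # "Reduce": aggregate all values emitted for one key
    return (key, sum(values))

splits = ["Hadoop stores data", "Hadoop processes data"]
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
# counts == {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster, each input split is mapped on the node that stores it (moving computation to the data), and the shuffle moves only the much smaller intermediate pairs over the network.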
13. Summary
• Open-source data management with scale-out storage
• High performance while handling large and complex data
• Optimized for streaming and distributed processing