GANDHI INSTITUTE FOR TECHNOLOGICAL
ADVANCEMENT, BHUBANESWAR
TECHNICAL SEMINAR ON
HADOOP
GUIDED BY-
PROF. KUNDAN CHANDRA PATRA
PROF. SWOGAT KUMAR JENA
PROF. SAROJ KUMAR MOHANTY
PRESENTED BY-
NAME- ABHIJEET RAJ
BRANCH- CSE(1)
REG NO.- 1301287529
CONTENTS -
1. INTRODUCTION TO HADOOP
2. HADOOP-HISTORY AND ORIGIN
3. BIG DATA ANALYTICS AND CHALLENGES
4. HADOOP ECOSYSTEM
5. HDFS ARCHITECTURE
6. HADOOP VS RDBMS
7. MAP REDUCE
8. PIG AND HIVE
9. CONCLUSION
1Abhijeet raj,131001
INTRODUCTION-
• What is Hadoop-
• Apache Hadoop is an open-source software
framework for distributed storage and
processing of large data sets
• Written in Java
• Based on the Google File System (GFS) design
Continued...
• It is designed to scale up from single servers to
thousands of machines, each offering local
computation and storage.
• The Hadoop framework consists of two main layers
• HDFS
• Map Reduce
History and Origin
• Doug Cutting was working on an open-source
search engine in 2003
• Google published papers describing its distributed
systems, MapReduce and the Google File
System (GFS), which powered the Google search
engine
Continued...
• Doug Cutting took these ideas and started to
implement them as open source
• In 2006 he joined Yahoo!, and the distributed
system was named Hadoop
• Yahoo! open-sourced it through the Apache
Software Foundation
Organizations using Hadoop
• Amazon
• Adobe
• Cloudspace
• eBay
• Facebook
• Google
• IBM
• LinkedIn
• Yahoo!
Big data analytics and
challenges
• Big Data is commonly taken to start at data sets of
about 1 terabyte, although there is no formal threshold.
• The 4 V's of Big Data:
1. VOLUME- The scale of data
2. VARIETY- Different forms of data
3. VELOCITY- Analysis of streaming data
4. VERACITY- Uncertainty of data
Challenges for Big Data
processing
• Meeting the need for speed
• Scale
• Continuous Availability
• Displaying meaningful results
• Workload diversity
• Data security
• Cost
• Manageability
Hadoop vs traditional RDBMS
Factors                  Hadoop                           RDBMS
Size of data             Petabytes                        Gigabytes
Integrity of data        Low                              High
Data schema              Dynamic                          Static
Access method            Batch                            Interactive and batch
Scaling                  Linear                           Non-linear
Data structure           Unstructured/structured          Structured
Normalization of data    Not required                     Required
Query response time      Has latency (batch processing)   Can be near immediate
Hadoop Ecosystem
HDFS (Hadoop Distributed File System)
• A distributed file system designed to run on
commodity hardware
• It is suitable for distributed storage and
processing.
• The namenode and datanode have built-in web
servers that help users easily check the
status of the cluster.
• HDFS provides file permissions and
authentication.
Continued...
Namenode
• The namenode is the node that stores the filesystem
metadata, i.e. which file maps to which blocks
and which blocks are stored on which
datanodes.
Datanode
• The datanode is where the actual data resides.
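The two mappings the namenode keeps (file to blocks, block to datanodes) can be illustrated with a small in-memory sketch. This is a toy model, not the HDFS API: the class name `ToyNameNode`, the 128 MB block size, the replication factor of 3, and the round-robin placement are all assumptions made for illustration.

```python
from collections import defaultdict

BLOCK_SIZE_MB = 128  # assumed block size, for illustration only
REPLICATION = 3      # assumed replication factor, for illustration only


class ToyNameNode:
    """Toy model of namenode metadata; hypothetical, not the real HDFS API."""

    def __init__(self, datanodes):
        if len(datanodes) < REPLICATION:
            raise ValueError("need at least REPLICATION datanodes")
        self.datanodes = list(datanodes)
        self.file_to_blocks = {}                     # file path -> block ids
        self.block_to_datanodes = defaultdict(list)  # block id -> datanode names
        self._next_block = 0

    def add_file(self, path, size_mb):
        # Split the file into fixed-size blocks (ceiling division).
        n_blocks = -(-size_mb // BLOCK_SIZE_MB)
        blocks = []
        for _ in range(n_blocks):
            block_id = f"blk_{self._next_block}"
            # Place each replica on a different datanode, round-robin.
            for r in range(REPLICATION):
                index = (self._next_block + r) % len(self.datanodes)
                self.block_to_datanodes[block_id].append(self.datanodes[index])
            self._next_block += 1
            blocks.append(block_id)
        self.file_to_blocks[path] = blocks
        return blocks

    def locate(self, path):
        # Answer "which blocks make up this file, and where do they live?"
        return {b: self.block_to_datanodes[b] for b in self.file_to_blocks[path]}
```

Note that the namenode here holds only metadata; in this model the file contents themselves would live on the datanodes named in `locate()`.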
Continued...
Job tracker
• The primary functions of the job tracker are resource
management, tracking resource availability, and
task life cycle management.
Task tracker
• Follows the orders of the job tracker and
periodically updates the job tracker with its
progress status.
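The division of work between the two daemons can be sketched as a small simulation. Everything below is hypothetical (class names, slot counts, the fixed progress step of 0.5 per heartbeat); it only illustrates the pattern of periodic status reports flowing up and task assignments flowing down.

```python
class ToyTaskTracker:
    """Runs tasks and reports progress; hypothetical, not Hadoop's class."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots   # how many tasks this node can run at once
        self.running = {}    # task id -> progress in [0.0, 1.0]

    def heartbeat(self):
        # Advance every running task, then report a status snapshot.
        for task in self.running:
            self.running[task] = min(1.0, self.running[task] + 0.5)
        return {"tracker": self.name, "progress": dict(self.running)}


class ToyJobTracker:
    """Tracks resources and task life cycle; hypothetical sketch."""

    def __init__(self, tasks):
        self.pending = list(tasks)
        self.done = []

    def on_heartbeat(self, tracker, report):
        # Task life cycle: retire finished tasks, which frees their slots.
        for task, progress in report["progress"].items():
            if progress >= 1.0:
                self.done.append(task)
                del tracker.running[task]
        # Resource management: give pending tasks to any free slots.
        while self.pending and len(tracker.running) < tracker.slots:
            tracker.running[self.pending.pop(0)] = 0.0


def run(job_tracker, trackers, max_rounds=100):
    # Each round stands for one heartbeat interval across the cluster.
    rounds = 0
    while (job_tracker.pending or any(t.running for t in trackers)) and rounds < max_rounds:
        for t in trackers:
            job_tracker.on_heartbeat(t, t.heartbeat())
        rounds += 1
    return rounds
```

For example, `run(ToyJobTracker(["m1", "m2", "m3"]), [ToyTaskTracker("tt1", 1), ToyTaskTracker("tt2", 1)])` drives three tasks to completion on two single-slot trackers.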
Goals of HDFS
• Fault detection and recovery
• Support for huge datasets
• Reduced network traffic
• Increased throughput
Map Reduce
• MapReduce is a processing technique and a
programming model for distributed computing,
implemented in Java in Hadoop
• Map- data are broken into tuples (key/value pairs)
• Reduce- combines those tuples into a smaller
set of tuples
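The map and reduce steps can be illustrated with the classic word count, written here as a single-process Python sketch rather than a Hadoop job. The function names are hypothetical, and the `shuffle` step stands in for the grouping by key that the framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    # Map: break one input record into (key, value) tuples.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the tuples for one key into a single smaller result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map calls would run in parallel on the nodes holding the input blocks; this sketch only shows the data flow.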
Advantages of Map Reduce
• Easy to scale data processing over multiple
computing nodes.
• Parallel processing.
• Fast.
• Simple programming model
HBASE
• Developed by the Apache Software Foundation
• Database for Hadoop.
• Open source
• Non-relational
Continued...
• Distributed
• Written in Java
• Accessed mainly through its Java client API; JDBC
connectivity (Type 4 driver) is available through
layers such as Apache Phoenix
YARN
• Yet Another Resource Negotiator
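• In YARN, the responsibilities of the job tracker
are split between a global Resource Manager and
a per-application Application Master; a Node
Manager runs on each worker node in place of
the task tracker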
YARN ARCHITECTURE
PIG
• A platform for analyzing large data sets that consists of a
high-level language (Pig Latin) for expressing data
analysis programs
• The structure of Pig programs is amenable to substantial
parallelization
Continued...
• Ease of programming
• Optimization opportunities
• Extensibility
HIVE
• Data warehouse software that facilitates querying
and managing large datasets
• Allows traditional map/reduce programmers
to plug in their custom mappers and
reducers
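One way custom mappers are plugged in is through a streaming script: Hive's TRANSFORM ... USING clause passes each row to an external program as a line of tab-separated column values and reads tab-separated rows back. The sketch below assumes a hypothetical two-column input (`user_id`, `url`) and a made-up transformation that keeps only the host part of the URL.

```python
import sys


def transform_row(line):
    # Each input row arrives as one line of tab-separated column values.
    user_id, url = line.rstrip("\n").split("\t")
    # Hypothetical mapper logic: keep only the host part of the URL.
    host = url.split("//")[-1].split("/")[0]
    return f"{user_id}\t{host}"


def run(stdin, stdout):
    # Stream rows in and out, one tab-separated line per row.
    for line in stdin:
        if line.strip():
            stdout.write(transform_row(line) + "\n")
```

To use this as a standalone script, it would end with a call to `run(sys.stdin, sys.stdout)`; the call is left out here so the functions can be imported and tested on their own.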
PIG VS HIVE
                      PIG                        HIVE
TYPE OF LANGUAGE      PROCEDURAL                 DECLARATIVE
EASE OF USE           COMPLEX                    EASY
NATURE OF USAGE       EFFICIENCY IN COMPUTING    ANALYTICS AREA
TYPE OF DATA          VARIABLES                  TABLES
DEBUGGING FACILITY    DEBUGGED LOCALLY           COMPLEX
MAINTENANCE           MORE                       LESS
DEVELOPMENT TIME      MORE                       LESS
HANDLING BIG DATA     HANDLES MORE DATA          MEMORY OVERFLOW
Conclusion
• Hadoop has been a very effective solution for
companies dealing with petabytes of data,
or big data.
• It has overcome the limitations of traditional
data storage.
• Being open source, it is widely accepted.