Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
Prabhu Thukkaram
Director, Product Development
Oracle Complex Processing & SOA Suite
Feb 28, 2014
What is Big Data?
[Diagram: the Hadoop stack, top to bottom]
• Top-level interfaces: ETL Tools, BI Reporting
• Higher level abstraction: PIG, Hive
• Distributed data processing: Map Reduce
• Columnar DB: HBase
• Self-healing clustered storage system: HDFS
• Data sources: structured data (Excel, etc.); unstructured & semi-structured data (web logs, images, etc.)
• Also shown: SQOOP, ZooKeeper
HBase Quick Overview
• Relational Database – Product Table (each row stored together)
Product Id   Price   SKU       Inventory Count
0001         1300    SKU0001   10
0002         2800    SKU0002   25
0003         5600    SKU0003   8
  - Ideal for OLTP transactions
  - Faster writes and record updates
  - But slow for OLAP
  - E.g. SELECT SUM(InventoryCount) FROM Products;
  - Reason: the values of column “InventoryCount” are not stored contiguously, so the query has to read through whole rows
HBase Quick Overview
• HBase – Product Table (values of each column stored together)
Product Id        0001      0002      0003
Price             1300      2800      5600
SKU               SKU0001   SKU0002   SKU0003
Inventory Count   10        25        8
  - Very fast for OLAP-style column scans and aggregates
  - E.g. SELECT SUM(InventoryCount) FROM Products; returns 43 by reading only the “InventoryCount” values
  - Supports big data analytics: a single table can hold millions of columns and billions of rows
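The difference between the two layouts can be sketched in a few lines of Python. This is only a toy model using the Product table from the slides; the variable and function names are made up for illustration, and it does not reflect how HBase or a relational engine is implemented internally.

```python
# Toy model of the two layouts from the Product table slides.
# Hypothetical names; not how HBase or any RDBMS is actually implemented.

# Row-oriented: each record is stored together.
rows = [
    {"ProductId": "0001", "Price": 1300, "SKU": "SKU0001", "InventoryCount": 10},
    {"ProductId": "0002", "Price": 2800, "SKU": "SKU0002", "InventoryCount": 25},
    {"ProductId": "0003", "Price": 5600, "SKU": "SKU0003", "InventoryCount": 8},
]

# Column-oriented: the values of each column are stored together.
columns = {
    "ProductId": ["0001", "0002", "0003"],
    "Price": [1300, 2800, 5600],
    "SKU": ["SKU0001", "SKU0002", "SKU0003"],
    "InventoryCount": [10, 25, 8],
}


def sum_inventory_row_store(table):
    """Visit every record and pick one field out of each."""
    fields_read = 0
    total = 0
    for record in table:
        fields_read += len(record)  # the whole record is fetched
        total += record["InventoryCount"]
    return total, fields_read


def sum_inventory_column_store(table):
    """Read only the contiguous InventoryCount column."""
    values = table["InventoryCount"]
    return sum(values), len(values)


print(sum_inventory_row_store(rows))        # (43, 12)
print(sum_inventory_column_store(columns))  # (43, 3)
```

Both calls return the same sum, but the row layout touches 12 fields while the column layout touches 3, which is the contiguity argument made on the slides.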
Hadoop Cluster
[Diagram: Master (Job Tracker, Name Node); Slave 1 (Task Tracker, Data Node); Slave 2 (Task Tracker, Data Node); clients: Hadoop Shell, CLI, etc.]
• Name Node
  - Does not store file data itself; maintains the directory tree of all files in the cluster
  - Tracks the data blocks of each file across the cluster
  - Client apps like the Hadoop Shell/CLI talk to the Name Node to locate, create, move, rename, and delete a file
  - Returns a list of the Data Node servers where the data lives
  - Single point of failure in Hadoop V1; addressed in Hadoop V2 by Name Node high availability (a standby Name Node)
Hadoop/HDFS cluster
[Diagram: Master (Job Tracker, Name Node); Slave 1 (Task Tracker, Data Node); Slave 2 (Task Tracker, Data Node); clients: Hadoop Shell, CLI, etc.]
• Data Node
  - Stores and replicates data blocks on its file system
  - Connects to the Name Node on startup and responds to file system operations
  - Hadoop Shell/CLI clients can talk directly to a Data Node once they know the location of the data
  - Data Nodes talk to each other when replicating data
Writing a file to HDFS
[Diagram: the Hadoop Client splits File.txt into Blk A, Blk B, Blk C and tells the NameNode it wants to write blocks A, B, C of File.txt; the NameNode answers "OK, write to Data Nodes 1, 5, 6"; Blk A is written to a Data Node and then replicated between Data Nodes.]
• Client consults the Name Node
• Client writes a block directly to one Data Node
• That Data Node replicates the block
• Client writes the next block and the cycle repeats
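The write cycle above can be sketched as a small simulation. Everything here is an assumption made for illustration: the class and method names are hypothetical (not the Hadoop API), six Data Nodes and a replication factor of 3 are picked arbitrarily, and round-robin placement stands in for the real, rack-aware placement policy.

```python
# Toy simulation of the HDFS write path described above.
# Class and method names are hypothetical, not the Hadoop API.
from itertools import cycle


class NameNode:
    """Keeps only metadata: which Data Nodes hold each block of a file."""

    def __init__(self, data_node_ids, replication=3):
        self.data_node_ids = list(data_node_ids)
        self.replication = replication
        self.block_map = {}  # (file name, block id) -> list of Data Node ids
        self._next = cycle(range(len(self.data_node_ids)))

    def allocate(self, file_name, block_id):
        # Round-robin placement is an assumption made for this sketch;
        # real HDFS placement is rack-aware.
        start = next(self._next)
        n = len(self.data_node_ids)
        targets = [self.data_node_ids[(start + i) % n]
                   for i in range(self.replication)]
        self.block_map[(file_name, block_id)] = targets
        return targets


class DataNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.blocks = {}

    def write(self, key, data, pipeline, nodes):
        self.blocks[key] = data
        if pipeline:  # forward the block to the next Data Node in the pipeline
            nodes[pipeline[0]].write(key, data, pipeline[1:], nodes)


def write_file(name, blocks, name_node, nodes):
    for block_id, data in blocks.items():
        # 1. The client consults the Name Node for target Data Nodes.
        targets = name_node.allocate(name, block_id)
        first, rest = targets[0], targets[1:]
        # 2. The client writes to one Data Node; 3. that node replicates.
        nodes[first].write((name, block_id), data, rest, nodes)


nodes = {i: DataNode(i) for i in range(1, 7)}
name_node = NameNode(nodes.keys())
write_file("File.txt", {"A": b"aaa", "B": b"bbb", "C": b"ccc"}, name_node, nodes)
print(name_node.block_map)
```

Note that the Name Node object never holds block contents, only the block-to-node map, while the Data Nodes pass the data along among themselves.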
Hadoop/HDFS cluster
[Diagram: Master (Job Tracker, Name Node); Slave 1 (Task Tracker, Data Node); Slave 2 (Task Tracker, Data Node); clients: Hadoop Shell, CLI, etc.]
• Job Tracker – Accepts MR jobs from clients
  - Contacts the Task Trackers on the Data Nodes where the input data is located and submits MR tasks to them
  - Monitors Task Tracker nodes for heartbeats/failures and resubmits tasks to a different Task Tracker as needed
  - Updates the status of a job when it is complete
  - Single point of failure, addressed in MRV2 (YARN)
  - Note: jobs run as batch, and clients can retrieve the status by querying the Job Tracker
Hadoop/HDFS cluster
[Diagram: Master (Job Tracker, Name Node); Slave 1 (Task Tracker, Data Node); Slave 2 (Task Tracker, Data Node); clients: Hadoop Shell, CLI, etc.]
• Task Tracker – Accepts map and reduce tasks from the Job Tracker
  - A predefined number of slots determines how many tasks it can accept
  - Spawns a separate JVM process for each task
  - Notifies the Job Tracker when the process finishes
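The interplay of slots, heartbeats, and resubmission can be sketched roughly as follows. This is a toy model under stated assumptions: all class names are hypothetical (not the Hadoop API), a boolean flag stands in for heartbeat monitoring, and tasks go to the first tracker with a free slot, whereas the real Job Tracker also considers data locality.

```python
# Toy model of Job Tracker / Task Tracker interaction in Hadoop V1.
# All names are hypothetical; this is not the Hadoop API.


class TaskTracker:
    def __init__(self, name, slots):
        self.name = name
        self.slots = slots      # predefined number of task slots
        self.running = []
        self.alive = True       # stands in for "heartbeat received"

    def accept(self, task):
        if self.alive and len(self.running) < self.slots:
            self.running.append(task)
            return True
        return False


class JobTracker:
    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, task):
        """Hand the task to the first tracker with a free slot."""
        for tracker in self.trackers:
            if tracker.accept(task):
                return tracker.name
        return None  # no free slot anywhere; a real scheduler would queue it

    def check_heartbeats(self):
        """Resubmit the tasks of every tracker that stopped responding."""
        moved = {}
        for tracker in self.trackers:
            if not tracker.alive and tracker.running:
                orphaned, tracker.running = tracker.running, []
                for task in orphaned:
                    moved[task] = self.submit(task)
        return moved


slave1 = TaskTracker("slave1", slots=2)
slave2 = TaskTracker("slave2", slots=2)
jt = JobTracker([slave1, slave2])
placement = {t: jt.submit(t) for t in ["map-0", "map-1", "map-2"]}
slave1.alive = False              # simulate a missed heartbeat
moved = jt.check_heartbeats()
print(placement)  # {'map-0': 'slave1', 'map-1': 'slave1', 'map-2': 'slave2'}
print(moved)      # {'map-0': 'slave2', 'map-1': None}
```

In this run only one orphaned task finds a home, because slave2 has a single free slot left; the second has to wait, which illustrates how the fixed slot count caps what a Task Tracker can accept.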
Map Reduce Example
• The user submits an MR job/jar for the input file URL hdfs://File.txt to the Job Submitter on the client JVM. The Job Submitter contacts the Job Tracker to obtain a Job Id.
• The client contacts the Name Node to compute the input splits, copies the job jar, the computed input splits, etc. to the Job Tracker’s file system in a directory named after the Job Id, and submits the job. The Job Tracker adds the job to its queue.
• The Job Tracker creates Map tasks based on the number of input splits. The number of Reduce tasks is defined by the job itself, either in configuration or through the API call setNumReduceTasks().
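How the task counts fall out of the input can be sketched with a little arithmetic. The function name `plan_tasks` is hypothetical, and the sketch assumes the split size equals the HDFS block size and that one Map task is created per split; real input formats apply additional rules.

```python
import math


def plan_tasks(file_size_bytes, split_size_bytes, num_reduce_tasks=1):
    """Return (map task count, reduce task count) for one input file.

    Assumes one Map task per input split and a fixed split size.
    The reduce count is simply whatever the job asked for, which is the
    role setNumReduceTasks() plays in the slide.
    """
    num_splits = math.ceil(file_size_bytes / split_size_bytes)
    return num_splits, num_reduce_tasks


MB = 1024 * 1024
print(plan_tasks(300 * MB, 128 * MB, num_reduce_tasks=2))  # (3, 2)
```

A 300 MB file with 128 MB splits yields two full splits and one partial split, hence three Map tasks, while the number of Reduce tasks is independent of the input size.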
Map Reduce Example
• The Map task splits its input into records and passes each record to the user’s Map logic.
• In this example, the user’s Map logic tokenizes each record to generate one or more key-value pairs.
• The output of a Map task is partitioned according to the defined number of reducers, shuffled, and sorted. Each partition’s output is then routed to its Reducer.
• Each Reducer merges the partition output it receives from the Map tasks across the cluster and calls the user’s reduce logic.
• M/R guarantees that the input to every Reducer is sorted by key.
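The flow in these bullets can be sketched as an in-memory word count. This is a minimal illustration, not the Hadoop API: the function names are made up, two reducers are assumed, and a CRC32-based partitioner is used only because it is deterministic in plain Python.

```python
from collections import defaultdict
import zlib

NUM_REDUCERS = 2  # plays the role of setNumReduceTasks(2)


def map_logic(record):
    """User's Map logic: tokenize a record into (word, 1) pairs."""
    return [(word.lower(), 1) for word in record.split()]


def partition(key):
    # Deterministic hash so the same key always goes to the same Reducer.
    return zlib.crc32(key.encode("utf-8")) % NUM_REDUCERS


def run_map_task(records):
    """One Map task: map each record, then partition and sort the output."""
    partitions = defaultdict(list)
    for record in records:
        for key, value in map_logic(record):
            partitions[partition(key)].append((key, value))
    return {p: sorted(pairs) for p, pairs in partitions.items()}


def reduce_logic(key, values):
    """User's Reduce logic: sum the counts for one word."""
    return key, sum(values)


def run_reduce_task(reducer_id, map_outputs):
    """One Reduce task: merge its partition from every Map task, by key."""
    merged = sorted(pair for out in map_outputs
                    for pair in out.get(reducer_id, []))
    grouped = defaultdict(list)
    for key, value in merged:
        grouped[key].append(value)  # keys arrive in sorted order
    return [reduce_logic(k, v) for k, v in grouped.items()]


map_outputs = [
    run_map_task(["the quick brown fox", "the lazy dog"]),  # Map task 1
    run_map_task(["the dog barks"]),                         # Map task 2
]
result = {}
for r in range(NUM_REDUCERS):
    result.update(run_reduce_task(r, map_outputs))
print(result)
```

Because the partition is a function of the key alone, every occurrence of a word lands on the same reducer, so each reducer can produce a final count without talking to the others.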
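Map Reduce – Under the Hood
[Diagram: a single Map task reads a data block, splits it into records, runs the user’s Map logic, and sorts and partitions the output; partitions also go to other Reducers. A single Reduce task merges its partitions, including those from other Map tasks, and runs the user’s Reduce logic.]
Source: Hadoop: The Definitive Guide.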
Map Reduce – Summary
Map – The process of organizing entity data as key-value pairs. The key could be a Customer Id, Purchase Order Id, etc.
M/R Framework – Ensures all data relevant to an entity/key is delivered to a single Reducer.
Reduce – The process of aggregating the data related to an entity and deducing information from it.
Map Reduce - Risk Analysis
Risk Analysis: a large number of credit card (CC) transactions and no direct deposit into checking over the last two months implies the customer is unemployed and at high risk of defaulting.
[Diagram: HDFS holds a global view of the customer (credit card txns, chat sessions, checking account deposits and withdrawals) -> Map Phase -> Reduce Phase -> Risk Score]
Gathers all data (CC txns, chats, withdrawals, etc.) pertaining to a single customer.
Data in HDFS changes frequently, hence the need to re-evaluate risk using batch jobs. Re-evaluated results are written to a DB to enable business decisions, e.g. whether to approve a new credit card or loan.
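The risk rule can be sketched as a small map/group/reduce pipeline. The sample events, the threshold of three transactions (the slide only says "large number"), and the "HIGH"/"NORMAL" labels are all assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical threshold: the slide only says "large number" of CC txns.
CC_TXN_THRESHOLD = 3

# (customer id, event type) records standing in for two months of HDFS data.
events = [
    ("cust-1", "cc_txn"), ("cust-1", "cc_txn"), ("cust-1", "cc_txn"),
    ("cust-1", "withdrawal"),
    ("cust-2", "cc_txn"), ("cust-2", "direct_deposit"),
    ("cust-3", "cc_txn"), ("cust-3", "cc_txn"), ("cust-3", "cc_txn"),
    ("cust-3", "direct_deposit"),
]


def map_phase(records):
    """Emit (customer id, event type) so that the key is the customer."""
    for customer_id, event_type in records:
        yield customer_id, event_type


def shuffle(pairs):
    """Framework step: deliver all values for one key to one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(customer_id, event_types):
    """Apply the rule from the slide to one customer's events."""
    cc_txns = event_types.count("cc_txn")
    has_deposit = "direct_deposit" in event_types
    high_risk = cc_txns >= CC_TXN_THRESHOLD and not has_deposit
    return customer_id, "HIGH" if high_risk else "NORMAL"


risk = dict(reduce_phase(c, e) for c, e in shuffle(map_phase(events)).items())
print(risk)  # {'cust-1': 'HIGH', 'cust-2': 'NORMAL', 'cust-3': 'NORMAL'}
```

Re-running this batch over fresh data and writing `risk` to a database corresponds to the re-evaluation loop described on the slide.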
Hadoop V1 - Limitations
• Cluster resource management is tightly coupled to the Map Reduce Job Tracker
• Job Tracker functionality in V1
  - Cluster resource management
  - Application life-cycle management (job scheduling/re-scheduling/monitoring)
• Can only run Map Reduce applications, which leads to poor utilization of the cluster
  - Need to run other kinds of applications: real-time, graph, messaging, etc.
• Limited scalability and single points of failure (Name Node & Job Tracker)
• Lack of wire-compatible protocols
Hadoop MRV2 or YARN
• Job Tracker in V1 split into
  - Global Resource Manager
  - Application Master per job request
  - Node Manager on each slave node (takes over the role of the Task Tracker)
• Application: a classic MR job or a DAG of jobs
[Diagram: Clients submit jobs to the Resource Manager; each Node Manager hosts Containers, some of which run an App Master. Arrows: Job Submission, Job Status, Node Status, Resource Request.]
Hadoop MRV2 or YARN
• Resource Manager
  - Ultimate authority for managing and scheduling resources in the cluster
  - Works with the Node Managers to track and utilize available containers
  - A container is the unit of resource in YARN, e.g. 2 cores & 2 GB memory, disk, etc.
  - Accepts jobs from clients and delegates each one to an Application Master
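Container allocation can be sketched as simple bookkeeping over per-node capacity. The class names mirror the roles on the slide but are otherwise hypothetical (this is not the YARN API), the node capacities are invented, and first-fit placement is an assumption; actual YARN schedulers apply their own policies.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_cores: int
    free_mem_gb: int
    containers: list = field(default_factory=list)


@dataclass
class ResourceManager:
    nodes: list

    def allocate(self, app_id, cores, mem_gb):
        """Grant one container on the first node with enough free capacity."""
        for node in self.nodes:
            if node.free_cores >= cores and node.free_mem_gb >= mem_gb:
                node.free_cores -= cores
                node.free_mem_gb -= mem_gb
                node.containers.append((app_id, cores, mem_gb))
                return node.name
        return None  # request stays unsatisfied until capacity frees up


rm = ResourceManager([NodeManager("node1", 4, 4), NodeManager("node2", 4, 8)])
# An Application Master asking for three containers of 2 cores / 2 GB each.
grants = [rm.allocate("app-1", cores=2, mem_gb=2) for _ in range(3)]
print(grants)  # ['node1', 'node1', 'node2']
```

The point of the sketch is that a container is just a reserved slice of a node's cores and memory, so any kind of application can ask for one, not only Map Reduce tasks.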
Hadoop MRV2 or YARN
• Application Master
  - Negotiates the resources required for the job with the Resource Manager (RM)
  - Tracks job status and monitors progress
  - By shifting job control to App Masters running on the slave nodes, YARN provides better scale-out and fault tolerance
Big Data Overview - Next
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Hadoop and Big Data Overview

  • 1. Prabhu Thukkaram, Director, Product Development, Oracle Complex Processing & SOA Suite. Feb 28, 2014. Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
  • 2. What is Big Data? (stack diagram)
      • HDFS: self-healing clustered storage system
      • Map Reduce: distributed data processing
      • HBase: columnar DB
      • Pig, Hive: higher-level abstraction
      • ETL tools, BI reporting: top-level interfaces
      • Supporting components: Sqoop, ZooKeeper
      • Data sources: structured data (Excel, etc.); unstructured and semi-structured data (web logs, images, etc.)
  • 3. HBase Quick Overview: Relational Database, Product Table
        Product Id | Price | SKU     | Inventory Count
        0001       | 1300  | SKU0001 | 10
        0002       | 2800  | SKU0002 | 25
        0003       | 5600  | SKU0003 | 8
      • Ideal for OLTP transactions
      • Faster writes and record updates
      • But slow for OLAP, e.g. SELECT SUM(InventoryCount) FROM Products;
      • Reason: the data for the column "InventoryCount" is not contiguous
  • 4. HBase Quick Overview: HBase, Product Table
        Product Id      | 0001    | 0002    | 0003
        Price           | 1300    | 2800    | 5600
        SKU             | SKU0001 | SKU0002 | SKU0003
        Inventory Count | 10      | 25      | 8
      • Extremely fast for OLAP, e.g. SELECT SUM(InventoryCount) FROM Products; (result: 43)
      • Supports big data analytics: a single table with a million columns and a billion rows
  • 5. Hadoop Cluster
      • Diagram: a Master node (Job Tracker, Name Node) and two Slave nodes (each with a Task Tracker and a Data Node); clients such as the Hadoop Shell/CLI connect to the cluster
      • Name Node
          • Does not store file data; maintains the directory tree of all files in the cluster
          • Tracks the data blocks of each file across the cluster
          • Client apps such as the Hadoop Shell/CLI talk to the Name Node to locate, create, move, rename, and delete a file
          • Returns a list of the Data Node servers where the data lives
          • Single point of failure in Hadoop V1; addressed in Hadoop V2 by HDFS High Availability (a standby Name Node), which is separate from YARN
  • 6. Hadoop/HDFS cluster (same master/slave cluster diagram as slide 5)
      • Data Node
          • Stores and replicates data on the file system
          • Connects to the Name Node on startup and responds to file system operations
          • Hadoop Shell/CLI clients can talk directly to a Data Node if they know the location of the data
          • Data Nodes talk to each other when replicating data
  • 7. Writing a file to HDFS
      • Diagram: a Hadoop client holding File.txt (blocks A, B, C) tells the Name Node it wants to write blocks A, B, and C of File.txt; the Name Node answers "OK, write to Data Nodes 1, 5, 6"; the diagram also shows the replication of block A between Data Nodes
      • The client consults the Name Node
      • The client writes a block directly to one Data Node
      • That Data Node replicates the block
      • The client writes the next block and the cycle repeats
  • 8. Hadoop/HDFS cluster (same master/slave cluster diagram as slide 5)
      • Job Tracker
          • Accepts MR jobs from clients
          • Contacts the Task Trackers on the Data Nodes it has located and submits MR tasks to them
          • Monitors Task Tracker nodes for heartbeats and failures, and resubmits a task to a different Task Tracker as needed
          • Updates the status of a job when it completes
          • Single point of failure; fixed in MRV2 (YARN)
          • Note: jobs run as batch, and clients can retrieve a job's status by querying the Job Tracker
  • 9. Hadoop/HDFS cluster (same master/slave cluster diagram as slide 5)
      • Task Tracker
          • Accepts map and reduce tasks from the Job Tracker
          • A predefined set of slots determines the number of tasks it can accept
          • Spawns a separate JVM process for each task
          • Notifies the Job Tracker when the process finishes
  • 10. Map Reduce Example
      • The user submits an MR job (jar) for the input file URL hdfs://File.txt to the Job Submitter in the client JVM. The Job Submitter contacts the Job Tracker to obtain a Job Id.
      • The client contacts the Name Node to compute the input splits, then copies the job jar, the computed input splits, etc. to the Job Tracker's file system, in a directory named after the Job Id, and submits the job. The Job Tracker adds the job to its queue.
      • The Job Tracker creates Map tasks based on the number of input splits. The number of Reduce tasks is defined by the job itself, either in configuration or through the API call setNumReduceTasks().
  • 11. Map Reduce Example
      • A Map task splits its input into records and passes each record to the user's Map logic
      • In the example on the slide, the user's Map logic tokenizes each record to generate one or more key-value pairs
      • The output of a Map task is partitioned according to the defined number of reducers, shuffled, and sorted; each partition's output is then routed to its Reducer
      • A Reducer merges the partition output it receives from the Map tasks across the cluster and calls the user's reduce logic
      • M/R guarantees that the input to every Reducer is sorted by key
  • 12. Map Reduce – Under the Hood (Source: Hadoop, The Definitive Guide)
      • Diagram, single Map task: data block, record split, user's Map logic, sorted and partitioned output, sent to the Reducers (including other Reducers)
      • Diagram, single Reduce task: partitions arriving from this and other Map tasks, merged, user's Reduce logic
  • 13. Map Reduce – Summary
      • Map: the process of organizing entity data as key-value pairs. The key could be a Customer Id, a Purchase Order Id, etc.
      • M/R Framework: ensures that all data relevant to an entity/key is delivered to a single Reducer
      • Reduce: the process of aggregating the data related to an entity and deducing information from it
  • 14. Map Reduce - Risk Analysis
      • Risk analysis: a large number of credit card (CC) transactions and no direct deposit into checking over the last two months implies that the customer is unemployed and at high risk of defaulting
      • Diagram: HDFS holds a global view of the customer (credit card transactions, chat sessions, checking account deposits and withdrawals), which flows through the Map phase and the Reduce phase to produce a risk score
      • All data (CC transactions, chats, withdrawals, etc.) pertaining to a single customer is gathered together
      • Data in HDFS changes frequently, hence the need to reevaluate risk using batch jobs
      • Reevaluated results are written to a DB to enable business decisions, e.g. whether to approve a new credit card or loan
  • 15. Hadoop V1 - Limitations
      • Cluster resource management is tightly coupled to the Map Reduce Job Tracker
      • Job Tracker functionality in V1
          • Cluster resource management
          • Application life-cycle management (job scheduling, rescheduling, monitoring)
      • Can only run Map Reduce applications, which leads to poor utilization of the cluster
      • Need to run other kinds of applications: real-time, graph, messaging, etc.
      • Scalability limits and single points of failure (Name Node and Job Tracker)
      • Lack of wire-compatible protocols
  • 16. Hadoop MRV2 or YARN
  • 17. Hadoop MRV2 or YARN
      • The Job Tracker of V1 is split into
          • A global Resource Manager
          • An Application Master per job request
          • A Node Manager
      • An application is a classic MR job or a DAG of jobs
      • Diagram: clients connect to the Resource Manager; Node Managers host containers and App Masters; the arrows are labeled Job Status, Job Submission, Node Status, and Resource Request
  • 18. Hadoop MRV2 or YARN: Resource Manager
      • Ultimate authority for managing and scheduling resources in the cluster
      • Works with the Node Manager to track and utilize available containers
      • A container is the unit of resource in YARN, e.g. 2 cores and 2 GB of memory, disk, etc.
      • Accepts jobs from clients and delegates each job to an Application Master
  • 19. Hadoop MRV2 or YARN: App Master
      • Negotiates the resources required for the job with the Resource Manager (RM)
      • Tracks job status and monitors progress
      • By shifting job control to App Masters running on the slave nodes, YARN provides better scale-out and fault tolerance
  • 20. Big Data Overview - Next
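
Slides 3 and 4 contrast a row-oriented and a column-oriented layout of the same Product table. The sketch below is only an in-memory illustration of that contrast, not how HBase or any relational database stores data on disk; the variable and function names are made up for the example.

```python
# Product table from slides 3 and 4, held in two illustrative layouts.

# Row-oriented: all fields of one product are kept together.
rows = [
    {"product_id": "0001", "price": 1300, "sku": "SKU0001", "inventory_count": 10},
    {"product_id": "0002", "price": 2800, "sku": "SKU0002", "inventory_count": 25},
    {"product_id": "0003", "price": 5600, "sku": "SKU0003", "inventory_count": 8},
]

# Column-oriented: all values of one column are kept together.
columns = {
    "product_id": ["0001", "0002", "0003"],
    "price": [1300, 2800, 5600],
    "sku": ["SKU0001", "SKU0002", "SKU0003"],
    "inventory_count": [10, 25, 8],
}


def sum_inventory_row_layout(records):
    """Visit every record and pick one field out of each (the OLAP-unfriendly path)."""
    return sum(record["inventory_count"] for record in records)


def sum_inventory_column_layout(cols):
    """Read a single contiguous column (the OLAP-friendly path)."""
    return sum(cols["inventory_count"])


def fetch_product_row_layout(records, product_id):
    """Fetch one whole record, which is the access pattern OLTP favours."""
    for record in records:
        if record["product_id"] == product_id:
            return record
    return None
```

Both sum functions compute the equivalent of `SELECT SUM(InventoryCount) FROM Products;`. The column layout touches only the values it needs, while the row layout has to step through every record.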

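The map, partition, shuffle/sort, and reduce steps described on slides 10 to 13 can be sketched as a single-process simulation. This is plain Python, not the Hadoop API. The function names are hypothetical, and the CRC32-based partitioner is an assumption standing in for whatever partitioner a real job would use.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_logic(record):
    """User's map logic: tokenize one record into (word, 1) key-value pairs."""
    for word in record.lower().split():
        yield word, 1


def reduce_logic(key, values):
    """User's reduce logic: aggregate every value delivered for one key."""
    return key, sum(values)


def partition(key, num_reducers):
    """Pick the reducer for a key (assumption: CRC32 of the key modulo reducer count)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(records, num_reducers=2):
    """Simulate one job: map each record, partition, sort by key, then reduce."""
    # Map phase: map output is routed into one bucket per reducer.
    partitions = defaultdict(list)
    for record in records:
        for key, value in map_logic(record):
            partitions[partition(key, num_reducers)].append((key, value))

    # Reduce phase: each reducer sees its partition sorted by key,
    # so all values for a key arrive together.
    results = {}
    for reducer_id in range(num_reducers):
        pairs = sorted(partitions[reducer_id], key=itemgetter(0))
        for key, group in groupby(pairs, key=itemgetter(0)):
            out_key, out_value = reduce_logic(key, (value for _, value in group))
            results[out_key] = out_value
    return results
```

Calling `run_job(["big data big cluster", "data node"])` counts each word across both records. Because every occurrence of a key lands in the same partition, the result does not depend on the number of reducers.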
Editor's notes

  1. The Reduce phase performs risk analysis for the customer. Data in HDFS changes frequently, hence the need to reevaluate risk using batch jobs. The reevaluated results are written to a DB to enable business decisions, e.g. whether to approve a new credit card or loan.
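
The risk rule from slide 14 (many credit card transactions and no direct deposit over the last two months) can be sketched as a per-customer reduce step. Everything specific here is an assumption made for illustration: the event layout, the `kind` labels, the threshold of 5 transactions, and the premise that the input already covers only the last two months.

```python
from collections import defaultdict

# Hypothetical cutoff for "a large number" of credit card transactions.
CC_TXN_THRESHOLD = 5


def map_event(event):
    """Map step: key every event by customer id so events can be grouped."""
    return event["customer_id"], event


def reduce_customer(customer_id, events):
    """Reduce step: apply the slide's rule to all events of one customer."""
    cc_txns = sum(1 for e in events if e["kind"] == "cc_txn")
    has_direct_deposit = any(e["kind"] == "direct_deposit" for e in events)
    high_risk = cc_txns >= CC_TXN_THRESHOLD and not has_direct_deposit
    return customer_id, "high" if high_risk else "normal"


def score_customers(events):
    """Group events per customer (the framework's job), then reduce each group."""
    grouped = defaultdict(list)
    for event in events:
        key, value = map_event(event)
        grouped[key].append(value)
    return dict(reduce_customer(cid, evs) for cid, evs in grouped.items())
```

In a batch setting, the dictionary returned by `score_customers` stands in for the results that the note says are written to a DB for later business decisions.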