Hadoop Ecosystem
Lior Sidi
Sep 2016
Hello!
I am Lior Sidi
Big data V’s
Volume
Velocity
Variety
What is Hadoop?
• Hadoop – open source implementation of MapReduce (MR)
• Performs MR jobs quickly and efficiently
Goal
Generating value from large datasets that cannot be analyzed using traditional technologies
Hadoop Concepts
Requirements
• Linear horizontal scalability
• Jobs run in isolation
• Simple programming model
Challenges and solution
• Ch1: Data access bottleneck
• Sol: Store and process data on same node
• Ch2: Distributed programming is difficult
• Sol: Use high-level language APIs
Hadoop Timeline
• 2003 Oct – Google File System paper released
• 2004 Dec – "MapReduce: Simplified Data Processing on Large Clusters"
• 2006 Oct – Hadoop 1.0 released
• 2007 Oct – Yahoo Labs creates Pig
• 2008 Oct – Cloudera, a Hadoop distributor, is founded
• 2010 Sep – Hive and Pig graduate
• 2011 Jan – ZooKeeper graduates
• 2013 Mar – YARN deployed at Yahoo
• 2014 Feb – Apache Spark becomes a top-level Apache project
Hadoop Ecosystem
[Figure: Hadoop Ecosystem layers: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis, Workflow, Search, Visualization, Cluster Management]
[Figure: Hadoop Ecosystem, layers so far: Storage]
Storage / HDFS
• “Hadoop Distributed File System”
• Design:
• Write once – read many times pattern
• Cheap hardware
• High-throughput streaming data access (not designed for low-latency access)
• Concepts:
• Block – a file is split into 128 MB blocks; each block is replicated 3 times
• NameNode (Master) – one per cluster – file system namespace for blocks (single point of failure)
• DataNode (Worker) – one per node – stores and retrieves blocks
• Functions:
• High availability – run a second (standby) NameNode
• Block caching – a block is cached in only one DataNode
• Locality – rack-aware, follows the network topology
• File permissions – POSIX-like – r w x – owner/group/mode for files and directories
• Interfaces – HTTP (proxy/direct), Java API
• Cluster balance – spread the blocks evenly across the cluster
[Figure: A file is distributed and accessed on HDFS. A client reaches the NameNode through an HDFS proxy; the file's Block 1, Block 2, and Block 3 are each stored three times on DataNodes spread over Rack 1 and Rack 2.]
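The block and replication ideas above can be sketched in a few lines of Python. This is a toy model, not HDFS code: the function names, the rack layout, and the placement rule (one replica on a "local" rack, two on another rack) are simplifying assumptions made for illustration.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # block size quoted on the slide
REPLICATION = 3         # replication factor quoted on the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(block_index, racks, replication=REPLICATION):
    """Toy rack-aware placement (assumed rule, at most 3 replicas):
    one replica on a local rack, the others on a second rack."""
    rack_names = sorted(racks)
    local = racks[rack_names[block_index % len(rack_names)]]
    remote = racks[rack_names[(block_index + 1) % len(rack_names)]]
    first = local[block_index % len(local)]
    second = remote[block_index % len(remote)]
    third = remote[(block_index + 1) % len(remote)]
    return [first, second, third][:replication]


# Hypothetical cluster: two racks, five DataNodes.
racks = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4", "dn5"]}
blocks = split_into_blocks(300 * MB)
placement = {i: place_replicas(i, racks) for i in range(len(blocks))}
```

A 300 MB file yields two full 128 MB blocks plus one 44 MB block, and every block ends up on three distinct DataNodes across both racks, so losing one node or one rack still leaves a copy.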
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management]
Resource Management / YARN
• “Yet Another Resource Negotiator”
• Manages and schedules cluster resources
• Daemons:
• Resource Manager – per cluster – manages resources across the cluster
• Node Manager – per node – launches and monitors containers
• Container – executes an application process
• Resource requests for containers:
• Amount of compute resources (CPU & memory)
• Locality (node/rack)
• Lifespan: one application per user job, or long-running apps shared by users
• Scheduling:
• Allocate resources by policy: FIFO, Capacity (per organization), Fair
[Figure: Job scheduling on top of a Hadoop cluster. A client application on the client node submits a YARN app to the ResourceManager on the resource manager node; NodeManagers on three nodes launch containers (one master, two workers). Arrow labels: launch, heartbeat.]
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing]
Processing / MapReduce
• Simplified, large-scale, automatic, fault-tolerant data processing
• Origin – Google paper, 2004
• Batch processing
• Hadoop MR:
• JobTracker – 1 per cluster – master process, schedules tasks on workers, monitors progress
• TaskTracker – 1 per worker – executes map/reduce tasks locally and reports progress
Processing / MapReduce
Example: counting names
• Data (three input splits):
• Lior Ron Lior
• Ron Ron Andrey
• Lior Andrey Lior
• Map (Name, Count):
• (Lior, 1), (Ron, 1), (Lior, 1)
• (Lior, 1), (Andrey, 1), (Lior, 1)
• (Andrey, 1), (Ron, 1), (Ron, 1)
• Shuffle & Sort, then Reduce (Name, Count):
• (Lior, 4)
• (Ron, 3)
• (Andrey, 2)
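The name-count example can be reproduced with a small single-process sketch of the three phases. It only illustrates the data flow (map, shuffle and sort, reduce); the function names are made up for this example and nothing here uses the Hadoop API.

```python
from collections import defaultdict

# The three input splits from the slide.
splits = ["Lior Ron Lior", "Ron Ron Andrey", "Lior Andrey Lior"]


def map_phase(split):
    """Map: emit a (name, 1) pair for every name in one split."""
    return [(name, 1) for name in split.split()]


def shuffle_and_sort(pairs):
    """Shuffle & sort: group all values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one name."""
    return key, sum(values)


mapped = [pair for split in splits for pair in map_phase(split)]
grouped = shuffle_and_sort(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

On a real cluster each split would be mapped on the node that stores it, and the shuffle would move the pairs over the network to the reducers.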
[Figure: MR job scheduling on top of a Hadoop cluster. An MR program on the client node submits a YARN app to the ResourceManager on the resource manager node; NodeManagers on three nodes launch containers running a JobTracker and two TaskTrackers. Arrow labels: launch, heartbeat.]
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing]
Storage / HBase
• Distributed column-oriented database on top of HDFS
• Real-time random read/write access for large datasets
• Region – tables are split by row
• Phoenix – SQL on HBase
HBase Data Model
• RowKey → Column Family 1, Column Family 2
• Column Family 1 → Col 1.1, Col 1.2, Col 1.3
• Each column → (Version, Data) pairs
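One way to picture the data model is as nested maps: row key, then column (family and qualifier), then version. The sketch below is a toy in-memory structure written for illustration; `ToyTable` and its methods are hypothetical names and do not mirror the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """row key -> "family:qualifier" -> {version: data}."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, data, version):
        # Writing an existing cell adds a new version instead of overwriting.
        self.rows[row_key][column][version] = data

    def get(self, row_key, column, version=None):
        versions = self.rows[row_key][column]
        if not versions:
            return None
        if version is None:
            version = max(versions)  # latest version wins by default
        return versions.get(version)


table = ToyTable()
table.put("row1", "cf1:col1.1", "old value", version=1)
table.put("row1", "cf1:col1.1", "new value", version=2)
latest = table.get("row1", "cf1:col1.1")
first = table.get("row1", "cf1:col1.1", version=1)
```

Reading without a version returns the newest data, while older versions stay addressable.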
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination]
Coordination / ZooKeeper
• Hadoop’s distributed coordination service
• Coordinates read/write actions on data
• Highly available, filesystem-like store
• Implementation:
• Data model:
• Tree built from znodes (up to 1 MB of data each)
• Znode – data changes, ACL (access control list)
• Leader – performs writes and broadcasts updates
• Follower – passes atomic requests to the leader
• Lock service
• User groups
• Replicated mode
Coordination / ZooKeeper
[Figure: ZooKeeper coordination example in a Hadoop cluster. The ZooKeeper service (one leader, two followers) keeps a replicated znode tree with /, /HBase, and /HDFS; HBase (HMaster, Regions), HDFS (NameNode, DataNodes), and other clients use it, with LOCK markers on the tree.]
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination, Data Formats]
Row Based / Avro
• Language-neutral data serialization system
• Shares data across many programming languages
• Splittable and sortable – works well with MapReduce
• Rich schema resolution – flexible schemas
• Other row-based formats
• SequenceFile – logfile format
• MapFile – sorted SequenceFile
Row Based / Avro
File Structure
• File: Header | Block 1 | Block 2 | ... | Block N
• Header: identifier | Metadata: schema & codec | SyncMarker
• Block: Count objs | Size objs | Serialized objs | SyncMarker
Schema
{
  "type": "record",
  "name": "Person",
  "fields":
  [{
    "name": "firstName",
    "type": "string",
    "order": "descending"
  },{
    "name": "age",
    "type": "int"
  },{...
  ]
}
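An Avro schema is plain JSON, so the Person schema can be loaded with Python's standard `json` module. The sketch below keeps only the two fields that are visible on the slide (the elided ones are left out) and adds a toy `conforms` check. That helper and the type table are assumptions for illustration; real projects would rely on an Avro library for validation and serialization.

```python
import json

# Only the two fields shown on the slide; the elided fields are omitted.
schema = json.loads("""
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "firstName", "type": "string", "order": "descending"},
    {"name": "age", "type": "int"}
  ]
}
""")

# Toy mapping from Avro primitive names to Python types (assumption).
PYTHON_TYPES = {"string": str, "int": int}


def conforms(record, schema):
    """Return True if record has exactly the schema's fields, correctly typed."""
    fields = schema["fields"]
    if set(record) != {field["name"] for field in fields}:
        return False
    return all(
        isinstance(record[field["name"]], PYTHON_TYPES[field["type"]])
        for field in fields
    )


ok = conforms({"firstName": "Lior", "age": 30}, schema)
bad = conforms({"firstName": "Lior", "age": "thirty"}, schema)
```

Because the schema travels in the file header, any reader in any language can run this kind of check without generated code.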
Parquet
• Columnar storage format
• Skip unneeded columns
• Fast queries & small size
• Efficient nested data store
File Structure
• File: Header | Block 1 | Block 2 | ... | Block N | Footer
• Header: Magic Number
• Block: Column chunk | Column chunk | Column chunk
• Column chunk: Page | Page | Page | Page
• Footer: File Metadata
Schema
message Person {
  required binary name (UTF8);
  required int32 age;
  required group hobbies (LIST) {
    repeated binary array (UTF8);
  }
}
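The "skip unneeded columns" benefit is easy to see by turning row-oriented records into a column-oriented layout. The sketch below is a pure-Python illustration with made-up sample records; it says nothing about Parquet's actual encoding, pages, or compression.

```python
# Made-up sample records following the Person schema.
rows = [
    {"name": "Lior", "age": 30, "hobbies": ["hadoop"]},
    {"name": "Ron", "age": 25, "hobbies": ["spark", "kafka"]},
    {"name": "Andrey", "age": 35, "hobbies": []},
]


def to_columns(rows):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# A query over "age" touches one column and never reads names or hobbies.
ages = columns["age"]
average_age = sum(ages) / len(ages)
```

Values of the same type sit next to each other in a column, which is also why columnar files tend to compress well.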
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion]
Data Integration / Sqoop
• Import/export structured data
• Sqoop connector:
• Import/export from a database
• Sqoop 1 – command line
• Sqoop 2 – service
• Connectors – connect to relational databases
[Figure: Sqoop import and export. The Sqoop client reads table metadata from the database and launches an import MapReduce job and an export MapReduce job in the Hadoop cluster; map tasks move data between the database table and HDFS.]
Data Integration / Flume
• Event-based data ingestion into Hadoop
• Flume agent components:
• Sources – spoolingDir (creates events), Avro (RPC), HTTP (requests)
• Channel
• Sinks – Avro, HDFS, HBase, Solr (near real time)
• Reliability – uses separate transactions (source to channel, channel to sink)
• Fan out – one source, many sinks
• Scale – agent tiers aggregate multiple sources
• Sink groups – failover and load balancing
Data Integration / Flume
[Figure: A Flume agent (Source → Channel → Sink) moves data from a file system into Hadoop (HDFS, HBase). Three panels: Fan Out, Scale (agents arranged in Tiers 1 to 3), and Sink Grouping.]
Data Integration / Kafka
• Distributed publish-subscribe messaging system
• Fast, scalable, durable
• Components:
• Topics – categories (feeds) of messages
• Producers – processes that publish messages to a topic
• Consumers – processes that subscribe to topics
• Brokers – Kafka servers in the cluster
• Distribution
• Leader – handles reads and writes
• Follower – replicates the leader
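The publish-subscribe model can be sketched as an append-only log per topic plus a read position per consumer. This is a toy in-memory stand-in written for illustration: `ToyBroker`, `publish`, and `poll` are hypothetical names, not the Kafka client API, and there is no partitioning or replication here.

```python
from collections import defaultdict


class ToyBroker:
    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only message log
        self.offsets = defaultdict(int)   # (consumer, topic) -> next offset

    def publish(self, topic, message):
        """Producer side: append a message and return its offset."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def poll(self, consumer, topic):
        """Consumer side: return messages not yet seen by this consumer."""
        log = self.topics[topic]
        start = self.offsets[(consumer, topic)]
        self.offsets[(consumer, topic)] = len(log)
        return log[start:]


broker = ToyBroker()
broker.publish("topic-a", "event 1")
broker.publish("topic-a", "event 2")
first_poll = broker.poll("consumer-1", "topic-a")
second_poll = broker.poll("consumer-1", "topic-a")
other_consumer = broker.poll("consumer-2", "topic-a")
```

Messages are not removed when read, so each consumer keeps its own position and a second consumer still sees the full log.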
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming]
Data Integration / Streaming
• Stream processing
• Kafka Streams – process and analyze data in Kafka
• Storm – real-time computation
• Spark Streaming – processes live data and can apply Spark MLlib and GraphX
[Figure: Streaming pipeline with Data, Flume Agent 1, Flume Agent 2, Kafka (Topic A, Topic B), Spark Streaming, Storm, and HDFS; arrows labeled 1 and 2 mark two data paths.]
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis]
Computation / Spark
• Cluster computing framework
• In-memory processing
• Languages: Scala, Java and Python
• RDD – Resilient Distributed Dataset
• Read-only collection spread across the cluster
• Transformations are lazy: computation happens only when an action is called
• DAG engine – schedules many transformations into one optimized job
• Spark context
• Parallel jobs
• Caching
• Broadcast variables (data/functions)
• Cluster managers for executors:
• Local, Standalone, Mesos, YARN
Computation / Spark
[Figure: Spark on Hadoop. The driver runs the Spark program with its SparkContext, DAG scheduler, task scheduler, and scheduler backend; jobs are split into stages and tasks, and the tasks run on executors.]
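The split between transformations and actions can be illustrated without Spark. The `ToyRDD` class below is a hypothetical, single-machine stand-in: `map` and `filter` only record what to do, and nothing runs until the `collect` action is called. It is a sketch of the idea, not how Spark is implemented.

```python
class ToyRDD:
    """Immutable dataset plus a recorded chain of transformations."""

    def __init__(self, data, transformations=()):
        self._data = tuple(data)
        self._transformations = tuple(transformations)

    def map(self, fn):
        # Transformation: returns a new ToyRDD, computes nothing yet.
        return ToyRDD(self._data, self._transformations + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also lazy.
        return ToyRDD(self._data, self._transformations + (("filter", predicate),))

    def collect(self):
        # Action: only now is the recorded chain executed.
        result = list(self._data)
        for kind, fn in self._transformations:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result


calls = []


def square(x):
    calls.append(x)  # records when the function really runs
    return x * x


rdd = ToyRDD(range(5)).map(square).filter(lambda x: x % 2 == 0)
calls_before_action = list(calls)  # still empty: nothing has run
result = rdd.collect()             # triggers the computation
```

Deferring work this way is what lets a DAG engine look at the whole chain and plan it as one job.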
Scripting / Pig
• Data flow programming language – MapReduce abstraction
• Supports: user-defined functions (UDFs), streaming, nested data
• Does not support: random read/write
• Pig Latin - Scripting language
• Load, store, filtering, Group, Join, Sort, Union and Split, UDF, Co-group
• Modes
• Local – small datasets
• MR mode – run on cluster
• Execution - script, grunt (shell), embedded (java)
• Parameter substitution – run script with different parameters
• Similar
• Crunch – MR pipeline with Java (no UDF)
Query / Hive
• Components
• MetaStore – tables description
• HiveQL – SQL dialect (SQL: 2003)
• Table management
• Warehouse directory
• External tables
• Functionality
• Bucketing and partitioning by column
• Supports UDFs and UDAFs (aggregate functions)
• Insert, update, delete:
• Saved in delta files
• Merged by background MR jobs
• (Available in a transaction context)
• Table locks (avoid drops)
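Partitioning and bucketing both decide where a row is stored: the partition column picks a directory, and a hash of the bucket column picks a file inside it. The sketch below assumes an integer bucket column (so the hash is simply the value), a hypothetical `logs` table, and an illustrative warehouse path; the exact layout on a given cluster may differ.

```python
def storage_path(table, row, partition_col, bucket_col, num_buckets):
    """Return an illustrative file path for one row.

    Assumes the bucket column is an integer, so bucket = value % num_buckets.
    """
    bucket = row[bucket_col] % num_buckets
    partition_dir = f"{partition_col}={row[partition_col]}"
    return f"/user/hive/warehouse/{table}/{partition_dir}/{bucket:06d}_0"


# Hypothetical rows of a "logs" table partitioned by dt, bucketed by user_id.
row_a = {"dt": "2016-09-01", "user_id": 7, "msg": "login"}
row_b = {"dt": "2016-09-02", "user_id": 8, "msg": "logout"}

path_a = storage_path("logs", row_a, "dt", "user_id", num_buckets=4)
path_b = storage_path("logs", row_b, "dt", "user_id", num_buckets=4)
```

A query that filters on `dt` only needs to read one directory, and a join on `user_id` can match buckets with the same number.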
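Query / Comparison
• Tools compared: Hive, Impala, SparkSQL (Shark)
• Usage – Hive: batch; Impala: BI & SQL analytics; SparkSQL: procedural development
• Speed – Hive: bad; Impala: best; SparkSQL: OK
• Implementation – Hive: MapReduce; Impala: dedicated daemons on DataNodes; SparkSQL: in memory
• Similar tools – Hive: Hive on Spark; Impala: Presto, Drill (SQL:2011)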
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis, Workflow]
Workflow / Oozie
• Schedule Hadoop jobs
• Job types:
• Workflows – sequences of jobs defined as directed acyclic graphs (DAGs)
• Coordinator – triggers jobs by time or data availability
[Figure: Example Oozie workflow. Control-flow nodes: start, Fork, Join, End. Action nodes: Sqoop, Pig, Pig, MR, Sub workflow, FS (HDFS), Email.]
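A workflow DAG determines a valid execution order, which Python's standard `graphlib` module can compute. The node names below come from the figure, but the edges between them are an assumption made for this sketch, and real Oozie workflows are defined in XML rather than Python.

```python
from graphlib import TopologicalSorter

# node -> set of nodes that must finish first (assumed edges).
workflow = {
    "sqoop": {"start"},
    "fork": {"sqoop"},
    "pig": {"fork"},
    "mr": {"fork"},
    "join": {"pig", "mr"},
    "end": {"join"},
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

The Pig and MR actions sit between the fork and the join, so they may run in either order (or in parallel), but the join always waits for both.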
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis, Workflow, Search]
Search / Solr
• Full-text search over Hadoop
• Near-real-time indexing
• REST API
• Based on the Apache Lucene Java search library
[Figure: Hadoop Ecosystem, layers so far: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis, Workflow, Search, Visualization]
Visualization / Hue
• Open source Web interface for analyzing data with any Hadoop.
• Application:
• File Browser: HDFS, HBase
• Scheduling of jobs and workflows : Oozie
• Job Browser: YARN
• SQL : Hive, Impala
• Data analysis: Pig, UDF
• Dynamic Search: Solr
• Notebooks: Spark
• Data Transfer: Sqoop 2
[Figure: Hadoop Ecosystem, all layers: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis, Workflow, Search, Visualization, Cluster Management]
Cluster Management / Cloudera
• 100% open source
• The most complete and tested distribution of Hadoop
• Integrates all Hadoop projects
• Express – free, end to end administration
• Enterprise – Extra features and support
Cluster Management / Comparison
https://talendexpert.com/cloudera-vs-honworks-vs-mapr
Basic Cluster configuration
[Figure: Three master nodes and other servers host the NameNode, Secondary NameNode, ResourceManager, standby ResourceManager, ZooKeeper, Cloudera Manager, Impala State, and the Hive, Sqoop, and Spark gateways (GW); each worker node runs a NodeManager, a DataNode, and an Impala Daemon.]
[Figure: Hadoop Ecosystem, all layers: Storage, Resource Management, Processing, Coordination, Data Formats, Data Ingestion, Data Streaming, Analysis, Workflow, Search, Visualization, Cluster Management]
Thanks!
Any questions?

Editor's notes

  1. HDFS manages the file system across a network of machines. It is designed to store big files and follows a master-worker pattern. The NameNode maintains the directory tree; it does not persist block locations but reconstructs them on reboot. The NameNode is the most important component in the cluster: when it is lost, all access to the cluster is lost, so it is possible to set up high availability where we
  2. Designed to support MapReduce, but it is also used for other operations