© 2011 IBM Corporation
Hadoop architecture
IBM Information Management – Cloud Computing Center of Competence
IBM Canada Labs
Agenda
• Terminology review
• HDFS
• MapReduce
• Type of nodes
• Topology awareness
Terminology review
[Figure, built up over several slides: nodes (Node 1 … Node n) are grouped into racks (Rack 1 … Rack n), and the racks together form a Hadoop cluster]
Hadoop architecture
• Two main components:
– Hadoop Distributed File System (HDFS)
– MapReduce Engine
Hadoop distributed file system (HDFS)
• Hadoop's file system, which runs on top of the existing file system
• Designed to handle very large files with streaming data access patterns
• Uses blocks to store a file or parts of a file
HDFS - Blocks
• File blocks
– 64 MB (default in Hadoop 1.x), 128 MB (recommended, and the default from Hadoop 2 onward); compare with 4 KB in UNIX
– Behind the scenes, one HDFS block is backed by multiple operating system (OS) blocks
[Figure: one 128 MB HDFS block made up of many OS blocks]
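To make the size comparison concrete, the following minimal sketch counts how many OS blocks back one full HDFS block. The function name is hypothetical, and the 4 KB OS block size is taken from the comparison above rather than from any particular file system.

```python
def os_blocks_per_hdfs_block(hdfs_block_mb: int, os_block_kb: int) -> int:
    """Number of OS blocks needed to back one full HDFS block."""
    hdfs_block_bytes = hdfs_block_mb * 1024 * 1024
    os_block_bytes = os_block_kb * 1024
    # Ceiling division, in case the two sizes do not divide evenly.
    return -(-hdfs_block_bytes // os_block_bytes)


print(os_blocks_per_hdfs_block(128, 4))  # 32768
print(os_blocks_per_hdfs_block(64, 4))   # 16384
```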
• Blocks fit well with replication to provide fault tolerance and availability
• Advantages of blocks:
– Fixed size: easy to calculate how many fit on a disk
– A file can be larger than any single disk in the network
– If a file or a chunk of the file is smaller than the block size, only the needed space is used. E.g., a 420 MB file is split as:
128 MB + 128 MB + 128 MB + 36 MB
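The 420 MB example can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the function name is made up, and sizes are kept in whole megabytes for readability.

```python
from typing import List


def split_into_blocks(file_size_mb: int, block_size_mb: int = 128) -> List[int]:
    """Return the sizes (in MB) of the blocks a file would occupy."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be non-negative, block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block only uses the space it actually needs.
        blocks.append(remainder)
    return blocks


print(split_into_blocks(420))  # [128, 128, 128, 36]
```

A file that is an exact multiple of the block size produces no partial block, for example `split_into_blocks(256)` gives `[128, 128]`.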
HDFS - Replication
• Blocks with data are replicated to multiple nodes
• Allows for node failure without data loss
[Figure: copies of blocks stored on Node 1, Node 2, and Node 3]
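The idea that replication tolerates a node failure can be sketched as follows. This is a deliberately simplified round-robin placement, not the placement policy HDFS actually uses; the function names, node names, and block IDs are all hypothetical.

```python
from typing import Dict, List, Set


def place_replicas(block_ids: List[str], nodes: List[str],
                   replication: int = 3) -> Dict[str, List[str]]:
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def surviving_blocks(placement: Dict[str, List[str]], failed_node: str) -> Set[str]:
    """Blocks that still have at least one replica after a node fails."""
    return {block for block, holders in placement.items()
            if any(node != failed_node for node in holders)}


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(["blk_1", "blk_2", "blk_3"], nodes)
print(placement)
# Losing any single node leaves every block readable.
print(surviving_blocks(placement, "node2") == set(placement))  # True
```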
Writing a file to HDFS
[Figure: multi-step diagram of a file being written to HDFS]
HDFS Command line interface
• File System Shell (fs)
• Invoked as follows:
hadoop fs <args>
• Example: listing the current directory in HDFS
hadoop fs -ls .
• FS shell commands take path URIs as arguments
• URI format: scheme://authority/path
• Scheme:
– For the local filesystem, the scheme is file
– For HDFS, the scheme is hdfs
hadoop fs -copyFromLocal file://myfile.txt hdfs://localhost/user/keith/myfile.txt
• Scheme and authority are optional
– Defaults are taken from the configuration file core-site.xml
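The scheme://authority/path structure can be explored with Python's standard `urllib.parse` module. The sketch below is a simplification of what the FS shell does: the two default values are hard-coded stand-ins for whatever core-site.xml would supply, and `resolve_fs_uri` is a hypothetical helper, not part of Hadoop.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Assumptions: stand-ins for the defaults a real cluster reads from core-site.xml.
DEFAULT_SCHEME = "hdfs"
DEFAULT_AUTHORITY = "localhost"


def resolve_fs_uri(uri: str) -> Tuple[str, str, str]:
    """Split a path URI into (scheme, authority, path), applying defaults."""
    parts = urlsplit(uri)
    if parts.scheme:
        # Fully qualified URI: use it as given.
        return parts.scheme, parts.netloc, parts.path
    # No scheme given: fall back to the configured default file system.
    return DEFAULT_SCHEME, DEFAULT_AUTHORITY, parts.path


print(resolve_fs_uri("hdfs://localhost/user/keith/myfile.txt"))
# ('hdfs', 'localhost', '/user/keith/myfile.txt')
print(resolve_fs_uri("/user/keith/myfile.txt"))
# ('hdfs', 'localhost', '/user/keith/myfile.txt')
```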
• Many POSIX-like commands
• cat, chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, stat, tail
• Some HDFS-specific commands
• copyFromLocal, copyToLocal, get, getmerge, put, setrep
HDFS – Specific commands
• copyFromLocal / put
• Copy files from the local file system into fs
hadoop fs -copyFromLocal <localsrc> ... <dst>
Or
hadoop fs -put <localsrc> ... <dst>
• copyToLocal / get
• Copy files from fs into the local file system
hadoop fs -copyToLocal [-ignorecrc] [-crc] <src> <localdst>
Or
hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
• getmerge
• Gets all the files in the directories that match the source file pattern
• Merges and sorts them into a single file on the local fs
• <src> is kept
hadoop fs -getmerge <src> <localdst>
• setrep
• Sets the replication level of a file
• The -R flag requests a recursive change of replication level for an entire tree
• If -w is specified, waits until the new replication level is achieved
hadoop fs -setrep [-R] [-w] <rep> <path/file>
MapReduce engine
• Based on the MapReduce programming model published by Google
• A MapReduce program consists of map and reduce functions
• A MapReduce job is broken into tasks that run in parallel
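As a rough illustration of the map and reduce functions, here is a single-process word count, the conventional introductory example (it does not come from these slides). It uses no Hadoop API; `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the loop over lines stands in for map tasks that a real cluster would run in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, group values by key (the shuffle step), then reduce."""
    grouped = defaultdict(list)
    for line in lines:  # each line could be handled by a separate map task
        for key, value in map_fn(line):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


print(run_job(["Hadoop stores data", "Hadoop processes data"]))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```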
Types of nodes - Overview
• HDFS nodes
– NameNode
– DataNode
• MapReduce nodes
– JobTracker
– TaskTracker
• There are other nodes not discussed in this course
Types of nodes - NameNode
• NameNode
– Only one per Hadoop cluster
– Manages the filesystem namespace and metadata
– Single point of failure, mitigated by writing its state to multiple filesystems
– Because it is a single point of failure with large memory requirements, do not use inexpensive commodity hardware for this node
Types of nodes - DataNode
• DataNode
– Many per Hadoop cluster
– Manages blocks with data and serves them to clients
– Periodically reports to the NameNode the list of blocks it stores
– Use inexpensive commodity hardware for this node
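The periodic block reports can be pictured as the input from which the NameNode learns where each block lives. The sketch below is a toy model under that assumption: the data structures, function names, and the target of three replicas are illustrative and do not reflect Hadoop's internal classes.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_block_map(block_reports: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert per-DataNode block lists into a block -> DataNodes map."""
    block_map = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block in blocks:
            block_map[block].add(datanode)
    return dict(block_map)


def under_replicated(block_map: Dict[str, Set[str]], target: int = 3) -> List[str]:
    """Blocks held by fewer DataNodes than the target replication level."""
    return sorted(block for block, holders in block_map.items()
                  if len(holders) < target)


# Hypothetical reports, one list of block IDs per DataNode.
reports = {
    "datanode1": ["blk_1", "blk_2"],
    "datanode2": ["blk_1", "blk_2"],
    "datanode3": ["blk_1"],
}
block_map = build_block_map(reports)
print(under_replicated(block_map))  # ['blk_2']
```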
Types of nodes - JobTracker
• JobTracker node
– One per Hadoop cluster
– Receives job requests submitted by clients
– Schedules and monitors MapReduce jobs on TaskTrackers
Types of nodes - TaskTracker
• TaskTracker node
– Many per Hadoop cluster
– Executes MapReduce operations
– Reads blocks from DataNodes
Topology awareness (or Rack awareness)
Bandwidth becomes progressively smaller in the following scenarios:
1. Processes on the same node
2. Different nodes on the same rack
3. Nodes on different racks in the same data center
4. Nodes in different data centers
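One common way to model these four scenarios is to place each node in a tree (data center, rack, node) and measure distance as the number of steps from each node up to their closest common ancestor. The sketch below follows that idea; the `/datacenter/rack/node` location strings and the function name are assumptions made for illustration. A larger distance corresponds to less available bandwidth.

```python
def network_distance(location_a: str, location_b: str) -> int:
    """Steps from both locations up to their closest common ancestor."""
    path_a = location_a.strip("/").split("/")
    path_b = location_b.strip("/").split("/")
    if len(path_a) != len(path_b):
        raise ValueError("locations must have the same depth")
    common = 0
    for part_a, part_b in zip(path_a, path_b):
        if part_a != part_b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


print(network_distance("/d1/r1/n1", "/d1/r1/n1"))  # 0: same node
print(network_distance("/d1/r1/n1", "/d1/r1/n2"))  # 2: same rack
print(network_distance("/d1/r1/n1", "/d1/r2/n3"))  # 4: same data center
print(network_distance("/d1/r1/n1", "/d2/r3/n4"))  # 6: different data centers
```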
Thank you!

Contenu connexe

Tendances

Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Manish Chopra
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
 
Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions Ceph Community
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
 
Red Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined InfrastructureRed Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined InfrastructureIntel® Software
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Programinside-BigData.com
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 

Tendances (20)

Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3
 
Parallel HDF5 Developments
Parallel HDF5 DevelopmentsParallel HDF5 Developments
Parallel HDF5 Developments
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
Caching and Buffering in HDF5
Caching and Buffering in HDF5Caching and Buffering in HDF5
Caching and Buffering in HDF5
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
Performance Tuning in HDF5
Performance Tuning in HDF5 Performance Tuning in HDF5
Performance Tuning in HDF5
 
Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions Reference Architecture: Architecting Ceph Storage Solutions
Reference Architecture: Architecting Ceph Storage Solutions
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
 
Red Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined InfrastructureRed Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
Red Hat® Ceph Storage and Network Solutions for Software Defined Infrastructure
 
HDF Tools Tutorial
HDF Tools TutorialHDF Tools Tutorial
HDF Tools Tutorial
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral ProgramBig Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
Big Lab Problems Solved with Spectrum Scale: Innovations for the Coral Program
 
HDF4 and HDF5 Performance Preliminary Results
HDF4 and HDF5 Performance Preliminary ResultsHDF4 and HDF5 Performance Preliminary Results
HDF4 and HDF5 Performance Preliminary Results
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
HDF5 I/O Performance
HDF5 I/O PerformanceHDF5 I/O Performance
HDF5 I/O Performance
 

Similaire à hadoop architecture -Big data hadoop

IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopIOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopLeons Petražickis
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaData Con LA
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Uwe Printz
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosHeiko Loewe
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceDerek Chen
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's EvolutionDataWorks Summit
 
Lecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptxLecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptxNIKHILGR3
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolutionDataWorks Summit
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFSEdureka!
 

Similaire à hadoop architecture -Big data hadoop (20)

IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and HadoopIOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
IOD 2013 - Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jha
 
HADOOP.pptx
HADOOP.pptxHADOOP.pptx
HADOOP.pptx
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Big Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and MesosBig Data in Container; Hadoop Spark in Docker and Mesos
Big Data in Container; Hadoop Spark in Docker and Mesos
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
big data ppt.ppt
big data ppt.pptbig data ppt.ppt
big data ppt.ppt
 
Ozone and HDFS's Evolution
Ozone and HDFS's EvolutionOzone and HDFS's Evolution
Ozone and HDFS's Evolution
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Lecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptxLecture10_CloudServicesModel_MapReduceHDFS.pptx
Lecture10_CloudServicesModel_MapReduceHDFS.pptx
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
 
Ozone and HDFS’s evolution
Ozone and HDFS’s evolutionOzone and HDFS’s evolution
Ozone and HDFS’s evolution
 
Hadoop Architecture and HDFS
Hadoop Architecture and HDFSHadoop Architecture and HDFS
Hadoop Architecture and HDFS
 

Plus de jasikadogra

Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrangejasikadogra
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrangejasikadogra
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrangejasikadogra
 
Email marketing- Excelrange
Email marketing- ExcelrangeEmail marketing- Excelrange
Email marketing- Excelrangejasikadogra
 
FB marketing notes- Excelrange
FB marketing notes- ExcelrangeFB marketing notes- Excelrange
FB marketing notes- Excelrangejasikadogra
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrangejasikadogra
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrangejasikadogra
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrangejasikadogra
 
Email marketing Notes - excelrange
Email marketing Notes - excelrangeEmail marketing Notes - excelrange
Email marketing Notes - excelrangejasikadogra
 
Email marketing Notes - excelrange
Email marketing Notes - excelrangeEmail marketing Notes - excelrange
Email marketing Notes - excelrangejasikadogra
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrangejasikadogra
 
Email marketing Notes- Excelrange
Email marketing Notes- ExcelrangeEmail marketing Notes- Excelrange
Email marketing Notes- Excelrangejasikadogra
 
Facebook Ads Notes
Facebook Ads NotesFacebook Ads Notes
Facebook Ads Notesjasikadogra
 
Wordpress tutorial - Excelrange
Wordpress tutorial - ExcelrangeWordpress tutorial - Excelrange
Wordpress tutorial - Excelrangejasikadogra
 
digital marketing curriculum
digital marketing curriculumdigital marketing curriculum
digital marketing curriculumjasikadogra
 

Plus de jasikadogra (20)

Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrange
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrange
 
Email marketing
Email marketingEmail marketing
Email marketing
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrange
 
Email marketing- Excelrange
Email marketing- ExcelrangeEmail marketing- Excelrange
Email marketing- Excelrange
 
FB marketing notes- Excelrange
FB marketing notes- ExcelrangeFB marketing notes- Excelrange
FB marketing notes- Excelrange
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrange
 
Email marketing
Email marketingEmail marketing
Email marketing
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrange
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrange
 
Email marketing
Email marketingEmail marketing
Email marketing
 
FB NOTES
FB NOTESFB NOTES
FB NOTES
 
Email marketing Notes - excelrange
Email marketing Notes - excelrangeEmail marketing Notes - excelrange
Email marketing Notes - excelrange
 
Email marketing Notes - excelrange
Email marketing Notes - excelrangeEmail marketing Notes - excelrange
Email marketing Notes - excelrange
 
Email marketing notes- excelrange
Email marketing notes- excelrangeEmail marketing notes- excelrange
Email marketing notes- excelrange
 
Email marketing Notes- Excelrange
Email marketing Notes- ExcelrangeEmail marketing Notes- Excelrange
Email marketing Notes- Excelrange
 
Email marketing
Email marketingEmail marketing
Email marketing
 
Facebook Ads Notes
Facebook Ads NotesFacebook Ads Notes
Facebook Ads Notes
 
Wordpress tutorial - Excelrange
Wordpress tutorial - ExcelrangeWordpress tutorial - Excelrange
Wordpress tutorial - Excelrange
 
digital marketing curriculum
digital marketing curriculumdigital marketing curriculum
digital marketing curriculum
 

Dernier

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 

Dernier (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

hadoop architecture -Big data hadoop

  • 1. © 2011 IBM Corporation Hadoop architecture IBM Information Management – Cloud Computing Center of Competence IBM Canada Labs
  • 2. Agenda • Terminology review • HDFS • MapReduce • Types of nodes • Topology awareness
  • 3. Agenda • Terminology review • HDFS • MapReduce • Types of nodes • Topology awareness
  • 4. Terminology review • Node 1
  • 5. Terminology review • Node 1 • Node 2
  • 6. Terminology review • Node 1 • Node 2 • … • Node n
  • 7. Terminology review • Rack 1: Node 1, Node 2, …, Node n
  • 8. Terminology review • Rack 1: Node 1, Node 2, …, Node n • Rack 2: Node 1, Node 2, …, Node n
  • 9. Terminology review • Rack 1: Node 1, Node 2, …, Node n • Rack 2: Node 1, Node 2, …, Node n • … • Rack n: Node 1, Node 2, …, Node n
  • 10. Terminology review • Hadoop cluster: Rack 1, Rack 2, …, Rack n, each containing Node 1, Node 2, …, Node n
  • 11. Hadoop architecture • Two main components: – Hadoop Distributed File System (HDFS) – MapReduce Engine
  • 12. Agenda • Terminology review • HDFS • MapReduce • Types of nodes • Topology awareness
  • 13. Hadoop distributed file system (HDFS) • Hadoop file system that runs on top of an existing file system • Designed to handle very large files with streaming data access patterns • Uses blocks to store a file or parts of a file
  • 14. HDFS - Blocks • File blocks – 64 MB (default), 128 MB (recommended); compare to 4 KB in UNIX – Behind the scenes, one HDFS block is backed by multiple operating system (OS) blocks (diagram: a 128 MB HDFS block made up of many OS blocks)
  • 15. HDFS - Blocks • Fits well with replication to provide fault tolerance and availability • Advantages of blocks: – Fixed size: easy to calculate how many fit on a disk – A file can be larger than any single disk in the network – If a file or a chunk of the file is smaller than the block size, only the needed space is used. E.g., a 420 MB file is split as: 128 MB + 128 MB + 128 MB + 36 MB
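The 420 MB example on slide 15 can be checked with a few lines of arithmetic. The following is a minimal sketch, not HDFS code: the function name `split_into_blocks` is hypothetical, and sizes are treated as whole megabytes for simplicity.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks needed for a file.

    Illustrative only: every block is full except possibly the last,
    which holds just the leftover data.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # last block uses only the space it needs
    return sizes


print(split_into_blocks(420))  # [128, 128, 128, 36]
```

With the 64 MB default instead, the same file would need six full blocks plus one 36 MB block.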
  • 16. HDFS - Replication • Blocks with data are replicated to multiple nodes • Allows for node failure without data loss (diagram: Node 1, Node 2, Node 3)
  • 17-27. Writing a file to HDFS
  • 28. HDFS Command line interface • File System Shell (fs) • Invoked as follows: hadoop fs <args> • Example: listing the current directory in HDFS: hadoop fs -ls .
  • 29. HDFS Command line interface • FS shell commands take path URIs as arguments • URI format: scheme://authority/path • Scheme: • For the local filesystem, the scheme is file • For HDFS, the scheme is hdfs • Example: hadoop fs -copyFromLocal file://myfile.txt hdfs://localhost/user/keith/myfile.txt • Scheme and authority are optional • Defaults are taken from the configuration file core-site.xml
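The scheme://authority/path format on slide 29 can be explored with a generic URI parser. The sketch below uses Python's standard `urllib.parse.urlsplit`, which is not part of Hadoop and may differ from Hadoop's own path handling in edge cases. The helper name `describe_uri` and the local path `/tmp/myfile.txt` are illustrative assumptions.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a path URI into scheme, authority, and path.

    Missing parts are reported as None, mirroring the slide's point
    that scheme and authority are optional.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme or None,
        "authority": parts.netloc or None,
        "path": parts.path,
    }


print(describe_uri("hdfs://localhost/user/keith/myfile.txt"))
print(describe_uri("file:///tmp/myfile.txt"))   # hypothetical local path
print(describe_uri("/user/keith/myfile.txt"))   # no scheme, no authority
```

Under generic URI rules, the text directly after `//` is read as the authority, so a local file is usually written with an empty authority, as in `file:///tmp/myfile.txt`.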
  • 30. HDFS Command line interface • Many POSIX-like commands • cat, chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, stat, tail • Some HDFS-specific commands • copyFromLocal, copyToLocal, get, getmerge, put, setrep
  • 31. HDFS-specific commands • copyFromLocal / put • Copy files from the local file system into fs: hadoop fs -copyFromLocal <localsrc> ... <dst> or hadoop fs -put <localsrc> ... <dst>
  • 32. HDFS-specific commands • copyToLocal / get • Copy files from fs into the local file system: hadoop fs -copyToLocal [-ignorecrc] [-crc] <src> <localdst> or hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
  • 33. HDFS-specific commands • getmerge • Gets all the files in the directories that match the source file pattern • Merges and sorts them into a single file on the local fs • <src> is kept • hadoop fs -getmerge <src> <localdst>
  • 34. HDFS-specific commands • setrep • Sets the replication level of a file • The -R flag requests a recursive change of replication level for an entire tree • If -w is specified, waits until the new replication level is achieved • hadoop fs -setrep [-R] [-w] <rep> <path/file>
  • 35. Agenda • Terminology review • HDFS • MapReduce • Types of nodes • Topology awareness
  • 36. MapReduce engine • Technology from Google • A MapReduce program consists of map and reduce functions • A MapReduce job is broken into tasks that run in parallel
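To make the map and reduce functions from slide 36 concrete, here is a minimal single-process word-count sketch. It does not use Hadoop's API: the names `map_words`, `reduce_counts`, and `run_job` are hypothetical, and the grouping step stands in for the shuffle that a real engine performs between parallel map and reduce tasks.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_job(lines):
    """Run map, group by key, then reduce, all in one process."""
    grouped = defaultdict(list)
    for line in lines:  # in a real job, lines are spread across map tasks
        for key, value in map_words(line):
            grouped[key].append(value)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


print(run_job(["Hadoop stores blocks", "Hadoop runs tasks"]))
```

Because each line is mapped independently and each key is reduced independently, both steps can be split into tasks that run in parallel.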
  • 37. Agenda • Terminology review • HDFS • MapReduce • Types of nodes • Topology awareness
  • 38. Types of nodes - Overview • HDFS nodes – NameNode – DataNode • MapReduce nodes – JobTracker – TaskTracker • There are other nodes not discussed in this course
  • 39-42. Types of nodes - Overview
  • 43. Types of nodes - NameNode • NameNode – Only one per Hadoop cluster – Manages the filesystem namespace and metadata – Single point of failure, but mitigated by writing state to multiple filesystems – Because it is a single point of failure with large memory requirements, do not use inexpensive commodity hardware for this node
  • 44. Types of nodes - DataNode • DataNode – Many per Hadoop cluster – Manages blocks with data and serves them to clients – Periodically reports to the NameNode the list of blocks it stores – Use inexpensive commodity hardware for this node
  • 45. Types of nodes - JobTracker • JobTracker node – One per Hadoop cluster – Receives job requests submitted by clients – Schedules and monitors MapReduce jobs on TaskTrackers
  • 46. Types of nodes - TaskTracker • TaskTracker node – Many per Hadoop cluster – Executes MapReduce operations – Reads blocks from DataNodes
  • 47. Agenda • Terminology review • HDFS • MapReduce • Types of nodes • Topology awareness
  • 48. Topology awareness (or rack awareness) • Available bandwidth becomes progressively smaller in the following scenarios:
  • 49. Topology awareness • Available bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node
  • 50. Topology awareness • Available bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node 2. Different nodes on the same rack
  • 51. Topology awareness • Available bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node 2. Different nodes on the same rack 3. Nodes on different racks in the same data center
  • 52. Topology awareness • Available bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node 2. Different nodes on the same rack 3. Nodes on different racks in the same data center 4. Nodes in different data centers
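The four scenarios on slide 52 amount to a ranking of how "far apart" two processes are. The sketch below encodes that ranking as levels 0 to 3 and picks the closest copy of a block. It is an illustration only: the `Location` type, the function names, and the numeric levels are assumptions made for this example, not Hadoop's actual distance metric or API.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Where a process runs: data center, rack, and node (hypothetical model)."""
    data_center: str
    rack: str
    node: str


def locality_level(a, b):
    """Rank two locations by the slide's scenarios; higher means less bandwidth."""
    if a.data_center != b.data_center:
        return 3  # nodes in different data centers
    if a.rack != b.rack:
        return 2  # different racks in the same data center
    if a.node != b.node:
        return 1  # different nodes on the same rack
    return 0      # same node


def closest_replica(reader, replicas):
    """Pick the replica location with the lowest locality level."""
    return min(replicas, key=lambda replica: locality_level(reader, replica))


reader = Location("dc1", "rack1", "node1")
replicas = [
    Location("dc1", "rack2", "node4"),
    Location("dc1", "rack1", "node2"),
    Location("dc2", "rack1", "node1"),
]
print(closest_replica(reader, replicas))
```

Here the replica on `rack1` of `dc1` is chosen, since it shares a rack with the reader while the others do not.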
  • 53. Thank you!