SlideShare une entreprise Scribd logo
1  sur  15
Hadoop HDFS/MapReduce
Architecture
Hardware
Installation and Configuration
Monitoring
Namenode
HDFS Architecture
Replication
Map Reduce
Hardware Requirements
● NameNode + JobTracker
– >= 2 cores
– >= 8 gigs ram
– >= 40gig disk RAID 10
● DataNode + TaskTracker
– >= 4 cores
– >= (+ 1 (os) 1 (TT) 1 (DN) Reducers Maps) Gig RAM
– >= N Gig disk space JBOD (no raid)
Installation
● Download tar file from hadoop or use a prebuilt
rpm
● https://github.com/gerritjvv/repo
● http://bigtop.apache.org/
Configuration
● $HADOOP_HOME/conf/core-site.xml
● $HADOOP_HOME/conf/mapred-site.xml
● $HADOOP_HOME/conf/hdfs-site.xml
● http://hadoop.apache.org/docs/stable/cluster_setup
●
Configuration Namenode
● Create directory for namenode metadata
– /data/hadoop/name
● Open core-site.xml
– Define fs.default.name = http://<host>:8020
● Open hdfs-site.xml
– Define dfs.name.dir=/data/hadoop/name
– Define dfs.replication=3
– Create dir /data/hadoop/hdfs
– Define dfs.data.dir=/data/hadoop/hdfs
– Defin dfs.http.address=localhost:50070
● Start the namenode with the format option
– /opt/hadoop/bin/hadoop namenode -format
– After the format start the namenode with service hadoop-namenode start
Configuration JobTracker
● Open /opt/hadoop/conf/mapred-site.xml
– Define the property
mapred.job.tracker=<host>:8021
– Create the directory /data/hadoop/mapred
– Define mapred.local.dir=/data/hadoop/mapred
● Start the JobTracker with service hadoop-
jobtracker start
Configuration DataNode
● On each datanode create the directory
/data/hadoop/hdfs (one directory per disk)
● Open /opt/hadoop/conf/hdfs-site.xml
– Define dfs.http.address=<host>:50070
– Define dfs.data.dir=/data/hadoop/hdfs
● Start the datanodes with service hadoop-
datanode start
Configuration Mapreduce
● On each datanode create the directory /data/hadoop/mapred
● Open /opt/hadoop/conf/mapred-site.xml
– Define mapred.local.dir=/data/hadoop/mapred
– Define mapred.tasktracker.map.tasks.maximum=<Number of map
tasks>
– Define mapred.tasktracker.reduce.tasks.maximum=<Number of reduce
tasks>
● Start the TaskTrackers with service hadoop-tasktracker start
Monitoring
● Web Html scraping
– https://github.com/gerritjvv/hadoop-monitoring
● Glanglia
– http://ganglia.info/?p=88
● Cacti
– http://blog.cloudera.com/blog/2009/07/hadoop-graphing
Namenode Edits
● Writes/Updates/Deletes are written to RAM and
to a write ahead log.
● The metadata in RAM is only merged into a
binary file during the secondary namenode
checkpoint
● This file corrupts easily
● Recovery is a manual task
HA
● Yarn and Hadoop 2.0.0
● Experimental
● http://hadoop.apache.org/docs/current/hadoop-yarn
End

Contenu connexe

Tendances

Hadoop installation
Hadoop installationHadoop installation
Hadoop installationhabeebulla g
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Nag Arvind Gudiseva
 
Postgres 12 Cluster Database operations.
Postgres 12 Cluster Database operations.Postgres 12 Cluster Database operations.
Postgres 12 Cluster Database operations.Vijay Kumar N
 
Comparison of-foss-distributed-storage
Comparison of-foss-distributed-storageComparison of-foss-distributed-storage
Comparison of-foss-distributed-storageMarian Marinov
 
Comparison of foss distributed storage
Comparison of foss distributed storageComparison of foss distributed storage
Comparison of foss distributed storageMarian Marinov
 
LizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-webLizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-webSzymon Haly
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
Automating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQLAutomating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQLNina Kaufman
 
TP2 Big Data HBase
TP2 Big Data HBaseTP2 Big Data HBase
TP2 Big Data HBaseAmal Abid
 
Archlinux install
Archlinux installArchlinux install
Archlinux installsambismo
 
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelTakahiro Inoue
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍Tae Young Lee
 
Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Community
 
はじめてのGlusterFS
はじめてのGlusterFSはじめてのGlusterFS
はじめてのGlusterFSTakahiro Inoue
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
Web scraping with nutch solr
Web scraping with nutch solrWeb scraping with nutch solr
Web scraping with nutch solrMike Frampton
 

Tendances (20)

Hadoop installation
Hadoop installationHadoop installation
Hadoop installation
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
Hadoop 2.0 cluster setup on ubuntu 14.04 (64 bit)
 
Postgres 12 Cluster Database operations.
Postgres 12 Cluster Database operations.Postgres 12 Cluster Database operations.
Postgres 12 Cluster Database operations.
 
Comparison of-foss-distributed-storage
Comparison of-foss-distributed-storageComparison of-foss-distributed-storage
Comparison of-foss-distributed-storage
 
Comparison of foss distributed storage
Comparison of foss distributed storageComparison of foss distributed storage
Comparison of foss distributed storage
 
GTC Japan 2014
GTC Japan 2014GTC Japan 2014
GTC Japan 2014
 
LizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-webLizardFS-WhitePaper-Eng-v3.9.2-web
LizardFS-WhitePaper-Eng-v3.9.2-web
 
Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Automating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQLAutomating Disaster Recovery PostgreSQL
Automating Disaster Recovery PostgreSQL
 
TP2 Big Data HBase
TP2 Big Data HBaseTP2 Big Data HBase
TP2 Big Data HBase
 
Archlinux install
Archlinux installArchlinux install
Archlinux install
 
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍
 
Ceph Day KL - Bluestore
Ceph Day KL - Bluestore Ceph Day KL - Bluestore
Ceph Day KL - Bluestore
 
はじめてのGlusterFS
はじめてのGlusterFSはじめてのGlusterFS
はじめてのGlusterFS
 
Bluestore
BluestoreBluestore
Bluestore
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Web scraping with nutch solr
Web scraping with nutch solrWeb scraping with nutch solr
Web scraping with nutch solr
 

En vedette

Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsSkillspeed
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hivezahid-mian
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsSkillspeed
 
Hive Functions Cheat Sheet
Hive Functions Cheat SheetHive Functions Cheat Sheet
Hive Functions Cheat SheetHortonworks
 
MapReduce Design Patterns
MapReduce Design PatternsMapReduce Design Patterns
MapReduce Design PatternsDonald Miner
 
Analytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsAnalytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsDataWorks Summit
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive QueriesOwen O'Malley
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word countJeff Patti
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdfEdureka!
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 

En vedette (20)

Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce FundamentalsIntroduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
MapReduce DesignPatterns
MapReduce DesignPatternsMapReduce DesignPatterns
MapReduce DesignPatterns
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hive
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Apache Pig
Apache PigApache Pig
Apache Pig
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Map/Reduce intro
Map/Reduce introMap/Reduce intro
Map/Reduce intro
 
Introduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig FundamentalsIntroduction to Pig | Pig Architecture | Pig Fundamentals
Introduction to Pig | Pig Architecture | Pig Fundamentals
 
Hive Functions Cheat Sheet
Hive Functions Cheat SheetHive Functions Cheat Sheet
Hive Functions Cheat Sheet
 
MapReduce Design Patterns
MapReduce Design PatternsMapReduce Design Patterns
MapReduce Design Patterns
 
Analytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsAnalytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table Functions
 
Optimizing Hive Queries
Optimizing Hive QueriesOptimizing Hive Queries
Optimizing Hive Queries
 
Map reduce: beyond word count
Map reduce: beyond word countMap reduce: beyond word count
Map reduce: beyond word count
 
Hadoop Administration pdf
Hadoop Administration pdfHadoop Administration pdf
Hadoop Administration pdf
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 

Similaire à Hadoop Installation and basic configuration

Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Rohit Agrawal
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basicHafizur Rahman
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
Hadoop 2.x  HDFS Cluster Installation (VirtualBox)Hadoop 2.x  HDFS Cluster Installation (VirtualBox)
Hadoop 2.x HDFS Cluster Installation (VirtualBox)Amir Sedighi
 
Designate - Operators Deep Dive
Designate - Operators Deep DiveDesignate - Operators Deep Dive
Designate - Operators Deep DiveGraham Hayes
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanNarayana B
 
How to configure the cluster based on Multi-site (WAN) configuration
How to configure the clusterbased on Multi-site (WAN) configurationHow to configure the clusterbased on Multi-site (WAN) configuration
How to configure the cluster based on Multi-site (WAN) configurationAkihiro Kitada
 
High Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatHigh Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatChris Barber
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jkEdureka!
 
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOpsОмские ИТ-субботники
 
Setup oracle golden gate 11g replication
Setup oracle golden gate 11g replicationSetup oracle golden gate 11g replication
Setup oracle golden gate 11g replicationKanwar Batra
 
Qt native built for raspberry zero
Qt native built for  raspberry zeroQt native built for  raspberry zero
Qt native built for raspberry zeroSoheilSabzevari2
 
StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce
StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce
StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce StackIQ
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14jijukjoseph
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best PracticesCloudera, Inc.
 
Hadoop Monitoring best Practices
Hadoop Monitoring best PracticesHadoop Monitoring best Practices
Hadoop Monitoring best PracticesEdward Capriolo
 

Similaire à Hadoop Installation and basic configuration (20)

Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2Hadoop Cluster Configuration and Data Loading - Module 2
Hadoop Cluster Configuration and Data Loading - Module 2
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basic
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
Hadoop 2.x  HDFS Cluster Installation (VirtualBox)Hadoop 2.x  HDFS Cluster Installation (VirtualBox)
Hadoop 2.x HDFS Cluster Installation (VirtualBox)
 
Designate - Operators Deep Dive
Designate - Operators Deep DiveDesignate - Operators Deep Dive
Designate - Operators Deep Dive
 
Hadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_PlanHadoop Architecture_Cluster_Cap_Plan
Hadoop Architecture_Cluster_Cap_Plan
 
How to configure the cluster based on Multi-site (WAN) configuration
How to configure the clusterbased on Multi-site (WAN) configurationHow to configure the clusterbased on Multi-site (WAN) configuration
How to configure the cluster based on Multi-site (WAN) configuration
 
High Availability With DRBD & Heartbeat
High Availability With DRBD & HeartbeatHigh Availability With DRBD & Heartbeat
High Availability With DRBD & Heartbeat
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Introduction to hadoop administration jk
Introduction to hadoop administration   jkIntroduction to hadoop administration   jk
Introduction to hadoop administration jk
 
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
2017-03-11 02 Денис Нелюбин. Docker & Ansible - лучшие друзья DevOps
 
Setup oracle golden gate 11g replication
Setup oracle golden gate 11g replicationSetup oracle golden gate 11g replication
Setup oracle golden gate 11g replication
 
testing-nfs
testing-nfstesting-nfs
testing-nfs
 
Qt native built for raspberry zero
Qt native built for  raspberry zeroQt native built for  raspberry zero
Qt native built for raspberry zero
 
StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce
StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce
StackiFest16: Stacki 1600+ Server Journey - Dave Peterson, Salesforce
 
Stacki - The1600+ Server Journey
Stacki - The1600+ Server JourneyStacki - The1600+ Server Journey
Stacki - The1600+ Server Journey
 
Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14Hadoop single node installation on ubuntu 14
Hadoop single node installation on ubuntu 14
 
Linux device drivers
Linux device drivers Linux device drivers
Linux device drivers
 
Hw09 Monitoring Best Practices
Hw09   Monitoring Best PracticesHw09   Monitoring Best Practices
Hw09 Monitoring Best Practices
 
Hadoop Monitoring best Practices
Hadoop Monitoring best PracticesHadoop Monitoring best Practices
Hadoop Monitoring best Practices
 

Dernier

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Dernier (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Hadoop Installation and basic configuration

  • 5. Hardware Requirements ● NameNode + JobTracker – >= 2 cores – >= 8 gigs ram – >= 40gig disk RAID 10 ● DataNode + TaskTracker – >= 4 cores – >= (+ 1 (os) 1 (TT) 1 (DN) Reducers Maps) Gig RAM – >= N Gig disk space JBOD (no raid)
  • 6. Installation ● Download tar file from hadoop or use a prebuilt rpm ● https://github.com/gerritjvv/repo ● http://bigtop.apache.org/
  • 7. Configuration ● $HADOOP_HOME/conf/core-site.xml ● $HADOOP_HOME/conf/mapred-site.xml ● $HADOOP_HOME/conf/hdfs-site.xml ● http://hadoop.apache.org/docs/stable/cluster_setup ●
  • 8. Configuration Namenode ● Create directory for namenode metadata – /data/hadoop/name ● Open core-site.xml – Define fs.default.name = http://<host>:8020 ● Open hdfs-site.xml – Define dfs.name.dir=/data/hadoop/name – Define dfs.replication=3 – Create dir /data/hadoop/hdfs – Define dfs.data.dir=/data/hadoop/hdfs – Defin dfs.http.address=localhost:50070 ● Start the namenode with the format option – /opt/hadoop/bin/hadoop namenode -format – After the format start the namenode with service hadoop-namenode start
  • 9. Configuration JobTracker ● Open /opt/hadoop/conf/mapred-site.xml – Define the property mapred.job.tracker=<host>:8021 – Create the directory /data/hadoop/mapred – Define mapred.local.dir=/data/hadoop/mapred ● Start the JobTracker with service hadoop- jobtracker start
  • 10. Configuration DataNode ● On each datanode create the directory /data/hadoop/hdfs (one directory per disk) ● Open /opt/hadoop/conf/hdfs-site.xml – Define dfs.http.address=<host>:50070 – Define dfs.data.dir=/data/hadoop/hdfs ● Start the datanodes with service hadoop- datanode start
  • 11. Configuration Mapreduce ● On each datanode create the directory /data/hadoop/mapred ● Open /opt/hadoop/conf/mapred-site.xml – Define mapred.local.dir=/data/hadoop/mapred – Define mapred.tasktracker.map.tasks.maximum=<Number of map tasks> – Define mapred.tasktracker.reduce.tasks.maximum=<Number of reduce tasks> ● Start the TaskTrackers with service hadoop-tasktracker start
  • 12. Monitoring ● Web Html scraping – https://github.com/gerritjvv/hadoop-monitoring ● Glanglia – http://ganglia.info/?p=88 ● Cacti – http://blog.cloudera.com/blog/2009/07/hadoop-graphing
  • 13. Namenode Edits ● Writes/Updates/Deletes are written to RAM and to a write ahead log. ● The metadata in RAM is only merged into a binary file during the secondary namenode checkpoint ● This file corrupts easily ● Recovery is a manual task
  • 14. HA ● Yarn and Hadoop 2.0.0 ● Experimental ● http://hadoop.apache.org/docs/current/hadoop-yarn
  • 15. End