HDFS HA: High-Availability Storage, by Damien Hardy

Until now, the NameNode was a critical HDFS component that was hard to make reliable. Hadoop 2, and therefore CDH4, introduced HDFS HA. CDH4.1 eliminates all SPOFs (single points of failure). We will see how to set up high availability in HDFS: what the new services are and how they fit together. http://fr.viadeo.com/fr/profile/damien.hardy8

Hadoop CDH4.1.2
HDFS HA: High-Availability Storage

Viadeo Tech Days 2012
Damien Hardy
Infrastructure Architect @Viadeo

Overview

1. Hadoop by Cloudera
2. CDH3: Hadoop 1
3. CDH4: Hadoop 2
4. HDFS HA
5. Configuration
6. Getting started
7. Failover
8. Client side
9. Further information
10. Questions

Cloudera: a Hadoop distribution

Debian and Red Hat packaging
Public repositories
Patches
Apache committers

CDH3: where do we come from?

Hadoop 1.0
NameNode (SPOF)
SecondaryNameNode (this is not a NameNode)
DataNode
JobTracker
TaskTracker
HBase 0.90
Master server
Region server
ZooKeeper 3.3
...

©http://lesjoiesdusysadmin.tumblr.com/post/35638011614

CDH4.1: Hadoop 2

Hadoop 2.0
NameNode
DataNode
JournalNode
ZK Failover Controller (ZKFC)
JobTracker
TaskTracker
HBase 0.92
Master server
Region server
ZooKeeper 3.4
...

HDFS HA

The NameNode is no longer a SPOF o/
The SecondaryNameNode is no longer needed
2 NameNodes in active/standby mode
Automatic failover is possible (ZKFC)
No floating IP, no heartbeat/keepalive
Based on ZooKeeper (already used for HBase)
With or without NFS (thanks to the JournalNodes)

hdfs-site.xml: declaring the cluster

dfs.nameservices: name of the "access point" (hdfscluster in the keys below)
dfs.ha.namenodes.hdfscluster: list of the 2 NameNode server names
dfs.namenode.rpc-address.hdfscluster.<name>: RPC address of the node
dfs.namenode.http-address.hdfscluster.<name>: HTTP address of the node
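
As an illustration of how these four keys fit together, here is a minimal Python sketch that derives them from a nameservice name and renders them as an hdfs-site.xml fragment. The NameNode names (nn1, nn2), the example.com hosts and the ports 8020 and 50070 are assumptions made for the example, not values taken from the slides.

```python
from xml.sax.saxutils import escape


def cluster_declaration(nameservice, namenodes, rpc_port=8020, http_port=50070):
    """Build the HA cluster declaration keys.

    namenodes maps a NameNode name to its host (hypothetical values).
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
    }
    for name, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{name}"] = f"{host}:{rpc_port}"
        props[f"dfs.namenode.http-address.{nameservice}.{name}"] = f"{host}:{http_port}"
    return props


def render_site_xml(props):
    """Render a dict of properties in the Hadoop *-site.xml layout."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# Hypothetical hosts, for illustration only.
props = cluster_declaration(
    "hdfscluster",
    {"nn1": "namenode1.example.com", "nn2": "namenode2.example.com"},
)
print(render_site_xml(props))
```

Generating the keys from one nameservice name keeps the suffixes consistent, which is the part that is easy to get wrong when editing the file by hand.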

hdfs-site.xml: failover method

dfs.ha.automatic-failover.enabled: automatic failover?
ha.zookeeper.quorum: list of the servers in the ZooKeeper cluster
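
A small Python sketch of the two failover keys, assuming a three-server ZooKeeper ensemble on the default client port 2181; the zk1..zk3 host names are hypothetical. Note that the upstream Hadoop documentation places ha.zookeeper.quorum in core-site.xml, so which file holds it may differ from the slide depending on the setup.

```python
def failover_properties(zk_hosts, zk_port=2181, automatic=True):
    """Build the automatic-failover keys from a list of ZooKeeper hosts."""
    if not zk_hosts:
        raise ValueError("at least one ZooKeeper server is required")
    quorum = ",".join(f"{host}:{zk_port}" for host in zk_hosts)
    return {
        "dfs.ha.automatic-failover.enabled": "true" if automatic else "false",
        "ha.zookeeper.quorum": quorum,
    }


# Hypothetical ZooKeeper hosts, for illustration only.
for key, value in failover_properties(["zk1", "zk2", "zk3"]).items():
    print(f"{key} = {value}")
```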

hdfs-site.xml: sharing the data

dfs.namenode.shared.edits.dir: shared directory for the metadata (on NFS or on a JournalNode quorum)
dfs.journalnode.edits.dir: storage path for the JournalNode (on each server of the quorum)
dfs.ha.fencing.methods: STONITH method (for a shared directory)
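
For the JournalNode quorum case, the shared edits directory is expressed as a qjournal:// URI listing every JournalNode. The sketch below builds that URI in Python; the jn1..jn3 hosts, the port 8485, the /data/1/dfs/jn path and the shell(/bin/true) fencing value are assumptions chosen for the example, and a real deployment may well prefer a stronger fencing method such as sshfence.

```python
def shared_edits_uri(journal_hosts, nameservice, port=8485):
    """Build a qjournal:// URI: hosts are separated by ';', then the journal id."""
    if not journal_hosts:
        raise ValueError("at least one JournalNode is required")
    hosts = ";".join(f"{host}:{port}" for host in journal_hosts)
    return f"qjournal://{hosts}/{nameservice}"


def shared_edits_properties(journal_hosts, nameservice, local_dir):
    """Keys for sharing the edit log through a JournalNode quorum."""
    return {
        "dfs.namenode.shared.edits.dir": shared_edits_uri(journal_hosts, nameservice),
        # Local path on each JournalNode host (hypothetical path).
        "dfs.journalnode.edits.dir": local_dir,
        # Placeholder fencing method, an assumption for this sketch.
        "dfs.ha.fencing.methods": "shell(/bin/true)",
    }


# Hypothetical hosts and path, for illustration only.
edits = shared_edits_properties(["jn1", "jn2", "jn3"], "hdfscluster", "/data/1/dfs/jn")
for key, value in edits.items():
    print(f"{key} = {value}")
```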

Initialization

HDFS NameNode (first)
hdfs namenode -format
start hadoop-hdfs-namenode
HDFS NameNode (second)
hdfs namenode -bootstrapStandby
start hadoop-hdfs-namenode
HDFS ZKFC (automatic failover)
hdfs zkfc -formatZK
start hadoop-hdfs-zkfc (on both NameNodes)
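
The order of these steps matters, so here is a dry-run sketch in Python that only prints which command goes on which host; it executes nothing. The host names nn1 and nn2 are hypothetical, and running the ZooKeeper formatting step once, from the first NameNode, is an assumption of this sketch. When the edits are shared through JournalNodes, those presumably need to be running before the first format.

```python
def init_plan(first_nn, second_nn):
    """Return the ordered (host, command) steps listed on the slide."""
    return [
        # First NameNode: create the namespace, then start the service.
        (first_nn, "hdfs namenode -format"),
        (first_nn, "start hadoop-hdfs-namenode"),
        # Second NameNode: copy the namespace from the first one.
        (second_nn, "hdfs namenode -bootstrapStandby"),
        (second_nn, "start hadoop-hdfs-namenode"),
        # Automatic failover: initialize ZooKeeper state once (assumed
        # here to be done from the first NameNode), then start ZKFC on both.
        (first_nn, "hdfs zkfc -formatZK"),
        (first_nn, "start hadoop-hdfs-zkfc"),
        (second_nn, "start hadoop-hdfs-zkfc"),
    ]


# Hypothetical host names, for illustration only.
for host, command in init_plan("nn1", "nn2"):
    print(f"[{host}] {command}")
```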

Administration

hdfs haadmin
Usage: DFSHAAdmin [-ns <nameserviceId>]
[-transitionToActive <serviceId>]
[-transitionToStandby <serviceId>]
[-failover [--forcefence] [--forceactive] <serviceId> <serviceId>]
[-getServiceState <serviceId>]
[-checkHealth <serviceId>]
[-help <command>]
Used to trigger a server failover.
checkHealth is not implemented
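
A possible way to script a manual failover is to build the haadmin argument list in Python and hand it to subprocess on a host where the hdfs CLI is installed. The sketch below only builds and prints the commands, using the flags from the usage text above; the service ids nn1 and nn2 are hypothetical.

```python
def haadmin_state(service_id):
    """Argument list asking whether a NameNode is active or standby."""
    return ["hdfs", "haadmin", "-getServiceState", service_id]


def haadmin_failover(from_id, to_id, force_fence=False, force_active=False):
    """Argument list for a failover from one NameNode to the other."""
    argv = ["hdfs", "haadmin", "-failover"]
    if force_fence:
        argv.append("--forcefence")
    if force_active:
        argv.append("--forceactive")
    return argv + [from_id, to_id]


# Hypothetical service ids, for illustration only.
print(" ".join(haadmin_state("nn1")))
print(" ".join(haadmin_failover("nn1", "nn2")))
# To actually run a command:
#   import subprocess
#   subprocess.run(haadmin_failover("nn1", "nn2"), check=True)
```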

HDFS client side: the HBase example

hbase-site.xml
hbase.rootdir: hdfs://<dfs.nameservices>/hbase
The HDFS config on the $CLASSPATH
core-site.xml
dfs.client.failover.proxy.provider.<ns>
hdfs-site.xml
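
The key point on the client side is that hbase.rootdir names the nameservice, not a NameNode host and port. The Python sketch below builds the client keys and checks a root directory against that rule. The nameservice name hdfscluster is taken from the earlier slides; using ConfiguredFailoverProxyProvider as the proxy provider value and the namenode1.example.com host are assumptions of this example.

```python
from urllib.parse import urlparse

# Proxy provider class shipped with Hadoop 2 (assumed value for this sketch).
PROXY_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def client_properties(nameservice):
    """Client-side keys, grouped by the file they would go in."""
    return {
        "hbase-site.xml": {"hbase.rootdir": f"hdfs://{nameservice}/hbase"},
        "hdfs-site.xml": {
            f"dfs.client.failover.proxy.provider.{nameservice}": PROXY_PROVIDER,
        },
    }


def uses_nameservice(rootdir, nameservice):
    """True if the URI targets the logical nameservice, with no port."""
    uri = urlparse(rootdir)
    return uri.scheme == "hdfs" and uri.hostname == nameservice and uri.port is None


conf = client_properties("hdfscluster")
rootdir = conf["hbase-site.xml"]["hbase.rootdir"]
print(rootdir, uses_nameservice(rootdir, "hdfscluster"))
```

A root directory that still points at a single NameNode, such as hdfs://namenode1.example.com:8020/hbase, fails the check, which is the kind of leftover that would defeat client-side failover.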

Results

Demo

Information

http://ccp.cloudera.com/display/CDH4DOC/CDH4+High+Availability+Guide

Thank you for your attention,

over to your questions!
