Hadoop and the term 'Big Data' go hand in hand. The information explosion brought about by cloud and distributed computing has led to a growing interest in processing and analyzing massive amounts of data; such processing and analysis help an organization add value or derive useful information. The current Hadoop implementation assumes that the computing nodes in a cluster are homogeneous. Hadoop relies on its ability to move computation to the nodes that hold the data rather than moving data between nodes, which could cause significant network overhead. This strategy pays off in a homogeneous environment, but it may not be suitable in a heterogeneous one: the time taken to process data on a slower node may be significantly higher than the sum of the network overhead and the processing time on a faster node. Hence, it is necessary to study a data placement policy that distributes data according to the processing power of each node. This project explores such a data placement policy and reports its ramifications by running a few benchmark applications.
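To make the trade-off concrete, consider an illustrative sketch in which S, p_slow, p_fast, and B are assumed symbols introduced here for exposition rather than quantities defined in this work: for a data block of size S, a slow node with processing rate p_slow, a faster node with rate p_fast, and network bandwidth B, shipping the block to the faster node is preferable whenever S / p_slow > S / B + S / p_fast, that is, whenever 1 / p_slow > 1 / B + 1 / p_fast. A processing-power-aware placement policy tries to avoid this situation up front by giving each node a share of the data roughly proportional to its processing rate, so that faster nodes rarely sit idle waiting for data to be moved from slower ones.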