SlideShare une entreprise Scribd logo
1  sur  15
Is Cloud a right companion for
Hadoop?
Saravanan Prabhagaran
& Chintan Bhatt
Agenda
Hadoop and cloud Primer
Hadoop challenges
Hadoop on cloud - Advantages and Challenges
Hadoop on cloud offerings
Typical use cases of Hadoop on cloud
Considerations Hadoop deployment
Conclusion
2
3
Diagonal
scalability
Public
Elastic
Primer
Challenges with Hadoop
4
Infrastructure management
Configuration and Tuning
Capital expenditure
Agility: Provisioning and Availability
Hadoop on cloud Advantages
5
Hadoop
Cloud
Lower
operations
cost
On-Demand
Faster
Provisioning
Efficient
resource
utilization
Distributed /
Parallel
Processing
Elasticity
Agility
Pay as you go
Higher
Throughput
Hadoop on Cloud Challenges
 Data locality vs On-demand Cloud
 Import / export parallelism, Interruption
 Applications sensitive to latencies experience higher
overheads
 Higher overheads by VM than a bare metal
6
Hadoop and Cloud based offerings
7
AWS
MS Azure
Rackspace
Cloudera
Hortonworks
MapR
Joyent
EMR
HDInsight
GoGrid
Rackspace Cloud
Big Data platform
Project Sahara
Pivotal
Project Sahara
 Openstack component
 Hadoop cluster
provisioning
 REST API
 Integration with Hadoop
management tools
 Hadoop configuration
templates
8
Use Cases
 Faster cluster
provisioning for
DEV/QA
 Ad-hoc or bursty
analytical workloads
 Resource utilization
from Openstack IaaS
Project Sahara Architecture
9
Horizon
Sahara
Keystone Swift
Nova
Glance
User
REST
User
Authentication
Hadoop
Image
Create VM
Hadoop
Jobs
Typical use cases of Hadoop on Cloud
 On-Demand Analytics
 Dev/QA or POC environment
 Cluster required for executing Nightly, Weekly, Monthly jobs
 Application deployed on Cloud and Data to be used is in
cloud
10
Considerations - Hadoop on Bare-metal or on cloud
 Capex Vs Opex
 Performance Vs price-performance
 Data gravity
 Regulatory requirements
 Agility
11
Hadoop on Cloud?
12
Performance
?
Bare MetalYes
Data
Gravity
Public Cloud/
Hosted Hadoop
In-Premise
Bare Metal
Private Cloud
In Cloud?
In-premise?
Control
over Data
In-Premise
Bare Metal
Private Cloud
Strict control?
Hadoop on Cloud?
13
POC/DEV/
QA
Public Cloud/
Hosted Hadoop
In-Premise
Private Cloud
CapEx Vs
OpEx?
In-Premise
Bare Metal
Private Cloud
CapEx?
Public Cloud/
Hosted Hadoop
OpEx?
Questions?
We Are Hiring!
Thank You!

Contenu connexe

Tendances

DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIsCisco DevNet
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudCloudera, Inc.
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightDataWorks Summit
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduGrant Henke
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
Big Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudBig Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudAmazon Web Services
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsDataWorks Summit/Hadoop Summit
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureUtkarsh Pandey
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudCloudera, Inc.
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014Amazon Web Services
 
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...DataWorks Summit
 
High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101Amazon Web Services
 
Self-Service Provisioning and Hadoop Management with Apache Ambari
Self-Service Provisioning and  Hadoop Management with Apache AmbariSelf-Service Provisioning and  Hadoop Management with Apache Ambari
Self-Service Provisioning and Hadoop Management with Apache AmbariDataWorks Summit
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureLynn Langit
 

Tendances (20)

DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIs
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache Kudu
 
Empower Hive with Spark
Empower Hive with SparkEmpower Hive with Spark
Empower Hive with Spark
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
Big Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudBig Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS Cloud
 
Hadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the expertsHadoop in the Cloud - The what, why and how from the experts
Hadoop in the Cloud - The what, why and how from the experts
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azure
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in Cloud
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
 
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
Hive LLAP: A High Performance, Cost-effective Alternative to Traditional MPP ...
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101High Performance Computing (HPC) on AWS 101
High Performance Computing (HPC) on AWS 101
 
Self-Service Provisioning and Hadoop Management with Apache Ambari
Self-Service Provisioning and  Hadoop Management with Apache AmbariSelf-Service Provisioning and  Hadoop Management with Apache Ambari
Self-Service Provisioning and Hadoop Management with Apache Ambari
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
HDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows AzureHDInsight Hadoop on Windows Azure
HDInsight Hadoop on Windows Azure
 

En vedette

Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for CybersecurityCloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for CybersecurityCloudera, Inc.
 
Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010Cloudera, Inc.
 
泛在个人桌面服务
泛在个人桌面服务泛在个人桌面服务
泛在个人桌面服务ITband
 
Why Your Data and Analytics Should Live in the Cloud
Why Your Data and Analytics Should Live in the CloudWhy Your Data and Analytics Should Live in the Cloud
Why Your Data and Analytics Should Live in the CloudDavid Menninger
 
Resume - Alison Lange
Resume - Alison LangeResume - Alison Lange
Resume - Alison LangeAlison Lange
 
Av2 8abr10
Av2  8abr10Av2  8abr10
Av2 8abr10tahoma1
 
RubenGasparyanCV June 2016
RubenGasparyanCV June 2016RubenGasparyanCV June 2016
RubenGasparyanCV June 2016Ruben Gasparyan
 
Marmalade boy
Marmalade boyMarmalade boy
Marmalade boyYuyu Gray
 
LinkedIn Reviews_People Are Saying
LinkedIn Reviews_People Are SayingLinkedIn Reviews_People Are Saying
LinkedIn Reviews_People Are SayingMalcolm Ryder
 
Engaging the Digital Customer
Engaging the Digital CustomerEngaging the Digital Customer
Engaging the Digital CustomerMoxie
 
Splunking HL7 Healthcare Data for Business Value
Splunking HL7 Healthcare Data for Business ValueSplunking HL7 Healthcare Data for Business Value
Splunking HL7 Healthcare Data for Business ValueSplunk
 
Service Management Solution Framework (SMSF)
Service Management Solution Framework (SMSF)Service Management Solution Framework (SMSF)
Service Management Solution Framework (SMSF)Malcolm Ryder
 
Company_Presentation_12 10 2016_EN
Company_Presentation_12 10 2016_ENCompany_Presentation_12 10 2016_EN
Company_Presentation_12 10 2016_ENRalf Hildenbrand
 

En vedette (20)

Vocales 1a
Vocales 1aVocales 1a
Vocales 1a
 
Power Ral Y Aleja
Power Ral Y AlejaPower Ral Y Aleja
Power Ral Y Aleja
 
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for CybersecurityCloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
Cloudera Federal Forum 2014: Hadoop-Powered Solutions for Cybersecurity
 
Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010Intel - Nurcan Coskun - Hadoop World 2010
Intel - Nurcan Coskun - Hadoop World 2010
 
泛在个人桌面服务
泛在个人桌面服务泛在个人桌面服务
泛在个人桌面服务
 
Why Your Data and Analytics Should Live in the Cloud
Why Your Data and Analytics Should Live in the CloudWhy Your Data and Analytics Should Live in the Cloud
Why Your Data and Analytics Should Live in the Cloud
 
Resume - Alison Lange
Resume - Alison LangeResume - Alison Lange
Resume - Alison Lange
 
Cloudera 5.3 Update
Cloudera 5.3 UpdateCloudera 5.3 Update
Cloudera 5.3 Update
 
Apa
ApaApa
Apa
 
Av2 8abr10
Av2  8abr10Av2  8abr10
Av2 8abr10
 
RubenGasparyanCV June 2016
RubenGasparyanCV June 2016RubenGasparyanCV June 2016
RubenGasparyanCV June 2016
 
Boston webcast nv_me_2016-09
Boston webcast nv_me_2016-09Boston webcast nv_me_2016-09
Boston webcast nv_me_2016-09
 
Marmalade boy
Marmalade boyMarmalade boy
Marmalade boy
 
LinkedIn Reviews_People Are Saying
LinkedIn Reviews_People Are SayingLinkedIn Reviews_People Are Saying
LinkedIn Reviews_People Are Saying
 
Engaging the Digital Customer
Engaging the Digital CustomerEngaging the Digital Customer
Engaging the Digital Customer
 
Splunking HL7 Healthcare Data for Business Value
Splunking HL7 Healthcare Data for Business ValueSplunking HL7 Healthcare Data for Business Value
Splunking HL7 Healthcare Data for Business Value
 
Routin
RoutinRoutin
Routin
 
Service Management Solution Framework (SMSF)
Service Management Solution Framework (SMSF)Service Management Solution Framework (SMSF)
Service Management Solution Framework (SMSF)
 
Company_Presentation_12 10 2016_EN
Company_Presentation_12 10 2016_ENCompany_Presentation_12 10 2016_EN
Company_Presentation_12 10 2016_EN
 
Toile
ToileToile
Toile
 

Similaire à Is Cloud a right Companion for Hadoop

Extending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the CloudExtending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the CloudDataWorks Summit
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
Build and use a DevOps driven Migration Pipeline
Build and use a DevOps driven Migration PipelineBuild and use a DevOps driven Migration Pipeline
Build and use a DevOps driven Migration PipelineVedanta Barooah
 
Hadoop online training by certified trainer
Hadoop online training by certified trainerHadoop online training by certified trainer
Hadoop online training by certified trainersriram0233
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on webcsandit
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...cscpconf
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsCloudera, Inc.
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...EMC
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Integration of SAP HANA with Hadoop
Integration of SAP HANA with HadoopIntegration of SAP HANA with Hadoop
Integration of SAP HANA with HadoopRamkumar Rajendran
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 
Best Hadoop and Amazon Online Training
Best Hadoop and Amazon Online TrainingBest Hadoop and Amazon Online Training
Best Hadoop and Amazon Online TrainingSamatha Kamuni
 
Hadoop and aws map reducecourse
Hadoop and aws map reducecourseHadoop and aws map reducecourse
Hadoop and aws map reducecourseSamatha Kamuni
 

Similaire à Is Cloud a right Companion for Hadoop (20)

Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
Extending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the CloudExtending your Hadoop Implementation to the Cloud
Extending your Hadoop Implementation to the Cloud
 
Hadoop in a Nutshell
Hadoop in a NutshellHadoop in a Nutshell
Hadoop in a Nutshell
 
Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Build and use a DevOps driven Migration Pipeline
Build and use a DevOps driven Migration PipelineBuild and use a DevOps driven Migration Pipeline
Build and use a DevOps driven Migration Pipeline
 
Hadoop online training by certified trainer
Hadoop online training by certified trainerHadoop online training by certified trainer
Hadoop online training by certified trainer
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
Design architecture based on web
Design architecture based on webDesign architecture based on web
Design architecture based on web
 
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
DESIGN ARCHITECTURE-BASED ON WEB SERVER AND APPLICATION CLUSTER IN CLOUD ENVI...
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple Clouds
 
Hadoop online training
Hadoop online trainingHadoop online training
Hadoop online training
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Integration of SAP HANA with Hadoop
Integration of SAP HANA with HadoopIntegration of SAP HANA with Hadoop
Integration of SAP HANA with Hadoop
 
Why Hadoop as a Service?
Why Hadoop as a Service?Why Hadoop as a Service?
Why Hadoop as a Service?
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
Best Hadoop and Amazon Online Training
Best Hadoop and Amazon Online TrainingBest Hadoop and Amazon Online Training
Best Hadoop and Amazon Online Training
 
Hadoop and aws map reducecourse
Hadoop and aws map reducecourseHadoop and aws map reducecourse
Hadoop and aws map reducecourse
 

Plus de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Plus de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Dernier

UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 

Dernier (20)

UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 

Is Cloud a right Companion for Hadoop

  • 1. Is Cloud a right companion for Hadoop? Saravanan Prabhagaran & Chintan Bhatt
  • 2. Agenda Hadoop and cloud Primer Hadoop challenges Hadoop on cloud - Advantages and Challenges Hadoop on cloud offerings Typical use cases of Hadoop on cloud Considerations Hadoop deployment Conclusion 2
  • 4. Challenges with Hadoop 4 Infrastructure management Configuration and Tuning Capital expenditure Agility: Provisioning and Availability
  • 5. Hadoop on cloud Advantages 5 Hadoop Cloud Lower operations cost On-Demand Faster Provisioning Efficient resource utilization Distributed / Parallel Processing Elasticity Agility Pay as you go Higher Throughput
  • 6. Hadoop on Cloud Challenges  Data locality vs On-demand Cloud  Import / export parallelism, Interruption  Applications sensitive to latencies experience higher overheads  Higher overheads by VM than a bare metal 6
  • 7. Hadoop and Cloud based offerings 7 AWS MS Azure Rackspace Cloudera Hortonworks MapR Joyent EMR HDInsight GoGrid Rackspace Cloud Big Data platform Project Sahara Pivotal
  • 8. Project Sahara  Openstack component  Hadoop cluster provisioning  REST API  Integration with Hadoop management tools  Hadoop configuration templates 8 Use Cases  Faster cluster provisioning for DEV/QA  Ad-hoc or bursty analytical workloads  Resource utilization from Openstack IaaS
  • 9. Project Sahara Architecture 9 Horizon Sahara Keystone Swift Nova Glance User REST User Authentication Hadoop Image Create VM Hadoop Jobs
  • 10. Typical use cases of Hadoop on Cloud  On-Demand Analytics  Dev/QA or POC environment  Cluster required for executing Nightly, Weekly, Monthly jobs  Application deployed on Cloud and Data to be used is in cloud 10
  • 11. Considerations - Hadoop on Bare-metal or on cloud  Capex Vs Opex  Performance Vs price-performance  Data gravity  Regulatory requirements  Agility 11
  • 12. Hadoop on Cloud? 12 Performance ? Bare MetalYes Data Gravity Public Cloud/ Hosted Hadoop In-Premise Bare Metal Private Cloud In Cloud? In-premise? Control over Data In-Premise Bare Metal Private Cloud Strict control?
  • 13. Hadoop on Cloud? 13 POC/DEV/ QA Public Cloud/ Hosted Hadoop In-Premise Private Cloud CapEx Vs OpEx? In-Premise Bare Metal Private Cloud CapEx? Public Cloud/ Hosted Hadoop OpEx?

Notes de l'éditeur

  1. Hi Good afternoon everyone. My name is Chintan Bhatt and I have with me Saravanan. We both work as part of Syntel Big Data practice. Today we want to talk about two of the most popular technologies. Hadoop which is a large scale distributed processing infrastructure and cloud computing which is known for its agility, elasticity and economy. In our session today we intend to discuss about need and challenges of Hadoop on cloud.
  2. So, The agenda of the session is as follows: First we will start with basic 101 of Hadoop and Cloud Computing in which we will highlight the important features of both the technologies. Then we look at some of the challenges of Hadoop alone. After that we showcase the advantages of Hadoop on Cloud. Then we explore some of the popular offerings of Hadoop on cloud. And then we conclude with considerations when and where hadoop should be deployed. Now I would like to invite saravanan to start the session.
  3. Does the feature of Hadoop or cloud gets hampered due to integration? Parallelism is achieved through data locality, Cloud is for on-demand, between how good is that to import and export enormous amount of data just for the sake of data locality ? Amazon quoted that by using S3 as an input to MapReduce you lose the data locality optimization, which may be significant. 32 cores running 32 VM instances may produce very large overheads than 16 core to 16VM. Researches indicate that the applications that are more sensitive to latencies experience higher overheads under virtualized resources, and this overhead increases as more and more VMs are deployed per hardware node. Results show that the virtualization overhead increases with the number of VMs deployed on a hardware node. These characteristics will have a larger impact on systems having more CPU cores per node.
  4. ThanksSarvanan. As Saravanan has pointed out underlying features of Hadoop and cloud, how Hadoop can benefit from Cloud and what are the challenges. Lets try to look at some of the offerings of Hadoop on Cloud. We have the most popular Hadoop and cloud vendors and some of the offerings with partnership between them. One of the most famous and popular offering is from Amazon called Elastic MapReduce. It is a Hadoop based web-service to process data stored on Amazon cloud. It allows various applications like MR, Pig, Hive, Cascading etc. for processing the data stored in Amazon S3. Amazon also offers the service which uses MapR or Cloudera as an underlying Hadoop distribution. The other such similar offering from Microsoft is HDInsight which is Hadoop as a service on Microsoft Azure based on HDP. It uses its BLOB storage similar to the S3,wchich is just a storage service. The advantage is that it easily integrates with Microsoft based infrastructure and DOT NET applications. Rackspace offers Hadoop based offering in different flavors of hadoop through partnership with Hortonworks. They have managed hosting of optimized HDP clusters, Public cloud based Cloud Big Data platform, which allows user to deploy, test and query Hadoop without acquiring any data or signing any contract, Private cloud and Hybrid offering with Rackconnect. There are similar offerings by other cloud providers as well. The other project which is in open-source community on openstack ecosystem is Project Sahara. Which we will look at in some details
  5. Project Sahara which was previously known as Project Savanna is a joint effort by rackspace, mirantis and Hortonworks to provide users to quickly and easily provision and manage Hadoop cluster on Openstack platform It is a native openstack component which provides REST based API to provision Hadoop cluster by providing details of number of nodes, hadoop version etc. It also provides integration with various hadoop Management tools like apache ambari to manage the cluster. It also tries to make it easy to configure the cluster by creating cluster wide configuration templates for a particular hadoop distribution, which can be created once and deployed multiple times. Data operations allow users to be insulated from cluster creation tasks and work directly with data. I.e. users specify a job to run and data source, and Savanna takes care of the entire job and cluster lifecycle: bringing up the cluster, configuring the job and data source, running the job and shutting down the cluster. Savanna supports different types of jobs: MapReduce job, Hive query, Pig job, Oozie workflow. The data could be taken from various sources: Swift, remote HDFS, NoSQL and SQL databases. Some of the use-cases which were aimed by Sahara are Faster clutster provisioning for Dev/ QA or POC requirements. Ad-hoc or busrty work loads similar to EMR, which will insulate the user form cluster life-cycle. And ability to utilize the resources from Openstack and provide users of Openstack a large case data computation capabilities.
  6. This is a high level architecture of Sahara in Openstack eco-system. Here the user can either use Horizon user interface to create, configure and manage Hadoop Cluster, or user can directly use REST API interface to perform similar operation. The keystone component is used for authentication of user to perform operations on Openstack. Nova which is openstack’s computing fabric is used for creation of Hadoop virtual machines. The pre-configured Image with OS and Hadoop software is stored in Glance, which makes the node start-up quickly. And finally openstack storage service Swift can be used as a persisted storage for Hadoop Cluster, which can store the input/output of the Hadoop processing job. In order for hadoop to access Swift as a file system, jira Hadoop-8545 was created.
  7. After looking at the Cloud based offerings and Hadoop on Cloud advantages, lets look at some of the use cases which makes more sense in Hadoop on cloud. The first use case is of on-demand analytics, in which the Analytics is provided as a service, with the implementation of a particular use case, e.g. Customer Churn, recommendations, clusters etc. User is required to upload the data in cloud: Public Or Private. This is an ideal use case for Hadoop on cloud where the cluster can be provisioned based on the size of the data and computation intensity, executed and then once the result is returned cluster can be teared down. The other obvious use case is faster provisioning of the cluster for POC, Dev and QA environements which can be quickly released. Now most of the Hadoop jobs run NOT in near real time, but run at different frequency like nightly, weekly or monthly jobs. In such scenarios it does NOT makes sense to keep the cluster occupied for the time there is NO job running. In such case its fine to run the job on the data set, get the result and again tear down the cluster. Now because of the popularity of the cloud there are many applications are deployed on cloud which generates a lot of data. As these data is generated on the cloud brining processing closer to the data is more ideal as it will avoid data transfer for in-house deployment. For example some data like click stream data is an ideal example which are on cloud.
  8. To summarize the presentation, while deciding hadoop deployment mode, these are the following criterias which needs to be considered. Capex Vs Opex: This is also a generic cloud computing deployment criteria for any application, but is also equally applicable in Hadoop as well as Hadoop requires upfront investment of its infrastructure. The other rather important point is Performance Vs Price to performance. There are some mission critical applications which has to strictly follow SLA, will have performance as its highest priority while for some applications/organizations price to performance has more priority which will try to look at the cost in achieving the performance. Data gravity is a consideration on where data is generated and having the data processing eco-system close to where the data is generated. Which means considering if the data is generated in-premise from internal applications or generated in the cloud. In some of the organizations there are some regulatory requirements which may NOT allow sensitive data generated to leave the organizations boundary. And final one is the agility, for some application like Proof of Concept, whats required to quickly setup the environment and develop it and validate the concept.
  9. Based on the criteria, we can see based on our experience what approach can be taken for the deployment. As mentioned before if the performance is a higher priority then having hadoop exclusive access of the physical hardware and configured and tuned for a specific application, bare-metal deployment is preferable. And that’s what we have been helping our clients in production deployment. Considering the data where its generated if the data is generated in cloud, its best to have your data processing in cloud, it can be a Public cloud or a hosted Hadoop services. For example we did a solution for one of our retail customer where we needed to scrape the prices of the competitors and from their own websites and give them sense of the optimized price for their product and other analytics on that data. Here it made more sense to deploy that solution on cloud rather than in-house which will involve data transfer from internet to in-premise. For data generated in-premise and also where the data cannot leave the premise, solution of either on bare-metal or private cloud within organization can be a suitable solution.
  10. Application development with shorter life cycle like POC Or quickly provision and release resources for Dev and QA are also ideal candidate for cloud which gives agility. And the primary factor like Capex and Opex can also rightly influence the choice of up-front investment in Infrastructure and leveraging it or eliminating the capex in cloud and paying as per use.