Is Cloud a Right Companion for Hadoop?
Saravanan Prabhagaran & Chintan Bhatt
Agenda
Hadoop and cloud Primer
Hadoop challenges
Hadoop on cloud - Advantages and Challenges
Hadoop on cloud offerings
Typical use cases of Hadoop on cloud
Considerations for Hadoop deployment
Conclusion
Primer
[Diagram keywords: diagonal scalability, public, elastic]
Challenges with Hadoop
Infrastructure management
Configuration and Tuning
Capital expenditure
Agility: Provisioning and Availability
Hadoop on cloud Advantages
[Diagram: Hadoop combined with Cloud]
- Lower operations cost
- On-demand
- Faster provisioning
- Efficient resource utilization
- Distributed / parallel processing
- Elasticity
- Agility
- Pay as you go
- Higher throughput
Hadoop on Cloud Challenges
- Data locality vs. on-demand cloud
- Import / export parallelism, interruption
- Latency-sensitive applications experience higher overheads
- VMs carry higher overheads than bare metal
Hadoop and Cloud based offerings
- AWS
- MS Azure
- Rackspace
- Cloudera
- Hortonworks
- MapR
- Joyent
- EMR
- HDInsight
- GoGrid
- Rackspace Cloud Big Data platform
- Project Sahara
- Pivotal
Project Sahara
- OpenStack component
- Hadoop cluster provisioning
- REST API
- Integration with Hadoop management tools
- Hadoop configuration templates
Use Cases
- Faster cluster provisioning for DEV/QA
- Ad-hoc or bursty analytical workloads
- Utilization of unused compute capacity from the OpenStack IaaS
Project Sahara Architecture
[Architecture diagram. Components: Horizon, Sahara, Keystone, Swift, Nova, Glance. Labels: User, REST, User Authentication, Hadoop Image, Create VM, Hadoop Jobs]
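The REST-driven provisioning that the Sahara slides mention can be sketched as a request builder. This is only an illustration under assumptions: the `/v1.1/{project_id}/clusters` path and the body field names are modeled on the Sahara v1.1 REST API as commonly documented and should be checked against the deployed release; the endpoint, project ID, template ID, image ID, and keypair name below are hypothetical placeholders, and no request is actually sent.

```python
import json


def cluster_url(endpoint, project_id):
    """Build the cluster-creation URL (path layout is an assumption)."""
    return f"{endpoint.rstrip('/')}/v1.1/{project_id}/clusters"


def build_cluster_request(name, plugin_name, hadoop_version,
                          cluster_template_id, image_id, keypair_name):
    """Return the JSON body for a cluster-create call.

    Field names are assumptions modeled on the Sahara v1.1 API.
    """
    return {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": hadoop_version,
        "cluster_template_id": cluster_template_id,  # configuration template
        "default_image_id": image_id,                # Hadoop image in Glance
        "user_keypair_id": keypair_name,             # SSH keypair for the VMs
    }


# Hypothetical values, for illustration only.
url = cluster_url("http://sahara.example.com:8386/", "demo-project")
body = build_cluster_request(
    name="dev-qa-cluster",
    plugin_name="vanilla",
    hadoop_version="2.7.1",
    cluster_template_id="TEMPLATE-ID",
    image_id="IMAGE-ID",
    keypair_name="dev-keypair",
)
print(url)
print(json.dumps(body, indent=2))
```

A real client would POST this body to the URL with a Keystone token in the request headers; that step is left out so the sketch stays runnable offline.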
Typical use cases of Hadoop on Cloud
- On-demand analytics
- Dev/QA or POC environments
- Clusters needed only to run nightly, weekly, or monthly jobs
- Applications deployed in the cloud whose data is already in the cloud
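To see why periodic jobs suit an on-demand cluster, one can estimate how many hours per month the cluster is actually busy. The sketch below is a rough illustration; the job durations and the 30-day, 4-week month are assumed inputs, not figures from the slides.

```python
def monthly_busy_hours(nightly_hours, weekly_hours, monthly_hours,
                       days_per_month=30, weeks_per_month=4):
    """Hours per month the cluster spends running scheduled jobs."""
    return (nightly_hours * days_per_month
            + weekly_hours * weeks_per_month
            + monthly_hours)


def utilization(busy_hours, days_per_month=30):
    """Fraction of the month an always-on cluster would be doing work."""
    return busy_hours / (days_per_month * 24)


# Hypothetical schedule: 2 h nightly, 4 h weekly, 8 h monthly.
busy = monthly_busy_hours(2, 4, 8)
print(busy, round(utilization(busy), 3))
```

With a schedule like this, an always-on cluster would sit idle most of the month, which is the case for provisioning the cluster only while the jobs run.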
Considerations - Hadoop on Bare-metal or on cloud
- CapEx vs. OpEx
- Performance vs. price-performance
- Data gravity
- Regulatory requirements
- Agility
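The CapEx vs. OpEx consideration can be made concrete with a simple break-even estimate. This is a minimal sketch under stated assumptions: all cost figures are hypothetical inputs, and the model ignores depreciation, discounts, and data transfer charges.

```python
import math


def cloud_monthly_cost(nodes, hourly_rate, hours_per_month):
    """Pay-as-you-go cost for a cluster that runs hours_per_month."""
    return nodes * hourly_rate * hours_per_month


def break_even_months(capex, onprem_monthly_opex, cloud_monthly):
    """Months until an up-front purchase pays for itself.

    Returns None when the cloud option does not cost more per month,
    in which case the purchase never pays back under this model.
    """
    monthly_saving = cloud_monthly - onprem_monthly_opex
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)


# Hypothetical numbers: 10 nodes at 0.5 per node-hour, running 24x7.
always_on = cloud_monthly_cost(10, 0.5, 720)
print(always_on)
print(break_even_months(120000, 2000, 7000))
```

The shorter the break-even period, the stronger the case for owning hardware; a cluster that runs only a few hours a month pushes the break-even out or removes it entirely.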
Hadoop on Cloud?
- Performance? Yes: bare metal
- Data gravity: data in the cloud? Public cloud / hosted Hadoop. Data on-premises? On-premises bare metal or private cloud
- Control over data: strict control? On-premises bare metal or private cloud
- POC/DEV/QA: public cloud / hosted Hadoop, or on-premises private cloud
- CapEx vs. OpEx: CapEx? On-premises bare metal or private cloud. OpEx? Public cloud / hosted Hadoop
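The decision diagrams above can be read as a set of per-criterion suggestions. The sketch below encodes them that way; the function and parameter names are made up for illustration, and because the slides do not say how to weigh one criterion against another, the function only collects suggestions and does not pick a winner.

```python
ON_PREM = "on-premises bare metal or private cloud"
PUBLIC = "public cloud / hosted Hadoop"


def suggest_deployments(performance_critical=False, data_in_cloud=None,
                        strict_data_control=False, poc_dev_qa=False,
                        spending_model=None):
    """Return (criterion, suggestion) pairs for the criteria that apply."""
    suggestions = []
    if performance_critical:
        suggestions.append(("performance", "bare metal"))
    if data_in_cloud is True:
        suggestions.append(("data gravity", PUBLIC))
    elif data_in_cloud is False:
        suggestions.append(("data gravity", ON_PREM))
    if strict_data_control:
        suggestions.append(("control over data", ON_PREM))
    if poc_dev_qa:
        suggestions.append(
            ("POC/DEV/QA", PUBLIC + " or on-premises private cloud"))
    if spending_model == "capex":
        suggestions.append(("CapEx vs. OpEx", ON_PREM))
    elif spending_model == "opex":
        suggestions.append(("CapEx vs. OpEx", PUBLIC))
    return suggestions


# Example: data already lives in the cloud and the budget is OpEx-driven.
for criterion, suggestion in suggest_deployments(data_in_cloud=True,
                                                 spending_model="opex"):
    print(f"{criterion}: {suggestion}")
```

When the suggestions disagree, for example strict data control combined with an OpEx budget, the conflict is left visible to the reader instead of being resolved in code.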
Questions?
We Are Hiring!
Thank You!

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Dernier (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Is Cloud a right Companion for Hadoop

  • 1. Is Cloud a right companion for Hadoop? Saravanan Prabhagaran & Chintan Bhatt
  • 2. Agenda: Hadoop and cloud primer; Hadoop challenges; Hadoop on cloud, advantages and challenges; Hadoop on cloud offerings; Typical use cases of Hadoop on cloud; Considerations for Hadoop deployment; Conclusion
  • 4. Challenges with Hadoop: Infrastructure management; Configuration and tuning; Capital expenditure; Agility (provisioning and availability)
  • 5. Hadoop on cloud advantages (diagram labels): Hadoop; Cloud; Lower operations cost; On-demand; Faster provisioning; Efficient resource utilization; Distributed / parallel processing; Elasticity; Agility; Pay as you go; Higher throughput
  • 6. Hadoop on cloud challenges: Data locality vs on-demand cloud; Import / export parallelism, interruption; Applications sensitive to latencies experience higher overheads; Higher overheads on a VM than on bare metal
  • 7. Hadoop and cloud based offerings (diagram labels): AWS; MS Azure; Rackspace; Cloudera; Hortonworks; MapR; Joyent; EMR; HDInsight; GoGrid; Rackspace Cloud Big Data platform; Project Sahara; Pivotal
  • 8. Project Sahara: OpenStack component; Hadoop cluster provisioning; REST API; Integration with Hadoop management tools; Hadoop configuration templates. Use cases: Faster cluster provisioning for Dev/QA; Ad-hoc or bursty analytical workloads; Resource utilization from OpenStack IaaS
  • 9. Project Sahara architecture (diagram labels): Horizon; Sahara; Keystone; Swift; Nova; Glance; User; REST; User authentication; Hadoop image; Create VM; Hadoop jobs
  • 10. Typical use cases of Hadoop on cloud: On-demand analytics; Dev/QA or POC environment; Cluster required for executing nightly, weekly, monthly jobs; Application deployed on cloud and data to be used is in cloud
  • 11. Considerations, Hadoop on bare metal or on cloud: Capex vs Opex; Performance vs price-performance; Data gravity; Regulatory requirements; Agility
  • 12. Hadoop on cloud? Performance? Yes: Bare Metal. Data gravity: In cloud? Public Cloud / Hosted Hadoop. In-premise? In-Premise Bare Metal / Private Cloud. Control over data: Strict control? In-Premise Bare Metal / Private Cloud.
  • 13. Hadoop on cloud? POC/Dev/QA: Public Cloud / Hosted Hadoop, or In-Premise Private Cloud. Capex vs Opex: Capex? In-Premise Bare Metal / Private Cloud. Opex? Public Cloud / Hosted Hadoop.

Editor's notes

  1. Hi, good afternoon everyone. My name is Chintan Bhatt and I have with me Saravanan. We both work in Syntel's Big Data practice. Today we want to talk about two of the most popular technologies: Hadoop, which is a large-scale distributed processing infrastructure, and cloud computing, which is known for its agility, elasticity and economy. In our session today we intend to discuss the need for and the challenges of Hadoop on cloud.
  2. So, the agenda of the session is as follows. First we will start with a basic 101 of Hadoop and cloud computing, in which we will highlight the important features of both technologies. Then we look at some of the challenges of Hadoop alone. After that we showcase the advantages of Hadoop on cloud. Then we explore some of the popular offerings of Hadoop on cloud. And then we conclude with considerations on when and where Hadoop should be deployed. Now I would like to invite Saravanan to start the session.
  3. Do the features of Hadoop or cloud get hampered by the integration? Hadoop achieves parallelism through data locality, while cloud is about on-demand resources. So how good is it to import and export enormous amounts of data just for the sake of data locality? Amazon has noted that by using S3 as an input to MapReduce you lose the data locality optimization, which may be significant. 32 cores running 32 VM instances may produce much larger overheads than 16 cores running 16 VMs. Research indicates that applications that are more sensitive to latency experience higher overheads on virtualized resources, and that this overhead increases as more VMs are deployed per hardware node. These characteristics will have a larger impact on systems with more CPU cores per node.
  4. Thanks, Saravanan. Saravanan has pointed out the underlying features of Hadoop and cloud, how Hadoop can benefit from cloud, and what the challenges are. Let's look at some of the Hadoop on cloud offerings. We have the most popular Hadoop and cloud vendors, and some offerings that come from partnerships between them. One of the most popular offerings is Amazon's Elastic MapReduce (EMR). It is a Hadoop-based web service for processing data stored on the Amazon cloud. It supports various applications such as MapReduce, Pig, Hive and Cascading for processing data stored in Amazon S3. Amazon also offers the service with MapR or Cloudera as the underlying Hadoop distribution. A similar offering from Microsoft is HDInsight, which is Hadoop as a service on Microsoft Azure, based on HDP. It uses Azure Blob storage, which, like S3, is purely a storage service. The advantage is that it integrates easily with Microsoft-based infrastructure and .NET applications. Rackspace offers Hadoop in different flavors through its partnership with Hortonworks: managed hosting of optimized HDP clusters; the public-cloud-based Cloud Big Data Platform, which allows users to deploy, test and query Hadoop without acquiring any data or signing any contract; and private cloud and hybrid offerings with RackConnect. There are similar offerings from other cloud providers as well. The other project, from the open-source community in the OpenStack ecosystem, is Project Sahara, which we will look at in some detail.
  5. Project Sahara, previously known as Project Savanna, is a joint effort by Rackspace, Mirantis and Hortonworks to let users quickly and easily provision and manage Hadoop clusters on the OpenStack platform. It is a native OpenStack component that provides a REST-based API to provision a Hadoop cluster from details such as the number of nodes and the Hadoop version. It also integrates with Hadoop management tools such as Apache Ambari to manage the cluster. It makes cluster configuration easier through cluster-wide configuration templates for a particular Hadoop distribution, which can be created once and deployed multiple times. Data operations insulate users from cluster creation tasks and let them work directly with data. That is, users specify a job to run and a data source, and Sahara takes care of the entire job and cluster lifecycle: bringing up the cluster, configuring the job and data source, running the job and shutting down the cluster. Sahara supports different types of jobs: MapReduce jobs, Hive queries, Pig jobs and Oozie workflows. The data can be taken from various sources: Swift, remote HDFS, and NoSQL and SQL databases. Some of the use cases Sahara targets are faster cluster provisioning for Dev/QA or POC requirements; ad-hoc or bursty workloads, similar to EMR, which insulate the user from the cluster lifecycle; and the ability to utilize resources from OpenStack and give OpenStack users large-scale data computation capabilities.
  6. This is a high-level architecture of Sahara in the OpenStack ecosystem. The user can either use the Horizon user interface to create, configure and manage a Hadoop cluster, or use the REST API directly to perform the same operations. The Keystone component authenticates the user for operations on OpenStack. Nova, OpenStack's compute fabric, creates the Hadoop virtual machines. A pre-configured image with the OS and Hadoop software is stored in Glance, which lets nodes start up quickly. Finally, Swift, the OpenStack storage service, can be used as persistent storage for the Hadoop cluster, holding the input and output of Hadoop processing jobs. JIRA HADOOP-8545 was created so that Hadoop can access Swift as a file system.
  7. Having looked at the cloud-based offerings and the advantages of Hadoop on cloud, let's look at some of the use cases that make the most sense for Hadoop on cloud. The first is on-demand analytics, in which analytics is provided as a service with the implementation of a particular use case, e.g. customer churn, recommendations or clustering. The user is required to upload the data to the cloud, public or private. This is an ideal use case for Hadoop on cloud: the cluster can be provisioned based on the size of the data and the computation intensity, the job executed, and the cluster torn down once the result is returned. The other obvious use case is faster provisioning of clusters for POC, Dev and QA environments, which can be released just as quickly. Most Hadoop jobs do NOT run in near real time but at different frequencies, such as nightly, weekly or monthly. In such scenarios it does NOT make sense to keep the cluster occupied while NO job is running; it is fine to run the job on the data set, get the result and tear the cluster down again. Finally, because of the popularity of the cloud, many applications are deployed there and generate a lot of data. As this data is generated in the cloud, bringing the processing closer to the data is preferable, since it avoids transferring the data to an in-house deployment. Clickstream data is a typical example of data that already lives in the cloud.
  8. To summarize the presentation, these are the criteria to consider when deciding on a Hadoop deployment mode. Capex vs Opex: this is a generic deployment criterion for any application on cloud, and it applies equally to Hadoop, because Hadoop requires an upfront investment in infrastructure. The other, rather important, point is performance vs price-performance. Some mission-critical applications have to follow a strict SLA and will have performance as their highest priority, while for other applications or organizations price-performance matters more, and they will look at the cost of achieving the performance. Data gravity is about where the data is generated and keeping the data processing ecosystem close to it, which means considering whether the data is generated in-premise by internal applications or in the cloud. Some organizations have regulatory requirements that may NOT allow sensitive data to leave the organization's boundary. The final one is agility: for some applications, such as a proof of concept, what is required is to quickly set up the environment, develop the solution and validate the concept.
  9. Based on these criteria and on our experience, we can see which approach to take for the deployment. As mentioned before, if performance is the higher priority, a bare-metal deployment is preferable, with Hadoop having exclusive access to the physical hardware and being configured and tuned for a specific application. That is what we have been helping our clients with in production deployments. Considering where the data is generated: if it is generated in the cloud, it is best to have your data processing in the cloud, either a public cloud or a hosted Hadoop service. For example, we built a solution for one of our retail customers where we needed to scrape prices from competitors' websites and from the customer's own, and give them a sense of the optimized price for their products along with other analytics on that data. Here it made more sense to deploy the solution on cloud rather than in-house, which would have involved transferring data from the internet to in-premise systems. For data generated in-premise, and where the data cannot leave the premises, either bare metal or a private cloud within the organization can be a suitable solution.
  10. Application development with a shorter life cycle, such as a POC, or the need to quickly provision and release resources for Dev and QA, are also ideal candidates for cloud, which gives agility. And a primary factor like Capex vs Opex can rightly influence the choice between making an up-front investment in infrastructure and leveraging it, or eliminating the capex by moving to cloud and paying per use.
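
The template idea from note 5 (define a cluster-wide configuration once, deploy it multiple times) can be sketched in a few lines of Python. This is only an illustration: the class, function and field names are hypothetical and do not reflect Sahara's actual REST schema, and the version string and configuration value are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster-wide template (not Sahara's real schema)."""
    name: str
    hadoop_version: str
    worker_count: int
    configs: dict = field(default_factory=dict)


def build_cluster_request(template, cluster_name, worker_count=None):
    """Build a provisioning request (a plain dict) from a reusable template."""
    workers = template.worker_count if worker_count is None else worker_count
    if workers < 1:
        raise ValueError("a cluster needs at least one worker node")
    return {
        "name": cluster_name,
        "hadoop_version": template.hadoop_version,
        "worker_count": workers,
        # Copy so that one cluster's overrides never leak into the template.
        "configs": dict(template.configs),
    }


# One template, created once (placeholder values) ...
base = ClusterTemplate("dev-small", "2.4.1", 3, {"dfs.replication": 2})

# ... and deployed multiple times, optionally with a different size.
dev = build_cluster_request(base, "dev-cluster")
qa = build_cluster_request(base, "qa-cluster", worker_count=6)
```

In a real deployment the resulting request would be sent to the provisioning service's REST endpoint; that call is deliberately left out here because its exact shape is not given in the talk.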
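
Note 7 argues that it does not make sense to keep a cluster occupied while no job is running. A rough way to see the effect is to compare the compute cost of an always-on cluster with a cluster that is provisioned per run and torn down afterwards. The sketch below is a simplification under stated assumptions: all prices, node counts and durations are made-up example values, and storage and data-transfer charges are ignored.

```python
def compute_cost(nodes, hourly_rate_per_node, hours):
    """Compute-only cost of running `nodes` machines for `hours` hours."""
    return nodes * hourly_rate_per_node * hours


def always_on_vs_transient(nodes, hourly_rate_per_node, job_hours,
                           runs_per_month, provisioning_hours=0.25,
                           hours_in_month=730):
    """Return (always_on_cost, transient_cost) for one month.

    The transient cluster is billed only while it is being provisioned
    and while the job runs; the always-on cluster is billed all month.
    """
    always_on = compute_cost(nodes, hourly_rate_per_node, hours_in_month)
    billed_hours = runs_per_month * (job_hours + provisioning_hours)
    transient = compute_cost(nodes, hourly_rate_per_node, billed_hours)
    return always_on, transient


# Example (assumed numbers): 10 nodes at 0.50 per node-hour,
# a 2-hour job run every night for 30 nights.
always_on, transient = always_on_vs_transient(
    nodes=10, hourly_rate_per_node=0.50, job_hours=2, runs_per_month=30
)
print(always_on, transient)  # 3650.0 337.5
```

The gap narrows as jobs run longer or more often, which is where the capex vs opex consideration from note 8 comes back in.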
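
The recommendations in notes 9 and 10 (slides 12 and 13) can be read as a small decision procedure. The sketch below encodes one possible reading in Python. The order in which the criteria are checked is an assumption made for illustration, since the slides present them side by side rather than as a strict sequence, and all names are hypothetical.

```python
from dataclasses import dataclass

BARE_METAL = "in-premise bare metal"
PRIVATE_CLOUD = "in-premise private cloud"
PUBLIC_CLOUD = "public cloud / hosted Hadoop"


@dataclass
class Workload:
    performance_critical: bool = False     # strict SLA, performance first
    strict_data_control: bool = False      # data must not leave the premises
    data_generated_in_cloud: bool = False  # data gravity
    short_lived: bool = False              # POC / Dev / QA
    prefers_opex: bool = False             # pay per use over up-front spend


def deployment_options(w):
    """Suggest deployment options for a workload (assumed priority order)."""
    if w.performance_critical:
        return [BARE_METAL]
    if w.strict_data_control:
        return [BARE_METAL, PRIVATE_CLOUD]
    if w.data_generated_in_cloud:
        return [PUBLIC_CLOUD]
    if w.short_lived:
        return [PUBLIC_CLOUD, PRIVATE_CLOUD]
    if w.prefers_opex:
        return [PUBLIC_CLOUD]
    # Data generated in-premise and capex is acceptable.
    return [BARE_METAL, PRIVATE_CLOUD]


# The price-scraping example from note 9: the data originates on the internet.
print(deployment_options(Workload(data_generated_in_cloud=True)))
```

A team with different priorities, for example regulatory control ahead of performance, would reorder the checks accordingly.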