SlideShare une entreprise Scribd logo
1  sur  31
Télécharger pour lire hors ligne
1 
November 13, 2014| Las Vegas, NV 
AmandeepKhurana
2 
About me 
•Principal Solutions Architect @ Cloudera 
•Engineer @ AWS 
•Co-author, HBasein Action
3 
Agenda 
•Motivation 
•Deployment paradigms 
•Storage 
•Networking 
•Instances 
•Security 
•High availability, backups, disaster recovery 
•Planning your cluster 
•Available resources
4 
•Parallel trends 
–Commoditizing infrastructure 
–Commoditizing data 
•Worlds converging… but with considerations 
–Cost 
–Flexibility 
–Ease of use 
–Operations 
–Location 
–Performance 
–Security 
Why you should care
5 
Intersection
6 
The devil…
7 
Primary consideration –Storage (source of truth) 
Amazon S3 
•Ad-hoc batch workloads 
•SLA batch workloads 
HDFS 
•Ad-hoc batch workloads 
•SLA batch workloads 
•Ad-hoc interactive workloads 
•SLA interactive workloads 
Predominantly transient clusters 
Long running clusters
9 
Deployment models 
Transient clusters 
Long-running clusters 
Primary storage substrate 
S3 or remote HDFS 
HDFS 
Backups 
S3 
S3 or second HDFS cluster 
Workloads 
•Batch(MapReduce, Spark) 
•Interactive is an anti- pattern 
•Batch(MapReduce, Spark) 
•Interactive (HBase, Solr, Impala) 
Role of cluster 
Compute only 
Compute and storage
10 
Storage 
Access pattern, performance
11 
Storage considerations 
Hadoop paradigm: 
Bring compute to storage 
Cloud paradigm: 
Everything as a service
12 
•Instance store 
–Local storage attached to instance 
–Temporary 
–Instance dependent (not configurable) 
•Amazon Elastic Block Store (EBS) -Block-level storage volume 
–External to instance 
–Lifecycle independent of instance 
•Amazon Simple Storage Service (S3) –BLOB store 
–External data store 
–Simple API –Get, Put, Delete 
–Instance dependent bandwidth 
Storage choices in AWS
13 
•In MapReducejobs by using s3a URI 
•Distcp 
–hadoopdistcp<options> hdfs:///foo/bar s3a:///mybucket/foo/ 
•HBasesnapshot export 
–hbaseorg.apache.hadoop.hbase.snapshot.ExportSnapshot 
<options>-Dmapred.task.timeout=15000000 
-snapshot <name> -mappers <nmappers> -copy-to <dir> 
Interacting with S3
14 
•Multiple implementations in the Hadoopproject 
–S3 (block based) 
–S3N (file based, using jets3t) 
–S3A (file based, using AWS SDK) Latest stuff 
•Bandwidth to S3 depends on instance type 
–<200 MB/s per instance on some of the larger ones 
•Process 
Interacting with S3 –how it works
15 
•Tune 
•Parallelize 
•Writing to S3 
–Multi-part upload for > 5 GB files 
–Pick multiple drives for local staging (HADOOP-10610) 
–Up the task timeouts when writing large files 
•Reading from S3 
–Range reads within map tasks via multiple threads 
•Large objects are better (less load on metadata lookups) 
•Randomize file names (metadata lookups are spread out) 
Optimizing S3 interaction
16 
•Ephemeral drives on Amazon EC2 instances 
•Persistent for as long as the instances are alive (no pausing) 
•Use S3 for backups 
•No EBS 
–Over the network 
–Designed for random I/O 
HDFS in AWS
17 
Networking 
Performance, access, and security
18 
Topologies – Deploy in Virtual Private Cloud (VPC) 
Cluster in public subnet Cluster in private subnet 
AWS VPC 
Corporate 
network 
Server Server Server Server 
VPN or 
Direct 
Connect 
EC2 
instance 
EC2 
instance 
EC2 
instance 
EC2 
instance 
Cloudera 
Enterprise 
Cluster 
in a public 
subnet 
Internet, Other AWS services 
EC2 
instance 
EC2 
instance 
Edge 
Nodes 
EC2 
instance 
EC2 
instance 
Edge 
Nodes 
AWS VPC 
Corporate 
network 
Server Server Server Server 
VPN or 
Direct 
Connect 
EC2 
instance 
EC2 
instance 
EC2 
instance 
EC2 
instance 
Cloudera 
Enterprise 
Cluster 
in a private 
subnet 
Internet, Other AWS services 
NAT EC2 
instance 
Public 
subnet 
EC2 
instance 
EC2 
instance 
Edge 
Nodes 
EC2 
instance 
EC2 
instance 
Edge 
Nodes
19 
•Instance <-> Instance link 
–10G 
–10G + SR-IOV (HVM) 
–!10G 
•Instance <-> S3 (equal to instance to public internet) 
•Placement groups 
–Performance maydip outside of PGs 
•Clusters within a single Availability Zone 
Performance considerations
20 
EC2 instances 
Storage, cost, performance, availability, and fault tolerance
21 
Picking the right instance 
Transient clusters 
•Primary considerations: 
–Bandwidth 
–CPU 
–Memory 
•Secondary considerations 
–Availability and fault tolerance 
–Local storage density 
•Typical choices 
–C3 family, M3 family, M1 family 
–Anti pattern to use storage dense 
Long running clusters 
•Primary considerations 
–Local storage is key 
–CPU 
–Memory 
–Availability and fault tolerance 
–Bandwidth 
•Typical choices 
–hs1.8xlarge, cc2.8xlarge, i2.8xlarge
22 
Amazon Machine Image (AMI) 
•2 kinds –PV and HVM. 
•Pick a dependable base AMI 
•Things to look out for 
–Kernel patches 
–Third-party software and library versions 
•Increase root volume size
23 
Security
24 
•Amazon Virtual Private Cloud (VPC) options 
–Private subnet 
•All traffic outside of VPC via NAT 
–Public subnet 
•Network ACLS at subnet level 
•Security groups 
•EDH guidelines for Kerberos, Active Directory, and Encryption 
•S3 provides server-side encryption 
Security considerations
25 
High Availability, Backups, Disaster Recovery
26 
•High Availability available in the Hadoopstack 
–Run NamenodeHA with 5 Journal Nodes 
–Run 5 Zookeepers 
–Run multiple HBasemasters 
•Backups and disaster recovery (based on RPO/RTO requirements) 
–Hot backup: Active-Active clusters 
–Warm backup: S3 
•Hadoop level snapshots –HDFS, HBase 
–Cold backup: Amazon Glacier 
HA, Backups, DR
27 
Planning your cluster
28 
Capacity, performance, access patterns 
•Bad news –no simple answer. You have to think through it. 
•Good news –mistakes are cheap. Learn from ours to make them even cheaper. 
•Start with workload type (ad-hoc / SLA, batch / interactive) 
•How much % of the day will you use your cluster? 
•How much data do you want to store? 
•What are the performance requirements? 
•How are you ingesting data? What does the workflow look like?
29 
•Just released –Cloudera Director! 
•AWS Quickstart 
•Available resources 
–Reference Architecture (just refreshed) 
–Best practices blog 
To make life easier
Thank you 
We are hiring!
31 
•Smarter with topology 
•Amazon EBS as storage for HDFS 
•Deeper S3 integration 
•Amazon Kinesis integration 
•Workflow management 
Opportunities
http://bit.ly/awsevals

Contenu connexe

Tendances

BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesAmazon Web Services
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
 
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudJaipaul Agonus
 
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivData science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivAmazon Web Services
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...Amazon Web Services
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAmazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupAutoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupRafal Kwasny
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Amazon Web Services
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)Amazon Web Services
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Amazon Web Services
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceAmazon Web Services
 
Amazon Web Services - Relational Database Service Meetup
Amazon Web Services - Relational Database Service MeetupAmazon Web Services - Relational Database Service Meetup
Amazon Web Services - Relational Database Service Meetupcyrilkhairallah
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Amazon Web Services
 
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum EfficiencyDeploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum EfficiencyAmazon Web Services
 

Tendances (20)

BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA 302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
Masterclass Live: Amazon EMR
Masterclass Live: Amazon EMRMasterclass Live: Amazon EMR
Masterclass Live: Amazon EMR
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloudHive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud
 
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivData science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
 
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
AWS re:Invent 2016: Deep Dive: Amazon EMR Best Practices & Design Patterns (B...
 
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best PracticesAWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
AWS Webcast - Amazon Elastic Map Reduce Deep Dive and Best Practices
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupAutoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)Masterclass Webinar: Amazon Elastic MapReduce (EMR)
Masterclass Webinar: Amazon Elastic MapReduce (EMR)
 
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
AWS re:Invent 2016: How to Scale and Operate Elasticsearch on AWS (DEV307)
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Introduction to Amazon Relational Database Service
Introduction to Amazon Relational Database ServiceIntroduction to Amazon Relational Database Service
Introduction to Amazon Relational Database Service
 
Amazon Web Services - Relational Database Service Meetup
Amazon Web Services - Relational Database Service MeetupAmazon Web Services - Relational Database Service Meetup
Amazon Web Services - Relational Database Service Meetup
 
Amazon RDS Deep Dive
Amazon RDS Deep DiveAmazon RDS Deep Dive
Amazon RDS Deep Dive
 
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
Advanced Data Migration Techniques for Amazon RDS (DAT308) | AWS re:Invent 2013
 
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum EfficiencyDeploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
Deploying a Disaster Recovery Site on AWS: Minimal Cost with Maximum Efficiency
 

En vedette

Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationmattlieber
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop MapR Technologies
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2IMC Institute
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupAndrei Savu
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)IMC Institute
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지NAVER D2
 
Informatica Big Data Edition - Profinit - Jan Ulrych
Informatica Big Data Edition - Profinit - Jan UlrychInformatica Big Data Edition - Profinit - Jan Ulrych
Informatica Big Data Edition - Profinit - Jan UlrychProfinit
 
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa영진 박
 
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data HubCloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data HubCloudera, Inc.
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...Amazon Web Services
 
Deploy, Scale and Manage your Microsoft Investments with AWS
Deploy, Scale and Manage your Microsoft Investments with AWSDeploy, Scale and Manage your Microsoft Investments with AWS
Deploy, Scale and Manage your Microsoft Investments with AWSAmazon Web Services
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersAmazon Web Services
 
(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014
(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014
(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014Amazon Web Services
 
Running your First Application on AWS
Running your First Application on AWSRunning your First Application on AWS
Running your First Application on AWSAmazon Web Services
 
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR Hadoop
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR HadoopCrunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR Hadoop
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR HadoopAdrian Cockcroft
 

En vedette (20)

Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)Big Data Hadoop Local and Public Cloud (Amazon EMR)
Big Data Hadoop Local and Public Cloud (Amazon EMR)
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지[G6]hadoop이중화왜하는거지
[G6]hadoop이중화왜하는거지
 
Informatica Big Data Edition - Profinit - Jan Ulrych
Informatica Big Data Edition - Profinit - Jan UlrychInformatica Big Data Edition - Profinit - Jan Ulrych
Informatica Big Data Edition - Profinit - Jan Ulrych
 
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
2012.04.11 미래사회와 빅 데이터(big data) 기술 nipa
 
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data HubCloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
(ADV303) MediaMath’s Data Revolution with Amazon Kinesis and Amazon EMR | AWS...
 
Deploy, Scale and Manage your Microsoft Investments with AWS
Deploy, Scale and Manage your Microsoft Investments with AWSDeploy, Scale and Manage your Microsoft Investments with AWS
Deploy, Scale and Manage your Microsoft Investments with AWS
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users
 
(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014
(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014
(SEC312) Taking a DevOps Approach to Security | AWS re:Invent 2014
 
Running your First Application on AWS
Running your First Application on AWSRunning your First Application on AWS
Running your First Application on AWS
 
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR Hadoop
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR HadoopCrunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR Hadoop
Crunch Your Data in the Cloud with Elastic Map Reduce - Amazon EMR Hadoop
 

Similaire à (BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS re:Invent 2014

BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRAmazon Web Services
 
AWS re:Invent 2013 Recap
AWS re:Invent 2013 RecapAWS re:Invent 2013 Recap
AWS re:Invent 2013 RecapBarry Jones
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
MongoDB and AWS: Integrations
MongoDB and AWS: IntegrationsMongoDB and AWS: Integrations
MongoDB and AWS: IntegrationsMongoDB
 
Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1Sandeep Kunkunuru
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Amazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar SeriesAmazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar SeriesAmazon Web Services
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemInSemble
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonInfinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonHentsū
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical dataOleksandr Semenov
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceJoydeep Sen Sarma
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloudAvinash Ramineni
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptxbetalab
 
What are clouds made from
What are clouds made fromWhat are clouds made from
What are clouds made fromJohn Garbutt
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoBig Data Joe™ Rossi
 

Similaire à (BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS re:Invent 2014 (20)

BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMRBDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
BDA302 Deep Dive on Migrating Big Data Workloads to Amazon EMR
 
AWS re:Invent 2013 Recap
AWS re:Invent 2013 RecapAWS re:Invent 2013 Recap
AWS re:Invent 2013 Recap
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
 
MongoDB and AWS: Integrations
MongoDB and AWS: IntegrationsMongoDB and AWS: Integrations
MongoDB and AWS: Integrations
 
Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
Amazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar SeriesAmazon Aurora New Features - September 2016 Webinar Series
Amazon Aurora New Features - September 2016 Webinar Series
 
Introduction To Hadoop Ecosystem
Introduction To Hadoop EcosystemIntroduction To Hadoop Ecosystem
Introduction To Hadoop Ecosystem
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - LondonInfinitely Scalable Clusters - Grid Computing on Public Cloud - London
Infinitely Scalable Clusters - Grid Computing on Public Cloud - London
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
 
Effectively deploying hadoop to the cloud
Effectively  deploying hadoop to the cloudEffectively  deploying hadoop to the cloud
Effectively deploying hadoop to the cloud
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptx
 
What are clouds made from
What are clouds made fromWhat are clouds made from
What are clouds made from
 
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDiscoSD Big Data Monthly Meetup #4 - Session 2 - WANDisco
SD Big Data Monthly Meetup #4 - Session 2 - WANDisco
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Dernier

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 

Dernier (20)

Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 

(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS re:Invent 2014

  • 1. 1 November 13, 2014| Las Vegas, NV AmandeepKhurana
  • 2. 2 About me •Principal Solutions Architect @ Cloudera •Engineer @ AWS •Co-author, HBasein Action
  • 3. 3 Agenda •Motivation •Deployment paradigms •Storage •Networking •Instances •Security •High availability, backups, disaster recovery •Planning your cluster •Available resources
  • 4. 4 •Parallel trends –Commoditizing infrastructure –Commoditizing data •Worlds converging… but with considerations –Cost –Flexibility –Ease of use –Operations –Location –Performance –Security Why you should care
  • 7. 7 Primary consideration –Storage (source of truth) Amazon S3 •Ad-hoc batch workloads •SLA batch workloads HDFS •Ad-hoc batch workloads •SLA batch workloads •Ad-hoc interactive workloads •SLA interactive workloads Predominantly transient clusters Long running clusters
  • 8. 9 Deployment models Transient clusters Long-running clusters Primary storage substrate S3 or remote HDFS HDFS Backups S3 S3 or second HDFS cluster Workloads •Batch(MapReduce, Spark) •Interactive is an anti- pattern •Batch(MapReduce, Spark) •Interactive (HBase, Solr, Impala) Role of cluster Compute only Compute and storage
  • 9. 10 Storage Access pattern, performance
  • 10. 11 Storage considerations Hadoop paradigm: Bring compute to storage Cloud paradigm: Everything as a service
  • 11. 12 •Instance store –Local storage attached to instance –Temporary –Instance dependent (not configurable) •Amazon Elastic Block Store (EBS) -Block-level storage volume –External to instance –Lifecycle independent of instance •Amazon Simple Storage Service (S3) –BLOB store –External data store –Simple API –Get, Put, Delete –Instance dependent bandwidth Storage choices in AWS
  • 12. 13 •In MapReducejobs by using s3a URI •Distcp –hadoopdistcp<options> hdfs:///foo/bar s3a:///mybucket/foo/ •HBasesnapshot export –hbaseorg.apache.hadoop.hbase.snapshot.ExportSnapshot <options>-Dmapred.task.timeout=15000000 -snapshot <name> -mappers <nmappers> -copy-to <dir> Interacting with S3
  • 13. 14 •Multiple implementations in the Hadoopproject –S3 (block based) –S3N (file based, using jets3t) –S3A (file based, using AWS SDK) Latest stuff •Bandwidth to S3 depends on instance type –<200 MB/s per instance on some of the larger ones •Process Interacting with S3 –how it works
  • 14. 15 •Tune •Parallelize •Writing to S3 –Multi-part upload for > 5 GB files –Pick multiple drives for local staging (HADOOP-10610) –Up the task timeouts when writing large files •Reading from S3 –Range reads within map tasks via multiple threads •Large objects are better (less load on metadata lookups) •Randomize file names (metadata lookups are spread out) Optimizing S3 interaction
  • 15. 16 •Ephemeral drives on Amazon EC2 instances •Persistent for as long as the instances are alive (no pausing) •Use S3 for backups •No EBS –Over the network –Designed for random I/O HDFS in AWS
  • 16. 17 Networking Performance, access, and security
  • 17. 18 Topologies – Deploy in Virtual Private Cloud (VPC) Cluster in public subnet Cluster in private subnet AWS VPC Corporate network Server Server Server Server VPN or Direct Connect EC2 instance EC2 instance EC2 instance EC2 instance Cloudera Enterprise Cluster in a public subnet Internet, Other AWS services EC2 instance EC2 instance Edge Nodes EC2 instance EC2 instance Edge Nodes AWS VPC Corporate network Server Server Server Server VPN or Direct Connect EC2 instance EC2 instance EC2 instance EC2 instance Cloudera Enterprise Cluster in a private subnet Internet, Other AWS services NAT EC2 instance Public subnet EC2 instance EC2 instance Edge Nodes EC2 instance EC2 instance Edge Nodes
  • 18. 19 •Instance <-> Instance link –10G –10G + SR-IOV (HVM) –!10G •Instance <-> S3 (equal to instance to public internet) •Placement groups –Performance maydip outside of PGs •Clusters within a single Availability Zone Performance considerations
  • 19. 20 EC2 instances Storage, cost, performance, availability, and fault tolerance
  • 20. 21 Picking the right instance Transient clusters •Primary considerations: –Bandwidth –CPU –Memory •Secondary considerations –Availability and fault tolerance –Local storage density •Typical choices –C3 family, M3 family, M1 family –Anti pattern to use storage dense Long running clusters •Primary considerations –Local storage is key –CPU –Memory –Availability and fault tolerance –Bandwidth •Typical choices –hs1.8xlarge, cc2.8xlarge, i2.8xlarge
  • 21. 22 Amazon Machine Image (AMI) •2 kinds –PV and HVM. •Pick a dependable base AMI •Things to look out for –Kernel patches –Third-party software and library versions •Increase root volume size
  • 23. 24 •Amazon Virtual Private Cloud (VPC) options –Private subnet •All traffic outside of VPC via NAT –Public subnet •Network ACLS at subnet level •Security groups •EDH guidelines for Kerberos, Active Directory, and Encryption •S3 provides server-side encryption Security considerations
  • 24. 25 High Availability, Backups, Disaster Recovery
  • 25. 26 •High Availability available in the Hadoopstack –Run NamenodeHA with 5 Journal Nodes –Run 5 Zookeepers –Run multiple HBasemasters •Backups and disaster recovery (based on RPO/RTO requirements) –Hot backup: Active-Active clusters –Warm backup: S3 •Hadoop level snapshots –HDFS, HBase –Cold backup: Amazon Glacier HA, Backups, DR
  • 26. 27 Planning your cluster
  • 27. 28 Capacity, performance, access patterns •Bad news –no simple answer. You have to think through it. •Good news –mistakes are cheap. Learn from ours to make them even cheaper. •Start with workload type (ad-hoc / SLA, batch / interactive) •How much % of the day will you use your cluster? •How much data do you want to store? •What are the performance requirements? •How are you ingesting data? What does the workflow look like?
  • 28. 29 •Just released –Cloudera Director! •AWS Quickstart •Available resources –Reference Architecture (just refreshed) –Best practices blog To make life easier
  • 29. Thank you We are hiring!
  • 30. 31 •Smarter with topology •Amazon EBS as storage for HDFS •Deeper S3 integration •Amazon Kinesis integration •Workflow management Opportunities