How @twitterhadoop
chose Google Cloud
Joep Rottinghuis & Lohit VijayaRenu
Twitter Hadoop Team (@twitterhadoop)
1. Twitter infrastructure
2. Hadoop evaluation
3. Evaluation outcomes
4. Recommendations and conclusions
5. Q&A
Credit to presentation at GoogleNext 2019 by Derek Lyon & Dave
Beckett (https://youtu.be/4FLFcWgZdo4)
Twitter Infrastructure
Twitter’s infrastructure
● Twitter founded in 2006
● Global-scale application
● Unique scale and performance characteristics
● Real-time
● Built to purpose and well optimized
● Large data centers
Strategic questions
1. What is the long-term mix of cloud versus
datacenter?
2. Which cloud provider(s) should we use?
3. How can we be confident in this type of
decision?
4. Why should we evaluate this now (2016)?
Tactical questions
1. What is the feasibility and cost of large-scale
adoption?
2. Which workloads are best-suited for the cloud
and are they separable?
3. How would our architecture change on the
cloud?
4. How do we get to an actionable plan?
Evaluation process
● Started evaluation in 2016
● Were able to make a patient, rigorous
decision
● Defined baseline workload requirements
● Engaged major providers
● Analyzed clouds for each major workload
● Built overall cloud plan
● Iterated and optimized choices
Evaluation Timeline
Phase: Considering Moving
Milestones (in the order extracted from the slide):
● PoCs Completed & Results Delivered
● Legal Agreement with T&Cs ratified
● Kickoff Dataproc, BigQuery, Dataflow experimentation
● Security and Platform Review
● v1 Hadoop on GCP Architecture Ratified
● Begin build for migration plan
● Consensus built with Product, Revenue, Eng
● Migration Kickoff
● Proposal to migrate Hadoop to GCP formally accepted
● Initial Cloud RfP release
● 27 Synthetic PoCs on GCP begin
● Testing Projects / Network established
Dates marked on the timeline: June ’16, Sept ’16, Mar ’17, July ’17, Nov ’17, Jan ’18, Apr ’18, June ’18
Built overall cloud plan
● Created a series of candidate architectures
for each platform with their resource
requirements
● Developed a migration project plan &
timeline
● Created financial projections
● With some other business considerations
Financial modeling
● 10-year time horizon to avoid timing artifacts
● Compared on premise and multiple cloud
scenarios
● Costs of migration and long-term
● Long-term price/performance curves
(e.g. Moore’s Law, historical pricing)
● Two independent models to avoid model
errors
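A minimal sketch of how a 10-year comparison like the one above could be set up. Every number, rate, and name here (cumulative_cost, the scenario inputs) is a hypothetical placeholder, not a figure from Twitter's models. It only shows how demand growth, a declining price curve, and a one-time migration cost combine over the horizon.

```python
def cumulative_cost(first_year_cost, annual_growth, annual_price_decline,
                    one_time_migration_cost=0.0, years=10):
    """Total spend over the horizon for one scenario (all inputs hypothetical)."""
    total = one_time_migration_cost
    yearly = first_year_cost
    for _ in range(years):
        total += yearly
        # Demand grows each year while unit prices fall along a
        # price/performance curve.
        yearly *= (1.0 + annual_growth) * (1.0 - annual_price_decline)
    return total


# Hypothetical scenarios in arbitrary cost units.
scenarios = {
    "on_premise": cumulative_cost(100.0, 0.30, 0.20),
    "cloud": cumulative_cost(120.0, 0.30, 0.25, one_time_migration_cost=50.0),
}
for name, cost in scenarios.items():
    print(f"{name}: {cost:.1f}")
```

Building the same comparison twice, independently, is one way to catch modeling mistakes, which is the point of the "two independent models" bullet.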
What we found
● An immediate all-in migration at Twitter scale is expensive, distracting, and risky
● More value from new architectures and transformation, so start smaller and learn as we go
● Hadoop offered several important, specific benefits with lower risk
● We gained confidence in our investments in both cloud projects and data centers
Hadoop@Twitter scale
● >1.4T Messages Per Day
● >500K Compute Cores
● >300PB Logical Storage
● >12,500 Peak Cluster Size
Twitter Hadoop cluster types
Type | Use | Compute %
Real-time | Critical performance production jobs with dedicated capacity | 10%
Processing | Regularly scheduled production jobs with dedicated capacity | 60%
Ad-hoc | One off / ad-hoc queries and analysis | 30%
Cold | Dense storage clusters, not for compute | minimal
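As a rough illustration, the compute percentages for the cluster types can be applied to the ">500K Compute Cores" figure from the scale slide. This assumes the shares apply to that total, which the slides do not state, and 500,000 is used only as a lower bound.

```python
# Lower bound taken from the ">500K Compute Cores" figure.
TOTAL_CORES_LOWER_BOUND = 500_000

# Compute share per cluster type, in percent (cold clusters are "minimal").
COMPUTE_SHARE_PCT = {"Real-time": 10, "Processing": 60, "Ad-hoc": 30}


def cores_by_type(total_cores, share_pct):
    """Split a core count by percentage share using integer arithmetic."""
    return {name: total_cores * pct // 100 for name, pct in share_pct.items()}


estimate = cores_by_type(TOTAL_CORES_LOWER_BOUND, COMPUTE_SHARE_PCT)
for name, cores in estimate.items():
    print(f"{name}: at least ~{cores:,} cores")
```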
Twitter Hadoop challenges
1. Scaling: Significant YoY Compute & Storage growth
2. Hardware: Designing, building, maintaining & operating
3. Capacity Planning: Hard to predict, especially for ad-hoc
4. Agility: Must respond fast, especially for ad-hoc compute
5. Deployment: Must deploy at scale and in-flight
6. Network: Both cross-DC and cross-cluster
7. Disaster Recovery: Durable copies needed in 2+ DCs
Twitter Hadoop requirements
● Network sustained bandwidth per core
● Disk (data) sustained bandwidth per core
● Large sequential reads & writes
● Throughput not latency
● Capacity
● CPU / RAM not usually the bottleneck
● Consistency of datasets (set of HDFS files)
Twitter Hadoop on premise hardware numbers
Clusters: 10 to 10K nodes
Network: 10G moving to 25G
Data Disks: 24T-72T over 12 HDDs
CPU: 8 cores with 64G memory
I/O:
Network: ~20MB/s sustained, peaks of 10x
HDFS read: 20 rq/s sustained, peaks of 3x
HDFS write: large variation,
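Since the requirements are stated per core, it can help to convert the hardware numbers into a per-core figure. The sketch below assumes "10G" and "25G" mean a 10 or 25 Gbit/s NIC per node and that each node has the 8 cores listed. It gives the line-rate ceiling per core, not the sustained bandwidth a job actually sees.

```python
def line_rate_per_core(nic_gbit_s, cores):
    """Return (Gbit/s, MB/s) of NIC line rate available per core."""
    gbit_per_core = nic_gbit_s / cores
    mbyte_per_core = gbit_per_core * 1000 / 8  # 1 Gbit/s = 125 MB/s
    return gbit_per_core, mbyte_per_core


CORES_PER_NODE = 8  # from "CPU: 8 cores"
for nic in (10, 25):  # "Network: 10G moving to 25G", assumed to mean Gbit/s
    gbit, mbyte = line_rate_per_core(nic, CORES_PER_NODE)
    print(f"{nic}G NIC: {gbit} Gbit/s per core ({mbyte} MB/s)")
```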
Cloud architectural options
1. Hadoop-as-a-Service (HaaS) from the cloud provider
2. Twitter Hadoop on cloud VMs
Durable storage: cloud object store
Scratch storage:
a. with HDFS over cloud object store
b. with HDFS on cloud block store
c. with HDFS on local disks
Testing plan
1. Baseline Tests
● TestDFSIO: low level IO read/write
● Teragen: measure maximum write rate
● Terasort: read, shuffle, write
2. Functional Test
Gridmix: IO + Compute
● Capture of real production cluster workload (1k-5k jobs)
● Replays reads, writes, shuffles, compute
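The baseline tests in the testing plan are standard Hadoop benchmark jobs. A minimal sketch of how their command lines might be assembled is shown below. The jar file names, sizes, row count, and working directory are hypothetical placeholders, and flag spellings should be checked against the Hadoop version in use. The commands are only printed, not executed.

```python
import shlex

# Hypothetical jar names; real paths and versions depend on the distribution.
JOBCLIENT_TESTS_JAR = "hadoop-mapreduce-client-jobclient-tests.jar"
EXAMPLES_JAR = "hadoop-mapreduce-examples.jar"


def baseline_commands(num_files, file_size, tera_rows, work_dir):
    """Build argv lists for TestDFSIO write/read, TeraGen, and TeraSort."""
    tera_in = f"{work_dir}/teragen"
    tera_out = f"{work_dir}/terasort"
    dfsio = ["hadoop", "jar", JOBCLIENT_TESTS_JAR, "TestDFSIO"]
    sizing = ["-nrFiles", str(num_files), "-size", file_size]
    return [
        dfsio + ["-write"] + sizing,  # low level write throughput
        dfsio + ["-read"] + sizing,   # low level read throughput
        # TeraGen writes tera_rows rows of 100 bytes each.
        ["hadoop", "jar", EXAMPLES_JAR, "teragen", str(tera_rows), tera_in],
        # TeraSort reads, shuffles, and writes the generated data.
        ["hadoop", "jar", EXAMPLES_JAR, "terasort", tera_in, tera_out],
    ]


for argv in baseline_commands(100, "1GB", 10_000_000, "/benchmarks"):
    print(shlex.join(argv))
```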
HDFS configurations tested
Availability
● Critical data: 2 regions
● Other data: 2 zones
Each type of Object, Block
and Local Storage
Dataset consistency
Test cloud provider choices:
1. object store
2. object store with external
consistency service
Hadoop Evaluation
GCP HaaS: Dataproc config
● Hadoop 2.7.2
● Performance tests with 800 vCPUs:
○ 100 x n1-standard-8 (8 vCPU, 30G memory)
○ 200 x n1-standard-4 (4 vCPU, 15G memory)
● Scale test with 8000 vCPUs:
○ 1000 x n1-standard-8 (8 vCPU, 30G memory)
● Modeled average CPU and average to peak CPU.
● No preemptible instances in initial work
● Similar to on premise hardware SKUs
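A quick arithmetic check of the cluster sizes above: both performance configurations add up to the same 800 vCPUs, so they differ in node count and per-node resources rather than in total compute. The dictionary keys are labels chosen here for illustration.

```python
# (node count, vCPUs per node) for each tested configuration.
CONFIGS = {
    "performance, n1-standard-8": (100, 8),
    "performance, n1-standard-4": (200, 4),
    "scale, n1-standard-8": (1000, 8),
}

total_vcpus = {name: nodes * vcpus for name, (nodes, vcpus) in CONFIGS.items()}
for name, total in total_vcpus.items():
    print(f"{name}: {total} vCPUs")
```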
Decided to use Dataproc for evaluation.
Dataproc 100 x n1-standard-8 Results
Durable Storage | Scratch Storage | HDFS | Speedup vs on premise (normalized by IO-per-core)
Cloud Storage | Local SSD | 3 x 375G SSD | ~2x (but expensive)
Cloud Storage | PD-HDD | 1.5TB PD-HDD | ~1x
None | PD-HDD | 1.5TB PD-HDD | ~1x
Tuned Compute Engine instance types to get the optimum balance of network : cores : storage (this changes over time)
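The slides do not spell out how the speedup was "normalized by IO-per-core". One plausible reading, shown here purely as an assumption, is to divide each platform's measured throughput by its core count before taking the ratio. All inputs below are made-up placeholders, not measured results.

```python
def io_per_core(throughput_mb_s, cores):
    """Throughput available per core."""
    return throughput_mb_s / cores


def normalized_speedup(cloud_mb_s, cloud_cores, onprem_mb_s, onprem_cores):
    """Ratio of cloud to on-premise IO-per-core (assumed definition)."""
    return io_per_core(cloud_mb_s, cloud_cores) / io_per_core(onprem_mb_s, onprem_cores)


# Hypothetical measurements for two clusters of different size.
print(normalized_speedup(cloud_mb_s=1600, cloud_cores=800,
                         onprem_mb_s=1600, onprem_cores=1600))
```

Normalizing this way lets clusters with different core counts be compared on the same footing.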
Dataproc 200 x n1-standard-4 Results
Durable Storage | Scratch Storage | HDFS | Speedup vs on premise (normalized by IO-per-core)
Cloud Storage | Local SSD | 2 x 375G SSD | ~2x (but expensive)
Cloud Storage | PD-HDD | 1.5TB PD-HDD | 1.4x
Benchmark Findings
1. Application Benchmarks
are critical
Total job time is composed of
multiple steps. We found
variation both better and worse
at each step.
Recommendation: You should
rely on an application
benchmark like GridMix rather
than micro-benchmarks.
2. Can treat network
storage like local disk
Both Cloud Storage and PD
offered nearly as much
bandwidth as typical direct
attached HDDs on premise
Functional Test Findings
1. Live Migration of VMs was not noticeable
during Hadoop testing. It was noticeable during other
Twitter platform testing of Compute Engine
(cache at very high rps of small objects)
2. Cloud Storage checksum vs HDFS checksum.
Fixed via HDFS-13056 in collaboration with
Google
3. fsync() system call on Local SSD was slow
(fixed)
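The slide does not describe the checksum mismatch in detail. One common explanation is that a checksum built from per-chunk checksums depends on how the file was chunked, while a CRC over the whole byte stream does not, so the two cannot be compared across storage systems. The sketch below illustrates that idea only. It uses Python's zlib CRC-32 as a stand-in, and the function names are hypothetical. It is not the HDFS or Cloud Storage implementation.

```python
import hashlib
import zlib


def whole_file_crc(data, chunk_size):
    """CRC-32 of the full byte stream, fed in chunk_size pieces."""
    crc = 0
    for start in range(0, len(data), chunk_size):
        crc = zlib.crc32(data[start:start + chunk_size], crc)
    return crc


def digest_of_chunk_crcs(data, chunk_size):
    """MD5 over the per-chunk CRCs, so the result depends on chunk_size."""
    md5 = hashlib.md5()
    for start in range(0, len(data), chunk_size):
        chunk_crc = zlib.crc32(data[start:start + chunk_size])
        md5.update(chunk_crc.to_bytes(4, "big"))
    return md5.hexdigest()


data = bytes(range(256)) * 64  # 16 KiB of sample data

# Same bytes, different chunking: the whole-file CRC is layout independent.
print(whole_file_crc(data, 512) == whole_file_crc(data, 4096))
# The digest of per-chunk CRCs changes with the chunk size.
print(digest_of_chunk_crcs(data, 512) == digest_of_chunk_crcs(data, 4096))
```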
Evaluation Outcomes
Disqualified: Lift-and-Shift *Everything*
+ Leads to the fastest migration
+ Limits duplication of costs during migration period
- Introduces significant tech debt post-migration
- Requires a major rearchitecture post-migration to capture benefits of cloud
- Concerns around overall cost, risk, and distraction of this approach at Twitter scale
Hadoop to Cloud was Interesting
● Separable with fewer dependencies
● Standard open source software:
○ Continue to develop in house and run on premise
○ Reduces lock-in risk
● Rearchitecting is achievable
○ Not a lift-and-shift
● Data in Cloud Storage:
○ Enables broader diversity of data processing frameworks and services
● Long-term bet on Google’s Big Data ecosystem
Separate Hadoop Compute and Storage
Enables:
● Scaling the dimensions independently
● Makes it easy to run multiple clusters and processing frameworks over the same data
● Virtual network and project primitives provide segmentation of access and cost structures.
● State is preserved in Cloud Storage therefore deployments, upgrades, and testing are simpler
● Can treat storage as a commodity
Twitter Hadoop Rearchitected for Cloud
1. Cold Cluster
● Storage: Cloud Storage
● Compute: Limited ephemeral Dataproc an option
● Scaling: mostly storage driven
2. Ad-Hoc Clusters
● Storage: Cloud Storage
● Compute: Compute Engine and Twitter build of Hadoop (long running clusters)
● Scaling: mixture, with spiky compute
Twitter production Hadoop remains on premise
● Not as separable from other production workloads
● Focusing on non-production workloads limits our risk
● Regular compute-intensive usage patterns
● Benefits more from purpose built hardware
● Fewer processing frameworks are needed
Twitter Strategic Benefits
What does this do overall for Twitter?
● Next-generation architecture with numerous enhancements:
○ security, encryption, isolation, live migration
● Leverage Google’s capacity and R&D
● Larger ecosystem of open source & cloud software
● Long-term strategic collaboration with Google
● Beachhead that enables teams across Twitter to make tactical cloud adoption decisions
Twitter Functional Benefits
Infrastructure benefits
● Large-scale ad-hoc analysis and backfills
● Cloud Storage avoids HDFS limits
● Offsite Backup
● Increases availability of cold data
Platform benefits
● Built-in compliance support (e.g. SOX)
● Direct chargeback using Project
● Simplified retention
● GCP services such as BigQuery, Spanner, Cloud ML, TPUs, etc
Finding: At Twitter Scale, Cloud has limits
● Cloud providers have limits for all sorts of things
and we often need them increased.
● Cloud HaaS offerings do not generally support 10K node
Hadoop clusters
● Dynamic scaling down < O(days) is not yet
feasible / cost-effective with current Hadoop at
Twitter scale
● Capacity planning with cloud providers is
encouraged for O(10K) vCPU deltas and required
for O(100K) vCPU deltas
What we are working on now
❏ Finalizing bucket & user creation and IAM designs
❏ Building replication, cluster deployment, and data
management software
❏ Hadoop Cloud Storage connector improvements
continue (open source)
❏ Retention and “directory” / dataset atomicity in GCS
✓ Foundational network
(8x100Gbps)
✓ Copy cluster
✓ Copying PBs of data to the
cloud
✓ Early Presto analytics use
case: up to 100K-core
Dataproc cluster querying
15PB dataset in Cloud
Storage
Recommendations
and Conclusion
Recommendations
1. Run the most informative tests
Application-level benchmarking (e.g. GridMix)
Scale testing
2. Compare application benchmark costs
Compare the cost of running an application using benchmark results. Don’t just look at pricing pages.
e.g. the network is hugely important to performance.
3. Ensure migration plan captures benefits
Lift-and-shift may not deliver value in all cases.
Substantial iteration is required to balance tactical migration work with long-term strategy.
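To make the recommendation about comparing application benchmark costs concrete, here is a minimal sketch of pricing a benchmark run rather than reading a price list. The option names, runtimes, node counts, and hourly prices are all hypothetical. The point is only that a lower hourly price can still cost more per run if the job takes longer.

```python
def run_cost(runtime_hours, node_count, price_per_node_hour):
    """Cost of one benchmark run: time x nodes x hourly price."""
    return runtime_hours * node_count * price_per_node_hour


# Hypothetical benchmark results for the same workload on two options.
options = {
    "cheaper_per_hour": run_cost(runtime_hours=3.0, node_count=100,
                                 price_per_node_hour=0.40),
    "faster_network": run_cost(runtime_hours=2.0, node_count=100,
                               price_per_node_hour=0.50),
}
for name, cost in options.items():
    print(f"{name}: {cost:.2f} per run")
```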
Conclusions
1. Separate compute and storage is a real thing
The better the network, the less locality matters.
Life gets much easier when Compute can be stateless.
You can treat PD like direct attached HDDs.
2. Cloud adoption is complex
Finding separable workloads can be a challenge.
Architectural choices are non-obvious.
Methodical evaluation is well worth the effort.
3. Very early in this process and lots more to come
We’re excited to be gaining experience with the platform and learning from everyone.
Thank You
Questions?

October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...Yahoo Developer Network
 

How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu

  • 1. How @twitterhadoop chose Google Cloud. Joep Rottinghuis & Lohit VijayaRenu, Twitter Hadoop Team (@twitterhadoop)
  • 2. 1. Twitter infrastructure 2. Hadoop evaluation 3. Evaluation outcomes 4. Recommendations and conclusions 5. Q&A Credit to the presentation at GoogleNext 2019 by Derek Lyon & Dave Beckett (https://youtu.be/4FLFcWgZdo4)
  • 4. Twitter’s infrastructure ● Twitter founded in 2006 ● Global-scale application ● Unique scale and performance characteristics ● Real-time ● Built to purpose and well optimized ● Large data centers
  • 5. Strategic questions 1. What is the long-term mix of cloud versus datacenter? 2. Which cloud provider(s) should we use? 3. How can we be confident in this type of decision? 4. Why should we evaluate this now (2016)?
  • 6. Tactical questions 1. What is the feasibility and cost of large-scale adoption? 2. Which workloads are best-suited for the cloud and are they separable? 3. How would our architecture change on the cloud? 4. How do we get to an actionable plan?
  • 7. Evaluation process ● Started evaluation in 2016 ● Were able to make a patient, rigorous decision ● Defined baseline workload requirements ● Engaged major providers ● Analyzed clouds for each major workload ● Built overall cloud plan ● Iterated and optimized choices
  • 8. Evaluation Timeline Considering Moving ● PoC’s Completed & Results Delivered ● Legal Agreement with T&C’s ratified ● Kickoff dataproc, bigquery, dataflow experimentation ● Security and Platform Review ● v1 Hadoop on GCP Architecture Ratified ● Begin build for migration plan ● Consensus built with Product, Revenue, Eng ● Migration Kickoff ● Proposal to migrate Hadoop to GCP formally accepted June ‘16 ● Initial Cloud RfP release ● 27 Synthetic PoC’s on GCP begin ● Testing Projects / Network established Sept ‘16 Mar ‘17 July ‘17 Nov ‘17 Jan ‘18 Apr ‘18 June ‘18
  • 9. Built overall cloud plan ● Created a series of candidate architectures for each platform with their resource requirements ● Developed a migration project plan & timeline ● Created financial projections ● With some other business considerations
  • 10. Financial modeling ● 10-year time horizon to avoid timing artifacts ● Compared on premise and multiple cloud scenarios ● Costs of migration and long-term ● Long-term price/performance curves (e.g. Moore’s Law, historical pricing) ● Two independent models to avoid model errors
  • 11. What we found ● An immediate all-in migration at Twitter scale is: expensive, distracting, and risky ● More value from new architectures and transformation, so start smaller and learn as we go ● Hadoop offered several important, specific benefits with lower risk ● We gained confidence in our investments in both cloud projects and data centers
  • 12. Hadoop@Twitter scale: >1.4T Messages Per Day; >500K Compute Cores; >300PB Logical Storage; >12,500 Peak Cluster Size
  • 13. Twitter Hadoop cluster types
      Type | Use | Compute %
      Real-time | Critical performance production jobs with dedicated capacity | 10%
      Processing | Regularly scheduled production jobs with dedicated capacity | 60%
      Ad-hoc | One off / ad-hoc queries and analysis | 30%
      Cold | Dense storage clusters, not for compute | minimal
  • 14. Twitter Hadoop challenges 1. Scaling: Significant YoY Compute & Storage growth 2. Hardware: Designing, building, maintaining & operating 3. Capacity Planning: Hard to predict for adhoc especially 4. Agility: Must respond fast especially for adhoc compute 5. Deployment: Must deploy at scale and in-flight 6. Network: Both cross-DC and cross-cluster 7. Disaster Recovery: Durable copies needed in 2+ DCs
  • 15. Twitter Hadoop requirements ● Network sustained bandwidth per core ● Disk (data) sustained bandwidth per core ● Large sequential reads & writes ● Throughput not latency ● Capacity ● CPU / RAM not usually the bottleneck ● Consistency of datasets (set of HDFS files)
  • 16. Twitter Hadoop on premise hardware numbers. Clusters: 10 to 10K nodes; Network: 10G moving to 25G; Data Disks: 24T-72T over 12 HDDs; CPU: 8 cores with 64G memory. I/O: Network: ~20MB/s sustained, peaks of 10x; HDFS read: 20 rq/s sustained, peaks of 3x; HDFS write: large variation
  • 17. Cloud architectural options 1. Hadoop-as-a-Service (HaaS) from the cloud provider 2. Twitter Hadoop on cloud VMs Durable storage: cloud object store Scratch storage: a. with HDFS over cloud object store b. with HDFS on cloud block store c. with HDFS on local disks
  • 18. Testing plan 1. Baseline Tests ● TestDFSIO: low level IO read/write ● Teragen: measure maximum write rate ● Terasort: read, shuffle, write 2. Functional Test Gridmix: IO + Compute ● Capture of real production cluster workload (1k-5k jobs) ● Replays reads, writes, shuffles, compute
  • 19. HDFS configurations tested Availability ● Critical data: 2 regions ● Other data: 2 zones Each type of Object, Block and Local Storage Dataset consistency Test cloud provider choices: 1. object store 2. object store with external consistency service
  • 21. GCP HaaS: DataProc config ● Hadoop 2.7.2 ● Performance tests with 800 vCPUs: ○ 100 x n1-standard-8 (8 vCPU, 30G memory) ○ 200 x n1-standard-4 (4 vCPU, 30G memory) ● Scale test with 8000 vCPUs: ○ 1000 x n1-standard-8 (8 vCPU, 30G memory) ● Modeled average CPU and average to peak CPU. ● No preemptible instances in initial work ● Similar to on premise hardware SKUs. Decided to use DataProc for evaluation.
  • 22. DataProc 100 x n1-standard-8 Results
      Durable Storage | Scratch Storage | HDFS | Speedup vs on premise (normalized by IO-per-core)
      Cloud Storage | Local SSD | 3 x 375G SSD | ~2x (but expensive)
      Cloud Storage | PD-HDD | 1.5TB PD-HDD | ~1x
      None | PD-HDD | 1.5TB PD-HDD | ~1x
    Tuned Compute Engine instance types to get the optimum balance of network : cores : storage (this changes over time)
  • 23. DataProc 200 x n1-standard-4 Results
      Durable Storage | Scratch Storage | HDFS | Speedup vs on premise (normalized by IO-per-core)
      Cloud Storage | Local SSD | 2 x 375G SSD | ~2x (but expensive)
      Cloud Storage | PD-HDD | 1.5TB PD-HDD | 1.4x
  • 24. Benchmark Findings 1. Application Benchmarks are critical Total job time is composed of multiple steps. We found variation both better and worse at each step. Recommendation: You should rely on an application benchmark like GridMix rather than micro-benchmarks. 2. Can treat network storage like local disk Both Cloud Storage and PD offered nearly as much bandwidth as typical direct attached HDDs on premise
  • 25. Functional Test Findings 1. Live Migration of VMs was not noticeable during Hadoop testing. It was during other Twitter platform testing of Compute Engine (cache at very high rps of small objects) 2. Cloud Storage checksum vs HDFS checksum. Fixed via HDFS-13056 in collaboration with Google 3. fsync() system call on Local SSD was slow (fixed)
  • 27. Lift-and-Shift *Everything* (Disqualified) + Leads to the fastest migration + Limits duplication of costs during migration period - Introduces significant tech debt post-migration - Requires a major rearchitecture post-migration to capture benefits of cloud - Concerns around overall cost, risk, and distraction of this approach at Twitter scale
  • 28. Hadoop to Cloud was Interesting ● Separable with fewer dependencies ● Standard open source software: ○ Continue to develop in house and run on premise ○ Reduces lock-in risk ● Rearchitecting is achievable ○ Not a lift-and-shift ● Data in Cloud Storage: ○ Enables broader diversity of data processing frameworks and services ● Long-term bet on Google’s Big Data ecosystem
  • 29. Separate Hadoop Compute and Storage. Enables: ● Scaling the dimensions independently ● Makes it easy to run multiple clusters and processing frameworks over the same data ● Virtual network and project primitives provide segmentation of access and cost structures. ● State is preserved in Cloud Storage therefore deployments, upgrades, and testing are simpler ● Can treat storage as a commodity
  • 30. Twitter Hadoop Rearchitected for Cloud 1. Cold Cluster ● Storage: Cloud Storage ● Compute: Limited ephemeral Dataproc an option ● Scaling: mostly storage driven 2. Ad-Hoc Clusters ● Storage: Cloud Storage ● Compute: Compute Engine and Twitter build of Hadoop (long running clusters) ● Scaling: mixture, with spiky compute
  • 31. Twitter production Hadoop remains on premise ● Not as separable from other production workloads ● Focusing on non-production workloads limits our risk ● Regular compute-intensive usage patterns ● Benefits more from purpose built hardware ● Fewer processing frameworks are needed
  • 32. Twitter Strategic Benefits: What does this do overall for Twitter? ● Next-generation architecture with numerous enhancements: ○ security, encryption, isolation, live migration ● Leverage Google’s capacity and R&D ● Larger ecosystem of open source & cloud software ● Long-term strategic collaboration with Google ● Beachhead that enables teams across Twitter to make tactical cloud adoption decisions
  • 33. Twitter Functional Benefits. Infrastructure benefits ● Large-scale ad-hoc analysis and backfills ● Cloud Storage avoids HDFS limits ● Offsite Backup ● Increases availability of cold data Platform benefits ● Built-in compliance support (e.g. SOX) ● Direct chargeback using Project ● Simplified retention ● GCP services such as BigQuery, Spanner, Cloud ML, TPUs, etc
  • 34. Finding: At Twitter Scale, Cloud has limits ● Cloud providers have limits for all sorts of things and we often need them increased. ● Cloud HaaS do not generally support 10K node Hadoop clusters ● Dynamic scaling down < O(days) is not yet feasible / cost-effective with current Hadoop at Twitter scale ● Capacity planning with cloud providers is encouraged for O(10K) vCPU deltas and required for O(100K) vCPU deltas
  • 35. What we are working on now ❏ Finalizing bucket & user creation and IAM designs ❏ Building replication, cluster deployment, and data management software ❏ Hadoop Cloud Storage connector improvements continue (open source) ❏ Retention and “directory” / dataset atomicity in GCS ✓ Foundational network (8x100Gbps) ✓ Copy cluster ✓ Copying PBs of data to the cloud ✓ Early Presto analytics use case: up to 100K-core Dataproc cluster querying 15PB dataset in Cloud Storage
  • 37. Recommendations 1. Run the most informative tests Application-level benchmarking (e.g. GridMix) Scale testing 2. Compare application benchmark costs Compare the cost of running an application using benchmark results. Don’t just look at pricing pages. e.g. the network is hugely important to performance. 3. Ensure migration plan captures benefits Lift-and-shift may not deliver value in all cases. Substantial iteration is required to balance tactical migration work with long-term strategy.
  • 38. Conclusions 1. Separate compute and storage is a real thing The better the network, the less locality matters. Life gets much easier when Compute can be stateless. You can treat PD like direct attached HDDs. 2. Cloud adoption is complex Finding separable workloads can be a challenge. Architectural choices are non-obvious. Methodical evaluation is well worth the effort. 3. Very early in this process and lots more to come We’re excited to be gaining experience with the platform and learning from everyone.
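
The cluster sizes quoted on slide 21 and the capacity-planning guidance on slide 34 reduce to simple arithmetic. The sketch below is only an illustration in Python: the function names are made up for this example, and the hard cutoffs of 10,000 and 100,000 vCPUs are assumptions standing in for the slides' order-of-magnitude figures O(10K) and O(100K). The slides say nothing about smaller changes, so those are reported as "unspecified".

```python
# Instance shapes quoted on slide 21: (node count, vCPUs per node).
PERF_TEST_A = (100, 8)    # 100 x n1-standard-8
PERF_TEST_B = (200, 4)    # 200 x n1-standard-4
SCALE_TEST = (1000, 8)    # 1000 x n1-standard-8


def total_vcpus(shape):
    """Return node count multiplied by vCPUs per node."""
    nodes, vcpus_per_node = shape
    return nodes * vcpus_per_node


def capacity_planning_advice(vcpu_delta):
    """Map a vCPU change to the guidance on slide 34.

    The exact cutoffs are illustrative assumptions; the slide only
    gives orders of magnitude, O(10K) and O(100K).
    """
    if vcpu_delta >= 100_000:
        return "required"
    if vcpu_delta >= 10_000:
        return "encouraged"
    return "unspecified"


if __name__ == "__main__":
    for name, shape in [("perf A", PERF_TEST_A),
                        ("perf B", PERF_TEST_B),
                        ("scale", SCALE_TEST)]:
        vcpus = total_vcpus(shape)
        print(name, vcpus, capacity_planning_advice(vcpus))
```

Both performance-test shapes come to the same 800 vCPUs, which is what makes the slide 22 and slide 23 results comparable on a per-core basis.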

Editor's notes

  1. Not building our own cloud anytime soon
  2. Cloud whitepapers do not show Twitter scale.
  3. Substantial effort for us: refresh cycles, including modernizing infra
  4. At the time of comparison this is true for production clusters. Current HW is 12 x 6T HDD, 128G RAM, 25Gbps NIC
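
The on-premise hardware figures from slide 16 and note 4 can be cross-checked with a few divisions. This is a rough sketch in Python, assuming "T" means terabytes and that note 4 describes the same class of node as slide 16; the function names are invented for illustration only.

```python
# Figures quoted on slide 16 and in note 4; "T" is assumed to mean terabytes.
HDDS_PER_NODE = 12          # slide 16: data spread over 12 HDDs
CORES_PER_NODE = 8          # slide 16: 8 cores per node
NODE_STORAGE_TB = (24, 72)  # slide 16: 24T-72T of data disk per node


def per_disk_tb(node_tb, disks=HDDS_PER_NODE):
    """Capacity of a single HDD given total node storage."""
    return node_tb / disks


def per_core_tb(node_tb, cores=CORES_PER_NODE):
    """Storage available per CPU core on one node."""
    return node_tb / cores


def current_node_tb(disks=12, disk_tb=6):
    """Total storage of the node described in note 4 (12 x 6T HDD)."""
    return disks * disk_tb


if __name__ == "__main__":
    low, high = NODE_STORAGE_TB
    print("per disk (TB):", per_disk_tb(low), "to", per_disk_tb(high))
    print("per core (TB):", per_core_tb(low), "to", per_core_tb(high))
    print("note 4 node (TB):", current_node_tb())
```

Under these assumptions the 12 x 6T configuration in note 4 lands at the top of the 24T-72T range given on slide 16.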