SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
Big Data in a public cloud
Maximize RAM without paying for
overhead CPU and other price/
performance tricks
Big Data
• Big Data: just another way of saying large data sets
• All of them are so large its difficult to manage with
traditional tools
• Distributed computing is an approach to solve that
problem
• First the data needs to be Mapped
• Then it can be analyzed – or Reduced
MapReduce
• Instead of talking about it, lets just see some
code
Map/Reduce – distributed work
But that’s perfect for public cloud!
CPU / RAM
• Mapping – is not CPU intensive
• Reducing – is (usually) not CPU intensive
• Speed: load data in RAM or it hits the HDD
and creates iowait
First point – a lot of RAM
Ah…now they have RAM instances
I wanted more RAM I said L
What about MapReduce PaaS
• Be aware of data lock in
• Be aware of forced tool set – limits your
workflow
• …and I just don’t like it so much as a
developer; it’s a control thing
There is another way
What we do
What is compute like – why commodity?
Now lets get to work
• Chef & Puppet – deploy and config
• Clone; scale up and down
• Problem solved?
And then there is reality…iowait
ioWAIT
• 1,000 iops on a high end enterprise HDD
• 500,000 iops on a high end SSD
• Public Cloud should be forbidden on HDD
Announcing: SSD priced as HDD
Who and How does workload?
• CERN – the LHC experiments to find the Higgs
Boson
• 200 Petabytes per year
• 200,000 CPU days / day on hundreds of partner
grids and clouds
• Montecarlo simulations (CPU intensive)
The CERN case
• Golden VM images with tools, from which they clone
• A set of coordinator servers, which scales the worker nodes
up and down via provisioning API (clone, puppet config)
• Coordinator servers manages workload on each worker
node doing Montecarlo simulations
• Federated cloud brokers such as Enstratus and Slipstream
• Self healing architecture – uptime of a worker node is not
critical; just spin up a new worker node instance
The loyalty program case
• The customer runs various loyalty programs
• Plain vanilla Hadoop Map/Reduce:
• Chef and Puppet to deploy and config worker nodes
• Lots of RAM, little CPU
• Self healing architecture – uptime of a worker node is
not critical; just spin up a new worker node instance
Find IaaS suppliers with
• Unbundled resources
• Short billing cycles
• New equipment: what’s your life cycle?
• SSD as compute storage
• Allows cross connects from your equipment in the
DC
• CloudSigma, ElasticHost, ProfitBricks to mention a
few
What we do

Contenu connexe

Tendances

Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11aseager
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsAltoros
 
Hadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsHadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsandrewdenty
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architectureSohil Jain
 
Google Compute Engine
Google Compute EngineGoogle Compute Engine
Google Compute EngineCsaba Toth
 
Matt Rechenburg - Save big bucks with Cloud Computing
Matt Rechenburg - Save big bucks with Cloud ComputingMatt Rechenburg - Save big bucks with Cloud Computing
Matt Rechenburg - Save big bucks with Cloud ComputingCloudCamp Hamburg
 
Why The Cloud Is A Computational Biologist's Best Friend
Why The Cloud Is A Computational Biologist's Best FriendWhy The Cloud Is A Computational Biologist's Best Friend
Why The Cloud Is A Computational Biologist's Best FriendYannick Pouliot
 
To Cloud or Not To Cloud?
To Cloud or Not To Cloud?To Cloud or Not To Cloud?
To Cloud or Not To Cloud?Greg Lindahl
 
GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...
GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...
GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...DevOpsDays Tel Aviv
 
Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Jay Patel
 
vBrownBag @ VMworld - Apache CloudStack (ACS) & vSphere
vBrownBag @ VMworld - Apache CloudStack (ACS) & vSpherevBrownBag @ VMworld - Apache CloudStack (ACS) & vSphere
vBrownBag @ VMworld - Apache CloudStack (ACS) & vSphereAaron Delp
 
Empowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAPEmpowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAPUri Cohen
 
HybridAzureCloud
HybridAzureCloudHybridAzureCloud
HybridAzureCloudChris Condo
 
Geocloud blue raster web mapping cloud deployment lessons from the field 201...
Geocloud blue raster web mapping cloud deployment  lessons from the field 201...Geocloud blue raster web mapping cloud deployment  lessons from the field 201...
Geocloud blue raster web mapping cloud deployment lessons from the field 201...Amazon Web Services
 

Tendances (18)

Data storage for the cloud ce11
Data storage for the cloud ce11Data storage for the cloud ce11
Data storage for the cloud ce11
 
What is Cloud Computing?
What is Cloud Computing?What is Cloud Computing?
What is Cloud Computing?
 
Architecture et coût
Architecture et coûtArchitecture et coût
Architecture et coût
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
 
Hadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAsHadoop - An introduction for SQL Server DBAs
Hadoop - An introduction for SQL Server DBAs
 
Running MySQL in AWS
Running MySQL in AWSRunning MySQL in AWS
Running MySQL in AWS
 
Spark introduction and architecture
Spark introduction and architectureSpark introduction and architecture
Spark introduction and architecture
 
Google Compute Engine
Google Compute EngineGoogle Compute Engine
Google Compute Engine
 
Performance stack
Performance stackPerformance stack
Performance stack
 
Matt Rechenburg - Save big bucks with Cloud Computing
Matt Rechenburg - Save big bucks with Cloud ComputingMatt Rechenburg - Save big bucks with Cloud Computing
Matt Rechenburg - Save big bucks with Cloud Computing
 
Why The Cloud Is A Computational Biologist's Best Friend
Why The Cloud Is A Computational Biologist's Best FriendWhy The Cloud Is A Computational Biologist's Best Friend
Why The Cloud Is A Computational Biologist's Best Friend
 
To Cloud or Not To Cloud?
To Cloud or Not To Cloud?To Cloud or Not To Cloud?
To Cloud or Not To Cloud?
 
GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...
GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...
GRU: Taming a Herd of Wild Servers - Oz Katz, Similarweb - DevOpsDays Tel Avi...
 
Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013Cassandra at eBay - Cassandra Summit 2013
Cassandra at eBay - Cassandra Summit 2013
 
vBrownBag @ VMworld - Apache CloudStack (ACS) & vSphere
vBrownBag @ VMworld - Apache CloudStack (ACS) & vSpherevBrownBag @ VMworld - Apache CloudStack (ACS) & vSphere
vBrownBag @ VMworld - Apache CloudStack (ACS) & vSphere
 
Empowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAPEmpowering Amazon EC2 with GigaSpaces XAP
Empowering Amazon EC2 with GigaSpaces XAP
 
HybridAzureCloud
HybridAzureCloudHybridAzureCloud
HybridAzureCloud
 
Geocloud blue raster web mapping cloud deployment lessons from the field 201...
Geocloud blue raster web mapping cloud deployment  lessons from the field 201...Geocloud blue raster web mapping cloud deployment  lessons from the field 201...
Geocloud blue raster web mapping cloud deployment lessons from the field 201...
 

En vedette

Forecast 2012: Cloud Storm Keynote Andy Stokes
Forecast 2012: Cloud Storm Keynote Andy StokesForecast 2012: Cloud Storm Keynote Andy Stokes
Forecast 2012: Cloud Storm Keynote Andy StokesOpen Data Center Alliance
 
Seeing Shapes in the Cloud - BLUG 2011
Seeing Shapes in the Cloud - BLUG 2011Seeing Shapes in the Cloud - BLUG 2011
Seeing Shapes in the Cloud - BLUG 2011Chris Miller
 
Media Workloads in the Cloud
Media Workloads in the CloudMedia Workloads in the Cloud
Media Workloads in the CloudCloudSigma
 
Technologent-CS-Disaster_Recovery_Business_Continuity-Energy
Technologent-CS-Disaster_Recovery_Business_Continuity-EnergyTechnologent-CS-Disaster_Recovery_Business_Continuity-Energy
Technologent-CS-Disaster_Recovery_Business_Continuity-EnergyLorraine Barrows
 
Road towards multicloud
Road towards multicloudRoad towards multicloud
Road towards multicloudTarmo Ploom
 
Your compelling 60 second elevator pitch
Your compelling 60 second elevator pitchYour compelling 60 second elevator pitch
Your compelling 60 second elevator pitchScott Hammond
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterAltoros
 
CloudLightning - Multiclouds: Challenges and Current Solutions
CloudLightning - Multiclouds: Challenges and Current SolutionsCloudLightning - Multiclouds: Challenges and Current Solutions
CloudLightning - Multiclouds: Challenges and Current SolutionsCloudLightning
 
How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16
How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16
How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16AppDynamics
 

En vedette (11)

Forecast 2012: Cloud Storm Keynote Andy Stokes
Forecast 2012: Cloud Storm Keynote Andy StokesForecast 2012: Cloud Storm Keynote Andy Stokes
Forecast 2012: Cloud Storm Keynote Andy Stokes
 
Seeing Shapes in the Cloud - BLUG 2011
Seeing Shapes in the Cloud - BLUG 2011Seeing Shapes in the Cloud - BLUG 2011
Seeing Shapes in the Cloud - BLUG 2011
 
Media Workloads in the Cloud
Media Workloads in the CloudMedia Workloads in the Cloud
Media Workloads in the Cloud
 
Technologent-CS-Disaster_Recovery_Business_Continuity-Energy
Technologent-CS-Disaster_Recovery_Business_Continuity-EnergyTechnologent-CS-Disaster_Recovery_Business_Continuity-Energy
Technologent-CS-Disaster_Recovery_Business_Continuity-Energy
 
Road towards multicloud
Road towards multicloudRoad towards multicloud
Road towards multicloud
 
OpenCms Days 2012 - OpenCms on open clouds
OpenCms Days 2012 - OpenCms on open cloudsOpenCms Days 2012 - OpenCms on open clouds
OpenCms Days 2012 - OpenCms on open clouds
 
Your compelling 60 second elevator pitch
Your compelling 60 second elevator pitchYour compelling 60 second elevator pitch
Your compelling 60 second elevator pitch
 
How to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop ClusterHow to Increase Performance of Your Hadoop Cluster
How to Increase Performance of Your Hadoop Cluster
 
An approach for migrating applications to interoperability cloud
An approach for migrating applications to interoperability cloudAn approach for migrating applications to interoperability cloud
An approach for migrating applications to interoperability cloud
 
CloudLightning - Multiclouds: Challenges and Current Solutions
CloudLightning - Multiclouds: Challenges and Current SolutionsCloudLightning - Multiclouds: Challenges and Current Solutions
CloudLightning - Multiclouds: Challenges and Current Solutions
 
How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16
How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16
How Choice Hotels Aligned IT and Business Through Common Metrics - AppSphere16
 

Similaire à Big Data in a Public Cloud

Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusJakob Karalus
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudChris Dagdigian
 
Managing Performance in the Cloud
Managing Performance in the CloudManaging Performance in the Cloud
Managing Performance in the CloudDevOpsGroup
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interactionGovind Kanshi
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
Managing growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsManaging growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsDataWorks Summit
 
Architecting for the cloud scability-availability
Architecting for the cloud scability-availabilityArchitecting for the cloud scability-availability
Architecting for the cloud scability-availabilityLen Bass
 
eHarmony in the Cloud
eHarmony in the CloudeHarmony in the Cloud
eHarmony in the CloudCraig Dickson
 
IBM Power capacity planning
IBM Power capacity planningIBM Power capacity planning
IBM Power capacity planningPavel Hampl
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2Aswini Ashu
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2aswini pilli
 
Virtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowVirtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowAndrew Miller
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Deanna Kosaraju
 

Similaire à Big Data in a Public Cloud (20)

Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 
Managing Performance in the Cloud
Managing Performance in the CloudManaging Performance in the Cloud
Managing Performance in the Cloud
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)Mtc learnings from isv & enterprise (dated - Dec -2014)
Mtc learnings from isv & enterprise (dated - Dec -2014)
 
Mtc learnings from isv & enterprise interaction
Mtc learnings from isv & enterprise  interactionMtc learnings from isv & enterprise  interaction
Mtc learnings from isv & enterprise interaction
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Managing growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsManaging growth in Production Hadoop Deployments
Managing growth in Production Hadoop Deployments
 
PaaS with Java
PaaS with JavaPaaS with Java
PaaS with Java
 
Architecting for the cloud scability-availability
Architecting for the cloud scability-availabilityArchitecting for the cloud scability-availability
Architecting for the cloud scability-availability
 
Big Data training
Big Data trainingBig Data training
Big Data training
 
eHarmony in the Cloud
eHarmony in the CloudeHarmony in the Cloud
eHarmony in the Cloud
 
IBM Power capacity planning
IBM Power capacity planningIBM Power capacity planning
IBM Power capacity planning
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Virtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - VarrowVirtualizing Tier One Applications - Varrow
Virtualizing Tier One Applications - Varrow
 
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
Optimal Execution Of MapReduce Jobs In Cloud - Voices 2015
 

Dernier

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Dernier (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Big Data in a Public Cloud

  • 1. Big Data in a public cloud Maximize RAM without paying for overhead CPU and other price/ performance tricks
  • 2. Big Data • Big Data: just another way of saying large data sets • All of them are so large its difficult to manage with traditional tools • Distributed computing is an approach to solve that problem • First the data needs to be Mapped • Then it can be analyzed – or Reduced
  • 3. MapReduce • Instead of talking about it, lets just see some code
  • 5. But that’s perfect for public cloud!
  • 6. CPU / RAM • Mapping – is not CPU intensive • Reducing – is (usually) not CPU intensive • Speed: load data in RAM or it hits the HDD and creates iowait
  • 7. First point – a lot of RAM
  • 8. Ah…now they have RAM instances
  • 9. I wanted more RAM I said L
  • 10. What about MapReduce PaaS • Be aware of data lock in • Be aware of forced tool set – limits your workflow • …and I just don’t like it so much as a developer; it’s a control thing
  • 13. What is compute like – why commodity?
  • 14. Now lets get to work • Chef & Puppet – deploy and config • Clone; scale up and down • Problem solved?
  • 15. And then there is reality…iowait
  • 16. ioWAIT • 1,000 iops on a high end enterprise HDD • 500,000 iops on a high end SSD • Public Cloud should be forbidden on HDD
  • 18. Who and How does workload? • CERN – the LHC experiments to find the Higgs Boson • 200 Petabytes per year • 200,000 CPU days / day on hundreds of partner grids and clouds • Montecarlo simulations (CPU intensive)
  • 19. The CERN case • Golden VM images with tools, from which they clone • A set of coordinator servers, which scales the worker nodes up and down via provisioning API (clone, puppet config) • Coordinator servers manages workload on each worker node doing Montecarlo simulations • Federated cloud brokers such as Enstratus and Slipstream • Self healing architecture – uptime of a worker node is not critical; just spin up a new worker node instance
  • 20. The loyalty program case • The customer runs various loyalty programs • Plain vanilla Hadoop Map/Reduce: • Chef and Puppet to deploy and config worker nodes • Lots of RAM, little CPU • Self healing architecture – uptime of a worker node is not critical; just spin up a new worker node instance
  • 21. Find IaaS suppliers with • Unbundled resources • Short billing cycles • New equipment: what’s your life cycle? • SSD as compute storage • Allows cross connects from your equipment in the DC • CloudSigma, ElasticHost, ProfitBricks to mention a few