Petabytes and Nanoseconds 
Distributed Data Storage and the CAP Theorem
FIN talk 
Robert Greiner 
Nathan Murray 
August 21, 2014
CHAPTER 
The Problems 
Your phone can add two numbers in the same time it takes light to travel one foot 
All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage
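The one-foot figure can be checked with a quick calculation. The sketch below uses the speed of light in vacuum, so it gives a lower bound (signals in real cable travel more slowly); the 30-metre cable length is a hypothetical value chosen only for illustration.

```python
# Speed of light in vacuum (exact by SI definition) and the length of one foot.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FOOT_IN_METERS = 0.3048


def light_travel_time_ns(distance_m: float) -> float:
    """Return the time, in nanoseconds, for light to cover distance_m in vacuum."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S * 1e9


one_foot_ns = light_travel_time_ns(FOOT_IN_METERS)
print(f"Light covers one foot in about {one_foot_ns:.3f} ns")

# Hypothetical example: 30 m of extra cable adds at least this much one-way delay.
extra_cable_ns = light_travel_time_ns(30.0)
print(f"30 m of extra cable costs at least {extra_cable_ns:.1f} ns")
```

A delay of roughly one nanosecond per foot is why equal cable lengths matter when trades are decided on sub-microsecond timing.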
A Common Scenario 
Web 
Application 
RDBMS 
+ =
The Solution: Scale All the Things!!1
Why should we scale?
Throughput 
Latency 
Storage 
Reliability
The Solution? 
Add a load balancer 
Add more web servers 
Tune the DB: indexes, stored procedures, etc.
There’s a new bottleneck
Generally, an RDBMS can become a bottleneck at around 10K transactions per second
Next Step… Distribute Your Data 
Each web server can talk to any data storage node 
Nodes distribute queries and replicate data – lots more complexity!
Cluster = Additional Complexity
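A minimal sketch of what "distribute queries and replicate data" can look like, assuming a fixed list of nodes, hash-based placement, and a replication factor of two. The names `Cluster`, `nodes_for_key`, and the node labels are hypothetical, and simple modulo placement is used only for brevity; production systems often use consistent hashing instead.

```python
import hashlib


def nodes_for_key(key: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Pick distinct nodes for a key: a primary chosen by hash, then its successors."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


class Cluster:
    """Toy in-memory cluster: every write goes to all replicas that own the key."""

    def __init__(self, node_names, replicas=2):
        self.node_names = list(node_names)
        self.replicas = replicas
        self.storage = {name: {} for name in self.node_names}

    def put(self, key, value):
        # Any web server can call this; placement depends only on the key.
        for name in nodes_for_key(key, self.node_names, self.replicas):
            self.storage[name][key] = value

    def get(self, key):
        # Read from the first owning node that has the key.
        for name in nodes_for_key(key, self.node_names, self.replicas):
            if key in self.storage[name]:
                return self.storage[name][key]
        raise KeyError(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.put("customer:42", {"name": "Ada"})
print(nodes_for_key("customer:42", cluster.node_names), cluster.get("customer:42"))
```

Even this toy version hints at the added complexity: placement, replication, and what to do when a replica cannot be reached all become the application's problem.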
Enter the CAP Theorem! 
This guy created the CAP Theorem 
This guy’s VP invented the internet
CAP Theorem: Defined 
Within a distributed system, you can only make two of the following three guarantees across a write/read pair
Guarantee 1: Consistency 
If a value is written and then fetched, I will always get back the new value
Note: not the same as the C in ACID!
Guarantee 2: Availability 
If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. 
Note: not the same as the A in HA!
Guarantee 3: Partition Tolerance 
The system will continue to function when network partitions occur – OOP != NP.
Note: nothing to do with BAC!
CAP Triangle 
The CAP Theorem is explained as a triangle 
C, A or P: Pick two 
This is true in practice, except…
When choosing a distributed system… 
vs.
… You Can’t Sacrifice Partition Tolerance! 
NOT Distributed
(a.k.a. NOT Partition Tolerant)
Available 
AND 
Consistent 
Distributed 
(a.k.a. Partition Tolerant) 
Available 
OR 
Consistent 
CP vs. AP
CP: Synchronous. Waits until the partition heals or times out.
AP: Asynchronous. Always returns a reasonable response.
CP example: At a bank, you get a deposit receipt after the work is complete
AP example: At a coffee shop, you get a receipt before the work is complete
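The difference can be sketched with a toy two-replica store. This is an illustration under simplifying assumptions, not how any particular database works: the class names, the `partitioned` flag, and the immediate `TimeoutError` (standing in for "wait until the partition heals or times out") are all hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy register replicated on two nodes; mode is "CP" or "AP"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica("left"), Replica("right")]
        self.partitioned = False  # True means the replicas cannot talk to each other

    def write(self, replica_index, value):
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            for replica in self.replicas:
                replica.value = value
            return "ok"
        if self.mode == "CP":
            # Refuse the write rather than let the replicas diverge.
            raise TimeoutError("partition: write could not be replicated")
        # AP: accept the write locally and report success.
        self.replicas[replica_index].value = value
        return "ok"

    def read(self, replica_index):
        # In CP mode no write is accepted during a partition, so replicas still agree.
        return self.replicas[replica_index].value


ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")                 # succeeds, like the coffee shop receipt
print(ap.read(0), ap.read(1))     # the second replica still serves the older value
```

In CP mode the same write raises `TimeoutError` during the partition, which matches the bank: no receipt until the work is actually done.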
CHAPTER 
When do companies care?
Companies care about internet scale
Distributed Storage Past 
2004: Google’s MapReduce paper published
2006: Google’s Bigtable paper published
2007: Amazon’s Dynamo paper published
2008: Yahoo runs search on Hadoop
2008: Facebook open sources Cassandra
2008: Bitcoin paper published
2009: Yahoo open sources Hadoop
2010: Azure Table Storage released
2012: Amazon releases DynamoDB inside AWS
2012: Google’s Spanner and F1 papers
2014: Google’s Mesa paper published
2015: ????
Looking forward 
• Open source implementations of more sophisticated storage systems
• Managed services with more advanced capabilities
• Google Cloud versions of F1, Spanner, or Mesa?
• NoSQL + SQL
• Distributed data storage in untrusted environments
CHAPTER 
How does this affect me?
Even our most “legacy” clients are already starting to care about internet scale
Scenario 
Client = Energy Retailer (Independent Sales Force) 
Sales Agent captures info about potential customer 
Price generated on-demand based on daily rate curve 
Quote no longer valid at midnight 
Each night, rates are updated based on new rate-curve 
Used to take 4 hours
Now takes > 24 hours (due to increased demand)
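A rough sizing sketch for this scenario, assuming the nightly rate update can be split evenly across independent workers. The 4-hour target window, the `efficiency` factor, and the function name are assumptions for illustration; the slides give only the old and new run times, and 24 hours is a lower bound on the current one.

```python
import math


def workers_needed(single_worker_hours: float, window_hours: float,
                   efficiency: float = 1.0) -> int:
    """Estimate how many parallel workers finish the job inside the window.

    efficiency is the fraction of ideal linear speedup actually achieved (0 < e <= 1).
    """
    if single_worker_hours < 0 or window_hours <= 0:
        raise ValueError("hours must be non-negative and the window positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return max(1, math.ceil(single_worker_hours / (window_hours * efficiency)))


# A job that takes at least 24 hours on one worker, squeezed back into a 4-hour window.
print(workers_needed(24, 4))        # ideal linear scaling
print(workers_needed(24, 4, 0.8))   # assuming 80% parallel efficiency
```

With ideal scaling this gives 6 workers, and 8 at 80% efficiency, which is one way to frame the later choice between scaling up and scaling out.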
Current State
Solution Strategy 
Assess
• Analyze business performance needs
• Select non-performing work streams
• Filter – (Could/Should)
• Prioritize
• Performance Baseline / Load Test
Strategize
• Identify Bottlenecks (CPU/RAM/Network)
• Optimization strategy
• Technology Selection
Implement
• POC
• Load Test
• Optimize
• Build
Optimize Code 
Scale Up 
Scale Out 
Managed Service
Optimize Code (Level 1)
Least organizational impact
No architecture changes required
Use existing development processes
Risky – Code may be fine
Expensive – Dev Resources
Time Consuming – Dev + Deploy
Scale Up (Level 2)
Easiest solution 
Utilize existing infrastructure 
Little/no architecture changes 
Low probability of network partitions 
May not solve the problem long-term 
Hardware limitations 
Non-linear improvement (2x RAM != 2x Performance) 
C/A
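One way to see why doubling a resource rarely doubles performance is Amdahl's law, which is not named in the slides and is used here only as an illustration. The sketch assumes, hypothetically, that the upgrade doubles the speed of the part of the workload it affects and leaves the rest unchanged.

```python
def overall_speedup(improved_fraction: float, component_speedup: float) -> float:
    """Amdahl's law: whole-system speedup when only part of the runtime gets faster."""
    if not 0.0 <= improved_fraction <= 1.0:
        raise ValueError("improved_fraction must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / component_speedup)


# Hypothetical workload: half of the runtime benefits from the upgrade and runs 2x faster.
print(round(overall_speedup(0.5, 2.0), 2))  # about 1.33, not 2
```

The untouched half of the workload caps the gain, which is the sense in which scale-up improvements are non-linear.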
Scale Out (Level 3)
Highest throughput
Improved system up-time
No single point of failure
Linear performance increases
Use commodity hardware – Hard to scale up CPU
Increased infrastructure / system complexity
Increased probability of network partitions
Automation complexity
A/C
Managed Service (Level 4)
Low barrier to entry 
No additional hardware investment required 
Treat as extension of existing data center 
Appliance configuration 
Globally redundant (cloud) 
Most organizational change 
Less control and customization 
Built-in redundancy and innovation 
C/A 
A/C
Optimize Code (Level 1)
• Least organizational impact
• No architecture changes required
• Use existing development processes
• Risky – Code may be fine
• Expensive – Dev Resources
• Time Consuming – Dev + Deploy
Scale Up (Level 2)
• Easiest solution
• Utilize existing infrastructure
• Little/no architecture changes
• Reduce probability of network partitions
• May not solve the problem long-term
• Hardware limitations
• Non-linear improvement
Scale Out (Level 3)
• Highest throughput
• Improved system up-time
• No single point of failure
• Linear performance increases
• Use commodity hardware
• Increased infrastructure / system complexity
• Increased probability of network partitions
• Automation complexity
Managed Service (Level 4)
• Low barrier to entry
• No additional hardware investment required
• Treat as extension of existing data center
• Appliance configuration
• Globally redundant (cloud)
• Most organizational change
• Less control and customization
• High innovation
Pick One (Or More!)
First Attempt
Good Enough?
Taking It to the Next Level
The Best Solution?
What Would YOU Do?
Fin’ 
robert.greiner@parivedasolutions.com 
nathan.murray@parivedasolutions.com

Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Petabytes and Nanoseconds

  • 1. Petabytes and Nanoseconds: Distributed Data Storage and the CAP Theorem. FIN talk. Robert Greiner, Nathan Murray. August 21, 2014
  • 2. CHAPTER: The Problems. Your phone can add two numbers in the same time it takes light to travel one foot. All high-frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage.
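
The "one foot" figure on slide 2 can be checked with a short calculation. This is a sketch that assumes "the same time" means roughly one nanosecond (about one cycle of a 1 GHz processor); the slide does not state the interval, and the function name is made up for illustration.

```python
# Speed of light in a vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458
# Length of the international foot in metres (exact by definition).
METRES_PER_FOOT = 0.3048


def light_travel_feet(seconds: float) -> float:
    """Distance light covers in a vacuum during `seconds`, in feet."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METRES_PER_FOOT


# Assumption: one nanosecond stands in for "the time to add two numbers".
one_ns_feet = light_travel_feet(1e-9)
print(f"{one_ns_feet:.3f} ft per nanosecond")
```

The result is just under one foot, which is why equal cable lengths matter when latency is measured in nanoseconds.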
  • 3. A Common Scenario: Web Application + RDBMS
  • 4. The Solution: Scale All the Things!!1
  • 5. Why should we scale? Throughput; Latency; Storage; Reliability
  • 6. The Solution? Add a load balancer; add more web servers; tune the DB (indexes, stored procedures (SPs), etc.)
  • 7. There’s a new bottleneck: generally, an RDBMS can become a bottleneck around 10K transactions per second
  • 8. Next Step… Distribute Your Data. Each web server can talk to any data storage node. Nodes distribute queries and replicate data: lots more complexity!
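
One way to picture "nodes distribute queries and replicate data" is a placement function that maps each key to the nodes holding its copies. The sketch below is an illustration only: the slides do not say how any particular system places data, and the function name, node names, and replication factor are all assumptions.

```python
import hashlib


def replica_nodes(key: str, nodes: list[str], replication_factor: int = 2) -> list[str]:
    """Return the nodes that store `key`: a hash-chosen primary plus its successors."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    # Hash the key so that placement is deterministic and spread across nodes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(nodes)
    # Replicas go to the next nodes in list order, wrapping around at the end.
    return [nodes[(primary + i) % len(nodes)] for i in range(replication_factor)]


# Hypothetical three-node cluster; any web server can compute the same answer.
nodes = ["node-a", "node-b", "node-c"]
owners = replica_nodes("customer:42", nodes)
print(owners)
```

Because every web server computes the same placement, any of them can route a query to a node that holds the key. Keeping those copies in agreement when the network misbehaves is the extra complexity the slide refers to.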
  • 9. Cluster = Additional Complexity
  • 10. Enter the CAP Theorem! This guy created the CAP Theorem. This guy’s VP invented the internet.
  • 11. CAP Theorem: Defined. Within a distributed system, you can only make two of the following three guarantees across a write/read pair:
  • 12. Guarantee 1: Consistency. If a value is written, and then fetched, I will always get back the new value. Note: not the same as the C in ACID!
  • 13. Guarantee 2: Availability. If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. Note: not the same as the A in HA!
  • 14. Guarantee 3: Partition Tolerance. The system will continue to function when network partitions occur – OOP != NP. Note: nothing to do with BAC!
  • 15. CAP Triangle. The CAP Theorem is explained as a triangle: C, A or P, pick two. This is true in practice, except…
  • 16. When choosing a distributed system… vs.
  • 17. … You Can’t Sacrifice Partition Tolerance! NOT Distributed (a.k.a. NOT Partition Tolerant): Available AND Consistent. Distributed (a.k.a. Partition Tolerant): Available OR Consistent.
  • 18. CP vs. AP. CP: Synchronous. Waits until partition heals or times out. AP: Asynchronous. Returns a reasonable response always.
  • 19. CP vs. AP. CP: Synchronous. Waits until partition heals or times out. At a bank, you get a deposit receipt after the work is complete. AP: Asynchronous. Returns a reasonable response always. At a coffee shop, you get a receipt before the work is complete.
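
The CP and AP behaviors can be contrasted with a toy two-replica store. This is a minimal sketch, not how any real database is implemented: the class and function names are hypothetical, a partition is simulated with a boolean flag, and the CP write fails immediately to model the "times out" branch instead of waiting.

```python
class PartitionError(Exception):
    """Raised when a CP write cannot reach every replica."""


class Replica:
    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.reachable = True  # False simulates being cut off by a partition


def write_cp(replicas: list[Replica], key: str, value: int) -> None:
    """CP: apply the write everywhere or not at all (the bank receipt)."""
    if not all(r.reachable for r in replicas):
        raise PartitionError("replica unreachable; write rejected")
    for r in replicas:
        r.data[key] = value


def write_ap(replicas: list[Replica], key: str, value: int) -> int:
    """AP: acknowledge after updating whichever replicas can be reached."""
    updated = 0
    for r in replicas:
        if r.reachable:
            r.data[key] = value
            updated += 1
    return updated  # number of replicas that saw the write


a, b = Replica(), Replica()
write_cp([a, b], "balance", 100)   # healthy network: both replicas agree

b.reachable = False                # a partition isolates replica b
try:
    write_cp([a, b], "balance", 150)
    cp_rejected = False
except PartitionError:
    cp_rejected = True             # CP gives up availability

write_ap([a, b], "balance", 150)   # AP succeeds, but b keeps the old value
print(cp_rejected, a.data["balance"], b.data["balance"])
```

After the AP write, a reader served by replica `b` still sees the old balance. That is the stale-but-reasonable response slide 13 allows, and it is exactly what the CP path refuses to produce.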
  • 21. Companies care about internet scale
  • 22. Distributed Storage Past. 2004: Google’s MapReduce paper published; 2006: Google’s Bigtable paper published; 2007: Amazon’s Dynamo paper published; 2008: Yahoo runs search on Hadoop; 2008: Facebook open sources Cassandra; 2008: Bitcoin paper published; 2009: Yahoo open sources Hadoop; 2010: Azure Table Storage released; 2012: Google’s Spanner and F1 papers; 2013: Amazon releases DynamoDB inside AWS; 2014: Google’s Mesa paper published; 2015: ????
  • 23. Looking forward. Open source implementations of more sophisticated storage systems; managed services with more advanced capabilities; Google Cloud versions of F1, Spanner, or Mesa?; NoSQL + SQL; distributed data storage in untrusted environments.
  • 24. CHAPTER: How does this affect me?
  • 25. Even our most “legacy” clients are already starting to care about internet scale.
  • 27. Scenario Client = Energy Retailer (Independent Sales Force) Sales Agent captures info about potential customer Price generated on-demand based on daily rate curve Quote no longer valid at midnight Each night, rates are updated based on new rate-curve Used to take 4hours Now takes > 24hours (Due to increased demand)
  • 29. Solution Strategy Assess •Analyze business performance needs •Select non-performing work streams •Filter –(Could/Should) •Prioritize •Performance Baseline / Load Test Strategize •Identify Bottlenecks (CPU/RAM/Network) •Optimization strategy •Technology Selection Implement •POC •Load Test •Optimize •Build
  • 30. Optimize Code; Scale Up; Scale Out; Managed Service
  • 31. Optimize Code (Level 1). Least organizational impact; no architecture changes required; use existing development processes; risky (code may be fine); expensive (dev resources); time consuming (dev + deploy).
  • 32. Scale Up (Level 2). Easiest solution; utilize existing infrastructure; little/no architecture changes; low probability of network partitions; may not solve the problem long-term; hardware limitations; non-linear improvement (2x RAM != 2x Performance). C/A.
  • 33. Scale Out (Level 3). Highest throughput; improved system up-time; no single point of failure; linear performance increases; use commodity hardware (hard to scale up CPU); increased infrastructure / system complexity; increased probability of network partitions; automation complexity. A/C.
  • 34. Managed Service (Level 4). Low barrier to entry; no additional hardware investment required; treat as extension of existing data center; appliance configuration; globally redundant (cloud); most organizational change; less control and customization; built-in redundancy and innovation. C/A, A/C.
  • 35. Optimize Code (Level 1): least organizational impact; no architecture changes required; use existing development processes; risky (code may be fine); expensive (dev resources); time consuming (dev + deploy). Scale Up (Level 2): easiest solution; utilize existing infrastructure; little/no architecture changes; reduce probability of network partitions; may not solve the problem long-term; hardware limitations; non-linear improvement. Scale Out (Level 3): highest throughput; improved system up-time; no single point of failure; linear performance increases; use commodity hardware; increased infrastructure / system complexity; increased probability of network partitions; automation complexity. Managed Service (Level 4): low barrier to entry; no additional hardware investment required; treat as extension of existing data center; appliance configuration; globally redundant (cloud); most organizational change; less control and customization; high innovation. Pick One (Or More!)
  • 38. Taking It to the Next Level