SlideShare une entreprise Scribd logo
1  sur  41
Télécharger pour lire hors ligne
Petabytes and Nanoseconds 
Distributed Data Storage andthe CAP Theorem 
FIN talk 
Robert Greiner 
Nathan Murray 
August 21,2014
CHAPTER 
The Problems 
Your phone can add two numbers in the same time it takes light to travel one foot 
All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage
A Common Scenario 
Web 
Application 
RDBMS 
+ =
The Solution: Scale All the Things!!1
Why shouldwe scale? 
Throughput 
Latency 
Storage 
Reliability
The Solution? 
Add a load balancer 
Add more web servers 
Tune the DB. Indexes,SPs, etc.
There’sa new bottleneck 
Generally an RDBMS can becomea bottleneck around 10K transactions per second
Next Step… Distribute Your Data 
Each web server can talk to any data storage node 
Nodes distribute queries and replicate data – lots more complexity!
Cluster = Additional Complexity
Enter the CAP Theorem! 
This guy created the CAP Theorem 
This guy’s 
VP Invented the internet
CAP Theorem: Defined 
Within a distributed system, you can only make two of the following three guarantees across a write/read pair
Guarantee 1: Consistency 
If a value is written, and then fetched, I will alwaysget back the new value 
Note: not the same as the C in ACID! 
_
Guarantee 2: Availability 
If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. 
_ 
Note: not the same as the A in HA!
Guarantee 3: Partition Tolerance 
The system will continue to function when network partitions occur –OOP != NP. 
_ 
Note: nothing to do with BAC!
CAP Triangle 
The CAP Theorem is explained as a triangle 
C, A or P: Pick two 
This is true in practice, except…
When choosing a distributed system… 
vs.
… You Can’t Sacrifice Partition Tolerance! 
NOTDistributed 
(a.k.a. NOTPartition Tolerant) 
Available 
AND 
Consistent 
Distributed 
(a.k.a. Partition Tolerant) 
Available 
OR 
Consistent 
_ 
_
CPvs. AP 
Synchronous. 
Waits until partition heals or times out. 
Asynchronous. 
Returns a reasonable response always.
CPvs. AP 
Synchronous. 
Waits until partition heals or times out. 
Asynchronous. 
Returns a reasonable response always. 
At a bank, you get a deposit receipt afterthe work is complete 
At a coffee shop, you get a receipt beforethe work is complete
CHAPTER 
Whendo companies care?
Companies care about internetscale
Distributed Storage Past 
2004 
Google’s Map Reduce paper published 
2006 
Google’s Big Table paper published 
2007 
Amazon’s Dynamo paper published 
2008 
Yahoo runs search on Hadoop 
2008 
Facebook open sources Cassandra 
2008 
Bitcoin paper published 
2009 
Yahoo open sources Hadoop 
2010 
Azure Table Storage released 
2012 
Google’s Spanner and F1 papers 
2013 
Amazon releases DynamoDB inside AWS 
2014 
Google’s Mesa paper published 
2015 
????
Looking forward 
•Open source implementations of more sophisticated storage systems 
•Managed services with more advanced capabilities 
•Google Cloud versions of F1, Spanner, or Mesa? 
•NoSQL + SQL 
•Distributed data storage in untrusted environments
CHAPTER 
How does this affect me
Even our most “legacy” clients are already starting to care about internet scale: 
_
Scenario 
Client = Energy Retailer (Independent Sales Force) 
Sales Agent captures info about potential customer 
Price generated on-demand based on daily rate curve 
Quote no longer valid at midnight 
Each night, rates are updated based on new rate-curve 
Used to take 4hours 
Now takes > 24hours (Due to increased demand)
Current State
Solution Strategy 
Assess 
•Analyze business performance needs 
•Select non-performing work streams 
•Filter –(Could/Should) 
•Prioritize 
•Performance Baseline / Load Test 
Strategize 
•Identify Bottlenecks (CPU/RAM/Network) 
•Optimization strategy 
•Technology Selection 
Implement 
•POC 
•Load Test 
•Optimize 
•Build
Optimize Code 
Scale Up 
Scale Out 
Managed Service
Optimize CodeLevel 1 
Least organizational impact 
No architecture changes required 
Use existing development processes 
Risky –Code may be fine 
Expensive –Dev Resources 
Time Consuming –Dev + Deploy
Scale UpLevel 2 
Easiest solution 
Utilize existing infrastructure 
Little/no architecture changes 
Low probability of network partitions 
May not solve the problem long-term 
Hardware limitations 
Non-linear improvement (2x RAM != 2x Performance) 
C/A
Scale OutLevel 3 
Highest throughput 
Improved system up-time 
No single point of failure 
Linear performance increases 
Use commodity hardware –Hard to scale-up CPU 
Increased infrastructure / system complexity 
Increased probability of network partitions 
Automation complexity 
A/C
Managed ServiceLevel 4 
Low barrier to entry 
No additional hardware investment required 
Treat as extension of existing data center 
Appliance configuration 
Globally redundant (cloud) 
Most organizational change 
Less control and customization 
Built-in redundancy and innovation 
C/A 
A/C
Optimize Code(Level 1) 
•Least organizational impact 
•No architecture changes required 
•Use existing development processes 
•Risky –Code may be fine 
•Expensive –Dev Resources 
•Time Consuming –Dev + Deploy 
Scale Up(Level 2) 
•Easiest solution 
•Utilize existing infrastructure 
•Little/no architecture changes 
•Reduce probability of network partitions 
•May not solve the problem long-term 
•Hardware limitations 
•Non-linear improvement 
Scale Out(Level 3) 
•Highest throughput 
•Improved system up-time 
•No single point of failure 
•Linear performance inc. 
•Use commodity hardware 
•Increased infrastructure / system complexity 
•Increased probability of network partitions 
•Automation complexity 
Managed Service(Level 4) 
•Low barrier to entry 
•No additional hardware investment required 
•Treat as extension of existing data center 
•Appliance configuration 
•Globally redundant (cloud) 
•Most organizational change 
•Less control and customization 
•High innovation 
Pick One (Or More!)
First Attempt
Good Enough?
Taking It to the Next Level
The Best Solution?
What Would YOUDo?
Fin’ 
robert.greiner@parivedasolutions.com 
nathan.murray@parivedasolutions.com

Contenu connexe

Tendances

Building enterprise class disaster recovery as a service to aws - session spo...
Building enterprise class disaster recovery as a service to aws - session spo...Building enterprise class disaster recovery as a service to aws - session spo...
Building enterprise class disaster recovery as a service to aws - session spo...Amazon Web Services
 
Keeping Security In-Step with your Application Demand Curve
Keeping Security In-Step with your Application Demand CurveKeeping Security In-Step with your Application Demand Curve
Keeping Security In-Step with your Application Demand CurveAmazon Web Services
 
Azure intelligent edge solutions overview
Azure intelligent edge solutions overviewAzure intelligent edge solutions overview
Azure intelligent edge solutions overviewCenk Ersoy
 
Introduction to RightScale
Introduction to RightScaleIntroduction to RightScale
Introduction to RightScaleAkelios
 
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...RightScale
 
Trends in Cloud and Mobile Computing - Alain Azagury, IBM
Trends in Cloud and Mobile Computing - Alain Azagury, IBMTrends in Cloud and Mobile Computing - Alain Azagury, IBM
Trends in Cloud and Mobile Computing - Alain Azagury, IBMCodemotion Tel Aviv
 
How to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First TimeHow to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First TimeDavid Linthicum
 
How We end the Walking Dead in the Enterprise - Session Sponsored by Versent
How We end the Walking Dead in the Enterprise - Session Sponsored by VersentHow We end the Walking Dead in the Enterprise - Session Sponsored by Versent
How We end the Walking Dead in the Enterprise - Session Sponsored by VersentAmazon Web Services
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Tom Laszewski
 
Best Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash StorageBest Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash StorageRyan Snell
 
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWSAWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWSAmazon Web Services
 
3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero 3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero Amazon Web Services
 
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...Amazon Web Services
 
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016Amazon Web Services
 
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...Amazon Web Services
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersAmazon Web Services
 
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...Codit
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsAmazon Web Services
 
Microsoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureMicrosoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureDavid Chou
 

Tendances (20)

Building enterprise class disaster recovery as a service to aws - session spo...
Building enterprise class disaster recovery as a service to aws - session spo...Building enterprise class disaster recovery as a service to aws - session spo...
Building enterprise class disaster recovery as a service to aws - session spo...
 
Keeping Security In-Step with your Application Demand Curve
Keeping Security In-Step with your Application Demand CurveKeeping Security In-Step with your Application Demand Curve
Keeping Security In-Step with your Application Demand Curve
 
Azure intelligent edge solutions overview
Azure intelligent edge solutions overviewAzure intelligent edge solutions overview
Azure intelligent edge solutions overview
 
Introduction to RightScale
Introduction to RightScaleIntroduction to RightScale
Introduction to RightScale
 
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
 
Trends in Cloud and Mobile Computing - Alain Azagury, IBM
Trends in Cloud and Mobile Computing - Alain Azagury, IBMTrends in Cloud and Mobile Computing - Alain Azagury, IBM
Trends in Cloud and Mobile Computing - Alain Azagury, IBM
 
How to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First TimeHow to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First Time
 
How We end the Walking Dead in the Enterprise - Session Sponsored by Versent
How We end the Walking Dead in the Enterprise - Session Sponsored by VersentHow We end the Walking Dead in the Enterprise - Session Sponsored by Versent
How We end the Walking Dead in the Enterprise - Session Sponsored by Versent
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Best Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash StorageBest Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash Storage
 
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWSAWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
 
3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero 3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero
 
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
 
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
 
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users
 
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital Markets
 
AWS Architecting In The Cloud
AWS Architecting In The CloudAWS Architecting In The Cloud
AWS Architecting In The Cloud
 
Microsoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureMicrosoft Cloud Services Architecture
Microsoft Cloud Services Architecture
 

En vedette

Fin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIsFin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIsRobert Greiner
 
Test Driven Development at 10,000 Feet
Test Driven Development at 10,000 FeetTest Driven Development at 10,000 Feet
Test Driven Development at 10,000 FeetRobert Greiner
 
Code Quality and Tipster
Code Quality and TipsterCode Quality and Tipster
Code Quality and TipsterRobert Greiner
 
Automated Testing for Websites With Selenium IDE
Automated Testing for Websites With Selenium IDEAutomated Testing for Websites With Selenium IDE
Automated Testing for Websites With Selenium IDERobert Greiner
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web ServicesRobert Greiner
 

En vedette (6)

Fin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIsFin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIs
 
Testing javascript
Testing javascriptTesting javascript
Testing javascript
 
Test Driven Development at 10,000 Feet
Test Driven Development at 10,000 FeetTest Driven Development at 10,000 Feet
Test Driven Development at 10,000 Feet
 
Code Quality and Tipster
Code Quality and TipsterCode Quality and Tipster
Code Quality and Tipster
 
Automated Testing for Websites With Selenium IDE
Automated Testing for Websites With Selenium IDEAutomated Testing for Websites With Selenium IDE
Automated Testing for Websites With Selenium IDE
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web Services
 

Similaire à Petabytes and Nanoseconds

Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Migration to the cloud
Migration to the cloudMigration to the cloud
Migration to the cloudEPAM Systems
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...confluent
 
Best Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBayBest Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBayRandy Shoup
 
ScalabilityAvailability
ScalabilityAvailabilityScalabilityAvailability
ScalabilityAvailabilitywebuploader
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...confluent
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestRodolfo Kohn
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...DataStax Academy
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goalskamaelian
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Coursejimliddle
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And ScalabilityJason Ragsdale
 
Telehouse Enhanced Connect slide share
Telehouse Enhanced Connect  slide shareTelehouse Enhanced Connect  slide share
Telehouse Enhanced Connect slide shareTelehouse Europe
 
5 Quick Wins for the Cloud
5 Quick Wins for the Cloud5 Quick Wins for the Cloud
5 Quick Wins for the CloudRightScale
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLTriNimbus
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
Why Distributed Databases?
Why Distributed Databases?Why Distributed Databases?
Why Distributed Databases?Sargun Dhillon
 

Similaire à Petabytes and Nanoseconds (20)

Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Migration to the cloud
Migration to the cloudMigration to the cloud
Migration to the cloud
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Best Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBayBest Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBay
 
ScalabilityAvailability
ScalabilityAvailabilityScalabilityAvailability
ScalabilityAvailability
 
NoSQL
NoSQLNoSQL
NoSQL
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Master.pptx
Master.pptxMaster.pptx
Master.pptx
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And Scalability
 
Telehouse Enhanced Connect slide share
Telehouse Enhanced Connect  slide shareTelehouse Enhanced Connect  slide share
Telehouse Enhanced Connect slide share
 
Introduction
IntroductionIntroduction
Introduction
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
 
5 Quick Wins for the Cloud
5 Quick Wins for the Cloud5 Quick Wins for the Cloud
5 Quick Wins for the Cloud
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Why Distributed Databases?
Why Distributed Databases?Why Distributed Databases?
Why Distributed Databases?
 

Plus de Robert Greiner

Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...Robert Greiner
 
Virtual Team Best Practices
Virtual Team Best PracticesVirtual Team Best Practices
Virtual Team Best PracticesRobert Greiner
 
Becoming the Ideal Team Player
Becoming the Ideal Team PlayerBecoming the Ideal Team Player
Becoming the Ideal Team PlayerRobert Greiner
 
POV - Practical Containerization
POV - Practical ContainerizationPOV - Practical Containerization
POV - Practical ContainerizationRobert Greiner
 
POV - Enterprise Security Canvas
POV - Enterprise Security CanvasPOV - Enterprise Security Canvas
POV - Enterprise Security CanvasRobert Greiner
 
Foundations of financial independence
Foundations of financial independenceFoundations of financial independence
Foundations of financial independenceRobert Greiner
 
Why feedback is important
Why feedback is importantWhy feedback is important
Why feedback is importantRobert Greiner
 
Infrastructure as Code
Infrastructure as CodeInfrastructure as Code
Infrastructure as CodeRobert Greiner
 
Introduction to Windows Azure Data Services
Introduction to Windows Azure Data ServicesIntroduction to Windows Azure Data Services
Introduction to Windows Azure Data ServicesRobert Greiner
 

Plus de Robert Greiner (9)

Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
 
Virtual Team Best Practices
Virtual Team Best PracticesVirtual Team Best Practices
Virtual Team Best Practices
 
Becoming the Ideal Team Player
Becoming the Ideal Team PlayerBecoming the Ideal Team Player
Becoming the Ideal Team Player
 
POV - Practical Containerization
POV - Practical ContainerizationPOV - Practical Containerization
POV - Practical Containerization
 
POV - Enterprise Security Canvas
POV - Enterprise Security CanvasPOV - Enterprise Security Canvas
POV - Enterprise Security Canvas
 
Foundations of financial independence
Foundations of financial independenceFoundations of financial independence
Foundations of financial independence
 
Why feedback is important
Why feedback is importantWhy feedback is important
Why feedback is important
 
Infrastructure as Code
Infrastructure as CodeInfrastructure as Code
Infrastructure as Code
 
Introduction to Windows Azure Data Services
Introduction to Windows Azure Data ServicesIntroduction to Windows Azure Data Services
Introduction to Windows Azure Data Services
 

Dernier

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 

Petabytes and Nanoseconds

  • 1. Petabytes and Nanoseconds Distributed Data Storage andthe CAP Theorem FIN talk Robert Greiner Nathan Murray August 21,2014
  • 2. CHAPTER The Problems Your phone can add two numbers in the same time it takes light to travel one foot All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage
  • 3. A Common Scenario Web Application RDBMS + =
  • 4. The Solution: Scale All the Things!!1
  • 5. Why shouldwe scale? Throughput Latency Storage Reliability
  • 6. The Solution? Add a load balancer Add more web servers Tune the DB. Indexes,SPs, etc.
  • 7. There’sa new bottleneck Generally an RDBMS can becomea bottleneck around 10K transactions per second
  • 8. Next Step… Distribute Your Data Each web server can talk to any data storage node Nodes distribute queries and replicate data – lots more complexity!
  • 9. Cluster = Additional Complexity
  • 10. Enter the CAP Theorem! This guy created the CAP Theorem This guy’s VP Invented the internet
  • 11. CAP Theorem: Defined Within a distributed system, you can only make two of the following three guarantees across a write/read pair
  • 12. Guarantee 1: Consistency If a value is written, and then fetched, I will alwaysget back the new value Note: not the same as the C in ACID! _
  • 13. Guarantee 2: Availability If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. _ Note: not the same as the A in HA!
  • 14. Guarantee 3: Partition Tolerance The system will continue to function when network partitions occur –OOP != NP. _ Note: nothing to do with BAC!
  • 15. CAP Triangle The CAP Theorem is explained as a triangle C, A or P: Pick two This is true in practice, except…
  • 16. When choosing a distributed system… vs.
  • 17. … You Can’t Sacrifice Partition Tolerance! NOTDistributed (a.k.a. NOTPartition Tolerant) Available AND Consistent Distributed (a.k.a. Partition Tolerant) Available OR Consistent _ _
  • 18. CPvs. AP Synchronous. Waits until partition heals or times out. Asynchronous. Returns a reasonable response always.
  • 19. CPvs. AP Synchronous. Waits until partition heals or times out. Asynchronous. Returns a reasonable response always. At a bank, you get a deposit receipt afterthe work is complete At a coffee shop, you get a receipt beforethe work is complete
  • 21. Companies care about internetscale
  • 22. Distributed Storage Past 2004 Google’s Map Reduce paper published 2006 Google’s Big Table paper published 2007 Amazon’s Dynamo paper published 2008 Yahoo runs search on Hadoop 2008 Facebook open sources Cassandra 2008 Bitcoin paper published 2009 Yahoo open sources Hadoop 2010 Azure Table Storage released 2012 Google’s Spanner and F1 papers 2013 Amazon releases DynamoDB inside AWS 2014 Google’s Mesa paper published 2015 ????
  • 23. Looking forward •Open source implementations of more sophisticated storage systems •Managed services with more advanced capabilities •Google Cloud versions of F1, Spanner, or Mesa? •NoSQL + SQL •Distributed data storage in untrusted environments
  • 24. CHAPTER How does this affect me
  • 25. Even our most “legacy” clients are already starting to care about internet scale: _
  • 26.
  • 27. Scenario Client = Energy Retailer (Independent Sales Force) Sales Agent captures info about potential customer Price generated on-demand based on daily rate curve Quote no longer valid at midnight Each night, rates are updated based on new rate-curve Used to take 4hours Now takes > 24hours (Due to increased demand)
  • 29. Solution Strategy Assess •Analyze business performance needs •Select non-performing work streams •Filter –(Could/Should) •Prioritize •Performance Baseline / Load Test Strategize •Identify Bottlenecks (CPU/RAM/Network) •Optimization strategy •Technology Selection Implement •POC •Load Test •Optimize •Build
  • 30. Optimize Code Scale Up Scale Out Managed Service
  • 31. Optimize CodeLevel 1 Least organizational impact No architecture changes required Use existing development processes Risky –Code may be fine Expensive –Dev Resources Time Consuming –Dev + Deploy
  • 32. Scale UpLevel 2 Easiest solution Utilize existing infrastructure Little/no architecture changes Low probability of network partitions May not solve the problem long-term Hardware limitations Non-linear improvement (2x RAM != 2x Performance) C/A
  • 33. Scale OutLevel 3 Highest throughput Improved system up-time No single point of failure Linear performance increases Use commodity hardware –Hard to scale-up CPU Increased infrastructure / system complexity Increased probability of network partitions Automation complexity A/C
  • 34. Managed ServiceLevel 4 Low barrier to entry No additional hardware investment required Treat as extension of existing data center Appliance configuration Globally redundant (cloud) Most organizational change Less control and customization Built-in redundancy and innovation C/A A/C
  • 35. Optimize Code(Level 1) •Least organizational impact •No architecture changes required •Use existing development processes •Risky –Code may be fine •Expensive –Dev Resources •Time Consuming –Dev + Deploy Scale Up(Level 2) •Easiest solution •Utilize existing infrastructure •Little/no architecture changes •Reduce probability of network partitions •May not solve the problem long-term •Hardware limitations •Non-linear improvement Scale Out(Level 3) •Highest throughput •Improved system up-time •No single point of failure •Linear performance inc. •Use commodity hardware •Increased infrastructure / system complexity •Increased probability of network partitions •Automation complexity Managed Service(Level 4) •Low barrier to entry •No additional hardware investment required •Treat as extension of existing data center •Appliance configuration •Globally redundant (cloud) •Most organizational change •Less control and customization •High innovation Pick One (Or More!)
  • 38. Taking It to the Next Level