Cloud Architecture Patterns
Running PostgreSQL at Scale
(when RDS won't do what you need)
Corey Huinker
Corlogic Consulting
March 2018
First, we need a problem to solve.
This is You
You Get An Idea For a Product
You make a product! ...now you have to sell it.
To advertise the product, you need an ad...
...so you talk to an ad agency.
But placing ads has challenges
Need to find websites with visitors who:
● Would want to buy your product
● Are able to buy your product
● Would be drawn in by the creative you have designed
Websites' Claims about their Visitors...
...are not always accurate.
Buying ad-space on websites directly is usually not possible; you need a broker/auction service.
So how do you know that your ad was seen?
Focal points of ad monitoring
● Number of times ad landed on a page (impressions)
● Where on the page did it land?
● Did it fit the space allotted?
● How long did the page stay up?
● Did the viewer interact with the ad in any way?
● Was the viewer a human?
● How do these numbers compare with the claims of the website?
● How do these numbers compare with the claims of the broker?
This creates a lot of data
● Not all impressions phone home (sampling rate varies by contract)
● Sampling events recorded per day (approx): 50 Billion
● Sampling events are chained together to tell the story of that impression.
● Impression data is then aggregated by date, ad campaign, browser
● After aggregation, about 500M rows are left per day.
● Each row has > 125 viewability measures
Capturing The Events
● Pixel servers
● Need to be fast so as not to slow down the user experience (or risk losing event data)
● Need to get log data off the machines ASAP
● Approximately 500 machines
○ Low CPU workload
○ Low disk I/O workload
○ High network bandwidth
○ Low latency
○ Generously over-provisioned
Real-time accumulation and aggregation
● Consumes event logs from pixel servers as fast as possible.
● Each server is effectively a shard of the whole "today" database
● Custom in-memory database updating continuously
● Serving API calls continuously
● Approximately 450 machines
○ CPU load nearly 100%
○ To swap is to die
○ High network bandwidth
○ Low latency
○ Generously over-provisioned
What Didn't Work: MySQL
● Original DB choice
● Performed adequately when daily volume was < 1% of current volume
● Impossible to add new columns to tables
● Easier to create a new shard than to modify an existing one.
● New metrics being added every few weeks, or even days
● Dozens of shards, no consistency in their size
What Didn't Work: Redshift
● Intended to complement MySQL
● Performed adequately when daily volume was < 1% of current volume
● Needed sub-second response, but was getting 30s+
● Was the only system holding a copy of the data across all time
● HDD was slow; tried SSD instances, but they had limited space
● Eventually grew to a 26-node cluster with 32 cores per node
● Could not distinguish a large query from a small one
● Had no insight into how the data was partitioned
● Reorganizing data according to AWS suggestions would have resulted in
vacuums taking several days.
What Didn't Work: Vertica
● Intended to complement MySQL
● Good response times over larger data volumes
● Needed local disk to perform adequately, which limited disk size
● Each cluster could only hold a few months of data
● 5-node clusters, 32 cores each
● Could only have K-safety of 1, or else load took too long (2 hrs vs 10)
● Nodes failed daily, until glibc bug was fixed
● Expensive
What Did Work: Postgres
● Migrated OLTP MySQL DB (which held some DW tables)
● Conversion took 2 weeks with 2 programmers
● Used mysql_fdw to create migration tables (sketch below)
● Triggers on tables to identify modified rows
● Moved read-only workloads to the Postgres instance
● Migrated read-write apps in stages
● Only downtime was in the final cut-over
● Single 32-core EC2 instance with 1-2 physical read replicas
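A minimal sketch of the mysql_fdw setup referenced above. The server, credentials, and table definition are hypothetical; mysql_fdw simply exposes the legacy MySQL tables inside Postgres so they can be copied and queried during the migration.

    -- Hypothetical mysql_fdw setup; names, credentials, and columns are illustrative.
    CREATE EXTENSION mysql_fdw;

    CREATE SERVER legacy_mysql
        FOREIGN DATA WRAPPER mysql_fdw
        OPTIONS (host 'mysql.internal', port '3306');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER legacy_mysql
        OPTIONS (username 'migrator', password 'secret');

    -- Expose a MySQL table inside Postgres for the migration scripts.
    CREATE FOREIGN TABLE legacy_campaigns (
        campaign_id bigint,
        client_id   bigint,
        name        text,
        updated_at  timestamp
    )
    SERVER legacy_mysql
    OPTIONS (dbname 'adsdb', table_name 'campaigns');

    -- Initial bulk copy; incremental syncs re-copy only the trigger-flagged rows.
    CREATE TABLE campaigns AS SELECT * FROM legacy_campaigns;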
What Did Work: Zipfian workloads
● Customers primarily care about data from today and last seven days
● About 85% of all API requests were in that date range
● Vanilla PostgreSQL instance, 32 cores, ample RAM, 5TB disk
● Data partitioned by day; drop any partitions > 10 days old (sketch below)
● Stores derivative data, so no need for backup and recovery strategy
● Focus on loading the data as quickly as possible each morning.
● Adjust apps to be aware that certain clients' data is available earlier than others'
● Codename: L7
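A sketch of the L7 pattern using declarative partitioning, with invented table and column names (in 2018 the original likely relied on an inheritance-based partitioning extension, which is part of why RDS was ruled out):

    -- Illustrative daily-partitioned summary table.
    CREATE TABLE daily_stats (
        day         date   NOT NULL,
        client_id   bigint NOT NULL,
        campaign_id bigint NOT NULL,
        impressions bigint,
        viewable    bigint
    ) PARTITION BY RANGE (day);

    -- One partition per day, created ahead of the morning load.
    CREATE TABLE daily_stats_2018_03_01
        PARTITION OF daily_stats
        FOR VALUES FROM ('2018-03-01') TO ('2018-03-02');

    -- Retention: dropping a whole partition is instant and avoids VACUUM churn.
    DROP TABLE IF EXISTS daily_stats_2018_02_19;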
What Did Work: Getting cute with RAID
● Engineer discovered a quirk in AWS pricing of disk by size
● Could maximize IOPS by combining 30 small drives into a RAID-0
● Same hardware as an L7 could now store ~40 days of data, but data growth meant that figure would shrink over time
● Same strategy as L7, just adjusted for longer date coverage
● Codename:
○ L-Month? Would sound silly when X fell below 30
○ L-More? Accurate but not catchy.
○ L-mo?
○ Elmo
What Did Work: Typeahead search
● "Type-ahead" queries must return in < 100ms
● Such queries can be across arbitrary time range
● Scope of response is limited (screen real estate)
● Engineer discovered that our data compresses really well with TOAST
● Specialized instance to store all data at the highest grain level, TOASTed
● Pseudo-materialized views that aggregate data in search-friendly forms
● Use of "Dimension" tables as a form of compression on the matviews
● Heavy btree_gin indexing on searchable terms and tokens in dimensions (sketch below)
● Single 32-core machine, abundant memory, 2 read replicas
● Rebuild from scratch would take days, so a backup & recovery (B&R) strategy was needed
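A sketch of the search-cluster indexing idea, with an invented matview and dimension table: btree_gin lets a scalar column such as a client id share a GIN index with token arrays.

    CREATE EXTENSION IF NOT EXISTS btree_gin;

    -- Hypothetical search-friendly rollup built from a dimension table.
    CREATE MATERIALIZED VIEW search_terms AS
    SELECT d.client_id,
           d.campaign_id,
           d.campaign_name,
           string_to_array(lower(d.campaign_name), ' ') AS tokens
    FROM   dim_campaign d;

    -- btree_gin lets the scalar client_id live in the same GIN index as the token array.
    CREATE INDEX ON search_terms USING gin (client_id, tokens);

    -- Type-ahead lookup: small result, bounded by screen real estate.
    SELECT campaign_id, campaign_name
    FROM   search_terms
    WHERE  client_id = 42
    AND    tokens @> ARRAY['shoe']
    LIMIT  10;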
What Did Work: TOASTing the Kitchen Sink
● Data usage patterns guaranteed that a client usually wants most of the data
across their org for whatever date range is requested
● Putting such data in arrays guarantees TOASTing and compression (sketch below)
● Compression shifts workload from scarce IOPS to abundant CPU
● Size of array chunks was heavily tuned for the EC2 instance type.
● Same RAID-0 as used in Elmo instance could now hold all customer data
● Five 32-core machines with an ETL load-sharing scheme: each node processes a client/day, then shares it with the other nodes
● Replaced all Redshift and Vertica instances
● Codename: Marjory (the all seeing, all knowing trash heap)
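A sketch of the array layout under invented names: packing each client/day's measures into parallel arrays makes every row one large datum that PostgreSQL TOASTs and compresses, shifting work from scarce IOPS to abundant CPU.

    -- Hypothetical wide-grain storage: one row per client/day/chunk, measures packed as arrays.
    CREATE TABLE client_day_metrics (
        client_id    bigint   NOT NULL,
        day          date     NOT NULL,
        chunk_no     int      NOT NULL,   -- chunk size tuned per EC2 instance type
        campaign_ids bigint[] NOT NULL,   -- parallel arrays, one slot per campaign
        impressions  bigint[] NOT NULL,
        viewable     bigint[] NOT NULL,
        PRIMARY KEY (client_id, day, chunk_no)
    );

    -- Reads unpack the arrays back into rows; most of the I/O is compressed TOAST pages.
    SELECT m.client_id, m.day, u.campaign_id, u.impressions, u.viewable
    FROM   client_day_metrics m,
           unnest(m.campaign_ids, m.impressions, m.viewable)
               AS u(campaign_id, impressions, viewable)
    WHERE  m.client_id = 42
    AND    m.day BETWEEN '2018-03-01' AND '2018-03-07';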
What Did Work: Foreign Data Wrappers
● One FDW converted queries into API calls to the in-memory "today" database
● Another one used query quals to determine the set of client-dates that must be fetched (illustrated below)
● All client data stored on S3 as both .csv.gz and a compressed SQLite db
● The FDW starts a web service and launches one Lambda per SQLite file
● Each Lambda queries its SQLite file and sends results to the web service
● The web service re-issues Lambdas as needed and returns results to the FDW
● Very good for queries across long date ranges
● Codename: Frackles (the name for background monster muppets)
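The Frackles wrapper is custom, so the following is purely illustrative (the FDW name, its options, and the columns are made up); it only shows how the query quals carry the client and date range that the wrapper turns into per-file Lambda invocations.

    -- Purely illustrative; not the real FDW or its options.
    CREATE SERVER frackles FOREIGN DATA WRAPPER frackles_fdw
        OPTIONS (bucket 'example-bucket', prefix 'client-days/');

    CREATE FOREIGN TABLE historical_stats (
        client_id   bigint,
        day         date,
        campaign_id bigint,
        impressions bigint
    ) SERVER frackles;

    -- The client_id and day quals below are what the FDW inspects to decide
    -- which client/date SQLite files to fetch, one Lambda per file.
    SELECT day, sum(impressions) AS impressions
    FROM   historical_stats
    WHERE  client_id = 42
    AND    day BETWEEN '2017-01-01' AND '2017-12-31'
    GROUP  BY day;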
What Did Work: PMPP
● Poor Man's Parallel Processing
● Allows an application to issue multiple queries in parallel to multiple servers, provided all the queries have the same shape (sketch below)
● Returns data via a set returning function, which can then do secondary
aggregation, joins, etc.
● Any machine that talks libpq could be queried (PgSQL, Vertica, Redshift)
● Allows for partial aggregation on DW boxes
● Secondary aggregation can occur on local machine
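The slides don't show the PMPP API itself, so here is the same pattern approximated with the stock dblink extension: fire identically shaped partial-aggregation queries at several servers asynchronously, then do the secondary aggregation locally over the combined results. Hostnames and the daily_stats query are illustrative.

    CREATE EXTENSION IF NOT EXISTS dblink;

    -- Open connections to two DW shards.
    SELECT dblink_connect('elmo',    'host=elmo-1 dbname=dw');
    SELECT dblink_connect('marjory', 'host=marjory-1 dbname=dw');

    -- Send the same-shaped partial aggregation to each shard; both run in parallel.
    SELECT dblink_send_query('elmo',    'SELECT day, sum(impressions) FROM daily_stats GROUP BY day');
    SELECT dblink_send_query('marjory', 'SELECT day, sum(impressions) FROM daily_stats GROUP BY day');

    -- Secondary aggregation over the partial aggregates from each shard.
    SELECT day, sum(impressions) AS impressions
    FROM (
        SELECT * FROM dblink_get_result('elmo')    AS t(day date, impressions bigint)
        UNION ALL
        SELECT * FROM dblink_get_result('marjory') AS t(day date, impressions bigint)
    ) partials
    GROUP BY day;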
What Did Work: Decanters
● A place to let the data "breathe"
● Abundant CPUs, abundant memory per CPU, minimal disk
● Very small lookup tables replicated for performance reasons
● All other local tables are foreign tables pointing at the OLTP database (sketch below)
● Mostly executes aggregation queries that use PMPP to access: Statscache,
Elmo, Marjory, Frackles, each one doing a local aggregation
● Final aggregation happens on decanter
● Can occasionally hit OOM (better here than on an important machine)
● New decanter can spin up and enter load balancer in 5 minutes
● No engineering time to be spent rescuing failed decanters
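A sketch of a decanter's local setup, assuming postgres_fdw for the OLTP foreign tables (server name, host, and credentials are invented):

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    CREATE SERVER oltp FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'oltp.internal', dbname 'app');

    CREATE USER MAPPING FOR CURRENT_USER SERVER oltp
        OPTIONS (user 'reporting', password 'secret');

    -- Everything except the tiny replicated lookup tables is a foreign table.
    CREATE SCHEMA oltp_remote;
    IMPORT FOREIGN SCHEMA public FROM SERVER oltp INTO oltp_remote;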
Putting it all together with PostgreSQL
Tagged Ads
Viewable Events
Pixel
Servers
Stats
Aggregators
S3 - CSVs
S3 - SQLite
Log shipping
Daily
Summaries
Elmo
Clusters
Marjory
Clusters
Search
Clusters
Daily ETLs
Putting it all together with PostgreSQL
[Architecture diagram, query path: user stats requests and searches hit the decanters, which issue PMPP requests and use FDWs (Pg FDW, Frackles FDW, Stats-Cache FDW) to reach the OLTP DB, the Elmo, Marjory, and Search clusters, the S3 SQLite files, the live stats aggregators, and a third-party DW.]
Why Not RDS?
● No ability to install custom extensions (esp. partitioning modules)
● No place to do local copy operations
● Reduced insight into the server load
● Reduced ability to tune the Postgres server
● No ability to try beta versions
● Expense
Why Not Aurora?
● Had early adopter access
● AWS Devs said that it wasn't geared for DW workloads
● Seems nice on I/O
● Nice not having to worry about which servers are read only
● Wasn't there yet
● Data volumes necessitate advanced partitioning
● Expense
Why Not Athena?
● Athena had no concept of constraint exclusion to avoid reading irrelevant files
● Costs $5/TB of data read
● Most queries would cost > $100 each
● Running thousands of queries per hour
Questions?