Carbon Fiber Tank, SpaceX
How to lower the
costs of your Drupal
Site's resources and
plan Capacity in
advance
ricardoamaro sre@acquia
About me
@ricardoamaro
● Principal SRE @Acquia (Cloud Data Team)
● Joined in December 2011
● Location: Lisbon, Portugal
● Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly)
● Founder and Lead of the Portuguese Drupal Association
● Fun Facts:
○ Presented at DevOps events, including DrupalCons.
○ Dedicated father of 2 kids and still manages to study and write.
○ First Linux installation: Slackware in 1994.
○ Former theatre actor.
Agenda
What we will be talking about
The problem
What is Capacity
Why do Capacity Planning
Relation to Site Reliability Engineering
Budget & Capacity Planning
Load Testing
Performance Tuning vs. Capacity Planning
What to measure
How to measure
How to track capacity
Forecasting
First Easy Steps
Conclusions
The Problem
Site Launch & User Expectations
Falcon Heavy launch, SpaceX
Typical Drupal Site Launch
What about
Capacity Planning??
- Disable devel
- Configure cron
- Check The Upload Sizes & Execution Time
- Check Recipient Email Addresses
- Set The File Permissions
- Protect Your Root Account
- Check Permissions
- Turn Off Error Reporting
- Handle 404 Errors Gracefully
- Check Robots.txt
- Combine Pathauto With Global Redirect
- Create A Maintenance Page
- Configure Caching
- CSS And JavaScript Optimisation
- Check Unpublished Content Is Not Visible
- Configure Statistics
- Monitor the Site
-
** Plan for Failure **
User Expectations
Drupal click screenshot
● The end goal of capacity
planning is a smooth and
speedy experience for the users
● Varies depending on the type
of application and the
portion of it that users
interact with
No silver bullet
● A site can have plenty of capacity
and still be slow or unavailable
● Capacity is only one part of
making the end-user experience
fast
● We want to measure and track
to make forecasts
● An intolerable amount of latency
should raise a flag
What is
Capacity
resources required to run your services
in the context you have chosen to run them
Carbon Fiber Tank, SpaceX
Capacity in Site Reliability Engineering (SRE)
● Capacity: The maximum amount of output a product deployment is
capable of completing in a given period of time
● Capacity planning: Process that determines the resources needed,
like people, instances, CPU, memory, time and more, for the company
to meet changing demands for its services
● In the Drupal World we focus mostly on serving WEB capacity
Resource management
The Art of Capacity Planning
Arun Kejariwal, John Allspaw
"O'Reilly Media, Inc."
● Ensure proper resources are
available to handle load
● Define procurement and an
approval process
● Justify capital needs
● Manage resources after
deployment
Why do
Capacity Planning
Kroger grocery store, Lexington Kentucky,
1947, by Brett Streutket
Quick and Dirty Math
● Only spend as much as you
actually need
● Be ahead of sharp growth
● Avoid emergencies
Stay Fast and Reliable
Site Reliability
Engineering
Rocket Laboratory, 1952
NASA/William A. Bowles
Ben Treynor - Google
“...an SRE team is responsible for
the availability, latency,
performance, efficiency, change
management, monitoring,
emergency response, and capacity
planning of their service(s)...”
Demand Forecasting and Capacity Planning
● Ensuring that there is sufficient
capacity and redundancy
● Serve projected future demand
with the required availability
● Ensure the required capacity is
in place by the time it is needed
● Take both organic and inorganic
growth into account
https://unsplash.com/photos/mexeVPlTB6k
How SRE advocates for Capacity Planning
● Perform regular load testing
● Incorporate SLOs on Capacity
● Capacity is critical to
availability, therefore the SRE
team leads capacity planning
initiatives and provisioning
https://unsplash.com/photos/DX9X0g0Cg88
Budget & Capacity Planning
Vintage Grow Your Money
by Chris Potter, ccPixs.com
Keeping the costs low
● Meet with Finance, Engineering
and Product
● Gather Systems and Application
metrics
● Use that data to justify the
investment
Three forces that impact Capacity Planning
Product
Finance
Engineering
Plan
Load Testing
“Hope is not a strategy”
St. Margrethen - Load Test by Kecko
Load testing a Drupal stack
● How to load test?
“Hit it until it breaks”
● Include the points of failure in
the calculations
● Determining backend limits can
be tricky
● Use those resource ceilings as a
basis while predicting future
growth
https://docs.acquia.com/acquia-cloud/arch/
Database Backend Load Test
➔ How many queries/second (QPS)
can the DB server manage?
➔ How many QPS can it serve
before performance
degradation affects end-user
experience?
● What load will cause the
database to become unresponsive or
fail over? Knowing this lets us set
alert thresholds accordingly.
● What to expect from adding (or
removing) nodes to the
backend?
● When to begin sizing for a new
database capacity?
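The questions above can be answered from stepped load test results. A minimal sketch, assuming the results are available as (QPS, latency in ms) pairs; the sample numbers, the 500 ms limit and the 80% alert fraction are hypothetical and not taken from the slides:

```python
from typing import List, Optional, Tuple


def max_qps_within_latency(
    results: List[Tuple[float, float]], latency_limit_ms: float
) -> Optional[float]:
    """Return the highest QPS step reached before latency first exceeds the limit."""
    best = None
    for qps, latency_ms in sorted(results):
        if latency_ms > latency_limit_ms:
            break  # performance degradation starts here
        best = qps
    return best


def alert_threshold(ceiling_qps: float, fraction: float = 0.8) -> float:
    """Alert before the ceiling is reached (the fraction is an assumption)."""
    return ceiling_qps * fraction


# Hypothetical load test steps: (queries per second, observed latency in ms)
steps = [(100, 20.0), (200, 25.0), (400, 60.0), (800, 450.0), (1600, 2300.0)]

ceiling = max_qps_within_latency(steps, latency_limit_ms=500.0)
print("ceiling QPS:", ceiling)
print("alert at QPS:", alert_threshold(ceiling))
```

The same ceiling can then serve as the baseline when estimating what adding or removing a backend node would change.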
A Few Load testing Tools
simulate
● Loadrunner
○ http://bit.ly/microfocus-loadrunner
● Iago
○ https://github.com/twitter/iago
● JMeter
○ http://jmeter.apache.org/
collect
● Prometheus
○ http://www.prometheus.io/
● Signalfx
○ http://www.signalfx.com/
● Cacti
○ http://cacti.net
● Ganglia
○ http://ganglia.info
● Nagios
○ http://nagios.org/
https://www.gocomics.com/calvinandhobbes/1986/11/26
Performance Tuning
vs. Capacity planning
(different goals)
Top Speed
by Alexander Nie
What to measure
defining the metrics
End-of-life
by Dennis van Zuijlekom
Divide & Conquer
● Splitting nodes
● Understand capacity demands
of each node
● Measure more distinctly
● How requests or queries per
second affect resources
Identifying the key resources to measure
● Disk space (MB)
● Disk throughput (IOPS)
● CPU performance (FLOPS)
● RAM memory (MB)
● Network bandwidth (Mbps)
● Network IP pool (Netmask)
● Others
How to measure
Living Computer Museum, Seattle
http://www.brendangregg.com/Perf/linux_perf_tools_full.png
| Tools to measure on Linux servers |
Collecting resources on web servers
● Example script that
sends metrics to statsd
● Low footprint using
/proc, df and ps
● For a constant, reliable
monitoring service use
collectd: https://collectd.org
or Telegraf:
https://www.influxdata.com/time-series-platform/telegraf/
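The original script is not reproduced in this transcript. As a stand-in, here is a minimal sketch of the same idea in Python: read a few values from /proc and send them to statsd as gauges over UDP. The metric names, the statsd address and the use of `shutil.disk_usage` in place of `df` are assumptions made for illustration.

```python
import shutil
import socket

# Assumption: a statsd daemon listening locally on its default UDP port
STATSD_ADDR = ("127.0.0.1", 8125)


def parse_meminfo(text):
    """Parse the content of /proc/meminfo into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def gauge(name, value):
    """Format one statsd gauge datagram: <name>:<value>|g"""
    return f"{name}:{value}|g"


def collect():
    """Collect load, memory and disk usage (Linux only, reads /proc)."""
    with open("/proc/loadavg") as f:
        load1 = float(f.read().split()[0])
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    disk = shutil.disk_usage("/")
    # Metric names below are hypothetical
    return [
        gauge("web.load1", load1),
        gauge("web.mem_available_kb", mem.get("MemAvailable", 0)),
        gauge("web.disk_used_bytes", disk.used),
    ]


def send(lines, addr=STATSD_ADDR):
    """Send each gauge as its own UDP datagram (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for line in lines:
            sock.sendto(line.encode("ascii"), addr)
    finally:
        sock.close()
```

Calling `send(collect())` from cron once a minute would be one low-footprint way to run it on a web server.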
How to track Capacity
Store and display time-series
● Signalfx
● Cacti
● Ganglia
● Graphite
● Datadog
● Ruxit
● LogicMonitor
● Sematext
● CoScale
● Riemann
● Prometheus
● Sensu
● Idera
● Bijk
● X-Pack
● vRealize Hyperic HQ
A couple of load testing tips
load testing Tutorials:
https://www.tutorialspoint.com/jmeter
https://www.blazemeter.com/load-testing
docker app for grafana:
https://github.com/kamon-io/docker-grafana-graphite
Forecasting
(predicting trends)
Numbers And Finance by SeniorLiving.org
Predict the future?
● Use Context & Math
● Make educated guesses
● Long-term view is generally
steady
● Generate estimates to sustain
growth
● Use an adjustable process
● Forecast guides autoscaling
policies
Ceilings and Historical data
● Daily storage consumption
example
● Metric: total available disk space
● Cumulative total provides a
historical perspective
● We can predict future needs
● Storage will probably be
exhausted where the trend line
meets the ceiling
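As a rough illustration of reading the exhaustion date off the trend line, the sketch below assumes one storage sample per day and that the average daily growth seen so far continues. The sample values and the 30 TB ceiling are hypothetical.

```python
def days_until_exhausted(daily_used, ceiling):
    """Estimate days left, assuming the average daily growth continues."""
    if len(daily_used) < 2:
        raise ValueError("need at least two daily samples")
    growth_per_day = (daily_used[-1] - daily_used[0]) / (len(daily_used) - 1)
    if growth_per_day <= 0:
        return None  # usage is flat or shrinking, no exhaustion predicted
    return (ceiling - daily_used[-1]) / growth_per_day


# Hypothetical daily storage consumption in TB
used_tb = [10.0, 10.5, 11.0, 11.5, 12.0]
print(days_until_exhausted(used_tb, ceiling=30.0))
```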
Curve fitting
● Curve fitting
● Creative & Scientific
● Stay ahead of growth
● Use time-series data
● Forecast by constructing new
data points beyond the known
● Reconciliation of what we know
and the best fit equation
● Consider context before math
y = mx+b
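For the straight-line case y = mx + b, the best-fit slope and intercept can be computed with ordinary least squares. The sketch below is a plain-Python illustration with hypothetical data, not the tooling used in the talk:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def x_at(y_target, m, b):
    """Extrapolate: the x at which the fitted line reaches y_target."""
    return (y_target - b) / m


# Hypothetical time series: day number and disk used (GB)
days = [0, 1, 2, 3, 4]
used = [100.0, 102.0, 104.0, 106.0, 108.0]

m, b = fit_line(days, used)
print("slope:", m, "intercept:", b)
print("day the line reaches 200 GB:", x_at(200.0, m, b))
```

Real usage data is rarely this linear, which is where "consider context before math" applies: a fit is only as good as the assumption that the trend continues.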
Forecasting Peak-Driven Resource Usage
● Track how the peaks change over time
● Extrapolate from that data to predict
future needs
● Identify the server resource ceilings
● Find a relation between resources and
application-level work
● Decide if we should scale vertically or
horizontally
● Perform proactive autoscaling
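Tracking peaks can start with something as simple as reducing raw samples to one peak per day and comparing it to the ceiling. A minimal sketch, assuming samples arrive as (ISO timestamp, value) pairs; the sample data and the ceiling of 160 are hypothetical:

```python
from datetime import datetime


def daily_peaks(samples):
    """Reduce (iso_timestamp, value) samples to {date: peak value}, sorted by date."""
    peaks = {}
    for ts, value in samples:
        day = datetime.fromisoformat(ts).date().isoformat()
        if day not in peaks or value > peaks[day]:
            peaks[day] = value
    return dict(sorted(peaks.items()))


def headroom(peak, ceiling):
    """Fraction of the resource ceiling still unused at peak."""
    return 1.0 - peak / ceiling


# Hypothetical busy-worker counts sampled on a web tier
samples = [
    ("2019-10-01T09:00:00", 40),
    ("2019-10-01T20:00:00", 95),
    ("2019-10-02T09:00:00", 50),
    ("2019-10-02T20:00:00", 110),
]

peaks = daily_peaks(samples)
for day, peak in peaks.items():
    print(day, peak, f"headroom {headroom(peak, ceiling=160):.0%}")
```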
● Fityk is an Open Source
Software for nonlinear fitting
of analytical functions to data.
● Incorporate cfityk scripts into
automated curve fitting, like:
cfityk ricardo-disk.fit
@0 < ricardo-disk.csv
guess Quadratic
fit
info formula
quit
Returns the formula:
4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2
Homepage: https://fityk.nieto.pl/
Automating Forecasts with fityk & cfityk
Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
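Once cfityk has returned a formula, it can be evaluated to forecast when a ceiling will be reached. The sketch below copies the coefficients from the formula shown above and combines like terms. The units of x and y are not stated in the slides, so any ceiling passed in is hypothetical.

```python
import math

# Coefficients copied from the returned formula, with like terms combined
A = 0.0660771            # x^2 term
B = 363.063 + -1.55119   # x terms
C = 4888.18 + 8.91132    # constant terms


def predicted(x):
    """Evaluate the fitted formula at x."""
    return C + B * x + A * x * x


def x_when_reaching(ceiling):
    """Non-negative x at which the fitted curve reaches the ceiling, or None."""
    disc = B * B - 4 * A * (C - ceiling)
    if disc < 0:
        return None
    root = (-B + math.sqrt(disc)) / (2 * A)
    return root if root >= 0 else None


print(predicted(0))
print(x_when_reaching(predicted(30)))
```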
Forecasting with Machine Learning
Seeking SRE
Conversations About
Running Production Systems
at Scale
Publisher: O'Reilly Media
● Most popular method for
curve-fitting in fityk is
Levenberg-Marquardt
● ML is also an option for
forecasting (book I co-authored)
● Code examples and guides
https://github.com/ricardoamaro/MachineLearning4SRE
Start with Easy Steps
Get Started
1. Select a process owner.
2. Identify the resources to be measured.
3. Measure these resources.
4. Compare to maximum capacity.
5. Collect workload forecasts.
6. Use forecasts for IT resource requirements.
7. Map requirements onto existing utilizations.
8. Predict when the system will be out of capacity.
9. Update forecasts and utilizations.
Set a Goal!
● Two Classes:
○ Load: usually expressed in
arrival rate or peak rate of
requests hitting the service
e.g. target of 10,000 concurrent authenticated
Drupal users
○ Performance: usually expressed
in the form of Service Level
Objectives
e.g. the 99th percentile of all requests should return
in less than 500 ms
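A performance goal like the one above can be checked directly against measured response times. A minimal sketch using the nearest-rank percentile definition; the latency samples are hypothetical:

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers (pct between 0 and 100)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms, pct=99, limit_ms=500):
    """True if the given percentile of latencies is under the limit."""
    return percentile(latencies_ms, pct) < limit_ms


# Hypothetical response times in milliseconds
latencies = [120] * 98 + [480, 900]
print(percentile(latencies, 99), meets_slo(latencies))
```

Note that one slow outlier (900 ms here) does not break a 99th percentile objective on 100 samples, which is why percentiles are usually preferred over maximums for SLOs.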
Be proactive
( plan & document ahead)
Picasso drawing with Paloma and Claude at Villa la Galloise, 1953.
By Edward Quinn, EdwardQuinn.com.
Capacity Planning Dashboard
● Support your conclusions with
metrics in a dashboard
● Both manual scaling and auto
scaling decisions should be based
on real data
● When to scale?
○ date and time (be alerted if needed)
● How to scale?
○ vertical, horizontal or diagonal scaling
(Example) Drupal Cluster Dashboard
type           value  limit/node  ceiling units  limit (total)  current (peak)  peak %  estimated days left
Varnish cache  28     1024        req/sec        2048           600             29%     830
Web            31     80          busy calls     160            145             90%     12
Database       15     60          connections    120            96              80%     36
Storage        14     30          TB             30             14              46%     21
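The peak % column can be derived from the current peak and the total limit. The sketch below recomputes it from the example values and flags the tiers that need attention; the 80% threshold is an assumption, and the estimated days left are not reproduced because they depend on growth data not shown here.

```python
def peak_percent(current_peak, total_limit):
    """Peak utilisation as a whole percentage, rounded down."""
    return int(100 * current_peak / total_limit)


# (current peak, total limit) taken from the example dashboard
dashboard = {
    "Varnish cache": (600, 2048),
    "Web": (145, 160),
    "Database": (96, 120),
    "Storage": (14, 30),
}


def over_threshold(rows, threshold_pct=80):
    """Names of tiers whose peak utilisation is at or above the threshold."""
    return [
        name
        for name, (current, limit) in rows.items()
        if peak_percent(current, limit) >= threshold_pct
    ]


for name, (current, limit) in dashboard.items():
    print(f"{name}: {peak_percent(current, limit)}%")
print("needs attention:", over_threshold(dashboard))
```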
Conclusions
Drive the system to the appropriate level of risk for the lowest cost.
Join us for
contribution opportunities
Thursday, October 31, 2019
9:00-18:00
Room: Europe Foyer 2
Mentored
Contribution
First Time
Contributor Workshop
General
Contribution
#DrupalContributions
9:00-14:00
Room: Diamond Lounge
9:00-18:00
Room: Europe Foyer 2

Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Capacity Planning Infrastructure for Web Applications (Drupal)

  • 1. Carbon Fiber Tank, SpaceX How to lower the costs of your Drupal Site's resources and plan Capacity in advance ricardoamaro sre@acquia
  • 2. About me @ricardoamaro ● Principal SRE @Acquia (Cloud Data Team) ● Joined in December 2011 ● Location: Lisbon, Portugal ● Co-authored Seeking SRE w/ Machine Learning for SRE (O’Reilly) ● Founder and Lead of the Portuguese Drupal Association ● Fun Facts: ○ Presented at DevOps events, including DrupalCons. ○ Dedicated father of 2 kids who still manages to study and write. ○ First Linux installation: Slackware in 1994. ○ Former theatre actor.
  • 3. Agenda What we will be talking about The problem What is Capacity Why do Capacity Planning Relation to Site Reliability Engineering Budget & Capacity Planning Load Testing Performance Tuning vs. Capacity Planning What to measure How to measure How to track capacity Forecasting First Easy Steps Conclusions
  • 4. The Problem Site Launch & User Expectations Falcon Heavy launch, SpaceX
  • 5. Typical Drupal Site Launch What about Capacity Planning?? - Disable devel - Configure cron - Check The Upload Sizes & Execution Time - Check Recipient Email Addresses - Set The File Permissions - Protect Your Root Account - Check Permissions - Turn Off Error Reporting - Handle 404 Errors Gracefully - Check Robots.txt - Combine Pathauto With Global Redirect - Create A Maintenance Page - Configure Caching - CSS And JavaScript Optimisation - Check Unpublished Content Is Not Visible - Configure Statistics - Monitor the Site - ** Plan for Failure **
  • 6. User Expectations Drupal click screenshot ● The end goal of capacity planning is a smooth and speedy experience for the users ● Expectations vary depending on what type of application it is and what portion of the application users interact with
  • 7. No silver bullet ● A site can have plenty of capacity and still be slow or unavailable ● Capacity is only one part of making the end-user experience fast ● We want to measure and track to make forecasts ● An intolerable amount of latency should raise a flag
  • 8. What is Capacity resources required to run your services in the context you have chosen to run them Carbon Fiber Tank, SpaceX
  • 9. Capacity in Site Reliability Engineering (SRE) ● Capacity: The maximum amount of output a product deployment is capable of completing in a given period of time ● Capacity planning: Process that determines the resources needed, like people, instances, CPU, memory, time and more, for the company to meet changing demands for its services ● In the Drupal World we focus mostly on serving WEB capacity
  • 10. Resource management The Art of Capacity Planning Arun Kejariwal, John Allspaw "O'Reilly Media, Inc." ● Ensure proper resources are available to handle load ● Define procurement and an approval process ● Justify capital needs ● Manage resources after deployment
  • 11. Why do Capacity Planning Kroger grocery store, Lexington Kentucky, 1947, by Brett Streutket
  • 12. Quick and Dirty Math ● Only spend as much as you actually need ● Be ahead of sharp growth ● Avoid emergencies Stay Fast and Reliable
  • 13. Site Reliability Engineering Rocket Laboratory, 1952 NASA/William A. Bowles
  • 14. “...an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s)...” Ben Treynor, Google
  • 15. Demand Forecasting and Capacity Planning ● Ensuring that there is sufficient capacity and redundancy ● Serve projected future demand with the required availability ● Ensure the required capacity is in place by the time it is needed ● Take both organic and inorganic growth into account https://unsplash.com/photos/mexeVPlTB6k
  • 16. How SRE advocates for Capacity Planning ● Perform regular load testing ● Incorporate SLOs on Capacity ● Capacity is critical to availability, therefore the SRE team leads capacity planning initiatives and provisioning https://unsplash.com/photos/DX9X0g0Cg88
  • 17. Budget & Capacity Planning Vintage Grow Your Money by Chris Potter, ccPixs.com
  • 18. Keeping the costs low ● Meet with Finance, Engineering and Product ● Gather Systems and Application metrics ● Use that data to justify the investment Three forces that impact Capacity Planning (diagram): Product, Finance, Engineering, Plan
  • 19. Load Testing “Hope is not a strategy” St. Margrethen - Load Test by Kecko
  • 20. Load testing a Drupal stack ● How to load test? “Hit it until it breaks” ● Include the points of failure in the calculations ● Determining backend limits can be tricky ● Use those resource ceilings as a basis while predicting future growth https://docs.acquia.com/acquia-cloud/arch/
  • 21. Database Backend Load Test ➔ How many queries/second (QPS) can the DB server manage? ➔ How many QPS can it serve before performance degradation affects end-user experience? ● What load will cause the database to become unresponsive or fail over? Knowing this lets us set alert thresholds accordingly. ● What to expect from adding (or removing) nodes to the backend? ● When to begin sizing for new database capacity?
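The questions on this slide can be answered from load-test results once they are tabulated. The following is a minimal sketch, assuming the test produced pairs of (QPS, latency); the sample numbers, the 50 ms limit and the 80% safety margin are hypothetical and not taken from the talk.

```python
from typing import List, Optional, Tuple

# Hypothetical load-test results: (queries per second, 95th percentile latency in ms).
# These numbers are invented for illustration only.
SAMPLES: List[Tuple[int, float]] = [
    (100, 12.0),
    (200, 13.5),
    (400, 15.0),
    (800, 22.0),
    (1600, 95.0),
    (3200, 850.0),
]


def max_qps_within(samples: List[Tuple[int, float]],
                   latency_limit_ms: float) -> Optional[int]:
    """Return the highest tested QPS reached before latency first exceeds the limit."""
    best = None
    for qps, latency in sorted(samples):
        if latency > latency_limit_ms:
            break  # degradation starts here; higher loads are out of bounds
        best = qps
    return best


def alert_threshold(ceiling_qps: int, margin_percent: int = 80) -> int:
    """Alert at a fraction of the ceiling so there is time to react."""
    return ceiling_qps * margin_percent // 100


ceiling = max_qps_within(SAMPLES, latency_limit_ms=50.0)
print("ceiling:", ceiling, "alert at:", alert_threshold(ceiling))
```

With these made-up samples the ceiling is the last load level that stayed under the latency limit, and the alert fires before that level is reached.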
  • 22. A Few Load Testing Tools Simulate: ● Loadrunner ○ http://bit.ly/microfocus-loadrunner ● Iago ○ https://github.com/twitter/iago ● JMeter ○ http://jmeter.apache.org/ Collect: ● Prometheus ○ http://www.prometheus.io/ ● Signalfx ○ http://www.signalfx.com/ ● Cacti ○ http://cacti.net ● Ganglia ○ http://ganglia.info ● Nagios ○ http://nagios.org/ https://www.gocomics.com/calvinandhobbes/1986/11/26
  • 23. Performance Tuning vs. Capacity planning (different goals) Top Speed by Alexander Nie
  • 24. What to measure defining the metrics End-of-life by Dennis van Zuijlekom
  • 25. Divide & Conquer ● Splitting nodes ● Understand capacity demands of each node ● Measure more distinctly ● How requests or queries per second affect resources
  • 26. Identifying the key resources to measure ● Disk space (MB) ● Disk throughput (IOPS) ● CPU performance (FLOPS) ● RAM memory (MB) ● Network bandwidth (Mbps) ● Network IP pool (Netmask) ● Others
  • 27. How to measure Living Computer Museum, Seattle
  • 29. Collecting resources on web servers TODO: CODE ● Example script that sends metrics to statsd ● Low footprint using /proc, df and ps ● For a constant reliable monitoring service use collectd: https://collectd.org or Telegraf: https://www.influxdata.com/time-series-platform/telegraf/
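The original script from this slide is not in the transcript. As a stand-in, here is a minimal sketch of the same idea: read a value from /proc, read disk usage (using `shutil.disk_usage` in place of `df`), and send each value as a StatsD gauge over UDP. The metric names, the `web01` prefix and the `127.0.0.1:8125` address are assumptions for illustration.

```python
import shutil
import socket
from typing import Dict


def read_load_average(path: str = "/proc/loadavg") -> float:
    """Return the 1-minute load average from /proc (Linux only)."""
    with open(path) as handle:
        return float(handle.read().split()[0])


def disk_used_percent(mount: str = "/") -> float:
    """Return the used share of a filesystem, similar to the Use% column of df."""
    usage = shutil.disk_usage(mount)
    return round(usage.used * 100 / usage.total, 2)


def encode_gauge(name: str, value: float, prefix: str = "web01") -> bytes:
    """Encode one metric in the StatsD gauge format '<name>:<value>|g'."""
    return f"{prefix}.{name}:{value}|g".encode("ascii")


def send_gauges(metrics: Dict[str, float],
                host: str = "127.0.0.1", port: int = 8125) -> int:
    """Send one UDP datagram per metric; returns the number of metrics sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for name, value in metrics.items():
            sock.sendto(encode_gauge(name, value), (host, port))
    return len(metrics)
```

A cron job could call `send_gauges({"load1": read_load_average(), "disk_used_pct": disk_used_percent("/")})` once a minute. UDP is fire-and-forget, so the collector stays cheap even when the StatsD daemon is down.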
  • 30. How to track Capacity
  • 31. Store and display time-series ● Signalfx ● Cacti ● Ganglia ● Graphite ● Datadog ● Ruxit ● LogicMonitor ● Sematext ● CoScale ● Riemann ● Prometheus ● Sensu ● Idera ● Bijk ● X-Pack ● vRealize Hyperic HQ
  • 32. A couple of load testing tips ● Load testing tutorials: https://www.tutorialspoint.com/jmeter https://www.blazemeter.com/load-testing ● Docker app for Grafana: https://github.com/kamon-io/docker-grafana-graphite
  • 33. Forecasting (predicting trends) Numbers And Finance by SeniorLiving.org
  • 34. Predict the future? ● Use Context & Math ● Make educated guesses ● Long-term view is generally steady ● Generate estimates to sustain growth ● Use an adjustable process ● Forecast guides autoscaling policies
  • 35. Ceilings and Historical data ● Daily storage consumption example ● Metric: total available disk space ● Cumulative total provides a historical perspective ● We can predict future needs ● Storage will probably be exhausted at the point where the trend line meets the ceiling
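A rough way to see where the line is headed is to take the average daily growth and divide the remaining space by it. This is a minimal sketch under that assumption; the readings and the 30 TB ceiling are hypothetical.

```python
from typing import List, Optional


def days_until_ceiling(daily_used: List[float], ceiling: float) -> Optional[float]:
    """Estimate days left before usage reaches the ceiling, from average daily growth."""
    if len(daily_used) < 2:
        raise ValueError("need at least two daily readings")
    growth_per_day = (daily_used[-1] - daily_used[0]) / (len(daily_used) - 1)
    if growth_per_day <= 0:
        return None  # usage is flat or shrinking, so no exhaustion is predicted
    return (ceiling - daily_used[-1]) / growth_per_day


# Hypothetical cumulative storage readings in TB, one per day.
readings_tb = [10.0, 10.5, 11.0, 11.5, 12.0]
print(days_until_ceiling(readings_tb, ceiling=30.0))
```

Using only the first and last reading ignores noise in between, which is acceptable for a first estimate but not for data with strong weekly patterns.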
  • 36. Curve fitting ● Curve fitting ● Creative & Scientific ● Stay ahead of growth ● Use time-series data ● Forecast by constructing new data points beyond the known ● Reconciliation of what we know and the best fit equation ● Consider context before math y = mx+b
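For the straight-line case `y = mx+b`, the best-fit slope and intercept can be computed with ordinary least squares. The sketch below uses only the standard library; the data points are hypothetical, and a real series may need a different curve once context is taken into account.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more points and equal-length inputs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def forecast(m: float, b: float, x: float) -> float:
    """Construct a new data point beyond the known range."""
    return m * x + b


# Hypothetical time series: day number and disk usage in GB.
days = [0, 1, 2, 3, 4]
used_gb = [100, 104, 107, 113, 116]
m, b = fit_line(days, used_gb)
print(f"y = {m:.2f}x + {b:.2f}; day 30 forecast: {forecast(m, b, 30):.1f} GB")
```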
  • 37. Forecasting Peak-Driven Resource Usage ● Track how the peaks change over time ● Extrapolate from that data to predict future needs ● Identify the server resource ceilings ● Find a relation between resources and application-level work ● Decide if we should scale vertically or horizontally ● Perform proactive autoscaling
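Two of these steps lend themselves to a small sketch: reducing raw samples to one peak per day, and turning a forecast peak plus a per-node ceiling into a node count for horizontal scaling. The function names, the 20% headroom and the sample values are assumptions for illustration.

```python
import math
from typing import Dict, Hashable, Iterable, Tuple


def daily_peaks(samples: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Reduce (day, value) samples to the highest value seen on each day."""
    peaks: Dict[Hashable, float] = {}
    for day, value in samples:
        peaks[day] = max(value, peaks.get(day, value))
    return peaks


def nodes_needed(forecast_peak: float, ceiling_per_node: float,
                 headroom_percent: int = 20) -> int:
    """Number of nodes required so the forecast peak fits under the ceiling with headroom."""
    if ceiling_per_node <= 0:
        raise ValueError("ceiling_per_node must be positive")
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    usable = ceiling_per_node * (100 - headroom_percent) / 100
    return max(1, math.ceil(forecast_peak / usable))


# Hypothetical: a forecast peak of 200 busy workers and a ceiling of 80 per web node.
print(nodes_needed(forecast_peak=200, ceiling_per_node=80))
```

The same node count could feed an autoscaling policy as its minimum size ahead of an expected peak.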
  • 38. Automating Forecasts with fityk & cfityk ● Fityk is open source software for nonlinear fitting of analytical functions to data. ● Incorporate cfityk scripts into automated curve fitting, like: cfityk ricardo-disk.fit @0 < ricardo-disk.csv guess Quadratic fit info formula quit ● Returns the formula: 4888.18 + 363.063 * x + 8.91132 + -1.55119*x + 0.0660771*x^2 ● Homepage: https://fityk.nieto.pl/ ● Small demo: https://youtube.com/watch?v=EZnyq1Hr_7I
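Once a formula like the one on this slide is available, it can be evaluated and solved for the point where it crosses a ceiling. The sketch below collects the terms of that formula into a single quadratic and applies the quadratic formula. The meaning of `x` (assumed here to be time steps such as days), the units and the ceiling of 10000 are assumptions, since the slide does not state them.

```python
import math
from typing import Optional

# Coefficients collected from the formula shown on the slide:
# 4888.18 + 363.063*x + 8.91132 + -1.55119*x + 0.0660771*x^2
A = 0.0660771
B = 363.063 + -1.55119
C = 4888.18 + 8.91132


def predicted(x: float) -> float:
    """Evaluate the fitted curve at x."""
    return C + B * x + A * x * x


def x_at_ceiling(ceiling: float) -> Optional[float]:
    """Return the non-negative x where the curve reaches the ceiling, if any."""
    discriminant = B * B - 4 * A * (C - ceiling)
    if discriminant < 0:
        return None
    root = (-B + math.sqrt(discriminant)) / (2 * A)
    return root if root >= 0 else None


# Hypothetical ceiling, in the same (unstated) units as the fitted data.
print(x_at_ceiling(10000))
```

A result of `None` means the curve is already above the ceiling at `x = 0` or never reaches it.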
  • 39. Forecasting with Machine Learning Seeking SRE Conversations About Running Production Systems at Scale Publisher: O'Reilly Media ● Most popular method for curve-fitting in fityk is Levenberg-Marquardt ● ML is also an option for forecasting (book I co-authored) ● Code examples and guides https://github.com/ricardoamaro/MachineLearning4SRE
  • 41. Get Started 1. Select a process owner. 2. Identify the resources to be measured. 3. Measure these resources. 4. Compare to maximum capacity. 5. Collect workload forecasts. 6. Use forecasts for IT resource requirements. 7. Map requirements onto existing utilizations. 8. Predict when the system will be out of capacity. 9. Update forecasts and utilizations.
  • 42. Set a Goal! ● Two Classes: ○ Load: usually expressed as the arrival rate or peak rate of requests hitting the service, e.g. a target of 10,000 concurrent authenticated Drupal users ○ Performance: usually expressed in the form of Service Level Objectives, e.g. the 99th percentile of all requests should return in less than 500 ms
  • 43. Be proactive (plan & document ahead) Picasso drawing with Paloma and Claude at Villa la Galloise, 1953. By Edward Quinn, EdwardQuinn.com.
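A performance goal of this kind can be checked directly against measured latencies. This is a minimal sketch using the nearest-rank percentile definition (other definitions interpolate and can give slightly different values); the latency samples are hypothetical.

```python
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


def meets_slo(latencies_ms: Sequence[float], pct: int = 99,
              limit_ms: float = 500) -> bool:
    """True when the chosen percentile of request latency is below the limit."""
    return percentile(latencies_ms, pct) < limit_ms


# Hypothetical sample: 98 fast requests, one slower, one very slow.
latencies = [120.0] * 98 + [450.0, 900.0]
print(percentile(latencies, 99), meets_slo(latencies))
```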
  • 44. Capacity Planning Dashboard ● Support your conclusions with metrics in a dashboard ● Both manual scaling and auto scaling decisions should be based on real data ● When to scale? ○ date and time (be alerted if needed) ● How to scale? ○ vertical, horizontal or diagonal scaling
    (Example) Drupal Cluster Dashboard
    type           value  limit/node  ceiling units  limit (total)  current (peak)  peak %  estimated days left
    Varnish cache  28     1024        req/sec        2048           600             29%     830
    Web            31     80          busy calls     160            145             90%     12
    Database       15     60          connections    120            96              80%     36
    Storage        14     30          TB             30             14              46%     21
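The total limit and peak % columns of the example dashboard can be derived from the per-node limit and the current peak. The sketch below reproduces those two columns; the node counts are inferred from limit (total) divided by limit/node, and the 80% attention threshold is an assumption. The "estimated days left" column depends on growth data that the slide does not show, so it is not computed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    limit_per_node: int
    nodes: int          # inferred: limit (total) / limit per node
    current_peak: int
    units: str

    @property
    def total_limit(self) -> int:
        return self.limit_per_node * self.nodes

    @property
    def peak_percent(self) -> int:
        # Integer division truncates, matching the percentages on the slide.
        return self.current_peak * 100 // self.total_limit


ROWS = [
    Resource("Varnish cache", 1024, 2, 600, "req/sec"),
    Resource("Web", 80, 2, 145, "busy calls"),
    Resource("Database", 60, 2, 96, "connections"),
    Resource("Storage", 30, 1, 14, "TB"),
]


def needs_attention(rows: List[Resource], threshold_percent: int = 80) -> List[str]:
    """Names of resources whose peak usage is at or above the threshold."""
    return [row.name for row in rows if row.peak_percent >= threshold_percent]


for row in ROWS:
    print(f"{row.name}: {row.current_peak}/{row.total_limit} {row.units} ({row.peak_percent}%)")
print("scale soon:", needs_attention(ROWS))
```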
  • 45. Conclusions Drive the system to the appropriate level of risk for the lowest cost.
  • 46. Join us for contribution opportunities Thursday, October 31, 2019 ● Mentored Contribution 9:00-18:00 Room: Europe Foyer 2 ● First Time Contributor Workshop 9:00-14:00 Room: Diamond Lounge ● General Contribution 9:00-18:00 Room: Europe Foyer 2 #DrupalContributions