SlideShare une entreprise Scribd logo
1  sur  11
Télécharger pour lire hors ligne
SRE Demystified
Distributed Monitoring
ganesh@ganeshniyer.com
ganesh.vigneswara@gmail.com,
http://ganeshniyer.com
Dr Ganesh Neelakanta Iyer
SRE
•
2https://image.slidesharecdn.com/devopssreatgooglescale-190121123035/95/devops-sre-at-google-scale-30-638.jpg?cb=1548074257
Why Monitor?
3
Analyzing long-term trends
Comparing time/iterations
Alerting
Building dashboards
How big is my database and how fast is it
growing? How quickly is my daily-active user
count growing?
Are queries faster with Acme Bucket of Bytes
2.72 versus Ajax DB 3.14? How much better
is my memcache hit rate with an extra node?
Is my site slower than it was last week?
Something is broken,
and somebody needs to
fix it right now! Or,
something might break
soon, so attention
needed soon.
Dashboards should answer basic questions
about your service, and normally include
some form of the four golden signals
Conducting ad hoc retrospective analysis
Our latency just shot up; what else happened around the same time?
https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
Setting Reasonable Expectations for Monitoring
• Simpler and faster monitoring systems, with better tools for post
hoc analysis
• Other uses of monitoring data such as capacity planning and
traffic prediction can tolerate more fragility, and thus, more
complexity
• To keep noise low and signal high, the elements of your
monitoring system that direct to a pager need to be very simple
and robust
• Rules that generate alerts for humans should be simple to
understand and represent a clear failure
4https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
Example symptoms and causes
5
Symptom Cause
I’m serving HTTP 500s or 404s Database servers are refusing connections
My responses are slow
CPUs are overloaded by a bogosort, or an Ethernet
cable is crimped under a rack, visible as partial
packet loss
Users in Antarctica aren’t receiving animated cat
GIFs
Your Content Distribution Network hates scientists
and felines, and thus blacklisted some client IPs
Private content is world-readable
A new software push caused ACLs to be forgotten
and allowed all requests
https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
Black-box monitoring White-box monitoring
• Restricted to external
service behaviour
• e.g. ping, http
6
• Internal metrics and logs to
surface system issues
• e.g. request_errors_total,
request_processing_time,
DevOps School
7https://www.devopsschool.com/blog/white-box-monitoring-and-black-box-monitoring-explained/
Four Golden Signals
8
Choosing an Appropriate Resolution for
Measurements
• Observing CPU load over the time span of a minute won’t reveal even
quite long-lived spikes that drive high tail latencies
• For a web service targeting no more than 9 hours aggregate downtime
per year (99.9% annual uptime), probing for a 200 (success) status
more than once or twice a minute is probably unnecessarily frequent.
• Similarly, checking hard drive fullness for a service targeting 99.9%
availability more than once every 1–2 minutes is probably unnecessary
9
References
10
Dr Ganesh Neelakanta Iyer
ganesh@ganeshniyer.com
ganesh.vigneswara@gmail.com

Contenu connexe

Tendances

Understanding parametersniffing sqlsat
Understanding parametersniffing sqlsatUnderstanding parametersniffing sqlsat
Understanding parametersniffing sqlsatSanil Mhatre
 
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksPerformance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksThoughtworks
 
Ride the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database RiderRide the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database RiderMikalai Alimenkou
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-outmralexjuarez
 
Modern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesModern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesMikalai Alimenkou
 
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...South Tyrol Free Software Conference
 
ImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and websiteImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and websiteDevdutt Kumar
 
Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Lari Hotari
 
Baltimore share point user group june 2015
Baltimore share point user group june 2015Baltimore share point user group june 2015
Baltimore share point user group june 2015Toby McGrail
 

Tendances (9)

Understanding parametersniffing sqlsat
Understanding parametersniffing sqlsatUnderstanding parametersniffing sqlsat
Understanding parametersniffing sqlsat
 
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorksPerformance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
Performance monitoring - Adoniram Mishra, Rupesh Dubey, ThoughtWorks
 
Ride the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database RiderRide the database in JUnit tests with Database Rider
Ride the database in JUnit tests with Database Rider
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-out
 
Modern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with KubernetesModern CI/CD in the microservices world with Kubernetes
Modern CI/CD in the microservices world with Kubernetes
 
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
SFScon 21 - Eduardo Guerra - A Lean Software Analytics Canvas for Agile Small...
 
ImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and websiteImprovingSBHSRemotetriggered server and website
ImprovingSBHSRemotetriggered server and website
 
Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014Performance tuning Grails applications SpringOne 2GX 2014
Performance tuning Grails applications SpringOne 2GX 2014
 
Baltimore share point user group june 2015
Baltimore share point user group june 2015Baltimore share point user group june 2015
Baltimore share point user group june 2015
 

Similaire à SRE Demystified - 06 - Distributed Monitoring

Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...rschuppe
 
Csdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer YahooCsdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer Yahooguestb1b95b
 
Perfmon And Profiler 101
Perfmon And Profiler 101Perfmon And Profiler 101
Perfmon And Profiler 101Quest Software
 
The 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck EzellThe 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck EzellDatavail
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scaleBalvinder Hira
 
High Performance Web Sites
High Performance Web SitesHigh Performance Web Sites
High Performance Web SitesPáris Neto
 
High Performance Websites By Souders Steve
High Performance Websites By Souders SteveHigh Performance Websites By Souders Steve
High Performance Websites By Souders Stevew3guru
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopKevin Crawley
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-youmkherlakian
 
Performance Optimization
Performance OptimizationPerformance Optimization
Performance OptimizationNeha Thakur
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Sascha Wenninger
 
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamFrom Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamAndreas Grabner
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Pythondidip
 
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingOrion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingSolarWinds
 
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...rschuppe
 
Empowering Customers with Personalized Insights
Empowering Customers with Personalized InsightsEmpowering Customers with Personalized Insights
Empowering Customers with Personalized InsightsCloudera, Inc.
 
SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017Speedment, Inc.
 

Similaire à SRE Demystified - 06 - Distributed Monitoring (20)

Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Csdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer YahooCsdn Drdobbs Tenni Theurer Yahoo
Csdn Drdobbs Tenni Theurer Yahoo
 
Perfmon And Profiler 101
Perfmon And Profiler 101Perfmon And Profiler 101
Perfmon And Profiler 101
 
The 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck EzellThe 5S Approach to Performance Tuning by Chuck Ezell
The 5S Approach to Performance Tuning by Chuck Ezell
 
Observability in real time at scale
Observability in real time at scaleObservability in real time at scale
Observability in real time at scale
 
Plop
PlopPlop
Plop
 
High Performance Web Sites
High Performance Web SitesHigh Performance Web Sites
High Performance Web Sites
 
High Performance Websites By Souders Steve
High Performance Websites By Souders SteveHigh Performance Websites By Souders Steve
High Performance Websites By Souders Steve
 
DockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability WorkshopDockerCon SF 2019 - Observability Workshop
DockerCon SF 2019 - Observability Workshop
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
 
Performance Optimization
Performance OptimizationPerformance Optimization
Performance Optimization
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
 
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 PotsdamFrom Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
From Zero to Performance Hero in Minutes - Agile Testing Days 2014 Potsdam
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
Os Solomon
Os SolomonOs Solomon
Os Solomon
 
Super Sizing Youtube with Python
Super Sizing Youtube with PythonSuper Sizing Youtube with Python
Super Sizing Youtube with Python
 
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning TrainingOrion Network Performance Monitor (NPM) Optimization and Tuning Training
Orion Network Performance Monitor (NPM) Optimization and Tuning Training
 
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
Application Performance Troubleshooting 1x1 - Von Schweinen, Schlangen und Pa...
 
Empowering Customers with Personalized Insights
Empowering Customers with Personalized InsightsEmpowering Customers with Personalized Insights
Empowering Customers with Personalized Insights
 
SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017SenchaCon Roadshow Irvine 2017
SenchaCon Roadshow Irvine 2017
 

Plus de Dr Ganesh Iyer

SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewDr Ganesh Iyer
 
SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2Dr Ganesh Iyer
 
SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1Dr Ganesh Iyer
 
SRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelSRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelDr Ganesh Iyer
 
Machine Learning for Statisticians - Introduction
Machine Learning for Statisticians - IntroductionMachine Learning for Statisticians - Introduction
Machine Learning for Statisticians - IntroductionDr Ganesh Iyer
 
Making Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approachMaking Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approachDr Ganesh Iyer
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering ApplicationsDr Ganesh Iyer
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its ApplicationsDr Ganesh Iyer
 
How to become a successful entrepreneur
How to become a successful entrepreneurHow to become a successful entrepreneur
How to become a successful entrepreneurDr Ganesh Iyer
 
Dockers and kubernetes
Dockers and kubernetesDockers and kubernetes
Dockers and kubernetesDr Ganesh Iyer
 
Containerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deploymentContainerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deploymentDr Ganesh Iyer
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering ApplicationsDr Ganesh Iyer
 
Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDr Ganesh Iyer
 
Cloud computing for image processing and bio informatics
Cloud computing for image processing and bio informaticsCloud computing for image processing and bio informatics
Cloud computing for image processing and bio informaticsDr Ganesh Iyer
 
Docker 101 - High level introduction to docker
Docker 101 - High level introduction to dockerDocker 101 - High level introduction to docker
Docker 101 - High level introduction to dockerDr Ganesh Iyer
 
DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0Dr Ganesh Iyer
 
Simplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaSSimplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaSDr Ganesh Iyer
 

Plus de Dr Ganesh Iyer (20)

SRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overviewSRE Demystified - 14 - SRE Practices overview
SRE Demystified - 14 - SRE Practices overview
 
SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2SRE Demystified - 13 - Docs that matter -2
SRE Demystified - 13 - Docs that matter -2
 
SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1SRE Demystified - 10 - Release management-1
SRE Demystified - 10 - Release management-1
 
SRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement ModelSRE Demystified - 04 - Engagement Model
SRE Demystified - 04 - Engagement Model
 
Machine Learning for Statisticians - Introduction
Machine Learning for Statisticians - IntroductionMachine Learning for Statisticians - Introduction
Machine Learning for Statisticians - Introduction
 
Making Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approachMaking Decisions - A Game Theoretic approach
Making Decisions - A Game Theoretic approach
 
Cloud and Industry4.0
Cloud and Industry4.0Cloud and Industry4.0
Cloud and Industry4.0
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering Applications
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
How to become a successful entrepreneur
How to become a successful entrepreneurHow to become a successful entrepreneur
How to become a successful entrepreneur
 
Dockers and kubernetes
Dockers and kubernetesDockers and kubernetes
Dockers and kubernetes
 
Containerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deploymentContainerization Principles Overview for app development and deployment
Containerization Principles Overview for app development and deployment
 
Game Theory and Engineering Applications
Game Theory and Engineering ApplicationsGame Theory and Engineering Applications
Game Theory and Engineering Applications
 
Demystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data ScientistsDemystifying Containerization Principles for Data Scientists
Demystifying Containerization Principles for Data Scientists
 
Cloud computing for image processing and bio informatics
Cloud computing for image processing and bio informaticsCloud computing for image processing and bio informatics
Cloud computing for image processing and bio informatics
 
Io t trends
Io t trendsIo t trends
Io t trends
 
Trends in IoT 2017
Trends in IoT 2017Trends in IoT 2017
Trends in IoT 2017
 
Docker 101 - High level introduction to docker
Docker 101 - High level introduction to dockerDocker 101 - High level introduction to docker
Docker 101 - High level introduction to docker
 
DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0DrGanesh-Jan-17-Resume-V1.0
DrGanesh-Jan-17-Resume-V1.0
 
Simplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaSSimplify enterprise IT with no code platform - aPaaS
Simplify enterprise IT with no code platform - aPaaS
 

Dernier

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Dernier (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

SRE Demystified - 06 - Distributed Monitoring

  • 3. Why Monitor? 3 Analyzing long-term trends Comparing time/iterations Alerting Building dashboards How big is my database and how fast is it growing? How quickly is my daily-active user count growing? Are queries faster with Acme Bucket of Bytes 2.72 versus Ajax DB 3.14? How much better is my memcache hit rate with an extra node? Is my site slower than it was last week? Something is broken, and somebody needs to fix it right now! Or, something might break soon, so attention needed soon. Dashboards should answer basic questions about your service, and normally include some form of the four golden signals Conducting ad hoc retrospective analysis Our latency just shot up; what else happened around the same time? https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
  • 4. Setting Reasonable Expectations for Monitoring • Simpler and faster monitoring systems, with better tools for post hoc analysis • Other uses of monitoring data such as capacity planning and traffic prediction can tolerate more fragility, and thus, more complexity • To keep noise low and signal high, the elements of your monitoring system that direct to a pager need to be very simple and robust • Rules that generate alerts for humans should be simple to understand and represent a clear failure 4https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
  • 5. Example symptoms and causes 5 Symptom Cause I’m serving HTTP 500s or 404s Database servers are refusing connections My responses are slow CPUs are overloaded by a bogosort, or an Ethernet cable is crimped under a rack, visible as partial packet loss Users in Antarctica aren’t receiving animated cat GIFs Your Content Distribution Network hates scientists and felines, and thus blacklisted some client IPs Private content is world-readable A new software push caused ACLs to be forgotten and allowed all requests https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
  • 6. Black-box monitoring White-box monitoring • Restricted to external service behaviour • e.g. ping, http 6 • Internal metrics and logs to surface system issues • e.g. request_errors_total, request_processing_time, DevOps School
  • 9. Choosing an Appropriate Resolution for Measurements • Observing CPU load over the time span of a minute won’t reveal even quite long-lived spikes that drive high tail latencies • For a web service targeting no more than 9 hours aggregate downtime per year (99.9% annual uptime), probing for a 200 (success) status more than once or twice a minute is probably unnecessarily frequent. • Similarly, checking hard drive fullness for a service targeting 99.9% availability more than once every 1–2 minutes is probably unnecessary 9
  • 11. Dr Ganesh Neelakanta Iyer ganesh@ganeshniyer.com ganesh.vigneswara@gmail.com