SlideShare une entreprise Scribd logo
1  sur  35
Monitoring Patterns for
Mitigating Technical Risk
@Forter
#1 risk
Slow or bad (500) API responses
Auto-healing
because humans are slow
SLA, Failover, Degradation, Throttling
Alerting
Detect, Filter, Alert, Diagnostics
SLA
Performance Data Loss Business Logic
TX Processing Low Latency Nope Best Effort
Stream
Processing High Throughput Best Effort Best Effort
Batch Processing High Volume Nope Reconciliation
Automatic Failover
http fencing (Incapsula)
http load balancing (ELB)
instance restart (Scaling Group)
process restart (upstart)
exceptions bubble up and crash
Graceful Degradation
nginx (lua)
expressjs (nodejs)
storm (java)
Stability
Code
Changes
Throttling (without back-pressure)
request priority reduced when TX/sec > thresh
Different priority →
Different queue →
Different worker
lower priority inside queue for test probes
Detect -> Filter -> Alert -> Manual Diagnostics
Alerting
Detection
filter &
route
alert
diagnostics
redundancy
CloudWatch/CollectD - fast, no root cause
App events (exceptions) - too noisy, root cause
Pingdom probes - low coverage, reliable
Internal probes - better coverage, false alarms
cloudwatch
pagerduty
alert
(no riemann)
system test
pagerduty
alert
(riemann needed)
filter tests using a state machine
filter tests using a state machine
(tagged "apisRegression"
(pagerduty-test-dispatch "1234567892ed295d91"))
(defn pagerduty-test-dispatch
[key]
(let [pd (pagerduty key)]
(changed-state {:init "passed"}
(where (state "passed") (:resolve pd))
(where (state "failed") (:trigger pd)))))
re-open manually resolved alert
re-open manually resolved alert
(tagged "apisRegression"
(pagerduty-test-dispatch "1234567892ed295d91"))
(defn pagerduty-test-dispatch
[key]
(let [pd (pagerduty key)]
(sdo
(changed-state {:init "passed"}
(where (state "passed") (:resolve pd)))
(where (state "failed")
(by [:host :service]
(throttle 1 60 (:trigger pd)))))))
Diagnostics - Storm topology timing
Diagnostics - Storm timelines
#2 risk
Slowing down merchant's website
Alerting
Monitor each and every browser
Aggregations (per browser type)
Notify on thresholds
Monitoring our javascript snippet
Timeouts
Exceptions by browser
Exception aggregation
Monitoring new versions
Riemann's Index (server monitoring)
key (host+service) event TTL
10.0.0.1-redisfree { "metric":"5"} 60
10.0.0.1-probe1 {"state":"failed"} 300
10.0.0.2-probe1 { "state":"passed"} 300
Riemann's Index
key (host+service) event TTL
199.25.1.1-1234 {"state":"loaded"} 300
199.25.2.1-4567 {"state":"downloaded"} 300
199.25.3.1-8901 {"state":"loaded"} 300
For our use case:
host=browser-ip, service=cookie
Riemann's state machine
(index)
stores last event and creates expired events (TTL)
(by [:host :service] stream)
creates a new stream for each host/service
(by-host-service stream) - forter's fork only
also closes stream when TTL expires
(defn calc-load-time [& children]
(by-host-service
(changed :state {:pairs? true}
(smap (fn [[previous current]]
(cond
(and (= (:state previous) "downloaded")
(= (:state current) "loaded"))
(assoc previous :metric (- (:time current) (:time previous)))
(and (= (:state previous) "downloaded")
(= (:state current) "expired"))
(assoc previous :metric (* JS_TIMEOUT 1000))))
children))))
(defn aggregate-by-browser [& children]
(by [:browser]
(fixed-time-window 60
(sdo
(smap folds/median
(tag "median-load-time"
children))
(smap folds/count
(tag "load-count"
children)))))))
#3 risk
Wrong decision (approve/decline)
Alerting
Anomaly detection
Motivation
Control false alarms mathematically
Threshold per customer
Threshold seasonality
Alert me if
the probability that we decline
more than k out of n transactions
given probability p
is 1 in a million (t=0.0001%)
n number of tx (30 minutes)
k number of declined txs (30 minutes)
p per customer declined/total (24 hours)
t alert threshold
Binomial Distribution Assumption
External events are uncorrelated
What happens when a customer retries the
same Tx because the first one was declined?
Questions?
email itai@forter.com
http://tech.forter.com
http://www.softwarearchitectureaddict.com

Contenu connexe

Tendances

Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafanatorkelo
 
Introduction to Kong API Gateway
Introduction to Kong API GatewayIntroduction to Kong API Gateway
Introduction to Kong API GatewayYohann Ciurlik
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...Jemin Huh
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introductionRico Chen
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Apache Spark at Airbnb
Apache Spark at AirbnbApache Spark at Airbnb
Apache Spark at AirbnbDatabricks
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Understanding and Extending Prometheus AlertManager
Understanding and Extending Prometheus AlertManagerUnderstanding and Extending Prometheus AlertManager
Understanding and Extending Prometheus AlertManagerLee Calcote
 
Apache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and LogisticsApache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and LogisticsKai Wähner
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistentconfluent
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭台灣資料科學年會
 
Splunk Overview
Splunk OverviewSplunk Overview
Splunk OverviewSplunk
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Brian Brazil
 
Apache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesApache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesKai Wähner
 

Tendances (20)

Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafana
 
Introduction to Kong API Gateway
Introduction to Kong API GatewayIntroduction to Kong API Gateway
Introduction to Kong API Gateway
 
Cloud Monitoring tool Grafana
Cloud Monitoring  tool Grafana Cloud Monitoring  tool Grafana
Cloud Monitoring tool Grafana
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
서비스 모니터링 구현 사례 공유 - Realtime log monitoring platform-PMon을 ...
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Apache Spark at Airbnb
Apache Spark at AirbnbApache Spark at Airbnb
Apache Spark at Airbnb
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Spark streaming + kafka 0.10
Spark streaming + kafka 0.10Spark streaming + kafka 0.10
Spark streaming + kafka 0.10
 
Understanding and Extending Prometheus AlertManager
Understanding and Extending Prometheus AlertManagerUnderstanding and Extending Prometheus AlertManager
Understanding and Extending Prometheus AlertManager
 
Apache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and LogisticsApache Kafka in the Transportation and Logistics
Apache Kafka in the Transportation and Logistics
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
使用 Elasticsearch 及 Kibana 進行巨量資料搜尋及視覺化-曾書庭
 
Splunk Overview
Splunk OverviewSplunk Overview
Splunk Overview
 
Dynatrace
DynatraceDynatrace
Dynatrace
 
Kong
KongKong
Kong
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 
Apache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice ArchitecturesApache Kafka as Event Streaming Platform for Microservice Architectures
Apache Kafka as Event Streaming Platform for Microservice Architectures
 

En vedette

DevOps Picc12 Management Talk
DevOps Picc12 Management TalkDevOps Picc12 Management Talk
DevOps Picc12 Management TalkMichael Rembetsy
 
Car Alarms & Smoke Alarms [Monitorama]
Car Alarms & Smoke Alarms [Monitorama]Car Alarms & Smoke Alarms [Monitorama]
Car Alarms & Smoke Alarms [Monitorama]Dan Slimmon
 
DevOps & Security from an Enterprise Toolsmith's Perspective
DevOps & Security from an Enterprise Toolsmith's PerspectiveDevOps & Security from an Enterprise Toolsmith's Perspective
DevOps & Security from an Enterprise Toolsmith's Perspectivedev2ops
 
Revolutionizing Enterprise Software Development through Continuous Delivery &...
Revolutionizing Enterprise Software Development through Continuous Delivery &...Revolutionizing Enterprise Software Development through Continuous Delivery &...
Revolutionizing Enterprise Software Development through Continuous Delivery &...People10 Technosoft Private Limited
 
AMIS 25: DevOps Best Practice for Oracle SOA and BPM
AMIS 25: DevOps Best Practice for Oracle SOA and BPMAMIS 25: DevOps Best Practice for Oracle SOA and BPM
AMIS 25: DevOps Best Practice for Oracle SOA and BPMMatt Wright
 
Achieving DevOps using Open Source Tools in the Enterprise
Achieving DevOps using Open Source Tools in the EnterpriseAchieving DevOps using Open Source Tools in the Enterprise
Achieving DevOps using Open Source Tools in the EnterpriseCollabNet
 
Continuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two ApproachesContinuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two ApproachesRoss Snyder
 
DevOps: A Culture Transformation, More than Technology
DevOps: A Culture Transformation, More than TechnologyDevOps: A Culture Transformation, More than Technology
DevOps: A Culture Transformation, More than TechnologyCA Technologies
 
Accenture DevOps: Delivering applications at the pace of business
Accenture DevOps: Delivering applications at the pace of businessAccenture DevOps: Delivering applications at the pace of business
Accenture DevOps: Delivering applications at the pace of businessAccenture Technology
 

En vedette (11)

DevOps Picc12 Management Talk
DevOps Picc12 Management TalkDevOps Picc12 Management Talk
DevOps Picc12 Management Talk
 
Car Alarms & Smoke Alarms [Monitorama]
Car Alarms & Smoke Alarms [Monitorama]Car Alarms & Smoke Alarms [Monitorama]
Car Alarms & Smoke Alarms [Monitorama]
 
DevOps & Security from an Enterprise Toolsmith's Perspective
DevOps & Security from an Enterprise Toolsmith's PerspectiveDevOps & Security from an Enterprise Toolsmith's Perspective
DevOps & Security from an Enterprise Toolsmith's Perspective
 
Revolutionizing Enterprise Software Development through Continuous Delivery &...
Revolutionizing Enterprise Software Development through Continuous Delivery &...Revolutionizing Enterprise Software Development through Continuous Delivery &...
Revolutionizing Enterprise Software Development through Continuous Delivery &...
 
AMIS 25: DevOps Best Practice for Oracle SOA and BPM
AMIS 25: DevOps Best Practice for Oracle SOA and BPMAMIS 25: DevOps Best Practice for Oracle SOA and BPM
AMIS 25: DevOps Best Practice for Oracle SOA and BPM
 
Achieving DevOps using Open Source Tools in the Enterprise
Achieving DevOps using Open Source Tools in the EnterpriseAchieving DevOps using Open Source Tools in the Enterprise
Achieving DevOps using Open Source Tools in the Enterprise
 
Continuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two ApproachesContinuous Deployment at Etsy: A Tale of Two Approaches
Continuous Deployment at Etsy: A Tale of Two Approaches
 
DevOps
DevOpsDevOps
DevOps
 
DevOps: A Culture Transformation, More than Technology
DevOps: A Culture Transformation, More than TechnologyDevOps: A Culture Transformation, More than Technology
DevOps: A Culture Transformation, More than Technology
 
Introducing DevOps
Introducing DevOpsIntroducing DevOps
Introducing DevOps
 
Accenture DevOps: Delivering applications at the pace of business
Accenture DevOps: Delivering applications at the pace of businessAccenture DevOps: Delivering applications at the pace of business
Accenture DevOps: Delivering applications at the pace of business
 

Similaire à Monitoring patterns for mitigating technical risk

"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.Vladimir Pavkin
 
Logging makes perfect - Riemann, Elasticsearch and friends
Logging makes perfect - Riemann, Elasticsearch and friendsLogging makes perfect - Riemann, Elasticsearch and friends
Logging makes perfect - Riemann, Elasticsearch and friendsItamar
 
Failure Self Defense: Defend your App against failures in a (micro) services ...
Failure Self Defense: Defend your App against failures in a (micro) services ...Failure Self Defense: Defend your App against failures in a (micro) services ...
Failure Self Defense: Defend your App against failures in a (micro) services ...Tony Fabeen
 
Zabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshopZabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshopZabbix
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureRobert Metzger
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseKostas Tzoumas
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMjavier ramirez
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Puppet
 
Practical non blocking microservices in java 8
Practical non blocking microservices in java 8Practical non blocking microservices in java 8
Practical non blocking microservices in java 8Michal Balinski
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Nagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner NeunteuflNagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner NeunteuflNETWAYS
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Amazon Web Services
 
Reactive Microservices with JRuby and Docker
Reactive Microservices with JRuby and DockerReactive Microservices with JRuby and Docker
Reactive Microservices with JRuby and DockerJohn Scattergood
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Robert Metzger
 
Network Automation with Salt and NAPALM: a self-resilient network
Network Automation with Salt and NAPALM: a self-resilient networkNetwork Automation with Salt and NAPALM: a self-resilient network
Network Automation with Salt and NAPALM: a self-resilient networkCloudflare
 
Monitoring with Syslog and EventMachine
Monitoring with Syslog and EventMachineMonitoring with Syslog and EventMachine
Monitoring with Syslog and EventMachineWooga
 
Cisco Router Security
Cisco Router SecurityCisco Router Security
Cisco Router Securitykktamang
 
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamThe post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamStewart Needham
 
FME Server Meets the Challenge of Real-time
FME Server Meets the Challenge of Real-timeFME Server Meets the Challenge of Real-time
FME Server Meets the Challenge of Real-timeSafe Software
 

Similaire à Monitoring patterns for mitigating technical risk (20)

"How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics."How about no grep and zabbix?". ELK based alerts and metrics.
"How about no grep and zabbix?". ELK based alerts and metrics.
 
Logging makes perfect - Riemann, Elasticsearch and friends
Logging makes perfect - Riemann, Elasticsearch and friendsLogging makes perfect - Riemann, Elasticsearch and friends
Logging makes perfect - Riemann, Elasticsearch and friends
 
Failure Self Defense: Defend your App against failures in a (micro) services ...
Failure Self Defense: Defend your App against failures in a (micro) services ...Failure Self Defense: Defend your App against failures in a (micro) services ...
Failure Self Defense: Defend your App against failures in a (micro) services ...
 
Zabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshopZabbix Smart problem detection - FISL 2015 workshop
Zabbix Smart problem detection - FISL 2015 workshop
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architecture
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAMAnalitica de datos en tiempo real con Apache Flink y Apache BEAM
Analitica de datos en tiempo real con Apache Flink y Apache BEAM
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
 
Practical non blocking microservices in java 8
Practical non blocking microservices in java 8Practical non blocking microservices in java 8
Practical non blocking microservices in java 8
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Nagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner NeunteuflNagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
Nagios Conference 2007 | Nagios in very large Environments by Werner Neunteufl
 
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
Streaming data analytics (Kinesis, EMR/Spark) - Pop-up Loft Tel Aviv
 
Reactive Microservices with JRuby and Docker
Reactive Microservices with JRuby and DockerReactive Microservices with JRuby and Docker
Reactive Microservices with JRuby and Docker
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
Network Automation with Salt and NAPALM: a self-resilient network
Network Automation with Salt and NAPALM: a self-resilient networkNetwork Automation with Salt and NAPALM: a self-resilient network
Network Automation with Salt and NAPALM: a self-resilient network
 
Monitoring with Syslog and EventMachine
Monitoring with Syslog and EventMachineMonitoring with Syslog and EventMachine
Monitoring with Syslog and EventMachine
 
Cisco Router Security
Cisco Router SecurityCisco Router Security
Cisco Router Security
 
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamThe post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
 
FME Server Meets the Challenge of Real-time
FME Server Meets the Challenge of Real-timeFME Server Meets the Challenge of Real-time
FME Server Meets the Challenge of Real-time
 

Dernier

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 

Dernier (20)

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 

Monitoring patterns for mitigating technical risk

Notes de l'éditeur

  1. riemann is an event streaming processing server written in clojure tailored for monitoring. It receives input from the infrastructure (OS/queues/DBs) and instrumented applications (all prog languages) processes the events (enrichement, filtering, aggregation) and forwards them for visualization and alerting. explain the pros/cons of infra monitoring (plugins), application monitoring (custom events), and system probes. the need for processing before a machine wakes you up at night (the role of pagerduty in all of this) and the need for visualization when you wake up at night.
  2. riemann is an event streaming processing server written in clojure tailored for monitoring. It receives input from the infrastructure (OS/queues/DBs) and instrumented applications (all prog languages) processes the events (enrichement, filtering, aggregation) and forwards them for visualization and alerting. explain the pros/cons of infra monitoring (plugins), application monitoring (custom events), and system probes. the need for processing before a machine wakes you up at night (the role of pagerduty in all of this) and the need for visualization when you wake up at night.
  3. riemann is an event streaming processing server written in clojure tailored for monitoring. It receives input from the infrastructure (OS/queues/DBs) and instrumented applications (all prog languages) processes the events (enrichement, filtering, aggregation) and forwards them for visualization and alerting. explain the pros/cons of infra monitoring (plugins), application monitoring (custom events), and system probes. the need for processing before a machine wakes you up at night (the role of pagerduty in all of this) and the need for visualization when you wake up at night.
  4. riemann is an event streaming processing server written in clojure tailored for monitoring. It receives input from the infrastructure (OS/queues/DBs) and instrumented applications (all prog languages) processes the events (enrichement, filtering, aggregation) and forwards them for visualization and alerting. explain the pros/cons of infra monitoring (plugins), application monitoring (custom events), and system probes. the need for processing before a machine wakes you up at night (the role of pagerduty in all of this) and the need for visualization when you wake up at night.
  5. riemann is an event streaming processing server written in clojure tailored for monitoring. It receives input from the infrastructure (OS/queues/DBs) and instrumented applications (all prog languages) processes the events (enrichement, filtering, aggregation) and forwards them for visualization and alerting. explain the pros/cons of infra monitoring (plugins), application monitoring (custom events), and system probes. the need for processing before a machine wakes you up at night (the role of pagerduty in all of this) and the need for visualization when you wake up at night.
  6. On the left: example for cloudwatch alert for the load balancer detecting REST API error 500 (internal server error) On the right: example for escalation policies. alert for a system test (probe) that failed.
  7. On the left: example for cloudwatch alert for the load balancer detecting REST API error 500 (internal server error) On the right: example for escalation policies. alert for a system test (probe) that failed.
  8. PagerDuty has an alerts panel. An alert is either triggered or resolved. PD assigns the alert to whoever is on duty and then notifies him based on predefined rules (for example call if the alert has not been resolved for 15 minutes). However, PagerDuty API has a throttler. We cannot send every test result to PD every time the test runs. In essence we want to replicate the riemann state into PD. Only when the riemann state changes (a test that previously passed now fails or vica versa) we update PD. Riemann makes it easy to define a state machine by storing the last's event per [host+service] combination.
  9. PagerDuty has an alerts panel. An alert is either triggered or resolved. PD assigns the alert to whoever is on duty and then notifies him based on predefined rules (for example call if the alert has not been resolved for 15 minutes). However, PagerDuty API has a throttler. We cannot send every test result to PD every time the test runs. In essence we want to replicate the riemann state into PD. Only when the riemann state changes (a test that previously passed now fails or vica versa) we update PD. Riemann makes it easy to define a state machine by storing the last's event per [host+service] combination.
  10. But what happens if we manually resolved the alert on PD and the test keeps failing? You would want the alert to be repoened again. This means that we need to filter passed test events using state machine, but not filter failed tests at all. We just throttle them to no more than 1 per minute per test permutation [host service] combination.
  11. But what happens if we manually resolved the alert on PD and the test keeps failing? You would want the alert to be repoened again. This means that we need to filter passed test events using state machine, but not filter failed tests at all. We just throttle them to no more than 1 per minute per test permutation [host service] combination.
  12. filtering widgets that were not been possible without riemann enrichmenent. The quick filters allow fast zoom-in on the branch you are working on, and werether you would like to see test probes or not.