You can talk to me about:
● Making on-call better for humans
● High Availability and Load Balancing
● Mirroring free software
● Zero trust networking
● Home Automation and garage racks
● Pretty much anything
James Forman
I am a:
Linux Sysadmin, Network Engineer and People Manager
I work for:
https://en.wikipedia.org/wiki/File:Wellington_montage_2.jpg
https://en.wikipedia.org/wiki/File:New_Zealand_relief_map.jpg
Overlay by:
Wikipedia User Hazhk
Catalyst’s Wellington Team - December 2017
OSMC.de 2019
First things first
Content Warning
Photos of earthquake damage
Hot Potato is not a
monitoring system
It’s a message broker
Life before Hot Potato
One pager team
The “pager peeps”
Every person has a pager
One number, multiple pagers
A range of different
monitoring builds
Customer managed, remote, out of country, in
country with pager access, email to pager gateways
18 Monitoring Servers
Nagios 3, Icinga 1.x and Icinga2
Support Hotline
Call number, leave voicemail, wake person up
IRC based handovers
<jforman> Pager on
<redacted> Going to sleep
Why did we build it?
We already wanted a
replacement system
Aging technology
The pager system was becoming unreliable
Image credit: Matthew Inman / theoatmeal.com
The top 3 questions
1. Why not use a service?
SLAs
None of the options could meet our requirements
2. Why not use SMS?
3. Why not go staffed 24/7?
Our other motivations
Open Source
Customer data
Stop sending notifications in cleartext
https://techcrunch.com/2019/10/30/nhs-pagers-medical-health-data/
We thought we had time
to find a replacement
https://www.spark.co.nz/content/dam/kb/public/docs/media-release-paging-network-closure.pdf
At first it was good news
Aging technology
It was too good to be true
The pager network
became unreliable
“In response to Radio New Zealand
queries Spark said it had talked to many
of its customers before the
announcement was made and that
included the Fire Service.”
Time ran out
(in the middle of the night)
NO CARRIER
:(
“1st cab off the rank was those pager numbers
that had not signed up to the new pager
network were disconnected.”
“We have then worked with the customers who
have migrated across to replace their old
access points (ways they send a pager
message) to either Email or an API option.”
“This is because the old access points are
being turned off.”
Photo by:
BRENDON O'HAGAN/FAIRFAX NZ
Solving the
immediate problem
so people could sleep
eMail -> SMS
We sent all notifications via SMS
as an emergency measure
eMail == :(
Nameless project == :)
The first version of
Hot Potato
A really bad “API”
The worst thing I’ve put into production
A dodgy script
Rolled out to all the monitoring servers
Insert and Send
Add to database, send pager message
select * from notifications
A table of notifications
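The "insert and send" flow above can be sketched roughly as follows. This is a minimal illustration, not the script that was rolled out: the use of SQLite, the column names, and the `send` callable are all assumptions made for the example, since the slides only say that a notification was added to a database and then sent as a pager message.

```python
import sqlite3
from datetime import datetime, timezone


def create_schema(conn):
    # Hypothetical schema: the talk only mentions "a table of notifications".
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS notifications (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            received_at TEXT NOT NULL,
            body TEXT NOT NULL,
            sent INTEGER NOT NULL DEFAULT 0
        )
        """
    )


def insert_and_send(conn, body, send):
    """Store the notification first, then try to send it.

    `send` is a placeholder callable that returns True on success.
    """
    cur = conn.execute(
        "INSERT INTO notifications (received_at, body) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), body),
    )
    notification_id = cur.lastrowid
    conn.commit()  # the record survives even if sending fails
    if send(body):
        conn.execute(
            "UPDATE notifications SET sent = 1 WHERE id = ?",
            (notification_id,),
        )
        conn.commit()
    return notification_id


def list_notifications(conn):
    # Equivalent in spirit to "select * from notifications".
    return conn.execute(
        "SELECT id, body, sent FROM notifications ORDER BY id"
    ).fetchall()
```

Writing to the database before sending means a failed delivery still leaves a row behind that someone can look at later.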
A handover button
sends a message saying you have the pager
v0.1 - Much more reliable than email
It worked (mostly)
It gave us the time and
opportunity to do better
We had some goals
Don’t get in the way
make it easy to be on-call
Enable alert reduction
let people sleep
Survive natural hazards
the reality of building systems in NZ
Volcanoes
https://www.nationalgeographic.org/news/plate-tectonics-ring-fire/
Earthquakes
Recent fatal earthquakes
22 February 2011 - Christchurch - 185 people
13 June 2011 - Christchurch - 1 person
14 November 2016 - Kaikoura - 2 people
Diagrams by:
Wikipedia User Mikenorton
Photo by:
New Zealand Defence Force
Photo by:
New Zealand Defence Force
Photo by:
RNZ / Rebekah Parsons-King
Photo of:
MP Stuart Smith
Photo by:
RNZ / Simon Morton
Photo by:
RNZ / Conan Young
Photo by:
RNZ / Aaron Smale
Photo by:
Phillip Pearson
Tsunamis
https://wremo.nz/hazards/tsunami-zones/
Survive any loss of
International Connectivity
we had 1 main undersea cable (2 landings)
Image credit: Tourism New Zealand
https://www.submarinecablemap.com/
Then we had some
requirements
Survive disasters
Earthquakes, tsunamis, volcanoes, team lunches...
Support existing
monitoring
Nagios 3, Icinga 1.x, Icinga2
Get rid of email
No more using email to deliver messages
Confirm message delivery
Move from paging and SMS to Push Notifications
Improve handover
Is your pager on yet? I want to go to sleep
#deathToPagers
“I’d rather have a bee burrow into my
skull than carry a pager again”
- Me
What did we build?
A web app with an API
built with Python and Flask
With a funky database
and some queuing
CockroachDB and RabbitMQ
Our production
environment has 5 nodes
NZ: Porirua, Wellington and Hamilton
AU: Sydney
US: California
How does it work?
Sending notifications
Heartbeats
How does it look?
What else can it do?
Failure notifications
the pager network is down again!
Heartbeats
ensuring connectivity
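One way to picture a heartbeat check is as a record of when each monitoring server was last heard from, plus a rule for how long silence is tolerated. The sketch below is an assumption-laden illustration of that idea, not Hot Potato's implementation: the class name, the `max_silence` threshold, and the server names in the usage note are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


class HeartbeatMonitor:
    """Tracks the last heartbeat per server (hypothetical helper)."""

    def __init__(self, max_silence):
        self.max_silence = max_silence  # a timedelta
        self.last_seen = {}

    def record(self, server, when=None):
        # Remember the most recent heartbeat time for this server.
        self.last_seen[server] = when or datetime.now(timezone.utc)

    def overdue(self, now=None):
        # Servers that have been silent for longer than max_silence.
        now = now or datetime.now(timezone.utc)
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen > self.max_silence
        )
```

For example, with a five minute threshold, a server called `monitor-a` that last reported six minutes ago would appear in `overdue()`, while one that reported two minutes ago would not.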
Teams
put everyone on-call!
Team escalations
because sometimes bad things happen
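The slides do not describe how escalation is decided, so the following is only a sketch of one plausible shape: try each person in an ordered list until someone acknowledges. The function name, the `notify` callable, and the idea that it returns True on acknowledgement are assumptions for illustration.

```python
def escalate(message, on_call_order, notify):
    """Notify people in order until one acknowledges.

    `notify(person, message)` is a placeholder that returns True when the
    person acknowledges. Returns that person, or None if nobody does.
    """
    for person in on_call_order:
        if notify(person, message):
            return person
    return None
```

A `None` result is the case worth alerting on loudly, since it means the whole chain was exhausted.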
Reports
A breakdown of the week that was
Promote alert reduction
With the help of some neopixels
What notification
providers does it support?
Twilio
For delivery of SMS messages
Modica
For delivery of SMS messages and pager messages
Pushover
For delivery of push notifications to Android and iOS
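With several providers available, a broker could try them in order and keep a record of which ones failed. The sketch below assumes that behaviour purely for illustration; the talk does not say Hot Potato falls back this way. The provider entries are plain callables standing in for real integrations, and none of the Twilio, Modica or Pushover APIs are used here.

```python
def send_with_fallback(message, providers):
    """Try each (name, send) pair in order.

    `send` is a placeholder callable that raises an exception on failure.
    Returns (name_of_provider_that_worked_or_None, list_of_failures).
    """
    failures = []
    for name, send in providers:
        try:
            send(message)
            return name, failures
        except Exception as exc:
            # Keep the failure so it can be reported to the team.
            failures.append((name, str(exc)))
    return None, failures
```

The returned failure list is the kind of information a "the pager network is down again!" notice could be built from.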
What’s planned?
Mobile app
for Android and iOS, no more pagers
Support hotline
direct calls to the on-call person or take messages
Planned work
stop forgetting to extend downtime on things
Language support
German and Italian coming soon
What do I need to try it?
What do I need to deploy
it to production?
One server
If you don’t want redundancy,
you don’t have to have it
Demo?
James Forman
Callum Dickinson
Filip Vujičić
Zac Pullar-Strecker
Opal Symes
Rhys Davies
Michael Fincham
Tim Bruce
Jamie McClymont
Toni Gardener
Manuela Spies
Sapir Ben-Shahar
Brynn Wilde
Hemanth Sonthi
Emanuel Evans
Hazel Meehan
Baxter Gray
Sam Banks
Thank you to our contributors
Open Source Academy
https://hotpotato.nz
Questions?
https://hotpotato.nz
@teamHotPotato
#hotpotato on freenode

OSMC 2019 | Hot Potato by James Forman