Managing High Availability with Low Cost

DataLeader.io

Omar del Rio of Sieena Consulting gave this presentation at the Microsoft Pre-MIX11 event ROCK!

Managing high availability with low cost

Experiences from the kitchen

Some definitions
• Scalability is how big you can get.
• Reliability is how consistent you are (in the short term).
• Availability is being reliable and scalable (in the long run).

• Scalability and reliability are not related (one neither causes nor affects the other).
• You can’t have availability without both scalability and reliability.

Without further ado
• The requirement:
– Emergency responder system requires notifications from
emergency workers for availability.
• Results in:
– System that is available 24x7 to respond to notifications.
• Constrained by budget.
• Currently at ~1,200 users/sec

Without further ado
• The requirement:
– Call routing system that must respond to every single request
as fast as possible.
• Results in:
– System that is available 24x7 to respond to calls (marketing
and others).
• Constrained by time.

What we tried first
• What the blogs say
– Be redundant in every part of the system (this gets very
expensive!)
• What the teachers say (by the book)
– Formal engineering (this is very expensive too!)

• What our gut told us
– Test test test!

What we learned
• Scalability is a process, not a destination.
• Reliability is not a matter of QA.

• The tools matter – but not in the traditional sense.
– SQL Server (from 2005 to 2008 R2)
– Windows (from 2003 to 2008 R2)

Some statistics
System availability 2010
14 minutes of failures from at least 2 of 3 monitoring
locations

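[Pie chart breaking down the failures by category. Categories: IIS 7, Network, Framework Bugs, App Bugs, SQL Server - 100% CPU, SQL Server - Mirroring, SQL Server - No reason. Values as extracted: 1%, 8%, 32%, 52%, 37%, 16%, 4%, 2%; the pairing of values to categories did not survive extraction.]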

* Only outages at the core router are displayed here as network problems.
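
To put the 14 minutes in perspective, here is a minimal sketch of the arithmetic, assuming the figure covers the whole calendar year 2010 and that a minute counts as an outage only when at least 2 of the 3 monitoring locations report a failure. The function names are hypothetical; only the numbers 14, 2010, and "2 of 3" come from the slide.

```python
# Hypothetical helpers; the slide only states the raw figures.
MINUTES_IN_2010 = 365 * 24 * 60  # 2010 was not a leap year: 525,600 minutes


def is_outage(probe_failures, quorum=2):
    """Count a minute as an outage only if at least `quorum` locations failed."""
    return sum(probe_failures) >= quorum


def availability_percent(outage_minutes, total_minutes=MINUTES_IN_2010):
    """Share of the period during which the system was reachable, in percent."""
    return 100.0 * (total_minutes - outage_minutes) / total_minutes


print(is_outage([True, False, False]))     # a single failing location is ignored
print(is_outage([True, True, False]))      # two of three failing counts as an outage
print(round(availability_percent(14), 4))  # 99.9973
```

Under those assumptions, 14 minutes of downtime corresponds to roughly 99.997% availability for the year.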

Specific Lessons
• Design code for failure (not for 100% reliability or 0
bugs).
– Redundancy in code is critical.
• Fail fast and fail often.
– Don’t wait until the system fails completely.
• Monitor and validate.
– Monitor as frequently as you can afford.

Design for failure

• Why do it once if you can do it twice?
• Use all available tools
– Bidirectional replication for regional duplication of traffic.
– Cheap load balancers.
– Cheap RAID 10 SATA.
– Don’t trust your database.
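
One way to read "redundancy in code" and "don't trust your database" is to give every important operation more than one independent path. The following is a minimal sketch of that idea, not the presenter's implementation; `first_success`, `read_from_database`, and `read_from_local_cache` are hypothetical names invented for illustration.

```python
def first_success(*attempts):
    """Run each callable in order and return the first result that does not raise."""
    errors = []
    for attempt in attempts:
        try:
            return attempt()
        except Exception as exc:  # deliberately broad: any failure moves to the next path
            errors.append(exc)
    raise RuntimeError(f"all {len(attempts)} paths failed: {errors!r}")


# Hypothetical data sources, standing in for a database and a simpler fallback.
def read_from_database():
    raise ConnectionError("database unavailable")


def read_from_local_cache():
    return {"worker": 42, "status": "available"}


result = first_success(read_from_database, read_from_local_cache)
print(result)
```

The fallback path is kept intentionally simple, so that it is unlikely to share a bug with the primary path.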

Fail fast and fail often
• Specific configuration settings for IIS.
– Yes! App pool recycling increases availability.

• Specific configuration for queuing.
– Use a messaging system that always responds and stores
safely in case the database is not available.

• Make a lot of noise.
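
The queuing point can be illustrated with a store-and-forward spool: the front end always accepts the message and writes it to local disk, and a separate step forwards it once the database is reachable. This is a minimal sketch under assumptions; the slides do not name the messaging system used, and `SpoolQueue` and its methods are hypothetical.

```python
import json
import os
import tempfile


class SpoolQueue:
    """Append messages to a local file so they survive a database outage."""

    def __init__(self, path):
        self.path = path

    def accept(self, message):
        # Succeeds as long as the local disk is writable, database or not.
        with open(self.path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(message) + "\n")
            spool.flush()
            os.fsync(spool.fileno())
        return "accepted"

    def drain(self, save_to_database):
        """Forward spooled messages in order; keep whatever could not be saved."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path, encoding="utf-8") as spool:
            pending = [json.loads(line) for line in spool if line.strip()]
        sent = 0
        for message in pending:
            try:
                save_to_database(message)
            except Exception:
                break  # database still down: stop and retry on the next drain
            sent += 1
        # Rewrite the spool with the unsent remainder (not atomic in this sketch).
        with open(self.path, "w", encoding="utf-8") as spool:
            for message in pending[sent:]:
                spool.write(json.dumps(message) + "\n")
        return sent


def broken_db(message):
    raise ConnectionError("database unavailable")


stored = []
queue = SpoolQueue(os.path.join(tempfile.mkdtemp(), "spool.jsonl"))
queue.accept({"id": 1})
queue.accept({"id": 2})
print(queue.drain(broken_db))      # 0: nothing forwarded, messages stay spooled
print(queue.drain(stored.append))  # 2: both messages forwarded once the "database" works
```

A production version would also need to handle concurrent writers and a crash in the middle of the rewrite; both are left out here to keep the idea visible.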

Monitor and validate
• Monitis
– Cheap, but support is not there.
• Other tools
– Gomez.com – expensive, but if you can afford it, great.
• Inside tools
– Open source and MS tools.
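
For the "inside tools" option, a home-grown check can be very small. The sketch below is an assumption about what such a tool might look like, not a description of the tools the presenter used: `http_probe` and `monitor` are hypothetical names, and the example run uses simulated probe results so it works without a network.

```python
import time
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception:
        return False


def monitor(probe, alert, checks, interval_seconds=0.0, failures_before_alert=2):
    """Run `checks` probes; call `alert` once consecutive failures reach the threshold."""
    consecutive = 0
    for _ in range(checks):
        if probe():
            consecutive = 0
        else:
            consecutive += 1
            if consecutive == failures_before_alert:
                alert(f"{consecutive} consecutive failed checks")
        time.sleep(interval_seconds)


# Simulated probe results so the sketch runs without a network.
results = iter([True, False, False, False, True])
alerts = []
monitor(lambda: next(results), alerts.append, checks=5)
print(alerts)  # ['2 consecutive failed checks']
```

In real use the probe would be something like `lambda: http_probe(health_url)` with a non-zero interval, where `health_url` is whatever endpoint the system exposes, and `alert` would page someone rather than append to a list.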

The bottom line
Database approach
• Design for failure:
– Expect the system to operate without a DB for brief periods of time.
– Do mirroring locally.
– Do replication remotely.
• Traditional route:
– Database clustering.

Hardware approach
• Design for failure:
– Configure for redundancy at the telecom level.
– Configure regional redundancy (invest in another server with another host, make sure network is different enough).
• Traditional route:
– Redundancy everywhere.

Code approach
• Design for failure:
– Design multiple systems that do the same thing in simple ways.
– There is nothing wrong with multiple code paths, even processes.
– Reliability is not having the same bug in the same place.
• Traditional route:
– Design for 0 bugs (the formal method). Increase QA.



Editor's Notes

Mention how the most scalable systems are not always the most available (banks). Mention how the most reliable systems are not the most scalable (phones, specific purpose stuff). Analogy with orange chicken.
Explain the process a bit more. Tell the story from the perspective of those who use it and those who receive a benefit. Also discuss the AR example and Call routing and call tracking.
For reliability, store your session data in the database. For reliability, everything must be redundant. For availability the system must be designed from the ground up.
Explain how scalability is a moving target, maybe add the graphics from IAR. Explain how a system that works now is obsolete faster than you can think of. Mention the debate that is going on with “MySpace failed because of Microsoft”. Lessons from the MS bugs in the ASP.NET session handling code, enhancements in Windows performance over the different versions (from IIS 6 to IIS 7.5).
Explain how availability is defined in terms of users and not in terms of system.
