Web 2.0 Performance and Reliability: How to Run Large Web Apps
Artur Bergman
          sky@crucially.net
• Wikia Inc
  – We are hiring
  – Community/Bizdev in Germany
  – Engineers in Poland
  – http://www.wikia.com/wiki/hiring
• O’Reilly Radar
  – http://radar.oreilly.com/artur/
The value of operations
•   Google
•   Orkut
•   Friendster
•   Myspace
Benefits
•   Users trust your brand
•   They rely on you
•   They spend more time on your site
•   Bad operations waste R&D money

• Fixed amount of time + faster site =
  more page views
Stepchild of Engineering
• Product development
• Engineering
• Operations
  – Sysadmins?
• Why?
Operations Engineering
• It is engineering
• Google's term:
  – Site Reliability Engineer
• Sure, there are sysadmins too: people
  managing NOCs and datacenters
• Provide career growth
Good Engineers
•   Detail Oriented
•   Aspire to be operational engineers
•   Stubborn
•   Can steer their inner ADD
    – Interrupt driven
• Not the same as good developers
Danger signs
• Thinks operations is a path to
  development engineering
  – Fire them
• Want people dedicated to the task
• A good operations engineer should
  spend some time in development
• A good development engineer MUST
  spend some time in operations
Debugging
• 9 Rules of debugging
• http://www.debuggingrules.com/Poster_download.html
  – Yes, the font is horrible
Rule 1:
       Understand the system
•   Complexity Kills
•   No excuse
•   If you write it, you must know it
•   If you run it, you must know it
•   If you buy it, you must know it
Rule 3:
      Quit thinking and look
• “It is a capital mistake to theorize before
  one has data. Insensibly one begins to
  twist facts to suit theories, instead of
  theories to suit facts.” (Sherlock Holmes,
  A Scandal in Bohemia)
Rule 3:
        Quit thinking and look
•   What do you look at?
•   The importance of monitoring
•   Monitoring
•   Monitoring
•   Monitoring
My, my: confusing terms
• Monitoring
• Alerting
• Trending
Monitoring
•   Collects data
•   Puts into databases
•   Makes it available for you
•   Active collection
•   Passive interaction
Alerting
• Acts on monitoring data
• Severe alerts
  – Active
  – Needs action
• Passive alerts
  – Things that need to be done but not right now
• DO NOT OVER ALERT
• DO NOT CRY WOLF
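One way to avoid crying wolf is to page only after a check has failed several times in a row. This is a minimal sketch, assuming a POSIX shell, a hypothetical state file and a threshold of three consecutive failures; `record_result`, `STATE_FILE` and `FAIL_LIMIT` are illustrative names, not part of Nagios or any other tool listed here.

```shell
#!/bin/sh
# Page only after FAIL_LIMIT consecutive failures, so a single
# transient blip does not wake anyone up. All names are hypothetical.
FAIL_LIMIT=${FAIL_LIMIT:-3}
STATE_FILE=${STATE_FILE:-/tmp/check_failures}

# record_result ok|fail -> prints "page" on the run that reaches the
# limit, otherwise "quiet". The failure counter lives in STATE_FILE.
record_result() {
  count=0
  [ -f "$STATE_FILE" ] && count=$(cat "$STATE_FILE")
  if [ "$1" = ok ]; then
    count=0                      # any success resets the streak
  else
    count=$((count + 1))
  fi
  echo "$count" > "$STATE_FILE"
  if [ "$count" -eq "$FAIL_LIMIT" ]; then
    echo page                    # alert once per outage, not every run
  else
    echo quiet
  fi
}
```

Paging exactly when the counter reaches the limit, rather than on every failing run after it, keeps one outage from turning into a stream of identical alerts.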
Wikia alerting strategy
•   When the site is slow
•   Or down
•   We send emails and make phone calls
•   Staff in Europe and on the US West Coast
•   Looking to hire in East Asia
•   No night shifts
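A slow-or-down check along these lines could run from cron every minute. This is a sketch, assuming curl is installed and using a hypothetical 2-second threshold; `classify` and `check_url` are illustrative names, and the email or phone step is left to whatever gateway you already use.

```shell
#!/bin/sh
# Hypothetical threshold in seconds; tune it for your own site.
SLOW_THRESHOLD=${SLOW_THRESHOLD:-2.0}

# classify TIME_TOTAL HTTP_CODE -> prints OK, SLOW or DOWN
classify() {
  time_total=$1
  http_code=$2
  case $http_code in
    2??|3??) ;;                  # success or redirect: check the timing
    *) echo DOWN; return ;;      # errors, and "000" for no connection
  esac
  if awk -v t="$time_total" -v limit="$SLOW_THRESHOLD" \
       'BEGIN { exit !(t > limit) }'; then
    echo SLOW
  else
    echo OK
  fi
}

# check_url URL -> fetches the page once and classifies the result.
# curl still prints the -w string on failure, with http_code "000".
check_url() {
  result=$(curl -s -o /dev/null --max-time 10 \
    -w '%{time_total} %{http_code}' "$1")
  classify $result               # unquoted on purpose: split into 2 args
}
```

For example, `status=$(check_url http://www.example.com/)` followed by `[ "$status" = OK ] || echo "site is $status" | mail -s "ALERT" oncall@example.com`, where the URL and address are placeholders.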
Trending
• Long term
• Capacity planning
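For capacity planning, even a straight-line fit over trending data gives a rough idea of when something fills up. This is a sketch, assuming samples arrive as `x y` lines sorted by x (for example day number and disk usage in GB); `days_until_full` is a hypothetical helper, and linear growth is itself an assumption.

```shell
#!/bin/sh
# days_until_full CAPACITY < samples
# Least-squares line through "x y" samples, then how many x-units after
# the last sample until y reaches CAPACITY. Prints "never" if usage is
# flat or shrinking, "unknown" if there is too little data to fit.
days_until_full() {
  awk -v cap="$1" '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2; last = $1 }
    END {
      denom = n * sxx - sx * sx
      if (n < 2 || denom == 0) { print "unknown"; exit }
      slope = (n * sxy - sx * sy) / denom
      intercept = (sy - slope * sx) / n
      if (slope <= 0) { print "never"; exit }
      printf "%.1f\n", (cap - intercept) / slope - last
    }'
}
```

`printf '0 100\n1 110\n2 120\n' | days_until_full 200` prints `8.0`: usage grows 10 GB a day and the last sample is 80 GB short of 200.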
Monitor Tools
•   Nagios
•   Cacti
•   MRTG
•   Hyperic
•   Cricket
•   Ganglia
External Monitoring
• Use one: it tells you what your clients see
  every x minutes
• Keynote
• Gomez
• Websitepulse (cheap, easy, I like
  them; no annoying sales force)
Nagios
•   Alerting
•   Hassle
•   C CGI??
•   Doesn’t scale
Hyperic
• Most exciting open source tool
• Agent-based, self-configuring
• Baseline alerting
Cricket, MRTG, Cacti
• Impossible to configure
• You need to write tools to do it
• Especially Cacti
  – Somewhat more pleasant than clawing out
    your eyes
Ganglia
• We love ganglia
• Automatically graphs everything you
  want - just works
• Large scale clusters
• Multicast
• Zero config
• RRD
http://ganglia.wikimedia.org/
•   270 hosts
•   880 CPU
•   2 clusters
•   1.2 TB of Memory
Custom Ganglia Gmetrics
• Write your own
• Or learn Unix

# Age (seconds) of the longest-running non-sleeping MySQL thread.
# -N drops the header row; column 6 of SHOW PROCESSLIST is Time.
oldest=$(mysql -uroot -ppass -N -e 'SHOW PROCESSLIST' |
  grep -v -e Sleep -e 'system user' | cut -f 6 | sort -nr | head -1)
gmetric --name='Oldest query' --type=int32 --units='sec' \
  --dmax=65 --value="${oldest:-0}"
Something is wrong

• Don’t worry, data warehouse
tcpdump / Wireshark
•   If you suspect the network
•   Don’t just suspect
•   LOOK AT IT
•   tcpdump / Wireshark will tell you
    – If your packets are lost, delayed or
      corrupted
    – If your windowing is wrong
Rule 4: Divide and Conquer
• Look at the problems in turn
• Split between people
• Go in the order you suspect is the most
  likely
Rule 5:
 Change one thing at a time
• I cannot stress this enough
• IF YOU DO NOT THEN YOU HAVE
  FAILED TO IDENTIFY THE PROBLEM
Rule 6:
        Keep an audit trail
• You might be making things worse
• Good for the root cause analysis
• Have your shell log all commands
  – Good practice anyway
• Version control
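One way to make bash log every interactive command, assuming bash and a running syslog daemon; the `local1` facility, the `shell-audit` tag and the `log_last_command` name are arbitrary choices for illustration, not a standard.

```shell
# ~/.bashrc fragment (bash only): send each interactive command to
# syslog together with the user and working directory.
log_last_command() {
  local cmd
  # "history 1" prints the latest entry as "  NUM  command"; drop NUM
  cmd=$(history 1 | sed 's/^ *[0-9]* *//')
  logger -p local1.notice -t shell-audit "user=$USER pwd=$PWD cmd=$cmd"
}
PROMPT_COMMAND=log_last_command
```

Two caveats: pressing Enter on an empty line logs the previous command again, and assigning `PROMPT_COMMAND` this way replaces any value it already had. It is an audit aid, not a tamper-proof record.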
Rule 9:
    If you didn’t fix it, it ain’t fixed
•   You must do something to fix a problem
•   Or it will bite you again
•   And again
•   And again
•   They don’t just appear and disappear
•   Except BGP route convergence :)
Process
• You need a little
• Don’t worry
Don’t forget
Complexity kills
•   Design against it
•   Reuse components
•   Define standards
•   Have a few images that all machines
    are built from; reimage machines every now
    and then just for the heck of it.
    – EC2 forces you to do this
MTBF
Mean Time Between Failures
• Actually mostly irrelevant
• Dealing with failure is more important
• Target the right uptime
  – Complexity scales exponentially with
    required uptime
• Don’t kid yourself, you don’t need 5
  nines
MTTR
  Mean Time To Recovery
• Important
• No one cares if you fail once a minute
  – If you recover in 50 ms
• If you are down 1 minute a week, you
  are still going to hit 4 nines (99.99%)
• Failures happen, plan how to deal with
  them
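The four-nines figure holds, though only just. Taking availability as uptime over a measurement period T with downtime D:

```latex
\begin{aligned}
A &= \frac{T - D}{T} \\
T_{\text{week}} &= 7 \times 24 \times 60 = 10\,080\ \text{min} \\
A_{\text{1 min/week}} &= 1 - \frac{1}{10\,080} \approx 0.999901 = 99.990\% \\
D_{\max}(99.99\%,\ \text{week}) &= 10^{-4} \times 10\,080\ \text{min} \approx 1.01\ \text{min} \\
D_{\max}(99.999\%,\ \text{year}) &= 10^{-5} \times 525\,600\ \text{min} \approx 5.3\ \text{min}
\end{aligned}
```

One minute a week fits inside the four-nines budget of about 1.01 minutes; five nines leaves roughly five minutes for an entire year.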
Problem found
• If it is critical, start a phone conversation
• Use IRC to communicate technical data
• One person liaises with non-technical
  staff
• One person specifically in command
• Sleep scheduling (the audit log is important)
Post crisis
• Root cause analysis
  – Just find out what went wrong
  – And how to avoid it
  – Or fix it faster next time if you can’t
• Keep track of your uptime
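Keeping track of uptime can start very simply. This is a sketch, assuming a hypothetical incident log with one outage per line as `<start> <minutes_down>`; `availability` is an illustrative helper, not an existing tool.

```shell
#!/bin/sh
# availability PERIOD_MINUTES < incident_log
# Sums the second column (minutes of downtime per incident) and prints
# the availability for the period as a percentage.
availability() {
  awk -v period="$1" '
    { down += $2 }
    END { printf "%.3f%%\n", 100 * (period - down) / period }'
}
```

A 30-day month is 43,200 minutes, so an incident log totalling 42 minutes of downtime gives `99.903%`.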
Automation
•   All machines are created equal
•   Seriously
•   If you manually make changes
•   You are wrong
    – Unless you know what you are doing
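A cheap way to catch hand-made changes is to compare a machine's files against checksums recorded on the gold image. This is a sketch, assuming GNU coreutils `md5sum` (for `-c` and `--quiet`); `make_manifest` and `check_drift` are hypothetical helper names. Configuration tools such as cfengine or Puppet go further and correct the drift instead of just reporting it.

```shell
#!/bin/sh
# Compare files against checksums taken from a known-good ("gold")
# machine. Assumes GNU coreutils md5sum.

# make_manifest FILE... -> prints "checksum  path" lines for the files
make_manifest() {
  md5sum "$@"
}

# check_drift MANIFEST -> prints each file whose contents no longer match
# (or that cannot be read), one per line; returns non-zero on any drift
check_drift() {
  status=0
  out=$(md5sum -c --quiet "$1" 2>/dev/null) || status=$?
  # --quiet suppresses the "OK" lines, leaving only "path: FAILED..."
  printf '%s\n' "$out" | sed -n 's/: FAILED.*$//p'
  return "$status"
}
```

Typical use would be `make_manifest /etc/ntp.conf /etc/resolv.conf > gold.md5` on the gold machine, then `check_drift gold.md5` from cron everywhere else (paths are examples).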
Best practices
•   Version control
•   Gold images
•   Centralised authentication
•   Time Sync ( NTP )
•   Central logging
•   (All of this applies to virtual machines
    too!)
cfengine
•   Standard automation tool
•   Written in C
•   Not much support
•   Very good
•   Very annoying
control:
   site    = ( mysite )
   domain  = ( mysite.country )
   sysadm  = ( mark )
   netmask = ( 255.255.255.0 )
   actionsequence = ( mountall mountinfo addmounts mountall links )
   mountpattern   = ( /$(site)/$(host) )
   homepattern    = ( u? )
Puppet
•   New hip kid on the block
•   Written in Ruby
•   Better support?
•   Much nicer syntax
•   Easier to extend
define yumrepo($enabled = true) {
  configfile { "/etc/yum.repos.d/$name.repo":
    mode   => 644,
    source => "yum/repos/$name.repo",
    ensure => $enabled ? {
      true    => file,
      default => absent
    }
  }
}
cobbler
• Automatic PXE Installer
    – Uses kickstart files
•   Red Hat Enterprise Linux
•   CentOS
•   Fedora
•   Some support for Debian
cobbler
cobbler system add \
  --name=xen8 \
  --mac=00:19:B9:EE:6D:0A \
  --ip=10.10.30.208 \
  --profile=Centos-5-x86_64 \
  --kopts='ksdevice=00:19:B9:EE:6D:0A console=ttyS1,57600 console=tty0'
koan
• Client install tool
  – Xen
  – Or OS re-image


koan --server=10.10.30.205 --virt --profile=virt_fc6 --virt-name=otrs
Your datacenter
• Keep it tidy
   – Label things, keep cables as short as possible
   – Have a switch in each rack
• If you are small without dedicated DC staff
  you need
   – Remote control power switches
   – Remote console!
Virtualization
•   Please use it
•   Managing becomes much easier
•   Power consumption
•   Need a new test box
    – The requestor can have it in minutes
Power consumption
• Maybe not as important in Europe
• 8-core machines are more power-efficient than
  single-core ones
• But memcache uses one core and all the RAM
• Get more RAM and virtualise
Our network admin boxes
•   1 Xen CPU for Vyatta
•   1 Xen CPU for LVS
•   1 Xen CPU for Squid (CARP)
•   1 Xen CPU for Squid
•   1 Xen CPU for Monitoring
•   1 Xen CPU for network tasks

• We can have more of these and a loss of one
  affects us less
Vyatta
• Opensource router
  – Really like it
  – No need to use Cisco
LVS
•   Linux Virtual Server
•   Low level load balancer
•   HA
•   Fast
•   Doesn’t tempt people to put things in
    the one place that is hard to scale
Squid CARP
• Squids are configured to hash the URLs and
  send each one to a specific backend
• Very little configuration done
• Logging over UDP, so no disk I/O
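The idea behind hashing URLs to backends can be sketched with highest-random-weight (rendezvous) hashing, the family CARP belongs to: score every backend against the URL and send the request to the highest score. This uses POSIX `cksum` as a stand-in hash, so it is an illustration, not Squid's actual CARP implementation; `pick_backend` and the backend names are hypothetical.

```shell
#!/bin/sh
# pick_backend URL BACKEND... -> prints the backend that should serve URL.
# Each backend is scored by hashing "backend URL"; the highest score wins.
# Removing a backend only remaps the URLs that backend owned.
pick_backend() {
  url=$1
  shift
  best=
  best_score=-1
  for backend in "$@"; do
    # cksum prints "CRC SIZE"; keep the CRC as the score
    score=$(printf '%s %s' "$backend" "$url" | cksum | cut -d' ' -f1)
    if [ "$score" -gt "$best_score" ]; then
      best=$backend
      best_score=$score
    fi
  done
  printf '%s\n' "$best"
}
```

Because each URL always lands on the same Squid, every object is cached once across the pool instead of once per Squid, and losing one backend only moves the URLs it owned.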
Squid
• As a reverse proxy (web accelerator)
• 90% of our hits served from RAM in less than
  1 ms
• Same as Wikipedia
• We only use RAM cache (unlike Wikipedia)
• Cached per user
• If not cacheable, cache for one second anyway
  to reduce backend load
App servers
• 1 Xen CPU for memcache (5 GB RAM)
• 1 Xen CPU for Squid (5 GB RAM)
• 6 Xen CPUs for Apache (6 GB RAM)

• More power efficient, less affected by
  loss
• Applications can’t affect each other
Databases
• Keep developers on short leash
• Report bad queries
• Fear object relational mappers
Outsourcing
• As much as possible
• The younger you are as a company, the
  lower the risk
  – When you have no users, you have no
    value
• VCs don’t like having their money go
  into Capex
What I want from Vendors
• They do what they say they will do
• They do what I tell them

• No annoying upsells, no premium
  services
  – I know more about what you are selling
    than you
Services we use
• Amazon EC2 and S3
• Panther-Express
Panther Express
• Fantastic Content Distribution Network
• Cheap, simple price list
  – Take note, Akamai
• Cut delivery time to Europe by 70%
• We let our images be cached for 1 second
  to reduce load
EC2 and S3
•   We save all our binlogs to S3
•   We save database dumps to S3
•   We have monitors running from EC2
•   We plan to build a datawarehouse
    cluster on EC2
EC2 Requires Automation
• Machine is blank when you bring it up
• Download database dump from S3 and
  replicate up - automatically
• Use puppet
• Amazon saves you hardware
  headaches
  – But complexity is still a problem
Thank you

 
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
 
Eluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort Service
Eluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort ServiceEluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort Service
Eluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort Service
 
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxB.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
 
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
Falcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in indiaFalcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in india
 
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service NoidaCall Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...
Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...
Call Girls Zirakpur👧 Book Now📱7837612180 📞👉Call Girl Service In Zirakpur No A...
 
BAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
BAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRLBAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
BAGALUR CALL GIRL IN 98274*61493 ❤CALL GIRLS IN ESCORT SERVICE❤CALL GIRL
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
 

Web 2.0 Performance and Reliability: How to Run Large Web Apps

  • 1. Artur Bergman sky@crucially.net • Wikia Inc – We are hiring – Community/Bizdev in Germany – Engineers in Poland – http://www.wikia.com/wiki/hiring • O’Reilly Radar – http://radar.oreilly.com/artur/
  • 2. The value of operations • Google • Orkut • Friendster • Myspace
  • 3. Benefits • Users trust your brand • They rely on you • They spend more time on your site • Bad operations waste R&D money • Users give you a fixed amount of time; a faster site turns it into more page views
  • 4. Stepchild of Engineering • Product development • Engineering • Operations – Sysadmins? • Why?
  • 5. Operations Engineering • It is engineering • Google terminology – Site Reliability Engineer • Sure, there are sysadmins too, people managing NOCs and datacenters • Provide career growth
  • 6. Good Engineers • Detail-oriented • Aspire to be operations engineers • Stubborn • Can steer their inner ADD – Interrupt driven • Not the same as good developers
  • 7. Danger signs • Thinks operations is a path to development engineering – Fire them • Want people dedicated to the task • A good operations engineer should spend some time in development • A good development engineer MUST spend some time in operations
  • 9. Debugging • 9 Rules of debugging • http://www.debuggingrules.com/Poster_download.html – Yes, the font is horrible
  • 10. Rule 1: Understand the system • Complexity Kills • No excuse • If you write it, you must know it • If you run it, you must know it • If you buy it, you must know it
  • 11. Rule 3: Quit thinking and look • “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”
  • 12. Rule 3: Quit thinking and look • What do you look at? • The importance of monitoring • Monitoring • Monitoring • Monitoring
  • 13. My, my, confusing terms • Monitoring • Alerting • Trending
  • 14. Monitoring • Collects data • Puts it into databases • Makes it available to you • Active collection • Passive interaction
  • 15. Alerting • Acts on monitoring data • Severe alerts – Active – Needs action • Passive alerts – Things that need to be done but not right now • DO NOT OVER ALERT • DO NOT CRY WOLF
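A minimal Python sketch of the "do not cry wolf" rule: only alert after several consecutive failed checks, and only once per incident, re-arming after recovery. The `Alerter` class and the threshold of three are illustrative assumptions, not taken from any tool named in these slides, although many monitoring tools offer a similar retry-before-notify setting.

```python
from dataclasses import dataclass


@dataclass
class Alerter:
    """Fire one alert per incident, only after `threshold` consecutive failed checks."""

    threshold: int = 3      # assumed value: failed checks required before alerting
    failures: int = 0       # current run of consecutive failures
    alerted: bool = False   # whether this incident has already been reported

    def observe(self, check_ok: bool) -> bool:
        """Record one check result; return True only when a new alert should be sent."""
        if check_ok:
            # Recovery: reset the counter and re-arm for the next incident.
            self.failures = 0
            self.alerted = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            return True
        return False
```

A single failed check (a flapping network, a slow poll) never pages anyone; a sustained outage pages exactly once.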
  • 16. Wikia alerting strategy • When the site is slow • Or down • We send emails and do phone calls • Europe and US West coast • Looking to hire in East Asia • No night time
  • 17. Trending • Long term • Capacity planning
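Trending feeds capacity planning. As a rough sketch, assuming usage grows roughly linearly (real traffic often does not), a least-squares line through historical samples estimates when a resource runs out. `day_reaching_capacity` is a hypothetical helper, not part of any tool listed here.

```python
from __future__ import annotations


def day_reaching_capacity(samples: list[tuple[float, float]], capacity: float) -> float | None:
    """Fit usage = intercept + slope * day by least squares and solve for usage == capacity.

    `samples` are (day, usage) pairs. Returns None when the trend is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None  # not growing: no projected exhaustion date
    intercept = mean_y - slope * mean_x
    return (capacity - intercept) / slope
```

For example, disk usage of 100, 110, 120 and 130 GB on days 0 to 3 against a 200 GB disk projects exhaustion on day 10.0.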
  • 18. Monitor Tools • Nagios • Cacti • MRTG • Hyperic • Cricket • Ganglia
  • 19. External Monitoring • Use one: it tells you what your clients see every x minutes • Keynote • Gomez • Websitepulse (cheap, easy, I like them; no annoying sales force)
  • 20. Nagios • Alerting • Hassle • C CGI?? • Doesn’t scale
  • 21. Hyperic • Most exciting open source tool • Agent-based, self-configuring • Baseline alerting
  • 22. Cricket, MRTG, Cacti • Impossible to configure • You need to write tools to do it • Especially Cacti – Somewhat more pleasant than clawing out your eyes
  • 23. Ganglia • We love ganglia • Automatically graphs everything you want - just works • Large scale clusters • Multicast • Zero config • RRD
  • 24. http://ganglia.wikimedia.org/ • 270 hosts • 880 CPUs • 2 clusters • 1.2 TB of memory
  • 26. Custom Ganglia Gmetrics • Write your own gmetric --name='Oldest query' --type=int32 --units='sec' --dmax=65 --value=`echo ' show processlist' | mysql -uroot -ppass | grep -v Sleep | grep -v 'system user' | head -2 | tail -1 | cut -f 6`
  • 27. Custom Ganglia Gmetrics • Or Learn Unix gmetric --name='Oldest query' --type=int32 --units='sec' --dmax=65 --value=`echo ' show processlist' | mysql -uroot -ppass | grep -v Sleep | grep -v 'system user' | head -2 | tail -1 | cut -f 6`
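The pipeline above reports the `Time` of the first non-sleeping row, and `SHOW PROCESSLIST` is not guaranteed to be sorted by age, so that row is not necessarily the oldest query. A Python sketch of the same metric that takes the maximum instead; it assumes the tab-separated output (header row first) that the `mysql` client prints when its output is not a terminal, which is what the slide's `cut -f 6` relies on. The result could be passed to `gmetric --value` exactly as on the slide; `oldest_query_seconds` is a hypothetical name.

```python
def oldest_query_seconds(processlist_tsv: str) -> int:
    """Age in seconds of the longest-running non-sleeping query.

    Expects the output of `echo 'show processlist' | mysql ...`
    (columns Id, User, Host, db, Command, Time, State, Info).
    """
    lines = processlist_tsv.strip("\n").split("\n")
    header = lines[0].split("\t")
    user_col, command_col, time_col = (header.index(c) for c in ("User", "Command", "Time"))
    ages = []
    for line in lines[1:]:
        fields = line.split("\t")
        # Same filters as the `grep -v Sleep` and `grep -v 'system user'` stages.
        if fields[command_col] == "Sleep" or fields[user_col] == "system user":
            continue
        ages.append(int(fields[time_col]))
    return max(ages, default=0)
```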
  • 31. Something is wrong • Don’t worry, data warehouse
  • 32. tcpdump / Wireshark • If you suspect the network • Don’t just suspect • LOOK AT IT • tcpdump / Wireshark will tell you – If your packets are lost, delayed or corrupted – If your TCP windowing is wrong
  • 33. Rule 4: Divide and Conquer • Look at the problems in turn • Split them between people • Start with the cause you suspect is most likely
  • 34. Rule 5: Change one thing at a time • I cannot stress this enough • IF YOU DO NOT, YOU HAVE FAILED TO IDENTIFY THE PROBLEM
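One concrete form of divide and conquer is bisecting an ordered list of changes (deploys, config revisions) to find the first one that broke things, the idea behind `git bisect`. A minimal Python sketch; `first_bad` is a hypothetical helper, and each probe tests exactly one state, in line with changing one thing at a time.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary-search an ordered list of changes for the first one that is bad.

    Assumes the last change is bad and that every change after a bad one is
    also bad (the same assumption `git bisect` relies on).
    """
    if not changes:
        raise ValueError("no changes to search")
    lo, hi = 0, len(changes) - 1  # the culprit is always within changes[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid       # culprit is mid or earlier
        else:
            lo = mid + 1   # culprit is after mid
    return changes[lo]
```

With 100 candidate changes this needs at most seven tests instead of up to 100.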
  • 35. Rule 6: Keep an audit trail • You might be making things worse • Good for the root cause analysis • Have your shell log all commands – Good practice anyway • Version control
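The slide recommends having the shell itself log every command. As a complementary sketch, assuming you also want a structured record of one-off operational commands, a small Python wrapper can append the time, user, command and exit code as one JSON line per run. `run_logged` and the log format are hypothetical.

```python
from __future__ import annotations

import getpass
import json
import subprocess
import time


def _current_user() -> str:
    try:
        return getpass.getuser()
    except Exception:  # no login name available, e.g. in some containers
        return "unknown"


def run_logged(argv: list[str], log_path: str) -> int:
    """Run `argv`, append one JSON line describing the run to `log_path`, return the exit code."""
    started = time.time()
    result = subprocess.run(argv)
    record = {
        "time_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(started)),
        "user": _current_user(),
        "argv": argv,
        "exit_code": result.returncode,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return result.returncode
```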
  • 36. Rule 9: If you didn’t fix it, it ain’t fixed • You must do something to fix a problem • Or it will bite you again • And again • And again • They don’t just appear and disappear • Except BGP route convergence :)
  • 37. Process • You need a little • Don’t worry
  • 39. Complexity kills • Design against it • Reuse components • Define standards • Have a few images that all machines are built from; reimage machines every now and then for the heck of it – EC2 forces you to do this
  • 40. MTBF: Mean Time Between Failures • Actually mostly irrelevant • Dealing with failure is more important • Target the right uptime – Complexity scales exponentially with required uptime • Don’t kid yourself, you don’t need five nines
  • 41. MTTR: Mean Time To Recovery • Important • No one cares if you fail once a minute – If you recover in 50 ms • If you are down 1 minute a week, you still hit four nines (99.99%) • Failures happen, plan how to deal with them
  • 42. Problem found • If it is critical, start a phone conversation • Use IRC to communicate technical data • One person liaises with non-technical staff • One person is specifically in command • Sleep scheduling (audit log important)
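The uptime arithmetic behind these two slides, as a small Python sketch. It uses the standard steady-state relation availability = MTBF / (MTBF + MTTR); the helper names are hypothetical.

```python
import math

MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR), both in the same time unit."""
    return mtbf / (mtbf + mttr)


def uptime_fraction(downtime: float, period: float) -> float:
    """Fraction of `period` spent up, given total `downtime` in the same unit."""
    return 1 - downtime / period


def nines(avail: float) -> float:
    """Availability as a number of nines: 0.999 -> ~3, 0.9999 -> ~4."""
    if avail >= 1:
        return math.inf
    return -math.log10(1 - avail)
```

Plugging in the slide's numbers: 1 minute down per week gives `uptime_fraction(1, MINUTES_PER_WEEK)` of about 99.990%, just over four nines. Failing once a minute with a 50 ms recovery, `availability(mtbf=60, mttr=0.05)`, comes to about 99.917%, just over three nines, which shows how much MTTR alone drives the result.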
  • 43. Post crisis • Root cause analysis – Just find out what went wrong – And how to avoid it – Or fix it faster next time if you can’t • Keep track of your uptime
  • 44. Automation • All machines are created equal • Seriously • If you manually make changes • You are wrong – Unless you know what you are doing
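Tools like cfengine and Puppet enforce "all machines are created equal" properly. As a toy illustration only, detecting manual drift amounts to comparing each managed file against a golden checksum. The manifest format (relative path to SHA-256 hex digest) and the helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def drifted_files(golden: dict[str, str], root: Path) -> list[str]:
    """Return paths (relative to `root`) that are missing or differ from the golden SHA-256."""
    drift = []
    for rel_path, expected in sorted(golden.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            drift.append(rel_path)
    return drift
```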
  • 45. Best practices • Version control • Gold images • Centralised authentication • Time Sync (NTP) • Central logging • (All of this applies to virtual machines too!)
  • 46. cfengine • Standard automation tool • Written in C • Not much support • Very good • Very annoying
  • 47. control:
        site           = ( mysite )
        domain         = ( mysite.country )
        sysadm         = ( mark )
        netmask        = ( 255.255.255.0 )
        actionsequence = ( mountall mountinfo addmounts mountall links )
        mountpattern   = ( /$(site)/$(host) )
        homepattern    = ( u? )
  • 48. Puppet • New hip kid on the block • Written in Ruby • Better support? • Much nicer syntax • Easier to extend
  • 49. define yumrepo ($enabled = true) {
          config_file { "/etc/yum.repos.d/$name.repo":
            mode   => 644,
            source => "yum/repos/$name.repo",
            ensure => $enabled ? {
              true    => file,
              default => absent
            }
          }
        }
  • 50. cobbler • Automatic PXE installer – Uses kickstart files • Red Hat Enterprise Linux • CentOS • Fedora • Some support for Debian
  • 51. cobbler cobbler system add --name=xen8 --mac=00:19:B9:EE:6D:0A --ip=10.10.30.208 --profile=Centos-5-x86_64 --kopts='ksdevice=00:19:B9:EE:6D:0A console=ttyS1,57600 console=tty0'
  • 53. koan • Client install tool – Xen – Or OS re-image koan --server=10.10.30.205 --virt --profile=virt_fc6 --virt-name=otrs
  • 54. Your datacenter • Keep it tidy – Label things, keep cables as short as possible – Have a switch in each rack • If you are small without dedicated DC staff you need – Remote control power switches – Remote console!
  • 55. Virtualization • Please use it • Managing becomes much easier • Power consumption • Need a new test box – The requestor can have it in minutes
  • 56. Power consumption • Maybe not as important in Europe • 8-core machines are more efficient than 1-core machines • But memcache uses 1 core and all the RAM • Get more RAM and virtualise
  • 57. Our network admin boxes • 1 Xen CPU for Vyatta • 1 Xen CPU for LVS • 1 Xen CPU for Squid - Carp • 1 Xen CPU for Squid • 1 Xen CPU for Monitoring • 1 Xen CPU for network tasks • We can have more of these and a loss of one affects us less
  • 58. Vyatta • Open-source router – Really like it – No need to use Cisco
  • 59. LVS • Linux Virtual Server • Low level load balancer • HA • Fast • Doesn’t inspire people to put things in the only place that is hard to scale
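LVS offers several connection schedulers. As a hedged illustration, here is a simplified weighted least-connection pick in Python: each new connection goes to the real server with the fewest active connections per unit of weight, and weight 0 takes a server out of rotation. The kernel's own `wlc` scheduler also factors in inactive connections, so treat this as the idea, not the implementation; `RealServer` and `pick_wlc` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RealServer:
    name: str
    weight: int       # 0 means quiesced: receives no new connections
    active: int = 0   # currently active connections


def pick_wlc(servers: list[RealServer]) -> RealServer:
    """Pick the server with the fewest active connections per unit of weight."""
    candidates = [s for s in servers if s.weight > 0]
    if not candidates:
        raise RuntimeError("no server with non-zero weight")
    best = candidates[0]
    for s in candidates[1:]:
        # Compare s.active / s.weight < best.active / best.weight without division.
        if s.active * best.weight < best.active * s.weight:
            best = s
    return best
```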
  • 60. Squid CARP • Squids are configured to hash the URLs and send each one to a specific backend • Very little configuration done • Logging over UDP, so no disk I/O
  • 61. Squid • As a reverse proxy (web accelerator) • 90% of our hits served from RAM in less than 1 ms • Same as Wikipedia • We only use a RAM cache (unlike Wikipedia) • Cached per user • If not cacheable, cache for a second anyway to reduce backend load
  • 62. App servers • 1 Xen CPU for memcache (5 GB RAM) • 1 Xen CPU for Squid (5 GB RAM) • 6 Xen CPUs for Apache (6 GB RAM) • More power efficient, less affected by loss • Applications can’t affect each other
  • 63. Databases • Keep developers on a short leash • Report bad queries • Fear object-relational mappers
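CARP belongs to the family of highest-random-weight (rendezvous) hashing schemes. This Python sketch shows the general idea, not Squid's exact hash function: every backend scores the URL, the highest score wins, so the same URL always lands on the same cache, and removing one backend only remaps the URLs that lived on it. The backend names in the usage are hypothetical.

```python
from __future__ import annotations

import hashlib


def _score(backend: str, url: str) -> int:
    """Deterministic pseudo-random weight for a (backend, url) pair."""
    digest = hashlib.sha256(f"{backend}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_backend(url: str, backends: list[str]) -> str:
    """Rendezvous hashing: the URL goes to the top-scoring backend."""
    if not backends:
        raise ValueError("no backends configured")
    return max(backends, key=lambda backend: _score(backend, url))
```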
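Caching an "uncacheable" page for one second collapses a burst of identical requests into a single backend hit. A minimal in-process Python sketch of that micro-caching idea; `MicroCache` is hypothetical, has no eviction and is not thread-safe, and the clock is injectable only so the behaviour can be tested.

```python
from __future__ import annotations

import time
from typing import Callable


class MicroCache:
    """Keep each response for `ttl` seconds so a burst of identical requests costs one backend hit."""

    def __init__(self, ttl: float = 1.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (stored_at, value)

    def get(self, key: str, fetch: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: serve without touching the backend
        value = fetch()      # miss or expired: go to the backend once
        self._entries[key] = (now, value)
        return value
```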
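Reporting bad queries works best when queries that differ only in their literal values are grouped together, the approach taken by tools such as pt-query-digest. A Python sketch, assuming you already have (sql, seconds) samples, for example parsed from the MySQL slow query log (parsing not shown); `fingerprint` and `worst_queries` are hypothetical helpers with a deliberately simple literal matcher.

```python
from __future__ import annotations

import re
from collections import defaultdict


def fingerprint(sql: str) -> str:
    """Replace literal values with '?' so queries differing only in values group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)  # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)             # bare integer literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def worst_queries(samples: list[tuple[str, float]], top: int = 3) -> list[tuple[str, float, int]]:
    """Group (sql, seconds) samples by fingerprint; return (fingerprint, total_seconds, count), worst first."""
    total: dict[str, float] = defaultdict(float)
    count: dict[str, int] = defaultdict(int)
    for sql, seconds in samples:
        fp = fingerprint(sql)
        total[fp] += seconds
        count[fp] += 1
    ranked = sorted(total, key=total.get, reverse=True)
    return [(fp, total[fp], count[fp]) for fp in ranked[:top]]
```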
  • 64. Outsourcing • As much as possible • The younger you are as a company the less risk – When you have no users, you have no value • VCs don’t like having their money go into Capex
  • 65. What I want from Vendors • They do what they tell me • They do what I tell them • No annoying upsells, no premium services – I know more about what you are selling than you
  • 66. Services we use • Amazon EC2 and S3 • Panther Express
  • 67. Panther Express • Fantastic Content Distribution Network • Cheap, simple price list – Take note, Akamai • Cut delivery time to Europe by 70% • We let our images be cached for 1 second to reduce load
  • 68. EC2 and S3 • We save all our binlogs to S3 • We save database dumps to S3 • We have monitors running from EC2 • We plan to build a data warehouse cluster on EC2
  • 69. EC2 Requires Automation • Machine is blank when you bring it up • Download the database dump from S3 and bring replication up, automatically • Use Puppet • Amazon saves you hardware headaches – But complexity is still a problem