AWS Summit 2013
Navigating the Cloud
Understanding Amazon EBS Availability and Performance
Eric Anderson
CopperEgg
April 18, 2013
CopperEgg: EBS Use Case
• How CopperEgg uses EBS
• EBS vs Provisioned IOPS EBS
• EBS and RAID
• Backup/Snapshot best practices
• Filesystem selection and tuning
• Monitoring/Migrations/Planning
How CopperEgg uses EBS
• Real-time monitoring (every 5s)
– System information
– Processes
– Synthetic HTTP/TCP/etc
– Application metrics
– Tons more
• Requirements:
– Store many terabytes of data
– Persist the data over long periods of time
– Backups (use snapshots)
– High IO: 50-60k+ ops/s per node
• SSD + Provisioned IOPS EBS
– Consistent IO behavior (non-spiky)
EBS vs Provisioned IOPS EBS
• Standard EBS
– Good for low IO volume
– Bursty workloads may be a good fit: do the math
• Provisioned IOPS EBS
– Great for steady IO patterns that need consistency
– Not always more expensive than standard!
– Be sure to use the IOPS you provision!
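To "do the math", a rough monthly comparison helps. This sketch assumes standard EBS bills per GB-month plus per million I/O requests, and Provisioned IOPS EBS bills per GB-month plus per provisioned IOPS-month. The prices are placeholders, not real AWS rates, and the helper names are hypothetical.

```python
from dataclasses import dataclass

# Assume a 30-day month when converting an average IOPS rate into a request count.
SECONDS_PER_MONTH = 30 * 24 * 3600


@dataclass(frozen=True)
class Prices:
    """Hypothetical placeholder prices in USD; NOT actual AWS pricing."""
    standard_per_gb_month: float = 0.10
    standard_per_million_requests: float = 0.10
    piops_per_gb_month: float = 0.125
    piops_per_iops_month: float = 0.10


def standard_cost(size_gb: float, avg_iops: float, p: Prices) -> float:
    """Standard EBS: storage plus every I/O request actually issued."""
    requests = avg_iops * SECONDS_PER_MONTH
    return size_gb * p.standard_per_gb_month + requests / 1e6 * p.standard_per_million_requests


def piops_cost(size_gb: float, provisioned_iops: float, p: Prices) -> float:
    """Provisioned IOPS EBS: storage plus provisioned IOPS, used or not."""
    return size_gb * p.piops_per_gb_month + provisioned_iops * p.piops_per_iops_month


def cheaper_option(size_gb: float, avg_iops: float, provisioned_iops: float,
                   p: Prices = Prices()) -> tuple:
    """Return ("standard" or "piops", standard monthly cost, PIOPS monthly cost)."""
    s = standard_cost(size_gb, avg_iops, p)
    q = piops_cost(size_gb, provisioned_iops, p)
    return ("standard" if s <= q else "piops"), s, q


if __name__ == "__main__":
    # Steady 1,000 IOPS on 1 TB vs. a mostly idle 1 TB volume.
    print(cheaper_option(1000, 1000, 1000))
    print(cheaper_option(1000, 50, 1000))
```

With these placeholder prices, a steady 1,000 IOPS on 1 TB comes out cheaper on PIOPS, while a mostly idle volume favors standard EBS. Plug in your measured average IOPS and current regional pricing before deciding.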
EBS and RAID
• Which RAID?
– Depends on your use case, but:
• We use stripes (RAID 0) for most things
– Good performance, we build our fault tolerance at a different level
• RAID 10 (stripe of mirrors)
– Good RAID 0-like performance, plus added fault tolerance from the mirrors
– Twice the cost of RAID 0 for the same usable capacity
• RAID 0+1 (mirror of stripes)
– Don’t do this – same performance, worse fault tolerance
• RAID 5 (stripe with parity)
– Could be dangerous: software RAID 5 risks data corruption (the "write hole") if any write caching is enabled
– RAID 6 (dual parity) may be an option
• Stripe (chunk) size
– Use an appropriate stripe size for best results
• We use 64 KB, but you need to test various configurations to find the best fit for your application
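The RAID trade-offs above can be quantified. This sketch (hypothetical helper names) computes usable capacity for n identical EBS volumes per RAID level, and shows how a 64 KB chunk size maps a logical offset onto RAID 0 members. It models simple round-robin striping and ignores md metadata overhead.

```python
from typing import Tuple

KIB = 1024


def usable_capacity_gb(level: str, n_disks: int, disk_gb: float) -> float:
    """Usable capacity of an array built from n identical volumes (simplified model)."""
    if level == "raid0":
        if n_disks < 2:
            raise ValueError("this sketch requires at least 2 disks for RAID 0")
        return n_disks * disk_gb
    if level == "raid10":
        # Only the classic layout is modeled: an even number of disks, at least 4.
        if n_disks < 4 or n_disks % 2:
            raise ValueError("this sketch requires an even number of disks (>= 4) for RAID 10")
        return n_disks // 2 * disk_gb
    if level == "raid5":
        if n_disks < 3:
            raise ValueError("this sketch requires at least 3 disks for RAID 5")
        return (n_disks - 1) * disk_gb  # one disk's worth of parity
    if level == "raid6":
        if n_disks < 4:
            raise ValueError("this sketch requires at least 4 disks for RAID 6")
        return (n_disks - 2) * disk_gb  # two disks' worth of parity
    raise ValueError(f"unknown RAID level: {level!r}")


def raid0_locate(offset: int, n_disks: int, chunk: int = 64 * KIB) -> Tuple[int, int]:
    """Map a logical byte offset to (member index, byte offset on that member).

    Chunks are laid out round-robin: chunk 0 on member 0, chunk 1 on member 1, etc.
    """
    chunk_no, within = divmod(offset, chunk)
    member = chunk_no % n_disks
    row = chunk_no // n_disks
    return member, row * chunk + within
```

RAID 10 on four 500 GB volumes yields 1,000 GB usable versus 2,000 GB for RAID 0, which is where "twice the cost" comes from. A request smaller than the chunk lands on a single member, while larger sequential IO spreads across members.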
Backup/Snapshot best practices
• Snapshot regularly
– At least once per day, more if you can
– The first snapshot takes a while; later ones are faster because snapshots are incremental
– Schedule for when your IO load is lowest to reduce impact
• We do it at around 9pm CST
• Use consistent naming for snapshots
– {hostname}-{raid device}-{device}-{timestamp}
• Use the API for creation
– Faster kickoff, more likely to be consistent (script it!)
– ec2-create-snapshot -d "{hostname}-{raid device}-{device}-{timestamp}" vol-d726382
• Move older snapshots to S3/Glacier for long-term storage
• RAID makes this a bit more complex:
– Snapshot every member volume, and unmount/snapshot/remount your file system or use fsfreeze to keep the snapshots consistent!
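The naming, scripting, and freeze steps above fit together roughly as follows. This sketch assumes a Linux host with util-linux `fsfreeze` and the EC2 API tools' `ec2-create-snapshot` on the PATH; the helper names, device paths, and volume IDs are hypothetical. An EBS snapshot captures the volume at the moment it is initiated, so the file system only needs to stay frozen until every create call has returned.

```python
import subprocess
from datetime import datetime, timezone
from typing import Dict, List


def short_name(dev: str) -> str:
    """'/dev/md0' -> 'md0', so device paths are safe inside a snapshot name."""
    return dev.rsplit("/", 1)[-1]


def snapshot_name(hostname: str, raid_device: str, device: str, when: datetime) -> str:
    """Build '{hostname}-{raid device}-{device}-{timestamp}' using a UTC timestamp."""
    ts = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{hostname}-{short_name(raid_device)}-{short_name(device)}-{ts}"


def snapshot_commands(mountpoint: str, hostname: str, raid_device: str,
                      members: Dict[str, str], when: datetime) -> List[List[str]]:
    """Freeze, snapshot every RAID member, unfreeze. `members` maps device -> volume ID."""
    cmds = [["fsfreeze", "-f", mountpoint]]
    for device, volume_id in sorted(members.items()):
        desc = snapshot_name(hostname, raid_device, device, when)
        cmds.append(["ec2-create-snapshot", "-d", desc, volume_id])
    cmds.append(["fsfreeze", "-u", mountpoint])
    return cmds


def run(cmds: List[List[str]]) -> None:
    """Execute the plan, always unfreezing even if a snapshot call fails."""
    freeze, *snapshots, unfreeze = cmds
    subprocess.run(freeze, check=True)
    try:
        for cmd in snapshots:
            subprocess.run(cmd, check=True)
    finally:
        subprocess.run(unfreeze, check=True)
```

The `try`/`finally` matters: a frozen file system blocks all writes, so a failed API call must not leave it frozen. Scheduling `run(...)` from cron at a low-IO hour covers the "snapshot regularly" advice.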
Choosing a good file system
• We like ext3/4, but we love XFS
– High performance, consistent
– Robust and lots of options for tweaking/adjusting as needed
• Our favorite mount options: (your mileage may vary)
– inode64, noatime, nodiratime, attr2, nobarrier, logbufs=8, logbsize=256k, osyncisdsync, nobootwait, noauto
– Yields great performance, reduces unnecessary writes, stable
• We like ZFS a lot too, but we want to see more runtime on Linux first
– But FreeBSD/ZFS would be a fine choice
• However: test your workload!
– File systems behave differently under different workloads
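As a concrete example, the mount options above would typically live in an /etc/fstab entry like the one this small sketch produces. The device `/dev/md0`, the mount point `/data`, and the helper name are hypothetical.

```python
from typing import Sequence

# Mount options from the slide; several are kernel- or distro-specific (see note below).
XFS_MOUNT_OPTIONS = (
    "inode64", "noatime", "nodiratime", "attr2", "nobarrier",
    "logbufs=8", "logbsize=256k", "osyncisdsync", "nobootwait", "noauto",
)


def fstab_entry(device: str, mountpoint: str,
                options: Sequence[str] = XFS_MOUNT_OPTIONS, fstype: str = "xfs") -> str:
    """Return one /etc/fstab line: device, mount point, type, options, dump, pass."""
    # dump=0 (no dump(8) backups) and pass=0 (no boot-time fsck pass).
    return "\t".join([device, mountpoint, fstype, ",".join(options), "0", "0"])


if __name__ == "__main__":
    print(fstab_entry("/dev/md0", "/data"))
```

Treat the option list with care: `nobarrier` disables write barriers and trades crash safety for speed, `nobootwait` is specific to Ubuntu's Upstart-era mountall, and newer kernels may reject deprecated XFS options such as `osyncisdsync`. Check your kernel's XFS documentation before copying it.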
EBS/File system performance tuning
• Tuning file systems:
– Set the I/O scheduler to "deadline" (for each disk in the RAID array/EBS volume):
• [as root] echo deadline > /sys/block/[disk device]/queue/scheduler
– Adjust how aggressively the page cache is written to disk. Dial these back if your write IO is bursty:
• vm.dirty_ratio=30
• vm.dirty_background_ratio=20
• Track what you change!
– Before changing anything, monitor it
– After you make the change, monitor it
– Then: KEEP monitoring it – things can change over time in unexpected ways
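To track these changes, it helps to verify what each disk is actually using. The kernel lists the available schedulers in `/sys/block/<disk>/queue/scheduler` with the active one in brackets. This sketch (hypothetical helper names) parses that file and renders the two `vm.dirty_*` values in sysctl.conf syntax, so the change lives in a versionable file rather than an ad hoc shell session. It also accepts `mq-deadline`, the name used on newer multi-queue kernels.

```python
import re
from typing import Dict, List, Tuple

# "mq-deadline" is the equivalent scheduler on newer multi-queue (blk-mq) kernels.
WANTED = ("deadline", "mq-deadline")


def active_scheduler(sysfs_text: str) -> str:
    """Return the bracketed (active) entry, e.g. 'noop [deadline] cfq' -> 'deadline'."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    if match is None:
        raise ValueError(f"no active scheduler in {sysfs_text!r}")
    return match.group(1)


def disks_needing_deadline(schedulers: Dict[str, str],
                           wanted: Tuple[str, ...] = WANTED) -> List[str]:
    """Given {disk: contents of its scheduler file}, list disks not yet on deadline."""
    return sorted(disk for disk, text in schedulers.items()
                  if active_scheduler(text) not in wanted)


def read_schedulers(disks: List[str]) -> Dict[str, str]:
    """Read the live sysfs files (Linux only)."""
    out = {}
    for disk in disks:
        with open(f"/sys/block/{disk}/queue/scheduler") as f:
            out[disk] = f.read()
    return out


def sysctl_lines(dirty_ratio: int = 30, dirty_background_ratio: int = 20) -> str:
    """Render the dirty-page settings in /etc/sysctl.conf syntax."""
    if not 0 < dirty_background_ratio < dirty_ratio <= 100:
        raise ValueError("expected 0 < dirty_background_ratio < dirty_ratio <= 100")
    return (f"vm.dirty_ratio = {dirty_ratio}\n"
            f"vm.dirty_background_ratio = {dirty_background_ratio}\n")
```

Running `disks_needing_deadline(read_schedulers([...]))` before and after the change gives a simple record of what was modified, in the spirit of "track what you change".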
Monitoring
• Observing:
– iostat -xcd -t 1
• Watch the sum of r/s and w/s: this is your IOPS metric. For PIOPS, you want it close to the provisioned amount. We monitor this using CopperEgg custom metrics, and alert if it goes too low or too high.
– grep -A 1 dirty /proc/vmstat
• If nr_dirty approaches nr_dirty_threshold, lower the vm.dirty_* ratios so writes are flushed more often.
• Reference: http://docs.neo4j.org/chunked/stable/linux-performance-guide.html
• Useful stats to capture:
– In /proc/fs/xfs/stat
• xs_trans* -> transactions
• xs_read/write* -> read/write operations stats
• xb_* -> buffer stats
• Ignore SMART: it does not work for EBS
• Watch the console log
– Use the AWS API to look for warning signs of EBS issues
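The iostat and vmstat checks above can be automated. This sketch (hypothetical helper names) sums r/s and w/s per device from `iostat -x` output, classifies the result against a volume's provisioned IOPS using placeholder alert thresholds, and computes how close `nr_dirty` is to `nr_dirty_threshold`. It is not CopperEgg's implementation, and shipping the numbers to a metrics service is left out.

```python
from typing import Dict, Optional


def iops_by_device(report: str) -> Dict[str, float]:
    """Sum r/s + w/s per device from `iostat -x` output; the latest report wins.

    Columns are located by header name because the layout varies between sysstat versions.
    """
    result: Dict[str, float] = {}
    header: Optional[list] = None
    for line in report.splitlines():
        fields = line.split()
        if not fields or fields[0] == "avg-cpu:":
            header = None  # a blank line or the CPU section ends a device table
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields
            continue
        if header is not None and len(fields) == len(header):
            reads = float(fields[header.index("r/s")])
            writes = float(fields[header.index("w/s")])
            result[fields[0]] = reads + writes
    return result


def piops_status(observed_iops: float, provisioned_iops: float,
                 low: float = 0.5, high: float = 0.95) -> str:
    """Classify usage vs. provisioned IOPS; the thresholds are hypothetical defaults."""
    ratio = observed_iops / provisioned_iops
    if ratio < low:
        return "low"   # paying for IOPS you are not using
    if ratio >= high:
        return "high"  # at or near the provisioned ceiling
    return "ok"


def dirty_fraction(vmstat: str) -> float:
    """nr_dirty / nr_dirty_threshold from /proc/vmstat (or `grep -A 1 dirty` output)."""
    values = {}
    for line in vmstat.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values["nr_dirty"] / values["nr_dirty_threshold"]
```

For an EBS-backed RAID, compare each member device (for example `xvdf`) against that volume's own provisioned IOPS rather than the md device.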
Migrations and Capacity Planning
• Using PIOPS?
– Plan on a data migration path if you need to increase PIOPS
• You can't (yet) increase provisioned IOPS on the fly
• Migration steps from an EBS-backed RAID:
1. Snapshot about an hour before, then again, and again; each snapshot takes less time than the last
2. Stop all services
3. Unmount the filesystem
4. Stop the RAID (mdadm --stop /dev/md0)
5. Take a final snapshot
6. Create new volumes from the final snapshots
7. Attach the new volumes and reassemble the RAID; mdadm should detect the array and bring it back automatically
8. Mount the filesystem
9. Restart services
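For capacity planning, the migration can be sketched as below. It assumes the new array keeps the same number of members (one new volume per member snapshot) and treats the per-volume IOPS cap and the IOPS-per-GB ratio as parameters, since AWS limits change over time. The example numbers and helper names are hypothetical. Steps that go through the EC2 API or your service manager are left as `None` placeholders rather than guessed commands.

```python
import math
from typing import List, Optional, Tuple


def per_volume_piops(target_total_iops: int, n_members: int, volume_gb: int,
                     max_iops_per_volume: int, max_iops_per_gb: int) -> int:
    """PIOPS to request for each new member volume so the stripe reaches the target.

    Restoring from per-member snapshots keeps the member count fixed, so the total
    is split evenly. Both limits are inputs: look up the current values.
    """
    per_volume = math.ceil(target_total_iops / n_members)
    if per_volume > max_iops_per_volume:
        raise ValueError(f"{per_volume} IOPS exceeds the per-volume limit")
    if per_volume > volume_gb * max_iops_per_gb:
        raise ValueError(f"{per_volume} IOPS needs volumes larger than {volume_gb} GB")
    return per_volume


def migration_steps(md_device: str = "/dev/md0", mountpoint: str = "/data"
                    ) -> List[Tuple[str, Optional[List[str]]]]:
    """The nine steps above; None marks steps done via the AWS API or your init system."""
    return [
        ("snapshot members repeatedly until each snapshot is quick", None),
        ("stop all services", None),
        ("unmount the file system", ["umount", mountpoint]),
        ("stop the RAID", ["mdadm", "--stop", md_device]),
        ("take the final snapshot of every member", None),
        ("create and attach new PIOPS volumes from the final snapshots", None),
        ("reassemble the RAID", ["mdadm", "--assemble", "--scan"]),
        ("mount the file system", ["mount", mountpoint]),  # assumes an fstab entry exists
        ("restart services", None),
    ]
```

Checking `per_volume_piops` before the maintenance window shows early whether the target is reachable by snapshot-and-restore alone, or whether the array layout itself has to change.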
