Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

AWS EC2 and ELB troubleshooting

6 729 vues

Publié le

AWS EC2 and ELB troubleshooting

Publié dans : Internet
  • Identifiez-vous pour voir les commentaires

AWS EC2 and ELB troubleshooting

  1. 1. Troubleshooting (EC2, ELB) Version 1.0 Shiva N (narshiva@amazon.com)
  2. 2. v Your first stop Subscribe to the RSS Feed, and possibly integrate into your operations dashboard. If there are no issues, then it is not likely to be an issue with the AWS Service. http://status.aws.amazon.com/
  3. 3. v Your second stop AWS Cloudwatch  Provides metrics across all AWS services, that are not available to external monitoring systems.  Detailed monitoring should be enabled.  Proactively configure notifications for threshold alarming.  Use customs metrics.
  4. 4. v Troubleshooting – EC2 • Instance Launch • EC2 Instance Health • EC2 Instance Network connectivity • EBS issues
  5. 5. v Troubleshooting Instance launch • Potential causes • Account limits issue • IAM user issue • AutoScaling event terminating instance • Bad/Corrupted AMI configuration • Storage attachment issues • AWS Infrastructure issues
  6. 6. v Troubleshooting EC2 Instance Health • Potential causes • EBS volume snapshot in progress • Cloud-init user-data script failures • Meta data access issues • OS filesystem issues • Kernel issues • Underlying AWS infrastructure issues
  7. 7. v Troubleshooting Network Connectivity • Potential causes • CPU, Memory or I/O utilization of instance • Number of active connections exceeding capacity / limits / memory. • AWS EC2 Network Security Groups, ACL's, Routing misconfiguration. • Instance OS Firewall blocking connectivity. • SSH keys lost or misconfiguration. • Network carrier issues.
  8. 8. v EBS issues • Potential causes • Account limits • EBS snapshotting in progress • File system/OS configuration issues • Underlying infrastructure issues
  9. 9. v ELB Troubleshooting – API Response • OutofService: A Transient Error Occurred Internal ELB error. Retry API call, and raise support case on failure • CertificateNotFound: undefined • ELBs based on load, span across AZs, and on create might take some time to sync certificates. Retry API call after some time • The certificate you are trying to use is not found, or not well formed.
  10. 10. v ELB Troubleshooting – Error Messages • HTTP 400: BAD_REQUEST - Client sent a bad request. • HTTP 405: METHOD_NOT_ALLOWED - Length of the method in the request header exceeds 127 characters. • HTTP 408: Request Timeout - Indicates that the client cancelled the request or failed to send a full request. • HTTP 502: Bad Gateway - Indicates that the load balancer was unable to parse the response sent from a registered instance. • HTTP 503: Service Unavailable or HTTP 504 Gateway Timeout • Insufficient capacity in the load balancer to handle the request. • Registered instances closing the connection (KeepAlive issues) • No registered instances, or no healthy instances • Connection to the client is closed • The instance's security group does not allow communication with load balancer.
  11. 11. v ELB Troubleshooting – Response Metrics • HTTPCode_ELB_4XX - Indicates a malformed or a cancelled request from the client. • HTTPCode_ELB_5XX - Either the load balancer or the registered instance is causing the error or the load balancer is unable to parse the response. • HTTPCode_Backend_2XX - Indicates a normal, successful response from the registered instance(s). • HTTPCode_Backend_3XX - Indicates some type of redirect response sent from the registered instance(s). • HTTPCode_Backend_4XX - Indicates some type of client error response sent from the registered instance(s). • HTTPCode_Backend_5XX - Indicates some type of server error response sent from the registered instance(s).
  12. 12. v ELB Troubleshooting – Other issues • Load balancer health check failure • Instance(s) closing the connection to the load balancer. (ELB has 60s timeout) • Responses timing out. • Non-200 response received • Failing public key authentication. • Registering instances taking longer than expected to be In Service.
  13. 13. v ELB Troubleshooting – Potential problems • Possible causes • The client(s) are caching IP of DNS lookups • The back end instances within an AZ can have an imbalance • Sticky sessions • Request processing time - requests that can take a long time to process • Unhealthy hosts • Timeout settings (keep alives) • NACLs and SGs • ELB fails to scale on spiky traffic • SSL/Certificate issues
  14. 14. v ELB troubleshooting • Resources • ELB Logs – Proactive Monitor • Alarm ELB metrics • Network diagnostics
  15. 15. v Information required in support case • All resource ID's of all resources involved in problem description or diagnosis steps. • Instance types, AZ locations, AMI, storage configuration, etc. for any patterns or trends. • Exact times problems began to, or stopped occurring, frequency of occurrence if repeating. • Instance console or ELB logs/error logs • Troubleshooting steps performed to date, protocols used, etc.
  16. 16. v What next? • Run books / Play books • Automation • Monthly internal reviews and documented mitigation strategies • Continual improvement plan • Collaboration and Communication with AWS TAM and SA • Well Architected review
  17. 17. v Resources and References • Documentation • Amazon Elastic Cloud – User Guide • Elastic Load Balancing – Developer Guide • Training • AWS Certified Sysops Associate • AWS Certified Architect Associate
  18. 18. v Discussion…

×