Network Troubleshooting in the Cloud: Tools, Techniques, and Gotchas
AWS Bootcamp #8 – September 6, 2018
Sherry Wei, Founder & CTO
Neel Kamal, Head of Field Operations
Frank Cabri, VP Product Marketing
© 2017 AVIATRIX SYSTEMS, INC.
Welcome & Agenda
• Introductions
• Understanding VPC Networking Elements
• Common Troubleshooting Scenarios
• Demo
• Q & A
Featured Speakers
• Sherry Wei, Founder & CTO
• Neel Kamal, Head of Field Operations
Check Out More Bootcamps – Available On-Demand
www.aviatrix.com/bootcamps
Network Problems Often Appear at the App Layer …
“My production app can’t reach the on-prem database. It was working yesterday. Can you fix the network?”
“My instance is running but I can’t reach the Internet. Is the network down?”
“From my QA instance, I can no longer SSH into production. You need to fix the network fast!”
“VPN performance really sucks. Joe moved to Japan and he’s griping that remote access to dev is way too slow.”
… and Get Progressively Harder as You Dig Deeper
“A customer’s route table propagated to my cloud environment and collided with my CIDR range.”
“I hit a VGW limit on entries. That led to a BGP crash. And THAT brought down the entire cloud network.”
“Internet-bound packets from the production VPC are getting dropped.”
“I can’t get any friggin’ trace logs out of VGW!”
“A partner says that IPsec connectivity keeps going up and down.”
Understanding VPC Networking Elements … and Limitations
[Diagram: layers between the Internet and the instance: IGW, NAT service/gateway, routing tables (PCX/BGP/VGW), network ACLs, security policies, EC2]
• All layers must work correctly for the network to work
• Proving the network is not the problem requires proving each layer is not the problem
• Network issues can be at any layer, but there is no easy way to tell, making root cause analysis difficult
• Number of layers involved depends upon the destination (example: EC2 to EC2 vs. EC2 to Internet)
• Each layer has its own scale limitation
Troubleshooting | Common Connectivity Scenarios
1. EC2 to EC2
2. EC2 to Internet
3. VPC to On-Prem
4. VPC to VNet (multicloud)
1. EC2 to EC2 – Network Troubleshooting
What can go wrong?
• Security Group Policies – for example, ports are not open
• Network ACLs – for example, inbound port is open, outbound not open (not stateful)
• Route Table – for example, human error and limitation on number of entries
What Does AWS Provide Natively for Troubleshooting?
• Flow Log (minimal information)
• AWS X-Ray
What’s Missing?
• Tools to gather and compare the attributes of both EC2 instances (security policies, network ACLs, and route table entries) side by side
• Guardrails – validation prior to making updates to route tables
[Diagram: EC2 to EC2]
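The "inbound port is open, outbound not open" case can be modelled in a few lines. The following is a minimal sketch, not AWS code: `AclRule`, `acl_allows`, and `tcp_round_trip_ok` are hypothetical names, and the model assumes rules are evaluated in ascending rule-number order with an implicit deny, ignoring protocol and CIDR matching for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    """Hypothetical, simplified network ACL rule (ports only)."""
    number: int      # lower numbers are evaluated first
    port_from: int
    port_to: int
    allow: bool


def acl_allows(rules, port):
    """Return the verdict of the first matching rule; deny if none match."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # implicit deny


def tcp_round_trip_ok(server_inbound, server_outbound, dst_port, client_port):
    """A stateless ACL must permit the request and the reply separately."""
    request_ok = acl_allows(server_inbound, dst_port)
    reply_ok = acl_allows(server_outbound, client_port)
    return request_ok and reply_ok


inbound = [AclRule(100, 22, 22, True)]               # SSH allowed in
outbound_missing = []                                # nothing allowed out
outbound_fixed = [AclRule(100, 1024, 65535, True)]   # ephemeral ports out

broken = tcp_round_trip_ok(inbound, outbound_missing, dst_port=22, client_port=50000)
fixed = tcp_round_trip_ok(inbound, outbound_fixed, dst_port=22, client_port=50000)
print(broken, fixed)
```

With only the inbound rule in place the SSH request arrives but the reply to the client's ephemeral port is dropped, which is why the connection appears to hang rather than being refused.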
2. EC2 to Internet – Network Troubleshooting
What can go wrong?
• Unable to see which URLs should be allowed and denied
• All Internet-bound egress traffic is getting blocked
• Security policy (EC2 level/NAT Gateway) exceeds max limit of 200
• My proxy cannot filter non-HTTP/S traffic (e.g. SFTP)
What Does AWS Provide Natively for Troubleshooting?
• Flow Log (minimal information)
What’s Missing?
• Visualization – Reporting on allowed/denied URLs
• Alerting on URL access policy violations
• Egress traffic discovery
• Domain-level filtering
[Diagram: EC2 to Internet]
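To illustrate what "minimal information" means in practice, here is a small sketch that parses flow log records in the default space-separated format and counts rejected flows. The helper names and the sample records are made up for illustration. Note that the record carries addresses, ports, and an ACCEPT/REJECT action, but no URL or domain name.

```python
from collections import Counter

# Field order of the default flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line):
    """Split one default-format flow log line into a dict of strings."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_flows(lines):
    """Count REJECT records per (source, destination, destination port)."""
    counts = Counter()
    for line in lines:
        record = parse_record(line)
        if record["action"] == "REJECT":
            key = (record["srcaddr"], record["dstaddr"], record["dstport"])
            counts[key] += 1
    return counts


# Illustrative records only; account ID, ENI, and addresses are invented.
sample = [
    "2 123456789010 eni-0abc 10.0.1.5 203.0.113.10 49152 443 6 10 840 1536200000 1536200060 REJECT OK",
    "2 123456789010 eni-0abc 10.0.1.5 203.0.113.10 49153 443 6 8 672 1536200060 1536200120 REJECT OK",
    "2 123456789010 eni-0abc 10.0.1.5 198.51.100.7 49154 22 6 20 4249 1536200060 1536200120 ACCEPT OK",
]

rejects = rejected_flows(sample)
print(rejects.most_common())
```

A summary like this shows which destination IP and port are being blocked, but mapping that back to a URL policy still requires information the log does not contain.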
3. VPC to On-Prem – Network Troubleshooting
What can go wrong?
• Network connection (IPsec) is down (VGW or on-prem router)
• Direct Connect / Internet goes down
• Mismatched IPsec parameters
• Route table is misconfigured OR unwanted routes propagated by BGP
• Exceeded route table limits
• Poor performance (latency and/or throughput)
What Does AWS Provide Natively for Troubleshooting?
• VGW up/down status and number of routes
What Is Missing?
• VGW is a black box – no trace logs
• No alerts for route table limit
• No error checking for route table entries
• Automation – guardrails for updating route tables; error checks
[Diagram: VPC connected to On-Premises Data Center over Direct Connect or Internet]
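One possible shape for a route-table guardrail is a pre-check that flags advertised routes overlapping a local VPC CIDR before they are accepted. This is a sketch under stated assumptions: `find_conflicts` is a hypothetical helper, the CIDRs are example values, and a real check would also need to consider route priority and intent.

```python
import ipaddress


def find_conflicts(local_cidrs, proposed_routes):
    """Return (proposed_route, local_cidr) pairs whose address ranges overlap."""
    local_networks = [ipaddress.ip_network(cidr) for cidr in local_cidrs]
    conflicts = []
    for route in proposed_routes:
        network = ipaddress.ip_network(route)
        for local in local_networks:
            if network.overlaps(local):
                conflicts.append((route, str(local)))
    return conflicts


# Example values only: one VPC CIDR and three routes learned from a peer.
vpc_cidrs = ["10.10.0.0/16"]
advertised = ["192.168.0.0/24", "10.10.4.0/24", "10.0.0.0/8"]

conflicts = find_conflicts(vpc_cidrs, advertised)
for route, local in conflicts:
    print(f"route {route} overlaps local CIDR {local}")
```

Both a more specific route inside the VPC range and a broad supernet covering it are reported, since either can redirect traffic unexpectedly.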
4. VPC to VNet (Multicloud) – Network Troubleshooting
What can go wrong?
• Route table is misconfigured OR unwanted routes propagated by BGP
• Exceeded route table limits
• Poor performance (latency and/or throughput)
• Azure VNet or AWS VGW goes down or enters scheduled maintenance
What Do AWS/Azure Provide Natively for Troubleshooting?
• VGW up/down status and number of routes
What Is Missing?
• No trace logs for cloud provider gateways
• No alerts for route table limit
• No error checking for route table entries
• Automation – guardrails for updating route tables; error checks
[Diagram: VNet to VPC]
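The missing "alerts for route table limit" could be approximated with a simple threshold check on the route count that the provider already reports. The sketch below is illustrative only: `route_table_headroom` is a hypothetical function, and the limit of 100 and the 80% warning ratio are placeholder values, not provider quotas.

```python
def route_table_headroom(route_count, limit, warn_ratio=0.8):
    """Classify how close a route table is to its limit.

    `limit` and `warn_ratio` are caller-supplied assumptions; look up the
    actual quota for the gateway or route table being monitored.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if route_count >= limit:
        return "critical"
    if route_count >= warn_ratio * limit:
        return "warning"
    return "ok"


# Placeholder limit of 100 routes, checked at three different route counts.
statuses = [route_table_headroom(count, limit=100) for count in (40, 85, 100)]
print(statuses)
```

Running a check like this before and after each route propagation gives early warning instead of discovering the limit only when routes are dropped.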
A Consolidated View for Troubleshooting all Layers of AWS Networking
Demo: Aviatrix Controller
Coming Soon – Problem Identification and Insights
• Today you have lots of log data … and no insight
• Coming soon: correlated log data, with suggested expert remediation
Next Steps with Aviatrix
• You’ll receive an email with a link to a replay and slides
• Take 5 minutes and start a free 14-day trial: https://www.aviatrix.com/trial
• To view other bootcamps: https://www.aviatrix.com/bootcamps
Use the Chat widget to talk live with a Solution Architect
Thank You!
