1. Network Troubleshooting
In the Cloud: Tools,
Techniques, and Gotchas
AWS Bootcamp #8 – September 6, 2018
Sherry Wei, Founder & CTO
Neel Kamal, Head of Field Operations
Frank Cabri, VP Product Marketing
2. © 2017 AVIATRIX SYSTEMS, INC.
Welcome & Agenda
• Introductions
• Understanding VPC Networking Elements
• Common Troubleshooting Scenarios
• Demo
• Q & A
FEATURED SPEAKERS
SHERRY WEI, Founder & CTO
NEEL KAMAL, Head of Field Operations
3.
Check Out More Bootcamps – Available On-Demand
www.aviatrix.com/bootcamps
4.
Network Problems Often Appear at the App Layer …
“My production app can’t reach the on-prem database. It was working yesterday. Can you fix the network?”
“My instance is running but I can’t reach the Internet. Is the network down?”
“From my QA instance, I can no longer SSH into production. You need to fix the network fast!”
“VPN performance really sucks. Joe moved to Japan and he’s griping that remote access to dev is way too slow.”
5.
… and Get Progressively Harder as You Dig Deeper
“A customer’s route table propagated to my cloud environment and collided with my CIDR range.”
“I hit a VGW limit on entries. That led to a BGP crash. And THAT brought down the entire cloud network.”
“Internet-bound packets from the production VPC are getting dropped.”
“I can’t get any friggin’ trace logs out of VGW!”
“A partner says that IPsec connectivity keeps going up and down.”
6.
Understanding VPC Networking Elements … and Limitations
Layers (diagram): IGW, NAT service/gateway, routing tables (PCX/BGP/VGW), network ACLs, security policies, EC2
• All layers must work correctly for the network to work
• Proving the network is not the problem requires proving each layer is not the problem
• Network issues can occur at any layer, but there is no easy way to tell which one, making root cause analysis difficult
• The number of layers involved depends on the destination (for example, EC2 to EC2 vs. EC2 to Internet)
• Each layer has its own scale limitations
7.
Troubleshooting | Common Connectivity Scenarios
1. EC2 to EC2
2. EC2 to Internet
3. VPC to On-Prem
4. VPC to VNet (multicloud)
8.
1. EC2 to EC2 – Network Troubleshooting
Diagram: EC2 to EC2
What can go wrong?
• Security Group Policies – for example, ports are not open
• Network ACLs – for example, the inbound port is open but the outbound port is not (network ACLs are not stateful)
• Route Table – for example, human error or the limit on the number of entries
What Does AWS Provide Natively for Troubleshooting?
• Flow Log (minimal information)
• AWS X-Ray
What’s Missing?
• Tools to gather and compare the attributes of both EC2 instances (security policies, network ACLs and route table entries) side by side
• Guardrails – validation prior to making updates to route tables
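The stateful vs. not-stateful distinction on this slide can be illustrated with a minimal Python sketch. Everything in it is a simplifying assumption for illustration: the `Rule` type and both function names are hypothetical, rules are allow-only and match on port alone (no protocol, CIDR, rule ordering, or deny entries), and nothing here calls an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical allow rule: a direction plus an inclusive port range."""
    direction: str  # "inbound" or "outbound"
    port_from: int
    port_to: int


def allows(rules, direction, port):
    """True if any rule allows this port in this direction."""
    return any(
        r.direction == direction and r.port_from <= port <= r.port_to
        for r in rules
    )


def security_group_permits_flow(sg_rules, server_port):
    """Stateful model: if the request is allowed in, the reply is allowed out."""
    return allows(sg_rules, "inbound", server_port)


def nacl_permits_flow(nacl_rules, server_port, client_ephemeral_port):
    """Stateless model: the request AND the reply each need their own rule."""
    request_ok = allows(nacl_rules, "inbound", server_port)
    reply_ok = allows(nacl_rules, "outbound", client_ephemeral_port)
    return request_ok and reply_ok


# SSH to the destination instance, as seen from the destination side.
sg = [Rule("inbound", 22, 22)]
nacl_broken = [Rule("inbound", 22, 22)]  # inbound open, outbound not open
nacl_fixed = nacl_broken + [Rule("outbound", 1024, 65535)]

print(security_group_permits_flow(sg, 22))       # security group alone is fine
print(nacl_permits_flow(nacl_broken, 22, 50000))  # reply is dropped
print(nacl_permits_flow(nacl_fixed, 22, 50000))   # both directions allowed
```

In this model the same inbound-only rule set passes the security group check but fails the network ACL check, which is the "inbound port is open, outbound not open" case from the slide.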
9.
2. EC2 to Internet – Network Troubleshooting
Diagram: EC2 to Internet
What can go wrong?
• Unable to see which URLs should be allowed and which denied
• All Internet-bound egress traffic is getting blocked
• Security policy (EC2 level/NAT Gateway) exceeds the maximum limit of 200
• My proxy cannot filter non-HTTP/S traffic (e.g. SFTP)
What Does AWS Provide Natively for Troubleshooting?
• Flow Log (minimal information)
What’s Missing?
• Visualization – reporting on allowed/denied URLs
• Alerting on URL access policy violations
• Egress traffic discovery
• Domain-level filtering
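As a rough illustration of what domain-level filtering with allowed/denied reporting could look like, here is a minimal Python sketch. It is not the Aviatrix implementation or any AWS feature: the function names, the wildcard pattern syntax, and the sample hostnames are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def egress_decision(hostname, allowed_patterns):
    """Return "allow" if the hostname matches any allow-list pattern, else "deny".

    Patterns use shell-style wildcards, e.g. "*.example.com" (an assumed syntax).
    """
    host = hostname.lower().rstrip(".")
    for pattern in allowed_patterns:
        if fnmatchcase(host, pattern.lower()):
            return "allow"
    return "deny"


def summarize(hostnames, allowed_patterns):
    """Group observed destinations into allowed and denied lists for reporting."""
    report = {"allow": [], "deny": []}
    for host in hostnames:
        report[egress_decision(host, allowed_patterns)].append(host)
    return report


# Hypothetical allow-list and hypothetical observed egress destinations.
allow_list = ["*.github.com", "pypi.org"]
observed = ["api.github.com", "pypi.org", "example.org"]
print(summarize(observed, allow_list))
```

The filter matches on the destination name rather than the protocol, so under these assumptions the same allow-list would apply to non-HTTP/S traffic such as SFTP, provided the destination name is known.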
10.
3. VPC to On-Prem – Network Troubleshooting
Diagram: VPC to On-Premises Data Center over Direct Connect or Internet
What can go wrong?
• Network connection (IPsec) is down (VGW or on-prem router)
• Direct Connect / Internet goes down
• Mismatched IPsec parameters
• Route table is misconfigured OR unwanted routes are propagated by BGP
• Exceeded route table limits
• Poor performance (latency and/or throughput)
What Does AWS Provide Natively for Troubleshooting?
• VGW up/down status and number of routes
What is Missing?
• VGW is a black box – no trace logs
• No alerts for route table limit
• No error checking for route table entries
• Automation – guardrails for updating route tables; error checks
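A guardrail of the kind listed as missing could be sketched as a pre-flight check that runs before a route is added. The Python below is a minimal sketch under stated assumptions: `validate_route` is a hypothetical helper, `max_routes` is a caller-supplied value rather than a quoted AWS limit, and the check only covers malformed CIDRs, overlap with the VPC's own CIDR, duplicates, and the entry count.

```python
import ipaddress


def validate_route(new_cidr, vpc_cidr, existing_routes, max_routes):
    """Return a list of problems found; an empty list means the route looks safe.

    max_routes is supplied by the caller (assumption: look up the real quota
    for your account instead of hard-coding it).
    """
    try:
        new_net = ipaddress.ip_network(new_cidr)  # rejects host bits, bad syntax
    except ValueError as exc:
        return [f"invalid CIDR: {exc}"]

    problems = []
    if new_net.overlaps(ipaddress.ip_network(vpc_cidr)):
        problems.append(f"{new_cidr} overlaps the VPC CIDR {vpc_cidr}")
    if any(ipaddress.ip_network(r) == new_net for r in existing_routes):
        problems.append(f"{new_cidr} is already in the route table")
    if len(existing_routes) + 1 > max_routes:
        problems.append(f"adding this route would exceed the limit of {max_routes}")
    return problems


# A propagated on-prem route that collides with the VPC's own range.
print(validate_route("10.0.1.0/24", "10.0.0.0/16", ["172.16.0.0/12"], 100))
# A route that passes every check.
print(validate_route("192.168.0.0/24", "10.0.0.0/16", ["172.16.0.0/12"], 100))
```

Returning a list of problems rather than raising on the first one lets a caller show every issue with a proposed change at once.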
11.
4. VPC to VNet (Multicloud) – Network Troubleshooting
Diagram: VPC to VNet
What can go wrong?
• Route table is misconfigured OR unwanted routes are propagated by BGP
• Exceeded route table limits
• Poor performance (latency and/or throughput)
• Azure VNet or AWS VGW goes down (outage or scheduled maintenance)
What Do AWS/Azure Provide Natively for Troubleshooting?
• VGW up/down status and number of routes
What is Missing?
• No trace logs for cloud provider gateways
• No alerts for route table limit
• No error checking for route table entries
• Automation – guardrails for updating route tables; error checks
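The missing alert on route table limits could be approximated by comparing the current route count against the limit. This is a minimal Python sketch: the function name, the status labels, and the 80% warning threshold are assumptions, and the limit must be supplied by the caller because it differs by provider and account.

```python
def route_limit_status(route_count, limit, warn_ratio=0.8):
    """Classify route table usage as "ok", "warning", or "critical".

    warn_ratio is an assumed threshold: warn once usage reaches 80% of the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if route_count >= limit:
        return "critical"
    if route_count >= warn_ratio * limit:
        return "warning"
    return "ok"


# Hypothetical counts against a hypothetical limit of 100 entries.
for count in (50, 80, 100):
    print(count, route_limit_status(count, 100))
```

Run on a schedule against the route count that each provider gateway reports, a check like this would raise a warning before BGP-propagated routes reach the limit.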
12.
A Consolidated View for Troubleshooting all Layers of AWS Networking
Demo: Aviatrix Controller
13.
Coming Soon – Problem Identification and Insights
• Today you have lots of log data … and no insight
• Coming soon: correlated log data, with suggested expert remediation
14.
Next Steps with Aviatrix
• You’ll receive an email with a link to a replay and slides
• Take 5 minutes and start a free 14-day trial: https://www.aviatrix.com/trial
• To view other bootcamps: https://www.aviatrix.com/bootcamps
• Use the Chat widget to talk live with a Solution Architect