SlideShare une entreprise Scribd logo
1  sur  47
Télécharger pour lire hors ligne
Network Traffic Visibility and
Anomaly Detection
@Scale: October 27th, 2016
Dan Ellis
• What is network traffic visibility?
• My relationship with traffic tools
• 20+ years of frustrations
• ISP’s, CDNs, enterprises
• Who is Kentik
Goal of this talk: Make your life easier
Introduction
• Data networks can be
compared to FedEx
• Imagine FedEx without
package tracking
• Majority of data
networks operate in
this vacuum of visibility
• Hard to believe?
Problem is massive
data scale, lack of
tools, little network +
systems collaboration
Traffic Visibility Problem
4
Not Helping…
•Interface Volume
(Mb/s, pps)?
A Poll
•Interface Volume
(Mb/s, pps)?
•Src/Dst IP+Port,
ASN, BGP Path?
A Poll
A Poll
•Interface Volume
(Mb/s, pps)?
•Src/Dst IP+Port,
ASN, BGP Path?
•IP, Port, ASN
or Path Thresholds?
?Maybe there isn’t a traffic visibility problem
Maybe no one really needs this data
Is this really a problem?
9
Complaints of high latency… BGP Path to Comcast
10
Dyn attack last week – ISP recursive inbound
Dyn attack last week – Traffic / source_ip
Dyn attack last week – ISP recursive outbound
Use cases of traffic visibility
• Network Planning
• Peering Analytics and Abuse
• Congestion detection
• Is it the network?
• Where on the network?
• Proactive alerting
• Distributed DDoS Detection
• What Changed Post Deploy?
• Security and Breach
Detection
• Cost Analytics
• Revenue Identification
(New + Risk)
• Enabling Internal Groups
Tenets
• Infinite granularity storage for months
• Drillable visibility, network specific UI
• Real-time and fast (< 10s queries)
• Anomaly detection + actions
• Open / API
• Scale
Now we know what we need,
how do we do it?
TCP stats data / app specific data
Where to find this data ?
Flow data
NetFlow, SFlow, IPFIX
SNMP interfaces info
Sys/Event logs
TACACS
or
Syslog
App
Server
BGP Path info
NETWORK
+
+
+
=
Something actually
useful
+
Router
Router
PCAP
agent
What kind of tools
• Current Open Source:
• Older Open Source:
• Commercial software:
• DIY Big Data:
• On-Prem Big Data:
• SaaS Big Data:
pmacct, ntop, SiLK, cacti
cflowd, AS-PATH, RRDtool
Arbor, Plixer, SevOne, Solarwinds, ManageEngine
Kafka + ELK, Hadoop, druid, grafana, tsdb
Cisco Tetration, Deepfield…
Kentik, Datadog, Appneta, Splunk
18
Many tools gets you almost there
Really though… (tools)
Open source (ish):
• Pmacct
• Nprobe / Ntop
• Elastic Search + Kibana (ELK)
Commercial:
• Arbor
• Kentik
Three layer approach
Ingest &
Fusion layer
Storage Layer
(flow specific)
Query
Layer
Each layer has separate and different scaling
characteristics
Query engine
and UI
Query
interfaces
SQL
WWW
REST
data
sources
clients
SELECT flow
FROM router
WHERE …
>_
Seriously?
How much data
• Small network (< 10Gb/s traf.)
• Large network (1 Tb/s traf.)
• Querying over 30+ days
10k flows/sec (+rows/sec)
500k flows/sec
@ 200k fps (518 B rows, 207 TB) in < 10s
Data fusion
is a key enabler to useful data
Distributed system data ingest
Proxy
DATA FUSION
Decoder
Modules
Mem
Tables
NetFlow v5
NetFlow v9
IPFIX
BGP RIB
Custom Tags
SNMP Poller
BGP
Daemon
Enrichment
DB
DATA FUSION
Geo ß à IP
ASN ß à IP
SFlow
ROUTER
FLOW FRIENDLY DATASTORE
Single flow
fused row
sent to storage
Fusing should be near real-time, performed at
ingest and data specific
Proxy
Agent
CLIENT PROCESS
Decoder
Modules
Mem
Tables
NetFlow v5
NetFlow v9
IPFIX
Unified Flow
BGP RIB
Custom Tags
SNMP Poller
BGP
Daemon
Enrichment
DB
DATA FUSION
Geo ß à IP
ASN ß à IP
SFlow
ROUTER
FLOW FRIENDLY DATASTORE
Single flow
fused row
sent to storage
Distributed system data ingest
Serious. Flux Capacitor. Is. Serious.
Network planning: traffic by BGP hop
Network planning: collapsed path, exclude 1st
Looking at existing architectures
out there
PMAcct-based implementation
BGP
Flow
PCAP
Ingest &
Fusion layer
Storage Layer
Query
Layer
data
sources
clients
frontend
Nprobe + Ntop + ElasticSearch
BGP
Flow
nProbe
nProbe(s)
Ingest &
Fusion layer
Storage Layer
Query
Layer
data
sources
clients
BGP
Flow
• Dropbox implementation of a (mostly) open-source NetFlow solution
here: Dropbox blog
• Requires custom ingest, fusing, UI
Kafka + Elastic Search
• Ingest:
• Data-store:
• Query frontends very generic:
Distributing and scaling (1xNProbe = 1xDevice)
No SNMP (= no IF info available for fusion)
Aggregation (no infinite granularity)
Challenging at scale when ES
very hard for MySQL/MongoDB
Tailoring of meaningful dashboards difficult
No anomaly detection
Caveats
Commercial HW solutions (Arbor)
Appliance based
not truly distributed
pre-determined list of aggregated data (no infinite granularity)
And so…
Why isn’t everybody already doing it?
Required areas of expertise
(because every presentation needs a Vin diagram)
Distributed
systems
engineers
Network
Engineers
SREs
Low-level
Network
developers
Resilience / Reliability
Geo-distributed ingest
Flow friendly data-store
BGP Daemon
Flow inspection & conversion
Network protocols hacking
Make all of the above
work reliably
Train all the other teams
on the involved network
protocols and their usage
Unicorn
Looking beyond the basics
Once you have a platform, what’s next?
• Augmented flow (retransmits, latency, URL, DNS)
• Anomaly detection
• Multi-hop exit determination
• BGP-path congestion detection
Building on top of the platform
Imagine if we could get performance data from the network:
• Q Depth
• Retransmits per flow
• TCP latency
• Application Latency
You can:
• Nprobe (ntop) collects Latency, Rxmits, URL, DNS -> IPFIX flow
• Deploy on a host or a sensor
• Cisco, Juniper, Arista working to expose Q Depth into flow
• Elastic Search, Plixer, Kentik can consume additional IPFIX fields
Building on top of the platform
Retransmits enhanced flow: rexmits / interface
Retransmits enhanced flow: rexmits / ASN
Retransmits enhanced flow: TCP latency / ASN
You shouldn’t have to stare at dashboards or watch logs to detect
badness
Monitor top-x of any dimension combination (IP, ASN’s, Geo, Interface)
Create baselines based on time of day
Be able to look at things beyond pps/bps such as retransmits, latency, logs
Detect shifts: did an ASN or IP on a particular interface suddenly move from
top-x #200 to #2 and that is unusual for this time of day
This is available today (Open Source: Hadoop, Spark, Storm, Samza, Flink)
Building on top of the platform
Use case: anomaly detection
Traffic	from	one	ASN	(network)	unusually	high.	Operator	notified	at	red	line.
Use case: traffic anomaly detection & annotation
Use case: traffic annotated w/ multiple events
Anomaly detection: DDoS detection & characteristics
Once you have a platform, what’s next?
üAugmented flow (retransmits, latency, URL, DNS)
üAnomaly detection
q Multi-hop exit determination
Challenging to map traffic from ingest to exit point, multi-hop
q BGP-path congestion detection
Detect individual congested paths within a circuit that isn’t
congested
Building on top of the platform
Summary
Networks can produce large amounts of data that will make your life
easier
Big Data platforms are able to consume this data
Specific tools for Network Operators are beginning to appear (free &
paid)
Paid tools are more specific to network use (UI, easy setup, etc)
Free tools have the “power” but require cobbling together pieces
Much work to be done re fusing data such as logs, changes, alerts, DNS
SaaS providers will provide community views and enable data-sharing
QUESTIONS ?
Dan Ellis
dan@kentik.com
CREDITS

Contenu connexe

Tendances

Database ,14 Parallel DBMS
Database ,14 Parallel DBMSDatabase ,14 Parallel DBMS
Database ,14 Parallel DBMS
Ali Usman
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
Rupsee
 

Tendances (20)

transport layer
transport layertransport layer
transport layer
 
RFC and internet standards presentation
RFC and internet standards presentationRFC and internet standards presentation
RFC and internet standards presentation
 
Bit torrent ppt
Bit torrent pptBit torrent ppt
Bit torrent ppt
 
Big Data Benchmarking
Big Data BenchmarkingBig Data Benchmarking
Big Data Benchmarking
 
3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)
 
CAP Theorem
CAP TheoremCAP Theorem
CAP Theorem
 
Farmer Recommendation system
Farmer Recommendation systemFarmer Recommendation system
Farmer Recommendation system
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
web mining
web miningweb mining
web mining
 
Big Data - Applications and Technologies Overview
Big Data - Applications and Technologies OverviewBig Data - Applications and Technologies Overview
Big Data - Applications and Technologies Overview
 
SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience
 
Web Mining & Text Mining
Web Mining & Text MiningWeb Mining & Text Mining
Web Mining & Text Mining
 
Database ,14 Parallel DBMS
Database ,14 Parallel DBMSDatabase ,14 Parallel DBMS
Database ,14 Parallel DBMS
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 
CoAP protocol -Internet of Things(iot)
CoAP protocol -Internet of Things(iot)CoAP protocol -Internet of Things(iot)
CoAP protocol -Internet of Things(iot)
 
Layering and Architecture
Layering and ArchitectureLayering and Architecture
Layering and Architecture
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
TCP-IP Reference Model
TCP-IP Reference ModelTCP-IP Reference Model
TCP-IP Reference Model
 
Distributed Operating System_1
Distributed Operating System_1Distributed Operating System_1
Distributed Operating System_1
 

En vedette

En vedette (8)

Cloud Aware Network Management
Cloud Aware Network ManagementCloud Aware Network Management
Cloud Aware Network Management
 
Nokia Big Data and Analytics
Nokia Big Data and AnalyticsNokia Big Data and Analytics
Nokia Big Data and Analytics
 
Big Data Expo 2015 - Schiphol Big Data @ Schiphol
Big Data Expo 2015 - Schiphol Big Data @ SchipholBig Data Expo 2015 - Schiphol Big Data @ Schiphol
Big Data Expo 2015 - Schiphol Big Data @ Schiphol
 
deepfield_networks
deepfield_networksdeepfield_networks
deepfield_networks
 
Next-Gen DDoS Detection
Next-Gen DDoS DetectionNext-Gen DDoS Detection
Next-Gen DDoS Detection
 
pmacct, kafka, presto, re:dash を使った高速なflow解析
pmacct, kafka, presto, re:dash を使った高速なflow解析pmacct, kafka, presto, re:dash を使った高速なflow解析
pmacct, kafka, presto, re:dash を使った高速なflow解析
 
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
CPN102 Your First Week with Amazon Elastic Compute Cloud - AWS re: Invent …
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
 

Similaire à Kentik Network@Scale (Dan Ellis)

Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
DataWorks Summit
 
The Background Noise of the Internet
The Background Noise of the InternetThe Background Noise of the Internet
The Background Noise of the Internet
Andrew Morris
 

Similaire à Kentik Network@Scale (Dan Ellis) (20)

Co se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel MinaříkCo se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel Minařík
 
DataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and AnalyticsDataStax and Esri: Geotemporal IoT Search and Analytics
DataStax and Esri: Geotemporal IoT Search and Analytics
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
 
10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About 10 Big Data Technologies you Didn't Know About
10 Big Data Technologies you Didn't Know About
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Edward King SPEDDEXES 2014
Edward King SPEDDEXES 2014Edward King SPEDDEXES 2014
Edward King SPEDDEXES 2014
 
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
Javantura v3 - Real-time BigData ingestion and querying of aggregated data – ...
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Network Situational Awareness with d00gle
Network Situational Awareness with d00gleNetwork Situational Awareness with d00gle
Network Situational Awareness with d00gle
 
The Background Noise of the Internet
The Background Noise of the InternetThe Background Noise of the Internet
The Background Noise of the Internet
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
 
From Device to Data Center to Insights
From Device to Data Center to InsightsFrom Device to Data Center to Insights
From Device to Data Center to Insights
 
Splunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAP
 
Wasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformWasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming Platform
 
Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOC
 
HSB15 - Pavel Minarik - INVEATECH
HSB15 - Pavel Minarik - INVEATECHHSB15 - Pavel Minarik - INVEATECH
HSB15 - Pavel Minarik - INVEATECH
 

Dernier

Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 

Dernier (20)

S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
Moment Distribution Method For Btech Civil
Moment Distribution Method For Btech CivilMoment Distribution Method For Btech Civil
Moment Distribution Method For Btech Civil
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 

Kentik Network@Scale (Dan Ellis)

  • 1. Network Traffic Visibility and Anomaly Detection @Scale: October 27th, 2016 Dan Ellis
  • 2. • What is network traffic visibility? • My relationship with traffic tools • 20+ years of frustrations • ISP’s, CDNs, enterprises • Who is Kentik Goal of this talk: Make your life easier Introduction
  • 3. • Data networks can be compared to FedEx • Imagine FedEx without package tracking • Majority of data networks operate in this vacuum of visibility • Hard to believe? Problem is massive data scale, lack of tools, little network + systems collaboration Traffic Visibility Problem
  • 6. •Interface Volume (Mb/s, pps)? •Src/Dst IP+Port, ASN, BGP Path? A Poll
  • 7. A Poll •Interface Volume (Mb/s, pps)? •Src/Dst IP+Port, ASN, BGP Path? •IP, Port, ASN or Path Thresholds?
  • 8. ?Maybe there isn’t a traffic visibility problem Maybe no one really needs this data Is this really a problem?
  • 9. 9 Complaints of high latency… BGP Path to Comcast
  • 10. 10 Dyn attack last week – ISP recursive inbound
  • 11. Dyn attack last week – Traffic / source_ip
  • 12. Dyn attack last week – ISP recursive outbound
  • 13. Use cases of traffic visibility • Network Planning • Peering Analytics and Abuse • Congestion detection • Is it the network? • Where on the network? • Proactive alerting • Distributed DDoS Detection • What Changed Post Deploy? • Security and Breach Detection • Cost Analytics • Revenue Identification (New + Risk) • Enabling Internal Groups
  • 14. Tenets • Infinite granularity storage for months • Drillable visibility, network specific UI • Real-time and fast (< 10s queries) • Anomaly detection + actions • Open / API • Scale
  • 15. Now we know what we need, how do we do it?
  • 16. TCP stats data / app specific data Where to find this data ? Flow data NetFlow, SFlow, IPFIX SNMP interfaces info Sys/Event logs TACACS or Syslog App Server BGP Path info NETWORK + + + = Something actually useful + Router Router PCAP agent
  • 17. What kind of tools • Current Open Source: • Older Open Source: • Commercial software: • DIY Big Data: • On-Prem Big Data: • SaaS Big Data: pmacct, ntop, SiLK, cacti cflowd, AS-PATH, RRDtool Arbor, Plixer, SevOne, Solarwinds, ManageEngine Kafka + ELK, Hadoop, druid, grafana, tsdb Cisco Tetration, Deepfield… Kentik, Datadog, Appneta, Splunk
  • 18. 18 Many tools gets you almost there
  • 19. Really though… (tools) Open source (ish): • Pmacct • Nprobe / Ntop • Elastic Search + Kibana (ELK) Commercial: • Arbor • Kentik
  • 20. Three layer approach Ingest & Fusion layer Storage Layer (flow specific) Query Layer Each layer has separate and different scaling characteristics Query engine and UI Query interfaces SQL WWW REST data sources clients SELECT flow FROM router WHERE … >_
  • 21. Seriously? How much data • Small network (< 10Gb/s traf.) • Large network (1 Tb/s traf.) • Querying over 30+ days 10k flows/sec (+rows/sec) 500k flows/sec @ 200k fps (518 B rows, 207 TB) in < 10s
  • 22. Data fusion is a key enabler to useful data
  • 23. Distributed system data ingest Proxy DATA FUSION Decoder Modules Mem Tables NetFlow v5 NetFlow v9 IPFIX BGP RIB Custom Tags SNMP Poller BGP Daemon Enrichment DB DATA FUSION Geo ß à IP ASN ß à IP SFlow ROUTER FLOW FRIENDLY DATASTORE Single flow fused row sent to storage Fusing should be near real-time, performed at ingest and data specific
  • 24. Proxy Agent CLIENT PROCESS Decoder Modules Mem Tables NetFlow v5 NetFlow v9 IPFIX Unified Flow BGP RIB Custom Tags SNMP Poller BGP Daemon Enrichment DB DATA FUSION Geo ß à IP ASN ß à IP SFlow ROUTER FLOW FRIENDLY DATASTORE Single flow fused row sent to storage Distributed system data ingest Serious. Flux Capacitor. Is. Serious.
  • 26. Network planning: collapsed path, exclude 1st
  • 27. Looking at existing architectures out there
  • 28. PMAcct-based implementation BGP Flow PCAP Ingest & Fusion layer Storage Layer Query Layer data sources clients frontend
  • 29. Nprobe + Ntop + ElasticSearch BGP Flow nProbe nProbe(s) Ingest & Fusion layer Storage Layer Query Layer data sources clients BGP Flow
  • 30. • Dropbox implementation of a (mostly) open-source NetFlow solution here: Dropbox blog • Requires custom ingest, fusing, UI Kafka + Elastic Search
  • 31. • Ingest: • Data-store: • Query frontends very generic: Distributing and scaling (1xNProbe = 1xDevice) No SNMP (= no IF info available for fusion) Aggregation (no infinite granularity) Challenging at scale when ES very hard for MySQL/MongoDB Tailoring of meaningful dashboards difficult No anomaly detection Caveats Commercial HW solutions (Arbor) Appliance based not truly distributed pre-determined list of aggregated data (no infinite granularity)
  • 33. Why isn’t everybody already doing it? Required areas of expertise (because every presentation needs a Vin diagram) Distributed systems engineers Network Engineers SREs Low-level Network developers Resilience / Reliability Geo-distributed ingest Flow friendly data-store BGP Daemon Flow inspection & conversion Network protocols hacking Make all of the above work reliably Train all the other teams on the involved network protocols and their usage Unicorn
  • 35. Once you have a platform, what’s next? • Augmented flow (retransmits, latency, URL, DNS) • Anomaly detection • Multi-hop exit determination • BGP-path congestion detection Building on top of the platform
  • 36. Imagine if we could get performance data from the network: • Q Depth • Retransmits per flow • TCP latency • Application Latency You can: • Nprobe (ntop) collects Latency, Rxmits, URL, DNS -> IPFIX flow • Deploy on a host or a sensor • Cisco, Juniper, Arista working to expose Q Depth into flow • Elastic Search, Plixer, Kentik can consume additional IPFIX fields Building on top of the platform
  • 37. Retransmits enhanced flow: rexmits / interface
  • 38. Retransmits enhanced flow: rexmits / ASN
  • 39. Retransmits enhanced flow: TCP latency / ASN
  • 40. You shouldn’t have to stare at dashboards or watch logs to detect badness Monitor top-x of any dimension combination (IP, ASN’s, Geo, Interface) Create baselines based on time of day Be able to look at things beyond pps/bps such as retransmits, latency, logs Detect shifts: did an ASN or IP on a particular interface suddenly move from top-x #200 to #2 and that is unusual for this time of day This is available today (Open Source: Hadoop, Spark, Storm, Samza, Flink) Building on top of the platform
  • 41. Use case: anomaly detection Traffic from one ASN (network) unusually high. Operator notified at red line.
  • 42. Use case: traffic anomaly detection & annotation
  • 43. Use case: traffic annotated w/ multiple events
  • 44. Anomaly detection: DDoS detection & characteristics
  • 45. Once you have a platform, what’s next? üAugmented flow (retransmits, latency, URL, DNS) üAnomaly detection q Multi-hop exit determination Challenging to map traffic from ingest to exit point, multi-hop q BGP-path congestion detection Detect individual congested paths within a circuit that isn’t congested Building on top of the platform
  • 46. Summary Networks can produce large amounts of data that will make your life easier Big Data platforms are able to consume this data Specific tools for Network Operators are beginning to appear (free & paid) Paid tools are more specific to network use (UI, easy setup, etc) Free tools have the “power” but require cobbling together pieces Much work to be done re fusing data such as logs, changes, alerts, DNS SaaS providers will provide community views and enable data-sharing