Dan Ellis (CTO@Kentik) presents and discusses the technology and platform behind Kentik Detect Engine.
Link to the video of the presentation: https://kentik.com/nfd14
2. KDE (Kentik Detect Engine) Quick Stats
NetFlow in the Cloud
• 125+ billion flows/day stored
• 1,000,000+ FPS
• 50 "large" queries/s, thousands of sub-queries/s
• 75+ TB of flow data stored/day (25+ TB compressed)
SNMP, BGP, network performance too!
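As a sanity check on these figures, a few lines of arithmetic show that 125 billion flows/day averages out to roughly 1.45 million flows/s, consistent with the 1,000,000+ FPS figure, and to about 600 bytes of stored data per flow. This assumes decimal terabytes and reads "(25+ compressed)" as 25 TB; the derived values are daily averages, not measurements.

```python
# Back-of-the-envelope check of the Quick Stats slide. Inputs are copied from the
# slide (decimal TB assumed); the derived values are daily averages, not measurements.
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 s
flows_per_day = 125e9               # "125+ billion flows/day stored"
raw_bytes_per_day = 75e12           # "75+ TB of flow data stored/day"
compressed_bytes_per_day = 25e12    # "25+ TB compressed"

avg_fps = flows_per_day / SECONDS_PER_DAY                          # average flows/s
bytes_per_flow = raw_bytes_per_day / flows_per_day                 # stored bytes per flow
compression_ratio = raw_bytes_per_day / compressed_bytes_per_day   # raw : compressed

print(f"average ingest:    {avg_fps:,.0f} flows/s")     # → 1,446,759 flows/s
print(f"raw size per flow: {bytes_per_flow:.0f} bytes")  # → 600 bytes
print(f"compression ratio: {compression_ratio:.1f}x")    # → 3.0x
```

Peak ingest is presumably well above this average, since flow volume varies over the day.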
3. KDE High-Level
• KDE is a hybrid system:
○ Fusing / Ingest Layer
○ Distributed column store db / query engine
○ Realtime stream processing for anomaly detection
• We evaluated various existing engines: Elasticsearch, Hadoop, Cassandra, Storm, Spark, SiLK, Druid, Kafka, and others
• None offered the performance, multi-tenancy, and network savvy we needed, so we wrote our own
4. KDE architecture
[Diagram: data sources → Ingest & Fusion layer → Storage layer (flow specific) → Query layer (query engine and UI) → query interfaces (SQL, WWW, REST) → clients; example query at a >_ prompt: SELECT flow FROM router WHERE …]
Each layer has separate and different scaling characteristics.
6. KDE Architecture
[Diagram: the KDE ingest layer (BGP VIP, relays, enKryptor, proxies, and per-device clients "C") feeding the Storage layer and the Streaming layer; links are labeled NetFlow (UDP), kFlow (HTTPS), and kFlow (HTTP)]
7. KDE ingest layer: VIP + Relay
[Diagram: KDE architecture, highlighting the BGP VIP and the NetFlow (UDP) relays]
• One IP bound to multiple servers
• Sharded by source IP
• Validates the sender as a Kentik customer
• Passes flow on (via a raw UDP socket) to the correct proxy
• Relay handles load balancing (Kentik specific, UDP+TCP)
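The relay's job (validate the sender, then pick a proxy by source IP) can be sketched in a few lines. This is a minimal sketch, not Kentik's code: the customer registry `KNOWN_SENDERS`, the proxy list `PROXIES`, and CRC32-modulo sharding are all assumptions, and the forwarding over a raw UDP socket is left out. One plausible reason to shard by source IP is that NetFlow v9 and IPFIX templates are scoped to the exporter, so all of one device's datagrams should reach the same proxy.

```python
import ipaddress
import zlib

# Hypothetical registry of customer exporters (documentation addresses only).
KNOWN_SENDERS = {
    ipaddress.ip_address("192.0.2.10"): "customer-a",
    ipaddress.ip_address("198.51.100.7"): "customer-b",
}

# Hypothetical pool of proxies behind the relay.
PROXIES = ["proxy-1", "proxy-2", "proxy-3"]


def pick_proxy(src_ip, proxies=PROXIES):
    """Return the proxy that should receive this sender's datagrams.

    Returns None when the sender is not a known customer, i.e. validation
    fails and the datagram should be dropped.
    """
    addr = ipaddress.ip_address(src_ip)
    if addr not in KNOWN_SENDERS:
        return None
    # Shard on the source IP. zlib.crc32 gives the same result in every process,
    # unlike Python's built-in hash(), which is salted for str and bytes.
    shard = zlib.crc32(addr.packed) % len(proxies)
    return proxies[shard]
```

Calling `pick_proxy("192.0.2.10")` returns the same proxy every time, while an unknown sender such as `203.0.113.5` gets `None`. Plain modulo sharding remaps most senders when the proxy count changes; consistent hashing would reduce that churn if it matters.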
8. Proxy
[Diagram: KDE architecture, as on slide 6, highlighting the proxies]
• Inspect flow & determine type: NetFlow v5, NetFlow v9, IPFIX, sFlow, kFlow
• Decide whether to resample, based on the configured sample rate
• Launch a client process for each device
• Poll for device changes
• Monitor health
• Relaunch a client if it crashes
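The first proxy step, inspecting a datagram to determine its type, can be approximated by reading the version field at the start of the packet. A minimal sketch, assuming UDP input only (the architecture slides show kFlow arriving over HTTP(S), so it is not sniffed here); resampling and client supervision are left out, and the function name and return labels are hypothetical.

```python
import struct


def detect_flow_type(datagram: bytes) -> str:
    """Classify a UDP flow datagram by the version field at the start of its header.

    NetFlow v5, NetFlow v9 and IPFIX begin with a 16-bit version number
    (5, 9 and 10). sFlow v5 begins with a 32-bit version number, so its first
    two bytes are zero and the full 32-bit word equals 5.
    """
    if len(datagram) < 4:
        return "unknown"
    (version16,) = struct.unpack_from("!H", datagram, 0)  # network byte order
    if version16 == 5:
        return "netflow-v5"
    if version16 == 9:
        return "netflow-v9"
    if version16 == 10:
        return "ipfix"
    (version32,) = struct.unpack_from("!I", datagram, 0)
    if version32 == 5:
        return "sflow-v5"
    return "unknown"
```

Checking the 16-bit field first is safe because a 16-bit value of 5, 9, or 10 can never be the high half of an sFlow v5 header, whose first two bytes are zero.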
9. Client (where the magic happens)
[Diagram: KDE architecture, as on slide 6, highlighting the per-device clients]
• One per device configured to send flow
• Any input format (*) goes in, kFlow comes out
[Diagram: NetFlow, sFlow, and IPFIX into the client; kFlow out]
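"Any format in, one format out" amounts to mapping each decoder's field names onto a single record type. The sketch below is illustrative only: `FlowRecord` and its field names are hypothetical stand-ins for kFlow, whose schema is not shown in the slides, and decoding the binary packets is assumed to have happened already. The source field names are Cisco's NetFlow v5 record names and IANA IPFIX Information Element names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical unified output record; the real kFlow schema is not shown in the slides."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int
    packets: int
    octets: int


# Unified field -> source field. NetFlow v5 names follow Cisco's v5 record layout;
# IPFIX names are IANA Information Element names.
FIELD_MAPS = {
    "netflow-v5": {
        "src_addr": "srcaddr", "dst_addr": "dstaddr",
        "src_port": "srcport", "dst_port": "dstport",
        "protocol": "prot", "packets": "dPkts", "octets": "dOctets",
    },
    "ipfix": {
        "src_addr": "sourceIPv4Address", "dst_addr": "destinationIPv4Address",
        "src_port": "sourceTransportPort", "dst_port": "destinationTransportPort",
        "protocol": "protocolIdentifier", "packets": "packetDeltaCount",
        "octets": "octetDeltaCount",
    },
}


def to_unified(flow_type: str, decoded: dict) -> FlowRecord:
    """Map one already-decoded record (a dict of source fields) onto FlowRecord."""
    mapping = FIELD_MAPS[flow_type]
    return FlowRecord(**{ours: decoded[theirs] for ours, theirs in mapping.items()})
```

Adding another input format then means adding one more entry to `FIELD_MAPS`, not another code path downstream.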
12. Step 2: Enrichment
• BGP - Route data for xxx
• GeoIP - Where does my traffic start and end?
• SNMP - Interface names and descriptions
• Tagging - Business classification: cost centers, user info, peering info
• App-specific data - URL/DNS requests, MySQL queries
• Performance data (NPM) - Retransmits, network latency, application latency
• Coming soon:
  • Timestamped event data (syslog)
  • Threat feeds
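BGP and GeoIP enrichment both come down to a longest-prefix match: find the most specific prefix containing an address and attach its attributes to the flow. A minimal sketch, assuming small hypothetical in-memory tables (documentation prefixes and private-use ASNs) rather than a real BGP RIB or GeoIP database; the function and table names are hypothetical.

```python
import ipaddress

# Hypothetical in-memory tables (documentation prefixes, private-use ASNs).
PREFIX_TO_ASN = {
    ipaddress.ip_network("198.51.100.0/25"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("198.51.100.0/24"): "US",
}


def longest_prefix_match(table: dict, ip: str):
    """Return the value for the most specific prefix containing ip, or None.

    A linear scan keeps the sketch short; a production table would use a radix trie.
    """
    addr = ipaddress.ip_address(ip)
    best = None
    for net, value in table.items():
        # `in` is simply False when the address and network IP versions differ.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return best[1] if best else None


def enrich(flow: dict) -> dict:
    """Return a copy of flow with ASN and country columns added for both endpoints."""
    return {
        **flow,
        "src_asn": longest_prefix_match(PREFIX_TO_ASN, flow["src_addr"]),
        "dst_asn": longest_prefix_match(PREFIX_TO_ASN, flow["dst_addr"]),
        "src_country": longest_prefix_match(PREFIX_TO_COUNTRY, flow["src_addr"]),
        "dst_country": longest_prefix_match(PREFIX_TO_COUNTRY, flow["dst_addr"]),
    }
```

Here `198.51.100.9` falls in both prefixes, and the more specific /25 wins (ASN 64500), while `198.51.100.200` only matches the /24 (ASN 64501).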
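13. Data Fusion in the Client
[Diagram: inside the client, decoder modules (NetFlow v5, NetFlow v9, IPFIX, sFlow) receive flow from the ROUTER and from a PCAP agent via the proxy; in-memory tables (BGP RIB from the BGP daemon, custom tags, SNMP poller data, Geo ←→ IP, ASN ←→ IP) and an enrichment DB feed the DATA FUSION step, which sends a single fused row per flow to storage, a flow-friendly datastore]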
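The "single fused row" idea can be illustrated by joining one decoded flow with the client's in-memory tables in a single pass. This is a hypothetical sketch: the SNMP table keyed by (device, ifIndex), the tag rules, and all names below are assumptions for illustration, not KDE's actual tables.

```python
# Hypothetical in-memory tables held by one client process.
SNMP_IFNAMES = {("edge-router-1", 1): "xe-0/0/0", ("edge-router-1", 2): "xe-0/0/1"}

# Hypothetical business-classification rules: (tag name, predicate on the row).
CUSTOM_TAGS = [
    ("cost-center:web", lambda row: row["dst_port"] in (80, 443)),
    ("peering:transit", lambda row: row["output_if"] == 2),
]


def fuse(device: str, flow: dict) -> dict:
    """Join a decoded flow with the client's in-memory tables into one flat row."""
    row = dict(flow)
    row["device"] = device
    # Interface names come from SNMP polling, keyed by (device, ifIndex).
    row["input_if_name"] = SNMP_IFNAMES.get((device, flow["input_if"]))
    row["output_if_name"] = SNMP_IFNAMES.get((device, flow["output_if"]))
    # Every matching rule contributes a tag; non-matching rules add nothing.
    row["tags"] = [name for name, rule in CUSTOM_TAGS if rule(row)]
    return row
```

Doing the joins once at ingest means every stored row already carries its context, so queries do not need to join against these tables later.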
14. Step 3: Resampling & Unification
• Long term (>1 Month)
• What a process (device) said over an hour
• Two tricks:
• Flow Unification
• Resampling
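The slide does not define the two tricks, so the following is only one plausible reading: "unification" merges flows that share a key within the hour, and "resampling" thins the merged rows while scaling the survivors so that totals remain unbiased (inverse-probability weighting). The key fields, the one-hour bucket, and both function names are assumptions.

```python
import random
from collections import defaultdict


def unify(flows, bucket_seconds=3600):
    """Merge flows that share a key within one time bucket, summing their counters,
    so each hour becomes one row per key (assumed meaning of "unification")."""
    merged = defaultdict(lambda: {"packets": 0, "octets": 0})
    for f in flows:
        bucket = f["ts"] // bucket_seconds * bucket_seconds
        key = (bucket, f["src_addr"], f["dst_addr"], f["dst_port"], f["protocol"])
        merged[key]["packets"] += f["packets"]
        merged[key]["octets"] += f["octets"]
    return dict(merged)


def resample(rows, keep_prob, rng=random.random):
    """Keep each row with probability keep_prob and scale its counters by
    1 / keep_prob, so expected totals match the input."""
    out = {}
    for key, counters in rows.items():
        if rng() < keep_prob:
            out[key] = {name: value / keep_prob for name, value in counters.items()}
    return out
```

The `rng` parameter exists so tests can make the sampling deterministic; in normal use the default `random.random` is fine.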
16. Storage Layer
• Input: fused kFlow, serialized with Cap'n Proto (similar to Protocol Buffers)
• Data is sharded into small chunks
• Chunks are sent over HTTP to N distributed storage nodes
• A metadata supervisor DB tracks shard locations
• Data is converted from row-oriented to column-oriented format
• Compressed on disk using ZFS
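The shard-then-pivot step can be sketched as follows. This is a minimal sketch under stated assumptions: the chunk key (customer, device, 60-second slice) is hypothetical, JSON stands in for Cap'n Proto, and zlib stands in for the ZFS compression the slide names. Shipping chunks to storage nodes and the metadata supervisor are omitted.

```python
import json
import zlib
from collections import defaultdict


def shard_rows(rows, chunk_seconds=60):
    """Split fused rows into small chunks keyed by (customer, device, time slice)."""
    shards = defaultdict(list)
    for r in rows:
        shards[(r["customer"], r["device"], r["ts"] // chunk_seconds)].append(r)
    return dict(shards)


def to_columns(rows):
    """Pivot row-oriented records (same fields in every row) into one list per field."""
    fields = list(rows[0])
    return {f: [r[f] for r in rows] for f in fields}


def compress_columns(columns):
    """Compress each column on its own; JSON + zlib stand in for the real encoding
    and for ZFS compression."""
    return {f: zlib.compress(json.dumps(v).encode("utf-8")) for f, v in columns.items()}
```

Storing each column separately means a query touching two fields reads only those two columns, and runs of similar values within a column tend to compress better than mixed rows.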
17. Multi-Tenancy DB
We needed multi-tenancy for a large-scale SaaS product, and could not find other DBs that offered it at scale.
We succeeded by building in:
● Fairness
Queries are chopped into small chunks; users are rate-limited and prioritized.
● Security
Data is isolated between "users" down to the thread level.
● Multiuser caching with fairness
We built a cache that cannot be monopolized by any one user.
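Chopping queries into small chunks is what makes fairness enforceable: a scheduler can then interleave work across users. Round-robin, as in this sketch, is one simple policy; the slides do not say which scheduler KDE uses, and the function name and input shape are hypothetical. Rate limiting and priorities would be layered on top.

```python
from collections import deque


def fair_schedule(pending):
    """Interleave query chunks round-robin across users.

    pending maps user -> list of chunks (each large query is assumed to be
    pre-split into many small chunks). Taking one chunk per user per round
    means a user with a huge query cannot starve a user with a small one.
    """
    queues = deque((user, deque(chunks)) for user, chunks in pending.items() if chunks)
    order = []
    while queues:
        user, chunks = queues.popleft()
        order.append((user, chunks.popleft()))
        if chunks:  # user still has work: back of the line
            queues.append((user, chunks))
    return order
```

With `{"a": [1, 2, 3], "b": [10]}`, user `b`'s single chunk runs second rather than waiting behind all of `a`'s chunks.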
18. Query interfaces
[Diagram: KDE architecture, as on slide 4 (Ingest & Fusion, Storage, and Query layers), highlighting the SQL, WWW, and REST query interfaces]
● SQL interface: PostgreSQL foreign data wrapper (FDW)
● UI/UX featuring advanced data visualization
● REST API-based interface: build your own
24. Detecting Anomalies
• DDoS is a simple use case of anomaly detection
• V1 anomaly detection relied on KDE queries, which put an abusive load on the engine
• V2 needed stream processing and in-RAM baseline storage
• We had typically avoided streaming DBs because of their aggregation
• Streaming DBs for anomaly detection, combined with our long-term flow storage, are a powerful combination
• We evaluated Spark, Storm, Samza, and PipelineDB; none fit
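A streaming detector with an in-RAM baseline can be as small as a per-key exponentially weighted moving average (EWMA). The sketch below is a stand-in technique, not Kentik's detector: `alpha`, `threshold`, and `warmup` are hypothetical knobs, and the key could be, for example, a destination IP with the value being packets per second.

```python
class EwmaBaseline:
    """Per-key exponentially weighted moving average kept in RAM.

    alpha: weight of the newest observation; threshold: how many times above
    the baseline a value must be to count as anomalous; warmup: observations
    needed before a key can alert. All three are hypothetical knobs.
    """

    def __init__(self, alpha=0.1, threshold=5.0, warmup=3):
        self.alpha = alpha
        self.threshold = threshold
        self.warmup = warmup
        self.baseline = {}
        self.seen = {}

    def observe(self, key, value):
        """Update the baseline for key and return True if value looks anomalous."""
        base = self.baseline.get(key)
        n = self.seen.get(key, 0)
        anomalous = base is not None and n >= self.warmup and value > self.threshold * base
        if base is None:
            self.baseline[key] = float(value)
        else:
            self.baseline[key] = self.alpha * value + (1 - self.alpha) * base
        self.seen[key] = n + 1
        return anomalous
```

Each observation costs O(1) time and one float per key, which is what lets a baseline like this live in RAM instead of being recomputed by queries against stored flow. One design caveat: this sketch folds anomalous values into the baseline too; a real detector might exclude them so an ongoing attack does not raise its own baseline.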
25. Streaming layer
[Diagram: KDE architecture, as on slide 6, highlighting the Streaming layer]