PUBLIC SECTOR SUMMIT
WASHINGTON DC
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cyber Data Lake: How CIS Analyzes
Billions of Network Traffic Records
per Day
Brian Calkin
Chief Technology Officer
Center for Internet Security
3 0 2 6 3 9
Oliver Atoa
Senior Consultant
AWS/WWPS Proserve
Bob Strahan
Principal Consultant
AWS/WWPS Proserve
Agenda
1. Center for Internet Security (CIS) Netflow Challenge (Brian)
2. Our Solution (Bob and Oliver)
3. Results (Brian)
4. Q&A
Session Goals
1. Educate you with useful architecture concepts
2. Empower you to explore similar approaches for your own business
Familiarity with Data Lakes on AWS presumed
– this is not a Data Lake pitch!
Center for Internet Security
Non-profit using the power of a global
community to develop best practices for
securing IT systems and data
Mission: Identify, develop, validate,
promote, and sustain best practice solutions
for cyber defense. Build and lead
communities to enable an environment of
trust in cyberspace.
CIS Benchmarks and CIS Controls
CIS Benchmarks and CIS Controls are the global standard
and recognized best practices for securing IT systems and data
against the most pervasive attacks
CIS’ proven guidelines are continuously refined and verified by a
volunteer, global community of experienced IT professionals
Multi-State & Elections Infrastructure
Information Sharing and Analysis Centers (MS-ISAC & EI-ISAC)
The MS-ISAC has been designated by DHS as the key
resource for cyber threat prevention, protection, response,
and recovery for the nation’s state, local, tribal, and
territorial governments
Through the EI-ISAC, election agencies gain access to an
elections-focused cyber defense suite, including sector-
specific threat intelligence products, incident response and
remediation, threat and vulnerability monitoring,
cybersecurity awareness and training products, and tools for
implementing security best practices.
~6,000 Member
Organizations
CIS Network Monitoring - Albert
• Network intrusion detection
sensor
• Fully monitored and managed
• State, local, tribal and territorial
government focused
• Open source software on
commodity hardware
• Alert data analyzed 24x7
• ~350 Sensors deployed
nationwide
Albert - NetFlow
• Generated on sensors
• Passive DNS data also collected
• Data is valuable for performing ad-hoc queries
Record fields:
• Source IP
• Destination IP
• Source Port
• Destination Port
• TCP Flags
• Number of bytes of traffic sent/received
• Timestamp
Albert – Network Intrusion Detection
• Based on Suricata
• Alerts generated based on known signatures
• ~27,000 Signatures per Sensor
• ~10,000 Albert events analyzed per month
• ~5,000 Albert events escalated to SLTT entities per month
The Challenge
- 48 Million Records Per Minute
- Rate of incoming traffic is not consistent and continually
increasing
- Several petabytes (and growing) of NetFlow data
- Local SAN Storage Full
- Ad-hoc queries take way too long to run
- Hours, Days, Weeks…
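As a rough back-of-the-envelope check on the scale, the snippet below converts the quoted rate into per-second and per-day figures. It assumes, purely for illustration, that 48 million records per minute is sustained for a full day; the slide itself notes that the real rate is not consistent.

```python
# Back-of-the-envelope scale check.
# Assumption: the rate quoted on the slide holds constant for 24 hours.
RECORDS_PER_MINUTE = 48_000_000
MINUTES_PER_DAY = 24 * 60

records_per_second = RECORDS_PER_MINUTE / 60
records_per_day = RECORDS_PER_MINUTE * MINUTES_PER_DAY

print(f"{records_per_second:,.0f} records/second")
print(f"{records_per_day:,} records/day")
```

Under that assumption the daily total is in the tens of billions, which is consistent with the "billions of records per day" in the session title.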
The Desired Solution
- Store Six Months of NetFlow Data
- High Performance Ad-Hoc SQL Queries
- Goal: Get from Days and Weeks to Seconds and Minutes
- Cost Effective
- Highly Secure
- Extensible Features for Future Enhancements and
Growth
*Looked at Both On-Premises & Cloud Solutions
Our mission
Ingest binary netflow and DPI records from hundreds of sensors
each generating millions of records every couple of minutes.
Our mission
Transform and enrich these records, and save them to cost
efficient storage.
Our mission
Provide analysts with fast seamless SQL query
access to all the records; newest (few minutes latency) to oldest
(months back).
Our mission
Make it secure, cost effective, well instrumented,
reliable, scalable to handle future growth, and extensible as a
foundation for advanced automated analytics and machine
learning.
Data Lake
Ingestion
Ingest binary netflow and DPI records from hundreds of sensors, each
generating millions of records every couple of minutes.
• Receiver service on AWS
• Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling, Multi-AZ, NLB with EIP
• SCP file transfer with client auth (key pairs)
• IP whitelisting
• Receivers convert each incoming sensor file to CSV and upload it to
Amazon Simple Storage Service (Amazon S3) (not yet enriched or query ready)
• Receiver cluster scales up and down as incoming record volumes fluctuate
Enrichment and Amazon S3 data lake storage
Transform and enrich flow records and save them to cost efficient storage.
We use an AWS Lambda function
• Triggered immediately as each new CSV
file arrives in Amazon S3 from Receiver
• For each record:
• Detect corrupted records and fix or reject
them
• Enrich good records with additional useful
fields (e.g. IP ASN, directionality, etc.)
• Save records to Amazon S3 (stage0), with
prefixes that define Hive partitions:
• p_sensor=<sensorname>/p_year=YYYY/p_date=YYYY-MM-DD/p_hour=YYYY-MM-DD_hh/<file>.csv.gz
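The per-record steps above can be sketched as follows. This is a minimal illustration, not the team's Lambda code: the field names (`src_ip`, `dst_ip`, `ts`), the private-address rule used for directionality, and the choice to reject rather than repair bad records are all assumptions. Only the prefix layout comes from the slide.

```python
import ipaddress
from datetime import datetime, timezone


def directionality(src_ip: str, dst_ip: str) -> str:
    """Classify a flow by whether each endpoint is a private address.

    Hypothetical rule for illustration; the slides do not say how
    directionality is actually derived.
    """
    src_private = ipaddress.ip_address(src_ip).is_private
    dst_private = ipaddress.ip_address(dst_ip).is_private
    if src_private and not dst_private:
        return "outbound"
    if dst_private and not src_private:
        return "inbound"
    return "internal" if src_private else "external"


def enrich(record: dict):
    """Return an enriched copy of the record, or None if it is corrupted."""
    try:
        enriched = dict(record)
        # ip_address() raises ValueError on a malformed address.
        enriched["direction"] = directionality(record["src_ip"], record["dst_ip"])
        # float() raises ValueError/TypeError on a malformed timestamp.
        enriched["ts"] = datetime.fromtimestamp(float(record["ts"]), tz=timezone.utc)
        return enriched
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None


def partition_prefix(sensor: str, ts: datetime) -> str:
    """Build the Hive-style S3 prefix shown on the slide."""
    return (
        f"p_sensor={sensor}/"
        f"p_year={ts:%Y}/"
        f"p_date={ts:%Y-%m-%d}/"
        f"p_hour={ts:%Y-%m-%d_%H}/"
    )


good = enrich({"src_ip": "10.0.0.5", "dst_ip": "8.8.8.8", "ts": "0"})
bad = enrich({"src_ip": "not-an-ip", "dst_ip": "8.8.8.8", "ts": "0"})
print(good["direction"], partition_prefix("sensor01", good["ts"]))
print(bad)
```

Because the prefix is derived from the flow timestamp rather than the arrival time, a late file still lands in the hour its flows belong to.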
Near real-time SQL access
Provide analysts with fast seamless SQL query access to all the records;
newest (few minutes latency) to oldest (months back).
The Enrich Lambda function also adds new
partitions to predefined AWS Glue catalog
tables
Partitions optimize query cost and speed; filters
on sensorname and/or flow timestamp use
‘partition pruning’
Records are accessible to analysts via Amazon
Athena SQL
Latency (sensor to SQL) < 5min
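To illustrate partition pruning, the sketch below builds a query whose WHERE clause filters only on partition columns, so the engine can skip every prefix outside the requested sensor and hours. The table name `netflow_stage0` and the helper itself are hypothetical; the partition column names follow the prefix layout from the previous slide.

```python
from datetime import datetime


def pruned_query(table: str, sensor: str, start: datetime, end: datetime) -> str:
    """Build a SQL query that filters on the partition columns.

    p_hour values (YYYY-MM-DD_hh) sort lexicographically in time order,
    so a string BETWEEN selects a contiguous range of hourly partitions.
    Values are interpolated directly for readability only; real code
    should validate or parameterize them.
    """
    fmt = "%Y-%m-%d_%H"
    return (
        f"SELECT count(*) AS flows FROM {table} "
        f"WHERE p_sensor = '{sensor}' "
        f"AND p_hour BETWEEN '{start.strftime(fmt)}' AND '{end.strftime(fmt)}'"
    )


print(pruned_query("netflow_stage0", "sensor01",
                   datetime(2019, 6, 12, 0), datetime(2019, 6, 12, 5)))
```

A query that filtered only on a non-partition column would instead have to scan every file in the table.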
Optimize for efficient long term queries
Provide analysts with fast seamless SQL query access to all the records;
newest (few minutes latency) to oldest (months back).
Small files (micro-batches) minimize the
latency for NRT queries (stage0).
But we need to optimize for large time
span queries (stage1)
Stage 1 table has same columns and
partitions as stage0, but:
• Columnar file format (parquet)
• Bucketed by srcIP (faster queries)
• Minimize files per hourly partition
Deep dive coming up!
Seamless SQL access
Provide analysts with fast seamless SQL query access to all the records
Recap
• Stage 0 optimized for NRT queries (Enrich Lambda)
– lots of small files per partition
• Stage 1 optimized for historical queries (ETL)
– fewer optimized files per partition
Views combine (UNION ALL) stage0 & stage1
tables to give the best of both worlds
• Low latency for recent data from stage0 (<6hrs)
• Efficiency (cost + speed) from stage1 (>6hrs)
• Views updated every hour (scheduled Lambda)
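One way the hourly view refresh could look is sketched below: a function that regenerates the view definition with a moving six-hour cutoff. The view and table names are hypothetical, and the sketch assumes stage1 already holds every hour older than the cutoff; the slides do not show the actual DDL.

```python
from datetime import datetime, timedelta, timezone


def build_view_sql(now: datetime, cutoff_hours: int = 6) -> str:
    """Return view DDL that reads old hours from stage1 and recent hours from stage0."""
    cutoff = (now - timedelta(hours=cutoff_hours)).strftime("%Y-%m-%d_%H")
    # Each hourly partition satisfies exactly one of the two predicates,
    # so UNION ALL never returns the same record twice.
    return (
        "CREATE OR REPLACE VIEW netflow AS\n"
        f"SELECT * FROM netflow_stage1 WHERE p_hour < '{cutoff}'\n"
        "UNION ALL\n"
        f"SELECT * FROM netflow_stage0 WHERE p_hour >= '{cutoff}'"
    )


print(build_view_sql(datetime(2019, 6, 12, 14, 30, tzinfo=timezone.utc)))
```

Running this from a scheduled function once an hour moves the cutoff forward, so analysts keep querying a single view name while the underlying split shifts.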
Well Architected!
Make it secure, cost effective, well instrumented, reliable,
scalable to handle future growth, and extensible as a foundation
for advanced automated analytics and machine learning.
Data Lake on Amazon S3 +
AWS Glue catalog: foundation for
future enhancements and
innovations (using the ‘right tool for
the job’ – Amazon SageMaker,
Amazon QuickSight, Amazon
Redshift, etc.)
Automated
Deployment:
AWS CodePipeline,
AWS CodeBuild, and
AWS CloudFormation
Storage Retention & Costs:
Amazon S3 Intelligent Tiering and
Lifecycle Management
Stage 1 ETL – Deep Dive
Make queries faster and cheaper
Stage0:
• CSV
• Variable number of small files in NRT
Stage1:
• Columnar file format - Parquet
• Bucketed by source IP (fixed number of larger files)
Stage 1 ETL – Deep Dive
• Stage0 puts partition values in the ETL Amazon Simple Queue Service (Amazon SQS) queue
• Auto Scaling group driven by Amazon SQS queue depth
• Each Amazon EC2 instance runs a Dask script that processes one partition at a time, avoiding network shuffling
• Maintain Stage0 partition structure
[Diagram: inside a VPC in the AWS Cloud, an Auto Scaling group of EC2 instances takes a partition message from the ETL SQS queue, reads all CSV files in that stage0 partition from the data lake bucket, writes bucketed Parquet to stage1, then gets table info from the data catalog and updates the partition.]
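The queue-driven, one-partition-at-a-time flow can be sketched in plain Python. This is a stand-in, not the Dask script from the talk: `queue.Queue` replaces Amazon SQS, dictionaries replace the S3 bucket, lists of rows replace Parquet files, and the `src_ip` column name, the bucket count, and the CRC32 hash are all assumptions.

```python
import csv
import io
import queue
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical; the slides do not give the real bucket count


def process_partition(partition, stage0, num_buckets=NUM_BUCKETS):
    """Read every CSV file of one stage0 partition and group rows by bucket."""
    buckets = defaultdict(list)
    for csv_text in stage0[partition]:
        for row in csv.DictReader(io.StringIO(csv_text)):
            bucket = zlib.crc32(row["src_ip"].encode()) % num_buckets
            buckets[bucket].append(row)
    # Always emit the same number of outputs, even if some buckets are empty.
    return {b: buckets.get(b, []) for b in range(num_buckets)}


def run_worker(etl_queue, stage0, stage1):
    """Drain the queue, handling one partition per message."""
    while True:
        try:
            partition = etl_queue.get_nowait()
        except queue.Empty:
            break
        # Overwrite the whole partition so a redo or backfill is idempotent.
        stage1[partition] = process_partition(partition, stage0)


# Tiny in-memory example: two small stage0 files in one hourly partition.
demo_partition = "p_sensor=s1/p_hour=2019-06-12_14"
demo_stage0 = {demo_partition: ["src_ip,bytes\n10.0.0.1,100\n",
                                "src_ip,bytes\n10.0.0.2,50\n10.0.0.1,7\n"]}
demo_stage1 = {}
demo_queue = queue.Queue()
demo_queue.put(demo_partition)
run_worker(demo_queue, demo_stage0, demo_stage1)
```

Because each message names a complete partition, a worker never needs rows held by another worker, which is the property the slide describes as avoiding network shuffling.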
Stage 1 ETL – Deep Dive
Why not Amazon Kinesis or Firehose?
Initially iterated using established streaming patterns with
Kinesis and Firehose. As we optimized, we needed greater
control for this particular case:
• Avoid data shuffling. The file micro-batches being ingested
are inherently partitioned in the way we needed
• Firehose partitions files in Amazon S3 by ingestion time. To
optimize NRT queries we need to partition by fields in the
payload: e.g. sensorname and flow timestamp
• File format optimizations. E.g. Parquet row group size and
Hive Bucketing
Stage 1 ETL – Deep Dive
Why Dask and not Amazon EMR, Spark, or AWS Glue?
Initially iterated using AWS Glue and Amazon EMR. Awesome tools!
Landed on Dask for greater low-level control based on a combination
of optimizations and features for this particular case:
• Skewed partitions – larger sensors can cause stragglers
• Reduce network data shuffling
• Amazon S3 multipart uploads: no staging step, and output becomes
visible only when the upload succeeds
• Automatic retry of failed partitions
• Dynamic partition overwrite. Backfill or redo of partitions
• Partition management – make them available when needed
• Hive Bucketing
Stage 1 ETL – Deep Dive
Bucketing
• Hive Partitions for fields with a small set of values
• Hive Bucketing for fields with a large number of unique values
(e.g., billions of IP addresses)
• Bucketing provides significant
performance improvement for queries
using bucketed fields
• Hash field and apply mod function based
on number of buckets
• Fixed number of evenly distributed files
• We implemented it in Dask: hash(bucketed field) %
number of buckets
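The hash-and-mod step can be sketched as below. The function name and the use of CRC32 are assumptions for illustration; the slides do not say which hash function was used, and a query engine that relies on bucket metadata expects files laid out with its own hash function.

```python
import zlib
from collections import Counter


def bucket_for(value: str, num_buckets: int) -> int:
    """Map a value to a bucket index in the range [0, num_buckets)."""
    # zlib.crc32 is stable across processes, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(value.encode("utf-8")) % num_buckets


# The same source IP always lands in the same bucket (file), so a query
# for one IP only needs to read one file per partition.
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
spread = Counter(bucket_for(ip, 16) for ip in ips)
print(sorted(spread.items()))
```

The number of buckets fixes the number of files per partition regardless of how many distinct IP addresses appear in it.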
Stage 1 ETL – Deep Dive
• Partitioning targets a
specific folder
• Bucketing targets a
specific file
• Columnar format can
skip data within the file
The results are in…
You may be asking yourself, did it all work out?
The answer is…
Yes!
Some Examples
Count of all records generated by a single
sensor
• The “old way” – 15 minutes
• The AWS way – 2 minutes
7.5x faster
Count of all records generated by all sensors
• Old way – 36 hours
• AWS – 3 minutes
720x faster
Athena Query 1 – Flows Byte Aggregation 1HR
A couple more…
Query for all traffic, destined to a specific IP
address and port, over a one week time
period
• Old way – 48 hours
• AWS – 19 minutes
150x faster
Query for all traffic, destined to a set of IP
addresses over port 80, over a one week
time period
• Old way – 72 hours
• AWS – 12 minutes
360x faster
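The speedup figures for all four example queries follow directly from the quoted durations, as this short calculation shows. The short labels are paraphrases of the query descriptions on the slides.

```python
# (query description, old duration in minutes, new duration in minutes)
results = [
    ("single sensor count", 15, 2),
    ("all sensors count", 36 * 60, 3),
    ("one IP and port, one week", 48 * 60, 19),
    ("set of IPs on port 80, one week", 72 * 60, 12),
]

speedups = {name: old / new for name, old, new in results}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

The third ratio works out to roughly 151.6, which the slide rounds to 150x; the other three match the slide figures exactly.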
Athena Query 2 – All Traffic to a Single IP & Port – 1 Week
Next Steps
• Leverage Amazon Redshift Spectrum
• Building UI for broader usability
• Collect additional datatypes for improved alerting and correlation
• Leverage Artificial Intelligence/Machine Learning to identify malicious
activity
Thank you!
Bob Strahan
strahanr@amazon.com
Oliver Atoa
oatoa@amazon.com
Brian Calkin
Brian.Calkin@cisecurity.org
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

CIS Cyber Data Lake Analytics

  • 1. Public Sector Summit, Washington DC
  • 2. Cyber Data Lake: How CIS Analyzes Billions of Network Traffic Records per Day (302639). Brian Calkin, Chief Technology Officer, Center for Internet Security; Oliver Atoa, Senior Consultant, AWS/WWPS Proserve; Bob Strahan, Principal Consultant, AWS/WWPS Proserve
  • 3. Agenda: 1. Center for Internet Security (CIS) Netflow Challenge (Brian) 2. Our Solution (Bob and Oliver) 3. Results (Brian) 4. Q&A
  • 4. Session Goals: 1. Educate you with useful architecture concepts 2. Empower you to explore similar approaches for your own business. Familiarity with Data Lakes on AWS is presumed; this is not a Data Lake pitch!
  • 6. Center for Internet Security: Non-profit using the power of a global community to develop best practices for securing IT systems and data. Mission: Identify, develop, validate, promote, and sustain best practice solutions for cyber defense. Build and lead communities to enable an environment of trust in cyberspace.
  • 7. CIS Benchmarks and CIS Controls: CIS Benchmarks and CIS Controls are the global standard and recognized best practices for securing IT systems and data against the most pervasive attacks. CIS’ proven guidelines are continuously refined and verified by a volunteer, global community of experienced IT professionals.
  • 8. Multi-State & Elections Infrastructure Information Sharing and Analysis Centers (MS-ISAC & EI-ISAC): The MS-ISAC has been designated by DHS as the key resource for cyber threat prevention, protection, response, and recovery for the nation’s state, local, tribal, and territorial governments. Through the EI-ISAC, election agencies gain access to an elections-focused cyber defense suite, including sector-specific threat intelligence products, incident response and remediation, threat and vulnerability monitoring, cybersecurity awareness and training products, and tools for implementing security best practices. ~6,000 member organizations.
  • 9. CIS Network Monitoring - Albert
      - Network intrusion detection sensor
      - Fully monitored and managed
      - State, local, tribal and territorial government focused
      - Open source software on commodity hardware
      - Alert data analyzed 24x7
      - ~350 sensors deployed nationwide
  • 10. Albert - NetFlow
      - Generated on sensors
      - Passive DNS data also collected
      - Data is valuable for performing ad-hoc queries
      - Record fields: source IP, destination IP, source port, destination port, TCP flags, number of bytes of traffic sent/received, timestamp
  • 11. Albert - Network Intrusion Detection
      - Based on Suricata
      - Alerts generated based on known signatures
      - ~27,000 signatures per sensor
      - ~10,000 Albert events analyzed per month
      - ~5,000 Albert events escalated to SLTT entities per month
  • 12. The Challenge
      - 48 million records per minute
      - Rate of incoming traffic is not consistent and is continually increasing
      - Several petabytes (and growing) of NetFlow data
      - Local SAN storage full
      - Ad-hoc queries take way too long to run: hours, days, weeks…
  • 13. The Desired Solution
      - Store six months of NetFlow data
      - High-performance ad-hoc SQL queries
      - Goal: get from days and weeks to seconds and minutes
      - Cost effective
      - Highly secure
      - Extensible features for future enhancements and growth
      - Looked at both on-premises and cloud solutions
  • 15. Our mission: Ingest binary netflow and DPI records from hundreds of sensors, each generating millions of records every couple of minutes.
  • 16. Our mission: Transform and enrich these records, and save them to cost-efficient storage.
  • 17. Our mission: Provide analysts with fast, seamless SQL query access to all the records, from newest (a few minutes of latency) to oldest (months back).
  • 18. Our mission: Make it secure, cost effective, well instrumented, reliable, scalable to handle future growth, and extensible as a foundation for advanced automated analytics and machine learning.
  • 19. Data Lake
  • 20. Ingestion: Ingest binary netflow and DPI records from hundreds of sensors, each generating millions of records every couple of minutes.
      - Receiver service on AWS
      - Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling, Multi-AZ, NLB with EIP
      - SCP file transfer with client auth (keypairs)
      - IP whitelisting
      - Receivers convert each incoming sensor file to CSV files and upload each file to Amazon Simple Storage Service (Amazon S3) (not yet enriched or query ready)
      - Receiver cluster scales up and down as incoming record volumes fluctuate
  • 21. Enrichment and Amazon S3 data lake storage: Transform and enrich flow records and save them to cost-efficient storage. We use an AWS Lambda function:
      - Triggered immediately as each new CSV file arrives in Amazon S3 from the Receiver
      - For each record:
          - Detect corrupted records and fix or reject them
          - Enrich good records with additional useful fields (e.g. IP ASN, directionality, etc.)
      - Save records to Amazon S3 (stage0), with prefixes that define Hive partitions:
        p_sensor=<sensorname>/p_year=YYYY/p_date=YYYY-MM-DD/p_hour=YYYY-MM-DD_hh/<file>.csv.gz
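The stage0 prefix layout on slide 21 can be sketched as a small key builder. This is only an illustration: the function name `stage0_key`, the sample sensor and file names, and the choice of UTC for the partition values are assumptions, not details given in the talk.

```python
from datetime import datetime, timezone


def stage0_key(sensor: str, flow_time: datetime, filename: str) -> str:
    """Build a Hive-style S3 key following the slide's prefix layout.

    Assumption: partition values are derived from the flow timestamp in UTC.
    """
    t = flow_time.astimezone(timezone.utc)
    return (
        f"p_sensor={sensor}/"
        f"p_year={t:%Y}/"
        f"p_date={t:%Y-%m-%d}/"
        f"p_hour={t:%Y-%m-%d_%H}/"
        f"{filename}.csv.gz"
    )


if __name__ == "__main__":
    # Hypothetical sensor and file names, for illustration only.
    when = datetime(2019, 6, 11, 14, 5, tzinfo=timezone.utc)
    print(stage0_key("sensor01", when, "batch-0001"))
```

Because every level is written as `name=value`, a Hive-compatible catalog can map each prefix to one partition, which is what makes filters on sensor and time cheap.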
  • 22. Near real-time SQL access: Provide analysts with fast, seamless SQL query access to all the records, from newest (a few minutes of latency) to oldest (months back).
      - The Enrich AWS Lambda function also adds new partitions to predefined AWS Glue catalog tables
      - Partitions optimize query cost and speed; filters on sensorname and/or flow timestamp use ‘partition pruning’
      - Records are accessible to analysts via Amazon Athena SQL
      - Latency (sensor to SQL) < 5 min
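One way to picture partition registration and pruning is as two SQL strings. The sketch below assumes the partitions are registered with Athena-style `ALTER TABLE ... ADD PARTITION` DDL (the talk does not say whether the Lambda uses DDL or the Glue API), that partition columns are strings, and that the table name and S3 base URI are hypothetical.

```python
def add_partition_ddl(table: str, base_uri: str, sensor: str, hour: str) -> str:
    """Return DDL that registers one hourly partition.

    `hour` uses the slide's p_hour format, e.g. "2019-06-11_14".
    Values are interpolated directly, so they must come from trusted code.
    """
    date = hour.split("_")[0]
    year = date.split("-")[0]
    location = (
        f"{base_uri}/p_sensor={sensor}/p_year={year}/"
        f"p_date={date}/p_hour={hour}/"
    )
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (p_sensor='{sensor}', p_year='{year}', "
        f"p_date='{date}', p_hour='{hour}') "
        f"LOCATION '{location}'"
    )


def pruned_count_query(table: str, sensor: str, date: str) -> str:
    """Return a query whose WHERE clause touches only partition columns,
    so the engine can skip every prefix that does not match."""
    return (
        f"SELECT count(*) FROM {table} "
        f"WHERE p_sensor = '{sensor}' AND p_date = '{date}'"
    )


if __name__ == "__main__":
    # Hypothetical table and bucket names.
    print(add_partition_ddl("flows_stage0", "s3://example-bucket/stage0",
                            "sensor01", "2019-06-11_14"))
    print(pruned_count_query("flows_stage0", "sensor01", "2019-06-11"))
```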
  • 23. Optimize for efficient long-term queries: Provide analysts with fast, seamless SQL query access to all the records, from newest (a few minutes of latency) to oldest (months back). Small files (micro-batches) minimize the latency for NRT queries (stage0), but we need to optimize for large time span queries (stage1). The stage1 table has the same columns and partitions as stage0, but:
      - Columnar file format (Parquet)
      - Bucketed by srcIP (faster queries)
      - Minimize files per hourly partition
      Deep dive coming up!
  • 24. Seamless SQL access: Provide analysts with fast, seamless SQL query access to all the records. Recap:
      - Stage 0 optimized for NRT queries (Enrich Lambda): lots of small files per partition
      - Stage 1 optimized for historical queries (ETL): fewer optimized files per partition
      Views combine (UNION ALL) stage0 & stage1 tables to give the best of both worlds:
      - Low latency for recent data from stage0 (<6hrs)
      - Efficiency (cost + speed) from stage1 (>6hrs)
      - Views updated every hour (scheduled Lambda)
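The combined view on slide 24 could be regenerated each hour along these lines. This is a sketch under assumptions: the view and table names are hypothetical, the cutoff is applied on the `p_hour` partition column, and the six-hour boundary is taken from the slide's "<6hrs" and ">6hrs" labels.

```python
from datetime import datetime, timedelta, timezone


def combined_view_sql(view: str, stage0: str, stage1: str,
                      now: datetime, cutoff_hours: int = 6) -> str:
    """Return a view definition that reads old hours from stage1 and
    recent hours from stage0.

    p_hour values ("YYYY-MM-DD_hh") sort lexicographically in time order,
    so plain string comparison is enough for the cutoff.
    """
    cutoff = (now - timedelta(hours=cutoff_hours)).strftime("%Y-%m-%d_%H")
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT * FROM {stage1} WHERE p_hour < '{cutoff}'\n"
        f"UNION ALL\n"
        f"SELECT * FROM {stage0} WHERE p_hour >= '{cutoff}'"
    )


if __name__ == "__main__":
    now = datetime(2019, 6, 11, 14, 30, tzinfo=timezone.utc)
    print(combined_view_sql("flows", "flows_stage0", "flows_stage1", now))
```

Because the two branches cover disjoint `p_hour` ranges, each record is returned once, and analysts query a single name regardless of record age.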
  • 25. Well Architected! Make it secure, cost effective, well instrumented, reliable, scalable to handle future growth, and extensible as a foundation for advanced automated analytics and machine learning.
      - Data Lake on Amazon S3 + AWS Glue catalog: foundation for future enhancements and innovations (using the ‘right tool for the job’: Amazon SageMaker, Amazon QuickSight, Amazon Redshift, etc.)
      - Automated deployment: AWS CodePipeline, AWS CodeBuild, and AWS CloudFormation
      - Storage retention & costs: Amazon S3 Intelligent-Tiering and Lifecycle Management
  • 26. Stage 1 ETL: Deep Dive. Make queries faster and cheaper:
      - Stage0: CSV; variable number of small files in NRT
      - Stage1: columnar file format (Parquet); bucketed by source IP (fixed number of larger files)
  • 27. Stage 1 ETL: Deep Dive
      - Stage0 puts partition values in the ETL Amazon Simple Queue Service (Amazon SQS) queue
      - Auto Scaling group driven by Amazon SQS queue depth
      - Each Amazon EC2 instance runs a Dask script that processes one partition at a time, avoiding network shuffling
      - Maintain the stage0 partition structure
      - Diagram (EC2 instances in an Auto Scaling group, in a VPC): partition message from the ETL SQS queue → read all CSV files in the stage0 partition of the data lake bucket → write bucketed Parquet to stage1 → get table info and update the partition in the data catalog
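The queue-driven, one-partition-at-a-time pattern can be sketched without any AWS dependency. In the sketch below a standard-library `queue.Queue` stands in for the SQS queue, `process` is a hypothetical callback standing in for the Dask step, and the explicit retry counter is an assumption (with a real SQS queue, an undeleted message would normally reappear after its visibility timeout).

```python
import queue
from typing import Callable, Dict, List, Tuple

Partition = Dict[str, str]


def run_worker(partitions: "queue.Queue[Tuple[Partition, int]]",
               process: Callable[[Partition], None],
               max_attempts: int = 3) -> Tuple[List[Partition], List[Partition]]:
    """Drain the queue one partition at a time.

    A failed partition is re-queued until it has been tried max_attempts
    times. Returns (completed, permanently_failed).
    """
    done: List[Partition] = []
    failed: List[Partition] = []
    while True:
        try:
            msg, attempt = partitions.get_nowait()
        except queue.Empty:
            break
        try:
            process(msg)  # read stage0 CSVs, write stage1 Parquet, update catalog
            done.append(msg)
        except Exception:
            if attempt + 1 < max_attempts:
                partitions.put((msg, attempt + 1))
            else:
                failed.append(msg)
    return done, failed


if __name__ == "__main__":
    q: "queue.Queue[Tuple[Partition, int]]" = queue.Queue()
    q.put(({"p_sensor": "sensor01", "p_hour": "2019-06-11_13"}, 0))
    q.put(({"p_sensor": "sensor02", "p_hour": "2019-06-11_13"}, 0))
    print(run_worker(q, lambda p: None))
```

Since each message names a whole partition, a worker never needs data held by another worker, which is the property the slide describes as avoiding network shuffling.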
  • 28. Stage 1 ETL: Deep Dive. Why not Amazon Kinesis or Firehose? We initially iterated using established streaming patterns with Kinesis and Firehose. As we optimized, we needed greater control for this particular case:
      - Avoid data shuffling. The file micro-batches being ingested are inherently partitioned in the way we needed
      - Firehose partitions files in Amazon S3 by ingestion time. To optimize NRT queries we need to partition by fields in the payload, e.g. sensorname and flow timestamp
      - File format optimizations, e.g. Parquet row group size and Hive bucketing
  • 29. Stage 1 ETL: Deep Dive. Why Dask and not Amazon EMR, Spark, or AWS Glue? We initially iterated using AWS Glue and Amazon EMR. Awesome tools! We landed on Dask for greater low-level control, based on a combination of optimizations and features for this particular case:
      - Skewed partitions: larger sensors can cause stragglers
      - Reduce network data shuffling
      - Amazon S3 multi-part uploads: avoid staging; files become visible only when the upload succeeds
      - Automatic retry of failed partitions
      - Dynamic partition overwrite: backfill or redo of partitions
      - Partition management: make them available when needed
      - Hive bucketing
  • 30. Stage 1 ETL: Deep Dive. Bucketing
      - Hive partitions for a small set of values
      - Hive bucketing for a large number of unique values (e.g., billions of IP addresses)
      - Bucketing provides a significant performance improvement for queries using bucketed fields
      - Hash the field and apply a mod function based on the number of buckets: hash(bucket) % number of buckets
      - Fixed number of evenly distributed files
      - We implemented it in Dask
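The hash-then-mod rule on slide 30 can be illustrated in a few lines of plain Python. This is a sketch only: `zlib.crc32` is a stand-in chosen because it is deterministic across runs, not the hash function Hive uses, so files written this way would not be readable as Hive-compatible buckets. The bucket count and the `srcip` field name are also assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List

NUM_BUCKETS = 16  # hypothetical; the talk does not state the bucket count


def bucket_for(src_ip: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a source IP to a bucket index: hash(field) % number of buckets."""
    return zlib.crc32(src_ip.encode("utf-8")) % num_buckets


def split_into_buckets(records: Iterable[dict],
                       num_buckets: int = NUM_BUCKETS) -> Dict[int, List[dict]]:
    """Group records so that all rows for one source IP share one bucket,
    i.e. one output file per bucket within an hourly partition."""
    buckets: Dict[int, List[dict]] = defaultdict(list)
    for rec in records:
        buckets[bucket_for(rec["srcip"], num_buckets)].append(rec)
    return buckets


if __name__ == "__main__":
    flows = [
        {"srcip": "10.0.0.1", "bytes": 120},
        {"srcip": "10.0.0.2", "bytes": 64},
        {"srcip": "10.0.0.1", "bytes": 900},
    ]
    for idx, rows in sorted(split_into_buckets(flows).items()):
        print(idx, rows)
```

A query that filters on one source IP can recompute the same index and read a single file per partition instead of all of them.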
  • 31. Stage 1 ETL: Deep Dive
      - Partitioning targets a specific folder
      - Bucketing targets a specific file
      - Columnar format can skip data within the file
  • 33. You may be asking yourself, did it all work out? The results are in… The answer is… Yes!
  • 34. Some Examples
      - Count of all records generated by a single sensor: the “old way” 15 minutes; the AWS way 2 minutes (7.5x faster)
      - Count of all records generated by all sensors: old way 36 hours; AWS 3 minutes (720x faster)
  • 35. Athena Query 1: Flows Byte Aggregation, 1 HR
  • 36. A couple more…
      - Query for all traffic destined to a specific IP address and port, over a one-week time period: old way 48 hours; AWS 19 minutes (150x faster)
      - Query for all traffic destined to a set of IP addresses over port 80, over a one-week time period: old way 72 hours; AWS 12 minutes (360x faster)
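The speedup factors quoted on slides 34 and 36 follow from dividing the old runtime by the new one in the same unit. A quick check, using only the durations given on the slides (the case labels are shorthand introduced here):

```python
def speedup(old_minutes: float, new_minutes: float) -> float:
    """Ratio of old runtime to new runtime, both in minutes."""
    return old_minutes / new_minutes


cases = [
    ("single-sensor count", 15, 2),
    ("all-sensor count", 36 * 60, 3),
    ("one IP and port, one week", 48 * 60, 19),
    ("IP set on port 80, one week", 72 * 60, 12),
]

for name, old, new in cases:
    print(f"{name}: {speedup(old, new):.1f}x")
```

Three of the four ratios match the slides exactly; the third works out to roughly 151.6, which the slide rounds to 150x.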
  • 37. Athena Query 2: All Traffic to a Single IP & Port, 1 Week
  • 38. Next Steps
      - Leverage Amazon Redshift Spectrum
      - Build a UI for broader usability
      - Collect additional datatypes for improved alerting and correlation
      - Leverage Artificial Intelligence/Machine Learning to identify malicious activity
  • 40. Thank you! Bob Strahan, strahanr@amazon.com; Oliver Atoa, oatoa@amazon.com; Brian Calkin, Brian.Calkin@cisecurity.org