Data-Driven Security Analytics Using a Data Lake
- 1. © Copyright 2014 EMC Corporation. All rights reserved.
Principal Data Scientist, Pivotal
Derek Lin
- 2.
Agenda
• Information Security Analytics & Use Cases
• Data Lake
• Data Science: Extracting Value from Data Lake
• Demo
- 3.
Information Security Analytics
Landscape
Application Areas
- 4.
CIO Survey: Top Concerns
Sources: Barclays September 2013 CIO Survey, KPMG January 2014 CIO/CFO Survey
54% What to Collect
85% How to Analyze
- 5.
A Big Data Analytics Response
• More sophisticated adversaries and methods
• Limited human capacity combined with massive amounts of events
– 40% of all survey respondents are overwhelmed with the security data they already collect
– 35% have insufficient time or expertise to analyze what they collect
• Security tools, tactics and defenses are becoming outdated:
– Content is static and not as dynamic as the threat landscape
– Segregated by too many point products, tool interfaces and disparate data sets
Source: EMA, The Rise of Data-Driven Security, Crawford, Aug 2012 (survey sample size = 200)
- 6.
Enterprise Information Security Analytics
Insider Threat
Asset Risk
Malware Threat
- 7.
APT Kill Chain
Advanced Persistent Threat (APT)
1. Phishing and Zero Day Attack: A handful of users are targeted by two phishing attacks; one user opens the zero-day payload (CVE-2011-0609).
2. Back Door: The user machine is accessed remotely by the Poison Ivy tool.
3. Lateral Movement: The attacker elevates access to important user, service and admin accounts, and specific systems.
4. Data Gathering: Data is acquired from target servers and staged for exfiltration.
5. Exfiltrate: Data is exfiltrated via encrypted files over FTP to an external, compromised machine at a hosting provider.
- 8.
Anatomy of an Attack | Anatomy of a Response
- 9.
Technology Landscape in the Kill Chain
Perimeter penetration
Malware beacon
Lateral movement
Staging & exfiltration
Single source, real-time
Proactive, limited-sources rule-based methods over short-range
Proactive, multi-sources data-driven methods over long-range
Reactive, manual post-incident response
Host/Network Analysis & Search/Indexing
Data-lake enabled analytics
IDS/IPS
Anti-virus
SIEM
DLP
- 10.
Analytics Opportunities in Threat Defense
• Malware discovery
– Host infection detection
– Malware Command & Control beaconing activity detection
• Perimeter penetration prevention/detection
– Anomalous VPN login detection
– Denial of service attack mitigation
– Local IP black list construction
– Watering hole attack detection
– Chat room monitoring
– Phishing attack detection
– Web server attack detection
• In-network anomaly detection
– Anomalous resource access detection
– Critical server activity monitoring
• IR efficiency improvement
– Semi-automated analysis
• SIEM efficiency improvement
– Threat feed normalization
– Alert prioritization
Malware Threat
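As an illustration of the "Command & Control beaconing activity detection" item, the sketch below flags source/destination pairs whose connections arrive at nearly constant intervals, which is one common signal of beaconing. It is not taken from the deck: the record format (source, destination, Unix timestamp), the function name and both thresholds are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_beacon_candidates(events, min_events=5, max_cv=0.1):
    """Return (src, dst) pairs whose connection intervals are nearly constant.

    events: iterable of (src, dst, unix_timestamp) tuples (assumed format).
    max_cv: hypothetical cutoff on the coefficient of variation of the gaps.
    """
    times = defaultdict(list)
    for src, dst, ts in events:
        times[(src, dst)].append(ts)

    candidates = []
    for pair, stamps in times.items():
        if len(stamps) < min_events:
            continue  # too few connections to judge periodicity
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        avg = mean(gaps)
        if avg == 0:
            continue  # all events share one timestamp; no interval to measure
        cv = pstdev(gaps) / avg
        if cv <= max_cv:
            candidates.append((pair, avg, cv))
    # Most regular pairs first
    return sorted(candidates, key=lambda item: item[2])


# Synthetic data: one host calling out every 60 seconds, one browsing irregularly
beacon = [("10.0.0.5", "203.0.113.9", 1000 + 60 * i) for i in range(10)]
browsing = [("10.0.0.7", "198.51.100.2", t) for t in (0, 5, 7, 300, 302, 900, 4000)]
result = find_beacon_candidates(beacon + browsing)
```

Real beacons often add jitter, so a production detector would need a looser or more robust regularity measure than this fixed cutoff.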
- 11.
Information Security Analytics Areas
Insider Threat
Asset Risk
Malware Threat
- 12.
Malicious Insider Threat
http://www.logrhythm.com/Portals/0/resources/LogRhythm_Survey_15.42014.pdf
What do you think is the biggest threat to your organization’s confidential data?
Does your organization have any systems in place to stop employees accessing confidential information or taking data?
- 13.
Analytics Opportunities in Identity, Access, and Management
• Anomalous user to resource access detection
– User-resource access data
• Role and activity auditing
– Role and provisioning data
• Privilege escalation auditing
– Privilege escalation data
• IT support personnel auditing
– Support ticket data
– Command activity data
Insider Threat
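One way to read "anomalous user to resource access detection" is peer-group comparison: an access is suspicious when almost nobody else in the same role touches that resource. The sketch below is an assumption-laden illustration, not the deck's method; the input shapes, the function name and the `min_peer_share` cutoff are hypothetical, and every user is assumed to appear in the role map.

```python
from collections import defaultdict


def rare_access_for_role(history, roles, new_events, min_peer_share=0.2):
    """Flag (user, resource) events that few peers in the same role have made.

    history: past (user, resource) pairs; roles: dict user -> role.
    min_peer_share: hypothetical cutoff on the share of role peers
    previously seen on the resource.
    """
    members = defaultdict(set)  # role -> users in that role
    for user, role in roles.items():
        members[role].add(user)

    seen = defaultdict(set)  # (role, resource) -> users who accessed it
    for user, resource in history:
        seen[(roles[user], resource)].add(user)

    flagged = []
    for user, resource in new_events:
        role = roles[user]
        peers = members[role] - {user}
        if not peers:
            continue  # no peer baseline to compare against
        share = len(seen[(role, resource)] - {user}) / len(peers)
        if share < min_peer_share:
            flagged.append((user, resource, share))
    return flagged


roles = {"ann": "eng", "bob": "eng", "cat": "eng", "dan": "hr"}
history = [("ann", "git"), ("bob", "git"), ("cat", "git"), ("dan", "payroll")]
new_events = [("ann", "git"), ("bob", "payroll"), ("dan", "payroll")]
flagged = rare_access_for_role(history, roles, new_events)
```

Here only the engineer reaching into the payroll system is flagged; the role map stands in for the "role and provisioning data" listed on the slide.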
- 14.
Information Security Analytics Areas
Insider Threat
Asset Risk
Malware Threat
- 15.
Information Asset Management
- 16.
Analytics Opportunities in Asset Management
• Document categorization for risk labeling
– Unstructured data
– User access data
• Asset risk profiling
– Vulnerability scanner data
– User access data
Asset Risk
- 17.
Data Lake
Needs and Trend
Analytics Support
- 18.
Life Before Data Lake
Departmental Warehouse
Enterprise Apps
Reporting
Non-Agile Models
Spread marts
Prioritized Operational Processes
Errant data and marts
Departmental Warehouse
Siloed Analytics
Data Sources
Non-Prioritized Data Provisioning
Static schemas grow over time
- 19.
Impact of Status Quo
• High-value data is hard to reach and leverage
• Data Scientists are last in line for data
– Queued after prioritized operational processes
• Data is moving in batches from Warehouse(s) to Data Scientists’ desktops
– In-memory analytical work (w/ R, SAS, SPSS, Excel)
– Sampled, driving model accuracies down
• There is a “cottage industry” of analytics, rather than centrally-managed harnessing of analytics
– Non-standardized initiatives
– Frequently not aligned with corporate business goals
Slow “time-to-insight” & reduced business impact
- 20.
Data-Driven Digital Media Analytics
Unified data supporting re-usable predictive models
Analytics: Targeting & Retention, Social Media Analysis, Campaign optimization
Data: Transaction History, Purchases, Clickstream, Customer Data
Data Size: GB, TB, PB
- 21.
Data-Driven Financial Protection Analytics
Unified data supporting re-usable predictive models
Analytics: Web Optimization, Fraud Detection, Product Recommendation
Data: ATM, Member Data, Transactional Log, Firewall, Clickstream, Phone Channel
Data Size: GB, TB
- 22.
Data-Driven IT Operation Analytics
Unified data supporting re-usable predictive models
Analytics: Failure Prediction, Root Cause Analysis, Project Risk Forecasting
Data: Server or VM Performance Metrics, CMDB, Configuration Setting, Alerts & Incident, Server logs, Network Performance Metrics
Data Size: GB, TB/PB
- 23.
Data-Driven Security Analytics
Unified data supporting re-usable predictive models
Analytics: Insider Threat Detection, Malware Detection, DDoS Mitigation
Data: AD/Auth, Asset/Role, Netflow, DNS/Firewall/Proxy, Critical Server, Packet Capture
Data Size: GB, TB/PB
Defense with breadth in variety and depth in time
- 24.
Pivotal Business Data Lake Architecture
Centralized Management
System monitoring
System management
Unified Data Management Tier
Data mgmt. services
MDM
RDM
Audit and policy mgmt.
Processing Tier
Workflow Management
In-memory
MPP database
Existing Sources
Unified Sources
Flexible Actions
Real-time ingestion
Micro batch ingestion
Batch ingestion
Real-time insights
Interactive insights
Batch insights
HDFS
New Data Sources
- 25.
Pivotal Business Data Lake Architecture
Centralized Management
Unified Data Management Tier
Data Dispatch
MDM
RDM
Data Dispatch
Processing Tier
Spring XD
Pivotal GemFire XD
HAWQ
Unified Sources
Flexible Actions
Clickstream
Sensor Data
Weblogs
Network Data
CRM Data
ERP Data
Pivotal GemFire
Pivotal RabbitMQ
Redis
Pivotal CF
Pivotal HD
Command Center
Existing Sources
New Data Sources
- 26.
Data Lake: More Than a Data Repository
To iterate and experiment, to fail fast, for a fast cycle of value generation
Analytics Support
Fast Query
Data Store
- 27.
Data Science Tools
Commercial Open Source (or Free)
PL/R, PL/Python, PL/Java
- 28.
MADlib In-Database Functions
Predictive Modeling Library
Linear Systems
• Sparse and Dense Solvers
Matrix Factorization
• Singular Value Decomposition (SVD)
• Low-Rank
Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Sandwich Estimators (Huber-White, clustered, marginal effects)
Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Affinity Analysis, Market Basket)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Ensemble Learners (Random Forests)
• Support Vector Machines
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
Descriptive Statistics
Sketch-based Estimators
• CountMin (Cormode-Muthukrishnan)
• FM (Flajolet-Martin)
• MFV (Most Frequent Values)
Correlation
Summary
Support Modules
Array Operations
Sparse Vectors
Random Sampling
Probability Functions
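To make the "Sketch-based Estimators" entry concrete, here is a small plain-Python illustration of the Count-Min idea: several hashed counter rows, with the minimum across rows used as the frequency estimate. This is only a sketch of the technique; it is not MADlib's implementation or API, and the class name, table sizes and hashing scheme are assumptions made for the example.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One salted hash per row (blake2b accepts a salt of up to 16 bytes).
        for row in range(self.depth):
            digest = hashlib.blake2b(
                str(item).encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the smallest cell is the best guess.
        return min(self.table[row][col] for row, col in self._columns(item))


sketch = CountMinSketch()
for ip in ["10.0.0.1"] * 50 + ["10.0.0.2"] * 3:
    sketch.add(ip)
heavy = sketch.estimate("10.0.0.1")
light = sketch.estimate("10.0.0.2")
```

The appeal for security logs is that memory stays fixed at width × depth counters no matter how many distinct IPs or domains appear, at the cost of possible overestimates.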
- 29.
Data Science: Extracting Value from Data Lake
Technology & Tools
People
- 30.
Evolution of Data Analytics in Security
• BI and Compliance-driven: Data goes in, hard to extract value
• Investigation-driven: Fast queries over large data
• Behavior-metrics driven: Single source metrics, simple correlation, rule-based, high false positive
• Data-science driven: Leverage full contextual info, multi-source, automatic, for low false positives
- 31.
What is Data Science?
The use of statistical and machine learning techniques on big multi-structured data in a distributed computing environment to:
• identify correlations and causal relationships
• classify and predict events
• identify patterns and anomalies
• infer probabilities, interest and sentiment
- 32.
Data Science: The Next Security Frontier
• Beyond signatures
• Beyond simple metrics for thresholding
• Beyond manual engineering of rules
• Monitor each and every entity in its environmental context with 360° view over long time window with advanced mathematics
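The contrast between a single global threshold and per-entity monitoring can be sketched in a few lines. The example below scores each entity's value for today against that entity's own history, so a quiet laptop that suddenly moves a moderate amount of data stands out while a busy backup server does not. The data shapes, names and the zero-spread floor are hypothetical, and a plain z-score is only one simple stand-in for the "advanced mathematics" the slide refers to.

```python
from statistics import mean, pstdev


def entity_zscores(history, today):
    """Score today's value for each entity against that entity's own past.

    history: dict entity -> list of past daily values (assumed shape).
    today: dict entity -> today's value.
    """
    scores = {}
    for entity, value in today.items():
        past = history.get(entity, [])
        if len(past) < 2:
            continue  # not enough history to form a baseline
        spread = pstdev(past)
        if spread == 0:
            spread = 1.0  # hypothetical floor to avoid division by zero
        scores[entity] = (value - mean(past)) / spread
    return scores


history = {
    "backup-server": [900, 1000, 1100, 1000],  # normally moves a lot of data
    "hr-laptop": [10, 12, 8, 10],              # normally moves very little
}
today = {"backup-server": 1050, "hr-laptop": 200}
scores = entity_zscores(history, today)
```

A fixed rule such as "alert above 500" would fire on the backup server and miss the laptop; the per-entity scores reverse that ranking.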
- 33.
Pivotal Network Intelligence
Demo
- 37.
What is a “Data Scientist”?
Programming Skills
Mathematical/Statistical Skills
- 38.
Mathematical/Statistical Skills
One Team Member
Programming Skills
- 39.
Mathematical/Statistical Skills
Another Team Member
Programming Skills
- 40.
Yet Another
Programming Skills
Mathematical/Statistical Skills
- 41.
Together
Programming Skills
Mathematical/Statistical Skills
- 42.
Take-Home Messages
Information Security = Big Data Problem
Data Lake is more than a data store
Data Science drives value from Information Security Data Lake