Cyber-security Intelligence with Big Data Analytics : Values, Machine learning Algorithms & Defence strategy, Architecture & Processes, Data processing paradigms, Ecosystem overview, and case studies
1. 1/30Karim BAÏNA, « Workshop 3 : Fight against Cybercrime and Crime by Big Data Analysis », AUSIM'2016
Workshop 3 : Fight against Cybercrime and Crime
by Big Data Analysis
28 October 2016
Karim BAÏNA
Co-director of the « Big Data Scientist » University Diploma
Head of the Software Engineering Department
Head of the Cooperation Office
ENSIAS, Mohammed V University of Rabat, Morocco
SAN FRANCISCO — Eleven hours after a massive online attack that
blocked access to many popular websites, the company under assault
has finally restored its service.
Dyn, a New Hampshire-based company that monitors and routes
Internet traffic, was the victim of a massive attack that began at 7:10
a.m. Friday morning.
The issue kept some users on the East Coast from accessing Twitter,
Spotify, Netflix, Amazon, Tumblr, Reddit, PayPal and other sites.
11 hours later (at 6:17 p.m. Friday), Dyn updated its website to
say that the DDoS attack had been resolved and service had been restored.
Mirai software (origin of the attack) uses malware from phishing
emails to first infect a computer or home network, then spreads to
everything on it, taking over DVRs, cable set-top boxes, routers and
even Internet-connected cameras used by stores and businesses for
surveillance. These devices are in turn used to create a robot network
(or botnet) that sends the millions of messages that knock out the
victims' computer systems.
Hacked home devices caused massive
Internet large-scale DDoS attack
Source : USA TODAY 10:04 a.m. Tuesday, October 22, 2016
The massive DDoS attack was a sophisticated, highly distributed attack
involving "10s of millions of IP addresses" of IoT devices that were part of the Mirai botnet
and protected by little more than factory-default usernames and passwords.
« Cybersecurity Framework, Big Data and the
associated analytics tools coupled with the
emergence of cloud, mobile, and social
computing offer opportunities to process and
analyze structured and unstructured
cybersecurity-relevant data » NIST, National
Institute of Standards and Technology'2014
« Security analytics market is projected to hit
$7.1 billion by 2020 » Markets and Markets, 2015
Outline
1. Introduction
2. Cyber-security Intelligence with Big Data Analytics :
- Value
- Machine learning Algorithms & Defence strategy
- Architecture & Processes
- Data processing paradigms
- Ecosystem overview
3. Case Studies
4. References
Introduction
Big Data – Digital Universe drives growth
and integration of digital economy
90% of the world's data has been produced during
the last 5 years
+1.2 T (10¹²) searches on Google
+4 B (10⁹) hours of video on YouTube
+1 B active users on Facebook spending 700 M
minutes per month
+500 M users posting +55 M tweets every day
+30 B RFID tags in 2013 (1.3 B in 2005)
+6 B mobile phones
+4.6 B camera phones
+420 M wearable, wireless health monitors
+200 M smart meters in 2014 (76 M in 2009)
+100 M GPS-enabled devices
Sources: intel.com, IBM, Hongkiat.com
"Dark Web" and underground cybersecurity
economy
The Crime-as-a-Service business model
drives a digital underground
economy that provides a wide range of
commercial services facilitating any
type of cybercrime targeting
vulnerabilities of people, processes, and
technology.
The financial gain from cybercrime
stimulates the commercialisation of
cybercrime as well as its innovation,
scale, and further sophistication,
intelligence, versatility, and availability.
Prices on the Dark Web
1000 verified e-mail addresses: 10 $
1000 social network accounts: 12 $
1 passport scan: 2 $
1 cloud account: 8 $
1 credit card number: 20 $
Cyber-security Intelligence - 1st generation
Firewalls, Intrusion Detection Systems
(IDS), Intrusion Prevention Systems (IPS)
– Security architects realized the need for
layered security,
e.g., reactive security and breach response,
because a system with 100% protective
security is impossible.
Source: www.123rf.com
Cyber-security Intelligence - 2nd generation
Security Information and Event Management (SIEM) –
Managing alerts from different intrusion detection
sensors and rules was a big challenge in enterprise
settings.
SIEM systems aggregate and filter alarms from
many sources and present actionable
information to security analysts.
Corporate cybercrime is increasing: the number of
security incidents climbed by 38% in 2015 and keeps
growing, while prevention and detection
methods have proved largely ineffective
(PricewaterhouseCoopers, 2016).
Currently, detection of complex cyberespionage attacks (e.g.
Advanced Persistent Threats (APT)) relies
heavily on the expertise of human analysts to create
custom signatures and perform manual investigation.
Source: ManageEngine.com
Cyber-security Intelligence with Big Data Analytics
Value of Big Data and Data analytics for
Cyber-security Challenges
Related application areas: territorial security, prediction of natural catastrophes, citizen security, financial fraud analysis, prevention of epidemic spread.
Efficient and personalised recognition of malicious
behaviour (patterns) representing cyber-security
threats, to suggest/recommend actions.
Identification of actionable security information
from large enterprise data sets and reduction of the false
positive rate (Veracity) to manageable levels (actions
are expensive).
Complex event correlation analysis (e.g. user profile
& behaviour similarity, event dependence or causality) to
produce coherent pieces of cyber-security knowledge.
Prevention analysis: deduction that an event will
happen (future cyber-security risk probability) and
proposal of anticipatory actions to limit the impact.
Prediction analysis: exact deduction and explanation of
when an internal or external cyber-security issue will happen,
and forecast of its consequences.
Proof of compliance with regulatory requirements.
Big Data Architecture and processes
[Figure: Big Data Zone combining a Data Lake (Batch Processing) with Real Time Processing]
Big Data Lake (Processing Data at Rest):
Acquisition, Extraction/Cleaning/Annotation,
Integration/Aggregation, Representation, and
Recording of [un-, semi-]structured data.
Real Time Processing (RTAP of Data in Motion):
Big Data management and analytics in real time.
Analytics Sandbox: Modeling and analysis
through an inductive/inferential approach on a sample
data set, Interpretation.
Continuous learning loop between the Big Data Zone and
the Analytics Sandbox (deductive/inductive process).
Business Intelligence Environment: Browsing
structured data marts, KPI reporting, triggering actions and
alerts, integration with business processes.
Inspired by EMC (except the RTAP part)
Anomaly Detection (AD)
[Figure: K-Means clustering (unsupervised algorithm) with Cluster 1, Cluster 2, and an outlier marked as an anomaly; linear regression (supervised algorithm) with an anomaly]
1) Generate a model of what is normal:
Group data using supervised or unsupervised
methods, e.g. classification/clustering.
2) Anomaly detection:
Refers to the problem of finding
patterns in data that don't conform to the expected
behaviour.
Detect data points that deviate markedly from
the normal expected observations; when this
happens, trigger a signal.
Examples: smart, customized and targeted malware;
malicious or negligent insiders who abuse their access
to put data or IP at risk; compliance breaches that
require complex interrelated rule sets to be detected;
etc.
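The two steps on this slide can be illustrated with a minimal sketch, assuming a tiny hand-written K-Means in pure Python. The sample points, the initial centroids, and the 50% margin on the threshold are all hypothetical choices made for illustration; they do not come from the slides.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

# Step 1: model of what is normal (hypothetical observations in two clusters).
normal = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids = kmeans(normal, centroids=[normal[0], normal[3]])

# Assumed threshold: largest distance seen in the normal data, plus a 50% margin.
threshold = 1.5 * max(nearest(p, centroids)[1] for p in normal)

def is_anomaly(point):
    """Step 2: signal any point that lies too far from every cluster."""
    return nearest(point, centroids)[1] > threshold
```

With these sample values, `is_anomaly((3.0, 3.0))` signals an anomaly because the point sits between the two clusters, while `is_anomaly((1.1, 0.9))` does not.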
Anomaly Detection (AD)
[Figure; source: Happiest Minds Technologies, 2013]
Behaviour change/Anomaly Detection
(B.A.D)
1) Generate a model of what is normal:
If the score of the currently collected data is not an
outlier (within a window of the most recent data), it is
added to a reference buffer.
2) Behaviour change detection:
Keep monitoring changes in patterns between the
current data and the reference buffer, based on a
distance metric.
Detect a shift in the score of the current data;
when it happens, trigger a signal.
Examples: user/employee behaviour, asset
behaviour, interaction behaviour.
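The reference-buffer idea above could be sketched as follows. This is a minimal illustration, not the method used in the cited work: the class name, the window size, the warm-up length, the choice of "number of standard deviations from the buffer mean" as distance metric, and the sample scores are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev

class BehaviourChangeDetector:
    """Keeps a reference buffer of recent non-outlier scores and signals shifts."""

    def __init__(self, window=20, threshold=3.0, warmup=5):
        self.reference = deque(maxlen=window)  # most recent "normal" scores
        self.threshold = threshold             # assumed cut-off, in standard deviations
        self.warmup = warmup                   # scores accepted before any test is made

    def distance(self, score):
        """Distance metric: deviation from the reference mean, in standard deviations."""
        spread = pstdev(self.reference) or 1e-9  # avoid division by zero
        return abs(score - mean(self.reference)) / spread

    def observe(self, score):
        """Return True (signal) if score shifts away from the reference buffer."""
        if len(self.reference) < self.warmup:
            self.reference.append(score)
            return False
        if self.distance(score) > self.threshold:
            return True  # outlier: signal, and keep it out of the reference
        self.reference.append(score)
        return False

detector = BehaviourChangeDetector()
scores = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10, 45, 10]  # hypothetical activity scores
signals = [detector.observe(s) for s in scores]
```

Only the score 45 triggers a signal here. Because outliers are never added to the buffer, a single spike does not distort the model of normal behaviour.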
Advanced Network Anomaly example :
APT (Advanced Persistent Threat)
source : IBM'2013
Advanced Network Anomaly example :
APT (Advanced Persistent Threat)
Filter malicious URLs with neural networks or clustering.
Filter spam e-mail with a trained machine learning model (Decision Tree, Support Vector Machine).
Identify infected PDF files containing malicious JavaScript by (1) detecting JavaScript syntactically and (2) categorising the JS code as malicious with syntax-tree-based clustering.
Detect abnormal outbound data transfer over the network (if data exfiltration targets the enterprise network and not a third-party one).
Correlate all those detections within a window of time.
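Correlating detections within a time window might look like the sketch below. The detector names, host names, timestamps, window length, and the rule "at least three different detectors" are hypothetical and chosen only to make the idea concrete.

```python
from collections import defaultdict

# Hypothetical detector outputs: (timestamp in seconds, host, detector name).
detections = [
    (100, "pc-17", "malicious_url"),
    (160, "pc-17", "spam_email"),
    (400, "pc-17", "infected_pdf"),
    (900, "pc-17", "abnormal_outbound_transfer"),
    (120, "pc-42", "spam_email"),
    (5000, "pc-42", "malicious_url"),
]

def correlate(detections, window=1000, min_distinct=3):
    """Return hosts where at least min_distinct detectors fired within window seconds."""
    by_host = defaultdict(list)
    for timestamp, host, detector in sorted(detections):
        by_host[host].append((timestamp, detector))

    suspects = {}
    for host, events in by_host.items():
        start = 0
        for end in range(len(events)):
            # Shrink the window from the left until it spans at most `window` seconds.
            while events[end][0] - events[start][0] > window:
                start += 1
            distinct = {detector for _, detector in events[start:end + 1]}
            if len(distinct) >= min_distinct:
                suspects[host] = sorted(distinct)  # keeps the latest qualifying window
    return suspects

suspects = correlate(detections)
```

In this sample, only `pc-17` is reported: four different detectors fired within 800 seconds, whereas the two detections on `pc-42` are too far apart to be correlated.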
Managing Variety of cybersecurity data
sources with Big Data
[Figure labels: OLAP; unstructured, semi-structured, and structured data]
Cross-analysis of data:
- from multiple (internal or external) data
sources
- of multiple formats (or not even
formatted)
- with no schema constraint (ELT
or schema on read)
Examples: network traffic events from firewalls
and security devices, software
application events (e.g. website traffic,
financial transactions, business processes),
and people action events.
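Schema on read can be illustrated with a small sketch: raw events are stored untouched and a structure is imposed only when they are read. The three sample lines, the field names, and the fallback rule for free text are hypothetical.

```python
import json

# Hypothetical raw events kept as-is in the data lake, in three different formats.
raw_events = [
    '{"ts": 1477641600, "src": "10.0.0.5", "action": "login_failed"}',  # JSON application log
    "1477641605,10.0.0.9,DENY",                                         # CSV firewall log
    "Oct 28 08:00:10 host sshd: Failed password from 10.0.0.5",         # free-text syslog line
]

def read_with_schema(line):
    """Apply a schema at read time; return whatever fields can be recovered."""
    try:
        record = json.loads(line)
        if isinstance(record, dict):
            return {"format": "json", "src": record.get("src"), "action": record.get("action")}
    except json.JSONDecodeError:
        pass
    parts = line.split(",")
    if len(parts) == 3:
        return {"format": "csv", "src": parts[1], "action": parts[2]}
    # Unstructured text: keep the raw line; the last token is only a guess at the source.
    return {"format": "text", "src": line.split()[-1], "action": None, "raw": line}

events = [read_with_schema(line) for line in raw_events]
```

Nothing is rejected at load time. A different reader could later apply another schema to the same raw lines, which is the main appeal of ELT over a fixed upfront schema.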
Managing Volume of cybersecurity data
with Big Data
Data at Rest
Spread data across a cluster of
computers (partition/fragmentation)
Keep processing physically close to the
data (parallel synchronous [micro]
batching for data locality)
Storage of large enterprise data sets
for a longer period without purging
Scalable analysis of big security
data (minutes instead of hours)
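Partitioning and data locality can be sketched on a single machine, as below. This is only an illustration of the principle: in a real cluster each partition would live on a different node. The event fields, the hash choice (`crc32`), and the number of partitions are assumptions.

```python
from collections import Counter
from zlib import crc32

def partition(events, n_partitions):
    """Spread events across n_partitions buckets using a stable hash of the source."""
    partitions = [[] for _ in range(n_partitions)]
    for event in events:
        partitions[crc32(event["src"].encode()) % n_partitions].append(event)
    return partitions

def local_count(events):
    """Work done close to the data: each partition counts its own denied events."""
    return Counter(e["src"] for e in events if e["action"] == "DENY")

# Hypothetical firewall events.
events = [
    {"src": "10.0.0.5", "action": "DENY"},
    {"src": "10.0.0.9", "action": "ALLOW"},
    {"src": "10.0.0.5", "action": "DENY"},
    {"src": "10.0.0.7", "action": "DENY"},
]

partitions = partition(events, n_partitions=3)
# Only the small per-partition summaries are combined, not the raw events.
denied = sum((local_count(p) for p in partitions), Counter())
```

The design point is that raw events never move: each partition produces a small summary, and only those summaries are merged.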
Managing Velocity of cybersecurity data
production with Big Data
Data in Motion
Data (events) arrive at the
processing stage and are handled
even before storage
(pattern recognition/correlation/scoring rules)
Processing of millions of
events per second (real-time
analysis processing – RTAP)
It is estimated that an enterprise as large
as HP generated (in 2013) 1 trillion
events per day, or roughly 12 million
events per second
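The conversion from events per day to events per second quoted for HP can be checked with a short calculation. The only input is the 1 trillion events per day figure given on the slide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
events_per_day = 10**12         # 1 trillion, the figure quoted for HP in 2013

events_per_second = events_per_day / SECONDS_PER_DAY
print(f"{events_per_second:,.0f} events per second")  # prints "11,574,074 events per second"
```

The result, about 11.6 million, is consistent with the "roughly 12 million events per second" stated above.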
Case Studies
Cybersecurity with Big Data - Case Studies
Network Security: Zions Bancorporation announced that it
is using Hadoop clusters and Big Data to parse more data
more quickly than with traditional SIEM tools.
In their experience, the quantity of data and the frequency
of event analysis are too much for traditional SIEMs to handle
alone.
To better model the security context of the enterprise, Zions
Bancorporation built a security data warehouse with Hive on
Hadoop: 120 TB (2 years of storage) of more than 120 types of
multi-source data: transactions, fraud alerts, server logs,
firewall logs, IDS logs.
In their traditional systems, searching through a month's worth of
data could take between 20 minutes and 1 hour. In their new
Hadoop system running queries with Hive, they get the
same results in about one minute.
Source: Zions Bancorporation, RSA Conference 2012
Cybersecurity with Big Data - Case Studies
Normal data in an enterprise environment includes
billions of events per day (IP traffic information and
network traffic). This data, collected via NetFlow, should be used
to identify cybersecurity issues, so it must
be stored and analyzed.
Storage alone is costly. Analyzing that amount of stored Big Data
is an entirely different challenge.
Apache Spot (Incubating) offers a solution. It was
designed to gather, store and analyze Big Data. In fact,
Apache Spot (Incubating) is an ideal solution for this
cybersecurity challenge.
Apache Spot (Incubating) can integrate many different
data sources in a data lake, then add operational context
to the data by linking configuration, inventory, service
databases and other data stores. This helps you to
prioritize the actions to take under different attack,
malware, APT and hacking scenarios.
Source: http://spot.apache.org
Cybersecurity with Big Data - Case Studies
A large-scale graph inference approach was
introduced to identify malware-infected hosts in
an enterprise network and the malicious
domains accessed by the enterprise's hosts.
Experiments were run on a 2 billion HTTP request data set
collected at a large enterprise, a 1 billion DNS
request data set collected at an ISP, and a 35
billion network IDS alert data set collected from
over 900 enterprises worldwide.
High true positive rates and low false positive rates can
be achieved with only limited data labeled as
normal events or attack events to train
anomaly detectors (supervised algorithm).
Machine Learning Approach & Algorithms: graph inference approach, supervised anomaly detectors
Source: HP Labs, 2013
Cybersecurity with Big Data - Case Studies
Terabytes of DNS events consisting of billions of DNS requests
and responses collected at an ISP were analyzed.
The goal was to use the rich source of DNS information to
identify botnets, malicious domains, etc.
A varied set of features was computed, including ones
derived from domain names, time stamps, and DNS
response time-to-live values.
Then, classification techniques (e.g., decision trees and
support vector machines) were used to identify infected
hosts and malicious domains.
The analysis has already identified many malicious
activities from the ISP data set.
Machine Learning Algorithms: Decision Trees, Support Vector Machines (SVM)
Source: HP Labs, 2013
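Feature extraction from DNS records, as described in this case study, might be sketched as follows. The specific features (label length, digit ratio, character entropy), the sample domains, and the hand-written rule standing in for a trained decision tree are all hypothetical; the study itself does not list its features or thresholds, and time-stamp features are left out for brevity.

```python
import math
from collections import Counter

def entropy(text):
    """Shannon entropy of the characters in text, in bits per character."""
    counts = Counter(text)
    return -sum(n / len(text) * math.log2(n / len(text)) for n in counts.values())

def dns_features(domain, ttl):
    """Features derived from the domain name and the response time-to-live."""
    label = domain.split(".")[0]
    return {
        "length": len(label),
        "digit_ratio": sum(ch.isdigit() for ch in label) / len(label),
        "entropy": entropy(label),
        "ttl": ttl,
    }

def looks_malicious(features):
    """Hand-written stand-in for a trained decision tree; every threshold is an assumption."""
    if features["ttl"] < 300:
        return features["entropy"] > 3.0 or features["digit_ratio"] > 0.3
    return False

benign = dns_features("example.com", ttl=86400)
suspect = dns_features("x7k2q9vb4m1z.info", ttl=60)
```

In practice the thresholds would be learned from labeled DNS data by a decision tree or SVM rather than written by hand; the sketch only shows what kind of inputs such a classifier could receive.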
Big Data – an ecosystem of new concepts
and innovative technologies
References
Sandip K. Pal, Manish Anand, User Behavior based Anomaly Detection for Cyber Network Security, Happiest Minds Technologies, 2013
Big Data Working Group, Big Data Analytics for Security Intelligence, Cloud Security Alliance, 2013
Ponemon Institute LLC, Big Data Analytics in Cyber Defense, sponsored by Teradata, February 2013
Europol, The Internet Organised Crime Threat Assessment (iOCTA), 2014
CISCO, Detecting Hacks: Anomaly Detection on Networking Data, Hadoop Summit, 2015
PwC, Turnaround and transformation in cybersecurity: Key findings from The Global State of Information Security Survey, 2016
Sri Krishnamurthy, Anomaly Detection Techniques and Best Practices, 4th Annual Global Big Data Conference, Santa Clara, California, August-September 2016
Salah Baïna, Nouvelles Technologies & Nouvelles Menaces, Securisk Africa Forum, February 2016
Karim Baïna, Les Big Data : Paradigm Shift et catalyseur de création de la Valeur, ISIMA, Université Blaise Pascal, July 2016
Karim BAÏNA, karim.baina@um5.ac.ma