This talk describes, from an architectural point of view, how to exploit the HDP + NiFi technology platform to research, exploit and target events related to Cyber Security. The purpose of the system is to build a knowledge base of the events, actors and operating methods through which cyber attacks have happened and may happen, collecting real-time data from social networks and web pages as well as literature on such episodes in batch mode. The process focuses on text and graph analysis at scale, thanks to the Spark engine, Metron and Kafka, on a complex, integrated tech stack that enhances the capabilities of the algorithms and their results to offer a flexible solution to analysts. The system supports the user in determining the motivations and possibly the actual executors of the attacks and, hopefully, their instigators, also thanks to a smart representation of the data stored in a graph NoSQL database. A further aim of the system is to determine, in a predictive way, the "symptoms" of or the processes connected to the attacks.
Who am I???
• Manager of the Big Data Competence Center
• Trainer on Big Data tech
• Writer for Ingenium Magazine
• Big Data Solution Architect
• Big Data Cartoonist
Tools
• Threat Intelligence Platforms: shared & integrated information
• CTImate: modular platform based on Big Data tools & AI techniques
• Cyber Threat Intelligence: know, predict and prevent threats
CTImate: EII platform for Cyber Threat Intelligence
Goals:
• Address investments
• Make informed decisions
• Support for predicting and preventing future threats to IT systems
• Measure the resilience rate of your own systems
Collected information:
• News
• Software vulnerabilities
• Leaks/breaches
• Actors
• Previous incident analysis
Sources:
• Open Data
• Documents
• REST APIs
• Vertical DBs
• Social networks
• Security agencies
• NIDS
• Logs
• Firewall
• Email scanner
• SIEM
CTImate: HCP
The data flow for HCP (Hortonworks Cybersecurity Platform) is performed in real time and contains the following steps:
• Information from telemetry data sources is ingested into Kafka topics (Kafka is the telemetry event buffer). A Kafka topic is created for every telemetry data source. This information is the raw telemetry data, consisting of host logs, firewall logs, emails, and network data; a minimal producer sketch of this step follows the list.
• The data is parsed into a normalized JSON structure that Metron can read.
• The information is then enriched with asset, geo, threat intelligence, and other information.
• The information is indexed and stored, and any resulting alerts are sent to the Metron dashboard, the Alerts user interface, and telemetry.
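To make the first step concrete, here is a minimal sketch (not taken from the deck) of a producer pushing one raw log line into a per-source Kafka topic; the broker address, the "firewall" topic name and the sample event are illustrative assumptions.

# Minimal sketch of the HCP ingestion step: one Kafka topic per telemetry
# source. Broker address, topic name and sample log line are assumptions.
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka-broker:6667")

# Hypothetical raw firewall event, sent to its dedicated "firewall" topic.
raw_event = b"2015-12-23T15:42:01 DENY TCP 203.0.113.7:4431 -> 10.0.0.12:502"
producer.send("firewall", raw_event)
producer.flush()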
CTImate: rely on HCP
• Real-time ingesting: NiFi
• Parsing: Metron
• Enriching: Metron
• Alerting: Metron
• Indexing & storing: HDFS, Solr, Elasticsearch
• Visualizing: Kibana
CTImate: NRT pipeline
• Normalization: parse the input data in order to obtain a JSON format (if the input is already JSON, it may still need to be modified for use by Metron), using the native Grok parser (or a custom Java one); a sketch of such a pattern follows this list.
• Enrichment: information can be enriched, for example, with geo-IP geolocation data and domain info (which should already be available on Metron).
• Alert for Hate Speech: for each input that requires it, call a web service over HTTP for the sentiment analysis, using Stellar code. The Hate Speech model consists of previously trained Python code, serialized and stored on HDFS; the MaaS (Model as a Service) component runs a bash script that invokes the model exposed via a REST API (Flask); a minimal serving sketch also follows the list.
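As a hedged illustration of the normalization step, this is the kind of statement the native Grok parser accepts; the log format and the field names (chosen to follow Metron conventions such as ip_src_addr) are assumptions, not the actual CTImate configuration.

FIREWALL %{TIMESTAMP_ISO8601:timestamp} %{WORD:action} %{WORD:protocol} %{IP:ip_src_addr}:%{INT:ip_src_port} -> %{IP:ip_dst_addr}:%{INT:ip_dst_port}

Applied to a line like "2015-12-23T15:42:01 DENY TCP 203.0.113.7:4431 -> 10.0.0.12:502", such a pattern yields the flat JSON fields (timestamp, action, ip_src_addr, ...) that Metron expects downstream.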
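On the serving side, the Flask REST API invoked through MaaS could look like the following minimal sketch; the model file name, the /score endpoint and the scikit-learn-style predict call are assumptions for illustration only.

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical: the hate-speech model was trained offline, serialized with
# pickle, and copied from HDFS to the local disk before start-up.
with open("hate_speech_model.pkl", "rb") as fh:
    model = pickle.load(fh)

@app.route("/score", methods=["POST"])
def score():
    # Expects a JSON body like {"text": "..."} and returns a binary label.
    text = request.get_json()["text"]
    label = int(model.predict([text])[0])
    return jsonify({"is_hate_speech": label})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)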
CTImate: What is Kibana used for?
• Charts and dashboards on facet queries
• Geographical maps
• Time series
• Tabular views
• Searching the index (Lucene-like syntax)
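For example, a search-bar query in that Lucene-like syntax could look as follows; the field names and values are illustrative assumptions, not actual CTImate index fields.

ip_src_addr:203.0.113.* AND is_alert:true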
Hi everybody, I'm pleased to be here in Barcelona at the DataWorks Summit to present our solution for cyber threat intelligence.
Let me start with a very quick intro about who I am.
I lead the Big Data team within the Data & Analytics CoE of Engineering Ingegneria Informatica; besides that, I'm the solution architect dealing with distributed systems for data management (let's call them Big Data). Apart from this main mandate I do several things, like training, writing articles and pre-sales activities, as well as some more fun stuff like... drawing!!!
Let's jump to the topic of my talk: as I said, we developed a platform for threat intel, which is in the context of:
….
Therefore we decided to build a platform, called CTImate, in charge of integrating different useful sources, to perform predictions using AI techniques over some Big Data technologies I'm going to show you.
The cyber threat intel system we addressed takes care of collecting news, vulnerabilities, breaches, actors and previous incidents that have already been analyzed, crawling ad hoc open data, some social networks, security agency feeds, specific documentation, vertical databases and everything we need that is available through REST APIs. These help address investments, make informed decisions, predict and prevent future threats, and measure the resilience rate of…
In order to test and verify the solution, we inspected a very popular case-study attack: on December 23, 2015, the Ukrainian national electricity system suffered a major blackout that affected a wide area of the territory for several hours, probably due to the effects of a cyber attack against the SCADA (Supervisory, Control and Data Acquisition) systems of some electricity distribution companies.
From open sources, it would seem that the malware used in the event is an evolution of the well-known APT BlackEnergy, malware developed and used in the past by sponsored entities. Furthermore, on January 19, 2016, a further attack against the same targets and with the same modalities was detected, but with a different malware, probably to ensure the effectiveness and persistence of the attack even after the development and dissemination of the Indicators of Compromise of the previous one.
Thanks to this case study we collected lots of useful documents to address the first use cases.
What about functionalities? Here you can see a set of capabilities available within CTImate:
…
We'll focus the talk on the architectural aspects behind all of these analytical functionalities, which we address with a Big Data approach based on the Hadoop ecosystem.