Peter Wood started looking at Big Data as a solution for Advanced Threat Protection in 2013. This presentation examines how Big Data is being used for security in 2015, how this market is developing and how realistic vendor offerings are.
The three major categories of machine roles in a Hadoop deployment are Client machines, Master nodes, and Slave nodes. The Master nodes oversee the two key functional pieces that make up Hadoop: storing lots of data (HDFS), and running parallel computations on all that data (Map Reduce). The Name Node oversees and coordinates the data storage function (HDFS), while the Job Tracker oversees and coordinates the parallel processing of data using Map Reduce. Slave Nodes make up the vast majority of machines and do all the dirty work of storing the data and running the computations. Each slave runs both a Data Node and a Task Tracker daemon that communicate with and receive instructions from their master nodes. The Task Tracker daemon is a slave to the Job Tracker, and the Data Node daemon is a slave to the Name Node.
Client machines have Hadoop installed with all the cluster settings, but are neither a Master nor a Slave. Instead, the role of the Client machine is to load data into the cluster, submit Map Reduce jobs describing how that data should be processed, and then retrieve or view the results of the job when it is finished. In smaller clusters (~40 nodes) you may have a single physical server playing multiple roles, such as both Job Tracker and Name Node. With medium to large clusters you will often have each role operating on a single server machine.
In real production clusters there is no server virtualization, no hypervisor layer. That would only amount to unnecessary overhead impeding performance. Hadoop runs best on Linux machines, working directly with the underlying hardware.
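As a minimal sketch of the work a Client submits and the slave daemons carry out, the following Hadoop Streaming scripts count event types across log lines held in HDFS. The space-delimited log format and the field position are illustrative assumptions.

```python
#!/usr/bin/env python
# mapper.py -- runs on the slave nodes under the Task Tracker and
# emits (event_type, 1) for every input log line. Assumes a simple
# space-delimited format with the event type in the third field.
import sys

for line in sys.stdin:
    fields = line.strip().split()
    if len(fields) >= 3:
        print('%s\t1' % fields[2])
```

```python
#!/usr/bin/env python
# reducer.py -- sums the counts the mappers emit. Hadoop Streaming
# delivers mapper output sorted by key, so identical event types
# arrive as contiguous runs.
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip('\n').split('\t', 1)
    if key != current_key and current_key is not None:
        print('%s\t%d' % (current_key, count))
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print('%s\t%d' % (current_key, count))
```

A Client machine would submit this with something like hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /logs -output /counts (the jar path varies by distribution); the Job Tracker then farms the map and reduce tasks out to the Task Trackers.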
The deployment of Big Data for fraud detection, and in place of security information and event management (SIEM) systems, is attractive to many organisations. The overheads of managing the output of traditional SIEM and logging systems are proving too much for most IT departments, and Big Data is seen as a potential saviour. There are commercial replacements available for existing log management systems, or the technology can be deployed to provide a single data store for security event management and enrichment.
Taking the idea a step further, the challenge of detecting and preventing advanced persistent threats may be answered by Big Data-style analysis. These techniques could play a key role in helping detect threats at an early stage, using more sophisticated pattern analysis and combining and analysing multiple data sources. There is also the potential for anomaly identification using feature extraction.
Today logs are often ignored unless an incident occurs. Big Data provides the opportunity to automatically consolidate logs from multiple sources and analyse them together rather than in isolation. This could provide insight that individual logs cannot, and potentially enhance intrusion detection systems (IDS) and intrusion prevention systems (IPS) through continual adjustment, effectively learning “good” and “bad” behaviours.
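As a hedged illustration of cross-source analysis (the field names, sources and five-minute window are assumptions for the sketch, not any vendor's schema), consider correlating failed logins from an authentication log with large outbound transfers from a proxy log:

```python
# Sketch: correlating events from two log sources that would reveal
# little in isolation. Field names, sources and the five-minute
# window are illustrative assumptions.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
LARGE_UPLOAD = 10 * 1024 * 1024   # 10 MB, an arbitrary threshold

def correlate(auth_events, proxy_events):
    """Flag users whose failed logins are soon followed by large uploads."""
    failures = [(e['time'], e['user'])
                for e in auth_events if e['result'] == 'FAIL']
    return [(user, p['time'], p['bytes_out'])
            for p in proxy_events
            for t, user in failures
            if p['user'] == user
            and t <= p['time'] <= t + WINDOW
            and p['bytes_out'] > LARGE_UPLOAD]

auth = [{'time': datetime(2015, 3, 1, 9, 0), 'user': 'alice', 'result': 'FAIL'}]
proxy = [{'time': datetime(2015, 3, 1, 9, 2), 'user': 'alice',
          'bytes_out': 25 * 1024 * 1024}]
print(correlate(auth, proxy))
```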
Integrating information from physical security systems, such as building access controls and even CCTV, could also significantly enhance IDS and IPS to a point where insider attacks and social engineering are factored in to the detection process. This presents the possibility of significantly more advanced detection of fraud and criminal activities.
We know that organisational silos often reduce the effectiveness of security systems, so businesses must be aware that the potential effectiveness of Big Data-style analysis can also be diluted unless these issues are addressed.
At the very least, Big Data could result in far more practical and successful SIEM, IDS and IPS implementations.
Data collection and storage
The ability to collect information from multiple dimensions of the organisation is essential to provide visibility across the infrastructure and to ensure that there are no gaps in protection. This should include perimeter security controls such as antivirus and firewalls, all endpoints and every system connected to the network, including custom applications, embedded systems, removable media and physical access control records. For incident response and forensic purposes, all information should be encrypted, compressed, time stamped and stored in a secure archive. This will also enable the organisation to comply with the data retention requirements of the regulations and industry standards that apply to it.
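A minimal sketch of that archival step, assuming the open-source cryptography package's Fernet recipe for authenticated encryption (key management and the record format are deliberately simplified assumptions):

```python
# Sketch of archiving a log batch: compress, timestamp, encrypt.
# Uses the 'cryptography' package's Fernet recipe; in practice the
# key would come from a key store, not be generated in place.
import gzip
import json
import time
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # assumption: stands in for a managed key
fernet = Fernet(key)

def archive(records, path):
    batch = {
        'archived_at': time.time(),  # timestamp for forensic ordering
        'records': records,
    }
    compressed = gzip.compress(json.dumps(batch).encode('utf-8'))
    token = fernet.encrypt(compressed)   # authenticated encryption
    with open(path, 'wb') as f:
        f.write(token)

archive([{'src': '10.0.0.5', 'event': 'login', 'result': 'ok'}],
        'archive-20150301.bin')
```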
Big data analytics
The sheer volume of the data requires that the system is integrated, scalable and extensible, with all processes highly automated. Early SIEM and log management systems were criticised for their inability to effectively analyse all the data collected, as many sources were stored in isolation and involved too many manual processes. What is required is big data analytics capability that provides advanced data aggregation, event correlation and pattern recognition across all dimensions of the big data sets collected, using techniques that include statistical and heuristic analysis. The system must monitor continuously and in real time so that threats can be detected as they occur, and all information must be stored in a secure repository for use in forensic investigations to find the root cause of events.
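As one small example of statistical analysis over the aggregated data (the event shapes and the three-sigma threshold are illustrative assumptions), failed-login counts can be aggregated per source address and outliers flagged:

```python
# Minimal statistical sketch: aggregate failed-login counts per source
# and flag sources more than three standard deviations above the mean.
from collections import Counter
from statistics import mean, stdev

events = [
    {'src': '10.0.0.5', 'type': 'auth_fail'},
    {'src': '10.0.0.5', 'type': 'auth_fail'},
    {'src': '10.0.0.9', 'type': 'auth_fail'},
    # ... many more events from the consolidated store
]

counts = Counter(e['src'] for e in events if e['type'] == 'auth_fail')
values = list(counts.values())
if len(values) > 1:                      # stdev needs two or more samples
    mu, sigma = mean(values), stdev(values)
    outliers = {src: n for src, n in counts.items()
                if sigma > 0 and (n - mu) / sigma > 3}
    print(outliers)
```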
Behavioural analysis
The system should include integrated behavioural analytics that can automatically establish what constitutes expected and accepted behaviour for all systems, devices and users connected to the network, a process that all too often requires manual intervention in first-generation SIEM and log management systems. Accepted behaviour for those systems can then be whitelisted, so that unexpected or suspicious behaviour is flagged and alerted on and remediation steps can be taken. This also means that known good behaviour can be eliminated from any forensic review that is required.
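A hedged sketch of that baselining idea, assuming simple connection events with user and destination fields (a real platform would model many more dimensions of behaviour):

```python
# Sketch of whitelisting learned behaviour: build a baseline of the
# hosts each user normally connects to, then flag departures from it.
# The learning period and event fields are illustrative assumptions.
from collections import defaultdict

def build_baseline(history):
    """Learn the set of destination hosts seen per user."""
    baseline = defaultdict(set)
    for event in history:
        baseline[event['user']].add(event['dest'])
    return baseline

def flag_deviations(baseline, live_events):
    """Alert on connections outside each user's learned whitelist."""
    return [e for e in live_events
            if e['dest'] not in baseline.get(e['user'], set())]

history = [{'user': 'bob', 'dest': 'fileserver01'}]
live = [{'user': 'bob', 'dest': 'hr-database'}]
print(flag_deviations(build_baseline(history), live))
```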
Integrity monitoring
To counter internal threats, such as changes made to files or configurations that could introduce vulnerabilities, organisations should look for a security intelligence platform with integrated file integrity and change management capabilities. Using behavioural analytics, multiple disparate data sets can be combined to look for behavioural patterns and risk factors that indicate when and where advanced attacks have occurred, so that remediation can be taken faster and focused on the highest-priority events uncovered.
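As a minimal illustration of file integrity monitoring (the watch list is an assumption; a real platform would also capture permissions, ownership and configuration state):

```python
# Sketch of file integrity monitoring: record SHA-256 hashes of
# watched files, then report any file whose hash has changed.
import hashlib
import os

WATCHED = ['/etc/passwd', '/etc/ssh/sshd_config']   # illustrative list

def snapshot(paths):
    """Hash each watched file that exists."""
    hashes = {}
    for path in paths:
        if os.path.exists(path):
            with open(path, 'rb') as f:
                hashes[path] = hashlib.sha256(f.read()).hexdigest()
    return hashes

def changed(old, new):
    """Return the paths whose content no longer matches the baseline."""
    return [p for p in old if new.get(p) != old[p]]

baseline = snapshot(WATCHED)
# ... later, on a schedule or triggered by a file-system event ...
print(changed(baseline, snapshot(WATCHED)))
```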
Threat intelligence feeds
To turn log and event feeds into actionable security intelligence that can drive automated remediation, the system should also draw on other sources, including vulnerability data, identity and access management events, asset classification information, metadata, geolocation information and real-time threat intelligence feeds garnered from a variety of providers. Making sense of this information and its dependencies requires advanced correlation and pattern recognition capabilities that can uncover data patterns and associate them with particular users and devices.
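A small sketch of such enrichment, with an invented blocklist and asset database standing in for real threat intelligence and asset classification feeds:

```python
# Sketch of enriching events with external context: tag any event
# whose source address appears on a threat-intelligence blocklist,
# and attach the asset classification of the target. All feed
# contents here are invented for illustration.
THREAT_FEED = {'203.0.113.7': 'known C2 server'}
ASSET_DB = {'10.0.0.20': {'owner': 'finance', 'criticality': 'high'}}

def enrich(event):
    event = dict(event)                       # avoid mutating the input
    if event['src'] in THREAT_FEED:
        event['threat'] = THREAT_FEED[event['src']]
    event['asset'] = ASSET_DB.get(event['dest'], {})
    return event

raw = {'src': '203.0.113.7', 'dest': '10.0.0.20', 'type': 'conn'}
print(enrich(raw))   # enriched event, ready for prioritised alerting
```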
Real-time, continuous monitoring
In early systems, much of the information uncovered through analysis and correlation described events that had already occurred, for forensic investigation. Whilst this is still a key requirement, it is insufficient for countering the dynamic, advanced threats seen today. Rather, the threat of a breach exposing sensitive information requires that all information is analysed and correlated in real time. This is only possible if the system provides continuous, real-time protective monitoring of all activity, including network and host connections, user access events and behaviour, removable media activity, and the processes and services running on all systems connected to the network. The types of activity that should be continuously monitored in real time are shown in Appendix 1.
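As a toy illustration of the continuous-monitoring idea (a production system would consume streams from collectors across the whole estate rather than tail one file, and the rule here is a trivial stand-in):

```python
# Sketch of continuous monitoring: follow a log file as it grows and
# apply a detection rule to each new line, rather than reviewing logs
# after an incident. File name and rule are illustrative assumptions.
import time

def follow(path):
    """Yield lines appended to a file, like 'tail -f'."""
    with open(path) as f:
        f.seek(0, 2)                    # start at the end of the file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip('\n')
            else:
                time.sleep(0.5)         # wait for new activity

for line in follow('/var/log/auth.log'):
    if 'Failed password' in line:       # trivial stand-in for a rule
        print('ALERT:', line)
```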
Unified management platform
One further criticism of early SIEM and log management systems was that they were difficult to manage and use. To ease management tasks, organisations should look for a system that combines the capabilities described above into one integrated security intelligence platform, accessed through one central console that provides an intuitive user interface to wizard-driven processes. This will provide organisations with a single, consolidated view across events occurring in all parts of the network and will allow them to investigate those events in context. That console should provide access to easy-to-understand reports related to security, compliance and operational issues throughout the entire technology stack of the network.
Many businesses already use Big Data for marketing and research, yet may not have the fundamentals right, particularly from a security perspective. As with all new technologies, security seems to be an afterthought at best. Big Data breaches will be big too, with the potential for even more serious reputational damage and legal repercussions than at present.
A growing number of companies are using the technology to store and analyse petabytes of data including web logs, click stream data and social media content to gain better insights about their customers and their business.
As a result, information classification becomes even more critical; and information ownership must be addressed to facilitate any reasonable classification. Most organisations already struggle with implementing these concepts, making this a significant challenge. We will need to identify owners for the outputs of Big Data processes as well as the raw data. Thus Data Ownership will be distinct from Information Ownership, perhaps with IT owning the raw data and business units taking responsibility for the outputs.
Very few organisations are likely to build a Big Data environment in-house, so cloud and Big Data will be inextricably linked. As many businesses are aware, storing data in the cloud does not remove their responsibility for protecting it, from both a regulatory and a commercial perspective.
Techniques such as Attribute Based Encryption may be necessary to protect sensitive data and apply access controls that travel as attributes of the data itself, rather than of the environment in which it is stored. Many of these concepts are foreign to businesses today.
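As a hedged sketch of how Attribute Based Encryption binds access policy to the data itself, the following uses the open-source charm-crypto library's implementation of the Bethencourt-Sahai-Waters ciphertext-policy scheme; the attribute names and policy string are illustrative assumptions:

```python
# Sketch of ciphertext-policy ABE with charm-crypto's BSW07 scheme.
# Attributes and the policy string are invented for illustration.
from charm.toolbox.pairinggroup import PairingGroup, GT
from charm.schemes.abenc.abenc_bsw07 import CPabe_BSW07

group = PairingGroup('SS512')
cpabe = CPabe_BSW07(group)
(pk, mk) = cpabe.setup()

# A key is issued for a user's attributes...
sk = cpabe.keygen(pk, mk, ['FINANCE', 'ANALYST'])

# ...and data is encrypted under a policy over attributes, so the
# access control travels with the data rather than with the store.
msg = group.random(GT)                 # in practice, a session key
ct = cpabe.encrypt(pk, msg, '(FINANCE and ANALYST)')

assert cpabe.decrypt(pk, sk, ct) == msg
```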