SlideShare a Scribd company logo
1 of 11
Large-scale Log Analysis
David Stubley, CEO 7 Elements
Intro
Obligatory Definition
"Big data is a term for data sets that are
so large or complex that traditional
data processing applications are
inadequate."
https://en.wikipedia.org/wiki/Big_data
Known Unknowns
redirect:$%7B%23a%3d(new%20java.lang.ProcessBuilder(new%20
java.lang.String%5B%5D%20%7B'whoami'%7D)).start(),%23b%3d%
23a.getInputStream(),%23c%3dnew%20java.io.InputStreamReade
r%20(%23b),%23d%3dnew%20java.io.BufferedReader(%23c),%23e%
3dnew%20char%5B50000%5D,%23d.read(%23e),%23matt%3d%20%23co
ntext.get('com.opensymphony.xwork2.dispatcher.HttpServletR
esponse'),%23matt.getWriter().println%20(%23e),%23matt.get
Writer().flush(),%23matt.getWriter().close()%7D=
Size Matters!
"Big data is a term for data
sets that are so large or
complex that traditional
data processing applications
are inadequate."
A New Hope?
"Big data is a term for data
sets that are so large or
complex that traditional
data processing
applications are
inadequate."
The Result?
Word of Caution
HAL 9000: "I'm sorry Dave, I'm afraid I can't do that"
Questions
?
www.7elements.co.uk

More Related Content

Viewers also liked

Viewers also liked (6)

The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Security and Audit for Big Data
Security and Audit for Big DataSecurity and Audit for Big Data
Security and Audit for Big Data
 
End-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service DeploymentEnd-to-End Security and Auditing in a Big Data as a Service Deployment
End-to-End Security and Auditing in a Big Data as a Service Deployment
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Big Data Security and Governance
Big Data Security and GovernanceBig Data Security and Governance
Big Data Security and Governance
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 

Similar to Big Data and Cyber Security

MicroManager_MATLAB_Implementation
MicroManager_MATLAB_ImplementationMicroManager_MATLAB_Implementation
MicroManager_MATLAB_Implementation
Philip Mohun
 
Get Back in Control of Your SQL with jOOQ at #Java2Days
Get Back in Control of Your SQL with jOOQ at #Java2DaysGet Back in Control of Your SQL with jOOQ at #Java2Days
Get Back in Control of Your SQL with jOOQ at #Java2Days
Lukas Eder
 

Similar to Big Data and Cyber Security (20)

Advance MySQL Docstore Features
Advance MySQL Docstore FeaturesAdvance MySQL Docstore Features
Advance MySQL Docstore Features
 
Mres presentation
Mres presentationMres presentation
Mres presentation
 
MicroManager_MATLAB_Implementation
MicroManager_MATLAB_ImplementationMicroManager_MATLAB_Implementation
MicroManager_MATLAB_Implementation
 
Java one 2010
Java one 2010Java one 2010
Java one 2010
 
Automatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report ReclassificationAutomatic Fine-Grained Issue Report Reclassification
Automatic Fine-Grained Issue Report Reclassification
 
sun solaris
sun solarissun solaris
sun solaris
 
Vaadin & Web Components
Vaadin & Web ComponentsVaadin & Web Components
Vaadin & Web Components
 
Whatever it takes - Fixing SQLIA and XSS in the process
Whatever it takes - Fixing SQLIA and XSS in the processWhatever it takes - Fixing SQLIA and XSS in the process
Whatever it takes - Fixing SQLIA and XSS in the process
 
HATEOAS 101 - Opinionated Introduction to a REST API Style
HATEOAS 101 - Opinionated Introduction to a REST API StyleHATEOAS 101 - Opinionated Introduction to a REST API Style
HATEOAS 101 - Opinionated Introduction to a REST API Style
 
Citicorp Presentation
Citicorp PresentationCiticorp Presentation
Citicorp Presentation
 
Dropwizard Introduction
Dropwizard IntroductionDropwizard Introduction
Dropwizard Introduction
 
Vaadin 7 CN
Vaadin 7 CNVaadin 7 CN
Vaadin 7 CN
 
Agile Data Science 2.0 - Big Data Science Meetup
Agile Data Science 2.0 - Big Data Science MeetupAgile Data Science 2.0 - Big Data Science Meetup
Agile Data Science 2.0 - Big Data Science Meetup
 
Get Back in Control of your SQL
Get Back in Control of your SQLGet Back in Control of your SQL
Get Back in Control of your SQL
 
Get Back in Control of Your SQL with jOOQ at #Java2Days
Get Back in Control of Your SQL with jOOQ at #Java2DaysGet Back in Control of Your SQL with jOOQ at #Java2Days
Get Back in Control of Your SQL with jOOQ at #Java2Days
 
20190627 j hipster-conf- diary of a java dev lost in the .net world
20190627   j hipster-conf- diary of a java dev lost in the .net world20190627   j hipster-conf- diary of a java dev lost in the .net world
20190627 j hipster-conf- diary of a java dev lost in the .net world
 
手把手教你如何串接 Log 到各種網路服務
手把手教你如何串接 Log 到各種網路服務手把手教你如何串接 Log 到各種網路服務
手把手教你如何串接 Log 到各種網路服務
 
Css2011 Ruo Ando
Css2011 Ruo AndoCss2011 Ruo Ando
Css2011 Ruo Ando
 
ESRI Dev Meetup: Building Distributed JavaScript Map Widgets
ESRI Dev Meetup: Building Distributed JavaScript Map WidgetsESRI Dev Meetup: Building Distributed JavaScript Map Widgets
ESRI Dev Meetup: Building Distributed JavaScript Map Widgets
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013
 

More from Napier University

More from Napier University (20)

Intrusion Detection Systems
Intrusion Detection SystemsIntrusion Detection Systems
Intrusion Detection Systems
 
Networks
NetworksNetworks
Networks
 
Memory, Big Data and SIEM
Memory, Big Data and SIEMMemory, Big Data and SIEM
Memory, Big Data and SIEM
 
What is Cyber Data?
What is Cyber Data?What is Cyber Data?
What is Cyber Data?
 
Open Source Intelligence
Open Source IntelligenceOpen Source Intelligence
Open Source Intelligence
 
10. Data to Information: NumPy and Pandas
10. Data to Information: NumPy and Pandas10. Data to Information: NumPy and Pandas
10. Data to Information: NumPy and Pandas
 
2. Defence Systems
2. Defence Systems2. Defence Systems
2. Defence Systems
 
1. Cyber and Intelligence
1. Cyber and Intelligence1. Cyber and Intelligence
1. Cyber and Intelligence
 
The Road Ahead for Ripple, Marjan Delatinne
The Road Ahead for Ripple, Marjan DelatinneThe Road Ahead for Ripple, Marjan Delatinne
The Road Ahead for Ripple, Marjan Delatinne
 
Delivering The Tel Aviv Stock Exchange Securities, Duncan Johnston-Watt
 Delivering The Tel Aviv Stock Exchange Securities, Duncan Johnston-Watt Delivering The Tel Aviv Stock Exchange Securities, Duncan Johnston-Watt
Delivering The Tel Aviv Stock Exchange Securities, Duncan Johnston-Watt
 
ARTiFACTS, Emma Boswood
ARTiFACTS, Emma BoswoodARTiFACTS, Emma Boswood
ARTiFACTS, Emma Boswood
 
RMIT Blockchain Innovation Hub, Chris Berg
RMIT Blockchain Innovation Hub, Chris BergRMIT Blockchain Innovation Hub, Chris Berg
RMIT Blockchain Innovation Hub, Chris Berg
 
Keynote, Naseem Naqvi
Keynote, Naseem Naqvi Keynote, Naseem Naqvi
Keynote, Naseem Naqvi
 
Browser-based Crypto M, C. F Mondschein
Browser-based Crypto M, C. F MondscheinBrowser-based Crypto M, C. F Mondschein
Browser-based Crypto M, C. F Mondschein
 
Should we transform or adapt to blockchain - a public sector perspective?, Al...
Should we transform or adapt to blockchain - a public sector perspective?, Al...Should we transform or adapt to blockchain - a public sector perspective?, Al...
Should we transform or adapt to blockchain - a public sector perspective?, Al...
 
IoT device attestation system using blockchain, Alistair Duke
IoT device attestation system using blockchain, Alistair DukeIoT device attestation system using blockchain, Alistair Duke
IoT device attestation system using blockchain, Alistair Duke
 
Robust Programming of Smart Contracts in Solidity+, RK Shyamasundar
Robust Programming of Smart Contracts in Solidity+, RK ShyamasundarRobust Programming of Smart Contracts in Solidity+, RK Shyamasundar
Robust Programming of Smart Contracts in Solidity+, RK Shyamasundar
 
Using Blockchain for Evidence Purpose, Rafael Prabucki
Using Blockchain for Evidence Purpose, Rafael PrabuckiUsing Blockchain for Evidence Purpose, Rafael Prabucki
Using Blockchain for Evidence Purpose, Rafael Prabucki
 
Cryptocurrencies and cyberlaundering- the need for regulation, Gian Marco Bov...
Cryptocurrencies and cyberlaundering- the need for regulation, Gian Marco Bov...Cryptocurrencies and cyberlaundering- the need for regulation, Gian Marco Bov...
Cryptocurrencies and cyberlaundering- the need for regulation, Gian Marco Bov...
 
Emerging Regulatory Approaches to Blockchain-based Token Economy, Agata Fereirra
Emerging Regulatory Approaches to Blockchain-based Token Economy, Agata FereirraEmerging Regulatory Approaches to Blockchain-based Token Economy, Agata Fereirra
Emerging Regulatory Approaches to Blockchain-based Token Economy, Agata Fereirra
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Big Data and Cyber Security

Editor's Notes

  1. Big data – with spheres containing pictures of the main themes. Just as an intro to what we are going to cover and that today our talk on big data is brought to you by Leo Tolstoy, Donald Rumsfeld, 2001 A space odyssey, star wars and the Eiffel tower. Fast paced, non-sales led overview of how we approach large-scale log analysis at 7E as part of incident response.
  2. Being a conference on Big Data and as my talk is at least the sixth of the day, I thought we may need to actually define what big data is! For me, the key points are highlighted in bold, we are talking about large and or complex data sets where traditional approaches to data processing are no longer adequate. In this talk we are going to cover how large-scale log analysis can quickly generate data sets that are so large that current approaches are inadequate and explore how a blended approach of technology and human problem solving is required to find the needle in the haystack. What we are not talking about is replacing people with algorithms / machine learning or magic. Also a key point is that big data is not a panacea for all ills within the security world, but when leveraged in the right way with skilled technical resource, the use of big data will enable our community to make sense of the ever evolving and complicated threat landscape that we now face.
  3. So, this talk is based on real world experience of dealing with large amounts of log data. For context, we are going to use an incident that the 7E team worked on that involved a new exploit with no known attack signatures. So in the words of Donald Rumsfeld, “because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know. If it was a known know, then we could expect IPS / WAF platforms to provide protection and alerting and thus analysis of those alerts. However, in the event of a breach that doesn’t match known signatures, then we are moving in to the realm of known unknowns and having to search available logs for evidence of malicious activity. I’m not even going to get in to a discussion about unknown unknowns….. So for now, lets get back to logs, specifically apache log files and lots of them.
  4. What do we actually mean when we talk about large and complex logs? Incident engagement, enterprise grade three tier architecture model, split over multiple load balancers serving traffic to eight application servers and supporting backend databases. Compromise of the main application layer. As part of the incident response we obtained 3TB of log files that covered the previous three months of log history for the eight servers. All logs split by host and further split through log rotation. Today, 3TB of data doesn’t sound that much, especially as you can now buy 3TB external drives for £70. So what is 3TB of log data in real terms? Total number of words = three hundred thirty nine billion, three hundred eighty eight million, three hundred thirteen thousand, four hundred twelve. Total number of lines = twenty six billion, eight hundred million, five hundred ninety five thousand, nine hundred twenty seven. Enough words to fill five hundred seventy seven thousand, eight hundred ninety one copies of Leo Tolstoy’s War and Peace. Which if you stacked those books would be be the height of 117 Eifel Towers.
  5. Clearly that amount of data was going to require more than notepad or a spreadsheet! And as any discerning Jedi knows, you don’t just rely on technology – the failure of the death star is a prime case in point. We need a blend of human skills supported by technology. So we built an easily extensible python controller using a RESTful JSON-based API to take advantage of Elasticsearch and Logstash. Combined with Kibana to provide a graphical interface for manual interrogation of log data.
  6. Key to large scale log analysis: Ability to extract out ‘known knowns’. This is done through verified attack signatures. Automatically generate evidence files. Access to raw logs for manual analysis and verification. Both command line and visual through kabana. Ability to update signatures with new attack patterns and repeat analysis. Timelines, suspect IPs. All traffic from a suspect IP is extracted for further targeted analysis. Human is in full control over the analysis, this includes choice of signatures.
  7. Finding new attacks that automated tools along would not have found! In our reference case: Identification of an exploit kit being used to compromise enterprise Established the timeline of the attack, occurred further back than initially thought Identification of compromised hosts and backdoor scripts in use by the attackers, allowing for wider internal search of systems for evidence of compromise. As a second example: Manual analysis of 1m lines of log data for evidence of compromise through a new exploit. Manually the effort took around four man days of effort. Using the platform, 2.5 seconds to identify the nine relevant entries within the logs showing a successful compromise of the server.
  8. Positioning slide to talk about the need to be able to drive tools and interrogate the data. Do not be driven by the output of the tool! Also talk about being able to do critical analysis – “So what” Use example of analyst finding shellshock. Skill set of front line needs to increase, not just restricted to research.