5. Big Data
Four V’s of Big Data
• Volume
– Data Quantity
• Velocity
– Data Speed
• Variety
– Data Types
• Veracity
– Messiness
9. Data Science
• Data science is an interdisciplinary field about
processes and systems to extract knowledge
or insights from data in various forms, either
structured or unstructured.
• “Data Scientist (n.): Person who is better at
statistics than any software engineer and
better at software engineering than any
statistician.” -- Josh Wills, Cloudera.
11. Machine Learning
• “Machine learning systems automatically
learn programs from data” *
• You don’t really code the program, but it is
inferred from data.
• The intuition is to mimic the way the brain
learns, which is where terms like artificial
intelligence come from.
* P. Domingos, “A Few Useful Things to Know about Machine Learning,” CACM 55(10), 2012
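To make "inferred from data" concrete, here is a minimal sketch in Python. Instead of hard-coding a rule such as "more than 10 failed logins is suspicious", the cutoff is chosen from labeled examples. The function names, the failed-login feature, and the training data are all hypothetical, and a single learned threshold is about the simplest model there is; it is an illustration, not a recommended detector.

```python
def learn_threshold(samples):
    """Pick the cutoff that misclassifies the fewest labeled samples.

    samples: list of (value, is_malicious) pairs with at least two
    distinct values (otherwise there is no candidate cutoff to choose).
    """
    values = sorted({v for v, _ in samples})
    # Candidate cutoffs: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cutoff):
        # A sample is predicted malicious when its value exceeds the cutoff.
        return sum((v > cutoff) != label for v, label in samples)

    return min(candidates, key=errors)


# Hypothetical training data: failed logins per hour, labeled by an analyst.
training = [(1, False), (2, False), (3, False), (4, False),
            (20, True), (25, True), (40, True)]
cutoff = learn_threshold(training)


def is_suspicious(failed_logins):
    """The 'program' whose only parameter was learned from the data."""
    return failed_logins > cutoff
```

Changing the training data changes the behavior of is_suspicious without anyone editing the rule by hand.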
14. Security Applications of Machine Learning
• Fraud detection systems
– Is what the user just did consistent with past behavior?
• Network anomaly detection
– More like statistical analysis.
• Predicting likelihood of attack actors
– Create different predictive models and chain them
to gain more confidence in each step.
• Spam filters
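The fraud-detection question ("is this consistent with past behavior?") can be sketched as a simple statistical check. The example below assumes a z-score test against a user's own history; the function name, the 3-standard-deviation limit, and the purchase amounts are illustrative assumptions, and real systems use far richer features and models.

```python
from statistics import mean, stdev


def is_consistent(history, new_value, max_z=3.0):
    """Return True if new_value lies within max_z standard deviations
    of the historical mean. history needs at least two data points."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # No variation in the past: only the exact same value is consistent.
        return new_value == mu
    return abs(new_value - mu) / sigma <= max_z


# Hypothetical past purchase amounts for one account.
past_purchases = [42.0, 38.5, 51.0, 45.2, 40.3, 47.8]

typical = is_consistent(past_purchases, 44.0)
outlier = is_consistent(past_purchases, 900.0)
```

Here a purchase of 44.0 is close to the account's usual spending, while 900.0 is far outside it and would be flagged for review.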
16. Machine Learning in InfoSec
• SIEM and log monitoring tools are just vertical
business intelligence (BI) applications (from the ’90s)
• How many logs do you think there are in your
organization?
17. Kinds of Network Security Monitoring
• Alert-based:
• “Traditional” log management
• SIEM
• Using “Threat Intelligence”
(i.e., blacklists)
• Lack of context
• Low effectiveness
• You get the results handed
over to you
• Exploration-based:
• Network Forensics tools
• Elasticsearch-based log management (LM)
systems
• High effectiveness
• Lots of highly trained people
necessary
• Big Data Security Analytics:
• Run exploration-based monitoring on Hadoop
• More like Big Data Security Monitoring (BDSM)
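As a rough illustration of the alert-based approach, the sketch below matches event source addresses against a blacklist. The blacklist entries, event format, and function name are hypothetical (the addresses come from documentation-only ranges). Note what the output lacks: an alert only says that an address matched, with no context about why it matters, which is the weakness listed above.

```python
import ipaddress

# Hypothetical blacklist; the addresses are from documentation-only ranges.
BLACKLIST = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def alerts(events):
    """Yield every event whose source address is inside a blacklisted network."""
    for event in events:
        src = ipaddress.ip_address(event["src"])
        if any(src in net for net in BLACKLIST):
            yield event


# Hypothetical parsed log events.
events = [
    {"src": "192.0.2.10", "action": "login"},
    {"src": "203.0.113.45", "action": "login"},
    {"src": "198.51.100.7", "action": "dns_query"},
]
flagged = list(alerts(events))
```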
18. MLSec Project
• Sign up, send logs, receive reports generated
by machine learning models!
• Working with several companies on trying out
these models on their environment with their
data
• Visit https://www.mlsecproject.org
19. How do I get started on this?
• Programming is a must (Python / R)
• Statistical knowledge keeps you from making
dumb mistakes
• Specific machine learning courses and books:
– Coursera (ML / Data Analysis / Data Science)
• Practice, Practice, Practice:
– Explore your data
– Security Onion
– Kaggle
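A first step in "explore your data" can be as small as counting things. The sketch below assumes a made-up log format of "timestamp source_ip action" separated by whitespace; the log lines and the helper name are hypothetical, so adapt the field positions to whatever your logs actually look like.

```python
from collections import Counter

# Hypothetical log lines in a made-up "timestamp source_ip action" format.
log_lines = [
    "2015-01-01T10:00:00 192.0.2.10 login_failed",
    "2015-01-01T10:00:05 192.0.2.10 login_failed",
    "2015-01-01T10:01:00 198.51.100.7 login_ok",
    "2015-01-01T10:02:00 192.0.2.10 login_ok",
]


def count_by_field(lines, index):
    """Count occurrences of the whitespace-separated field at position index."""
    return Counter(line.split()[index] for line in lines if line.strip())


by_source = count_by_field(log_lines, 1)
by_action = count_by_field(log_lines, 2)
busiest = by_source.most_common(1)
```

Simple counts like these are often enough to spot the noisiest sources before any modeling is attempted.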
20. Thank You
Most of the information is taken from
http://www.slideshare.net/AlexandrePinto10/