Practical Machine Learning in Infosec
who are you?
whoami?
clarence chio (@cchio)
https://www.meetup.com/Data-Mining-for-Cyber-Security/
https://www.youtube.com/watch?v=JAGDpJFFM2A
whoami?
anto joseph (@antojosep007)
Agenda
Python toolkits
● Scikit-Learn - Python library that implements a range of machine learning algorithms and helpers
● TensorFlow - library for numerical computation using data flow graphs / deep learning
scikit-learn
● easy-to-use, general-purpose toolbox for machine learning in Python.
● supervised and unsupervised machine learning techniques.
● Utilities for common tasks such as model selection, feature extraction, and feature selection
● Built on NumPy, SciPy, and matplotlib
● Open source, commercially usable - BSD license
TensorFlow
● Open source
● By Google
● used for both research and production
● Used widely for deep learning/neural nets
○ But not restricted to just deep models
● Multiple GPU Support
Data science libs
Basic terms
Classifier
"An algorithm that implements classification, especially in a concrete
implementation, is known as a classifier. The term "classifier" sometimes
also refers to the mathematical function, implemented by a classification
algorithm, that maps input data to a category."
Model
Linear regression is a technique for fitting points to a line y = mx + c.
After fitting you get, for example, y = 10x + 4. This is a model. A model is
something that produces an output when you give it an input. In ML, any
'object' created by training an ML algorithm is a model.
Linear Regression
Fitting a linear relationship between two quantitative variables
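A minimal sketch of the idea, using only the standard library. The training points are hypothetical and are generated from the y = 10x + 4 example above; `fit_line` and `model` are illustrative names, not part of the workshop notebooks. scikit-learn offers the same fit through `sklearn.linear_model.LinearRegression`.

```python
# Ordinary least squares for a single input variable, standard library only.
def fit_line(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    c = mean_y - m * mean_x
    return m, c

# Hypothetical training points that lie exactly on y = 10x + 4.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [10 * x + 4 for x in xs]

m, c = fit_line(xs, ys)

def model(x):
    """The trained 'model': give it an input, it returns an output."""
    return m * x + c
```

Here `model` is the object left over after training: it no longer needs the training data, only the two fitted numbers.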
Cross validation
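The slide does not say which scheme is meant, so the sketch below assumes k-fold cross validation: the data is split into k folds and each fold is held out once for testing while the rest is used for training. `k_fold_indices` is an illustrative helper; in practice the data is usually shuffled first, and scikit-learn ships an equivalent in `sklearn.model_selection.KFold`.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 10 samples, 3 folds: every sample lands in exactly one test fold.
folds = list(k_fold_indices(n_samples=10, k=3))
```

A model would be trained and scored once per fold, and the scores averaged.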
Confusion matrix
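A small sketch for the binary case, assuming 1 = malicious and 0 = benign. The labels below are made up for illustration, and `confusion_counts` is an illustrative helper (scikit-learn has `sklearn.metrics.confusion_matrix`).

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_counts(y_true, y_pred)
precision = cm["tp"] / (cm["tp"] + cm["fp"])  # of the alerts raised, how many were real
recall = cm["tp"] / (cm["tp"] + cm["fn"])     # of the real attacks, how many were caught
```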
MACHINE LEARNING 101
Types of machine learning use cases:
● Regression
● Classification
● Anomaly detection
● Recommendation
won’t cover here, but check out this talk
This covers EVERYTHING (almost).
supervised
unsupervised
Regression
● regression = finding relationships between variables
[Diagram: training data -> regression learning algorithm -> regression model/function; example: size of population -> profit]
Linear Regression
[Figure: 2d linear regression with regression line]
Polynomial Regression
[Figure: polynomial regression with regression line]
Model optimization - Gradient descent
[Figures: one run labeled "success", one run labeled "failure"]
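The original figures are not recoverable, so the sketch below assumes one common reading of "success" and "failure": the same gradient descent loop converges with a small learning rate and diverges with one that is too large. The function f(x) = x^2 and both learning rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimize f(x) = x**2, whose gradient is 2*x; the minimum is at x = 0."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x  # step against the gradient
    return x

converged = gradient_descent(learning_rate=0.1)  # each step multiplies x by 0.8
diverged = gradient_descent(learning_rate=1.1)   # each step multiplies x by -1.2
```

With the small rate, x shrinks toward the minimum; with the large rate, every step overshoots and the distance from the minimum grows.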
Hands-on
Linear/Logistic Regression
● /home/ml/Desktop/intro/00-linear-regression.ipynb
● /home/ml/Desktop/intro/01-logistic-regression.ipynb
Anomaly Detection
● Outliers vs. novelties
○ novelties: unobserved pattern in new observations not included in training data
● Simple statistics/forecasting methods
○ Exponential smoothing, Holt-Winters algorithm
● Machine learning methods
○ Elliptical envelope, density-based, clustering, SVM
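As a sketch of the simple statistics route, the code below applies simple exponential smoothing and flags any point that lands far from the smoothed forecast. The series, `alpha`, and `threshold` are hypothetical values picked for illustration; Holt-Winters extends the same idea with trend and seasonality terms, which this sketch leaves out.

```python
def exponential_smoothing_anomalies(series, alpha=0.5, threshold=10.0):
    """Return indices whose value deviates from the smoothed forecast by more than threshold."""
    anomalies = []
    forecast = series[0]  # initialize the forecast with the first observation
    for i in range(1, len(series)):
        value = series[i]
        if abs(value - forecast) > threshold:
            anomalies.append(i)
        else:
            # Only update with normal points so an outlier does not skew the forecast.
            forecast = alpha * value + (1 - alpha) * forecast
    return anomalies

# Hypothetical requests-per-minute series with one spike at index 5.
series = [20, 21, 19, 22, 20, 95, 21, 20]
anomalies = exponential_smoothing_anomalies(series)
```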
Hands-on
Anomaly Detection
● /home/ml/Desktop/intro/02-anomaly-detection.ipynb
● /home/ml/Desktop/anomaly/anomaly-detection-eg.ipynb
Classification
labeled data - do you have it?
● yes! lots! - supervised learning
● no :( - unsupervised learning
● only a little bit - (semi-supervised learning)
Supervised classification
● Many different algorithms!
● We will go through four:
○ Naive Bayes
○ K-nearest neighbors
○ Support Vector Machines
○ Decision Trees
Bayes Theorem
the probability of an event happening is based on prior knowledge of conditions that might be related to the event
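A worked example of the rule P(A|B) = P(B|A) * P(A) / P(B), framed as spam filtering. All probabilities below are invented for illustration, and `posterior` is an illustrative helper name.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) expanded by total probability."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam; the word "prize" appears
# in 40% of spam messages and in 1% of legitimate messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.4, likelihood_given_not=0.01)
```

Under these assumed numbers, seeing "prize" raises the probability of spam from 0.2 to roughly 0.91.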
Naive Bayes classifier
● SUPERVISED LEARNING
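A toy word-count Naive Bayes classifier, written with the standard library only. The four training messages are made up, add-one (Laplace) smoothing is an assumption of this sketch, and the function names are illustrative. It is not the workshop's notebook code; scikit-learn provides a production version in `sklearn.naive_bayes.MultinomialNB`.

```python
import math
from collections import Counter

def train_naive_bayes(documents):
    """documents: list of (label, text). Returns class counts, word counts, vocabulary."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for label, text in documents:
        class_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(text, class_counts, word_counts, vocabulary):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled messages.
training = [
    ("spam", "win a free prize now"),
    ("spam", "free money click now"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch at noon tomorrow"),
]
model = train_naive_bayes(training)
```

The "naive" part is the assumption that words are independent given the class, which is what lets the per-word log likelihoods simply be added.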
classifying spam
HANDS ON
The dataset: 2007 TREC Public Spam Corpus
http://plg.uwaterloo.ca/~gvcormac/treccorpus07/
Multiclass classification
2 ways to do it:
● 1-vs-rest
○ 1 binary classifier per class
● 1-vs-1
○ 1 binary classifier per pair of classes
○ K*(K−1)/2 classifiers for a K-class problem
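The two counts can be checked by enumerating the binary problems each strategy creates. The class labels below are hypothetical.

```python
from itertools import combinations

classes = ["normal", "dos", "probe", "r2l"]  # hypothetical class labels
k = len(classes)

# 1-vs-rest: one binary problem per class (this class against everything else).
one_vs_rest = [(c, "rest") for c in classes]

# 1-vs-1: one binary problem per unordered pair of classes.
one_vs_one = list(combinations(classes, 2))
```

For K = 4 this gives 4 classifiers for 1-vs-rest and 4*3/2 = 6 for 1-vs-1.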
K-Nearest Neighbors classifier (kNN)
● SUPERVISED LEARNING
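A minimal kNN sketch: find the k training points closest to the query and take a majority vote over their labels. The 2-D points and labels are hypothetical, Euclidean distance is an assumption of this sketch, and `math.dist` needs Python 3.8 or later. scikit-learn's version is `sklearn.neighbors.KNeighborsClassifier`.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """training: list of (features, label). Majority vote among the k closest points."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors with labels.
training = [
    ((1.0, 1.0), "benign"), ((1.2, 0.8), "benign"), ((0.9, 1.1), "benign"),
    ((5.0, 5.0), "malicious"), ((5.2, 4.9), "malicious"), ((4.8, 5.1), "malicious"),
]
```

There is no training step beyond storing the labeled points; all the work happens at prediction time.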
Support Vector Machines (SVM)
● /home/ml/Desktop/intro/03-svm.ipynb
Decision Tree classifier
● SUPERVISED LEARNING
visualization
Unsupervised classification
● Mainly refers to clustering
● Four types:
○ Centroid: K-Means
○ Distribution: Gaussian mixture models
○ Density: DBSCAN
○ Connectivity: Hierarchical clustering
K-Means clustering
● UNSUPERVISED LEARNING
● /home/ml/Desktop/intro/04-kmeans-pca.ipynb
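A compact sketch of the usual K-Means loop (Lloyd's algorithm), separate from the notebook above. The points and the fixed starting centroids are hypothetical; real implementations such as `sklearn.cluster.KMeans` pick starting centroids more carefully and stop when assignments no longer change.

```python
import math

def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Two hypothetical, well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

No labels are used anywhere: the grouping comes only from distances between points.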
SO MANY ALGORITHMS.
HOW TO PICK?
XSS detection
HANDS ON
● /home/ml/Desktop/waf
classifying packets
HANDS ON
● /home/ml/Desktop/network/kdd-packet-classification.ipynb
dataset
classifying malware (with static PE features)
HANDS ON
● /home/ml/Desktop/malware
Portable executable (PE)
----------Parsing Warnings----------
Suspicious NumberOfRvaAndSizes in the Optional Header. Normal values are never larger than 0x10, the value is: 0xdfffddde
Error parsing section 2. SizeOfRawData is larger than file.
----------DOS_HEADER----------
[IMAGE_DOS_HEADER]
e_magic: 0x5A4D
e_cblp: 0x50
e_cp: 0x2
----------NT_HEADERS----------
[IMAGE_NT_HEADERS]
Signature: 0x4550
----------FILE_HEADER----------
[IMAGE_FILE_HEADER]
Machine: 0x14C
NumberOfSections: 0x4
TimeDateStamp: 0x851C3163 [INVALID TIME]
PointerToSymbolTable: 0x74726144
NumberOfSymbols: 0x455068
SizeOfOptionalHeader: 0xE0
Characteristics: 0x818F
----------OPTIONAL_HEADER----------
[IMAGE_OPTIONAL_HEADER]
Magic: 0x10B
MajorLinkerVersion: 0x2
MinorLinkerVersion: 0x19
SizeOfCode: 0x200
SizeOfInitializedData: 0x45400
SizeOfUninitializedData: 0x0
AddressOfEntryPoint: 0x2000
BaseOfCode: 0x1000
BaseOfData: 0x2000
ImageBase: 0xDE0000
SectionAlignment: 0x1000
FileAlignment: 0x1000
MajorOperatingSystemVersion: 0x1
MinorOperatingSystemVersion: 0x0
----------PE Sections----------
[IMAGE_SECTION_HEADER]
Name: CODE
Misc: 0x1000
Misc_PhysicalAddress: 0x1000
Misc_VirtualSize: 0x1000
VirtualAddress: 0x1000
SizeOfRawData: 0x1000
PointerToRawData: 0x1000
PointerToRelocations: 0x0
PointerToLinenumbers: 0x0
NumberOfRelocations: 0x0
NumberOfLinenumbers: 0x0
Characteristics: 0xE0000020
Flags: MEM_WRITE, CNT_CODE, MEM_EXECUTE, MEM_READ
Entropy: 0.061089 (Min=0.0, Max=8.0)
[IMAGE_SECTION_HEADER]
Name: DATA
Misc: 0x45000
Misc_PhysicalAddress: 0x45000
Misc_VirtualSize: 0x45000
VirtualAddress: 0x2000
SizeOfRawData: 0x45000
PointerToRawData: 0x2000
PointerToRelocations: 0x0
PointerToLinenumbers: 0x0
NumberOfRelocations: 0x0
NumberOfLinenumbers: 0x0
Characteristics: 0xC0000040
Flags: MEM_WRITE, CNT_INITIALIZED_DATA, MEM_READ
Entropy: 7.980693 (Min=0.0, Max=8.0)
[IMAGE_SECTION_HEADER]
Name: NicolasB
Misc: 0x1000
Misc_PhysicalAddress: 0x1000
Misc_VirtualSize: 0x1000
VirtualAddress: 0x47000
SizeOfRawData: 0xEFEFADFF
PointerToRawData: 0x47000
PointerToRelocations: 0x0
PointerToLinenumbers: 0x0
...
pefile dump
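The dump reports an entropy value per section on a scale of Min=0.0 to Max=8.0, which is consistent with Shannon entropy measured in bits per byte. The sketch below computes that quantity for a byte string; it is an illustration of the measure, not pefile's own code, and the sample inputs are made up. Values near 8.0, like the 7.980693 of the DATA section, often point to packed or encrypted content, while values near 0.0, like the 0.061089 of CODE, mean the section is almost entirely one repeated byte.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())

padding = bytes(4096)             # all zero bytes, like an empty section
uniform = bytes(range(256)) * 16  # every byte value appears equally often
```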
PE feature vector
Name|md5|Machine|SizeOfOptionalHeader|Characteristics|MajorLinkerVersion|MinorLinkerVersion|SizeOfCode|SizeOfInitializedData|SizeOfUninitializedData|AddressOfEntryPoint|BaseOfCode|BaseOfData|ImageBase|SectionAlignment|FileAlignment|MajorOperatingSystemVersion|MinorOperatingSystemVersion|MajorImageVersion|MinorImageVersion|MajorSubsystemVersion|MinorSubsystemVersion|SizeOfImage|SizeOfHeaders|CheckSum|Subsystem|DllCharacteristics|SizeOfStackReserve|SizeOfStackCommit|SizeOfHeapReserve|SizeOfHeapCommit|LoaderFlags|NumberOfRvaAndSizes|SectionsNb|SectionsMeanEntropy|SectionsMinEntropy|SectionsMaxEntropy|SectionsMeanRawsize|SectionsMinRawsize|SectionMaxRawsize|SectionsMeanVirtualsize|SectionsMinVirtualsize|SectionMaxVirtualsize|ImportsNbDLL|ImportsNb|ImportsNbOrdinal|ExportNb|ResourcesNb|ResourcesMeanEntropy|ResourcesMinEntropy|ResourcesMaxEntropy|ResourcesMeanSize|ResourcesMinSize|ResourcesMaxSize|LoadConfigurationSize|VersionInformationSize|legitimate
legitimate:
memtest.exe|631ea355665f28d4707448e442fbf5b8|332|224|258|9|0|361984|115712|0|6135|4096|372736|4194304|4096|512|0|0|0|0|1|0|1036288|1024|485887|16|1024|1048576|4096|1048576|4096|0|16|8|5.7668065537|3.60742957555|7.22105072892|59712.0|1024|325120|126875.875|896|551848|0|0|0|0|4|3.26282271103|2.56884382364|3.53793936419|8797.0|216|18032|0|16|1
malware:
VirusShare_76c2574c22b44f69e3ed519d36bd8dff|76c2574c22b44f69e3ed519d36bd8dff|332|224|258|10|0|28672|445952|16896|14819|4096|32768|4194304|4096|512|5|0|6|0|5|0|3977216|1024|680384|2|34112|1048576|4096|1048576|4096|0|16|6|2.65064184009|0.0|6.49788465186|30634.6666667|0|139264|661773.333333|3978|3362816|8|172|1|0|21|3.42072662405|1.86523352037|7.9688495098|6558.42857143|180|67624|0|0|0
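A sketch of how one pipe-delimited record could be turned into a numeric feature list plus a label. To stay readable, the header and row are truncated to six columns; the values are the first five fields and the final `legitimate` flag of the memtest.exe record. `parse_record` and `to_features` are illustrative names, and dropping `Name` and `md5` as non-numeric identifiers is an assumption of this sketch.

```python
def parse_record(header_line, record_line):
    """Pair each column name with its value from one pipe-delimited record."""
    names = header_line.split("|")
    values = record_line.split("|")
    if len(names) != len(values):
        raise ValueError("header and record have different field counts")
    return dict(zip(names, values))

def to_features(record, skip=("Name", "md5", "legitimate")):
    """Numeric feature list (identifier and label columns dropped) plus the label."""
    features = [float(v) for k, v in record.items() if k not in skip]
    return features, int(record["legitimate"])

# Truncated to six columns for readability; values come from the memtest.exe row.
header = "Name|md5|Machine|SizeOfOptionalHeader|Characteristics|legitimate"
row = "memtest.exe|631ea355665f28d4707448e442fbf5b8|332|224|258|1"

record = parse_record(header, row)
features, label = to_features(record)
```

The resulting `features` list and `label` are the shape a supervised classifier expects: one numeric vector per file, with 1 for legitimate and 0 for malware.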
Thank you!
@cchio
cchio@cs.stanford.edu
@antojosep007
antojoseph007@gmail.com
THANKS
@cchio
sign up for updates!
mlsec@cs.stanford.edu
https://www.amazon.com/Machine-Learning-Security-Protecting-Algorithms/dp/1491979909
