Keith J. Jones, Ph.D. - MALGAZER: AN AUTOMATED MALWARE CLASSIFIER WITH RUNNING WINDOW ENTROPY AND MACHINE LEARNING
1. MALGAZER: AN AUTOMATED MALWARE CLASSIFIER WITH
RUNNING WINDOW ENTROPY AND MACHINE LEARNING
Student: Keith J. Jones, Chair: Dr. Yong Wang
Dakota State University - keith.j.jones@ieee.org
2. COMMITTEE
Dr. Yong Wang (Chair)
Dr. Jun Liu
Dr. Wayne Pauli
Dr. Joshua Pauli
Dr. Joshua Stroschein
3. OUTLINE
1. Introduction & Motivation
2. Literature Review
3. Research Methodology
4. Optimizing Running Window Entropy
5. Qualitative Data Exploration
6. Malgazer: An RWE-Based Malware Classifier
7. Comparison of Malgazer to Prior Literature
8. The Malgazer Web Application
9. Conclusions
4. 1. INTRODUCTION & MOTIVATION
Malware: 200,000+ new threats are released onto the internet each day,
and that number generally grows each year (Panda Security, 2017)
Machine learning based malware detection methods claim efficacy
rates of 95% (Saxe & Berlin, 2015) or beyond (Cylance Inc., 2016)
Malware detection and classification are often used interchangeably in
prior research.
Malware detection is a subset of the malware classification problem.
6. THE ML CLASSIFIER MODEL
The machine learning based model can be represented by
the set:
C: {T(), P(), MLM(), E(), TU(), PP()}
T() – Transformation function
P() – Preprocessing function
MLM() – The machine learning model
E() – Error measurement function
TU() – Tuning function
PP() – Post processing function
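The set C above can be sketched as a small container of pluggable functions. This is an illustrative Python rendering of the model, not the dissertation's actual code; all names and the dummy pipeline composition are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ClassifierModel:
    """Sketch of the classifier set C = {T(), P(), MLM(), E(), TU(), PP()}."""
    T: Callable    # transformation: raw sample -> feature series (e.g., RWE)
    P: Callable    # preprocessing: features -> fixed-size model input
    MLM: Any       # machine learning model exposing .predict()
    E: Callable    # error measurement (e.g., a confusion matrix)
    TU: Callable   # tuning function (e.g., hyper-parameter search)
    PP: Callable   # post-processing of the model's raw output

    def classify(self, sample):
        # Compose the pipeline: PP(MLM(P(T(sample))))
        return self.PP(self.MLM.predict(self.P(self.T(sample))))
```

Representing each stage as a swappable function is what lets the same model set describe many concrete classifiers (e.g., RWE features vs. GIST features) by changing only T() and P().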
8. RUNNING WINDOW ENTROPY
This is a measurement that shows the localized entropy at each
offset within the malware.
I was able to optimize the original running window entropy algorithm
to run in less than 2% of its original running time, on average.
Jones, K., & Wang, Y. (2018).
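A minimal, unoptimized version of the measurement can be sketched in Python as a normalized Shannon entropy computed over every window offset. Function and parameter names here are illustrative, not the dissertation's code.

```python
import math
from collections import Counter

def running_window_entropy(data: bytes, window: int) -> list:
    """Normalized Shannon entropy of each `window`-byte slice of `data`.

    Returns one value in [0, 1] per window offset:
    0 = not random at all, 1 = totally random.
    """
    entropies = []
    for i in range(len(data) - window + 1):
        counts = Counter(data[i:i + window])
        h = -sum((c / window) * math.log2(c / window) for c in counts.values())
        entropies.append(h / 8)  # a byte carries at most 8 bits of entropy
    return entropies
```

Note this naive form rescans the entire window at every offset, which is exactly the cost the optimized algorithm avoids.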
9. WHAT DOES R.W.E. GIVE US?
A view of “randomness”.
0 is not random, 1 is totally random.
A view of “information gain”.
A view of “relevancy”.
Running windows of entropy present the entropic
structure of a malware sample.
13. THE RESEARCH PROBLEM
Develop an automated machine learning based
classifier (Malgazer) using running window
entropy that will classify new malware samples
functionally. This problem will be solved using a
design science research methodology (Hevner,
March, Park, & Ram, 2004).
Null Hypothesis: A classifier cannot be built that performs
better than random guessing (or in this case, random
classifications).
14. SIX DSRM CONTRIBUTIONS
1. Model Artifact: The ML based classifier model.
2. Method Artifact: Optimization of R.W.E. Algorithm
3. Method Artifact: Collection and normalization of functional classification
information from publicly available resources
4. Method Artifact: Scaling the training of machine learning classifiers
5. Instantiation Artifact: An actual ML based classifier.
6. Instantiation Artifact: A user application to access the classifier
15. 2. LITERATURE REVIEW
While reading, I began categorizing the prior literature.
Malware Classification Method Categories:
Malware Detector:
Boot sectors
Dynamic analysis
Entropy statistics
Images from bytes and n-grams
Opcode statistics
PDF streams
Static analysis
Static analysis and opcodes
Static analysis and n-grams
Static and dynamic analysis
Structural Entropy
N-grams
16. 2. LITERATURE REVIEW
Malware Classification Method Categories:
Malware Classifier:
Byte code
Dynamic analysis
HMM
Images from bytes
Images from bytes and opcodes
Intermediate language of opcodes
Static analysis
Static analysis and images from bytes
Static analysis and opcodes
Static and dynamic analysis
Structural entropy
17. 2. LITERATURE REVIEW
Reviewed 71 prior methods
Spans 1996 through 2018
41 malware detectors
30 malware classifiers
Only 2 classifiers used features similar to running window entropy.
The first method only classifies samples if they are packed.
The second method, which used structural entropy, reported an
accuracy of only 87.2%.
One method, “GIST”, was implemented for deeper comparisons.
GIST = filters applied to a 2D image of the malware sample, yielding 320 feature values
19. 3. RESEARCH METHODOLOGY
Comparison to prior literature
Confusion Matrices:
Per classification accuracy
ROC/AUC
Empirical comparison to GIST method.
Theoretical comparisons to other methods.
20. TRAINING THE MODEL
What training set?
Publicly available malware through VirusTotal.
Windows PE files
60,000 samples of classified and known malware, 10,000 samples from
each of 6 classifications:
Virus, Backdoor, Worm, Trojan, Ransom, PUA
What testing set?
Approximately 10-20% of the collected malware samples.
What input data? (features)
RWE
GIST
Scale the 200+ experiments using Terraform.
25. 4. OPTIMIZING R.W.E.
I was able to optimize the original running window entropy algorithm
to run in less than 2% of its original running time, on average.
This topic was published in a peer reviewed conference, June 2018.
Jones, K., & Wang, Y. (2018).
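The published optimization details are in Jones & Wang (2018). One way such a speedup can arise is by updating the window's byte counts incrementally as the window slides (remove one byte, add one byte) instead of rescanning the whole window at each offset; the sketch below illustrates that idea as an assumption, not the paper's exact algorithm.

```python
import math

def rwe_fast(data: bytes, window: int) -> list:
    """Running window entropy with incremental count updates.

    Each slide of the window changes at most two byte counts, so the
    per-offset cost no longer scales with the window size.
    """
    n = len(data)
    if n < window:
        return []
    counts = [0] * 256
    for b in data[:window]:
        counts[b] += 1

    def entropy() -> float:
        h = 0.0
        for c in counts:
            if c:
                p = c / window
                h -= p * math.log2(p)
        return h / 8  # normalize to [0, 1]

    out = [entropy()]
    for i in range(window, n):
        counts[data[i - window]] -= 1  # byte leaving the window
        counts[data[i]] += 1           # byte entering the window
        out.append(entropy())
    return out
```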
29. 5. QUALITATIVE DATA EXPLORATION
Experimental data set was collected.
Approximately 25,190 publicly available samples from VirusTotal
The RWE was computed.
RWE was down-sampled because malware binary size varies
The classifications were explored.
The unified functional classifications were determined:
Virus, Worm, Backdoor, Trojan, Ransom, PUA
The VirusTotal products were chosen:
Microsoft, Symantec, Kaspersky, Sophos, TrendMicro, ClamAV
Classifier experimentation.
Source the final malware data set.
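The down-sampling step above (reducing each variable-length RWE series to a fixed length) can be sketched with simple linear interpolation. The default of 1024 points is an illustrative choice, not the dissertation's value.

```python
def resample_rwe(rwe, n_points: int = 1024) -> list:
    """Linearly interpolate a variable-length RWE series to a fixed length."""
    if not rwe:
        return []
    if len(rwe) == 1 or n_points == 1:
        return [float(rwe[0])] * n_points
    out = []
    step = (len(rwe) - 1) / (n_points - 1)
    for i in range(n_points):
        pos = i * step
        lo = int(pos)
        if lo >= len(rwe) - 1:  # clamp the final point
            out.append(float(rwe[-1]))
        else:
            frac = pos - lo
            out.append(rwe[lo] * (1.0 - frac) + rwe[lo + 1] * frac)
    return out
```

Fixing the series length is what allows RWE curves from binaries of very different sizes to feed the same fixed-input classifier.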
32. 6. MALGAZER
Over 200 experiments
1 experiment was 1 training session.
Computed the features:
RWE
GIST
Scaled the computations
Terraform
Grid searching
Provided hyper-parameters
10-fold cross-validation scoring
Provided average accuracy statistics
Final Training
Provided ROC/AUC and an actual classifier
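The 10-fold cross-validation step above can be sketched as a simple index partitioner: split the sample indices into 10 folds, hold out one fold per round, and train on the rest. This is a generic sketch, not Malgazer's actual training harness.

```python
def kfold_indices(n: int, k: int = 10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved, near-equal folds
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Averaging the per-fold accuracy over the 10 rounds yields the cross-validated accuracy statistic used to compare experiments.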
34. 7. MALGAZER COMPARISON
The Malgazer classifier slightly outperformed the GIST classifier,
so the two are at minimum comparable in classification
accuracy.
Using this comparison, it is possible to theoretically compare
Malgazer to other prior literature.
Malgazer was comparable to the prior literature reviewed in this
dissertation.
Additional observations were also made…
38. THE MALGAZER CLASSIFIER “C”
CATEGORY: Malware Classifier – Running Window Entropy
EFFICACY: accuracy of approximately 95%
TRAINING DATA: 60,000 malware samples and functional classifications from
VirusTotal, where there were six functional classifications developed from the
publicly available detection information: Virus, Trojan, Worm, PUA, Ransom, and
Backdoor
TRANSFORMATION FUNCTION T(): The RWE data
PRE-PROCESSING FUNCTION P(): Re-sampling the RWE to a fixed sample size
MACHINE LEARNING MODEL FUNCTION MLM():
Adaboost/DT
Adaboost/RF
Artificial neural networks
ERROR FUNCTION E(): Confusion matrices
TUNING FUNCTION TU(): Window and data point sampling sizes for running
window entropy.
POST PROCESSING FUNCTION PP(): Not discussed.
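The error function E() above reports confusion matrices. A minimal version for the six functional classes can be sketched as follows; the label list comes from the slide, while the code itself is illustrative.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

# The six functional classifications used by Malgazer
LABELS = ["Virus", "Trojan", "Worm", "PUA", "Ransom", "Backdoor"]
```

The diagonal of the matrix gives per-classification accuracy, which is the comparison metric named in the research methodology.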
44. 9. CONCLUSIONS
The Six Research Contributions
The generalized classification model
RWE optimization
Collection and normalization of functional classification information
Scaling the computations
The most accurate classifier (slightly more accurate than GIST)
The web application
Recommendations
RWE is slow, but if you are already computing it, this could be a great method for
automated classification. Otherwise, GIST is faster and provides about the
same accuracy.
Future research opportunities
Continue generalized model application
Functional classification determination (Problematic Trojan?)
Optimal RWE window size and data sample length