Zero day malware detection

2 Nov 2018

  1. Presented by Sujeesh kumar j, S7 CSE
  2. What is malware? Different malware analysis techniques. What is wrong with those techniques? What is this paper about? Proposed malware classification system. Evaluation and validation. Experimental result analysis. Comparison of classifier accuracy on BFS and AFS. Comparison of model building time on BFS and AFS.
  3. A software program that deliberately fulfils the harmful intent of an attacker is known as malicious software, or malware.
  4. Commonly used malware analysis techniques are: fully automated analysis; static properties analysis; dynamic properties analysis; manual code reversing.
  5. Fully automated analysis: the suspicious program is scanned with fully automated tools. These tools can quickly assess what a piece of malware is capable of once it has infiltrated a system. Although fully automated analysis does not provide as much information as a human analyst, it is still the fastest way to sift through large quantities of malware.
  6. Static properties analysis: the static properties include hashes, embedded strings, embedded resources, and header information. These properties should reveal elementary indicators of compromise.
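
The static properties listed above can be collected without running the sample. The following is a minimal sketch, not the presenter's tooling: the function name `static_properties`, the minimum string length, and the sample bytes are all hypothetical, and the header check only looks for the `MZ` magic number rather than parsing a full PE header.

```python
import hashlib
import re


def static_properties(data: bytes, min_len: int = 4) -> dict:
    """Collect a few static properties from the raw bytes of a file."""
    # Cryptographic hashes identify the exact sample.
    hashes = {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # Runs of printable ASCII characters, similar to the Unix `strings` tool.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    strings = [m.decode("ascii") for m in re.findall(pattern, data)]
    # "MZ" is the magic number at the start of DOS/PE executables.
    return {"hashes": hashes, "strings": strings, "is_pe_like": data[:2] == b"MZ"}


# Hypothetical sample bytes, made up for illustration only.
sample = b"MZ\x90\x00\x03\x00http://example.test/payload\x00\x01cmd.exe\x00ab\x00"
props = static_properties(sample)
print(props["strings"])  # ['http://example.test/payload', 'cmd.exe']
```

Embedded URLs and command names such as these are the kind of elementary indicators a static pass can surface.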
  7. Dynamic properties analysis: to observe a malicious file, it is often run in an isolated laboratory environment to see whether it infects that environment. Analysts monitor the laboratory to see whether the malicious file tries to connect to any hosts. With this information, the analyst can then replicate the situation.
  8. Manual code reversing: reversing the code of the malicious file can decode encrypted data stored by the sample and reveal capabilities that did not show up during behavioral analysis. Manually reversing the code requires malware analysis tools such as a debugger and a disassembler.
  9. The main problems with these techniques are: high false positive and false negative rates; building a classification model takes time, which hinders the early detection of malware.
  10. This paper presents a system that addresses both of these issues. It integrates static and dynamic analysis features of malware binaries with a machine learning process to detect zero-day malware.
  11. Given the pros and cons of the techniques above, a relevant set of features needs to be selected so that the classification model can be built in less time and with high accuracy.
  12. Feature selection is a method of identifying the top-ranked features. It identifies the relevant features, making it easy to discard the irrelevant ones. A good selection of features can improve both the learning speed and the generalization capacity of the model.
  14. A large corpus of malicious samples is collected and then scanned with AVG antivirus to confirm their maliciousness. The clean files are collected manually from the system directories of successive versions of the respective operating system.
  15. All collected specimens are then executed in an automated analysis environment built on a modified version of Cuckoo sandbox. The system is configured to generate an analysis report in JSON format after executing each specimen.
  16. The JSON reports are then parsed to obtain the malware features, both static and dynamic. The resulting dataset contains a very large number of features and is not suitable for building the classification model directly. The data is therefore reduced to a smaller set of malware attributes that can be used to build the classification model.
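
The parsing step can be sketched as follows. This is an illustration under assumptions, not the presented system: the report keys (`static.pe_imports`, `behavior.processes[].calls[].api`) follow a layout commonly seen in Cuckoo-style reports but may differ in the modified sandbox used here, and `extract_features` and the inline report are hypothetical.

```python
import json


def extract_features(report: dict) -> set:
    """Turn one analysis report into a set of binary feature names."""
    features = set()
    # Static side: imported functions (key names are assumptions).
    for dll in report.get("static", {}).get("pe_imports", []):
        for imp in dll.get("imports", []):
            features.add("import:" + imp["name"])
    # Dynamic side: API calls observed while the sample ran.
    for proc in report.get("behavior", {}).get("processes", []):
        for call in proc.get("calls", []):
            features.add("api:" + call["api"])
    return features


# A tiny made-up report standing in for a real sandbox output file.
raw = """{
  "static": {"pe_imports": [{"dll": "KERNEL32.dll",
                             "imports": [{"name": "CreateFileW"}]}]},
  "behavior": {"processes": [{"calls": [{"api": "NtWriteFile"},
                                        {"api": "RegSetValueExW"},
                                        {"api": "NtWriteFile"}]}]}
}"""
features = extract_features(json.loads(raw))
print(sorted(features))
# ['api:NtWriteFile', 'api:RegSetValueExW', 'import:CreateFileW']
```

Using a set collapses repeated calls into presence/absence features, which is one simple way to keep the feature space manageable before selection.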
  17. Building a classification model from the training data is a time-consuming task, so the top-ranked features are selected from this reduced dataset using the Information Gain (IG) method.
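
Information Gain ranks a feature by how much knowing its value reduces the entropy of the class label: IG = H(class) - H(class | feature). The sketch below shows that calculation on a toy dataset; the feature names, labels, and the helper `top_k_features` are hypothetical, and the slides do not say how many features were kept.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(class; feature) = H(class) - H(class | feature)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]


# Toy data: four samples, three binary features (all values are made up).
labels = ["malware", "malware", "benign", "benign"]
columns = {
    "api:NtWriteFile": [1, 1, 0, 0],  # separates the classes perfectly
    "api:CreateFileW": [1, 0, 1, 0],  # carries no class information
    "import:Sleep": [1, 1, 1, 0],     # partially informative
}
print(top_k_features(columns, labels, 2))  # ['api:NtWriteFile', 'import:Sleep']
```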
  18. The selected features are then used to build the classification model using ML algorithms. The resulting classifiers distinguish malicious files from benign ones. The model build time is recorded while running the experiments on both datasets, i.e. BFS and AFS.
  20. The classification algorithms need training data to build the model, and testing data to test the models so built. Validation is done with cross-validation, which evaluates how well the results generalize to independent data. The machine learning algorithms are evaluated using the following performance measures:
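
In k-fold cross-validation, the data is split into k parts and each part serves once as the testing set while the rest is used for training. The sketch below only generates the index splits; the number of folds is an assumption (the slides do not state it), `k_fold_indices` is a hypothetical helper, and in practice the samples would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 5))
print(folds[0])  # ([2, 3, 4, 5, 6, 7, 8, 9], [0, 1])
```

Every sample appears in exactly one testing fold, so each prediction used for evaluation comes from a model that never saw that sample during training.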
  21. True positive rate (TPR): the rate of correctly identified malicious files, also known as recall or sensitivity. It is a measure of completeness or quantity. $\mathrm{TPR} = \frac{TP}{TP + FN}$
  22. False positive rate (FPR): the rate of benign files incorrectly identified as malicious. $\mathrm{FPR} = \frac{FP}{FP + TN}$
  23. Precision: the fraction of files identified as malicious that are actually malicious. It is a measure of exactness or quality. $\mathrm{Precision} = \frac{TP}{TP + FP}$
  24. F-Measure: the harmonic mean of precision and recall. $\text{F-Measure} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$
  25. Accuracy: the percentage of correctly identified files (both benign and malicious). $\mathrm{Accuracy}\,(\%) = \frac{TP + TN}{TP + FP + FN + TN} \times 100$
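
The five measures above all derive from the four confusion-matrix counts. As a minimal sketch, the function below computes them directly from those counts; `evaluate` is a hypothetical name and the counts are invented for illustration, not results from the paper.

```python
def evaluate(tp, fp, tn, fn):
    """Compute the performance measures from confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),                      # recall / sensitivity
        "fpr": fp / (fp + tn),                      # benign files flagged as malicious
        "precision": tp / (tp + fp),
        "f_measure": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
        "accuracy_pct": (tp + tn) / (tp + fp + fn + tn) * 100,
    }


# Made-up counts: 100 malicious and 100 benign test files.
scores = evaluate(tp=90, fp=5, tn=95, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With these counts the TPR is 0.9, the FPR is 0.05, and the accuracy is 92.5%.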
  30. Thank you