Simseer.com
Malware Similarity and Clustering
Made Easy

Silvio Cesare <silvio@ruxcon.org.au>
Introduction
• Simseer.com is a set of web services to analyse
  malware using program structure as a signature.
  Why?

• AV String signatures not very robust.

• Can’t detect ‘approximate’ matches.

• Hard to generate signature for an entire family.

• Program structure improves signature-based
  methods.
Who am I?
• Ph.D. Student at Deakin University.

• Presented at Ruxcon, Black Hat, AusCERT, etc.

• Published in academia.

• Book author.

• Recently relocated to Canberra.
Outline
1. Introduction

2. Simseer.com’s Malware Services

3. Supporting Infrastructure

4. Other Services

5. Conclusion
Signatures
• Covered in detail in my other presentations.
• Signature is based on a ‘set of control flow graphs’.
Signature Extraction
• Transform ‘set of control flow graphs’ into a
  ‘feature vector’

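• Decompilation + N-Grams

[Figure: a control flow graph with nodes L_0 to L_7 and edges labelled ‘true’ is decompiled into the structured code below. The code is encoded as the string W|IEH}R, which is split into the 4-grams W|IE, |IEH, IEH} and EH}R.]

proc(){
L_0:
  while (v1 || v2) {
L_1:
    if (v3) {
L_2:
    } else {
L_4:
    }
L_5:
  }
L_7:
  return;
}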
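
As a rough illustration of the N-gram step, the sketch below slides a window over a signature string and counts each substring. The function names `ngrams` and `feature_vector` are hypothetical, and using the counts directly as the feature vector is an assumption; the slides do not say how Simseer encodes or weights its features.

```python
from collections import Counter


def ngrams(signature: str, n: int = 4) -> list[str]:
    """Return every contiguous substring of length n (a sliding window)."""
    return [signature[i:i + n] for i in range(len(signature) - n + 1)]


def feature_vector(signature: str, n: int = 4) -> Counter:
    """Count each n-gram; the counts act as a sparse feature vector (assumption)."""
    return Counter(ngrams(signature, n))


# The string from the slide, standing in for one decompiled procedure.
print(ngrams("W|IEH}R"))  # ['W|IE', '|IEH', 'IEH}', 'EH}R']
```

A whole program would contribute one such string per procedure, with the counts summed across procedures.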
Simseer
• Begin demo...

• A revamp of my existing
  http://www.FooCodeChu.com service.

• Submit an archive of malware samples.

• Results
  ▫ A similarity matrix comparing samples.
  ▫ An evolutionary tree showing relationships.
Submission Page
Results
Simseer
• Demo complete...

• Use ‘distance between vectors’ to show
  similarity.

• Visualize using phylogenetics software.
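
A minimal sketch of how a similarity matrix could be derived from the distance between feature vectors. The slides do not name the distance used by this service, so Euclidean distance and the mapping 1 / (1 + d) are assumptions chosen for illustration, as are the names `euclidean` and `similarity_matrix`.

```python
import math
from collections import Counter


def euclidean(a: Counter, b: Counter) -> float:
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def similarity_matrix(vectors: list[Counter]) -> list[list[float]]:
    """Map each pairwise distance d to a similarity 1 / (1 + d) in (0, 1]."""
    return [[1.0 / (1.0 + euclidean(u, v)) for v in vectors] for u in vectors]


# Three toy samples: the first two are identical, the third shares nothing.
samples = [
    Counter({"W|IE": 1, "|IEH": 1}),
    Counter({"W|IE": 1, "|IEH": 1}),
    Counter({"EH}R": 2}),
]
matrix = similarity_matrix(samples)
```

Identical samples score 1.0, and the score falls towards 0 as the vectors move apart. A matrix like this is the kind of input a phylogenetics tool can turn into a tree.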
SimseerCluster
• Begin demo...

• A new service.

• Submit an archive of malware samples.

• Define the number of clusters.

• Results
  ▫ Samples grouped into clusters.
  ▫ Cross checking samples with AV.
  ▫ Identification of families.
Submission Page
Results
SimseerCluster
• Demo complete...

• Use ‘similarity matrix’ and ‘cosine similarity’.

• Pass to ‘cluster analysis software’ – The Weka
  Machine Learning Toolkit.

• Use Hierarchical clustering.
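
The service hands this step to Weka. The sketch below is not Weka's API; it is a small pure-Python stand-in that shows the idea of agglomerative (hierarchical) clustering over cosine similarity. Average linkage and the names `cosine_similarity` and `cluster` are assumptions made for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(vectors: list[list[float]], k: int) -> list[list[int]]:
    """Merge the most similar clusters until k remain; returns sample indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Average linkage: mean similarity over all cross-cluster pairs.
        sims = [cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a is unaffected by the pop
        clusters[a].extend(merged)
    return clusters


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
groups = cluster(vectors, k=2)
```

Here `k` plays the role of the user-defined number of clusters from the submission form.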
SimseerSearch
• Begin demo...

• A new service.

• Submit a malware sample.

• Specify threshold of similarity.

• Results
  ▫ All samples in database similar to query.
  ▫ An AV report.
  ▫ Heuristics to detect obfuscations (packing).
Submission Page
Results
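SimseerSearch
[Figure: similarity search illustration. Labels: query point q, database sample p, distance d(p,q), radius r, known Malware, and the two outcomes Query Benign and Query Malicious.]

• Demo complete...
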
• Use ‘nearest neighbour similarity search’ based
  on ‘Euclidean distance’.

• Packer detection based on entropy analysis.
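
Both ideas on this slide can be sketched in a few lines. The code below assumes a linear scan for the similarity search (a real service would probably use an index) and byte-level Shannon entropy for the packing heuristic. The names `range_search` and `shannon_entropy` are hypothetical, and no detection threshold is given because the slides do not state one.

```python
import math
from collections import Counter


def range_search(
    query: list[float], database: dict[str, list[float]], r: float
) -> list[tuple[str, float]]:
    """Return (name, distance) for each stored sample within distance r, nearest first."""
    hits = [(name, math.dist(query, vec)) for name, vec in database.items()]
    return sorted((h for h in hits if h[1] <= r), key=lambda h: h[1])


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum(c / total * math.log2(total / c) for c in counts.values())


# Toy database of feature vectors keyed by a made-up sample name.
database = {"sample_a": [0.0, 0.0], "sample_b": [3.0, 4.0]}
near = range_search([0.0, 0.0], database, r=1.0)
```

Compressed or encrypted sections tend to push the entropy towards 8 bits per byte, which is why a high value hints at packing.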
Supporting Infrastructure
Other Services
• Other services on the same infrastructure
 ▫ Clonewise
 ▫ Bugwise
Clonewise – Detecting embedded
libraries.
Bugwise on real Debian Linux binaries
Future Work
• Integrate Cuckoo sandbox
 ▫ Unpacking with Volatility.
 ▫ Non-EXE formats (PDF, DOC, etc.).
 ▫ API call classification (non-signature-based).
Conclusion
• Free services.

• Control flow better than traditional string
  signatures.

• Try it!

• http://www.simseer.com
