©2019 FireEye
2. About Us
• Michael Sikorski
• Philip Tully
• Jay Gibble
• Matthew Haigh
3. "HTTP 1.1 200 OK "
4. One String can Make a Difference
NanoHTTPD webserver produces extra whitespace
Cobalt Strike Server Detection
Continued for 7 years
Detection signature
Track threat actors, identify C2 addresses
https://blog.fox-it.com/2019/02/26/identifying-cobalt-strike-team-servers-in-the-wild/
5. Running Strings on larger binaries produces tens of thousands of strings.
6. Strings produces a ton of noise mixed in with important information.
7. What is a String
• N characters + NULL
• Found without regard to file format or context
• 0x31 0x33 0x33 0x37 0x00
– ‘1337’, right?
• Not necessarily:
– memory address
– CPU instructions
– data used by the program
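The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular Strings implementation: the function name `ascii_strings`, the default minimum length of 4, and the sample bytes are assumptions made for the example. It scans raw bytes for runs of printable ASCII and, as the slide warns, cannot tell whether a run such as `1337` is really text or just an address, instruction bytes, or other data.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII bytes of at least min_len."""
    # 0x20-0x7E covers the space character plus the 94 printable ASCII characters.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Hypothetical byte blob: two NOP-like bytes, "1337" + NULL, two junk bytes, a URL + NULL.
blob = b"\x90\x90" + b"1337\x00" + b"\xff\xfe" + b"http://evil.com\x00"
found = list(ascii_strings(blob))
print(found)
```

The scan ignores file format entirely, which is why the same four bytes are reported whether they were written as text or happen to fall inside code or data.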
8. Wide Strings
• Also referred to as wide-character strings
• The Windows OS uses wide strings internally
– Microsoft’s encoding standard is UTF-16 LE
• Each wide character is two bytes
• C-style wide-character strings are terminated with a double NULL (0x00, 0x00)
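A wide-string scan can be sketched the same way as an ASCII scan. The snippet below is an illustrative assumption rather than the flarestrings implementation: `wide_strings` and its minimum length of 4 are hypothetical, and it only recognizes printable ASCII characters stored as UTF-16 LE (a printable byte followed by 0x00), which is the common case in Windows binaries.

```python
import re

def wide_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII characters encoded as UTF-16 LE."""
    # Each character is one printable ASCII byte followed by a 0x00 high byte.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("utf-16-le")

# Hypothetical blob: two junk bytes, then "Derby" as UTF-16 LE with a double-NULL terminator.
blob = b"\x01\x02" + "Derby".encode("utf-16-le") + b"\x00\x00"
wide_found = list(wide_strings(blob))
print(wide_found)
```

An ASCII-only scan with a minimum length of 4 would miss this string, because every character is separated by a NULL byte.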
9. Compilation
Source code:
#include <stdio.h>
int main() {
    printf("Derby");
    return 0;
}
Object file: "Derby"
.EXE binary: .data 0x56000: "Derby"
Strings persist on disk throughout the compilation process.
10. The Strings Program
!This program cannot be run in DOS mode.
??3@YAXPAX@Z
??2@YAPAXI@Z
__CxxFrameHandler
_except_handler3
WSAStartup() error: %d
User-Agent: Mozilla/4.0 (compatible; MSIE 6.00; Windows NT 5.1)
GetLastInputInfo
SeShutdownPrivilege
%sIEXPLORE.EXE
SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\IEXPLORE.EXE
[Machine IdleTime:] %d days + %.2d:%.2d:%.2d
[Machine UpTime:] %-.2d Days %-.2d Hours %-.2d Minutes %-.2d Seconds
ServiceDll
SYSTEM\CurrentControlSet\Services\%s\Parameters
if exist "%s" goto selfkill
del "%s"
attrib -a -r -s -h "%s"
Inject '%s' to PID '%d' Successfully!
cmd.exe /c
Hi,Master [%d/%d/%d %d:%d:%d]
11. Malware Triage
• Customer: suspected compromise
• Incident Response: forensic analysis, identify malware sample
• Reverse Engineer: binary triage, malware analysis
Users: reverse engineers, SOC analysts, red teamers, incident responders, malware researchers
12. Knowing which strings are relevant often requires highly experienced analysts.
13. Strings Tells a Story
Relevance
domain names
IP addresses
URLs
filenames
registry paths
registry keys
HTTP user-agent strings
service configuration info
keylogger indicators (e.g. "[DELETE]", "[BS]")
third party libraries
PDB strings
function names
debugging messages
command line help/usage options
OSINT
runtime artifacts
compiler artifacts
Windows APIs
library code
localizations
locations
languages
error messages
random byte sequences
format specifiers
14. Relevance is subjective and its definition can vary significantly across analysts.
15. Hypothesis and Goals
• Develop a tool that can:
– efficiently identify and prioritize strings
– based on relevance for malware analysis
• StringSifter should:
– be easy to use
– generalize across personas, use cases, downstream apps
– save time and money
• How does it work?
16. Rankings are Everywhere
17. Our Favorite Products Serve Up Rankings
• Search engines
– web
– e-commerce
• News feeds
– social networks
• Recommender systems
– ads
– movies
– music
18. Learning to Rank
• Create optimal ordering of a list of items
• Precise individual item scores less important than their relative ordering
• In classification, regression, clustering we predict a class or single score
• LTR rarely applied in security applications
19. LTR as Supervised Learning
• Rank items within unseen lists in a similar way to rankings within training lists
• Each item associated with a set of features and an ordinal integer label
• Ordinal label is the teaching signal that encodes relevance level
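One way to picture this supervised setup is a tiny pairwise ranker. The sketch below is an assumption for illustration only: it uses a linear scoring function trained with a RankNet-style pairwise logistic loss, not the gradient boosted model StringSifter ships, and the two features and three labeled items are made up. Each item carries a feature vector and an ordinal label, and training nudges the scores so that higher-labeled items outrank lower-labeled ones.

```python
import math

def train_pairwise_ranker(items, epochs=200, lr=0.1):
    """Fit linear scoring weights from (feature_vector, ordinal_label) pairs in one list."""
    weights = [0.0] * len(items[0][0])

    def score(x):
        return sum(w * v for w, v in zip(weights, x))

    for _ in range(epochs):
        for xi, yi in items:
            for xj, yj in items:
                if yi <= yj:
                    continue  # keep only pairs where item i should outrank item j
                # Modeled probability that item i ranks above item j.
                p = 1.0 / (1.0 + math.exp(score(xj) - score(xi)))
                # Gradient step on the pairwise logistic loss -log(p).
                for k, (vi, vj) in enumerate(zip(xi, xj)):
                    weights[k] += lr * (1.0 - p) * (vi - vj)
    return weights

# Hypothetical training list: [looks_like_url, randomness] features with ordinal labels.
training_list = [
    ([1.0, 0.2], 3),  # most relevant
    ([0.0, 0.5], 1),
    ([0.0, 0.9], 0),  # least relevant
]
weights = train_pairwise_ranker(training_list)
scores = [sum(w * v for w, v in zip(weights, x)) for x, _ in training_list]
print(scores)
```

Only the relative order of the scores matters here; their absolute values carry no meaning, which matches the point that precise item scores are less important than the ordering.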
20. Gradient Boosted Decision Trees
• Decision Trees
– greedily choose splits by Gini impurity
• Gradient Boosted Decision Trees (GBDTs)
– combine outputs from multiple Decision Trees
– reduce loss using gradient descent
– weighted sum of trees’ predictions as ensemble
• LightGBM
– GBDTs with an LTR objective function
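The boosting idea can be sketched with one-split trees (stumps). This is a simplified assumption, not LightGBM's algorithm: it uses a single numeric feature, squared-error loss, and squared-error splits instead of Gini impurity or a ranking objective, and all function names are hypothetical. Each round fits a stump to the residuals (the negative gradient of the loss) and adds it to the ensemble with a small weight.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a single feature by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lo = sum(left) / len(left)
        hi = sum(right) / len(right)
        error = sum((r - lo) ** 2 for r in left) + sum((r - hi) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, t, lo, hi)
    _, split, lo, hi = best
    return lambda x: lo if x <= split else hi

def fit_gbdt(xs, ys, n_trees=50, learning_rate=0.1):
    """Return a model that is a weighted sum of stumps fitted to successive residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Hypothetical toy data: one feature, a step-shaped target.
model = fit_gbdt([1, 2, 3, 4, 5, 6], [0, 0, 0, 2, 2, 2])
print(model(1), model(6))
```

No single stump fits the data on its own at this learning rate; the ensemble gets there by summing many small corrections.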
21. EMBER Training Dataset
• Endgame Malware BEnchmark for Research (EMBER)
– v1 (1.1 million PE files scanned on or before 2017)
– https://arxiv.org/abs/1804.04637
– https://github.com/endgameinc/ember
– 400k train + test malware binaries from v1
– malware defined as > 40 VirusTotal (VT) vendors say malicious
• Ran Strings on 400k malware binaries
– produced 3+ billion individual strings (24 GB)
– performed sampling
– labeled according to heuristics and FLARE hand-labeling
22. Representing Strings as Features
• Natural Language Processing
– Markov model
– Entropy rate, English KL divergence
– Scrabble scores
• Host, Network IoCs
• Malware Regexes
– encodings (base64)
– format specifiers
– user agents
[Figure: Markov model example. Matrix over the characters t, %, F with values 0.02, 0.07, 0.01, 0.2, 0.2, 0.01, 0.03, 0.14, 0.05; threshold = 0.01; example strings http://evil.com, SOFTWAREincludeevil.pdb, t%Ft, Vr}Y with scores 0.018, 0.014, 0.007, 0.001]
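A few of these feature families can be sketched as plain functions. Everything below is an illustrative assumption rather than StringSifter's actual feature code: the function names and regular expressions are hypothetical, and per-string Shannon entropy stands in for the entropy rate named on the slide. The Scrabble table uses the standard English tile values.

```python
import math
import re
from collections import Counter

# Standard English Scrabble tile values.
SCRABBLE = {**dict.fromkeys("aeilnorstu", 1), **dict.fromkeys("dg", 2),
            **dict.fromkeys("bcmp", 3), **dict.fromkeys("fhvwy", 4),
            "k": 5, **dict.fromkeys("jx", 8), **dict.fromkeys("qz", 10)}

# Hypothetical patterns for printf-style format specifiers and base64-looking runs.
FORMAT_SPECIFIER = re.compile(r"%[-+ #0]*\d*(?:\.\d+)?[diouxXscfeEgG]")
BASE64_LIKE = re.compile(
    r"^(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$")

def shannon_entropy(s):
    """Entropy in bits per character of the string's character distribution."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def scrabble_score(s):
    """Sum of Scrabble tile values; non-letters score zero."""
    return sum(SCRABBLE.get(ch, 0) for ch in s.lower())

def featurize(s):
    """Map one string to a small dictionary of numeric and boolean features."""
    return {
        "length": len(s),
        "entropy": shannon_entropy(s),
        "scrabble": scrabble_score(s),
        "has_format_specifier": bool(FORMAT_SPECIFIER.search(s)),
        "looks_base64": bool(BASE64_LIKE.match(s)),
    }

for example in ["http://evil.com", "WSAStartup() error: %d", "Vr}Y"]:
    print(example, featurize(example))
```

Feature dictionaries like these would then be turned into the fixed-length vectors a ranking model consumes.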
23. quixotry /ˈkwik-sə-trē/ (n.): behavior inspired by idealistic beliefs without regard to reality.
24. Example
25. Evaluation
• Normalized Discounted Cumulative Gain (NDCG)
– Normalized: divide DCG by ideal DCG on a ground truth holdout dataset
– Discounted: divides each string’s predicted relevance by a monotonically increasing function (log of its ranked position)
– Cumulative: the cumulative gain or summed total of every string’s relevance
– Gain: the magnitude of each string’s relevance
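The four parts of the metric fit in a few lines. This is a minimal sketch under stated assumptions: it uses the common log2(position + 1) discount and the raw relevance label as the gain (some implementations use 2^relevance - 1 instead), and the relevance lists are made-up examples rather than FLARE data.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG of the predicted ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Ground-truth relevance of each string, in the order the model ranked them.
perfect = ndcg([3, 2, 1, 0])   # most relevant strings first
swapped = ndcg([0, 2, 1, 3])   # most relevant string ranked last
print(perfect, swapped)
```

Because of the normalization, a perfect ordering scores 1.0 no matter how many strings the sample contains, so scores are comparable across binaries.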
26. Results
StringSifter performs well on a holdout set of 7+ years of FLARE malware reports.
27. Putting it All Together
28. Open Sourcing StringSifter
• The tool is now live:
– https://github.com/fireeye/stringsifter
– pip install stringsifter
– Command line and Docker tools
• flarestrings <my_sample> | rank_strings
• Versatility
– FLOSS outputs
– live memory dumps
29. Tools demo
30. Install and Use
• Git + local pip install
– Easy access to source code
    git clone https://github.com/fireeye/stringsifter.git
    cd stringsifter
    pip install -e .
    flarestrings <my_sample> | rank_strings
• Pip install from PyPI
– If you just want to use the tool
    pip install stringsifter
    flarestrings <my_sample> | rank_strings
• Docker container
– Minimum impact to host
    git clone https://github.com/fireeye/stringsifter.git
    cd stringsifter
    docker build -t stringsifter -f docker/Dockerfile .
    docker run -v <malware_dir>:/samples -it stringsifter
    flarestrings /samples/<my_sample> | rank_strings
31. flarestrings *
• There are many versions of "strings"
– GNU binutils, BSD, various Windows implementations
– Inconsistent features
• flarestrings
– Pure Python implementation of "strings"
– Consistent across platforms
– Prints both ASCII and wide strings
* FLARE => FireEye Labs Advanced Reverse Engineering
32. flarestrings Demo
33. StringSifter rank_strings Demo
34. rank_strings Options
35. rank_strings with --scores
36. rank_strings with --min-score
37. Other Use Cases and Future Work
• Rapid screening for potential capabilities
• Detect and handle packed / obfuscated binaries
– Tipoff for automated unpacker tooling
• Leverage feature vectors to focus triage
• Improve NLP
• Improve ranking performance on Mach-O, ELF
38. Community Support
• Plug into your malware analysis stack
• Seeking critical feedback
– improve accuracy and utility
– pertinent edge cases, non-PE files
– contribute via GitHub Issues
• Beginners and experts alike
• Thank you for your attention!
https://github.com/fireeye/stringsifter
pip install stringsifter


StringSifter: Learning to Rank Strings Output for Speedier Malware Analysis


Editor's notes

  1. Introduce what binary triage is and how it relates to malware analysis. Add a slide about other users (incident responders, SOC analysts, researchers). Move triage before incident response.
  2. Printable ASCII starts at hex 21; there are 94 printable characters.
  3. Reverse inference
  4. Traditional ML solves a prediction problem (classification or regression) on a single instance at a time. For example, in email spam detection you look at all the features associated with one email and classify it as spam or not. The aim of traditional ML is to come up with a class (spam or not spam) or a single numerical score for that instance. LTR solves a ranking problem on a list of items. The aim of LTR is to come up with an optimal ordering of those items. As such, LTR doesn’t care much about the exact score that each item gets, but cares more about the relative ordering among all the items.
  5. Learning to rank learns to directly rank items by training a model to predict the probability of a certain item ranking over another item. This is done by learning a scoring function where items ranked higher should have higher scores. The model can be trained via gradient descent on a loss function defined over these scores. For each item, gradient descent pushes the score up for every item that ranks below it and pushes the score down for every item that ranks above it. The “strength” of the push is determined by the difference in scores. To ensure that the model focuses on getting the higher ranks (which are generally more important) correct, we can weight the “strength” of the push by a factor that accounts for how important the ranking is.
  6. Discounting reflects the goal of having the most relevant strings ranked towards the top of our predictions. Normalization makes it possible to compare scores across samples, since the number of strings within different Strings outputs can vary widely. The ground truth comes from FLARE-identified relevant strings contained within historical malware reports.