11. Malware Communications
Let’s pretend…
We all just compromised 10k hosts for our botnet
What do we do now?
Have our malware phone home
Botnets are resilient, cloud-based, often distributed remote
administration systems
[10]
13. Malware Communications: IP
Open socket
Beacon to IP address
Easy to set up
Easy to take down
[Diagram: client implants connecting to a C2 server]
14. Malware Communications: P2P
Open socket
Beacon to super node peer(s)
Very resilient
Peer consensus issues
Complex to set up
[Diagram: client implants and super nodes]
[9]
16. Malware Communications: DNS
Open socket
Issue DNS query
Open socket
Beacon to IP address
Relatively easy to set up
Relatively easy to take down
[Diagram: client implants, a DNS resolver, and a C2 server]
18. Malware Communications: DNS Resiliency Tricks
Fast Flux – DNS A records change quickly
Double Flux – DNS A and NS records change quickly
Domain Generation Algorithms (DGA) – C2 domain names are generated dynamically by a deterministic function within the implant at run time.
Samples are "strings proof": the C2 domain names never appear as static strings in the binary
19. How To DGA
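[Flowchart: Start; in the client DGA, the date and seed feed a hash/PRNG; its output builds a string from the lexicon and adds a TLD from the TLD set to form a domain name; the client queries it; the outcomes are NXD or an A record followed by "connect to IP"; End]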
21. Malware Communications: DGA
- Function that generates domain names
- Shared secret between botnet implants and operators
- Often incorporates the date
Operator registers domain “just in time” before the implant generates it
[3]
[Diagram: client implant, registrar, operator, DNS resolver, C2 server]
22. Malware Communications: DGA
Registrar ensures the domain
is inserted into the DNS
[3]
23. Malware Communications: DGA
Implant generates and
resolves the domain
[3]
24. Malware Communications: DGA
Implant connects to C2 IPv4
[3]
25. Malware Communications: DGA
Repeat:
Operator is constantly
registering domain names
[3]
28. Each DGA is a Special Snowflake
Conficker.C – generated 50k names per day
Pushdo – DGA as a backup if C2 domain went down
Kelihos – DGA as a backup if P2P network went down
newGOZ DGA domains…
- were registered through a few common registrars
- were typically registered 1 hr before the algorithm would generate them
- changed NS domains but reused NS IPv4s
[4]
[11]
29. DGA Domain Query Periods
Family    Query period
Dyre      ~1 day
Ramnit    N/A
Matsnu    ~2 weeks
Pykspa    ~3 weeks
Bedep     ~1 week
30. Generalized DGA pseudo code…
for i in range(domain_set_size):
    domain = generate_domain(date, magic)
    resolve domain
    if domain resolves:
        contact domain
        break

def generate_domain(date, magic):
    domain = ''
    for i in range(lexicon_item_count):
        item = random_select(lexicon, magic)
        domain = domain + item
    domain = domain + random_select(tld_set, magic)
    return domain
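The pseudo code above can be turned into a small runnable sketch. This is not any real family's algorithm: the SHA-256 construction, the `index` parameter, the default lexicon and TLD set, and the salt value are all assumptions chosen for illustration. It only shows how an analyst who knows the shared inputs can regenerate the same list the implant would.

```python
import hashlib
from datetime import date

def generate_domain(day, magic, index, lexicon="abcdefghijklmnopqrstuvwxyz",
                    item_count=12, tld_set=(".com", ".net", ".org")):
    """Derive one domain name deterministically from a date, a salt, and an index."""
    # Hypothetical input layout: date, salt, and position joined into one string.
    material = f"{day.isoformat()}|{magic}|{index}".encode()
    digest = hashlib.sha256(material).digest()
    # Each of the first item_count digest bytes selects one lexicon item.
    label = "".join(lexicon[b % len(lexicon)] for b in digest[:item_count])
    # The last digest byte selects the TLD.
    tld = tld_set[digest[-1] % len(tld_set)]
    return label + tld

def generate_domains(day, magic, domain_set_size):
    """Generate the full domain set for one day."""
    return [generate_domain(day, magic, i) for i in range(domain_set_size)]

domains = generate_domains(date(2015, 1, 1), magic=0xC0FFEE, domain_set_size=5)
for name in domains:
    print(name)
```

Because every input is deterministic, two parties holding the same date and salt produce identical lists, which is the "shared secret" property the DGA slides describe.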
31. Generalized Algorithms Analyses
Inputs
- Domain set size: how many domains to generate
- Date: today's date
- Seed: a number used to ignite a PRNG
- Salt: a magic number or campaign ID
- Lexicon: a set of letters, n-grams, or words
- Lexicon items count: number of items to use from the lexicon
- TLD set: all possible TLDs
Functions
- MD*, SHA*, etc.: some hash
- PRNG: random numbers
- Bitwise math: xor, shl/shr, mod, b64, ASCII to hex
Outputs
- Names to contact: these are often regex-able due to properties of the transformation function
32. An Algorithm Taxonomy from Inputs
Group  Lexicon  Domain set size  Salt/Seed  Date  Examples
A      Letters  Yes              Yes        Yes   Necurs, GOZ, Symmi, Tinba, Pykspa
B      Letters  Yes              Yes        No    Ramnit, DirCrypt, VolatileCedar, Ramdo
C.i    Letters  Yes              No         Yes   Conficker, Dyre, Cryptolocker, Pushdo, Qakbot
C.ii   Words    Yes              No         Yes   Matsnu, Rovnix
36. Ramnit DGA Pseudo Code
class RandInt:  # LCG PRNG, random uint32
    def __init__(self, seed):
        self.seed = seed

    def rand_int_modulus(self, modulus):
        ix = self.seed
        # Integer division; the result is masked to 32 bits
        ix = (16807*(ix % 127773) - 2836*(ix // 127773)) & 0xFFFFFFFF
        self.seed = ix
        return ix % modulus

def ramnit_domains(seed, domain_set_size):  # seed = ?, domain_set_size = ?
    r = RandInt(seed)
    for i in range(domain_set_size):
        seed_a = r.seed
        domain_length = r.rand_int_modulus(12) + 8  # domain_length in 8..19
        seed_b = r.seed
        domain = ''
        for j in range(domain_length):
            char = chr(ord('a') + r.rand_int_modulus(25))  # lexicon = [a-y]
            domain += char
        domain += ".com"  # tld_set = [".com"]
        m = seed_a*seed_b
        r.seed = (m + m//(2**32)) % 2**32
        yield domain
[1]
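A side note on the constants in rand_int_modulus: 127773 and 2836 are the quotient and remainder of (2^31 - 1) divided by 16807, which is Schrage's decomposition for the Park-Miller "minimal standard" LCG. The sketch below checks that relationship and contrasts the textbook step with the 32-bit masked variant in the pseudo code. The function names are illustrative, not taken from any Ramnit sample.

```python
M = 2**31 - 1          # Park-Miller modulus
A = 16807              # Park-Miller multiplier
Q, R = divmod(M, A)    # Schrage's quotient and remainder

def schrage_step(x):
    """One LCG step, A*x mod M, computed without large intermediate products."""
    t = A * (x % Q) - R * (x // Q)
    return t if t > 0 else t + M

def masked_step(x):
    """The variant in the pseudo code: same arithmetic, masked to 32 bits."""
    return (A * (x % Q) - R * (x // Q)) & 0xFFFFFFFF

print(Q, R)
print(schrage_step(Q), masked_step(Q))
```

The two steps agree whenever the intermediate value is positive. When it is negative, the textbook version adds the modulus while the masked version wraps around 2^32, so the two sequences diverge from that point on.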
38. Ramnit DGA Pseudo Code
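[Diagram: in the client DGA, the seed (uint32) feeds an LCG PRNG, which builds a string from the lexicon [a-y]{8,19}; ".com" is appended to form the domain name; the client queries it, with outcomes NXD or an A record followed by "connect to IP"]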
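Since the output shape is fixed (8 to 19 letters from a to y, then ".com"), a regular expression can pre-filter query logs for names worth a closer look. A minimal sketch follows; the function name is hypothetical. A match only means the name has the right shape, because legitimate domains can fit the same pattern.

```python
import re

# Shape of the names produced by the pseudo code: [a-y]{8,19} followed by ".com".
RAMNIT_SHAPE = re.compile(r"[a-y]{8,19}\.com")

def looks_like_ramnit(name):
    """Return True if a queried name has the shape of a Ramnit DGA domain."""
    # Normalize case and drop a trailing root dot before matching the whole name.
    return RAMNIT_SHAPE.fullmatch(name.lower().rstrip(".")) is not None

for candidate in ["abcdefghijkl.com", "abcdefgz.com", "short.com", "abcdefgh.org"]:
    print(candidate, looks_like_ramnit(candidate))
```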
39. Ramnit DGA Pseudo Code
Unknowns
1. Linear congruential generator’s seed
2. How many times this loop occurs
40. Brute Forcing Ramnit DGA Seeds
Inputs: domain_set_size, seed, tld_set, lexicon
Outputs: names
I. Iterate over the seed space (2^32) and identify candidate seeds
II. Find each seed’s domain_set_size and generate its domains
III. Determine the minimum set of seeds to produce all domains (overlap in LCG output)
[2]
41. Step 1: Identify Candidate Seeds
1. Seed the Ramnit DGA with every value in the seed space (0 to 2^32 - 1)
2. Generate the first domain from each seed
  – 27 hours on an AWS c3.8xlarge
  – 24 processes, each with its own CPU core and a portion of the seed space
  – Resulting seed and domain tuples sorted and merged
3. Scan OpenDNS query logs and find which domains received at least one query
4. Seeds that generated domains that received queries are candidate seeds
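Step 1 can be sketched on a toy scale. The sketch below inlines the first-domain logic from the Ramnit pseudo code and scans only 20,000 seeds instead of the full 32-bit space. The observed set stands in for domains found in query logs, and all names here are illustrative.

```python
def first_domain(seed):
    """First domain the Ramnit pseudo code would produce for a given seed."""
    state = seed

    def next_mod(modulus):
        nonlocal state
        state = (16807 * (state % 127773) - 2836 * (state // 127773)) & 0xFFFFFFFF
        return state % modulus

    length = next_mod(12) + 8
    return "".join(chr(ord("a") + next_mod(25)) for _ in range(length)) + ".com"

def candidate_seeds(seed_range, queried_domains):
    """Seeds whose first domain received at least one query."""
    queried = set(queried_domains)
    return [seed for seed in seed_range if first_domain(seed) in queried]

# Stand-in for domains seen in query logs (hypothetical data).
observed = {first_domain(1234), first_domain(15000), "example.com"}
hits = candidate_seeds(range(20_000), observed)
print(hits)
```

At full scale the seed range would be split across worker processes, as the slide describes, and the generated tuples sorted and merged before the join against the logs.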
44. Step 2: Find Seeds’ Domain Set Size
1. Observe the domain’s hourly query counts for the previous two weeks*
2. For each candidate seed, generate the next domain
3. Compare the next domain’s query pattern to the seed’s composite query pattern
If they are similar:
  1. Merge the pattern into the seed’s composite query pattern
  2. Increment the seed’s domain set size
  3. Goto 1
Otherwise:
  1. Exit
* A vector with each position representing an hourly count of DNS queries
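The loop above might look like the following sketch. The slide does not say how similarity is measured or how patterns are merged, so cosine similarity, the 0.8 threshold, and element-wise summing are all assumptions. The hourly counts are made-up data, shortened to four positions per vector.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def domain_set_size(domains, hourly_counts, threshold=0.8):
    """Count consecutive domains whose query pattern matches the seed's composite."""
    composite = None
    size = 0
    for domain in domains:
        pattern = hourly_counts.get(domain)
        if pattern is None:
            break  # no queries observed for this domain
        if composite is not None and cosine(composite, pattern) < threshold:
            break  # pattern no longer resembles the composite
        # Merge by element-wise sum (an assumption) and count the domain.
        composite = pattern if composite is None else [a + b for a, b in zip(composite, pattern)]
        size += 1
    return size

hourly_counts = {
    "d1.com": [5, 0, 3, 0],
    "d2.com": [4, 0, 2, 0],
    "d3.com": [6, 1, 3, 0],
    "d4.com": [0, 9, 0, 7],
}
size = domain_set_size(["d1.com", "d2.com", "d3.com", "d4.com", "d5.com"], hourly_counts)
print(size)
```

In this toy data the first three domains share a query rhythm and the fourth does not, so the loop stops there.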
46. Seeds’ Domain Set Size Example
seed1, domain1
seed1, domain2
seed1, domain3
seed1, domain4
47. Step 3: Minimum Seed Set for Domain Coverage
1. For each seed and its associated domain set…
2. Remove all domain sets that are subsets of other domain sets
3. Minimum seed set for domain coverage remains
Seeds that remain aren’t necessarily “in the wild”
They are seeds that generate all domains “in the wild”
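The subset-removal step can be sketched with Python sets. This follows the slide's approach of dropping contained sets rather than solving a true minimum set cover. The function name and the toy data are illustrative.

```python
def minimal_seed_set(domain_sets):
    """Drop every seed whose domain set is contained in another seed's domain set."""
    kept = {}
    # Visit larger sets first so that, of two identical sets, exactly one survives.
    ordered = sorted(domain_sets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for seed, domains in ordered:
        if not any(domains <= other for other in kept.values()):
            kept[seed] = domains
    return kept

domain_sets = {
    1: {"a.com", "b.com", "c.com"},
    2: {"b.com", "c.com"},           # subset of seed 1
    3: {"c.com", "d.com"},           # overlaps seed 1 but is not a subset
    4: {"a.com", "b.com", "c.com"},  # identical to seed 1
}
kept = minimal_seed_set(domain_sets)
print(sorted(kept))
```

Seeds 1 and 3 remain. Seed 4 produces the same domains as seed 1, which illustrates why a surviving seed is not necessarily the one used in the wild.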
50. Brute Forcing Algorithm Weaknesses
1. The first domain from each seed is used to locate candidate seeds
2. No queries on that day means seed is ignored
3. Point in time analysis
4. DGAs collide with legitimate domain names
- 1 million monkeys typing in 1 million address bars
will eventually browse to 4chan
52. Results: Seeds, Domains, Clients
29 seeds, 3924 domains
- Seeds confirmed by Symantec’s report
I found some seeds not listed in Symantec’s report
- Not a big deal due to overlaps in Ramnit DGA’s LCG
seeds
I found some domains not listed in Symantec’s report
- Bigger deal if Symantec is serious about takedowns
[7]
[8]
55. Results: Patterns in Domain Queries
1. Locate IPv4s that queried each domain
2. Create a graph of seed -> domains -> client IPv4s
3. Count connected components (I found two)
[Graph: seeds (S) linked to domains (D) linked to client IPv4s (C)]
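Counting connected components in the seed -> domains -> client graph takes only a few lines. The sketch below treats edges as undirected and uses a depth-first traversal. The node labels and the addresses (from the documentation range 192.0.2.0/24) are made-up data.

```python
from collections import defaultdict

def count_components(edges):
    """Count connected components in an undirected graph given as edge pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first walk over everything reachable from start
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node] - seen)
    return components

edges = [
    ("seed:1", "dom:a.com"), ("dom:a.com", "ip:192.0.2.1"),
    ("seed:2", "dom:b.com"), ("dom:b.com", "ip:192.0.2.1"),
    ("seed:3", "dom:c.com"), ("dom:c.com", "ip:192.0.2.2"),
]
components = count_components(edges)
print(components)
```

Here seeds 1 and 2 fall into one component because the same client queried domains from both, while seed 3 forms its own component.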
57. Applications and Improvements
Generalize framework for use with all DGA implementations
- Currently working with more than just Ramnit
Vigilant monitoring instead of point in time search
- Ramdo seeds can be updated by the C2 server
- Even if you RE the algorithm, you don't have the seed unique to each compromised system
Combine with other DGA detection techniques
- Co-occurrences and lexical features
[6]
58. Conclusion
Why should you care?
- Many malware families are using DGAs
- This is a new way to identify new badness
- Know the shared secret, find all the C2 domains
- Not all DGAs are created equal
- Some are more difficult to track than others
- Malware authors are people too
- 3:30, “The Life and Times of an APT Malware Author”