6. DISTRIBUTED DENIAL OF SERVICE
• DDoS remains a prominent threat, as recent news coverage of cyber attacks shows
• Mirai, for example, employs hundreds of thousands of infected network devices to perform DDoS attacks
• These devices form a network of zombies or bots, a so-called “botnet”
• A botnet is controlled by a person or a group of people known as the “botmaster(s)”
• Botmasters issue commands to the botnet after the bots have successfully established
connections to the Command-and-Control (C&C) server(s)
12. BOTNET C&C LOOKUP
• A bot establishes a connection with its C&C server by first looking up the server's IP address
• Regardless of their architecture or topology, most botnets use fluxing
• There are two types of fluxing:
• IP Flux
• Domain Flux
13. IP FLUX
• A single Fully Qualified Domain Name (FQDN) is associated with many constantly changing IP
addresses
• There are two types of IP Fluxing techniques:
• Single Flux
• Double Flux
14. DOMAIN FLUX
• Many FQDNs resolve to a single IP address
• Most of the time this IP address is the IP address of the proxy, not the actual C&C server
• One of the most popular techniques nowadays is the Domain Generation Algorithm (DGA)
16. DEFINITION
Domain generation algorithms (DGA) are algorithms seen in various families of
malware that are used to periodically generate a large number of domain
names that can be used as rendezvous points with their command and control
servers.
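To make the definition concrete, here is a minimal sketch of a date-seeded generator. The function toy_dga, its hashing scheme, and its parameters are hypothetical and are not taken from any real malware family; the sketch only shows how a bot and its botmaster could derive the same rendezvous domains from a shared seed such as the current date.

```python
import hashlib
from datetime import date


def toy_dga(day, count=5, tld=".com"):
    """Hypothetical date-seeded DGA; not the algorithm of any real family."""
    domains = []
    for index in range(count):
        # Both sides can rebuild this seed from the date and a counter.
        seed = "{}-{}".format(day.isoformat(), index).encode("ascii")
        digest = hashlib.md5(seed).hexdigest()
        # Map the first 12 hex digits (0-15) to the letters a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga(date(2017, 1, 1)):
        print(name)
```

Because the output depends only on the date and the counter, a defender who knows the algorithm can compute the same list in advance.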
17. CHARACTERISTICS
• Many NXDOMAIN responses
• Usually random at the second-level (2LD) or third-level (3LD) domain label
• A lot of requests from the same IP address
• Names range from completely unreadable strings (not compliant with Zipf’s Law) to dictionary
words (harder to detect)
18. MALWARE USING DGA
• Kraken
• Conficker
• Gameover Zeus
• Pykspa
• Cryptolocker
• Dyre
• Darkshell
• Locky
• Mad Max
• PandaBanker
• Pushdo
• Ramnit
• Srizbi
• Torpig
• Virut
• etc.
19. DGA DETECTION TECHNIQUES
• Reverse Engineering (Generating Regular Expressions for DGA Detection)
• Zipf’s Law (Detecting the Existence of DGA within Log Files)
• Maximum Consonant Sequence Length (Detecting the DGA within Log Files)
• Hierarchical Clustering (Clustering Log Files)
21. DGARCHIVE
• Daniel Plohmann, Khaled Yakdan, Michael Klatt, Johannes Bader, and Elmar Gerhards-Padilla
published a paper entitled “A Comprehensive Measurement Study of Domain Generating
Malware” in which they discussed the many different categories of malware DGAs.
• In addition, they created DGArchive, a repository of DGA regexes from 69
malware families obtained by reverse engineering malware samples.
• Using the regexes, it is possible to generate a list of algorithmically generated domains
(AGDs) for the current day, to be used as a blacklist before a DGA attack even starts.
22. DRAWBACK OF REGEX
• The regex provided by DGArchive is too generic
• For example, the DGA regular expression for Darkshell is [\s\S]{6}.com, and google.com
also matches that regex
• Some other detection measures are necessary
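The false positive can be reproduced in a few lines. This is a sketch that reads the Darkshell pattern as [\s\S]{6}.com, i.e. any six characters followed by .com; the backslashes are an assumption about the slide's intent, and the helper name matches_darkshell is hypothetical.

```python
import re

# Pattern as read from the slide; the backslashes in [\s\S] are assumed,
# since "any six characters" is what lets google.com match.
DARKSHELL_PATTERN = re.compile(r"[\s\S]{6}.com")


def matches_darkshell(domain):
    """Return True when the whole domain name matches the pattern."""
    return DARKSHELL_PATTERN.fullmatch(domain) is not None


if __name__ == "__main__":
    for name in ("google.com", "github.com", "facebook.com"):
        print(name, matches_darkshell(name))
```

Any benign .com domain with a six-character label matches, which is why the regex alone cannot serve as a detector.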
24. ZIPF’S LAW
Zipf's law states that given some corpus of natural language utterances, the
frequency of any word is inversely proportional to its rank in the frequency
table. Thus the most frequent word will occur approximately twice as often as
the second most frequent word, three times as often as the third most frequent
word, and so on.
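Written as a formula, with r for a word's rank and f(r) for its frequency (both symbols are notation introduced here, not taken from the slides), the statement reads roughly as follows:

```latex
f(r) \propto \frac{1}{r}
\quad\Longrightarrow\quad
\frac{f(1)}{f(2)} \approx 2, \qquad \frac{f(1)}{f(3)} \approx 3
```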
25. N-GRAM FREQUENCIES
Let’s take facebook.com as an example:
• Unigrams = [‘f’, ‘a’, ‘c’, ‘e’, ‘b’, ‘o’, ‘o’, ‘k’, ‘c’, ‘o’, ‘m’]
• Bigrams = [‘fa’, ‘ac’, ‘ce’, ‘eb’, ‘bo’, ‘oo’, ‘ok’, ‘co’, ‘om’]
• Trigrams = [‘fac’, ‘ace’, ‘ceb’, ‘ebo’, ‘boo’, ‘ook’, ‘com’]
The bigram frequency:
• fa = 1
• ac = 1
• ce = 1
• eb = 1
• bo = 1
• oo = 1
• ok = 1
• co = 1
• om = 1
The unigram frequency:
• f = 1
• a = 1
• c = 2
• e = 1
• b = 1
• o = 3
• k = 1
• m = 1
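The lists above can be reproduced with a short sketch. It assumes n-grams are taken within each dot-separated label, which is what the facebook.com example implies since no bigram such as "kc" spans the dot; the function name domain_ngrams is hypothetical.

```python
from collections import Counter


def domain_ngrams(domain, n):
    """Return the character n-grams of each dot-separated label, in order."""
    grams = []
    for label in domain.lower().split("."):
        # A label shorter than n contributes no n-grams.
        grams.extend(label[i:i + n] for i in range(len(label) - n + 1))
    return grams


if __name__ == "__main__":
    print(domain_ngrams("facebook.com", 2))
    print(Counter(domain_ngrams("facebook.com", 1)))
```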
26. BIGRAM FREQUENCY OF LOG FILE
Given a DNS log file containing a list of domain names as follows:
• google.com
• facebook.co.id
• apple.com
• youtube.com
• klikbca.com
• twitter.com
• detik.com
The sorted bigram frequencies would be:
• co = 7
• om = 6
• ik = 2
• le = 2
• oo = 2
• ac = 1
• ca = 1
• it = 1
• ce = 1
• ap = 1
• go = 1
• et = 1
• gl = 1
• er = 1
• pp = 1
• tw = 1
• tt = 1
• tu = 1
• li = 1
• ti = 1
• te = 1
• pl = 1
• be = 1
• de = 1
• yo = 1
• bc = 1
• bo = 1
• wi = 1
• fa = 1
• eb = 1
• kb = 1
• ok = 1
• og = 1
• ut = 1
• kl = 1
• ou = 1
• ub = 1
• id = 1
27. CONVERTING FREQUENCIES TO FREQUENCY RATIOS
• There are 38 distinct bigrams in the given DNS log file
• The 38 bigram frequencies sum to 52
• The most frequent bigram occurs 7 times, a frequency ratio of 7/52
• The least frequent bigrams occur once each, a frequency ratio of 1/52
• Therefore the maximum and minimum bigram frequency ratios are 0.1346 and 0.0192 respectively
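The counts and ratios for the seven-domain log can be checked with the sketch below. It assumes bigrams are counted within each dot-separated label; the names LOG_DOMAINS, bigram_counts and frequency_ratios are hypothetical.

```python
from collections import Counter

# The seven domains from the example DNS log.
LOG_DOMAINS = [
    "google.com", "facebook.co.id", "apple.com", "youtube.com",
    "klikbca.com", "twitter.com", "detik.com",
]


def bigram_counts(domains):
    """Count character bigrams inside every dot-separated label."""
    counts = Counter()
    for domain in domains:
        for label in domain.lower().split("."):
            counts.update(label[i:i + 2] for i in range(len(label) - 1))
    return counts


def frequency_ratios(counts):
    """Divide each count by the total, most frequent bigram first."""
    total = sum(counts.values())
    return {gram: freq / total for gram, freq in counts.most_common()}


if __name__ == "__main__":
    counts = bigram_counts(LOG_DOMAINS)
    ratios = frequency_ratios(counts)
    print(len(counts), sum(counts.values()))
    print(round(max(ratios.values()), 4), round(min(ratios.values()), 4))
```

Plotting these ratios in rank order gives the curve whose shape is compared between log files.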
33. AGD VS HGD
• The graphs show that Algorithmically-Generated Domains (AGDs), such as the Conficker and
Pykspa worm domains, produce a relatively straight line, while Human-Generated Domains (HGDs),
like Alexa’s Top 500 sites, produce an elbow-shaped curve.
• This observation leads to a formula for calculating the probability that a given log file
contains DGA domains, i.e. that a DGA attack is under way. The higher the DGA probability, the more
likely it is that a DGA attack is ongoing within the monitored log.
35. DISCOVERING DGA WITHIN LOG FILES
• Further observation of the polluted log file (identified using Zipf’s Law) reveals one of the most
prominent DGA characteristics, which lets us distinguish AGDs from HGDs better: the Maximum
Consonant Sequence (MCS) Length. Generally, AGDs have a larger MCS Length than HGDs.
• Example:
• google.com has a maximum consonant sequence length of 2, since the longest consonant sequence is “gl”
• vofwxlbi.cn, one of the domains generated by the Conficker worm, has a Maximum Consonant Sequence Length of
5, and the longest sequence is “fwxlb”
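The two examples can be checked with a small sketch. It assumes the vowels are a, e, i, o and u, so "y" counts as a consonant, and that dots, digits and hyphens break a sequence; the function name max_consonant_sequence is hypothetical.

```python
import re

# Runs of consonants; "y" is treated as a consonant (an assumption).
CONSONANT_RUN = re.compile(r"[bcdfghjklmnpqrstvwxyz]+")


def max_consonant_sequence(domain):
    """Return the longest run of consecutive consonants in a domain name."""
    runs = CONSONANT_RUN.findall(domain.lower())
    return max(runs, key=len, default="")


if __name__ == "__main__":
    for name in ("google.com", "vofwxlbi.cn"):
        run = max_consonant_sequence(name)
        print(name, run, len(run))
```

A threshold on this length could then flag suspicious entries in a log, though the cut-off value would have to be chosen from real data.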
42. COUNTERMEASURES – DNS RPZ
• Obtain the daily DGA feed from http://data.netlab.360.com/feeds/dga/dga.txt
• Parse it using the dnsanalysis library in Python
• Export the result to a text file and load it into DNS RPZ
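The export step can be sketched with the standard library alone, without the dnsanalysis library, whose API is not shown here. The sketch assumes the feed is a text file with "#" comment lines and whitespace-separated fields in which the second field is the domain; that layout and the function name feed_to_rpz are assumptions. Each domain becomes an RPZ record with "CNAME .", the RPZ action that makes the resolver answer NXDOMAIN.

```python
def feed_to_rpz(lines):
    """Turn feed lines into RPZ records that answer NXDOMAIN."""
    records = []
    seen = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 2:
            continue  # not in the assumed "family domain ..." layout
        domain = fields[1].lower().rstrip(".")
        if domain not in seen:
            seen.add(domain)
            # Owner names are relative to the RPZ zone origin.
            records.append("{} CNAME .".format(domain))
    return records


if __name__ == "__main__":
    sample = [
        "# hypothetical sample, not real feed data",
        "examplefamily\tabcdefgh.com\t2017-01-01 00:00:00\t2017-01-01 23:59:59",
    ]
    print("\n".join(feed_to_rpz(sample)))
```

The resulting lines would go into the body of the RPZ zone file, below its SOA and NS records.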
43. REFERENCES
• Botnet Communication Topologies
https://www.damballa.com/downloads/r_pubs/WP_Botnet_Communications_Primer.pdf
• A Comprehensive Measurement Study of Domain Generating Malware
https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_plohmann.pdf
• DGArchive – A deep dive into domain generating malware
https://www.botconf.eu/wp-content/uploads/2015/12/OK-P06-Plohmann-DGArchive.pdf
• Using DNS RPZ to Block Malicious DNS Requests
https://blogs.cisco.com/security/using-dns-rpz-to-block-malicious-dns-requests