1. DRIEI PhD Program in Electronic and Computer Engineering
PhD School in Information Engineering
Host and Network based Anomaly
Detectors for HTTP Attacks
By: Davide Ariu
Advisor: Prof. Giorgio Giacinto
Pattern Recognition and Applications Group
Department of Electrical and Electronic Engineering
University of Cagliari, Italy
2. Outline
• Web Applications
– Motivations
– Overview
• Intrusion Detection Systems
– Network vs. Host‐based IDS
– Signature‐based IDS
– Anomaly‐based IDS
• Network Based IDS: Payload Analysis
– State of the Art
– Contribution #1: McPAD
– Contribution #2: HMMPayl
• Host Based IDS: Request URI Analysis
– Contribution #3: HMM‐Web
• Conclusions
March 5, 2010 Host and Network based Anomaly Detectors for HTTP Attacks ‐ Davide Ariu
3. Web Applications Security
Motivations
• More than 200,000,000 sites (January 2010)¹
– A lot of sensitive data is sent every day over the network
• Cybercriminals are interested in sensitive data:
– E.g. Credit card numbers
– E.g. Bank account credentials
– E.g. Identity theft. The full identity of a European citizen might be quite valuable to a
terrorist because of the free circulation within European Union countries.
• Vulnerabilities in Web Applications
– More than 50% of the vulnerabilities discovered during the first half of 2009 affected Web Applications²
¹ Source: Netcraft.com
² Source: X‐Force Mid‐year report 2009
4. Web Applications
Overview
HTTP Request (HTTP Payload)
Request line (contains the Request URI /pra/index.php?lang=eng):
GET /pra/index.php?lang=eng HTTP/1.1
Headers:
Host: prag.diee.unica.it
User-Agent: Mozilla/5.0
Connection: keep-alive
Accept-Encoding: gzip,deflate
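As a minimal sketch of how the parts of this request can be separated, the following Python snippet splits the raw text into the request line and the headers. The function name `parse_http_request` is hypothetical and the parsing is deliberately simplified (no body, no folded headers).

```python
def parse_http_request(raw: str):
    """Split a raw HTTP request into method, Request URI, version and headers."""
    lines = raw.strip().splitlines()
    # The request line has three space-separated parts.
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # an empty line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, request_uri, version, headers


RAW_REQUEST = """GET /pra/index.php?lang=eng HTTP/1.1
Host: prag.diee.unica.it
User-Agent: Mozilla/5.0
Connection: keep-alive
Accept-Encoding: gzip,deflate
"""

method, request_uri, version, headers = parse_http_request(RAW_REQUEST)
print(request_uri)
print(headers["Host"])
```

A network-based detector would look at the whole payload, while a host-based one could focus on the Request URI alone.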
5. Intrusion Detection Systems
Network vs. Host‐based IDS
• Based on the source of the audited data, IDS
can be classified as:
• Network‐based IDS
– Monitor the network activity
– A single IDS can monitor an entire network
• Host‐based IDS
– Analyze the activity of a specific host
6. Intrusion Detection Systems
Signature‐based IDS
• Signature (or misuse) based systems
– Each attack is described by one or more signatures
• E.g. A certain sequence of bytes is found within a payload
• E.g. An application receives a certain input value
• Drawbacks:
– Signatures can be extracted only from known attacks
• Vulnerable to zero‐day (that is, never seen before) attacks
– A signature is ineffective against variants of the same
attack (polymorphism)
– It is difficult to keep up with the large number of attacks
that appear every day
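The weakness against variants can be illustrated with a toy byte-pattern matcher. This is only a sketch: the signature names, the patterns and the encoded variant are hypothetical and are not taken from any real signature database.

```python
# Hypothetical signature database: attack name -> byte pattern.
SIGNATURES = {
    "cmd.exe access": b"cmd.exe",
    "long run of 'a' bytes": b"a" * 64,
}


def match_signatures(payload: bytes) -> list[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


known_attack = b"GET /d/winnt/system32/cmd.exe?/c+dir HTTP/1.0"
# Same request with the dot URL-encoded: the byte pattern no longer matches.
variant = b"GET /d/winnt/system32/cmd%2eexe?/c+dir HTTP/1.0"

print(match_signatures(known_attack))
print(match_signatures(variant))
```

The first payload is flagged while the variant goes undetected, even though a server that decodes the URI could treat both the same way.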
7. Intrusion Detection Systems
Anomaly‐based IDS
• Anomaly‐based IDS rely on a model of the normal
behavior of the resource to be protected
• The normal behavior of a resource is “a set of
characteristics that are observed during its
normal operation”.
• Advantages:
– Both known and unknown attacks can be detected
• Anomaly‐based IDS can cope with zero‐day attacks
8. Intrusion Detection Systems
Performance Evaluation
• IDS are usually evaluated in terms of:
– Detection Rate (or True Positive Rate)
• The percentage of attacks detected
– False Positive (or False Alarm) Rate
• The percentage of legitimate patterns wrongly classified as
attacks
– Area Under the ROC Curve (AUC)
• It allows the IDS to be evaluated over all the possible operating
points
• We considered a Partial AUC (AUCp) obtained with a
maximum false positive rate of 0.1
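The three measures can be sketched in a few lines of Python. This is an illustration under stated assumptions: higher scores mean "more anomalous", the ROC curve is integrated with the trapezoidal rule, and the partial area is divided by the maximum false positive rate so that a perfect detector scores 1 (the slides do not say whether AUCp was normalized this way).

```python
def rates(scores_attack, scores_normal, threshold):
    """Detection rate and false positive rate when score >= threshold raises an alarm."""
    dr = sum(s >= threshold for s in scores_attack) / len(scores_attack)
    fpr = sum(s >= threshold for s in scores_normal) / len(scores_normal)
    return dr, fpr


def partial_auc(scores_attack, scores_normal, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], normalized by max_fpr."""
    thresholds = sorted(set(scores_attack) | set(scores_normal), reverse=True)
    points = [(0.0, 0.0)]  # ROC points as (fpr, dr), in increasing order
    for t in thresholds:
        dr, fpr = rates(scores_attack, scores_normal, t)
        points.append((fpr, dr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr


# Hypothetical anomaly scores, for illustration only.
attack_scores = [0.9, 0.8]
normal_scores = [0.1, 0.2]
print(rates(attack_scores, normal_scores, threshold=0.5))
print(partial_auc(attack_scores, normal_scores))
```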
10. Network Based IDS
Payload Analysis
11. Payload Analysis
Rationale
• The assumption behind IDS based on payload
statistics is that normal and attack payloads
have different distributions of bytes.
• Attacks can be detected if they make payload
statistics deviate from those of the normal
traffic.
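The rationale can be made concrete with a small sketch that compares byte distributions. The L1 distance used here is only an illustrative choice, not the measure used by any of the systems discussed, and the length of the overflow payload (500 bytes) is a hypothetical value.

```python
from collections import Counter


def byte_distribution(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = Counter(payload)
    total = len(payload)
    return [counts.get(b, 0) / total for b in range(256)]


def l1_distance(p, q):
    """Sum of absolute differences between two distributions (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(p, q))


normal = b"GET /pra/ita/home.php HTTP/1.1\r\nHost: prag.diee.unica.it\r\n"
similar = b"GET /pra/index.php?lang=eng HTTP/1.1\r\nHost: prag.diee.unica.it\r\n"
overflow = b"HEAD / " + b"a" * 500  # hypothetical long-request payload

model = byte_distribution(normal)
print(l1_distance(model, byte_distribution(similar)))
print(l1_distance(model, byte_distribution(overflow)))
```

The overflow payload lies much farther from the normal model than another legitimate request does, which is the kind of deviation a payload-statistics detector relies on.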
12. Payload Analysis
Motivations
A normal payload
GET /pra/ita/home.php HTTP/1.1
Host: prag.diee.unica.it
Connection: Keep-alive
Accept: text/*, text/html
Accept-Encoding: compress, gzip
Accept-Language: it, en-gb
Long Request Buffer Overflow attack
HEAD / aaaaaaa…aaaaaaaaaaaa
URL Decoding Error attack
GET /d/winnt/system32/cmd.exe?/c+dir HTTP/1.0
Host: www
Connection: close
13. Payload Analysis
Motivations
14. State of the Art: PAYL¹
• PAYL is based on n‐gram analysis, a technique that
was proposed to solve text classification problems²:
– A sliding window of width n runs over the payload
– The occurrences of n‐grams are counted and their relative frequencies
are calculated
– Example n=1 (1‐gram): 4 3 3 1 3 4 2 3 3 4
– Example n=2 (2‐gram): 4 3 3 1 3 4 2 3 3 4
¹ Wang et al., “Anomalous Payload‐based Network Intrusion Detection”, RAID Int. Symposium, 2004.
² Damashek, “Gauging similarity with n‐Grams: Language‐independent Categorization of Text”, Science, 1995.
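The sliding-window counting can be sketched as follows, using the toy sequence from the slide. The function name `ngram_frequencies` is hypothetical; this only illustrates n‐gram extraction and is not PAYL's actual implementation.

```python
from collections import Counter


def ngram_frequencies(sequence, n):
    """Relative frequency of each n-gram seen by a window of width n sliding by one."""
    windows = [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]
    counts = Counter(windows)
    return {gram: count / len(windows) for gram, count in counts.items()}


payload = [4, 3, 3, 1, 3, 4, 2, 3, 3, 4]
freq_1gram = ngram_frequencies(payload, 1)
freq_2gram = ngram_frequencies(payload, 2)

print(freq_1gram)
print(freq_2gram)
```

With n=1 the window yields 10 observations over 4 distinct symbols; with n=2 it yields 9 observations over 7 distinct pairs.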
15. State of the Art: PAYL
• PAYL is quite effective but:
– A value of n=1 doesn’t take into account the structure
of the payload
• It might be quite simple for an attacker to mimic
distributions of 1‐grams¹
• It is difficult to detect attacks that only slightly modify the
statistics of the payload
– To model the structure of the payload a value of n>=2
must be considered
• Since the payload is represented in a feature space of size
256^n, a value of n bigger than 2 can’t be used in practice
¹ Fogla et al., “Polymorphic Blending Attack”, USENIX Security Symposium, 2006.
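A quick calculation shows how fast the feature space grows with n (one feature per possible n‐gram of byte values):

```python
# Number of possible n-grams over byte values 0..255.
feature_space_sizes = {n: 256 ** n for n in range(1, 5)}

for n, size in feature_space_sizes.items():
    print(f"n={n}: {size:,} features")
```

Already at n=3 there are more than 16 million features, far more than the number of n‐grams a single payload contains.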
16. Original Contribution n°1
McPAD¹
Multiple Classifiers Payload Anomaly Detector
¹ R. Perdisci, D. Ariu, P. Fogla, G. Giacinto, W. Lee. McPAD: A Multiple classifier system
for accurate payload‐based anomaly detection. Computer Networks, 2009.
Special Issue on Traffic Classification and Its Applications to Modern Networks
17. McPAD
Multiple Classifiers Payload Anomaly Detector
• IDEA: The n‐gram analysis can be approximated using n‐1
classifiers, each of which works in a feature space of size
256^2
• We calculate relative frequencies of pairs of bytes from 0 to ν
positions away from each other (2‐ν‐gram analysis)
• Example: ν = 2 (equivalent to a 4‐gram)
2‐0‐gram: 4 3 3 1 3 4 2 3 3 4
2‐1‐gram: 4 3 3 1 3 4 2 3 3 4
2‐2‐gram: 4 3 3 1 3 4 2 3 3 4
• n = ν+2
• ν+1 feature spaces
• A clustering algorithm is applied
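A minimal sketch of the pair extraction on the toy sequence is shown below. It assumes that "ν positions away" means ν symbols lie between the two elements of a pair, so that ν=0 gives ordinary 2‐grams; the function name is hypothetical, and the feature reduction (clustering) and classification stages are not shown.

```python
from collections import Counter


def two_nu_gram_frequencies(sequence, nu):
    """Relative frequencies of pairs (x[i], x[i + nu + 1]), i.e. with nu symbols in between."""
    pairs = [(sequence[i], sequence[i + nu + 1]) for i in range(len(sequence) - nu - 1)]
    counts = Counter(pairs)
    return {pair: count / len(pairs) for pair, count in counts.items()}


payload = [4, 3, 3, 1, 3, 4, 2, 3, 3, 4]
max_nu = 2  # equivalent to a 4-gram: n = nu + 2

# One feature space (and, in McPAD, one classifier) per value of nu.
feature_spaces = {nu: two_nu_gram_frequencies(payload, nu) for nu in range(max_nu + 1)}

for nu, freqs in feature_spaces.items():
    print(nu, freqs)
```

Each of the ν+1 dictionaries lives in a space of at most 256^2 pairs, instead of the 256^4 dimensions a full 4‐gram model would need.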
18. McPAD
Scheme
[Diagram: PAYLOAD → Feature Extraction and Reduction → SVM 1, SVM 2, …, SVM k → MCS → label]
19. McPAD
Experimental Setup
• Legitimate traffic
– 7 days of HTTP requests toward the web server of the
College of Computing at Georgia Tech (GT).
– 5 days of HTTP requests from the DARPA dataset
• Attacks
– 66 Generic HTTP Attacks (Shellcode, DoS, Information
Leakage, etc.)
– 11 Shell‐code Attacks
– 96 polymorphic attacks generated with CLET
– 6339 Polymorphic Blending Attacks (PBA¹)
¹ Fogla et al., “Polymorphic Blending Attack”, USENIX Security Symposium, 2006.
20. McPAD
Experimental Results
Very low false positive rate
[Figure: ROC curves (Detection Rate vs. False Positive Rate) for PAYL (1‐gram) and McPAD]
21. McPAD
Experimental Results: MCS Benefits
The AUCp increases with the number of classifiers
[Figure: AUCp vs. Number of Models, for Shell‐code Attacks and Generic Attacks]
22. McPAD
Experimental Results: Increased Bayesian DR
[Figure: ROC curves (Detection Rate vs. False Positive Rate) for PAYL (1‐gram) and McPAD]
• Axelsson provided a definition of Bayesian Detection Rate¹
$$P(I \mid A) = \frac{2 \cdot 10^{-5} \, P(A \mid I)}{2 \cdot 10^{-5} \, P(A \mid I) + 0.99998 \cdot P(A \mid \neg I)}$$
where P(A | ¬I) is the False Positive rate
¹ Axelsson S., “The base‐rate fallacy and the difficulty of Intrusion Detection”, ACM TISSEC, 2000.
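The effect of the false positive rate on the Bayesian Detection Rate can be checked numerically. The sketch below evaluates the formula on this slide with the prior P(I) = 2·10⁻⁵ (so that 1 − P(I) = 0.99998); the detection rate of 0.9 and the false positive rates are hypothetical values chosen only for illustration.

```python
def bayesian_detection_rate(detection_rate, false_positive_rate, p_intrusion=2e-5):
    """P(I | A): probability that an alarm corresponds to a real intrusion."""
    numerator = p_intrusion * detection_rate
    denominator = numerator + (1.0 - p_intrusion) * false_positive_rate
    return numerator / denominator


# Hypothetical operating points with the same detection rate.
for fpr in (1e-2, 1e-3, 1e-5):
    print(fpr, bayesian_detection_rate(0.9, fpr))
```

Because intrusions are so rare, even a small false positive rate makes most alarms false, which is why a very low false positive rate matters more than a marginally higher detection rate.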
23. McPAD
Weakness
• The 2‐ν‐gram analysis only allows for an
approximate representation of n‐grams.
Question
– Is there any algorithm that has the same
expressive power as the n‐gram analysis but
doesn’t suffer from the same limitations in terms
of computational cost?
Answer
– Yes, we can use Hidden Markov Models
24. Original Contribution n°2
HMMPayl¹
Hidden Markov Models for the Analysis of the HTTP
Payload
¹ D. Ariu, G. Giacinto, R. Tronci. HMMPayl: an Intrusion Detection System based on
Hidden Markov Models. Submitted to Computers and Security, Elsevier, 2010.
25. HMMPayl
Hidden Markov Models for Payload Analysis
• IDEA: We can consider an n‐gram as a
sequence and model it using an HMM.
• Using the HMM we can associate a probability
with each sequence extracted from the payload.
• Starting from the probabilities associated with all
the sequences extracted from the payload we
can obtain an overall probability for it.
26. HMMPayl
A simple example
• E.g. Given a toy payload (with a window width = 5)
2 1 2 0 0 1 2 1 0 2
Sequence 1: 2 1 2 0 0 → HMM → 0.62
Sequence 2: 1 2 0 0 1 → HMM → 0.65
Sequence 3: 2 0 0 1 2 → HMM → 0.67
Sequence 4: 0 0 1 2 1 → HMM → 0.70
Sequence 5: 0 1 2 1 0 → HMM → 0.68
Sequence 6: 1 2 1 0 2 → HMM → 0.64
Probability of the payload = 0.66
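The procedure in this example can be sketched with a tiny discrete HMM and the standard forward algorithm. Everything about the model below is an assumption made for illustration: the two states, the probability tables, and the use of the arithmetic mean to combine the per-sequence probabilities (the mean of the six values listed on the slide is 0.66, which is consistent with averaging). The slide's probabilities are illustrative, so the numbers produced here will differ from them.

```python
def forward_probability(seq, start, trans, emit):
    """P(seq | HMM) computed with the forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][seq[0]] for s in range(n_states)]
    for symbol in seq[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def sliding_windows(payload, width):
    """All contiguous sub-sequences of the given width, sliding by one symbol."""
    return [payload[i:i + width] for i in range(len(payload) - width + 1)]


def payload_probability(payload, width, start, trans, emit):
    """Average of the probabilities assigned by the HMM to each window (assumed fusion rule)."""
    probs = [forward_probability(w, start, trans, emit) for w in sliding_windows(payload, width)]
    return sum(probs) / len(probs)


# Hypothetical 2-state HMM over the symbols {0, 1, 2}; not a trained model.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]

payload = [2, 1, 2, 0, 0, 1, 2, 1, 0, 2]
windows = sliding_windows(payload, 5)
score = payload_probability(payload, 5, START, TRANS, EMIT)

slide_probs = [0.62, 0.65, 0.67, 0.70, 0.68, 0.64]
print(windows)
print(score)
print(sum(slide_probs) / len(slide_probs))
```

In a real detector the HMM would be trained on legitimate traffic, and a payload whose overall probability falls below a threshold would be flagged as anomalous.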
27. HMMPayl
Scheme
28. HMMPayl
Experimental Setup
• Legitimate traffic
– 7 days of HTTP requests toward the web server of the College of
Computing at Georgia Tech (GT)
– 6 days of HTTP requests toward the web server of our
department (DIEE)
– 5 days of HTTP requests from the DARPA dataset
• Attacks
– 66 Generic HTTP Attacks (Shellcode, DoS, Information Leakage,
etc.)
– 11 Shell‐code Attacks
– 96 polymorphic attacks generated with CLET
– 38 Cross Site Scripting (XSS) and SQL‐Injection Attacks
29. HMMPayl
Experimental Results
AUCp increased with respect to McPAD
[Figure: results on Generic Attacks]
30. HMMPayl
Experimental Results: Classifiers (Ideal) Selection¹
¹ R. Tronci, G. Giacinto, F. Roli, “Dynamic score selection for fusion on multiple biometric matchers”, ICIAP 2007
31. HMMPayl
Experimental Results: Sequences Sampling
32. Host Based IDS
Analysis of the Request‐URI
33. Original Contribution n°3
HMM‐Web¹
Hidden Markov Models for Web Applications Protection
¹ I. Corona, D. Ariu, G. Giacinto. HMM‐Web: A framework for the detection of attacks
against web applications. IEEE International Conference on Communications, Dresden,
2009.
34. Analysis of the Request URI
Motivations
• Through the Request URI, input arguments can be
provided to the Web Application
– Input arguments are provided as attribute‐value pairs
• Normal requests should be generated by clicking
somewhere in a web page
– The position of attributes in the request depends on the
hyperlink
• An attribute can’t receive arbitrary values
– A model of the values that an attribute can receive is necessary
– It is important to distinguish between alphabetic characters,
digits and meta‐characters.
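One way to make the distinction between character classes explicit is to encode each attribute value before modeling it. The sketch below is an assumption about how such a codification could look: letters are mapped to `A`, digits to `N`, and meta‐characters are kept unchanged. The symbols and function names are hypothetical, and the example Request URI and injected value are illustrative.

```python
from urllib.parse import urlsplit, parse_qsl


def encode_value(value: str) -> str:
    """Map letters to 'A' and digits to 'N'; keep every other character as is."""
    encoded = []
    for ch in value:
        if ch.isalpha():
            encoded.append("A")
        elif ch.isdigit():
            encoded.append("N")
        else:
            encoded.append(ch)
    return "".join(encoded)


def attribute_value_pairs(request_uri: str):
    """Attribute-value pairs of the query string, in the order they appear."""
    return parse_qsl(urlsplit(request_uri).query, keep_blank_values=True)


pairs = attribute_value_pairs("/search.php?cat=32&key=hmm")
encoded = [(name, encode_value(value)) for name, value in pairs]

print(encoded)
print(encode_value("1' OR '1'='1"))
```

After encoding, legitimate values collapse into a few simple patterns, while an injected value stands out because of its meta‐characters.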
35. HMM‐Web
Scheme
GET /search.php?cat=32&key=hmm HTTP/1.1
HMM‐Web
Module: index.php
Module: search.php
Module: list.php
36. HMM‐Web
Scheme
GET /search.php?cat=32&key=hmm HTTP/1.1
Module: index.php
Module: search.php
– Sequence of Attributes: cat-key → HMM Ensemble
– Cat Attribute Value: 3-2 → HMM Ensemble
– Key Attribute Value: h-m-m → HMM Ensemble
Module: list.php
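The decomposition shown in this scheme can be sketched as follows: the request line is split into the target module, the sequence of attribute names, and one character sequence per attribute value. The function name is hypothetical and the HMM ensembles themselves are not implemented here; each returned sequence would simply be the input of the corresponding ensemble.

```python
from urllib.parse import urlsplit, parse_qsl


def decompose_request(request_line: str):
    """Split a request line into module, attribute sequence and per-attribute value sequences."""
    method, request_uri, version = request_line.split(" ", 2)
    parts = urlsplit(request_uri)
    module = parts.path.lstrip("/")
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    attribute_sequence = [name for name, _ in pairs]
    value_sequences = {name: list(value) for name, value in pairs}
    return module, attribute_sequence, value_sequences


module, attributes, values = decompose_request("GET /search.php?cat=32&key=hmm HTTP/1.1")

print(module)
print("-".join(attributes))
print({name: "-".join(chars) for name, chars in values.items()})
```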
37. Experimental Results
Effectiveness of attributes’ codification
The curve on the right has been obtained using the codification proposed by Kruegel et al. in “A multimodel approach to the
detection of web‐based attacks”, Computer Networks, 2005.
38. Conclusions ‐ 1
• With this research we addressed the problem of
protecting web applications
• We proposed Network‐based IDS that offer
protection against a wide range of attacks
• We proposed an IDS (McPAD) that achieved both
high classification accuracy and robustness
against evasion attempts
• We proposed an IDS (HMMPayl) that builds a
very accurate model of the payload,
outperforming previously proposed approaches
39. Conclusions ‐ 2
• We have shown that Multiple Classifiers are useful
to increase both the classification accuracy
and the robustness against evasion
attempts
• We also proposed a Host‐Based solution
(HMM‐Web) to model the input provided to
web applications.