2. u Dataset creation
u involves identifying representative network traffic for training and
testing. These datasets should be labeled indicating whether the
connection is normal or anomalous.
u Feature construction
u creates additional features with a better discriminative ability than the
initial feature set. This can bring significant improvement to
machine learning algorithms. Features can be constructed manually, or
by using data mining methods such as sequence analysis, association
mining, and frequent-episode mining.
u Reduction
u is commonly used to decrease the dimensionality of the dataset by
discarding any redundant or irrelevant features (feature selection, FS).
Data preprocessing
3. u comprehensively reviewing the features derived from
network traffic, and the related data preprocessing
techniques which have been used in anomaly-based NIDS
since 1999.
u grouping anomaly-based NIDS based on the types of
network traffic features used for detection. The aim is to
show where the majority of research has been focused.
The groups show a trend from previously using packet
header features exclusively, to using more payload
features.
paper main contributions
6. u Minimize data preprocessing requirements
u Real-time, High bandwidth links
u Summarizing a series of network packet headers into a
single flow record, such as NetFlow, further reduces
resource requirements
u Packet header approaches also have the advantage of
remaining valid when traffic payloads are encrypted, such
as with SSL sessions.
Packet header anomaly detection
7. u Data preprocessing to extract packet headers is
straightforward.
u Many software programs and libraries already exist to
process network traffic, e.g. libpcap, tcpdump, tshark,
tcptrace, Softflowd, NetFlow, and IPFIX implementations.
u The complex part of the data preprocessing is using
appropriate feature construction to derive more
discriminative features (e.g. time-based statistical
measures) from this basic traffic information.
8. u Only three papers use the basic features extracted directly
from individual packet headers without further feature
construction.
u PHAD
u to detect attacks against the TCP/IP stack, IDS evasion techniques,
imperfect attack code, and anomalous traffic from victim machines
u learns normal ranges for each packet header field at the data link
(Ethernet), Network (IP), and Transport/control (TCP, UDP, ICMP)
layers
u The result is 33 packet header fields used as basic features. The
possible numeric range of each packet header field is very large, so
to reduce this space, clustering is used.
u a univariate approach which cannot model dependencies
between features.
Packet header basic features
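The range-learning step described for PHAD can be sketched as follows. This is a minimal illustration, not PHAD's actual implementation: the class name `FieldModel`, the cap `max_ranges`, and the rule of merging the two ranges separated by the smallest gap are assumptions chosen to show how clustering keeps the per-field model small. Anomaly scoring is omitted.

```python
class FieldModel:
    """Learns a bounded set of value ranges for one packet header field."""

    def __init__(self, max_ranges=4):
        self.max_ranges = max_ranges  # hypothetical cap on stored ranges
        self.ranges = []              # sorted, non-overlapping [low, high] pairs

    def train(self, value):
        if self.contains(value):
            return
        self.ranges.append([value, value])
        self.ranges.sort()
        # Too many ranges: merge the two neighbours separated by the smallest gap.
        while len(self.ranges) > self.max_ranges:
            gaps = [self.ranges[i + 1][0] - self.ranges[i][1]
                    for i in range(len(self.ranges) - 1)]
            i = gaps.index(min(gaps))
            self.ranges[i][1] = self.ranges[i + 1][1]
            del self.ranges[i + 1]

    def contains(self, value):
        return any(low <= value <= high for low, high in self.ranges)


def anomalous_fields(models, packet):
    """Return the header fields whose value falls outside every learned range."""
    return [name for name, model in models.items()
            if name in packet and not model.contains(packet[name])]
```

Each field is checked on its own, which is exactly the univariate limitation noted above: a packet whose fields are individually normal but jointly unusual would pass.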
9. u SPADE: one of the first attempts to use an anomaly method for
portscan detection
u the basic features are instead used to build a normal traffic
distribution model for the monitored network.
u Traffic distributions are maintained in real time by tracking joint
probability measurements, e.g. P (source address, destination
address, destination port), or using a Bayes Network.
u During detection, packets are compared to the probability
distribution to calculate an anomaly score.
u By retaining these unusual packets, it is possible to look for
portscans over a much wider time window.
Packet header basic features
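The joint-probability idea behind SPADE can be illustrated with a small counting model. This is a sketch under stated assumptions: the class name `JointModel`, the add-one smoothing, and the use of the negative base-2 logarithm of the probability as the anomaly score are illustrative choices, not details taken from the slides.

```python
import math
from collections import Counter


class JointModel:
    """Tracks how often each (src, dst, dport) combination has been seen."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def update(self, src, dst, dport):
        self.counts[(src, dst, dport)] += 1
        self.total += 1

    def score(self, src, dst, dport):
        # Add-one smoothing (an assumption) keeps unseen combinations finite.
        p = (self.counts[(src, dst, dport)] + 1) / (self.total + 1)
        return -math.log2(p)
```

Rare combinations get high scores, so a threshold on the score decides which packets are retained for the later portscan analysis.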
10. u Attacks against wireless networks have also been detected using
packet headers, in this case from the MAC layer frame header.
u The approach requires tapping the local wireless network.
u Guennoun et al. (2008) perform preprocessing to extract all the frame
headers, convert any continuous features to categorical ones, and
derive new features
u A wrapper approach is then used to find the best set of features. It
uses a forward search algorithm which starts with the single most
relevant feature, tests it with a k-means classifier, and then iteratively
adds the next most relevant feature to the set. It was found that the
top eight ranked features produced a classifier with the best
accuracy.
Packet header basic features
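The forward search used in the wrapper approach can be sketched as below. The function name `forward_select` and the `evaluate` callback are hypothetical: `evaluate` stands in for training and testing a classifier (k-means in the reviewed work) on a feature subset and returning its accuracy.

```python
def forward_select(ranked_features, evaluate):
    """Greedy wrapper search over features already ranked by relevance.

    `evaluate` is a caller-supplied (hypothetical) function that trains and
    tests a classifier on a feature subset and returns its accuracy.
    """
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        # Add the next most relevant feature and re-evaluate the classifier.
        current = current + [feature]
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

For example, with a toy scorer that peaks at three features, `forward_select(["a", "b", "c", "d"], lambda s: -abs(len(s) - 3))` returns the first three features, mirroring how the top eight ranked features were selected in the study.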
12. u use complete network flows as data instances rather than
individual packet data.
u Analyzing flows provides more context than analyzing individual
packets standalone.
u Flows are unidirectional sequences of packets sharing a
common key such as the same source address and port, and
destination address and port.
u A flow is considered complete after a timeout period, or for TCP
when end-of-session flags (e.g. FIN or RST) are seen.
u A convenient way of obtaining flow information is to use
NetFlow records.
Single connection derived features
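The flow definition above (common key, timeout, TCP end-of-session flags) can be sketched as a small aggregator. This is an illustration only, not how NetFlow exporters are implemented: the `Packet` tuple, the 30-second idle timeout, and the single-letter flag strings are all assumptions.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "ts src sport dst dport proto length flags")


def build_flows(packets, idle_timeout=30.0):
    """Group time-ordered packets into unidirectional flow records."""
    active, finished = {}, []
    for p in packets:
        key = (p.src, p.sport, p.dst, p.dport, p.proto)
        flow = active.get(key)
        # An idle gap longer than the timeout closes the old flow.
        if flow and p.ts - flow["end"] > idle_timeout:
            finished.append(active.pop(key))
            flow = None
        if flow is None:
            flow = active[key] = {"key": key, "start": p.ts, "end": p.ts,
                                  "packets": 0, "bytes": 0}
        flow["end"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.length
        # A TCP FIN or RST ends the flow immediately.
        if p.proto == "tcp" and ("F" in p.flags or "R" in p.flags):
            finished.append(active.pop(key))
    return finished + list(active.values())
```

Because the key includes direction, the two halves of a TCP conversation produce two separate flow records.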
13. u Having a router generate NetFlow data saves the NIDS
from doing its own data preprocessing tasks such as
parsing of IP headers, maintaining packet counts, and
stream (flow) reassembly.
u Alternatively, NetFlow records can be produced on a
computer host using software such as softflowd.
u NetFlow records also significantly reduce the storage requirements
compared to full packet capture.
u NetFlow information is only based on packet headers, so
the transport payload is ignored.
SCD features
14. u The most common and important SCD features are
time-based statistical measures, obtained by monitoring basic
features over the duration of the flow.
u Examples
u counts of packets and bytes in the flow (as per NetFlow records),
u the average inter-packet arrival time,
u the mean packet length.
u These features are useful for fingerprinting sessions,
detecting unusual data flows, or finding other anomalies
within a single session.
SCD features
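The example SCD features listed above can be computed from the packets of a single flow roughly as follows. The function name `scd_features` and the input layout (parallel lists of timestamps and packet lengths) are assumptions made for the sketch.

```python
from statistics import mean


def scd_features(timestamps, lengths):
    """Summarize one flow given its packet timestamps and packet lengths."""
    # Inter-packet arrival times are the gaps between consecutive timestamps.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "packets": len(lengths),
        "bytes": sum(lengths),
        "duration": timestamps[-1] - timestamps[0],
        "mean_interarrival": mean(gaps) if gaps else 0.0,
        "mean_packet_length": mean(lengths),
    }
```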
15. u ANDSOM
u Data preprocessing first segments the dataset by service type (TCP or UDP) and
the application protocol (HTTP or SMTP).
u For each data segment a different model is created. In this case self-organizing
maps (SOM) are used.
u The calculated SCD features are quad, start time, end time, whether the session
had a valid start (2 SYN packets), whether the connection was closed properly
(FINs) or improperly (RST), number of queries per second, average size of questions,
average size of answers, question answer idle time, answer question idle time, and
the duration of the connection.
u These features provide a fingerprint for the session. During the detection phase the
data instances were compared to the appropriate SOM model to detect
anomalies in that service.
u Testing successfully found an injected BIND attack and an HTTP tunnel, both of
which are detectable within a single flow.
SCD features
16. u Yamada et al.
u use SCD features to find attacks against webservers when the traffic is
encrypted by SSL or TLS.
u only use information from the unencrypted protocol headers for
detection.
u The features used are :
u the HTTP request and response sizes, calculated across each continuous activity
of each user.
u Since using size features alone would produce many false positives,
frequency analysis is also performed to eliminate alerts common to the
webserver.
u Statistically rare alerts are flagged as anomalies.
SCD features
17. u Anomaly detection using only TCP flags as SCD features
u TCP flags are extracted from packets within each TCP session,
and each flag combination is quantized as a symbol.
u A separate model is produced for each of the observed protocols
SSH, HTTP and FTP
u During the detection phase, network traffic is evaluated against
the appropriate model for anomaly detection.
u The approach was found to detect scans initiated by nmap, and
SSH and HTTP misuse.
u While this approach detects attacks which modify TCP
characteristics, it is not likely to detect payload-based attacks.
SCD features
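Quantizing flag combinations into symbols can be sketched as below. The bit values follow the TCP header flag bits; everything else is an assumption, in particular the model: the slides do not say which model is trained on the symbols, so a simple set of observed symbol-to-symbol transitions stands in for it here.

```python
from collections import defaultdict

FLAG_BITS = {"F": 1, "S": 2, "R": 4, "P": 8, "A": 16, "U": 32}


def quantize(flags):
    """Map a flag combination such as "SA" to a single integer symbol."""
    return sum(FLAG_BITS[f] for f in set(flags))


def train_transitions(sessions):
    """Record which symbol-to-symbol transitions occur in normal sessions."""
    seen = defaultdict(set)
    for session in sessions:
        symbols = [quantize(f) for f in session]
        for a, b in zip(symbols, symbols[1:]):
            seen[a].add(b)
    return seen


def unseen_transitions(model, session):
    """Count transitions in a session that never occurred during training."""
    symbols = [quantize(f) for f in session]
    return sum(1 for a, b in zip(symbols, symbols[1:])
               if b not in model.get(a, ()))
```

One such model would be trained per protocol (SSH, HTTP, FTP), and a session with many unseen transitions would be flagged.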
18. u SCD features have been used to detect connections
which pass through multiple stepping stones (Yang and
Huang, 2007).
u SCD features are also used by Early and Brodley (2006).
Their aim is to automatically detect which application
protocol (e.g. SSH, telnet, SMTP, or HTTP) is being used
without using the destination port as a guide.
SCD features
19. u Are useful for finding anomalous behavior within a single
session, such as an unexpected protocol, unusual data
sizes, unusual packet timing, or unusual TCP flag
sequences.
u Particular detection capabilities include backdoors, HTTP
tunnels, stepping stones, BIND attacks, and command and
control channels.
u However, by themselves they cannot be used to find
activity spanning multiple flows such as DoS attacks or
network probes. For that, MCD features are required.
SCD features
20.
21. u Are constructed by monitoring base features over multiple
flows or connections
u They enable detection of anomalies which manifest
themselves as unusual patterns of traffic, such as network
probes and DoS attacks.
u Domain knowledge is used to choose a window of data to
consider.
u The time windows range from 5 s to 24 h, with shorter time
windows detecting bursty attacks, and long time windows
more likely to detect slow and stealthy attacks.
u Connection-based windows are also used, such as
analyzing the most recent 100 connections.
Multiple connection derived features
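Both kinds of window can be sketched with a pair of queues. This is an illustrative example, not a reconstruction of any reviewed system: the function name `mcd_counts`, the input tuple layout, and the choice of "connections to the same destination address and port" as the counted quantity are assumptions.

```python
from collections import deque


def mcd_counts(connections, time_window=2.0, conn_window=100):
    """For each connection, count earlier connections to the same destination.

    `connections` is a time-ordered list of (timestamp, dst_ip, dst_port).
    """
    recent = deque()                    # connections inside the time window
    last_n = deque(maxlen=conn_window)  # the most recent N connections
    features = []
    for ts, dst, port in connections:
        # Drop connections that have aged out of the time window.
        while recent and ts - recent[0][0] > time_window:
            recent.popleft()
        target = (dst, port)
        features.append({
            "same_dst_in_time_window":
                sum(1 for _, d, p in recent if (d, p) == target),
            "same_dst_in_conn_window":
                sum(1 for d, p in last_n if (d, p) == target),
        })
        recent.append((ts, dst, port))
        last_n.append(target)
    return features
```

A slow probe that never appears twice within a short time window can still accumulate a count in the connection-based window, which is why both are used.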
23.
24. u The KDD Cup 99 dataset has known limitations
u Advantages
u being publicly available, labeled, and preprocessed ready for
machine learning.
u Each network connection was processed into a labeled
vector of 41 features constructed using data mining
techniques and expert domain knowledge when creating
a machine learning misuse-based NIDS
KDD cup 99
25. u 9 basic and SCD header features for each connection
(similar to NetFlow)
u 9 time-based MCD header features constructed over a
2 s window
u 10 host-based MCD header features constructed over a
100 connection window to detect slow probes.
u 13 content-based features were constructed from the
traffic payloads using domain knowledge. Data mining
algorithms could not be used since the payloads were
unprocessed and therefore unstructured. They were
designed to specifically detect U2R and R2L attacks.
KDD 99 data preprocessing produced
27. u Many remote attacks on computers place the exploit
code inside the payload of network packets. Hence these
attacks are not directly detectable by packet header
approaches
u Payload attacks are more computationally expensive to
detect due to requiring deeper searches into network
sessions.
Content anomaly detection
28. u The SANS “Top Cyber Security Risks” 2009 report lists the top two
cyber risks as client-side software which remains
unpatched, and vulnerable Internet-facing websites.
u The first risk can be exploited using malicious content
destined for a client, while the second can be exploited
using crafted content in requests to servers.
u In these cases, bytes containing the exploit code are
contained within network packet payloads beyond the
TCP/IP headers, such as within downloaded files.
Content anomaly detection
29. u PAYL
u uses 1-grams and unsupervised learning to build a byte-frequency
distribution model of network traffic payloads.
u A 1-gram is simply a single byte with a value in the range 0 to 255. The
result of preprocessing a packet payload this way is a feature
vector containing the relative frequency count of each of the
256 possible 1-grams (bytes) in the payload.
u The model also includes the average frequency, as well as the
variance and standard deviation as other features.
u Separate models of normal traffic are created for each
combination of destination port and length of the flow.
N-gram analysis of requests to servers
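The 1-gram preprocessing and per-byte statistics described for PAYL can be sketched as follows. This is a simplified illustration: the function names, the smoothing constant `alpha`, and the use of a simplified Mahalanobis distance for comparison are assumptions here, and the separate models per destination port and flow length are left out.

```python
import math


def byte_frequencies(payload):
    """Relative frequency of each of the 256 byte values in a payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = len(payload) or 1
    return [c / n for c in counts]


def train(payloads):
    """Per-byte mean and standard deviation over normal payloads."""
    vectors = [byte_frequencies(p) for p in payloads]
    n = len(vectors)
    means = [sum(v[i] for v in vectors) / n for i in range(256)]
    stds = [math.sqrt(sum((v[i] - means[i]) ** 2 for v in vectors) / n)
            for i in range(256)]
    return means, stds


def distance(payload, model, alpha=0.001):
    """Simplified Mahalanobis distance; alpha avoids division by zero."""
    means, stds = model
    freqs = byte_frequencies(payload)
    return sum(abs(f - m) / (s + alpha)
               for f, m, s in zip(freqs, means, stds))
```

A payload whose byte distribution is far from the trained profile, such as a run of identical bytes, yields a much larger distance than a typical request.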
30. u PAYL was designed to detect zero-day worms, since flows with
worm payloads can produce an unusual byte-frequency
distribution.
u Testing was performed on all attacks in the DARPA 1999 dataset
using individual packets as data units (connection data units
were also attempted).
u The overall detection rate was close to 60% at a false positive
rate of less than 1%.
u The authors point to a large non-overlap between PAYL and
PHAD, with one modeling header data and the other modeling
payloads. The two approaches could complement each other.
31. u ANAGRAM also builds on PAYL, but uses a mixture of
high-order N-grams with N > 1.
u This reduces its susceptibility to mimicry attacks since
higher order N-grams are harder to emulate in padded
bytes.
u By contrast, PAYL can be easily evaded if normal byte
frequencies are known to an attacker since malicious
payloads can be padded with bytes to match it.
u ANAGRAM uses supervised learning to model normal
traffic by storing the N-grams of normal packets in one
Bloom filter.
N-gram analysis of requests to servers
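Storing N-grams of normal traffic in a Bloom filter can be sketched as below. This is not ANAGRAM's implementation: the filter size, the number of hash functions, the salted SHA-256 hashing, the choice of N = 5, and scoring by the fraction of unseen N-grams are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(payload, n=5):
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def train(bloom, payload, n=5):
    for gram in ngrams(payload, n):
        bloom.add(gram)


def score(bloom, payload, n=5):
    """Fraction of the payload's N-grams never seen in training."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in bloom) / len(grams)
```

The filter never forgets an N-gram it has stored, but it can occasionally report an unseen N-gram as present, so the score is a slight underestimate at worst.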
32. u Similarly, McPAD creates 2v-grams and uses a sliding window to
cover all pairs of bytes that are v positions apart in network traffic
payloads.
u Since each byte can have values in the range 0 to 255, and each
feature is a pair of bytes (n = 2), the feature space is 256^2 = 65,536.
By varying v, different feature spaces are constructed, each handled
by a different classifier.
u The dimensionality of the feature space is then reduced using a
clustering algorithm.
u Multiple one-class SVMs are used for classification, and a
meta-classifier combines these outputs into a final classification
prediction. The results of testing McPAD showed it could detect
shellcode attacks in HTTP requests.
N-gram analysis of requests to servers
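Extracting 2v-gram features with a sliding window can be sketched as follows. The function name is hypothetical, and the convention for v is an assumption: here v counts the bytes skipped between the two bytes of a pair, so v = 0 gives ordinary 2-grams. The features are kept in a sparse dictionary rather than a full 65,536-element vector.

```python
from collections import Counter


def two_v_gram_features(payload, v):
    """Relative frequency of byte pairs separated by v intervening bytes."""
    gap = v + 1  # assumption: v = 0 means adjacent bytes (plain 2-grams)
    pairs = Counter((payload[i], payload[i + gap])
                    for i in range(len(payload) - gap))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}
```

Calling the function with several values of v yields the different feature spaces, one per classifier in the ensemble.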
33. u Organizations may require additional monitoring of critical
applications.
u One method is to create an application-specific anomaly
detector, such as for web applications.
u An anomaly-based SQL injection detector: it was host-based and
relied on the interception of SQL statements between the
web application and the database.
Analysis of requests to web applications
34.
35. u Common network architectures ensure client hosts
(workstations) within an organization are not directly
exposed to the Internet at the network layer. This protects
the client hosts from external threats such as probes, DoS,
network worms and other attacks against open ports
(services).
u However, many other threats are faced by these clients,
particularly when they are exposed to untrusted code or
data.
Analysis of web content to clients
36.
37. u This review has identified the various feature sets used by
anomaly-based NIDS.
u When designing a NIDS, the choice of network traffic
features is largely driven by the detection requirements.
u If broad anomaly detection is desired, then separate
anomaly detectors should be built for each of the feature
sets.
u For more targeted anomaly detection, a single feature
set can be used.
Conclusion and Feature set
recommendation
38. u Packet header features have the advantages of
u being fast, with relatively low computation and memory overheads,
and avoiding some of the privacy and legal concerns regarding
network data analysis.
u Basic features can be used to
u flag single packets which are anomalous with respect to a normal
training model (e.g. PHAD),
u or as a filtering mechanism so only unusual packets are fed to
downstream algorithms (e.g. SPADE).
u Individual packets cannot be used to identify unusual trends or
patterns over time.
Conclusion and Feature set
recommendation
39. u To identify anomalous patterns across multiple packets, but
within a single connection, SCD header features are used.
u e.g. if all connections to port 80 on the local network are
expected to be HTTP traffic, but the timing of packets
within a monitored port 80 connection does not match an
HTTP profile, then an anomaly can be raised.
Conclusion and Feature set
recommendation
40. u MCD features are generally derived over a time window
of connections.
u Most MCD features are volume-based, such as the count
of connections to a particular destination IP address and
port in a given time window.
u MCD features can be easily used to detect unusual traffic
volumes associated with DoS attacks or scanning
behavior, but at the cost of overlooking individual
anomalous packets (since these will not meet the
volume-based threshold).
Conclusion and Feature set
recommendation
41. u Packet header feature limitations:
u packet header approaches cannot be used to directly detect
attacks aimed at applications, since the attack bytes are
embedded in the packet body.
u many of today’s exploits are directed at applications rather than
network services.
u E.g. buffer overflow attacks against web servers, web
application exploits, and attacks targeting web clients
such as drive-by-downloads.
Conclusion and Feature set
recommendation
42. u NIDS must use payload-based features extracted from packet
bodies to detect these types of attacks, since the packet
headers can remain completely normal.
u Payload analysis is more computationally expensive than
header analysis. This is due to requiring deeper packet
inspection, dealing with a variety of payload types (HTML, XML,
pdf, jpg, etc.), transfer encoding (gzip, Base64), and
obfuscation techniques.
u The advantage of payload analysis is having access to all bytes
transferred between network devices.
u This allows a rich set of payload-based features to be
constructed for anomaly detection.
Conclusion and Feature set
recommendation
43. u Due to the complexity of payload analysis, many techniques focus on
small subsets of the payload, e.g. the HTTP request, or only the
JavaScript sections of downloaded web content.
u The anomaly-based techniques do not try to match signatures of
known malware; however, they can apply heuristics such as pattern
matching for the presence of shellcode, or highlighting suspiciously
long strings which may indicate a buffer overflow attempt.
u The reviewed payload based approaches derive features from either
the payload of a single connection or a user application session, and
compare the features to a normal model.
u In effect these are SCD payload-based features. Extending this
approach to multiple connections to produce MCD payload-based
features could allow different types of anomalies to stand out, e.g.
detecting an unusually large number of HTTP redirects in a network
could indicate a widespread infection attempt.
Conclusion and Feature set
recommendation