By: F. Noorbehbahani
Fall 2013
Data preprocessing for anomaly-based network intrusion detection: A review
• Dataset creation
  • Involves identifying representative network traffic for training and testing. These datasets should be labeled to indicate whether each connection is normal or anomalous.
• Feature construction
  • Creates additional features with better discriminative ability than the initial feature set. This can bring significant improvement to machine learning algorithms. Features can be constructed manually, or by using data mining methods such as sequence analysis, association mining, and frequent-episode mining.
• Reduction
  • Commonly used to decrease the dimensionality of the dataset by discarding redundant or irrelevant features (feature selection, FS).
Data preprocessing
• Comprehensively reviewing the features derived from network traffic, and the related data preprocessing techniques, which have been used in anomaly-based NIDS since 1999.
• Grouping anomaly-based NIDS by the types of network traffic features used for detection. The aim is to show where the majority of research has been focused. The groups show a trend from using packet header features exclusively to using more payload features.
Paper main contributions
Anomaly-based features
  Packet header
    Basic
    Single connection
    Multiple connection
  Protocol based
    Specification based
    Parser based
    AP Keyboard based
  KDD Cup 99
  Payload based
    N-gram analysis of requests to servers
    Analysis of requests to web applications
    General payload pattern matching
    Analysis of web content to clients
• Minimizes data preprocessing requirements.
• Suitable for real-time detection on high-bandwidth links.
• Summarizing a series of network packet headers into a single flow record, such as NetFlow, further reduces resource requirements.
• Packet header approaches also have the advantage of remaining valid when traffic payloads are encrypted, such as with SSL sessions.
Packet header anomaly detection
• Data preprocessing to extract packet headers is straightforward.
• Many software programs and libraries already exist to process network traffic, e.g. libpcap, tcpdump, tshark, tcptrace, softflowd, NetFlow, and IPFIX implementations.
• The complex part of the data preprocessing is using appropriate feature construction to derive more discriminative features (e.g. time-based statistical measures) from this basic traffic information.
• Only three papers use the basic features extracted directly from individual packet headers without further feature construction.
• PHAD
  • Aims to detect attacks against the TCP/IP stack, IDS evasion techniques, imperfect attack code, and anomalous traffic from victim machines.
  • Learns normal ranges for each packet header field at the data link (Ethernet), network (IP), and transport/control (TCP, UDP, ICMP) layers.
  • The result is 33 packet header fields used as basic features. The possible numeric range of each packet header field is very large, so clustering is used to reduce this space.
  • It is a univariate approach, which cannot model dependencies between features.
Packet header basic features
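The idea of learning normal values per header field and checking each field on its own can be sketched as follows. This is a simplified illustration, not PHAD's actual model or scoring: the field names, the helpers `learn_ranges` and `anomalous_fields`, and the single min/max range per field are all assumptions made for the example (the slide notes that PHAD uses clustering to reduce the value space).

```python
from typing import Dict, Iterable, List, Tuple

Packet = Dict[str, int]  # hypothetical parsed header: field name -> numeric value


def learn_ranges(training: Iterable[Packet]) -> Dict[str, Tuple[int, int]]:
    """Learn a (min, max) range per header field from attack-free packets."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for pkt in training:
        for name, value in pkt.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def anomalous_fields(pkt: Packet, ranges: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the fields whose value falls outside the learned range.

    Each field is checked independently (univariate), so an unusual
    combination of individually normal values is not flagged.
    """
    flagged = []
    for name, value in pkt.items():
        if name not in ranges:
            flagged.append(name)
            continue
        lo, hi = ranges[name]
        if not lo <= value <= hi:
            flagged.append(name)
    return flagged


# Made-up training packets and one test packet.
normal = [
    {"ip_ttl": 64, "tcp_dport": 80, "ip_len": 60},
    {"ip_ttl": 63, "tcp_dport": 443, "ip_len": 1500},
]
model = learn_ranges(normal)
flagged = anomalous_fields({"ip_ttl": 255, "tcp_dport": 80, "ip_len": 40}, model)
print(flagged)  # ['ip_ttl', 'ip_len']
```

Because every field is judged separately, this sketch shares the limitation noted above: it cannot see dependencies between fields.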
• SPADE: one of the first attempts to use an anomaly method for portscan detection.
  • The basic features are used to build a normal traffic distribution model for the monitored network.
  • Traffic distributions are maintained in real time by tracking joint probability measurements, e.g. P(source address, destination address, destination port), or using a Bayes network.
  • During detection, packets are compared to the probability distribution to calculate an anomaly score.
  • By retaining these unusual packets, it is possible to look for portscans over a much wider time window.
Packet header basic features
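A joint-probability anomaly score of the kind described for SPADE might look like the sketch below. The class name, the add-one smoothing, and the use of a negative base-2 logarithm as the score are assumptions of this example, not details taken from the slides.

```python
import math
from collections import Counter
from typing import Tuple

Key = Tuple[str, str, int]  # (source address, destination address, destination port)


class JointProbabilityModel:
    """Tracks how often each (src, dst, dport) combination has been seen."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def update(self, key: Key) -> None:
        self.counts[key] += 1
        self.total += 1

    def anomaly_score(self, key: Key) -> float:
        """Negative log2 of the estimated probability: rarer means higher.

        Add-one smoothing (an assumption of this sketch) keeps unseen
        combinations from producing log(0).
        """
        p = (self.counts[key] + 1) / (self.total + 1)
        return -math.log2(p)


# Made-up traffic: one very common combination and one seen once.
spade_like = JointProbabilityModel()
for _ in range(99):
    spade_like.update(("10.0.0.5", "10.0.0.1", 80))
spade_like.update(("10.0.0.9", "10.0.0.1", 22))

common = spade_like.anomaly_score(("10.0.0.5", "10.0.0.1", 80))
unseen = spade_like.anomaly_score(("10.0.0.7", "10.0.0.1", 31337))
print(common < unseen)  # True
```

Packets whose score exceeds a chosen threshold could then be retained for portscan analysis over a longer time window.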
• Attacks against wireless networks have also been detected using packet headers, in this case from the MAC layer frame header.
• The approach requires tapping the local wireless network.
• Guennoun et al. (2008) perform preprocessing to extract all the frame headers, convert any continuous features to categorical ones, and derive new features.
• A wrapper approach is then used to find the best set of features. It uses a forward search algorithm which starts with the single most relevant feature, tests it with a k-means classifier, and then iteratively adds the next most relevant feature to the set. The top eight ranked features were found to produce the classifier with the best accuracy.
Packet header basic features
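The forward search over ranked features can be sketched generically. In this example the `evaluate` callback stands in for training and scoring the k-means classifier, and the feature names and accuracy values are made up for illustration; none of them come from Guennoun et al.

```python
from typing import Callable, List, Sequence, Tuple


def forward_search(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Grow the feature set in rank order and keep the best-scoring prefix.

    `ranked_features` is ordered from most to least relevant.
    `evaluate` trains and scores a classifier on a feature subset and
    returns its accuracy (higher is better).
    """
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(list(current))
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-in for the classifier: accuracy depends only on subset size.
toy_accuracy = {1: 0.70, 2: 0.82, 3: 0.91, 4: 0.88}
subset, accuracy = forward_search(
    ["f1", "f2", "f3", "f4"], lambda chosen: toy_accuracy[len(chosen)]
)
print(subset, accuracy)  # ['f1', 'f2', 'f3'] 0.91
```

As a wrapper method, the cost is one classifier training run per candidate subset, which is why the search follows a fixed ranking instead of trying every combination.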
Packet header basic features
• Use complete network flows as data instances rather than individual packet data.
• Analyzing flows provides more context than analyzing individual packets in isolation.
• Flows are unidirectional sequences of packets sharing a common key, such as the same source address and port, and destination address and port.
• A flow is complete after a timeout period or, for TCP, when end-of-session flags (e.g. FIN or RST) are seen.
• A convenient way of obtaining flow information is to use NetFlow records.
Single connection derived features
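The flow definition above (a shared key, closed by timeout or by FIN/RST) can be sketched as a small assembler. The `Packet` record, the 60-second default timeout, and the function name are assumptions for illustration; real NetFlow exporters have their own timeout and export rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FlowKey = Tuple[str, int, str, int]  # (src addr, src port, dst addr, dst port)


@dataclass
class Packet:
    ts: float                        # capture timestamp in seconds
    src: str
    sport: int
    dst: str
    dport: int
    length: int                      # packet length in bytes
    flags: frozenset = frozenset()   # TCP flags, e.g. frozenset({"FIN"})


def assemble_flows(packets: List[Packet], timeout: float = 60.0) -> List[List[Packet]]:
    """Group packets into unidirectional flows.

    A flow is closed when a FIN or RST is seen, or when the gap since its
    previous packet exceeds `timeout` (an illustrative default).
    """
    active: Dict[FlowKey, List[Packet]] = {}
    done: List[List[Packet]] = []
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = (pkt.src, pkt.sport, pkt.dst, pkt.dport)
        flow = active.get(key)
        if flow and pkt.ts - flow[-1].ts > timeout:
            done.append(active.pop(key))   # idle too long: close the old flow
            flow = None
        if flow is None:
            flow = active[key] = []
        flow.append(pkt)
        if pkt.flags & {"FIN", "RST"}:
            done.append(active.pop(key))   # TCP end-of-session flag
    done.extend(active.values())           # flush flows still open at end of input
    return done


# Made-up packets: a client-to-server flow ended by FIN, plus two
# server-to-client packets separated by more than the timeout.
pkts = [
    Packet(0.0, "a", 1000, "b", 80, 60),
    Packet(1.0, "a", 1000, "b", 80, 500),
    Packet(1.5, "b", 80, "a", 1000, 1500),
    Packet(2.0, "a", 1000, "b", 80, 40, frozenset({"FIN"})),
    Packet(100.0, "b", 80, "a", 1000, 40),
]
flows = assemble_flows(pkts)
print([len(f) for f in flows])  # [3, 1, 1]
```

Because the key includes direction, the two halves of a TCP conversation end up in separate flows, matching the unidirectional definition.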
• Having a router generate NetFlow data saves the NIDS from doing its own data preprocessing tasks such as parsing IP headers, maintaining packet counts, and stream (flow) reassembly.
• Alternatively, NetFlow records can be produced on a computer host using software such as softflowd.
• NetFlow records also significantly reduce the storage requirements compared to full packet capture.
• NetFlow information is based only on packet headers, so the transport payload is ignored.
SCD features
• The most common and important SCD features are time-based statistical measures, obtained by monitoring basic features over the duration of the flow.
• Examples
  • Counts of packets and bytes in the flow (as per NetFlow records)
  • The average inter-packet arrival time
  • The mean packet length
• These features are useful for fingerprinting sessions, detecting unusual data flows, or finding other anomalies within a single session.
SCD features
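The example SCD features listed above can be computed from one flow as in the following sketch. The input format (a list of timestamp and length pairs) and the feature names in the returned dictionary are assumptions chosen for the example.

```python
from statistics import mean
from typing import Dict, List, Tuple


def scd_features(flow: List[Tuple[float, int]]) -> Dict[str, float]:
    """Compute simple per-flow statistics.

    `flow` is a non-empty, time-ordered list of
    (timestamp_seconds, packet_length_bytes) pairs from one connection.
    """
    times = [ts for ts, _ in flow]
    sizes = [size for _, size in flow]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "packets": len(flow),
        "bytes": sum(sizes),
        "duration": times[-1] - times[0],
        "mean_interarrival": mean(gaps) if gaps else 0.0,
        "mean_packet_length": mean(sizes),
    }


# Three made-up packets in one flow.
features = scd_features([(0.0, 60), (0.5, 1500), (2.0, 40)])
print(features["packets"], features["bytes"], features["mean_interarrival"])  # 3 1600 1.0
```

A vector like this gives one data instance per flow, which is what a per-service anomaly model would be trained on.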
• ANDSOM
  • Data preprocessing first segments the dataset by service type (TCP or UDP) and application protocol (HTTP or SMTP).
  • A different model is created for each data segment. In this case self-organizing maps (SOM) are used.
  • The calculated SCD features are quad, start time, end time, whether the session had a valid start (2 SYN packets), whether the connection was closed properly (FINs) or improperly (RST), number of queries per second, average size of questions, average size of answers, question-answer idle time, answer-question idle time, and the duration of the connection.
  • These features provide a fingerprint for the session. During the detection phase, the data instances were compared to the appropriate SOM model to detect anomalies in that service.
  • Testing successfully found an injected BIND attack and an HTTP tunnel, both of which are detectable within a single flow.
SCD features
• Yamada et al.
  • Use SCD features to find attacks against web servers when the traffic is encrypted by SSL or TLS.
  • Only information from the unencrypted protocol headers is used for detection.
  • The features used are:
    • The HTTP request and response sizes, calculated across each continuous activity of each user.
  • Since using size features alone would produce many false positives, frequency analysis is also performed to eliminate alerts common to the web server.
  • Statistically rare alerts are flagged as anomalies.
SCD features
• Anomaly detection using only TCP flags as SCD features
  • TCP flags are extracted from packets within each TCP session, and each flag combination is quantized as a symbol.
  • A separate model is produced for each of the observed protocols: SSH, HTTP, and FTP.
  • During the detection phase, network traffic is evaluated against the appropriate model for anomaly detection.
  • The approach was found to detect scans initiated by nmap, and SSH and HTTP misuse.
  • While this approach detects attacks which modify TCP characteristics, it is not likely to detect payload-based attacks.
SCD features
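Quantizing flag combinations into symbols can be sketched as below. The bit values follow the standard TCP header flag bits, but the model itself is an assumption: the slides do not say how the symbol sequences were modeled, so this example simply records which consecutive symbol pairs occur in normal sessions and reports the fraction of unseen pairs.

```python
from typing import Iterable, List, Set, Tuple

FLAG_BITS = {"FIN": 1, "SYN": 2, "RST": 4, "PSH": 8, "ACK": 16, "URG": 32}


def to_symbol(flags: Iterable[str]) -> int:
    """Quantize a TCP flag combination into a single integer symbol."""
    symbol = 0
    for name in flags:
        symbol |= FLAG_BITS[name]
    return symbol


def learn_transitions(sessions: Iterable[List[int]]) -> Set[Tuple[int, int]]:
    """Record every pair of consecutive symbols seen in normal sessions."""
    seen: Set[Tuple[int, int]] = set()
    for symbols in sessions:
        seen.update(zip(symbols, symbols[1:]))
    return seen


def unseen_fraction(symbols: List[int], model: Set[Tuple[int, int]]) -> float:
    """Fraction of consecutive symbol pairs that the model has never seen."""
    pairs = list(zip(symbols, symbols[1:]))
    if not pairs:
        return 0.0
    return sum(pair not in model for pair in pairs) / len(pairs)


# A made-up normal session and a made-up scan-like session.
normal_session = [
    to_symbol(f)
    for f in (["SYN"], ["SYN", "ACK"], ["ACK"], ["PSH", "ACK"], ["FIN", "ACK"])
]
http_model = learn_transitions([normal_session])
scan_session = [to_symbol(["FIN", "PSH", "URG"]), to_symbol(["RST"])]
print(normal_session)                             # [2, 18, 16, 24, 17]
print(unseen_fraction(scan_session, http_model))  # 1.0
```

One such transition set would be kept per protocol, and a session would be checked against the set for its own protocol.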
• SCD features have been used to detect connections which pass through multiple stepping stones (Yang and Huang, 2007).
• SCD features are also used by Early and Brodley (2006). Their aim is to automatically detect which application protocol (e.g. SSH, telnet, SMTP, or HTTP) is being used without using the destination port as a guide.
SCD features
• SCD features are useful for finding anomalous behavior within a single session, such as an unexpected protocol, unusual data sizes, unusual packet timing, or unusual TCP flag sequences.
• Particular detection capabilities include backdoors, HTTP tunnels, stepping stones, BIND attacks, and command and control channels.
• However, by themselves they cannot be used to find activity spanning multiple flows, such as DoS attacks or network probes. For that, MCD features are required.
SCD features
• MCD features are constructed by monitoring base features over multiple flows or connections.
• They enable detection of anomalies which manifest themselves as unusual patterns of traffic, such as network probes and DoS attacks.
• Domain knowledge is used to choose a window of data to consider.
• The time windows range from 5 s to 24 h, with shorter time windows detecting bursty attacks and longer time windows more likely to detect slow and stealthy attacks.
• Connection-based windows are also used, such as analyzing the most recent 100 connections.
Multiple connection derived features
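Time-based and connection-based windows can be sketched with two queues. The class name, the connection tuple layout, and the choice to count connections to the same destination address and port are assumptions for illustration; the 5 s and 100-connection defaults echo the figures on the slide.

```python
from collections import deque
from typing import Deque, Tuple

Conn = Tuple[float, str, int]  # (start time in seconds, destination address, destination port)


class WindowedCounts:
    """Counts recent connections that share the newest connection's destination."""

    def __init__(self, time_window: float = 5.0, conn_window: int = 100) -> None:
        self.time_window = time_window
        self.by_time: Deque[Conn] = deque()
        self.by_count: Deque[Conn] = deque(maxlen=conn_window)

    def add(self, conn: Conn) -> Tuple[int, int]:
        """Add a connection (in time order) and return two counts:
        same-destination connections within the time window, and
        same-destination connections among the last `conn_window`."""
        ts, dst, dport = conn
        self.by_time.append(conn)
        self.by_count.append(conn)
        # Drop connections that have fallen out of the time window.
        while ts - self.by_time[0][0] > self.time_window:
            self.by_time.popleft()
        in_time = sum(1 for _, d, p in self.by_time if (d, p) == (dst, dport))
        in_count = sum(1 for _, d, p in self.by_count if (d, p) == (dst, dport))
        return in_time, in_count


# Made-up connections.
window = WindowedCounts(time_window=5.0, conn_window=100)
results = [
    window.add((0.0, "10.0.0.1", 80)),
    window.add((1.0, "10.0.0.1", 80)),
    window.add((2.0, "10.0.0.2", 22)),
    window.add((10.0, "10.0.0.1", 80)),
]
print(results)  # [(1, 1), (2, 2), (1, 1), (1, 3)]
```

The last result shows the trade-off described above: the short time window has forgotten the earlier connections, while the connection-based window still counts them.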
• The KDD Cup 99 dataset has known limitations.
• Advantages
  • Publicly available, labeled, and preprocessed ready for machine learning.
• Each network connection was processed into a labeled vector of 41 features, constructed using data mining techniques and expert domain knowledge when creating a machine learning misuse-based NIDS.
KDD Cup 99
• 9 basic and SCD header features for each connection (similar to NetFlow).
• 9 time-based MCD header features constructed over a 2 s window.
• 10 host-based MCD header features constructed over a 100-connection window to detect slow probes.
• 13 content-based features constructed from the traffic payloads using domain knowledge. Data mining algorithms could not be used since the payloads were unprocessed and therefore unstructured. These features were designed specifically to detect U2R and R2L attacks.
KDD 99 data preprocessing produced
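As a quick arithmetic check, the four feature groups listed for KDD Cup 99 add up to the 41-feature vector. The dictionary keys are descriptive labels written for this example, not official field names.

```python
# Feature group sizes as listed for a KDD Cup 99 connection record.
kdd99_feature_groups = {
    "basic and SCD header": 9,
    "time-based MCD header (2 s window)": 9,
    "host-based MCD header (100-connection window)": 10,
    "content-based (payload)": 13,
}
total_features = sum(kdd99_feature_groups.values())
print(total_features)  # 41
```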
MCD features
• Many remote attacks on computers place the exploit code inside the payload of network packets. Hence these attacks are not directly detectable by packet header approaches.
• Payload attacks are more computationally expensive to detect because they require deeper searches into network sessions.
Content anomaly detection
• The SANS "Top Cyber Security Risks" 2009 report lists the top two cyber risks as client-side software which remains unpatched, and vulnerable Internet-facing websites.
• The first risk can be exploited using malicious content destined for a client, while the second can be exploited using crafted content in requests to servers.
• In these cases, the bytes containing the exploit code are within network packet payloads beyond the TCP/IP headers, such as within downloaded files.
Content anomaly detection
• PAYL
  • Uses 1-grams and unsupervised learning to build a byte-frequency distribution model of network traffic payloads.
  • A 1-gram is simply a single byte with a value in the range 0 to 255. The result of preprocessing a packet payload this way is a feature vector containing the relative frequency count of each of the 256 possible 1-grams (bytes) in the payload.
  • The model also includes the average frequency, as well as the variance and standard deviation, as other features.
  • Separate models of normal traffic are created for each combination of destination port and length of the flow.
N-gram analysis of requests to servers
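The 1-gram preprocessing and a per-byte mean and standard deviation model can be sketched as follows. The distance function is a simplified Mahalanobis-style measure chosen for this example, and the smoothing value `alpha`, the function names, and the sample payloads are assumptions; the sketch also keeps a single model instead of one per destination port and flow length.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def byte_frequencies(payload: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = Counter(payload)
    n = len(payload) or 1  # avoid dividing by zero for an empty payload
    return [counts[b] / n for b in range(256)]


def train(payloads: Sequence[bytes]) -> Tuple[List[float], List[float]]:
    """Per-byte mean and standard deviation over normal payloads."""
    vectors = [byte_frequencies(p) for p in payloads]
    columns = list(zip(*vectors))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def distance(payload: bytes, means: List[float], stds: List[float],
             alpha: float = 0.001) -> float:
    """Sum of per-byte deviations from the mean, scaled by the deviation.

    `alpha` is a smoothing term (value chosen arbitrarily here) so that
    bytes with zero variance do not cause division by zero.
    """
    freqs = byte_frequencies(payload)
    return sum(abs(f - m) / (s + alpha) for f, m, s in zip(freqs, means, stds))


# Made-up training payloads and two test payloads.
normal_payloads = [
    b"GET /index.html HTTP/1.1",
    b"GET /about.html HTTP/1.1",
    b"GET /img/logo.png HTTP/1.1",
]
means, stds = train(normal_payloads)
d_normal = distance(b"GET /index.htm HTTP/1.1", means, stds)
d_attack = distance(b"\x90" * 40 + b"\xcc\xcd\x80", means, stds)
print(d_attack > d_normal)  # True
```

A payload dominated by byte values never seen in training lands far from the profile, which is the effect PAYL relies on for worm payloads.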
• PAYL was designed to detect zero-day worms, since flows with worm payloads can produce an unusual byte-frequency distribution.
• Testing was performed on all attacks in the DARPA 1999 dataset using individual packets as data units (connection data units were also attempted).
• The overall detection rate was close to 60% at a false positive rate of less than 1%.
• The authors point to a large non-overlap between PAYL and PHAD, with one modeling header data and the other modeling payloads. The two approaches could complement each other.
• ANAGRAM also builds on PAYL, but uses a mixture of high-order N-grams with N > 1.
• This reduces its susceptibility to mimicry attacks, since higher-order N-grams are harder to emulate in padded bytes.
• By contrast, PAYL can be easily evaded if normal byte frequencies are known to an attacker, since malicious payloads can be padded with bytes to match them.
• ANAGRAM uses supervised learning to model normal traffic by storing N-grams of normal packets in a Bloom filter.
N-gram analysis of requests to servers
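Storing N-grams of normal payloads in a Bloom filter and scoring new payloads by their share of unseen N-grams can be sketched as below. The filter size, the number of hash functions, the SHA-256 based hashing, and the single fixed n = 5 are assumptions for this example; the slide describes a mixture of N-gram sizes.

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.size = size_bits                  # assumed to be a multiple of 8
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(payload: bytes, n: int) -> Iterable[bytes]:
    """All overlapping n-byte substrings of the payload."""
    return (payload[i:i + n] for i in range(len(payload) - n + 1))


def train_filter(payloads: Iterable[bytes], n: int = 5) -> BloomFilter:
    """Insert every n-gram of the normal payloads into one Bloom filter."""
    bloom = BloomFilter()
    for payload in payloads:
        for gram in ngrams(payload, n):
            bloom.add(gram)
    return bloom


def unseen_score(payload: bytes, bloom: BloomFilter, n: int = 5) -> float:
    """Fraction of the payload's n-grams not found in the filter."""
    grams = list(ngrams(payload, n))
    if not grams:
        return 0.0
    return sum(g not in bloom for g in grams) / len(grams)


# Made-up normal request used for training.
bloom = train_filter([b"GET /index.html HTTP/1.1"])
print(unseen_score(b"GET /index.html HTTP/1.1", bloom))  # 0.0
```

Padding an attack with common single bytes does not help against this kind of model, because the padded payload must also reproduce the normal byte sequences, not only their frequencies.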
• Similarly, McPAD creates 2v-grams and uses a sliding window to cover all pairs of bytes that are v positions apart in network traffic payloads.
• Since each byte can have values in the range 0 to 255, and each gram contains n = 2 bytes, the feature space has 256^2 = 65,536 dimensions. By varying v, different feature spaces are constructed, each handled by a different classifier.
• The dimensionality of the feature space is then reduced using a clustering algorithm.
• Multiple one-class SVMs are used for classification, and a meta-classifier combines their outputs into a final classification prediction. Testing showed that McPAD could detect shellcode attacks in HTTP requests.
N-gram analysis of requests to servers
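Extracting byte pairs separated by a fixed distance can be sketched as follows. The `gap` parameter here is simply the index distance between the two bytes; how it maps onto McPAD's v is left as an assumption, and the clustering and SVM stages are not shown.

```python
from collections import Counter
from typing import Dict, Tuple


def pair_frequencies(payload: bytes, gap: int) -> Dict[Tuple[int, int], float]:
    """Relative frequency of byte pairs (payload[i], payload[i + gap])."""
    pairs = Counter(zip(payload, payload[gap:]))
    total = sum(pairs.values())
    if not total:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


def feature_index(pair: Tuple[int, int]) -> int:
    """Position of a byte pair in the 256 * 256 = 65,536-dimensional space."""
    first, second = pair
    return first * 256 + second


# b"abab" is bytes 97, 98, 97, 98.
adjacent = pair_frequencies(b"abab", 1)   # pairs (97, 98), (98, 97), (97, 98)
spaced = pair_frequencies(b"abab", 2)     # pairs (97, 97), (98, 98)
print(spaced)                             # {(97, 97): 0.5, (98, 98): 0.5}
print(feature_index((255, 255)))          # 65535
```

Each choice of gap yields a different sparse vector over the same 65,536 positions, which is what allows a separate classifier per gap.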
• Organizations may require additional monitoring of critical applications.
• One method is to create an application-specific anomaly detector, such as for web applications.
• An anomaly-based SQL injection detector: host-based, relying on the interception of SQL statements between the web application and the database.
Analysis of requests to web applications
• Common network architectures ensure client hosts (workstations) within an organization are not directly exposed to the Internet at the network layer. This protects the client hosts from external threats such as probes, DoS, network worms, and other attacks against open ports (services).
• However, many other threats are faced by these clients, particularly when they are exposed to untrusted code or data.
Analysis of web content to clients
• This review has identified the various feature sets used by anomaly-based NIDS.
• When designing a NIDS, the choice of network traffic features is largely driven by the detection requirements.
• If broad anomaly detection is desired, then separate anomaly detectors should be built for each of the feature sets.
• For more targeted anomaly detection, a single feature set can be used.
Conclusion and feature set recommendation
• Packet header features have the advantages of
  • being fast, with relatively low computation and memory overheads, and avoiding some of the privacy and legal concerns regarding network data analysis.
• Basic features can be used to
  • flag single packets which are anomalous with respect to a normal training model (e.g. PHAD),
  • or act as a filtering mechanism so only unusual packets are fed to downstream algorithms (e.g. SPADE).
• Individual packets cannot be used to identify unusual trends or patterns over time.
Conclusion and feature set recommendation
• To identify anomalous patterns across multiple packets, but within a single connection, SCD header features are used.
• E.g. if all connections to port 80 on the local network are expected to be HTTP traffic, but the timing of packets within a monitored port 80 connection does not match an HTTP profile, then an anomaly can be raised.
Conclusion and feature set recommendation
• MCD features are generally derived over a time window of connections.
• Most MCD features are volume-based, such as the count of connections to a particular destination IP address and port in a given time window.
• MCD features can be easily used to detect unusual traffic volumes associated with DoS attacks or scanning behavior, but at the cost of overlooking individual anomalous packets (since these will not meet the volume-based threshold).
Conclusion and feature set recommendation
• Packet header feature limitations:
  • Packet header approaches cannot be used to directly detect attacks aimed at applications, since the attack bytes are embedded in the packet body.
  • Many of today's exploits are directed at applications rather than network services.
• E.g. buffer overflow attacks against web servers, web application exploits, and attacks targeting web clients such as drive-by downloads.
Conclusion and feature set recommendation
• NIDS must use payload-based features extracted from packet bodies to detect these types of attacks, since the packet headers can remain completely normal.
• Payload analysis is more computationally expensive than header analysis. This is due to requiring deeper packet inspection and dealing with a variety of payload types (HTML, XML, PDF, JPG, etc.), transfer encodings (gzip, Base64), and obfuscation techniques.
• The advantage of payload analysis is having access to all bytes transferred between network devices.
• This allows a rich set of payload-based features to be constructed for anomaly detection.
Conclusion and feature set recommendation
• Due to the complexity of payload analysis, many techniques focus on small subsets of the payload, e.g. the HTTP request, or only the JavaScript sections of downloaded web content.
• The anomaly-based techniques do not try to match signatures of known malware. However, they can apply heuristics such as pattern matching for the presence of shellcode, or highlighting suspiciously long strings which may indicate a buffer overflow attempt.
• The reviewed payload-based approaches derive features from either the payload of a single connection or a user application session, and compare the features to a normal model.
• In effect these are SCD payload-based features. Extending this approach to multiple connections to produce MCD payload-based features could allow different types of anomalies to stand out, e.g. an unusually large number of HTTP redirects in a network could indicate a widespread infection attempt.
Conclusion and feature set recommendation
List of features
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermicultureTakeleZike1
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxzaydmeerab121
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squaresusmanzain586
 

Dernier (20)

Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Biological classification of plants with detail
Biological classification of plants with detailBiological classification of plants with detail
Biological classification of plants with detail
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermiculture
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 

Noorbehbahani data preprocessing for anomaly based network intrusion

  • 1. Data preprocessing for anomaly-based network intrusion detection: A review. By F. Noorbehbahani, Fall 2013.
  • 2. Data preprocessing. Dataset creation: involves identifying representative network traffic for training and testing. These datasets should be labeled to indicate whether each connection is normal or anomalous. Feature construction: creates additional features with better discriminative ability than the initial feature set, which can bring significant improvement to machine learning algorithms. Features can be constructed manually, or by using data mining methods such as sequence analysis, association mining, and frequent-episode mining. Reduction: commonly used to decrease the dimensionality of the dataset by discarding any redundant or irrelevant features (FS).
  • 3. Main contributions of the paper. A comprehensive review of the features derived from network traffic, and of the related data preprocessing techniques, which have been used in anomaly-based NIDS since 1999. A grouping of anomaly-based NIDS by the types of network traffic features used for detection. The aim is to show where the majority of research has been focused. The groups show a trend from using packet header features exclusively to using more payload features.
  • 4.
  • 5. Anomaly-based features. Packet header: basic, single connection, multiple connection. Protocol based: specification based, parser based, AP Keyboard based. KDD Cup 99. Payload based: N-gram analysis of requests to servers, analysis of requests to web applications, general payload pattern matching, analysis of web content to clients.
  • 6. Packet header anomaly detection. Minimizes data preprocessing requirements. Suits real-time detection on high-bandwidth links. Summarizing a series of network packet headers into a single flow record, such as NetFlow, further reduces resource requirements. Packet header approaches also have the advantage of remaining valid when traffic payloads are encrypted, such as with SSL sessions.
  • 7. Data preprocessing to extract packet headers is straightforward. Many software programs and libraries already exist to process network traffic, e.g. libpcap, tcpdump, tshark, tcptrace, softflowd, and NetFlow and IPFIX implementations. The complex part of the data preprocessing is using appropriate feature construction to derive more discriminative features (e.g. time-based statistical measures) from this basic traffic information.
  • 8. Packet header basic features. Only three papers use the basic features extracted directly from individual packet headers without further feature construction. PHAD aims to detect attacks against the TCP/IP stack, IDS evasion techniques, imperfect attack code, and anomalous traffic from victim machines. It learns normal ranges for each packet header field at the data link (Ethernet), network (IP), and transport/control (TCP, UDP, ICMP) layers. The result is 33 packet header fields used as basic features. The possible numeric range of each packet header field is very large, so clustering is used to reduce this space. PHAD is a univariate approach, which cannot model dependencies between features.
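As a rough illustration of the "learn normal ranges per header field" idea, the following Python sketch records one minimum/maximum range per field and reports fields that fall outside it. This is a deliberate simplification and not PHAD itself: the slide says clustering is used to keep ranges compact, whereas here a single range per field is assumed, and the field names (`ip_ttl`, `tcp_dport`) are hypothetical.

```python
def learn_ranges(packets):
    """Record the smallest and largest value seen for each header field.

    `packets` is an iterable of dicts mapping field name -> numeric value
    (a hypothetical, already-parsed representation of packet headers).
    """
    ranges = {}
    for pkt in packets:
        for field, value in pkt.items():
            lo, hi = ranges.get(field, (value, value))
            ranges[field] = (min(lo, value), max(hi, value))
    return ranges


def anomalous_fields(pkt, ranges):
    """Return the fields of `pkt` whose value lies outside the learned range."""
    flagged = []
    for field, value in pkt.items():
        if field not in ranges:
            flagged.append(field)  # field never seen during training
            continue
        lo, hi = ranges[field]
        if not lo <= value <= hi:
            flagged.append(field)
    return flagged


training = [{"ip_ttl": 64, "tcp_dport": 80}, {"ip_ttl": 128, "tcp_dport": 443}]
ranges = learn_ranges(training)
print(anomalous_fields({"ip_ttl": 255, "tcp_dport": 80}, ranges))
```

Each field is checked independently, which mirrors the univariate limitation noted on the slide: a packet whose fields are individually normal but jointly unusual would not be flagged.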
  • 9. Packet header basic features. SPADE was one of the first attempts to use an anomaly method for portscan detection. The basic features are instead used to build a normal traffic distribution model for the monitored network. Traffic distributions are maintained in real time by tracking joint probability measurements, e.g. P(source address, destination address, destination port), or by using a Bayes network. During detection, packets are compared to the probability distribution to calculate an anomaly score. By retaining these unusual packets, it is possible to look for portscans over a much wider time window.
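A minimal Python sketch of scoring packets against a joint probability table might look like the following. It assumes the anomaly score is the negative base-2 logarithm of the estimated joint probability, and it uses a crude add-one adjustment so that unseen combinations get a finite score; both choices, along with the class and method names, are illustrative assumptions rather than details taken from the slides.

```python
import math
from collections import Counter


class JointModel:
    """Tracks how often each (source, destination, destination port) occurs."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, src, dst, dport):
        self.counts[(src, dst, dport)] += 1
        self.total += 1

    def score(self, src, dst, dport):
        # Add-one adjustment (an assumption) keeps unseen tuples finite.
        p = (self.counts[(src, dst, dport)] + 1) / (self.total + 1)
        return -math.log2(p)  # rarer combinations give higher scores


model = JointModel()
for _ in range(7):
    model.observe("10.0.0.1", "10.0.0.2", 80)

common = model.score("10.0.0.1", "10.0.0.2", 80)
rare = model.score("10.0.0.9", "10.0.0.2", 23)
print(common, rare)
```

Packets whose score exceeds a chosen threshold could then be retained for longer-term portscan correlation, as the slide describes.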
  • 10. Packet header basic features. Attacks against wireless networks have also been detected using packet headers, in this case from the MAC layer frame header. The approach requires tapping the local wireless network. Guennoun et al. (2008) perform preprocessing to extract all the frame headers, convert any continuous features to categorical ones, and derive new features. A wrapper approach is then used to find the best set of features. It uses a forward search algorithm which starts with the single most relevant feature, tests it with a k-means classifier, and then iteratively adds the next most relevant feature to the set. It was found that the top eight ranked features produced a classifier with the best accuracy.
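The forward search described above can be sketched in Python as follows. The `evaluate` callback is a hypothetical stand-in for "train and test a k-means classifier on this feature subset and return its accuracy"; the feature names and scores in the usage lines are made up purely to exercise the loop.

```python
def forward_select(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best.

    ranked_features: feature names ordered from most to least relevant.
    evaluate: callable taking a list of features and returning a score
              (hypothetical; e.g. classifier accuracy on held-out data).
    """
    selected = []
    best_set, best_score = [], float("-inf")
    for feature in ranked_features:
        selected.append(feature)
        score = evaluate(selected)
        if score > best_score:
            best_set, best_score = list(selected), score
    return best_set, best_score


# Toy scores keyed by subset size, for illustration only.
toy_scores = {1: 0.70, 2: 0.90, 3: 0.85}
best, score = forward_select(["f1", "f2", "f3"], lambda fs: toy_scores[len(fs)])
print(best, score)
```

Because the classifier is rerun for every candidate subset, a wrapper approach like this costs more than ranking features once, but it selects features by their effect on the final classifier.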
  • 12. Single connection derived features. These approaches use complete network flows as data instances rather than individual packet data. Analyzing flows provides more context than analyzing individual packets standalone. Flows are unidirectional sequences of packets sharing a common key such as the same source address and port, and destination address and port. A flow is considered complete after a timeout period, or for TCP when end-of-session flags (e.g. FIN or RST) are seen. A convenient way of obtaining flow information is to use NetFlow records.
  • 13. SCD features. Having a router generate NetFlow data saves the NIDS from doing its own data preprocessing tasks such as parsing of IP headers, maintaining packet counts, and stream (flow) reassembly. Alternatively, NetFlow records can be produced on a computer host using software such as softflowd. NetFlow records also significantly reduce the storage requirements compared to full packet capture. NetFlow information is only based on packet headers, so the transport payload is ignored.
  • 14. SCD features. The most common and important SCD features are time-based statistical measures, obtained by monitoring basic features over the duration of the flow. Examples: counts of packets and bytes in the flow (as per NetFlow records), the average inter-packet arrival time, and the mean packet length. These features are useful for fingerprinting sessions, detecting unusual data flows, or finding other anomalies within a single session.
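To make the example features concrete, the Python sketch below groups packets into unidirectional flows by (source address, source port, destination address, destination port) and computes the packet count, byte count, mean packet length, and mean inter-packet arrival time for each flow. The tuple layout of the input is an assumption, and flow timeouts and TCP end-of-session flags are ignored for brevity.

```python
from collections import defaultdict
from statistics import mean


def flow_features(packets):
    """Compute per-flow statistics.

    packets: iterable of (timestamp, src, sport, dst, dport, length) tuples
             (a hypothetical pre-parsed header format).
    """
    flows = defaultdict(list)
    for ts, src, sport, dst, dport, length in packets:
        flows[(src, sport, dst, dport)].append((ts, length))

    features = {}
    for key, pkts in flows.items():
        pkts.sort()  # order by timestamp
        times = [t for t, _ in pkts]
        sizes = [n for _, n in pkts]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "mean_length": mean(sizes),
            "mean_interarrival": mean(gaps) if gaps else 0.0,
        }
    return features


packets = [
    (0.0, "a", 1234, "b", 80, 100),
    (0.5, "a", 1234, "b", 80, 300),
    (1.5, "a", 1234, "b", 80, 200),
    (0.2, "b", 80, "a", 1234, 60),  # reverse direction is a separate flow
]
feats = flow_features(packets)
print(feats[("a", 1234, "b", 80)])
```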
  • 15. SCD features. ANDSOM: data preprocessing first segments the dataset by service type (TCP or UDP) and the application protocol (HTTP or SMTP). For each data segment a different model is created. In this case self-organizing maps (SOM) are used. The calculated SCD features are quad, start time, end time, whether the session had a valid start (2 SYN packets), whether the connection was closed properly (FINs) or improperly (RST), number of queries per second, average size of questions, average size of answers, question-answer idle time, answer-question idle time, and the duration of the connection. These features provide a fingerprint for the session. During the detection phase the data instances were compared to the appropriate SOM model to detect anomalies in that service. Testing successfully found an injected BIND attack and an HTTP tunnel, both of which are detectable within a single flow.
  • 16. SCD features. Yamada et al. use SCD features to find attacks against webservers when the traffic is encrypted by SSL or TLS. They only use information from the unencrypted protocol headers for detection. The features used are the HTTP request and response sizes, calculated across each continuous activity of each user. Since using size features alone would produce many false positives, frequency analysis is also performed to eliminate alerts common to the webserver. Statistically rare alerts are flagged as anomalies.
  • 17. SCD features. Anomaly detection using only TCP flags as SCD features. TCP flags are extracted from packets within each TCP session, and each flag combination is quantized as a symbol. A separate model is produced for each of the observed protocols SSH, HTTP and FTP. During the detection phase, network traffic is evaluated against the appropriate model for anomaly detection. The approach was found to detect scans initiated by nmap, and SSH and HTTP misuse. While this approach detects attacks which modify TCP characteristics, it is not likely to detect payload-based attacks.
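One way to picture "each flag combination is quantized as a symbol" is the Python sketch below, which maps a set of TCP flags to its bitmask value and models a protocol as the set of symbol-to-symbol transitions seen in normal sessions. The slide does not say which model was actually used, so the transition-set model here is an assumption chosen for brevity.

```python
# Standard TCP flag bit values.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def symbol(flags):
    """Quantize a combination of flag names into one integer symbol."""
    value = 0
    for name in flags:
        value |= FLAG_BITS[name]
    return value


def train(sessions):
    """Collect every symbol-to-symbol transition seen in normal sessions."""
    seen = set()
    for session in sessions:
        symbols = [symbol(pkt_flags) for pkt_flags in session]
        seen.update(zip(symbols, symbols[1:]))
    return seen


def unseen_transitions(session, seen):
    """Count transitions in a session that never occurred in training."""
    symbols = [symbol(pkt_flags) for pkt_flags in session]
    return sum(1 for pair in zip(symbols, symbols[1:]) if pair not in seen)


normal = [[{"SYN"}, {"SYN", "ACK"}, {"ACK"}, {"PSH", "ACK"}, {"FIN", "ACK"}]]
model = train(normal)
probe = [{"FIN", "PSH", "URG"}, {"RST"}]  # flag pattern typical of a scan
print(unseen_transitions(normal[0], model), unseen_transitions(probe, model))
```

In practice one such model would be trained per protocol (for example SSH, HTTP and FTP), and each session scored against the model for its protocol.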
  • 18. SCD features. SCD features have been used to detect connections which pass through multiple stepping stones (Yang and Huang, 2007). SCD features are also used by Early and Brodley (2006). Their aim is to automatically detect which application protocol (e.g. SSH, telnet, SMTP, or HTTP) is being used without using the destination port as a guide.
  • 19. SCD features. They are useful for finding anomalous behavior within a single session, such as an unexpected protocol, unusual data sizes, unusual packet timing, or unusual TCP flag sequences. Particular detection capabilities include backdoors, HTTP tunnels, stepping stones, BIND attacks, and command and control channels. However, by themselves they cannot be used to find activity spanning multiple flows such as DoS attacks or network probes. For that, MCD features are required.
  • 20.
  • 21. Multiple connection derived features. They are constructed by monitoring base features over multiple flows or connections. They enable detection of anomalies which manifest themselves as unusual patterns of traffic, such as network probes and DoS attacks. Domain knowledge is used to choose a window of data to consider. The time windows range from 5 s to 24 h, with shorter time windows detecting bursty attacks, and longer time windows more likely to detect slow and stealthy attacks. Connection-based windows are also used, such as analyzing the most recent 100 connections.
  • 22. MCD features. Domain knowledge is used to choose a window of data to consider. The time windows range from 5 s to 24 h, with shorter time windows detecting bursty attacks, and longer time windows more likely to detect slow and stealthy attacks. Connection-based windows are also used, such as analyzing the most recent 100 connections.
  • 23.
  • 24. KDD Cup 99. The dataset has known limitations. Its advantages are being publicly available, labeled, and preprocessed ready for machine learning. Each network connection was processed into a labeled vector of 41 features, constructed using data mining techniques and expert domain knowledge when creating a machine learning misuse-based NIDS.
  • 25. KDD 99 data preprocessing produced: 9 basic and SCD header features for each connection (similar to NetFlow); 9 time-based MCD header features constructed over a 2 s window; 10 host-based MCD header features constructed over a 100-connection window to detect slow probes; and 13 content-based features constructed from the traffic payloads using domain knowledge. Data mining algorithms could not be used since the payloads were unprocessed and therefore unstructured. The content-based features were designed specifically to detect U2R and R2L attacks.
  • 27. Content anomaly detection. Many remote attacks on computers place the exploit code inside the payload of network packets. Hence these attacks are not directly detectable by packet header approaches. Payload attacks are more computationally expensive to detect because they require deeper searches into network sessions.
  • 28. Content anomaly detection. The SANS “Top Cyber Security Risks” 2009 report lists the top two cyber risks as client-side software which remains unpatched, and vulnerable Internet-facing websites. The first risk can be exploited using malicious content destined for a client, while the second can be exploited using crafted content in requests to servers. In these cases, bytes containing the exploit code are contained within network packet payloads beyond the TCP/IP headers, such as within downloaded files.
  • 29. N-gram analysis of requests to servers. PAYL uses 1-grams and unsupervised learning to build a byte-frequency distribution model of network traffic payloads. A 1-gram is simply a single byte with a value in the range 0 to 255. The result of preprocessing a packet payload this way is a feature vector containing the relative frequency count of each of the 256 possible 1-grams (bytes) in the payload. The model also includes the average frequency, as well as the variance and standard deviation, as other features. Separate models of normal traffic are created for each combination of destination port and length of the flow.
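The 1-gram preprocessing step can be sketched in Python as below: each payload becomes a 256-element vector of relative byte frequencies, and a model stores the per-byte mean and standard deviation over the training payloads. The distance function, a sum of absolute deviations scaled by the standard deviation plus a small smoothing constant `alpha`, is an assumption added for illustration; the slides do not specify how payloads are compared to the model.

```python
from statistics import mean, pstdev


def byte_frequencies(payload: bytes):
    """Relative frequency of each of the 256 possible byte values."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = len(payload)
    return [c / n for c in counts] if n else [0.0] * 256


def train(payloads):
    """Per-byte mean and standard deviation over normal payloads."""
    vectors = [byte_frequencies(p) for p in payloads]
    columns = list(zip(*vectors))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def distance(payload, model, alpha=0.001):
    """Scaled absolute deviation from the model (illustrative assumption)."""
    means, stds = model
    x = byte_frequencies(payload)
    return sum(abs(xi - m) / (s + alpha) for xi, m, s in zip(x, means, stds))


model = train([b"GET /index.html", b"GET /about.html"])
normal_score = distance(b"GET /index.html", model)
odd_score = distance(b"\x90" * 15, model)
print(normal_score < odd_score)
```

A full system along the lines of the slide would keep one such model per destination port and flow length rather than a single global model.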
  • 30. PAYL was designed to detect zero-day worms, since flows with worm payloads can produce an unusual byte-frequency distribution. Testing was performed on all attacks in the DARPA 1999 dataset using individual packets as data units (connection data units were also attempted). The overall detection rate was close to 60% at a false positive rate of less than 1%. The authors point to a large non-overlap between PAYL and PHAD, with one modeling header data and the other modeling payloads. The two approaches could complement each other.
  • 31. N-gram analysis of requests to servers. ANAGRAM also builds on PAYL, but uses a mixture of high-order N-grams with N > 1. This reduces its susceptibility to mimicry attacks, since higher-order N-grams are harder to emulate in padded bytes. By contrast, PAYL can be easily evaded if normal byte frequencies are known to an attacker, since malicious payloads can be padded with bytes to match them. ANAGRAM uses supervised learning to model normal traffic by storing N-grams of normal packets into one Bloom filter.
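Storing N-grams of normal traffic in a Bloom filter can be sketched in Python as follows. The filter size, the number of hash functions, the use of SHA-256 for hashing, the choice of N = 5, and the scoring rule (fraction of a payload's N-grams not found in the filter) are all illustrative assumptions rather than details given on the slide.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings."""

    def __init__(self, size_bits=1 << 16, hashes=3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(payload: bytes, n: int):
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def train(payloads, n=5):
    bloom = BloomFilter()
    for payload in payloads:
        for gram in ngrams(payload, n):
            bloom.add(gram)
    return bloom


def score(payload, bloom, n=5):
    """Fraction of the payload's N-grams never seen in training."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g not in bloom) / len(grams)


bloom = train([b"GET /index.html HTTP/1.1"])
print(score(b"GET /index.html HTTP/1.1", bloom), score(b"\x90" * 24, bloom))
```

A Bloom filter never reports a stored N-gram as missing, so a payload made up entirely of training N-grams always scores 0.0; false positives in the filter can only lower the score of unusual payloads.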
  • 32. N-gram analysis of requests to servers. Similarly, McPAD creates 2v-grams and uses a sliding window to cover all sets of 2 bytes, v positions apart, in network traffic payloads. Since each byte can have values in the range 0 to 255, and n = 2, the feature space is 256^2 = 65,536. By varying v, different feature spaces are constructed, each handled by a different classifier. The dimensionality of the feature space is then reduced using a clustering algorithm. Multiple one-class SVMs are used for classification, and a meta-classifier combines these outputs into a final classification prediction. The results of testing McPAD showed it could detect shellcode attacks in HTTP requests.
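The sliding-window extraction of byte pairs can be sketched in Python as below. The convention that v counts the bytes skipped between the two bytes of a pair (so v = 0 gives ordinary adjacent 2-grams) is an assumption, as is returning a sparse dictionary of relative frequencies instead of a dense 65,536-element vector; the clustering and SVM stages are not shown.

```python
from collections import Counter


def two_v_gram_features(payload: bytes, v: int):
    """Relative frequency of byte pairs (payload[i], payload[i + v + 1]).

    Returns a sparse dict keyed by (first_byte, second_byte); the implied
    dense feature space has 256 * 256 = 65,536 dimensions.
    """
    step = v + 1
    pairs = Counter(zip(payload, payload[step:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


adjacent = two_v_gram_features(b"abab", 0)  # pairs: ab, ba, ab
skipped = two_v_gram_features(b"abab", 1)   # pairs: aa, bb
print(adjacent, skipped)
```

Calling the function with several values of v yields one feature vector per value, matching the slide's description of one feature space (and one classifier) per v.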
  • 33. Analysis of requests to web applications. Organizations may require additional monitoring of critical applications. One method is to create an application-specific anomaly detector, such as for web applications. One example is an anomaly-based SQL injection detector, which was host based and relied on the interception of SQL statements between the web application and the database.
  • 34.
  • 35. Analysis of web content to clients. Common network architectures ensure client hosts (workstations) within an organization are not directly exposed to the Internet at the network layer. This protects the client hosts from external threats such as probes, DoS, network worms and other attacks against open ports (services). However, many other threats are faced by these clients, particularly when they are exposed to untrusted code or data.
  • 36.
  • 37. Conclusion and feature set recommendation. This review has identified the various feature sets used by anomaly-based NIDS. When designing a NIDS, the choice of network traffic features is largely driven by the detection requirements. If broad anomaly detection is desired, then separate anomaly detectors should be built for each of the feature sets. For more targeted anomaly detection, a single feature set can be used.
  • 38. Conclusion and feature set recommendation. Packet header features have the advantages of being fast, with relatively low computation and memory overheads, and of avoiding some of the privacy and legal concerns regarding network data analysis. Basic features can be used to flag single packets which are anomalous with respect to a normal training model (e.g. PHAD), or as a filtering mechanism so only unusual packets are fed to downstream algorithms (e.g. SPADE). Individual packets cannot be used to identify unusual trends or patterns over time.
  • 39. Conclusion and feature set recommendation. To identify anomalous patterns across multiple packets, but within a single connection, SCD header features are used. For example, if all connections to port 80 on the local network are expected to be HTTP traffic, but the timing of packets within a monitored port 80 connection does not match an HTTP profile, then an anomaly can be raised.
  • 40. Conclusion and feature set recommendation. MCD features are generally derived over a time window of connections. Most MCD features are volume-based, such as the count of connections to a particular destination IP address and port in a given time window. MCD features can be easily used to detect unusual traffic volumes associated with DoS attacks or scanning behavior, but at the cost of overlooking individual anomalous packets (since these will not meet the volume-based threshold).
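A volume-based MCD feature of this kind can be sketched in Python with a sliding time window. The function below assumes connections arrive in time order as (timestamp, destination IP, destination port) tuples and, for each connection, reports how many connections to the same destination and port fall within the preceding window, including the current one. The 2 s default matches the window quoted for the KDD 99 time-based features; the input layout and names are hypothetical.

```python
from collections import defaultdict, deque


def windowed_counts(connections, window=2.0):
    """Yield (timestamp, dst, dport, count) for time-ordered connections.

    count is the number of connections to the same (dst, dport) seen in
    the last `window` seconds, including the current connection.
    """
    recent = deque()            # (timestamp, key) pairs still inside the window
    counts = defaultdict(int)   # live count per (dst, dport)
    for ts, dst, dport in connections:
        # Drop connections that have aged out of the window.
        while recent and recent[0][0] <= ts - window:
            _, old_key = recent.popleft()
            counts[old_key] -= 1
        key = (dst, dport)
        recent.append((ts, key))
        counts[key] += 1
        yield ts, dst, dport, counts[key]


conns = [(0.0, "h", 80), (0.5, "h", 80), (1.0, "h", 22),
         (1.9, "h", 80), (2.6, "h", 80)]
result = [count for _, _, _, count in windowed_counts(conns)]
print(result)
```

Swapping the time test for a fixed-length deque would give the connection-based variant (for example, the most recent 100 connections).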
  • 41. Conclusion and feature set recommendation. Packet header feature limitations: packet header approaches cannot be used to directly detect attacks aimed at applications, since the attack bytes are embedded in the packet body. Many of today’s exploits are directed at applications rather than network services, e.g. buffer overflow attacks against web servers, web application exploits, and attacks targeting web clients such as drive-by downloads.
  • 42. Conclusion and feature set recommendation. NIDS must use payload-based features extracted from packet bodies to detect these types of attacks, since the packet headers can remain completely normal. Payload analysis is more computationally expensive than header analysis. This is due to requiring deeper packet inspection and dealing with a variety of payload types (HTML, XML, pdf, jpg, etc.), transfer encodings (gzip, Base64), and obfuscation techniques. The advantage of payload analysis is having access to all bytes transferred between network devices. This allows a rich set of payload-based features to be constructed for anomaly detection.
  • 43. Conclusion and feature set recommendation. Due to the complexity of payload analysis, many techniques focus on small subsets of the payload, e.g. the HTTP request, or only the JavaScript sections of downloaded web content. The anomaly-based techniques do not try to match signatures of known malware; however, they can apply heuristics such as pattern matching for the presence of shellcode, or highlighting suspiciously long strings which may indicate a buffer overflow attempt. The reviewed payload-based approaches derive features from either the payload of a single connection or a user application session, and compare the features to a normal model. In effect these are SCD payload-based features. Extending this approach to multiple connections to produce MCD payload-based features could allow different types of anomalies to stand out, e.g. detecting an unusually large number of HTTP redirects in a network could indicate a widespread infection attempt.