Enhancing Intrusion
Detection System with
Proximity Information
†Zhenyun Zhuang
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332
E-mail: zhenyun@cc.gatech.edu
†Corresponding author
Ying Li
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332
E-mail: yingli@cc.gatech.edu
Zesheng Chen
Department of Engineering
Indiana University - Purdue University Fort Wayne
Fort Wayne, IN 46805
E-mail: zchen@engr.ipfw.edu
Abstract: The wide spread of worms poses serious challenges to today’s Internet.
Various IDSes (Intrusion Detection Systems) have been proposed to identify or prevent
such spread. These IDSes can be largely classified as signature-based or anomaly-based,
depending on what type of knowledge the system has. Signature-based IDSes
are unable to detect the outbreak of new and unidentified worms when the worms’
characteristic patterns are unknown. In addition, new worms are often sufficiently
intelligent to hide their activities and evade anomaly detection. Moreover, modern
worms tend to spread more quickly, and the outbreak period lasts on the order of
hours or even minutes. Such characteristics render existing detection mechanisms less
effective.
In this work, we consider the drawbacks of current detection approaches and propose
PAIDS, a proximity-assisted IDS approach for identifying the outbreak of unknown
worms. PAIDS does not rely on signatures. Instead, it takes advantage of the proximity
information of compromised hosts. PAIDS operates on a dimension orthogonal
to existing IDS approaches and can thus work collaboratively with existing IDSes
to achieve better performance. We test the effectiveness of PAIDS with trace-driven
simulations and observe that PAIDS has a high detection rate and a low false positive
rate. We also build a proof-of-concept prototype using the Google Maps APIs and the libpcap
library.
Keywords: Intrusion Detection System; Proximity; Security; Worm.
1 Introduction
Self-propagating worms have been posing serious threats to
today’s Internet [35]. Such malicious programs are capable
of propagating quickly and infecting a significant number
of vulnerable machines in a short period of time (e.g., a few
hours). Victims compromised by worms can form botnets
and be used to launch large-scale attacks (e.g., DDoS and
spamming).
To effectively prevent worms from propagating rapidly,
an IDS (Intrusion Detection System) is needed. Over the
last few decades, numerous intrusion detection mechanisms
have been proposed. These various techniques can be
largely classified into two categories based on the nature
of the knowledge that an IDS has. If the characteristic
patterns (i.e., signatures) of attacks in an IDS are known,
the IDS is referred to as a signature-based IDS. Otherwise,
if the patterns of normal activities are known, the attacks
can be identified by monitoring the differences between the
normal and anomalous activities. An IDS relying on iden-
tifying anomaly features is referred to as an anomaly-based
IDS.
Existing techniques can detect and prevent the propa-
gation of many worms, in particular the known ones. Nev-
ertheless, their operations often fail to work with new and
more intelligent worms since the characteristic patterns
of new worms have not been extracted and analyzed. Due
to the unavailability of patterns of the worms, signature-
based solutions simply do not work. Briefly, two limita-
tions associated with signature-based IDSes prevent them
from functioning effectively. First, it takes considerable
time for detection entities (such as anti-virus software com-
panies) to learn the attack pattern or the signature of the
worms. Thus, before such signatures are available, the
worms may have infected a significant number of machines.
Second, it takes some time for a normal user to utilize such
signatures in the form of updating or installing IDS soft-
ware. On the other hand, modern worms tend to spread
more quickly, and the outbreak period lasts on the order of
hours or even minutes. In particular, flash worms are able
to compromise a large number of hosts in several minutes
[34]. Therefore, signature-based techniques fail to effec-
tively detect the propagation of unknown worms.
Anomaly-based solutions also have their drawbacks. An
anomaly-based IDS examines traffic, activities, or behav-
iors to find anomalies. The underlying principle is that
“attack behaviors” may differ from “normal user behav-
iors.” By cataloging and identifying the differences in-
volved, IDSes can detect a worm in many circumstances.
However, anomaly-based IDSes are far from being effective.
First, since normal behaviors can change easily and read-
ily, anomaly-based IDSes are prone to false positives, i.e.,
many actually normal behaviors are falsely reported as at-
tacks. Second, worms can potentially become increasingly
intelligent to hide their abnormal activities and thus render
most anomaly-based approaches ineffective. For example,
if a worm applies a polymorphic blending attack [11], it
can hide its activities in normal network activities. There-
fore, anomaly-based IDSes may fail to effectively detect
the propagation of intelligent worms.
In this work, our goal is to design an IDS that is ca-
pable of identifying new and intelligent fast-propagating
worms and thwarting their spread, particularly during the
worm’s “start-up” stage. Since neither signature-based nor
anomaly-based techniques can achieve such capabilities, we
ask the question: Given the unknown nature of an impending
worm, its high spreading speed, and its intelligence to
hide its abnormal activities, is it still possible to efficiently
thwart its propagation?
We answer this question with a novel proximity-
assisted approach. Our approach is referred to as PAIDS
(Proximity-Assisted Intrusion Detection System) and is
mainly based on the observations of the clustered pattern
of worm propagation and the typical long active time of a
compromised host. Since vulnerable hosts are highly unevenly
distributed in the Internet [5], compromised computers
are usually concentrated in certain outbreak areas,
especially at the early stage of worm propagation. Thus, a
host located in the area with more infected hosts is more
likely to be compromised by the worm. The locations of
the early infected hosts can be known in certain ways, for
instance, using honeypots. In other words, although the
nature and signatures of the upcoming worms cannot be
identified at the early stage, the very fact that they are
infecting other machines can be detected. Thus, such infor-
mation can be utilized to counteract the worms’ infection
process at their early stage when conventional signature-
based methods fail to work. One important property about
PAIDS is that it is fully complementary to existing ap-
proaches, particularly anomaly-based ones. That is, the
mechanisms included in the proximity-assisted approach
can be utilized in tandem with other approaches since
PAIDS is working on a different dimension from exist-
ing IDSes. Moreover, our preliminary evaluation based
on trace-driven simulations shows that PAIDS has a high
detection rate and a low false positive rate.
The rest of the paper is organized as follows. We first
motivate our design in Section 2. We present the detailed
design in Section 3. We then perform trace-driven simulations
in Section 4. We also build a proof-of-concept prototype
and present the implementation details in Section 5.
We then discuss several important design issues in Section
6. Finally, we conclude our work in Section 7.
2 Motivation
Our proposed approach, PAIDS, is motivated by three ma-
jor observations: (i) limitations of existing IDSes; (ii) clus-
tered pattern of worm spread; and (iii) long active time of
compromised hosts.
2.1 Limitations of existing approaches
• Signature-based or anomaly-based. Intru-
sion detection efforts have been proposed for years and
can be classified as signature-based and anomaly-based.
Signature-based IDSes, such as anti-virus programs (e.g.,
Norton AntiVirus and McAfee), use signatures to recognize
and block infected files, programs, or active Web
content from entering a computer system. Such IDSes are
the most widely used approach in the commercial IDS tech-
nology today. Moreover, abundant research efforts such as
[17, 30, 3, 6, 24, 21] also fall into this category.
Anomaly-based IDSes use rules or predefined concepts
about “normal” and “abnormal” system activities (i.e.,
heuristics) to distinguish anomalies from normal system
behaviors and to monitor or block anomalies. Anomaly-
based IDSes include [16, 33, 22, 40, 7, 1, 39, 9]. For an
anomaly-based IDS, a training process is typically required
to extract normal patterns. Afterwards, when some ac-
tivities are observed to be sufficiently different from such
patterns, they are flagged as abnormal ones, and the cor-
responding actions may be taken. A wide variety of tech-
niques have been explored to approach the anomaly de-
tection problem, such as neural networks [16], statistical
modeling [33], temporal sequence learning [22], n-grams
[40], and states of web applications [7].
However, neither signature-based nor anomaly-based
IDSes can effectively identify new and intelligent worms.
On one hand, identifying a worm using signature-based
IDSes requires the characteristic patterns of the worm.
Such mechanisms will not work with new worms since the
patterns of these worms are not known a priori. Further-
more, even if the signature of a new worm is extracted
after a certain period of time, it takes considerable time
for a normal user to adopt the signature in the form of up-
dating his anti-worm software. On the other hand, modern
worms are becoming increasingly intelligent, and many are
capable of hiding their activities to avoid the detection of
anomaly-based IDSes [11]. Worms achieve such stealthy
goals either by learning the rules used by the IDS or by
acting far less aggressively.
• Network-based or host-based. Various IDSes
can also be classified into network-based [38, 28, 13, 37,
25, 32, 10] and host-based ones [15, 36, 8, 26, 31, 12].
The difference between these two categories lies in where
the intelligence used for detection resides. In host-based
IDSes, the intelligence only resides on local hosts, whereas
network-based IDSes rely on the information exchanged
among many hosts. Both categories have their own ad-
vantages. For example, network-based IDSes can monitor
an entire network with only a few well-situated nodes and
impose little overhead on a network. Host-based IDSes
can analyze activities on the host at a high level of detail
and determine which processes and/or users are involved
in malicious activities.
Both types of approaches, however, have disadvantages
and may fail to work under certain scenarios. For exam-
ple, although network-based IDSes have the advantage of
knowing aggregate traffic patterns, they may not know
other crucial information such as the process environments
and the exact protocol of a connection. Most of the time
normal users are reluctant to give such information to other
entities including IDSes due to privacy or security con-
cerns. By contrast, host-based IDSes have the full infor-
mation of the connections of a specific host, but they lack
the aggregate view of the network. The disadvantages associated
with either type of IDS may greatly reduce its effectiveness
in detecting worms. Our designed PAIDS intends to
incorporate the advantages of both IDSes.
2.2 Clustered pattern of worm spread
Once a worm is released to the Internet, it typically starts
from a few hosts. It then guesses the addresses of targets
and attempts to infect them. Typically, worms use random
scanning or localized scanning to spread [35, 41]. Random
Figure 1: Top K grids (out of 1600 grids) of U.S. Map. Clustered pattern on a 40×40 grid: coverage of top clusters (%) versus time (hours, 2 to 20), for K = 8, 12, and 16.
scanning selects targets randomly and has been used by
Code Red v2, Slammer, and Witty worms, whereas local-
ized scanning preferentially searches for targets in the local
network (i.e., sharing the same prefix of IP addresses) and
has been exploited by Code Red II and Nimda worms. Lo-
calized scanning has some advantages over random scan-
ning. For example, hosts within the same subnet often
expose similar vulnerabilities. In addition, in general in-
fecting close-by hosts takes less time due to the smaller
round trip time. Furthermore, many networks are pro-
tected by firewalls, and infecting hosts in the same network
is relatively easier.
Worm spread demonstrates a highly clustered pattern,
especially for localized scanning. The main reason is that
vulnerable hosts are highly unevenly distributed in the In-
ternet [4]. Therefore, at any time of the worm outbreak,
the compromised hosts typically form clusters, e.g., they
reside in close-by geographical locations. To show this,
we study the outbreak of the Code Red v2 worm in July
2001 based on the traces from CAIDA [2]. To examine
the clustered pattern of geographical locations of infected
hosts, we equally divide the US continent map into a num-
ber of grids and count the number of compromised hosts
covered by each grid. We then choose the top K grids
and show the percentage of compromised hosts covered by
these grids. Intuitively, if the cluster degree is higher, more
hosts would reside in the top K grids. Specifically, we divide
the map into 40×40 grids (1600 in total) and measure
the percentage of compromised hosts covered by the top 8,
12, and 16 grids (i.e., K = 8, 12, 16). Figure 1 shows our
measurements of the coverage of top K grids over time of
up to 20 hours. We observe highly clustered patterns.
For example, as shown in the figure, with only 8 grids
(0.5% of total grids), the coverage at the 2-hour time is
about 60%, whereas 16 grids (1% of all grids) cover more
than 72% of all compromised hosts. More interestingly,
such clustered behaviors do not fade quickly over time. As
shown in the figure, even after 20 hours, top 16 grids still
account for more than 53% of all compromised hosts.
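The top-K coverage measurement described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for Figure 1: the function name `top_k_coverage`, the `(lat, lon)` input format, and the bounding-box parameter are assumptions, and the mapping from IP addresses to coordinates is assumed to have been done beforehand.

```python
from collections import Counter


def top_k_coverage(points, k, bounds, grid=40):
    """Fraction of hosts that fall in the k most populated grid cells.

    points: iterable of (lat, lon) pairs, one per compromised host.
    bounds: (lat_min, lat_max, lon_min, lon_max) of the map (hypothetical input).
    grid:   number of cells per side; 40 gives the 40x40 = 1600 cells of the text.
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    counts = Counter()
    total = 0
    for lat, lon in points:
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue  # ignore hosts that fall outside the map
        # Map the coordinate to a cell index; clamp the upper edge into the last cell.
        row = min(int((lat - lat_min) / (lat_max - lat_min) * grid), grid - 1)
        col = min(int((lon - lon_min) / (lon_max - lon_min) * grid), grid - 1)
        counts[(row, col)] += 1
        total += 1
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total
```

Calling the function once per time snapshot with K = 8, 12, and 16 would yield one coverage curve per K, in the spirit of the measurement shown in Figure 1.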
Interestingly, the clustering behavior of worm spreading
occurs not only in the geographical dimension, but also in the
network address dimension. Works such as [5] have iden-
tified the clustered pattern in the network address space.
Specifically, work [5] finds that more than 80% of mali-
cious sources are clustered in the same 20% of the IPv4
address space over time based on DShield data. Our de-
signed PAIDS can work on both geographical and network
address dimensions. In this paper, we mainly present the
results on the geographical dimension.
2.3 Long active time of compromised hosts
Another observation is that compromised hosts are typ-
ically engaged in infecting other hosts for a considerable
amount of time. Taking the Code Red v2 worm as an
example, we plot the distribution of active time of all com-
promised hosts (more than 359,000) in Figure 2. It can be
seen that 80% of compromised hosts have active time of
longer than 30 minutes, while only 12% of these hosts have
active time shorter than 5 minutes. In other words, the fig-
ure shows that most compromised hosts attempt to infect
other hosts for a considerably long time (e.g., more than
dozens of minutes) before they are stopped. This charac-
teristic of compromised hosts can potentially be utilized
to assist detection. For example, if a compromised host
can be identified as a suspicious one, then hosts that are
communicating with this host should be alerted.
3 Design
We now present the design of PAIDS. After giving an
overview, we will present the deployment model, the soft-
ware architecture, and three major components of PAIDS.
3.1 Overview
The key idea of PAIDS is to take advantage of the clus-
tered pattern of worm outbreak and the long active time
of compromised hosts. Briefly, PAIDS works as follows.
PAIDS runs on a local host as a daemon and keeps moni-
toring the ongoing connections. It records the IP addresses
and port numbers of the hosts with which the local host
is communicating. The recorded information is sent to a
processing center that determines the danger levels of the
corresponding hosts. If the processing center determines
that a recorded host is suspicious, then it reports back to
Figure 2: Active time distribution of the Code Red v2 worm: 0-5 min (12%), 5-10 min (2%), 10-15 min (2%), 15-20 min (2%), 20-25 min (1%), 25-30 min (1%), >30 min (80%).
the local host so that PAIDS can raise alerts and notify
the user. The processing center makes the decision by col-
lecting the information of suspicious hosts on the Internet
and performing clustering operations to model the danger
level of each area. The clustering is based on proximity in-
formation associated with the suspicious hosts. Note that
proximity is not necessarily limited to geographical and
network address dimensions and can also occur in other di-
mensions such as DNS. In this work, we present the design
of PAIDS only in the context of the geographical dimen-
sion for the purpose of simplicity. However, PAIDS can be
easily changed to utilize other dimensions.
The operations of PAIDS involve the cooperation of sev-
eral entities (shown in Figure 3). Firstly, it requires the
local host to report its communicating hosts (referred to as
“communication neighbors”) to a processing center. The
communication components of this reporting service are
referred to as Remote Connection Monitor (RCM) and
Remote Connection Attendant (RCA). Secondly, the pro-
cessing center is referred to as Security-Level Processing
Engine (SLPE). SLPE is the entity that determines the
danger level of each of the reported neighbors from the
local host. The determination is based on proximity cor-
relations, i.e., the closer the communicating host is to the
outbreak area, the greater chance it might conduct mali-
cious activities. The proximity correlations can be com-
puted using various geographic mapping techniques [27].
One such technique is the IP-to-location service [19].
The SLPE can be implemented at a place convenient to the
local hosts, e.g., a dedicated machine of a group or an insti-
tute. Thirdly, it requires a suspicious-activity monitoring
service that typically runs a honeynet to attract worms and
collects related information. In this paper we refer to the
monitoring service as Suspicious Hosts Sniffing (SHS). SHS
serves no other purpose than simply recording suspicious
connections reaching the honeynet. The recorded informa-
tion at least includes the IP addresses and the port num-
bers of the suspicious connections. Such information will
expose the outbreak information of new worms for they
have to propagate to other hosts after they are released.
PAIDS addresses the shortcomings of signature-based
and anomaly-based IDSes with a mechanism based on
proximity information. It deals with the unavailability of
worm signatures by using the notion of suspicious hosts,
i.e., the hosts that are captured by the honeynet when they
attempt to connect to it. Specifically, if a host is
compromised, it will attempt to infect other hosts. If it
falls into the view of the monitoring service, it is recorded.
Since compromised hosts typically keep active for a con-
siderably long time, such a “black-list” style treatment will
bring benefits. Moreover, since PAIDS is not based on the
known patterns of normal behaviors, it does not have the
drawbacks of the anomaly-based approach.
PAIDS addresses the drawbacks of network-based and
host-based approaches by using a processing center, which
is actually the third layer between a global information cen-
ter and a local host. Introducing such a layer can gain ben-
efits by striking the balance between the advantages and
Figure 3: PAIDS Architecture
disadvantages associated with these two types of IDSes.
Since hosts are more likely to trust a local processing center
and present detailed information to the latter, such under-
taking can bridge the trust gap between these two types of
IDSes. In other words, the processing center can serve as
the intermediate proxy for the information exchanges be-
tween the global information center and local hosts. The
functioning of the processing center works in two direc-
tions. On one hand, it collects the information from local
hosts and reports filtered information to the global center.
On the other hand, it obtains information from the global
center and determines the danger level of reported connec-
tions on behalf of local hosts. Other than the trust benefit,
introducing the processing center can also help relieve the
scalability concern embedded in alternative two-layer ap-
proaches.
3.2 Deployment model of PAIDS
The design of PAIDS is not intended to replace any other
type of IDSes. Thus, it is inappropriate to compare PAIDS
directly with other IDSes. Instead, it complements other
IDSes by working orthogonally with them. For instance,
PAIDS can be combined with an anomaly-based IDS. One
way for such a combination is to let PAIDS adjust the
threshold values usually used in anomaly-based IDSes. An
anomaly-based IDS typically sets certain “threshold” val-
ues to determine whether a connection is suspicious or not.
These values are difficult to determine in many cases for
the determination has to strike a balance between false
positives and false negatives. With PAIDS, the threshold
values can be adjusted according to the danger levels mea-
sured by PAIDS. For example, if a connection involves a
host that is within an outbreak area and thus has a higher
danger level, the threshold values can be adjusted to re-
flect such information. Such treatments are possible since
PAIDS utilizes the proximity information that other IDSes
typically do not consider.
We now use one example scenario to elaborate how
PAIDS can work cooperatively with other existing IDSes.
We choose an anomaly-based IDS simply because of easy
explanation. Note that such a scenario should not be
treated as the only way in which PAIDS can work with
other IDSes. The anomaly-based IDS is Swaddler [7].
Swaddler supports two modes: training and detection
modes. During the training mode, suitable thresholds are
derived to represent the anomalous scores. The system
then switches to the detection mode, and anomalous states
can be reported based on the derived thresholds. PAIDS
can assist the setting of the threshold values by replac-
ing each threshold by a pair of threshold values, one for
safe neighbors and the other for suspicious neighbors. The
suspicious neighbors are the hosts causing PAIDS to raise
alerts, whereas the safe neighbors are those not triggering
alerts. Intuitively, the threshold value for safe neighbors
should be higher than the value for suspicious neighbors
since the suspicious ones require stricter screening. Alter-
natively, PAIDS may help the threshold setting in a finer
way by assigning a range of values. In other words, PAIDS
may define a threshold value function Th = g(x), where x
is the danger level output by PAIDS. With such a function,
the threshold value is not fixed across all hosts; instead, it is
adjusted according to the danger level of the corresponding
host.
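The two ways of setting thresholds described above can be illustrated with a short Python sketch. It is hedged in several respects: the paper does not specify the form of g(x), so the linear interpolation below is only one possible choice; x is assumed here to be normalized to [0, 1] with larger values meaning more dangerous; and an anomaly score is assumed to be flagged when it exceeds the threshold, so a lower threshold means stricter screening. All function and parameter names are hypothetical.

```python
def threshold_pair(is_suspicious, th_safe, th_suspicious):
    """Two-value scheme: one threshold for safe and one for suspicious neighbors."""
    return th_suspicious if is_suspicious else th_safe


def threshold_linear(x, th_safe, th_suspicious):
    """One possible g(x): interpolate between the two thresholds.

    x is an assumed normalized danger level in [0, 1] (1 = most dangerous).
    """
    x = min(max(x, 0.0), 1.0)  # clamp out-of-range inputs
    return th_safe + (th_suspicious - th_safe) * x


def is_anomalous(score, x, th_safe=0.9, th_suspicious=0.5):
    """Flag a connection whose anomaly score exceeds its host-specific threshold."""
    return score > threshold_linear(x, th_safe, th_suspicious)
```

With these illustrative defaults, a connection with anomaly score 0.7 would pass for a host with x = 0 but be flagged for a host with x = 1.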
3.3 Software architecture and components
The software architecture of PAIDS is shown in Figure
3. PAIDS consists of two main entities: a PAIDS
client and a PAIDS server. It also assumes the existence of
a third entity, namely the SHS (Suspicious Hosts Sniffing).
The PAIDS client runs two software components: an RCM
(Remote Connection Monitor) and a Danger Level Fron-
tend. The first component, RCM, keeps track of on-going
connections on the local host and reports the connection
list to the PAIDS server. The second component can be
as simple as a typical browser such as Internet Explorer or
Firefox, but may also include more functionalities such as
providing the user ways to disconnect certain connections.
The PAIDS server consists of two parts: an RCA (Remote
Connection Attendant) and an SLPE (Security-Level Processing
Engine). The purpose of RCA is simply to receive
the reported connection list from RCM. SLPE is the core
part of PAIDS. It collects information from both RCA and
SHS, determines the danger level of each reported connec-
tion using geographic mapping techniques, and then sends
the relevant information back to the PAIDS client.
We now elaborate on the three major components of
PAIDS:
• RCM and RCA. RCM and RCA are used to report
the local host’s communicating neighbors to the SLPE.
Specifically, RCM runs on the local host side, whereas
RCA runs on the server side. The basic functionalities
of RCM are to monitor and report. RCM keeps track of
all the ongoing connections of the local host and is capable
of reporting to RCA periodically (i.e., pushing) or in the
on-demand fashion (i.e., pulling). The major advantage of
pushing information to RCA is the timeliness. However,
it incurs higher communication and processing overhead.
In contrast, pulling information from RCM requires less
overhead, but it inflates the information gathering time.
RCA collects connection information from RCM. Under
the pushing model, RCA is simply a server daemon col-
lecting the reports of RCM. Under the pulling model, RCA
pro-actively requests the information from RCM.
• Security Level Processing Engine (SLPE). The
SLPE is the key component of PAIDS, determining the
danger level of the reported neighbors using the IP-to-
location service. The determination is based on the dis-
tance between a neighbor and the outbreak areas. For-
mally, given a set of outbreak areas S = {S1, S2, ..., SI}
and a list of neighbor hosts H = {H1, H2, ..., HJ }, SLPE
will output a danger level vector of L = {L1, L2, ..., LJ },
where Li is the danger level of neighbor host Hi. Specifi-
cally, let di,j denote the distance between Hi and Sj. The
danger level of Hi is given by a function f(di,j, S). The de-
sign of f(di,j, S) by itself is an important and interesting
problem. Due to space limitations, we only provide two
simple forms. The first form is the average value of all di,j
on Sj:
$$L_i = \frac{1}{I} \sum_{j=1}^{I} d_{i,j}. \tag{1}$$
The second form simply chooses the minimum value of
all di,j, i.e., Li = minj {di,j}. Both forms have advan-
tages and disadvantages. Without explicit notes, in the
following we only use the first form. If SHS also reports
the normalized intensity of each outbreak area, the above
equation becomes
$$L_i = \frac{1}{I} \sum_{j=1}^{I} d_{i,j} t_j, \tag{2}$$
where $t_j$ is the normalized intensity of $S_j$ and $\sum_{j=1}^{I} t_j = 1$.
After computing the danger level of a neighboring host,
SLPE compares its danger level value to a pre-defined
threshold value Td. If the danger level value is below the
threshold, which implies that the host is sufficiently close
to the outbreak area, the host will be flagged as suspicious.
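The danger-level computation above can be sketched as follows. The three functions mirror Equation (1), the minimum form, and Equation (2); the comparison against Td follows the text (a value below the threshold means the host is close to the outbreak areas). The great-circle (haversine) distance and the Earth radius of 6371 km are assumptions made for this illustration, since the text does not prescribe a particular distance function, and all names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km (an assumed choice for d_{i,j})."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def danger_level_mean(distances):
    """Equation (1): average distance to the I outbreak areas."""
    return sum(distances) / len(distances)


def danger_level_min(distances):
    """Second form: distance to the closest outbreak area."""
    return min(distances)


def danger_level_weighted(distances, intensities):
    """Equation (2): (1/I) * sum_j d_{i,j} * t_j, with the t_j summing to 1."""
    if len(distances) != len(intensities):
        raise ValueError("one intensity is needed per outbreak area")
    return sum(d * t for d, t in zip(distances, intensities)) / len(distances)


def is_suspicious(level, t_d):
    """A host is flagged when its danger level value is below the threshold Td."""
    return level < t_d
```

For example, a neighbor at 100 km and 300 km from two outbreak areas has a danger level of 200 under Equation (1) and 100 under the minimum form.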
One important design issue of SLPE is the scalability.
As the number of compromised hosts increases, the pro-
cessing overhead on SLPE also increases. To reduce the
processing overhead, SLPE performs clustering of sniffed
suspicious hosts. Clustering can effectively improve scala-
bility by organizing compromised hosts into a list of out-
break areas. The list of outbreak areas is referred to as
Security Vector (SV), which essentially records the loca-
tion and the intensity of the outbreak areas. There are
many ways to perform clustering, and one popular way is
to use K-means. With K-means, the number of outbreak
areas is fixed to K irrespective of the number of compro-
mised hosts, and K is practically much smaller than the
number of compromised hosts.
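As an illustration of the clustering step, the following is a minimal pure-Python K-means sketch that turns a list of sniffed host positions into a security vector of K entries, each holding a cluster center and a normalized intensity. It is a sketch under stated assumptions rather than the SLPE implementation: positions are treated as planar (x, y) pairs, intensity is assumed to be the share of hosts in each cluster, and the fixed iteration count and seeded initialization are arbitrary choices.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means on 2-D points; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centers, clusters


def security_vector(points, k):
    """Build a list of (center, normalized intensity) pairs, one per outbreak area."""
    centers, clusters = kmeans(points, k)
    n = len(points)
    return [(center, len(members) / n) for center, members in zip(centers, clusters)]
```

The length of the resulting vector is K regardless of how many hosts were sniffed, which is the scalability property the text relies on.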
• Suspicious Hosts Sniffing (SHS). SHS is respon-
sible for collecting the activities of suspicious hosts and
reporting to SLPE. The implementation of SHS can be as
simple as an individual honeypot or a more complicated
form such as a honeynet [29]. Since SHS typically does not
have legitimate hosts, SHS records suspicious hosts when-
ever it receives connection requests from other hosts. The
information that SHS records can be simply a list of IP
addresses or both IP addresses and port numbers. To
take full advantage of the suspicious activities, SHS may
also record the intensity of the threat by counting the oc-
currence frequencies of specific connections. Since there
might be a significant number of suspicious activities, SHS
may aggregate individual hosts into suspicious outbreak ar-
eas. In other words, each outbreak area may contain many
hosts and can be represented in the format of < C, r, t >,
namely, the center C, the radius r, and the outbreak intensity
t.
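The < C, r, t > representation can be sketched as a small data structure. The aggregation rule below is an assumption made for illustration only (centroid as the center, largest member distance as the radius, share of all sniffed hosts as the intensity); the text does not fix how SHS derives these values, and the names `OutbreakArea` and `aggregate` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OutbreakArea:
    center: Tuple[float, float]  # C: location of the area
    radius: float                # r: extent of the area
    intensity: float             # t: normalized outbreak intensity


def aggregate(hosts, total_hosts):
    """Summarize a group of suspicious host positions as one outbreak area.

    hosts:       list of planar (x, y) positions belonging to the area.
    total_hosts: number of suspicious hosts seen overall, used to normalize t.
    """
    cx = sum(x for x, _ in hosts) / len(hosts)
    cy = sum(y for _, y in hosts) / len(hosts)
    radius = max(math.hypot(x - cx, y - cy) for x, y in hosts)
    return OutbreakArea((cx, cy), radius, len(hosts) / total_hosts)
```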
3.4 Pseudo-code of main components
The main operations of the components of PAIDS are
shown with pseudo-code in Figure 4. Briefly, (i) SHS
keeps monitoring the abnormal activities of the network
and records the information of the corresponding host.
The recorded information includes the IP address, port
number, security intensity, and so on. The information is
periodically reported to SLPE for further processing. (ii)
SLPE gathers the reported information about suspicious
hosts and performs clustering operations to aggregate the
security information. The output of clustering is a secu-
rity vector SV , which maintains the clustered security re-
gions. Each of the regions is an aggregation of a set of
suspicious hosts. The clustering can be achieved by em-
ploying any clustering algorithm such as K-means. (iii)
RCA receives the request from RCM and responds with
the determined danger level. The request consists of the
information about a host whose danger level needs to be
determined. RCA calculates the danger level based on SV
maintained by SLPE. Because of such reliance, it is better
to have SLPE and RCA co-located on the same machine.
(iv) RCM is run locally by users and is triggered every time
a new connection is set up. RCM extracts the information
of the communication neighbor and inquires RCA about
the danger level of the neighbor.
(a) Component of SHS
Variables
  Hi: A suspicious host
Keep monitoring network activities
  If the activity (involving Hi) is suspicious
    Record information of Hi (IP, port, etc.)
    Report information of Hi to SLPE
  End
End

(b) Component of SLPE
Variables
  Hi: A suspicious host
  SV: Security vector maintained by SLPE
Keep gathering reports from SHS
  Enqueue new suspicious host Hi information
  Obtain the proximity information of Hi
  Re-cluster by incorporating Hi
  Output a new SV
End

(c) Component of RCA
Variables
  Hj: A host whose danger level needs to be determined
  SV: Security vector maintained by SLPE
Keep waiting for RCM requests
  Obtain information (IP, port, etc.) of corresponding Hj
  Determine the danger level of Hj based on SV
  Respond to RCM
End

(d) Component of RCM
Variables
  Hj: A host whose danger level needs to be determined
Keep monitoring connections on local host
  Extract information (IP, port, etc.) of the neighbor Hj
  Report Hj to RCA
  Get response from RCA
  Determine actions to take for the connection
End

Figure 4: Pseudo-code: (a) SHS, (b) SLPE, (c) RCA, (d) RCM
4 Evaluation
PAIDS works complementarily with other IDSes such as
anomaly-based ones. Thus, a thorough evaluation of
PAIDS requires the integration with another IDS, either
at the live time of an actively propagating worm or with
the complete traces of that IDS on the Internet. Partly due to the
lack of such traces, we turn to simulations at this stage,
and we view the abovementioned integrated evaluation as
future work. Specifically, in this work we evaluate the ef-
fectiveness of PAIDS by performing simulations with the
propagation traces of a recorded worm from CAIDA [2].
4.1 Trace-driven simulations
The simulation uses the outbreak trace of the Code Red v2
worm from CAIDA. We focus on two performance metrics:
detection rate and false positive rate.
The simulation is performed as follows. First, we sort all
the recorded compromised hosts according
to their starting time, i.e., the time they are observed to
begin to infect other hosts. The method is illustrated in
Figure 5, where all hosts are ranked along the timeline.
For each host, we assume that its infectious activities
can be sniffed by SHS and that its information is sent
to SLPE after a certain latency. This latency is referred
to as the Sniffing Latency, denoted by SL. Let Ci denote
the set of all compromised hosts when host Hi is compromised,
so that Ci = {Hj, j ≤ i}. Let Hm denote the
last host whose infection precedes that of
Hi by at least SL. We use Ri to denote the set of all hosts recorded
before host Hi, i.e., Ri = {Hj, j ≤ m}. We assume that the
honeynet can provide Ri immediately, but cannot provide
the signatures and the patterns in a short time. With these
notations, Ci contains all hosts that are actually compromised
when Hi is compromised, whereas Ri contains all
hosts in Ci that are sniffed or recorded by SHS. In other
words, Ri is only a subset of Ci.
Figure 5: Illustration of Trace-driven Evaluation
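The construction of Ci and Ri can be sketched in a few lines of Python. This is only an illustration under stated assumptions: hosts are identified by their 0-based rank in the sorted trace, start times are in seconds, and the function name `compromised_and_recorded` is hypothetical rather than part of PAIDS.

```python
from bisect import bisect_right


def compromised_and_recorded(start_times, i, sniffing_latency):
    """Return (C_i, R_i) as lists of host ranks.

    start_times: infection start times in seconds, sorted ascending
                 (assumed layout of the trace, one entry per host).
    i:           0-based rank of the host H_i under consideration.
    """
    # C_i: every host compromised no later than H_i.
    c_i = list(range(i + 1))
    # H_m is the last host whose start time is at least SL earlier than H_i's.
    cutoff = start_times[i] - sniffing_latency
    # Search only among hosts 0..i so that R_i can never exceed C_i.
    recorded_count = bisect_right(start_times, cutoff, 0, i + 1)
    r_i = list(range(recorded_count))
    return c_i, r_i


if __name__ == "__main__":
    times = [0, 30, 90, 120, 250]
    print(compromised_and_recorded(times, 4, 100))
```

With a larger sniffing latency the cutoff moves earlier, so Ri shrinks while Ci stays the same, which is the gap the evaluation is designed to stress.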
Since the trace from CAIDA does not provide the infor-
mation of “who infects whom”, we make several assump-
tions to evaluate the performance of PAIDS. For a host Hv,
we assume that PAIDS is running on this host and report-
ing suspicious hosts with which the host is communicating.
Moreover, we assume that Hv is communicating with Hi.
We then measure the effectiveness of PAIDS by monitoring
how well PAIDS can detect the infected host Hi. That is,
if PAIDS was running on host Hv when the host Hi was
infected and begin infecting Hv, then PAIDS running on
Hv may be able to determine the danger level of Hi based
on the information of Ri. If PAIDS is able to raise the
alert, then it is considered to be effective. Otherwise, it is
considered to be ineffective. Since Ri is only a subset of Ci
(i.e., Ri ⊂ Ci), PAIDS running on Hv may or may not be
able to raise an alert depending on the current threshold
value. Specifically, we compute the average distance of
Hi to Ri, i.e.,
\[ L_i = \frac{1}{|R_i|} \sum_{j} d_{H_i,H_j}, \quad \text{where } H_j \in R_i. \]
Figure 6: Effectiveness of PAIDS (Sniffing Latency = 100 seconds). Panels: (a) Threshold = 10, (b) Threshold = 15, (c) Threshold = 20. Each panel plots Detection Rate and False Positive Rate as Percentage (%) against Number of Compromised Hosts (0 to 10,000).
Note that such a scenario provides a worst case for PAIDS
since we consider the time slot right after host Hi is in-
fected when the size of SHS set Ri is smallest.
After Li is obtained, it is then compared to Td, a pre-
configured threshold value. If Li is smaller than or equal
to Td, then host Hi is considered to be in danger, and the
corresponding effectiveness value is set to 1. Otherwise, if
Li is larger, then the effectiveness value is set to 0, as illus-
trated below: Alerti = 1, if Li ≤ Td; Alerti = 0, otherwise.
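As a minimal sketch of this decision rule, the following Python computes the average distance Li and the resulting alert value. The distances are assumed to be precomputed by some proximity service, and treating an empty Ri as "no alert" is an assumption of this sketch, since the text does not discuss that case.

```python
def average_distance(distances):
    """L_i: mean of the distances d(H_i, H_j) over all H_j in R_i.

    Returns None when R_i is empty (assumption: no evidence, no value).
    """
    if not distances:
        return None
    return sum(distances) / len(distances)


def alert(distances, threshold):
    """Alert_i = 1 if L_i <= T_d, else 0."""
    l_i = average_distance(distances)
    if l_i is None:
        return 0
    return 1 if l_i <= threshold else 0


if __name__ == "__main__":
    print(alert([5.0, 10.0, 15.0], 10))  # average is 10.0, alert raised
    print(alert([5.0, 10.0, 18.0], 10))  # average is 11.0, no alert
```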
We measure the effectiveness of PAIDS for the first
10,000 infected hosts. For every 500 hosts, we output
a detection rate that is calculated based on the number
of alerts raised. Specifically, for each of the 500 hosts, if
PAIDS can raise the alert, we count 1; otherwise, 0. Then
we obtain the detection rate by dividing the sum of alerts
by 500. Hence, denoting the number of alerts as Alert, the
effectiveness of PAIDS is calculated as Alert/500.
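The per-window computation can be illustrated as follows. The sketch assumes the alert values are already available as a list of 0/1 integers in infection order, and it drops a trailing incomplete window, which is a choice of this sketch rather than something the text specifies.

```python
def windowed_rates(alerts, window=500):
    """Return one rate per full window: (sum of alerts in window) / window."""
    rates = []
    for start in range(0, len(alerts) - window + 1, window):
        rates.append(sum(alerts[start:start + window]) / window)
    return rates


if __name__ == "__main__":
    # Two windows of 500 hosts: 375 alerts in the first, 500 in the second.
    sample = [1] * 375 + [0] * 125 + [1] * 500
    print(windowed_rates(sample))
```

For the first 10,000 hosts this yields 20 data points, one per group of 500 hosts.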
The false positive rate is obtained by considering a ran-
domized scenario. Specifically, we generate a scenario by
randomly choosing a host that is not compromised and
is communicating with host Hv. That is, we introduce a
host with a random IP address and use this host to re-
place the host Hi in each step. As a result, the randomly
selected neighbor represents the false positive. Using the
same method as described in calculating the detection rate,
we also obtain the false positive rate.
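The randomized scenario might be simulated along the following lines. This is a hedged sketch: `alert_fn` is a hypothetical callable standing in for the PAIDS decision on a given address, drawing uniformly from the whole IPv4 space is an assumption, and the trace's actual address handling is not described in the text.

```python
import ipaddress
import random


def random_clean_host(rng, compromised):
    """Draw a random IPv4 address that is not in the compromised set."""
    while True:
        candidate = ipaddress.IPv4Address(rng.getrandbits(32))
        if candidate not in compromised:
            return candidate


def false_positive_rate(alert_fn, compromised, steps, rng):
    """Fraction of randomly chosen, uncompromised hosts that raise an alert.

    alert_fn: hypothetical callable mapping an IPv4Address to 0 or 1.
    """
    raised = 0
    for _ in range(steps):
        raised += alert_fn(random_clean_host(rng, compromised))
    return raised / steps


if __name__ == "__main__":
    rng = random.Random(2024)
    never_alerts = lambda host: 0
    print(false_positive_rate(never_alerts, set(), 500, rng))
```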
Figure 7: Impact of Thresholds (x-axis: Threshold, 10 to 40; y-axis: Percentage (%))
4.2 Results
• Impact of Td. We study the impact of Td by varying
its value. We set SL = 100 seconds and test three
cases: Td = 10, 15, and 20. We consider a total of 10,000
hosts, and Figure 6 shows the results. In the figure, we observe
two trends. First, for randomly introduced hosts,
very few false positives are raised, whereas for compromised
hosts, PAIDS raises alerts for a high percentage
of them. For example, with Td = 15, about 75% of compromised
hosts are detected. Second, the effectiveness
increases with higher values of Td. These results are expected,
since a higher Td means that more neighbor hosts are
treated as suspicious, thus reducing the false negatives. To
study the trend more closely, we vary Td from 10 to 40 and
show the results in Figure 7. We can see that Td = 10 gives
a detection rate of only 51%, and the rate quickly reaches
91% when Td = 20. When Td = 40, PAIDS detects up to
96% of the compromised hosts. The false positive rates for all
results shown in the figure are almost zero.
• Impact of SL. We also vary the Sniffing Latency
value SL from 25 seconds to 150 seconds and evaluate the
impact. The results are shown in Figure 8. Note that in
these experiments, the threshold is 15 (i.e., Td = 15). We
observe that as SL increases, the detection rate of PAIDS
decreases. For instance, with SL = 25 seconds, the de-
tection rate is about 79%, while with SL = 150 seconds,
the rate drops to about 74%. This is because a larger SL
value means that it takes a longer time for SHS to learn
the information about compromised hosts.
We further test the impact of SL with varying values
and show the results in Figure 9. We can see that as SL
increases from 25 seconds to 200 seconds, the effectiveness
only drops from about 79% to 74%. These results im-
ply that the effectiveness of PAIDS is relatively insensitive
to the changes of SL, which is actually an advantage of
PAIDS.
• Impact of f(di,j, S). As we elaborated in Section
3, the choice of f(di,j, S), the danger level function for a
host Hi, also needs careful design. We have proposed
two forms of this function. In all of the above evaluations we
use the average form, which calculates the danger level by
averaging over all di,j. We also preliminarily evaluate the
impact of the other form, which takes the minimum value
of all di,j.
Figure 8: Effectiveness of PAIDS (Threshold = 15). Panels: (a) Sniffing Latency = 25 seconds, (b) Sniffing Latency = 50 seconds, (c) Sniffing Latency = 150 seconds. Each panel plots Detection Rate and False Positive Rate as Percentage (%) against Number of Compromised Hosts (0 to 10,000).
Since the second form of f(di,j, S) takes the minimum
value of all di,j, the value of f(di,j, S) is much smaller
than that of the first form. We choose three threshold values for
Td, namely, 0.01, 0.05, and 0.1. The results are shown
in Figure 10. We see that as Td increases, the detection
rates also gradually increase, and with Td = 0.1 the rate
almost reaches 100%. The false positive rates are kept very
low. Specifically, our results indicate that the highest false
positive rate is only 1.3%. An interesting observation regarding
the false positive rate is that the rate increases as more
hosts are compromised. Part of the reason for such a result
is that with an increasing number of hosts, the randomly
selected hosts have a higher probability of being close to
outbreak areas.
Figure 9: Impact of Sniffing Latency (x-axis: Sniffing Latency, 20 to 200; y-axis: Percentage (%))
Figure 10: Impact of Danger Level Function. Plots Detection Rate and False Positive Rate for Threshold = 0.01, 0.05, and 0.1 as Percentage (%) against Number of Compromised Hosts (0 to 10,000).
It would be interesting to study which form of f(di,j, S)
should be applied under what conditions. Although
the page limit does not allow us to present the details,
in short, our results suggest that the average form has
the better false positive rate (all are zero) but a
relatively lower detection rate, whereas the minimum form
has a non-zero false positive rate (though very small)
but a higher detection rate.
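To make the difference between the two forms concrete, here is a small Python sketch. The function names and the sample distances are hypothetical; only the "average" and "minimum" definitions come from the text.

```python
def danger_average(distances):
    """Average form: mean of all d_{i,j}."""
    return sum(distances) / len(distances)


def danger_minimum(distances):
    """Minimum form: smallest of all d_{i,j}."""
    return min(distances)


def raises_alert(value, threshold):
    """A host is flagged when the function value is at most T_d."""
    return value <= threshold


if __name__ == "__main__":
    # Hypothetical distances from one host to three recorded hosts.
    d = [0.05, 12.0, 30.0]
    print(danger_average(d), danger_minimum(d))
    print(raises_alert(danger_average(d), 10))   # False: the mean exceeds 10
    print(raises_alert(danger_minimum(d), 0.1))  # True: one recorded host is very close
```

Because the minimum never exceeds the average, a single nearby recorded host is enough to trigger the minimum form. This is consistent with the much smaller thresholds used for it, and with its higher detection rate and slightly higher false positive rate.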
5 Prototype
To demonstrate that PAIDS is feasible, we implement
a proof-of-concept prototype. The prototype uses the
Google Maps APIs [14] and the libpcap library [23] on the client
side and our implemented IP-to-location service [18] on the
server side.
The prototype demonstrates the power of PAIDS by en-
hancing users’ intervention with PAIDS. The enhancement
works by monitoring the ongoing connections and then
identifying suspicious hosts that are located in or near out-
break areas. Once PAIDS identifies a suspicious host, it
will raise the alert and inform the users who then take
appropriate actions. With this prototype, PAIDS comple-
ments the default user intervention model by differentiat-
ing the neighboring hosts based on their danger levels.
On the client side, the prototype firstly implements a
tool that can assist normal users to better understand their
ongoing connections. The tool utilizes the Google Maps
APIs. With the assistance of the security information, the
tool will raise alarms to the users when they are commu-
nicating with hosts located in outbreak areas. On the pro-
cessing center side, the prototype identifies suspicious hosts
that are located in or near outbreak areas. By computing
the distances between the outbreak areas and these remote
hosts, it determines the danger levels of those computers.
If the danger level is high enough, the prototype will warn
the user and present the threat information (such as the
remote hosts’ location, compromised areas, running proto-
cols, and the danger level) in visualized ways. Based on
such information, the user can take further actions.
The client side also runs a monitoring daemon (i.e., RCM),
which keeps track of the ongoing connections of the client
and periodically reports the connection list to the processing
center attendant daemon (i.e., RCA). RCM is implemented
using the libpcap library for Linux hosts; it continuously
monitors the connections and refreshes the host
list in two ways: periodically, every 30 seconds, and
immediately, at the user's request.
The processing center runs an SLPE instance that calls
the IP-to-location service running on the same center. After
receiving the updated host list from the client and the
compromised clusters from the worm-sniffing service (i.e.,
SHS), SLPE calls the IP-to-location service to compute
the distance between each of the client neighbors and the
known compromised areas. For simplicity, the prototype
assumes the presence of an SHS service and emulates
SHS with a preloaded file that contains a list of suspicious IPs.
After all distances are known, client neighbors are
ranked according to distance and assigned
danger levels. The information is written to a file that is read
by the Maps-generating program. The Maps-generating
program uses PHP scripts and, when requested by the client,
shows the geographical locations as well as the danger level
associated with each host.
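The ranking step might look like the following Python sketch. Everything specific here is an assumption made for illustration: the three level names, the two cutoffs, and the JSON output format are hypothetical, since the text only says that neighbors are ranked by distance, assigned danger levels, and written to a file.

```python
import json


def rank_neighbors(distances, high_cutoff, medium_cutoff):
    """Rank neighbors by distance to compromised areas, closest first.

    distances: dict mapping neighbor IP (str) to a distance value.
    The cutoffs and level names are hypothetical placeholders.
    """
    ranked = sorted(distances.items(), key=lambda item: item[1])
    report = []
    for rank, (ip, dist) in enumerate(ranked, start=1):
        if dist <= high_cutoff:
            level = "high"
        elif dist <= medium_cutoff:
            level = "medium"
        else:
            level = "low"
        report.append({"rank": rank, "ip": ip, "distance": dist, "danger": level})
    return report


def write_report(report, path):
    """Write the ranked list to a file for a map-rendering program to read."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(report, handle, indent=2)


if __name__ == "__main__":
    neighbors = {"192.0.2.1": 40.0, "198.51.100.7": 3.0, "203.0.113.9": 12.0}
    print(rank_neighbors(neighbors, high_cutoff=5.0, medium_cutoff=20.0))
```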
The IP-to-location service is implemented using the
database provided by [18]. It contains millions of ASes and
the corresponding information such as latitudes and longi-
tudes. Given an IP address, it looks up the corresponding
AS and extracts the latitude and longitude information,
which is used to calculate the geographical distances be-
tween hosts. The database is updated on a daily basis.
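One common way to turn two latitude/longitude pairs into a geographical distance is the haversine great-circle formula, sketched below in Python. The text does not state which distance formula the prototype uses, so this is an assumption, as are the kilometre unit and the mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    a = min(1.0, a)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Two points on the equator, half the globe apart.
    print(haversine_km(0.0, 0.0, 0.0, 180.0))
```

The result is symmetric in its two endpoints and zero for identical coordinates, which is what a distance di,j between hosts requires.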
6 Discussion
The effectiveness of PAIDS partly relies on the clustered
pattern of worm propagation. In this section, we discuss
several important issues relevant to the effectiveness of
PAIDS. These issues include: (i) Design issues of suspi-
cious hosts sniffing (SHS), (ii) Stages of worm propagation,
(iii) Non-prioritized candidate selection, (iv) Mismatch be-
tween geographical proximity information and networks,
and (v) Clustering of normal Internet hosts.
6.1 Design issues of suspicious hosts sniffing
(SHS)
In the PAIDS design, we assume that infected hosts can be
sniffed by SHS with a certain delay. The strength of this
assumption depends on the ability of honeynets to capture
suspicious worm activities on the Internet. For honeypots
or honeynets with large sizes, such an assumption makes
more sense, whereas for small-size honeynets, this is indeed
a strong assumption. Small-size honeynets may take a long
time to detect infected hosts, which results in a smaller set
of Ri when compared with Ci. However, as we observed in
the evaluation, PAIDS is relatively insensitive to the sniff-
ing latency SL. In summary, an appropriate deployment
of PAIDS should also consider the honeynets in use since
the honeynets’ capability impacts the sniffing latency.
6.2 Stages of worm propagation
PAIDS is expected to work well when the compromised
hosts form clusters. However, as a propagation instance
goes on, eventually there will be a larger number of com-
promised hosts, and they may fail to exhibit the apparent
clustered pattern. For example, if half of the Internet is
infected, then it would be very difficult to extract a usable
clustered pattern from these hosts.
Thus, PAIDS works better when a worm just starts to
propagate and the number of compromised hosts is rela-
tively small. We argue that this feature is not a drawback
of PAIDS, but the exact advantage of PAIDS. First, when
a new worm is created and released, it is very difficult
to quickly learn the signature of the worm. Thus, being
capable of detecting such worms at their initial propaga-
tion stages is of vital importance. Second, when a consid-
erable number of hosts are compromised, existing IDSes,
which PAIDS complements, will extract the characteristic
patterns of worms and be able to detect them. In other
words, an IDS that combines PAIDS and a conventional
IDS should be able to achieve much better performance
since they can work complementarily.
6.3 Non-prioritized candidate selection
Worms tend to propagate in a clustered fashion because nearby
hosts offer several advantages, including similar vulnerabilities,
low latency, and the absence of firewalls. The clustering pattern
is more apparent when an infected host chooses candidate hosts with
localized scanning (LS). That is, LS gives higher priority
to local hosts and thus amplifies the clustering behavior
of infected hosts. Examples of worms using LS include
the Code Red II and Nimda worms. By comparison, if worms
choose candidate hosts with random scanning (RS), then
the clustered pattern may be less apparent. Interestingly,
the Code Red v2 worm that we studied in Section 2 uses
RS to propagate, but the trace still shows a highly clustered
pattern. We suspect that part of the reason may
be related to the similar vulnerabilities present in local networks.
We envision that worms with LS will exhibit a
higher degree of clustering.
Another issue regarding candidate selection is “flash
worms” [34], which hard-code a list of hosts with known
vulnerabilities in the worms. Since hosts in such a list
may not reside in clusters, the propagation in turn might
not exhibit an obvious clustered pattern. However, since
the number of encoded hosts is typically limited due to
various constraints such as the code size, worms will soon
need to rely on other ways to select candidate hosts to infect.
Hence, the hard-coded host list only affects the initial
“seeds” of the propagation of a worm.
6.4 Mismatch between geographical proximity in-
formation and networks
One assumption made by PAIDS is that the network ad-
dress can approximately map to the geographical loca-
tion. Since there may be a mismatch between proximity
information and network addresses, one may argue that
PAIDS might not work when such mismatches occur. On
one hand, geographically close hosts may not be in the
same network. On the other hand, IP addresses that ap-
pear to be in the same network may be located far away
from each other due to the existence of techniques such
as VPN. Although such instances do occur, they do not
seriously negate the effectiveness of PAIDS, for several reasons.
First, the fact that geographically close hosts may reside
in different networks does not affect PAIDS, since
PAIDS maps network addresses
to proximity information, rather than the other way
around. Second, VPN-like instances are
relatively rare. In other words, most of the time, hosts in
the same network are located close to one another. Third, even if such a
mismatch occurs, PAIDS may still address the issue to a
great extent because it records the individual IP addresses
of suspicious hosts and utilizes such information to determine
danger levels. Fourth, as we mentioned in Section
3, PAIDS can also utilize proximity information along dimensions
other than the geographical one. Specifically,
the mismatch can be addressed by combining
network-proximity information in an appropriate way.
6.5 Clustering of normal Internet hosts
Research has shown that normal Internet hosts form clus-
ters [20]. This clustering behavior has an important im-
pact on the effectiveness of PAIDS in the form of increased
or decreased false positives and false negatives. To better
understand the impact, we take a simple scenario where
there are only two physically equal-size areas A and B. We
further assume that area A is a big city with higher popu-
lation and B is a rural region. Thus, naturally A has more
Internet hosts and is clustered with a higher degree. Now
we have two cases of worm propagation since the worm
propagation cluster can be formed in either A or B. If the
propagation cluster occurs in area A, then PAIDS will re-
port more false positives since hosts in A are more likely
to be treated as false positives. On the other hand, if the
propagation cluster occurs in area B, then PAIDS will report
more false negatives and fewer false positives because
hosts in B are more likely to cause false negatives, for
similar reasons. We note that the impact of the clustering of
normal Internet hosts is important, and it needs to be taken
into consideration in the future design of PAIDS.
7 Conclusions
In this work, we have studied the problem of intrusion de-
tection and focused on the detection of unidentified worms.
We have observed the failures and the drawbacks of exist-
ing IDSes and approached the problem in a novel way.
Our design is based on the typical behaviors of worm out-
breaks. By utilizing the proximity and activity information
of worm outbreaks, we have proposed a detection system,
called PAIDS, that is complementary to existing IDSes.
Our trace-driven simulation shows that PAIDS can detect
an unidentified worm with a high detection rate and a
low false positive rate. We have also built a prototype
of PAIDS to demonstrate its usage.
8 Acknowledgements
We thank Dr. Jonathon Giffin at Georgia Institute of Tech-
nology and the anonymous reviewers for their insightful
comments that helped us improve the quality of this re-
search.
REFERENCES
[1] S. Bhatkar, A. Chaturvedi, and R. Sekar. Dataflow
anomaly detection. In SP ’06: Proceedings of the 2006
IEEE Symposium on Security and Privacy, pages 48–62,
Washington, DC, USA, 2006. IEEE Computer Society.
[2] CAIDA. The cooperative association for internet data
analysis. In http://www.caida.org/home/.
[3] R. Cathey, L. Ma, N. Goharian, and D. Grossman. Mis-
use detection for information retrieval systems. In CIKM
’03: Proceedings of the twelfth international conference on
Information and knowledge management, 2003.
[4] Z. Chen and C. Ji. Measuring network-aware worm spread-
ing ability. In Proceedings of IEEE INFOCOM, pages 116–
124, Anchorage, AK, USA, 2007.
[5] Z. Chen, C. Ji, and B. Paul. Spatial-temporal characteris-
tics of internet malicious sources. In Proceedings of IEEE
INFOCOM (Mini-Conference), Phoenix, AZ, USA, 2008.
[6] C. Y. Chung, M. Gertz, and K. Levitt. Demids: A mis-
use detection system for database systems. In Proceedings
of the Third International IFIP TC-11 WG11.5 Working
Conference on Integrity and Internal Control in Informa-
tion Systems, pages 159–178. Kluwer Academic Publishers,
1999.
[7] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna.
Swaddler: An approach for the anomaly-based detection
of state violations in web applications. In Proceedings of
RAID, pages 63–86, Queensland, Australia, September 5–
7, 2007.
[8] H. Debar, M. Dacier, and A. Wespi. A revised taxonomy
for intrusion detection systems. Technical report, IBM
Research, October 1999.
[9] D. E. Denning. An intrusion-detection model. IEEE Trans.
Softw. Eng., 13(2):222–232, 1987.
[10] H. Dreger, A. Feldmann, M. Mai, V. Paxson, and R. Som-
mer. Dynamic application-layer protocol analysis for net-
work intrusion detection. In Proceeding of USENIX-SS’06,
Berkeley, CA, USA, 2006. USENIX Association.
[11] P. Fogla and W. Lee. Evading network anomaly detec-
tion systems: formal reasoning and practical techniques.
In CCS ’06: Proceedings of the 13th ACM conference on
Computer and communications security, pages 59–68, New
York, NY, USA, 2006. ACM.
[12] J. T. Giffin, D. Dagon, S. Jha, W. Lee, and B. P. Miller.
Environment-sensitive intrusion detection. In Proceedings
of RAID, pages 185–206, 2005.
[13] J. M. González and V. Paxson. Enhancing network intrusion
detection with integrated sampling and filtering. In
Proceedings of RAID, pages 272–289, 2006.
[14] Google. Google maps api. In http://code.google.com/apis/maps/.
[15] R. Gopalakrishna, E. H. Spafford, and J. Vitek. Efficient
intrusion detection using automaton inlining. In SP ’05:
Proceedings of the 2005 IEEE Symposium on Security and
Privacy, pages 18–31, Washington, DC, USA, 2005. IEEE
Computer Society.
[16] A. K. Ghosh, J. Wanken, and F. Charron. Detecting anomalous
and unknown intrusions against programs. In ACSAC
’98: Proceedings of the 14th Annual Computer Security
Applications Conference, page 259, Washington, DC, USA,
1998. IEEE Computer Society.
[17] H. Bos and K. Huang. Towards software-based signature
detection for intrusion prevention on the network card. In
Proceedings of RAID, Seattle, WA, USA, 2005. ACM.
[18] Hostip. Ip address lookup. In http://www.hostip.info/dl/index.html.
[19] IP2Location. Geolocation ip address to country city region
latitude longitude. In http://www.ip2location.com/.
[20] B. Krishnamurthy and J. Wang. On network-aware clus-
tering of web clients. SIGCOMM Comput. Commun. Rev.,
30(4):97–110, 2000.
[21] C. Krügel and T. Toth. Using decision trees to improve
signature-based intrusion detection. In Proceedings of
RAID, pages 173–191, 2003.
[22] T. Lane and C. E. Brodley. Temporal sequence learning
and data reduction for anomaly detection. ACM Trans.
Inf. Syst. Secur., 2(3):295–331, 1999.
[23] Libpcap. The libpcap project. In http://sourceforge.net/projects/libpcap/.
[24] U. Lindqvist and P. A. Porras. Detecting computer and
network misuse through the production-based expert sys-
tem toolset (p-BEST). In IEEE Symposium on Security
and Privacy, pages 146–161, 1999.
[25] C. Muelder, K.-L. Ma, and T. Bartoletti. Interactive visu-
alization for network and port scan detection. In Proceed-
ings of RAID, pages 265–283, 2005.
[26] D. Mutz, W. K. Robertson, G. Vigna, and R. A. Kem-
merer. Exploiting execution context for the detection of
anomalous system calls. In Proceedings of RAID, pages
1–20, 2007.
[27] V. N. Padmanabhan and L. Subramanian. An investiga-
tion of geographic mapping techniques for internet hosts.
In SIGCOMM ’01, pages 173–185, New York, NY, USA,
2001. ACM.
[28] M. Polychronakis, K. G. Anagnostakis, and E. P.
Markatos. Emulation-based detection of non-self-
contained polymorphic shellcode. In Proceedings of RAID,
volume 4637, pages 87–106. Springer, 2007.
[29] M. A. Rajab, F. Monrose, and A. Terzis. On the effective-
ness of distributed worm monitoring. In SSYM’05: Pro-
ceedings of the 14th conference on USENIX Security Sym-
posium, pages 15–15, Baltimore, MD, USA, 2005. USENIX
Association.
[30] S. Rubin, S. Jha, and B. P. Miller. Language-based gen-
eration and evaluation of nids signatures. In SP ’05: Pro-
ceedings of the 2005 IEEE Symposium on Security and
Privacy, pages 3–17, Washington, DC, USA, 2005. IEEE
Computer Society.
[31] M. I. Sharif, K. Singh, J. T. Giffin, and W. Lee. Under-
standing precision in host based intrusion detection. In
Proceedings of RAID, pages 21–41, 2007.
[32] S. Sinha, F. Jahanian, and J. M. Patel. Wind: Workload-
aware intrusion detection. In Proceedings of RAID, pages
290–310, 2006.
[33] A. Soule, K. Salamatian, and N. Taft. Combining filtering
and statistical methods for anomaly detection. In IMC’05:
Proceedings of the Internet Measurement Conference 2005
on Internet Measurement Conference, pages 31–31, Berke-
ley, CA, USA, 2005. USENIX Association.
[34] S. Staniford, D. Moore, V. Paxson, and N. Weaver. The
top speed of flash worms. In WORM ’04: Proceedings of
the 2004 ACM workshop on Rapid malcode, pages 33–42,
New York, NY, USA, 2004. ACM.
[35] S. Staniford, V. Paxson, and N. Weaver. How to 0wn the
internet in your spare time. In Proceedings of the 11th
USENIX Security, San Francisco, CA, USA, 2002.
[36] Sufatrio and R. H. C. Yap. Improving host-based ids with
argument abstraction to prevent mimicry attacks. In Pro-
ceedings of RAID, pages 146–164, 2005.
[37] M. Vallentin, R. Sommer, J. Lee, C. Leres, V. Paxson, and
B. Tierney. The nids cluster: Scalable, stateful network in-
trusion detection on commodity hardware. In Proceedings
of RAID, pages 107–126, 2007.
[38] G. Vasiliadis, S. Antonatos, M. Polychronakis, E. P.
Markatos, and S. Ioannidis. Gnort: High performance
network intrusion detection using graphics processors. In
R. Lippmann, E. Kirda, and A. Trachtenberg, editors, Pro-
ceedings of RAID, volume 5230 of Lecture Notes in Com-
puter Science, pages 116–134. Springer, 2008.
[39] K. Wang, G. F. Cretu, and S. J. Stolfo. Anomalous
payload-based worm detection and signature generation.
In Proceedings of RAID, pages 227–246, Seattle, Washing-
ton, USA, 2005.
[40] K. Wang, J. J. Parekh, and S. J. Stolfo. Anagram: A
content anomaly detector resistant to mimicry attack. In
Proceedings of RAID, pages 226–248, Hamburg, Germany,
2006.
[41] C. C. Zou, D. Towsley, and W. Gong. On the perfor-
mance of internet worm scanning strategies. Perform.
Eval., 63(7):700–723, 2006.

More Related Content

What's hot

Artificial Intelligence in Virus Detection & Recognition
Artificial Intelligence in Virus Detection & RecognitionArtificial Intelligence in Virus Detection & Recognition
Artificial Intelligence in Virus Detection & Recognition
ahmadali999
 
A cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internetA cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internet
UltraUploader
 
2011-A_Novel_Approach_to_Troubleshoot_Security_Attacks_in_Local_Area_Networks...
2011-A_Novel_Approach_to_Troubleshoot_Security_Attacks_in_Local_Area_Networks...2011-A_Novel_Approach_to_Troubleshoot_Security_Attacks_in_Local_Area_Networks...
2011-A_Novel_Approach_to_Troubleshoot_Security_Attacks_in_Local_Area_Networks...
Mrunalini Koritala
 
A taxonomy of computer worms
A taxonomy of computer wormsA taxonomy of computer worms
A taxonomy of computer worms
UltraUploader
 

What's hot (19)

Artificial Intelligence Methods in Virus Detection & Recognition - Introducti...
Artificial Intelligence Methods in Virus Detection & Recognition - Introducti...Artificial Intelligence Methods in Virus Detection & Recognition - Introducti...
Artificial Intelligence Methods in Virus Detection & Recognition - Introducti...
 
Clustering Categorical Data for Internet Security Applications
Clustering Categorical Data for Internet Security ApplicationsClustering Categorical Data for Internet Security Applications
Clustering Categorical Data for Internet Security Applications
 
Artificial Intelligence in Virus Detection & Recognition
Artificial Intelligence in Virus Detection & RecognitionArtificial Intelligence in Virus Detection & Recognition
Artificial Intelligence in Virus Detection & Recognition
 
NetWitness
NetWitnessNetWitness
NetWitness
 
Eh34803812
Eh34803812Eh34803812
Eh34803812
 
The Modern Malware Review March 2013
The Modern Malware Review March 2013The Modern Malware Review March 2013
The Modern Malware Review March 2013
 
The modern-malware-review-march-2013
The modern-malware-review-march-2013 The modern-malware-review-march-2013
The modern-malware-review-march-2013
 
Ns unit 6,7,8
Ns unit 6,7,8Ns unit 6,7,8
Ns unit 6,7,8
 
10a98 virus111
10a98 virus11110a98 virus111
10a98 virus111
 
A cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internetA cooperative immunization system for an untrusting internet
A cooperative immunization system for an untrusting internet
 
2011-A_Novel_Approach_to_Troubleshoot_Security_Attacks_in_Local_Area_Networks...
Enhancing Intrusion Detection System with Proximity Information

†Zhenyun Zhuang
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332
E-mail: zhenyun@cc.gatech.edu
†Corresponding author

Ying Li
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332
E-mail: yingli@cc.gatech.edu

Zesheng Chen
Department of Engineering
Indiana University - Purdue University Fort Wayne
Fort Wayne, IN 46805
E-mail: zchen@engr.ipfw.edu

Abstract: The wide spread of worms poses serious challenges to today’s Internet. Various IDSes (Intrusion Detection Systems) have been proposed to identify or prevent such spread. These IDSes can be largely classified as signature-based or anomaly-based ones depending on what type of knowledge the system knows. Signature-based IDSes are unable to detect the outbreak of new and unidentified worms when the worms’ characteristic patterns are unknown. In addition, new worms are often sufficiently intelligent to hide their activities and evade anomaly detection. Moreover, modern worms tend to spread more quickly, and the outbreak period lasts in the order of hours or even minutes. Such characteristics render existing detection mechanisms less effective.

In this work, we consider the drawbacks of current detection approaches and propose PAIDS, a proximity-assisted IDS approach for identifying the outbreak of unknown worms. PAIDS does not rely on signatures. Instead, it takes advantage of the proximity information of compromised hosts. PAIDS operates on an orthogonal dimension with existing IDS approaches and can thus work collaboratively with existing IDSes to achieve better performance. We test the effectiveness of PAIDS with trace-driven simulations and observe that PAIDS has a high detection rate and a low false positive rate. We also build a proof-of-concept prototype using Google Maps APIs and libpcap library.

Keywords: Intrusion Detection System; Proximity; Security; Worm.
1 Introduction

Self-propagating worms pose serious threats to today’s Internet [35]. Such malicious programs can propagate quickly and infect a significant number of vulnerable machines in a short period of time (e.g., a few hours). Victims compromised by worms can form botnets and be used to launch large-scale attacks (e.g., DDoS and spamming). To effectively prevent worms from propagating rapidly, an IDS (Intrusion Detection System) is needed.

Over the last few decades, numerous intrusion detection mechanisms have been proposed. These techniques can be largely classified into two categories based on the nature of the knowledge that an IDS has. If the characteristic patterns (i.e., signatures) of attacks are known to an IDS, the IDS is referred to as a signature-based IDS. Otherwise, if the patterns of normal activities are known, attacks can be identified by monitoring the differences between normal and anomalous activities. An IDS that relies on identifying anomalous features is referred to as an anomaly-based IDS.

Existing techniques can detect and prevent the propagation of many worms, in particular the known ones. Nevertheless, they often fail against new and more intelligent worms, since the characteristic patterns of new worms have not yet been extracted and analyzed. Because the patterns of such worms are unavailable, signature-based solutions simply do not work. Briefly, two limitations prevent signature-based IDSes from functioning effectively. First, it takes considerable time for detection entities (such as anti-virus software companies) to learn the attack pattern or the signature of a worm. Thus, before such signatures are available, the worm may have infected a significant number of machines. Second, it takes some time for a normal user to apply such signatures by updating or installing IDS software. Meanwhile, modern worms tend to spread more quickly, and the outbreak period lasts on the order of hours or even minutes. In particular, flash worms are able to compromise a large number of hosts in several minutes [34]. Therefore, signature-based techniques fail to effectively detect the propagation of unknown worms.

Anomaly-based solutions also have their drawbacks. An anomaly-based IDS examines traffic, activities, or behaviors to find anomalies. The underlying principle is that “attack behaviors” may differ from “normal user behaviors.” By cataloging and identifying the differences involved, IDSes can detect a worm in many circumstances.
However, anomaly-based IDSes are far from being effective. First, since normal behaviors can change easily and readily, anomaly-based IDSes are prone to false positives, i.e., many normal behaviors are falsely reported as attacks. Second, worms can become increasingly intelligent at hiding their abnormal activities and thus render most anomaly-based approaches ineffective. For example, if a worm applies a polymorphic blending attack [11], it can hide its activities in normal network activities. Therefore, anomaly-based IDSes may fail to effectively detect the propagation of intelligent worms.

In this work, our goal is to design an IDS that is capable of identifying new and intelligent fast-propagating worms and thwarting their spread, particularly during the worm’s “start-up” stage. Since neither signature-based nor anomaly-based techniques can achieve such capabilities, we ask the question: Given the unknown nature of an impending worm, its high spreading speed, and its intelligence to hide its abnormal activities, is it still possible to efficiently thwart its propagation?

We answer this question with a novel proximity-assisted approach. Our approach is referred to as PAIDS (Proximity-Assisted Intrusion Detection System) and is mainly based on two observations: the clustered pattern of worm propagation and the typically long active time of a compromised host. Since vulnerable hosts are highly unevenly distributed in the Internet [5], compromised computers are usually concentrated in certain outbreak areas, especially at the early stage of worm propagation. Thus, a host located in an area with more infected hosts is more likely to be compromised by the worm. The locations of the early infected hosts can be learned in certain ways, for instance, using honeypots.
In other words, although the nature and signatures of upcoming worms cannot be identified at the early stage, the very fact that they are infecting other machines can be detected. Thus, such information can be utilized to counteract the worms’ infection process at their early stage, when conventional signature-based methods fail to work. One important property of PAIDS is that it is fully complementary to existing approaches, particularly anomaly-based ones. That is, the mechanisms included in the proximity-assisted approach can be utilized in tandem with other approaches, since PAIDS works on a different dimension from existing IDSes. Moreover, our preliminary evaluation based on trace-driven simulations shows that PAIDS has a high detection rate and a low false positive rate.

The rest of the paper is organized as follows. We first motivate our design in Section 2. We present the detailed design in Section 3. We then perform trace-driven simulations in Section 4. We also build a proof-of-concept prototype and present the implementation details in Section 5. We then discuss several important design issues in Section 6. Finally, we conclude our work in Section 7.

2 Motivation

Our proposed approach, PAIDS, is motivated by three major observations: (i) limitations of existing IDSes; (ii) clustered pattern of worm spread; and (iii) long active time of compromised hosts.

2.1 Limitations of existing approaches

• Signature-based or anomaly-based. Intrusion detection efforts have been proposed for years and can be classified as signature-based and anomaly-based. Signature-based IDSes, such as anti-virus programs (e.g., Norton AntiVirus and McAfee), use signatures to recognize and block infected files, programs, or active Web content from entering a computer system. Such IDSes are the most widely used approach in commercial IDS technology today.
Moreover, abundant research efforts such as [17, 30, 3, 6, 24, 21] also fall into this category.

Anomaly-based IDSes use rules or predefined concepts about “normal” and “abnormal” system activities (i.e., heuristics) to distinguish anomalies from normal system behaviors and to monitor or block anomalies. Anomaly-based IDSes include [16, 33, 22, 40, 7, 1, 39, 9]. For an anomaly-based IDS, a training process is typically required to extract normal patterns. Afterwards, when some activities are observed to be sufficiently different from such patterns, they are flagged as abnormal, and corresponding actions may be taken. A wide variety of techniques have been explored to approach the anomaly detection problem, such as neural networks [16], statistical modeling [33], temporal sequence learning [22], n-grams [40], and states of web applications [7].

However, neither signature-based nor anomaly-based IDSes can effectively identify new and intelligent worms. On one hand, identifying a worm using signature-based IDSes requires the characteristic patterns of the worm. Such mechanisms will not work with new worms since the patterns of these worms are not known a priori. Furthermore, even if the signature of a new worm is extracted after a certain period of time, it takes considerable time for a normal user to adopt the signature by updating anti-worm software. On the other hand, modern worms are becoming increasingly intelligent, and many are capable of hiding their activities to avoid detection by anomaly-based IDSes [11]. Worms achieve such stealth either by learning the rules used by the IDS or by acting far less aggressively.

• Network-based or host-based. Various IDSes can also be classified into network-based [38, 28, 13, 37, 25, 32, 10] and host-based ones [15, 36, 8, 26, 31, 12]. The difference between these two categories lies in where the intelligence used for detection resides. In host-based IDSes, the intelligence resides only on local hosts, whereas network-based IDSes rely on the information exchanged among many hosts. Both categories have their own advantages. For example, network-based IDSes can monitor an entire network with only a few well-situated nodes and impose little overhead on a network.
Host-based IDSes can analyze activities on the host at a high level of detail and determine which processes and/or users are involved in malicious activities. Both types of approaches, however, have disadvantages and may fail to work under certain scenarios. For example, although network-based IDSes have the advantage of knowing aggregate traffic patterns, they may not know other crucial information such as the process environments and the exact protocol of a connection. Most of the time, normal users are reluctant to give such information to other entities, including IDSes, due to privacy or security concerns. By contrast, host-based IDSes have the full information of the connections of a specific host, but they lack the aggregate view of the network. The disadvantages associated with either type of IDS may greatly affect the effectiveness of detecting worms. Our designed PAIDS intends to incorporate the advantages of both types.

2.2 Clustered pattern of worm spread

Figure 1: Top K grids (out of 1600 grids) of U.S. Map. Plot title: Clustered pattern (40*40 Grid); x-axis: Time (hours); y-axis: Coverage of top clusters (%); series: K=8, K=12, K=16.

Once a worm is released to the Internet, it typically starts from a few hosts. It then guesses the addresses of targets and attempts to infect them. Typically, worms use random scanning or localized scanning to spread [35, 41]. Random scanning selects targets randomly and has been used by the Code Red v2, Slammer, and Witty worms, whereas localized scanning preferentially searches for targets in the local network (i.e., sharing the same prefix of IP addresses) and has been exploited by the Code Red II and Nimda worms. Localized scanning has some advantages over random scanning. For example, hosts within the same subnet often expose similar vulnerabilities. In addition, infecting close-by hosts generally takes less time due to the smaller round trip time.
Furthermore, many networks are protected by firewalls, and infecting hosts in the same network is relatively easier.

Worm spread demonstrates a highly clustered pattern, especially for localized scanning. The main reason is that vulnerable hosts are highly unevenly distributed in the Internet [4]. Therefore, at any time of the worm outbreak, the compromised hosts typically form clusters, e.g., they reside in close-by geographical locations. To show this, we study the outbreak of the Code Red v2 worm in July 2001 based on the traces from CAIDA [2]. To examine the clustered pattern of geographical locations of infected hosts, we divide the map of the continental US into a number of equal-sized grids and count the number of compromised hosts covered by each grid. We then choose the top K grids and show the percentage of compromised hosts covered by these grids. Intuitively, the higher the degree of clustering, the more hosts reside in the top K grids. Specifically, we divide the map into 40×40 grids (1600 in total) and measure the percentage of compromised hosts covered by the top 8, 12, and 16 grids (i.e., K = 8, 12, 16).

Figure 1 shows our measurements of the coverage of the top K grids over a period of up to 20 hours. We observe highly clustered patterns. For example, as shown in the figure, with only 8 grids (0.5% of all grids), the coverage at the 2-hour mark is about 60%, whereas 16 grids (1% of all grids) cover more than 72% of all compromised hosts. More interestingly, such clustered behaviors do not fade quickly over time. As shown in the figure, even after 20 hours, the top 16 grids still account for more than 53% of all compromised hosts.

The clustering behavior of worm spreading occurs not only in the geographical dimension, but also in the network address dimension. Works such as [5] have identified the clustered pattern in the network address space.
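The grid-coverage measurement described above (dividing the map into 40×40 cells, counting compromised hosts per cell, and reporting the share held by the top K cells) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for Figure 1: the function name `top_k_coverage`, the `(lat, lon)` input format, and the bounding-box parameter are assumptions introduced here.

```python
from collections import Counter

def top_k_coverage(points, k, bounds, rows=40, cols=40):
    """Fraction of hosts that fall into the k most populated grid cells.

    points: iterable of (lat, lon) pairs, one per compromised host (assumed format).
    bounds: (lat_min, lat_max, lon_min, lon_max) of the mapped region (assumed).
    """
    lat_min, lat_max, lon_min, lon_max = bounds
    counts = Counter()
    total = 0
    for lat, lon in points:
        if not (lat_min <= lat <= lat_max and lon_min <= lon <= lon_max):
            continue  # ignore hosts outside the mapped region
        # Map the coordinate to a cell index; the upper edge is clamped
        # into the last row/column.
        r = min(int((lat - lat_min) / (lat_max - lat_min) * rows), rows - 1)
        c = min(int((lon - lon_min) / (lon_max - lon_min) * cols), cols - 1)
        counts[(r, c)] += 1
        total += 1
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total
```

Calling the function on successive snapshots of the trace with k = 8, 12, and 16 would yield one coverage curve per K, which is the quantity plotted over time in Figure 1.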
Specifically, work [5] finds that more than 80% of malicious sources are clustered in the same 20% of the IPv4 address space over time based on DShield data. Our designed PAIDS can work on both geographical and network address dimensions. In this paper, we mainly present the results on the geographical dimension.

2.3 Long active time of compromised hosts

Another observation is that compromised hosts are typically engaged in infecting other hosts for a considerable amount of time. Taking the Code Red v2 worm as an example, we plot the distribution of active time of all compromised hosts (more than 359,000) in Figure 2. It can be seen that 80% of compromised hosts have an active time longer than 30 minutes, while only 12% of these hosts have an active time shorter than 5 minutes. In other words, the figure shows that most compromised hosts attempt to infect other hosts for a considerably long time (e.g., dozens of minutes or more) before they are stopped. This characteristic of compromised hosts can potentially be utilized to assist detection. For example, if a compromised host can be identified as a suspicious one, then hosts that are communicating with this host should be alerted.

3 Design

We now present the design of PAIDS. After giving an overview, we will present the deployment model, the software architecture, and three major components of PAIDS.

3.1 Overview

The key idea of PAIDS is to take advantage of the clustered pattern of worm outbreak and the long active time of compromised hosts. Briefly, PAIDS works as follows. PAIDS runs on a local host as a daemon and keeps monitoring the ongoing connections. It records the IP addresses and port numbers of the hosts with which the local host is communicating. The recorded information is sent to a processing center that determines the danger levels of the corresponding hosts. If the processing center determines that a recorded host is suspicious, then it reports back to
the local host so that PAIDS can raise alerts and notify the user. The processing center makes the decision by collecting the information of suspicious hosts on the Internet and performing clustering operations to model the danger level of each area. The clustering is based on proximity information associated with the suspicious hosts. Note that proximity is not necessarily limited to geographical and network address dimensions and can also occur in other dimensions such as DNS. In this work, we present the design of PAIDS only in the context of the geographical dimension for the purpose of simplicity. However, PAIDS can be easily changed to utilize other dimensions.

Figure 2: Active time distribution of the Code Red v2 worm. 0−5 min: 12%; 5−10 min: 2%; 10−15 min: 2%; 15−20 min: 2%; 20−25 min: 1%; 25−30 min: 1%; >30 min: 80%.

The operations of PAIDS involve the cooperation of several entities (shown in Figure 3). Firstly, it requires the local host to report its communicating hosts (referred to as “communication neighbors”) to a processing center. The communication components of this reporting service are referred to as Remote Connection Monitor (RCM) and Remote Connection Attendant (RCA). Secondly, the processing center is referred to as Security-Level Processing Engine (SLPE). SLPE is the entity that determines the danger level of each of the reported neighbors from the local host. The determination is based on proximity correlations, i.e., the closer the communicating host is to the outbreak area, the greater the chance that it might conduct malicious activities. The proximity correlations can be computed using various geographic mapping techniques [27]. One such technique is the IP-to-location service [19]. The SLPE can be implemented at a place convenient to the local hosts, e.g., a dedicated machine of a group or an institute.
Thirdly, it requires a suspicious-activity monitoring service that typically runs a honeynet to attract worms and collects related information. In this paper we refer to the monitoring service as Suspicious Hosts Sniffing (SHS). SHS serves no other purpose than simply recording suspicious connections reaching the honeynet. The recorded information includes at least the IP addresses and the port numbers of the suspicious connections. Such information will expose the outbreak of new worms, since worms have to propagate to other hosts after they are released.

PAIDS addresses the shortcomings of signature-based and anomaly-based IDSes with a mechanism based on proximity information. It deals with the unavailability of worm signatures by using the notion of suspicious hosts, i.e., the hosts that are captured by the honeynet when they attempt to connect to it. Specifically, if a host is compromised, it will attempt to infect other hosts. If it falls into the view of the monitoring service, it is recorded. Since compromised hosts typically stay active for a considerably long time, such a “black-list” style treatment will bring benefits. Moreover, since PAIDS is not based on the known patterns of normal behaviors, it does not have the drawbacks of the anomaly-based approach.

PAIDS addresses the drawbacks of network-based and host-based approaches by using a processing center, which is effectively a third layer between a global information center and a local host. Introducing such a layer can bring benefits by striking a balance between the advantages and
disadvantages associated with these two types of IDSes. Since hosts are more likely to trust a local processing center and present detailed information to the latter, such an undertaking can bridge the trust gap between these two types of IDSes. In other words, the processing center can serve as the intermediate proxy for the information exchanges between the global information center and local hosts. The processing center works in two directions. On one hand, it collects the information from local hosts and reports filtered information to the global center. On the other hand, it obtains information from the global center and determines the danger level of reported connections on behalf of local hosts. Other than the trust benefit, introducing the processing center can also help relieve the scalability concern embedded in alternative two-layer approaches.

Figure 3: PAIDS Architecture

3.2 Deployment model of PAIDS

The design of PAIDS is not intended to replace any other type of IDS. Thus, it is inappropriate to compare PAIDS directly with other IDSes. Instead, it complements other IDSes by working orthogonally with them. For instance, PAIDS can be combined with an anomaly-based IDS. One way to achieve such a combination is to let PAIDS adjust the threshold values usually used in anomaly-based IDSes. An anomaly-based IDS typically sets certain “threshold” values to determine whether a connection is suspicious or not. These values are difficult to determine in many cases because the determination has to strike a balance between false positives and false negatives. With PAIDS, the threshold values can be adjusted according to the danger levels measured by PAIDS. For example, if a connection involves a host that is within an outbreak area and thus has a higher danger level, the threshold values can be adjusted to reflect such information.
Such treatments are possible since PAIDS utilizes the proximity information that other IDSes typically do not consider.

We now use one example scenario to elaborate how PAIDS can work cooperatively with other existing IDSes. We choose an anomaly-based IDS simply for ease of explanation. Note that such a scenario should not be treated as the only way in which PAIDS can work with other IDSes. The anomaly-based IDS is Swaddler [7]. Swaddler supports two modes: training and detection. During the training mode, suitable thresholds are derived to represent the anomalous scores. The system then switches to the detection mode, and anomalous states can be reported based on the derived thresholds. PAIDS can assist the setting of the threshold values by replacing each threshold with a pair of threshold values, one for safe neighbors and the other for suspicious neighbors. The suspicious neighbors are the hosts causing PAIDS to raise alerts, whereas the safe neighbors are those not triggering alerts. Intuitively, the threshold value for safe neighbors should be higher than the value for suspicious neighbors since the suspicious ones require stricter screening. Alternatively, PAIDS may help the threshold setting in a finer way by assigning a range of values. In other words, PAIDS may define a threshold value function Th = g(x), where x is the danger level output by PAIDS. With such a function, the threshold value is not fixed across all hosts; instead, it is adjusted according to the danger level of the corresponding host.

3.3 Software architecture and components

The software architecture of PAIDS is shown in Figure 3. PAIDS consists of two main entities: a PAIDS client and a PAIDS server. It also assumes the existence of a third entity, namely the SHS (Suspicious Hosts Sniffing). The PAIDS client runs two software components: an RCM (Remote Connection Monitor) and a Danger Level Frontend.
The first component, RCM, keeps track of ongoing connections on the local host and reports the connection list to the PAIDS server. The second component can be as simple as a typical browser such as Internet Explorer or Firefox, but may also include more functionality, such as giving the user ways to close certain connections. The PAIDS server consists of two parts: an RCA (Remote Connection Attendant) and an SLPE (Security-Level Processing Engine). The purpose of RCA is simply to receive the reported connection list from RCM. SLPE is the core part of PAIDS. It collects information from both RCA and
SHS, determines the danger level of each reported connection using geographic mapping techniques, and then sends the relevant information back to the PAIDS client.

We now elaborate on the three major components of PAIDS:

• RCM and RCA. RCM and RCA are used to report the local host’s communicating neighbors to the SLPE. Specifically, RCM runs on the local host side, whereas RCA runs on the server side. The basic functionalities of RCM are to monitor and report. RCM keeps track of all the ongoing connections of the local host and is capable of reporting to RCA periodically (i.e., pushing) or in an on-demand fashion (i.e., pulling). The major advantage of pushing information to RCA is timeliness. However, it incurs higher communication and processing overhead. In contrast, pulling information from RCM requires less overhead, but it inflates the information gathering time. RCA collects connection information from RCM. Under the pushing model, RCA is simply a server daemon collecting the reports of RCM. Under the pulling model, RCA proactively requests the information from RCM.

• Security-Level Processing Engine (SLPE). The SLPE is the key component of PAIDS, determining the danger level of the reported neighbors using the IP-to-location service. The determination is based on the distance between a neighbor and the outbreak areas. Formally, given a set of outbreak areas S = {S_1, S_2, ..., S_I} and a list of neighbor hosts H = {H_1, H_2, ..., H_J}, SLPE will output a danger level vector L = {L_1, L_2, ..., L_J}, where L_i is the danger level of neighbor host H_i. Specifically, let d_{i,j} denote the distance between H_i and S_j. The danger level of H_i is given by a function f(d_{i,j}, S). The design of f(d_{i,j}, S) by itself is an important and interesting problem. Due to space limitations, we only provide two simple forms. The first form is the average value of all d_{i,j} over S_j:

\[ L_i = \frac{1}{I} \sum_{j=1}^{I} d_{i,j}. \qquad (1) \]

The second form simply chooses the minimum value of all d_{i,j}, i.e., L_i = \min_j \{d_{i,j}\}. Both forms have advantages and disadvantages. Unless explicitly noted, in the following we only use the first form. If SHS also reports the normalized intensity of each outbreak area, the above equation becomes

\[ L_i = \frac{1}{I} \sum_{j=1}^{I} d_{i,j} t_j, \qquad (2) \]

where t_j is the normalized intensity of S_j and \sum_{j=1}^{I} t_j = 1.

After computing the danger level of a neighboring host, SLPE compares its danger level value to a pre-defined threshold value T_d. If the danger level value is below the threshold, which implies that the host is sufficiently close to the outbreak area, the host will be flagged as suspicious.

One important design issue of SLPE is scalability. As the number of compromised hosts increases, the processing overhead on SLPE also increases. To reduce the processing overhead, SLPE performs clustering of sniffed suspicious hosts. Clustering can effectively improve scalability by organizing compromised hosts into a list of outbreak areas. The list of outbreak areas is referred to as the Security Vector (SV), which essentially records the location and the intensity of the outbreak areas. There are many ways to perform clustering, and one popular way is to use K-means. With K-means, the number of outbreak areas is fixed to K irrespective of the number of compromised hosts, and K is in practice much smaller than the number of compromised hosts.

• Suspicious Hosts Sniffing (SHS). SHS is responsible for collecting the activities of suspicious hosts and reporting to SLPE. The implementation of SHS can be as simple as an individual honeypot or a more complicated form such as a honeynet [29]. Since SHS typically does not have legitimate hosts, SHS records suspicious hosts whenever it receives connection requests from other hosts. The information that SHS records can be simply a list of IP addresses or both IP addresses and port numbers.
To take full advantage of the suspicious activities, SHS may also record the intensity of the threat by counting the occurrence frequencies of specific connections. Since there might be a significant number of suspicious activities, SHS may aggregate individual hosts into suspicious outbreak areas. In other words, each outbreak area may contain many hosts and can be represented in the format <C, r, t>, namely, the center C, the radius r, and the outbreak intensity t.

3.4 Pseudo-code of main components

The main operations of the components of PAIDS are shown with pseudo-code in Figure 4. Briefly, (i) SHS keeps monitoring the abnormal activities of the network and records the information of the corresponding host. The recorded information includes the IP address, port number, security intensity, and so on. The information is periodically reported to SLPE for further processing. (ii) SLPE gathers the reported information about suspicious hosts and performs clustering operations to aggregate the security information. The output of clustering is a security vector SV, which maintains the clustered security regions. Each of the regions is an aggregation of a set of suspicious hosts. The clustering can be achieved by employing any clustering algorithm such as K-means. (iii) RCA receives the request from RCM and responds with the determined danger level. The request consists of the information about a host whose danger level needs to be determined. RCA calculates the danger level based on the SV maintained by SLPE. Because of such reliance, it is better to have SLPE and RCA co-located on the same machine. (iv) RCM is run locally by users and is triggered every time a new connection is set up. RCM extracts the information of the communication neighbor and asks RCA for the danger level of the neighbor.
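As a complement to the pseudo-code, the danger-level step that RCA performs in (iii) can be sketched in Python using the two forms of f(d_{i,j}, S) from Section 3.3 and the intensity-weighted variant of Equation (2). This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, outbreak areas are reduced to their centers, and the great-circle (haversine) distance in kilometers is assumed as the distance measure because the paper does not specify one.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed unit; the paper does not state the distance unit

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))

def danger_level(host, centers, form="average", weights=None, dist=haversine_km):
    """Danger level L_i of one neighbor host with respect to the outbreak areas.

    centers: centers of the outbreak areas S_1..S_I (hypothetical representation).
    form:    "average" for Eq. (1) / Eq. (2), "min" for the minimum form.
    weights: optional normalized intensities t_j (summing to 1) for Eq. (2).
    """
    d = [dist(host, c) for c in centers]
    if not d:
        raise ValueError("at least one outbreak area is required")
    if form == "min":
        return min(d)
    if weights is None:
        return sum(d) / len(d)                              # Eq. (1)
    return sum(x * t for x, t in zip(d, weights)) / len(d)  # Eq. (2)

def is_suspicious(level, threshold):
    """Flag the host when its level does not exceed T_d (the <= rule of Section 4.1)."""
    return level <= threshold
```

The `dist` parameter is left pluggable so that the same code could operate on another proximity dimension, such as distance in the network address space, which the paper mentions as an alternative.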
(a) Component of SHS
Variables
    H_i: A suspicious host
Keep monitoring network activities
    If the activity (involving H_i) is suspicious
        Record information of H_i (IP, port, etc.)
        Report information of H_i to SLPE
    End
End

(b) Component of SLPE
Variables
    H_i: A suspicious host
    SV: Security vector maintained by SLPE
Keep gathering reports from SHS
    Enqueue new suspicious host H_i information
    Obtain the proximity information of H_i
    Re-cluster by incorporating H_i
    Output a new SV
End

(c) Component of RCA
Variables
    H_j: A host whose danger level needs to be determined
    SV: Security vector maintained by SLPE
Keep waiting for RCM requests
    Obtain information (IP, port, etc.) of the corresponding H_j
    Determine the danger level of H_j based on SV
    Respond to RCM
End

(d) Component of RCM
Variables
    H_j: A host whose danger level needs to be determined
Keep monitoring connections on local host
    Extract information (IP, port, etc.) of the neighbor H_j
    Report H_j to RCA
    Get response from RCA
    Determine actions to take for the connection
End

Figure 4: Pseudo-code: (a) SHS, (b) SLPE, (c) RCA, (d) RCM

4 Evaluation

PAIDS works complementarily with other IDSes such as anomaly-based ones. Thus, a thorough evaluation of PAIDS requires integration with another IDS, either during the live propagation of an active worm or with the complete traces of that IDS on the Internet. Partly due to the lack of such traces, we turn to simulations at this stage, and we view the abovementioned integrated evaluation as future work. Specifically, in this work we evaluate the effectiveness of PAIDS by performing simulations with the propagation traces of a recorded worm from CAIDA [2].

4.1 Trace-driven simulations

The simulation uses the outbreak trace of the Code Red v2 worm from CAIDA. We focus on two performance metrics: detection rate and false positive rate. The simulation is performed as follows.
First, we sort all the recorded compromised hosts according to their starting time, i.e., the time they are observed to begin to infect other hosts. The method is illustrated in Figure 5, where all hosts are ranked along the time line. For each of the hosts, we assume that its infectious activities can be sniffed by SHS and that its information is sent to SLPE after a certain latency. This latency is referred to as the Sniffing Latency, denoted by SL. Let C_i denote the set of all compromised hosts when host H_i is compromised, and we have C_i = {H_j, j ≤ i}. Let H_m denote the last host that is at least SL time ahead of the infection of H_i. We use R_i to denote the set of all recorded hosts before host H_i, i.e., R_i = {H_j, j ≤ m}. We assume that the honeynet can provide R_i immediately, but cannot provide the signatures and the patterns in a short time. With these notations, C_i contains all hosts that are actually compromised when H_i is compromised, whereas R_i contains all hosts in C_i that are sniffed or recorded by SHS. In other words, R_i is only a subset of C_i.

Figure 5: Illustration of Trace-driven Evaluation

Since the trace from CAIDA does not provide the information of “who infects whom”, we make several assumptions to evaluate the performance of PAIDS. For a host H_v, we assume that PAIDS is running on this host and reporting suspicious hosts with which the host is communicating. Moreover, we assume that H_v is communicating with H_i. We then measure the effectiveness of PAIDS by monitoring how well PAIDS can detect the infected host H_i. That is, if PAIDS was running on host H_v when the host H_i was infected and began infecting H_v, then PAIDS running on H_v may be able to determine the danger level of H_i based on the information of R_i. If PAIDS is able to raise the alert, then it is considered to be effective. Otherwise, it is considered to be ineffective.
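The construction of C_i and R_i above can be expressed compactly once the hosts are sorted by starting time. The following Python sketch assumes the trace has already been reduced to a sorted list of starting times (in seconds) with a parallel list of host identifiers; the function names and this data layout are assumptions made for illustration.

```python
from bisect import bisect_right

def compromised_hosts(hosts, i):
    """C_i: every host compromised up to and including host i (0-based index)."""
    return hosts[:i + 1]

def recorded_hosts(hosts, start_times, i, sniffing_latency):
    """R_i: hosts whose starting time is at least sniffing_latency before host i's.

    start_times must be sorted in ascending order and aligned with hosts.
    """
    cutoff = start_times[i] - sniffing_latency
    # Among hosts 0..i-1, count those that started no later than the cutoff.
    m = bisect_right(start_times, cutoff, 0, i)
    return hosts[:m]
```

Because R_i is a prefix of the sorted host list, a binary search finds its boundary in logarithmic time, so sweeping over the first 10,000 hosts stays inexpensive even for a large trace.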
Since R_i is only a subset of C_i (i.e., R_i ⊂ C_i), PAIDS running on H_v may or may not be able to raise an alert depending on the current threshold value. Specifically, we compute the averaged distance of H_i to R_i, i.e.,

\[ L_i = \frac{1}{|R_i|} \sum_{j} d_{H_i,H_j}, \quad \text{where } H_j \in R_i. \]
Note that such a scenario provides a worst case for PAIDS since we consider the time slot right after host H_i is infected, when the size of the SHS set R_i is smallest.

Figure 6: Effectiveness of PAIDS (Sniffing Latency = 100 seconds). Panels: (a) Threshold = 10, (b) Threshold = 15, (c) Threshold = 20. x-axis: Number of Compromised Hosts; y-axis: Percentage (%); series: Detection Rate, False Positive Rate.

After L_i is obtained, it is then compared to T_d, a pre-configured threshold value. If L_i is smaller than or equal to T_d, then host H_i is considered to be in danger, and the corresponding effectiveness value is set to 1. Otherwise, if L_i is larger, then the effectiveness value is set to 0, as illustrated below:

\[ Alert_i = \begin{cases} 1, & \text{if } L_i \le T_d; \\ 0, & \text{otherwise.} \end{cases} \]

We measure the effectiveness of PAIDS for the first 10,000 infected hosts. For every 500 hosts, we output a detection rate that is calculated based on the number of alerts raised. Specifically, for each of the 500 hosts, if PAIDS can raise the alert, we count 1; otherwise, 0. Then we obtain the detection rate by dividing the sum of alerts by 500. Hence, denoting the number of alerts as Alert, the effectiveness of PAIDS is calculated as Alert/500.

The false positive rate is obtained by considering a randomized scenario. Specifically, we generate a scenario by randomly choosing a host that is not compromised and is communicating with host H_v. That is, we introduce a host with a random IP address and use this host to replace the host H_i in each step. As a result, the randomly selected neighbor represents the false positive.
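The alert rule and the per-500-host rate described above amount to a threshold test followed by a windowed average. The Python sketch below illustrates that bookkeeping; the function names are hypothetical, and the input is assumed to be a list of danger levels L_i already computed for successive hosts.

```python
def alert(level, threshold):
    """Alert_i = 1 if L_i <= T_d, else 0."""
    return 1 if level <= threshold else 0

def windowed_rate(levels, threshold, window=500):
    """Alert rate per consecutive block of `window` hosts.

    levels: danger levels L_i in trace order. A trailing block shorter
    than `window` is dropped rather than averaged.
    """
    rates = []
    for start in range(0, len(levels) - window + 1, window):
        block = levels[start:start + window]
        rates.append(sum(alert(x, threshold) for x in block) / window)
    return rates
```

Feeding the function the danger levels of the compromised hosts H_i gives the detection rate per block, while feeding it the levels of the randomly introduced hosts gives the false positive rate for the same blocks.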
Using the same method as described in calculating the detection rate, we also obtain the false positive rate.

Figure 7: Impact of Thresholds. x-axis: Threshold; y-axis: Percentage (%).

4.2 Results

• Impact of T_d. We study the impact of T_d by varying its values. We set SL = 100 seconds and test three cases: T_d = 10, 15, and 20. We consider 10,000 hosts in total, and Figure 6 shows the results. In the figure, we observe two trends. Firstly, for randomly introduced hosts, only very few false positives are raised, whereas for compromised hosts, PAIDS can raise a very high percentage of alerts. For example, with T_d = 15, about 75% of compromised hosts are detected. Secondly, the effectiveness increases with higher values of T_d. These results are expected since a higher T_d means that more neighbor hosts are treated as suspicious, thus reducing the false negatives.

To study the trend more closely, we vary T_d from 10 to 40 and show the results in Figure 7. We can see that T_d = 10 gives a detection rate of only 51%, and the rate quickly reaches 91% when T_d = 20. When T_d = 40, PAIDS can detect up to 96% of compromised hosts. The false positive rates for all results shown in the figure are almost zero.

• Impact of SL. We also vary the Sniffing Latency value SL from 25 seconds to 150 seconds and evaluate the impact. The results are shown in Figure 8. Note that in these experiments, the threshold is 15 (i.e., T_d = 15). We observe that as SL increases, the detection rate of PAIDS decreases. For instance, with SL = 25 seconds, the detection rate is about 79%, while with SL = 150 seconds, the rate drops to about 74%. This is because a larger SL value means that it takes a longer time for SHS to learn the information about compromised hosts.

We further test the impact of SL with varying values and show the results in Figure 9. We can see that as SL increases from 25 seconds to 200 seconds, the effectiveness only drops from about 79% to 74%.
These results imply that the effectiveness of PAIDS is relatively insensitive to the changes of SL, which is an advantage of PAIDS.

• Impact of f(d_{i,j}, S). As we elaborated in Section 3, the choice of f(d_{i,j}, S), the danger level function for a host H_i, also needs careful design. We have proposed two forms of this function. In all the above evaluations we use the average form, which calculates the danger level by averaging over all d_{i,j}. We also preliminarily evaluate the impact of the other form, which takes the minimum value
of all d_{i,j}. Since the second form of f(d_{i,j}, S) takes the minimum value of all d_{i,j}, the value of f(d_{i,j}, S) is much smaller than that of the first form. We choose three threshold values for T_d, namely, 0.01, 0.05, and 0.1. The results are shown in Figure 10. We see that as T_d increases, the detection rates also gradually increase, and with T_d = 0.1 the rate almost reaches 100%. The false positive rates are kept very low. Specifically, our results indicate that the highest false positive rate is only 1.3%. The interesting observation regarding the false positive rate is that the rate increases as more hosts are compromised. Part of the reason for such a result is that with an increasing number of hosts, the randomly selected hosts have a higher probability of being close to outbreak areas.

Figure 8: Effectiveness of PAIDS (Threshold = 15). Panels: (a) Sniffing Latency = 25 seconds, (b) Sniffing Latency = 50 seconds, (c) Sniffing Latency = 150 seconds. x-axis: Number of Compromised Hosts; y-axis: Percentage (%); series: Detection Rate, False Positive Rate.

Figure 9: Impact of Sniffing Latency. x-axis: Sniffing Latency; y-axis: Percentage (%).

Figure 10: Impact of Danger Level Function. x-axis: Number of Compromised Hosts; y-axis: Percentage (%); series: Detection Rate and False Positive Rate for Threshold = 0.01, 0.05, and 0.1.

It would be interesting to study which form of f(d_{i,j}, S) should be applied under what conditions.
Although the page limit does not allow us to present the details, in short, our results suggest that the average form has the better false positive rate (all are zero) but a comparatively lower detection rate, whereas the minimum form has a non-zero false positive rate (though very small) but a higher detection rate.

5 Prototype

To demonstrate that PAIDS is feasible, we implement a proof-of-concept prototype. The prototype uses Google Maps APIs [14] and the libpcap library [23] on the client side and our implemented IP-to-location service [18] on the server side.

The prototype demonstrates the power of PAIDS by enhancing users’ intervention with PAIDS. The enhancement works by monitoring the ongoing connections and then identifying suspicious hosts that are located in or near outbreak areas. Once PAIDS identifies a suspicious host, it will raise the alert and inform the users, who then take appropriate actions. With this prototype, PAIDS complements the default user intervention model by differentiating the neighboring hosts based on their danger levels.

On the client side, the prototype first implements a tool that helps normal users better understand their ongoing connections. The tool utilizes the Google Maps APIs. With the assistance of the security information, the tool will raise alarms to the users when they are communicating with hosts located in outbreak areas. On the processing center side, the prototype identifies suspicious hosts that are located in or near outbreak areas. By computing the distances between the outbreak areas and these remote hosts, it determines the danger levels of those computers. If the danger level is high enough, the prototype will warn the user and present the threat information (such as the remote hosts’ location, compromised areas, running protocols, and the danger level) in a visualized form. Based on such information, the user can take further actions.
The client side also runs a monitoring daemon (i.e., RCM), which keeps track of the ongoing connections of the client
  • 10. and periodically reports the connection list to the process- ing center attendant daemon (i.e., RCA). RCA is imple- mented using the libpcap library for Linux hosts and con- tinuously monitors the connections and refreshes the host list in two ways. It periodically refreshes the host list ev- ery 30 seconds, and under the user’s request, it can also refresh immediately. The processing center runs a SLPE instance that calls the IP-to-location service running on the same center. Af- ter receiving the updated host list from the client and the compromised clusters from the worm-sniffing service (i.e., SHS), SLPE calls the IP-to-location service to compute the distance between each of the client neighbors and the known compromised areas. For simplicity, the prototype assumes the presence of a SHS service and emulates the SHS by a preloaded file that consists a suspicious IP list. After all danger levels are known, client neighbors are ranked according to the distance and assigned with danger levels. The information is written to a file that will be read by the Maps-generating program. The Maps-generating program uses PHP scripts and will show the geographical locations as well as the danger level associated with each host once requested by the client. The IP-to-location service is implemented using the database provided by [18]. It contains millions of ASes and the corresponding information such as latitudes and longi- tudes. Given an IP address, it looks up the corresponding AS and extracts the latitude and longitude information, which is used to calculate the geographical distances be- tween hosts. The database is updated on a daily basis. 6 Discussion The effectiveness of PAIDS partly relies on the clustered pattern of worm propagation. In this section, we discuss several important issues relevant to the effectiveness of PAIDS. 
These issues include: (i) design issues of suspicious hosts sniffing (SHS), (ii) stages of worm propagation, (iii) non-prioritized candidate selection, (iv) mismatch between geographical proximity information and networks, and (v) clustering of normal Internet hosts.

6.1 Design issues of suspicious hosts sniffing (SHS)

In the PAIDS design, we assume that infected hosts can be sniffed by SHS with a certain delay. How strong this assumption is depends on the ability of honeynets to capture suspicious worm activities on the Internet. For large honeypots or honeynets, the assumption is reasonable, whereas for small honeynets it is indeed a strong one. Small honeynets may take a long time to detect infected hosts, which results in a set Ri that is smaller than Ci. However, as we observed in the evaluation, PAIDS is relatively insensitive to the sniffing latency SL. In summary, an appropriate deployment of PAIDS should also consider the honeynets in use, since their capability affects the sniffing latency.

6.2 Stages of worm propagation

PAIDS is expected to work well when the compromised hosts form clusters. However, as a propagation instance goes on, the number of compromised hosts eventually grows large, and the hosts may no longer exhibit an apparent clustered pattern. For example, if half of the Internet is infected, it would be very difficult to extract a usable clustered pattern from these hosts. Thus, PAIDS works better when a worm has just started to propagate and the number of compromised hosts is relatively small. We argue that this feature is not a drawback of PAIDS but precisely its advantage. First, when a new worm is created and released, it is very difficult to quickly learn the signature of the worm. Thus, being capable of detecting such worms at their initial propagation stages is of vital importance.
Second, when a considerable number of hosts are compromised, existing IDSes, which PAIDS complements, will extract the characteristic patterns of worms and be able to detect them. In other words, an IDS that combines PAIDS and a conventional IDS should achieve much better performance, since the two work complementarily.

6.3 Non-prioritized candidate selection

Worms propagate in a clustered fashion because nearby hosts offer several advantages, including similar vulnerabilities, low latency, and the absence of firewalls. The clustering pattern is more apparent when an infected host chooses candidate hosts with localized scanning (LS). That is, LS gives higher priority to local hosts and thus amplifies the clustering behavior of infected hosts. Examples of worms using LS include the Code Red II and Nimda worms. By comparison, if worms choose candidate hosts with random scanning (RS), the clustered pattern may be less apparent. Interestingly, the Code Red V2 worm that we studied in Section 2 uses RS to propagate, but the trace still shows a highly clustered pattern. We suspect that part of the reason is the similar vulnerabilities present in local networks. We envision that worms with LS will exhibit an even higher degree of clustering.

Another issue regarding candidate selection is “flash worms” [34], which hard-code a list of hosts with known vulnerabilities in the worm. Since hosts in such a list may not reside in clusters, the propagation in turn might not exhibit an obvious clustered pattern. However, since the number of encoded hosts is typically limited by constraints such as the code size, worms will soon need to rely on other ways to select candidate hosts to infect. Hence, the hard-coded host list only affects the initial “seeds” of the propagation of a worm.
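The difference between the two candidate-selection strategies discussed in this section can be illustrated with a small address-selection simulation. This is a sketch under stated assumptions: it only generates target addresses in memory and sends no traffic, the preference probabilities p_same_8 and p_same_16 are illustrative values rather than parameters taken from this paper, and all function names are hypothetical.

```python
import ipaddress
import random


def random_scan_target(rng):
    """RS: pick any IPv4 address uniformly at random."""
    return ipaddress.IPv4Address(rng.getrandbits(32))


def localized_scan_target(own_ip, rng, p_same_8=0.5, p_same_16=0.375):
    """LS: prefer addresses that share the scanner's /8 or /16 prefix.

    With probability p_same_16 the target keeps the scanner's /16 prefix,
    with probability p_same_8 it keeps only the /8 prefix, and otherwise
    it is chosen uniformly at random (probabilities are illustrative).
    """
    own = int(ipaddress.IPv4Address(own_ip))
    rand = rng.getrandbits(32)
    r = rng.random()
    if r < p_same_16:
        return ipaddress.IPv4Address((own & 0xFFFF0000) | (rand & 0x0000FFFF))
    if r < p_same_16 + p_same_8:
        return ipaddress.IPv4Address((own & 0xFF000000) | (rand & 0x00FFFFFF))
    return ipaddress.IPv4Address(rand)


def same_prefix_fraction(own_ip, targets, prefix_len):
    """Fraction of targets that fall inside the scanner's own prefix."""
    net = ipaddress.ip_network(f"{own_ip}/{prefix_len}", strict=False)
    return sum(t in net for t in targets) / len(targets)


if __name__ == "__main__":
    rng = random.Random(0)
    own = "192.0.2.10"  # documentation address used as the scanner
    ls = [localized_scan_target(own, rng) for _ in range(2000)]
    rs = [random_scan_target(rng) for _ in range(2000)]
    print("LS targets in own /8:", same_prefix_fraction(own, ls, 8))
    print("RS targets in own /8:", same_prefix_fraction(own, rs, 8))
```

Under LS most targets stay inside the scanner's own prefix, whereas under RS only about 1 in 256 targets lands in the same /8, which is why LS is expected to produce a more pronounced clustered pattern.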

6.4 Mismatch between geographical proximity information and networks

One assumption made by PAIDS is that a network address maps approximately to a geographical location. Since there may be a mismatch between proximity information and network addresses, one may argue that PAIDS might not work when such mismatches occur. On one hand, geographically close hosts may not be in the same network. On the other hand, IP addresses that appear to be in the same network may be located far away from each other because of techniques such as VPNs. Although such instances do occur, they do not seriously reduce the effectiveness of PAIDS, for several reasons. First, the fact that geographically close hosts may reside in different networks does not affect PAIDS, since PAIDS maps network addresses to proximity information rather than the other way around. Second, VPN-like instances are relatively rare. In other words, most of the time, hosts in the same network are located close to each other. Third, even if such a mismatch occurs, PAIDS may still address the issue to a great extent because it records the individual IP addresses of suspicious hosts and uses this information to determine danger levels. Fourth, as we mentioned in Section 3, PAIDS can also use proximity information along dimensions other than the geographical one. Specifically, the mismatch can be addressed by combining network-proximity information in an appropriate way.

6.5 Clustering of normal Internet hosts

Research has shown that normal Internet hosts form clusters [20]. This clustering behavior has an important impact on the effectiveness of PAIDS in the form of increased or decreased false positives and false negatives. To better understand the impact, consider a simple scenario with only two areas of equal physical size, A and B.
We further assume that area A is a big city with a higher population and B is a rural region. Thus, A naturally has more Internet hosts and is clustered to a higher degree. There are two cases of worm propagation, since the worm propagation cluster can form in either A or B. If the propagation cluster occurs in area A, then PAIDS will report more false positives, since normal hosts in A are more likely to be flagged. On the other hand, if the propagation cluster occurs in area B, then PAIDS will report more false negatives and fewer false positives, because hosts in B are more likely to cause false negatives for similar reasons. We note that the impact of the clustering of normal Internet hosts is important and needs to be taken into consideration in the future design of PAIDS.

7 Conclusions

In this work, we have studied the problem of intrusion detection and focused on the detection of unidentified worms. We have observed the failures and drawbacks of existing IDSes and approached the problem in a novel way. Our design is based on the typical behaviors of worm outbreaks. By utilizing the proximity and activity information of worm outbreaks, we have proposed a detection system, called PAIDS, that is complementary to existing IDSes. Our trace-driven simulation shows that PAIDS can detect an unidentified worm with a high detection rate and a low false positive rate. We have also built a prototype of PAIDS to demonstrate its usage.

8 Acknowledgements

We thank Dr. Jonathon Giffin at Georgia Institute of Technology and the anonymous reviewers for their insightful comments that helped us improve the quality of this research.

REFERENCES

[1] S. Bhatkar, A. Chaturvedi, and R. Sekar. Dataflow anomaly detection. In SP ’06: Proceedings of the 2006 IEEE Symposium on Security and Privacy, pages 48–62, Washington, DC, USA, 2006. IEEE Computer Society.
[2] CAIDA. The cooperative association for internet data analysis. In http://www.caida.org/home/.
[3] R. Cathey, L. Ma, N. Goharian, and D. Grossman. Misuse detection for information retrieval systems. In CIKM ’03: Proceedings of the twelfth international conference on Information and knowledge management, 2003.
[4] Z. Chen and C. Ji. Measuring network-aware worm spreading ability. In Proceedings of IEEE INFOCOM, pages 116–124, Anchorage, AK, USA, 2007.
[5] Z. Chen, C. Ji, and B. Paul. Spatial-temporal characteristics of internet malicious sources. In Proceedings of IEEE INFOCOM (Mini-Conference), Phoenix, AZ, USA, 2008.
[6] C. Y. Chung, M. Gertz, and K. Levitt. Demids: A misuse detection system for database systems. In Proceedings of the Third International IFIP TC-11 WG11.5 Working Conference on Integrity and Internal Control in Information Systems, pages 159–178. Kluwer Academic Publishers, 1999.
[7] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An approach for the anomaly-based detection of state violations in web applications. In Proceedings of RAID, pages 63–86, Queensland, Australia, September 5–7, 2007.
[8] H. Debar, M. Dacier, and A. Wespi. A revised taxonomy for intrusion detection systems. Technical report, IBM Research, October 1999.
[9] D. E. Denning. An intrusion-detection model. IEEE Trans. Softw. Eng., 13(2):222–232, 1987.
[10] H. Dreger, A. Feldmann, M. Mai, V. Paxson, and R. Sommer. Dynamic application-layer protocol analysis for network intrusion detection. In Proceedings of USENIX-SS’06, Berkeley, CA, USA, 2006. USENIX Association.
[11] P. Fogla and W. Lee. Evading network anomaly detection systems: formal reasoning and practical techniques. In CCS ’06: Proceedings of the 13th ACM conference on Computer and communications security, pages 59–68, New York, NY, USA, 2006. ACM.
[12] J. T. Giffin, D. Dagon, S. Jha, W. Lee, and B. P. Miller. Environment-sensitive intrusion detection. In Proceedings of RAID, pages 185–206, 2005.
[13] J. M. González and V. Paxson. Enhancing network intrusion detection with integrated sampling and filtering. In Proceedings of RAID, pages 272–289, 2006.
[14] Google. Google maps api. In http://code.google.com/apis/maps/.
[15] R. Gopalakrishna, E. H. Spafford, and J. Vitek. Efficient intrusion detection using automaton inlining. In SP ’05: Proceedings of the 2005 IEEE Symposium on Security and Privacy, pages 18–31, Washington, DC, USA, 2005. IEEE Computer Society.
[16] A. K. Ghosh, J. Wanken, and F. Charron. Detecting anomalous and unknown intrusions against programs. In ACSAC ’98: Proceedings of the 14th Annual Computer Security Applications Conference, page 259, Washington, DC, USA, 1998. IEEE Computer Society.
[17] H. Bos and K. Huang. Towards software-based signature detection for intrusion prevention on the network card. In Proceedings of RAID, Seattle, WA, USA, 2005. ACM.
[18] Hostip. Ip address lookup. In http://www.hostip.info/dl/index.html.
[19] IP2Location. Geolocation ip address to country city region latitude longitude. In http://www.ip2location.com/.
[20] B. Krishnamurthy and J. Wang. On network-aware clustering of web clients. SIGCOMM Comput. Commun. Rev., 30(4):97–110, 2000.
[21] C. Krügel and T. Toth. Using decision trees to improve signature-based intrusion detection. In Proceedings of RAID, pages 173–191, 2003.
[22] T. Lane and C. E. Brodley. Temporal sequence learning and data reduction for anomaly detection. ACM Trans. Inf. Syst. Secur., 2(3):295–331, 1999.
[23] Libpcap. The libpcap project. In http://sourceforge.net/projects/libpcap/.
[24] U. Lindqvist and P. A. Porras. Detecting computer and network misuse through the production-based expert system toolset (p-BEST). In IEEE Symposium on Security and Privacy, pages 146–161, 1999.
[25] C. Muelder, K.-L. Ma, and T. Bartoletti. Interactive visualization for network and port scan detection. In Proceedings of RAID, pages 265–283, 2005.
[26] D. Mutz, W. K. Robertson, G. Vigna, and R. A. Kemmerer. Exploiting execution context for the detection of anomalous system calls. In Proceedings of RAID, pages 1–20, 2007.
[27] V. N. Padmanabhan and L. Subramanian. An investigation of geographic mapping techniques for internet hosts. In SIGCOMM ’01, pages 173–185, New York, NY, USA, 2001. ACM.
[28] M. Polychronakis, K. G. Anagnostakis, and E. P. Markatos. Emulation-based detection of non-self-contained polymorphic shellcode. In Proceedings of RAID, volume 4637, pages 87–106. Springer, 2007.
[29] M. A. Rajab, F. Monrose, and A. Terzis. On the effectiveness of distributed worm monitoring. In SSYM’05: Proceedings of the 14th conference on USENIX Security Symposium, pages 15–15, Baltimore, MD, USA, 2005. USENIX Association.
[30] S. Rubin, S. Jha, and B. P. Miller. Language-based generation and evaluation of nids signatures. In SP ’05: Proceedings of the 2005 IEEE Symposium on Security and Privacy, pages 3–17, Washington, DC, USA, 2005. IEEE Computer Society.
[31] M. I. Sharif, K. Singh, J. T. Giffin, and W. Lee. Understanding precision in host based intrusion detection. In Proceedings of RAID, pages 21–41, 2007.
[32] S. Sinha, F. Jahanian, and J. M. Patel. Wind: Workload-aware intrusion detection. In Proceedings of RAID, pages 290–310, 2006.
[33] A. Soule, K. Salamatian, and N. Taft. Combining filtering and statistical methods for anomaly detection. In IMC’05: Proceedings of the Internet Measurement Conference 2005 on Internet Measurement Conference, pages 31–31, Berkeley, CA, USA, 2005. USENIX Association.
[34] S. Staniford, D. Moore, V. Paxson, and N. Weaver. The top speed of flash worms. In WORM ’04: Proceedings of the 2004 ACM workshop on Rapid malcode, pages 33–42, New York, NY, USA, 2004. ACM.
[35] S. Staniford, V. Paxson, and N. Weaver. How to 0wn the internet in your spare time. In Proceedings of the 11th USENIX Security, San Francisco, CA, USA, 2002.
[36] Sufatrio and R. H. C. Yap. Improving host-based ids with argument abstraction to prevent mimicry attacks. In Proceedings of RAID, pages 146–164, 2005.
[37] M. Vallentin, R. Sommer, J. Lee, C. Leres, V. Paxson, and B. Tierney. The nids cluster: Scalable, stateful network intrusion detection on commodity hardware. In Proceedings of RAID, pages 107–126, 2007.
[38] G. Vasiliadis, S. Antonatos, M. Polychronakis, E. P. Markatos, and S. Ioannidis. Gnort: High performance network intrusion detection using graphics processors. In R. Lippmann, E. Kirda, and A. Trachtenberg, editors, Proceedings of RAID, volume 5230 of Lecture Notes in Computer Science, pages 116–134. Springer, 2008.
[39] K. Wang, G. F. Cretu, and S. J. Stolfo. Anomalous payload-based worm detection and signature generation. In Proceedings of RAID, pages 227–246, Seattle, Washington, USA, 2005.
[40] K. Wang, J. J. Parekh, and S. J. Stolfo. Anagram: A content anomaly detector resistant to mimicry attack. In Proceedings of RAID, pages 226–248, Hamburg, Germany, 2006.
[41] C. C. Zou, D. Towsley, and W. Gong. On the performance of internet worm scanning strategies. Perform. Eval., 63(7):700–723, 2006.