SlideShare une entreprise Scribd logo
1  sur  4
Télécharger pour lire hors ligne
ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011



 Overview of Existing Methods of Spam Mining and
 Potential Usefulness of Sender’s Name as Parameter
              for Fuzzy String Matching
                                           Soma Halder, Chun Wei, Alan Sprague
                                       Department of Computer and Information Sciences
                                             University of Alabama at Birmingham
                                                 Birmingham,Alabama,USA
                                              {soma,weic,sprague}@cis.uab.edu


Abstract—This paper gives an overview and analyzes the                   the spam emails are sent through Botnets, Spamhauses, Fast
different existing anti spamming techniques both content based,          Flux Service Networks (FFSN) and open relay sink holes in
non content based methods and a combination of the duo. It               order to trespass the protective mechanism laid down by the
also suggests that a small change in one of the parameters in            spam filters, domain and IP black listings.
the fuzzy string matching method could be useful to produce               A. Definitions
better results.
                                                                             Before we move into analyzing the different anti spam
    Index Terms —email, spam, computer forensics, fuzzy                  mechanisms we formally define the different methods taken
string match.                                                            up by the spammers to combat them and maintain their
                       I. INTRODUCTION                                   anonymity. Botnets are malware infected machines that are
                                                                         used for originating spam and spreading them to different
    The bulk of unsolicited email that floods our mail boxes
                                                                         recipients all over the web. If the recipient of the email gets
is what we call spam today. Ever since its first appearance
                                                                         infected by this malware they in turn act as bots. All bots are
which happened about 30 years back in 1978 when Gary
                                                                         controlled by the Command and Control Server which is
Theurk sent out the first spam email , the rising number of
                                                                         maintained by spammers [5].Backtracking a spam email with
junk emails has been a great nuisance to the receivers of the
                                                                         WHOIS information at any time would lead to a
email[1][2]. Today spam emails are far worse than the simple
                                                                         compromised machine whose owner is probably unaware
word ‘junk’ suggests because they bring terror in disguise
                                                                         that his machine is a bot.Spamhauses are fraud companies
trying to rob people’s credentials like vital personal
                                                                         with domain names that last for a day and have been set up
information, redirect recipients to phishing sites by making
                                                                         solely to lure investors to phishing sites. Fast Flux Service
them click on malicious urls and above all spread malwares
                                                                         Networks are similar to the botnets, they make use of
and viruses. Some of the malwares set up the recipient
                                                                         compromised machines that are used to host illegal
computer as a bot which in turn starts circulating spam emails.
                                                                         websites[7].Open relay sinkholes are mail transfer
The July 2010 Symantec report says that 88.32 percent of
                                                                         agents(MTA) that are widely used by spammers to forward
emails were spam and 12% of these spam emails were used
                                                                         mail any sender or any recipient and they go
to spread malware [3].
                                                                         undetected[5][6].
    This paper is broadly divided into three main parts : firstly
we do a study on the background of spamming practices                                   III.ANTI SPAMMING TECHNIQUES
prevalent these days , secondly we do a brief analysis of the
dominant anti spamming methods that have been developed                      Anti Spamming methodsgenerally comprise (but not
so far and third we propose a method for mining spam and                 limited to) of two methods:
thus preventing their circulation.                                          (a) Content Based Techniques.
                                                                            (b) Non Content Based Techniques
                     II.BACKGROUND                                           Content based filtering refers to the filtering mechanisms
                                                                         that use the text portion in the spam emails. Whereas non
   In the early days of spamming, spammers used to send                  content based techniques usually refer to methods that use
spam email directly to the recipients from their own mail                DNS blacklisting. These days there are methods that use a
address via the Internet Service Provider (ISP) mail server.             flavor of both content and non-content based techniques in
The first spam email that went out in 3rd May 1978 to about              the same method (C. Wei et al).
400 recipients of the ARPANET (what internet was then
called), was sent by a marketer of Digital Equipment                      A.. Content Based Techniques
Corporation (DEC)[4] . However with the introduction of                    Naïve Bayesian Filters
cyber crime laws and several anti-spam mechanisms, the
                                                                            Bayesian filters use the Bayes’ Theorem as a statistical
spammers have been forced to be in disguise. Today most of
                                                                         method to detect spam. A training data set that contains
© 2011 ACEEE                                                        25
DOI: 01.IJIT.01.01.80
ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011


probabilities of the most recurring words in spam is created                 Content based methods have been facing serious
by feeding spam emails to the system. Then when test data                hindrance from text obfuscation methods (Example:
gets fed to this system it finds the probability of the email to         VIAGRA spelled as V1AGRA or as V-I-A-G-R-A) of late,
be spam or legitimate [8][9] based on Bayes’ Theorem which               but deobfuscation of content by CMOScript [10] or with the
is as follows:                                                           Hidden Markov Model [11] has brought up the spam hit rate.
   the email to be spam or legitimate [8][9] based on Bayes’              B. Non Content Based Techniques
Theorem which is as follows:
                                                                            Domain Black Listing
                                             (1)
                                                                               Domain Blacklisting is a non content based method
                                                                         spam classification method. It is also known as real time
 where           is the probability of an Email to be Spam               black hole list(RBL).This method involves listing down all
(S) when given a particular word (W)which is generally a                 domain that ever appeared in the spam messages. SURBL
                     part of spam email.                                 and URIBL are two websites that maintains a database of
 P(S) is the proportion of emails that are spam, P(W|S) is               these black listed domains[17][16].DNS lookups are used
 the probability of the word to appear in a spam message.                to query these databases[21].DNSBL is another such tool
 Similarly P(W|L) is the probability of the word to appear               which keeps a track of all IP address that appeared through
   in a legitimate message and P(L) is the proportion of                 the domain name server in unsolicited emails[18]. However
emails that are legitimate . Putting the different probability           blacklists differ in their aggressiveness because some lists
   values in the equation (1), we decide whether the test                prefer giving less priority to false positives [21]. Domain
                 data(email) is spam or not.                             blacklisting has a major disadvantage because spammers
                                                                         prefer to register a large number of new domain everyday
Support Vector Machines (SVM)
                                                                         (as domain hosting is pretty cheap) and get rid of them the
       SVM is a method of separating spam emails with n                  next day. This evades detection by law enforcement as well
dimensional hyper planes. The most perfect hyper plane is                as serves their purpose of spamming.
called the decision boundary [12]. This decision boundary
is used to separate the points in the first class from that in           C. Combination of Content and Non-Content based
the second class[13].Often the points in the two classes are                methods
distributed and it is difficult to get a perfect hyperspace ,in             Domain Clustering using Fuzzy string matching
such cases n dimensional hyperspaces are considered. The                 Algorithm
support vectors are used to define these hyperplanes. The
                                                                             This method proposed by C. Wei et al [19] involves using
training data and test data are mainly based on Term
                                                                         both the content of the email because it takes subject as a
Frequency (TF) from spam messages [14].
                                                                         feature and also uses the IP address of the URL as another
Neural Networks                                                          feature. The goal of this method is however much different
        Neural Networks (NN) is another method of content                from the other anti spamming methods discussed above
based spam classification. Since neural networks have the                because the input of this method consist of only spam emails
capacity of working with large amount of data, they turn out             unlike others. Once it has all the spam emails in hand it does
to be very advantageous. Information in the neural network               a fuzzy matching of the both subject and IP. This is not a
is generally in the form of numerical weights and                        filtering method instead it clusters spam domains based on
connections. NN like other methods (overviewed earlier) first            subject-IP pair similarity of the emails. The similarity is done
treats itself with a set of training data consisting of both spam        by a fuzzy matching [19].
and nonspam emails where phrases and words are the input                       i. Subject-Matching
vectors. Then when the system is fed with test data the NN                        Spam emails are mostly generated using templates.
finds a pattern and classifies messages as spam and non spam.            Hence emails generated using same templates for some spam
Since NN weights are difficult to understand, visualization              campaign are very similar. The method in [19] uses Inverse
methods like interactive weight visualization have been                  Levenshtein Distance (ILD) to do the fuzzy matching
introduced [22].                                                         between subjects. The method involves the use of dynamic
                                                                         programming for finding the string alignment similarity. It
Random Forests and Active Learning                                       is preferred the string similarity score have a value between
    This method mainly uses term frequency (TF) and inverse              0 and 1 so they use Kulczynski Coefficient (defined in section
document frequency (IDF) of email messages. This is                      IV) to get the score from the ILD value.
followed by clustering to get the medoids. Medoids help in                     ii. Internet Protocol (IP) Address Matching
forming a forest. Active learning of the forest makes it more               Since a single domain can be associated to several IP
and more accurate. The authors of [15] report that accuracy              addresses, hence while matching two domains it is required
is approximately 95% which might be only 65%, 66% for                    to match all IP addresses associated with one domain to that
Naïve Bayes and SVM respectively. This is a very efficient               of the other using Fuzzy matches. The maximum number of
method amongst all content based methods.                                match gets considered [23]. prefer giving less priority to false
                                                                         positives [21]. Domain blacklisting has a major disadvantage

© 2011 ACEEE                                                        26
DOI: 01.IJIT.01.01.80
ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011


because spammers prefer to register a large number of new
domain everyday (as domain hosting is pretty cheap) and
get rid of them the next day. This evades detection by law
enforcement as well as serves their purpose of spamming.
          Two domains (or vertices) are connected to each                InvLevDis(a,b)=4(as there are 4 matches)
other by a subject score which act as the edge between them.              We use the ILD to do a Fuzzy matching between Sender’s
All domains that belong to same spamming campaign thus                names but often it is seen that the sender name is a sentence
get connected by subjects. Connected components with simi-            rather than a word separated by spaces in such cases we break
lar strength (but they have to be above a threshold value)            it up in tokens and do ILD[20][23] .Token here represents
fall in the same cluster. This method successfully detected           each word in the sentence separated by spaces.
the Canadian Pharmacy spam that was prevalent in January
2010.

                   IV. OUR METHODOLOGY
         A typical spam email (fig 1) shows that apart from
the subject there is also a feature called the sender name. In           As discussed earlier by [19] that it preferable to have
our experiments we use the same methodology as used by                similarity score between 0 and 1 so apply Kulczynski
[19][20] but we feed sender’s name as a feature instead of            Coefficient.
subject.                                                                Kulczynski(A,B)=(ILD(A,B)|A| + ILD(A,B)/|B|)/2
                                                                        Where |A|, |B| are the size of two strings.(i.e. the Senders
. Measuring similarity of Spam Sender’s name                          name). ILD (A, B) is always nonnegative, and does not
    As discussed by [19] Inverse Levenshtein helps to finds           exceed min (|A|, |B|).
alignment between strings.If a and b are two strings the ILD             The rest of the procedure is the same as C. Wei et al
count for them would be as follows:                                   where we find IP similarity and then cluster domains using
Strings a,b =InvLevDis(a,b).                                          connected components.




                                                     Figure 1: Sample Spam Email

                           V. RESULTS
   Our experiment is based on a data of 298,680 spam
emails. A complete Token Match suggested the maximum
number of times a particular sender name repeated itself was
57,773 times and a partial token match suggested that the
maximum count was 99,198 times. That is the cluster formed
as a result is very big as compared to the total number of
emails. The graph below (Figure 2) shows a comparison
between the results for complete token match.
   The graph contains the set of 5 most repeated sender
names:
                                                                                 Figure 2:Graph for Complete Token Matching


© 2011 ACEEE                                                     27
DOI: 01.IJIT.01.01.37
ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011

    Another experiment on a set of 312,585 spams separated             [6] A. Pathak , Y. C. Hu, Z. M. Mao, “Peeking into
into groups of 8 had the following result:                                 Spammer Behavior from a Unique Vantage Point”.In
                                                                           Proc. of the 1st Usenix Workshop on Large-Scale
                                                                           Exploits and Emergent Threat, 2008.
                                                                       [7] T. Holz, C. Gorecki, K. Rieck, F. C. Freiling, “Measuring
                                                                            and Detecting Fast-Flux Service Networks”. In Proc. Of 16th
                                                                            Annu. Network & Distributed System Security Symposium,
                                                                            2008.
                                                                       [8] M. Sahami, S. Dumais, D. Heckerman, E. Horvitz, “A Baysian
                                                                            Approach to Filtering Junk Email”.In Proc. of AAAI-98
                                                                            Workshop on Learning for Text Categorization., 1998.
                                                                       [9] G. Robinson, “A statistical approach to the spam
                                                                            problem,Linux Journal,Issue # 10, pp 3 2003.
                                                                       [10] CMOScript, Available : http://sandgnat.com/cmos
                                                                       [11] H. Lee, A. Y. Ng, “Spam Deobfuscation using a Hidden
                                                                            Markov Model”. In Proc. of the 2nd Conf. on Email and Anti-
                                                                            Spam,CEAS,2005.
                      VI. CONCLUSION                                   [12] C. Liu, “Experiments on Spam detection with Boosting, SVM
                                                                            and Naïve Bayes”, UCSC.
   Seeing the graph above (Figure: 2) we see that sender’s             [13] A. Khorsi, “An Overview of Content Based Spam Filtering
name can definitely be considered as one of the parameters                  Techniques”. Informatics 31 pp 269-277, 2007.
while doing fuzzy matching with strings. Table 1 justifies             [14] H. Drucker, Donghui, V. N. Vapnik. “Support Vector Machines
this argument by giving the total count of the number clusters              for Spam Categorization”. AT&T Labs-Research,1999.
formed as a result of fuzzy match. There by we suggest that            [15] D. Debarr,H. Wechsler, “Spam Detection using
sender’s name that appears in all spam emails definitely can                Clustering,Random Forests and Active Learning”. In Proc.
be taken as a feature when clustering domains in the spam                   of the 6th Conf. on Email and Anti-Spam ,CEAS , 2009.
emails. Therefore the most obvious future work in this                 [16] Available : http://www.surbl.org/.
                                                                       [17] Available : http://www.uribl.com/.
direction would be to cross validate C. Wei et al’s clusters
                                                                       [18] Available : http://www.dnsbl.info/.
with the clusters obtained by the methods of this paper by             [19] C. Wei,A. Sprague,G. Warner,A. Skjellum, “Identifying New
making them work on the same data set.                                      Spam Domains by Hosting IPs: Improving Domain
                         REFERENCES                                         Blacklisting”. In Proc. of the 7th Conf. on Email and Anti-
                                                                            Spam ,CEAS,2010.
[1] B. Templeton, “Origin of the term ‘spam’ to mean net               [20] C. Wei, “A malware-generated Spam Emails with a Novel
    abuse”.Available:http://www.templetons.com/brad/                        Fuzzy String Matching Algorithm”. In Proc. of the ACM
    spamterm.html.                                                          symposium on Applied Computing, 2009.
[2] H.Cox,       “The      History     of    Spam      emails”         [21] D. Cook, J. Hartnett, K. Manderson and J. Scanlan. “Catching
    Available : http://ezinearticles.com/?The-History-of-Spam-              Spam Before it Arrives: Domain Specific Dynamic
    Emails&id=785433                                                        Blacklists”. In Proc. of the 2006 Australasian workshops on
[3] Symantec, State of Spam and Phishing, A monthly Report                  Grid computing and e-research, 2006.
    ,Report #43,July 2010.                                             [22] Fan-Yin Tzeng,Kwan-Liu Ma(2005).Opening the Black Box
[4] B. Templeton, “Reaction to the DEC Spam of 1978” .                      :Data Driven Visualization of Neural Networks. Visualization,
    Available : http://www.templetons.com/brad/spamreact.html.              2005. VIS 05. IEEE.Page 383-390.
[5] A. Pathak, F. Qian, Y. C. Hu, Z. M. Mao, and S. Ranjan,            [23] C. Wei,A. Sprague,G. Warner, “A. Skjellum.Clustering Spam
    “Botnet Spam Campaigns Can Be Long Lasting:Evidence,                    Domains and Destination websites :Digital Forensics for Data
    Implications, and Analysis” .In Proc. of the 11th Int.Joint             mining”.Journal of Digital Forensics,Security and Law,(2009)
    Conf.on Measurement and Modeling of Computer System,
    2009.




                                                                  28
© 2011 ACEEE
DOI: 01.IJIT.01.01.80

Contenu connexe

Tendances

Messaging and Web Security
Messaging and Web SecurityMessaging and Web Security
Messaging and Web SecurityGFI Software
 
How to Keep Spam Off Your Network
How to Keep Spam Off Your NetworkHow to Keep Spam Off Your Network
How to Keep Spam Off Your NetworkGFI Software
 
E mail image spam filtering techniques
E mail image spam filtering techniquesE mail image spam filtering techniques
E mail image spam filtering techniquesranjit banshpal
 
Spamming and Spam Filtering
Spamming and Spam FilteringSpamming and Spam Filtering
Spamming and Spam FilteringiNazneen
 
How to Block NDR Spam
How to Block NDR SpamHow to Block NDR Spam
How to Block NDR SpamGFI Software
 
IRJET- Image Spam Detection: Problem and Existing Solution
IRJET-  	  Image Spam Detection: Problem and Existing SolutionIRJET-  	  Image Spam Detection: Problem and Existing Solution
IRJET- Image Spam Detection: Problem and Existing SolutionIRJET Journal
 
Spam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes AlgorithmSpam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes AlgorithmAkshay Pal
 
A pact with the devil
A pact with the devilA pact with the devil
A pact with the devilUltraUploader
 
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...IJNSA Journal
 
E Mail & Spam Presentation
E Mail & Spam PresentationE Mail & Spam Presentation
E Mail & Spam Presentationnewsan2001
 
Identifying Malicious Data in Social Media
Identifying Malicious Data in Social MediaIdentifying Malicious Data in Social Media
Identifying Malicious Data in Social MediaIRJET Journal
 
Secure and Reliable Data Transmission in Generalized E-Mail
Secure and Reliable Data Transmission in Generalized E-MailSecure and Reliable Data Transmission in Generalized E-Mail
Secure and Reliable Data Transmission in Generalized E-MailIJERA Editor
 
Prepare black list using bayesian approach to improve performance of spam fil...
Prepare black list using bayesian approach to improve performance of spam fil...Prepare black list using bayesian approach to improve performance of spam fil...
Prepare black list using bayesian approach to improve performance of spam fil...IAEME Publication
 
Malware Propagation in Large-Scale Networks
Malware Propagation in Large-Scale NetworksMalware Propagation in Large-Scale Networks
Malware Propagation in Large-Scale Networks1crore projects
 

Tendances (19)

Spam and Anti Spam Techniques
Spam and Anti Spam TechniquesSpam and Anti Spam Techniques
Spam and Anti Spam Techniques
 
Spamming
SpammingSpamming
Spamming
 
Messaging and Web Security
Messaging and Web SecurityMessaging and Web Security
Messaging and Web Security
 
How to Keep Spam Off Your Network
How to Keep Spam Off Your NetworkHow to Keep Spam Off Your Network
How to Keep Spam Off Your Network
 
E mail image spam filtering techniques
E mail image spam filtering techniquesE mail image spam filtering techniques
E mail image spam filtering techniques
 
Spamming and Spam Filtering
Spamming and Spam FilteringSpamming and Spam Filtering
Spamming and Spam Filtering
 
How to Block NDR Spam
How to Block NDR SpamHow to Block NDR Spam
How to Block NDR Spam
 
What is SPAM?
What is SPAM?What is SPAM?
What is SPAM?
 
Spam Email identification
Spam Email identificationSpam Email identification
Spam Email identification
 
IRJET- Image Spam Detection: Problem and Existing Solution
IRJET-  	  Image Spam Detection: Problem and Existing SolutionIRJET-  	  Image Spam Detection: Problem and Existing Solution
IRJET- Image Spam Detection: Problem and Existing Solution
 
Spam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes AlgorithmSpam filtering with Naive Bayes Algorithm
Spam filtering with Naive Bayes Algorithm
 
Spam
SpamSpam
Spam
 
A pact with the devil
A pact with the devilA pact with the devil
A pact with the devil
 
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
EMAIL SPAM CLASSIFICATION USING HYBRID APPROACH OF RBF NEURAL NETWORK AND PAR...
 
E Mail & Spam Presentation
E Mail & Spam PresentationE Mail & Spam Presentation
E Mail & Spam Presentation
 
Identifying Malicious Data in Social Media
Identifying Malicious Data in Social MediaIdentifying Malicious Data in Social Media
Identifying Malicious Data in Social Media
 
Secure and Reliable Data Transmission in Generalized E-Mail
Secure and Reliable Data Transmission in Generalized E-MailSecure and Reliable Data Transmission in Generalized E-Mail
Secure and Reliable Data Transmission in Generalized E-Mail
 
Prepare black list using bayesian approach to improve performance of spam fil...
Prepare black list using bayesian approach to improve performance of spam fil...Prepare black list using bayesian approach to improve performance of spam fil...
Prepare black list using bayesian approach to improve performance of spam fil...
 
Malware Propagation in Large-Scale Networks
Malware Propagation in Large-Scale NetworksMalware Propagation in Large-Scale Networks
Malware Propagation in Large-Scale Networks
 

Similaire à ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011 Overview of Existing Methods of Spam Mining and Potential Usefulness of Sender’s Name as Parameter for Fuzzy String Matching

The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Survey on spam filtering
Survey on spam filteringSurvey on spam filtering
Survey on spam filteringChippy Thomas
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
 
A review of spam filtering and measures of antispam
A review of spam filtering and measures of antispamA review of spam filtering and measures of antispam
A review of spam filtering and measures of antispamAlexander Decker
 
A Model for Fuzzy Logic Based Machine Learning Approach for Spam Filtering
A Model for Fuzzy Logic Based Machine Learning Approach for  Spam FilteringA Model for Fuzzy Logic Based Machine Learning Approach for  Spam Filtering
A Model for Fuzzy Logic Based Machine Learning Approach for Spam FilteringIOSR Journals
 
Collateral Damage: Consequences of Spam and Virus Filtering for the E-Mail S...
Collateral Damage:
Consequences of Spam and Virus Filtering for the E-Mail S...Collateral Damage:
Consequences of Spam and Virus Filtering for the E-Mail S...
Collateral Damage: Consequences of Spam and Virus Filtering for the E-Mail S...Peter Eisentraut
 
A framework to detect novel computer viruses via system calls
A framework to detect novel computer viruses via system callsA framework to detect novel computer viruses via system calls
A framework to detect novel computer viruses via system callsUltraUploader
 
Internet threats and defence mechanism
Internet threats and defence mechanismInternet threats and defence mechanism
Internet threats and defence mechanismCAS
 
E-mail and Instant MessagingChapter 16Principles of Co.docx
E-mail and Instant MessagingChapter 16Principles of Co.docxE-mail and Instant MessagingChapter 16Principles of Co.docx
E-mail and Instant MessagingChapter 16Principles of Co.docxbrownliecarmella
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptxAnush90
 

Similaire à ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011 Overview of Existing Methods of Spam Mining and Potential Usefulness of Sender’s Name as Parameter for Fuzzy String Matching (20)

402 406
402 406402 406
402 406
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Survey on spam filtering
Survey on spam filteringSurvey on spam filtering
Survey on spam filtering
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 
A review of spam filtering and measures of antispam
A review of spam filtering and measures of antispamA review of spam filtering and measures of antispam
A review of spam filtering and measures of antispam
 
A Model for Fuzzy Logic Based Machine Learning Approach for Spam Filtering
A Model for Fuzzy Logic Based Machine Learning Approach for  Spam FilteringA Model for Fuzzy Logic Based Machine Learning Approach for  Spam Filtering
A Model for Fuzzy Logic Based Machine Learning Approach for Spam Filtering
 
Collateral Damage: Consequences of Spam and Virus Filtering for the E-Mail S...
Collateral Damage:
Consequences of Spam and Virus Filtering for the E-Mail S...Collateral Damage:
Consequences of Spam and Virus Filtering for the E-Mail S...
Collateral Damage: Consequences of Spam and Virus Filtering for the E-Mail S...
 
A framework to detect novel computer viruses via system calls
A framework to detect novel computer viruses via system callsA framework to detect novel computer viruses via system calls
A framework to detect novel computer viruses via system calls
 
Spam Filtering
Spam FilteringSpam Filtering
Spam Filtering
 
E spam
E spamE spam
E spam
 
E spam
E spamE spam
E spam
 
E spam
E spamE spam
E spam
 
E spam
E spamE spam
E spam
 
Internet threats and defence mechanism
Internet threats and defence mechanismInternet threats and defence mechanism
Internet threats and defence mechanism
 
E-mail and Instant MessagingChapter 16Principles of Co.docx
E-mail and Instant MessagingChapter 16Principles of Co.docxE-mail and Instant MessagingChapter 16Principles of Co.docx
E-mail and Instant MessagingChapter 16Principles of Co.docx
 
Eh34803812
Eh34803812Eh34803812
Eh34803812
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptx
 

Plus de IDES Editor

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A ReviewIDES Editor
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...IDES Editor
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...IDES Editor
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...IDES Editor
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCIDES Editor
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...IDES Editor
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingIDES Editor
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...IDES Editor
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsIDES Editor
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...IDES Editor
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...IDES Editor
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkIDES Editor
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetIDES Editor
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyIDES Editor
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’sIDES Editor
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...IDES Editor
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance AnalysisIDES Editor
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesIDES Editor
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...IDES Editor
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...IDES Editor
 

Plus de IDES Editor (20)

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A Review
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFC
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive Thresholds
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability Framework
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through Steganography
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’s
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance Analysis
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
 

Dernier

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Dernier (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011 Overview of Existing Methods of Spam Mining and Potential Usefulness of Sender’s Name as Parameter for Fuzzy String Matching

  • 1. ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011 Overview of Existing Methods of Spam Mining and Potential Usefulness of Sender’s Name as Parameter for Fuzzy String Matching Soma Halder, Chun Wei, Alan Sprague Department of Computer and Information Sciences University of Alabama at Birmingham Birmingham,Alabama,USA {soma,weic,sprague}@cis.uab.edu Abstract—This paper gives an overview and analyzes the the spam emails are sent through Botnets, Spamhauses, Fast different existing anti spamming techniques both content based, Flux Service Networks (FFSN) and open relay sink holes in non content based methods and a combination of the duo. It order to trespass the protective mechanism laid down by the also suggests that a small change in one of the parameters in spam filters, domain and IP black listings. the fuzzy string matching method could be useful to produce A. Definitions better results. Before we move into analyzing the different anti spam Index Terms —email, spam, computer forensics, fuzzy mechanisms we formally define the different methods taken string match. up by the spammers to combat them and maintain their I. INTRODUCTION anonymity. Botnets are malware infected machines that are used for originating spam and spreading them to different The bulk of unsolicited email that floods our mail boxes recipients all over the web. If the recipient of the email gets is what we call spam today. Ever since its first appearance infected by this malware they in turn act as bots. All bots are which happened about 30 years back in 1978 when Gary controlled by the Command and Control Server which is Theurk sent out the first spam email , the rising number of maintained by spammers [5].Backtracking a spam email with junk emails has been a great nuisance to the receivers of the WHOIS information at any time would lead to a email[1][2]. Today spam emails are far worse than the simple compromised machine whose owner is probably unaware word ‘junk’ suggests because they bring terror in disguise that his machine is a bot.Spamhauses are fraud companies trying to rob people’s credentials like vital personal with domain names that last for a day and have been set up information, redirect recipients to phishing sites by making solely to lure investors to phishing sites. Fast Flux Service them click on malicious urls and above all spread malwares Networks are similar to the botnets, they make use of and viruses. Some of the malwares set up the recipient compromised machines that are used to host illegal computer as a bot which in turn starts circulating spam emails. websites[7].Open relay sinkholes are mail transfer The July 2010 Symantec report says that 88.32 percent of agents(MTA) that are widely used by spammers to forward emails were spam and 12% of these spam emails were used mail any sender or any recipient and they go to spread malware [3]. undetected[5][6]. This paper is broadly divided into three main parts : firstly we do a study on the background of spamming practices III.ANTI SPAMMING TECHNIQUES prevalent these days , secondly we do a brief analysis of the dominant anti spamming methods that have been developed Anti Spamming methodsgenerally comprise (but not so far and third we propose a method for mining spam and limited to) of two methods: thus preventing their circulation. (a) Content Based Techniques. (b) Non Content Based Techniques II.BACKGROUND Content based filtering refers to the filtering mechanisms that use the text portion in the spam emails. Whereas non In the early days of spamming, spammers used to send content based techniques usually refer to methods that use spam email directly to the recipients from their own mail DNS blacklisting. These days there are methods that use a address via the Internet Service Provider (ISP) mail server. flavor of both content and non-content based techniques in The first spam email that went out in 3rd May 1978 to about the same method (C. Wei et al). 400 recipients of the ARPANET (what internet was then called), was sent by a marketer of Digital Equipment A.. Content Based Techniques Corporation (DEC)[4] . However with the introduction of Naïve Bayesian Filters cyber crime laws and several anti-spam mechanisms, the Bayesian filters use the Bayes’ Theorem as a statistical spammers have been forced to be in disguise. Today most of method to detect spam. A training data set that contains © 2011 ACEEE 25 DOI: 01.IJIT.01.01.80
  • 2. ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011 probabilities of the most recurring words in spam is created Content based methods have been facing serious by feeding spam emails to the system. Then when test data hindrance from text obfuscation methods (Example: gets fed to this system it finds the probability of the email to VIAGRA spelled as V1AGRA or as V-I-A-G-R-A) of late, be spam or legitimate [8][9] based on Bayes’ Theorem which but deobfuscation of content by CMOScript [10] or with the is as follows: Hidden Markov Model [11] has brought up the spam hit rate. the email to be spam or legitimate [8][9] based on Bayes’ B. Non Content Based Techniques Theorem which is as follows: Domain Black Listing (1) Domain Blacklisting is a non content based method spam classification method. It is also known as real time where is the probability of an Email to be Spam black hole list(RBL).This method involves listing down all (S) when given a particular word (W)which is generally a domain that ever appeared in the spam messages. SURBL part of spam email. and URIBL are two websites that maintains a database of P(S) is the proportion of emails that are spam, P(W|S) is these black listed domains[17][16].DNS lookups are used the probability of the word to appear in a spam message. to query these databases[21].DNSBL is another such tool Similarly P(W|L) is the probability of the word to appear which keeps a track of all IP address that appeared through in a legitimate message and P(L) is the proportion of the domain name server in unsolicited emails[18]. However emails that are legitimate . Putting the different probability blacklists differ in their aggressiveness because some lists values in the equation (1), we decide whether the test prefer giving less priority to false positives [21]. Domain data(email) is spam or not. blacklisting has a major disadvantage because spammers prefer to register a large number of new domain everyday Support Vector Machines (SVM) (as domain hosting is pretty cheap) and get rid of them the SVM is a method of separating spam emails with n next day. This evades detection by law enforcement as well dimensional hyper planes. The most perfect hyper plane is as serves their purpose of spamming. called the decision boundary [12]. This decision boundary is used to separate the points in the first class from that in C. Combination of Content and Non-Content based the second class[13].Often the points in the two classes are methods distributed and it is difficult to get a perfect hyperspace ,in Domain Clustering using Fuzzy string matching such cases n dimensional hyperspaces are considered. The Algorithm support vectors are used to define these hyperplanes. The This method proposed by C. Wei et al [19] involves using training data and test data are mainly based on Term both the content of the email because it takes subject as a Frequency (TF) from spam messages [14]. feature and also uses the IP address of the URL as another Neural Networks feature. The goal of this method is however much different Neural Networks (NN) is another method of content from the other anti spamming methods discussed above based spam classification. Since neural networks have the because the input of this method consist of only spam emails capacity of working with large amount of data, they turn out unlike others. Once it has all the spam emails in hand it does to be very advantageous. Information in the neural network a fuzzy matching of the both subject and IP. This is not a is generally in the form of numerical weights and filtering method instead it clusters spam domains based on connections. NN like other methods (overviewed earlier) first subject-IP pair similarity of the emails. The similarity is done treats itself with a set of training data consisting of both spam by a fuzzy matching [19]. and nonspam emails where phrases and words are the input i. Subject-Matching vectors. Then when the system is fed with test data the NN Spam emails are mostly generated using templates. finds a pattern and classifies messages as spam and non spam. Hence emails generated using same templates for some spam Since NN weights are difficult to understand, visualization campaign are very similar. The method in [19] uses Inverse methods like interactive weight visualization have been Levenshtein Distance (ILD) to do the fuzzy matching introduced [22]. between subjects. The method involves the use of dynamic programming for finding the string alignment similarity. It Random Forests and Active Learning is preferred the string similarity score have a value between This method mainly uses term frequency (TF) and inverse 0 and 1 so they use Kulczynski Coefficient (defined in section document frequency (IDF) of email messages. This is IV) to get the score from the ILD value. followed by clustering to get the medoids. Medoids help in ii. Internet Protocol (IP) Address Matching forming a forest. Active learning of the forest makes it more Since a single domain can be associated to several IP and more accurate. The authors of [15] report that accuracy addresses, hence while matching two domains it is required is approximately 95% which might be only 65%, 66% for to match all IP addresses associated with one domain to that Naïve Bayes and SVM respectively. This is a very efficient of the other using Fuzzy matches. The maximum number of method amongst all content based methods. match gets considered [23]. prefer giving less priority to false positives [21]. Domain blacklisting has a major disadvantage © 2011 ACEEE 26 DOI: 01.IJIT.01.01.80
  • 3. ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011 because spammers prefer to register a large number of new domain everyday (as domain hosting is pretty cheap) and get rid of them the next day. This evades detection by law enforcement as well as serves their purpose of spamming. Two domains (or vertices) are connected to each InvLevDis(a,b)=4(as there are 4 matches) other by a subject score which act as the edge between them. We use the ILD to do a Fuzzy matching between Sender’s All domains that belong to same spamming campaign thus names but often it is seen that the sender name is a sentence get connected by subjects. Connected components with simi- rather than a word separated by spaces in such cases we break lar strength (but they have to be above a threshold value) it up in tokens and do ILD[20][23] .Token here represents fall in the same cluster. This method successfully detected each word in the sentence separated by spaces. the Canadian Pharmacy spam that was prevalent in January 2010. IV. OUR METHODOLOGY A typical spam email (fig 1) shows that apart from the subject there is also a feature called the sender name. In As discussed earlier by [19] that it preferable to have our experiments we use the same methodology as used by similarity score between 0 and 1 so apply Kulczynski [19][20] but we feed sender’s name as a feature instead of Coefficient. subject. Kulczynski(A,B)=(ILD(A,B)|A| + ILD(A,B)/|B|)/2 Where |A|, |B| are the size of two strings.(i.e. the Senders . Measuring similarity of Spam Sender’s name name). ILD (A, B) is always nonnegative, and does not As discussed by [19] Inverse Levenshtein helps to finds exceed min (|A|, |B|). alignment between strings.If a and b are two strings the ILD The rest of the procedure is the same as C. Wei et al count for them would be as follows: where we find IP similarity and then cluster domains using Strings a,b =InvLevDis(a,b). connected components. Figure 1: Sample Spam Email V. RESULTS Our experiment is based on a data of 298,680 spam emails. A complete Token Match suggested the maximum number of times a particular sender name repeated itself was 57,773 times and a partial token match suggested that the maximum count was 99,198 times. That is the cluster formed as a result is very big as compared to the total number of emails. The graph below (Figure 2) shows a comparison between the results for complete token match. The graph contains the set of 5 most repeated sender names: Figure 2:Graph for Complete Token Matching © 2011 ACEEE 27 DOI: 01.IJIT.01.01.37
  • 4. ACEEE Int. J. on Information Technology, Vol. 01, No. 01, Mar 2011 Another experiment on a set of 312,585 spams separated [6] A. Pathak , Y. C. Hu, Z. M. Mao, “Peeking into into groups of 8 had the following result: Spammer Behavior from a Unique Vantage Point”.In Proc. of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threat, 2008. [7] T. Holz, C. Gorecki, K. Rieck, F. C. Freiling, “Measuring and Detecting Fast-Flux Service Networks”. In Proc. Of 16th Annu. Network & Distributed System Security Symposium, 2008. [8] M. Sahami, S. Dumais, D. Heckerman, E. Horvitz, “A Baysian Approach to Filtering Junk Email”.In Proc. of AAAI-98 Workshop on Learning for Text Categorization., 1998. [9] G. Robinson, “A statistical approach to the spam problem,Linux Journal,Issue # 10, pp 3 2003. [10] CMOScript, Available : http://sandgnat.com/cmos [11] H. Lee, A. Y. Ng, “Spam Deobfuscation using a Hidden Markov Model”. In Proc. of the 2nd Conf. on Email and Anti- Spam,CEAS,2005. VI. CONCLUSION [12] C. Liu, “Experiments on Spam detection with Boosting, SVM and Naïve Bayes”, UCSC. Seeing the graph above (Figure: 2) we see that sender’s [13] A. Khorsi, “An Overview of Content Based Spam Filtering name can definitely be considered as one of the parameters Techniques”. Informatics 31 pp 269-277, 2007. while doing fuzzy matching with strings. Table 1 justifies [14] H. Drucker, Donghui, V. N. Vapnik. “Support Vector Machines this argument by giving the total count of the number clusters for Spam Categorization”. AT&T Labs-Research,1999. formed as a result of fuzzy match. There by we suggest that [15] D. Debarr,H. Wechsler, “Spam Detection using sender’s name that appears in all spam emails definitely can Clustering,Random Forests and Active Learning”. In Proc. be taken as a feature when clustering domains in the spam of the 6th Conf. on Email and Anti-Spam ,CEAS , 2009. emails. Therefore the most obvious future work in this [16] Available : http://www.surbl.org/. [17] Available : http://www.uribl.com/. direction would be to cross validate C. Wei et al’s clusters [18] Available : http://www.dnsbl.info/. with the clusters obtained by the methods of this paper by [19] C. Wei,A. Sprague,G. Warner,A. Skjellum, “Identifying New making them work on the same data set. Spam Domains by Hosting IPs: Improving Domain REFERENCES Blacklisting”. In Proc. of the 7th Conf. on Email and Anti- Spam ,CEAS,2010. [1] B. Templeton, “Origin of the term ‘spam’ to mean net [20] C. Wei, “A malware-generated Spam Emails with a Novel abuse”.Available:http://www.templetons.com/brad/ Fuzzy String Matching Algorithm”. In Proc. of the ACM spamterm.html. symposium on Applied Computing, 2009. [2] H.Cox, “The History of Spam emails” [21] D. Cook, J. Hartnett, K. Manderson and J. Scanlan. “Catching Available : http://ezinearticles.com/?The-History-of-Spam- Spam Before it Arrives: Domain Specific Dynamic Emails&id=785433 Blacklists”. In Proc. of the 2006 Australasian workshops on [3] Symantec, State of Spam and Phishing, A monthly Report Grid computing and e-research, 2006. ,Report #43,July 2010. [22] Fan-Yin Tzeng,Kwan-Liu Ma(2005).Opening the Black Box [4] B. Templeton, “Reaction to the DEC Spam of 1978” . :Data Driven Visualization of Neural Networks. Visualization, Available : http://www.templetons.com/brad/spamreact.html. 2005. VIS 05. IEEE.Page 383-390. [5] A. Pathak, F. Qian, Y. C. Hu, Z. M. Mao, and S. Ranjan, [23] C. Wei,A. Sprague,G. Warner, “A. Skjellum.Clustering Spam “Botnet Spam Campaigns Can Be Long Lasting:Evidence, Domains and Destination websites :Digital Forensics for Data Implications, and Analysis” .In Proc. of the 11th Int.Joint mining”.Journal of Digital Forensics,Security and Law,(2009) Conf.on Measurement and Modeling of Computer System, 2009. 28 © 2011 ACEEE DOI: 01.IJIT.01.01.80