I
EMAIL FILTERING AND ANALYSIS
USING CLASSIFICATION ALGORITHMS
Submitted in partial fulfillment of the requirements
of the degree of
Bachelor of Engineering in Information Technology
By
Akshay Iyer
Dipti Pamnani
Akanksha Pandey
Karmanya Pathak
Supervisor:
Mrs. Jayshree Hajgude
Department of Information Technology
Vivekanand Education Society’s Institute of Technology
2013-14
II
Project Report Approval for B. E.
This project report entitled EMAIL FILTERING AND ANALYSIS USING
CLASSIFICATION ALGORITHMS by Akshay Iyer, Dipti Pamnani,
Akanksha Pandey, and Karmanya Pathak is approved for the degree of
Bachelor of Engineering in Information Technology.
Examiners
1.---------------------------------------------
2.---------------------------------------------
Supervisors
1.---------------------------------------------
2.---------------------------------------------
Chairman
-----------------------------------------------
Date:
Place:
III
Declaration
I declare that this written submission represents my ideas in my own words and
where others' ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all principles
of academic honesty and integrity and have not misrepresented or fabricated or
falsified any idea/data/fact/source in my submission. I understand that any
violation of the above will be cause for disciplinary action by the Institute and can
also evoke penal action from the sources which have thus not been properly cited
or from whom proper permission has not been taken when needed.
-----------------------------------------
Akshay Iyer
-----------------------------------------
Dipti Pamnani
-----------------------------------------
Akanksha Pandey
-----------------------------------------
Karmanya Pathak
Date:
IV
ACKNOWLEDGEMENT
This project has been a great learning experience for us. Through the course of this year, we have worked
as a team for the successful completion of this project. Although on paper it is only we who have made this
project, in reality there are some people without whom this project could not have been finalized and
designed the way it looks now.
First of all, we would like to thank our Principal, Dr.(Mrs.) J.M.Nair, and our Vice-Principal, Dr.
S.Mukhopadhyay for their support and guidance throughout the project implementation period. Without
their help, the project would not have been possible.
We are also truly indebted to our internal project guide, Mrs. Jayshree Hajgude, for her immense
guidance and support. She has encouraged us and channelled our enthusiasm effectively.
We would like to thank Mrs. Vijayalakshmi Muralidharan, HOD of the Information Technology Department.
We would also like to thank our lab in charges, Mr. Amar Jaiswar and Mr. Ulhas Pawar, who have been
very kind to us.
Last but not least, we want to thank our college, Vivekanand Education Society's Institute of
Technology, for providing us with excellent reference materials and great computing facilities.
V
ABSTRACT
With the rapid developments taking place in technology, especially in communication, a wide variety
of malpractices have emerged that can prove harmful to the user. Many of these are currently observed
in a user's email account. The email user has an inbox holding a wide variety of mails in an
unorganized manner. Some of the mails received by the user may also contain harmful content
with severe consequences (normally termed spam). With this in mind, the
topic of our BE Project is Email Filtering.
Email Filtering is the process of classifying emails into various categories on the
basis of their content. The application fetches the emails from a user's account and stores them on a
server. It then classifies them into spam and non-spam using classification algorithms, and also into
user-defined categories on the basis of the keywords entered by the user. The user can also send, forward
and reply to a particular mail. The application also performs historical spam analysis on the
content downloaded by the user. The user can access, read, store and copy the contents of his
emails.
The project report begins with a short introduction to Email Filtering and the reason we chose
this topic. This is followed by the Literature Survey, which describes the areas where similar
operations are performed and the various features of Email Filtering. We also explain
the algorithms used to classify the emails.
The report then focuses on the Implementation Flow and the Use Case designs, which help in
understanding the features of the project. This chapter is followed by the
implementation code of the project, with information about the code snippets that make up the
project and a detailed explanation of each window of the Email Filtering application. The next
chapter displays screenshots of the application and the analyses it performs, including different
types of graphical information. This is followed by the conclusion and the
future scope of the project, which describes the features to be implemented in the future. The last
chapter lists the references that played an important role in the
completion of the project.
VI
Table of Contents
1. INTRODUCTION
1.1. What is Email Filtering…………………………………………………......2
1.2. Motivation…………………………………………………………………..3
1.3. Problem Definition ………………...………………...…………………..…4
1.4. Objectives…………………………………………………………………...5
2. LITERATURE SURVEY
2.1. Application………………………………………………………………....7
2.2. Issues Faced…………………………………………………………….......8
2.3. Different areas of Applications……………………………………………..9
3. ANALYSIS
3.1. C4.5 Algorithm…………………………………………………………......11
3.2. Naïve Bayes Algorithm………………………………………………….....12
3.3. Formulae…………………………………………………………………....15
4. DESIGN
4.1. Implementation Flow……….…………………………………………..…..17
4.2. Use Case Diagram………….…………………………………………….....19
4.3. Class Diagram…………….………………………………………………...20
4.4. Activity Diagram………….….………………………………………….….21
5. IMPLEMENTATION
5.1. The Connection Dialog Box……..…………………………………………23
5.2. The Email Client Window………..………………………………………...28
5.3. The Message Dialog Box....……….……….…………………………….....38
5.4. The File Chooser…………………….…………………………...................39
5.5. The Downloading Dialog Box……….……………………………………..40
5.6. The Analysis Window………………………………………………........... 41
6. RESULTS………………………………………………...................................49
7. CONCLUSION………………………………………………...........................59
8. FUTURE SCOPE………………………………………………........................61
9. REFERENCES………………………………………………............................64
VII
LIST OF IMAGES
S. NO IMAGE PG. NO
1 A graph showing the rate of spam and its increase in the past few years 3
2 The Gmail Inbox which has user various folders in which mails get classified 9
3 A logo of the Apache Spam Assassin 9
4 Implementation Flow 17,18
5 Use Case Diagram 19
6 Class Diagram 20
7 Activity Diagram 21
8 A screenshot of the connect dialog window. 49
9 A screenshot of the home screen which opens once the user is logging in 49
10 A Screenshot of the Main Page where all operations can be performed 50
11 A Screenshot of the message viewer tab 50
12 The Save Dialog Box Appears when store in PC has been clicked 51
13 A Screenshot of the Messaging Tab 51
14 A Screenshot of New Message box 52
15 A Screenshot of Reply Message box 52
16 A Screenshot of Forward Message Box 52
17 A screenshot of the credits page 53
18 The Message Dialog 53
19 The File Chooser 54
20 The Downloading Dialog 54
21 A Screenshot of the Statistics tab 55
22 The Annual Spam Rate Report 55
23 The Monthly Spam Rate Report 56
24 The Weekly Spam Rate Report 56
25 Comparative Spam Rate Report 57
26 User Defined Messages Quantity 57
LIST OF TABLES
S. NO TABLE PG. NO
1 The structure of the login details table 26
2 The structure of the main table where all the mails are stored 26
3 The structure of the keyword table where all the keywords are stored 27
1
CHAPTER 1
INTRODUCTION
2
1.1 What is Email Filtering?
Email Filtering refers to the classification of an account's emails into two types:
• Spam and
• Non-Spam.
The user first logs in to his account using a valid id and password. Upon logging in, the user's mails
are fetched into the database and are classified into spam and non-spam. The user can also create
custom labels, which are assigned using keywords provided by the user. He can also browse the
unread or read emails. This makes the mail service easy to use and user friendly.
A basic task in email filtering is to mine the data from an email and to classify it into different
categories using a data mining classification algorithm. Decision tree classification is a method
commonly used in data mining.
Email Filtering involves spam filtering, generalized filtering and segregation, and filtering of inbound
and outbound emails. Spam mails are filtered since they are not important to most users. Generalized
filtering and segregation is the sorting of mails into different categories such as sent and non-spam.
Companies filter outbound emails so that sensitive data regarding the working of the company does not
leak intentionally or accidentally by email.
To summarize, email filtering
• Segregates inbound mails into different categories.
• Filters outbound mails so as not to leak sensitive information.
The different categories into which the emails are classified are:
• Spam
• Non-Spam
The user can also define categories of his own choice by entering keyword values. These values are
then associated with all the mails that have been processed.
3
1.2. Motivation for this domain
With the increase in the number of internet users, communication and transfer of files and data
over the internet have increased drastically. In such times, it is difficult to know what kinds of
emails are entering your organisation or system.
Most present filtering techniques are unable to handle the frequently changing tactics
adopted by senders over time.
A graph showing the rate of spam and its increase in the past few years
In absolute numbers, the average number of spam mails sent per day increased from 2.4 billion in
2002 to 300 billion in 2010.
Google has announced security improvements to Gmail to further protect users' emails
from snooping. Gmail now always uses an encrypted HTTPS connection when you check or send
email, and encrypts all messages moving internally on Google's servers.
With the growth in technology, desktop-based email applications are increasingly
used. Outlook Express has changed the way the world reads and communicates with the help of
email.
4
1.3. Problem Definition
As the Internet grows at a phenomenal rate, electronic mail (abbreviated as E-mail) has become a
widely used electronic form of communication on the Internet. Every day, a huge number of people
exchange messages in this fast and inexpensive way. With the excitement on electronic commerce
growing, the usage of E-mail will increase more dramatically. However, the advantages of E-mail also
make it overused by companies, organizations or people to promote products and spread information,
which serves their own purposes. The mailbox of a user may often be crammed with E-mail messages,
some or even a large portion of which are not of interest to her/him. Searching for interesting
messages every day is becoming tedious and annoying. As a consequence, a personal E-mail filter is
indeed needed.
In recent years the highest degree of communication happens through e-mails which are often affected
by passive or active attacks. Effective e-mail filtering measures are the timely requirement to handle
such attacks. The basic idea behind e-mail filtering is to organize the incoming e-mails and also
employ a mail filter to prioritize messages, and to sort them into folders based on subject matter or
other criteria.
The purpose of our application is to classify the incoming mails into different categories as follows:
Spam and
Non Spam
The user can also create and define other categories himself, for example:
Facebook
Flipkart
Amazon
MakeMyTrip
5
1.4. Objectives
User Interactive
Whenever the user would like to bring about some modifications to his particular application, he
would be able to achieve it easily and without any glitches.
The user would be able to use the application as per his requirements and reap the benefits of the
same.
Security
Security is also an important issue which needs to be considered before going about the actual
procedure. The user should be able to use his client application safely, without any fear of
security breaches or SQL injection attacks.
Spam Detection
This is the major aim of our project: to classify mails according to the presence of malicious
content that may be harmful to the user's computer and is hence regarded as spam.
User Defined Mail Analysis
This is a new feature included in our project.
The user defines his own keywords, and the mails are classified on the basis of those keywords.
He can then view all the concerned mails under one window, easily and without any glitch.
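The keyword-based classification described above can be sketched as a simple case-insensitive substring search over the subject and the content of a mail. This is only a minimal illustration: the class `KeywordLabeler` and the method `matchingKeywords` are hypothetical names introduced here, and substring matching is an assumption about how a keyword is compared with a mail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: names are illustrative and not taken from the project code.
final class KeywordLabeler {

    /** Returns the keywords that occur in the subject or the content, ignoring case. */
    static List<String> matchingKeywords(String subject, String content, List<String> keywords) {
        String haystack = (subject + " " + content).toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String keyword : keywords) {
            String needle = keyword.trim().toLowerCase(Locale.ROOT);
            // Skip blank keywords so that an empty entry does not match every mail.
            if (!needle.isEmpty() && haystack.contains(needle)) {
                matches.add(keyword.trim());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("Flipkart", "Amazon", "MakeMyTrip");
        System.out.println(matchingKeywords(
                "Your Flipkart order has shipped",
                "Track the package from your account page.",
                keywords));
    }
}
```

A mail that matches several keywords receives all of them, which fits a design in which one keyword column holds the keyword or keywords associated with a mail.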
Historical Spam Analysis
This is one of the features of our project.
All the mails received by the user can be analysed over time, and on the
basis of that analysis, historical spam statistics can be produced.
The user can easily track which periods have had the maximum spam, and in which year he received
the maximum amount of spam mail.
The user can do the same on a monthly and weekly basis.
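As a rough illustration of the annual analysis, the spam rate for a year can be taken as the number of spam mails received in that year divided by all mails received in that year. The sketch below assumes each mail is reduced to a year and a spam flag; the class `SpamHistory` and its method are hypothetical names, and the sample data is invented purely for demonstration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: names and sample data are illustrative only.
final class SpamHistory {

    /**
     * years[i] is the year in which mail i was received and spam[i] tells
     * whether it was classified as spam. Returns year -> fraction of spam.
     */
    static Map<Integer, Double> annualSpamRate(int[] years, boolean[] spam) {
        if (years.length != spam.length) {
            throw new IllegalArgumentException("years and spam must have the same length");
        }
        // For every year keep a pair {spam count, total count}.
        Map<Integer, int[]> counts = new TreeMap<>();
        for (int i = 0; i < years.length; i++) {
            int[] pair = counts.computeIfAbsent(years[i], y -> new int[2]);
            if (spam[i]) {
                pair[0]++;
            }
            pair[1]++;
        }
        Map<Integer, Double> rates = new TreeMap<>();
        for (Map.Entry<Integer, int[]> entry : counts.entrySet()) {
            int[] pair = entry.getValue();
            rates.put(entry.getKey(), (double) pair[0] / pair[1]);
        }
        return rates;
    }

    public static void main(String[] args) {
        int[] years = {2012, 2012, 2013, 2013, 2013, 2013};
        boolean[] spam = {true, false, true, true, true, false};
        System.out.println(annualSpamRate(years, spam));
    }
}
```

The same grouping works for monthly or weekly reports if the key is changed from the year to a (year, month) or (year, week) value.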
6
CHAPTER 2
LITERATURE SURVEY
7
2.1. Different areas of Application
Spam Filtering
With the advent of Internet, the number of spam mails has increased too.
A spam filter is a program that is used to detect unsolicited and unwanted email and prevent those
messages from getting to a user’s inbox. Like other types of filtering programs, a spam filter looks for
certain criteria on which it bases judgments.
Generalized Filtering and Segregation of E-mails
Email filtering is the processing of email to organize it according to specified criteria. Most often this
refers to the automatic processing of incoming messages, but the term also applies to the intervention
of human intelligence in addition to anti-spam techniques, and to outgoing emails as well as those
being received.
Filtering mails into classes such as spam, travel and social, and providing a country-based
classification of official mails for easy access to mails from specific sub-branches, would help make
the mail service more efficient in terms of accessibility and user-friendliness.
Inbound and Outbound Filtering of E-mails
Mail filters can operate on inbound and outbound email traffic. Inbound email filtering involves
scanning messages from the Internet addressed to users protected by the filtering system or for lawful
interception.
Outbound email filtering involves the reverse – scanning email messages from local users before any
potentially harmful messages can be delivered to others on the Internet.
One method of outbound email filtering that is commonly used by Internet service
providers is transparent SMTP proxy, in which email traffic is intercepted and filtered via a
transparent proxy within the network.
Outbound filtering can also take place in an email server. Many corporations employ data leak
prevention technology in their outbound mail servers to prevent the leakage of sensitive information
via email.
8
2.2. Issues Faced
Avoidance of vocabulary treated as Spam by Spammers
The subject and body content are chosen carefully by spammers. Being aware of the terms and text
processing rules of a filter helps spammers use alternate words that serve the same
purpose without falling prey to the filter. This helps them pass the filter, and the mail is treated as a
non-spam mail when it would otherwise have formed part of the spam bulk.
The Double Opt-In problem
One of the main problems faced by spammers is to gain access and explicit permission to mail any
particular user. A workaround adopted by spammers is the Double Opt-In method.
It works in the following manner:
1. The user enters his email address into an online form.
2. The user receives a confirmation link.
3. On clicking the confirmation link, the spammer gets explicit permission to send mails to the user.
These mails, though actually spam, are then treated as normal, non-spam mails.
The Encrypted E-Mail Problem
The Encrypted E-Mail Problem is one of the most important problems faced by
e-mail client applications. Most of the transaction details sent by
banks and corporate companies reach the concerned user in an encrypted format. This is
done in order to ensure security.
Many mails sent by telecom and multinational companies concerning a payment
or a transfer of money are also sent in an encrypted format.
The message viewed in the user's inbox is not the readable mail as composed by the sender.
It is encrypted using an encryption key which can be retrieved from user credentials, such as the
user's bank account number or his password.
Thus, it is extremely difficult to classify mails in this format.
Gmail has announced that it has taken a step forward in the correct classification of encrypted
mails, which is soon to be implemented.
9
2.3. Recent Applications
Gmail
Email filtering has been and is being continuously developed and used by various email service
providers. Recently Gmail added many more categories apart from spam, including travel and
promotions. This has helped the users of Gmail achieve an efficient classification of all
incoming mails. The effectiveness of Gmail filters was recorded at 99.05%.
The Gmail Inbox which has user various folders in which mails get classified
SpamAssassin
SpamAssassin is a mail filter to identify spam. It is an intelligent email filter which uses a diverse
range of tests to identify unsolicited bulk email, more commonly known as Spam. These tests are
applied to email headers and content to classify email using advanced statistical methods. In addition,
SpamAssassin has a modular architecture that allows other technologies to be quickly wielded against
spam and is designed for easy integration into virtually any email system.
A logo of the Apache SpamAssassin
10
CHAPTER 3
ANALYSIS
11
3.1. The C4.5 Algorithm
C4.5 is an algorithm, developed by Ross Quinlan, used to generate a decision tree. C4.5 is an extension
of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for
classification, and for this reason, C4.5 is often referred to as a statistical classifier.
C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept
of information entropy. The training data is a set $S = \{s_1, s_2, \ldots\}$
of already classified samples. Each sample $s_i$ consists of a p-dimensional vector
$(x_{1,i}, x_{2,i}, \ldots, x_{p,i})$,
where the $x_j$ represent attributes or features of the sample, as well as the class in which $s_i$ falls.
At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of
samples into subsets enriched in one class or the other. The splitting criterion is the
normalized information gain (difference in entropy). The attribute with the highest normalized
information gain is chosen to make the decision. The C4.5 algorithm then recurses on the
smaller sublists.
This algorithm has a few base cases.
• All the samples in the list belong to the same class. When this happens, it simply creates a leaf
node for the decision tree saying to choose that class.
• None of the features provide any information gain. In this case, C4.5 creates a decision node
higher up the tree using the expected value of the class.
• Instance of previously-unseen class encountered. Again, C4.5 creates a decision node higher up
the tree using the expected value.
Pseudo code
In pseudo code, the general algorithm for building decision trees is:
1. Check for base cases
2. For each attribute a
Find the normalized information gain ratio from splitting on a
3. Let a_best be the attribute with the highest normalized information gain
4. Create a decision node that splits on a_best
5. Recurse on the sub lists obtained by splitting on a_best, and add those nodes as children
of node
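Step 2 of the pseudo code relies on the normalized information gain (gain ratio) of a candidate split. The sketch below shows one way to compute it from class counts, using the usual definitions: entropy $H = -\sum_k p_k \log_2 p_k$, gain = entropy before the split minus the weighted entropy after it, and gain ratio = gain divided by the entropy of the partition sizes. The class `GainRatio` and the layout of the `partitions` array are assumptions made for illustration, not the project's implementation.

```java
// Hypothetical helper: a sketch of the splitting criterion, not the project's code.
final class GainRatio {

    /** Shannon entropy in bits of a distribution given as raw counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    /**
     * partitions[v][k] is the number of samples that take attribute value v
     * and belong to class k. Returns gain / split information, or 0 when the
     * split information is 0 (all samples share one attribute value).
     */
    static double gainRatio(int[][] partitions) {
        int numClasses = partitions[0].length;
        int[] classTotals = new int[numClasses];
        int[] partitionSizes = new int[partitions.length];
        int total = 0;
        for (int v = 0; v < partitions.length; v++) {
            for (int k = 0; k < numClasses; k++) {
                classTotals[k] += partitions[v][k];
                partitionSizes[v] += partitions[v][k];
                total += partitions[v][k];
            }
        }
        // Weighted entropy of the class labels after splitting on the attribute.
        double remainder = 0.0;
        for (int v = 0; v < partitions.length; v++) {
            if (partitionSizes[v] > 0) {
                remainder += (double) partitionSizes[v] / total * entropy(partitions[v]);
            }
        }
        double gain = entropy(classTotals) - remainder;
        double splitInfo = entropy(partitionSizes);
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }

    public static void main(String[] args) {
        // Attribute with two values that separates spam from non-spam perfectly.
        System.out.println(gainRatio(new int[][] {{4, 0}, {0, 4}}));
        // Attribute that tells nothing about the class.
        System.out.println(gainRatio(new int[][] {{2, 2}, {2, 2}}));
    }
}
```

An attribute that separates the classes cleanly scores close to 1, while an attribute whose values carry no class information scores 0 and would not be chosen as a_best.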
12
3.2. The Naïve Bayes Algorithm
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with
strong (naive) independence assumptions. A more descriptive term for the underlying probability
model would be "independent feature model". An overview of statistical classifiers is given in the
article on pattern recognition.
In simple terms, a naive Bayes classifier assumes that the value of a particular feature is unrelated to
the presence or absence of any other feature, given the class variable. For example, a fruit may be
considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier
considers each of these features to contribute independently to the probability that this fruit is an
apple, regardless of the presence or absence of the other features.
For some types of probability models, naive Bayes classifiers can be trained very efficiently in
a supervised learning setting. In many practical applications, parameter estimation for naive Bayes
models uses the method of maximum likelihood; in other words, one can work with the naive Bayes
model without accepting Bayesian probability or using any Bayesian methods.
Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have
worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian
classification problem showed that there are sound theoretical reasons for the apparently
implausible efficacy of naive Bayes classifiers. Still, a comprehensive comparison with other
classification algorithms in 2006 showed that Bayes classification is outperformed by other
approaches, such as boosted trees or random forests.
Advantages:
An advantage of naive Bayes is that it only requires a small amount of training data to estimate the
parameters (means and variances of the variables) necessary for classification. Because independent
variables are assumed, only the variances of the variables for each class need to be determined and not
the entire covariance matrix.
Probabilistic model:
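Abstractly, the probability model for a classifier is a conditional model
$$p(C \mid F_1, \dots, F_n)$$
over a dependent class variable $C$ with a small number of outcomes or classes, conditional on
several feature variables $F_1$ through $F_n$. The problem is that if the number of features $n$ is
large or when a feature can take on a large number of values, then basing such a model on
probability tables is infeasible. We therefore reformulate the model to make it more tractable.
Using Bayes' theorem, this can be written
$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$
In plain English, using Bayesian probability terminology, the above equation can be written
as
$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$
In practice, there is interest only in the numerator of that fraction, because the denominator does not
depend on $C$ and the values of the features $F_i$ are given, so that the denominator is effectively
constant. The numerator is equivalent to the joint probability model
$$p(C, F_1, \dots, F_n)$$
which can be rewritten as follows, using the chain rule for repeated applications of the definition
of conditional probability:
$$\begin{aligned}
p(C, F_1, \dots, F_n) &= p(C)\, p(F_1, \dots, F_n \mid C) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2, \dots, F_n \mid C, F_1) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1)\, p(F_3, \dots, F_n \mid C, F_1, F_2) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, \dots, F_{n-1})
\end{aligned}$$
Now the "naive" conditional independence assumptions come into play: assume that each
feature $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$, given the
category $C$. This means that
$$p(F_i \mid C, F_j) = p(F_i \mid C),$$
$$p(F_i \mid C, F_j, F_k) = p(F_i \mid C), \qquad p(F_i \mid C, F_j, F_k, F_l) = p(F_i \mid C),$$
and so on, for $i \neq j, k, l$. Thus, the joint model can be expressed as
$$p(C, F_1, \dots, F_n) = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$
This means that under the above independence assumptions, the conditional distribution over the class
variable $C$ is:
$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$
where the evidence $Z = p(F_1, \dots, F_n)$ is a scaling factor dependent only
on $F_1, \dots, F_n$, that is, a constant if the values of the feature variables are known.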
Constructing a classifier from the probability model:
The discussion so far has derived the independent feature model, that is, the naive Bayes probability
model. The naive Bayes classifier combines this model with a decision rule. One common rule is to
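pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision
rule. The corresponding classifier, a Bayes classifier, is the function defined as follows:
$$\mathrm{classify}(f_1, \dots, f_n) = \underset{c}{\operatorname{argmax}}\; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c)$$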
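The MAP decision rule can be sketched for word features as follows. This is a minimal illustration under stated assumptions, not the project's classifier: the class `TinyNaiveBayes` is a hypothetical name, each word of a mail is treated as one feature, probabilities are estimated from word counts with add-one (Laplace) smoothing, words never seen in training are ignored, and logarithms are summed instead of multiplying probabilities to avoid numeric underflow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a word-count naive Bayes classifier with add-one smoothing.
final class TinyNaiveBayes {
    private final Map<String, Integer> docsPerClass = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerClass = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled training mail. */
    void train(String label, String text) {
        docsPerClass.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            counts.merge(word, 1, Integer::sum);
            wordsPerClass.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the label with the highest posterior score, or null if untrained. */
    String classify(String text) {
        List<String> words = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerClass.keySet()) {
            // log p(C = c): the prior estimated from class frequencies.
            double score = Math.log((double) docsPerClass.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int denominator = wordsPerClass.getOrDefault(label, 0) + vocabulary.size();
            for (String word : words) {
                if (!vocabulary.contains(word)) {
                    continue; // unseen words carry no evidence for any class
                }
                // log p(F_i = f_i | C = c) with add-one smoothing.
                score += Math.log((counts.getOrDefault(word, 0) + 1.0) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("spam", "win money now");
        nb.train("spam", "cheap money offer");
        nb.train("nonspam", "meeting schedule tomorrow");
        nb.train("nonspam", "project report attached");
        System.out.println(nb.classify("win cheap money"));
    }
}
```

The scaling factor Z is never computed, because it is the same for every class and does not change which class attains the maximum.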
15
3.3. Formulae
F-Measure
F-measure = 2 * precision * recall / (precision + recall)
Where,
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
True Positive Rate (Sensitivity )
TPR = TP / (TP +FN)
False Positive Rate
FPR = FP / (FP + TN)
True Negative Rate (Specificity)
TNR = TN / (FP + TN)
False Negative Rate
FNR = FN / (TP+FN)
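The formulae above translate directly into code. The sketch below computes them from the four confusion-matrix counts. The class `ConfusionMetrics` is a hypothetical name, the counts in `main` are invented for demonstration, and returning 0 when a denominator is 0 is a convention chosen here, not something the formulae prescribe.

```java
// Hypothetical helper: evaluates the formulae of this section from raw counts.
final class ConfusionMetrics {
    private final int tp;
    private final int fp;
    private final int tn;
    private final int fn;

    ConfusionMetrics(int tp, int fp, int tn, int fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    double precision() {
        return ratio(tp, tp + fp);
    }

    /** Recall, which equals the true positive rate (sensitivity). */
    double recall() {
        return ratio(tp, tp + fn);
    }

    double fMeasure() {
        double p = precision();
        double r = recall();
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    double falsePositiveRate() {
        return ratio(fp, fp + tn);
    }

    /** True negative rate (specificity). */
    double trueNegativeRate() {
        return ratio(tn, fp + tn);
    }

    double falseNegativeRate() {
        return ratio(fn, tp + fn);
    }

    // Assumed convention: a rate with an empty denominator is reported as 0.
    private static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Invented counts: 40 spam caught, 10 false alarms, 45 clean mails passed, 5 spam missed.
        ConfusionMetrics m = new ConfusionMetrics(40, 10, 45, 5);
        System.out.printf("precision=%.3f recall=%.3f f-measure=%.3f%n",
                m.precision(), m.recall(), m.fMeasure());
    }
}
```

Note that TPR + FNR = 1 and TNR + FPR = 1, so only two of the four rates are independent.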
16
CHAPTER 4
DESIGN
17
4.1. Implementation Flow
Home
• Signup: Fill credentials (Username, Password, Name, Surname, Phone no) → The credentials get
stored in a table called login details → Creation of 2 tables in MySQL → Creation of 3 separate
fields in the main table (Naïve Bayes, C4.5, Keyword)
• Login: Fill credentials (Username, Password) → Authenticate based on details in login details →
Graphical display of the mails fetched and the unread mails
Classification
• Selection between Naïve Bayes, C4.5 and keyword based classification, with a multi-select option
available to the user → On selection and submission of choices by clicking on the CLASSIFY button,
mails are classified into spam and non-spam
Message Viewer
• Allows the user to select Spam, Non-spam or Keyword → Gives a view of mails with From and
Subject as per the choices made
• Allows for a keyword based view, where a search is made by looking at the subject as well as the
content
• An option to store mail to PC is made available
• An option to copy e-mail/content to clipboard
Statistics
• Allows for a graphical comparison between e-mails and an annual, monthly or weekly
statistical view of e-mail based on historical data
Messaging
• Read e-mails → Reply to e-mails → Forward e-mails
19
4.2. Use Case Diagram
20
4.3. Class Diagram
21
4.4. Activity Diagram
22
CHAPTER 5
IMPLEMENTATION
23
5.1 The Connection Dialog Box
The connection window is the main window, which takes the login credentials and the required
information from the user and stores them on the server. Signing up requires the username, the
password, and the name, surname, country and mobile number of the user. The
user also needs to provide the server from which his mail is fetched and the server which is
used to send messages. As specified earlier, the IMAP server is used for message access and the
SMTP server for message transport.
(See Screenshot 1)
The screenshot shows the operations performed by the connect dialog window and the
prerequisites for signing up. As soon as the user signs up, two separate tables are created for the
user. The first is the main user table, where all the fetched mails are stored. The second is
the keyword table, which stores all the user-defined keywords that have been searched by the user.
ConnectDialog.java
package emailfiltering;
import java.awt.*;
import java.awt.event.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.swing.*;
public class ConnectDialog extends javax.swing.JDialog {
Connection conn = null;
Statement stmt = null, stmt1 = null;
ResultSet rs = null;
String un, ps, n, sn, co, imap, smtp, mobile;
public ConnectDialog(Frame parent) {
// Call super constructor, specifying that dialog is modal.
super(parent, true);
initComponents();
try {
Class.forName("com.mysql.jdbc.Driver");
conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/email", "root",
"");
System.out.println("Connection Established Successfully");
} catch (Exception e) {
System.out.println(e);
}
// Set application title.
setTitle("Connect");
// Handle closing events.
addWindowListener(new WindowAdapter() {
public void windowClosing(WindowEvent e) {
actionCancel();
}
});
}
private void actionConnect() {
if (usernameTextField.getText().trim().length() < 1
|| passwordField.getPassword().length < 1) {
JOptionPane.showMessageDialog(this,
"One or more settings is missing.",
"Missing Setting(s)", JOptionPane.ERROR_MESSAGE);
return;
}
// Close dialog.
dispose();
}
// Cancel connecting and exit program.
private void actionCancel() {
System.exit(0);
}
public String getUsername() {
return usernameTextField.getText();
}
// Get e-mail password.
public String getPassword() {
return new String(passwordField.getPassword());
}
@SuppressWarnings("unchecked")
// <editor-fold defaultstate="collapsed" desc="Generated Code">
private void connectButtonActionPerformed(java.awt.event.ActionEvent evt) {
actionConnect();
}
private void cancelButtonActionPerformed(java.awt.event.ActionEvent evt) {
actionCancel();
}
private void signupActionPerformed(java.awt.event.ActionEvent evt) {
un = username.getText();
ps = password.getText();
n = name.getText();
sn = surname.getText();
co = country.getText();
imap = servername.getText();
smtp = smtpserver.getText();
mobile = phoneno.getText();
try {
// A Java string literal cannot span lines, so the statement is concatenated.
String sql = "INSERT INTO `logindetails` "
+ "(`Username`,`Password`,`Name`,`Surname`,`Country`,`Server`,`SMTPServer`,`Phoneno`) "
+ "VALUES (?,?,?,?,?,?,?,?);";
PreparedStatement pstmt = conn.prepareStatement(sql);
pstmt.setString(1, un);
pstmt.setString(2, ps);
pstmt.setString(3, n);
pstmt.setString(4, sn);
pstmt.setString(5, co);
pstmt.setString(6, imap);
pstmt.setString(7, smtp);
pstmt.setString(8, mobile);
pstmt.executeUpdate();
} catch (Exception e) {
System.out.println(e);
}
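int index = un.indexOf("@");
// Use the whole username when it has no '@'; substring(0, -1) would throw.
String name = (index > 0) ? un.substring(0, index) : un;
// Keep only letters, digits and underscores so the value is safe inside a table name.
String tablename = name.replaceAll("[^A-Za-z0-9_]", "");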
try {
// Per-user main table: one row per fetched mail, with one column per classifier result.
String sql = "CREATE TABLE IF NOT EXISTS `" + tablename + "` ( `From` text NOT NULL, "
+ "`Subject` text NOT NULL, `Content` longtext NOT NULL, `Naivebayes` text NOT NULL, "
+ "`C45` text NOT NULL, `Day` varchar(3) NOT NULL, `Month` varchar(3) NOT NULL, "
+ "`Date` int(2) NOT NULL, `Year` int(4) NOT NULL, `Time` int(2) NOT NULL, "
+ "`Keyword` text NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;";
stmt = conn.createStatement();
stmt.executeUpdate(sql);
// Per-user keyword table: stores the user defined keywords.
String sql1 = "CREATE TABLE IF NOT EXISTS `" + tablename + "_keyword` ( `Keyword` "
+ "text NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;";
stmt1 = conn.createStatement();
stmt1.executeUpdate(sql1);
} catch (Exception e) {
System.out.println(e);
}}
// Variables declaration - do not modify
private javax.swing.JButton cancelButton;
private javax.swing.JButton connectButton;
private javax.swing.JTextField country;
private javax.swing.JLabel jLabel10;
private javax.swing.JLabel jLabel11;
private javax.swing.JLabel jLabel12;
private javax.swing.JLabel jLabel13;
private javax.swing.JLabel jLabel14;
private javax.swing.JLabel jLabel15;
private javax.swing.JLabel jLabel16;
private javax.swing.JLabel jLabel2;
private javax.swing.JLabel jLabel4;
private javax.swing.JLabel jLabel5;
private javax.swing.JLabel jLabel6;
private javax.swing.JLabel jLabel7;
private javax.swing.JLabel jLabel8;
private javax.swing.JLabel jLabel9;
private javax.swing.JTextField name;
private javax.swing.JTextField password;
private javax.swing.JPasswordField passwordField;
private javax.swing.JTextField phoneno;
private javax.swing.JTextField servername;
private javax.swing.JButton signup;
private javax.swing.JTextField smtpserver;
private javax.swing.JTextField surname;
private javax.swing.JTextField username;
private javax.swing.JTextField usernameTextField;
// End of variables declaration
}
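The sign-up handler derives the per-user table name from the part of the username before the "@" and splices it into the CREATE TABLE statements. The following is a minimal sketch of how that step could be guarded. It assumes (this is not stated in the report) that table names should be limited to letters, digits and underscores; the class UserTables and its methods are hypothetical names introduced only for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the original project. */
final class UserTables {
    // Assumption: only letters, digits and underscores are acceptable in a table name.
    private static final Pattern SAFE = Pattern.compile("[A-Za-z0-9_]+");

    private UserTables() {}

    /** Derives the table name as the sign-up code does (local part, dots removed), then validates it. */
    static String tableNameFor(String email) {
        int at = email.indexOf('@');
        if (at <= 0) {
            // indexOf returns -1 when there is no '@'; substring(0, -1) would throw.
            throw new IllegalArgumentException("Not an e-mail address: " + email);
        }
        String name = email.substring(0, at).replace(".", "");
        if (!SAFE.matcher(name).matches()) {
            throw new IllegalArgumentException("Unsafe table name: " + name);
        }
        return name;
    }

    /** Name of the companion keyword table for the same user. */
    static String keywordTableFor(String email) {
        return tableNameFor(email) + "_keyword";
    }
}
```

With such a helper, a username without an "@" or with characters such as a backtick is rejected before any SQL string is built, instead of surfacing later as an exception from substring or from the database.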
When a user signs up for the first time, all of the user's information is stored in the ‘logindetails’
table on the server. The structure of the table and the MySQL query that creates it are shown below.
The structure of the login details table
MySQL Query
CREATE TABLE IF NOT EXISTS `logindetails` (
`Username` varchar(30) NOT NULL,
`Password` varchar(30) NOT NULL,
`Name` varchar(30) NOT NULL,
`Surname` varchar(30) NOT NULL,
`Country` varchar(30) NOT NULL,
`Server` varchar(30) NOT NULL,
`SMTPServer` varchar(30) NOT NULL,
`Phoneno` varchar(30) NOT NULL,
`messagecount` int(11) NOT NULL,
`classifiedcount` int(11) NOT NULL,
PRIMARY KEY (`Username`),
UNIQUE KEY `Phoneno` (`Phoneno`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Once the user has signed up, the following two tables are created for that user.
The structure of the main table where all the mails are stored
This table holds the information for each downloaded mail: the sender, the subject, the content, the
labels assigned by the two classification algorithms (Naïve Bayes and C4.5), the date and time, and a
keyword column that stores the keyword(s) associated with that mail.
MySQL Query
CREATE TABLE IF NOT EXISTS `username` (
`From` text NOT NULL,
`Subject` text NOT NULL,
`Content` longtext NOT NULL,
`Naivebayes` text NOT NULL,
`C45` text NOT NULL,
`Day` varchar(3) NOT NULL,
`Month` varchar(3) NOT NULL,
`Date` int(2) NOT NULL,
`Year` int(4) NOT NULL,
`Time` int(2) NOT NULL,
`Keyword` text NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The structure of the keyword table where all the keywords are stored
MySQL Query
CREATE TABLE IF NOT EXISTS `username_keyword` (
`Keyword` text NOT NULL,
`Count` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
5.2. The Email Client Window
The Email Client window is the main window of the application and hosts all of its major
functionality. It is divided into six parts, each represented by a tab at the top of the window. All
operations are performed from within the Email Client.
The Email Client comprises the following six tabs.
The Welcome Tab
The welcome tab is the home page, where the user can view basic information such as how many mails
have been downloaded and how many are unread.
The Main Page
This is where the user performs the main operations of the client application: running the Naïve Bayes
and C4.5 classification algorithms and searching for user-defined keywords.
The Message Viewer
The user can view mails filtered by the conditions specified in this window, so the message viewer
lets the user read mails according to preference.
The Statistics Window
The statistics window presents graphical, historical analysis of previously fetched mail.
The Messaging Window
The user can send a message from the desktop application to another user’s email account.
Credits
This window gives information about the developers and provides a feedback form through which the
user can send feedback about the application.
THE WELCOME TAB
(See Screenshot 2)
Screenshot 2 shows a pie chart of the mails the user has received, split into read and unread. The red
portion of the pie chart represents the unread mail currently in the user's mailbox. The refresh
button retrieves any mails that have not yet been downloaded. This is done by the connect method,
which runs when the user clicks Connect in the connect dialog box.
The Connect Method
final ConnectDialog dialog = new ConnectDialog(this);
dialog.show();
username=dialog.getUsername();
password=dialog.getPassword();
final DownloadingDialog downloadingDialog =
new DownloadingDialog(this);
SwingUtilities.invokeLater(new Runnable() {
public void run() {
downloadingDialog.show();
}
});
//Establish JavaMail session and connect to server.
Store store = null;
try {
//Initialize JavaMail session with SMTP server.
Properties props = new Properties();
props.setProperty("mail.store.protocol", "imaps");
props.put("mail.smtp.host","smtp.gmail.com");
props.put("mail.smtp.starttls.enable","true");
props.put("mail.smtp.auth", "true");
session = Session.getInstance(props,
new javax.mail.Authenticator() {
protected PasswordAuthentication getPasswordAuthentication() {
return new PasswordAuthentication(dialog.getUsername(),dialog.getPassword());
}
});
store = session.getStore("imaps");
store.connect("imap.gmail.com",dialog.getUsername(),dialog.getPassword());
} catch (Exception e) {
//Close the downloading dialog.
downloadingDialog.dispose();
//Show error dialog.
showError("Unable to connect.", true);
}
//Download message headers from server.
try {
int j=0;
//Open main "INBOX" folder.
Folder folder = store.getFolder("INBOX");
folder.open(Folder.READ_WRITE);
Message msg[] = folder.getMessages();
FlagTerm ft = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
Message msg1[] = folder.search(ft);
System.out.println("UNREAD MAILS: "+msg1.length);
System.out.println("MAILS: "+msg.length);
DefaultPieDataset pieDataset=new DefaultPieDataset();
pieDataset.setValue("Unread Mail",msg1.length);
pieDataset.setValue("Read Mail",(msg.length-msg1.length));
JFreeChart chart= ChartFactory.createPieChart("Mail Stats",pieDataset,true,true,true);
jPanel12.setLayout(new java.awt.BorderLayout());
ChartPanel panelpie1 =new ChartPanel(chart);
jPanel12.removeAll();
jPanel12.add(panelpie1,BorderLayout.CENTER);
jPanel12.validate();
for(Message message:msg) {
try {
String sentdate=message.getSentDate().toString();
String getfrom=message.getFrom()[0].toString();
String getsubject=message.getSubject().toString();
String content;
day=sentdate.substring(0,3);
month=sentdate.substring(4,7);
date=sentdate.substring(8,10);
year=sentdate.substring(24,28);
time=sentdate.substring(11,13);
System.out.println("**********************************");
if (message.getContent() instanceof Multipart) {
StringBuffer messageContent = new StringBuffer();
Multipart multipart = (Multipart) message.getContent();
for (int i = 0; i < multipart.getCount(); i++) {
Part part = (Part) multipart.getBodyPart(i);
if (part.isMimeType("text/plain")) {
messageContent.append(part.getContent().toString());
}
}
content=messageContent.toString();
} else {
content=message.getContent().toString();
}
try
{
String sql="INSERT INTO `username` (`From`,`Subject`,`Content`,`Naivebayes`,`C45`,"
+ "`Day`,`Month`,`Date`,`Year`,`Time`,`Keyword`) VALUES (?,?,?,?,?,?,?,?,?,?,?);";
PreparedStatement pstmt = conn.prepareStatement(sql);
pstmt.setString(1, getfrom);
pstmt.setString(2, getsubject);
pstmt.setString(3, content);
pstmt.setString(4,"aa");
pstmt.setString(5,"aa");
pstmt.setString(6,day);
pstmt.setString(7,month);
pstmt.setString(8,date);
pstmt.setString(9,year);
pstmt.setString(10,time);
pstmt.setString(11,"aa");
pstmt.executeUpdate();
}
catch(Exception e)
{
System.out.println("there is an exception");
System.out.println(e);
}
}
catch (Exception e) {
System.out.println("No Information");
}
Message[] messages = folder.getMessages();
//Retrieve message headers for each message in folder.
FetchProfile profile = new FetchProfile();
profile.add(FetchProfile.Item.ENVELOPE);
folder.fetch(messages, profile);
}
} catch (Exception e) {
// Close the downloading dialog.
downloadingDialog.dispose();
// Show error dialog.
showError("Unable to download messages.", true);
}
// Close the downloading dialog.
downloadingDialog.dispose();
}
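The connect method fills the Day, Month, Date, Year and Time columns by cutting fixed character positions out of the text form of the sent date, which depends on the exact layout of that text (for example on the length of the time zone name). The sketch below shows one alternative that reads the same five fields from the date object itself. The class MailDateFields is a hypothetical name, and the choice of time zone is an assumption left to the caller.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

/** Hypothetical helper: extracts the Day/Month/Date/Year/Time values from a sent date. */
final class MailDateFields {
    final String day;   // three-letter weekday, e.g. "Thu"
    final String month; // three-letter month, e.g. "Jan"
    final int date;     // day of month, 1-31
    final int year;     // four-digit year
    final int hour;     // hour of day, 0-23 (the report's "Time" column)

    private MailDateFields(String day, String month, int date, int year, int hour) {
        this.day = day;
        this.month = month;
        this.date = date;
        this.year = year;
        this.hour = hour;
    }

    /** Reads the fields from the date in the given zone instead of from its string form. */
    static MailDateFields from(Date sent, ZoneId zone) {
        ZonedDateTime t = sent.toInstant().atZone(zone);
        String day = DateTimeFormatter.ofPattern("EEE", Locale.ENGLISH).format(t);
        String month = DateTimeFormatter.ofPattern("MMM", Locale.ENGLISH).format(t);
        return new MailDateFields(day, month, t.getDayOfMonth(), t.getYear(), t.getHour());
    }
}
```

A caller would pass the value returned by message.getSentDate() (after checking it for null, since a message may carry no sent date) together with a zone such as ZoneId.systemDefault().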
THE MAIN PAGE
The main page is the window where the classification operations are performed. Two algorithms are
used: Naïve Bayes and C4.5.
(See Screenshot 3)
Classification uses an imported training dataset, on which the user then performs the various
operations.
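As background on the second of the two algorithms: C4.5 builds a decision tree and, at each node, prefers the attribute with the highest gain ratio, that is, the information gain of the split divided by the split information. The report does not show the C4.5 implementation it uses, so the following is only an illustrative sketch for a single yes/no attribute such as "the mail contains a given word"; the class GainRatio and its parameter names are hypothetical.

```java
/** Hypothetical sketch of the C4.5 split criterion for one binary attribute. */
final class GainRatio {
    private GainRatio() {}

    /** Shannon entropy, in bits, of a pair of counts. */
    static double entropy(int a, int b) {
        int n = a + b;
        if (n == 0) {
            return 0.0;
        }
        return term(a, n) + term(b, n);
    }

    // One -p*log2(p) term; a zero count contributes nothing.
    private static double term(int count, int total) {
        if (count == 0) {
            return 0.0;
        }
        double p = (double) count / total;
        return -p * Math.log(p) / Math.log(2);
    }

    /**
     * Gain ratio of splitting the mails on "word present" versus "word absent".
     * spamWith/hamWith count the mails containing the word, spamWithout/hamWithout the rest.
     */
    static double gainRatio(int spamWith, int hamWith, int spamWithout, int hamWithout) {
        int with = spamWith + hamWith;
        int without = spamWithout + hamWithout;
        int n = with + without;
        if (n == 0) {
            return 0.0;
        }
        double before = entropy(spamWith + spamWithout, hamWith + hamWithout);
        double after = ((double) with / n) * entropy(spamWith, hamWith)
                + ((double) without / n) * entropy(spamWithout, hamWithout);
        double gain = before - after;               // information gain of the split
        double splitInfo = entropy(with, without);  // penalises very uneven splits
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }
}
```

A word that occurs in every spam mail and in no other mail gives a gain ratio of 1, while a word spread evenly over both classes gives 0.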
Training dataset creation
private void createTrainingSet(String dataset) throws Exception
{
emailMessage = new Attribute("emailMessage", (FastVector) null);
emailClass = new FastVector(3);
emailClass.addElement("spam");
emailClass.addElement("no spam");
emailClass.addElement("?");
eClass = new Attribute("emailClass", emailClass);
records = new FastVector(2);
records.addElement(eClass);
records.addElement(emailMessage);
trainingSet = new Instances("SpamClsfyTraining", records, 40);
trainingSet.setClassIndex(0);
this.readTrainingDataset(dataset);
ArffSaver saver = new ArffSaver();
saver.setInstances(trainingSet);
saver.setFile(new File("C:Akshaytraining.arff"));
saver.writeBatch();
}
Classification Implementation
private void performClassification(Object model, String modelName) throws
Exception
{
System.out.println("**==" + modelName + "==**");
StringToWordVector stringToVector = new StringToWordVector(1000);
stringToVector.setInputFormat(trainingSet);
stringToVector.setOutputWordCounts(true);
stringToVector.setUseStoplist(false);
Instances filteredData = Filter.useFilter(trainingSet, stringToVector);
Instances filteredTestData = Filter.useFilter(testingSet,stringToVector);
Classifier cModel = (Classifier) model;
cModel.buildClassifier(filteredData);
Evaluation eTest = new Evaluation(filteredTestData);
eTest.evaluateModel(cModel, filteredTestData);
double m=eTest.correct();
int x=(int)m;
System.out.println(x);
if(x==1)
{
if(nb==1)
{
System.out.println("Naive Bayes Spam");
}
if(c==1)
{
System.out.println("C4.5 Spam");
}
}
else
{
if(nb==1)
{
System.out.println("Naive Bayes Non Spam");
}
if(c==1)
{
System.out.println("C4.5 Non Spam");
}
}
}
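The listing above hands the actual learning to the classifier object passed in as model. As a rough illustration of what a Naïve Bayes text classifier does with word counts, the following sketch implements a small multinomial Naïve Bayes with add-one smoothing in plain Java. It is a simplified stand-in under stated assumptions, not the project's implementation: the class TinyNaiveBayes, the tokenisation rule and the labels used in the example are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified multinomial Naive Bayes; not the project's own classifier. */
final class TinyNaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumption: words are lower-cased runs of letters and digits.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    /** Adds one labelled training mail. */
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : tokenize(text)) {
            if (w.isEmpty()) {
                continue;
            }
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Returns the label with the highest log-probability for the given text. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior: share of training mails carrying this label.
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : tokenize(text)) {
                if (w.isEmpty()) {
                    continue;
                }
                // Add-one (Laplace) smoothing so unseen words do not zero the product.
                int c = counts.getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

After training on a handful of labelled mails, classify returns the label whose word distribution best explains the new text; working in logarithms avoids numerical underflow on long mails.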
A keyword-based search feature is also implemented: the application searches the stored mails for a
keyword specified by the user.
Keyword Search
private void searchActionPerformed(java.awt.event.ActionEvent evt) {
if(keyword.getText().equals(""))
{
System.out.println("lol");
JOptionPane.showMessageDialog(new JFrame(),"Please Enter The Keyword", "Error",
JOptionPane.ERROR_MESSAGE);
}
else
{
try
{
String sql1="INSERT INTO `username_keyword` (`Keyword`) VALUES (?);";
PreparedStatement pstmt = conn.prepareStatement(sql1);
pstmt.setString(1,keyword.getText());
pstmt.executeUpdate();
pst1=conn.prepareStatement("SELECT * FROM `username_keyword`");
rs1=pst1.executeQuery();
keywordviewer.setModel(DbUtils.resultSetToTableModel(rs1));
pst=conn.prepareStatement("SELECT * FROM `username`");
rs=pst.executeQuery();
int i=1;
while(rs.next())
{
String subject = rs.getString("Subject");
String content = rs.getString("Content");
String pastkeywordlist = rs.getString("Keyword");
String newkeyword;
if(pastkeywordlist.equals(""))
{
newkeyword=keyword.getText();
}
else
{
newkeyword=pastkeywordlist + "," + keyword.getText();
}
System.out.println(EmailFiltering.containtsKeyWord(subject, content,
keyword.getText()));
if(EmailFiltering.containtsKeyWord(subject, content, keyword.getText()))
{
String sql="UPDATE `username` SET `keyword` = ? WHERE `Subject` = ? AND `Content` = ?";
PreparedStatement pstmt1=conn.prepareStatement(sql);
pstmt1.setString(1,newkeyword);
pstmt1.setString(2,subject);
pstmt1.setString(3,content);
pstmt1.executeUpdate();
}
}
}
catch(Exception e)
{
System.out.println(e);
}
}
FillCombo();
}
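The keyword search relies on a helper that tests whether a mail contains the keyword and on a comma-separated list that accumulates the keywords matched by each mail. The project's own helper is not reproduced in this report, so the sketch below is a hypothetical stand-in that shows one plausible behaviour: a case-insensitive substring match, and an append that skips keywords already in the list. The class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-ins for the keyword helpers; the project's own versions are not shown. */
final class KeywordMatch {
    private KeywordMatch() {}

    /** True if the keyword occurs in the subject or in the content, ignoring case. */
    static boolean containsKeyword(String subject, String content, String keyword) {
        String k = keyword.trim().toLowerCase(Locale.ROOT);
        if (k.isEmpty()) {
            return false;
        }
        return subject.toLowerCase(Locale.ROOT).contains(k)
                || content.toLowerCase(Locale.ROOT).contains(k);
    }

    /** Appends a keyword to a comma-separated list unless it is already present. */
    static String appendKeyword(String pastList, String keyword) {
        List<String> items = new ArrayList<>();
        for (String s : pastList.split(",")) {
            if (!s.trim().isEmpty()) {
                items.add(s.trim());
            }
        }
        String k = keyword.trim();
        if (!items.contains(k)) {
            items.add(k);
        }
        return String.join(",", items);
    }
}
```

Skipping duplicates keeps the Keyword column from growing each time the same search is repeated.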
THE MESSAGE VIEWER
The message viewer lets the user view mails according to the classification produced by the
algorithms the user has run. The message viewer can also select mails by keyword, and files can be
created from the messages selected in this way.
Two additional buttons are provided: one saves the message as a file in a location chosen by the
user, and the other copies the message contents to the clipboard.
(See Screenshot 4)
View Messages on the basis of Classification
private void update_table()
{
try
{
String cv,sb;
cv=columnvalue.getSelectedItem().toString();
sb=spambox.getSelectedItem().toString();
System.out.println("SELECT `From`,`Subject` FROM `username` WHERE `naivebayes`='spam'");
pst=conn.prepareStatement("SELECT `From`,`Subject` FROM `username` WHERE `"+cv+"`='"+sb+"'");
rs=pst.executeQuery();
messageviewer.setModel(DbUtils.resultSetToTableModel(rs));
}
catch(Exception e)
{
System.out.println(e);
}}
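The query in update_table is assembled by concatenating the selected column name and label into the SQL text. A column name cannot be bound as a statement parameter, so one common approach is to check it against a fixed list and bind only the label. The sketch below illustrates that idea; the class ClassifiedQuery is hypothetical, and the list of allowed columns is an assumption based on the two classifier columns of the mail table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper that builds the message-viewer query with a placeholder. */
final class ClassifiedQuery {
    // Assumption: only the two classifier columns may be selected.
    private static final Set<String> COLUMNS =
            new HashSet<>(Arrays.asList("Naivebayes", "C45"));

    private ClassifiedQuery() {}

    /** Returns the SELECT text with a checked table and column and a '?' for the label. */
    static String build(String table, String column) {
        if (!table.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        if (!COLUMNS.contains(column)) {
            throw new IllegalArgumentException("Unexpected column: " + column);
        }
        return "SELECT `From`,`Subject` FROM `" + table + "` WHERE `" + column + "` = ?";
    }
}
```

The result would be passed to conn.prepareStatement, followed by pst.setString(1, sb) to bind the selected label before executing the query.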
View Messages on the basis of Keywords
private void keywordbuttonActionPerformed(java.awt.event.ActionEvent evt) {
String keywordvt=keywordcombobox.getSelectedItem().toString();
System.out.println(keywordvt);
try
{
String sql="SELECT * FROM `username`";
pst=conn.prepareStatement(sql);
rs=pst.executeQuery();
while(rs.next())
{
String keywordtb=rs.getString("Keyword");
System.out.println(keywordtb);
System.out.println(EmailFiltering.containsKeyWord(keywordtb,keywordvt));
if(EmailFiltering.containsKeyWord(keywordtb,keywordvt))
{
pst1=conn.prepareStatement("SELECT `From`,`Subject` FROM `username` WHERE "
+ "`Keyword`='"+keywordtb+"'");
rs1=pst1.executeQuery();
messageviewer.setModel(DbUtils.resultSetToTableModel(rs1));
//pst.close();
}
System.out.println();
}}
catch(Exception e)
{
System.out.println(e);
}}
Store the particular text file in a specific location
(See Screenshot 5)
Store in PC
private void savepcActionPerformed(java.awt.event.ActionEvent evt) {
System.out.println("Working");
final FileChooser filec=new FileChooser(this,true);
int result = FileChooser.jFileChooser2.showSaveDialog(this);
if (result == FileChooser.jFileChooser2.APPROVE_OPTION) {
String
path=FileChooser.jFileChooser2.getSelectedFile().getAbsoluteFile().toString();
// try-with-resources closes the writer even if writing fails.
try (PrintWriter outputStream=new PrintWriter(path))
{
String content=EmailFiltering.jTextArea1.getText();
outputStream.println(content);
}
catch(Exception e)
{
// Report the failure instead of silently ignoring it.
System.out.println(e);
}
} else if (result == FileChooser.jFileChooser2.CANCEL_OPTION) {
System.out.println("Cancel was selected");
}
FileChooser.jFileChooser2.setVisible(false);
}
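Saving the displayed message comes down to writing one string to the path chosen in the file chooser. As a minimal sketch of that step in isolation, the following uses the java.nio.file API; the class MessageFiles is a hypothetical name, and UTF-8 is an assumed encoding that the report does not specify.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for writing and reading a message as a text file. */
final class MessageFiles {
    private MessageFiles() {}

    /** Writes the message text to the target path, replacing any existing file. */
    static void save(Path target, String content) throws IOException {
        // Assumption: UTF-8 is an acceptable encoding for saved messages.
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Reads a previously saved message back into a string. */
    static String load(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }
}
```

Because save declares IOException, the button handler can show an error dialog when the chosen location is not writable.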
Copy Text
private void copytextActionPerformed(java.awt.event.ActionEvent evt) {
String name= jTextArea1.getText();
StringSelection stringSelection = new StringSelection(name);
Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
clipboard.setContents(stringSelection,null);
}
THE MESSAGING TAB
This tab lets the user send mails from the desktop application itself. The user can also select a
message and forward it to any recipient, or reply to a mail that has been received. All of these
features are implemented through the Message Dialog box.
(See Screenshot 6)
Send Message
private void sendMessage(String to,String Subject,String Content) {
MessageDialog dialog=new MessageDialog(this,true);
dialog.totextbox.setText(to);
dialog.subjecttextbox.setText(Subject);
dialog.contenttextbox.setText(Content);
dialog.setVisible(true);
try {
Message newMessage = new MimeMessage(session);
newMessage.setFrom(new InternetAddress(dialog.fromtextbox.getText()));
System.out.println("Line 1");
newMessage.setRecipient(Message.RecipientType.TO,
new InternetAddress(dialog.totextbox.getText()));
System.out.println("Line 2");
newMessage.setSubject(dialog.subjecttextbox.getText());
System.out.println("Line 3");
newMessage.setSentDate(new Date());
System.out.println("Line 4");
newMessage.setText(dialog.contenttextbox.getText());
System.out.println("Line 5");
Transport.send(newMessage);
System.out.println("Done");
dialog.setVisible(false);
} catch (Exception e) {
System.out.println(e);
showError("Unable to send message", false);
}
}
(See Screenshot 7)
Function:
private void actionNew() {
int row=messagereader.getSelectedRow();
String messagesubject="";
String messageto="";
String messagecontent="";
sendMessage(messageto,messagesubject,messagecontent);
}
(See Screenshot 8)
Function:
private void actionReply() {
int row=messagereader.getSelectedRow();
String messagesubject=(messagereader.getModel().getValueAt(row,1).toString());
String messageto="";
String messagecontent="";
String replycontent1= " ---------------- \n" +
" REPLY MESSAGE \n" +
" ----------------- \n";
String replycontent;
String replysubject="RE:"+messagesubject;
// Bind the subject as a parameter so that quotes in it cannot break the query.
String sql="select `From`,`Content` from `ourbeproject2014` where subject=?";
try
{
pst=conn.prepareStatement(sql);
pst.setString(1,messagesubject);
rs=pst.executeQuery();
while(rs.next())
{
messageto=rs.getString("From");
messagecontent=rs.getString("Content");
replycontent=replycontent1+messagecontent;
sendMessage(messageto,replysubject,replycontent);
break;
}
}
catch(Exception e)
{
System.out.println(e);
}
}
(See Screenshot 9)
Function:
private void actionForward() {
int row=messagereader.getSelectedRow();
String messagesubject=(messagereader.getModel().getValueAt(row,1).toString());
String messageto="";
String messagecontent="";
String forwardcontent1=" ----------------- \n" +
" FORWARDED MESSAGE \n" +
" ----------------- \n";
String forwardcontent;
// Bind the subject as a parameter so that quotes in it cannot break the query.
String sql="select `From`,`Content` from `ourbeproject2014` where subject=?";
try
{
pst=conn.prepareStatement(sql);
pst.setString(1,messagesubject);
rs=pst.executeQuery();
while(rs.next())
{
messagecontent=rs.getString("Content");
forwardcontent=forwardcontent1+messagecontent;
sendMessage(messageto,messagesubject,forwardcontent);
break;
}
}
catch(Exception e)
{
System.out.println(e);
}
}
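The reply and forward functions both prepend a banner to the stored content, and the reply additionally prefixes the subject with "RE:". The sketch below isolates that text handling so it can be checked on its own. The class MessageTemplates is hypothetical, and skipping the prefix when the subject already starts with "RE:" is an added assumption rather than the behaviour of the original code.

```java
/** Hypothetical helper for the reply and forward text; not part of the original project. */
final class MessageTemplates {
    private static final String RULE = "-----------------";

    private MessageTemplates() {}

    /** Prefixes the subject with "RE:" unless it already starts with it (any case). */
    static String replySubject(String subject) {
        String s = subject == null ? "" : subject.trim();
        // Assumption: repeated replies should not stack "RE:RE:..." prefixes.
        return s.regionMatches(true, 0, "RE:", 0, 3) ? s : "RE:" + s;
    }

    /** Puts a three-line banner such as "REPLY MESSAGE" in front of the content. */
    static String withBanner(String title, String content) {
        return RULE + "\n" + title + "\n" + RULE + "\n" + content;
    }
}
```

The same withBanner method serves both cases, called with "REPLY MESSAGE" or "FORWARDED MESSAGE" as the title.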
CREDITS
(See Screenshot 10)
The user can send feedback about the experience of using the application.
5.3. The Message Dialog Box
The message dialog box is used to send a new mail, reply to an existing mail, or forward a mail.
Several of the functions shown earlier rely on this dialog, so it plays an important role in the
functionality of the project.
(See Screenshot 11)
MessageDialog.java
package emailfiltering;
public class MessageDialog extends javax.swing.JDialog {
public MessageDialog(java.awt.Frame parent, boolean modal) {
super(parent, modal);
initComponents();
}
private void totextboxActionPerformed(java.awt.event.ActionEvent evt) {
}
private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
dispose();
}
public static javax.swing.JTextArea contenttextbox;
public static javax.swing.JTextField fromtextbox;
private javax.swing.JButton jButton1;
private javax.swing.JLabel jLabel1;
private javax.swing.JLabel jLabel2;
private javax.swing.JLabel jLabel3;
private javax.swing.JScrollPane jScrollPane1;
public static javax.swing.JTextField subjecttextbox;
public javax.swing.JTextField totextbox;
// End of variables declaration
}
5.4. The File Chooser
The file chooser is a built-in Java (Swing) component, included so that the user can browse to the
location where a file is to be saved.
(See Screenshot 12)
FileChooser.java
package emailfiltering;
public class FileChooser extends javax.swing.JDialog {
public FileChooser(java.awt.Frame parent, boolean modal) {
super(parent, modal);
initComponents();
}
private void jFileChooser2ActionPerformed(java.awt.event.ActionEvent evt) {
}
public static void main(String args[]) {
java.awt.EventQueue.invokeLater(new Runnable() {
public void run() {
FileChooser dialog = new FileChooser(new javax.swing.JFrame(), true);
dialog.addWindowListener(new java.awt.event.WindowAdapter() {
@Override
public void windowClosing(java.awt.event.WindowEvent e) {
System.exit(0);
}
});
dialog.setVisible(true);
}
});}
public static javax.swing.JFileChooser jFileChooser2;
}
5.5. The Downloading Dialog
The downloading dialog appears whenever mails are being downloaded from the server. It opens when
the Connect button is clicked in the connect dialog box and stays open until the mails have been
fetched.
(See Screenshot 13)
DownloadingDialog.java
package emailfiltering;
import java.awt.*;
import javax.swing.*;
public class DownloadingDialog extends JDialog {
public DownloadingDialog(Frame parent) {
// Call super constructor, specifying that dialog is modal.
super(parent, true);
// Set dialog title.
setTitle("E-mail Client");
// Instruct window not to close when the "X" is clicked.
setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
// Put a message with a nice border in this dialog.
JPanel contentPane = new JPanel();
contentPane.setBorder(
BorderFactory.createEmptyBorder(5, 5, 5, 5));
contentPane.add(new JLabel("Downloading messages..."));
setContentPane(contentPane);
// Size dialog to components.
pack();
// Center dialog over application.
setLocationRelativeTo(parent);
}
// (IDE-generated code omitted)
}
5.6. Analysis Window
THE STATISTICS WINDOW
(See Screenshot 14)
The statistics window supports historical analysis of mails, showing how much spam and non-spam
has been received over the past few years.
Annual Statistics
The annual statistics cover the years 2007 to 2017 and show how many mails were received each year,
broken down into spam and non-spam.
(See Screenshot 15)
Function:
private void annuallyActionPerformed(java.awt.event.ActionEvent evt) {
DefaultCategoryDataset datasetyearly = new DefaultCategoryDataset();
int year=2007;
while(year<=2017)
{
System.out.println(year);
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "NAIVEBAYES='spam' AND YEAR='"+year+"'");
rs=pst.executeQuery();
int spamyearcount;
String yearvalue=Integer.toString(year);
while(rs.next())
{
spamyearcount=rs.getInt("count");
System.out.println(spamyearcount);
datasetyearly.addValue(spamyearcount,"Spam",yearvalue);
}
}
catch(Exception e)
{
System.out.println(e);
}
year=year+1;
}
year=2007;
while(year<=2017)
{
System.out.println(year);
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "NAIVEBAYES='nonspam' AND YEAR='"+year+"'");
rs=pst.executeQuery();
int nonspamyearcount;
String yearvalue=Integer.toString(year);
while(rs.next())
{
nonspamyearcount=rs.getInt("count");
System.out.println(nonspamyearcount);
datasetyearly.addValue(nonspamyearcount,"Non Spam",yearvalue);
}
}
catch(Exception e)
{
System.out.println(e);
}
year=year+1;
}
JFreeChart stackedChart = ChartFactory.createStackedBarChart("Annual Spam Rate Report",
"Year", "Mail",datasetyearly, PlotOrientation.VERTICAL, true, true, false);
CategoryPlot barchrt=stackedChart.getCategoryPlot();
setResizable(false);
barchrt.setRangeGridlinePaint(Color.BLACK);
jPanel13.setLayout(new java.awt.BorderLayout());
ChartPanel panelpie =new ChartPanel(stackedChart);
jPanel13.removeAll();
jPanel13.add(panelpie,BorderLayout.CENTER);
jPanel13.validate();
}
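The annual function runs one query per year and per label, which is 22 queries for the range 2007 to 2017. The same chart data can be produced from a single pass over the (year, label) pairs, for instance the rows of one query grouped by the Year and Naivebayes columns. The sketch below shows only the tallying step in plain Java; the class YearlyTally is hypothetical, and it assumes the labels are stored as "spam" and "nonspam", as in the queries above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that tallies spam and non-spam mails per year in one pass. */
final class YearlyTally {
    private YearlyTally() {}

    /**
     * Returns year -> {spamCount, nonSpamCount} for every year in [from, to].
     * years[i] and labels[i] describe the i-th mail.
     */
    static Map<Integer, int[]> tally(int[] years, String[] labels, int from, int to) {
        Map<Integer, int[]> out = new TreeMap<>();
        for (int y = from; y <= to; y++) {
            out.put(y, new int[2]); // years without mail still appear, with zero counts
        }
        for (int i = 0; i < years.length; i++) {
            int[] counts = out.get(years[i]);
            if (counts == null) {
                continue; // mail outside the charted range
            }
            if ("spam".equals(labels[i])) {
                counts[0]++;
            } else if ("nonspam".equals(labels[i])) {
                counts[1]++;
            }
        }
        return out;
    }
}
```

Each map entry then supplies the two values added to the dataset for that year, one for the "Spam" series and one for the "Non Spam" series.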
Monthly Statistics
The annual statistics can be broken down by month. The user selects the year to analyse, and the
chart shows how many of the mails fetched and stored for that year are spam.
The monthly statistics run from January to December; all twelve months are included.
(See Screenshot 16)
Function:
private void monthlyActionPerformed(java.awt.event.ActionEvent evt) {
DefaultCategoryDataset datasetmonthly = new DefaultCategoryDataset();
String my;
String[] month = new String[] {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
"Aug", "Sep", "Oct", "Nov", "Dec"};
my=monthyear.getSelectedItem().toString();
System.out.println(my);
int i=0;
while(i<month.length)
{
System.out.println(month[i]);
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "MONTH='"+month[i]+"' AND NAIVEBAYES='spam' AND YEAR='"+my+"'");
rs=pst.executeQuery();
int nonspammonthcount;
while(rs.next())
{
nonspammonthcount=rs.getInt("count");
System.out.println(nonspammonthcount);
datasetmonthly.addValue(nonspammonthcount,"Spam",month[i]);
}
rs.close();
}
catch(Exception e)
{
System.out.println(e);
}
i++;
}
i=0;
while(i<month.length)
{
System.out.println(month[i]);
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "MONTH='"+month[i]+"' AND NAIVEBAYES='nonspam' AND YEAR='"+my+"'");
rs=pst.executeQuery();
int nonspammonthcount;
while(rs.next())
{
nonspammonthcount=rs.getInt("count");
System.out.println(nonspammonthcount);
datasetmonthly.addValue(nonspammonthcount,"Non Spam",month[i]);
}
rs.close();
}
catch(Exception e)
{
System.out.println(e);
}
i++;
}
JFreeChart stackedChart = ChartFactory.createStackedBarChart("Monthly Spam Rate Report",
"Month", "Mails",
datasetmonthly, PlotOrientation.VERTICAL, true, true, false);
CategoryPlot barchrt=stackedChart.getCategoryPlot();
setResizable(false);
barchrt.setRangeGridlinePaint(Color.BLACK);
jPanel13.setLayout(new java.awt.BorderLayout());
ChartPanel panelpie =new ChartPanel(stackedChart);
jPanel13.removeAll();
jPanel13.add(panelpie,BorderLayout.CENTER);
jPanel13.validate();
}
Weekly Statistics
The monthly statistics can be broken down further by week. The user selects the month and year to
analyse, and the chart shows how many of the mails fetched and stored in that period are spam.
The month is divided into four weeks by day of the month, as follows:
Week 1: 1-7
Week 2: 8-14
Week 3: 15-21
Week 4: 22-31
(See Screenshot 17)
Function:
private void weeklyActionPerformed(java.awt.event.ActionEvent evt) {
DefaultCategoryDataset datasetweekly = new DefaultCategoryDataset();
int w1=0,w2=0,w3=0,w4=0;
String wm,wy;
wm=weekmonth.getSelectedItem().toString();
wy=weekyear.getSelectedItem().toString();
System.out.println(wm);
System.out.println(wy);
int i=1;
while(i<=31)
{
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "NAIVEBAYES='spam' AND DATE='"+i+"' AND MONTH='"+wm+"' AND YEAR='"+wy+"'");
rs=pst.executeQuery();
int spamweekcount;
while(rs.next())
{
spamweekcount=rs.getInt("count");
if(i>=1 && i<8)
{ w1=w1+spamweekcount; }
if(i>=8 && i<15)
{ w2=w2+spamweekcount; }
if(i>=15 && i<22)
{ w3=w3+spamweekcount; }
// Week 4 runs from day 22 through day 31 inclusive.
if(i>=22 && i<=31)
{ w4=w4+spamweekcount; }
}
rs.close();
}
catch(Exception e)
{
System.out.println(e);
}
i++;
}
datasetweekly.addValue(w1, "Spam","Week1");
datasetweekly.addValue(w2, "Spam","Week2");
datasetweekly.addValue(w3, "Spam","Week3");
datasetweekly.addValue(w4, "Spam","Week4");
i=1;
w1=0;w2=0;w3=0;w4=0;
while(i<=31)
{
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "NAIVEBAYES='nonspam' AND DATE='"+i+"' AND MONTH='"+wm+"' AND YEAR='"+wy+"'");
rs=pst.executeQuery();
int nonspamweekcount;
while(rs.next())
{
nonspamweekcount=rs.getInt("count");
if(i>=1 && i<8)
{ w1=w1+nonspamweekcount; }
if(i>=8 && i<15)
{ w2=w2+nonspamweekcount; }
if(i>=15 && i<22)
{ w3=w3+nonspamweekcount; }
// Week 4 runs from day 22 through day 31 inclusive.
if(i>=22 && i<=31)
{ w4=w4+nonspamweekcount; }
}
rs.close();
}
catch(Exception e)
{
System.out.println(e);
}
i++;
}
datasetweekly.addValue(w1, "Non Spam","Week1");
datasetweekly.addValue(w2, "Non Spam","Week2");
datasetweekly.addValue(w3, "Non Spam","Week3");
datasetweekly.addValue(w4, "Non Spam","Week4");
JFreeChart stackedChart = ChartFactory.createStackedBarChart("Weekly Spam Rate Report",
wm+","+wy, "Messages",
datasetweekly, PlotOrientation.VERTICAL, true, true, false);
CategoryPlot barchrt=stackedChart.getCategoryPlot();
barchrt.setRangeGridlinePaint(Color.RED);
setResizable(false);
jPanel13.setLayout(new java.awt.BorderLayout());
ChartPanel panelpie =new ChartPanel(stackedChart);
jPanel13.removeAll();
jPanel13.add(panelpie,BorderLayout.CENTER);
jPanel13.validate();
}
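The week boundaries listed for the weekly statistics (1-7, 8-14, 15-21, 22-31) can be expressed as a single function from day of the month to week number, which keeps the boundary handling in one place. The following is a minimal sketch; the class WeekBuckets is a hypothetical name.

```java
/** Hypothetical helper mapping a day of the month to the report's four weekly spans. */
final class WeekBuckets {
    private WeekBuckets() {}

    /** Returns 1 for days 1-7, 2 for 8-14, 3 for 15-21 and 4 for 22-31. */
    static int weekOf(int dayOfMonth) {
        if (dayOfMonth < 1 || dayOfMonth > 31) {
            throw new IllegalArgumentException("Day of month out of range: " + dayOfMonth);
        }
        // Integer division gives 1..5; the cap folds days 29-31 into week 4.
        return Math.min((dayOfMonth - 1) / 7 + 1, 4);
    }
}
```

With this function the four running totals can be kept in an array indexed by weekOf(day) - 1, and the last day of a 31-day month is counted in week 4.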
Comparative Analysis:
This method compares Naïve Bayes and C4.5 by charting how many mails each algorithm has labelled
as spam and as non-spam, which helps the user judge which algorithm catches more spam.
(See Screenshot 18)
Function:
private void comparativeActionPerformed(java.awt.event.ActionEvent evt) {
DefaultCategoryDataset datasetcomparative = new DefaultCategoryDataset();
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "NAIVEBAYES='spam'");
pst1=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "C45='spam'");
rs=pst.executeQuery();
int c45spamcount;
int naivebayesspamcount;
// pst queries the NAIVEBAYES column, so its count belongs to the "Naive Bayes" category.
while(rs.next())
{
naivebayesspamcount=rs.getInt("count");
System.out.println(naivebayesspamcount);
datasetcomparative.addValue(naivebayesspamcount,"Spam","Naive Bayes");
}
// pst1 queries the C45 column, so its count belongs to the "C45" category.
rs1=pst1.executeQuery();
while(rs1.next())
{
c45spamcount=rs1.getInt("count");
System.out.println(c45spamcount);
datasetcomparative.addValue(c45spamcount,"Spam","C45");
}
}
catch(Exception e)
{
System.out.println(e);
}
try
{
pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "NAIVEBAYES='nonspam'");
pst1=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE "
+ "C45='nonspam'");
rs=pst.executeQuery();
int c45nonspamcount;
int naivebayesnonspamcount;
// pst queries the NAIVEBAYES column, so its count belongs to the "Naive Bayes" category.
while(rs.next())
{
naivebayesnonspamcount=rs.getInt("count");
System.out.println(naivebayesnonspamcount);
datasetcomparative.addValue(naivebayesnonspamcount,"Non Spam","Naive Bayes");
}
// pst1 queries the C45 column; it is executed once, here.
rs1=pst1.executeQuery();
while(rs1.next())
{
c45nonspamcount=rs1.getInt("count");
System.out.println(c45nonspamcount);
datasetcomparative.addValue(c45nonspamcount,"Non Spam","C45");
}
}
catch(Exception e)
{
System.out.println(e);
}
JFreeChart stackedChart = ChartFactory.createStackedBarChart("Comparative Spam Rate Report",
"Algorithm", "Spam/NonSpam",datasetcomparative,
PlotOrientation.VERTICAL, true, true, false);
CategoryPlot barchrt=stackedChart.getCategoryPlot();
setResizable(false);
barchrt.setRangeGridlinePaint(Color.BLACK);
jPanel13.setLayout(new java.awt.BorderLayout());
ChartPanel panelpie =new ChartPanel(stackedChart);
jPanel13.removeAll();
jPanel13.add(panelpie,BorderLayout.CENTER);
jPanel13.validate();
}
User Defined
This feature compares mails grouped by the keywords the user has specified.
It helps the user see which kinds of mail are received most often.
(See Screenshot 19)
Function:
private void userdefinedActionPerformed(java.awt.event.ActionEvent evt) {
DefaultCategoryDataset barChartData=new DefaultCategoryDataset();
String sql="SELECT * FROM `username_keyword`";
try
{
pst=conn.prepareStatement(sql);
rs=pst.executeQuery();
while(rs.next())
barChartData.setValue(rs.getInt("Count"),"Messages",rs.getString("Keyword"));
}
catch(Exception e)
{
System.out.println(e);
}
JFreeChart barChart=ChartFactory.createBarChart("User Preference Messages Quantity",
"Keyword","Message", barChartData, PlotOrientation.VERTICAL,
rootPaneCheckingEnabled, rootPaneCheckingEnabled, rootPaneCheckingEnabled);
CategoryPlot barchrt=barChart.getCategoryPlot();
barchrt.setRangeGridlinePaint(Color.ORANGE);
jPanel13.setLayout(new java.awt.BorderLayout());
setResizable(false);
ChartPanel panelpie =new ChartPanel(barChart);
jPanel13.removeAll();
jPanel13.add(panelpie,BorderLayout.CENTER);
jPanel13.validate();}
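The function above only reads one count per keyword from the username_keyword table; how those counts are produced is not shown in this section. The following is a minimal sketch of one way they could be tallied, assuming a case-insensitive substring match on the message text. The class KeywordCounter and the method countByKeyword are hypothetical names used for illustration and are not part of the project code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KeywordCounter {
    // Hypothetical helper: counts, for each keyword, how many messages contain it.
    // Assumption: a message matches when its text contains the keyword, ignoring case.
    public static Map<String, Integer> countByKeyword(List<String> messages, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            String needle = keyword.toLowerCase(Locale.ROOT);
            int count = 0;
            for (String message : messages) {
                if (message.toLowerCase(Locale.ROOT).contains(needle)) {
                    count++;
                }
            }
            counts.put(keyword, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample messages, for illustration only.
        List<String> messages = List.of(
                "Your Flipkart order has shipped",
                "New friend request on Facebook",
                "Flipkart sale starts today");
        Map<String, Integer> counts =
                countByKeyword(messages, List.of("Facebook", "Flipkart", "Amazon"));
        // Each entry could become one bar via barChartData.setValue(count, "Messages", keyword).
        counts.forEach((keyword, count) -> System.out.println(keyword + ": " + count));
    }
}
```

A LinkedHashMap is used so that the bars would appear in the order in which the user entered the keywords.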
CHAPTER 6
RESULTS
Results
SCREENSHOTS:
Screenshot 1: A screenshot of the connect dialog window
Screenshot 2: A screenshot of the home screen which opens once the user has logged in
Screenshot 3: A screenshot of the Main Page where all operations can be performed
Screenshot 4: A screenshot of the message viewer tab
Screenshot 5: The Save Dialog Box, which appears when "store in PC" has been clicked
Screenshot 6: A screenshot of the Messaging Tab
Screenshot 7: A screenshot of the New Message box
Screenshot 8: A screenshot of the Reply Message box
Screenshot 9: A screenshot of the Forward Message box
Screenshot 10: A screenshot of the credits page
Screenshot 11: The Message Dialog
Screenshot 12: The File Chooser
Screenshot 13: The Downloading Dialog
Analysis
Screenshot 14: A screenshot of the Statistics tab
Screenshot 15: The Annual Spam Rate Report
Screenshot 16: The Monthly Spam Rate Report
Screenshot 17: The Weekly Spam Rate Report
Screenshot 18: Comparative Spam Rate Report
Screenshot 19: User Defined Messages Quantity
Comparison of Parameters

Parameter              Naïve Bayes    C4.5
True Positive          19             19
False Positive         0              1
True Negative          20             19
False Negative         1              1
True Positive Rate     0.95           0.95
False Positive Rate    0              0.05
True Negative Rate     1              0.95
False Negative Rate    0.05           0.05
Precision              1              0.95
Recall                 0.95           0.95
F-Measure              0.974          0.95

Total Number of Mails Considered: 40
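The rates in the table follow from the four raw counts. The sketch below recomputes them using the standard confusion-matrix definitions; the class name ConfusionMetrics and its method names are illustrative and do not come from the project code. Applied to the counts above, these definitions give a false negative rate of 1/20 = 0.05 for both classifiers.

```java
public class ConfusionMetrics {
    private final int tp, fp, tn, fn;

    public ConfusionMetrics(int tp, int fp, int tn, int fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    // Rates are taken over the actual positives (tp + fn) or actual negatives (tn + fp).
    public double truePositiveRate()  { return (double) tp / (tp + fn); }
    public double falsePositiveRate() { return (double) fp / (fp + tn); }
    public double trueNegativeRate()  { return (double) tn / (tn + fp); }
    public double falseNegativeRate() { return (double) fn / (fn + tp); }

    // Precision is taken over everything the classifier labelled positive.
    public double precision() { return (double) tp / (tp + fp); }
    public double recall()    { return truePositiveRate(); }

    // F-measure is the harmonic mean of precision and recall.
    public double fMeasure() {
        double p = precision();
        double r = recall();
        return 2 * p * r / (p + r);
    }

    private static void report(String name, ConfusionMetrics m) {
        System.out.printf("%s: TPR=%.3f FPR=%.3f TNR=%.3f FNR=%.3f P=%.3f R=%.3f F=%.3f%n",
                name, m.truePositiveRate(), m.falsePositiveRate(), m.trueNegativeRate(),
                m.falseNegativeRate(), m.precision(), m.recall(), m.fMeasure());
    }

    public static void main(String[] args) {
        // Counts taken from the comparison table (40 mails per classifier).
        report("Naive Bayes", new ConfusionMetrics(19, 0, 20, 1));
        report("C4.5", new ConfusionMetrics(19, 1, 19, 1));
    }
}
```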
CHAPTER 7
CONCLUSION
Conclusion
Considering how essential E-Mail is in an individual's life, classifying the messages received is of
utmost importance. By employing spam filtering techniques together with the Naïve Bayes and C4.5
classification algorithms, the application classifies mails into spam, non-spam and user defined
categories. Hence, E-Mail filtering, classification and analysis using a data mining approach has
been achieved successfully.
CHAPTER 8
FUTURE SCOPE
Future Scope
Cloud Based Email Archiving System
The concept of cloud based email archiving is simple. Broadly put, a service provider processes,
manages and stores your business data on a hosted server at a remote location, either as a
substitute for, or more typically as an enhancement to, your on-premise infrastructure.
Cloud based email archiving services are reported to be growing in popularity, with prominent
growth in the number of corporate users served by this archival model.
An email spam filter service on the cloud thus offers an array of significant benefits, which include:
1. A rather predictable cost of ownership.
2. The ability to let specialist providers manage all the key email and related functions.
3. Freeing up the IT staff for other initiatives.
4. A shift from the capital expenditure (CAPEX) model to the operating expenditure (OPEX)
model.
5. Ease and convenience of managing the IT services.
6. A comprehensive and thorough e-discovery solution.
7. Reduced chance of virus, spam and malware attacks.
8. Inbound and outbound email filtering.
9. Agile email accessibility.
Email storage on the cloud has been in use by large corporations for many years. The future of
cloud based email archiving thus looks bright, with services ranging from email archiving to
retrieval and spam filtration.
Encrypted message based E-Mail Classification
This is an application which will enable the user to fetch messages from the server and classify
encrypted messages after attempting to decrypt them with various algorithms.
The E-Mail application will consist of various encryption/decryption algorithms such as:
1. AES.
2. DES.
3. Additive Cipher.
4. Huffman’s Algorithm.
5. RSA Algorithm.
The application will run all the algorithms on the text obtained from the E-mail server in an attempt
to decrypt it. On the basis of the results obtained, the best solution will be selected from amongst
all the decrypted texts. If, however, none of the algorithms manages to decrypt the text, the message
will be passed on as non-encrypted text and further filtering according to the categories will take
place.
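The idea of trying several decryptions and keeping the best one can be sketched for the simplest entry in the list, the additive cipher. This is only an illustration under stated assumptions: the class AdditiveCipherProbe, the tiny COMMON_WORDS list and the word-count scoring are hypothetical stand-ins for whatever plausibility test a real implementation would use. If no key produces a more plausible text than the input, the input is returned unchanged and treated as non-encrypted.

```java
import java.util.Locale;
import java.util.Set;

public class AdditiveCipherProbe {
    // Assumption: a very small word list stands in for a real plausibility test.
    private static final Set<String> COMMON_WORDS =
            Set.of("the", "and", "your", "account", "is", "to", "of");

    // Shifts each letter back by `key` positions; other characters are left unchanged.
    public static String decrypt(String text, int key) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                out.append((char) ('a' + (c - 'a' - key + 26) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' - key + 26) % 26));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Scores a candidate by the number of words found in COMMON_WORDS.
    public static int score(String text) {
        int hits = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (COMMON_WORDS.contains(word)) {
                hits++;
            }
        }
        return hits;
    }

    // Tries keys 1..25 and keeps the highest-scoring text; the input itself is the fallback.
    public static String bestCandidate(String text) {
        String best = text;
        int bestScore = score(text);
        for (int key = 1; key < 26; key++) {
            String candidate = decrypt(text, key);
            int candidateScore = score(candidate);
            if (candidateScore > bestScore) {
                best = candidate;
                bestScore = candidateScore;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestCandidate("brxu dffrxqw lv uhdgb"));
        System.out.println(bestCandidate("hello world"));
    }
}
```

The selected text would then be handed to the existing spam and keyword classification in place of the original message body.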
An Android Based Application for accessing Emails
An Android Based Application can be created in order to access and classify emails. This will enable
the user to access his E-Mails from any location. We could make use of the same server for accessing
and storing the mails. This application would also make the system more user friendly.
Location based Analysis of Spam Rate
Location based Analysis of Spam is a feature that can be implemented in the future.
We can take the location information from the user, or retrieve the location information from the
email account of the user, alongside the classification of each Email as spam or non-spam. With
location based analysis we can find out which country has the maximum spam concentration. This can
be displayed graphically using Google Maps and Java maps in our application.
CHAPTER 9
REFERENCES
References
1. Data Mining: Concepts and Techniques
Jiawei Han (Author), Micheline Kamber (Author), Jian Pei (Author)
2. Videos on Java Swing programming by 'Programming Knowledge' on www.youtube.com
3. Sun Certified Java Programming
Kathy Sierra and Bert Bates
4. http://en.wikipedia.org/wiki/Naive_Bayes_classifier
5. http://en.wikipedia.org/wiki/C4.5
6. http://arxiv.org/pdf/cs/0006013.pdf
7. http://www.jfree.org/jfreechart/samples.html

Senior Year Project

  • 1. I EMAIL FILTERING AND ANALYSIS USING CLASSIFICATION ALGORITHMS Submitted in partial fulfillment of the requirements of the degree of Bachelor of Engineering in Information Technology By Akshay Iyer Dipti Pamnani Akanksha Pandey Karmanya Pathak Supervisor: Mrs. Jayshree Hajgude Department of Information Technology Vivekanand Education Society’s Institute of Technology 2013-14
  • 2. II Project Report Approval for B. E. This project report entitled EMAIL FILTERING AND ANALYSIS USING CLASSIFICATION ALGORITHMS by Akshay Iyer, Dipti Pamnani, Akanksha Pandey, and Karmanya Pathak is approved for the degree of Bachelor of Engineering in Information Technology. Examiners 1.--------------------------------------------- 2.--------------------------------------------- Supervisors 1.--------------------------------------------- 2.--------------------------------------------- Chairman ----------------------------------------------- Date: Place:
  • 3. III Declaration I declare that this written submission represents my ideas in my own words and where others' ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed. ----------------------------------------- Akshay Iyer ----------------------------------------- Dipti Pamnani ----------------------------------------- Akanksha Pandey ----------------------------------------- Karmanya Pathak Date:
  • 4. IV ACKNOWLEDGEMENT This project has been a great learning experience for us. Through the course of this year, we have worked as a team for the successful completion of this project. Though, on paper it is only us who have made this project, in reality there are some people without whom this project could not have been finalized and designed the way it looks now. First of all, we would like to thank our Principal, Dr.(Mrs.) J.M.Nair, and our Vice-Principal, Dr. S.Mukhopadhyay for their support and guidance throughout the project implementation period. Without their help, the project would not have been possible. First of all, we are truly indebted to our internal project guide Mrs. Jayshree Hajgude, for her immense guidance and support. She has encouraged us and channelized our enthusiasm effectively. We would like to thank, Mrs. Vijayalakshmi Muralidharan, HOD of Information Technology Department. We would also like to thank our lab in charges, Mr. Amar Jaiswar and Mr. Ulhas Pawar, who have been very kind to us. Lastly, but not the least, we want to thank our college, Vivekanand Education Society of Institute and Technology, for providing us with the excellent reference materials and great computing facilities.
  • 5. V ABSTRACT With the various developments that are taking place in the field of technology especially in the communication department, there are a wide variety of malpractices that are being taking place which might prove harmful to the user. Most of this is currently being observed in the Email Account of a user. The Email user has an Inbox which consists of a wide variety of mails, and these mails are present in an unorganized manner. Also some mails which are being received by the user may contain harmful content which may prove to have severe consequences (Normally Termed As Spam). With this idea in mind, the topic of our BE Project is Email Filtering. Email Filtering is the process which is used in order to classify the Emails intro various categories on the basis of their content. The application fetches the emails from a user’s id, and stores it in a server, it then classifies it into spam and non spam using classification algorithms, and also it classifies it into user defined categories on the basis of the keyword entered by the user. The user can also send, forward and reply to a particular mail. There is also a lot of historical spam analysis done by the application on the basis of the content downloaded by the user. The user can access, read, store and copy the contents of his Email. The project report begins with a small introduction about Email Filtering and the reason we have chosen this topic. This is then followed by the Literature Survey, which tells the various areas where you can find similar operations being performed, and the various features of Email Filtering. We have also explained about the Algorithms which we are going to use in order to classify the Emails. The project then focuses of the Implementation Flow, and various Use Case Designs, which will help in better understanding of the various features of the project. 
This chapter is then followed by the actual implementation code of the project where, you will find information about the various snippets of the code that are a part of the project. Also, detailed explanation regarding each window of the Email Filtering application has been written down for the user. The next chapter will display the screenshots of the Email Filtering, and the various analyses which has been performed by the application, different types of graphical information is also made visible. This chapter is then followed by the conclusion and the future scope of the project as to what all features are going to be implemented in the future. The last chapter consists of a list of references which have played an important role in bringing about the completion of the project.
  • 6. VI Table of Contents 1. INTRODUCTION 1.1. What is Email Filtering…………………………………………………......2 1.2. Motivation…………………………………………………………………..3 1.3. Problem Definition ………………...………………...…………………..…4 1.4. Objectives…………………………………………………………………...5 2. LITERATURE SURVEY 2.1. Application………………………………………………………………....7 2.2. Issues Faced…………………………………………………………….......8 2.3. Different areas of Applications……………………………………………..9 3. ANALYSIS 3.1. C4.5 Algorithm…………………………………………………………......11 3.2. Naïve Bayes Algorithm………………………………………………….....12 3.3. Formulae…………………………………………………………………....15 4. DESIGN 4.1. Implementation Flow……….…………………………………………..…..17 4.2. Use Case Diagram………….…………………………………………….....19 4.3. Class Diagram…………….………………………………………………...20 4.4. Activity Diagram………….….………………………………………….….21 5. IMPLEMENTATION 5.1. The Connection Dialog Box……..…………………………………………23 5.2. The Email Client Window………..………………………………………...28 5.3. The Message Dialog Box....……….……….…………………………….....38 5.4. The File Chooser…………………….…………………………...................39 5.5. The Downloading Dialog Box……….……………………………………..40 5.6. The Analysis Window………………………………………………........... 41 6. RESULTS………………………………………………...................................49 7. CONCLUSION………………………………………………...........................59 8. FUTURE SCOPE………………………………………………........................61 9. REFERENCES………………………………………………............................64
  • 7. VII LIST OF IMAGES S. NO IMAGE PG. NO 1 A graph showing the rate of spam and its increase in the past few years 3 2 The Gmail Inbox which has user various folders in which mails get classified 9 3 A logo of the Apache Spam Assassin 9 4 Implementation Flow 17,18 5 Use Case Diagram 19 6 Class Diagram 20 7 Activity Diagram 21 8 A screenshot of the connect dialog window. 49 9 A screenshot of the home screen which opens once the user is logging in 49 10 A Screenshot of the Main Page where all operations can be performed 50 11 A Screenshot of the message viewer tab 50 12 The Save Dialog Box Appears when store in PC has been clicked 51 13 A Screenshot of the Messaging Tab 51 14 A Screenshot of New Message box 52 15 A Screenshot of Reply Message box 52 16 A Screenshot of Forward Message Box 52 17 A screenshot of the credits page 53 18 The Message Dialog 53 19 The File Chooser 54 20 The Downloading Dialog 54 21 A Screenshot of the Statistics tab 55 22 The Annual Spam Rate Report 55 23 The Monthly Spam Rate Report 56 24 The Weekly Spam Rate Report 56 25 Comparative Spam Rate Report 57 26 User Defined Messages Quantity 57 LIST OF TABLES S. NO TABLE PG. NO 1 The structure of the login details table 26 2 The structure of the main table where all the mails are stored 26 3 The structure of the keyword table where all the keywords are stored 27
  • 9. 2 1.1 What is Email Filtering? Email Filtering refers to the classification of an account’s emails based on two types of emails:  Spam and  Non-Spam. The user first logs in to his account using the valid id and password. Upon logging in, the user’s mails are fetched in the database and are classified into spam and non-spam. The user can also create custom labels which are classified using keywords provided by the user. Also, he can browse for the unread or read emails. This makes the mail service easy and user friendly. A basic task in email filtering is to mine the data from an email and to classify it into the different categories using Data Mining classification algorithm. Decision Tree Classification is a method commonly used in data mining. Email Filtering involves spam filtering, generalized filtering and segregation and filtering of inbound emails. Spam mails are filtered since they are not important to most of the users. Generalized filtering and segregation of emails is segregation of the mails into different categories such as sent and non- spam. Companies filter outbound emails so that sensitive data regarding the working of the company do not leak intentionally or accidentally by emails. To summarize email filtering  Segregates inbound mails into different categories.  Filters inbound mails so as not to leak sensitive information. The different categories in which the emails are classified are:  Spam  Non- Spam Also, the user can define categories as per his choice and can set the values as per the user’s choice. The user can enter the values, and these values will get associated with all the mails that have been calculated.
  • 10. 3 1.2. Motivation for this domain With the increase in the internet users, communication and transfer of files and data through different methods over the internet has increased drastically. In such times, it is difficult to know what kinds of emails are entering your organisation or system. Most of the present filtering techniques are unable to handle frequent changing scenario of mails adopted by the senders over the time. A graph showing the rate of spam and its increase in the past few years In absolute numbers, the average number of spam mails sent per day increased from 2.4 billion in 2002 to 300 billion in 2010. Google today announced it has made security improvements to Gmail to further protect users’ emails from snooping. Gmail now always uses an encrypted HTTPS connection when you check or send email, and encrypts all messages moving internally on Google’s servers. With the advent of growth in technology, desktop based email applications are more increasingly used. Outlook express has changed the way the world read’s and communicates with the help of Email.
  • 11. 4 1.3. Problem Definition As the Internet grows at a phenomenal rate, electronic mail (abbreviated as E-mail) has become a widely used electronic form of communication on the Internet. Every day, a huge number of people exchange messages in this fast and inexpensive way. With the excitement on electronic commerce growing, the usage of E-mail will increase more dramatically. However, the advantages of E-mail also make it overused by companies, organizations or people to promote products and spread information, which serves their own purposes. The mailbox of a user may often be crammed with E-mail messages some or even a large portion of which are not of interest to her/him. Searching for interesting messages everyday is becoming tedious and annoying. As a consequence, a personal E-mail filter is indeed needed. In recent years the highest degree of communication happens through e-mails which are often affected by passive or active attacks. Effective e-mail filtering measures are the timely requirement to handle such attacks. The basic idea behind e-mail filtering is to organize the incoming e-mails and also employ a mail filter to prioritize messages, and to sort them into folders based on subject matter or other criteria. The purpose of our application is to classify the incoming mails into different categories as follows: Spam and Non Spam Also there are various other categories which can be created and defined by the user himself which are stated as shown. Facebook Flipkart Amazon MakeMyTrip
  • 12. 5 1.4. Objectives User Interactive Whenever the user would like to bring about some modifications to his particular application, he would be able to achieve it easily and without any glitches. The user would be able to use the application as per his requirements and reap the benefits of the same. Security Security is also an important issue which needs to be considered before going about the actual procedure and hence the user should be able use his client application in an extremely safe and sophisticated manner without any fear of security breaks, and SQL attacks. Spam Detection This is the major aim of our project and we aim at bringing about the classification of mails, as per the presence of malicious content which may be harmful for the user computer and hence has been regarded as spam. User Defined Mail Analysis This is a new feature which would be included in our project According to this, the user can define his own keyword, and on the basis of that, he can access his mails easily and without any glitch. The user himself will define the keywords, and on the basis of the keywords that have been defined, he can clearly check all the concerned mails under one window. The user will be able to enter a keyword and on the basis of that keyword the mails will get classified. Historical Spam Analysis This is one of the features of our projects. All the mails that have been received by the user, can be analysed over its time period, and on the basis of that analysis, historical data, and spam detection can be brought about. The user can easily track which mails, have had the maximum spam, and in which year did he year the maximum amount of spam mail. The user can do the same Monthly and Weekly
  • 14. 7 2.1. Different areas of Application Spam Filtering With the advent of Internet, the number of spam mails has increased too. A spam filter is a program that is used to detect unsolicited and unwanted email and prevent those messages from getting to a user’s inbox. Like other types of filtering programs, a spam filter looks for certain criteria on which it bases judgments. Generalized Filtering and Segregation of E-mails Email filtering is the processing of email to organize it according to specified criteria. Most often this refers to the automatic processing of incoming messages, but the term also applies to the intervention of human intelligence in addition to anti-spam techniques, and to outgoing emails as well as those being received. Filtering mails based on classes like spam, travel, social and look out for a country-based classification of official mails for ease of access to mails from specific sub-branches would help make the mail service more efficient in terms of accessibility and user-friendliness. Inbound and Outbound Filtering of E-mails Mail filters can operate on inbound and outbound email traffic. Inbound email filtering involves scanning messages from the Internet addressed to users protected by the filtering system or for lawful interception. Outbound email filtering involves the reverse – scanning email messages from local users before any potentially harmful messages can be delivered to others on the Internet. One method of outbound email filtering that is commonly used by Internet service providers is transparent SMTP proxy, in which email traffic is intercepted and filtered via a transparent proxy within the network. Outbound filtering can also take place in an email server. Many corporations employ data leak prevention technology in their outbound mail servers to prevent the leakage of sensitive information via email.
  • 15. 8 2.2. Issues Faced Avoidance of vocabulary treated as Spam by Spammers The subject and body content are chosen carefully by spammers. Being aware of terms, text processing rules of a filter, etc. helps the spammers to use alternate words still serving the same purpose yet not falling prey to the filter. This helps them to pass the filter and the mail is treated as a non-spam mail which otherwise would have formed part of spam bulk. The Double Opt-In problem One of the main problems faced by spammers is to gain access and explicit permission to mail any particular user. An efficient solution found out by the clan is the Double Opt-In method. It works in the following manner: 1. The user enters his email address into an online form. 2. They receive a confirmation link. 3. On clicking the conformation link the spammer gets explicit permission to send mails to the user. These mails, though actually spam, are then treated as normal and non-spam mails. The Encrypted E-Mail Problem The Encrypted E-Mail Problem is one of the most important problems which are being faced by various E-Mail Client Applications. Most of the bank transactions which are being performed by various banks and corporate companies are sent in an encrypted format to the concerned user. This is done in order to ensure security. Many mails which are sent by many Telecom and multinational companies concerning any payment or any transfer of money are also done in the Encrypted format. The message which is viewed in the user inbox, is not actually the mail which has been revived by it, it is encrypted using some encryption key which can be retrieved by some user credentials, such as the user bank account number, his password. Thus, it is extremely difficult to bring about classification of mails in this format. Recently, Gmail had announced that, it has taken a step forward in correct classification of encrypted mails, which is soon to be implemented by them.
  • 16. 9 2.3. Recent Applications Gmail Email filtering has been and is being continuously developed and used by various email service providers. Recently Gmail added many more categories apart from spam which includes travel, promotions; etc. This has helped the users of Gmail to achieve and efficient classification of all incoming mails. The effectiveness of Gmail filters was recorded to a 99.05%. The Gmail Inbox which has user various folders in which mails get classified SpamAssassin SpamAssassin is a mail filter to identify spam. It is an intelligent email filter which uses a diverse range of tests to identify unsolicited bulk email, more commonly known as Spam. These tests are applied to email headers and content to classify email using advanced statistical methods. In addition, SpamAssassin has a modular architecture that allows other technologies to be quickly wielded against spam and is designed for easy integration into virtually any email system. A logo of the Apache SpamAssassin
3.1. The C4.5 Algorithm

C4.5 is an algorithm, developed by Ross Quinlan, that generates a decision tree. It is an extension of Quinlan's earlier ID3 algorithm. The decision trees generated by C4.5 can be used for classification, and for this reason C4.5 is often referred to as a statistical classifier.

C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy. The training data is a set $S = s_1, s_2, \dots$ of already classified samples. Each sample $s_i$ consists of a p-dimensional vector $(x_{1,i}, x_{2,i}, \dots, x_{p,i})$, where the $x_j$ represent attributes or features of the sample, as well as the class in which $s_i$ falls.

At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. C4.5 then recurses on the smaller sublists.

This algorithm has a few base cases.
• All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class.
• None of the features provide any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class.
• An instance of a previously unseen class is encountered. Again, C4.5 creates a decision node higher up the tree using the expected value.

Pseudo code
In pseudo code, the general algorithm for building decision trees is:
1. Check for base cases.
2. For each attribute a, find the normalized information gain ratio from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of node.
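Step 2 of the pseudo code can be illustrated with a short Java sketch. It is not taken from the project's code: the class `GainRatioSketch` and its method names are hypothetical, and it assumes a single categorical attribute with no missing values. The gain ratio is computed as the information gain divided by the split information, where the split information is the entropy of the attribute's own value distribution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the C4.5 splitting criterion for one categorical attribute. */
final class GainRatioSketch {

    /** Shannon entropy, in bits, of a list of symbols (class labels or attribute values). */
    static double entropy(String[] symbols) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : symbols) {
            counts.merge(s, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / symbols.length;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Gain ratio = information gain / split information for a split on attributeValues. */
    static double gainRatio(String[] attributeValues, String[] labels) {
        // Group the class labels by the attribute value of their sample.
        Map<String, List<String>> groups = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            groups.computeIfAbsent(attributeValues[i], k -> new ArrayList<>()).add(labels[i]);
        }
        // Weighted entropy of the subsets produced by the split.
        double remainder = 0.0;
        for (List<String> group : groups.values()) {
            double weight = (double) group.size() / labels.length;
            remainder += weight * entropy(group.toArray(new String[0]));
        }
        double gain = entropy(labels) - remainder;
        // Split information: entropy of the subset sizes, which penalizes many-valued attributes.
        double splitInfo = entropy(attributeValues);
        return splitInfo == 0.0 ? 0.0 : gain / splitInfo;
    }
}
```

With four samples labelled spam, spam, ham, ham, an attribute whose values are a, a, b, b separates the classes perfectly and gets a gain ratio of 1, while an attribute with values a, b, a, b tells us nothing and gets 0. An attribute with a single value has zero split information; the sketch returns 0 for it, which corresponds to the "no information gain" base case.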
3.2. The Naïve Bayes Algorithm

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".

In simple terms, a naive Bayes classifier assumes that the value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the presence or absence of the other features.

For some types of probability models, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.

Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem showed that there are sound theoretical reasons for the apparently implausible efficacy of naive Bayes classifiers. Still, a comprehensive comparison with other classification algorithms in 2006 showed that Bayes classification is outperformed by other approaches, such as boosted trees or random forests.

Advantages: An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification.
Because independent variables are assumed, only the variances of the variables for each class need to be determined and not the entire covariance matrix.

Probabilistic model: Abstractly, the probability model for a classifier is a conditional model $p(C \mid F_1, \dots, F_n)$ over a dependent class variable $C$ with a small number of outcomes or classes, conditional on several feature variables $F_1$ through $F_n$. The problem is that if the number of features $n$ is
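large or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable. Using Bayes' theorem, this can be written

$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$

In plain English, using Bayesian probability terminology, the above equation can be written as

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$

In practice, there is interest only in the numerator of that fraction, because the denominator does not depend on $C$ and the values of the features $F_i$ are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model $p(C, F_1, \dots, F_n)$, which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

$$\begin{aligned}
p(C, F_1, \dots, F_n) &= p(C)\, p(F_1, \dots, F_n \mid C) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2, \dots, F_n \mid C, F_1) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1)\, p(F_3, \dots, F_n \mid C, F_1, F_2) \\
&= p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, \dots, F_{n-1})
\end{aligned}$$

Now the "naive" conditional independence assumptions come into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$, given the category $C$. This means that $p(F_i \mid C, F_j) = p(F_i \mid C)$, $p(F_i \mid C, F_j, F_k) = p(F_i \mid C)$, $p(F_i \mid C, F_j, F_k, F_l) = p(F_i \mid C)$, and so on, for $i \neq j, k, l$. Thus, the joint model can be expressed as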
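$$p(C, F_1, \dots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C) \cdots p(F_n \mid C) = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

This means that under the above independence assumptions, the conditional distribution over the class variable $C$ is:

$$p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

where the evidence $Z = p(F_1, \dots, F_n)$ is a scaling factor dependent only on $F_1, \dots, F_n$, that is, a constant if the values of the feature variables are known.

Constructing a classifier from the probability model: The discussion so far has derived the independent feature model, that is, the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule. The corresponding classifier, a Bayes classifier, is the function $\mathrm{classify}$ defined as follows:

$$\mathrm{classify}(f_1, \dots, f_n) = \underset{c}{\operatorname{argmax}}\; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c)$$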
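To make the MAP decision rule concrete, here is a minimal Java sketch of a naive Bayes text classifier. It is an illustration only, not the project's implementation (the project uses Weka's classifiers); the class `NaiveBayesSketch` is hypothetical. Two choices in it are assumptions not discussed in the text: words are treated as the features $F_i$, and add-one (Laplace) smoothing is used so that an unseen word does not force a probability of zero. Scores are summed in log space, which leaves the argmax unchanged and avoids numeric underflow.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical naive Bayes text classifier using the MAP decision rule. */
final class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // class -> word -> count
    private final Map<String, Integer> docCounts = new HashMap<>();               // class -> training documents
    private final Map<String, Integer> totalWords = new HashMap<>();              // class -> total words seen
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labelled training document. */
    void train(String label, String text) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : text.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) continue;
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the class c maximizing log p(C=c) + sum_i log p(F_i=f_i | C=c), or null if untrained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Prior: fraction of training documents belonging to this class.
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String word : text.toLowerCase().split("\\s+")) {
                if (word.isEmpty()) continue;
                // Likelihood of the word given the class, with add-one smoothing (an assumption).
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

Trained on a handful of labelled messages, the sketch assigns a new message to whichever class gives the larger product of prior and per-word likelihoods. The evidence term $Z$ is never computed, because it is the same for every class.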
3.3. Formulae

F-Measure
F-measure = 2 * precision * recall / (precision + recall)
where
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

True Positive Rate (Sensitivity)
TPR = TP / (TP + FN)

False Positive Rate
FPR = FP / (FP + TN)

True Negative Rate (Specificity)
TNR = TN / (FP + TN)

False Negative Rate
FNR = FN / (TP + FN)
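The formulae above translate directly into code. The following Java sketch is illustrative only; the class `ConfusionMetrics` is hypothetical and not part of the project. TP, FP, TN and FN are the counts of true positives, false positives, true negatives and false negatives, with spam taken as the positive class.

```java
/** Hypothetical helper computing the evaluation measures from confusion-matrix counts. */
final class ConfusionMetrics {
    private final int tp, fp, tn, fn;

    ConfusionMetrics(int tp, int fp, int tn, int fn) {
        this.tp = tp;
        this.fp = fp;
        this.tn = tn;
        this.fn = fn;
    }

    double precision() { return (double) tp / (tp + fp); }

    /** Recall is the same quantity as the true positive rate (sensitivity). */
    double recall() { return (double) tp / (tp + fn); }

    double fMeasure() {
        double p = precision();
        double r = recall();
        return 2 * p * r / (p + r);
    }

    double falsePositiveRate() { return (double) fp / (fp + tn); }

    /** Specificity. */
    double trueNegativeRate() { return (double) tn / (fp + tn); }

    double falseNegativeRate() { return (double) fn / (tp + fn); }
}
```

For example, with TP = 40, FP = 10, TN = 45 and FN = 5, precision is 40/50 = 0.8 and recall is 40/45. The sketch does no guarding against empty denominators: if a class never occurs, the corresponding rate evaluates to NaN.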
4.1. Implementation Flow

Home
• Signup
  - Fill credentials: Username, Password, Name, Surname, Phone no
  - The credentials get stored in a table called login details
  - Creation of 2 tables in MySQL
  - Creation of 3 separate fields in main table: Naïve Bayes, C4.5, Keyword
• Login
  - Fill credentials: Username, Password
  - Authenticate based on details in login details
  - Graphical display of the mails fetched and the unread mails

Classification
• Selection between Naïve Bayes, C4.5 and keyword based classification, with a multi-select option available to the user
• On selection and submission of choices by clicking on the CLASSIFY button, mails are classified into spam and non-spam
Message Viewer
• Allows the user to select Spam, Non-spam or Keyword
• Gives a view of mails with From and Subject as per the choices made
• Allows a keyword based view, where the search looks at the subject as well as the content
• An option to store a mail to the PC is made available
• An option to copy the e-mail content to the clipboard is made available

Statistics
• Allows a graphical comparison of e-mails on an annual, monthly or weekly basis, giving a statistical view of e-mail based on historical data

Messaging
• Read e-mails
• Reply to e-mails
• Forward e-mails
4.2. Use Case Diagram
5.1 The Connection Dialog Box

The connection window is the main window that collects the login credentials and other required information from the user and stores them on the server. Signing up requires the username, the password, and the name, surname, country and mobile number of the user. The user also provides the server from which he is going to read mail and the server he is going to use for sending messages. As specified earlier, the IMAP server is used for message access and the SMTP server for message transport. (See Screenshot 1)

The screenshot shows the operations performed by the connect dialog window and the prerequisites for signing up. As soon as the user signs up, two separate tables are created for him: the first is the main user table, where all fetched mails are stored; the second is the keyword table, which stores all the user-defined keywords that the user has searched for.

ConnectDialog.java

    package emailfiltering;

    import java.awt.*;
    import java.awt.event.*;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import javax.swing.*;

    public class ConnectDialog extends javax.swing.JDialog {

        Connection conn = null;
        Statement stmt = null, stmt1 = null;
        ResultSet rs = null;
        String un, ps, n, sn, co, imap, smtp, mobile;

        public ConnectDialog(Frame parent) {
            // Call super constructor, specifying that dialog is modal.
            super(parent, true);
            initComponents();
            // Open the MySQL connection used by the sign-up handler.
            try {
                Class.forName("com.mysql.jdbc.Driver");
                conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/email", "root", "");
                System.out.println("Connection Established Successfully");
            } catch (Exception e) {
                System.out.println(e);
            }
            // Set application title.
            setTitle("Connect");
            // Handle closing events.
            addWindowListener(new WindowAdapter() {
                public void windowClosing(WindowEvent e) {
                    actionCancel();
                }
            });
        }

        // Validate the login fields and close the dialog.
        private void actionConnect() {
            if (usernameTextField.getText().trim().length() < 1
                    || passwordField.getPassword().length < 1) {
                JOptionPane.showMessageDialog(this, "One or more settings is missing.",
                        "Missing Setting(s)", JOptionPane.ERROR_MESSAGE);
                return;
            }
            // Close dialog.
            dispose();
        }

        // Cancel connecting and exit program.
        private void actionCancel() {
            System.exit(0);
        }

        // Get e-mail username.
        public String getUsername() {
            return usernameTextField.getText();
        }

        // Get e-mail password.
        public String getPassword() {
            return new String(passwordField.getPassword());
        }

        @SuppressWarnings("unchecked")
        // <editor-fold defaultstate="collapsed" desc="Generated Code">
        private void connectButtonActionPerformed(java.awt.event.ActionEvent evt) {
            actionConnect();
        }

        private void cancelButtonActionPerformed(java.awt.event.ActionEvent evt) {
            actionCancel();
        }

        // Store the sign-up details and create the two per-user tables.
        private void signupActionPerformed(java.awt.event.ActionEvent evt) {
            un = username.getText();
            ps = password.getText();
            n = name.getText();
            sn = surname.getText();
            co = country.getText();
            imap = servername.getText();
            smtp = smtpserver.getText();
            mobile = phoneno.getText();
            try {
                String sql = "INSERT INTO `logindetails` (`Username`,`Password`,`Name`,`Surname`,"
                        + "`Country`,`Server`,`SMTPServer`,`Phoneno`) VALUES (?,?,?,?,?,?,?,?);";
                PreparedStatement pstmt = conn.prepareStatement(sql);
                pstmt.setString(1, un);
                pstmt.setString(2, ps);
                pstmt.setString(3, n);
                pstmt.setString(4, sn);
                pstmt.setString(5, co);
                pstmt.setString(6, imap);
                pstmt.setString(7, smtp);
                pstmt.setString(8, mobile);
                pstmt.executeUpdate();
            } catch (Exception e) {
                System.out.println(e);
            }
            // The table name is the part of the username before '@', with dots removed.
            // If there is no '@', the whole username is used instead of failing.
            int index = un.indexOf("@");
            String name = (index >= 0) ? un.substring(0, index) : un;
            String tablename = name.replace(".", "");
            try {
                String sql = "CREATE TABLE IF NOT EXISTS `" + tablename + "` ("
                        + " `From` text NOT NULL, `Subject` text NOT NULL, `Content` longtext NOT NULL,"
                        + " `Naivebayes` text NOT NULL, `C45` text NOT NULL,"
                        + " `Day` varchar(3) NOT NULL, `Month` varchar(3) NOT NULL,"
                        + " `Date` int(2) NOT NULL, `Year` int(4) NOT NULL, `Time` int(2) NOT NULL,"
                        + " `Keyword` text NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;";
                stmt = conn.createStatement();
                stmt.executeUpdate(sql);
                String sql1 = "CREATE TABLE IF NOT EXISTS `" + tablename + "_keyword` ("
                        + " `Keyword` text NOT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1;";
                stmt1 = conn.createStatement();
                stmt1.executeUpdate(sql1);
            } catch (Exception e) {
                System.out.println(e);
            }
        }

        // Variables declaration - do not modify
        private javax.swing.JButton cancelButton;
        private javax.swing.JButton connectButton;
        private javax.swing.JTextField country;
        private javax.swing.JLabel jLabel10;
        private javax.swing.JLabel jLabel11;
        private javax.swing.JLabel jLabel12;
        private javax.swing.JLabel jLabel13;
        private javax.swing.JLabel jLabel14;
        private javax.swing.JLabel jLabel15;
        private javax.swing.JLabel jLabel16;
        private javax.swing.JLabel jLabel2;
        private javax.swing.JLabel jLabel4;
        private javax.swing.JLabel jLabel5;
        private javax.swing.JLabel jLabel6;
        private javax.swing.JLabel jLabel7;
        private javax.swing.JLabel jLabel8;
        private javax.swing.JLabel jLabel9;
        private javax.swing.JTextField name;
        private javax.swing.JTextField password;
        private javax.swing.JPasswordField passwordField;
        private javax.swing.JTextField phoneno;
        private javax.swing.JTextField servername;
        private javax.swing.JButton signup;
        private javax.swing.JTextField smtpserver;
        private javax.swing.JTextField surname;
        private javax.swing.JTextField username;
        private javax.swing.JTextField usernameTextField;
        // End of variables declaration
    }
When the user signs up for the first time, all his information is stored in the ‘logindetails’ table on the server. The structure of the table and the MySQL query that creates it are shown below.

[Figure: The structure of the login details table]

MySQL Query

    CREATE TABLE IF NOT EXISTS `logindetails` (
      `Username` varchar(30) NOT NULL,
      `Password` varchar(30) NOT NULL,
      `Name` varchar(30) NOT NULL,
      `Surname` varchar(30) NOT NULL,
      `Country` varchar(30) NOT NULL,
      `Server` varchar(30) NOT NULL,
      `SMTPServer` varchar(30) NOT NULL,
      `Phoneno` varchar(30) NOT NULL,
      `messagecount` int(11) NOT NULL,
      `classifiedcount` int(11) NOT NULL,
      PRIMARY KEY (`Username`),
      UNIQUE KEY `Phoneno` (`Phoneno`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Once the user has signed up, the following two tables are created for the user.

[Figure: The structure of the main table where all the mails are stored]
This table contains the information about each downloaded mail: who the message was received from, the subject of the mail, the content of the mail, one column for the result of each of the two algorithms, the date and time, and a keyword column where the keyword(s) associated with that mail are stored.

MySQL Query

    CREATE TABLE IF NOT EXISTS `username` (
      `From` text NOT NULL,
      `Subject` text NOT NULL,
      `Content` longtext NOT NULL,
      `Naivebayes` text NOT NULL,
      `C45` text NOT NULL,
      `Day` varchar(3) NOT NULL,
      `Month` varchar(3) NOT NULL,
      `Date` int(2) NOT NULL,
      `Year` int(4) NOT NULL,
      `Time` int(2) NOT NULL,
      `Keyword` text NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

[Figure: The structure of the keyword table where all the keywords are stored]

MySQL Query

    CREATE TABLE IF NOT EXISTS `username_keyword` (
      `Keyword` text NOT NULL,
      `Count` int(11) NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
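The queries above appear to use `username` as a placeholder for the per-user table name, which the sign-up code derives from the part of the e-mail address before the "@" with dots removed. The Java sketch below shows that derivation in isolation. The class `TableNames` and its methods are hypothetical, and the character check is an added assumption rather than something the project does: since a table name is concatenated into the SQL text and cannot be bound as a `PreparedStatement` parameter, restricting it to letters, digits and underscores is one simple way to keep the statement well-formed.

```java
/** Hypothetical helper mirroring how the sign-up code derives the per-user table names. */
final class TableNames {

    /** Returns the part of the address before '@', with dots removed. */
    static String mainTable(String emailAddress) {
        int at = emailAddress.indexOf('@');
        String local = (at >= 0) ? emailAddress.substring(0, at) : emailAddress;
        String table = local.replace(".", "");
        // Assumption: accept only letters, digits and underscores, because the name
        // ends up inside the SQL text itself rather than in a bound parameter.
        if (!table.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("Unsupported characters in table name: " + table);
        }
        return table;
    }

    /** Name of the companion keyword table. */
    static String keywordTable(String emailAddress) {
        return mainTable(emailAddress) + "_keyword";
    }
}
```

Under these assumptions, an address such as `john.doe@example.com` maps to the tables `johndoe` and `johndoe_keyword`, while an address whose local part contains a backtick or a space is rejected before any SQL is built.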
5.2. The Email Client Window

The Email Client window is the main window of the application, and the major functionalities are implemented in it. The Email Client is divided into 6 parts, each represented by a tab at the top of the window. All operations are performed from within the Email Client. It comprises the following 6 tabs.

The Welcome Tab
The welcome tab is the homepage, where the user can view basic information such as how many mails have been downloaded and how many are unread.

The Main Page
Here the user performs the main operations of the client application: running the Naïve Bayes and C4.5 classification algorithms and searching for user-defined keywords.

The Message Viewer
The user can view his mails according to the conditions specified in this window; the message viewer lets him read his mails as per his preference.

The Statistics Window
The statistics window presents graphical and historical analysis of previously fetched data.

The Messaging Window
The user can send a message from the desktop application to another user's email account.

Credits
This window contains information about the developers, along with a feedback form through which the user can send feedback about his experience with the application.
THE WELCOME TAB

(See Screenshot 2)

The screenshot shows a pie chart of how many of the user's received mails have been read and how many are unread. The red portion of the pie chart represents the unread mail currently in the user's mailbox. The refresh button lets the user refresh his mailbox so as to retrieve the mails that haven't been retrieved yet. This happens through the connect method, which is executed by clicking Connect in the connect dialog box.

The Connect Method

    final ConnectDialog dialog = new ConnectDialog(this);
    dialog.setVisible(true);
    username = dialog.getUsername();
    password = dialog.getPassword();
    final DownloadingDialog downloadingDialog = new DownloadingDialog(this);
    SwingUtilities.invokeLater(new Runnable() {
        public void run() {
            downloadingDialog.setVisible(true);
        }
    });
    // Establish JavaMail session and connect to server.
    Store store = null;
    try {
        // Initialize JavaMail session with SMTP server.
        Properties props = new Properties();
        props.setProperty("mail.store.protocol", "imaps");
        props.put("mail.smtp.host", "smtp.gmail.com");
        props.put("mail.smtp.starttls.enable", "true");
        props.put("mail.smtp.auth", "true");
        session = Session.getInstance(props, new javax.mail.Authenticator() {
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(dialog.getUsername(), dialog.getPassword());
            }
        });
        store = session.getStore("imaps");
        store.connect("imap.gmail.com", dialog.getUsername(), dialog.getPassword());
    } catch (Exception e) {
        // Close the downloading dialog.
        downloadingDialog.dispose();
        // Show error dialog.
        showError("Unable to connect.", true);
    }
    // Download message headers from server.
    try {
        // Open main "INBOX" folder.
        Folder folder = store.getFolder("INBOX");
        folder.open(Folder.READ_WRITE);
        Message msg[] = folder.getMessages();
        // Unread mails are those without the SEEN flag.
        FlagTerm ft = new FlagTerm(new Flags(Flags.Flag.SEEN), false);
        Message msg1[] = folder.search(ft);
        System.out.println("UNREAD MAILS: " + msg1.length);
        System.out.println("MAILS: " + msg.length);
        // Draw the read/unread pie chart on the welcome tab.
        DefaultPieDataset pieDataset = new DefaultPieDataset();
        pieDataset.setValue("Unread Mail", msg1.length);
        pieDataset.setValue("Read Mail", (msg.length - msg1.length));
        JFreeChart chart = ChartFactory.createPieChart("Mail Stats", pieDataset, true, true, true);
        jPanel12.setLayout(new java.awt.BorderLayout());
        ChartPanel panelpie1 = new ChartPanel(chart);
        jPanel12.removeAll();
        jPanel12.add(panelpie1, BorderLayout.CENTER);
        jPanel12.validate();
        for (Message message : msg) {
            try {
                String sentdate = message.getSentDate().toString();
                String getfrom = message.getFrom()[0].toString();
                String getsubject = message.getSubject().toString();
                String content;
                // Slice the fields out of the Date.toString() representation.
                day = sentdate.substring(0, 3);
                month = sentdate.substring(4, 7);
                date = sentdate.substring(8, 10);
                year = sentdate.substring(24, 28);
                time = sentdate.substring(11, 13);
                System.out.println("**********************************");
                if (message.getContent() instanceof Multipart) {
                    // Keep only the text/plain parts of a multipart message.
                    StringBuffer messageContent = new StringBuffer();
                    Multipart multipart = (Multipart) message.getContent();
                    for (int i = 0; i < multipart.getCount(); i++) {
                        Part part = (Part) multipart.getBodyPart(i);
                        if (part.isMimeType("text/plain")) {
                            messageContent.append(part.getContent().toString());
                        }
                    }
                    content = messageContent.toString();
                } else {
                    content = message.getContent().toString();
                }
                try {
                    // Store the mail; "aa" marks the not-yet-classified columns.
                    String sql = "INSERT INTO `username` (`From`,`Subject`,`Content`,`Naivebayes`,`C45`,"
                            + "`Day`,`Month`,`Date`,`Year`,`Time`,`Keyword`) VALUES (?,?,?,?,?,?,?,?,?,?,?);";
                    PreparedStatement pstmt = conn.prepareStatement(sql);
                    pstmt.setString(1, getfrom);
                    pstmt.setString(2, getsubject);
                    pstmt.setString(3, content);
                    pstmt.setString(4, "aa");
                    pstmt.setString(5, "aa");
                    pstmt.setString(6, day);
                    pstmt.setString(7, month);
                    pstmt.setString(8, date);
                    pstmt.setString(9, year);
                    pstmt.setString(10, time);
                    pstmt.setString(11, "aa");
                    pstmt.executeUpdate();
                } catch (Exception e) {
                    System.out.println("there is an exception");
                    System.out.println(e);
                }
            } catch (Exception e) {
                System.out.println("No Information");
            }
            Message[] messages = folder.getMessages();
            // Retrieve message headers for each message in folder.
            FetchProfile profile = new FetchProfile();
            profile.add(FetchProfile.Item.ENVELOPE);
            folder.fetch(messages, profile);
        }
    } catch (Exception e) {
        // Close the downloading dialog.
        downloadingDialog.dispose();
        // Show error dialog.
        showError("Unable to download messages.", true);
    }
    // Close the downloading dialog.
    downloadingDialog.dispose();
    } // end of the connect method

THE MAIN PAGE

The main page is the window where the major classification operations are performed. Two algorithms are used, Naïve Bayes and C4.5. (See Screenshot 3)

Classification is performed using an imported training dataset, on which the user then carries out the various operations.

Training dataset creation

    private void createTrainingSet(String dataset) throws Exception {
        // String attribute holding the text of the e-mail.
        emailMessage = new Attribute("emailMessage", (FastVector) null);
        // Nominal class attribute with its possible values.
        emailClass = new FastVector(3);
        emailClass.addElement("spam");
        emailClass.addElement("no spam");
        emailClass.addElement("?");
        eClass = new Attribute("emailClass", emailClass);
        records = new FastVector(2);
        records.addElement(eClass);
        records.addElement(emailMessage);
        trainingSet = new Instances("SpamClsfyTraining", records, 40);
        trainingSet.setClassIndex(0);
        this.readTrainingDataset(dataset);
        // Save the training set in ARFF format.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(trainingSet);
        saver.setFile(new File("C:Akshaytraining.arff"));
        saver.writeBatch();
    }

Classification Implementation

    private void performClassification(Object model, String modelName) throws Exception {
        System.out.println("**==" + modelName + "==**");
        // Convert the message text into word-count vectors (at most 1000 words kept).
        StringToWordVector stringToVector = new StringToWordVector(1000);
        stringToVector.setInputFormat(trainingSet);
        stringToVector.setOutputWordCounts(true);
        stringToVector.setUseStoplist(false);
        Instances filteredData = Filter.useFilter(trainingSet, stringToVector);
        Instances filteredTestData = Filter.useFilter(testingSet, stringToVector);
        // Train the selected classifier and evaluate it on the test data.
        Classifier cModel = (Classifier) model;
        cModel.buildClassifier(filteredData);
        Evaluation eTest = new Evaluation(filteredTestData);
        eTest.evaluateModel(cModel, filteredTestData);
        double m = eTest.correct();
        int x = (int) m;
        System.out.println(x);
        if (x == 1) {
            if (nb == 1) {
                System.out.println("Naive Bayes Spam");
            }
            if (c == 1) {
                System.out.println("C4.5 Spam");
            }
        } else {
            if (nb == 1) {
                System.out.println("Naive Bayes Non Spam");
            }
            if (c == 1) {
                System.out.println("C4.5 Non Spam");
            }
        }
    }

A keyword based search feature has also been implemented, in which the application searches the mails for a user-specified keyword.

Keyword Search

    private void searchActionPerformed(java.awt.event.ActionEvent evt) {
        if (keyword.getText().equals("")) {
            JOptionPane.showMessageDialog(new JFrame(), "Please Enter The Keyword",
                    "Error", JOptionPane.ERROR_MESSAGE);
        } else {
            try {
                // Remember the keyword and refresh the keyword list shown to the user.
                String sql1 = "INSERT INTO `username_keyword` (`Keyword`) VALUES (?);";
                PreparedStatement pstmt = conn.prepareStatement(sql1);
                pstmt.setString(1, keyword.getText());
                pstmt.executeUpdate();
                pst1 = conn.prepareStatement("SELECT * FROM `username_keyword`");
                rs1 = pst1.executeQuery();
                keywordviewer.setModel(DbUtils.resultSetToTableModel(rs1));
                // Tag every stored mail whose subject or content contains the keyword.
                pst = conn.prepareStatement("SELECT * FROM `username`");
                rs = pst.executeQuery();
                while (rs.next()) {
                    String subject = rs.getString("Subject");
                    String content = rs.getString("Content");
                    String pastkeywordlist = rs.getString("Keyword");
                    String newkeyword;
                    if (pastkeywordlist.equals("")) {
                        newkeyword = keyword.getText();
                    } else {
                        newkeyword = pastkeywordlist + "," + keyword.getText();
                    }
                    System.out.println(EmailFiltering.containtsKeyWord(subject, content, keyword.getText()));
                    if (EmailFiltering.containtsKeyWord(subject, content, keyword.getText())) {
                        String sql = "UPDATE `username` SET `keyword` = ? WHERE `Subject` = ? AND `Content` = ?";
                        PreparedStatement pstmt1 = conn.prepareStatement(sql);
                        pstmt1.setString(1, newkeyword);
                        pstmt1.setString(2, subject);
                        pstmt1.setString(3, content);
                        pstmt1.executeUpdate();
                    }
                }
            } catch (Exception e) {
                System.out.println(e);
            }
        }
        FillCombo();
    }

THE MESSAGE VIEWER

The message viewer lets the user view mails according to the segregation performed by the classification algorithms that the user has executed. It can also select mails by a recognised keyword, and the mails found this way can be saved as files.
Two additional buttons are provided: one stores the selected mail as a file in a location chosen by the user, and the other copies the message contents to the clipboard. (See Screenshot 4)

View Messages on the basis of Classification

    private void update_table() {
        try {
            String cv, sb;
            cv = columnvalue.getSelectedItem().toString();   // column: classifier to filter on
            sb = spambox.getSelectedItem().toString();       // value: class to show
            // The column name cannot be bound as a parameter; the value can.
            pst = conn.prepareStatement("SELECT `From`,`Subject` FROM `username` WHERE `" + cv + "` = ?");
            pst.setString(1, sb);
            rs = pst.executeQuery();
            messageviewer.setModel(DbUtils.resultSetToTableModel(rs));
        } catch (Exception e) {
            System.out.println(e);
        }
    }

View Messages on the basis of Keywords

    private void keywordbuttonActionPerformed(java.awt.event.ActionEvent evt) {
        String keywordvt = keywordcombobox.getSelectedItem().toString();
        System.out.println(keywordvt);
        try {
            String sql = "SELECT * FROM `username`";
            pst = conn.prepareStatement(sql);
            rs = pst.executeQuery();
            while (rs.next()) {
                String keywordtb = rs.getString("Keyword");
                System.out.println(keywordtb);
                System.out.println(EmailFiltering.containsKeyWord(keywordtb, keywordvt));
                // Show the mails whose keyword list contains the selected keyword.
                if (EmailFiltering.containsKeyWord(keywordtb, keywordvt)) {
                    pst1 = conn.prepareStatement("SELECT `From`,`Subject` FROM `username` WHERE `Keyword` = ?");
                    pst1.setString(1, keywordtb);
                    rs1 = pst1.executeQuery();
                    messageviewer.setModel(DbUtils.resultSetToTableModel(rs1));
                }
                System.out.println();
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
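The listing above calls `EmailFiltering.containsKeyWord(keywordtb, keywordvt)`, whose body is not shown in the report. The following Java sketch is one way such a check could work, not the project's actual method: the class `KeywordListSketch` is hypothetical, and it assumes that the `Keyword` column holds a comma-separated list, as built by the keyword search code, and that matching should ignore case and surrounding spaces.

```java
/** Hypothetical stand-in for the keyword-list check; not the project's own method. */
final class KeywordListSketch {

    /** True if keyword equals one of the comma-separated entries of keywordList, ignoring case. */
    static boolean containsKeyword(String keywordList, String keyword) {
        if (keywordList == null || keyword == null) {
            return false;
        }
        String wanted = keyword.trim();
        if (wanted.isEmpty()) {
            return false;
        }
        for (String entry : keywordList.split(",")) {
            if (entry.trim().equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

Comparing whole entries rather than using a substring test means that a mail tagged `bank,offer` matches the keyword `offer` but not `off`. The placeholder value `aa` written at download time simply never matches a real keyword.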
Store the particular text file in a specific location (See Screenshot 5)

Store in PC

    private void savepcActionPerformed(java.awt.event.ActionEvent evt) {
        System.out.println("Working");
        // Constructing the dialog initializes the shared jFileChooser2 component.
        final FileChooser filec = new FileChooser(this, true);
        int result = FileChooser.jFileChooser2.showSaveDialog(this);
        if (result == javax.swing.JFileChooser.APPROVE_OPTION) {
            String path = FileChooser.jFileChooser2.getSelectedFile().getAbsoluteFile().toString();
            // Write the displayed message text to the chosen file.
            try (PrintWriter outputStream = new PrintWriter(path)) {
                String content = EmailFiltering.jTextArea1.getText();
                outputStream.println(content);
            } catch (Exception e) {
                System.out.println(e);
            }
        } else if (result == javax.swing.JFileChooser.CANCEL_OPTION) {
            System.out.println("Cancel was selected");
        }
        FileChooser.jFileChooser2.setVisible(false);
    }

Copy Text

    private void copytextActionPerformed(java.awt.event.ActionEvent evt) {
        // Place the displayed message text on the system clipboard.
        String name = jTextArea1.getText();
        StringSelection stringSelection = new StringSelection(name);
        Clipboard clipboard = Toolkit.getDefaultToolkit().getSystemClipboard();
        clipboard.setContents(stringSelection, null);
    }

THE MESSAGING TAB

This tab lets the user send mails from the desktop application itself. The user can also select a particular message and forward it to any user, or reply to a mail which he has received. All these features have been implemented with the help of the Message Dialog box. (See Screenshot 6)

Send Message

    private void sendMessage(String to, String Subject, String Content) {
        // Let the user review and edit the message in the modal dialog.
        MessageDialog dialog = new MessageDialog(this, true);
        dialog.totextbox.setText(to);
        dialog.subjecttextbox.setText(Subject);
        dialog.contenttextbox.setText(Content);
        dialog.setVisible(true);
        try {
            // Build the message from the dialog fields and send it.
            Message newMessage = new MimeMessage(session);
            newMessage.setFrom(new InternetAddress(dialog.fromtextbox.getText()));
            newMessage.setRecipient(Message.RecipientType.TO,
                    new InternetAddress(dialog.totextbox.getText()));
            newMessage.setSubject(dialog.subjecttextbox.getText());
            newMessage.setSentDate(new Date());
            newMessage.setText(dialog.contenttextbox.getText());
            Transport.send(newMessage);
            System.out.println("Done");
            dialog.setVisible(false);
        } catch (Exception e) {
            System.out.println(e);
            showError("Unable to send message", false);
        }
    }

(See Screenshot 7)

Function:

    private void actionNew() {
        // A new message starts with empty recipient, subject and content.
        String messagesubject = "";
        String messageto = "";
        String messagecontent = "";
        sendMessage(messageto, messagesubject, messagecontent);
    }

(See Screenshot 8)

Function:

    private void actionReply() {
        int row = messagereader.getSelectedRow();
        String messagesubject = (messagereader.getModel().getValueAt(row, 1).toString());
        String messageto = "";
        String messagecontent = "";
        String replycontent1 = " ---------------- \n" + " REPLY MESSAGE \n" + " ----------------- \n";
        String replycontent;
        String replysubject = "RE:" + messagesubject;
        String sql = "select `From`,`Content` from `ourbeproject2014` where subject = ?";
        try {
            pst = conn.prepareStatement(sql);
            pst.setString(1, messagesubject);
            rs = pst.executeQuery();
            // Reply to the sender of the first mail with this subject, quoting its content.
            if (rs.next()) {
                messageto = rs.getString("From");
                messagecontent = rs.getString("Content");
                replycontent = replycontent1 + messagecontent;
                sendMessage(messageto, replysubject, replycontent);
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }

(See Screenshot 9)

Function:

    private void actionForward() {
        int row = messagereader.getSelectedRow();
        String messagesubject = (messagereader.getModel().getValueAt(row, 1).toString());
        String messageto = "";
        String messagecontent = "";
        String forwardcontent1 = " ----------------- \n" + " FORWARDED MESSAGE \n" + " ----------------- \n";
        String forwardcontent;
        String sql = "select `From`,`Content` from `ourbeproject2014` where subject = ?";
        try {
            pst = conn.prepareStatement(sql);
            pst.setString(1, messagesubject);
            rs = pst.executeQuery();
            // Forward the first mail with this subject; the recipient is filled in by the user.
            if (rs.next()) {
                messagecontent = rs.getString("Content");
                forwardcontent = forwardcontent1 + messagecontent;
                sendMessage(messageto, messagesubject, forwardcontent);
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }

CREDITS

(See Screenshot 10)

The user can send feedback on his experience with the application.
5.3. The Message Dialog Box

The message dialog box is used to send a new mail, to reply to an existing mail, or to forward a mail. Several of the code snippets above rely on it, so it plays an important role in the functionality of the project. (See Screenshot 11)

MessageDialog.java

    package emailfiltering;

    public class MessageDialog extends javax.swing.JDialog {

        public MessageDialog(java.awt.Frame parent, boolean modal) {
            super(parent, modal);
            initComponents();
        }

        private void totextboxActionPerformed(java.awt.event.ActionEvent evt) {
        }

        // Close the dialog; the caller then reads the text fields.
        private void jButton1ActionPerformed(java.awt.event.ActionEvent evt) {
            dispose();
        }

        public static javax.swing.JTextArea contenttextbox;
        public static javax.swing.JTextField fromtextbox;
        private javax.swing.JButton jButton1;
        private javax.swing.JLabel jLabel1;
        private javax.swing.JLabel jLabel2;
        private javax.swing.JLabel jLabel3;
        private javax.swing.JScrollPane jScrollPane1;
        public static javax.swing.JTextField subjecttextbox;
        public javax.swing.JTextField totextbox;
        // End of variables declaration
    }
  • 46. 39 5.4. The File Chooser
    The file chooser is a built-in Swing component (JFileChooser). It has been included so that the user can browse to the location where a file is to be saved. (See Screenshot 12)
    FileChooser.java
    package emailfiltering;
    public class FileChooser extends javax.swing.JDialog {
      public FileChooser(java.awt.Frame parent, boolean modal) {
        super(parent, modal);
        initComponents();
      }
      private void jFileChooser2ActionPerformed(java.awt.event.ActionEvent evt) { }
      public static void main(String args[]) {
        // Create and show the dialog on the event dispatch thread.
        java.awt.EventQueue.invokeLater(new Runnable() {
          public void run() {
            FileChooser dialog = new FileChooser(new javax.swing.JFrame(), true);
            dialog.addWindowListener(new java.awt.event.WindowAdapter() {
              @Override
              public void windowClosing(java.awt.event.WindowEvent e) {
                System.exit(0);
              }
            });
            dialog.setVisible(true);
          }
        });
      }
      public static javax.swing.JFileChooser jFileChooser2;
    }
  • 47. 40 5.5. The Downloading Dialog
    The downloading dialog appears whenever mails are being downloaded from the server. It is shown when the Connect button in the connect dialog box is clicked and remains visible until the mails have been fetched. (See Screenshot 13)
    DownloadingDialog.java
    package emailfiltering;
    import java.awt.*;
    import javax.swing.*;
    public class DownloadingDialog extends JDialog {
      public DownloadingDialog(Frame parent) {
        // Call super constructor, specifying that dialog is modal.
        super(parent, true);
        // Set dialog title.
        setTitle("E-mail Client");
        // Instruct window not to close when the "X" is clicked.
        setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
        // Put a message with a nice border in this dialog.
        JPanel contentPane = new JPanel();
        contentPane.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
        contentPane.add(new JLabel("Downloading messages..."));
        setContentPane(contentPane);
        // Size dialog to components.
        pack();
        // Center dialog over application.
        setLocationRelativeTo(parent);
      }
      @SuppressWarnings("unchecked")
      // <editor-fold defaultstate="collapsed" desc="Generated Code">
    }
  • 48. 41 5.6. Analysis Window
    THE STATISTICS WINDOW (See Screenshot 14)
    The statistics window supports historical analysis of mails: it shows how much spam and non-spam has been received over the past few years.
    Annual Statistics
    The annual statistics cover the years 2007 to 2017 and show how many mails were received each year, how many of them were spam, and how many were non-spam. (See Screenshot 15)
    Function:
    private void annuallyActionPerformed(java.awt.event.ActionEvent evt) {
      DefaultCategoryDataset datasetyearly = new DefaultCategoryDataset();
      // First pass: spam count per year.
      int year=2007;
      while(year<=2017) {
        System.out.println(year);
        try {
          pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='spam' AND YEAR='"+year+"'");
          rs=pst.executeQuery();
          int spamyearcount;
          String yearvalue=Integer.toString(year);
          while(rs.next()) {
            spamyearcount=rs.getInt("count");
            System.out.println(spamyearcount);
            datasetyearly.addValue(spamyearcount,"Spam",yearvalue);
          }
        } catch(Exception e) {
          System.out.println(e);
        }
        year=year+1;
      }
      // Second pass: non-spam count per year.
      year=2007;
      while(year<=2017) {
        System.out.println(year);
  • 49. 42
        try {
          pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='nonspam' AND YEAR='"+year+"'");
          rs=pst.executeQuery();
          int nonspamyearcount;
          String yearvalue=Integer.toString(year);
          while(rs.next()) {
            nonspamyearcount=rs.getInt("count");
            System.out.println(nonspamyearcount);
            datasetyearly.addValue(nonspamyearcount,"Non Spam",yearvalue);
          }
        } catch(Exception e) {
          System.out.println(e);
        }
        year=year+1;
      }
      // Render both series as a stacked bar chart inside jPanel13.
      JFreeChart stackedChart = ChartFactory.createStackedBarChart("Annual Spam Rate Report", "Year", "Mail",datasetyearly, PlotOrientation.VERTICAL, true, true, false);
      CategoryPlot barchrt=stackedChart.getCategoryPlot();
      setResizable(false);
      barchrt.setRangeGridlinePaint(Color.BLACK);
      jPanel13.setLayout(new java.awt.BorderLayout());
      ChartPanel panelpie =new ChartPanel(stackedChart);
      jPanel13.removeAll();
      jPanel13.add(panelpie,BorderLayout.CENTER);
      jPanel13.validate();
    }
    Monthly Statistics
    The annual statistics can be broken down by month. The user selects the year to be analysed, and the chart shows how many spam and non-spam mails were fetched and stored in each month of that year, from January to December. (See Screenshot 16)
    Function:
    private void monthlyActionPerformed(java.awt.event.ActionEvent evt) {
      DefaultCategoryDataset datasetmonthly = new DefaultCategoryDataset();
      String my;
      String[] month = new String[] {"Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
      my=monthyear.getSelectedItem().toString(); // year chosen by the user
      System.out.println(my);
      int i=0;
  • 50. 43
      // First pass: spam count per month of the selected year.
      while(i<month.length) {
        System.out.println(month[i]);
        try {
          pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE MONTH='"+month[i]+"' AND NAIVEBAYES='spam' AND YEAR='"+my+"'");
          rs=pst.executeQuery();
          int spammonthcount;
          while(rs.next()) {
            spammonthcount=rs.getInt("count");
            System.out.println(spammonthcount);
            datasetmonthly.addValue(spammonthcount,"Spam",month[i]);
          }
          rs.close();
        } catch(Exception e) {
          System.out.println(e);
        }
        i++;
      }
      // Second pass: non-spam count per month of the selected year.
      i=0;
      while(i<month.length) {
        System.out.println(month[i]);
        try {
          pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE MONTH='"+month[i]+"' AND NAIVEBAYES='nonspam' AND YEAR='"+my+"'");
          rs=pst.executeQuery();
          int nonspammonthcount;
          while(rs.next()) {
            nonspammonthcount=rs.getInt("count");
            System.out.println(nonspammonthcount);
            datasetmonthly.addValue(nonspammonthcount,"Non Spam",month[i]);
          }
          rs.close();
        } catch(Exception e) {
          System.out.println(e);
        }
        i++;
      }
      JFreeChart stackedChart = ChartFactory.createStackedBarChart("Monthly Spam Rate Report", "Month", "Mails", datasetmonthly, PlotOrientation.VERTICAL, true, true, false);
      CategoryPlot barchrt=stackedChart.getCategoryPlot();
      setResizable(false);
      barchrt.setRangeGridlinePaint(Color.BLACK);
      jPanel13.setLayout(new java.awt.BorderLayout());
      ChartPanel panelpie =new ChartPanel(stackedChart);
      jPanel13.removeAll();
      jPanel13.add(panelpie,BorderLayout.CENTER);
      jPanel13.validate();
    }
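The counting queries in the annual and monthly functions are built by string concatenation. A minimal sketch of the same count using bound parameters is given here, assuming the table `username` and the columns NAIVEBAYES, MONTH and YEAR used in the report; the class name SpamCountQuery and the method name count are hypothetical and are not part of the project code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper: counts mails for one label, month and year. */
final class SpamCountQuery {
    // Table and column names follow the report; the '?' marks are JDBC placeholders.
    static final String COUNT_SQL =
        "SELECT COUNT(*) AS count FROM `username` "
      + "WHERE NAIVEBAYES = ? AND MONTH = ? AND YEAR = ?";

    private SpamCountQuery() { }

    /** Runs the count with bound parameters and closes the statement and result set. */
    static int count(Connection conn, String label, String month, String year)
            throws SQLException {
        try (PreparedStatement pst = conn.prepareStatement(COUNT_SQL)) {
            pst.setString(1, label);   // "spam" or "nonspam"
            pst.setString(2, month);   // e.g. "Jan"
            pst.setString(3, year);    // e.g. "2013"
            try (ResultSet rs = pst.executeQuery()) {
                // COUNT(*) yields one row; fall back to 0 if none is returned.
                return rs.next() ? rs.getInt("count") : 0;
            }
        }
    }
}
```

With such a helper, each loop iteration in the monthly function could reduce to a call such as count(conn, "spam", month[i], my), and the try-with-resources blocks would close the statement and result set on every pass.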
  • 51. 44 Weekly Statistics
    The monthly statistics can be broken down further by week. The user selects the month and year to be analysed, and the chart shows how many spam and non-spam mails were fetched and stored in each week of that month. Each month is divided into four spans:
    Week 1: 1-7
    Week 2: 8-14
    Week 3: 15-21
    Week 4: 22-31
    (See Screenshot 17)
    Function:
    private void weeklyActionPerformed(java.awt.event.ActionEvent evt) {
      DefaultCategoryDataset datasetweekly = new DefaultCategoryDataset();
      int w1=0,w2=0,w3=0,w4=0;
      String wm,wy;
      wm=weekmonth.getSelectedItem().toString(); // month chosen by the user
      wy=weekyear.getSelectedItem().toString();  // year chosen by the user
      System.out.println(wm);
      System.out.println(wy);
      // First pass: spam count per day, accumulated into the four weeks.
      int i=1;
      while(i<=31) {
        try {
          pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='spam' AND DATE='"+i+"' AND MONTH='"+wm+"' AND YEAR='"+wy+"'");
          rs=pst.executeQuery();
          int spamweekcount;
          while(rs.next()) {
            spamweekcount=rs.getInt("count");
            if(i>=1 && i<8) { w1=w1+spamweekcount; }
            if(i>=8 && i<15) { w2=w2+spamweekcount; }
            if(i>=15 && i<22) { w3=w3+spamweekcount; }
            // Week 4 runs through day 31 inclusive.
            if(i>=22 && i<=31) { w4=w4+spamweekcount; }
          }
          rs.close();
        } catch(Exception e) {
          System.out.println(e);
        }
        i++;
  • 52. 45
      }
      datasetweekly.addValue(w1, "Spam","Week1");
      datasetweekly.addValue(w2, "Spam","Week2");
      datasetweekly.addValue(w3, "Spam","Week3");
      datasetweekly.addValue(w4, "Spam","Week4");
      // Second pass: non-spam count per day, starting again from day 1.
      i=1;
      w1=0;w2=0;w3=0;w4=0;
      while(i<=31) {
        try {
          pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='nonspam' AND DATE='"+i+"' AND MONTH='"+wm+"' AND YEAR='"+wy+"'");
          rs=pst.executeQuery();
          int nonspamweekcount;
          while(rs.next()) {
            nonspamweekcount=rs.getInt("count");
            if(i>=1 && i<8) { w1=w1+nonspamweekcount; }
            if(i>=8 && i<15) { w2=w2+nonspamweekcount; }
            if(i>=15 && i<22) { w3=w3+nonspamweekcount; }
            // Week 4 runs through day 31 inclusive.
            if(i>=22 && i<=31) { w4=w4+nonspamweekcount; }
          }
          rs.close();
        } catch(Exception e) {
          System.out.println(e);
        }
        i++;
      }
      datasetweekly.addValue(w1, "Non Spam","Week1");
      datasetweekly.addValue(w2, "Non Spam","Week2");
      datasetweekly.addValue(w3, "Non Spam","Week3");
      datasetweekly.addValue(w4, "Non Spam","Week4");
      JFreeChart stackedChart = ChartFactory.createStackedBarChart("Weekly Spam Rate Report",wm+","+wy, "Messages", datasetweekly, PlotOrientation.VERTICAL, true, true, false);
      CategoryPlot barchrt=stackedChart.getCategoryPlot();
      barchrt.setRangeGridlinePaint(Color.RED);
      setResizable(false);
      jPanel13.setLayout(new java.awt.BorderLayout());
      ChartPanel panelpie =new ChartPanel(stackedChart);
      jPanel13.removeAll();
      jPanel13.add(panelpie,BorderLayout.CENTER);
      jPanel13.validate();
    }
    Comparative Analysis:
    This method compares Naïve Bayes and C4.5 and shows the user which algorithm is better at catching spam.
  • 53. 46 (See Screenshot 18)
    Function:
    private void comparativeActionPerformed(java.awt.event.ActionEvent evt) {
      DefaultCategoryDataset datasetcomparative = new DefaultCategoryDataset();
      // Spam counts: pst/rs hold the Naive Bayes result, pst1/rs1 the C4.5 result.
      try {
        pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='spam'");
        pst1=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE C45='spam'");
        rs=pst.executeQuery();
        int c45spamcount;
        int naivebayesspamcount;
        while(rs.next()) {
          naivebayesspamcount=rs.getInt("count");
          System.out.println(naivebayesspamcount);
          datasetcomparative.addValue(naivebayesspamcount,"Spam","Naive Bayes");
        }
        rs1=pst1.executeQuery();
        while(rs1.next()) {
          c45spamcount=rs1.getInt("count");
          System.out.println(c45spamcount);
          datasetcomparative.addValue(c45spamcount,"Spam","C45");
        }
      } catch(Exception e) {
        System.out.println(e);
      }
      // Non-spam counts, with the same pairing of statements and labels.
      try {
        pst=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE NAIVEBAYES='nonspam'");
        pst1=conn.prepareStatement("SELECT COUNT( * ) AS count FROM `username` WHERE C45='nonspam'");
        rs=pst.executeQuery();
        int c45nonspamcount;
        int naivebayesnonspamcount;
        while(rs.next()) {
          naivebayesnonspamcount=rs.getInt("count");
          System.out.println(naivebayesnonspamcount);
          datasetcomparative.addValue(naivebayesnonspamcount,"Non Spam","Naive Bayes");
        }
        rs1=pst1.executeQuery();
        while(rs1.next()) {
          c45nonspamcount=rs1.getInt("count");
          System.out.println(c45nonspamcount);
          datasetcomparative.addValue(c45nonspamcount,"Non Spam","C45");
        }
      } catch(Exception e)
  • 54. 47
      {
        System.out.println(e);
      }
      JFreeChart stackedChart = ChartFactory.createStackedBarChart("Comparative Spam Rate Report", "Algorithm", "Spam/NonSpam",datasetcomparative, PlotOrientation.VERTICAL, true, true, false);
      CategoryPlot barchrt=stackedChart.getCategoryPlot();
      setResizable(false);
      barchrt.setRangeGridlinePaint(Color.BLACK);
      jPanel13.setLayout(new java.awt.BorderLayout());
      ChartPanel panelpie =new ChartPanel(stackedChart);
      jPanel13.removeAll();
      jPanel13.add(panelpie,BorderLayout.CENTER);
      jPanel13.validate();
    }
    User Defined
    This feature compares the number of mails matching each keyword specified by the user. It helps the user see which kinds of mail are received most often. (See Screenshot 19)
    Function:
    private void userdefinedActionPerformed(java.awt.event.ActionEvent evt) {
      DefaultCategoryDataset barChartData=new DefaultCategoryDataset();
      String sql="SELECT * FROM `username_keyword`";
      try {
        pst=conn.prepareStatement(sql);
        rs=pst.executeQuery();
        // One bar per keyword, with the stored message count as its height.
        while(rs.next())
          barChartData.setValue(rs.getInt("Count"),"Messages",rs.getString("Keyword"));
      } catch(Exception e) {
        System.out.println(e);
      }
      JFreeChart barChart=ChartFactory.createBarChart("User Preference Messages Quantity","Keyword","Message", barChartData, PlotOrientation.VERTICAL, rootPaneCheckingEnabled, rootPaneCheckingEnabled, rootPaneCheckingEnabled);
      CategoryPlot barchrt=barChart.getCategoryPlot();
      barchrt.setRangeGridlinePaint(Color.ORANGE);
      jPanel13.setLayout(new java.awt.BorderLayout());
      setResizable(false);
      ChartPanel panelpie =new ChartPanel(barChart);
      jPanel13.removeAll();
      jPanel13.add(panelpie,BorderLayout.CENTER);
      jPanel13.validate();
    }
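The week spans used by the Weekly Statistics (Week 1: 1-7, Week 2: 8-14, Week 3: 15-21, Week 4: 22-31) can also be expressed as a small helper that maps a day of the month to its week. The following is only a sketch under that assumption; the class WeekBuckets and its methods are hypothetical names and do not appear in the project.

```java
/** Hypothetical helper that maps days of a month onto the four week spans. */
final class WeekBuckets {
    private WeekBuckets() { }

    /** Returns 1..4 for a day of month 1..31; days 29-31 fall into week 4. */
    static int weekOfMonth(int day) {
        if (day < 1 || day > 31) {
            throw new IllegalArgumentException("day must be between 1 and 31: " + day);
        }
        // Days 1-7 -> 1, 8-14 -> 2, 15-21 -> 3, 22-28 -> 4; the cap keeps 29-31 in week 4.
        return Math.min((day - 1) / 7 + 1, 4);
    }

    /** Sums per-day counts (index 0 = day 1, at most 31 entries) into four weekly totals. */
    static int[] weeklyTotals(int[] dailyCounts) {
        int[] totals = new int[4];
        for (int day = 1; day <= dailyCounts.length; day++) {
            totals[weekOfMonth(day) - 1] += dailyCounts[day - 1];
        }
        return totals;
    }

    public static void main(String[] args) {
        int[] daily = new int[31];
        java.util.Arrays.fill(daily, 1); // one mail per day, as sample data
        System.out.println(java.util.Arrays.toString(weeklyTotals(daily)));
    }
}
```

Keeping the boundaries in one place avoids repeating the four range checks in both the spam and the non-spam loop.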
  • 56. 49 Results
    SCREENSHOTS:
    Screenshot 1: A screenshot of the connect dialog window
    Screenshot 2: A screenshot of the home screen, which opens once the user has logged in
  • 57. 50 Screenshot 3: A screenshot of the main page, where all operations can be performed
    Screenshot 4: A screenshot of the message viewer tab
  • 58. 51 Screenshot 5: The Save dialog box, which appears when "Store in PC" is clicked
    Screenshot 6: A screenshot of the Messaging tab
  • 59. 52 Screenshot 7: A screenshot of the New Message box
    Screenshot 8: A screenshot of the Reply Message box
    Screenshot 9: A screenshot of the Forward Message box
  • 60. 53 Screenshot 10: A screenshot of the credits page
    Screenshot 11: The Message Dialog
  • 61. 54 Screenshot 12: The File Chooser
    Screenshot 13: The Downloading Dialog
  • 62. 55 Analysis
    Screenshot 14: A screenshot of the Statistics tab
    Screenshot 15: The Annual Spam Rate Report
  • 63. 56 Screenshot 16: The Monthly Spam Rate Report
    Screenshot 17: The Weekly Spam Rate Report
  • 64. 57 Screenshot 18: Comparative Spam Rate Report
    Screenshot 19: User Defined Messages Quantity
  • 65. 58 Comparison of Parameters
    Parameter              Naïve Bayes   C4.5
    True Positive          19            19
    False Positive         0             1
    True Negative          20            19
    False Negative         1             1
    True Positive Rate     0.95          0.95
    False Positive Rate    0             0.05
    True Negative Rate     1             0.95
    False Negative Rate    0.05          0.05
    Precision              1             0.95
    Recall                 0.95          0.95
    F-Measure              0.974         0.95
    Total Number of Mails Considered: 40
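The derived rows of the comparison can be recomputed from the four raw counts using the standard definitions (rate = count divided by the size of its actual class, precision = TP / (TP + FP), F-measure = harmonic mean of precision and recall). The sketch below assumes that spam is the positive class; the class name ConfusionMatrixMetrics is hypothetical and is not part of the project code.

```java
import java.util.Locale;

/** Hypothetical helper: standard metrics derived from confusion-matrix counts. */
final class ConfusionMatrixMetrics {
    final int tp, fp, tn, fn;

    ConfusionMatrixMetrics(int tp, int fp, int tn, int fn) {
        this.tp = tp; this.fp = fp; this.tn = tn; this.fn = fn;
    }

    double truePositiveRate()  { return ratio(tp, tp + fn); } // same as recall
    double falsePositiveRate() { return ratio(fp, fp + tn); }
    double trueNegativeRate()  { return ratio(tn, tn + fp); }
    double falseNegativeRate() { return ratio(fn, fn + tp); }
    double precision()         { return ratio(tp, tp + fp); }
    double recall()            { return truePositiveRate(); }

    /** Harmonic mean of precision and recall; 0 when both are 0. */
    double fMeasure() {
        double p = precision(), r = recall();
        return (p + r == 0.0) ? 0.0 : 2.0 * p * r / (p + r);
    }

    /** Division that returns 0 for an empty denominator. */
    private static double ratio(int numerator, int denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    public static void main(String[] args) {
        // Counts taken from the comparison: TP, FP, TN, FN.
        ConfusionMatrixMetrics naiveBayes = new ConfusionMatrixMetrics(19, 0, 20, 1);
        ConfusionMatrixMetrics c45 = new ConfusionMatrixMetrics(19, 1, 19, 1);
        System.out.printf(Locale.ROOT, "Naive Bayes: precision=%.3f recall=%.3f F=%.3f%n",
            naiveBayes.precision(), naiveBayes.recall(), naiveBayes.fMeasure());
        System.out.printf(Locale.ROOT, "C4.5: precision=%.3f recall=%.3f F=%.3f%n",
            c45.precision(), c45.recall(), c45.fMeasure());
    }
}
```

Recomputing the rates this way is a quick consistency check on the reported figures: with one false negative out of 20 spam mails, the false negative rate is 1/20 = 0.05 for both algorithms.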
  • 67. 60 Conclusion
    Considering the importance of e-mail in an individual's life, classifying incoming messages is a necessity. By combining spam filtering techniques with classification algorithms (Naïve Bayes and C4.5), mails can be sorted into categories automatically; on the 40 mails considered, both algorithms achieved a recall of 0.95. Hence, e-mail filtering, classification and analysis using a data mining approach has been achieved successfully.
  • 69. 62 Future Scope
    Cloud Based Email Archiving System
    The concept of cloud based email archiving is simple. Broadly put, a service provider processes, manages and stores your business data on a hosted server at a remote location, either as a substitute for or, more typically, as an enhancement to your on-premise infrastructure. Research indicates that cloud based email archiving is becoming increasingly popular, with marked growth in the number of corporate users served by this archival model. An email spam filter service on the cloud thus offers an array of significant benefits, which include:
    1. A fairly predictable cost of ownership.
    2. The ability to let specialist providers manage all the key email and related functions.
    3. Freeing up the IT staff for other initiatives.
    4. A paradigm shift from the capital expenditure (CAPEX) model to the operating expenditure (OPEX) model.
    5. Ease and convenience of managing the IT services.
    6. A comprehensive and thorough e-discovery solution.
    7. Reduced chance of virus, spam and malware attacks.
    8. Inbound and outbound email filtering.
    9. Agile e-mail accessibility.
    The concept of email storage on the cloud has been in use by large corporations for many years. The future of cloud based email archiving systems thus looks bright, with services ranging from email archiving to retrieval and spam filtration.
    Encrypted message based E-Mail Classification
    This is an application which will enable the user to fetch messages from the server and classify each message on the basis of various encryption algorithms. The E-Mail application will consist of various encryption/decryption algorithms such as:
    1. AES.
    2. DES.
    3. Additive Cipher.
    4. Huffman's Algorithm.
    5. RSA Algorithm.
    On the basis of the information obtained, the application will decrypt the text obtained from the E-mail server and execute all the algorithms.
On the basis of the result obtained, the best solution will be selected amongst all the decrypted texts. If however, the algorithm fails to decrypt the text, then the
  • 70. 63 message will be passed as non-encrypted text and further filtering according to the categories will take place.
    An Android Based Application for accessing Emails
    An Android based application can be created in order to access and classify emails. This will enable the user to access his or her e-mails from any location. The same server could be used for accessing and storing the mails. Such an application would also make the system more user-friendly.
    Location based Analysis of Spam Rate
    Location based analysis of spam is a feature that could be implemented in the future. The location information can be taken from the user, or retrieved from the user's email account, and used when classifying whether a particular email is spam or not. With location based analysis we can find out which country has the maximum spam concentration. This can be displayed graphically using Google Maps and Java maps in our application.