CSMSS
Chh. Shahu College of Engineering, Aurangabad
Seminar On
“Email & SMS Spam Detection”
Guided By:
Dr. S.V. Khidse
Presented By:
Kunal Kalamkar (3271)
Department of Computer Science and Engineering
2021-22
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD
Contents
• Introduction
• Technologies
• Libraries
• Machine Learning
• Data set
• Problem definition
• Description of dataset
• Methodology
• Algorithms
• Conclusion
• References
Introduction
In today’s globalized world, email is a primary means of communication, used for
personal, business, corporate, and government correspondence. With the rapid
increase in email usage, there has also been an increase in spam email. Spam, also
known as junk email, consists of nearly identical messages sent to numerous recipients by
email. Apart from being annoying, spam emails can also pose a security threat to computer
systems. It is estimated that spam cost businesses on the order of $100 billion in 2007. In this
project, we use text mining to perform automatic spam filtering so that email can be used
effectively. We try to identify patterns using data-mining classification algorithms that enable
us to classify emails as HAM or SPAM.
Technologies
Technologies used:
1. Python
2. HTML
3. CSS
4. JavaScript
Python: Python is an interpreted, object-oriented, high-level programming language
with dynamic semantics, created by Guido van Rossum.
Libraries:
1. NumPy
2. Pandas
3. scikit-learn (sklearn)
4. NLTK
Libraries
• NumPy: NumPy is a library for the Python programming language that adds support for
large, multi-dimensional arrays and matrices, along with a large collection of high-level
mathematical functions to operate on these arrays. NumPy also forms the foundation
of the machine learning stack.
• Pandas: Pandas is a tool used in machine learning for data cleaning
and analysis. It has features for exploring, cleaning, transforming, and
visualizing data.
• NLTK: NLTK is intended to support research and teaching in NLP or closely related areas,
including empirical linguistics, cognitive science, artificial intelligence, information retrieval,
and machine learning.
Continued…
• Matplotlib: Matplotlib is a low-level Python library used for data visualization.
It is easy to use and emulates MATLAB-like graphs and visualizations. The library is built on
top of NumPy arrays and provides several plot types, such as line charts, bar charts, and histograms.
Machine Learning
Arthur Samuel, an early American leader in the field of computer gaming and artificial
intelligence, coined the term “machine learning” in 1959 while at IBM. He defined machine
learning as “the field of study that gives computers the ability to learn without being explicitly
programmed”.
• Machine learning is programming computers to optimize a performance criterion using
example data or past experience.
• The field of study known as machine learning is concerned with the question of how to
construct computer programs that automatically improve with experience.
Data set
A machine learning dataset is a collection of data that is used to train the model. A dataset
acts as a set of examples that teaches the machine learning algorithm how to make predictions.
A dataset can be defined as “a collection of data that is treated as a single unit by a computer”.
This means that a dataset contains many separate pieces of data that can together be used to
train an algorithm, with the goal of finding predictable patterns inside the whole dataset.
How to train the data?
-> AI training data will vary depending on whether you’re using supervised or unsupervised
learning. Unsupervised learning uses unlabeled data. Models are tasked with finding
patterns (or similarities and deviations) in the data to make inferences and reach conclusions.
With supervised learning, on the other hand, humans must tag, label, or annotate the data
according to their criteria, in order to train the model to reach the desired conclusion (output).
Labeled data is data for which the desired outputs are predetermined.
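The difference between labeled and unlabeled data can be sketched in a few lines of Python. This is only an illustration: the four messages are made up for the example and are not taken from the project's dataset, and `holdout_split` is a hypothetical helper name.

```python
# Made-up messages for illustration; not taken from the dataset in the slides.
labeled_data = [
    ("Win a free prize now, reply YES", "spam"),
    ("Are we still meeting for lunch today?", "ham"),
    ("Urgent! Claim your cash reward", "spam"),
    ("Please send me the seminar notes", "ham"),
]

# Supervised learning: the model sees both the text and its human-assigned label.
texts = [text for text, _ in labeled_data]
labels = [label for _, label in labeled_data]

# Unsupervised learning: only the raw text is available, with no labels.
unlabeled_data = list(texts)


def holdout_split(items, test_fraction=0.25):
    """Split a list into a training part and a test part (hypothetical helper)."""
    cut = len(items) - max(1, int(len(items) * test_fraction))
    return items[:cut], items[cut:]


train, test = holdout_split(labeled_data)
print(len(train), "training examples,", len(test), "test example(s)")
```

Holding back part of the labeled data, as the helper does, is one common way to check whether a trained model generalizes to messages it has not seen.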
Problem Definition
• Short Message Service (SMS) and email have grown into a multi-billion-dollar commercial
industry.
• SMS spam is still not as common as email spam.
• SMS spam is growing, and in 2012, in parts of Asia, up to 30% of text messages
were spam.
Description of Dataset
Spam percentage in the dataset ≈ 12.6%
Ham percentage in the dataset ≈ 87.4%
The dataset consists of 5,574 text messages from the UCI Machine Learning Repository.
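Class percentages of this kind can be computed directly from the label column. The sketch below assumes the labels are available as a plain list of "ham"/"spam" strings; `class_percentages` is a hypothetical helper, and the toy list stands in for the real labels.

```python
from collections import Counter


def class_percentages(labels):
    """Return the share of each class label as a percentage of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Toy labels for illustration only; the real dataset has 5,574 messages.
toy_labels = ["ham"] * 7 + ["spam"] * 1

for label, pct in class_percentages(toy_labels).items():
    print(f"{label} percentage in the dataset = {pct:.2f} %")
```

A strongly imbalanced split like this matters when reading accuracy figures, because a classifier that always answers "ham" would already score close to the ham percentage.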
Methodology
Algorithms
• Different algorithms used for email spam detection:
1. Deep learning
2. Naive Bayes
3. Support Vector Machines
4. K-Nearest Neighbours
5. Rough Sets
6. Random Forests
7. Multinomial Naive Bayes
Classification of Algorithms (Naïve Bayes)
The Naïve Bayes (NB) algorithm is applied to the final extracted features. Its speed and
simplicity, along with its high accuracy, make it a desirable classifier for spam detection
problems. Applying Naïve Bayes with the multinomial event model to the dataset, using
10-fold cross-validation, gives the results in Table 1.
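The idea of multinomial Naïve Bayes with k-fold cross-validation can be sketched from scratch in plain Python. This is a minimal illustration and not the project's code: the slides list scikit-learn, whereas this sketch uses only the standard library, assumes whitespace tokenization and add-one (Laplace) smoothing, and runs on ten made-up messages with 5 folds instead of 10. The names `MultinomialNB`, `tokenize`, and `k_fold_accuracy` are chosen for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


class MultinomialNB:
    """From-scratch multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def k_fold_accuracy(texts, labels, k=10):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    accuracies = []
    for i in range(k):
        test_idx = set(range(i, len(texts), k))
        train_x = [t for j, t in enumerate(texts) if j not in test_idx]
        train_y = [y for j, y in enumerate(labels) if j not in test_idx]
        model = MultinomialNB().fit(train_x, train_y)
        hits = sum(model.predict(texts[j]) == labels[j] for j in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


# Made-up messages for illustration only.
messages = [
    "win free prize now", "free cash prize claim now", "claim your free reward",
    "win cash now", "urgent free prize",
    "see you at lunch", "are you coming to class", "send me the notes please",
    "lunch at noon today", "call me after class",
]
targets = ["spam"] * 5 + ["ham"] * 5

model = MultinomialNB().fit(messages, targets)
print(model.predict("free prize now"))
print("5-fold accuracy:", k_fold_accuracy(messages, targets, k=5))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that is absent from one class from driving that class's probability to zero.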
Word Cloud for Spam/Ham words
[Figure: word clouds of the top spam words and the top ham words]
Which email/SMS is generally longer?
Here, we calculated the average word count for ham emails and spam emails
separately and then compared which emails are generally longer.
Average word count for ham emails: 4516.00 words
Average word count for spam emails: 653.00 words
So, it can be concluded that ham emails are generally longer than spam emails.
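An average word count per class can be computed as follows. This is a sketch under assumptions: words are taken to be whitespace-separated tokens, `average_word_count` is a hypothetical helper, and the toy messages are not the data behind the figures reported above.

```python
def average_word_count(texts):
    """Mean number of whitespace-separated words per message."""
    if not texts:
        return 0.0
    return sum(len(t.split()) for t in texts) / len(texts)


# Toy messages for illustration only.
ham_messages = ["see you at lunch", "are you coming to class today"]
spam_messages = ["win cash now", "free prize"]

ham_avg = average_word_count(ham_messages)
spam_avg = average_word_count(spam_messages)
print(f"Average word count for ham: {ham_avg:.2f}")
print(f"Average word count for spam: {spam_avg:.2f}")
```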
[Figure: average word count, spam vs. ham]
Home page of the application
Detecting spam messages
Detecting ham messages/email
Conclusion
Spam is a major problem in today's world. Spam messages are among the most unwanted
messages that end users receive in daily life. Spam emails are often nothing more than
advertisements for a company or carriers of viruses and similar threats, and they arrive in
large volumes. Spam emails also make it easy for hackers to gain access to our systems.
References
• https://www.isroset.org/journal/IJSRCSE/full_paper_view.php?paper_id=444
• https://www.freecodecamp.org/news/send-emailsusing-code-4fcea9df63f/
• https://www.scribd.com/doc/61315817/IntraMailing-System
• https://www.scribd.com/doc/19518895/Researchon-Mail-System-Project-report-for-Bachelor-inComputer-Rajendra-Man-Banepali
• https://nevonprojects.com/email-client-project/
• https://www.freeprojectz.com/python-djangoproject/mailing-system
• https://www.academia.edu/36614854/Spam_Filtering_Using_ML_Algorithms
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD 20
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD 21

Contenu connexe

Tendances

Leaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingLeaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingVimal Dewangan
 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationAnkit Gupta
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data sciencenilashri2
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural networkKIRAN R
 
final-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptxfinal-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptxinfotowards
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment AnalysisRebecca Williams
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using mlPravin Katiyar
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image IdentificationVenkat Projects
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar pptRAHUL DANGWAL
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learningdataalcott
 
Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"Harsh Verma
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Yuriy Guts
 
Simple Mail Transfer Protocol
Simple Mail Transfer ProtocolSimple Mail Transfer Protocol
Simple Mail Transfer ProtocolUjjayanta Bhaumik
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine LearningHayim Makabee
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsKush Kulshrestha
 

Tendances (20)

Leaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shapingLeaky Bucket & Tocken Bucket - Traffic shaping
Leaky Bucket & Tocken Bucket - Traffic shaping
 
Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
 
Intro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning PresentationIntro/Overview on Machine Learning Presentation
Intro/Overview on Machine Learning Presentation
 
Python libraries for data science
Python libraries for data sciencePython libraries for data science
Python libraries for data science
 
Image classification using convolutional neural network
Image classification using convolutional neural networkImage classification using convolutional neural network
Image classification using convolutional neural network
 
final-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptxfinal-spam-e-mail-detection-180125111231.pptx
final-spam-e-mail-detection-180125111231.pptx
 
Presentation on Sentiment Analysis
Presentation on Sentiment AnalysisPresentation on Sentiment Analysis
Presentation on Sentiment Analysis
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image Identification
 
Machine learning seminar ppt
Machine learning seminar pptMachine learning seminar ppt
Machine learning seminar ppt
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learning
 
Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"Minor project Report for "Quiz Application"
Minor project Report for "Quiz Application"
 
Text summarization
Text summarizationText summarization
Text summarization
 
Machine learning
Machine learningMachine learning
Machine learning
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Simple Mail Transfer Protocol
Simple Mail Transfer ProtocolSimple Mail Transfer Protocol
Simple Mail Transfer Protocol
 
Applications of Machine Learning
Applications of Machine LearningApplications of Machine Learning
Applications of Machine Learning
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 

Similaire à Spam email detection using machine learning PPT.pptx

Classification with R
Classification with RClassification with R
Classification with RNajima Begum
 
Titles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdfTitles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdfinfo751436
 
YASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptxYASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptxYashShiva3
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...IJNSA Journal
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfHassanElalfy4
 
Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek outiaemedu
 
Email Spam Detection Using Machine Learning
Email Spam Detection Using Machine LearningEmail Spam Detection Using Machine Learning
Email Spam Detection Using Machine LearningIRJET Journal
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...ijsrd.com
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailijsrd.com
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM cscpconf
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM csandit
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine LearningIRJET Journal
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMIRJET Journal
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmjournalBEEI
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmjournalBEEI
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Editor IJCATR
 

Similaire à Spam email detection using machine learning PPT.pptx (20)

spam_msg_detection.pdf
spam_msg_detection.pdfspam_msg_detection.pdf
spam_msg_detection.pdf
 
Classification with R
Classification with RClassification with R
Classification with R
 
Titles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdfTitles with Abstracts_2023-2024_Data Mining.pdf
Titles with Abstracts_2023-2024_Data Mining.pdf
 
YASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptxYASH DATA SCIENCE SEMINAR.pptx
YASH DATA SCIENCE SEMINAR.pptx
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
 
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
A NOVEL EVALUATION APPROACH TO FINDING LIGHTWEIGHT MACHINE LEARNING ALGORITHM...
 
Lect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdfLect 7 intro to M.L..pdf
Lect 7 intro to M.L..pdf
 
Anomalous symmetry succession for seek out
Anomalous symmetry succession for seek outAnomalous symmetry succession for seek out
Anomalous symmetry succession for seek out
 
Email Spam Detection Using Machine Learning
Email Spam Detection Using Machine LearningEmail Spam Detection Using Machine Learning
Email Spam Detection Using Machine Learning
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
 
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM
 
Eckovation Machine Learning
Eckovation Machine LearningEckovation Machine Learning
Eckovation Machine Learning
 
IRJET - Fake News Detection using Machine Learning
IRJET -  	  Fake News Detection using Machine LearningIRJET -  	  Fake News Detection using Machine Learning
IRJET - Fake News Detection using Machine Learning
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithm
 
Obfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithmObfuscated computer virus detection using machine learning algorithm
Obfuscated computer virus detection using machine learning algorithm
 
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
Spam Detection in Social Networks Using Correlation Based Feature Subset Sele...
 

Dernier

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 

Dernier (20)

Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx

Spam email detection using machine learning PPT.pptx

  • 1. CSMSS Chh. Shahu College of Engineering, Aurangabad. Seminar on “Email & SMS Spam Detection”. Guided by: Dr. S.V. Khidse. Presented by: Kunal Kalamkar (3271). Department of Computer Science and Engineering, 2021-22. DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD
  • 2. Contents: Introduction; Technologies; Libraries; Machine Learning; Data set; Problem definition; Description of dataset; Methodology; Algorithms; Conclusion; References
  • 3. Introduction: In today’s globalized world, email is a primary means of communication, spanning personal, business, corporate, and government use. With the rapid increase in email usage, there has also been an increase in spam emails. Spam, also known as junk email, consists of nearly identical messages sent to numerous recipients by email. Apart from being annoying, spam emails can also pose a security threat to computer systems. It is estimated that spam cost businesses on the order of $100 billion in 2007. In this project, we use text mining to perform automatic spam filtering so that email can be used effectively. We try to identify patterns using data-mining classification algorithms that enable us to classify emails as ham or spam.
  • 4. Technologies: Technologies used: 1. Python 2. HTML 3. CSS 4. JavaScript. Python: Python is an interpreted, object-oriented, high-level programming language with dynamic semantics, developed by Guido van Rossum. Libraries: 1. NumPy 2. Pandas 3. Sklearn 4. NLTK
  • 5. Libraries NumPy:- NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Moreover, NumPy forms the foundation of the Machine Learning stack.  Pandas:- Pandas is one of the tools in Machine Learning which is used for data cleaning and analysis. It has features which are used for exploring, cleaning, transforming and visualizing from data  NLTK:- NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING, CSMSS, CSCOE, AURANGABAD 5
  • 6. Libraries (continued): Matplotlib: Matplotlib is a low-level Python library used for data visualization. It is easy to use and emulates MATLAB-like graphs and visualizations. The library is built on top of NumPy arrays and offers several plot types such as line charts, bar charts, and histograms.
  • 7. Machine Learning: Arthur Samuel, an early American leader in the field of computer gaming and artificial intelligence, coined the term “machine learning” in 1959 while at IBM. He defined machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed”. Machine learning is programming computers to optimize a performance criterion using example data or past experience. The field of study known as machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.
  • 8. Data set: A machine learning dataset is a collection of data that is used to train the model. A dataset acts as an example that teaches the machine learning algorithm how to make predictions. A dataset can be defined as “a collection of data that is treated as a single unit by a computer”. This means that a dataset contains many separate pieces of data but can be used as a whole to train an algorithm, with the goal of finding predictable patterns inside it. How is the data used for training? AI training data varies depending on whether you are using supervised or unsupervised learning. Unsupervised learning uses unlabeled data: models are tasked with finding patterns (or similarities and deviations) in the data to make inferences and reach conclusions. With supervised learning, on the other hand, humans must tag, label, or annotate the data according to their criteria in order to train the model to reach the desired conclusion (output). In labeled data, the desired outputs are predetermined.
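The difference between labeled and unlabeled data can be sketched in a few lines of Python. The messages, the `train_test_split` helper, and the 80/20 hold-out split below are illustrative assumptions, not something the slides describe; the split is simply a common companion to supervised training.

```python
import random

# Hypothetical labeled examples: each message is paired with its desired output.
labeled = [
    ("win a free prize now", "spam"),
    ("are we meeting for lunch", "ham"),
    ("claim your cash reward today", "spam"),
    ("thanks, see you tomorrow", "ham"),
    ("call me when you get home", "ham"),
]

# The same messages without labels, as an unsupervised method would see them.
unlabeled = [text for text, _ in labeled]

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and hold out a fraction for evaluation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(labeled)
print(len(train), "training examples,", len(test), "held-out example(s)")
```

A supervised classifier would be fitted on `train` and scored on `test`; an unsupervised method would only ever receive `unlabeled`.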
  • 9. Problem Definition: Short Message Service (SMS) and email have grown into a multi-billion-dollar commercial industry. SMS spam is still not as common as email spam. SMS spam is growing, however, and in 2012 up to 30% of text messages in parts of Asia were spam.
  • 10. Description of Dataset: The dataset consists of 5574 text messages from the UCI Machine Learning Repository. Spam email percentage in the dataset = 12.63268156424581 %. Ham email percentage in the dataset = 87.37731843575419 %.
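Class shares like the ones reported above can be computed directly from the label column. This is a minimal sketch on made-up labels; the `class_percentages` helper and the toy list are assumptions for illustration, and the sketch does not reproduce the figures from the slides.

```python
from collections import Counter

def class_percentages(labels):
    """Return the percentage share of each distinct label in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical toy labels; the real dataset has 5574 messages.
toy_labels = ["ham", "ham", "spam", "ham", "ham", "ham", "spam", "ham"]
shares = class_percentages(toy_labels)

for label, pct in sorted(shares.items()):
    print(f"{label} percentage in the dataset = {pct:.2f} %")
```

Because the ham class dominates, accuracy alone can look good even for a classifier that rarely predicts spam, which is worth keeping in mind when reading the results.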
  • 11. Methodology
  • 12. Algorithms: Different algorithms used for email spam detection: 1. Deep learning 2. Naive Bayes 3. Support Vector Machines 4. K-Nearest Neighbour 5. Rough Sets 6. Random Forests 7. Multinomial Naive Bayes
  • 13. Classification of Algorithms (Naïve Bayes): The NB algorithm is applied to the final extracted features. Its speed and simplicity, together with its high accuracy, make it a desirable classifier for spam detection problems. Applying naïve Bayes with a multinomial event model to the dataset and using 10-fold cross-validation gives the results in Table 1.
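The combination of a multinomial naïve Bayes model with 10-fold cross-validation can be sketched without any dependencies. Everything below is an illustrative assumption rather than the presenter's code: the class name `TinyMultinomialNB`, the whitespace tokenizer, Laplace smoothing with `alpha=1.0`, the interleaved fold assignment, and the toy messages. In practice a library implementation such as scikit-learn's `MultinomialNB` would normally be used instead.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

class TinyMultinomialNB:
    """Hypothetical minimal multinomial naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.class_counts:
            # log prior plus the sum of smoothed log likelihoods of known tokens
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label

def k_fold_accuracy(texts, labels, k=10):
    """Mean accuracy over k folds; fold f holds out samples f, f+k, f+2k, ..."""
    n = len(texts)
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [texts[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        model = TinyMultinomialNB().fit(train_x, train_y)
        correct = sum(model.predict_one(texts[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Hypothetical toy corpus: 10 spam and 10 ham messages, interleaved.
spam_msgs = ["win free prize now", "claim free cash now"] * 5
ham_msgs = ["see you at lunch", "thanks see you tomorrow"] * 5
texts = [m for pair in zip(spam_msgs, ham_msgs) for m in pair]
labels = ["spam", "ham"] * 10

print(f"10-fold mean accuracy: {k_fold_accuracy(texts, labels, k=10):.2f}")
```

The toy corpus is trivially separable, so the score it prints says nothing about performance on real messages; the point is only to show how each fold is held out in turn while the model is refitted on the remaining data.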
  • 14. Word Cloud for Spam/Ham words (figure panels: Top Spam Words, Top Ham Words)
  • 15. Which email/SMS is generally longer? Here, we calculated the average word count for ham emails and spam emails separately and then determined which emails are generally longer. Average word count for ham emails: 4516.00 words. Average word count for spam emails: 653.000 words. So it can be concluded that ham emails are generally longer than spam emails.
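The per-class comparison above amounts to averaging word counts within each class. A minimal sketch, assuming whitespace-separated words and using made-up messages (the `average_word_count` helper and the sample texts are hypothetical, and the result is unrelated to the figures on the slide):

```python
def average_word_count(messages):
    """Mean number of whitespace-separated words per message."""
    if not messages:
        return 0.0
    return sum(len(m.split()) for m in messages) / len(messages)

# Hypothetical toy messages, grouped by class.
ham = ["are we still meeting for lunch tomorrow at noon", "ok see you then"]
spam = ["win cash now", "free prize call now"]

ham_avg = average_word_count(ham)
spam_avg = average_word_count(spam)
longer = "ham" if ham_avg > spam_avg else "spam"

print(f"Average word count for ham: {ham_avg:.2f} words")
print(f"Average word count for spam: {spam_avg:.2f} words")
print(f"Generally longer: {longer}")
```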
  • 16. Home of application
  • 17. Detecting spam messages
  • 18. Detecting ham messages/email
  • 19. Conclusion: Spam is a major problem in today's world. Spam messages are among the most unwanted messages that end users receive in daily life. Spam emails are typically nothing more than advertisements for some company or carriers of viruses and similar threats, and they arrive in excessive volume. Hackers can easily use these spam emails to gain access to our systems.