SlideShare une entreprise Scribd logo
1  sur  12
Télécharger pour lire hors ligne
SHREYA GOPAL SUNDARI
PHISHING WEBSITE DETECTION
by MACHINE LEARNING
TECHNIQUES
INTRODUCTION
• Phishing is the most commonly used social engineering and cyber attack.
• Through such attacks, the phisher targets naïve online users by tricking them into
revealing confidential information, with the purpose of using it fraudulently.
• In order to avoid getting phished,
• users should have awareness of phishing websites.
• have a blacklist of phishing websites which requires the knowledge of website being detected
as phishing.
• detect them in their early appearance, using machine learning and deep neural network
algorithms.
• Of the above three, the machine learning based method is proven to be most
effective than the other methods.
• Even then, online users are still being trapped into revealing sensitive information in
phishing websites.
OBJECTIVES
A phishing website is a common social engineering method that mimics
trustful uniform resource locators (URLs) and webpages. The objective of this
project is to train machine learning models and deep neural nets on the dataset
created to predict phishing websites. Both phishing and benign URLs of websites
are gathered to form a dataset and from them required URL and website
content-based features are extracted. The performance level of each model is
measures and compared.
APPROACH
Below mentioned are the steps involved in the completion of this project:
• Collect dataset containing phishing and legitimate websites from the open source platforms.
• Write a code to extract the required features from the URL database.
• Analyze and preprocess the dataset by using EDA techniques.
• Divide the dataset into training and testing sets.
• Run selected machine learning and deep neural network algorithms like SVM, Random Forest,
Autoencoder on the dataset.
• Write a code for displaying the evaluation result considering accuracy metrics.
• Compare the obtained results for trained models and specify which is better.
DATA COLLECTION
• Legitimate URLs are collected from the dataset provided by University of
New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html.
• From the collection, 5000 URLs are randomly picked.
• Phishing URLs are collected from opensource service called PhishTank . This
service provide a set of phishing URLs in multiple formats like csv, json etc.
that gets updated hourly.
• Form the obtained collection, 5000 URLs are randomly picked.
FEATURE SELECTION
• The following category of features are selected:
• Address Bar based Features
• Domain based Features
• HTML & Javascript based Feature
• Address Bar based Features considered are:
• Domian of URL • Redirection ‘//’ in URL
• IP Address in URL • ‘http/https’ in Domain name
• ‘@’ Symbol in URL • Using URL Shortening Service
• Length of URL • Prefix or Suffix "-" in Domain
• Depth of URL
FEATURE SELECTION (CONT.)
• Domain based Features considered are:
• HTML and JavaScript based Features considered are:
• All together 17 features are extracted from the dataset.
• DNS Record • Age of Domain
• Website Traffic • End Period of Domain
• Iframe Redirection • Disabling Right Click
• Status Bar Customization • Website Forwarding
FEATURES DISTRIBUTION
MACHINE LEARNING MODELS
• This is a supervised machine learning task. There are two major types of supervised
machine learning problems, called classification and regression.
• This data set comes under classification problem, as the input URL is classified as
phishing (1) or legitimate (0). The machine learning models (classification) considered
to train the dataset in this notebook are:
• Decision Tree
• Random Forest
• Multilayer Perceptrons
• XGBoost
• Autoencoder Neural Network
• Support Vector Machines
MODEL EVALUATION
• The models are evaluated, and the considered metric is accuracy.
• Below Figure shows the training and test dataset accuracy by the respective models:
• For the above it is clear that the XGBoost model gives better performance. The model is
saved for further usage.
NEXT STEPS
• Working on this project is very knowledgeable and worth the effort.
• Through this project, one can know a lot about the phishing websites and how
they are differentiated from legitimate ones.
• This project can be taken further by creating a browser extensions of
developing a GUI.
• These should classify the inputted URL to legitimate or phishing with the use of
the saved model.
Thank You…..

Contenu connexe

Tendances

Machine learning in Cyber Security
Machine learning in Cyber SecurityMachine learning in Cyber Security
Machine learning in Cyber SecurityRajathV2
 
Application of Machine Learning in Cybersecurity
Application of Machine Learning in CybersecurityApplication of Machine Learning in Cybersecurity
Application of Machine Learning in CybersecurityPratap Dangeti
 
Computer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and PythonComputer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and PythonAkash Satamkar
 
Fake news detection project
Fake news detection projectFake news detection project
Fake news detection projectHarshdaGhai
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detectionPartnered Health
 
12 types of DDoS attacks
12 types of DDoS attacks12 types of DDoS attacks
12 types of DDoS attacksHaltdos
 
Suspicious Activity Detection python Project Abstract
Suspicious Activity Detection python Project AbstractSuspicious Activity Detection python Project Abstract
Suspicious Activity Detection python Project AbstractVenkat Projects
 
Machine Learning in Malware Detection
Machine Learning in Malware DetectionMachine Learning in Malware Detection
Machine Learning in Malware DetectionKaspersky
 
Phishing detection & protection scheme
Phishing detection & protection schemePhishing detection & protection scheme
Phishing detection & protection schemeMussavir Shaikh
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python Viren Rajput
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)PRISMA CSI
 
Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms
Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms
Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms Sam Bowne
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learningdataalcott
 
Web scraping
Web scrapingWeb scraping
Web scrapingSelecto
 
Footprinting and reconnaissance
Footprinting and reconnaissanceFootprinting and reconnaissance
Footprinting and reconnaissanceNishaYadav177
 

Tendances (20)

Machine learning in Cyber Security
Machine learning in Cyber SecurityMachine learning in Cyber Security
Machine learning in Cyber Security
 
Application of Machine Learning in Cybersecurity
Application of Machine Learning in CybersecurityApplication of Machine Learning in Cybersecurity
Application of Machine Learning in Cybersecurity
 
Hash function
Hash function Hash function
Hash function
 
Computer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and PythonComputer Vision - Real Time Face Recognition using Open CV and Python
Computer Vision - Real Time Face Recognition using Open CV and Python
 
Fake news detection project
Fake news detection projectFake news detection project
Fake news detection project
 
Malware Detection using Machine Learning
Malware Detection using Machine Learning	Malware Detection using Machine Learning
Malware Detection using Machine Learning
 
Final spam-e-mail-detection
Final  spam-e-mail-detectionFinal  spam-e-mail-detection
Final spam-e-mail-detection
 
12 types of DDoS attacks
12 types of DDoS attacks12 types of DDoS attacks
12 types of DDoS attacks
 
Suspicious Activity Detection python Project Abstract
Suspicious Activity Detection python Project AbstractSuspicious Activity Detection python Project Abstract
Suspicious Activity Detection python Project Abstract
 
Machine Learning in Malware Detection
Machine Learning in Malware DetectionMachine Learning in Malware Detection
Machine Learning in Malware Detection
 
Phishing detection & protection scheme
Phishing detection & protection schemePhishing detection & protection scheme
Phishing detection & protection scheme
 
Web scraping in python
Web scraping in python Web scraping in python
Web scraping in python
 
IP Security
IP SecurityIP Security
IP Security
 
Web mining
Web mining Web mining
Web mining
 
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)Practical White Hat Hacker Training -  Passive Information Gathering(OSINT)
Practical White Hat Hacker Training - Passive Information Gathering(OSINT)
 
Social engineering
Social engineering Social engineering
Social engineering
 
Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms
Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms
Ch 1: Web Application (In)security & Ch 2: Core Defense Mechanisms
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learning
 
Web scraping
Web scrapingWeb scraping
Web scraping
 
Footprinting and reconnaissance
Footprinting and reconnaissanceFootprinting and reconnaissance
Footprinting and reconnaissance
 

Similaire à Phishing Website Detection by Machine Learning Techniques Presentation.pdf

PHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxPHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxArulvincent3
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2Arjun BM
 
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxCYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxTAMILMANIP6
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningIRJET Journal
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDaveEdwards12
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningYogesh Sharma
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis PrimerCoverity
 
Avtar's ppt
Avtar's pptAvtar's ppt
Avtar's pptmak57
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for CybersecurityVMware Tanzu
 
Online talent sourcing - a future essentia
Online talent sourcing - a future essentiaOnline talent sourcing - a future essentia
Online talent sourcing - a future essentiaHSE Guru
 
How to build corporate size fraud prevention
How to build corporate size fraud preventionHow to build corporate size fraud prevention
How to build corporate size fraud preventionYury Leonychev
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONI Ruby
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsMumbai Academisc
 
crawl technology saves money and time
crawl technology saves money and timecrawl technology saves money and time
crawl technology saves money and timeHashScraper Inc.
 

Similaire à Phishing Website Detection by Machine Learning Techniques Presentation.pdf (20)

PHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptxPHISHING URL - Review 1.pptx
PHISHING URL - Review 1.pptx
 
dasdweda PPT.pptx
dasdweda PPT.pptxdasdweda PPT.pptx
dasdweda PPT.pptx
 
Rootconf_phishing_v2
Rootconf_phishing_v2Rootconf_phishing_v2
Rootconf_phishing_v2
 
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptxCYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
CYBER THREAT DETECTION PLATFORM USING MACHINE LEARNING.pptx
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection
 
Detecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine LearningDetecting Phishing Websites Using Machine Learning
Detecting Phishing Websites Using Machine Learning
 
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malwareDefcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
Defcon 22-wesley-mc grew-instrumenting-point-of-sale-malware
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Static Analysis Primer
Static Analysis PrimerStatic Analysis Primer
Static Analysis Primer
 
Avtar's ppt
Avtar's pptAvtar's ppt
Avtar's ppt
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
Final .pptx
Final .pptxFinal .pptx
Final .pptx
 
Online talent sourcing - a future essentia
Online talent sourcing - a future essentiaOnline talent sourcing - a future essentia
Online talent sourcing - a future essentia
 
How to build corporate size fraud prevention
How to build corporate size fraud preventionHow to build corporate size fraud prevention
How to build corporate size fraud prevention
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
 
What is web scraping?
What is web scraping?What is web scraping?
What is web scraping?
 
J2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai AcademicsJ2ee project lists:-Mumbai Academics
J2ee project lists:-Mumbai Academics
 
crawl technology saves money and time
crawl technology saves money and timecrawl technology saves money and time
crawl technology saves money and time
 
A density based clustering approach for web robot detection
A density based clustering approach for web robot detectionA density based clustering approach for web robot detection
A density based clustering approach for web robot detection
 

Dernier

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsManeerUddin
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxleah joy valeriano
 

Dernier (20)

LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 

Phishing Website Detection by Machine Learning Techniques Presentation.pdf

  • 1. SHREYA GOPAL SUNDARI PHISHING WEBSITE DETECTION by MACHINE LEARNING TECHNIQUES
  • 2. INTRODUCTION • Phishing is the most commonly used social engineering and cyber attack. • Through such attacks, the phisher targets naïve online users by tricking them into revealing confidential information, with the purpose of using it fraudulently. • In order to avoid getting phished, • users should have awareness of phishing websites. • have a blacklist of phishing websites which requires the knowledge of website being detected as phishing. • detect them in their early appearance, using machine learning and deep neural network algorithms. • Of the above three, the machine learning based method is proven to be most effective than the other methods. • Even then, online users are still being trapped into revealing sensitive information in phishing websites.
  • 3. OBJECTIVES A phishing website is a common social engineering method that mimics trustful uniform resource locators (URLs) and webpages. The objective of this project is to train machine learning models and deep neural nets on the dataset created to predict phishing websites. Both phishing and benign URLs of websites are gathered to form a dataset and from them required URL and website content-based features are extracted. The performance level of each model is measures and compared.
  • 4. APPROACH Below mentioned are the steps involved in the completion of this project: • Collect dataset containing phishing and legitimate websites from the open source platforms. • Write a code to extract the required features from the URL database. • Analyze and preprocess the dataset by using EDA techniques. • Divide the dataset into training and testing sets. • Run selected machine learning and deep neural network algorithms like SVM, Random Forest, Autoencoder on the dataset. • Write a code for displaying the evaluation result considering accuracy metrics. • Compare the obtained results for trained models and specify which is better.
  • 5. DATA COLLECTION • Legitimate URLs are collected from the dataset provided by University of New Brunswick, https://www.unb.ca/cic/datasets/url-2016.html. • From the collection, 5000 URLs are randomly picked. • Phishing URLs are collected from opensource service called PhishTank . This service provide a set of phishing URLs in multiple formats like csv, json etc. that gets updated hourly. • Form the obtained collection, 5000 URLs are randomly picked.
  • 6. FEATURE SELECTION • The following category of features are selected: • Address Bar based Features • Domain based Features • HTML & Javascript based Feature • Address Bar based Features considered are: • Domian of URL • Redirection ‘//’ in URL • IP Address in URL • ‘http/https’ in Domain name • ‘@’ Symbol in URL • Using URL Shortening Service • Length of URL • Prefix or Suffix "-" in Domain • Depth of URL
  • 7. FEATURE SELECTION (CONT.) • Domain based Features considered are: • HTML and JavaScript based Features considered are: • All together 17 features are extracted from the dataset. • DNS Record • Age of Domain • Website Traffic • End Period of Domain • Iframe Redirection • Disabling Right Click • Status Bar Customization • Website Forwarding
  • 9. MACHINE LEARNING MODELS • This is a supervised machine learning task. There are two major types of supervised machine learning problems, called classification and regression. • This data set comes under classification problem, as the input URL is classified as phishing (1) or legitimate (0). The machine learning models (classification) considered to train the dataset in this notebook are: • Decision Tree • Random Forest • Multilayer Perceptrons • XGBoost • Autoencoder Neural Network • Support Vector Machines
  • 10. MODEL EVALUATION • The models are evaluated, and the considered metric is accuracy. • Below Figure shows the training and test dataset accuracy by the respective models: • For the above it is clear that the XGBoost model gives better performance. The model is saved for further usage.
  • 11. NEXT STEPS • Working on this project is very knowledgeable and worth the effort. • Through this project, one can know a lot about the phishing websites and how they are differentiated from legitimate ones. • This project can be taken further by creating a browser extensions of developing a GUI. • These should classify the inputted URL to legitimate or phishing with the use of the saved model.