CANTINA
A Content-Based Approach to Detecting Phishing Web Sites
ABSTRACT
• CANTINA is a content-based approach.
• Examines whether the content is legitimate or not.
• Detects phishing URLs and links.
INTRODUCTION
• Phishing
A kind of attack in which victims are tricked by spoofed emails and fraudulent web sites into giving up personal information.
• How many phishing sites are there?
9,255 unique phishing sites were reported in June 2006 alone.
• How much does phishing cost each year?
Estimates range from $1 billion to $2.8 billion per year.
EXISTING SYSTEM
• NetCraft (surface characteristics)
• SpoofGuard (surface characteristics and blacklist)
• Cloudmark (blacklist)
PROPOSED SYSTEM
• Detects phishing websites
• Examines text-based content along with surface characteristics.
• Surface characteristics include:
- Age of domain.
- Known images.
- Suspicious URL.
- Suspicious links.
• Detects phishing links in users' email.
TF-IDF ALGORITHM
• Term Frequency (TF)
– The number of times a given term appears in a specific document
– A measure of the importance of the term within that document
• Inverse Document Frequency (IDF)
– A measure of how common a term is across an entire collection of documents (the rarer the term, the higher its IDF)
• A high TF-IDF weight means a high TF in the document and a low document frequency across the collection
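The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in CANTINA: it assumes raw-count TF and IDF = log(N / df), which is only one of several common variants, and the function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(document, corpus):
    """Return {term: TF-IDF weight} for one tokenized document.

    Assumption: TF is the raw count and IDF is log(N / df).
    """
    tf = Counter(document)
    n_docs = len(corpus)
    weights = {}
    for term, count in tf.items():
        # Document frequency: number of corpus documents containing the term
        df = sum(1 for doc in corpus if term in doc)
        # Assumption: a term absent from the corpus is treated as df = 1
        idf = math.log(n_docs / max(df, 1))
        weights[term] = count * idf
    return weights

# Toy corpus of three tokenized pages (illustrative data only)
corpus = [
    ["ebay", "sign", "in", "password"],
    ["news", "sign", "in"],
    ["weather", "news", "today"],
]
page = ["ebay", "ebay", "sign", "in", "password"]
weights = tf_idf(page, corpus)
```

In this toy data "ebay" appears twice in the page and in only one corpus document, so it gets the highest weight, while "sign" appears in two of three documents and is weighted lowest.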
REAL EBAY WEBPAGE
FAKE EBAY WEBPAGE
MODULES
• Parsing the web pages
• Generating the lexical signature
• Testing Process
• Report Generation
Parsing the web pages
• Links, anchor tags, form tags and attachments in the web pages are turned into the corresponding Text Link, HTML Link, etc.
• Done by parsing the text of each page
• Uses HTML Parser API
• It is used for extracting information from HTML code
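As a rough illustration of this parsing step, the sketch below uses Python's standard-library `html.parser` module as a stand-in; it is not the HTML Parser API named on the slide. The class name `PageParser` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

class PageParser(HTMLParser):
    """Collects link targets, form actions, and visible text from a page."""

    def __init__(self):
        super().__init__()
        self.links = []   # href values of <a> tags
        self.forms = []   # action values of <form> tags
        self.text = []    # visible text fragments
        self._skip = 0    # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action") or "")
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks
        if not self._skip and data.strip():
            self.text.append(data.strip())

html = (
    '<html><body><a href="http://example.com/login">Sign in</a>'
    '<form action="/submit"><input name="pw"></form>'
    '<script>var x=1;</script><p>Welcome back</p></body></html>'
)
parser = PageParser()
parser.feed(html)
parser.close()
```

After parsing, `parser.links`, `parser.forms` and `parser.text` hold the extracted pieces; the text fragments are what a later step would tokenize for TF-IDF.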
Generating the lexical signature
• The TF-IDF algorithm is used to generate lexical signatures.
• Calculate the TF-IDF value for each word in a document.
• Select the words with the highest values.
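The selection step above amounts to ranking terms by weight and keeping the top few. The sketch below assumes the TF-IDF weights are already available as a dictionary; the function name `lexical_signature`, the signature length `n=5`, and the example weights are illustrative assumptions, not values taken from the slides.

```python
def lexical_signature(weights, n=5):
    """Return the n terms with the highest TF-IDF weight.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]

# Made-up weights for a login page (illustrative only)
weights = {
    "ebay": 2.20,
    "account": 1.30,
    "password": 1.10,
    "user": 0.95,
    "sign": 0.41,
    "in": 0.41,
    "the": 0.0,
}
signature = lexical_signature(weights)
```

The resulting list of terms is what gets joined into a query string for the search engine in the testing step.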
Testing Process
• Feed this lexical signature to a search engine.
• Check whether the domain name of the current web page matches the domain name of one of the top N search results.
Report Generation
• If a page is legitimate, it returns “legitimate”.
• If a page is phishing, it returns “phishing”.
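The testing and reporting steps can be sketched together as follows. No real search engine is called here: the result URLs are passed in as a plain list, and the helper names (`domain_of`, `matches_top_results`, `classify`), the default `n=30`, and the sample URLs are all hypothetical. Comparing whole hostnames is a simplification; a fuller version would likely compare registered domains.

```python
from urllib.parse import urlparse

def domain_of(url):
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def matches_top_results(page_url, result_urls, n=30):
    """True if the page's domain equals the domain of any of the top n results."""
    page_domain = domain_of(page_url)
    return any(domain_of(u) == page_domain for u in result_urls[:n])

def classify(page_url, result_urls, n=30):
    """Report 'legitimate' on a domain match, otherwise 'phishing'."""
    return "legitimate" if matches_top_results(page_url, result_urls, n) else "phishing"

# Search results assumed to have been returned for the lexical signature
results = ["https://www.ebay.com/", "https://en.wikipedia.org/wiki/EBay"]

real_verdict = classify("http://www.ebay.com/signin", results)
fake_verdict = classify("http://signin.ebay.com.evil.example/login", results)
```

Here the real page shares its domain with the first search result, while the look-alike URL resolves to a different host and is reported as phishing.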
APPLICATIONS
• Used to detect fraudulent websites and emails.
• Protects users from giving up personal information such as credit card numbers, bank details and account passwords.
• Used to detect suspicious links in email.
CONCLUSION
• Content-based approach for detecting phishing websites.
• User-friendly interface for the users.
• Anti-phishing website that protects users from giving up their personal information.
