Improving SPAM detection with automaton
March 1, 2016
Whois
● Antonio Costa – Cooler
● Just another system analyst
● GitHub: CoolerVoid (https://github.com/CoolerVoid)
Contact: acosta@conviso.com.br, coolerlair@gmail.com
How it works
● Anti-spam: the common way
● Fetch e-mails over POP3 / IMAP ...
● Validate
● Clean and tokenize
● BoW (bag-of-words), SoW (set-of-words) ...
● tf–idf (term frequency–inverse document frequency) ...
● Supervised learning
● Classification (SVM, KNN, NB, random forest ...)
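The "clean and tokenize" step could look roughly like the sketch below. It assumes cleaning means stripping HTML tags and lowercasing, which the slides do not spell out, and `tokenize` is a hypothetical helper name.

```python
import re

def tokenize(text):
    """Strip HTML tags, lowercase, and split into word tokens."""
    # Assumption: "clean" means dropping markup and normalizing case.
    without_tags = re.sub(r"<[^>]+>", " ", text)
    return re.findall(r"[a-z0-9']+", without_tags.lower())

tokens = tokenize("<b>Buy</b> NOW, cheap pills!")
print(tokens)
```

The token list is what the later BoW and tf–idf steps would consume.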
How it works
● Anti-spam: the common way
● Fetch e-mails over POP3 / IMAP
● Validate
– Country-based filtering
– DNS-based blacklists
– Enforcing RFC standards
– SMTP callback verification
● DNS-based blacklists
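As a rough illustration of how a DNS-based blacklist is queried: the sender's IPv4 octets are reversed and prepended to the blacklist zone, and the resulting name is resolved. The zone `dnsbl.example.org` and both function names are placeholders, not taken from the slides.

```python
import ipaddress
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone):
    """Return True if the name resolves (listed), False if resolution fails."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

# Hypothetical zone; is_listed() is defined but not called because it needs network access.
name = dnsbl_query_name("192.0.2.99", "dnsbl.example.org")
print(name)
```

A listed address normally resolves to an A record in 127.0.0.0/8, while an unlisted one fails to resolve.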
Wake UP
How it works
● Anti-spam: the common way
● Fetch e-mails over POP3 / IMAP ... (input: string)
● Validate
● Clean and tokenize
● BoW (bag-of-words), SoW (set-of-words), tf–idf (term frequency–inverse document frequency) ... (output: matrix)
● Supervised learning (input: matrix)
● Classification (SVM, KNN, NB, random forest ...)
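To make the "create matrix" step concrete, here is a minimal tf–idf sketch. It assumes one common variant (tf = term count / document length, idf = ln(N / df)); the slides do not say which variant the project uses, and the sample documents are invented.

```python
import math

def tfidf_matrix(docs):
    """Return (vocabulary, matrix) with matrix[d][t] = tf(t, d) * idf(t).

    Each document is a non-empty list of tokens.
    """
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    matrix = []
    for doc in docs:
        row = []
        for w in vocab:
            tf = doc.count(w) / len(doc)
            idf = math.log(n_docs / df[w])
            row.append(tf * idf)
        matrix.append(row)
    return vocab, matrix

# Invented toy corpus, already tokenized.
docs = [["cheap", "pills", "cheap"], ["meeting", "notes"], ["cheap", "meeting"]]
vocab, matrix = tfidf_matrix(docs)
print(vocab)
```

Each row of `matrix` is the feature vector that a supervised classifier would be trained on.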
Bag-of-words
[ 1 ] - "Luan likes to make hacking. Josimar likes to make hacking too."
[ 2 ] - "Luan also likes to web hacking."
● Create an array of words (tokenize ...)
{ "Luan", "likes", "to", "make", "hacking", "Josimar", "too", "also", "web" } Total of 9 elements
● Count the number of appearances of each word in each sentence
[ 1 ] – { 1, 2, 2, 2, 2, 1, 1, 0, 0 }
[ 2 ] – { 1, 1, 1, 0, 1, 0, 0, 1, 1 }
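The two count vectors above can be reproduced with a few lines of Python. This is only a sketch: the helper names are made up, and the tokenizer keeps the original casing so the vocabulary matches the slide.

```python
import re

def tokenize(text):
    """Split on word characters, keeping the original casing."""
    return re.findall(r"\w+", text)

def bag_of_words(sentences):
    """Return (vocabulary in order of first appearance, one count vector per sentence)."""
    vocab = []
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in vocab:
                vocab.append(word)
    vectors = [[tokenize(s).count(w) for w in vocab] for s in sentences]
    return vocab, vectors

sentences = [
    "Luan likes to make hacking. Josimar likes to make hacking too.",
    "Luan also likes to web hacking.",
]
vocab, vectors = bag_of_words(sentences)
print(vocab)
print(vectors)
```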
The common way
Look at the following
The common way
Why naive Bayes?
● In my tests:
Classifier   Accuracy   Performance
SVM          92%        Medium
NB           94%        Fast
KNN          96%        Slow
Super simple: you're just doing a bunch of counts. Naive Bayes is an eager learning classifier and is much faster than KNN. Nowadays it can be used for real-time prediction.
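The "bunch of counts" remark can be illustrated with a small multinomial naive Bayes sketch using add-one (Laplace) smoothing. The class name, the toy training data, and the choice of smoothing are assumptions for illustration, not the project's actual code.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # per-class word counts
        self.class_counts = Counter(labels)      # documents per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for word in tokens:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training set.
train_docs = [
    ["cheap", "pills", "buy", "now"],
    ["win", "money", "now"],
    ["meeting", "notes", "attached"],
    ["lunch", "tomorrow"],
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["cheap", "money"]))
```

Training is a single pass of counting, and prediction is a handful of additions per class, which is consistent with the "fast" rating in the table.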
My way
Automata as match rules
● Gain accuracy!
● Gain performance!
● Because a rule can match SPAM before the classifier is used!
● www.site.com/www.bank.com/
● URL/malware.exe: a rule like URL/[a-zA-Z]*.exe ...
● A rule to detect an IP address in a URL
● Detection with a deterministic finite automaton
● Use ranking!
NB 94% +4% Fast
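As one possible reading of the "IP address in a URL" rule, the sketch below builds an explicit deterministic finite automaton with eight states that accepts a dotted quad such as 192.0.2.7. The state numbering and function names are invented here, and the automaton deliberately does not check that each octet is at most 255.

```python
from urllib.parse import urlsplit

DIGITS = "0123456789"
ACCEPT_STATE = 7

# Transition table: (state, character) -> next state.
# Even states expect the first digit of an octet; odd states are inside an octet.
TRANSITIONS = {}
for state in (0, 2, 4, 6):
    for d in DIGITS:
        TRANSITIONS[(state, d)] = state + 1
for state in (1, 3, 5, 7):
    for d in DIGITS:
        TRANSITIONS[(state, d)] = state
for state in (1, 3, 5):
    TRANSITIONS[(state, ".")] = state + 1

def is_dotted_quad(host):
    """Run the DFA over the host; a missing transition means rejection."""
    state = 0
    for ch in host:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state == ACCEPT_STATE

def url_has_ip_host(url):
    """True if the URL's host part is written as a dotted-quad IP address."""
    host = urlsplit(url).hostname
    return host is not None and is_dotted_quad(host)

print(url_has_ip_host("http://192.0.2.7/login"))
```

Each character is examined exactly once, so the cost is linear in the length of the host.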
My way
Automata as match rules
● Gain accuracy!
● Gain performance!
● Because a rule can match SPAM before the classifier is used!
● Rules use a deterministic finite automaton to detect:
● www.site.com/www.bank.com/
● URL/malware.exe: a rule like URL/[a-zA-Z]*.exe ...
● A rule to detect an IP address in a URL
● A rule to detect phishing
● Use ranking!
NB 94% +4% Fast
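The idea of matching rules before calling the classifier might be sketched as follows. The rule names and exact patterns are assumptions modelled on the examples in the bullets, and Python's `re` is a backtracking engine rather than a DFA, so this shows the control flow only, not the automaton implementation.

```python
import re

# Hypothetical rules modelled on the slide's examples.
RULES = {
    # An executable directly under the host, e.g. http://site.com/malware.exe
    "exe_download": re.compile(r"https?://[^\s/]+/[a-zA-Z]*\.exe\b"),
    # A second domain name embedded in the path, e.g. www.site.com/www.bank.com/
    "domain_in_path": re.compile(r"https?://[^\s/]+/www\.[^\s/]+"),
    # A dotted-quad IP address used as the host
    "ip_host": re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}(?=[:/\s]|$)"),
}

def classify(message, fallback):
    """Return (label, matched rule name); call the classifier only if no rule matches."""
    for name, pattern in RULES.items():
        if pattern.search(message):
            return "spam", name
    return fallback(message), None

# The lambda stands in for a trained classifier such as naive Bayes.
result = classify("Update now: http://site.com/malware.exe", lambda m: "ham")
print(result)
```

When a rule fires, the comparatively expensive feature extraction and classification are skipped entirely, which is where the performance gain would come from.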
Why ranking?
Automata as match rules
● Gain accuracy!
NB 94% +4% Fast
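The slides do not define "ranking". One plausible interpretation, shown here purely as an assumption, is that each matched rule contributes a weight and the message is flagged only when the total reaches a threshold; otherwise the classifier's label stands. All weights, rule names, and the threshold are invented.

```python
# Hypothetical weights per rule and a hypothetical cut-off.
RULE_WEIGHTS = {
    "exe_download": 5,
    "domain_in_path": 4,
    "ip_host": 3,
    "all_caps_subject": 1,
}
SPAM_THRESHOLD = 5

def rank(matched_rules, weights=RULE_WEIGHTS):
    """Sum the weights of the rules that matched; unknown rules count as 0."""
    return sum(weights.get(rule, 0) for rule in matched_rules)

def decide(matched_rules, classifier_label, threshold=SPAM_THRESHOLD):
    """Flag as spam when the rule score reaches the threshold, else trust the classifier."""
    if rank(matched_rules) >= threshold:
        return "spam"
    return classifier_label

score = rank(["ip_host", "all_caps_subject"])
print(score, decide(["ip_host", "domain_in_path"], "ham"))
```

Under this reading, weak signals alone do not override the classifier, while several of them together can.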
E-mail audit
The project!
● All source code in C++! 100% open source!
● IMAP – communication
● Blacklists – DNS, bad domains, e-mail addresses ...
● Deterministic finite automaton – filters
● tf–idf (term frequency–inverse document frequency)
● Naive Bayes – classifier
E-mail audit
The project!
● In the future, use the GPU to run KNN and the automata ...
● With a GPU, everything runs fast ...
● Next step: 100% accuracy?
https://github.com/CoolerVoid/email_audit
Thanks
● https://github.com/CoolerVoid
