Mapping Domain Names to Categories
Maya Rotmensch, Sorcha Gilroy, Corina Gurău
Academic Mentor: Cristina Garcia-Cardona
Industry Sponsor: Oversee.net (Kryztof Urban)
Institute for Pure and Applied Mathematics
Research in Industrial Projects
August 15, 2013
Institute for Pure & Applied Mathematics
University of California, Los Angeles
(Institute of Pure and Applied Mathematics) Mapping Domain Names to Categories August 15, 2013 1 / 41
Outline
1 Oversee.net
2 Problem Statement
Why so complicated?
ESA - Explicit Semantic Analysis
How Oversee.net Does It
3 Our Project
Our Focus
Methodology
Results
4 Concluding Remarks
Oversee.net’s Business Model
Person → Website
Person looking for games → A gaming website
Person looking for games → Domain → A gaming website
Direct Navigation: when users navigate to a website by using the
address bar instead of a search engine.
looking for a gaming website → navigates to ’addictinggamas.com’
Oversee.net’s Business Model
Domain parking + traffic matching −→ Oversee.net
Person → Domain → Category → Website
Monetized Domain Parking
The registration of internet domain names without placing any
content on the domain.
Owners monetize traffic by displaying links and advertisements
Oversee.net’s Business Model
Advertisers
Partners of Oversee.net
Choose the types of traffic they want from Oversee.net’s category tree
Oversee.net’s Business Model
Parked domains do not have any content
Mapping Domains to Categories is extremely difficult
Oversee.net uses Keywords to describe Domains and Categories
Domain → Keywords ↔ Keywords ← Category
Not enough, as the two sets of keywords are not guaranteed to use the same language!
So what’s the big deal?
Reasoning about concepts
Scarcity of input information
Example 1 - Spelling error
cheapvacatins.com
Example 2 - Ambiguous meaning
bigbearhuts.com (animals? huts? it’s supposed to be winter sports)
Text Categorization
Our problem can be thought of as a problem of categorization. We
need to assign a domain to one or more classes or categories
A natural choice is topic modeling
However, unlike most text categorization problems, we don’t actually
have documents to classify, as we are dealing with undeveloped
domains
Topic Modeling
This method analyzes the relationships between documents in a corpus by
isolating a set of topics from the documents
For meaningful results, one must work with a set of large texts
Our data set consists of keywords, as our domains are undeveloped
This method results in organic generation of topics
The categories we are attempting to map into are pre-defined
ESA - Explicit Semantic Analysis
Building a Semantic Interpreter
Using a Vector Space Model + an exogenous knowledge base
−→ represent the meaning of text [1]
# of articles ∼ 3.5 million
# of terms ∼ 45 million
[1] Evgeniy Gabrilovich and Shaul Markovitch. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007.
ESA - Explicit Semantic Analysis
Article    Government  Finance  Toys  Children  Bank  School  . . .
Law        0.2         0.3      0.8   0.9       0.2   0.7     . . .
Article2   0.8         0.9      0.1   0.3       0.7   0.5     . . .
Article3   0.5         0.2      0.3   0.6       0.4   0.8     . . .
Article4   0.1         0.2      0.1   0.3       0.4   0.2     . . .
. . .
Term frequency inverse document frequency:
$\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$
Logarithmically scaled term frequency:
$\mathrm{tf}(t, d) = \log(f(t, d) + 1)$
Inverse document frequency:
$\mathrm{idf}(t, D) = \log \dfrac{|D|}{|\{d \in D : t \in d\}|}$
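The three formulas above can be sketched in a few lines of Python. This is a minimal illustration on a toy corpus, not the implementation used in the project: the function names, the tokenized toy documents, and the choice to return 0 for a term that appears in no document are all assumptions made for the example.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Logarithmically scaled term frequency: log(f(t, d) + 1)."""
    return math.log(Counter(doc_tokens)[term] + 1)


def idf(term, corpus):
    """Inverse document frequency: log(|D| / |{d in D : t in d}|)."""
    n_containing = sum(1 for doc in corpus if term in doc)
    if n_containing == 0:
        # Assumption: an unseen term gets weight 0 instead of a division by zero.
        return 0.0
    return math.log(len(corpus) / n_containing)


def tfidf(term, doc_tokens, corpus):
    """tfidf(t, d, D) = tf(t, d) * idf(t, D)."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["bank", "finance", "bank", "law"],
    ["toys", "children", "school"],
    ["school", "law", "government"],
]

bank_score = tfidf("bank", corpus[0], corpus)
law_in_doc2 = tfidf("law", corpus[1], corpus)  # "law" does not occur in the second document
```

A term that never occurs in a document has tf = log(1) = 0, so its TF-IDF weight in that document is 0 regardless of its idf.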
ESA - Explicit Semantic Analysis
Using a Semantic Interpreter
Cosine similarity measure
$\mathrm{similarity} = \cos(\theta) = \dfrac{A \cdot B}{\|A\| \, \|B\|}$
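As a rough sketch of the cosine measure, the snippet below compares two weight vectors given as plain Python lists. The two example vectors reuse the first two rows of the weight matrix shown earlier; treating a zero vector as having similarity 0 is an assumption of this example.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| ||B||) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a vector with no weights is treated as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Rows "Law" and "Article2" from the example matrix.
law = [0.2, 0.3, 0.8, 0.9, 0.2, 0.7]
article2 = [0.8, 0.9, 0.1, 0.3, 0.7, 0.5]
similarity = cosine_similarity(law, article2)
```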
How Oversee.net Does It
Instead of comparing two texts - compare two small sets of words!
Use keywords to describe domains and categories
Represent these keywords in terms of DBpedia articles
A keyword is significantly related to an article if the TF-IDF is above a
certain threshold
The set of articles associated to a domain/category is the union of the
sets of articles associated to its keywords
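The thresholding and union steps described above might look roughly like this. The nested-dictionary layout, the article names, the scores, and the threshold value are all hypothetical; the slides do not say how Oversee.net stores its TF-IDF weights or which threshold it uses.

```python
def articles_for_keyword(keyword, tfidf_scores, threshold):
    """Articles whose TF-IDF weight for this keyword is above the threshold."""
    scores = tfidf_scores.get(keyword, {})
    return {article for article, score in scores.items() if score > threshold}


def articles_for_keywords(keywords, tfidf_scores, threshold):
    """Union of the article sets of all keywords of a domain or category."""
    result = set()
    for keyword in keywords:
        result |= articles_for_keyword(keyword, tfidf_scores, threshold)
    return result


# Hypothetical weights: keyword -> {article title: TF-IDF weight}.
tfidf_scores = {
    "bank": {"Bank": 0.9, "River": 0.2},
    "financial": {"Finance": 0.8, "Bank": 0.6},
}

domain_articles = articles_for_keywords(["bank", "financial"], tfidf_scores, threshold=0.5)
```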
How Oversee.net Does It
Compare the two sets of articles (A - domains, B - categories) using
the Jaccard Index:
$J(A, B) = \dfrac{|A \cap B|}{|A \cup B|}$
Categories with highest scores using this index are matched to a
domain
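A small sketch of the matching step, assuming each domain and category has already been reduced to a set of article identifiers. The function names, the `top_n` parameter, and the example sets are illustrative assumptions; defining the index of two empty sets as 0 is also a choice made here.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are scored 0 rather than undefined.
        return 0.0
    return len(a & b) / len(union)


def rank_categories(domain_articles, category_articles, top_n=3):
    """Return the top_n (category, score) pairs, highest Jaccard index first."""
    scores = {
        category: jaccard(domain_articles, articles)
        for category, articles in category_articles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical article sets.
domain = {"Bank", "Finance", "Loan"}
categories = {
    "banking": {"Bank", "Loan", "Savings account"},
    "games": {"Video game", "Board game"},
}
ranking = rank_categories(domain, categories, top_n=2)
```

Here `banking` shares two of four distinct articles with the domain, so it scores 0.5, while `games` shares none and scores 0.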
Our Focus
Domain → Keywords ↔ Keywords ← Category
Critical link: domains to keywords
Improve quality of keywords
Click Through Rate
String Similarity
Semantic Analysis
Keyword               CTR  String Similarity  Semantic Similarity
industrial            20   80                 0
industriel            20   89                 0
industrie             20   100                0
china manufacturer    20   0                  88
industries            20   80                 98
industrial companies  20   0                  86
Domain Keywords
Focusing on developing the link between domains and keywords, the two
main questions we posed for our research were:
Could we use ESA to extend the number of meaningful keywords per
domain?
Could we use the keywords obtained through Oversee.net's in-house
statistics as the basis of the new keywords?
Methodology
Extending the set of keywords:
When generating new keywords:
Only take top 3 articles
Only take top 2 terms
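The diagram for this step did not survive, so the following is only one plausible reading of the two limits: for each existing keyword, look up its 3 highest-weighted articles, and from each of those articles take the 2 highest-weighted terms as new keywords. The data layout, names, and toy weights are assumptions for illustration.

```python
def expand_keywords(keywords, term_to_articles, article_to_terms,
                    top_articles=3, top_terms=2):
    """Generate new keywords from the strongest articles of each keyword."""
    seen = set(keywords)
    new_keywords = []
    for keyword in keywords:
        # Articles most strongly associated with this keyword.
        articles = sorted(term_to_articles.get(keyword, {}).items(),
                          key=lambda item: item[1], reverse=True)[:top_articles]
        for article, _weight in articles:
            # Terms most strongly associated with each of those articles.
            terms = sorted(article_to_terms.get(article, {}).items(),
                           key=lambda item: item[1], reverse=True)[:top_terms]
            for term, _term_weight in terms:
                if term not in seen:
                    seen.add(term)
                    new_keywords.append(term)
    return new_keywords


# Hypothetical weights in both directions.
term_to_articles = {
    "bank": {"Bank": 0.9, "Finance": 0.7, "River": 0.4, "Law": 0.1},
}
article_to_terms = {
    "Bank": {"bank": 0.9, "deposit": 0.8, "loan": 0.5},
    "Finance": {"investment": 0.9, "bank": 0.6, "stock": 0.3},
    "River": {"water": 0.9, "stream": 0.7},
    "Law": {"court": 0.9},
}
generated = expand_keywords(["bank"], term_to_articles, article_to_terms)
```

In this toy run the weakly related article "River" still contributes "water" and "stream", which illustrates how the expansion can amplify noise as well as signal.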
Methodology
Method 2 for extending the set of keywords:
Breaking up and correcting the domain name
Example: domain = 'chaselogon.com'
Candidate substrings and splits: haselogon, aselogon, cha selogon, chas elogon, chase logon, chasel ogon, chaselo gon, chaselog, chaselogo
If the entire string matches a word in the reference file, then stop
If both parts of the broken string are exact words, then stop
If a substring is an exact word, then correct the other part using edit
distances
Corrections used: deletions, transpositions, replacements, insertions
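The rules above can be approximated with a short sketch. It is a simplification under stated assumptions: only two-way splits are tried, corrections are limited to a single edit, ties between candidate corrections are broken alphabetically, and the tiny `vocabulary` set stands in for the reference file. None of these details are confirmed by the slides.

```python
import string

ALPHABET = string.ascii_lowercase


def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word, vocabulary):
    """Return the word itself, a one-edit correction, or None."""
    if word in vocabulary:
        return word
    candidates = sorted(edits1(word) & vocabulary)  # alphabetical tie-break (assumption)
    return candidates[0] if candidates else None


def parse_domain(domain, vocabulary):
    """Split a domain name into known words, correcting one part if needed."""
    name = domain.lower().split(".")[0]
    if name in vocabulary:                      # rule 1: whole string is a word
        return [name]
    splits = [(name[:i], name[i:]) for i in range(1, len(name))]
    for left, right in splits:                  # rule 2: both parts are words
        if left in vocabulary and right in vocabulary:
            return [left, right]
    for left, right in splits:                  # rule 3: one part is a word, fix the other
        if left in vocabulary:
            fixed = correct(right, vocabulary)
            if fixed:
                return [left, fixed]
        if right in vocabulary:
            fixed = correct(left, vocabulary)
            if fixed:
                return [fixed, right]
    return []


# Hypothetical stand-in for the reference file.
vocabulary = {"chase", "logon", "login", "cheap", "vacations", "games", "addicting"}

chase = parse_domain("chaselogon.com", vocabulary)
vacation = parse_domain("cheapvacatins.com", vocabulary)
gaming = parse_domain("addictinggamas.com", vocabulary)
```

With this vocabulary, 'chaselogon.com' splits cleanly into two exact words, while 'cheapvacatins.com' and 'addictinggamas.com' each need one part corrected by a single edit.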
Methodology
Method 2 for extending the set of keywords:
The reference file is made up of collections of text, to which we added more
information:
Company names
Popular websites
Brand and store names
Countries and major cities
Initial Keywords   Keywords after parsing
chameloeon         chas, chase, elson, login
Methodology
Generating new keywords and mapping to categories
Domain: bankfianancial.com
Keywords: ncofinancial, ban, bank, financial, financial institutions, financial centre, lobsters, official personal, societies chairman, . . .
Categories and their keywords:
finance (Jaccard Index = 0.240492): retirement pension, debit card, tenant credit check, . . .
credit cards (Jaccard Index = 0.348147): debit card, credit applications, rewards program, . . .
banking (Jaccard Index = 0.219457): savings banking, checks, community bank, . . .
Results: Comparing Their Keywords to Semantic
We were given a sample of 300 domains that had been matched by
hand to a total of 500 categories
Metric               CTR & String Similarity  CTR, String Similarity & Semantic Analysis
Number of matches    25                       309
Percentage of match  5%                       61.8%
Results: Generating New Keywords
Using Method 1:
Metric               CTR & String Similarity  Method 1  CTR & String Similarity & 7 Random
Number of matches    25                       21        24
Percentage of match  5%                       4.2%      4.8%
Most of the time, the different methods yielded the same results
Cases where the new keywords improved the system:
thhetrainline.com
Cases where the base case did better:
inindustries.com
Results
thhetrainline.com
Keywords: thetrainline
Jaccard Index = 0.0001 microcars & city cars
Jaccard Index = 0.0002 property management
thhetrainline.com
Extended keywords: thetrainline, strafe train, moving departing, train station, telecommunications, georgia, rain shine, . . .
Jaccard Index = 0.1348 bus & rail
Jaccard Index = 0.2255 libraries & museums
Results
inindustries.com
Keywords: industrial, industrias, industriel, . . .
Jaccard Index = 0.0786 manufacturing
inindustries.com
Extended keywords: industrial, industrias, industriel, . . . , ministry, quarterly garden/outdoor, filipino footballer, . . .
Jaccard Index = 0.099 tourist destinations
Jaccard Index = 0.1326 real estate
Results: Parsing the Domains
Using Method 1 & 2:
Metric               CTR & String Similarity  Method 1 & 2  CTR & String Similarity & 15 Random
Number of matches    25                       93            23
Percentage of match  5%                       18.6%         4.6%
Results - Parsing the Domains
chaselogon.com
Keywords: chameloeon
No category matched
chaselogon.com
Extended keywords: chameloeon, chas, chase, elson, login, password, journalists cyber, logins expensive, beatles, . . .
Jaccard Index = 0.4637 credit cards
Jaccard Index = 0.4637 banking
Results: Parsing the Domains
Using Method 2:
Metric               CTR & String Sim.  Method 1 & 2  Method 2
Number of matches    25                 97            77 out of 356
Percentage of match  5%                 19.4%         ∼ 21.6%
Initial results show that overall, just using parsing might be more beneficial
→ depends on the amount of noise.
Example with a lot of noise:
mobilestorage.ca
Example with minimal noise:
addictinggamas.com
Results - Amplification of noise
mobilestorage.ca
Keywords: gfilestorage, mobileshop, mobile, storage, age, investor, vilest, . . .
Jaccard Index = 0.1011 mobile & wireless
Jaccard Index = 0.0959 music & audio
mobilestorage.ca
Extended keywords: gfilestorage, mobileshop, mobile, storage, age, investor, vilest, . . . , legal age, taylor, phone companies, mobil, . . .
Jaccard Index = 0.0942 music & audio
Jaccard Index = 0.0887 education
Results - Minimal noise
addictinggamas.com
Keywords: addictinggams, addictivegames, adictigegames, . . . , addict, addicting, games, ingram, . . .
Jaccard Index = 0.0153 software
addictinggamas.com
Extended keywords: addictinggams, addictivegames, adictigegames, . . . , addict, addicting, games, ingram, . . . , gameplay requires, game, impulsedriven flash, add ons, . . .
Jaccard Index = 0.2019 computer & video games
Jaccard Index = 0.1975 games
Results: Extended Matches
Using Extended Matches:
We extended possible matches to parent and root nodes of the
category tree.
We checked in how many cases the parent or root node of the
categories we obtained matched the manual matching.
Metric                          CTR & String Sim.  Method 1  Method 1 & 2  Method 2
Number of matches               25                 21        97            77 out of 356
Percentage of match             5%                 4.2%      19.4%         ∼ 21.6%
Number of extended matches      32                 29        128           102 out of 356
Percentage of extended matches  6.4%               5.8%      25.6%         ∼ 28.7%
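The extended-match check can be sketched as a walk up the category tree. The child-to-parent dictionary and the three-level example tree are hypothetical, since the actual shape of Oversee.net's category tree is not shown in the slides.

```python
def ancestors(category, parent_of):
    """Chain of ancestors from the direct parent up to the root."""
    chain = []
    seen = {category}
    node = parent_of.get(category)
    while node is not None and node not in seen:  # 'seen' guards against cycles
        chain.append(node)
        seen.add(node)
        node = parent_of.get(node)
    return chain


def is_extended_match(predicted, manual_categories, parent_of):
    """True if the predicted category, its parent, or its root was chosen by hand."""
    manual = set(manual_categories)
    if predicted in manual:
        return True
    chain = ancestors(predicted, parent_of)
    if not chain:
        return False
    parent, root = chain[0], chain[-1]
    return parent in manual or root in manual


# Hypothetical category tree: child -> parent (None marks the root).
parent_of = {"credit cards": "banking", "banking": "finance", "finance": None}

via_root = is_extended_match("credit cards", {"finance"}, parent_of)
via_parent = is_extended_match("credit cards", {"banking"}, parent_of)
no_match = is_extended_match("credit cards", {"games"}, parent_of)
```

Counting a prediction as correct when only its parent or root agrees with the manual label is a looser criterion, which is why the extended percentages in the table are higher than the exact ones.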
Conclusion
Implemented a program to match domains with categories
Created an ESA based method to amplify existing keywords
Adapted a domain name parsing and spell correcting method
Revisiting our research questions:
Could we use ESA to extend the number of meaningful keywords per
domain? → Yes
Could we use the keywords obtained through Oversee.net's in-house
statistics as the basis of the new keywords? → No, or at least
further processing must be done.
getting better & more keywords → getting a few good keywords
Future Directions
Find out how many good initial keywords are required to use our
method successfully
Explore a better way of ranking keywords and determine which are
the most descriptive ones
Click-through rate and string similarity comparisons are not sufficiently
descriptive; a better scoring method is needed
Have a reference of the most popular websites, so that the domains
given could be compared to these
Analyze content in websites to amplify domain to category mapping
Thank you!
Academic Mentor: Cristina Garcia-Cardona
Industry Sponsor: Kryztof Urban and Oversee.net
RIPS Director: Dr. Michael Raugh
Director of IPAM: Dr. Russ Caflisch
IPAM Staff: Dimi, Stacey, Stacy, Roland, Stephanie, and everyone
that made RIPS possible
Questions?
Thank you for listening!
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Dernier (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Mapping Domain Names to Categories

  • 1. Mapping Domain Names to Categories. Maya Rotmensch, Sorcha Gilroy, Corina Gurău. Academic Mentor: Cristina Garcia-Cardona. Industry Sponsor: Oversee.net (Kryztof Urban). Institute of Pure and Applied Mathematics, Research in Industrial Projects, August 15, 2013. Institute for Pure & Applied Mathematics, University of California, Los Angeles. (Institute of Pure and Applied Mathematics) Mapping Domain Names to Categories August 15, 2013 1 / 41
  • 2. Outline: 1. Oversee.net; 2. Problem Statement (Why so complicated?; ESA - Explicit Semantic Analysis; How Oversee.net Does It); 3. Our Project (Our Focus; Methodology; Results); 4. Concluding Remarks
  • 3. Outline (section 1: Oversee.net)
  • 4. Oversee.net’s Business Model. Diagram: Person → Website
  • 5. Diagram: Person looking for games → A gaming website
  • 6. Oversee.net’s Business Model. Diagram: Person looking for games → Domain → A gaming website. Direct navigation: users navigate to a website by using the address bar instead of a search engine. Example: a person looking for a gaming website navigates to ’addictinggamas.com’.
  • 7. Oversee.net’s Business Model. Domain parking + traffic matching → Oversee.net. Diagram: Person → Domain → Category → Website. Monetized domain parking: the registration of internet domain names without placing any content on the domain. Owners monetize traffic by displaying links and advertisements.
  • 8. Oversee.net’s Business Model. Advertisers: partners of Oversee.net. They choose the types of traffic they want from Oversee.net’s category tree.
  • 9. Oversee.net’s Business Model. Parked domains do not have any content. Mapping domains to categories is extremely difficult. Oversee.net uses keywords to describe domains and categories. Diagram: Domain → Keywords, Keywords ← Category. Comparing keywords directly is not enough, as the two sides are not guaranteed to use the same language!
  • 10. Outline (section 2: Problem Statement)
  • 11. So what’s the big deal? Reasoning about concepts. Scarcity of input information. Example 1, spelling error: cheapvacatins.com. Example 2, ambiguous meaning: bigbearhuts.com (animals? huts? It is supposed to be winter sports).
  • 12. Text Categorization. Our problem can be thought of as a categorization problem: we need to assign a domain to one or more classes or categories. A natural choice is topic modeling. However, unlike most text categorization problems, we do not actually have documents to classify, as we are dealing with undeveloped domains.
  • 13. Topic Modeling. This method analyzes the relationships between documents in a corpus by isolating a set of topics from the documents. For meaningful results, one must work with a set of large texts; our data set consists of keywords, as our domains are undeveloped. This method results in organic generation of topics; the categories we are attempting to map into are pre-defined.
  • 14. ESA - Explicit Semantic Analysis. Building a Semantic Interpreter. Using a Vector Space Model + an exogenous knowledge base → represent the meaning of text [1]. # of articles ∼ 3.5 million; # of terms ∼ 45 million.
      [1] Evgeniy Gabrilovich and Shaul Markovitch. Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis. Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), 2007.
  • 15. ESA - Explicit Semantic Analysis.

                  Government  Finance  Toys  Children  Bank  School  ...
      Law            0.2        0.3    0.8     0.9     0.2    0.7    ...
      Article2       0.8        0.9    0.1     0.3     0.7    0.5    ...
      Article3       0.5        0.2    0.3     0.6     0.4    0.8    ...
      Article4       0.1        0.2    0.1     0.3     0.4    0.2    ...
      ...            ...        ...    ...     ...     ...    ...    ...

      Term frequency inverse document frequency: $\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$
      Logarithmically scaled term frequency: $\mathrm{tf}(t, d) = \log(f(t, d) + 1)$
      Inverse document frequency: $\mathrm{idf}(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}$
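The three formulas on slide 15 can be sketched directly in Python. This is a minimal illustration, not the project's implementation: the toy corpus is made up, natural logarithms are assumed (the slide does not state the base), and returning 0 for a term that appears in no document is an added convention to avoid dividing by zero.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Logarithmically scaled term frequency: log(f(t, d) + 1)."""
    return math.log(Counter(doc_tokens)[term] + 1)


def idf(term, corpus):
    """Inverse document frequency: log(|D| / |{d in D : t in d}|)."""
    n_containing = sum(1 for doc in corpus if term in doc)
    if n_containing == 0:
        return 0.0  # assumption: unseen terms get weight 0
    return math.log(len(corpus) / n_containing)


def tfidf(term, doc_tokens, corpus):
    """tfidf(t, d, D) = tf(t, d) * idf(t, D)."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical three-document corpus, already tokenized.
corpus = [
    ["bank", "finance", "loan", "bank"],
    ["toys", "children", "school"],
    ["bank", "government", "law"],
]
bank_score = tfidf("bank", corpus[0], corpus)  # log(3) * log(3 / 2)
toys_score = tfidf("toys", corpus[0], corpus)  # tf is log(1) = 0
```

A term that does not occur in a document gets tf = log(1) = 0, so its TF-IDF weight for that document is 0 regardless of how rare the term is.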
  • 16. ESA - Explicit Semantic Analysis. Using a Semantic Interpreter. Cosine similarity measure: $\text{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|}$
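The cosine measure on slide 16 can be written in a few lines. This sketch uses plain Python lists; the two vectors are the first two rows of the example matrix on slide 15, and returning 0 for an all-zero vector is an added convention, not something the slides specify.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||) for equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


law = [0.2, 0.3, 0.8, 0.9, 0.2, 0.7]
article2 = [0.8, 0.9, 0.1, 0.3, 0.7, 0.5]
similarity = cosine_similarity(law, article2)
```

Because all weights are non-negative, the result always falls between 0 (no shared dimensions) and 1 (same direction).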
  • 17. How Oversee.net Does It. Instead of comparing two texts, compare two small sets of words! Use keywords to describe domains and categories. Represent these keywords in terms of DBpedia articles. A keyword is significantly related to an article if its TF-IDF score is above a certain threshold. The set of articles associated with a domain or category is the union of the sets of articles associated with its keywords.
  • 18. How Oversee.net Does It. Compare the two sets of articles (A for the domain, B for the category) using the Jaccard index: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$. The categories with the highest scores under this index are matched to the domain.
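Slides 17 and 18 together describe a small pipeline: threshold the keyword-to-article weights, take the union of articles per domain or category, then rank categories by Jaccard index. The sketch below illustrates that flow under stated assumptions: the function names, the weight table, the article names, and the threshold of 0.5 are all hypothetical and are not taken from Oversee.net's system.

```python
def articles_for(keywords, weights, threshold):
    """Union of articles whose weight for any of the keywords exceeds threshold."""
    articles = set()
    for keyword in keywords:
        for article, weight in weights.get(keyword, {}).items():
            if weight > threshold:
                articles.add(article)
    return articles


def jaccard(a, b):
    """J(A, B) = |A intersect B| / |A union B|; defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_categories(domain_keywords, category_keywords, weights, threshold=0.5):
    """Return (category, score) pairs sorted from best to worst match."""
    domain_articles = articles_for(domain_keywords, weights, threshold)
    scores = {
        category: jaccard(domain_articles, articles_for(kws, weights, threshold))
        for category, kws in category_keywords.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical keyword -> {article: TF-IDF weight} table.
weights = {
    "bank": {"Bank": 0.9, "Finance": 0.7, "River": 0.2},
    "financial": {"Finance": 0.8, "Bank": 0.6},
    "savings": {"Bank": 0.8, "Finance": 0.6, "Pension": 0.7},
    "games": {"Video game": 0.9, "Toy": 0.6},
}
ranked = rank_categories(
    ["bank", "financial"],
    {"banking": ["savings"], "games": ["games"]},
    weights,
)
```

Here the domain maps to the articles {Bank, Finance} and the "banking" category to {Bank, Finance, Pension}, giving a Jaccard index of 2/3, while "games" shares no articles and scores 0.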
  • 19. Outline (section 3: Our Project)
  • 20. Our Focus. Diagram: Domain → Keywords, Keywords ← Category. Critical link: domains to keywords. Improve quality of keywords. Keyword scores: Click Through Rate (CTR), String Similarity, Semantic Analysis.

      Keyword               CTR  String Similarity  Semantic Similarity
      industrial             20         80                   0
      industriel             20         89                   0
      industrie              20        100                   0
      china manufacturer     20          0                  88
      industries             20         80                  98
      industrial companies   20          0                  86
  • 21. Domain → Keywords. Focusing on developing the link between domains and keywords, we posed two main research questions: Could we use ESA to extend the number of meaningful keywords per domain? Could we use the keywords obtained through Oversee.net's in-house statistics as the basis of the new keywords?
  • 22. Methodology. Extending the set of keywords: [figure]
  • 23. Methodology. Extending the set of keywords. When generating new keywords: only take the top 3 articles; only take the top 2 terms.
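The keyword-extension step can be sketched as follows. The transcript preserves only the two cut-offs (top 3 articles, top 2 terms), so the procedure here is an assumed reading: for each existing keyword, take its 3 highest-weighted articles, and from each of those take the 2 highest-weighted terms as new keywords. The function name and both weight tables are hypothetical.

```python
def extend_keywords(keywords, keyword_to_articles, article_to_terms,
                    top_articles=3, top_terms=2):
    """Generate new keywords from the best articles of each existing keyword."""
    new_keywords = []
    for keyword in keywords:
        # Articles most strongly associated with this keyword.
        articles = sorted(
            keyword_to_articles.get(keyword, {}).items(),
            key=lambda item: item[1],
            reverse=True,
        )[:top_articles]
        for article, _ in articles:
            # Terms most strongly associated with this article.
            terms = sorted(
                article_to_terms.get(article, {}).items(),
                key=lambda item: item[1],
                reverse=True,
            )[:top_terms]
            for term, _ in terms:
                if term not in keywords and term not in new_keywords:
                    new_keywords.append(term)
    return new_keywords


# Hypothetical weights, for illustration only.
keyword_to_articles = {
    "train": {"Rail transport": 0.9, "Train station": 0.8,
              "Locomotive": 0.7, "Training": 0.1},
}
article_to_terms = {
    "Rail transport": {"rail": 0.9, "train": 0.8, "track": 0.5},
    "Train station": {"station": 0.9, "platform": 0.7, "train": 0.6},
    "Locomotive": {"engine": 0.9, "steam": 0.8},
    "Training": {"course": 0.9},
}
extended = extend_keywords(["train"], keyword_to_articles, article_to_terms)
```

The small cut-offs keep weakly related articles out ("Training" is dropped here), which matters because every extra article can introduce unrelated terms.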
  • 24. Methodology. Method 2 for extending the set of keywords: breaking up and correcting the domain name. Example: domain = ’chaselogon.com’. Candidate splits shown on the slide: haselogon; aselogon; cha selogon; chas elogon; chase logon; chasel ogon; chaselo gon; chaselog; chaselogo. If the entire string matches a word in the reference file, then stop. If both parts of the broken string are exact words, then stop. If one substring is an exact word, then correct the other part using edit distances. Corrections used: deletions, transpositions, replacements, insertions.
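The three rules on slide 24 can be sketched as below. This is an illustrative reading rather than the project's code: it considers only two-way splits, applies corrections at edit distance 1 (the slide does not say how far corrections may go), breaks ties alphabetically, and uses a tiny hypothetical vocabulary in place of the reference file. The `edits1` helper generates the four correction types the slide lists.

```python
import string


def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(fragment, vocabulary):
    """Return the fragment itself, or a vocabulary word one edit away, else None."""
    if fragment in vocabulary:
        return fragment
    candidates = sorted(edits1(fragment) & vocabulary)  # alphabetical tie-break
    return candidates[0] if candidates else None


def parse_domain(name, vocabulary):
    """Split a domain name (without its TLD) into known words."""
    # Rule 1: the entire string is a known word.
    if name in vocabulary:
        return [name]
    splits = [(name[:i], name[i:]) for i in range(1, len(name))]
    # Rule 2: both parts of a split are exact words.
    for left, right in splits:
        if left in vocabulary and right in vocabulary:
            return [left, right]
    # Rule 3: one part is an exact word; correct the other part.
    for left, right in splits:
        if left in vocabulary:
            fixed = correct(right, vocabulary)
            if fixed:
                return [left, fixed]
        if right in vocabulary:
            fixed = correct(left, vocabulary)
            if fixed:
                return [fixed, right]
    return []


# Hypothetical stand-in for the reference file.
vocabulary = {"chase", "logon", "login", "cheap", "vacations", "bank", "financial"}
chase_words = parse_domain("chaselogon", vocabulary)
vacation_words = parse_domain("cheapvacatins", vocabulary)
bank_words = parse_domain("bankfianancial", vocabulary)
```

With this vocabulary, "chaselogon" splits cleanly under rule 2, while "cheapvacatins" and "bankfianancial" each need one correction (an insertion and a deletion, respectively) under rule 3.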
  • 25. Methodology. Method 2 for extending the set of keywords. The reference file is made up of collections of text, to which more information has been added: company names; popular websites; brand and store names; countries and major cities.

      Initial Keywords    Keywords after parsing
      chameloeon          chas chase elson login
  • 26. Methodology. Generating new keywords and mapping to categories.
      Domain: bankfianancial.com
      Keywords: ncofinancial ban bank financial financial institutions financial centre lobsters official personal societies chairman ...
      Category finance (Jaccard Index = 0.240492): retirement pension debit card tenant credit check ...
      Category credit cards (Jaccard Index = 0.348147): debit card credit applications rewards program ...
      Category banking (Jaccard Index = 0.219457): savings banking checks community bank ...
  • 27. Results: Comparing Their Keywords to Semantic. We were given a sample of 300 domains that had been matched by hand to a total of 500 categories.

                              CTR & String Similarity   CTR, String Similarity & Semantic Analysis
      Number of matches                 25                              309
      Percentage of matches             5%                             61.8%
  • 28. Results: Generating New Keywords. Using Method 1:

                              CTR & String Similarity   Method 1   CTR & String Similarity & 7 Random
      Number of matches                 25                 21                     24
      Percentage of matches             5%                4.2%                   4.8%

      Most of the time, the different methods yielded the same results. Case where the new keywords improved the system: thhetrainline.com. Case where the base case did better: inindustries.com.
  • 29. Results: thhetrainline.com.
      Base keywords: thetrainline
        Jaccard Index = 0.0001: microcars & city cars
        Jaccard Index = 0.0002: property management
      With new keywords: thetrainline strafe train moving departing train station telecommunications georgia rain shine ...
        Jaccard Index = 0.1348: bus & rail
        Jaccard Index = 0.2255: libraries & museums
  • 30. Results: inindustries.com.
      Base keywords: industrial industrias industriel ...
        Jaccard Index = 0.0786: manufacturing
      With new keywords: industrial industrias industriel ... ministry quarterly garden/outdoor filipino footballer ...
        Jaccard Index = 0.099: tourist destinations
        Jaccard Index = 0.1326: real estate
  • 31. Results: Parsing the Domains. Using Method 1 & 2:

                              CTR & String Similarity   Method 1 & 2   CTR & String Similarity & 15 Random
      Number of matches                 25                   93                      23
      Percentage of matches             5%                  18.6%                   4.6%
  • 32. Results - Parsing the Domains.
      chaselogon.com, keywords: chameloeon
        No category matched
      addictinggamas.com, keywords: chameloeon chas chase elson login password journalists cyber logins expensive beatles ...
        Jaccard Index = 0.4637: credit cards
        Jaccard Index = 0.4637: banking
  • 33. Results: Parsing the Domains. Using Method 2:

                              CTR & String Sim.   Method 1 & 2   Method 2
      Number of matches              25                97        77 out of 356
      Percentage of matches          5%              19.4%         ∼21.6%

      Initial results show that, overall, using parsing alone might be more beneficial; this depends on the amount of noise. Example with a lot of noise: mobilestorage.ca. Example with minimal noise: addictinggamas.com.
  • 34. Results - Amplification of noise: mobilestorage.ca.
      Parsed keywords: gfilestorage mobileshop mobile storage age investor vilest ...
        Jaccard Index = 0.1011: mobile & wireless
        Jaccard Index = 0.0959: music & audio
      With generated keywords added: gfilestorage mobileshop mobile storage age investor vilest ... legal age taylor phone companies mobil ...
        Jaccard Index = 0.0942: music & audio
        Jaccard Index = 0.0887: education
  • 35. Results - Minimal noise: addictinggamas.com.
      Parsed keywords: addictinggams addictivegames adictigegames ... addict addicting games ingram ...
        Jaccard Index = 0.0153: software
      With generated keywords added: addictinggams addictivegames adictigegames ... addict addicting games ingram ... gameplay requires game impulsedriven flash add ons ...
        Jaccard Index = 0.2019: computer & video games
        Jaccard Index = 0.1975: games
  • 36. Results: Extended Matches. We extended possible matches to the parent and root nodes of the category tree, and checked in how many cases the parent or root node of the categories we obtained matched the manual matching.

                                     CTR & String Sim.   Method 1   Method 1 & 2   Method 2
      Number of matches                     25              21           97        77 out of 356
      Percentage of matches                 5%             4.2%        19.4%         ∼21.6%
      Number of extended matches            32              29          128        102 out of 356
      Percentage of extended matches       6.4%            5.8%        25.6%         ∼28.7%
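The extended-match check on slide 36 can be sketched with a child-to-parent map. This is a minimal illustration under assumptions: the category names come from the slides, but the hierarchy placed over them is invented, and counting an exact match as an extended match is one reading of the slide.

```python
def ancestors(category, parent_of):
    """Chain from the category up to its root: [category, parent, ..., root]."""
    chain = [category]
    while chain[-1] in parent_of:  # assumes parent_of describes a tree (no cycles)
        chain.append(parent_of[chain[-1]])
    return chain


def is_extended_match(predicted, manual_categories, parent_of):
    """True if the predicted category, its parent, or its root was chosen by hand."""
    chain = ancestors(predicted, parent_of)
    candidates = {chain[0], chain[-1]}  # the category itself and its root
    if len(chain) > 1:
        candidates.add(chain[1])  # the direct parent
    return any(candidate in manual_categories for candidate in candidates)


# Hypothetical slice of a category tree (child -> parent).
parent_of = {
    "credit cards": "finance",
    "banking": "finance",
    "finance": "business",
}
via_parent = is_extended_match("credit cards", {"finance"}, parent_of)
via_root = is_extended_match("credit cards", {"business"}, parent_of)
sibling = is_extended_match("credit cards", {"banking"}, parent_of)

# The Method 2 percentages in the table follow from the raw counts.
method2_pct = round(100 * 77 / 356, 1)
method2_extended_pct = round(100 * 102 / 356, 1)
```

A sibling category ("banking" versus "credit cards") does not count under this reading, since only the node's own parent and root are considered.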
  • 37. Outline (section 4: Concluding Remarks)
  • 38. Conclusion. Implemented a program to match domains with categories. Created an ESA-based method to amplify existing keywords. Adapted a domain name parsing and spell-correcting method. Revisiting our research questions: Could we use ESA to extend the number of meaningful keywords per domain? → Yes. Could we use the keywords obtained through Oversee.net's in-house statistics as the basis of the new keywords? → No, or at least further processing must be done. Getting better and more keywords → getting a few good keywords.
  • 39. Future Directions. Find out how many good initial keywords are required to use our method successfully. Explore a better way of ranking keywords and determine which are the most descriptive ones. Click-through rate and string similarity comparisons are not sufficiently descriptive; a better scoring method is needed. Keep a reference of the most popular websites, so that the given domains can be compared to these. Analyze content in websites to amplify the domain-to-category mapping.
  • 40. Thank you! Academic Mentor: Cristina Garcia-Cardona. Industry Sponsor: Kryztof Urban and Oversee.net. RIPS Director: Dr. Michael Raugh. Director of IPAM: Dr. Russ Caflisch. IPAM Staff: Dimi, Stacey, Stacy, Roland, Stephanie, and everyone who made RIPS possible.
  • 41. Questions? Thank you for listening!