Modern On Page Factors
SMX Advanced
Matthew Peters, PhD
matt@moz.com @mattthemathman

“philadelphia phillies”

“Relevance” vs “Ranking”
Conceptually, “relevance” determination and “ranking” can be thought of as two different steps (even if they are implemented as one in a search engine):
1. Relevance
2. Ranking
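
A minimal sketch of the two conceptual steps, assuming a hypothetical `Page` record with a precomputed `relevance` score (query-document similarity) and a separate, hypothetical `authority` signal used only for ordering. The 0.5 threshold is an arbitrary assumption; as the slide notes, a real engine may merge both steps into one.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    relevance: float  # hypothetical query-document similarity in [0, 1]
    authority: float  # hypothetical query-independent ranking signal


def relevant_then_ranked(pages, threshold=0.5):
    """Two conceptual steps: (1) keep pages whose relevance clears a
    threshold, (2) order the survivors by a separate ranking signal."""
    relevant = [p for p in pages if p.relevance >= threshold]  # step 1: relevance
    return sorted(relevant, key=lambda p: p.authority, reverse=True)  # step 2: ranking


pages = [
    Page("a.example", relevance=0.74, authority=40),
    Page("b.example", relevance=0.20, authority=95),
    Page("c.example", relevance=0.90, authority=70),
]
print([p.url for p in relevant_then_ranked(pages)])  # ['c.example', 'a.example']
```

Note that `b.example` has the highest authority but never gets ranked, because it fails the relevance step first.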

Is this page relevant to “philadelphia phillies”?
query-body similarity: 0.74
query-title similarity: 0.8
query-H1 similarity: 1.0
etc.
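
One simple way to get a per-field score like the ones above is bag-of-words cosine similarity between the query and each field. This is an illustrative sketch only: the page text is hypothetical, the numbers it produces are not the slide's, and the slides do not say which similarity measure produced 0.74, 0.8 and 1.0.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = bag_of_words("philadelphia phillies")
page_fields = {  # hypothetical page
    "title": "Philadelphia Phillies Official Site",
    "h1": "Philadelphia Phillies",
    "body": "The Phillies play baseball in Philadelphia",
}
scores = {field: cosine(query, bag_of_words(text)) for field, text in page_fields.items()}
print({field: round(s, 2) for field, s in scores.items()})
# {'title': 0.71, 'h1': 1.0, 'body': 0.58}
```

An H1 that exactly matches the query scores 1.0; extra words in the title or body dilute the score.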

Measuring query-document similarity
Goal: given a query and a document string, compute their “similarity”.
See “Introduction to Information Retrieval” by Manning et al.: http://nlp.stanford.edu/IR-book/
> 700 papers
“philadelphia phillies” → 0.74
In this context, “document” can also refer to the title tag, meta description, H1, etc.
Query Model
tokenization
normalization (stemming)
query expansion
intent
Document Model
tokenization
normalization (stemming)
vector space representation
language model
Scoring function: combines the query model and the document model into a single similarity score
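
The three boxes above can be sketched as three functions. Everything here is a toy assumption, not the deck's or any search engine's implementation: normalization only strips a plural "s", the `EXPANSIONS` table is hypothetical, expansion terms get an assumed weight of 0.5, and the scoring function is a weighted sum of log-damped term frequencies.

```python
import math
import re
from collections import Counter

# Hypothetical expansion table; a real engine would derive expansions from data.
EXPANSIONS = {"movie": ["film"]}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(token):
    # Toy normalization: strip a plural "s" (a stand-in for a real stemmer).
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def query_model(query):
    """Query -> {term: weight}; expansion terms get a lower (assumed) weight of 0.5."""
    weights = {normalize(t): 1.0 for t in tokenize(query)}
    for term in list(weights):
        for extra in EXPANSIONS.get(term, []):
            weights.setdefault(extra, 0.5)
    return weights


def document_model(text):
    """Document -> term-frequency vector of normalized tokens."""
    return Counter(normalize(t) for t in tokenize(text))


def scoring_function(query_weights, doc_tf):
    """Weighted sum of log-damped term frequencies (missing terms contribute 0)."""
    return sum(w * math.log(1 + doc_tf[t]) for t, w in query_weights.items())


q = query_model("movie reviews")
print(q)  # {'movie': 1.0, 'review': 1.0, 'film': 0.5}
print(round(scoring_function(q, document_model("Film reviews and movie reviews")), 3))  # 2.138
```

Keeping the three pieces separate makes it easy to swap in a different scoring function (cosine, BM25, a language model) without touching the query or document models.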

Query representation
Language identification
Word segmentation (Japanese, Chinese)
Tokenization + normalization: {reviews, reviewer, reviewing} -> review
Query expansion
User intent (transactional, navigational, informational)
Local
Classification (images, video, news)
Topic Model (LDA)
Entity extraction
Spelling correction
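
The normalization example {reviews, reviewer, reviewing} -> review can be reproduced with a tiny suffix stripper. This is a deliberately simplified stand-in, not the Porter algorithm: the suffix list and the 4-letter minimum stem length are assumptions chosen for illustration.

```python
# Toy suffix stripper for illustration only; it is NOT the Porter algorithm.
SUFFIXES = ("ing", "er", "s")  # checked in this order


def stem(word, min_stem=4):
    """Strip the first matching suffix if at least `min_stem` letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


print({w: stem(w) for w in ["reviews", "reviewer", "reviewing"]})
# {'reviews': 'review', 'reviewer': 'review', 'reviewing': 'review'}
```

The minimum stem length keeps short words such as "sing" from being mangled; a production stemmer uses far more careful rules.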

Document representation
TF-IDF
Language Model: P(optimization | search, engine) >> P(walking | search, engine), i.e. after “search engine”, “optimization” is far more likely than “walking”
Probability Ranking Principle: P(R = 1 | d, q) or P(R = 0 | d, q), the probability that document d is relevant (R = 1) or not relevant (R = 0) to query q
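
A minimal TF-IDF sketch, assuming whitespace tokenization, raw counts for tf and idf = log(N / df). Many TF-IDF variants exist (log tf, smoothed idf, length normalization); this is just one common textbook form, and the three-document corpus is made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """One {term: weight} dict per document, with tf = raw count and
    idf = log(N / df), where df is the number of documents containing the term."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


docs = ["search engine optimization", "search engine walking", "walking shoes"]
weights = tf_idf(docs)
print(max(weights[0], key=weights[0].get))  # optimization
```

"optimization" appears in only one document, so it gets the largest idf and becomes the most distinctive term of the first document.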
Which method performs best?
What are the characteristics of sites that rank highly?
Data set: 14,000+ keywords; top 50 results; 600,000 URLs; Google-US, no personalization; March 2013
(Chart: mean Spearman correlation)
Remember: “correlation is not causation”
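
A sketch of how a "mean Spearman correlation" could be computed: one Spearman rho per keyword between a factor's values and the SERP positions, then averaged across keywords. The per-keyword averaging and the sign convention (positions negated so that a positive rho means "higher value, better position") are assumptions about the methodology, which the slide does not spell out.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_spearman(per_keyword):
    """per_keyword: list of (factor_values, serp_positions), one pair per keyword.
    Positions are negated so a positive rho means 'higher factor value,
    better (smaller) position'. The per-keyword rhos are then averaged."""
    rhos = [spearman(f, [-p for p in pos]) for f, pos in per_keyword]
    return sum(rhos) / len(rhos)


example = [
    ([0.9, 0.7, 0.5], [1, 2, 3]),  # factor agrees perfectly with rank
    ([0.1, 0.5, 0.9], [1, 2, 3]),  # factor disagrees perfectly with rank
]
print(round(mean_spearman(example), 6))
```

As the slide warns, even a high mean correlation says nothing about whether the factor causes the ranking.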
Which method performs best?
We tried a few different types of smoothing for the language model; Dirichlet smoothing worked best (Zhai and Lafferty, SIGIR 2001).
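
A sketch of query-likelihood scoring with a Dirichlet-smoothed unigram document model, P(w | d) = (tf(w, d) + mu * P(w | C)) / (|d| + mu), summed in log space over the query terms. The default mu = 2000 is a commonly used value, not necessarily the one used in this study; skipping terms that never occur in the collection is an assumption made to avoid log(0); the tiny documents are made up.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, doc, collection, mu=2000.0):
    """log P(q | d) under a unigram document language model with Dirichlet smoothing:
    P(w | d) = (tf(w, d) + mu * P(w | C)) / (|d| + mu)."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for w in query:
        p_coll = coll_tf[w] / len(collection)
        if p_coll == 0:
            continue  # assumption: skip terms unseen in the whole collection
        score += math.log((doc_tf[w] + mu * p_coll) / (len(doc) + mu))
    return score


d1 = "search engine optimization tips".split()
d2 = "walking tips for hikers".split()
collection = d1 + d2
query = ["search", "engine"]
for name, doc in [("d1", d1), ("d2", d2)]:
    print(name, round(dirichlet_log_likelihood(query, doc, collection, mu=10), 3))
```

Smoothing gives `d2` a small non-zero probability for "search" and "engine" (borrowed from the collection), so it is ranked below `d1` rather than scored as impossible.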
Impact of stemming
The Porter stemmer provided a slight increase in correlations.
These correlations are still relatively low compared to other ranking factors.
Example query: “movie reviews”
For each query: 500 pages
50 results from the SERP (10%, relevant)
450 random pages (90%, irrelevant)

Pages ranked by PA (Page Authority):
URL ID   PA   In SERP?
86       92   1
355      90   0
…        …    …
27       18   0

Pages ranked by language model score:
URL ID   Language Model   In SERP?
213      0.97             1
156      0.95             1
…        …                …
355      0.06             0

P@50 is the “precision of the top 50 results”: the percentage of the top 50 ranked results, by PA or by the language model, that are actually in the SERP.
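
P@k as defined above can be sketched in a few lines. The example uses only the rows visible on the slides and k = 2 instead of 50, so the printed numbers illustrate the calculation and are not the study's results.

```python
def precision_at_k(rows, k):
    """rows: (url_id, score, in_serp) tuples with in_serp in {0, 1}.
    Rank rows by score (highest first) and return the fraction of the
    top k that are actually in the SERP."""
    top_k = sorted(rows, key=lambda row: row[1], reverse=True)[:k]
    return sum(row[2] for row in top_k) / k


# Only the rows visible on the slides; the real evaluation used k = 50 over 500 pages.
pa_rows = [(86, 92, 1), (355, 90, 0), (27, 18, 0)]
lm_rows = [(213, 0.97, 1), (156, 0.95, 1), (355, 0.06, 0)]
print(precision_at_k(pa_rows, k=2), precision_at_k(lm_rows, k=2))  # 0.5 1.0
```

Because exactly 50 of the 500 pages come from the SERP, a random ordering would give an expected P@50 of about 0.1, which is the natural baseline for comparing PA and the language model.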
Takeaways
Implication: Query-document similarity is based on decades of research. It’s immune to algorithm change.
Action item: Because search engines use sophisticated query and document models, there is no need to optimize separately for similar words, e.g. “movie reviews” vs “movie review”.
Action item: Each page is relevant to many different keywords, so optimize each page for a broad set of related keywords instead of a single keyword.
Use case: Content creation. What keywords will this new blog post target? Is it relevant to a set of queries?
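
For the content-creation use case, a rough first check is how much of each target query's vocabulary a draft already covers. This is a hypothetical helper, not a Moz tool: term overlap after toy plural stripping is a much cruder signal than the query and document models discussed above, and the draft and queries are made up.

```python
import re


def terms(text):
    """Lowercased word tokens with a plural "s" stripped (toy normalization)."""
    return {t[:-1] if len(t) > 3 and t.endswith("s") else t
            for t in re.findall(r"[a-z0-9]+", text.lower())}


def query_coverage(draft, queries):
    """For each target query, the fraction of its normalized terms found in the draft."""
    draft_terms = terms(draft)
    coverage = {}
    for q in queries:
        q_terms = terms(q)
        coverage[q] = len(q_terms & draft_terms) / len(q_terms) if q_terms else 0.0
    return coverage


draft = "Our movie review roundup: the best new films and where to stream them."
print(query_coverage(draft, ["movie reviews", "new movies", "streaming films"]))
# {'movie reviews': 1.0, 'new movies': 1.0, 'streaming films': 0.5}
```

"streaming" does not match "stream" here because the toy normalizer only strips plurals; a real stemmer closes exactly this kind of gap, which is why separate pages for near-identical keyword variants are unnecessary.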
Thanks for watching!
Matthew Peters
matt@moz.com @mattthemathman