40. Topic Modeling
LDA correlates w/ Google rankings better than any other on-page feature
http://www.seomoz.org/blog/content-optimization-revisiting-topic-modeling-lda-our-labs-tool
47. Causation? Not so fast?
• Good links may be more likely to point to more "relevant" pages
• Some other aspect of Google's algorithm may naturally bias towards these pages
48. Out of the SERPs!
• Keyword spamming might improve your LDA score, but probably not your rankings
49. How to use topic modeling
• Think about negative keywords the same way you would for PPC.
• Think about positive keywords in localized terms
50. Perspective: it’s all relative
• The numbers are RELATIVE
• Track numbers over time and aim for improvement
56. Twitter Data
Google:
“We use it as a signal in our organic and news rankings. We enhance our news universal by marking how many people shared an article.”
http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389
57. Twitter Test
Page A: 646 links from 36 root domains, 2 tweets
Page B: 1 link from 1 root domain, 522 tweets
http://www.seomoz.org/blog/how-do-tweets-influence-search-rankings-an-experiment-for-a-cause
58. Twitter: Clearly Influencing Google
Page B, the tweeted version, ranks #1!
Page A: 646 links from 36 root domains, 2 tweets
Page B: 1 link from 1 root domain, 522 tweets
http://www.seomoz.org/blog/how-do-tweets-influence-search-rankings-an-experiment-for-a-cause
59. Twitter Data for QDF
http://www.seomoz.org/blog/tweets-effect-rankings-unexpected-case-study
60. Author Authority
Do Search Engines Use Author Authority to Rank Pages in the SERPs?
Google: Yes, we compute and use author quality. We don’t know who anyone is in real life.
Bing: Yes. We calculate the authority of someone who tweets. For known public figures or publishers, we do associate them with who they are.
http://searchengineland.com/what-social-signals-do-google-bing-really-count-55389
63. What are Brand Signals?
Brands: Have real people working at a physical address
Generics: Often exist only online
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
64. Brand Signals
Brands: Have authentic, followed social accounts
Generics: Rarely have significant social accounts
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
65. Brand Signals
Brands: Display obvious, robust contact information
Generics: Frequently use email forms only
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
66. Brand Signals
Brands: Register with government/civic organizations
Generics: Stay “under the radar”
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
67. Brand Signals
Brands: Receive traffic from diverse sources
Generics: Search is often 90%+ of traffic
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
69. Brand Signals
Brands: Run offline marketing/advertising campaigns
Generics: Ignore the offline world
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
70. Entity Association
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
73. Search Engine Ranking Factors 2011
Preliminary Data
http://www.seomoz.org/blog/early-ranking-factors-data-an-april-linkscape-update
74. Big Changes from 2009 to 2011
• Link-Based Factors are waning
• Social Data is increasing
• Page-Level Link Metrics Fell the Most (from 43% to 22%)
• Keyword-Level Domain Metrics, Brand Data + Social Rising
The next update of the ranking factors will be online in April 2013.
76. From the Mouths of Googlers
Wired.com:
How do you recognize a shallow-content site?
http://www.wired.com/epicenter/2011/03/the-panda-that-hates-farms/all/1
77. From the Mouths of Googlers
Singhal: we ask…
“Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?”
http://www.wired.com/epicenter/2011/03/the-panda-that-hates-farms/all/1
78. From the Mouths of Googlers
Matt Cutts responds: we ask…
1. “Do you consider this site to be authoritative?
2. Would it be okay if this were in a magazine?
3. Does this site have excessive ads?”
http://www.wired.com/epicenter/2011/03/the-panda-that-hates-farms/all/1
79. From the Mouths of Googlers
Wired.com:
How do you implement that algorithmically?
http://www.wired.com/epicenter/2011/03/the-panda-that-hates-farms/all/1
80. From the Mouths of Googlers
Cutts:
…look for signals that recreate that same intuition
http://www.wired.com/epicenter/2011/03/the-panda-that-hates-farms/all/1
81. From the Mouths of Googlers
Singhal:
• Imagine in a hyperspace a bunch of points: some points are red, some points are green, and in others there’s some mixture. Your job is to find a plane which says most things on this side of the plane are red and most of the things on that side of the plane are the opposite of red.
http://www.wired.com/epicenter/2011/03/the-panda-that-hates-farms/all/1
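Singhal is describing a linear classifier. As a rough illustration only (nothing here is Google's actual method), this Python sketch uses the classic perceptron rule to find such a plane. It assumes each site is a point made of numeric signal scores, labels red as +1 and green as -1, and assumes the two colors can be separated by a plane; the points and the helpers `train_perceptron` and `side_of_plane` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def train_perceptron(points: List[Point], labels: List[int],
                     epochs: int = 100, lr: float = 1.0) -> Tuple[List[float], float]:
    """Learn a plane w.x + b = 0 with red points (+1) on one side and green (-1) on the other.

    Perceptron rule: whenever a point lands on the wrong side, nudge the plane toward it.
    It is only guaranteed to finish with zero mistakes if the colors are linearly separable.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or sitting exactly on the plane
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # every point is on its own color's side
            break
    return w, b


def side_of_plane(w: List[float], b: float, x: Point) -> int:
    """Return +1 for the 'red' side of the plane, -1 for the other side."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical two-signal scores for six sites: +1 = red, -1 = green.
points = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (-1.0, -2.0), (-2.0, -1.5), (-1.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(points, labels)
print(w, b, [side_of_plane(w, b, p) for p in points])
```

The perceptron is only one way to find a separating plane. With the mixed regions Singhal mentions, a method that tolerates some misclassified points, such as logistic regression or a soft-margin support vector machine, would be the more usual choice.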
82. Googlers want to know…
Are you trustworthy?
http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html
83. Are you an expert? Author?
http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html
84. Are your facts checked?
http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html
85. Are you genuinely interesting?
http://googlewebmastercentral.blogspot.com/2011/05/more-guidance-on-building-high-quality.html
107. Over SEO’ing: OUT!
(Slide graphic: the word “seo” repeated over and over to fill the page.)
128. Don’t “Look” Like a Content Farm
http://hubpages.com/hub/WomensFashionsofthe1920-FlappersandtheJazz-Age
129. Avoid “Classic” SEO Tactics
• Directory Link Building
• Keyword-Variant Abuse
• Reciprocal Link Pages
• Paid Links w/ Manipulative Anchor Text
• Sitewide, Footer Links
• Navigation for Engines, Not Humans
• Low Cost/Quality, Outsourced Content
• Generic Design and Layout
• Anonymous Contact Forms
• Anchor-Text Rich Internal Links
• Ad Blocks Dominating the Page
• Keyword Stuffed Titles + Pages
It’s great to do good SEO; just don’t look like the only reason the site exists is to draw Google traffic.
131. Become a “Brand”
Brands vs. Generics
• Brands have real people working at a physical address; generics often exist only online.
• Brands have authentic, followed social accounts; generics rarely have significant social accounts.
• Brands display obvious, robust contact information; generics frequently use email forms only.
• Brands register with government/civic organizations; generics stay “under the radar”.
• Brands receive traffic from diverse sources; for generics, search is often 90%+ of traffic.
• Brands generate branded search query volume; generics have little to no branded search demand.
• Brands run offline marketing/advertising campaigns; generics ignore the offline world.
http://www.seomoz.org/blog/the-next-generation-of-ranking-signals
132. Do Competitive Research
Where do these brands earn their links?
http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html
http://www.opensiteexplorer.org
139. When in Rome…
Find Your Corporate Voice
I’m excited to be able to share my life’s passion with you.
Phenomenal analysis of statements by Googlers + how they translate to content/marketing actions:
http://bit.ly/iGd7Pe
http://outspokenmedia.com/social-media/quora-hipsters/
140. Get your social on
• Stumble (upon)
• Thumb up
• Re-tweet
• Like it
• Share it
• Digg it
• Redd It
http://outspokenmedia.com/social-media/quora-hipsters/
141. Get your social on
And now… Pinterest
http://outspokenmedia.com/social-media/quora-hipsters/
142. Gillian Muessig
Founding President, SEOmoz
• Twitter: @SEOmom
• Blog: www.seomoz.org/blog
• Email: gillian@seomoz.org
Try SEOmoz Free for 60 Days (ICMA12)
Gillian Muessig – ICMA April 2012
Editor's Notes
Some queries are very simple: a search for "wikipedia" is unambiguous. It’s straightforward and can be returned effectively by even a very basic web search engine. Other searches aren't nearly as simple. Let's look at how engines might order two results. That is a simple problem most of the time, but it can become complex depending on the situation. Since Content A contains the word “Batman” and Content B does not, the engine can easily choose which one to rank.
The search engine can use TF*IDF to determine that “Wiggum” is a much less common word than “chief”, so a match on it carries more information, and thus that Content A is more relevant to the query than Content B. NOTE: This example also shows the inherent weakness of a metric like keyword density.
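A minimal Python sketch of the TF*IDF comparison, assuming a made-up corpus in which "chief" is common and "wiggum" is rare (the original Content A and B texts are not reproduced here). It uses one common smoothed IDF variant, log((1 + N) / (1 + df)) + 1; other weightings exist. Keyword density comes out equal for both hypothetical documents, while TF*IDF favors the one containing the rare term.

```python
import math
from collections import Counter
from typing import List

PUNCTUATION = ".,!?\"'"


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in tokens if t]


def idf(term: str, corpus: List[List[str]]) -> float:
    """Smoothed inverse document frequency: the rarer the term, the larger the weight."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log((1 + len(corpus)) / (1 + df)) + 1.0


def tfidf_score(query: str, doc: List[str], corpus: List[List[str]]) -> float:
    """Sum of (term frequency in doc) x (idf) over the query terms."""
    counts = Counter(doc)
    return sum(counts[t] / len(doc) * idf(t, corpus) for t in tokenize(query))


def keyword_density(query: str, doc: List[str]) -> float:
    """Share of the doc's tokens that are query terms (ignores how rare each term is)."""
    terms = set(tokenize(query))
    return sum(1 for t in doc if t in terms) / len(doc)


# Hypothetical corpus: "chief" is common, "wiggum" is rare.
background = [
    "The chief of police spoke today.",
    "Fire chief praises local crew.",
    "Chief executive resigns.",
    "The chief justice issued a ruling.",
]
content_a = tokenize("Wiggum arrived at the station.")
content_b = tokenize("Chief arrived at the station.")
corpus = [tokenize(d) for d in background] + [content_a, content_b]

query = "chief wiggum"
for name, doc in (("A", content_a), ("B", content_b)):
    print(name, round(keyword_density(query, doc), 2), round(tfidf_score(query, doc, corpus), 3))
```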
Using co-occurrence, the engine can determine that phrases like “Daily Planet” and “Clark Kent” frequently appear alongside “Superman” and, thus, that Content B is more relevant than Content A.
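The co-occurrence idea can be sketched the same way. This Python example is a hypothetical, word-level simplification: it counts how many documents in a tiny made-up corpus contain each word together with "superman", then scores a text by summing those counts. Neither hypothetical content piece mentions "superman", yet the one mentioning "clark kent" and "daily planet" scores higher. Real systems would treat phrases such as "Daily Planet" as units and use far larger corpora and normalized statistics.

```python
from collections import Counter
from typing import Iterable, Set

# Hypothetical stopword list so that filler words don't count as evidence.
STOPWORDS: Set[str] = {"the", "a", "at", "as", "is", "for", "about", "on"}


def terms(text: str) -> Set[str]:
    """Distinct lowercase words in a text, minus stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def cooccurrence_counts(query: str, corpus: Iterable[str]) -> Counter:
    """For each word, count the corpus documents in which it appears together with `query`."""
    counts: Counter = Counter()
    for doc in corpus:
        words = terms(doc)
        if query in words:
            counts.update(words - {query})
    return counts


def cooccurrence_score(text: str, counts: Counter) -> int:
    """How strongly a text's words are associated with the query (0 if none ever co-occur)."""
    return sum(counts[w] for w in terms(text))


background = [
    "superman works at the daily planet as clark kent",
    "clark kent is secretly superman",
    "the daily planet reported superman saved the city",
    "the city council met on tuesday",
]
counts = cooccurrence_counts("superman", background)

content_a = "a reporter wrote about the weather"
content_b = "clark kent wrote for the daily planet"
print(cooccurrence_score(content_a, counts), cooccurrence_score(content_b, counts))
```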
As humans reading both sentences, we can infer that Content B is obviously about a musical instrument (a piano) and the woman playing it. But a search engine armed with only the methods described above will struggle, since both sentences use the words “keys” and “notes”, some of the few clues to the puzzle. NOTE: We were pretty excited to see that our LDA modeling tool correctly scored B higher than A… but then things got REALLY interesting.
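SEOmoz's LDA tool is not published here, so the following is a generic textbook sketch of LDA fitted by collapsed Gibbs sampling, in plain Python. The toy documents, the number of topics (`k=2`) and the `alpha`/`beta` priors are assumptions chosen for illustration. Each document ends up with a topic mixture `theta`; on a corpus this small the split depends on the random seed, and which topic index ends up as "music" versus "office" is arbitrary.

```python
import random
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], k: int, iters: int = 200,
              alpha: float = 0.1, beta: float = 0.01,
              seed: int = 0) -> Tuple[List[List[float]], List[str], List[List[int]]]:
    """Fit LDA with collapsed Gibbs sampling.

    Returns (theta, vocab, topic_word), where theta[d][t] estimates how much
    document d is about topic t.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    doc_topic = [[0] * k for _ in docs]              # tokens in doc d assigned to topic t
    topic_word = [[0] * n_vocab for _ in range(k)]   # times word v is assigned to topic t
    topic_total = [0] * k                            # tokens assigned to topic t overall

    # Start from a random topic for every token.
    assignments: List[List[int]] = []
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            t = rng.randrange(k)
            row.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[w]] += 1
            topic_total[t] += 1
        assignments.append(row)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = assignments[d][i], index[w]
                # Take the token out of the counts before resampling it.
                doc_topic[d][t] -= 1
                topic_word[t][v] -= 1
                topic_total[t] -= 1
                # P(topic | everything else) is proportional to
                # (how much the doc uses the topic) x (how much the topic uses the word).
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][v] + beta)
                           / (topic_total[j] + n_vocab * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][v] += 1
                topic_total[t] += 1

    theta = [[(doc_topic[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    return theta, vocab, topic_word


# Hypothetical toy corpus: "keys" and "notes" appear in both music and office contexts.
texts = [
    "piano keys notes melody chord pianist",
    "pianist plays piano chord melody",
    "notes melody chord keys piano",
    "locked door keys office desk notes",
    "office desk memo notes door",
    "keys door lock office memo",
]
docs = [t.split() for t in texts]
theta, vocab, topic_word = lda_gibbs(docs, k=2)
for text, mix in zip(texts, theta):
    print([round(p, 2) for p in mix], text)
```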
For complex queries, or when relating large quantities of results with lots of content-related signals, search engines need ways to determine the intent of a particular page. A page that contains a keyword 4 or 5 times in prominent places, or even mentions similar phrases and synonyms, isn't necessarily relevant to the searcher's query.
In this imaginary example, every word in the English language is related to either "cat" or "dog"; they are the only topics available. To measure whether a word is more related to "dog," we use a vector space model that displays those relationships mathematically. The illustration does a reasonable job of showing our simplistic world. Words like "bigfoot" sit perfectly in the middle, no closer to "cat" than to "dog." But words like "canine" and "feline" are clearly closer to one than the other, and the size of the angle in the vector model illustrates this and gives us a number. BTW, in an LDA vector space model, topics wouldn't have exact label associations like "dog" and "cat" but would instead be things like "the vector around the topic of dogs." Now take the simple model above and scale it to thousands or millions of topics, each of which has its own dimension. Using this construct, the model can compute the similarity between any word or group of words and the topics it has created. You can learn more about this from Stanford University's posting of Introduction to Information Retrieval, <http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html>, which has a specific section on Vector Space Models <http://nlp.stanford.edu/IR-book/html/htmledition/dot-products-1.html>.
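The cat/dog picture can be reproduced numerically. In this Python sketch the two topic axes and the word coordinates are invented for illustration; the angle between a word's vector and each topic axis, computed from cosine similarity, is the "number" the note refers to. "bigfoot" sits at the same angle from both topics, while "canine" and "feline" lean clearly toward one axis.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors (1 = same direction, 0 = perpendicular)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angle_degrees(u: Vector, v: Vector) -> float:
    """Angle between two vectors; clamping guards acos against rounding just past +/-1."""
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine(u, v)))))


# Hypothetical two-topic world: one axis per topic, made-up word coordinates.
TOPICS: Dict[str, Vector] = {"cat": (1.0, 0.0), "dog": (0.0, 1.0)}
WORDS: Dict[str, Vector] = {"canine": (0.2, 0.9), "feline": (0.9, 0.2), "bigfoot": (0.5, 0.5)}

for word, vec in WORDS.items():
    print(word, {topic: round(angle_degrees(vec, axis), 1) for topic, axis in TOPICS.items()})
```

Because the functions accept sequences of any length, the same code works unchanged in a space with thousands of topic dimensions.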
The correlation of LDA scores with rankings is uncanny. Certainly, it's not a perfect correlation, but that's expected, given the complexity of Google's ranking algorithm. Seeing LDA scores show this dramatic result makes us seriously question whether there is causation at work here. We hope to do additional research via our ranking models to try to show that impact. Perhaps good links are more likely to point to pages that are more "relevant" according to a topic model, or perhaps some other aspect of Google's algorithm that we don't yet understand naturally biases towards these pages.
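One standard way to measure how well a score tracks rankings is Spearman's rank correlation. The Python sketch below computes it for a single made-up SERP; the positions and scores are hypothetical, not SEOmoz's data, and a study of this kind would typically compute a coefficient per query and then aggregate across many queries. As the note stresses, even a high coefficient says nothing about causation.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for i in order[start:end + 1]:
            ranks[i] = shared_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on ranks instead of raw values."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical SERP: positions 1-10 and a made-up topic score for each result.
positions = list(range(1, 11))
scores = [0.82, 0.75, 0.78, 0.60, 0.55, 0.58, 0.40, 0.45, 0.30, 0.20]
# Position 1 is best, so negate positions to make "higher = better" on both sides.
rho = spearman(scores, [-p for p in positions])
print(round(rho, 3))
```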
Like anything else in the SEO world, applying the process manipulatively is probably a terrible idea. Even if this tool worked perfectly to measure keyword relevance and topic modeling in Google, it would be unwise to simply stuff 50 keywords onto your page to get the highest LDA score you could. Quality content that real people actually want to find should be the goal of SEO, and Google is sophisticated enough to tell the difference between junk content that matches topic models and real content that real users will like, even if the tool's scoring can't.
Search engines have classically relied on a relatively universal algorithm, one that rates pages based on the metrics available, without massive swings between verticals. In the past few years, however, savvy searchers and many SEOs have noted a distinct shift to a model where certain types of sites have a greater opportunity to perform for certain queries. The odds aren't necessarily stacked against outsiders, but the engines appear to be biased toward the types of content providers that are likely to fulfill the users' intent. For example, when a user performs a search for "lamb shanks," it could make a lot of sense to give an extra boost to sites whose content is focused on recipes and food. Bill Slawski reported on Entity Association: rather than just looking for brands, it’s more likely that Google is trying to understand when a query includes an entity (a specific person, place, or thing). And if it can identify an entity, that identification can influence the search results that you see...
Google Plus Your World is about context. You get results that are biased to include what the people you are connected with on Google+ are talking about, reviewing, liking, or connected with.