For our June 2019 event, Search Social & Attribution, we had two fantastic speakers.
Dale Nguyen presented: Developing a Data-Driven Link Building Strategy Using Google, Competitor, & Industry Insights
Francois Goube presented: What I learned from crawling 10 billion pages and analyzing 5 trillion log lines.
For full recaps of this and past events, head on over to utahdmc.org
33. THOMASARTS, FARMINGTON
Junior Business Analyst
‣ Study and design digital projects, products, and processes.
‣ 2-3 yrs experience + Bachelor’s degree
‣ https://thomasarts.bamboohr.com/jobs/view.php?id=118
#UTAHDMC
34. UTAHDMC.ORG/JOB-BOARD
PUT YOUR JOB IN FRONT OF OUR AUDIENCE!
‣ EMAILED TO OUR 4,500+ CONTACT DATABASE
‣ SHARED ON TWITTER, FACEBOOK AND SLACK
‣ ANNOUNCED AT MONTHLY EVENT
QUESTIONS? EMAIL AJ@UTAHDMC.ORG
36. DALE NGUYEN
DEVELOPING A DATA-DRIVEN LINK
BUILDING STRATEGY USING GOOGLE,
COMPETITOR, & INDUSTRY INSIGHTS
VP OF PARTNER DEVELOPMENT
MULTIFUSE
SARATOGA SPRINGS, UT
TONIGHT’S 1ST PRESENTER:
37. How to Create a Link Building Strategy Using
Google Results & Competitor Data
Dale Nguyen
VP of Business Development
dale@multifuse.com
linkedin.com/in/dalenguyen
38. Link building has been a primary ranking factor for a long, long time.
[In 2007,] link-related factors crowded the list [of top
ranking factors] - SearchEngineLand
The findings seem to illustrate the fevered pitch that link
building had reached around that time [2009-2013].
Who wouldn’t look at these results and dump a majority
of their resources into links in whatever shape or form
they could get? - SearchEngineLand
[In 2013], links are still believed to be the most important
part of the algorithm (approximately 40%). - Moz
[In 2016], backlinks remain an extremely important
Google ranking factor. We found the number of domains
linking to a page correlated with rankings more than
any other factor. - Backlinko
In summary [for 2019], our aggregated view shows a
very powerful correlation between links and ranking
position. - Stone Temple
39. There are two main categories for link building strategies.
“Lazy” strategies: forums, web 2.0, bookmarks, PBNs, paid links, blog comments, directories.
Value-based strategies: Q&A, local citations, widgets/tools, guest blogging, link reclaiming, broken link building, traditional outreach.
40. This is what lazy, unnatural link building looks like.
41. And this is what natural, value-based link building looks like.
42. What does Google think?
https://support.google.com/webmasters/answer/66356?hl=en
Google does not want:
● Paying or incentivizing a webmaster for a link.
● Excessive link exchanges.
● Large-scale campaigns with keyword anchor text.
● Automated programs/software.
Google does want:
● The webmaster to maintain editorial control.
● Value for the user should they click on the link.
43. The line in the sand between “lazy” and value-based strategies is editorial control.
44. “Your backlinks should be high authority!”
True, but only relative to the competitors ranking for your keywords.
45. Our step-by-step guide to creating a personalized link building strategy.
Get your copy of the tool: bit.ly/32xcRPF
46. Identify your page’s most important keyword.
If your website is already optimized:
● What is your page’s primary keyword, the one all other related keywords fall under?
If your website is not optimized:
● Ideally, you should have your keyword research done and your keywords mapped to pages first.
● Moz, SEMrush, and Ahrefs all have great tools to complete your research.
52. For each competitor root domain, record the linking domains by DA.
Manually record the data in the Step 4 tab of the tool for each competitor and DA tier.
53. Tip: if any of the competitors are strong outliers,
uncheck their column to avoid skewed data.
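The averaging behind “the ideal backlink profile” is done in the spreadsheet tool, whose exact formulas the slides do not show. As a rough sketch under that caveat, the Python below assumes the tool averages the Step 4 counts of linking domains per DA tier across the competitors that stay checked. The tier boundaries, competitor names and counts are hypothetical.

```python
from statistics import mean

# Hypothetical DA tiers; the real tool's tier boundaries may differ.
TIERS = ["DA 0-20", "DA 21-40", "DA 41-60", "DA 61-80", "DA 81-100"]

def ideal_profile(competitors, excluded=()):
    """Average linking-domain counts per DA tier across competitors.

    competitors: dict mapping competitor domain -> list of counts, one per tier.
    excluded: competitors "unchecked" because they are strong outliers.
    """
    kept = [counts for name, counts in competitors.items() if name not in excluded]
    if not kept:
        raise ValueError("at least one competitor must remain checked")
    return {tier: round(mean(c[i] for c in kept)) for i, tier in enumerate(TIERS)}

# Hypothetical Step 4 data for three competitors ranking for "UTV Accessories".
competitors = {
    "competitor-a.example": [120, 60, 25, 8, 2],
    "competitor-b.example": [100, 40, 15, 6, 1],
    "competitor-c.example": [900, 400, 150, 60, 20],  # strong outlier
}
print(ideal_profile(competitors, excluded={"competitor-c.example"}))
```

Leaving the outlier in would inflate the low-DA target from 110 to 373 linking domains, which is the skew the tip warns about.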
54. So, what does the ideal backlink profile look like for “UTV Accessories”?
55. “Great! But which strategy should we use?”
Any value-based strategy where the webmaster keeps editorial control.
Create linkable assets on your target pages to justify linking to them.
56. “What is a linkable asset?”
It’s a non-promotional statistic, opinion, or other educational resource.
Examples: whitepaper, blog post, FAQ, statistic, quote.
59. Let’s recap.
● Use value-based link building strategies.
● Target websites that maintain editorial control.
● Identify the needed quality and quantity of backlinks using our tool.
● Justify linking to your pages by adding linkable assets to them.
60. Ending remarks.
This tool does not address backlink relevancy, existing penalties, or anchor text ratios.
This tool is not perfect. It’s going to be directionally accurate, based on real data, and will help with timelines, expectations, and budgeting.
This tool will be much more accurate than the common shot-in-the-dark, gut-feeling strategy.
62. FRANCOIS GOUBE
WHAT I LEARNED FROM CRAWLING 10 BILLION PAGES AND ANALYZING 5 TRILLION LOG LINES.
CEO & FOUNDER
ONCRAWL
BORDEAUX AREA, FRANCE
TONIGHT’S 2ND PRESENTER:
63. What I learned crawling 10 billion URLs and analyzing 5 trillion log lines
By @FrancoisGoube, CEO @Oncrawl
67. “Au Menu” (on the menu)
Insights / fun facts / weird facts
Best practices
Demystifying some SEO myths
68. Data I collected: crawls and logs
1. I crawled 250k random websites from the Majestic Million, up to page depth 5 = ~8B URLs
2. With their agreement, I used the data of 97 Oncrawl customers (sites from 10k to 100M+ URLs) = ~2B URLs
Total: ~10B URLs
3. I looked deeply into the log data from these 97 customers (~230 websites) over 365 days = ~5 trillion log lines
69. I needed to map the websites.
I asked our engineers to run a machine learning model based on gradient boosting (regression trees) to classify the websites.
[Charts: distribution by number of websites; distribution by number of URLs]
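The slide names the technique but not the features or the library. To show what “gradient boosting on regression trees” means mechanically, here is a from-scratch sketch, not Oncrawl’s model: it boosts one-split regression trees (stumps) on the log-loss gradient to separate two categories only, while the talk’s model covered several (Ecommerce, Media, Classifieds, niche players). The two features per website (share of product-like URLs, share of URLs carrying a publication date) and the training data are hypothetical.

```python
import math

def fit_stump(X, residuals):
    """Fit a one-split regression tree (a "stump") to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return lambda row: m
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, n_rounds=50, learning_rate=0.3):
    """Binary gradient boosting on log loss; y holds 0/1 labels."""
    p = sum(y) / len(y)
    if p in (0.0, 1.0):
        raise ValueError("need examples of both classes")
    f0 = math.log(p / (1 - p))  # start from the log-odds of the base rate
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss with respect to each score: y - sigmoid(score)
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump(row) for s, row in zip(scores, X)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump(row) for stump in stumps))

# Hypothetical features: [share of product-like URLs, share of URLs with a pub date]
# Label 1 = ecommerce, 0 = media.
X = [[0.8, 0.05], [0.7, 0.1], [0.9, 0.0], [0.6, 0.2],
     [0.1, 0.9], [0.05, 0.8], [0.2, 0.7], [0.15, 0.95]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
model = train(X, y)
print(round(predict_proba(model, [0.85, 0.05]), 2))
```

A production setup would more likely use a library implementation with deeper trees and one model per category (or a multiclass loss); the stump version just keeps the boosting loop visible.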
75. Understand Google crawl budget
“Taking crawl rate and crawl demand together we define crawl budget as the number of URLs Googlebot can and wants to crawl.” (Google Webmaster Blog)
✓ Every day, Google chooses how many and which pages to crawl.
✓ Your mission is to bring Google to your money pages.
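One way to watch “how many and which pages” Google crawls each day is to count Googlebot requests in your server logs. The sketch below assumes Apache/Nginx “combined” log format and trusts the user-agent string. That is a simplification: the user agent can be spoofed, and Google recommends confirming Googlebot with a reverse DNS lookup. The sample lines are made up.

```python
import re
from collections import Counter, defaultdict

# "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+)[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def googlebot_crawl_per_day(lines):
    """Return {day: (Googlebot hits, distinct URLs crawled)} from access-log lines."""
    hits = Counter()
    urls = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        # User-agent check only; spoofed bots are not filtered out here
        if not m or "Googlebot" not in m.group("ua"):
            continue
        day = m.group("day")
        hits[day] += 1
        urls[day].add(m.group("path"))
    return {day: (hits[day], len(urls[day])) for day in hits}

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
sample = [
    f'66.249.66.1 - - [{day}:06:25:01 +0000] "GET {path} HTTP/1.1" 200 512 "-" "{ua}"'
    for day, path, ua in [
        ("10/Jun/2019", "/a", GOOGLEBOT_UA),
        ("10/Jun/2019", "/a", GOOGLEBOT_UA),
        ("10/Jun/2019", "/b", GOOGLEBOT_UA),
        ("10/Jun/2019", "/a", BROWSER_UA),
        ("11/Jun/2019", "/c", GOOGLEBOT_UA),
    ]
]
print(googlebot_crawl_per_day(sample))  # {'10/Jun/2019': (3, 2), '11/Jun/2019': (1, 1)}
```

Comparing the distinct-URL counts against your list of money pages shows whether the daily crawl is landing where you want it.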
76. What Google says about crawl budget
If you observe that new pages are usually crawled the same day they are published, then you don't really have to worry about crawl budget.
[…] if a site has fewer than a few thousand URLs, it will be crawled correctly most of the time.
[…] we do not have a single term to describe everything this term seems to mean from the outside.
78. What Google should say is…
Point me in the right direction,
help me discover and value your money pages!
81. Things to know about crawl budget
Crawl budget looks like a zero-sum game: your paid campaigns might hurt your crawl budget.
The same behavior was observed on 92% of the websites in the test.
[Chart: standard day vs. paid campaign day]
Dataset: 230 Oncrawl-monitored websites
82. ▪ Freshrank is the average time between two events:
1. Google crawls the page for the first time.
2. Google sends the first organic visit.
How much time does it take to get a visit on a new page?
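Following the definition above, Freshrank can be computed from two per-page dates: the first Googlebot hit (from logs) and the first organic visit (from analytics or logs). The sketch assumes you already have both as `URL -> date` mappings and skips pages that have not received an organic visit yet; the URLs and dates are hypothetical.

```python
from datetime import date
from statistics import mean

def freshrank(first_crawl, first_organic_visit):
    """Average days between a page's first Googlebot crawl and its first organic visit.

    Both arguments map URL -> datetime.date. Pages without an organic visit yet are skipped.
    """
    delays = [
        (first_organic_visit[url] - crawled).days
        for url, crawled in first_crawl.items()
        if url in first_organic_visit
    ]
    return mean(delays) if delays else None

first_crawl = {"/new-a": date(2019, 6, 1), "/new-b": date(2019, 6, 1), "/new-c": date(2019, 6, 2)}
first_visit = {"/new-a": date(2019, 6, 3), "/new-b": date(2019, 6, 11)}
print(freshrank(first_crawl, first_visit))  # (2 + 10) / 2 → 6
```

Skipping unvisited pages is a design choice: it keeps the average defined, but it hides pages that never get traffic, so it is worth tracking their count separately.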
[Chart: average Freshrank (days) by website category: Ecommerce, Media, Media Niche Player]
[Chart: average Freshrank vs. page depth (log scale), by website category: page depth 1 to 3, 3 to 5, over 5]
Dataset: 230 Oncrawl-monitored websites
If you want to drive organic traffic quickly, get your page depth analysis right.
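Page depth is usually the fewest clicks needed to reach a page from the home page, which is a breadth-first search over the internal link graph. The sketch assumes you already have that graph as a dictionary (for example from a crawl export). Whether the home page counts as depth 0 or 1 varies between tools, so `home_depth` is a hypothetical parameter, defaulted to 1 here.

```python
from collections import deque

def page_depths(links, home="/", home_depth=1):
    """Breadth-first search over internal links: depth = fewest clicks from the home page.

    links maps a URL to the URLs it links to. home_depth is an assumption:
    some crawlers count the home page as depth 0, others as depth 1.
    """
    depth = {home: home_depth}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:  # first visit in BFS = shortest click path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

site = {
    "/": ["/utv-accessories", "/about"],
    "/utv-accessories": ["/utv-accessories/roofs"],
    "/utv-accessories/roofs": ["/utv-accessories/roofs/model-x"],
}
print(page_depths(site))
```

Pages missing from the result are not reachable through links from the home page at all (orphans), which matters as much for Freshrank as deep pages do.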
83. State of the Mobile-First Index
How do you know whether you are being migrated to the mobile-first index?
Simply compare desktop vs. smartphone Googlebot hits in your logs.
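Googlebot Smartphone identifies itself with an Android, “Mobile” browser string plus the `Googlebot/2.1` token, while the desktop crawler does not. A rough sketch, classifying only on the user-agent string (no reverse DNS verification) and using illustrative user agents rather than an exhaustive list, could compute the smartphone share of Googlebot hits like this:

```python
from collections import Counter

def googlebot_type(user_agent):
    """Classify a user agent as Googlebot 'smartphone', 'desktop', or None."""
    if "Googlebot/" not in user_agent:  # skips browsers and e.g. Googlebot-Image/1.0
        return None
    return "smartphone" if "Mobile" in user_agent else "desktop"

def smartphone_share(user_agents):
    """Share of Googlebot hits made by the smartphone crawler."""
    counts = Counter(t for t in map(googlebot_type, user_agents) if t)
    total = counts["smartphone"] + counts["desktop"]
    return counts["smartphone"] / total if total else 0.0

SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/74.0.3729.131 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
DESKTOP_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
hits = [SMARTPHONE_UA] * 3 + [DESKTOP_UA, "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"]
print(smartphone_share(hits))  # 0.75
```

Tracked day by day, a jump in this share is the switch the slide suggests looking for.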
84. Are my competitors already in? What’s the state of my market?
State of the Mobile-First Index
86. How often does Googlebot render JS?
We checked only websites without any pre-rendering solution (yes, there are some…).
On average, Googlebot crawls these websites with JS rendering every 24 days.
88. The best-ranked pages are not always the most crawled by Google.
Crawl budget is not a ranking factor.
89. Do people really care about 3XX, 4XX & 5XX?
[Chart: share of 3XX, 4XX and 5XX URLs by website category: Media Niche Player, Ecommerce Niche Player, Media, Classifieds, Ecommerce]
Indexable URLs: 1,486,160,763
Not compliant URLs: 527,095,667
90. Impact of 3XX or 4XX on Googlebot
The errors and redirects encountered by Googlebot directly impact your crawl budget and how Google sees your website.
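To see how much of Googlebot’s activity goes to redirects and errors, group its hits by status class. The sketch assumes you have already extracted the status codes of Googlebot requests from your logs; the sample list is hypothetical.

```python
from collections import Counter

def status_class_shares(statuses):
    """Share of Googlebot hits per HTTP status class (2XX, 3XX, 4XX, 5XX)."""
    classes = Counter(f"{code // 100}XX" for code in statuses)
    total = sum(classes.values())
    return {cls: n / total for cls, n in sorted(classes.items())}

googlebot_statuses = [200, 200, 301, 404, 503, 200, 200, 301]  # hypothetical log sample
shares = status_class_shares(googlebot_statuses)
print(shares)  # {'2XX': 0.5, '3XX': 0.25, '4XX': 0.125, '5XX': 0.125}

# Share of crawl activity that did not land on a 2XX page
wasted = 1 - shares.get("2XX", 0.0)
```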
91. Impact of 5XX errors
[Chart: 5XX errors on pages ranking on the first page over the last 30 days, grouped by ranking change: gained positions; no loss; lost fewer than 5 positions; lost more than 5 positions]
We never found any direct correlation between 5XX errors and bot behavior…
…but you might lose some rankings.
Dataset: 230 Oncrawl-monitored websites (GSC vs. logs)
92. Best SEO trick of the year: cloaking 503
When migrating your website, simply cloak your pages for Googlebot with a 503 (Service Unavailable).
Googlebot will come back later and won’t index your pages.
Cloaking is not a crime
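The slide describes the trick only in words. A minimal sketch of how it could look on a Python `http.server` backend follows. It assumes a hypothetical `MIGRATING` flag and a one-hour `Retry-After` value, both our choices rather than the presenter’s. Requests whose user agent contains “Googlebot” get a 503, and everyone else gets the normal page. `Retry-After` is the standard HTTP header for telling a client when to come back.

```python
from http.server import BaseHTTPRequestHandler

MIGRATING = True  # hypothetical flag: switch off once the migration is finished
RETRY_AFTER_SECONDS = 3600  # assumption: ask crawlers to come back in one hour

def response_for(user_agent, migrating=MIGRATING):
    """Return (status, headers): 503 for Googlebot during the migration, 200 otherwise."""
    if migrating and "Googlebot" in user_agent:
        return 503, {"Retry-After": str(RETRY_AFTER_SECONDS)}
    return 200, {"Content-Type": "text/html; charset=utf-8"}

class MigrationHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = response_for(self.headers.get("User-Agent", ""))
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"<html><body>Normal page</body></html>")

# To try it locally:
# from http.server import HTTPServer
# HTTPServer(("127.0.0.1", 8000), MigrationHandler).serve_forever()
```

Keeping the decision in `response_for` separates the user-agent rule from the server plumbing, so the same rule could be ported to a CDN or reverse-proxy configuration.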
93. The state of AMP
% of pages implementing AMP: Ecommerce: 0.0004%; News: 0.007%; Classifieds: 0.0002%
Dataset: 1.4 billion compliant pages
94. Interesting facts
▪ Crawl frequency on AMP pages is always higher than on any other page type.
▪ AMP pages take a huge part of your crawl budget.
▪ The most advanced players (media) maintain a flat number of AMP pages (~5% of their pages, with a rule depending on publication date).
[Chart: AMP vs. not AMP]
95. The use of structured data
[Chart: use of Schema.org on articles, product pages and ads, by website category (Ecommerce, Classifieds, Media, Ecommerce Niche Player, Media Niche Player), split into 2+ schema types, only 1 schema type, and 0 schema types]
Dataset: only product pages, article pages and ads (~900M pages)
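To put your own pages into the same “0 / 1 / 2+ schema types” buckets, you can count the distinct `@type` values declared in a page’s JSON-LD. This is a regex-based sketch that only looks at JSON-LD script blocks (Microdata and RDFa are not covered), and the sample HTML is made up.

```python
import json
import re

JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def schema_types(html):
    """Collect the distinct schema.org @type values declared in JSON-LD blocks."""
    types = set()
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped, not counted
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            for node in item.get("@graph", [item]):
                t = node.get("@type") if isinstance(node, dict) else None
                if isinstance(t, list):
                    types.update(t)
                elif isinstance(t, str):
                    types.add(t)
    return types

def schema_bucket(html):
    n = len(schema_types(html))
    if n == 0:
        return "0 schema types"
    return "only 1 schema type" if n == 1 else "2+ schema types"

product = ('<script type="application/ld+json">{"@context": "https://schema.org", '
           '"@type": "Product", "name": "UTV roof"}</script>')
crumbs = ('<script type="application/ld+json">{"@context": "https://schema.org", '
          '"@type": "BreadcrumbList"}</script>')
print(schema_bucket(product + crumbs))  # 2+ schema types
```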
96. The real impact of structured data
Pages with structured data get rich snippets, and a way better CTR!
This is a good way to start predicting your SEO ROI.
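As a back-of-the-envelope version of that ROI prediction, multiply impressions by the expected CTR gain, then by conversion rate and order value. Every input below is a hypothetical placeholder. In practice the CTR before and after would come from comparing pages with and without rich snippets in Search Console.

```python
def rich_snippet_uplift(impressions, ctr_plain, ctr_rich, conversion_rate, avg_order_value):
    """Extra clicks and revenue if rich snippets lift CTR from ctr_plain to ctr_rich."""
    extra_clicks = impressions * (ctr_rich - ctr_plain)
    extra_revenue = extra_clicks * conversion_rate * avg_order_value
    return extra_clicks, extra_revenue

# Hypothetical inputs: 10,000 monthly impressions, CTR 2% -> 3%, 2% conversion, $50 orders
clicks, revenue = rich_snippet_uplift(10_000, 0.02, 0.03, 0.02, 50)
print(round(clicks), round(revenue))
```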
100. ▪ Niche ecommerce players
It looks like Google knows each category very well.
101. ▪ Let’s look at content size:
Google behaves differently for each ranking factor, depending on the website category.
[Charts: Ecommerce; Classifieds]
102. ▪ Google treats your near duplicates differently depending on how you handled your canonicals.
The weird behaviour on near duplicates with bad canonicals
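The slides do not say how near duplicates were detected. A common approach, swapped in here as an assumption, is word shingling with Jaccard similarity. Once near duplicates are found, you can flag pairs that do not declare the same canonical, which is our hypothetical working definition of a “bad canonical”. The pages and the 0.8 threshold are made up.

```python
import re

def shingles(text, k=5):
    """Set of k-word shingles (overlapping word sequences) for a page's text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def near_duplicates_with_bad_canonicals(pages, threshold=0.8, k=5):
    """Flag near-duplicate page pairs whose canonical URLs disagree.

    pages maps URL -> (text, canonical URL). "Bad canonical" here is an assumption:
    two near duplicates that do not declare the same canonical.
    """
    items = [(url, shingles(text, k), canonical) for url, (text, canonical) in pages.items()]
    flagged = []
    for i, (url_a, sh_a, can_a) in enumerate(items):
        for url_b, sh_b, can_b in items[i + 1:]:
            if can_a != can_b and jaccard(sh_a, sh_b) >= threshold:
                flagged.append((url_a, url_b))
    return flagged

base = ("UTV roof rack for all models with easy install and free shipping "
        "across Utah this week only today")
pages = {
    "/a": (base, "/a"),
    "/b": (base.replace("today", "now"), "/b"),  # near duplicate, self-canonical
    "/c": ("Contact our team in Saratoga Springs for wholesale pricing", "/c"),
}
print(near_duplicates_with_bad_canonicals(pages))  # [('/a', '/b')]
```

The pairwise loop is quadratic; at the scale of the talk’s datasets, MinHash or SimHash would be the usual way to find candidate pairs first.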
103. Distribution of pages: in structure > crawled > ranked > active
[Chart: pages in structure; pages in structure crawled by Google; pages in structure that are ranked in Google; active pages in structure (values shown: 8%, 19%, 26%, 32%)]
Use the lookalike method to spot common characteristics of page groups:
! Which pages have metrics similar to those of active pages?
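The slide does not detail the lookalike method. One simple interpretation, labeled here as an assumption, is nearest-centroid ranking: z-score normalize each page metric, average the metrics of active pages, and rank non-active pages by their distance to that average. The metric names and values below are hypothetical.

```python
from math import dist
from statistics import mean, pstdev

def lookalikes(pages, active, features, top_n=3):
    """Rank non-active pages by how close their metrics are to the average active page.

    pages maps URL -> {metric name: value}; active is a set of URLs that get organic visits.
    Metrics are z-score normalized so that large-scale metrics (e.g. word count) do not dominate.
    """
    stats = {}
    for f in features:
        values = [metrics[f] for metrics in pages.values()]
        stats[f] = (mean(values), pstdev(values) or 1.0)  # guard against constant metrics

    def normalize(metrics):
        return [(metrics[f] - stats[f][0]) / stats[f][1] for f in features]

    vectors = {url: normalize(metrics) for url, metrics in pages.items()}
    centroid = [mean(vectors[url][i] for url in active) for i in range(len(features))]
    candidates = [url for url in pages if url not in active]
    return sorted(candidates, key=lambda url: dist(vectors[url], centroid))[:top_n]

pages = {
    "/p1": {"inlinks": 50, "depth": 2, "words": 800},
    "/p2": {"inlinks": 45, "depth": 2, "words": 750},
    "/p3": {"inlinks": 48, "depth": 2, "words": 780},
    "/p4": {"inlinks": 3, "depth": 7, "words": 120},
    "/p5": {"inlinks": 5, "depth": 6, "words": 150},
}
print(lookalikes(pages, {"/p1", "/p2"}, ["inlinks", "depth", "words"], top_n=1))  # ['/p3']
```

Pages that look like active pages but still get no visits are the natural first candidates to investigate.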