4. Quick Summary
“For most sites, crawl budget is not something to worry about. For really large sites, it becomes something to consider looking at. Prioritizing what to crawl, when, and how much resource the server hosting the site can allocate to crawling is more important for bigger sites, or those that auto-generate pages based on URL parameters.”
- Gary Illyes, Google
Read Full Article: https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html
5. Quick Summary
Crawl rate limit is designed to keep Google from crawling your pages so often and so fast that it hurts your server.
6. Quick Summary
Crawl demand is how much Google wants to crawl your pages. This is based on how popular your pages are and how stale their content is in the Google index.
7. Quick Summary
Crawl budget is “taking crawl rate and crawl demand together.” Google defines crawl budget as “the number of URLs Googlebot can and wants to crawl.”
9. In Easy Terms
Crawl budget is the number of times a search engine spider hits your website during a given period of time.
10. For Example
If Googlebot typically hits my site about 1,000 times a month, I can say that 1K is my monthly crawl budget for Google.
11. Please Note
There is no universal limit on the number and frequency of these crawls; we'll get to the factors that form your crawl budget in a moment.
19. There’s no way to find the exact number, but to get a feel for it, you can:
1. Check the average number of pages crawled per day in Google Search Console, Bing Webmaster Tools, etc.
2. Check your server access logs to learn more about crawler behaviour (see the log-parsing sketch below).
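As a rough illustration of point 2, here is a minimal Python sketch that counts Googlebot hits per day from a server access log. It assumes the common Apache/Nginx combined log format, and the file name access.log is a placeholder.

import re
from collections import Counter

DATE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):')  # matches e.g. [10/Jan/2017:

def googlebot_hits_per_day(path):
    hits = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if "Googlebot" not in line:  # crude user-agent filter
                continue
            match = DATE.search(line)
            if match:
                hits[match.group(1)] += 1
    return hits

for day, count in sorted(googlebot_hits_per_day("access.log").items()):
    print(day, count)

Keep in mind that anyone can fake a Googlebot user agent; a stricter audit would verify the requesting IPs with a reverse DNS lookup.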
21. Pages are crawlable
Check that all the important pages are crawlable:
- Not blocked by robots.txt
- No noindex meta tag and no X-Robots-Tag: noindex header on these pages
Also, ensure the exact opposite for unimportant pages (see the robots.txt sketch below).
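For the robots.txt side of this check, here is a sketch using Python's standard-library robot parser; example.com and the URL list are placeholders for your own site.

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

important_urls = [
    "https://example.com/products/",
    "https://example.com/blog/",
]
for url in important_urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "crawlable" if allowed else "BLOCKED by robots.txt")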
22. Please note: if you use the noindex meta tag or X-Robots-Tag, you should not disallow the page in robots.txt. The page must be crawled before the tag can be seen and obeyed.
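To spot-check both places a noindex directive can appear, here is a rough sketch using the third-party requests library; the URL is a placeholder, and the regex is a crude stand-in for a real HTML parser.

import re
import requests

def noindex_status(url):
    resp = requests.get(url, timeout=10)
    # Directive can live in the HTTP response header...
    header = resp.headers.get("X-Robots-Tag", "")
    # ...or in the robots meta tag (crude check; assumes name before content).
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)',
        resp.text, re.IGNORECASE)
    return {
        "header_noindex": "noindex" in header.lower(),
        "meta_noindex": bool(meta and "noindex" in meta.group(1).lower()),
    }

print(noindex_status("https://example.com/private/page"))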
23. No redirect chains
A URL redirect is a waste of crawl budget. With long redirect chains, spiders may drop off before they reach your destination page, which means that page won't be indexed.
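A quick way to spot a chain: requests records every hop in response.history, so a long list means a long chain. The URL here is a placeholder.

import requests

def redirect_chain(url):
    resp = requests.get(url, allow_redirects=True, timeout=10)
    hops = [(r.status_code, r.url) for r in resp.history]
    hops.append((resp.status_code, resp.url))  # final destination
    return hops

for status, url in redirect_chain("http://example.com/old-page"):
    print(status, url)

More than one or two hops before the final 200 is a chain worth flattening into a single redirect.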
24. No broken links
If a page is broken, it's probably not important to you, so it doesn't make sense to waste crawl budget on such pages. Of course, fix it if it's important!
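A minimal broken-link check over a list of internal URLs might look like this: anything answering 4xx/5xx is a candidate to fix or stop linking to. The URL list is hypothetical.

import requests

urls = [
    "https://example.com/",
    "https://example.com/old-offer",
]
for url in urls:
    try:
        # HEAD is lighter than GET; some servers reject it, so fall back if needed.
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    print(url, status)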
25. Handle URL parameters
Spiders treat dynamic URLs as separate pages even though they lead to the same page (same content). For all the parameters that don't change the page content, like tracking parameters, you can handle that by:
- Marking them as representative parameters in Search Console
- Using canonical tags
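One way to see the duplicate-page risk for yourself: strip the tracking parameters (here the common utm_* ones, an assumption) and compare the normalised forms.

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def canonical_form(url):
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if not k.startswith("utm_")]  # drop tracking parameters
    return urlunparse(parts._replace(query=urlencode(kept)))

a = "https://example.com/shoes?utm_source=news&color=red"
b = "https://example.com/shoes?color=red"
print(canonical_form(a) == canonical_form(b))  # True: same page to a spider

The canonical tag route means each parameterised variant points at one preferred URL, e.g. <link rel="canonical" href="https://example.com/shoes?color=red">.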
27. Site structure and internal linking
A well-organised internal linking structure will not only make your customers happy; spiders will also be able to understand the importance of pages at different levels.
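One way structure signals importance: a breadth-first walk from the homepage gives each page a click depth, and deeper pages tend to be crawled less often. The link graph below is a hypothetical stand-in for your real internal links.

from collections import deque

links = {
    "/": ["/category", "/blog"],
    "/category": ["/product-1", "/product-2"],
    "/blog": ["/blog/post-1"],
    "/product-1": [], "/product-2": [], "/blog/post-1": [],
}

depth = {"/": 0}
queue = deque(["/"])
while queue:
    page = queue.popleft()
    for target in links.get(page, []):
        if target not in depth:  # first visit = shortest click path
            depth[target] = depth[page] + 1
            queue.append(target)

for page, d in sorted(depth.items(), key=lambda item: item[1]):
    print(d, page)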
29. Use RSS Feeds
RSS feeds are a nice way to boost your readership and engagement, and they're also among the resources Googlebot visits most often.
When your website receives an update, like new products, blog posts, or site changes, submit it to Google's FeedBurner so that you can be sure it's properly indexed.
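For reference, here is a sketch of what a minimal feed entry contains, built with Python's standard library; in practice your CMS or blog platform generates this, and all values below are placeholders.

import xml.etree.ElementTree as ET

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Store"
ET.SubElement(channel, "link").text = "https://example.com/"
ET.SubElement(channel, "description").text = "New products and posts"

item = ET.SubElement(channel, "item")  # one entry per update
ET.SubElement(item, "title").text = "New product: red shoes"
ET.SubElement(item, "link").text = "https://example.com/shoes?color=red"
ET.SubElement(item, "pubDate").text = "Tue, 10 Jan 2017 09:00:00 GMT"

print(ET.tostring(rss, encoding="unicode"))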
30. Careful with rich media files
Most spiders still struggle to crawl rich media content like JavaScript, Flash, etc.
Google handles it much better than others, but it spends more resources crawling rich media, and hence uses more crawl budget on it.
Also read:
https://www.elephate.com/blog/javascript-vs-crawl-budget-ready-player-one/