4. Quick Summary
“For most sites, crawl budget is not something to worry about. For really large sites, it becomes something to consider looking at. Prioritizing what to crawl, when, and how much resource the server hosting the site can allocate to crawling is more important for bigger sites, or those that auto-generate pages based on URL parameters.”
- Gary Illyes, Google
Read Full Article: https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html
5. Quick Summary
Crawl rate limit is designed to keep Google from crawling your pages so often and so fast that it hurts your server.
6. Quick Summary
Crawl demand is how much Google wants to crawl your pages. This is based on how popular your pages are and how stale their content is in the Google index.
7. Quick Summary
Crawl budget is “taking crawl rate and crawl demand together.” Google defines crawl budget as “the number of URLs Googlebot can and wants to crawl.”
9. In Easy Terms
Crawl budget is the number of times a search engine spider hits your website during a given period of time.
10. For Example
If Googlebot typically hits my site about 1,000 times a month, I can say that 1K is my monthly crawl budget for Google.
11. Please Note
There is no universal limit on the number and frequency of these crawls; we'll get to the factors that form your crawl budget in a moment.
19. There’s no way to find the exact number, but to get a feel for it, you can:
1. Check the average number of pages crawled per day in Google Search Console, Bing Webmaster Tools, etc.
2. Check your server access logs to learn more about crawler behaviour (see the log-parsing sketch below).
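As a rough illustration of point 2, here is a minimal Python sketch that counts Googlebot hits per day from a server access log. It assumes the common Apache/Nginx combined log format, and the file name access.log is a placeholder.

import re
from collections import Counter

DATE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):')  # matches e.g. [10/Jan/2017:

def googlebot_hits_per_day(path):
    hits = Counter()
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if "Googlebot" not in line:  # crude user-agent filter
                continue
            match = DATE.search(line)
            if match:
                hits[match.group(1)] += 1
    return hits

for day, count in sorted(googlebot_hits_per_day("access.log").items()):
    print(day, count)

Keep in mind that anyone can fake a Googlebot user agent; a stricter audit would verify the requesting IPs with a reverse DNS lookup.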
21. Pages are crawlable
Check that all the important pages are crawlable:
- Not blocked by robots.txt
- No noindex meta tag and no X-Robots-Tag: noindex header on these pages
Also, ensure the exact opposite for unimportant pages (see the robots.txt sketch below).
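For the robots.txt side of this check, here is a sketch using Python's standard-library robot parser; example.com and the URL list are placeholders for your own site.

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

important_urls = [
    "https://example.com/products/",
    "https://example.com/blog/",
]
for url in important_urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "crawlable" if allowed else "BLOCKED by robots.txt")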
22. Please note: if you use the noindex meta tag or X-Robots-Tag, you should not disallow the page in robots.txt. The page must be crawled before the tag can be seen and obeyed.
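To spot-check both places a noindex directive can appear, here is a rough sketch using the third-party requests library; the URL is a placeholder, and the regex is a crude stand-in for a real HTML parser.

import re
import requests

def noindex_status(url):
    resp = requests.get(url, timeout=10)
    # Directive can live in the HTTP response header...
    header = resp.headers.get("X-Robots-Tag", "")
    # ...or in the robots meta tag (crude check; assumes name before content).
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]*content=["\']([^"\']*)',
        resp.text, re.IGNORECASE)
    return {
        "header_noindex": "noindex" in header.lower(),
        "meta_noindex": bool(meta and "noindex" in meta.group(1).lower()),
    }

print(noindex_status("https://example.com/private/page"))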
23. No redirect chains
A URL redirect is a waste of crawl budget. With long redirect chains, spiders may drop off before they reach your destination page, which means that page won't be indexed.
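A quick way to spot a chain: requests records every hop in response.history, so a long list means a long chain. The URL here is a placeholder.

import requests

def redirect_chain(url):
    resp = requests.get(url, allow_redirects=True, timeout=10)
    hops = [(r.status_code, r.url) for r in resp.history]
    hops.append((resp.status_code, resp.url))  # final destination
    return hops

for status, url in redirect_chain("http://example.com/old-page"):
    print(status, url)

More than one or two hops before the final 200 is a chain worth flattening into a single redirect.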
24. No broken links
If a page is broken, it's probably not important to you, so it doesn't make sense to waste crawl budget on such pages. Of course, fix it if it's important!
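A minimal broken-link check over a list of internal URLs might look like this: anything answering 4xx/5xx is a candidate to fix or stop linking to. The URL list is hypothetical.

import requests

urls = [
    "https://example.com/",
    "https://example.com/old-offer",
]
for url in urls:
    try:
        # HEAD is lighter than GET; some servers reject it, so fall back if needed.
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
    except requests.RequestException as exc:
        status = f"error: {exc}"
    print(url, status)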
25. Handle URL parameters
Spiders treat dynamic URLs as separate pages even though they lead to the same page (same content). For all the parameters that don't change the page content, like tracking parameters, you can handle that by:
- Marking them as representative parameters in Search Console
- Using canonical tags
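One way to see the duplicate-page risk for yourself: strip the tracking parameters (here the common utm_* ones, an assumption) and compare the normalised forms.

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def canonical_form(url):
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if not k.startswith("utm_")]  # drop tracking parameters
    return urlunparse(parts._replace(query=urlencode(kept)))

a = "https://example.com/shoes?utm_source=news&color=red"
b = "https://example.com/shoes?color=red"
print(canonical_form(a) == canonical_form(b))  # True: same page to a spider

The canonical tag route means each parameterised variant points at one preferred URL, e.g. <link rel="canonical" href="https://example.com/shoes?color=red">.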
27. Site structure and internal linking
A well-organised internal linking structure will not only make your customers happy; spiders will also be able to understand the importance of pages at different levels.
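One way structure signals importance: a breadth-first walk from the homepage gives each page a click depth, and deeper pages tend to be crawled less often. The link graph below is a hypothetical stand-in for your real internal links.

from collections import deque

links = {
    "/": ["/category", "/blog"],
    "/category": ["/product-1", "/product-2"],
    "/blog": ["/blog/post-1"],
    "/product-1": [], "/product-2": [], "/blog/post-1": [],
}

depth = {"/": 0}
queue = deque(["/"])
while queue:
    page = queue.popleft()
    for target in links.get(page, []):
        if target not in depth:  # first visit = shortest click path
            depth[target] = depth[page] + 1
            queue.append(target)

for page, d in sorted(depth.items(), key=lambda item: item[1]):
    print(d, page)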
29. Use RSS Feeds
RSS feeds are a nice way to boost your readership and engagement, and they're also among the resources Googlebot visits most often.
When your website receives an update, like new products, blog posts, or site changes, submit it to Google's FeedBurner so that you can be sure it's properly indexed.
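For reference, here is a sketch of what a minimal feed entry contains, built with Python's standard library; in practice your CMS or blog platform generates this, and all values below are placeholders.

import xml.etree.ElementTree as ET

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Example Store"
ET.SubElement(channel, "link").text = "https://example.com/"
ET.SubElement(channel, "description").text = "New products and posts"

item = ET.SubElement(channel, "item")  # one entry per update
ET.SubElement(item, "title").text = "New product: red shoes"
ET.SubElement(item, "link").text = "https://example.com/shoes?color=red"
ET.SubElement(item, "pubDate").text = "Tue, 10 Jan 2017 09:00:00 GMT"

print(ET.tostring(rss, encoding="unicode"))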
30. Careful with rich media files
Most spiders still struggle to crawl rich media content like JavaScript, Flash, etc.
Google handles it much better than others, but it spends more resources crawling rich media, and hence uses more crawl budget on it.
Also read:
https://www.elephate.com/blog/javascript-vs-crawl-budget-ready-player-one/