LOG FILE ANALYSIS 
The most powerful tool in your SEO toolkit 
Tom Bennet 
Consultant, Builtvisible 
@tomcbennet
Getting Started
What is a log file? 
A record of all hits that a server has received – humans and robots. 
http://www.brightonseo.com/about/
1. Protocol, 2. Host name, 3. File name
Host name -> IP address via DNS -> Connection to server -> HTTP GET request via protocol for file -> HTML to browser
They’re not pretty…
…but they’re very powerful. 
188.65.114.122 - - [30/Sep/2013:08:07:05 -0400] "GET /resources/whitepapers/retail-whitepaper/ HTTP/1.1" 200 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Fields: Server IP, Timestamp (date & time), Method (GET / POST), Request URI, HTTP status code, User-agent
Log Files & SEO
What is Crawl Budget? 
Crawl Budget = The number of URLs crawled on each visit to your site. 
Higher Authority = Higher Crawl Budget
Crawl Budget Utilisation 
http://example.com/thin-product-page-1 
http://example.com/category/thin-product-page-1 
http://example.com/category/subcategory/thin-product-page-1
http://example.com/category/subcategory/thin-product-page-1?colour=blue
Etc…
Conservation of crawl budget is key.
Working With Logs
Preparing Your Data 
Extraction: Varies by server. See accompanying guide. 
Filter: By Googlebot user-agent, validate the IP range. https://support.google.com/webmasters/answer/80553?hl=en
Tools: Gamut and Splunk are great, but you can’t beat Excel.
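If you’d rather filter inside Excel after import, a helper column added to the table is one way; a minimal sketch, assuming the parsed logs sit in a table named Logs with a User-agent column (both names are illustrative):
=ISNUMBER(SEARCH("Googlebot", [@[User-agent]]))
TRUE flags rows whose user-agent contains "Googlebot"; filter the column to TRUE. User-agents can be spoofed, hence the IP-range validation described at the Google link above.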
Working in Excel 
1. Convert .log to .csv 
(cool tip: just change the file extension)
Working in Excel 
2. Sample size 
(60-120k Googlebot requests / rows is a good size)
Working in Excel 
3. Text-to-columns 
(a space will usually be a suitable delimiter)
Working in Excel 
4. Create a table 
(Label your columns, sort by timestamp)
Investigate
Most vs Least Crawled 
Formula: Use COUNTIF on Request URL. 
Tip: Extract top-level category for crawl distribution by site-section.
http://www.brightonseo.com/speakers/person-name/
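As a sketch of the COUNTIF approach, assuming a Logs table with a Request URL column and a deduplicated URL list starting in cell A2 of a summary sheet (names are illustrative):
=COUNTIF(Logs[Request URL], A2)
Copy down, then sort descending and ascending to surface the most and least crawled URLs.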
Crawl Frequency Over Time 
Formula: Pivot date against count of requests. 
Tip: Segment by site section or by user-agent (G-bot Mobile, Images, Video, etc).
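A PivotTable is the quickest route; if you prefer formulas, COUNTIFS builds the same matrix. A sketch, assuming a Date column parsed from the timestamp, dates down column A, and user-agent names (e.g. Googlebot-Image) across row 1 (all illustrative):
=COUNTIFS(Logs[Date], $A2, Logs[User-agent], "*"&B$1&"*")
Fill across and down for a date-by-bot grid ready for charting.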
HTTP Response Codes 
Formula: Total up HTTP Response Codes. 
Tip: Find most common 302s or 404s, filter by code and sort by URL occurrence.
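A sketch of the totals, again assuming a Logs table with a Status column (illustrative):
=COUNTIF(Logs[Status], 302)
Swap in each code (200, 301, 404, 500…) to build the summary, then filter the table to one code and rank URLs by occurrence with the COUNTIF above.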
Level Up 
Robots.txt – Crawl all URLs with Screaming Frog to determine if they are blocked in robots.txt. Investigate the most frequently crawled.
Faceted Nav Issues – Dedupe a list of unique resources, sort by times requested.
Sitemap – Add your sitemap URLs into an Excel table, VLOOKUP against your logs. Which mapped URLs are crawl-deficient? (See the sketch below.)
CSS / JS – These resources should be crawlable, but are files unnecessary for rendering absorbing an inordinate amount of crawl budget?
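For the sitemap check, one possible shape for that VLOOKUP; a sketch assuming your sitemap URLs sit in a table column named URL, in the same format as the logged request paths (trim domains first if they differ; names are illustrative):
=IFERROR(VLOOKUP([@URL], Logs[Request URL], 1, FALSE), "Never crawled")
Mapped URLs returning "Never crawled" (or a low COUNTIF score) are your crawl-deficient pages.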
Top Level Crawl Waste 
Formula: Use IF statements to check for every cause of waste.
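One possible shape for those IF statements, as a label column added to the Logs table and checking a few common causes in order (conditions are illustrative, not exhaustive, and assume a numeric Status column):
=IF([@Status]=404, "404", IF([@Status]=302, "302 redirect", IF(ISNUMBER(FIND("?", [@[Request URL]])), "Parameter URL", "OK")))
COUNTIF the resulting label column to see where crawl budget is leaking.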
Crime = Solved
All Brighton SEO attendees will receive the guide via email.
THANKS FOR LISTENING 
Get in touch 
e: tom@builtvisible.com 
t: @tomcbennet 
Tom Bennet 
Consultant, Builtvisible 
@tomcbennet
Slide deck from Tom Bennet's presentation at Brighton SEO, September 2014. Accompanying guide can be found here: http://builtvisible.com/log-file-analysis/

Image Credits:
https://www.flickr.com/photos/nullvalue/4188517246
https://www.flickr.com/photos/small_realm/11189803763/
https://www.flickr.com/photos/florianric/7263382550
http://fotojenix.wordpress.com/2011/07/08/weekly-photo-challenge-old-fashioned/
