Sites with any level of content production quickly build up pages that are outdated, no longer relevant or performing poorly. Left unmanaged, crawl budget may be wasted on low-quality pages and penalties may be incurred.
In this presentation, Sam shows you how to run a content audit in a way that saves you time.
Cut The Crap: Running Content Audits With Crawlers - Sam Marsden, Technical SEO, DeepCrawl
1. Cut the Crap: Running Content Audits With Crawlers
Sam Marsden, Technical SEO Executive
SEOCAMPIXX 2018
@sam_marsden SEOCAMPIXX
2. A bit about me...
About me...
Technical SEO Executive at DeepCrawl
3. Last year I started at DeepCrawl...
4. Soon after we received Series A funding...
5. This meant we could scale up...
Very happy CEO
6. ...and money was made available for a redesign of the website
7. A website redesign is a long process...
Source: http://ezsitecms.com/services/website-redesign/
8. ...and we wanted to migrate to a new CMS
Source: https://juliandontcheff.wordpress.com/2014/05/25/cross-platform-transportable-database-and-oracle-engineered-systems/
9. ...because we were suffering from plugin bloat
https://www.greenlanemarketing.com/wp-content/uploads/2015/03/index-bloat.jpg
10. And needed to manually re-enter the data into the new CMS
http://blog.transactionpro.com//wp-content/uploads/2015/07/shutterstock_139392815.jpg
11. ...so we only wanted to migrate the content that we needed
12. A content audit was in order!
We needed to:
● Discover the full extent of the site’s content inventory
● Attach relevant performance data to each of the site’s pages
● Create a set of criteria to decide what content to keep and what to get rid of
● Apply those criteria to the site’s pages
● Decide if content to keep should remain in its current form
13. How can we do this in a thorough but time-efficient way?
18. We want a fresh approach...
Thorough, time-saving, replicable
19. Content auditing is like a spring clean (German: Tiefenreinigung, a deep clean)
Think of running a content audit like you would a spring clean of your home.
1. First you need to find all the crap you have hidden away in your home.
Discovering all of your URLs
2. Then decide what is off-limits and definitely going to be kept.
Taking your core pages out of the equation
3. What’s your reasoning behind what will go?
Creating a set of criteria to make decisions on pages
4. Making the call on what gets binned, what stays and what gets a new lease of life.
Deciding what to do with your pages
https://tookapic.com/photos/36415
21. The Discovery Phase
Aim: To discover all of the existing URLs on your site.
Other guides suggest one of the following:
● Exporting a list of pages from your CMS
○ BUT pages may be missed, so it is not thorough
● Running a crawl
○ BUT a crawl on its own gives you a limited view of the data
● Exporting data from third-party tools and joining it to your crawl data
○ BUT joining the data is laborious and time-consuming, so it is not easily replicable
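To make the third option concrete, here is a minimal sketch of what "joining" means: a left join on URL between a crawl export and an analytics export. The column names and sample rows are hypothetical, not taken from any particular tool.

```python
import csv
import io

# Hypothetical exports: a crawl export and an analytics export, both keyed by URL.
CRAWL_CSV = """url,title,word_count
https://example.com/a,Page A,850
https://example.com/b,Page B,320
"""
ANALYTICS_CSV = """url,unique_pageviews
https://example.com/a,1200
"""


def read_rows(text):
    """Parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_url(crawl_rows, analytics_rows):
    """Left join: keep every crawled URL, attach analytics where available."""
    by_url = {row["url"]: row for row in analytics_rows}
    joined = []
    for row in crawl_rows:
        extra = by_url.get(row["url"], {})
        merged = dict(row)
        # Pages with no analytics row get 0 pageviews rather than being dropped.
        merged["unique_pageviews"] = int(extra.get("unique_pageviews", 0))
        joined.append(merged)
    return joined


inventory = join_on_url(read_rows(CRAWL_CSV), read_rows(ANALYTICS_CSV))
```

Doing this by hand for every data source, on every audit, is the laborious part.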
23. Putting Crawl Data at the Centre of Your Audit
By using a cloud crawling solution like DeepCrawl:
● You are not limited by scale: sites with hundreds, thousands or millions of URLs can be crawled
● You can easily bring in multiple data sources without needing to export tool data and import it into an Excel table
Instead of seeing the crawler as a single data source, put it at the centre of your content audit.
24. Running a crawl
25. Using Custom Extractions
You can also use custom extractions to pull out information from your site that can help inform your content audit.
● Authors - content performance by writer
● Published date, last modified date - to examine data in specific date ranges
● Structured and meta data - to see whether the presence of certain markup correlates with better organic performance
● Tagging - extract on-page article tags and meta keywords
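As a rough illustration of the idea, a custom extraction is a pattern run against each page's HTML. The sketch below uses regular expressions; the markup and class names are hypothetical, and a real site will need its own patterns.

```python
import re

# Hypothetical article markup; real sites will differ.
HTML = """
<article>
  <span class="author">Jane Doe</span>
  <time datetime="2018-01-15">15 Jan 2018</time>
  <a class="tag" href="/tag/seo">SEO</a>
  <a class="tag" href="/tag/audits">Audits</a>
</article>
"""

# One pattern per field; each captures the value in group 1.
PATTERNS = {
    "author": r'<span class="author">([^<]+)</span>',
    "published": r'<time datetime="([^"]+)"',
    "tags": r'<a class="tag"[^>]*>([^<]+)</a>',
}


def extract(html):
    """Return every match for each pattern, keyed by field name."""
    return {field: re.findall(pattern, html) for field, pattern in PATTERNS.items()}


fields = extract(HTML)
```

Each extracted field becomes another column you can filter or pivot on later.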
27. Now you’re going to have a large dataset
28. The Refining Phase
Aim: To take this raw data and cut it down to what is necessary and useful in order for you to make decisions on the content of your site.
https://www.sharpen-up.com/whittle-beginners-guide-wonderful-craft-whittling/
29. Chopping the data down to size
Go through the spreadsheet and start chopping it down to size.
Two parts:
1. Getting rid of unnecessary metrics (columns)
2. Removing pages that sit outside of the audit (rows)
30. The Whittling Phase
Removing pages that sit outside of the audit
Once you’ve decided on the metrics, you will want to remove pages that sit outside of the audit.
These may include:
● Category pages
● Paginated pages
● Core pages
● Faceted URLs
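One way to remove these rows is to filter by URL pattern. The sketch below is an assumption about how such rules might look; the paths and patterns are hypothetical and depend entirely on your own URL structure.

```python
import re
from urllib.parse import urlsplit

# Hypothetical URL rules; adjust them to your own site structure.
CORE_PATHS = {"/", "/pricing/", "/contact/"}
CATEGORY_RE = re.compile(r"^/(category|tag)/")
PAGINATED_RE = re.compile(r"/page/\d+/?$")


def in_audit_scope(url):
    """True if the URL is an article-level page worth auditing."""
    parts = urlsplit(url)
    if parts.query:                      # faceted / parameterised URLs
        return False
    if parts.path in CORE_PATHS:         # core pages are kept regardless
        return False
    if CATEGORY_RE.search(parts.path):   # category listings
        return False
    if PAGINATED_RE.search(parts.path):  # paginated listings
        return False
    return True


urls = [
    "https://example.com/",
    "https://example.com/category/news/",
    "https://example.com/blog/page/2/",
    "https://example.com/shoes?colour=red",
    "https://example.com/blog/content-audits/",
]
in_scope = [u for u in urls if in_audit_scope(u)]
```

Only the last URL in the sample list survives the filter.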
31. With your reduced dataset you can avoid this...
32. And let the streamlined and efficient content auditing commence...
33. What are you left with?
After you’ve cut down your dataset you want to be left with:
● Page descriptors - URLs, page titles, meta descriptions
● Page attributes - word count, published and last modified dates, links in/out, duplication, categories, author
● Performance metrics - backlink data, social shares, traffic, SERPs, impressions, time on page
35. Question No. 1: What is and isn’t performing well?
36. Defining a set of criteria for content performance
You need to define a set of criteria by which you judge content performance.
These will vary depending on the nature of the site.
For example:
A news site that generates revenue through ad impressions will define successful content differently from a B2B site that provides a niche service.
You may also have different expectations of content performance depending on the content type, e.g. mass-appeal vs. targeted content.
37. Defining a set of criteria for content performance
In the case of the DeepCrawl content audit we assessed content
performance based on:
● Unique pageviews
● Share count
● Backlink count
● Page Value* (Analytics)
* Inclusion relies on correct goal implementation
38. Number 2: How can you deal with content that isn’t performing well?
https://mylearningsolutions.org/2014/08/13/five-decision-making-pitfalls/
39. Dealing with poor performing content
In your spreadsheet you’ll want to create an ‘Action’ column.
This column will feature a set of options which you will use to categorise each page. These are the four C’s (or K’s) of content audit decision making:
1. Keep - Pages that are performing well and will not be changed significantly
2. Cut - Low-value pages that don’t deserve a place on your site, e.g. outdated content
3. Combine - Pages that include content that doesn’t warrant its own dedicated page but can be used to bolster another existing page
4. Convert - Pages with potential that you want to invest time in improving, e.g. partially duplicate content
40. Dealing with poor performing content
In DeepCrawl’s case we knew there was a lot of outdated content no longer providing value.
We could afford to be cut-throat and only keep content that:
● Had a publish date within the last year
● Or had a specified amount of traffic in Analytics or impressions in GSC Search Analytics
As it is a medium-sized site, we could review each poor-performing page and decide whether it could be combined with relevant pages or marked for rewriting.
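A keep rule like this can be expressed as a small function that pre-fills the ‘Action’ column. This is only a sketch: the thresholds below are hypothetical, since the talk does not state the actual numbers used, and anything that fails the rule is flagged for manual review rather than cut automatically.

```python
from datetime import date, timedelta

# Hypothetical thresholds; the talk does not state the actual numbers used.
MIN_PAGEVIEWS = 100
MIN_IMPRESSIONS = 1000


def suggest_action(page, today):
    """Return 'Keep' if the page passes either rule, otherwise 'Review'."""
    recent = page["published"] >= today - timedelta(days=365)
    performing = (page["pageviews"] >= MIN_PAGEVIEWS
                  or page["impressions"] >= MIN_IMPRESSIONS)
    return "Keep" if recent or performing else "Review"


today = date(2018, 3, 1)
pages = [
    {"url": "/new-post", "published": date(2018, 1, 10), "pageviews": 5, "impressions": 40},
    {"url": "/old-hit", "published": date(2015, 6, 1), "pageviews": 900, "impressions": 12000},
    {"url": "/old-miss", "published": date(2015, 6, 1), "pageviews": 3, "impressions": 20},
]
actions = {p["url"]: suggest_action(p, today) for p in pages}
```

Pages marked 'Review' are the ones to go through by hand and label Cut, Combine or Convert.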
41. Criteria creation
For pages where you aren’t sure what action to take, ask yourself:
● Is the page being seen in search and receiving traffic?
● Is the page actually bringing value to the site?
● How would the pages fare if they were put in front of Google’s search quality raters?
○ Do they exude Expertise, Authoritativeness and Trustworthiness?
○ If not, can the content be merged with a stronger page on a related topic, or is there the resource available to elevate that content?
42. No. 3: What can you do to get the most out of content that is performing well?
http://theleagueam.com/2017/06/24/coaching/
43. Filter your spreadsheet by what you want to keep...
https://balancedcarend.com/2013/11/21/healthy-holiday/squirrel-nut/
44. This will effectively be an exercise in content optimisation
This is where you can start using a fuller breadth of the data you’ve pulled in.
Key areas to focus on for content optimisation:
● Optimising titles & meta descriptions – Are titles and descriptions appealing propositions? Do they match the user intent in search?
● Keyword cannibalisation – Are multiple pages ranking for topically similar queries?
● Duplication issues – Is the content unique? Are near or true duplicates diluting authority?
● Linking – Are there internal/external linking opportunities? Relevant CTAs? Where does the page sit in the user journey?
● Page speed – Are there ways to reduce load time, e.g. image optimisation or cleaning up clunky code?
● Structured data – Is the existing implementation correct? Could additional markup be added?
● Tag pages – These can drive value if done well, but sites often have too many. What is the ratio of tags to articles? Can it be reduced to consolidate authority?
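As one example of these checks, cannibalisation candidates can be surfaced by grouping ranking pages by query. The sketch below is a simplification: it only catches pages ranking for the exact same query, not topically similar ones, and the (query, page) rows are hypothetical stand-ins for a search analytics export.

```python
from collections import defaultdict

# Hypothetical (query, page) pairs, e.g. from a search analytics export.
rows = [
    ("content audit", "/blog/content-audit-guide"),
    ("content audit", "/blog/how-to-audit-content"),
    ("crawl budget", "/blog/crawl-budget"),
]


def find_cannibalisation(pairs):
    """Map each query to its ranking pages, keeping queries with 2+ pages."""
    pages_by_query = defaultdict(set)
    for query, page in pairs:
        pages_by_query[query].add(page)
    return {q: sorted(p) for q, p in pages_by_query.items() if len(p) > 1}


candidates = find_cannibalisation(rows)
```

Each candidate still needs a human look: two pages ranking for one query is not always a problem.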
45. No. 4: How can you use this data to inform your content strategy?
https://www.eventbrite.co.uk/blog/video-event-industry-trends-and-the-future-of-events-ds00/
46. Need data driven insights because content marketing resources are finite...
https://www.freepik.com/premium-photo/empty-piggy-bank_1568162.htm
47. And need to ensure resources are invested into more of what works...
http://blog.peerform.com/will-banks-survive-competition-from-alternative-financial-markets/
48. Achieving this is all about finding relationships...
http://hopesrising.com/?p=5677
49. Finding patterns and relationships...
Aim: To establish patterns in the data, taking into account the objective of the site, in order to make decisions on content strategy.
● Particularly useful for large sites where page-by-page analysis isn’t an option.
● Doesn’t have to be a one-off exercise; it can form the basis of ongoing reporting for clients or internal teams.
Let’s take a look at some relationships which may be of interest.
50. Tool of choice: The Pivot Table - Pivoting variables around metrics
51. 1. Performance by channel/category/content type
Do some types of content perform better than others?
Group content into categories and look at them in terms of performance (views, shares, backlinks) and volume of production (no. of articles published).
Are you allocating content efforts efficiently?
Is time, money and effort being spent on the right types of content?
52. 1. Performance by channel/category/content type
● The majority of content resources are going into Sport and News.
● TV/Showbiz articles receive a much higher average number of pageviews, but far fewer of them are published.
● You’d want to investigate the possibility of upping production of TV/Showbiz articles to see if the higher average volume of traffic can be maintained.
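The same pivot can be sketched outside a spreadsheet: count articles and average pageviews per category. The rows below are made up purely to show the shape of the calculation and are not the figures from the slide.

```python
from collections import defaultdict

# Hypothetical rows; the figures are made up for illustration.
articles = [
    {"category": "Sport", "pageviews": 400},
    {"category": "Sport", "pageviews": 600},
    {"category": "News", "pageviews": 300},
    {"category": "TV/Showbiz", "pageviews": 2500},
]


def pivot_by_category(rows):
    """Return {category: (article_count, average_pageviews)}."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row["category"]][0] += 1
        totals[row["category"]][1] += row["pageviews"]
    return {cat: (n, views / n) for cat, (n, views) in totals.items()}


summary = pivot_by_category(articles)
```

Comparing the count with the average side by side is what shows where production effort and performance are out of step.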
53. 2. Content length and engagement
Is content length positively correlated with engagement?
● You can look at word count and time on page to determine this.
● Is there a point of diminishing returns, e.g. beyond 1,000 words?
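A simple way to put a number on this is the Pearson correlation coefficient between word count and time on page. The sketch below uses hypothetical data; note that a single coefficient will not reveal diminishing returns on its own, so it is worth plotting the points as well.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: word count and average time on page (seconds).
# Time on page flattens out for the longer articles.
word_counts = [300, 600, 900, 1200, 1500]
time_on_page = [40, 80, 110, 120, 125]

r = pearson(word_counts, time_on_page)
```

Here the coefficient is strongly positive even though the gains tail off after roughly 900 words, which is exactly the pattern the second bullet asks about.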
54. 2. Content length and engagement
If engagement doesn’t increase linearly with content length, then resources for content production can be used more efficiently:
● Create guidelines for content length based on insights.
● Select topics based on impact rather than content length.
● Build greater awareness of the time taken to create content and the likely impact that can be expected.
55. 3. Relationship between page speed and engagement
Is page speed harming bounce rate and conversion rate?
● Do some pages load more slowly than others?
● Why? Are some resource-heavy? Do images need to be optimised?
● This is important, especially for eCommerce, as load time and bounce rate are closely tied to conversion rate.
https://www.branded3.com/blog/mobile-speed-experience-googles-2-4-second-sweet-spot/
56. 4. Performance and engagement by author
How does content performance vary by author?
● Useful for sites with a high turnover of content, like news sites.
● Define ranges by which to rate content performance
○ E.g. poor, average, good, excellent based on pageviews
● Can be replicated on a weekly, monthly or quarterly basis for ongoing monitoring.
Author name          Poor  Average  Good  Excellent
Barton Haberkorn       26       64    11         60
Jacquelynn Kline       19       79     4         49
Claudette Etheredge    87       79    77         11
Sharell Phinney        73       31     8         20
Dane Shiner            51       54     7         90
Francesco Kirwin       84       90    21         57
Issac Asberry          54       78    29         47
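A table like the one above can be built by banding each article's pageviews and counting per author. This is a sketch under assumed cut-offs; the band limits and sample rows are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical pageview bands; choose cut-offs that suit your own site.
BANDS = [(100, "Poor"), (1000, "Average"), (10000, "Good")]


def rate(pageviews):
    """Return the first band whose upper limit exceeds the pageview count."""
    for upper, label in BANDS:
        if pageviews < upper:
            return label
    return "Excellent"


def ratings_by_author(articles):
    """Count articles per rating for each author."""
    table = defaultdict(Counter)
    for author, pageviews in articles:
        table[author][rate(pageviews)] += 1
    return table


articles = [("Author A", 50), ("Author A", 5000), ("Author A", 20000), ("Author B", 500)]
table = ratings_by_author(articles)
```

Re-running the same function on each week's or month's articles gives the ongoing monitoring view.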
57. 5. Performance fluctuations by publish date and time
Is content better received on specific days of the week, times of the day or months of the year?
Can you tailor content publication to times that are likely to get more exposure?
This may involve working non-standard hours or days to meet the demand of your audience.
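One way to check this is to group pageviews by the weekday of publication. The sketch below assumes a hypothetical timestamp format and made-up figures; the same approach works for hour of day or month.

```python
from collections import defaultdict
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical (publish timestamp, pageviews) pairs.
articles = [
    ("2018-02-05 09:00", 800),   # a Monday
    ("2018-02-12 18:30", 1200),  # a Monday
    ("2018-02-10 11:00", 300),   # a Saturday
]


def average_by_weekday(rows):
    """Average pageviews grouped by the weekday of publication."""
    buckets = defaultdict(list)
    for stamp, views in rows:
        published = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        buckets[DAYS[published.weekday()]].append(views)
    return {day: sum(v) / len(v) for day, v in buckets.items()}


by_day = average_by_weekday(articles)
```

With only a handful of articles per bucket the averages are noisy, so this is most useful on sites with a high volume of content.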
58. But this is just the beginning...
59. From here you want to automate the auditing process...
60. ...and pull this data into dashboards for continuous monitoring
61. And so to wrap up...
Within each audit you want to be able to answer:
1. What is and isn’t working well?
2. What should you do with poor-performing content?
3. How can you get even more out of your better-performing content?
4. What patterns can you find to ensure content resources are better allocated?
62. And so to wrap up...
The content auditing process should be centred around a cloud-based web crawling solution and be:
● Data-driven - So that it is thorough and backed by insights rather than intuition
● Automated - To save you time and ensure it’s a quick and painless process
● Frequent - Regularly replicated to assess the impact of changes and change course accordingly