This document discusses conducting a forensic SEO audit using a 4-stage investigation model. Stage 1 involves preparation including interviews and initial assessments. Stage 2 is evidence acquisition through audits, logs, and tools. Stage 3 is analyzing the evidence to prioritize issues. Stage 4 disseminates results through a report, action items, and accountability measures. The document provides examples of evidence collection on technical issues like structured data, rendering, and geolocation. It emphasizes connecting clues to identify root causes and aligning stakeholders on next steps.
6. Perficient Digital 6
@renee_girard | #mkesearch
Hurry Up and Wait
48% of enterprise SEOs wait at least a year for critical technical changes to be implemented
Source: https://moz.com/blog/how-long-are-seos-waiting-for-their-most-important-changes
Computer Forensic Investigation Model
Source: Taylor, Haggerty & Gresty (2007), Organisational Model for Computer Forensics Investigations
Stage 1: Investigation Preparation
• Identify the purpose of the investigation
• Identify resources required

Stage 2: Evidence Acquisition
• Identify sources of digital evidence
• Preserve digital evidence

Stage 3: Analysis of Evidence
• Identify tools and techniques to use
• Process data
• Interpret analysis results

Stage 4: Results Dissemination
• Report findings
• Present findings
SEO Forensic Investigation
Stage 1: Investigation Preparation → Stage 2: Evidence Acquisition → Stage 3: Evidence Analysis → Stage 4: Results Dissemination
Case of Aggressive Bot Crawling
Crime: Aggressive bot behavior caused site outages
Perpetrator: Bingbot made up 69% of all events but only 10% of organic traffic
Case of Broken IP-Based Geolocation
Crime: Crawl barriers evident from rewritten title tags and Canada receiving 34%–50% of all Googlebot events
Perpetrator: IP-based redirects forced any user/bot from the US to the Canada homepage by default
Page Prioritization Inventory
Step 1: Create an inventory by exporting pages from all available tools into their own tab
Step 2: Compile all pages into a single column on a new tab and dedupe
Step 3: Add columns for critical metrics and VLOOKUP their values from their respective tab
Step 4: Crawl the list and add HTTP status codes
Step 5: Add a column for Priority Score that sums up all metrics
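The five spreadsheet steps above can also be scripted. A minimal sketch in pandas, with illustrative tool exports and column names (not real audit data), where a left merge stands in for VLOOKUP:

```python
import pandas as pd

# Hypothetical exports from each tool; URLs and metrics are placeholders.
analytics = pd.DataFrame({"url": ["/a", "/b"], "sessions": [120, 40]})
backlinks = pd.DataFrame({"url": ["/a", "/c"], "links": [15, 3]})
rankings = pd.DataFrame({"url": ["/b", "/c"], "keywords": [8, 2]})

# Step 2: compile all URLs into one deduped list.
inventory = pd.DataFrame({"url": pd.concat(
    [analytics["url"], backlinks["url"], rankings["url"]]
).drop_duplicates().reset_index(drop=True)})

# Step 3: the VLOOKUP equivalent is a left merge per metric tab.
for tab in (analytics, backlinks, rankings):
    inventory = inventory.merge(tab, on="url", how="left")
inventory = inventory.fillna(0)

# Step 5: Priority Score = sum of all metric columns.
metric_cols = ["sessions", "links", "keywords"]
inventory["priority_score"] = inventory[metric_cols].sum(axis=1)
print(inventory.sort_values("priority_score", ascending=False))
```

In practice you would weight or normalize the metrics before summing, since raw sessions and raw link counts sit on very different scales.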
Page Template Review
Step 1: Identify page templates
Ex. – CMS, category, subcategory, facet, product, blog post
Step 2: Run each template through compliance checks
Step 3: Determine issues that occur globally or across certain templates
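The global-vs-template distinction in Step 3 can be tracked programmatically. A minimal sketch with hypothetical template names and check results that flags any check failing on every template as a global issue:

```python
# Template compliance matrix; names and results are illustrative placeholders.
CHECKS = ["Title Tags", "Meta Descriptions", "Unique Text", "Structured Data"]

# One sample URL per template; each result recorded as (exists, complies).
results = {
    "category": {"Title Tags": (True, True), "Meta Descriptions": (True, True),
                 "Unique Text": (True, False), "Structured Data": (False, None)},
    "product": {"Title Tags": (True, True), "Meta Descriptions": (True, False),
                "Unique Text": (True, False), "Structured Data": (True, True)},
}

# A check that fails on every template is a global issue;
# anything else is template-specific.
global_issues = [
    check for check in CHECKS
    if all(results[t][check] != (True, True) for t in results)
]
print("Global issues:", global_issues)
```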
Page Template Review Example
Category Template: https://domain.com/sample-url

# | Area of Examination | Does it Exist? | Does it Comply? | Findings
1 | Title Tags | Yes | Yes |
2 | Meta Descriptions | Yes | Yes | Increase characters/pixels
3 | Meta Robots Tag | Yes | Yes |
4 | Header Tags | Yes | Yes | Optimize header tags
5 | Unique Text | Yes | No | Need unique, optimized copy
6 | Facets/Filters | Yes | No | Discuss options for crawl control
7 | Images | Yes | No | Add alt text
8 | Pagination | Yes | No | Discuss rel=next/prev options
9 | Structured Data | No | -- | Add breadcrumb markup
10 | Load More | No | -- |
11 | Video | No | -- | Discuss adding video
Stage 3: Evidence Analysis
• Connect the clues
• Direct vs. indirect relationships
• Prioritize collected evidence
• Expected LOE vs. business impact
• Threat vs. opportunity
• Nice-to-haves vs. must-haves
Evidence Analysis from a Search Engine POV
1. Discovery → 2. Crawling → 3. Indexing → 4. Retrieval → 5. Ranking
Google Sheets Action Items
# | Issue | Priority | Recommendation | Owner | Dev | Deliverable | Date Fixed
1 | Topic: What? Outline the subtopic or concern | So what? Critical, High, Medium, or Low | Now what? List potential solutions and your recommendation | Who? Assign who is responsible for implementation | Outline if dev resources will be required | Link to supporting document in the appendix | When? Document date of implementation
Executive Summary
100-foot view of what was done, how it was done, and top areas of concern.
Summarized table of issues listing low-LOE and high-business-impact items first.

# | Issue | Priority | Recommendations Summary
1 | Topic: What? | So what? | Now what?
Thanks everyone who made it out here to MKESearch tonight instead of Ikea.
Today I'm excited to discuss my CSI approach to SEO audits using a forensic investigation model
My name is Renee Girard and I’m a Senior Organic Search Strategist
I’ve been on the agency side for the past 6 years
I do SEO for medium to enterprise Fortune 1000 clients
When they let me, I’ve been fortunate enough to speak at Pubcon and the undergrad business classes at UWM and Marquette.
Started in DM as a student at UWM and joined MIMA. Honor to speak with you today.
I work at Perficient Digital, which is Perficient's digital agency
I am out of the Milwaukee, Wisconsin office so if i seem too happy to be here, it’s because it’s literally 35 degrees back home right now and still considered “SPRING BREAK!”
We’re hiring for SEO, PPC, and CRO nationwide! Go to PD.com/careers or connect with me if you’re interested or know someone who might be.
Whether it's after a botched platform migration or you're seeing a decline in traffic, the SEO audit is supposed to help solve SEO crimes, but a bad audit should be considered a crime in itself
I've seen a lot of them and I'm sure you have too
Traits of subpar audits:
GENERIC = UNDERWHELMING = SO WHAT?
It’s the one that looks generic or boilerplate, as if the auditor simply did a find and replace on the client name, leaving everyone underwhelmed and asking “SO WHAT?”
TMI = OVERWHELMING = NOW WHAT?
or...It’s the one that came from the auditor that lacked sufficient experience and got overwhelmed ending up stuck in the weeds and cycle of analysis paralysis and information overload, left asking “ok everything is broken, but NOW WHAT?”
I know from my own experience how easy it is to overload everyone with too much information especially at the enterprise level
The last technical audit I did could have gone down that route, as I was tasked to audit
one of the largest hat retailers with at least 9 million pages, over 1,200 GMB listings, serving 2 regions in both English and French
The result: I audited over 150 areas of this giant site to create a 53 page audit document, added over 50 new JIRA tickets, spent over 4 hours in meetings with the client mostly because they refused to read those 53 pages.
Then, had to still wait 4 months for 1 of those JIRA tickets to get implemented
4 months isn’t bad compared to a 2016 survey that found almost half or 48% of SEOs working for big companies had to wait at least a year for any critical technical changes to get done
Audits are all too often a hurry up and wait scenario
Audit is needed because performance may be suffering, you spend hours developing the audit, then the tickets sit in a JIRA project graveyard because there’s NEVER enough dev hours.
Nothing getting done makes it challenging to maximize your SEO investment since often any optimizations you do may be limited by technical barriers.
After that experience, I discovered a four-stage organizational model that’s used as the high-level process for computer or digital forensics investigations to solve cyber crimes.
This model can easily be applied as a guide to SEO audit investigations
This adopted forensic approach to SEO audits starts with investigation preparation followed by evidence acquisition, then leads into thorough evidence analysis and results dissemination.
These four stages, combined with actionable deliverables, can help the client and SEO investigator understand:
Risks
Short term and long term opportunities for growth
Quick wins -- everyone’s favorite!!
We begin with pre-investigation prep work as stage 1
Obtain warrants to access the following necessities:
Analytics
GSC (Google Search Console) / BWT (Bing Webmaster Tools)
Log files if they have access to them
Third-party tools they are paying for that you wouldn’t have access to otherwise
While you’re waiting for clearance, conduct an in-depth discovery interview with your key stakeholders and align on the audit’s overall mission
Designed for the investigator to understand the client’s past and current landscapes in both business and digital.
Make sure you design the questionnaire to cover the basics and customize as needed.
Get to know their business quick:
who are their stakeholders, KPIs, business goals, competitors, and unique value props.
Get to know their digital strategy quick:
outline their past and present SEO and paid search efforts and site relaunches, and keywords critical to their business
Now that you’ve done the necessary prep work, it’s time to collect evidence for stage 2
Before jumping right into an audit checklist, save yourself time later by first doing some initial assessments and data pulls
Then proceed with an audit checklist
I recommend the free and paid checklists provided by Annie Cushing as a foundation
Like any good detective, catalog evidence with screenshots, save everything, and take detailed notes
You never know when you may need this later for an audit appendix
Goals: Help you to understand what are most likely the top issues or at least the places where you will need to dig deeper later
GSC
Go through messages and configurations, check for manual action penalties, and review the new GSC reports
BWT
Who here uses BWT for auditing?
For those who use it, you already know that BWT aggregates all subdomains into one account and reports all XML sitemaps even if you haven’t submitted them UNLIKE GSC
Sitewide crawl or segmented crawl (Screaming Frog, DeepCrawl)
Connect APIs for more data, crawl as different user-agents (Googlebot for smartphones vs. Googlebot regular for mobile-first indexing), text-only vs. JavaScript rendering
Analytics
If you don’t trust it, do a quick check with SF extraction for UA-ID and GTM container
If you still don’t trust it, get a separate analytics audit and leave it to the experts
SEOs shouldn't do analytics audits!
Crime: duplicate subdomains and staging sites were indexed, earning legitimate links, and wasting valuable crawl budget -- very common problem when you use multiple CMSs
Case solved with BWT inbound links report, text to columns, pivot table to identify 18 subdomains
Didn’t know they existed, shouldn’t have existed, shouldn’t be indexed
Then did additional 301 redirects and added authentication for staging sites
More advanced assessments that really will set your audit apart from the rest
Includes log file analysis, page prioritization inventory, page template review, Search Visibility Matrix, and a rendering assessment
Escalate what CANNOT wait
Use best judgement and raise the alarm when appropriate
If you see a critical red flag, PLEASE do not wait until you’ve finished the audit
Analyzing log files is the only way to truly see the site how a bot does
Using SF Log File Analyser, I’ve been able to discover audit findings that can only be confirmed by logs
Examples include: aggressive bot crawling where Bing made up 69% of all events even though they only received 10% of traffic (typical of Bing)
Helped me to then take the necessary actions to slow them down with crawl delays.
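Outside of SF Log File Analyser, the same user-agent breakdown can be sketched with a few lines of scripting. The log lines and bot tokens below are illustrative placeholders, not the real site's logs:

```python
from collections import Counter

# Count crawl events per bot in combined-format access log lines.
log_lines = [
    '1.2.3.4 - - [10/Apr/2019:10:00:00 +0000] "GET /hats HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
    '5.6.7.8 - - [10/Apr/2019:10:00:01 +0000] "GET /hats HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '1.2.3.4 - - [10/Apr/2019:10:00:02 +0000] "GET /caps HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

BOTS = {"bingbot": "Bingbot", "Googlebot": "Googlebot"}

counts = Counter()
for line in log_lines:
    ua = line.rsplit('"', 2)[-2]  # the last quoted field is the user agent
    for token, name in BOTS.items():
        if token in ua:
            counts[name] += 1

total = sum(counts.values())
for bot, n in counts.most_common():
    print(f"{bot}: {n} events ({n / total:.0%})")
```

If one bot dominates crawl events the way Bingbot did here, a Crawl-delay directive in robots.txt is one throttling option; Bing honors Crawl-delay, while Googlebot does not. (In real logs, also verify bot IPs with reverse DNS, since user agents can be spoofed.)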
Case of broken IP-based geolocation
Used SF LFA again and found data that can only be provided by logs
Client has 6 regions (no US) that forces geolocation based on the IP address
If bot/user coming from US tries to visit the site, they will be forced to the CA homepage by default
Crime: broken Google indexing where, even when searching for the AR site, the title tag was rewritten to say "Canada"
Confirmed with log files: Googlebot's servers mostly come from the US, so the CA directories received upwards of 50% of all Googlebot events compared to the other 5 regions
Create an inventory of the most important pages and give them a priority score
Create an inventory of the pages from tools like SEMRush, analytics, Majestic into their own tab
Add all the URLs in to a new tab and dedupe
Add columns with vlookup statements to pull in critical metrics like traffic, revenue, links, and rankings
Crawl the list and add HTTP status codes
Add a sum column to give a Priority Score to each page
Page prioritization scores can be used for almost any report to help prioritize efforts of greatest impact
Ex. - list of 404s. Instead of providing a list from GSC crawl errors, use that list as an input then FILTER the HTTP status column to 40X errors
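That 404 workflow becomes a one-line filter once the inventory exists. A sketch with illustrative columns and values:

```python
import pandas as pd

# Filter the prioritization inventory to 40x errors and fix the
# highest-priority pages first. Data below is a placeholder.
inventory = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d"],
    "status": [200, 404, 410, 404],
    "priority_score": [135, 48, 90, 5],
})

errors_40x = (
    inventory[inventory["status"].between(400, 499)]
    .sort_values("priority_score", ascending=False)
)
print(errors_40x)
```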
Identify template-specific issues through a series of compliance checks
Find different page templates by manually reviewing the site and prioritization inventory
Run each page template through a list of compliance checks (customize)
Use that to determine if there are key issues occurring globally or template-specific
Outline key areas of examination like the ones I've listed below, then audit whether that component exists in the first place and, if it does, whether it complies with best practices
Ex. - For category template, yes, facets exist, but no they do not comply, because let’s say they are eating valuable crawl resources and causing index bloat. Findings = discuss options for crawl control.
SVS: Simple calculation that gives a numeric score to indicate search visibility of a keyword.
Purpose: demonstrate the value of ranking on page 1 of Google’s organic results for keywords with high search volume.
The higher the score, generally the better the visibility (it doesn't measure traffic or correlate to conversion). Calculation: Search Volume × (1 / Position) for one particular keyword
Matrix: Google Sheets that I created which automatically calculates the SVS of your client and their top four competitors as a whole and for any particular keywords or keyword phrases
Provides far more context and value to the client instead of just reporting off the top with “competitor x ranks for 25% more keywords” < client dominates search visibility for keywords containing the word “hat” or “hats” but falls behind with “store”
Download free matrix and read the full tutorial on how to use it with these bitly links
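The SVS calculation itself is simple enough to sketch. The keyword data below is illustrative, not from the matrix:

```python
# Search Visibility Score: volume * (1 / position). Placeholder keyword data.
keywords = [
    {"keyword": "hats", "volume": 60500, "position": 2},
    {"keyword": "hat store", "volume": 5400, "position": 9},
    {"keyword": "snapback hats", "volume": 8100, "position": 1},
]

def svs(volume, position):
    """Higher score = better visibility; a #1 ranking earns the full volume."""
    return volume * (1 / position)

for kw in keywords:
    kw["svs"] = round(svs(kw["volume"], kw["position"]), 1)

total = sum(kw["svs"] for kw in keywords)
print(f"Total SVS: {total}")
```

Summing SVS across a keyword set, for your client and each competitor, gives the side-by-side comparison the matrix automates.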
Mobile-first indexing – not a separate mobile INDEX (there is still 1 index)
Mobile-first = determining parity between mobile and desktop
Mobile-friendliness has nothing to do with mobile-first indexing
Rendering assessment -- Googlebot is no longer the once “blind 5 year old” and is generally able to render and understand web pages like a modern-ish browser. There are caveats and limitations like the 5 second rule, crawl budget, client-side vs. server-side rendering, which JS framework is being used, but in general, here’s the simplest way to do a rendering assessment:
Google IO – Rendering is deferred
Audit that all essential tags, links, and content are present in both versions of the page
Initial HTML
Rendered DOM
Decrypt the discrepancies between the two
If missing from rendered DOM, Google may not crawl/index (Google prefers rendered DOM)
If only in the rendered DOM:
Non-Google search engines may not crawl/index (limited JS rendering still)
Identify user event dependencies
Post-load events are not rendered
Remember: Google CANNOT click, scroll, or hover
What kind of page is the top offender of user-event dependencies? Location finders
Initial HTML
View page source
Fetch as Googlebot -- HTML snapshot (how the crawler sees the page within 5 seconds of rendering)
Screaming Frog’s original HTML column under View Source tab
Rendered DOM
Inspect element in Chrome 41 << Copy outerHTML
Google’s Rich Results Testing Tool << View Source Code -- MOST ACCURATE OPTION for DESKTOP (uses Chrome 41)
Google’s Mobile-Friendly Testing Tool << View Source Code & Snapshot – MOST ACCURATE for MOBILE
Screaming Frog’s rendered HTML column under View Source tab -- BEST OPTION FOR MANY PAGES, but uses Chrome 60 instead of what Google uses (Chrome 41)
Screaming Frog renders JS using Chrome 60, so it won't be exact
All the resources of a page (JS, CSS, imagery) need to be available to be crawled, rendered and indexed.
Google still requires clean, unique URLs for a page, and links to be in proper HTML anchor tags (you can offer a static link as well as calling a JavaScript function).
The rendered page snapshot is taken at 5 seconds, so content needs to be loaded by that time or it just won't be indexed.
If you run a search and can’t find them within the source, then they will be dynamically generated in the DOM and will only be viewable in the rendered code.
rendered DOM is what Google will eventually use to index a webpage’s content
When in JS rendering mode, you can now store the raw HTML and the rendered HTML to inspect the DOM from Screaming Frog's View Source tab
Side-by-side comparison for pages -- uses Chrome 60
Ex. - added meta robots tags of NOINDEX, FOLLOW via Adobe’s DTM to deindex a section of the site
Raw/original HTML shows INDEX, FOLLOW but rendered HTML shows NOINDEX, FOLLOW
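Catching that kind of discrepancy can be automated once you have both sources. A minimal sketch that diffs the meta robots directive between two illustrative HTML snippets; in practice the initial HTML comes from view source and the rendered DOM from one of the tools above:

```python
import re

# Placeholder snippets standing in for "view source" output and a
# headless-browser / Screaming Frog rendered-HTML export.
initial_html = '<head><meta name="robots" content="INDEX, FOLLOW"></head>'
rendered_dom = '<head><meta name="robots" content="NOINDEX, FOLLOW"></head>'

def meta_robots(html):
    """Extract the meta robots content attribute, or None if absent."""
    m = re.search(r'<meta\s+name="robots"\s+content="([^"]+)"', html, re.I)
    return m.group(1) if m else None

raw, rendered = meta_robots(initial_html), meta_robots(rendered_dom)
if raw != rendered:
    print(f"Discrepancy: initial={raw!r} vs rendered={rendered!r}")
```

The same pattern extends to title tags, canonicals, and internal links: extract each from both versions and flag anything present in one but not the other.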
Auditing many Magento 2 sites I’ve found a discrepancy between text-only and JavaScript rendered SF crawls
Crime: Text-only showed product images had no internal or inlinks pointing to them
Crime: JS rendering with store HTML options showed the inlinks to the images from the product pages in the rendered HTML and reported them in the missing alt text report
Perp: Magnifying widget needs JS and default M2 behavior is missing alt text for the main images on product detail pages
Stage 3: Evidence Analysis
Now that we’ve collected the key evidence, it’s time to connect each clue by identifying relationships that exist between them whether that be direct or indirect
Prioritize each piece of collected evidence
What is the expected level of effort compared to the anticipated business impact
Is this a threat or opportunity? Do no harm.
Is this something that’s fundamentally broken and therefore a must-have or just a nice-to-have?
Analyze evidence from many perspectives, but the primary lens must be the search engine's viewpoint
Do this by determining if there are any barriers that would prevent a page from getting through the entire search engine process starting from
Initial page discovery, crawling, indexing, retrieval, and finally ranking
Since these steps generally happen in sequence, any barrier that exists is a relative point of failure
Priority must be crawl!
Stage 4: Results Dissemination -- the final and most important stage of forensic investigation, because strong deliverables that cater to your stakeholders, regardless of their knowledge of SEO, can make all the difference for getting things done post-audit
Report findings in real-time giving the client visibility into issues as they are being discovered
Google Sheets Action Items list
Present and deliver results to the stakeholders
Executive summary
Audit Word Doc -- could also have a supplementary PowerPoint deck
Appendix -- reference appendix items from all deliverables to make it easy for everyone to reference especially devs who need examples and C-suites that need it for context and visualization
All audit deliverables should answer these three questions
What? So What? Now What?
Google Sheet updated in real-time
Update the list of issues, recommendations, and priorities in real-time as a master record of everything that is and isn’t implemented
Gives client opportunity to act on issues as they are recorded
This is the first part of the audit Word document and could be the only part C-Suite will actually read. Make sure it’s GOOD
What was done
How it was done
Top areas of concern and opportunity
Summarized issues table by priority
List recommendations that are low expected effort and high expected business impact first.
List should follow the what? So what? Now what? model
Audit structure by topic -- start with topic then list out related subtopics and repeat
Topic
Overview (education) -- what?
Subtopic
Current state (issues) -- so what?
Recommended state (recommendations) -- now what?
Priority
Severity scale based off of JIRA or PM tool (low to critical) based on risk, opp for growth, and quick wins
Audit topic example
Topic = Structured data
Overview (education) -- what?
Structured data is used by Google for indexing, enables rich results, and increases CTR
Subtopic = missing or incorrect schema.org markup
Current state (issues) -- so what?
Extracted markup using SF to find SD missing required values, has irrelevant values, violates Google's SD guidelines
Recommended state (recommendations) -- now what?
Provide markup code revisions per type, new markup to add, and the implementation methods such as inline microdata, JSON-LD, or even GTM alternative
Priority
Severity scale based off of JIRA or PM tool (low to critical)
Critical
Risk - Google manual action penalty
Opp for growth -- new markup types
Quick wins -- fix existing markup not validating
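For reference, breadcrumb markup of the kind recommended above generally looks like this JSON-LD sketch, following schema.org's BreadcrumbList type (the domain and names are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "Home",
     "item": "https://domain.com/"},
    {"@type": "ListItem", "position": 2, "name": "Hats",
     "item": "https://domain.com/hats"}
  ]
}
</script>
```

JSON-LD in the head (or injected via GTM) is usually the easiest of the implementation methods to maintain, since it stays separate from the visible page templates.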
Gone through the four stages of forensic investigation -- Case closed? No.
Post-Audit Implementation -- is easily the most challenging
How many have delivered an audit that went nowhere?
Commit: After completing the SEO forensics, collaborate with the client to commit to a short-term and long-term action plan.
Align: Since audits typically focus on technical SEO which can take up limited development resources, it’s important to understand the IT ticket workflow and current backlog.
Create tickets even if they sit in a backlog
Review backlog: You may have no choice but to wait months or even years for anything to get done. Adjust and reprioritize tickets every few months or as needed.
Document and share results: anything that actually gets done, get buy-in
Hold stakeholders accountable but within reason
Pick your battles carefully, but articulate opportunity costs to illustrate money being left on the table
A more effective model for auditing that cuts through boilerplate audits
Each stage allows you to go from completing necessary investigation prep work, to acquiring and analyzing evidence, to disseminating the results to your stakeholders
Ensure you present your findings with strategic/actionable deliverables including a discovery Q&A, Action Items, Executive Summary, Word Document, and Appendix that answer the WHAT, SO WHAT, and NOW WHAT
Finally, speed up the implementation process post forensics by committing to a post-audit action plan that aligns with their current priorities
NOW Case closed. Justice has been SERVED.
You guys like free stuff? Of course you do. Here’s a bunch of it. You’re welcome.
Check out the deck, blog post, mostly free tools, resources, and Chrome plugins for your next audit, and the SVS Matrix
Thanks everyone!
Slides are available on Slideshare
Tweet me questions