http://bit.ly/blog-post-2018
When projects fail, it becomes easy to do what’s quick instead of what’s right. Since technical SEO can take significant effort and usually relies heavily on the development team, it’s often the first channel to get neglected. This can lead to technical SEO debt, which limits the positive impact of SEO and often requires more development resources to correct in the long run. Do what you can to avoid forever haunting the JIRA ticket graveyard: prioritize your technical efforts and chip away at your repayment plan. Whether you inherited the debt or created it to meet a project deadline, the debt must eventually be repaid, often with interest. Google won’t wait forever.
Event >> https://www.pubcon.com/session-details?action=view&conference=pubcon77&record=157
2. Perficient Digital
Renee Girard
@renee_girard
• An SEO since 2012
• Cares far too much about technical SEO
• 6 years of digital agency experience
• Guest lecturer at Marquette University and University of Wisconsin-Milwaukee
4. Perficient Digital
What is Technical SEO Debt?
The consequences of technical SEO issues that are intentionally or unintentionally neglected.
Debt must eventually be paid, often with interest. This costs you more in the long run and limits both SEO and developer effectiveness.
6. Perficient Digital
How We Get into Debt
48% of enterprise SEOs wait at least a year to complete critical technical changes
Source: https://moz.com/blog/how-long-are-seos-waiting-for-their-most-important-changes
@renee_girard | #pubcon
9. Perficient Digital
Debt Management Strategies in Real Life
• Prevent & Avoid Debt: structured data scope creep
• Prioritize & Get Buy-In: HTTPS migration disasters
• Repay in Installments: XML sitemap struggles
11. Perficient Digital
Prevent & Avoid Debt
• Regularly incorporate SEO into the development process
• Propose creative solutions and alternatives
• Regularly crawl and audit
• Create custom alerts for analytics and third-party tools
12. Perficient Digital
Story Time
Including the currency symbol in the price markup used to validate and produce rich snippet results. Schema.org then changed its guidelines so that price cannot contain the ‘$’ sign and priceCurrency must be used instead.
All products lost their rich results
13. Perficient Digital
Story Time
• ~32 Hours: estimated time needed for the dev team to fix the price and priceCurrency markup warnings
• ~6 Hours: total hours used for analytics and SEO to research, develop, deploy, and QA all markup for 1,000+ pages via GTM
21. Perficient Digital
Prioritize & Get Buy-In
• Can it wait?
• What is the opportunity cost?
• What will make a noticeable outcome with the least effort?
• Outline issues, solutions, impact, & effort
• How will it improve UX and basic SEO?
• Discovery, crawling, indexing, retrieving, & ranking
• Can you tie it back to overall strategy?
• e.g. more traffic and revenue
23. Perficient Digital
Story Time
Red Flags
• 302s instead of 301s*
• 301 redirect rule removed
• Infinite redirect loop
• 301 redirected all pages to the homepage
• Lost paid search tracking
• Mixed content warnings
• Forgot to transfer GSC settings
• Images not redirecting
• Broken CSS styling for WP blogs
*Italics = fixed
24. Perficient Digital
Page Prioritization
• Step 1: Create an inventory of pages and files
• Step 2: Remove duplicates
• Step 3: Segment and prioritize pages based on KPIs
• Step 4: Crawl pre- and post-launch with different user-agents
• Note: Cloudflare blocks spoofed Googlebot
• Step 5: Save each crawl, correct, and repeat
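Steps 2 and 3 can be sketched as a weighted score per page. This is a minimal illustration only: the metric names, weights, and sample pages are hypothetical, and the talk does not specify a formula.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    traffic: int    # organic sessions (hypothetical metric)
    revenue: float  # attributed revenue (hypothetical metric)
    links: int      # referring domains (hypothetical metric)


# Hypothetical weights; tune them to whatever KPIs matter for the site.
WEIGHTS = {"traffic": 0.4, "revenue": 0.4, "links": 0.2}


def prioritize(pages):
    """Deduplicate by URL, then rank pages by a weighted, normalized score."""
    unique = list({p.url: p for p in pages}.values())  # Step 2: remove duplicates
    # Normalize each metric by its maximum so no single metric dominates.
    maxima = {m: max(getattr(p, m) for p in unique) or 1 for m in WEIGHTS}

    def score(p):
        return sum(w * getattr(p, m) / maxima[m] for m, w in WEIGHTS.items())

    return sorted(unique, key=score, reverse=True)  # Step 3: highest priority first


pages = [
    Page("/", 9000, 1200.0, 300),
    Page("/product-a", 4000, 5000.0, 40),
    Page("/blog/post", 500, 0.0, 10),
    Page("/product-a", 4000, 5000.0, 40),  # duplicate inventory row
]
ranked = prioritize(pages)
```

The ranked list can then drive which URLs get crawled first before and after launch.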
25. Perficient Digital
Upgrade Insecure Requests
"Upgrade Insecure Requests" is a CSP directive to indicate to HTTP clients/browsers that all resources must be accessed via HTTPS.
27. Perficient Digital
Repay in Installments
• Pitch it as an experiment or test case
• Preach iterative changes
• Be prepared to wait
• Be a team player
• Pick your battles
31. Perficient Digital
Story Time
“Dirt”
1. Changes sitemap file names daily*
2. Contains hreflang without alternates
3. Adds new pages but doesn’t remove (22 child sitemaps to 38 in one year)
4. Contains duplicates
5. Contains 30X, 40X, 50X HTTP statuses
6. Contains noindexed pages
7. Contains pages blocked by robots.txt
8. Contains soft 404s
9. Contains server errors
10. Contains facets with no products
*Italics = fixed
32. Perficient Digital
Story Time
• ~60 Hours: estimated time for the dev team to remove hreflang markup, make file names static, and build logic to remove “dirt”
• ~10 Hours: estimated time for SEO to generate and QA new sitemaps with hreflang for only high-value pages using third-party tools
33. Perficient Digital
XML Sitemap Auditing
Step 1: Create inventory of all sitemaps listed in robots.txt, BWT, GSC, and logs
Step 2: Crawl and export
Step 3: Identify issues like missing pages and “dirt”
Step 4: Correct or provide third-party generated sitemaps
Step 5: Test, upload, and submit
Step 6: Monitor indexed-to-submitted ratios, GSC, and repeat as needed
34. Perficient Digital
Debt Management Strategies
Prevent & Avoid Debt
• Add SEO into the dev process
• Propose creative solutions
• Crawl and audit regularly
• Create custom alerts
Prioritize & Get Buy-In
• Can it wait?
• What’s the opportunity cost?
• What will make a noticeable outcome with the least effort?
• How will it improve UX or basic SEO?
• Can you tie it back to overall strategy?
Repay in Installments
• Pitch it as an experiment
• Preach iterative changes
• Be prepared to wait
• Be a team player
• Pick your battles
37. Perficient Digital
More Takeaways
Prevent & Avoid Debt | Structured Data
• Validate the code using the SDTT, then audit for errors in bulk using Screaming Frog extractors
• Keep up to date with schema.org core and pending changes
• http://schema.org/docs/releases.html
• http://blog.schema.org/
• https://www.schemaapp.com/category/news/
• Implement and scale JSON-LD using Google Tag Manager or Dynamic Tag Manager (Launch)
• Allows for agile testing and revisions, but proceed with caution
38. Perficient Digital
More Takeaways
Prioritize & Get Buy-In | HTTPS Migrations
• Know what kind of certificate you have
• Transfer HTTP GSC settings, like URL Parameters, to the HTTPS account
• Track indexation of HTTP and HTTPS
• Be patient with indexation changes (takes longer than you think)
• Identify and fix mixed content using HTTPSChecker.net or Screaming Frog’s insecure content report
• Consider adding “upgrade-insecure-requests” CSP directive
• Spot errors with iterative crawling, especially after code pushes
39. Perficient Digital
More Takeaways
Repay in Installments | XML Sitemaps
• If all else fails, create the files manually using a third-party tool
• Follow sitemap guidelines and syntax best practices
• Include only high-value canonical URLs
• Have sitemaps automatically update via CRON job
• Audit for “dirt” by identifying low indexation ratios and crawling them
• Break apart sitemaps into smaller files nested under one single sitemap index
• Separate logically or by directories instead of limiting it to a number of URLs
Good morning! Excited to discuss technical SEO debt management with you today.
My name is Renee and I’m a Senior Organic Search Strategist
I care about technical SEO, probably too much – you’ll find out very quickly
On a mission to try to get everyone to care about technical SEO before everything is in flames
I work at Perficient Digital, which is Perficient’s digital agency
I am out of the Milwaukee, Wisconsin office
Ben is my colleague
Technical SEO = any technical action you take to improve search results
Technical SEO debt = a metaphor for the consequences of technical SEO issues being continuously ignored, intentionally or unintentionally
Over time, debt continues to pile up, and when you finally start paying it off (which is usually only once everything is going wrong) it often ends up costing you more in the long run
Everyone feels the pain of debt, whether you’re the SEO who’s realized the positive impact of optimizations is being limited or the developer who’s burdened with fixing compounding technical issues
Most projects fail (fail = don’t meet timeline and budget), which makes it easy to do what’s quick instead of what’s right.
Lack of Resources
“We’ll SEO it later”
Most often heard when a team doesn’t want a project to miss its timeline or go over budget, which means what’s quick can take priority over what’s right
Lack of Buy-In
SEO knowledge gaps, combined with the fact that technical SEO fixes are often not a priority because they can take up valuable dev resources and their ROI is difficult to measure
SEO = Content + Links
Platform Limitations
“It’s not possible”
Many platforms aren’t built to support SEO and have outdated tech
All this debt results in an endless queue of JIRA tickets and almost half of enterprise SEOs waiting at least a year for anything critical to get done
Enough doom and gloom! (RG style)
Side note – a client once likened me to Darth Vader because of the negative nature of SEO, but I digress – that’s a story for another time
Debt is Inevitable
You most likely won’t be able to fix everything but keep trying!
From the famous Parks & Rec Mouse Rat song, I FELL INTO THE PIT, YOU FELL INTO THE PIT, WE ALL FELL INTO THE (technical SEO debt) PIT
Now I’ll share with you real life stories from my clients where I am currently using strategies to manage technical SEO debt
Prevent & Avoid Debt – structured data scope creep
Prioritize & Get Buy-In – HTTPS migration Murphy’s Law edition
Repay in Installments – struggles with something as simple as XML sitemaps
Prevent & Avoid Debt – first step in managing debt is to try to prevent it from happening in the first place
Add SEO into the dev process
#1 thing you can do so that SEO isn’t disrupting the development flow and is woven into every aspect of the dev lifecycle
Become best friends with the dev team
We’ve seen migrations go so bad that we now require SEO as part of each one
Propose creative solutions
Not always a one-size-fits-all-approach to anything with SEO
Come up with multiple approaches
Crawl and audit regularly
Mistakes happen – from experience I can tell you that you don’t want to be the SEO who doesn’t notice a code push reverted the robots.txt file to disallow all
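A regular check for that exact failure can be sketched with Python’s standard-library robots.txt parser. The two robots.txt bodies are hypothetical, and the standard-library parser may not match Google’s handling of every rule (wildcards in particular), so treat this as a smoke test rather than a full audit.

```python
from urllib.robotparser import RobotFileParser


def blocks_everything(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt body forbids crawling the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Hypothetical file contents: one reverted to "disallow all", one healthy.
reverted = "User-agent: *\nDisallow: /\n"
healthy = "User-agent: *\nDisallow: /cart/\n"

reverted_is_blocked = blocks_everything(reverted)
healthy_is_blocked = blocks_everything(healthy)
```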
Create custom alerts
Detect statistically significant events like traffic anomalies and 404 spikes by setting up alerts in your analytics platform or in third-party tools like SEMrush
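As an illustration of what such an alert does under the hood, here is a minimal sketch of a z-score check on daily 404 counts. The threshold and the sample counts are hypothetical; analytics platforms use their own, usually more sophisticated, anomaly models.

```python
from statistics import mean, stdev


def is_anomaly(history, today, threshold=3.0):
    """Flag today's count if it sits more than `threshold` sample standard
    deviations away from the mean of the historical counts."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold


daily_404s = [12, 15, 11, 14, 13, 12, 16]  # hypothetical counts from the past week

spike = is_anomaly(daily_404s, 140)   # a code push broke a URL pattern
normal = is_anomaly(daily_404s, 14)   # an ordinary day
```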
Schema.org guidelines changed to require that the dollar sign or other currency symbol be excluded from the price value, and made the priceCurrency property a requirement
Both come back as just a warning, but in my experience, and for many of my clients, rich results were removed
Magento 2, which came out in recent years, didn’t reflect the update in its out-of-the-box (OOTB) markup
Not even Google’s Structured Data Markup Helper tool reflects this update
All 1,000+ products that once had rich results for price and availability were no longer eligible
Hurt especially when our clients had lower prices than the competition
32 hours to make the fix since it was considered out of scope for the platform migration
32 hours would take away from our budget for other more important fixes
I care too much about technical SEO
Investigated an alternative: implementing JSON-LD through GTM, which is respected by Google and will override whatever is on the page
Who is already using a tag manager like DTM or GTM or already deploying markup through a tag manager?
Between myself and the analytics analyst, it took us 6 total hours from start to finish. Since all products shared the same page template, we were able to extract variables from the HTML using CSS selectors to push through the dataLayer to create JSON-LD scripts for over 1,000 pages and regain eligibility for our rich results.
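The shape of the corrected markup can be sketched as follows. This is a Python illustration of the data only, not the GTM implementation described above: the function name, sample product, and the US-style price format are assumptions.

```python
import json
import re


def product_jsonld(name, description, raw_price, currency="USD"):
    """Build schema.org Product markup with a bare numeric price."""
    # Drop currency symbols and thousands separators: "$1,299.00" -> "1299.00".
    # Assumes US-style formatting with "." as the decimal separator.
    price = re.sub(r"[^\d.]", "", raw_price)
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": price,             # no "$" in the value
            "priceCurrency": currency,  # currency goes in its own property
        },
    }
    return json.dumps(data, indent=2)


markup = product_jsonld("Example Widget", "A hypothetical product.", "$1,299.00")
```

The resulting string is what would sit inside a `<script type="application/ld+json">` tag, however it gets deployed.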
Disclaimer on GTM/DTM:
Implementing the markup directly on the page (inline) is more reliable than deploying it through a tag manager, and is the ideal approach
John Mueller agrees and calls tag manager implementations “unpredictable”
Scripts can break easily – you have to strip out certain characters or the syntax breaks
Don’t get trigger happy – I accidentally had a regex error in the trigger and deployed article markup intended for one page to 800 pages
GSC reported no errors
Secure your account with two-factor authentication – obvious but some people can get their GTM accounts hacked
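The trigger mistake above is easy to reproduce. Here is a minimal sketch of the difference between a loose pattern and an anchored one, using hypothetical URL paths; GTM’s own regex matching may differ in details, but the anchoring principle is the same.

```python
import re

# Hypothetical URL path the article markup was meant for.
TARGET = "/blog/technical-seo-debt"

loose = re.compile(r"/blog/")                         # matches every blog URL
strict = re.compile("^" + re.escape(TARGET) + "/?$")  # matches only the target page

paths = [
    "/blog/technical-seo-debt",
    "/blog/another-post",
    "/blog/",
    "/products/widget",
]

loose_hits = [p for p in paths if loose.search(p)]
strict_hits = [p for p in paths if strict.search(p)]
```

Running a candidate pattern against a crawl export of all URLs before publishing the trigger shows how many pages it would actually fire on.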
1st example of JSON-LD scripts in GTM is using a custom HTML tag for static markup
Static = data that doesn’t change so if the content changes, it’s a manual process
Ideal for one page where the content isn’t changing like what we did for organization markup on the homepage
2nd example is a blended approach to JSON-LD where you mix static and dynamic values.
Dynamic = anything in the curly braces, these are tags that populate themselves using variables
We deployed this hybrid approach for product markup where the product name, description, and price all populated dynamically and the rest was static since it wouldn’t change.
Finally, the 3rd example is ideal because it’s fully dynamic and, as long as the HTML and CSS don’t change, will always auto-update as content changes.
For example, we wanted to do event markup for conferences which occur annually but in different cities.
We deployed the markup to all conference homepages so each year when the conference details change, we don’t have to update it.
SDTT validates code, not content
Validating the schema property values for missing requirements and inaccuracies is extremely important because GSC and SDTT won’t do this for you.
Ex: the regex trigger mistake
Easiest way to audit the property values is through SF extraction.
Using the extraction rules in the blog post I’ve linked, you can extract all markup detected across a site, whether that be OG tags, JSON-LD, or microdata.
If you’re on the free version, extraction won’t work for GTM/DTM implementations because it requires JS rendering
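For a sense of what a property-value audit checks, here is a minimal sketch that pulls JSON-LD out of static HTML and validates the offer values. It is not the Screaming Frog workflow: the class and function names and the sample page are hypothetical, it assumes each JSON-LD block is a single object, and like the free crawler it will not see markup injected by a tag manager.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def audit_offer(html):
    """Return a list of problems found in Product offer values."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    problems = []
    for block in extractor.blocks:
        if block.get("@type") != "Product":
            continue
        offer = block.get("offers", {})
        price = str(offer.get("price", ""))
        if not price:
            problems.append("missing price")
        elif not price.replace(".", "", 1).isdigit():
            problems.append(f"price is not a bare number: {price!r}")
        if not offer.get("priceCurrency"):
            problems.append("missing priceCurrency")
    return problems


page = """<html><head><script type="application/ld+json">
{"@type": "Product", "name": "Example Widget",
 "offers": {"@type": "Offer", "price": "$19.99"}}
</script></head></html>"""
issues = audit_offer(page)
```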
Prioritize & Get Buy-In -- You’ve incurred some debt which is eating at your SEO credit score, you’ve started to identify debt contributors, now what?
Prioritize & Get Buy-In – Answer these questions to prioritize and make technical SEO a priority
Can it wait?
Use your best judgement to escalate anything critical as soon as you observe it, then table everything else
What’s the opportunity cost?
If you do nothing to pay off the debt, what will be the potential gains and losses?
What will make a noticeable outcome with the least effort?
Build out your spreadsheet documenting all identified issues and potential solutions, still stressing the expected level of impact and anticipated effort for each.
How will it improve UX or basic SEO?
Determine your nice-to-haves versus must-haves. Then relate them back to user experience and fundamental SEO or URL discovery, crawling, indexing, retrieving, and ranking
Can you tie it back to overall strategy?
Illustrating how a technical fix can actually help generate more traffic and even revenue will allow you to skip the dev queue while aligning technical SEO with strategic initiatives
Story Time – why HTTPS migrations make me sad
Google has been fearmongering it for years
Many of my clients procrastinated until the “not secure” omnibox warning became more prominent last year
Some Fortune 500 sites are still not secure, and I have an HTTPS migration tonight!
Probably not the worst thing because it wasn’t until last year that Google said there were some initial bugs causing the temporary ranking drops to be less temporary than they expected.
Now HTTPS is a requirement for advanced AMP, PWA, service workers, and Google Assistant actions, and it comes with a small ranking boost (yeah, right)
Let me tell you about the Murphy’s law edition of HTTPS migrations where every time one thing was fixed, another thing was broken
Yes, we are looking at a site that 301 redirected every page to the homepage
Story Time – Murphy’s Law
302s instead of 301s*
302 is the default status for Apache rewrites
301 redirect rule removed
Both the HTTP and HTTPS versions rendered
Infinite redirect loop
An if condition in the nginx configuration conflicted with the Apache rewrite
301 redirected all pages to the homepage
Site became unusable – if it went unfixed, it could have taken down the sites altogether
Lost paid search tracking
Passing through the redirect lost the attribution
Mixed content warnings
Caused by images in Pardot forms being served from a vanity domain without SSL (feature just released on February 1st)
Forgot to transfer GSC settings
URL parameters weren’t configured
Images not redirecting
Hosted on a CDN subdomain that wasn’t included in the hostname certificate
Broken CSS styling for WP blogs
HTTPS links forced WP blogs to render with broken styling because the stylesheets were blocked from being loaded by the browser
We spotted these issues by creating a list of all pages, prioritizing them based on our KPIs, then crawling them regularly before, during, and after the migration
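Several of these red flags can be detected mechanically from a crawl export. Here is a minimal sketch that classifies a recorded redirect chain; the function name, flag labels, and input format (a list of status and Location pairs per source URL) are assumptions for illustration, not output from any particular crawler.

```python
from urllib.parse import urlsplit


def redirect_red_flags(source_url, hops):
    """Classify problems in a redirect chain.

    `hops` is the ordered list of (status_code, location) pairs a crawler
    recorded while following redirects from `source_url`.
    """
    flags = []
    seen = {source_url}
    for status, location in hops:
        if status == 302:
            flags.append("302 instead of 301")
        if location in seen:
            flags.append("redirect loop")
            break
        seen.add(location)
    if not hops:
        flags.append("no redirect")
    else:
        final = urlsplit(hops[-1][1])
        source_is_deep = urlsplit(source_url).path not in ("", "/")
        if source_is_deep and final.path in ("", "/"):
            flags.append("deep page redirected to homepage")
        if final.scheme != "https":
            flags.append("final URL is not HTTPS")
    return flags


clean = redirect_red_flags(
    "http://example.com/page", [(301, "https://example.com/page")]
)
to_home = redirect_red_flags(
    "http://example.com/page", [(302, "https://example.com/")]
)
loop = redirect_red_flags(
    "http://example.com/a",
    [(301, "https://example.com/a"), (301, "http://example.com/a")],
)
```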
Got buy-in immediately for the obvious red flags but for everything else we were able to give a score to each page to illustrate the importance based on that page’s metrics like rankings, traffic, revenue, and links.
CYA by saving your crawls and crawling as different user-agents.
Note: Cloudflare blocks Googlebot spoofing so don’t freak out if you see 403 errors
Recommend doing this for all migrations
Has anyone here heard of the Upgrade Insecure Requests header?
Consider the upgrade-insecure-requests directive, which is added to a page’s HTTP response header to tell the browser to automatically request resources over HTTPS, preventing mixed content
See an example on cat.com
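The header itself is a single line: `Content-Security-Policy: upgrade-insecure-requests`. As a minimal sketch of where it could be added, here is a small WSGI middleware; the application and function names are hypothetical, and in practice the header is more often set in the web server or CDN configuration.

```python
def upgrade_insecure_requests(app):
    """WSGI middleware that adds the CSP directive to every response."""

    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            headers = list(headers) + [
                ("Content-Security-Policy", "upgrade-insecure-requests")
            ]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware


def hello_app(environ, start_response):
    # Hypothetical application standing in for the real site.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<img src='http://example.com/logo.png'>"]


wrapped = upgrade_insecure_requests(hello_app)
```

With the directive in place, a supporting browser requests the image over HTTPS even though the markup still references HTTP. The resource must actually be available over HTTPS for this to work.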
Repay in Installments –
Accumulated debt, prioritized and identified top debt contributors, now it’s time to start paying off those debts
Pitch it as an experiment
#1 Chip away at debt by pitching the fix as an experiment.
Good for everyone because it’s usually low effort to deploy a fix to a section of the site, and you get to see whether it makes any impact while you wait for the full fix to be deployed, hopefully within the next year
Preach iterative changes
#2 Not all clients are set up for agile work environments but focus on what you can control and consistently advocate for regular changes to keep things from getting stagnant
Be prepared to wait
Be a team player
Pick your battles
#3 Accept you can’t fix everything at the enterprise level where CMS platforms are held together by bubblegum sometimes so know when to pick your battles to keep perspective (AKA new platform)
Before we get into story time, I want to set the stage as to why you should care more about XML sitemaps than you currently do
First and foremost, crawl budget is something you need to care about if you’re at the enterprise level
It’s mainly a combination of host load (server capacity) and URL scheduling and importance (crawl frequency + PageRank)
URL importance dissipates when signals are inconsistent
One of the strongest indicators of URL importance is XML sitemap inclusion -- as stated by John Mueller in a recent Google hangout
XML sitemaps are a great first spot to diagnose indexation issues, especially if you don’t have access to logs
Story Time – Battling the XML Sitemap from Hell
Large enterprise ecommerce client does very little to limit crawling and indexing
Knew there were crawl inefficiencies, evidenced by double listings, ranking pages flip-flopping, and over a million pages in the index
Comparing the number of pages submitted through each sitemap with the number of pages GSC says were indexed, we found that the majority of the child sitemaps had very few indexed pages
Sitemap indexation rate with a sitemap that’s done right should be much higher than 18% which is dreadful
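The indexed-to-submitted ratio is simple arithmetic per child sitemap. Here is a minimal sketch; the sitemap names and counts are hypothetical and only illustrate the calculation.

```python
def indexation_rates(sitemaps):
    """Map each child sitemap to indexed / submitted, worst first."""
    rates = {
        name: indexed / submitted
        for name, (submitted, indexed) in sitemaps.items()
        if submitted  # skip empty sitemaps to avoid dividing by zero
    }
    return dict(sorted(rates.items(), key=lambda item: item[1]))


# Hypothetical (submitted, indexed) counts as read from GSC.
report = {
    "sitemap-products-1.xml": (50000, 9000),
    "sitemap-categories.xml": (1200, 1100),
    "sitemap-facets.xml": (40000, 2000),
}
rates = indexation_rates(report)
```

Sorting worst-first puts the child sitemaps most likely to contain “dirt” at the top of the crawl queue.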
Story Time – Index Coverage Report
So bad that we found over 1.3 million pages submitted through a sitemap were being excluded from the index
New index coverage report showed many errors, many of which are still not fixed
Story Time – Almost Every Error Imaginable
Changes sitemap file names daily*
Native issue of the Hybris platform, which caused Bing to stop crawling the sitemaps completely and left Google unable to provide an up-to-date indexed-to-submitted breakdown.
Contains hreflang without alternates
Hreflang tags can be implemented through XML sitemaps but this one didn’t claim any alternates
Adds new pages but doesn’t remove (22 child sitemaps to 38 in one year)
Contains duplicates
Contains 30X, 40X, 50X HTTP statuses
Contains noindexed pages
Contains pages blocked by robots.txt
Contains soft 404s
Contains server errors
Contains facets with no products
All considered “low-value” and contributors to “dirt”.
The more sitemap dirt, the less trust a search engine has in the sitemap, which can make diagnosing issues challenging and send confusing signals about which is the one true page.
To fix all these issues, the dev team quoted about 60 hours.
This is unacceptable for something as fundamental as XML sitemaps, so I decided to look into alternatives.
I care too much about technical SEO
Ben and I are currently generating XML sitemaps manually with third-party tools, for high-value pages only, so that dev time can go to more important tasks; we plan to use 10 hours.
Going to use Excel formulas, Screaming Frog, and Aleyda’s new hreflang generator to do it.
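For readers who prefer a script to spreadsheets, here is a minimal sketch of generating a sitemap with hreflang alternates using only the standard library. It is a stand-in for the tools named above, not a description of them; the function name and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(pages):
    """`pages` is a list of dicts mapping hreflang codes to alternate URLs.

    Every URL in a group gets its own <url> entry listing all alternates,
    itself included.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for alternates in pages:
        for url in alternates.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            for lang, href in alternates.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {
        "en-us": "https://example.com/widget",
        "de-de": "https://example.com/de/widget",
    },
])
```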
Who uses BWT here?
If you care about crawl budget and providing search engines with a clearer signal to the one true page, start with auditing your XML sitemaps.
First identify all known and unknown sitemaps using robots.txt, BWT (shows you at subdomain levels all sitemaps even if you haven’t submitted one), GSC, and log files
Crawl them and export the results
Highlight issues like missing pages and “dirt”
Correct those issues with the dev team or if they push back, provide a third-party generated sitemap
Test, upload, submit, monitor
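The “highlight dirt” step can be sketched by joining the sitemap’s URLs against a crawl export. The function name, the crawl result keys ("status", "noindex", "blocked"), and the sample data are all hypothetical; a real crawler export will use its own column names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_dirt(sitemap_xml, crawl_results):
    """Return {url: [reasons]} for sitemap entries that should not be listed.

    `crawl_results` maps URL -> dict with hypothetical keys "status",
    "noindex", and "blocked", as exported from a crawler.
    """
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS)]
    dirt = {}
    for url, count in Counter(locs).items():
        reasons = []
        if count > 1:
            reasons.append("duplicate")
        result = crawl_results.get(url)
        if result is None:
            reasons.append("not crawled")
        else:
            if result["status"] != 200:
                reasons.append(f"HTTP {result['status']}")
            if result.get("noindex"):
                reasons.append("noindex")
            if result.get("blocked"):
                reasons.append("blocked by robots.txt")
        if reasons:
            dirt[url] = reasons
    return dirt


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/old</loc></url>
  <url><loc>https://example.com/ok</loc></url>
</urlset>"""
crawl = {
    "https://example.com/a": {"status": 200},
    "https://example.com/old": {"status": 301},
    "https://example.com/ok": {"status": 200},
}
dirt = find_dirt(sitemap, crawl)
```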
Debt Management Strategies
Prevent & Avoid Debt
Add SEO into the dev process
Get creative with solutions
Stay on top of things
Prioritize & Get Buy-In
Ask yourself these questions to get people to care
Repay in Installments
Try an experiment or proof of concept while you wait
Get agile
Thanks everyone!
Hopefully you care a little bit more about technical SEO now
Slides are available on Slideshare
I have a blog post going into more debt on these strategies
Tweet me questions
Don’t worry about microdata – Bing is eventually going to adopt JSON-LD!
Rarely see devs get structured data markup right the first time
Also, things change and it’s difficult to do that fast through dev teams
Monitor GSC Structured Data report for errors and number of pages
Blocking crawling
Missing redirects due to certificate
Mixed content warnings
Redirects getting removed magically
Impact: crawl budget wasted, confusion/dilution, duplicate content? potential indexing issues with mixed signals, http pages stuck in the index, index bloat
Creative solution: url parameters setting in gsc (remember sep account), upload dirty intentional sitemap to force crawling of old pages, temporarily unblock crawling outside of root files/directories, do UPA of files and pages and CYA with SF crawl saves pre and post launch, upgrade content resources http header, https checker – free up to 500 pages, can crawl specific directories