The technical setup of category and faceted category systems in ecommerce sites can make or break a website's organic visibility.
There are common errors in off-the-shelf and proprietary e-commerce platforms, which mean that sites using them aren't able to rank for all of the terms that they should.
Faceted navigation webinar
1. FACETED NAVIGATION FOR SEO: TECHNICAL TRICKS TO BOOST YOUR VISIBILITY
Alec Bertram
Founder
@AllotmentDgtl
http://allotment.digital
Michal
Head of Product
@DeepCrawl
http://deepcrawl.com
#CrawlChat
2. Today
The common issues with category systems, which mean that
ecommerce sites don’t rank as well as they could.
How you can identify issues with your site and how to fix them.
• How search engines work
• Common crawl & indexation issues
• Common issues in ecommerce sites
• Case studies: Motolegends, House of Fraser
• Detecting and monitoring problems with DeepCrawl
• Fixing your facet system
8. Let’s talk about dresses
URL: /dresses
Title: Dresses – Buy Top Designer Dresses | Brand
Description: Shop the Dresses range from our Women's department….
Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
Our dresses are great for all
occasions!
10. Duplication
URL: /dresses/evening
Title: Dresses – Buy Top Designer Dresses | Brand
Description: Shop the Dresses range from our Women's department….
Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
Our dresses are great for all
occasions!
13. Under-indexation
URL: /dresses/evening
Canonical: /dresses
or Robots: noindex
…or:
Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
URL: /dresses#evening
Page content changed with JS
Our dresses are great for all
occasions!
18. Improve chances to rank
Before:
Term                                Volume
Dresses                             823,000
Now:
Term                                Volume
Dresses                             823,000
Evening dresses                     165,000
Summer dresses                      135,000
Party dresses                       165,000
DKNY Evening Dresses                50
DIOR evening dresses                170
Victoria Beckham evening dresses    110
Get more visibility by increasing the number of pages relevant
to longer-tail terms
21. Potential incremental revenue
Potential £88k incremental revenue by opening two facets on one category
Assuming:
- 9% CTR at position 1 to 5
- 2% Conversion rate
- Average order value taken from price point of each helmet
25. Under-indexation
Currently no filters for:
Model (Radeon/GeForce)
Memory Capacity (2GB)
Memory Type (GDDR5)
Connector (PCI-E)
Frequency (5000MHz)
Output (DVI/HDMI)
26. Under-indexation
This site should also have indexable pages and filters for:
• Model (Radeon/GeForce)
• Memory Capacity (2GB)
• Memory Type (GDDR5)
• Connector (PCI-E)
• Frequency (5000MHz)
• Output (DVI/HDMI)
31. Almost perfect
Nofollow prevents search crawlers from discovering
multi-selected facets.
No “Cocktail or Maxi Dresses”
Can discover multi-dimensional facets.
“Phase Eight Cocktail Dresses”
32. Almost perfect
3 filters selected = crawlable and indexable
4 filters selected = not crawlable but indexable
33. Almost perfect
URL: womens dresses/cocktail/phase-eight/black or gold
Content: Dresses
Crawlers cannot reach this page through the site’s own links, but it can be linked to from elsewhere
Recommendation: noindex or canonicalise these pages to an indexable parent to prevent index bloat when search crawlers do discover them another way
44. Detecting Duplication
Use Reports Matches Search to identify URL patterns that cause Duplication
Create Issue for each of the patterns
Track the Issue over time
Identify all patterns by using “Does not match” to exclude already known issues
46. Detecting Duplication
3 types of Duplication:
• Duplicate Pages
  - Titles are Duplicated
  - Body Content is Duplicated (almost Identical)
• Duplicate Titles
  - Titles are Duplicated
  - Body Content is Unique
• Duplicate Body Content
  - Titles are Unique
  - Body Content is Duplicated (almost Identical)
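The three types can be told apart from crawl data by checking, for each page, whether its title and its body text also appear on another URL. The following is a minimal sketch and not how DeepCrawl does it: the page dictionaries and the `classify_duplicates` function are hypothetical, it only catches exact matches after whitespace and case normalisation (not "almost identical" bodies), and it assumes that a duplicated body with a unique title counts as "duplicate body content".

```python
from collections import defaultdict
import hashlib


def _fingerprint(text):
    # Normalise case and whitespace so trivial differences do not hide duplicates.
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def classify_duplicates(pages):
    """Label each page by duplication type.

    pages: list of dicts with 'url', 'title' and 'body' keys (hypothetical crawl export).
    Returns a dict mapping url -> label.
    """
    urls_by_title = defaultdict(list)
    urls_by_body = defaultdict(list)
    for page in pages:
        urls_by_title[_fingerprint(page["title"])].append(page["url"])
        urls_by_body[_fingerprint(page["body"])].append(page["url"])

    labels = {}
    for page in pages:
        title_shared = len(urls_by_title[_fingerprint(page["title"])]) > 1
        body_shared = len(urls_by_body[_fingerprint(page["body"])]) > 1
        if title_shared and body_shared:
            labels[page["url"]] = "duplicate page"
        elif title_shared:
            labels[page["url"]] = "duplicate title"
        elif body_shared:
            labels[page["url"]] = "duplicate body content"
        else:
            labels[page["url"]] = "unique"
    return labels
```

Run against the earlier example, /dresses and /dresses/evening share both title and description copy, so both would be labelled "duplicate page".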
53. Fixing your facet system
1. Market Keyword Research
Find product attributes people actually search for
2. Business Case
Calculate the potential revenue uplift and build a business case
3. Functional Specification
Create the functional spec and work with developers to ensure it’s
perfect
54. Find the product attributes
people actually search for
• Internal site search
• Customer query data
• Manufacturer spec sheets
• Primary research
• Keyword idea tools
55. Build a business case
Ideal category page list
+
Current rankings
+
Search volume and CTR Curve
+
Current conversion rate and AOV
=
Potential Incremental Revenue
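The sum above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact model used for the case study: the function name, the input field names and the choice of position 1 as the target are hypothetical, search volumes are assumed to be monthly, and the flat 9% CTR and 2% conversion rate in the example are the assumptions quoted on the incremental revenue slide.

```python
def incremental_revenue(pages, ctr_by_position, conversion_rate, target_position=1):
    """Estimate yearly incremental revenue from making facet pages rank.

    pages: list of dicts with 'monthly_searches', 'current_position'
           (None when the site has no ranking page) and 'average_order_value'.
    ctr_by_position: dict mapping ranking position -> click-through rate.
    """
    target_ctr = ctr_by_position.get(target_position, 0.0)
    total = 0.0
    for page in pages:
        # Pages that do not rank (or rank outside the curve) get no clicks today.
        current_ctr = ctr_by_position.get(page["current_position"], 0.0)
        extra_clicks_per_year = max(target_ctr - current_ctr, 0.0) * page["monthly_searches"] * 12
        total += extra_clicks_per_year * conversion_rate * page["average_order_value"]
    return total


# Example inputs: a flat 9% CTR for positions 1 to 5 and a 2% conversion rate.
ctr_curve = {position: 0.09 for position in range(1, 6)}
example_pages = [
    {"monthly_searches": 1000, "current_position": None, "average_order_value": 200.0},
]
estimate = incremental_revenue(example_pages, ctr_curve, conversion_rate=0.02)
```

Subtracting the CTR at the current position keeps the estimate incremental: a term the site already ranks well for contributes little or nothing.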
57. Unique meta & headings
URL: /dresses/evening
Title: Evening Dresses – Stunning Designer Evening Dresses | Brand
Description: Evening Dresses from top designers….
Evening Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
Our evening dresses are perfect for
that soirée.
58. Unique pages for facet permutations
URL: /dresses/evening/dkny/red
Title: Evening Dresses – Stunning Designer Evening Dresses | Brand
Description: Evening Dresses from top designers….
Red DKNY Evening Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
Stunning red evening dresses by
Donna Karan.
59. …but not all of them
URL: /dresses/evening+party/dkny+karen-millen/red+black
Title: ???
Description: ???
Red or Black DKNY and
Karen Millen Evening and
Party Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
60. Only one facet in each section indexable
URL: /dresses/evening+party/dkny+karen-millen/red+blue
Canonical: /dresses/evening/dkny/red/
Robots: noindex
Red or Blue DKNY and
Karen Millen Evening and
Party Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
61. Nofollow after first filter in a group
Can crawl: Red DKNY Evening Dresses, Black DKNY Evening Dresses
Cannot crawl: DKNY Evening and Party Dresses, etc.
Alternatively: robots.txt Disallow
DKNY Evening Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
Stunning evening dresses by Donna
Karan.
62. Some parameters should never be indexable
URL: /dresses/evening/dkny/next-day-delivery#sort:price
Canonical: /dresses/evening/dkny
Always nofollow non-attribute links
DKNY Evening Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
Next day delivery only
Sort by: Price
63. Set maximum number of selected facet groups
Depending on the industry, users normally search for three or four of the core parameters that they care about.
We don’t need pages indexed which are about uber-specific topics.
DKNY Silk Evening Sleeveless Red Size 8 Dresses
Type
Evening
Party
Summer
Day
Brand
DKNY
Karen Millen
Colour
Red
Black
64. Dresses
/dresses                      Dresses
/dresses/evening              Evening Dresses
/dresses/summer               Summer Dresses
/dresses/party                Party Dresses
/dresses/evening/dkny         DKNY Evening Dresses
/dresses/evening/dion         Dion Evening Dresses
/dresses/evening/v-beckham    Beckham Evening Dresses
All pages now compete for long tail terms
65. Notes
Sometimes individual filters shouldn’t be indexable
In the recruitment industry, “Full Time” is the default, so shouldn’t be indexable – no one uses it as a modifier
67. Notes
You can template page headings, but generally
shouldn’t use templated content snippets.
“We have lots of great white dresses in our home and clothing
category. Shop now for great savings on white dress.”
“We have lots of great hard drives in our Electronics category.
Shop now for great savings on hard drive.”
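Templating the heading and title, as opposed to the body copy, can be as simple as joining the selected facet values in a fixed order. The sketch below is illustrative only: the function names, the group names and the colour, brand, type ordering are assumptions chosen to reproduce headings such as "Red DKNY Evening Dresses" from the earlier slides.

```python
# Assumed display order for facet groups in headings (hypothetical).
HEADING_ORDER = ["colour", "brand", "type"]


def facet_heading(category, selected):
    """Build a page heading from the selected facet values.

    category: display name of the category, e.g. "Dresses".
    selected: dict mapping facet group -> single selected display value.
    """
    parts = [selected[group] for group in HEADING_ORDER if group in selected]
    return " ".join(parts + [category])


def facet_title(category, selected, site_name):
    """Build a unique title tag from the same heading (title pattern is an assumption)."""
    return f"{facet_heading(category, selected)} | {site_name}"


print(facet_heading("Dresses", {"type": "Evening"}))
# Evening Dresses
print(facet_heading("Dresses", {"type": "Evening", "brand": "DKNY", "colour": "Red"}))
# Red DKNY Evening Dresses
```

The descriptive paragraph on each page is left out on purpose: as the two snippets above show, filling that from a template tends to produce copy that reads badly and is near-identical across pages.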
68. Notes
Enforce URL path and parameter order consistency to prevent duplication
/products/dresses/dkny/evening/red
/products/dresses/dkny/red/evening
/products/dresses/?type=evening&colour=red
/products/dresses/?colour=red&type=evening
Other Common URL Design Issues
Casing /dresses/red vs /Dresses/red
Trailing Slashes /dresses/red vs /dresses/red/
Internal Clicks tracking /dresses/red vs /dresses/red?click=home
Multiparent Categories etc.
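Several of these issues can be handled by reducing every requested URL to one normalised form and redirecting or canonicalising to it. The following sketch uses Python's standard `urllib.parse`; the function name, the `click` tracking parameter (taken from the example above) and the decision to drop trailing slashes and lowercase paths are assumptions, and a real site may choose the opposite conventions as long as it applies them consistently. Reordering path segments (dkny/evening vs evening/dkny) is not covered here because it needs to know which facet group each segment belongs to.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters that only track internal clicks (assumed list).
TRACKING_PARAMS = {"click"}


def normalise_url(url):
    """Return one consistent form of a URL for redirects or canonical tags."""
    parts = urlsplit(url)
    # Lowercase the path and drop the trailing slash, keeping "/" for the root.
    path = parts.path.lower().rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so parameter order is fixed.
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    query = urlencode(sorted(pairs))
    # The fragment is discarded because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, query, ""))


print(normalise_url("/products/dresses/?type=evening&colour=red"))
# /products/dresses?colour=red&type=evening
print(normalise_url("/Dresses/red/?click=home"))
# /dresses/red
```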
69. Functional spec
1. Unique meta and content for every facet (and ensure it makes sense)
2. Ensure main combinations are crawlable and indexable
3. Only one selection in a filter set should be indexable
4. Nofollow links in filter set after one is selected (alternatively robots.txt
disallow)
5. Enforce parameter ordering in URLs
6. Set a maximum number of selected groups (3 - 5 depending on industry)
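Points 3, 4 and 6 of the spec can be expressed as two small rules: one deciding whether a page is indexable and what it canonicalises to, and one deciding whether a filter link should be nofollowed. This is a sketch under assumptions, not a reference implementation: the function names, the group names, the group order and the limit of three groups are all hypothetical, and the canonical target keeps the first selected value in each group, mirroring the /dresses/evening/dkny/red example.

```python
# Assumed facet groups in their enforced URL order (hypothetical).
GROUP_ORDER = ["type", "brand", "colour", "size"]
# The spec suggests 3 to 5 depending on industry; 3 is used here as an example.
MAX_GROUPS = 3


def page_directives(category_path, selected):
    """Decide indexability and canonical URL for a faceted page.

    selected: dict mapping facet group -> list of selected value slugs.
    """
    active = {group: values for group, values in selected.items() if values}
    multi_select = any(len(values) > 1 for values in active.values())
    too_many_groups = len(active) > MAX_GROUPS

    # Canonical target: first value per group, fixed group order, capped depth.
    ordered = [group for group in GROUP_ORDER if group in active][:MAX_GROUPS]
    segments = [category_path.rstrip("/")] + [active[group][0] for group in ordered]

    return {
        "indexable": not (multi_select or too_many_groups),
        "canonical": "/".join(segments),
    }


def filter_link_nofollow(selected, group):
    """Return True if a link adding a value from `group` should be nofollowed."""
    active_groups = [g for g, values in selected.items() if values]
    if group in active_groups:
        return True  # a second value in an already-selected group
    return len(active_groups) >= MAX_GROUPS  # would exceed the group limit
```

For example, a page with evening and party, DKNY and Karen Millen, and red and blue all selected comes back as not indexable with /dresses/evening/dkny/red as its canonical target, while the colour links on a DKNY evening dresses page remain followable.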
70. Next steps
Analysis
• Analyse whether you have an issue with your current setup
• Research what product attributes people actually search for
Preparation
• Work out the potential revenue uplift by being able to rank for more terms
• Create the functional spec for your website
• Create crawl, traffic, visibility benchmarks
Testing
• Test on development site
• Test on staging site
• Test on production site
• Monitor benchmarks
71. FACETED NAVIGATION FOR SEO: TECHNICAL TRICKS TO BOOST YOUR VISIBILITY
Alec Bertram
Founder
@AllotmentDgtl
http://allotment.digital
Michal
Head of Product
@DeepCrawl
http://deepcrawl.com
#CrawlChat
Editor's notes
MORE DETAIL
THIS IS A FACET SYSTEM
ALLOWS USERS TO FILTER BY KEY ATTRIBUTES
THIS PAGE – DRESSES – STYLE, BRAND, SIZE COLOUR
SIMPLISTIC VIEW
TECHNICAL ISSUES FOCUS ON HOW SEARCH ENGINES USE YOUR SITE, GOOD TO START TO UNDERSTAND THAT
1ST, CRAWLER. SITE => TEMP DB
2ND, INDEXER PROCESSES, PUTS IMPORTANT STUFF INTO INDEX.
A LOT OF USELESS STUFF THROWN AWAY
3RD, WHEN USER MAKES QUERY, SEARCH ENGINE USES THE DOCUMENTS IN INDEX
Later, the indexer comes along and goes through everything to put the important stuff into the index. At this stage, a lot of useless stuff will be thrown out – particularly duplicate pages.
When a user makes a search query, the ranker will use the documents in the index to return a list of the most relevant pages for that query.
COMMON ISSUES WHICH AFFECT HOW CRAWLER OR INDEXER SEE YOUR SITE
CRAWL BUDGET
CRAWLER MOVES THROUGH SITE, DISTRIBUTES PR TO PAGES IT FINDS
CAN’T CRAWL FOREVER. PR BELOW THRESHOLD = STOPS.
WASTING YOUR CRAWL MEANS SOME PAGES WON’T BE CRAWLED AS OFTEN (OR AT ALL)
BEST TO CHANNEL THE CRAWLER TO YOUR MOST IMPORTANT PAGES.
The first issue is crawl budget.
As the crawler moves through your site it will pass and distribute PageRank to the pages it finds.
Obviously, it can’t crawl every level of your site forever, so once the PageRank drops below a certain threshold, it will stop crawling your site – this is called your crawl budget.
CHANNELLING TO DISCOVER MOST IMPORTANT
DEPTH GRAPH
Because of this, it’s really important to minimise the amount of wasted crawl – this means that spiders will be able to reach more of your important pages more often.
THIS SITE: ECOM – DRESS PAGE
PRETTY GOOD FROM SEO POV
This is an ecommerce site selling dresses – this is the main category page, and it’s pretty good from an SEO perspective.
DUPLICATION – HUGE ISSUE
SEARCH ENGINES DON’T WANT TO RETURN TWO DUPLICATE DOCUMENTS FOR ONE QUERY
DILUTES AUTHORITY – SITE NOT RANK AS WELL AS IT COULD
Duplication is a huge issue for most sites – essentially, search engines don’t want to index lots of the same page, so indexers will normally cut a lot of it out and only index one or two pages from a group of duplicates.
Having duplicate pages wastes your crawl, and dilutes authority across a lot of URLs, meaning your site ultimately doesn’t rank as well as it could.
When every faceted page reuses the category page's title, description and copy, the result is massive duplication throughout the site: all pages have exactly the same content but display different products.
UNDERINDEXATION IS WHEN CRAWLERS CAN’T ACCESS (JS/BLOCKED)
OR WHEN INDEXERS CAN NOT INDEX (CANONICAL/NOINDEX)
Another common issue is under-indexation. To avoid this kind of duplication, none of the faceted pages are made indexable: they may canonicalise back to the main dresses page, carry a noindex directive or, worse still, not create a new URL at all, so the products just change with JavaScript.
LOOK AT QUERY - DKNY EVENING DRESSES
ALMOST EVERY PAGE ULTRA RELEVANT
NO MATTER HOW AUTHORITATIVE YOU ARE, YOU WON’T RANK WITH A GENERIC PAGE
IF YOU KNOW THIS INDUSTRY, YOU’LL NOTICE ONE OF THE MOST AUTHORITATIVE SITES MISSING BECAUSE THEY DON’T HAVE A PAGE FOR THIS, DESPITE BIDDING HIGHLY IN PPC
If we look at the pages which currently rank for the term dkny evening dresses, we can see that most of them are specifically about DKNY evening dresses, and the ones that are left are about DKNY dresses – no matter how authoritative you are, you’re never going to rank for this term with a generic “dresses” page.
One other thing to point out, you’ll know of one or two extremely authoritative sites in the designer clothing vertical – they don’t rank for this term because they have this exact problem – they sell DKNY evening dresses, but do not have an indexable page for it so are not eligible to rank here.
DOING THIS MEANS RANKING FOR WIDER RANGE OF TERMS
LOW VOLUME BUT HIGHER PURCHASE INTENT
THIS IS A HANDFUL, ADDS UP WHEN YOU LOOK ACROSS THE SITE
If we look at these terms, just by opening these facets, we’re eligible to rank for a much wider array of terms.
You might be looking at these and thinking that a couple of hundred searches a month isn’t very much, but people searching for “DKNY evening dresses” have a much higher purchase intent than those just searching for “dresses”.
Obviously we’ve also only picked a handful of terms, but the fix for this is a one-off and applies to the whole website, so fixing it can affect every category and product type you have.
CASE STUDIES
SYMPATHISE WITH DEV QUEUE
HIGH END MOTORCYCLE GEAR
Let’s look at a few case studies. The first is a high-end motorcycle gear brand.
AGV HELMETS – AGV IS A BRAND
NONE OF THE FILTERS ARE INDEXABLE,
MEANS NO PAGE FOR MODEL + BRAND – I.E. AGV K3 HELMET
This is a page for AGV motorbike helmets – AGV is a brand name, and they have different styles and models here
Motolegends' main problem is that its filtered category pages are not indexable or crawlable: if you choose any of these product filters, only the products change. The URL, page heading and description stay the same, meaning that search engines can only index this one page.
IF WE TAKE EVERY HELMET BRAND + MODEL,
TAKE THE SEARCH VOLUME AND A CTR CURVE
USE A CONVERSION RATE AND AOV
WORK OUT THAT BY MAKING THESE INDEXABLE, THERE’S A POTENTIAL 88k INCREMENTAL INCOME
So if we take every brand of helmets and match it with each brand’s model names, we can see that there’s a lot of search volume.
If we assume that they’ll get to the top 5 in Google, we can project the amount of incremental traffic they’ll get.
If we work that out over a year and assume a conversion rate and average order value, we can see that just by opening up the brand and model name facets, Motolegends is eligible to make an extra £88k a year – from a fairly easy fix.
TAKE OUT DECIMAL POINTS
THIS IS AN ELECTRONICS RETAILER, SAME PROBLEM
SAME METHOD, WE CAN SEE THEY’RE MISSING OUT ON A BUNCH OF SEARCH VOLUME FROM BRAND MODIFIERS
BUT WHAT ELSE?
But what else have they missed? The product titles have a whole host of different specifications – just by looking at a few products I picked out some recurring product attributes which may be important – model, memory, the type of connector, the output. Just because Maplin doesn’t currently have these filters on their category pages, that doesn’t mean they shouldn’t look at expanding it.
MAKE A FEW COMBINATIONS OF THESE PRODUCT ATTRIBUTES
SEE THAT THERE’S QUITE A LOT OF SEARCH VOLUME AROUND THESE
THEY SHOULD HAVE MORE FILTERS FOR THE ATTRIBUTES PEOPLE USE WHEN SEARCHING
After running those through the keyword planner, it’s clear that these terms have some pretty significant search volume. Maplin is missing out on a lot of this traffic because they don’t have indexable pages – or even filters – for these terms.
HOF PRETTY GOOD
DRESS PAGE – PRETTY GOOD – RIGHT META, HEADINGS, SOME CONTENT
House of Fraser has a pretty good implementation of facets – it has a couple of issues which we’ll look at in a second, but is great from an indexation point of view.
Here’s the dresses page – exactly as you’d expect.
SELECT COCKTAIL - GET COCKTAIL DRESSES HEADINGS
EVEN BRANDS
TO PREVENT OVER-INDEXATION, THEY BLOCK BOTS FROM CRAWLING MORE THAN ONE ATTRIBUTE FROM SAME FILTER GROUP
Once one of the filters in a group has been selected, the rest of the filters in that group become nofollow, meaning that a crawler will not try to get to the “cocktail or maxi dresses” page, but it will try to get to “phase eight cocktail dresses”.
STOP PAGE FROM BEING CRAWLED AFTER TOO MANY FACETS GROUPS SELECTED
HoF even limit the number of selected dimensions, once filters in 3 groups have been selected, all links become nofollow – meaning that crawlers generally won’t try to get to super-long tail (and potentially useless) pages.
MAIN PROBLEM – THEY IMPLEMENT NOFOLLOW, BUT BOTS CAN STILL FIND THESE PAGES ELSEWHERE
SHOULD NOINDEX/CANONICALISE OR ROBOTS DISALLOW
One of the main issues with this system, though, is that where links to pages have been nofollowed because they’ve tripped one of the rules (for instance, they have too many filters selected), the pages themselves are still indexable.
This means that if search engines find out about the pages from any other source (for instance, a blogger links to a page with black and gold dresses), that page will be indexed with duplicate content from the main category.
The recommendation here would be to noindex pages which shouldn’t be indexed, or better yet, canonicalise them to another of the indexable pages – that will make sure that if these pages are linked to but are not indexable, at least that authority will pass to other pages.
How to identify Underindexation
Look at your Crawl Depth Graph
How do you know, though, if this is enough pages?
Crawl the Market Leaders (stealth crawl)
Estimate the number of Unique and Indexable Pages they have got
The example shows the number of pages of Your Website vs Competitors
Calculate indexation Ratio
Ideally you want most of your pages to be available on first 5-6 levels
Compare to your Website
https://tools.deepcrawl.co.uk/projects/reports/33245
Heavy Canonicalisation
page Size, sort, show all
https://tools.deepcrawl.co.uk/projects/reports/56325
Improving Crawl Efficiency with robots.txt
https://tools.deepcrawl.co.uk/projects/reports/16010
Heavy Duplication