9. @AlexisKSanders
(real talk: my iRobot has done more cleaning than anyone in my home…)
real data
12. @AlexisKSanders
human vs. bot
• humans: complex tasks | bots: calculations
• humans: creativity, imagination, language | bots: endless loops
• humans: heuristic analysis | bots: filtering big data
• humans: better with people | bots: better with other bots
13. @AlexisKSanders
all of this is to say humans and bots are different
25. @AlexisKSanders
what are we looking for (by user-agent)?
• anomalies
• segment by folder
• crawling rates
• http response codes
26. @AlexisKSanders
we must answer:
1. are bots crawling your site in a way
you’d expect (and want) them to?
2. are your top KPI-driving pages being
crawled?
27. @AlexisKSanders
resulting changes (and discoveries) may include:
• more internal linking
• removing dead internal links
• resolving status codes
• canonicalizing page sets (e.g., PDFs)
• finding bots crawling non-existent pages
• finding pages no one knew existed
30. @AlexisKSanders
effect of a meaningful XML sitemap
Relative effect of the treatment showed an increase of +33%.
The 95% confidence interval of this percentage is [19.0%, 45.0%].
The probability of this effect being caused by chance is small; therefore, it is statistically significant.
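The readout above follows the CausalImpact pattern: compare the observed metric to a counterfactual forecast and report the relative effect with its interval. A minimal sketch of that arithmetic (the observed/predicted numbers below are illustrative stand-ins, not the slide's underlying data):

```javascript
// Sketch of a CausalImpact-style relative-effect readout.
// `observed` is the post-treatment metric; `predicted` is the
// counterfactual estimate with its 95% interval bounds.
// All numbers here are illustrative placeholders, not real data.
function relativeEffect(observed, predicted, predictedLow, predictedHigh) {
  // Relative effect = (observed - predicted) / predicted.
  const point = (observed - predicted) / predicted;
  // The interval comes from the counterfactual's uncertainty:
  // a lower prediction bound implies a larger possible effect.
  const upper = (observed - predictedLow) / predictedLow;
  const lower = (observed - predictedHigh) / predictedHigh;
  // "Significant" when the interval excludes zero.
  const significant = lower > 0 || upper < 0;
  return { point, lower, upper, significant };
}

// Example: 1330 organic sessions observed vs. ~1000 predicted,
// forecast interval [930, 1115].
const effect = relativeEffect(1330, 1000, 930, 1115);
// effect.point === 0.33 (+33%), interval ≈ [+19%, +43%], significant
```

The real tool fits a Bayesian structural time-series model to produce the forecast; this only shows how the percentages on the slide relate to each other.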
32. @AlexisKSanders
crawling efficiency:
o important pages close to root?
o no crawl traps?
o no orphan pages?
o all pages have a purpose?
o duplicate content?
o redirects consolidated?
o canonical tags?
o no useless parameters?
37. @AlexisKSanders
why a master list of all URLs?
1. site migration
2. auditing
3. knowing/agreeing on what’s priority
4. to identify what is not being crawled & indexed
5. automation
38. @AlexisKSanders
to make a master URL list:
o crawlers
o XML sitemap
o GSC
o analytics platform
o dev team
o google SERP
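Combining those sources into one deduplicated master list is mostly URL normalization. A minimal sketch, where the normalization rules (lowercase host, drop fragments, strip trailing slashes) are assumptions to adjust per site:

```javascript
// Sketch: merge URL lists from several sources (crawler export,
// XML sitemap, GSC, analytics, dev team, SERP scrape) into one
// deduplicated master list. Normalization rules are illustrative.
function normalizeUrl(raw) {
  const u = new URL(raw);
  u.hash = "";                       // drop #fragments
  // Treat /path and /path/ as the same page (site-specific choice).
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();               // URL() also lowercases the host
}

function buildMasterList(...sources) {
  const seen = new Set();
  for (const list of sources) {
    for (const raw of list) seen.add(normalizeUrl(raw));
  }
  return [...seen].sort();
}

const master = buildMasterList(
  ["https://example.com/a/", "https://example.com/a#top"],
  ["https://EXAMPLE.com/a", "https://example.com/b"]
);
// master: ["https://example.com/a", "https://example.com/b"]
```

Four raw URLs collapse to two canonical ones; each source list just becomes another argument.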
40. @AlexisKSanders
we must answer:
1. is JS used to load important content?
2. what does performance data show when changes are implemented?
3. what solutions were added?
4. (bonus) are images important?
41. @AlexisKSanders
how to tell if your content is being
rendered?
1. check for direct quotes in the SERP
2. use Google’s mobile-friendly testing tool
3. check the DOM (Inspect Element)
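Step 1 above can be partially automated: search the raw (pre-JS) HTML for a known content quote, then compare against what a rendering tool shows. A naive sketch; the tag-stripping and example markup are illustrative only, and a real check would fetch the live page and diff against a headless browser's rendered DOM:

```javascript
// Sketch: is a known quote present in the *raw* HTML a bot
// receives? If it only appears after JS runs, the content
// depends on rendering. The tag-stripping here is naive.
function visibleText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ") // ignore inline JS
    .replace(/<[^>]+>/g, " ")                    // drop tags
    .replace(/\s+/g, " ")
    .toLowerCase()
    .trim();
}

function quoteInRawHtml(quote, rawHtml) {
  return visibleText(rawHtml).includes(quote.toLowerCase());
}

// A quote living only inside a <script> is NOT visible to a
// non-rendering crawler:
const raw = '<html><body><h1>Hello</h1>' +
  '<script>render("unique product copy")</script></body></html>';
// quoteInRawHtml("Hello", raw)               -> true
// quoteInRawHtml("unique product copy", raw) -> false
```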
44. @AlexisKSanders
tips:
1. prioritize (by value to your core users)
2. delegate toward strengths:
• programmed = simple (maybe API) data input/output
• humans = people + relationships
45. @AlexisKSanders
effect of adding computer-generated text
Relative effect of the treatment showed an increase of +22%.
The 95% confidence interval of this percentage is [13.0%, 30.0%].
The probability of this effect being caused by chance is small; therefore, it is statistically significant.
46. @AlexisKSanders
chatbots
Relative effect of the treatment showed an increase of +22%.
The probability of this effect being caused by chance is high; therefore, it is not statistically significant.
Relative effect of the treatment showed an increase of +78%.
The probability of this effect being caused by chance is small; therefore, it is statistically significant.
50. @AlexisKSanders
what is important to monitor:
• robots.txt
• status codes
• http redirects live
• meta robots (noindex)
• canonical
• XML sitemap
• title tags
• meta description
59. @AlexisKSanders
APPENDIX
• What’s here? Well, basically a bunch of complaints (er, random thoughts) in a rant (er, constructive) format about robots.txt and why everyone (er, I personally) finds it so confusing (er, intellectually stimulating).
60. @AlexisKSanders
things I find confusing about robots.txt
• the allow versus disallow hierarchy (the more specific rule wins)
• [undefined] verdicts, what does Google even do… then
• how Google’s AdsBot doesn’t follow the rules (it must be named explicitly)
• implied * at end of every line
• implied .com at beginning of every line
• how $ and * are in robots.txt, but they’re not the same as regex
• the whole noindex header on robots.txt being accepted, then ignored… why…
• https://www.robotstxt.org/, the whole site
• how we can only use robots.txt if URL structure makes sense
• how disallowing the robots.txt is just ignored (it’s so meta)
• when sites overuse robots.txt
• why Google automatically crawls your blocked pages if the robots.txt goes down
• how robots.txt is case sensitive (it’s so close… and yet… so far)
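Several of these gripes (the implied trailing *, the $ anchor, allow/disallow specificity, AdsBot) can be seen in one short, hypothetical robots.txt (the paths are made up for illustration):

```
# Hypothetical robots.txt for example.com (illustrative only)
User-agent: *
# "Disallow: /private" matches /private, /private/, /private-sale…
# because every rule behaves as if it ended with an implied *
Disallow: /private
# $ anchors the end of the URL: blocks /page.pdf but not /page.pdf?x=1
Disallow: /*.pdf$
# Allow wins here because it is the more specific (longer) rule
Disallow: /blog/
Allow: /blog/public/

# AdsBot ignores "User-agent: *" and must be named explicitly
User-agent: AdsBot-Google
Disallow: /private
```

And per the case-sensitivity complaint: /Private would sail right past every rule above.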
63. @maxxeight
A user experience… that Google can understand.
• Relevant
• Mobile-Friendly
• Fast
• Secure
• Popular
Good organic search rankings are based on:
• Content
• Web Design
• Site Speed
• SSL/HTTPS
• Links
@maxxeight
76. @maxxeight
Custom JavaScript in AMP with <amp-script>
Restrictions
• 10,000 bytes maximum per <amp-script>
• 150,000 bytes maximum for all <amp-script> on the page
@maxxeight
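For context, a minimal sketch of how <amp-script> wraps custom JS; the script URL and button markup are hypothetical, and the referenced hello.js must fit the 10,000-byte per-script budget:

```html
<!-- Component script in <head> (official amp-script include): -->
<script async custom-element="amp-script"
        src="https://cdn.ampproject.org/v0/amp-script-0.1.js"></script>

<!-- Hypothetical usage: hello.js runs in a worker and may only
     mutate the DOM inside this element. -->
<amp-script src="https://example.com/hello.js" layout="container">
  <button id="hello">Say hello</button>
</amp-script>
```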
77. @maxxeight
AMP for E-Commerce
<amp-sidebar>
- Navigation
<amp-carousel>
<amp-list>
- Product organization
<amp-form>
- Search
<amp-bind>
- Filtering and sorting
<amp-access>
- Login
<amp-accordion>
- Images/details
<amp-form>
<amp-carousel>
- Comments/reviews
<amp-selector>
- Tabs/thumbnails
<amp-bind>
- Color/size selection
<amp-state>
- Add to cart
@maxxeight
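As a sketch of the <amp-selector> + <amp-bind> + <amp-state> pairing above for size selection (the state shape and markup are hypothetical):

```html
<!-- Hypothetical sketch: a size picker driving amp-bind state. -->
<amp-state id="product">
  <script type="application/json">{ "size": "M" }</script>
</amp-state>

<amp-selector layout="container"
    on="select:AMP.setState({ product: { size: event.targetOption } })">
  <div option="S">S</div>
  <div option="M" selected>M</div>
  <div option="L">L</div>
</amp-selector>

<!-- [text] re-renders whenever product.size changes -->
<p [text]="'Selected size: ' + product.size">Selected size: M</p>
```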
82. @maxxeight
Combining AMP and PWA
• AMP as an entry point into the PWA
• AMP as a data source for the PWA
• AMP with PWA features
90. @maxxeight
AMP pages are not rendered
• Bots only see <amp-img> (vs. <img>)
• No access to the URL in src=""
• Use <noscript>
AMP images are not indexable
@maxxeight
91. @maxxeight
“And what about SEO?”
Bots only crawl/index the AMP version
No wasted resources (no crawling multiple URLs with the same content)
Clear signals (no worries about canonical/alternate tags)
Fast, pre-loaded pages in the SERP (AMP viewer or “Real URL”)
Pages are “mobile-friendly”
@maxxeight
(e.g., part of URLs, /#, URL-like strings from HTML)
https://chrome.google.com/webstore/detail/search-analytics-for-shee/ieciiohbljgdndgfhgmdjhjgganlbncj
https://google.github.io/CausalImpact/CausalImpact.html
Fix to only be before the pre/post the next closest update
sessions
Update to organic visits
Chat leads, referred chat leads
Why is the reach of web apps higher?
Search engines (vs. app stores).
Supported by all major browsers
Low cost of acquisition
Capabilities
Reliable and Fast
App shell cached locally (on 1st load): Fast loading when offline or with slow connection (on subsequent loads)
Mobile-friendly (responsive)
Secure (HTTPS)
Engaging
App icon on device’s home screen
Push notifications
Technically, any website can easily be turned into a PWA (service-worker + manifest)
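The service-worker half of that sentence can be sketched as follows; the file name, cache name, and app-shell list are hypothetical, and the routing decision is factored into a pure function so the browser-only wiring stays thin:

```javascript
// Minimal service-worker sketch (hypothetical file: sw.js).
// App-shell URLs are cached on install; requests for them are
// answered cache-first, so repeat loads work offline.
const APP_SHELL = ["/", "/app.css", "/app.js", "/offline.html"];

// Pure helper: should this request be served from the cache?
function isAppShellRequest(pathname, shell = APP_SHELL) {
  return shell.includes(pathname);
}

// Browser-only wiring (no effect outside a service-worker scope):
if (typeof self !== "undefined" && typeof caches !== "undefined") {
  self.addEventListener("install", (event) => {
    event.waitUntil(
      caches.open("shell-v1").then((cache) => cache.addAll(APP_SHELL))
    );
  });
  self.addEventListener("fetch", (event) => {
    const { pathname } = new URL(event.request.url);
    if (isAppShellRequest(pathname)) {
      // Cache-first: instant load offline or on slow connections.
      event.respondWith(
        caches.match(event.request).then((hit) => hit || fetch(event.request))
      );
    }
  });
}
```

The page would register it with navigator.serviceWorker.register("/sw.js") and link a web app manifest to complete the PWA pair.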
But in general, a web app (a site built with a JS framework) is the best candidate to become a PWA.
Building a web app to be fast: lazy loading, API-based content (user clicks to load).
Refer to JS SEO best practices.
Expand on lazy loading: IntersectionObserver and the loading="lazy" attribute.
AMP is fast for a lot of reasons that, technically, can be replicated outside of the AMP framework (lazy loading, limited JS, CDN, etc.)
BUT what AMP has that “normal” pages don’t is pre-loading in the SERP (AMP viewer).
If Google starts pre-rendering the “10 blue links”, then AMP has no reason to exist.
https://amp.dev/about/how-amp-works/
https://medium.com/@cramforce/why-amp-is-fast-7d2ff1f48597
Lazy loading
Extensive use of preconnect
Prefetching of lazy loaded resources
All async JavaScript
Inline style sheets
Zero HTTP requests block font downloads.
Instant loading through prerendering
Prerendering only downloads resources above the fold
Prerendering does not render things that might be expensive in terms of CPU
Intelligent resource prioritization
Uncoupling of document layout from resource downloads
Maximum size for style sheet
FastDOM-style DOM change batching
Optimized for low count of style recalculations and layout
Mitigations for third party JS worst-practices such as document.write
Runtime cost of analytics instrumentation is independent of number of used analytics providers
Extensions don’t block page layout
CDN delivery available to all AMP documents
All resources and the document are loaded from the same origin through the same HTTP 2.0 tunnel
Animations can be GPU accelerated
User gets the AMP from the SERP
Service worker is installed on device
Once activated, SW caches the “app shell” and initial data
User clicks on an (internal) link
Service worker “hijacks” the click
Pre-cached PWA loads instantly
ServiceWorker “hijacks” the click – Server handles the rest
Google and search engines only get the AMP version of your URLs/pages
- Not the canonical or “normal” URL where images (img + src) can be found
https://amp.dev/documentation/guides-and-tutorials/develop/media_iframes_3p/
https://amp.dev/documentation/guides-and-tutorials/optimize-and-measure/server-side-rendering/
Bots only crawl/index the AMP version of the site
No waste of crawling resources over multiple URLs for the same content
Clear signaling (i.e. don’t worry about all of those canonical/alternate tags)
Pages are fast and pre-loaded in the SERP (AMP viewer or “Real URL”)
Pages are mobile-friendly