
What a search engine can teach you about product sitemaps - BrightonSEO April 2018


Pricesearcher is a vertical search engine and our mission is to give consumers the complete view. Our technology processes 500m+ prices per day across 10 countries.
In 2017 Pricesearcher launched its web crawler, PriceBot, to complete the indexing of all UK prices.
In this talk we will analyse what PriceBot discovered and how this information can help you improve the crawlability of your own site.



  1. Vlassios Rizopoulos, Chief Technology Officer @ pricesearcher.com: What a search engine can teach you about product sitemaps. @Pricesearcher #BrightonSEO
  2. BACKGROUND: Pricesearcher is a vertical search engine focusing on products and their prices. Our mission is to provide access to all the world's prices in one place.
  3. OUR MISSION IS TO INDEX ALL THE WORLD’S PRICES
  4. SOURCES OF DATA: • Product feeds from 5000+ retailers • Developed plugins • Developed PriceBot to complete the picture
  5. PROGRESS TO DATE: • Gathered data on 1.1 billion products • Online in 11 countries (GB / US / DE / FR / IT / IE / NO / SE / FI / DK / NG) • Gathered 91 billion price points for our products • On average we check the price of a product 3 times a day • Product identifiers gathered: 17,000,000 ISBNs, 144,000,000 MPNs, 73,000,000 SKUs, 157,000,000 GTINs
  6. WHAT IS PRICEBOT? PriceBot is our proprietary crawler, built to discover products and turn unstructured data from web pages into structured data for our product database. Pricesearcher is the only product search engine that crawls the web to complement its product coverage. PriceBot is fully robots.txt compliant, leaves a footprint in its user agent and has a built-in feedback mechanism: http://www.pricesearcher.com/pricebot
  7. WHAT INFORMATION IS PRICEBOT COLLECTING? We are looking to extract the following fields: • Product Title • Product Image • Product Price and, optionally: • Product Description • Product Identifier (GTIN/UPC/EAN/ISBN) • Product Brand • Product Category • Product Stock Availability
  8. INITIAL CRAWLING TECH DEPENDED ON SITEMAPS: this vastly simplified discovering all of each retailer's products.
  9. DATA SAMPLE: We will focus on the 4000 UK retailers we currently crawl using XML sitemaps, discovering 20 million+ products.
  10. TOP 10 data insights from our crawling tech
  11. INSIGHT #1 - SITEMAP DATA: • 91% of retailer websites have an XML sitemap • 61% have an XML sitemap with product links • 54% have an XML sitemap with product links that is regularly updated
  12. INSIGHT #2 - BLOCKING OF CRAWLERS: • 2% of retailer websites have blocked us unintentionally (generic robots.txt entry or automatic 403 block) • 0.05% of retailer websites have blocked us intentionally (robots.txt entry)
  13. INSIGHT #3 - EXTRACTION USING METADATA STANDARDS: • 41% of retailer websites define product title + price + image using meta / Open Graph tags • 36% define them using meta / itemprop (schema.org) tags • 12% define them using both
  14. INSIGHT #4 - EXTRACTION USING JAVASCRIPT: • 2% of retailer websites: no information extracted, because rendering their heavy JavaScript is uneconomical • 1% of retailer websites: the price cannot be extracted, because it is converted / calculated on the fly
  15. INSIGHT #5 - SITEMAP LINKS: • 2% of retailer websites have multiple links to the same product pages • 3% have multiple links to pages that return 404 codes
  16. INSIGHT #6 - PRODUCT IDENTIFIERS: • 24% of retailer websites provide a GTIN-14, EAN-13 or UPC-12/8 for their products • 7% provide an SKU • 3% provide an ISBN
  17. INSIGHT #7 - PRODUCT CATALOGUE SIZE: • 14% of retailer websites have fewer than 5000 product links in their sitemap • 79% have between 5000 and 30000 links • 7% have more than 30000 links
  18. INSIGHT #8 - DATA RICHNESS #1: • 17% of retailer websites provide a brand for their products • 44% provide a category • 62% provide a stock indicator
  19. INSIGHT #9 - DATA RICHNESS #2, NUMBER OF DIMENSIONS: • Crawler: 6 dimensions • Plugin: 12 dimensions • Product feed: 23 dimensions
  20. INSIGHT #10 - SITEMAP DISCOVERABILITY: 33% of retailer websites list their sitemap in robots.txt
  21. TOP 5 suggested action points
  22. ACTION POINT #1 - SITEMAP • Have an XML sitemap • Have the path of your sitemap listed in robots.txt • Have your product pages in your sitemap • Regularly update your sitemap • Don't point to 404 pages from your sitemap
  23. ACTION POINT #2 - META / OPEN GRAPH / ITEMPROP • Provide structured information on your products using meta itemprop (schema.org) or Open Graph tags • Provide as much structured data as possible • Implement them as closely as possible to the standards
  24. ACTION POINT #3 - JAVASCRIPT & PRICE • Be wary of the side effects of a JavaScript-heavy site on crawling • If you do implement a JavaScript-heavy site, meta tags with structured information are even more important! • Be wary when converting the price based on geolocation • Don't perform the price conversion in JavaScript
  25. ACTION POINT #4 - ANTI-CRAWL & ROBOTS.TXT • Ask yourselves what the benefit of an anti-crawl mechanism is • Ask yourselves what the benefit of blocking all crawlers in robots.txt is • Control the speed of crawlers using Crawl-delay
  26. ACTION POINT #5 - HAVE A SITEMAP MEETING • Have a sitemap strategy; it's just as important as your SEO strategy • Sitemaps contribute massively to discoverability, yet are often overlooked • Make sure you are doing everything you can to provide structured information • Review the contents of your robots.txt • Address missed opportunities in your sitemap sooner rather than later
  27. THANKS FOR LISTENING! PriceBot: http://www.pricesearcher.com/pricebot. I'm keen to hear your feedback about PriceBot or Pricesearcher in general. Feel free to drop me a line at vlassios@pricesearcher.com or catch up with me at our stand, B11, in the expo hall.
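Slides 12, 20 and 25 all come back to robots.txt. Slide 12 covers crawlers blocked by a generic entry, slide 20 covers listing the sitemap there, and slide 25 covers Crawl-delay. The sketch below uses Python's standard `urllib.robotparser` (Python 3.8+ for `site_maps()`) to read one illustrative file. It shows how a named group overrides a catch-all `User-agent: *` block, and how a crawler can discover the sitemap and pick up a delay from the same file.

Several things here are assumptions for illustration: the `PriceBot` token, the `OtherBot` name and the `shop.example` URLs. This is not PriceBot's documented user-agent string or behaviour. It also covers only the robots.txt case, not an automatic 403 block from an anti-bot layer.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt for a hypothetical shop.example retailer.
ROBOTS_TXT = """\
User-agent: PriceBot
Crawl-delay: 5
Disallow: /basket/

User-agent: *
Disallow: /

Sitemap: https://shop.example/sitemap-products.xml
"""


def load_robots(text):
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())  # parse() also marks the rules as loaded
    return parser


robots = load_robots(ROBOTS_TXT)
product = "https://shop.example/products/123"

# The named group lets this one crawler in; everyone else hits the catch-all block.
print(robots.can_fetch("PriceBot", product))
print(robots.can_fetch("OtherBot", product))
# Sitemap lines apply regardless of user-agent group.
print(robots.site_maps())
# Crawl-delay for the named crawler, in seconds.
print(robots.crawl_delay("PriceBot"))
```

Deleting the `User-agent: PriceBot` group reproduces the "unintentional" case: `can_fetch` then falls through to the catch-all rule and returns `False` for every product page.

Crawl-delay is not part of the original robots.txt convention, and not every crawler honours it (Googlebot, for example, ignores it). Python's parser also only accepts whole-number values.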
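Slide 13 splits retailers by whether title, price and image are exposed through Open Graph tags, through schema.org itemprop tags, or both. The sketch below shows how a crawler might read both kinds of `<meta>` tag with Python's standard `html.parser`.

It assumes a particular mapping from tags to fields:
- `og:title` / `og:image` / `product:price:amount` for Open Graph;
- `name` / `image` / `price` for itemprop.

It also reads only `<meta content="...">` values. Merging with itemprop taking precedence is an arbitrary choice for the example, not how PriceBot necessarily works.

```python
from html.parser import HTMLParser

# Assumed mapping from tag names to the three required fields (slide 7).
OG_KEYS = {"og:title": "title", "og:image": "image", "product:price:amount": "price"}
ITEMPROP_KEYS = {"name": "title", "image": "image", "price": "price"}
REQUIRED = ("title", "price", "image")


class ProductMetaParser(HTMLParser):
    """Collects product fields from <meta property=...> and <meta itemprop=...> tags."""

    def __init__(self):
        super().__init__()
        self.opengraph = {}
        self.itemprop = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if content is None:
            return
        prop = attributes.get("property")
        if prop in OG_KEYS:
            self.opengraph.setdefault(OG_KEYS[prop], content)  # keep the first value seen
        item = attributes.get("itemprop")
        if item in ITEMPROP_KEYS:
            self.itemprop.setdefault(ITEMPROP_KEYS[item], content)


def extract_product(html):
    """Return title/price/image if all three are found, otherwise None."""
    parser = ProductMetaParser()
    parser.feed(html)
    merged = {**parser.opengraph, **parser.itemprop}  # itemprop values win on conflict
    return merged if all(key in merged for key in REQUIRED) else None


PAGE = """<html><head>
<meta property="og:title" content="Blue Kettle 1.7L">
<meta property="og:image" content="https://shop.example/img/kettle.jpg">
<meta itemprop="price" content="24.99">
</head><body></body></html>"""

print(extract_product(PAGE))
```

The example page mixes both standards, which is the 12% case on slide 13. Neither tag set alone carries all three fields, but together they are enough to index the product.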
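Slide 15 reports sitemaps with multiple links to the same product page. A minimal way to audit your own sitemap is to parse its `<loc>` entries and count them after normalising each URL. The sketch below does this with `xml.etree.ElementTree` and `urllib.parse`.

The normalisation rule (lower-case scheme and host, drop fragment and trailing slash) is an assumption for illustration; real duplicates may also differ by tracking parameters. Finding 404s would additionally need an HTTP request per URL, which is left out to keep the example offline.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def normalise(url):
    """Assumed rule: lower-case scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def duplicate_locs(sitemap_xml):
    """Return normalised URLs that appear more than once in a <urlset> sitemap."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    counts = Counter(normalise(loc) for loc in locs)
    return sorted(url for url, n in counts.items() if n > 1)


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/products/kettle</loc></url>
  <url><loc>https://SHOP.example/products/kettle/</loc></url>
  <url><loc>https://shop.example/products/toaster</loc></url>
</urlset>"""

print(duplicate_locs(SAMPLE))
```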
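Slide 16 shows that only about a quarter of retailers expose a GTIN, EAN or UPC. When one is published, a crawler can cheaply reject mistyped codes with the GS1 check digit. Walking right to left from the digit next to the check digit, the digits are weighted 3, 1, 3, 1, and so on. The check digit is whatever brings the weighted sum up to a multiple of 10. ISBN-13s are EAN-13 codes, so the same rule applies to them. Below is a small sketch of that validation.

```python
import re


def is_valid_gtin(code):
    """Validate the check digit of a GTIN-8, GTIN-12 (UPC-A), GTIN-13 (EAN-13/ISBN-13) or GTIN-14."""
    if not re.fullmatch(r"[0-9]{8}|[0-9]{12,14}", code):
        return False
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    # Weights 3, 1, 3, 1, ... starting from the digit immediately left of the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


print(is_valid_gtin("4006381333931"))  # True  (EAN-13)
print(is_valid_gtin("036000291452"))   # True  (UPC-A)
print(is_valid_gtin("4006381333932"))  # False (wrong check digit)
```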
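Action point #1 (slide 22) asks for an XML sitemap that contains your product pages, is regularly updated and has no broken links. The sketch below builds a sitemaps.org `<urlset>` with `xml.etree.ElementTree`. It writes one `<url>` per page with a `<lastmod>` date and skips repeated URLs.

It assumes the caller passes only live product pages, so 404 filtering happens upstream. It also assumes `datetime.date` values for the modification dates. The sitemaps.org protocol caps a single sitemap file at 50,000 URLs (and 50 MB uncompressed); larger catalogues need several files plus a sitemap index, which this sketch does not build.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemap(pages):
    """pages: iterable of (url, last_modified) pairs; last_modified is a datetime.date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url, modified in pages:
        if url in seen:
            continue  # one <url> entry per page, no duplicate links
        seen.add(url)
        if len(seen) > MAX_URLS:
            raise ValueError("over 50,000 URLs: split into several sitemaps plus a sitemap index")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, < and > on output
        ET.SubElement(entry, "lastmod").text = modified.isoformat()  # e.g. 2018-04-20
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


xml_out = build_sitemap([
    ("https://shop.example/products/kettle", date(2018, 4, 20)),
    ("https://shop.example/products/kettle", date(2018, 4, 20)),
    ("https://shop.example/products/toaster?colour=red&size=2", date(2018, 4, 22)),
])
print(xml_out)
```

Regenerating this file whenever the catalogue changes is what keeps `<lastmod>` meaningful. Its path then belongs in a `Sitemap:` line in robots.txt.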
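Action point #2 (slide 23) asks retailers to provide as much structured data as possible, as close to the standard as possible. Slide 18 shows that brand and stock information are often missing. The sketch below renders a schema.org `Product` with a nested `Offer` as HTML microdata. It covers name, image, brand (as a nested `Brand`), `gtin13`, price, currency and availability.

The function name and parameters are hypothetical, and values are HTML-escaped with the standard `html.escape`. Schema.org also allows the same data as JSON-LD; microdata is used here only because the slides talk about itemprop tags.

```python
from html import escape


def product_microdata(name, image_url, price, currency, brand=None, gtin13=None, in_stock=True):
    """Render a schema.org Product with a nested Offer as HTML microdata (illustrative helper)."""
    availability = "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    out = [
        '<div itemscope itemtype="https://schema.org/Product">',
        f'  <meta itemprop="name" content="{escape(name)}">',
        f'  <link itemprop="image" href="{escape(image_url)}">',
    ]
    if brand:
        out += [
            '  <div itemprop="brand" itemscope itemtype="https://schema.org/Brand">',
            f'    <meta itemprop="name" content="{escape(brand)}">',
            "  </div>",
        ]
    if gtin13:
        out.append(f'  <meta itemprop="gtin13" content="{escape(gtin13)}">')
    out += [
        '  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">',
        f'    <meta itemprop="price" content="{escape(str(price))}">',
        f'    <meta itemprop="priceCurrency" content="{escape(currency)}">',
        f'    <link itemprop="availability" href="{availability}">',
        "  </div>",
        "</div>",
    ]
    return "\n".join(out)


snippet = product_microdata(
    "Kettle & Co 1.7L", "https://shop.example/img/kettle.jpg", "24.99", "GBP",
    brand="Acme", gtin13="4006381333931",
)
print(snippet)
```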
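Action point #3 (slide 24) advises against converting prices in JavaScript. Slide 14 shows that 1% of retailers' prices could not be extracted because they were calculated on the fly. One way to follow that advice is to do the conversion on the server and write the result straight into the HTML and its price meta tags. The sketch below does this with Python's `decimal` module to avoid floating-point rounding surprises.

The exchange rate, the `price` CSS class and the function names are all hypothetical placeholders, not real rates or any particular site's markup.

```python
from decimal import ROUND_HALF_UP, Decimal

# Hypothetical rates for illustration; a real site would load these from its own pricing service.
RATES_FROM_GBP = {"GBP": Decimal("1"), "EUR": Decimal("1.14")}


def convert_price(amount_gbp, currency):
    """Convert on the server with Decimal arithmetic, rounded half-up to 2 decimal places."""
    converted = amount_gbp * RATES_FROM_GBP[currency]
    return converted.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def price_html(amount_gbp, currency):
    """Emit the already-converted price, so a crawler sees it without running any JavaScript."""
    price = convert_price(amount_gbp, currency)
    return (
        f'<span class="price">{price} {currency}</span>\n'
        f'<meta itemprop="price" content="{price}">\n'
        f'<meta itemprop="priceCurrency" content="{currency}">'
    )


print(price_html(Decimal("24.99"), "EUR"))
```

Because the currency is emitted next to the converted amount, a crawler does not have to guess which geolocated price it was served.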
