Making AJAX crawlable by Katharina Probst & Bruce Johnson
1. Making AJAX crawlable
Katharina Probst
Engineer, Google
Bruce Johnson
Engineering Manager, Google
in collaboration with:
Arup Mukherjee, Erik van der Poel, Li Xiao, Google
2. The problem of AJAX for web crawlers
Web crawlers don't always see what the user sees
● JavaScript produces dynamic content that is not seen by crawlers
● Example: A Google Web Toolkit application that looks like this to a user...
...but a web crawler only sees this:
<script src='showcase.js'></script>
3. Why does this problem need to be solved?
● Web 2.0: More content on the web is created dynamically (~69%)
● Over time, this hurts search
● Developers are discouraged from building dynamic apps
● Not solving AJAX crawlability holds back progress on the web!
4. A crawler's view of the web - with and without AJAX
Without AJAX: the browser and the crawler both request www.example.com/mystate
from the web server, so the crawler sees what the user sees.
With AJAX: the browser requests www.example.com/ and JavaScript builds the
#mystate state client-side, so the crawler sees only www.example.com/ and
cannot see #mystate.
5. Goal: crawl and index AJAX
● Crawling and indexing AJAX is needed for users and developers
● Problem: Which AJAX states can be indexed?
○ Explicit opt-in needed by the web server
● Problem: Don't want to cloak
○ Users and search engine crawlers need to see the same content
● Problem: How could the logistics work?
○ That's the remainder of the presentation
6. Possible solutions
● Crawlers execute all the web's JavaScript
○ This is expensive and time-consuming
○ Only major search engines would even be able to do this, and
probably only partially
○ Indexes would be more stale, resulting in worse search results
● Web servers execute their own JavaScript at crawl time
○ Avoids above problems
○ Gives more control to webmasters
○ Can be done automatically
○ Does not require ongoing maintenance
7. Overview of proposed approach - crawl time
Participants: crawler, your web server, headless browser
1. Crawler maps from pretty URL to ugly URL:
www.example.com/page?query&_escaped_fragment_=mystate
2. Crawler requests ugly URL
3. Server maps from ugly URL to pretty URL:
www.example.com/page?query#!mystate
4. Server invokes headless browser
5. Headless browser executes JavaScript and produces an HTML snapshot for the pretty URL
6. Crawler processes the HTML snapshot and extracts pretty URLs
Crawling is enabled by mapping between
● "pretty" URLs: www.example.com/page?query#!mystate
● "ugly" URLs: www.example.com/page?query&_escaped_fragment_=mystate
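The pretty/ugly mapping from steps 1 and 3 can be sketched in a few lines. This is a minimal illustration of the transformation described on this slide, not the full specification (which also covers edge cases in percent-encoding of the fragment value):

```python
from urllib.parse import quote, unquote

def pretty_to_ugly(url: str) -> str:
    """Map a pretty URL (#!state) to its ugly form, as the crawler does."""
    if "#!" not in url:
        return url  # not an AJAX-crawlable URL; leave unchanged
    base, _, state = url.partition("#!")
    # Use '&' if the URL already has a query string, '?' otherwise.
    sep = "&" if "?" in base else "?"
    return f"{base}{sep}_escaped_fragment_={quote(state, safe='')}"

def ugly_to_pretty(url: str) -> str:
    """Inverse mapping, as the web server does before snapshotting."""
    marker = "_escaped_fragment_="
    if marker not in url:
        return url
    base, _, state = url.partition(marker)
    return f"{base.rstrip('&?')}#!{unquote(state)}"

print(pretty_to_ugly("http://www.example.com/page?query#!mystate"))
# -> http://www.example.com/page?query&_escaped_fragment_=mystate
```

Note that the mapping is purely mechanical, so both sides can apply it without any coordination beyond agreeing on the `#!` and `_escaped_fragment_` conventions.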
8. Overview of proposed approach - search time
1. User submits query to the search engine
2. Search engine returns pretty URL:
www.example.com/page?query#!mystate
3. User clicks on pretty URL link
4. Browser loads the pretty URL as usual:
www.example.com/page?query#!mystate
Nothing changes!
9. Agreement between participants
● Web servers agree to
○ opt in by indicating indexable states
○ execute JavaScript for ugly URLs (no user agent sniffing!)
○ not cloak: always give the same content to browser and crawler
regardless of request (or risk removal from the index, as with existing cloaking policies)
● Search engines agree to
○ discover URLs as before (Sitemaps, hyperlinks)
○ modify pretty URLs to ugly URLs
○ index content
○ display pretty URLs
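On the server side, this agreement boils down to branching on the `_escaped_fragment_` query parameter alone. The sketch below is an assumption about one way a server might implement it; `render_snapshot` is a hypothetical stand-in for invoking a headless browser, and the page markup is illustrative:

```python
from urllib.parse import parse_qs, urlparse

def render_snapshot(state: str) -> str:
    # Hypothetical stand-in for steps 4-5 of the crawl-time flow: invoke a
    # headless browser, execute the page's JavaScript for `state`, and
    # serialize the resulting DOM as static HTML.
    return f"<html><body>Rendered content for state '{state}'</body></html>"

def handle_request(url: str) -> str:
    # Decide based only on the _escaped_fragment_ parameter -- never on
    # user-agent sniffing, per the agreement above.
    params = parse_qs(urlparse(url).query, keep_blank_values=True)
    if "_escaped_fragment_" in params:
        return render_snapshot(params["_escaped_fragment_"][0])
    # Normal request: serve the AJAX page unchanged.
    return "<html><body><script src='showcase.js'></script></body></html>"
```

Because the branch depends only on the URL, a browser that happened to request the ugly URL would get the same snapshot the crawler gets, which is what keeps this scheme distinct from cloaking.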
10. Summary: Life of a URL
http://example.com/stocks.html#GOOG
could easily be changed to
http://example.com/stocks.html#!GOOG
which can be crawled as
http://example.com/stocks.html?_escaped_fragment_=GOOG
but will be displayed in the search results as
http://example.com/stocks.html#!GOOG
11. Feedback is welcome
● We are currently working on a proposal and prototype implementation
● Check out the blog post on the Google Webmaster Central Blog:
http://googlewebmastercentral.blogspot.com
● We welcome feedback from the community at the Google Webmaster
Help Forum (link is posted in the blog entry)