SlideShare une entreprise Scribd logo
1  sur  12
Web crawler
Allan@eSobi.com
Agenda
• Forward of Web Crawler
– HTML Parser

• Practice
– Feed Crawler

• Prototype demo
• Conclusion
HTML Parser
• HTML found on Web is usually dirty,
ill-formed and unsuitable for further
processing.
• First clean up the mess and bring the
order to tags, attributes and
ordinary text.
Well-known Parser
• Access the information using
standard XML interfaces.
• HtmlCleaner
• HtmlParser
• Nekohtml
Parser inner structure
• HTML scanner

– Pre-processing action

• Tag balancer

– Reorders individual elements
– Produces well-formed XML

• Extraction
• Transformation
Example
Extraction
• Text extraction
– for use as input for text search engine
databases for example

• Link extraction
– for crawling through web pages or harvesting
email addresses

• Screen scraping
– for programmatic data input from web pages
Extraction

• Resource extraction

– collecting images or sound

• A browser front end

– the preliminary stage of page display

• Link checking

– ensuring links are valid

• Site monitoring

– checking for page differences beyond
simplistic diffs
Transformation
• URL rewriting
– modifying some or all links on a page

• Site capture
– moving content from the web to local disk

• Censorship
– removing offending words and phrases from
pages
Transformation
• HTML cleanup
– correcting erroneous pages

• AD removal
– excising URLs referencing advertising

• Conversion to XML
– moving existing web pages to XML
Practice
• Feed Crawler
– HTML

• Bloglines, Feedage

– XML

• RssMountain

– JSON

• Google AJAX Feed API

• Prototype
– Demo
Conclusion
• Page search, image search, news
search, blog search, feed search ...
• Fault toleration of text processing
• Text mining in web
• Q&A

Contenu connexe

En vedette (11)

Week10 Web Presentation
Week10 Web PresentationWeek10 Web Presentation
Week10 Web Presentation
 
Crawling the web for fun and profit
Crawling the web for fun and profitCrawling the web for fun and profit
Crawling the web for fun and profit
 
Web crawler and applications
Web crawler and applicationsWeb crawler and applications
Web crawler and applications
 
search engines
search enginessearch engines
search engines
 
Web Crawlers
Web CrawlersWeb Crawlers
Web Crawlers
 
Search engine and web crawler
Search engine and web crawlerSearch engine and web crawler
Search engine and web crawler
 
Current challenges in web crawling
Current challenges in web crawlingCurrent challenges in web crawling
Current challenges in web crawling
 
WebCrawler
WebCrawlerWebCrawler
WebCrawler
 
Web Crawler
Web CrawlerWeb Crawler
Web Crawler
 
Web crawler
Web crawlerWeb crawler
Web crawler
 
Web crawler
Web crawlerWeb crawler
Web crawler
 

Similaire à Web Crawler

Search engine and web crawler
Search engine and web crawlerSearch engine and web crawler
Search engine and web crawlervinay arora
 
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward
 
Scalability andefficiencypres
Scalability andefficiencypresScalability andefficiencypres
Scalability andefficiencypresNekoGato
 
CNIT 129S: Ch 4: Mapping the Application
CNIT 129S: Ch 4: Mapping the ApplicationCNIT 129S: Ch 4: Mapping the Application
CNIT 129S: Ch 4: Mapping the ApplicationSam Bowne
 
Week 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and ProducingWeek 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and Producingkurtgessler
 
CNIT 129S Ch 4: Mapping the Application
CNIT 129S Ch 4: Mapping the ApplicationCNIT 129S Ch 4: Mapping the Application
CNIT 129S Ch 4: Mapping the ApplicationSam Bowne
 
4 Web Crawler.pptx
4 Web Crawler.pptx4 Web Crawler.pptx
4 Web Crawler.pptxDEEPAK948083
 
Search Engine Optimization, SEO Audits, and Analytics
Search Engine Optimization, SEO Audits, and AnalyticsSearch Engine Optimization, SEO Audits, and Analytics
Search Engine Optimization, SEO Audits, and AnalyticsBill Hartzer
 
SEO for developers (session 1)
SEO for developers (session 1)SEO for developers (session 1)
SEO for developers (session 1)RankAbove
 
On page optimization
On page optimizationOn page optimization
On page optimizationshieepoo
 
Search engine optimization (seo) from Endeca & ATG
Search engine optimization (seo) from Endeca & ATGSearch engine optimization (seo) from Endeca & ATG
Search engine optimization (seo) from Endeca & ATGVignesh sitaraman
 
How To Construct A Search Engine Friendly Website
How To Construct A Search Engine Friendly WebsiteHow To Construct A Search Engine Friendly Website
How To Construct A Search Engine Friendly WebsiteMatt Siltala
 
Web Mining.pptx
Web Mining.pptxWeb Mining.pptx
Web Mining.pptxScrbifPt
 

Similaire à Web Crawler (20)

Search engine and web crawler
Search engine and web crawlerSearch engine and web crawler
Search engine and web crawler
 
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
 
Scalability andefficiencypres
Scalability andefficiencypresScalability andefficiencypres
Scalability andefficiencypres
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Web mining
Web miningWeb mining
Web mining
 
CNIT 129S: Ch 4: Mapping the Application
CNIT 129S: Ch 4: Mapping the ApplicationCNIT 129S: Ch 4: Mapping the Application
CNIT 129S: Ch 4: Mapping the Application
 
Week 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and ProducingWeek 1 - Interactive News Editing and Producing
Week 1 - Interactive News Editing and Producing
 
CNIT 129S Ch 4: Mapping the Application
CNIT 129S Ch 4: Mapping the ApplicationCNIT 129S Ch 4: Mapping the Application
CNIT 129S Ch 4: Mapping the Application
 
4 Web Crawler.pptx
4 Web Crawler.pptx4 Web Crawler.pptx
4 Web Crawler.pptx
 
Webcrawler
WebcrawlerWebcrawler
Webcrawler
 
Search Engine Optimization, SEO Audits, and Analytics
Search Engine Optimization, SEO Audits, and AnalyticsSearch Engine Optimization, SEO Audits, and Analytics
Search Engine Optimization, SEO Audits, and Analytics
 
Search
SearchSearch
Search
 
On-Page SEO
On-Page  SEO On-Page  SEO
On-Page SEO
 
Webcrawler
WebcrawlerWebcrawler
Webcrawler
 
Module-5 Ppt.pptx
Module-5 Ppt.pptxModule-5 Ppt.pptx
Module-5 Ppt.pptx
 
SEO for developers (session 1)
SEO for developers (session 1)SEO for developers (session 1)
SEO for developers (session 1)
 
On page optimization
On page optimizationOn page optimization
On page optimization
 
Search engine optimization (seo) from Endeca & ATG
Search engine optimization (seo) from Endeca & ATGSearch engine optimization (seo) from Endeca & ATG
Search engine optimization (seo) from Endeca & ATG
 
How To Construct A Search Engine Friendly Website
How To Construct A Search Engine Friendly WebsiteHow To Construct A Search Engine Friendly Website
How To Construct A Search Engine Friendly Website
 
Web Mining.pptx
Web Mining.pptxWeb Mining.pptx
Web Mining.pptx
 

Plus de Allan Huang

Concurrency in Java
Concurrency in  JavaConcurrency in  Java
Concurrency in JavaAllan Huang
 
Build, logging, and unit test tools
Build, logging, and unit test toolsBuild, logging, and unit test tools
Build, logging, and unit test toolsAllan Huang
 
Java JSON Parser Comparison
Java JSON Parser ComparisonJava JSON Parser Comparison
Java JSON Parser ComparisonAllan Huang
 
Netty 4-based RPC System Development
Netty 4-based RPC System DevelopmentNetty 4-based RPC System Development
Netty 4-based RPC System DevelopmentAllan Huang
 
eSobi Website Multilayered Architecture
eSobi Website Multilayered ArchitectureeSobi Website Multilayered Architecture
eSobi Website Multilayered ArchitectureAllan Huang
 
Java New Evolution
Java New EvolutionJava New Evolution
Java New EvolutionAllan Huang
 
Tomcat New Evolution
Tomcat New EvolutionTomcat New Evolution
Tomcat New EvolutionAllan Huang
 
JQuery New Evolution
JQuery New EvolutionJQuery New Evolution
JQuery New EvolutionAllan Huang
 
Responsive Web Design
Responsive Web DesignResponsive Web Design
Responsive Web DesignAllan Huang
 
Boilerpipe Integration And Improvement
Boilerpipe Integration And ImprovementBoilerpipe Integration And Improvement
Boilerpipe Integration And ImprovementAllan Huang
 
Build Cross-Platform Mobile Application with PhoneGap
Build Cross-Platform Mobile Application with PhoneGapBuild Cross-Platform Mobile Application with PhoneGap
Build Cross-Platform Mobile Application with PhoneGapAllan Huang
 
HTML5 Multithreading
HTML5 MultithreadingHTML5 Multithreading
HTML5 MultithreadingAllan Huang
 
HTML5 Offline Web Application
HTML5 Offline Web ApplicationHTML5 Offline Web Application
HTML5 Offline Web ApplicationAllan Huang
 
HTML5 Data Storage
HTML5 Data StorageHTML5 Data Storage
HTML5 Data StorageAllan Huang
 
Java Script Patterns
Java Script PatternsJava Script Patterns
Java Script PatternsAllan Huang
 
Weighted feed recommand
Weighted feed recommandWeighted feed recommand
Weighted feed recommandAllan Huang
 
eSobi Site Initiation
eSobi Site InitiationeSobi Site Initiation
eSobi Site InitiationAllan Huang
 
Architecture of eSobi club based on J2EE
Architecture of eSobi club based on J2EEArchitecture of eSobi club based on J2EE
Architecture of eSobi club based on J2EEAllan Huang
 

Plus de Allan Huang (20)

Concurrency in Java
Concurrency in  JavaConcurrency in  Java
Concurrency in Java
 
Build, logging, and unit test tools
Build, logging, and unit test toolsBuild, logging, and unit test tools
Build, logging, and unit test tools
 
Drools
DroolsDrools
Drools
 
Java JSON Parser Comparison
Java JSON Parser ComparisonJava JSON Parser Comparison
Java JSON Parser Comparison
 
Netty 4-based RPC System Development
Netty 4-based RPC System DevelopmentNetty 4-based RPC System Development
Netty 4-based RPC System Development
 
eSobi Website Multilayered Architecture
eSobi Website Multilayered ArchitectureeSobi Website Multilayered Architecture
eSobi Website Multilayered Architecture
 
Java New Evolution
Java New EvolutionJava New Evolution
Java New Evolution
 
Tomcat New Evolution
Tomcat New EvolutionTomcat New Evolution
Tomcat New Evolution
 
JQuery New Evolution
JQuery New EvolutionJQuery New Evolution
JQuery New Evolution
 
Responsive Web Design
Responsive Web DesignResponsive Web Design
Responsive Web Design
 
Boilerpipe Integration And Improvement
Boilerpipe Integration And ImprovementBoilerpipe Integration And Improvement
Boilerpipe Integration And Improvement
 
YQL Case Study
YQL Case StudyYQL Case Study
YQL Case Study
 
Build Cross-Platform Mobile Application with PhoneGap
Build Cross-Platform Mobile Application with PhoneGapBuild Cross-Platform Mobile Application with PhoneGap
Build Cross-Platform Mobile Application with PhoneGap
 
HTML5 Multithreading
HTML5 MultithreadingHTML5 Multithreading
HTML5 Multithreading
 
HTML5 Offline Web Application
HTML5 Offline Web ApplicationHTML5 Offline Web Application
HTML5 Offline Web Application
 
HTML5 Data Storage
HTML5 Data StorageHTML5 Data Storage
HTML5 Data Storage
 
Java Script Patterns
Java Script PatternsJava Script Patterns
Java Script Patterns
 
Weighted feed recommand
Weighted feed recommandWeighted feed recommand
Weighted feed recommand
 
eSobi Site Initiation
eSobi Site InitiationeSobi Site Initiation
eSobi Site Initiation
 
Architecture of eSobi club based on J2EE
Architecture of eSobi club based on J2EEArchitecture of eSobi club based on J2EE
Architecture of eSobi club based on J2EE
 

Dernier

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Dernier (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Web Crawler

  • 2. Agenda • Forward of Web Crawler – HTML Parser • Practice – Feed Crawler • Prototype demo • Conclusion
  • 3. HTML Parser • HTML found on Web is usually dirty, ill-formed and unsuitable for further processing. • First clean up the mess and bring the order to tags, attributes and ordinary text.
  • 4. Well-known Parser • Access the information using standard XML interfaces. • HtmlCleaner • HtmlParser • Nekohtml
  • 5. Parser inner structure • HTML scanner – Pre-processing action • Tag balancer – Reorders individual elements – Produces well-formed XML • Extraction • Transformation
  • 7. Extraction • Text extraction – for use as input for text search engine databases for example • Link extraction – for crawling through web pages or harvesting email addresses • Screen scraping – for programmatic data input from web pages
  • 8. Extraction • Resource extraction – collecting images or sound • A browser front end – the preliminary stage of page display • Link checking – ensuring links are valid • Site monitoring – checking for page differences beyond simplistic diffs
  • 9. Transformation • URL rewriting – modifying some or all links on a page • Site capture – moving content from the web to local disk • Censorship – removing offending words and phrases from pages
  • 10. Transformation • HTML cleanup – correcting erroneous pages • AD removal – excising URLs referencing advertising • Conversion to XML – moving existing web pages to XML
  • 11. Practice • Feed Crawler – HTML • Bloglines, Feedage – XML • RssMountain – JSON • Google AJAX Feed API • Prototype – Demo
  • 12. Conclusion • Page search, image search, news search, blog search, feed search ... • Fault toleration of text processing • Text mining in web • Q&A