SlideShare une entreprise Scribd logo
1  sur  10
Making a Web Crawler
Code
Why has the “ –r ” option been included? -r turns on recursive retrieving, which is essential to the working of a crawler. Without it the crawler cannot retrieve the links as can be seen when we remove this option. Understanding the code
 
Why has the “ –spider ” option been included? This option makes wget behave like a spider, i.e, it will not download the web pages, it will just check that they are there.
 
Why has the “ -domains ”  option been included? This option specifies the domain of the search. We have limited the crawling to the URL specified by the user only. The next slide shows the crawler's response when this is not done
 
Clearly, the crawler cannot access www.google.co.in as the host name is not the same as www.google.com
Why has the “ -l 5 ”  option been included? This option specifies the depth of the search. It is a precaution to avoid spider traps.  Why has the “ --tries = 5 ”  option been included? This option specifies the number of retries which the crawler will make in case connection with the URL has failed.

Contenu connexe

Tendances

Web crawler synopsis
Web crawler synopsisWeb crawler synopsis
Web crawler synopsisMayur Garg
 
Web crawler and applications
Web crawler and applicationsWeb crawler and applications
Web crawler and applicationsPartnered Health
 
Colloquim Report on Crawler - 1 Dec 2014
Colloquim Report on Crawler - 1 Dec 2014Colloquim Report on Crawler - 1 Dec 2014
Colloquim Report on Crawler - 1 Dec 2014Sunny Gupta
 
Colloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web CrawlerColloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web CrawlerAkshay Pratap Singh
 
Design and Implementation of a High- Performance Distributed Web Crawler
Design and Implementation of a High- Performance Distributed Web CrawlerDesign and Implementation of a High- Performance Distributed Web Crawler
Design and Implementation of a High- Performance Distributed Web CrawlerGeorge Ang
 
Web Crawlers - Exploring the WWW
Web Crawlers - Exploring the WWWWeb Crawlers - Exploring the WWW
Web Crawlers - Exploring the WWWSiddhartha Anand
 
Smart crawler a two stage crawler
Smart crawler a two stage crawlerSmart crawler a two stage crawler
Smart crawler a two stage crawlerRishikesh Pathak
 
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...CloudTechnologies
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyIOSR Journals
 
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep WebSmart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep WebS Sai Karthik
 
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.iosrjce
 
Log File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitLog File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitTom Bennet
 
Server Logs: After Excel Fails
Server Logs: After Excel FailsServer Logs: After Excel Fails
Server Logs: After Excel FailsOliver Mason
 
Controlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and RankingControlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and RankingRajesh Magar
 
Using server logs to your advantage
Using server logs to your advantageUsing server logs to your advantage
Using server logs to your advantageAlexandra Johnson
 
Research on Key Technology of Web Reptile
Research on Key Technology of Web ReptileResearch on Key Technology of Web Reptile
Research on Key Technology of Web ReptileIRJESJOURNAL
 

Tendances (20)

Web crawler synopsis
Web crawler synopsisWeb crawler synopsis
Web crawler synopsis
 
Web crawler and applications
Web crawler and applicationsWeb crawler and applications
Web crawler and applications
 
“Web crawler”
“Web crawler”“Web crawler”
“Web crawler”
 
WebCrawler
WebCrawlerWebCrawler
WebCrawler
 
Colloquim Report on Crawler - 1 Dec 2014
Colloquim Report on Crawler - 1 Dec 2014Colloquim Report on Crawler - 1 Dec 2014
Colloquim Report on Crawler - 1 Dec 2014
 
Colloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web CrawlerColloquim Report - Rotto Link Web Crawler
Colloquim Report - Rotto Link Web Crawler
 
Design and Implementation of a High- Performance Distributed Web Crawler
Design and Implementation of a High- Performance Distributed Web CrawlerDesign and Implementation of a High- Performance Distributed Web Crawler
Design and Implementation of a High- Performance Distributed Web Crawler
 
Web Crawlers - Exploring the WWW
Web Crawlers - Exploring the WWWWeb Crawlers - Exploring the WWW
Web Crawlers - Exploring the WWW
 
Smart crawler a two stage crawler
Smart crawler a two stage crawlerSmart crawler a two stage crawler
Smart crawler a two stage crawler
 
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
SMART CRAWLER: A TWO-STAGE CRAWLER FOR EFFICIENTLY HARVESTING DEEP-WEB INTERF...
 
Web crawler
Web crawlerWeb crawler
Web crawler
 
A Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET TechnologyA Novel Interface to a Web Crawler using VB.NET Technology
A Novel Interface to a Web Crawler using VB.NET Technology
 
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep WebSmart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
Smart Crawler -A Two Stage Crawler For Efficiently Harvesting Deep Web
 
Smart crawler a two stage crawler
Smart crawler a two stage crawlerSmart crawler a two stage crawler
Smart crawler a two stage crawler
 
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
Smart Crawler: A Two Stage Crawler for Concept Based Semantic Search Engine.
 
Log File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkitLog File Analysis: The most powerful tool in your SEO toolkit
Log File Analysis: The most powerful tool in your SEO toolkit
 
Server Logs: After Excel Fails
Server Logs: After Excel FailsServer Logs: After Excel Fails
Server Logs: After Excel Fails
 
Controlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and RankingControlling crawler for better Indexation and Ranking
Controlling crawler for better Indexation and Ranking
 
Using server logs to your advantage
Using server logs to your advantageUsing server logs to your advantage
Using server logs to your advantage
 
Research on Key Technology of Web Reptile
Research on Key Technology of Web ReptileResearch on Key Technology of Web Reptile
Research on Key Technology of Web Reptile
 

Similaire à Coding for a wget based Web Crawler

intoduction to Grails Framework
intoduction to Grails Frameworkintoduction to Grails Framework
intoduction to Grails FrameworkHarshdeep Kaur
 
SEO vs Angular
SEO vs AngularSEO vs Angular
SEO vs AngularFrançois
 
Smart Crawler Automation with RMI
Smart Crawler Automation with RMISmart Crawler Automation with RMI
Smart Crawler Automation with RMIIRJET Journal
 
Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012 Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012 Jessica Dunbar
 
11 Advanced Uses of Screaming Frog Nov 2019 DMSS
11 Advanced Uses of Screaming Frog Nov 2019 DMSS11 Advanced Uses of Screaming Frog Nov 2019 DMSS
11 Advanced Uses of Screaming Frog Nov 2019 DMSSOliver Brett
 
Modern SEO Players Guide
Modern SEO Players GuideModern SEO Players Guide
Modern SEO Players GuideMichael King
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine SpidersCJ Jenkins
 
Making Chrome Extension with AngularJS
Making Chrome Extension with AngularJSMaking Chrome Extension with AngularJS
Making Chrome Extension with AngularJSBen Lau
 
Logstash for SEO: come monitorare i Log del Web Server in realtime
Logstash for SEO: come monitorare i Log del Web Server in realtimeLogstash for SEO: come monitorare i Log del Web Server in realtime
Logstash for SEO: come monitorare i Log del Web Server in realtimeAndrea Cardinale
 
Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...
Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...
Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...Simplilearn
 
Selenium webdriver version 4 features by vikas thange xpanxion automation te...
Selenium webdriver version 4 features  by vikas thange xpanxion automation te...Selenium webdriver version 4 features  by vikas thange xpanxion automation te...
Selenium webdriver version 4 features by vikas thange xpanxion automation te...Vikas Thange
 
IRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET Journal
 
How to use url parameters in webmaster tools
How to use url parameters in webmaster toolsHow to use url parameters in webmaster tools
How to use url parameters in webmaster toolsCgColors
 
Comprehensive Browser Automation Solution using Groovy, WebDriver & Obect Model
Comprehensive Browser Automation Solution using Groovy, WebDriver & Obect ModelComprehensive Browser Automation Solution using Groovy, WebDriver & Obect Model
Comprehensive Browser Automation Solution using Groovy, WebDriver & Obect ModelvodQA
 
Agile NCR 2013 - Gaurav Bansal- web_automation
Agile NCR 2013 - Gaurav Bansal- web_automationAgile NCR 2013 - Gaurav Bansal- web_automation
Agile NCR 2013 - Gaurav Bansal- web_automationAgileNCR2013
 
Report - Final_New_phishila
Report - Final_New_phishilaReport - Final_New_phishila
Report - Final_New_phishilaAshwin Palani
 

Similaire à Coding for a wget based Web Crawler (20)

intoduction to Grails Framework
intoduction to Grails Frameworkintoduction to Grails Framework
intoduction to Grails Framework
 
SEO vs Angular
SEO vs AngularSEO vs Angular
SEO vs Angular
 
Spring security
Spring securitySpring security
Spring security
 
Spring security
Spring securitySpring security
Spring security
 
Smart Crawler Automation with RMI
Smart Crawler Automation with RMISmart Crawler Automation with RMI
Smart Crawler Automation with RMI
 
Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012 Technical SEO | Joomla Day Chicago 2012
Technical SEO | Joomla Day Chicago 2012
 
11 Advanced Uses of Screaming Frog Nov 2019 DMSS
11 Advanced Uses of Screaming Frog Nov 2019 DMSS11 Advanced Uses of Screaming Frog Nov 2019 DMSS
11 Advanced Uses of Screaming Frog Nov 2019 DMSS
 
Modern SEO Players Guide
Modern SEO Players GuideModern SEO Players Guide
Modern SEO Players Guide
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine Spiders
 
White Hat Cloaking
White Hat CloakingWhite Hat Cloaking
White Hat Cloaking
 
Making Chrome Extension with AngularJS
Making Chrome Extension with AngularJSMaking Chrome Extension with AngularJS
Making Chrome Extension with AngularJS
 
Logstash for SEO: come monitorare i Log del Web Server in realtime
Logstash for SEO: come monitorare i Log del Web Server in realtimeLogstash for SEO: come monitorare i Log del Web Server in realtime
Logstash for SEO: come monitorare i Log del Web Server in realtime
 
Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...
Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...
Google Search Console Tutorial | How To Use Google Search Console For SEO ? |...
 
SEO for Developers
SEO for DevelopersSEO for Developers
SEO for Developers
 
Selenium webdriver version 4 features by vikas thange xpanxion automation te...
Selenium webdriver version 4 features  by vikas thange xpanxion automation te...Selenium webdriver version 4 features  by vikas thange xpanxion automation te...
Selenium webdriver version 4 features by vikas thange xpanxion automation te...
 
IRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine OptimizationIRJET - Review on Search Engine Optimization
IRJET - Review on Search Engine Optimization
 
How to use url parameters in webmaster tools
How to use url parameters in webmaster toolsHow to use url parameters in webmaster tools
How to use url parameters in webmaster tools
 
Comprehensive Browser Automation Solution using Groovy, WebDriver & Obect Model
Comprehensive Browser Automation Solution using Groovy, WebDriver & Obect ModelComprehensive Browser Automation Solution using Groovy, WebDriver & Obect Model
Comprehensive Browser Automation Solution using Groovy, WebDriver & Obect Model
 
Agile NCR 2013 - Gaurav Bansal- web_automation
Agile NCR 2013 - Gaurav Bansal- web_automationAgile NCR 2013 - Gaurav Bansal- web_automation
Agile NCR 2013 - Gaurav Bansal- web_automation
 
Report - Final_New_phishila
Report - Final_New_phishilaReport - Final_New_phishila
Report - Final_New_phishila
 

Dernier

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Dernier (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Coding for a wget based Web Crawler

  • 1. Making a Web Crawler
  • 3. Why has the “ –r ” option been included? -r turns on recursive retrieving, which is essential to the working of a crawler. Without it the crawler cannot retrieve the links as can be seen when we remove this option. Understanding the code
  • 4.  
  • 5. Why has the “ –spider ” option been included? This option makes wget behave like a spider, i.e, it will not download the web pages, it will just check that they are there.
  • 6.  
  • 7. Why has the “ -domains ” option been included? This option specifies the domain of the search. We have limited the crawling to the URL specified by the user only. The next slide shows the crawler's response when this is not done
  • 8.  
  • 9. Clearly, the crawler cannot access www.google.co.in as the host name is not the same as www.google.com
  • 10. Why has the “ -l 5 ” option been included? This option specifies the depth of the search. It is a precaution to avoid spider traps. Why has the “ --tries = 5 ” option been included? This option specifies the number of retries which the crawler will make in case connection with the URL has failed.