This document provides a project synopsis for developing a highly efficient web crawler. The objective is to browse the World Wide Web in an automated manner. A web crawler is a computer program that methodically explores websites by starting with a list of seed URLs and identifying links to add to a crawl frontier. As the crawler visits URLs, it finds and adds all hyperlinks to the frontier to visit recursively. The synopsis provides an introduction to web crawlers and their uses, as well as the machine specifications required.
Web crawler synopsis
Y.M.C.A University of Science and
Technology, Faridabad
Project Synopsis
_______________________________________________________
Web Crawler
_______________________________________________________
MAYUR GARG
Roll no.: IT-2337-2k7
Mentor: Mrs DEEPIKA
Email: Mayurgarg2@gmail.com
Objective/Aim:
This project aims at developing a highly efficient Web crawler that
browses the World Wide Web in a methodical, automated manner.
Introduction:
A Web crawler is a computer program that browses the World Wide Web
in a methodical, automated manner. Other terms for Web crawlers are ants,
automatic indexers, bots, worms, Web spiders, Web robots, and, especially
in the FOAF community, Web scutters.
The process is called Web crawling or spidering. Many sites, in particular
search engines, use spidering as a means of providing up-to-date data. Web
crawlers are mainly used to create a copy of all the visited pages for later
processing by a search engine that will index the downloaded pages to
provide fast searches. A Web crawler is one type of bot, or software agent.
In general, it starts with a list of URLs to visit, called the seeds. As the
crawler visits these URLs, it identifies all the hyperlinks in the page and
adds them to the list of URLs to visit, called the crawl frontier. URLs from
the frontier are recursively visited according to a set of policies.
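
The seed-and-frontier loop described above can be sketched in Java roughly as follows. This is only an illustrative sketch, not the project's implementation. The names FrontierCrawler, LinkSource, linksOn and maxPages are assumptions introduced here. The "web" is a small in-memory map rather than real HTTP fetching, and a simple page limit stands in for the set of crawl policies. The sketch uses an explicit queue instead of literal recursion, which visits pages breadth-first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

class FrontierCrawler {

    /** Hypothetical abstraction: returns the hyperlinks found on a page. */
    public interface LinkSource {
        List<String> linksOn(String url);
    }

    /**
     * Visits pages starting from the seeds and returns the URLs in visit order.
     * maxPages is an assumed, very simple policy: stop after that many pages.
     */
    public static List<String> crawl(List<String> seeds, LinkSource source, int maxPages) {
        Queue<String> frontier = new ArrayDeque<String>(); // URLs still to visit
        Set<String> seen = new HashSet<String>();          // URLs already queued
        List<String> visited = new ArrayList<String>();

        for (String seed : seeds) {
            if (seen.add(seed)) {       // skip duplicate seeds
                frontier.add(seed);
            }
        }

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.remove();
            visited.add(url);
            for (String link : source.linksOn(url)) {
                if (seen.add(link)) {   // add each newly found link once
                    frontier.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Toy link graph standing in for real pages (assumed data).
        final Map<String, List<String>> web = new HashMap<String, List<String>>();
        web.put("a", Arrays.asList("b", "c"));
        web.put("b", Arrays.asList("a", "d"));
        web.put("c", Arrays.asList("d"));

        LinkSource source = new LinkSource() {
            public List<String> linksOn(String url) {
                List<String> links = web.get(url);
                return links != null ? links : Collections.<String>emptyList();
            }
        };

        System.out.println(crawl(Arrays.asList("a"), source, 10)); // prints [a, b, c, d]
    }
}
```

The seen set is what keeps the crawler from queuing the same URL twice when pages link back to each other. A real crawler would replace the in-memory LinkSource with code that downloads each page and extracts its hyperlinks.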
Machine specification:
640 MB memory,
100 Mbps Ethernet, Windows XP/Vista,
JDK 1.6, database client.