Try It The Google Way!!!
Founders: Larry Page (President of Products at the time of writing) and Sergey Brin (President of Technology). They created the “BackRub” web search engine in 1996, with the aim of crawling and indexing the web on their own systems.
History of Google So Far: Larry and Sergey (Stanford graduate students) renamed BackRub to Google and, in 1998, founded their company, “Google Inc.” That same year they received their first funding cheque, worth $100,000. In 2000, Google Toolbar and AdWords were introduced. AOL officially added Google as its search partner. In 2003, Google launched its AdSense program.
Some Rough Statistics of Google (from August 29th, 1996)
Number of web pages fetched: 24 million
Total indexable HTML URLs: 75.2306 million
Total content downloaded: 207.022 gigabytes
Services Provided by Google apart from being a Search Engine
What made Google so popular?
Chief features: PageRank algorithm, anchor text
Other features: BigFiles, repository, document index, hit lists
PageRank Algorithm (Bringing Order to the Web): PageRank for 26 million web pages can be computed in a few hours on a medium-size workstation. First, citation (link) graphs are created, containing as many as 518 million hyperlinks (assumed). These maps are used to calculate the PageRank of each web page. A simple formula gives the PageRank of every page.
PageRank Formula: PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1…Tn are the pages that link to (cite) page A, d is the damping factor (a value between 0 and 1, usually set to 0.85), and C(T) is the number of links going out of page T. PageRank can be calculated using a simple iterative algorithm.
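The iterative calculation mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the formula as written on the slide, not Google's implementation: the function name `pagerank`, the `links` dictionary layout, the starting value of 1.0, and the fixed iteration count are all assumptions made for the example.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over all pages T linking to A.

    `links` maps each page to the list of pages it links to (hypothetical layout).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Assumed starting point: every page begins with a PageRank of 1.
    pr = {page: 1.0 for page in pages}

    # For each page, collect the pages that link to it (T1..Tn in the formula).
    inbound = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)

    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # C(T) is the number of outgoing links of the linking page T.
            total = sum(pr[t] / len(links[t]) for t in inbound[page])
            new_pr[page] = (1 - d) + d * total
        pr = new_pr
    return pr


# A tiny three-page web: A links to B and C, B links to C, C links back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this toy graph, C collects links from both A and B, so it ends up with the highest value, while B, which receives only half of A's vote, ends up lowest.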
Anchor Text: The text of a link usually describes the page the link points to. Google associates this anchor text with the target page and keeps a separate database to maintain these indexes. This makes it possible to retrieve even pages that have not been crawled. As a result, the search engine can even return a page that never actually existed but had hyperlinks pointing to it.
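To see why anchor text lets a search engine return pages it never crawled, consider the following rough sketch. It indexes the words of each link's text under the link's target URL. The function names, the input layout, and the example URLs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_anchor_index(crawled_pages):
    """Index anchor-text words under the URL each link points to.

    `crawled_pages` maps a crawled URL to a list of
    (anchor_text, target_url) pairs found on that page (hypothetical layout).
    """
    index = defaultdict(set)
    for source_url, page_links in crawled_pages.items():
        for anchor_text, target_url in page_links:
            for word in anchor_text.lower().split():
                # The word is credited to the target page, not the source page.
                index[word].add(target_url)
    return index


def search(index, word):
    """Return every URL whose incoming anchor text contains `word`."""
    return sorted(index.get(word.lower(), set()))


# Only one page was crawled, yet the PDF it links to becomes searchable.
crawled = {
    "http://a.example/": [("Stanford campus map", "http://b.example/map.pdf")],
}
anchor_index = build_anchor_index(crawled)
results = search(anchor_index, "map")
```

Here `results` contains `http://b.example/map.pdf` even though that URL was never fetched, which is also why a page that does not exist can still show up in results.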
Repository: The repository contains the full HTML of every web page. Each page is compressed using zlib, which achieves a compression ratio of about 3 to 1. The documents are stored one after the other, each prefixed by its docID, length, and URL.
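A record layout along these lines can be sketched with Python's standard `zlib` and `struct` modules. The field widths, the byte order, and the extra URL-length field are assumptions made so the example can be read back; the slide only states that each document is prefixed by docID, length, and URL.

```python
import struct
import zlib

# Hypothetical header: docID (8 bytes), compressed length (4 bytes), URL length (2 bytes).
HEADER = struct.Struct("<QIH")


def pack_record(doc_id, url, html):
    """Compress one page with zlib and prefix it with docID, length, and URL."""
    url_bytes = url.encode("utf-8")
    body = zlib.compress(html.encode("utf-8"))
    return HEADER.pack(doc_id, len(body), len(url_bytes)) + url_bytes + body


def read_records(blob):
    """Walk records stored one after the other and yield (docID, URL, HTML)."""
    offset = 0
    while offset < len(blob):
        doc_id, body_len, url_len = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        url = blob[offset:offset + url_len].decode("utf-8")
        offset += url_len
        html = zlib.decompress(blob[offset:offset + body_len]).decode("utf-8")
        offset += body_len
        yield doc_id, url, html


# Two documents appended back to back, then read in order.
repository = pack_record(1, "http://a.example/", "<html>first page</html>")
repository += pack_record(2, "http://b.example/", "<html>second page</html>")
documents = list(read_records(repository))
```

Because every record carries its own lengths, the repository can be scanned sequentially without any other data structure, at the cost of having to read from the start to find a given document.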
Hit Lists: A hit list is the list of occurrences of a particular word in a particular document, including position, font, and capitalization information.
Document Index: The document index keeps information about each document. It is a fixed-width ISAM (indexed sequential access method) index, ordered by docID.
BigFiles: BigFiles are virtual files spanning multiple file systems, addressable by 64-bit integers.
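As an illustration of the hit list described above, the sketch below records position and capitalization for every word of one document. It is a simplified assumption, not the compact encoding a production index would use: the `Hit` tuple, the word-splitting rule, and the single `font_size` value applied to the whole text are all hypothetical.

```python
import re
from collections import defaultdict, namedtuple

# One occurrence of a word: where it is, whether it was capitalized, and its font size.
Hit = namedtuple("Hit", "position capitalized font_size")


def build_hit_lists(text, font_size=0):
    """Return a mapping from each lower-cased word to its list of hits in `text`.

    Plain text carries no font information, so `font_size` is a placeholder
    supplied by the caller (an assumption for this example).
    """
    hit_lists = defaultdict(list)
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        word = match.group()
        hit_lists[word.lower()].append(Hit(position, word[0].isupper(), font_size))
    return hit_lists


hits = build_hit_lists("Google ranks pages; google crawls Pages")
google_hits = hits["google"]
```

In this example `google_hits` holds two entries, one at word position 0 (capitalized) and one at word position 3 (not capitalized).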
Google Architecture Overview
Crawling the Web: In order to scale to hundreds of millions of web pages, Google has a fast distributed crawling system. A single URLserver serves lists of URLs to a number of crawlers (typically about 3). Both the URLserver and the crawlers are implemented in Python. Each crawler keeps roughly 300 connections open at once. At peak speeds, the system can crawl over 100 web pages per second using four crawlers. Googlebot is the search bot software used by Google, which collects documents from the web to build a searchable index for the Google Search engine.
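The URLserver-and-crawlers arrangement can be approximated in miniature with a shared queue and a few worker threads. This is only a sketch under stated assumptions: the names `run_crawl` and `fetch` are hypothetical, `fetch` is supplied by the caller, and one thread per crawler stands in for the hundreds of simultaneous connections each real crawler kept open.

```python
import queue
import threading


def run_crawl(seed_urls, fetch, num_crawlers=3):
    """Crawl outward from `seed_urls` using a shared URL queue.

    `fetch(url)` must return (html, outgoing_urls); it is a caller-supplied,
    hypothetical function so the sketch does not depend on the network.
    """
    url_server = queue.Queue()   # plays the role of the single URLserver
    seen = set(seed_urls)        # URLs already handed out
    pages = {}                   # url -> fetched HTML
    lock = threading.Lock()
    for url in seed_urls:
        url_server.put(url)

    def crawler():
        while True:
            url = url_server.get()
            if url is None:      # sentinel: no more work
                url_server.task_done()
                return
            try:
                html, outgoing = fetch(url)
                with lock:
                    pages[url] = html
                    for link in outgoing:
                        if link not in seen:
                            seen.add(link)
                            url_server.put(link)
            except Exception:
                pass             # skip pages that fail to download
            finally:
                url_server.task_done()

    workers = [threading.Thread(target=crawler) for _ in range(num_crawlers)]
    for worker in workers:
        worker.start()
    url_server.join()            # wait until every queued URL was processed
    for _ in workers:
        url_server.put(None)     # tell each crawler to stop
    for worker in workers:
        worker.join()
    return pages


# A fake three-page web so the example runs offline.
fake_web = {"a": ("A", ["b", "c"]), "b": ("B", ["c"]), "c": ("C", ["a"])}
crawled = run_crawl(["a"], lambda url: fake_web[url])
```

Newly discovered links are queued before the current URL is marked as done, so the queue never appears empty while work is still being generated.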
What else can Google do?
Refine search results
Calculator
Currency converter
Time zones
Specific “filetype” search
Advanced search
I'm Feeling Lucky
Dictionary
Language translator
Created by: Anmol Buber (0713313015), Abhinav Singh (0713313003)
