2 December 2005
Seminar on Web Search
History of Search and Web Search Engines
Prof. Beat Signer
Department of Computer Science
Vrije Universiteit Brussel
http://vub.academia.edu/BeatSigner
Beat Signer - Department of Computer Science - bsigner@vub.ac.be - September 5, 2011
Seminar Organisation
 Prof. Beat Signer
WISE Lab, Vrije Universiteit Brussel
bsigner@vub.ac.be
 cross-media information spaces
and architectures
 interactive paper and augmented reality
 multimodal and multi-touch interaction
 Content of the Seminar
 history of search and web search engines
 search engine optimisation (SEO) and
search engine marketing (SEM)
 current and future trends in web search
Early "Documents"
Papyrus
 Greeks and Romans
stored information on
papyrus scrolls
 Tags with a summary of
the content facilitated the
retrieval of information
 Table of contents was
introduced around 100 BC
 Parchment (vellum) emerged
as an alternative
 bound in book form
Paper
 Invented in China (105 AD)
 Brought to Europe only in
the twelfth century
 Took another 300 years
before paper became the
major writing material
 How long will we still use
paper?
 electronic paper vs.
augmented paper
Printing Press
 Johannes Gutenberg
introduced the movable-type
printing press around 1450
 Gutenberg Bible published
in 1455
 Growing libraries and
need to search for
information
Reading Wheel (Bookwheel)
 Described by Agostino
Ramelli in 1588
 Keep several books open
to read from them at the
same time
 comparable to modern
tabbed browsing
 Ramelli himself never built
the reading wheel
 Could be seen as a
predecessor of hypertext
Dewey Decimal Classification (DDC)
 Library classification
system
 developed by Melvil Dewey
in 1876
 Hierarchical classification
 10 main classes with
10 divisions each and
10 sections per division
 total of 1000 sections
 often separate fiction section
 Documents can appear in
more than one class
Dewey Decimal Classification (DDC) ...
 After the three numbers,
decimals can be used for
further subclassification
 Alternatives
 Library of Congress
classification
 Universal Decimal
Classification (UDC)
Dewey Decimal Classification (DDC) ...
000-099 Computer Science, Information and General Works
    000 Computer Science, Knowledge and Systems
        000 Computer Science, Knowledge and General Works
        ...
        005 Computer Programming, Programs and Data
        ...
        009 [Unassigned]
    010 Bibliographies
    ...
100-199 Philosophy and Psychology
200-299 Religion
300-399 Social Sciences
    340 Law
        341 International Law
400-499 Language
500-599 Science
600-699 Technology
700-799 Arts
800-899 Literature
900-999 History, Geography and Biography
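The three-level hierarchy (main class, division, section, plus optional decimals) can be sketched in a few lines of Python. The function name `ddc_levels` is hypothetical, and the sketch only splits the notation into its levels; it does not know the captions of the classes.

```python
def ddc_levels(number: str) -> dict:
    """Split a DDC notation such as '341' or '005.133' into its hierarchy levels."""
    whole, _, decimals = number.partition(".")
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError("a DDC number starts with exactly three digits")
    return {
        "main_class": whole[0] + "00",        # one of the 10 main classes
        "division": whole[:2] + "0",          # one of the 10 divisions per class
        "section": whole,                     # one of the 10 sections per division
        "subclassification": decimals or None,  # optional decimals after the three digits
    }


international_law = ddc_levels("341")
programming = ddc_levels("005.133")
total_sections = 10 * 10 * 10
```

For 341 this yields main class 300 (Social Sciences), division 340 (Law) and section 341 (International Law), matching the excerpt of the table.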
"As We May Think" (1945)
... When data of any sort are placed in
storage, they are filed alphabetically
or numerically, and information is
found (when it is) by tracing it down
from subclass to subclass. It can be in
only one place, unless duplicates are
used; one has to have rules as to which
path will locate it, and the rules are
cumbersome. Having found one
item, moreover, one has to emerge from
the system and re-enter on a
new path. The human mind does not work
that way. It operates by association.
...
Vannevar Bush
"As We May Think" (1945) …
... It affords an immediate step,
however, to associative indexing, the
basic idea of which is a
provision whereby any item may be
caused at will to select immediately
and automatically another. This is the
essential feature of the memex. The
process of tying two items together is
the important thing. ...
Vannevar Bush, As We May Think,
Atlantic Monthly, July 1945
Vannevar Bush
"As We May Think" (1945) …
 Bush's article 'As We May Think'
(1945) is often seen as
the "origin" of hypertext
 Article introduces the Memex
 prototypical hypertext machine
 store and access information
 follow cross-references in the form
of associative trails between pieces
of information (microfilms)
 trail blazers are those who find
delight in the task of establishing
useful trails
Memex
Memex Movie
Hypertext (1965)
 Ted Nelson coined the term hypertext
 Nelson started Project Xanadu in 1960
 first hypertext project
 nonsequential writing
 referencing/embedding parts of a document
in another document (transclusion)
 transpointing windows
 bidirectional (bivisible) links
 version and rights management
 XanaduSpace 1.0 was released as part of Project
Xanadu in 2007
Ted Nelson
World Wide Web (WWW)
 Networked hypertext system
(over the Internet) to share
information at CERN
 first draft in March 1989
 names considered: The Information Mine,
Information Mesh, …
 Components by end of 1990
 HyperText Transfer Protocol (HTTP)
 HyperText Markup Language (HTML)
 HTTP server software
 Web browser (WorldWideWeb)
 First public "release" in August 1991
Tim Berners-Lee Robert Cailliau
Search Engine History
 Early "search engines" include various systems
starting with Bush's Memex
 Archie (1990)
 first Internet search engine
 indexing of files on FTP servers
 W3Catalog (September 1993)
 first "web search engine"
 mirroring and integration of manually maintained catalogues
 JumpStation (December 1993)
 first web search engine combining crawling, indexing and
searching
Search Engine History ...
 In the following two years (1994/1995) many
new search engines appeared
 AltaVista, Infoseek, Excite, Inktomi, Yahoo!, ...
 Two categories of early Web search solutions
 full text search
- based on an index that is automatically created by a web crawler in
combination with an indexer
- e.g. AltaVista or Infoseek
 manually maintained classification (hierarchy) of webpages
- significant human editing effort
- e.g. Yahoo!
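Information Retrieval
 Precision and recall can be used to measure the
performance of different information retrieval algorithms
$$\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$$
$$\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$$
[Figure: collection of the ten documents D1 to D10, of which D3, D5, D8 and D9 are relevant; the query retrieves D1, D3, D8, D9 and D10]
$$\text{precision} = \frac{3}{5} = 0.6 \qquad \text{recall} = \frac{3}{4} = 0.75$$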
Information Retrieval ...
 Often a combination of precision and recall, the so-called
F-score (harmonic mean) is used as a single measure
$$\text{F-score} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
[Figure: query retrieving D1, D3, D8, D9 and D10] precision = 0.6, recall = 0.75, F-score = 0.67
[Figure: query additionally retrieving D2 and D5] precision = 0.57, recall = 1, F-score = 0.73
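The three measures can be checked with a small Python sketch. The function name `precision_recall_f` is hypothetical, and the document sets are read off the example figures (D3, D5, D8 and D9 taken as the relevant documents), which is an assumption about the original drawing.

```python
def precision_recall_f(relevant: set, retrieved: set) -> tuple:
    """Return (precision, recall, F-score) for one query result."""
    hits = len(relevant & retrieved)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_score


# Assumed reading of the example figures
relevant = {"D3", "D5", "D8", "D9"}
first_query = precision_recall_f(relevant, {"D1", "D3", "D8", "D9", "D10"})
second_query = precision_recall_f(relevant, {"D1", "D2", "D3", "D5", "D8", "D9", "D10"})

print([round(value, 2) for value in first_query])
print([round(value, 2) for value in second_query])
```

The second query retrieves every relevant document (recall 1) at the price of a lower precision, and the F-score summarises that trade-off in one number.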
Boolean Model
 Based on set theory and boolean logic
 Exact matching of documents to a user query
 Uses the boolean AND, OR and NOT operators
 query: Shopping AND Ghent AND NOT Delhaize
 computation: 101110 AND 100111 AND 000111 = 000110
 result: document set {D4,D5}
[Figure: term-document incidence matrix with the terms Bank, Delhaize, Ghent, Metro, Shopping and Train as rows and the documents D1 to D6 as columns; a 1 marks a term that occurs in a document]
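The boolean query can be replayed with bit vectors in Python. This is a minimal sketch: the vectors for Shopping and Ghent are taken from the computation above, the vector for Delhaize is derived from its negation 000111, and the helper names are hypothetical.

```python
DOCS = ["D1", "D2", "D3", "D4", "D5", "D6"]
ALL_DOCS = (1 << len(DOCS)) - 1  # 111111, used to keep NOT within six bits

# One bit per document, leftmost bit = D1
INDEX = {
    "Shopping": 0b101110,
    "Ghent": 0b100111,
    "Delhaize": 0b111000,  # derived from NOT Delhaize = 000111
}


def negate(vector: int) -> int:
    """Boolean NOT restricted to the documents of the collection."""
    return ~vector & ALL_DOCS


def matching_docs(vector: int) -> list:
    """Translate a bit vector back into document names."""
    n = len(DOCS)
    return [doc for pos, doc in enumerate(DOCS) if (vector >> (n - 1 - pos)) & 1]


result = INDEX["Shopping"] & INDEX["Ghent"] & negate(INDEX["Delhaize"])
print(format(result, "06b"), matching_docs(result))
```

Each operator maps to one machine instruction per word of the bit vector, which is one reason why the model is fast and easy to implement.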
Boolean Model ...
 Advantages
 relatively easy to implement and scalable
 fast query processing based on parallel scanning of indexes
 Disadvantages
 does not pay attention to synonymy
 does not pay attention to polysemy
 no ranking of output
 often the user has to learn a special syntax such as the use of
double quotes to search for phrases
 Variants of the boolean model form the basis for many
search engines
Vector Space Model
 Algebraic model representing text documents and
queries as vectors based on the index terms
 one dimension for each term
 Compute the similarity (angle) between the query vector
and the document vectors
 Advantages
 simple model based on linear algebra
 partial matching with relevance scoring for results
 potential query reevaluation based on user relevance feedback
 Disadvantages
 computationally expensive (similarity measures for each query)
 limited scalability
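A minimal sketch of the vector space model in Python, assuming raw term frequencies as vector components and the cosine of the angle as similarity (real systems usually add a weighting scheme such as tf-idf). The documents and function names are made up for illustration.

```python
import math
from collections import Counter


def to_vector(text: str) -> Counter:
    """One dimension per term, the component is the term frequency."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: dict) -> list:
    """Return (score, document) pairs, best match first."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(text)), name) for name, text in docs.items()]
    return sorted(scored, reverse=True)


# Hypothetical mini collection
docs = {
    "D1": "shopping in ghent",
    "D2": "train to ghent",
    "D3": "bank holiday",
}
ranking = rank("shopping ghent", docs)
```

Unlike the boolean model, D2 still gets a non-zero score although it matches only one of the two query terms (partial matching).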
Web Search Engines
 Most web search engines are based on traditional
information retrieval techniques but they have to be
adapted to deal with the characteristics of the Web
 immense amount of web resources (>50 billion webpages)
 hyperlinked resources
 dynamic content with frequent updates
 self-organised web resources
 Evaluation of performance
 no standard collections
 often based on user studies (satisfaction)
 Of course not only the precision and recall but also the
query answer time is an important issue
What About Old Content?
The Internet Archive
Web Crawler
 A web crawler or spider is used to create an
index of webpages to be used by a web search engine
 any web search is then based on this index
 Web crawler has to deal with the following issues
 freshness
- the index should be updated regularly (based on webpage update frequency)
 quality
- since not all webpages can be indexed, the crawler should give priority to
"high quality" pages
 scalability
- it should be possible to increase the crawl rate by just adding additional
servers (modular architecture)
- e.g. the estimated number of Google servers in 2007 was 1'000'000 (including
not only the crawler but the entire Google platform)
Web Crawler ...
 distribution
- the crawler should be able to run in a distributed manner (computer centers all
over the world)
 robustness
- the Web contains a lot of pages with errors and a crawler has to deal with
these problems
- e.g. deal with a web server that creates an unlimited number of "virtual web
pages" (crawler trap)
 efficiency
- resources (e.g. network bandwidth) should be used as efficiently as possible
 crawl rates
- the crawler should pay attention to existing web server policies
(e.g. revisit-after HTML meta tag or robots.txt file)
Example robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
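How a crawler might honour these rules can be sketched with Python's standard `urllib.robotparser` module. The host `www.example.com` and the user agent name `ExampleCrawler` are placeholders, and the rules are parsed from a string instead of being fetched from a server.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt example
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without a network request


def allowed(url: str, user_agent: str = "ExampleCrawler") -> bool:
    """True if the given user agent may fetch the URL according to the rules."""
    return parser.can_fetch(user_agent, url)


print(allowed("http://www.example.com/index.html"))
print(allowed("http://www.example.com/cgi-bin/search"))
print(allowed("http://www.example.com/tmp/cache.html"))
```

A polite crawler would run such a check before every request and would additionally limit its request rate per server.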
Web Search Engine Architecture
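[Figure: architecture of a web search engine with the components WWW, Crawler, URL Pool, Storage Manager, Page Repository, Indexers, Document Index, Special Indexes, URL Handler, URL Repository, Client, Query Handler and Ranking; annotations in the figure: "content already added?", "filter", "normalisation and duplicate elimination", "inverted index"]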
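The interplay of URL pool, duplicate elimination, page repository and inverted index can be illustrated with a toy sketch in Python. Everything here is an assumption made for illustration: the "web" is an in-memory dictionary with made-up URLs, normalisation only drops URL fragments, and a query is a plain AND over the terms.

```python
from collections import defaultdict, deque
from urllib.parse import urldefrag

# Hypothetical in-memory "web": URL -> (page text, outgoing links)
WEB = {
    "http://a.example/": ("shopping in ghent",
                          ["http://a.example/metro", "http://a.example/#top"]),
    "http://a.example/metro": ("metro and train in ghent",
                               ["http://a.example/"]),
}


def normalise(url: str) -> str:
    """Very small normalisation step: drop the fragment part of the URL."""
    return urldefrag(url).url


def crawl(seed: str) -> dict:
    """Breadth-first crawl that fills a page repository (URL -> text)."""
    url_pool = deque([normalise(seed)])
    seen = set(url_pool)            # duplicate elimination
    repository = {}
    while url_pool:
        url = url_pool.popleft()
        if url not in WEB:          # page could not be fetched
            continue
        text, links = WEB[url]
        repository[url] = text
        for link in map(normalise, links):
            if link not in seen:
                seen.add(link)
                url_pool.append(link)
    return repository


def build_inverted_index(repository: dict) -> dict:
    """Map each term to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in repository.items():
        for term in text.split():
            index[term].add(url)
    return index


def query(index: dict, *terms: str) -> set:
    """Return the URLs that contain all query terms."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


repository = crawl("http://a.example/")
index = build_inverted_index(repository)
print(query(index, "metro", "ghent"))
```

Queries are answered from the index only, so the result quality depends on how fresh and complete the crawl was.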
Pre-1998 Web Search
 Find all documents for a given query term
 use information retrieval (IR) solutions
- boolean model
- vector space model
- ...
 ranking based on "on-page factors"
 problem: poor quality of search results (order)
 Larry Page and Sergey Brin proposed to compute an
absolute quality measure for each page, called PageRank
 based on the number and quality of pages linking
to a page (votes)
 query-independent
Origins of PageRank
 Developed as part of an
academic project at Stanford
University
 research platform to aid
understanding of large-scale web data
and enable researchers to easily
experiment with new search
technologies
 Larry Page and Sergey Brin worked on a project for a new
kind of search engine (1995-1998), which finally led to a functional
prototype called Google
Larry Page Sergey Brin
PageRank
 A page Pi has a high PageRank Ri if
 there are many pages linking to it
 or, if there are some pages with a high PageRank linking to it
 Total score = IR score × PageRank
[Figure: graph of linked pages P1 to P8 with their PageRanks R1 to R8]
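Basic PageRank Algorithm
$$PR(P_i) = \sum_{P_j \in B_i} \frac{PR(P_j)}{L_j}$$
 where
 B_i is the set of pages
that link to page P_i
 L_j is the number of
outgoing links for page P_j
[Figure: three successive states of an example graph with the pages P1, P2 and P3; PageRank values (1, 1, 1), then (1.5, 1.5, 0.75), then again (1.5, 1.5, 0.75)]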
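The formula can be tried out with a short Python sketch. Two things are assumptions: the link structure (P1 links to P2, P2 links to P1 and P3, P3 links to P1, which is what the hyperlink matrix of the three-page example implies) and the update order, where each new value is used immediately within the same sweep. Under these assumptions the sweep reproduces the values 1.5, 1.5 and 0.75 of the example.

```python
# Assumed link structure of the three-page example: page -> pages it links to
LINKS = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P1"]}


def backlinks(links: dict) -> dict:
    """Compute B_i, the set of pages linking to each page."""
    incoming = {page: [] for page in links}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)
    return incoming


def sweep(rank: dict, links: dict) -> dict:
    """One pass of PR(P_i) = sum over P_j in B_i of PR(P_j) / L_j, updating in place."""
    incoming = backlinks(links)
    for page in links:  # visits P1, P2, P3 in this order
        rank[page] = sum(rank[src] / len(links[src]) for src in incoming[page])
    return rank


rank = {page: 1.0 for page in LINKS}  # every page starts with PageRank 1
history = []
for _ in range(2):
    history.append(dict(sweep(rank, LINKS)))
print(history)
```

The second sweep no longer changes the values, so the iteration has reached a fixed point after one pass for this tiny graph.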
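Matrix Representation
 Let us define a hyperlink
matrix H
$$H_{ij} = \begin{cases} 1/L_j & \text{if } P_j \in B_i \\ 0 & \text{otherwise} \end{cases}$$
[Figure: example graph with the pages P1, P2 and P3]
$$H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix} \quad \text{and} \quad R = \left[ PR(P_i) \right]$$
$$R = HR$$
 R is an eigenvector of H
with eigenvalue 1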
Matrix Representation ...
 We can use the power method to find R
$$R^{t+1} = H R^{t}$$
 sparse matrix H with 40 billion columns and rows but only an
average of 10 non-zero entries in each column
 For our example
$$H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$$
this results in $R = \begin{bmatrix} 2 & 2 & 1 \end{bmatrix}$ or $R = \begin{bmatrix} 0.4 & 0.4 & 0.2 \end{bmatrix}$
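A minimal sketch of the power method in plain Python for the three-page example. The dense list-of-lists representation and the fixed number of iterations are simplifications chosen for readability; a matrix with billions of rows would need a sparse representation.

```python
# Hyperlink matrix of the three-page example (column j holds 1/L_j for the links of P_j)
H = [
    [0.0, 0.5, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.0],
]


def mat_vec(matrix: list, vector: list) -> list:
    """Multiply a square matrix with a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def power_method(matrix: list, iterations: int = 100) -> list:
    """Repeat R <- H R, starting from the uniform vector."""
    n = len(matrix)
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = mat_vec(matrix, r)
    return r


R = power_method(H)
print([round(value, 3) for value in R])
```

Because every column of H sums to 1, the entries of R keep summing to 1 in each step, which is why the result is the normalised vector with entries 0.4, 0.4 and 0.2.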
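Dangling Pages (Rank Sink)
 Problem with pages that
have no outbound links (e.g. P2)
[Figure: example graph with the two pages P1 and P2, where P1 links to P2]
$$H = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad R = \begin{bmatrix} 0 & 0 \end{bmatrix}$$
 Stochastic adjustment
 if page P_j has no outgoing links then replace column j with entries 1/n (n being the number of pages)
$$C = \begin{pmatrix} 0 & 1/2 \\ 0 & 1/2 \end{pmatrix} \quad \text{and} \quad S = H + C = \begin{pmatrix} 0 & 1/2 \\ 1 & 1/2 \end{pmatrix}$$
 New stochastic matrix S always has a stationary vector R
 can also be interpreted as a Markov chain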
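The stochastic adjustment can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes the column convention used in this section (column j describes the outgoing links of page P_j).

```python
def stochastic_adjustment(h: list) -> list:
    """Return S = H + C: every all-zero column of H is replaced by entries 1/n."""
    n = len(h)
    s = [row[:] for row in h]  # copy so that H stays untouched
    for j in range(n):
        is_dangling = all(h[i][j] == 0 for i in range(n))
        if is_dangling:
            for i in range(n):
                s[i][j] = 1.0 / n
    return s


# Two-page example: P1 links to P2, P2 has no outgoing links
H = [
    [0.0, 0.0],
    [1.0, 0.0],
]
S = stochastic_adjustment(H)
column_sums = [sum(S[i][j] for i in range(len(S))) for j in range(len(S))]
print(S, column_sums)
```

After the adjustment every column sums to 1, so a surfer arriving at the dangling page P2 continues on a randomly chosen page instead of getting stuck.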
Strongly Connected Pages (Graph)
 Add new transition probabilities
between all pages
 with probability d we follow
the hyperlink structure S
 with probability 1-d we
choose a random page
 matrix G becomes irreducible
$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
where $\mathbf{1}$ is the n × n matrix of ones
$$R = GR$$
 Google matrix G reflects
a random surfer
 no modelling of back button
[Figure: graph with the pages P1 to P5 in which additional transitions with probability 1-d connect the pages]
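A small Python sketch of the Google matrix and the resulting PageRank vector. The value d = 0.85 is an assumption (it is the damping factor commonly quoted in the literature, not a value stated in these slides), and the two-page matrix S is the stochastic matrix of a graph in which P1 links to P2 and P2 is a dangling page.

```python
def google_matrix(s: list, d: float = 0.85) -> list:
    """G = d * S + (1 - d) * (1/n) * 1, with 1 the n x n matrix of ones."""
    n = len(s)
    return [[d * s[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


def pagerank(g: list, iterations: int = 100) -> list:
    """Power method R <- G R, starting from the uniform vector."""
    n = len(g)
    r = [1.0 / n] * n
    for _ in range(iterations):
        r = [sum(g[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r


# Stochastic matrix of the assumed two-page example
S = [
    [0.0, 0.5],
    [1.0, 0.5],
]
G = google_matrix(S)  # d = 0.85 is an assumed value
R = pagerank(G)
print([round(value, 3) for value in R])
```

All entries of G are strictly positive, so the power method converges to a unique vector R regardless of the start vector.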
Examples
 Google matrix used in the examples: $G = dS + (1-d)\frac{1}{n}\mathbf{1}$
[Figure: linked pages with PageRanks A1 = 0.26, A2 = 0.37, A3 = 0.37]
Examples ...
[Figure: pages of the sites A and B with PageRanks A1 = 0.13, A2 = 0.185, A3 = 0.185, B1 = 0.13, B2 = 0.185, B3 = 0.185; P(A) = 0.5, P(B) = 0.5]
Examples
 PageRank leakage
[Figure: pages of the sites A and B with PageRanks A1 = 0.10, A2 = 0.14, A3 = 0.14, B1 = 0.22, B2 = 0.20, B3 = 0.20; P(A) = 0.38, P(B) = 0.62]
Examples ...
[Figure: pages of the sites A and B with PageRanks A1 = 0.3, A2 = 0.23, A3 = 0.18, B1 = 0.10, B2 = 0.095, B3 = 0.095; P(A) = 0.71, P(B) = 0.29]
Examples
 PageRank feedback
[Figure: pages of the sites A and B with PageRanks A1 = 0.35, A2 = 0.24, A3 = 0.18, B1 = 0.09, B2 = 0.07, B3 = 0.07; P(A) = 0.77, P(B) = 0.23]
Examples ...
[Figure: pages of the sites A and B with PageRanks A1 = 0.33, A2 = 0.17, A3 = 0.175, A4 = 0.125, B1 = 0.08, B2 = 0.06, B3 = 0.06; P(A) = 0.80, P(B) = 0.20]
Implications for Website Development
 First make sure that your page gets indexed
 on-page factors
 Think about your site's internal link structure
 create many internal links for important pages
 be "careful" about where to put outgoing links
 Increase the number of pages
 Ensure that webpages are addressed consistently
 http://www.vub.ac.be and http://www.vub.ac.be/index.php count as different URLs
 Make sure that you get incoming links from good
websites
Tools
 Google toolbar
 shows logarithmic PageRank value (from 0 to 10)
 information not frequently updated (Google Dance)
 Google webmaster tools
 accepts a sitemap (XML document) with the structure of a website
 variety of reports that help to improve the quality of a website
- meta description issues
- title tag issues
- non-indexable content issues
- number and URLs of indexed pages
- number and URLs of inbound/outbound links
- ...
Questions
 Is PageRank fair?
 What about Google's power and influence?
 What about Web 2.0 or Web 3.0 and web search?
 "non-existent" webpages such as offered by Rich Internet
Applications (e.g. Ajax) may bring problems for traditional search
engines (hidden web)
 new forms of social search
- Wikia Search
- Delicious
- ...
 social marketing
HITS Algorithm
 Hyperlink-Induced Topic Search
 Jon Kleinberg
 developed around the same time as
Page and Brin invented PageRank
 Uses the link structure like PageRank to
compute a popularity score
 Differences from PageRank
 two popularity values for each page (hub and authority score)
 note that the values are not query-independent
 user gets a ranked hub and authority list
Jon Kleinberg
HITS Algorithm ...
 Good authorities are linked by good hubs and good hubs
link to good authorities
 Compute impact of authorities and hubs similar to
PageRank (but only on limited set of result pages!)
[Figure: two linked pages P1 and P2 illustrating the authority and hub roles]
initialise each page with an authority and hub score of 1
repeat {
compute new authority scores
compute new hub scores
normalise authority and hub scores
}
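The loop above can be turned into a runnable Python sketch. The update rules used here (authority score = sum of the hub scores of the pages linking to a page, hub score = sum of the authority scores of the pages it links to) and the normalisation by the Euclidean length follow the usual description of HITS; the four-page graph is made up for illustration.

```python
import math


def hits(links: dict, iterations: int = 50) -> tuple:
    """Return (authority, hub) scores for a graph given as page -> linked pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    # initialise each page with an authority and hub score of 1
    authority = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # new authority score: sum of the hub scores of all pages linking to p
        authority = {p: sum(hub[q] for q in pages if p in links.get(q, ()))
                     for p in pages}
        # new hub score: sum of the authority scores of all pages p links to
        hub = {p: sum(authority[t] for t in links.get(p, ())) for p in pages}
        # normalise both score vectors to length 1
        for scores in (authority, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return authority, hub


# Hypothetical result set: P1 and P2 act as hubs, P3 and P4 as authorities
LINKS = {"P1": ["P3", "P4"], "P2": ["P3"], "P3": [], "P4": []}
authority, hub = hits(LINKS)
best_authority = max(authority, key=authority.get)
best_hub = max(hub, key=hub.get)
print(best_authority, best_hub)
```

In practice the graph would only contain the pages returned for a query plus their neighbours, which is why the scores are not query-independent.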
Meta Search Engines
 Search tool that sends a query to multiple search
engines
 Aggregates the individual results on a single result page
 MetaCrawler is an example of a meta search engine that
uses different search engines (Google, Bing, Yahoo!, ...)
Search Engine Market Share
Conclusions
 Web information retrieval techniques have to deal with
the specific characteristics of the Web
 PageRank algorithm
 absolute quality of a page based on incoming links
 based on random surfer model
 computed as eigenvector of Google matrix G
 PageRank is just one (important) factor
 Implications for website development and SEO
References
 Vannevar Bush, As We May Think, Atlantic Monthly,
July 1945
 http://www.theatlantic.com/doc/194507/bush/
 http://sloan.stanford.edu/MouseSite/Secondary.html
 L. Page, S. Brin, R. Motwani and T. Winograd,
The PageRank Citation Ranking: Bringing Order
to the Web, January 1998
 S. Brin and L. Page, The Anatomy of a Large-Scale
Hypertextual Web Search Engine, Computer Networks
and ISDN Systems, 30(1-7), April 1998
References …
 Amy N. Langville and Carl D. Meyer, Google's
PageRank and Beyond – The Science of Search Engine
Rankings, Princeton University Press, July 2006
 PageRank Calculator
 http://www.webworkshop.net/pagerank_calculator.php
 Google Webmaster Tools
 http://www.google.com/webmasters/
Next Lecture
Search Engine Optimisation (SEO) and Search
Engine Marketing (SEM)

 
Human Perception and Colour Theory - Lecture 2 - Information Visualisation (4...
Human Perception and Colour Theory - Lecture 2 - Information Visualisation (4...Human Perception and Colour Theory - Lecture 2 - Information Visualisation (4...
Human Perception and Colour Theory - Lecture 2 - Information Visualisation (4...Beat Signer
 
Introduction - Lecture 1 - Information Visualisation (4019538FNR)
Introduction - Lecture 1 - Information Visualisation (4019538FNR)Introduction - Lecture 1 - Information Visualisation (4019538FNR)
Introduction - Lecture 1 - Information Visualisation (4019538FNR)Beat Signer
 
Towards a Framework for Dynamic Data Physicalisation
Towards a Framework for Dynamic Data PhysicalisationTowards a Framework for Dynamic Data Physicalisation
Towards a Framework for Dynamic Data PhysicalisationBeat Signer
 

History of Search and Web Search Engines - Seminar on Web Search

  • 1. Seminar on Web Search: History of Search and Web Search Engines
      Prof. Beat Signer, Department of Computer Science, Vrije Universiteit Brussel
      http://vub.academia.edu/BeatSigner
      2 December 2005
  • 2. Seminar Organisation
      Beat Signer - Department of Computer Science - bsigner@vub.ac.be - September 5, 2011
      • Prof. Beat Signer, WISE Lab, Vrije Universiteit Brussel, bsigner@vub.ac.be
          - cross-media information spaces and architectures
          - interactive paper and augmented reality
          - multimodal and multi-touch interaction
      • Content of the Seminar
          - history of search and web search engines
          - search engine optimisation (SEO) and search engine marketing (SEM)
          - current and future trends in web search
  • 3. Early "Documents"
  • 4. Papyrus
      • Greeks and Romans stored information on papyrus scrolls
      • Tags with a summary of the content facilitated the retrieval of information
      • Table of contents was introduced around 100 BC
      • Parchment (vellum) came up as an alternative
          - bound in book form
  • 5. Paper
      • Invented in China (105 AD)
      • Brought to Europe only in the twelfth century
      • Took another 300 years before paper became the major writing material
      • How long will we still use paper?
          - electronic paper vs. augmented paper
  • 6. Printing Press
      • Johann Gutenberg invented the printing press in 1450
      • Gutenberg Bible published in 1455
      • Growing libraries and need to search for information
  • 7. Reading Wheel (Bookwheel)
      • Described by Agostino Ramelli in 1588
      • Keep several books open to read from them at the same time
          - comparable to modern tabbed browsing
      • The reading wheel has never really been built
      • Could be seen as a predecessor of hypertext
  • 8. Dewey Decimal Classification (DDC)
      • Library classification system
          - developed by Melvil Dewey in 1876
      • Hierarchical classification
          - 10 main classes with 10 divisions each and 10 sections per division
          - total of 1000 sections
          - often separate fiction section
      • Documents can appear in more than one class
  • 9. Dewey Decimal Classification (DDC) ...
      • After the three digits, decimals can be used for further subclassification
      • Alternatives
          - Library of Congress classification
          - Universal Decimal Classification (UDC)
  • 10. Dewey Decimal Classification (DDC) ...
      000-099 Computer Science, Information and General Works
          000 Computer Science, Knowledge and Systems
              000 Computer Science, Knowledge and General Works
              ...
              005 Computer Programming, Programs and Data
              ...
              009 [Unassigned]
          010 Bibliographies
          ...
      100-199 Philosophy and Psychology
      200-299 Religion
      300-399 Social Sciences
          340 Law
              341 International Law
      400-499 Language
      500-599 Science
      600-699 Technology
      700-799 Arts
      800-899 Literature
      900-999 History, Geography and Biography
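The three hierarchy levels of a DDC number can be read directly from its digits. The following is a minimal Python sketch of that idea; the function name `ddc_levels` and the decimal part used in the second call are illustrative assumptions and not part of the slides.

```python
def ddc_levels(number: str) -> dict:
    """Split a DDC number such as '341' or '005.13' into its hierarchy levels."""
    whole, _, decimals = number.partition(".")
    if len(whole) != 3 or not whole.isdigit():
        raise ValueError("expected three digits before the optional decimal point")
    return {
        "main_class": whole[0] + "00",          # e.g. 300 Social Sciences
        "division": whole[:2] + "0",            # e.g. 340 Law
        "section": whole,                       # e.g. 341 International Law
        "subclassification": decimals or None,  # optional decimals after the three digits
    }


print(ddc_levels("341"))
print(ddc_levels("005.13"))  # the decimal part here is only an illustration
```

The sketch only decomposes the notation; mapping the numbers to captions would need the actual DDC schedules.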
  • 11. "As We May Think" (1945)
      "... When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path. The human mind does not work that way. It operates by association. ..."
      [Photo: Vannevar Bush]
  • 12. "As We May Think" (1945) ...
      "... It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing. ..."
      Vannevar Bush, As We May Think, Atlantic Monthly, July 1945
      [Photo: Vannevar Bush]
  • 13. "As We May Think" (1945) ...
      • Bush's article "As We May Think" (1945) is often seen as the "origin" of hypertext
      • Article introduces the Memex
          - prototypical hypertext machine
          - store and access information
          - follow cross-references in the form of associative trails between pieces of information (microfilms)
          - trail blazers are those who find delight in the task of establishing useful trails
      [Figure: Memex]
  • 14. Memex Movie
  • 15. Hypertext (1965)
      • Ted Nelson coined the term hypertext
      • Nelson started Project Xanadu in 1960
          - first hypertext project
          - nonsequential writing
          - referencing/embedding parts of a document in another document (transclusion)
          - transpointing windows
          - bidirectional (bivisible) links
          - version and rights management
      • XanaduSpace 1.0 was released as part of Project Xanadu in 2007
      [Photo: Ted Nelson]
  • 16. World Wide Web (WWW)
      • Networked hypertext system (over ARPANET) to share information at CERN
          - first draft in March 1989
          - The Information Mine, Information Mesh, ...?
      • Components by end of 1990
          - HyperText Transfer Protocol (HTTP)
          - HyperText Markup Language (HTML)
          - HTTP server software
          - Web browser (WorldWideWeb)
      • First public "release" in August 1991
      [Photos: Tim Berners-Lee, Robert Cailliau]
  • 17. Search Engine History
      • Early "search engines" include various systems starting with Bush's Memex
      • Archie (1990)
          - first Internet search engine
          - indexing of files on FTP servers
      • W3Catalog (September 1993)
          - first "web search engine"
          - mirroring and integration of manually maintained catalogues
      • JumpStation (December 1993)
          - first web search engine combining crawling, indexing and searching
  • 18. Search Engine History ...
      • In the following two years (1994/1995) many new search engines appeared
          - AltaVista, Infoseek, Excite, Inktomi, Yahoo!, ...
      • Two categories of early Web search solutions
          - full text search
              - based on an index that is automatically created by a web crawler in combination with an indexer
              - e.g. AltaVista or Infoseek
          - manually maintained classification (hierarchy) of webpages
              - significant human editing effort
              - e.g. Yahoo!
  • 19. Information Retrieval
      • Precision and recall can be used to measure the performance of different information retrieval algorithms
          $\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$
          $\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$
      [Figure: document collection D1 to D10 with the relevant documents and the documents retrieved for a query]
      Example: $\text{precision} = \frac{3}{5} = 0.6$, $\text{recall} = \frac{3}{4} = 0.75$
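The two measures are simple set operations. The sketch below assumes concrete document sets, since the figure with the actual sets is not recoverable from the transcript; they are chosen only so that 5 documents are retrieved, 4 are relevant and 3 are in both, which reproduces the numbers of the slide.

```python
def precision_recall(relevant: set, retrieved: set) -> tuple:
    """Return (precision, recall) for a set of relevant and a set of retrieved documents."""
    hits = len(relevant & retrieved)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed sets (not taken from the slide figure)
relevant = {"D3", "D5", "D8", "D9"}
retrieved = {"D1", "D3", "D8", "D9", "D10"}

precision, recall = precision_recall(relevant, retrieved)
print(precision, recall)  # 0.6 0.75
```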
  • 20. Information Retrieval ...
      • Often a combination of precision and recall, the so-called F-score (harmonic mean), is used as a single measure
          $\text{F-score} = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
      [Figure: document collection D1 to D10 with two query results]
      Example 1: precision = 0.6, recall = 0.75, F-score = 0.67
      Example 2 (D2 and D5 highlighted in the figure): precision = 0.57, recall = 1, F-score = 0.73
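The F-score values of the two examples can be checked with a few lines of Python. This is a minimal sketch; the value 4/7 for the second precision is an assumption that matches the rounded 0.57 on the slide.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_score(0.6, 0.75), 2))   # 0.67
print(round(f_score(4 / 7, 1.0), 2))  # 0.73
```

The second example shows why a single measure is useful: recall rises to 1 while precision drops, and the F-score summarises the trade-off.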
  • 21. Boolean Model
      • Based on set theory and boolean logic
      • Exact matching of documents to a user query
      • Uses the boolean AND, OR and NOT operators
          - query: Shopping AND Ghent AND NOT Delhaize
          - computation: 101110 AND 100111 AND 000111 = 000110
          - result: document set {D4, D5}
      [Figure: term-document incidence matrix for the terms Bank, Delhaize, Ghent, Metro, Shopping and Train over the documents D1 to D6]
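The computation on the slide can be reproduced with integer bit vectors. In this sketch only the three rows used by the query are included; the Delhaize row is inferred from the slide's NOT Delhaize vector 000111, and the helper names are illustrative.

```python
DOCS = ["D1", "D2", "D3", "D4", "D5", "D6"]

# One bit per document, leftmost bit = D1 (rows taken from the slide's computation)
INDEX = {
    "Shopping": 0b101110,
    "Ghent": 0b100111,
    "Delhaize": 0b111000,  # inferred: NOT Delhaize = 000111
}
MASK = (1 << len(DOCS)) - 1  # keeps NOT within the six document bits


def docs_for(bits: int) -> list:
    """Translate a bit vector back into document names."""
    return [doc for i, doc in enumerate(DOCS) if bits & (1 << (len(DOCS) - 1 - i))]


# Shopping AND Ghent AND NOT Delhaize
result = INDEX["Shopping"] & INDEX["Ghent"] & (~INDEX["Delhaize"] & MASK)
print(format(result, "06b"), docs_for(result))  # 000110 ['D4', 'D5']
```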
  • 22. Boolean Model ...
      • Advantages
          - relatively easy to implement and scalable
          - fast query processing based on parallel scanning of indexes
      • Disadvantages
          - does not pay attention to synonymy
          - does not pay attention to polysemy
          - no ranking of output
          - often the user has to learn a special syntax such as the use of double quotes to search for phrases
      • Variants of the boolean model form the basis for many search engines
  • 23. Vector Space Model
      • Algebraic model representing text documents and queries as vectors based on the index terms
          - one dimension for each term
      • Compute the similarity (angle) between the query vector and the document vectors
      • Advantages
          - simple model based on linear algebra
          - partial matching with relevance scoring for results
          - potential query reevaluation based on user relevance feedback
      • Disadvantages
          - computationally expensive (similarity measures for each query)
          - limited scalability
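The angle between a query vector and a document vector is usually measured by its cosine. The sketch below assumes raw term counts as vector components and uses three made-up documents; the slides do not prescribe a weighting scheme, so this is only one possible instantiation.

```python
import math
from collections import Counter


def vectorise(text: str) -> Counter:
    """Term-count vector with one dimension per term (assumed weighting: raw counts)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents and query
documents = {"D1": "shopping in ghent", "D2": "train to ghent", "D3": "bank holiday"}
query = vectorise("ghent shopping")

ranking = sorted(
    documents,
    key=lambda d: cosine_similarity(query, vectorise(documents[d])),
    reverse=True,
)
print(ranking)  # ['D1', 'D2', 'D3']
```

Unlike the boolean model, D2 still receives a non-zero score although it matches only one of the two query terms, which illustrates the partial matching listed under the advantages.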
  • 24. Web Search Engines
      • Most web search engines are based on traditional information retrieval techniques but they have to be adapted to deal with the characteristics of the Web
          - immense amount of web resources (>50 billion webpages)
          - hyperlinked resources
          - dynamic content with frequent updates
          - self-organised web resources
      • Evaluation of performance
          - no standard collections
          - often based on user studies (satisfaction)
      • Not only precision and recall but also the query answer time is an important issue
  • 25. What About Old Content?
  • 26. The Internet Archive
  • 27. Web Crawler
      • A web crawler or spider is used to create an index of webpages to be used by a web search engine
          - any web search is then based on this index
      • A web crawler has to deal with the following issues
          - freshness
              - the index should be updated regularly (based on webpage update frequency)
          - quality
              - since not all webpages can be indexed, the crawler should give priority to "high quality" pages
          - scalability
              - it should be possible to increase the crawl rate by just adding additional servers (modular architecture)
              - e.g. the estimated number of Google servers in 2007 was 1'000'000 (including not only the crawler but the entire Google platform)
  • 28. Web Crawler ...
          - distribution
              - the crawler should be able to run in a distributed manner (computer centers all over the world)
          - robustness
              - the Web contains a lot of pages with errors and a crawler has to deal with these problems
              - e.g. deal with a web server that creates an unlimited number of "virtual web pages" (crawler trap)
          - efficiency
              - resources (e.g. network bandwidth) should be used in the most efficient way
          - crawl rates
              - the crawler should pay attention to existing web server policies (e.g. revisit-after HTML meta tag or robots.txt file)
      Example robots.txt:
          User-agent: *
          Disallow: /cgi-bin/
          Disallow: /tmp/
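A crawler can check such a policy before fetching a page. The following sketch feeds the robots.txt from the slide to Python's standard `urllib.robotparser`; the host `www.example.org` and the crawler name `ExampleCrawler` are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content shown on the slide
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory instead of fetching over HTTP


def allowed(url: str, agent: str = "ExampleCrawler") -> bool:
    """Return True if the given (hypothetical) crawler may fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://www.example.org/index.html"))      # True
print(allowed("http://www.example.org/cgi-bin/search"))  # False
print(allowed("http://www.example.org/tmp/cache.html"))  # False
```

In a real crawler the file would be retrieved once per host and cached, so that the policy check does not cost an extra request per page.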
  • 29. Web Search Engine Architecture
      [Figure: architecture diagram with the components WWW, Crawler, URL Pool, URL Handler, URL Repository, Storage Manager, Page Repository, Indexers, Document Index, Special Indexes, Query Handler, Ranking and Client; annotations: "filter", "normalisation and duplicate elimination", "content already added?", "inverted index"]
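Two of the annotated steps in the architecture, URL normalisation with duplicate elimination and the inverted index, can be sketched in a few lines. The class and function names below are hypothetical, the normalisation rules are a deliberately small assumed subset, and the pages are made up; a production system would be considerably more involved.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalise(url: str) -> str:
    """Lower-case scheme and host, use '/' for an empty path and drop the fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


class UrlPool:
    """URLs waiting to be crawled; each normalised URL is queued only once."""

    def __init__(self):
        self.seen = set()
        self.queue = []

    def add(self, url: str) -> None:
        url = normalise(url)
        if url not in self.seen:  # duplicate elimination
            self.seen.add(url)
            self.queue.append(url)


def build_inverted_index(pages: dict) -> dict:
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


pool = UrlPool()
for url in ["http://Example.org", "http://example.org/", "http://example.org/#top"]:
    pool.add(url)
print(pool.queue)  # ['http://example.org/']

index = build_inverted_index({
    "http://example.org/": "web search history",
    "http://example.org/a": "web crawler",
})
print(sorted(index["web"]))  # ['http://example.org/', 'http://example.org/a']
```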
  • 30. Pre-1998 Web Search
      • Find all documents for a given query term
          - use information retrieval (IR) solutions
              - boolean model
              - vector space model
              - ...
          - ranking based on "on-page factors"
          - problem: poor quality of search results (order)
      • Larry Page and Sergey Brin proposed to compute the absolute quality of a page called PageRank
          - based on the number and quality of pages linking to a page (votes)
          - query-independent
  • 31. Origins of PageRank
      • Developed as part of an academic project at Stanford University
          - research platform to aid understanding of large-scale web data and enable researchers to easily experiment with new search technologies
      • Larry Page and Sergey Brin worked on the project about a new kind of search engine (1995-1998) which finally led to a functional prototype called Google
      [Photos: Larry Page, Sergey Brin]
  • 32. PageRank
      • A page $P_i$ has a high PageRank $R_i$ if
          - there are many pages linking to it
          - or, if there are some pages with a high PageRank linking to it
      • Total score = IR score × PageRank
      [Figure: link graph of pages P1 to P8 with PageRank values R1 to R8]
  • 33. Basic PageRank Algorithm
          $PR(P_i) = \sum_{P_j \in B_i} \frac{PR(P_j)}{L_j}$
      • where
          - $B_i$ is the set of pages that link to page $P_i$
          - $L_j$ is the number of outgoing links for page $P_j$
      [Figure: example graph with pages P1, P2 and P3; initial values P1 = 1, P2 = 1, P3 = 1; values shown after the computation P1 = 1.5, P2 = 1.5, P3 = 0.75]
  • 34. Matrix Representation
      • Let us define a hyperlink matrix H
          $H_{ij} = \begin{cases} 1/L_j & \text{if } P_j \in B_i \\ 0 & \text{otherwise} \end{cases}$
          $H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$ for the example graph with pages P1, P2 and P3
      • With $R = [PR(P_i)]$ we get $R = HR$
      • R is an eigenvector of H with eigenvalue 1
  • 35. Matrix Representation ...
      • We can use the power method to find R
          $R_{t+1} = H R_t$
          - sparse matrix H with 40 billion columns and rows but only an average of 10 non-zero entries in each column
      • For our example with $H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$ this results in $R = \begin{pmatrix} 2 & 2 & 1 \end{pmatrix}^T$ or, normalised, $R = \begin{pmatrix} 0.4 & 0.4 & 0.2 \end{pmatrix}^T$
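The power method for the three-page example can be sketched in plain Python. The start vector (uniform, summing to 1) and the fixed number of iterations are assumptions made for this illustration; a dense list-of-lists matrix is used here only because the example is tiny, whereas a real implementation would exploit the sparsity of H.

```python
def power_method(H, iterations=100):
    """Repeatedly apply R_{t+1} = H R_t, starting from a uniform vector."""
    n = len(H)
    R = [1.0 / n] * n
    for _ in range(iterations):
        R = [sum(H[i][j] * R[j] for j in range(n)) for i in range(n)]
    return R


# Hyperlink matrix of the example: P1 -> P2, P2 -> P1 and P3, P3 -> P1
H = [
    [0, 0.5, 1],
    [1, 0, 0],
    [0, 0.5, 0],
]

R = power_method(H)
print([round(r, 2) for r in R])  # [0.4, 0.4, 0.2]
```

Because every column of this H sums to 1, the entries of R keep summing to 1 during the iteration, which yields the normalised vector directly.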
  • 36. Dangling Pages (Rank Sink)
      • Problem with pages that have no outbound links (e.g. P2)
          $H = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$ and $R = \begin{pmatrix} 0 & 0 \end{pmatrix}^T$
      • Stochastic adjustment
          - if page $P_j$ has no outgoing links then replace all entries of column j with $1/n$, where n is the total number of pages
          $C = \begin{pmatrix} 0 & 1/2 \\ 0 & 1/2 \end{pmatrix}$ and $S = H + C = \begin{pmatrix} 0 & 1/2 \\ 1 & 1/2 \end{pmatrix}$
      • New stochastic matrix S always has a stationary vector R
          - can also be interpreted as a Markov chain
      [Figure: graph with pages P1 and P2, before and after the adjustment C]
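The stochastic adjustment can be sketched as a small function that fills the zero columns of H. The function name is illustrative, and the uniform value 1/n for dangling pages is the usual choice, assumed here to match the matrix C of the example.

```python
def stochastic_adjustment(H):
    """Return S = H + C: columns of pages without outgoing links are filled with 1/n."""
    n = len(H)
    S = [row[:] for row in H]  # copy so that H stays unchanged
    for j in range(n):
        if all(H[i][j] == 0 for i in range(n)):  # page Pj is a dangling page
            for i in range(n):
                S[i][j] = 1.0 / n
    return S


# Example from the slide: P1 links to P2, P2 has no outgoing links
H = [
    [0, 0],
    [1, 0],
]
S = stochastic_adjustment(H)
print(S)  # [[0, 0.5], [1, 0.5]]
```

For this S the stationary vector is (1/3, 2/3), whereas iterating with the original H drains all rank and ends at (0, 0).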
  • 37. Strongly Connected Pages (Graph)
      - Add new transition probabilities between all pages
          - with probability d we follow the hyperlink structure S
          - with probability 1-d we choose a random page
          - matrix G becomes irreducible
        $$G = dS + (1-d)\frac{1}{n}\mathbf{1} \qquad R = GR$$
        where $\mathbf{1}$ is the n × n matrix of ones
      - Google matrix G reflects a random surfer
          - no modelling of back button
      - [Figure: pages P1 to P5 with additional transitions of probability 1-d]
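The construction of the Google matrix can be sketched as follows. This is a minimal illustration under stated assumptions: the slide does not give a value for d, so d = 0.85 is used here as an assumed default, the input is the two-page stochastic matrix S from the dangling-page example, and the function names and iteration count are hypothetical.

```python
def google_matrix(S, d=0.85):
    """Build G = d*S + (1-d)*(1/n)*1, with 1 the n x n matrix of ones."""
    n = len(S)
    return [[d * S[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


def pagerank(G, iterations=200):
    """Power iteration R_{t+1} = G R_t from a uniform start vector."""
    n = len(G)
    R = [1.0 / n] * n
    for _ in range(iterations):
        R = [sum(G[i][j] * R[j] for j in range(n)) for i in range(n)]
    return R


S = [[0.0, 0.5],
     [1.0, 0.5]]  # stochastic matrix of the two-page example
G = google_matrix(S)  # d = 0.85 is an assumption, not a value from the slides
R = pagerank(G)
print(G)
print(R)
```

Every entry of G is strictly positive, so any page can be reached from any other page in one step, which is what makes G irreducible.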
  • 38. Examples
      $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
      - [Figure: three linked pages with PageRank values A1 = 0.26, A2 = 0.37, A3 = 0.37]
  • 39. Examples ...
      $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
      - [Figure: two sites A and B with PageRank values A1 = 0.13, A2 = 0.185, A3 = 0.185 and B1 = 0.13, B2 = 0.185, B3 = 0.185]
      - P(A) = 0.5 and P(B) = 0.5
  • 40. Examples
      - PageRank leakage
      $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
      - [Figure: two sites A and B with PageRank values A1 = 0.10, A2 = 0.14, A3 = 0.14 and B1 = 0.22, B2 = 0.20, B3 = 0.20]
      - P(A) = 0.38 and P(B) = 0.62
  • 41. Examples ...
      $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
      - [Figure: two sites A and B with PageRank values A1 = 0.3, A2 = 0.23, A3 = 0.18 and B1 = 0.10, B2 = 0.095, B3 = 0.095]
      - P(A) = 0.71 and P(B) = 0.29
  • 42. Examples
      - PageRank feedback
      $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
      - [Figure: two sites A and B with PageRank values A1 = 0.35, A2 = 0.24, A3 = 0.18 and B1 = 0.09, B2 = 0.07, B3 = 0.07]
      - P(A) = 0.77 and P(B) = 0.23
  • 43. Examples ...
      $$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$
      - [Figure: two sites A and B with PageRank values A1 = 0.33, A2 = 0.17, A3 = 0.175, A4 = 0.125 and B1 = 0.08, B2 = 0.06, B3 = 0.06]
      - P(A) = 0.80 and P(B) = 0.20
  • 44. Implications for Website Development
      - First make sure that your page gets indexed
          - on-page factors
      - Think about your site's internal link structure
          - create many internal links for important pages
          - be "careful" about where to put outgoing links
      - Increase the number of pages
      - Ensure that webpages are addressed consistently
          - http://www.vub.ac.be
          - http://www.vub.ac.be/index.php
      - Make sure that you get incoming links from good websites
  • 45. Tools
      - Google toolbar
          - shows logarithmic PageRank value (from 0 to 10)
          - information not frequently updated (Google dance)
      - Google webmaster tools
          - accepts a sitemap (XML document) with the structure of a website
          - variety of reports that help to improve the quality of a website
              - meta description issues
              - title tag issues
              - non-indexable content issues
              - number and URLs of indexed pages
              - number and URLs of inbound/outbound links
              - ...
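A sitemap of the kind mentioned above is a small XML document. The following sketch generates a minimal one with Python's standard library, assuming the sitemaps.org 0.9 format with a `urlset` root and one `url`/`loc` pair per page. The URLs are placeholders that are not taken from the slides, and `build_sitemap` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document listing the given page URLs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for address in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs, not taken from the slides.
sitemap_xml = build_sitemap([
    "http://www.example.org/",
    "http://www.example.org/research.html",
])
print(sitemap_xml)
```

Listing each page under exactly one address in the sitemap also supports the earlier advice to address webpages consistently.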
  • 46. Questions
      - Is PageRank fair?
      - What about Google's power and influence?
      - What about Web 2.0 or Web 3.0 and web search?
          - "non-existent" webpages such as offered by Rich Internet Applications (e.g. Ajax) may bring problems for traditional search engines (hidden web)
          - new forms of social search
              - Wikia Search
              - Delicious
              - ...
          - social marketing
  • 47. HITS Algorithm
      - Hypertext Induced Topic Search
          - Jon Kleinberg
          - developed around the same time as Page and Brin invented PageRank
      - Uses the link structure like PageRank to compute a popularity score
      - Differences from PageRank
          - two popularity values for each page (hub and authority score)
          - note that the values are not query-independent
          - user gets a ranked hub and authority list
      - [Photo: Jon Kleinberg]
  • 48. HITS Algorithm ...
      - Good authorities are linked by good hubs and good hubs link to good authorities
      - Compute impact of authorities and hubs similar to PageRank (but only on limited set of result pages!)
      - [Figure: page P1 (hub) linking to page P2 (authority)]
      - Algorithm
          initialise each page with an authority and hub score of 1
          repeat {
              compute new authority scores
              compute new hub scores
              normalise authority and hub scores
          }
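The loop above can be turned into a short runnable sketch. Several details are assumptions because the slide leaves them open: the authority score of a page is taken as the sum of the hub scores of the pages linking to it, the hub score as the sum of the (already updated) authority scores of the pages it links to, both vectors are normalised to Euclidean length 1, and the loop runs for a fixed 50 iterations. The small link graph is hypothetical.

```python
import math


def hits(links, iterations=50):
    """Iteratively compute authority and hub scores for a small link graph.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    authority = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority score: sum of the hub scores of all pages linking to p
        authority = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                authority[target] += hub[source]
        # hub score: sum of the new authority scores of all pages p links to
        hub = {p: sum(authority[t] for t in links.get(p, [])) for p in pages}
        # normalise both score vectors to Euclidean length 1 (an assumption)
        for scores in (authority, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return authority, hub


# Hypothetical result set: two hub-like pages pointing to two candidate authorities.
links = {"h1": ["a1", "a2"], "h2": ["a1"]}
authority, hub = hits(links)
print(sorted(authority.items(), key=lambda item: -item[1]))
print(sorted(hub.items(), key=lambda item: -item[1]))
```

In this toy graph a1 is linked by both hubs and therefore ends up as the strongest authority, while h1 links to both authorities and ends up as the strongest hub.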
  • 49. Meta Search Engines
      - Search tool that sends a query to multiple search engines
      - Aggregates the individual results on a single result page
          - MetaCrawler is an example of a meta search engine that uses different search engines (Google, Bing, Yahoo!, ...)
  • 50. Search Engine Market Share
  • 51. Conclusions
      - Web information retrieval techniques have to deal with the specific characteristics of the Web
      - PageRank algorithm
          - absolute quality of a page based on incoming links
          - based on random surfer model
          - computed as eigenvector of Google matrix G
      - PageRank is just one (important) factor
      - Implications for website development and SEO
  • 52. References
      - Vannevar Bush, As We May Think, Atlantic Monthly, July 1945
          - http://www.theatlantic.com/doc/194507/bush/
          - http://sloan.stanford.edu/MouseSite/Secondary.html
      - L. Page, S. Brin, R. Motwani and T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, January 1998
      - S. Brin and L. Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Computer Networks and ISDN Systems, 30(1-7), April 1998
  • 53. References ...
      - Amy N. Langville and Carl D. Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton University Press, July 2006
      - PageRank Calculator
          - http://www.webworkshop.net/pagerank_calculator.php
      - Google Webmaster Tools
          - http://www.google.com/webmasters/
  • 54. Next Lecture
      - Search Engine Optimisation (SEO) and Search Engine Marketing (SEM)