Why we need an independent index of the Web
Dirk Lewandowski
dirk.lewandowski@haw-hamburg.de
http://www.bui.haw-hamburg.de/lewandowski.html
@Dirk_Lew
Society of the Query Conference, Amsterdam, 7/11/2013
The “local copy” of the Web
•  Web Indexing
–  New, changed, and deleted documents
–  “Holy grail” of keeping the index complete and current
Risvik, K. M., & Michelsen, R. (2002). Search engines and web dynamics. Computer Networks, 39(3), 289–302.
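To make the three cases on this slide concrete, here is a minimal sketch of how an index could detect new, changed, and deleted documents by comparing stored content fingerprints with a fresh crawl. The function names (`fingerprint`, `diff_crawl`) and the use of SHA-256 hashes are assumptions made for illustration; the slides do not describe how any particular search engine does this.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash page content so a change can be detected without keeping the full text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_crawl(index: dict[str, str], crawl: dict[str, str]) -> dict[str, set[str]]:
    """Compare the stored index (URL -> fingerprint) with a fresh crawl (URL -> content)."""
    fresh = {url: fingerprint(body) for url, body in crawl.items()}
    return {
        # URLs seen in the crawl but not yet in the index
        "new": fresh.keys() - index.keys(),
        # URLs in the index that the crawl no longer found
        "deleted": index.keys() - fresh.keys(),
        # URLs present in both whose content fingerprint differs
        "changed": {u for u in fresh.keys() & index.keys() if fresh[u] != index[u]},
    }
```

Keeping the index "complete and current" would then mean running such a comparison often enough, over enough of the Web, that all three sets stay small.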
Representation of documents in a search engine
Referring documents → Document → Metadata (examples)
heading 1
heading 2
Anchor text
Anchor text
Anchor text
From the source code
- Title
- Description
- Keywords
- Author
From the document
(document info)
- Length
- Date
- Decay
- Name of the author
From the Web
- PageRank
- Number of citations
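The representation on this slide can be sketched as a simple record type. The field names below follow the slide's labels; the types, the defaults, and the class name `DocumentRepresentation` are assumptions made for illustration, and "decay" is kept as an opaque number because the slide does not define it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DocumentRepresentation:
    """Hypothetical record of what an index stores about one document."""

    url: str
    # From referring documents
    anchor_texts: list[str] = field(default_factory=list)
    # From the source code
    title: Optional[str] = None
    description: Optional[str] = None
    keywords: list[str] = field(default_factory=list)
    meta_author: Optional[str] = None
    # From the document (document info)
    length: int = 0
    doc_date: Optional[date] = None
    decay: Optional[float] = None  # not defined on the slide; left uninterpreted
    author_name: Optional[str] = None
    # From the Web
    pagerank: float = 0.0
    citation_count: int = 0
```

The point of the sketch is that most of these fields come from outside the document itself, so they exist only inside the index and cannot be rebuilt from a list of ranked results.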
The User’s Perspective
•  Everyone uses search engines (Purcell, Brenner & Rainie, 2012; van Eimeren &
Frees, 2012)
•  Market is dominated by Google (ComScore data)
•  Users rely on
–  Google’s method of ordering results
–  Google’s method of collecting data
à If Google hasn’t seen it — and indexed it — or kept it up to date, it
can’t be found with a search query.
Freshness of Web search engines
(see Lewandowski, Wahlig & Meyer-Bautor, 2006; Lewandowski, 2008)
[Figure: Original (as of yesterday) vs. Google’s copy (as of yesterday)]
What about the alternatives to Google?
•  Many services only seem to be search engines
–  Accessing the data of another search engine
–  Representing nothing more than an alternative user interface to one of the more
well-known engines
–  In many cases, that turns out to be Google
–  E.g., in Germany, we can see that the major internet portals T-Online, GMX,
AOL, and web.de all display results obtained from Google
Why is one search engine not enough?
•  We need more than one search engine to ensure that a broad range of
opinions is represented in the search market.
•  Users should have a choice between the different worldviews that
algorithm-based search result generation produces
•  Ideology-free search algorithms are simply not possible
Alternative Search Engine Indexes
•  There are only a handful of search engines that operate their own indexes,
due to costs and technical complexity
•  Search engine start-ups
–  Use an existing external index
–  Focus on a specialised topic (which requires only a small index)
–  Aggregate data from different search engines (meta search engine)
•  Actual search engine start-ups like Blekko and DuckDuckGo are more the
exception than the rule
Partner model
•  “Real” search engine providers such as Google and Bing operate their own
search engines but also provide their search results to partners
•  All the major web portals have now embraced this model.
•  Income through ads; revenue-sharing
•  Attractiveness of the model
–  The search engine provider incurs only minimal additional costs
–  The operator of the portal no longer needs to go to the great expense of running
its own search engine.
–  The partner index model has served to thin out the competition in the search
industry.
Access to Search Engine Indexes
•  Application programming interfaces (APIs)
–  No direct access to the search engine index
–  Limited number of top results which have already been ranked by the search
engine provider
–  Access via APIs resembles what meta search engines do
–  The representation of the document in the source search engine is also not
included
Alternative Search Engines
•  What constitutes an “alternative search engine”?
–  All search engines that are not Google? (“Google Killers”, e.g., Cuil)
–  Some alternatives are not perceived as such because they are considered to be
simply the same as Google (e.g., Bing)
–  Search engines which explicitly position themselves as an alternative to Google
through a regional approach (e.g., Seekport)
–  New approaches to search / “Real alternatives”: Alternative approaches to
gathering and representing web content
Public Support for Search Engine Technology?
•  Quaero/Theseus: Funding a “Google Killer”?
–  Quaero: Technologies for multimedia searching.
–  Theseus: Semantic technologies for business-to-business applications (without
focusing exclusively on search).
•  The proposal to provide government funding for search engine technology
has been subject to intense criticism in the past
•  Establish a single alternative?
•  A number of factors could cause a single alternative to fail
–  Poor marketing
–  Graphic design of the user interface
–  ...
•  Regardless of the reason, a failure of the new search engine would result in
the entire publicly funded initiative failing.
Economic perspective
•  Only the largest internet companies are able to afford large indexes.
•  Microsoft is the only company besides Google to possess a comprehensive
search engine index.
•  Yahoo gave up on its own index several years ago
•  It appears as though operating a dedicated index is attractive to practically
no one — and there are hardly any candidates with the necessary financial
resources in any case
The Solution
•  Create the conditions that will make establishing alternative search engines
possible
•  We can expect the possibilities an independent index presents to benefit a
number of different companies, individuals, and institutions.
•  The result will be fair competition to develop the best concepts for using the
data provided by the index.
Vision
•  “An index of the web that can be accessed at fair conditions for
everyone”
–  “Everyone” means that anyone who is interested can access the index.
–  “Fair conditions” does not mean that access to the index must be free of
charge for everyone. A certain number of document requests per day
should be available at no cost in order to promote non-profit projects.
–  “Access” to the index can be defined as the ability to automatically
query the index with ease.
–  The concept “index of the web” is intended to cover as much of the web
as possible
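The "fair conditions" point, a certain number of document requests per day at no cost, can be sketched as a per-client daily quota. Everything concrete here is an assumption made for illustration: the class name `DailyQuota`, the `charge` method, and the figure of 1000 free requests are hypothetical, since the slides name no number and no billing scheme.

```python
from collections import defaultdict
from datetime import date

FREE_REQUESTS_PER_DAY = 1000  # hypothetical figure; the slides give none


class DailyQuota:
    """Track document requests per client and day against a free allowance."""

    def __init__(self, free_per_day: int = FREE_REQUESTS_PER_DAY):
        self.free_per_day = free_per_day
        self._used = defaultdict(int)  # (client_id, day) -> documents requested so far

    def charge(self, client_id: str, day: date, n_docs: int = 1) -> int:
        """Record a request and return how many of its documents are billable."""
        key = (client_id, day)
        before = self._used[key]
        self._used[key] = before + n_docs
        # Only the part of the request that exceeds the remaining allowance is billed.
        free_left = max(self.free_per_day - before, 0)
        return max(n_docs - free_left, 0)
```

With a scheme like this, small non-profit projects would stay within the free allowance, while heavy commercial users would pay only for the volume beyond it.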
Funding and operation
•  Funding
–  This type of project cannot be supported by any one country alone. The only
feasible option is a pan-European initiative.
•  Who would operate the index?
–  Existing research institution or newly-founded institution
–  The operator of the index should not obtain the exclusive right to determine the
way in which the documents are used or made available (→ Board of trustees)
Conclusion: Advantages of an independent index of the web
•  Motivate companies, institutions, and developers pursuing personal projects
to create their own search applications.
•  The data available on the web is so boundless that it lends itself to
countless applications in a broad range of fields.
•  Enable applications we are not yet capable of even imagining.
•  An open structure, transparency with respect to access, and the assurance
of permanent availability thanks to state sponsorship would lay the
groundwork for innovation.
Thank you
Prof. Dr. Dirk Lewandowski
Hochschule für Angewandte Wissenschaften
Hamburg
dirk.lewandowski@haw-hamburg.de
Twitter: Dirk_Lew
http://www.bui.haw-hamburg.de/lewandowski.html
http://www.searchstudies.org

Why we need an independent index of the Web

  • 1. Why we need an independent index of the Web Dirk Lewandowski dirk.lewandowski@haw-hamburg.de http://www.bui.haw-hamburg.de/lewandowski.html @Dirk_Lew Society of the Query Conference, Amsterdam, 7/11/2013
  • 2. The “local copy” of the Web •  Web Indexing –  New, changed, deleted document –  “Holy grail” of keeping the index complete and current Risvik, K. M., & Michelsen, R. (2002). Search engines and web dynamics. Computer Networks, 39(3), 289–302.
  • 3. Representation of documents in a search engine Referring documents à Document à Metadata (examplex) heading1 heading 2 Anchor text Anchor text Anchor text From the source code - Title - Description - Keywords - Author From the document (document info) - Length - Date - Decay - Name of the author From the Web - PageRank - Number of citations
  • 4. The User’s Perspective
      • Everyone uses search engines (Purcell, Brenner & Rainie, 2012; van Eimeren & Frees, 2012)
      • The market is dominated by Google (ComScore data)
      • Users rely on
          – Google’s method of ordering results
          – Google’s method of collecting data
      → If Google hasn’t seen a document (and indexed it), or hasn’t kept it up to date, it can’t be found with a search query.
  • 5. Freshness of Web search engines
      (see Lewandowski, Wahlig & Meyer-Bautor, 2006; Lewandowski, 2008)
      [Figure panels: “Original (as of yesterday)” and “Google’s copy (as of yesterday)”]
  • 6. What about the alternatives to Google?
      • Many “seems to be” search engines
          – Accessing the data of another search engine
          – Representing nothing more than an alternative user interface to one of the more well-known engines
          – In many cases, that turns out to be Google
          – E.g., in Germany, the major internet portals T-Online, GMX, AOL, and web.de all display results obtained from Google
  • 7. Why is one search engine not enough?
      • We need more than one search engine to ensure that a broad range of opinions is represented in the search market.
      • Users should have the choice between the different worldviews that arise as a product of algorithm-based search result generation.
      • Ideology-free search algorithms are simply not possible.
  • 8. Alternative Search Engine Indexes
      • Only a handful of search engines operate their own indexes, due to costs and technical complexity
      • Search engine start-ups typically
          – Use an existing external index
          – Focus on a specialised topic (which requires only a small index)
          – Aggregate data from different search engines (meta search engine)
      • Actual search engine start-ups like Blekko and Duck Duck Go are more the exception than the rule
  • 9. Partner model
      • “Real” search engine providers such as Google and Bing operate their own search engines but also provide their search results to partners
      • All the major web portals have now embraced this model.
      • Income through ads; revenue-sharing
      • Attractiveness of the model
          – The search engine provider encounters only minimal costs
          – The operator of the portal no longer needs to go to the great expense of running its own search engine.
          – The partner index model has served to thin out the competition in the search industry.
  • 10. Access to Search Engine Indexes
      • Application programming interfaces (APIs)
          – No direct access to the search engine index
          – Only a limited number of top results, already ranked by the search engine provider
          – Access via APIs is similar to what meta-search engines do
          – The source search engine’s representation of the document is not included either
  • 11. Alternative Search Engines
      • What constitutes an “alternative search engine”?
          – All search engines that are not Google? (“Google Killers”, e.g., Cuil)
          – Some alternatives are not perceived as such because they are considered to be simply the same as Google (e.g., Bing)
          – Search engines which explicitly position themselves as an alternative to Google through a regional approach (e.g., Seekport)
          – New approaches to search / “Real alternatives”: alternative approaches to gathering and representing web content
  • 12. Public Support for Search Engine Technology?
      • Quaero/Theseus: Funding a “Google Killer”?
          – Quaero: Technologies for multimedia searching.
          – Theseus: Semantic technologies for business-to-business applications (without focusing exclusively on search).
      • The proposal to provide government funding for search engine technology has been subject to intense criticism in the past
      • Establish a single alternative?
      • A number of factors which would cause it to fail
          – Poor marketing
          – Graphic design of the user interface
          – ...
      • Regardless of the reason, a failure of the new search engine would result in the entire publicly funded initiative failing.
  • 13. Economic perspective
      • Only the largest internet companies are able to afford large indexes.
      • Microsoft is the only company besides Google to possess a comprehensive search engine index.
      • Yahoo gave up on its own index several years ago
      • Operating a dedicated index appears to be attractive to practically no one, and there are hardly any candidates with the necessary financial resources in any case
  • 14. The Solution
      • Create the conditions that will make establishing alternative search engines possible
      • We can expect that the possibilities an independent index presents would benefit a number of different companies, individuals, and institutions.
      • The result will be fair competition to develop the best concepts for using the data provided by the index.
  • 15. Vision
      • “An index of the web that can be accessed at fair conditions for everyone”
          – “Everyone” means that anyone who is interested can access the index.
          – “Fair conditions” does not mean that access to the index must be free of charge for everyone. A certain number of document requests per day should be available at no cost in order to promote non-profit projects.
          – “Access” to the index can be defined as the ability to automatically query the index with ease.
          – The concept “index of the web” is intended to cover as much of the web as possible
  • 16. Funding and operation
      • Funding
          – This type of project cannot be supported by any one country alone. The only feasible option is a pan-European initiative.
      • Who would operate the index?
          – Existing research institution or newly-founded institution
          – The operator of the index should not obtain the exclusive right to determine the way in which the documents are used or made available (→ Board of trustees)
  • 17. Conclusion: Advantages of an independent index of the web
      • Motivate companies, institutions, and developers pursuing personal projects to create their own search applications.
      • The data available on the web is so boundless that it lends itself to countless applications in a broad range of fields.
      • Enable applications we are not yet capable of even imagining.
      • An open structure, transparency with respect to access, and the assurance of permanent availability thanks to state sponsorship would lay the groundwork for innovation.
  • 18. Thank you
      Prof. Dr. Dirk Lewandowski
      Hochschule für Angewandte Wissenschaften Hamburg
      dirk.lewandowski@haw-hamburg.de
      Twitter: Dirk_Lew
      http://www.bui.haw-hamburg.de/lewandowski.html
      http://www.searchstudies.org
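
The document representation listed on slide 3 can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the data model of any real search engine or of the proposed index: the class name `IndexedDocument`, its field names, the field types, and both helper methods are hypothetical. The slides name the fields (anchor text, headings, title, description, keywords, author, length, date, decay, PageRank, number of citations) but do not define them, so `decay` in particular is kept as an uninterpreted number.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IndexedDocument:
    """Hypothetical record for one document, grouped by source as on slide 3."""

    url: str
    # From referring documents
    anchor_texts: List[str] = field(default_factory=list)
    # From the document itself
    headings: List[str] = field(default_factory=list)
    # From the source code
    title: str = ""
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    meta_author: str = ""
    # From the document (document info)
    length: int = 0
    date: Optional[str] = None
    decay: float = 0.0  # assumption: the slides do not define "decay"
    document_author: str = ""
    # From the Web
    pagerank: float = 0.0
    citations: int = 0

    def add_referring_link(self, anchor_text: str) -> None:
        """Record one inbound link: keep its anchor text and count the citation."""
        self.anchor_texts.append(anchor_text)
        self.citations += 1

    def searchable_text(self) -> str:
        """Join the textual fields a query could be matched against."""
        parts = [
            self.title,
            self.description,
            *self.keywords,
            *self.headings,
            *self.anchor_texts,
        ]
        return " ".join(p for p in parts if p)
```

A record like this is what slide 10 says the existing APIs leave out: they return ranked top results, while an open index as described on slide 15 would presumably need to expose the representation itself so that others can build their own ranking on top of it.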