SlideShare une entreprise Scribd logo
1  sur  66
Télécharger pour lire hors ligne
Risks of search engine dependency and
           its influence on data quality

Thesis intermediate report submitted for the European Master in Business Studies
                                    (EMBS)
                         by Ronan CHARDONNEAU
        Institut de Management de l'Université de Savoie d'Annecy (FR)
                      Università degli studi di Trento (IT)
                           Universität Kassel (GER)
                           Universidad de León (SP)




                Date of submission: 26th January, 2009



                              Master Thesis
Disclaimer

       Before starting this report I inform any readers that this is an intermediate
report of a final master thesis which will be finished in June 2009.
       The work within is the result of months of work since June 2008. To find
what inspired me please have a look at my blog (in French) at:
               http://moteurs-de-recherches-alternatifs.blogspot.com/
       The following intermediate report has not taken yet ( at the date of February
the 13th 2009) in consideration the pieces of advice from my teachers.
       Even if there are not many corrections to do this work is then not perfect yet.
       I removed the acknowledgements part for the public version of the thesis.
       Lastly I would like to inform any readers of this work that I will normally
be graduated in June 2009 and that from this date I will be actively looking for a
job or a phd in the field of international e-marketing.
       I am very flexible and able to move at any place around the world.
       Please feel free to contact me though my blog by posting a comment or at
ronanchardonneau@gmail.com.




  CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 2/66
Ronan CHARDONNEAU
                 Blog: http://moteurs-de-recherches- ronanchardonneau@gmail.com
                     alternatifs.blogspot.com/




Professional experience                               Educational background

                                                          □ 2007/2009
  •     Search Engine Optimizer(Eng-IT)
                                                            European Master in Business Studies
         June- September 2008 – Search engine
                                                            Quadruple degree in international
            optimizer in English and Italian
                                                            management (Trento, Annecy, Kassel
                 Edexon Venice - Italy
                                                            Léon)
  •     International salesman
                                                          □ 2006/2007
        April- august 2007 – Selling IT services
                                                            A three-year degree in Import-Export
               on the local foreign market
                                                            (IUT Quimper)
                Easteq Shanghai - China
                                                          □ 2004-2006
        Assistant of the director of operations
  □
                                                            A two-year degree in Business and
        April- June 2006 – Creation of a search
                                                            Administration: Finance, Accountancy
                  engine for a database
                                                            and Computering
            Northwest Folklife Seattle - USA
                                                            (IUT Nantes)
        Track and Field coach trainer
  □
                                                          • 2004
       2006- 2008
                                                            Scientific A-level
        Mailman
  □
                                                            (Lycée Clemenceau Nantes)
       Summer 2004-2006 – (La Poste - France)


                                                       Professional projects
Foreign languages
                                                          □ Market study
                                                            Possibilities of entering the Italian market
   □ Trilingual: French, English, Italian
                                                            for Easteq China Offshore
     Autonomous in those three languages
                                                          □ Market study
                                                            Research of distributors and competitors
   •    Chinese
                                                            in Germany for Thyssen Krupp Elevators
        Able to stand basic conversations
                                                          □ Setting up a prototype computer
                                                            application for the students
   □ German and Spanish
                                                            registration system of the university of
     Elementary level (A2)
                                                            Nantes


                                                       Others
Computer skills
                                               □ Extra scholars activities
   •  Search Engine Optimization                  Author of several blogs dealing with trips
     6 months of experience in this field         and many articles about Web applications.
   □ Software Suite                            □ Centers of interests
      Competent in GIMP, Microsoft and Open       Foreign languages, traveling (more than
   Office. Notions of Linux/Ubuntu                20 countries visited), computers and
   • Programming                                  technology
      Notions ofCHARDONNEAU Ronan - European Master in Business Studies 2007/2009 3/66
                 PHP and XHTML
Contents
Foreword.......................................................................................................................6
Chapter 1: Introduction of the topic background..........................................................8
    1.1 Relevance of the subject...................................................................................10
    1.2 Major terms......................................................................................................11
    1.3 Focus, goals and structure of the report...........................................................11
Chapter 2: Concept of data quality.............................................................................13
    2.1 Data quality definition......................................................................................14
    2.2 The importance of data quality.........................................................................15
Chapter 3: Search engines dependency.......................................................................16
    3.1 Search engine market configuration.................................................................17
        3.1.1 Search engine categories..........................................................................17
        3.1.2 Search engine market...............................................................................19
        3.1.3 The search engines in the world...............................................................19
        3.1.4 The search engine market shares per country...........................................22
        3.1.5 The search engines competition...............................................................23
        3.1.6 The semantic web.....................................................................................24
    3.2 Search engines dependency aspect...................................................................25
        3.2.1 Search engines dependency proves..........................................................25
        3.2.2 Search engines dependency aspect...........................................................27
    3.3 Search engines dependency problems..............................................................28
        3.3.1 Privacy issues...........................................................................................29
        3.3.2 Looking for other search engines.............................................................30
        3.3.3 Search engine awareness..........................................................................30
        3.3.4 Other search engines existence awareness...............................................32
        3.3.5 Less confident regarding other search engines.........................................33
        3.3.5 Less confident regarding other search engines.........................................33
        3.3.6 Even the best cannot provide you everything...........................................34
Chapter 4: Risks of search engines dependency and its influence on data quality.....35
    4.1 The information has been found but is poor....................................................36
    4.2 What the search engines do not tell you...........................................................36
    4.3 The best way to get data quality.......................................................................37

    CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 4/66
4.3.1 The sub-search engines.............................................................................37
        4.3.2 The size of the Internet.............................................................................38
        4.3.3 Single search engine Internet coverage....................................................39
        4.3.4 Multiple search engine Internet coverage.................................................42
        4.3.5 Others search engine Internet coverage....................................................44
        4.3.6 A concrete representation of the World Wide Web...................................46
    4.4 The gap between search engine dependency and data quality.........................47
Chapter 5: The Google example.................................................................................50
    5.1 Google..............................................................................................................51
    5.2 Google's success...............................................................................................51
    5.3 Google dependency state..................................................................................52
    5.4 Google functions..............................................................................................52
    5.5 Google added functionalities............................................................................53
    5.6 Google success is his weakness.......................................................................53
    5.7 Google's disappearance hypothesis..................................................................54
Conclusion..................................................................................................................55
Declaration..................................................................................................................56
List of literature...........................................................................................................57
Afterword....................................................................................................................61




    CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 5/66
Table of figures
Illustration 1: Total Sites Across All Domains August 1995 - January 2009................9
Illustration 2: A light interface: the homepage of the search engine www.ask.fr.......17
Illustration 3: Home page of the Mail.ru portal..........................................................18
Illustration 4: Top 10: Search websites in the world August 2007.............................19
Illustration 5: The most visited website by country....................................................20
Illustration 6: Search engine market shares in Czech Republic..................................22
Illustration 7: Results page of Cuil.............................................................................24
Illustration 8: Search engines figures for France, Source: XitiMonitor .....................28
Illustration 9: Some sub-search engines of Google....................................................37
Illustration 10: Google's Index coverage....................................................................40
Illustration 11: “Powered by Google” search engines coverage.................................41
Illustration 12: Search engines coverage....................................................................43
Illustration 13: “Who is like it” search engine............................................................44
Illustration 14: Dogpile meta search engine...............................................................45
Illustration 15: A representation of the most visited websites....................................46
Illustration 16: Comparison among Google, Yahoo and MSN...................................48
Illustration 17: Google's domination in Europe..........................................................52
Illustration 18: Google's Advanced search engine......................................................53
Illustration 19: Eye Tracking on Google.....................................................................54




                CHARDONNEAU Ronan - European Master in Business Studies                                      6/66
Foreword
       As most of the students who has a computer one of my first move when I
wake up is to switch on the computer and to spend my first twenty minutes of the day
on the Internet.
       From there I have a look at the last news, I check my e-mails and eventually
exchange some few words with a couple of friends by using online chat applications.
I also check my other email account as well as my blogs and analyze the traffic I got
during the last few days, to finish this process I consult my advertisement account to
see if I got some revenues. I often use as well search engine to look for information
which just came up into my mind during the night.
       In the paragraph you just read was the description of my morning routine on
Internet. There is nothing special except that most of the moves I described above are
in fact done on two to three major search engines: Google, Yahoo and Microsoft.
       I hardly ever use Yahoo or Microsoft for search purpose but Google is for
sure the website I visit the most to crawl the web but... is Google the Internet?
       I got the idea to write about: « Risks of search engine dependency and its
influence on data quality » not because I was using all those Google applications
everyday and was scared about what will happen if I get in troubles with Google
such as privacy issues or if Google just closed. I just write about it because one day I
found Google results not accurate enough.
       And from this observation a lot of questions came to my mind:
   •   Is it me who is not good enough at performing research on the Internet?
   •   Is it because no one wrote about the information I am looking for?
   •   Is it because the information is not on the first pages in Google that I have to
       browse all the pages in order to find it?
   •   Is it because Google is not good enough?
   •   Is it because the information is hidden in some other documents such as PDF,
       pictures, videos?
   •   Is it because I have to use another way to crawl the web and if yes how?
       You see here how a simple observation can raise a lot of questions.
       I hesitated a lot about writing on this topic, the main problem I got was that I
was not convinced that there is a potential risk of being search engine dependent. The

           CHARDONNEAU Ronan - European Master in Business Studies                  7/66
reason is that companies such as Google are working hard in order to fit Internet
users expectations and the vision we get is that they are doing a wonderful work. The
problem is that there could be a difference between perception and real facts and this
is exactly what I am eager to discover here.
        Can we measure how huge is the gap between the information we were
looking for and the one of search engines as Google are providing us?
        Search engines are set up to find information on the Internet, information
being the basis of any good decisions making we can then understand how important
and interesting it is to write on this topic.


        I hope you will appreciate this reading as much as I did when making my
research.


                                                           Ronan CHARDONNEAU




            CHARDONNEAU Ronan - European Master in Business Studies              8/66
Chapter 1: Introduction of the topic
            background




CHARDONNEAU Ronan - European Master in Business Studies   9/66
I will not surprise you if I say that Internet has been created to share
information and to communicate with each others.
        It is hard to evaluate how big is the Internet, estimations among companies
are very different, it varies from 15 to some 30 billion Web pages1. The number of
websites is increasing everyday and estimated at 185,167,8972                   with a constant
augmentation since the creation of the world wide web.




    Illustration 1: Total Sites Across All Domains August 1995 - January 2009


Habits have changed since the creation of the Internet and websites are used now in
diverse manners if it comes to be a standard for companies (recognized as a mark of
trust, seriousness and quality) it is also a space for many individuals (blog
phenomenon). As an example regarding France, in June 2008 14% of French people
above 12 year-old which means 22% of French Internet users are authors of a blog or
a website3.
        The banalization of the Internet and the fact that anyone can create his own
website for free increase the feeling we have regarding the Internet: a true jungle of
information and even sometimes real “dump” regarding information accuracy.



1 cf. Koch, P. / Koch, S. (2009): How big is the Internet?.
2 Netcraft, (2009): January 2009 Web Server Survey.
3 Crédoc, (2008): La diffusion des technologies de l'information et de la communication dans la
  société française, p.120.

           CHARDONNEAU Ronan - European Master in Business Studies                          10/66
Websites can be accessible through three channels:


    •   Direct access (for example you know the website address by heart, you put it
        in your favorites or you find a website on a business card and you are typing
        it in the address bar);
    •   External links (you access to a website which has the link of another
        website, this is the case in most of websites, catalogs, advertisement);
    •   Through Search Engines (you use a dedicated application by typing in some
        keywords in order to get suggestions of what you are looking for);
        As you can see from this list if you use only the first two ways to crawl the
web it comes to be too rigid and not wide enough. It has been said as well that the
first way is disappearing more and more in profit of search engines4.
        So one could say that there is currently two main ways to crawl the web, from
link to link and by using search engine.
        This last one being indispensable in order to crawl the web properly.


            More and more information are put on the Internet which makes it
            come a true jungle. The only way to crawl those information properly
is to use search engines.


        1.1 Relevance of the subject


        Internet is becoming more and more our information provider. quot;In 2002 a
study from the U.S. Department of Commerce estimated that 36% of the American
public over the age of 3 used the Internet to search for product and service
information in 2001. This usage represented a substantial increase in information
search behavior from 2000, when 26% of the American public reported using the
                                                                               information.quot;5
Internet      to     search       for     product       and       service
        Between 2000 and 2008 the USA got an increase of Internet users of 130,9
%6 we can then imagine how important is searching information through Internet

4 cf. Ohayon, O. (2008): Google, moteur de recherche ou moteur de navigation?.
5 Peterson, Robert A. / Merino, Maria C. (2003):Consumer Information Search Behavior and the
  Internet, p.6.
6 Internet World Stats (2008): Internet Usage and Population in North America.

           CHARDONNEAU Ronan - European Master in Business Studies                       11/66
nowadays.
       The number of Internet users is estimated to 1,463,632,361 (world population
6,676,120,288) with a growth rate from 2000-2008 fixed at 305.5 %7.

       1.2 Major terms


       In this thesis you will hear a lot about the following terms: search engines,
search engine dependency and data quality.
       Search engine is the most flexible technology which has been created in
order to crawl the web. A search engine is no more than a web application which is
processing data. A search engine does not create data it just process some
information it has in his index.
       I describe search engine dependency as the fact that one does not make the
choice between one search engine and another. Search engine dependency is very
relevant in most of the countries. Most of the people are using only one single search
engine when performing requests on the web. I will speak as well about bad habits
when dealing with search engines. For example you can be dependent of using a
search engine but using it badly.
       Data quality is the quality of data. Data are of high quality quot;if they are fit for
their intended uses in operations, decision making and planning quot;. Alternatively, the
data are deemed of high quality if they correctly represent the real-world construct to
which they refer. These two views can often be in disagreement, even about the same
set of data used for the same purpose8.


       1.3 Focus, goals and structure of the report


       The focus of this work is to show if there are some risks of being search
engine dependent and if there are some, is the gap of information significant
between a search engine to another?
       Goals are numerous:
   •   to put in evidence how bad it is to be dependent of an unique search engine;

7 Internet World Stats (2008): INTERNET USAGE STATISTICS
  The Internet Big Picture.
8 Kaplan, I. (2008): Bad Data Can Cost You Big Time.

          CHARDONNEAU Ronan - European Master in Business Studies                  12/66
•   to show how important it is to have reliable information;
   •   to show how big is the gap between a general search engine and a specialized
       one;
   •   to look for alternatives if tomorrow the main search engines disappear;
   •   to show the weaknesses of general search engines;
   •   to show and discover other search engines and put in evidence their
       usefulness;
   •   to show that general search engines are not well used;
   •   to discover the future expectations people have regarding search engines and
       what are the coming technologies in this field;


       The structure of the report should be done as follow:


       The first idea is to introduce the concept of data quality. We all go on
search engines in order to find information whatever it is or at least to find the
answer to one of our question (how much does a Paris-Berlin train ticket costs? What
is the weather in New York? What is the last result of our favorite soccer team?).
       The second point is about Data quality which is very interesting in order to
understand why search engines exist.
       The next point is dealing with the world of search engines and the
dependency which is outcome from them.
       Analyzing the world of search engines is fundamental to understand how
Internet is not as rational as we could think (search engines may be not the Internet,
search engines may be different from a country to another).
       Then we will have a look on figures which show quite clearly that people are
not using several search engines when making research on the Internet but an unique
one that I will define as the dependency concept.
       Once this definition given I will focus on the heart of this work which are the
risks that this dependency provide and its influence on data quality.
       Google being in Europe the most used search engine I will use it as a major
example in my last part.




         CHARDONNEAU Ronan - European Master in Business Studies                 13/66
Chapter 2: Concept of data quality




CHARDONNEAU Ronan - European Master in Business Studies   14/66
When I firstly decided to write about the concept of data quality for search
engine I did not mean at the beginning that data has to be perfect. I just meant that
when I write a request on any search engine I expect to have as results some
answers which fit to my expectations. The last example I have in mind is when on
the 31th of December a friend of mine looked for the « average height of Korean
people » (request were made on Google) and all the results on the page were dealing
about the « average size of the sexual organs of Korean people ». I have high doubts
that no information regarding the average height of Korean people does not exist on
the web it however has been what Google seemed to tell me.
        Here as you can see in this example it raises a lot of issues regarding data
quality. My first expectation regarding data quality was then to get some results
which fit to my request. But looking it deeper what is the value of a supposed correct
answer that I could have found written by someone like you and I (not specialized in
this topic).
        In fact here everything depends on how professional the data as been written,
this is what I am explaining in this chapter.


        2.1 Data quality definition


        “This part needs additional information and improvements and is then not
finished yet.”


        Data quality is defined by four criteria:


    •   Accuracy: it means that the information has to be true so based on real facts.
        Here we see the importance of having the source of the document.
    •   Timely: the data given has to be dated. The most striking example could be
        the one with stock exchange, what is the value of a data regarding a currency
        change without the date?
    •   Meaningful: here it comes back to what I was explaining in the introduction.
        Does the result fits with my request? What is the color of a frog? The answer
        is green, is it useful? Yes but is it complete?;
    •   Complete: an information can be for sure useful but will be far more useful if

           CHARDONNEAU Ronan - European Master in Business Studies              15/66
it is complete.


        Here is then the minimum vital of data quality.


        2.2 The importance of data quality


“This part needs additional information and improvements and is then not finished
yet.”


        As I mentioned it there are two levels of data quality:
   •    Poor quality one: for example you and I are looking for information
        whatever it is in order to get a quick answer or even a mere comment. Here
        two theories are facing each other:
                An information should be given even if it is possibly wrong. It of
           ▪
                course can be useless if the information is a hoax but will it be in the
                case of a warning of a terrorist attack?
           ▪ An information should be given only if it is 100% accurate. Here it is
                interesting in order to not create polemical situations for nothing.
        There are no rational choice to make here, sometimes you need to use the first
theory and sometimes the other one.
        I would describe poor quality data as mass consumption information which
mean good enough for “lambda” people but not that useful for businesses and even
less for researchers. With the increasing of Internet users more and more mass
consumption data are created and of course their number is bypassing the one of
quality ones.


   •    High quality one: here we find the data which fits the four criteria I
        described above. High quality data are useful in order to take complex and
        rational decisions.


The Wikipedia Issue:


        As you may know Wikipedia is an Open Source encyclopedia where

          CHARDONNEAU Ronan - European Master in Business Studies                  16/66
everyone can write what they want on a specific subject. His success made in come
the biggest encyclopedia of all time. Such a success is recognized as a mark of trust
and seriousness for search engines which is characterized by being displayed most of
the time in the 5th first result for any request. With most of the time an appearance on
the first result.




The main issue even if recently it has been added the possibility to add the author
name most of the articles are anonymous. It is also possible to write down an article
without mentioning any source of information. This is then explained to the reader
before starting the article but everything is then dependending on reader's behaviour.
Which quality of information does he or she want to have.




           CHARDONNEAU Ronan - European Master in Business Studies                17/66
Chapter 3: Search engines dependency




CHARDONNEAU Ronan - European Master in Business Studies   18/66
In order to understand well this part we should have firstly a look at the
search engine market configuration. Even if for a lot of people search engine is a
synonym of Google it may be not true for some other people. I made a strong effort
during this work to make it as global as possible. Europe is good as well as the
United States but we will never got the all answers of our issues if we always
consider these two parts of the world.


3.1 Search engine market configuration


       3.1.1 Search engine categories


       The world of search engines is not as uniformed as we could have think. I
until now identified three kind of search engines:
   •   Standard: the most well known search engines such as www.google.com,
       www.livesearch.com (Microsoft), Ask (http://www.ask.com/?o=312) they are
       looking for any kind of information through the Internet and are characterized
       by a very light interface:




 Illustration 2: A light interface: the homepage of the search engine www.ask.fr


   •   Portals: may be the most complex search engines to analyze on a figures


         CHARDONNEAU Ronan - European Master in Business Studies               19/66
point of view. Portals are characterized by a lot of information on their home
    page including the search engine function. It is then difficult to make the
    difference here between the success of the portal and the success of the search
    engine on this web page. The best example I found is the one of mail.ru
    which is the most visited website in Russia far before Google. It seems when
    looking at the search engine statistics that the research made on Mail.ru are
    not accounted or that no one is using the search function of Mail.ru. So it is
    sometimes hard to measure the success of some search engines. The most
    well known portal is Yahoo.




                Illustration 3: Home page of the Mail.ru portal


•    Specialized search engines: they to my point of view belong to a
    subcategory of the first group. The major problem of standard search engines
    is that they are too big. Specialized search engines are then no more than a
    special function which is working as a filter. It is then far more easier to find
    the information you are looking for on a specific website rather than using
    standard ones. A good example of it can be the one of Ebay, it is of course far
    more convenient to go on the Ebay website and use the search engine directly
    from there rather than going on Google and writing a request such as 'Ebay
    buying socks'. I mentioned it again but in most of the cases specialized search
    engines are not a revolution they are just one part of standard search engine
    and are not a new technology by themselves.



      CHARDONNEAU Ronan - European Master in Business Studies                  20/66
3.1.2 Search engine market


       The search engine market is segmented by an enormous leader: Google, a
follower who is far from him: Yahoo and a large amounts of small search engines.
The dominant position of Google may stay for years and years and only a technical
revolution could really jeopardized him.




          Illustration 4: Top 10: Search websites in the world August 2007


       As you can see here I identified what we could call « The big four » which
represents the four major search engines. Then comes three specialized search
engines (7,49% all together) but their success are limited to the size of the website
they are browsing. Then came in last positions what I would called the dead
champions which once have been great and well known search engines but which
are nowadays in decline and will one day probably disappear.


           3.1.3 The search engines in the world


       The more I study the E-world and the more I realize that the web is exactly
the reflect of our society but in a non-physical aspect. Internet did not break cultural
differences and search engines really show it.
       As we just saw Google represents 6 research out of 10 in the world but does it
mean that each country in the world has a population of 60% Google users?
       To answer this question let's have a look at the following map:


          CHARDONNEAU Ronan - European Master in Business Studies                 21/66
Illustration 5: The most visited website by country
          Source9
          As we can see here the world is not covered entirely by Google. We clearly
have some Google countries, Yahoo countries, Mail.ru countries and so on and so
fourth.
          We can however notice some striking information such as almost all the
American continent is using Google as well as Europe, Northern Africa and
Southern Africa, Australia, India. In one word almost all countries which have strong
links with the Anglo-Saxon culture.
          Then comes what I would qualified as a cultural wall which is starting in
Eastern Europe and which is finishing in Russia. Here are the ex-soviet countries. I
unfortunately have no concrete proves of what I am saying but I suppose that there is
a kind of a « boycott of American technologies » and support of Russian
technologies. The recent partnership between Yandex (main search engine in Russia)
and the browser Firefox raised those suspicions10.
          Russia is not the only country in this situation, China also. The recent
advertisement broadcast by Baidu (the leader search engine in China) shows clearly

9 Alexa the Web information company (2008).
10 cf. Houste, F. (2009): Russie : Yandex sera le moteur de recherche par défaut de Firefox.
   cf. Schwartz, B. (2009): Firefox Drops Google For Yandex In Russia, But Big Loser May Be
   Rambler.

            CHARDONNEAU Ronan - European Master in Business Studies                        22/66
the will that Chinese public institutions are ready to protect their territory.11
        As you can see Asia is the region where Google is the least present. It is also
the continent where are gathering a lot of different search engines.
        I may not emphasize the diversity of the Caribbean area as well as Center
Africa which are areas where Internet is not that well implemented which mean then
that the battle to take the lead is not finished yet. For example is it really relevant to
say that Yahoo is the leader in Cameroon which is a country with less than 500,000
Internet users?
        I however will highlight that the Pacific area which is containing all the
« Tigers » (Taiwan, Thailand, Singapore...) are all in red: Yahoo.
        The search engine world is then divided in two parts:
    •   The Google world: which is composed of all the Anglo-Saxon countries as
        well as countries which have strong links with the United States or Great
        Britain. It is clearly showed regarding India and Australia. We can at the date
        of today still identify two countries which are not under Google control:
        Czech Republic and Iceland but actually by looking at the figures and the
        forecasts it is just a matter of time12.
    •   The Asian – Pacific world: Asia is composed of a lot of countries and of
        course a lot of cultures. Among them we can identify four players:
        ◦ Mail.ru which is dominating all the ex-soviet countries;
        ◦ Baidu which has a total control over China;
        ◦ Naver, a 100% South Korean product which is the best example that
            search engines work by culture;
        ◦ Yahoo which is leader in all the “Tigers” Asian countries.


             Yahoo being an American technology such as Google, how is it possible
             that Yahoo is so successful in the Pacific area and not elsewhere? The
             reason I found is that Yahoo is a shiny portal and that Asian culture on



11 cf. Einhorn, B. (2007): Baidu Thinks It Can Play in Japan.
   cf. (2007): Baidu au Japon?.
   cf. Grallet, G. (2009): Baidu, un autre Google s'éveille.
12 cf. Rafat, A. (2008): Czech Portal Seznam Could Fetch $900 Million; Google, Apax, Warburg and
   Others in Fray.
   cf. Mar Hauksson, K. (2007): Global search report 2007, p8.

           CHARDONNEAU Ronan - European Master in Business Studies                        23/66
Internet recognize a quality website to the number of animations on it 13. I will then
add that Japan is a strong pole of Internet with one of the highest rate of Internet
integration in the world per capita14. This is why I think Yahoo is so popular in this
region.


             3.1.4 The search engine market shares per country


          One of the most complete work I found on this topic after mine is the
« Global Search Report 2007 »15. What stroke me the most in all the countries
studied in this report as well as in all the report and research I made until now are the
search engines market configuration which looks like very often to this:




                       Illustration 6: Search engine market shares in
                                       Czech Republic

          It is very rare to find a country where there is a close competition among
search engines. Even if in the High Technology world things change from a day to
another you have often the following configuration where the first search engine
is leading the game by more than 30 points on its followers.


             This is typical from the search engines market or you are adopted by a
             population or you are not. This trend seems quite relevant in the
software industry, people seem to look for a standard used by all. This is the case for
the Operating System industry, the browser industry, the e-learning industry. The
13 cf. Tobin, R. / Hotchkiss, G. / Lee, P. (2008): Chinese Search Engine Engagement, p28.
14 Internet World Stats. (2008): Internet Usage in Asia.
15 cf. Wilsdon, N. (2007): Global Search Report 2007.

            CHARDONNEAU Ronan - European Master in Business Studies                         24/66
explanation I found for the success of search engine within a population is the word
to mouth, this is how Google has been so successful isn 't it? How never heard
sentences such as « you just have to Google it » Google is even nowadays in
dictionaries as a verb16.
        As a conclusion I would say that in the world of search engines you are first
or you are nothing.
        I have to mention as well that the market has a lot of small local search
engines which are if original enough bought by the big ones or if not will disappear
quickly (some example are coming in the news every month). The only key of the
success on the short term seem to be advertisement but on the long run you need the
technology behind in order to compete.


            3.1.5 The search engines competition


        Google has been created in 1998 and at that time search engines were already
in place, it did not scared Google and one after the other Google bypassed all of
them. In fact among the big four Yahoo is the oldest (1994) and Microsoft the
youngest (2003). Even if the battle seems to be finished it will take a lot of time to
Google to be the number one in all countries (everything being linked to culture
rather than rationality) which in fact is giving hope to its followers.
        At the time I am writing this thesis discussions are still on the way between
Yahoo and Microsoft in order for Microsoft to buy Yahoo search technologies. We
can understand how strategic a such acquisition could be. Yahoo having the research
knowledge and Microsoft the funds as well as the software ownership.
        Regarding Baidu we cannot clearly see how they could compete against
Google outside of China.
        As I said previously specialized search engines are limited to the website they
are linked to.
        We could then think about new comers who starting from nothing could beat
famous search engines in a small period of time, it could have been the success of
some products such as Cuil launched in summer 2008 which received a lot of
advertisement through the news17. But search engines is a very ungrateful world
16 cf. Merriam Webster. (2001): Google - Definition from the Merriam-Webster Online Dictionary.
17 cf. Arrington, M. (2008): Cuil On BusinessWeek's Most Successful of 2008 List. Huh?.

           CHARDONNEAU Ronan - European Master in Business Studies                         25/66
where visitors are giving no more than one chance: the product works or it does not.




                           Illustration 7: Results page of Cuil


           This is a point that I discovered very quickly and that you can test by
           yourself. People want the information as soon as they can. They are
           ready to test the product but in a certain amount of tries. When you
move from Google to another search engine you are often intransigent. At the first
result which does not fit your expectations you will go back to Google. But is the
search engine wrong or is it because it is responding differently that on what you
were used to?


        In order to conclude this part I would say that with the search engine history
we have and the search engine market configuration, I cannot see how Google
could lose its position. Until now only one company succeeds to make a such gap in
the world of search engine and it is Google itself and it was in a period where
everything had to be created on Internet.
        So I would say that on this field I don't see how Google can be beaten and
even worried.
        A new technology regarding research is however more and more recurrent in
this field and is called semantic research.


           3.1.6 The semantic web


“This part needs additional information and improvements and is then not finished
yet.”


          CHARDONNEAU Ronan - European Master in Business Studies               26/66
The semantic web is another way of crawling the net. We all know how to
make a search on the Internet isn't it? We just type in some keywords and press the
return key in order to get the answer. In this configuration you have to feed the
search engine with the request.
        With semantic Web the concept is a bit different and based on suggesting you
the request instead of typing it entirely. Each time that you are starting to type your
request a list of suggestions are coming to you. We are recently seeing more and
more this technology on the biggest search engines.
        The purpose is in fact to guide you as best as they can in order to put you on
the right track and trying to avoid you to reach the labyrinth of the web.
        This technology fits one of the main drawback of search engines and that I
call « search engine technology awareness » which consists in how to write good
requests for search engines.
        The main drawback of the semantic web is that this is a very new technology
which then have a lot to do before reaching his maturity point. Here we are speaking
about a maximum length of a decade. We can also complain about the rigidity of the
system but it is true that with length and experience this issue could be fixed.
        It said that Ask is one of the search engine which based a lot of R&D on this
new technology but according to me and without being a technician I think that
Google can have better results because of its huge database of requests. Future will
tell us what is going to happen.


3.2 Search engines dependency aspect


        As I mentioned it in the introduction I define search engine dependency as the
fact that people are swearing only by one search engine when looking for
information on the Internet.


        3.2.1 Search engines dependency proves


“This part needs additional information and improvements and is then not finished
yet.”



          CHARDONNEAU Ronan - European Master in Business Studies                  27/66
This is not the studies which are lacking on this topic. When looking for
information regarding information literacy on the Internet you arrive on different
sources of studies and this regarding all the countries of the world. Information
literacy is a relevant topic and an issue. I focused on some very recent and
francophone research that I found on the Internet regarding Canadian students18,
French19 and Belgium students20. I also found information regarding Germany on this
topic. Many sources are as well saying that such research have been made in mostly
all Europe, China (Hong Kong)21 and the United States22.
        All the studies I found until now (all done on students panels so literate
people) are all saying the same thing: search engine are the first source of
information when looking on the Internet and all students seem to have receive not
enough training on how to look for information on the Internet.
        The best study I found on this topic is one made on all the registered PhD
students (2,218 with an answer rate of 23,4%) last year (2008) on a whole region of
France (not a high technological developed country but far to be the least on a
worldwide scale)23.
        As we can imagine PhD students have a high requirements regarding quality
of information.
        The study shows that 67,5% of the respondents have never received a training
regarding how to look for information during their whole stay at the university which
could explain the fact that people are running toward search engines directly.
        Search engines are used in 96% of the cases when performing research
(which emphasize the necessity of how to well use those technologies).
        94% of them do not use blogs which I take as a good thing (even if the survey
is saying the opposite, blogs being written by professionals as well).
        The most used search engines are Google (85%) and Google Scholar 37%

18 cf. Crepuq. (2003): Etude sur les connaissances en recherche documentaire des étudiants entrant
   au 1er cycle dans les universités québécoises.
19 cf. Université de Lyon. (2007): De la documentation au plagiat.
20 cf. EduDoc. (2008): Enquête sur les compétences documentaires et informationnelles des étudiants
   qui accèdent à l'enseignement supérieur en Communauté française de Belgique.
21 cf. Leung, Hon-wing 梁漢榮. (2004): A study of computer science students' conceptions of
   information literacy and their experiences in information search process and use.
22 cf. Enquiro. (2004): Search Engine Usage in North America.
23 cf. URFIST de Rennes. (2008): ENQUÊTE SUR LES BESOINS DE FORMATION DES
   DOCTORANTS Á LA MAÎTRISE DE L’INFORMATION SCIENTIFIQUE dans les Ecoles
   doctorales de Bretagne.

           CHARDONNEAU Ronan - European Master in Business Studies                          28/66
(which is a sub search engine of Google).
        60% of them do not know what is a meta search engine and only 5% of them
use them.
        46 % do not know the search engine of their field and only 20% do use them.
        Those figures are very interesting because they show clearly how people are
not adapted to the technology they are using. PhD students should be some of the
most search engines awarded people and it seems that for France they are not. They
are strictly dependent of a single search engine which is here Google. They know
very few of his sub search engine and as written above they do not know how to use
the technology. They are also not aware massively about other search engines.




        3.2.2 Search engines dependency aspect


Search engines dependency can however comes from different ways:


    •   Search engine satisfaction: you are using a specific search engine which
        give you entire satisfaction, so why should you change?;


    •   Search engine patriotism: you are using this search engine in order to
        support your local technologies;




    •   Search engine convenience: the search engine is providing you all kind of
        services which made it very convenient to use or even made the other ones
        not convenient to use;



        Of course the trend for all search engine is to go for convenience because it
gives to customer everything they need. The main drawback is that for the search
engine companies you have then to dedicate less people to your core activity and
then there is a risk that your search activity will pay the price.



            CHARDONNEAU Ronan - European Master in Business Studies             29/66
So being search engine dependent means using massively a search engine for
one of those reasons and ignoring all the other ones. Search engines dependency
reach very high rate in Europe:




                        Illustration 8: Search engines figures for
                               France, Source: XitiMonitor

Most of the European countries have like France a strong addiction to Google with
more than 90%. What does it concretely mean? Almost all European when making
search on the Internet are fed by using the same way to process information.


3.3 Search engines dependency problems


       At the first sight when using a search engine we are not thinking about all the
issues which are coming out from them. We make our research and we get results
from this and then we try the results one after the other until finding the one which
fits the best our expectations.
       The first main problem is that when addicted to a specific search engine
which normally gave you satisfaction the day when the result will not be the one you
want you may think about different possibilities:
   •   The information is not displayed so the information you are looking for does
       not exist yet;
   •   The request was not good enough so let's try with other keywords;


       The main issue to highlight is that people are so confident with some search
engines that they will not normally look for alternatives or even consider that their

          CHARDONNEAU Ronan - European Master in Business Studies               30/66
favorite search engine can be wrong.


            People are so confident with their search engine that they are not
            thinking that the search engine can be wrong.


        3.3.1 Privacy issues


        I am a bit divided on this topic because I consider it as the mass public
« scarecrow » which is only good to animate polemical debates for nothing.
        It finds its explanations in the way that search engines are collecting
information.
                 When analyzing search engines we have to consider that it is a free
                product for all of us (in fact search engines get paid by displaying
                advertisement on each web page). Each time you are making a
                research on the Internet the search engine you are using registers the IP
number of your computer and of course the research you just made. All these data are
of course supposed to be confidential but some are used in order to make some
statistics such as how many Internet users from a specific country have visited this
website. It can be used for other purposes such as the rank of the most used research
and others data such as those. Of course the more information you give and the most
they collect so if you open an email account on a search engine for example they will
collect your name, address. Until now few are the cases where we got the proof that
information collected by search engines have been given to third parties. The most
famous one is the one of Yahoo in China which filtered some emails and gives the
names of some Chinese journalists who were denouncing things about the Chinese
government24.
        To make it clear until now no mass exploitation of data have been observed
and the recent news given by major search engines (Microsoft and Google) are
saying that the trend is to eliminate those data as much as possible in the fear of
losing confidentiality25. We however have to consider an additional element the more
search engine know about what we are looking for and the most they can fit our
expectations, so I personally do not think that reducing the collection of data is in
24 cf. Kahn, J. (2005): Yahoo helped Chinese to prosecute journalist.
25 cf. Boucq, I. (2009): Yahoo et vos données persos....

           CHARDONNEAU Ronan - European Master in Business Studies                 31/66
people interest and I will qualify the privacy issue as a global scarecrow in order to
bug the major search engines and putting on the first row some alternative ones
which on the long run may will not fix those issues.


        3.3.2 Looking for other search engines


        A recurrent question often asked is « Why people are not looking for other
search engines? ». Why do people knowledge regarding search engines is so low? If
you ask to someone how many search engines he may know he will for sure answer
you « Google » maybe « Yahoo » he may add « Microsoft » without telling you its
real name which is « Live Search » and it will for sure stops right here.
        Many others search engines are however present on the market. We should
then conclude that people are satisfied by the current search engines.
        However studies are showing that regarding Google people when getting their
results are only considering the first three results and not the others giving far more
importance to the first one.26 By just this observation we could then say that there are
things to improve in Google's interface. I guess again that people are looking for
something new even if it is not their priority. This is why I would like to raise another
issue which is search engine awareness.


        3.3.3 Search engine awareness


        Search engine awareness is for me the key issue of this all work and reveal
two parts:
    •   Poor search engine awareness regarding how to use a specific search engine;
    •   Poor search engine awareness regarding the existence of other search
        engines;
        Both parts are fundamental. The first one deals with what we call search
tools. It consists of a combination of keys in order to fit a specific request.
              If you are on Google typing the following request:
                         search engine dependency
              is different from:

26 Eye tools. (2009): Eyetools Eyetracking Research.

           CHARDONNEAU Ronan - European Master in Business Studies                 32/66
« search engine dependency »
        In the first case you will look for pages which are containing the words:
search engine dependency as well as all the others combinations such as pages
with search dependency, engine dependency, search, engine, dependency. Whereas
in the other cases you are restricting the pages which include only those three
words and in the same order.


        It exists dozens of those tools per search engine, the most famous ones are
called « boolean operators ». Some websites provide the list of those tools27. We have
also to consider that all the search engines are not using the same conventions so
some tools under search engine A will not work on search engine B.


                 If you are on Yahoo the following request:
                     •    title:search engine
                 will give you all the web pages which have for titles the following
                 keywords. This tool is unfortunately not working on Google.


This example shows well how search engines are in fact complementary in order to
make precise research.


             Is Google really better than Yahoo?
             To get a piece of the answer you can make this empirical experience I
             made a couple of months ago. I did the same request on three different
search engines (Google, Yahoo and MS Live Search) and compared the first five
results and gave my opinion on each of those results. If the result fit my
expectations I gave one point if it did not zero. Google got a four out of five, Yahoo
a two out of five and MS Live Search a zero out of five. However all results from a
search engine to another was different. So Google won for sure at the time I
performed the award of pertinence, however the information I was looking may
have been on the two results Yahoo gave me.
So is Google better than Yahoo for this example the answer is yes. Does it mean
that I should not consider Yahoo? Absolutely not.

27 cf. Koch, P. / Koch, S. (1999-2006): A short and easy search engine tutorial.

           CHARDONNEAU Ronan - European Master in Business Studies                 33/66
3.3.4 Other search engines existence awareness


“This part needs additional information and improvements and is then not finished
yet.”


        Here is another issue which is actually not hitting only the competitors of
major search engines but also themselves. In order to be recognized in the world of
search engines you need very strong guarantees. This year two of them succeeds their
advertisement campaign: « Cuil » by claiming that they had a database which is 6
times more than the one of Google. It said also that their success is coming from the
fact that the company has been built by a former Google employee but actually Cuil
disappears very soon as the change of their CEO. The main reproach which has been
done is that the results given were not numerous enough which of course is seen
from a very bad eye when you are claiming that it is your main strength.


           In the world of search engines you cannot be a clown. People give you
           one time the floor and will not give it to you back if you do not
           perform as they wish.


        The other example I can give is the one of Exalead, a French search engine
which is supported by the European Union, the purpose behind is to set up an
European technology in order to face the American ones. So as you can imagine
there is a big support and a huge project behind which make me coming to the bullet
point above if you are a search engine which wants just to be correctly advertised
you definitely need very strong supports.




              I today January the 7th 2009 got twice in a day the same observation
              about researching information on the Internet. I was looking for two
              different songs but I did not know the title as well as the singer, all I
              had was some couple of words. I made then a research on Google with
the words I knew from the song and added the word « lyrics » as well in order to


          CHARDONNEAU Ronan - European Master in Business Studies                34/66
specify that what I was looking for was a song. I as well added the quotes « » in
order to say to Google that I wanted the words in this strict order in order for it to
identify better the song I was looking for among the jungle of the web.
I wrote then: lyrics « freaky people » the answers I got where dealing with a singer
called « Michael Franti and Spearhead » and this on three pages in a row. I tried to
set a different request and the result was the same. Then I gave a try to GrabAll
which is a double search engine displaying Google and Yahoo on the same screen but
on two different columns. I made then the same request and got very surprise to see
the name of the Fat Boy Slim band on the first page of Yahoo. This result was on the
fourth page of Google instead. Here I have two points to make. Firstly the utility of
using a second search engine in order to control the result and secondly the fact
that Google was taking in account that I was 100% of what I was looking for.
Google was in fact giving me those results because the two words « freaky people »
are a part of a title of a song of « Michael Franti and Spearhead » but did I asked for
that? No I just asked for lyrics which have the words « freaky people ».
The second example I am going to give you is as well quite relevant of how search
engine dependency can be dangerous. I was looking this time for lyrics with « lyrics
wanna be your doll » what I did not know is that I had it wrong and it was not
« lyrics wanna be your doll » but « wanna be your dog ». The fact is that both songs
containing those lyrics exist and that Google was displaying me the songs containing
the words I just fill in whereas actually Yahoo was presenting me diverse results
including the one I wanted.
        I am not writing here that Yahoo is better than Google, I am just saying that
there is a high interest to have a look around in the world of search engines, I could
have a look around all night on Google looking for my song that I will have never
found because I had it wrong since the beginning. So looking for diversity should
be the first reflex when you cannot find what you are looking for in short
attempts.


        3.3.5 Less confident regarding other search engines


“This part needs additional information and improvements and is then not finished
yet.”


          CHARDONNEAU Ronan - European Master in Business Studies                35/66
This part is in fact in close link with the example I just gave above. Each
search engine has his own methodology when displaying the result. When you get
used to get the results in a specific way then when it comes to be different you may
think that the results are totally absurd. The problem is that the more you get addicted
the more you will have the feeling that other search engines are bad and then will not
give them a try.
        This is why I describe search engine dependency as the fact to be less and less
confident regarding other search engines.


        3.3.6 Even the best cannot provide you everything


“This part needs additional information and improvements and is then not finished
yet.”


        Some functionalities are not included in the biggest search engines which
make not you think that they could even exist such as search engines for jobs or
people. How to get a visual representation of what I am looking for? And the main
problem of the big search engines is that they are too big so even if they have what
you are looking for it is not sure that you can even see it.




          CHARDONNEAU Ronan - European Master in Business Studies                 36/66
Chapter 4: Risks of search engines dependency
       and its influence on data quality




    CHARDONNEAU Ronan - European Master in Business Studies   37/66
This part is dealing with the measurement and the proves of what I am
writing about. Here I expect to show how search engine dependency is affecting our
everyday life and how we could overcome this situation.


        4.1 The information has been found but is poor


“This part needs additional information and improvements and is then not finished
yet.”


        Let's imagine that you and I just performed a request under a specific search
engine and that the answers given do not satisfy you entirely. You found the
information but it is not developed enough, not signed, too old...


        4.2 What the search engines do not tell you


        One of my former English teacher taught me one day that there is different
ways of not giving information, one is to lie and one is to not say the information. I
don't think search engines are lying and cheating even if in the case of Baidu the
Chinese search engine there are high suspicions on it28. Some are saying that in order
to be well ranked you need to pay Baidu for it and it has been said as well that the
Chinese government as a big role to play when displaying the results.
        The recent Milk scandal in China (Baidu accepted to high ranked unlicensed
companies which were providing fake milk in exchange of money) showed how
dangerous can be a poor data quality search29.
        I however have the proof that search engines are not telling you everything
when displaying results and that this rule is general for all the major search engines.
        In Google case, search engines are censored according to the country in
which you are making your research from30 and it touches all the country around the
world even countries such as France and Germany.
        Most of the time those censorship acts are for your welfare or to protect
national security interest.

28 cf. China Tech News.com. (2007): CCTV: Baidu Search Engine Fraud Exposed?.
29 cf. China Daily. (2008): Baidu cuts revenue forecast on ad scandal.
30 cf. ROSEN, J. (2008): Google’s Gatekeepers.

          CHARDONNEAU Ronan - European Master in Business Studies                 38/66
4.3 The best way to get data quality


       Here is maybe the most interesting part of my work which consists in giving
the solution to the main issue I highlighted since the beginning.
       Until now I just raised questions regarding the risks of dependency and not
showed what you should do in order to explore the web properly.


       4.3.1 The sub-search engines


       As I said previously Google is too big, the bigger it is the more precise your
request have to be. Few people knowing the existence of Boolean operators it then
comes more and more difficult to get the right information. This is why in fact
Google put at your disposal some sub-search engines in order to make your research
easier, for example: Google books, Google videos, Google images. We all agree that
looking for pictures could be done on the research bar of Google, but it is far more
convenient to make it directly from « Google images » because it is displayed better.
       The main problem is what I described previously: « search engine awareness
within a search engine ». Am I aware that Google has the following services?
       I could sum up it all in a scheme:




                  Illustration 9: Some sub-search engines of Google


       For a Google search engine dependent everything starts from Google home
page. He is then aware here of those specialized search engines in order to make

          CHARDONNEAU Ronan - European Master in Business Studies               39/66
more precise research on the following information: images, maps, news and
products. It is however under his own initiative to discover what is going on under
those search engines and to discover them. I do not have a proof of what I am saying
but I may presume that in order to buy a product more people are going on Google,
type eBay or Amazon and go and buy on those websites rather than going on Google
products. Google products is however including all those websites in his database so
it should more convenient to have a look on Google products first in order to get the
best price rather than going individually on eBay and Amazon.
          We can then symbolize the Internet users awareness of Google search engines
according to this scheme.
          The more we get into deep of those levels and the least the Internet user is
aware of it.
          Level 2 is still available on the first page but with two clicks.
          Level 3 is no more available on the home page but can be accessible in three
clicks.
          Level 4 is even not accessible from Google main website and that you have
too be aware of it in order to access it.
          Level 5 is hypothetical and constitute the unknown Google projects, often
developed under another name.


             All those Google sub levels have been created in order to make easier
             research on a specific request. When looking for an image it is far
more convenient to pass through Google images rather than the general Google.


          4.3.2 The size of the Internet


               How big is the Internet? Is Google indexing all websites?
               It is really hard to answer to those questions but however possible to set
               up some estimations according to some information31. Those sources are
saying that in 2005 the size of the Internet is estimated to 5 million terabytes and
Google's index to 170 terabytes which would mean that Google is processing only
0,000034%. However the Internet is also containing what we call the invisible web

31 cf. Plesu, A. (2005): How Big Is the Internet?.

            CHARDONNEAU Ronan - European Master in Business Studies                40/66
composed of websites that owners do not want its content indexed as well as
websites which are protected by a password. In 2004 this invisible web was
estimated to be 500 times bigger than the visible web. It has been said as well that
Google is indexing invisible web only recently32.
        A clever calculation will then give us:
Internet size = Visible Web + Invisible Web;
Internet size = 501 * Visible Web;
Visible Web = Internet size / 501;
Visible Web = 5 000 000 / 501;
Visible Web = 9980 terabytes;
Google Index = 170 / Visible Web;
Google Index = 1,7%
        This estimation is of course only an estimation and could be full of errors. I
however find it more useful than no information at all. I would also emphasize the
fact that Google has no interest in indexing bad quality websites and that
technologies have evolved in the last few years and that this rate should be of course
far higher than those 1,7%. Whatever is the final result my point is the following:




            Google is not the Internet and is not processing all the web... but does
            all the web need to be indexed?


        4.3.3 Single search engine Internet coverage


“This part needs additional information and improvements and is then not finished
yet.”


        Here is a more optimistic representation of Google Internet coverage I made
in order to show that Google is not the Internet and that even within Google's sphere
Internet users cannot access to all the information:




32 cf. Chitu, A. (2008): Google Starts to Index the Invisible Web.

           CHARDONNEAU Ronan - European Master in Business Studies               41/66
Illustration 10: Google's Index coverage
Google dependent people are not only using Google when making research but as
well Google partners all symbolized by the sign:



Powered by Google means according to an IT company called Alacra33:
Alacra uses Google Search Appliances to create the Alacra Compliance Web. The
use of Google Search Appliances combines the power of Google search technology,
including the ability to find the highest quality and most relevant documents, with
Alacra's domain expertise in selecting those web sites and pages which are relevant
for AML compliance.
Which means that here you have a partnership working more or less in the same way
that using a Google specialized search engine.
33 Alacra. (2008?): What does quot;Powered By Googlequot; mean?.

          CHARDONNEAU Ronan - European Master in Business Studies            42/66
There are thousands and thousands search engines all specialized in a specific
field on the Internet which are using Google technology to search sites such as
Tourism, tutorials...
       By using all those specialized search engines you are crawling better the
Google's coverage space.




           Illustration 11: “Powered by Google” search engines coverage


       As you can see here on this configuration by using specialized search engines
powered by Google you will always browse Google's cyberspace and when we know
that Google is not the Internet we can then ask ourselves how we can get the best out
of Internet.




          CHARDONNEAU Ronan - European Master in Business Studies              43/66
On January the 11th I went on both websites http://www.aol.fr/ and
               http://www.google.fr/ and type the following request 'les moteurs de
               recherches' both results on the first page were identical. The only
               difference is that Google gave me 3,210,000 results and AOL 290,000
(9% of Google results) so as I developed above AOL is looking at the same place as
Google but is applying more filters. Is it really useful considering that AOL is a
general search engine? The answer maybe be given in their home page at
http://search.aol.com/aol/webhome « The AOL Search engine delivers great search
results, enhanced by Google, plus relevant multimedia results delivered on a single
page-so you can search less and discover more. »34
        If you go on http://www.google.com/coop/cse/ you will have a good example
of the search engine powered by Google. You can even create your own one. All is
done by using Google technology and you are just applying your own filter by listing
the websites where you want Google to look inside.




        4.3.4 Multiple search engine Internet coverage


“This part needs additional information and improvements and is then not finished
yet.”


        Let's go now deeper in our analysis by taking in account search engine which
are not using Google technologies. All developed their own technology and are
sometimes better than Google in some fields worst in the other. We will then have
something like that:




34 cf. Boswell, W. (2004?): AOL Search: How to Search with AOL Search.

          CHARDONNEAU Ronan - European Master in Business Studies             44/66
Illustration 12: Search engines coverage


        So here is my point the most the search engines are different and the most it
gives you the possibility to discover the web. Here I just put the main actors but we
have also to consider that it exists a lot of small search engines which developed their
own index and have then their own way to process data.
        Of course one could say that there is no interest for an European person to
process information on a Chinese or even Russian search engine because Chinese
search engine should of course be better in looking for Chinese information rather
than an European one. It is definitely true if we are speaking about contents such as
texts, but when it is dealing with pictures or video the reality should be totally
different.
        On the other hand the day where this European person is looking for Chinese


             CHARDONNEAU Ronan - European Master in Business Studies              45/66
information he should then be aware that he should use the Chinese search engine
rather than the Chinese version of the European one. And as we can imagine all the
other players: Yahoo, Baidu, Mail.ru have their own Boolean operators and their own
sub search engines. So it comes back to the idea of search engine awareness. The
more you know about them and the most you are increasing your knowledge about
how to get the best out of Internet.


        4.3.5 Others search engine Internet coverage


“This part needs additional information and improvements and is then not finished
yet.”


        You have a last category of search engine which are the one coming from the
semantic web concept which are not based on keywords requests. I found one of
those which is called « Who is like it » and gives as results websites similar to the
one you just gave him.




                    Illustration 13: “Who is like it” search engine


        Of course solutions have been found in order to create the perfect search
engine including all those technologies. But as you can imagine it is not an easy
thing if firstly search engines did not take the same standards as Boolean operators.
Secondly the more search engine you combine the more results you will get and it
then very complicated to decide which one is more pertinent than the other.


          CHARDONNEAU Ronan - European Master in Business Studies              46/66
A good example of it is the meta search engine called Dogpile which is quite
popular in the United States it combines the results of 4 big search engines which are
Google, Yahoo, MSN live search and Ask. The main problem is already showed on
the advanced search web page of Dogpile, few are the Boolean operators limited to
4.




                     Illustration 14: Dogpile meta search engine


       The second issue which is relevant is how to decide which results from those
search engines are the most relevant. I showed you previously that when making a
comparison between Google and Yahoo I found what I wanted on Yahoo on the 8th
position. So it is hard to decide which result is the most relevant and to this game you
are the only judge of the situation.
       So the hypothetical search engine should have the following characteristics:
including all the technologies ever created regarding search engines (Google's
knowledge+Yahoo knowledge+MSN knowledge+...) in order to define the most
powerful algorithm ever. All those companies have of course different objectives
than being the most rational search engine all are looking to be the number one. This
day will may come but it will not be for sure for tomorrow so waiting for it we
should understand how to use one by one all those technologies.


               The solution being then to test the search engines one by one but
           before testing all of them you have to know that they exist and how to

          CHARDONNEAU Ronan - European Master in Business Studies                 47/66
use them but you should at least to know that they exist.


       4.3.6 A concrete representation of the World Wide Web


       In order to make this work easier a Japanese company set up regularly a web
map of the most famous websites in the world by referencing them by categories this
map could of course be improved but should be a good start:




            Illustration 15: A representation of the most visited websites

       This map is available at the following address: http://informationarchitects.jp/
start/ with all the links included towards the websites it is composed of. It is very
interesting in order to break the search engine dependency phenomenon. On this map
are located all the most famous websites for 2008 you can then see all the most


         CHARDONNEAU Ronan - European Master in Business Studies                 48/66
influential websites and where to seek for information. For example if I want to look
for a video I may follow the gray line and test all the websites which are on it to find
the video I am looking for.


        In order to conclude this part which actually should not be the core of the all
thesis the Web is big and in order to discover it you need to know how to.
        We could compare it to the real world when living in country A you receive as
feedback from country B by different ways (people who are moving from country B
to A, the news, the books and documentations you have about country B) but you
will never be physically in country B and for this there are some information that you
could never get. Of course you can get nearer to those information by crawling more
and more the web with your country A search engine (it will be like documenting
yourself more and more about country B) but it will never be like being and living in
country B.


        4.4 The gap between search engine dependency and data quality


“This part needs additional information and improvements and is then not finished
yet.”


        In this part I will explain how to put in evidence the gap of information
between being search engine dependent of only one search engine and using the most
rational tool to search for information on the Internet.
It may not be easy to prove it concretely so I may need to prove it by making
empirical studies.
        I succeed to put in evidence so far that:
        Internet users are search engines addicted;
   •
        Internet users have few knowledge about search engines awareness;
   •
        I have now to prove that this is bad;
   •


        My first point will be to show that search engines are using different
technologies provide different results.
        Here is a comparison I made for three search engines through

          CHARDONNEAU Ronan - European Master in Business Studies                 49/66
http://www.thumbshots.org/Products/Thumbshots/Ranking.aspx which shows how
many similarities search engines have among them. Here I said if I type in
“Universität Kassel” what are the results that those search engines have in common
(in blue). And I moreover added the option to highlight me the website of the
university of Kassel (in red) which for me is a sign of relevancy of my request.




            Illustration 16: Comparison among Google, Yahoo and MSN
Here as we can see:
       Google has 7 similarities with Yahoo out of 60;
   •
       Google has 10 similarities with MSN out of 60;
   •
       Google found two times the website I was looking for;
   •
       Yahoo has 4 similarities with MSN out of 60;
   •
       Yahoo found one time the website I was looking for;
   •
       MSN found 4 times the website I was looking for;
   •

         CHARDONNEAU Ronan - European Master in Business Studies                   50/66
This analysis is then confirming what I was supposing before search engines do not
look for information at the same place.
Then one could ask about the pertinence of the results, is it worthwhile to display
more than one time on one page the website of the university of Kassel? Or is it a
sign of relevancy? So here we have an interpretation according to the Internet user.




           One thing is sure search engines using different technology provide
           different results.




         CHARDONNEAU Ronan - European Master in Business Studies                 51/66
Chapter 5: The Google example




CHARDONNEAU Ronan - European Master in Business Studies   52/66
“This part needs additional information and improvements and is then not finished
yet.”


         Google is for sure in some parts of the world the best example we can find of
the search engine dependency phenomenon I described.


         5.1 Google

         In January 1996 a 24 year-old PhD student called Larry Page studying at the
University of Stanford was looking for a theme for his thesis. Encouraged by his
supervisor he studied the following topic “exploring the mathematical properties of
the World Wide Web“ working in collaboration with another student called Sergey
Brin. To make it simple, it is from this work and collaboration which will came up
“Google Inc” (officially created in September the 7th 1998).
         Two months later Google is already included in the Top 100 of world
websites of PC magazine (a reference in the United States for computers).
Even if Google is formerly a web based application in English it is a worldwide
service available on the Internet for all. As his creator (Larry Page) said quot;Google's
search engine has always had strong global appealquot;35.
         Google is nowadays the most famous search engine. In ten years as the
Millward Brown report said36 Google will become the most powerful brand in the
world.


         5.2 Google's success


         The broadest explanation I found is the following « Google provides for free
a useful service that people actively seek out »37. And when you think about it it is
definitely true. People are going on Google because they are all looking for that kind
of services. But how can it be more successful than its fellows? Here I could say that
in general Google is giving better results which may be one reason, but moreover it

35 cf. Chardonneau, R. (2008): International Marketing: Google.
36 cf. Millward Brown. (2008): Top 100 most powerful brands 08.
37 Eternal Dreamer. (2008): Why Google is so Successful?.

           CHARDONNEAU Ronan - European Master in Business Studies              53/66
is providing added services such emails, blogs, news, Advertisement programs.
         Moreover Google's health as a company has been well preserved by making
good choices when making acquisitions and mergers, they internationally developed
themselves very well.


         5.3 Google dependency state


         If we take Europe we can see that it is definitely a Google dependent
continent:




                     Illustration 17: Google's domination in Europe


         As we can see here Google (in blue) is not only the most used search engine
in those countries it is in fact like the only search engine present at the continental
level.


         5.4 Google functions


         Google Boolean operators are numerous but who really use them when
performing a research on Internet? A simplified interface of them is available at
http://www.google.com/advanced_search?hl=eng but here also who really makes
research under it?



           CHARDONNEAU Ronan - European Master in Business Studies               54/66
Illustration 18: Google's Advanced search engine


             5.5 Google added functionalities


        Here is another part I did not developed so far. It deals with other
functionalities and services that the search engine do have but do not put sufficiently
in advance to the Internet user. So let's say that the advanced research which is
located on the home page is already not that much used you can then imagine how
much is used a service which is hidden.


             5.6 Google success is his weakness


        The fate of Google is linked also to the one of Search Engine Optimization
which is the ability to well index a website on search engines. The main issue is that
Google being the most well known search engine a lot of computer scientists tried to
understand how Google was working in order to index web pages. The more
experiments are made and the more the secret algorithm of Google is known. As we
can imagine few are the web masters who are interested to have a website in the
latest pages of Google. This is why Google results are not the most rational and are
not the reflect of our physical society.
        This observation is far more important if we consider that some studies38
showed that Internet users are considering only the first 3 results of Google when
making research on it.
38 cf. Enquiro. (2008): Eye Tracking Studies.

           CHARDONNEAU Ronan - European Master in Business Studies               55/66
Illustration 19: Eye Tracking on Google


             5.7 Google's disappearance hypothesis


       It is hard to believe that a company such as Google can close his gates one
day but everything can happen and moreover in the world of ICT, so let's imagine
what will be the consequences of his disappearance. I truly believe that all Internet
users will go for its followers Yahoo and Microsoft with exactly the same behaviors.
       It however said that the patent of Google algorithm belong to the Stanford
University and expires in 2011 what will then happen to Google at that time?


       On January 31th 2009 a good example of this hypothesis has been put in
evidence on a global scale. Due to a Google technician mistake during 50 minutes
Google search engine was displaying all the results websites as potentially
dangerous:




         CHARDONNEAU Ronan - European Master in Business Studies               56/66
According to French analysts this human error decreased the number of
Google visitors of more than 90%:




         CHARDONNEAU Ronan - European Master in Business Studies               57/66
The first conclusion we can set is that people are definitely trusting Google. If
Google say something is that it is supposed to be true.
        According to French analysts during that time Internet users did not went for
other search engines39. They probably have shut down their computers during that
time. However the day after all search engines competitors rose drastically their
market shares except for powered Google search engines40. It clearly shows that
Internet is stopping during a rather short period of time before that Internet users
goes for alternative solutions. They then go for the major alternative search engines.




39 http://www.pcworld.fr/actualite/google-panne-et-degats-collateraux/26021/
40 http://fr.news.yahoo.com/16/20090203/ttc-le-bug-de-google-revele-la-forte-dep-c2f7783.html

           CHARDONNEAU Ronan - European Master in Business Studies                         58/66
Conclusion

        The purpose of this intermediate report was for me to show my knowledge
about the topic, to prove that the problematic is pertinent, interesting and that there is
enough to write about it.
        I wanted to show as well that I have the ideas clear enough for the next
coming months. What I keep in mind from this work is that I like the research I am
doing, that I am very curious about the final result (how to make the difference when
looking for information on the Internet and that the gap between rational search of
information and mass search information is abyssal).
        I am quite conscious that a lot of work has to be done and that a lot of
questions have still no answers.
        However I would like as well to emphasize what I personally did so far and
what I learned:
        I know now how to structure a master thesis. I had until now no knowledge
    •
        about it and I learned from scratch how to set up an academical report;
        I studied since June 2008 the world of search engines and comes to know a
    •
        lot about the topic. Once graduated I would like to continue on the field of e-
        marketing either on a PhD level or in an international e-marketing company.
        This thesis is then making me more valuable to pretend to one of those two
        goals;
        Studying this field brought me additional ideas regarding the e-marketing
    •
        field. I wrote done an additional 20-pages report (not included in this
        intermediate report) about a hypothetical computer application which could
        give information about how to launch e-marketing campaigns on a global
        scale;


        February will be for sure the month which will decide how this master thesis
will be achieved. I have concretely more than 3 weeks to dedicate days and nights to
this project.
        The biggest issues I will have to think about are:
        Chapter 3.1.6: regarding semantic web I did not study yet in detail the topic
    •


          CHARDONNEAU Ronan - European Master in Business Studies                   59/66
and I will have to come back on this point later;
    Chapter 4: how to measure the gap of information between mass behavior
•
    when looking for information on the Internet and the most rational behavior?;
    Chapter 5: The Google example;
•
    Re writing the thesis from an informal point of view this time (using the third
•
    person of the singular instead of the first which does not sound very
    academic);




      CHARDONNEAU Ronan - European Master in Business Studies                60/66
Declaration

       I certify that this work has been done by myself and only myself. All the
sources used for its realization have been well indicated.




Ronan CHARDONNEAU
European Master in Business Studies
Institut de Management de l'Université de Savoie d'Annecy (FR)
Università degli studi di Trento (IT)
Universität Kassel (GER)
Universidad de León (SP)
22th January, 2009




 CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 61/66
List of literature

Alacra. (2008?): What does quot;Powered By Googlequot; mean? URL: http://www.alacra.com/compliance-
   search/faq.asp Last access: 23/01/2009
Alexa the Web information company. (2008): Top sites by country URL:
   http://www.alexa.com/site/ds/top_500 Last access: 23/01/2009
Anonymous. (2007): Baidu au Japon? URL: http://asia.2803.com/chine/baidu-au-japon/ Last access:
   23/01/2009
Arrington, M. (2008): Cuil On BusinessWeek's Most Successful of 2008 List. Huh? URL:
   http://www.washingtonpost.com/wp-dyn/content/article/2008/12/29/AR2008122901227.html Last
   access: 23/01/2009
Battelle, John. (2005): The birth of Google URL:
   http://www.wired.com/wired/archive/13.08/battelle.html?tw=wn_tophead_4 Last access:
   23/01/2009
Boswell, W. (2004?): AOL Search: How to Search with AOL Search URL:
   http://websearch.about.com/od/enginesanddirectories/a/aol_search_2.htm Last access: 23/01/2009
Boucq, I. (2009): Yahoo et vos données persos... URL:
   http://www.erenumerique.fr/yahoo_et_vos_donnees_persos_-news-15162.html Last access:
   23/01/2009
Chardonneau, R. (2008): International Marketing: Google URL:
   http://storage.canalblog.com/24/01/274197/31730758.pdf Last access: 23/01/2009
China Tech News. (2007): CCTV: Baidu Search Engine Fraud Exposed? URL:
   http://www.chinatechnews.com/2007/05/31/5459-cctv-baidu-search-engine-fraud-exposed/ Last
   access: 23/01/2009
China Daily. (2008): Baidu cuts revenue forecast on ad scandal URL:
   http://www.chinadaily.com.cn/bizchina/2008-12/14/content_7302788.htm Last access: 23/01/2009
Chitu, A. (2008): Google Starts to Index the Invisible Web URL:
   http://googlesystem.blogspot.com/2008/04/google-starts-to-index-invisible-web.html Last access:
   23/01/2009
Crédoc, (2008): La diffusion des technologies de l'information et de la communication dans la société
   française URL: www.arcep.fr/uploads/tx_gspublication/etude-credoc-2008-101208.pdf Last acess
   23/01/2009
Crepuq. (2003): Etude sur les connaissances en recherche documentaire des étudiants entrant au 1er
   cycle dans les universités québécoises URL:
   http://www.crepuq.qc.ca/documents/bibl/formation/etude.pdf Last access: 23/01/2009
EduDoc. (2008): Enquête sur les compétences documentaires et informationnelles des étudiants qui
   accèdent à l'enseignement supérieur en Communauté française de Belgique URL:
   http://www.edudoc.be/synthese.pdf Last access: 23/01/2009



 CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 62/66
Einhorn, B. (2007): Baidu Thinks It Can Play in Japan URL:
 http://www.businessweek.com/globalbiz/content/feb2007/gb20070215_649662.htm?
   chan=globalbiz_asia_technology Last access: 23/01/2009
Enquiro. (2008): Eye Tracking Studies URL:http://www.enquiroresearch.com/eyetracking-report.aspx
   Last access: 23/01/2009
Enquiro. (2004): Search Engine Usage in North America URL:
   http://pages.enquiroresearch.com/search-engine-usage-in-na.html?
   source=Search_Engine_Usage_In_North_America_whitepaper Last access: 19/01/2009
Eternal Dreamer. (2008): Why Google is so Successful? URL:
   http://crumja.wordpress.com/2008/05/20/why-google-is-so-successful/ Last access: 23/01/2009
Eye tools. (2009): Eyetools Eyetracking Research URL:
   http://www.eyetools.com/inpage/research_google_eyetracking_heatmap.htm Last access:
   23/01/2009
Grallet, G. (2009): Baidu, un autre Google s'éveille URL: http://www.lexpress.fr/actualite/high-
   tech/baidu-un-autre-google-s-eveille_734826.html Last access: 23/01/2009
Houste, F. (2009): Russie: Yandex sera le moteur de recherche par défaut de Firefox URL:
   http://www.search-engine-feng-shui.com/2009/01/russie-yandex-sera-le-moteur-de-recherche-par-
   defaut-de-firefox/ Last access: 23/01/2009
Internet World Stats (2008): Internet Usage and Population in North America URL:
   http://www.internetworldstats.com/stats14.htm Last access: 23/01/2009
Internet World Stats (2008): INTERNET USAGE STATISTICS
   The Internet Big Picture URL:http://www.internetworldstats.com/stats.htm Last access:
   23/01/2009
Internet World Stats. (2008): Internet Usage in Asia
   URL:http://www.internetworldstats.com/stats3.htm Last access: 23/01/2009
Kahn, J. (2005): Yahoo helped Chinese to prosecute journalist URL:
   http://www.iht.com/articles/2005/09/07/business/yahoo.php Last access: 23/01/2009
Kaplan, I. (2008): Bad Data Can Cost You Big Time URL:
   http://www.federationofcredit.com/base/document/Newsletter/IKaplanSept08.html Last
   access:23/01/2009
Koch, P. / Koch, S. (2009): How big is the Internet? URL: http://www.pandia.com/sew/383-
   websize.html. Last access: 19/01/2009
Koch, P. / Koch, S. (1999-2006): A short and easy search engine tutorial URL:
   http://www.pandia.com/goalgetter/index.html Last access: 19/01/2009
Leung, Hon-wing 梁漢榮. (2004): A study of computer science students' conceptions of information
   literacy and their experiences in information search process and use URL:
   http://sunzi.lib.hku.hk/hkuto/record/B29597730 Last access: 19/01/2009
Lipsman, A. (2007): 61 Billion Searches Conducted Worldwide in August/Google Ranks as Top
   Global Search Property URL: http://www.comscore.com/press/release.asp?press=1802 Last



 CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 63/66
Risks of Search Engine Dependency and its Influence on Data Quality
Risks of Search Engine Dependency and its Influence on Data Quality
Risks of Search Engine Dependency and its Influence on Data Quality

Contenu connexe

En vedette

Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1
Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1
Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1Ali Sohi
 
Connected Car Investment Thesis
Connected Car Investment ThesisConnected Car Investment Thesis
Connected Car Investment ThesisJames Harris
 
210496458 cfd-modelling-combustor
210496458 cfd-modelling-combustor210496458 cfd-modelling-combustor
210496458 cfd-modelling-combustormanojg1990
 
Review of related literature and studies
Review of related literature and studiesReview of related literature and studies
Review of related literature and studiesbantigui
 
Fluid mechanics
Fluid mechanicsFluid mechanics
Fluid mechanicspawanjot
 

En vedette (7)

Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1
Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1
Ali Hashemi Sohi -AVIMA10-Mater thesis-New aero engine concepts1
 
Engine Thesis
Engine ThesisEngine Thesis
Engine Thesis
 
Gas Turbine Thesis
Gas Turbine ThesisGas Turbine Thesis
Gas Turbine Thesis
 
Connected Car Investment Thesis
Connected Car Investment ThesisConnected Car Investment Thesis
Connected Car Investment Thesis
 
210496458 cfd-modelling-combustor
210496458 cfd-modelling-combustor210496458 cfd-modelling-combustor
210496458 cfd-modelling-combustor
 
Review of related literature and studies
Review of related literature and studiesReview of related literature and studies
Review of related literature and studies
 
Fluid mechanics
Fluid mechanicsFluid mechanics
Fluid mechanics
 

Similaire à Risks of Search Engine Dependency and its Influence on Data Quality

cv Fabio Vitaterna ENG EUR
cv Fabio Vitaterna ENG EURcv Fabio Vitaterna ENG EUR
cv Fabio Vitaterna ENG EURFabio Vitaterna
 
Benjamin Laporte 2016 resume
Benjamin Laporte 2016 resumeBenjamin Laporte 2016 resume
Benjamin Laporte 2016 resumeBenjamin Laporte
 
Ann Tina Bianchi_Letter & CV_07-2016
Ann Tina Bianchi_Letter & CV_07-2016Ann Tina Bianchi_Letter & CV_07-2016
Ann Tina Bianchi_Letter & CV_07-2016Tina Bianchi
 
Europass-CV-20150405-Nitu-EN
Europass-CV-20150405-Nitu-ENEuropass-CV-20150405-Nitu-EN
Europass-CV-20150405-Nitu-ENRamona Nitu
 
My Professional Profile
My Professional ProfileMy Professional Profile
My Professional ProfilePaulo Carmo
 
Roberto carmone english cv
Roberto carmone english cvRoberto carmone english cv
Roberto carmone english cvRoberto CARMONE
 
Roberto carmone english cv
Roberto carmone english cvRoberto carmone english cv
Roberto carmone english cvRoberto CARMONE
 
Roberto CARMONE english cv
Roberto CARMONE english cvRoberto CARMONE english cv
Roberto CARMONE english cvRoberto CARMONE
 
Resume Fernanda Paulo Ramos - Oct/2013
Resume Fernanda Paulo Ramos - Oct/2013Resume Fernanda Paulo Ramos - Oct/2013
Resume Fernanda Paulo Ramos - Oct/2013Fernanda Paulo Ramos
 
CV eng_CBechet 2016_no contact
CV eng_CBechet 2016_no contactCV eng_CBechet 2016_no contact
CV eng_CBechet 2016_no contactCEDRIC BECHET
 
Written In India Stc India Keynote Dec 2005
Written In India  Stc India Keynote Dec 2005Written In India  Stc India Keynote Dec 2005
Written In India Stc India Keynote Dec 2005Christine Troianello
 
Rita Araújo_EN2015
Rita Araújo_EN2015Rita Araújo_EN2015
Rita Araújo_EN2015Rita Araújo
 
CV-Europass-20161018-Marcelino-EN
CV-Europass-20161018-Marcelino-ENCV-Europass-20161018-Marcelino-EN
CV-Europass-20161018-Marcelino-ENFilipe Marcelino
 

Similaire à Risks of Search Engine Dependency and its Influence on Data Quality (20)

cv Fabio Vitaterna ENG EUR
cv Fabio Vitaterna ENG EURcv Fabio Vitaterna ENG EUR
cv Fabio Vitaterna ENG EUR
 
Curriculum Vitae in English
Curriculum Vitae in EnglishCurriculum Vitae in English
Curriculum Vitae in English
 
Migrate and Evolve with SAP ERP
Migrate and Evolve with SAP ERPMigrate and Evolve with SAP ERP
Migrate and Evolve with SAP ERP
 
Benjamin Laporte 2016 resume
Benjamin Laporte 2016 resumeBenjamin Laporte 2016 resume
Benjamin Laporte 2016 resume
 
Ann Tina Bianchi_Letter & CV_07-2016
Ann Tina Bianchi_Letter & CV_07-2016Ann Tina Bianchi_Letter & CV_07-2016
Ann Tina Bianchi_Letter & CV_07-2016
 
Europass-CV-20150405-Nitu-EN
Europass-CV-20150405-Nitu-ENEuropass-CV-20150405-Nitu-EN
Europass-CV-20150405-Nitu-EN
 
CV George DINCA
CV George DINCACV George DINCA
CV George DINCA
 
Resume shutima p_dataeng01
Resume shutima p_dataeng01Resume shutima p_dataeng01
Resume shutima p_dataeng01
 
My Professional Profile
My Professional ProfileMy Professional Profile
My Professional Profile
 
Roberto carmone english cv
Roberto carmone english cvRoberto carmone english cv
Roberto carmone english cv
 
Roberto carmone english cv
Roberto carmone english cvRoberto carmone english cv
Roberto carmone english cv
 
Roberto CARMONE english cv
Roberto CARMONE english cvRoberto CARMONE english cv
Roberto CARMONE english cv
 
Cv 2016 eng 1
Cv 2016 eng 1Cv 2016 eng 1
Cv 2016 eng 1
 
RABOUTET Ludovic - CV
RABOUTET Ludovic - CVRABOUTET Ludovic - CV
RABOUTET Ludovic - CV
 
Resume Fernanda Paulo Ramos - Oct/2013
Resume Fernanda Paulo Ramos - Oct/2013Resume Fernanda Paulo Ramos - Oct/2013
Resume Fernanda Paulo Ramos - Oct/2013
 
CV eng_CBechet 2016_no contact
CV eng_CBechet 2016_no contactCV eng_CBechet 2016_no contact
CV eng_CBechet 2016_no contact
 
Written In India Stc India Keynote Dec 2005
Written In India  Stc India Keynote Dec 2005Written In India  Stc India Keynote Dec 2005
Written In India Stc India Keynote Dec 2005
 
Rita Araújo_EN2015
Rita Araújo_EN2015Rita Araújo_EN2015
Rita Araújo_EN2015
 
CV-Europass-20161018-Marcelino-EN
CV-Europass-20161018-Marcelino-ENCV-Europass-20161018-Marcelino-EN
CV-Europass-20161018-Marcelino-EN
 
Louis forino
Louis forinoLouis forino
Louis forino
 

Plus de Nanor

Intermediate Report Presentation Example
Intermediate Report Presentation ExampleIntermediate Report Presentation Example
Intermediate Report Presentation ExampleNanor
 
Google Presentation Search Engine Dependency
Google Presentation Search Engine DependencyGoogle Presentation Search Engine Dependency
Google Presentation Search Engine DependencyNanor
 
Google Presentation Search Engine Dependency
Google Presentation Search Engine DependencyGoogle Presentation Search Engine Dependency
Google Presentation Search Engine DependencyNanor
 
Consumer Behaviour Google
Consumer Behaviour GoogleConsumer Behaviour Google
Consumer Behaviour GoogleNanor
 
Consumer Behavior 7 Elements
Consumer Behavior 7 ElementsConsumer Behavior 7 Elements
Consumer Behavior 7 ElementsNanor
 
International Marketing Google
International Marketing GoogleInternational Marketing Google
International Marketing GoogleNanor
 
International Marketing Google
International Marketing GoogleInternational Marketing Google
International Marketing GoogleNanor
 
Browser Wars: Internet Explorer versus Netscape
Browser Wars: Internet Explorer versus NetscapeBrowser Wars: Internet Explorer versus Netscape
Browser Wars: Internet Explorer versus NetscapeNanor
 
Browser Wars Internet Explorer versus Netscape
Browser Wars Internet Explorer versus NetscapeBrowser Wars Internet Explorer versus Netscape
Browser Wars Internet Explorer versus NetscapeNanor
 
Search Engine Dependency Conference
Search Engine Dependency ConferenceSearch Engine Dependency Conference
Search Engine Dependency ConferenceNanor
 
Singleuropeanact
SingleuropeanactSingleuropeanact
SingleuropeanactNanor
 
Distribution Policy Apple
Distribution Policy AppleDistribution Policy Apple
Distribution Policy AppleNanor
 
Microsoft Distribution Channels Presentation
Microsoft Distribution Channels PresentationMicrosoft Distribution Channels Presentation
Microsoft Distribution Channels PresentationNanor
 
Mistery Shopping Grimm Museum
Mistery Shopping Grimm MuseumMistery Shopping Grimm Museum
Mistery Shopping Grimm MuseumNanor
 
Google International marketing
Google International marketingGoogle International marketing
Google International marketingNanor
 
Baidu
BaiduBaidu
BaiduNanor
 

Plus de Nanor (16)

Intermediate Report Presentation Example
Intermediate Report Presentation ExampleIntermediate Report Presentation Example
Intermediate Report Presentation Example
 
Google Presentation Search Engine Dependency
Google Presentation Search Engine DependencyGoogle Presentation Search Engine Dependency
Google Presentation Search Engine Dependency
 
Google Presentation Search Engine Dependency
Google Presentation Search Engine DependencyGoogle Presentation Search Engine Dependency
Google Presentation Search Engine Dependency
 
Consumer Behaviour Google
Consumer Behaviour GoogleConsumer Behaviour Google
Consumer Behaviour Google
 
Consumer Behavior 7 Elements
Consumer Behavior 7 ElementsConsumer Behavior 7 Elements
Consumer Behavior 7 Elements
 
International Marketing Google
International Marketing GoogleInternational Marketing Google
International Marketing Google
 
International Marketing Google
International Marketing GoogleInternational Marketing Google
International Marketing Google
 
Browser Wars: Internet Explorer versus Netscape
Browser Wars: Internet Explorer versus NetscapeBrowser Wars: Internet Explorer versus Netscape
Browser Wars: Internet Explorer versus Netscape
 
Browser Wars Internet Explorer versus Netscape
Browser Wars Internet Explorer versus NetscapeBrowser Wars Internet Explorer versus Netscape
Browser Wars Internet Explorer versus Netscape
 
Search Engine Dependency Conference
Search Engine Dependency ConferenceSearch Engine Dependency Conference
Search Engine Dependency Conference
 
Singleuropeanact
SingleuropeanactSingleuropeanact
Singleuropeanact
 
Distribution Policy Apple
Distribution Policy AppleDistribution Policy Apple
Distribution Policy Apple
 
Microsoft Distribution Channels Presentation
Microsoft Distribution Channels PresentationMicrosoft Distribution Channels Presentation
Microsoft Distribution Channels Presentation
 
Mistery Shopping Grimm Museum
Mistery Shopping Grimm MuseumMistery Shopping Grimm Museum
Mistery Shopping Grimm Museum
 
Google International marketing
Google International marketingGoogle International marketing
Google International marketing
 
Baidu
BaiduBaidu
Baidu
 

Dernier

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Risks of Search Engine Dependency and its Influence on Data Quality

  • 1. Risks of search engine dependency and its influence on data quality Thesis intermediate report submitted for the European Master in Business Studies (EMBS) by Ronan CHARDONNEAU Institut de Management de l'Université de Savoie d'Annecy (FR) Università degli studi di Trento (IT) Universität Kassel (GER) Universidad de León (SP) Date of submission: 26th January, 2009 Master Thesis
  • 2. Disclaimer Before starting this report I inform any readers that this is an intermediate report of a final master thesis which will be finished in June 2009. The work within is the result of months of work since June 2008. To find what inspired me please have a look at my blog (in French) at: http://moteurs-de-recherches-alternatifs.blogspot.com/ The following intermediate report has not taken yet ( at the date of February the 13th 2009) in consideration the pieces of advice from my teachers. Even if there are not many corrections to do this work is then not perfect yet. I removed the acknowledgements part for the public version of the thesis. Lastly I would like to inform any readers of this work that I will normally be graduated in June 2009 and that from this date I will be actively looking for a job or a phd in the field of international e-marketing. I am very flexible and able to move at any place around the world. Please feel free to contact me though my blog by posting a comment or at ronanchardonneau@gmail.com. CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 2/66
  • 3. Ronan CHARDONNEAU Blog: http://moteurs-de-recherches- ronanchardonneau@gmail.com alternatifs.blogspot.com/ Professional experience Educational background □ 2007/2009 • Search Engine Optimizer(Eng-IT) European Master in Business Studies June- September 2008 – Search engine Quadruple degree in international optimizer in English and Italian management (Trento, Annecy, Kassel Edexon Venice - Italy Léon) • International salesman □ 2006/2007 April- august 2007 – Selling IT services A three-year degree in Import-Export on the local foreign market (IUT Quimper) Easteq Shanghai - China □ 2004-2006 Assistant of the director of operations □ A two-year degree in Business and April- June 2006 – Creation of a search Administration: Finance, Accountancy engine for a database and Computering Northwest Folklife Seattle - USA (IUT Nantes) Track and Field coach trainer □ • 2004 2006- 2008 Scientific A-level Mailman □ (Lycée Clemenceau Nantes) Summer 2004-2006 – (La Poste - France) Professional projects Foreign languages □ Market study Possibilities of entering the Italian market □ Trilingual: French, English, Italian for Easteq China Offshore Autonomous in those three languages □ Market study Research of distributors and competitors • Chinese in Germany for Thyssen Krupp Elevators Able to stand basic conversations □ Setting up a prototype computer application for the students □ German and Spanish registration system of the university of Elementary level (A2) Nantes Others Computer skills □ Extra scholars activities • Search Engine Optimization Author of several blogs dealing with trips 6 months of experience in this field and many articles about Web applications. □ Software Suite □ Centers of interests Competent in GIMP, Microsoft and Open Foreign languages, traveling (more than Office. Notions of Linux/Ubuntu 20 countries visited), computers and • Programming technology Notions ofCHARDONNEAU Ronan - European Master in Business Studies 2007/2009 3/66 PHP and XHTML
  • 4. Contents Foreword.......................................................................................................................6 Chapter 1: Introduction of the topic background..........................................................8 1.1 Relevance of the subject...................................................................................10 1.2 Major terms......................................................................................................11 1.3 Focus, goals and structure of the report...........................................................11 Chapter 2: Concept of data quality.............................................................................13 2.1 Data quality definition......................................................................................14 2.2 The importance of data quality.........................................................................15 Chapter 3: Search engines dependency.......................................................................16 3.1 Search engine market configuration.................................................................17 3.1.1 Search engine categories..........................................................................17 3.1.2 Search engine market...............................................................................19 3.1.3 The search engines in the world...............................................................19 3.1.4 The search engine market shares per country...........................................22 3.1.5 The search engines competition...............................................................23 3.1.6 The semantic web.....................................................................................24 3.2 Search engines dependency aspect...................................................................25 3.2.1 Search engines dependency proves..........................................................25 3.2.2 Search engines dependency aspect...........................................................27 3.3 Search engines dependency problems..............................................................28 3.3.1 Privacy issues...........................................................................................29 3.3.2 Looking for other search engines.............................................................30 3.3.3 Search engine awareness..........................................................................30 3.3.4 Other search engines existence awareness...............................................32 3.3.5 Less confident regarding other search engines.........................................33 3.3.5 Less confident regarding other search engines.........................................33 3.3.6 Even the best cannot provide you everything...........................................34 Chapter 4: Risks of search engines dependency and its influence on data quality.....35 4.1 The information has been found but is poor....................................................36 4.2 What the search engines do not tell you...........................................................36 4.3 The best way to get data quality.......................................................................37 CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 4/66
  • 5. 4.3.1 The sub-search engines.............................................................................37 4.3.2 The size of the Internet.............................................................................38 4.3.3 Single search engine Internet coverage....................................................39 4.3.4 Multiple search engine Internet coverage.................................................42 4.3.5 Others search engine Internet coverage....................................................44 4.3.6 A concrete representation of the World Wide Web...................................46 4.4 The gap between search engine dependency and data quality.........................47 Chapter 5: The Google example.................................................................................50 5.1 Google..............................................................................................................51 5.2 Google's success...............................................................................................51 5.3 Google dependency state..................................................................................52 5.4 Google functions..............................................................................................52 5.5 Google added functionalities............................................................................53 5.6 Google success is his weakness.......................................................................53 5.7 Google's disappearance hypothesis..................................................................54 Conclusion..................................................................................................................55 Declaration..................................................................................................................56 List of literature...........................................................................................................57 Afterword....................................................................................................................61 CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 5/66
  • 6. Table of figures Illustration 1: Total Sites Across All Domains August 1995 - January 2009................9 Illustration 2: A light interface: the homepage of the search engine www.ask.fr.......17 Illustration 3: Home page of the Mail.ru portal..........................................................18 Illustration 4: Top 10: Search websites in the world August 2007.............................19 Illustration 5: The most visited website by country....................................................20 Illustration 6: Search engine market shares in Czech Republic..................................22 Illustration 7: Results page of Cuil.............................................................................24 Illustration 8: Search engines figures for France, Source: XitiMonitor .....................28 Illustration 9: Some sub-search engines of Google....................................................37 Illustration 10: Google's Index coverage....................................................................40 Illustration 11: “Powered by Google” search engines coverage.................................41 Illustration 12: Search engines coverage....................................................................43 Illustration 13: “Who is like it” search engine............................................................44 Illustration 14: Dogpile meta search engine...............................................................45 Illustration 15: A representation of the most visited websites....................................46 Illustration 16: Comparison among Google, Yahoo and MSN...................................48 Illustration 17: Google's domination in Europe..........................................................52 Illustration 18: Google's Advanced search engine......................................................53 Illustration 19: Eye Tracking on Google.....................................................................54 CHARDONNEAU Ronan - European Master in Business Studies 6/66
  • 7. Foreword As most of the students who has a computer one of my first move when I wake up is to switch on the computer and to spend my first twenty minutes of the day on the Internet. From there I have a look at the last news, I check my e-mails and eventually exchange some few words with a couple of friends by using online chat applications. I also check my other email account as well as my blogs and analyze the traffic I got during the last few days, to finish this process I consult my advertisement account to see if I got some revenues. I often use as well search engine to look for information which just came up into my mind during the night. In the paragraph you just read was the description of my morning routine on Internet. There is nothing special except that most of the moves I described above are in fact done on two to three major search engines: Google, Yahoo and Microsoft. I hardly ever use Yahoo or Microsoft for search purpose but Google is for sure the website I visit the most to crawl the web but... is Google the Internet? I got the idea to write about: « Risks of search engine dependency and its influence on data quality » not because I was using all those Google applications everyday and was scared about what will happen if I get in troubles with Google such as privacy issues or if Google just closed. I just write about it because one day I found Google results not accurate enough. And from this observation a lot of questions came to my mind: • Is it me who is not good enough at performing research on the Internet? • Is it because no one wrote about the information I am looking for? • Is it because the information is not on the first pages in Google that I have to browse all the pages in order to find it? • Is it because Google is not good enough? • Is it because the information is hidden in some other documents such as PDF, pictures, videos? • Is it because I have to use another way to crawl the web and if yes how? You see here how a simple observation can raise a lot of questions. I hesitated a lot about writing on this topic, the main problem I got was that I was not convinced that there is a potential risk of being search engine dependent. The CHARDONNEAU Ronan - European Master in Business Studies 7/66
  • 8. reason is that companies such as Google are working hard in order to fit Internet users expectations and the vision we get is that they are doing a wonderful work. The problem is that there could be a difference between perception and real facts and this is exactly what I am eager to discover here. Can we measure how huge is the gap between the information we were looking for and the one of search engines as Google are providing us? Search engines are set up to find information on the Internet, information being the basis of any good decisions making we can then understand how important and interesting it is to write on this topic. I hope you will appreciate this reading as much as I did when making my research. Ronan CHARDONNEAU CHARDONNEAU Ronan - European Master in Business Studies 8/66
  • 9. Chapter 1: Introduction of the topic background CHARDONNEAU Ronan - European Master in Business Studies 9/66
  • 10. I will not surprise you if I say that Internet has been created to share information and to communicate with each others. It is hard to evaluate how big is the Internet, estimations among companies are very different, it varies from 15 to some 30 billion Web pages1. The number of websites is increasing everyday and estimated at 185,167,8972 with a constant augmentation since the creation of the world wide web. Illustration 1: Total Sites Across All Domains August 1995 - January 2009 Habits have changed since the creation of the Internet and websites are used now in diverse manners if it comes to be a standard for companies (recognized as a mark of trust, seriousness and quality) it is also a space for many individuals (blog phenomenon). As an example regarding France, in June 2008 14% of French people above 12 year-old which means 22% of French Internet users are authors of a blog or a website3. The banalization of the Internet and the fact that anyone can create his own website for free increase the feeling we have regarding the Internet: a true jungle of information and even sometimes real “dump” regarding information accuracy. 1 cf. Koch, P. / Koch, S. (2009): How big is the Internet?. 2 Netcraft, (2009): January 2009 Web Server Survey. 3 Crédoc, (2008): La diffusion des technologies de l'information et de la communication dans la société française, p.120. CHARDONNEAU Ronan - European Master in Business Studies 10/66
  • 11. Websites can be accessible through three channels: • Direct access (for example you know the website address by heart, you put it in your favorites or you find a website on a business card and you are typing it in the address bar); • External links (you access to a website which has the link of another website, this is the case in most of websites, catalogs, advertisement); • Through Search Engines (you use a dedicated application by typing in some keywords in order to get suggestions of what you are looking for); As you can see from this list if you use only the first two ways to crawl the web it comes to be too rigid and not wide enough. It has been said as well that the first way is disappearing more and more in profit of search engines4. So one could say that there is currently two main ways to crawl the web, from link to link and by using search engine. This last one being indispensable in order to crawl the web properly. More and more information are put on the Internet which makes it come a true jungle. The only way to crawl those information properly is to use search engines. 1.1 Relevance of the subject Internet is becoming more and more our information provider. quot;In 2002 a study from the U.S. Department of Commerce estimated that 36% of the American public over the age of 3 used the Internet to search for product and service information in 2001. This usage represented a substantial increase in information search behavior from 2000, when 26% of the American public reported using the information.quot;5 Internet to search for product and service Between 2000 and 2008 the USA got an increase of Internet users of 130,9 %6 we can then imagine how important is searching information through Internet 4 cf. Ohayon, O. (2008): Google, moteur de recherche ou moteur de navigation?. 5 Peterson, Robert A. / Merino, Maria C. (2003):Consumer Information Search Behavior and the Internet, p.6. 6 Internet World Stats (2008): Internet Usage and Population in North America. CHARDONNEAU Ronan - European Master in Business Studies 11/66
  • 12. nowadays. The number of Internet users is estimated to 1,463,632,361 (world population 6,676,120,288) with a growth rate from 2000-2008 fixed at 305.5 %7. 1.2 Major terms In this thesis you will hear a lot about the following terms: search engines, search engine dependency and data quality. Search engine is the most flexible technology which has been created in order to crawl the web. A search engine is no more than a web application which is processing data. A search engine does not create data it just process some information it has in his index. I describe search engine dependency as the fact that one does not make the choice between one search engine and another. Search engine dependency is very relevant in most of the countries. Most of the people are using only one single search engine when performing requests on the web. I will speak as well about bad habits when dealing with search engines. For example you can be dependent of using a search engine but using it badly. Data quality is the quality of data. Data are of high quality quot;if they are fit for their intended uses in operations, decision making and planning quot;. Alternatively, the data are deemed of high quality if they correctly represent the real-world construct to which they refer. These two views can often be in disagreement, even about the same set of data used for the same purpose8. 1.3 Focus, goals and structure of the report The focus of this work is to show if there are some risks of being search engine dependent and if there are some, is the gap of information significant between a search engine to another? Goals are numerous: • to put in evidence how bad it is to be dependent of an unique search engine; 7 Internet World Stats (2008): INTERNET USAGE STATISTICS The Internet Big Picture. 8 Kaplan, I. (2008): Bad Data Can Cost You Big Time. CHARDONNEAU Ronan - European Master in Business Studies 12/66
  • 13. to show how important it is to have reliable information; • to show how big is the gap between a general search engine and a specialized one; • to look for alternatives if tomorrow the main search engines disappear; • to show the weaknesses of general search engines; • to show and discover other search engines and put in evidence their usefulness; • to show that general search engines are not well used; • to discover the future expectations people have regarding search engines and what are the coming technologies in this field; The structure of the report should be done as follow: The first idea is to introduce the concept of data quality. We all go on search engines in order to find information whatever it is or at least to find the answer to one of our question (how much does a Paris-Berlin train ticket costs? What is the weather in New York? What is the last result of our favorite soccer team?). The second point is about Data quality which is very interesting in order to understand why search engines exist. The next point is dealing with the world of search engines and the dependency which is outcome from them. Analyzing the world of search engines is fundamental to understand how Internet is not as rational as we could think (search engines may be not the Internet, search engines may be different from a country to another). Then we will have a look on figures which show quite clearly that people are not using several search engines when making research on the Internet but an unique one that I will define as the dependency concept. Once this definition given I will focus on the heart of this work which are the risks that this dependency provide and its influence on data quality. Google being in Europe the most used search engine I will use it as a major example in my last part. CHARDONNEAU Ronan - European Master in Business Studies 13/66
  • 14. Chapter 2: Concept of data quality CHARDONNEAU Ronan - European Master in Business Studies 14/66
  • 15. When I firstly decided to write about the concept of data quality for search engine I did not mean at the beginning that data has to be perfect. I just meant that when I write a request on any search engine I expect to have as results some answers which fit to my expectations. The last example I have in mind is when on the 31th of December a friend of mine looked for the « average height of Korean people » (request were made on Google) and all the results on the page were dealing about the « average size of the sexual organs of Korean people ». I have high doubts that no information regarding the average height of Korean people does not exist on the web it however has been what Google seemed to tell me. Here as you can see in this example it raises a lot of issues regarding data quality. My first expectation regarding data quality was then to get some results which fit to my request. But looking it deeper what is the value of a supposed correct answer that I could have found written by someone like you and I (not specialized in this topic). In fact here everything depends on how professional the data as been written, this is what I am explaining in this chapter. 2.1 Data quality definition “This part needs additional information and improvements and is then not finished yet.” Data quality is defined by four criteria: • Accuracy: it means that the information has to be true so based on real facts. Here we see the importance of having the source of the document. • Timely: the data given has to be dated. The most striking example could be the one with stock exchange, what is the value of a data regarding a currency change without the date? • Meaningful: here it comes back to what I was explaining in the introduction. Does the result fits with my request? What is the color of a frog? The answer is green, is it useful? Yes but is it complete?; • Complete: an information can be for sure useful but will be far more useful if CHARDONNEAU Ronan - European Master in Business Studies 15/66
  • 16. it is complete. Here is then the minimum vital of data quality. 2.2 The importance of data quality “This part needs additional information and improvements and is then not finished yet.” As I mentioned it there are two levels of data quality: • Poor quality one: for example you and I are looking for information whatever it is in order to get a quick answer or even a mere comment. Here two theories are facing each other: An information should be given even if it is possibly wrong. It of ▪ course can be useless if the information is a hoax but will it be in the case of a warning of a terrorist attack? ▪ An information should be given only if it is 100% accurate. Here it is interesting in order to not create polemical situations for nothing. There are no rational choice to make here, sometimes you need to use the first theory and sometimes the other one. I would describe poor quality data as mass consumption information which mean good enough for “lambda” people but not that useful for businesses and even less for researchers. With the increasing of Internet users more and more mass consumption data are created and of course their number is bypassing the one of quality ones. • High quality one: here we find the data which fits the four criteria I described above. High quality data are useful in order to take complex and rational decisions. The Wikipedia Issue: As you may know Wikipedia is an Open Source encyclopedia where CHARDONNEAU Ronan - European Master in Business Studies 16/66
  • 17. everyone can write what they want on a specific subject. His success made in come the biggest encyclopedia of all time. Such a success is recognized as a mark of trust and seriousness for search engines which is characterized by being displayed most of the time in the 5th first result for any request. With most of the time an appearance on the first result. The main issue even if recently it has been added the possibility to add the author name most of the articles are anonymous. It is also possible to write down an article without mentioning any source of information. This is then explained to the reader before starting the article but everything is then dependending on reader's behaviour. Which quality of information does he or she want to have. CHARDONNEAU Ronan - European Master in Business Studies 17/66
  • 18. Chapter 3: Search engines dependency CHARDONNEAU Ronan - European Master in Business Studies 18/66
  • 19. In order to understand well this part we should have firstly a look at the search engine market configuration. Even if for a lot of people search engine is a synonym of Google it may be not true for some other people. I made a strong effort during this work to make it as global as possible. Europe is good as well as the United States but we will never got the all answers of our issues if we always consider these two parts of the world. 3.1 Search engine market configuration 3.1.1 Search engine categories The world of search engines is not as uniformed as we could have think. I until now identified three kind of search engines: • Standard: the most well known search engines such as www.google.com, www.livesearch.com (Microsoft), Ask (http://www.ask.com/?o=312) they are looking for any kind of information through the Internet and are characterized by a very light interface: Illustration 2: A light interface: the homepage of the search engine www.ask.fr • Portals: may be the most complex search engines to analyze on a figures CHARDONNEAU Ronan - European Master in Business Studies 19/66
  • 20. point of view. Portals are characterized by a lot of information on their home page including the search engine function. It is then difficult to make the difference here between the success of the portal and the success of the search engine on this web page. The best example I found is the one of mail.ru which is the most visited website in Russia far before Google. It seems when looking at the search engine statistics that the research made on Mail.ru are not accounted or that no one is using the search function of Mail.ru. So it is sometimes hard to measure the success of some search engines. The most well known portal is Yahoo. Illustration 3: Home page of the Mail.ru portal • Specialized search engines: they to my point of view belong to a subcategory of the first group. The major problem of standard search engines is that they are too big. Specialized search engines are then no more than a special function which is working as a filter. It is then far more easier to find the information you are looking for on a specific website rather than using standard ones. A good example of it can be the one of Ebay, it is of course far more convenient to go on the Ebay website and use the search engine directly from there rather than going on Google and writing a request such as 'Ebay buying socks'. I mentioned it again but in most of the cases specialized search engines are not a revolution they are just one part of standard search engine and are not a new technology by themselves. CHARDONNEAU Ronan - European Master in Business Studies 20/66
  • 21. 3.1.2 Search engine market The search engine market is segmented by an enormous leader: Google, a follower who is far from him: Yahoo and a large amounts of small search engines. The dominant position of Google may stay for years and years and only a technical revolution could really jeopardized him. Illustration 4: Top 10: Search websites in the world August 2007 As you can see here I identified what we could call « The big four » which represents the four major search engines. Then comes three specialized search engines (7,49% all together) but their success are limited to the size of the website they are browsing. Then came in last positions what I would called the dead champions which once have been great and well known search engines but which are nowadays in decline and will one day probably disappear. 3.1.3 The search engines in the world The more I study the E-world and the more I realize that the web is exactly the reflect of our society but in a non-physical aspect. Internet did not break cultural differences and search engines really show it. As we just saw Google represents 6 research out of 10 in the world but does it mean that each country in the world has a population of 60% Google users? To answer this question let's have a look at the following map: CHARDONNEAU Ronan - European Master in Business Studies 21/66
  • 22. Illustration 5: The most visited website by country Source9 As we can see here the world is not covered entirely by Google. We clearly have some Google countries, Yahoo countries, Mail.ru countries and so on and so fourth. We can however notice some striking information such as almost all the American continent is using Google as well as Europe, Northern Africa and Southern Africa, Australia, India. In one word almost all countries which have strong links with the Anglo-Saxon culture. Then comes what I would qualified as a cultural wall which is starting in Eastern Europe and which is finishing in Russia. Here are the ex-soviet countries. I unfortunately have no concrete proves of what I am saying but I suppose that there is a kind of a « boycott of American technologies » and support of Russian technologies. The recent partnership between Yandex (main search engine in Russia) and the browser Firefox raised those suspicions10. Russia is not the only country in this situation, China also. The recent advertisement broadcast by Baidu (the leader search engine in China) shows clearly 9 Alexa the Web information company (2008). 10 cf. Houste, F. (2009): Russie : Yandex sera le moteur de recherche par défaut de Firefox. cf. Schwartz, B. (2009): Firefox Drops Google For Yandex In Russia, But Big Loser May Be Rambler. CHARDONNEAU Ronan - European Master in Business Studies 22/66
  • 23. the will that Chinese public institutions are ready to protect their territory.11 As you can see Asia is the region where Google is the least present. It is also the continent where are gathering a lot of different search engines. I may not emphasize the diversity of the Caribbean area as well as Center Africa which are areas where Internet is not that well implemented which mean then that the battle to take the lead is not finished yet. For example is it really relevant to say that Yahoo is the leader in Cameroon which is a country with less than 500,000 Internet users? I however will highlight that the Pacific area which is containing all the « Tigers » (Taiwan, Thailand, Singapore...) are all in red: Yahoo. The search engine world is then divided in two parts: • The Google world: which is composed of all the Anglo-Saxon countries as well as countries which have strong links with the United States or Great Britain. It is clearly showed regarding India and Australia. We can at the date of today still identify two countries which are not under Google control: Czech Republic and Iceland but actually by looking at the figures and the forecasts it is just a matter of time12. • The Asian – Pacific world: Asia is composed of a lot of countries and of course a lot of cultures. Among them we can identify four players: ◦ Mail.ru which is dominating all the ex-soviet countries; ◦ Baidu which has a total control over China; ◦ Naver, a 100% South Korean product which is the best example that search engines work by culture; ◦ Yahoo which is leader in all the “Tigers” Asian countries. Yahoo being an American technology such as Google, how is it possible that Yahoo is so successful in the Pacific area and not elsewhere? The reason I found is that Yahoo is a shiny portal and that Asian culture on 11 cf. Einhorn, B. (2007): Baidu Thinks It Can Play in Japan. cf. (2007): Baidu au Japon?. cf. Grallet, G. (2009): Baidu, un autre Google s'éveille. 12 cf. Rafat, A. (2008): Czech Portal Seznam Could Fetch $900 Million; Google, Apax, Warburg and Others in Fray. cf. Mar Hauksson, K. (2007): Global search report 2007, p8. CHARDONNEAU Ronan - European Master in Business Studies 23/66
  • 24. Internet recognize a quality website to the number of animations on it 13. I will then add that Japan is a strong pole of Internet with one of the highest rate of Internet integration in the world per capita14. This is why I think Yahoo is so popular in this region. 3.1.4 The search engine market shares per country One of the most complete work I found on this topic after mine is the « Global Search Report 2007 »15. What stroke me the most in all the countries studied in this report as well as in all the report and research I made until now are the search engines market configuration which looks like very often to this: Illustration 6: Search engine market shares in Czech Republic It is very rare to find a country where there is a close competition among search engines. Even if in the High Technology world things change from a day to another you have often the following configuration where the first search engine is leading the game by more than 30 points on its followers. This is typical from the search engines market or you are adopted by a population or you are not. This trend seems quite relevant in the software industry, people seem to look for a standard used by all. This is the case for the Operating System industry, the browser industry, the e-learning industry. The 13 cf. Tobin, R. / Hotchkiss, G. / Lee, P. (2008): Chinese Search Engine Engagement, p28. 14 Internet World Stats. (2008): Internet Usage in Asia. 15 cf. Wilsdon, N. (2007): Global Search Report 2007. CHARDONNEAU Ronan - European Master in Business Studies 24/66
  • 25. explanation I found for the success of search engine within a population is the word to mouth, this is how Google has been so successful isn 't it? How never heard sentences such as « you just have to Google it » Google is even nowadays in dictionaries as a verb16. As a conclusion I would say that in the world of search engines you are first or you are nothing. I have to mention as well that the market has a lot of small local search engines which are if original enough bought by the big ones or if not will disappear quickly (some example are coming in the news every month). The only key of the success on the short term seem to be advertisement but on the long run you need the technology behind in order to compete. 3.1.5 The search engines competition Google has been created in 1998 and at that time search engines were already in place, it did not scared Google and one after the other Google bypassed all of them. In fact among the big four Yahoo is the oldest (1994) and Microsoft the youngest (2003). Even if the battle seems to be finished it will take a lot of time to Google to be the number one in all countries (everything being linked to culture rather than rationality) which in fact is giving hope to its followers. At the time I am writing this thesis discussions are still on the way between Yahoo and Microsoft in order for Microsoft to buy Yahoo search technologies. We can understand how strategic a such acquisition could be. Yahoo having the research knowledge and Microsoft the funds as well as the software ownership. Regarding Baidu we cannot clearly see how they could compete against Google outside of China. As I said previously specialized search engines are limited to the website they are linked to. We could then think about new comers who starting from nothing could beat famous search engines in a small period of time, it could have been the success of some products such as Cuil launched in summer 2008 which received a lot of advertisement through the news17. But search engines is a very ungrateful world 16 cf. Merriam Webster. (2001): Google - Definition from the Merriam-Webster Online Dictionary. 17 cf. Arrington, M. (2008): Cuil On BusinessWeek's Most Successful of 2008 List. Huh?. CHARDONNEAU Ronan - European Master in Business Studies 25/66
  • 26. where visitors are giving no more than one chance: the product works or it does not. Illustration 7: Results page of Cuil This is a point that I discovered very quickly and that you can test by yourself. People want the information as soon as they can. They are ready to test the product but in a certain amount of tries. When you move from Google to another search engine you are often intransigent. At the first result which does not fit your expectations you will go back to Google. But is the search engine wrong or is it because it is responding differently that on what you were used to? In order to conclude this part I would say that with the search engine history we have and the search engine market configuration, I cannot see how Google could lose its position. Until now only one company succeeds to make a such gap in the world of search engine and it is Google itself and it was in a period where everything had to be created on Internet. So I would say that on this field I don't see how Google can be beaten and even worried. A new technology regarding research is however more and more recurrent in this field and is called semantic research. 3.1.6 The semantic web “This part needs additional information and improvements and is then not finished yet.” CHARDONNEAU Ronan - European Master in Business Studies 26/66
  • 27. The semantic web is another way of crawling the net. We all know how to make a search on the Internet isn't it? We just type in some keywords and press the return key in order to get the answer. In this configuration you have to feed the search engine with the request. With semantic Web the concept is a bit different and based on suggesting you the request instead of typing it entirely. Each time that you are starting to type your request a list of suggestions are coming to you. We are recently seeing more and more this technology on the biggest search engines. The purpose is in fact to guide you as best as they can in order to put you on the right track and trying to avoid you to reach the labyrinth of the web. This technology fits one of the main drawback of search engines and that I call « search engine technology awareness » which consists in how to write good requests for search engines. The main drawback of the semantic web is that this is a very new technology which then have a lot to do before reaching his maturity point. Here we are speaking about a maximum length of a decade. We can also complain about the rigidity of the system but it is true that with length and experience this issue could be fixed. It said that Ask is one of the search engine which based a lot of R&D on this new technology but according to me and without being a technician I think that Google can have better results because of its huge database of requests. Future will tell us what is going to happen. 3.2 Search engines dependency aspect As I mentioned it in the introduction I define search engine dependency as the fact that people are swearing only by one search engine when looking for information on the Internet. 3.2.1 Search engines dependency proves “This part needs additional information and improvements and is then not finished yet.” CHARDONNEAU Ronan - European Master in Business Studies 27/66
  • 28. This is not the studies which are lacking on this topic. When looking for information regarding information literacy on the Internet you arrive on different sources of studies and this regarding all the countries of the world. Information literacy is a relevant topic and an issue. I focused on some very recent and francophone research that I found on the Internet regarding Canadian students18, French19 and Belgium students20. I also found information regarding Germany on this topic. Many sources are as well saying that such research have been made in mostly all Europe, China (Hong Kong)21 and the United States22. All the studies I found until now (all done on students panels so literate people) are all saying the same thing: search engine are the first source of information when looking on the Internet and all students seem to have receive not enough training on how to look for information on the Internet. The best study I found on this topic is one made on all the registered PhD students (2,218 with an answer rate of 23,4%) last year (2008) on a whole region of France (not a high technological developed country but far to be the least on a worldwide scale)23. As we can imagine PhD students have a high requirements regarding quality of information. The study shows that 67,5% of the respondents have never received a training regarding how to look for information during their whole stay at the university which could explain the fact that people are running toward search engines directly. Search engines are used in 96% of the cases when performing research (which emphasize the necessity of how to well use those technologies). 94% of them do not use blogs which I take as a good thing (even if the survey is saying the opposite, blogs being written by professionals as well). The most used search engines are Google (85%) and Google Scholar 37% 18 cf. Crepuq. (2003): Etude sur les connaissances en recherche documentaire des étudiants entrant au 1er cycle dans les universités québécoises. 19 cf. Université de Lyon. (2007): De la documentation au plagiat. 20 cf. EduDoc. (2008): Enquête sur les compétences documentaires et informationnelles des étudiants qui accèdent à l'enseignement supérieur en Communauté française de Belgique. 21 cf. Leung, Hon-wing 梁漢榮. (2004): A study of computer science students' conceptions of information literacy and their experiences in information search process and use. 22 cf. Enquiro. (2004): Search Engine Usage in North America. 23 cf. URFIST de Rennes. (2008): ENQUÊTE SUR LES BESOINS DE FORMATION DES DOCTORANTS Á LA MAÎTRISE DE L’INFORMATION SCIENTIFIQUE dans les Ecoles doctorales de Bretagne. CHARDONNEAU Ronan - European Master in Business Studies 28/66
  • 29. (which is a sub search engine of Google). 60% of them do not know what is a meta search engine and only 5% of them use them. 46 % do not know the search engine of their field and only 20% do use them. Those figures are very interesting because they show clearly how people are not adapted to the technology they are using. PhD students should be some of the most search engines awarded people and it seems that for France they are not. They are strictly dependent of a single search engine which is here Google. They know very few of his sub search engine and as written above they do not know how to use the technology. They are also not aware massively about other search engines. 3.2.2 Search engines dependency aspect Search engines dependency can however comes from different ways: • Search engine satisfaction: you are using a specific search engine which give you entire satisfaction, so why should you change?; • Search engine patriotism: you are using this search engine in order to support your local technologies; • Search engine convenience: the search engine is providing you all kind of services which made it very convenient to use or even made the other ones not convenient to use; Of course the trend for all search engine is to go for convenience because it gives to customer everything they need. The main drawback is that for the search engine companies you have then to dedicate less people to your core activity and then there is a risk that your search activity will pay the price. CHARDONNEAU Ronan - European Master in Business Studies 29/66
  • 30. So being search engine dependent means using massively a search engine for one of those reasons and ignoring all the other ones. Search engines dependency reach very high rate in Europe: Illustration 8: Search engines figures for France, Source: XitiMonitor Most of the European countries have like France a strong addiction to Google with more than 90%. What does it concretely mean? Almost all European when making search on the Internet are fed by using the same way to process information. 3.3 Search engines dependency problems At the first sight when using a search engine we are not thinking about all the issues which are coming out from them. We make our research and we get results from this and then we try the results one after the other until finding the one which fits the best our expectations. The first main problem is that when addicted to a specific search engine which normally gave you satisfaction the day when the result will not be the one you want you may think about different possibilities: • The information is not displayed so the information you are looking for does not exist yet; • The request was not good enough so let's try with other keywords; The main issue to highlight is that people are so confident with some search engines that they will not normally look for alternatives or even consider that their CHARDONNEAU Ronan - European Master in Business Studies 30/66
  • 31. favorite search engine can be wrong. People are so confident with their search engine that they are not thinking that the search engine can be wrong. 3.3.1 Privacy issues I am a bit divided on this topic because I consider it as the mass public « scarecrow » which is only good to animate polemical debates for nothing. It finds its explanations in the way that search engines are collecting information. When analyzing search engines we have to consider that it is a free product for all of us (in fact search engines get paid by displaying advertisement on each web page). Each time you are making a research on the Internet the search engine you are using registers the IP number of your computer and of course the research you just made. All these data are of course supposed to be confidential but some are used in order to make some statistics such as how many Internet users from a specific country have visited this website. It can be used for other purposes such as the rank of the most used research and others data such as those. Of course the more information you give and the most they collect so if you open an email account on a search engine for example they will collect your name, address. Until now few are the cases where we got the proof that information collected by search engines have been given to third parties. The most famous one is the one of Yahoo in China which filtered some emails and gives the names of some Chinese journalists who were denouncing things about the Chinese government24. To make it clear until now no mass exploitation of data have been observed and the recent news given by major search engines (Microsoft and Google) are saying that the trend is to eliminate those data as much as possible in the fear of losing confidentiality25. We however have to consider an additional element the more search engine know about what we are looking for and the most they can fit our expectations, so I personally do not think that reducing the collection of data is in 24 cf. Kahn, J. (2005): Yahoo helped Chinese to prosecute journalist. 25 cf. Boucq, I. (2009): Yahoo et vos données persos.... CHARDONNEAU Ronan - European Master in Business Studies 31/66
  • 32. people interest and I will qualify the privacy issue as a global scarecrow in order to bug the major search engines and putting on the first row some alternative ones which on the long run may will not fix those issues. 3.3.2 Looking for other search engines A recurrent question often asked is « Why people are not looking for other search engines? ». Why do people knowledge regarding search engines is so low? If you ask to someone how many search engines he may know he will for sure answer you « Google » maybe « Yahoo » he may add « Microsoft » without telling you its real name which is « Live Search » and it will for sure stops right here. Many others search engines are however present on the market. We should then conclude that people are satisfied by the current search engines. However studies are showing that regarding Google people when getting their results are only considering the first three results and not the others giving far more importance to the first one.26 By just this observation we could then say that there are things to improve in Google's interface. I guess again that people are looking for something new even if it is not their priority. This is why I would like to raise another issue which is search engine awareness. 3.3.3 Search engine awareness Search engine awareness is for me the key issue of this all work and reveal two parts: • Poor search engine awareness regarding how to use a specific search engine; • Poor search engine awareness regarding the existence of other search engines; Both parts are fundamental. The first one deals with what we call search tools. It consists of a combination of keys in order to fit a specific request. If you are on Google typing the following request: search engine dependency is different from: 26 Eye tools. (2009): Eyetools Eyetracking Research. CHARDONNEAU Ronan - European Master in Business Studies 32/66
  • 33. « search engine dependency » In the first case you will look for pages which are containing the words: search engine dependency as well as all the others combinations such as pages with search dependency, engine dependency, search, engine, dependency. Whereas in the other cases you are restricting the pages which include only those three words and in the same order. It exists dozens of those tools per search engine, the most famous ones are called « boolean operators ». Some websites provide the list of those tools27. We have also to consider that all the search engines are not using the same conventions so some tools under search engine A will not work on search engine B. If you are on Yahoo the following request: • title:search engine will give you all the web pages which have for titles the following keywords. This tool is unfortunately not working on Google. This example shows well how search engines are in fact complementary in order to make precise research. Is Google really better than Yahoo? To get a piece of the answer you can make this empirical experience I made a couple of months ago. I did the same request on three different search engines (Google, Yahoo and MS Live Search) and compared the first five results and gave my opinion on each of those results. If the result fit my expectations I gave one point if it did not zero. Google got a four out of five, Yahoo a two out of five and MS Live Search a zero out of five. However all results from a search engine to another was different. So Google won for sure at the time I performed the award of pertinence, however the information I was looking may have been on the two results Yahoo gave me. So is Google better than Yahoo for this example the answer is yes. Does it mean that I should not consider Yahoo? Absolutely not. 27 cf. Koch, P. / Koch, S. (1999-2006): A short and easy search engine tutorial. CHARDONNEAU Ronan - European Master in Business Studies 33/66
  • 34. 3.3.4 Other search engines existence awareness “This part needs additional information and improvements and is then not finished yet.” Here is another issue which is actually not hitting only the competitors of major search engines but also themselves. In order to be recognized in the world of search engines you need very strong guarantees. This year two of them succeeds their advertisement campaign: « Cuil » by claiming that they had a database which is 6 times more than the one of Google. It said also that their success is coming from the fact that the company has been built by a former Google employee but actually Cuil disappears very soon as the change of their CEO. The main reproach which has been done is that the results given were not numerous enough which of course is seen from a very bad eye when you are claiming that it is your main strength. In the world of search engines you cannot be a clown. People give you one time the floor and will not give it to you back if you do not perform as they wish. The other example I can give is the one of Exalead, a French search engine which is supported by the European Union, the purpose behind is to set up an European technology in order to face the American ones. So as you can imagine there is a big support and a huge project behind which make me coming to the bullet point above if you are a search engine which wants just to be correctly advertised you definitely need very strong supports. I today January the 7th 2009 got twice in a day the same observation about researching information on the Internet. I was looking for two different songs but I did not know the title as well as the singer, all I had was some couple of words. I made then a research on Google with the words I knew from the song and added the word « lyrics » as well in order to CHARDONNEAU Ronan - European Master in Business Studies 34/66
  • 35. specify that what I was looking for was a song. I as well added the quotes « » in order to say to Google that I wanted the words in this strict order in order for it to identify better the song I was looking for among the jungle of the web. I wrote then: lyrics « freaky people » the answers I got where dealing with a singer called « Michael Franti and Spearhead » and this on three pages in a row. I tried to set a different request and the result was the same. Then I gave a try to GrabAll which is a double search engine displaying Google and Yahoo on the same screen but on two different columns. I made then the same request and got very surprise to see the name of the Fat Boy Slim band on the first page of Yahoo. This result was on the fourth page of Google instead. Here I have two points to make. Firstly the utility of using a second search engine in order to control the result and secondly the fact that Google was taking in account that I was 100% of what I was looking for. Google was in fact giving me those results because the two words « freaky people » are a part of a title of a song of « Michael Franti and Spearhead » but did I asked for that? No I just asked for lyrics which have the words « freaky people ». The second example I am going to give you is as well quite relevant of how search engine dependency can be dangerous. I was looking this time for lyrics with « lyrics wanna be your doll » what I did not know is that I had it wrong and it was not « lyrics wanna be your doll » but « wanna be your dog ». The fact is that both songs containing those lyrics exist and that Google was displaying me the songs containing the words I just fill in whereas actually Yahoo was presenting me diverse results including the one I wanted. I am not writing here that Yahoo is better than Google, I am just saying that there is a high interest to have a look around in the world of search engines, I could have a look around all night on Google looking for my song that I will have never found because I had it wrong since the beginning. So looking for diversity should be the first reflex when you cannot find what you are looking for in short attempts. 3.3.5 Less confident regarding other search engines “This part needs additional information and improvements and is then not finished yet.” CHARDONNEAU Ronan - European Master in Business Studies 35/66
  • 36. This part is in fact in close link with the example I just gave above. Each search engine has his own methodology when displaying the result. When you get used to get the results in a specific way then when it comes to be different you may think that the results are totally absurd. The problem is that the more you get addicted the more you will have the feeling that other search engines are bad and then will not give them a try. This is why I describe search engine dependency as the fact to be less and less confident regarding other search engines. 3.3.6 Even the best cannot provide you everything “This part needs additional information and improvements and is then not finished yet.” Some functionalities are not included in the biggest search engines which make not you think that they could even exist such as search engines for jobs or people. How to get a visual representation of what I am looking for? And the main problem of the big search engines is that they are too big so even if they have what you are looking for it is not sure that you can even see it. CHARDONNEAU Ronan - European Master in Business Studies 36/66
  • 37. Chapter 4: Risks of search engines dependency and its influence on data quality CHARDONNEAU Ronan - European Master in Business Studies 37/66
  • 38. This part is dealing with the measurement and the proves of what I am writing about. Here I expect to show how search engine dependency is affecting our everyday life and how we could overcome this situation. 4.1 The information has been found but is poor “This part needs additional information and improvements and is then not finished yet.” Let's imagine that you and I just performed a request under a specific search engine and that the answers given do not satisfy you entirely. You found the information but it is not developed enough, not signed, too old... 4.2 What the search engines do not tell you One of my former English teacher taught me one day that there is different ways of not giving information, one is to lie and one is to not say the information. I don't think search engines are lying and cheating even if in the case of Baidu the Chinese search engine there are high suspicions on it28. Some are saying that in order to be well ranked you need to pay Baidu for it and it has been said as well that the Chinese government as a big role to play when displaying the results. The recent Milk scandal in China (Baidu accepted to high ranked unlicensed companies which were providing fake milk in exchange of money) showed how dangerous can be a poor data quality search29. I however have the proof that search engines are not telling you everything when displaying results and that this rule is general for all the major search engines. In Google case, search engines are censored according to the country in which you are making your research from30 and it touches all the country around the world even countries such as France and Germany. Most of the time those censorship acts are for your welfare or to protect national security interest. 28 cf. China Tech News.com. (2007): CCTV: Baidu Search Engine Fraud Exposed?. 29 cf. China Daily. (2008): Baidu cuts revenue forecast on ad scandal. 30 cf. ROSEN, J. (2008): Google’s Gatekeepers. CHARDONNEAU Ronan - European Master in Business Studies 38/66
  • 39. 4.3 The best way to get data quality Here is maybe the most interesting part of my work which consists in giving the solution to the main issue I highlighted since the beginning. Until now I just raised questions regarding the risks of dependency and not showed what you should do in order to explore the web properly. 4.3.1 The sub-search engines As I said previously Google is too big, the bigger it is the more precise your request have to be. Few people knowing the existence of Boolean operators it then comes more and more difficult to get the right information. This is why in fact Google put at your disposal some sub-search engines in order to make your research easier, for example: Google books, Google videos, Google images. We all agree that looking for pictures could be done on the research bar of Google, but it is far more convenient to make it directly from « Google images » because it is displayed better. The main problem is what I described previously: « search engine awareness within a search engine ». Am I aware that Google has the following services? I could sum up it all in a scheme: Illustration 9: Some sub-search engines of Google For a Google search engine dependent everything starts from Google home page. He is then aware here of those specialized search engines in order to make CHARDONNEAU Ronan - European Master in Business Studies 39/66
  • 40. more precise research on the following information: images, maps, news and products. It is however under his own initiative to discover what is going on under those search engines and to discover them. I do not have a proof of what I am saying but I may presume that in order to buy a product more people are going on Google, type eBay or Amazon and go and buy on those websites rather than going on Google products. Google products is however including all those websites in his database so it should more convenient to have a look on Google products first in order to get the best price rather than going individually on eBay and Amazon. We can then symbolize the Internet users awareness of Google search engines according to this scheme. The more we get into deep of those levels and the least the Internet user is aware of it. Level 2 is still available on the first page but with two clicks. Level 3 is no more available on the home page but can be accessible in three clicks. Level 4 is even not accessible from Google main website and that you have too be aware of it in order to access it. Level 5 is hypothetical and constitute the unknown Google projects, often developed under another name. All those Google sub levels have been created in order to make easier research on a specific request. When looking for an image it is far more convenient to pass through Google images rather than the general Google. 4.3.2 The size of the Internet How big is the Internet? Is Google indexing all websites? It is really hard to answer to those questions but however possible to set up some estimations according to some information31. Those sources are saying that in 2005 the size of the Internet is estimated to 5 million terabytes and Google's index to 170 terabytes which would mean that Google is processing only 0,000034%. However the Internet is also containing what we call the invisible web 31 cf. Plesu, A. (2005): How Big Is the Internet?. CHARDONNEAU Ronan - European Master in Business Studies 40/66
  • 41. composed of websites that owners do not want its content indexed as well as websites which are protected by a password. In 2004 this invisible web was estimated to be 500 times bigger than the visible web. It has been said as well that Google is indexing invisible web only recently32. A clever calculation will then give us: Internet size = Visible Web + Invisible Web; Internet size = 501 * Visible Web; Visible Web = Internet size / 501; Visible Web = 5 000 000 / 501; Visible Web = 9980 terabytes; Google Index = 170 / Visible Web; Google Index = 1,7% This estimation is of course only an estimation and could be full of errors. I however find it more useful than no information at all. I would also emphasize the fact that Google has no interest in indexing bad quality websites and that technologies have evolved in the last few years and that this rate should be of course far higher than those 1,7%. Whatever is the final result my point is the following: Google is not the Internet and is not processing all the web... but does all the web need to be indexed? 4.3.3 Single search engine Internet coverage “This part needs additional information and improvements and is then not finished yet.” Here is a more optimistic representation of Google Internet coverage I made in order to show that Google is not the Internet and that even within Google's sphere Internet users cannot access to all the information: 32 cf. Chitu, A. (2008): Google Starts to Index the Invisible Web. CHARDONNEAU Ronan - European Master in Business Studies 41/66
  • 42. Illustration 10: Google's Index coverage Google dependent people are not only using Google when making research but as well Google partners all symbolized by the sign: Powered by Google means according to an IT company called Alacra33: Alacra uses Google Search Appliances to create the Alacra Compliance Web. The use of Google Search Appliances combines the power of Google search technology, including the ability to find the highest quality and most relevant documents, with Alacra's domain expertise in selecting those web sites and pages which are relevant for AML compliance. Which means that here you have a partnership working more or less in the same way that using a Google specialized search engine. 33 Alacra. (2008?): What does quot;Powered By Googlequot; mean?. CHARDONNEAU Ronan - European Master in Business Studies 42/66
  • 43. There are thousands and thousands search engines all specialized in a specific field on the Internet which are using Google technology to search sites such as Tourism, tutorials... By using all those specialized search engines you are crawling better the Google's coverage space. Illustration 11: “Powered by Google” search engines coverage As you can see here on this configuration by using specialized search engines powered by Google you will always browse Google's cyberspace and when we know that Google is not the Internet we can then ask ourselves how we can get the best out of Internet. CHARDONNEAU Ronan - European Master in Business Studies 43/66
  • 44. On January the 11th I went on both websites http://www.aol.fr/ and http://www.google.fr/ and type the following request 'les moteurs de recherches' both results on the first page were identical. The only difference is that Google gave me 3,210,000 results and AOL 290,000 (9% of Google results) so as I developed above AOL is looking at the same place as Google but is applying more filters. Is it really useful considering that AOL is a general search engine? The answer maybe be given in their home page at http://search.aol.com/aol/webhome « The AOL Search engine delivers great search results, enhanced by Google, plus relevant multimedia results delivered on a single page-so you can search less and discover more. »34 If you go on http://www.google.com/coop/cse/ you will have a good example of the search engine powered by Google. You can even create your own one. All is done by using Google technology and you are just applying your own filter by listing the websites where you want Google to look inside. 4.3.4 Multiple search engine Internet coverage “This part needs additional information and improvements and is then not finished yet.” Let's go now deeper in our analysis by taking in account search engine which are not using Google technologies. All developed their own technology and are sometimes better than Google in some fields worst in the other. We will then have something like that: 34 cf. Boswell, W. (2004?): AOL Search: How to Search with AOL Search. CHARDONNEAU Ronan - European Master in Business Studies 44/66
  • 45. Illustration 12: Search engines coverage So here is my point the most the search engines are different and the most it gives you the possibility to discover the web. Here I just put the main actors but we have also to consider that it exists a lot of small search engines which developed their own index and have then their own way to process data. Of course one could say that there is no interest for an European person to process information on a Chinese or even Russian search engine because Chinese search engine should of course be better in looking for Chinese information rather than an European one. It is definitely true if we are speaking about contents such as texts, but when it is dealing with pictures or video the reality should be totally different. On the other hand the day where this European person is looking for Chinese CHARDONNEAU Ronan - European Master in Business Studies 45/66
  • 46. information he should then be aware that he should use the Chinese search engine rather than the Chinese version of the European one. And as we can imagine all the other players: Yahoo, Baidu, Mail.ru have their own Boolean operators and their own sub search engines. So it comes back to the idea of search engine awareness. The more you know about them and the most you are increasing your knowledge about how to get the best out of Internet. 4.3.5 Others search engine Internet coverage “This part needs additional information and improvements and is then not finished yet.” You have a last category of search engine which are the one coming from the semantic web concept which are not based on keywords requests. I found one of those which is called « Who is like it » and gives as results websites similar to the one you just gave him. Illustration 13: “Who is like it” search engine Of course solutions have been found in order to create the perfect search engine including all those technologies. But as you can imagine it is not an easy thing if firstly search engines did not take the same standards as Boolean operators. Secondly the more search engine you combine the more results you will get and it then very complicated to decide which one is more pertinent than the other. CHARDONNEAU Ronan - European Master in Business Studies 46/66
  • 47. A good example of it is the meta search engine called Dogpile which is quite popular in the United States it combines the results of 4 big search engines which are Google, Yahoo, MSN live search and Ask. The main problem is already showed on the advanced search web page of Dogpile, few are the Boolean operators limited to 4. Illustration 14: Dogpile meta search engine The second issue which is relevant is how to decide which results from those search engines are the most relevant. I showed you previously that when making a comparison between Google and Yahoo I found what I wanted on Yahoo on the 8th position. So it is hard to decide which result is the most relevant and to this game you are the only judge of the situation. So the hypothetical search engine should have the following characteristics: including all the technologies ever created regarding search engines (Google's knowledge+Yahoo knowledge+MSN knowledge+...) in order to define the most powerful algorithm ever. All those companies have of course different objectives than being the most rational search engine all are looking to be the number one. This day will may come but it will not be for sure for tomorrow so waiting for it we should understand how to use one by one all those technologies. The solution being then to test the search engines one by one but before testing all of them you have to know that they exist and how to CHARDONNEAU Ronan - European Master in Business Studies 47/66
  • 48. use them but you should at least to know that they exist. 4.3.6 A concrete representation of the World Wide Web In order to make this work easier a Japanese company set up regularly a web map of the most famous websites in the world by referencing them by categories this map could of course be improved but should be a good start: Illustration 15: A representation of the most visited websites This map is available at the following address: http://informationarchitects.jp/ start/ with all the links included towards the websites it is composed of. It is very interesting in order to break the search engine dependency phenomenon. On this map are located all the most famous websites for 2008 you can then see all the most CHARDONNEAU Ronan - European Master in Business Studies 48/66
  • 49. influential websites and where to seek for information. For example if I want to look for a video I may follow the gray line and test all the websites which are on it to find the video I am looking for. In order to conclude this part which actually should not be the core of the all thesis the Web is big and in order to discover it you need to know how to. We could compare it to the real world when living in country A you receive as feedback from country B by different ways (people who are moving from country B to A, the news, the books and documentations you have about country B) but you will never be physically in country B and for this there are some information that you could never get. Of course you can get nearer to those information by crawling more and more the web with your country A search engine (it will be like documenting yourself more and more about country B) but it will never be like being and living in country B. 4.4 The gap between search engine dependency and data quality “This part needs additional information and improvements and is then not finished yet.” In this part I will explain how to put in evidence the gap of information between being search engine dependent of only one search engine and using the most rational tool to search for information on the Internet. It may not be easy to prove it concretely so I may need to prove it by making empirical studies. I succeed to put in evidence so far that: Internet users are search engines addicted; • Internet users have few knowledge about search engines awareness; • I have now to prove that this is bad; • My first point will be to show that search engines are using different technologies provide different results. Here is a comparison I made for three search engines through CHARDONNEAU Ronan - European Master in Business Studies 49/66
  • 50. http://www.thumbshots.org/Products/Thumbshots/Ranking.aspx which shows how many similarities search engines have among them. Here I said if I type in “Universität Kassel” what are the results that those search engines have in common (in blue). And I moreover added the option to highlight me the website of the university of Kassel (in red) which for me is a sign of relevancy of my request. Illustration 16: Comparison among Google, Yahoo and MSN Here as we can see: Google has 7 similarities with Yahoo out of 60; • Google has 10 similarities with MSN out of 60; • Google found two times the website I was looking for; • Yahoo has 4 similarities with MSN out of 60; • Yahoo found one time the website I was looking for; • MSN found 4 times the website I was looking for; • CHARDONNEAU Ronan - European Master in Business Studies 50/66
  • 51. This analysis is then confirming what I was supposing before search engines do not look for information at the same place. Then one could ask about the pertinence of the results, is it worthwhile to display more than one time on one page the website of the university of Kassel? Or is it a sign of relevancy? So here we have an interpretation according to the Internet user. One thing is sure search engines using different technology provide different results. CHARDONNEAU Ronan - European Master in Business Studies 51/66
  • 52. Chapter 5: The Google example CHARDONNEAU Ronan - European Master in Business Studies 52/66
  • 53. “This part needs additional information and improvements and is then not finished yet.” Google is for sure in some parts of the world the best example we can find of the search engine dependency phenomenon I described. 5.1 Google In January 1996 a 24 year-old PhD student called Larry Page studying at the University of Stanford was looking for a theme for his thesis. Encouraged by his supervisor he studied the following topic “exploring the mathematical properties of the World Wide Web“ working in collaboration with another student called Sergey Brin. To make it simple, it is from this work and collaboration which will came up “Google Inc” (officially created in September the 7th 1998). Two months later Google is already included in the Top 100 of world websites of PC magazine (a reference in the United States for computers). Even if Google is formerly a web based application in English it is a worldwide service available on the Internet for all. As his creator (Larry Page) said quot;Google's search engine has always had strong global appealquot;35. Google is nowadays the most famous search engine. In ten years as the Millward Brown report said36 Google will become the most powerful brand in the world. 5.2 Google's success The broadest explanation I found is the following « Google provides for free a useful service that people actively seek out »37. And when you think about it it is definitely true. People are going on Google because they are all looking for that kind of services. But how can it be more successful than its fellows? Here I could say that in general Google is giving better results which may be one reason, but moreover it 35 cf. Chardonneau, R. (2008): International Marketing: Google. 36 cf. Millward Brown. (2008): Top 100 most powerful brands 08. 37 Eternal Dreamer. (2008): Why Google is so Successful?. CHARDONNEAU Ronan - European Master in Business Studies 53/66
  • 54. is providing added services such emails, blogs, news, Advertisement programs. Moreover Google's health as a company has been well preserved by making good choices when making acquisitions and mergers, they internationally developed themselves very well. 5.3 Google dependency state If we take Europe we can see that it is definitely a Google dependent continent: Illustration 17: Google's domination in Europe As we can see here Google (in blue) is not only the most used search engine in those countries it is in fact like the only search engine present at the continental level. 5.4 Google functions Google Boolean operators are numerous but who really use them when performing a research on Internet? A simplified interface of them is available at http://www.google.com/advanced_search?hl=eng but here also who really makes research under it? CHARDONNEAU Ronan - European Master in Business Studies 54/66
  • 55. Illustration 18: Google's Advanced search engine 5.5 Google added functionalities Here is another part I did not developed so far. It deals with other functionalities and services that the search engine do have but do not put sufficiently in advance to the Internet user. So let's say that the advanced research which is located on the home page is already not that much used you can then imagine how much is used a service which is hidden. 5.6 Google success is his weakness The fate of Google is linked also to the one of Search Engine Optimization which is the ability to well index a website on search engines. The main issue is that Google being the most well known search engine a lot of computer scientists tried to understand how Google was working in order to index web pages. The more experiments are made and the more the secret algorithm of Google is known. As we can imagine few are the web masters who are interested to have a website in the latest pages of Google. This is why Google results are not the most rational and are not the reflect of our physical society. This observation is far more important if we consider that some studies38 showed that Internet users are considering only the first 3 results of Google when making research on it. 38 cf. Enquiro. (2008): Eye Tracking Studies. CHARDONNEAU Ronan - European Master in Business Studies 55/66
  • 56. Illustration 19: Eye Tracking on Google 5.7 Google's disappearance hypothesis It is hard to believe that a company such as Google can close his gates one day but everything can happen and moreover in the world of ICT, so let's imagine what will be the consequences of his disappearance. I truly believe that all Internet users will go for its followers Yahoo and Microsoft with exactly the same behaviors. It however said that the patent of Google algorithm belong to the Stanford University and expires in 2011 what will then happen to Google at that time? On January 31th 2009 a good example of this hypothesis has been put in evidence on a global scale. Due to a Google technician mistake during 50 minutes Google search engine was displaying all the results websites as potentially dangerous: CHARDONNEAU Ronan - European Master in Business Studies 56/66
  • 57. According to French analysts this human error decreased the number of Google visitors of more than 90%: CHARDONNEAU Ronan - European Master in Business Studies 57/66
  • 58. The first conclusion we can set is that people are definitely trusting Google. If Google say something is that it is supposed to be true. According to French analysts during that time Internet users did not went for other search engines39. They probably have shut down their computers during that time. However the day after all search engines competitors rose drastically their market shares except for powered Google search engines40. It clearly shows that Internet is stopping during a rather short period of time before that Internet users goes for alternative solutions. They then go for the major alternative search engines. 39 http://www.pcworld.fr/actualite/google-panne-et-degats-collateraux/26021/ 40 http://fr.news.yahoo.com/16/20090203/ttc-le-bug-de-google-revele-la-forte-dep-c2f7783.html CHARDONNEAU Ronan - European Master in Business Studies 58/66
  • 59. Conclusion The purpose of this intermediate report was for me to show my knowledge about the topic, to prove that the problematic is pertinent, interesting and that there is enough to write about it. I wanted to show as well that I have the ideas clear enough for the next coming months. What I keep in mind from this work is that I like the research I am doing, that I am very curious about the final result (how to make the difference when looking for information on the Internet and that the gap between rational search of information and mass search information is abyssal). I am quite conscious that a lot of work has to be done and that a lot of questions have still no answers. However I would like as well to emphasize what I personally did so far and what I learned: I know now how to structure a master thesis. I had until now no knowledge • about it and I learned from scratch how to set up an academical report; I studied since June 2008 the world of search engines and comes to know a • lot about the topic. Once graduated I would like to continue on the field of e- marketing either on a PhD level or in an international e-marketing company. This thesis is then making me more valuable to pretend to one of those two goals; Studying this field brought me additional ideas regarding the e-marketing • field. I wrote done an additional 20-pages report (not included in this intermediate report) about a hypothetical computer application which could give information about how to launch e-marketing campaigns on a global scale; February will be for sure the month which will decide how this master thesis will be achieved. I have concretely more than 3 weeks to dedicate days and nights to this project. The biggest issues I will have to think about are: Chapter 3.1.6: regarding semantic web I did not study yet in detail the topic • CHARDONNEAU Ronan - European Master in Business Studies 59/66
  • 60. and I will have to come back on this point later; Chapter 4: how to measure the gap of information between mass behavior • when looking for information on the Internet and the most rational behavior?; Chapter 5: The Google example; • Re writing the thesis from an informal point of view this time (using the third • person of the singular instead of the first which does not sound very academic); CHARDONNEAU Ronan - European Master in Business Studies 60/66
  • 61. Declaration I certify that this work has been done by myself and only myself. All the sources used for its realization have been well indicated. Ronan CHARDONNEAU European Master in Business Studies Institut de Management de l'Université de Savoie d'Annecy (FR) Università degli studi di Trento (IT) Universität Kassel (GER) Universidad de León (SP) 22th January, 2009 CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 61/66
  • 62. List of literature Alacra. (2008?): What does quot;Powered By Googlequot; mean? URL: http://www.alacra.com/compliance- search/faq.asp Last access: 23/01/2009 Alexa the Web information company. (2008): Top sites by country URL: http://www.alexa.com/site/ds/top_500 Last access: 23/01/2009 Anonymous. (2007): Baidu au Japon? URL: http://asia.2803.com/chine/baidu-au-japon/ Last access: 23/01/2009 Arrington, M. (2008): Cuil On BusinessWeek's Most Successful of 2008 List. Huh? URL: http://www.washingtonpost.com/wp-dyn/content/article/2008/12/29/AR2008122901227.html Last access: 23/01/2009 Battelle, John. (2005): The birth of Google URL: http://www.wired.com/wired/archive/13.08/battelle.html?tw=wn_tophead_4 Last access: 23/01/2009 Boswell, W. (2004?): AOL Search: How to Search with AOL Search URL: http://websearch.about.com/od/enginesanddirectories/a/aol_search_2.htm Last access: 23/01/2009 Boucq, I. (2009): Yahoo et vos données persos... URL: http://www.erenumerique.fr/yahoo_et_vos_donnees_persos_-news-15162.html Last access: 23/01/2009 Chardonneau, R. (2008): International Marketing: Google URL: http://storage.canalblog.com/24/01/274197/31730758.pdf Last access: 23/01/2009 China Tech News. (2007): CCTV: Baidu Search Engine Fraud Exposed? URL: http://www.chinatechnews.com/2007/05/31/5459-cctv-baidu-search-engine-fraud-exposed/ Last access: 23/01/2009 China Daily. (2008): Baidu cuts revenue forecast on ad scandal URL: http://www.chinadaily.com.cn/bizchina/2008-12/14/content_7302788.htm Last access: 23/01/2009 Chitu, A. (2008): Google Starts to Index the Invisible Web URL: http://googlesystem.blogspot.com/2008/04/google-starts-to-index-invisible-web.html Last access: 23/01/2009 Crédoc, (2008): La diffusion des technologies de l'information et de la communication dans la société française URL: www.arcep.fr/uploads/tx_gspublication/etude-credoc-2008-101208.pdf Last acess 23/01/2009 Crepuq. (2003): Etude sur les connaissances en recherche documentaire des étudiants entrant au 1er cycle dans les universités québécoises URL: http://www.crepuq.qc.ca/documents/bibl/formation/etude.pdf Last access: 23/01/2009 EduDoc. (2008): Enquête sur les compétences documentaires et informationnelles des étudiants qui accèdent à l'enseignement supérieur en Communauté française de Belgique URL: http://www.edudoc.be/synthese.pdf Last access: 23/01/2009 CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 62/66
  • 63. Einhorn, B. (2007): Baidu Thinks It Can Play in Japan URL: http://www.businessweek.com/globalbiz/content/feb2007/gb20070215_649662.htm? chan=globalbiz_asia_technology Last access: 23/01/2009 Enquiro. (2008): Eye Tracking Studies URL:http://www.enquiroresearch.com/eyetracking-report.aspx Last access: 23/01/2009 Enquiro. (2004): Search Engine Usage in North America URL: http://pages.enquiroresearch.com/search-engine-usage-in-na.html? source=Search_Engine_Usage_In_North_America_whitepaper Last access: 19/01/2009 Eternal Dreamer. (2008): Why Google is so Successful? URL: http://crumja.wordpress.com/2008/05/20/why-google-is-so-successful/ Last access: 23/01/2009 Eye tools. (2009): Eyetools Eyetracking Research URL: http://www.eyetools.com/inpage/research_google_eyetracking_heatmap.htm Last access: 23/01/2009 Grallet, G. (2009): Baidu, un autre Google s'éveille URL: http://www.lexpress.fr/actualite/high- tech/baidu-un-autre-google-s-eveille_734826.html Last access: 23/01/2009 Houste, F. (2009): Russie: Yandex sera le moteur de recherche par défaut de Firefox URL: http://www.search-engine-feng-shui.com/2009/01/russie-yandex-sera-le-moteur-de-recherche-par- defaut-de-firefox/ Last access: 23/01/2009 Internet World Stats (2008): Internet Usage and Population in North America URL: http://www.internetworldstats.com/stats14.htm Last access: 23/01/2009 Internet World Stats (2008): INTERNET USAGE STATISTICS The Internet Big Picture URL:http://www.internetworldstats.com/stats.htm Last access: 23/01/2009 Internet World Stats. (2008): Internet Usage in Asia URL:http://www.internetworldstats.com/stats3.htm Last access: 23/01/2009 Kahn, J. (2005): Yahoo helped Chinese to prosecute journalist URL: http://www.iht.com/articles/2005/09/07/business/yahoo.php Last access: 23/01/2009 Kaplan, I. (2008): Bad Data Can Cost You Big Time URL: http://www.federationofcredit.com/base/document/Newsletter/IKaplanSept08.html Last access:23/01/2009 Koch, P. / Koch, S. (2009): How big is the Internet? URL: http://www.pandia.com/sew/383- websize.html. Last access: 19/01/2009 Koch, P. / Koch, S. (1999-2006): A short and easy search engine tutorial URL: http://www.pandia.com/goalgetter/index.html Last access: 19/01/2009 Leung, Hon-wing 梁漢榮. (2004): A study of computer science students' conceptions of information literacy and their experiences in information search process and use URL: http://sunzi.lib.hku.hk/hkuto/record/B29597730 Last access: 19/01/2009 Lipsman, A. (2007): 61 Billion Searches Conducted Worldwide in August/Google Ranks as Top Global Search Property URL: http://www.comscore.com/press/release.asp?press=1802 Last CHARDONNEAU Ronan - European Master in Business Studies 2007/2009 63/66