Outline                              Motivation                Results                            Conclusions




              Link Analysis in National Web Domains

                           Ricardo Baeza-Yates and Carlos Castillo
                           ICREA / Cátedra Telefónica, Universitat Pompeu Fabra
                                      http://www.upf.edu/dtecn/


                                               OSWIR 2005
                                            Compiegne, France
                                            September 19, 2005

Ricardo Baeza-Yates and Carlos Castillo                            Universitat Pompeu Fabra - Barcelona, Spain
Link Analysis in National Web Domains                                        http://www.upf.edu/dtecn/




Outline

          1   Motivation
          2   Results
          3   Conclusions







Motivation

          Sampling the Web
            ✗ We don't have access to a global-scale collection
            ✗ A set of Web sites in the same organization is not diverse
              enough
            ✗ A set of Web sites on the same topic might not be
              representative
            ✗ A set of random Web sites might not be connected
            ✓ A national domain has a good balance between
              diversity and completeness






Collections used
          ✓ Different economic, historical, linguistic, and geographical
          contexts
          Collection     Year   Available hosts     Pages
                                [mill] (rank)       [mill]
          Brazil         2005   3.9 (11th)           4.7
          Chile          2004   0.3 (42nd)           3.3
          Greece         2004   0.3 (40th)           3.7
          Indochina      2004   0.5 (38th)           7.4
          Italy          2004   9.3 (4th)           41.3
          South Korea    2004   0.2 (47th)           8.9
          Spain          2004   1.3 (25th)          16.2
          U. K.          2002   4.4 (10th)          18.5





Scale-free topology



            If we sort pages by the number of in-links, the $k$-th page
            has in-degree proportional to $k^{-\alpha}$ (Zipf's law).
          ⇒ The fraction of pages with $x$ in-links is proportional to
            $x^{-\theta}$ (power law). Experimentally, $\theta \approx 2.1$ on the Web.
            Partial explanation: a multiplicative process; if $d_t$ is the
            number of links at time $t$, then $d_{t+1} = C \times d_t$.
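
As a rough illustration of the power law stated on this slide, the Python sketch below estimates the exponent θ from a list of in-degrees by fitting a straight line to the log-log plot of "fraction of pages" against "number of in-links". The slides do not say how the exponents were fitted, so the least-squares fit and the names `degree_fractions` and `fit_exponent` are assumptions made for this example only.

```python
import math
from collections import Counter


def degree_fractions(indegrees):
    """Map each in-degree x >= 1 to the fraction of pages with exactly x in-links.

    Pages with zero in-links are skipped because log(0) is undefined;
    this rescales every fraction equally and does not change the slope.
    """
    counts = Counter(d for d in indegrees if d >= 1)
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}


def fit_exponent(fractions):
    """Fit log(fraction) = a - theta * log(x) by least squares; return theta."""
    xs = [math.log(x) for x in fractions]
    ys = [math.log(f) for f in fractions.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical input: fractions that follow x^(-2.1) exactly.
synthetic = {x: x ** -2.1 for x in range(1, 101)}
print(round(fit_exponent(synthetic), 2))
```

On real crawl data the tail of the distribution is noisy, so a plain least-squares fit is only a first approximation; binning the degrees or using a maximum-likelihood estimator are common alternatives.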







In-degree
          [Figure: in-degree distributions on log-log axes, one panel per
          collection (Brazil, Chile, Greece, Italy, Korea, Spain, U.K.);
          horizontal axis from 10^0 to 10^4, vertical axis from 10^-7 to 10^-1.]






Out-degree
          [Figure: out-degree distributions on log-log axes, one panel per
          collection (Brazil, Chile, Greece, Italy, Korea, Spain, U.K.);
          horizontal axis from 10^0 to 10^3, vertical axis from 10^-6 to 10^-1.]






Link scores (PageRank, Hubs, Authorities)

          [Figure: distributions of link scores on log-log axes; three rows of
          panels, each row showing Brazil, Chile, Greece, and Korea.]






Power-law exponents

  Collection                In-degree   Out-degree       PageRank   HITS
                                        Small   Large               Hubs   Auth.
  Brazil                    1.9         0.7     2.7      1.8        2.9    1.8
  Chile                     2.0         0.7     2.6      1.9        2.7    1.9
  Greece                    1.9         0.6     1.9      1.8        2.6    1.8
  Indochina                 1.6         0.7     2.6
  Italy                     1.8         0.7     2.5
  South Korea               1.9         0.3     2.0      1.8        3.7    1.8
  Spain                     2.1         0.9     4.2      2.0
  U. K.                     1.8         0.7     3.4
  (Broder. . . 2000)        2.1                 2.7
  (Dill. . . 2002)          2.1                 2.2
  (Pandurangan. . . 2002)                                2.1
  (Kleinberg. . . 1999)     ≈2



Hostgraph
          [Figure: the pages of www.example1.com, www.example2.com, and
          www.example3.com are grouped into the site-level nodes S1, S2, and S3.]
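
One way to picture the hostgraph is as the page-level link graph with all pages of a host merged into a single node. The Python sketch below builds such a graph from a list of page-to-page links. The function names, the choice to drop links within the same host, and the unweighted edges are assumptions for illustration; the slides do not specify these details.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the host name of a URL, or None if it has none."""
    return urlsplit(url).hostname


def build_hostgraph(page_links):
    """Collapse (source_url, target_url) pairs into host -> set of linked hosts.

    Assumption: links between pages on the same host are ignored, and
    multiple page links between two hosts count as a single edge.
    """
    hostgraph = defaultdict(set)
    for source_url, target_url in page_links:
        source, target = host_of(source_url), host_of(target_url)
        if source is None or target is None or source == target:
            continue
        hostgraph[source].add(target)
    return dict(hostgraph)


# Hypothetical page links using the host names from the slide.
links = [
    ("http://www.example1.com/a.html", "http://www.example2.com/b.html"),
    ("http://www.example1.com/c.html", "http://www.example2.com/d.html"),
    ("http://www.example1.com/a.html", "http://www.example1.com/c.html"),
    ("http://www.example2.com/b.html", "http://www.example3.com/"),
]
for host, targets in build_hostgraph(links).items():
    print(host, "->", sorted(targets))
```

The in-degree and out-degree of a host are then the number of distinct hosts linking to it and linked from it.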







The hostgraph also exhibits a power law

                             Collection            Hostgraph degree
                                                   In         Out
                             Brazil                1.9        1.9
                             Chile                 2.0        1.7
                             Greece                2.0        1.6
                             South Korea           1.2        1.4
                             Spain                 1.8        1.3
                             (Bharat. . . 2001)    1.6-1.7    1.7-1.8
                             (Dill. . . 2002)      2.3






Web structure: connected components

          “Normal” vs “Giant” strongly connected components
          [Figure: log-log plots for the strongly connected components of
          Brazil, Chile, Greece, Korea, and Spain; horizontal axis from 10^0
          to 10^5, vertical axis from 10^-6 to 10^0.]
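
To make the notion of "normal" versus "giant" components concrete, the sketch below computes the sizes of all strongly connected components of a directed graph with an iterative version of Kosaraju's two-pass algorithm. This is a generic textbook method chosen for illustration, not necessarily what was used for the slides, and the function name `scc_sizes` is hypothetical.

```python
from collections import defaultdict


def scc_sizes(edges):
    """Return the sizes of all strongly connected components, largest first."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: depth-first search on the forward graph, recording each
    # node when all of its successors have been explored.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order;
    # each sweep collects exactly one strongly connected component.
    assigned, sizes = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size, stack = 0, [start]
        while stack:
            node = stack.pop()
            size += 1
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical toy graph: a 3-cycle, a 2-cycle, and one dangling node.
toy_edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (5, 4), (5, 6)]
print(scc_sizes(toy_edges))
```

In a real crawl the first entry of the result would be the giant component, and the remaining entries give the size distribution of the normal ones.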







Conclusions



           ✓ Consistent results across collections
           ✓ Differences in the amount of spam
           ✓ Comparison of other aspects [to be available soon]

                                                  Thank you




Ricardo Baeza-Yates and Carlos Castillo                           Universitat Pompeu Fabra - Barcelona, Spain
Link Analysis in National Web Domains                                       http://www.upf.edu/dtecn/
Outline                              Motivation               Results                            Conclusions



Conclusions



           V Consistent results across collections
           V Differences in the amount of spam
           V Comparison of other aspects [to be available soon]

                                                  Thank you





Link Analysis in National Web Domains (OSWIR 2005 Compiegne)

  • 1. Link Analysis in National Web Domains
      Ricardo Baeza-Yates and Carlos Castillo
      ICREA / Cátedra Telefónica, Universitat Pompeu Fabra - Barcelona, Spain
      http://www.upf.edu/dtecn/
      OSWIR 2005, Compiegne, France, September 19, 2005
  • 2. Outline
      1 Motivation
      2 Results
      3 Conclusions
  • 3. Motivation: Sampling the Web
      X We don’t have access to a global-scale collection
      X A set of Web sites in the same organization is not diverse enough
      X A set of Web sites in the same topic might not be representative
      X A set of random Web sites might not be connected
      V A national domain has a good balance between diversity and completeness
  • 8. Collections used
      V Different economic, historical, linguistic, and geographical contexts
  • 9. Collections used
      Collection     Year   Available hosts [mill] (rank)   Pages [mill]
      Brazil         2005   3.9 (11th)                      4.7
      Chile          2004   0.3 (42nd)                      3.3
      Greece         2004   0.3 (40th)                      3.7
      Indochina      2004   0.5 (38th)                      7.4
      Italy          2004   9.3 (4th)                       41.3
      South Korea    2004   0.2 (47th)                      8.9
      Spain          2004   1.3 (25th)                      16.2
      U. K.          2002   4.4 (10th)                      18.5
  • 10. Scale-free topology
      If we sort pages by the number of in-links, the $k$-th page has in-degree proportional to $k^{-\alpha}$ (Zipf’s law).
      Equivalently, the fraction of pages with $x$ in-links is proportional to $x^{-\theta}$ (power law).
      Experimentally, $\theta \approx 2.1$ on the Web.
      Partial explanation: a multiplicative process; if $d_t$ is the number of links at time $t$, then $d_{t+1} = C \times d_t$.
  • 13. In-degree
      [Figure: log-log plots of the in-degree distribution, one panel each for Brazil, Chile, Greece, Italy, Korea, Spain, and U.K.; horizontal axis 10^0 to 10^4, vertical axis 10^-7 to 10^-1]
  • 14. Out-degree
      [Figure: log-log plots of the out-degree distribution, one panel each for Brazil, Chile, Greece, Italy, Korea, Spain, and U.K.; horizontal axis 10^0 to 10^3, vertical axis 10^-6 to 10^-1]
  • 15. Link scores (PageRank, Hubs, Authorities)
      [Figure: three rows of log-log plots, each row with panels for Brazil, Chile, Greece, and Korea]
  • 16. Power-law exponents
      Collection             In-degree
      Brazil                 1.9
      Chile                  2.0
      Greece                 1.9
      Indochina              1.6
      Italy                  1.8
      South Korea            1.9
      Spain                  2.1
      U. K.                  1.8
      (Broder. . . 2000)     2.1
      (Dill. . . 2002)       2.1
      (Kleinberg. . . 1999)  ≈2
  • 17. Power-law exponents
      Collection                In-degree   Outdegree        PageRank   HITS
                                            Small   Large               Hubs   Auth.
      Brazil                    1.9         0.7     2.7      1.8        2.9    1.8
      Chile                     2.0         0.7     2.6      1.9        2.7    1.9
      Greece                    1.9         0.6     1.9      1.8        2.6    1.8
      Indochina                 1.6         0.7     2.6
      Italy                     1.8         0.7     2.5
      South Korea               1.9         0.3     2.0      1.8        3.7    1.8
      Spain                     2.1         0.9     4.2      2.0
      U. K.                     1.8         0.7     3.4
      (Broder. . . 2000)        2.1                 2.7
      (Dill. . . 2002)          2.1                 2.2
      (Pandurangan. . . 2002)                                2.1
      (Kleinberg. . . 1999)     ≈2
  • 18. Hostgraph
      [Figure: the hosts www.example1.com, www.example2.com, and www.example3.com drawn as nodes S1, S2, and S3]
  • 19. The hostgraph also exhibits a power law
      Hostgraph degree
      Collection            In        Out
      Brazil                1.9       1.9
      Chile                 2.0       1.7
      Greece                2.0       1.6
      South Korea           1.2       1.4
      Spain                 1.8       1.3
      (Bharat. . . 2001)    1.6-1.7   1.7-1.8
      (Dill. . . 2002)      2.3
  • 20. Web structure: connected components
      “Normal” vs “Giant” strongly connected components
      [Figure: log-log plots, one panel each for Brazil, Chile, Greece, Korea, and Spain; horizontal axis 10^0 to 10^5, vertical axis 10^-6 to 10^0]
  • 21. Conclusions
      V Consistent results across collections
      V Differences in the amount of spam
      V Comparison of other aspects [to be available soon]
      Thank you
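
Slide 10 states that Zipf's law on ranks and a power law on in-degree counts describe the same distribution. A minimal Python sketch of that link follows, under assumptions that are not in the talk: the function names are hypothetical, the data are synthetic and noise-free, and the exponent is recovered with a plain least-squares fit on log-log values. The slides do not say how the reported exponents were estimated, and least squares is only a crude estimator on real degree counts. The conversion uses the standard relation α = 1/(θ − 1), which holds because the rank of a page with x in-links is proportional to the fraction of pages with at least x in-links, i.e. to x^{-(θ-1)}.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var


def zipf_alpha_from_theta(theta):
    """Rank exponent alpha implied by the density exponent theta (theta > 1)."""
    return 1.0 / (theta - 1.0)


# Synthetic illustration only (not data from the talk): an exact power law
# with the exponent quoted on slide 10.
theta = 2.1
in_links = list(range(1, 1001))
fractions = [x ** -theta for x in in_links]

theta_hat = -loglog_slope(in_links, fractions)  # recovers theta
alpha = zipf_alpha_from_theta(theta_hat)        # about 0.91 for theta = 2.1
print(f"theta = {theta_hat:.3f}, alpha = {alpha:.3f}")
```

On measured data the low-degree region usually deviates from the power law, so a fit of this kind would normally be restricted to the tail.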
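
Slides 18 and 19 work with the hostgraph. Under the usual definition, each host is a single node and host S1 links to host S2 when some page on S1 links to a page on S2. The sketch below shows one way to collapse page-level links into such a graph and count host in-degrees. It is an illustration only: the function name, the page paths, and the choice to drop links internal to a host are assumptions, and the host names are reused from the example on slide 18.

```python
from urllib.parse import urlsplit


def hostgraph(page_links):
    """Collapse (source URL, target URL) pairs into a set of host-level edges."""
    edges = set()
    for src, dst in page_links:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        # Assumption: links inside a single host are not hostgraph edges.
        if src_host and dst_host and src_host != dst_host:
            edges.add((src_host, dst_host))
    return edges


# Hypothetical page-level links between the three example hosts.
links = [
    ("http://www.example1.com/a.html", "http://www.example2.com/x.html"),
    ("http://www.example1.com/b.html", "http://www.example2.com/y.html"),
    ("http://www.example2.com/x.html", "http://www.example3.com/"),
    ("http://www.example1.com/a.html", "http://www.example1.com/b.html"),
]

host_edges = hostgraph(links)

# Host in-degree: number of distinct hosts linking to each host.
in_degree = {}
for _, target in host_edges:
    in_degree[target] = in_degree.get(target, 0) + 1
```

Because the edges are stored in a set, many page links between the same pair of hosts count once, so the degrees measure how many distinct hosts are connected rather than how many links exist.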
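
Slide 20 plots the size distribution of strongly connected components and separates the "giant" component from the "normal" ones. As a rough sketch of how such sizes can be obtained from an edge list, the code below uses Kosaraju's two-pass algorithm with explicit stacks. The choice of algorithm, the function name, and the toy graph are assumptions for illustration; the talk does not describe its implementation.

```python
from collections import defaultdict


def scc_sizes(edges):
    """Sizes of the strongly connected components, largest first."""
    graph, reverse, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        nodes.update((u, v))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in reverse completion order;
    # each sweep collects exactly one strongly connected component.
    sizes, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        todo, size = [start], 0
        while todo:
            node = todo.pop()
            size += 1
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    todo.append(nxt)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Toy graph: a 3-cycle, a 2-cycle reachable from it, and one source node.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"),
             ("c", "d"), ("d", "e"), ("e", "d"), ("f", "a")]
sizes = scc_sizes(toy_edges)
```

In a Web-sized graph the first entry of the result would be the giant component, and a histogram of the remaining entries would give the "normal" part of the distribution.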