The document summarizes research on the link structure of national Web domains from different countries. The researchers collected Web graphs from several country-code top-level domains such as .br, .cl, and .gr. They found that all of the national Web graphs exhibited a scale-free topology, with power-law distributions of page in-degrees and out-degrees. Where they were computed, PageRank, hub, and authority scores also followed power-law distributions, as did the in- and out-degrees of the hostgraphs. The power-law exponents were reported for each collection and compared with earlier studies of the global Web.
Link Analysis in National Web Domains (OSWIR 2005 Compiegne)
1. Outline Motivation Results Conclusions
Link Analysis in National Web Domains
Ricardo Baeza-Yates and Carlos Castillo
ICREA / Cátedra Telefónica, Universitat Pompeu Fabra
http://www.upf.edu/dtecn/
OSWIR 2005
Compiegne, France
September 19, 2005
Ricardo Baeza-Yates and Carlos Castillo Universitat Pompeu Fabra - Barcelona, Spain
Link Analysis in National Web Domains http://www.upf.edu/dtecn/
Outline
1 Motivation
2 Results
3 Conclusions
Motivation
Sampling the Web
✗ We don’t have access to a global-scale collection
✗ A set of Web sites in the same organization is not diverse enough
✗ A set of Web sites on the same topic might not be representative
✗ A set of random Web sites might not be connected
✓ A national domain has a good balance between diversity and completeness
Collections used
✓ Different economic, historical, linguistic, and geographical contexts
Collection Year
Brazil 2005
Chile 2004
Greece 2004
Indochina 2004
Italy 2004
South Korea 2004
Spain 2004
U. K. 2002
Collections used
Collection    Year  Available hosts [millions] (rank)   Pages [millions]
Brazil        2005  3.9 (11th)                          4.7
Chile         2004  0.3 (42nd)                          3.3
Greece        2004  0.3 (40th)                          3.7
Indochina     2004  0.5 (38th)                          7.4
Italy         2004  9.3 (4th)                           41.3
South Korea   2004  0.2 (47th)                          8.9
Spain         2004  1.3 (25th)                          16.2
U. K.         2002  4.4 (10th)                          18.5
Scale-free topology
If we sort pages by their number of in-links, the $k$-th page has in-degree proportional to $k^{-\alpha}$ (Zipf’s law).
Equivalently, the fraction of pages with $x$ in-links is proportional to $x^{-\theta}$ (power law). Experimentally, $\theta \approx 2.1$ on the Web.
Partial explanation: a multiplicative process; if $d_t$ is the number of links at time $t$, then $d_{t+1} = C \times d_t$.
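The equivalence between the rank form (Zipf) and the frequency form (power law) follows from a short continuous approximation. This is a standard textbook derivation sketched here for reference, not one taken from the slides:

```latex
\begin{align*}
x(k) &\propto k^{-\alpha}
  && \text{(Zipf: in-degree of the $k$-th ranked page)}\\
k(x) &\propto x^{-1/\alpha}
  && \text{(rank of a page with in-degree $x$ = number of pages with in-degree $\ge x$)}\\
p(x) &\propto -\frac{d\,k(x)}{dx} \propto x^{-(1 + 1/\alpha)} = x^{-\theta}
  && \text{(fraction of pages with in-degree $x$)}\\
\theta &= 1 + \frac{1}{\alpha}, \qquad \theta \approx 2.1 \;\Rightarrow\; \alpha \approx 0.91
\end{align*}
```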
Power-law exponents
Collection              In-degree
Brazil                  1.9
Chile                   2.0
Greece                  1.9
Indochina               1.6
Italy                   1.8
South Korea             1.9
Spain                   2.1
U. K.                   1.8
(Broder... 2000)        2.1
(Dill... 2002)          2.1
(Kleinberg... 1999)     ≈2
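Exponents such as these are typically fitted to the tail of the degree distribution. The slides do not state the fitting procedure, so the sketch below shows only one common option: the continuous maximum-likelihood estimator $\hat\theta = 1 + n / \sum_i \ln(x_i / x_{\min})$ over the $n$ values with $x_i \ge x_{\min}$. The cutoff `x_min`, the helper names, and the synthetic data are assumptions for illustration.

```python
import math
import random


def estimate_power_law_exponent(values, x_min):
    """Continuous MLE of theta in p(x) ~ x^(-theta), using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, theta, x_min, rng):
    """Draw n samples from a continuous power law by inverse-transform sampling."""
    # If u ~ Uniform[0, 1), then x_min * (1 - u)^(-1 / (theta - 1)) has density ~ x^(-theta).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (theta - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "in-degrees" drawn from a power law with theta = 2.1.
    synthetic_indegrees = sample_power_law(100_000, theta=2.1, x_min=1.0, rng=rng)
    theta_hat = estimate_power_law_exponent(synthetic_indegrees, x_min=1.0)
    print(f"estimated theta = {theta_hat:.3f}")
```

Real in-degrees are integers, so on crawl data this continuous formula is only an approximation, and the estimate depends on the chosen `x_min`.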
Power-law exponents
Collection              In-degree   Out-degree      PageRank   HITS
                                    small   large              hubs   auth.
Brazil                  1.9         0.7     2.7     1.8        2.9    1.8
Chile                   2.0         0.7     2.6     1.9        2.7    1.9
Greece                  1.9         0.6     1.9     1.8        2.6    1.8
Indochina               1.6         0.7     2.6     -          -      -
Italy                   1.8         0.7     2.5     -          -      -
South Korea             1.9         0.3     2.0     1.8        3.7    1.8
Spain                   2.1         0.9     4.2     2.0        -      -
U. K.                   1.8         0.7     3.4     -          -      -
(Broder... 2000)        2.1         -       2.7     -          -      -
(Dill... 2002)          2.1         -       2.2     -          -      -
(Pandurangan... 2002)   -           -       -       2.1        -      -
(Kleinberg... 1999)     ≈2          -       -       -          -      -
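The PageRank and HITS columns describe how those scores are distributed over pages. As a reminder of what is being measured, here is a minimal power-iteration sketch of both. The damping factor 0.85, the uniform redistribution of rank from dangling pages, the iteration counts, and the toy graph are assumptions, not parameters reported in the slides.

```python
import math


def pagerank(out_links, damping=0.85, iterations=100, tol=1e-12):
    """PageRank by power iteration; out_links maps a node to the list of nodes it links to."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Assumption: rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in out_links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


def hits(out_links, iterations=100):
    """Kleinberg's HITS by power iteration; returns (hub, authority), each L2-normalized."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    hub = {u: 1.0 for u in nodes}
    auth = {u: 0.0 for u in nodes}
    for _ in range(iterations):
        # Authority of v: sum of the hub scores of the pages that link to v.
        auth = {u: 0.0 for u in nodes}
        for u, targets in out_links.items():
            for v in targets:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        # Hub of u: sum of the authority scores of the pages that u links to.
        hub = {u: sum(auth[v] for v in out_links.get(u, ())) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


if __name__ == "__main__":
    toy = {0: [1, 2, 3], 1: [2]}  # hypothetical four-page graph
    print(pagerank(toy))
    print(hits(toy))
```

On a crawl with millions of pages one would store the graph as a sparse matrix, but each iteration performs the same computation.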
Hostgraph
[Figure: hosts www.example1.com, www.example2.com and www.example3.com shown as hostgraph nodes S1, S2 and S3]
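In a hostgraph, each node is a host, and there is an edge from host $S_i$ to host $S_j$ when at least one page on $S_i$ links to a page on $S_j$. A minimal sketch of building one from page-level links follows. It assumes that links between pages of the same host are dropped and that several page links between the same pair of hosts count as a single edge; the slides do not spell out these conventions. The example URLs reuse the figure's placeholder hosts, but the links between them and the helper names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_hostgraph(page_links):
    """Collapse (source_url, target_url) page links into a set of host-level edges."""
    edges = set()
    for src, dst in page_links:
        src_host = urlsplit(src).hostname
        dst_host = urlsplit(dst).hostname
        # Assumption: links within the same host do not become hostgraph edges.
        if src_host and dst_host and src_host != dst_host:
            edges.add((src_host, dst_host))  # parallel page links merge into one edge
    return edges


def host_degrees(edges):
    """In- and out-degree of every host that appears in the hostgraph."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return dict(indeg), dict(outdeg)


if __name__ == "__main__":
    links = [
        ("http://www.example1.com/a.html", "http://www.example2.com/"),
        ("http://www.example1.com/b.html", "http://www.example2.com/x.html"),
        ("http://www.example2.com/", "http://www.example3.com/"),
        ("http://www.example3.com/p", "http://www.example3.com/q"),
    ]
    print(sorted(build_hostgraph(links)))
    # [('www.example1.com', 'www.example2.com'), ('www.example2.com', 'www.example3.com')]
```

The degree exponents in the next table are fitted to distributions of exactly these host-level in- and out-degrees.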
Hostgraph also exhibits a power law
Collection   Hostgraph in-degree   Hostgraph out-degree
Brazil 1.9 1.9
Chile 2.0 1.7
Greece 2.0 1.6
South Korea 1.2 1.4
Spain 1.8 1.3
(Bharat. . . 2001) 1.6-1.7 1.7-1.8
(Dill. . . 2002) 2.3
Conclusions
✓ Consistent results across collections
✓ Differences in the amount of spam
✓ Comparison of other aspects [to be available soon]
Thank you