1. Statistics and networks
Survey sampling
Extending the sampling design
Application to Twitter data
Sampling graphs efficiently: model assisted designs
and application to Twitter data
Antoine Rebecq
Université Paris X - INSEE
3/23/17
Antoine Rebecq Sampling designs for graphs
1 Statistics and networks
Graphs and stats
Methods - algorithms - models
2 Survey sampling
Estimates
Use of auxiliary information
3 Extending the sampling design
Snowball sampling
Adaptive sampling
4 Application to Twitter data
The problem
Results
Model-assisted sampling
Section 1
Statistics and networks
Subsection 1
Graphs and stats
Graphs
A graph G consists of a set of vertices V and a set of edges E: G = (V, E)
Directed graphs
Statistics of interest - graphs
Size
Degree
Centrality
Clustering
Communities
. . .
Degree
d_v = number of edges incident to vertex v
Degree / scale-free property
Path lengths
Centrality
Measure of “importance” of a node.
Examples: Google PageRank, betweenness centrality (the number of
times a node lies on the shortest path between two other nodes,
acting as a bridge)
Betweenness centrality
Clustering
Global clustering coefficient:
$$C = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$$
Local clustering coefficient of a vertex = how close its neighbours
are to being a clique (complete graph).
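Both coefficients can be computed directly from an adjacency structure. The following is a minimal sketch in plain Python, assuming an undirected graph stored as a dict of neighbour sets; the function names and the toy graph are illustrative and not taken from the talk.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Share of pairs of neighbours of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def global_clustering(adj):
    """3 * triangles / connected triplets, counting triplets by centre vertex."""
    closed = 0    # closed triplets: each triangle is seen once per centre, i.e. 3 times
    triplets = 0  # all connected triplets (paths of length 2)
    for v, nbrs in adj.items():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triplets if triplets else 0.0

# Toy graph (hypothetical): triangle 0-1-2 plus a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(global_clustering(adj), local_clustering(adj, 2))
```

The degree of a vertex is simply `len(adj[v])` in this representation.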
Local clustering coefficient
The rise of “big graphs”
Example: the Graph500 benchmark (http://www.graph500.org).
Data sets reach up to 1.1 PB as an adjacency list (the size of the
human connectome).
Subsection 2
Methods - algorithms - models
Methods for graph statistics
Algorithms (computer science, “big data”)
Model-based estimation
Sampling (“Design-based estimation”)
Computer science methods
Efficient algorithms (speed / memory).
Sometimes require sampling.
Model-based estimation
Famous graph models :
Erdős-Rényi
Price / Barabási-Albert (heavy-tailed degree distribution)
Watts-Strogatz / “small-world” (short path lengths)
Stochastic block models (communities)
Images from [8]
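Two of these models are easy to simulate. The sketch below is a plain-Python illustration, not the graph-tool code behind the images: `erdos_renyi` draws G(n, p), and `preferential_attachment` grows a graph in the Barabási-Albert style. The function names, the complete starting graph on m + 1 vertices and the parameter values are assumptions made for the example.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, rng):
    """G(n, p): each possible edge is present independently with probability p."""
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

def preferential_attachment(n, m, rng):
    """Each new vertex attaches to m distinct existing vertices,
    chosen with probability proportional to their current degree."""
    # Assumption: start from a complete graph on m + 1 vertices.
    edges = list(combinations(range(m + 1), 2))
    # A vertex appears in `ends` once per incident edge, so a uniform
    # draw from `ends` is a degree-proportional draw of a vertex.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((t, new))
            ends.extend((t, new))
    return edges

rng = random.Random(2017)
er_edges = erdos_renyi(50, 0.1, rng)
ba_edges = preferential_attachment(50, 2, rng)
```

Tabulating the degrees of `ba_edges` for a larger n should show the heavy tail mentioned above, whereas the Erdős-Rényi degrees concentrate around (n - 1)p.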
Model-based estimation: Erdős-Rényi (“random graphs”)
Model-based estimation: Barabási-Albert (“preferential attachment”)
Model-based estimation : Watts-Strogatz (“small world”)
Model-based estimation : Stochastic Block Models
Sampling / Design-based estimation
Sampling: select a few vertices or edges and compute estimators
from the sample data. Very little has been written on design-based
statistical inference for networks (Kolaczyk, 2009, [5]).
We apply survey sampling methods used in official statistics
institutes to make design-based inference about “big graphs”.
Section 2
Survey sampling
Subsection 1
Estimates
Horvitz-Thompson estimator
Population U (here vertices of the graph).
Assign each unit k ∈ U an inclusion probability π_k = P(k ∈ s)
Horvitz-Thompson estimator
Classic unbiased estimators for totals and means (Horvitz-Thompson):
$$\hat{T}(Y)_{HT} = \sum_{k \in s} \frac{y_k}{\pi_k}$$
$$\hat{\bar{y}} = \frac{1}{N} \sum_{k \in s} \frac{y_k}{\pi_k}$$
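As a minimal sketch of these two estimators in Python: each sampled value is weighted by the inverse of its inclusion probability. The function names and the toy numbers are illustrative assumptions.

```python
def horvitz_thompson_total(y, pi):
    """Estimate the population total from sampled values y and their
    inclusion probabilities pi (same order, sampled units only)."""
    return sum(yk / pik for yk, pik in zip(y, pi))

def horvitz_thompson_mean(y, pi, N):
    """Estimate the population mean; N is the known population size."""
    return horvitz_thompson_total(y, pi) / N

# Hypothetical sample of three units from a population of N = 10.
y_sample = [2, 0, 5]
pi_sample = [0.5, 0.25, 0.5]
print(horvitz_thompson_total(y_sample, pi_sample))
print(horvitz_thompson_mean(y_sample, pi_sample, N=10))
```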
Horvitz-Thompson estimator
The variance of the Horvitz-Thompson estimator depends on the first-
and second-order inclusion probabilities:
$$\pi_k = P(k \in s), \qquad \pi_{kl} = P(k, l \in s)$$
$$V(\hat{T}(Y)_{HT}) = \sum_{k \in U} \sum_{l \in U} (\pi_{kl} - \pi_k \pi_l) \frac{y_k}{\pi_k} \frac{y_l}{\pi_l}$$
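As a worked special case, assume the units are selected independently of one another (each unit k is included by its own π_k-Bernoulli draw). Then π_kl = π_k π_l for k ≠ l and π_kk = π_k, so only the diagonal terms of the double sum survive:

```latex
\begin{align*}
V(\hat{T}(Y)_{HT})
  &= \sum_{k \in U} (\pi_{kk} - \pi_k^2) \frac{y_k^2}{\pi_k^2}
   + \sum_{k \in U} \sum_{l \neq k} (\pi_{kl} - \pi_k \pi_l)
     \frac{y_k}{\pi_k} \frac{y_l}{\pi_l} \\
  &= \sum_{k \in U} \pi_k (1 - \pi_k) \frac{y_k^2}{\pi_k^2} + 0 \\
  &= \sum_{k \in U} \frac{1 - \pi_k}{\pi_k} \, y_k^2 .
\end{align*}
```

With a common probability π_k = p this reduces to ((1 - p) / p) Σ_k y_k².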
Bernoulli sampling
Poisson sampling : For each k ∈ U , run a πk-Bernoulli experiment
to decide whether to include unit k in the sample.
Bernoulli sampling : ∀k, πk = p
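Both designs take only a few lines to simulate. This is a minimal sketch using Python's standard `random` module; the function names, the seed and the population size are illustrative assumptions.

```python
import random

def poisson_sample(pi, rng):
    """Indices k selected by independent pi[k]-Bernoulli draws."""
    return [k for k, pik in enumerate(pi) if rng.random() < pik]

def bernoulli_sample(N, p, rng):
    """Special case of Poisson sampling with pi_k = p for every unit."""
    return poisson_sample([p] * N, rng)

rng = random.Random(2017)
s = bernoulli_sample(1000, 0.1, rng)
print(len(s))  # random sample size, with expectation N * p = 100
```

Note that the sample size is random under both designs: only its expectation, the sum of the π_k, is fixed in advance.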
Subsection 2
Use of auxiliary information
Auxiliary information
If π_k ∝ y_k (and the sample size is fixed), then V(T̂(Y)_HT) = 0.
In practice y_k is unknown, so we use an auxiliary variable X that is well correlated with Y.
Stratified sampling
We partition the population into strata, U = U_1 ∪ U_2 ∪ . . . ∪ U_H, and draw independent
samples in each U_h.
Strata should be formed so that the dispersion of y_k within each stratum is as low
as possible.
Stratified sampling : Neyman allocation
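Given a set of strata and a total sample size n, the variance is
minimized for:
$$n_h = n \, \frac{N_h S_h}{\sum_{h'=1}^{H} N_{h'} S_{h'}}, \qquad S_h = \sqrt{S_h^2}$$
where N_h is the size of stratum U_h and S_h^2 the dispersion of y_k within it.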
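A minimal sketch of the allocation, assuming the textbook form of Neyman allocation in which n_h is proportional to N_h S_h, with S_h the within-stratum standard deviation. Rounding to integer sample sizes is left out, and the stratum sizes and dispersions below are hypothetical.

```python
def neyman_allocation(n, N, S):
    """Split a total sample size n across strata.

    N[h] is the stratum size and S[h] the within-stratum standard
    deviation of y; the allocation is proportional to N[h] * S[h].
    """
    weights = [Nh * Sh for Nh, Sh in zip(N, S)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: a small, dispersed one and a large, homogeneous one.
allocation = neyman_allocation(120, N=[1000, 9000], S=[3.0, 1.0])
print(allocation)
```

The small stratum receives a quarter of the sample although it holds a tenth of the population, because its values are more dispersed.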
Calibrated estimator
Deville and Särndal, 1992 ([2]). Modification of the Horvitz-Thompson
estimator to take auxiliary information into account.
Very similar to empirical likelihood methods ([7]).
Computing variances for calibrated estimators is easy.
Section 3
Extending the sampling design
Official statistics
Measuring “hidden populations”
Community structure
When estimating the size of a community ($\hat{N}_C$), edges can be
used as auxiliary information.
Snowball sampling
From now on, our sampling designs include extensions:
$$s = s_0 \cup s_{ext}$$
Subsection 1
Snowball sampling
Snowball sampling
Population U
Snowball sampling
Initial sample s0
Snowball sampling
One stage snowball extension s = A(s0)
Snowball sampling
Formally, we write:
$$B_i = \{i\} \cup \{j \in V : E_{ji} \neq \emptyset\}$$
$$A_i = \{i\} \cup \{j \in V : E_{ij} \neq \emptyset\}$$
$$s = A(s_0) = \bigcup_{i \in s_0} A_i$$
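The sets A_i and B_i and the one-stage extension can be sketched as follows. This assumes a directed graph stored as a dict mapping each vertex to the set of vertices its edges point to, and reads E_ij as an edge from i to j; the function names and the toy graph are illustrative.

```python
def successors_closure(out_edges, i):
    """A_i: vertex i together with every j reached by an edge i -> j."""
    return {i} | set(out_edges.get(i, ()))

def predecessors_closure(out_edges, i):
    """B_i: vertex i together with every j that has an edge j -> i."""
    return {i} | {j for j, targets in out_edges.items() if i in targets}

def one_stage_snowball(out_edges, s0):
    """s = A(s0): union of the A_i over the initial sample s0."""
    s = set()
    for i in s0:
        s |= successors_closure(out_edges, i)
    return s

# Toy directed graph (hypothetical): 1 -> 2, 1 -> 3, 2 -> 3, 4 -> 1.
out_edges = {1: {2, 3}, 2: {3}, 3: set(), 4: {1}}
print(one_stage_snowball(out_edges, {1}))
```

A vertex i ends up in s exactly when B_i contains at least one unit of s0, which is why B_i, and not A_i, appears in the inclusion probabilities.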
Snowball sampling
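$$\hat{N}_{C3} = \sum_{i \in s} \frac{z_i}{1 - \bar{\pi}(B_i)}$$
where, writing $\bar{s}_0$ for the complement of the initial sample $s_0$:
$$\bar{\pi}(B_i) = P(B_i \subset \bar{s}_0) = \prod_{k \in B_i} (1 - P(k \in s_0))$$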
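A minimal sketch of this estimator, assuming the initial sample is a Bernoulli sample with a common probability p, so that the exclusion probability of B_i is (1 - p) raised to the size of B_i. The data structures and numbers are hypothetical.

```python
def exclusion_probability(B, p):
    """P(no unit of B is in the initial sample) under a Bernoulli(p) design."""
    return (1.0 - p) ** len(B)

def snowball_estimate(sample, z, B, p):
    """Sum of z_i / (1 - exclusion probability of B_i) over the final sample.

    sample: units in the final (extended) sample
    z:      dict unit -> indicator of the characteristic of interest
    B:      dict unit -> set B_i (the unit and its predecessors)
    """
    return sum(z[i] / (1.0 - exclusion_probability(B[i], p)) for i in sample)

# Hypothetical final sample of three units, initial design with p = 0.5.
sample = [1, 2, 3]
z = {1: 1, 2: 0, 3: 1}
B = {1: {1, 4}, 2: {1, 2}, 3: {1, 2, 3}}
print(snowball_estimate(sample, z, B, p=0.5))
```

Units with many predecessors are more likely to be reached by the extension, so they receive a smaller weight.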
Snowball sampling
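$$\hat{V}(\hat{N}_{C3}) = \sum_{i \in s} \sum_{j \in s} \frac{z_i z_j}{\bar{\pi}(B_i \cup B_j)} \, \gamma_{ij}$$
where:
$$\gamma_{ij} = \frac{\bar{\pi}(B_i \cup B_j) - \bar{\pi}(B_i)\,\bar{\pi}(B_j)}{[1 - \bar{\pi}(B_i)][1 - \bar{\pi}(B_j)]}$$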
Subsection 2
Adaptive sampling
Adaptive sampling
Adaptive sampling (Thompson, [9])
Used in official statistics to estimate the number of drug users or
HIV-positive people
The sampling design is often compared to the video game
“Minesweeper”
Adaptive sampling
Image from [10]
Adaptive sampling
Once a unit bearing the characteristic of interest is found, its
entire network is included in the sample.
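The expansion step can be sketched as a breadth-first search. This assumes that the neighbours of every sampled unit bearing the characteristic are added, and that the process repeats from each newly added unit that also bears it; the names and the toy graph are illustrative.

```python
from collections import deque

def adaptive_expand(neighbours, z, s0):
    """Extend the initial sample s0 adaptively.

    neighbours: dict unit -> iterable of neighbouring units
    z:          dict or list, z[k] truthy if unit k bears the characteristic
    """
    sample = set(s0)
    queue = deque(k for k in s0 if z[k])
    while queue:
        k = queue.popleft()
        for j in neighbours[k]:
            if j not in sample:
                sample.add(j)
                if z[j]:          # keep expanding only from units of interest
                    queue.append(j)
    return sample

# Toy path graph 0-1-2-3-4-5 (hypothetical); units 1, 2 and 5 bear the characteristic.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
z = [0, 1, 1, 0, 0, 1]
print(adaptive_expand(neighbours, z, {1}))
```

Starting from unit 1, the sample grows to the network {1, 2} plus its boundary units 0 and 3; starting from unit 4, nothing is added.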
Adaptive sampling
Estimator:
$$\hat{N}_{C4} = \sum_{k=1}^{K} \frac{n^*_{C_k} J_k}{\pi_{g_k}}$$
where:
$K$ = number of networks
$y^*_k$ = total of Y in network k
$n^*_{C_k}$ = number of people with $y_k \geq 1$ in network k
$J_k = 1\{k \in C\}$
$\pi_{g_k}$ = probability that the initial sample intersects network k
Adaptive sampling
With an adaptive design, it is often better to use the
Rao-Blackwellized version of the previous estimator. It has a very simple
closed form in the case of the stratified adaptive design:
$$\hat{N}_{C5} = n_0 + \sum_{k=1}^{K} \frac{n_r}{1 - (1 - p)^{n_r}}$$
where $n_0 = \# s_0$ and $s_0 = \cup_r \{k \in s, \delta(k, C) = 1\}$ is the union of
the edge units of C (the units at distance 1 from C).
Adaptive sampling - Variance
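$$\hat{V}(\hat{N}_{C4}) = \sum_{k=1}^{K} \sum_{k'=1}^{K} \frac{y_k y_{k'}}{\pi_{g_{kk'}}} \left( \frac{\pi_{g_{kk'}}}{\pi_{g_k} \pi_{g_{k'}}} - 1 \right)$$
where, for $k \neq k'$:
$$\pi_{g_{kk'}} = 1 - (1 - \pi_{g_k}) - (1 - \pi_{g_{k'}}) + (1 - p)^{n_{g_k} + n_{g_{k'}}}$$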
Adaptive sampling - Variance
Variance estimation for the Rao-Blackwell estimator can be done by drawing
m samples:
$$\hat{V}(\hat{N}_{C5}) = \hat{V}(\hat{N}_{C4}) - \frac{1}{m - 1} \sum_{i=1}^{m} (\hat{N}_{C5i} - \hat{N}_{C4})^2$$
Section 4
Application to Twitter data
Subsection 1
The problem
The Twitter graph
Twitter in 2013
Image from [1]
The Twitter API
Twitter data is accessed through an API (application
programming interface), which limits the number of calls per hour.
Example : Star Wars : The Force Awakens
How many (real) users are behind the tweets about the new Star
Wars movie?
Example : “Star Wars, The Force Awakens”
Let's write:
$y_k$ = number of tweets @starwars by user k
between 7:48 and 10:48 PM EST on 10/29/15
$z_k = 1\{y_k \geq 1\}$
Goal: estimate $N_C = T(Z)$
Additionally, we write: $n_C = \sum_{k \in s} z_k$
The Twitter graph
The Twitter graph ([6]) :
Is directed
Degree distribution is heavy-tailed
Has small path lengths
Sampling designs
1 Bernoulli sample
2 Stratified Bernoulli
3 Snowball over the stratified Bernoulli
4 Adaptive over the stratified Bernoulli
5 (Rao-Blackwell version of the adaptive estimator)
Stratification
U1 = Followers of official @starwars account
U2 = Rest of Twitter users
Stratification : Neyman allocation
Given some preliminary exploratory data, we get (for n = 20000):
n_1 = 9700
n_2 = 10300
Sample size - extension
Size of s0 : 1000 (so that total sample size, with extensions, would
be about n = 20000).
Calibration variables
N = Number of users in scope
Structure of number of followers
Number of verified users
. . .
Estimators
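$$\hat{N}_{C1} = \frac{n_C}{p}$$
$$\hat{N}_{C2} = \frac{N_1}{n_1} n_{C1} + \frac{N - N_1}{n_2} n_{C2}$$
$$\hat{N}_{C3} = \sum_{i \in s} \frac{z_i}{1 - \bar{\pi}(B_i)}$$
$$\hat{N}_{C4} = \sum_{k=1}^{K} \frac{n^*_{C_k} J_k}{\pi_{g_k}}$$
$$\hat{N}_{C5} = n_0 + \sum_{k=1}^{K} \frac{n_r}{1 - (1 - p)^{n_r}}$$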
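The first two estimators are plain expansion estimators and can be sketched in a few lines. The function names and the counts below are hypothetical and are not the study's data.

```python
def bernoulli_estimate(n_C, p):
    """N_C1: number of sampled users in C, expanded by the sampling rate p."""
    return n_C / p

def stratified_estimate(N, N1, n1, n_C1, n2, n_C2):
    """N_C2: per-stratum expansion, the second stratum having N - N1 units.

    n1, n2:     sample sizes in strata 1 and 2
    n_C1, n_C2: sampled users in C found in each stratum
    """
    return N1 / n1 * n_C1 + (N - N1) / n2 * n_C2

# Hypothetical counts for illustration only.
print(bernoulli_estimate(n_C=12, p=0.25))
print(stratified_estimate(N=1000, N1=200, n1=50, n_C1=10, n2=100, n_C2=4))
```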
Exclusion probabilities
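$$\bar{\pi}(B_i) = P(B_i \subset \bar{s}_0) = \prod_{k \in B_i} (1 - P(k \in s_0)) = q_{S_1}^{\#(B_i \cap U_1)} \cdot q_{S_2}^{\#(B_i \cap U_2)}$$
where $\bar{s}_0$ is the complement of the initial sample $s_0$ and $q_{S_h}$ is the probability that a unit of stratum $U_h$ is not selected in $s_0$.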
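A minimal sketch of this product under a stratified Bernoulli initial design, assuming q[h] is the probability that a unit of stratum h is left out of the initial sample. The data structures and values are hypothetical.

```python
def stratified_exclusion_probability(B, stratum, q):
    """Probability that no unit of B is in the initial sample.

    B:       set of units (a unit and its predecessors)
    stratum: dict unit -> stratum label
    q:       dict stratum label -> exclusion probability 1 - p_h
    """
    prob = 1.0
    for k in B:
        prob *= q[stratum[k]]
    return prob

# Hypothetical example: one unit of B lies in stratum 1, two in stratum 2.
B_i = {1, 2, 3}
stratum = {1: 1, 2: 2, 3: 2}
q = {1: 0.5, 2: 0.9}
print(stratified_exclusion_probability(B_i, stratum, q))
```

The result equals q[1] ** 1 * q[2] ** 2, which is the closed form above with the two stratum counts as exponents.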
Subsection 2
Results
Results
Design       n        n_scope   n_0    \hat{N}_C   \hat{CV}   \hat{Deff}
Bernoulli    20013    3946      -      354121      0.231      1.04
Stratified   20094    9832      -      316889      0.097      0.68
1-snowball   159957   73570     1000   331097      0.031      0.60
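One way to read the table is to turn each row into an approximate interval. The sketch below assumes that the CV column is the estimated coefficient of variation (standard error divided by the estimate) and that a normal approximation is acceptable; neither assumption is stated on the slide, and the function name is illustrative.

```python
def normal_interval(estimate, cv, z=1.96):
    """Approximate interval: estimate +/- z * cv * estimate."""
    half_width = z * cv * estimate
    return estimate - half_width, estimate + half_width

# (estimate, estimated CV) pairs copied from the results table.
rows = {
    "Bernoulli": (354121, 0.231),
    "Stratified": (316889, 0.097),
    "1-snowball": (331097, 0.031),
}
for design, (estimate, cv) in rows.items():
    low, high = normal_interval(estimate, cv)
    print(f"{design}: {low:.0f} to {high:.0f}")
```

Under these assumptions the snowball interval is by far the narrowest, in line with its smaller coefficient of variation.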
Results
Mean number of tweets @StarWars per user : 1.18 ± 0.07
Suggests that bots are not responsible for this very large number of
tweets (see [4], [3]) !
Adaptive sampling did not converge.
Subsection 3
Model-assisted sampling
Auxiliary information for the Barabási-Albert model:
Degree Centrality Local clustering Mean path Max path
Degree ++ - - - -
Centrality - - - -
Local clustering + +
Mean path ++
Max path
Future work
Combine all these (optimal allocations, etc.)
Asymptotics
Conclusion
Thank you !
http://nc233.com/madstat2017
@nc233
[1] Paul Burkhardt and Chris Waring. An NSA big graph experiment. Presentation at the Carnegie Mellon University SDI/ISTC Seminar, Pittsburgh, PA, 2013.
[2] Jean-Claude Deville and Carl-Erik Särndal. Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418):376–382, 1992.
[3] Emilio Ferrara. "Manipulation and abuse on social media" by Emilio Ferrara with Ching-man Au Yeung as coordinator. SIGWEB Newsl., (Spring):4:1–4:9, April 2015.
[4] Emilio Ferrara, Onur Varol, Clayton Davis, Filippo Menczer, and Alessandro Flammini. The rise of social bots. arXiv preprint arXiv:1407.5225, 2014.
[5] Eric D. Kolaczyk. Statistical analysis of network data. Springer, 2009.
[6] Seth A. Myers, Aneesh Sharma, Pankaj Gupta, and Jimmy Lin. Information network or social network?: the structure of the Twitter follow graph. In Proceedings of the companion publication of the 23rd International Conference on World Wide Web, pages 493–498. International World Wide Web Conferences Steering Committee, 2014.
[7] Art B. Owen. Empirical likelihood. CRC Press, 2010.
[8] Tiago P. Peixoto. The graph-tool python library. figshare, 2014.
[9] Steven K. Thompson. Adaptive cluster sampling. Journal of the American Statistical Association, 85(412):1050–1059, 1990.
[10] Steven K. Thompson. Stratified adaptive cluster sampling. Biometrika, pages 389–397, 1991.