Sinks Method Paper Presentation @ Duke Political Networks Conference 2010
1. Distance Measures for Dynamic Citation Networks
M. Bommarito D. Katz J. Zelner J. Fowler
May 21, 2010
M. Bommarito, D. Katz, J. Zelner, J. Fowler Distance Measures for Dynamic Citation Networks
() May 21, 2010 1 / 21
2. Outline
1 Goals
Supreme Court Citation Network
2 Citation Dynamics and Sinks
3 Distance Measures for Dynamic Citation Networks
4 How does the “sink” method perform?
Simulation Results
United States Supreme Court
5 Conclusion and Future Directions
3. Goals Supreme Court Citation Network
Goals & Data
Goal: Can we uncover various mesoscopic patterns within the
jurisprudence of the United States Supreme Court?
1 |V| ≈ 36k, |E| ≈ 280k
2 1791-2005
4. Goals Supreme Court Citation Network
Standard Solution
Standard Solution: Obtain vertex community membership by
applying an out-of-the-box community detection method.
Methods:
1 Edge-Betweenness (Girvan & Newman 2002)
2 Fast-Greedy (Clauset et al. 2004)
3 Leading (or more) Eigenvector (Newman 2006, Richardson et al.
2009)
4 Walktrap (Pons & Latapy 2006)
5. Goals Supreme Court Citation Network
Expectations
Expectation: Dyadic relationships should be fairly stable.
If two vertices are in the same community m at t, they should be in the
same community n (not necessarily identical to m) at t + 1.
Formally, this can be written as “pairwise stability” σ:
$\sigma = P(C_i^{t+1} = C_j^{t+1} \mid C_i^t = C_j^t)$
$C_i^t$: community membership of vertex $i$ at time $t$
This conception of stability avoids many issues with community tracking.
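A minimal sketch of how pairwise stability could be estimated from two consecutive community assignments. The function name and the dictionary inputs are illustrative assumptions, not the authors' implementation. Only pair co-membership is compared, so community labels never need to be matched across time steps.

```python
from itertools import combinations


def pairwise_stability(prev, curr):
    """Estimate sigma = P(C_i^{t+1} = C_j^{t+1} | C_i^t = C_j^t).

    prev, curr: dicts mapping vertex -> community label at t and t + 1.
    Only vertices present at both times are compared.
    Returns None when no pair shares a community at time t.
    """
    shared = sorted(set(prev) & set(curr))
    together_before = 0
    still_together = 0
    for i, j in combinations(shared, 2):
        if prev[i] == prev[j]:
            together_before += 1
            if curr[i] == curr[j]:
                still_together += 1
    if together_before == 0:
        return None
    return still_together / together_before


# Hypothetical memberships; labels are renumbered between t and t + 1.
membership_t = {"a": 0, "b": 0, "c": 0, "d": 1}
membership_t1 = {"a": 5, "b": 5, "c": 7, "d": 7, "e": 7}
sigma = pairwise_stability(membership_t, membership_t1)
```

Here three pairs share a community at t, and only (a, b) is still together at t + 1, so the estimate is 1/3.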
6. Goals Supreme Court Citation Network
Results
[Figure: results by method; panels: Fast-Greedy, Eigenvector]
The results of these approaches do not match our expectation.
7. Goals Supreme Court Citation Network
Research Source
Title: On the Stability of Community Detection Algorithms on
Longitudinal Citation Data.
Michael J. Bommarito II, Daniel M. Katz, Jonathan L. Zelner.
Forthcoming in Proceedings of ASNA 2009 (ETH-Zurich).
Goal: Compare out-of-the-box community detection methods under
different parameters of a citation model w.r.t.:
1 Average number of resulting communities across all time steps
2 Average pairwise stability of all vertex pairs across all time steps
8. Goals Supreme Court Citation Network
Results
9. Goals Supreme Court Citation Network
Implications
Citation networks are different.
1 Patterns within citation networks are not well-revealed by these
methods.
2 Qualitative conclusions may vary dramatically based on the chosen
method.
3 The “appropriateness” of each method may depend on parameters of
the generating process.
10. Citation Dynamics and Sinks
Citation Dynamics
What are the basic growth rules of a citation network?
1 Documents and their citations are introduced into the network in
sequence.
2 Documents cannot create new outbound citations after introduction.
These rules guarantee that any resulting network is an acyclic digraph.
The simplest topological ordering is just the order of vertex introduction.
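The two growth rules can be sketched as a small container class. The class and method names are hypothetical. A document may cite only documents already in the network, and its citation set is frozen at introduction, so every arc points backward in the order of introduction and no cycle can form.

```python
class CitationNetwork:
    """Toy citation network that enforces the two growth rules."""

    def __init__(self):
        self.order = []   # documents in order of introduction
        self.cites = {}   # document -> frozenset of cited documents

    def add_document(self, doc, citations=()):
        """Rule 1: introduce a document together with its citations."""
        if doc in self.cites:
            raise ValueError(f"{doc!r} was already introduced")
        unknown = [c for c in citations if c not in self.cites]
        if unknown:
            raise ValueError(f"cannot cite documents not yet introduced: {unknown}")
        self.order.append(doc)
        # Rule 2: the citation set is immutable after introduction.
        self.cites[doc] = frozenset(citations)

    def is_acyclic(self):
        """True if every arc points to an earlier-introduced document."""
        position = {doc: k for k, doc in enumerate(self.order)}
        return all(
            position[cited] < position[doc]
            for doc, targets in self.cites.items()
            for cited in targets
        )


net = CitationNetwork()
net.add_document("A")
net.add_document("B", ["A"])
net.add_document("C", ["A", "B"])
```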
11. Citation Dynamics and Sinks
Dynamic Acyclic Digraphs
What properties do we have?
1 Each component has at least one “sink” and one “source.”
2 Sinks are vertices with zero out-degree. The first vertex in the
order of introduction must be a sink.
3 Sources are vertices with zero in-degree. The last vertex in the
order of introduction must be a source.
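A short sketch of how sinks and sources could be read off an adjacency mapping, assuming arcs run from the citing document to the cited one. The function name and the toy data are illustrative.

```python
def sinks_and_sources(cites):
    """Return (sinks, sources) for a citation digraph.

    cites: dict mapping each document to the documents it cites.
    Sinks have zero out-degree; sources have zero in-degree.
    """
    vertices = set(cites)
    for targets in cites.values():
        vertices.update(targets)
    cited = {t for targets in cites.values() for t in targets}
    sinks = {v for v in vertices if not cites.get(v)}
    sources = vertices - cited
    return sinks, sources


# Hypothetical network: D is isolated, so it is both a sink and a source.
cites = {"A": [], "B": ["A"], "C": ["A", "B"], "D": []}
sinks, sources = sinks_and_sources(cites)
```

In this toy network A is the first document introduced and is a sink, while C is the last document in its component and is a source.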
12. Citation Dynamics and Sinks
Sinks
If sinks have zero out-degree, they must represent the point at
which at least one idea is introduced into the network.
Either the document “invents” the idea or the head of the citation arc was
not sampled in the dataset.
Weak vs. Strong - Dimensional Data can help identify Weak Sinks
13. Citation Dynamics and Sinks
Six Degrees of Marbury v. Madison
14. Distance Measures for Dynamic Citation Networks
Basic Idea of the Distance Measure
If two vertices share more “ideas,” they should be more similar.
Alternative Example: Articles in Political Science
1 American Politics
2 Congress
3 Committee Assignments
4 Formal Theory
We want to be able to use clustering methods, so we then construct a
distance measure from this basic premise.
15. Distance Measures for Dynamic Citation Networks
A Simple Distance Measure
Simplest Distance Measure: Proportion of Possibly Shared Ideas
$D_{i,j} = 1 - \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$
$S_i$: the set of sink vertex IDs for vertex $i$
Note that this is only one way to translate from similarity to distance.
Also note that the distance between vertices i and j does not change over
time.
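A minimal sketch of the simple measure under two assumptions: the mapping lists documents in order of introduction (so each cited document is processed first), and a sink counts as reaching itself. Function names are illustrative.

```python
def sink_sets(cites):
    """Map each document to the set of sinks reachable along citation arcs.

    cites: dict in order of introduction, document -> cited documents.
    Assumption: a sink's own sink set is {itself}.
    """
    sinks_of = {}
    for doc, targets in cites.items():
        if targets:
            sinks_of[doc] = frozenset().union(*(sinks_of[t] for t in targets))
        else:
            sinks_of[doc] = frozenset([doc])
    return sinks_of


def sink_distance(s_i, s_j):
    """D_ij = 1 - |S_i & S_j| / |S_i | S_j| (a Jaccard-style distance)."""
    return 1 - len(s_i & s_j) / len(s_i | s_j)


# Hypothetical network with two sinks, A and B.
cites = {"A": [], "B": [], "C": ["A"], "D": ["A", "B"], "E": ["B"]}
S = sink_sets(cites)
d_cd = sink_distance(S["C"], S["D"])  # one shared sink out of two
d_ce = sink_distance(S["C"], S["E"])  # no shared sinks
```

Because a document's citations are fixed at introduction, its sink set is fixed as well, which is why the distance between two existing documents stays constant as the network grows.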
16. Distance Measures for Dynamic Citation Networks
Flexible Framework for More Detailed Specifications
What if the story is more complicated?
1 Minimum path length to a sink
2 Number of paths to a sink
3 Total number of shared ancestors
4 Total elapsed time along path
Example with arbitrary f for path length and number of shared
ancestors:
$D_{i,j} = 1 - \frac{\sum_{s \in S_i \cap S_j} f(A_{i,s}, P_{i,s}, A_{j,s}, P_{j,s})}{\sum_{s \in S_i \cup S_j} f(A_{i,s}, P_{i,s}, A_{j,s}, P_{j,s})}$
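A hedged sketch of the weighted form. The slides do not define the arguments of f, so this example makes its own assumptions: it uses only minimum path lengths (read here as P), leaves out the ancestor term A, and passes None for a sink a vertex cannot reach. The weighting function `closeness` is hypothetical.

```python
def min_path_lengths(cites):
    """Map document -> {sink: minimum number of arcs from document to sink}.

    cites: dict in order of introduction, document -> cited documents.
    """
    lengths = {}
    for doc, targets in cites.items():
        if not targets:
            lengths[doc] = {doc: 0}
            continue
        best = {}
        for t in targets:
            for sink, dist in lengths[t].items():
                if sink not in best or dist + 1 < best[sink]:
                    best[sink] = dist + 1
        lengths[doc] = best
    return lengths


def generalized_distance(p_i, p_j, f):
    """1 - (sum of f over shared sinks) / (sum of f over all sinks of i or j)."""
    shared = p_i.keys() & p_j.keys()
    either = p_i.keys() | p_j.keys()
    numerator = sum(f(p_i[s], p_j[s]) for s in shared)
    denominator = sum(f(p_i.get(s), p_j.get(s)) for s in either)
    return 1 - numerator / denominator


def closeness(p_a, p_b):
    """Hypothetical f: weight a sink by its distance to the nearer vertex."""
    reachable = [p for p in (p_a, p_b) if p is not None]
    return 1 / (1 + min(reachable))


cites = {"A": [], "B": [], "C": ["A"], "E": ["B"], "D": ["C", "E"]}
P = min_path_lengths(cites)
d_weighted = generalized_distance(P["C"], P["D"], closeness)
d_uniform = generalized_distance(P["C"], P["D"], lambda a, b: 1)
```

With f constant, the expression reduces to the simple proportion of shared sinks (0.5 here). The closeness weighting gives 0.4 because the unshared sink B is farther away and so counts for less. Any non-negative f with a positive total keeps the result in [0, 1].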
17. How does the “sink” method perform? Simulation Results
Simulation
1 Directed
2 Two vertex types
3 Asymmetric vertex connection probabilities
4 Preferential attachment mechanism (Two-Dimensional)
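One way the listed ingredients could fit together, as a sketch only. The slides give no parameter values or exact rules, so everything here is an assumption: the function name, the `affinity` matrix, the number of citations `m`, and the reading of "two-dimensional" as a citation weight that depends on both in-degree and vertex type.

```python
import random


def simulate_citation_network(n, m, affinity, seed=0):
    """Grow a directed citation network with two vertex types.

    n: number of documents; m: citations attempted per new document.
    affinity[a][b]: relative propensity of a type-a document to cite a
    type-b document (asymmetric when affinity[a][b] != affinity[b][a]).
    Returns (types, cites) with cites[new] = sorted list of cited ids.
    """
    rng = random.Random(seed)
    types, in_degree, cites = [], [], {}
    for new in range(n):
        new_type = rng.randrange(2)
        chosen = set()
        while len(chosen) < min(m, new):
            # Preferential attachment: weight grows with in-degree,
            # scaled by the type-to-type affinity.
            weights = [
                0.0 if old in chosen
                else (in_degree[old] + 1) * affinity[new_type][types[old]]
                for old in range(new)
            ]
            if sum(weights) == 0:
                break
            chosen.add(rng.choices(range(new), weights=weights)[0])
        for old in chosen:
            in_degree[old] += 1
        types.append(new_type)
        in_degree.append(0)
        cites[new] = sorted(chosen)
    return types, cites


# Hypothetical affinities: type 0 mostly cites type 0, type 1 is indifferent.
affinity = [[0.8, 0.2], [0.5, 0.5]]
types, cites = simulate_citation_network(50, 3, affinity, seed=1)
```

Each new document cites only earlier documents, so the simulated network is an acyclic digraph and document 0 is always a sink.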
18. How does the “sink” method perform? Simulation Results
Simulation Results
19. How does the “sink” method perform? United States Supreme Court
United States Supreme Court
Movie Available @
computationallegalstudies.com
The Early Years of the United States Supreme Court
20. How does the “sink” method perform? United States Supreme Court
Supreme Court Results Using the Sink Method
21. Conclusion and Future Directions
Conclusion
1 There are issues with existing community detection methods in
dynamic citation networks.
2 Our sink-based method provides more reasonable qualitative results
than other methods we’ve tried.
3 Future work: application to a larger segment of the SCOTUS data, together
with a qualitative strategy designed to evaluate the outputs.