The presentation argues that the C-Span archive is not a mere repository of moving pictures. It can also be seen as a one-of-a-kind “big data” repository. If processed from a “practice capital” perspective with quantitative and network-analytic tools, such data can significantly extend the capabilities of the C-Span archive by identifying the central actors in a debate and their ability to sway it. The proposed approach may serve the public interest through API tools that support third-party development of visualization and analytic apps, which can lead to more informed debates and new forms of data-driven journalism.
Enhancing C-Span Video Archive with Practice Capital Metadata and data journalism APIs
1. ENHANCING THE C-SPAN ARCHIVE WITH COMMUNICATIVE METADATA: A PRACTICE CAPITAL PROPOSAL
Sorin Adam Matei
Associate Professor
Discovery Park and Polytechnic Institute Fellow
Director of Research for Computational Social Science,
CyberCenter
BRIAN LAMB SCHOOL OF COMMUNICATION
2. DATA EVERYWHERE
• The C-Span Archive is a Big Data repository
• Social and Political Big Data
• Captures not just words or moving images
but INTERACTIONS
3. AN INTERACTION REPOSITORY
• The C-Span archive captures who said what
to whom
• Sender Message Receiver
• Concatenated, such chains of interaction
become SOCIAL NETWORKS OF DEBATE
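The chaining of sender-receiver pairs into a network can be sketched as follows. This is a minimal illustration, assuming each turn of speech has already been coded as a (sender, receiver) pair; the `build_network` helper and the speaker labels are hypothetical and not taken from the C-Span data.

```python
from collections import Counter

def build_network(turns):
    """Concatenate (sender, receiver) pairs into a weighted, directed edge list."""
    edges = Counter()
    for sender, receiver in turns:
        if sender != receiver:  # ignore self-addressed turns
            edges[(sender, receiver)] += 1
    return dict(edges)

# Hypothetical turns: who addressed whom, in order of speaking.
turns = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "A")]
network = build_network(turns)
print(network)
```

Each key is a directed tie and each value counts how often that tie occurred, which is the raw material for the centrality measures discussed later.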
4. COMMUNICATIVE META-DATA
• Each member of the network can be evaluated
for his or her role, importance, and impact
• The role, importance and impact can be turned
into search and visualization criteria both for the
speakers and for what was said
• Meta-data is data that describes the context of
the speech-act and can extend the search past
tags, keywords, author, or time
5. SOCIAL NETWORKS – THE BRIEFEST INTRO
• Mapping people as members of a network
reveals things that are not immediately
apparent
• What is important is not how much you talk to
other people, but how central you are in the
debate
6. THE IMPORTANCE OF BEING CENTRAL
• Centrality
– Simple
• How many conversation partners you have
• Follow the distribution of contributions
– Complex and subtle
• How important are you in the network of
communications?
• If you were not there, would the network be poorer?
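The two "simple" measures can be sketched in a few lines. This is an illustrative example only, assuming ties are given as undirected speaker pairs and contributions as a list of who spoke at each turn; the function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def partner_counts(edges):
    """Simple centrality: number of distinct conversation partners per speaker."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    return {speaker: len(p) for speaker, p in partners.items()}

def contribution_shares(turn_speakers):
    """Distribution of contributions: each speaker's share of all turns."""
    counts = Counter(turn_speakers)
    total = sum(counts.values())
    return {speaker: n / total for speaker, n in counts.items()}

# Hypothetical data.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
degrees = partner_counts(edges)
shares = contribution_shares(["A", "A", "B", "C"])
print(degrees, shares)
```

These counts say how much and with how many people someone talks. They do not say whether the network would be poorer without that person, which is the question the "complex and subtle" measures address.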
7. THE MAGIC OF BETWEENNESS CENTRALITY
1 is the most central
node, although it is not
the most directly
connected
It might even be a very
unimportant (by
attributes) node or
even ignored
It is potentially a
bridge maker and
connector
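A small worked example may make the bridge effect concrete. The sketch below uses the standard shortest-path definition of betweenness (Brandes' algorithm) on an unweighted, undirected graph; it is not necessarily the exact computation behind the slide, and the seven-node graph is a hypothetical stand-in for the pictured network.

```python
from collections import deque

def betweenness(adj):
    """Shortest-path betweenness (Brandes) for an unweighted, undirected graph."""
    scores = {v: 0.0 for v in adj}
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        n_paths = {v: 0 for v in adj}       # number of shortest paths from source
        dist = {v: -1 for v in adj}
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:             # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                scores[w] += dependency[w]
    # Every undirected pair was counted from both endpoints.
    return {v: s / 2 for v, s in scores.items()}

# Hypothetical graph: two tight triangles (2-3-4 and 5-6-7) joined only through node 1.
adj = {1: [4, 5], 2: [3, 4], 3: [2, 4], 4: [1, 2, 3],
       5: [1, 6, 7], 6: [5, 7], 7: [5, 6]}
scores = betweenness(adj)
print(scores)
```

Node 1 has only two direct ties, fewer than nodes 4 and 5, yet every path between the two triangles runs through it, so it receives the highest score.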
8. PRACTICE CAPITAL
• Practice: working together within a human space
• Co-work ties are practice ties, not necessarily
communicative
• Practice ties can be detected via network
analysis
• High betweenness in practice space = high
practice capital
9. HOW DOES THIS MATTER?
• Mapping social conversations as networks
• Reveals the unseen powerbrokers or bridgemakers
• Suggests new information cues and selection
criteria for browsing the videos
• Facilitates a new kind of “data journalism”
10. AN EXAMPLE: JOINT SELECT COMMITTEE ON BUDGET DEFICIT REDUCTION HEARINGS
• October and November 2011
• 17 speakers: representatives, senators, and
former presidential administration
staffers/players
• 280 minutes of conversation
• Over 115 turns of speech
http://c-spanvideo.org/topic/85
11. TURNING CONVERSATIONS INTO NETWORKS
• Analyze who is speaking to whom
• Create conversation ties that decay as more time passes
between turns of speech
• Speakers whose turns are closest to each other are the most connected;
those more distant are exponentially less connected
• The higher the connection, as defined by centrality in practice space,
the higher the practice capital
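The decaying-tie idea can be sketched as below. This is a minimal illustration under stated assumptions: turns are given as (speaker, start minute) pairs, tie strength is the sum of exp(-decay_rate × gap) over pairs of turns, and the `decay_rate` value of 0.5 is arbitrary. The functional form and parameters used in the actual analysis are not given in the slides.

```python
import math
from collections import defaultdict

def decayed_ties(turns, decay_rate=0.5):
    """Sum exp(-decay_rate * gap) over every pair of turns by different speakers.

    turns: list of (speaker, start_minute), sorted by time.
    decay_rate: hypothetical parameter controlling how fast ties weaken.
    """
    ties = defaultdict(float)
    for i, (speaker_i, t_i) in enumerate(turns):
        for speaker_j, t_j in turns[i + 1:]:
            if speaker_i == speaker_j:
                continue  # no tie between a speaker and themselves
            pair = tuple(sorted((speaker_i, speaker_j)))
            ties[pair] += math.exp(-decay_rate * (t_j - t_i))
    return dict(ties)

# Hypothetical turns: A speaks at minute 0, B at minute 2, C at minute 10.
turns = [("A", 0.0), ("B", 2.0), ("C", 10.0)]
ties = decayed_ties(turns)
print(ties)
```

A and B, who speak two minutes apart, end up with a far stronger tie than A and C, who speak ten minutes apart. The resulting weighted ties could then feed a centrality calculation.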
12. TECHNOLOGY WAS TESTED
• Methodology
already applied to
Wikipedia
• We created a
network of 3
million nodes
• Code is written in
Java, is open
source, and will be
released soon
13. TEST ANALYSIS APPLIED TO A C-SPAN DEBATE
[Figure: network graph of the debate. Nodes: Baucus, Becerra, Bowles, Camp, Clyburn, Domenici, Elmendorf, Hensarling, Kerry, Kyl, Murray, Portman, Rivlin, Simpson, Toomey, Upton, Van Hollen]
Two groups, several central talkers. Solid lines mark the strongest relationships.
14. HOW DOES CENTRALITY CHANGE THE STORY?
[Figure: two bar charts, Betweenness Centrality and Speech Minutes, comparing Clyburn, Bowles, Domenici, Rivlin, and Elmendorf. Bar labels surviving from the charts: 72.3, 49.86, 39.25, 32.5, 6]
The highest talkers are not the most central (practice capital) members of the debate
15. THE MODEST PROPOSAL
Add search criteria for centrality, verbosity (amount), and persistence (turns of speech)
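Two of the proposed criteria can be derived directly from a list of turns. The sketch below is illustrative only, assuming each turn is recorded as (speaker, minutes spoken); the function names and sample values are hypothetical. Centrality would come from a separate network calculation and could be stored in the same per-speaker record.

```python
from collections import defaultdict

def speaker_stats(turns):
    """Verbosity = total minutes spoken; persistence = number of turns of speech."""
    stats = defaultdict(lambda: {"verbosity": 0.0, "persistence": 0})
    for speaker, minutes in turns:
        stats[speaker]["verbosity"] += minutes
        stats[speaker]["persistence"] += 1
    return dict(stats)

def search(stats, criterion, minimum):
    """Return speakers meeting a minimum on one criterion, highest first."""
    hits = [s for s, values in stats.items() if values[criterion] >= minimum]
    return sorted(hits, key=lambda s: stats[s][criterion], reverse=True)

# Hypothetical turns of speech.
turns = [("A", 4.0), ("B", 1.5), ("A", 2.0), ("C", 6.5), ("B", 0.5)]
stats = speaker_stats(turns)
most_verbose = search(stats, "verbosity", 6.0)
most_persistent = search(stats, "persistence", 2)
print(most_verbose, most_persistent)
```

In this toy data the most verbose speaker, C, took only one turn, which shows why amount and persistence are worth offering as separate search criteria.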
16. LOOKING FORWARD
• Analyze the entire C-Span video corpus, generate
centrality, verbosity, and persistence scores for each
debater
• Store info, create service that serves data
alongside other metadata
• Allow third-parties to create visualization tools
and apps that indicate degree of connectedness
of speakers in practice space
• Visualize practice capital
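One way a service might expose these scores to third-party apps is as a JSON record per speaker and video. The sketch below is purely hypothetical: the field names, the `metadata_record` helper, and the sample values are assumptions for illustration and do not describe any existing C-Span API.

```python
import json

def metadata_record(video_id, speaker, centrality, verbosity, persistence):
    """Bundle the three proposed scores into one serializable record (hypothetical schema)."""
    return {
        "video_id": video_id,
        "speaker": speaker,
        "centrality": centrality,      # e.g. betweenness in practice space
        "verbosity": verbosity,        # minutes spoken
        "persistence": persistence,    # turns of speech
    }

# Hypothetical values for two speakers in one video.
records = [
    metadata_record("example-video", "A", 9.0, 6.0, 2),
    metadata_record("example-video", "B", 0.0, 2.0, 2),
]
payload = json.dumps(records, sort_keys=True)
print(payload)
```

A visualization app could consume such a payload alongside the existing metadata and size or color speakers by their centrality.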