Frontiers of
Computational Journalism
Columbia Journalism School
Week 8: Visualization and Network Analysis
November 7, 2018
This class
• Visualization as perception
• Visualization design
• Social network theory
• Network analysis in journalism
Visualization as Perception
Topic links in Gödel, Escher, Bach
“Visualization allows people to offload cognition to the
perceptual system, using carefully designed images as a
form of external memory.
The human visual system is a very high-bandwidth
channel to the brain, with a significant amount of
processing occurring in parallel and at the pre-conscious
level.”
- Tamara Munzner
Pop-Out Effects
Visual Comparisons
length
orientation
size
color
...plus number, shape, relative motion, and much more
Basic idea of visualization:
Turn something you want to find
into something you can see
without thinking about it
correlations
clusters
extents
outliers
Design Study Methodology: Reflections from the Trenches and the Stacks, Sedlmair et al., 2012
Visualization Design
Inward and Outward Grand Challenges for Visualization, Tamara Munzner
Sequential Narrative
What’s Really Warming The World?, Bloomberg
Visualization isn’t “objective,” but that doesn’t mean you can’t mislead. (Is
this graph misleading?)
Social Network Theory
Network
A set of people
and a set of connections between pairs of them
Types of connections
Social network analysis: only one type of connection between
individuals (e.g. "friend")
Link analysis: multiple types of connections
friend
brother
employer
went to university with
sold a car to
owns 51% of
Link analysis is much more relevant to journalism, because it
allows representation of much more detail and context.
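A minimal sketch of how link-analysis data might be stored, assuming each connection is kept as a (source, relation, target) triple. The entity names and the helpers `relations_by_type` and `links_of` are hypothetical, invented for illustration only.

```python
from collections import defaultdict

# Hypothetical link-analysis data: (source, relation, target) triples.
# All names are invented; the relation labels echo the slide's examples.
edges = [
    ("Alice", "friend", "Bob"),
    ("Carol", "employer", "Alice"),
    ("Bob", "sold a car to", "Carol"),
    ("Carol", "owns 51% of", "Acme Holdings"),
]

def relations_by_type(triples):
    """Group (source, target) pairs under each relation type."""
    grouped = defaultdict(list)
    for source, relation, target in triples:
        grouped[relation].append((source, target))
    return dict(grouped)

def links_of(entity, triples):
    """Every typed link that touches the given entity, in either direction."""
    return [t for t in triples if entity in (t[0], t[2])]

by_type = relations_by_type(edges)
carol_links = links_of("Carol", edges)
```

Keeping the relation type on every edge is what separates this from a plain social network, where a single set of pairs would be enough.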
People Act in Groups
Family and friendships: I am most closely connected to a small set of people,
who are usually closely connected to each other.
Business: I am much more likely to do business with people I already know.
Influence: I listen to people I know more than I listen to strangers.
Norms: what is right depends on what the people around me think.
People tend to marry, do business with, spend time with, etc. people from similar
backgrounds... and people who have social ties tend to be similar.
Two major analysis methods
…after you have the network data, which may be a very manual
process.
• Look at a visualization
• Apply algorithm
In both cases, the results are not interpretable without context.
A “sociogram” of a fraternity from Moreno’s Who Shall Survive? (1934). Arrows show one-way
“attraction” and lines with a crossbar show “mutual attraction.”
Force-Directed Layout
Each edge is a "spring" with a fixed preferred length.
Plus global repulsive force that pushes all nodes apart.
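The two forces above can be sketched as a toy simulation. This is an illustrative sketch, not the algorithm of any particular layout tool: the inverse-square repulsion, the constants, the step cap `max_step`, and the function name `force_directed_layout` are all assumptions.

```python
import math
import random

def force_directed_layout(nodes, edges, rest_length=1.0, spring_k=0.1,
                          repulsion_k=0.05, max_step=0.2, steps=500, seed=0):
    """Toy layout: a spring on every edge plus repulsion between all node pairs."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        # Global repulsion: every pair of nodes pushes apart (inverse-square).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion_k / (dist * dist)
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Springs: each edge moves its endpoints toward the preferred length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = spring_k * (dist - rest_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist
        # Move each node along its net force, capping the step for stability.
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            scale = min(1.0, max_step / mag) if mag > 0 else 0.0
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return pos

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
positions = force_directed_layout(["a", "b", "c"], triangle)
```

Connected nodes settle slightly farther apart than the preferred spring length, because the repulsion keeps pushing after the spring is relaxed.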
The Effect of Graph Layout on Inference from Social Network Data,
Blythe et al.
We asked respondents three questions about the same five
focal nodes in each sociogram:
1) how many subgroups were in the sociogram
2) how “prominent” was each player in the sociogram
3) how important a “bridging” role did each player occupy in
the sociogram
Centrality
Often identified with "influence" or "power." Often important in journalism.
We can visualize the graph and use our eyes, or we can compute centrality
values algorithmically.
Degree centrality: number of edges
Models: cases where the number of connections is important.
Example: which celebrity can reach the most people at once?
Closeness centrality: inverse of the average distance to all other nodes
Models: cases where time taken to reach a node is important.
Example: who finds out about gossip first?
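A small sketch of degree and closeness centrality on a hypothetical five-person network (the graph and function names are invented for illustration). Closeness is computed here as the inverse of the average shortest-path distance, so a higher value means a node is closer to everyone else.

```python
from collections import deque

# Hypothetical undirected network as an adjacency list.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
}

def degree_centrality(g):
    """Number of edges attached to each node."""
    return {node: len(neighbors) for node, neighbors in g.items()}

def shortest_path_lengths(g, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in g[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness_centrality(g):
    """Inverse of the average distance to every other reachable node."""
    result = {}
    for node in g:
        dist = shortest_path_lengths(g, node)
        total = sum(d for other, d in dist.items() if other != node)
        result[node] = (len(dist) - 1) / total if total else 0.0
    return result

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
```

In this toy network A wins on both measures, but D has higher closeness than B even though the gap in degree is only one edge, because D sits nearer to E.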
Betweenness centrality:
number of shortest paths that pass through a node
Models: cases where control over transmission is important.
Example: who has the most power to make introductions?
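One common way to compute betweenness is Brandes' algorithm, sketched below for an unweighted, undirected graph. The slide does not name an algorithm, so this choice and the toy graph are assumptions. When two nodes are joined by several equally short paths, each path counts fractionally.

```python
from collections import deque

# Hypothetical undirected network as an adjacency list.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A", "E"],
    "E": ["D"],
}

def betweenness_centrality(g):
    """Brandes' algorithm for unweighted, undirected graphs."""
    score = {v: 0.0 for v in g}
    for s in g:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in g}
        sigma = {v: 0 for v in g}
        dist = {v: -1 for v in g}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in g}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

betweenness = betweenness_centrality(graph)
```

Here A lies on five of the shortest paths between other pairs and D on three, while B, C and E lie on none: nobody needs them to make an introduction.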
Eigenvector centrality:
how likely you are to end up at a node on a random walk
(same idea as PageRank)
Models: cases where importance of neighbors is important.
Example: the private adviser to the president
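A minimal sketch of eigenvector centrality by power iteration, on a hypothetical network. Each node repeatedly takes on the summed scores of its neighbors; adding the node's own score at each step (iterating with A + I rather than A) is an assumption made here so the iteration also converges on graphs without odd cycles.

```python
import math

# Hypothetical network: a triangle A-B-C, with D attached to A and E attached to D.
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B"],
    "D": ["A", "E"],
    "E": ["D"],
}

def eigenvector_centrality(g, iterations=100):
    """Leading eigenvector of the adjacency matrix, found by power iteration."""
    score = {v: 1.0 for v in g}
    for _ in range(iterations):
        # Each node keeps its own score and adds its neighbors' scores.
        nxt = {v: score[v] + sum(score[w] for w in g[v]) for v in g}
        norm = math.sqrt(sum(x * x for x in nxt.values()))
        score = {v: x / norm for v, x in nxt.items()}
    return score

scores = eigenvector_centrality(graph)
```

B and D both have two connections, but B scores higher because both of its neighbors are well connected, while one of D's neighbors is the peripheral E.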
Journalism centrality:
how important is this person to this story?
Finding Communities
No one definition of "community." Could mean a town, or a club, or an industry
network.
But for our purposes, a community is "a group of people with pre-existing
patterns of association."
In social network analysis, that translates into clusters in the graph.
Friends/followers
Co-consumption – Network of political book sales, Orgnet.com
Communications network – Exploring Enron, Jeffrey Heer
Web link structure – Map of Iranian Blogosphere, Berkman Center
Individual time/location trails – CitySense, Sense Networks
Mathematical definitions of "cluster"
You've already seen several. If you can compute distance between any two
items, you can cluster.
But in social networks, not everyone is connected to everyone else...
Modularity
Are there more intra-group edges than we would
expect randomly?
Modularity
n = number of vertices
$k_i$ = degree of vertex i
$A_{ij}$ = 1 if there is an edge between i and j, 0 otherwise
$g_{ij}$ = 1 if i and j are in the same group, 0 otherwise
There are $m = \frac{1}{2}\sum_i k_i$ total edges in the graph.
If they go between random vertices, then the expected number of
edges between i and j is $k_i k_j / 2m$.
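One way to see where the expected count comes from, assuming edges are rewired at random while every vertex keeps its degree (an assumption the slide does not state explicitly):

```latex
% Each edge has two ends, so the graph has 2m edge ends in total.
\sum_i k_i = 2m \quad\Longrightarrow\quad m = \tfrac{1}{2}\sum_i k_i

% Vertex i has k_i edge ends; each one lands on one of vertex j's k_j ends
% with probability of roughly k_j / 2m.
\mathbb{E}[\text{edges between } i \text{ and } j] \approx k_i \cdot \frac{k_j}{2m} = \frac{k_i k_j}{2m}
```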
Modularity
$Q = \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) g_{ij}$
If Q > 0 then there are "excess" edges inside the
groups (and fewer edges between them).
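A short sketch that evaluates Q for a given grouping, using the unnormalized form above. The example graph (two triangles joined by one edge) and the group labels are hypothetical.

```python
from itertools import product

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
group = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}

def modularity(edges, group):
    """Unnormalized Q: sum over same-group pairs of (A_ij - k_i k_j / 2m)."""
    nodes = sorted(group)
    m = len(edges)
    adjacent = set(edges) | {(b, a) for a, b in edges}
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    q = 0.0
    for i, j in product(nodes, repeat=2):
        if group[i] == group[j]:  # g_ij = 1
            a_ij = 1 if (i, j) in adjacent else 0
            q += a_ij - degree[i] * degree[j] / (2 * m)
    return q

q = modularity(edges, group)
```

For this grouping Q is positive, so the two triangles hold more internal edges than random wiring would predict. Many references divide this sum by 2m so that values are comparable across graphs of different sizes.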
Modularity algorithm
• Look for a division of nodes into two groups that maximizes Q
• Can find this through eigenvector technique
• Possible that no division has Q>0, in which case the graph is a
single community
• If a division with Q>0 found, split
• Recursively split sub-graphs
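A sketch of a single bisection step from the list above, assuming the "eigenvector technique" means splitting nodes by the sign of the leading eigenvector of the matrix $B_{ij} = A_{ij} - k_i k_j / 2m$. The power iteration with a diagonal shift, the start vector, and the function names are illustrative choices; the recursive splitting of sub-graphs is omitted.

```python
import math

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

def modularity_matrix(nodes, edges):
    """B_ij = A_ij - k_i k_j / 2m as a list of rows."""
    m = len(edges)
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    adjacent = set(edges) | {(b, a) for a, b in edges}
    return [[(1 if (i, j) in adjacent else 0) - degree[i] * degree[j] / (2 * m)
             for j in nodes] for i in nodes]

def leading_eigenvector(matrix, iterations=500):
    """Power iteration on B + cI; the shift c makes every eigenvalue non-negative."""
    n = len(matrix)
    shift = max(sum(abs(x) for x in row) for row in matrix)
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [shift * vec[i] + sum(matrix[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec

def split_in_two(nodes, edges):
    """Return (group_a, group_b, Q), or None when the split does not have Q > 0."""
    b = modularity_matrix(nodes, edges)
    sign = [1 if x > 0 else -1 for x in leading_eigenvector(b)]
    n = len(nodes)
    # For a two-way split, Q is half of sum_ij B_ij s_i s_j (rows of B sum to zero).
    q = 0.5 * sum(b[i][j] * sign[i] * sign[j] for i in range(n) for j in range(n))
    if q <= 1e-9:
        return None
    group_a = {v for v, s in zip(nodes, sign) if s > 0}
    group_b = {v for v, s in zip(nodes, sign) if s < 0}
    return group_a, group_b, q

result = split_in_two(list(range(6)), edges)
```

On this graph the step separates the two triangles. Which one is labelled group_a depends on the arbitrary sign of the eigenvector.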
Network Analysis in Journalism
Case Study: Seattle Art World
In Seattle Art World, Women Run the Show, Seattle Times
Network obtained from
dozens of in-person
interviews. Interactive
visualization in story.
Case Study: Hot Wheels
Hot Wheels, Tampa Bay Times
Network obtained from
juvenile arrest records
concerning stolen cars.
Unpublished visualization
and centrality measures
used to direct reporting to
most interesting people.
Coded 34 Stories for Sources and Uses
Story visualization: published story contains a visualization
Reporting visualization: used to guide reporters, unpublished.
Scraping: network extracted from source documents
Algorithm: centrality, community, etc. used
Graph DB: network loaded into graph database
Results
[Bar chart: number of the 34 coded stories in each category (Total, Story Vis, Scraping, Reporting Vis, Algorithm, Graph DB); vertical axis from 0 to 40.]
Why not algorithms?
Heterogeneous networks. Multiple entity/relationship types. “Link
analysis” like criminal investigations.
Incomplete data. Building out the network is often an interactive
process of data gathering.
Contextual interpretation: What does it mean for someone to be
“central”? Depends on the nature of the network and story.
Correlation of different types of info
Suppose you have a record of phone numbers called, a database of political
campaign donations, and a list of government appointees. Put them together, and
you have this story:
WASHINGTON—Time and again, Texas Gov. Rick Perry picked up his office phone in the
months before he would announce his bid for the presidency. He dialed wealthy friends who
were his big fundraisers and state officials who owed him for their jobs.
Perry also met with a Texas executive who would later co-found an independent political
committee that has promised to raise millions to support Perry but is prohibited from
coordinating its activities with the governor.
- Jack Gillum, Perry called top donors from work phones, AP, 6 Dec 2011
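The cross-referencing behind a story like this can be sketched as a join across the three sources. Every record below is invented, and the lookup table from phone numbers to names (`phone_book`) is an added assumption: in practice that mapping is itself a reporting task.

```python
# Hypothetical records; every name and number is invented for illustration.
calls = [
    {"number": "555-0101", "date": "2011-03-02"},
    {"number": "555-0102", "date": "2011-03-05"},
    {"number": "555-0199", "date": "2011-03-09"},
]
phone_book = {"555-0101": "Person A", "555-0102": "Person B", "555-0199": "Person C"}
donors = {"Person A": 50000}                # name -> total donated
appointees = {"Person B": "State Board X"}  # name -> appointed position

def annotate_calls(calls, phone_book, donors, appointees):
    """Attach donor and appointee information to each phone call."""
    annotated = []
    for call in calls:
        name = phone_book.get(call["number"])
        annotated.append({
            **call,
            "name": name,
            "donated": donors.get(name),
            "appointed_to": appointees.get(name),
        })
    return annotated

def leads(annotated):
    """Calls to anyone who appears in the donor or appointee data."""
    return [row for row in annotated if row["donated"] or row["appointed_to"]]

annotated = annotate_calls(calls, phone_book, donors, appointees)
flagged = leads(annotated)
```

No centrality or clustering is involved here: the value comes entirely from putting different kinds of records side by side.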
The state of the art: Panama Papers
Graph Databases in Theory
Load everything into the database, then analyze using a graph query
language and interactive visualization.
“Magic bullet” for large, complex, cross-border investigations.
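One question a graph query is meant to answer is whether a chain of relationships connects two entities. A minimal sketch in plain Python, using breadth-first search over hypothetical typed links; it does not use or imitate any graph database's query language.

```python
from collections import deque

# Hypothetical typed links; all entities are invented.
links = [
    ("Official X", "brother", "Person Y"),
    ("Person Y", "director of", "Shell Co 1"),
    ("Shell Co 1", "owns 51% of", "Shell Co 2"),
    ("Person Z", "employer", "Person W"),
]

def find_path(links, start, goal):
    """Shortest chain of relationships between two entities, ignoring direction."""
    neighbors = {}
    for source, relation, target in links:
        neighbors.setdefault(source, []).append((relation, target))
        neighbors.setdefault(target, []).append((relation, source))
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for relation, other in neighbors.get(node, []):
            if other not in previous:
                previous[other] = (node, relation)
                queue.append(other)
    if goal not in previous:
        return None
    # Walk back from the goal to rebuild the chain of links.
    path, node = [], goal
    while previous[node] is not None:
        parent, relation = previous[node]
        path.append((parent, relation, node))
        node = parent
    return list(reversed(path))

path = find_path(links, "Official X", "Shell Co 2")
```

The result is a list of hops, each carrying its relationship type, which is the context a reporter needs in order to judge whether the chain means anything.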
Panama Papers networks derived from
structured data only
Entity recognition is not solved!
Incredibly dirty source data. Current methods have low recall (~70%)
[Figure labels: Entities found out of 150; “Soft” record linkage; Unlinked records]
Graph Databases in Practice
Incomplete data. Building a network often requires scraping from documents. Bulk data often
unavailable or impractical, and some records need to be purchased one at a time. Instead,
reporting involves interactive data enrichment.
Record linkage: With N databases, there could be N copies of each entity.
Graph queries are not that helpful. Cypher was available to PP investigators but no one
outside the core team learned it. Moreover, it’s not clear how often reporting problems can be
expressed as a graph query. Even “find path between” did not produce any (documented) leads
on PP.
Networks need to be narratives. The most useful networks are hand-built, for a particular line
of reporting.
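The record linkage problem above can be illustrated with a deliberately simple matcher: normalize each name, then compare with the standard library's `difflib.SequenceMatcher`. The normalization rules, the 0.85 threshold, and the sample names are assumptions; real linkage across dirty databases needs far more than this.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))

def similarity(a, b):
    """Similarity between two names after normalization, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def link_records(names_a, names_b, threshold=0.85):
    """Pair each name in names_a with its best match in names_b, if good enough."""
    matches = []
    for a in names_a:
        best = max(names_b, key=lambda b: similarity(a, b))
        if similarity(a, best) >= threshold:
            matches.append((a, best))
    return matches

# Hypothetical entries from two databases that describe the same entities.
database_1 = ["Smith, John A.", "Acme Holdings Ltd"]
database_2 = ["John A Smith", "ACME Holdings Ltd.", "Jane Doe"]
linked = link_records(database_1, database_2)
```

A threshold below 1.0 is a "soft" match: it catches spelling variants but also lets in false positives, so matches still need a human check before they go into a story.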
Maps, not data visualizations
Query results vs. hand-built graphs
Search for node to add
Graph query results
Proposed System

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 

Frontiers of Computational Journalism week 8 - Visualization and Network Analysis

  • 1. Frontiers of Computational Journalism Columbia Journalism School Week 8: Visualization and Network Analysis November 7, 2018
  • 2. This class • Visualization as perception • Visualization design • Social network theory • Network analysis in journalism
  • 4. Topic links in Gödel, Escher, Bach
  • 6. “Visualization allows people to offload cognition to the perceptual system, using carefully designed images as a form of external memory. The human visual system is a very high-bandwidth channel to the brain, with a significant amount of processing occurring in parallel and at the pre-conscious level.” - Tamara Munzner
  • 11. Visual Comparisons: length, orientation, size, color ...plus number, shape, relative motion, and much more
  • 14. Basic idea of visualization: Turn something you want to find into something you can see without thinking about it
  • 16. Design Study Methodology: Reflections from the Trenches and the Stacks, Sedlmair et al, 2012
  • 18. Inward and Outward Grand Challenges for Visualization, Tamara Munzner
  • 23. Sequential Narrative What’s Really Warming The World?, Bloomberg
  • 24. Visualization isn’t “objective,” but that doesn’t mean you can’t mislead. (Is this graph misleading?)
  • 30. Network A set of people and a set of connections between pairs of them
  • 31. Types of connections Social network analysis: only one type of connection between individuals (e.g. "friend"). Link analysis: multiple types of connections (friend, brother, employer, went to university with, sold a car to, owns 51% of). Link analysis is much more relevant to journalism, because it allows representation of much more detail and context.
  • 32. People Act in Groups Family and friendships: I am most closely connected to a small set of people, who are usually closely connected to each other. Business: I am much more likely to do business with people I already know. Influence: I listen to people I know more than I listen to strangers. Norms: what is right depends on what the people around me think. People tend to marry, do business with, spend time with, etc. people from similar backgrounds... and people who have social ties tend to be similar.
  • 33. Two major analysis methods ...after you have the network data, and collecting it may be a very manual process. • Look at a visualization • Apply an algorithm In both cases, the results are not interpretable without context.
  • 34. A “sociogram” of a fraternity from Moreno’s Who Shall Survive? (1934). Arrows show one-way “attraction” and lines with a cross bar show “mutual attraction.”
  • 35. Force-Directed Layout Each edge is a "spring" with a fixed preferred length, plus a global repulsive force that pushes all nodes apart.
  • 36. The Effect of Graph Layout on Inference from Social Network Data, Blythe et al.
  • 37. The Effect of Graph Layout on Inference from Social Network Data, Blythe et al. We asked respondents three questions about the same five focal nodes in each sociogram: 1) how many subgroups were in the sociogram 2) how “prominent” was each player in the sociogram 3) how important a “bridging” role did each player occupy in the sociogram
  • 38. Centrality Often identified with "influence" or "power." Often important in journalism. We can visualize the graph and use our eyes, or we can compute centrality values algorithmically.
  • 39. Degree centrality: number of edges Models: cases where the number of connections is important. Example: which celebrity can reach the most people at once?
  • 40. Closeness centrality: average distance to all other nodes Models: cases where time taken to reach a node is important. Example: who finds out about gossip first?
  • 41. Betweenness centrality: number of shortest paths that pass through a node Models: cases where control over transmission is important. Example: who has the most power to make introductions?
  • 42. Eigenvector centrality: how likely you are to end up at a node on a random walk (same idea as PageRank) Models: cases where importance of neighbors is important. Example: the private adviser to the president
  • 43. Journalism centrality: how important is this person to this story?
  • 44. Finding Communities No one definition of "community." Could mean a town, or a club, or an industry network. But for our purposes, a community is "a group of people with pre-existing patterns of association." In social network analysis, that translates into clusters in the graph.
  • 46. Co-consumption – Network of political book sales, Orgnet.com
  • 47. Communications network – Exploring Enron, Jeffery Heer
  • 48. Web link structure – Map of Iranian Blogosphere, Berkman Center
  • 49. Individual time/location trails – CitySense, Sense Networks
  • 50. Mathematical definitions of "cluster" You've already seen several. If you can compute distance between any two items, you can cluster. But in social networks, not everyone is connected to everyone else...
  • 51. Modularity Are there more intra-group edges than we would expect randomly?
  • 52. Modularity n = number of vertices; $k_i$ = degree of vertex i; $A_{ij}$ = 1 if edge between i,j, 0 otherwise; $g_{ij}$ = 1 if i,j in same group, 0 otherwise. There are $m = \frac{1}{2} \sum_i k_i$ total edges in the graph. If they go between random vertices then number of edges between i,j is $k_i k_j / (2m)$
  • 53. Modularity n = number of vertices; $k_i$ = degree of vertex i; $A_{ij}$ = 1 if edge between i,j, 0 otherwise; $g_{ij}$ = 1 if i,j in same group, 0 otherwise. Modularity: $Q = \sum_{ij} \left( A_{ij} - k_i k_j / (2m) \right) g_{ij}$. If Q>0 then there are "excess" edges inside the groups (and fewer edges between them).
  • 54. Modularity algorithm • Look for a division of nodes into two groups that maximizes Q • Can find this through eigenvector technique • Possible that no division has Q>0, in which case the graph is a single community • If a division with Q>0 found, split • Recursively split sub-graphs
  • 56. Network Analysis in Journalism
  • 57. Case Study: Seattle Art World. "In Seattle Art World, Women Run the Show," Seattle Times. Network obtained from dozens of in-person interviews. Interactive visualization in story.
  • 58. Case Study: Hot Wheels. "Hot Wheels," Tampa Bay Times. Network obtained from juvenile arrest records concerning stolen cars. Unpublished visualization and centrality measures used to direct reporting to most interesting people.
  • 59. Coded 34 Stories for Sources and Uses. Story visualization: published story contains a visualization. Reporting visualization: used to guide reporters, unpublished. Scraping: network extracted from source documents. Algorithm: centrality, community, etc. used. Graph DB: network loaded into graph database.
  • 60. Results [Bar chart, axis 0 to 40: number of stories for Total, Story Vis, Scraping, Reporting Vis, Algorithm, Graph DB]
  • 61. Why not algorithms? Heterogeneous networks. Multiple entity/relationship types. “Link analysis” like criminal investigations. Incomplete data. Building out the network is often an interactive process of data gathering. Contextual interpretation: What does it mean for someone to be “central”? Depends on the nature of the network and story.
  • 62. Correlation of different types of info Suppose you have a record of phone numbers called, a database of political campaign donations, and a list of government appointees. Put them together, and you have this story: WASHINGTON—Time and again, Texas Gov. Rick Perry picked up his office phone in the months before he would announce his bid for the presidency. He dialed wealthy friends who were his big fundraisers and state officials who owed him for their jobs. Perry also met with a Texas executive who would later co-found an independent political committee that has promised to raise millions to support Perry but is prohibited from coordinating its activities with the governor. - Jack Gillum, Perry called top donors from work phones, AP, 6 Dec 2011
  • 63. The state of the art: Panama Papers
  • 64. Graph Databases in Theory Load everything into the database, then analyze using a graph query language and interactive visualization. “Magic bullet” for large, complex, cross-border investigations.
  • 65. Panama Papers networks derived from structured data only
  • 66. Entity recognition is not solved! Incredibly dirty source data. Current methods have low recall (~70%). [Chart: entities found out of 150]
  • 68. Graph Databases in Practice Incomplete data. Building a network often requires scraping from documents. Bulk data often unavailable or impractical, and some records need to be purchased one at a time. Instead, reporting involves interactive data enrichment. Record linkage: With N databases, there could be N copies of each entity. Graph queries are not that helpful. Cypher was available to PP investigators but no one outside the core team learned it. Moreover, it’s not clear how often reporting problems can be expressed as a graph query. Even “find path between” did not produce any (documented) leads on PP. Networks need to be narratives. The most useful networks are hand-built, for a particular line of reporting.
  • 69. Maps, not data visualizations
  • 71. Query results vs. hand-built graphs [Figure labels: "Search for node to add"; "Graph query results"]
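
The centrality and modularity definitions on slides 39, 40 and 53 can be sketched in a few lines of Python. This is a minimal illustration and not code from the lecture: the six-node toy graph, the function names and the group labels are all assumptions made for the example. Closeness is computed as the plain average distance described on slide 40 (lower means more central), and modularity is left unnormalized as on slide 53, whereas many references divide the same sum by 2m.

```python
from collections import deque

# Hypothetical toy network: two triangles joined by one bridging edge (2-3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Degree centrality as on slide 39: the number of edges at each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def average_distance(adj, source):
    """Closeness as on slide 40: mean shortest-path length to all other nodes.

    Assumes the graph is connected; lower values mean a more central node.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in hops
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others)


def modularity(adj, group_of):
    """Q = sum over i,j of (A_ij - k_i * k_j / 2m) * g_ij, as on slide 53."""
    degree = degree_centrality(adj)
    two_m = sum(degree.values())  # equals 2m, since m = (1/2) * sum of degrees
    q = 0.0
    for i in adj:
        for j in adj:
            if group_of[i] != group_of[j]:
                continue  # g_ij = 0, so this pair contributes nothing
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - degree[i] * degree[j] / two_m
    return q


adj = build_adjacency(EDGES)
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in adj}

print(degree_centrality(adj))
print(average_distance(adj, 2), average_distance(adj, 0))
print(modularity(adj, two_groups), modularity(adj, one_group))
```

On this toy graph, splitting the nodes into the two triangles gives a positive Q, while putting every node into a single group gives a Q of about zero, which matches the reading on slide 53 that Q>0 signals "excess" edges inside the groups. The bridging nodes 2 and 3 have both the highest degree and the lowest average distance, but as slide 61 notes, what such numbers mean depends on the network and the story.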