Social Network Analysis
An overview
Presentation by @dougneedham
Introduction
 @dougneedham
 Data Guy - Started as a DBA in the Marine Corps, evolved to Architect,
now Data Scientist.
 Oracle, SQL Server, Cassandra, Hadoop, MySQL, Spark.
 I have a strong relational/traditional background.
 Perpetual Student
 Learning new things challenges our assumptions. Forces us to take a
new perspective on “old” problems. Eventually maybe even shows us
that there is a better way to solve a problem.
Why study social networks?
 It is cool.
 The concepts around Social Network Analysis can be applied to many
interesting problems in a variety of business verticals.
 The foundation of Social Network Analysis is Graph theory.
 Solving Crime
 Some examples: Introduction to Graph_Theory
What is Social Network Analysis?
 “Social network analysis (SNA) is a strategy for investigating social
structures through the use of network and graph theories. It
characterizes networked structures in terms of nodes (individual actors,
people, or things within the network) and the ties or edges
(relationships or interactions) that connect them. Examples of social
structures commonly visualized through social network analysis include
social media networks, friendship and acquaintance networks, kinship,
disease transmission, and sexual relationships. These networks are often
visualized through sociograms in which nodes are represented as points
and ties are represented as lines.” – Wikipedia
 https://en.wikipedia.org/wiki/Social_network_analysis
Example from Wikipedia:
"Kencf0618FacebookNetwork" by Kencf0618 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
https://commons.wikimedia.org/wiki/File:Kencf0618FacebookNetwork.jpg#/media/File:Kencf0618FacebookNetwork.jpg
A little History
 The 7 Bridges of Königsberg
 Every tome on Graph theory or Network analysis devotes a small
portion of its pages to the 7 Bridges of Königsberg.
 If I don’t cover this with you, the gods of mathematics will strike me
down, and never allow me to do analysis again in the future.
The Bridges
The Problem
 Folks enjoyed their Sunday afternoon strolls across the bridges, and
people would wonder whether there was a route that crossed every bridge
exactly once.
 Eventually Leonhard Euler was brought into the debate.
 Euler used Vertices to represent the land masses and edges (or arcs, at
the time) to represent bridges. He realized that such a walk allows at most
two Vertices with an odd number of edges. Here all four land masses have
an odd number of bridges, which makes the problem unsolvable.
 Sarada Herke provides one of the best explanations of the solution:
Solution to Königsberg
 And here is the cool thing about mathematicians. If we tell you
something is impossible, we have to tell you why in a way you can
understand. In doing so, Euler also founded the branch of mathematics
we now call Graph Theory.
 http://en.wikipedia.org/wiki/Leonhard_Euler
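Euler's argument can be checked in a few lines. This is a minimal sketch, not Euler's own notation: the labels A to D for the land masses are my assumption (A is the central island), and the check assumes the graph is connected.

```python
from collections import Counter

# The four land masses (vertices) and seven bridges (edges) of Königsberg.
# The labels A-D are illustrative; A stands for the central island.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges between the island and one bank
    ("A", "C"), ("A", "C"),  # two bridges between the island and the other bank
    ("A", "D"),              # one bridge between the island and the eastern land mass
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count the edge ends touching each vertex (parallel edges count separately)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """For a connected graph: a walk using every edge exactly once exists
    only when zero or two vertices have odd degree."""
    odd = [v for v, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


deg = degrees(BRIDGES)
print(dict(deg))                 # degree of every land mass
print(has_euler_walk(BRIDGES))   # False: all four land masses have odd degree
```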
Why analyze Facebook data?
 Facebook is something that most people use.
 It is easy to see the relationships and the concepts of the
Graph/Network are intuitive to people who are looking at their “own”
network.
 The main idea is, if you can understand your own friend data, you can
learn the concepts quickly, then apply these same concepts to more
complicated problems.
 We will talk a little about some complicated topics at the end.
A few terms
 Stand back, we are going to talk about math!
 Basically we are talking about a bunch of dots joined together by lines
 Vertex – Dot on a graph
 Edge – Line connecting two Vertices
 Edge_Label – this is a term I coined originally related to Data Structure Graphs that
helps trace a path. If you label your edges, and you have multiple edges with the same
label in a Graph you can quite easily identify walks, paths, and cycles through your
graph.
 Triangle – 3 Vertices, 3 Edges
 Square – 4 Vertices, 4 edges
 Open Triangle – 3 Vertices, 2 edges
 A lot of things are networks if you look at them the right way.
 Mark Newman has given a number of well-done presentations about
Network analysis, available on YouTube.
 https://www.youtube.com/watch?v=lETt7IcDWLI
More terms
 Transitivity – The friend of my friend is my friend. Really?
 Homophily – similar things tend to connect to each other
 Directed Graphs – or Digraphs
 Contagion – How do things “spread” through a network?
 Let’s rearrange things, how does the layout affect understanding?
 Order of a graph – number of vertices
 Size of the graph – number of edges
 This is not just data visualization, it can also be used for prediction.
https://www.youtube.com/watch?v=rwA-y-XwjuU
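To make order, size, and transitivity concrete, here is a small sketch on a made-up friendship graph. The names and the adjacency-map layout are assumptions for illustration. Transitivity is computed as closed triplets divided by all triplets, which answers the "Really?" above with a number.

```python
from itertools import combinations

# A small undirected friendship graph as an adjacency map (hypothetical names).
GRAPH = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}


def order(graph):
    """Order of a graph: the number of vertices."""
    return len(graph)


def size(graph):
    """Size of a graph: the number of edges.
    Each undirected edge shows up in two adjacency sets, so halve the total."""
    return sum(len(nbrs) for nbrs in graph.values()) // 2


def transitivity(graph):
    """Fraction of 'friend of my friend' pairs that really are friends."""
    closed = 0
    triplets = 0
    for nbrs in graph.values():
        # Every pair of neighbours of one vertex forms a triplet centred there.
        for u, v in combinations(sorted(nbrs), 2):
            triplets += 1
            if v in graph[u]:
                closed += 1
    return closed / triplets if triplets else 0.0


print(order(GRAPH), size(GRAPH), transitivity(GRAPH))
```

In this example dan is a friend of ann but of nobody else, which is what pulls the transitivity below 1.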
Final terms
 Centrality – Hub and Authority
 This is almost a whole topic by itself, since there are different types of
Centrality:
 Degree Centrality – Simple, the Vertex with the highest degree (the most
edges) is the most central.
 Eigenvector Centrality – A Vertex is important if it is connected to other
important Vertices.
 PageRank – similar to Eigenvector Centrality, only scaled: if a given
Vertex is linked from a Vertex with a very high PageRank, it is itself given a
high PageRank.
 Serious nutshell definitions.
 Shortest path – What is the fewest number of edges that connects two vertices?
 Longest Path – Tracing the flow of an interesting item through a large
collection of applications.
Why is a path important? More on this
later…
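The nutshell definitions above can be sketched in code. This is a simplified illustration, not what Gephi runs internally: the two sample graphs are hypothetical, and the damping factor of 0.85 and the fixed iteration count are conventional choices I am assuming.

```python
def degree_centrality(graph):
    """Degree of each vertex divided by the largest possible degree (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def pagerank(graph, damping=0.85, iterations=100):
    """Power iteration on a directed graph given as {vertex: set of successors}."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Rank sitting on vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v, out in graph.items() if not out)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in graph}
        for v, out in graph.items():
            for w in out:
                new_rank[w] += damping * rank[v] / len(out)
        rank = new_rank
    return rank


# Hypothetical undirected friendship graph for degree centrality.
FRIENDS = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}

# Hypothetical directed link graph: everything points at "hub", "hub" points at "a".
LINKS = {
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "a"},
    "hub": {"a"},
}

centrality = degree_centrality(FRIENDS)
ranks = pagerank(LINKS)
print(max(centrality, key=centrality.get))
print(sorted(ranks, key=ranks.get, reverse=True))
```

Note how "a" ends up ranked well above "b" and "c": it is linked from the high-PageRank "hub", which is the effect described in the PageRank bullet.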
The Original Joke This is me in different stores
The Math doesn’t change.
 One thing I like about Graphs –
 The Math does not change.
 The math behind Graph theory can be a little intense, but it does not
change regardless of the scale of the graph.
 Once you understand how to “do the math” on a small graph, the
same math applies whether it is a graph of the people in this
room, or a graph of the people on this planet.
 Now, let me introduce you to a tool that does much of the
Mathematics for you…
But first, Netvizz…
 Netvizz is a tool that extracts data from different sections of the Facebook Platform.
 It provides an interface to the Facebook Graph API
 https://www.youtube.com/watch?v=3vkKPcN7V7Q
 For the version of data we will be looking at, I was able to extract friendship connections.
Facebook has since changed their permissions such that you can no longer extract this
information.
 However, there are some other interesting things you can do with Netvizz.
 If you manage a Facebook Group, this might be interesting.
 For this particular talk we are going to focus on Gephi interpretation. If we want to have a
more in-depth talk on Facebook and the Graph API that Facebook has opened, we can
discuss that at another time.
 To get this yourself, go into Facebook and search for: Netvizz. (You have to authorize it. You
can de-authorize it later.)
 You will have a number of options: group data, page data, page like network, search, and
link stats.
 Click “group data”
 Select a group. If you need a sample id, use: 39462256584
 It runs for a bit, then dumps to a zip file.
 Save the file, then extract it.
 Open Gephi, and use Gephi to import your GDF file.
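If you want to peek inside the GDF file before opening it in Gephi, a few lines of Python will do. This is a rough sketch under assumptions: it relies on the general GDF layout (a `nodedef>` header, node rows, an `edgedef>` header, edge rows), and the sample columns below are made up. The columns Netvizz actually writes may differ.

```python
import csv
import io


def parse_gdf(text):
    """Split GDF text into node rows and edge rows (dicts keyed by column name)."""
    nodes, edges = [], []
    columns, target = [], None
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue
        first = row[0]
        if first.startswith("nodedef>") or first.startswith("edgedef>"):
            # A header line switches the section we are filling.
            target = nodes if first.startswith("nodedef>") else edges
            row[0] = first.split(">", 1)[1]
            # Each header cell looks like "name VARCHAR"; keep only the name.
            columns = [cell.split()[0] for cell in row]
            continue
        if target is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges


# Made-up sample in the GDF layout; real exports carry more columns.
SAMPLE = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
"""

nodes, edges = parse_gdf(SAMPLE)
print(len(nodes), "nodes,", len(edges), "edges")
```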
Gephi
http://gephi.github.io/
From the website: “Gephi is an
interactive visualization and exploration
platform for all kinds of networks and
complex systems, dynamic and
hierarchical graphs.”
Java 1.7 required, you may have to set
this in Gephi.conf
Depending on the size of the network
you are studying you may need to
increase the memory available to Java
in Gephi.conf
Gephi Startup
Gephi – Open GML file
Gephi – After opening
Layout
Behavior Options
After running
Partitioning
Metrics
 Remember all those numbers we spoke about?
 Here are many of them.
Data Table
Configure Labels
Here is the layout with the labels as number of connections
Add Background
Visualization
File->Export-> SVG/PDF/PNG…
Export to Excel
How do we use this?
 Finding bottlenecks.
 You have to ignore the fact that everyone on this graph is connected
to you for a moment.
 How would someone get a message to another given person?
 They would have to pass it to someone either they both know, or pass
the message to someone who is more likely to be connected to the
target of the message.
 This was the heart of Milgram’s experiment that gave us the concept of
6 degrees of separation.
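The message-passing question is a shortest-path question. Here is a minimal breadth-first search sketch on a hypothetical friend graph in which two groups are joined through a single chain of people. The names are invented for illustration.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search: returns one shortest chain of people, or None."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nbr in sorted(graph[current]):
            if nbr in parents:
                continue
            parents[nbr] = current
            if nbr == target:
                # Walk the parent links back to the source.
                path = [target]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nbr)
    return None


# Hypothetical network: ann, bob, cat know each other; eve is reachable only via dan.
FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat", "eve"},
    "eve": {"dan"},
}

path = shortest_path(FRIENDS, "ann", "eve")
print(path, "degrees of separation:", len(path) - 1)
```

Every message between the two groups has to pass through cat and dan, which is the kind of bottleneck the layout in Gephi makes visible.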
Other Analysis
 What else can be done with Social Network Analysis?
 How about risk exposure to banks?
 http://www.federalreserve.gov/newsevents/speech/yellen20130104a.htm
Application to Business Intelligence
 What if the Vertices are not people ?
 What if the Edges are not mutual connections?
 Jonathan and others over the past few meetings have done a great
job at explaining the underpinnings of how a particular BI framework is
put together.
 Within a Data Architecture there are lots of moving pieces. ETL, FTP,
SFTP, Web-Services, External data feeds. Data moving into Data Marts,
and Data Warehouses. Data Moving between applications.
 Let’s imagine how to visualize this using the information we just gained.
Data Structure Graph
 A Data Structure Graph is a group of atomic entities that are related to
each other, stored in a repository, then moved from one persistence
layer to another, rendered as a Graph.
 A group of atomic entities.
 Related to each other.
 Stored in a repository.
 Moved from one persistence layer to another.
 Rendered as a Graph.
Introducing Data Structure Graphs
 Data Structure Graph Level 1 (DSG-L1) – This is roughly like an Entity
Relationship Diagram (ERD): Tables are Vertices, Foreign Keys are Edges.
 Data Structure Graph Level 2 (DSG-L2) – Each Vertex in this graph is an
application. Each Edge is data transfer. Roughly equivalent to what we
used to call Data Flow diagrams.
 Data Structure Graph Dependency (DSG-D) – Each vertex is a
job, script, program, or process that is dependent on something
happening in sequence before it can do its work.
 A DSG-L1 can show you where you are going to have the most
interesting query performance of your tables.
 A DSG-L2 can show you where the most work is going on in
your Enterprise.
 A DSG-D can show you the sequence of events that need to take
place in order for something to be completed.
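The "sequence of events" a DSG-D gives you is a topological ordering of the dependency graph. A minimal sketch, assuming Python 3.9 or later for the standard `graphlib` module. The job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical DSG-D: each key is a job, each value is the set of jobs it waits on.
DEPENDS_ON = {
    "load_warehouse": {"extract_orders", "extract_customers"},
    "build_mart": {"load_warehouse"},
    "refresh_dashboard": {"build_mart"},
    "extract_orders": set(),
    "extract_customers": set(),
}

# static_order() yields every job after all of the jobs it depends on.
run_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(run_order)
```

The two extract jobs do not depend on each other, so either may come first. If the dependencies contain a cycle, `TopologicalSorter` raises `graphlib.CycleError`, which is itself a useful finding about your environment.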
New Project, Data Table, Import data.
Load as “Edges Table” Source, Target (required)
Choose Create Missing Nodes
After a few calculations and layout runs
PageRank – Which application is most important?
A few more tweaks
Where is that Node with the highest PageRank?
Remember paths?
The Original Joke This is me in different stores
Dijkstra's algorithm
 Some of you may have heard of Dijkstra’s algorithm.
 It is a method for finding the shortest path between two nodes on a
Graph.
 This is a great optimization technique, but what if you need to find the
longest path?
 What “Edge_Label” has the most influence on my organization?
 Iterate through each Edge_Label, create a subgraph that consists of
only the nodes this Edge_Label touches, then calculate the diameter of
that Graph.
 The data point represented by the Edge_Label whose subgraph has the
largest diameter has the most “value” to your organization.
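The Edge_Label procedure above can be sketched as follows. This is an illustration under assumptions, not the code behind the Shiny app: the application names and labels are hypothetical, edges are treated as undirected, and each subgraph is assumed to be connected. The diameter is the longest of the shortest paths, so breadth-first search from every vertex is enough.

```python
from collections import defaultdict, deque

# Hypothetical DSG-L2 edge list: (source application, target application, Edge_Label).
EDGES = [
    ("crm", "staging", "customer"),
    ("staging", "warehouse", "customer"),
    ("warehouse", "mart", "customer"),
    ("mart", "dashboard", "customer"),
    ("erp", "staging", "invoice"),
    ("staging", "warehouse", "invoice"),
]


def subgraphs_by_label(edges):
    """Build one undirected adjacency map per Edge_Label."""
    graphs = defaultdict(lambda: defaultdict(set))
    for source, target, label in edges:
        graphs[label][source].add(target)
        graphs[label][target].add(source)
    return graphs


def eccentricity(graph, start):
    """Largest shortest-path distance from start, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nbr in graph[current]:
            if nbr not in dist:
                dist[nbr] = dist[current] + 1
                queue.append(nbr)
    return max(dist.values())


def diameter(graph):
    """Longest of the shortest paths between any two vertices."""
    return max(eccentricity(graph, v) for v in list(graph))


diameters = {label: diameter(g) for label, g in subgraphs_by_label(EDGES).items()}
most_valuable = max(diameters, key=diameters.get)
print(diameters, most_valuable)
```

Here the customer entity travels through five applications while the invoice entity touches three, so customer comes out with the larger diameter.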
https://dougneedham.shinyapps.io/DataStructureGraph
Hard to see, I know, but the top diagram is the “master graph”, the bottom image is a single Edge_Label. You
can see how an individual data entity flows through an organization.
My book
Goes through a number of examples of doing a Graph analysis of a fictional organization.
Consider the following:
 If you need assistance, send a message to the group, or contact me
directly (I am easy to find @dougneedham)
 Network/Graph Analysis is cool.
 It can show you some interesting things about your data that you may
not have considered.
 Due thought should be put towards a network analysis project.
 Organizing the data requires a bit of thought. (From -> To vertices is just
a start).
 Directed graph, undirected, bigraph? Setup work needs to be done.
 Tools help with the detailed calculations, and show the paths, walks,
etc.
What did I leave out?
 Graphs that change over time – What happens when you remove a single
Edge or Vertex?
 Growth of a Network – Erdős–Rényi versus Barabási–Albert models (Random
versus Preferential Attachment)
 Scale Free networks – Graphs that conform to Power laws. (These are
intrinsically Social Networks, but I didn’t give much detail)
 Comparing two networks – If you have the same number of edges and
nodes, are two graphs the same? Is one graph an isomorphism of another?
 Contagion – Ceteris paribus, how will things (information, viruses,
data, disease…) spread through the network? (Since a DSG represents
different types of Edges based on Edge_Label, Contagion should not affect
this type of network entirely.)
 Large Graphs – GraphX, a part of Apache Spark, is well suited for this
purpose.
 The Strength of Weak Ties paradox
 Social Capital
Finally… Want to do Data Science?
 Challenge for members of the audience.
 1. Download Gephi.
 2. Put together a simple CSV: Source, Target, Edge_Label that describes
your own data environment.
 3. Load it in Gephi and have Gephi run the metrics, and perform the auto
layout.
 4. Answer this question: Did you get what you expected?
 5. Get a colleague to do the same thing, compare the images. How similar
are they?
 Here is my hypothesis: If you have more than 5 data applications, including
Hadoop, and Data Warehouse infrastructure, your Graph will follow the
rules of preferential attachment. (To<->From ETL tools don’t count in the
analysis)
 Tweet me @dougneedham #DataStructureGraph (anonymized, of course.)
 What does your Graph look like?
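For step 2 of the challenge, the CSV can be produced with a few lines of Python. This is a sketch with a made-up data environment. The degree histogram at the end is only a first hint toward the preferential attachment hypothesis: a few heavily connected hubs next to many lightly connected applications is suggestive, not proof.

```python
import csv
import io
from collections import Counter

# Hypothetical data environment: (Source, Target, Edge_Label).
ROWS = [
    ("crm", "warehouse", "customer"),
    ("erp", "warehouse", "invoice"),
    ("web_logs", "hadoop", "clickstream"),
    ("hadoop", "warehouse", "clickstream"),
    ("warehouse", "mart", "customer"),
    ("warehouse", "reporting", "invoice"),
]


def to_csv(rows):
    """Render the edge list as CSV text with the header used in the challenge."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Edge_Label"])
    writer.writerows(rows)
    return buffer.getvalue()


def degree_histogram(rows):
    """Map each degree value to the number of applications that have it."""
    degree = Counter()
    for source, target, _label in rows:
        degree[source] += 1
        degree[target] += 1
    return Counter(degree.values())


csv_text = to_csv(ROWS)
histogram = degree_histogram(ROWS)
print(csv_text)
print(sorted(histogram.items()))
```

Save the CSV text to a file and load it through the "Edges Table" import shown earlier.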
Final Thoughts – Questions?

NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 

Recently uploaded (20)

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 

Social Network Analysis Introduction including Data Structure Graph overview.

  • 1. Social Network Analysis An overview Presentation by @dougneedham
  • 2. Introduction  @dougneedham  Data Guy - Started as a DBA in the Marine Corps, evolved to Architect, now Data Scientist.  Oracle, SQL Server, Cassandra, Hadoop, MySQL, Spark.  I have a strong relational/traditional background.  Perpetual Student  Learning new things challenges our assumptions. Forces us to take a new perspective on “old” problems. Eventually maybe even shows us that there is a better way to solve a problem.
  • 3. Why study social networks?  It is cool.  The concepts around Social Network Analysis can be applied to many interesting problems in a variety of business verticals.  The foundation of Social Network Analysis is Graph theory.  Solving Crime  Some examples: Introduction to Graph_Theory
  • 4. What is Social Network Analysis?  “Social network analysis (SNA) is a strategy for investigating social structures through the use of network and graph theories. It characterizes networked structures in terms of nodes (individual actors, people, or things within the network) and the ties or edges (relationships or interactions) that connect them. Examples of social structures commonly visualized through social network analysis include social media networks, friendship and acquaintance networks, kinship, disease transmission, and sexual relationships. These networks are often visualized through sociograms in which nodes are represented as points and ties are represented as lines.” – Wikipedia  https://en.wikipedia.org/wiki/Social_network_analysis
  • 5. Example From wiki: "Kencf0618FacebookNetwork" by Kencf0618 - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - https://commons.wikimedia.org/wiki/File: Kencf0618FacebookNetwork.jpg#/medi a/File:Kencf0618FacebookNetwork.jpg
  • 6. A little History  The 7 Bridges of Königsberg  Every tome on Graph theory or Network analysis devotes a small portion of its time to the 7 Bridges of Königsberg.  If I don’t cover this with you, the gods of mathematics will strike me down, and never allow me to do analysis again in the future.
  • 8. The Problem  Folks enjoyed their Sunday afternoon strolls across the bridges, but occasionally people would wonder whether there was a route that crossed every bridge exactly once.  Eventually Leonhard Euler was brought into the debate.  Euler used Vertices to represent the land masses and edges (or arcs, at the time) to represent bridges. He realized that every land mass had an odd number of bridges, and that a walk crossing each bridge exactly once can exist only when zero or two vertices have an odd number of edges, so the problem was unsolvable.  Sarada Herke provides one of the best explanations of the solution: Solution to Königsberg  And here is the cool thing about mathematicians. If we tell you something is impossible, we have to tell you why in a way you can understand it. In the process, he also invented the branch of mathematics we today call Graph Theory.  http://en.wikipedia.org/wiki/Leonhard_Euler
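Euler's degree argument can be checked in a few lines. The sketch below is only an illustration: the land-mass labels A to D and the function names are assumptions made for this example, and the check assumes the graph is connected.

```python
from collections import Counter

# The seven bridges as edges between four land masses.
# Labels are illustrative: A is the central island, B and C are the
# two river banks, D is the eastern land mass.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many edge ends touch each vertex."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def has_euler_walk(edges):
    """True if a connected multigraph has a walk using every edge exactly once.

    Such a walk exists only when zero or two vertices have odd degree.
    Connectivity is assumed, not checked.
    """
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)


if __name__ == "__main__":
    print(dict(degrees(BRIDGES)))
    print(has_euler_walk(BRIDGES))
```

All four land masses end up with an odd degree, which is why no such stroll exists.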
  • 9. Why analyze Facebook data?  Facebook is something that most people use.  It is easy to see the relationships and the concepts of the Graph/Network are intuitive to people who are looking at their “own” network.  The main idea is, if you can understand your own friend data, you can learn the concepts quickly, then apply these same concepts to more complicated problems.  We will talk a little about some complicated topics at the end.
  • 10. A few terms  Stand back, we are going to talk about math!  Basically we are talking about a bunch of dots joined together by lines  Vertex – Dot on a graph  Edge – Line connecting two dots  Edge_Label – this is a term I coined originally related to Data Structure Graphs that helps trace a path. If you label your edges, and you have multiple edges with the same label in a Graph, you can quite easily identify walks, paths, and cycles through your graph.  Triangle – 3 Vertices, 3 Edges  Square – 4 Vertices, 4 Edges  Open Triangle – 3 Vertices, 2 Edges  A lot of things are networks if you look at them the right way.  Mark Newman has given a number of well-done presentations on Network analysis, available on YouTube.  https://www.youtube.com/watch?v=lETt7IcDWLI
  • 11. More terms  Transitivity – The friend of my friend is my friend. Really?  Homophily – how things are similar  Directed Graphs – or Digraphs  Contagion – How do things “spread” through a network?  Let’s rearrange things, how does the layout affect understanding?  Order of a graph – number of vertices  Size of the graph – number of edges  This is not just data visualization, it can also be used for prediction. https://www.youtube.com/watch?v=rwA-y-XwjuU
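To make order, size, triangles, open triangles, and transitivity concrete, here is a minimal sketch on a tiny friendship graph. The names and the `summarize` helper are assumptions for illustration, and the code assumes a simple undirected graph (no self-loops; duplicate edges are ignored).

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each vertex to the set of its neighbours (undirected)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def summarize(edges):
    adj = build_adjacency(edges)
    order = len(adj)                                  # number of vertices
    size = sum(len(n) for n in adj.values()) // 2     # number of edges
    triples = 0   # pairs of neighbours sharing a centre vertex
    closed = 0    # triples whose two ends are also connected
    for nbrs in adj.values():
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return {
        "order": order,
        "size": size,
        "triangles": closed // 3,        # each triangle is seen from 3 centres
        "open_triangles": triples - closed,
        "transitivity": closed / triples if triples else 0.0,
    }


if __name__ == "__main__":
    friends = [("ann", "bob"), ("bob", "cat"), ("ann", "cat"), ("cat", "dan")]
    print(summarize(friends))
```

Transitivity here is the fraction of "friend of my friend" pairs that really are friends, which is one way to answer the "Really?" on the slide.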
  • 12. Final terms  Centrality – Hub and Authority  This is almost a whole topic by itself, since there are different types of Centrality:  Degree Centrality – Simple: the Vertex with the highest degree (the most edges) is the most central.  Eigenvector Centrality – How important a particular Vertex is to a given network, based on how important its neighbors are.  PageRank – similar to Eigenvector Centrality, only scaled; if a given vertex is closely connected to a vertex with a very high PageRank, it is itself given a high PageRank.  Serious nutshell definitions.  Shortest path – How are two vertices connected?  Longest Path – Tracing the flow of an interesting item through a large collection of applications.
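The nutshell definitions above can be sketched in plain Python. This is not how Gephi computes its metrics; it is a simplified power-iteration PageRank with an assumed damping factor of 0.85, run on a hypothetical star-shaped network. Every vertex must appear as a key in the adjacency dictionary.

```python
def degree_centrality(adj):
    """Degree of each vertex divided by the maximum possible degree."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    adj maps each vertex to the set of vertices it links to.
    For an undirected graph, list each edge in both directions.
    """
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, nbrs in adj.items():
            if nbrs:
                share = damping * rank[v] / len(nbrs)
                for u in nbrs:
                    new[u] += share
            else:
                # A vertex with no outgoing links spreads its rank evenly.
                for u in adj:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank


if __name__ == "__main__":
    star = {
        "hub": {"a", "b", "c"},
        "a": {"hub"},
        "b": {"hub"},
        "c": {"hub"},
    }
    print(degree_centrality(star))
    print(pagerank(star))
```

In the star example the hub scores highest on both measures; on real networks the two rankings can disagree, which is the reason to look at more than one centrality.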
  • 13. Why is a path important? More on this later… The Original Joke This is me in different stores
  • 14. The Math doesn’t change.  One thing I like about Graphs –  The Math does not change.  The math behind Graph theory can be a little intense, but it does not change regardless of the scale of the graph.  Once you understand how to “do the math” on a small graph, those same Maths apply to a Graph whether it is a graph of the people in this room, or a graph of the people on this planet.  Now, let me introduce you to a tool that does much of the Mathematics for you…
  • 15. But first, Netvizz…  Netvizz is a tool that extracts data from different sections of the Facebook Platform.  It provides an interface to the Facebook Graph API  https://www.youtube.com/watch?v=3vkKPcN7V7Q  For the version of data we will be looking at, I was able to extract friendship connections. Facebook has since changed their permissions such that you can no longer extract this information.  However, there are some other interesting things you can do with Netvizz.  If you manage a Facebook Group, this might be interesting.  For this particular talk we are going to focus on Gephi interpretation. If we want to have a more in-depth talk on Facebook and the Graph API that Facebook has opened, we can discuss that at another time.  To get this yourself, go into Facebook and search for: Netvizz. (You have to authorize it. You can un-authorize it later.)  You will have a number of options: group data, page data, page like network, search, and link stats.  Click “group data”  Select a group. If you need a sample id, use: 39462256584  It runs for a bit, then dumps to a zip file.  Save the file, then extract it.  Open Gephi, and use Gephi to import your GDF file.
  • 16. Gephi http://gephi.github.io/ From the website: “Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.” Java 1.7 required, you may have to set this in Gephi.conf Depending on the size of the network you are studying you may need to increase the memory available to Java in Gephi.conf
  • 18. Gephi – Open GML file
  • 19. Gephi – After opening
  • 24.
  • 25. Metrics  Remember all those numbers we spoke about?  Here are many of them.
  • 28. Here is the layout with the labels as number of connections
  • 32. How do we use this?  Finding bottlenecks.  You have to ignore the fact that everyone on this graph is connected to you for a moment.  How would someone get a message to another given person?  They would have to pass it to someone either they both know, or pass the message to someone who is more likely to be connected to the target of the message.  This was the heart of Milgram’s experiment that gave us the concept of 6 degrees of separation.
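The message-passing question can be sketched as a breadth-first search, which finds a shortest chain of acquaintances in an unweighted network. The friendship data and the `shortest_path` name below are hypothetical.

```python
from collections import deque


def shortest_path(adj, start, goal):
    """Return one shortest chain of vertices from start to goal, or None."""
    if start == goal:
        return [start]
    parent = {start: None}      # also serves as the visited set
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj.get(v, ()):
            if u in parent:
                continue
            parent[u] = v
            if u == goal:
                # Walk back through the parents to rebuild the chain.
                path = [u]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(u)
    return None


if __name__ == "__main__":
    friends = {
        "ann": {"bob"},
        "bob": {"ann", "cat"},
        "cat": {"bob", "dan"},
        "dan": {"cat"},
        "eve": set(),
    }
    print(shortest_path(friends, "ann", "dan"))
    print(shortest_path(friends, "ann", "eve"))
```

In this toy network "bob" and "cat" are the bottlenecks: remove either one and the message from "ann" can no longer reach "dan".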
  • 33. Other Analysis  What else can be done with Social Network Analysis?  How about risk exposure to banks?  http://www.federalreserve.gov/newsevents/speech/yellen20130104a.htm
  • 34.
  • 35. Application to Business Intelligence  What if the Vertices are not people?  What if the Edges are not mutual connections?  Jonathan and others over the past few meetings have done a great job at explaining the underpinnings of how a particular BI framework is put together.  Within a Data Architecture there are lots of moving pieces: ETL, FTP, SFTP, Web-Services, external data feeds. Data moving into Data Marts and Data Warehouses. Data moving between applications.  Let’s imagine how to visualize this using the information we just gained.
  • 36. Data Structure Graph  A Data Structure Graph is a group of atomic entities that are related to each other, stored in a repository, then moved from one persistence layer to another, rendered as a Graph.  A group of atomic entities.  Related to each other.  Stored in a repository.  Moved from one persistence layer to another.  Rendered as a Graph.
  • 37. Introducing Data Structure Graphs  Data Structure Graph Level 1 (DSG-L1) – This is roughly like an Entity Relationship Diagram (ERD). Tables are Vertices, Foreign Keys are Edges.  Data Structure Graph Level 2 (DSG-L2) – Each Vertex in this graph is an application. Each Edge is a data transfer. Roughly equivalent to what we used to call Data Flow diagrams.  Data Structure Graph Dependency (DSG-D) – Each Vertex is a job, script, program, or process that is dependent on something happening in sequence before it can do its work.  A DSG-L1 can show you which of your tables are going to have the most interesting query performance.  A DSG-L2 can show you where the most work is going on in your Enterprise.  A DSG-D can show you the sequence of events that need to take place in order for something to be completed.
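One way to read a DSG-D is as a dependency graph whose valid run order is a topological sort. The sketch below uses Kahn's algorithm, which is a standard technique and not something the slides prescribe; the job names are hypothetical.

```python
from collections import deque


def run_order(dependencies):
    """Order jobs so every prerequisite runs before the job that needs it.

    dependencies is a list of (prerequisite, job) pairs.
    Raises ValueError if the dependencies contain a cycle.
    """
    successors = {}
    indegree = {}
    for before, after in dependencies:
        successors.setdefault(before, []).append(after)
        successors.setdefault(after, [])
        indegree.setdefault(before, 0)
        indegree[after] = indegree.get(after, 0) + 1

    # Jobs with no prerequisites can start immediately.
    ready = deque(sorted(job for job, d in indegree.items() if d == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for nxt in successors[job]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    jobs = [
        ("extract_orders", "load_staging"),
        ("extract_customers", "load_staging"),
        ("load_staging", "build_mart"),
        ("build_mart", "refresh_dashboard"),
    ]
    print(run_order(jobs))
```

A cycle in a DSG-D means two jobs are each waiting on the other, so the function reports it instead of returning a partial order.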
  • 38. New Project, Data Table, Import data.
  • 39. Load as “Edges Table” Source, Target (required)
  • 41. After a few calculations and layout runs
  • 42. PageRank – Which application is most important?
  • 43. A few more tweaks
  • 44. Where is that Node with the highest PageRank?
  • 45. Remember paths? The Original Joke This is me in different stores
  • 46. Dijkstra's algorithm  Some of you may have heard of Dijkstra’s algorithm.  It is a method for finding the shortest path between two nodes on a Graph.  This is a great optimization technique, but what if you need to find the longest path?  What “edge_label” has the most influence on my organization?  Iterate through each Edge_Label, create a subgraph that consists of only the nodes this Edge_Label touches, then calculate the diameter of that Graph.  The data point represented by a given Edge_label that has the longest path has the most “value” to your organization.
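The Edge_Label procedure described above can be sketched as follows. This is one possible reading, under stated assumptions: edge direction is ignored, every edge has weight 1 (so a breadth-first search stands in for Dijkstra's algorithm), and if a label's subgraph is disconnected the longest finite shortest path is reported. The application names and labels are hypothetical.

```python
from collections import defaultdict, deque


def eccentricity(adj, start):
    """Length of the longest shortest path starting at start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return max(dist.values())


def diameter_by_label(edges):
    """Map each Edge_Label to the diameter of the subgraph it touches.

    edges is a list of (source, target, edge_label) triples.
    """
    subgraphs = defaultdict(lambda: defaultdict(set))
    for source, target, label in edges:
        subgraphs[label][source].add(target)
        subgraphs[label][target].add(source)
    return {
        label: max(eccentricity(adj, v) for v in list(adj))
        for label, adj in subgraphs.items()
    }


if __name__ == "__main__":
    flows = [
        ("crm", "staging", "customer"),
        ("staging", "warehouse", "customer"),
        ("warehouse", "mart", "customer"),
        ("erp", "warehouse", "invoice"),
    ]
    diameters = diameter_by_label(flows)
    print(diameters)
    print(max(diameters, key=diameters.get))
```

In the sample data the "customer" entity travels through four applications, so it has the longest path and, by the slide's argument, the most "value".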
  • 47. https://dougneedham.shinyapps.io/DataStructureGraph Hard to see, I know, but the top diagram is the “master graph”, the bottom image is a single Edge_Label. You can see how an individual data entity flows through an organization.
  • 48. My book Goes through a number of examples for doing a Graph analysis of a fictional organization.
  • 49. Consider the following:  If you need assistance, send a message to the group, or contact me directly (I am easy to find @dougneedham)  Network/Graph Analysis is cool.  It can show you some interesting things about your data that you may not have considered.  Due thought should be put towards a network analysis project.  Organizing the data requires a bit of thought. (From -> To vertices is just a start).  Directed graph, undirected, bigraph? Setup work needs to be done.  Tools help with the detailed calculations, and show the paths, walks, etc.
  • 50. What did I leave out?  Graphs that change over time – What happens when you remove a single Edge or Vertex?  Growth of a Network – Erdős-Rényi versus Barabási-Albert models (Random versus Preferential Attachment)  Scale Free networks – Graphs that conform to Power laws. (These are intrinsically Social Networks, but I didn’t give much detail.)  Comparing two networks – If you have the same number of edges and nodes, are two graphs the same? Is one graph isomorphic to another?  Contagion – Ceteris paribus, how will things (information, viruses, data, disease…) spread through the network? (Since a DSG represents different types of Edges based on Edge_Label, Contagion should not affect this type of network entirely.)  Large Graphs – GraphX, a part of Apache Spark, is best used for this purpose.  The Strength of Weak Ties Paradox  Social Capital
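The difference between random growth and preferential attachment can be sketched with two small generators. Both are simplified versions written for illustration (the function names, seed clique, and parameters are assumptions), not reference implementations of the published models.

```python
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """Random graph: each possible edge is included with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]


def preferential_attachment(n, m, rng):
    """Grow a graph where new vertices prefer well-connected vertices.

    Starts from a small clique of m + 1 vertices; each new vertex adds
    m edges to distinct existing vertices, chosen with probability
    proportional to their current degree. Requires m >= 1 and n > m.
    """
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each vertex appears in this list once per edge end it owns,
    # so a uniform choice from it is a degree-weighted choice.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def max_degree(edges):
    deg = Counter(v for edge in edges for v in edge)
    return max(deg.values()) if deg else 0


if __name__ == "__main__":
    rng = random.Random(42)
    grown = preferential_attachment(200, 2, rng)
    rand = erdos_renyi(200, 0.02, rng)
    print(len(grown), max_degree(grown))
    print(len(rand), max_degree(rand))
```

Comparing the maximum degree of the two graphs at a similar edge count is a quick, informal way to see whether a few hubs are forming, which is the signature of preferential attachment.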
  • 51. Finally… Want to do Data Science?  Challenge for members of the audience.  1. Download Gephi.  2. Put together a simple CSV: Source, Target, Edge_Label that describes your own data environment.  3. Load it in Gephi and have Gephi run the metrics, and perform the auto layout.  4. Answer this question: Did you get what you expected?  5. Get a colleague to do the same thing, compare the images. How similar are they?  Here is my hypothesis: If you have more than 5 data applications, including Hadoop and Data Warehouse infrastructure, your Graph will follow the rules of preferential attachment. (To<->From ETL tools don’t count in the analysis.)  Tweet me @dougneedham #DataStructureGraph (anonymized, of course.)  What does your Graph look like?
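Step 2 of the challenge can be sketched with Python's standard csv module. The column names follow the slide (Source, Target, Edge_Label); the application names and the `edges_to_csv` helper are hypothetical.

```python
import csv
import io


def edges_to_csv(edges):
    """Render (source, target, edge_label) triples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Edge_Label"])
    writer.writerows(edges)
    return buffer.getvalue()


if __name__ == "__main__":
    flows = [
        ("crm", "staging", "customer"),
        ("staging", "warehouse", "customer"),
        ("erp", "warehouse", "invoice"),
    ]
    text = edges_to_csv(flows)
    print(text)
    # To save for import as an edges table:
    # with open("edges.csv", "w", newline="") as handle:
    #     handle.write(text)
```

Writing through the csv module rather than joining strings by hand keeps the file valid even if an application name contains a comma.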
  • 52. Final Thoughts – Questions?