1. Social Network Analysis
Approach and Applications
Joshua S. White
PhD Candidate, Engineering Science
April 22, 2014
Committee Members:
Jeanna N. Matthews, PhD (Advisor)
John S. Bay, PhD (External Examiner)
Chris Lynch, PhD
Chen Liu, PhD
Stephanie C. Schuckers, PhD
| Clarkson University 1/42
3. Motivation
Partially inspired by Gladwell’s book, The Tipping Point [1], in which he discusses
how life can be thought of as an epidemic. Gladwell’s rigor has been criticized;
however, for our purposes the book serves as inspiration and motivation, not as an
authoritative source.
The Book’s Key Points (for our purposes)
• Actors (Connectors, Mavens, Salesmen).
• Information spreads like disease.
• Ideas reach a tipping point (critical mass).
Let’s Face It - Social Networks Are Fun
• We are a social species that enjoys communicating and self-adulation.
4. Problem Questions
• Can we come up with a way of classifying users based on actor types?
• Can we determine who the opinion leaders or influencers are?
• Can we determine how information spreads on these networks?
• Can we detect malicious social network use?
• Are there information security applications for social network data-mining?
5. Method & Publications
• Establish a reliable collection mechanism.
• Establish a large dataset that can be utilized to answer each question.
• Use a case study approach, whereby each case feeds the next.
• Produce each case study as an individual publication or presentation.
– 3 x Published Proceedings
– 2 x Pending Proceedings
– 3 x Invited Presentations
6. Coalmine
• Scales well based on initial tests
• Useful for both manual and automated detection
• Allowed us to refine our data collection capabilities
At the Time (Future Work)
• Rebuild of the tool to fix scaling limitations
• More extensible Map/Reduce method
• Inclusion of native multi-threading capability
• New storage and distribution method
• New algorithms for automated opinion leader detection
7. PySNAP
• Fixes all of the previous issues with Coalmine
• Completely reimplemented in Python with a few supporting Bash scripts
• Utilizes the Disco MapReduce framework, whose jobs are also written in Python
• Includes a better method for data capture, which was previously bolted on to Coalmine
• Allowed us to establish a large dataset for future work
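The map/reduce pattern that PySNAP relies on can be illustrated with a minimal sketch. This is plain Python, not the Disco API and not PySNAP's actual code; the function names and the hashtag-counting task are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_hashtags(tweet_text):
    """Map step: emit a (hashtag, 1) pair for every hashtag in one tweet."""
    for token in tweet_text.split():
        if token.startswith("#") and len(token) > 1:
            yield token.lower(), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_job(tweets):
    """Apply the map step to every record, then reduce the combined output."""
    mapped = chain.from_iterable(map_hashtags(text) for text in tweets)
    return reduce_counts(mapped)


# Hypothetical sample input, not taken from the collected dataset.
sample = ["Watch this #KONY2012", "#kony2012 is trending", "no tags here"]
counts = run_job(sample)
```

In a real MapReduce framework the map calls run in parallel across workers and the framework groups pairs by key before the reduce step; the sketch keeps both steps in one process so the data flow is easy to follow.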
8. Established Dataset
• Over the course of 2012 we collected 165 TB of Twitter Data (Uncompressed)
– 175 Days Collected, 147 Full Days
∗ Estimated 45 Billion Tweets
– Recently released estimates place total Twitter traffic at 175 million tweets per day in 2012
– Thus our daily collection rates varied between 50% and 80% of total Twitter traffic.
– We captured complete tweet data in JSON format using Twitter’s REST API.
∗ This data includes a large number of fields beyond the message text, all of which can be taken into account when making measurements.
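As a rough illustration of the extra fields available in each captured record, the sketch below parses one tweet stored as a JSON line. The field names (id_str, created_at, text, user.screen_name, entities.hashtags) follow Twitter's v1.1 tweet object; the sample record, the account name, and the summarize_tweet helper are hypothetical and are not part of the original collection code.

```python
import json
from datetime import datetime

# Timestamp layout used in Twitter's "created_at" field,
# e.g. "Sun Mar 20 15:27:02 +0000 2011".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize_tweet(line):
    """Parse one JSON-encoded tweet and keep a few fields besides the text."""
    tweet = json.loads(line)
    entities = tweet.get("entities", {})
    return {
        "id": tweet["id_str"],
        "created": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
        "user": tweet["user"]["screen_name"],
        "text": tweet["text"],
        "hashtags": [tag["text"] for tag in entities.get("hashtags", [])],
    }


# Hypothetical record for illustration only.
sample_line = json.dumps({
    "id_str": "1",
    "created_at": "Tue Mar 06 12:00:00 +0000 2012",
    "text": "Watch this #KONY2012",
    "user": {"screen_name": "example_user"},
    "entities": {"hashtags": [{"text": "KONY2012"}]},
})
summary = summarize_tweet(sample_line)
```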
10. Botnet Command & Control Detection
• Joshua S White, Jeanna N Matthews, and John L Stacy. Coalmine: an experience in building a system for social
media analytics. In SPIE Defense, Security, and Sensing, pages 84080A-84080A. International Society for Optics
and Photonics, 2012.
11. Botnet Command & Control Detection Continued
Date/Time                      | UID               | Text                        | MSG Entropy      | Source
-------------------------------+-------------------+-----------------------------+------------------+-------------------------------
Sun Mar 20 15:27:02 +0000 2011 | 49492150668365824 | Shutdown -r now             | 3.37355726227518 | http://twitter.com/Ebastos
Sun Mar 20 01:25:20 +0000 2011 | 49280326475853825 | # shutdown -h now           | 3.37355726227518 | http://twitter.com/ohdediku
Sun Mar 20 21:40:53 +0000 2011 | 49586229964062720 | $ sudo shutdown -h now      | 3.37355726227518 | http://twitter.com/souzabruno
Sun Mar 20 19:38:41 +0000 2011 | 49555476769280000 | Text: sudo shutdown -h now  | 3.37355726227518 | http://twitter.com/stormyblack
Sun Mar 20 18:51:51 +0000 2011 | 49543693820116992 | shutdown -now               | 3.37355726227518 | http://twitter.com/godzilla2k9
Sun Mar 20 18:52:30 +0000 2011 | 49543856840126464 | shutdown -h now !:          | 3.37355726227518 | http://twitter.com/ph3nagen
Sun Mar 20 18:52:30 +0000 2011 | 49600582113177600 | shutdown -H now.            | 3.37355726227518 | http://twitter.com/willybistuer
Sun Mar 20 22:37:54 +0000 2011 | 49597117039251457 | elmenda: su shutdown -h now | 3.37355726227518 | http://twitter.com/NeoVasili
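The slides do not state how the MSG Entropy column was computed. One common choice is character-level Shannon entropy, sketched below as an assumption; it is not confirmed to be the formula Coalmine uses and is not expected to reproduce the values in the table.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Character-level Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A string with one repeated symbol carries no information per character,
# while a string of distinct, equally frequent symbols is maximally varied.
low = shannon_entropy("aaaa")
high = shannon_entropy("abcd")
```

Short, command-like messages tend to fall in a narrow entropy band, which is one reason such a score can help flag candidate command-and-control traffic for closer inspection.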
12. Phishing Website Detection
• Joshua S White, Jeanna N Matthews, and John L Stacy. A method for the automated detection phishing websites
through both site characteristics and image analysis. In SPIE Defense, Security, and Sensing, pages 84080B-84080B.
International Society for Optics and Photonics, 2012.
14. Phishing Website Detection Continued: ML-Based Detection
• Title: An Image-based Feature Extraction Approach for Phishing Website Detection
• Authors: Hao Jiang, Joshua White, Jeanna Matthews
• Builds on our previous work in phishing website detection, specifically the image
analysis approach
• Uses a machine learning based approach to identify the most prominent images
on a webpage, usually the site’s logo
• Detects phishing sites that the pHash/Hamming distance method judges to be
not similar to the legitimate site.
– These are the “poor quality” phishing sites
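For context, the hash-and-compare idea behind the earlier image analysis approach can be sketched as follows. This is a simplified average hash over a small grayscale grid, swapped in for the DCT-based pHash to keep the example self-contained; the function names, the threshold, and the sample pixel grids are assumptions for illustration.

```python
def average_hash(pixels):
    """Return a list of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the bit positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def looks_similar(pixels_a, pixels_b, max_distance=1):
    """Treat two images as similar when their hashes differ in few bits."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Hypothetical 2x2 grayscale grids standing in for downscaled logos.
logo = [[10, 200], [200, 10]]
brighter_copy = [[30, 220], [220, 30]]   # same pattern, uniformly brighter
different = [[200, 10], [10, 200]]       # inverted pattern
```

A uniformly brightened copy keeps the same hash, while a structurally different image does not. A crude imitation of a site can differ in enough bits to fall outside the threshold, which is the gap the machine learning approach is meant to cover.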
15. Malware Infection Vector Detection
• BEK (The Blackhole Exploit Kit) was the predominant MaaS (Malware as a Service)
in 2012.
• It accounted for an estimated 29% of all malicious URLs.
• BEK licenses went for around $1,500 USD
• BEK used Twitter as its primary means of spreading infectious URLs
• Our method detects these malicious URLs and infectious accounts on a large scale
16. Malware Infection Vector Detection Continued
• Joshua S. White and Jeanna N. Matthews, “It’s you on photo?: Automatic detection of Twitter accounts
infected with the Blackhole Exploit Kit,” Malicious and Unwanted Software: "The Americas" (MALWARE), 2013 8th
International Conference on, pp. 51-58, 22-24 Oct. 2013. doi: 10.1109/MALWARE.2013.6703685
19. Actor Identification
• Title: Connectors, Mavens, Salesmen and More: Actor Based Online Social Network
(OSN) Analysis Method Using Tensed Predicate Logic
• Authors: Joshua White and Jeanna Matthews
• Submitted to KDD2014 (Knowledge Discovery and Data Mining) Conference “Data
Mining for Social Good”
• Utilized multiple definitions of actor types to create tensed predicate logic descriptions
• Translated these logics into semantic queries
• Tested the queries against a known dataset
21. Actor Identification Continued
• Time is important
• Previous methods did not take event sequence into account
• Liaison Example:
24. Event Identification
• Still in the initial stages of this part of our work
• Given a general topic (a search term or hashtag), we can identify most of the related
content in the dataset
• We have a means of alerting on all new posts regarding that term
• We can dig historically through the data and trace the path that an idea took
• We can identify the influential individuals (accounts) that played a part in the
information spread
• Our test case was the KONY2012 Event
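The historical trace described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes tweets are dictionaries shaped like Twitter's v1.1 JSON (created_at, text, user.screen_name, retweeted_status), and the helper names and sample accounts are hypothetical.

```python
from datetime import datetime

# Timestamp layout used in Twitter's "created_at" field.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def trace_term(tweets, term):
    """Return the tweets that mention a term, ordered oldest first."""
    needle = term.lower()
    hits = [t for t in tweets if needle in t["text"].lower()]
    return sorted(
        hits,
        key=lambda t: datetime.strptime(t["created_at"], TWITTER_TIME_FORMAT),
    )


def rank_sources(tweets):
    """Count how often each account's posts were retweeted in the given tweets."""
    counts = {}
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original:
            name = original["user"]["screen_name"]
            counts[name] = counts.get(name, 0) + 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records for illustration only.
tweets = [
    {"created_at": "Tue Mar 06 12:05:00 +0000 2012",
     "text": "RT @alpha: Watch #KONY2012",
     "user": {"screen_name": "beta"},
     "retweeted_status": {"user": {"screen_name": "alpha"}}},
    {"created_at": "Tue Mar 06 12:00:00 +0000 2012",
     "text": "Watch #KONY2012",
     "user": {"screen_name": "alpha"}},
    {"created_at": "Tue Mar 06 12:10:00 +0000 2012",
     "text": "Lunch time",
     "user": {"screen_name": "gamma"}},
]
path = trace_term(tweets, "#kony2012")
sources = rank_sources(path)
```

Sorting by timestamp gives the order in which the term appeared, and counting retweeted authors gives a first, rough signal of which accounts drove the spread.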
30. Conclusions
• We aimed to answer the following questions when we started this work:
– Can we come up with a way of classifying users based on actor types?
– Can we determine who the opinion leaders or influencers are?
– Can we determine how information spreads on these networks?
– Can we detect malicious social network use?
– Are there information security applications for social network data-mining?
• I think we did a good job at providing at least some cursory answers to these questions
31. Future Work
• We have applied for a data grant from Twitter
• We are in the process of moving our entire dataset to the lab at Clarkson and
building up a new capture/analysis system
• I am planning on pursuing the semantic side of social network analysis
– Currently only one SNA semantic ontology exists, and it exists only on paper.
– I am planning on rolling both the actor and event analysis into one approach
which will be part of a new ontology
32. Acknowledgements
• I would like to thank:
– Dr. Matthews
– Dr. Bay
– Dr. Lynch
– Dr. Schuckers
– Dr. Liu
33. References
[1] Gladwell, M. (2000). The Tipping Point. Boston: Little, Brown and Company.