This document discusses social media analysis and privacy. It provides an overview of social media networks and the privacy issues related to them. Specifically, it examines who might want access to social media data and why, including government, businesses, and academia for purposes like tracking individuals, advertising, and research. It also outlines the privacy rights that should be protected when collecting and analyzing social media data, noting relevant US laws and regulations for government, businesses, and academics handling personal information from social networks.
Social Media Privacy and Regulations
1. Unclassified // Public Release
Social Media Analysis and Privacy
Joshua White
whitej@ainfosec.com
Senior Computer Engineer
Assured Information Security
http://ainfosec.com
PhD Student of Engineering Science
Clarkson University
Date: Oct 31, 2012
Release: Unclassified // Public
153 Brooks Road, Rome, NY | 315.336.3306 | http://ainfosec.com
Copyright 2012 Assured Information Security, Inc.
2. Unclassified // Public Release
About: Company
AIS (Assured Information Security)
Research and Development of technologies and capabilities to support
effective operations within the entirety of the cyber domain.
Leading pioneers in the disciplines of Information Operations including
Network Operations, Electronic Warfare, and Computer Network
Operations of all types.
Located In:
Rome NY (Corporate Headquarters)
Portland OR
Baltimore MD
Beavercreek OH
San Antonio TX
Colorado Springs, CO
3. Unclassified // Public Release
About: Speaker
Joshua White
Education:
AAS Computer Network Technology (FLCC)
BS / MS Telecommunications (SUNYIT)
PhD Student of Engineering Science (Clarkson University)
Experience:
7+ years Government Contracting in Information Security and
Telecommunications Engineering
Areas of Study:
Intrusion Detection Systems
Optical Network Security
Large Dataset Analysis
Distributed Processing
4. Unclassified // Public Release
Overview: The Big Questions
Introduction
The Big Research Questions:
What are social media networks?
What is the privacy problem relating to them?
Who would want this data and why?
What rights of privacy must I protect?
What regulations regarding privacy exist?
What happens if I don't protect the privacy?
Conclusions
References
5. Unclassified // Public Release
What are social media networks?
6. Unclassified // Public Release
Definition
Social Media Networks
DHS identified multiple categories [1]
Search
Video
Maps
Photos
Blog aggregates
Micro-blogs
Traditional social networks
7. Unclassified // Public Release
What's the privacy problem as it relates to
these social media networks?
8. Unclassified // Public Release
Problem
Two-Part Problem
End Users
Unsure or unaware of how to properly protect their privacy
Data Collectors
Don't know how to properly maintain the privacy of their
datasets
9. Unclassified // Public Release
Problem
Social Media-Networking Sites
Provide a communications method thought by many to be at least
somewhat private
Many never change the default security settings associated with their
accounts
Example: Percentage of Facebook users by age who change their
account security settings to anything other than the default (no
security) setting [2]
18-19 years old = 71%
30-39 years old = 67%
50-64 years old = 55%
80% of all users fall within the 18-64 age range
Estimated 20+ million users have no security but must still have a basic
expectation of privacy
Provides the largest “Social Network” datasets available for study
10. Unclassified // Public Release
Who would want this data and why?
11. Unclassified // Public Release
Problem Focus: Data Collectors
The problem of user expectations and
knowledge of privacy settings is for another
discussion
Let's focus on the “larger” problem
Data Collection
What can we collect?
What can we do with the data?
How must we protect the privacy of an individual’s PII
contained within the datasets?
12. Unclassified // Public Release
Social Media Networks Awareness
Benefits:
Government
Track locations of persons of interest with reasonable
accuracy
“Bad guys” may have protected posts
Sometimes accessible by simply looking at their friends’
posts, or even at other sites to which they have granted access
from within their accounts
Track trends
Who said what, who repeated it?
Is it going to cause a riot or worse yet, a war?
News before “official” reports
Natural disasters, shootings, etc
13. Unclassified // Public Release
Social Media Networks Awareness
Benefits:
Businesses
Directed advertising
Track locations of consumers with reasonable accuracy
Track buying habits and interests
Track trends
Who said what, who repeated it, is something going to
affect a brand?
News before “official” reports
Did something happen that will affect the market rapidly?
Natural disasters, news reports, etc
14. Unclassified // Public Release
Social Media Networks Awareness
Benefits:
Academia
Research
Track locations of subjects with reasonable accuracy
Track habits, interests and moods over time
Track trends
Who said what, who repeated it (graph theory)?
Study social networks with the largest datasets ever created
Collaborate with millions
Build prediction models
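The "who said what, who repeated it" question can be treated as a directed graph of reposts. The following is a minimal sketch, not the speaker's method: the account names and the (author, reposted_from) pair layout are hypothetical, chosen only to illustrate the graph-theory idea.

```python
from collections import defaultdict

# Hypothetical (author, reposted_from) pairs; None marks an original post.
posts = [
    ("alice", None),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "bob"),
]

def build_repost_graph(posts):
    """Map each source account to the set of accounts that repeated it."""
    graph = defaultdict(set)
    for author, source in posts:
        if source is not None:
            graph[source].add(author)
    return graph

def reach(graph, origin):
    """Count accounts that repeated a message directly or indirectly."""
    seen = set()
    stack = [origin]
    while stack:
        node = stack.pop()
        for repeater in graph.get(node, ()):
            if repeater not in seen:
                seen.add(repeater)
                stack.append(repeater)
    return len(seen)

graph = build_repost_graph(posts)
print(reach(graph, "alice"))  # accounts that spread alice's message
```

Counting reach per origin is one simple way to rank which accounts drive a trend; real datasets would need deduplication and time ordering as well.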
15. Unclassified // Public Release
Social Media Networks Awareness
It Concerns Groups Differently
Persons of interest
Don't want to be incriminated in things that they may not have done
Consumers
Don't want others to know things about their buying habits that can
be used against them
Subjects
Don't want information released that might cause them to be judged
by their peers
Some Concerns Everyone Shares
Discrimination
A feeling of (privacy) violation
16. Unclassified // Public Release
Case Study: Twitter
A real-time social media network of microblogs
Various APIs
Search, Live, Historical
Highly accessible
Example: NodeXL offers an MS Excel plugin for quickly grabbing a
few thousand samples a day from multiple sites
Large user base
65+ million “tweets” per day
750+ “tweets” per second
International Community
At least 27 languages represented
17. Unclassified // Public Release
Case Study: Twitter
Twitter is used by:
People
Everyday Individuals
Politicians
Celebrities
Professionals
Bad Guys
Objects
Gadgets that tweet (Sensors, bots, computers, spammers)
Labeled Nefarious Groups
LulzSec
Anonymous
Others
18. Unclassified // Public Release
Case Study: Twitter
What's accessible:
Posts contain far more than what's shown in the
http://www.twitter.com web interface
Data is accessible as XML or in its native JSON form
Data includes:
Location (Geo fields)
User names / real names
Threading
Track conversations using replies
Track re-tweets
Twitter client software data
Time stamping
Tweet text
And so much more
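To make the list above concrete, here is a minimal sketch of pulling those fields out of one JSON record. The record is invented, and its field names (`created_at`, `source`, `in_reply_to_screen_name`, `geo`, `user.screen_name`, `user.name`) are assumptions modeled loosely on a microblog post; they are not a confirmed schema, and a real API response may differ.

```python
import json

# Hypothetical record; field names are assumptions, not a confirmed schema.
raw = '''
{
  "created_at": "Wed Oct 31 12:00:00 +0000 2012",
  "text": "Example post",
  "source": "example-client",
  "in_reply_to_screen_name": "someone_else",
  "geo": {"type": "Point", "coordinates": [43.2, -75.4]},
  "user": {"screen_name": "example_user", "name": "Example Person"}
}
'''

def summarize(record):
    """Collect the fields a web interface typically does not display."""
    geo = record.get("geo") or {}    # missing or null geo becomes empty
    user = record.get("user") or {}
    return {
        "when": record.get("created_at"),
        "text": record.get("text"),
        "client": record.get("source"),
        "reply_to": record.get("in_reply_to_screen_name"),
        "coordinates": geo.get("coordinates"),
        "screen_name": user.get("screen_name"),
        "real_name": user.get("name"),
    }

summary = summarize(json.loads(raw))
for key, value in summary.items():
    print(f"{key}: {value}")
```

Even this small record carries a location, a real name, and a conversation partner alongside the visible text.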
19. Unclassified // Public Release
Case Study: Twitter
20. Unclassified // Public Release
Case Study: Twitter
What can be done with all of this:
NYC company Dataminr [3]
Reported the death of Bin Laden:
25 minutes after he was killed
13 minutes before the president's address
They saw the first message regarding this only 19 minutes
after it happened
They were able to trace even earlier messages that, with the
right algorithms, would have shown something going on
before the initial military strike
Reports of a US helicopter flying overhead
21. Unclassified // Public Release
Case Study: Twitter
Consequences
Data on sites like Twitter can be used to:
Predict Social Security numbers with reasonable accuracy [4]
Deduce the gender of an individual from nothing but the
message text [5]
Track a person's physical location and create predictable
pattern maps
Deny services based on views and opinions expressed
Use posts, even those that were deleted, as evidence in court
[6]
So much more
22. Unclassified // Public Release
What rights of privacy must I protect?
&
What laws regarding privacy regulation exist?
23. Unclassified // Public Release
Privacy Protection / Regulation
First we need a strict definition for what is
and isn't PII (Personally Identifiable
Information)
PII is any information that can be used to identify a
specific individual
This includes data that can be combined with other sources
to identify an individual
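The point about combined data can be illustrated with a short sketch. The records and field names below are hypothetical; the idea is that fields which look harmless alone can single out one person when taken together, at which point an outside source sharing those fields could attach a name.

```python
from collections import Counter

# Hypothetical records with no names, only fields that look harmless alone.
records = [
    {"city": "Rome NY", "age_range": "18-19", "client": "web"},
    {"city": "Rome NY", "age_range": "18-19", "client": "web"},
    {"city": "Rome NY", "age_range": "30-39", "client": "mobile"},
    {"city": "Portland OR", "age_range": "50-64", "client": "web"},
]

def unique_combinations(records, fields):
    """Return the field-value combinations that match exactly one record."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return [combo for combo, n in counts.items() if n == 1]

# Combinations listed here point at a single individual in the dataset.
singled_out = unique_combinations(records, ["city", "age_range", "client"])
print(singled_out)
```

Running the same check on fewer fields usually shows fewer unique combinations, which is why the set of fields considered together matters when deciding what counts as PII.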
24. Unclassified // Public Release
Privacy Protection / Regulation
You've decided to use this data. What's next?
Protecting the PII of individuals within the dataset is
key, and to some extent dependent on who you are
We're back to:
Government
Businesses
Academics
Let's concentrate on US law during the rest of this talk
25. Unclassified // Public Release
Privacy Protection / Regulation
The US Government
Must protect the privacy of its citizens
Federal: Cannot collect data on citizens without a warrant
States: Cannot collect data on citizens without just cause
Cannot deny citizens the right to use social media networks
Cannot enforce privacy on the individual
Can enforce regulations on the social media companies
and those who use the data
26. Unclassified // Public Release
Privacy Protection / Regulation
Businesses in the US
Must protect the privacy of consumers
Must abide by regulations imposed by the government of the
jurisdiction in which the site is located
While not required by law, it's good practice to let
consumers know what is being done with their data
27. Unclassified // Public Release
Privacy Protection / Regulation
Academics in the US
Must protect the privacy of subjects
This applies even in instances where data is gathered without
consent, such as from social media network sites
Consent is not required for the collection of information
from these sites
Depending on the specific site's EULA, datasets may:
Not be shared with other researchers outside of the
organization
Not be duplicated within a publication
Summation through statistics and results is OK
28. Unclassified // Public Release
What happens if I don't protect the privacy of the
individuals within my datasets?
29. Unclassified // Public Release
Consequences
There are obvious legal ramifications for
not protecting the privacy of individuals
within a dataset
Legal (Federal / State)
Legal personal injury
Not so obvious
Loss of consumer trust / support
Loss of position through ethics violation clauses
30. Unclassified // Public Release
Consequences: Example
Ethics can be tied closely to privacy
Harvard researchers accessed complete Facebook
profiles of 1700 students [7]
Data consisted of public profiles collected within the
university
Researchers outside the university had to apply for access to
the data
Data manual contained statistics about the dataset that did
not require the application to be filled out
These statistics were used to identify individuals
Consequently, the researchers lost funding and the
university found that opinion of the school had
fallen
Researchers were put before the ethics board
31. Unclassified // Public Release
Conclusion
Social media network datasets contain PII
PII is not just profile data, it's also unseen
fields such as geo-location and data that can
be derived from the messages posted
Datasets cannot be shared outside an
organization without prior permission if
required by the EULA
If the EULA allows for sharing of the data, it
still must be properly anonymized
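As one possible starting point for the anonymization step, the sketch below replaces the account name with a keyed hash, drops the real name and coordinates, and coarsens the timestamp to a date. The record layout, field names, and `SECRET_KEY` are hypothetical. This is pseudonymization only: the message text is passed through unchanged and may itself reveal identity, so it should not be read as sufficient anonymization.

```python
import hashlib
import hmac

# Hypothetical placeholder; a real key would be random and kept private.
SECRET_KEY = b"replace-with-a-private-random-key"

def pseudonymize(record, key=SECRET_KEY):
    """Return a reduced copy: keyed token for the user, no name or location."""
    token = hmac.new(
        key, record["screen_name"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    return {
        "user_token": token,               # stable per user, not reversible without the key
        "day": record["created_at"][:10],  # keep the date, drop time of day
        "text": record["text"],            # unchanged; may still contain PII
    }

original = {
    "screen_name": "example_user",
    "real_name": "Example Person",
    "created_at": "2012-10-31T12:00:00Z",
    "coordinates": [43.2, -75.4],
    "text": "Example post",
}
released = pseudonymize(original)
print(released)
```

A keyed hash is used rather than a plain hash so that someone holding a list of account names cannot recompute the tokens without the key.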
32. Unclassified // Public Release
References
[1] DHS, Office of Operations Coordination and Planning, "Publicly Available Social Media
Monitoring and Situational Awareness Initiative," June 22, 2010.
[2] Vaidhyanathan, S., "Welcome to the surveillance society," IEEE Spectrum, vol. 48,
no. 6, pp. 48-51, June 2011. doi: 10.1109/MSPEC.2011.5779791
[3] Brodkin, Jon, "Bin Laden death-detecting analytics services signs partnership with
Twitter," Ars Technica, Apr. 9, 2012.
[4] Acquisti, Alessandro, and Gross, Ralph, "Predicting Social Security Numbers from public
data," Proceedings of the National Academy of Sciences, vol. 106, no. 27, July 7,
2009.
[5] Burger, John, et al., "Discriminating Gender on Twitter," MITRE Corp., Nov. 2011.
[6] Smith, "No warrant needed, no privacy: Judge rules even deleted tweets can be
used in court," Network World, Apr. 24, 2012.
[7] Parry, Marc, "Harvard Researchers Accused of Breaching Students' Privacy," The
Chronicle of Higher Education, July 10, 2011.
33. Unclassified // Public Release
Questions