1. How Your Facebook Update Could Make You a Victim of Crime: On the Privacy Implications of Multimedia Retrieval
International Computer Science Institute
Jaeyoung Choi
jaeyoung@icsi.berkeley.edu
UC Berkeley KGSA Talk & Seminar - Oct 16, 2012
Monday, October 29, 12
5. Philip Markoff
• Boston University medical student
• ‘Craigslist Killer’
- He found his victims through the ads on Craigslist
25. Geo-Tagging: Benefits
Allows easier clustering of photo and video series, as well as additional services.
26. • OK... can you do real harm with geo-tags?
29. Doing Real Harm with Geo-Tags
• Cybercasing: Using online (location-based) data and services to enable real-world attacks.
• Three Case Studies: Twitter, Craigslist, YouTube
G. Friedland and R. Sommer: "Cybercasing the Joint: On the Privacy Implications of Geotagging", Proceedings of the Fifth USENIX Workshop on Hot Topics in Security (HotSec 10), Washington, D.C., August 2010.
38. Case Study 1: Twitter
• Pictures in Tweets can be geo-tagged
• From a technically savvy celebrity, we found:
– Home location (several pics)
– Where the kids go to school
– The place where he/she walks the dog
– “Secret” office
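Such geo-tags usually live in a photo's EXIF GPS fields, where latitude and longitude are stored as degree/minute/second rationals plus a hemisphere reference (N/S, E/W). The sketch below is a minimal illustration, assuming those fields have already been read out of the image file; the helper name `dms_to_decimal` and the sample coordinates are hypothetical. It shows how little work separates a shared photo from a point on a map.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) to signed decimal degrees.

    dms: three numbers (ints or Fractions), as stored in GPSLatitude/GPSLongitude.
    ref: hemisphere letter from GPSLatitudeRef/GPSLongitudeRef ("N", "S", "E", "W").
    """
    degrees, minutes, seconds = (Fraction(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical values, as they might appear in a photo's EXIF GPS block.
lat = dms_to_decimal((37, 52, 30), "N")
lon = dms_to_decimal((122, 15, 36), "W")
print(lat, lon)  # 37.875 -122.26
```

Exact `Fraction` arithmetic is used so that the rational values stored in EXIF are combined without intermediate rounding.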
41. Celebs Unaware of Geo-Tagging
Source: ABC News
45. Case Study 2: Craigslist
“For Sale” section of the Bay Area Craigslist (craigslist.org), in 4 days:
• 68,729 pictures total
• 1.3% geo-tagged
50. People Are Unaware of Geo-Tagging
• Many ads with geo-location are otherwise anonymized
• Sometimes selling high-value goods, e.g., cars, diamonds
• Sometimes “call Sunday after 6pm”
• Multiple photos allow interpolation of coordinates for higher accuracy
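The last point is averaging at work: if several photos in one ad were taken at the same spot and their GPS errors are roughly independent, the error of their mean shrinks roughly in proportion to 1/√n for n photos. A minimal sketch, assuming decimal coordinates have already been extracted and that all photos lie within a small area where plain averaging of latitude and longitude is adequate; the function name and sample fixes are hypothetical, and this is not necessarily the interpolation used in the study.

```python
from statistics import mean


def combine_fixes(points):
    """Average several (lat, lon) fixes of the same spot into one estimate.

    Plain averaging of degrees is only reasonable for points that are close
    together and far from the poles and the +/-180 degree meridian.
    """
    if not points:
        raise ValueError("need at least one coordinate")
    return mean(p[0] for p in points), mean(p[1] for p in points)


# Hypothetical fixes from three photos attached to the same ad.
fixes = [(37.87512, -122.25901), (37.87498, -122.25915), (37.87505, -122.25893)]
print(combine_fixes(fixes))
```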
53. Geo-Tagging Resolution
[Figure: iPhone 3G picture and the corresponding Google Street View image]
Figure 1: Photo of a bike taken with an iPhone 3G and a corresponding Google Street View image based on the stored coordinates. The accuracy of the camera location (marked) in front of the garage is about +/−1 m. Many classified advertisement sites contain photos describing objects for sale taken at home that automatically contain geo-tagging.
Measured accuracy: +/- 1m
59. Case Study 3: YouTube
Recall:
• Once data is published, the Internet keeps it (potentially in many copies).
• APIs are easy to use and allow quick retrieval of large amounts of data.
Can we find people on vacation on YouTube?
60. Cybercasing on YouTube
Experiment: Cybercasing using the YouTube API (240 lines in Python)
[Diagram: location + radius query + keywords → YouTube → results → users → query → results → time-frame and distance filter → cybercasing candidates]
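The final filtering step of this pipeline can be sketched as below. This is not the original 240-line script: the YouTube API calls are left out, and the `Video` record, the `home` dictionary, and the thresholds (100 km, 7 days) are hypothetical placeholders. The assumed rule is that a user becomes a candidate when one of their recent videos is located far from the home location inferred from the initial radius query.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


@dataclass
class Video:
    """Hypothetical record for one geo-located video returned by an API query."""
    user: str
    lat: float
    lon: float
    uploaded: datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cybercasing_candidates(home, videos, now, min_km=100.0, window=timedelta(days=7)):
    """Users with a recent video located at least min_km from their home location.

    home:   dict user -> (lat, lon), e.g. inferred from the initial radius query.
    videos: the users' other videos (the "user hull" of the experiment).
    """
    candidates = set()
    for v in videos:
        if v.user not in home:
            continue  # no home location known for this uploader
        recent = now - v.uploaded <= window
        far = haversine_km(*home[v.user], v.lat, v.lon) >= min_km
        if recent and far:
            candidates.add(v.user)
    return candidates


# Hypothetical data: two users whose home is near Berkeley.
now = datetime(2012, 10, 29)
home = {"alice": (37.87, -122.26), "bob": (37.87, -122.26)}
videos = [
    Video("alice", 21.31, -157.86, now - timedelta(days=2)),  # Honolulu, 2 days ago
    Video("bob", 37.80, -122.27, now - timedelta(days=1)),  # Oakland, 1 day ago
    Video("bob", 21.31, -157.86, now - timedelta(days=30)),  # Honolulu, a month ago
]
print(cybercasing_candidates(home, videos, now))  # {'alice'}
```

Both conditions matter: distance alone would flag old vacation videos, and recency alone would flag people filming around town.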
69. • What if I turn off the geo-tagging feature?
70. Ongoing Work:
http://mmle.icsi.berkeley.edu
71. Multimodal Location Estimation
We infer the location of a media item (video/photo/document) from its visual, audio, and tag information:
•Use geo-tagged data as training data
•Allows faster search, inference, and intelligence gathering, even without GPS.
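To make "geo-tagged data as training data" concrete, here is a deliberately simplified, tag-only baseline; it is not the multimodal ICSI/Berkeley system. Each tag is associated with the mean location of the training items that carry it, and a new item is placed at the centroid of its most spatially concentrated known tag. The function names, the toy training set, and its coordinates are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def train_tag_model(training):
    """Map each tag to (centroid, spread) from geo-tagged training items.

    training: iterable of (tags, (lat, lon)) pairs, tags being a set of strings.
    spread is the summed variance of latitude and longitude (lower = more local).
    """
    locations = defaultdict(list)
    for tags, loc in training:
        for tag in tags:
            locations[tag].append(loc)
    model = {}
    for tag, locs in locations.items():
        lats = [lat for lat, _ in locs]
        lons = [lon for _, lon in locs]
        model[tag] = ((mean(lats), mean(lons)), pvariance(lats) + pvariance(lons))
    return model


def estimate_location(tags, model):
    """Return the centroid of the most concentrated known tag, or None."""
    known = [model[t] for t in tags if t in model]
    if not known:
        return None
    centroid, _spread = min(known, key=lambda entry: entry[1])
    return centroid


# Toy training data (made-up coordinates around the UC Berkeley campus).
training = [
    ({"berkeley", "sathergate"}, (37.8703, -122.2595)),
    ({"berkeley", "haas"}, (37.8716, -122.2533)),
    ({"campanile"}, (37.8721, -122.2578)),
    ({"campanile", "haas"}, (37.8721, -122.2578)),
]
model = train_tag_model(training)
print(estimate_location({"campanile", "berkeley"}, model))
```

A tag seen only once gets zero spread in this sketch, so a practical system would need a minimum count or smoothing before trusting such tags.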
72. MediaEval Placing Task
http://www.multimediaeval.org/
- An annual benchmark that provides standardized datasets to the research community for evaluating new algorithms
73. Overview of Our Approach
[Diagram: a graph over videos with example tag sets {berkeley, sathergate}, {berkeley, haas}, {campanile}, and {campanile, haas}]
• Node: geolocation of a video, $x_i$, with node potential $p(x_i \mid \{t_i\})$
• Edge: correlated locations (e.g., common tag, visual, or acoustic feature)
• Edge potential: strength of an edge, e.g., the posterior distribution of locations given common tags, $p(x_i, x_j \mid \{t_k\})$, where $\{t_k\}$ are the tags shared by videos $i$ and $j$
J. Choi, G. Friedland, V. Ekambaram, K. Ramchandran: "Multimodal Location Estimation of Consumer Media: Dealing with Sparse Training Data," in Proceedings of IEEE ICME 2012, Melbourne, Australia, July 2012.
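The graph structure behind this approach can be sketched in a few lines: every video is a node, and two videos are joined by an edge when they share at least one tag, with the shared tags stored on the edge, since the slide describes the edge potential as a distribution of locations given common tags. This covers only graph construction on the slide's example tag sets; the potentials and the inference over the graph are not shown, and the function name is hypothetical.

```python
from itertools import combinations


def build_tag_graph(video_tags):
    """Connect every pair of videos that share at least one tag.

    video_tags: list of tag sets, one per video (list index = node id).
    Returns {(i, j): shared_tags} for i < j.
    """
    edges = {}
    for i, j in combinations(range(len(video_tags)), 2):
        shared = video_tags[i] & video_tags[j]
        if shared:
            edges[(i, j)] = shared
    return edges


videos = [
    {"berkeley", "sathergate"},
    {"berkeley", "haas"},
    {"campanile"},
    {"campanile", "haas"},
]
for (i, j), shared in sorted(build_tag_graph(videos).items()):
    print(i, j, sorted(shared))
```

Chains such as video 0 → 1 → 3 → 2 are what let a well-localized video inform others that share no tag with it directly.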
74. Results: MediaEval
J. Choi, G. Friedland, V. Ekambaram, K. Ramchandran: "The 2012 ICSI/Berkeley
Video Location Estimation System," in Proceedings of MediaEval 2012, Pisa, Italy,
October 2012.
75. YouTube Cybercasing Revisited
                  Old Experiment   No Geotags
Initial Videos    1000 (max)       107
User Hull         ~50k             ~2000
Potential Hits    106              112
Actual Targets    >12              >12
Even without geo-tags, cybercasing on YouTube videos is readily possible.
G. Friedland and J. Choi, “Semantic Computing and Privacy: a Case Study Using Inferred Geo-Location,” Int. J. Semantic Computing, Vol. 5, No. 1 (2011), pp. 79-93.
81. Example
Idea: Can one link videos across accounts?
(e.g., YouTube linked to Facebook vs. an anonymized dating site)
82. Persona Linking Using Internet Videos
Speaker Recognition System
- Given a voice sample, it tells whether it’s from Howard, Gerald, Jae, etc.
City Identification System
- Modified from the traditional Speaker Recognition System
- Given an audio sample, it tells whether it’s from Seoul, San Francisco, Berlin, etc.
83. Experiment
- 4869 test videos from Flickr
- 500 users in training; 493 hits and 2289 non-hits in test
- Audio characteristics: “wild” (70% heavy noise, 50% speech)
H. Lei, J. Choi, A. Janin, and G. Friedland: “Persona Linking: Matching Uploaders of Videos Across Accounts”, at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Prague, May 2011
84. User ID on Flickr Videos
Even in a preliminary experimental setting, the system performs much better than random: its Equal Error Rate is 26.3%, versus 50% for random guessing.
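The Equal Error Rate is the operating point at which the false-acceptance rate (non-matching trials wrongly accepted) equals the false-rejection rate (matching trials wrongly rejected); a system whose scores carry no information sits at 50%. A minimal sketch of estimating it from two lists of detection scores by sweeping the decision threshold over the observed scores. The function name and toy scores are hypothetical; in the experiment above, the hits and non-hits would presumably play the role of target and non-target trials.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= threshold; higher scores are
    assumed to mean "same uploader".
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need both target and non-target scores")
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: true matches scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: non-matches scored at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(frr - far)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (frr + far) / 2
    return best_eer


# Uninformative scores: both classes look the same.
print(equal_error_rate([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4]))  # 0.5
# Perfectly separated scores.
print(equal_error_rate([0.8, 0.9], [0.1, 0.2]))  # 0.0
```

With finitely many trials the two error rates rarely cross exactly, which is why the sketch averages them at the threshold where they are closest.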
85. Linkage Attacks on SNS
• There are methods to link a user’s accounts using only publicly available data.
• Your anonymity across accounts on different social networking services can be compromised.
86. Linkage Attacks
• Multiple accounts on social networks
• Same or different purposes: reviewing ...
• What about the aggregate trace they leave?
Works and slides credit: Oana Goga (http://www-npa.lip6.fr/~goga/)
87. De-anonymization Model
[Diagram: targeted account (Yelp users are identified) → which account in the candidate list is similar?]
Works and slides credit: Oana Goga (http://www-npa.lip6.fr/~goga/)
88. Where a user is posting
- Twitter locations
- Yelp locations
Works and slides credit: Oana Goga (http://www-npa.lip6.fr/~goga/)
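One simple way to compare where two accounts post, in the spirit of this slide but not necessarily the method of the cited work: bin each account's geo-tagged posts into a latitude/longitude grid, compare the resulting histograms with cosine similarity, and rank candidate accounts by similarity to the targeted one. The grid size (0.1°), the function names, and the sample data are hypothetical.

```python
import math
from collections import Counter


def location_profile(points, cell_deg=0.1):
    """Histogram of an account's posts over a lat/lon grid of cell_deg degrees."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )


def cosine_similarity(p, q):
    """Cosine similarity between two sparse histograms (Counters)."""
    dot = sum(count * q[cell] for cell, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_candidates(target_points, candidates):
    """Return candidate account ids sorted from most to least similar location profile."""
    target = location_profile(target_points)
    scores = {
        account: cosine_similarity(target, location_profile(points))
        for account, points in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: posts of a targeted Yelp account and two Twitter accounts.
yelp_posts = [(37.771, -122.419), (37.775, -122.415), (37.871, -122.261)]
twitter_accounts = {
    "tw_nyc": [(40.712, -74.006), (40.730, -73.995)],
    "tw_bay": [(37.772, -122.418), (37.873, -122.259), (37.874, -122.262)],
}
print(rank_candidates(yelp_posts, twitter_accounts))  # ['tw_bay', 'tw_nyc']
```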
89. When a user is posting
Works and slides credit: Oana Goga (http://www-npa.lip6.fr/~goga/)
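Posting times can be compared in a similar way. A minimal sketch, assuming all timestamps have already been converted to a common time zone; the function names and data are hypothetical. Each account becomes a normalized hour-of-day distribution, and two distributions are compared with the L1 distance, which ranges from 0 (identical) to 2 (no overlap).

```python
from collections import Counter
from datetime import datetime


def hour_profile(timestamps):
    """Fraction of an account's posts falling in each hour of the day (24 bins)."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one timestamp")
    return [counts[h] / total for h in range(24)]


def profile_distance(p, q):
    """L1 distance between two hour-of-day distributions: 0 = identical, 2 = disjoint."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical posting times for two accounts (same time zone assumed).
morning = hour_profile([datetime(2012, 10, 1, 9, 15), datetime(2012, 10, 2, 9, 40)])
evening = hour_profile([datetime(2012, 10, 1, 21, 5)])
print(profile_distance(morning, morning), profile_distance(morning, evening))  # 0.0 2.0
```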
90. Performance of Matching with Location Profile
[Plot: CDF of users vs. rank of the ground-truth user (log scale, 10 to 10000), with curves for Yelp − Twitter and Flickr − Twitter. Annotations: 35%, 40%, and 60% of Flickr and Yelp accounts can be matched to a set of 250 or 1000 Twitter accounts.]
Works and slides credit: Oana Goga (http://www-npa.lip6.fr/~goga/)
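The plot summarizes matching quality as the distribution of the ground-truth account's rank; "matched to a set of 250 Twitter accounts" presumably means the true account appears among the top 250 candidates. A minimal sketch of computing such CDF points from ranked candidate lists; the function names and toy ranks are hypothetical.

```python
def ground_truth_rank(ranked_candidates, true_account):
    """1-based position of the true account in a ranked candidate list."""
    return ranked_candidates.index(true_account) + 1


def rank_cdf(ranks, cutoffs):
    """For each cutoff k, the fraction of users whose true account ranks within the top k."""
    n = len(ranks)
    return {k: sum(r <= k for r in ranks) / n for k in cutoffs}


# Toy example: ranks of the true Twitter account for four hypothetical Yelp users.
ranks = [1, 5, 300, 2000]
print(rank_cdf(ranks, [10, 250, 1000]))  # {10: 0.5, 250: 0.5, 1000: 0.75}
print(ground_truth_rank(["tw_bay", "tw_nyc"], "tw_nyc"))  # 2
```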
95. Problems Reformulated
• Many applications encourage heavy data sharing, and users follow
• Multimedia isn’t only a lot of data, it’s also a lot of information
• Users and engineers are often unaware of the (hidden) retrieval possibilities of shared (multimedia) data
• Local anonymization and privacy policies are ineffective against cross-site inference
99. Status Quo
• People will continue to want social networks and location-based services
• Industry and research will continue to improve retrieval techniques
• Government will continue to do forensics and intelligence gathering
100. What Now?
• Research might help to:
• quantify and qualify risk factors
• visualize and offer choices in UIs
• identify privacy-breaking information
101. Conclusion
• We should continue to explore multimedia retrieval
• At the same time we should:
• research methods to help mitigate risks and offer choice
• develop privacy policies and APIs that take into account multimedia retrieval
• educate users and engineers on privacy issues
104. Take Home Message
• Be aware of the risks of revealing your personal life online
• Think TWICE before you post something on Facebook/Twitter/...
105. Thank You!
Questions?
Joint work with:
Robin Sommer, Oana Goga, Venkatesan Ekambaram, Kannan Ramchandran, Luke Gottlieb, Howard Lei, Adam Janin, Oriol Vinyals, Trevor Darrell, Gerald Friedland
and many others.