1. Prepared for:
New Directions in the Science of Differential Privacy
March 2013
A Lifecycle Approach
to Information Privacy
Micah Altman
<Micah_Altman@alumni.brown.edu>
Director of Research, MIT Libraries
Non-Resident Senior Fellow, Brookings Institution
A Lifecycle Approach to Information Privacy
2. Collaborators*
• Privacy Tools for Sharing Research Data Project:
Edo Airoldi, Stephen Chong, Merce Crosas, Cynthia
Dwork, Gary King, Phil Malone, Latanya Sweeney, Salil
Vadhan
• Research Support
Thanks to the National Science Foundation (award 1237235),
the Sloan Foundation, the Massachusetts Institute of
Technology, and Harvard University.
* And co-conspirators
3. Related Work
Reprints available from:
micahaltman.com
• Comments on ANPRM: Human Subjects Protection,
http://dataprivacylab.org/projects/irb/Vadhan.pdf
• Privacy tools project proposal:
http://privacytools.seas.harvard.edu/full-project-description
4. Overarching challenges
• Law is evolving
– specification of technical requirements
– new legal concepts – “Right to be forgotten”
• Research is changing
– evidence base shifting:
reliant on big data, transactional data, new forms of data
– conduct of research: distributed, collaborative,
multi-institutional, multi-national
– Infrastructure is changing:
cloud & distributed third-party computation & storage
• Privacy analysis is advancing
– new computational privacy solution concepts
– new findings from reidentification experiments
– new methods for estimating utility/privacy tradeoffs
5. Shifting social science evidence base
How to deidentify without destroying utility?
• The “Netflix Problem”: large, sparse datasets that overlap can be
probabilistically linked [see Narayanan and Shmatikov 2008]
• The “GIS problem”: fine geo-spatial-temporal data are very difficult to
mask when correlated with external data [see Zimmerman & Pavlik
2008; Zan et al. 2013; Srivatsa & Hicks 2012]
• The “Facebook Problem”: it is possible to identify masked network data if only
a few nodes are controlled [see Backstrom et al. 2007]
• The “Blog problem”: pseudonymous communication can be linked
through textual analysis [see Novak, Raghavan, and Tomkins 2004]
Source: [Calberese 2008; Real Time Rome Project 2007]
6. CUSP aims for the Leading Edge
• Urban Informatics –
high-velocity localized social science
• Leading edge data –
sensors, crowd-sourcing
• Leading edge privacy needs –
privacy policy,
privacy-aware information management,
privacy ethics
7. Data Input/Output Approach
Published Outputs
* Jones * * 1961 021*
* Jones * * 1961 021*
* Jones * * 1972 9404*
* Jones * * 1972 9404*
* Jones * * 1972 9404*
Modal Practice
“The correlation between X and
Y was large and statistically
significant”
Summary statistics
Contingency table
Public use sample microdata
Information Visualization
8. Questions Generated from Data I/O Model
Solution Concepts
• Comparison of risks
across concepts
• Extension of solution
concepts range
Processing Stage
• How to apply DP to new analytic
methods?
– Bayesian methods
– Data mining methods
– Text analysis methods
• How to apply DP to different types of
“Microdata”
– Network data
– Text
– Geospatial traces
– Relations
Disclosure | Deterministic | Probabilistic
Individual record linkage | K-anonymity | Reidentification probability
Group attributes | K-anonymity + heterogeneity (e.g. l-diversity) | Threat analysis; SDC on skewed magnitude tables
Individual attributes | Attribute disclosure | Differential privacy; Distributional privacy; Bayesian-optimal privacy
Specified columns/rows | Private Multiparty Computation
Questions about transformation
– Imputation methods
– Computation efficiency
– Informational utility*
See for example:
- Dwork & Smith 2009
* “My, what a large ε you have, grandma!”
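Two of the deterministic solution concepts listed on this slide, k-anonymity and its heterogeneity extension l-diversity, can be illustrated with a minimal Python sketch. The record layout, the field names (`surname`, `birth_year`, `zip_prefix`, `diagnosis`), the sample values, and the helper function names are all assumptions made for illustration; none of them come from the slides.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group: the table is k-anonymous for this k."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return min((len(g) for g in groups.values()), default=0)


def l_diversity(records, quasi_identifiers, sensitive):
    """Smallest number of distinct sensitive values found in any group."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return min((len({r[sensitive] for r in g}) for g in groups.values()),
               default=0)


# Hypothetical masked microdata (illustrative values only).
records = [
    {"surname": "Jones", "birth_year": 1961, "zip_prefix": "021", "diagnosis": "flu"},
    {"surname": "Jones", "birth_year": 1961, "zip_prefix": "021", "diagnosis": "asthma"},
    {"surname": "Jones", "birth_year": 1972, "zip_prefix": "9404", "diagnosis": "flu"},
    {"surname": "Jones", "birth_year": 1972, "zip_prefix": "9404", "diagnosis": "flu"},
    {"surname": "Jones", "birth_year": 1972, "zip_prefix": "9404", "diagnosis": "flu"},
]
quasi_identifiers = ["surname", "birth_year", "zip_prefix"]

k = k_anonymity(records, quasi_identifiers)
l = l_diversity(records, quasi_identifiers, "diagnosis")
print(f"k = {k}, l = {l}")
```

In this toy table every record shares its quasi-identifiers with at least one other record, yet one group has a single diagnosis value, so membership in that group alone discloses the attribute. That gap is what the heterogeneity extensions are meant to address.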
9. Information Life Cycle Model
Creation/Collection
Storage/Ingest
Processing
Internal Sharing
Analysis
External dissemination/publication
Re-use
Long-term access
10. Legal/Policy Frameworks
Contract
Intellectual Property
Access Rights
Confidentiality
Copyright
Fair Use
DMCA
Database Rights
Moral Rights
Intellectual Attribution
Trade Secret
Patent
Trademark
Common Rule (45 CFR 46)
HIPAA
FERPA
EU Privacy Directive
Privacy Torts (Invasion, Defamation)
Rights of Publicity
Sensitive but Unclassified
Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …)
Classified
FOIA
CIPSEA
State Privacy Laws
EAR
State FOI Laws
Journal Replication Requirements
Funder Open Access
Contract
License
Click-Wrap
TOU
ITAR
Export Restrictions
11. Questions Generated by Lifecycle Model
• Which laws apply to each stage:
– are legal requirements consistent
across stages?
• How to align legal instruments:
– consent forms, SLAs, DUAs
• Optimizing privacy risk/utility/cost
across the research stages:
when is it more efficient to …
– apply disclosure limitation at the data
collection stage?
– use particular solution concepts at
particular stages?
– harmonize concepts/treatments across
stages?
• Policy design
– Policies to internalize future / public
stakeholder needs
– Policy equilibrium under different
privacy solution concepts
• Information reuse
– Bayesian priors
– Scientific verification and replication
• Infrastructure needs
– Data acquisition, storage, dissemination
– Identification, authorization,
authentication
– Metadata, protocols
Creation/Collection
Storage/Ingest
Processing
Internal Sharing
Analysis
External dissemination/publication
Re-use
Long-term access
Research methods
Data Management Systems
Legal / Policy Frameworks
Statistical / Computational Frameworks
12. Questions on Differential Privacy from
Information Lifecycle Analysis: Legal
• Legal requirements -- when does law …
– require exact answers? (DP does not give exact answers)
– give safe harbor if linkages are ‘only’ probabilistic? (DP provides safe
harbor in this case)
– require action based on “actual knowledge”? (How do we include strongly
informative priors in DP? When is DP not actually “worst case”?)
– require analysis of a specific unit of observation? (DP does not give
answers for individual units.)
– require balance of privacy and utility? (DP does not inherently balance, but
uses minimax: it maximizes utility subject to a given privacy constraint. What
is the appropriate choice of privacy constraint?)
• Legal instruments -- how to describe DP protections in a legally
coherent way for …
– service level agreements
– consent/deposit terms
– data usage agreements
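The points above that DP "does not give exact answers" and that a privacy constraint must be chosen can be made concrete with a minimal Python sketch of the standard Laplace mechanism applied to a counting query. The function names, the sample `ages` data, and the choice of ε = 0.5 are assumptions for illustration only; the slides do not prescribe a mechanism or a value of ε.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values, predicate, epsilon, rng):
    """Noisy count of the values satisfying `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    """
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: 40 people aged 30 and 60 people aged 70.
ages = [30] * 40 + [70] * 60
rng = random.Random(2013)  # seeded only to make the sketch repeatable

noisy = dp_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng)
print(f"true count = 60, released count = {noisy:.2f}")
```

A smaller ε gives a larger noise scale and therefore a less exact released count, which is why the choice of ε is the place where a legal requirement for exactness or for a particular privacy/utility balance would have to be expressed.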
13. Questions on Differential Privacy from
Information Lifecycle Analysis: System Design
• System design: potential increased implementation cost of DP:
– Information security -- hardening
– Information security – certification & auditing
– Model server development, provisioning, maintenance, reliability, availability
• System design: information security tradeoffs of DP… Interactive systems have larger
vulnerability:
– Availability risks: denial of service attack
– Availability/integrity risks: privacy budget exhaustion attacks
– Integrity risks: modification of delivered results (e.g. man-in-the-middle attacks)
– Secrecy/privacy: breach of authentication/authorization layer
• System design: optimizing privacy & utility across lifecycle
– When does limiting disclosive data collection (e.g. using randomized response, group aggregated
methods) dominate applying DP at the data analysis stage?
– When do restricted virtual data enclaves + public synthetic data dominate public DP queries (of the
same type)?
• System design: Information reuse
– How do you incorporate informative priors in DP privacy solution concept?
(When does the “Terry Gross” problem apply?)
– What’s required for ensuring scientific replication/verification of results produced by differentially
private model servers?
– How to do DP query on confidential data linked with externally provided microdata?
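The "privacy budget exhaustion" risk listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming basic sequential composition (the ε values of successive queries add up) and a single budget shared by all users of an interactive server; the class name `PrivacyBudget`, the exception name, and the numbers are hypothetical and not taken from the slides.

```python
class BudgetExhausted(Exception):
    """Raised when a query would exceed the total privacy budget."""


class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record the cost of one query, or refuse it."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance guards against floating-point round-off.
        if self.spent + epsilon > self.total + 1e-12:
            raise BudgetExhausted("no privacy budget left for this query")
        self.spent += epsilon


# One shared budget for the whole dataset (an assumed design).
budget = PrivacyBudget(total_epsilon=1.0)

# A single client floods the server with cheap queries ...
answered = 0
for _ in range(15):
    try:
        budget.charge(0.1)
        answered += 1
    except BudgetExhausted:
        break

# ... after which a legitimate researcher can no longer be served.
try:
    budget.charge(0.1)
    researcher_served = True
except BudgetExhausted:
    researcher_served = False

print(f"queries answered: {answered}, researcher served: {researcher_served}")
```

Under these assumptions the attacker needs no access to the confidential data at all: spending the shared budget is enough to deny service to everyone else, which is why the slide classes this as an availability/integrity risk rather than a secrecy risk.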
14. Questions on Differential Privacy from
Information Lifecycle Analysis: Policy Design
• Policy design: “market failures” for privacy goods
– Is there a market failure? How do we know?
– What is the nature of the market failure:
• Conditions on market structure/market power: Barriers to entry? Natural
monopoly/network effect? First-mover advantage, path dependency?
• Conditions on goods: excludability, rivalry, externality
• Conditions on exchange: transaction costs, agency problems, bounded
rationality, or informational asymmetry
• Policy design: policy equilibria
– When does enforcing a specific privacy concept yield socially
optimal solution?
– When is DP a prisoner’s dilemma?
(E.g. I contribute to a database for a small payment, since my
unilateral entry does not affect the result, but the equilibrium is that
the database is large and you learn substantially more about me than if the
database were small.)
15. Urban Instrumentation and Confidentiality
Specific data source
• Administrative records
• Transactions
• Traffic
• Health
• Mobile phones
• Microenvironment
• Crowdsource
Possible nosy questions…
• Were you fined?
• What did you buy?
• Where were you?
• Are you sick?
• How rich are you?
• Do you have a meth lab?
Categories
• Infrastructure
• Environment
• People
• Community – self-identified
neighborhood, school district,
voting precinct, election district,
police beat, crime locations,
grocery prices, produce availability
Privacy implications
• Business confidentiality
• Security & safety – infrastructure
chokepoints; police coverage;
endangered species; animal
testing labs; environmental
hazards
• Personal privacy
16. Interdisciplinary Research Required
Law
Social Science
Public Policy
Data Collection Methods (Research Methodology)
Data Management (Information Science)
Statistics
Computer Science
• Privacy-aware data-management
systems
• Methods for confidential data
collection and management
• Creative-Commons-like modular license plugins
for privacy data use; consent; terms of service
• Model legislation – for modern privacy concepts
• Privacy requirements taxonomy and
classification
• Game theoretic/social-choice models of social
privacy equilibria under different privacy
policies
17. References
• Backstrom, Lars, Cynthia Dwork, and Jon Kleinberg. "Wherefore art thou r3579x?: anonymized social
networks, hidden patterns, and structural steganography." Proceedings of the 16th international
conference on World Wide Web. ACM, 2007
• C. Dwork, A. Smith, 2009, “Differential Privacy for Statistics: What we Know and What we Want to
Learn”, Journal of Privacy and Confidentiality (2009) 1(2) 135–154
• Narayanan, Arvind, and Vitaly Shmatikov. "Robust de-anonymization of large sparse
datasets." Security and Privacy, 2008. SP 2008. IEEE Symposium on. IEEE, 2008.
• Novak, Jasmine, Prabhakar Raghavan, and Andrew Tomkins. "Anti-aliasing on the web." Proceedings
of the 13th international conference on World Wide Web. ACM, 2004.
• M. Srivatsa and M. Hicks. 2012. Deanonymizing mobility traces: using social network as a side-channel.
In Proceedings of the 2012 ACM conference on Computer and communications security (CCS '12).
ACM, New York, NY, USA, 628-637. DOI=10.1145/2382196.2382262
http://doi.acm.org/10.1145/2382196.2382262
• Bin Zan, Zhanbo Sun, Marco Gruteser, and Xuegang Ban. 2013. Linking anonymous location traces
through driving characteristics. In Proceedings of the third ACM conference on Data and application
security and privacy (CODASPY '13). ACM, New York, NY, USA, 293-300.
DOI=10.1145/2435349.2435391 http://doi.acm.org/10.1145/2435349.2435391
• Zimmerman, D. L., Pavlik, C. (2008). Quantifying the effects of mask metadata disclosure and
multiple releases on the confidentiality of geographically masked health data. Geographical Analysis
40.1, 52 (25).
This work by Micah Altman (http://micahaltman.com) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
----- Meeting Notes (12/14/12 15:33) -----
Common law: no probability; claims fail by showing lack of direct harm
Public corporation data breaches: stock law