The creators of technical infrastructure are under social and legal pressure to comply with expectations that can be difficult to translate into computational and business logics. The dissertation presented in this talk bridges this gap through three projects that focus on privacy engineering, information security, and data economics, respectively. These projects culminate in a new formal method for evaluating the strategic and tactical value of data. This method relies on a core theoretical contribution building on the work of Shannon, Dretske, Pearl, Koller, and Nissenbaum: a definition of information flow as a channel situated in a context of causal relations.
Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics
1. Context, Causality, and Information Flow:
Implications for Privacy Engineering, Security,
and Data Economics
A presentation of doctoral dissertation research by
Sebastian Benthall
UC Berkeley School of Information
2. Outline
● Motivation
● Overview of Projects
● Project #1: Contextual Integrity through the Lens of Computer Science
● Disciplinary Bridge
● Project #2: Origin Privacy: Causality and Data Protection
● Project #3: Data Games and the Value of Information
● Concluding remarks
THIS IS A FUN INTERACTIVE TALK:
An image slide means it is time to ask a question.
I’ll answer one question per picture.
15. Social Norms
Market
Law
Technology
Project #1
“Contextual Integrity
through the Lens of
Computer Science”
CS, Statistics, EE
Law Social Philosophy
Contextual Integrity
Economics
...a typical I School dissertation...
Surveys the use of Contextual Integrity, a theory of privacy norms, in Computer Science.
Identifies theoretical gaps in CI and opportunities for innovation in privacy CS.
16. Social Norms
Market
Law
Technology
Project #2
“Origin Privacy:
Causality and Data
Protection”
CS, Statistics, EE
Law Social Philosophy
Economics
...a typical I School dissertation...
Identifies an ambiguity in legal information flow restrictions, which are based on both semantics and origin.
Resolves this ambiguity through theoretical contribution: situated information flow.
Shows application of this contribution to information security in embedded systems.
17. Social Norms
Market
Law
Technology
Project #3
“Data Games and the
Value of Information”
CS, Statistics, EE
Law Social Philosophy
Economics
...a typical I School dissertation...
Invents data economics to fill gap in economic theory.
Theoretical contribution: data games, mechanism design for information flow.
Answers: what is the value of information? Demonstrates with several examples.
21. Contextual Integrity through the Lens of Computer Science
Sebastian Benthall
Seda Gürses
Helen Nissenbaum
A presentation of S. Benthall, S. Gürses and H. Nissenbaum. Contextual Integrity through the Lens of Computer
Science. Foundations and Trends in Privacy and Security, vol. 2, no. 1, pp. 1–69, 2017
Included as Chapter 2 of Sebastian Benthall’s doctoral dissertation, titled
“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data
Economics”
22. Project Goals
● Characterize different ways various CS efforts have interpreted and applied
Contextual Integrity (CI);
● Identify gaps in both contextual integrity and its technical projection that this body
of work reveals;
● Distill insights from these applications in order to facilitate future applications of
contextual integrity in privacy research and design.
“Making CI more actionable for computer science and computer scientists.”
23. Background: What is Contextual Integrity?
A social philosophy of privacy developed by Helen Nissenbaum.
● Privacy is appropriate information flow.
● Appropriateness depends on social context;
social contexts have information norms.
● Norms are adapted to societal values, contextual purposes, and individual ends.
● Norms are structured with five parameters:
○ (1) Sender, (2) Receiver, (3) Subject, (4) Attribute, (5) Transmission Principle
Example: In the health care context, there is a norm that when (1) a patient gives
information about (3) their (4) health to (2) a doctor, that information is treated
confidentially.
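The five parameters above can be written down as a small data structure. The following is a minimal sketch, not part of the original talk: the class name `InformationFlow`, the `context` field, and the rule that a flow is appropriate exactly when it matches a recorded norm are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationFlow:
    """A flow described by CI's five parameters, plus its social context."""
    context: str
    sender: str
    receiver: str
    subject: str
    attribute: str
    transmission_principle: str


# Hypothetical norm set holding only the health care example from the slide.
HEALTH_CARE_NORMS = {
    InformationFlow(
        context="health care",
        sender="patient",
        receiver="doctor",
        subject="patient",
        attribute="health",
        transmission_principle="confidentiality",
    )
}


def is_appropriate(flow, norms):
    """Assumption: a flow is appropriate iff it matches a norm exactly."""
    return flow in norms


to_doctor = InformationFlow(
    "health care", "patient", "doctor", "patient", "health", "confidentiality"
)
to_employer = InformationFlow(
    "health care", "patient", "employer", "patient", "health", "confidentiality"
)
```

Here `is_appropriate(to_doctor, HEALTH_CARE_NORMS)` holds, while the same information sent to an employer matches no norm. Real norms are richer than exact matching; the sketch only shows where each parameter sits.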
24. Background: Context in computing and policy
● Contextual Integrity:
○ Privacy as appropriate information flow according to contextual norms. First paper: 2004.
○ Uptake in computer science since 2006.
● Context in ubiquitous computing
○ An earlier computer science research tradition, pioneered by e.g. Dey in 2001, is also concerned
with privacy.
○ “Context” refers to a situation: facts about the user, computer, environment. Location, identity,
state…
○ Dourish (2004) publishes a critique, arguing for interactional (not representational) context in
UbiComp.
● Context in policy
○ Excitement about privacy (FTC, White House, WEF) as respect for context motivates computer
science interest in Contextual Integrity...
○ … but within CS, multiple traditions are blended together.
26. Study: research method
● Developed analytic template based on research questions.
● Searched for CS papers that claim to be using CI. (We found 20)
● Applied analytic template systematically to each paper.
● Used results to derive answers to each research question.
A systematic review of computer science literature using Contextual Integrity.
27. Study: research questions
● RQ1. For what kind of problems and solutions do computer scientists use CI?
○ Particular subfields of CS.
● RQ2. How have the authors dealt with the conceptual aspects of CI?
○ Social contexts, norms with specific parameters…
● RQ3. How have the authors dealt with the normative aspects of CI?
○ Norms are derived from social contexts, which are adaptations of a differentiated society.
● RQ4. Do the researchers expand on CI?
○ Where do CS researchers need to fill gaps or add to CI to make concrete systems work?
28. Results: RQ1 Architecture
CS researchers used CI across a few classes of technical architecture.
● User interfaces and experiences. These focus on an individual user’s activity
and preferences, rather than social norms.
● Infrastructure. Catering to a large set of users and diverse applications.
○ Social platforms. Technology that spans multiple social contexts.
○ Technical platforms. Technology that mediates many different other technologies. What about
the operators of these platforms?
○ Formal models. Frameworks to be used in design, but without implementation details.
● Decentralization. Decentralized architectures mirror complexity of society
itself. An interesting area for future research.
29. Findings: RQ1 Architecture
Theoretical Gaps:
- “Modular Contextual Integrity”,
faceting CI and giving guidelines
for design and research at specific
levels of the technical stack
- Specific guidance for
infrastructure design
Calls to Action:
- Be explicit about how system is
situated among other actors
(operators, moderators, etc.)
- Develop formal models that
connect user preferences with
contextual norms
30. Results: RQ2 What did they mean by context?
CS researchers had varying understandings of ‘context’; e.g. sphere vs. situation.
● Substantiality
○ Abstract: Hospitals in general.
○ Concrete: Mount Sinai Beth Israel hospital.
● Domain
○ Social: A classroom with a teacher and students is a social context.
○ Technical: A language education mobile app.
● Valence
○ Normative: A conference Code of Conduct is an account of norms inherent in a context.
○ Descriptive: A list of attendees, keynote speakers, and program committee members is a description of the context.
● Stability (Dourish, ‘04)
○ Representational: The Oval Office in the White House is an easily represented context.
○ Interactional: A flash mob is an interactional context.
31. Findings: RQ2 Contexts
Theoretical Gaps:
- CI needs an account of how social
spheres connect to sociotechnical
situations
- What about interactional
contexts?
Calls to Action:
- Specifically address how ‘context’
is used, and when technology
bridges two or more meanings of
the term
- Detail flows of information to
third parties; what context is
that?
32. Results: RQ3 Source of Normativity
CI is specific about where norms come from: social adaptation to ends, purposes,
and values within differentiated spheres of society.
CS papers have not adopted this source of normativity entirely. Instead, they use:
● Compliance and Policy. System is designed to comply with existing laws and
policies.
● Threats. System is designed with a Threat Model, typical of security research.
● User preferences and expectations. Individual user preferences and/or
expectations solicited.
● Engagement. Users interact with system to determine norms dynamically.
33. Findings: RQ3 Normativity
Theoretical Gaps:
- Connect CI’s metaethical theory
with concrete sources of
normativity familiar to CS
- Spheres to threats?
- Spheres to user expectations?
- Spheres to the law?
Calls to Action:
- Measuring norms, not
expectations
- Supporting user engagement
around identifying norms
- Technical solutions for handling
conflicts over norms
34. Results: RQ4 Expanding CI
● Technological adaptation to changing social conditions.
● Technology operating in multiple contexts at once, or addressing context clash,
where activity in different contexts interact.
● Addressing the temporality and duration of information, and its effect on
privacy
● User decision making with respect to privacy and information flow controls.
35. Findings: RQ4 Expanding CI
Theoretical Gaps:
- Develop account of normative
change and adaptation
- Address the questions around
multiple interacting contexts
- Address time: duration of
information, forgetting, etc.
- What about user choice?
Calls to Action:
- More modeling CI from
information theory, information
flow security
- CI and differential privacy?
38. Bridge: Themes from Project #1
Contextual Integrity needs to be expanded...
● ...to account for social and technological platforms that span multiple social
spheres, perhaps by introducing an “operator” context.
● ...to account for more of the meanings of “context” that range from abstract
social spheres to concrete sociotechnical situations.
● ...for clarity on how social norms form to reflect ends, purposes, and values in
society, and the relationship between these norms and the law.
● ...to address the challenging cases where multiple social contexts collide or
clash.
39. Bridge: Dealing with context collisions
What society wants from privacy...
[Figure: social contexts labeled Professional, Personal, Med, Edu, Fin]
What we get…
… life and technology make things complicated.
[Figure: the same contexts, Professional, Personal, Med, Edu, Fin]
40. The problem with information semantics
Contextual Integrity says there are five parameters of an information norm:
Sender, Receiver, Subject, Attribute, and Transmission Principle.
[Patient, Doctor, Patient, Health, Confidentiality]
But... information topics are indeterminate. E.g.:
41. What does information mean?
● In CI, information gets its meaning from its context: how actors in roles
normatively communicate with each other.
○ The meaning of information and the contextual practices are mutually constitutive.
● When information flows in a new way (between situations), that information
gets new meanings.
○ E.g. When your relatives see a Facebook post intended for friends, it gives them the opportunity
to make judgments about you that were not the intended meaning.
● Technical context collapse is challenging not because it violates norms, but
because it lies beyond our social understanding yet creates information flows
with new social meaning that may be disruptive to social life.
43. What does “information” mean?
According to Dretske (1981) (epistemologist, philosopher of mind),
building on Shannon (1948), information is a naturalistic and causal
property:
Information that P is the message/signal needed for a suitably equipped
observer to learn P, due to the nomic associations of the signal with P.
Nomic means “law-like”, as in scientific law.
The red light carries the information that the train is coming because
(lawfully, regularly) the light is red if and only if the train is coming.
44. In the following projects we will update Dretske’s theory.
Using insights from statistics and computer science, we
will arrive at a specific formal concept of
situated information flow
for cross-disciplinary use.
45. Bayesian Networks
Bayesian Networks (BN) are a formalism for
representing the relationship between random events.
A BN has:
● A directed, acyclic graph of nodes, representing random variables, connected
by edges
● A conditional probability distribution (CPD) for each node, which is the
probability distribution of its random variable, conditional on its parents.
Together, these define a joint probability distribution over all the random variables,
with some important independence relations qualitatively inferable from the graph.
[Figure: example Bayesian network with nodes A, B, C, D, E]
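To make the definition concrete, here is a minimal sketch of how a graph plus one CPD per node yields a joint distribution: the joint probability of an assignment is the product of each node's CPD entry given its parents' values. The three-node network A -> C <- B and every number in it are hypothetical, chosen only for illustration; they are not the network shown on the slide.

```python
import itertools

# Hypothetical network A -> C <- B over binary variables (an assumption).
PARENTS = {"A": (), "B": (), "C": ("A", "B")}

# Each CPD gives P(node = 1 | parent values); numbers are illustrative only.
CPDS = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}


def joint_probability(assignment):
    """Multiply each node's CPD entry given the values of its parents."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_one = CPDS[node][tuple(assignment[p] for p in parents)]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


def all_assignments():
    """Enumerate every assignment of 0/1 values to the nodes."""
    names = list(PARENTS)
    for values in itertools.product((0, 1), repeat=len(names)):
        yield dict(zip(names, values))


# The joint sums to one, and marginals follow by summing out other nodes.
total = sum(joint_probability(a) for a in all_assignments())
p_c = sum(joint_probability(a) for a in all_assignments() if a["C"] == 1)
```

Enumeration is exponential in the number of nodes, so it only suits toy networks like this one.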
46. What is information flow, really?
Pearl’s (2000) system for understanding causality is widely
acknowledged and applied in statistics, philosophy, machine learning,
cognitive psychology, social science research methods, …
Events are part of a causal
structure represented as a
directed acyclic graph.
This structure determines the
conditional dependency
of events on each other.
[Figure: causal graph with nodes Recession, Earthquake, Burglary, Alarm]
47. What is information flow, really?
The alarm carries information about earthquakes, burglaries, and
recessions. (Topics are indeterminate).
In this model, recessions and earthquakes are independent
when nothing else is observed.
I(Recession; Earthquake) = 0
(They carry no information about each other;
they have no mutual information.)
[Figure: causal graph with nodes Recession, Earthquake, Burglary, Alarm]
48. What is information flow, really?
The alarm carries information about earthquakes, burglaries, and
recessions. (Topics are indeterminate).
In this model, recessions and earthquakes
become conditionally dependent
once we know the alarm has gone off.
[Figure: causal graph with nodes Recession, Earthquake, Burglary, Alarm]
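The two claims about the alarm model can be checked numerically. The sketch below assumes the graph is Recession -> Burglary -> Alarm <- Earthquake, which matches the independence statements on the slides but is not spelled out in the transcript, and uses made-up probabilities. It computes the mutual information between Recession and Earthquake, first with nothing observed and then given that the alarm has gone off.

```python
import itertools
import math

# All probabilities below are hypothetical, chosen only for illustration.
P_RECESSION = 0.2
P_EARTHQUAKE = 0.1
P_BURGLARY = {0: 0.05, 1: 0.3}  # P(Burglary=1 | Recession)
P_ALARM = {(0, 0): 0.01, (0, 1): 0.7, (1, 0): 0.9, (1, 1): 0.95}  # keyed by (Burglary, Earthquake)


def bern(p, x):
    """Probability that a Bernoulli(p) variable takes the value x."""
    return p if x == 1 else 1.0 - p


def joint(r, e, b, a):
    """Joint probability under the assumed causal graph."""
    return (bern(P_RECESSION, r) * bern(P_EARTHQUAKE, e)
            * bern(P_BURGLARY[r], b) * bern(P_ALARM[(b, e)], a))


def recession_earthquake_table(alarm=None):
    """P(Recession, Earthquake), optionally conditioned on Alarm=alarm."""
    table = {}
    for r, e in itertools.product((0, 1), repeat=2):
        table[(r, e)] = sum(
            joint(r, e, b, a)
            for b in (0, 1) for a in (0, 1)
            if alarm is None or a == alarm
        )
    total = sum(table.values())
    return {key: value / total for key, value in table.items()}


def mutual_information(table):
    """Mutual information in bits for a joint table over two binary variables."""
    pr = {r: table[(r, 0)] + table[(r, 1)] for r in (0, 1)}
    pe = {e: table[(0, e)] + table[(1, e)] for e in (0, 1)}
    return sum(
        p * math.log2(p / (pr[r] * pe[e]))
        for (r, e), p in table.items() if p > 0
    )


mi_marginal = mutual_information(recession_earthquake_table())
mi_given_alarm = mutual_information(recession_earthquake_table(alarm=1))
```

With these numbers `mi_marginal` is zero up to floating-point error, while `mi_given_alarm` is strictly positive: observing the alarm makes the two causes informative about each other.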
49. Information flow: a unified model
1. Privacy is appropriate information flow.
(Nissenbaum)
2. Information flow is a message or signal from which
something can be learned because of nomic association.
(Dretske)
3. The nomic associations are the conditional
dependencies derived from causal structure. (Pearl)
The meaning of data is a function of the processes that
generated it, and their context.
51. Causality
What makes these causal models is Pearl’s do-calculus: an intervention on an event
severs the links from its parent nodes.
An intervention can be made by anything exogenous to the model.
[Figure: the causal graph (Recession, Earthquake, Burglary, Alarm) with a “do” intervention]
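The difference between observing an event and intervening on it can be shown on a two-node fragment, Burglary -> Alarm. This is a minimal sketch with hypothetical probabilities: observing the alarm raises the probability of a burglary, whereas forcing the alarm on replaces its CPD with a constant, severs the link from its parent, and leaves the burglary probability at its prior.

```python
# Hypothetical numbers for a two-node fragment Burglary -> Alarm.
P_BURGLARY = 0.1
P_ALARM_GIVEN_BURGLARY = {0: 0.01, 1: 0.9}


def joint(burglary, alarm, alarm_cpd):
    """Joint probability, with the alarm's CPD passed in as a function."""
    p_b = P_BURGLARY if burglary else 1.0 - P_BURGLARY
    p_a = alarm_cpd(burglary) if alarm else 1.0 - alarm_cpd(burglary)
    return p_b * p_a


def p_burglary_given_alarm_on(alarm_cpd):
    """P(Burglary=1 | Alarm=1) under whichever alarm CPD is in force."""
    numerator = joint(1, 1, alarm_cpd)
    return numerator / (numerator + joint(0, 1, alarm_cpd))


# Observation: the alarm follows its natural CPD and depends on Burglary.
observed = p_burglary_given_alarm_on(lambda b: P_ALARM_GIVEN_BURGLARY[b])

# Intervention do(Alarm=1): the CPD ignores its parent, cutting the edge.
intervened = p_burglary_given_alarm_on(lambda b: 1.0)
```

Here `observed` is roughly 0.91, while `intervened` equals the prior of 0.1: an alarm that was switched on from outside the model says nothing about burglaries.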
52. Causality
“But this isn’t causality!
What about Rubin, treatment effects, randomized controlled experiments, ….”
- an economist in the audience
Pearlian causation fits how we experience and reason about causality (e.g., Sloman).
Interventionist causation has support from philosophers (e.g., Woodward).
It is compatible with other methods of causal inference and model fitting (e.g.,
Gelman).
It is used widely in social sciences like demography and sociology (e.g., Elwert).
It is the consensus view. We should use it!
56. Origin Privacy: Causality and Data Protection
Sebastian Benthall
Anupam Datta
Michael Tschantz
A technical report.
Included as Chapter 4 of Sebastian Benthall’s doctoral dissertation, titled
“Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data
Economics”
57. Origin Privacy: Highlights
● Policy motivations:
○ Origin and topic information flow restrictions in the law
○ Bounded information processing systems
● Use the theory of situated information flow
● Model a system embedded in its environment
● This uncovers an interesting class of security threats due to a confusion
between caused and embedded inputs.
(There’s a lot more in the chapter…)
58. Origin Privacy: Policy requirements: HIPAA
The HIPAA Privacy Rule defines psychotherapy notes as
notes recorded by a health care provider who is a mental health professional
documenting or analyzing the contents of … [a] counseling session
Psychotherapy notes are more protected than other protected health information,
intended only for use by the therapist.
These restrictions are tied to the provenance of the information: the counseling
session.
59. Origin Privacy: Policy requirements: GLBA
The Privacy Rule protects a consumer's "nonpublic personal information" (NPI).
● any information an individual gives you to get a financial product or service (e.g.,
name, address, income)
● any information you get about an individual from a transaction involving your
financial product(s) or service(s) (for example, the fact that an individual is your
consumer or customer),
● any information you get about an individual in connection with providing a financial
product or service (for example, information from court records or from a consumer
report).
There are origin requirements, but also more general subject requirements.
(from FTC.gov)
60. Claim: policies use origin and meaning based information flow restrictions in an
ambiguous way because:
(1) real information flow is situated
(2) the causal context determines:
- The origin
- The nomic associations (meaning)
Policy ambiguity problem solved!
[Figure: causal graph with nodes Pregnancy, Purchases, Advertisements]
61. Policy requirements: PCI DSS
“The PCI DSS security requirements apply to all system components included in or
connected to the cardholder data environment. The cardholder data environment
(CDE) is comprised of people, processes and technologies that store, process, or
transmit cardholder data or sensitive authentication data.”
PCI DSS determines the domain in which it applies in terms of the physical
connections between components.
Could PCI DSS be enforced or complied with if it applied to system components
unconnected from the CDE?
70. The point
● Probabilistic modeling of situated information flows can express both an
information processing system and its environment.
● Different security properties can be mapped onto this model and onto different
model conditions (interventions, observations)
● This gives us a fine-grained way to do compliance engineering.
73. The story so far...
● Social expectations of privacy may be expressed as norms of information flow
indexed to social spheres (Contextual Integrity)...
● … but our situation today is messy; our contexts collide because of our technical
infrastructure.
● Our complex situation means that we have lost control over what our
information means. Topics (part of the structure of norms) are indeterminate.
● Moving forward, we should scaffold our theory of privacy with situated
information flow, and build up to normative theory.
74. Data Games and the Value of Information
Goals (narrow version):
● Address the problem raised by Chapter 1 about how to model and design for
cross-context information flows in infrastructure…
● ...using the insights about situated information flow…
● ...to understand the economic impact of data protection laws.
76. U.S. Data Protection Laws
● Intellectual property laws
○ Since Feist v. Rural (1991), data (facts) are not protected by copyright.
○ Samuelson (2000) argues intellectual property won’t work for privacy because property is
alienable, but privacy rights aren’t.
● Confidentiality and sectoral privacy laws. HIPAA, GLBA, FERPA,
attorney-client privilege.
○ All tied to specific sectors or spheres.
○ They do reduce information flow outside of the situations where they apply.
○ But they do not regulate “the gaps”
● FTC notice-and-consent self-regulatory standard
○ Paradox: the more technically and legally detailed the notices, the more ignorant the consent!
○ Nobody thinks this is working.
77. E.U. General Data Protection Regulation
● It’s based on privacy rights. (Compare with IP)
○ Sort of like a property right, but different.
● Omnibus. It covers all the cases. (Compare with sectoral laws)
● New obligations protect the rights:
○ Data minimization says don’t keep or process data without an agreed-upon reason.
○ Also some general exceptions to data protection, which may erode the protections...
● Consent is given to particular purposes of use.
○ A purpose is less complicated than legalese or technical data flow, so better notices?
Purpose-binding in the GDPR is reminiscent of Contextual Integrity, but is based on
rights not norms.
78. Law and economics for data?
● There is an important legal tradition of law and economics, using economic
theory to inform legal judgments.
● Do we have one for the data economy?
● To better design data protection policy (and antitrust? etc.), we need economic
theory that captures the economic impact on everyone involved (data
processors, data subjects, and others)...
79. Data Games and the Value of Information
Goals (real agenda):
● Using the insights about situated information flow…
● … develop a new tool, data games, for understanding the value of information ...
● … to start a new field of inquiry, data economics, that can better understand the
foundational principles of the information economy!
“Surely, that has been done before,” you say.
80. Contextualism in privacy economics
● “The Economics of Privacy,” by Acquisti, Taylor, and Wagman (2016) surveys
the existing privacy literature.
● They judge that economics can only ever deal with privacy in a contextually
specific way.
● While this sounds nice, it runs into the same problem as CI! Namely, …
● We know the most important practice in the data economy is data reuse, i.e., use
of data collected in one context for another!
81. Something new!
● We need a new data economics! Really!
● We need a way to model the outcomes of creating and destroying information
flows between strategic actors.
● The difference in outcome for each actor is the value of information.
82. We need a new tool to measure the value of information.
We will start with situated information flow
and add features for game theory
and mechanism design.
We can call this new tool a
data game
83. Multi-Agent Influence Diagrams (MAIDs)
MAIDs: Bayesian Networks + game theory. (Daphne Koller & Brian Milch, ‘03)
They have a set of agents and a directed acyclic graph with three kinds of nodes:
Chance variables. Random variables with a CPD conditioned on their
parents in the graph.
Decision variables. They have an associated player. They
do not have a CPD (this is chosen by the player later).
Utility variables. These have a CPD and an associated player. They
may not have any descendants.
[Figure: example diagram with nodes X, Y, Z]
84. Multi-Agent Influence Diagrams (MAIDs)
To “play” a MAID:
1) Each player simultaneously chooses a CPD for
every decision node they control. This is their
strategy; together, the players’ strategies form the
strategy profile, σ.
2) The strategy profile turns the MAID into a Bayesian
Network, which is sampled.
3) The sum of the sampled values of each player’s
utility variables is awarded as that player’s payoff.
[Figure: MAID with nodes W, X1, X2, Y, Z1, Z2]
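The three steps of playing a MAID can be sketched in a few lines. The game below is a hypothetical toy, not the diagram from the slide: a chance node W (a fair coin), a decision X1 by player 1 who observes W, a decision X2 by player 2 who observes X1, and utilities that pay 1 to each player whose decision equals W. A strategy is a CPD per decision node, written here as a function returning the probability of choosing 1.

```python
import random


def sample_bernoulli(p, rng):
    """Draw 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0


def play_once(strategy, rng):
    """Sample the Bayesian network induced by the strategy profile once."""
    w = sample_bernoulli(0.5, rng)                 # chance node W
    x1 = sample_bernoulli(strategy["X1"](w), rng)  # player 1 observes W
    x2 = sample_bernoulli(strategy["X2"](x1), rng)  # player 2 observes X1
    # Utility nodes: each player is paid 1 for matching W.
    return {"player1": float(x1 == w), "player2": float(x2 == w)}


def average_payoffs(strategy, n=2000, seed=0):
    """Estimate expected payoffs by repeated sampling."""
    rng = random.Random(seed)
    totals = {"player1": 0.0, "player2": 0.0}
    for _ in range(n):
        for player, utility in play_once(strategy, rng).items():
            totals[player] += utility
    return {player: total / n for player, total in totals.items()}


# Player 1 reports W truthfully; player 2 either copies X1 or ignores it.
copying = {"X1": lambda w: float(w), "X2": lambda x1: float(x1)}
ignoring = {"X1": lambda w: float(w), "X2": lambda x1: 0.5}

payoffs_copying = average_payoffs(copying)
payoffs_ignoring = average_payoffs(ignoring)
```

When player 2 copies X1, both players always match W; when player 2 ignores X1, their average payoff falls to about one half. The edge from X1 to X2 is exactly the kind of information flow whose value a data game is meant to measure.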
86. Data Games: Optional Edge
An optional edge (dotted arrow) implies the
diagram represents two different games, an open
and a closed case.
Note that the edge is an information flow.
We can now reason systematically about
consequences of allowing an information flow.
(This is mechanism design).
[Figure: MAID with nodes W, X1, X2, Y, Z1, Z2 and an optional (dotted) edge]
87. Let’s look at two data games
for economic contexts that are well-understood
88. Example: Principal/Agent
[Figure: data game with nodes V, B, Up, Ua]
Earliest privacy economics argument (?) by Richard Posner (1981):
Employers depend on information about potential hires (V) to make efficient
decisions (B).
More privacy means less information means less efficiency.
89. Example: Principal/Agent
E(U)            Open                           Closed
Principal       (E(X | X > w) - w) P[X > w]    (x - w)[E(X) > w]
Agent           w P[X > w]                     w[E(X) > w]
Agent, x > w    w                              w[E(X) > w]
Agent, x < w    0                              w[E(X) > w]
[Figure: data game with nodes V, B, Up, Ua]
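The table's entries can be evaluated for a concrete distribution. The sketch below is an illustration under stated assumptions: the agent's value X is uniform on 1..10 and the wage w is 4 (both hypothetical), the principal hires exactly when the value, or its expectation in the closed case, exceeds w, and the closed-case entry for the principal is read as the expectation E(X) - w.

```python
def principal_agent_payoffs(values, probs, wage):
    """Expected utilities with (open) and without (closed) the flow V -> B."""
    pairs = list(zip(values, probs))
    mean_value = sum(x * p for x, p in pairs)

    # Open: the principal sees x and hires only when x > wage.
    p_hire_open = sum(p for x, p in pairs if x > wage)
    principal_open = sum((x - wage) * p for x, p in pairs if x > wage)

    # Closed: the principal hires everyone iff E(X) > wage.
    hires_closed = 1.0 if mean_value > wage else 0.0

    return {
        "open": {"principal": principal_open, "agent": wage * p_hire_open},
        "closed": {
            "principal": (mean_value - wage) * hires_closed,
            "agent": wage * hires_closed,
        },
    }


# Hypothetical inputs: X uniform on 1..10, wage 4.
result = principal_agent_payoffs(list(range(1, 11)), [0.1] * 10, wage=4)
```

With these inputs the principal expects 2.1 when the flow is open and 1.5 when it is closed, while the agent expects 2.4 open and 4.0 closed. This is Posner's point in miniature: the flow makes hiring more efficient for the principal, at the expense of low-value agents.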
90. Example: Price discrimination
[Figure: data game with nodes V, R, B, Uf, Uc]
An important economic use of personal information is price differentiation (Shapiro
and Varian, 1998).
The firm chooses its price (R) with or without knowledge of the consumer’s demand
(V). The consumer chooses whether or not to buy (B) after getting the price.
91. Example: Price discrimination
E(U)                Open     Closed
Firm                x - ϵ    z* P[X > z]
Consumer            ϵ        (x - z*) P[X > z]
Consumer, x > z*    ϵ        x - z*
Consumer, x < z*    ϵ        0
z* = argmax_z E[z P(z < x)]
[Figure: data game with nodes V, R, B, Uf, Uc]
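The same kind of calculation works for the price discrimination game. This sketch assumes a hypothetical demand X uniform on 1..10, restricts the firm to a grid of candidate prices, and treats the consumer as buying exactly when x exceeds the price. In the open case it reports the firm's expectation E(X) - ϵ, which is an interpretation of the table's x - ϵ entry.

```python
def price_discrimination_payoffs(values, probs, candidate_prices, epsilon=0.01):
    """Expected utilities with (open) and without (closed) the flow V -> R."""
    pairs = list(zip(values, probs))

    def revenue(price):
        """Expected revenue of a single posted price: z * P(X > z)."""
        return price * sum(p for x, p in pairs if x > price)

    # Closed: the firm posts the revenue-maximizing uniform price z*.
    z_star = max(candidate_prices, key=revenue)
    consumer_closed = sum((x - z_star) * p for x, p in pairs if x > z_star)

    # Open: the firm charges each consumer x - epsilon.
    mean_value = sum(x * p for x, p in pairs)

    return {
        "z_star": z_star,
        "open": {"firm": mean_value - epsilon, "consumer": epsilon},
        "closed": {"firm": revenue(z_star), "consumer": consumer_closed},
    }


# Hypothetical inputs: X uniform on 1..10, integer candidate prices.
grid = list(range(1, 11))
result = price_discrimination_payoffs(grid, [0.1] * 10, grid)
```

On this grid the uniform price is z* = 5, giving the firm 2.5 and consumers 1.5 in expectation. Opening the flow raises the firm's payoff to about 5.49 and leaves each consumer only ϵ.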
93. Let’s look at two data games
for economic situations that have not yet been studied
94. Example: Expert Services
The expert knows some specialized facts
about the world (W) (e.g., medicine, law, the
web).
The client’s personal character traits (C)
determine the value of each of m actions.
The expert, who may or may not know the
character traits, provides a recommendation
(R). The client takes an action (A).
The incentives of the expert and the client
are aligned (no conflicts of interest).
[Figure: data game with nodes W, V, C, R, A, Uf, Uc]
95. Example: Expert Services
First observation:
If the domain of R allows for enough bits (in
the Shannon sense) for the expert to encode
all the information in W, then personalization
is irrelevant.
This demonstrates a fundamental link
between Shannon information theory and
data economics: information bottlenecks
matter.
[Figure: data game with nodes W, V, C, R!, A, Uf, Uc]
96. Example: Expert Services
Second observation:
If the domain of R is narrow compared to the
expert knowledge W, then personalization
(an open edge C -> R) does improve outcomes
for the expert and client.
The value of a personalized service is
efficient dissemination of information
through a small channel.
[Figure: data game with nodes W, V, C, R, A, Uf, Uc]
99. Cross-context flows: the point
Data is valuable not as a good, but as a strategic resource.
It’s not consumed; it is part of the structure of the game of social and economic
relations itself.
Market externalities are the rule, not the exception.
Traditional theories of market equilibrium, industrial organization,
etc. are not going to cut it. We need to start the field of data
economics.
100. Tactical vs. Strategic Flows
● When considering an optional information flow, we can compare the
equilibrium outcomes of the open and closed cases. Call this the strategic
consequences of the information flow.
● We can also consider the outcome of opening the flow, while keeping the
closed equilibrium strategy for all players except the recipient of the
information. Call this the tactical consequences of the flow.
We can use this to make sensitive distinctions about, e.g. the effects of a data breach
vs. the chilling effects of ongoing surveillance. A data economics for cybersecurity?