1. A case for “open-ended” data
Srinath Srinivasa, Web Science Lab, IIIT Bangalore
sri@iiitb.ac.in
2. Open-data concerns
Utilization of data in a social system is influenced
by three primary concerns: Transparency, Privacy
and Security
Open-data initiatives (like data.gov.in) focus on
data elements that promote transparency, and
exclude data that infringes on privacy (PII) and/or
is sensitive with respect to (national) security.
[Figure: the three concerns: Transparency, Privacy, Security]
3. Open-data concerns
Data elements that are critical for transparency
concerns are called “open data.”
Data elements that can potentially compromise
collective security, and therefore have to be tightly
controlled, are called “closed data.” These are typically
managed in the form of shared secrets.
Private data is critical to the safety and well-being of
individuals. But it may sometimes need to be
disseminated in an “open-ended” fashion (i.e. not in
the control of the owner of the data.)
[Figure: Transparency maps to open data, Privacy to open-ended data, Security to closed data (shared secrets)]
4. Open-ended data
Private data that may need to be disseminated in
an “open-ended” fashion
Open-ended means:
● Owner of data may not have knowledge of
all recipients
● Owner of data may not be able to
unilaterally control dissemination
Examples:
Dissemination of a person's Aadhaar details to
different state and non-state stakeholders by
organizations
Sharing of medical records among hospitals
Sharing of exam records among universities
Open-ended data dissemination is critically dependent on the data
dissemination framework and the credibility of its decisions
5. Regulations for private data dissemination
EU GDPR
● Right to access
● Right to be forgotten
● Privacy by design
● Data protection officers
● Breach notification regulations
● Data portability rights
Indian data protection act (white paper)
● Technology agnosticism
● Holistic application (uniformity of legal
framework)
● Informed consent
● Data minimization (no soliciting extraneous
data)
● Controller accountability
● Structured enforcement
● Deterrent penalties
6. Characterizing Data Utility
Context of Utility
Utility of data is typically bounded within specific
contexts. Taken out of context, the data
element(s) may lose their utility.
Stakeholder capacity
Utility is not an objective characteristic of data --
but a characteristic of the association between
the data and the stakeholder capacity.
Divergent Aggregation of Utility
A given collection of data elements may be
aggregated in different ways for different utilitarian
contexts. There is no “one” correct aggregation.
Confounding of Utilities
The utilization of data by one stakeholder may
(positively or negatively) impact other stakeholders.
7. Characterizing Data Utility (Examples)
Context of Utility
Applicability of 5% GST is limited to specific
contexts (restaurants, not even catering).
Stakeholder capacity
Data about JEE cut-off marks for admission may
be useless to a layperson, but very utilitarian to a
student applying for engineering.
Divergent Aggregation of Utility
Open data about weather can be utilized in different
contexts for different purposes (agriculture, aviation,
traffic management, etc.)
Confounding of Utilities
Utilizing data about a person’s medical condition by
an advertiser may result in negative utility for the
person. (The Target pregnancy ad example).
8. Many Worlds on a Frame (MWF)
A knowledge representation framework for
publication and open-ended dissemination of
private data.
Essential building blocks:
● Worlds
● Actors
● Resources
[Figure: worlds XIIT, Raju and NIRF, a Role Table, and resource tags to:Role, from:Raju, from:XIIT]
9. Many Worlds on a Frame
Resource
● Refers to all forms of data elements that are
published and consumed in a technology
agnostic fashion
Actor
● Refers to consumers or producers of data.
May be a human user or an application. All
actors have login credentials or access keys
to enter the Frame
World
● Refers to a semantic boundary in which
certain data are relevant, and can be
published and consumed by legitimate
actors in appropriate capacity
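The three building blocks can be pictured as plain data types. The following is a minimal sketch, not the MWF implementation: the class and method names (`Frame`, `register`, `enter`, `publish`) and the use of a single string as the access key are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """A published data element, kept opaque to stay technology agnostic."""
    name: str
    payload: bytes


@dataclass(frozen=True)
class Actor:
    """A human user or an application, identified by a credential."""
    name: str
    access_key: str


@dataclass
class World:
    """A semantic boundary holding the resources relevant inside it."""
    name: str
    resources: list = field(default_factory=list)

    def publish(self, resource: Resource) -> None:
        self.resources.append(resource)


class Frame:
    """Hypothetical registry of actors and worlds (an assumption of this sketch)."""

    def __init__(self) -> None:
        self.actors = {}
        self.worlds = {}

    def register(self, actor: Actor) -> World:
        # Store the actor and create a world for it if none exists yet.
        self.actors[actor.name] = actor
        return self.worlds.setdefault(actor.name, World(actor.name))

    def enter(self, name: str, access_key: str) -> Actor:
        # Actors need valid credentials to enter the Frame.
        actor = self.actors.get(name)
        if actor is None or actor.access_key != access_key:
            raise PermissionError(f"{name} may not enter the Frame")
        return actor


frame = Frame()
home = frame.register(Actor("Raju", "secret-key"))
home.publish(Resource("transcript", b"..."))
```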
10. Many Worlds on a Frame
Actors and Worlds
Every actor has an associated world with the same name
Actors publish and consume data only from their worlds
Data flow between worlds is managed by worlds “participating” or
“playing roles” in other worlds
[Figure: actor Raju and world Raju, with the label to:XIIT]
11. Many Worlds on a Frame
Participations
A world participating in another world is said to
be playing a “role” in the other world.
Each Role definition exports an “Interface” that
can be used to publish or consume data via that
role.
When data elements are published or accessed via
a role, then that operation is said to have taken
place in the “capacity” of that role.
[Figure: each World holds two tables]
Role Table: Role | Interface | Players
Privileges Table: Role | Constraints | Privileges
12. Many Worlds on a Frame
Participations Example
XIIT participates in NIRF in the role of “Affiliate Institution”
Through this role, it can interact with NIRF data using the
following interfaces: getRankData(), uploadApplication()
XIIT also participates in the role of “Mentor Institution” in
NIRF using which, it can access the following interfaces:
getMembers(), uploadReview()
XIIT can hence interact with data in the NIRF world in two
capacities: Affiliate Institution and Mentor Institution,
with different privileges.
NIRF
Role Table
Role            | Interface                          | Players
Affiliate Instt | getRankData(), uploadApplication() | XIIT
Mentor Instt    | getMembers(), uploadReview()       | XIIT
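The NIRF role table above can be read as a lookup structure. The following is a rough sketch under assumed names (`define_role`, `add_player`, `capacities`, `may_call` are hypothetical and not part of any published MWF API); it only shows how a role table could decide which interface functions a player may call in a given capacity.

```python
class World:
    """A world with a role table: role -> interface functions and players."""

    def __init__(self, name):
        self.name = name
        self.role_table = {}

    def define_role(self, role, interface):
        self.role_table[role] = {"interface": set(interface), "players": set()}

    def add_player(self, role, player):
        self.role_table[role]["players"].add(player)

    def capacities(self, player):
        # All roles (capacities) in which `player` may act in this world.
        return sorted(
            role for role, row in self.role_table.items()
            if player in row["players"]
        )

    def may_call(self, player, role, function):
        # A call is allowed only if the player holds the role
        # and the role exports the function in its interface.
        row = self.role_table.get(role)
        return (
            row is not None
            and player in row["players"]
            and function in row["interface"]
        )


nirf = World("NIRF")
nirf.define_role("Affiliate Institution", ["getRankData", "uploadApplication"])
nirf.define_role("Mentor Institution", ["getMembers", "uploadReview"])
nirf.add_player("Affiliate Institution", "XIIT")
nirf.add_player("Mentor Institution", "XIIT")

print(nirf.capacities("XIIT"))
print(nirf.may_call("XIIT", "Affiliate Institution", "uploadReview"))  # False
```

Keeping the capacity explicit in every call is what lets the same world (XIIT) act with different privileges depending on the role it invokes.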
13. Many Worlds on a Frame
Participations
Every Role has an associated set of “privileges” and
“constraints”
Constraints are represented in the form of required participations.
Example: the role “Affiliate Institution” in NIRF may have the
constraint “Recognized Institution” in the world called UGC. That
is, only worlds that are “Recognized Institutions” in UGC are
eligible to play the role of “Affiliate Institution” in NIRF.
The set of privileges covers various system operations,
such as create worlds, edit worlds, add data, read data,
delete data, represent worlds, grant privileges, etc.
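A constraint expressed as a required participation can be checked with a simple membership test. The following is an illustrative sketch only: the dictionary layout and the function names `plays` and `eligible` are assumptions, and "YIIT" is a made-up world used to show a failing case.

```python
# Each world records who plays which role, and which participations
# a role requires (as (role, world) pairs) before it can be played.
worlds = {
    "UGC": {
        "players": {"Recognized Institution": {"XIIT"}},
        "constraints": {},
    },
    "NIRF": {
        "players": {},
        "constraints": {
            "Affiliate Institution": [("Recognized Institution", "UGC")],
        },
    },
}


def plays(player, role, world):
    """True if `player` currently plays `role` in `world`."""
    return player in worlds[world]["players"].get(role, set())


def eligible(player, role, world):
    """True if `player` satisfies every required participation for `role`."""
    required = worlds[world]["constraints"].get(role, [])
    return all(plays(player, r, w) for r, w in required)


print(eligible("XIIT", "Affiliate Institution", "NIRF"))  # True
print(eligible("YIIT", "Affiliate Institution", "NIRF"))  # False
```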
[Figure: each World holds two tables]
Role Table: Role | Interface | Players
Privileges Table: Role | Constraints | Privileges
14. Many Worlds on a Frame
Representations
Actors (users or application programs) are associated with their
own worlds, which they represent fully
Based on the roles they play in other worlds, they may also
represent those worlds in their participations
Example: Raju plays the role of Director in world XIIT. The
Director role (highlighted in Red) allows Raju to access the NIRF
world in the capacity of “Mentor Instt” by acting as a
representative of XIIT. Raju (the user or application program)
now has access to the interfaces for “Mentor Instt” exported by
NIRF. Bala, who plays the role of Dean at XIIT, can access NIRF in
the capacity of “Affiliate Instt” by representing XIIT.
XIIT
Privileges Table
Role     | Constraints | Privileges
Admin    |             | :all
Chairman |             | :represent(:all)
Director |             | :represent(Mentor Instt, NIRF)
Dean     |             | :represent(Affiliate Instt, NIRF)
[Figure: Raju (actor and world) participates in XIIT as Director; Bala (actor and world) participates in XIIT as Dean]
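The representation rule in this example can be sketched as a check against XIIT's privileges table. This is a hedged illustration: the tuple encoding of `:represent(...)` and the function name `can_represent` are assumptions of the sketch, not MWF syntax.

```python
ALL = ":all"

# XIIT's privileges table: role -> list of privileges.
# A represent privilege is encoded here as ("represent", scope).
xiit_privileges = {
    "Admin": [ALL],
    "Chairman": [("represent", ALL)],
    "Director": [("represent", ("Mentor Instt", "NIRF"))],
    "Dean": [("represent", ("Affiliate Instt", "NIRF"))],
}

# Who plays which role in XIIT.
xiit_players = {"Director": {"Raju"}, "Dean": {"Bala"}}


def can_represent(actor, target_role, target_world):
    """True if `actor` may act for XIIT as `target_role` in `target_world`."""
    for role, players in xiit_players.items():
        if actor not in players:
            continue
        for privilege in xiit_privileges.get(role, []):
            if privilege == ALL:
                return True
            kind, scope = privilege
            if kind == "represent" and scope in (ALL, (target_role, target_world)):
                return True
    return False


print(can_represent("Raju", "Mentor Instt", "NIRF"))     # True
print(can_represent("Raju", "Affiliate Instt", "NIRF"))  # False
print(can_represent("Bala", "Affiliate Instt", "NIRF"))  # True
```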
15. Resource Tagging
The simplest interface for a Role consists of the get()
and put() functions.
The get() function for role_id r in world w gets all
resources in the target world w that are tagged
to:r, and tags them locally as from:w
The put() function for role_id r uploads to the
target world all resources that are locally
tagged as to:r
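The tag-driven behaviour of get() and put() can be sketched as follows. This is a minimal illustration under stated assumptions: resources are modelled as (payload, tags) pairs, and tagging an uploaded resource as from:<local world> in the target world is an assumption of this sketch that the slide does not spell out.

```python
class World:
    def __init__(self, name):
        self.name = name
        self.resources = []  # list of (payload, set of tags)

    def add(self, payload, *tags):
        self.resources.append((payload, set(tags)))


def get(local, target, role_id):
    """Fetch resources tagged to:<role_id> in `target`; tag copies from:<target>."""
    wanted = f"to:{role_id}"
    fetched = [payload for payload, tags in target.resources if wanted in tags]
    for payload in fetched:
        local.add(payload, f"from:{target.name}")
    return fetched


def put(local, target, role_id):
    """Upload resources locally tagged to:<role_id> into `target`."""
    wanted = f"to:{role_id}"
    sent = [payload for payload, tags in local.resources if wanted in tags]
    for payload in sent:
        # Assumption: the target records where the resource came from.
        target.add(payload, f"from:{local.name}")
    return sent


xiit = World("XIIT")
raju = World("Raju")
xiit.add("timetable", "to:Director")
raju.add("annual report", "to:Director")

print(get(raju, xiit, "Director"))  # ['timetable']
print(put(raju, xiit, "Director"))  # ['annual report']
```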
Many Worlds on a Frame
Bots
Bots are virtual actors associated with worlds that
can represent the world in some or all roles.
The function of bots is to represent the world in all
other worlds where it is playing a role, by calling
the interface functions.
16. Many Worlds on a Frame
Worlds can be located-in or contained-in another world --
different from playing a role
Containment has the following semantics. If world w is
contained in world c then:
● All role players of c are entitled to at least the same
roles and privileges in w
● If world c is inaccessible or invisible for actor a, then
w and all worlds contained in c are also inaccessible
or invisible to a.
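The two containment rules can be sketched as walks up the chain of containers. The following is an assumption-laden illustration: the world names, the actor names, the `parent` mapping and the helper functions are hypothetical, and treating NIAS as contained in IISc is only an example.

```python
# child world -> the world that contains it (example data, assumed)
parent = {"NIAS": "IISc"}

# roles held directly in each world: world -> actor -> set of roles
direct_roles = {"IISc": {"Asha": {"Registrar"}}, "NIAS": {"Asha": {"Fellow"}}}

# (actor, world) pairs for which the world is inaccessible or invisible
hidden = {("Ravi", "IISc")}


def containers(world):
    """All worlds that contain `world`, nearest first."""
    chain = []
    while world in parent:
        world = parent[world]
        chain.append(world)
    return chain


def visible(actor, world):
    # A world is invisible if it, or any world containing it, is hidden.
    return all((actor, w) not in hidden for w in [world] + containers(world))


def roles_in(actor, world):
    # Role players of a container keep at least the same roles in contained worlds.
    roles = set()
    for w in [world] + containers(world):
        roles |= direct_roles.get(w, {}).get(actor, set())
    return roles


print(visible("Ravi", "NIAS"))           # False
print(sorted(roles_in("Asha", "NIAS")))  # ['Fellow', 'Registrar']
```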
For any installation of MWF, there is an overarching
container world (usually called UoD or Universe of
Discourse).
[Figure: containment example with worlds IISc and NIAS]
17. MWF Grid
An MWF grid is created over multiple installations
or “sites”
The main site has the UoD which is not contained
in any other world
Every other site (called a grid node) has its own
top-most container world, which is itself contained
in one of the worlds of an existing site.
[Figure: a main site with UoD and a grid node, with world W]
18. Provenance
All member sites of an MWF grid are part of a distributed
ledger system (blockchain) that maintains a copy of the
transaction logs
Each transaction entry contains at least the following
information:
● Nature of the transaction
● World(s) involved in the transaction
● Resource(s) involved in the transaction
● Actor(s) involved in the transaction
● Capacity in which the transaction was performed
● Outcome of the transaction
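A transaction entry with the fields listed above could be stored in a tamper-evident log. The following is a minimal single-node sketch, not the MWF ledger: the hash-chaining scheme, the field types and the function names `append_entry` and `verify` are assumptions, and no distribution or consensus is modelled.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder hash before the first entry


@dataclass(frozen=True)
class Transaction:
    nature: str       # e.g. "get" or "put"
    worlds: tuple     # world(s) involved
    resources: tuple  # resource(s) involved
    actors: tuple     # actor(s) involved
    capacity: str     # role in which the transaction was performed
    outcome: str      # e.g. "success" or "denied"


def _digest(prev_hash, tx):
    # Hash the previous entry's hash together with this entry's content.
    body = json.dumps(asdict(tx), sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, tx):
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"tx": tx, "prev": prev_hash, "hash": _digest(prev_hash, tx)})


def verify(log):
    """True if no entry was altered or reordered after being appended."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or _digest(prev_hash, entry["tx"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, Transaction("get", ("XIIT", "NIRF"), ("rank data",),
                              ("Bala",), "Affiliate Instt", "success"))
append_entry(log, Transaction("put", ("XIIT", "NIRF"), ("review",),
                              ("Raju",), "Mentor Instt", "success"))
print(verify(log))  # True
```

Recording the capacity alongside the actor is what allows a later audit to tell in which role a given access was made.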
Image Source: Wikipedia
19. MWF and GDPR
● Right to access
○ Actors publish data in their own worlds and
provide access by means of playing roles.
(Further dissemination of their data is
currently visible only via transaction
logs)
● Right to be forgotten
○ While worlds can discontinue their roles,
MWF does not (as yet) account for the right
to be forgotten for older data
● Privacy by design
○ Check
● Data protection officers
○ Implemented by means of roles
● Breach notification regulations
○ Can be implemented on top of provenance
logging
● Data portability rights
○ Applies naturally to MWF since all data
pertinent to a person are managed in their
world and can be ported based on their
participations
20. MWF and Indian Data Protection Act
● Technology agnosticism
○ Check (MWF is a formal, technology
agnostic model)
● Holistic application
○ Check (common framework for different
kinds of worlds)
● Informed consent
○ Check (User data stored in their world, and
shared based on participation through
informed consent)
● Data minimization (no soliciting extraneous
data)
○ Check (Role interfaces)
● Controller accountability
○ Check (Enforceable by logging capacity and
provenance)
● Structured enforcement
○ Check (World containment provides
scalable semantics for structured
enforcement and jurisdictions)
● Deterrent penalties
○ Can be implemented as a layer over MWF
21. Conclusions
Three concerns of data sharing (Transparency, Privacy and Security) lead to three
modalities of openness: Open, Open-ended and Closed data
MWF as a scalable formalism for open-ended dissemination of data
Current projects implementing MWF:
● RootSet (http://wsl.iiitb.ac.in/kb/)
○ Single node implementation of a deprecated version of MWF
● Sandesh (http://wsl.iiitb.ac.in/sandesh-web)
○ Single node MWF as an underlying formalism for semantic integration of open data
● Open City
○ Ongoing PoC project using MWF as a data-exchange platform for smart city
implementations