SlideShare une entreprise Scribd logo
1  sur  23
Télécharger pour lire hors ligne
Propagating Data Policies
A User Study
1
Enrico Daga (The Open University, UK)
Mathieu d’Aquin (Insight Centre, NUI Galway, Ireland)
Enrico Motta (The Open University, UK)
K-­‐CAP2017	
  
The	
  9th	
  Interna4onal	
  Conference	
  on	
  Knowledge	
  Capture	
  
December	
  4th-­‐6th,	
  2017Aus4n,	
  Texas,	
  United	
  States	
  
City Data Hubs
2
Smart	
  Bins	
  to	
  make	
  
garbage	
  collec2on	
  more	
  efficient
Monitor	
  parking	
  spaces	
  to	
  
support	
  ci2zens’	
  mobility
Observe	
  busyness	
  of	
  places	
  to	
  
be=er	
  tune	
  services
Forecast	
  car	
  accidents	
  to	
  improve	
  
drivers’	
  awareness
MK:Smart	
  is	
  an	
  integrated	
  innova4on	
  
and	
  support	
  programme	
  leveraging	
  
large-­‐scale	
  city	
  data	
  to	
  drive	
  growth	
  in	
  
Milton	
  Keynes	
  (UK).
hPps://datahub.mksmart.org
Delivery
Onboarding
Processing
Acquisi4on
Data	
  Hub
It is a loop!
Feedback:	
  @enridaga	
  	
  
Policy Propagation
In City Data Hubs publisher, processor and consumer are different.
3
How to support data managers on making aware developers
about the policies an output might derive from the data sources?
Feedback:	
  @enridaga	
  	
  
4
Policy Propagation
propagates(duty:attribution,processed into)
…
has(dataset1,permission:DerivativeWorks)
has(dataset1,duty:attribution)
…
ODRL	
  -­‐	
  License	
  and	
  Terms	
  of	
  Use
Datanode	
  -­‐	
  Process
Policy	
  Propaga=on	
  Rules
Focus on the knowledge components:

policies and processes
has(output, duty:attribution)
has(output, permission:commercialise)
…
has(X,P) ⋀ propagates(P,R) ⋀
relation(R,X,Y) → has(Y,P)
These are the
policies derived from
the sources!
Feedback:	
  @enridaga	
  	
  
Previous work
5
Data flow descriptions

“Describing semantic web applications through relations between data
nodes”, Technical Report, 2015
Rules acquisition and management 

“Propagation of policies in rich data flows”, K-Cap, 2015.
Policy knowledge acquisition 

“A Bottom-Up Approach for Licences Classification and Selection”, LeDA-
SWAn, 2015.
Process knowledge acquisition 

“An incremental learning method to support the annotation of workflows with
data-to-data relations”, EKAW, 2016.
Applicability in an end-to-end scenario 

“Addressing exploitability of smart city data”, IEEE ISC2, 2016.
Efficiency of the reasoner 

“Reasoning with Data Flows and Policy Propagation Rules”, Semantic Web
Journal, Special Issue to appear 2017?
Objective
To evaluate the system accuracy and utility:
• Are the assumptions made about the needed components the correct
ones?
• To what extent a system supporting such a task can be accurate?
When it is not, are there any fundamental reasons?
• How difficult is the task? Is this support needed?
6Feedback:	
  @enridaga	
  
User Study Design
A Data Hub manager to take decisions about policy propagation
• having the same knowledge used by the system.
• 10 participants selected among researchers and PhD students
having data analysis skills
• No legal expertise: developers and data managers don’t have that.
• Working in pairs: required to develop an agreement.
• 5 teams: MAPI, ILAN, CAAN, ALPA, NIFR
• Real data sources (MK Data Hub), licences (RDF Licence Database),
and processes (Rapid Miner files found on GitHub.com)
• 5 data journeys designed by mapping data sources and processes
in a narrative: SCRAPE, FOOD, CLOUD, AVG, CLEAN
• Distributed in a latin square: a team for 2 journeys and a journey for
2 teams; at most 1 shared journey between teams.
7Feedback:	
  @enridaga	
  
User Study Design
We developed a tool to assist participants in a data journey:
1. Understand the process (on Rapid Miner)
2. Understand the input datasets (from the MK Data Hub)
3. Understand the input policies (from the RDF Licence Database)
4. Decide what policies shall propagate to the output (1-5 Likert scale)
5. Compare with the automatic reasoner and explain differences
8Feedback:	
  @enridaga	
  
Understand a data journey
9
Table 6.1: Data Journeys (a,b).
(a) SCRAPE
SCRAPE Milton Keynes Websites Scraper.
The content of Websites about Milton Keynes is downloaded. Each web page is processed in order
to select only the relevant part of the content. After each iteration the resulting text is appended to a
dataset. The resulting dataset is modified before being saved in the local data storage.
Datasets Milton Keynes Council Website (UK OGL 2.0), MK50 Website (All rights reserved),
Wikipedia pages about Milton Keynes (CC-By-SA 3.0)
Policies Permissions: reutilization, Reproduction, Distribution, DerivativeWorks. Prohi-
bitions: Distribution, IPRRight, Reproduction, DerivativeWorks, databaseRight,
reutilization, extraction, CommercialUse. Duties: Notice, ShareAlike, Attribution.
Process https://github.com/mtrebi/SentimentAnalyzer/tree/
master/process/scraper.rmp
Teams ILAN, MAPI
(b) FOOD
FOOD Models for Food Rating Prediction.
A lift chart graphically represents the improvement that a mining model provides when compared
against a random guess, and measures the change in terms of a lift score. In this task, two techniques
Understand a data journey
10
Process https://github.com/mtrebi/SentimentAnalyzer/tree/
master/process/scraper.rmp
Teams ILAN, MAPI
(b) FOOD
FOOD Models for Food Rating Prediction.
A lift chart graphically represents the improvement that a mining model provides when compared
against a random guess, and measures the change in terms of a lift score. In this task, two techniques
are compared, namely Decision Tree and Naive Bays. The task uses data about Food Ratings in
information about quality of life in Milton Keynes Wards. The result are two pareto charts to be
compared.
Datasets OpenDataCommunities Worthwhile 2011-2012 Average Rating (UK OGL 2.0), Food
Establishments Info and Ratings (Terms of use)
Policies Permissions: DerivativeWorks, Distribution, Reproduction, display, extraction, reuti-
lization. Prohibitions: modify, use. Duties: Attribution, Notice, display.
Process https://github.com/samwar/tree/master/rapid_miner_
training/16_lift_chart.rmp
Teams NIFR, ALPA
Comparing with system
11
Explanation: blocked
12
Explanation: propagated
13
A discussion on ODRL action dependencies and how they
a↵ect the policy semantics is included in [13]. Nonetheless, to
the best of our knowledge, the first attempt to analyse how
policies can propagate in manipulation processes is the one
presented in one of our earlier papers [6]. In [6] we introduced
Figure 1: Explanation: propagation trace.
Explaining differences
14
Analysis
• Accuracy analysis. To evaluate the system (agreement with users)
• Teams agree the system is right
• Teams agree the system is not right
• Eg: the knowledge base needs to be improved (e.g. rules)
• Teams disagree: we don’t know whether the system is right!
• Thematic analysis. Focusing on the disagreements between users
• From explanations given in the last phase
• From discussions during the study (reaching agreement)
• Grounded Theory Approach
• Questionnaire. To assess the value to users after the study
experience.
15Feedback:	
  @enridaga	
  
Questionnaire
16
Questionnaire
17
produc-
bitions:
y.
miner
d in order
nce score.
license,
med/
ords.
rive, in-
tribute,
nymize.
n sensors
ready for
Weather
her Sta-
erms of
Distri-
ization,
Q.2
Do you think you had enough information to decide?
Yes 2 6 2 0 0 No
Q.3
How di cult was it to reach an agreement?
Easy 1 5 2 2 0 Di cult
Q.4
Somebody with strong technical skills is absolutely required to take this decision.
Do you agree?
Yes 1 8 1 0 0 No
Q.5
Somebody with strong technical skills is absolutely required to take this decision
even with the support of automated reasoning. Do you agree?
Yes 1 5 1 3 No
Q.6
Understanding the details of the process is fundamental to take a decision. Do
you agree?
Yes 6 3 1 0 0 No
Q.7
How enjoyable was it to discuss and decide on policies and how they propagate
in a process?
Very 4 5 0 1 0 Not
Q.8
How feasible/sustainable do you think it is to discuss and decide on policies
and how they propagate in a process?
Feasible 3 1 3 1 2 Unfeasible
Q.9
How sustainable do you think it is to discuss and decide on policies and how
they propagate in a process with the support of our system?
Feasible 5 4 0 1 0 Unfeasible
2
10
6 1
The owner of the input data
The process executor
The consumer of the processed data
They must do it together
Nobody (it cannot be done)
Q.10 Who should decide on what policies propagate to the output of a process?
Accuracy analysis
18
D Number of policies of data sources / Number of decisions
T_avg Agreement with system, average of the T1 and T2
T12 Agreement between T1 and T2
T_12+ Agreement on certain answers
T_1+,T_2+ Amount of certain answers
Permissions
8 20 12
1 5 2
8 15 15
3 13 6
31 56.5
0.6 0.8 0.7 0.6 0.6 0.9 0.5
0.7 1 0.9 0.7 0.2 0.7 0.3
1 0.5 0.8 0.5 1 1 1
0.7 0.5 0.6 0.8 0.2 0.8 0.4
0.8 0.7 0.6 0.7
otals)
T12+ T1+ T2+
0 0 3
6 12 6
0 0 1
0 7 7
2 4 5
8 22.5
(d) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.8 0.4 0.6 0.6 0 0 0.6
1 1 1 1 0.5 1 0.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
otals)
T12+ T1+ T2+
8 8 8
2 4 2
0 4 0
7 7 7
0 5 0
17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
als)
T12+ T1+ T2+
3 3 3
0 4 4
1 1 1
1 1 1
1 4 1
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
FOOD 22 14 18 16 14 8 20 12
CLOUD 7 5 7 6 5 1 5 2
AVG 15 15 8 11.5 8 8 15 15
CLEAN 17 12 9 10.5 14 3 13 6
All 77 58 55 31 56.5
0.6 0.8 0.7 0.6 0.6 0.9 0.5
0.7 1 0.9 0.7 0.2 0.7 0.3
1 0.5 0.8 0.5 1 1 1
0.7 0.5 0.6 0.8 0.2 0.8 0.4
0.8 0.7 0.6 0.7
(c) Permissions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 5 4 2 3 3 0 0 3
FOOD 12 12 12 12 12 6 12 6
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
(d) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.8 0.4 0.6 0.6 0 0 0.6
1 1 1 1 0.5 1 0.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
during the study, that are discussed
ANALYSIS
ow the decisions made by the users
The decisions taken by the system
3. For example, the SCRAPE data
16 policies and the system decided
4 of the 5 permissions and all the
Tables 4a-4h summarize the results
tative way. The values are shown
e full numbers and the computed
decisions (Tables 4a and 4b), and
s (Tables 4c and 4d), prohibitions
ties (Tables 4g and 4h). The values
ne of the user study (data journey
ed for each data journey (average
as totals considering the decisions
at the bottom). The data journeys
wenty-two policies to be analysed
ven decisions. Table 4a shows the
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
0
1
0
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
T
1
0
1
1
1
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
All 15 12 9 6 11.5
T
1
0
1
1
1
altough this aspect will be discussed w
classes of policies.
Tables 4c and 4d only show result
type permission. The average agreeme
discussed
the users
e system
PE data
decided
d all the
e results
e shown
omputed
4b), and
hibitions
he values
journey
(average
decisions
journeys
analysed
hows the
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
All 15 12 9 6 11.5
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
0.8 0.6 0.7 0.8
altough this aspect will be discussed when looking at specific
classes of policies.
Tables 4c and 4d only show results involving policies of
type permission. The average agreement between the system
Prohibitions
0 7 7
2 4 5
8 22.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
otals)
T12+ T1+ T2+
8 8 8
2 4 2
0 4 0
7 7 7
0 5 0
17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
als)
T12+ T1+ T2+
3 3 3
0 4 4
1 1 1
1 1 1
1 4 1
6 11.5
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
0.8 0.6 0.7 0.8
be discussed when looking at specific
nly show results involving policies of
verage agreement between the system
ng all the decisions is 0.6. Particularly,
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
(e) Prohibitions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 8 8 8 8 8 8 8 8
FOOD 4 0 2 1 2 2 4 2
CLOUD 4 4 4 4 4 0 4 0
AVG 7 7 7 7 7 7 7 7
CLEAN 5 5 5 5 5 0 5 0
All 28 25 26 17 22.5
(f) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0 0.5 0.3 0.5 1 1 0.5
1 1 1 1 0 1 0
1 1 1 1 1 1 1
1 1 1 1 0 1 0
0.9 0.9 0.7 0.8
(g) Duties (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 3 3 3 3 3 3 3 3
FOOD 6 2 4 3 0 0 4 4
CLOUD 1 1 1 1 1 1 1 1
AVG 1 1 1 1 1 1 1 1
CLEAN 4 4 4 4 4 1 4 1
All 15 12 9 6 11.5
(h) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
1 1 1 1 1 1 1
0.3 0.7 0.5 0 N.A. 0.7 0.7
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 0.3 1 0.3
0.8 0.6 0.7 0.8
altough this aspect will be discussed when looking at specific
classes of policies.
Tables 4c and 4d only show results involving policies of
type permission. The average agreement between the system
and the users considering all the decisions is 0.6. Particularly,
Duties
ourneys: System decisions
P ermissions P rohibitions Duties
4/5 8/8 3/3
0/12 4/4 4/6
0/2 4/4 1/1
0/7 7/7 1/1
0/8 5/5 4/4
4/34 28/28 13/15
sion (Q.1, Q.9). This feedback shows
n is a di cult problem, although it
ight knowledge models. Therefore, a
k has good value for users. The last
ant to understand whether the Data
ally decide on policy propagation. It
the users think he/she cannot solve
/she should involve the data owner
in this task. This conclusion reflects
during the study, that are discussed
Table 4: Agreement analysis.
D: total number of decisions; T1, T2: ag
tem and each team; Tavg: average agre
and system; T12: agreement between t
between teams (only Certainly Yes/A
T1+, T2+: amount of Certainly Yes/Abso
Tables on the left indicate totals, while
show the related ratios.
(a) All decisions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 16 15 13 14 14 11 11 14
FOOD 22 14 18 16 14 8 20 12
CLOUD 7 5 7 6 5 1 5 2
AVG 15 15 8 11.5 8 8 15 15
CLEAN 17 12 9 10.5 14 3 13 6
All 77 58 55 31 56.5
T1
0.9
0.6
0.7
1
0.7
(c) Permissions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 5 4 2 3 3 0 0 3
FOOD 12 12 12 12 12 6 12 6
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
T1
0.8
1
0
1
0.4
ons
Duties
3/3
4/6
1/1
1/1
4/4
13/15
ack shows
though it
erefore, a
The last
the Data
gation. It
not solve
ata owner
on reflects
discussed
Table 4: Agreement analysis.
D: total number of decisions; T1, T2: agreement between sys-
tem and each team; Tavg: average agreement between teams
and system; T12: agreement between teams; T12+ agreement
between teams (only Certainly Yes/Absolutely No answers);
T1+, T2+: amount of Certainly Yes/Absolutely No answers.
Tables on the left indicate totals, while the ones on the right
show the related ratios.
(a) All decisions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 16 15 13 14 14 11 11 14
FOOD 22 14 18 16 14 8 20 12
CLOUD 7 5 7 6 5 1 5 2
AVG 15 15 8 11.5 8 8 15 15
CLEAN 17 12 9 10.5 14 3 13 6
All 77 58 55 31 56.5
(b) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.9 0.8 0.9 0.9 0.8 0.7 0.9
0.6 0.8 0.7 0.6 0.6 0.9 0.5
0.7 1 0.9 0.7 0.2 0.7 0.3
1 0.5 0.8 0.5 1 1 1
0.7 0.5 0.6 0.8 0.2 0.8 0.4
0.8 0.7 0.6 0.7
(c) Permissions (totals)
Journey D T1 T2 Tavg T12 T12+ T1+ T2+
SCRAPE 5 4 2 3 3 0 0 3
FOOD 12 12 12 12 12 6 12 6
CLOUD 2 0 2 1 0 0 0 1
AVG 7 7 0 3.5 0 0 7 7
CLEAN 8 3 0 1.5 5 2 4 5
All 34 21 20 8 22.5
(d) (ratios)
T1 T2 Tavg T12 T12+ T1+ T2+
0.8 0.4 0.6 0.6 0 0 0.6
1 1 1 1 0.5 1 0.5
0 1 0.5 0 N.A. 0 0.5
1 0 0.5 0 N.A. 1 1
0.4 0 0.2 0.6 0.4 0.5 0.6
0.6 0.6 0.4 0.7
All policies
Feedback:	
  @enridaga	
  
Thematic analysis
• Expected disagreements:
A. system should propagate a policy, it didn’t
B. system should block a policy, it didn’t
C. system cannot decide: information is not enough
• (C) never occurred: the system has enough information to make a
decision!
Example FOOD journey: high disagreement between teams, due to
the different interpretation of the output:
(a) an enhanced version of the input (input included in the output)
(b) an independent dataset (input not included in the output)
(More discussion in the paper)
19
Thematic analysis
• Incomplete knowledge: rules missing or wrong, data flows
inaccurate, …
• Data reverse engineering: recurring theme (e.g. FOOD)
• Content-dependent decisions: current experiments assumed
reasoning at design time, although it is also possible to reason on
execution traces
• Dependant policies: permission:modify implies permission:use - but
we haven’t considered that
• The Legal Knowledge: users commented on the lack of legal
expertise.
– a support tool is useful
– a legal framework on top of the components is theoretically
possible, although unlikely in the short term …
20
Conclusions
• The system is accurate as developers/ data managers can be
• variance between the teams similar to the ones between each
team and system (Cohen’s kappa coefficient)
• The task is perceived as difficult, although not impossible, and the
system is therefore of good value for users
• The assumption behind the system is correct: there is a fundamental
correspondence between the possible data-to-data relations and the
way policies are propagated
• (and reuse does not necessarily imply derivation)
21Feedback:	
  @enridaga	
  
Future work
• How to consider the rights of the stakeholders other then data
publishers (e.g. process designer?)
• How to assess consistency of processes wrt policies
• Policy and process knowledge must be accurate and reflect the
relevant data-to-data relations. How can we assure that?
Challenge: “They must do it together”
• Need for shared knowledge about “data actions”, from jargon to
consensus (remodelling, refactoring, extraction, …)
• Negotiation of policies between data providers, processors and
consumers - also a non-technical challenge
22
Thank you
23
Enrico Daga
enrico.daga@open.ac.uk
tw: @enridaga

Contenu connexe

Tendances

What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...Robert Grossman
 
A general framework for predicting the optimal computing configuration for cl...
A general framework for predicting the optimal computing configuration for cl...A general framework for predicting the optimal computing configuration for cl...
A general framework for predicting the optimal computing configuration for cl...Scott Farley
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Robert Grossman
 
Big data issues and challenges
Big data issues and challengesBig data issues and challenges
Big data issues and challengesDilpreet kaur Virk
 
Opportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deckOpportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deckPistoia Alliance
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
 
AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019Neha gupta
 
Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach  Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach IJECEIAES
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataIJSTA
 
The Coming Storage Armageddon: What your Database Administrator Isn’t Tellin...
The Coming Storage Armageddon:  What your Database Administrator Isn’t Tellin...The Coming Storage Armageddon:  What your Database Administrator Isn’t Tellin...
The Coming Storage Armageddon: What your Database Administrator Isn’t Tellin...IBM India Smarter Computing
 

Tendances (18)

rerngvit_phd_seminar
rerngvit_phd_seminarrerngvit_phd_seminar
rerngvit_phd_seminar
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
Future of hpc
Future of hpcFuture of hpc
Future of hpc
 
A general framework for predicting the optimal computing configuration for cl...
A general framework for predicting the optimal computing configuration for cl...A general framework for predicting the optimal computing configuration for cl...
A general framework for predicting the optimal computing configuration for cl...
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Big data issues and challenges
Big data issues and challengesBig data issues and challenges
Big data issues and challenges
 
Opportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deckOpportunities for HPC in pharma R&D - main deck
Opportunities for HPC in pharma R&D - main deck
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019AdClickFraud_Bigdata-Apic-Ist-2019
AdClickFraud_Bigdata-Apic-Ist-2019
 
Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach  Concept Drift Identification using Classifier Ensemble Approach
Concept Drift Identification using Classifier Ensemble Approach
 
B018110610
B018110610B018110610
B018110610
 
An Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional DataAn Efficient Approach for Clustering High Dimensional Data
An Efficient Approach for Clustering High Dimensional Data
 
Big data
Big dataBig data
Big data
 
The GDELT project
The GDELT project The GDELT project
The GDELT project
 
The Coming Storage Armageddon: What your Database Administrator Isn’t Tellin...
The Coming Storage Armageddon:  What your Database Administrator Isn’t Tellin...The Coming Storage Armageddon:  What your Database Administrator Isn’t Tellin...
The Coming Storage Armageddon: What your Database Administrator Isn’t Tellin...
 
big data
big databig data
big data
 

Similaire à Propagating Data Policies: A User Study Accuracy Analysis

Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxjuliennehar
 
Comparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworksComparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworksElenaEtchemendy1
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...ijdpsjournal
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...ijdpsjournal
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Denodo
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs UsageIRJET Journal
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?Denodo
 
Towards Generating Policy-compliant Datasets
Towards Generating Policy-compliant DatasetsTowards Generating Policy-compliant Datasets
Towards Generating Policy-compliant DatasetsChristophe Debruyne
 
Standard Safeguarding Dataset - overview for CSCDUG.pptx
Standard Safeguarding Dataset - overview for CSCDUG.pptxStandard Safeguarding Dataset - overview for CSCDUG.pptx
Standard Safeguarding Dataset - overview for CSCDUG.pptxRocioMendez59
 
Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...
Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...
Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...Chief Analytics Officer Forum
 
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDemocratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDatabricks
 
IRJET- A Survey on Mining of Tweeter Data for Predicting User Behavior
IRJET- A Survey on Mining of Tweeter Data for Predicting User BehaviorIRJET- A Survey on Mining of Tweeter Data for Predicting User Behavior
IRJET- A Survey on Mining of Tweeter Data for Predicting User BehaviorIRJET Journal
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data Sharjeel Imtiaz
 
Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...IRJET Journal
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
Change Management: The Secret to a Successful SAS® Implementation
Change Management:  The Secret to a Successful SAS® ImplementationChange Management:  The Secret to a Successful SAS® Implementation
Change Management: The Secret to a Successful SAS® ImplementationThotWave
 
10[1].1.1.115.9508
10[1].1.1.115.950810[1].1.1.115.9508
10[1].1.1.115.9508okeee
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET Journal
 
Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Trieu Nguyen
 

Similaire à Propagating Data Policies: A User Study Accuracy Analysis (20)

Toward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docxToward a System Building Agenda for Data Integration(and Dat.docx
Toward a System Building Agenda for Data Integration(and Dat.docx
 
Comparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworksComparative analysis for_ddp_frameworks
Comparative analysis for_ddp_frameworks
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ...
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs Usage
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
Towards Generating Policy-compliant Datasets
Towards Generating Policy-compliant DatasetsTowards Generating Policy-compliant Datasets
Towards Generating Policy-compliant Datasets
 
Standard Safeguarding Dataset - overview for CSCDUG.pptx
Standard Safeguarding Dataset - overview for CSCDUG.pptxStandard Safeguarding Dataset - overview for CSCDUG.pptx
Standard Safeguarding Dataset - overview for CSCDUG.pptx
 
Complete-SRS.doc
Complete-SRS.docComplete-SRS.doc
Complete-SRS.doc
 
Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...
Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...
Dow Chemical presentation at the Chief Analytics Officer Forum East Coast USA...
 
Democratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn CreatorDemocratizing Machine Learning: Perspective from a scikit-learn Creator
Democratizing Machine Learning: Perspective from a scikit-learn Creator
 
IRJET- A Survey on Mining of Tweeter Data for Predicting User Behavior
IRJET- A Survey on Mining of Tweeter Data for Predicting User BehaviorIRJET- A Survey on Mining of Tweeter Data for Predicting User Behavior
IRJET- A Survey on Mining of Tweeter Data for Predicting User Behavior
 
Big Data Analytics and Open Data
Big Data Analytics and Open Data Big Data Analytics and Open Data
Big Data Analytics and Open Data
 
Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...Providing highly accurate service recommendation for semantic clustering over...
Providing highly accurate service recommendation for semantic clustering over...
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Change Management: The Secret to a Successful SAS® Implementation
Change Management:  The Secret to a Successful SAS® ImplementationChange Management:  The Secret to a Successful SAS® Implementation
Change Management: The Secret to a Successful SAS® Implementation
 
10[1].1.1.115.9508
10[1].1.1.115.950810[1].1.1.115.9508
10[1].1.1.115.9508
 
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...IRJET-  	  Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
IRJET- Improved Model for Big Data Analytics using Dynamic Multi-Swarm Op...
 
Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)Concepts, use cases and principles to build big data systems (1)
Concepts, use cases and principles to build big data systems (1)
 

Plus de Enrico Daga

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyEnrico Daga
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...Enrico Daga
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Enrico Daga
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectEnrico Daga
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchEnrico Daga
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEIEnrico Daga
 
The SPARQL Anything project
The SPARQL Anything projectThe SPARQL Anything project
The SPARQL Anything projectEnrico Daga
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Enrico Daga
 
Linked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchLinked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchEnrico Daga
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachEnrico Daga
 
Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Enrico Daga
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterEnrico Daga
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesEnrico Daga
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsEnrico Daga
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionEnrico Daga
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsEnrico Daga
 
Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEnrico Daga
 

Plus de Enrico Daga (19)

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data Journey
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything Project
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities research
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEI
 
The SPARQL Anything project
The SPARQL Anything projectThe SPARQL Anything project
The SPARQL Anything project
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
 
Linked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchLinked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities research
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data Cluster
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tables
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selection
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data Cubes
 

Dernier

Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 

Dernier (20)

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Propagating Data Policies: A User Study Accuracy Analysis

  • 1. Propagating Data Policies A User Study 1 Enrico Daga (The Open University, UK) Mathieu d’Aquin (Insight Centre, NUI Galway, Ireland) Enrico Motta (The Open University, UK) K-­‐CAP2017   The  9th  Interna4onal  Conference  on  Knowledge  Capture   December  4th-­‐6th,  2017Aus4n,  Texas,  United  States  
  • 2. City Data Hubs 2 Smart  Bins  to  make   garbage  collec2on  more  efficient Monitor  parking  spaces  to   support  ci2zens’  mobility Observe  busyness  of  places  to   be=er  tune  services Forecast  car  accidents  to  improve   drivers’  awareness MK:Smart  is  an  integrated  innova4on   and  support  programme  leveraging   large-­‐scale  city  data  to  drive  growth  in   Milton  Keynes  (UK). hPps://datahub.mksmart.org Delivery Onboarding Processing Acquisi4on Data  Hub It is a loop! Feedback:  @enridaga    
  • 3. Policy Propagation In City Data Hubs publisher, processor and consumer are different. 3 How to support data managers on making aware developers about the policies an output might derive from the data sources? Feedback:  @enridaga    
  • 4. 4 Policy Propagation propagates(duty:attribution,processed into) … has(dataset1,permission:DerivativeWorks) has(dataset1,duty:attribution) … ODRL  -­‐  License  and  Terms  of  Use Datanode  -­‐  Process Policy  Propaga=on  Rules Focus on the knowledge components:
 policies and processes has(output, duty:attribution) has(output, permission:commercialise) … has(X,P) ⋀ propagates(P,R) ⋀ relation(R,X,Y) → has(Y,P) These are the policies derived from the sources! Feedback:  @enridaga    
  • 5. Previous work 5 Data flow descriptions
 “Describing semantic web applications through relations between data nodes”, Technical Report, 2015 Rules acquisition and management 
 “Propagation of policies in rich data flows”, K-Cap, 2015. Policy knowledge acquisition 
 “A Bottom-Up Approach for Licences Classification and Selection”, LeDA- SWAn, 2015. Process knowledge acquisition 
 “An incremental learning method to support the annotation of workflows with data-to-data relations”, EKAW, 2016. Applicability in an end-to-end scenario 
 “Addressing exploitability of smart city data”, IEEE ISC2, 2016. Efficiency of the reasoner 
 “Reasoning with Data Flows and Policy Propagation Rules”, Semantic Web Journal, Special Issue to appear 2017?
  • 6. Objective To evaluate the system accuracy and utility: • Are the assumptions made about the needed components the correct ones? • To what extent a system supporting such a task can be accurate? When it is not, are there any fundamental reasons? • How difficult is the task? Is this support needed? 6Feedback:  @enridaga  
  • 7. User Study Design A Data Hub manager to take decisions about policy propagation • having the same knowledge used by the system. • 10 participants selected among researchers and PhD students having data analysis skills • No legal expertise: developers and data managers don’t have that. • Working in pairs: required to develop an agreement. • 5 teams: MAPI, ILAN, CAAN, ALPA, NIFR • Real data sources (MK Data Hub), licences (RDF Licence Database), and processes (Rapid Miner files found on GitHub.com) • 5 data journeys designed by mapping data sources and processes in a narrative: SCRAPE, FOOD, CLOUD, AVG, CLEAN • Distributed in a latin square: a team for 2 journeys and a journey for 2 teams; at most 1 shared journey between teams. 7Feedback:  @enridaga  
  • 8. User Study Design We developed a tool to assist participants in a data journey: 1. Understand the process (on Rapid Miner) 2. Understand the input datasets (from the MK Data Hub) 3. Understand the input policies (from the RDF Licence Database) 4. Decide what policies shall propagate to the output (1-5 Likert scale) 5. Compare with the automatic reasoner and explain differences 8Feedback:  @enridaga  
  • 9. Understand a data journey 9 Table 6.1: Data Journeys (a,b). (a) SCRAPE SCRAPE Milton Keynes Websites Scraper. The content of Websites about Milton Keynes is downloaded. Each web page is processed in order to select only the relevant part of the content. After each iteration the resulting text is appended to a dataset. The resulting dataset is modified before being saved in the local data storage. Datasets Milton Keynes Council Website (UK OGL 2.0), MK50 Website (All rights reserved), Wikipedia pages about Milton Keynes (CC-By-SA 3.0) Policies Permissions: reutilization, Reproduction, Distribution, DerivativeWorks. Prohi- bitions: Distribution, IPRRight, Reproduction, DerivativeWorks, databaseRight, reutilization, extraction, CommercialUse. Duties: Notice, ShareAlike, Attribution. Process https://github.com/mtrebi/SentimentAnalyzer/tree/ master/process/scraper.rmp Teams ILAN, MAPI (b) FOOD FOOD Models for Food Rating Prediction. A lift chart graphically represents the improvement that a mining model provides when compared against a random guess, and measures the change in terms of a lift score. In this task, two techniques
  • 10. Understand a data journey 10 Process https://github.com/mtrebi/SentimentAnalyzer/tree/ master/process/scraper.rmp Teams ILAN, MAPI (b) FOOD FOOD Models for Food Rating Prediction. A lift chart graphically represents the improvement that a mining model provides when compared against a random guess, and measures the change in terms of a lift score. In this task, two techniques are compared, namely Decision Tree and Naive Bays. The task uses data about Food Ratings in information about quality of life in Milton Keynes Wards. The result are two pareto charts to be compared. Datasets OpenDataCommunities Worthwhile 2011-2012 Average Rating (UK OGL 2.0), Food Establishments Info and Ratings (Terms of use) Policies Permissions: DerivativeWorks, Distribution, Reproduction, display, extraction, reuti- lization. Prohibitions: modify, use. Duties: Attribution, Notice, display. Process https://github.com/samwar/tree/master/rapid_miner_ training/16_lift_chart.rmp Teams NIFR, ALPA
  • 13. Explanation: propagated 13 A discussion on ODRL action dependencies and how they a↵ect the policy semantics is included in [13]. Nonetheless, to the best of our knowledge, the first attempt to analyse how policies can propagate in manipulation processes is the one presented in one of our earlier papers [6]. In [6] we introduced Figure 1: Explanation: propagation trace.
  • 15. Analysis • Accuracy analysis. To evaluate the system (agreement with users) • Teams agree the system is right • Teams agree the system is not right • Eg: the knowledge base needs to be improved (e.g. rules) • Teams disagree: we don’t know whether the system is right! • Thematic analysis. Focusing on the disagreements between users • From explanations given in the last phase • From discussions during the study (reaching agreement) • Grounded Theory Approach • Questionnaire. To assess the value to users after the study experience. 15Feedback:  @enridaga  
  • 17. Questionnaire 17 produc- bitions: y. miner d in order nce score. license, med/ ords. rive, in- tribute, nymize. n sensors ready for Weather her Sta- erms of Distri- ization, Q.2 Do you think you had enough information to decide? Yes 2 6 2 0 0 No Q.3 How di cult was it to reach an agreement? Easy 1 5 2 2 0 Di cult Q.4 Somebody with strong technical skills is absolutely required to take this decision. Do you agree? Yes 1 8 1 0 0 No Q.5 Somebody with strong technical skills is absolutely required to take this decision even with the support of automated reasoning. Do you agree? Yes 1 5 1 3 No Q.6 Understanding the details of the process is fundamental to take a decision. Do you agree? Yes 6 3 1 0 0 No Q.7 How enjoyable was it to discuss and decide on policies and how they propagate in a process? Very 4 5 0 1 0 Not Q.8 How feasible/sustainable do you think it is to discuss and decide on policies and how they propagate in a process? Feasible 3 1 3 1 2 Unfeasible Q.9 How sustainable do you think it is to discuss and decide on policies and how they propagate in a process with the support of our system? Feasible 5 4 0 1 0 Unfeasible 2 10 6 1 The owner of the input data The process executor The consumer of the processed data They must do it together Nobody (it cannot be done) Q.10 Who should decide on what policies propagate to the output of a process?
  • 18. Accuracy analysis 18 D Number of policies of data sources / Number of decisions T_avg Agreement with system, average of the T1 and T2 T12 Agreement between T1 and T2 T_12+ Agreement on certain answers T_1+,T_2+ Amount of certain answers Permissions 8 20 12 1 5 2 8 15 15 3 13 6 31 56.5 0.6 0.8 0.7 0.6 0.6 0.9 0.5 0.7 1 0.9 0.7 0.2 0.7 0.3 1 0.5 0.8 0.5 1 1 1 0.7 0.5 0.6 0.8 0.2 0.8 0.4 0.8 0.7 0.6 0.7 otals) T12+ T1+ T2+ 0 0 3 6 12 6 0 0 1 0 7 7 2 4 5 8 22.5 (d) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.8 0.4 0.6 0.6 0 0 0.6 1 1 1 1 0.5 1 0.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 otals) T12+ T1+ T2+ 8 8 8 2 4 2 0 4 0 7 7 7 0 5 0 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 als) T12+ T1+ T2+ 3 3 3 0 4 4 1 1 1 1 1 1 1 4 1 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 FOOD 22 14 18 16 14 8 20 12 CLOUD 7 5 7 6 5 1 5 2 AVG 15 15 8 11.5 8 8 15 15 CLEAN 17 12 9 10.5 14 3 13 6 All 77 58 55 31 56.5 0.6 0.8 0.7 0.6 0.6 0.9 0.5 0.7 1 0.9 0.7 0.2 0.7 0.3 1 0.5 0.8 0.5 1 1 1 0.7 0.5 0.6 0.8 0.2 0.8 0.4 0.8 0.7 0.6 0.7 (c) Permissions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 5 4 2 3 3 0 0 3 FOOD 12 12 12 12 12 6 12 6 CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 (d) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.8 0.4 0.6 0.6 0 0 0.6 1 1 1 1 0.5 1 0.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 during the study, that are discussed ANALYSIS ow the decisions made by the users The decisions taken by the system 3. For example, the SCRAPE data 16 policies and the system decided 4 of the 5 permissions and all the Tables 4a-4h summarize the results tative way. The values are shown e full numbers and the computed decisions (Tables 4a and 4b), and s (Tables 4c and 4d), prohibitions ties (Tables 4g and 4h). The values ne of the user study (data journey ed for each data journey (average as totals considering the decisions at the bottom). The data journeys wenty-two policies to be analysed ven decisions. Table 4a shows the CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 0 1 0 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 T 1 0 1 1 1 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 All 15 12 9 6 11.5 T 1 0 1 1 1 altough this aspect will be discussed w classes of policies. Tables 4c and 4d only show result type permission. The average agreeme discussed the users e system PE data decided d all the e results e shown omputed 4b), and hibitions he values journey (average decisions journeys analysed hows the CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 All 15 12 9 6 11.5 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 0.8 0.6 0.7 0.8 altough this aspect will be discussed when looking at specific classes of policies. Tables 4c and 4d only show results involving policies of type permission. The average agreement between the system Prohibitions 0 7 7 2 4 5 8 22.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 otals) T12+ T1+ T2+ 8 8 8 2 4 2 0 4 0 7 7 7 0 5 0 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 als) T12+ T1+ T2+ 3 3 3 0 4 4 1 1 1 1 1 1 1 4 1 6 11.5 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 0.8 0.6 0.7 0.8 be discussed when looking at specific nly show results involving policies of verage agreement between the system ng all the decisions is 0.6. Particularly, AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 (e) Prohibitions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 8 8 8 8 8 8 8 8 FOOD 4 0 2 1 2 2 4 2 CLOUD 4 4 4 4 4 0 4 0 AVG 7 7 7 7 7 7 7 7 CLEAN 5 5 5 5 5 0 5 0 All 28 25 26 17 22.5 (f) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0 0.5 0.3 0.5 1 1 0.5 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0.9 0.9 0.7 0.8 (g) Duties (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 3 3 3 3 3 3 3 3 FOOD 6 2 4 3 0 0 4 4 CLOUD 1 1 1 1 1 1 1 1 AVG 1 1 1 1 1 1 1 1 CLEAN 4 4 4 4 4 1 4 1 All 15 12 9 6 11.5 (h) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 1 1 1 1 1 1 1 0.3 0.7 0.5 0 N.A. 0.7 0.7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0.3 1 0.3 0.8 0.6 0.7 0.8 altough this aspect will be discussed when looking at specific classes of policies. Tables 4c and 4d only show results involving policies of type permission. The average agreement between the system and the users considering all the decisions is 0.6. Particularly, Duties ourneys: System decisions P ermissions P rohibitions Duties 4/5 8/8 3/3 0/12 4/4 4/6 0/2 4/4 1/1 0/7 7/7 1/1 0/8 5/5 4/4 4/34 28/28 13/15 sion (Q.1, Q.9). This feedback shows n is a di cult problem, although it ight knowledge models. Therefore, a k has good value for users. The last ant to understand whether the Data ally decide on policy propagation. It the users think he/she cannot solve /she should involve the data owner in this task. This conclusion reflects during the study, that are discussed Table 4: Agreement analysis. D: total number of decisions; T1, T2: ag tem and each team; Tavg: average agre and system; T12: agreement between t between teams (only Certainly Yes/A T1+, T2+: amount of Certainly Yes/Abso Tables on the left indicate totals, while show the related ratios. (a) All decisions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 16 15 13 14 14 11 11 14 FOOD 22 14 18 16 14 8 20 12 CLOUD 7 5 7 6 5 1 5 2 AVG 15 15 8 11.5 8 8 15 15 CLEAN 17 12 9 10.5 14 3 13 6 All 77 58 55 31 56.5 T1 0.9 0.6 0.7 1 0.7 (c) Permissions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 5 4 2 3 3 0 0 3 FOOD 12 12 12 12 12 6 12 6 CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 T1 0.8 1 0 1 0.4 ons Duties 3/3 4/6 1/1 1/1 4/4 13/15 ack shows though it erefore, a The last the Data gation. It not solve ata owner on reflects discussed Table 4: Agreement analysis. D: total number of decisions; T1, T2: agreement between sys- tem and each team; Tavg: average agreement between teams and system; T12: agreement between teams; T12+ agreement between teams (only Certainly Yes/Absolutely No answers); T1+, T2+: amount of Certainly Yes/Absolutely No answers. Tables on the left indicate totals, while the ones on the right show the related ratios. (a) All decisions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 16 15 13 14 14 11 11 14 FOOD 22 14 18 16 14 8 20 12 CLOUD 7 5 7 6 5 1 5 2 AVG 15 15 8 11.5 8 8 15 15 CLEAN 17 12 9 10.5 14 3 13 6 All 77 58 55 31 56.5 (b) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.9 0.8 0.9 0.9 0.8 0.7 0.9 0.6 0.8 0.7 0.6 0.6 0.9 0.5 0.7 1 0.9 0.7 0.2 0.7 0.3 1 0.5 0.8 0.5 1 1 1 0.7 0.5 0.6 0.8 0.2 0.8 0.4 0.8 0.7 0.6 0.7 (c) Permissions (totals) Journey D T1 T2 Tavg T12 T12+ T1+ T2+ SCRAPE 5 4 2 3 3 0 0 3 FOOD 12 12 12 12 12 6 12 6 CLOUD 2 0 2 1 0 0 0 1 AVG 7 7 0 3.5 0 0 7 7 CLEAN 8 3 0 1.5 5 2 4 5 All 34 21 20 8 22.5 (d) (ratios) T1 T2 Tavg T12 T12+ T1+ T2+ 0.8 0.4 0.6 0.6 0 0 0.6 1 1 1 1 0.5 1 0.5 0 1 0.5 0 N.A. 0 0.5 1 0 0.5 0 N.A. 1 1 0.4 0 0.2 0.6 0.4 0.5 0.6 0.6 0.6 0.4 0.7 All policies Feedback:  @enridaga  
  • 19. Thematic analysis • Expected disagreements: A. system should propagate a policy, it didn’t B. system should block a policy, it didn’t C. system cannot decide: information is not enough • (C) never occurred: the system has enough information to make a decision! Example FOOD journey: high disagreement between teams, due to the different interpretation of the output: (a) an enhanced version of the input (input included in the output) (b) an independent dataset (input not included in the output) (More discussion in the paper) 19
  • 20. Thematic analysis • Incomplete knowledge: rules missing or wrong, data flows inaccurate, … • Data reverse engineering: recurring theme (e.g. FOOD) • Content-dependent decisions: current experiments assumed reasoning at design time, although it is also possible to reason on execution traces • Dependant policies: permission:modify implies permission:use - but we haven’t considered that • The Legal Knowledge: users commented on the lack of legal expertise. – a support tool is useful – a legal framework on top of the components is theoretically possible, although unlikely in the short term … 20
  • 21. Conclusions • The system is accurate as developers/ data managers can be • variance between the teams similar to the ones between each team and system (Cohen’s kappa coefficient) • The task is perceived as difficult, although not impossible, and the system is therefore of good value for users • The assumption behind the system is correct: there is a fundamental correspondence between the possible data-to-data relations and the way policies are propagated • (and reuse does not necessarily imply derivation) 21Feedback:  @enridaga  
  • 22. Future work • How to consider the rights of the stakeholders other then data publishers (e.g. process designer?) • How to assess consistency of processes wrt policies • Policy and process knowledge must be accurate and reflect the relevant data-to-data relations. How can we assure that? Challenge: “They must do it together” • Need for shared knowledge about “data actions”, from jargon to consensus (remodelling, refactoring, extraction, …) • Negotiation of policies between data providers, processors and consumers - also a non-technical challenge 22